RT Journal Article
SR Electronic
T1 Phased Diploid Genome Assembly with Single Molecule Real-Time Sequencing
JF bioRxiv
FD Cold Spring Harbor Laboratory
SP 056887
DO 10.1101/056887
A1 Chen-Shan Chin
A1 Paul Peluso
A1 Fritz J. Sedlazeck
A1 Maria Nattestad
A1 Gregory T. Concepcion
A1 Alicia Clum
A1 Christopher Dunn
A1 Ronan O'Malley
A1 Rosa Figueroa-Balderas
A1 Abraham Morales-Cruz
A1 Grant R. Cramer
A1 Massimo Delledonne
A1 Chongyuan Luo
A1 Joseph R. Ecker
A1 Dario Cantu
A1 David R. Rank
A1 Michael C. Schatz
YR 2016
UL http://biorxiv.org/content/early/2016/06/03/056887.abstract
AB While genome assembly projects have been successful in a number of haploid or inbred species, one of the current main challenges is assembling non-inbred or rearranged heterozygous genomes. To address this critical need, we introduce the open-source FALCON and FALCON-Unzip algorithms (https://github.com/PacificBiosciences/FALCON/) to assemble Single Molecule Real-Time (SMRT®) Sequencing data into highly accurate, contiguous, and correctly phased diploid genomes. We demonstrate the quality of this approach by assembling new reference sequences for three heterozygous samples, including an F1 hybrid of the model species Arabidopsis thaliana, the widely cultivated V. vinifera cv. Cabernet Sauvignon, and the coral fungus Clavicorona pyxidata that have challenged short-read assembly approaches. The FALCON-based assemblies were substantially more contiguous and complete than alternate short or long-read approaches. The phased diploid assembly enabled the study of haplotype structures and heterozygosities between the homologous chromosomes, including identifying widespread heterozygous structural variations within the coding sequences.