RT Journal Article
SR Electronic
T1 Illumina TruSeq synthetic long-reads empower de novo assembly and resolve complex, highly repetitive transposable elements
JF bioRxiv
FD Cold Spring Harbor Laboratory
SP 001834
DO 10.1101/001834
A1 Rajiv C. McCoy
A1 Ryan W. Taylor
A1 Timothy A. Blauwkamp
A1 Joanna L. Kelley
A1 Michael Kertesz
A1 Dmitry Pushkarev
A1 Dmitri A. Petrov
A1 Anna-Sophie Fiston-Lavier
YR 2014
UL http://biorxiv.org/content/early/2014/01/19/001834.abstract
AB High-throughput DNA sequencing technologies have revolutionized genomic analysis, including the de novo assembly of whole genomes. Nevertheless, assembly of complex genomes remains challenging, mostly due to the presence of repeats, which cannot be reconstructed unambiguously with short read data alone. One class of repeats, called transposable elements (TEs), is particularly problematic due to high sequence identity, high copy number, and a capacity to induce complex genomic rearrangements. Despite their importance to genome function and evolution, most current de novo assembly approaches cannot resolve TEs. Here, we applied a novel Illumina technology called TruSeq synthetic long-reads, which are generated through highly parallel library preparation and local assembly of short read data and achieve lengths of 2-15 Kbp with an extremely low error rate (< 0.05%). To test the utility of this technology, we sequenced and assembled the genome of the model organism Drosophila melanogaster (reference genome strain yw;cn,bw,sp) achieving an NG50 contig size of 77.9 Kbp and covering 97.2% of the current reference genome (including heterochromatin). TruSeq synthetic long-read technology enables placement of individual TE copies in their proper genomic locations as well as accurate reconstruction of TE sequences. We entirely recover and accurately place 80.4% of annotated transposable elements with perfect identity to the current reference genome. As TEs are complex and highly repetitive features that are ubiquitous in genomes across the tree of life, TruSeq synthetic long-read technology offers a powerful approach to drastically improve de novo assemblies of whole genomes.