PT - JOURNAL ARTICLE
AU - Mikhail Kolmogorov
AU - Joel Armstrong
AU - Brian J. Raney
AU - Ian Streeter
AU - Matthew Dunn
AU - Fengtang Yang
AU - Duncan Odom
AU - Paul Flicek
AU - Thomas Keane
AU - David Thybert
AU - Benedict Paten
AU - Son Pham
TI - Chromosome assembly of large and complex genomes using multiple references
AID - 10.1101/088435
DP - 2016 Jan 01
TA - bioRxiv
PG - 088435
4099 - http://biorxiv.org/content/early/2016/11/19/088435.short
4100 - http://biorxiv.org/content/early/2016/11/19/088435.full
AB - Despite the rapid development of sequencing technologies, assembly of mammalian-scale genomes into complete chromosomes remains one of the most challenging problems in bioinformatics. To help address this difficulty, we developed Ragout, a reference-assisted assembly tool that now works for large and complex genomes. Taking one or more target assemblies (generated from an NGS assembler) and one or multiple related reference genomes, Ragout infers the evolutionary relationships between the genomes and builds the final assemblies using a genome rearrangement approach. Using Ragout, we transformed NGS assemblies of 15 different Mus musculus and one Mus spretus genomes into sets of complete chromosomes, leaving less than 5% of sequence unlocalized per set. Various benchmarks, including PCR testing and realigning of long PacBio reads, suggest only a small number of structural errors in the final assemblies, comparable with direct assembly approaches. Additionally, we applied Ragout to Mus caroli and Mus pahari genomes, which exhibit karyotype-scale variations compared to other genomes from the Muridae family. Chromosome color maps confirmed most large-scale rearrangements that Ragout detected.