RT Journal Article
SR Electronic
T1 Alignathon: A competitive assessment of whole genome alignment methods
JF bioRxiv
FD Cold Spring Harbor Laboratory
SP 003285
DO 10.1101/003285
A1 Dent Earl
A1 Ngan Nguyen
A1 Glenn Hickey
A1 Robert S. Harris
A1 Stephen Fitzgerald
A1 Kathryn Beal
A1 Igor Seledtsov
A1 Vladimir Molodtsov
A1 Brian J. Raney
A1 Hiram Clawson
A1 Jaebum Kim
A1 Carsten Kemena
A1 Jia-Ming Chang
A1 Ionas Erb
A1 Alexander Poliakov
A1 Minmei Hou
A1 Javier Herrero
A1 Victor Solovyev
A1 Aaron E. Darling
A1 Jian Ma
A1 Cedric Notredame
A1 Michael Brudno
A1 Inna Dubchak
A1 David Haussler
A1 Benedict Paten
YR 2014
UL http://biorxiv.org/content/early/2014/03/10/003285.abstract
AB Background: Multiple sequence alignments (MSAs) are a prerequisite for a wide variety of evolutionary analyses. Published assessments and benchmark datasets exist for protein and, to a lesser extent, global nucleotide MSAs, but less effort has been made to establish benchmarks for the more general problem of whole genome alignment (WGA). Results: Using the same model as the successful Assemblathon competitions, we organized a competitive evaluation in which teams submitted their alignments, and assessments were performed collectively after all submissions were received. Three datasets were used: two simulated primate and mammalian phylogenies, and one of 20 real fly genomes. In total, 35 submissions were assessed, submitted by ten teams using 12 different alignment pipelines. Conclusions: We found agreement between independent simulation-based and statistical assessments, indicating that there are substantial accuracy differences between contemporary alignment tools. We saw considerable differences in alignment quality across differently annotated regions, and found that few tools aligned the duplications analysed. Many tools performed well at shorter evolutionary distances, but fewer remained competitive at longer distances. We provide all datasets, submissions and assessment programs for further study, and, as a resource for future benchmarking, a convenient repository of code and data for reproducing the simulation assessments.