PT - JOURNAL ARTICLE
AU - Dent Earl
AU - Ngan Nguyen
AU - Glenn Hickey
AU - Robert S. Harris
AU - Stephen Fitzgerald
AU - Kathryn Beal
AU - Igor Seledtsov
AU - Vladimir Molodtsov
AU - Brian J. Raney
AU - Hiram Clawson
AU - Jaebum Kim
AU - Carsten Kemena
AU - Jia-Ming Chang
AU - Ionas Erb
AU - Alexander Poliakov
AU - Minmei Hou
AU - Javier Herrero
AU - Victor Solovyev
AU - Aaron E. Darling
AU - Jian Ma
AU - Cedric Notredame
AU - Michael Brudno
AU - Inna Dubchak
AU - David Haussler
AU - Benedict Paten
TI - Alignathon: A competitive assessment of whole genome alignment methods
AID - 10.1101/003285
DP - 2014 Jan 01
TA - bioRxiv
PG - 003285
4099 - http://biorxiv.org/content/early/2014/03/10/003285.short
4100 - http://biorxiv.org/content/early/2014/03/10/003285.full
AB - Background: Multiple sequence alignments (MSAs) are a prerequisite for a wide variety of evolutionary analyses. Published assessments and benchmark datasets exist for protein and, to a lesser extent, global nucleotide MSAs, but less effort has been made to establish benchmarks for the more general problem of whole genome alignment (WGA). Results: Using the same model as the successful Assemblathon competitions, we organized a competitive evaluation in which teams submitted their alignments and assessments were performed collectively after all submissions were received. Three datasets were used: two simulated datasets of primate and mammalian phylogenies, and one of 20 real fly genomes. In total, 35 submissions were assessed, submitted by ten teams using 12 different alignment pipelines. Conclusions: We found agreement between independent simulation-based and statistical assessments, indicating that there are substantial accuracy differences between contemporary alignment tools. We observed considerable differences in alignment quality across differently annotated regions, and found that few tools aligned the duplications analysed. Many tools worked well at shorter evolutionary distances, but fewer performed competitively at longer distances. We provide all datasets, submissions and assessment programs for further study and, as a resource for future benchmarking, a convenient repository of code and data for reproducing the simulation assessments.