RT Journal Article
SR Electronic
T1 Evaluation of GRCh38 and de novo haploid genome assemblies demonstrates the enduring quality of the reference assembly
JF bioRxiv
FD Cold Spring Harbor Laboratory
SP 072116
DO 10.1101/072116
A1 Valerie A. Schneider
A1 Tina Graves-Lindsay
A1 Kerstin Howe
A1 Nathan Bouk
A1 Hsiu-Chuan Chen
A1 Paul A. Kitts
A1 Terence D. Murphy
A1 Kim D. Pruitt
A1 Françoise Thibaud-Nissen
A1 Derek Albracht
A1 Robert S. Fulton
A1 Milinn Kremitzki
A1 Vince Magrini
A1 Chris Markovic
A1 Sean McGrath
A1 Karyn Meltz Steinberg
A1 Kate Auger
A1 Will Chow
A1 Joanna Collins
A1 Glenn Harden
A1 Tim Hubbard
A1 Sarah Pelan
A1 Jared T. Simpson
A1 Glen Threadgold
A1 James Torrance
A1 Jonathan Wood
A1 Laura Clarke
A1 Sergey Koren
A1 Matthew Boitano
A1 Heng Li
A1 Chen-Shan Chin
A1 Adam M. Phillippy
A1 Richard Durbin
A1 Richard K. Wilson
A1 Paul Flicek
A1 Deanna M. Church
YR 2016
UL http://biorxiv.org/content/early/2016/08/30/072116.abstract
AB The human reference genome assembly plays a central role in nearly all aspects of today’s basic and clinical research. GRCh38 is the first coordinate-changing assembly update since 2009. It reflects the resolution of roughly 1000 issues and encompasses modifications ranging from thousands of single-base changes to megabase-scale path reorganizations, gap closures and localization of previously orphaned sequences. We developed a new approach to sequence generation for targeted base updates and used data from new genome mapping technologies and single haplotype resources to identify and resolve larger assembly issues. For the first time, the reference assembly contains sequence-based representations for the centromeres. We also expanded the number of alternate loci to create a reference that provides a more robust representation of human population variation.
We demonstrate that the updates render the reference an improved annotation substrate, alter read alignments in unchanged regions and impact variant interpretation at clinically relevant loci. We additionally evaluated a collection of new de novo long-read haploid assemblies and conclude that while the new assemblies compare favorably to the reference with respect to continuity, error rate, and gene completeness, the reference still provides the best representation for complex genomic regions and coding sequences. We assert that the collected updates in GRCh38 make the newer assembly a more robust substrate for comprehensive analyses that will promote our understanding of human biology and advance our efforts to improve health.