RT Journal Article
SR Electronic
T1 Extensive sequencing of seven human genomes to characterize benchmark reference materials
JF bioRxiv
FD Cold Spring Harbor Laboratory
SP 026468
DO 10.1101/026468
A1 Justin M. Zook
A1 David Catoe
A1 Jennifer McDaniel
A1 Lindsay Vang
A1 Noah Spies
A1 Arend Sidow
A1 Ziming Weng
A1 Yuling Liu
A1 Chris Mason
A1 Noah Alexander
A1 Elizabeth Henaff
A1 Feng Chen
A1 Erich Jaeger
A1 Ali Moshrefi
A1 Khoa Pham
A1 William Stedman
A1 Tiffany Liang
A1 Michael Saghbini
A1 Zeljko Dzakula
A1 Alex Hastie
A1 Han Cao
A1 Gintaras Deikus
A1 Eric Schadt
A1 Robert Sebra
A1 Ali Bashir
A1 Rebecca M. Truty
A1 Christopher C. Chang
A1 Natali Gulbahce
A1 Keyan Zhao
A1 Srinka Ghosh
A1 Fiona Hyland
A1 Yutao Fu
A1 Mark Chaisson
A1 Chunlin Xiao
A1 Jonathan Trow
A1 Stephen T. Sherry
A1 Alexander W. Zaranek
A1 Madeleine Ball
A1 Jason Bobe
A1 Preston Estep
A1 George M. Church
A1 Patrick Marks
A1 Sofia Kyriazopoulou-Panagiotopoulou
A1 Grace X.Y. Zheng
A1 Michael Schnall-Levin
A1 Heather S. Ordonez
A1 Patrice A. Mudivarti
A1 Kristina Giorda
A1 Marc Salit
A1 Genome in a Bottle Consortium
YR 2015
UL http://biorxiv.org/content/early/2015/09/15/026468.abstract
AB The Genome in a Bottle Consortium, hosted by the National Institute of Standards and Technology (NIST), is creating reference materials and data for human genome sequencing, as well as methods for genome comparison and benchmarking. Here, we describe a large, diverse set of sequencing data for seven human genomes; five are current or candidate NIST Reference Materials. The pilot genome, NA12878, has been released as NIST RM 8398. We also describe data from two Personal Genome Project trios, one of Ashkenazim Jewish ancestry and one of Chinese ancestry. The data described come from 11 technologies: BioNano Genomics, Complete Genomics paired-end and LFR, Ion Proton exome, Oxford Nanopore, Pacific Biosciences, SOLiD, 10X Genomics GemCode™ WGS, and Illumina paired-end, mate-pair, and synthetic long reads. Cell lines, DNA, and data from these individuals are publicly available and highly characterized. Therefore, we expect these data to be useful for revealing novel information about the human genome and improving sequencing technologies, SNP, indel, and structural variant calling, and de novo assembly.