TY  - JOUR
T1  - Extensive sequencing of seven human genomes to characterize benchmark reference materials
JF  - bioRxiv
DO  - 10.1101/026468
SP  - 026468
AU  - Justin M. Zook
AU  - David Catoe
AU  - Jennifer McDaniel
AU  - Lindsay Vang
AU  - Noah Spies
AU  - Arend Sidow
AU  - Ziming Weng
AU  - Yuling Liu
AU  - Chris Mason
AU  - Noah Alexander
AU  - Elizabeth Henaff
AU  - Feng Chen
AU  - Erich Jaeger
AU  - Ali Moshrefi
AU  - Khoa Pham
AU  - William Stedman
AU  - Tiffany Liang
AU  - Michael Saghbini
AU  - Zeljko Dzakula
AU  - Alex Hastie
AU  - Han Cao
AU  - Gintaras Deikus
AU  - Eric Schadt
AU  - Robert Sebra
AU  - Ali Bashir
AU  - Rebecca M. Truty
AU  - Christopher C. Chang
AU  - Natali Gulbahce
AU  - Keyan Zhao
AU  - Srinka Ghosh
AU  - Fiona Hyland
AU  - Yutao Fu
AU  - Mark Chaisson
AU  - Chunlin Xiao
AU  - Jonathan Trow
AU  - Stephen T. Sherry
AU  - Alexander W. Zaranek
AU  - Madeleine Ball
AU  - Jason Bobe
AU  - Preston Estep
AU  - George M. Church
AU  - Patrick Marks
AU  - Sofia Kyriazopoulou-Panagiotopoulou
AU  - Grace X.Y. Zheng
AU  - Michael Schnall-Levin
AU  - Heather S. Ordonez
AU  - Patrice A. Mudivarti
AU  - Kristina Giorda
AU  - Ying Sheng
AU  - Karoline Bjarnesdatter Rypdal
AU  - Marc Salit
Y1  - 2015/01/01
UR  - http://biorxiv.org/content/early/2015/12/23/026468.abstract
N2  - The Genome in a Bottle Consortium, hosted by the National Institute of Standards and Technology (NIST), is creating reference materials and data for human genome sequencing, as well as methods for genome comparison and benchmarking. Here, we describe a large, diverse set of sequencing data for seven human genomes; five are current or candidate NIST Reference Materials. The pilot genome, NA12878, has been released as NIST RM 8398. We also describe data from two Personal Genome Project trios, one of Ashkenazim Jewish ancestry and one of Chinese ancestry. The data come from 12 technologies: BioNano Genomics, Complete Genomics paired-end and LFR, Ion Proton exome, Oxford Nanopore, Pacific Biosciences, SOLiD, 10X Genomics GemCode™ WGS, and Illumina exome and WGS paired-end, mate-pair, and synthetic long reads. Cell lines, DNA, and data from these individuals are publicly available. Therefore, we expect these data to be useful for revealing novel information about the human genome and for improving sequencing technologies; SNP, indel, and structural variant calling; and de novo assembly.
ER  - 