RT Journal Article
SR Electronic
T1 Genome Graphs
JF bioRxiv
FD Cold Spring Harbor Laboratory
SP 101378
DO 10.1101/101378
A1 Adam M. Novak
A1 Glenn Hickey
A1 Erik Garrison
A1 Sean Blum
A1 Abram Connelly
A1 Alexander Dilthey
A1 Jordan Eizenga
A1 M. A. Saleh Elmohamed
A1 Sally Guthrie
A1 André Kahles
A1 Stephen Keenan
A1 Jerome Kelleher
A1 Deniz Kural
A1 Heng Li
A1 Michael F. Lin
A1 Karen Miga
A1 Nancy Ouyang
A1 Goran Rakocevic
A1 Maciek Smuga-Otto
A1 Alexander Wait Zaranek
A1 Richard Durbin
A1 Gil McVean
A1 David Haussler
A1 Benedict Paten
YR 2017
UL http://biorxiv.org/content/early/2017/01/18/101378.abstract
AB There is increasing recognition that a single, monoploid reference genome is a poor universal reference structure for human genetics, because it represents only a tiny fraction of human variation. Adding this missing variation results in a structure that can be described as a mathematical graph: a genome graph. We demonstrate that, in comparison to the existing reference genome (GRCh38), genome graphs can substantially improve the fractions of reads that map uniquely and perfectly. Furthermore, we show that this fundamental simplification of read mapping transforms the variant calling problem from one in which many non-reference variants must be discovered de-novo to one in which the vast majority of variants are simply re-identified within the graph. Using standard benchmarks as well as a novel reference-free evaluation, we show that a simplistic variant calling procedure on a genome graph can already call variants at least as well as, and in many cases better than, a state-of-the-art method on the linear human reference genome. We anticipate that graph-based references will supplant linear references in humans and in other applications where cohorts of sequenced individuals are available.