RT Journal Article
SR Electronic
T1 Computational Pan-Genomics: Status, Promises and Challenges
JF bioRxiv
FD Cold Spring Harbor Laboratory
SP 043430
DO 10.1101/043430
A1 The Computational Pan-Genomics Consortium
A1 Tobias Marschall
A1 Manja Marz
A1 Thomas Abeel
A1 Louis Dijkstra
A1 Bas E. Dutilh
A1 Ali Ghaffaari
A1 Paul Kersey
A1 Wigard P. Kloosterman
A1 Veli Mäkinen
A1 Adam M. Novak
A1 Benedict Paten
A1 David Porubsky
A1 Eric Rivals
A1 Can Alkan
A1 Jasmijn Baaijens
A1 Paul I. W. De Bakker
A1 Valentina Boeva
A1 Raoul J. P. Bonnal
A1 Francesca Chiaromonte
A1 Rayan Chikhi
A1 Francesca D. Ciccarelli
A1 Robin Cijvat
A1 Erwin Datema
A1 Cornelia M. Van Duijn
A1 Evan E. Eichler
A1 Corinna Ernst
A1 Eleazar Eskin
A1 Erik Garrison
A1 Mohammed El-Kebir
A1 Gunnar W. Klau
A1 Jan O. Korbel
A1 Eric-Wubbo Lameijer
A1 Benjamin Langmead
A1 Marcel Martin
A1 Paul Medvedev
A1 John C. Mu
A1 Pieter Neerincx
A1 Klaasjan Ouwens
A1 Pierre Peterlongo
A1 Nadia Pisanti
A1 Sven Rahmann
A1 Ben Raphael
A1 Knut Reinert
A1 Dick de Ridder
A1 Jeroen de Ridder
A1 Matthias Schlesner
A1 Ole Schulz-Trieglaff
A1 Ashley D. Sanders
A1 Siavash Sheikhizadeh
A1 Carl Shneider
A1 Sandra Smit
A1 Daniel Valenzuela
A1 Jiayin Wang
A1 Lodewyk Wessels
A1 Ying Zhang
A1 Victor Guryev
A1 Fabio Vandin
A1 Kai Ye
A1 Alexander Schönhuth
YR 2016
UL http://biorxiv.org/content/early/2016/08/25/043430.abstract
AB Many disciplines, from human genetics and oncology to plant breeding, microbiology and virology, commonly face the challenge of analyzing rapidly increasing numbers of genomes. In the case of Homo sapiens, the number of sequenced genomes will approach hundreds of thousands in the next few years. Simply scaling up established bioinformatics pipelines will not be sufficient for leveraging the full potential of such rich genomic datasets. Instead, novel, qualitatively different computational methods and paradigms are needed.
We will witness the rapid extension of computational pan-genomics, a new sub-area of research in computational biology. In this paper, we generalize existing definitions and understand a pan-genome as any collection of genomic sequences to be analyzed jointly or to be used as a reference. We examine already available approaches to construct and use pan-genomes, discuss the potential benefits of future technologies and methodologies, and review open challenges from the vantage point of the above-mentioned biological disciplines. As a prominent example of a computational paradigm shift, we particularly highlight the transition from the representation of reference genomes as strings to representations as graphs. We outline how this and other challenges from different application domains translate into common computational problems, point out relevant bioinformatics techniques and identify open problems in computer science. With this review, we aim to increase awareness that a joint approach to computational pan-genomics can help address many of the problems currently faced in various domains.