TY  - JOUR
T1  - Computational Pan-Genomics: Status, Promises and Challenges
JF  - bioRxiv
DO  - 10.1101/043430
SP  - 043430
AU  - The Computational Pan-Genomics Consortium
AU  - Tobias Marschall
AU  - Manja Marz
AU  - Thomas Abeel
AU  - Louis Dijkstra
AU  - Bas E. Dutilh
AU  - Ali Ghaffaari
AU  - Paul Kersey
AU  - Wigard P. Kloosterman
AU  - Veli Mäkinen
AU  - Adam M. Novak
AU  - Benedict Paten
AU  - David Porubsky
AU  - Eric Rivals
AU  - Can Alkan
AU  - Jasmijn Baaijens
AU  - Paul I. W. De Bakker
AU  - Valentina Boeva
AU  - Raoul J. P. Bonnal
AU  - Francesca Chiaromonte
AU  - Rayan Chikhi
AU  - Francesca D. Ciccarelli
AU  - Robin Cijvat
AU  - Erwin Datema
AU  - Cornelia M. Van Duijn
AU  - Evan E. Eichler
AU  - Corinna Ernst
AU  - Eleazar Eskin
AU  - Erik Garrison
AU  - Mohammed El-Kebir
AU  - Gunnar W. Klau
AU  - Jan O. Korbel
AU  - Eric-Wubbo Lameijer
AU  - Benjamin Langmead
AU  - Marcel Martin
AU  - Paul Medvedev
AU  - John C. Mu
AU  - Pieter Neerincx
AU  - Klaasjan Ouwens
AU  - Pierre Peterlongo
AU  - Nadia Pisanti
AU  - Sven Rahmann
AU  - Ben Raphael
AU  - Knut Reinert
AU  - Dick de Ridder
AU  - Jeroen de Ridder
AU  - Matthias Schlesner
AU  - Ole Schulz-Trieglaff
AU  - Ashley D. Sanders
AU  - Siavash Sheikhizadeh
AU  - Carl Shneider
AU  - Sandra Smit
AU  - Daniel Valenzuela
AU  - Jiayin Wang
AU  - Lodewyk Wessels
AU  - Ying Zhang
AU  - Victor Guryev
AU  - Fabio Vandin
AU  - Kai Ye
AU  - Alexander Schönhuth
Y1  - 2016/01/01
UR  - http://biorxiv.org/content/early/2016/08/25/043430.abstract
N2  - Many disciplines, from human genetics and oncology to plant breeding, microbiology and virology, commonly face the challenge of analyzing rapidly increasing numbers of genomes. In case of Homo sapiens, the number of sequenced genomes will approach hundreds of thousands in the next few years. Simply scaling up established bioinformatics pipelines will not be sufficient for leveraging the full potential of such rich genomic datasets. Instead, novel, qualitatively different computational methods and paradigms are needed. We will witness the rapid extension of computational pan-genomics, a new sub-area of research in computational biology. In this paper, we generalize existing definitions and understand a pan-genome as any collection of genomic sequences to be analyzed jointly or to be used as a reference. We examine already available approaches to construct and use pan-genomes, discuss the potential benefits of future technologies and methodologies, and review open challenges from the vantage point of the above-mentioned biological disciplines. As a prominent example for a computational paradigm shift, we particularly highlight the transition from the representation of reference genomes as strings to representations as graphs. We outline how this and other challenges from different application domains translate into common computational problems, point out relevant bioinformatics techniques and identify open problems in computer science. With this review, we aim to increase awareness that a joint approach to computational pan-genomics can help address many of the problems currently faced in various domains.
ER  - 