Computational Pan-Genomics: Status, Promises and Challenges
Abstract
Many disciplines, from human genetics and oncology to plant and animal breeding, microbiology and virology, commonly face the challenge of analyzing rapidly increasing numbers of genomes. In the case of Homo sapiens, the number of sequenced genomes will approach hundreds of thousands in the next few years. Simply scaling up established bioinformatics pipelines will not be sufficient for leveraging the full potential of such rich genomic datasets. Instead, novel, qualitatively different computational methods and paradigms are needed. We will witness the rapid extension of computational pan-genomics, a new sub-area of research in computational biology. In this paper, we examine already available approaches to construct and use pan-genomes, discuss the potential benefits of future technologies and methodologies, and review open challenges from the vantage point of the above-mentioned biological disciplines. As a prominent example of a computational paradigm shift, we particularly highlight the transition from the representation of reference genomes as strings to representations as graphs. We outline how this and other challenges from different application domains translate into common computational problems, point out relevant bioinformatics techniques and identify open problems in computer science. In this way, we aim to form a computational pan-genomics community that bridges several biological and computational disciplines.