Multi-platform discovery of haplotype-resolved structural variation in human genomes
ABSTRACT
The incomplete identification of structural variants (SVs) from whole-genome sequencing data limits studies of human genetic diversity and disease association. Here, we apply a suite of long-read, short-read, and strand-specific sequencing technologies, optical mapping, and variant discovery algorithms to comprehensively analyze three human parent–child trios and define the full spectrum of human genetic variation in a haplotype-resolved manner. We identify 818,054 indel variants (<50 bp) and 27,622 SVs (≥50 bp) per human genome. We also discover 156 inversions per genome, most of which previously escaped detection. Fifty-eight of the inversions we discovered intersect with the critical regions of recurrent microdeletion and microduplication syndromes. Taken together, our SV callsets represent a sevenfold increase in SV detection compared to most standard high-throughput sequencing studies, including those from the 1000 Genomes Project. The method and the dataset serve as a gold standard for the scientific community, and we make specific recommendations for maximizing structural variation sensitivity in future large-scale genome sequencing studies.