%0 Journal Article
%A Jean Monlong
%A Patrick Cossette
%A Caroline Meloche
%A Guy Rouleau
%A Simon L. Girard
%A Guillaume Bourque
%T Human copy number variants are enriched in regions of low-mappability
%D 2016
%R 10.1101/034165
%J bioRxiv
%P 034165
%X Germline copy number variants (CNVs) are known to affect a large portion of the human genome and have been implicated in many diseases. Although whole-genome sequencing can help identify CNVs, existing analytical methods suffer from limited sensitivity and specificity. Here we show that this is in large part due to the non-uniformity of read coverage, even after intra-sample normalization, and that this is exacerbated in regions of low-mappability. To improve on this, we propose PopSV, an analytical method that uses multiple samples to control for technical variation and enables the robust detection of CNVs. We show that PopSV is able to detect up to 2.7 times more variants compared to previous methods, with an accuracy of about 90%. Applying PopSV to 640 normal and cancer whole-genome datasets, we demonstrate that CNVs affect on average 7.4 million DNA bases in each individual, a 23% increase over previous estimates. Notably, we find that regions of low-mappability are approximately 8 times more likely to harbor CNVs than the rest of the genome, which contrasts with somatic CNVs that are nearly uniformly distributed. In addition to the known enrichment in segmental duplication, we also observe that CNVs are enriched near centromeres and telomeres, in specific types of satellite and short tandem repeats, and in some of the most recent families of transposable elements. Although CNVs are found to be depleted in protein-coding genes, we identify 7206 genes with at least one exonic CNV, 682 of which harbored CNVs in low-mappability regions that would have been missed by other methods.
Our results provide the most exhaustive map of CNVs across the human genome to date and demonstrate the broad functional impact of this type of genetic variation including in regions of low-mappability. Abbreviations: CNV, Copy-Number Variation or Copy Number Variant; Kb, Kilo base; RD, Read-Depth, also called read coverage or depth of coverage; SV, Structural Variation or Structural Variant; WGS, Whole-Genome Sequencing.
%U https://www.biorxiv.org/content/biorxiv/early/2016/05/09/034165.full.pdf