TY  - JOUR
T1  - DNA damage is a major cause of sequencing errors, directly confounding variant identification
JF  - bioRxiv
DO  - 10.1101/070334
SP  - 070334
AU  - Lixin Chen
AU  - Pingfang Liu
AU  - Thomas C. Evans, Jr
AU  - Laurence M. Ettwiller
Y1  - 2016/01/01
UR  - http://biorxiv.org/content/early/2016/08/23/070334.abstract
N2  - Pervasive mutations in somatic cells generate a heterogeneous genomic population within an organism and may result in serious medical conditions. While cancer is the most studied disease associated with somatic variations, recent advances in single cell and ultra deep sequencing indicate that a number of phenotypes and pathologies are impacted by cell specific variants. Currently, the accurate identification of low allelic frequency somatic variants relies on a combination of deep sequencing coverage and multiple evidences of the presence of variants. However, in this study we show that false positive variants can account for more than 70% of identified somatic variations, rendering conventional detection methods inadequate for accurate determination of low allelic variants. Interestingly, these false positive variants primarily originate from mutagenic DNA damage which directly confounds determination of genuine somatic mutations. Furthermore, we developed and validated a simple metric to measure mutagenic DNA damage, and demonstrated that mutagenic DNA damage is the leading cause of sequencing errors in widely used resources including the 1000 Genomes Project and The Cancer Genome Atlas.
ER  - 