Abstract
One of the main causes of cancer mortality is tumor evolution toward therapy-resistant disease. Drug resistance may emerge from pre-existing (ancestral) clones that gain a fitness advantage under therapy-induced natural selection. Previous work has shown that the presence of drug-resistant subclones at diagnosis or before therapy can be a strong predictor of poor survival, disease transformation, and refractoriness, with direct implications for disease management. In the clinical setting, such prognostic mutations are most commonly identified by amplicon-based or hybrid-capture deep sequencing, but their sensitive detection depends on an accurate analysis of background noise, specifically the sequencing errors that arise from the polymerase chain reaction (PCR) cycles performed before sequencing. In this work, we provide a comprehensive, unbiased model that precisely describes this background noise, and we show that, using tumor-only data, it can be approximated by aggregating negative binomial (NB) distributions. We evaluate the model and its NB approximation on simulated exponentially expanding populations and on ultra-deep sequencing data from cell-line and patient-sample dilution experiments. Rather than estimating a single fixed detection threshold for all variants, our method assesses mutation-specific sensitivities, enabling the identification of 1-2 mutated alleles among 10,000 wild-type alleles. This supports the design of precise treatment strategies and can help combat drug resistance and improve patient outcomes.
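To make the idea of a mutation-specific detection threshold concrete, the following is a minimal sketch and not the authors' model. It simplifies the aggregated NB approximation to a single NB distribution per site, with mean `mu` and size (dispersion) `r`. These parameters are fitted by the method of moments to hypothetical background alt-read counts. The sketch then returns the smallest alt-read count whose background tail probability falls at or below a chosen `alpha`. The helper names (`nb_pmf`, `nb_tail`, `fit_nb`, `detection_threshold`), the example counts, the depth, and the `alpha` value are all illustrative assumptions.

```python
import math
from statistics import mean, variance


def nb_pmf(k: int, mu: float, r: float) -> float:
    """P(X = k) for a negative binomial with mean mu and size (dispersion) r."""
    if mu <= 0:
        return 1.0 if k == 0 else 0.0
    log_p = (
        math.lgamma(k + r) - math.lgamma(r) - math.lgamma(k + 1)
        + r * math.log(r / (r + mu))
        + k * math.log(mu / (r + mu))
    )
    return math.exp(log_p)


def nb_tail(k: int, mu: float, r: float) -> float:
    """P(X >= k) = 1 - P(X < k); adequate for alpha well above float precision."""
    return max(0.0, 1.0 - sum(nb_pmf(j, mu, r) for j in range(k)))


def fit_nb(counts: list[int]) -> tuple[float, float]:
    """Hypothetical helper: method-of-moments NB fit using var = mu + mu**2 / r."""
    mu = float(mean(counts))
    var = variance(counts)  # sample variance; needs at least two counts
    if var <= mu:
        # No overdispersion observed: use a very large r (close to Poisson).
        return mu, 1e6
    return mu, mu * mu / (var - mu)


def detection_threshold(mu: float, r: float, alpha: float = 1e-3,
                        max_k: int = 100_000) -> int:
    """Smallest alt-read count k such that P(background >= k) <= alpha."""
    cumulative = 0.0  # holds P(X < k) for the current k
    for k in range(max_k + 1):
        if 1.0 - cumulative <= alpha:
            return k
        cumulative += nb_pmf(k, mu, r)
    raise ValueError("no threshold found below max_k")


if __name__ == "__main__":
    # Hypothetical background (error) alt-read counts at one site.
    background = [0, 1, 0, 2, 1, 0, 3, 1, 0, 2]
    mu, r = fit_nb(background)
    k = detection_threshold(mu, r, alpha=1e-3)
    depth = 10_000  # hypothetical read depth at this site
    print(f"mu={mu:.2f}, r={r:.2f}, threshold={k} reads, min VAF~{k / depth:.4f}")
```

Because `mu` and `r` are estimated separately for each site, two mutations sequenced at the same depth can receive different thresholds. Dividing a threshold by the read depth gives the corresponding minimum detectable variant allele frequency under these assumptions.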