Abstract
Mutational processes in tumors leave tell-tale genomic signatures composed of “passenger” mutations and mutations that have quantifiable effects on the proliferation and survival of cancer cell lineages. We identify the contributions of mutational processes to each oncogenic variant, quantifying responsibility for origination of changes at oncogenic variant sites contributing to tumorigenesis in 23 cancer types. We demonstrate that the variants driving melanomas and lung cancers are predominantly attributable to the actionable, preventable, exogenous mutational processes of ultraviolet light and tobacco exposure, whereas gliomas and prostate adenocarcinomas are largely attributable to endogenous processes associated with aging. Preventable mutations associated with pathogen exposure and APOBEC activity account for a large proportion of the cancer effect within head and neck, bladder, cervical, and breast cancers. These attributions complement epidemiological approaches—revealing the burden of cancer driven by single-nucleotide variants caused by either endogenous or exogenous, non-actionable or actionable processes, and crucially inform cancer prevention.
Introduction
In the past half-century, our understanding of the origins of cancers has progressed from a deep mystery to a widespread acceptance that cancers are the outcome of an evolutionary process driven by mutation, consequent genetic variation, and selection on that genetic variation (Merlo et al., 2006; Nowell, 1976; Somarelli et al., 2020). Epidemiological studies have established both an association with age (Siegel et al., 2020) and causation by exposure to carcinogens (Smith et al., 2016), demonstrating that endogenous processes and exogenous mutagens can increase the rate of mutations (Barnes et al., 2018), create somatic genetic variation (Yates and Campbell, 2012), and increase the rate of cancer (Golemis et al., 2018; Greaves, 2015). In recent years, large-scale analyses of whole-exome and whole-genome tumor sequencing have been able to recover characteristic tissue-specific signatures of these underlying mutagenic processes in the patterns of variants that have suffused cancer genomes (Alexandrov et al., 2020). However, the specific cancer driver architecture within each kind of cancer tissue has also been demonstrated to be predictable (Hosseini et al., 2019) and, crucially, circumscribed (Venkatesan et al., 2017). Therefore, the causation of cancer by each mutational process is determined neither solely by its effect on mutation rate nor by the amount of somatic genetic variation it induces, but critically depends upon the degree to which the specific mutations it supplies provide selective advantages to clonal lineages within tissues that give rise to cancer.
Evaluating selective advantages requires knowledge of mutation bias, for which characteristic patterns have long been attributed to specific tissues (Brash et al., 1991; Pfeifer, 2015; Pfeifer et al., 2002; Poon et al., 2014). These patterns can be validated within model organisms under laboratory conditions and attributed to specific mutagenic sources (Segovia et al., 2015); numerous algorithms have been developed to deconvolve the total substitution load into its constituent mutational signatures (Grolleman et al., 2019). Application of these algorithms to whole-exome or whole-genome data recapitulates underlying mutation rates in their trinucleotide context without bias from natural selection because the vast majority of mutations accumulate neutrally (Cannataro and Townsend, 2018; Greenman et al., 2007). Nevertheless, Poulos et al. (2018) demonstrated that known major driver mutations are statistically associated with specific mutational signatures. Therefore, specific mutagenic processes in different tissues drive tumorigenesis via mutations in genes that confer a survival and proliferative advantage to somatic cells. Estimation of the effects of each mutagenic process on the development of cancer requires quantification of the effect of each single nucleotide variant (SNV) on tumorigenesis.
Quantification of the cancer effect of mutations requires estimation of their relative impact on cancer lineage survival and replication, an estimation that critically depends on an understanding of the baseline rate of mutation in the absence of natural selection (Cannataro and Townsend, 2018). Ostrow et al. (2014) performed a comprehensive analysis of ratios of non-synonymous change to synonymous change to quantify genomic natural selection in the somatic evolution of cancer. This approach has been applied in the field of evolutionary biology for decades and has recently been adapted to the nuances of cancer evolution in several meaningful ways (Shpak and Lu, 2016; Zhao et al., 2016), such as taking tissue-specific trinucleotide mutational patterns into account (c.f. Van den Eynden and Larsson, 2017). Martincorena et al. (2017) performed an analysis using trinucleotide substitution rates and covariate-informed gene-level mutation rates to quantify gene-wide selection conferring enhanced proliferation and survival of cancer cell lineages. Temko et al. (2018) deconvolved the underlying mutational signatures in tumor sets, associated signatures and drivers, and quantified the relative intragenic selection of the SNVs in a selection of high-burden driver genes. Cannataro et al. (2018) quantified the site-specific selective effect on each SNV during primary tumor development by determining the constituent mutational signatures driving mutation load in each tumor, coupling these rates with covariate-informed gene-level mutation rates, and quantifying their contribution to cancer cell lineage survival and reproduction in comparison to the convolved baseline mutation rate. These cancer drivers—and their relative effect—may be related back to the mechanisms driving genomic variation, i.e., the processes behind the detected mutational signatures.
Mutagenic environmental exposures have been correlated to specific cancer incidences by epidemiological studies spanning the previous 70 years (Doll and Hill, 1950; Loeb and Harris, 2008). Recently, cancer incidence has also been correlated with tissue-specific stem cell division numbers (Tomasetti et al., 2017; Tomasetti and Vogelstein, 2015), which has been interpreted as evidence that cancers are mainly driven by endogenous, i.e., aging or “bad luck”, effects. Other analyses dispute this conclusion, pointing out that it is confounded by the sensitivity of rapidly dividing tissues to exogenous mutational sources (Ashford et al., 2015; Wu et al., 2016), and by the exclusion of cancer types with known environmental causes (Wild et al., 2015). To determine the relative contributions of endogenous and exogenous processes on cancer phenotypes, tumor sequence data can be used to parameterize the magnitude of age-associated, exogenous and actionable mutational processes that contribute to molecular variation and the consequent cancer effects of each mutation attributable to these processes on tumorigenesis. Such analyses of the evolutionary dynamics driving tumorigenesis back to the sources of the heterogeneity fueling cancer evolution are essential to the advancement of our understanding of oncogenesis and cancer prevention.
Here we analyze the signatures of mutational processes in diverse cancer types. We quantify the cancer effect size of consequent single-nucleotide variants. We determine which cancer drivers in each tumor are attributable to actionable, and preventable, sources of mutagenesis. We quantify the contribution of each mutagenic process to cancer effect in individual patient tumors, and their relative contribution across tumors within sampled cancer types. We identify cancer types where the discrepancy between mutagenic input and cancer effect is largest, and smallest, and analyze which mutagenic processes show the greatest proportional discrepancy between their mutational input and their cancer effect within each cancer type. This analysis enables comparison of the proportions of cancer effect attributable to age-associated processes to the proportions of cancer effect attributable to putatively preventable mutagenic processes such as ultraviolet light exposure, tobacco smoking or chewing, and APOBEC mutagenesis, addressing a longstanding controversy regarding the roles of endogenous “bad luck” and exogenous exposure in tumorigenesis and, moreover, informing the benefits of prevention of mutation in the prevention of cancer.
Methods
To attribute the increased cellular reproduction and survival conferred by single nucleotide variants responsible for cancer growth to their underlying mutational sources, we determined the sources of mutation within individual tumors, calculated the effect size of each single nucleotide substitution among tumors in each tumor type, and evaluated the likelihood that each of these substitutions was the product of each mutational source within each tumor. Thus, single-nucleotide substitutions responsible for the largest influence on cellular division and survival, and hence the cancer phenotype, may be attributed to the root sources of molecular variation within each somatic tissue. We analyzed the pan-cancer whole-exome tumor sequencing dataset curated by Cannataro et al. (2018), except that all Yale-Gilead tumors that might have been treated with chemotherapies were removed (removed tumors are listed in Table S1). Scripts used to perform these analyses are available online (Townsend-Lab-Yale, n.d.).
Attributing Sources of Mutation within Tumors
To attribute observed sets of substitutions in tumors to the underlying sources of mutations, we used the R package deconstructSigs (Rosenthal et al., 2016) to extract version 3.1 COSMIC mutational signatures from each tumor’s set of non-recurrent substitutions. We excluded recurrent variants because they are much more likely to be under selection in the cancer cell population; non-recurrent mutations more accurately reflect mutational influx. To minimize signature bleeding because some COSMIC signatures share similar mutational profiles, we limited the number of signatures detectable in each tumor type to those signatures detected at any prevalence in tumors of that type previously by Alexandrov et al. (2020), with the addition of enabling inference of SBS16 within esophageal squamous cell carcinoma (Li et al., 2018). We also applied the recommended minimum threshold for the number of substitutions necessary to attribute to a signature associated with increased mutagenesis. For example, signatures attributable to defective DNA mismatch repair were only allowed in tumors with over 200 substitutions (Alexandrov et al., 2020). Some tumors analyzed exhibited fewer than 50 substitutions (Supplemental Fig. S1)—a threshold below which precise deconvolution of mutational signatures becomes problematic (Rosenthal et al., 2016). For these tumors, we mixed the deconstructSigs estimates of the signature weights for the specific tumor with the average signature weights for the tumors with 50 or more substitutions of the same tumor type, weighting the former in proportion to the number of variants in the tumor out of 50.
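The blending rule for sparsely mutated tumors can be illustrated with a minimal Python sketch. It is not the authors' R code: the function name, the dictionary layout, and the example weights are hypothetical, and it assumes that the tumor-specific estimate receives weight n/50 and the tumor-type average receives the remainder, which is one reading of "in proportion to the number of variants in the tumor out of 50".

```python
def blend_signature_weights(tumor_weights, cohort_mean_weights,
                            n_substitutions, threshold=50):
    """Blend a sparse tumor's signature weights with its tumor-type average.

    Assumption: the tumor-specific estimate gets weight n/threshold and the
    tumor-type average gets the remaining 1 - n/threshold.
    """
    if n_substitutions >= threshold:
        # Enough substitutions: keep the tumor-specific estimate unchanged.
        return dict(tumor_weights)
    share = n_substitutions / threshold
    signatures = set(tumor_weights) | set(cohort_mean_weights)
    return {
        s: share * tumor_weights.get(s, 0.0)
           + (1.0 - share) * cohort_mean_weights.get(s, 0.0)
        for s in signatures
    }


# Hypothetical tumor with 25 substitutions: half tumor-specific, half average.
blended = blend_signature_weights(
    {"SBS4": 1.0}, {"SBS4": 0.5, "SBS5": 0.5}, n_substitutions=25
)
```

In this hypothetical case the blended weights are 0.75 for SBS4 and 0.25 for SBS5, so a tumor with few substitutions is pulled halfway toward the profile typical of its tumor type.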
As some COSMIC signatures have been attributed to artifactual processes such as sample handling and sequencing, we focus on the tumor-type-specific subset of signatures B that represent biologically relevant mutational processes (Alexandrov et al., 2020). The fitted weights of signatures in B reflect the relative rates at which their underlying mutational processes contribute mutations. To determine the tumor-specific relative weight w_i of a biological signature i ∈ B, we divided its fitted weight, denoted here f_i, by the sum of the fitted weights of all biologically associated signatures; i.e.,
$$w_i = \frac{f_i}{\sum_{k \in B} f_k}. \qquad (1)$$
Calculating the mutational source variant weight
After calculating the tumor-specific relative weights of mutational sources (Fig. 1A) as described in Eq. 1 (Fig. 1B), we used the trinucleotide-context-specific relative mutation rates defined for mutational signatures to calculate the probabilities that each variant in each tumor derives from the mutational processes underlying each biological mutational signature. Let ψ be a matrix of trinucleotide-context-specific relative mutation rates for the signatures in B, with ψ_{i,j} being the rate for signature i of trinucleotide-context-specific mutation j. Retrospectively, the probability that a single-nucleotide variant constituting a trinucleotide-context-specific mutation j derived from the process underlying mutational signature i in tumor n is
$$P(i \mid j, n) = \frac{w_{i,n}\,\psi_{i,j}}{\sum_{k \in B} w_{k,n}\,\psi_{k,j}},$$
where each w_{k,n} is the relative weight of signature k in tumor n (Eq. 1).
For a given variant, we can scale these probabilities by the impact of the variant on the survival and proliferation of cancer cells—that is, the variant’s cancer effect size—to quantify the relative contributions of each mutational source to the cancer.
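The two steps described in this section, renormalizing fitted weights over the biological signatures and converting them into per-variant source probabilities, can be sketched in Python as follows. This is an illustrative sketch rather than the published R implementation; the function names, the signature labels, the trinucleotide label, and all numeric values are hypothetical.

```python
def relative_weights(fitted, biological):
    """Renormalize fitted signature weights over the biological subset B."""
    total = sum(fitted[s] for s in biological)
    if total == 0:
        raise ValueError("no weight assigned to biological signatures")
    return {s: fitted[s] / total for s in biological}


def source_probabilities(weights, psi, context):
    """Probability that a mutation in `context` arose from each signature.

    weights: {signature: relative weight in this tumor}
    psi:     {signature: {context: relative mutation rate}}
    """
    joint = {s: weights[s] * psi[s][context] for s in weights}
    total = sum(joint.values())
    if total == 0:
        raise ValueError("no signature can produce this context")
    return {s: v / total for s, v in joint.items()}


# Hypothetical fitted weights, including one artifactual signature.
fitted = {"SBS4": 0.3, "SBS5": 0.5, "artifact": 0.2}
weights = relative_weights(fitted, ["SBS4", "SBS5"])

# Hypothetical relative rates for a single trinucleotide context.
psi = {"SBS4": {"C[C>A]G": 0.04}, "SBS5": {"C[C>A]G": 0.01}}
probs = source_probabilities(weights, psi, "C[C>A]G")
```

With these made-up numbers the less abundant signature (relative weight 0.375) is still the more probable source of the variant (about 0.71), because its rate in that trinucleotide context is four times higher.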
Calculating the Effect Sizes of Variant Substitutions in Tumors
To attribute a cancer effect to each substitution, we used the R package cancereffectsizeR, version 2.1.2 (Cannataro et al., 2018). As described in our previous work, the package’s underlying model assumes that substitutions fix in accord with a Poisson distribution at the rate that mutations arise (mutation rate μ) multiplied by their cancer effect size γ. The term γ is
$$\gamma = \frac{\lambda}{\mu} = N\,u(s),$$
where λ is the rate of substitutions, N is the effective population size of cancer cells, and u(s) is the probability of fixation of a new mutation as a function of the selection coefficient s, from population genetic theory. Thus, for every variant, we calculated the effect size that maximized the likelihood function
$$L(\gamma) = \prod_{j}\left(1 - e^{-\gamma \mu_j}\right)\,\prod_{k} e^{-\gamma \mu_k}$$
for the N tumors, N_j of which (indexed by j) exhibited at least one substitution event of that variant, and N_k of which (indexed by k) exhibited no such variant. Each tumor-specific mutation rate μ_1, …, μ_N was calculated by extracting the mutation rate in each trinucleotide context of each variant from the tumor-specific mutational signature weights (Eq. 1) and convolving it with the gene-specific mutation rate as in Cannataro et al. (2018).
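To make the estimation step concrete, the following Python sketch maximizes a likelihood of the kind implied by the Poisson model above: a tumor carrying the variant contributes the probability of at least one substitution, 1 − exp(−γμ), and a tumor lacking it contributes exp(−γμ). This is an assumption-laden illustration, not the cancereffectsizeR implementation; the function names, the bisection solver, and the mutation rates are hypothetical.

```python
import math


def score(gamma, mu_with, mu_without):
    """Derivative of the log-likelihood with respect to gamma."""
    total = -sum(mu_without)
    for mu in mu_with:
        x = gamma * mu
        # d/dgamma log(1 - exp(-x)) = mu / (exp(x) - 1); negligible for large x
        if x < 700.0:
            total += mu / math.expm1(x)
    return total


def fit_effect_size(mu_with, mu_without, iterations=200):
    """Maximum-likelihood gamma, found by bisection on the score function.

    mu_with / mu_without: positive mutation rates of tumors with / without
    the variant. The score decreases monotonically in gamma.
    """
    if not mu_with:
        return 0.0  # variant never observed: likelihood is maximal at zero
    if not mu_without:
        raise ValueError("variant in every tumor: no finite maximum")
    lo = hi = 1.0 / max(mu_with)
    while score(lo, mu_with, mu_without) <= 0:
        lo /= 2.0
    while score(hi, mu_with, mu_without) >= 0:
        hi *= 2.0
    for _ in range(iterations):
        mid = math.sqrt(lo * hi)  # bisect on a log scale
        if score(mid, mu_with, mu_without) > 0:
            lo = mid
        else:
            hi = mid
    return math.sqrt(lo * hi)


# Hypothetical cohort: 10 of 100 tumors carry the variant, all with rate 1e-6.
gamma_hat = fit_effect_size([1e-6] * 10, [1e-6] * 90)
```

When every tumor has the same mutation rate the maximum has the closed form −ln(1 − N_j/N)/μ, which gives a convenient check on the numerical solution; with tumor-specific rates no closed form exists and a numerical search of this kind is needed.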
Calculating mutational source effect weight
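The relative contribution to cancer effect of variant i from mutational process b in tumor n ≤ N, denoted α_{i,b,n}, scales the effect size γ_i by the probability that the variant was caused by the process underlying signature b within each tumor, relative to all contributions of cancer effect in that tumor:
$$\alpha_{i,b,n} = \frac{\gamma_i\,P(b \mid i, n)}{\sum_{i'}\sum_{b' \in B} \gamma_{i'}\,P(b' \mid i', n)},$$
where P(b | i, n) is the probability that variant i in tumor n derived from process b, calculated from the trinucleotide context of the variant as described above, and the sums run over the variants i′ observed in tumor n and over the biological signatures b′ ∈ B.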
To quantify the contribution of each mutational process to the total relative cancer effect of a variant in tumors of each cancer type, we average αi,b,n across all N tumors for a fixed value of indices b and i. To quantify the proportion of population-level burden of cancer effect size contributed by each mutational process to each tumor type, we sum αi,b,n across all i variants across all N tumors for a fixed value of index b.
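The weighting and aggregation described in this section can be sketched in Python for a single tumor. The sketch assumes that each variant's effect size is multiplied by its per-source probability and divided by the tumor's total probability-weighted effect. The variant names are taken from the worked example in the Results, but the effect sizes and probabilities are hypothetical, as are the function names.

```python
def tumor_effect_weights(effect_sizes, source_probs):
    """Per-variant, per-source share of cancer effect within one tumor.

    effect_sizes: {variant: gamma}
    source_probs: {variant: {source: probability the source caused it}}
    Returns {(variant, source): alpha}, which sums to 1 over the tumor.
    """
    total = sum(effect_sizes[v] * p
                for v in effect_sizes
                for p in source_probs[v].values())
    return {(v, b): effect_sizes[v] * p / total
            for v in effect_sizes
            for b, p in source_probs[v].items()}


def source_totals(alpha):
    """Sum alpha over variants to get each source's share of cancer effect."""
    totals = {}
    for (_, source), value in alpha.items():
        totals[source] = totals.get(source, 0.0) + value
    return totals


# Hypothetical values: one high-effect driver and one near-neutral variant.
gammas = {"TP53 R282W": 1000.0, "OR13G1 L79L": 1.0}
probs = {
    "TP53 R282W": {"SBS5": 0.8, "SBS4": 0.2},
    "OR13G1 L79L": {"SBS5": 0.3, "SBS4": 0.7},
}
alpha = tumor_effect_weights(gammas, probs)
by_source = source_totals(alpha)
```

In this made-up tumor the source that most probably caused the near-neutral variant still receives only about 20% of the cancer effect, because the total is dominated by the high-effect driver. Averaging or summing such per-tumor values across tumors gives the cohort-level quantities described above.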
Results
Proportionate Contributions of Mutational Processes to Cancer Effect Can Be Calculated
To determine the sources of mutagenesis occurring in tumor samples, we deconvolved the mutational burden of each tumor into the most likely distribution of attributed single-nucleotide variant mutational signatures (Alexandrov and Zhivagui, 2018; Petljak and Alexandrov, 2016). Applied to a lung squamous cell carcinoma (LUSC) tumor sample from a single patient (MDA-1229-T), this deconvolution yielded three trinucleotide mutational signatures (age-associated #5, tobacco smoking #4, and unknown aetiology #8; Fig. 1A), each contributing to the flux of single nucleotide variants in the tumor at a calculated weight (Fig. 1B). The trinucleotide signature weight or combination of trinucleotide signature weights contributing to a specific variant (Fig. 1A) times the proportion of mutational causation attributable to each corresponding cancer effect (Fig. 1B) provides the probability each source contributed to each variant in this tumor (Fig. 1C). In this instance, age-associated signature #5 is the most likely contributor to TP53 R282W and NFE2L2 R34P, whereas tobacco smoking is the most likely source of mutations causing OR13G1 L79L. However, only a few of the mutations that occur in somatic tissue are thought to be selected for their effects on growth or survival, and therefore causative of cancer, and the level of causation is presumably quantitative—i.e., the mutations in a type of cancer that drive cancer are responsible to different degrees for the manifestation of a cancer phenotype (Cannataro et al., 2018). In this case, TP53 R282W has a higher cancer effect size than NFE2L2 R34P, and the odorant receptor mutation OR13G1 L79L has negligible to no effect. The product of the probability that each mutational source contributed to each variant in this tumor and the effect of the specific variant (Fig. 1D) quantifies the probability-weighted cancer effect for each variant by each source (Fig. 1E). 
Summing the probability-weighted cancer effect for each source across variants yields the proportion of cancer effect attributable to each source of mutations (Fig. 1F). Age-associated mutational signature #5 contributed the highest weight in MDA-1229-T, and led to the largest estimated effect through its high probability of being causative of both the NFE2L2 R34P and TP53 R282W mutations. Via deconvolution of the mutational signatures responsible for recurrent variants in cancer and calculation of the cancer effect sizes of the nucleotide substitutions driving cancer evolution, we have calculated which mutagenic sources fueling nucleotide variation can be attributed as proportionally causative of individual tumors in patients.
Mutagenic Input and Cancer Effect from each Source Can Differ Substantially within Tumors
The match between the proportional input to total mutations by each mutagenic source (Fig. 1B) and the proportional cancer effect arising from each mutagenic source (Fig. 1F) varies in each patient’s tumor (Fig. 2). Quantifying the match by the Jensen-Shannon Divergence (JSD) between proportional mutational input and proportion of cancer causation, we found the tumor type with the lowest median divergence to be ovarian serous cystadenocarcinoma (OV, Fig. 2A). The mutational input to the OV tumor in Fig. 2A with the lowest JSD (TCGA-24-1103) was entirely attributed to the BRCA1- and BRCA2-associated signature (#3); thus, all cancer effects were attributable to this single source of mutation. Indeed, tumor sample TCGA-24-1103 has a somatic BRCA2 L1638E mutation. In a tumor at the second quartile of the distribution of JSD (TCGA-09-0366; Fig. 2A), there is a slight mismatch between the mutational input and the contribution to cancer causation, with the clock-like signature #5 exhibiting slightly more cancer effect than signature weight. Three additional tumors drawn at the median JSD, the fourth quintile, and the highest JSD demonstrate increasing degrees of mismatch between the mutational input and the contribution to cancer causation. These mismatches are even more frequent in the 22 other tumor types analyzed (rectal adenocarcinoma, at approximately the second quartile of median JSD across tumor types, Fig. 2B; human-papillomavirus-negative head and neck squamous-cell carcinoma, at approximately the third quartile of median JSD across tumor types, Fig. 2C; and low-grade glioma, exhibiting the greatest median mismatch across tumor types, Fig. 2D; Supplemental Fig. 2).
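For readers unfamiliar with the Jensen-Shannon Divergence, a minimal Python sketch of the comparison is given here. The base of the logarithm is not stated in the text, so base 2 is assumed (which bounds the divergence between 0 and 1). The function name and the example proportions are hypothetical, and both inputs are assumed to be proportions that sum to one.

```python
import math


def jensen_shannon_divergence(p, q, base=2.0):
    """JSD between two discrete distributions given as {source: proportion}.

    Assumes p and q each sum to 1; sources missing from one distribution
    are treated as having proportion 0 there.
    """
    keys = set(p) | set(q)
    mixture = {k: 0.5 * (p.get(k, 0.0) + q.get(k, 0.0)) for k in keys}

    def kl_to_mixture(dist):
        # Kullback-Leibler divergence from dist to the mixture;
        # zero-probability terms contribute nothing.
        return sum(v * math.log(v / mixture[k], base)
                   for k, v in dist.items() if v > 0)

    return 0.5 * kl_to_mixture(p) + 0.5 * kl_to_mixture(q)


# Hypothetical tumor: mutation weight versus cancer effect by source.
mutation_weight = {"SBS3": 0.6, "SBS5": 0.4}
cancer_effect = {"SBS3": 0.5, "SBS5": 0.5}
jsd = jensen_shannon_divergence(mutation_weight, cancer_effect)
```

A tumor whose mutations and cancer effect both come entirely from one source yields a divergence of 0, as in the single-signature ovarian tumor described above; fully disjoint attributions would yield the maximum of 1 under the base-2 assumption.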
Mutagenic Input and Cancer Effect from each Source Can Differ Enormously among Oncogenic Variants within each Cancer Type
Many well-known processes have been established as major contributors to tumor mutation burden, such as tobacco in lung tissues, ultraviolet radiation in skin tissues, and APOBEC cytidine deaminases in bladder, cervical, and HNSC tissues. However, mutational processes are trinucleotide-specific, which leads to differences in underlying amino-acid mutation rates depending on the sequence context of each variant site. The mutational process most likely to originate an oncogenic variant can not only differ from variant to variant, but can also differ from the mutational process that causes the greatest number of mutations within each tumor type (Fig. 3). For instance, among actionable processes, mutations in lung adenocarcinoma (LUAD) and lung squamous-cell carcinoma (LUSC) were most frequently attributed to tobacco-associated mutagenesis (Fig. 3A–B). The high attribution of KRAS G12C mutations to this lung-specific mutagenic process explains their high frequency in LUAD compared to other RAS-driven cancer types such as pancreas or colon adenocarcinomas. Major driver variants of KRAS and TP53, in LUAD and LUSC respectively, exhibit markedly different origination rates from tobacco-associated processes. Perhaps most notable is the minimal attribution of EGFR L858R to tobacco-associated mutagenic processes. The attributions of tobacco-associated mutagenic processes to the cancer effects of KRAS G12 variants and EGFR L858R (Fig. 3A–B) are consistent with, and provide an explanation for, the increased odds of KRAS mutation in tumor tissue of ever smokers compared to never smokers, as well as the increased odds of EGFR mutation in never smokers compared to ever smokers (Chapman et al., 2016). Even nucleotide variants that do not cause an amino-acid substitution have quantifiable cancer effects that can be attributed to mutagenic processes: for example, TP53 T125T, which affects splicing of the TP53 transcript (Varley et al., 2001), is attributable to tobacco in both LUSC and LUAD (Fig. 3A–B).
Ultraviolet light (UV) is the major mutagenic process leading to both total mutations and most major oncogenic variants in primary skin cutaneous melanoma (SKCM, Fig. 3C). SKCM oncogenic variants are dominated by the high effect size of UV-driven BRAF V600E (cf. Cannataro et al., 2018), but one major oncogenic variant common to SKCM (KIT K642E) is almost entirely attributable to age-associated processes rather than UV (Fig. 3C). Many of the high-effect mutations of CTNNB1 in LIHC are attributable to the mutational processes generating COSMIC Signature 16 (Letouzé et al., 2017; Fig. 3D). The greatest proportion of cancer effect for several oncogenic somatic variants in LIHC, such as TP53 R249S and CTNNB1 D32V, is attributable to mutagenic chemical exposure; and the greatest proportion of cancer effect in several other CTNNB1 variants is attributable to processes with as-yet unknown etiology that may in the future be linked to other mutagenic chemical exposures. Mutations in bladder urothelial carcinoma (BLCA) were most frequently attributed to APOBEC cytidine deaminases that are thought to be activated by exposure to viruses, which may be presumed to be preventable. However, 7 of the top 10 variants as determined by cancer effect were attributed to non-actionable, age-associated processes rather than to APOBEC-associated mutagenic processes. In contrast, three known cancer driver variants (FGFR3 S249C, PIK3CA E545K, and PIK3CA E542K) were almost entirely attributed to the action of APOBEC cytidine deaminases. Cervical squamous-cell carcinoma and endocervical adenocarcinoma (CESC), human-papillomavirus-negative head-and-neck squamous-cell carcinoma (HNSC HPV negative), and human-papillomavirus-positive head-and-neck squamous-cell carcinoma (HNSC HPV positive) were also dominated by APOBEC-associated mutations.
CESC and HNSC also exhibited diversity in which process was most likely to originate each oncogenic variant (Cannataro et al., 2019); however, attributions of APOBEC-associated processes for the origination of oncogenic PIK3CA E542K and PIK3CA E545K mutations are consistent across multiple cancer types (cf. Fig. 3B, E–H).
Relative Mutagenic Input and Relative Cancer Effect are Specific to each Tumor Type
The mismatches between the proportional input to total mutations by each mutagenic source (Fig. 1B) and the proportional cancer effect arising from each mutagenic source (Fig. 1F) exist not only at the level of individual tumors, but also at the level of tumor types—where they indicate which mutational sources make an outsized contribution to the causation of cancer compared to their production of mutations, and vice versa. Many tumor-type mutational-signature pairs exhibit statistically significant differences between the proportional input to total mutations by each mutagenic source and the proportional cancer effect arising from each mutagenic source (continuity-corrected Wilcoxon two-sided rank-sum tests, P < 0.05; Fig. 4A). For example, APOBEC-related signatures 2 and 13 exhibit larger mutation weight than cancer effect across many cancer types, as do 17a, 21, 22, 26, 28, and 37. In contrast, the aging-associated signature 1 exhibits larger cancer effect than mutation weight across many cancer types, as does polymerase-epsilon signature 10b. In lower-grade glioma, the age-associated signature 5 constitutes much more of the mutation weight than its cancer effect (68% compared to 23%), whereas age-associated signature 1 has the opposite relationship (23% compared to 78%). This difference between the two age-associated signatures is largely attributable to the high effect size of IDH1 variants, which occur predominantly as a consequence of ACG→ATG mutations that are frequent in signature 1 and rare in signature 5. A similar contrast can be seen in thyroid adenocarcinoma, wherein the APOBEC-related signature 2 exhibits high mutation weight and virtually zero cancer effect (30% compared to 0.2%), and wherein the aging signature 5 exhibits much more cancer effect than mutation weight (68% compared to 30%). 
This contrast comes about because thyroid adenocarcinoma is often driven by BRAF V600E mutations that convey enormous cancer effects, and BRAF V600E mutations come about frequently as a consequence of GTG→GAG mutations that are found at low frequency within the aging signature 5, but are found at extremely low frequency within APOBEC signature 2.
Preventable Mutational Processes Contribute Substantially to Causation of Skin, Head-and-Neck, Cervical, and Lung Cancer
Among the non-age-related etiologies are a number of mutational processes that are putatively “preventable”—in that they can be mitigated by individual behaviors or interventions (Fig. 4B–C). Skin cancer, lung cancer, HPV-positive head and neck cancer, and cervical cancer are notable for the dominant role of putatively preventable processes underlying both raw SNV mutation weight (Fig. 4A) and cancer effect (Fig. 4B). Lower-grade glioma, glioblastoma, and prostate adenocarcinoma are notable for the lack of putatively preventable processes underlying both raw mutation weight and cancer effect. However, the mutation weights and cancer effects are not the same. For example, 57% of the mutation weight of thyroid adenocarcinoma is associated with APOBEC processes that might be preventable by avoiding viral infections. However, these mutations contribute only 2.9% of the cancer effect (P < 0.001, Wilcoxon rank-sum test), so that preventing APOBEC-associated mutation would likely do little to prevent the majority of THCA cancer. A contrasting case is the lung cancers: the net cancer effects of the SNVs attributable to tobacco chewing (3% in LUAD and 4% in LUSC) and tobacco smoking (44% in LUAD and 24% in LUSC) are larger than the mutation weights of these sources of mutagenesis (2% and 2% for tobacco chewing and 35% and 19% for smoking in LUAD and LUSC, respectively; P < 0.001 for signature 4 for both lung cancer types and for signature 29 in LUSC; P = 0.008 for signature 29 in LUAD; Wilcoxon rank-sum test).
Age-associated Mutational Processes Contribute Substantially to Causation of Glioma, Prostate, Thyroid, Pancreatic, and Colorectal cancer
Because each mutagenic process is linked to a trinucleotide variant signature that has been identified as clock-like (ubiquitous and age-associated) or non-clock-like (not associated with age; Alexandrov et al., 2015, 2020), the proportion of total mutations attributable to clock-like processes and non-clock-like processes in each cancer type can be quantified (Fig. 4B). Among tissues, the cancer types with the greatest proportion of total mutations contributed by non-clock-like processes are melanoma (primary and metastatic), cervical squamous cell carcinoma and endocervical adenocarcinoma, and head and neck cancers. Lower-grade glioma exhibits the greatest proportion of total mutations contributed by age-associated, clock-like processes. Moreover, with regard to the explanation of tumorigenesis and cancer incidence, the cancer effect attributable to age-associated processes and non-age-associated processes in each cancer type can be quantified (Fig. 4C). Among tumor tissues, those with the greatest proportion of cancer effect contributed by age-associated processes are gliomas (LGG, GBM) and prostate adenocarcinomas, consistent with the strong association of the incidence of these cancers with age (Dubrow and Darefsky, 2011; Rawla, 2019), as well as pancreatic cancers. Primary and metastatic melanoma, lung adenocarcinoma, and HPV-positive head and neck squamous-cell carcinoma exhibit the greatest proportion of cancer effect contributed by non-age-associated processes, consistent with the strong association of these cancers with exogenous factors (UV exposure, smoking, and HPV infection).
Our analysis attributes an amount of cancer causation to such endogenous and inactionable processes that varies widely among cancer types. Cancer types varied in the degree to which their causation was associated with COSMIC signature #1, which correlates with stem cell division in different tissues and represents the processes associated with the mitotic clock (Tomasetti et al., 2017; Fig. 3C; c.f. Tomasetti and Vogelstein, 2015), ranging from extremely small contributions (<0.05%) in THCA, LUSC, SKCM, KIRC, LUAD, LIHC, and BLCA, to 76% of the cancer effect in LGG. Combining the replication-associated and non-replication-associated aging signatures, causation attributable to all aging-associated processes ranged from 13% in SKCM to 99% in LGG; in around half of the cancer types a minority of cancer causation was attributable to age-associated processes. Age has not been associated across cancer types with any of the signatures that have unknown etiology. Greater than 50% of the cancer effect leading to KIRC is attributed to unknown mutational processes; OV, PRAD, and BRCA (ER-) all have >30% of their cancer effect caused by currently unknown or unattributed mutational processes (Fig. 4C).
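The grouping of per-signature cancer effect into age-associated, putatively preventable, and unknown categories can be sketched in Python as a simple aggregation. The mapping below covers only signatures whose attributions are named in the text (1 and 5 as age-associated; 2 and 13 as APOBEC; 4 and 29 as tobacco; 7 as UV). The function name, the default category, and the example proportions are hypothetical and are not results from this study.

```python
# Signature-to-category mapping limited to attributions named in the text.
CATEGORY = {
    "1": "age-associated",
    "5": "age-associated",
    "2": "putatively preventable",   # APOBEC
    "13": "putatively preventable",  # APOBEC
    "4": "putatively preventable",   # tobacco smoking
    "29": "putatively preventable",  # tobacco chewing
    "7": "putatively preventable",   # ultraviolet light
}


def effect_by_category(effect_by_signature, default="unknown or other"):
    """Sum per-signature proportions of cancer effect within each category."""
    totals = {}
    for signature, proportion in effect_by_signature.items():
        category = CATEGORY.get(signature, default)
        totals[category] = totals.get(category, 0.0) + proportion
    return totals


# Hypothetical per-signature proportions of cancer effect for one tumor type.
example = {"4": 0.40, "29": 0.05, "5": 0.35, "1": 0.05, "8": 0.15}
grouped = effect_by_category(example)
```

Any signature absent from the mapping falls into the default category, which mirrors the way effect from signatures of unknown etiology is reported separately from age-associated and preventable effect.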
Discussion
Here we have shown that the impact on carcinogenesis of mutagenic processes associated with single-nucleotide variant signatures can be quantified. This quantification is distinct from the number or proportion of mutations that can be attributed to a process, because it accounts for the extent to which each mutation contributes to the cancer phenotype—increased replicative and survival advantage in each tissue and cancer type—via single-nucleotide variants. We have shown how to use the proportions of observed mutations in a tumor caused by each signature to calculate the probability that each mutational source contributed to each variant in this tumor. Each of these probabilities serves to weight the cancer effect size of each variant, yielding the probability-weighted portion of effect size for each variant attributable to each source of mutations and thus the proportion of cancer-causation attributable to each source of mutations. In turn, the quantification of cancer-causation within each tumor, characterized across a population of patients, provides a reductionist molecular approach toward quantifying the degree to which a process can be held responsible for carcinogenesis in a cancer type that is wholly distinct from traditional epidemiological studies.
Our analysis of the cancer effects of single-nucleotide mutations and associated signatures has been enabled by quantitative estimates of their intrinsic mutation rates (Fousteri and Mullenders, 2008; Lawrence et al., 2013; Stamatoyannopoulos et al., 2009). Deconvolution of the quantitative contributions of known mutation signatures explains the high prevalence of KRAS G12C and low prevalence of EGFR L858R in ever-smokers, and the converse relationships in never-smokers. It illuminates the potent role of ultraviolet light in BRAF V600E-driven melanoma. It attributes major drivers PIK3CA E542K and E545K to the potentially virally-induced action of APOBEC cytidine deaminases, and highlights unknown processes that deserve further identification such as those underlying high-cancer-effect single-nucleotide variants of LIHC. Importantly, germline variants, copy-number variation, epigenetic alterations, and changes to the aging tissue microenvironment also contribute to the cancer phenotype (Laconi et al., 2020; Liggett and DeGregori, 2017; Montgomery et al., 2018; Mroz et al., 2015; Ramakodi et al., 2016; Sun et al., 2018). Incorporation of signatures associated with these kinds of alterations (Macintyre et al., 2018) and of attributions of each signature to relevant sources would markedly increase the purview of inferred cancer causation, revealing a full picture of the importance of diverse mechanisms behind the spectrum of genomic alterations fueling cancer evolution.
For an individual cancer patient, calculation of the relative cancer effect of diverse sources of mutation provides an estimate of how much each mutagenic process is responsible for that individual’s cancer. From a public health perspective, these calculations constitute a bridge between molecular studies and long-standing epidemiological analyses that have associated behaviors (e.g., smoking) or occupational exposures (e.g., sun exposure) with cancer incidence. Public health interventions targeted at minimizing exposure to the mutagens underlying these actionable signatures would mitigate disease severity by preventing the accumulation of mutations that directly contribute to the cancer phenotype. Finally, our findings connect specific mutagenesis patterns and processes with cancer, providing a “smoking gun” that can inform individuals as to why an instance of cancer happened, and have promise to play a significant role in demonstrating individual as well as group-level cause for legal recourse due to carcinogenic exposure (e.g., Lee, 2016).
The quantification of cancer effect attributable to specific sources of mutation has evident parallels to epidemiological results that assess the effect of risk factors on cancer causation (Shield et al., 2016). These epidemiological results often rely on correlation, and calculate an increase in the probability of cancer in relation to some behavior or exposure. Calculations of the relative cancer effect of diverse sources of mutation, in principle, directly relate the mutations driving tumorigenesis to mechanistic processes. However, multiple challenges impede their use at a population level in comparison to longstanding, well-crafted epidemiological studies: 1) conducting appropriate tumor sampling (most large tumor sequencing studies are sampled haphazardly, without reference to a distinct population, and without stratification or even “random” sampling); 2) formulating an “apples to apples” quantitative mapping comparing proportions of effect to odds ratios; and 3) forming a discrete mapping of mutational signatures to mechanistic processes and, in turn, to epidemiological factors. These attributions of cause associated with COSMIC signatures are critical to our interpretations of these results, and range in surety from well-established (e.g., UV #7 and smoking #4) to presumptive (e.g., indirect damage from UV light #38).
Recent research has touched on a debate as to what extent “bad luck” (endogenous mutagenic processes that accumulate naturally with age) plays a role in the incidence of cancer arising in various tissues. Here, we addressed the question regarding the relative contributions of exogenous and endogenous sources of mutation to tumorigenesis by quantifying the extent to which specific variants are driving tumorigenesis, and attributing the variants back to the mutational processes that originally fueled their creation. We found that signatures relating to aging processes (#1 and #5) were responsible for the majority of cancer effect in tumors of the brain (LGG, GBM) and tissues with large amounts of epithelial turnover (READ, COAD, STAD, UCEC). Other tumors whose cancer effects could largely be attributed to aging include PRAD, a tumor type strongly associated with age (Bostwick et al., 2004); THCA, whose major single-nucleotide driver, BRAF V600E, is more likely to be caused by mutations associated with clock-like signature #5 than by mutations associated with other signatures; and PAAD. Several tumor types have large proportions of the cancer effect size directly attributable to mutational processes that are actionable, i.e., interventions could reduce the mutations in these tissues that are responsible for the cancer-causing variants. CESC, HNSC, and BLCA are largely driven by mutations attributed to virus-induced APOBEC activity, SKCM is largely driven by UV light exposure, and mutations responsible for increased proliferation and survival of cancerous cells within lung cancers trace back to smoking.
The importance of understanding the underlying sources of mutations that ultimately lead to cancer in each and every patient—whether they are endogenous or exogenous, and whether they come from sources that are actionable—is underscored by the remarkable successes of anti-smoking interventions against carcinogenic exposures, which have saved many lives (Holford et al., 2014). In our study, some cancer types such as KIRC, BRCA (ER-), OV, and PRAD exhibited a large proportion of cancer effect that was attributable to signatures with unknown etiology. As we gain greater insight into these mutational signatures and their diverse causative mechanisms, we may discover additional actionable mutational processes that can be mitigated by proactive public health interventions.
Author Contributions
VLC and JPT designed the research. VLC assembled all data and executed all analyses. JDM provided computational tools and expertise. VLC and JPT wrote the manuscript. All authors reviewed the manuscript.
Declaration of Interests
JPT has consulted for Black Diamond Therapeutics, Agios Pharmaceuticals, and Servier Pharmaceuticals. Other authors declare no potential conflicts of interest with the publication of this research.
Acknowledgements
Members of the Townsend Lab provided stimulating discussions and helpful feedback on this research. Funding for this research was provided by NIH 1P50DE030707, NIH 1R01LM012487, NIH 1R01CA215900, NIH 5R01 CA231112, the Yale Cancer Biology Training program (NIH T32 CA193200), and the Elihu Professorship endowed research funds.