Integration of polygenic risk scores with modifiable risk factors improves risk prediction: results from a pan-cancer analysis

Linda Kachuri; Rebecca E. Graff; Karl Smith-Byrne; Travis J. Meyers; Sara R. Rashkin; Elad Ziv; John S. Witte; Mattias Johansson

doi:10.1101/2020.01.28.922088

ABSTRACT

Cancer risk is determined by a complex interplay of genetic and modifiable risk factors. Combining individual germline risk variants into polygenic risk scores (PRS) creates a personalized genetic susceptibility profile that can be leveraged for disease prediction. Using data from the UK Biobank cohort (413,753 individuals; 22,755 incident cases), we systematically quantify the added predictive value of augmenting conventional cancer risk factors with PRS for 16 cancer types. Our results indicate that incorporating PRS in addition to family history of cancer and modifiable risk factors improves prediction accuracy, but the magnitude of incremental improvement varies substantially between cancers. We also demonstrate the utility of PRS for risk stratification. Individuals with high genetic risk (PRS≥80^th percentile) have significantly divergent 5-year absolute risk trajectories across strata based on family history and modifiable risk factors. Finally, we estimate that high genetic risk accounts for 4.0% to 30.3% of new cancer cases, which exceeds the impact of many lifestyle-related risk factors. In summary, we provide novel quantitative data illustrating the importance of integrating PRS into personalized cancer risk assessment.

INTRODUCTION

Cancer susceptibility is inherently complex, but it is well-accepted that heritable genetic factors and modifiable exposures contribute to cancer development. While our knowledge of causal modifiable risk factors has gradually evolved over the past decades, genome-wide association studies (GWAS) have rapidly produced a wealth of germline genetic risk variants for different cancers. These studies have shed light on genetic mechanisms of cancer susceptibility; however, the public health impact of GWAS findings has been modest. In response, GWAS results have been leveraged to create polygenic risk scores (PRS) by combining weighted genotypes for risk alleles into a single, integrated measure of an individual’s genetic predisposition to a specific phenotype. Such genetic risk scores are not designed to reflect the complexity of molecular susceptibility mechanisms, but they are highly amenable to phenotypic prediction.

Multiple studies have demonstrated that PRS can generate informative predictions for heritable traits^{1, 2} and diseases^{3, 4}, prompting many to advocate for increased integration of genetic risk scores into clinical practice^{5, 6}. An important step towards realizing the promise of PRS in precision medicine lies in systematically assessing the added value of genetic information in comparison to conventional risk factors and examining how it affects lifetime risk trajectories⁶. The recent development of large, prospective cohorts with both genome-wide genotyping and deep phenotyping data, such as the UK Biobank⁷, provides an opportunity for integrative analyses of genetic variation and modifiable risk factors. In addition to enabling evaluation of PRS predictive performance, these data make it possible to answer etiological questions about the relative contribution of genetic and modifiable risk factors to cancer susceptibility.

Our overarching aims were to quantify the relative contribution of common, low-penetrance risk variants to cancer risk prediction and to overall disease susceptibility. To address these aims, we assembled PRS for 16 cancer types, based on results from previously published GWAS, and applied them to 413,870 individuals in the UK Biobank (UKB) cohort. First, we assessed the degree to which PRS can improve risk prediction and stratification based on established cancer risk factors, such as family history and modifiable health-related characteristics. Next, we estimated the proportion of cancer cases at the population level that can be attributed to high genetic susceptibility, as captured by the PRS, and compared this to modifiable determinants of cancer.

RESULTS

Characteristics of the UKB study population are presented in Supplementary Table 1. Over the course of follow-up, a total of 22,755 incident cancers were diagnosed among 413,753 individuals, after excluding participants outside the enrollment age criteria and those who withdrew consent after enrollment. Established cancer risk factors (listed in Supplementary Table 2) exhibited associations of the expected magnitude and direction with each cancer (Supplementary Table 3). Family history of cancer at the corresponding site in first-degree relatives conferred a significantly higher risk of prostate (HR=1.84, 95% CI: 1.68-2.00, p=9.1×10^-46), breast (HR=1.56, 1.44-1.69, p=3.0×10^-29), lung (HR=1.61, 1.43-1.81, p=7.4×10^-15), and colorectal (HR=1.26, 1.14-1.40, p=1.2×10^-5) cancers. Metrics of tobacco use, such as smoking status, intensity, and duration, were positively associated with risks of lung, colorectal, bladder, kidney, pancreatic, and oral cavity/oropharyngeal cancers. Weekly alcohol intake was associated with higher risks of breast (HR per 70 grams = 1.04, p=2.3×10^-5), colorectal (HR=1.04, p=5.9×10^-9), and oral cavity/pharyngeal (HR=1.05, p=3.0×10^-10) cancers. Adiposity was associated with cancer risk at multiple sites, including endometrium (BMI: HR per 1-unit = 1.09, 1.08-1.10, p=1.6×10^-49), colon/rectum (waist-to-hip ratio: HR per 10% increase = 1.17, 1.11-1.24, p=2.2×10^-8), and kidney (BMI: HR=1.04, 1.02-1.05, p=1.7×10^-6). Particulate matter (PM_2.5) was associated with lung cancer risk⁸ (PM_2.5: HR per 1 μg/m³ = 1.10, 1.05-1.15, p=1.9×10^-5) in the model that included smoking status and intensity.
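Because the hazard ratios above are reported per different increments (per 1 unit of BMI, per 10% increase in waist-to-hip ratio, per 70 g of weekly alcohol), it can help to rescale them to a common increment. Under a Cox model the log-hazard is linear in the covariate, so the hazard ratio for a k-unit change is the per-unit hazard ratio raised to the power k. The minimal sketch below assumes that log-linearity holds over the range considered; the function name `rescale_hr` is hypothetical.

```python
import math


def rescale_hr(hr_per_unit: float, k: float) -> float:
    """Hazard ratio for a k-unit increase, given the HR for a 1-unit increase.

    Assumes a log-linear Cox model: HR(k units) = exp(k * log HR(1 unit)).
    """
    return math.exp(k * math.log(hr_per_unit))


# Endometrial cancer, BMI: HR per 1 kg/m^2 = 1.09 (from the text) -> HR per 5 kg/m^2
hr_5_units = rescale_hr(1.09, 5)

# Breast cancer, alcohol: HR per 70 g/week = 1.04 (from the text) -> HR per 10 g/week
hr_per_10g = rescale_hr(1.04, 10 / 70)
```

For example, the per-unit BMI estimate for endometrial cancer corresponds to a hazard ratio of roughly 1.54 for a 5-unit difference in BMI, provided the linear assumption is reasonable.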

Supplementary Table 1:

Characteristics of the UK Biobank study population, restricted to participants of predominantly European ancestry, stratified by incident cancer status.

Supplementary Table 2:

Risk factors in addition to age and sex (if applicable), such as environmental exposures, lifestyle factors, and family history, that were included in the most comprehensive model for each cancer. Risk factors were selected based on literature review and availability in the UK Biobank cohort.

Supplementary Table 3:

Hazard ratios (HR) for each cancer risk factor estimated using a cause-specific Cox regression model accounting for death as a competing risk.

All PRS associations with the target cancer reached at least nominal statistical significance (Figure 1; Supplementary Table 4). We considered three PRS approaches (see Methods for details): standard weights corresponding to reported risk allele effect sizes (PRS_β); an unweighted sum of risk alleles (PRS_unw); and inverse variance weights that incorporate the standard error of the risk effect size (PRS_IV). The latter approach resulted in stronger or equivalent (HR ± 0.01) associations for most cancers, the exception being non-Hodgkin lymphoma (NHL). Compared to the standard PRS_β, substantial differences were observed for prostate (PRS_IV: HR=1.77, P=4.3×10^-366 vs. PRS_β: HR=1.39, P=2.0×10^-105), colon/rectum (PRS_IV: HR=1.48, P=1.8×10^-94 vs. PRS_β: HR=1.32, P=5.5×10^-50), leukemia (PRS_IV: HR=1.70, P=6.3×10^-23 vs. PRS_β: HR=1.45, P=8.0×10^-13), and thyroid (PRS_IV: HR=1.75, P=1.9×10^-15 vs. PRS_β: HR=1.57, P=5.7×10^-10). All subsequent analyses use PRS_IV, since this approach appears to improve PRS performance by appropriately down-weighting the contribution of variants with less precisely estimated effects.
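The three weighting schemes can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the authors' code: genotypes are coded as 0/1/2 dosages of the risk allele (so every per-allele log odds ratio is positive), and the inverse variance weight is assumed here to take the form β_k/SE_k², which gives more weight to precisely estimated variants. Any constant rescaling of the weights, for example dividing by the sum of inverse variances, is absorbed once the score is standardized to mean 0 and SD 1, the scale on which hazard ratios per SD are reported. Function names and toy data are hypothetical.

```python
import statistics
from typing import Dict, List, Sequence


def prs(dosages: Sequence[Sequence[float]], weights: Sequence[float]) -> List[float]:
    """Weighted sum of risk-allele dosages (0, 1, 2) for each individual."""
    return [sum(w * g for w, g in zip(weights, row)) for row in dosages]


def standardize(scores: Sequence[float]) -> List[float]:
    """Rescale to mean 0 and SD 1, so hazard ratios are per 1-SD increase."""
    mu = statistics.fmean(scores)
    sd = statistics.stdev(scores)
    return [(s - mu) / sd for s in scores]


def weight_sets(log_or: Sequence[float], se: Sequence[float]) -> Dict[str, List[float]]:
    """Three weighting schemes; the PRS_IV form below is an assumption."""
    return {
        "PRS_beta": list(log_or),                            # reported per-allele log ORs
        "PRS_unw": [1.0] * len(log_or),                      # plain count of risk alleles
        "PRS_IV": [b / s ** 2 for b, s in zip(log_or, se)],  # assumed: beta / SE^2
    }


# Hypothetical summary statistics for three variants and genotypes for four people
log_or = [0.10, 0.25, 0.05]
se = [0.01, 0.05, 0.02]
dosages = [[0, 1, 2], [1, 1, 0], [2, 0, 1], [1, 2, 1]]

scores = {name: standardize(prs(dosages, w)) for name, w in weight_sets(log_or, se).items()}
```

In this toy example the variant with the largest log odds ratio (0.25) receives the smallest inverse variance weight because its standard error is the largest, which is the down-weighting behavior described above.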

Supplementary Table 4:

Hazard ratios (HR) per one standard deviation (SD) increase in the standardized polygenic risk score (PRS) for each cancer, estimated using cause-specific Cox proportional hazards models, accounting for mortality as a competing risk. Results comparing three types of weighting approaches for combining individual risk variants in the PRS are presented: standard weights based on log odds ratios (PRS_β), unweighted sum of risk alleles (PRS_unw), and inverse variance (IV) weights (PRS_IV).

Figure 1:

Hazard ratios (HR) per one standard deviation (SD) increase in the standardized polygenic risk score (PRS) estimated using cause-specific Cox proportional hazards models, accounting for mortality as a competing risk. A comparison of three weighting approaches for combining individual risk variants in the PRS is presented: standard weights based on per-allele log odds ratios (PRS_β), unweighted sum of risk alleles (PRS_unw), and inverse variance (IV) weights (PRS_IV).

Improvement in risk prediction

The predictive performance of each risk model was evaluated based on its ability to accurately estimate risk (calibration) and to distinguish cancer cases from cancer-free individuals (discrimination). All cancer-specific risk models were well-calibrated (goodness-of-fit p>0.05; Supplementary Figure 1). Model discrimination was assessed by Harrell’s C-index, estimated as a weighted mean between 1 and 5 years of follow-up time. For completeness, we also report the AUC at 5 years of follow-up time⁹. Violations of the proportional hazards assumption (p<0.05) were detected for age in the breast cancer model and for PRS_IV in the cervical cancer model. For breast cancer, this was resolved by incorporating an interaction term with follow-up time. As a sensitivity analysis for cervical cancer, we modelled a time-varying PRS effect (Supplementary Figure 2).
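For readers less familiar with Harrell’s C-index, the sketch below computes it for right-censored data. It is a naive O(n²) illustration under simplifying assumptions: pairs with tied follow-up times are skipped, and the optional horizon `tau` restricts usable pairs to events observed by that time. It does not reproduce the weighted average over 1 to 5 years used here. The function name and data are hypothetical.

```python
from typing import Sequence


def harrell_c(time: Sequence[float], event: Sequence[int], risk: Sequence[float],
              tau: float = float("inf")) -> float:
    """Harrell's C-index for right-censored survival data (naive version).

    A pair (i, j) is usable when i has an observed event at T_i <= tau and
    j was still event-free at T_i (T_j > T_i). The pair is concordant when
    the model assigns i the higher risk; tied risk scores count as 1/2.
    Pairs with tied follow-up times are skipped for simplicity.
    """
    concordant = 0.0
    usable = 0
    for i in range(len(time)):
        if not event[i] or time[i] > tau:
            continue
        for j in range(len(time)):
            if time[j] > time[i]:
                usable += 1
                if risk[i] > risk[j]:
                    concordant += 1.0
                elif risk[i] == risk[j]:
                    concordant += 0.5
    return concordant / usable if usable else float("nan")


# Hypothetical follow-up times (years), event indicators, and predicted risks
time = [1, 2, 3, 4, 5]
event = [1, 1, 0, 1, 0]
risk_base = [0.30, 0.20, 0.40, 0.10, 0.05]      # risk factors only
risk_with_prs = [0.50, 0.30, 0.20, 0.10, 0.05]  # risk factors plus PRS
delta_c = harrell_c(time, event, risk_with_prs) - harrell_c(time, event, risk_base)
```

A ΔC such as those reported in this section is the difference between the C-index of the model with the PRS and that of the nested model without it, evaluated on the same individuals.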

Supplementary Figure 1:

Calibration plots comparing predicted and observed event probabilities for each of the 16 cancers examined. The most comprehensive risk factor model available is plotted in black and the same model with the addition of the polygenic risk score (PRS) is overlaid in red.

Supplementary Figure 2:

Plots of Schoenfeld residuals and corresponding p-values for the cervical cancer polygenic risk score (PRS_IV). Below is a comparison of PRS_IV effects estimated using a time-varying model (blue) and the hazard ratio estimated under the proportionality assumption (red).

The C-index reached 0.60 with age and/or sex alone for all cancers except breast and thyroid (Supplementary Table 5). For cancers with available information on family history of cancer at the same site (prostate, breast, colon/rectum, and lung), incorporating family history had a modest impact on the C-index (ΔC<0.01). By contrast, replacing family history with the PRS improved discrimination for prostate (C=0.763, ΔC=0.047), breast (C=0.618, ΔC=0.060), and colorectal (C=0.708, ΔC=0.029), but not lung (C=0.711, ΔC=-0.002), cancers.

Supplementary Table 5:

Assessment of model discrimination for each cancer comparing different combinations of conventional risk factors and polygenic risk scores (PRS).

Next, we assessed the change in the C-index (ΔC) after incorporating the PRS into prediction models with all available risk factors for each cancer (Figure 2; Supplementary Table 5). The resulting improvement in prediction performance was variable. The largest increases in the C-index were observed for cancer sites with few available predictors, such as testis (C_PRS=0.766, ΔC=0.138), thyroid (C_PRS=0.692, ΔC=0.099), prostate (C_PRS=0.768, ΔC=0.051), and lymphocytic leukemia (C_PRS=0.756, ΔC=0.061). However, adding the PRS also improved prediction accuracy for melanoma (C_PRS=0.664, ΔC=0.042), breast (C_PRS=0.631, ΔC=0.060), and colorectal (C_PRS=0.716, ΔC=0.030) cancers, which have multiple environmental risk factors. The highest overall C-index was observed for lung (C_PRS=0.849) and bladder (C_PRS=0.814) cancers, which was primarily attributable to non-genetic predictors (C without PRS: lung = 0.846; bladder = 0.808). Changes in the AUC at 5 years of follow-up were of similar magnitude (Supplementary Table 5).

Figure 2:

Assessment of model discrimination based on Harrell’s C-index, computed as a weighted average between 1 and 5 years of follow-up time. Comparisons are conducted between the most comprehensive risk factor model for each cancer, including all available lifestyle-related risk factors and family history (if applicable), and a nested model that also includes the standardized polygenic risk score (PRS_IV) for that cancer and the top 15 genetic ancestry principal components.

As a complementary metric of model performance, Royston’s R² was calculated to quantify the variation in the time-to-event outcome captured by each risk model¹⁰. Across all 16 sites, the median change in R² (ΔR²) was 0.066. Large improvements, defined as ΔR²>0.10, were observed for breast (R²_PRS=0.146; ΔR²=0.103), pancreatic (R²_PRS=0.439; ΔR²=0.103), prostate (R²_PRS=0.510; ΔR²=0.161), thyroid (R²_PRS=0.310; ΔR²=0.230), and testicular (R²_PRS=0.605; ΔR²=0.421) cancers, as well as leukemia (R²_PRS=0.415; ΔR²=0.160). These results parallel the trend in improvement observed for the C-index and AUC.
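One common formulation of Royston’s R², assumed here, is the Royston–Sauerbrei R²_D, derived from the D statistic of prognostic separation. For a proportional hazards model, D is converted to an explained-variation scale as R²_D = (D²/κ²)/(π²/6 + D²/κ²), with κ = √(8/π) and π²/6 the variance of the extreme-value error term. Whether this is exactly the variant used in this analysis is an assumption. The sketch below, with hypothetical function names and D values, only illustrates the conversion and its equivalent form in terms of the variance of the linear predictor.

```python
import math

KAPPA = math.sqrt(8.0 / math.pi)  # relates D to the SD of a normally distributed prognostic index
SIGMA2_PH = math.pi ** 2 / 6.0    # variance of the extreme-value error term in a PH model


def r2_from_d(d: float) -> float:
    """R^2_D (Royston-Sauerbrei) from Royston's D, assuming proportional hazards."""
    signal = d ** 2 / KAPPA ** 2
    return signal / (SIGMA2_PH + signal)


def r2_from_lp_variance(var_lp: float) -> float:
    """Equivalent form: variance of the linear predictor over the total variance."""
    return var_lp / (SIGMA2_PH + var_lp)


# Hypothetical D statistics for nested models without and with the PRS
d_base, d_with_prs = 0.9, 1.3
delta_r2 = r2_from_d(d_with_prs) - r2_from_d(d_base)
```

Because R²_D increases monotonically with D, any gain in prognostic separation from adding the PRS translates into a positive ΔR².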

For 15 out of 16 cancers, incorporating the PRS resulted in a significant improvement in reclassification, as indicated by positive percentile-based net reclassification index (NRI)¹¹ values with 95% bootstrapped confidence intervals excluding 0 (Supplementary Table 6). The overall NRI was driven primarily by the event NRI (NRI_e), the net proportion of cancer cases reclassified to a higher risk group. NRI_e values >0.25 were observed for prostate, thyroid, breast, testicular, leukemia, melanoma, and colorectal cancers. The largest improvements in non-event reclassification (NRI_ne) were observed for the lung PRS (NRI_ne=0.015) and breast PRS (NRI_ne=0.012). Four cancers (testis, leukemia, kidney, oral cavity/pharynx) had significantly negative NRI_ne values, indicating that adding the PRS decreased classification accuracy in cancer-free individuals.
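The event and non-event components of the NRI can be illustrated with a simplified sketch. It assumes a binary outcome at a fixed horizon and ignores censoring, and it uses hypothetical percentile cut points (20th and 80th) to place individuals into risk categories under each model; the exact percentile-based procedure and the bootstrap confidence intervals used in this analysis are not reproduced. NRI_e is the proportion of cases moving up minus the proportion moving down, and NRI_ne is the proportion of non-cases moving down minus the proportion moving up. All names and data are hypothetical.

```python
from typing import List, Sequence, Tuple


def percentile_category(scores: Sequence[float],
                        cuts: Tuple[float, ...] = (20.0, 80.0)) -> List[int]:
    """Category index (0, 1, 2, ...) from each score's percentile rank.

    Cut points are hypothetical; ties in `scores` are broken by input order.
    """
    n = len(scores)
    order = sorted(range(n), key=lambda k: scores[k])
    pct = [0.0] * n
    for rank, k in enumerate(order, start=1):
        pct[k] = 100.0 * rank / n
    return [sum(p > c for c in cuts) for p in pct]


def nri(old_cat: Sequence[int], new_cat: Sequence[int],
        event: Sequence[int]) -> Tuple[float, float, float]:
    """Return (NRI_e, NRI_ne, overall NRI) for a binary outcome, ignoring censoring."""
    up_e = down_e = n_e = 0
    up_ne = down_ne = n_ne = 0
    for old, new, e in zip(old_cat, new_cat, event):
        if e:
            n_e += 1
            up_e += new > old
            down_e += new < old
        else:
            n_ne += 1
            up_ne += new > old
            down_ne += new < old
    nri_e = (up_e - down_e) / n_e      # cases should move up
    nri_ne = (down_ne - up_ne) / n_ne  # non-cases should move down
    return nri_e, nri_ne, nri_e + nri_ne


# Hypothetical predicted risks from the two models and outcomes for ten people
risk_old = [0.01, 0.02, 0.03, 0.04, 0.05, 0.06, 0.07, 0.08, 0.09, 0.10]
risk_new = [0.01, 0.02, 0.09, 0.04, 0.05, 0.06, 0.07, 0.08, 0.03, 0.10]
cases = [0, 0, 1, 0, 0, 0, 0, 0, 0, 1]
nri_e, nri_ne, nri_total = nri(percentile_category(risk_old),
                               percentile_category(risk_new), cases)
```

A negative NRI_ne, as reported for four cancers, would arise in this sketch when more non-cases move up a category than move down.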

Supplementary Table 6:

Percentile net reclassification improvement (NRI) index comparing the most comprehensive conventional risk factor model for each cancer with the model that incorporates the polygenic risk score (PRS) in addition to these risk factors. Event NRI (NRI_e) and non-event NRI (NRI_ne) quantify reclassification improvement in cases and event-free individuals, respectively.

Refinement of risk stratification

The ability of the PRS to refine risk estimates was assessed by examining 5-year absolute risk trajectories as a function of age, across strata defined by PRS percentiles (high: ≥80^th percentile; average: >20^th to <80^th; low: ≤20^th) and family history of cancer (Figure 3). Significantly diverging risk trajectories, overall and at age 60, were observed for prostate (P<4.5×10^-25), breast (P<9.3×10^-36), colorectal (P<2.0×10^-21), and lung cancers (P<0.031). For all cancers except lung, risk stratification was driven primarily by the PRS. For instance, 60-year-old men with a high PRS but no family history of prostate cancer had a higher mean 5-year disease risk (4.74%) than men with a positive family history and an average PRS (3.66%). For lung cancer, on the other hand, participants with a positive family history had higher average 5-year risks, even with a low PRS (0.54%), than those without (high PRS: 0.46%; low PRS: 0.29%). There was evidence of interaction between the PRS and family history of cancer for prostate (P = 9.0×10^-128), breast (P = 1.7×10^-104), and colorectal (P = 8.7×10^-14) cancers (Supplementary Table 7). For lung cancer, the interaction with family history was limited to the high PRS group (P = 5.9×10^-3).
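The stratification described above can be sketched as follows, assuming predicted 5-year absolute risks at age 60 are already available for each participant (here they come from cause-specific Cox models that account for competing mortality; that step is not shown). PRS categories use the cut points given in the text (low: ≤20^th percentile; average: >20^th to <80^th; high: ≥80^th), and stratum means are compared with a Welch t statistic. Because UK Biobank strata contain thousands of individuals, a normal approximation to the p-value is assumed to be adequate. All names are hypothetical.

```python
import statistics
from collections import defaultdict
from statistics import NormalDist
from typing import Dict, List, Sequence, Tuple


def prs_category(prs: Sequence[float]) -> List[str]:
    """Low: <= 20th percentile; high: >= 80th percentile; average otherwise."""
    cuts = statistics.quantiles(prs, n=5)  # 20th, 40th, 60th, 80th percentiles
    p20, p80 = cuts[0], cuts[3]
    return ["low" if s <= p20 else "high" if s >= p80 else "average" for s in prs]


def stratum_means(prs_cat: Sequence[str], fam_hist: Sequence[int],
                  risk: Sequence[float]) -> Dict[Tuple[str, bool], float]:
    """Mean predicted 5-year absolute risk in each PRS x family-history stratum."""
    groups: Dict[Tuple[str, bool], List[float]] = defaultdict(list)
    for cat, fh, r in zip(prs_cat, fam_hist, risk):
        groups[(cat, bool(fh))].append(r)
    return {key: statistics.fmean(vals) for key, vals in groups.items()}


def welch_test(a: Sequence[float], b: Sequence[float]) -> Tuple[float, float]:
    """Welch t statistic with a two-sided normal-approximation p-value."""
    se2 = statistics.variance(a) / len(a) + statistics.variance(b) / len(b)
    t = (statistics.fmean(a) - statistics.fmean(b)) / se2 ** 0.5
    p = 2.0 * (1.0 - NormalDist().cdf(abs(t)))
    return t, p
```

In use, one would compare, for example, the risks of men with a high PRS and no family history against those of men with an average PRS and a positive family history.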

Supplementary Table 7:

Assessment of interaction on the absolute risk scale between ordinal polygenic risk score (PRS) categories (average: 20^th to <80^th percentile; high: ≥80^th percentile vs. low: ≤20^th percentile) and family history of cancer (yes vs. none) or elevated modifiable risk factor profile (summary score >50^th percentile vs. ≤50^th percentile). Interaction was assessed using linear regression models with predicted absolute risk of cancer at age 60 (age 50 for pre-menopausal breast cancer) as the outcome.

Figure 3:

Comparison of predicted 5-year absolute risk trajectories across strata defined by the presence of family history and the level of genetic risk, based on percentiles of the normalized polygenic risk score (PRS) distribution. Low PRS corresponds to ≤20^th percentile, average PRS is defined as >20^th to <80^th percentile, and high PRS includes individuals in the ≥80^th percentile of the genetic risk score distribution. P-values are based on t-tests comparing mean absolute risk in each stratum at age 60.

We also compared 5-year risk projections across strata of PRS and modifiable risk factors. Effects of multiple risk factors were combined into a single score by generating summary linear predictors for each cancer (see Methods for details). For several common cancers, individuals with a high PRS were predicted to have higher cancer risk, even with modifiable risk factor scores below the median (Figure 4). PRS achieved significant risk stratification for breast cancer (pre-menopausal: P<5.9×10^-12; post-menopausal: P<4.3×10^-50), colorectal cancer (P<1.8×10^-42), and melanoma (P<4.6×10^-105) (Figure 4). The same pattern of stratification was observed for NHL, leukemia, pancreatic, thyroid, and testicular cancers (Supplementary Figure 3). For other phenotypes, lifestyle-related risk factors had a stronger overall influence on risk trajectories than PRS (Figure 5). However, stratifying by PRS still resulted in significantly diverging risk projections for several cancers (lung: P<1.1×10^-13; oral cavity/pharynx: P<1.2×10^-12; kidney: P<1.1×10^-13). For bladder cancer, the risk trajectories for high PRS/reduced modifiable risk and low PRS/elevated modifiable risk overlapped (P=0.98).
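A summary linear predictor of the kind described here is a weighted sum of an individual's risk factor values, with weights equal to the log hazard ratios from the fitted model. The sketch below assumes those coefficients have already been estimated; the covariates, coefficient values, and function names are hypothetical. Individuals at or below the median are labeled "reduced" and those above it "elevated".

```python
import statistics
from typing import List, Sequence, Tuple


def linear_predictor(x: Sequence[float], log_hr: Sequence[float]) -> float:
    """Covariate values weighted by their (assumed pre-estimated) log hazard ratios."""
    return sum(b * v for b, v in zip(log_hr, x))


def modifiable_profile(rows: Sequence[Sequence[float]],
                       log_hr: Sequence[float]) -> Tuple[List[float], List[str]]:
    """Summary score per person, dichotomized at the sample median."""
    lp = [linear_predictor(r, log_hr) for r in rows]
    med = statistics.median(lp)
    return lp, ["elevated" if v > med else "reduced" for v in lp]


# Hypothetical covariates: BMI (kg/m^2), alcohol (per 70 g/week), current smoker (0/1)
log_hr = [0.05, 0.04, 0.50]  # hypothetical coefficients, not estimates from this study
rows = [[25, 1, 0], [30, 2, 1], [22, 0, 0], [28, 1, 1]]
lp, profile = modifiable_profile(rows, log_hr)
```

Using a single linear predictor lets several correlated exposures be summarized on one axis, so that the "reduced" versus "elevated" split can be crossed with PRS categories.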

Supplementary Figure 3:

Predicted 5-year absolute risk trajectories across strata defined by PRS and modifiable risk factors, where applicable. Genetic risk categories are based on percentiles of the standardized polygenic risk score (PRS). Low PRS corresponds to ≤20^th percentile, average PRS is defined as >20^th to <80^th percentile, and high PRS includes individuals in the ≥80^th percentile. Individuals below the median of the modifiable risk factor distribution were considered to have reduced risk, whereas those above the median had elevated risk. P-values are based on t-tests comparing mean absolute risk in each stratum at age 60 (age 50 for cervical and testicular cancers).

Figure 4:

Predicted 5-year absolute risk trajectories for cancers where risk stratification is driven by genetic factors. Genetic risk categories are based on percentiles of the standardized polygenic risk score (PRS). Low PRS corresponds to ≤20^th percentile, average PRS is defined as >20^th to <80^th percentile, and high PRS includes individuals in the ≥80^th percentile. Individuals below the median of the modifiable risk factor distribution were considered to have reduced risk, whereas those above the median had elevated risk. P-values are based on t-tests comparing mean absolute risk in each stratum at age 60, except for pre-menopausal breast cancer, where differences at age 50 were tested.

Figure 5:

Predicted 5-year absolute risk trajectories for cancers where risk stratification is driven by modifiable risk factors. Genetic risk categories are based on percentiles of the standardized polygenic risk score (PRS). Low PRS corresponds to ≤20^th percentile, average PRS is defined as >20^th to <80^th percentile, and high PRS includes individuals in the ≥80^th percentile. Individuals below the median of the modifiable risk factor distribution were considered to have reduced risk, whereas those above the median had elevated risk. P-values are based on t-tests comparing mean absolute risk in each stratum at age 60.

There was evidence of larger-than-additive risk differences at age 60 between an elevated modifiable risk factor profile and all ordinal PRS categories for melanoma (P=3.3×10^-122), post-menopausal breast (P=1.3×10^-21), colorectal (P=1.3×10^-208), lung (P=1.1×10^-37), bladder (P=1.5×10^-50), kidney (P=5.5×10^-29), and oral cavity/pharynx cancers (P=5.2×10^-11) (Supplementary Table 7). For pre-menopausal breast cancer, the interaction was limited to women in the high PRS group (P=4.4×10^-4).
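Departure from additivity on the absolute risk scale can be illustrated with a simple interaction contrast. For two binary factors, the coefficient on the product term in a saturated linear regression of predicted risk equals the mean risk in the doubly exposed group, minus the mean in each singly exposed group, plus the mean in the unexposed group; a positive value indicates a larger-than-additive risk difference. The sketch below simplifies the ordinal PRS categories used in this analysis to a binary high-versus-low contrast; names and data are hypothetical.

```python
import statistics
from typing import Sequence


def interaction_contrast(risk: Sequence[float], high_prs: Sequence[int],
                         elevated_mod: Sequence[int]) -> float:
    """Additive interaction contrast R11 - R10 - R01 + R00 on the absolute risk scale.

    Equals the product-term coefficient of a saturated linear model
    risk ~ high_prs + elevated_mod + high_prs:elevated_mod with binary factors.
    """
    cells = {(a, b): [] for a in (False, True) for b in (False, True)}
    for r, a, b in zip(risk, high_prs, elevated_mod):
        cells[(bool(a), bool(b))].append(r)
    m = {k: statistics.fmean(v) for k, v in cells.items()}
    return m[(True, True)] - m[(True, False)] - m[(False, True)] + m[(False, False)]


# Hypothetical predicted 5-year risks (%) at age 60, one person per cell
risk_pct = [1.0, 2.0, 3.0, 5.0]
high = [0, 1, 0, 1]
elevated = [0, 0, 1, 1]
contrast = interaction_contrast(risk_pct, high, elevated)
```

Here the doubly exposed group has one percentage point more risk than the sum of the two separate excess risks would predict, which is the pattern described as larger-than-additive.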

Quantifying population-level impact

Population attributable fractions (PAF) were used to summarize the relative contribution of genetic susceptibility and modifiable risk factors to cancer risk at the population level. In order to allow comparisons between PAF estimates, the PRS and modifiable risk score distributions were both dichotomized at ≥80^th percentile. All risk factors nominally contributed (P<0.05) to cancer incidence (Figure 6; Supplementary Table 8), with the exception of the PRS for oral cavity/pharynx cancer (P=0.78) and PM_2.5 for lung cancer in never smokers (P=0.44).
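As a rough illustration of how exposure prevalence and relative risk combine into a PAF, the sketch below implements two classical crude formulas: Levin's, based on exposure prevalence in the population, and Miettinen's, based on the proportion of cases exposed. These are unadjusted approximations, not the covariate-adjusted, Cox model-based 5-year estimates reported here, and the hazard ratio is treated as a relative risk. Because exposure is defined as the top 20% of a score distribution, the population prevalence is fixed at 0.2. Function names and the hazard ratio are hypothetical.

```python
def paf_levin(prevalence: float, rr: float) -> float:
    """Levin's PAF from exposure prevalence in the population and a relative risk."""
    excess = prevalence * (rr - 1.0)
    return excess / (1.0 + excess)


def exposed_fraction_of_cases(prevalence: float, rr: float) -> float:
    """Proportion of cases who are exposed, implied by prevalence and RR (no confounding)."""
    return prevalence * rr / (prevalence * rr + (1.0 - prevalence))


def paf_miettinen(prop_cases_exposed: float, rr: float) -> float:
    """Miettinen's case-based PAF, often preferred when the RR is confounder-adjusted."""
    return prop_cases_exposed * (rr - 1.0) / rr


# Exposure = top 20% of a score distribution; hypothetical HR of 2.0, treated as an RR
prevalence, hr = 0.2, 2.0
paf_a = paf_levin(prevalence, hr)
paf_b = paf_miettinen(exposed_fraction_of_cases(prevalence, hr), hr)
```

With a 20% prevalence and a relative risk of 2, both formulas give a PAF of 1/6, so roughly 17% of cases would be attributable to the exposure in this idealized, unconfounded setting.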

Supplementary Table 8:

Population attributable fractions (PAF) were estimated at 5 years of follow-up time for the top 20% (≥80^th percentile) of the modifiable risk factor and polygenic risk score (PRS) distributions, respectively, and family history of cancer at the relevant site. PAF estimates were derived from Cox proportional hazards regression models that were adjusted for age at enrollment, sex, family history of cancer (if available), genotyping array, and the top 15 genetic ancestry principal components.

Figure 6:

Population attributable fractions (PAF) estimated at 5 years of follow-up time for the top 20% (≥80^th percentile) of the modifiable risk factor and polygenic risk score (PRS) distributions, respectively, and family history of cancer at the relevant site. PAF estimates were derived from Cox proportional hazards regression models that were adjusted for age at enrollment, sex, family history of cancer (if available), genotyping array, and the top 15 genetic ancestry principal components.

PAF for high genetic risk exceeded the contribution of modifiable exposures for several cancers, such as thyroid (PAF_PRS=0.268, P=1.7×10^-9), prostate (PAF_PRS=0.232, P=5.5×10^-158), colon/rectum (PAF_PRS=0.167, P=9.2×10^-50), breast (PAF_PRS=0.166, P=2.6×10^-85), and melanoma (PAF_PRS=0.139, P=1.3×10^-23). For testicular cancer (PAF_PRS=0.303, P=4.5×10^-4), leukemia (PAF_PRS=0.269, P=4.5×10^-4), lung cancer in never smokers (PAF_PRS=0.077, P=0.045), and NHL (PAF_PRS=0.053, P=1.9×10^-3), PRS was the only significant risk factor other than demographic factors. Cancers for which modifiable risk factors had a substantially larger impact on disease burden than PRS included oral cavity/pharynx (PAF_mod=0.310 vs. PAF_PRS=0.006), lung (PAF_mod=0.636 vs. PAF_PRS=0.040), endometrium (PAF_mod=0.353 vs. PAF_PRS=0.043), kidney (PAF_mod=0.210 vs. PAF_PRS=0.046), and bladder cancers (PAF_mod=0.189 vs. PAF_PRS=0.085). For other sites, such as pancreas (PAF_mod=0.118 vs. PAF_PRS=0.133) and ovary (PAF_mod=0.100 vs. PAF_PRS=0.082), the contributions of PRS and modifiable risk factors were more balanced.

DISCUSSION

Cancer is a multifactorial disease with a complex web of etiological factors, from macro-level determinants, such as health policy, to individual-level characteristics, such as health-related behaviors and heritable genetic profiles. Heritable and modifiable risk factors act in concert to influence cancer development, but their relative contributions to disease risk are rarely compared directly in the same population. In this study, we provide new insight into the potential utility of PRS for cancer risk prediction and into the relative contribution of genetic and modifiable risk factors to cancer incidence at the population level.

Our first major finding is that cancer-specific PRS composed of lead GWAS variants improve risk prediction for all 16 cancers examined. However, the magnitude of the resulting improvement varies substantially between sites. In evaluating the added predictive value of the PRS, it is important to keep in mind that achieving the same incremental increase in the C-index/AUC is more difficult when the baseline model already performs well¹². This applied to most cancers, for which age and/or sex alone achieved non-trivial risk discrimination (C-index/AUC>0.60). Expanding the set of predictors to include modifiable risk factors further improved discrimination, as previously shown¹³. By adding the PRS to the most comprehensive risk factor models that our data allowed, we adopted a conservative approach to quantifying its added predictive value, which provides an informative benchmark for future efforts seeking to incorporate genetic predisposition into cancer risk assessment.

Cancer sites for which the PRS resulted in the largest gains in prediction performance included prostate, testicular, and thyroid cancers, as well as leukemia and melanoma. This is consistent with the high heritability estimates reported for these cancers in twin studies¹⁴ and in our analyses in the UK Biobank¹⁵. Modelling the PRS in addition to established risk factors yielded very modest improvements in risk discrimination for cancers of the lung, endometrium, bladder, oral cavity/pharynx, and kidney. These cancers have strong environmental risk factors, such as smoking, alcohol consumption, obesity, and HPV infection, some of which were captured in our analysis. Limited predictive ability for cervical and endometrial cancers may also be due to the low number of variants included in the PRS (9 and 10, respectively). The association of the lung cancer PRS with cigarettes per day¹⁶ may have diminished its apparent predictive value when added to a model with smoking status and intensity, which already achieved an AUC>0.80, making further improvement difficult to achieve. Furthermore, PRS may be particularly relevant for assessing lung cancer risk in never smokers, since other risk factors have a limited impact in this population.

Few pan-cancer PRS studies have been conducted in prospective cohorts, and none have considered the breadth of modifiable risk factors that we evaluated. Shi et al.¹⁷ tested 11 cancer PRS in cases from The Cancer Genome Atlas and controls from the Electronic Medical Records and Genomics Network. This analysis was limited by fewer risk variants in each PRS, as well as the potential for bias due to selection of cases and controls from different populations. A phenome-wide analysis in the Michigan Genomics Initiative cohort by Fritsche et al.¹⁸ examined PRS for 12 cancers and reported similar associations for the target phenotype. However, risk stratification was not formally evaluated. Considering cancer-specific studies, the PRS presented here achieved superior prediction performance for some cancers^{19–22}, but not others^{23, 24}. For pancreatic cancer²⁵ and melanoma²⁶, our results are consistent with previous analyses using PRS of similar composition. Generally, comparison of prediction performance is complicated by differences in PRS content, population characteristics, and inclusion of different non-genetic predictors. Outside the cancer literature, our conclusions align with a recent study of ischemic stroke, which demonstrated that the PRS is similarly or more predictive than multiple established risk factors, including family history²⁷.

Our second major finding advances the idea of using germline genetic information to refine individual risk estimates. We show that incorporating PRS improves risk stratification provided by conventional risk factors alone, as illustrated by significantly diverging 5-year risk projections within strata based on family history or modifiable risk factors. For certain cancers, including some with strong environmental risk factors, such as melanoma, breast, colorectal, and pancreatic cancers, PRS was the primary determinant of risk stratification. For others, such as lung and bladder cancers, modifiable risk factors had a stronger impact on 5-year risk trajectories. A consistent finding for all cancers was that individuals in the top 20% of the PRS distribution with an unfavorable modifiable risk factor profile had the highest level of risk, with evidence that the effects of PRS and modifiable risk factors may be synergistic. Taken together, these findings highlight the potential for attenuating high genetic risk by adhering to a healthier lifestyle. Similar risk stratification results based on genetic and modifiable risk factors have also been reported for coronary disease²⁸ and Alzheimer’s²⁹.

In addition to evaluating predictive performance and risk stratification, our work demonstrates the relevance of common genetic risk variants at the population level. High genetic risk (PRS≥80^th percentile) explained between 4.0% and 30.3% of new cancer cases, and for many phenotypes this exceeded PAF estimates for modifiable risk factors or family history. The contribution of genetic variation to disease risk is typically conveyed by heritability, which is an informative metric, although not easily translated into a measure of disease burden useful in a public health context. Recent work on cancer PAF in the UK³⁰ and a series of publications from the ComPARe initiative in Canada^{31, 32} examined a wide range of modifiable risk factors. Despite providing useful data, these studies overlook the contribution of genetic susceptibility. Our work addresses this limitation by providing a more complete perspective on the determinants of cancer and the potential impact of future prevention policies.

In evaluating the contributions of our study, several limitations should be acknowledged. First, we did not account for the impact of workplace exposures and socio-economic determinants of health, thereby underestimating the role of non-genetic risk factors. We also lacked data on several known carcinogens, such as ionizing radiation, and on clinical biomarkers, such as prostate-specific antigen, which limits the extent to which our results inform risk discrimination for certain cancers. Information on family history was also not available for all cancer types. Second, since the UK Biobank cohort is unrepresentative of the general UK population due to low participation and the resulting healthy volunteer bias³³, we may have underestimated PAFs for modifiable risk factors. Finally, the models presented here are calibrated to the UKB population, and we urge caution in extrapolating prediction performance and absolute risk projections to other populations. Because our analytic sample was restricted to individuals of predominantly European ancestry, the applicability of our findings to more diverse populations is limited.

This work has several important strengths. The UK Biobank resource enabled us to simultaneously evaluate heritable and modifiable cancer risk factors in a population-based cohort with uniform deep phenotyping. We report a series of metrics that comprehensively characterize different dimensions of predictive performance that can be improved by incorporating genetic risk scores. While our results are promising, we anticipate that the performance of the PRS reported here may be enhanced by adopting less stringent p-value thresholds to include additional risk variants, optimizing subtype-specific weights, and implementing more sophisticated PRS models that incorporate linkage disequilibrium structure, functional annotations, or SNP interactions. Some of these strategies are already being successfully implemented^{4, 23}. We also provide insight into PRS modelling by showing that accounting for the variance in risk allele effect sizes improves PRS performance. This approach may be particularly advantageous for PRS derived from multiple sources rather than a single GWAS. Throughout this study, we used a relatively lenient definition of high genetic risk, corresponding to the top 20% of the PRS distribution. Exploring other cut-points will be informative; however, our results demonstrate that the utility of PRS for stratification is not limited to the most extreme ends of the genetic susceptibility spectrum. This threshold is also compelling from a population-health perspective, as it allows us to quantify the proportion of cases attributable to a risk factor with a 20% prevalence.

Genetic risk scores have the potential to become a powerful tool for precision health, but only if the resulting information can be understood and acted on appropriately. One important consideration is the accuracy and stability of PRS-based risk classifications, especially at the clinically actionable risk thresholds that exist for certain cancers. For instance, there are established screening programs for breast and colorectal cancers, and increasing evidence supports the effectiveness of low-dose computed tomography for lung cancer screening^{34, 35}. For these cancers, PRS could be used to tailor the age at which screening begins and/or its intensity. However, studies demonstrating the benefit of supplementing conventional screening criteria with PRS are needed to justify this. Such trials are already underway for breast cancer, where genetic risk scores are being incorporated to personalize risk-based screening³⁶. For other cancers, such as prostate, screening remains controversial, and PRS may prove useful in identifying a subset of high-risk individuals who are most likely to benefit from screening.

Another area where PRS may prove useful is in prioritizing individuals for targeted health and lifestyle-related interventions. In support of this, our study suggests that those with the highest levels of genetic risk, based on the PRS, may also experience larger decreases in risk from shifting to a healthier lifestyle. However, there is also accumulating evidence that simply reporting genetic risk information to individuals does not induce behavior change that could lead to meaningful reductions in risk³⁷. Therefore, progress in our ability to construct and apply PRS to identify high-risk individuals must also be accompanied by the development of effective behavioral interventions that can be implemented in response to high disease risk, in addition to early detection and screening protocols.

Ultimately, the impact of PRS on clinical decision-making should be carefully evaluated in randomized trials prior to deployment in healthcare settings. By demonstrating cancer-specific improvements in risk prediction, as well as the substantial proportion of cancer incidence that is captured by known genetic susceptibility variants, we provide novel evidence that contextualizes the potential for using genetic information to improve cancer outcomes.

METHODS

Study Population

The UK Biobank (UKB) is a population-based prospective cohort of individuals aged 40 to 69 years, enrolled between 2006 and 2010. All participants completed extensive questionnaires and in-person physical assessments, and provided blood samples for DNA extraction and genotyping⁷. Health-related outcomes were ascertained via individual record linkage to national cancer and mortality registries and to hospital in-patient encounters⁷. Details of the quality control and phenotyping procedures for this dataset have been previously described^{15, 16}. Briefly, individuals with at least one recorded incident diagnosis of a borderline, in situ, or malignant primary cancer were defined as cases. Cancer diagnoses coded with International Classification of Diseases (ICD)-9 or ICD-10 codes were converted into ICD-O-3 codes using the SEER site recode paradigm in order to classify cancers by organ site.

Participants were genotyped on the UKB Affymetrix Axiom array (89%) or the UK BiLEVE array (11%)⁷. Genotype imputation was performed using the Haplotype Reference Consortium as the main reference panel, supplemented with the UK10K and 1000 Genomes phase 3 reference panels⁷. Genetic ancestry principal components (PCs) were computed using fastPCA³⁸ based on a set of 407,219 unrelated samples and 147,604 genetic markers⁷. All analyses were restricted to individuals of self-reported European ancestry with concordant self-reported and genetically inferred sex. To further minimize the potential for population stratification, we excluded individuals whose values for either of the first two ancestry PCs fell more than five standard deviations from the population mean. Based on a subset of genotyped autosomal variants with minor allele frequency (MAF) ≥0.01 and genotype call rate ≥97%, we excluded samples with call rates <97% and/or heterozygosity more than five standard deviations from the population mean. With the same subset of SNPs, we used KING³⁸ to estimate relatedness among the samples. We excluded one individual from each pair of first-degree relatives, preferentially retaining individuals so as to maximize the number of cancer cases remaining, resulting in a total of 413,870 UKB participants.
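The two sample-level exclusions above, dropping samples more than five standard deviations from the mean and removing one member of each first-degree relative pair while preferring to keep cases, can be sketched as follows. This is a minimal illustration, not the UKB pipeline: the function names are hypothetical, the pruning is a simple greedy pass (which does not guarantee the maximum possible number of retained cases), and the relative pairs are assumed to come from an external kinship tool such as KING.

```python
from statistics import mean, pstdev


def within_k_sd(values, k=5.0):
    """Flag values lying within k standard deviations of the sample mean.

    Applied separately to, e.g., ancestry PC1, PC2, or heterozygosity.
    """
    mu = mean(values)
    sd = pstdev(values)
    return [abs(v - mu) <= k * sd for v in values]


def prune_related(pairs, is_case):
    """Drop one member of each related pair, preferring to keep cancer cases.

    pairs:   iterable of (id_a, id_b) first-degree relative pairs
    is_case: dict mapping sample id -> True if the sample is a cancer case
    Returns the set of sample ids to exclude (greedy and order-dependent).
    """
    removed = set()
    for a, b in pairs:
        if a in removed or b in removed:
            continue  # pair already broken up by an earlier exclusion
        a_case, b_case = is_case.get(a, False), is_case.get(b, False)
        if a_case and not b_case:
            removed.add(b)  # keep the case
        elif b_case and not a_case:
            removed.add(a)  # keep the case
        else:
            removed.add(b)  # tie: arbitrarily drop the second member
    return removed
```

For example, `within_k_sd([0, 0, 0, 0, 100], k=1)` flags only the last value as an outlier, and `prune_related([(1, 2), (2, 3)], {2: True})` excludes samples 1 and 3 so that case 2 is retained.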

Polygenic Risk Scores

To derive polygenic risk scores (PRS) for each of the 16 cancers, we extracted previously associated variants by searching the National Human Genome Research Institute (NHGRI)-European Bioinformatics Institute (EBI) Catalog of published GWAS. For every eligible GWAS, both the original primary manuscript and the supplemental materials were reviewed. Additional relevant studies were identified by examining the reference section of each article and via PubMed searches for other studies in which each article had been cited. We abstracted all autosomal variants with MAF ≥0.01 and P<5×10⁻⁸ identified in populations of at least 70% European ancestry and published by June 2018, with the exception of one colorectal cancer GWAS³⁹ (published in December 2018). For inclusion in the PRS, we preferentially selected independent SNPs (LD r²<0.3) with the highest imputation score, and we excluded SNPs with allele mismatches or MAF differences >0.10 relative to the 1000 Genomes reference population, as well as palindromic SNPs with MAF ≥0.45. For associations reported in more than one study of the same ancestry and phenotype, we selected the association with the most complete information (i.e., reporting both the risk allele and the effect estimate) and, among those, the smallest p-value. Further details of the PRS development approach, including a list of source studies, are described by Graff et al¹⁶.
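The allele-level exclusions above (allele mismatches, MAF differences >0.10 relative to the reference, and palindromic A/T or C/G SNPs with MAF ≥0.45) can be expressed as a simple filter. This sketch assumes alleles are reported on the same strand as the reference (no strand flipping is attempted) and that the palindromic cut-off is applied to the reference MAF; it does not cover LD pruning (r²<0.3) or imputation-score ranking, and `keep_variant` is a hypothetical name.

```python
PALINDROMIC_PAIRS = {frozenset("AT"), frozenset("CG")}


def is_palindromic(allele1, allele2):
    """A/T and C/G SNPs are strand-ambiguous."""
    return frozenset((allele1.upper(), allele2.upper())) in PALINDROMIC_PAIRS


def keep_variant(risk_allele, other_allele, maf_gwas, ref_alleles, maf_ref,
                 max_maf_diff=0.10, palindromic_maf_cutoff=0.45):
    """Return True if a published SNP passes the harmonization checks."""
    gwas_set = {risk_allele.upper(), other_allele.upper()}
    ref_set = {allele.upper() for allele in ref_alleles}
    if gwas_set != ref_set:
        return False  # allele mismatch relative to the reference panel
    if abs(maf_gwas - maf_ref) > max_maf_diff:
        return False  # allele frequency too discordant with the reference
    if is_palindromic(risk_allele, other_allele) and maf_ref >= palindromic_maf_cutoff:
        return False  # strand cannot be resolved from allele frequency
    return True
```

Palindromic SNPs with low MAF are kept because, away from 0.5, the allele frequency still indicates which strand was reported.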

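We considered three approaches for combining risk variants in the PRS. First, we used standard PRS weights, corresponding to the log odds ratio (β) for each risk allele:

$$\mathrm{PRS}_{\mathrm{standard}} = \sum_{k=1}^{K} \beta_k \times \mathrm{SNP}_k$$

where SNP_k is the number of risk alleles (0, 1, or 2) carried at variant k, and K is the number of variants in the score. We compared this to an unweighted score corresponding to the sum of the risk alleles, which is equivalent to assigning all variants an equal weight of 1:

$$\mathrm{PRS}_{\mathrm{unweighted}} = \sum_{k=1}^{K} \mathrm{SNP}_k$$

Lastly, we applied inverse variance (IV) weights that incorporated the standard error (SE) of the SNP log(OR) to account for uncertainty in risk allele effect sizes and down-weight the contribution of variants with less precisely estimated associations (weights provided in Supplementary Data 1):

$$\mathrm{PRS}_{\mathrm{IV}} = \sum_{k=1}^{K} \frac{\beta_k}{\mathrm{SE}_k^{2}} \times \mathrm{SNP}_k$$

Each PRS was standardized across the entire analytic cohort to have a mean of 0 and a standard deviation (SD) of 1.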
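The three weighting schemes and the final standardization can be sketched in a few lines of Python. The sketch assumes risk-allele dosages coded 0 to 2 and, for the IV score, per-variant weights of the form β/SE² (an inverse-variance scaling of the log odds ratio); the function names are hypothetical. Because each score is standardized afterwards, multiplying all weights by a common constant leaves the standardized PRS unchanged.

```python
from statistics import mean, pstdev


def variant_weights(scheme, betas, ses=None):
    """Per-variant weights for the 'standard', 'unweighted' or 'iv' PRS."""
    if scheme == "standard":
        return list(betas)  # log odds ratio per risk allele
    if scheme == "unweighted":
        return [1.0] * len(betas)  # simple risk-allele count
    if scheme == "iv":
        if ses is None:
            raise ValueError("IV weights require standard errors")
        return [b / se ** 2 for b, se in zip(betas, ses)]
    raise ValueError(f"unknown scheme: {scheme}")


def raw_prs(dosages, weights):
    """Weighted sum of risk-allele dosages for one individual."""
    return sum(w * g for w, g in zip(weights, dosages))


def standardize(scores):
    """Rescale scores to mean 0 and SD 1 across the analytic cohort."""
    mu, sd = mean(scores), pstdev(scores)
    return [(s - mu) / sd for s in scores]
```

For instance, with β = [0.20, 0.10] and SE = [0.10, 0.05], the IV weights are about 20 and 40, so the more precisely estimated second variant receives twice the weight despite its smaller effect size.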

Statistical Analysis

Development of risk models for each cancer

Cancer-specific prediction models consisting of four classes of risk factors were developed: i) demographic factors (age and sex); ii) family history of cancer in first-degree relatives; iii) modifiable risk factors; and iv) genetic susceptibility, represented by the PRS. Family history of cancer was derived based on self-reported illnesses in non-adopted first-degree relatives, which only listed cancers of the prostate, breast, bowel, or lung. In addition to these four cancer sites, family history of breast cancer was included as a predictor for ovarian cancer^{40, 41}. Models for pancreatic cancer included a composite variable for family history of cancer at any of these four sites^{42, 43}. Selection of modifiable risk factors was informed by literature review and reports, such as the European Code Against Cancer⁴⁴, with an emphasis on risk factors that are likely to have a causal role. Final models included established environmental and lifestyle-related characteristics that were collected for the entire UK Biobank cohort (Supplementary Table 1).

Cause-specific Cox proportional hazards models were used to estimate hazard ratios (HR) and corresponding 95% confidence intervals (CI) for genetic and lifestyle factors associated with each incident cancer. Death from any cause other than cancer site-specific mortality was treated as a competing event. Information on primary and contributing causes of death was used to identify cancer site-specific mortality. Follow-up time was calculated from the date of enrollment to the date of cancer diagnosis, date of death, or end of follow-up (January 1, 2015). For each cancer, individuals with a past or prevalent cancer diagnosis at that same site were excluded from the analysis, while individuals diagnosed with cancers at other sites were retained. All models including the PRS were also adjusted for genotyping array and the first 15 genetic ancestry PCs. For the PRS, HR estimates correspond to a 1 SD increase in the standardized genetic score.
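The time-to-event coding above (follow-up from enrollment to diagnosis, death, or 1 January 2015, with deaths from other causes as competing events) can be sketched as follows. This is an illustration under assumptions: the function name is hypothetical, a death attributed to the cancer site under study is coded as an event of interest (the text does not say how such a death without a recorded diagnosis was handled), same-day ties favor an event over censoring, and years are computed as days/365.25. The resulting (time, status) pairs are the usual input for fitting one cause-specific Cox model per cause.

```python
from datetime import date

END_OF_FOLLOW_UP = date(2015, 1, 1)

CENSORED, CANCER, COMPETING_DEATH = 0, 1, 2


def code_outcome(enrolled, diagnosis=None, death=None,
                 death_from_site=False, end=END_OF_FOLLOW_UP):
    """Return (follow-up in years, status) for one cancer site."""
    candidates = [(end, CENSORED)]
    if diagnosis is not None and diagnosis <= end:
        candidates.append((diagnosis, CANCER))
    if death is not None and death <= end:
        candidates.append((death, CANCER if death_from_site else COMPETING_DEATH))
    # Earliest date wins; on a tie an event is preferred over censoring,
    # and a diagnosis (appended first) over a death on the same day.
    exit_date, status = min(candidates, key=lambda c: (c[0], c[1] == CENSORED))
    return (exit_date - enrolled).days / 365.25, status
```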

Risk model evaluation

The predictive performance of each risk model was evaluated based on its ability to accurately estimate risk (calibration) and to distinguish cancer cases from cancer-free individuals (discrimination). Calibration was assessed with a Hosmer-Lemeshow goodness-of-fit statistic modified for time-to-event outcomes⁴⁵, and by plotting the expected event status against the observed event probability⁴⁶ across risk deciles. For rarer cancers, calibration was assessed across quantiles of risk chosen to ensure a minimum of 5 cases per group. Violation of the proportional hazards assumption was assessed by examining the association between standardized Schoenfeld residuals and time.
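Grouping by predicted risk with a minimum number of cases per group might look like the sketch below, which starts from deciles and falls back to fewer quantile groups until every group contains at least 5 cases. It is a simplification: observed risk is taken as the crude event proportion in each group, ignoring censoring and competing risks, whereas the analysis itself used a survival-adapted Hosmer-Lemeshow test and competing-risk calibration plots; the function names are hypothetical.

```python
def quantile_groups(pred, events, n_groups=10, min_cases=5):
    """Split individuals into risk quantile groups with >= min_cases events each.

    Starts from n_groups (deciles) and falls back to fewer groups when needed.
    Returns lists of indices, ordered from lowest to highest predicted risk.
    """
    order = sorted(range(len(pred)), key=lambda i: pred[i])
    for g in range(n_groups, 0, -1):
        bounds = [round(k * len(order) / g) for k in range(g + 1)]
        groups = [order[bounds[k]:bounds[k + 1]] for k in range(g)]
        if all(sum(events[i] for i in grp) >= min_cases for grp in groups):
            return groups
    return [order]  # too few cases overall: a single group


def calibration_table(pred, events, **kwargs):
    """(group size, mean predicted risk, observed event proportion) per group."""
    rows = []
    for grp in quantile_groups(pred, events, **kwargs):
        n = len(grp)
        expected = sum(pred[i] for i in grp) / n
        observed = sum(events[i] for i in grp) / n
        rows.append((n, expected, observed))
    return rows
```

Plotting `expected` against `observed` for each row gives a crude calibration plot; points near the diagonal indicate good calibration.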

We evaluated nested models, starting with a minimal set of predictors (demographic factors), followed by models including family history of cancer and modifiable risk factors, and finally models incorporating the PRS. Risk discrimination was assessed based on Harrell's C-index, calculated as a weighted average between 1 and 5 years of follow-up time, and on the Area Under the Curve (AUC) at 5 years. We also report pseudo R² coefficients based on Royston's measure of explained variation for survival models¹⁰. The percentile-based net reclassification improvement (NRI)¹¹ was used to quantify improvements in reclassification. NRI summarizes the proportion of appropriate directional changes in predicted risks. Any upward movement in risk categories for cases indicates improved classification, and any downward movement implies worse classification. The opposite is expected for non-cases:

$$\mathrm{NRI}_e = \frac{n_{U,e} - n_{D,e}}{n_e}, \qquad \mathrm{NRI}_{ne} = \frac{n_{D,ne} - n_{U,ne}}{n_{ne}}$$

where n_U is the number of individuals up-classified, n_D is the number down-classified, and n_e and n_ne are the numbers of cases and non-cases, respectively. Overall NRI is the sum of the NRI in cases and the NRI in non-cases: NRI = NRI_e + NRI_ne. Bootstrapped confidence intervals were obtained based on 1000 replicates.
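The reclassification counts and the concordance index can be sketched directly. The NRI function follows the up/down definition in the text, with risk categories supplied by the caller (how the percentile-based categories are formed is left open here); the C-index is the plain Harrell estimator over usable pairs, without the time-weighting between 1 and 5 years used in the analysis and without special handling of tied event times. The names are hypothetical; bootstrap confidence intervals would come from re-running these functions on resampled data.

```python
def nri(old_cat, new_cat, is_case):
    """Categorical NRI from risk categories under the old and new models.

    Returns (NRI in cases, NRI in non-cases, overall NRI).
    """
    up_e = down_e = n_e = 0
    up_ne = down_ne = n_ne = 0
    for old, new, case in zip(old_cat, new_cat, is_case):
        if case:
            n_e += 1
            up_e += new > old
            down_e += new < old
        else:
            n_ne += 1
            up_ne += new > old
            down_ne += new < old
    nri_e = (up_e - down_e) / n_e      # upward moves are good for cases
    nri_ne = (down_ne - up_ne) / n_ne  # downward moves are good for non-cases
    return nri_e, nri_ne, nri_e + nri_ne


def harrell_c(time, event, risk):
    """Harrell's C: share of usable pairs in which the earlier case has higher risk."""
    concordant, usable = 0.0, 0
    for i in range(len(time)):
        if not event[i]:
            continue  # usable pairs are anchored on an observed event
        for j in range(len(time)):
            if time[j] > time[i]:
                usable += 1
                if risk[i] > risk[j]:
                    concordant += 1.0
                elif risk[i] == risk[j]:
                    concordant += 0.5  # tied predictions count as half
    return concordant / usable
```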

Risk stratification: genetic vs. modifiable factors

For each individual, we estimated the 5-year absolute risk of being diagnosed with a specific cancer using the formula of Benichou & Gail⁴⁷, as implemented by Ozenne et al⁴⁸. Absolute risk trajectories were examined as a function of age across strata defined by genetic and modifiable risk profiles, as well as by family history. Individuals in the top 20% of the PRS distribution (PRS ≥80th percentile) for a given cancer were classified as having high genetic risk, those in the bottom 20% (PRS ≤20th percentile) as having low genetic risk, and those in the middle category (>20th to <80th percentile) as having average genetic risk.
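The absolute-risk calculation combines the cause-specific hazard of the cancer of interest with that of competing death, in the spirit of Benichou & Gail; a discrete-time sketch is below, together with the three-level PRS grouping. It assumes the baseline cumulative-hazard increments of both Cox models have already been estimated at common, sorted event times, and it is not the riskRegression implementation. `absolute_risk` and `genetic_risk_groups` are hypothetical names, and the percentile cut points here come from `statistics.quantiles` with the inclusive method.

```python
from math import exp
from statistics import quantiles


def absolute_risk(times, dH_cancer, dH_death, lp_cancer, lp_death, horizon):
    """Cumulative incidence of the cancer by `horizon` with competing death.

    times:     sorted event times (years since entry)
    dH_cancer: baseline cumulative-hazard increments of the cancer model
    dH_death:  baseline cumulative-hazard increments of the death model
    lp_*:      the individual's linear predictors from each Cox model
    """
    rr_cancer, rr_death = exp(lp_cancer), exp(lp_death)
    cum_hazard_all = 0.0  # all-cause cumulative hazard just before time t
    risk = 0.0
    for t, h_c, h_d in zip(times, dH_cancer, dH_death):
        if t > horizon:
            break
        # P(event-free just before t) x cancer-specific hazard at t
        risk += exp(-cum_hazard_all) * h_c * rr_cancer
        cum_hazard_all += h_c * rr_cancer + h_d * rr_death
    return risk


def genetic_risk_groups(prs):
    """Label each PRS 'low' (<=20th pct), 'high' (>=80th pct) or 'average'."""
    cuts = quantiles(prs, n=5, method="inclusive")  # 20th, 40th, 60th, 80th
    low, high = cuts[0], cuts[-1]
    return ["high" if s >= high else "low" if s <= low else "average" for s in prs]
```

Adding competing-death hazard can only lower the cancer risk, because fewer people remain event-free to develop cancer.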

Modifiable risk factors were summarized by generating summary linear predictors (predicted log-hazard ratios) based on risk factors in Supplementary Table 1, excluding age, sex, and family history. Individuals above the median of this risk score distribution were considered to have an unfavorable modifiable risk profile. Risk trajectories in each stratum were visualized by fitting linear models with smoothing splines across individual risk estimates as a function of age. Differences in mean absolute risk at age 60 were tested using a two-sample t-test. We also tested for interaction between the 3-level ordinal PRS variable and the modifiable risk score (dichotomized at the median) in a linear model with the predicted absolute risk as the outcome.
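Summarizing the modifiable factors as a linear predictor and splitting at the median could be sketched as below. The covariate and coefficient names are hypothetical placeholders (the actual factors are listed in Supplementary Table 1), and the coefficients are assumed to be log hazard ratios from the fitted cancer-specific model.

```python
from statistics import median

NON_MODIFIABLE = {"age", "sex", "family_history"}


def modifiable_score(covariates, log_hrs, exclude=NON_MODIFIABLE):
    """Linear predictor (sum of log-HR x value) over modifiable factors only."""
    return sum(log_hrs[name] * value
               for name, value in covariates.items()
               if name in log_hrs and name not in exclude)


def unfavorable_profile(scores):
    """True for individuals strictly above the cohort median score."""
    cut = median(scores)
    return [s > cut for s in scores]
```

In this sketch, an individual with `{"age": 60, "current_smoker": 1, "bmi_per_5": 2}` and log-HRs of 0.7 and 0.1 for the two modifiable factors gets a score of 0.9; age is ignored even though it has a coefficient.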

Etiology: contribution of genetic vs. modifiable risk factors

The relative contribution of genetic and modifiable cancer risk factors at the population level was quantified with population attributable fractions (PAF), estimated using the method of Sjölander & Vansteelandt^{49, 50} based on the counterfactual framework. To obtain comparable PAF estimates, the thresholds for high genetic risk and for a high burden of modifiable risk factors both corresponded to the top 20% (≥80th percentile) of the respective risk score distributions.
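Under the counterfactual definition, the PAF compares the average risk in the population as observed with the average risk had nobody been in the exposed (e.g., high-PRS) group. A minimal model-based standardization sketch is below. It uses a fixed-horizon risk function supplied by the caller and omits the time-to-event and doubly robust machinery of the Sjölander & Vansteelandt estimator, so `attributable_fraction` and the toy risk function in the example are hypothetical illustrations only.

```python
from statistics import mean


def attributable_fraction(people, risk_fn, exposure):
    """PAF = 1 - mean risk with exposure removed / mean risk as observed.

    people:   list of dicts of covariates, one per individual
    risk_fn:  function mapping a covariate dict to a predicted risk
    exposure: key of a binary exposure to set to 0 counterfactually
    """
    observed = mean(risk_fn(person) for person in people)
    counterfactual = mean(risk_fn({**person, exposure: 0}) for person in people)
    return 1.0 - counterfactual / observed
```

With 20% of individuals exposed and a toy risk of 0.02 in the exposed versus 0.01 in the unexposed (a relative risk of 2), the function returns 1/6, about 0.17, matching Levin's formula p(RR − 1)/(1 + p(RR − 1)).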

DATA AVAILABILITY

The UK Biobank is an open access resource, available at https://www.ukbiobank.ac.uk/researchers/. This research was conducted with approved access to UK Biobank data under application number 14105.

COMPETING INTERESTS

The authors declare no competing interests.

ACKNOWLEDGEMENTS

Disclaimer: Where authors are identified as personnel of the International Agency for Research on Cancer / World Health Organization, the authors alone are responsible for the views expressed in this article and they do not necessarily represent the decisions, policy or views of the International Agency for Research on Cancer / World Health Organization.

This research was supported by funding from the National Institutes of Health (US NCI R25T CA112355 and R01 CA201358; PI: Witte) and Cancer Research UK (C18281/A19169).

References

1. Khera, A.V., et al. Polygenic Prediction of Weight and Obesity Trajectories from Birth to Adulthood. Cell 177, 587–596 e589 (2019).
2. Yengo, L., et al. Meta-analysis of genome-wide association studies for height and body mass index in approximately 700000 individuals of European ancestry. Hum Mol Genet 27, 3641–3649 (2018).
3. Inouye, M., et al. Genomic Risk Prediction of Coronary Artery Disease in 480,000 Adults: Implications for Primary Prevention. J Am Coll Cardiol 72, 1883–1893 (2018).
4. Khera, A.V., et al. Genome-wide polygenic scores for common diseases identify individuals with risk equivalent to monogenic mutations. Nat Genet 50, 1219–1224 (2018).
5. Torkamani, A., Wineinger, N.E. & Topol, E.J. The personal and clinical utility of polygenic risk scores. Nat Rev Genet 19, 581–590 (2018).
6. Lambert, S.A., Abraham, G. & Inouye, M. Towards clinical utility of polygenic risk scores. Hum Mol Genet (2019).
7. Bycroft, C., et al. The UK Biobank resource with deep phenotyping and genomic data. Nature 562, 203–209 (2018).
8. Raaschou-Nielsen, O., et al. Air pollution and lung cancer incidence in 17 European cohorts: prospective analyses from the European Study of Cohorts for Air Pollution Effects (ESCAPE). Lancet Oncol 14, 813–822 (2013).
9. Heagerty, P.J. & Zheng, Y. Survival model predictive accuracy and ROC curves. Biometrics 61, 92–105 (2005).
10. Royston, P. Explained variation for survival models. The Stata Journal 6, 83–96 (2006).
11. McKearnan, S.B., Wolfson, J., Vock, D.M., Vazquez-Benitez, G. & O'Connor, P.J. Performance of the Net Reclassification Improvement for Nonnested Models and a Novel Percentile-Based Alternative. Am J Epidemiol 187, 1327–1335 (2018).
12. Pencina, M.J., D'Agostino, R.B. & Massaro, J.M. Understanding increments in model performance metrics. Lifetime Data Anal 19, 202–218 (2013).
13. Usher-Smith, J.A., Sharp, S.J., Luben, R. & Griffin, S.J. Development and Validation of Lifestyle-Based Models to Predict Incidence of the Most Common Potentially Preventable Cancers. Cancer Epidemiol Biomarkers Prev 28, 67–75 (2019).
14. Mucci, L.A., et al. Familial Risk and Heritability of Cancer Among Twins in Nordic Countries. JAMA 315, 68–76 (2016).
15. Rashkin, S.R., et al. Pan-cancer study detects novel genetic risk variants and shared genetic basis in two large cohorts. bioRxiv 635367 (2019).
16. Graff, R.E., et al. Cross-Cancer Evaluation of Polygenic Risk Scores for 17 Cancer Types in Two Large Cohorts. bioRxiv 2020.01.18.911578 (2020).
17. Shi, Z., et al. Systematic evaluation of cancer-specific genetic risk score for 11 types of cancer in The Cancer Genome Atlas and Electronic Medical Records and Genomics cohorts. Cancer Med 8, 3196–3205 (2019).
18. Fritsche, L.G., et al. Association of Polygenic Risk Scores for Multiple Cancers in a Phenome-wide Study: Results from The Michigan Genomics Initiative. Am J Hum Genet 102, 1048–1061 (2018).
19. Amin Al Olama, A., et al. Risk Analysis of Prostate Cancer in PRACTICAL, a Multinational Consortium, Using 25 Known Prostate Cancer Susceptibility Loci. Cancer Epidemiol Biomarkers Prev 24, 1121–1129 (2015).
20. Hoffmann, T.J., et al. A large multiethnic genome-wide association study of prostate cancer identifies novel risk variants and substantial ethnic differences. Cancer Discov 5, 878–891 (2015).
21. Smith, T., Gunter, M.J., Tzoulaki, I. & Muller, D.C. The added value of genetic information in colorectal cancer risk prediction models: development and evaluation in the UK Biobank prospective cohort study. Br J Cancer 119, 1036–1039 (2018).
22. Garcia-Closas, M., et al. Common genetic polymorphisms modify the effect of smoking on absolute risk of bladder cancer. Cancer Res 73, 2211–2220 (2013).
23. Mavaddat, N., et al. Polygenic Risk Scores for Prediction of Breast Cancer and Breast Cancer Subtypes. Am J Hum Genet 104, 21–34 (2019).
24. Yang, X., et al. Evaluation of polygenic risk scores for ovarian cancer risk prediction in a prospective cohort study. J Med Genet 55, 546–554 (2018).
25. Klein, A.P., et al. Genome-wide meta-analysis identifies five new susceptibility loci for pancreatic cancer. Nat Commun 9, 556 (2018).
26. Fritsche, L.G., et al. Exploring various polygenic risk scores for skin cancer in the phenomes of the Michigan genomics initiative and the UK Biobank with a visual catalog: PRSWeb. PLoS Genet 15, e1008202 (2019).
27. Abraham, G., et al. Genomic risk score offers predictive performance comparable to clinical risk factors for ischaemic stroke. Nat Commun 10, 5819 (2019).
28. Khera, A.V., et al. Genetic Risk, Adherence to a Healthy Lifestyle, and Coronary Disease. N Engl J Med 375, 2349–2358 (2016).
29. Licher, S., et al. Genetic predisposition, modifiable-risk-factor profile and long-term dementia risk in the general population. Nat Med 25, 1364–1369 (2019).
30. Brown, K.F., et al. The fraction of cancer attributable to modifiable risk factors in England, Wales, Scotland, Northern Ireland, and the United Kingdom in 2015. Br J Cancer 118, 1130–1141 (2018).
31. Brenner, D.R., et al. The burden of cancer attributable to modifiable risk factors in Canada: Methods overview. Prev Med 122, 3–8 (2019).
32. Poirier, A.E., et al. The current and future burden of cancer attributable to modifiable risk factors in Canada: Summary of results. Prev Med 122, 140–147 (2019).
33. Fry, A., et al. Comparison of Sociodemographic and Health-Related Characteristics of UK Biobank Participants With Those of the General Population. Am J Epidemiol 186, 1026–1034 (2017).
34. National Lung Screening Trial Research Team, et al. Reduced lung-cancer mortality with low-dose computed tomographic screening. N Engl J Med 365, 395–409 (2011).
35. De Koning, H., Van Der Aalst, C., Ten Haaf, K. & Oudkerk, M. PL02.05 Effects of Volume CT Lung Cancer Screening: Mortality Results of the NELSON Randomised-Controlled Population Based Trial. Journal of Thoracic Oncology 13, S185 (2018).
36. Shieh, Y., et al. Breast Cancer Screening in the Precision Medicine Era: Risk-Based Screening in a Population-Based Trial. J Natl Cancer Inst 109 (2017).
37. Hollands, G.J., et al. The impact of communicating genetic risks of disease on risk-reducing health behaviour: systematic review with meta-analysis. BMJ 352, i1102 (2016).
38. Manichaikul, A., et al. Robust relationship inference in genome-wide association studies. Bioinformatics 26, 2867–2873 (2010).
39. Huyghe, J.R., et al. Discovery of common and rare genetic risk variants for colorectal cancer. Nat Genet 51, 76–87 (2019).
40. Wooster, R. & Weber, B.L. Breast and ovarian cancer. N Engl J Med 348, 2339–2347 (2003).
41. Kazerouni, N., Greene, M.H., Lacey, J.V., Jr., Mink, P.J. & Schairer, C. Family history of breast cancer as a risk factor for ovarian cancer in a prospective study. Cancer 107, 1075–1083 (2006).
42. Olson, S.H. & Kurtz, R.C. Epidemiology of pancreatic cancer and the role of family history. J Surg Oncol 107, 1–7 (2013).
43. Molina-Montes, E., et al. Risk of pancreatic cancer associated with family history of cancer and other medical conditions by accounting for smoking among relatives. Int J Epidemiol 47, 473–483 (2018).
44. Schuz, J., et al. European Code against Cancer 4th Edition: 12 ways to reduce your cancer risk. Cancer Epidemiol 39 Suppl 1, S1–10 (2015).
45. Demler, O.V., Paynter, N.P. & Cook, N.R. Tests of calibration and goodness-of-fit in the survival setting. Stat Med 34, 1659–1680 (2015).
46. Gerds, T.A., Andersen, P.K. & Kattan, M.W. Calibration plots for risk prediction models in the presence of competing risks. Stat Med 33, 3191–3203 (2014).
47. Benichou, J. & Gail, M.H. Estimates of absolute cause-specific risk in cohort studies. Biometrics 46, 813–826 (1990).
48. Ozenne, B., Lyngholm Sørensen, A., Scheike, T., Torp-Pedersen, C. & Gerds, T.A. riskRegression: Predicting the Risk of an Event using Cox Regression Models. The R Journal 9, 440–460 (2017).
49. Sjolander, A. & Vansteelandt, S. Doubly robust estimation of attributable fractions in survival analysis. Stat Methods Med Res 26, 948–969 (2017).
50. Dahlqwist, E., Zetterqvist, J., Pawitan, Y. & Sjolander, A. Model-based estimation of the attributable fraction for cross-sectional, case-control and cohort studies using the R package AF. Eur J Epidemiol 31, 575–582 (2016).

Posted January 29, 2020.

Subject Area

Genetics
