Abstract
Whether the epidemiological association of amyloid beta (Aβ) and tau pathology with Alzheimer’s disease (AD) is causal remains unclear. The recent failures to demonstrate the efficacy of several amyloid beta-modifying drugs suggest that the observed association may not be causal. These failures have also prompted efforts to develop tau-directed treatments, whose efficacy remains uncertain. Herein, we conducted a two-sample Mendelian randomization analysis to determine whether the relationship between the cerebrospinal fluid (CSF) biomarkers for amyloid and tau pathology and the risk of AD is causal. We used the summary statistics of a genome-wide association study (GWAS) for CSF biomarkers (Aβ1-42, phosphorylated tau 181 [p-tau], and total tau [t-tau]) in 3,146 individuals and for late-onset AD (LOAD) in 21,982 LOAD cases and 41,944 cognitively normal controls. We tested the association between the change in the genetically predicted CSF biomarkers and LOAD risk. We found a modest decrease in the LOAD risk per one standard deviation (SD) increase in the genetically predicted CSF Aβ (odds ratio [OR], 0.63; 95% confidence interval [CI], 0.38-0.87; P = 0.02). In contrast, we observed a significant increase in the LOAD risk per one SD increase in the genetically predicted CSF p-tau (OR, 2.37; 95% CI, 1.46-3.28; P = 1.09×10−5). However, we observed no evidence of a causal association of CSF t-tau with the LOAD risk (OR, 1.15; 95% CI, 0.85-1.45; P = 0.29). Our findings need to be validated in future studies with more genetic variants identified in larger GWASs for CSF biomarkers.
Introduction
Alzheimer’s disease (AD), a leading cause of dementia, is the leading source of morbidity and mortality in older adults. One in every 85 individuals is expected to develop AD, which means that delaying the onset by one year can reduce the number of patients with AD worldwide by up to 9 million by 20201. Given that eight times as many individuals have preclinical AD and are at risk of progression2, the development of disease-modifying therapies is urgently required. Amyloid beta (Aβ) peptides are derived from the transmembrane amyloid precursor protein3 and tau is a microtubule-associated protein4. Decades of research have accumulated evidence on the pathophysiology of Aβ and tau proteins, which independently form plaques and tangles and drive normally functioning neurons into the disabled state seen in AD5. Because AD is understood as the result of abnormal proteins, namely extracellular amyloid plaques and intraneuronal neurofibrillary tau tangles, two-thirds of the novel treatment pipelines aim at disease-modifying therapies, 90% of which are anti-amyloid and anti-tau protein agents6.
However, numerous trials of novel therapies that target amyloid plaques to modify disease progression have recently failed. These failures raise reasonable doubt about the role of Aβ in the pathophysiology of AD, a role that requires careful elaboration7. One possible explanation for the failure of clinical trials targeting the amyloid plaques is that the intervention is performed too late in the disease course to reverse the pathology in the trial participants8–10. Alternatively, the poor efficacy of amyloid-targeting therapy may be due to amyloid being a downstream result, rather than a cause, of AD11, 12. With these recent failures, tau protein has gained more attention as a target for disease-modifying therapies. Although previous animal studies showed that the suppression of tau gene expression was protective against cognitive impairment, this effect required accompanying regulation of Aβ13. In addition, recent studies of the association between premortem cognitive function and AD neuropathology, including tau protein, have shown inconclusive results14, 15. These results also cast doubt on the role of tau pathology in AD16. Thus, further research is still required to determine whether Aβ or tau proteins are causal to AD or are surrogate markers for AD. This issue is crucial for the successful development of disease-modifying drugs.
One promising approach for investigating causality is Mendelian randomization (MR), which uses genetic variants as instrumental variables (IVs)17. The association between the genetic variants and the disease outcome can provide evidence of causation while, subject to certain assumptions, minimizing confounding by factors such as age, education, or other environmental exposures. This method may be useful to elucidate the causal relationship of Aβ or tau protein with AD while limiting the influence of confounding factors and reverse causality18–20.
Herein, we hypothesized that Aβ or tau protein has a causal effect on the risk for late-onset AD (LOAD), and tested the hypothesis using two-sample MR (TSMR) methods with summary statistics from large-scale genome-wide association studies (GWASs) of cerebrospinal fluid (CSF) biomarkers (Aβ1-42 [Aβ], phosphorylated tau 181 [p-tau], and total tau [t-tau]) and late-onset AD21, 22.
Materials and methods
Exposure
In this study, we used three CSF biomarkers for AD, Aβ, p-tau, and t-tau, as exposures for investigating the causal relationship with the outcome of interest. Meta-analyzed GWAS summary statistics of these biomarkers were obtained from 3,146 individuals in nine different studies (Knight ADRC, the Charles F. and Joanne Knight Alzheimer’s Disease Research Center; ADNI1, Alzheimer’s Disease Neuroimaging Initiative phase 1; ADNI2, Alzheimer’s Disease Neuroimaging Initiative phase 2; BIOCARD, Predictors of Cognitive Decline Among Normal Individuals; HB, Saarland University in Homburg/Saar, Germany; MAYO, Mayo Clinic; SWEDEN, Skåne University Hospital; UPENN, Perelman School of Medicine at the University of Pennsylvania; UW, the University of Washington)21. The sample size of these GWASs is the largest at present with respect to Aβ, p-tau, and t-tau collected from CSF. The effect per single-nucleotide polymorphism (SNP) in the GWAS summary statistics was defined as a standardized beta coefficient since each phenotype was converted using a log-transformation to follow the normal distribution.
Outcome
Our outcome of interest was LOAD, defined as AD with an onset at 65 years of age or older. We utilized the summary-level data from the stage 1 meta-analysis of the GWASs for LOAD in the National Institute on Aging Genetics of Alzheimer’s Disease Data Storage Site22. The meta-analysis result was obtained from four consortia (The Alzheimer Disease Genetics Consortium; The European Alzheimer’s disease Initiative; The Cohorts for Heart and Aging Research in Genomic Epidemiology Consortium; and The Genetic and Environmental Risk in AD/Defining Genetic, Polygenic and Environmental Risk for Alzheimer’s Disease Consortium). It consisted of 46 case-control studies that included 63,926 individuals of European ancestry (21,982 LOAD cases and 41,944 cognitively normal controls).
Selection of instruments for Mendelian randomization
We performed the following procedures to select appropriate genetic variants that preferentially satisfy three IV assumptions of the MR analysis23.
First, we selected the top SNPs with a relaxed threshold (P < 1 × 10−5), which has been used in recent MR analyses when the GWAS for the exposure trait yielded only a small number of genome-wide significant SNPs24–27. The sample size of the data used in the present study is the largest for CSF biomarkers to date21. However, CSF biomarkers are expensive to measure, are acquired through an invasive procedure, and require skilled professionals, which makes it difficult to gather a sample large enough to identify many independent SNPs passing the genome-wide significance level (P < 5 × 10−8)28. We therefore relaxed the threshold (P < 1 × 10−5) to compensate for the small sample size.
Second, we selected the independent genetic variants among those that passed the relaxed threshold, using the cutoff of linkage disequilibrium (LD) value (r2 < 0.001) to ensure that the IVs for exposure were independent29. The LD between the SNPs was calculated based on the European individuals from the 1000 Genomes Project. If a certain SNP was not available in the summary statistics of the outcome, we substituted that SNP with its LD proxy SNP having a high correlation coefficient (r2 ≥ 0.8) based on the European ancestry using the LDlink (https://ldlink.nci.nih.gov/). If such LD proxy SNP was not found, the SNP was excluded from the IV set.
Third, we eliminated the SNPs that had ambiguous alleles from the IV set when the alleles in the exposure and the outcome were not identical. For example, we excluded an SNP if the effect allele and the non-effect allele of the exposure and outcome were T/C, and T/G, respectively29.
Fourth, to ensure that there was no horizontal pleiotropy among the IVs, we conducted an MR-Pleiotropy Residual Sum and Outlier (MR-PRESSO) test, which detects pleiotropic variants among the exposure-associated variants30. Because SNPs with a direct effect on LOAD exert a direct pleiotropic effect on the outcome of interest, we excluded rs769449 in the apolipoprotein E (APOE) region, which is highly associated with LOAD, from the set of IVs31, 32. The APOE region has been reported to have multiple pleiotropic effects in many previous studies33. Including pleiotropic SNPs in the instruments, whether outliers detected by MR-PRESSO or variants in the APOE region, may bias the MR estimate positively or negatively through horizontal pleiotropy and yield an inaccurate causal relationship34. Therefore, we also excluded the outliers detected by MR-PRESSO. Subsequently, to confirm the absence of horizontal pleiotropy, we performed the MR-Egger intercept test with the intercept unconstrained35. The intercept of the MR-Egger regression represents a statistical estimate of the directional pleiotropic effect, which can be a confounding factor in MR. The selected genetic variants are listed in Supplementary Tables 1-3.
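The threshold, exclusion, and allele-matching steps described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the pipeline used in the study: the `SnpStat` record layout and the function names are hypothetical, LD clumping and LD-proxy lookup are not implemented (they require a reference panel), and strand flips are not handled.

```python
from dataclasses import dataclass

P_THRESHOLD = 1e-5  # relaxed exposure threshold stated in the text


@dataclass(frozen=True)
class SnpStat:
    """One row of GWAS summary statistics (hypothetical layout)."""
    rsid: str
    effect_allele: str
    other_allele: str
    beta: float
    se: float
    pval: float


def align_outcome_beta(exposure, outcome):
    """Return the outcome beta on the exposure's effect allele, or None.

    None means the allele pairs do not match (e.g. T/C versus T/G),
    in which case the SNP is dropped, as described in the text.
    """
    exp_pair = (exposure.effect_allele, exposure.other_allele)
    out_pair = (outcome.effect_allele, outcome.other_allele)
    if exp_pair == out_pair:
        return outcome.beta
    if exp_pair == out_pair[::-1]:
        return -outcome.beta  # same alleles, opposite coding: flip the sign
    return None


def select_instruments(exposure_rows, outcome_by_rsid, excluded=frozenset()):
    """Apply the P-value filter, manual exclusions, and allele alignment."""
    instruments = []
    for row in exposure_rows:
        if row.pval >= P_THRESHOLD or row.rsid in excluded:
            continue
        outcome = outcome_by_rsid.get(row.rsid)
        if outcome is None:
            continue  # the text substitutes an LD proxy (r2 >= 0.8) here
        beta_out = align_outcome_beta(row, outcome)
        if beta_out is None:
            continue
        instruments.append((row.rsid, row.beta, row.se, beta_out, outcome.se))
    return instruments
```

Calling `select_instruments(rows, outcome_lookup, excluded={"rs769449"})` would mirror the manual removal of the APOE variant; each returned tuple holds the rsID, the exposure beta and SE, and the aligned outcome beta and SE.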
Two-sample Mendelian randomization method
TSMR utilizes the GWAS summary statistics obtained from two large sample sets, allowing the use of more robustly associated genetic instruments than one-sample MR17. TSMR in the present study was performed using the TwoSampleMR R package (version 0.4.22) from the MR-Base platform29. To confirm that the estimates of the causal effect of the exposures on the risk of LOAD are credible, we used several methods, including the inverse-variance weighted (IVW), maximum likelihood, weighted median, and MR-Egger regression methods. These methods differ from each other in terms of sensitivity to heterogeneity, bias, and power. We selected the IVW method as our primary MR method because it provides reliable results in the presence of heterogeneity in an MR analysis and is appropriate when using a large number of SNPs36. The standard error (SE) of the IVW effect was estimated using a multiplicative random effects model. We performed a leave-one-out analysis that iteratively estimates the causal effect from all but one SNP at a time, using the IVW method, to test whether the results were driven by any particular SNP.
The maximum likelihood method is a likelihood-based method that assumes a bivariate normal distribution of the exposure and outcome, which accounts for the correlation between two different GWAS summary statistics better than does the IVW23. We also used the weighted median and MR-Egger regression. Since these two methods provide robust causal estimates when some of the MR assumptions are violated, they were used as sensitivity analyses35, 37.
We used a forest plot to visualize the heterogeneity between the instruments due to horizontal pleiotropy and the contribution of each instrument to the overall estimate29. A funnel plot of the precision (1/SE) against the Wald ratio for each SNP was used to evaluate the bias due to invalid instruments. Overall symmetry in the funnel plot indicates the lack of severe heterogeneity and of bias driven by directional horizontal pleiotropy that violates the MR assumptions38.
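To make the IVW estimate and the leave-one-out analysis concrete, the following is a minimal sketch, not the TwoSampleMR implementation. It assumes first-order weights (the outcome SE only) and a common formulation of the multiplicative random effects model in which the fixed-effect variance is scaled by Q/(J − 1), floored at 1; the package may treat under-dispersion differently. The function names are illustrative.

```python
import math


def ivw(bx, by, se_y):
    """Inverse-variance weighted estimate from per-SNP summary statistics.

    bx: SNP-exposure betas, by: SNP-outcome betas, se_y: SEs of by.
    Returns (beta, se, q), where q is Cochran's Q across the Wald ratios.
    """
    weights = [(x / s) ** 2 for x, s in zip(bx, se_y)]  # 1 / var(Wald ratio)
    ratios = [y / x for x, y in zip(bx, by)]            # per-SNP Wald ratios
    total = sum(weights)
    beta = sum(w * r for w, r in zip(weights, ratios)) / total
    q = sum(w * (r - beta) ** 2 for w, r in zip(weights, ratios))
    dof = len(bx) - 1
    # Assumption: overdispersion factor floored at 1 (never shrink the SE).
    phi = max(1.0, q / dof) if dof > 0 else 1.0
    se = math.sqrt(phi / total)
    return beta, se, q


def leave_one_out(bx, by, se_y):
    """IVW estimate recomputed with each SNP removed in turn."""
    estimates = []
    for i in range(len(bx)):
        keep = [j for j in range(len(bx)) if j != i]
        beta, _, _ = ivw([bx[j] for j in keep],
                         [by[j] for j in keep],
                         [se_y[j] for j in keep])
        estimates.append(beta)
    return estimates
```

The returned `beta` is on the log-odds scale per SD of the exposure, so `math.exp(beta)` gives the odds ratio. If one leave-one-out estimate departs sharply from the rest, the omitted SNP is driving the result.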
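The MR-Egger regression used as a sensitivity analysis can be sketched as a weighted linear regression of the SNP-outcome effects on the SNP-exposure effects with a free intercept. The code below is an illustrative sketch rather than the package's code: it assumes weights of 1/SE² of the outcome effect, orients every SNP so that its exposure effect is positive, and floors the residual dispersion at 1 when computing SEs.

```python
import math


def mr_egger(bx, by, se_y):
    """Weighted regression of by on |bx| with an unconstrained intercept.

    Returns (intercept, se_intercept, slope, se_slope). The intercept
    estimates average directional pleiotropy; the slope is the causal
    estimate adjusted for it.
    """
    n = len(bx)
    if n < 3:
        raise ValueError("MR-Egger needs at least three SNPs")
    # Orient each SNP so that the exposure effect is positive.
    xs = [abs(x) for x in bx]
    ys = [y if x >= 0 else -y for x, y in zip(bx, by)]
    ws = [1.0 / s ** 2 for s in se_y]
    sw = sum(ws)
    mx = sum(w * x for w, x in zip(ws, xs)) / sw
    my = sum(w * y for w, y in zip(ws, ys)) / sw
    sxx = sum(w * (x - mx) ** 2 for w, x in zip(ws, xs))
    if sxx == 0:
        raise ValueError("exposure effects must not all be equal")
    sxy = sum(w * (x - mx) * (y - my) for w, x, y in zip(ws, xs, ys))
    slope = sxy / sxx
    intercept = my - slope * mx
    rss = sum(w * (y - intercept - slope * x) ** 2
              for w, x, y in zip(ws, xs, ys))
    phi = max(1.0, rss / (n - 2))  # assumption: dispersion floored at 1
    se_slope = math.sqrt(phi / sxx)
    se_intercept = math.sqrt(phi * (1.0 / sw + mx ** 2 / sxx))
    return intercept, se_intercept, slope, se_slope
```

An intercept close to zero relative to its SE is what the intercept test reads as no evidence of directional pleiotropy. Because a second parameter is estimated, the slope is typically less precise than the IVW estimate.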
Power calculation
We calculated the statistical power of the MR using an online tool (https://sb452.shinyapps.io/power/)39 based on the proportion of variance in the exposure (R2) explained by genetic instruments, true causal effect of the exposure on the outcome, sample size, and ratio of cases to controls of the outcome. R2 was obtained from the MR-Steiger directionality test40. We estimated the true causal effect based on the observed odds ratios (ORs) between the CSF biomarkers and the risk of LOAD.
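For orientation, the power of an MR analysis with a binary outcome is often approximated from the same four inputs listed above. The sketch below uses a normal approximation in which the variance of the causal log-odds estimate is taken as 1 / (N · R² · p · (1 − p)), with p the proportion of cases. Whether the online tool uses exactly this expression is an assumption here, and the R² value in the usage note is hypothetical, not the value from the study.

```python
import math
from statistics import NormalDist


def mr_power_binary(n, case_fraction, r2, odds_ratio, alpha=0.05):
    """Approximate power of a two-sided MR test for a binary outcome.

    n: total outcome sample size; case_fraction: cases / n;
    r2: variance in the exposure explained by the instruments;
    odds_ratio: assumed true causal OR per SD of the exposure.
    """
    effect = abs(math.log(odds_ratio))
    ncp = effect * math.sqrt(n * r2 * case_fraction * (1 - case_fraction))
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    return NormalDist().cdf(ncp - z_crit)
```

For example, `mr_power_binary(63926, 21982 / 63926, 0.05, 2.37)` uses the case and control counts from the outcome GWAS together with an illustrative R² of 0.05. Power rises with the sample size, R², and the distance of the OR from 1.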
Results
In our main analysis, we excluded the outliers using MR-PRESSO to select suitable instruments that satisfied one of the core IV assumptions, that is, no horizontal pleiotropy. The number of outlier SNPs predicted to have pleiotropy by MR-PRESSO was one (out of 15 top SNPs) for CSF Aβ and thirteen (out of 20 top SNPs) for CSF p-tau. After excluding the outliers, the MR-Egger intercept test showed no evidence of horizontal pleiotropy for either Aβ or p-tau (Aβ: intercept = −0.027, SE = 0.015, P = 1; p-tau: intercept = −0.004, SE = 0.044, P = 0.93) (Supplementary Table 4).
CSF Aβ showed evidence for a causal effect on the risk for LOAD (IVW OR, 0.63 for LOAD per 1 standard deviation (SD) increase in the genetically predicted CSF Aβ; 95% confidence interval [CI], 0.38-0.87; P = 0.02) (Table 1 and Fig. 1A). We found stronger evidence of a causal effect of CSF p-tau on the risk for LOAD (IVW OR, 2.37 for LOAD per 1 SD increase in the genetically predicted CSF p-tau; 95% CI, 1.46-3.28; P = 1.09×10−5) (Table 1 and Fig. 1C). In the sensitivity analyses, the causal effect of CSF p-tau on the risk for LOAD was significant in both the maximum likelihood and the weighted median methods (maximum likelihood P = 2.98×10−5 and weighted median P = 2.51×10−4), while the causal association between CSF Aβ and the risk for LOAD was significant in the maximum likelihood method (P = 0.03). Neither CSF Aβ nor CSF p-tau was significant in the MR-Egger regression. Although the MR-Egger effect estimate for CSF p-tau was similar to those derived by the IVW, maximum likelihood, and weighted median methods, the MR-Egger estimate for CSF Aβ differed. MR-Egger was shown to yield minimally biased estimates regardless of the pleiotropic SNPs in the instruments35. However, in our MR analysis, the potential pleiotropic SNPs were detected as outliers using MR-PRESSO and all of them were excluded, which may suggest that the IVW has greater power and derives a more precise estimate than the MR-Egger regression.
While the effect size for CSF Aβ on the risk for LOAD yielded patterns indicating a moderate heterogeneity among the instruments, the instruments of CSF p-tau showed little heterogeneity (Fig. 1B and Fig. 1D). The Cochran Q statistics, P value, and heterogeneity (I2 [%]) were 13.41, 0.42, and 3 for CSF Aβ, and 1.58, 0.95, and 0 for CSF p-tau, respectively (Supplementary Table 4). The leave-one-out analysis confirmed that no single SNP was exclusively responsible for the associations with the risk for LOAD (Fig. 2A and Fig. 2C). In the funnel plot, each dot shows the precision (1/SE) against the Wald ratio for one SNP and the vertical lines represent the MR estimates combined across the instruments (Fig. 2B and Fig. 2D). We observed an overall symmetry in the funnel plot, which indicates that our results were unlikely to be biased by invalid instruments.
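The reported I2 values follow from the Q statistics through the standard relation I2 = max(0, (Q − df) / Q) × 100. The short sketch below reproduces them, assuming df equals the number of retained instruments minus one: 14 instruments for CSF Aβ (15 top SNPs minus one outlier) and 7 for CSF p-tau (20 minus thirteen). These counts are inferred from the text rather than taken from the supplementary tables.

```python
def i_squared(q, dof):
    """Percentage of variation across instruments attributed to heterogeneity."""
    if q <= 0:
        return 0.0
    return max(0.0, (q - dof) / q) * 100.0


# Q statistics from the text; degrees of freedom are an inferred assumption.
abeta_i2 = i_squared(13.41, 14 - 1)  # CSF Abeta
ptau_i2 = i_squared(1.58, 7 - 1)     # CSF p-tau: Q < df, so I2 is floored at 0
```

Under these assumptions the values round to 3% and 0%, matching the figures reported above.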
We also investigated the association between t-tau and the risk for LOAD. In contrast to CSF p-tau, no causal evidence was found for the effect of CSF t-tau on the risk for LOAD (IVW OR, 1.15 for LOAD per 1 SD increase in the genetically predicted CSF t-tau; 95% CI, 0.85-1.45; P = 0.29) (Table 1, Fig. 1E, Fig. 1F, Fig. 2E, and Fig. 2F). There was no outlier detected by MR-PRESSO among the instruments of CSF t-tau. We confirmed that there was no evidence for horizontal pleiotropy (Intercept = −0.020, SE = 0.014, P = 0.18) and little heterogeneity between the IVs (Q = 18.45, P = 0.36, I2 [%] = 8) (Supplementary Table 4).
Given the observed ORs between the measured CSF biomarkers and the risk for LOAD, our MR analysis showed sufficient statistical power (> 90%) to detect the causal effects of the CSF biomarkers on the risk for LOAD with a level of significance of 0.05. Supplementary Table 5 presents the estimates of the statistical power for our MR analysis.
Discussion
Using TSMR with genetic instruments from large-scale GWASs, we investigated the potential causal relationship between CSF biomarkers and the risk for LOAD. In this MR study, the genetic association of the CSF Aβ and p-tau instruments supported a causal effect of CSF Aβ and p-tau on the risk for LOAD. In contrast, we found no significant evidence of a causal relationship between CSF t-tau and the risk for LOAD. Our results are consistent with those of recent reports41–43.
Although Aβ, p-tau, and t-tau in the CSF have been reported to be useful as disease progression markers41, 44, there is still little evidence for their causal relationship with LOAD in randomized clinical trials (RCTs)45, 46. Recent RCTs on the elimination of the accumulated Aβ or tau proteins could not provide solid evidence for improvement of the symptoms of LOAD47–49. While clinical trials with small sample sizes have shown that eliminating the Aβ elements led to symptomatic improvement50, larger studies have failed to establish consistent results47, 51, 52. The agents reducing tau phosphorylation showed promising benefits in pilot clinical studies53, 54 but failed to show significant improvements in a cohort study54; tau aggregation inhibitors showed a similar pattern55. Another approach to proving causality for LOAD would be to induce the pathologic accumulation of Aβ and tau proteins in RCTs, but such an intervention in humans is not permitted for ethical reasons. Instead, the development of AD phenotypes has been attempted in numerous animal models with accumulating Aβ56, 57 and tau proteins58, but these models still have various limitations. Transgenic animal models generally represent familial AD rather than sporadic LOAD because they target a specific pathologic substance; therefore, they cannot provide a full explanation of LOAD59. In addition, animal models cannot represent the complex symptomatology of dementia that presents in humans.
In consideration of these perspectives, the principles of MR can be applied to provide clues for the causality of these biomarkers in the etiology of LOAD60. This approach, which is conceptually similar to that of RCTs61, is based on Mendel’s law of segregation: genetic variants are randomly allocated at meiosis and are consequently independent of many confounding factors and of reverse causation. Thus, an MR analysis could enable the inference of the risk for LOAD driven through the genetically determined risk of amyloid accumulation and tau pathology. In our study, we found evidence supporting the potential causal relationships between Aβ and p-tau proteins in the CSF and the risk for LOAD using MR with genetic instruments selected from large-scale GWASs.
The causal estimates in our analysis were based on the largest GWAS to date, which may increase the precision of the estimates. We estimated that the odds of LOAD were multiplied by 0.63 per 1 SD increase in the CSF Aβ and by 2.37 per 1 SD increase in the CSF p-tau. These directions of association are consistent with those in previous reports41, 62. Markedly increased levels of p-tau proteins and decreased levels of Aβ in the CSF are a characteristic finding in LOAD41. Aβ accumulation in the neuronal plaques and its binding to various receptors have been known as a hallmark of LOAD. Aβ binding to receptors has been understood as a process leading to neuronal toxicity, inducing mitochondrial dysfunction and oxidative stress63, 64. The pathologic process of tau in LOAD consists of the development of phosphorylated pre-tangles and formation of the neuropil threads65. After a process of hyperphosphorylation, acetylation, N-glycosylation, and truncation, tau forms the tangles in LOAD42. The causal relationship of Aβ and p-tau observed in our MR analysis supports the view that Aβ and p-tau may play important roles in the pathophysiology of LOAD. Further studies investigating the biological mechanisms are needed.
T-tau in the CSF and the risk for LOAD did not show a causal relationship, which is consistent with the findings of previous studies66, 67. While the CSF level of p-tau increases specifically in LOAD, the CSF level of t-tau can increase in various conditions of neurodegeneration, including LOAD and other brain disorders68. Our result may support a recent proposal emphasizing the tau hyperphosphorylation in AD versus the excessive production of tau proteins42.
The measured CSF biomarkers in AD reflect both the production and the clearance of these markers at a given time. In contrast, neuroimages represent the neuropathologic load or damage accumulated over time directly in the brain69. Thus, imaging GWASs that use amyloid or tau deposition in the brain measured by positron emission tomography (PET)70 as phenotypes could provide additional information on the association between these biomarkers and the risk for LOAD. However, the sample size of the current imaging genetic studies for these biomarkers is limited. Further studies with larger samples of genetic and imaging data could be helpful.
This study has several limitations. First, our causal estimates may be affected by several factors: horizontal pleiotropy that was not detected by the applied MR sensitivity analysis methods71 and the possibility of misclassified LOAD cases72, 73. Unlike the balanced or positive bias induced by horizontal pleiotropy, misclassified cases in the outcome may bias our results toward the null. However, the estimates were statistically significant and consistent across the various methods applied in our analysis. Second, our GWAS data included samples of European ancestry, which may limit the generalization of our findings. Finally, even though we employed the summary statistics from the largest GWASs on Aβ and tau proteins to date21, we applied a relaxed threshold to include more IVs, as done in other psychiatric MR studies24–27. Despite using instruments with a less stringent threshold, which may lead to null findings, our power analysis of the MR showed a statistical power greater than 90% and our analysis derived significant causal estimates.
In conclusion, this MR analysis suggested a possible causal relationship of the CSF Aβ and p-tau with the risk for LOAD. In addition, our findings showed no evidence of a causal association between t-tau and the risk for LOAD. Our results suggested that the etiology of LOAD involves multiple biological processes, including the amyloid and tau proteins in the AD pathophysiology. This complex nature of LOAD could partly explain the recent multiple failures of clinical trials of anti-amyloid monotherapy47, 51, 52, 74, 75. Further MR studies of multiple candidate biomarkers could help to find appropriate drug targets for LOAD, and larger GWAS data with sufficient numbers of IVs are necessary to validate the causal effect of CSF Aβ and p-tau on the risk for LOAD.
Acknowledgments
The authors thank the researchers at the Washington University School of Medicine for providing the summary statistics of GWAS for CSF biomarkers.
This work was supported by a grant from the National Research Foundation (NRF) funded by the Ministry of Science and ICT (MSIT) [grant number NRF-2019R1A2C4070496 to HH Won; NRF-2018R1C1B6001708 to W Myung]. This work was also supported by grants from the National Institutes of Health [grant number R01LM012535, R03AG054936, and R03AG063250 to K Nho].