Abstract
Background Vitamin D has been linked to a variety of diseases in observational studies. However, the causal role of vitamin D in human complex traits and diseases remains unclear and under much debate. In this study, we used Mendelian randomization (MR) with genetic variants (as instrumental variables) to examine the causal role of vitamin D on various human complex traits and diseases.
Methods We performed MR analysis using genome-wide significant 25(OH)D SNPs from the latest vitamin D GWAS and 45 large-scale meta-GWASs on various outcomes (average sample size=137 112 per GWAS) to determine the causal role of 25(OH)D. We applied MR-Egger regression, MR-PRESSO, and the weighted median approach to estimate the causal effect of 25(OH)D while examining and controlling for potential biases from horizontal pleiotropy.
Results We found limited evidence in support of causal effects of 25(OH)D on obesity-related traits, autoimmune diseases, cardiometabolic traits, neurological and psychiatric disorders. Sensitivity analysis using additional vitamin D instruments (association P<10−5) did not show any additional evidence for a causal effect of vitamin D. Notably, we identified horizontal pleiotropy for 20% of the trait pairs analyzed here.
Conclusions Despite the largely augmented sample size and substantially improved statistical power with the most recent, largest vitamin D GWAS, our MR analysis did not convincingly support a causal effect of circulating 25(OH)D on complex traits and diseases examined here. Our results can inform ongoing and future randomized clinical trials of vitamin D supplementation. Future studies are warranted to prioritize the most promising target diseases for vitamin D intervention.
Introduction
Vitamin D and its major circulating form, 25-hydroxyvitamin D [25(OH)D], is a fat-soluble, steroid pre-hormone that plays an essential role in human health. Vitamin D deficiency has long been observed to be associated with rickets and osteomalacia.1 In addition, a variety of common diseases such as cancer, autoimmune inflammatory diseases, cardiovascular conditions and diabetes, have been reported to be linked with vitamin D in observational studies.2–6 However, measurement error, confounding or reverse causality hinders causal interpretation of the results from observational studies. Although the effect of circulating vitamin D on disease risk can be demonstrated by traditional randomized controlled trials (RCTs), large-scale RCTs of vitamin D supplementation are not readily available due to their high cost and long duration.7,8 A recently completed RCT by VITAL (ClinicalTrials.gov: NCT01169259) involving a total of 25 871 participants with a median follow-up period of 5.3 years did not find lowered incidence of invasive cancer or cardiovascular events for those who took vitamin D3 at a dose of 2000 IU per day compared with the placebo group.9 In addition, there are ongoing RCTs aiming at other diseases and conditions, such as depression (DepFuD; ClinicalTrials.gov: NCT02521012) and cognitive change (ClinicalTrials.gov: NCT03613116). To better inform the field on the chance of success for such vitamin D trials, a better understanding of the causal role of vitamin D in these diseases and traits, preferably through less costly observational studies, is necessary.
The genetic basis of circulating 25(OH)D has been demonstrated by several large-scale genome-wide association studies (GWASs), which have identified associations of GC, NADSYN1/DHCR7, CYP2R1, CYP24A1, SEC23A, AMDHD1 with serum levels of vitamin D.10,11 These genetic discoveries in 25(OH)D may help disentangle the causal relationships between vitamin D and other traits through Mendelian randomization (MR), an approach that uses genetic variants as instrumental variables (IVs) for assessing the causal effect of an exposure on an outcome.12 Under certain assumptions – that the IVs are strongly associated with the exposure, are not associated with any confounders of the exposure-outcome relationship, and do not affect the outcome via pathways other than the exposure – an unbiased estimate of the causal effect can be obtained using the observed IV-exposure and IV-outcome associations.13
To date, a large number of MR studies examining the effect of circulating vitamin D on various diseases have been conducted. However, the causal role of vitamin D beyond its importance for bone health remains unclear and is under much debate, as nearly all outcomes assessed by MR studies of vitamin D have been negative. The only replicated results from MR studies are that higher vitamin D levels decrease the risk of adult/pediatric multiple sclerosis14–16 and Alzheimer’s disease;17,18 whereas findings from MR studies for blood pressure, ovarian cancer, and all-cause mortality are yet to be replicated.19–22 Overall, results from MR studies for vitamin D are not consistent with conventional observational studies, which have associated vitamin D with a variety of human diseases and traits. It is also worth noting that almost all hitherto available MR studies for vitamin D are based on genetic variants (SNPs rs2282679, rs12785878, rs6013897 and rs10741657) identified by the SUNLIGHT meta-GWAS published in 2010.10 The recently updated vitamin D meta-GWAS identified two new genetic loci associated with circulating 25(OH)D level and improved SNP-vitamin D association estimates with larger sample size, which provided an exceptional opportunity to re-evaluate the causal effect of 25(OH)D with improved instrument strength and accuracy of estimation. Moreover, the rapidly accumulating publicly available GWAS summary statistics for various diseases and traits allowed us to evaluate the effect of vitamin D on a wide range of outcomes using two-sample MR.
Here, we conducted a two-sample MR analysis (where IV-exposure and IV-outcome associations are estimated in two samples) to examine the effect of 25(OH)D on complex human traits and diseases using 45 GWASs. Five genetic variants associated with plasma 25(OH)D concentration were used as IVs. Summary statistics for the IV-exposure associations were extracted from the largest vitamin D GWAS involving 73 699 individuals. Summary statistics for the IV-outcome associations were extracted from publicly available GWASs of 45 traits involving 6 170 075 individuals (average 137 112 individuals per GWAS).
Methods
Data for IV-exposure
We retrieved summary data for the associations between six SNPs and circulating 25(OH)D concentration from the SUNLIGHT meta-GWAS involving 79 366 discovery samples and 42 757 replication samples of European ancestry. Genome-wide analyses were performed within each cohort following a uniform plan. Specifically, an additive genetic model using linear regression on natural-log transformed 25(OH)D was fitted to each SNP, and a fixed-effects inverse variance weighted meta-analysis across the contributing cohorts was performed, with control for population structure within each cohort, and consistent quality control thresholds across cohorts (minor allele frequency (MAF)>0.05, imputation info score>0.8, Hardy-Weinberg equilibrium (HWE) P>1×10−6). A minimum of 10 000 individuals was required to contribute to each reported SNP-phenotype association.11
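As an illustration of the fixed-effects inverse variance weighted meta-analysis described above, the following minimal sketch pools per-cohort SNP effect estimates. The function name and the per-cohort values are hypothetical and are not taken from the SUNLIGHT data; the sketch only shows the standard pooling formula.

```python
import math


def fixed_effects_meta(betas, ses):
    """Pool per-cohort effect estimates with inverse-variance weights.

    betas: per-cohort SNP effect estimates (e.g. on natural-log 25(OH)D)
    ses:   per-cohort standard errors of those estimates
    Returns the pooled estimate and its standard error.
    """
    if not betas or len(betas) != len(ses):
        raise ValueError("betas and ses must be non-empty and equally long")
    # Each cohort is weighted by the inverse of its sampling variance.
    weights = [1.0 / se ** 2 for se in ses]
    total_weight = sum(weights)
    pooled_beta = sum(w * b for w, b in zip(weights, betas)) / total_weight
    pooled_se = math.sqrt(1.0 / total_weight)
    return pooled_beta, pooled_se


# Hypothetical estimates for one SNP in three cohorts (illustration only).
cohort_betas = [0.10, 0.08, 0.12]
cohort_ses = [0.02, 0.04, 0.02]
pooled_beta, pooled_se = fixed_effects_meta(cohort_betas, cohort_ses)
print(round(pooled_beta, 4), round(pooled_se, 4))
```

More precise cohorts (smaller standard errors) dominate the pooled estimate, which is why the less precise second cohort pulls the result only slightly toward its own value.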
Among the six SNPs, four were previously identified as being robustly associated with vitamin D (rs3755967 at GC, rs12785878 at NADSYN1/DHCR7, rs10741657 at CYP2R1, rs17216707 at CYP24A1); and the other two were newly identified and replicated (rs10745742 at AMDHD1, rs8018720 at SEC23A). We excluded strand ambiguous SNPs (A/T and G/C SNPs) from the analysis to harmonize between exposure and outcome GWASs and used five SNPs as IVs. In addition to the leading index SNPs, we further performed P-value informed pruning of SNPs across the genome based on their linkage disequilibrium (LD) patterns (r2<0.0001 with clumping window size of 1Mb). We selected all LD-clumped SNPs associated with 25(OH)D concentration at P<1×10−5. This resulted in an additional 21 independent index SNPs which may increase the power of our analysis by incorporating more instruments. Excluding strand ambiguous SNPs from this extended SNP set yielded a total of 19 SNPs as IVs. Details of the IVs are presented in Supplementary Table 1.
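The exclusion of strand ambiguous SNPs can be sketched as follows. This is a minimal illustration, assuming each SNP record carries its two alleles; the record layout and the SNP identifiers are hypothetical and do not correspond to the instruments listed in Supplementary Table 1.

```python
# Allele pairs that read the same on both DNA strands (A/T and G/C),
# so the strand cannot be inferred from the alleles alone.
AMBIGUOUS_PAIRS = {frozenset(("A", "T")), frozenset(("C", "G"))}


def is_strand_ambiguous(effect_allele, other_allele):
    """Return True for A/T or G/C SNPs, regardless of allele order or case."""
    pair = frozenset((effect_allele.upper(), other_allele.upper()))
    return pair in AMBIGUOUS_PAIRS


def drop_strand_ambiguous(snps):
    """Keep only SNPs whose alleles allow unambiguous harmonization."""
    return [
        s for s in snps
        if not is_strand_ambiguous(s["effect_allele"], s["other_allele"])
    ]


# Hypothetical candidate instruments (illustration only).
candidate_ivs = [
    {"snp": "snp_1", "effect_allele": "A", "other_allele": "G"},
    {"snp": "snp_2", "effect_allele": "A", "other_allele": "T"},
    {"snp": "snp_3", "effect_allele": "c", "other_allele": "g"},
    {"snp": "snp_4", "effect_allele": "T", "other_allele": "C"},
]
kept = drop_strand_ambiguous(candidate_ivs)
print([s["snp"] for s in kept])  # ['snp_1', 'snp_4']
```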
Data for IV-outcome
We retrieved summary data for the associations between index SNPs and 45 complex traits and diseases from publicly available GWASs conducted mainly in European ancestry populations (3 GWASs were conducted in a mixed population including both European and Asian samples; Supplementary Table 2). These traits span a wide range of phenotypes including anthropometric traits, autoimmune inflammatory diseases, cardiovascular events and metabolomic traits, and neurological and psychiatric disorders. For each trait, we retrieved the appropriate variant annotations (SNP rs ID, chromosome, position, reference and alternate allele) and summary statistics (effect size, standard error, P-value, allele frequency [if available] and sample size of the study [if available]) for each index SNP. The sample sizes of these GWASs range from 9 954 to 766 345.
Statistical analysis
MR uses SNPs as IVs to estimate the effects of risk factors on outcomes while controlling for potential confounding, by leveraging the random allocation of SNP alleles at conception. Three assumptions need to be satisfied to ensure a valid IV. The first is the relevance assumption, that IVs should be strongly associated with the exposure; the second assumption requires no association between IVs and confounders of the exposure-outcome relationship; and the third is the exclusion restriction assumption, indicating that genetic variants should affect the outcome only through exposure. If all MR assumptions are met, a causal effect can be estimated based on the observed IV-exposure and IV-outcome associations.
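For a single IV, the estimate based on the observed IV-exposure and IV-outcome associations is commonly taken to be the ratio of the two. The sketch below assumes this ratio (Wald) estimator with a first-order standard error that ignores uncertainty in the IV-exposure association; the function names and input values are hypothetical.

```python
import math


def wald_ratio(beta_exposure, beta_outcome, se_outcome):
    """Ratio estimate of the causal effect from a single instrument.

    Assumes the IV-exposure association is estimated precisely enough
    that its uncertainty can be ignored (first-order approximation).
    """
    if beta_exposure == 0:
        raise ValueError("instrument must be associated with the exposure")
    estimate = beta_outcome / beta_exposure
    se = se_outcome / abs(beta_exposure)
    return estimate, se


def two_sided_p(z):
    """Two-sided P-value for a standard normal test statistic."""
    return math.erfc(abs(z) / math.sqrt(2.0))


# Hypothetical summary statistics for one SNP (illustration only).
estimate, se = wald_ratio(beta_exposure=0.1, beta_outcome=0.02, se_outcome=0.01)
p_value = two_sided_p(estimate / se)
print(round(estimate, 3), round(se, 3), round(p_value, 4))
```

The division by the IV-exposure association shows why the relevance assumption matters: a weak instrument (small denominator) inflates both the estimate and its standard error.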
We conducted two-sample MR to test for the potential causal relationship between circulating 25(OH)D and various traits. We applied a number of MR methods to estimate causal effects including an inverse variance weighted meta-analysis (IVW) approach,23 MR-PRESSO,24 MR-Egger,25 and a weighted median approach.26 While IVW does not account for potential horizontal pleiotropy, i.e., a genetic variant affects the outcome via a separate biological pathway from the exposure under investigation (violation of exclusion restriction assumption), MR-PRESSO, MR-Egger and weighted median approaches control potential horizontal pleiotropy under different model assumptions. In addition, we performed MR-Egger regression, modified Q and Q’ tests, and MR-PRESSO to detect estimation bias due to horizontal pleiotropy. MR-PRESSO implements a global test to evaluate the presence of horizontal pleiotropy, and an outlier test to detect specific outliers.24 In MR-Egger, an intercept that differs significantly from zero suggests the existence of average directional horizontal pleiotropy.25 Modified Q and Q’ tests are traditionally used to identify over-dispersion and have been applied in the context of MR to detect outliers caused by horizontal pleiotropy.27
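To make the differences between these estimators concrete, the sketch below computes point estimates for IVW, MR-Egger and the weighted median from summary statistics. It is a simplified illustration, not the software used in this study: standard errors, the Q tests and MR-PRESSO are omitted, the weights are assumed to be the inverse variances of the IV-outcome associations, and the input values are hypothetical, constructed with a true slope of 0.5 plus a constant pleiotropic shift of 0.02.

```python
def ivw(bx, by, se_y):
    """IVW: weighted regression of by on bx through the origin."""
    w = [1.0 / s ** 2 for s in se_y]
    num = sum(wi * x * y for wi, x, y in zip(w, bx, by))
    den = sum(wi * x * x for wi, x in zip(w, bx))
    return num / den


def mr_egger(bx, by, se_y):
    """MR-Egger: the same weighted regression, but with a free intercept.

    Variants are first oriented so that all IV-exposure effects are positive.
    Returns (intercept, slope); a non-zero intercept indicates average
    directional pleiotropy.
    """
    signs = [1.0 if x >= 0 else -1.0 for x in bx]
    x_o = [s * x for s, x in zip(signs, bx)]
    y_o = [s * y for s, y in zip(signs, by)]
    w = [1.0 / s ** 2 for s in se_y]
    sw = sum(w)
    swx = sum(wi * x for wi, x in zip(w, x_o))
    swy = sum(wi * y for wi, y in zip(w, y_o))
    swxx = sum(wi * x * x for wi, x in zip(w, x_o))
    swxy = sum(wi * x * y for wi, x, y in zip(w, x_o, y_o))
    slope = (sw * swxy - swx * swy) / (sw * swxx - swx ** 2)
    intercept = (swy - slope * swx) / sw
    return intercept, slope


def weighted_median(bx, by, se_y):
    """Weighted median of per-variant ratio estimates (linear interpolation)."""
    ratios = [y / x for x, y in zip(bx, by)]
    weights = [(x / s) ** 2 for x, s in zip(bx, se_y)]
    order = sorted(range(len(ratios)), key=lambda i: ratios[i])
    r = [ratios[i] for i in order]
    w = [weights[i] for i in order]
    total = sum(w)
    cum, running = [], 0.0
    for wi in w:
        # Cumulative weight up to the midpoint of each ordered estimate.
        cum.append((running + 0.5 * wi) / total)
        running += wi
    if cum[0] >= 0.5:
        return r[0]
    for j in range(1, len(r)):
        if cum[j] >= 0.5:
            frac = (0.5 - cum[j - 1]) / (cum[j] - cum[j - 1])
            return r[j - 1] + frac * (r[j] - r[j - 1])
    return r[-1]


# Hypothetical summary statistics for five IVs (illustration only).
bx = [0.1, 0.2, 0.3, 0.4, 0.5]
by = [0.02 + 0.5 * x for x in bx]  # slope 0.5 plus directional pleiotropy
se_y = [0.01] * 5

ivw_est = ivw(bx, by, se_y)
egger_intercept, egger_slope = mr_egger(bx, by, se_y)
wm_est = weighted_median(bx, by, se_y)
print(round(ivw_est, 4), round(egger_intercept, 4),
      round(egger_slope, 4), round(wm_est, 4))
```

In this constructed example the IVW estimate is biased upward by the constant pleiotropic shift, whereas MR-Egger absorbs the shift into its intercept and recovers the slope of 0.5.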
Lastly, we estimated the power of detecting the causal effects of 25(OH)D on various traits in our study. We followed the simulation framework outlined in Verbanck et al.,24 while assuming five IVs having the same effect sizes and standard errors for exposure, and allele frequencies as the five SNPs used in our real data analysis. We showed the power to detect significant causal effects at different sample sizes for the outcome GWAS (ranging from 10 000 to 150 000; matched with real outcome GWAS sample sizes), and different true causal effect sizes (ranging from 0 to 1) at a type-I error rate of 0.05.
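The idea behind such a power estimate can be sketched with a simplified summary-level simulation. This is not the simulation framework of Verbanck et al.; it assumes a standardized outcome, independent IVs, an IVW test, and an outcome standard error approximated as 1/sqrt(2·MAF·(1−MAF)·N). The IV effect sizes and allele frequencies below are hypothetical and are not those of the five SNPs used in the real data analysis.

```python
import math
import random


def simulate_ivw_power(bx, maf, n_outcome, causal_effect,
                       n_sim=2000, z_crit=1.959964, seed=1):
    """Fraction of simulated datasets in which the IVW test rejects the null.

    bx:            IV-exposure effect sizes (treated as known)
    maf:           allele frequencies of the IVs
    n_outcome:     sample size of the outcome GWAS
    causal_effect: true causal effect of the exposure on the outcome
    z_crit:        critical value for a two-sided type-I error rate of 0.05
    """
    rng = random.Random(seed)
    # Approximate sampling error of each IV-outcome association.
    se_y = [1.0 / math.sqrt(2.0 * f * (1.0 - f) * n_outcome) for f in maf]
    denom = sum((x / s) ** 2 for x, s in zip(bx, se_y))
    rejections = 0
    for _ in range(n_sim):
        # Draw IV-outcome associations around their expected values.
        by = [rng.gauss(causal_effect * x, s) for x, s in zip(bx, se_y)]
        num = sum(x * y / s ** 2 for x, y, s in zip(bx, by, se_y))
        z = num / math.sqrt(denom)  # IVW estimate divided by its SE
        if abs(z) > z_crit:
            rejections += 1
    return rejections / n_sim


# Hypothetical instruments (illustration only).
iv_effects = [0.05] * 5
iv_maf = [0.3] * 5

type1_error = simulate_ivw_power(iv_effects, iv_maf, 100_000, causal_effect=0.0)
power = simulate_ivw_power(iv_effects, iv_maf, 100_000, causal_effect=0.2)
print(type1_error, power)
```

Running the function over a grid of outcome sample sizes and causal effect sizes would give a power table of the kind described above; with a causal effect of zero the rejection rate should stay close to the nominal 0.05.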
Results
As shown in Table 1, Table 2 and Fig 1, in the primary analysis where five index SNPs were used as IVs, vitamin D did not show any causal effects on the outcomes examined here. However, several traits were identified to be potentially causally affected by 25(OH)D using both IVW and the adjusted MR-PRESSO approaches, and without apparent signs of horizontal pleiotropy. These traits include height (β=0.12, 95%CI: 0.01-0.23), hemoglobin A1c (HbA1c; β=0.04, 95%CI: 0.02-0.07), insomnia complaints (odds ratio=1.23, 95%CI: 1.03-1.48), and age at menopause (β=−0.42, 95%CI: −0.08- −0.03). However, only HbA1c remained significant in the MR-Egger regression approach (β=0.05, 95%CI: 0.01-0.09), and none of these putative causal pairs survived multiple testing correction. We visualized the effect sizes of IV-exposure vs. IV-outcome for the four nominally significant pairs in scatter plots (Fig 2). The corresponding odds ratios and 95%CI of binary traits are presented in Supplementary Table 3.
We further found that 25(OH)D levels significantly affected two traits using the MR-Egger regression approach: Crohn’s disease (β=0.45, 95%CI: 0.08-0.82) and triglycerides (β=0.12, 95%CI: 0.02-0.21). However, these two traits showed evidence for horizontal pleiotropy (Ppleiotropy=0.011 and 0.008 from MR-Egger intercept test for Crohn’s disease and triglycerides, respectively) and were not significant in either IVW (PIVW=0.98 and 0.76) or the adjusted MR-PRESSO approach (PMR-PRESSO=0.98 and 0.76).
We also performed a sensitivity analysis using an expanded set of IVs (independent SNPs associated with 25(OH)D levels at the P-value threshold of 1×10−5), as this approach was used in previous MR studies when fewer genome-wide significant SNPs were available as IVs.28,29 As shown in Supplementary Fig 1, Supplementary Table 4 and Supplementary Table 5, when incorporating additional IVs, only insomnia complaints remained nominally significant (P<0.05) using both the IVW and the adjusted MR-PRESSO approaches without evidence of horizontal pleiotropy. Although height and HbA1c showed some degrees of horizontal pleiotropy, both were nominally significant in the adjusted MR-PRESSO approach (PMR-PRESSO=0.02 and 0.01 for height and HbA1c, respectively).
For a majority of outcomes that we studied here (26 out of 45), their GWASs included more than 110 000 individuals. Under this sample size, our current study had >80% power to detect a causal effect (β) of 0.2 at the P-value threshold of 0.05. We also presented power estimation for a range of GWAS sample sizes of the outcome (Supplementary Table 6).
Discussion
In this study, we used two-sample MR methods with 25(OH)D-associated SNPs and 45 large-scale meta-GWASs covering a wide spectrum of human complex traits and diseases to determine the causal role of 25(OH)D.
In general, our results, which were based on an increased number of IVs and larger GWAS sample sizes, showed limited evidence for a causal effect of 25(OH)D on the traits or diseases investigated here. An earlier MR study, which aggregated information from 21 adult cohorts with up to 42 024 participants and explored the causal relationship between vitamin D status and obesity, did not identify any effect of genetically instrumented 25(OH)D on BMI (P=0.57).30 In our current analysis, we expanded this outcome category by incorporating body fat percentage, hip circumference, waist circumference, waist-to-hip ratio (both BMI adjusted and unadjusted), in addition to BMI. We used the latest GWAS summary statistics of these outcomes with substantially augmented sample sizes, ranging from 76 137 in body fat percentage to 322 154 in BMI. We did not find any evidence in support of a causal role of 25(OH)D in obesity-related traits (all P-values>0.2). Likewise, earlier MR studies have reported null findings for most of the autoimmune diseases including inflammatory bowel disease [Crohn’s disease (P=0.67), ulcerative colitis (P=0.42)],31 eczema (P=0.27),32 lupus (P=0.79) and rheumatoid arthritis (P=0.66).33 Our results using summary statistics from greatly enlarged GWASs were in line with previous findings in autoimmune diseases. Although a causal role of 25(OH)D has been considered more solid with multiple sclerosis, where three independent MR studies have consistently identified an association between genetically predicted 25(OH)D and the disease,14–16 we unfortunately could not replicate this finding due to limited data accessibility.
Among the cardiometabolic traits, previous MR studies did not identify any causal role of 25(OH)D with coronary artery disease,34 myocardial infarction,35 total cholesterol levels including triglycerides and low-density lipoprotein (LDL) cholesterol,36 type 2 diabetes,37,38 fasting insulin and adiponectin,39 all of which were further validated by our results. There are, however, some discrepancies. For example, an earlier study that incorporated 31 435 individuals and used four index SNPs from two 25(OH)D-associated genes found no associations between 25(OH)D-lowering alleles and either non-fasting remnant cholesterol or LDL-cholesterol, but identified that a 50% decrease in 25(OH)D levels was genetically associated with 6.0% (P=0.001) lower HDL-cholesterol levels.36 The effect of 25(OH)D on HDL-cholesterol was no longer significant in our current analysis where the summary statistics of a more recent HDL GWAS involving 187 167 individuals (a 5-fold increase in sample size) were used. Finally, putative causal relationships between 25(OH)D and neurological or psychiatric traits have rarely been investigated. So far, only a few MR studies explored the role of 25(OH)D in cognitive function,40 major depression,41 and Alzheimer’s disease;17,18 and nominally significant findings have only been reported in Alzheimer’s disease but not the other two traits. It might be worth noting that previous MR studies conducted in Alzheimer’s disease used either IV-exposure association estimates from the smaller 25(OH)D GWAS (SUNLIGHT consortium 2010)10 or correlated IVs, which affected the validity of the results. In our updated analysis, the causal relationship between 25(OH)D and Alzheimer’s disease attenuated (P=0.053). Extending the outcome list by incorporating additional neurological or psychiatric traits did not identify any significant results.
These results are important in informing the ongoing RCTs of vitamin D on depression (ClinicalTrials.gov: NCT02521012) and cognitive change (ClinicalTrials.gov: NCT03613116).
While null findings for the outcomes examined here would be the main conclusion of our study, we found nominally significant results for 25(OH)D to affect height (P=0.026), glycated hemoglobin (HbA1c) (P=0.001), age at menopause (P=0.03) and insomnia (P=0.02). Note that one previous MR study did not identify any association between 25(OH)D and glycemic traits.42 These results should be interpreted with caution given the large number of statistical tests we performed, and the fact that none of them survived multiple testing correction. Further investigations are warranted to elucidate these suggestive findings.
To ensure the validity of MR results, several important assumptions need to be satisfied. First of all, the relevance assumption, that IVs should be strongly associated with the exposure, is naturally guaranteed by selecting 25(OH)D-associated independent SNPs with genome-wide significance as IVs. Secondly, none of the IVs used in our analysis were cited by the NHGRI-EBI Catalog of published GWASs as associated with potential confounders, such as BMI, smoking or alcohol consumption at the P=1×10−5 level. This could also be partly reflected by negligible associations between 25(OH)D and various traits from the current and past MR studies, many of which act as potential confounders for other disease outcomes (e.g., obesity-related traits). Finally, the exclusion restriction assumption requires IVs to affect the outcome only through the exposure (no horizontal pleiotropy). We employed a number of methods to control for horizontal pleiotropy. Because MR-Egger regression with only a few SNPs can be underpowered to identify pleiotropy, we also used the MR-PRESSO global test and modified Q and Q’ tests to test for horizontal pleiotropy. We also increased the number of IVs by incorporating independent SNPs associated with 25(OH)D at the P=1×10−5 level. No causal relationships appeared when using additional instruments after multiple testing correction. We expect that some of these ambiguous causal links will become evident in the future when larger GWASs become available. However, we also note that our MR analysis is well-powered to detect moderate causal effects of 25(OH)D (>80% power with a causal effect β>0.2), as shown in our power calculation.
Our MR analysis showed limited evidence for any effect of vitamin D on the traits and diseases examined here. This is inconsistent with many observational studies, which may be affected by various confounding biases. Our results may inform costly RCTs that would be paid for by publicly funded agencies and help to prioritize the most promising target diseases for vitamin D intervention.
Author contributions
X.J. and C-Y.C. contributed to study conception, data analysis, interpretation of the results and drafting of the manuscript. T.G. contributed to interpretation of the results and critical revision of the manuscript.
Competing interests
The authors declare no competing interests.
Acknowledgements
We thank the various genome-wide association consortia for generously sharing the genome-wide association summary statistics. This work was supported by the Swedish Research Council [Vetenskapsrådet International Postdoc grant] (X.J.) and the National Institute on Aging at the National Institutes of Health [grant number K99AG054573] (T.G.). The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.