MR-TRYX: Exploiting horizontal pleiotropy to infer novel causal pathways

Yoonsu Cho; Philip C Haycock; Tom R Gaunt; Jie Zheng; Andrew P Morris; George Davey Smith; Gibran Hemani

doi:10.1101/476085

Abstract

Background In Mendelian randomization (MR) analysis, variants that exert horizontal pleiotropy, influencing the outcome through a pathway excluding the hypothesised exposure, are typically treated as a nuisance. However, they could provide valuable information for identifying novel pathways to the traits under investigation.

Methods Following the advice of William Bateson to “TReasure Your eXceptions”, we developed the MR-TRYX framework. Here, we begin by detecting outliers in a single exposure-outcome MR analysis. Outliers are hypothesised to arise due to horizontal pleiotropy, so we search through the MR-Base database of GWAS summary statistics to systematically identify other (“candidate”) traits that associate with the outliers. We developed a LASSO-based multivariable MR approach to model the heterogeneity in the exposure-outcome analysis due to pathways through candidate traits.

Results Through simulations we showed that commonly used outlier removal methods can increase type 1 error rates, but adjustment for detected pleiotropic pathways can improve power without the increase in type 1 error rates. We illustrate the use of MR-TRYX through investigation of several causal relationships: i) systolic blood pressure on coronary heart disease (CHD); ii) urate on CHD; iii) sleep duration on schizophrenia; and iv) education level on body mass index. Many pleiotropic pathways were uncovered with already established causal effects, validating the approach. Novel putative causal pathways, such as pain related phenotypes influencing CHD, were also identified. Adjustment for these pleiotropic pathways substantially reduced the heterogeneity across the analyses.

Conclusion Incorporating GWAS on thousands of traits in MR-Base to model horizontal pleiotropy in MR analysis can improve power through reducing heterogeneity, whilst enabling the identification of novel causal relationships.

Introduction

Mendelian randomization (MR) is now widely used to infer the causal influence of one trait (the exposure) on another (the outcome) ^1,2. It is generally performed by obtaining genetic instruments for an exposure through genome-wide association studies (GWAS). If the instruments are valid, in that they are unconfounded and influence the outcome only through the exposure (vertical pleiotropy), then they will each provide an independent, unbiased estimate of the causal effect of the exposure on the outcome ³. Meta-analysing these estimates can provide a more precise estimate of the causal relationship between the exposure and the outcome ^4,5. If, however, some of the instruments are invalid, particularly because they additionally influence the outcome through pathways that bypass the exposure (horizontal pleiotropy) ³, then the causal effect estimate is liable to be biased. To date, MR method development has viewed horizontal pleiotropy as a nuisance that needs to be factored out of the analysis^6-9. Departing from this viewpoint, here we exploit horizontal pleiotropy as an opportunity to identify new traits that putatively influence the outcome. We then use this knowledge to improve the original exposure-outcome estimates.

A crucial feature of MR is that it can be performed using only GWAS summary data, where the causal effect estimate can be obtained solely from the association results of the instrumental single nucleotide polymorphisms (SNPs) on the exposure and on the outcome ⁵. This means that causal inference between two traits can be made even if they have never been measured together in the same sample of individuals. Complete GWAS summary results have now been collected from thousands of complex traits and common diseases ¹⁰, meaning that one can search the database for candidate traits that might be influenced by the outliers. In turn, the causal influence of each of those candidate traits on the outcome can be estimated using MR by identifying their instruments (and excluding the original outlier). Should any of these candidate traits putatively associate with the outcome, this goes some way towards explaining the horizontal pleiotropic effect that was exhibited by the outlier SNP in the initial exposure-outcome hypothesis.

Several methods exist for identifying outliers in MR, each likely to be sensitive to different patterns of horizontal pleiotropy. Cook’s distance can be used to measure the influence of a particular SNP on the combined estimate from all SNPs ¹¹, identifying SNPs with large influences as outliers. Steiger filtering removes those SNPs that do not explain substantially more of the variance in the exposure trait than in the outcome, attempting to guard against using SNPs as instruments that are likely to be associated with the outcome through a pathway other than the exposure ¹². Finally, meta-analysis tools can be used to evaluate if a particular SNP contributes disproportionately to the heterogeneity between the estimates obtained from the set of instruments, and this has been adapted recently to detect outliers in MR analysis ^13-15. A potential limitation of heterogeneity-based outlier removal is that this practice is a form of cherry picking ^9,16. While outlier removal can certainly improve power by reducing noise in estimation, it could also potentially induce higher type 1 error rates, which we go on to explore through simulations.

Recent large-scale MR scans have indicated that horizontal pleiotropy is widespread based on systematic analysis of heterogeneity ^15,17. This suggests that many SNPs used as instruments are likely to associate with other traits, which in turn might associate with the original outcome of interest - hence giving rise to heterogeneity. As such we have an opportunity to identify novel pathways through exploiting outliers. Equipped with automated MR analysis software, outlier detection methods and a database of complete GWAS summary datasets, we developed MR-TRYX (from the phrase coined by William Bateson, “Treasure your exceptions¹⁸”), a framework for identifying novel putative causal factors when performing a simple exposure-outcome analysis. In this paper we present simulations to show how knowledge of horizontal pathways can be used to discover novel putative causal factors for an outcome of interest, and to also improve the power and reliability of the original exposure-outcome association analysis. We apply MR-TRYX to several exemplar analyses to demonstrate its potential utility.

Methods

Overview of MR-TRYX

Figure 1 shows an overview of the approach. MR-TRYX is applied to an exposure-outcome analysis and it has two objectives. The first is to use outliers in the original exposure-outcome analysis to identify novel putative factors that influence the outcome independently of the exposure. The second is to re-estimate the original exposure-outcome association by adjusting outlier SNPs for the horizontal pleiotropic pathways that might arise through the novel putative associations.

Figure 1.

Conceptual framework of the study: Illustration of identifying novel factors that influence the original association. (a) Where (gy) is the total effect of the SNP on the outcome, (gx) is the SNP-exposure effect, (xy) is the exposure-outcome effect as estimated through MR analysis from the non-outlier SNPs, (gp) is the SNP-candidate trait effect and (py) is the causal effect of the candidate trait on the outcome. (b) The open circles represent valid instruments and the slope of the dotted line represents the causal effect estimate of the exposure on the outcome. The closed circle represents an outlier SNP which influences the outcome, through two independent pathways (py). (c) One way in which the red SNP can exhibit a larger influence on the outcome than expected given its effect on the exposure is if it influences the outcome additionally through another pathway (horizontal pleiotropy). (d) Using the MR-Base database of GWAS summary data for hundreds of traits we can search for ‘candidate traits’ with which the outlier SNP has an association. (e) The causal inference of each of those candidate traits on the outcome can be estimated using MR by identifying their instruments (excluding the original outlier SNP). This allows us to identify new traits that putatively influence the outcome.

Outlier detection

Several outlier detection methods now exist that are based on the contribution of each SNP to overall heterogeneity in an inverse-variance weighted (IVW) meta-analysis ¹⁹. We used the approach implemented in the RadialMR R package (https://github.com/WSpiller/RadialMR) to detect outliers. Full details are provided elsewhere ²⁰, but briefly, we used the so-called ‘modified 2nd order weighting’ approach to estimate the total Cochran’s Q statistic as a measure of heterogeneity, as well as the individual contribution of each SNP, q_i ²⁰. This has been shown to be comparable to the simulation-based approach in MR-PRESSO ^15,12. The probability of a SNP being an outlier is calculated based on q_i being chi-square distributed with 1 degree of freedom. For demonstration purposes we adopted a conservative p-value threshold for identifying outliers, dividing 0.05 by the number of SNPs as a correction for multiple testing. We are not, however, suggesting that this arbitrary threshold will be optimal for identifying outliers, and users can apply other approaches or thresholds through the MR-TRYX software. We employed modified 2nd order weights throughout this paper to avoid problems arising from violations of the no measurement error in the exposure (NOME) assumption ²⁰, assuming a multiplicative random effects model if any residual heterogeneity was detected.
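The heterogeneity-based outlier rule described above can be sketched in a few lines. This is a minimal illustration only: it uses simple first-order weights (bx² / se_by²) rather than the modified 2nd order weights implemented in RadialMR, and the function names are hypothetical rather than part of any package.

```python
import math

def ivw_estimate(bx, by, se_by):
    """IVW slope through the origin, using first-order weights (an assumption)."""
    num = sum(x * y / s ** 2 for x, y, s in zip(bx, by, se_by))
    den = sum(x * x / s ** 2 for x, s in zip(bx, se_by))
    return num / den

def q_contributions(bx, by, se_by):
    """Return the IVW estimate and, per SNP, (q_i, p-value on 1 df)."""
    beta = ivw_estimate(bx, by, se_by)
    out = []
    for x, y, s in zip(bx, by, se_by):
        ratio = y / x                      # Wald ratio for this SNP
        weight = x * x / s ** 2            # first-order weight
        q = weight * (ratio - beta) ** 2   # contribution to Cochran's Q
        p = math.erfc(math.sqrt(q / 2.0))  # upper tail of chi-square with 1 df
        out.append((q, p))
    return beta, out

def flag_outliers(bx, by, se_by, alpha=0.05):
    """Indices of SNPs whose q_i p-value falls below alpha / number of SNPs."""
    _, qs = q_contributions(bx, by, se_by)
    threshold = alpha / len(bx)
    return [i for i, (_, p) in enumerate(qs) if p < threshold]
```

With ten SNPs of equal precision, nine sharing a Wald ratio of 0.2 and one with a ratio of 0.8, only the last SNP passes the Bonferroni-style threshold and is flagged.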

Candidate trait detection

Traits associated with the detected outliers could causally influence the outcome. MR-TRYX searches the MR-Base database to identify the traits that have associations with the detected outliers. By default, we limit the search to traits for which the GWAS results registered at MR-Base have more than 500,000 SNPs and sample sizes exceeding 5,000. Traits that have an association with outlier SNPs at the genome-wide significance threshold (p < 5 x 10⁻⁸; in keeping with traditional GWAS thresholds used for instrument selection) are regarded as potential risk factors for the outcome and defined as “candidate traits”. Each candidate trait is tested for its influence on the original exposure and outcome traits (Figure 1) using the IVW random effects model. We take forward putative associations based on FDR < 0.05, but we note that the use of arbitrary thresholds is problematic ^22,23, and we use them here to make high dimensional investigations more manageable.
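As an illustration of the FDR filter, the sketch below applies the Benjamini-Hochberg adjustment to a list of candidate-trait p-values. The text does not state which FDR procedure was used, so Benjamini-Hochberg is an assumption here, and the function name is hypothetical.

```python
def benjamini_hochberg(pvals):
    """Return Benjamini-Hochberg adjusted p-values in the original order."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value to the smallest so that the adjusted
    # values are monotone non-decreasing in the raw p-values.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvals[i] * m / rank)
        adjusted[i] = running_min
    return adjusted

def candidates_passing_fdr(pvals, fdr=0.05):
    """Indices of candidate traits whose adjusted p-value is below the FDR level."""
    return [i for i, q in enumerate(benjamini_hochberg(pvals)) if q < fdr]
```

For p-values 0.001, 0.04, 0.03 and 0.5, the adjusted values are 0.004, 0.0533, 0.0533 and 0.5, so only the first trait would be taken forward at FDR < 0.05.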

Assessing causal estimates of the association of candidate trait with the outcome

Suppose we have g₀,g_x1,…,g_xE instruments for the exposure x, where g₀ is an outlier in the x-y MR analysis due to an association with candidate trait p, and where E indicates the number of genetic variants. Also, p has g₀,g_p1,…,g_pM genetic instruments, where M is the number of genetic variants for p. To obtain the estimate of (py) uncontaminated by shared genetic effects between p and x (Figure 1A), we perform multivariable MR analysis ²⁴. We obtain a unique list of T clumped instruments for both x and p, and then obtain the genetic effects of each of these SNPs on the exposure (gx), candidate trait (gp), and outcome (gy). Finally, we estimate the causal influence of p on y conditioning on x by regressing (gy) ~ (gx) + (gp), weighted by the inverse of the variance of the (gy) estimates. The whole process is automated within the TwoSampleMR R package, which connects to the MR-Base database.
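The weighted regression of (gy) on (gx) and (gp) can be written out directly for the two-trait case. The sketch below is illustrative only: it fits no intercept (an assumption, since the text does not say whether one is included) and solves the 2 x 2 weighted normal equations by hand; the function name is hypothetical and this is not the TwoSampleMR implementation.

```python
def mvmr_two_traits(gx, gp, gy, se_gy):
    """Weighted least squares of gy on gx and gp, with weights 1 / se_gy^2.

    Returns (effect of x on y given p, effect of p on y given x).
    """
    w = [1.0 / s ** 2 for s in se_gy]
    sxx = sum(wi * x * x for wi, x in zip(w, gx))
    spp = sum(wi * p * p for wi, p in zip(w, gp))
    sxp = sum(wi * x * p for wi, x, p in zip(w, gx, gp))
    sxy = sum(wi * x * y for wi, x, y in zip(w, gx, gy))
    spy = sum(wi * p * y for wi, p, y in zip(w, gp, gy))
    det = sxx * spp - sxp * sxp
    if det == 0:
        raise ValueError("gx and gp are collinear; effects are not identifiable")
    b_x = (spp * sxy - sxp * spy) / det
    b_p = (sxx * spy - sxp * sxy) / det
    return b_x, b_p
```

The collinearity check reflects the point made in the text about multivariable MR: the conditional effects can only be separated when the SNP effects on x and on p differ across the combined instrument set.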

In the case of an outlier SNP associating with many candidate traits, we first apply a LASSO regression of (gy) ~ (gx) + (gp_1) + … + (gp_P) and use cross validation to obtain the shrinkage parameter that minimises the mean squared error. We retain only the candidate traits that are putatively associated with the outcome and have non-zero effects after shrinkage. We then include the remaining traits in a multivariable model with x against the outcome, as described above ²⁴. We perform the LASSO step because many traits in the MR-Base database have considerable overlap and redundancy, and the statistical power of multivariable analysis depends on the heterogeneity between the genetic effects on the exposure variables ²⁴. Using LASSO therefore automates the removal of redundant traits. With the remaining traits we then obtain estimates of (py) that are conditionally independent of x and of all other retained candidate traits by combining them in a multivariable analysis on the outcome y.
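To show how the shrinkage step drops redundant traits, here is a minimal LASSO fitted by cyclic coordinate descent. It is a sketch under simplifying assumptions: the penalty lam is fixed rather than chosen by cross validation, every column (including the exposure) is penalised equally, no weights or intercept are used, and the function names are hypothetical.

```python
def soft_threshold(z, lam):
    """Shrink z towards zero by lam, returning 0 inside [-lam, lam]."""
    if z > lam:
        return z - lam
    if z < -lam:
        return z + lam
    return 0.0

def lasso_cd(columns, y, lam, n_iter=200):
    """Minimise (1/2n) * sum((y - X b)^2) + lam * sum(|b|).

    columns: list of predictor columns, e.g. [gx, gp_1, ..., gp_P]
    y:       SNP-outcome effects (gy)
    """
    n = len(y)
    beta = [0.0] * len(columns)
    resid = list(y)  # residuals for the current beta (all zeros at the start)
    for _ in range(n_iter):
        for j, col in enumerate(columns):
            norm = sum(c * c for c in col) / n
            if norm == 0:
                continue
            # Covariance of column j with the residual that excludes column j.
            rho = sum(c * r for c, r in zip(col, resid)) / n + norm * beta[j]
            new = soft_threshold(rho, lam) / norm
            delta = new - beta[j]
            if delta != 0.0:
                resid = [r - delta * c for r, c in zip(resid, col)]
                beta[j] = new
    return beta
```

Traits whose coefficient is shrunk exactly to zero would be dropped; the remaining traits would then go into the multivariable model.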

Adjusting exposure-outcome associations for known candidate-trait associations

An illustration of how outliers arise in MR analyses is shown in Figure 1. If a SNP g has some influence on exposure x, and x has some influence on outcome y, the SNP effect on y is expected to be (gy) = (gx)(xy), where (gx) is the SNP effect on x and (xy) is the causal effect of x on y. Any substantive difference between (gy) and (gx)(xy) could be due to an additional influence on y arising from the SNP’s effect through an alternative pathway.

Figure 2.

Result from simulations. (A) Power (first panel where the simulated causal effect is 0.2; causal model) and false discovery rate (second panel where the simulated causal effect is 0; null model) to detect an association between simulated exposure and outcome traits. The x-axis depicts the number of instruments (out of 30) that exhibit a horizontal pleiotropic effect. There are four methods for handling outliers, and we posit two scenarios for detecting outliers. The methods are ‘raw’, where all SNPs are used in a standard IVW analysis regardless of outlier status; ‘outliers adjusted’ where the outlier SNPs are adjusted for detected alternative pathways; ‘outliers removed (all)’ where all detected outliers are removed; and ‘outliers removed (candidate)’ where only outliers that are found to influence a candidate trait are removed. We run the latter three methods by detecting outliers empirically, but also show, for comparison, the hypothetical case in which we know the pleiotropic variants a priori. (B) As in (A), except comparing the bias of different methods, assessed as the proportion of estimates that are substantially different from the simulated effect (y-axis).

If a SNP influences a ‘candidate trait’, P, which in turn influences the outcome (or the exposure and the outcome), then the SNP’s influence on the exposure and the outcome will be a combination of its direct effects through x and indirect effects through P ²⁴. If we have estimates of how the candidate trait influences the outcome, then we can adjust the original SNP-outcome estimate to the effect that it would have exhibited had it not been influencing the candidate trait. In other words, we can obtain an adjusted SNP-outcome effect conditional on the ‘candidate-trait - exposure’ and ‘candidate-trait - outcome’ effects. If the SNP influences P independent candidate traits (as selected from the LASSO step), then the expected effect of the SNP on y is

$$(gy) = (gx)(xy) + \sum_{i=1}^{P} (gp_i)(p_i y)$$

Hence, the effect of the SNP on the outcome adjusted for alternative pathways P₁, …, P_P is

$$(gy)^{*} = (gy) - \sum_{i=1}^{P} (gp_i)(p_i y)$$

We use parametric bootstraps to estimate the standard error of the (gy)* estimate: 1000 resamples of (gy), (gp) and (py) are drawn based on their respective standard errors, and the standard deviation of the resulting (gy)* estimates is taken as its standard error. Finally, an adjusted effect estimate of (xy) due to SNP g is obtained through the Wald ratio.
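The adjustment, bootstrap and Wald ratio steps can be sketched as follows. The sketch assumes that the adjusted effect is the observed SNP-outcome effect minus the sum of (gp_i)(p_i y) over the retained candidate traits, as the definitions above imply, that each resample is drawn from an independent normal distribution, and that the Wald ratio standard error uses the first-order approximation se / |gx|. Function names are hypothetical.

```python
import random
import statistics

def adjusted_snp_outcome(gy, gp, py):
    """(gy)* = (gy) - sum_i (gp_i)(p_i y) for one outlier SNP."""
    return gy - sum(a * b for a, b in zip(gp, py))

def bootstrap_se(gy, se_gy, gp, se_gp, py, se_py, n_boot=1000, seed=1):
    """Parametric bootstrap standard error of (gy)*."""
    rng = random.Random(seed)
    draws = []
    for _ in range(n_boot):
        gy_b = rng.gauss(gy, se_gy)
        gp_b = [rng.gauss(m, s) for m, s in zip(gp, se_gp)]
        py_b = [rng.gauss(m, s) for m, s in zip(py, se_py)]
        draws.append(adjusted_snp_outcome(gy_b, gp_b, py_b))
    return statistics.stdev(draws)

def wald_ratio(gy_adj, se_gy_adj, gx):
    """Adjusted estimate of (xy) from one SNP, with a first-order standard error."""
    return gy_adj / gx, se_gy_adj / abs(gx)
```

For example, a SNP with (gy) = 0.10 that acts on a single candidate trait with (gp) = 0.2 and (py) = 0.3 has an adjusted effect of 0.10 - 0.06 = 0.04; with (gx) = 0.1 the adjusted Wald ratio is 0.4.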

Simulations

IVW effect estimates are liable to be biased when at least some of the instrumenting SNPs exhibit horizontal pleiotropy, and those SNPs tend to contribute disproportionately towards the heterogeneity in the effect estimate. We assess the performance of MR-TRYX against outlier removal methods with respect to the ability to address problems that arise due to horizontal pleiotropy (bias, low power and inflated false discovery rates). In these simulations we ask: if we can identify the pathway through which an outlier SNP has a horizontal pleiotropic effect, can adjustment for that pathway improve the original exposure-outcome analysis? Two simulation scenarios are performed, the first using a null causal effect ((xy) = 0), and the second a positive causal effect ((xy) = 0.2). In each set, four methods are considered for handling outliers:

1) Raw, where all SNPs are used in a standard IVW analysis.
2) Outliers adjusted, where the outlier SNPs are adjusted for the effect of the candidate trait on the outcome using MR-TRYX.
3) All outliers removed, where all detected outliers are removed.
4) Candidate outliers removed, where only outliers that are found to influence a candidate trait are removed.

We run the latter three methods by detecting outliers empirically, but also run the hypothetical case in which we know the pleiotropic variants a priori as a “gold-standard” for comparison. Individual level data are generated in a two-sample MR setting, where the SNP-exposure and SNP-outcome associations are estimated in non-overlapping sets of individuals (n=5000). The relevant association summary statistics for two-sample MR are obtained from a regression of trait on genotype under an additive genetic model. We varied the number of pleiotropic SNPs from 0 (no pleiotropic SNPs) to 30 out of 30. The IVW estimator was used as the comparator as this approach is the standard in MR. The results for each case represent the mean values across 1000 simulated datasets. Detailed information and the script used for the simulations can be found elsewhere (https://github.com/explodecomputer/tryx-analysis).

Empirical analyses

As applied examples, we chose two robust findings and two controversial findings that are potentially biased due to pleiotropy: i) systolic blood pressure (SBP) and coronary heart disease (CHD); ii) urate and CHD; iii) sleep duration and schizophrenia; and iv) education level (years of schooling) and body mass index (BMI). Those examples were chosen based on previous findings ^25-28 to illustrate how pleiotropic variants can be used to identify other pathways and adjusted to estimate the causal effect of the original exposure on the outcome independent of pleiotropic bias.

Summary statistics (beta coefficients and SEs) for the associations of the SNPs with each exposure were obtained from the publicly available GWAS database (Supplementary Table S1). Selected SNPs were harmonised for the analysis, excluding palindromic SNPs and pruning for linkage disequilibrium (r² < 0.001). We primarily used the two-sample MR IVW method to obtain causal estimates between exposures and outcomes, allowing each SNP to have a different mean effect (random effects model). A number of sensitivity analyses were applied to evaluate the consistency of causal effect estimates under different models of pleiotropy amongst the SNPs, including the MR-Egger ⁶, weighted median and weighted mode approaches ^7,8.
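For readers unfamiliar with the random effects IVW estimator, the sketch below shows one common formulation: the point estimate is the usual IVW slope, and the standard error is inflated by the square root of Q / (n - 1) whenever that ratio exceeds 1 (a multiplicative random effects model). It uses first-order weights and a hypothetical function name, and is not taken from the TwoSampleMR source.

```python
import math

def ivw_multiplicative_random_effects(bx, by, se_by):
    """Return (estimate, standard error, Cochran's Q) for an IVW analysis."""
    w = [x * x / s ** 2 for x, s in zip(bx, se_by)]  # first-order weights
    ratios = [y / x for x, y in zip(bx, by)]         # per-SNP Wald ratios
    beta = sum(wi * r for wi, r in zip(w, ratios)) / sum(w)
    q = sum(wi * (r - beta) ** 2 for wi, r in zip(w, ratios))
    se_fixed = math.sqrt(1.0 / sum(w))
    # Inflate the standard error only when there is excess heterogeneity.
    phi = max(1.0, q / (len(bx) - 1))
    return beta, se_fixed * math.sqrt(phi), q
```

Heterogeneity leaves the point estimate unchanged under this model but widens the confidence interval, which is why reducing Q through adjustment can recover precision.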

Outliers were detected among the instruments for each exposure (P < 0.05 / the number of SNPs). We searched the MR-Base database to identify the candidate traits that are associated with outliers (p < 5 × 10⁻⁸). We then performed multivariable MR analysis to test which candidate trait can explain the heterogeneity in the original exposure-outcome association. To perform multivariable MR, more SNPs were introduced into the analysis that instrument the candidate traits.

Subsequently we re-estimated the association of the original exposure and the original outcome using different sets of instruments: a) all SNPs (corresponding to the raw method in our simulation); b) outliers adjusted; c) all outliers removed; and d) candidate outliers removed.

All analyses were conducted with the TwoSampleMR package of MR-Base (https://github.com/MRCIEU/TwoSampleMR) and the MR-TRYX package (https://github.com/explodecomputer/tryx) in R statistical software (ver 3.4.1).

Results

Simulations

Our simulations show that as the proportion of SNPs exhibiting (balanced) horizontal pleiotropy increases, type 1 error rates for the outlier removal approaches also increase (Figure 2A). Type 1 error rates are maintained at expected levels when adjusting for outliers. A similar pattern of results among the three methods (the raw, outlier removal and outlier adjustment) is seen for the likelihood of estimates being biased (Figure 2B), where outlier removal and raw estimates also performed worse than outlier adjustment.

For simulations in which there was a true causal effect, we observed that outlier removal methods had higher power, consistent with them having higher false discovery rates in the null simulations (Figure 2). However, outlier adjustment improved power over the ‘raw’ IVW approach. This is likely because balanced heterogeneity increases the standard error and adjusting away the pleiotropic effects reduces this noise term. Bias was elevated substantially in the outlier removal methods as the proportion of SNPs with pleiotropic effects increased, whereas bias was lowest for the adjustment-based method, and independent of the level of pleiotropy across the SNPs.

Outlier removal and outlier adjustment performance are limited by the efficacy and power of outlier detection methods: we note that when we assume all outliers are detected correctly in our simulation scenarios the performance of outlier removal and outlier adjustment both improve in terms of FDR, power and bias. Outlier adjustment is also dependent on availability of GWAS summary data for the candidate trait(s), and on the power to detect a variant's association with the candidate trait(s).

Empirical MR-TRYX analyses using four exposure-outcome hypotheses

To examine the performance of MR-TRYX analysis, we tested four independent exposure-outcome hypotheses. For each analysis we: a) obtain MR estimates of the exposure-outcome causal relationship and detect outlier instruments; b) identify putative novel influences (candidate traits) on the outcome trait based on their associations with outlier variants (Table 1; Supplementary Table S2); c) adjust the original SNP-outcome estimates for the putative influences operating through the candidate traits (Table 2); and d) compare the changes in heterogeneity in the MR estimates of the adjusted SNP-outcome effects to standard outlier removal methods (Figure 4).

Table 1.

Candidate traits associated with both exposure and outcome.

Table 2.

The results of empirical analyses with different IV estimators derived from different methods.

Example 1: Systolic blood pressure and coronary heart disease

Blood pressure is a well-established risk factor for CHD. Random effects IVW estimates indicated that higher SBP is causally associated with higher risk of CHD (Odds ratio [OR] per 1SD: 1.76; 95% CI: 1.47, 2.10). While there was substantial heterogeneity in this estimate (Q=682.7 on 157 SNPs, p=5.74 × 10⁻⁶⁷), the estimates from MR Egger, weighted median and weighted mode methods were consistent (Table 2). Seven of the 157 SNPs were detected as strong outliers based on Q statistics. We identified 69 candidate traits that were associated with these outliers (p < 5 x 10⁻⁸). We manually removed redundant traits and traits that are similar to the exposure and the outcome (e.g. high blood pressure). Among the candidate traits, 15 were putatively causally associated with the risk of CHD (Figure 3A). After we applied LASSO regression, 6 traits remained (Table 1): Anthropometric measures (e.g. height), lipid levels (e.g. cholesterol level), and self-reported ibuprofen use were amongst the candidate traits that associated with CHD, which were uncovered due to two outliers (rs3184504 near SH2B3 and rs9349279 near PHACTR).

Figure 3.

Manhattan plot to visualise the causal associations between candidate exposures and hypothesised outcome. This represents the number of traits associated with outliers. The plot is stratified by phenotype category and, within each group, we present the results related to the candidate traits identified. Along the X axis, different phenotype groups are shown in different colours. The Y axis presents log transformed P value for each trait. Filled circles in each category indicate the evidence of association between candidate traits and exposure or outcome (p < 0.05). (A) Empirical analysis 1: Systolic blood pressure (mmHg) and coronary heart disease (log odds). (B) Empirical analysis 2: Urate (mg/dl) and coronary heart disease (log odds). (C) Empirical analysis 3: Sleep duration (hour/night) and schizophrenia (log odds). (D) Empirical analysis 4: Years of schooling (years) and body mass index (kg/m²).

We next adjusted the exposure-outcome association for the detected pleiotropic pathways and obtained an adjusted IVW estimate. The total heterogeneity, based on adjusting only these two of 157 SNP effects, was reduced by 17% (Q=567.6). The effect estimate remained consistent with the original estimate, as did the IVW estimates when removing all outliers, or just outliers known to associate with candidate traits that associated with the outcome. However, the confidence interval was substantially wider (and included the null) after removing outliers known to associate with candidate traits (OR per 1 SD: 1.80; 95% CI: 0.56, 5.79).

Example 2: Urate and coronary heart disease

Here we show an example with mixed findings from previous studies. The influence of circulating urate levels on risk of coronary heart disease has been under debate. Several MR studies have investigated the inflated effect of urate on CHD, which appeared to be influenced by pleiotropy ^26,29. We re-estimated the associations here using a range of MR methods. As has been previously reported, the estimate from IVW suggested a weak association between urate and the risk of CHD using all variants (OR per 1 SD: 1.08; 95% CI: 1.00, 1.17), while there was a large intercept in the MR-Egger analysis (intercept = 1.02; 95% CI: 1.00, 1.03) with a much-attenuated causal effect estimate (Table 2). The median and mode-based estimates were also consistent with the MR-Egger estimate, indicating weak support for urate having a causal influence on CHD. Three variants were detected as outliers, which associated with 61 candidate traits (p < 5 × 10⁻⁸). Among those outliers, rs653178 and rs642803 were associated with 14 traits that had conditionally independent influences on the outcome (Figure 3B), including anthropometric measures (e.g. hip circumference), cholesterol levels, diagnosis of thyroid disease, and smoking status.

Removing the outliers in the IVW analysis led to a more precise (though slightly attenuated) estimate of the influence of higher urate levels on CHD risk (OR per 1 SD: 1.05; 95% CI: 1.01, 1.10 and OR per 1 SD: 1.06; 95% CI: 1.06, 1.12, respectively, Table 2). The adjustment model also indicated an attenuated IVW estimate in comparison to the ‘raw’ approach, with confidence intervals spanning the null (OR per 1 SD: 1.07; 95% CI: 0.99, 1.16), whilst the degree of heterogeneity was reduced by half by accounting for the pleiotropic pathways through two outlier SNPs. The adjusted scatter plot showed that outliers moved towards the fitted line after controlling for the SNP effect on the candidate traits (Figure 4B). The results of this analysis suggest that it is unlikely that urate has a strong causal influence on CHD. Here, outlier removal appears to strengthen the evidence for an effect, which could lead to the wrong conclusion.

Figure 4.

Scatter plot for the exposure-outcome association adjusting the SNP effects on the candidate traits. The arrow indicates changes in the SNP effect after conditioning on the effect of candidate traits on the outcome. The candidate traits that influence the association of the original exposure and the original outcome were listed in the box. (A) Empirical analysis 1: Systolic blood pressure (mmHg) and coronary heart disease (log odds). (B) Empirical analysis 2: Urate (mg/dl) and coronary heart disease (log odds). (C) Empirical analysis 3: Sleep duration (hour/night) and schizophrenia (log odds). (D) Empirical analysis 4: Years of schooling (years) and body mass index (kg/m²).

Example 3: Sleep duration and schizophrenia

Previous studies have shown that sleep disorder is associated with schizophrenia ²⁸. However, none of them confirmed causality between sleep disorder and schizophrenia. We observed weak evidence for any association between sleep duration and schizophrenia (OR per 1 SD: 1.18; 95% CI: 0.57, 2.45), but there was substantial heterogeneity when all SNPs were used (Q= 204.8, p=6.9 x 10⁻²⁶). Six outlier instruments were detected, which associated with 46 candidate traits (p < 5 x 10⁻⁸). Among those outliers, the SNPs rs7764984 (near HIST1H2BJ) and rs13107325 (near SLC39A8) were associated with three traits that putatively influenced the outcome: self-reported coeliac disease, body composition (impedance of leg) and memory function (Figure 4C).

We re-estimated the original association accounting for the detected outliers. The degree of heterogeneity was reduced by 74% (Q=54.1) when removing all 6 outliers and by 46% (Q=147.7) when adjusting for the two SNP effects that had putative pleiotropic pathways. Both methods of outlier removal and adjustment provide similar estimates in terms of direction, whilst the magnitude of estimates differed. After removing outliers, MR Egger causal estimates were substantially larger (OR per 1 SD= 2.43; 95% CI: 0.49, 12.16 and Beta= 0.20; 95% CI: −0.40, 0.79, respectively) than those from the method using all variants. IVW causal estimates from the adjustment method were virtually identical with the original estimates, with narrower CIs (OR per 1 SD= 2.36; 95% CI: 0.25, 21.96). While all methods indicate that sleep duration is unlikely to be a major causal risk factor for schizophrenia, pursuing outliers in the analysis provided putative indications that coeliac disease and memory function may be risk factors for schizophrenia (Figure 4C).

Example 4: Years of schooling and body mass index

The association between education and health outcomes is well established in social science ³⁰. Higher socioeconomic position is generally thought to lead to a lower risk of obesity in high-income countries ^31,32. We used 59 independent genetic instruments ³³ to estimate the influence of years of schooling on BMI ³⁴ (Table 2). All MR methods indicated that years of schooling has a causal beneficial effect on BMI (e.g. IVW Beta: −0.27; 95% CI: −0.39, −0.16), except MR Egger, which gave a very imprecise estimate (beta: 0.01; 95% CI: −0.67, 0.70), but the degree of heterogeneity was large (Q = 211.9 on 59 SNPs; p=2.20 × 10⁻⁸). Three outliers (rs6882046 near LINC00461, rs4800490 near NPC1, rs8049439 near ATXN2L) were identified as contributors to heterogeneity, and they showed associations (p < 5 x 10⁻⁸) with 48 candidate traits. Among those candidate traits, two were associated with BMI (Figure 3D): alcohol intake frequency (which associated with all three outliers) and usual walking pace.

We next re-estimated the influence of years of schooling on BMI by accounting for outliers. Adjusting the outliers for candidate trait pathways such as alcohol intake and usual walking pace reduced heterogeneity by 15% and slightly narrowed the confidence interval, while the point estimate remained consistent (Table 1). By contrast, there was a 48% reduction in heterogeneity when removing outliers. Point estimates remained largely consistent across all outlier removal methods. However, we note that Figure 4D shows that one of the outliers (rs4800490, near gene NPC1) moved away from the fitted line on the scatter plot after adjusting for the pleiotropic pathway, indicating that if this outlier is due to a pleiotropic pathway, we have estimated its indirect effect inaccurately or only partially (e.g. where GWAS summary statistics are not available to identify other relevant pleiotropic pathways).

Discussion

The problem of instrumental variables being invalid due to horizontal pleiotropy has received much attention in MR analysis. Detecting and excluding such invalid instruments, based on whether they appear to be outliers in the analysis, is now a common strategy that exists in various forms^7,8,14,15,35. We have shown here that outlier removal could, in some circumstances, compound rather than reduce bias, and misses an opportunity to better understand the traits under study. We developed the MR-TRYX framework, which utilises the MR-Base database ¹⁰ of GWAS summary data to identify potential explanations for outlying SNP instruments, and to improve estimates by accounting for the pleiotropic pathways that give rise to them. We have also demonstrated the use and interpretation of MR-TRYX in four sets of empirical analyses.

For accurate performance, MR-TRYX depends upon the performance of three methodological components: (i) detecting instruments that exhibit horizontal pleiotropy; (ii) identifying the candidate traits on the alternative pathways from the variant to the outcome; and (iii) correctly estimating the effects of the candidate traits on the outcome. Each of these components is a difficult problem, but they are all modular and build upon existing methods and resources, and the MR-TRYX framework will naturally improve as those methods and resources themselves improve. We will now discuss the consequences of underperformance of each of these components on the TRYX analysis.

The classification of an outlier in MR analysis can be based on statistical evidence that a SNP has been included as an instrument because of reverse causation (Steiger filtering)^12,17, the extent to which a single SNP disproportionately influences the overall result (e.g. Cook's distance) ³⁶, or, most commonly, the extent to which a SNP contributes to heterogeneity (e.g. Cochran’s Q statistic, MR-PRESSO, and implicitly in median- and mode-based estimators)^7,8,14,15. The philosophy of the latter two approaches is that proving horizontal pleiotropy is impossible, but that it should lead to outliers ⁹. While a useful approximation, these approaches have two main limitations. First, determining whether a SNP is an outlier depends on the use of arbitrary thresholds, and this entails a trade-off between specificity and sensitivity. Second, if most variants are pleiotropic, then it is possible that the outlier SNPs are the only valid instruments. Such a scenario can arise for complex traits such as gene expression or protein levels that have a few large effects and many small effects. For example, for C-reactive protein (CRP) levels, the SNP in the CRP gene region is likely the only valid instrument in some analyses ³⁷. In this context, bias due to horizontal pleiotropy cannot be avoided by selection of instruments, since this approach may generate more bias ³⁸. This is supported by our simulations, which demonstrate that in the presence of extensive pleiotropy removing outliers increased FDR and bias.
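As a rough illustration of heterogeneity-based outlier detection, the sketch below computes a fixed-effect IVW estimate from per-SNP Wald ratios, each SNP's contribution to Cochran's Q, and a flag for SNPs whose contribution exceeds a Bonferroni-corrected chi-square (1 df) threshold. It is a simplified stand-in, not the implementation used in the analyses: the function name, the first-order weights, the Bonferroni threshold and the toy summary statistics are all assumptions made for the example.

```python
import numpy as np
from statistics import NormalDist


def ivw_q_outliers(beta_gx, beta_gy, se_gy, alpha=0.05):
    """Return the IVW estimate, Cochran's Q, and a per-SNP outlier flag.

    Hypothetical helper for illustration only.
    """
    beta_gx = np.asarray(beta_gx, dtype=float)
    beta_gy = np.asarray(beta_gy, dtype=float)
    se_gy = np.asarray(se_gy, dtype=float)

    # Wald ratio per SNP and its first-order inverse-variance weight.
    ratio = beta_gy / beta_gx
    weight = beta_gx**2 / se_gy**2

    # Fixed-effect inverse-variance weighted (IVW) estimate.
    ivw = np.sum(weight * ratio) / np.sum(weight)

    # Contribution of each SNP to Cochran's Q.
    q_snp = weight * (ratio - ivw) ** 2

    # Bonferroni-corrected chi-square (1 df) critical value, obtained as the
    # square of a standard-normal quantile so that only the stdlib is needed.
    z = NormalDist().inv_cdf(1 - alpha / (2 * len(ratio)))
    return ivw, q_snp.sum(), q_snp > z**2


# Toy data (invented): four SNPs consistent with a causal effect of 0.5,
# and a fifth SNP with an extra pleiotropic effect of 0.1 on the outcome.
beta_gx = np.array([0.10, 0.12, 0.08, 0.15, 0.11])
beta_gy = 0.5 * beta_gx
beta_gy[4] += 0.1
se_gy = np.full(5, 0.02)

ivw, q_total, flags = ivw_q_outliers(beta_gx, beta_gy, se_gy)
```

The choice of `alpha` makes the arbitrary-threshold limitation concrete: lowering it flags fewer SNPs, raising it flags more, and the IVW estimate that defines "outlying" is itself pulled towards the pleiotropic SNP.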

MR-TRYX should, in principle, avoid the problem of outlier removal because, instead of removing outliers in their entirety, it attempts to eliminate the component of the SNP-outcome effect that is due to horizontal pleiotropy. Hence, we avoid implicitly cherry picking from amongst the SNPs to be used in the analysis, and low specificity (i.e. a more relaxed threshold for outlier detection) need not cause an unnecessary loss of power in the overall analysis. Previous work has adjusted for the effect of pleiotropic phenotypes, but it treated pleiotropic phenotypes as exogenous variables that are not associated with the causal pathways of interest ³⁹. In MR-TRYX, candidate traits are treated as endogenous variables to account for the effect of the traits on the original association. Moreover, our method is applicable in the two-sample context, whereas the previous method requires individual level data. The problem of outlier detection, which remains in MR-TRYX, could be sidestepped by applying the adjustment approach to all SNPs irrespective of their contributions to heterogeneity.
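The idea of removing only the pleiotropic component of a SNP-outcome effect can be sketched as follows. The sketch assumes the adjustment takes the form of subtracting the indirect effect (SNP-candidate effect multiplied by candidate-outcome effect) from the observed SNP-outcome effect, and it uses a first-order delta-method standard error that treats the three estimates as independent. The function and argument names are hypothetical, the numbers are invented, and the standard-error calculation is an assumption of this example rather than a description of the analyses reported here.

```python
import math


def adjust_snp_outcome(beta_gy, se_gy, beta_gp, se_gp, beta_py, se_py):
    """Subtract the estimated pleiotropic pathway from a SNP-outcome effect.

    beta_gy: SNP effect on the outcome
    beta_gp: SNP effect on the candidate trait
    beta_py: causal effect of the candidate trait on the outcome
    """
    # Indirect (pleiotropic) effect of the SNP on the outcome via the trait.
    indirect = beta_gp * beta_py

    # Delta-method variance of the product, assuming independent estimates.
    var_indirect = beta_gp**2 * se_py**2 + beta_py**2 * se_gp**2

    adjusted = beta_gy - indirect
    se_adjusted = math.sqrt(se_gy**2 + var_indirect)
    return adjusted, se_adjusted


# Invented numbers: an outlying SNP whose outcome effect is partly explained
# by a candidate trait with an estimated effect of 0.5 on the outcome.
adjusted, se_adjusted = adjust_snp_outcome(
    beta_gy=0.155, se_gy=0.02,
    beta_gp=0.20, se_gp=0.01,
    beta_py=0.50, se_py=0.05,
)
```

Under these assumptions the adjusted effect is less precise than the raw one, because uncertainty in the candidate-trait pathway is carried into the SNP-outcome estimate; the SNP is nevertheless retained in the analysis rather than discarded.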

Upon identification of potentially pleiotropic SNPs, MR-TRYX can only account for these if the pathways through which pleiotropy is acting can be identified. Detecting the pathways depends on the density and coverage of the human phenome available for the analysis. We use the MR-Base database of GWAS summary results, which comprises several hundred independent traits. While a valuable resource, it certainly does not cover the whole human phenome, and therefore even if a pleiotropic variant is detected correctly, it may not be possible to adjust it away. In the empirical analyses, often fewer than half of the candidate traits were inferred to be associated with the outcome. Yet, as we illustrated, MR-TRYX allows for an informative analysis that could routinely be applied in MR analyses. Broadening phenotype coverage is an on-going pursuit that will continually improve MR-TRYX analysis ⁴⁰.

Finally, it is necessary for the effects through the identified pleiotropic pathways to be accurately estimated. This is a recursive problem - MR-TRYX adjusts the SNP-outcome effects based on the pleiotropic effect through the outlier SNP, but it does this by introducing more SNPs into the analysis that instrument the candidate traits. These new SNPs may themselves exhibit pleiotropic effects which could lead to bias in the estimates of the candidate traits on the outcome, requiring a second round of TRYX-style candidate trait searches; and so on. In the example of education level and BMI, adjustment for the pleiotropic pathway failed to substantially reduce the degree of heterogeneity. Further developments could involve recursively analysing alternative pathways.

MR-TRYX is an expansive framework and there are several limitations in addition to those discussed already. First, our LASSO extension to multivariable MR is used to automate the selection of exposures that will be used for adjustment. A shrinkage step of LASSO may increase the SNP-exposure effect heterogeneity necessary for the power of multivariable MR ²⁴. Multivariable MR is adept at establishing conditionally independent exposures, but the reason that some exposures have attenuated effects in comparison to their total effects could be that a) their total effects were biased by pleiotropy or b) they are mediated by the exposures that are included in the model. Interpretations of a) and b) are very different, because in the case of mediation the exposure is a causal factor for the outcome. Second, we were primarily using the multivariable approach for practical purposes, to avoid having multiple highly related exposures taken forward to the adjustment step (e.g. multiple different measures of body composition such as body weight and BMI). This approach worked effectively, although a problem remains unsolved in automating the removal of traits that are “similar” to the outcome. For example, if a trait similar to the outcome CHD associates with an outlier and is included in the multivariable analysis of multiple exposures against CHD, then all the putative exposures will be dropped from the model. In the analyses presented we manually removed traits that came up as candidate pleiotropic pathways but were, in fact, synonymous with or closely related to the outcome. Third, we note that heterogeneity does not necessarily arise only because of pleiotropy. For example, the non-collapsibility of odds ratios will introduce heterogeneity automatically, which cannot be adjusted away through the TRYX approach. Many other mechanisms exist that can lead to bias in MR, as has been described in detail elsewhere. Fourth, SNPs can appear to be outliers not through being pleiotropic, but through other mechanisms, such as population stratification (association of alleles with phenotypes being confounded by ancestral population), canalization (developmental compensation to a genetic change) ^2,41, or the influence on phenotype being changeable across the life course ⁴². Finally, in the case of a binary outcome, there may be parametric restrictions on the conditional causal odds ratio in our multivariable MR model where the exposure effect is linear in the exposure on the log odds ratio scale ⁴³. However, the two-stage estimator with a logistic second-stage model still yields a valid test of the causal null hypothesis ⁴³.
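The non-collapsibility of odds ratios mentioned above can be seen in a small numerical sketch. It assumes a logistic model for a binary outcome with an exposure X and a binary covariate U that is independent of X (so U is not a confounder), and compares the odds ratio for X conditional on U with the odds ratio obtained after averaging over U. All names and parameter values are invented for illustration, and the sketch shows only the non-collapsibility property itself, not its downstream effect on heterogeneity between instruments.

```python
import math


def expit(v):
    """Inverse logit."""
    return 1.0 / (1.0 + math.exp(-v))


def odds(p):
    """Convert a probability to odds."""
    return p / (1.0 - p)


def marginal_odds_ratio(log_or_x, log_or_u, intercept=0.0, p_u=0.5):
    """Odds ratio for X after averaging over a binary covariate U.

    U is drawn independently of X with P(U = 1) = p_u, so any difference
    from exp(log_or_x) is due to non-collapsibility, not confounding.
    """
    def risk(x):
        without_u = expit(intercept + log_or_x * x)
        with_u = expit(intercept + log_or_x * x + log_or_u)
        return (1.0 - p_u) * without_u + p_u * with_u

    return odds(risk(1)) / odds(risk(0))


conditional_or = 2.0
# Marginal odds ratio when U has a strong effect on the outcome.
collapsed = marginal_odds_ratio(math.log(conditional_or), log_or_u=2.0)
# Marginal odds ratio when U has no effect on the outcome.
no_covariate_effect = marginal_odds_ratio(math.log(conditional_or), log_or_u=0.0)
```

With these inputs the marginal odds ratio lies between 1 and the conditional value of 2, and it coincides with the conditional value only when U has no effect on the outcome.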

In this study, we demonstrated the use of MR-TRYX through four examples of identifying putative pathways. In the first empirical example (SBP on CHD), we illustrated the validity of MR-TRYX for detecting traits that possibly influence the disease outcome. Apart from SBP, MR-TRYX also detected well-established risk factors for CHD, including adiposity, cholesterol levels and standing height. An interesting finding of this example is that headache-related traits (e.g. experience of pain due to headache and self-reported status of ibuprofen intake) were identified as candidate traits, which may influence the original association. In support of the putative finding for self-reported ibuprofen use associating with CHD, we also found that pain experienced in the last month (headache) and self-reported migraine were associated with lower risk of CHD (OR per 1 SD: 0.33; 95% CI: 0.12, 0.89 and Beta= 0.02; 95% CI: 0.0004, 0.65, respectively). A previous study reported shared genetic risk between headache (migraine) and CHD, suggesting a potential role of migraine in vascular mechanisms ⁴⁴. An alternative mechanism that could give rise to this association is that the effect of pain on lower CHD risk is entirely mediated through the use of medications such as aspirin that have known protective effects on CHD.

The example of urate and CHD demonstrated the benefit of the adjustment method, showing that the noise due to pleiotropy was substantially reduced after correcting for the effect of candidate traits. The presence of hypothyroidism and self-reported levothyroxine sodium intake status were identified as putative risk factors for CHD, which is consistent with previous clinical studies: thyroid dysfunction is associated with overall coronary risk ⁴⁵, which can be reversed by levothyroxine therapy ⁴⁶. In the education-BMI example, we showed that increased alcohol intake and slower usual walking pace may influence obesity. These identified traits have been reported as possible risk factors for higher BMI and obesity ^47,48. Additionally, the example of sleep duration and risk of schizophrenia suggested coeliac disease and body composition as putative risk factors for schizophrenia. A number of observational studies suggested that schizophrenia is linked with body composition ⁴⁹ and coeliac disease ⁵⁰. MR of binary exposures is often difficult to interpret because the instrument effects are on liability to disease, not the presence or absence of the disease. Hence, the association between coeliac disease and schizophrenia may be better interpreted as an indication of shared disease aetiology. Nevertheless, this is a valuable finding since the causal effect of those putative risk factors on risk of schizophrenia has not been investigated using an MR approach. Therefore, our example illustrates how outliers can be used to identify alternative pathways, opening the door for hypothesis-free MR approaches and a network-based approach to disease.

In conclusion, we have shown a new method to deal with the bias from horizontal pleiotropy, and to identify putative risk factors for outcomes in a more directed manner than typical hypothesis-free analyses, by exploiting outliers. Heterogeneity is widespread across MR analyses and so we are tapping into a potential new reservoir of information for understanding the aetiology of disease. The strategy is a departure from previous ones dealing with pleiotropy - we have shown that enlarging the problem by searching across all traits for a better understanding of a specific exposure-outcome hypothesis can be fruitful.

Author contributions

YC, GDS and GH conceived the study and developed the statistical analysis plan. YC and GH developed the model and methods. YC, GDS and GH prepared the first draft of the manuscript. YC, PCH, TRG, JZ, APM, GDS, and GH contributed to the writing of the manuscript. All authors reviewed and agreed on the manuscript.

Competing interests

The authors declare that they have no conflict of interest.

Data availability

The data that support the findings of this study are available from MR-Base (www.mrbase.org).

References

[1] Holmes, M. V., Ala-Korpela, M. & Davey Smith, G. Mendelian randomization in cardiometabolic disease: challenges in evaluating causality. Nat Rev Cardiol 14, 577–590, (2017).
[2] Davey Smith, G. & Ebrahim, S. 'Mendelian randomization': can genetic epidemiology contribute to understanding environmental determinants of disease? International Journal of Epidemiology 32, 1–22, (2003).
[3] Davey Smith, G. & Hemani, G. Mendelian randomization: genetic anchors for causal inference in epidemiological studies. Hum Mol Genet 23, R89–98, (2014).
[4] Johnson, T. & Uk, S. Efficient calculation for multi-SNP genetic risk scores. (2013).
[5] Pierce, B. L. & Burgess, S. Efficient design for Mendelian randomization studies: subsample and 2-sample instrumental variable estimators. Am J Epidemiol 178, 1177–1184, (2013).
[6] Bowden, J., Davey Smith, G. & Burgess, S. Mendelian randomization with invalid instruments: effect estimation and bias detection through Egger regression. International Journal of Epidemiology 44, 512–525, (2015).
[7] Bowden, J., Davey Smith, G., Haycock, P. C. & Burgess, S. Consistent Estimation in Mendelian Randomization with Some Invalid Instruments Using a Weighted Median Estimator. Genet Epidemiol 40, 304–314, (2016).
[8] Hartwig, F. P., Davey Smith, G. & Bowden, J. Robust inference in summary data Mendelian randomization via the zero modal pleiotropy assumption. Int J Epidemiol 46, 1985–1998, (2017).
[9] Hemani, G., Bowden, J. & Davey Smith, G. Evaluating the potential role of pleiotropy in Mendelian randomization studies. Hum Mol Genet, (2018).
[10] Hemani, G. et al. The MR-Base platform supports systematic causal inference across the human phenome. eLife 7, e34408, (2018).
[11] Corbin, L. J. et al. BMI as a Modifiable Risk Factor for Type 2 Diabetes: Refining and Understanding Causal Estimates Using Mendelian Randomization. Diabetes 65, 3002–3007, (2016).
[12] Hemani, G., Tilling, K. & Davey Smith, G. Orienting the causal relationship between imprecisely measured traits using GWAS summary data. Plos Genet 13, (2017).
[13] Bowden, J. et al. A framework for the investigation of pleiotropy in two-sample summary data Mendelian randomization. Stat Med 36, 1783–1802, (2017).
[14] Bowden, J. et al. Improving the visualization, interpretation and analysis of two-sample summary data Mendelian randomization via the Radial plot and Radial regression. Int J Epidemiol, (2018).
[15] Verbanck, M., Chen, C. Y., Neale, B. & Do, R. Detection of widespread horizontal pleiotropy in causal relationships inferred from Mendelian randomization between complex traits and diseases. Nat Genet 50, 693–698, (2018).
[16] Bakker, M. & Wicherts, J. M. Outlier removal, sum scores, and the inflation of the Type I error rate in independent samples t tests: the power of alternatives and recommendations. Psychol Methods 19, 409–427, (2014).
[17] Hemani, G. et al. Automating Mendelian randomization through machine learning to construct a putative causal map of the human phenome. bioRxiv, 173682, (2017).
[18] Bateson, W. The methods and scope of genetics. (Cambridge University Press, 2014).
[19] Burgess, S., Butterworth, A. & Thompson, S. G. Mendelian randomization analysis with multiple genetic variants using summarized data. Genet Epidemiol 37, 658–665, (2013).
[20] Bowden, J. et al. Improving the visualization, interpretation and analysis of two-sample summary data Mendelian randomization via the Radial plot and Radial regression. Int J Epidemiol 47, 1264–1278, (2018).
[21] Bowden, J., Hemani, G. & Davey Smith, G. Detecting individual and global horizontal pleiotropy in Mendelian randomization: a job for the humble heterogeneity statistic? Am J Epidemiol, (2018).
[22] Sterne, J. A. & Davey Smith, G. Sifting the evidence-what's wrong with significance tests? Phys Ther 81, 1464–1469, (2001).
[23] Wasserstein, R. L. & Lazar, N. A. The ASA's Statement on p-Values: Context, Process, and Purpose. Am Stat 70, 129–131, (2016).
[24] Sanderson, E., Davey Smith, G., Windmeijer, F. & Bowden, J. An examination of multivariable Mendelian randomization in the single sample and two-sample summary data settings. bioRxiv, (2018).
[25] Bennett, D. A. & Holmes, M. V. Mendelian randomisation in cardiovascular research: an introduction for clinicians. Heart 103, 1400–1407, (2017).
[26] White, J. et al. Plasma urate concentration and risk of coronary heart disease: a Mendelian randomisation analysis. Lancet Diabetes Endocrinol 4, 327–336, (2016).
[27] Tyrrell, J. et al. Height, body mass index, and socioeconomic status: mendelian randomisation study in UK Biobank. BMJ 352, i582, (2016).
[28] Kaskie, R. E., Graziano, B. & Ferrarelli, F. Schizophrenia and sleep disorders: links, risks, and management challenges. Nat Sci Sleep 9, 227–239, (2017).
[29] Kleber, M. E. et al. Uric Acid and Cardiovascular Events: A Mendelian Randomization Study. J Am Soc Nephrol 26, 2831–2838, (2015).
[30] Strom, D., Dudovitz, R., Guerrero, L. R. & Wong, M. D. The Link between Education and Health: It Is Not What You Know, but Whom You Know. J Gen Intern Med 30, S277–S278, (2015).
[31] Bockerman, P. et al. Does higher education protect against obesity? Evidence using Mendelian randomization. Prev Med 101, 195–198, (2017).
[32] Cohen, A. K., Rai, M., Rehkopf, D. H. & Abrams, B. Educational attainment and obesity: a systematic review. Obes Rev 14, 989–1005, (2013).
[33] Okbay, A. et al. Genome-wide association study identifies 74 loci associated with educational attainment. Nature 533, 539–542, (2016).
[34] Locke, A. E. et al. Genetic studies of body mass index yield new insights for obesity biology. Nature 518, 197–206, (2015).
[35] Zhu, Z. H. et al. Causal associations between risk factors and common diseases inferred from GWAS summary data. Nat Commun 9, (2018).
[36] Corbin, L. J. et al. BMI as a Modifiable Risk Factor for Type 2 Diabetes: Refining and Understanding Causal Estimates Using Mendelian Randomization. Diabetes 65, 3002–3007, (2016).
[37] Hartwig, F. P., Borges, M. C., Horta, B. L., Bowden, J. & Davey Smith, G. Inflammatory Biomarkers and Risk of Schizophrenia A 2-Sample Mendelian Randomization Study. Jama Psychiat 74, 1226–1233, (2017).
[38] Burgess, S., Thompson, S. G. & CRP CHD Genetics Collaboration. Avoiding bias from weak instruments in Mendelian randomization studies. Int J Epidemiol 40, 755–764, (2011).
[39] Jiang, L. et al. Constrained Instruments and their Application to Mendelian Randomization with Pleiotropy. bioRxiv, 227454, (2017).
[40] Visscher, P. M. et al. 10 Years of GWAS Discovery: Biology, Function, and Translation. Am J Hum Genet 101, 5–22, (2017).
[41] Davey Smith, G. & Ebrahim, S. Mendelian randomization: prospects, potentials, and limitations. Int J Epidemiol 33, 30–42, (2004).
[42] Tan, Q. et al. Analyzing age-specific genetic effects on human extreme age survival in cohort-based longitudinal studies. Eur J Hum Genet 21, 451–454, (2013).
[43] Vansteelandt, S., Bowden, J., Babanezhad, M. & Goetghebeur, E. On Instrumental Variables Estimation of Causal Odds Ratios. Stat Sci 26, 403–422, (2011).
[44] Winsvold, B. S. et al. Shared genetic risk between migraine and coronary artery disease: A genome-wide analysis of common variants. Plos One 12, (2017).
[45] Rodondi, N. et al. Subclinical hypothyroidism and the risk of coronary heart disease and mortality. JAMA 304, 1365–1374, (2010).
[46] Fadeyev, V. V. et al. Levothyroxine replacement therapy in patients with subclinical hypothyroidism and coronary artery disease. Endocr Pract 12, 5–17, (2006).
[47] Cho, Y. et al. Alcohol intake and cardiovascular risk factors: A Mendelian randomisation study. Sci Rep 5, 18422, (2015).
[48] Williams, P. T. & Thompson, P. D. The relationship of walking intensity to total and cause-specific mortality. Results from the National Walkers' Health Study. Plos One 8, e81098, (2013).
[49] Sugawara, N. et al. Body composition in patients with schizophrenia: Comparison with healthy controls. Annals of general psychiatry 11, 11, (2012).
[50] Ludvigsson, J. F., Osby, U., Ekbom, A. & Montgomery, S. M. Coeliac disease and risk of schizophrenia and other psychosis: a general population cohort study. Scand J Gastroenterol 42, 179–185, (2007).

Posted November 26, 2018.


