Abstract
Background In Mendelian randomization (MR) analysis, variants that exert horizontal pleiotropy, influencing the outcome through a pathway excluding the hypothesised exposure, are typically treated as a nuisance. However, they could provide valuable information for identifying novel pathways to the traits under investigation.
Methods Following the advice of William Bateson to “TReasure Your eXceptions”, we developed the MR-TRYX framework. Here, we begin by detecting outliers in a single exposure-outcome MR analysis. Outliers are hypothesised to arise due to horizontal pleiotropy, so we search through the MR-Base database of GWAS summary statistics to systematically identify other (“candidate”) traits that associate with the outliers. We developed a LASSO-based multivariable MR approach to model the heterogeneity in the exposure-outcome analysis due to pathways through candidate traits.
Results Through simulations we showed that commonly used outlier removal methods can increase type 1 error rates, but adjustment for detected pleiotropic pathways can improve power without the increase in type 1 error rates. We illustrate the use of MR-TRYX through investigation of several causal relationships: i) systolic blood pressure on coronary heart disease (CHD); ii) urate on CHD; iii) sleep duration on schizophrenia; and iv) education level on body mass index. Many pleiotropic pathways were uncovered with already established causal effects, validating the approach. Novel putative causal pathways, such as pain related phenotypes influencing CHD, were also identified. Adjustment for these pleiotropic pathways substantially reduced the heterogeneity across the analyses.
Conclusion Incorporating GWAS on thousands of traits in MR-Base to model horizontal pleiotropy in MR analysis can improve power through reducing heterogeneity, whilst enabling the identification of novel causal relationships.
Introduction
Mendelian randomization (MR) is now widely used to infer the causal influence of one trait (the exposure) on another (the outcome) 1,2. It is generally performed by obtaining genetic instruments for an exposure through genome-wide association studies (GWAS). If the instruments are valid, in that they are unconfounded and influence the outcome only through the exposure (vertical pleiotropy), then they will each provide an independent, unbiased estimate of the causal effect of the exposure on the outcome 3. Meta-analysing these estimates can provide a more precise estimate of the causal relationship between the exposure and the outcome 4,5. If, however, some of the instruments are invalid, particularly because they additionally influence the outcome through pathways that bypass the exposure (horizontal pleiotropy) 3, then the causal effect estimate is liable to be biased. To date, MR method development has viewed horizontal pleiotropy as a nuisance that needs to be factored out of the analysis6-9. Departing from this viewpoint, here we exploit horizontal pleiotropy as an opportunity to identify new traits that putatively influence the outcome. We then use this knowledge to improve the original exposure-outcome estimates.
A crucial feature of MR is that it can be performed using only GWAS summary data, where the causal effect estimate can be obtained solely from the association results of the instrumental single nucleotide polymorphisms (SNPs) on the exposure and on the outcome 5. This means that causal inference between two traits can be made even if they have never been measured together in the same sample of individuals. Complete GWAS summary results have now been collected for thousands of complex traits and common diseases 10, meaning that one can search the database for candidate traits that might be influenced by the outliers. In turn, the causal influence of each of those candidate traits on the outcome can be estimated using MR by identifying their instruments (and excluding the original outlier). Should any of these candidate traits putatively associate with the outcome then this goes some way towards explaining the horizontal pleiotropic effect that was exhibited by the outlier SNP in the initial exposure-outcome hypothesis.
Several methods exist for identifying outliers in MR, each likely to be sensitive to different patterns of horizontal pleiotropy. Cook’s distance can be used to measure the influence of a particular SNP on the combined estimate from all SNPs 11, identifying SNPs with large influences as outliers. Steiger filtering removes those SNPs that do not explain substantially more of the variance in the exposure trait than in the outcome, attempting to guard against using SNPs as instruments that are likely to be associated with the outcome through a pathway other than the exposure 12. Finally, meta-analysis tools can be used to evaluate if a particular SNP contributes disproportionately to the heterogeneity between the estimates obtained from the set of instruments, and this has been adapted recently to detect outliers in MR analysis 13-15. A potential limitation of heterogeneity-based outlier removal is that this practice is a form of cherry picking 9,16. While outlier removal can certainly improve power by reducing noise in estimation, it could also potentially induce higher type 1 error rates, which we go on to explore through simulations.
Recent large-scale MR scans have indicated that horizontal pleiotropy is widespread based on systematic analysis of heterogeneity 15,17. This suggests that many SNPs used as instruments are likely to associate with other traits, which in turn might associate with the original outcome of interest - hence giving rise to heterogeneity. As such we have an opportunity to identify novel pathways through exploiting outliers. Equipped with automated MR analysis software, outlier detection methods and a database of complete GWAS summary datasets, we developed MR-TRYX (from the phrase coined by William Bateson, “Treasure your exceptions18”), a framework for identifying novel putative causal factors when performing a simple exposure-outcome analysis. In this paper we present simulations to show how knowledge of horizontal pathways can be used to discover novel putative causal factors for an outcome of interest, and to also improve the power and reliability of the original exposure-outcome association analysis. We apply MR-TRYX to several exemplar analyses to demonstrate its potential utility.
Methods
Overview of MR-TRYX
Figure 1 shows an overview of the approach. MR-TRYX is applied to an exposure-outcome analysis and it has two objectives. The first is to use outliers in the original exposure-outcome analysis to identify novel putative factors that influence the outcome independently of the exposure. The second is to re-estimate the original exposure-outcome association by adjusting outlier SNPs for the horizontal pleiotropic pathways that might arise through the novel putative associations.
Outlier detection
Several outlier detection methods now exist that are based on the contribution of each SNP to overall heterogeneity in an inverse-variance weighted (IVW) meta-analysis 19. We used the approach implemented in the RadialMR R package (https://github.com/WSpiller/RadialMR) to detect outliers. Full details are provided elsewhere 20, but briefly, we used the so-called ‘modified 2nd order weighting’ approach to estimate total Cochran’s Q statistic as a measure of heterogeneity, as well as the individual contributions of each SNP, qi20. This has been shown to be comparable to the simulation-based approach in MR-PRESSO 15,12. The probability of a SNP being an outlier is calculated based on qi being chi-square distributed with 1 degree of freedom. For demonstration purposes we adopted a conservative p-value threshold for identifying outliers, dividing 0.05 by the number of SNPs as a correction for multiple testing. We are not, however, suggesting that this arbitrary threshold will be optimal for identifying outliers, and users can apply other approaches or thresholds through the MR-TRYX software. We employed modified 2nd order weights throughout this paper to avoid problems arising due to the no measurement error in the exposure (NOME) assumption 20, assuming a multiplicative random effects model if any residual heterogeneity was detected.
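The outlier test described above can be sketched in a few lines of Python. This is a simplified, one-step illustration and not the RadialMR implementation: the function names are hypothetical, the weights follow the general form of modified 2nd order weights (SNP-outcome variance plus the squared IVW estimate times the SNP-exposure variance, scaled by the squared SNP-exposure effect), and no iteration or random effects adjustment is attempted.

```python
import math

def ivw_estimate(bx, by, weights):
    """Weighted mean of the per-SNP Wald ratios by/bx."""
    ratios = [y / x for x, y in zip(bx, by)]
    return sum(w * r for w, r in zip(weights, ratios)) / sum(weights)

def q_contributions(bx, by, se_x, se_y):
    """Return the IVW estimate, each SNP's contribution q_i to Cochran's Q,
    and a chi-square (1 df) p-value for each q_i."""
    ratios = [y / x for x, y in zip(bx, by)]
    # First-order weights give a starting value for the causal estimate.
    w1 = [x ** 2 / sy ** 2 for x, sy in zip(bx, se_y)]
    beta = ivw_estimate(bx, by, w1)
    # Weights that also carry SNP-exposure uncertainty, evaluated at beta.
    w2 = [x ** 2 / (sy ** 2 + beta ** 2 * sx ** 2)
          for x, sx, sy in zip(bx, se_x, se_y)]
    beta = ivw_estimate(bx, by, w2)
    q = [w * (r - beta) ** 2 for w, r in zip(w2, ratios)]
    # Chi-square survival function with 1 df: P(X > q) = erfc(sqrt(q / 2)).
    p = [math.erfc(math.sqrt(qi / 2.0)) for qi in q]
    return beta, q, p

def flag_outliers(p_values, alpha=0.05):
    """Indices of SNPs whose p-value is below alpha / number of SNPs."""
    threshold = alpha / len(p_values)
    return [i for i, p in enumerate(p_values) if p < threshold]
```

Passing lists of SNP-exposure effects, SNP-outcome effects and their standard errors to `q_contributions`, and the resulting p-values to `flag_outliers`, returns the positions of the SNPs that would be taken forward as outliers under the Bonferroni-style threshold.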
Candidate trait detection
Traits associated with the detected outliers could causally influence the outcome. MR-TRYX searches the MR-Base database to identify the traits that have associations with the detected outliers. By default, we limit the search to traits for which the GWAS results registered at MR-Base have more than 500,000 SNPs and sample sizes exceeding 5,000. Traits that have an association with outlier SNPs at genome-wide p-value threshold (p < 5 x 10−8; in keeping with traditional GWAS thresholds used for instrument selection) are regarded as potential risk factors for the outcome and defined as “candidate traits". Each candidate trait is tested for its influence on the original exposure and outcome traits (Figure 1) using the IVW random effects model. We take forward putative associations based on FDR < 0.05 but we note that the use of arbitrary thresholds is problematic 22,23, and we use them here to make high dimensional investigations more manageable.
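As an illustration of the FDR filter applied to candidate traits, the sketch below computes adjusted p-values and keeps those below 0.05. The text does not state which FDR procedure is used, so the Benjamini-Hochberg step-up procedure is an assumption here, and the function name is hypothetical.

```python
def benjamini_hochberg(p_values):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted

def select_candidates(trait_names, p_values, fdr=0.05):
    """Names of candidate traits whose adjusted p-value is below fdr."""
    adjusted = benjamini_hochberg(p_values)
    return [t for t, a in zip(trait_names, adjusted) if a < fdr]
```

Here `trait_names` and `p_values` stand for the candidate traits and the p-values of their IVW estimates on the outcome.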
Assessing causal estimates of the association of candidate trait with the outcome
Suppose we have g0,gx1,…,gxE instruments for the exposure x, where g0 is an outlier in the x-y MR analysis due to an association with candidate trait p, and where E indicates the number of genetic variants. Also, p has g0,gp1,…,gpM genetic instruments, where M is the number of genetic variants for p. To obtain an estimate of (py) uncontaminated by shared genetic effects between p and x (Figure 1A), we perform multivariable MR analysis 24. We obtain a unique list of T clumped instruments for both x and p, and then obtain the genetic effects of each of these SNPs on the exposure (gx), candidate trait (gp), and outcome (gy). Finally, we estimate the causal influence of p on y conditioning on x by regressing (gy) ~ (gx) + (gp), weighted by the inverse of the variance of the (gy) estimates. The whole process is automated within the TwoSampleMR R package, which connects to the MR-Base database.
In the case of an outlier SNP associating with many candidate traits, we first apply a LASSO regression of (gy) ~ (gx) + (gp1) + … + (gpP) and use cross validation to obtain the shrinkage parameter that minimises the mean squared error. We retain only the candidate traits that are putatively associated with the outcome and have non-zero effects after shrinkage. We then include the remaining traits in a multivariable model with x against the outcome, as described above 24. We perform the LASSO step because many traits in the MR-Base database have considerable overlap and redundancy, and the statistical power of multivariable analysis depends on the heterogeneity between the genetic effects on the exposure variables 24. Using LASSO therefore automates the removal of redundant traits. With the remaining traits we then obtain estimates of (py) that are conditionally independent of x and of all other retained candidate traits by combining them in a multivariable analysis on the outcome y.
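The weighted regression of (gy) on (gx) and (gp) can be illustrated with a small closed-form solver. This is a minimal sketch and not the TwoSampleMR code: the function name is hypothetical, only one candidate trait is handled, and the intercept is assumed to be fixed at zero.

```python
def weighted_mvmr(gx, gp, gy, se_gy):
    """Regress SNP-outcome effects on SNP-exposure and SNP-candidate
    effects, weighting each SNP by 1 / se_gy**2 (no intercept).
    Returns (effect of x on y, effect of p on y)."""
    w = [1.0 / s ** 2 for s in se_gy]
    sxx = sum(wi * x * x for wi, x in zip(w, gx))
    spp = sum(wi * p * p for wi, p in zip(w, gp))
    sxp = sum(wi * x * p for wi, x, p in zip(w, gx, gp))
    sxy = sum(wi * x * y for wi, x, y in zip(w, gx, gy))
    spy = sum(wi * p * y for wi, p, y in zip(w, gp, gy))
    # Determinant of the 2x2 weighted normal equations.
    det = sxx * spp - sxp ** 2
    if abs(det) <= 1e-12 * sxx * spp:
        raise ValueError("SNP effects on x and p are collinear")
    b_xy = (spp * sxy - sxp * spy) / det
    b_py = (sxx * spy - sxp * sxy) / det
    return b_xy, b_py
```

The collinearity check reflects the point made in the text that the analysis needs the genetic effects on the exposure and on the candidate trait to differ across SNPs.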
Adjusting exposure-outcome associations for known candidate-trait associations
An illustration of how outliers arise in MR analyses is shown in Figure 2. If a SNP g has some influence on exposure x, and x has some influence on outcome y, the SNP effect on y is expected to be (gy) = (gx)(xy), where (gx) is the SNP effect on x and (xy) is the causal effect of x on y. Any substantive difference between (gy) and (gx)(xy) could be due to an additional influence on y arising from the SNP’s effect through an alternative pathway.
If a SNP influences a ‘candidate trait’, p, which in turn influences the outcome (or the exposure and the outcome), then the SNP’s influence on the exposure and the outcome will be a combination of its direct effects through x and indirect effects through p 24. If we have estimates of how the candidate trait influences the outcome, then we can adjust the original SNP-outcome estimate to the effect that it would have exhibited had it not been influencing the candidate trait. In other words, we can obtain an adjusted SNP-outcome effect conditional on the ‘candidate-trait - exposure’ and ‘candidate-trait - outcome’ effects. If the SNP influences P independent candidate traits p1, …, pP (as selected from the LASSO step), then the expected effect of the SNP on y is
(gy) = (gx)(xy) + Σi (gpi)(piy), with the sum taken over i = 1, …, P
Hence, the effect of the SNP on the outcome adjusted for alternative pathways p1, …, pP is
(gy)* = (gy) − Σi (gpi)(piy)
We use parametric bootstraps to estimate the standard error of the (gy)* estimate: 1000 resamples of (gy), (gp) and (py) are drawn based on their respective standard errors, and the standard deviation of the resulting (gy)* estimates is taken as its standard error. Finally, an adjusted effect estimate of (xy) due to SNP g is obtained through the Wald ratio.
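A minimal sketch of the adjustment and bootstrap steps follows. It assumes that the adjusted effect (gy)* is the observed SNP-outcome effect minus the sum, over candidate traits, of the SNP-candidate effect times the candidate-outcome effect, and that each resample is drawn from an independent normal distribution centred on the estimate. The function names and the fixed seed are illustrative choices and are not taken from the MR-TRYX package.

```python
import random
import statistics

def adjust_snp_outcome(b_gy, b_gp, b_py):
    """Remove the part of the SNP-outcome effect that is assumed to act
    through the candidate traits: b_gy - sum(b_gp[i] * b_py[i])."""
    return b_gy - sum(gp * py for gp, py in zip(b_gp, b_py))

def bootstrap_adjusted_se(b_gy, se_gy, b_gp, se_gp, b_py, se_py,
                          n_boot=1000, seed=1):
    """Parametric bootstrap standard error of the adjusted effect."""
    rng = random.Random(seed)
    draws = []
    for _ in range(n_boot):
        gy = rng.gauss(b_gy, se_gy)
        gp = [rng.gauss(b, s) for b, s in zip(b_gp, se_gp)]
        py = [rng.gauss(b, s) for b, s in zip(b_py, se_py)]
        draws.append(adjust_snp_outcome(gy, gp, py))
    # The spread of the resampled estimates is taken as the standard error.
    return statistics.stdev(draws)

def wald_ratio(b_gy_adjusted, b_gx):
    """Per-SNP causal estimate: adjusted SNP-outcome effect divided by
    the SNP-exposure effect."""
    return b_gy_adjusted / b_gx
```

For a SNP with two candidate traits, `adjust_snp_outcome` gives the adjusted effect, `bootstrap_adjusted_se` its standard error, and `wald_ratio` the adjusted per-SNP estimate of (xy) that enters the re-estimated IVW analysis.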
Simulations
IVW effect estimates are liable to be biased when at least some of the instrumenting SNPs exhibit horizontal pleiotropy, and those SNPs tend to contribute disproportionately towards the heterogeneity in the effect estimate. We assess the performance of MR-TRYX against outlier removal methods with respect to the ability to address problems that arise due to horizontal pleiotropy (bias, low power and inflated false discovery rates). In these simulations we ask: if we can identify the pathway through which an outlier SNP has a horizontal pleiotropic effect, can adjustment for that pathway improve the original exposure-outcome analysis? Two simulation scenarios are performed, the first using a null causal effect ((xy) = 0), and the second a positive causal effect ((xy) = 0.2). In each set, four methods are considered for handling outliers:
1) Raw, where all SNPs are used in a standard IVW analysis.
2) Outliers adjusted, where the outlier SNPs are adjusted for the effect of the candidate trait on the outcome using MR-TRYX.
3) All outliers removed, where all detected outliers are removed.
4) Candidate outliers removed, where only outliers that are found to influence a candidate trait are removed.
We run the latter three methods by detecting outliers empirically, but also run the hypothetical case in which we know the pleiotropic variants a priori as a “gold-standard” for comparison. Individual level data are generated in a two-sample MR setting, where the SNP-exposure and SNP-outcome associations are estimated in non-overlapping sets of individuals (n=5000). The relevant association summary statistics for two-sample MR are obtained from a regression of trait on genotype under an additive genetic model. We varied the number of pleiotropic SNPs from 0 (no pleiotropic SNPs) to 30 out of 30. The IVW estimator was used as the comparator as this approach is the standard in MR. The results for each case represent the mean values for 1000 simulated datasets. The detailed information and the script used for the simulations can be found elsewhere (https://github.com/explodecomputer/tryx-analysis).
Empirical analyses
As applied examples, we chose two robust findings and two controversial findings that are potentially biased due to pleiotropy: i) systolic blood pressure (SBP) and coronary heart disease (CHD); ii) urate and CHD; iii) sleep duration and schizophrenia; and iv) education level (years of schooling) and body mass index (BMI). Those examples were chosen based on previous findings 25-28 to illustrate how pleiotropic variants can be used to identify other pathways and adjusted to estimate the causal effect of the original exposure on the outcome independent of pleiotropic bias.
Summary statistics (beta coefficients and SEs) for the associations of the SNPs with each exposure were obtained from the publicly available GWAS database (Supplementary Table S1). Selected SNPs were harmonised for the analysis, excluding palindromic SNPs and pruning for linkage disequilibrium (r2 <0.001). We primarily used the two-sample MR IVW method to obtain causal estimates between exposures and outcomes allowing each SNP to have different mean effect (random effects model). A number of sensitivity analyses were applied to evaluate the consistency of causal effect estimates under different models of pleiotropy amongst the SNPs, including the MR-Egger6, weighted median and weighted mode approaches7,8.
Outliers were detected among the instruments for each exposure (P < 0.05 / the number of SNPs). We searched the MR-Base database to identify the candidate traits that are associated with outliers (p < 5 × 10−8). We then performed multivariable MR analysis to test which candidate trait can explain the heterogeneity in the original exposure-outcome association. To perform multivariable MR, more SNPs were introduced into the analysis that instrument the candidate traits.
Subsequently we re-estimated the association of the original exposure and the original outcome using different sets of instruments: a) all SNPs (corresponding to the raw method in our simulation); b) outliers adjusted; c) all outliers removed; and d) candidate outliers removed.
All analyses were conducted with the TwoSampleMR package of MR-Base (https://github.com/MRCIEU/TwoSampleMR) and the MR-TRYX package (https://github.com/explodecomputer/tryx) in R statistical software (ver 3.4.1).
Results
Simulations
Our simulations show that as the proportion of SNPs exhibiting (balanced) horizontal pleiotropy increases, type 1 error rates for the outlier removal approaches also increase (Figure 2A). Type 1 error rates are maintained at expected levels when adjusting for outliers. A similar pattern of results among the three methods (the raw, outlier removal and outlier adjustment) is seen for the likelihood of estimates being biased (Figure 2B), where outlier removal and raw estimates also performed worse than outlier adjustment.
For simulations in which there was a true causal effect, we observed that outlier removal methods had higher power, consistent with them having higher false discovery rates in the null simulations (Figure 2). However, outlier adjustment improved power over the ‘raw’ IVW approach. This is likely because balanced heterogeneity increases the standard error and adjusting away the pleiotropic effects reduces this noise term. Bias was elevated substantially in the outlier removal methods as the proportion of SNPs with pleiotropic effects increased, whereas bias was lowest for the adjustment-based method, and independent of the level of pleiotropy across the SNPs.
Outlier removal and outlier adjustment performance are limited by the efficacy and power of outlier detection methods: we note that when we assume all outliers are detected correctly in our simulation scenarios the performance of outlier removal and outlier adjustment both improve in terms of FDR, power and bias. Outlier adjustment is also dependent on availability of GWAS summary data for the candidate trait(s), and on the power to detect a variant's association with the candidate trait(s).
Empirical MR-TRYX analyses using four exposure-outcome hypotheses
To examine the performance of MR-TRYX analysis, we tested four independent exposure-outcome hypotheses. For each analysis we: a) obtain MR estimates of the exposure-outcome causal relationship and detect outlier instruments; b) identify putative novel influences (candidate traits) on the outcome trait based on their associations with outlier variants (Table 1; Supplementary Table S2); c) adjust the original SNP-outcome estimates for the putative influences operating through the candidate traits (Table 2); and d) compare the changes in heterogeneity in the MR estimates of the adjusted SNP-outcome effects to standard outlier removal methods (Figure 4).
Example 1: Systolic blood pressure and coronary heart disease
Blood pressure is a well-established risk factor for CHD. Random effects IVW estimates indicated that higher SBP is causally associated with higher risk of CHD (Odds ratio [OR] per 1 SD: 1.76; 95% CI: 1.47, 2.10). While there was substantial heterogeneity in this estimate (Q=682.7 on 157 SNPs, p=5.74 × 10−67), the estimates from MR Egger, weighted median and weighted mode methods were consistent (Table 2). Seven of the 157 SNPs were detected as strong outliers based on Q statistics. We identified 69 candidate traits that were associated with these outliers (p < 5 x 10−8). We manually removed redundant traits and traits that are similar to the exposure and the outcome (e.g. high blood pressure). Among the candidate traits, 15 were putatively causally associated with the risk of CHD (Figure 3A). After we applied LASSO regression, 6 traits remained (Table 1): anthropometric measures (e.g. height), lipid levels (e.g. cholesterol level), and self-reported ibuprofen use were amongst the candidate traits that associated with CHD, which were uncovered due to two outliers (rs3184504 near SH2B3 and rs9349279 near PHACTR).
We next adjusted the exposure-outcome association for the detected pleiotropic pathways and obtained an adjusted IVW estimate. The total heterogeneity, based on adjusting only these two of 157 SNP effects, was reduced by 17% (Q=567.6). The effect estimate remained consistent with the original estimate, as did the IVW estimates when removing all outliers, or just outliers known to associate with candidate traits that associated with the outcome. However, the confidence interval was substantially wider (and included the null) after removing outliers known to associate with candidate traits (OR per 1 SD: 1.80; 95% CI: 0.56, 5.79).
Example 2: Urate and coronary heart disease
Here we show an example with mixed findings from previous studies. The influence of circulating urate levels on risk of coronary heart disease has been under debate. Several MR studies have investigated the inflated effect of urate on CHD, which appeared to be influenced by pleiotropy 26,29. We re-estimated the associations here using a range of MR methods. As has been previously reported, the estimate from IVW suggested a weak association between urate and the risk of CHD using all variants (OR per 1 SD: 1.08; 95% CI: 1.00, 1.17), while there was a large intercept in the MR-Egger analysis (intercept = 1.02; 95% CI: 1.00, 1.03) with a much-attenuated causal effect estimate (Table 2). The median and mode-based estimates were also consistent with the MR-Egger estimate, indicating weak support for urate having a causal influence on CHD. Three variants were detected as outliers, which associated with 61 candidate traits (p < 5 × 10−8). Among those outliers, rs653178 and rs642803 were associated with 14 traits that had conditionally independent influences on the outcome (Figure 3B), including anthropometric measures (e.g. hip circumference), cholesterol levels, diagnosis of thyroid disease, and smoking status.
Removing the outliers in the IVW analysis led to a more precise (though slightly attenuated) estimate of the influence of higher urate levels on CHD risk (OR per 1 SD: 1.05; 95% CI: 1.01, 1.10 and OR per 1 SD: 1.06, 95% CIs: 1.06, 1.12, respectively, Table 2). The adjustment model also indicated an attenuated IVW estimate in comparison to the ‘raw’ approach, with confidence intervals spanning the null (OR per 1 SD: 1.07, 95% CI: 0.99, 1.16), whilst the degree of heterogeneity was reduced by half by accounting for the pleiotropic pathways through two outlier SNPs. The adjusted scatter plot showed that outliers moved towards the fitted line after controlling for the SNP effect on the candidate traits (Figure 4B). The results in this analysis suggest that it is unlikely that urate has a strong causal influence on CHD. Here, outlier removal appears to strengthen the evidence in a way that may lead to a wrong conclusion.
Example 3: Sleep duration and schizophrenia
Previous studies have shown that sleep disorder is associated with schizophrenia 28. However, none of them confirmed the causality between sleep disorder and schizophrenia. We observed weak evidence for any association between sleep duration and schizophrenia (OR per 1 SD: 1.18; 95% CIs: 0.57, 2.45), but there was substantial heterogeneity when all SNPs were used (Q= 204.8, p=6.9 x 10−26). Six outlier instruments were detected, which associated with 46 candidate traits (p < 5 x 10−8). Among those outliers, the SNPs rs7764984 (near HIST1H2BJ) and rs13107325 (near SLC39A8) were associated with three traits that putatively influenced the outcome: self-reported coeliac disease, body composition (impedance of leg) and memory function (Figure 4C).
We re-estimated the original association accounting for the detected outliers. The degree of heterogeneity was reduced by 74% (Q=54.1) when removing all 6 outliers and by 46% (Q=147.7) when adjusting for the two SNP effects that had putative pleiotropic pathways. Both methods of outlier removal and adjustment provide similar estimates in terms of direction, whilst the magnitude of estimates differed. After removing outliers, MR Egger causal estimates were substantially larger (OR per 1 SD= 2.43; 95% CI: 0.49, 12.16 and Beta= 0.20; 95% CI: −0.40, 0.79, respectively) than those from the method using all variants. IVW causal estimates from the adjustment method were virtually identical with the original estimates, with narrower CIs (OR per 1 SD= 2.36; 95% CI: 0.25, 21.96). While all methods indicate that sleep duration is unlikely to be a major causal risk factor for schizophrenia, pursuing outliers in the analysis provided putative indications that coeliac disease and memory function may be risk factors for schizophrenia (Figure 4D).
Example 4: Years of schooling and body mass index
The association of education and health outcome is well established in social science 30. Higher socioeconomic position is generally thought to lead to a lower risk of obesity in high-income countries 31,32. We used 59 independent genetic instruments 33 to estimate the influence of years of schooling on BMI34 (Table 2). All MR methods indicated that years of schooling has a causal beneficial effect on BMI (e.g. IVW Beta: −0.27; 95% CI: −0.39, −0.16), except the estimate from MR Egger which had a very imprecise estimate (beta: 0.01; 95% CI: −0.67, 0.70), but the degree of heterogeneity was large (Q = 211.9 on 59 SNPs; p=2.20 × 10−8). Three outliers (rs6882046 near LINC00461, rs4800490 near NPC1, rs8049439 near ATXN2L) were identified as contributors to heterogeneity, and they showed associations (p < 5 x 10−8) with 48 candidate traits. Among those candidate traits, two were associated with BMI (Figure 3B): alcohol intake frequency (which associated with all three outliers) and usual walking pace.
We next re-estimated the influence of years of schooling on BMI by accounting for outliers. Adjusting the outliers for candidate trait pathways such as alcohol intake and usual walking pace reduced heterogeneity by 15% and had a small reduction in the confidence intervals while the point estimate remained consistent (Table 1). By contrast, there was a 48% reduction in heterogeneity when removing outliers. Point estimates remained largely consistent across all outlier removal methods. However, we note that Figure 4B shows that one of the outliers (rs4800490, near gene NPC1) on the scatter plot moved away from the fitted line after adjusting for the pleiotropic pathway, indicating that if this outlier is due to a pleiotropic pathway we have estimated its indirect effect inaccurately or partially (e.g. where GWAS summary statistics are not available to identify other effective pleiotropic pathways).
Discussion
The problem of instrumental variables being invalid due to horizontal pleiotropy has received much attention in MR analysis. Detecting and excluding such invalid instruments, based on whether they appear to be outliers in the analysis, is now a common strategy that exists in various forms7,8,14,15,35. We have shown here that outlier removal could, in some circumstances, compound rather than reduce bias, and misses an opportunity to better understand the traits under study. We developed the MR-TRYX framework, which utilises the MR-Base database 10 of GWAS summary data to identify potential explanations for outlying SNP instruments, and to improve estimates by accounting for the pleiotropic pathways that give rise to them. We have also demonstrated the use and interpretation of MR-TRYX in four sets of empirical analyses.
For accurate performance, MR-TRYX depends upon the performance of three methodological components: (i) detecting instruments that exhibit horizontal pleiotropy; (ii) identifying the candidate traits on the alternative pathways from the variant to the outcome; and (iii) correctly estimating the effects of the candidate traits on the outcome. Each of these components is a difficult problem, but they are all modular and build upon existing methods and resources, and the MR-TRYX framework will naturally improve as those methods and resources themselves improve. We will now discuss the consequences of underperformance of each of these components on the TRYX analysis.
The classification of an outlier in MR analysis can be based on statistical evidence that a SNP has been included as an instrument because of reverse causation (Steiger filtering) 12,17, the extent to which a single SNP disproportionately influences the overall result (e.g. Cook's distance) 36, or most commonly the extent to which a SNP contributes to heterogeneity (e.g. Cochran’s Q statistic, MR-PRESSO, and implicitly in median- and mode-based estimators) 7,8,14,15. The philosophy of the latter two approaches is that proving horizontal pleiotropy is impossible, but that it should lead to outliers 9. While a useful approximation, these approaches have two main limitations. First, determining whether a SNP is an outlier depends on the use of arbitrary thresholds, and this entails a trade-off between specificity and sensitivity. Second, if most variants are pleiotropic, then it is possible that the outlier SNPs are the only valid instruments. Such a scenario can arise for complex traits such as gene expression or protein levels that have a few large effects and many small effects. For example, for C-reactive protein (CRP) levels, the SNP in the CRP gene region is likely the only valid instrument in some analyses 37. In this context, bias due to horizontal pleiotropy cannot be avoided by selection of instruments since this approach may generate more bias 38. This is supported by our simulations, which demonstrate that in the presence of extensive pleiotropy removing outliers increased FDR and bias.
MR-TRYX should, in principle, avoid the problem of outlier removal because instead of removing outliers in their entirety, it attempts to eliminate the component of the SNP-outcome effect that is due to horizontal pleiotropy. Hence, we avoid implicitly cherry picking from amongst the SNPs to be used in the analysis, and if we have low specificity (i.e. a more relaxed threshold for outlier detection) it does not mean that there will be an unnecessary loss of power in the overall analysis. Previous work has adjusted for the effect of pleiotropic phenotypes, but treated pleiotropic phenotypes as exogenous variables that are not associated with the causal pathways of interest 39. In MR-TRYX, candidate traits are treated as endogenous variables to account for the effect of the traits on the original association. Moreover, our method is applicable in the two-sample context, whereas the previous method requires individual level data. The problem of outlier detection which remains in MR-TRYX could be sidestepped by applying the adjustment approach to all SNPs irrespective of their contributions to heterogeneity.
Upon identification of potentially pleiotropic SNPs, MR-TRYX can only account for these if the pathways through which pleiotropy is acting can be identified. Detecting the pathways depends on the density and coverage of the human phenome available for the analysis. We use the MR-Base database of GWAS summary results, which comprises several hundred independent traits. While a valuable resource, it certainly does not cover the whole human phenome, and therefore even if a pleiotropic variant is detected correctly, it may not be possible to adjust it away. In the empirical analyses, often fewer than half of the candidate traits were inferred to be associated with the outcome. Yet, as we illustrated, MR-TRYX allows for an informative analysis that could routinely be applied in MR analyses. Broadening phenotype coverage is an on-going pursuit that will continually improve MR-TRYX analysis 40.
Finally, it is necessary for the effects through the identified pleiotropic pathways to be accurately estimated. This is a recursive problem: MR-TRYX adjusts the SNP-outcome effects based on the pleiotropic effect through the outlier SNP, but it does this by introducing more SNPs into the analysis that instrument the candidate traits. These new SNPs may themselves exhibit pleiotropic effects, which could lead to bias in the estimates of the candidate traits on the outcome, requiring a second round of TRYX-style candidate trait searches, and so on. In the example of education level and BMI, adjustment for the pleiotropic pathway failed to substantially reduce the degree of heterogeneity. Further developments could involve recursively analysing alternative pathways.
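Heterogeneity of the kind discussed here is commonly summarised with Cochran's Q computed over the per-SNP Wald ratios. The sketch below assumes inverse-variance weights based on first-order standard errors (the SNP-outcome standard error divided by the absolute SNP-exposure effect). The helper name and the numbers are hypothetical, and the standard errors are left unchanged after adjustment purely to keep the illustration short.

```python
def cochran_q(beta_gx, beta_gy, se_gy):
    """Return the inverse-variance weighted (IVW) estimate and Cochran's Q
    across per-SNP Wald ratios, using first-order weights (beta_gx / se_gy)^2."""
    ratios = [by / bx for bx, by in zip(beta_gx, beta_gy)]
    weights = [(bx / se) ** 2 for bx, se in zip(beta_gx, se_gy)]
    ivw = sum(w * r for w, r in zip(weights, ratios)) / sum(weights)
    q = sum(w * (r - ivw) ** 2 for w, r in zip(weights, ratios))
    return ivw, q


# Hypothetical data: four instruments, the last one an outlier
beta_gx = [0.10, 0.12, 0.08, 0.10]
se_gy = [0.01, 0.01, 0.01, 0.01]
beta_gy_raw = [0.030, 0.036, 0.024, 0.080]
# Same SNPs after the outlier's SNP-outcome effect has been adjusted
beta_gy_adj = [0.030, 0.036, 0.024, 0.032]

ivw_raw, q_raw = cochran_q(beta_gx, beta_gy_raw, se_gy)
ivw_adj, q_adj = cochran_q(beta_gx, beta_gy_adj, se_gy)
print(ivw_raw, q_raw)
print(ivw_adj, q_adj)
```

If Q remains large after adjustment, as in the education level and BMI example, that would suggest that the pathway removed does not account for most of the heterogeneity.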
MR-TRYX is an expansive framework and there are several limitations in addition to those discussed already. First, our LASSO extension to multivariable MR is used to automate the selection of exposures that will be used for adjustment. A shrinkage step of LASSO may increase the SNP-exposure effect heterogeneity necessary for the power of multivariable MR 24. Multivariable MR is adept at establishing conditionally independent exposures, but some exposures could have attenuated effects in comparison to their total effects either because a) their total effects were biased by pleiotropy or b) they are mediated by the exposures that are included in the model. Interpretations of a) and b) are very different, because in the case of mediation the exposure is a causal factor for the outcome. Second, we were primarily using the multivariable approach for practical purposes to avoid having multiple highly related exposures taken forward to the adjustment step (e.g. multiple different measures of body composition such as body weight and BMI). This approach worked effectively, although a problem remains unsolved in automating the removal of traits that are “similar” to the outcome. For example, if a trait similar to the outcome CHD associates with an outlier and is included in the multivariable analysis of multiple exposures against CHD, then all the putative exposures will be dropped from the model. In the analyses presented we manually removed traits that came up as candidate pleiotropic pathways but were, in fact, synonymous with or closely related to the outcome. Third, we note that heterogeneity does not necessarily arise only because of pleiotropy; for example, the non-collapsibility of odds ratios will introduce heterogeneity automatically, which cannot be adjusted away through the TRYX approach. Many other mechanisms exist that can lead to bias in MR, as has been described in detail elsewhere.
Fourth, SNPs can appear to be outliers not through being pleiotropic, but through other mechanisms, such as population stratification (association of alleles with phenotypes being confounded by ancestral population), canalization (developmental compensation to a genetic change) 2,41, or the influence on phenotype being changeable across the life course 42. Finally, in the case of a binary outcome, there may be parametric restrictions on the conditional causal odds ratio in our multivariable MR model where the exposure effect is linear in the exposure on the log odds ratio scale 43. However, the two-stage estimator with a logistic second-stage model still yields a valid test of the causal null hypothesis 43.
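The LASSO-based selection step mentioned among the limitations above can be illustrated with a small, self-contained sketch. It regresses SNP-outcome effects on SNP-trait effects with an L1 penalty and keeps the traits with non-zero coefficients. This is an unweighted coordinate-descent toy rather than the implementation used in our analyses: the trait names, effect sizes and penalty value are hypothetical, and the SNP-trait effects are constructed to be orthogonal so that the result can be checked by hand.

```python
def soft_threshold(z, gamma):
    """Soft-thresholding operator used in the LASSO coordinate update."""
    if z > gamma:
        return z - gamma
    if z < -gamma:
        return z + gamma
    return 0.0


def lasso_coordinate_descent(X, y, lam, n_iter=200):
    """Minimise (1/2n)*||y - Xb||^2 + lam*||b||_1 (no intercept)."""
    n, p = len(X), len(X[0])
    b = [0.0] * p
    col_sq = [sum(X[i][j] ** 2 for i in range(n)) / n for j in range(p)]
    for _ in range(n_iter):
        for j in range(p):
            # Covariance of column j with the residual that excludes column j
            rho = sum(
                X[i][j] * (y[i] - sum(X[i][k] * b[k] for k in range(p) if k != j))
                for i in range(n)
            ) / n
            b[j] = soft_threshold(rho, lam) / col_sq[j]
    return b


traits = ["trait_A", "trait_B", "trait_C"]  # hypothetical candidate traits
signs = [
    (1, 1, 1), (1, 1, -1), (1, -1, 1), (1, -1, -1),
    (-1, 1, 1), (-1, 1, -1), (-1, -1, 1), (-1, -1, -1),
]
# Rows are SNPs, columns are SNP-trait effects (orthogonal by construction)
X = [[0.1 * s for s in row] for row in signs]
# Assumed effects on the outcome: trait_A 0.5, trait_B 0.02 (negligible), trait_C 0.3
y = [0.5 * a + 0.02 * b_ + 0.3 * c for a, b_, c in X]

coef = lasso_coordinate_descent(X, y, lam=0.001)
selected = [t for t, c in zip(traits, coef) if c != 0.0]
print(coef, selected)
```

The retained coefficients are shrunk towards zero (about 0.4 and 0.2 here, against assumed effects of 0.5 and 0.3), which is one reason a selection step of this kind would typically be followed by an unpenalised multivariable fit on the selected traits.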
In this study, we demonstrated the use of MR-TRYX through four examples of identifying putative pathways. In the first empirical example (SBP on CHD), we illustrated the validity of MR-TRYX to detect the traits that possibly influence the disease outcome. Apart from SBP, MR-TRYX also detected well established risk factors for CHD including adiposity, cholesterol levels and standing height. An interesting finding of this example is that headache-related traits (e.g. experience of pain due to headache and self-reported status of ibuprofen intake) were identified as candidate traits, which may influence the original association. In support of the putative finding for self-reported ibuprofen use associating with CHD, we also found that pain experienced in the last month (headache) and self-reported migraine were associated with lower risk of CHD (OR per 1 SD: 0.33; 95% CI: 0.12, 0.89 and Beta = 0.02; 95% CI: 0.0004, 0.65, respectively). A previous study reported shared genetic risk between headache (migraine) and CHD, suggesting a potential role of migraine in vascular mechanisms 44. An alternative mechanism that could give rise to this association is that the effect of pain on lower CHD risk is entirely mediated through the use of medications such as aspirin that have known protective effects on CHD.
The example of urate and CHD demonstrated the benefit of the adjustment method, showing that the noise due to pleiotropy was substantially reduced after correcting for the effect of candidate traits. The presence of hypothyroidism and self-reported levothyroxine sodium intake status were identified as putative risk factors for CHD, which is consistent with previous clinical trial studies: thyroid dysfunction is associated with the overall coronary risk 45, which can be reversed by levothyroxine therapy 46. In the education-BMI example, we showed that increased alcohol intake and slower usual walking pace may influence the obesity of individuals. These identified traits have been reported as possible risk factors for higher BMI and obesity 47,48. Additionally, the example of sleep duration and risk of schizophrenia suggested coeliac disease and body composition as putative risk factors for schizophrenia. A number of observational studies suggested that schizophrenia is linked with body composition 49 and coeliac disease 50. MR of binary exposures is often difficult to interpret because the instrument effects are on liability to disease, not the presence or absence of the disease. Hence, the association between coeliac disease and schizophrenia may be better interpreted as an indication of shared disease aetiology. Nevertheless, this is a valuable finding since the causal effect of those putative risk factors on risk of schizophrenia has not been investigated using an MR approach. Therefore, our example illustrates how outliers can be used to identify alternative pathways, opening the door for hypothesis-free MR approaches and a network-based approach to disease.
In conclusion, we have shown a new method to deal with the bias from horizontal pleiotropy, and to identify putative risk factors for outcomes in a more directed manner than typical hypothesis-free analyses, by exploiting outliers. Heterogeneity is widespread across MR analyses and so we are tapping into a potential new reservoir of information for understanding the aetiology of disease. The strategy is a departure from previous ones dealing with pleiotropy: we have shown that enlarging the problem by searching across all traits for a better understanding of a specific exposure-outcome hypothesis can be fruitful.
Author contributions
YC, GDS and GH conceived the study and developed the statistical analysis plan. YC and GH developed the model and methods. YC, GDS and GH prepared the first draft of the manuscript. YC, PCH, TRG, JZ, APM, GDS, and GH contributed to the writing of the manuscript. All authors reviewed and agreed on the manuscript.
Competing interests
The authors declare that they have no conflict of interest.
Data availability
The data that support the findings of this study are available from MR-Base (www.mrbase.org).