Causal Inference for Heritable Phenotypic Risk Factors Using Heterogeneous Genetic Instruments

Jingshu Wang; Qingyuan Zhao; Jack Bowden; Gibran Hemani; George Davey Smith; Dylan S. Small; Nancy R. Zhang

doi:10.1101/2020.05.06.077982

Abstract

Over a decade of genome-wide association studies have led to the finding that significant genetic associations tend to spread across the genome for complex traits. The extreme polygenicity where “all genes affect every complex trait” complicates Mendelian Randomization studies, where natural genetic variations are used as instruments to infer the causal effect of heritable risk factors. We reexamine the assumptions of existing Mendelian Randomization methods and show how they need to be clarified to allow for pervasive horizontal pleiotropy and heterogeneous effect sizes. We propose a comprehensive framework GRAPPLE (Genome-wide mR Analysis under Pervasive PLEiotropy) to analyze the causal effect of target risk factors with heterogeneous genetic instruments and identify possible pleiotropic patterns from data. By using summary statistics from genome-wide association studies, GRAPPLE can efficiently use both strong and weak genetic instruments, detect the existence of multiple pleiotropic pathways, adjust for confounding risk factors, and determine the causal direction. With GRAPPLE, we analyze the effect of blood lipids, body mass index, and systolic blood pressure on 25 disease outcomes, gaining new information on their causal relationships and the potential pleiotropic pathways.

1 Introduction

Understanding the pathogenic mechanism of common diseases is a fundamental goal in clinical research. As randomized controlled experiments are not always possible, researchers are looking towards Mendelian Randomization (MR) as an alternative method for probing the causal mechanisms of common diseases [18]. MR uses inherited genetic variations as instrumental variables (IV) to interrogate the causal effect of heritable risk factor(s) on the disease of interest. The basic idea is that at these variant loci, the inherited alleles are randomly transmitted from the parents to their offspring according to Mendel’s laws. Thus, the genotypes are independent of non-heritable confounding variables which may obfuscate causal estimation in parent-offspring studies. More generally, such independence also approximately holds for population data such as genome-wide association studies (GWAS) when individuals share the same ancestry [46]. With the accumulation of data from GWAS, there is an increasing interest in MR approaches, especially approaches that rely only on the GWAS summary statistics that are readily available in the public domain [19, 46].

How well Mendelian Randomization works depends on how well the genetic variant loci used as instruments abide by the rules of IV. These rules dictate that, if the genetic locus has an effect on the disease outcome, it should be only through pathways mediated by the risk factor(s) of interest. This rule, termed exclusion restriction, is violated when there is horizontal pleiotropy, defined as the case where the genetic variant can influence the disease through pathways other than the given risk factor(s) [21]. There has been much recent attention on this issue [10, 4, 5, 25, 51, 59, 42, 11, 3, 36, 43] in MR, yet our understanding is far from complete. Current methods rely on different assumptions on the pattern of horizontal pleiotropy, while improper assumptions may lead to biased estimation of the true causal effects. What assumptions on pleiotropy and genetic effects would be suitable? Would it be possible to learn the degree of pleiotropy from the data? Could we perform model diagnosis utilizing only GWAS summary statistics?

The pleiotropy issue that muddles Mendelian Randomization studies is, in large part, due to the fact that complex traits are extremely polygenic [16, 57, 8, 32, 45, 49, 36, 38, 55]. Accumulating evidence from GWAS indicates that complex diseases may share an omnigenic architecture where all genes affect every complex trait [6]. While a few genes might be “core” genes, almost all genes are involved and can exert non-zero effects on both the risk factors and the disease. Thus, given a risk factor that explains only part of the causal mechanism of a complex disease, there would be many SNPs affecting the disease through their effects on other unmeasured risk factors. In other words, in an MR analysis, not only would we expect horizontal pleiotropy to be a pervasive issue across all genetic variants, but any disease or complex risk factor would also be associated with a large number of SNPs across the whole genome. Many existing MR methods rely on the assumption that pleiotropic effects sparsely involve only a few SNPs, which directly contradicts these recent insights. Methods that do not assume sparsity often require that the pleiotropic effects cancel each other across SNPs, known as the instrument strength independent of direct effect (inSIDE) assumption [4], which can be rather optimistic. Recently, a few new methods relaxed the inSIDE assumption to consider “directional pleiotropy” through one pleiotropic pathway [36]. However, it then becomes difficult to identify the true causal effect of the risk factor, and the model is restrictive in allowing only one pleiotropic pathway. Armed with these assumptions, most existing methods also use only the few SNPs that have the strongest association with the risk factor as instruments, ignoring the SNPs that are weakly associated. In this work, we will show that weakly associated SNPs are also informative, and that a model combining weak and strong SNPs would not harm MR while increasing its accuracy and stability in some scenarios.

We propose a comprehensive statistical framework for causal effect estimation when pleiotropy is pervasive across the genome. The framework, called GRAPPLE (Genome-wide mR Analysis under Pervasive PLEiotropy), facilitates interactive identification of multiple pleiotropic pathways and the incorporation of all SNPs associated with the risk factor into the analysis. GRAPPLE builds on the statistical framework MR-RAPS [59]. However, we emphasize the detection of multiple pleiotropic pathways when the inSIDE assumption in MR-RAPS is violated as well as the discrimination of the direction of causality. Using GRAPPLE, we further address how to jointly estimate the effects with multiple risk factors to reduce directional pleiotropy, as well as how to integrate cohorts with overlapping samples, both common challenges faced by current studies. The estimation accuracy of GRAPPLE is examined through validations involving real studies and simulations.

GRAPPLE is applied to a screening of the causal effects of 5 risk factors (three plasma lipid traits, body mass index, and systolic blood pressure) on 25 common diseases. Although there have been many causal effect screens [51, 39, 36] for these risk factors and diseases, the combined analysis enabled by GRAPPLE brings forth new insights on the pleiotropic landscape across diseases and, thus, an improved understanding of the causal estimates obtained. Specifically, we will reexamine the role of lipid traits on coronary artery disease and type-II diabetes, where the results from the multitude of MR studies [46, 31, 33] have been under heated debate.

2 Results

2.1 Model Overview

2.1.1 From the causal model to GWAS summary statistics

Our framework starts with a set of structural equations that jointly specify the generative model for the disease Y, which depends on K observed risk factors X = (X₁, ⋯, X_K) of interest and all genetic variants Z = (Z₁, Z₂, …) (Fig 1a):

X_k = g_k(U, Z, E_{X_k}), k = 1, ⋯, K,
Y = Xᵀβ + f(U, Z, E_Y),    (1)

where U represents unknown non-heritable confounding factors and E_{X_k} and E_Y are random noise acting on X_k and Y respectively. The parameter of interest, β, quantifies the causal effect of the vector of risk factors X on Y. Due to Mendel’s law of inheritance, the genotypes Z are independent of (U, E_Y, E_{X_k}). The function f(U, Z, E_Y) represents the causal effect of unmeasured risk factors on Y, which can be heritable (contributed by Z) or non-heritable (contributed by U). The nonparametric functions f(·) and g_k(·) allow interactions among SNPs in Z and variables (U, E_Y, E_{X_k}) in their causal effects on X and Y. Under this model, there is horizontal pleiotropy for a SNP j if Z_j has nonzero association with f(U, Z, E_Y). This is the case, for example, when Z_j acts on Y through a pathway affecting unmeasured risk factors, or when Z_j is in linkage disequilibrium (LD) with such a locus.

Figure 1:

Model overview. a, The causal directed graph represented by structural equations (1). b, The existence of a pleiotropic pathway 2 (purple) can result in multiple modes of the profile likelihood. c, Multi-modality of the profile likelihood can reflect causal direction. d, The work-flow with GRAPPLE.

Now consider the case where only GWAS summary statistics, i.e. the estimated marginal associations between each SNP j and the risk factors/disease traits, are available. Let Γ_j be the true association between SNP j and Y, and γ_j be the vector of true marginal associations between SNP j and X. Later, we will denote their estimated values from GWAS summary statistics as Γ̂_j and γ̂_j. Then, as shown in Materials and Methods, the model (1) results in the linear relationship

Γ_j = γ_jᵀβ + α_j,    (2)

where for binary Y, the parameter β in (2) is a conservatively biased version of β in (1). This relationship holds even when the functions f(·) and g_k(·) in (1) are not linear. Here, α_j is the marginal association between Z_j and f(U, Z, E_Y), representing the unknown horizontal pleiotropy of SNP j. In MR, one would typically select p SNPs simultaneously as multiple instruments to estimate the causal effect of X.
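The linear relationship between marginal associations can be checked numerically. The following is a minimal sketch, assuming a single risk factor, linear f and g, independent SNPs, and toy effect sizes; all variable names and numbers are illustrative assumptions rather than part of GRAPPLE.

```python
import numpy as np

rng = np.random.default_rng(0)
n, p = 20000, 5                      # individuals and SNPs (toy sizes)
beta = 0.5                           # assumed causal effect of X on Y
gamma = rng.normal(0.0, 0.3, p)      # SNP effects on the risk factor X
alpha = rng.normal(0.0, 0.1, p)      # SNP effects on Y that bypass X

Z = rng.binomial(2, 0.3, size=(n, p)).astype(float)  # independent genotypes
U = rng.normal(size=n)                               # non-heritable confounder
X = Z @ gamma + U + rng.normal(size=n)
Y = beta * X + Z @ alpha + U + rng.normal(size=n)

def marginal_slopes(Z, trait):
    """Simple-regression slope of `trait` on each SNP, one SNP at a time."""
    Zc = Z - Z.mean(axis=0)
    return Zc.T @ (trait - trait.mean()) / (Zc ** 2).sum(axis=0)

gamma_hat = marginal_slopes(Z, X)    # estimated SNP-risk factor associations
Gamma_hat = marginal_slopes(Z, Y)    # estimated SNP-disease associations

# The relationship predicts Gamma_j = gamma_j * beta + alpha_j up to sampling noise,
# even though the confounder U affects both X and Y.
residual = Gamma_hat - (beta * gamma_hat + alpha)
print(np.round(residual, 3))
```

The residuals are small relative to the effect sizes because the confounder U is independent of the genotypes, so it cancels out of the marginal SNP associations.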

One can immediately see that identifying β is impossible without further assumptions on α_j. Early MR methods such as IVW [10] made the simplest assumption that all instruments are valid, satisfying α_j = 0. However, as already discussed in the Introduction, the assumption of no pleiotropy, or more generally, the assumption that α_j is sparsely nonzero as in Weighted Median [5] or MR-PRESSO [51], contradicts the fact that horizontal pleiotropy is pervasive. One assumption that allows pervasive pleiotropy is the inSIDE assumption [4], where α_j is independent of γ_j, or alternatively, the random effect model [59, 43], where α_j ~ N(0, τ²) for most genetic instruments. Unfortunately, the inSIDE assumption requires all unmeasured heritable risk factors of the disease to be genetically uncorrelated with the target risk factor(s) X, which is likely violated, especially when there are clusters of SNPs associated with both the unmeasured risk factors and X.

Noticing the limitation of the inSIDE assumption, some new MR methods, such as LCV [39], CAUSE [36] and MRMix [42], allow a proportion of genetic instruments to be associated with one common hidden pleiotropic pathway affecting both the risk factor and disease. For instance, under the above notation, both CAUSE and MRMix assume that for the proportion of SNPs that violate the inSIDE assumption, their pleiotropic effects satisfy α_j = aγ_j + α̃_j, where aγ_j represents the directional pleiotropic effects due to a confounding pathway and α̃_j is independent of γ_j. This is a more realistic assumption than inSIDE, though it then becomes difficult to distinguish the true causal effect β from the pleiotropic direction β + a, and the model may be too restrictive in allowing only one pleiotropic pathway.
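To make the identification problem concrete, here is a minimal sketch of the inverse-variance weighted slope for one risk factor, applied first to valid instruments and then to instruments whose pleiotropic effects are proportional to their effects on the risk factor. The function name, the simulated effect sizes, and the value of a are assumptions chosen for illustration.

```python
import numpy as np

def ivw_estimate(gamma_hat, Gamma_hat, se_Gamma):
    """Inverse-variance weighted slope through the origin of Gamma_hat on gamma_hat."""
    w = 1.0 / se_Gamma ** 2
    beta_hat = np.sum(w * gamma_hat * Gamma_hat) / np.sum(w * gamma_hat ** 2)
    se = np.sqrt(1.0 / np.sum(w * gamma_hat ** 2))
    return beta_hat, se

rng = np.random.default_rng(1)
p, beta = 200, 0.4
gamma = rng.normal(0.0, 0.1, p)      # SNP-risk factor associations, treated as known
se_Gamma = np.full(p, 0.01)

# Case 1: all instruments valid (alpha_j = 0).
Gamma_valid = beta * gamma + rng.normal(0.0, se_Gamma)
beta_valid, se_valid = ivw_estimate(gamma, Gamma_valid, se_Gamma)

# Case 2: correlated pleiotropy alpha_j = a * gamma_j, which violates inSIDE.
a = 0.3
Gamma_pleio = beta * gamma + a * gamma + rng.normal(0.0, se_Gamma)
beta_pleio, _ = ivw_estimate(gamma, Gamma_pleio, se_Gamma)

print(round(beta_valid, 3), round(beta_pleio, 3))
```

In the second case the estimate concentrates around β + a rather than β, and nothing in the summary statistics alone signals which of the two is the causal effect.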

2.1.2 Identify multiple pleiotropic pathways and the direction of causality

The key idea underlying GRAPPLE is to detect multiple pleiotropic pathways by using the shape of the profile likelihood of the data under no pleiotropy to probe the underlying causal mechanism, without explicit assumptions on the pleiotropic patterns (Fig 1b). When K = 1, the GWAS summary statistics reduce to the scalars γ̂_j and Γ̂_j, with their standard errors σ_{j1} and σ_{j2}. From the central limit theorem, the joint distribution of (γ̂_j, Γ̂_j) approximately follows a multivariate normal distribution

(γ̂_j, Γ̂_j)ᵀ ~ N( (γ_j, Γ_j)ᵀ, [ σ_{j1}², θσ_{j1}σ_{j2} ; θσ_{j1}σ_{j2}, σ_{j2}² ] ),    (3)

where θ is a shared sample correlation that can be estimated from the summary statistics (see Materials and Methods).

When there is no horizontal pleiotropy in the p selected independent genetic instruments (α_j = 0 for j = 1, 2, ⋯, p), the robustified profile likelihood is [59]

l(b) = −Σ_{j=1}^{p} ρ( (Γ̂_j − bγ̂_j) / √(σ_{j2}² − 2bθσ_{j1}σ_{j2} + b²σ_{j1}²) ),

where ρ(·) is Tukey’s Biweight loss. As described in more detail in Materials and Methods, the profile likelihood is obtained by profiling out the nuisance parameters γ₁, ⋯, γ_p in the full likelihood from (3), and is further robustified by replacing the L₂ loss with Tukey’s Biweight loss to increase the sensitivity of mode detection. Under no pleiotropy or the inSIDE assumption, l(b) would have only one mode, near the true causal effect b = β.
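A robustified profile likelihood of this kind can be sketched in a few lines. The version below is an illustration under stated assumptions, not the GRAPPLE implementation: the function and argument names are ours, the tuning constant c = 4.685 is the conventional default for Tukey's biweight and is assumed here, and the maximizer is found by a plain grid search.

```python
import numpy as np

def tukey_biweight(r, c=4.685):
    """Tukey's biweight loss: quadratic near 0 and constant (c^2 / 6) beyond |r| = c."""
    u = np.minimum(np.abs(r) / c, 1.0)
    return (c ** 2 / 6.0) * (1.0 - (1.0 - u ** 2) ** 3)

def robust_profile_loglik(b, gamma_hat, Gamma_hat, se_gamma, se_Gamma, theta=0.0):
    """l(b) = -sum_j rho(t_j(b)), where t_j is the standardized residual under alpha_j = 0."""
    var = se_Gamma ** 2 - 2.0 * b * theta * se_gamma * se_Gamma + b ** 2 * se_gamma ** 2
    t = (Gamma_hat - b * gamma_hat) / np.sqrt(var)
    return -np.sum(tukey_biweight(t))

# Toy data with no pleiotropy and non-overlapping samples (theta = 0).
rng = np.random.default_rng(2)
p, beta, se = 300, 0.5, 0.005
gamma = rng.normal(0.0, 0.05, p)
gamma_hat = gamma + rng.normal(0.0, se, p)
Gamma_hat = beta * gamma + rng.normal(0.0, se, p)

grid = np.linspace(-1.0, 2.0, 601)
values = np.array([robust_profile_loglik(b, gamma_hat, Gamma_hat, se, se) for b in grid])
b_hat = grid[int(np.argmax(values))]
print(round(b_hat, 3))
```

Because the loss is bounded, a SNP whose ratio Γ̂_j/γ̂_j is far from b contributes a constant to l(b), which is what lets separate groups of SNPs produce separate modes instead of being averaged together.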

Now consider the case where a second genetic pathway (Pathway 2) also contributes substantially to the disease, and where some of the loci that we include as instruments are also associated with Pathway 2 (Fig 1b). In this scenario, SNPs that are associated with X only through Pathway 2 can contribute to a second mode in the profile likelihood at location β + κ/δ, where κ and δ quantify the causal effect of Pathway 2 on Y and its marginal association with X, respectively (Materials and Methods). By a similar logic, multiple pleiotropic pathways result in multiple modes in l(b). Thus, we can use the presence of multiple modes in l(b) to diagnose the presence of horizontal pleiotropic effects that are grouped into different directions.
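The appearance of a second mode at β + κ/δ can be reproduced in a small simulation. This sketch assumes two equally sized groups of fairly strong SNPs, non-overlapping samples, and a simple peak finder with hypothetical separation and height cutoffs; none of these choices come from GRAPPLE itself.

```python
import numpy as np

def tukey_biweight(r, c=4.685):
    u = np.minimum(np.abs(r) / c, 1.0)
    return (c ** 2 / 6.0) * (1.0 - (1.0 - u ** 2) ** 3)

def robust_profile_loglik(b, g_hat, G_hat, se_g, se_G):
    # Assumes zero sample correlation between the two sets of estimates.
    t = (G_hat - b * g_hat) / np.sqrt(se_G ** 2 + b ** 2 * se_g ** 2)
    return -np.sum(tukey_biweight(t))

def find_modes(grid, values, min_sep=0.3, min_rel_height=0.2):
    """Local maxima that are tall enough and far enough from any taller mode."""
    is_peak = (values[1:-1] > values[:-2]) & (values[1:-1] >= values[2:])
    idx = np.where(is_peak)[0] + 1
    floor, height = values.min(), values.max() - values.min()
    modes = []
    for i in idx[np.argsort(-values[idx])]:           # tallest peaks first
        tall = values[i] - floor >= min_rel_height * height
        apart = all(abs(grid[i] - m) >= min_sep for m in modes)
        if tall and apart:
            modes.append(float(grid[i]))
    return sorted(modes)

rng = np.random.default_rng(3)
beta, kappa, delta, se = 0.3, 0.5, 0.5, 0.005

def random_effects(size):
    return rng.choice([-1.0, 1.0], size) * rng.uniform(0.05, 0.15, size)

gamma1 = random_effects(150)           # SNPs acting on X directly
omega = random_effects(150) / delta    # SNP effects on the Pathway 2 trait
gamma2 = delta * omega                 # associated with X only through Pathway 2
gamma = np.concatenate([gamma1, gamma2])
Gamma = np.concatenate([beta * gamma1, beta * gamma2 + kappa * omega])
g_hat = gamma + rng.normal(0.0, se, gamma.size)
G_hat = Gamma + rng.normal(0.0, se, Gamma.size)

grid = np.linspace(-1.0, 3.0, 801)
values = np.array([robust_profile_loglik(b, g_hat, G_hat, se, se) for b in grid])
modes = find_modes(grid, values)
print(modes)
```

With these settings the two detected modes sit near β = 0.3 and β + κ/δ = 1.3, and the likelihood itself gives no indication of which one is causal.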

The existence of pleiotropic pathways not only complicates MR but, more severely, makes the causal effects of the risk factors unidentifiable. Specifically, when Pathway 2 exists, the GWAS summary statistics alone cannot provide information to distinguish β from β + κ/δ. Instead of making further assumptions to identify the true causal effect, when multiple modes are detected, we suggest collecting more GWAS data to adjust for the confounding risk factors that contribute to these modes. To help find the confounding risk factors, GRAPPLE identifies marker SNPs of each mode, as well as the mapped genes and GWAS traits of each marker SNP (see Materials and Methods), so that researchers can use their expert knowledge to infer possible confounding risk factors that contribute to each mode. With the GWAS summary statistics of these confounding traits, GRAPPLE can perform a multivariate MR analysis under the inSIDE assumption on the remaining horizontal pleiotropic effects. GRAPPLE uses an adjusted robustified profile likelihood approach that can jointly estimate β and τ² (Materials and Methods).

With multiple-mode detection, we can also consider the question of whether X indeed causes Y, as our structural equation (1) presumes, or whether it is the reverse case of Y causing X. If the direction of causality runs from Y to X, then an instrument is associated with X either through Y, or through unmeasured heritable risk factors of X unrelated to Y. In the latter case, a SNP j satisfies Γ_j = 0 while γ_j ≠ 0, and would contribute to a mode at 0. In the former case, γ_j = βΓ_j where β is the causal effect of Y on X, and these SNPs may contribute to a mode around 1/β. This idea shares similarities with bidirectional MR [50, 26]. Bidirectional MR is based on the assumption that when MR is performed in the reverse direction, all selected instruments affect Y not through X, and it filters out suspicious SNPs that may violate this assumption by checking their associations with X. Though it sometimes works, there is no guarantee that the filtering does not introduce bias. In GRAPPLE, we identify the direction by checking if there is a mode at 0 after switching the roles of X and Y, while tolerating the existence of another mode around 1/β.
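The two mode locations follow directly from the ratio of marginal associations. As a short worked illustration, suppose the true direction is X → Y with effect β, horizontal pleiotropy is ignored, and the roles are switched so that Y is treated as the risk factor and X as the outcome; the two case labels are ours.

```latex
\begin{align*}
\text{SNP acts on } Y \text{ through } X: &\quad
  \Gamma_j = \beta\gamma_j
  \;\Longrightarrow\; \frac{\gamma_j}{\Gamma_j} = \frac{1}{\beta}, \\
\text{SNP acts on } Y \text{ not through } X: &\quad
  \gamma_j = 0,\ \Gamma_j \neq 0
  \;\Longrightarrow\; \frac{\gamma_j}{\Gamma_j} = 0 .
\end{align*}
```

The reversed analysis estimates the ratio γ_j/Γ_j, so the first group of SNPs piles up near 1/β and the second group near 0.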

2.1.3 Weak genetic instruments: A curse or a blessing?

Besides the assumption of no horizontal pleiotropy, for a SNP to be a valid genetic instrument, it needs to have a non-zero association with the risk factor of interest. In most MR pipelines, SNPs are selected as instruments only when their p-values are below 10^-8, which is required to guarantee a low family-wise error rate (FWER) for GWAS data. Using such a stringent threshold also avoids weak instrument bias [13], where measurement errors in γ̂_j are large enough to bias β̂. However, such a stringent selection threshold may result in very few, or even zero, instruments for underpowered GWAS, and may still not be adequate to avoid weak instrument bias. Further, when our goal is to jointly model the effects of multiple risk factors (the setting where X is a vector), it is unrealistic to assume that all selected SNPs have strong effects on every risk factor. In addition, the high polygenicity of complex traits indicates that weak instruments far outnumber strong instruments, and collectively, they may have a positive effect on the estimation accuracy.

In GRAPPLE, we use a flexible p-value threshold, which can be either as stringent as 10^-8 or as mild as 10^-2, for instrument selection. Based on the profile likelihood framework of MR-RAPS [58], GRAPPLE can provide valid inference of β that avoids weak instrument bias for multiple risk factors with SNPs selected at any given p-value threshold, when the horizontal pleiotropy of most SNPs follows the random effect model α_j ~ N(0, τ²). This flexible p-value threshold is beneficial for several reasons. First, including moderate and weak instruments may increase power, especially for under-powered GWAS data where there are too few strongly associated SNPs. Second, for MR with multiple risk factors, where it is inevitable to include SNPs that have weak associations with some of the risk factors, we can obtain more accurate causal effect estimates than methods that can only deal with strongly associated SNPs. More importantly, comparing estimates across a series of p-value thresholds can show the stability of our estimates and give a more complete picture of the underlying horizontal pleiotropy. In practice, we suggest that researchers vary the selection p-value threshold from a stringent one (say 10^-8) to a mild one (say 10^-2), both in the detection of multiple modes and in estimating causal effects. We would expect to see consistent results across the p-value thresholds if there are truly multiple pleiotropic pathways or if our assumptions hold in estimating the causal effects of the risk factors.
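The effect of varying the selection threshold can be illustrated with simulated summary statistics. This is a minimal sketch assuming one risk factor, no pleiotropy (τ² = 0), equal standard errors, and selection p-values taken from a separate selection GWAS; the non-robust (L₂) profile likelihood below is a simplified stand-in for the estimator described in the text, and all names and numbers are hypothetical.

```python
import numpy as np
from math import erfc, sqrt

rng = np.random.default_rng(4)
p, beta, se = 20000, 0.5, 0.01
gamma = rng.normal(0.0, 0.02, p)               # mostly weak true effects
g_select = gamma + rng.normal(0.0, se, p)      # selection GWAS
g_hat = gamma + rng.normal(0.0, se, p)         # risk factor GWAS
G_hat = beta * gamma + rng.normal(0.0, se, p)  # disease GWAS, alpha_j = 0

# Two-sided p-values from the selection GWAS z-scores.
p_select = np.array([erfc(abs(z) / sqrt(2.0)) for z in g_select / se])

def ivw(g, G):
    """IVW slope; the standard errors are equal here, so the weights cancel."""
    return np.sum(g * G) / np.sum(g ** 2)

def profile_likelihood_estimate(g, G, se_g, se_G, grid):
    """Grid maximizer of the L2 profile likelihood under no pleiotropy."""
    loglik = [-0.5 * np.sum((G - b * g) ** 2 / (se_G ** 2 + b ** 2 * se_g ** 2))
              for b in grid]
    return float(grid[int(np.argmax(loglik))])

grid = np.linspace(0.0, 1.0, 401)
results = {}
for threshold in (1e-8, 1e-5, 1e-2):
    keep = p_select <= threshold
    results[threshold] = (
        int(keep.sum()),
        float(ivw(g_hat[keep], G_hat[keep])),
        profile_likelihood_estimate(g_hat[keep], G_hat[keep], se, se, grid),
    )

for threshold, (n_snps, b_ivw, b_pl) in results.items():
    print(threshold, n_snps, round(b_ivw, 3), round(b_pl, 3))
```

In this toy setting the IVW estimate drifts towards zero as weaker instruments are added, because it ignores the estimation error in γ̂_j, while the profile likelihood estimate stays close to the true value and uses many more SNPs.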

2.1.4 The three-sample design to guard against instrument selection bias

Selecting instruments from GWAS summary statistics can also introduce bias, known as the “winner’s curse”. Conditional on SNP j being selected, the magnitude of γ̂_j is inflated, which biases the estimate of β. When there is only one risk factor (K = 1), the estimate is biased towards 0, but the direction of the bias is not guaranteed when K > 1. Typically, it is believed that the selection bias is negligible when only the strongly associated SNPs are selected as instruments.

However, we find that for commonly used MR methods, instrument selection can introduce bias even when only genetic variants with genome-wide significant p-values (≤ 10^-8) are selected (Fig S1a). Thus, unlike the usual two-sample GWAS summary statistics design, which involves one GWAS dataset for the risk factor and one for the disease, we strongly advocate using a three-sample GWAS summary statistics design (Fig 1d). To avoid the selection bias, selection of genetic instruments is done on another GWAS dataset for the risk factor, whose cohort has no overlapping samples with either the risk factor or the disease cohort. In addition, to ease calculation (see Materials and Methods), we currently include only independent SNPs in GRAPPLE and use LD clumping for SNP selection to obtain them [41]. The three-sample design also avoids possible selection bias introduced during clumping.
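The winner's curse and the way an independent selection cohort removes it can be seen in a short simulation. This is a sketch under assumed effect sizes and standard errors, with selection at a two-sided p-value of 10^-8; the variable names are ours.

```python
import numpy as np
from statistics import NormalDist

rng = np.random.default_rng(5)
p, se = 50000, 0.01
gamma = rng.normal(0.0, 0.02, p)        # true SNP-risk factor associations
g_A = gamma + rng.normal(0.0, se, p)    # GWAS used for instrument selection
g_B = gamma + rng.normal(0.0, se, p)    # independent GWAS of the same risk factor

# z-score cutoff for a two-sided p-value of 1e-8.
z_cut = NormalDist().inv_cdf(1.0 - 1e-8 / 2.0)
selected = np.abs(g_A / se) >= z_cut
sign = np.sign(g_A[selected])           # orient every selected SNP the same way

true_mean = float(np.mean(sign * gamma[selected]))
same_sample_mean = float(np.mean(sign * g_A[selected]))    # select and estimate in A
independent_mean = float(np.mean(sign * g_B[selected]))    # select in A, estimate in B

print(int(selected.sum()), round(true_mean, 4),
      round(same_sample_mean, 4), round(independent_mean, 4))
```

Estimates taken from the selection cohort overstate the true associations even at this stringent cutoff, while estimates from the independent cohort match the truth on average, which is the motivation for separating the selection data from the estimation data.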

Summarizing the above points, a complete diagram of the GRAPPLE workflow is shown in Fig 1d. A researcher may start with a single target risk factor of interest. The shape of the robustified profile likelihood provides information on possible pleiotropic pathways. If multiple modes are detected, then one may need to adjust for pleiotropic pathways. Unfortunately, this step cannot be done automatically, as summary statistics themselves do not provide enough information to distinguish a causal mode from a pleiotropic mode. Researchers can use the marker SNP/gene/trait information that GRAPPLE provides to understand each mode, decide what confounding risk factors to adjust for, and collect extra GWAS data for them. GRAPPLE can then jointly estimate the causal effects of multiple risk factors to adjust for the confounding effects of the added risk factors.

2.2 Assessment of GRAPPLE with real studies

2.2.1 Inference from both weak and strong genetic instruments under no pleiotropy

We first examine whether GRAPPLE provides reliable statistical inference combining weak and strong instruments under an artificial setting with real GWAS summary statistics. In this setting, we let X and Y be the same trait from two non-overlapping cohorts, thus γ_j = Γ_j for every SNP. Though the structural equation describing the causal effect of X on Y does not exist, the linear relationship model (2) from which we estimate β still holds with β = 1 and α_j = 0. In other words, we are not estimating a meaningful “causal” effect, but are in a special case where the true β is known, which can be used to test whether MR methods provide valid inference under no pleiotropy. Specifically, we consider three traits: body mass index (BMI), Type II diabetes (T2D) and height from the GIANT and DIAGRAM consortia, where sex-specific GWAS data are available [30, 35]. The female cohort is used to get γ̂_j and the male cohort is used to get Γ̂_j. As a three-sample design, the UK Biobank data for the corresponding traits are used for SNP selection. The true β is 1 when we assume that all selected instruments have no sex-specific association with the traits. For benchmarking, we compare the performance of GRAPPLE with CAUSE [36] and three other widely adopted MR methods, inverse-variance weighted (IVW) [10], MR-Egger [4] and weighted median [5], with the same three-sample design.

We compare across different p-value thresholds for instrument selection, ranging from a stringent threshold 10^-8 to a mild threshold 10^-2 (Fig 2a). GRAPPLE keeps providing unbiased estimates of β showing that it does not suffer from the weak instrument bias. Surprisingly, biases exist in other MR methods even with a stringent p-value threshold, which is most likely due to the power discrepancy between the GWAS data for selection and estimating γ_j. In addition, the confidence intervals do get narrower with GRAPPLE for T2D, showing the potential benefit of including weak instruments for less powerful GWAS studies.

Figure 2:

Performance evaluation. a, Estimation of β across selection p-value thresholds under no pleiotropy. Error bars show 95% confidence intervals and the numbers are the number of independent SNPs obtained at each threshold. b, Estimation of β across three categories of SNPs. The numbers are the number of SNPs in each category. c, Identifying causal directions by multi-modality with MR performed in the reverse direction. The selection p-value threshold is kept at 10^-4. d, Three modes detected in the profile likelihood with selection p-value threshold 10^-4 for CRP on CAD. Marker genes and GWAS traits (in parentheses) are shown for each mode. e, Estimation of the CRP effect β at different selection p-value thresholds with each method. The numbers are the estimated β̂, with * indicating p-value below 0.05 and ** indicating p-value below 0.01.

Finally, we demonstrate that the three-sample design to avoid selection bias is necessary not only for GRAPPLE, but also for other MR methods. As shown in Fig S1a, the two-sample design, where we use the same cohort of the risk factor for selection, can result in biased causal effect estimation, and the biases appear for most MR methods even when only the strongly associated SNPs are selected.

2.2.2 Level of pleiotropy in SNPs with heterogeneous strengths

Next, we examine whether the weak instruments are more vulnerable to pleiotropy, which can be a concern when including weak SNPs. We compare four risk factor and disease pairs that cover eight different complex traits: the effect of BMI on T2D, low-density lipoprotein cholesterol (LDL-C) on coronary artery disease (CAD), height on smoking, and systolic blood pressure (SBP) on stroke (Fig 2b).

We test whether independent sets of strongly and weakly associated SNPs can provide consistent estimates of the causal effects of the risk factors. SNPs passing the p-value threshold 10^-2 in the cohort for selection are divided into three groups after LD clumping: “strong” (p_j ≤ 10^-8), “moderate” (10^-8 < p_j ≤ 10^-5), and “weak” (10^-5 < p_j ≤ 10^-2). The SNPs in each group are used separately to obtain group-specific estimates of the causal effect β. We observe that for all four pairs, the estimates are stable across groups (Fig 2b). Though the “weaker” SNPs provide estimates with more uncertainty due to limited power, the estimates are consistent with those from the “strong” group. Other MR methods also show some level of consistency in estimating β across different sets of instruments, but perform worse due to weak instrument bias (Fig S1b). To conclude, in the analysis of these four pairs of traits, we do not see any evidence that weakly associated SNPs provide more biased estimates than strong instruments due to horizontal pleiotropy. On the contrary, like the strong SNPs, they may also provide useful information for inferring the causal effects of the risk factors. GRAPPLE can thus expand the ability to evaluate the causal effects of risk factors with both strong and weak genetic instruments.
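The grouping rule above is simple to express in code. A minimal sketch, assuming the selection p-values are already available for the clumped SNPs; the function name and the "excluded" label for SNPs above 10^-2 are ours.

```python
import numpy as np

def strength_group(p_values):
    """Label each SNP by its selection p-value using the three bins in the text."""
    p = np.asarray(p_values, dtype=float)
    labels = np.full(p.shape, "excluded", dtype=object)  # p > 1e-2: not used
    labels[p <= 1e-2] = "weak"        # 1e-5 < p <= 1e-2
    labels[p <= 1e-5] = "moderate"    # 1e-8 < p <= 1e-5
    labels[p <= 1e-8] = "strong"      # p <= 1e-8
    return labels

example = strength_group([1e-9, 1e-8, 1e-6, 1e-5, 1e-3, 0.5])
print(list(example))
```

Each group can then be passed separately to whichever estimator is being compared, so that the three group-specific estimates are based on disjoint sets of SNPs.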

2.2.3 Identify direction of causality for known causal relationships

Then, we examine the performance of GRAPPLE in identifying the causal direction with the shape of the profile likelihood. For the causal direction, we focus on the two pairs of traits with known causal relationship: BMI on T2D, and LDL-C on CAD. We switch the roles of the risk factor and disease to see if the correct direction can be revealed. Specifically, we treat T2D and CAD as the “risk factor”, and BMI and LDL-C as the corresponding “disease” (Fig 2c). For T2D, the cohort for the other gender is used for SNP selection and for CAD, the risk factor cohort used is from [17] and the selection p-values are from [44]. As expected, we see that when the roles of the risk factor and disease are reversed, the robustified profile likelihood shows a main mode at 0, and a weaker mode around 1/β.

2.2.4 Multiple pleiotropic pathways in the effect of C-reactive protein

Finally, we test our ability to identify multiple pleiotropic pathways with the analysis of the C-reactive protein (CRP) effect on CAD. C-reactive protein has been found to be strongly associated with the risk of heart disease, while many SNPs that are associated with C-reactive protein also seem to have pleiotropic effects on lipid traits [22]. Previous MR analyses only included SNPs near the gene CRP to guarantee a pleiotropy-free analysis [14] and found that CRP has no causal effect on CAD, a finding also validated by randomized experiments [28]. However, if the SNP selection near the CRP gene is not performed, can GRAPPLE identify the existence of multiple pathways and obtain the correct estimate of the C-reactive protein effect from its associated SNPs across the whole genome?

CRP GWAS data from [40] are used for selection and the data from [20], which use a larger cohort, are used for getting γ̂_j. The robustified profile likelihood shows a pattern of three modes, indicating the existence of at least three different pathways (Fig 2d). One mode is negative, one is positive and the third is around zero. The negative mode involves a few marker genes including HNF1A and PVRL2, with a marker trait LDL-C. The positive mode has marker traits pulmonary function and C-reactive protein, and its few marker genes (IL6R, ARHGAP10, BCL7B, PABPC4) are also involved in immune response and lung cancer progression [47, 48]. The mode at 0 has marker genes CRP and LEPR, and only one marker trait, C-reactive protein.

We compare across 3 p-value thresholds (10^-8, 10^-5, 10^-3) and check how the existence of multiple pathways affects estimates of the causal effect of C-reactive protein in MR methods using SNPs across the genome. With C-reactive protein as the only risk factor, all benchmarking methods give a negative estimate of the CRP effect, which is possibly driven by the bias from an LDL-C induced pleiotropic pathway (Fig 2e). MR-RAPS, which is the estimation method used in GRAPPLE when there is only one risk factor, and the three other benchmarking methods give incorrect inference of the CRP effect, with a p-value for β below 0.01 for at least one SNP selection threshold (notice that the weak instrument bias is a bias towards zero as shown in Fig 2a, thus the significance at p-value threshold 10^-3 for MR-Egger and IVW is not due to weak instrument bias). In contrast, after using two risk factors, C-reactive protein and LDL-C, where LDL-C is an apparent confounding risk factor from Fig 2d, the estimates of the CRP effect remain non-significant across p-value thresholds. In addition, the estimates themselves are much closer to 0 than those obtained without including LDL-C. This analysis illustrates how GRAPPLE can detect pleiotropic pathways, provide information on which confounding risk factors to adjust for, and obtain reliable inference after adjusting for additional risk factors.
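The reason that adding a confounding risk factor removes the spurious effect can be shown with a toy multivariable analysis. The sketch below uses weighted least squares on simulated associations as a stand-in for GRAPPLE's adjusted profile likelihood; the two risk factors, their effects, and the genetic correlation between them are hypothetical and do not represent the CRP or LDL-C data.

```python
import numpy as np

rng = np.random.default_rng(6)
p, se = 500, 0.005
beta = np.array([0.0, 0.4])     # hypothetical effects of (X1, X2) on Y; X1 has none

gamma2 = rng.normal(0.0, 0.05, p)                   # SNP associations with X2
gamma1 = -0.5 * gamma2 + rng.normal(0.0, 0.05, p)   # X1 shares SNPs with X2
gamma = np.column_stack([gamma1, gamma2])
Gamma_hat = gamma @ beta + rng.normal(0.0, se, p)   # SNP-disease associations

def weighted_slopes(design, response, se):
    """Weighted least squares through the origin with weights 1 / se^2."""
    w = np.broadcast_to(1.0 / np.asarray(se, dtype=float), response.shape)
    coef, *_ = np.linalg.lstsq(design * w[:, None], response * w, rcond=None)
    return coef

beta_single = float(weighted_slopes(gamma[:, [0]], Gamma_hat, se)[0])  # X1 alone
beta_joint = weighted_slopes(gamma, Gamma_hat, se)                     # X1 and X2

print(round(beta_single, 3), np.round(beta_joint, 3))
```

Analyzed alone, X1 appears to have a negative effect because its instruments are also associated with X2. Once X2 is included, the estimated effect of X1 returns to approximately zero.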

As a complement to the above real data analysis, we have also conducted a set of simulations to evaluate GRAPPLE’s performance in detecting multiple pleiotropic pathways. For details, see Supplementary Note 2 and Fig S2.

2.3 A causal landscape from 5 risk factors to 25 common diseases

Finally, we apply GRAPPLE to interrogate the causal effects of 5 risk factors on 25 complex diseases through a multivariate genome-wide screen. The five risk factors are three plasma lipid traits (LDL-C, high-density lipoprotein cholesterol (HDL-C) and triglycerides (TG)), BMI and SBP. The diseases include heart disease, Type II diabetes, kidney disease, common psychiatric disorders, inflammatory disease and cancer (Fig 3a). For each pair of risk factor and disease, we compare across p-value thresholds from 10^-8 to 10^-2. As a summary of the results, Fig 3a illustrates the average number of modes detected across the p-value thresholds for SNP selection (for modes at each p-value threshold, see Figure S2). Besides the number of modes, Fig 3a also shows the p-values for each risk factor when GRAPPLE is performed with only the single risk factor (see also Fig S3, Materials and Methods). These p-values are not valid when there are pleiotropic pathways.

Figure 3:

Screening with GRAPPLE. a, Landscape of pleiotropic pathways on 25 diseases. The colors show the average number of modes across 7 different selection p-value thresholds. The “+” sign shows a positive estimated effect and “−” indicates a negative estimated effect, with the p-value for each cell being a combined p-value of replicability across 7 thresholds. These p-values are not multiple-testing adjusted across pairs. b, Multi-modality of the profile likelihood for the effect of HDL-C on CAD at 2 different selection p-value thresholds. Vertical bars are positions of marker SNPs, labeled by their mapped genes (only unique gene names are shown). c, Multivariate MR for the effect of 5 risk factors on CAD. d, Multivariate MR for the effect of 4 risk factors on T2D. Error bars are 95% confidence intervals.

Fig 3a shows that multi-modality can be detected in many risk factor and disease pairs. Multi-modality is most easily seen using the stringent p-value threshold 10^-8 (Fig S3). However, we find that some modes are contributed by a single SNP and are thus more likely outliers than pathways. For instance, the effect of LDL-C on stroke shows two modes when the p-value threshold is 10^-8 or 10^-7 (one mode around −2.3 and another mode near 0.08). However, the negative mode has only one marker SNP (rs3184504), which has been found to be strongly associated with hundreds of different traits according to the GWAS Catalog [9], while the other mode has hundreds of marker genes. After removing the SNP rs3184504, the mode disappears. Such a mode also disappears when we increase the p-value threshold to include more SNPs as instruments. Thus, the average number of modes serves as a measure of the strength of evidence for the existence of multiple pleiotropic pathways. Some risk factor and disease pairs show multi-modality without having a significant p-value for β, suggesting that the risk factor and disease are genetically correlated through multiple pathways but there is no evidence that the risk factor has a causal effect on the disease.

We then focus on two diseases: CAD and T2D. For CAD, all five risk factors show very significant effects, though multi-modality is detected for HDL-C and SBP. First, consider the well-studied, often-debated relationship between CAD and the lipid traits. In our results for HDL-C, three modes in total can show up across the different p-value thresholds, two negative and one positive, indicating that the pathways from HDL-C to CAD are complicated (Fig 3b). Fig 3b shows that one negative mode is contributed by SNPs near genes LPL and BUD13, which are strongly associated with triglycerides. The positive mode is contributed by SNPs near genes ALDH1A2 and PSKH1, which are related to respiratory diseases [52]. The markers of the other negative mode are mapped to genes including LIPG and CETP.

Since the effects of the lipid traits are generally complicated, we combine all 5 risk factors and run a joint MR with GRAPPLE (Fig 3c) at different p-value thresholds. After adjusting for the other risk factors, the two most prominent risk factors for heart disease are LDL-C and SBP, while both the protective effect of HDL-C and the risk brought by TG remain negligible. These results suggest that HDL-C as a single measurement does not seem to have a protective effect on heart disease, although multiple complicated pathways are involved. Researchers have suggested analyzing different subgroups of HDL-C, as smaller particles tend to have a stronger protective effect [60].

Lipids are involved in a number of biological functions, including energy storage, signaling, and acting as structural components of cell membranes, and have been reported to be associated with various diseases [54, 24, 56, 27, 34, 1]. Besides CAD, another disease that most likely involves the lipid traits is type 2 diabetes (T2D) (Fig 3a). T2D is associated with dyslipidemia (i.e., higher concentrations of TG and LDL-C, and lower concentrations of HDL-C), though the causal relationship is still unclear [23]. Meanwhile, evidence has emerged that LDL-C reduction with statin therapy results in a modest increase in the risk of T2D [54]. In the MR analyses of each risk factor alone, we see potential protective effects of LDL-C and HDL-C on T2D but also multi-modality patterns. Two modes show up in the profile likelihood from HDL-C to T2D: a negative mode with marker gene LPL and a mode near 0 with marker genes CETP and AC012181.1. Thus we include all 3 lipid traits along with BMI, and run a joint model for these 4 risk factors using GRAPPLE (Fig 3d). Our result indicates a mild protective effect of HDL-C on T2D, while showing insufficient evidence for an effect of either LDL-C or TG.

3 Discussion

We propose a comprehensive framework that utilizes both strongly and weakly associated SNPs to understand the causal relationship between complex traits. GRAPPLE is robust to pervasive pleiotropy and can identify multiple pleiotropic pathways. The multivariate MR in GRAPPLE can adjust for known confounding risk factors.

GRAPPLE incorporates several improvements over existing MR methods. It removes weak instrument bias by using the profile likelihood to account for measurement errors in the SNP associations with the risk factors. Our likelihood is similar to the approach in [12], while allowing pervasive pleiotropy under the InSIDE assumption. The multi-modality visualization shares similarities with [25], which estimates the causal effect by the global mode, but we provide a more comprehensive analysis that identifies multiple pleiotropic pathways through the local modes. Our identification of the causal direction is related to bidirectional MR, which uses the assumption that if we reverse the roles of risk factor and disease, the estimated causal effect is likely to be 0. We use this idea in a more principled way and can avoid bias when SNPs affecting the disease through the target risk factors are also selected in the reversed MR. Finally, because the intercept term in MR-Egger is not invariant to the arbitrary assignment of effect alleles for each SNP, which is a deficiency of that method, GRAPPLE does not include any intercept term.

GRAPPLE needs a separate GWAS cohort of the exposure for SNP selection, which is necessary for valid inference with weakly associated SNPs. Actually, as shown in Fig S1a, the three-sample design is needed for other MR methods as well to avoid selection bias. Currently, we find it hard to obtain multiple good-quality public GWAS summary statistics with non-overlapping cohorts. We suggest that the stage-specific or study-specific GWAS data before meta-analysis may be released to the public in the future.

In GRAPPLE, we still require a p-value threshold, though it can be as mild as 10^-2, instead of requiring no p-value threshold at all. There are two main reasons for this requirement. One consideration is to increase power, as including too many SNPs with γ_j equal to 0 or extremely small would instead increase the variance of the estimator of β [59, 58]. Another consideration is that we would not want unmeasured risk factors that are unassociated (or very weakly associated) with the target risk factors to bring in large pleiotropic effects through SNPs that mainly affect these unmeasured risk factors. The chance of including such SNPs is much lower when a mild p-value threshold is required.

To adjust for confounding risk factors, GRAPPLE requires that these factors are either known a priori, or can be identified from the marker SNPs / genes / traits. However, this step can be hard to execute in practice. The pleiotropic pathways may not be well tagged, and GRAPPLE may not have the power to return enough markers. As a future direction, instead of adjusting for unknown confounding risk factors, we may consider directly adjusting for confounding gene expressions that can be more easily identified.

Finally, when discussing the causal effect of a risk factor, one implicit assumption we use is consistency, which assumes that there is one clearly defined version of the intervention that can be made on the risk factor. However, interventions on risk factors such as BMI are typically vague [15]. For instance, there can be multiple ways to change weight, such as exercising, switching to a different diet or undergoing surgery. It is common sense that these different interventions would have different effects on diseases, though they may change BMI by the same amount. Similarly, cholesterol has abundant functions in our body and is involved in multiple biological processes. Intervening on different biological processes to change the concentrations of lipid traits may also have different effects on diseases. With MR, the interventions change risk factor levels through natural mutations, which may differ from interventions with drugs that have a rapid and strong effect on the risk factors. We think that our causal inference using GRAPPLE, along with the markers we detect, provides abundant information to deepen our understanding of the risk factors. However, one still needs to be careful when giving causal interpretations of the results. One recommendation in practice is to triangulate the results from MR with other sources of evidence [37].

Materials and Methods

Model details

The structural equations (1), where X = (X₁, X₂, …, X_K) and β = (β₁, β₂, …, β_K), describe how individual-level data are generated. To link them with the GWAS summary statistics, denote by γ_jk = Cov(Z_j, X_k)/Var(Z_j) the true marginal association between a SNP Z_j and risk factor X_k, and by α_j the marginal association between Z_j and the causal effects of unmeasured risk factors on Y, i.e. the horizontal pleiotropic effect of Z_j on Y given X. Then we can rewrite the structural equations as the following linear models:

$$X_k = \gamma_{jk} Z_j + \epsilon_{jk}, \quad k = 1, \dots, K, \tag{5}$$

$$Y = X^\top \beta + \alpha_j Z_j + e'_j, \tag{6}$$

where corr(Z_j, ε_jk) = 0 for any k and corr(Z_j, e′_j) = 0, as guaranteed by the definitions of γ_jk and α_j. By replacing X in (6) with (5), we get

$$Y = \Gamma_j Z_j + e_j,$$

where $\Gamma_j = \gamma_j^\top \beta + \alpha_j$ with γ_j = (γ_j1, …, γ_jK), and $e_j = \sum_{k} \beta_k \epsilon_{jk} + e'_j$. As corr(Z_j, e_j) = 0, we conclude that Γ_j also satisfies

$$\Gamma_j = \operatorname{Cov}(Z_j, Y) / \operatorname{Var}(Z_j).$$

Thus, the parameters Γ_j also represent the true marginal associations between SNP Z_j and the disease trait. This is how we arrive at Eq (2).

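When the disease is a binary trait, the model for Y is placed on the logit scale, and (6) changes to

$$\operatorname{logit} P(Y = 1 \mid X, Z_j, e'_j) = X^\top \beta + \alpha_j Z_j + e'_j, \tag{7}$$

where e′_j denotes the residual term of (6). With the same argument, we have

$$\operatorname{logit} P(Y = 1 \mid Z_j, e_j) = \Gamma_j Z_j + e_j.$$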

If we further assume that, for each genetic instrument j, Z_j is independent of e_j (instead of just being uncorrelated), then the log odds ratio estimated from the marginal logistic regression will be approximately Γ_j/c, with a constant c > 1 determined by the distribution of e_j. In other words, for binary disease outcomes, Eq (2) is still approximately correct, with the β in (2) being a conservatively biased version (by a ratio of 1/c) of the β in (7) (for a detailed calculation, see A.1 of [59]).

GWAS summary statistics from overlapping cohorts

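The GWAS estimated effect sizes (log odds ratios for binary traits) of SNP j are $\hat\Gamma_j$ for the disease and a length-K vector $\hat\gamma_j = (\hat\gamma_{j1}, \dots, \hat\gamma_{jK})$ for the risk factors. As shown in [7] and derived in Supplementary Note 3.1, for any risk factor k we have

$$\operatorname{Corr}[\hat\gamma_{jk}, \hat\Gamma_j] \approx \frac{N_{sk}}{\sqrt{N_o N_{ek}}} \operatorname{Corr}[Y_s, X_{ks}], \tag{8}$$

where N_o and N_ek are the total sample sizes for the disease and the kth risk factor, N_sk is the number of shared samples, and Corr[Y_s, X_ks] is the correlation of X_k and Y in any shared sample. Eq (8) shows that all SNPs share the same correlation. As a consequence, we assume

$$\begin{pmatrix} \hat\Gamma_j \\ \hat\gamma_j \end{pmatrix} \sim N\left( \begin{pmatrix} \Gamma_j \\ \gamma_j \end{pmatrix}, \Sigma_j \right), \qquad \Sigma_j = D_j \Sigma D_j, \tag{9}$$

where D_j is the diagonal matrix of the standard errors of $(\hat\Gamma_j, \hat\gamma_{j1}, \dots, \hat\gamma_{jK})$ and Σ is the unknown shared correlation matrix.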
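As a rough numerical illustration of Eq (8), the sketch below computes the correlation that sample overlap induces between the estimated SNP effects on the disease and on one risk factor. It assumes the form N_sk/√(N_o N_ek) × Corr[Y_s, X_ks]; the function name and the sample sizes in the example are hypothetical and are not taken from the GRAPPLE package or from the analyses in this paper.

```python
import math


def overlap_correlation(n_outcome, n_exposure, n_shared, trait_corr):
    """Approximate correlation between the estimated SNP effects on the
    disease and on one risk factor that is induced by sample overlap.

    Assumed form: n_shared / sqrt(n_outcome * n_exposure) * trait_corr,
    where trait_corr is the phenotypic correlation in a shared sample.
    """
    if n_shared > min(n_outcome, n_exposure):
        raise ValueError("shared samples cannot exceed either cohort size")
    return n_shared / math.sqrt(n_outcome * n_exposure) * trait_corr


# Hypothetical cohorts: 100,000 disease samples, 400,000 risk factor
# samples, 50,000 shared, phenotypic correlation 0.3.
print(overlap_correlation(100_000, 400_000, 50_000, 0.3))
```

With no shared samples the induced correlation is zero, which is why completely non-overlapping cohorts allow the estimation errors to be treated as independent.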

Estimate the shared correlation Σ

To estimate Σ from summary statistics, we can use Eq (8). We first need to choose SNPs for which γ_jk = 0 for all risk factors k, so that the shared correlation between the estimated effects on the risk factors and on the disease can be estimated by the sample correlation across the chosen SNPs. We choose all SNPs whose selection p-values p_jk exceed a cutoff for all k.

For these selected SNPs, denote the Z-values of the estimated effects on the disease and on the K risk factors for j = 1,…,T as the matrix Z_{T×(K+1)}, where T is the number of selected SNPs. Then Σ is estimated as the sample correlation matrix of the columns of Z_{T×(K+1)}.
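The estimation of Σ described above can be sketched as follows. This is a minimal illustration and not the GRAPPLE implementation: the function name, the column ordering (disease first) and the default cutoff `p_cutoff=0.5` are assumptions made for the example.

```python
import numpy as np


def estimate_shared_correlation(beta_hat, se, sel_pvals, p_cutoff=0.5):
    """Estimate the shared correlation matrix from putatively null SNPs.

    beta_hat, se : (p, K+1) arrays of estimated effects and standard errors,
                   disease in column 0, risk factors in columns 1..K.
    sel_pvals    : (p, K) selection p-values for the K risk factors.
    p_cutoff     : hypothetical cutoff; a SNP is kept only if all of its
                   selection p-values are at least this large.
    """
    beta_hat = np.asarray(beta_hat, dtype=float)
    se = np.asarray(se, dtype=float)
    sel_pvals = np.asarray(sel_pvals, dtype=float)
    keep = np.all(sel_pvals >= p_cutoff, axis=1)
    if keep.sum() < 2:
        raise ValueError("need at least two null SNPs")
    z = beta_hat[keep] / se[keep]          # Z-values, shape (T, K+1)
    return np.corrcoef(z, rowvar=False)    # (K+1, K+1) sample correlation


# Simulated null SNPs with a true shared correlation of 0.3 (toy data).
rng = np.random.default_rng(0)
true_sigma = np.array([[1.0, 0.3], [0.3, 1.0]])
z_sim = rng.multivariate_normal(np.zeros(2), true_sigma, size=5000)
se_sim = np.full((5000, 2), 0.01)
pvals_sim = rng.uniform(size=(5000, 1))
sigma_hat = estimate_shared_correlation(z_sim * se_sim, se_sim, pvals_sim)
print(np.round(sigma_hat, 2))
```

Because only SNPs with no detectable association are used, the sample correlation of their Z-values reflects the correlation of the estimation errors rather than any shared genetic signal.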

Instruments selection using LD clumping

In GRAPPLE, we first need to select a set of SNPs as genetic instruments to estimate the causal effects β. Here, we only select independent SNPs to simplify the calculation. Besides the independence requirement, we only include SNPs that pass a p-value threshold, to reduce the inclusion of false positives that can decrease power. To avoid selection bias, a separate cohort is used for each risk factor, and the p-values reported in that cohort are used for instrument selection. Denote the selection p-value for SNP j and risk factor k as p_jk. For multiple risk factors and a given selection threshold, we require the Bonferroni combined p-value K min_k(p_jk) to pass the threshold. After that, we use LD clumping with PLINK [41] to select independent genetic instruments. The LD r² threshold for PLINK is set to 0.001.
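The p-value part of this selection rule can be sketched in a few lines. The function name is hypothetical, "passing" the threshold is taken to mean strictly below it, and LD clumping (done with PLINK in the paper) is not shown.

```python
import numpy as np


def passes_selection(sel_pvals, threshold):
    """Flag SNPs whose Bonferroni combined p-value K * min_k p_jk is below
    the selection threshold.

    sel_pvals : (p, K) array, one row per SNP, one column per risk factor.
    """
    sel_pvals = np.asarray(sel_pvals, dtype=float)
    if sel_pvals.ndim != 2:
        raise ValueError("sel_pvals must have shape (p, K)")
    n_factors = sel_pvals.shape[1]
    combined = np.minimum(1.0, n_factors * sel_pvals.min(axis=1))
    return combined < threshold


# Three SNPs, two risk factors, selection threshold 1e-2 (toy numbers).
pvals = np.array([[1e-9, 0.2],
                  [4e-3, 0.5],
                  [0.03, 0.04]])
print(passes_selection(pvals, 1e-2))
```

The third SNP is excluded even though both of its p-values are below 0.05, because its combined p-value 2 × 0.03 = 0.06 exceeds the threshold.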

Estimate the effects β

Here, we perform statistical analysis assuming α_j ~ N(0, τ²) for the pleiotropic effects, while remaining robust to outliers, i.e. to a few instruments whose pleiotropic effects are large.

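Under model (9) and Eq (2), and given Σ, the log-likelihood of the GWAS summary statistics satisfies

$$l(\beta, \gamma_1, \dots, \gamma_p, \tau^2) = -\frac{1}{2} \sum_{j=1}^{p} \left[ \begin{pmatrix} \hat\Gamma_j - \gamma_j^\top \beta \\ \hat\gamma_j - \gamma_j \end{pmatrix}^{\top} \left( \Sigma_j + \tau^2 e e^\top \right)^{-1} \begin{pmatrix} \hat\Gamma_j - \gamma_j^\top \beta \\ \hat\gamma_j - \gamma_j \end{pmatrix} + \log \left| \Sigma_j + \tau^2 e e^\top \right| \right]$$

up to some additive constant. Here, e = (1, 0,…, 0)ᵀ and Σ_j is the covariance matrix of $(\hat\Gamma_j, \hat\gamma_j)$.

Define for each SNP j the statistic

$$t_j(\beta, \tau^2) = \frac{\hat\Gamma_j - \hat\gamma_j^\top \beta}{\sqrt{\beta^\top \Sigma_{X_j} \beta - 2 \beta^\top \Sigma_{X_j Y_j} + \sigma_{Y_j}^2 + \tau^2}}, \tag{10}$$

where Σ_Xj is the variance of $\hat\gamma_j$, Σ_{X_jY_j} is the covariance between $\hat\gamma_j$ and $\hat\Gamma_j$ in Σ_j, and σ²_Yj is the variance of $\hat\Gamma_j$. Then the profile log-likelihood that profiles out the parameters (γ₁,…, γ_p) is

$$L(\beta, \tau^2) = -\frac{1}{2} \sum_{j=1}^{p} \left[ t_j(\beta, \tau^2)^2 + \log \left| \Sigma_j + \tau^2 e e^\top \right| \right].$$

As discussed in [59], maximizing L(β, τ²) would not give a consistent estimate of τ². Because of this, and with the goal of making the estimate of β robust to outlier SNPs with large pleiotropic effects, our optimization function is the adjusted robustified profile likelihood defined as

$$L_\rho(\beta, \tau^2) = -\sum_{j=1}^{p} \rho\left( t_j(\beta, \tau^2) \right), \tag{11}$$

where ρ(·) is a robust loss function. By default, GRAPPLE uses Tukey’s biweight loss function

$$\rho(r) = \begin{cases} 1 - \left( 1 - (r/c)^2 \right)^3 & \text{if } |r| \le c, \\ 1 & \text{otherwise,} \end{cases}$$

where c is set to its common default value 4.6851. We maximize (11) with respect to β while solving an estimating equation for the heterogeneity τ². The estimating equation has expectation zero at the true values of β and τ², and thus can yield a consistent estimate of τ². For the details of estimating β and τ², as well as building confidence intervals for them, see Supplementary Note 3.2.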
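To make the objective concrete, the sketch below evaluates a standardized residual t_j(β, τ²) and a robustified profile log-likelihood of the form −Σ_j ρ(t_j). It rests on several assumptions that are not confirmed by the text above: t_j is taken to be the residual $\hat\Gamma_j - \hat\gamma_j^\top\beta$ divided by its standard deviation, each per-SNP covariance matrix is ordered with the disease first, and Tukey's biweight is normalised to equal 1 beyond c (a rescaling does not change the maximiser in β). All function names are hypothetical, and the estimating equation for τ² is not shown.

```python
import numpy as np


def t_stat(beta, tau2, Gamma_hat, gamma_hat, Sigma):
    """Standardized residuals t_j(beta, tau2).

    Gamma_hat : (p,) estimated SNP effects on the disease.
    gamma_hat : (p, K) estimated SNP effects on the K risk factors.
    Sigma     : (p, K+1, K+1) per-SNP covariance matrices of
                (Gamma_hat_j, gamma_hat_j), disease first (assumed order).
    """
    beta = np.atleast_1d(np.asarray(beta, dtype=float))
    resid = Gamma_hat - gamma_hat @ beta
    var_y = Sigma[:, 0, 0]
    cov_xy = Sigma[:, 1:, 0]                       # shape (p, K)
    var_x = Sigma[:, 1:, 1:]                       # shape (p, K, K)
    quad = np.einsum("k,jkl,l->j", beta, var_x, beta)
    denom = quad - 2.0 * (cov_xy @ beta) + var_y + tau2
    return resid / np.sqrt(denom)


def tukey_biweight(r, c=4.6851):
    """Tukey's biweight loss, normalised to equal 1 for |r| > c."""
    r = np.asarray(r, dtype=float)
    return np.where(np.abs(r) <= c, 1.0 - (1.0 - (r / c) ** 2) ** 3, 1.0)


def robust_profile_loglik(beta, tau2, Gamma_hat, gamma_hat, Sigma, c=4.6851):
    """Robustified profile log-likelihood: minus the summed robust loss."""
    return -np.sum(tukey_biweight(t_stat(beta, tau2, Gamma_hat, gamma_hat, Sigma), c))


# Toy data: three SNPs, one risk factor, ratios close to 0.5, no overlap.
Gamma_hat = np.array([0.10, 0.21, 0.05])
gamma_hat = np.array([[0.2], [0.4], [0.1]])
Sigma = np.tile(np.diag([1e-4, 1e-4]), (3, 1, 1))

print(t_stat(0.5, 0.0, Gamma_hat, gamma_hat, Sigma))
print(robust_profile_loglik(0.5, 0.0, Gamma_hat, gamma_hat, Sigma))
print(robust_profile_loglik(0.0, 0.0, Gamma_hat, gamma_hat, Sigma))
```

In this toy example the objective is much higher at β = 0.5 than at β = 0, where every residual exceeds c and each SNP contributes the maximal loss of 1.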

Identify pleiotropic pathways via the multi-modality diagnosis

We detect multiple pleiotropic pathways by detecting the modes of the robustified profile likelihood (11). To increase sensitivity, we set τ² = 0 and reduce the tuning parameter in Tukey’s biweight loss function to c = 3. Here we present a detailed argument for why mode detection can identify pleiotropic pathways.

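If there is a confounding risk factor $\tilde X$ on Genetic Pathway 2, as shown in Figure 1a, that is missed, then we have the structural equation

$$Y = \beta X + \kappa \tilde X + \tilde e_Y \tag{12}$$

and also the linear model

$$X = \delta \tilde X + \tilde e_X \tag{13}$$

for a SNP j that is associated only with Genetic Pathway 2 and is uncorrelated with X conditional on $\tilde X$. Here $\tilde e_Y$ and $\tilde e_X$ collect the remaining terms, which are uncorrelated with Z_j. Similar to (5), we have

$$\tilde X = \tilde\gamma_j Z_j + \tilde\epsilon_j.$$

Plugging this into (13) and then into (12), we have

$$\gamma_j = \delta \tilde\gamma_j, \qquad \Gamma_j = \beta \gamma_j + \kappa \tilde\gamma_j = (\beta + \kappa/\delta) \gamma_j.$$

Thus, if there are enough SNPs like SNP j, they would contribute to another mode of (4) at β + κ/δ.

The same argument works for identification of the causal direction. Say there is another $\tilde X$ that affects Y but is uncorrelated with the risk factor X (δ = 0). The existence of such an $\tilde X$ is common, unless X is the only heritable risk factor of Y. SNPs strongly associated with $\tilde X$ are unlikely to be selected when X is the exposure, but would appear when the roles of X and Y are switched. These SNPs can be used to identify the causal direction: in the reverse MR, they contribute to a mode at 0, while the SNPs that affect Y through X contribute to a mode at 1/β.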
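The mode-detection idea can be illustrated on a toy example with one risk factor. The sketch assumes independent estimation errors (no sample overlap), τ² = 0 and Tukey's biweight with c = 3, and it evaluates the robustified profile log-likelihood on a grid of β values, reporting strict local maxima. The data are hypothetical and noise-free so that the two modes are exact: one group of SNPs has ratio 0.5 (playing the role of β) and the other has ratio −1.0 (playing the role of β + κ/δ). The grid search and the function names are choices made for this example and are not the GRAPPLE implementation.

```python
import numpy as np


def robust_profile_loglik(beta, Gamma_hat, gamma_hat, se_Y, se_X, c=3.0):
    """Robustified profile log-likelihood at a scalar beta with tau^2 = 0,
    one risk factor and independent estimation errors (no sample overlap)."""
    t = (Gamma_hat - beta * gamma_hat) / np.sqrt(se_Y**2 + beta**2 * se_X**2)
    rho = np.where(np.abs(t) <= c, 1.0 - (1.0 - (t / c) ** 2) ** 3, 1.0)
    return -rho.sum()


def find_modes(grid, values, tol=1e-9):
    """Return the grid points that are strict local maxima of `values`."""
    values = np.asarray(values, dtype=float)
    mid = values[1:-1]
    is_peak = (mid > values[:-2] + tol) & (mid > values[2:] + tol)
    return grid[1:-1][is_peak]


# Noise-free toy data: 30 SNPs with ratio 0.5 and 15 SNPs with ratio -1.0.
g1 = np.linspace(0.1, 0.4, 30)
g2 = np.linspace(0.1, 0.3, 15)
gamma_hat = np.concatenate([g1, g2])
Gamma_hat = np.concatenate([0.5 * g1, -1.0 * g2])
se = 0.01  # common standard error of all estimated effects

grid = np.linspace(-2.0, 2.0, 401)
values = [robust_profile_loglik(b, Gamma_hat, gamma_hat, se, se) for b in grid]
modes = find_modes(grid, values)
print(modes)
```

Because the loss is bounded, SNPs whose ratios are far from a candidate β each contribute the same constant, so each group of SNPs forms its own local maximum instead of being averaged into a single estimate.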

Select marker SNPs and genes for each mode

GRAPPLE uses LD clumping with a stringent r² threshold (0.001) to guarantee independence among the genetic instruments. However, to obtain more biologically meaningful markers, marker SNPs are not restricted to these independent instruments. Instead, they are selected from a larger SNP set obtained by LD clumping with an r² threshold of 0.05.

Assume that there are M modes detected at positions β₁, β₂, ⋯, β_M. Define the residual of SNP j for mode m as r_jm = t_j(β_m, 0), where t_j(·, ·) is defined in Eq (10). SNP j is selected as a marker for mode m if |r_jm| ≤ t₀ and |r_jm′| > t₁ for every m′ ≠ m. By default, t₁ is set to 2 and t₀ is set to 1, which gives reasonable results in practice. Once the marker SNPs are selected, GRAPPLE further maps them to the ENCODE genes in which they are located, and searches for the traits with which these SNPs are strongly associated in GWA studies by querying HaploReg v4.1 [53] using the R package HaploR. The ratios of the marker SNPs are also returned for reference (shown as the vertical bars in Fig 3b).
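The marker selection rule can be sketched as below, given a matrix of residuals with one row per SNP and one column per mode. The function name is hypothetical, and the rule is read as requiring a SNP to be close to its own mode (|r_jm| ≤ t₀) and far from every other mode (|r_jm′| > t₁); mapping markers to genes and traits is not shown.

```python
import numpy as np


def select_markers(resid, t0=1.0, t1=2.0):
    """Select marker SNPs for each mode.

    resid : (p, M) array with resid[j, m] = r_jm, the residual of SNP j
            evaluated at mode m.
    Returns a dict mapping each mode index to an array of SNP indices.
    """
    abs_resid = np.abs(np.asarray(resid, dtype=float))
    n_modes = abs_resid.shape[1]
    markers = {}
    for m in range(n_modes):
        close_to_m = abs_resid[:, m] <= t0
        others = np.delete(abs_resid, m, axis=1)
        far_from_others = np.all(others > t1, axis=1)
        markers[m] = np.flatnonzero(close_to_m & far_from_others)
    return markers


# Four SNPs and two modes (toy residuals).
resid = np.array([[0.3, 5.1],    # fits mode 0 only
                  [4.0, -0.8],   # fits mode 1 only
                  [0.5, 1.5],    # close to mode 0, not far enough from mode 1
                  [3.0, 2.5]])   # far from both modes
markers = select_markers(resid)
print({m: idx.tolist() for m, idx in markers.items()})
```

The gap between t₀ and t₁ leaves ambiguous SNPs, such as the third one here, unassigned, so that each marker points to a single mode.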

Compute replicability p-values across SNP selection thresholds

Each p-value shown in Fig 3a summarizes a vector of p-values across 7 different selection p-value thresholds ranging from 10^-8 to 10^-2 for each risk factor and disease pair. It reflects how consistent the significance is across SNP selection thresholds. Specifically, it is the partial conjunction p-value [2] for rejecting the null that β is non-zero for at most 2 of the selection thresholds. For a risk factor and disease pair k, let the p-values computed using SNPs selected with the 7 thresholds be p_ks, where s = 1, 2, ⋯, 7. Rank them as p_k(1) ≤ p_k(2) ≤ ⋯ ≤ p_k(7); the partial conjunction p-value for pair k is then computed as 5p_k(3), i.e. the Bonferroni-type combination (7 − 3 + 1) p_k(3).
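The computation above is simple enough to sketch directly. The function below implements the Bonferroni-type partial conjunction p-value (n − u + 1) p_(u), which reduces to 5p_(3) for n = 7 thresholds and u = 3; the function name, the cap at 1 and the example p-values are assumptions made for illustration.

```python
def partial_conjunction_pvalue(pvals, u=3):
    """Bonferroni-type partial conjunction p-value for the null that fewer
    than u of the n hypotheses are non-null: (n - u + 1) * p_(u), capped at 1.
    """
    p_sorted = sorted(pvals)
    n = len(p_sorted)
    if not 1 <= u <= n:
        raise ValueError("u must be between 1 and the number of p-values")
    return min(1.0, (n - u + 1) * p_sorted[u - 1])


# Hypothetical p-values for one risk factor and disease pair at the 7
# selection thresholds; the third smallest is 2e-4.
pvals = [1e-6, 0.04, 2e-4, 0.6, 3e-5, 0.2, 0.01]
print(partial_conjunction_pvalue(pvals))
```

A pair is reported as significant only if at least three thresholds give small p-values, so a single threshold with an extreme p-value cannot drive the result.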

Code Availability

The R package GRAPPLE can be installed from Github at https://github.com/jingshuw/GRAPPLE.

Data Availability

All GWAS summary statistics used in the analyses of the manuscript are downloaded from public resources; most of them are from the GWAS Catalog [9] and the websites of the GWAS consortia GIANT, DIAGRAM, PGC, GLGC, and UK Biobank. A complete list of the datasets used in each analysis and their sources is provided in Supplementary Tables 1 and 2 and Supplementary Note 2. Intermediate results for the screening of 5 risk factors on 25 diseases are available at https://www.dropbox.com/sh/myh8xgxne8fo17v/AABWJf781VrCGnqNFMLtnqIea?dl=0.

Author Contribution

J.W. and Q.Z. conceptualized the study and formulated the model, with discussions with D.S. and N.Z. J.W. developed the method and algorithm, and performed data analysis. J.B. and G.D.S. helped with designing the validation experiments. G.H. provided data for the GWAS summary statistics of C-reactive protein and LD clumping. J.W., Q.Z. and N.Z. wrote the paper.

References

[1] A. P. Agouridis, M. Elisaf, and H. J. Milionis. An overview of lipid abnormalities in patients with inflammatory bowel disease. Annals of Gastroenterology: Quarterly Publication of the Hellenic Society of Gastroenterology, 24(3):181, 2011.
[2] Y. Benjamini and R. Heller. Screening for partial conjunction hypotheses. Biometrics, 64(4):1215–1222, 2008.
[3] C. Berzuini, H. Guo, S. Burgess, and L. Bernardinelli. A bayesian approach to mendelian randomization with multiple pleiotropic variants. Biostatistics, 21(1):86–101, 2020.
[4] J. Bowden, G. Davey Smith, and S. Burgess. Mendelian randomization with invalid instruments: effect estimation and bias detection through egger regression. International journal of epidemiology, 44(2):512–525, 2015.
[5] J. Bowden, G. Davey Smith, P. C. Haycock, and S. Burgess. Consistent estimation in mendelian randomization with some invalid instruments using a weighted median estimator. Genetic epidemiology, 40(4):304–314, 2016.
[6] E. A. Boyle, Y. I. Li, and J. K. Pritchard. An expanded view of complex traits: from polygenic to omnigenic. Cell, 169(7):1177–1186, 2017.
[7] B. Bulik-Sullivan, H. K. Finucane, V. Anttila, A. Gusev, F. R. Day, P.-R. Loh, L. Duncan, J. R. Perry, N. Patterson, E. B. Robinson, et al. An atlas of genetic correlations across human diseases and traits. Nature genetics, 47(11):1236, 2015.
[8] B. K. Bulik-Sullivan, P.-R. Loh, H. K. Finucane, S. Ripke, J. Yang, N. Patterson, M. J. Daly, A. L. Price, B. M. Neale, Schizophrenia Working Group of the Psychiatric Genomics Consortium, et al. Ld score regression distinguishes confounding from polygenicity in genome-wide association studies. Nature genetics, 47(3):291, 2015.
[9] A. Buniello, J. A. L. MacArthur, M. Cerezo, L. W. Harris, J. Hayhurst, C. Malangone, A. McMahon, J. Morales, E. Mountjoy, E. Sollis, et al. The nhgri-ebi gwas catalog of published genome-wide association studies, targeted arrays and summary statistics 2019. Nucleic acids research, 47(D1):D1005–D1012, 2019.
[10] S. Burgess, A. Butterworth, and S. G. Thompson. Mendelian randomization analysis with multiple genetic variants using summarized data. Genetic epidemiology, 37(7):658–665, 2013.
[11] S. Burgess, C. N. Foley, E. Allara, J. R. Staley, and J. M. Howson. A robust and efficient method for mendelian randomization with hundreds of genetic variants. Nature Communications, 11(1):1–11, 2020.
[12] S. Burgess and S. G. Thompson. Multivariable mendelian randomization: the use of pleiotropic genetic variants to estimate causal effects. American journal of epidemiology, 181(4):251–260, 2015.
[13] S. Burgess, S. G. Thompson, and CRP CHD Genetics Collaboration. Avoiding bias from weak instruments in mendelian randomization studies. International journal of epidemiology, 40(3):755–764, 2011.
[14] C Reactive Protein Coronary Heart Disease Genetics Collaboration et al. Association between C reactive protein and coronary heart disease: Mendelian randomization analysis based on individual participant data. Bmj, 342:d548, 2011.
[15] S. R. Cole and C. E. Frangakis. The consistency statement in causal inference: a definition or an assumption? Epidemiology, 20(1):3–5, 2009.
[16] International Schizophrenia Consortium. Common polygenic variation contributes to risk of schizophrenia that overlaps with bipolar disorder. Nature, 460(7256):748, 2009.
[17] Coronary Artery Disease (C4D) Genetics Consortium et al. A genome-wide association study in europeans and south asians identifies five new loci for coronary artery disease. Nature genetics, 43(4):339, 2011.
[18] G. Davey Smith and S. Ebrahim. ‘Mendelian randomization’: can genetic epidemiology contribute to understanding environmental determinants of disease? International journal of epidemiology, 32(1):1–22, 2003.
[19] G. Davey Smith and G. Hemani. Mendelian randomization: genetic anchors for causal inference in epidemiological studies. Human molecular genetics, 23(R1):R89–R98, 2014.
[20] A. Dehghan, J. Dupuis, M. Barbalic, J. C. Bis, G. Eiriksdottir, C. Lu, N. Pellikka, H. Wallaschofski, J. Kettunen, P. Henneman, et al. Meta-analysis of genome-wide association studies in >80 000 subjects identifies multiple loci for c-reactive protein levels. Circulation, 123(7):731–738, 2011.
[21] S. Ebrahim and G. D. Smith. Mendelian randomization: can genetic epidemiology help redress the failures of observational epidemiology? Human genetics, 123(1):15–33, 2008.
[22] P. Elliott, J. C. Chambers, W. Zhang, R. Clarke, J. C. Hopewell, J. F. Peden, J. Erdmann, P. Braund, J. C. Engert, D. Bennett, et al. Genetic loci associated with c-reactive protein levels and risk of coronary heart disease. Jama, 302(1):37–48, 2009.
[23] T. Fall, W. Xie, W. Poon, H. Yaghootkar, R. Mägi, J. W. Knowles, V. Lyssenko, M. Weedon, T. M. Frayling, E. Ingelsson, et al. Using genetic variants to assess the relationship between circulating lipids and type 2 diabetes. Diabetes, page db141710, 2015.
[24] A. C. R. Fonseca, R. Resende, C. R. Oliveira, and C. M. Pereira. Cholesterol and statins in alzheimer’s disease: current controversies. Experimental neurology, 223(2):282–293, 2010.
[25] F. P. Hartwig, G. Davey Smith, and J. Bowden. Robust inference in summary data mendelian randomization via the zero modal pleiotropy assumption. International journal of epidemiology, 46(6):1985–1998, 2017.
[26] G. Hemani, K. Tilling, and G. D. Smith. Orienting the causal relationship between imprecisely measured traits using gwas summary data. PLoS genetics, 13(11):e1007081, 2017.
[27] J. R. Hibbeln and N. Salem Jr. Dietary polyunsaturated fatty acids and depression: when cholesterol does not satisfy. The American journal of clinical nutrition, 62(1):1–9, 1995.
[28] M. V. Holmes, M. Ala-Korpela, and G. D. Smith. Mendelian randomization in cardiometabolic disease: challenges in evaluating causality. Nature Reviews Cardiology, 14(10):577, 2017.
[29] International Schizophrenia Consortium, S. M. Purcell, N. R. Wray, J. L. Stone, P. M. Visscher, M. C. O’Donovan, P. F. Sullivan, P. Sklar, et al. Common polygenic variation contributes to risk of schizophrenia and bipolar disorder. Nature, 460(7256):748–752, 2009.
[30] A. E. Justice, T. W. Winkler, M. F. Feitosa, M. Graff, V. A. Fisher, K. Young, L. Barata, X. Deng, J. Czajkowski, D. Hadley, et al. Genome-wide meta-analysis of 241,258 adults accounting for smoking behaviour identifies novel loci for obesity traits. Nature communications, 8:14977, 2017.
[31] R. M. Krauss. Lipids and lipoproteins in patients with type 2 diabetes. Diabetes care, 27(6):1496–1504, 2004.
[32] P.-R. Loh, G. Bhatia, A. Gusev, H. K. Finucane, B. K. Bulik-Sullivan, S. J. Pollack, T. R. de Candia, S. H. Lee, N. R. Wray, K. S. Kendler, M. C. O’Donovan, B. M. Neale, N. Patterson, A. L. Price, and Schizophrenia Working Group of the Psychiatric Genomics Consortium. Contrasting genetic architectures of schizophrenia and other complex diseases using fast variance-components analysis. Nature Genetics, 47(12):1385–1392, 2015.
[33] L. A. Lotta, S. J. Sharp, S. Burgess, J. R. Perry, I. D. Stewart, S. M. Willems, J. Luan, E. Ardanaz, L. Arriola, B. Balkau, et al. Association between low-density lipoprotein cholesterol-lowering genetic variants and risk of type 2 diabetes: a meta-analysis. Jama, 316(13):1383–1391, 2016.
[34] M. Maes, R. Smith, A. Christophe, E. Vandoolaeghe, A. V. Gastel, H. Neels, P. Demedts, A. Wauters, and H. Meltzer. Lower serum high-density lipoprotein cholesterol (hdl-c) in major depression and in depressed men with serious suicidal attempts: relationship with immune-inflammatory markers. Acta Psychiatrica Scandinavica, 95(3):212–221, 1997.
[35] A. P. Morris, B. F. Voight, T. M. Teslovich, T. Ferreira, A. V. Segre, V. Steinthorsdottir, R. J. Strawbridge, H. Khan, H. Grallert, A. Mahajan, et al. Large-scale association analysis provides insights into the genetic architecture and pathophysiology of type 2 diabetes. Nature genetics, 44(9):981, 2012.
[36] J. Morrison, N. Knoblauch, J. H. Marcus, M. Stephens, and X. He. Mendelian randomization accounting for correlated and uncorrelated pleiotropic effects using genome-wide summary statistics. Nature Genetics, pages 1–7, 2020.
[37] M. R. Munafò and G. D. Smith. Robust research needs many lines of evidence, 2018.
[38] L. J. O’Connor, A. P. Schoech, F. Hormozdiari, S. Gazal, N. Patterson, and A. L. Price. Extreme polygenicity of complex traits is explained by negative selection. The American Journal of Human Genetics, 105(3):456–476, 2019.
[39] L. J. O’Connor and A. L. Price. Distinguishing genetic correlation from causation across 52 diseases and complex traits. Nature genetics, 50(12):1728–1734, 2018.
[40] B. P. Prins, K. B. Kuchenbaecker, Y. Bao, M. Smart, D. Zabaneh, G. Fatemifar, J. Luan, N. J. Wareham, R. A. Scott, J. R. Perry, et al. Genome-wide analysis of health-related biomarkers in the uk household longitudinal study reveals novel associations. Scientific reports, 7(1):1–9, 2017.
[41] S. Purcell, B. Neale, K. Todd-Brown, L. Thomas, M. A. Ferreira, D. Bender, J. Maller, P. Sklar, P. I. De Bakker, M. J. Daly, et al. Plink: a tool set for whole-genome association and population-based linkage analyses. The American journal of human genetics, 81(3):559–575, 2007.
[42] G. Qi and N. Chatterjee. Mendelian randomization analysis using mixture models for robust and efficient estimation of causal effects. Nature Communications, 10(1):1–10, 2019.
[43] E. Sanderson, W. Spiller, and J. Bowden. Testing and correcting for weak and pleiotropic instruments in two-sample multivariable mendelian randomisation. bioRxiv, 2020.
[44] H. Schunkert, I. R. König, S. Kathiresan, M. P. Reilly, T. L. Assimes, H. Holm, M. Preuss, A. F. Stewart, M. Barbalic, C. Gieger, et al. Large-scale association analysis identifies 13 new susceptibility loci for coronary artery disease. Nature genetics, 43(4):333–338, 2011.
[45] H. Shi, G. Kichaev, and B. Pasaniuc. Contrasting the genetic architecture of 30 complex traits from summary association data. The American Journal of Human Genetics, 99(1):139–153, 2016.
[46] G. D. Smith, M. V. Holmes, N. M. Davies, and S. Ebrahim. Mendel’s laws, mendelian randomization and causal inference in observational data: substantive and nomenclatural issues. European Journal of Epidemiology, pages 1–13, 2020.
[47] S. Spencer, S. Köstel Bal, W. Egner, H. Lango Allen, S. I. Raza, C. A. Ma, M. Gürel, Y. Zhang, G. Sun, R. A. Sabroe, et al. Loss of the interleukin-6 receptor causes immunodeficiency, atopy, and abnormal inflammatory responses. Journal of Experimental Medicine, 216(9):1986–1998, 2019.
[48] J.-P. Teng, Z.-Y. Yang, Y.-M. Zhu, D. Ni, Z.-J. Zhu, and X.-Q. Li. The roles of arhgap10 in the proliferation, migration and invasion of lung cancer cells. Oncology letters, 14(4):4613–4618, 2017.
[49] N. J. Timpson, C. M. Greenwood, N. Soranzo, D. J. Lawson, and J. B. Richards. Genetic architecture: the shape of the genetic contribution to human traits and disease. Nature Reviews Genetics, 19(2):110, 2018.
[50] N. J. Timpson, B. G. Nordestgaard, R. M. Harbord, J. Zacho, T. M. Frayling, A. Tybjærg-Hansen, and G. D. Smith. C-reactive protein levels and body mass index: elucidating direction of causation through reciprocal mendelian randomization. International journal of obesity, 35(2):300–308, 2011.
[51] M. Verbanck, C.-y. Chen, B. Neale, and R. Do. Detection of widespread horizontal pleiotropy in causal relationships inferred from mendelian randomization between complex traits and diseases. Nature genetics, 50(5):693–698, 2018.
[52] J. Wang, F. Li, H. Wei, Z.-X. Lian, R. Sun, and Z. Tian. Respiratory influenza virus infection induces intestinal immune injury via microbiota-mediated th17 cell-dependent inflammation. Journal of Experimental Medicine, 211(12):2397–2410, 2014.
[53] L. D. Ward and M. Kellis. Haploreg: a resource for exploring chromatin states, conservation, and regulatory motif alterations within sets of genetically linked variants. Nucleic acids research, 40(D1):D930–D934, 2012.
[54] J. White, D. I. Swerdlow, D. Preiss, Z. Fairhurst-Hunter, B. J. Keating, F. W. Asselbergs, N. Sattar, S. E. Humphries, A. D. Hingorani, and M. V. Holmes. Association of lipid fractions with risks for coronary artery disease and diabetes. JAMA cardiology, 1(6):692–699, 2016.
[55] N. R. Wray, C. Wijmenga, P. F. Sullivan, J. Yang, and P. M. Visscher. Common disease is more complex than implied by the core gene omnigenic model. Cell, 173(7):1573–1580, 2018.
[56] R. S. Yadav and N. K. Tiwari. Lipid integration in neurodegeneration: an overview of alzheimer’s disease. Molecular neurobiology, 50(1):168–176, 2014.
[57] J. Yang, B. Benyamin, B. P. McEvoy, S. Gordon, A. K. Henders, D. R. Nyholt, P. A. Madden, A. C. Heath, N. G. Martin, G. W. Montgomery, M. E. Goddard, and P. M. Visscher. Common SNPs explain a large proportion of the heritability for human height. Nature Genetics, 42(7):565–569, 2010.
[58] Q. Zhao, Y. Chen, J. Wang, and D. S. Small. Powerful three-sample genome-wide design and robust statistical inference in summary-data Mendelian randomization. International Journal of Epidemiology, 48(5):1478–1492, 2019.
[59] Q. Zhao, J. Wang, G. Hemani, J. Bowden, D. S. Small, et al. Statistical inference in two-sample summary-data mendelian randomization using robust adjusted profile score. Annals of Statistics, 48(3):1742–1769, 2020.
[60] Q. Zhao, J. Wang, Z. Miao, N. Zhang, S. Hennessy, D. S. Small, and D. J. Rader. The role of lipoprotein subfractions in coronary artery disease: A mendelian randomization study. bioRxiv, page 691089, 2019.


Posted December 15, 2020.


[1] [1].↵
A. P. Agouridis, M. Elisaf, and H. J. Milionis. An overview of lipid abnormalities in patients with inflammatory bowel disease. Annals of Gastroenterology: Quarterly Publication of the Hellenic Society of Gastroenterology, 24(3):181, 2011.
OpenUrl

[2] [2].↵
Y. Benjamini and R. Heller. Screening for partial conjunction hypotheses. Biometrics, 64(4):1215–1222, 2008.
OpenUrl CrossRef PubMed Web of Science

[3] [3].↵
C. Berzuini, H. Guo, S. Burgess, and L. Bernardinelli. A bayesian approach to mendelian randomization with multiple pleiotropic variants. Biostatistics, 21(1):86–101, 2020.
OpenUrl

[4] [4].↵
J. Bowden, G. Davey Smith, and S. Burgess. Mendelian randomization with invalid instruments: effect estimation and bias detection through egger regression. International journal of epidemiology, 44(2):512–525, 2015.
OpenUrl CrossRef PubMed

[5] [5].↵
J. Bowden, G. Davey Smith, P. C. Haycock, and S. Burgess. Consistent estimation in mendelian randomization with some invalid instruments using a weighted median estimator. Genetic epidemiology, 40(4):304–314, 2016.
OpenUrl CrossRef PubMed

[6] [6].↵
E. A. Boyle, Y. I. Li, and J. K. Pritchard. An expanded view of complex traits: from polygenic to omnigenic. Cell, 169(7):1177–1186, 2017.
OpenUrl CrossRef PubMed

[7] [7].↵
B. Bulik-Sullivan, H. K. Finucane, V. Anttila, A. Gusev, F. R. Day, P.-R. Loh, L. Duncan, J. R. Perry, N. Patterson, E. B. Robinson, et al. An atlas of genetic correlations across human diseases and traits. Nature genetics, 47(11):1236, 2015.
OpenUrl CrossRef PubMed

[8] [8].↵
B. K. Bulik-Sullivan, P.-R. Loh, H. K. Finucane, S. Ripke, J. Yang, N. Patterson, M. J. Daly, A. L. Price, B. M. Neale, Schizophrenia Working Group of the Psychiatric Genomics Consortium, et al. Ld score regression distinguishes confounding from polygenicity in genome-wide association studies. Nature genetics, 47(3):291, 2015.
OpenUrl CrossRef PubMed

[9] [9].↵
A. Buniello, J. A. L. MacArthur, M. Cerezo, L. W. Harris, J. Hayhurst, C. Malangone, A. McMahon, J. Morales, E. Mountjoy, E. Sollis, et al. The nhgri-ebi gwas catalog of published genomewide association studies, targeted arrays and summary statistics 2019. Nucleic acids research, 47(D1):D1005–D1012, 2019.
OpenUrl CrossRef PubMed

[10] [10].↵
S. Burgess, A. Butterworth, and S. G. Thompson. Mendelian randomization analysis with multiple genetic variants using summarized data. Genetic epidemiology, 37(7):658–665, 2013.
OpenUrl CrossRef PubMed

[11] [11].↵
S. Burgess, C. N. Foley, E. Allara, J. R. Staley, and J. M. Howson. A robust and efficient method for mendelian randomization with hundreds of genetic variants. Nature Communications, 11(1):1–11, 2020.
OpenUrl

[12] [12].↵
S. Burgess and S. G. Thompson. Multivariable mendelian randomization: the use of pleiotropic genetic variants to estimate causal effects. American journal of epidemiology, 181(4):251–260, 2015.
OpenUrl CrossRef PubMed

[13] [13].↵
S. Burgess, S. G. Thompson, and C. C. G. Collaboration. Avoiding bias from weak instruments in mendelian randomization studies. International journal of epidemiology, 40(3):755–764, 2011.
OpenUrl CrossRef PubMed Web of Science

[14] [14].↵
C Reactive Protein Coronary Heart Disease Genetics Collaboration et al. Association between C reactive protein and coronary heart disease: Mendelian randomization analysis based on individual participant data. Bmj, 342:d548, 2011.
OpenUrl Abstract/FREE Full Text

[15] [15].↵
S. R. Cole and C. E. Frangakis. The consistency statement in causal inference: a definition or an assumption? Epidemiology, 20(1):3–5, 2009.
OpenUrl CrossRef PubMed Web of Science

[16] [16].↵
I. S. Consortium. Common polygenic variation contributes to risk of schizophrenia that overlaps with bipolar disorder. Nature, 460(7256):748, 2009.
OpenUrl CrossRef PubMed Web of Science

[17] [17].↵
Coronary Artery Disease (C4D) Genetics Consortium et al. A genome-wide association study in europeans and south asians identifies five new loci for coronary artery disease. Nature genetics, 43(4):339, 2011.
OpenUrl CrossRef PubMed Web of Science

[18] [18].↵
G. Davey Smith and S. Ebrahim. ‘Mendelian randomization’: can genetic epidemiology contribute to understanding environmental determinants of disease? International journal of epidemiology, 32(1):1–22, 2003.
OpenUrl CrossRef PubMed Web of Science

[19] [19].↵
G. Davey Smith and G. Hemani. Mendelian randomization: genetic anchors for causal inference in epidemiological studies. Human molecular genetics, 23(R1):R89–R98, 2014.
OpenUrl CrossRef PubMed Web of Science

[20] [20].↵
A. Dehghan, J. Dupuis, M. Barbalic, J. C. Bis, G. Eiriksdottir, C. Lu, N. Pellikka, H. Wallaschof-ski, J. Kettunen, P. Henneman, et al. Meta-analysis of genome-wide association studies in¿ 80 000 subjects identifies multiple loci for c-reactive protein levelsclinical perspective. Circulation, 123(7):731–738, 2011.
OpenUrl Abstract/FREE Full Text

[21] [21].↵
S. Ebrahim and G. D. Smith. Mendelian randomization: can genetic epidemiology help redress the failures of observational epidemiology? Human genetics, 123(1):15–33, 2008.
OpenUrl CrossRef PubMed Web of Science

[22] [22].↵
P. Elliott, J. C. Chambers, W. Zhang, R. Clarke, J. C. Hopewell, J. F. Peden, J. Erdmann, P. Braund, J. C. Engert, D. Bennett, et al. Genetic loci associated with c-reactive protein levels and risk of coronary heart disease. Jama, 302(1):37–48, 2009.
OpenUrl CrossRef PubMed Web of Science

[23] [23].↵
T. Fall, W. Xie, W. Poon, H. Yaghootkar, R. Mägi, J. W. Knowles, V. Lyssenko, M. Weedon, T. M. Frayling, E. Ingelsson, et al. Using genetic variants to assess the relationship between circulating lipids and type 2 diabetes. Diabetes, page db141710, 2015.

[24] [24].↵
A. C. R. Fonseca, R. Resende, C. R. Oliveira, and C. M. Pereira. Cholesterol and statins in alzheimer’s disease: current controversies. Experimental neurology, 223(2):282–293, 2010.
OpenUrl CrossRef PubMed

[25] [25].↵
F. P. Hartwig, G. Davey Smith, and J. Bowden. Robust inference in summary data mendelian randomization via the zero modal pleiotropy assumption. International journal of epidemiology, 46(6):1985–1998, 2017.
OpenUrl CrossRef PubMed

[26] [26].↵
G. Hemani, K. Tilling, and G. D. Smith. Orienting the causal relationship between imprecisely measured traits using gwas summary data. PLoS genetics, 13(11):e1007081, 2017.
OpenUrl

[27] [27].↵
J. R. Hibbeln and N. Salem Jr.. Dietary polyunsaturated fatty acids and depression: when cholesterol does not satisfy. The American journal of clinical nutrition, 62(1):1–9, 1995.
OpenUrl Abstract/FREE Full Text

[28] [28].↵
M. V. Holmes, M. Ala-Korpela, and G. D. Smith. Mendelian randomization in cardiometabolic disease: challenges in evaluating causality. Nature Reviews Cardiology, 14(10):577, 2017.
OpenUrl

[29] [29].↵
International Schizophrenia Consortium, S. M. Purcell, N. R. Wray, J. L. Stone, P. M. Visscher, M. C. O’Donovan, P. F. Sullivan, P. Sklar, et al. Common polygenic variation contributes to risk of schizophrenia and bipolar disorder. Nature, 460(7256):748–752, 2009.
OpenUrl CrossRef PubMed Web of Science

[30] [30].↵
A. E. Justice, T. W. Winkler, M. F. Feitosa, M. Graff, V. A. Fisher, K. Young, L. Barata, X. Deng, J. Czajkowski, D. Hadley, et al. Genome-wide meta-analysis of 241,258 adults accounting for smoking behaviour identifies novel loci for obesity traits. Nature communications, 8:14977, 2017.
OpenUrl

[31] [31].↵
R. M. Krauss. Lipids and lipoproteins in patients with type 2 diabetes. Diabetes care, 27(6):1496–1504, 2004.
OpenUrl Abstract/FREE Full Text

[32] [32].↵
P.-R. Loh, G. Bhatia, A. Gusev, H. K. Finucane, B. K. Bulik-Sullivan, S. J. Pollack, T. R. de Candia, S. H. Lee, N. R. Wray, K. S. Kendler, M. C. O’Donovan, B. M. Neale, N. Patterson, A. L. Price, and S. W. G. o. t. P. G. Consortium. Contrasting genetic architectures of schizophrenia and other complex diseases using fast variance-components analysis. Nature Genetics, 47(12):1385–1392, 2015.
OpenUrl CrossRef PubMed

[33] [33].↵
L. A. Lotta, S. J. Sharp, S. Burgess, J. R. Perry, I. D. Stewart, S. M. Willems, J. Luan, E. Ar-danaz, L. Arriola, B. Balkau, et al. Association between low-density lipoprotein cholesterol-lowering genetic variants and risk of type 2 diabetes: a meta-analysis. Jama, 316(13):1383–1391, 2016.
OpenUrl CrossRef PubMed

[34] [34].↵
M. Maes, R. Smith, A. Christophe, E. Vandoolaeghe, A. V. Gastel, H. Neels, P. Demedts, A. Wauters, and H. Meltzer. Lower serum high-density lipoprotein cholesterol (hdl-c) in major depression and in depressed men with serious suicidal attempts: relationship with immune-inflammatory markers. Acta Psychiatrica Scandinavica, 95(3):212–221, 1997.
OpenUrl CrossRef PubMed Web of Science

[35] [35].↵
A. P. Morris, B. F. Voight, T. M. Teslovich, T. Ferreira, A. V. Segre, V. Steinthorsdottir, R. J. Strawbridge, H. Khan, H. Grallert, A. Mahajan, et al. Large-scale association analysis provides insights into the genetic architecture and pathophysiology of type 2 diabetes. Nature genetics, 44(9):981, 2012.
OpenUrl CrossRef PubMed

[36] [36].↵
J. Morrison, N. Knoblauch, J. H. Marcus, M. Stephens, and X. He. Mendelian randomization accounting for correlated and uncorrelated pleiotropic effects using genome-wide summary statistics. Nature Genetics, pages 1–7, 2020.

[37] [37].↵
M. R. Munafò and G. D. Smith. Robust research needs many lines of evidence, 2018.

[38] [38].↵
L. J. O’Connor, A. P. Schoech, F. Hormozdiari, S. Gazal, N. Patterson, and A. L. Price. Extreme polygenicity of complex traits is explained by negative selection. The American Journal of Human Genetics, 105(3):456–476, 2019.
OpenUrl

[39] [39].↵
L. J. O’Connor and A. L. Price. Distinguishing genetic correlation from causation across 52 diseases and complex traits. Nature genetics, 50(12):1728–1734, 2018.
OpenUrl CrossRef PubMed

[40] [40].↵
B. P. Prins, K. B. Kuchenbaecker, Y. Bao, M. Smart, D. Zabaneh, G. Fatemifar, J. Luan, N. J. Wareham, R. A. Scott, J. R. Perry, et al. Genome-wide analysis of health-related biomarkers in the uk household longitudinal study reveals novel associations. Scientific reports, 7(1):1–9, 2017.
OpenUrl

[41] [41].↵
S. Purcell, B. Neale, K. Todd-Brown, L. Thomas, M. A. Ferreira, D. Bender, J. Maller, P. Sklar, P. I. De Bakker, M. J. Daly, et al. Plink: a tool set for whole-genome association and populationbased linkage analyses. The American journal of human genetics, 81(3):559–575, 2007.
OpenUrl CrossRef PubMed

[42] [42].↵
G. Qi and N. Chatterjee. Mendelian randomization analysis using mixture models for robust and efficient estimation of causal effects. Nature Communications, 10(1):1–10, 2019.
OpenUrl CrossRef

[43] [43].↵
E. Sanderson, W. Spiller, and J. Bowden. Testing and correcting for weak and pleiotropic instruments in two-sample multivariable mendelian randomisation. bioRxiv, 2020.

[44] [44].↵
H. Schunkert, I. R. Köonig, S. Kathiresan, M. P. Reilly, T. L. Assimes, H. Holm, M. Preuss, A. F. Stewart, M. Barbalic, C. Gieger, et al. Large-scale association analysis identifies 13 new susceptibility loci for coronary artery disease. Nature genetics, 43(4):333–338, 2011.
OpenUrl CrossRef PubMed

[45] [45].↵
H. Shi, G. Kichaev, and B. Pasaniuc. Contrasting the genetic architecture of 30 complex traits from summary association data. The American Journal of Human Genetics, 99(1):139–153, 2016.
OpenUrl CrossRef PubMed

[46] [46].↵
G. D. Smith, M. V. Holmes, N. M. Davies, and S. Ebrahim. Mendel’s laws, mendelian randomization and causal inference in observational data: substantive and nomenclatural issues. European Journal of Epidemiology, pages 1–13, 2020.

[47] [47].↵
S. Spencer, S. Köstel Bal, W. Egner, H. Lango Allen, S. I. Raza, C. A. Ma, M. Gürel, Y. Zhang, G. Sun, R. A. Sabroe, et al. Loss of the interleukin-6 receptor causes immunodeficiency, atopy, and abnormal inflammatory responses. Journal of Experimental Medicine, 216(9):1986–1998, 2019.
OpenUrl Abstract/FREE Full Text

[48] [48].↵
J.-P. Teng, Z.-Y. Yang, Y.-M. Zhu, D. Ni, Z.-J. Zhu, and X.-Q. Li. The roles of arhgap10 in the proliferation, migration and invasion of lung cancer cells. Oncology letters, 14(4):4613–4618, 2017.
OpenUrl

[49] [49].↵
N. J. Timpson, C. M. Greenwood, N. Soranzo, D. J. Lawson, and J. B. Richards. Genetic architecture: the shape of the genetic contribution to human traits and disease. Nature Reviews Genetics, 19(2):110, 2018.
OpenUrl CrossRef PubMed

[50] [50].↵
N. J. Timpson, B. G. Nordestgaard, R. M. Harbord, J. Zacho, T. M. Frayling, A. Tybjærg-Hansen, and G. D. Smith. C-reactive protein levels and body mass index: elucidating direction of causation through reciprocal mendelian randomization. International journal of obesity, 35(2):300–308, 2011.
OpenUrl CrossRef PubMed

[51] [51].↵
M. Verbanck, C.-y. Chen, B. Neale, and R. Do. Detection of widespread horizontal pleiotropy in causal relationships inferred from mendelian randomization between complex traits and diseases. Nature genetics, 50(5):693–698, 2018.
OpenUrl CrossRef PubMed

[52] [52].↵
J. Wang, F. Li, H. Wei, Z.-X. Lian, R. Sun, and Z. Tian. Respiratory influenza virus infection induces intestinal immune injury via microbiota-mediated th17 cell-dependent inflammation. Journal of Experimental Medicine, 211(12):2397–2410, 2014.
OpenUrl Abstract/FREE Full Text

[53] [53].↵
L. D. Ward and M. Kellis. Haploreg: a resource for exploring chromatin states, conservation, and regulatory motif alterations within sets of genetically linked variants. Nucleic acids research, 40(D1):D930–D934, 2012.
OpenUrl CrossRef PubMed Web of Science

[54] [54].↵
J. White, D. I. Swerdlow, D. Preiss, Z. Fairhurst-Hunter, B. J. Keating, F. W. Asselbergs, N. Sattar, S. E. Humphries, A. D. Hingorani, and M. V. Holmes. Association of lipid fractions with risks for coronary artery disease and diabetes. JAMA cardiology, 1(6):692–699, 2016.
OpenUrl

[55] [55].↵
N. R. Wray, C. Wijmenga, P. F. Sullivan, J. Yang, and P. M. Visscher. Common disease is more complex than implied by the core gene omnigenic model. Cell, 173(7):1573–1580, 2018.
OpenUrl

[56] [56].↵
R. S. Yadav and N. K. Tiwari. Lipid integration in neurodegeneration: an overview of alzheimer’s disease. Molecular neurobiology, 50(1):168–176, 2014.
OpenUrl CrossRef PubMed

[57] [57].↵
J. Yang, B. Benyamin, B. P. McEvoy, S. Gordon, A. K. Henders, D. R. Nyholt, P. A. Madden, A. C. Heath, N. G. Martin, G. W. Montgomery, M. E. Goddard, and P. M. Visscher. Common SNPs explain a large proportion of the heritability for human height. Nature Genetics, 42(7):565–569, 2010.
OpenUrl CrossRef PubMed Web of Science

[58] [58].↵
Q. Zhao, Y. Chen, J. Wang, and D. S. Small. Powerful three-sample genome-wide design and robust statistical inference in summary-data Mendelian randomization. International Journal of Epidemiology, 48(5):1478–1492, 07 2019.
OpenUrl

[59] [59].↵
Q. Zhao, J. Wang, G. Hemani, J. Bowden, D. S. Small, et al. Statistical inference in two-sample summary-data mendelian randomization using robust adjusted profile score. Annals of Statistics, 48(3):1742–1769, 2020.
OpenUrl

[60] [60].↵
Q. Zhao, J. Wang, Z. Miao, N. Zhang, S. Hennessy, D. S. Small, and D. J. Rader. The role of lipoprotein subfractions in coronary artery disease: A mendelian randomization study. bioRxiv, page 691089, 2019.