Discovering genetic interactions bridging pathways in genome-wide association studies

Gang Fang; Wen Wang; Vanja Paunic; Hamed Heydari; Michael Costanzo; Xiaoye Liu; Xiaotong Liu; Benjamin Oately; Michael Steinbach; Brian Van Ness; Eric E. Schadt; Nathan D. Pankratz; Charles Boone; Vipin Kumar; Chad L. Myers

doi:10.1101/182741

Abstract

Genetic interactions have been reported to underlie phenotypes in a variety of systems, but the extent to which they contribute to complex disease in humans remains unclear. In principle, genome-wide association studies (GWAS) provide a platform for detecting genetic interactions, but existing methods for identifying them from GWAS data tend to focus on testing individual locus pairs, which undermines statistical power. Importantly, the global genetic networks mapped for a model eukaryotic organism revealed that genetic interactions often connect genes between compensatory functional modules in a highly coherent manner. Taking advantage of this expected structure, we developed a computational approach called BridGE that identifies pathways connected by genetic interactions from GWAS data. Applying BridGE broadly, we discovered significant interactions in Parkinson’s disease, schizophrenia, hypertension, prostate cancer, breast cancer, and type 2 diabetes. Our novel approach provides a general framework for mapping complex genetic networks underlying human disease from genome-wide genotype data.

Genome-wide association studies (GWAS) have been increasingly successful at identifying single-nucleotide polymorphisms (SNPs) with statistically significant association to a variety of diseases^1-5 and gene sets significantly enriched for SNPs with moderate association^6-10. However, for most diseases, there remains a substantial disparity between the disease risk explained by the discovered loci and the estimated total heritable disease risk based on familial aggregation^11-16. While there are a number of possible explanations for this “missing heritability”, including many loci with small effects or rare variants^11-15-17, genetic interactions between loci are one potential culprit^13,14,16,18,19. Genetic interactions generally refer to a combination of two or more genes whose contribution to a phenotype cannot be completely explained by their independent effects^16,20,21. One extreme example of a genetic interaction is synthetic lethality, in which two mutations, neither of which is lethal on its own, combine to generate a lethal double-mutant phenotype. Genetic interactions allow relatively benign variation to combine and generate more extreme phenotypes, including complex human diseases^11-13,16,22. Several studies have reported interactions between genetic variants in various disease contexts^20,23-26, and efficient, scalable computational tools have been developed to search for interactions among genome-wide SNPs^20,26-28. Nevertheless, discovering these interactions systematically with statistical significance remains a major challenge. For example, recent work used simulations to estimate that approximately 500,000 subjects would be needed to detect significant genetic interactions under reasonable assumptions¹⁶. That is beyond the cohort sizes available for a typical GWAS, and even beyond most meta-GWAS studies.

Genome-wide reverse genetic screens in model organisms have produced rich insights into the prevalence and organization of genetic interactions^29,30. Specifically, the mapping and analysis of the yeast genetic interaction network revealed that genetic interactions are numerous and tend to cluster in highly organized network structures. These structures connect genes in two different but compensatory functional modules (e.g. pathways or protein complexes) rather than appearing as isolated instances^29,31-33. For example, nonessential genes belonging to the same pathway often exhibit negative genetic interactions with the genes of a second nonessential pathway that impinges on the same essential function (Fig. 1A). Because the two pathways are functionally redundant, each can compensate for the loss of the other. Thus, only simultaneous perturbation of both pathways results in an extreme loss-of-function phenotype, which could be associated with either increased or decreased disease risk. Importantly, the same phenotypic outcome could be achieved by several different combinations of genetic perturbations in both pathways (e.g. A-X, A-Z, B-X, B-Y, B-Z, as summarized in Fig. 1B). This model for the local topology of genetic networks, called the “between pathway model” (BPM), has been widely observed in yeast genetic interaction networks^29,34. Indeed, as many as ~70% of negative genetic interactions observed in yeast occur in BPM structures³¹. This indicates that genetic interactions are highly organized and that this type of local clustering is the rule rather than the exception. Combinations of mutations in genes within the same pathway or protein complex also exhibit a high frequency of genetic interaction, a scenario we refer to as the “within-pathway model” (WPM)^29,34. For example, ~80% of essential protein complexes in yeast exhibit a significantly elevated frequency of within-pathway interactions³⁵.
In the context of human disease, this scenario may arise when an individual inherits two variants in the same pathway. The result could be reduced flux through, or function of, that pathway and an increase or decrease in disease risk.

Figure 1. Between pathway model of genetic interactions.

An illustrative example demonstrating the concept of the between pathway model of genetic interactions. (A) Two distinct pathways, A→B→C and X→Y→Z converge to regulate the same essential function. Independent genetic perturbations in either pathway (indicated by blue color with an asterisk) have little or no contribution to a phenotype, but combined perturbations in both pathways in the same individual result in a genetic interaction, leading to a loss of function phenotype that can be associated with either an increase or decrease in disease risk. (B) The bipartite structure of genetic interactions resulting from functional compensation between the two pathways shown in (A). Genetic perturbations in any pair of genes across the two pathways combine to increase or decrease disease risk. Edges indicate observed interactions at the gene-gene or SNP-SNP level. (C) Conceptual overview of the BridGE method for detecting genetic interactions from GWAS data.

The prevalence of BPM and WPM structures in the yeast global genetic network has important practical implications for exploring disease-associated genetic interactions in humans using GWAS data. Tests to identify interactions between specific SNP or gene pairs are statistically underpowered. However, we may be able to detect genetic interactions by leveraging the expectation that pairwise interactions between genomic variants cluster into larger BPM and WPM network structures, similar to those observed in the yeast global genetic network. Other studies have exploited similar structural properties to derive genetic interaction networks from phenotypic variation in a yeast recombinant inbred population³⁶. The method we propose here is also broadly similar to two kinds of previous approaches (See Methods). The first used gene set enrichment or GO enrichment analysis to interpret SNP sets arising from univariate or interaction analyses^6-10,37-40. The second used aggregation tests for rare variants^15,41,42. Other existing approaches have successfully identified interactions by reducing the test space for SNP-SNP pairs through either knowledge-based or data-driven prioritization^43-46 (See Methods). However, to our knowledge, no existing method systematically identifies between-pathway interaction structures from human genetic data, which is the focus of this study.

Results

BridGE: a novel method for systematic discovery of pathway level genetic interactions from GWAS

We developed a method called BridGE (Bridging Gene sets with Epistasis) that explicitly searches GWAS cohorts for coherent sets of SNP-SNP interactions. These sets connect groups of genes corresponding to characterized pathways or functional modules. Many pairs of loci do not have statistically significant interactions when considered individually. They can nonetheless be collectively significant if SNP-SNP interactions are enriched between two functionally related sets of genes (Fig. 1B). We therefore imposed prior knowledge of pathway membership and exploited structural and topological properties of genetic networks. This gave us the statistical power to detect genetic interactions that occur between or within pathways in GWAS of diverse diseases. Our algorithm focuses on two kinds of structure. In BPM structures, two distinct pathways are bridged by several SNP-level interactions. In WPM structures, interactions densely connect SNPs linked to genes in the same functional module or pathway.
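To make the idea of collective significance concrete, here is a minimal Python sketch (standard library only) of two of the steps described below. The first thresholds a table of SNP-SNP interaction scores at a lenient top-fraction cutoff to obtain a binary network. The second tests whether two SNP sets, standing in for two pathways, are connected by more edges than the global network density predicts. The function names, the frozenset edge representation, and the two-cell chi-squared goodness-of-fit form are our assumptions for illustration. BridGE's actual statistics, which also compare against the marginal densities of the two pathways, are described in Methods.

```python
import math
from itertools import product


def binarize_top_fraction(scores, top_fraction=0.05):
    """Keep SNP pairs whose score is at or above the top-`top_fraction` cutoff.

    `scores` maps frozenset({snp1, snp2}) -> interaction score (higher = stronger).
    Returns the retained pairs, i.e. a lenient, high-coverage binary network.
    """
    ranked = sorted(scores.values(), reverse=True)
    n_keep = max(1, int(len(ranked) * top_fraction))
    cutoff = ranked[n_keep - 1]
    return {pair for pair, score in scores.items() if score >= cutoff}


def global_density(edges, n_snps):
    """Fraction of all possible SNP pairs that are edges in the binary network."""
    return len(edges) / (n_snps * (n_snps - 1) / 2)


def between_pathway_chi2(edges, snps_a, snps_b, density):
    """Chi-squared (1 df) test of the edge count between two disjoint SNP sets vs. `density`."""
    n_pairs = len(snps_a) * len(snps_b)
    observed = sum(frozenset((a, b)) in edges for a, b in product(snps_a, snps_b))
    expected = n_pairs * density
    # Two cells (interacting vs. non-interacting pairs) share the same squared deviation.
    chi2 = (observed - expected) ** 2 * (1 / expected + 1 / (n_pairs - expected))
    # Survival function of a chi-squared variable with 1 df: P(X > x) = erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return observed, expected, chi2, p_value


if __name__ == "__main__":
    # Toy network: pathway A = {a1, a2}, pathway B = {b1, b2, b3}; 3 of 6 cross pairs interact.
    toy_edges = {frozenset(p) for p in [("a1", "b1"), ("a1", "b2"), ("a2", "b3")]}
    print(between_pathway_chi2(toy_edges, ["a1", "a2"], ["b1", "b2", "b3"], density=0.1))
```

The chi-squared statistic is symmetric, so a small p-value signals enrichment only when the observed count exceeds the expected count.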

Our approach involves five main components (See Methods, Fig. 1C): (I) Data processing consisting of sample quality control and adjustment for population substructure between the cases and controls to avoid false discoveries due to population stratification^47,48. Linkage disequilibrium (LD) was also accounted for by pruning the full set of SNPs into a subset, as LD could otherwise result in spurious BPM structures. (II) SNP-SNP interaction networks were constructed based on SNP-SNP interactions scored under different disease model assumptions (additive, recessive, dominant, or combined recessive and dominant models). The additive disease model was implemented as previously described, and SNP-SNP interaction scores were derived based on likelihood ratio tests for models with and without an interaction term²⁰. Interactions based on recessive and dominant disease models were estimated using a hypergeometric-based metric that directly tests for disease association for individuals that are either homozygous (recessive and dominant models) or heterozygous (dominant only) for the minor allele at two loci and compares the observed degree of association to the marginal effects of both loci. (III) The SNP-SNP network was thresholded by applying a lenient significance cutoff to generate a low-confidence, high-coverage SNP-SNP interaction network. This binary network is expected to contain a large number of false positive interactions, but it enables assessment of the significance of SNP-SNP interactions collectively at the pathway level. 
(IV) Pairs of pathways (for BPMs) or single pathways (for WPMs), as defined by curated functional standards^49-51, were tested for enrichment of the SNP-SNP interactions connecting them (or lying within the single pathway). Each candidate was compared by chi-squared test against both the global interaction density and the marginal interaction density of the two pathways. It was also assessed with a permutation test (p_perm) in which the SNP-pathway assignment was randomly shuffled. Together, these tests produced three statistics measuring the significance of each candidate BPM or WPM. (V) Finally, a sample permutation strategy was applied to estimate the false discovery rate (FDR), correcting for multiple hypothesis testing and assessing the significance of the candidate BPMs or WPMs. Multiple hypothesis test correction is conducted only at the level of pathways or pathway pairs. The number of tests for all possible pathways and all possible between-pathway combinations is substantially smaller than the number of tests for all possible SNP pairs (~10⁵ as compared to ~10¹¹). This increases our power for discovering interactions relative to approaches that operate on individual SNP-SNP interactions. In addition to BPM and WPM structures, BridGE can identify individual pathways with a significantly elevated marginal density of SNP-SNP interactions, even when the interaction partners lack clear coherence in terms of pathways (called PATH structures, See Methods). In this case, we do not focus on pathway-pathway interactions. Instead, we assess whether a particular pathway is a highly connected hub associated with numerous SNP-level interactions.
These five steps enabled us to extract statistically significant pathway-level interactions that can be associated with either increased risk of disease when pairs of minor alleles linked to two pathways occur more frequently in the diseased population or, conversely, decreased risk of disease when pairs of minor alleles annotated to two pathways occur more frequently in the control population.
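Step (V) can be illustrated with a generic permutation-based FDR estimate. The sketch below assumes that, at a given p-value threshold, the expected number of false discoveries equals the average number of candidates passing that threshold in datasets with shuffled case/control labels (for example, the 10 sample permutations used for PD-NIA). The function names and this exact estimator are our assumptions; BridGE's precise procedure is given in Methods.

```python
def permutation_fdr(observed_pvalues, permuted_pvalues, threshold):
    """Estimate the FDR among candidates with p <= threshold.

    observed_pvalues: p-values of all candidate BPMs/WPMs on the real case/control labels.
    permuted_pvalues: one list of p-values per dataset with shuffled case/control labels.
    """
    n_discoveries = sum(p <= threshold for p in observed_pvalues)
    if n_discoveries == 0:
        return 0.0  # no discoveries, hence no false discoveries
    # Average count passing the threshold under the null = expected false discoveries.
    null_counts = [sum(p <= threshold for p in perm) for perm in permuted_pvalues]
    expected_false = sum(null_counts) / len(null_counts)
    return min(1.0, expected_false / n_discoveries)


def most_lenient_threshold(observed_pvalues, permuted_pvalues, target_fdr):
    """Largest observed p-value whose estimated FDR is <= target_fdr (None if none qualify)."""
    best = None
    for t in sorted(set(observed_pvalues)):
        if permutation_fdr(observed_pvalues, permuted_pvalues, t) <= target_fdr:
            best = t
    return best
```

This estimate is not guaranteed to be monotone in the threshold. The helper therefore scans every observed p-value instead of stopping at the first one that fails.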

Discovery of between-pathway interactions in a Parkinson’s disease cohort

We first applied BridGE to identify between-pathway interactions in a genome-wide association study of Parkinson’s disease (PD)⁵², denoted as PD-NIA (Supplementary Table 1). Recent work estimated a substantial heritable contribution to PD risk across a variety of GWAS designs (20%–40%)^53,54. Although a relatively large number of variants have been individually associated with PD, the loci discovered to date explain only a small fraction (6%–7%) of the total heritable risk⁵³. After balancing the population substructure, the PD-NIA cohort used in this analysis consists of 519 patients and 519 ancestry-matched controls (See Methods). We compiled 833 curated gene sets (MSigDB Canonical pathways)⁵⁵ representing established pathways or functional modules from KEGG⁴⁹, BioCarta⁵⁰ and Reactome⁵¹ (Supplementary Table 2). Of these, 658 were represented in the PD-NIA cohort after filtering on gene set size (minimum: 10 genes or SNPs, maximum: 300 genes or SNPs). We established global significance and corrected for the multiple hypotheses tested using both SNP-pathway membership permutations (NP=150,000) and sample permutations (NP=10) (See Methods). Under a combined disease model, BridGE then reported 173 significant BPMs at a false discovery rate (FDR) of < 0.25 (p_perm < 4.7 × 10^-5) (QQ plot in Fig. 2A, Supplementary Table 3). Because of overlap among the pathways, these could be summarized by a less redundant set of 23 BPMs involving 32 unique pathways (maximum overlap coefficient of 0.25; Fig. 3, Supplementary Table 4, See Methods). Some of the identified BPMs remained significant even at the most stringent FDR cutoff (FDR ≤ 0.05). For example, a high-confidence BPM was identified between the Golgi associated vesicle biogenesis gene set and FcεRI signaling. More specifically, we observed 2281 SNP-SNP interactions between the vesicle biogenesis and FcεRI signaling gene sets (Fig. 2B). This is 1.5-fold higher than the expected number of SNP-SNP interactions (1510) based on the global density of the SNP-SNP interaction network. It is also 1.3- and 1.2-fold higher than expected given the marginal densities of the two pathways (5.9% and 6.5%, respectively; Fig. 2C). Despite the significance of this BPM, none of the individual SNPs supporting it were significant on their own after multiple hypothesis correction of single-locus tests in this cohort (Fig. 2B). Furthermore, none of the individual SNP-SNP interactions between the two pathways were significant when tested independently under an additive disease model (Fig. 2D, FDR ≥ 0.94), or under recessive or dominant models (See Methods; Supplementary Fig. 1). Thus, the variants involved in this pathway-pathway interaction in the PD-NIA cohort would be missed by traditional univariate analysis or by interaction tests that focus on individual SNP pairs. They were nonetheless highly significant when assessed collectively by BridGE.
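The SNP-pathway membership permutation behind p_perm and Fig. 2C can be sketched as follows. This minimal version assumes that each permutation draws two random, disjoint SNP sets from all SNPs in the pruned network, matching the sizes of the two pathways, and counts the interactions between them. The names and sampling details are ours, not BridGE's. When no permutation reaches the observed count, the p-value can only be bounded by 1/n_perm. For the 150,000 permutations used here, that bound is 1/150,000 ≈ 6.7 × 10^-6, the resolution limit shown in Fig. 2A.

```python
import random


def count_between(edges, snps_a, snps_b):
    """Number of binary-network edges connecting SNP set A to SNP set B."""
    return sum(frozenset((a, b)) in edges for a in snps_a for b in snps_b)


def membership_permutation_pvalue(edges, all_snps, snps_a, snps_b, n_perm=1000, seed=0):
    """Empirical p-value for the between-set edge count under random SNP-set membership.

    Returns (p_value, is_upper_bound). If no permutation reaches the observed count,
    the p-value is reported as the resolution limit 1 / n_perm and flagged as a bound.
    """
    rng = random.Random(seed)
    pool = list(all_snps)
    observed = count_between(edges, snps_a, snps_b)
    exceed = 0
    for _ in range(n_perm):
        # Draw disjoint random sets with the same sizes as the two real pathways.
        drawn = rng.sample(pool, len(snps_a) + len(snps_b))
        rand_a, rand_b = drawn[: len(snps_a)], drawn[len(snps_a):]
        if count_between(edges, rand_a, rand_b) >= observed:
            exceed += 1
    if exceed == 0:
        return 1 / n_perm, True
    return exceed / n_perm, False
```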

Figure 2. Significant pathway-pathway interactions discovered from the PD-NIA Parkinson’s disease cohort.

(A) Quantile-quantile (QQ) plot comparing observed p-values (based on SNP-pathway membership permutations) for all possible pathway-pathway interactions between the 685 pathways to the expected uniform distribution (log₁₀ scale). The horizontal line at 6.7 × 10^-6 reflects the maximum resolution supported by 150,000 permutations. (B) Interaction between the Golgi associated vesicle biogenesis pathway (Reactome) and the Fc epsilon receptor I signaling pathway (KEGG). The two sets of SNPs mapped to genes in these pathways are connected by grey lines representing SNP-SNP interactions above a lenient top-5% cutoff. The two groups of horizontal bars (grouped and colored by chromosome) show the -log₁₀ p-values from a single-locus (univariate) test applied to each SNP individually (hypergeometric test). The two dashed lines correspond to an uncorrected p ≤ 0.05 cutoff, indicating that very few of the SNPs show marginally significant association even before multiple hypothesis test correction. (C) Null distribution of the SNP-SNP interaction density between the Golgi associated vesicle biogenesis pathway and the Fc epsilon receptor I signaling pathway described in (B), based on 150,000 SNP permutations. The observed density for this interaction is indicated by the red arrow and was not exceeded by any of the random instances (p_perm < 6.7 × 10^-6). (D) Distribution of p-values from individual tests of the pairwise SNP-SNP interactions supporting the pathway-pathway interaction, under an additive disease model (-log₁₀ p-value). None of the SNP pairs are significant after multiple hypothesis correction (the dashed line at the most significant SNP-SNP pair corresponds to FDR = 0.94).

Furthermore, few of the pathways that we discovered as parts of significant BPMs (Fig. 3, Supplementary Table 4) would be discovered using approaches based on pathway enrichment tests of single locus effects^6,7. For example, only three pathways were enriched among the single-locus effects associated with PD (Golgi associated vesicle biogenesis, Clathrin-derived vesicle budding and the Rac-1 cell motility signaling pathway; Supplementary Table 5) at the same FDR applied to the discovery of BPMs (FDR < 0.25), and only one of these was represented as part of a BPM identified by our analysis (Supplementary Table 4). We failed to identify any of the remaining 31 BPM-involved pathways through gene set enrichment analysis of single locus effects.

Figure 3. Global summary of between-pathway and within-pathway interactions discovered from a Parkinson’s disease cohort (PD-NIA).

Network representation of a set of significant (FDR ≤ 0.25) between-pathway (BPM) and within-pathway interactions (WPM) that are associated with increased (red edges) or decreased (green edges) risk of PD. Each node indicates the name of the pathway or gene set, and each edge represents a between-pathway interaction or within-pathway interaction (self-loop edges). The size of each node reflects the number of interaction edges it has. Replicated interactions are shown as bold lines.

Strikingly, the large majority (22 of 23) of discovered BPMs were associated with decreased risk of Parkinson’s disease (Fig. 3). One explanation is that, in Parkinson’s disease, genetic interactions are more frequently associated with protective effects. Alternatively, genetic interactions leading to increased risk may be more heterogeneous across the population, which would limit our ability to discover them. Several BPM interactions were highly relevant to the biology of Parkinson’s disease. In particular, the Fc epsilon receptor I (FcεRI) signaling pathway represented a hub in the pathway interaction network (Fig. 3). FcεRI is the high-affinity receptor for immunoglobulin E and a major controller of the allergic response and associated inflammation. Immune-related inflammation has frequently been associated with Parkinson’s disease, and several immunomodulating therapies have been pursued. It remains unclear, however, whether inflammation is a causal driver of the disease or a result of the neurodegeneration associated with disease progression^56,57. There has been relatively little focus on the specific role of FcεRI in Parkinson’s disease, but recent observations support the relevance of this pathway to the disease⁵⁸. For example, Bower et al. reported an association between the occurrence of allergic rhinitis and increased susceptibility to PD⁵⁹. Furthermore, IL-13 is one of the cytokines activated by FcεRI and a member of the FcεRI signaling pathway, and its reduction was shown to have a protective effect in mouse models of PD⁶⁰. In addition, galectin-3, which is known to modulate the FcεRI immune response, was shown to promote microglia activation induced by α-synuclein, a cellular phenotype associated with PD^61,62.
These observations indicate that a hyperactive allergic response may predispose individuals to PD. They also suggest that the protective interactions reported by our method may result from variants that subtly reduce the activity of this pathway. Aberrant events in the Golgi and related transport processes are known to play an important role in the pathology of various neurodegenerative diseases, including Parkinson’s disease^63,64. In addition, glycolytic and gluconeogenic metabolic intermediates have been found to be cytoprotective against 1-methyl-4-phenylpyridinium (MPP+) ion toxicity in Parkinson’s disease⁶⁵. BridGE also identified three protective interactions involving the IL-12 and STAT4 signaling pathway. IL-12 is a pro-inflammatory cytokine that plays a major role in regulating both the innate and adaptive immune responses⁶⁶. Specifically, microglial cells both produce and respond to IL-12 and IFN-gamma, forming a positive feedback loop that can support stable activation of microglia^67,68, a hallmark of Parkinson’s disease, particularly in later stages^69-73. The prevalence of FcεRI and IL-12 interactions among the discovered interactions suggests a major role for immune signaling as a causal driver of PD.

In addition to significant between-pathway interactions, we also discovered three significant WPMs associated with Parkinson’s disease risk (Fig. 3, Supplementary Table 4):

- Golgi-associated vesicle biogenesis (FDR < 0.01).
- The collagen-mediated activation cascade (FDR = 0.13).
- The HCMV and MAP kinase pathway (FDR = 0.25).

In all three cases, minor allele combinations within the pathway were associated with decreased risk of PD. All three of these pathways were also implicated in high-confidence protective BPM interactions with other pathways, suggesting that they play important roles in PD risk.

Replication of pathway-pathway interactions in an independent Parkinson’s disease cohort

To validate our findings, we determined whether the BPM interactions discovered in the PD-NIA cohort could be replicated in an independent PD cohort (PD-NGRC⁷⁴; 1947 cases and 1947 controls, all of European ancestry; subjects overlapping with the PD-NIA cohort were removed). Eight of the 173 total BPM interactions discovered in the PD-NIA cohort were nominally significant in PD-NGRC based on all three significance criteria (See Methods). To assess the significance of this level of replication across the entire set of discoveries, we compared the number of replicated BPMs at several different FDR cutoffs to the number expected by chance. The chance expectation was estimated from 10 random sample permutations of the validation cohort (See Methods). This analysis confirmed that the discovered interactions replicated more frequently than expected (Fig. 4A, Supplementary Table 6). For example, at an FDR cutoff of 0.05, the number of replicated BPMs was ~7-fold higher than expected (p = 0.02). BPMs identified at more stringent FDR cutoffs showed a stronger tendency to replicate in the independent cohort (Fig. 4A, Supplementary Table 6). These included the top-ranked BPM interaction we discovered, between Golgi associated vesicle biogenesis and the Fc epsilon receptor I (FcεRI) signaling pathway. Intriguingly, another between-pathway interaction involving the FcεRI signaling pathway, with a Glycolysis/gluconeogenesis gene set, also replicated (Supplementary Table 6).
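The replication comparison reduces to comparing an observed count of replicated BPMs with the counts obtained from sample-permuted versions of the validation cohort. The sketch below computes the fold enrichment over the permutation mean. An empirical p-value from only 10 permutations cannot fall below about 0.1, so for illustration the sketch assumes a Poisson null whose rate is the permutation mean. That distributional choice and the function name are ours; the test actually used is described in Methods.

```python
import math


def replication_enrichment(n_replicated, permuted_counts):
    """Fold enrichment of replicated discoveries and an upper-tail Poisson p-value.

    n_replicated: number of discoveries that replicated in the validation cohort.
    permuted_counts: replicated counts obtained from sample-permuted validation data.
    """
    rate = sum(permuted_counts) / len(permuted_counts)
    fold = n_replicated / rate if rate > 0 else math.inf
    # P(X >= n_replicated) for X ~ Poisson(rate), computed as 1 - P(X <= n_replicated - 1).
    cdf_below = sum(math.exp(-rate) * rate**i / math.factorial(i) for i in range(n_replicated))
    return fold, 1.0 - cdf_below
```

For toy counts such as 4 observed replications against permutation counts averaging 0.2, the fold enrichment is 20 and the Poisson tail is far below 0.001.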

Figure 4. Replication analysis of BPM interactions discovered from PD-NIA in an independent cohort (PD-NGRC).

(A) Each BPM interaction discovered from the PD-NIA data was tested for replication in the PD-NGRC cohort. The collective significance of replication was evaluated by measuring the fraction of significant BPMs discovered from PD-NIA that replicated in the PD-NGRC cohort (blue bars) at five different FDR cutoffs (x-axis). For comparison, the random expectation for the number of replicating BPMs was estimated from 10 random sample permutations (grey bars). (B) Sample permutation-based approach to check whether the individual SNP-SNP interactions supporting the replicated pathway-level interactions are similar between PD-NIA and PD-NGRC. The significance of the overlap (blue dots) of SNP-SNP interactions in each of the BPMs replicated in PD-NGRC was assessed by a hypergeometric test. The random expectation for the level of overlap was estimated by measuring the SNP-SNP interaction overlap in the same set of BPMs in 10 random sample permutations of the PD-NGRC cohort (grey dots) (p = 6.7 × 10⁻⁴, Wilcoxon rank-sum test). (C) Scatter plot of the significance of SNP-SNP interaction overlap in each of the replicated BPMs (-log₁₀ hypergeometric p-value) versus a direct measure of overlap (overlap coefficient).

We confirmed replication of a significant fraction of the discovered interactions at the pathway level. This does not necessarily imply that the individual SNP pairs supporting these pathway-level effects are shared across cohorts. For each of the 8 BPMs validated in the PD-NGRC cohort, we evaluated the significance of the overlap between the specific SNP-SNP interactions supporting it in the PD-NIA and PD-NGRC cohorts. We then contrasted the observed overlap with comparable statistics from 10 random sample permutations of the PD-NGRC cohort. Several individual BPMs exhibited significant overlap in their supporting SNP-SNP interactions. Collectively, the set of 8 replicated BPMs was strongly shifted toward higher-than-expected SNP-SNP interaction overlap (See Methods) (p = 1.4 × 10^-3) (Fig. 4B; see Supplementary Table 6 for a list of SNP-SNP pairs shared across cohorts). Despite this statistically significant overlap, the fraction of shared pairs was relatively low: all replicated BPMs had an overlap coefficient below 0.15 (See Methods) (Fig. 4C). Thus, the same pathway-pathway interaction may be supported by different sets of SNP-SNP interactions in different populations. Alternatively, the power to reliably pinpoint specific locus pairs may be limited. In either case, these results highlight the primary motivation for our method. Genetic interactions, in particular those in a BPM structure, can be detected more efficiently from GWAS at the level of pathways or functional modules than at the level of individual genomic loci.
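The two overlap measures used here can be sketched as follows. The sketch assumes the standard overlap (Szymkiewicz-Simpson) coefficient, |A ∩ B| / min(|A|, |B|), and a one-sided hypergeometric test. The universe of that test is taken to be all SNP pairs between the two pathways that could have been observed in both cohorts. Representing SNP pairs as hashable set elements and this choice of universe are our assumptions.

```python
from math import comb


def overlap_coefficient(pairs_1, pairs_2):
    """|A & B| / min(|A|, |B|) for two sets of SNP-SNP pairs (0 if either is empty)."""
    if not pairs_1 or not pairs_2:
        return 0.0
    return len(pairs_1 & pairs_2) / min(len(pairs_1), len(pairs_2))


def hypergeometric_overlap_pvalue(n_universe, pairs_1, pairs_2):
    """P(overlap >= observed) if pairs_2 were a random subset of the n_universe candidate pairs."""
    k = len(pairs_1 & pairs_2)
    n1, n2 = len(pairs_1), len(pairs_2)
    total = comb(n_universe, n2)
    # Sum the hypergeometric probabilities of overlaps k, k+1, ..., min(n1, n2).
    tail = sum(comb(n1, i) * comb(n_universe - n1, n2 - i) for i in range(k, min(n1, n2) + 1))
    return tail / total
```

Because `math.comb` works with exact integers, large numbers of candidate pairs do not lose precision until the final division.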

Discovery of pathway-level genetic interactions in five other diseases

We applied BridGE more broadly to an additional twelve GWAS cohorts representing seven different diseases (Parkinson’s disease, schizophrenia, breast cancer, hypertension, prostate cancer, pancreatic cancer and type 2 diabetes)^75-80 (Supplementary Table 1) (See Methods). Including PD-NIA, analysis of eleven of the thirteen cohorts (covering six different diseases) yielded significant discoveries for at least one of the three types of interactions (BPM, WPM or PATH) at FDR < 0.25. At FDR ≤ 0.25 (Fig. 5, Fig. 6A, Supplementary Tables S7-S20), these discoveries break down as follows:

- Significant BPMs in eight cohorts (covering six different diseases).
- Significant WPMs in six cohorts (covering four different diseases).
- Significant PATH structures in six cohorts (covering three different diseases).

The number of interaction discoveries per cohort varied substantially, from as few as two in one of the schizophrenia cohorts to as many as 50 in one of the breast cancer cohorts. We tested multiple disease models (additive, dominant, recessive, and combined dominant-recessive). For the majority of diseases, the most significant discoveries were obtained with a dominant or combined model, as measured by our SNP-SNP interaction metric (See Methods). The greater number of interactions found under a dominant model than under a recessive model may largely reflect our greater power to detect interactions between SNPs with dominant effects (See Methods).
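The disease models differ in how a genotype, coded as the number of minor alleles (0, 1 or 2), is turned into an exposure. Following the definitions given earlier, the recessive model counts only homozygous minor-allele carriers, while the dominant model also counts heterozygotes. The sketch below also illustrates one plausible, simplified reason for the power difference. Assuming Hardy-Weinberg equilibrium and unlinked loci (our assumptions, not the paper's), far fewer individuals carry a two-locus recessive combination than a dominant one. Function names are hypothetical.

```python
def encode_genotype(minor_allele_count, model):
    """Map a 0/1/2 minor-allele count to the exposure used under each disease model."""
    if minor_allele_count not in (0, 1, 2):
        raise ValueError("minor_allele_count must be 0, 1 or 2")
    if model == "additive":
        return minor_allele_count
    if model == "dominant":
        return int(minor_allele_count >= 1)  # heterozygous or homozygous minor
    if model == "recessive":
        return int(minor_allele_count == 2)  # homozygous minor only
    raise ValueError(f"unknown model: {model}")


def expected_joint_carrier_fraction(maf_1, maf_2, model):
    """Expected fraction of individuals exposed at both loci, assuming HWE and unlinked loci."""

    def carrier(maf):
        if model == "dominant":
            return 1 - (1 - maf) ** 2
        if model == "recessive":
            return maf**2
        raise ValueError("only 'dominant' and 'recessive' are supported here")

    return carrier(maf_1) * carrier(maf_2)
```

Take a minor allele frequency of 0.2 at both loci. About 13% of individuals (0.36²) would be joint carriers under the dominant model, versus 0.16% (0.04²) under the recessive model. In a cohort of 1,038 subjects such as PD-NIA, fewer than two individuals would be expected to carry the recessive combination.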

Figure 5. Between-pathway and within-pathway interactions discovered from 6 different diseases.

Network representation of a set of significant between-pathway (BPM) or within-pathway (WPM) interactions (FDR ≤ 0.25) that are associated with increased (red edges) or decreased (green edges) risk of the corresponding diseases. Replicated interactions are shown as bold lines. Discoveries from different diseases are indicated by different background colors. Only the 10 most significant BPMs/WPMs are shown for each GWAS cohort (see Supplementary Tables 4 and 9-20 for the complete list).

We obtained appropriate replication cohorts for three additional diseases beyond Parkinson’s disease (prostate cancer, breast cancer and schizophrenia). We successfully replicated discovered genetic interactions for all three (replication summary in Supplementary Table 21):

- Prostate cancer: three of eleven BPMs (FDR ≤ 0.25) discovered in the ProC-CGEMS cohort replicated in the ProC-BPC3 cohort (7.5-fold enrichment, p = 0.01). Three of ten WPMs discovered in the ProC-BPC3 cohort (FDR ≤ 0.25) replicated in ProC-CGEMS (3-fold enrichment, p = 0.0001).
- Breast cancer: six of 108 significant BPMs (FDR ≤ 0.20) discovered in the BC-MCS-JPN cohort replicated in the BC-MCS-LTN cohort (2-fold enrichment, p = 0.07). The sole significant PATH interaction discovered in the BC-MCS-LTN cohort replicated in the BC-MCS-JPN cohort.
- Schizophrenia: one of eight significant BPMs discovered in the SZ-GAIN cohort replicated (fold enrichment > 10, p = 0.02). The top significant WPM (FDR ≤ 0.1) also replicated in the SZ-CATIE cohort.

The vast majority of the genetic interactions we discovered appear to be disease-specific (Fig. 5, Supplementary Table 7), and many of the pathways implicated in genetic interactions showed strong relevance to the corresponding disease. For example, we identified several cancer-related gene sets involved in replicated BPMs predicted to affect breast cancer risk. These included p53 signaling and a basal cell carcinoma gene set. They also included an increased-risk interaction between MTA-3 related genes and T cell receptor activation initiated by Lck and Fyn. MTA-3 is a Mi-2/NuRD complex subunit that regulates an invasive growth pathway in breast cancer⁸¹. Lck and Fyn are members of the Src family of kinases, whose expression has been found to be associated with breast cancer progression and response to treatment^82-84.

We also identified and replicated multiple prostate cancer risk-associated interactions involving DNA repair, PD-1 (programmed cell death protein 1) signaling, and insulin regulation pathways. Consistent with our findings, metabolic syndrome has recently been associated with prostate cancer⁸⁵, and serum insulin levels have been shown to correlate with risk of prostate cancer⁸⁶. We also identified a replicating interaction associated with decreased risk of prostate cancer, between the p38 MAPK signaling and AKAP95 chromosome dynamics pathways. p38 MAPK signaling has been associated with a variety of cancers⁸⁷. AKAP95 is an A-kinase anchoring protein involved in chromatin condensation and maintenance of condensed chromosomes during mitosis⁸⁸, and its expression has previously been implicated in the development and progression of rectal and ovarian cancers⁸⁹. We also discovered and replicated two WPMs associated with prostate cancer risk. The first involves the antigen processing and presentation pathway (associated with increased risk). The second involves a gene set associated with activation of ATR in response to replication stress (associated with decreased risk). Both of these pathways have strong relevance to cancer risk^90,91.

For schizophrenia, we discovered and replicated a BPM interaction comprising a gene set associated with the HIV life cycle and a vitamin and cofactor metabolism pathway. Interestingly, a recent large Danish schizophrenia study reported that schizophrenia patients are at a 2-fold increased risk of HIV infection, and conversely, that individuals infected with HIV exhibited increased risk of schizophrenia, especially in the year following diagnosis⁹². Our finding suggests a common genetic basis between risk factors for schizophrenia and host response to HIV, which may help to explain the observed co-morbidity of these diseases. We also discovered and replicated a protective WPM for schizophrenia in the nicotinate and nicotinamide metabolism pathway. Nicotinic acid (vitamin B3) supplements have been pursued as a treatment for schizophrenia dating back to the 1950s⁹³. Interestingly, after an initial series of reports of promising treatments, several follow-up studies had difficulty reproducing the beneficial effects of nicotinic acid⁹⁴, which could be a result of modifier effects within this pathway.

Although we did not conduct replication analyses for hypertension or type 2 diabetes, we found that many of the pathways involved in interactions from the discovery cohorts were also highly relevant to the corresponding disease. For example, in the hypertension cohort, we identified a risk-associated BPM interaction involving hypoxia inducible factor (HIF) signaling, whose aberrant expression has been previously associated with hypertension⁹⁵. Two BPMs and one WPM, all associated with increased risk, involved the Rho cell motility signaling pathway, which has been previously implicated in the pathogenesis of hypertension⁹⁶. For type 2 diabetes, we discovered BPMs associated with protective effects involving an autoimmune thyroid disease gene set, glycosaminoglycan biosynthesis, and the mTOR signaling pathway, all of which have strong links to diabetes^97-99. In summary, BridGE was able to detect all possible types of pathway-level genetic interactions (BPM, WPM and PATH) across several diverse disease cohorts, highlighting the utility of our method and the potential for genetic interactions to underlie complex human diseases.

Simulation study to evaluate the power of BridGE approach

Several of our results indicate that the additional power gained by aggregating SNPs connecting between or within pathways is critical for discovering genetic interactions from GWAS, at least based on the cohort sizes analyzed here. To fully explore the limits of our approach, we carried out a simulation study to estimate the statistical power afforded by the BridGE method with respect to sample size, interaction effect size, minor allele frequency, and pathway size, all of which should affect the sensitivity of detection of pathway-level genetic interactions.

We focused our power analysis on the detection of BPMs, which comprise most of our discoveries. Briefly, our simulations involved two components: one in which individual SNP-SNP pairs were embedded in a simulated population cohort with varying allele frequencies¹⁰⁰, and another that simulated the rate of detection of increasingly larger BPM interaction structures given the corresponding level of false positives in the SNP-level network, as determined by the first component (see Methods). We found that each of the evaluated parameters (sample size, interaction effect size, minor allele frequency, and pathway size) affected the power of our approach (Fig. 6B). As expected, the sensitivity of our method increases with pathway size, which is a key motivation for the approach. For example, our power analysis indicated that a minimum cohort size of 5000 individuals (2500 cases, 2500 controls) is required to detect a 25×25 BPM (i.e. two interacting pathways with 25 SNPs mapping to each pathway) that confers a 2X increase in risk with a minor allele frequency (MAF) of 0.05 (FDR < 25%), while a 300×300 BPM with the same effect size would require only 1000 individuals (500 cases, 500 controls) for detection at the same level of significance (simulation results for more stringent FDR cutoffs). Also as expected, the sensitivity of the approach increases for interactions involving SNPs with higher MAF. For example, the same 25×25 BPM involving variants at a MAF of 0.15 and conferring a 2X increase in risk can be detected from cohorts as small as 2000 individuals (1000 cases, 1000 controls), and a 300×300 BPM with these characteristics could be detected from a cohort as small as 500 individuals (250 cases, 250 controls).
A key parameter affecting these power estimates is the assumed biological density of interactions, which we define as the fraction of SNP-SNP pairs crossing two pathways of interest that actually have a functional impact on the disease phenotype, relative to all possible SNP-SNP pairs between them. We assumed a density of 5% for the power analysis reported here (analyses based on densities of 2.5% and 10% are included in Supplementary Fig. 2), meaning that SNP pairs with the potential to jointly influence the phenotype comprise only a small minority of all possible SNP pairs. In practice, we anticipate that this density varies substantially across pathways, depending on the frequency of functionally deleterious SNPs present in the population for each pathway. A higher density of functionally deleterious SNP combinations will increase the sensitivity of our approach, and conversely, a lower density can substantially reduce it (Supplementary Fig. 3). Notably, while statistical power increases with pathway size (i.e. the number of SNPs mapping to each pathway), this holds only under the assumption that the SNPs (and the corresponding genes) actually contribute in a functionally coherent manner to the particular pathway or functional module. In the real disease cohorts, we discovered interactions across a wide range of pathway sizes (Supplementary Fig. 4), suggesting that even relatively small functional modules (e.g. fewer than 20 associated SNPs) can have interaction effects strong enough to be detected.
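Why sensitivity grows with pathway size, and shrinks with interaction density, can be illustrated with a deliberately simplified calculation. This is our own illustration, not the two-component simulation used in this study. It assumes that SNP pairs are independent and that non-functional pairs appear as edges in the binarized SNP-SNP network with a background probability `p0`. Functional pairs, a fraction `density` of the n×n pairs, are assumed to appear with an additional, hypothetical probability `power_pair`. All parameter values are placeholders.

```python
import math


def bpm_z_score(n, density=0.05, p0=0.01, power_pair=0.2):
    """Approximate enrichment z-score for edges in an n x n BPM.

    Illustrative assumptions only: SNP pairs are independent, non-functional
    pairs are edges with background probability p0, and functional pairs
    (a fraction `density` of all pairs) are edges with probability
    p0 + power_pair.
    """
    pairs = n * n
    excess_edges = pairs * density * power_pair  # expected extra edges from functional pairs
    sd_null = math.sqrt(pairs * p0 * (1 - p0))   # binomial SD of the edge count under the null
    return excess_edges / sd_null


for n in (25, 100, 300):
    print(n, round(bpm_z_score(n), 2))
```

Under these assumptions the z-score grows linearly with n. A 300×300 BPM therefore scores 12 times higher than a 25×25 BPM at the same density, and halving the density halves the score. Real SNP pairs are not independent, which is one reason the study relies on permutation tests rather than analytic approximations like this one.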
In general, these power analyses confirm that our approach is sufficiently powered to discover pathway-level genetic interactions at moderate effect size (~1.5-2X increased/decreased risk) for relatively small cohorts (~1000 or more individuals), which suggests it could be broadly applied to discover interactions in hundreds of existing GWAS cohorts that have been previously analyzed using only univariate approaches¹⁰¹.

Figure 6. Summary of discoveries across all disease cohorts and power analysis.

(A) The number of discoveries made in each of the disease cohorts evaluated, the disease model under which discoveries were made, and the direction of the disease association. A complete list of discovered interactions is available in Supplementary Tables 4 and 9-20. (B) Power analysis of the effect of minor allele frequency (MAF), BPM size, interaction effect size, and sample size on the discovery of between-pathway interactions. Colors in the heatmap indicate the estimated minimum number of samples needed to discover significant BPMs of different sizes under each scenario (MAF at 0.05, 0.1, 0.15, 0.2, 0.25; interaction effects at 1.1, 1.5, 2.0, 3.0, 5.0; BPM sizes at 25×25, 50×50, 75×75, 100×100, 125×125, 150×150, 200×200, 250×250, 300×300). All power analyses were conducted using a significance threshold required to meet an FDR < 0.25 (SNP-permutation p < 3.0 × 10⁻⁵), based on an average of the significant BPM discoveries across all analyzed GWAS cohorts.

Discussion

We described a novel and systematic approach for discovering human disease-specific, pathway-level genetic interactions from genome-wide association data. Results from eleven GWAS cohorts representing six different diseases confirmed that interaction structures prevalent in genetic networks of model organisms are indeed apparent in human disease populations and that these structures can be leveraged to discover significant genetic interactions either between or within biological pathways or functional modules. Genetic interactions discovered for these six diseases have the potential to contribute substantially to our understanding of their genetic basis. For example, to date, there have been approximately 85 singly associated loci (p ≤ 1.0×10^-7) and one genetic interaction (between FGF20 and MAOB) reported for Parkinson’s disease^102,103. Here, we discovered 23 additional pathway-level genetic interactions, emphasizing the potential of our approach to expand our knowledge of the contribution of genetic variation to diseases such as PD. Indeed, many of the pathways discovered by our approach have not been previously implicated in these diseases. For example, the median percentage of BridGE-identified pathways for which there was at least one linked SNP reported in dbGaP across the six diseases was 22% (Supplementary Table 22), indicating that the large majority of our discoveries represent novel insights that could not be made using standard single-locus approaches.

There are several ways the BridGE method could be expanded and improved upon to better detect genetic interactions. First, our approach currently depends on literature-curated collections of biological pathways as a major input. The potential of our method to detect genetic interactions within or between well-defined pathways and functional modules could be substantially improved as more complete curated or data-derived functional standards are developed and integrated with the approach, which will be a focus of future work. Second, to avoid spurious network structures related to SNPs that map to genes located in close physical proximity or in linkage disequilibrium (LD), we sampled a conservatively sized subset of tag SNPs to run our analysis for each dataset. This conservative approach has undoubtedly missed functional variants that may contribute to disease risk. More sophisticated approaches for retaining a larger set of tag SNPs while still controlling for LD structure could improve the sensitivity of our method. Finally, we emphasize that our study focuses exclusively on detecting pathway-level genetic interactions between common variants assayed by typical GWAS. Continued development to examine the contribution of rare variants, or interactions between rare variants and other loci, or to leverage the full set of variants identified through whole-genome or exome sequencing represents a logical extension of the BridGE approach.

Developing mechanistic or clinically actionable disease insights based on the genetic interactions we have discovered will require additional strategies that build on pathway-level discoveries to generate more targeted hypotheses, followed by functional studies in disease models. One potential strategy to generate more targeted hypotheses involves leveraging an approach like BridGE to find pathways with robust disease-associated genetic interactions followed by a more targeted search for individual SNP-SNP or gene-gene pairs within these pathways that explain these structures. Our analysis of the Parkinson’s cohort indicated that there is indeed significant overlap among the strongest SNP-SNP interactions underlying replicated pathway level interactions, supporting the potential utility of this hierarchical approach.

The extent to which genetic interactions contribute to the genetic basis of human disease has been the subject of recent debate^16,104,105. This debate is in part fueled by differences in language between geneticists who regularly encounter physiological epistasis between specific alleles and statistical geneticists who instead study statistical epistasis, which measures the non-additive component of genetic variance in a population^104,106. The target of our method is to discover disease-relevant physiological epistasis between sets of specific alleles in biological pathways based on population genetic data. Robust estimates of the additional heritability explained by pathway-level genetic interactions discovered by our method will be a focus of future work, but we anticipate this remains just one of many contributions to heritability. Even in cases where the contribution to disease heritability is modest, genetic interactions define genetically distinct disease subtypes and point toward new insights about disease mechanism that can seed the search for new, targeted therapies. Also, recent studies suggest that accurately predicting the phenotypes of individuals from genotypes can depend critically on understanding interactions between genetic loci^104,107, and thus, progress in personalized genome interpretation and medicine depends on our understanding of how specific alleles interact to cause phenotypes. Our work establishes a new paradigm for approaching this problem and provides a systematic method for detecting genetic interactions that can be applied to existing population genetic data for a variety of human diseases.

Methods

1. Brief summary of existing methods

Although efficient and scalable computational tools have been developed for searching for interactions amongst genome wide SNPs^{26–28, 108}, detecting them with statistical significance remains a major challenge. There are previous methods that have approached this problem, although from different perspectives than the method proposed here. We briefly summarize those methods and describe the novelty of our approach relative to this body of existing work.

Three general directions taken by previous methods for genetic interaction analysis that are the most similar to our approach are: (1) gene set enrichment-based approaches applied to loci derived from univariate tests, (2) gene set enrichment-based approaches applied to SNP-level summary statistics from interactions, and (3) methods that use pathways as a prior to study SNP or gene level interactions or reduce the number of hypothesis tests.

(1) Gene set enrichment-based approaches applied to loci derived from univariate tests

Gene set enrichment analysis (GSEA) was originally developed for case-control gene expression datasets^55,109 but has previously been adapted to summarize sets of loci (and their linked genes) derived from univariate tests applied to GWAS datasets^6,7. There are two key differences between these approaches and the method we propose. First, traditional approaches for GSEA start from univariate statistics of genes or SNPs, while our approach is built on interactions between pairs of SNPs that could have little or no single locus association with a disease phenotype. Second, approaches for GSEA target the enrichment of single gene/SNP associations in each individual pathway while our approach explores the enrichment of SNP-SNP interactions crossing each pair of pathways (between-pathway model or BPMs).

(2) Gene set enrichment-based approaches applied to SNP-level summary statistics from interactions

The gene set enrichment approach has also been applied beyond loci derived from univariate analysis. Another class of methods first measures genetic interactions based on pairwise SNP analysis, derives summary statistics at the individual SNP level based on specific interaction properties, and follows this with gene set enrichment analysis (GSEA) using pathway-associated SNP (or gene) interaction-based scores. For example, this strategy was recently applied to a bipolar disorder study and a sporadic amyotrophic lateral sclerosis study^110,111. In the bipolar disorder study, genome-wide SNPs were first filtered based on their ECML scores¹¹², and only the top 1000 SNPs with the strongest main effects and gene-gene interactions were retained for studying SNP-SNP interactions. A SNP-SNP interaction network was then constructed using a logistic regression model, and SNPs were ranked by their centrality in this network. Finally, candidate pathways were evaluated using a gene set enrichment analysis based on the rankings of pathway members. A similar GO enrichment approach was applied in the sporadic amyotrophic lateral sclerosis study¹¹¹, but SNP interaction strength was first estimated using a multifactor dimensionality reduction (MDR) model and then summarized at the gene level. GO annotation enrichment approaches were then applied to these gene-level scores. Again, these studies did not introduce the key concept that motivates our method: that genetic interactions connect coherently across pairs of distinct pathways.

(3) Methods that use pathways as a prior to study SNP or gene level interactions or reduce the number of hypothesis tests

Another strategy implemented by existing methods to address the multiple hypothesis testing challenge presented by pairwise SNP analysis is to reduce the number of hypothesis tests based on a variety of criteria¹¹³. These methods typically employ a filtering step, either data driven^43-45 or knowledge driven^46,114, before applying statistical analysis of interactions. Other illustrative examples of this class of approaches include a recent autism spectrum disorder study in which all SNPs were tested for interactions with the Ras/MAPK pathway³⁹, and a melanoma risk study in which SNP-SNP interactions were studied within the five pathways that were significant based on traditional single-SNP GSEA⁴⁰. Most studies implementing this approach investigate interactions among a small set of genetic variants (genes or SNPs) that either show statistical evidence of individual association with the disease phenotype or are known to be relevant to the disease based on prior knowledge. Hence, genetic interactions among novel genes, or among genes that show no marginal association, will not be systematically detected by these approaches.

In summary, existing approaches are related to the proposed approach in the general sense that they leverage existing knowledge of pathways or other functionally related sets of genes to either perform enrichment on univariate effects or interaction-based SNP summary statistics (e.g. interaction degree), or simply use pathways as a prior to reduce the number of SNP pairs tested for interactions. To our knowledge, no existing methods explicitly test for higher-level interactions connecting within or between multiple pathways and are sufficiently powered to perform this systematically across comprehensive pathway databases.

2. Genome-wide association studies (GWAS) datasets

Twelve GWAS datasets, representing 13 different cohorts covering seven diseases, were used in this paper: Parkinson’s disease (PD-NIA: phs000089.v3.p2, PD-NGRC: phs000196.v1.p1), breast cancer (BC-CGEMS-EUR, BC-MCS-JPN and BC-MCS-LTN: phs000517.v3.p1), schizophrenia (SCHZ-GAIN: phs000021.v3.p2; SCHZ-CATIE: CATIE study), hypertension (HT-eMERGE: phs000297.v1.p1; HT-WTCCC: cases are from EGAD00000000006, controls are from EGAD00000000001 and EGAD00000000002), prostate cancer (ProC-CGEMS: phs000207.v1.p1; ProC-BPC3: phs000812.v1.p1), pancreatic cancer (PanC-PanScan: phs000206.v3.p2) and type 2 diabetes (T2D-WTCCC: cases are from EGAD00000000009, controls are from EGAD00000000001 and EGAD00000000002). These datasets were obtained from three resources: dbGaP¹⁰¹, the Wellcome Trust Case Control Consortium, and the National Institute of Mental Health (NIMH)¹¹⁵. Details of each dataset (e.g. sample size, genotyping platform) are summarized in Supplementary Table 1.

3. Data processing

We used the same set of pre-processing steps for all GWAS data sets analyzed in this paper. Each of the steps is outlined in detail in the sections that follow.

3.1 Sample quality control

We first controlled data quality using the standard PLINK inclusion procedure with the following parameters: 0.02 as the maximal missing genotyping rate for each individual/SNP (--mind, --geno), 0.05 as the minimum minor allele frequency (--maf), and 1.0 × 10⁻⁶ as the Hardy-Weinberg equilibrium p-value cutoff (--hwe 1e-6).

To identify outlier samples that were not consistent with the reported study population, we mapped SNPs in each GWAS dataset to Genome Reference Consortium GRCh37¹¹⁶ and combined the samples with the 1000 Genomes data¹¹⁷ (all ancestry groups). We then used PLINK to perform multi-dimensional scaling (MDS) analysis. Based on the MDS plot, we removed samples that were not tightly clustered with the corresponding ancestry groups in the 1000 Genomes data. For the two Parkinson’s disease cohorts, we followed a previous study¹¹⁸ to remove samples that are likely outliers. For these cohorts, duplicate subjects were kept in only one cohort, with priority given to PD-NIA over PD-NGRC, so that we could retain as many samples as possible for the smaller cohort.

3.2 Population stratification

Checking relatedness among individuals

Relatedness among each pair of subjects was tested by calculating IBD¹¹⁹. For subject pairs with a proportion IBD score greater than 0.2, one was randomly chosen and removed from the data, and the other was kept.
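The relatedness filter reduces to a simple rule once pairwise IBD estimates are available. This minimal sketch assumes the estimates have already been computed (for example with PLINK's --genome option). It also assumes they have been parsed into (subject_a, subject_b, pi_hat) tuples; that layout and the function name are hypothetical.

```python
import random


def prune_related(pairs, pi_hat_threshold=0.2, seed=0):
    """Return the set of subjects to remove so that no retained pair exceeds the IBD threshold.

    pairs: iterable of (subject_a, subject_b, pi_hat) tuples.
    For each pair above the threshold, one member is chosen at random for removal.
    """
    rng = random.Random(seed)
    removed = set()
    for a, b, pi_hat in pairs:
        if pi_hat <= pi_hat_threshold:
            continue
        if a in removed or b in removed:
            continue  # pair already resolved by an earlier removal
        removed.add(rng.choice((a, b)))
    return removed


print(prune_related([("s1", "s2", 0.5), ("s2", "s3", 0.05), ("s4", "s5", 0.25)]))
```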

Matching population structure between cases and controls

Because spurious allelic associations can be discovered due to unknown population structure^47,120,121, recent GWAS analyses suggest the use of a procedure to ensure balanced population structure between cases and controls¹¹⁹. Here, all subjects were clustered into groups of size 2, each containing one case and one control that are from the same sub-population (based on pairwise identity-by-state distance and the corresponding statistical test), as is implemented in PLINK¹¹⁹.
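PLINK implements this matching through IBS-based clustering with constraints on cluster composition. The sketch below substitutes a simpler, globally greedy nearest-neighbour matching to convey the idea, and is not PLINK's algorithm. It assumes a user-supplied `distance` function as a hypothetical stand-in for pairwise IBS distance. An optional `max_distance` cutoff takes the place of PLINK's significance test, and unmatched subjects are dropped.

```python
def match_case_control(cases, controls, distance, max_distance=None):
    """Greedily form 1:1 case-control pairs, closest pairs first.

    distance(a, b) returns a genetic distance (e.g. 1 - IBS similarity).
    Pairs farther apart than max_distance are never formed.
    """
    candidates = sorted(
        ((distance(c, k), c, k) for c in cases for k in controls),
        key=lambda t: t[0],
    )
    used_cases, used_controls, pairs = set(), set(), []
    for d, case, control in candidates:
        if max_distance is not None and d > max_distance:
            break  # remaining candidates are even farther apart
        if case in used_cases or control in used_controls:
            continue
        used_cases.add(case)
        used_controls.add(control)
        pairs.append((case, control))
    return pairs


# Toy example: subjects placed on a line, distance = absolute difference.
pos = {"c1": 0.0, "c2": 10.0, "k1": 0.5, "k2": 9.0, "k3": 100.0}
print(match_case_control(["c1", "c2"], ["k1", "k2", "k3"],
                         lambda a, b: abs(pos[a] - pos[b]), max_distance=5))
```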

Future extensions of our method could include parameters capturing population structure directly in the model for genetic interactions, for example, as is described in¹²². The primary concern in developing and applying our current approach was to ensure that population structure was not introducing spurious between-pathway interactions, so we took this relatively conservative approach to adjust for population stratification. More sophisticated approaches could reduce the number of samples lost in filtering based on population stratification and improve the sensitivity of the method.

3.3 Filtering SNPs in linkage disequilibrium (LD)

For each data set, we selected all SNPs that could be mapped to at least one of the 6744 genes in the collection of pathways used in the pathway-pathway interaction search. A SNP was mapped to all genes that overlap with a +/- 50kb window centered at the SNP, and then mapped to pathways to which the corresponding gene(s) were annotated. For the purposes of computing pathway-level statistics, a SNP was only associated once with each pathway, even if it mapped to multiple genes in the pathway.
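The ±50 kb mapping rule can be sketched in a few lines. The data layout is assumed for illustration: dictionaries keyed by hypothetical SNP, gene and pathway identifiers, with inclusive gene coordinates. Storing each pathway's SNPs as a set enforces that a SNP is counted once per pathway even when it maps to several genes in that pathway.

```python
from collections import defaultdict


def map_snps_to_pathways(snps, genes, pathways, window=50_000):
    """Map SNPs to pathways through genes overlapping a +/- window around each SNP.

    snps: {snp_id: (chrom, position)}
    genes: {gene_id: (chrom, start, end)}
    pathways: {pathway_id: set of gene_ids}
    Returns {pathway_id: set of snp_ids}; pathways with no mapped SNP are omitted.
    """
    snp_genes = defaultdict(set)
    for snp, (chrom, pos) in snps.items():
        lo, hi = pos - window, pos + window
        for gene, (g_chrom, start, end) in genes.items():
            if g_chrom == chrom and start <= hi and end >= lo:  # interval overlap test
                snp_genes[snp].add(gene)
    pathway_snps = defaultdict(set)
    for pathway, members in pathways.items():
        for snp, hit_genes in snp_genes.items():
            if hit_genes & members:
                pathway_snps[pathway].add(snp)  # a set, so each SNP counts once per pathway
    return dict(pathway_snps)
```

The linear scan over genes scales with the product of the numbers of SNPs and genes; at genome scale, an interval tree or a sorted sweep per chromosome would be the natural replacement.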

To avoid the discovery of trivial bipartite structures, SNPs in linkage disequilibrium (LD) need to be removed before between or within-pathway enrichment of SNP-SNP interactions is conducted. Two general approaches can be pursued towards this goal: 1) removing SNPs in LD before calculating pairwise SNP-SNP interactions; and 2) removing structures that emerge as a result of SNPs in LD after calculating pairwise SNP-SNP interactions.

The first alternative is more likely to miss informative SNP-SNP interactions than the second because it only considers a subset of all SNPs, but it is more computationally efficient and scalable. It is worth noting that a biclustering algorithm pursuing the second approach was designed in³⁶ to condense a yeast SNP-SNP interaction network into an LD-LD network. The algorithm described in that work took the SNP-SNP interaction matrix as input and searched for sets of consecutive SNPs that had a statistically significant number of across-set SNP-SNP interactions based on a hypergeometric test. The algorithm was applied to a yeast SNP-SNP interaction network (originally constructed in¹²³) with 1977 SNPs, where the LD effect was assumed to be localized to fewer than 60 SNPs for computational reasons¹. We attempted to apply this algorithm to the human genotype datasets used in this paper and observed that it could handle about 1500 SNPs (with a threshold σ below 60) but not more. For example, on a dataset with 2000 SNPs, the program did not finish within two days with σ = 100. Given these scalability issues, we adopted the first alternative, which is to select a subset of SNPs that are not in LD. To accomplish this, we used a procedure in PLINK¹¹⁹ to select a subset of unlinked SNPs from each GWAS dataset, specifically “--indep-pairwise 50 5 0.1”. With this procedure, PLINK scans windows of 50 SNPs, advancing 5 SNPs at a time, and selects a subset of SNPs with pairwise r² below 0.1 within each window. After this procedure, ~15,000-20,000 SNPs were left in each dataset, and the highest r² between any pair of SNPs within any 1 Mb window was lower than the commonly used threshold for controlling LD (r² < 0.2)^7,124, demonstrating that LD was effectively controlled. Note that by using a stringent r² threshold of 0.1, we are undoubtedly ignoring many informative SNPs.
However, we chose this conservative approach to minimize the chance that spurious BPMs resulted from remaining LD structure. Future work that explores less conservative approaches to handling SNPs in LD would be worthwhile.
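The idea behind `--indep-pairwise 50 5 0.1` can be illustrated with a simplified pure-Python pruner. It uses the squared correlation of genotype dosages as r². Whenever a pair exceeds the threshold, it drops the later SNP of the pair. This tie-breaking rule is our own simplification, and PLINK's choice of which variant to remove may differ, so the retained set will not necessarily match PLINK's. The function names are hypothetical.

```python
def r_squared(x, y):
    """Squared Pearson correlation between two genotype dosage vectors."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0  # monomorphic SNP: treat as uncorrelated
    return sxy * sxy / (sxx * syy)


def prune_ld(dosages, window=50, step=5, r2_max=0.1):
    """Return indices of SNPs kept after sliding-window pairwise LD pruning.

    dosages: list of per-SNP genotype vectors, in genomic order.
    """
    kept = [True] * len(dosages)
    for start in range(0, len(dosages), step):
        idx = range(start, min(start + window, len(dosages)))
        for i in idx:
            for j in idx:
                if j <= i or not (kept[i] and kept[j]):
                    continue
                if r_squared(dosages[i], dosages[j]) > r2_max:
                    kept[j] = False  # simplification: drop the later SNP of the pair
    return [i for i, k in enumerate(kept) if k]
```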

For diseases in which we tested discovered interactions for replication in independent cohorts of the same ancestry, we first combined the cohorts and processed them together using the procedures described above to select the subset of SNPs for analysis, so that the discovery and replication analyses were consistent. Population stratification control and discovery of interactions were then performed independently for each cohort. We followed this procedure for three of the diseases analyzed: Parkinson’s disease, schizophrenia, and breast cancer.

For prostate cancer, our access to ProC-CGEMS and ProC-BPC3 was gained at different times, so SNPs used in ProC-BPC3 were selected based on the CGEMS cohort. A summary of all processed datasets used in this study is included in Supplementary Table 1.

3.4 Selection of Pathways

833 human pathways (gene sets) were collected from the Kyoto Encyclopedia of Genes and Genomes (KEGG)^125,126, Biocarta¹²⁷, and Reactome⁵¹ (Supplementary Table 2). We excluded from our analysis any pathway with fewer than 10 or more than 300 genes, or fewer than 10 or more than 300 SNPs mapping to it after LD control, to avoid pathways that were too small to provide sufficient statistical power or too large to provide specific biological insights.
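The size criteria amount to a single filter with inclusive bounds of 10 and 300 on both gene and post-LD SNP counts. This minimal sketch assumes that pathway memberships are held in dictionaries of sets, a hypothetical data layout.

```python
def filter_pathways(pathway_genes, pathway_snps, lo=10, hi=300):
    """Return pathway IDs with lo..hi genes and lo..hi SNPs (bounds inclusive).

    pathway_genes: {pathway_id: set of genes}
    pathway_snps: {pathway_id: set of SNPs remaining after LD pruning}
    """
    return {
        p for p, genes in pathway_genes.items()
        if lo <= len(genes) <= hi and lo <= len(pathway_snps.get(p, ())) <= hi
    }
```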

4. SNP-SNP genetic interaction estimation

MM, Mm and mm are used to denote the three genotypes of each SNP, i.e., majority homozygous, heterozygous and minority homozygous, respectively. Our method implements multiple disease models, which affect how interactions are estimated at the SNP-SNP interaction level. A minor allele (m) at each locus could be additive, dominant or recessive in the context of different diseases. For the additive model, we used the standard logistic regression-based model implemented in CASSI²⁸ to quantify the interaction between two SNPs coded as follows, mm=2, Mm=1, MM=0. In this model, the goodness-of-fit was compared between a standard logistic regression model with an interaction term between the two loci of interest and a standard logistic regression without an interaction term, and the significance of the interaction was measured by a likelihood ratio test²⁸. We refer to this type of SNP-SNP interaction as an additive-additive (AA) model based interaction. In the dominant model, a SNP is encoded as mm=1, Mm=1, MM=0. In the recessive model, a SNP is encoded as mm=1, Mm=0, MM=0. Because the minor allele could have recessive (R) or dominant (D) contribution to disease at two different loci comprising an interaction, four types of SNP-SNP interactions were examined: recessive-recessive (RR), dominant-dominant (DD), recessive-dominant (RD), and dominant-recessive (DR) model-based interaction for each pair of SNPs. The interactions under these four models can also be estimated by a logistic regression-based model similar to the AA case described above except with the appropriate encoding of the SNP genotypes. Alternatively, the RR, DD, DR and RD interactions can be estimated by explicit statistical tests (e.g. hypergeometric tests) of the association between a specific genotype combination of two SNPs and a disease of interest, where this association is compared to the association between each of the individual SNPs and the disease (marginal effect). 
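The genotype encodings above translate directly into lookup tables. In this minimal sketch, genotypes are given as minor-allele counts (MM = 0, Mm = 1, mm = 2). The convention that the first letter of a two-locus model (RR, DD, RD, DR) refers to S_x is our assumption, and the names are hypothetical.

```python
# Minor-allele counts: MM = 0, Mm = 1, mm = 2.
ENCODINGS = {
    "additive":  {0: 0, 1: 1, 2: 2},  # mm=2, Mm=1, MM=0
    "dominant":  {0: 0, 1: 1, 2: 1},  # mm=1, Mm=1, MM=0
    "recessive": {0: 0, 1: 0, 2: 1},  # mm=1, Mm=0, MM=0
}

# Assumed convention: the first letter gives the model at S_x, the second at S_y.
TWO_LOCUS_MODELS = {
    "RR": ("recessive", "recessive"),
    "DD": ("dominant", "dominant"),
    "RD": ("recessive", "dominant"),
    "DR": ("dominant", "recessive"),
}


def encode(genotypes, model):
    """Recode a list of minor-allele counts under a single-locus disease model."""
    table = ENCODINGS[model]
    return [table[g] for g in genotypes]


def encode_pair(gx, gy, two_locus_model):
    """Recode SNPs S_x and S_y for one of the RR, DD, RD or DR interaction models."""
    model_x, model_y = TWO_LOCUS_MODELS[two_locus_model]
    return encode(gx, model_x), encode(gy, model_y)


print(encode_pair([0, 1, 2], [0, 1, 2], "RD"))
```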
Interactions estimated by logistic regression based models directly capture non-additive effects between two SNPs considering different combinations of SNP genotypes. In contrast, interactions estimated by explicit statistical tests have the flexibility of specifically testing certain combinations of genotypes for association with the phenotype. We explored alternative approaches both in representing different disease models and in the estimation of SNP-SNP interactions, and found that RR, DD, DR and RD interactions estimated by explicit statistical tests were more likely to lead to the discovery of significant BPMs/WPMs in the context of our BridGE approach. The measure we developed based on explicit statistical tests, called hygeSSI, is described in detail below. The relationship between hygeSSI and logistic regression based models is explored in more depth in section 8.

4.1 hygeSSI

We designed a hypergeometric-based measurement (hygeSSI) to estimate the interactions between two binary-coded SNPs (dominant or recessive as described above). The hypergeometric p-value for a pair of binary-coded SNPs with respect to a case-control cohort is calculated as follows:

$$P(S_x, S_y, C) = \sum_{i=X}^{\min(K, N)} \frac{\binom{K}{i}\binom{M-K}{N-i}}{\binom{M}{N}}$$

where S_x and S_y are two SNPs; T is the joint genotype of S_x and S_y being tested; M is the total number of samples; N is the total number of samples in class C; K is the total number of samples that have genotype T; and X is the total number of samples that have genotype T in class C.

We use P_{1∼}(S_x, C) and P_{1∼}(S_y, C) to represent the main effects of the individual SNPs S_x and S_y, and P₁₁(S_x, S_y, C), P₁₀(S_x, S_y, C), P₀₁(S_x, S_y, C) and P₀₀(S_x, S_y, C) to represent the effects of all four genotype combinations. Given a nominal p-value threshold (α), we first require a SNP pair to have a significant association with the phenotype, P₁₁(S_x, S_y, C) ≤ α. In addition, we specifically exclude instances where other allele combinations show significant association with the trait, i.e. we require P₁₀(S_x, S_y, C) > α, P₀₁(S_x, S_y, C) > α and P₀₀(S_x, S_y, C) > α. Given a binary-coded SNP pair (S_x, S_y) and a binary class label C, the following measure, hygeSSI (Hypergeometric SNP-SNP Interaction), was defined to estimate the genetic interaction between two SNPs S_x and S_y (specifically for P₁₁):
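A sketch of the hypergeometric p-values and the genotype-combination screen follows. It assumes an upper-tail hypergeometric test, i.e. the probability of observing at least X samples with joint genotype T in class C. SNPs are assumed to be 0/1 binary-coded and labels use 1 to mark class C. It follows the stated intent of excluding pairs in which any other genotype combination is significantly associated. The helper names are hypothetical. The sketch stops at the screening step and does not compute the hygeSSI score itself.

```python
from math import comb


def hyper_p(M, N, K, X):
    """Upper-tail hypergeometric p-value P(count >= X).

    M samples in total, N in class C, K carrying genotype T, X carrying T within class C.
    """
    upper = min(K, N)
    return sum(comb(K, i) * comb(M - K, N - i) for i in range(X, upper + 1)) / comb(M, N)


def combination_pvalues(sx, sy, labels):
    """P-values for the four joint genotypes (1,1), (1,0), (0,1), (0,0) of two binary SNPs."""
    M, N = len(labels), sum(labels)
    result = {}
    for a in (0, 1):
        for b in (0, 1):
            has_t = [x == a and y == b for x, y in zip(sx, sy)]
            K = sum(has_t)
            X = sum(1 for t, c in zip(has_t, labels) if t and c == 1)
            result[(a, b)] = hyper_p(M, N, K, X)
    return result


def passes_screen(p, alpha):
    """Require P11 <= alpha while P10, P01 and P00 all stay above alpha."""
    return p[(1, 1)] <= alpha and all(p[k] > alpha for k in ((1, 0), (0, 1), (0, 0)))
```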

As described in a recent comprehensive review²⁰, algorithms based on logistic/linear regression, multifactor dimensionality reduction (MDR)¹²⁸, entropy or information theory¹²⁹ have been developed to measure genetic interactions. All of these approaches quantify the synergistic effect of SNP pairs by comparing the relative strength of the association between a pair of SNPs and a disease trait with the strength of the associations between two individual SNPs and the disease trait. A few of these alternatives were tested in the context of our method and did not provide the significant results we achieved with the metric above. We designed the above hygeSSI measure because it explicitly captures the interaction between combinations of specific genotypes of two loci.

4.2 Construction of SNP-SNP interaction networks

We constructed SNP-SNP interaction networks to serve as the basis for the pathway-level BPM tests based on each of the disease model assumptions described above. An additive-additive (AA) interaction network was constructed by the logistic regression based approach described above, where SNP-SNP edge scores were derived from the -log₁₀ p-value resulting from the likelihood ratio test. The recessive-recessive (RR) and dominant-dominant (DD) interaction networks were computed based on the hygeSSI metric described above, and only positive interactions were kept in the network (i.e. where the joint effect of the SNP-SNP pair under the corresponding disease model was stronger than any marginal or alternative combination of SNPs). In addition to the above three networks, we also constructed a hybrid SNP-SNP interaction network in which interactions under recessive and dominant disease models could coexist. To do this, we integrated all four networks (RR, DD, RD and DR) into a single network (RD-combined) by taking the maximum hygeSSI among the four interaction networks for each pair of SNPs.
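Combining the four binary-model networks amounts to an element-wise maximum. This minimal sketch assumes each network is stored as a hypothetical dictionary mapping an ordered SNP pair (S_x, S_y) to its hygeSSI score. Absent pairs are treated as having no positive interaction (score 0).

```python
def combine_rd_networks(networks):
    """Element-wise maximum of hygeSSI scores across the RR, DD, RD and DR networks.

    networks: {model_name: {(snp_x, snp_y): score}}; absent pairs count as 0.
    """
    combined = {}
    for scores in networks.values():
        for pair, s in scores.items():
            combined[pair] = max(s, combined.get(pair, 0.0))
    return combined
```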

5. Measuring pathway-pathway interactions

5.1 Estimating pathway-pathway interactions based on the SNP-SNP interaction network

For each pair of pathways, we test whether the number of SNP-SNP interactions between them is significantly higher than expected given the overall density of the SNP-SNP network as well as the marginal interaction density of the two pathways involved. Enrichment analysis based on continuous (weighted) SNP-SNP interaction scores is much more computationally challenging, so we binarized the hygeSSI values at a lenient threshold to make the downstream computation efficient and scalable. After binarization, we divided the SNP-SNP interaction network into two networks based on whether the joint mutation of a SNP pair is more prevalent in the case or the control group, which we refer to as the risk and protective networks, respectively.
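The binarization and risk/protective split can be sketched as follows. This assumes that the lenient threshold is expressed as a target network density (the fraction of all possible SNP pairs retained, as in section 5.3) combined with the hygeSSI floor of 0.2 described there, and that each pair's direction (joint genotype more prevalent in cases or in controls) has already been determined. `binarize_and_split` is a hypothetical helper.

```python
def binarize_and_split(scores, directions, n_snps, density, min_score=0.2):
    """Keep the top-scoring SNP pairs so the network reaches `density`
    (fraction of all possible pairs), then split them into risk and protective sets.

    scores: {(snp_a, snp_b): hygeSSI}; directions: {(snp_a, snp_b): "risk" | "protective"}."""
    total_pairs = n_snps * (n_snps - 1) // 2
    n_keep = int(round(density * total_pairs))
    # Pairs below the score floor are never kept, even if the density is not reached.
    eligible = sorted((pair for pair, s in scores.items() if s >= min_score),
                      key=lambda pair: scores[pair], reverse=True)
    kept = set(eligible[:n_keep])
    risk = {pair for pair in kept if directions[pair] == "risk"}
    return risk, kept - risk


scores = {(0, 1): 0.9, (0, 2): 0.8, (1, 2): 0.5, (3, 4): 0.4, (2, 3): 0.1}
directions = {(0, 1): "risk", (0, 2): "protective", (1, 2): "risk",
              (3, 4): "risk", (2, 3): "risk"}
risk, protective = binarize_and_split(scores, directions, n_snps=5, density=0.3)
```

With 5 SNPs there are 10 possible pairs, so a density of 0.3 keeps the three highest-scoring pairs.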

For each pathway-pathway interaction, we first removed the SNPs shared between the two pathways. We then test whether the observed SNP-SNP interaction density between the two pathways is significantly higher than expected globally (the global network density) and locally (the marginal density of SNP-SNP interactions of the two pathways). Specifically, the marginal density of a pathway is calculated as the SNP-SNP interaction density between the SNPs mapped to the pathway and all other SNPs in the network. We computed a chi-square statistic to test for differences from both the global and the local density, which we denote χ²_global and χ²_local. The chi-square test assumes that the SNP-SNP interactions in a network are independent, which may not hold for a variety of reasons. So, in addition to these chi-square statistics, we used permutation tests to derive an empirical p-value for each pathway-pathway interaction. To do this, we randomly shuffled the SNP-pathway membership (NP = 100,000-200,000 times), and for a given pathway-pathway interaction (bpm_i), we compared its observed χ²_global and χ²_local with the values from these random permutations to obtain a permutation-based p-value (p_perm). We used p_perm together with χ²_global and χ²_local for BPM discovery, as described in detail in the next two sections.
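The sketch below illustrates the two chi-square statistics and the SNP-pathway membership permutation for a single pathway pair on a binarized network. Several details are assumptions rather than the exact BridGE implementation: each statistic is a one-sided, one-degree-of-freedom goodness-of-fit chi-square against an expected density, the local expectation is the larger of the two pathways' marginal densities, and each permutation is summarized by the smaller of the two statistics. All function names are hypothetical.

```python
import random


def count_edges(set_a, set_b, edges):
    """Number of interactions (unordered pairs in `edges`) linking set_a to set_b."""
    return sum((min(x, y), max(x, y)) in edges for x in set_a for y in set_b)


def chi2_enrichment(observed, possible, expected_density):
    """One-sided 1-df chi-square: `observed` interactions among `possible` SNP
    pairs vs. an expected density. Returns 0.0 when there is no excess."""
    exp_hit = possible * expected_density
    exp_miss = possible - exp_hit
    if exp_hit <= 0 or exp_miss <= 0 or observed <= exp_hit:
        return 0.0
    return ((observed - exp_hit) ** 2 / exp_hit
            + ((possible - observed) - exp_miss) ** 2 / exp_miss)


def marginal_density(pathway, edges, all_snps):
    """Interaction density between a pathway's SNPs and all other SNPs."""
    others = all_snps - pathway
    return count_edges(pathway, others, edges) / (len(pathway) * len(others))


def bpm_stats(path_a, path_b, edges, all_snps):
    """(chi2_global, chi2_local) for one pathway pair; shared SNPs are dropped first."""
    a, b = path_a - path_b, path_b - path_a
    observed, possible = count_edges(a, b, edges), len(a) * len(b)
    n = len(all_snps)
    global_density = len(edges) / (n * (n - 1) / 2)
    # Assumption: the local expectation is the larger of the two marginal densities.
    local_density = max(marginal_density(a, edges, all_snps),
                        marginal_density(b, edges, all_snps))
    return (chi2_enrichment(observed, possible, global_density),
            chi2_enrichment(observed, possible, local_density))


def perm_pvalue(path_a, path_b, edges, all_snps, n_perm=1000, seed=0):
    """Empirical p-value from shuffling SNP-pathway membership. Each BPM is
    summarized by min(chi2_global, chi2_local), an assumption of this sketch."""
    rng = random.Random(seed)
    snps = sorted(all_snps)
    observed = min(bpm_stats(path_a, path_b, edges, all_snps))
    hits = 0
    for _ in range(n_perm):
        shuffled = snps[:]
        rng.shuffle(shuffled)
        relabel = dict(zip(snps, shuffled))  # random relabelling of SNP identities
        perm_a = {relabel[s] for s in path_a}
        perm_b = {relabel[s] for s in path_b}
        if min(bpm_stats(perm_a, perm_b, edges, all_snps)) >= observed:
            hits += 1
    return (hits + 1) / (n_perm + 1)


all_snps = set(range(30))
edges = {(x, y) for x in range(3) for y in range(3, 6)}  # all 9 interactions cross the pair
chi_g, chi_l = bpm_stats({0, 1, 2}, {3, 4, 5}, edges, all_snps)
p_perm = perm_pvalue({0, 1, 2}, {3, 4, 5}, edges, all_snps, n_perm=200)
```

Relabelling SNP identities, rather than drawing new SNP sets, keeps the pathway sizes and any SNP sharing between the two pathways unchanged across permutations.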

5.2 Correction for multiple hypothesis testing

Because a large number of pathway pairs (all possible pathway-pathway combinations) are tested in the search for significant BPMs, correction for multiple hypothesis testing is needed. To estimate a false discovery rate, we employed sample permutations (NP = 10) to derive the expected number of BPMs discovered by chance at each level of significance. We randomly shuffled the original case-control labels 10 times while maintaining the matched case-control population structure. For each permuted dataset, the same complete pipeline for BPM discovery was performed, including calculation of the SNP-SNP interaction network after permutation, which was then thresholded at a fixed interaction density matching the density chosen for the real sample labels. From these sample permutations, we obtained three null distributions (for χ²_global, χ²_local, and p_perm), from which we estimated the false discovery rate (FDR) for each BPM (e.g., bpm_i). Specifically, we compared the number of BPMs observed in each real dataset that have better overall statistics than bpm_i with the corresponding random expectation estimated from the three null distributions derived from sample permutations (χ²_global, χ²_local, and p_perm):

A simpler approach to estimating FDR would be to use only the SNP permutation-based p-value, p_perm, in the above formula. However, we chose to use all three measurements (χ²_global, χ²_local, and p_perm) because we observed that in some cases the permutation-based p-value alone did not provide enough resolution to differentiate among the top BPMs (this could be improved with additional SNP permutations, but that is computationally expensive). χ²_global and χ²_local provide higher-resolution measures of the significance of each BPM and, when combined with the permutation-based p-value, can differentiate among the most significant discoveries.
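The following Python sketch shows one plausible reading of this estimate, not necessarily the exact formula used: a BPM counts as having "better overall statistics" than bpm_i if it is at least as good on all three measures (oriented so that larger is better, e.g. using -log₁₀ p_perm), and the FDR is the mean such count across sample permutations divided by the count among real BPMs. The function names are hypothetical.

```python
def at_least_as_good(stats, ref):
    """True if every statistic in `stats` is >= the matching one in `ref`.
    Statistics are oriented so larger is better, e.g. (chi2_global, chi2_local, -log10 p_perm)."""
    return all(s >= r for s, r in zip(stats, ref))


def estimate_fdr(bpm, real_stats, permuted_stats):
    """FDR for one BPM: mean count of permuted BPMs at least as good as `bpm`
    (averaged over sample permutations) divided by the count of real BPMs at least as good.
    `bpm` is assumed to be one of `real_stats`, so the denominator is at least 1."""
    n_real = sum(at_least_as_good(s, bpm) for s in real_stats)
    n_null = sum(sum(at_least_as_good(s, bpm) for s in run)
                 for run in permuted_stats) / len(permuted_stats)
    return min(1.0, n_null / n_real)


real = [(10, 8, 4), (5, 4, 3), (2, 1, 1)]
permuted = [[(6, 5, 3), (1, 1, 1)], [(3, 2, 1)]]
print(estimate_fdr((5, 4, 3), real, permuted))  # 0.25
```

Requiring a BPM to match or exceed bpm_i on all three measures at once is what lets the chi-square statistics break ties among BPMs that share the same p_perm.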

We emphasize that we have used a hybrid permutation strategy to assess significance of the discovered structures. The primary permutation applied was to permute the SNP labels, for which 100,000-200,000 permutations were used for each dataset analyzed. The sample (case-control label) permutation approach mentioned above was used in addition to the SNP permutation strategy to estimate our false discovery rate across all discovered interactions. For each of the 10 sample permutations, we ran the full set of 100,000-200,000 SNP permutations. This hybrid approach provides a robust estimate of significance of the discovered pathway interactions and properly corrects for multiple testing.

We also conducted a study to explore the sensitivity of our FDR estimates to the number of sample permutations. Specifically, for the PD-NIA dataset, we performed 1000 sample permutations (with 200,000 SNP permutations within each) to derive an estimate of FDR for discoveries in this dataset (Supplementary Table 25). As shown in Supplementary Fig. 5, the FDRs estimated from 10 sample permutations show reasonable agreement with those estimated from 1000 sample permutations (Pearson’s correlation of 0.81).

5.3 Selection of disease models and density thresholds

The method we proposed for pathway-level detection of genetic interactions is general in the sense that any disease model (e.g. RR, DD, RD-combined, and AA) or interaction statistic could be used to discover pathway-level interactions. In this study, we focus on prioritizing a single disease model per disease cohort for full analysis by our pipeline to limit the complexity of data analysis across the 13 GWAS cohorts we explored with our method. Here, we describe the strategy we used to select the disease model to focus on for each GWAS dataset.

To prioritize the disease model and SNP-SNP interaction network density threshold for each dataset, we first performed a pilot experiment in which we examined combinations of different disease models and density thresholds, but with fewer SNP permutations (Supplementary Table 23). To exclude SNP pairs with no or weak interactions from our analysis, we required each SNP pair’s hygeSSI score to be at least 0.2 before applying density-based binarization. For each combination, we performed 10,000 SNP-pathway membership permutations (compared with 100,000-200,000 for a complete run) to estimate FDRs using a procedure similar to that described in section 5.2, except that SNP permutations were used to estimate FDR instead of sample permutations, because sample permutations are much more computationally expensive. Based on this pilot experiment in each cohort, we chose the disease model and density threshold combination that resulted in the lowest estimated FDR for the most significant pathway-pathway interaction. The rationale for such a pilot experiment is to identify the disease model that is most likely to yield significant pathway-level interactions while limiting the computational burden of applying our approach to several GWAS cohorts under multiple disease models. Based on these pilot experiments, which were performed for all 13 cohorts, we ran the complete BridGE pipeline, including 100,000-200,000 SNP permutations and 10 sample permutations, with the disease model and network density threshold chosen from the pilot experiments. The results of the pilot experiments for all cohorts are reported in Supplementary Table 23, and the full BPM discovery results for all diseases can be found in Supplementary Tables 3 and 9-20, with a summary in Supplementary Table 8. We note that for a focused application of our approach to a single cohort or a small number of cohorts of interest, we would suggest exploring all possible disease models with complete runs.

5.4 Replication in independent cohorts

The significant BPMs discovered in one cohort can be evaluated in another, independent cohort for replication. To determine whether a discovered BPM was replicated in an independent cohort, we required the BPM to satisfy the χ²_global and χ²_local criteria and p_perm ≤ 0.05 in the validation cohort. We also performed sample permutation tests (NP = 10) for each validation cohort, from which we generated null distributions for χ²_global, χ²_local, and p_perm in the validation cohort. Given a set of discovered BPMs (e.g. FDR ≤ 0.25), we calculated fold enrichment by comparing the number of BPMs discovered in the original dataset that passed the validation criteria with the average number of BPMs that passed the same criteria in the random sample permutations. More specifically, given a set of significant BPMs (bpm_1, …, bpm_k) discovered in the original cohort, the fold enrichment for replication is defined as:

We also evaluated the significance of the fold enrichment using 10,000 bootstrapped BPM sets. Specifically, we randomly selected the same number of BPMs, evaluated their fold enrichment with the above procedure, and repeated this 10,000 times to generate a null distribution of fold enrichment scores in the validation cohort. We then evaluated the significance of the fold enrichment score for our discovered BPM set against this empirical null distribution. All replication results can be found in Supplementary Tables 6 and 21.
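A minimal sketch of the fold-enrichment statistic and its empirical significance is given below. It assumes a boolean record of whether each BPM meets the validation criteria in the real validation cohort and in each sample-permuted copy of it, and it draws random BPM sets without replacement from all tested BPMs, which is our assumption about how the "bootstrapped" sets were formed. The function names are hypothetical.

```python
import random


def n_passing(bpm_ids, passes):
    """How many of the given BPMs meet the validation criteria in one cohort."""
    return sum(passes[b] for b in bpm_ids)


def fold_enrichment(bpm_ids, real_pass, perm_passes):
    """Replicated count in the real validation cohort divided by the mean
    replicated count across sample-permuted versions of that cohort."""
    observed = n_passing(bpm_ids, real_pass)
    expected = sum(n_passing(bpm_ids, run) for run in perm_passes) / len(perm_passes)
    if expected == 0:
        return float("inf") if observed > 0 else 0.0  # convention for an empty null
    return observed / expected


def fold_enrichment_pvalue(discovered, all_bpms, real_pass, perm_passes,
                           n_sets=10000, seed=0):
    """Empirical p-value: how often a random BPM set of the same size
    reaches at least the observed fold enrichment."""
    rng = random.Random(seed)
    observed = fold_enrichment(discovered, real_pass, perm_passes)
    hits = sum(
        fold_enrichment(rng.sample(all_bpms, len(discovered)), real_pass, perm_passes) >= observed
        for _ in range(n_sets))
    return observed, (hits + 1) / (n_sets + 1)


all_bpms = list(range(40))
real_pass = {b: b < 10 for b in all_bpms}
perm_passes = [{b: b % 4 == 0 for b in all_bpms}, {b: b % 4 == 1 for b in all_bpms}]
fe, p_fe = fold_enrichment_pvalue(list(range(10)), all_bpms, real_pass, perm_passes, n_sets=2000)
```

Here all 10 discovered BPMs replicate in the real cohort while an average of 3 pass in the permuted cohorts, giving a fold enrichment of 10/3.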

For the BPMs that replicated in an independent cohort, we further checked whether the SNP-SNP interactions supporting the discovered pathway-level interactions were similar between the cohort used for discovery and the independent cohort used for replication. For example, we used the BPMs discovered from PD-NIA (FDR ≤ 0.25), and for each BPM replicated in PD-NGRC, we computed the number of SNP-SNP interactions supporting the BPM that were shared between the PD-NIA and PD-NGRC interaction networks. We used the same permutation approach as described above for BPM-level validation, except that the SNP-SNP interactions supporting each BPM were compared between the discovery and validation cohorts by a hypergeometric test. This was done first for the real validation cohort PD-NGRC and then repeated 10 times under sample permutations of the validation cohort to estimate a null distribution. A Wilcoxon rank-sum test was then used to evaluate the significance of the SNP-SNP interaction overlap between the replicated BPMs in the real validation cohort and in the randomly sample-permuted validation cohorts (Fig. 4B).

5.5 BPM redundancy

Because many of the curated gene sets overlap, we needed to control for redundancy among the discovered BPMs. To do this, when reporting total discoveries, we filtered BPMs based on their relative overlap in terms of SNP-SNP interactions using an overlap coefficient. The overlap coefficient between two BPMs is defined as the number of overlapping SNP pairs divided by the number of possible SNP pairs in the smaller BPM.

For the significant BPMs discovered, we computed all pairwise overlap coefficients and treated two BPMs as redundant when their overlap coefficient exceeded the maximum allowed similarity of 0.25. We reported the number of unique BPMs as the number of connected components formed by linking redundant BPMs. For visualization (Fig. 3), we selected a representative BPM from each connected component, prioritizing BPMs that validated in the independent cohort (PD-NGRC). Significance of the validation of the set of BPMs was evaluated on the entire set of discovered BPMs using the permutation procedures described above, which directly accounts for the redundancy among the discovered BPMs.
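The redundancy filter can be sketched as follows, assuming each BPM is represented by the two sets of SNP identifiers mapped to its pathways and that BPMs whose overlap coefficient exceeds 0.25 are linked. Connected components are found with a simple union-find; the helper names are hypothetical.

```python
def snp_pairs(bpm):
    """All unordered SNP pairs crossing the two pathways of a BPM."""
    a, b = bpm
    return {(min(x, y), max(x, y)) for x in a for y in b if x != y}


def overlap_coefficient(bpm1, bpm2):
    """Shared SNP pairs divided by the number of SNP pairs in the smaller BPM."""
    p1, p2 = snp_pairs(bpm1), snp_pairs(bpm2)
    return len(p1 & p2) / min(len(p1), len(p2))


def count_unique_bpms(bpms, max_similarity=0.25):
    """Number of connected components after linking BPMs whose overlap exceeds the cutoff."""
    parent = list(range(len(bpms)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving keeps the trees shallow
            i = parent[i]
        return i

    for i in range(len(bpms)):
        for j in range(i + 1, len(bpms)):
            if overlap_coefficient(bpms[i], bpms[j]) > max_similarity:
                parent[find(i)] = find(j)
    return len({find(i) for i in range(len(bpms))})


bpms = [({0, 1, 2, 3}, {10, 11, 12, 13}),
        ({0, 1, 2, 3}, {10, 11, 12, 14}),
        ({20, 21}, {30, 31})]
print(count_unique_bpms(bpms))  # 2
```

The first two BPMs share 12 of their 16 SNP pairs (overlap 0.75) and collapse into one component, while the third is disjoint.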

6. Measuring within-pathway interactions

In addition to the between-pathway model (BPM), we also tested for enrichment of genetic interactions within each pathway³⁴ (within-pathway models, WPMs). All of the measures and procedures described above for BPMs apply directly to the WPM case, except that we look specifically at SNP pairs connecting genes within the same pathway or gene set instead of between pathway pairs. For WPMs, the false discovery rate and validation statistics were computed separately from those for BPMs. All WPM discovery results can be found in Supplementary Tables 3 and 9-20.

7. Identifying pathway hubs in the SNP-SNP interaction network

Since both the between-pathway model and within-pathway model analyses were designed to avoid discoveries driven by the higher marginal interaction density of individual pathways, pathways that interact frequently with many loci across the genome (as opposed to localized interactions with functionally coherent gene sets) are less likely to appear among our pathway-pathway or within-pathway interactions. However, such pathways may also be disease relevant because they modify the disease risk associated with a large number of other variants, so BridGE also reports pathways exhibiting these characteristics (we refer to these as “PATH” discoveries in BridGE output files). For PATH discovery, the procedure is similar to that for BPMs and WPMs, with a minor modification to the scoring of each pathway. Specifically, each pathway is represented by a vector of the degrees of its associated SNPs in the SNP-SNP interaction network. We then applied a one-tailed rank-sum test comparing each pathway-associated degree vector with the non-pathway-associated degree vector to see whether the pathway-associated SNPs exhibited significantly more interactions than the other SNPs. PATH discovery and validation are then done by repeating the same steps as BPM/WPM discovery but replacing the χ²_global and χ²_local statistics with the rank-sum test p-value (on a -log₁₀ scale). All PATH discovery results can also be found in Supplementary Tables 3 and 9-20. Many of these also have clear relevance to the disease cohort in which they were discovered. For example, applying BridGE to discover such hub pathways in the context of Parkinson’s disease resulted in 3 significant pathways after removing redundancy (FDR ≤ 0.25), including the same Golgi-associated vesicle biogenesis gene set as well as the IL-12 and STAT4 signaling pathway (Biocarta) discussed in the main text.
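The PATH score can be sketched with a one-sided Wilcoxon rank-sum (Mann-Whitney) test written in plain Python. This version uses the normal approximation with midranks for ties and omits the tie correction to the variance, so its p-values may differ slightly from an exact or tie-corrected implementation; all function names are hypothetical.

```python
import math


def snp_degrees(edges, all_snps):
    """Degree of every SNP in a binarized SNP-SNP interaction network."""
    degree = {s: 0 for s in all_snps}
    for x, y in edges:
        degree[x] += 1
        degree[y] += 1
    return degree


def rank_sum_greater(x, y):
    """One-sided Wilcoxon rank-sum p-value that values in x tend to exceed y
    (normal approximation; tied values get midranks, no variance tie correction)."""
    pooled = sorted(list(x) + list(y))
    midrank, i = {}, 0
    while i < len(pooled):
        j = i
        while j < len(pooled) and pooled[j] == pooled[i]:
            j += 1
        midrank[pooled[i]] = (i + 1 + j) / 2  # average of ranks i+1 .. j
        i = j
    n1, n2 = len(x), len(y)
    u = sum(midrank[v] for v in x) - n1 * (n1 + 1) / 2
    z = (u - n1 * n2 / 2) / math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    return 0.5 * math.erfc(z / math.sqrt(2))  # upper-tail normal probability


def path_score(pathway, edges, all_snps):
    """-log10 p comparing degrees of pathway SNPs with those of all non-pathway SNPs."""
    degree = snp_degrees(edges, all_snps)
    inside = [degree[s] for s in pathway]
    outside = [degree[s] for s in all_snps - pathway]
    return -math.log10(rank_sum_greater(inside, outside))


# SNPs 0 and 1 act as hubs, each interacting with every other SNP.
all_snps = set(range(20))
hub_edges = {(0, k) for k in range(2, 20)} | {(1, k) for k in range(2, 20)}
score = path_score({0, 1}, hub_edges, all_snps)
```

For production use, a library implementation with tie correction (for example SciPy's Mann-Whitney routine) would be preferable to this hand-rolled version.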

8. Comparison of hygeSSI interactions with logistic regression-based interactions

We examined whether the interactions captured by hygeSSI were non-additive as measured by a standard logistic regression-based interaction measure. We applied the logistic regression model to the PD-NIA data and computed RR, DD, RD, and DR interaction networks (binary encoding as described earlier). We also integrated these four logistic regression-based networks to form an RD-combined network. We then checked (1) whether the top SNP-SNP interactions based on hygeSSI were significant (p ≤ 0.05) in logistic regression-based tests, and (2) whether the significant BPMs discovered from a hygeSSI interaction network remain significant (meeting the χ²_global and χ²_local criteria and p_perm ≤ 0.05) based on SNP-SNP interactions estimated from logistic regression. This analysis revealed that among the top 1% of hygeSSI interactions, 93% are significant based on a logistic regression-based test for interaction, and all of the significant BPMs (FDR ≤ 0.05) remain significant when only SNP-SNP interactions also supported by a logistic regression model are considered. These data suggest that the SNP-SNP interactions captured by hygeSSI represent non-additive interactions as defined by a logistic regression model. Detailed results from this comparison can be found in Supplementary Table 24. Further evaluation of different disease models and different measures for estimating SNP-SNP interactions in the context of BridGE will be the focus of future work.

9. Evaluation of significance of individual SNP-SNP interaction tests

For SNP-SNP pairs that supported the between-pathway interaction reported in Fig. 2B, we checked the statistical significance of SNP-SNP interaction pairs tested individually. We measured all pairwise additive-additive (AA), recessive-recessive (RR), dominant-dominant (DD) interactions. We then performed a permutation test in which sample labels were permuted 10 times and for each permutation, all pairwise AA, RR, DD interactions were computed for each SNP pair. These permutations were used to estimate a false discovery rate (FDR) for those SNP-SNP pairs supporting the reported BPM. No individual SNP-SNP pairs were significant after FDR-based multiple hypothesis correction (Fig. 2D, Supplementary Fig. 1).

10. Pathway enrichment analysis of single locus effects

To check whether the pathways involved in the significant BPMs discovered in PD-NIA were enriched for SNPs with moderate univariate association with Parkinson’s disease, we performed single-pathway enrichment analysis for the same set of 685 pathways used for BPM discovery. In the single-pathway enrichment analysis, we used a hypergeometric test as the SNP-level statistic for measuring univariate association (risk and protective associations were evaluated separately) under three different disease models: 1) recessive, 2) dominant, and 3) a combination of recessive and dominant, in which each SNP was tested under both the recessive and dominant disease models and the more significant result was assigned to that SNP. We then used Wilcoxon’s rank-sum test to check whether a pathway was enriched for SNPs with stronger association than the background (all SNPs). With 10,000 sample permutations, we computed the FDR for each individual pathway (for both risk and protective associations) using the same procedure described in section 5.2. The results are summarized in Supplementary Table 5.

11. Comparison of pathways discovered by BridGE with previously reported disease risk loci from the GWAS catalog

To check whether previously reported singly-associated SNPs also appear in our discovered pathway-level interactions, we compared our BridGE-discovered pathways with pathways that could be linked to disease risk loci reported in the NHGRI-EBI GWAS catalog¹³⁰ (Ensembl release version 87, retrieved on Feb 6, 2017). Based on the GWAS catalog, the numbers of genes linked to known risk loci (p ≤ 2.0 × 10^-5) in each disease are: 143 (144 SNPs, Parkinson’s disease), 1009 (824 SNPs, schizophrenia), 134 (172 SNPs, breast cancer), 71 (57 SNPs, hypertension), 249 (234 SNPs, prostate cancer), and 294 (288 SNPs, type II diabetes). For each disease, we summarized all pathways discovered by BridGE (FDR ≤ 0.25) and identified the pathways implicated by individually associated SNPs reported in the GWAS catalog (a SNP mapping to a single gene in a given pathway was assumed to implicate the corresponding pathway). For context, for each disease we also summarize the total number of genes implicated by GWAS-identified SNPs, how many of these map to the 833 pathways used in our study, and how many of them can be linked to the significant pathways identified by BridGE. These results are presented in Supplementary Table 22.

12. Dependence of interaction discoveries on the assumed disease model

While we tested multiple disease models (additive, dominant, recessive, and combined dominant-recessive), the most significant discoveries for the majority of diseases examined were obtained with a dominant or combined model as measured by our SNP-SNP interaction metric¹³¹. The relative frequency of interactions under a dominant vs. a recessive model may be largely due to our greater power to detect interactions between SNPs with dominant effects than between SNPs with recessive effects. More specifically, individuals carrying at least one minor allele (heterozygous or homozygous) at both interacting loci would be affected under a dominant disease model, whereas only individuals homozygous for the minor allele at both loci would be affected under a recessive disease model. The number of individuals homozygous at two interacting loci can be quite small depending on the allele frequencies, which limits our power to discover such interactions. Thus, the larger number of discoveries under a dominant model assumption relative to a recessive model likely reflects a difference in statistical power and is not an indication that genetic interactions among alleles with dominant effects contribute more strongly to disease risk. We observed that interactions derived from an additive disease model provided the fewest significant discoveries when used in the context of BridGE in the pilot experiments (Supplementary Table 23). To understand this, we investigated whether the SNP-SNP interactions supporting the BPMs discovered under the combined dominant-recessive model for the PD-NIA cohort were non-additive when evaluated using a logistic regression-based interaction test, as opposed to the direct association tests used for our dominant and recessive disease models¹³¹.
Most SNP-SNP interactions supporting the PD-NIA discoveries were indeed non-additive when assessed using the logistic regression framework, but they were not necessarily among the highest-ranked SNP-SNP pairs in the context of a logistic regression model¹³¹ (Supplementary Table 24), which may explain the difference in results under the additive vs. recessive or dominant disease models. An important distinction of the SNP-level interaction metric we use is that we specifically identify the small subset of individuals with the appropriate combination of genotypes (dominant model: carrying the minor allele at both candidate loci; recessive model: homozygous for the minor allele at both candidate loci) and directly test for association with the disease phenotype, whereas for the additive model, an interaction term must explain a sufficient fraction of the variance across the entire population to reach significance. This distinction may help explain why we are able to discover pathway-level genetic interactions with the metric proposed here but rarely with a standard additive model. It is worth noting that the core of the BridGE approach, discovering genetic interactions in aggregate rather than in isolation, is readily adaptable to other disease models or other statistical measures of interaction. Further exploration of different disease models as well as different statistical measures of interaction^123,132 would be worthwhile.
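The power argument above can be illustrated with simple arithmetic. Assuming Hardy-Weinberg equilibrium and independent loci (assumptions of this sketch), a carrier of at least one minor allele at a locus with minor allele frequency p occurs with frequency 1 - (1 - p)², whereas a minor-allele homozygote occurs with frequency p². With p = 0.2 at both loci and 2,000 individuals, this gives about 259 expected double carriers under dominant coding versus about 3 under recessive coding:

```python
def carrier_frequency(maf, model):
    """Frequency of the risk genotype at one locus under Hardy-Weinberg equilibrium."""
    if model == "dominant":
        return 1 - (1 - maf) ** 2  # at least one minor allele
    if model == "recessive":
        return maf ** 2  # two minor alleles
    raise ValueError(f"unknown model: {model}")


def expected_double_carriers(n, maf_x, maf_y, model):
    """Expected number of individuals with the risk genotype at both loci (loci assumed independent)."""
    return n * carrier_frequency(maf_x, model) * carrier_frequency(maf_y, model)


for model in ("dominant", "recessive"):
    print(model, round(expected_double_carriers(2000, 0.2, 0.2, model), 1))
# dominant 259.2
# recessive 3.2
```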

13. Power analysis based on interaction simulation study

To characterize the power of our BridGE approach with respect to sample size, effect size, minor allele frequency, and pathway size, we used a two-stage simulation approach. We first generated synthetic GWAS datasets with embedded SNP-SNP interaction pairs using GWAsimulator¹⁰⁰. Specifically, we used PD-NIA as input to GWAsimulator and embedded SNP-SNP interactions with different minor allele frequencies (e.g. 0.05, 0.1, 0.15, 0.2, and 0.25) and a range of interaction effects (e.g. d₁₁ = d₁₂ = d₂₁ = d₂₂ = 1.1, 1.5, 2, 2.5, 3, and 5, where 0, 1, and 2 refer to the number of minor alleles present in a given genotype for an individual SNP, and d₁₁, d₁₂, d₂₁, and d₂₂ are defined as the relative risks of genotypes 11, 12, 21, and 22, respectively, versus genotype 00)¹⁰⁰. We also varied the number of samples (genotypes) in the simulation (e.g. 200, 500, 1000, 2000, 5000, and 10000). In all simulations, we set the disease prevalence to 0.05 and specified a dominance effect for all disease SNPs with PR1=1 (see GWAsimulator for more details)¹⁰⁰. Under each scenario (combination of minor allele frequency, interaction effect, and sample size), we embedded 100 SNP pairs and measured the percentage of embedded SNP-SNP interactions identified by our pairwise SNP-SNP interaction measure, hygeSSI, at a 1% network density (i.e. SNP-SNP pairs whose hygeSSI is greater than or equal to the 99th percentile of all possible interactions) (Supplementary Fig. 6). These simulations provide a direct measure of the sensitivity and specificity of the SNP-SNP interaction-level measure that forms the basis of the pathway-level statistics.
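The SNP-level detection rate used in this first simulation stage can be sketched as below, assuming `all_scores` holds the hygeSSI values of all SNP pairs in a simulated dataset (including the embedded ones) and that a 1% density corresponds to keeping the top 1% of scores. The helper names are hypothetical.

```python
def density_cutoff(all_scores, density=0.01):
    """Score of the last pair kept when retaining the top `density` fraction of all pairs."""
    ranked = sorted(all_scores, reverse=True)
    n_keep = max(1, int(round(density * len(ranked))))
    return ranked[n_keep - 1]


def detection_rate(all_scores, embedded_scores, density=0.01):
    """Fraction of embedded interaction pairs scoring at or above the density cutoff."""
    cutoff = density_cutoff(all_scores, density)
    return sum(s >= cutoff for s in embedded_scores) / len(embedded_scores)


# Toy scores: 1000 pairs scored 0..999, four of which are the embedded interactions.
all_scores = list(range(1000))
print(detection_rate(all_scores, [999, 990, 500, 200]))  # 0.5
```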

The SNP-SNP level power statistics were complemented with a second set of simulations in which we directly assessed the sensitivity of BridGE in detecting BPMs with different levels of noise in the SNP-SNP level network (derived from the process described above). To characterize the statistical power of our approach as a function of pathway size, we first generated a synthetic interaction network with the same degree distribution as the PD-NIA DD network at 1% density. Then, we embedded a set of non-overlapping BPMs into this SNP-SNP interaction network while retaining the same degree distribution and density of the network. Each set had 90 BPMs, spanning 9 different sizes (number of SNPs mapped to the two pathways in each BPM: 10×10, 25×25, 50×50, 75×75, 100×100, 150×150, 200×200, 250×250, and 300×300) and 10 different background densities (0.01, 0.012, 0.014, 0.016, 0.018, 0.02, 0.025, 0.03, 0.04, and 0.05). We applied 150,000 SNP-pathway membership permutations to assess the significance of these embedded patterns. The SNP permutation-derived p-values of the simulations are reported in Supplementary Fig. 3 and provide an estimate of the BPM density required for detecting interactions between pathways of different sizes. We used the average p-value (p = 3.0×10^-5, SNP permutation) of the significant BPM discoveries across all GWAS cohorts (FDR ≤ 0.25) as the discovery significance cutoff for the simulation analysis.

We derived power estimates for each combination of parameter settings by integrating the results from the two simulation studies above. More specifically, we estimated the minimum sample size needed to discover significant BPMs at different pathway sizes under each scenario (e.g. minor allele frequency, relative disease risk). To connect the two simulation studies, we require a scaling parameter s (here, we explored s = 0.025, 0.05, and 0.1), which corresponds to the biological density of genetic interactions crossing each pair of truly interacting pathways. This represents the fraction of all possible SNP-SNP pairs crossing the pair of pathways of interest for which the combination of variants actually has a functional deleterious impact on the phenotype. This quantity is expected to be relatively small but is difficult to estimate, which is why we explored three scenarios (s = 0.025, 0.05, and 0.1). For a BPM of a given size (10×10, 25×25, 50×50, 75×75, 100×100, 150×150, 200×200, 250×250, or 300×300), we used the second simulation to identify the BPM density needed for it to reach the level of statistical significance required for a 25% FDR based on the PD-NIA cohort. We then scaled the required density by the parameter s and, based on the first set of simulation results, identified the minimum sample size required under each scenario (combination of minor allele frequency and interaction effect) to support the discovery of the corresponding BPM (results summarized in Fig. 6B).
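One way to express the link between the two simulations is sketched below. It assumes that scaling by s means the SNP-level detection rate must reach the required BPM density divided by s, and it ignores background interactions; both are assumptions for illustration, and `min_sample_size` is a hypothetical helper.

```python
def min_sample_size(required_density, s, detection_by_n):
    """Hypothetical link between the two simulations.

    required_density: BPM density needed for significance (second simulation).
    s: assumed fraction of cross-pathway SNP pairs that truly interact.
    detection_by_n: {sample_size: SNP-level detection rate} for one scenario (first simulation).
    Returns the smallest simulated sample size that suffices, or None."""
    needed_rate = required_density / s  # detection rate that would yield the required density
    if needed_rate > 1:
        return None  # unreachable even with perfect SNP-level sensitivity
    for n in sorted(detection_by_n):
        if detection_by_n[n] >= needed_rate:
            return n
    return None


# Toy detection rates for one (minor allele frequency, interaction effect) scenario.
detection = {200: 0.05, 500: 0.2, 1000: 0.5, 2000: 0.8}
print(min_sample_size(0.02, 0.05, detection))  # 1000
```

A smaller s raises the SNP-level sensitivity required for the same BPM density, which is why the minimum sample size grows as s decreases.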

Simulation results for the additional scaling parameters (s = 0.1 and s = 0.025) are included in Supplementary Fig. 2. Together, these plots provide an estimate of the power of the BridGE approach to detect pathway-pathway interactions in these different scenarios. We note that this power analysis was conducted for the dominant disease model, which accounts for the majority of the BPM interactions discovered across all cohorts. The sensitivity of our method under a recessive model assumption is expected to be lower, which is consistent with the relative rates of discovery of the two types.

PD-NIA (phs000089.v3.p2)

The genotyping of samples was provided by the National Institute of Neurological Disorders and Stroke (NINDS). The datasets used for the analyses described in this manuscript were obtained from the NINDS Database found at https://www.ncbi.nlm.nih.gov/gap.

PD-NGRC (phs000196.v3.p1)

This work utilized in part data from the NINDS DbGaP database from the CIDR:NGRC PARKINSON’S DISEASE STUDY.

SZ-GAIN (phs000021.v3.p2)

Funding support for the Genome-Wide Association of Schizophrenia Study was provided by the National Institute of Mental Health (R01 MH67257, R01 MH59588, R01 MH59571, R01 MH59565, R01 MH59587, R01 MH60870, R01 MH59566, R01 MH59586, R01 MH61675, R01 MH60879, R01 MH81800, U01 MH46276, U01 MH46289, U01 MH46318, U01 MH79469, and U01 MH79470) and the genotyping of samples was provided through the Genetic Association Information Network (GAIN). The datasets used for the analyses described in this manuscript were obtained from the database of Genotypes and Phenotypes (dbGaP) found at http://www.ncbi.nlm.nih.gov/gap through dbGaP accession number phs000021.v3.p2. Samples and associated phenotype data for the Genome-Wide Association of Schizophrenia Study were provided by the Molecular Genetics of Schizophrenia Collaboration (PI: Pablo V. Gejman, Evanston Northwestern Healthcare (ENH) and Northwestern University, Evanston, IL, USA).

BC-CGEMS-EUR (phs000147.v3.p1)

This dataset was from the Cancer Genetic Markers of Susceptibility (CGEMS) Breast Cancer Genome-wide Association Study with dbGaP accession number phs000147.v3.p1.

BC-MCS-LTN, BC-MCS-JPN (phs000517.v3.p1)

The Multiethnic Cohort and the genotyping in this study were funded by grants from the National Institute of Health (CA63464, CA54281, CA098758, CA132839 and HG005922) and the Department of Defense Breast Cancer Research Program (W81XWH-08-1-0383).

HT-eMERGE (phs000297.v1.p1)

Group Health Cooperative/University of Washington – Funding support for Alzheimer's Disease Patient Registry (ADPR) and Adult Changes in Thought (ACT) study was provided by a U01 from the National Institute on Aging (Eric B. Larson, PI, U01AG006781). A gift from the 3M Corporation was used to expand the ACT cohort. DNA aliquots sufficient for GWAS from ADPR Probable AD cases, who had been enrolled in Genetic Differences in Alzheimer's Cases and Controls (Walter Kukull, PI, R01 AG007584) and obtained under that grant, were made available to eMERGE without charge. Funding support for genotyping, which was performed at Johns Hopkins University, was provided by the NIH (U01HG004438). Genome-wide association analyses were supported through a Cooperative Agreement from the National Human Genome Research Institute, U01HG004610 (Eric B. Larson, PI).

Mayo Clinic – Samples and associated genotype and phenotype data used in this study were provided by the Mayo Clinic. Funding support for the Mayo Clinic was provided through a cooperative agreement with the National Human Genome Research Institute (NHGRI), Grant #: U01HG004599; and by grant HL75794 from the National Heart Lung and Blood Institute (NHLBI). Funding support for genotyping, which was performed at The Broad Institute, was provided by the NIH (U01HG004424).

Marshfield Clinic Research Foundation – Funding support for the Personalized Medicine Research Project (PMRP) was provided through a cooperative agreement (U01HG004608) with the National Human Genome Research Institute (NHGRI), with additional funding from the National Institute for General Medical Sciences (NIGMS). The samples used for PMRP analyses were obtained with funding from Marshfield Clinic, Health Resources Service Administration Office of Rural Health Policy grant number D1A RH00025, and Wisconsin Department of Commerce Technology Development Fund contract number TDF FYO10718. Funding support for genotyping, which was performed at Johns Hopkins University, was provided by the NIH (U01HG004438).

Northwestern University – Samples and data used in this study were provided by the NUgene Project (www.nugene.org). Funding support for the NUgene Project was provided by the Northwestern University’s Center for Genetic Medicine, Northwestern University, and Northwestern Memorial Hospital. Assistance with phenotype harmonization was provided by the eMERGE Coordinating Center (Grant number U01HG04603). This study was funded through the NIH, NHGRI eMERGE Network (U01HG004609). Funding support for genotyping, which was performed at The Broad Institute, was provided by the NIH (U01HG004424).

Vanderbilt University - Funding support for the Vanderbilt Genome-Electronic Records (VGER) project was provided through a cooperative agreement (U01HG004603) with the National Human Genome Research Institute (NHGRI) with additional funding from the National Institute of General Medical Sciences (NIGMS). The dataset and samples used for the VGER analyses were obtained from Vanderbilt University Medical Center's BioVU, which is supported by institutional funding and by the Vanderbilt CTSA grant UL1RR024975 from NCRR/NIH. Funding support for genotyping, which was performed at The Broad Institute, was provided by the NIH (U01HG004424).

Assistance with phenotype harmonization and genotype data cleaning was provided by the eMERGE Administrative Coordinating Center (U01HG004603) and the National Center for Biotechnology Information (NCBI). The datasets used for the analyses described in this manuscript were obtained from dbGaP at http://www.ncbi.nlm.nih.gov/gap through dbGaP accession number phs000297.v1.p1.

ProC-CGEMS (phs000207.v1.p1)

These data were obtained from the Cancer Genetic Markers of Susceptibility (CGEMS) Prostate Cancer Genome-Wide Association Study.

ProC-BPC3 (phs000812.v1.p1)

The Breast and Prostate Cancer Cohort Consortium (BPC3) genome-wide association studies of advanced prostate cancer and estrogen-receptor negative breast cancer were supported by the National Cancer Institute under cooperative agreements U01-CA98233, U01-CA98710, U01-CA98216, and U01-CA98758 and the Intramural Research Program of the National Cancer Institute, Division of Cancer Epidemiology and Genetics.

PanC-PanScan (phs000206.v5.p3)

This project was funded in whole or in part with federal funds from the National Cancer Institute (NCI), US National Institutes of Health (NIH) under contract number HHSN261200800001E. Additional support was received from NIH/NCI K07 CA140790, the American Society of Clinical Oncology Conquer Cancer Foundation, the Howard Hughes Medical Institute, the Lustgarten Foundation, the Robert T. and Judith B. Hale Fund for Pancreatic Cancer Research and Promises for Purple. A full list of acknowledgments for each participating study is provided in the Supplementary Note of the manuscript with PubMed ID: 25086665.

Conflict of Interest

The authors declare that they have no conflict of interest.

List of Supplementary Tables

Supplementary Table 1. Information about the 13 genome-wide association studies (GWAS) data sets used in this study.

Supplementary Table 2. List of 833 gene sets from KEGG, BioCarta and Reactome.

Supplementary Table 3. BridGE results from PD-NIA cohort based on recessive/dominant combined disease model.

BridGE results are reported for the PD-NIA cohort, with the following tabs (in order): summary of discoveries, between-pathway model (BPM) interactions, within-pathway model (WPM) interactions, and hub pathways (PATH; pathways exhibiting an elevated density of SNP-SNP interactions across the genome). Decreased risk (protective) and increased risk (risk) interactions are listed separately. These results were derived using the combined recessive-dominant disease model.

Supplementary Table 4. List of BPMs and WPMs after filtering for redundancy for the PD-NIA cohort.

This file contains a list of BPMs obtained from the PD-NIA cohort after controlling for redundancy based on a maximum overlap coefficient of 0.25. These correspond to the set visualized in Fig. 3A of the manuscript.

Supplementary Table 5. Pathway enrichment analysis for single locus effects for PD-NIA.

Pathway enrichment analysis on single locus effects was computed for several different disease models and subsets of SNPs. The file contains the following tabs: (A) combined disease model, LD-controlled SNP set, (B) dominant disease model, LD-controlled SNP set, (C) recessive disease model, LD-controlled SNP set, (D) combined disease model, genome-wide SNP set, (E) dominant disease model, genome-wide SNP set, (F) recessive disease model, genome-wide SNP set.

Supplementary Table 6. Replication statistics and lists of replicated BPMs for BridGE discoveries from PD-NIA.

BPMs discovered from the PD-NIA cohort were tested for replication in the independent PD-NGRC cohort. Tab (A) contains a summary of replication statistics and tab (B) contains a list of replicated BPMs.

Supplementary Table 7. Summary of between- and within-pathway interactions discovered across six diseases.

This file contains a list of BPMs and WPMs (top 10) discovered across six diseases. These correspond to the set visualized in Fig. 5 of the manuscript.

Supplementary Table 8. Summary of interactions discovered across 13 GWAS cohorts.

The numbers of between-pathway model (BPM) interactions, within-pathway model (WPM) interactions, and hub pathways (PATH; pathways exhibiting an elevated density of SNP-SNP interactions across the genome) discovered in each of the 13 GWAS cohorts are reported at a range of FDR cutoffs.

Supplementary Table 9. BridGE results from PD-NGRC cohort based on dominant disease model.

BridGE results are reported for the PD-NGRC cohort, with the following tabs (in order): summary of discoveries, between-pathway model (BPM) interactions, within-pathway model (WPM) interactions, and hub pathways (PATH; pathways exhibiting an elevated density of SNP-SNP interactions across the genome). Decreased risk (protective) and increased risk (risk) interactions are listed separately. These results were derived using the dominant disease model.

Supplementary Table 10. BridGE results from SZ-GAIN cohort based on combined disease model.

BridGE results are reported for the SZ-GAIN cohort, with the following tabs (in order): summary of discoveries, between-pathway model (BPM) interactions, within-pathway model (WPM) interactions, and hub pathways (PATH; pathways exhibiting an elevated density of SNP-SNP interactions across the genome). Decreased risk (protective) and increased risk (risk) interactions are listed separately. These results were derived using the combined recessive-dominant disease model.

Supplementary Table 11. BridGE results from SZ-CATIE cohort based on recessive disease model.

BridGE results are reported for the SZ-CATIE cohort, with the following tabs (in order): summary of discoveries, between-pathway model (BPM) interactions, within-pathway model (WPM) interactions, and hub pathways (PATH; pathways exhibiting an elevated density of SNP-SNP interactions across the genome). Decreased risk (protective) and increased risk (risk) interactions are listed separately. These results were derived using the recessive disease model.

Supplementary Table 12. BridGE results from BC-CGEMS-EUR cohort based on recessive disease model.

BridGE results are reported for the BC-CGEMS-EUR cohort, with the following tabs (in order): summary of discoveries, between-pathway model (BPM) interactions, within-pathway model (WPM) interactions, and hub pathways (PATH; pathways exhibiting an elevated density of SNP-SNP interactions across the genome). Decreased risk (protective) and increased risk (risk) interactions are listed separately. These results were derived using the recessive model.

Supplementary Table 13. BridGE results from BC-MCS-JPN cohort based on dominant disease model.

BridGE results are reported for the BC-MCS-JPN cohort, with the following tabs (in order): summary of discoveries, between-pathway model (BPM) interactions, within-pathway model (WPM) interactions, and hub pathways (PATH; pathways exhibiting an elevated density of SNP-SNP interactions across the genome). Decreased risk (protective) and increased risk (risk) interactions are listed separately. These results were derived using the dominant model.

Supplementary Table 14. BridGE results from BC-MCS-LTN cohort based on dominant disease model.

BridGE results are reported for the BC-MCS-LTN cohort, with the following tabs (in order): summary of discoveries, between-pathway model (BPM) interactions, within-pathway model (WPM) interactions, and hub pathways (PATH; pathways exhibiting an elevated density of SNP-SNP interactions across the genome). Decreased risk (protective) and increased risk (risk) interactions are listed separately. These results were derived using the dominant model.

Supplementary Table 15. BridGE results from HT-eMERGE cohort based on dominant disease model.

BridGE results are reported for the HT-eMERGE cohort, with the following tabs (in order): summary of discoveries, between-pathway model (BPM) interactions, within-pathway model (WPM) interactions, and hub pathways (PATH; pathways exhibiting an elevated density of SNP-SNP interactions across the genome). Decreased risk (protective) and increased risk (risk) interactions are listed separately. These results were derived using the dominant model.

Supplementary Table 16. BridGE results from HT-WTCCC cohort based on combined disease model.

BridGE results are reported for the HT-WTCCC cohort, with the following tabs (in order): summary of discoveries, between-pathway model (BPM) interactions, within-pathway model (WPM) interactions, and hub pathways (PATH; pathways exhibiting an elevated density of SNP-SNP interactions across the genome). Decreased risk (protective) and increased risk (risk) interactions are listed separately. These results were derived using the recessive-dominant combined model.

Supplementary Table 17. BridGE results from ProC-CGEMS cohort based on dominant disease model.

BridGE results are reported for the ProC-CGEMS cohort, with the following tabs (in order): summary of discoveries, between-pathway model (BPM) interactions, within-pathway model (WPM) interactions, and hub pathways (PATH; pathways exhibiting an elevated density of SNP-SNP interactions across the genome). Decreased risk (protective) and increased risk (risk) interactions are listed separately. These results were derived using the dominant model.

Supplementary Table 18. BridGE results from ProC-BPC3 cohort based on dominant disease model.

BridGE results are reported for the ProC-BPC3 cohort, with the following tabs (in order): summary of discoveries, between-pathway model (BPM) interactions, within-pathway model (WPM) interactions, and hub pathways (PATH; pathways exhibiting an elevated density of SNP-SNP interactions across the genome). Decreased risk (protective) and increased risk (risk) interactions are listed separately. These results were derived using the dominant model.

Supplementary Table 19. BridGE results from PanC-PanScan cohort based on dominant disease model.

BridGE results are reported for the PanC-PanScan cohort, with the following tabs (in order): summary of discoveries, between-pathway model (BPM) interactions, within-pathway model (WPM) interactions, and hub pathways (PATH; pathways exhibiting an elevated density of SNP-SNP interactions across the genome). Decreased risk (protective) and increased risk (risk) interactions are listed separately. These results were derived using the dominant model.

Supplementary Table 20. BridGE results from T2D-WTCCC cohort based on combined disease model.

BridGE results are reported for the T2D-WTCCC cohort, with the following tabs (in order): summary of discoveries, between-pathway model (BPM) interactions, within-pathway model (WPM) interactions, and hub pathways (PATH; pathways exhibiting an elevated density of SNP-SNP interactions across the genome). Decreased risk (protective) and increased risk (risk) interactions are listed separately. These results were derived using the recessive-dominant combined model.

Supplementary Table 21. Replication statistics and lists of replicated BPMs, WPMs or PATHs for BridGE discoveries from prostate cancer, breast cancer and schizophrenia.

BPMs, WPMs and PATHs discovered from each disease cohort were tested for replication in the corresponding independent cohort, for each of the three diseases. Both a summary of replication statistics and a list of replicated BPMs, WPMs or PATHs are reported, with one disease cohort per tab.

Supplementary Table 22. Comparison between BridGE pathways and SNPs reported in the GWAS catalog.

Summary of the comparison (A) and list of pathways identified by BridGE with FDR < 0.25 and their association with GWAS SNPs for the six diseases studied: (B) Parkinson’s disease, (C) schizophrenia, (D) breast cancer, (E) hypertension, (F) prostate cancer and (G) type 2 diabetes.

Supplementary Table 23. Results of pilot experiments for 13 GWAS cohorts.

As described in the Methods, all 13 cohorts on which BridGE was applied were first explored in pilot runs using a smaller number of SNP permutations. Based on these initial FDR estimates, the disease model and density combination with the strongest statistical significance was then run in full. Pilot results from all 13 cohorts are included in this file, one per tab.

Supplementary Table 24. Summary of evaluation of hygeSSI SNP-SNP interactions by a logistic regression-based interaction test.

Supplementary Table 25. BridGE results from PD-NIA cohort based on recessive/dominant combined disease model using 1000 sample permutations.

Acknowledgments

We thank Dr. Frank Albert and Dr. Jing Hou for constructive comments on the manuscript. This work was partially supported by NSF grants DBI 0953881 (CLM) and IIS 0916439 (VK), NIH grants R01HG005084 (CLM), R01HG005853 (CLM, CB), R01MH097276 (GF, EES) and R01GM114472 (GF), a University of Minnesota Rochester Biomedical Informatics and Computational Biology Program Traineeship Award (GF) and a Walter Barnes Lang Fellowship (GF). CLM and CB are supported by the CIFAR Genetic Networks program. The content is solely the responsibility of the authors and does not necessarily represent the official views of the funders. Computing resources and data storage services were partially provided by the Minnesota Supercomputing Institute and the UMN Office of Information Technology, respectively.

The genome-wide association datasets (PD-NIA, PD-NGRC, SZ-GAIN, BC-CGEMS-EUR, BC-MCS-JPN, BC-MCS-LTN, HT-eMERGE, ProC-CGEMS, ProC-BPC3 and PanC-PanScan) used in this study were obtained from https://www.ncbi.nlm.nih.gov/gap through dbGaP accession numbers: phs000089.v3.p2, phs000196.v3.p1, phs000021.v3.p2, phs000147.v3.p1, phs000517.v3.p1, phs000297.v1.p1, phs000207.v1.p1, phs000812.v1.p1, and phs000206.v5.p3. We acknowledge the Contributing Investigators who submitted data from their original study to dbGaP, the primary funding organization that supported the Contributing Investigators, and the NIH data repository.

The genome-wide association datasets (SZ-GAIN, HT-WTCCC, T2D-WTCCC) used in this study were provided by the Wellcome Trust Case Control Consortium through dataset accession numbers EGAD00000000006, EGAD00000000009, EGAD00000000001 and EGAD00000000002. These were funded by the Wellcome Trust under award 076113, and a full list of the investigators who contributed to the generation of the data is available from www.wtccc.org.uk.

References

1. Hirschhorn, J.N. & Daly, M.J. Genome-wide association studies for common diseases and complex traits. Nature Reviews Genetics 6, 95–108 (2005).
2. Burton, P.R. et al. Genome-wide association study of 14,000 cases of seven common diseases and 3,000 shared controls. Nature 447, 661–678 (2007).
3. Pharoah, P.D. et al. GWAS meta-analysis and replication identifies three new susceptibility loci for ovarian cancer. Nature Genetics 45, 362–370 (2013).
4. Lambert, J.-C. et al. Meta-analysis of 74,046 individuals identifies 11 new susceptibility loci for Alzheimer's disease. Nature Genetics (2013).
5. Simón-Sánchez, J. et al. Genome-wide association study reveals genetic risk underlying Parkinson's disease. Nature Genetics 41, 1308–12 (2009).
6. Wang, K., Li, M. & Bucan, M. Pathway-based approaches for analysis of genomewide association studies. The American Journal of Human Genetics 81, 1278–1283 (2007).
7. Wang, K., Li, M. & Hakonarson, H. Analysing biological pathways in genome-wide association studies. Nature Reviews Genetics 11, 843–854 (2010).
8. Lappalainen, T. et al. Transcriptome and genome sequencing uncovers functional variation in humans. Nature (2013).
9. Baranzini, S.E. et al. Network-based multiple sclerosis pathway analysis with GWAS data from 15,000 cases and 30,000 controls. American Journal of Human Genetics 92, 854–865 (2013).
10. Califano, A., Butte, A.J., Friend, S., Ideker, T. & Schadt, E. Leveraging models of cell regulation and GWAS data in integrative network-based association studies. Nature Genetics 44, 841–847 (2012).
11. Manolio, T.A. et al. Finding the missing heritability of complex diseases. Nature 461, 747–753 (2009).
12. Maher, B. Personal genomes: The case of the missing heritability. Nature 456, 18 (2008).
13. Eichler, E.E. et al. Missing heritability and strategies for finding the underlying causes of complex disease. Nature Reviews Genetics 11, 446–450 (2010).
14. Bloom, J.S., Ehrenreich, I.M., Loo, W.T., Lite, T.-L.V. & Kruglyak, L. Finding the sources of missing heritability in a yeast cross. Nature 494, 234–237 (2013).
15. Zuk, O. et al. Searching for missing heritability: Designing rare variant association studies. Proceedings of the National Academy of Sciences 111, E455–E464 (2014).
16. Zuk, O., Hechter, E., Sunyaev, S.R. & Lander, E.S. The mystery of missing heritability: Genetic interactions create phantom heritability. Proceedings of the National Academy of Sciences 109, 1193–1198 (2012).
17. Stahl, E.A. et al. Bayesian inference analyses of the polygenic architecture of rheumatoid arthritis. Nature Genetics (2012).
18. Brown, A.A. et al. Genetic interactions affecting human gene expression identified by variance association mapping. eLife (2014).
19. Hemani, G. et al. Detection and replication of epistasis influencing transcription in humans. Nature (2014).
20. Cordell, H.J. Detecting gene-gene interactions that underlie human diseases. Nature Reviews Genetics 10, 392–404 (2009).
21. Cordell, H.J. Epistasis: what it means, what it doesn't mean, and statistical methods to detect it in humans. Human Molecular Genetics 11, 2463 (2002).
22. Greene, C.S., Penrod, N.M., Williams, S.M. & Moore, J.H. Failure to Replicate a Genetic Association May Provide Important Clues About Genetic Architecture. PLoS One 4 (2009).
23. de Cid, R. et al. Deletion of the late cornified envelope LCE3B and LCE3C genes as a susceptibility factor for psoriasis. Nature Genetics 41, 211–215 (2009).
24. Martin, M.P. et al. Epistatic interaction between KIR3DS1 and HLA-B delays the progression to AIDS. Nature Genetics 31, 429–434 (2002).
25. Mamtani, M., Anaya, J., He, W. & Ahuja, S. Association of copy number variation in the FCGR3B gene with risk of autoimmune diseases. Genes and Immunity 11, 155–160 (2009).
26. Prabhu, S. & Pe'er, I. Ultrafast genome-wide scan for SNP-SNP interactions in common complex disease. Genome Research (2012).
27. Wan, X. et al. BOOST: A fast approach to detecting gene-gene interactions in genome-wide case-control studies. American Journal of Human Genetics 87, 325 (2010).
28. Howey, R. CASSI. http://www.staff.ncl.ac.uk/richard.howey/cassi/.
29. Costanzo, M. et al. The genetic landscape of a cell. Science 327, 425 (2010).
30. Tong, A.H.Y. et al. Global mapping of the yeast genetic interaction network. Science 303, 808 (2004).
31. Bellay, J. et al. Putting genetic interactions in context through a global modular decomposition. Genome Research 21, 1375–1387 (2011).
32. Horn, T. et al. Mapping of signaling networks through synthetic genetic interaction analysis by RNAi. Nature Methods 8, 341–346 (2011).
33. Lehner, B., Crombie, C., Tischler, J., Fortunato, A. & Fraser, A.G. Systematic mapping of genetic interactions in Caenorhabditis elegans identifies common modifiers of diverse signaling pathways. Nature Genetics 38, 896–903 (2006).
34. Kelley, R. & Ideker, T. Systematic interpretation of genetic interactions using protein networks. Nature Biotechnology 23, 561–566 (2005).
35. Costanzo, M. et al. A global genetic interaction network maps a wiring diagram of cellular function. Science 353 (2016).
36. Hannum, G. et al. Genome-wide association data reveal a global map of genetic interactions among protein complexes. PLoS Genet 5, e1000782 (2009).
37. Pandey, A. et al. Epistasis network centrality analysis yields pathway replication across two GWAS cohorts for bipolar disorder. Transl Psychiatry 2, e154 (2012).
38. Kim, N.C. et al. Gene ontology analysis of pairwise genetic associations in two genome-wide studies of sporadic ALS. BioData Min 5, 9 (2012).
39. Mitra, I. et al. Reverse Pathway Genetic Approach Identifies Epistasis in Autism Spectrum Disorders. PLoS Genet 13, e1006516 (2017).
40. Brossard, M. et al. Integrated pathway and epistasis analysis reveals interactive effect of genetic variants at TERF1 and AFAP1L2 loci on melanoma risk. Int J Cancer 137, 1901–9 (2015).
41. Wu, M.C. et al. Rare-variant association testing for sequencing data with the sequence kernel association test. The American Journal of Human Genetics 89, 82–93 (2011).
42. Zhang, F., Boerwinkle, E. & Xiong, M. Epistasis analysis for quantitative traits by functional regression model. Genome Research (2014).
43. Paré, G., Cook, N.R., Ridker, P.M. & Chasman, D.I. On the use of variance per genotype as a tool to identify quantitative trait interaction effects: a report from the Women's Genome Health Study. PLoS Genet 6, e1000981 (2010).
44. Greene, C.S., Penrod, N.M., Kiralis, J. & Moore, J.H. Spatially uniform ReliefF (SURF) for computationally-efficient filtering of gene-gene interactions. BioData Min 2, 5 (2009).
45. Ma, L., Clark, A.G. & Keinan, A. Gene-based testing of interactions in association studies of quantitative traits. PLoS Genet 9, e1003321 (2013).
46. Bush, W.S., Dudek, S.M. & Ritchie, M.D. Biofilter: a knowledge-integration system for the multi-locus analysis of genome-wide association studies. Pac Symp Biocomput, 368–79 (2009).
47. Cardon, L.R. & Palmer, L.J. Population stratification and spurious allelic association. The Lancet 361, 598–604 (2003).
48. Cantor, R.M., Lange, K. & Sinsheimer, J.S. Prioritizing GWAS results: A review of statistical methods and recommendations for their application. The American Journal of Human Genetics 86, 6–22 (2010).
49. Ogata, H. et al. KEGG: Kyoto encyclopedia of genes and genomes. Nucleic Acids Research 27, 29 (1999).
50. Nishimura, D. BioCarta. Biotech Software & Internet Report 2, 117–120 (2001).
51. Joshi-Tope, G. et al. Reactome: a knowledgebase of biological pathways. Nucleic Acids Research 33, D428–D432 (2005).
52. Simón-Sánchez, J. et al. Genome-wide association study reveals genetic risk underlying Parkinson's disease. Nature Genetics 41, 1308–1312 (2009).
53. Do, C.B. et al. Web-based genome-wide association study identifies two novel loci and a substantial genetic component for Parkinson's disease. PLoS Genetics 7, e1002141 (2011).
54. Hamza, T.H. & Payami, H. The heritability of risk and age at onset of Parkinson's disease after accounting for known genetic risk factors. Journal of Human Genetics 55, 241–3 (2010).
55. Subramanian, A., Tamayo, P., Mootha, V.K. et al. Gene set enrichment analysis: A knowledge-based approach for interpreting genome-wide expression profiles. PNAS 102, 15545–15550 (2005).
56. Kannarkat, G.T., Boss, J.M. & Tansey, M.G. The Role of Innate and Adaptive Immunity in Parkinson's Disease. Journal of Parkinson's Disease 3, 493–514 (2013).
57. Olson, K.E. & Gendelman, H.E. Immunomodulation as a neuroprotective and therapeutic strategy for Parkinson's disease. Curr Opin Pharmacol 26, 87–95 (2015).
58. Okun, E., Mattson, M.P. & Arumugam, T.V. Involvement of Fc receptors in disorders of the central nervous system. Neuromolecular Med 12, 164–78 (2010).
59. Bower, J.H., Maraganore, D.M., Peterson, B.J., Ahlskog, J.E. & Rocca, W.A. Immunologic diseases, anti-inflammatory drugs, and Parkinson disease: a case-control study. Neurology 67, 494–6 (2006).
60. Morrison, B.E. et al. Cutting edge: IL-13Ralpha1 expression in dopaminergic neurons contributes to their oxidative stress-mediated loss following chronic peripheral treatment with lipopolysaccharide. J Immunol 189, 5498–502 (2012).
61. Boza-Serrano, A. et al. The role of Galectin-3 in alpha-synuclein-induced microglial activation. Acta Neuropathol Commun 2, 156 (2014).
62. Burguillos, M.A. et al. Microglia-Secreted Galectin-3 Acts as a Toll-like Receptor 4 Ligand and Contributes to Microglial Activation. Cell Rep (2015).
63. Fan, J. et al. Golgi apparatus and neurodegenerative diseases. Int J Dev Neurosci 26, 523–34 (2008).
64. Cooper, A.A. et al. Alpha-synuclein blocks ER-Golgi traffic and Rab1 rescues neuron loss in Parkinson's models. Science 313, 324–8 (2006).
65. Mazzio, E. & Soliman, K.F. The role of glycolysis and gluconeogenesis in the cytoprotection of neuroblastoma cells against 1-methyl 4-phenylpyridinium ion toxicity. Neurotoxicology 24, 137–47 (2003).
66. Watford, W.T., Moriguchi, M., Morinobu, A. & O'Shea, J.J. The biology of IL-12: coordinating innate and adaptive immune responses. Cytokine & Growth Factor Reviews 14, 361–8 (2003).
67. Chiu, T., Wang, M. & Su, C. The treatment of glioblastoma multiforme through activation of microglia and TRAIL induced by rAAV2-mediated IL-12 in a syngeneic rat model. Journal of Biomedical Science 19, 45 (2012).
68. Taoufik, Y. et al. Human microglial cells express a functional IL-12 receptor and produce IL-12 following IL-12 stimulation. European Journal of Immunology 31, 3228–3239 (2001).
69. Walter, L. & Neumann, H. Role of microglia in neuronal degeneration and regeneration. Seminars in Immunopathology 31, 513–25 (2009).
70. Hanisch, U.K. & Kettenmann, H. Microglia: active sensor and versatile effector cells in the normal and pathologic brain. Nature Neuroscience 10, 1387–94 (2007).
71. Rogers, J., Mastroeni, D. et al. Neuroinflammation in Alzheimer's disease and Parkinson's disease: are microglia pathogenic in either disorder? International Review of Neurobiology 82, 235–46 (2007).
72. Lull, M.E. & Block, M.L. Microglial activation and chronic neurodegeneration. Neurotherapeutics 7, 354–65 (2010).
73. Zhang, B. et al. Integrated systems approach identifies genetic nodes and networks in late-onset Alzheimer’s disease. Cell 153, 707–720 (2013).
74. Hamza, T.H. et al. Common genetic variation in the HLA region is associated with late-onset sporadic Parkinson's disease. Nature Genetics 42, 781–785 (2010).
75. Haiman, C.A. et al. A common variant at the TERT-CLPTM1L locus is associated with estrogen receptor-negative breast cancer. Nat Genet 43, 1210–4 (2011).
76. Yeager, M. et al. Genome-wide association study of prostate cancer identifies a second risk locus at 8q24. Nat Genet 39, 645–9 (2007).
77. Siddiq, A. et al. A meta-analysis of genome-wide association studies of breast cancer identifies two novel susceptibility loci at 6q14 and 20q11. Hum Mol Genet 21, 5373–84 (2012).
78. Wolpin, B.M. et al. Genome-wide association study identifies multiple susceptibility loci for pancreatic cancer. Nat Genet 46, 994–1000 (2014).
79. Petersen, G.M. et al. A genome-wide association study identifies pancreatic cancer susceptibility loci on chromosomes 13q22.1, 1q32.1 and 5p15.33. Nat Genet 42, 224–8 (2010).
80. Amundadottir, L. et al. Genome-wide association study identifies variants in the ABO locus associated with susceptibility to pancreatic cancer. Nat Genet 41, 986–90 (2009).
81. Fujita, N. et al. MTA3, a Mi-2/NuRD complex subunit, regulates an invasive growth pathway in breast cancer. Cell 113, 207–19 (2003).
82. Elsberger, B. et al. Breast cancer patients' clinical outcome measures are associated with Src kinase family member expression. Br J Cancer 103, 899–909 (2010).
83. Chakraborty, G., Rangaswami, H., Jain, S. & Kundu, G.C. Hypoxia regulates cross-talk between Syk and Lck leading to breast cancer progression and angiogenesis. J Biol Chem 281, 11322–31 (2006).
84. Elias, D. & Ditzel, H.J. Fyn is an important molecule in cancer pathogenesis and drug resistance. Pharmacol Res 100, 250–4 (2015).
85. Bhindi, B. et al. Dissecting the association between metabolic syndrome and prostate cancer risk: analysis of a large clinical cohort. Eur Urol 67, 64–70 (2015).
86. Hsing, A.W. et al. Prostate cancer risk and serum levels of insulin and leptin: a population-based study. J Natl Cancer Inst 93, 783–9 (2001).
87. Koul, H.K., Pal, M. & Koul, S. Role of p38 MAP Kinase Signal Transduction in Solid Tumors. Genes Cancer 4, 342–59 (2013).
88. Collas, P., Le Guellec, K. & Taskén, K. The A-kinase-anchoring protein AKAP95 is a multivalent protein with a key role in chromatin condensation at mitosis. J Cell Biol 147, 1167–80 (1999).
89. Liu, W. et al. Roles of Cx43 and AKAP95 in ovarian cancer tissues in G1/S phase. Int J Clin Exp Pathol 8, 14315–24 (2015).
90. Doonan, B.P. & Haque, A. HLA Class II Antigen Presentation in Prostate Cancer Cells: A Novel Approach to Prostate Tumor Immunotherapy. Open Cancer Immunol J 3, 1–7 (2010).
91. Mazouzi, A., Velimezi, G. & Loizou, J.I. DNA replication stress: causes, resolution and disease. Exp Cell Res 329, 85–93 (2014).
92. Helleberg, M., Pedersen, M.G., Pedersen, C.B., Mortensen, P.B. & Obel, N. Associations between HIV and schizophrenia and their effect on HIV treatment outcomes: a nationwide population-based cohort study in Denmark. Lancet HIV 2, e344–50 (2015).
93. Hoffer, A. Nicotinic acid: an adjunct in the treatment of schizophrenia. Am J Psychiatry 120, 171–3 (1963).
94. Ban, T.A. Nicotinic acid in the treatment of schizophrenias. Practical and theoretical considerations. Neuropsychobiology 1, 133–45 (1975).
95. Farha, S. et al. Hypoxia-inducible factors in human pulmonary arterial hypertension: a link to the intrinsic myeloid abnormalities. Blood 117, 3485–93 (2011).
96. Loirand, G. & Pacaud, P. The role of Rho protein signaling in hypertension. Nat Rev Cardiol 7, 637–47 (2010).
97. Wang, C. The Relationship between Type 2 Diabetes Mellitus and Related Thyroid Diseases. J Diabetes Res 2013, 390534 (2013).
98. Gowd, V., Gurukar, A. & Chilkunda, N.D. Glycosaminoglycan remodeling during diabetes and the role of dietary factors in their modulation. World J Diabetes 7, 67–73 (2016).
99. Verges, B. & Cariou, B. mTOR inhibitors and diabetes. Diabetes Res Clin Pract 110, 101–8 (2015).
100. Li, C. & Li, M. GWAsimulator: a rapid whole-genome simulation program. Bioinformatics 24, 140–2 (2008).
101.↵
Tryka, K.A. et al. NCBI's Database of Genotypes and Phenotypes: dbGaP. Nucleic Acids Res 42, D975–9 (2014).
OpenUrl CrossRef PubMed Web of Science
102.
Welter, D. et al. The NHGRI GWAS Catalog, a curated resource of SNP-trait associations. Nucleic Acids Research 42, D1001–D1006 (2014).
103.
Gao, X., Scott, W.K., Wang, G., Mayhew, G., Li, Y.J., Vance, J.M. & Martin, E.R. Gene-gene interaction between FGF20 and MAOB in Parkinson disease. Annals of Human Genetics 72, 157–62 (2008).
104.
Sackton, T.B. & Hartl, D.L. Genotypic Context and Epistasis in Individuals and Populations. Cell 166, 279–87 (2016).
105.
Hill, W.G., Goddard, M.E. & Visscher, P.M. Data and theory point to mainly additive genetic variance for complex traits. PLoS Genet 4, e1000008 (2008).
106.
Phillips, P.C. Epistasis–the essential role of gene interactions in the structure and evolution of genetic systems. Nat Rev Genet 9, 855–67 (2008).
107.
Forsberg, S.K., Bloom, J.S., Sadhu, M.J., Kruglyak, L. & Carlborg, Ö. Accounting for genetic interactions improves modeling of individual quantitative trait phenotypes in yeast. Nat Genet 49, 497–503 (2017).
108.
Upton, A., Trelles, O., Cornejo-García, J.A. & Perkins, J.R. Review: High-performance computing to detect epistasis in genome scale data sets. Brief Bioinform 17, 368–79 (2016).
109.
Mootha, V.K. et al. PGC-1α-responsive genes involved in oxidative phosphorylation are coordinately downregulated in human diabetes. Nature Genetics 34, 267–273 (2003).
110.
Pandey, A. et al. Epistasis network centrality analysis yields pathway replication across two GWAS cohorts for bipolar disorder. Translational Psychiatry 2, e154 (2012).
111.
Kim, N. et al. Gene ontology analysis of pairwise genetic associations in two genome-wide studies of sporadic ALS. BioData Mining 5, 9 (2012).
112.
McKinney, B.A., Crowe, J.E., Guo, J. & Tian, D. Capturing the spectrum of interaction effects in genetic association studies by simulated evaporative cooling network analysis. PLoS Genet 5, e1000432 (2009).
113.
Sun, X. et al. Analysis pipeline for the epistasis search - statistical versus biological filtering. Front Genet 5, 106 (2014).
114.
Ma, L. et al. Knowledge-driven analysis identifies a gene-gene interaction affecting high-density lipoprotein cholesterol levels in multi-ethnic populations. PLoS Genet 8, e1002714 (2012).
115.
U.S. Department of Health and Human Services, National Institutes of Health, National Institute of Mental Health. Depression (NIH Publication No. 15-3561). Bethesda, MD: U.S. Government Printing Office (2015).
116.
Schneider, V. & Church, D. Genome Reference Consortium (2013).
117.
Siva, N. 1000 Genomes project. Nature Biotechnology 26, 256 (2008).
118.
Pankratz, N. et al. Meta-analysis of Parkinson disease: identification of a novel locus, RIT2. Annals of Neurology 71, 370–84 (2012).
119.
Purcell, S. et al. PLINK: a tool set for whole-genome association and population-based linkage analyses. The American Journal of Human Genetics 81, 559–575 (2007).
120.
Li, J.Z. et al. Worldwide human relationships inferred from genome-wide patterns of variation. Science 319, 1100 (2008).
121.
Rabinowitz, D. & Laird, N. A unified approach to adjusting association tests for population admixture with arbitrary pedigree structure and arbitrary missing marker information. Human Heredity 50, 211–223 (2000).
122.
Sul, J.H. et al. Accounting for Population Structure in Gene-by-Environment Interactions in Genome-Wide Association Studies Using Mixed Models. PLoS Genet 12, e1005849 (2016).
123.
Storey, J.D., Akey, J.M., Kruglyak, L. & others. Multiple locus linkage analysis of genomewide expression in yeast. PLoS Biology 3, 1380 (2005).
124.
Holmans, P. et al. Gene ontology analysis of GWA study data sets provides insights into the biology of bipolar disorder. American Journal of Human Genetics 85, 13–24 (2009).
125.
Kanehisa, M. & Goto, S. KEGG: Kyoto Encyclopedia of Genes and Genomes. Nucleic Acids Res 28, 27–30 (2000).
126.
Kanehisa, M., Goto, S., Sato, Y., Furumichi, M. & Tanabe, M. KEGG for integration and interpretation of large-scale molecular datasets. Nucleic Acids Research 40, D109–D114 (2012).
127.
Nishimura, D. BioCarta. Biotech Software & Internet Report 2, 117–120 (2001).
128.
Ritchie, M.D. et al. Multifactor-dimensionality reduction reveals high-order interactions among estrogen-metabolism genes in sporadic breast cancer. American Journal of Human Genetics 69, 138–147 (2001).
129.
Anastassiou, D. Computational analysis of the synergy among multiple interacting genes. Molecular Systems Biology 3 (2007).
130.
Welter, D. et al. The NHGRI GWAS Catalog, a curated resource of SNP-trait associations. Nucleic Acids Research 42, D1001–D1006 (2014).
131.
Materials and methods are available as supplementary materials on Science Online.
132.
Herold, C. et al. INTERSNP: genome-wide interaction analysis guided by a priori information. Bioinformatics 25, 3275 (2009).

Posted August 30, 2017.

Subject Area: Genetics

