Shared activity patterns arising at genetic susceptibility loci reveal underlying genomic and cellular architecture of human disease

J. Kenneth Baillie; Andrew Bretherick; Christopher S. Haley; Sara Clohisey; Alan Gray; Jeffrey Barrett; Eli A. Stahl; Albert Tenesa; Robin Andersson; J. Ben Brown; Geoffrey J. Faulkner; Marina Lizio; Ulf Schaefer; Carsten Daub; Masayoshi Itoh; Naoto Kondo; Timo Lassmann; Jun Kawai; IIBDGC Consortium; FANTOM5 Consortium; Vladimir B. Bajic; Peter Heutink; Michael Rehli; Hideya Kawaji; Albin Sandelin; Harukazu Suzuki; Jack Satsangi; Christine A. Wells; Nir Hacohen; Thomas C Freeman; Yoshihide Hayashizaki; Piero Carninci; Alistair R.R. Forrest; David A. Hume

doi:10.1101/095349

Abstract

Genetic variants underlying complex traits, including disease susceptibility, are enriched within transcriptional regulatory elements, namely promoters and enhancers. There is emerging evidence that regulatory elements associated with particular traits or diseases share patterns of transcriptional regulation. Accordingly, shared transcriptional regulation (coexpression) may help prioritise loci associated with a given trait, and help to identify the biological processes underlying it. Using cap analysis of gene expression (CAGE) profiles of promoter- and enhancer-derived RNAs across 1824 human samples, we have quantified coexpression of RNAs originating from trait-associated regulatory regions using a novel analytical method (network density analysis; NDA). For most traits studied, sequence variants in regulatory regions were linked to tightly coexpressed networks that are likely to share important functional characteristics. These networks implicate particular cell types and tissues in disease pathogenesis; for example, variants associated with ulcerative colitis are linked to expression in gut tissue, whereas Crohn’s disease variants are restricted to immune cells. We show that this coexpression signal provides additional independent information for fine mapping likely causative variants. This approach identifies additional genetic variants associated with specific traits, including an association between the regulation of the OCT1 cation transporter and genetic variants underlying circulating cholesterol levels. It thereby enables a deeper biological understanding of the causal basis of complex traits.

ONE SENTENCE SUMMARY We discover that variants associated with a specific disease share expression profiles across tissues and cell types, enabling fine mapping and identification of new disease-associated variants, illuminating key cell types involved in disease pathogenesis.

Introduction

Genome-wide association studies (GWAS) have considerable untapped potential to reveal new mechanisms of disease¹. Variants associated with disease are strongly over-represented in regulatory, rather than protein-coding, sequence; this enrichment is particularly strong in promoters and enhancers^2–4. There is emerging evidence that gene products associated with a specific disease participate in the same pathway or process⁵, and therefore share transcriptional control⁶.

We have recently shown that cell-type-specific patterns of activity at multiple alternative promoters⁷ and enhancers³ can be identified using cap analysis of gene expression (CAGE) to detect capped RNA transcripts, including mRNAs, lncRNAs and eRNAs^3,5. In the FANTOM5 project, we used CAGE to locate transcription start sites at single-base resolution and quantified the activity of 267,225 regulatory regions in 1824 human samples (primary cells, tissues, and cells following various perturbations)⁸.

Unlike analysis of chromatin modifications or accessibility, the CAGE sequencing used in FANTOM5 combines extremely high resolution in three relevant dimensions: maximal spatial resolution on the genome, quantification of activity (transcript expression) over a wide dynamic range, and high biological resolution, quantifying activity in a much wider range of cell types and conditions than any previous study of regulatory variation^2,4. Since most human protein-coding genes have multiple promoters⁵ with distinct transcriptional regulation, CAGE also provides a more detailed survey of transcriptional regulation than microarray or RNA-seq resources. Heritability of traits studied by GWAS is substantially enriched in these FANTOM5 promoters⁹.

Genes that are coexpressed are more likely to share common biology^10,11. Similarly, regulatory regions that share activity patterns are more likely to contribute to the same biological pathways⁵. The transcriptional activity of regulatory elements (both promoters and enhancers³) can be measured as the level of RNA expression arising at these elements, which varies between cell types and tissues⁵.

In order to determine whether coexpression can provide additional information to prioritise genome-wide associations that would otherwise fall below genome-wide significance, we developed network density analysis (NDA). The NDA method combines genetic signals (disease association in a GWAS) with functional signals (correlation in expression across numerous cell types and tissues; Figure 1) by mapping genetic signals onto a pairwise coexpression network of regulatory regions and then quantifying the density of genetic signals within the network. Every regulatory region that contains a GWAS SNP is assigned a score quantifying its proximity in the network to every other regulatory region containing a GWAS SNP for that trait. We then identified specific cell types and tissues in which regulatory elements associated with selected disease-related phenotypes are preferentially active, thereby suggesting appropriate cell culture models for critical disease processes.

Figure 1: Use of NDA to detect coexpression. a) A subset of regulatory elements containing disease-associated SNPs is identified. b) The strength of the links between pairs of these regulatory regions is quantified, first as the Spearman correlation, then as the −log₁₀ p-value quantifying the probability, specific to this regulatory region, of a Spearman correlation of at least this strength arising by chance. This is determined from the empirical distribution of correlations between this regulatory region and all other regulatory regions in the entire network of all regulatory regions in the genome. c) The subset of regulatory regions containing disease-associated SNPs forms an unexpectedly dense grouping in the network, but this may not be visible in a two-dimensional representation (for illustration, this network shows all correlations between regulatory regions with Spearman r > 0.7, layout generated by the FMMM algorithm). The NDA score assigned to any one node is the sum of the links it shares with other nodes in the chosen subset (see Supplementary Methods for a full explanation). d) NDA scores from the input subset of regulatory elements are compared with NDA scores from permuted subsets of regulatory elements in order to quantify the false discovery rate (FDR).
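The link weighting in panel b and the node score in panel c can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the published implementation: it assumes a precomputed matrix of Spearman correlations between all regulatory regions, reads the region-specific probability in panel b as the fraction of all other regions whose correlation with region i is at least as strong as corr(i, j), and uses the hypothetical names `link_weight` and `nda_scores`. The Supplementary Methods remain the authoritative description.

```python
import math
from typing import Dict, List, Sequence


def link_weight(corr_row: Sequence[float], i: int, j: int) -> float:
    """Hypothetical helper: -log10 of the empirical probability, specific to
    region i, of a correlation at least as strong as corr(i, j).

    The null distribution is the correlation of region i with every other
    region in the network (an assumed reading of Figure 1b).
    """
    others = [r for k, r in enumerate(corr_row) if k != i]
    # Region j is itself among `others`, so the count is always >= 1.
    n_at_least = sum(1 for r in others if r >= corr_row[j])
    # log10(N / count) equals -log10(count / N) but avoids printing -0.0.
    return math.log10(len(others) / n_at_least)


def nda_scores(corr: List[List[float]], subset: Sequence[int]) -> Dict[int, float]:
    """Hypothetical NDA score: for each node in the GWAS-derived subset, the
    sum of its link weights to every other node in that subset (Figure 1c)."""
    return {
        i: sum(link_weight(corr[i], i, j) for j in subset if j != i)
        for i in subset
    }


# Toy 4-region network; regions 0 and 1 are each other's strongest partner.
corr = [
    [1.0, 0.9, 0.1, 0.2],
    [0.9, 1.0, 0.3, 0.0],
    [0.1, 0.3, 1.0, 0.5],
    [0.2, 0.0, 0.5, 1.0],
]
print(nda_scores(corr, [0, 1]))  # both scores equal log10(3), about 0.477
print(nda_scores(corr, [0, 2]))  # weak link: both scores are 0.0
```

Because each weight is computed from the perspective of region i, w(i, j) and w(j, i) can differ. With roughly 267,000 regions a dense pure-Python matrix would be impractical; the sketch only shows the arithmetic.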

Results

Discovery and prioritisation of GWAS hits in regulatory sequence

We defined regulatory regions as the region from 300 bp upstream to 100 bp downstream of the transcription start site (TSS) for promoters⁵, and the region between bidirectional TSSs for enhancers³ (see Online Methods). For each of 7 GWAS for which high-resolution complete datasets were publicly available, we identified a set of regulatory regions containing variants with GWAS p-values below a permissive threshold (5 × 10⁻⁶; Table 1). We devised NDA to examine the similarity in activity patterns among the set of regulatory regions detected in each GWAS (that is, the similarity in expression profile of transcripts arising from these regulatory regions).
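A minimal sketch of the region definitions and the SNP overlap step follows. The coordinate conventions (one-based positions, inclusive ends), the strand handling for minus-strand promoters, and all names (`Region`, `promoter_region`, `enhancer_region`, `regions_with_hits`) are illustrative assumptions; the Online Methods give the definitions actually used. The default threshold follows the permissive value of 5 × 10⁻⁶ used in this study.

```python
from dataclasses import dataclass
from typing import Iterable, List, Set, Tuple


@dataclass(frozen=True)
class Region:
    name: str
    chrom: str
    start: int  # inclusive, assumed one-based
    end: int    # inclusive


def promoter_region(name: str, chrom: str, tss: int, strand: str) -> Region:
    """TSS -300 bp to +100 bp, oriented by strand (assumed convention)."""
    if strand == "+":
        return Region(name, chrom, tss - 300, tss + 100)
    # On the minus strand, "upstream" lies at higher coordinates.
    return Region(name, chrom, tss - 100, tss + 300)


def enhancer_region(name: str, chrom: str, tss_a: int, tss_b: int) -> Region:
    """Region between the two divergent TSSs of a bidirectional enhancer."""
    lo, hi = sorted((tss_a, tss_b))
    return Region(name, chrom, lo, hi)


def regions_with_hits(
    regions: List[Region],
    snps: Iterable[Tuple[str, int, float]],  # (chrom, position, GWAS p-value)
    p_threshold: float = 5e-6,
) -> Set[str]:
    """Names of regions containing at least one SNP with p below threshold."""
    hits = set()
    for chrom, pos, p in snps:
        if p >= p_threshold:
            continue
        for region in regions:  # linear scan; an interval tree would scale better
            if region.chrom == chrom and region.start <= pos <= region.end:
                hits.add(region.name)
    return hits


regions = [
    promoter_region("P1", "chr1", 1000, "+"),  # chr1:700-1100
    promoter_region("P2", "chr1", 5000, "-"),  # chr1:4900-5300
    enhancer_region("E1", "chr2", 800, 600),   # chr2:600-800
]
snps = [("chr1", 750, 1e-7), ("chr1", 1200, 1e-7), ("chr2", 750, 1e-9), ("chr1", 800, 0.01)]
print(sorted(regions_with_hits(regions, snps)))  # ['E1', 'P1']
```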


Table 1:

Results of coexpression analysis for a range of human traits for which high-quality data are available: Crohn’s disease, ulcerative colitis, high-density lipoprotein (HDL), low-density lipoprotein (LDL), total cholesterol, triglycerides, height, systolic blood pressure (SBP) and diastolic blood pressure (DBP). *Initial optimisation and parameterisation of the algorithm was undertaken using a random subset of data from this study.

NDA detected significant coexpression (see below) among the sets of transcripts arising from regulatory regions containing variants associated with each of the following diseases and traits: ulcerative colitis, Crohn’s disease, height, HDL cholesterol, LDL cholesterol, total cholesterol and triglyceride levels (Table 1). One lower-resolution study, of blood pressure, was also analysed: in this smaller study, no coexpression signal was detected among transcripts arising near variants associated with either systolic or diastolic blood pressure (Table 1).

Significant coexpression was detected only within loci containing variants with low p-values (Fig 2a). Regulatory regions that are close to each other on the same chromosome often show similar expression profiles, and may also lie within the same linkage disequilibrium block. We mitigated the effect of this on the coexpression signal by grouping nearby (within 100 kb) regulatory regions into a single unit, unless they had notably different expression patterns (Fig 2c; Online Methods). SNPs in nearby regulatory regions are also more likely to be in linkage disequilibrium, and these regulatory regions are themselves more likely to share cis- or short-range trans-regulatory signals. We checked for significant linkage disequilibrium between regulatory regions assigned to independent groups (Supplementary files 1, 4–12). No pair of significantly coexpressed groups showed linkage disequilibrium at a threshold of r² > 0.8; three weaker linkage relationships were detected, with 0.08 ≤ r² ≤ 0.6 (Supplementary file 1).
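The grouping step can be illustrated with a greedy sketch. This section does not state how "notably different expression patterns" is measured, so the example assumes, hypothetically, that each region is merged into the preceding group when it lies on the same chromosome within 100 kb of the previous region and the two expression profiles have a Spearman correlation of at least 0.5. The 0.5 cut-off, the chaining from the previous region, and the names `ranks`, `spearman` and `group_regions` are illustrative choices, not the published procedure.

```python
import math
from typing import List, Sequence, Tuple


def ranks(xs: Sequence[float]) -> List[float]:
    """Ranks starting at 1, with tied values given their average rank."""
    order = sorted(range(len(xs)), key=lambda k: xs[k])
    out = [0.0] * len(xs)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and xs[order[j + 1]] == xs[order[i]]:
            j += 1
        for k in range(i, j + 1):
            out[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return out


def spearman(x: Sequence[float], y: Sequence[float]) -> float:
    """Spearman correlation as the Pearson correlation of ranks."""
    rx, ry = ranks(x), ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var = sum((a - mx) ** 2 for a in rx) * sum((b - my) ** 2 for b in ry)
    return cov / math.sqrt(var) if var > 0 else 0.0


Row = Tuple[str, str, int, Sequence[float]]  # (name, chrom, position, profile)


def group_regions(rows: List[Row], max_gap: int = 100_000,
                  min_corr: float = 0.5) -> List[List[str]]:
    """Greedily merge each region into the preceding group when it is on the
    same chromosome, within max_gap bp of the previous region, and similarly
    expressed (hypothetical criterion: Spearman r >= min_corr)."""
    groups: List[List[Row]] = []
    for row in sorted(rows, key=lambda r: (r[1], r[2])):
        if groups:
            _, prev_chrom, prev_pos, prev_profile = groups[-1][-1]
            if (row[1] == prev_chrom and row[2] - prev_pos <= max_gap
                    and spearman(row[3], prev_profile) >= min_corr):
                groups[-1].append(row)
                continue
        groups.append([row])
    return [[r[0] for r in g] for g in groups]


rows = [
    ("A", "chr1", 1_000, [1, 2, 3, 4]),
    ("B", "chr1", 50_000, [2, 3, 4, 5]),   # near A, same pattern: merged
    ("C", "chr1", 90_000, [4, 3, 2, 1]),   # near B, opposite pattern: new group
    ("D", "chr1", 500_000, [4, 3, 2, 1]),  # same pattern as C but >100 kb away
    ("E", "chr2", 1_000, [1, 2, 3, 4]),
]
print(group_regions(rows))  # [['A', 'B'], ['C'], ['D'], ['E']]
```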

Figure 2: a. Change in coexpression signal in 800 SNPs selected at random from GWAS of Crohn’s disease, across −log₁₀(p) bins from 0 to 5. No coexpression signal is detected at weak p-values. Percentage of significantly coexpressed entities (hits, FDR < 0.05) and p-value (Kolmogorov-Smirnov test) comparing observed and expected distributions are shown below each plot. b. Relationship between GWAS p-value for a SNP and coexpression scores of individual promoters assigned to that SNP. Top panel: GWAS p-values (log scale) vs corrected coexpression scores. Bottom panel: linear regression lines for data in top panel; Spearman’s r and associated p-values are shown for each trait. Only significantly coexpressed (FDR < 0.05) promoters are included. c. Detail of chromosomal region containing variants associated with LDL cholesterol. Top panel: rectangles show corrected coexpression scores of individual regulatory regions; groups of regulatory regions considered as a single unit share the same colour. Black circles show GWAS p-values for individual SNPs. Bottom panel: known protein-coding transcripts in sense (green) and antisense (purple).

Regulatory regions around individual TSSs with higher coexpression scores contain variants with smaller GWAS p-values (Fig 2b), indicating that this independent signal provides additional information that may be used for fine-mapping causative loci (Fig 2c).

In order to enable the detection of new regulatory regions with strong coexpression relationships, we chose a permissive p-value threshold for trait association of 5 × 10⁻⁶ (see Online Methods). GWAS data for Crohn’s disease¹² were used for initial optimisation of the NDA approach; among GWAS datasets for phenotypes that were not used in algorithm development (i.e. all apart from Crohn’s disease), 0–24% of regulatory regions containing a GWAS SNP showed significant coexpression with other regulatory elements associated with the same phenotype (FDR < 0.05, compared with 100 permuted subsets of equal size; see Online Methods).
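One common way to turn comparisons with permuted subsets into a false discovery rate is sketched below. It assumes, for illustration, that a node's FDR is the expected number of permuted scores at or above its observed score (averaged over permutations) divided by the number of observed scores at or above it; this is a standard permutation estimator and may differ in detail from the Online Methods. `score_fn` stands for any function returning one score per node, such as an NDA scoring routine; all names are hypothetical.

```python
import random
from typing import Callable, Dict, Hashable, List, Sequence

ScoreFn = Callable[[List[Hashable]], Dict[Hashable, float]]


def permuted_scores(all_nodes: Sequence[Hashable], subset_size: int,
                    score_fn: ScoreFn, n_perm: int = 100,
                    seed: int = 0) -> List[List[float]]:
    """Score n_perm random subsets of the same size as the observed subset."""
    rng = random.Random(seed)
    return [list(score_fn(rng.sample(list(all_nodes), subset_size)).values())
            for _ in range(n_perm)]


def permutation_fdr(observed: Sequence[float],
                    permuted: Sequence[Sequence[float]]) -> List[float]:
    """Per-node FDR: expected false calls / observed calls at each score."""
    null = [s for perm in permuted for s in perm]
    fdr = []
    for s in observed:
        expected_false = sum(1 for x in null if x >= s) / len(permuted)
        called = sum(1 for x in observed if x >= s)  # >= 1: includes s itself
        fdr.append(min(1.0, expected_false / called))
    return fdr


print(permutation_fdr([10.0, 1.0], [[0.5, 2.0], [1.5, 0.2]]))  # [0.0, 0.5]
```

Drawing permuted subsets of the same size as the observed one keeps the number of pairwise links comparable, so observed and permuted scores sit on the same scale.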

For a given disease, regulatory regions containing GWAS variants are coexpressed if they share similar activity patterns (i.e. similar expression patterns among transcripts arising from these regulatory regions) with other regulatory regions implicated in that disease. Figure 3 shows significant coexpression superimposed on a two-dimensional representation of the entire network of pairwise correlations. Since activity (transcript expression) was measured in numerous samples, the true proximity of regulatory regions to one another cannot be accurately represented in two dimensions; a perfect representation would require as many dimensions as there are unique samples. However, the NDA method is designed to quantify proximity in network space, so that significantly coexpressed elements are detected even if they are not directly adjacent on a two-dimensional representation of the network (Figure 3). Strong coexpression was also seen between loci that were widely separated on the genome (Figure 4).

Figure 3: Network layouts (Spearman r > 0.5, FMMM layout algorithm; only the largest component is shown) showing the position of significant hits on a two-dimensional network representation of FANTOM5 regulatory regions. Red circles: significantly coexpressed (FDR < 0.05) regulatory regions containing a putative GWAS hit (p < 5 × 10⁻⁶) for this trait. Blue circles: regulatory regions containing a putative GWAS hit (p < 5 × 10⁻⁶) for this trait that are not significantly coexpressed (FDR > 0.05).

Figure 4: (Top panels) Circular plots of coexpression links between different locations on the genome, illustrating the spatial separation of highly correlated regulatory regions. The coloured outer circle shows an end-to-end concatenated view of the human chromosomes. The black inner circle shows log₁₀ GWAS p-values for included SNPs. Links depict an association between two regulatory regions containing these SNPs and are coloured according to −log₁₀(p) (red > 3, blue > 2, green > 1.5). (Bottom panels) Quantile-quantile plots showing observed and expected coexpression scores. Expected coexpression scores are derived from circularly permuted subsets of regulatory regions (post-mapping permutations; black circles) or from SNPs chosen by circular permutations against the background of all SNPs genotyped in each study. Data are shown for Crohn’s disease, ulcerative colitis, high-density lipoprotein (HDL), low-density lipoprotein (LDL), total cholesterol, triglycerides, height, systolic blood pressure (SBP) and diastolic blood pressure (DBP).
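Circular permutation keeps the spacing of the selected regions along the concatenated genome (the outer circle above) while moving them to a random new position, so the tendency of nearby regions to cluster is preserved in the null. The sketch below shows that idea on region indices ordered along the concatenated chromosomes; the uniform random non-zero offset and the names `circular_shift` and `circular_permutations` are illustrative assumptions, not a description of the exact permutation scheme used.

```python
import random
from typing import List, Sequence


def circular_shift(n_regions: int, subset: Sequence[int], offset: int) -> List[int]:
    """Rotate region indices by offset, wrapping past the end of the genome."""
    return sorted((i + offset) % n_regions for i in subset)


def circular_permutations(n_regions: int, subset: Sequence[int],
                          n_perm: int = 100, seed: int = 0) -> List[List[int]]:
    """n_perm rotated copies of the subset, each with a random non-zero offset."""
    rng = random.Random(seed)
    return [circular_shift(n_regions, subset, rng.randrange(1, n_regions))
            for _ in range(n_perm)]


print(circular_shift(10, [1, 2, 8], 3))  # [1, 4, 5]
```

Each permuted subset would then be scored in the same way as the observed subset to build the expected distribution shown in the quantile-quantile plots.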

The coexpression signal essentially combines the signal for association in a GWAS with the location and activity pattern of regulatory regions on the genome. We deliberately chose a permissive GWAS p-value threshold in order to enable the detection of new signals that did not achieve genome-wide significance in the original studies. For example, we found that coexpressed transcripts for both LDL and total cholesterol (TC) arise from promoters for well-studied genes such as APOB¹³ and ABCG5¹⁴, but also from regulatory regions not previously associated with cholesterol levels. A promoter for SLC22A1, which encodes the organic cation transporter OCT1¹⁵, is strongly coexpressed with other elements associated with both traits (Supplementary File 1). OCT1 transcription is regulated by cholesterol¹⁶ and the transporter regulates hepatic steatosis through its role in thiamine transport¹⁷. This action of OCT1 is inhibited by metformin¹⁷, an oral hypoglycaemic agent whose cholesterol-lowering effect¹⁸ is not well understood¹⁹. Full results of coexpression analyses are in Supplementary File 1, and online at www.coexpression.net.

Cell-type and tissue specificity

The significantly-coexpressed networks detected here could be regarded as revealing the signature expression profile, at least within the FANTOM5 dataset, for a given disease or trait. We next explored whether these signature expression patterns reveal cell types or biological processes that may contribute to the trait or disease susceptibility.

We therefore ranked cell types and tissues by transcriptional activity for each of the significantly coexpressed loci for each trait, and combined the rankings using robust rank aggregation²⁰ (Online Methods). By first detecting the characteristic expression signature associated with a given phenotype using only high-resolution GWAS data, and then detecting the cell type and tissue activity profiles that underlie this signature, we improve on the statistical power of previous methods that have attempted to detect cell-type-specific signatures of disease^4,6,21. Strong signals reported previously are highly significant in our analysis; for example, genetic loci associated with cholesterol are transcriptionally active in hepatocytes and liver tissue⁶ (Supplementary File 8).
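Robust rank aggregation²⁰ scores each item by how much better its ranks are than expected if every list were a random ordering: for normalised ranks r₍₁₎ ≤ … ≤ r₍ₙ₎ across n lists, the k-th smallest of n uniform draws follows a Beta(k, n − k + 1) distribution. The sketch below implements that ρ score in plain Python under these assumptions: each significantly coexpressed locus supplies one ranking of samples by activity (most active first), an item absent from a list receives the bottom normalised rank of 1, and the minimum of the n order-statistic probabilities receives a simple Bonferroni correction by n. The function names and toy data are hypothetical; lower scores indicate samples that are consistently near the top.

```python
import math
from typing import Dict, Sequence


def beta_order_cdf(k: int, n: int, x: float) -> float:
    """P(U_(k) <= x) for the k-th smallest of n Uniform(0,1) draws, i.e. the
    Beta(k, n - k + 1) CDF, computed as P(Binomial(n, x) >= k)."""
    return sum(math.comb(n, j) * x ** j * (1 - x) ** (n - j) for j in range(k, n + 1))


def rra_scores(rankings: Sequence[Sequence[str]]) -> Dict[str, float]:
    """Rho score for every item across all rankings (smaller = more consistently top)."""
    n = len(rankings)
    positions = [{item: pos for pos, item in enumerate(r, start=1)} for r in rankings]
    items = {item for r in rankings for item in r}
    scores = {}
    for item in items:
        norm = sorted(
            pos[item] / len(r) if item in pos else 1.0
            for pos, r in zip(positions, rankings)
        )
        rho = min(beta_order_cdf(k, n, x) for k, x in enumerate(norm, start=1))
        scores[item] = min(1.0, rho * n)  # Bonferroni over the n order statistics
    return scores


# Toy example: three loci, each ranking four samples by activity.
rankings = [
    ["liver", "hepatocyte", "monocyte", "colon"],
    ["hepatocyte", "liver", "colon", "monocyte"],
    ["liver", "hepatocyte", "monocyte", "colon"],
]
scores = rra_scores(rankings)
print(scores["liver"], scores["colon"])  # 0.375 1.0
```

Because only the ordering within each locus is used, the aggregation is insensitive to differences in absolute expression level between loci, which is one plausible reason a rank-based method suits heterogeneous regulatory regions.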

This analysis reveals robust cell-type associations that have important implications for understanding disease pathogenesis. For example, cell-type associations with Crohn’s disease were restricted to immune cells, particularly monocytes exposed to inflammatory stimuli (Supplementary File 4). In contrast, cell-type associations with ulcerative colitis were statistically significant in rectum, colon and intestine samples, and in a distinct group of immune cells: macrophages exposed to bacterial lipopolysaccharide (Supplementary File 5). This is consistent with the view that ulcerative colitis, in which disease processes are primarily restricted to the colon and rectum, is a consequence of dysregulation of processes that are intrinsic to the large bowel, including epithelial barrier function²², whereas Crohn’s disease is a multisystem autoimmune disorder with more diverse extra-intestinal manifestations²³, suggesting a primary immune aetiology.

Discussion

The development of high-throughput genotyping methods has led to an explosion of associations between genetic markers and human diseases²⁴. The results presented here are a step towards overcoming the next challenge for this field: making sense of these associations to advance the practice of medicine. There has been increasing recognition of the potential to utilise prior knowledge to improve detection and interpretation of genome-wide signals²⁵. Our results demonstrate that the coexpression of regulatory regions containing variants associated with a particular disease carries biological information, which can provide the basis for prioritising variants that would not otherwise meet standard thresholds for genome-wide statistical significance.

We report relationships between numerous regulatory regions that are not associated with named genes – a restriction that has previously limited the transition from genetic discovery to biological understanding^26-30. The analysis reveals the impact of specific enhancers and promoters that may be remote from the genes they regulate, or may contribute to tissue-specific regulation of a gene that may otherwise appear to be more widely-expressed.

Even for those disease-associated variants that can be reliably assigned to a named gene, previous attempts to draw functional inferences have, by necessity, relied on published data²⁶, annotated biological pathways³¹, or gene sets^30,32. Although many important insights have been gained from these approaches, they share a fundamental limitation: reliance on existing knowledge. This restricts the ability to exploit the potential of genomics to deliver insights into new, previously unseen, mechanisms of disease³³.

The data used for development and testing of the coexpression approach were from large meta-analyses that incorporate genotyping (or imputation) of genetic variants at extremely high resolution, increasing the probability that variants will be found within regulatory regions. In future, the availability of whole-genome sequencing can reasonably be expected to produce many additional high-quality datasets for coexpression analysis. In principle, the NDA approach can be generalised to any network in which it is desirable to quantify the proximity of a subset of nodes.

The scale, depth and breadth of the FANTOM5 expression atlas, together with the NDA approach, enable detection of subtle coexpression signals for regulatory regions that have previously been undetectable. As additional genetic studies become available at greater genotyping resolution, we anticipate that this method will detect new genetic associations with disease, reveal coexpressed modules underlying pathogenesis, and identify critical cell types implicated in mechanisms of disease.

DATA ACCESS

The FANTOM5 atlas is accessible from http://fantom.gsc.riken.jp/data/

An online service running the coexpression method is available at https://coexpression.roslin.ed.ac.uk

username: fantom5

password: review

Authors’ contributions

JKB conceived the study, designed and led the analyses and wrote the manuscript. AB and AG contributed to computational optimisation and description of methods. AB and SC generated network and circos images, respectively. CH, JB, JBB, TF, and AT advised on statistical and network analysis methods. ML managed the data collection, including annotation, expression profiling, metadata association and archiving. CW, JS, NH and TF contributed biological expertise. CW, RA, JS, AS, MR, VB, and PH advised on methodology. ARRF, MI, CD, NK, TL, JK, HS, HK, YH, and PC organised the FANTOM5 project including sample collection, data production, mapping and tag clustering. JKB, ARRF, DAH, TF, GJF, PC and YH provided resources. DAH and ARRF advised on methodology, and contributed to the manuscript. All authors contributed to and approved the final version of the manuscript.

DISCLOSURE DECLARATION

All authors report that they have no conflicts of interest to declare in respect of this manuscript.

ACKNOWLEDGEMENTS

We would like to express our gratitude for the diligence and professionalism of the entire FANTOM5 consortium and to the members of the IIBDGC group, GIANT consortium, and Global Lipids consortium for freely sharing their data. We are particularly grateful to the tens of thousands of patients and healthy volunteers who donated DNA and other material to these studies.

JKB gratefully acknowledges funding support from a Wellcome Trust Intermediate Clinical Fellowship (103258/Z/13/Z) and a Wellcome-Beit Prize (103258/Z/13/A), BBSRC Institute Strategic Programme Grant to the Roslin Institute (BBS/E/D/20241864), the UK Intensive Care Foundation, and the Edinburgh Clinical Academic Track (ECAT) scheme. Funds were provided to the Roslin Institute through a BBSRC Strategic Programme Grant (JKB, SC, CSH, GJF, TCF, DAH; BBS/E/D/20211551, BBS/E/D/20231760). We acknowledge the financial support provided by the MRC-HGU Core Fund (CSH, AT). FANTOM5 was made possible by a Research Grant for RIKEN Omics Science Center from MEXT to YH and a Grant of the Innovative Cell Biology by Innovative Technology (Cell Innovation Program) from the MEXT, Japan to YH. RIKEN Centre for Life Science Technologies, Division of Genomic Technologies members (RIKEN CLST (DGT)) are supported by institutional funds from the MEXT, Japan. ARRF is supported by a Senior Cancer Research Fellowship from the Cancer Research Trust and funds raised by the Ride to Conquer Cancer. JCB is supported by Wellcome Trust grant WT098051. GJF acknowledges the support of an NHMRC Career Development Fellowship (GNT1045237), NHMRC Project Grants (GNT1042449, GNT1045991, GNT1067983 and GNT1068789), and the EU FP7 under grant agreement No. 259743 underpinning the MODHEP consortium. MR was supported by grants from the Deutsche Forschungsgemeinschaft, the German Cancer Aid and the Rudolf Bartling Foundation. RA was supported by funding from the European Research Council (ERC) under the European Union’s Horizon 2020 research and innovation programme (grant agreement No 638273). US and VBB are supported by the KAUST Base Research Fund to VBB and KAUST CBRC Base Fund. RMP is supported by grants from the US National Institutes of Health (R01-AR057108, R01-AR056768, U01-GM092691 and R01-AR059648) and holds a Career Award for Medical Scientists from the Burroughs Wellcome Fund.
RA and AS were supported by funds from FP7/2007-2013/ERC grant agreement 204135, the Novo Nordisk foundation, and the Lundbeck Foundation and the Danish Cancer Society. CAW is supported by a Queensland Government Smart Futures Fellowship, and samples were collected under Australian National Health and Medical Research council project grants 455947 and 597452, under agreement from the Australian Red Cross 11-02QLD-10 and the University of QLD ethics committee.

REFERENCES

1. Tenesa, A. & Haley, C. S. The heritability of human disease: estimation, uses and abuses. Nat Rev Genet 14, 139–149 (2013).
2. Maurano, M. T. et al. Systematic Localization of Common Disease-Associated Variation in Regulatory DNA. Science 337, 1190–1195 (2012).
3. Andersson, R. et al. An atlas of active enhancers across human cell types and tissues. Nature 507, 455–461 (2014).
4. Farh, K. K.-H. et al. Genetic and epigenetic fine mapping of causal autoimmune disease variants. Nature 518, 337–343 (2015).
5. Forrest, A. R. R., Kawaji, H., Rehli, M., Baillie, J. K., et al. A promoter-level mammalian expression atlas. Nature 507, 462–470 (2014).
6. Marbach, D. et al. Tissue-specific regulatory circuits reveal variable modular perturbations across complex diseases. Nat Methods 13, 366–370 (2016).
7. The FANTOM Consortium et al. The Transcriptional Landscape of the Mammalian Genome. Science 309, 1559–1563 (2005).
8. Arner, E. et al. Transcribed enhancers lead waves of coordinated transcription in transitioning mammalian cells. Science 347, 1010–1014 (2015).
9. Finucane, H. K. et al. Partitioning heritability by functional category using GWAS summary statistics. bioRxiv 014241 (2015). doi:10.1101/014241
10. Hume, D. A., Summers, K. M., Raza, S., Baillie, J. K. & Freeman, T. C. Functional clustering and lineage markers: insights into cellular differentiation and gene function from large-scale microarray studies of purified primary cell populations. Genomics 95, 328–338 (2010).
11. Mabbott, N. A., Baillie, J. K., Hume, D. A. & Freeman, T. C. Meta-analysis of lineage-specific gene expression signatures in mouse leukocyte populations. Immunobiology 215, 724–736 (2010).
12. Franke, A. et al. Genome-wide meta-analysis increases to 71 the number of confirmed Crohn’s disease susceptibility loci. Nat Genet 42, 1118–1125 (2010).
13. Tybjærg-Hansen, A., Steffensen, R., Meinertz, H., Schnohr, P. & Nordestgaard, B. G. Association of Mutations in the Apolipoprotein B Gene with Hypercholesterolemia and the Risk of Ischemic Heart Disease. N Engl J Med 338, 1577–1584 (1998).
14. Lee, M.-H. et al. Identification of a gene, ABCG5, important in the regulation of dietary cholesterol absorption. Nat Genet 27, 79–83 (2001).
15. Klaassen, C. D. & Aleksunes, L. M. Xenobiotic, Bile Acid, and Cholesterol Transporters: Function and Regulation. Pharmacol Rev 62, 1–96 (2010).
16. Dias, V. & Ribeiro, V. The expression of the solute carriers NTCP and OCT-1 is regulated by cholesterol in HepG2 cells. Fundam Clin Pharmacol 21, 445–450 (2007).
17. Chen, L. et al. OCT1 is a high-capacity thiamine transporter that regulates hepatic steatosis and is a target of metformin. Proc Natl Acad Sci U S A 111, 9983–9988 (2014).
18. Bailey, C. J. & Turner, R. C. Metformin. N Engl J Med 334, 574–579 (1996).
19. Shaw, R. J. et al. The kinase LKB1 mediates glucose homeostasis in liver and therapeutic effects of metformin. Science 310, 1642–1646 (2005).
20. Kolde, R., Laur, S., Adler, P. & Vilo, J. Robust rank aggregation for gene list integration and meta-analysis. Bioinformatics 28, 573–580 (2012).
21. Ernst, J. et al. Mapping and analysis of chromatin state dynamics in nine human cell types. Nature 473, 43–49 (2011).
22. Xavier, R. J. & Podolsky, D. K. Unravelling the pathogenesis of inflammatory bowel disease. Nature 448, 427–434 (2007).
23. Mekhjian, H. S., Switz, D. M., Melnyk, C. S., Rankin, G. B. & Brooks, R. K. Clinical features and natural history of Crohn’s disease. Gastroenterology 77, 898–906 (1979).
24. Li, M. J. et al. GWASdb: a database for human genetic variants identified by genome-wide association studies. Nucleic Acids Res 40, D1047–D1054 (2012).
25. MacLeod, I. M. et al. Exploiting biological priors and sequence variants enhances QTL discovery and genomic prediction of complex traits. BMC Genomics 17, 144 (2016).
26. Raychaudhuri, S. et al. Identifying Relationships among Genomic Disease Regions: Predicting Genes at Pathogenic SNP Associations and Rare Deletions. PLoS Genet 5, e1000534 (2009).
27. Pers, T. H. et al. Biological interpretation of genome-wide association studies using predicted gene functions. Nat Commun 6, 5890 (2015).
28. Wojcik, G. L., Kao, W. L. & Duggal, P. Relative performance of gene- and pathway-level methods as secondary analyses for genome-wide association studies. BMC Genet 16 (2015).
29. Rossin, E. J. et al. Proteins Encoded in Genomic Regions Associated with Immune-Mediated Disease Physically Interact and Suggest Underlying Biology. PLoS Genet 7, e1001273 (2011).
30. Subramanian, A. et al. Gene set enrichment analysis: A knowledge-based approach for interpreting genome-wide expression profiles. Proc Natl Acad Sci U S A 102, 15545–15550 (2005).
31. Kanehisa, M. & Goto, S. KEGG: Kyoto encyclopedia of genes and genomes. Nucleic Acids Res 28, 27–30 (2000).
32. Nam, D., Kim, J., Kim, S.-Y. & Kim, S. GSA-SNP: a general approach for gene set analysis of polymorphisms. Nucleic Acids Res 38, W749–W754 (2010).
33. Baillie, J. K. Targeting the host immune response to fight infection. Science 344, 807–808 (2014).

References

[1] Carl A Anderson, Gabrielle Boucher, Charlie W Lees, Andre Franke, Mauro D’Amato, Kent D Taylor, James C Lee, Philippe Goyette, Marcin Imielinski, Anna Latiano, Caroline Lagacé, Regan Scott, Leila Amininejad, Suzannah Bumpstead, Leonard Baidoo, Robert N Baldassano, Murray Barclay, Theodore M Bayless, Stephan Brand, Carsten Büning, Jean-Frédéric Colombel, Lee A Denson, Martine De Vos, Marla Dubinsky, Cathryn Edwards, David Ellinghaus, Rudolf S N Fehrmann, James A B Floyd, Timothy Florin, Denis Franchimont, Lude Franke, Michel Georges, Jürgen Glas, Nicole L Glazer, Stephen L Guthery, Talin Haritunians, Nicholas K Hayward, Jean-Pierre Hugot, Gilles Jobin, Debby Laukens, Ian Lawrance, Marc Lémann, Arie Levine, Cecile Libioulle, Edouard Louis, Dermot P McGovern, Monica Milla, Grant W Montgomery, Katherine I Morley, Craig Mowat, Aylwin Ng, William Newman, Roel A Ophoff, Laura Papi, Orazio Palmieri, Laurent Peyrin-Biroulet, Julián Panés, Anne Phillips, Natalie J Prescott, Deborah D Proctor, Rebecca Roberts, Richard Russell, Paul Rutgeerts, Jeremy Sanderson, Miquel Sans, Philip Schumm, Frank Seibold, Yashoda Sharma, Lisa A Simms, Mark Seielstad, A Hillary Steinhart, Stephan R Targan, Leonard H van den Berg, Morten Vatn, Hein Verspaget, Thomas Walters, Cisca Wijmenga, David C Wilson, Harm-Jan Westra, Ramnik J Xavier, Zhen Z Zhao, Cyriel Y Ponsioen, Vibeke Andersen, Leif Torkvist, Maria Gazouli, Nicholas P Anagnou, Tom H Karlsen, Limas Kupcinskas, Jurgita Sventoraityte, John C Mansfield, Subra Kugathasan, Mark S Silverberg, Jonas Halfvarson, Jerome I Rotter, Christopher G Mathew, Anne M Griffiths, Richard Gearry, Tariq Ahmad, Steven R Brant, Mathias Chamaillard, Jack Satsangi, Judy H Cho, Stefan Schreiber, Mark J Daly, Jeffrey C Barrett, Miles Parkes, Vito Annese, Hakon Hakonarson, Graham Radford-Smith, Richard H Duerr, Séverine Vermeire, Rinse K Weersma, and John D Rioux. Meta-analysis identifies 29 additional ulcerative colitis risk loci, increasing the number of confirmed associations to 47. Nature Genetics, 43(3):246–252, 2011.
[2] Yoav Benjamini and Daniel Yekutieli. The control of the false discovery rate in multiple testing under dependency. The Annals of Statistics, 29(4):1165–1188, 2001.
[3] Sonja I. Berndt, Stefan Gustafsson, Reedik Mägi, Andrea Ganna, Eleanor Wheeler, Mary F. Feitosa, Anne E. Justice, Keri L. Monda, Damien C. Croteau-Chonka, Felix R. Day, Tõnu Esko, Tove Fall, Teresa Ferreira, Davide Gentilini, Anne U. Jackson, Jian’an Luan, Joshua C. Randall, Sailaja Vedantam, Cristen J. Willer, Thomas W. Winkler, Andrew R. Wood, Tsegaselassie Workalemahu, Yi-Juan Hu, Sang Hong Lee, Liming Liang, Dan-Yu Lin, Josine L. Min, Benjamin M. Neale, Gudmar Thorleifsson, Jian Yang, Eva Albrecht, Najaf Amin, Jennifer L. Bragg-Gresham, Gemma Cadby, Martin den Heijer, Niina Eklund, Krista Fischer, Anuj Goel, Jouke-Jan Hottenga, Jennifer E. Huffman, Ivonne Jarick, Åsa Johansson, Toby Johnson, Stavroula Kanoni, Marcus E. Kleber, Inke R. König, Kati Kristiansson, Zoltán Kutalik, Claudia Lamina, Cecile Lecoeur, Guo Li, Massimo Mangino, Wendy L. McArdle, Carolina Medina-Gomez, Martina Müller-Nurasyid, Julius S. Ngwa, Ilja M. Nolte, Lavinia Paternoster, Sonali Pechlivanis, Markus Perola, Marjolein J. Peters, Michael Preuss, Lynda M. Rose, Jianxin Shi, Dmitry Shungin, Albert Vernon Smith, Rona J. Strawbridge, Ida Surakka, Alexander Teumer, Mieke D. Trip, Jonathan Tyrer, Jana V. Van Vliet-Ostaptchouk, Liesbeth Vandenput, Lindsay L. Waite, Jing Hua Zhao, Devin Absher, Folkert W. Asselbergs, Mustafa Atalay, Antony P. Attwood, Anthony J. Balmforth, Hanneke Basart, John Beilby, Lori L. Bonnycastle, Paolo Brambilla, Marcel Bruinenberg, Harry Campbell, Daniel I. Chasman, Peter S. Chines, Francis S. Collins, John M. Connell, William O. Cookson, Ulf de Faire, Femmie de Vegt, Mariano Dei, Maria Dimitriou, Sarah Edkins, Karol Estrada, David M. Evans, Martin Farrall, Marco M. Ferrario, Jean Ferrières, Lude Franke, Francesca Frau, Pablo V. Gejman, Harald Grallert, Henrik Grönberg, Vilmundur Gudnason, Alistair S. Hall, Per Hall, Anna-Liisa Hartikainen, Caroline Hayward, Nancy L. Heard-Costa, Andrew C. Heath, Johannes Hebebrand, Georg Homuth, Frank B. Hu, Sarah E. Hunt, Elina Hyppönen, Carlos Iribarren, Kevin B. Jacobs, John-Olov Jansson, Antti Jula, Mika Kähönen, Sekar Kathiresan, Frank Kee, Kay-Tee Khaw, Mika Kivimäki, Wolfgang Koenig, Aldi T. Kraja, Meena Kumari, Kari Kuulasmaa, Johanna Kuusisto, Jaana H. Laitinen, Timo A. Lakka, Claudia Langenberg, Lenore J. Launer, Lars Lind, Jaana Lindström, Jianjun Liu, Antonio Liuzzi, Marja-Liisa Lokki, Mattias Lorentzon, Pamela A. Madden, Patrik K. Magnusson, Paolo Manunta, Diana Marek, Winfried März, Irene Mateo Leach, Barbara McKnight, Sarah E. Medland, Evelin Mihailov, Lili Milani, Grant W. Montgomery, Vincent Mooser, Thomas W. Mühleisen, Patricia B. Munroe, Arthur W. Musk, Narisu Narisu, Gerjan Navis, George Nicholson, Ellen A. Nohr, Ken K. Ong, Ben A. Oostra, Colin N. A. Palmer, Aarno Palotie, John F. Peden, Nancy Pedersen, Annette Peters, Ozren Polasek, Anneli Pouta, Peter P. Pramstaller, Inga Prokopenko, Carolin Pütter, Aparna Radhakrishnan, Olli Raitakari, Augusto Rendon, Fernando Rivadeneira, Igor Rudan, Timo E. Saaristo, Jennifer G. Sambrook, Alan R. Sanders, Serena Sanna, Jouko Saramies, Sabine Schipf, Stefan Schreiber, Heribert Schunkert, So-Youn Shin, Stefano Signorini, Juha Sinisalo, Boris Skrobek, Nicole Soranzo, Alena Stančáková, Klaus Stark, Jonathan C. Stephens, Kathleen Stirrups, Ronald P. Stolk, Michael Stumvoll, Amy J. Swift, Eirini V. Theodoraki, Barbara Thorand, David-Alexandre Tregouet, Elena Tremoli, Melanie M. Van der Klauw, Joyce B. J. van Meurs, Sita H. Vermeulen, Jorma Viikari, Jarmo Virtamo, Veronique Vitart, Gérard Waeber, Zhaoming Wang, Elisabeth Widén, Sarah H. Wild, Gonneke Willemsen, Bernhard R. Winkelmann, Jacqueline C. M. Witteman, Bruce H. R. Wolffenbuttel, Andrew Wong, Alan F. Wright, M. Carola Zillikens, Philippe Amouyel, Bernhard O. Boehm, Eric Boerwinkle, Dorret I. Boomsma, Mark J. Caulfield, Stephen J. Chanock, L. Adrienne Cupples, Daniele Cusi, George V. Dedoussis, Jeanette Erdmann, Johan G. Eriksson, Paul W. Franks, Philippe Froguel, Christian Gieger, Ulf Gyllensten, Anders Hamsten, Tamara B. Harris, Christian Hengstenberg, Andrew A. Hicks, Aroon Hingorani, Anke Hinney, Albert Hofman, Kees G. Hovingh, Kristian Hveem, Thomas Illig, Marjo-Riitta Jarvelin, Karl-Heinz Jöckel, Sirkka M. Keinanen-Kiukaanniemi, Lambertus A. Kiemeney, Diana Kuh, Markku Laakso, Terho Lehtimäki, Douglas F. Levinson, Nicholas G. Martin, Andres Metspalu, Andrew D. Morris, Markku S. Nieminen, Inger Njølstad, Claes Ohlsson, Albertine J. Oldehinkel, Willem H. Ouwehand, Lyle J. Palmer, Brenda Penninx, Chris Power, Michael A. Province, Bruce M. Psaty, Lu Qi, Rainer Rauramaa, Paul M. Ridker, Samuli Ripatti, Veikko Salomaa, Nilesh J. Samani, Harold Snieder, Thorkild I. A. Sørensen, Timothy D. Spector, Kari Stefansson, Anke Tönjes, Jaakko Tuomilehto, André G. Uitterlinden, Matti Uusitupa, Pim van der Harst, Peter Vollenweider, Henri Wallaschofski, Nicholas J. Wareham, Hugh Watkins, H.-Erich Wichmann, James F. Wilson, Goncalo R. Abecasis, Themistocles L. Assimes, Inês Barroso, Michael Boehnke, Ingrid B. Borecki, Panos Deloukas, Caroline S. Fox, Timothy Frayling, Leif C. Groop, Talin Haritunian, Iris M. Heid, David Hunter, Robert C. Kaplan, Fredrik Karpe, Miriam F. Moffatt, Karen L. Mohlke, Jeffrey R. O’Connell, Yudi Pawitan, Eric E. Schadt, David Schlessinger, Valgerdur Steinthorsdottir, David P. Strachan, Unnur Thorsteinsdottir, Cornelia M. van Duijn, Peter M. Visscher, Anna Maria Di Blasio, Joel N. Hirschhorn, Cecilia M. Lindgren, Andrew P. Morris, David Meyre, André Scherag, Mark I. McCarthy, Elizabeth K. Speliotes, Kari E. North, Ruth J. F. Loos, and Erik Ingelsson. Genome-wide meta-analysis identifies 11 new loci for anthropometric traits and provides insights into genetic architecture. Nature Genetics, 45(5):501–512, 2013.
[4] Piero Carninci, Albin Sandelin, Boris Lenhard, Shintaro Katayama, Kazuro Shimokawa, Jasmina Ponjavic, Colin A M Semple, Martin S Taylor, Par G Engstrom, Martin C Frith, Alistair R R Forrest, Wynand B Alkema, Sin Lam Tan, Charles Plessy, Rimantas Kodzius, Timothy Ravasi, Takeya Kasukawa, Shiro Fukuda, Mutsumi Kanamori-Katayama, Yayoi Kitazume, Hideya Kawaji, Chikatoshi Kai, Mari Nakamura, Hideaki Konno, Kenji Nakano, Salim Mottagui-Tabar, Peter Arner, Alessandra Chesi, Stefano Gustincich, Francesca Persichetti, Harukazu Suzuki, Sean M Grimmond, Christine A Wells, Valerio Orlando, Claes Wahlestedt, Edison T Liu, Matthias Harbers, Jun Kawai, Vladimir B Bajic, David A Hume, and Yoshihide Hayashizaki. Genome-wide analysis of mammalian promoter architecture and evolution. Nature Genetics, 38(6):626–635, 2006.
[5] Global Lipids Genetics Consortium. Discovery and refinement of loci associated with lipid levels. Nature Genetics, 45(11):1274–1283, 2013.
[6] Alistair R. R. Forrest, Hideya Kawaji, Michael Rehli, J. Kenneth Baillie, et al. A promoter-level mammalian expression atlas. Nature, 507(7493):462–470, 2014.
[7] Andre Franke, Dermot P B McGovern, Jeffrey C Barrett, Kai Wang, Graham L Radford-Smith, Tariq Ahmad, Charlie W Lees, Tobias Balschun, James Lee, Rebecca Roberts, Carl A Anderson, Joshua C Bis, Suzanne Bumpstead, David Ellinghaus, Eleonora M Festen, Michel Georges, Todd Green, Talin Haritunians, Luke Jostins, Anna Latiano, Christopher G Mathew, Grant W Montgomery, Natalie J Prescott, Soumya Raychaudhuri, Jerome I Rotter, Philip Schumm, Yashoda Sharma, Lisa A Simms, Kent D Taylor, David Whiteman, Cisca Wijmenga, Robert N Baldassano, Murray Barclay, Theodore M Bayless, Stephan Brand, Carsten Büning, Albert Cohen, Jean-Frederick Colombel, Mario Cottone, Laura Stronati, Ted Denson, Martine De Vos, Renata D’Inca, Marla Dubinsky, Cathryn Edwards, Tim Florin, Denis Franchimont, Richard Gearry, Jürgen Glas, Andre Van Gossum, Stephen L Guthery, Jonas Halfvarson, Hein W Verspaget, Jean-Pierre Hugot, Amir Karban, Debby Laukens, Ian Lawrance, Marc Lemann, Arie Levine, Cecile Libioulle, Edouard Louis, Craig Mowat, William Newman, Julián Panés, Anne Phillips, Deborah D Proctor, Miguel Regueiro, Richard Russell, Paul Rutgeerts, Jeremy Sanderson, Miquel Sans, Frank Seibold, A Hillary Steinhart, Pieter C F Stokkers, Leif Torkvist, Gerd Kullak-Ublick, David Wilson, Thomas Walters, Stephan R Targan, Steven R Brant, John D Rioux, Mauro D’Amato, Rinse K Weersma, Subra Kugathasan, Anne M Griffiths, John C Mansfield, Severine Vermeire, Richard H Duerr, Mark S Silverberg, Jack Satsangi, Stefan Schreiber, Judy H Cho, Vito Annese, Hakon Hakonarson, Mark J Daly, and Miles Parkes. Genome-wide meta-analysis increases to 71 the number of confirmed Crohn’s disease susceptibility loci. Nature Genetics, 42(12):1118–1125, 2010.
[8] Lucia A Hindorff, Praveen Sethupathy, Heather A Junkins, Erin M Ramos, Jayashri P Mehta, Francis S Collins, and Teri A Manolio. Potential etiologic and functional implications of genome-wide association loci for human diseases and traits. Proceedings of the National Academy of Sciences of the United States of America, 106(23):9362–9367, 2009.
[9] A. S. Hinrichs, D. Karolchik, R. Baertsch, G. P. Barber, G. Bejerano, H. Clawson, M. Diekhans, T. S. Furey, R. A. Harte, F. Hsu, J. Hillman-Jackson, R. M. Kuhn, J. S. Pedersen, A. Pohl, B. J. Raney, K. R. Rosenbloom, A. Siepel, K. E. Smith, C. W. Sugnet, A. Sultan-Qurraie, D. J. Thomas, H. Trumbower, R. J. Weber, M. Weirauch, A. S. Zweig, D. Haussler, and W. J. Kent. The UCSC Genome Browser Database: update 2006. Nucleic Acids Research, 34(suppl 1):D590–D598, 2006.
[10] Tae-Kyung Kim, Martin Hemberg, Jesse M. Gray, Allen M. Costa, Daniel M. Bear, Jing Wu, David A. Harmin, Mike Laptewicz, Kellie Barbara-Haley, Scott Kuersten, Eirene Markenscoff-Papadimitriou, Dietmar Kuhl, Haruhiko Bito, Paul F. Worley, Gabriel Kreiman, and Michael E. Greenberg. Widespread transcription at neuronal activity-regulated enhancers. Nature, 465(7295):182–187, 2010.
[11] Raivo Kolde, Sven Laur, Priit Adler, and Jaak Vilo. Robust rank aggregation for gene list integration and meta-analysis. Bioinformatics, 28(4):573–580, 2012.
[12] Boris Lenhard, Albin Sandelin, and Piero Carninci. Metazoan promoters: emerging characteristics and insights into transcriptional regulation. Nature Reviews Genetics, 13(4):233–245, 2012.
[13] Mulin Jun Li, Panwen Wang, Xiaorong Liu, Ee Lyn Lim, Zhangyong Wang, Meredith Yeager, Maria P. Wong, Pak Chung Sham, Stephen J. Chanock, and Junwen Wang. GWASdb: a database for human genetic variants identified by genome-wide association studies. Nucleic Acids Research, 40(D1):D1047–D1054, 2012.
[14] S. T. Sherry, M.-H. Ward, M. Kholodov, J. Baker, L. Phan, E. M. Smigielski, and K. Sirotkin. dbSNP: the NCBI database of genetic variation. Nucleic Acids Research, 29(1):308–311, 2001.
[15] The International Consortium for Blood Pressure Genome-Wide Association Studies. Genetic variants in novel pathways influence blood pressure and cardiovascular disease risk. Nature, 478(7367):103–109, 2011.
[16] Gosia Trynka, Karen A. Hunt, Nicholas A. Bockett, Jihane Romanos, Vanisha Mistry, Agata Szperl, Sjoerd F. Bakker, Maria Teresa Bardella, Leena Bhaw-Rosun, Gemma Castillejo, Emilio G. de la Concha, Rodrigo Coutinho de Almeida, Kerith-Rae M. Dias, Cleo C. van Diemen, Patrick C. A. Dubois, Richard H. Duerr, Sarah Edkins, Lude Franke, Karin Fransen, Javier Gutierrez, Graham A. R. Heap, Barbara Hrdlickova, Sarah Hunt, Leticia Plaza Izurieta, Valentina Izzo, Leo A. B. Joosten, Cordelia Langford, Maria Cristina Mazzilli, Charles A. Mein, Vandana Midah, Mitja Mitrovic, Barbara Mora, Marinita Morelli, Sarah Nutland, Concepción Núñez, Suna Onengut-Gumuscu, Kerra Pearce, Mathieu Platteel, Isabel Polanco, Simon Potter, Carmen Ribes-Koninckx, Isis Ricaño-Ponce, Stephen S. Rich, Anna Rybak, José Luis Santiago, Sabyasachi Senapati, Ajit Sood, Hania Szajewska, Riccardo Troncone, Jezabel Varadé, Chris Wallace, Victorien M. Wolters, Alexandra Zhernakova, Spanish Consortium on the Genetics of Coeliac Disease (CEGEC), PreventCD Study Group, Wellcome Trust Case Control Consortium (WTCCC), B. K. Thelma, Bozena Cukrowska, Elena Urcelay, Jose Ramon Bilbao, M. Luisa Mearin, Donatella Barisani, Jeffrey C. Barrett, Vincent Plagnol, Panos Deloukas, Cisca Wijmenga, and David A. van Heel. Dense genotyping identifies and localizes multiple common and rare variant association signals in celiac disease. Nature Genetics, 43(12):1193–1201, 2011.
[17] Benjamin F. Voight, Hyun Min Kang, Jun Ding, Cameron D. Palmer, Carlo Sidore, Peter S. Chines, Noël P. Burtt, Christian Fuchsberger, Yanming Li, Jeanette Erdmann, Timothy M. Frayling, Iris M. Heid, Anne U. Jackson, Toby Johnson, Tuomas O. Kilpeläinen, Cecilia M. Lindgren, Andrew P. Morris, Inga Prokopenko, Joshua C. Randall, Richa Saxena, Nicole Soranzo, Elizabeth K. Speliotes, Tanya M. Teslovich, Eleanor Wheeler, Jared Maguire, Melissa Parkin, Simon Potter, N. William Rayner, Neil Robertson, Kathleen Stirrups, Wendy Winckler, Serena Sanna, Antonella Mulas, Ramaiah Nagaraja, Francesco Cucca, Inês Barroso, Panos Deloukas, Ruth J. F. Loos, Sekar Kathiresan, Patricia B. Munroe, Christopher Newton-Cheh, Arne Pfeufer, Nilesh J. Samani, Heribert Schunkert, Joel N. Hirschhorn, David Altshuler, Mark I. McCarthy, Gonçalo R. Abecasis, and Michael Boehnke. The Metabochip, a custom genotyping array for genetic studies of metabolic, cardiovascular, and anthropometric traits. PLoS Genetics, 8(8):e1002793, 2012.