Integrative tissue-specific functional annotations in the human genome provide novel insights on many complex traits and improve signal prioritization in genome wide association studies

Qiongshi Lu; Ryan L Powles; Qian Wang; Beixin J He; Hongyu Zhao

doi:10.1101/028464

Abstract

Extensive efforts have been made to understand genomic function through both experimental and computational approaches, yet proper annotation still remains challenging, especially in non-coding regions. In this manuscript, we introduce GenoSkyline, an unsupervised learning framework to predict tissue-specific functional regions through integrating high-throughput epigenetic annotations. GenoSkyline successfully identified a variety of non-coding regulatory machinery including enhancers, regulatory miRNA, and hypomethylated transposable elements in extensive case studies. Integrative analysis of GenoSkyline annotations and results from genome-wide association studies (GWAS) led to novel biological insights on the etiologies of a number of human complex traits. We also explored using tissue-specific functional annotations to prioritize GWAS signals and predict relevant tissue types for each risk locus. Brain and blood-specific annotations led to better prioritization performance for schizophrenia than standard GWAS p-values and non-tissue-specific annotations. As for coronary artery disease, heart-specific functional regions were highly enriched for GWAS signals, but previously identified risk loci were found to be most functional in other tissues, suggesting a substantial proportion of still undetected heart-related loci. In summary, GenoSkyline annotations can guide genetic studies at multiple resolutions and provide valuable insights in understanding complex diseases. GenoSkyline is available at http://genocanyon.med.yale.edu/GenoSkyline.

Introduction

Functionally annotating the human genome is a major goal in human genetics research. After years of community efforts, a variety of experimental and computational approaches have been developed and applied for genomic functional annotation. Comparative genomics studies have shown that approximately 4.5% of the human genome is conserved across mammals¹. Furthermore, the rich collection of epigenomic data generated by large consortia (e.g. ENCODE² and Epigenomics Roadmap Project³) also provides great insight for understanding the functional effects of the genome, especially in terms of non-coding regulatory machinery. To best utilize these rich data, we recently developed GenoCanyon⁴, a non-coding functional prediction approach based on integrative analysis of annotation data, whose performance was demonstrated through predicting well-studied regulatory DNA elements. GenoCanyon provides general predictions of non-coding functional regions in the human genome but does not fully utilize cell-type-specific information of epigenomic data. Incorporating cell-type-specific or tissue-specific information into annotation tools is essential not only for understanding the basic biology of the genome, but also for better characterizing genetic variation, as in the functional interpretation of risk loci identified from genome-wide association studies (GWAS).

GWAS has been a great success in the past decade, yet challenges still remain in both identifying additional risk variants and interpreting GWAS results. Current practice employs a significance threshold (i.e. 5×10⁻⁸) that controls family-wise error rate. Yet this approach is known to be underpowered when effect sizes are weak or moderate at risk loci⁵. Moreover, nearly 90% of the genome-wide significant hits in published GWAS are located in non-coding regions whose functional impact to human complex traits is largely unknown⁶. Complex linkage disequilibrium (LD) patterns also hinder our ability to identify real functional sites among correlated SNPs. Several methods have been proposed to integrate annotation data for better prioritizing GWAS signals and their effectiveness has also been well demonstrated^7-10. Tissue-specific functional annotations have the potential to bring even more biological insights to post-GWAS analysis and help understand complex disease etiology.

In this paper, we introduce GenoSkyline, a tissue-specific functional prediction tool based on integrated analysis of epigenomic annotation data. We demonstrate its ability to identify tissue-specific functionality through its rediscovery of a number of experimentally validated non-coding elements. Next, we show the valuable biological insights GenoSkyline can provide in post-GWAS analysis through integrative analysis of 15 human complex traits. We believe that GenoSkyline will prove to be a powerful tool for human genetics research because of its ability to assess tissue-specific enrichment of GWAS signals, better prioritize GWAS signals, and offer biological interpretations of risk loci.

Results

Predicting tissue-specific functional regions in the human genome

The posterior probability of being functional given the annotation data is used to measure the tissue-specific functional potential of each nucleotide in the human genome (Online Methods). It will be referred to as the GenoSkyline (GS) score in the following sections. We calculated GS scores for 7 human tissue types: brain, gastrointestinal tract (GI), lung, heart, blood, muscle, and epithelium (Supplementary Table 1). With a GS score cutoff of 0.5, 22.2% of the human genome is predicted to be functional in at least one of these tissue types, while 1.7% is functional in all 7 tissues (Figure 1a). Since the GS score has a bimodal pattern, these results are not sensitive to cutoff choice (Supplementary Notes).
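As a minimal sketch of how the two summary percentages above could be tallied, the snippet below thresholds a small table of per-nucleotide GS scores and counts positions that are functional in at least one tissue and in every tissue. The function name, the toy scores, and the use of a strict "greater than" comparison at the cutoff are illustrative assumptions, not details taken from the GenoSkyline software.

```python
def functional_fractions(scores, cutoff=0.5):
    """Return (fraction functional in >= 1 tissue, fraction functional in all tissues).

    scores: list of rows, one row per nucleotide, one GS score per tissue.
    A position counts as functional in a tissue when its score exceeds the
    cutoff (strict inequality is an assumption made for this sketch).
    """
    n_positions = len(scores)
    n_tissues = len(scores[0])
    # Number of tissues in which each position passes the cutoff.
    counts = [sum(s > cutoff for s in row) for row in scores]
    any_frac = sum(c >= 1 for c in counts) / n_positions
    all_frac = sum(c == n_tissues for c in counts) / n_positions
    return any_frac, all_frac


# Toy example: 4 positions scored in 3 hypothetical tissues.
toy_scores = [
    [0.9, 0.8, 0.7],  # functional in all three tissues
    [0.1, 0.6, 0.2],  # functional in one tissue
    [0.0, 0.1, 0.3],  # not functional
    [0.2, 0.0, 0.1],  # not functional
]
any_frac, all_frac = functional_fractions(toy_scores)
print(any_frac, all_frac)
```

Because GS scores are reported to be bimodal, most values would sit near 0 or 1, which is why moving the cutoff should change these fractions very little.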

Figure 1.

General characteristics of GenoSkyline annotations. (a) Number of tissues in which nucleotides are functional. (b) Proportion of functional genome for each tissue type. (c) Overlap of functional regions across seven tissue types. The scale is log odds ratio.

Across tissue types, the percentage of predicted functional genome ranges from 5.4% (Lung) to 9.7% (GI) (Figure 1b and Supplementary Table 2). The overlap between heart-specific and muscle-specific functional regions is the largest among all pairs of tissues. Interestingly, although the percentage of functional genome in blood (8.4%) is similar to other tissue types, it overlaps less with the functional regions in other tissues (Figure 1c). This is consistent with the recent discovery that blood has the lowest levels of eQTL sharing with other tissues¹¹.
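The pairwise overlap in Figure 1c is reported on a log odds ratio scale. A minimal sketch of that quantity for two binary functional tracks is given below; the 0.5 pseudocount added to each cell of the 2×2 table (a Haldane-Anscombe style correction) and the toy tracks are assumptions made here for illustration, since the text does not say how empty cells were handled.

```python
import math


def log_odds_ratio(track_a, track_b, pseudo=0.5):
    """Log odds ratio of co-occurrence for two equal-length 0/1 tracks."""
    n11 = n10 = n01 = n00 = 0
    for a, b in zip(track_a, track_b):
        if a and b:
            n11 += 1      # functional in both tissues
        elif a:
            n10 += 1      # functional only in the first tissue
        elif b:
            n01 += 1      # functional only in the second tissue
        else:
            n00 += 1      # functional in neither
    # The pseudocount keeps the ratio finite when a cell is empty.
    return math.log(((n11 + pseudo) * (n00 + pseudo)) /
                    ((n10 + pseudo) * (n01 + pseudo)))


# Hypothetical tracks over 8 positions.
heart = [1, 1, 0, 0, 1, 0, 0, 0]
muscle = [1, 1, 0, 0, 0, 0, 0, 1]
blood = [0, 0, 1, 1, 0, 0, 0, 0]

lor_heart_muscle = log_odds_ratio(heart, muscle)
lor_heart_blood = log_odds_ratio(heart, blood)
print(lor_heart_muscle, lor_heart_blood)
```

A positive value indicates that two tissues share functional positions more often than expected under independence, and a negative value indicates less sharing.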

Investigating the performance of tissue-specific functional annotations

Beta-globin gene complex

We now demonstrate GenoSkyline’s ability to predict tissue-specific functionality using a variety of experimentally validated functional machinery. The beta-globin (HBB) gene complex is an extensively studied genomic region on chromosome 11, containing 6 genes and 23 cis-regulatory modules (CRMs) that are known to control both the timing and the spatial pattern of gene expression^12,13. We compared GS scores for different tissue types in this region. Not surprisingly, blood-specific functionality was observed (Figure 2a). Among the 6 genes in this region, adult globin genes HBB and HBD, as well as pseudogene HBBP1, are captured well by blood-specific GS scores (Supplementary Table 3). However, embryonically expressed HBE1, fetally expressed HBG1 and HBG2, and the CRMs that regulate these genes have lower GS scores. This is possibly because 18 of the 24 cell lines used for developing blood-specific GS scores were acquired from adult samples (Supplementary Table 1). The mean blood-specific GS score in these genes increases from 0.388 to 0.704 after removing HBG1, HBG2, and HBE1. Similarly, a substantial boost in mean GS score is observed after removing CRMs regulating the embryonic and fetal globin genes (Figure 2b, Supplementary Figure 1). Compared with GenoCanyon, GenoSkyline provides less sensitive but highly specific functional predictions. Its ability to identify tissue-specific functional coding and non-coding DNA elements has the potential to benefit diverse types of biological studies.

Figure 2.

Case studies of HBB gene complex, in vivo enhancers, and regulatory miRNAs. (a) Comparison of GenoCanyon prediction and GenoSkyline scores for seven tissues in HBB gene complex region. Red boxes mark the locations of CRMs. The number of red boxes is less than 23 because some CRMs are next to each other. (b) Mean blood-specific GS score for different region categories. (c) Boxplot of mean GS scores for enhancers in CNS, heart, and blood vessel. (d) Boxplot of mean GS scores for 11 human-accelerated elements near NPAS3. (e) Boxplot of mean GS scores for tissue-specific regulatory miRNAs.

Tissue-specific enhancers

In vivo enhancers with tissue-specific activity in central nervous system (CNS; n=585), heart (n=96), and blood vessel (n=9) were downloaded from the VISTA Enhancer Browser¹⁴ (Online Methods). Mean GS scores for brain, heart, and blood tissues were calculated for each enhancer. Brain-specific and heart-specific GS scores were substantially higher in their respective enhancer categories compared to GS scores of non-relevant tissue types. Additionally, the mean blood-specific GS score also stands out for enhancers with observed activity in blood vessel despite the limited sample size (Figure 2c). In a separate study, 11 human-accelerated elements near the brain developmental transcription factor NPAS3 have been identified to act as tissue-specific enhancers within the nervous system¹⁵. Brain-specific GS scores for these enhancers are substantially higher than those for other tissue types (Figure 2d), consistent with previous results.
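The per-enhancer mean GS scores used in these comparisons amount to averaging a per-nucleotide score track over each annotated interval. The sketch below does this with a prefix sum so that many intervals can be scored in one pass; the 0-based half-open coordinate convention, the function name, and the toy track are assumptions for illustration only.

```python
from itertools import accumulate


def mean_scores(gs_track, intervals):
    """Mean GS score for each (start, end) interval, 0-based and half-open."""
    # prefix[i] holds the sum of gs_track[0:i], so any interval sum is a difference.
    prefix = [0.0] + list(accumulate(gs_track))
    return [(prefix[end] - prefix[start]) / (end - start)
            for start, end in intervals]


# Hypothetical score track for one tissue and two hypothetical elements.
gs_track = [0.0, 1.0, 1.0, 0.0, 0.5, 0.5]
elements = [(1, 3), (3, 6)]
means = mean_scores(gs_track, elements)
print(means)
```

Repeating the call with the score track of each tissue gives the per-tissue distributions that a boxplot such as Figure 2c summarizes.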

Regulatory miRNAs

Next, we test if GenoSkyline could also capture miRNAs expressed exclusively in certain tissue types. Liang et al. studied the tissue specific expression pattern of eight groups of miRNAs¹⁶. We extracted and annotated four groups (groups I, II, III+IVa, and V from Liang et al.) that could be represented by the currently available tissue types in GenoSkyline annotations. These four groups of miRNAs were found to be expressed preferentially in skeletal/cardiac muscle, organs lined with epithelium, brain/peripheral blood mononuclear cell (PBMC), and heart, respectively through unsupervised clustering. The most relevant tissue types suggested by GenoSkyline for these four groups are muscle/heart, GI/epithelium, brain, and heart, respectively (Figure 2e). Our results based on integrative analysis of epigenomic data are consistent with the tissue-specific expression pattern reported by Liang et al.

Inter-genic regulation of myosin heavy chain

We applied GenoSkyline to a validated biologic switch in cardiac development and disease. Myosin heavy chain (MHC) is the major contractile protein in human striated muscle¹⁷. Cardiac muscle cells primarily express two isoforms, alpha-MHC (MYH6) and beta-MHC (MYH7)¹⁸. The ratio of alpha-to-beta isoforms determines cardiac contractility and allows for effective response to a wide range of physiologic and pathologic stimuli¹⁹. Alpha-to-beta ratio decreases in cardiac diseased states^20,21, and reversal of this shift is associated with better clinical outcomes²². miRNAs can regulate alpha-to-beta isoform shift, and prior studies in rodents have outlined a network of crosstalk between intronically expressed miRNAs and their host muscle genes^23,24. For instance, mir-208a, on an intron of MYH6, is a positive regulator of beta-MHC by targeting transcription factors that repress its expression²⁴. GS scores for MYH6 and mir-208a accurately reflect their cardiac-specific expression, whereas MYH7 and mir-208b exhibit strong signals in both skeletal and cardiac tissue (Figure 3a and Supplementary Table 4). This corresponds to known expression pattern of MYH7 and mir-208b in slow twitch skeletal muscle fibers¹⁷ as well as heart. We also explored tissue-specific functionality of two known distal enhancers of mir-208b identified on VISTA Enhancer Browser, hs2330 and hs1670. GS scores for hs2330 mirror MYH7/mir-208b signals. Interestingly, GS scores for hs1670, a distal enhancer flanking mir-208b, are also strong in nervous and GI tissue, a finding that agrees with its observed expression pattern in other tissues (based on VISTA Enhancer Browser data). Collectively, these results show that GenoSkyline can replicate the tissue-specific expression pattern of a complex inter-gene regulatory network.

Figure 3.

Case studies of MHC, ZRS, and hypomethylated TEs. (a) GenoSkyline scores for seven tissues in the genomic region surrounding MYH6 and MYH7. (b) ESC-specific and fetal-cell-specific GS scores for the 5th intron of LMBR1. The red box marks the location of ZRS. (c) Bar plot of the mean GS scores for the 5th intron of LMBR1 and ZRS across nine tissue and cell types. (d) Boxplot of mean GS scores for four groups of hypomethylated TEs.

Zone of polarizing activity regulatory sequence

GenoSkyline can also be generalized to identify tissue specificity outside of the 7 core categories discussed here, based on available experimental data. For example, Zone of polarizing activity regulatory sequence (ZRS), a well-studied developmental enhancer, is located in the fifth intron of LMBR1 gene. Acting as an enhancer of SHH, ZRS has been shown to play a crucial role in limb development²⁵. However, none of the seven tissue types in GenoSkyline suggest ZRS’s functionality (Supplementary Figure 2). In order to see if ZRS could be identified using epigenomic data of other cell types, we extended GenoSkyline to two new groups of cells that are potentially important for development, embryonic stem cells (ESC) and fetal cells (Supplementary Table 5). Both ESC and fetal-cell-specific GS scores successfully identified ZRS with high resolution (Figures 3b and 3c). This example shows that GenoSkyline is a flexible framework. Researchers could develop their own cell-group-specific functional annotations if ChIP-seq data are available for the cells of interest.

Hypomethylated transposable elements

A recent study of genome-wide DNA methylation status identified tissue-specific hypomethylated transposable elements (TE) exhibiting enhancer activities²⁶. We downloaded four groups of TEs that are hypomethylated in ESC H1, fetal brain/primary neural progenitor cells, adult breast epithelial cells, and PBMC/adult immune cells, respectively (Online Methods). Although DNA methylation data were not used for developing GenoSkyline, we were still able to provide highly consistent results, suggesting tissue-specific functionality of these TEs in ESC, brain, epithelium, and blood cell, respectively (Figure 3d).

Analyzing tissue-specific enrichment for 15 human complex traits

In the sections above, we demonstrated GenoSkyline’s ability to identify tissue-specific functional regions in the human genome. Next, we focus on how GenoSkyline could help us understand human complex traits. Finucane et al. recently proposed using LD score regression to partition heritability of complex traits by functional categories²⁷. We applied LD score regression on 15 human complex diseases and traits (Supplementary Table 6), and calculated the tissue-specific enrichments using GenoSkyline annotations (Online Methods).

Our analysis successfully replicated some well-known findings and also provided novel insights to these complex traits (Figure 4; Supplementary Figure 3). For schizophrenia, enrichment in brain is much stronger than in any other tissue type (p = 6.52×10⁻²⁶), while highly significant enrichment could be observed in heart (p = 2.30×10⁻⁷) and blood (p = 1.65×10⁻⁵) as well. Brain is also the most enriched tissue for anorexia nervosa (p = 4.86×10⁻²) despite the substantially weaker signal. For three autoimmune diseases (Crohn’s disease, ulcerative colitis, and rheumatoid arthritis), the strongest enrichment was in blood. However, solid enrichment in GI could also be observed for both Crohn’s disease and ulcerative colitis, but not rheumatoid arthritis. Sex-stratified summary statistics were available for two anthropometric traits - body mass index (BMI) and waist-hip ratio (WHR) adjusted for BMI^28,29. Therefore, we performed gender-specific analyses for these two traits. Consistent with recently published results²⁷, brain possesses the strongest enrichment for BMI. Interestingly, the enrichment in brain is stronger in female samples (p = 1.21×10⁻⁸) than in male samples (p = 2.39×10⁻⁶), while epithelial tissue may play a more important functional role in male samples (p = 1.34×10⁻³ in males and 2.83×10⁻² in females). Some patterns of gender-specific enrichment were also observed for WHR. GI is the dominant tissue for females (p = 5.26×10⁻⁵) but seems less important in male samples (p = 2.95×10⁻²), while enrichment in muscle is consistent between males and females.

Figure 4.

Tissue-specific enrichment of GWAS signals. Enrichment p-values were calculated using LD score regression. The grey line is the 0.05 cutoff for p-value.

It is worth noting that extra caution is needed when interpreting these enrichment results. For example, Finucane et al. reported connective/bone as the most enriched tissue type for human height²⁷, but GenoSkyline annotations for this tissue are not available at this moment due to incomplete epigenomic data (Online Methods). Similarly, we are not yet able to investigate the relationship between lipid traits and liver tissue because of the lack of tissue-relevant functionality data.

GWAS signal prioritization using tissue-specific functional annotations

We recently developed Genome Wide Association Prioritizer (GenoWAP), and showed that GWAS signals could be better prioritized through integrating GWAS summary statistics with GenoCanyon annotation¹⁰. From the results of tissue-specific enrichment analysis, it could be seen that some complex traits are strongly related to a few tissue types. In this section, we show that the performance of GWAS signal prioritization could be further improved through integrating GenoSkyline annotations of relevant tissue types.

Using both tissue-specific GS scores and GenoCanyon scores that quantify the overall functionality, we calculate the posterior probability P(Z_D = 1, Z_T = 1|p) to measure the importance of each SNP. In this calculation, Z_D is the indicator of disease/trait-specific functionality, Z_T is the indicator of tissue-specific functionality, and p is the p-value acquired from standard GWAS analysis (Online Methods). Psychiatric Genomics Consortium (PGC) has published two large GWAS meta-analyses for schizophrenia, a major psychiatric disorder. We applied our method to the smaller study³⁰ and attempted to replicate the findings of the larger study³¹. This analysis demonstrates GenoSkyline’s ability to prioritize association signals that are more likely to be replicated in a larger sample. These two studies will be referred to as PGC2011 and PGC2014 studies in the following discussion.

Enrichment analysis suggests that brain is the tissue most enriched for schizophrenia GWAS signals, and strong enrichment could also be observed in heart and blood (Figure 4). For each SNP in the PGC2011 study, the mean GenoCanyon score of its surrounding region and the mean GS scores of brain, blood, and heart tissues were calculated (Online Methods). SNPs in these tissue-specific functional regions and the SNPs in general functional regions are all enriched for associations with schizophrenia (Figure 5a; Supplementary Figure 4). Notably, tissue-specific functional regions are more enriched for associations with schizophrenia relative to general functional regions, with blood showing the strongest enrichment. It is also worth noting that non-functional regions are enriched for GWAS associations as well, most likely due to the LD between functional and non-functional SNPs¹⁰.

Figure 5.

Prioritizing schizophrenia GWAS signals using GenoSkyline annotations. (a) Tissue-specific functional regions are more enriched for schizophrenia associations than generally functional regions and non-functional regions. (b) Enrichment of GTEx whole-blood eQTLs in top SNPs from PGC2011 study. (c) Enrichment of human brain quantitative trait loci in top SNPs from PGC2011 study. (d) Summary statistics at the schizophrenia-associated locus on chromosome 8q21 near MMP16 gene. The top and middle panels show p-values from PGC2011 and PGC2014 studies, respectively. The bottom panel shows GenoSkyline annotations at this locus. (e) Locus plots for tissue-specific posterior scores. From top to bottom, the three panels show posterior scores of brain, heart, and blood tissues, respectively.

Next, we define a new SNP-level metric for the tissue-specific GenoSkyline posterior (GSP) scores (i.e. P(Z_D = 1, Z_T = 1|p)) of brain, blood, and heart, as well as the nonspecific functionality posterior (NSFP) scores (i.e. P(Z_D = 1|p); see Online Methods) for each SNP in PGC2011 study. Enrichment analysis using GTEx whole-blood eQTLs¹¹ found that the top SNPs based on tissue-specific GSP scores are substantially more enriched for eQTLs than those based on NSFP scores and p-values. As expected, blood GSP scores showed the strongest enrichment of whole-blood eQTLs (Figure 5b). When using a set of quantitative trait loci in human brain³², tissue-specific GSP scores also showed superior performance, with the brain-specific scores dominating others as the number of top SNPs increases (Figure 5c and Supplementary Figure 5).

A total of 108 schizophrenia-associated loci were identified in the PGC2014 study. We removed three loci on chromosome X due to the absence of SNPs on sex chromosomes in the PGC2011 dataset. All the SNPs in the PGC2011 study were ranked based on their p-values, NSFP scores, and tissue-specific GSP scores, respectively (Supplementary Table 7). The maximum ranks at each of the 105 schizophrenia-associated loci based on these different criteria were then compared (Supplementary Table 8). Brain GSP score showed better performance in prioritizing these loci when compared with p-value. Sixty-seven out of 105 loci had an increased rank (p-value=0.003, one-sided binomial test). The performance of heart GSP score was slightly worse than brain-specific score, but still better than p-value ranking. Blood GSP score showed comparable performance with p-value ranking. Notably, the performance of brain and heart GSP scores was still significantly better than NSFP score, although NSFP score outperforms ranking based on p-value.
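The one-sided binomial test quoted above can be reproduced with a short exact calculation. The sketch assumes the null hypothesis that each locus is equally likely to rank higher or lower under the GSP score than under the p-value (success probability 0.5); that null is an assumption made here because the text does not state it, although it is consistent with the reported value. The function name is illustrative.

```python
from math import comb


def binom_upper_tail(k, n):
    """Exact P(X >= k) for X ~ Binomial(n, 0.5)."""
    # Sum the upper-tail binomial coefficients as integers, then divide once.
    return sum(comb(n, i) for i in range(k, n + 1)) / 2 ** n


# 67 of 105 loci had an increased rank under the brain GSP score.
p_value = binom_upper_tail(67, 105)
print(p_value)
```

Under this null, the tail probability is on the order of 0.003, in line with the figure reported in the text.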

Tissue-specific functional annotations could provide even deeper insight when prioritizing SNPs locally at risk loci. The schizophrenia-associated locus on chromosome 8q21 is located in the intergenic region upstream of the MMP16 gene (Figure 5d). The p-values in the PGC2014 study clearly suggested two signal peaks. One is located near the transcription start site of MMP16, while the other resides nearly 200,000 bases upstream and shows a slightly stronger signal. However, the two-peak pattern was not clear in the PGC2011 study. Instead, two SNPs close to the end of the LD block near 89.8 Mb showed the strongest signal. We compared the local predictions based on brain, heart, and blood-specific GSP scores at this locus (Figure 5e). Brain GSP scores successfully revealed the multi-peak nature of this locus, suggested the importance of the peak near 89.6 Mb, and diminished the signal strength at the two SNPs near 89.8 Mb, consistent with the PGC2014 results. Although the method was applied on PGC2011 p-values, the results after prioritization matched the signal pattern in the PGC2014 study very well. Heart GSP scores also suggested the existence of the signal peak near 89.6 Mb. However, the posterior scores have lower values, and the overall signal pattern does not match the PGC2014 study very well. The signal peak near 89.6 Mb was completely lost in the blood-specific results. The two SNPs near 89.8 Mb, however, had large GSP scores. The differences across tissue types are consistent with GS scores at this locus (Figure 5d). Upstream of MMP16, near 89.6 Mb, several functional segments can be seen in brain, only one remains in heart, and none exists in blood. Through comparing the tissue-specific prioritization results with the p-values in the PGC2014 study, we see that brain-specific GSP scores had the strongest signal strength, which can be quantified using the local maximum GSP score (Online Methods). The highly matched signal pattern also suggested that brain might be the tissue type in which this locus plays a functional role.

Further insight on risk loci associated with coronary artery disease

Next, we applied our method to another GWAS to further illustrate the biological insight that GenoSkyline can provide for understanding complex diseases. The CARDIoGRAM consortium published a large-scale GWAS meta-analysis of coronary artery disease (CAD) comprising 22,333 cases and 64,762 controls³³, in which they replicated 10 out of 12 previously reported risk loci and identified 13 new loci associated with CAD. We applied our method on the summary statistics and used the local maximum GSP score to measure the relatedness between each risk locus and different tissue types (Online Methods). We removed the locus on chromosome 1q41 (MIA3) and the locus on chromosome 6q25.3 (LPA) due to incomplete data in the meta-analysis stage of CARDIoGRAM study. The remaining 21 CAD-associated loci are summarized in Table 1 and Supplementary Table 9.

Table 1.

Risk loci for coronary artery disease

The first impression of these results is that despite the strong overall enrichment of GWAS signals (Figure 4), heart is the most relevant tissue type for only two loci. In contrast, a substantial proportion of risk loci (9 out of 21) seem to be functional in the GI tissue. Interestingly, GI was the most enriched tissue type for several known risk factors for CAD including LDL and total cholesterol (Figure 4). These results suggest not only larger effect sizes of CAD-associated loci in the gastrointestinal system, but also a substantial number of undetected heart-related loci. Furthermore, brain was the least enriched tissue type for CAD GWAS signals, but the risk locus on chromosome 14q32.2 near HHIPL1 and CYP46A1 was predicted to be functional in brain. In fact, the CYP46A1 gene encodes cholesterol 24-hydroxylase, which is present mainly in brain, where it converts cholesterol from degraded neurons into 24S-hydroxycholesterol^33,34. This process is crucial for eliminating cholesterol from the brain since cholesterol is usually unable to pass the blood-brain barrier³⁵.

A larger GWAS for CAD was published during the preparation of this manuscript³⁶. This large study may be used to validate the performance of our approach when its summary statistics become publicly available in the future.

Discussion

In this paper, we introduced GenoSkyline, an integrative framework for predicting tissue-specific functional regions in the human genome. Through integrating GenoSkyline annotations with GWAS summary statistics, we illustrated a variety of ways that GenoSkyline could help researchers understand human complex diseases and traits. We also showed that the GenoSkyline framework is customizable so that researchers can develop their own functional annotations for a selected group of cells. As epigenomic ChIP-seq data become available for an increasing number of cell types in the future, GenoSkyline’s ability to facilitate studies of complex disease will be further enhanced.

Our approach is not without limitations. First, the annotation results are incomplete due to currently unavailable tissue types, and as a result, the GWAS enrichment results may not be comprehensive (e.g. liver may also be highly related to CAD, but there is no complete annotation data from liver yet). Second, some risk loci (or independent functional segments at the same locus) may play active roles in multiple tissue types. For example, in our PGC GWAS analysis, although local maximum GSP scores suggest that brain may be more relevant to the risk locus upstream of MMP16, two SNPs near 89.8 Mb are located near several functional segments in blood. Whether these SNPs can be functionally linked to schizophrenia remains to be investigated. Moreover, we emphasize that our method identifies regions of likely functionality, but does not provide conclusive proof of functionality for any individual SNP or locus. That said, our method still provides a simple and intuitive summary statistic that measures the relatedness between risk loci and sets of functionally related tissues. It has great potential to become a standard step in downstream GWAS analysis to help researchers generate new hypotheses regarding the etiology behind each risk locus.

The increasing accessibility of GWAS summary statistic datasets, coupled with the method’s independence from individual-level genotype and phenotype data, makes GenoSkyline tissue-specific prioritization useful and easy to implement. Moreover, GWAS signal integration is just one way to utilize GenoSkyline annotations. Its nucleotide-level functional prediction based on unsupervised learning and the good predictive performance in non-coding regions promise a potential role in many fields of genomics, such as next-generation sequencing studies and understanding somatic mutations. GenoSkyline scores of seven tissue types and two additional cell types have been pre-calculated for the entire human genome and can be readily downloaded. Source code is available for all major OSes and can be accessed at http://genocanyon.med.yale.edu/GenoSkyline. We believe that GenoSkyline and its applications can guide genetics research at multiple resolutions and greatly benefit the broader scientific community.

Online Methods

Consolidated epigenomes

Epigenetic data were selected from the Epigenomics Roadmap Project’s 111 consolidated reference epigenomes database³ (http://egg2.wustl.edu/roadmap/) based on anatomy type and mark availability. Each tissue type is a clustering of relevant samples chosen to contain at least one of each of the following: H3K4me1, H3K4me3, H3K36me3, H3K27me3, H3K9me3, H3K27ac, H3K9ac, and DNase I hypersensitivity. Samples are reduced to a per-nucleotide binary encoding of presence or absence of narrow contiguous regions of ChIP-seq signal enrichment compared to input (Poisson p-value threshold of 0.01), and a union of all tissue-specific samples for that mark is taken. The set of 8 marks was chosen due to the well-understood, localized regulatory interactions of histone marks³⁷ and DNase I³⁸. We created nine unique tissue and cell type clusters based on these annotations (Supplementary Table 1) to represent common, physiologically related organ systems. To reflect actual tissue-specific epigenetic behavior, a majority of samples chosen are primary tissues and cultures, and inclusion of immortalized cell lines has been kept to a minimum.
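The encoding step described above, turning each sample's enriched regions into a per-nucleotide 0/1 track and then taking the union across the samples of one tissue for a given mark, can be sketched as follows. The interval convention (0-based, half-open), the function names, and the toy peaks are assumptions for illustration; the Poisson p-value thresholding is assumed to have been applied upstream when the peaks were called.

```python
def encode_peaks(peaks, length):
    """Binary track of the given length: 1 inside any (start, end) peak, else 0."""
    track = [0] * length
    for start, end in peaks:
        # Clamp to the track so out-of-range peaks do not raise or wrap around.
        for pos in range(max(start, 0), min(end, length)):
            track[pos] = 1
    return track


def union_tracks(tracks):
    """Per-nucleotide union: 1 wherever at least one sample has a 1."""
    return [int(any(column)) for column in zip(*tracks)]


# Two hypothetical samples of the same tissue, profiled for the same mark.
sample_a = encode_peaks([(2, 5)], length=10)
sample_b = encode_peaks([(4, 7)], length=10)
tissue_track = union_tracks([sample_a, sample_b])
print(tissue_track)
```

Repeating this for each of the 8 marks gives one binary vector per mark and per tissue, which is the form of input that the mixture model in the next section takes.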

GenoSkyline model and estimation

Lu et al. previously proposed a method that applies unsupervised-learning techniques on genomic annotations to predict the functional potential of a genomic region⁴. Given a set of annotations A, we assume the joint distribution of A along the genome to be a mixture of annotations at locations with no functionality, i.e. f(A | Z = 0), and annotations at locations that are functional, i.e. f(A | Z = 1). We assume that each annotation in A is conditionally independent given Z, allowing the conditional joint density of A given Z to be factorized as

Since all annotations used are binary classifiers, the Bernoulli distribution was used to model the marginal functional likelihood given each individual annotation.

Assuming a prior probability π of being functional (π = P(Z = 1)), we can estimate the parameter p_ic of each annotation with the Expectation-Maximization (EM) algorithm, and calculate the posterior probability at a given genomic coordinate, referred to as the GS score.

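We must estimate 17 parameters for each tissue tract: the prior π and, for each of the 8 annotations, the two Bernoulli parameters p_i0 and p_i1, where p_ic = P(A_i = 1 | Z = c).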

Parameters were estimated using the GWAS Catalog⁶, downloaded from the NHGRI website (http://www.genome.gov/gwastudies/), which at the time of download contained 13,070 unique SNPs found to be significant in at least one published GWAS. These SNPs were expanded into 1 kb intervals, forming a genome sample covering 12,801,840 bp. While significant SNP associations are likely to tag the effects of nearby functional elements, the size and distance of these functional elements vary from SNP to SNP. As a result, the total sample serves as an effective and robust representation of both functional and non-functional regions along the genome (Supplementary Notes).
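The reported coverage (12,801,840 bp) is somewhat below 13,070 × 1,000 = 13,070,000 bp, which is what one would expect if the windows around nearby SNPs overlap and each base is counted once. A minimal sketch of that bookkeeping follows; centering each window on the SNP, the function name `merged_coverage`, and the toy positions are assumptions, since the text does not specify how the intervals were placed.

```python
from typing import Dict, List


def merged_coverage(positions_by_chrom: Dict[str, List[int]], width: int = 1000) -> int:
    """Total base pairs covered by windows of `width` bp centered on each SNP,
    counting overlapping windows on the same chromosome only once."""
    half = width // 2
    total = 0
    for positions in positions_by_chrom.values():
        cur_start = cur_end = None
        for pos in sorted(positions):
            start, end = max(pos - half, 0), pos + half  # half-open [start, end)
            if cur_end is not None and start <= cur_end:
                cur_end = max(cur_end, end)  # extend the current merged interval
            else:
                if cur_end is not None:
                    total += cur_end - cur_start
                cur_start, cur_end = start, end
        if cur_end is not None:
            total += cur_end - cur_start
    return total


# Hypothetical SNP positions: two nearby SNPs on chr1 share part of their windows.
toy_snps = {"chr1": [10_000, 10_300, 50_000], "chr2": [700]}
covered = merged_coverage(toy_snps)
print(covered)
```

In the toy input, four SNPs would give 4,000 bp if windows were disjoint, but the two neighbouring SNPs on chr1 overlap by 700 bp.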

Case studies of experimentally validated functional machinery

VISTA enhancers¹⁴ were downloaded from the VISTA Enhancer Browser (http://enhancer.lbl.gov/), and enhancers with E11.5 reporter staining data were selected. Brain enhancers were selected based on staining results identifying any CNS-related tissue (neural tube, cranial nerve, hindbrain, mesenchyme derived from neural crest, trigeminal V, forebrain, and midbrain). Heart enhancers were those with positive reporter results in the heart region of E11.5 mouse reporter assays. Blood vessel enhancers were identified by selecting for the “blood vessels” expression pattern. Hypomethylated TE loci in H1ES, brain, breast, and blood were downloaded from http://epigenome.wustl.edu/TE_Methylation/. All genomic coordinates were converted to genome build hg19.

SNP prioritization using tissue-specific functional annotation

We identify three disjoint cases for a given GWAS SNP:

Case I: The SNP is in a genomic region that is functional in the given tissue and relevant to the given phenotype (Z_D = 1, Z_T = 1).
Case II: The SNP is in a genomic region that is functional in the given tissue but not relevant to the given phenotype (Z_D = 0, Z_T = 1).
Case III: The SNP is in a genomic region that is not functional in the given tissue (Z_T = 0).

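A useful metric for prioritizing SNPs is the conditional probability that the SNP falls under case I given its p-value in a given GWAS, i.e. P(Z_D = 1, Z_T = 1 | p). We can calculate this probability by applying Bayes' formula and considering all three cases as follows:

$$P(Z_D = 1, Z_T = 1 \mid p) = \frac{f(p \mid Z_D = 1, Z_T = 1)\, P(Z_D = 1, Z_T = 1)}{f(p \mid Z_D = 1, Z_T = 1)\, P(Z_D = 1, Z_T = 1) + f(p \mid Z_D = 0, Z_T = 1)\, P(Z_D = 0, Z_T = 1) + f(p \mid Z_T = 0)\, P(Z_T = 0)} \tag{4}$$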

First, the case in which Z_T = 0 can be handled directly by assigning each SNP a prior probability of tissue-specific functionality (i.e. P(Z_T = 1)), defined as the average GS score of its surrounding 10,000 base pairs for that tissue (Supplementary Notes). We partition all SNPs into two subgroups based on a mean GS score threshold of 0.1; because these probabilities follow a bimodal distribution, the partition is not sensitive to the choice of threshold¹⁰. We can then use these partitions to directly estimate f(p|Z_T = 0) by applying density estimation techniques to the SNP subgroup with low GS scores. More specifically, we use a histogram for density estimation and choose the optimal number of bins by cross-validation.
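A minimal sketch of this step is given below, assuming the mean GS score of each SNP has already been computed. The text does not state which cross-validation scheme was used; the sketch assumes the standard leave-one-out risk estimate for histograms on [0, 1], and the function names, the candidate range of 1 to 50 bins, and the simulated SNPs are illustrative assumptions.

```python
import random
from typing import Callable, List, Sequence, Tuple

GS_THRESHOLD = 0.1  # mean GS score cutoff stated in the text


def bin_counts(pvalues: Sequence[float], n_bins: int) -> List[int]:
    """Counts of p-values in n_bins equal-width bins on [0, 1]."""
    counts = [0] * n_bins
    for p in pvalues:
        counts[min(int(p * n_bins), n_bins - 1)] += 1
    return counts


def loo_cv_risk(pvalues: Sequence[float], n_bins: int) -> float:
    """Leave-one-out cross-validation risk estimate for a histogram (assumed scheme)."""
    n = len(pvalues)
    width = 1.0 / n_bins
    sum_sq = sum((c / n) ** 2 for c in bin_counts(pvalues, n_bins))
    return 2.0 / ((n - 1) * width) - (n + 1) / ((n - 1) * width) * sum_sq


def fit_histogram_density(pvalues: Sequence[float],
                          max_bins: int = 50) -> Tuple[int, Callable[[float], float]]:
    """Pick the bin count with the lowest CV risk and return the fitted density."""
    n_bins = min(range(1, max_bins + 1), key=lambda m: loo_cv_risk(pvalues, m))
    n = len(pvalues)
    heights = [c * n_bins / n for c in bin_counts(pvalues, n_bins)]

    def density(p: float) -> float:
        return heights[min(int(p * n_bins), n_bins - 1)]

    return n_bins, density


# Toy SNPs as (p-value, mean GS score in the surrounding window); not real data.
random.seed(1)
snps = [(random.random(), random.random() ** 4) for _ in range(2000)]
low_gs = [p for p, mean_gs in snps if mean_gs < GS_THRESHOLD]
n_bins, f_null_t0 = fit_histogram_density(low_gs)
print(n_bins, f_null_t0(0.5))
```

The returned function plays the role of the estimate of f(p|Z_T = 0); the same routine could be applied to the low-scoring subgroup under a different annotation.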

Second, we estimate the p-value density of our second case, where Z_D = 0 and Z_T = 1. We can intuitively assume that SNPs that are functional in a tissue but not relevant to the phenotype will have similar p-value behavior to all other SNPs that are not relevant to the phenotype, which in turn behave similarly to SNPs that are not functional at all (Supplementary Notes). More formally, we can describe this relationship as follows:

$$f(p \mid Z_D = 0, Z_T = 1) \approx f(p \mid Z_D = 0) \approx f(p \mid Z = 0) \tag{5}$$

We can effectively estimate f(p|Z = 0) by using a similar approach to estimating f(p|Z_T = 0), but partitioning SNPs using the general functionality GenoCanyon score instead of tissue-specific GS score.

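Next, we consider the following decomposition of the joint prior probabilities.

$$P(Z_D = d, Z_T = 1) = P(Z_D = d \mid Z_T = 1)\, P(Z_T = 1), \quad d = 0, 1$$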

The prior probability P(Z_T = 1) can be calculated directly from GS scores as stated above, but the conditional probabilities of disease-specific functionality given tissue-specific functionality remain to be estimated.

Finally, we estimate all the remaining terms in equation 4 using the EM algorithm. In the first step of the estimation procedure, we take the subset of SNPs located in tissue-specific functional regions. The p-value distribution of these SNPs is the following mixture.

$$f(p \mid Z_T = 1) = P(Z_D = 1 \mid Z_T = 1)\, f(p \mid Z_D = 1, Z_T = 1) + P(Z_D = 0 \mid Z_T = 1)\, f(p \mid Z_D = 0, Z_T = 1)$$

Density f(p|Z_D = 0, Z_T = 1) has been estimated in earlier steps. Applying the findings of Chung et al., we assume a beta distribution of the p-values of functional SNPs (i.e. f(p|Z_D = 1, Z_T = 1)) as a reasonable approximation under general assumptions of SNP effect size⁹.

The EM algorithm is then applied to the SNP subset located in tissue-specific functional regions. The beta assumption guarantees a closed-form expression in each iteration and all the remaining parameters can be subsequently estimated. We now have all the necessary terms for equation 4, and define this as our posterior probability score of tissue-specific disease functionality (GSP score). The feature of integrating tissue-specific functional annotations to prioritize GWAS signals has been added to the GenoWAP software available on our server (http://genocanyon.med.yale.edu/GenoSkyline).
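The estimation and scoring steps above can be sketched as follows. The sketch assumes the Beta(α, 1) form used in GPA⁹, i.e. a density α p^(α−1) with 0 < α < 1, for p-values of phenotype-relevant SNPs; the exact parameterization used here is not shown in the text. For simplicity the toy example also takes both null densities to be uniform, whereas in practice they would be the histogram estimates described earlier. All function names, starting values, and simulated data are hypothetical.

```python
import math
import random
from typing import Callable, Sequence, Tuple


def beta_density(p: float, alpha: float) -> float:
    """Density of Beta(alpha, 1), the assumed model for functional SNP p-values."""
    return alpha * p ** (alpha - 1.0)


def fit_mixture(pvalues: Sequence[float], null_density: Callable[[float], float],
                n_iter: int = 200) -> Tuple[float, float]:
    """EM on SNPs in tissue-functional regions; returns (q, alpha),
    where q estimates P(Z_D = 1 | Z_T = 1)."""
    q, alpha = 0.1, 0.5
    for _ in range(n_iter):
        # E-step: responsibility of the phenotype-relevant component.
        w = []
        for p in pvalues:
            signal = q * beta_density(p, alpha)
            noise = (1.0 - q) * null_density(p)
            w.append(signal / (signal + noise))
        # M-step: both updates are closed-form under the Beta(alpha, 1) assumption.
        q = sum(w) / len(w)
        alpha = -sum(w) / sum(w_j * math.log(p) for w_j, p in zip(w, pvalues))
        alpha = min(max(alpha, 1e-3), 0.999)
    return q, alpha


def gsp_score(p: float, prior_t: float, q: float, alpha: float,
              null_t1: Callable[[float], float],
              null_t0: Callable[[float], float]) -> float:
    """Posterior P(Z_D = 1, Z_T = 1 | p) from Bayes' formula over the three cases."""
    case1 = beta_density(p, alpha) * q * prior_t
    case2 = null_t1(p) * (1.0 - q) * prior_t
    case3 = null_t0(p) * (1.0 - prior_t)
    return case1 / (case1 + case2 + case3)


def uniform(p: float) -> float:
    return 1.0  # toy null density; a fitted histogram would be used in practice


# Simulated p-values in (0, 1]: 20% drawn from Beta(0.3, 1), the rest uniform.
random.seed(2)
pvals = []
for _ in range(2000):
    u = 1.0 - random.random()
    pvals.append(u ** (1.0 / 0.3) if random.random() < 0.2 else u)

q_hat, alpha_hat = fit_mixture(pvals, uniform)
strong = gsp_score(1e-8, prior_t=0.6, q=q_hat, alpha=alpha_hat, null_t1=uniform, null_t0=uniform)
weak = gsp_score(0.5, prior_t=0.6, q=q_hat, alpha=alpha_hat, null_t1=uniform, null_t0=uniform)
print(q_hat, alpha_hat, strong, weak)
```

Under these assumptions a SNP scores highly only when its p-value is small and its local prior P(Z_T = 1) is large, which is the behavior the GSP score is meant to capture.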

SNP prioritization using GenoCanyon annotation

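Non-tissue-specific GenoCanyon scores are assigned to GWAS signals using GenoWAP¹⁰. Briefly, GenoWAP calculates the posterior score P(Z_D = 1|p) using a simpler model for functionality.

$$P(Z_D = 1 \mid p) = \frac{f(p \mid Z_D = 1)\, P(Z_D = 1)}{f(p \mid Z_D = 1)\, P(Z_D = 1) + f(p \mid Z_D = 0)\, P(Z_D = 0)}$$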

This conditional probability can be calculated similarly to GS scores, making use of equation (5) to empirically estimate f(p|Z_D = 0), a beta distribution on partitioned GenoCanyon scores (calculated with 22 non-tissue-specific ENCODE functionality annotations⁴) to estimate f(p|Z_D = 1), and the EM algorithm on the functional marker p-value density to calculate P(Z_D = 1), as described in Lu et al. These are referred to in the results as the NSFP scores, to which GenoSkyline SNP prioritization is compared.

Calculating tissue-specific enrichment using LD score regression

Enrichment of GenoSkyline-derived tissue-specific annotations in GWAS summary statistics was calculated using stratified LD score regression²⁷. First, tissue-specific annotations were computed using GenoSkyline scores, 1000 Genomes data of European ancestry³⁹, and a 1-centiMorgan (cM) window. Each annotation was then analyzed by adding it, one at a time, to the full baseline model, which controls for 53 categories of general annotations. For each tissue-specific annotation, partitioned heritability was estimated using stratified LD score regression²⁷, and enrichment was then calculated as the ratio of the proportion of SNP heritability explained by the annotation to the proportion of SNPs in the annotation.
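The enrichment ratio defined above is simple arithmetic once the partitioned heritability estimates are available. The sketch below only illustrates that final ratio; the function name and the input numbers are made up for illustration and are not results from this study, and the heritability estimates themselves would come from stratified LD score regression.

```python
def enrichment(h2_annotation: float, h2_total: float,
               snps_annotation: int, snps_total: int) -> float:
    """Proportion of SNP heritability in the annotation divided by
    the proportion of SNPs in the annotation."""
    prop_h2 = h2_annotation / h2_total
    prop_snps = snps_annotation / snps_total
    return prop_h2 / prop_snps


# Hypothetical inputs: the annotation holds 5% of SNPs and 30% of heritability.
fold = enrichment(h2_annotation=0.12, h2_total=0.40,
                  snps_annotation=50_000, snps_total=1_000_000)
print(round(fold, 2))
```

A value above 1 indicates that the annotation explains more heritability than its share of SNPs would suggest.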

Measuring relevant tissue types for GWAS risk loci

A large GSP score is obtained if the p-value for the SNP is small and the SNP is located in a highly functional region for the tissue type under investigation. Therefore, the maximal GSP score at a risk locus effectively measures how well the p-values match the pattern of GenoSkyline annotations, thereby measuring the relatedness between the GWAS locus and different tissue types. For each tissue, the maximal GSP score is obtained at the risk locus of interest. These scores are then compared across tissue types. The largest score is referred to as the local maximum GSP score, and the corresponding tissue type is predicted to be the most relevant tissue.
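The comparison across tissues can be sketched in a few lines. The function name `most_relevant_tissue` and the GSP values are hypothetical; the input is assumed to be the GSP scores of all SNPs at one risk locus, computed separately for each tissue.

```python
from typing import Dict, List, Tuple


def most_relevant_tissue(gsp_by_tissue: Dict[str, List[float]]) -> Tuple[str, float, Dict[str, float]]:
    """Return the predicted tissue, the local maximum GSP score,
    and the per-tissue maximal GSP scores at one locus."""
    maxima = {tissue: max(scores) for tissue, scores in gsp_by_tissue.items()}
    best = max(maxima, key=maxima.get)
    return best, maxima[best], maxima


# Hypothetical GSP scores for the SNPs at a single risk locus.
locus_scores = {
    "brain": [0.02, 0.81, 0.40],
    "blood": [0.10, 0.35],
    "heart": [0.05, 0.07],
}
tissue, local_max, per_tissue = most_relevant_tissue(locus_scores)
print(tissue, local_max)
```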

Bioinformatics tools

Locus plots were generated using LocusZoom⁴⁰. The “ggbio” R package⁴¹ was used to plot genes. The “bigmemory” R package⁴² was used to access and manipulate massive datasets.

Author Contributions

Q.L. and R.L.P. conceived the project, wrote the initial draft, and performed the analyses. Q.W. performed tissue-specific enrichment analysis. B.J.H. performed one heart-related case study. H.Z. advised on statistical and genetic issues.

Acknowledgements

This study was supported in part by the National Institutes of Health grant R01 GM59507, the VA Cooperative Studies Program of the Department of Veterans Affairs, Office of Research and Development, and the Yale World Scholars Program sponsored by the China Scholarship Council.

References

1. Lindblad-Toh, K. et al. A high-resolution map of human evolutionary constraint using 29 mammals. Nature 478, 476–82 (2011).
2. Bernstein, B.E. et al. An integrated encyclopedia of DNA elements in the human genome. Nature 489, 57–74 (2012).
3. Kundaje, A. et al. Integrative analysis of 111 reference human epigenomes. Nature 518, 317–330 (2015).
4. Lu, Q. et al. A Statistical Framework to Predict Functional Non-Coding Regions in the Human Genome Through Integrated Analysis of Annotation Data. Sci. Rep. 5 (2015).
5. Efron, B. Large-scale inference: empirical Bayes methods for estimation, testing, and prediction, (Cambridge University Press, 2010).
6. Hindorff, L.A. et al. Potential etiologic and functional implications of genome-wide association loci for human diseases and traits. Proc Natl Acad Sci U S A 106, 9362–7 (2009).
7. Kichaev, G. et al. Integrating functional data to prioritize causal variants in statistical fine-mapping studies. (2014).
8. Pickrell, J.K. Joint analysis of functional genomic data and genome-wide association studies of 18 human traits. The American Journal of Human Genetics 94, 559–573 (2014).
9. Chung, D., Yang, C., Li, C., Gelernter, J. & Zhao, H. GPA: a statistical approach to prioritizing GWAS results by integrating pleiotropy and annotation. (2014).
10. Lu, Q., Yao, X., Hu, Y. & Zhao, H. GenoWAP: Post-GWAS Prioritization Through Integrated Analysis of Genomic Functional Annotation. bioRxiv, 019539 (2015).
11. Ardlie, K.G. et al. The Genotype-Tissue Expression (GTEx) pilot analysis: Multitissue gene regulation in humans. Science 348, 648–660 (2015).
12. Kellis, M. et al. Defining functional DNA elements in the human genome. Proc Natl Acad Sci U S A (2014).
13. King, D.C. et al. Evaluation of regulatory potential and conservation scores for detecting cis-regulatory modules in aligned mammalian genome sequences. Genome Res 15, 1051–60 (2005).
14. Pennacchio, L.A. et al. In vivo enhancer analysis of human conserved non-coding sequences. Nature 444, 499–502 (2006).
15. Kamm, G.B., Pisciottano, F., Kliger, R. & Franchini, L.F. The developmental brain gene NPAS3 contains the largest number of accelerated regulatory sequences in the human genome. Mol Biol Evol 30, 1088–102 (2013).
16. Liang, Y., Ridzon, D., Wong, L. & Chen, C. Characterization of microRNA expression profiles in normal human tissues. BMC Genomics 8, 166 (2007).
17. Jandreski, M.A., Sole, M.J. & Liew, C.C. Two different forms of beta myosin heavy chain are expressed in human striated muscle. Hum Genet 77, 127–31 (1987).
18. Gorza, L. et al. Myosin types in the human heart. An immunofluorescence study of normal and hypertrophied atrial and ventricular myocardium. Circ Res 54, 694–702 (1984).
19. Baldwin, K.M. & Haddad, F. Effects of different activity and inactivity paradigms on myosin heavy chain gene expression in striated muscle. J Appl Physiol (1985) 90, 345–57 (2001).
20. Gupta, M.P. Factors controlling cardiac myosin-isoform shift during hypertrophy and heart failure. J Mol Cell Cardiol 43, 388–403 (2007).
21. Miyata, S., Minobe, W., Bristow, M.R. & Leinwand, L.A. Myosin heavy chain isoform expression in the failing and nonfailing human heart. Circ Res 86, 386–90 (2000).
22. Lowes, B.D. et al. Myocardial gene expression in dilated cardiomyopathy treated with beta-blocking agents. N Engl J Med 346, 1357–65 (2002).
23. van Rooij, E. et al. A family of microRNAs encoded by myosin genes governs myosin expression and muscle performance. Dev Cell 17, 662–73 (2009).
24. Callis, T.E. et al. MicroRNA-208a is a regulator of cardiac hypertrophy and conduction in mice. J Clin Invest 119, 2772–86 (2009).
25. VanderMeer, J.E. & Ahituv, N. cis-regulatory mutations are a genetic cause of human limb malformations. Dev Dyn 240, 920–30 (2011).
26. Xie, M. et al. DNA hypomethylation within specific transposable element families associates with tissue-specific enhancer landscape. Nat Genet 45, 836–41 (2013).
27. Finucane, H.K. et al. Partitioning heritability by functional annotation using genome-wide association summary statistics. Nature Genetics (2015).
28. Locke, A.E. et al. Genetic studies of body mass index yield new insights for obesity biology. Nature 518, 197–206 (2015).
29. Shungin, D. et al. New genetic loci link adipose and insulin biology to body fat distribution. Nature 518, 187–196 (2015).
30. Consortium, S.P.G.-W.A.S. Genome-wide association study identifies five new schizophrenia loci. Nature genetics 43, 969–976 (2011).
31. Consortium, S.W.G.o.t.P.G. Biological insights from 108 schizophrenia-associated genetic loci. Nature 511, 421–427 (2014).
32. Gibbs, J.R. et al. Abundant quantitative trait loci exist for DNA methylation and gene expression in human brain. PLoS Genet 6, e1000952 (2010).
33. Schunkert, H. et al. Large-scale association analysis identifies 13 new susceptibility loci for coronary artery disease. Nature genetics 43, 333–338 (2011).
34. Pikuleva, I.A. Cytochrome P450s and cholesterol homeostasis. Pharmacol Ther 112, 761–73 (2006).
35. Russell, D.W. The enzymes, regulation, and genetics of bile acid synthesis. Annu Rev Biochem 72, 137–74 (2003).
36. Consortium, C.D. A comprehensive 1000 Genomes-based genome-wide association meta-analysis of coronary artery disease. Nature Genetics (2015).
37. Bannister, A.J. & Kouzarides, T. Regulation of chromatin by histone modifications. Cell research 21, 381–395 (2011).
38. Crawford, G.E. et al. Identifying gene regulatory elements by genome-wide recovery of DNase hypersensitive sites. Proceedings of the National Academy of Sciences of the United States of America 101, 992–997 (2004).
39. Genomes Project, C. et al. An integrated map of genetic variation from 1,092 human genomes. Nature 491, 56–65 (2012).
40. Pruim, R.J. et al. LocusZoom: regional visualization of genome-wide association scan results. Bioinformatics 26, 2336–2337 (2010).
41. Yin, T., Cook, D. & Lawrence, M. ggbio: an R package for extending the grammar of graphics for genomic data. Genome Biol 13, R77 (2012).
42. Kane, M.J., Emerson, J.W. & Weston, S. Scalable Strategies for Computing with Massive Data. Journal of Statistical Software 55, 1–19 (2013).
43. Bulik-Sullivan, B.K. et al. LD Score regression distinguishes confounding from polygenicity in genome-wide association studies. Nature genetics 47, 291–295 (2015).
44. Boraska, V. et al. A genome-wide association study of anorexia nervosa. Molecular psychiatry 19, 1085–1094 (2014).
45. Franke, A. et al. Genome-wide meta-analysis increases to 71 the number of confirmed Crohn’s disease susceptibility loci. Nat Genet 42, 1118–25 (2010).
46. Anderson, C.A. et al. Meta-analysis identifies 29 additional ulcerative colitis risk loci, increasing the number of confirmed associations to 47. Nat Genet 43, 246–52 (2011).
47. Stahl, E.A. et al. Genome-wide association study meta-analysis identifies seven new rheumatoid arthritis risk loci. Nature genetics 42, 508–514 (2010).
48. Consortium, G.L.G. Discovery and refinement of loci associated with lipid levels. Nature genetics 45, 1274–1283 (2013).
49. Morris, A.P. et al. Large-scale association analysis provides insights into the genetic architecture and pathophysiology of type 2 diabetes. Nature genetics 44, 981 (2012).
50. Estrada, K. et al. Genome-wide meta-analysis identifies 56 bone mineral density loci and reveals 14 loci associated with risk of fracture. Nature genetics 44, 491–501 (2012).
51. Locke, A.E. et al. Genetic studies of body mass index yield new insights for obesity biology. Nature 518, 197–206 (2015).
52. Wood, A.R. et al. Defining the role of common variation in the genomic and biological architecture of adult human height. Nature genetics 46, 1173–1186 (2014).

Posted October 06, 2015.


Subject Area

Bioinformatics
