Abstract
Recent work has demonstrated that some functional categories of the genome contribute disproportionately to the heritability of complex diseases. Here, we analyze a broad set of functional elements, including cell-type-specific elements, to estimate their polygenic contributions to heritability in genome-wide association studies (GWAS) of 17 complex diseases and traits spanning a total of 1.3 million phenotype measurements. To enable this analysis, we introduce a new method for partitioning heritability from GWAS summary statistics while controlling for linked markers. This new method is computationally tractable at very large sample sizes, and leverages genome-wide information. Our results include a large enrichment of heritability in conserved regions across many traits; a very large immunological disease-specific enrichment of heritability in FANTOM5 enhancers; and many cell-type-specific enrichments including significant enrichment of central nervous system cell types in body mass index, age at menarche, educational attainment, and smoking behavior. These results demonstrate that GWAS can aid in understanding the biological basis of disease and provide direction for functional follow-up.
Introduction
In GWAS of complex traits, much of the heritability lies in single-nucleotide polymorphisms (SNPs) that do not reach genome-wide significance at current sample sizes.1, 2 However, many current approaches that leverage functional information3, 4 and GWAS data to inform disease biology use only SNPs in genome-wide significant loci,5-7 assume only one causal SNP per locus,8 or do not account for LD.9 We can improve power by estimating the proportion of SNP heritability1 attributable to various functional categories, using information from all SNPs and explicitly modeling LD.
Previous work on partitioning SNP heritability has used restricted maximum likelihood (REML) as implemented in GCTA.10-13 REML requires individual genotypes, but many of the largest GWAS analyses are conducted through meta-analysis of study-specific results, and so typically only summary statistics, not individual genotypes, are available for these studies. Even when individual genotypes are available, using REML to analyze multiple functional categories becomes computationally intractable at sample sizes in the tens of thousands. Here, we introduce a method for partitioning heritability, stratified LD score regression, that requires only GWAS summary statistics and LD information from an external reference panel that matches the population studied in the GWAS.
We apply our novel approach to 17 complex diseases and traits spanning 1,263,072 phenotype measurements. We first analyze non-cell-type-specific annotations and identify heritability enrichment in many of these functional annotations, including a large enrichment in conserved regions across many traits and a very large immunological disease-specific enrichment in FANTOM5 enhancers. We then analyze cell-type-specific annotations and identify many cell-type-specific heritability enrichments, including enrichment of central nervous system (CNS) cell types in body mass index, age at menarche, educational attainment, and smoking behavior.
Results
Overview of methods
Our method for partitioning heritability from summary statistics, called stratified LD score regression, relies on the fact that the χ2 association statistic for a given SNP includes the effects of all SNPs that it tags.14, 15 Thus, for a polygenic trait, SNPs with high linkage disequilibrium (LD) will have higher χ2 statistics on average than SNPs with low LD.15 This might be driven either by the higher likelihood of these SNPs to tag an individual large effect, or their ability to tag multiple weak effects. If we partition SNPs into functional categories with different contributions to heritability, then LD to a category that is enriched for heritability will increase the χ2 statistic of a SNP more than LD to a category that does not contribute to heritability. Thus, our method determines that a category of SNPs is enriched for heritability if SNPs with high LD to that category have higher χ2 statistics than SNPs with low LD to that category.
More precisely, under a polygenic model,1 the expected χ2 statistic of SNP j is where N is sample size, C indexes disjoint categories, ℓ(j, C) is the LD Score of SNP j with respect to category C (defined as ℓ(j,C): = ΣkϵCr2(j,k)), a is a term that measures the contribution of confounding biases,15 and τC is the per-SNP heritability in category C (Methods). Equation (1) allows us to estimate τC via a (computationally simple) multiple regression of against ℓ(j, C). The method easily generalizes to overlapping categories and case-control studies16 (Methods). We define the enrichment of a category to be the proportion of SNP heritability explained divided by the proportion of SNPs. We estimate standard errors with a block jackknife,15 and use these standard errors to calculate z-scores, P-values, and FDRs (Methods). We have released open-source software implementing the method (Web Resources).
To apply LD score regression (or REML) we must first specify which categories we include in our model. We created a “full baseline model” from 24 main annotations that are not specific to any cell type (Table S1; see Methods and Web Resources). Below, we show that including many categories in our model leads to more accurate estimates of enrichment. The 24 main annotations include: coding, UTR, promoter, and intron;17 histone marks H3K4me1, H3K4me3, H3K9ac5 and two versions of H3K27ac;18, 19 open chromatin reflected by DNase I hypersensitivity Site (DHS) regions;5 combined chromHMM/Segway predictions,20 which are a computational combination of many ENCODE annotations into a single partition of the genome into seven underlying “chromatin states,”; regions that are conserved in mammals;21 super-enhancers, which are large clusters of highly active enhancers;19 and active enhancers from the FANTOM5 panel of samples, which we call FANTOM5 enhancers.22 For the histone marks and other annotations that differ among cell types, we combined the different cell types into a single annotation for the baseline model by taking a union. To prevent our estimates from being biased upwards by enrichment in nearby regions,13 we also included 500bp windows around each functional category as separate functional categories in the baseline model, as well as 100bp windows around ChIP-seq peaks when appropriate (see Methods). This yielded a total of 53 (overlapping) functional categories in the baseline model, including a category containing all SNPs.
To prevent our estimates from being biased upwards by enrichment in nearby regions,13 we also included 500bp windows around each functional category as separate categories in the full baseline model, as well as 100bp windows around ChIP-seq peaks when appropriate (Methods). The 24 main annotations plus additional windows and a category containing all SNPs yielded a total of 53 (overlapping) functional categories in the full baseline model.
Simulation results
In order to assess the robustness of the method, we performed a variety of simulations. We used chromosome 1 genotypes of controls from the Wellcome Trust Case Control Consortium,23 imputed using a 1000 Genomes reference panel.24 After quality control, we had 2,680 individuals and 360,106 SNPs (Methods). We simulated quantitative phenotypes with a heritability of 0.5 using an additive model. For each simulation, SNP effect sizes were drawn from a normal distribution with mean zero and variance (i.e., average per-SNP heritability) determined by functional categories. We estimated the enrichment of the DHS category, i.e., (%h2)/(%SNPs), using three methods:(1) REML with two categories (DHS/Other), (2) LD score regression with two categories (DHS/Other), and (3) LD score regression with the full baseline model (53 categories, described above). Since REML with 53 categories did not converge at this sample size and would be computationally intractable at sample sizes in the tens of thousands, we did not include it in our comparison; an advantage of LD score regression is that it is possible to include a large number of categories in the underlying model. We report means and standard errors of the mean over 100 independent simulations.
We first performed three sets of simulations where the causal pattern of enrichment was well modeled by the two-category (DHS/Other) model; for these, all three methods performed well, although LD score regression with the full baseline model had larger standard errors around the mean (Figure 1a). For example, the standard errors around the mean in simulations with no DHS enrichment were 0.08 for REML, 0.11 for two-category LD score regression and 0.19 for LD score regression with the full baseline model. For the first set of simulations, all SNPs were causal and SNP effect sizes were drawn i.i.d. from a normal distribution. For the second set of simulations, all SNPs were causal and SNP effect sizes were drawn independently from a normal distribution, but the variance of the normal distribution depended on whether the SNP was in a DHS region, and two variances (DHS and Other) were chosen so that the proportion of heritability of DHS would be 3x more than the proportion of SNPs. For the third set of simulations, only SNPs in DHS regions were causal, and effect sizes of DHS SNPs were drawn i.i.d. from a normal distribution.
Next, to explore the realistic scenario where the model used to estimate enrichment does not match the (unknown) causal model, we performed three sets of simulations where all causal SNPs were in a particular category, but the model used to estimate heritability did not include this causal category. The three sets of simulations were (1) all causal SNPs in coding regions, yielding 1.6x DHS enrichment due to coding/DHS overlap, (2) all causal SNPs in FANTOM5 enhancers, yielding 4.0x DHS enrichment due to FANTOM5 enhancer/DHS overlap, and (3) all causal SNPs in 200bp DHS flanking regions, yielding 0x DHS enrichment. For the coding and FANTOM5 enhancer causal simulations, we made the full baseline model into a misspecified model by removing the causal category (and windows around the causal category). Results from these simulations are displayed in Figure 1b).
The two-category estimators were not robust to model misspecification and consistently over-estimated DHS enrichment by a wide margin. LD score regression with the full baseline model gave more accurate mean estimates of enrichment. Specifically, for the simulations with coding and FANTOM5 Enhancers causal, LD score regression with the full baseline model gave unbiased mean enrichment estimates of 1.8x (s.e. 0.22) and 4.2x (s.e. 0.22), respectively, while the mean enrichment estimates of REML and two-category LD score regression were nearly double these. The full baseline model includes a 500bp window around DHS but not a 200bp window, and gave a mean estimated DHS enrichment of 0.65x (s.e. 0.22) when the 200bp flanking regions were causal, which is inflated relative to the true enrichment of 0x but much less inflated than > 3x mean enrichment estimates given by the two-category methods.
In summary, while these simulations include exaggerated patterns of enrichment (e.g., 100% of heritability in DHS flanking regions), the results highlight the possibility that two-category estimators of enrichment can yield incorrect conclusions. Although we cannot entirely rule out model misspecification as a source of bias for LD score regression with the full baseline model, we have shown here that it is robust to a wide variety of patterns of enrichment, because including many categories gives it the flexibility to adapt to the unknown causal model.
Application to real data
We applied LD score regression with the full baseline model to 17 diseases and quantitative traits: height, BMI, age at menarche, LDL levels, HDL levels, triglyceride levels, coronary artery disease, type 2 diabetes, fasting glucose levels, schizophrenia, bipolar disorder, anorexia, educational attainment, smoking behavior, rheumatoid arthritis, Crohn’s disease, and ulcerative colitis (Table S2, Web resources). This includes all traits with publicly available summary statistics with sufficient sample size and SNP heritability, measured by the z-score of total SNP-heritability (Methods), spanning a total of of 1,263,072 unique phenotype measurements. We removed the MHC region from all analyses, due to its unusual LD and genetic architecture. Figure 2 shows results for the 24 main functional annotations, averaged across nine independent traits (Methods). Figure 3 shows trait-specific results for selected annotations and traits (Methods). Tables S3 and S4 show meta-analysis and trait-specific results for all traits and all 53 categories in the full baseline model.
We observed large and statistically significant enrichments for many functional categories. A few categories stood out in particular. First, regions conserved in mammals21 showed the largest enrichment of any category, with 2.6% of SNPs explaining an estimated 35% of SNP heritability on average across traits (P < 10−15 for enrichment). This is a significantly higher average enrichment than for coding regions, and provides evidence for the biological importance of conserved regions, even though the biochemical function of many conserved regions remains uncharacterized.25 Second, FANTOM5 Enhancers22 were extremely enriched in the three immunological diseases, with 0.4% of SNPs explaining an estimated 15% of SNP heritability on average across these three diseases (P < 10−5), but showed no evidence of enrichment for non-immunological traits (Figure 3). Third, repressed regions were depleted: 46% of SNPs explain only 29% of heritability on average (P < 0.006), consistent with the hypothesis that these are regions of low activity.20 We did not see a large enrichment of H3K27ac regions marked as super-enhancers over all H3K27ac regions; the estimates for enrichment were 1.8x (s.e. 0.2) and 1.6x (s.e. 0.1), respectively. This lack of enrichment supports the argument that super-enhancers may not play a much more important role in regulating transcription than regular enhancers.26 For many annotations, there was also enrichment in the 500bp flanking regions (Table S3). Analyses stratified by minor allele frequency produced broadly similar results for all of these enrichments (Table S5; see Methods).
We performed two different cell-type-specific analyses: an analysis of 220 individual cell-type-specific annotations, and an analysis of 10 cell-type groups. The 220 individual cell-type-specific annotations are a combination of cell-type-specific annotations from four histone marks: 77 from H3K4me1,5 81 from H3K4me3,5 27 from H3K9ac,5 and 35 from H3K27ac18 (Table S6, Methods). When ranking these 220 cell-type-specific annotations, we wanted to control for overlap with the functional categories in the full baseline model, but not for overlap with the 219 other cell-type-specific annotations. Thus, we added the 220 cell-type-specific annotations individually, one at a time, to the full baseline model, and ranked these 220 annotations by the P-value for the coefficient corresponding to the annotation. This P-value tests whether the annotationcontributes significantly to per-SNP heritability after controlling for the effects of the annotations in the full baseline model. We assessed statistical significance at the 0.05 level after Bonferroni correction for 220 × 17 = 3, 740 hypotheses tested. (This is conservative, since the 220 annotations are not independent.) We also report results with false discovery rate (FDR) < 0.05 (computed over 220 cell types for each trait). For 15 of the 17 traits, the top cell type passed an FDR threshold of 0.05. The top cell type for each trait is displayed in Table 1, with additional top cell types reported in Table S7.
We combined information from related cell types by aggregating the 220 cell-type-specific annotations into 10 groups (Figure 4 legend and Table S6; see Methods). For each trait, we performed the same analysis on the 10 group-specific annotations as with the 220 cell-type-specific annotations. We assessed statistical significance at the 0.05 level after Bonferroni correction for 10 × 17 = 170 hypotheses tested, and we again also report results with false discovery rate (FDR) < 0.05 (now computed over all cell-type groups and traits). For 16 of the 17 traits (all traits except anorexia), the top cell type group passed an FDR threshold of 0.05. Results for the 11 traits with the most significant enrichments (after pruning closely related traits) are shown in Figure 4, with remaining traits in Figure S1.
These two analyses are generally concordant, and show highly trait-specific patterns of cell-type enrichment. They also recapitulate several well-known findings. For example, the top cell type for each of the three lipid traits is liver (FDR < 0.05 for all three traits). For both type 2 diabetes and fasting glucose, the top cell type is pancreatic islets (FDR < 0.05 for fasting glucose but not type 2 diabetes). For the three psychiatric traits, the top cell type is a brain cell type (FDR < 0.05 for schizophrenia and bipolar disorder but not for anorexia) and the top cell-type group is CNS (significant after multiple testing for schizophrenia and bipolar disorder but not for anorexia).
There are also several new insights among these results. For example, the three immunological disorders show patterns of enrichment that reflect biological differences among the three disorders. Crohn’s disease has 40 cell types with FDR < 0.05, of which 39 are immune cell types and one (colonic mucosa) is a GI cell type. On the other hand, the 39 cell types with FDR < 0.05 for ulcerative colitis include nine GI cell types in addition to 30 immune cell types, whereas all 39 cell types with FDR < 0.05 for rheumatoid arthritis are immune cell types. The top cell type for all three traits is CD4+ CD25-IL17+ PMA Ionomycin simulated Th17 primary. Th17 cells are thought to act in opposition to Treg cells, which have been shown to suppress immune activity and whose malfunction has been associated with immunological disorders.27
We also identified several non-psychiatric phenotypes with enrichments in brain cell types. For both BMI and age at menarche, cell types in the central nervous system (CNS) ranked highest among individual cell types, and the top cell-type group was CNS, all with FDR < 0.05. These enrichments support previous human and animal studies that propose a strong neural basis for the regulation of energy homeostasis.28 For educational attainment, the top cell-type group is CNS (FDR < 0.05) and of the ten cell types that are significant after multiple testing, nine are CNS cell types. This is consistent with our understanding that the genetic component of educational attainment, which excludes environmental factors and population structure, is highly correlated with IQ.29 Finally, for smoking behavior, the CNS cell-type group is significant after multiple testing correction, and the top cell type is again a brain cell type, likely reflecting CNS involvement in nicotine processing.
Discussion
We developed a new statistical method, stratified LD score regression, for identifying functional enrichment from GWAS data that uses information from all SNPs and explicitly models LD. We applied this method to GWAS data spanning 17 traits and 1.3 million phenotype measurements. Our method identified strong enrichment for conserved regions across all traits, and immunological disease-specific enrichment for FANTOM5 enhancers. Our cell-type-specific enrichment results confirmed previously known enrichments, such as liver enrichment for HDL levels and pancreatic islet enrichment for fasting glucose. In addition, we identified enrichments that would have been challenging to detect using existing methods, such as CNS enrichment for smoking behavior and educational attainment—traits with one and three genome-wide significant loci, respectively.29, 30 Stratified LD score regression represents a significant departure from previous methods that require raw genotypes,10 use only SNPs in genome-wide significant loci,5-7 assume only one causal SNP per locus,8 or do not account for LD9 (see Methods for a discussion of other methods). Our method is also computationally efficient, despite the 53 overlapping functional categories analyzed.
Although our polygenic approach has enabled a powerful analysis of genome-wide summary statistics, it has several limitations. First, the method requires a very large sample size and/or large SNP heritability, and the trait analyzed must be polygenic. Second, it requires an LD reference panel matched to the population studied; all results in this paper are from European datasets and use 1000G Europeans as a reference panel. Third, our method is currently not applicable to studies using custom genotyping arrays (e.g., Metabochip; see Methods). Fourth, our method is based on an additive model and does not consider the contribution of epistatic or other non-additive effects, nor does it model causal contributions of SNPs not in the reference panel; in particular, it is possible that patterns of enrichment may be different at extremely rare variants. Fifth, the method is limited by available functional data: if a trait is enriched in a cell type for which we have no data, we cannot detect the enrichment. Last, though we have shown our method to be robust in a wide range of scenarios, we cannot rule out model misspecification caused by enrichment in an unidentified functional category as a possible source of bias.
The polygenic approach described here is a powerful and efficient way to learn about functional enrichments from summary statistics, and it will become increasingly useful as functional data continues to grow and improve, and as GWAS studies of larger sample size are conducted.
Web Resources
ldsc software: https:github.com/bulik/ldsc
DNaseI Digital Genomic Footprinting (DGF) annotations:3, 13 https://hgdownload.cse.ucsc.edu/goldenPath/hg19/encodeDCC/wgEncodeUwDgf
Transcription factor binding sites:3 13 https://hgdownload.cse.ucsc.edu/goldenPath/hg19/encodeDCC/wgEncodeAwgTfbsUniform/
Segway-chromHMM combined enhancer annotations:20 ftp://ftp.ebi.ac.uk/pub/databases/ensembl/encode/integration_data_jan2011/byDataType/segmentations/jan2011
Super-enhancers and H3K27ac: Available as a supplementary table to Hnisz et al 2014.19
Conserved regions:21, 31 https://compbio.mit.edu/human-constraint/data/gff/">
FANTOM5 Enhancers:22 https://enhancer.binf.ku.dk/presets/">
Post-processed H3K4me1, H3K4me3, and H3K9ac:5 https://www.broadinstitute.org/mpg/goshifter/
Height32 and BMI33 summary statistics: https://www.broadinstitute.org/collaboration/giant/index.php/GIANT_consortium_data_files
Menarche summary statistics:34 https://www.reprogen.org
LDL, HDL, and Triglycerides summary statistics:35 https://www.broadinstitute.org/mpg/pubs/lipids2010/
Coronary artery disease summary statistics:36 https://www.cardiogramplusc4d.org
Type 2 diabetes summary statistics:37 https://www.diagram-consortium.org
Fasting glucose summary statistics:38 https://www.magicinvestigators.org/downloads/
Schizophrenia,18 Bipolar Disorder,39 Anorexia,40 and Smoking behavior30 summary statistics: https://www.med.unc.edu/pgc/downloads
Education attainment summary statistics:29 https://www.ssgac.org
Rheumatoid arthritis summary statistics:41 https://plaza.umin.ac.jp/yokada/datasource/software.htm
Crohn’s disease and ulcerative colitis summary statistics:42 https://www.ibdgenetics.org/downloads.html
We used a newer version of these data with 1000 Genomes imputation.
Methods
Stratified LD score regression
We begin with a derivation of Equation (1) for overlapping categories in a sample with no population structure or other confounding. The derivation of the intercept term in the presence of confounding is identical to the derivation in previous work.15
Let yi be a quantitative phenotype in individual i, standardized to mean 0 and variance 1 in the population, and let Xij be the genotype of individual i at the j-th SNP, standardized so that for each SNP j, Xij has mean 0 and variance 1 in the population. We will assume a linear model: where G is some fixed set of SNPs, βj(G) is the effect size of SNP j, and ϵi is mean-0 noise.
We define β(G) = (β1(G),…, βM(G)) as the hypothetical result of multiple linear regression of y on X at infinite sample size. Thus, β(G) depends on the set G; for example, if G is the set of genotyped SNPs then βj(G) includes the causal effects of non-typed SNPs that are tagged by SNP j, whereas if G contains all SNPs, then βj(G) will reflect only the true effect at SNP j.
We will define the heritability of the set G of SNPs to be and the heritability of a category 𝓒 ⊂ G to be
The definition of depends on both G and 𝓒; for example, if 𝓒 is the set of SNPs with minor allele frequency (MAF) greater than 5%, will be larger if G = 𝓒 than if G contains SNPs with lower MAF since in the first case includes tagged effects of low-frequency SNPs, whereas in the second case the low-frequency effects are included in .
Suppose that we have a sample of N individuals. Let y = (y1,…,yN), and let X be the N × M matrix of standardized genotypes, where M = |G|. (We will assume that our sample is large enough that normalizing each SNP within our sample is roughly equivalent to normalizing each SNP in the population.) Let ϵ = (ϵ1,…,ϵN) be a vector of residuals. Then we can write
Let β̂j be the estimate of the marginal effect of SNP j in our sample, given by where Xj is the j-th column of X. Define χ2 statistics .
Our goal here is to estimate , where G is the set of 1000G SNPs that have minor allele count greater than five in Europeans, and 𝓒 is a functional category, such as coding SNPs or SNPs that are in H3K4me3 regions in fetal brain tissue. From now on, we will omit the dependenceon G, understanding G to be the set of 1000G SNPs with minor allele count greater than five in Europeans. We would like to estimate this quantity from summary statistics; i.e., the input to our method will be , where R is the set of SNPs that were tested in a GWAS. We will also need LD information from a reference panel.
Substituting y = Xβ + ϵ into the the definition of β̂j, we get where D̂j is the j-th row of the in-sample LD matrix D̂ = XTX/N and . For a single entry βj, this means that where r̂jk: = D̂jk) and has mean 0 and variance .
To estimate , we will model β as a mean-0 random vector with independent entries. We will allow the variance of βj to depend on certain functional properties of SNP j; for example, we will allow coding and non-coding SNPs to have different variances. More precisely, we will assume we have C functional categories 𝓒1,…, 𝓒C ⊂ {1,…, M}. One of the categories must always contain all SNPs to allow for baseline heritability. We will model the variance of βj as
In the case that the 𝓒c are disjoint, we will have τC = h2(𝓒c)/M(𝓒c), where M(𝓒c) is the number of SNPs in 𝓒c.
Consider the expectation of . where the second equality follows because the random variables are all independent with mean 0.
Let rjk denote the true correlation between SNPs j and k in the underlying population. In an unstructured sample .
We now have where . The variance of yj is . Since our phenotype has variance one, we can replace with 1, giving us our main equation:
Our goal is to estimate , given a vector of χ2 statistics and LD information either from the sample or from a reference panel. To estimate Var(βj), we need to estimate each of the coefficients τc, and then we apply Equation (2). To estimate τc, we first compute ℓ(j, c), then we regress on ℓ(j, c). This is called stratified LD score regression.
We estimate standard errors using a block jackknife over SNPs with 200 equally-sized blocks of adjacent SNPs,15 and use these standard errors to calculate z-scores, P-values, and FDRs. To minimize standard error, we remove outliers by excluding SNPs j with , where N is the maximum sample size in the study, and we also weight the regression in a way that takes into account both over-counting and heteroskedasticity due to LD (see below). We remove the MHC region from all analyses.
Proportion of heritability and GC correction
As described above, stratified LD score regression is a method for estimating h 2 (𝓒) for a given category 𝓒. However, we are often more interested in estimating the proportion of heritability h2(𝓒)/h2. For this, we estimate h2(𝓒) and h2 separately and divide the estimates. For inference, we jackknife the proportion, rather than jackknifing the numerator and denominator separately.
Estimating the proportion of heritability is possible for summary statistics with GC correction, even though GC correction makes estimating the category specific heritability and total heritability impossible. This is because GC correction introduces a multiplicative error into estimates of both h2(𝓒) and h2,15 but the two multiplicative errors are equal, and cancel out in the ratio.
Choice of regression SNPs and reference SNPs
The derivation above does not incorporate imperfect imputation. Ideally, we would prune our χ2 statistics to a set of “regression SNPs” with imputation accuracy above 0.9, but since imputation accuracy is not always available, we instead use HapMap Project Phase 3 (HapMap343) SNPs as a proxy for well-imputed SNPs. So for the purposes of this paper, our regression SNPs are always the HapMap3 SNPs.
However, we do not assume that only HapMap3 SNPs are causal. It is important that our model allow as many SNPs as possible to contribute causally, since if we use a model with, for example, only HapMap3 SNPs causal then we are assigning causality of any SNP that is tagged by HapMap3 but not included in HapMap3 to the HapMap3 SNPs that tag it. This is problematic specifically for functional partitioning because the functional categories containing the causal SNP may not be the same as the functional categories of the HapMap3 SNPs that tag it.
Recall that is the heritability of set B defined using a model that allows any SNP in set A to be causal. Another way to restate our above point is that we are interested in rather than because a model that only allows HapMap3 SNPs to be causal is allowing non-HapMap3 heritability to be tagged by HapMap3 SNPs and therefore potentially assigning heritability to the wrong functional category. For this reason, our set of potentially causal SNPs—i.e., the set of SNPs in our reference panel—is the set of 1000G SNPs24 with minor allele count greater than five in Europeans.
However, there is a problem introduced by having many reference panel SNPs that are not tagged by regression SNPs: it leads us to extrapolate the enrichments at well-tagged SNPs to the rare SNPs on our reference panel that are not well-tagged. In other words, we estimate τc using summary statistics at HapMap3 SNPs, and then we multiply these coefficients by the number of SNPs in the relevant categories in 1000G, extrapolating the enrichments observed at HapMap3 SNPs to all 1000G SNPs.
To prevent ourselves from this extrapolation, we only estimate enrichments of categories containing common SNPs—i.e., SNPs with MAF > 0.05, all of which we assume to be well-tagged by HapMap3. Let 𝓖 denote the set of SNPs with MAF > 0.05. Then for any category 𝓒, we can estimate 𝓒 ∩ 𝓖 without potentially inaccurate extrapolation. For this reason, the proportions of heritability that we report throughout this manuscript are .
Out-of-bounds estimates
Stratified LD score regression can give heritability estimates that are not between 0 and 1. When an unbiasedness is important—for example, when we are averaging estimates over several simulation replicates—we do not adjust these out-of-bounds estimates. However when mean squared error is more important than unbiasedness—for example, when reporting the results of a single analysis—we truncate these estimates to be between 0 and 1. To get a confidence interval around the truncated estimate, we intersect the original confidence interval with the interval [0, 1].
Custom genotyping arrays
LD score regression is not currently applicable to studies using a custom genotyping arrays. For these arrays, SNPs that are more likely to have a large effect size also have a larger sample size, and this dependency is not modeled in the above derivation.
Regression weights
To minimize standard error, we weight the regression in a way that takes into account both over-counting and heteroskedasticity due to LD. For over-counting, we compute LD Scores within HapMap3 SNPs; call these ℓhm3(j). We also compute ℓ1000G(j, c) for all categories c in our model. The variance of is proportional to (1 + N Σcτcℓ1000G(j, c))2, but we do not have τc. We use a rough approximation of τc obtained by taking the mean over regression SNPs of both sides of Equation (3) and assuming that all the τc are equal. This gives us T̂ = (χ̄2 — 1) / (N . ℓ̄ (j)), where χ̄2 is the mean of χ2 and Z(j) is the mean of and ℓ̄(j is the mean of Σcℓ1000G(j, c), both taken over regression SNPs. We then weight SNP j by the inverse product of the over counting weights and heteroskedasticity weights:
Baseline model
The 53 functional categories, derived from 24 main annotations, were obtained as follows:
Coding, 3’-UTR, 5’-UTR promoter, and intron annotations from the RefSeq gene model were obtained from UCSC17 and post-processed by Gusev et al.13
Digital genomic footprint and transcription factor binding site annotations were obtained from ENCODE3 and post-processed by Gusev et al.13
The combined chromHMM/Segway annotations for six cell lines were obtained from Hoffman et al.20 CTCF, promoter flanking, transcribed, transcription start site, strong enhancer, and weak enhancers are a union the six cell lines; repressed is an intersection over the six cell lines.
DNase I hypersensitive sites (DHSs) are a combination of ENCODE and Roadmap data, postprocessed by Trynka et al.5 We combined the cell-type-specific annotations into two annotations for inclusion in the full baseline model: a union of all cell types, and a union of only fetal cell types. DHS and fetal DHS.
H3K4me1, H3K4me, and H3K9ac were all obtained from Roadmap and post-processed by Trynka et al.5 For each, we took a union over cell types for the full baseline model, and used the individual cell types for our cell-type-specific analysis.
One version of H3K27ac was obtained from Roadmap, post-processed,18 and a second version of H3K27ac was obtained from the data of Hnisz et al.19 For each, we took a union over cell types for the full baseline model, and used the individual cell types for our cell-type-specific analysis.
Super enhancers were also obtained from Hnisz et al,19 and comprise a subset of the H3K27ac annotation from that paper. We took a union over cell types for the full baseline model
Regions conserved in mammals were obtained from Lindblad-Toh et al.,21 post-processed by Ward and Kellis.31
FANTOM5 enhancers were obtained from Andersson et al.22
For each of these 24 categories, we added a 500bp window around the category as an additional category to keep our heritability estimates from being inflated by heritability in flanking regions.13
For each of DHS, H3K4me1, H3K4me3, and H3K9ac, we added a 100bp window around the ChIP-seq peak as an additional category.
We added an additional category containing all SNPs.
WTCCC1 genotypes
We used chromosome 1 genotypes from the NBS and 1966BC control cohorts from the Wellcome Trust Case-Control Consortium.23 Imputation of the genotypes to integrated phase1 v3 1000 Genomes and QC were done as in Gusev et al.;13 we removed any SNPs that were below a MAF of 0.01, were above 0.002 missingness, or deviated from Hardy-Weinberg equilibrium at a P < 0.01.
Choice of phenotypes
We quantified the combination of sample size and heritability by the z-score of total SNP-heritability in the baseline analysis. We applied our method to all traits with available summary statistics, and removed all traits with a heritability z-score less than six. We then removed one of each pair of traits with a large genetic correlation (> 0.95): college attendance, which has a very high genetic correlation with years of education, and total cholesterol, which has a very high genetic correlation with LDL.16
Meta-analysis over traits
For our meta-analysis over traits, we identified pairs of traits with substantial sample overlap and trait correlation by using the intercept of LD score regression for genetic covariance. Specifically, we identified pairs of traits for which the intercept on N1N2 scale, which is an unbiased estimator of phenotypic correlation times sample overlap, was at least 10% of the sample size of either of the traits, and we excluded one of each such pair. The remaining set of traits was: Height, BMI, menarche, LDL levels, coronary artery disease, schizophrenia, educational attainment, smoking behavior, and rheumatoid arthritis. We then performed a random-effects meta-analysis of proportion of heritability over these phenotypes for each functional category. The results are in Figure 2 and Table S3.
Choice of traits to include in Figure 3
Height, BMI, age at menarche, and schizophrenia are the four traits with the highest combination of SNP heritability and sample size, which we quantify by the z-score of total heritability in the baseline analysis. We also included a meta-analysis of immunological diseases, since they have a different pattern of enrichment from other traits; for example FANTOM5 enhancers are very enriched for immunological diseases but not for other traits. This meta-analysis included only rheumatoid arthritis and Crohn’s disease; we excluded ulcerative colitis because that dataset shares controls with the Crohn’s disease dataset.
Robustness to derived allele frequency
To check that our results are not affected by a derived-allele-frequency-dependent genetic architecture, we repeated the meta-analysis over traits with a model that contained all of the categories of the full baseline model as well as seven derived allele frequency bins to the model as extra annotations: 0-0.1, 0.1-0.2, 0.2-0.3, 0.3-0.4, 0.4-0.6, 0.6-0.8, and 0.8-1. This allowed for effect size to depend on derived allele frequency, independently of functional annotation. These results are very similar to our results without derived allele frequency bins, and are displayed in Table S5.
Cell-type specific analysis
We used all available cell types from the four histone marks, excluding cell lines and cells labeled as cultured cells to limit ourselves to data with the clearest biological interpretation. The resulting cell-type-specific annotations are listed in Table S6. We then added each annotation individually to the baseline model and evaluated the significance of the coefficient τc of the cell-type-specific annofarh2014naturetation. Next, we combined the 220 cell-type-specific annotations into 10 cell-type groups and repeated the same analysis.
Discussion of other methods
There are no other methods designed to estimate genome-wide components of heritability from summary statistics. However, there are existing methods that identify enriched functional categories and cell types from summary statistics; here, we discuss a few of these methods.
A paper by Pickrell8 combines GWAS data with functional data to identify enriched and depleted functional categories, and leverages the resulting model to increase GWAS power. While the method is effective at increasing power and identifies many interesting enrichments, it is unclear whether the method is effective at ranking cell types; for example, fetal fibroblasts are the top cell type for Crohn’s disease. This could be because the model only allows for one causal SNP per locus, or could be a result of including all cell types simultaneously, thereby penalizing redundant cell types.
Kichaev et al.7 introduce a new method that leverages functional data for improved fine-mapping. Their method also outputs annotations associated with disease. While their method is effective in increasing fine-mapping resolution, it is again unclear whether the method is effective at ranking cell types; for example, the cell types they identify as contributing the most to HDL, LDL, and Triglycerides are muscle, kidney, and fetal small intestine, respectively, whereas the top cell types for those three phenotypes identified using our method are liver, liver, and liver. The lower power of this method in ranking cell types is likely because it considers only genome-wide significant loci. Similarly, a recent method from Farh et al.6 focuses on fine-mapping and considers only genome-wide significant loci, although their cell-type-specific results are more concordant with known biology.
Maurano et al. use enrichment of SNPs passing P-value thresholds of increasing stringency to identify important cell types. However, they are implicitly assuming that the functional annotation at a GWAS SNP matches the functional annotation at the causal SNP. While this could be true for functional annotations composed of very wide regions, it is not likely to be true for functional annotations composed of smaller regions, such as conserved regions. Moreover, their method does not account for total LD, and so could give biased results if used to compare functional annotations with different average amounts of total LD.
The method of Trynka et al.5 (also see a more recent manuscript44) is conservative in its identification of enrichment, comparing to a null obtained by local shifting rather than a genome-wide null, and leverages only genome-wide significant SNPs. As a result, they have very low power not only in traits such as bipolar disorder which have few genome-wide significant loci, but even for traits such as rheumatoid arthritis that have many significant loci. (Lowest P-value 10−4 much larger than the lowest P-value given by LD Score regression, 2 × 10−7.)
Acknowledgements
We thank Brad Bernstein, Mariel Finucane, Eran Hodis, Dylan Kotliar, X. Shirley Liu, Manolis Kellis, Michael O’Donovan, Bogdan Pasaniuc, Abhishek Sarkar, Patrick Sullivan, Bjarni Vilhjalmsson, and Adrian Veres for helpful discussions. This research was funded by NIH grants R01 MH101244, R03 CA173785, and 1U01HG0070033. H.K.F. was supported by the Fannie and John Hertz Foundation. S.R. is supported by funding from the Arthritis Foundation and by a Doris Duke Clinical Scientisit Development Award. This study made use of data generated by the Wellcome Trust Case Control Consortium (WTCCC) and the Wellcome Trust Sanger Institute. A full list of the investigators who contributed to the generation of the WTCCC data is available at www.wtccc.org.uk. Funding for the WTCCC project was provided by the Wellcome Trust under award 076113. The members of the Schizophrenia Working Group of the Psychiatric Genetics Consortium are listed in the Supplementary Information.