Abstract
Noncoding regulatory variants are often highly context-specific, modulating gene expression in a small subset of possible cellular states. Although these genetic effects are likely to play important roles in disease, the molecular mechanisms underlying context-specificity are not well understood. Here, we identify shared quantitative trait loci (QTLs) for chromatin accessibility and gene expression (eQTLs) and show that a large fraction (∼60%) of eQTLs that appear following macrophage immune stimulation alter chromatin accessibility in unstimulated cells, suggesting they perturb enhancer priming. We show that such variants are likely to influence the binding of cell type specific transcription factors (TFs), such as PU.1, which then indirectly alter the binding of stimulus-specific TFs, such as NF-κB or STAT2. Our results imply that, although chromatin accessibility assays are powerful for fine mapping causal noncoding variants, detecting their downstream impact on gene expression will be challenging, requiring profiling of large numbers of stimulated cellular states and timepoints.
Genetic differences between individuals can profoundly alter how their immune cells respond to environmental stimuli (1). At the molecular level, these differences manifest as expression quantitative trait loci (eQTLs) that alter the magnitude of gene expression change after stimulation (response eQTLs) (2–7). Although response eQTLs have been implicated in modulating risk for complex immune-mediated disorders (8, 9), the molecular mechanisms that give rise to these context-specific effects are poorly understood. The majority of eQTLs also alter chromatin accessibility, presumably reflecting disruption of transcription factor (TF) binding (10). Because cellular response to external stimuli is regulated by stimulus-specific TFs, response eQTLs might directly disrupt their binding (Fig. 1A). In support of this model, a number of studies have observed that response eQTLs are enriched at the binding sites of stimulation-specific TFs such as NF-κB and STAT2 (5–7). However, a single stimulus or a developmental cue can upregulate alternate sets of genes in different cell types, even when the activated signalling pathways and TFs remain the same (11). To explain these observations, multiple studies have proposed a hierarchical enhancer activation model (11–14), under which cell type specific TFs bind to a subset of enhancers without a direct effect on target gene expression. This enhancer ‘priming’ can facilitate their subsequent activation by signal-specific TFs, producing a cell type specific response (Fig. 1B). Thus, genetic variants could modulate stimulus-specific effects on gene expression indirectly, by altering the binding of a cell type specific TF, for example PU.1 in macrophages, that regulates chromatin accessibility (Fig. 1B). However, the genome-wide prevalence of enhancer priming is currently unclear because directed genome editing studies have been limited to a handful of loci (15, 16).
A powerful alternative is to use shared genetic associations at chromatin and gene expression level to probe the relationships between enhancer accessibility and gene transcription.
We focussed on enhancer priming in the context of human macrophage immune response. To ensure sufficient numbers of cells, we differentiated macrophages from a panel of 123 human induced pluripotent stem cell lines (iPSCs) obtained from the HipSci project (17, 18). We profiled gene expression (RNA-seq) and chromatin accessibility (ATAC-seq) in a subset of 86 successfully differentiated lines (fig. S1, table S1) in four experimental conditions: naive (N), 18 hours IFNɣ stimulation (I), 5 hours Salmonella enterica serovar Typhimurium (Salmonella) infection (S), and IFNɣ stimulation followed by Salmonella infection (I+S) (Fig. 1C). We chose these stimuli because they activate distinct, well-characterised signalling pathways (Fig. 1D, fig. S2) and because stimulating macrophages with IFNɣ prior to bacterial infection is known to lead to enhanced microbial killing and stronger activation of the inflammatory response (19, 20).
We identified common genetic variants that were associated with either gene expression (eQTLs) or chromatin accessibility (caQTLs). Using an allele-specific method implemented in RASQUAL (23), we detected at least one QTL for up to 3,431 genes and 20,788 chromatin regions (caQTL regions) in each condition (10% FDR), 50-75% of which were shared between conditions (fig. S3). Next, using a statistical interaction test followed by filtering on effect size, we identified 387 response eQTLs and 2,247 response caQTLs with a small or undetectable effect (fold change < 1.5) in the naive state that increased at least 1.5-fold after stimulation (see Methods). These genetic effects displayed a variety of activity patterns (Fig. 2A, fig. S4). Strikingly, 18% of the response eQTLs appeared only after the cells were exposed to both stimuli (cluster 1), exceeding the number that appeared after IFNɣ stimulation alone (clusters 5 and 6). Response caQTL regions harboured closed chromatin in the naive cells (median transcripts per million (TPM) = 0.49) and became 3.8-fold more accessible only after the relevant stimulus (fig. S4). Furthermore, response caQTLs were enriched for disrupting stimulus-specific TF motifs (fig. S4), suggesting that they are largely driven by TFs that bind to DNA only after stimulation.
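The effect size filter described above can be sketched as follows. This is only an illustration under stated assumptions: it takes one allelic fold change per condition (expressed as a ratio of at least 1), and it reads "increased at least 1.5-fold" as the ratio of the stimulated to the naive fold change. The precise definitions are given in the Methods, the interaction test that precedes the filter is omitted, and the function name and input layout are hypothetical.

```python
# Thresholds quoted in the text; everything else here is an assumption.
NAIVE_MAX_FOLD_CHANGE = 1.5   # effect must be small or undetectable in naive cells
MIN_FOLD_INCREASE = 1.5       # effect must grow at least this much after stimulation


def is_response_qtl(fold_changes, naive="N"):
    """Return True if a QTL passes the sketched effect size filter.

    fold_changes maps a condition label (e.g. "N", "I", "S", "I+S") to the
    allelic fold change of the QTL in that condition (assumed >= 1).
    """
    naive_fc = fold_changes[naive]
    if naive_fc >= NAIVE_MAX_FOLD_CHANGE:
        return False  # already a sizeable effect before stimulation
    # Keep the QTL if any stimulated condition shows a large enough increase.
    return any(
        fc / naive_fc >= MIN_FOLD_INCREASE
        for condition, fc in fold_changes.items()
        if condition != naive
    )


# Made-up effect sizes, for illustration only.
appears_after_stimulation = {"N": 1.1, "I": 1.2, "S": 1.8, "I+S": 2.0}
present_in_naive = {"N": 1.6, "I": 2.6, "S": 2.7, "I+S": 2.9}
print(is_response_qtl(appears_after_stimulation))
print(is_response_qtl(present_in_naive))
```

Under these assumptions the first toy QTL passes the filter and the second does not, because its naive effect already exceeds the 1.5 threshold.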
To quantify the extent of enhancer priming in macrophage immune response, we next focussed on how response eQTLs manifest at the chromatin level. We grouped response eQTLs (Fig. 2A) by the condition in which they had the largest effect size (I, S or I+S). We then used linkage disequilibrium (LD) (R² > 0.8) between the lead variants to identify caQTL-eQTL pairs that were likely to be driven by the same causal variant (see Methods). For example, we identified a QTL upstream of GP1BA that had no effect in naive cells, but became simultaneously associated with chromatin accessibility and gene expression after IFNɣ + Salmonella stimulation (Fig. 2D). The lead caQTL variant (rs4486968) was predicted to disrupt an NF-κB binding motif (fig. S5), illustrating how a genetic variant can have a direct effect on stimulus-specific TF binding and gene expression. In contrast, a genetic variant in an intron of NXPH2 modulated the accessibility of a regulatory element both in naive and stimulated cells, but only became associated with gene expression after IFNɣ stimulation (Fig. 2E). Genome-wide, we found that for approximately half of all response eQTLs, the linked caQTL was present in naive cells prior to stimulation (caQTL fold change > 1.5), suggesting that many response eQTLs disrupt enhancer priming (Fig. 2B).
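The LD-based pairing step can be illustrated with a short sketch. It assumes that LD is measured as the squared Pearson correlation between genotype dosages (0, 1 or 2 copies of the alternative allele) of the two lead variants; the variant, gene and region identifiers and the genotypes below are made up, and the actual procedure is the one described in the Methods.

```python
def r_squared(dosages_a, dosages_b):
    """Squared Pearson correlation between two genotype dosage vectors."""
    n = len(dosages_a)
    mean_a = sum(dosages_a) / n
    mean_b = sum(dosages_b) / n
    cov = sum((a - mean_a) * (b - mean_b) for a, b in zip(dosages_a, dosages_b))
    var_a = sum((a - mean_a) ** 2 for a in dosages_a)
    var_b = sum((b - mean_b) ** 2 for b in dosages_b)
    if var_a == 0 or var_b == 0:
        return 0.0  # a monomorphic variant carries no LD information
    return cov * cov / (var_a * var_b)


def linked_pairs(eqtl_leads, caqtl_leads, genotypes, threshold=0.8):
    """Pair each eQTL gene with every caQTL region whose lead variant is in
    strong LD (r2 above the threshold) with the eQTL lead variant."""
    pairs = []
    for gene, eqtl_variant in eqtl_leads.items():
        for region, caqtl_variant in caqtl_leads.items():
            r2 = r_squared(genotypes[eqtl_variant], genotypes[caqtl_variant])
            if r2 > threshold:
                pairs.append((gene, region))
    return pairs


# Hypothetical toy data: eight individuals, three variants.
genotypes = {
    "var_a": [0, 1, 2, 0, 1, 2, 0, 1],
    "var_b": [0, 1, 2, 0, 1, 2, 0, 2],  # differs from var_a in one individual
    "var_c": [2, 0, 1, 1, 0, 2, 1, 0],  # essentially unlinked to var_a
}
eqtl_leads = {"GENE1": "var_a"}
caqtl_leads = {"region_1": "var_b", "region_2": "var_c"}
pairs = linked_pairs(eqtl_leads, caqtl_leads, genotypes)
print(pairs)
```

In this toy example only region_1 is paired with GENE1. As the following paragraph notes, high LD does not guarantee a shared causal variant, so pairs found this way include some false positives.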
One potential issue with our analysis is that using LD to identify eQTL-caQTL pairs will sometimes lead to false positives where two independent causal variants for different phenotypes are mistaken for a single shared causal variant. To estimate our false positive rate, we performed a reverse analysis where we asked how often response caQTLs were linked to eQTLs that were present in the naive state, reasoning that these are likely to be false positives. Using the same fold change threshold as above, we estimated the mean false positive rate to be 17% (Fig. 2C).
We speculated that response eQTLs that alter enhancer priming should be enriched for disrupting the motifs of macrophage cell type specific TFs. To test this, we focussed on the 145 eQTL-caQTL pairs (137 unique caQTLs) identified above (Fig. 2B). We found that 9/78 caQTLs present in the naive cells disrupted PU.1 motifs compared to none of the 59 caQTLs that appeared together with the response eQTL (Fisher’s exact test, p = 0.01). For example, the rs7594476 variant in the NXPH2 enhancer disrupted PU.1 binding in a direction consistent with the caQTL effect (Fig. 3A).
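The PU.1 motif comparison above is a 2×2 contingency test: 9 of 78 caQTLs present in naive cells disrupt a PU.1 motif, against 0 of 59 caQTLs that appear together with the response eQTL. A minimal sketch of a two-sided Fisher's exact test on these counts, written with the standard library only, is shown below. The function name is hypothetical, and the two-sided convention used here (summing all tables no more probable than the observed one) is an assumption about how the reported p-value was obtained.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]].

    Sums the hypergeometric probabilities of all tables with the same
    margins that are no more probable than the observed table.
    """
    row1, row2, col1 = a + b, c + d, a + c
    total = row1 + row2
    lowest = max(0, col1 - row2)
    highest = min(col1, row1)

    def weight(x):
        # Unnormalised hypergeometric probability (exact integer arithmetic).
        return comb(row1, x) * comb(row2, col1 - x)

    observed = weight(a)
    as_extreme = sum(
        weight(x) for x in range(lowest, highest + 1) if weight(x) <= observed
    )
    return as_extreme / comb(total, col1)


# Rows: caQTL present in naive cells / caQTL appearing with the response eQTL.
# Columns: disrupts a PU.1 motif / does not.
p_value = fisher_exact_two_sided(9, 69, 0, 59)
print(f"p = {p_value:.3f}")
```

With these counts the sketch gives a p-value of approximately 0.01, in line with the value reported in the text.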
Recent evidence suggests that single genetic variants can modulate the activity of multiple regulatory elements within topologically associated domains (23–26). One plausible mechanism for these broad associations is that a single causal variant may directly regulate the accessibility of a “master” region, which subsequently influences neighbouring “dependent” regions (23). We used caQTL summary statistics to heuristically identify likely master and dependent regions, assuming that the causal variant should reside within the master region itself and that this in turn affects accessibility in dependent regions (Fig. 3B) (see Methods). We found a striking example of such a relationship at the NXPH2 locus, where a putative causal variant in the master region was also associated with the accessibility of a neighbouring dependent region after IFNɣ stimulation (Fig. 3A). Using this approach, we identified 2,934 dependent regions that belonged to 1,921 unique master regions (Fig. 3B). While 77% of the master regions had a single dependent region only a few kb away (fig. S6), we found many loci where master peaks were associated with multiple regions of open chromatin (Fig. 3C). In the NXPH2 locus introduced above, we detected 18 dependent regions spanning 100 kilobases of DNA (Fig. 3C), six of which appeared only after IFNɣ stimulation (Fig. 3D,F). Notably, the appearance of condition-specific dependent regions correlated with the caQTL becoming a response eQTL for both NXPH2 and SPOPL (Fig. 3E), suggesting that some of them might be required for gene activation. Using a linear model followed by strict filtering (see Methods), we found a total of 64 condition-specific dependent regions genome-wide, two of which are highlighted in fig. S7.
Because they can be engineered with high efficiency, iPSC-derived cells are promising cellular models of disease. Similarly to previous studies (7), we found that macrophage eQTLs and caQTLs were enriched for GWAS hits of multiple immune-mediated disorders (fig. S8).
However, observing a genome-wide enrichment has only limited utility, and detailed follow-up of a locus is only justified when there is evidence for a shared causal mechanism between GWAS and eQTL associations. Thus, we used a statistical colocalisation test (28) to identify cases where the gene expression and trait association signals were consistent with a model of a single, shared causal variant. We identified 22 eQTLs (table S3) that showed evidence of colocalisation (PP3 + PP4 > 0.8, PP4/PP3 > 9) with at least one disease (see Methods).
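The two thresholds can be read as a simple filter on the posterior probabilities returned by the colocalisation test. In the usual notation of such tests, PP3 is the posterior probability that the two traits have distinct causal variants in the region and PP4 is the posterior probability of a single shared causal variant. The first condition therefore requires that both traits show an association at all, and the second that the shared-variant model is strongly favoured. A minimal sketch, in which the function name and the example values are hypothetical:

```python
def shows_colocalisation(pp3, pp4, min_total=0.8, min_ratio=9.0):
    """Apply the thresholds quoted in the text: PP3 + PP4 > 0.8 and PP4/PP3 > 9.

    The ratio is checked as pp4 > min_ratio * pp3 to avoid dividing by zero
    when pp3 is 0.
    """
    both_traits_associated = (pp3 + pp4) > min_total
    shared_variant_favoured = pp4 > min_ratio * pp3
    return both_traits_associated and shared_variant_favoured


# Made-up posterior probabilities, for illustration only.
print(shows_colocalisation(pp3=0.05, pp4=0.90))  # strong support for sharing
print(shows_colocalisation(pp3=0.30, pp4=0.60))  # both associated, sharing unclear
print(shows_colocalisation(pp3=0.01, pp4=0.50))  # too little evidence overall
```

Only the first of the three toy cases passes both conditions.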
Consistent with our enrichment analysis, we found the largest number of overlaps with inflammatory bowel disease (IBD) and rheumatoid arthritis (RA) (Fig. 4A). Interestingly, only 10/22 of the colocalised eQTLs were detected in the naive cells and each additional stimulated state increased the number of overlaps by approximately 30% (Fig. 4B). For example, we found an IFNɣ + Salmonella specific response eQTL for TRAF1 that colocalised with an RA GWAS hit (fig. S9). Although the same overlap was previously reported in whole blood (29), our data highlight the environmental condition in which the association is active.
Our analysis of enhancer priming suggested that many disease associations might manifest at the level of chromatin without an apparent effect on expression. To explore this further, we focussed on colocalisation between caQTLs and GWAS hits. We detected 24 caQTLs that colocalised with a GWAS hit (table S4), but only two of these also colocalised with an eQTL (PTK2B eQTL with Alzheimer's disease (fig. S10) and WFS1 eQTL with type 2 diabetes). Since genes often have multiple independent eQTLs (30), we reasoned that some caQTLs might be secondary eQTLs for their target genes. To capture these secondary effects, we first identified four additional genes that were associated with a caQTL lead variant at FDR < 10%, even though the caQTL and eQTL lead variants were not in strong LD (i.e. R² < 0.8). We repeated the colocalisation analysis on these loci and identified two additional overlaps (table S3), including a secondary eQTL for CTSB that colocalised with a GWAS hit for systemic lupus erythematosus (SLE) (Fig. 4C). Interestingly, although the CTSB eQTL appeared after IFNɣ + Salmonella stimulation, the caQTL was already present in naive cells. Although some caQTL colocalisations with eQTLs might remain undetected due to lack of power, the CTSB example suggests that a fraction of disease-associated caQTLs might correspond to primed enhancers that regulate gene expression in other, as yet unknown, conditions. Although the majority (22/24) of caQTL overlaps with disease were detected in the naive cells (Fig. 4C), this is confounded by the smaller ATAC-seq sample size in the Salmonella and IFNɣ + Salmonella conditions, which limited our power to detect colocalisations.
Discussion
The results of our study resolve an apparent paradox in the genetics of human complex traits: although disease loci from association studies are strongly enriched in regulatory elements (31, 32), a relatively small fraction are explained by known eQTLs, even those identified in trait-relevant tissues (29, 33, 34). Our results suggest that this apparent contradiction arises partly because many disease risk variants affect chromatin structure in a broad range of cellular states, but their impact on expression is highly context-specific. This conclusion is supported by studies of 3D chromatin structure linking GWAS loci to putative target genes but with no observable effect on gene expression (35), in particular because enhancer-promoter interactions are known to precede transcription (36, 37). We believe our result has important implications for future studies of human disease. First, it is likely that a large range of cellular states will need to be profiled in order to capture the effects of disease-associated variants on expression. Our results suggest this space will be challenging to explore systematically, especially given the numbers of novel associations we detect using a combination of just two stimuli. Second, overlap of disease variants with open chromatin, while likely to be informative regarding the identity of the causal variant, may be a less useful predictor of the disease-relevant cell state.
Although our study suggests that many human disease-associated variants impact enhancer priming, the functional relevance of this is currently not well understood. First, enhancer priming may facilitate cell type specific response to ubiquitous signals (11, 38, 39). Although specificity can also be achieved by cooperative binding to newly established enhancers (40), TFs differ in their intrinsic ability to bind to closed chromatin (41). Thus, enhancer priming might be a preferred mechanism of cooperation between ‘pioneer’ TFs that can independently open up chromatin (e.g. PU.1 in macrophages) and ‘settlers’ (e.g. NF-κB) that predominantly bind to accessible regions (42). Alternatively, enhancer priming might facilitate rapid response to external stimuli. In support of this model, promoters of immediate early response genes are already accessible in naive cells (43) and TF binding to primed enhancers peaks minutes after stimulation while the activation of de novo enhancers can take several hours (40). Thus, response eQTLs that appear rapidly after stimulation might be enriched for primed enhancers relative to those that appear later. Finally, enhancer priming might not be limited to single regulatory elements. Our results (Fig. 3D) together with previous reports (16, 44) suggest that some regulatory elements can act as ‘seed’ enhancers that allow other neighbouring enhancers to become active after stimulation and lead to upregulation of gene expression. Although we have identified a small number of such examples, future caQTL mapping studies in multiple cell types and conditions have the potential to systematically identify and characterise these hierarchical relationships between enhancers.
In summary, our results illustrate how pre-existing genetic effects on chromatin propagate to gene expression during immune activation, and highlight the relevance of these hidden genetic effects for deciphering the molecular architecture of disease-associated variants. Our study is also the first that we are aware of to utilise iPSC-derived cells to study genetic effects in immune response. We believe a major future use of this system will be the systematic exploration of gene-environment interactions across large numbers of cell states. Furthermore, because iPSCs are readily engineered, the identity of causal variants and their downstream consequences can be directly tested in exactly the same cell types and conditions where they were discovered.
Supplementary Tables
Table S1: Metadata for all iPSC to macrophage differentiation attempts.
Table S2: GARFIELD fold enrichments and p-values for 10 different GWAS traits.
Table S3: List of eQTLs colocalised with GWAS associations.
Table S4: List of caQTLs colocalised with GWAS associations.
Table S5: Metadata for the RNA-seq samples.
Table S6: Metadata for the ATAC-seq samples.
Acknowledgements
We thank Leopold Parts, Jeremy Schwartzentruber, Chris Wallace, Lili Milani, Kaido Lepik and Hedi Peterson for helpful comments on the manuscript. We thank Rachel Nelson for assistance and early access to HipSci iPSC lines. We also thank WTSI DNA Pipelines and Cytometry Core Facility for their sequencing and flow cytometry services. This work was supported by the Wellcome Trust grant #098051. K.A. was supported by a PhD fellowship from the Mathematical Genomics and Medicine programme from the Wellcome Trust. The iPSC lines were generated at the Wellcome Trust Sanger Institute, under the Human Induced Pluripotent Stem Cell Initiative funded by a strategic award (WT098503) from the Wellcome Trust and Medical Research Council. We also acknowledge Life Science Technologies Corporation as the provider of cytotune.