Abstract
The promoters of mammalian genes are commonly regulated by multiple distal enhancers, which physically interact within discrete chromatin domains. How such domains form and how the regulatory elements within them interact within single cells is not understood. To address this we developed Tri-C, a new Chromosome Conformation Capture (3C) approach to identify concurrent chromatin interactions at individual alleles within single cells. The heterogeneity of interactions observed between such cells shows that CTCF-mediated formation of chromatin domains and interactions within them are dynamic processes. Importantly, our analyses reveal higher-order structures involving simultaneous interactions between multiple enhancers and promoters within individual cells. This provides a structural basis for understanding how multiple cis-elements act together to establish robust regulation of gene expression.
Introduction
Precise spatial and temporal patterns of gene expression during development and differentiation are controlled by cis-regulatory elements including promoters, enhancers and boundary elements. The interaction and activity of these elements depend on their structural organization within the nucleus (1). To date the relationship between structure and function has mainly been analyzed in populations of cells. The globin loci, which provide ideal models to elucidate the general principles of mammalian gene regulation, have been extensively studied in this way. For example, we have previously shown that the active mouse α-globin cluster and its regulatory elements are located in a decompacted ~70 kb chromatin domain that forms early in erythroid differentiation and is flanked by CTCF-binding sites (2–4). Within this domain, the α-globin genes interact with five enhancer elements, which cooperate in an additive manner to upregulate gene expression (5). Such activity of multiple enhancer elements has been frequently observed in a wide variety of mammalian gene loci and is thought to impart robustness to patterns of gene expression underlying cell fate decisions (6).
It is now known that analyzing populations of cells may not reveal the true mechanisms underlying biological processes that are revealed by single cell analysis. At present, we do not know how promoters, enhancers and boundary elements physically interact as genes are switched on in single cells. To address this important question, we developed Tri-C, a new Chromosome Conformation Capture (3C) technique, which can identify concurrent chromosomal interactions at individual alleles, and consequently provides information derived from single cells. We combined conventional 3C and Tri-C experiments to perform in-depth characterization of the structural architecture of the murine globin loci in single cells.
Our analyses of the single-cell conformations of the domains containing the globin loci show that interactions between boundary elements are heterogeneous, implying that they are dynamic rather than fixed, thus providing support for a loop extrusion mechanism contributing to the formation of chromatin domains. Furthermore, we observe higher-order complexes within these domains in which multiple enhancers and promoters interact within individual nuclei. This shows that more than one enhancer frequently interacts simultaneously with individual promoters within a single cell. This is consistent with previous observations showing that fully regulated expression of the murine globin genes (5, 7) and many other genes (6) depends on the presence of more than one enhancer element. The formation of such enhancer-promoter complexes also explains the function of apparent redundant enhancer elements (8, 9), which could play a role in the formation of robust structures required for assembly of the transcriptional machinery at gene promoters.
Results
Activation of the murine globin loci is associated with the formation of strongly compartmentalized domains in which enhancers and promoters interact
To characterize the large-scale conformations of the murine globin loci, we performed Hi-C in primary mouse erythroid cells, in which the globin genes are highly expressed, and compared this to an equivalent Hi-C dataset from mouse embryonic stem (ES) cells (10), in which the globin loci are silent (Supplementary Figure 1). This comparison shows that the chromatin regions containing the globin clusters form strong, tissue-specific, self-interacting domains in erythroid cells, but not in ES cells. To characterize the interactions within these domains at higher resolution, we performed multiplexed, high-resolution Capture-C experiments in erythroid and ES cells from ∼45 viewpoints, which were closely spaced across the loci and included all known regulatory elements. Figure 1 shows interaction profiles in the α-globin domain from the viewpoint of the promoters, the strongest enhancer element (R2), and the CTCF boundary elements on either side of the domain. These profiles show strong reciprocal interactions between the α-globin promoters and all its enhancer elements in erythroid cells. The flanking CTCF-binding elements do not participate in these enhancer-promoter interactions, but form diffuse interactions with the chromatin bound by CTCF on either side of the domain, spanning contiguous regions of ∼50 kb. The nature of these interaction patterns is highlighted in the contact matrices derived from the viewpoints tiled across a 300 kb region containing the locus (Figure 2).
Although the contacts between the enhancers and promoters are more discrete compared to those between the CTCF-binding sites, interaction frequencies are significantly enriched throughout the domain in erythroid cells. Importantly, this is not consistent with a model in which enhancers and promoters simply form stable, distinct loops (1, 11). Rather, these interactions are more readily interpreted as a compartmentalized domain, in which, at some point, every region of chromatin interacts with every other, and preferred, transiently stabilized structures are formed between regulatory elements. Analysis of the β-globin locus reveals a similar interaction landscape (Supplementary Figures 2, 3).
Our analysis by Capture-C suggests that all enhancers and promoters in the globin loci can form interactions. However, as these data are derived from pair-wise contacts in populations of cells, they reflect multiple dynamic conformations that obscure specific higher-order structures associated with productive interactions between enhancers and promoters in individual cells. In particular, it is not clear whether cis-interactions between multiple enhancers and promoters occur simultaneously in a single cell nucleus, or if these elements compete for the formation of exclusive interactions. Therefore, it remains unclear how these regulatory elements interact to ensure robust regulation of gene expression.
Tri-C detects multi-way interactions with unprecedented sensitivity
Questions regarding the structural interaction between regulatory elements can be addressed by analyzing interactions between these elements in individual cells. Though single-cell Hi-C (12–14) and Genome Architecture Mapping (15) have provided insights into chromosomal structures in single cells at large scales, the resolution of these techniques (100 kb and 30 kb, respectively) is insufficient to be informative at the level of individual regulatory elements. To overcome these limitations, we explored a different strategy to analyze chromosomal structures in single cells. Because in situ 3C libraries contain long DNA concatemers in which neighboring fragments represent chromatin regions that were in close proximity in individual nuclei, single-cell chromatin conformations can be derived from population-based assays in which several neighboring fragments in the 3C concatemer are identified simultaneously. However, the proportion of reads that contain multiple interacting fragments in conventional 3C techniques is very low (<2%) (16–19) and despite advances in efficiency in a recent 3way-4C approach (20), resolution and sensitivity of current techniques are insufficient for robust detection of higher-order structures formed by individual regulatory elements in single cells.
To analyze such structures at high resolution, we developed Tri-C, a new 3C method that can identify multi-way interactions with viewpoints of interest with unprecedented sensitivity (Figure 3a). For efficient detection of multiple ligation junctions between interacting fragments, Tri-C libraries are generated using an enzyme selected to create relatively small DNA fragments (∼200 bp) for the viewpoints of interest. We chose the restriction enzyme NlaIII, which has a recognition sequence of four bp and therefore cuts on average every 256 bp. Sonication of these libraries to ∼450 bp generates fragments of which about ∼50% contain multiple ligation junctions. Using highly optimized oligonucleotide capture-mediated enrichment for viewpoints of interest, millions of multi-way contacts can be identified with Illumina sequencing, generating 3C profiles at unprecedented depth (Figure 3b). Importantly, Tri-C allows for multiplexing both viewpoints and samples, enabling analyses of multiple genomic regions and cell types of interest in a single experiment. Because Illumina sequencing provides accurate identification of the random sonication ends of the reads, these can be used as unique molecular identifiers to filter out PCR duplicates, allowing for quantitative analysis of the detected multi-way interactions.
To validate that Tri-C detects reliable multi-way interactions in individual cells, we performed several additional experiments and analyses. First, to confirm that capturing multiple ligation junctions simultaneously did not introduce a bias in the detected interactions, we compared the pair-wise interactions derived from Tri-C to Capture-C interaction profiles from the same viewpoint (Supplementary Figure 4). Next, we validated the detected multi-way interactions by long-read Nanopore sequencing of PCR-enriched ligation events (Supplementary Figure 5). Finally, to confirm that Tri-C interactions represent single-cell chromosome conformations, we show that the detected multi-way interactions are allele-specific (Supplementary Figure 6).
Multiple enhancers and promoters form higher-order chromatin structures in which they interact simultaneously
We used Tri-C to analyze the higher-order structures around regulatory elements in the globin loci in erythroid and ES cells. To visualize these 3D structures, we represent the multiway interactions in contact matrices in which we excluded the viewpoint of interest and plotted the frequencies with which two elements were captured simultaneously with this viewpoint. Mutually exclusive contacts between elements appear as depletions at the intersections between such elements in the contact matrix, whereas preferential simultaneous interactions in higher-order structures are visible as enrichments at these foci.
Analysis from the R2 viewpoint, the strongest α-globin enhancer (5), shows clear enrichments at the intersections with the other enhancers (R1, R3, Rm and R4) and promoters in erythroid cells (Figure 4a). These interactions are not observed in ES cells (Figure 4b) and direct comparisons show over 100-fold enrichment for the multiple enhancer-promoter interactions in erythroid cells (Figure 4c), demonstrating that these do not simply reflect genomic proximity. These patterns and enrichments are highly reproducible between biological replicates (Supplementary Figure 7). Moreover, the interactions are enriched compared to the contact distribution detected by the multiplexed Capture-C experiments (Figure 4d), highlighting that these higher-order structures cannot be predicted by pair-wise 3C data. Tri-C analysis of HS2, one of the strongest β-globin enhancers (7), shows very similar structures (Supplementary Figure 8).
These analyses demonstrate that multiple enhancers and promoters form higher-order structures in which they simultaneously interact together to switch genes on. Of interest, the promoter of the housekeeping gene Nprl3, which is six-fold upregulated in erythroid cells (2), is included in the complexes formed with the α-globin enhancers and promoters, highlighting that there is no competition between genes for mutually exclusive interactions with enhancers.
CTCF-binding sites form dynamic interactions supportive of a loop extrusion mechanism underlying boundary formation
We have previously shown that CTCF-binding sites flanking the α-globin locus contribute to the formation of a domain that delimits the region of chromatin within which the observed complexes between enhancers and promoters can be formed (2). However, the processes underlying the formation of such chromatin domains and their contribution to enhancer-promoter specificity remain unclear. Based on Hi-C data, it has been suggested that CTCF-binding sites located at domain boundaries form specific loops and that multiple CTCF-binding sites might form multi-anchored and/or nested structures (21). Our Capture-C data show that the functionally validated CTCF-binding site HS-39 (2) located upstream of the a-globin cluster predominantly interacts with a region downstream of the domain, which also contains many CTCF-binding sites (Figures 1 and 2). However, the pattern of interactions is very broad compared to the interactions between enhancers and promoters and it is unclear whether these interactions represent stable loops between multiple CTCF-binding sites or represent a more dynamic mechanism by which domain boundaries are formed.
To resolve these structures within individual nuclei, we performed Tri-C analysis from the viewpoint of the HS-39 CTCF-binding site (Figure 5). Consistent with the pair-wise interaction data, the multi-way interactions are preferentially formed with the region located downstream of the α-globin domain containing many CTCF-binding sites. A model in which CTCF boundaries are formed by stable, multi-anchored loops would result in specific enrichments of multi-way contacts between these CTCF-binding sites. However, the Tri-C contact matrix in erythroid cells shows a diffuse enrichment with the entire region on the other side of the α-globin domain. This diffuse enrichment is confined to the viewpoint diagonal and the base of the matrix, which indicates that the only simultaneous interactions we observe with the HS-39 CTCF-binding site are proximal to HS-39 itself or to its interacting partner. This shows that CTCF-binding sites do not form specific higher-order structures. Rather, the diffuse pattern indicates that HS-39 forms dynamic interactions with the entire region flanking the opposite side of the α-globin domain. The spread of interactions suggests that conformations seen in single cells represent snapshots of a dynamic scanning process throughout this chromatin region.
Tri-C analysis from the viewpoint of the 3’HS1 CTCF-binding site in the β-globin locus shows a similar pattern (Supplementary Figure 9). Our data are therefore not consistent with the formation of stable loops between CTCF-binding sites at domain boundaries. Rather, the Tri-C interaction patterns provide support for a loop extrusion mechanism, in which the formation of chromatin domains is mediated by protein complexes, likely involving Cohesin, that translocate along chromosomes, bringing every region in contact with each other (22–24). Continuous scanning across chromatin regions and transient stalling of these protein complexes at CTCF boundary elements explains the enrichment over a broad region of chromatin containing many CTCF-binding sites. In contrast to these preferential interactions with the opposite boundary region in erythroid cells, the multi-way interactions of CTCF-binding sites around the globin clusters in ES cells are more symmetrical along the genome. However, the patterns of interactions, with broad enrichments along the diagonal and the base of the matrix, are similar, indicating that loop extrusion is a general mechanism contributing to chromosome organization in all cell types.
Discussion
Formation of chromatin domains by the proposed loop extrusion mechanism could explain many features of chromosome organization (22–24). However, the process of loop extrusion and the resulting dynamic chromatin structures have not been observed directly, and current evidence is derived from polymer model predictions (23, 24) and perturbations of specific components of the loop extrusion machinery (25–27). Here, we show for the first time that high-resolution chromatin structures in single cells do not support a model of stable loops between CTCF-binding sites, but indicate a dynamic mechanism such as loop extrusion underlying the formation of chromatin domains. Importantly, the diffuse enrichment patterns we observe are indicative of transient CTCF-binding site interactions, which is in agreement with the kinetics of CTCF binding to the genome (28).
The preferential interactions between CTCF-binding sites at opposite ends of the globin domains in erythroid cells and the more symmetrical patterns in ES cells explain the formation of tissue-specific domains in erythroid cells (Figure 5 and Supplementary Figure 9). We have shown before that CTCF occupancy is similar in both cell types (2). These different structures therefore likely reflect differences in the processivity of loop extruding factors such as Cohesin, which could result from tissue-specific recruitment locations and/or extrusion rates. It has been shown that Mediator, a transcriptional coactivator bound at active enhancers and promoters, forms complexes with Cohesin and the Cohesin-loading factor Nipbl (29), and we have previously observed binding of Cohesin and Mediator at the α-globin enhancers in erythroid cells (2, 5). The erythroid-specific formation of the globin domains could therefore be explained by a loop extrusion model in which Cohesin is recruited to the genome at active enhancers and/or promoters. Recent findings have suggested that Cohesin translocation is stimulated by Nipbl, suggesting that Nipbl abundance determines extrusion rates (30). Differences in Nipbl distribution could therefore also contribute to tissue-specific interaction patterns, though this remains to be further explored. Within the dynamic chromatin domains containing the globin gene clusters, we identify higher-order structures in which multiple enhancers and promoters interact simultaneously in individual cells. These complex, tissue-specific structures, cannot be explained by CTCF/Cohesin-mediated loop extrusion alone and indicate other, independent mechanisms contributing to chromosome architecture. This is consistent with polymer models (31) and recent studies in which Cohesin binding was perturbed, but interactions between enhancers and promoters still occurred, though more promiscuous (25, 26). Importantly, contacts between these elements do not represent stable, exclusive enhancer-promoter loops (11), but preferred interactions within dynamic compartmentalized domains, in agreement with polymer model predictions (32). Our data could therefore be explained by a loop extrusion mechanism that brings regulatory elements into close proximity and enables subsequent formation of more complex, stabilized structures. This is likely mediated by multi-protein complexes and could contribute to or result from the formation of phase-separated assemblies of components of the transcriptional machinery (33) (Figure 6).
The formation of such complexes in which multiple enhancers simultaneously interact with the genes they regulate indicates cooperation rather than competition between enhancer elements. This is consistent with the observed additive effects of individual enhancers at the globin (5, 7) and many other genes (6). Importantly, we have previously shown that no single enhancer element in the α-globin locus is critical for the formation of the chromatin structures associated with active α-globin transcription (3, 5). The formation of complexes in which multiple enhancers and promoters interact simultaneously therefore provides a structural basis for the observed functional cooperativity and also suggests a role for apparent redundant enhancer elements (8, 9). Such ‘shadow’ enhancers could have a structural function in forming and maintaining effective platforms for assembly of the transcriptional machinery and ensure the formation of robust complexes, even in the context of mutations or deletions in other enhancer elements.
This highlights that Tri-C analyses not only contribute to our fundamental understanding of the relationship between genome structure and function, but are also a valuable tool to interpret how genetic variations can disrupt complex chromatin structures and cause misregulation of gene expression and disease.
Methods
Cells
Primary murine ter119+ erythroid cells were obtained from spleens of female C57BL/6 mice treated with phenylhydrazine as previously described (34). Mouse embryonic stem (ES) cells (129/Ola) were derived from mice at embryonic day 14 and cultured and harvested as previously described (34).
Hi-C – Experimental procedure
Hi-C in primary murine ter119+ erythroid cells was performed as previously described (35). An equivalent dataset in mouse ES cells (Cast/129) was used for comparative analyses (10).
Hi-C – Data analysis
Hi-C data were analyzed using the HiC-Pro pipeline (36). Reads were aligned to the mm9 reference genome using Bowtie2, with minor modifications to the recommended options (erythroid: -k 3 –score-min L,-0.6,-0.2; ES: -k 3 –score-min L,-0.6,-0.6) to allow for multimapping in the duplicated regions in the globin loci and better alignment of the ES data in the β-globin region, which contains many SNPs in the Cast/129 strain compared to the mm9 reference.
TADs were identified based on insulation indices using TADtool (37). A/B compartmentalization was analyzed as previously described (38).
Capture-C – Experimental procedure
Capture-C data were generated using the Next-Generation Capture-C protocol (34, 39). The DpnII restriction enzyme was used for digestion during 3C library preparation.
Because exclusion zones around all viewpoints analyzed in a multiplexed experiment are removed from analysis, we performed several Capture-C experiments to characterize the complete interaction landscapes of all cis-regulatory elements of interest in the globin loci. We used the following combinations of viewpoints in six independent experiments:
α-globin locus: R1 and R2 (enhancers); β-globin locus: HS1 and HS2 (enhancers);
α-globin locus: R3, Rm and R4 (enhancers); β-globin locus: HS3 and HS4 (enhancers);
α-globin locus: Hbq-1 and Hbq-2 (θ-globin promoters/CTCF-binding sites); β-globin locus: HS5 and HS6 (enhancers/CTCF-binding sites);
α-globin locus: HS-38 and HS-39 (upstream CTCF-binding sites); β-globin locus: HS-57 (enhancer), HS-60 and HS-90 (upstream CTCF-binding sites);
α-globin locus: HS+44 and HS+48 (downstream CTCF-binding sites); β-globin locus: 3’HS1 and 3’HS2 (downstream CTCF-binding sites);
α-globin locus: Il9r, Snrnp25, Rhbdf1, Mpg and Nprl3 (promoters);
To cover the remaining regions of the α- and β-globin loci for the generation of a high-resolution all vs all contact matrix, we performed an additional multiplexed experiment with viewpoints tiled across a 300 kb window around both globin clusters.
Capture oligonucleotides were designed using CapSequm (34). Overviews of the viewpoint DpnII fragments in the α- and β-globin loci are shown in Supplementary Tables 1 and 2, respectively.
We used three biological replicates of primary murine ter119+ erythroid and ES cells in every experiment, which were pooled after ligation of indexed sequencing adapters.
The generated Capture-C libraries were sequenced using Illumina sequencing platforms (V2 chemistry; 150 bp paired-end reads).
Capture-C – Data analysis
Capture-C data were analyzed as previously described (34). Reads were aligned to the mm9 reference genome using Bowtie1 with the following options: -p 1 -m 2 -v 3 –best –strata. The -m 2 option was used to allow for multi-mapping in the duplicated regions in the globin loci. The -v 3 option was used to allow up to three mismatches to improve alignment of the ES data in the β-globin region, which contains many SNPs in the 129/Ola strain compared to the mm9 reference. Because some SNPs were located in close proximity to or even in DpnII cut sites, alignment of reads with viewpoints in these DpnII fragments remained sub-optimal. The quality of some Capture-C profiles in the β-globin region in ES cells is therefore somewhat compromised, though still highly interpretable.
As PCR duplicates are removed during data analysis, Capture-C accurately quantifies chromatin interactions (40, 41). The Capture-C profiles in the figures represent the mean number of unique interactions per restriction fragment from three replicates, normalized for a total of 100,000 interactions on the chromosome analyzed, and scaled to 1,000.
Tri-C – Experimental procedure
The R2 enhancer and HS-39 CTCF-binding site in the α-globin locus, and the HS2 enhancer and 3’HS1 CTCF-binding site in the β-globin locus are located on small NlaIII restriction fragment (Supplementary Table 3). We therefore used the NlaIII restriction enzyme for digestion of 3C libraries, which were prepared as previously described (34). To be able to capture multiple restriction fragments in individual reads, we optimized the subsequent sonication step to generate DNA fragments of ∼450 bp, using a Covaris S220 Focussed Ultra-Sonicator with the following settings: one cycle of 55 s; duty cycle: 10%; intensity: 4; cycles per burst: 200. DNA clean-up and size selection after sonication were performed with Ampure XP beads in a 0.7:1 bead-sample ratio. Subsequent ligation of sequencing adapters and indexing and multiplexing of libraries were performed as described previously (34). To enrich for reads containing the viewpoint fragments of interest, we performed a double oligonucleotide capture as previously described (34), using a 13 fmol pool of capture oligonucleotides.
We initially performed an experiment with three biological replicates of primary erythroid cells, which was sequenced using the Illumina MiSeq platform (V2 chemistry; 250 bp paired-end reads). In silico trimming of the 250 bp paired-end reads showed that there was little benefit of using 500 cycles of sequencing compared to 300 cycles for capturing multiple reporters in the reads. Therefore, to generate data at sufficient depth, we performed a second multiplexed experiment with seven additional technical replicates (derived from three biological replicates) of erythroid and ES cells, which was sequenced using the Illumina NextSeq platform (V2 chemistry; 150 bp paired-end reads).
Tri-C – Data analysis
Tri-C data were analyzed using a combination of publicly available tools and customized scripts. Trim_galore (Babraham Institute, http://www.bioinformatics.babraham.ac.uk/projects/trim_galore/) was used to remove adapter sequences in the reads. Where possible, paired-end reads were reconstructed into single reads using FLASH with interleaved output settings. A custom script was used to perform an in silico restriction enzyme digestion, after which the reads were aligned to the reference genome using Bowtie1 (-p 1 -m 2 -v 3 –best –strata). The aligned reads were analyzed with custom scripts to identify captured reads containing the targeted viewpoints of interest. Restriction fragments in captured reads were defined as interacting ‘reporter’ fragments if they were located outside a ∼1 kb exclusion zone around the viewpoint restriction fragment. PCR duplicates were removed by excluding reads that had the same start and end coordinates of each individual restriction fragment. Reads with two or more reporters were used to calculate interaction frequencies between reporter fragments for each viewpoint. An overview of the numbers of detected reporters is shown in Supplementary Table 4. The interaction frequencies were normalized for a total of 100,000 interactions on the chromosome analyzed and binned in 500 bp bins, to allow the data to be represented in symmetrical contact matrices.
C-Trap – Experimental procedure
3C libraries were prepared as previously described (34), using the DpnII restriction enzyme for digestion. Primers targeting the restriction fragments of interest were designed >100 bp away from the restriction site (Supplementary Table 5), to allow for the selection of reads that were selectively amplified and of sufficient quality. PCR amplification was performed using the Takara Prime Star GXL 2-step amplification program for 10-30 kb amplicons. The Agencourt AMPure XP system was used to purify the PCR product and select fragments >600 bp. The purified amplicons were adapter-ligated and sequenced using the Oxford Nanopore MinION platform (MAP-005 chemistry).
C-Trap – Data analysis
The MinION reads were basecalled using Metrichor and converted to fasta format using Poretools (43). To select reads that were specifically amplified and of sufficient quality, a BLAST search against a database containing the sequences between the primers and the restriction sites of the targeted fragments was performed. Reads that matched >60% of both primer sequences were selected. To map the trapped restriction fragments in the reads, a BLAST search against a database containing all the DpnII fragments in the genome was used. Custom scripts were used to iterate through the BLAST results and select the non-overlapping BLAST matches in the reads with a match >60% of fragment length or >100 bp (to avoid a skew towards smaller fragments) in order of significance of the BLAST scores. A summary of the read statistics is shown in Supplementary Table 6.