Abstract
Mammalian genomes are folded in a hierarchy of compartments, topologically associating domains (TADs), subTADs and looping interactions. Currently, there is a great need to evaluate the link between chromatin topology and genome function across many biological conditions and genetic perturbations. Hi-C generates high quality, high resolution maps of looping interactions genome-wide, but is intractable for high-throughput screening of loops across conditions because it requires an enormous number of reads (>6 billion) per library. Here, we describe 5C-ID, an updated version of Chromosome-Conformation-Capture-Carbon-Copy (5C) with restriction digest and ligation performed in the nucleus (in situ Chromosome-Conformation-Capture (3C)) and ligation-mediated amplification performed with a new double alternating design. 5C-ID reduces spatial noise and enables higher resolution 3D genome folding maps than canonical 5C, allowing for a marked improvement in sensitivity and specificity of loop detection. 5C-ID enables the creation of high-resolution, high-coverage maps of chromatin loops in up to a 30 Megabase subset of the genome at a fraction of the cost of Hi-C.
Introduction
Higher-order folding of chromatin in the 3D nucleus has been linked to genome function. Mammalian genomes are arranged in a nested hierarchy of territories [1], compartments [2–4], topologically associating domains [5–8] (TADs), subTADs [3, 9] and long-range looping interactions [10, 11]. Looping interactions have been linked to at least two mechanistically different modes of control over gene expression. First, enhancers can loop to distal target genes in a highly cell type-specific manner to facilitate their precise spatial-temporal regulation [12–15]. Second, long-range loops anchored by the architectural protein CTCF are often constitutive among cell types and form the structural basis for TADs/subTADs [9]. CTCF-mediated interactions connecting loop domains can create insulated neighborhoods that demarcate the search space of enhancers within the domain [16]. Specifically, CTCF anchored constitutive loops can prevent ectopic enhancer activation of genes outside of the domain or aberrant invasion of nonspecific enhancers into an inappropriate domain [16–20]. An active area of intense investigation includes the mapping of 3D loops genome-wide among hundreds of cell types, species and developmental lineages.
As genome-wide reference maps of looping interactions become widely available, a critical emerging goal will be to unravel the cause and effect relationship between looping and gene expression. Indeed, there is a great need in the field to build upon descriptive mapping studies and begin to perturb the 3D genome and evaluate the link of chromatin topology to function. One major limitation preventing progress is that genome-wide Hi-C technology requires more than six billion reads per replicate to obtain high quality, high resolution, genome-wide looping maps [3, 13, 21]. The financial and logistical difficulties of obtaining this read depth make it intractable to conduct studies with a high number of samples with perturbations induced by genome editing or drug induction. Thus, there is a great need for a technology that creates ultra high-resolution 3D genome folding maps at a much lower cost.
Chromosome Conformation Capture Carbon Copy (5C) is a proximity ligation technology pioneered by Dekker and colleagues [22, 23]. 5C adds a hybrid capture step to the classic Chromosome Conformation Capture (3C) method to facilitate the selection of all possible ligation products that occur only in a subset of the genome [22, 24–26]. Equivalent resolutions to Hi-C can be achieved at a fraction of the cost by only querying a subset of interactions in a 10-20 Megabase (Mb) subset of all genome-wide contacts [7, 9, 17, 27]. The ability to query a subset of genome contacts is important because genome-editing experiments are often conducted at only one specific location in the genome. The organizing principles governing genome folding can be queried at a key subset of loops without requiring the resources to map all loops genome-wide, thus allowing many samples and perturbation conditions to be screened in a high-throughput manner.
Despite its key advantages, the original 5C technique also has key challenges that have held back its widespread use, including: (1) the high number of cells (>40 million) required for the 3C template [2, 9, 27, 28], (2) the high amount of spatial noise caused by non-specific ligation products [29, 30] and (3) the non-comprehensive nature of the alternating primer design [7, 22–26, 31], resulting in many important interactions missing and a high degree of spatial noise. In the present study, we introduce two major updates to the 5C protocol that lead to a marked increase in resolution, decrease in spatial noise, and increased sensitivity and specificity of loop detection. We conduct a comparative analysis of in situ [3, 32] vs. canonical dilution 3C [2, 28] and a double alternating [17] vs. single alternating primer design [7, 22–26, 31] and report the downstream effect of these changes on 5C’s ability to detect bona fide looping interactions.
Results
Overview
A 5C experiment starts with preparation of the 3C template (Figure 1A-B). Chromatin is fixed within a population of cells with formaldehyde. In canonical dilution 3C [2, 28], cellular and nuclear membranes are disrupted and chromatin is digested in solution with a restriction enzyme (Figure 1A). Ligation is subsequently performed under dilute conditions that promote intra-molecular ligation. By contrast, in situ 3C [3, 32] involves restriction enzyme digest and ligation within intact nuclei. In both methods, cross-links are reversed and DNA is isolated to create the 3C template, which represents the genome-wide library of possible hybrid ligation junctions across a population of cells (Figure 1B).
The second half of the 5C protocol involves a hybrid capture step based on ligation-mediated amplification to select only a distinct subset of junctions from the genome-wide 3C library (Figure 1C-F). Canonical 5C [7, 22–26, 31] is built on an alternating primer design in which every other fragment is represented by either a Forward (FOR) primer binding to the sense strand or a Reverse (REV) primer binding to the antisense strand (Figure 1C, left). The single alternating design only queries approximately half of all ligation junctions in a target region because only FOR-REV primer ligation events are possible (Figure 1D-E, left). More recently, Dekker, Lajoie and colleagues created a new double alternating primer design [17] which incorporates two additional "left-oriented" primers, LFOR and LREV (Figure 1C, right). The LFOR primer orientation is designed to the antisense strand on fragments also queried by REV primers, whereas the LREV primer orientation is designed to the sense strand on fragments also queried by FOR primers. Thus, in the double alternating 5C primer design, there are now two primers representing each fragment, leading to 4 possible primer ligation orientations (FOR-REV, LFOR-LREV, LFOR-REV, FOR-LREV) and the query of nearly all fragment-fragment ligation events in an a priori selected Megabase (Mb)-scale genomic region (Figure 1D-E, right).
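The difference in coverage between the two primer designs can be illustrated with a small counting sketch. It assumes a simplified model in which restriction fragments strictly alternate between FOR-type and REV-type, and it counts fragment pairs rather than individual oriented junctions. The function names are hypothetical and are not part of any 5C software.

```python
from itertools import combinations

# Assumption: fragments strictly alternate FOR-type / REV-type along the region.
def fragment_types(n_fragments):
    return ["FOR" if i % 2 == 0 else "REV" for i in range(n_fragments)]

# Primers available on each fragment type in the double alternating design:
# LREV shares fragments with FOR, and LFOR shares fragments with REV.
DOUBLE_PRIMERS = {"FOR": ("FOR", "LREV"), "REV": ("REV", "LFOR")}

# The four primer-primer orientations that can be ligated and amplified.
ALLOWED = {
    frozenset(("FOR", "REV")),
    frozenset(("LFOR", "LREV")),
    frozenset(("LFOR", "REV")),
    frozenset(("FOR", "LREV")),
}

def queried_pairs_single(n_fragments):
    """Fragment pairs detectable when only FOR-REV ligations are possible."""
    types = fragment_types(n_fragments)
    return {(i, j) for i, j in combinations(range(n_fragments), 2)
            if types[i] != types[j]}

def queried_pairs_double(n_fragments):
    """Fragment pairs detectable with FOR, REV, LFOR and LREV primers."""
    types = fragment_types(n_fragments)
    pairs = set()
    for i, j in combinations(range(n_fragments), 2):
        for a in DOUBLE_PRIMERS[types[i]]:
            for b in DOUBLE_PRIMERS[types[j]]:
                if frozenset((a, b)) in ALLOWED:
                    pairs.add((i, j))
    return pairs

if __name__ == "__main__":
    n = 10
    total = n * (n - 1) // 2
    print("single:", len(queried_pairs_single(n)), "of", total)
    print("double:", len(queried_pairs_double(n)), "of", total)
```

Under these assumptions, a region of 10 fragments has 45 possible fragment pairs, of which the single alternating design reaches 25 and the double alternating design reaches all 45.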
Double alternating primer design achieves increased loop detection sensitivity compared to single alternating design
We hypothesized that by using the double alternating design developed by Dekker and colleagues [17], we could markedly improve canonical 5C’s matrix resolution and loop detection sensitivity. To test this idea, we first started with a canonical dilution 3C template from pluripotent embryonic stem (ES) cells (detailed in Materials and Methods) and compared the quality of 5C libraries created at the same genomic region with both single alternating and double alternating primer designs. A tradeoff of the more comprehensive double alternating primer design is the possibility of artifactual ‘self-circles’ (i.e. ligation events between the 5’ and 3’ ends of the same restriction fragment; (5) and (6) in Figure 1D-E, right). We counted the proportion of each possible primer ligation from the double alternating 5C experiment on a dilution 3C template from ES cells. There was an even distribution of ligation events across the four biologically informative primer-primer orientations ((1) FOR-REV: 21.3%, (2) LFOR-LREV: 20.4%, (3) LFOR-REV: 20.8%, (4) FOR-LREV: 20.9%). Importantly, self-circle ligation events ((5) LFOR-REV and (6) FOR-LREV from the same fragment) comprised <0.1% of all primer ligations (Figure 1E), suggesting that the risk of self-ligation is very small.
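The orientation tally described above can be sketched as follows. This is a minimal illustration, assuming each sequenced ligation product has already been mapped to a pair of (primer type, fragment index) tuples; the input format and function names are hypothetical and do not describe the pipeline used in this study.

```python
from collections import Counter

# The four informative primer-primer orientations, labeled as in the text.
ORIENTATIONS = {
    frozenset(("FOR", "REV")): "FOR-REV",
    frozenset(("LFOR", "LREV")): "LFOR-LREV",
    frozenset(("LFOR", "REV")): "LFOR-REV",
    frozenset(("FOR", "LREV")): "FOR-LREV",
}

def classify(primer_a, primer_b):
    """Classify one ligation product given two (primer_type, fragment) tuples."""
    (type_a, frag_a), (type_b, frag_b) = primer_a, primer_b
    label = ORIENTATIONS.get(frozenset((type_a, type_b)))
    if label is None:
        return "other"        # orientation that cannot be ligated/amplified
    if frag_a == frag_b:
        return "self-circle"  # both primers sit on the same restriction fragment
    return label

def orientation_fractions(products):
    """Return the fraction of products falling in each class."""
    counts = Counter(classify(a, b) for a, b in products)
    total = sum(counts.values())
    return {label: n / total for label, n in counts.items()}

if __name__ == "__main__":
    toy_products = [
        (("FOR", 0), ("REV", 1)),
        (("LFOR", 1), ("LREV", 2)),
        (("LFOR", 1), ("REV", 3)),
        (("LFOR", 1), ("REV", 1)),  # self-circle on fragment 1
    ]
    print(orientation_fractions(toy_products))
```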
We visually inspected 4 kb-binned heatmaps of 5C counts in Megabase-scale genomic regions around the Sox2 and Zfp462 genes after matrix balancing and sequencing depth correction (detailed in Materials and Methods). We observed that the double alternating primer design results in marked improvement in specific, punctate looping signal between known long-range enhancer-promoter interactions compared to the single alternating primer design (Figure 2A-B). Double alternating 5C maps also showed fewer missing fragments than single alternating primer maps due to the increased complexity of ligation junctions that are queried and sequenced. In previous 5C studies, a smoothing window was required to reduce the blockiness of looping signal caused by missing ligation junctions [27, 33]. Here, with the double alternating design, we use a 4 kb bin with no smoothing window and achieve punctate looping signal, few missing fragments and markedly reduced spatial noise (Figure 2A-B).
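To make the binning step concrete, the sketch below aggregates fragment-level counts into fixed 4 kb bins. It is an illustration only: it assumes each fragment is assigned to the bin containing its midpoint and that counts are summed, and it does not perform matrix balancing or sequencing depth correction. The procedure actually used is the one detailed in Materials and Methods, and the function name and input format are hypothetical.

```python
def bin_counts(fragments, counts, region_start, bin_size=4000):
    """Sum fragment-level counts into a symmetric bin-level matrix.

    fragments: list of (start, end) coordinates, one per restriction fragment
    counts:    dict mapping (i, j) fragment index pairs to ligation counts
    """
    region_end = max(end for _, end in fragments)
    n_bins = (region_end - region_start + bin_size - 1) // bin_size  # ceiling
    matrix = [[0.0] * n_bins for _ in range(n_bins)]

    def bin_of(fragment):
        start, end = fragment
        midpoint = (start + end) // 2
        return (midpoint - region_start) // bin_size

    for (i, j), count in counts.items():
        a, b = bin_of(fragments[i]), bin_of(fragments[j])
        matrix[a][b] += count
        if a != b:
            matrix[b][a] += count  # keep the matrix symmetric
    return matrix

if __name__ == "__main__":
    frags = [(0, 3000), (3000, 5000), (5000, 9000), (9000, 12000)]
    toy_counts = {(0, 1): 5, (0, 2): 3, (1, 3): 2, (1, 2): 4}
    for row in bin_counts(frags, toy_counts, region_start=0):
        print(row)
```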
To further test our qualitative observation of increased looping sensitivity with the double alternating design, we also quantified chromatin looping interactions in each 5C dataset. We modeled binned interactions as a fold-enrichment relative to a background expected model based on distance dependence and local chromatin domain architecture (detailed in Materials and Methods). As previously published [27, 30, 33], we modeled these so-called Observed/Expected values with a parameterized logistic distribution and subsequently converted p-values to interaction scores (Figure 2C; detailed in Materials and Methods). After thresholding interaction scores, we clustered adjacent looping pixels into long-range looping interaction clusters (Figure 2D; detailed in Materials and Methods). Consistent with observations in Figure 2A-B, the interaction score and loop cluster maps also highlight punctate Sox2 and Zfp462 gene promoter-enhancer looping clusters (Figure 2C-D). As expected, the chromatin fragments anchoring the base of detected looping interactions contained high signal for H3K27ac, a chromatin modification known to demarcate active non-coding regulatory elements and active transcription start sites. Importantly, we identified key looping interactions between Zfp462 and distal enhancers with the double alternating primer design that were not present with the single alternating design. The well-established Sox2-super enhancer interaction [5, 9, 27, 33–35] was detected by the single alternating design, but significantly more punctate and less blocky/noisy with the double alternating design.
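The loop-calling steps described above can be sketched in a few lines. The sketch makes several simplifying assumptions that are not taken from this manuscript: the expected model uses distance dependence only (the mean of each diagonal) and omits the local domain architecture correction; the logistic distribution is fitted to log(Observed/Expected) by the method of moments; the interaction score is taken as -10·log2(p); and clusters are defined by 4-connectivity. All function names are hypothetical.

```python
import math
from statistics import mean, pstdev

def expected_by_distance(matrix):
    """Assumed expected model: mean count on each diagonal (distance only)."""
    n = len(matrix)
    return [mean(matrix[i][i + d] for i in range(n - d)) for d in range(n)]

def log_logistic_sf(t):
    """Numerically stable log of the standard logistic survival function."""
    if t > 0:
        return -t - math.log1p(math.exp(-t))
    return -math.log1p(math.exp(t))

def interaction_scores(matrix, pseudocount=1.0):
    """Score each upper-triangle pixel from a logistic fit to log(O/E)."""
    n = len(matrix)
    expected = expected_by_distance(matrix)
    log_oe = {
        (i, j): math.log((matrix[i][j] + pseudocount)
                         / (expected[j - i] + pseudocount))
        for i in range(n) for j in range(i + 1, n)
    }
    values = list(log_oe.values())
    loc = mean(values)
    # Method-of-moments scale for a logistic distribution (sd = scale*pi/sqrt(3)).
    scale = max(pstdev(values) * math.sqrt(3) / math.pi, 1e-12)
    # Assumed score: -10 * log2(right-tail p-value).
    return {key: -10.0 * log_logistic_sf((x - loc) / scale) / math.log(2)
            for key, x in log_oe.items()}

def cluster_pixels(scores, threshold):
    """Group thresholded pixels into clusters of 4-connected neighbors."""
    remaining = {key for key, score in scores.items() if score >= threshold}
    clusters = []
    while remaining:
        stack = [remaining.pop()]
        cluster = set(stack)
        while stack:
            i, j = stack.pop()
            for di, dj in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                neighbor = (i + di, j + dj)
                if neighbor in remaining:
                    remaining.remove(neighbor)
                    cluster.add(neighbor)
                    stack.append(neighbor)
        clusters.append(cluster)
    return clusters
```

In this toy formulation, a pixel whose count greatly exceeds the mean of its diagonal receives the highest score, and neighboring pixels that pass the threshold are merged into a single looping cluster.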
Our loop detection analysis also suggests that the double alternating primer design would enable more precise discovery of epigenetic marks involved in looping. We intersected cell-type specific annotations of epigenetic marks from ES cells and primary neural progenitor cells (NPCs) [33] with our identified looping clusters (Figure 2E). Looping clusters in the double alternating 5C library are significantly enriched for ES-specific CTCF and ES-specific enhancers and depleted of NPC-specific CTCF. Consistent with the notion that looping specificity is improved with the double alternating design, we see that looping clusters in the single alternating 5C library are also enriched for ES-specific features but to a lesser degree than the double alternating design. Overall, these data indicate that the double alternating primer design allows for more sensitive detection of both strong and weak looping interactions.
In situ 3C reduces spatial noise in 5C heatmaps compared to dilution 3C
We next assessed the quality of the double alternating 5C experiment using in situ 3C and dilution 3C templates. We prepared the in situ and dilution 3C templates from 2 million and 40 million ES cells cultured in 2i media, respectively, as previously reported (detailed in Materials and Methods). Both dilution and in situ 3C led to detection of previously reported looping interactions between Sox2 and Zfp462 and their target enhancers (Figure 3A-D, magenta arrowheads). Looping interactions from both in situ 3C and dilution 3C templates showed enrichment of ES-specific enhancers and ES-specific CTCF (Figure 3E). Although the enrichments and the number of loops appeared to be similar between the two templates, visual inspection of the maps revealed an extremely high degree of spatial noise and abnormal looping clusters from the dilution 3C template (Figure 3A-D). Spatial variance in dilution 3C was ~330x and ~280x higher than that of in situ 3C in the genomic regions around the Sox2 and Zfp462 genes, respectively (Figure 3F-G). In situ 3C resulted in a major improvement in spatial noise (Figure 3F-G) and led to looping interaction pixels grouped in more spherically shaped clusters with minimal background noise around the punctate looping pixels. These results indicate that in situ 3C is superior to dilution 3C in reducing overall spatial noise in heatmaps due to nonspecific ligation events.
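The spatial variance metric itself is defined in Materials and Methods, which is not reproduced in this section. As an illustration of how such a comparison could be made, the sketch below computes one plausible stand-in: the variance of each interior pixel's 3x3 neighborhood, averaged over the heatmap. Both the metric and the function name are assumptions for illustration and should not be read as the metric behind Figure 3F-G.

```python
from statistics import mean, pvariance

def local_spatial_variance(matrix):
    """Mean variance of 3x3 pixel neighborhoods over all interior pixels.

    Assumed stand-in for a spatial noise metric; higher values indicate
    that neighboring pixels differ more from one another.
    """
    n = len(matrix)
    if n < 3:
        raise ValueError("matrix must be at least 3x3")
    variances = []
    for i in range(1, n - 1):
        for j in range(1, n - 1):
            window = [matrix[a][b]
                      for a in (i - 1, i, i + 1)
                      for b in (j - 1, j, j + 1)]
            variances.append(pvariance(window))
    return mean(variances)

if __name__ == "__main__":
    smooth = [[5.0] * 5 for _ in range(5)]
    noisy = [[5.0 + 4.0 * ((i + j) % 2) for j in range(5)] for i in range(5)]
    print(local_spatial_variance(smooth), local_spatial_variance(noisy))
```

A ratio of this quantity between two heatmaps of the same region would then give a fold difference in spatial noise analogous to the comparison reported above.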
Combined implementation of a double alternating primer design and in situ 3C allows for the use of lower genome copies than canonical 5C
We observed that implementing a double alternating primer design and in situ 3C noticeably improves the quality and resolution of our 5C heatmaps by reducing background noise and allowing for more sensitive detection of chromatin looping interactions (Figure 2-3). Therefore, we hypothesized that we could lower the genome copies required for loop detection by combining the two improvements. The advantage of lowering the required number of genome copies is that lower cell number 5C could be performed in the future, opening up opportunities for conducting 5C analysis on rare cell types and human tissue samples.
We performed double alternating 5C on an in situ 3C template made from ES cells cultured in 2i media. Canonical 5C typically performs the ligation-mediated amplification step on 200,000 genome copies (~590 ng) of the mouse 3C template. We tried 590 ng, 245 ng, 120 ng, 12 ng, and 2.5 ng of the same in situ 3C library prepared from mouse ES cells in 2i media, representing 200,000, 100,000, 50,000, 5,000 and 1,000 mouse genome copies, respectively. To ensure that the total DNA mass did not affect 5C primer binding and ligation efficiencies, we mixed 3C templates with an excess of salmon sperm DNA (to a total DNA mass of 1,500 ng).
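The arithmetic behind the template titration can be written out as a short sketch. It uses only two figures stated above: 200,000 genome copies correspond to ~590 ng of mouse 3C template, and each reaction is brought to a total of 1,500 ng with salmon sperm DNA. The constant and function names are hypothetical.

```python
# Figures taken from the text.
REFERENCE_COPIES = 200_000
REFERENCE_MASS_NG = 590.0
TOTAL_MASS_NG = 1500.0

# Template masses as reported for the titration series.
REPORTED_TEMPLATE_NG = [590.0, 245.0, 120.0, 12.0, 2.5]

def mass_per_copy_pg():
    """Approximate mass of one genome copy implied by the reference figures."""
    return REFERENCE_MASS_NG / REFERENCE_COPIES * 1000.0  # ng -> pg

def carrier_mass_ng(template_ng, total_ng=TOTAL_MASS_NG):
    """Salmon sperm DNA needed to bring a reaction to the total DNA mass."""
    if template_ng > total_ng:
        raise ValueError("template mass exceeds the total DNA mass")
    return total_ng - template_ng

if __name__ == "__main__":
    print(f"~{mass_per_copy_pg():.2f} pg per genome copy")
    for template in REPORTED_TEMPLATE_NG:
        print(f"{template} ng template + {carrier_mass_ng(template)} ng carrier")
```

The reference figures imply roughly 2.95 pg per genome copy, and the 590 ng reaction would need 910 ng of carrier DNA to reach 1,500 ng.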
Visual inspection of heatmaps revealed that the 3C template mass could be reduced to 50,000 genome copies and still sensitively detect all gold-standard looping interactions (Figure 4A-D, Supplementary Figure 1A-D). Notably, the quality of chromatin looping signal is drastically reduced when the number of genome copies is further lowered to 5,000 and 1,000 (Figure 4A-D, Supplementary Figure 1A-D). Consistent with this result, quantitative chromatin enrichments were similar for 50,000-200,000 genome copies, but did not show interpretable results at 1,000-5,000 genome copies (Figure 4E). Since there were not any looping clusters called in the 5C libraries using 5,000 and 1,000 genome copies (Figure 4E, G), no enrichment or depletion was detected. Spatial noise was generally comparable with 200,000-50,000 genome copies, but was notably higher in libraries prepared with 1,000 and 5,000 genome copies (Figure 4H). Altogether, these data demonstrate that simultaneous implementation of a double alternating primer design and in situ 3C allows for successful 5C using lower genome copies. The implication of these results is that 5C might be performed on smaller cell populations in future studies.
Discussion/Conclusions
The invention of the canonical 5C procedure by Dekker, Dostie and colleagues enabled the creation of high-resolution, high-coverage 3D genome folding maps from a subset of the genome (up to ~30 Mb) at a fraction of the cost of Hi-C [7, 9, 17, 27]. Due to the markedly reduced cost, 5C is poised to have high utility in addressing the significant unmet need of comprehensive inquiry of the folding of all fragments within a large genomic locus across hundreds of biological perturbation conditions. The ability to create loop resolution maps across thousands of gene editing perturbations is essential for testing the functional relationship between genome structure and function.
Despite these key advantages, canonical 5C has been limited in looping detection sensitivity and specificity due to the alternating primer design, which only queries half the ligation junctions, and to non-specific ligations that lead to a low signal-to-noise ratio. In the present study, we present an updated version of the classic 5C procedure, 5C in situ double alternating (5C-ID). 5C-ID implements a double alternating primer design [17] and in situ 3C [3, 32], resulting in markedly increased sensitivity for looping signal detection and reduced off-target non-specific ligation junctions. Double alternating primers comprehensively bind to all possible ligation junctions [17] missed by the single alternating design [7, 22–26, 31], leading to markedly improved loop detection sensitivity. Moreover, by conducting the restriction digestion and ligation steps of 3C in situ in the nucleus, we dramatically reduced the spatial noise caused by known nonspecific ligations from classic dilution 3C [2, 3, 28–30, 32]. By combining these two changes, we were also able to maintain loop detection sensitivity with reduced genome copies. While canonical 5C was performed on 40-100 million cells [9, 15, 36], only 2 million cells were used here for 5C-ID. The evidence that genome copies can be decreased to 50,000 suggests that pellets of ~25,000-50,000 cells, or possibly fewer, may be feasible in the future for high quality 5C maps. Thus, 5C-ID creates high-coverage, high-quality heatmaps at loop resolution at a fraction of the cost and opens the future potential for low cell number analysis from rare cell types and human tissues.
Authors’ Contributions
JEPC and JHK conceived of the study. JHK, WG and JAB performed 5C. KRT implemented the computational pipeline. JEPC, JHK and KRT wrote the manuscript.
Competing financial interests
The authors declare no competing financial interests.
Acknowledgements
We thank Job Dekker and Bryan Lajoie for providing us access to the My5C software and assistance in designing double alternating 5C primers. J. Phillips-Cremins is a New York Stem Cell Foundation (NYSCF) Robertson Investigator and an Alfred P. Sloan Foundation Fellow. This work was funded by The New York Stem Cell Foundation (J. Phillips-Cremins), the Alfred P. Sloan Foundation (J. Phillips-Cremins), the NIH Director’s New Innovator Award (1DP2MH11024701; J. Phillips-Cremins), a 4D Nucleome Common Fund grant (1U01HL12999801; J. Phillips-Cremins) and a joint NSF-NIGMS grant to support research at the interface of the biological and mathematical sciences (1562665; J. Phillips-Cremins).