Abstract
Physical interactions between genomic regions play critical roles in the regulation of genome functions, including gene expression. However, the methods for confidently detecting physical interactions between genomic regions remain limited. Here, we demonstrate the feasibility of using engineered DNA-binding molecule-mediated chromatin immunoprecipitation (enChIP) in combination with next-generation sequencing (NGS) (enChIP-Seq) to detect such interactions. In enChIP-Seq, the target genomic region is captured by an engineered DNA-binding complex, such as a CRISPR system consisting of a catalytically inactive form of Cas9 (dCas9) and a single guide RNA (sgRNA). Subsequently, the genomic regions that physically interact with the target genomic region in the captured complex are sequenced by NGS. Using enChIP-Seq, we found that the 5’HS5 locus, which regulates expression of the β-globin genes, interacts with multiple genomic regions upon erythroid differentiation in the human erythroleukemia cell line K562. Genes near the genomic regions inducibly associated with the 5’HS5 locus were transcriptionally up-regulated in the differentiated state, suggesting the existence of a coordinated transcription mechanism directly or indirectly mediated by physical interactions between these loci. Our data suggest that enChIP-Seq is a potentially useful tool for detecting physical interactions between genomic regions in a non-biased manner, which would facilitate elucidation of the molecular mechanisms underlying regulation of genome functions.
Introduction
Physical interactions between genomic regions play important roles in the regulation of genome functions, including transcription and epigenetic regulation [1]. Several techniques, such as fluorescence in situ hybridization (FISH) [2, 3], chromosome conformation capture (3C), and 3C-derived methods [4, 5], have been used to detect such interactions. Although these techniques are widely used, they have certain limitations. The resolution of FISH is low, i.e., apparent co-localization of FISH signals does not necessarily mean that the loci in question physically interact. In addition, FISH cannot be used in a non-biased search for interacting genomic regions. In 3C and related methods, molecular interactions are maintained by crosslinking with formaldehyde prior to digestion with a restriction enzyme(s). The digested DNA is purified after ligation of the DNA ends within the same complex. Interaction between genomic loci is detected by PCR using locus-specific primers or next-generation sequencing (NGS). As with FISH, 3C-based approaches also have intrinsic drawbacks. For example, these methods require enzymatic reactions, including digestion with restriction enzyme(s) and ligation of crosslinked chromatin; the difficulty of achieving complete digestion of crosslinked chromatin can result in detection of artifactual interactions. In addition, in 3C and its derivatives, it is difficult to distinguish the products of intra-molecular and inter-molecular ligation reactions, making it difficult to ensure that the detected signals truly reflect physical interactions between different genomic regions.
An alternative approach to detecting physical interactions between genomic regions is to purify specific genomic regions engaged in molecular interactions and then analyze the genomic DNA in the purified complexes. To purify specific genomic regions, we recently developed two locus-specific chromatin immunoprecipitation (locus-specific ChIP) technologies, insertional ChIP (iChIP) [6-10] (see review [11]) and engineered DNA-binding molecule-mediated ChIP (enChIP) [12-16] (see reviews [17, 18]). enChIP consists of the following steps (Figure 1): (i) A DNA-binding molecule or complex (DB) that recognizes a target DNA sequence in a genomic region of interest is engineered. Zinc-finger proteins [19], transcription activator-like (TAL) proteins [20], and a clustered regularly interspaced short palindromic repeats (CRISPR) system [21] consisting of a catalytically inactive Cas9 (dCas9) plus a single guide RNA (sgRNA) can be used as the DB. One or more tags and nuclear localization signals (NLSs) can be fused with the engineered DB, and the fusion protein(s) can be expressed in the cells of interest. (ii) If necessary, the DB-expressing cells are stimulated and crosslinked with formaldehyde or other crosslinkers. (iii) The cells are lysed, and chromatin is fragmented by sonication or digested with nucleases. (iv) Chromatin complexes containing the engineered DB are affinity-purified by immunoprecipitation or other methods. (v) After reverse crosslinking (if necessary), DNA, RNA, proteins, or other molecules are purified and identified by various methods including NGS and mass spectrometry.
As a model locus in this study, we focused on 5’HS, which plays critical roles in developmentally regulated expression of the β-globin genes and has been extensively analyzed (Figure 2A) [22, 23]. The 5’HS2-4 regions in the 5’HS locus behave as enhancers for β-globin expression [24-26]. By contrast, 5’HS5 functions as an insulator to prevent invasion of heterochromatin into the β-globin genes [27]. In addition, the 5’HS5 locus interacts with the 3’HS1 locus in the 3’ region of the globin locus [28, 29]. Moreover, CTCF, a major component of the insulator complex, plays a critical role in insulation and formation of a chromatin loop [27, 29]. However, the molecular mechanisms underlying the functions of 5’HS5 remain incompletely understood.
Here, we combined enChIP with NGS (enChIP-Seq) to detect genomic regions that physically interact with the 5’HS5 locus. Using enChIP-Seq, we showed that the 5’HS5 locus physically interacts with multiple genomic regions upon erythroid differentiation in the human erythroleukemia cell line K562. Thus, enChIP-Seq represents a potentially useful tool for analysis of genome functions.
Results and Discussion
Isolation of the 5’HS5 locus by enChIP using the CRISPR system
To purify the 5’HS5 locus by enChIP using the CRISPR system (Figures 1 and 2A) [12, 14], we generated human erythroleukemia K562-derived cells expressing 3xFLAG-dCas9 [14] and sgRNA targeting the 5’HS5 locus (Figure 2A). We designed two sgRNAs (#6 and #17) to target different sites in the 5’HS5 locus separated by 52 bp (Figure 2B). Like the parental K562 cells, the derivative cells were white, suggesting that they did not spontaneously express globin genes as a result of introduction of the CRISPR complex or its binding to the target locus. Crosslinked chromatin was fragmented by sonication and subjected to affinity purification with anti-FLAG antibody (Ab) (Figure 1B). Subsequently, crosslinking was reversed and DNA was purified from the isolated chromatin. As shown in Figure 2C, 0.1-0.4% of the input 5’HS5 locus was isolated, whereas the irrelevant Sox2 locus was not enriched, suggesting that enChIP using either of the two sgRNAs (#6 and #17) could isolate the 5’HS5 locus.
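Yields expressed as a percentage of input are commonly derived from real-time PCR threshold cycles (Ct). The following is a minimal sketch of that arithmetic, assuming perfect doubling per PCR cycle; the function name, the Ct values, and the input fraction are hypothetical and are not taken from this study, whose quantification followed [12].

```python
def percent_of_input(ct_ip: float, ct_input: float, input_fraction: float) -> float:
    """Percent of input recovered in the purified sample.

    Assumes 100% PCR efficiency (one doubling per cycle).
    input_fraction is the share of chromatin set aside as the input
    sample (hypothetical parameter, e.g. 0.01 for 1%).
    """
    # A lower Ct means more template; each cycle of difference is a factor of 2.
    return 100.0 * input_fraction * 2.0 ** (ct_input - ct_ip)


# Hypothetical Ct values, for illustration only.
example_yield = percent_of_input(ct_ip=27.0, ct_input=25.0, input_fraction=0.01)
```

With these made-up values the purified sample lags the 1% input by two cycles, which corresponds to a quarter of 1% of the input.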
Detection of genomic regions that physically interact with the 5’HS5 locus
To identify genomic regions associated with the 5’HS5 locus on a genome-wide scale in erythroid cells under undifferentiated or differentiated conditions, K562-derived cells were mock-treated or treated with sodium butyrate (NaB) for 4 days before being crosslinked with formaldehyde and subjected to enChIP. The K562-derived cells changed from white to pink upon NaB treatment, suggesting that they had begun to express the globin genes, as wild-type K562 cells do in response to NaB. After isolating the 5’HS5 locus by enChIP, we subjected the purified DNA to NGS analysis. As expected, reads corresponding to the 5’HS5 locus were clearly detected in cells expressing either sgRNA #6 or #17, but not in cells expressing neither sgRNA (Figure 2D and Table 1). By contrast, no peak was detected at the irrelevant Sox2 locus (Figure 2E).
To identify genomic regions physically interacting with 5’HS5 upon erythroid differentiation, we analyzed the NGS peaks observed in the differentiated state. The CRISPR complex can interact with multiple genomic sites containing sequences similar to the sgRNA sequence [30-33]. In addition, in the absence of sgRNA, it is possible that dCas9 could bind non-specifically to some genomic sites in vivo. Therefore, the peaks identified for sgRNA #6 and #17 may include such off-target sites. To remove those off-target sites, we first eliminated peaks derived from non-specific binding of dCas9 in the absence of sgRNA from those identified for each sgRNA (Steps 1 and 2 in Figure 3). To identify peaks with confidence, we next established two criteria for choosing peaks based on NGS information from the target 5’HS5 locus: (1) tag number >5% of that of the target 5’HS5 locus, and (2) fold enrichment relative to input genomic DNA >10. As shown in Figure 3 (Step 2), 19 and 228 peaks for sgRNA #6 and #17, respectively, fulfilled these criteria. Next, to eliminate sgRNA-dependent off-target sites, we compared the peaks for sgRNA #6 and #17 and selected peaks detected in common by both sgRNA #6 and #17. These peaks were considered to represent regions engaged in bona fide physical interactions with the 5’HS5 locus (Step 3 in Figure 3). The six identified peaks could be classified into two categories: (1) peaks that were larger in the differentiated state, and (2) peaks constitutively observed in both the undifferentiated and differentiated states. The first category should contain genomic regions that inducibly associate with the 5’HS5 locus upon erythroid differentiation, whereas the second category should contain genomic regions constitutively associated with the 5’HS5 locus. 
To extract the peaks that grew larger specifically in the differentiated state, we selected peaks observed constitutively or in the undifferentiated state (Step 4 in Figure 3) and compared them with the six peaks extracted in Step 3 (Step 5 in Figure 3). As shown in Figure 3 (Step 5) and Table 2, the 5’HS5 site was the unique peak constitutively observed in the undifferentiated and differentiated states, whereas the five other peaks were larger specifically in the differentiated state. These included one intra-chromosomal interaction and four inter-chromosomal interactions (Table 2); the two peaks on chromosome 1 corresponding to inter-chromosomal interactions were adjacent to each other in the primary sequence. Next, we attempted to extract genomic regions that interacted with the 5’HS5 locus specifically in the undifferentiated state. However, bioinformatics analysis based on the aforementioned criteria identified no regions in this category (Figure S1).
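The filtering logic of Steps 1-5 (Figure 3) can be sketched in a few lines of Python. This is an illustrative sketch, not the pipeline used in the study: the Peak type, the function names, and the overlap rule (any shared base) are assumptions, and all coordinates and counts in the toy data are made up. The sketch also simplifies "larger in the differentiated state" to presence or absence of an overlapping peak in the undifferentiated state.

```python
from typing import List, NamedTuple, Tuple


class Peak(NamedTuple):
    chrom: str
    start: int
    end: int
    tags: float  # tag (read) count in the peak
    fold: float  # fold enrichment relative to input genomic DNA


def overlaps(a: Peak, b: Peak) -> bool:
    """True if two peaks share at least one base (assumed overlap rule)."""
    return a.chrom == b.chrom and a.start < b.end and b.start < a.end


def confident_peaks(peaks: List[Peak], no_sgrna: List[Peak],
                    target_tags: float) -> List[Peak]:
    """Steps 1-2: drop sgRNA-independent dCas9 sites, then apply both thresholds."""
    specific = [p for p in peaks if not any(overlaps(p, c) for c in no_sgrna)]
    return [p for p in specific if p.tags > 0.05 * target_tags and p.fold > 10]


def shared_peaks(a: List[Peak], b: List[Peak]) -> List[Peak]:
    """Step 3: keep peaks of one sgRNA that are also seen with the other sgRNA."""
    return [p for p in a if any(overlaps(p, q) for q in b)]


def split_by_state(diff: List[Peak],
                   undiff: List[Peak]) -> Tuple[List[Peak], List[Peak]]:
    """Steps 4-5: separate constitutive peaks from differentiation-induced ones."""
    constitutive = [p for p in diff if any(overlaps(p, q) for q in undiff)]
    induced = [p for p in diff if p not in constitutive]
    return constitutive, induced


# Toy data (made-up coordinates and counts, for illustration only).
target = Peak("chr11", 1000, 1500, tags=200, fold=50)
sg6 = [target,
       Peak("chr1", 5000, 5400, 30, 20),
       Peak("chr2", 100, 300, 40, 25),   # not seen with the second sgRNA
       Peak("chr3", 700, 900, 4, 30)]    # fails the 5% tag-number rule
sg17 = [Peak("chr11", 1050, 1550, 220, 55), Peak("chr1", 5100, 5500, 25, 15)]
no_sgrna = [Peak("chr9", 10, 90, 50, 12)]
undiff = [Peak("chr11", 990, 1490, 180, 45)]

kept6 = confident_peaks(sg6, no_sgrna, target_tags=200)
kept17 = confident_peaks(sg17, no_sgrna, target_tags=220)
common = shared_peaks(kept6, kept17)
constitutive, induced = split_by_state(common, undiff)
```

In the toy data, only the target-like peak on chr11 is classified as constitutive, and the chr1 peak is classified as induced. Requiring agreement between two sgRNAs is what removes the sgRNA-dependent off-target candidate on chr2.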
We visualized some of the identified peaks in the UCSC Genome Browser (Figure 4). The peaks were clearly visible for both sgRNA #6 and #17. The adjacent NaB-specific peaks in chromosome 1 were both located in the first intron of the ZNF670 and ZNF670-695 genes. The other NaB-specific peaks were located in the vicinity of the MIR422A gene in chromosome 15 and between the TMEM151A and YIF1A genes in chromosome 11.
To confirm the interactions identified by enChIP-Seq, we used the ligation-mediated approach used in 3C-based assays (Figure S2). In this approach, cells are subjected to crosslinking, and then chromatin is randomly fragmented by sonication. After proximity ligation of genomic DNA, the junction between the target locus and a potential interacting site is amplified by PCR. Subsequently, a part of the amplified region in the potential interacting locus is detected by a second PCR. Amplification of a region in the second PCR suggests that the potential interacting locus is physically proximal to the target locus. When we used this assay to examine the interaction between the 5’HS5 locus and the ZNF670/ZNF670-695 locus, the ZNF670/ZNF670-695 locus was amplified in a NaB-specific manner only when the proximity ligation step was performed (Figure 5). This observation is consistent with the enChIP-Seq result (Figure 4) showing that the 5’HS5 locus interacts with the ZNF670/ZNF670-695 locus in the differentiated state in K562 cells. Thus, we were able to confirm the chromosomal interaction identified by enChIP-Seq by another independent method, suggesting that it is feasible to use enChIP-Seq to perform non-biased identification of physical interactions between genomic regions.
Transcription of genes near the 5’HS5-interacting genomic regions identified by enChIP-Seq could be directly or indirectly regulated by the induced association with the 5’HS locus. Such regulation could involve the 5’HS2-4 regions, which function as enhancers [24-26]. Therefore, we investigated whether mRNA levels of the genes in the vicinity (±10 kbp) of the 5’HS5-interacting genomic regions changed after NaB treatment. As shown in Figure 6, mRNA levels of the ZNF670, MIR422A, and CNIH2 genes were clearly up-regulated in the NaB-induced differentiated state, suggesting that the identified chromosomal interactions are involved in transcriptional regulation of these genes. At this time, it is not clear whether these gene products play any roles in erythroid development. Future studies should attempt to elucidate how the 5’HS locus regulates transcription of these genes. It is possible that the enhancer function of the 5’HS locus directly activates transcription of these genes via interactions with their promoters under differentiated conditions. Alternatively, these loci may be incorporated into the “transcription factory” [34] upon erythroid differentiation, independent of the enhancer function of the 5’HS locus.
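Selecting genes in the vicinity (±10 kbp) of an interacting region amounts to an interval query. The sketch below assumes the window is measured from the edges of the region and that any overlap with the extended window counts; the function name, gene names, and coordinates are hypothetical.

```python
from typing import List, Tuple

Gene = Tuple[str, str, int, int]  # (name, chromosome, start, end)


def genes_in_vicinity(chrom: str, start: int, end: int,
                      genes: List[Gene], window: int = 10_000) -> List[str]:
    """Names of genes overlapping the region extended by `window` bp on each side."""
    lo, hi = start - window, end + window
    return [name for name, g_chrom, g_start, g_end in genes
            if g_chrom == chrom and g_start <= hi and g_end >= lo]


# Toy annotation (made-up names and coordinates, for illustration only).
toy_genes = [
    ("GENE_A", "chr1", 1_000, 6_000),     # ends 9 kbp upstream of the region
    ("GENE_B", "chr1", 40_000, 45_000),   # too far downstream
    ("GENE_C", "chr2", 12_000, 18_000),   # different chromosome
]
nearby = genes_in_vicinity("chr1", 15_000, 15_500, toy_genes)
```

Here only GENE_A falls within the window.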
Lack of interaction between the 5′HS5 and β-globin loci
Studies using 3C and derived methods suggested that the 5’HS5 and β-globin loci interact [28, 29]. Therefore, we sought to detect a physical interaction between these loci by enChIP-Seq. As shown in Figure 3, bioinformatics analysis based on the criteria described above (fold enrichment: >10, tag number: >5% of that of the target positions) did not extract genomic regions around the β-globin locus. In fact, the peak images did not indicate any physical interactions between 5’HS5 and the β-globin locus or 3’HS1 (Figure S3). Importantly, in this regard, it is not likely that bound dCas9/sgRNA complexes abrogate the formation of chromosomal loops between 5’HS5 and the β-globin or 3’HS1 locus, because the target positions of the sgRNA do not overlap with the CTCF-binding site and the 5’HS5 core region (Figure 2), and the cells maintained their capability to differentiate in response to NaB.
Several phenomena might explain the discrepancy between the results of this study and those of 3C-derived methods regarding the interaction of 5’HS5 with the β-globin locus. First, 3C-derived methods might be much more sensitive than enChIP-Seq. Specifically, because 3C-derived methods use PCR amplification to detect ligated regions consisting of different genomic regions, they may be able to detect transient or weak interactions that did not pass the criteria we used in this study. Second, the fragmentation of chromatin by sonication in enChIP-Seq may be too harsh to retain weak chromosomal interactions. By contrast, 3C-derived methods employ restriction digestion, which is much milder than sonication, to fragment chromatin. Third, physical interactions between genomic regions are likely to be regulated in a cell cycle-dependent manner, and it is possible that interactions between these regions occur only in a certain phase of the cell cycle, making them difficult to detect by enChIP-Seq. Alternatively, our results raise the possibility that the ‘interactions’ detected by 3C and its derivatives reflect accessibility of the loci to the nucleases and ligases employed in these techniques, but do not necessarily reflect physical interactions between genomic regions. In fact, discrepancies between results of 3C or its derivatives and those of FISH have been suggested [35]. This possibility highlights the importance of confirming chromosomal interactions by independent methods.
Managing potential contamination of off-target sites
In this study, we identified physical interactions between genomic regions based on signals detected by enChIP-Seq. dCas9 can bind to multiple sites containing sequences similar to the sgRNA sequence [30, 31]. To eliminate potential contamination of our findings by off-target sites, we propose several strategies:
(1) Carefully examine the sequences of the detected peaks and remove those containing sequences similar to the target sequence.
(2) Use different conditions or cell types. Signals specifically detected in one condition or cell type should reflect true physical interactions between genomic regions.
(3) Use multiple different sgRNAs. Because different sgRNAs are unlikely to engage in off-target binding at the same genomic regions, signals observed in common using different sgRNAs should reflect true physical interactions between genomic regions. In addition, cells expressing dCas9 without sgRNA should be used as a negative control to eliminate off-target sites associated with dCas9 in the absence of sgRNA.
(4) Use a sequential purification scheme. Cas9 orthologs derived from different bacterial species recognize distinct proto-spacer adjacent motif (PAM) sequences and can be used for genome editing and gene regulation [36]. Tagging of a given locus with dCas9s derived from different lineages and bearing distinct tags would make it feasible to sequentially purify the locus, minimizing contamination with off-target sites.
Using these techniques, we believe that we can effectively manage potential contamination of dCas9 off-target sites in enChIP analyses.
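Strategy (1) can be partly automated by scanning the sequence under each peak for sites that resemble the sgRNA target. The sketch below is one simple way to do this, assuming a plain mismatch count on both strands with no PAM check; the function names, the mismatch threshold, and the short toy sequences are hypothetical and are not from this study.

```python
from typing import List, Tuple

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Reverse complement of an uppercase A/C/G/T sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def similar_sites(peak_seq: str, target: str,
                  max_mismatches: int = 3) -> List[Tuple[int, str, int]]:
    """Return (offset, strand, mismatches) for windows resembling the target."""
    hits = []
    n = len(target)
    for strand, query in (("+", target), ("-", reverse_complement(target))):
        for i in range(len(peak_seq) - n + 1):
            window = peak_seq[i:i + n]
            mismatches = sum(1 for x, y in zip(window, query) if x != y)
            if mismatches <= max_mismatches:
                hits.append((i, strand, mismatches))
    return hits


# Toy sequences (made up): the peak carries the target with one substitution.
toy_target = "GATTCATAGC"
toy_peak = "TTTT" + "GATTCTTAGC" + "TTTT"
toy_hits = similar_sites(toy_peak, toy_target, max_mismatches=1)
```

A peak with such a hit would be flagged as a candidate off-target site rather than discarded automatically, since sequence similarity alone does not prove dCas9 binding.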
Conclusions
In this study, we used enChIP-Seq analysis to detect physical interactions between genomic regions. In K562-derived cells, the 5’HS5 locus physically interacted with multiple regions in the genome (Figure 4, Table 2). These interactions were induced by erythroid differentiation in response to NaB treatment (Figure 4, Table 2). Transcription of genes around the interacting genomic region was up-regulated in the differentiated state (Figure 6), suggesting a direct or indirect involvement of 5’HS enhancer activity in transcription of genes proximal to the interacting sites. Our results suggest that enChIP-Seq represents a potentially useful tool for performing non-biased searches for physical interactions between genomic regions, which would facilitate elucidation of the molecular mechanisms underlying regulation of genome functions.
Materials and Methods
Plasmids
3xFLAG-dCas9/pMXs-puro (Addgene #51240) was described previously [14]. To construct vectors for expression of sgRNAs, two oligos for each sgRNA were annealed and extended using Phusion polymerase (New England Biolabs) to make 100 bp double-stranded DNA fragments, as described previously [12]. The nucleotide sequences were as follows: hHS5 #6, 5’-TTTCTTGGCTTTATATATCTTGTGGAAAGGACGAAACACCGGATTCATAGCAGACAGCTA-3’ and 5’-GACTAGCCTTATTTTAACTTGCTATTTCTAGCTCTAAAACTAGCTGTCTGCTATGAATCC-3’; hHS5 #17, 5’-TTTCTTGGCTTTATATATCTTGTGGAAAGGACGAAACACCGGGAAGATAGGGTAAGAGAC-3’ and 5’-GACTAGCCTTATTTTAACTTGCTATTTCTAGCTCTAAAACGTCTCTTACCCTATCTTCCC-3’. Fragments were purified following agarose gel electrophoresis and subjected to Gibson assembly (New England Biolabs) with the linearized sgRNA cloning vector (Addgene #41824), a gift from George Church [37], to yield sgRNA-hHS5 #6 and sgRNA-hHS5 #17. The gBlocks were excised with XhoI and HindIII and cloned into XhoI/HindIII-cleaved pSIR vector to generate self-inactivating retroviral vectors for sgRNAs, as described previously [14].
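The anneal-and-extend design can be checked computationally: the 3’ ends of the two oligos in each pair must be complementary, and filling in the overhangs should give the stated 100 bp fragment. The sketch below uses the four oligo sequences listed above, written without internal spaces; the function names and the minimum overlap length are assumptions made for the example.

```python
from typing import Tuple

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Reverse complement of an uppercase A/C/G/T sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def anneal_and_extend(fwd: str, rev: str, min_overlap: int = 10) -> Tuple[int, str]:
    """Return (overlap length, top strand of the extended duplex)."""
    rev_rc = reverse_complement(rev)
    # Look for the longest 3' end of fwd that matches the start of rev_rc.
    for k in range(min(len(fwd), len(rev_rc)), min_overlap - 1, -1):
        if fwd[-k:] == rev_rc[:k]:
            return k, fwd + rev_rc[k:]
    raise ValueError("oligos do not share a complementary 3' overlap")


OLIGOS = {
    "hHS5 #6": (
        "TTTCTTGGCTTTATATATCTTGTGGAAAGGACGAAACACCGGATTCATAGCAGACAGCTA",
        "GACTAGCCTTATTTTAACTTGCTATTTCTAGCTCTAAAACTAGCTGTCTGCTATGAATCC",
    ),
    "hHS5 #17": (
        "TTTCTTGGCTTTATATATCTTGTGGAAAGGACGAAACACCGGGAAGATAGGGTAAGAGAC",
        "GACTAGCCTTATTTTAACTTGCTATTTCTAGCTCTAAAACGTCTCTTACCCTATCTTCCC",
    ),
}

# Maps each sgRNA name to (overlap length, extended fragment length).
results = {name: anneal_and_extend(fwd, rev) for name, (fwd, rev) in OLIGOS.items()}
summary = {name: (overlap, len(fragment)) for name, (overlap, fragment) in results.items()}
```

For both pairs, the two 60-nt oligos share a 20-nt complementary 3’ overlap, so extension gives 60 + 60 - 20 = 100 bp, in agreement with the fragment size stated above.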
Cell culture
K562-derived cells [14] were maintained in RPMI (Wako) supplemented with 10% fetal calf serum (FCS).
Establishment of cells stably expressing 3xFLAG-dCas9 and sgRNA
Establishment of K562-derived cells expressing 3xFLAG-dCas9 was described previously [14]. To establish cells expressing both 3xFLAG-dCas9 and sgRNAs targeting the 5’HS5 locus, 2 μg of sgRNA-hHS5 #6/pSIR or sgRNA-hHS5 #17/pSIR was transfected along with 2 μg of pPAM3 into 1 × 10^6 293T cells [38]. Two days after transfection, K562-derived cells expressing 3xFLAG-dCas9 were infected with the supernatant (5 ml) of 293T cells containing the virus particles. K562-derived cells expressing both 3xFLAG-dCas9 and sgRNA-hHS5 #6 or sgRNA-hHS5 #17 were selected in RPMI medium containing 10% FCS, puromycin (0.5 μg/ml), and G418 (0.8 mg/ml).
Induction of differentiation of K562-derived cells
To induce erythroid differentiation of the K562-derived cells, cells were incubated in the presence of 1 mM NaB for 4 days.
enChIP-real-time PCR
enChIP-real-time PCR was performed as previously described [12], except that ChIP DNA Clean & Concentrator (Zymo Research) was used for purification of DNA. Primers used in the analysis are shown in Table S3.
enChIP-Seq and bioinformatics analysis
Undifferentiated or differentiated K562-derived cells (2 × 10^7 each) expressing 3xFLAG-dCas9 and sgRNAs were subjected to the enChIP procedure as described previously [12], except that ChIP DNA Clean & Concentrator was used for purification of DNA. NGS and data analysis were performed at the University of Tokyo as described previously [39, 40]. Additional data analysis for Step 2 in Figure 3 was performed at Hokkaido System Science Co., Ltd. Images of NGS peaks were generated using the UCSC Genome Browser (https://genome.ucsc.edu/cgi-bin/hgGateway).
Proximity ligation assay to confirm interactions between genomic regions
K562 cells (1 × 10^7) were fixed with 1% formaldehyde at 37°C for 5 min. The chromatin fraction was extracted and fragmented by sonication (fragment length, 2 kb on average) as described previously [41], except for the use of 800 μl of TE buffer (10 mM Tris pH 8.0, 1 mM EDTA) and a UD-201 ultrasonic disruptor (TOMY SEIKO). Sonicated chromatin (34 μl) was treated with the End-It DNA End-Repair kit (Epicentre) in a 50 μl reaction mixture at room temperature for 45 min. After heating at 70°C for 10 min, the reaction mixture (23.5 μl) was incubated in the presence or absence of T4 DNA ligase (Roche) at room temperature for 2 h. After reverse crosslinking at 65°C followed by RNase A and Proteinase K treatment, DNA was purified using ChIP DNA Clean & Concentrator. The purified DNA was used as a template for the first PCR with KOD FX (Toyobo) and a primer set including one primer containing an I-SceI site that was biotinylated at the 5’ end (Table S3). PCR conditions were as follows: denaturing at 94°C for 2 min; 30 cycles of 98°C for 10 sec, 60°C for 30 sec, and 68°C for 6 min. The reaction mixture (15 μl) was mixed with 15 μl of Dynabeads M-280 Streptavidin (Thermo Fisher Scientific) and 500 μl of RIPA buffer (50 mM Tris [pH 7.5], 150 mM NaCl, 1 mM EDTA, 0.5% sodium deoxycholate, 0.1% SDS, 1% IGEPAL-CA630) at 4°C for 1 h. After three washes with RIPA buffer and one wash with 1 × NEBuffer 2 (New England Biolabs), the Dynabeads were treated with I-SceI at 37°C for 2 h. The supernatant was collected, incubated at 65°C for 20 min, and used for the second PCR with AmpliTaq Gold 360 Master Mix (Applied Biosystems). PCR conditions were as follows: denaturing at 95°C for 10 min; 27 cycles of 95°C for 30 sec, 60°C for 30 sec, and 72°C for 1 min. Primers used in the analysis are shown in Table S3.
Acknowledgments
We thank G.M. Church for providing a plasmid (Addgene plasmid #41824), F. Kitaura for technical assistance, and T. Kikuchi, H. Horiuchi, and M. Tosaka for NGS analysis.