Abstract
Achieving a quantitative and predictive understanding of 3D genome architecture presents a major challenge and aspiration. However, this milestone will not be achieved without quantitative measurements of the key proteins driving nuclear organization. Here we report the quantification of CTCF and cohesin, two causal regulators of topologically associating domains (TADs) in mammalian cells. Within the context of the cohesin/CTCF-mediated loop extrusion model and recent imaging studies (Hansen et al., 2017), we determine the density of extruding cohesins and CTCF boundary permeability. Furthermore, co-immunoprecipitation studies of an endogenously tagged subunit (Rad21) confirm the presence of dimers and/or oligomers. Having established cell lines with accurately measured protein abundances, we report a simple method to conveniently count molecules of any Halo-tagged protein in the nucleus. We anticipate that these tools and results will advance a more quantitative understanding of 3D genome organization, and facilitate quantifying proteins involved in diverse biological processes.
Introduction
Folding of mammalian genomes into structures known as Topologically Associating Domains (TADs) is thought to help regulate gene expression while aberrant misfolding has been associated with disease (Dekker and Mirny, 2016; Hansen et al., 2018; Hnisz et al., 2017; Lupianez et al., 2015). CTCF and cohesin have emerged as master regulators of TADs since acute CTCF or cohesin depletion causes global loss of TADs (Gassler et al., 2017; Nora et al., 2017; Rao et al., 2017; Schwarzer et al., 2017; Wutz et al., 2017). Consistent with the key role played by CTCF and cohesin, models of genome folding through cohesin-mediated loop extrusion and CTCF binding have been remarkably successful in reproducing the general features of genomic contact maps at the level of TADs (Fudenberg et al., 2016, 2018; Sanborn et al., 2015). Despite such success, these models have been limited by a dearth of quantitative data to constrain the modeling. Importantly, the molecular stoichiometry of cohesin remains unknown, further limiting our ability to test various models. Building on our recent genomic and imaging studies of endogenously tagged CTCF and cohesin (Hansen et al., 2017), here we (1) determined the density of extruding cohesin complexes and estimated CTCF boundary permeability; (2) provided biochemical evidence that at least a subset of cohesin complexes exist as dimers or higher order structures and (3) developed a simple method for obtaining the absolute abundance of any protein fused with the widely used and highly versatile HaloTag (Los et al., 2008).
Absolute CTCF/cohesin quantification and implications for 3D genome organization
To obtain a reasonably accurate count of CTCF and cohesin molecules in the nucleus of mammalian cells, we took advantage of our previously validated mouse and human cell lines where CTCF (U2OS and mouse embryonic stem cells (mESC)) and the cohesin kleisin subunit Rad21 (mESC) were endogenously and homozygously Halo-tagged (Hansen et al., 2017). To establish a standard, we purified 3xFLAG-Halo-CTCF and Rad21-Halo-3xFLAG from insect cells and quantitatively labeled the purified proteins with the bright dye JF646 (Grimm et al., 2015). We then ran a known quantity of protein side-by-side with a known number of cells and quantified total protein abundance using “in-gel” fluorescence (Figure 1A; Materials and Methods). Note that JF646-labeling is quantitative in live cells (Yoon et al., 2016). This revealed that, on average, mESCs contain ~218,000 +/- 24,000 CTCF protein molecules (mean +/- std) and, if cohesin exists as a monomeric ring, ~87,000 +/- 35,000 cohesin complexes (Figure 1B; Rad21 appears to be the least abundant cohesin subunit, see Materials and Methods; these numbers are comparable to FCS-measurements from HeLa cells reported in a recent preprint (Cai et al., 2018)). Having previously determined the fraction of CTCF bound to specific sites in mESCs by single-molecule imaging (~49%) and the number of CTCF sites by ChIP-seq (~71,000) (Hansen et al., 2017), we can now calculate that an average CTCF binding site is occupied ~50% of the time (assuming that an “average” cycling cell is half-way through the cell cycle and contains 3 genome copies; full details in Materials and Methods). In the context of the loop extrusion model (Fudenberg et al., 2018), where cohesin extrudes DNA loops until it encounters convergent binding sites occupied by CTCF, this suggests that the time-averaged occupancy of an average CTCF boundary is ~50% (Figure 1C-D). 
Likewise, we find that the density of extruding cohesin molecules is ~4.2 per Mb assuming cohesin exists as a monomeric ring or ~2.1 per Mb if cohesin forms dimers (full details on calculation in Materials and Methods). These numbers will be useful starting points for constraining and parameterizing models of 3D genome organization, though we note that they represent average values. Although it remains unclear how ChIP-Seq peak strength relates to time-averaged occupancy, the wide distribution of CTCF ChIP-Seq read counts (Figure 1D) suggests that some CTCF binding sites will be occupied most of the time, while other sites are rarely bound (i.e. 50% is an average). Likewise, the density of extruding cohesin complexes is unlikely to be uniform across the genome (e.g., due to uneven loading or obstacles to cohesin extrusion by other large DNA binding protein complexes). It is also worth mentioning that the CTCF abundance in human U2OS cells (~105,000 +/- 15,000 proteins/cell) is less than half of that seen in mESCs. Thus, cell-type specific control of chromatin looping may be achieved in part by regulating CTCF abundance.
Mammalian cohesin can form dimers and/or higher order oligomers
Interpreting the cohesin data described above requires an accurate count of its molecular stoichiometry. In addition to potentially engaging in loop extrusion, cohesin complexes play important roles in sister chromatid cohesion and DNA repair (Guacci et al., 1997; Losada et al., 1998; Michaelis et al., 1997; Onn et al., 2008). Perhaps most critically, cohesin is generally assumed to exist as a single tripartite ring composed of the subunits Smc1, Smc3 and Rad21/Scc1 at 1:1:1 stoichiometry (Nasmyth, 2011). However, the hypothesized ability of cohesin to extrude DNA and independently stop once it encounters CTCF at both ends of a loop (Fudenberg et al., 2018)(Figure 1C), seems intuitively more consistent with a dimeric complex (Figure 2A). Such a dimer model seems likely if direct protein-protein contacts between CTCF and a cohesin subunit are required to halt cohesin mediated extrusion, because two such interactions would presumably be necessary to stop extrusion at both CTCF-bound sites. Indeed, higher order oligomeric cohesin structures have been proposed based upon the unusual genetic properties of cohesin subunits in budding yeast (Eng et al., 2015; Skibbens, 2016). Moreover, a previous study used self co-immunoprecipitation (CoIP) of cohesin subunits to suggest a hand-cuff shaped dimer model for cohesin (Zhang et al., 2008). However, this study has remained highly controversial (Nasmyth, 2011). Since this study (Zhang et al., 2008) relied on over-expressed epitope-tagged cohesin subunits and given our recent observations that over-expression of the Rad21 subunit does not faithfully recapitulate the properties of endogenously tagged Rad21 (Hansen et al., 2017), we decided to revisit this important issue using endogenous tagging without overexpression. First, we generated mESCs where one endogenous Rad21 allele was Halo-V5 tagged while the other allele was not tagged (clone C85; Figure 2B-C; see Materials and Methods for details). 
In addition, we also generated mESCs where one allele of Rad21 was tagged with Halo-V5 and the other with SNAP-3xFLAG (clone B4; Figure 2B-C). If cohesin exclusively existed as a single ring containing one Rad21 subunit, a V5 IP of Rad21-Halo-V5 should not pull down the Rad21 protein generated from the other allele. However, in the C85 clonal line, the V5 co-IP clearly precipitated wild-type Rad21 (Figure 2D). This cohesin:cohesin interaction appears to be protein-mediated rather than dependent on DNA association since benzonase treatment, which leads to complete DNA degradation (Figure 2 – Figure Supplement 1B) did not interfere with co-IP (Figure 2D; single-color blots in Figure 2 – Figure Supplement 1A). This demonstrates that Rad21 either directly or indirectly self-associates in a protein-mediated and biochemically stable manner consistent with dimers or higher order oligomers.
To independently verify this result and to ensure that the co-IP’ed Rad21 was not a degradation product of the tagged protein, we repeated these co-IP studies in the clonal cell line B4, where the two endogenous Rad21 alleles express orthogonal epitope tags. Again, a V5-IP efficiently pulled down Rad21-SNAP-3xFLAG (Figure 2E) and, reciprocally, a FLAG-IP pulled down Rad21-Halo-V5 (Figure 2F). As before, the Rad21 self-interaction was entirely benzonase-resistant and thus independent of nucleic acid binding, as this enzyme degrades both DNA and RNA (Figure 2 – Figure Supplement 1B). Under the simplest assumption of cohesin forming dimers, using IP and CoIP efficiencies we calculated that at least ~10% of cohesin is in a dimeric state during our pull-down experiment (full calculation details in Materials and Methods). This percentage is almost certainly a significant underestimate of the actual oligomeric vs. monomeric ratio in live cells, since we expect a substantial proportion of the self-interactions not to survive cell lysis and the typically harsh IP procedures. Thus, while these results cannot exclude that some or even a majority of mammalian cohesin exists as a single ring (Figure 2A), they do demonstrate that a substantial population exists as dimers or oligomers. Whether this subpopulation represents handcuff-like dimers, oligomers (Figure 2A), cohesin clusters (Hansen et al., 2017) or an alternative state will be an important direction for future structural studies.
A simple general method for counting Halo-tagged proteins in live cells
Here we have illustrated how absolute quantification of protein abundance can provide crucial functional insights into mechanisms regulating genome organization when integrated with genomic and/or imaging data (Figure 1; (Hansen et al., 2017)). The HaloTag (Los et al., 2008) is a popular and versatile protein-fusion platform that has found application in a broad range of experimental systems (England et al., 2015). Indeed, it is currently the preferred choice for live-cell single molecule imaging. Combined with the development of Cas9-mediated genome-editing (Ran et al., 2013), endogenous Halo-tagging of proteins has thus become the gold standard (Chong et al., 2018; Hansen et al., 2017; Rhodes et al., 2017a, 2017b; Stevens et al., 2017; Teves et al., 2016, 2018; Youmans et al., 2018), because it avoids the now well-established limitations and potential artifacts associated with protein overexpression (Hansen et al., 2017; Shao et al., 2018; Teves et al., 2016).
Now that we have determined the absolute abundance of CTCF and cohesin in a few cell lines (Figure 1B), determining the absolute abundance of any other Halo-tagged protein becomes straightforward: by growing your cell line of interest side-by-side with one of the cell lines characterized here (e.g. C45 mESC Rad21-Halo), absolute quantification can be achieved simply by measuring the relative fluorescence intensity using either microscopy or flow cytometry (Figure 3). To illustrate this, here we compared the fluorescence intensity of mESC lines carrying homozygously Halo-tagged Sox2 (Teves et al., 2016) and TBP (Teves et al., 2018) to our C45 Rad21-Halo mESC line, and determined the average protein copy number per cell to be ~235,100 +/- 34,100 for Halo-Sox2 and ~47,199 +/- 3,300 for Halo-TBP (Figure 3; Figure 3 – Figure Supplement 1). The HaloTag knock-in cell lines described here will be freely available to the research community for use as a convenient standard to enable rapid absolute quantification of any Halo-tagged protein of interest.
Discussion
Our results strongly suggest that a significant subpopulation of cohesin (>10%) exists as either a dimer or higher order complexes (Figure 2A) consistent with an earlier study that relied on over-expression of tagged molecules (Zhang et al., 2008). Along these lines, the related bacterial SMC complex, MukBEF, also forms a dimer (Badrinarayanan et al., 2012) and budding yeast cohesin exhibits inter-allelic complementation (Eng et al., 2015) consistent with a dimeric or higher order architecture. While this does not exclude that some cohesin molecules exist as single rings, it seems evident that further elucidating the molecular architecture of extruding cohesin should be an urgent goal for future studies. Moreover, although polymer-modeling of 3D genome organization is rapidly advancing (Fudenberg et al., 2018; Nuebler et al., 2017), we suggest that a paucity of quantitative data to inform us of the stoichiometries of key 3D genome organizers constrains our ability to test the various models that have been reported. We hope that the data presented here will prove useful in informing and advancing such efforts in the future.
Given that Halo-tagging has become increasingly common, we also hope that the simple method presented here for absolute protein quantification in vivo (Figure 3) will find widespread use. To this end, we will freely share the lines described here as standards for either microscopy- or flow cytometry-based absolute quantifications of any Halo-tagged protein of interest. We also note that although Fluorescence Correlation Spectroscopy (FCS) remains a powerful complementary and orthogonal tool for measuring protein concentrations (Politi et al., 2018), it requires sophisticated imaging and analysis infrastructure, while conversion to absolute protein abundance depends on precise measurement of nuclear or cytoplasmic volume. Since volume scales with the cube of the radius, even small errors in measuring the radius can result in large volume errors. For these reasons, we hope that the method described here will make accurate counting of protein molecules more accessible and convenient.
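The sensitivity of volume estimates to radius errors can be illustrated with a short calculation. The sketch below assumes an idealized spherical nucleus or cell (an assumption made only for illustration); the function name and the example error values are hypothetical.

```python
# Illustrative sketch: how a relative error in the measured radius propagates
# to the volume of a sphere, V = (4/3) * pi * r**3.
# Assumption: the compartment is treated as an ideal sphere.

def relative_volume_error(relative_radius_error):
    """Relative volume error caused by a given relative radius error."""
    # V scales with r**3, so the constant (4/3)*pi cancels in the ratio.
    return (1.0 + relative_radius_error) ** 3 - 1.0

# Hypothetical radius errors of 2%, 5% and 10%.
for err in (0.02, 0.05, 0.10):
    print(f"radius error {err:.0%} -> volume error {relative_volume_error(err):.1%}")
```

Under this assumption, a radius overestimated by 5% inflates the volume, and hence deflates an FCS-derived concentration-to-abundance conversion, by roughly 16%.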
Materials and Methods
Cell Culture
JM8.N4 mouse embryonic stem cells (Pettitt et al., 2009) (Research Resource Identifier: RRID:CVCL_J962; obtained from the KOMP Repository at UC Davis) were cultured as previously described (Hansen et al., 2017). Briefly, mESC lines were grown on plates pre-coated with 0.1% gelatin (autoclaved and filtered; Sigma-Aldrich, G9391) under feeder-free conditions in knock-out DMEM with 15% FBS and LIF (full recipe: 500 mL knockout DMEM (ThermoFisher #10829018), 6 mL MEM NEAA (ThermoFisher #11140050), 6 mL GlutaMax (ThermoFisher #35050061), 5 mL Penicillin-streptomycin (ThermoFisher #15140122), 4.6 μL 2-mercaptoethanol (Sigma-Aldrich M3148), 90 mL fetal bovine serum (HyClone FBS SH30910.03 lot #AXJ47554)). mES cells were fed by replacing half the medium with fresh medium daily and passaged every two days by trypsinization. Human U2OS osteosarcoma cells (Research Resource Identifier: RRID:CVCL_0042; a gift from David Spector’s lab, Cold Spring Harbor Laboratory) were grown as previously described (Hansen et al., 2017). Briefly, U2OS cells were grown in low glucose DMEM with 10% FBS (full recipe: 500 mL DMEM (ThermoFisher #10567014), 50 mL fetal bovine serum (HyClone FBS SH30910.03 lot #AXJ47554) and 5 mL Penicillin-streptomycin (ThermoFisher #15140122)) and were passaged every 2-4 days before reaching confluency. Both mouse ES and human U2OS cells were grown in a Sanyo copper alloy IncuSafe humidified incubator (MCO-18AIC(UV)) at 37°C/5.5% CO2. Both the mESC and U2OS cell lines were pathogen-tested and found to be clean and the U2OS cell line was authenticated through STR profiling. Full details on pathogen-testing and authentication can be found elsewhere (Hansen et al., 2017).
CRISPR/Cas9-mediated genome editing
CTCF knock-in U2OS and mESC lines were as previously described (Hansen et al., 2017). The Rad21 knock-in C85 and B4 mESC clones were sequentially created roughly according to published procedures (Ran et al., 2013), but exploiting the HaloTag and SNAPf-Tag to FACS for edited cells. The SNAPf-Tag is an optimized version of the SNAP-Tag, and we purchased a plasmid encoding this gene from NEB (NEB, Ipswich, MA, #N9183S). We transfected mESCs with Lipofectamine 3000 (ThermoFisher L3000015) according to manufacturer’s protocol, co-transfecting a Cas9 and a repair plasmid (2 μg repair vector and 1 μg Cas9 vector per well in a 6-well plate; 1:2 w/w). The Cas9 plasmid was slightly modified from that distributed from the Zhang lab (Ran et al., 2013): 3xFLAG-SV40NLS-pSpCas9 was expressed from a CBh promoter; the sgRNA was expressed from a U6 promoter; and mVenus was expressed from a PGK promoter. For the repair vector, we modified a pUC57 plasmid to contain the tag of interest (Halo-V5 for C85 or SNAPf-3xFLAG for B4) preceded by the Sheff and Thorn linker (GDGAGLIN) (Sheff and Thorn, 2004), and flanked by ~500 bp of genomic homology sequence on either side. To generate the C85 Rad21-Halo-V5 heterozygous clone, we used three previously described sgRNAs (Hansen et al., 2017) that overlapped with the STOP codon and, thus, that would not cut the repair vector (see table below for sequences). To generate the B4 Rad21-Halo-V5/Rad21-SNAPf-3xFLAG tagged clone, we re-targeted clone C85 with sgRNAs specific to the “near wild-type” allele (see below) while providing the SNAPf-3xFLAG repair vector.
We cloned the sgRNAs into the Cas9 plasmid and co-transfected each sgRNA-plasmid with the repair vector individually. 18–24 hr later, we pooled the cells transfected with each of the sgRNAs individually and FACS-sorted for YFP (mVenus) positive, successfully transfected cells. YFP-sorted cells were then grown for 4–12 days and labeled with 500 nM Halo-TMR (Halo-Tag knock-ins) or 500 nM SNAP-JF646 (SNAPf-Tag knock-in); the cell population with significantly higher fluorescence than similarly labeled wild-type cells was then FACS-selected and plated at very low density (~0.1 cells per mm^2). Clones were then picked, expanded and genotyped by PCR using a three-primer PCR (genomic primers external to the homology sequence and an internal Halo or SNAPf primer). Successfully edited clones were further verified by PCR with multiple primer combinations, Sanger sequencing and Western blotting. The chosen C85 and B4 clones show similar tagged protein levels to the endogenous untagged protein in wild-type controls (Figure 2C).
Genomic DNA sequencing of the C85 heterozygous clone showed the expected Halo-V5-targeted allele, and a “near wild-type” allele, where repair following Cas9-cutting generated a 4 bp deletion (nt 2145-2148 in the NCBI Reference Sequence NM_009009.4), expected to result in a reading frame shift replacing the 2 most C-terminal amino acids (II) with SEELDVFELVITH. The mutation was repaired in clone B4 by providing a corrected SNAPf-3xFLAG repair vector.
All plasmids used in this study are available upon request. The table below lists the primers used for genome editing and genotyping of the Rad21 knock-in clones.
Antibodies
Antibodies were as follows: ChromPure mouse normal IgG from Jackson ImmunoResearch; anti-V5 for IP from Abcam (ab9116) and for Western Blot (WB) from ThermoFisher (R960-25); anti-FLAG for IP (F7425) and for WB (F3165) from Sigma-Aldrich; anti-Rad21 for WB from Abcam (ab154769); anti-Halo for WB from Promega (G9211); anti-β-actin for WB from Sigma-Aldrich (A2228).
Western blot and co-immunoprecipitation (CoIP) experiments
Cells were collected from plates by scraping in ice-cold phosphate-buffered saline (PBS) with PMSF and aprotinin, pelleted, and flash-frozen in liquid nitrogen.
For Western blot analysis, cell pellets were thawed on ice, resuspended to 1 mL/10 cm plate of low-salt lysis buffer (0.1 M NaCl, 25 mM HEPES pH 7.5, 1 mM MgCl2, 0.2 mM EDTA, 0.5% NP-40 and protease inhibitors) with 125 U/mL of benzonase (Novagen, EMD Millipore), passed through a 25G needle and rocked at 4°C for 1 hr, after which 5 M NaCl was added to reach a final concentration of 0.2 M. Lysates were then rocked at 4°C for 30 min and centrifuged at maximum speed at 4°C. Supernatants were quantified by Bradford. 15 μg of protein was loaded on an 8% Bis-Tris SDS-PAGE gel and transferred onto nitrocellulose membrane (Amersham Protran 0.45 μm NC, GE Healthcare) for 2 hr at 100 V.
For chemiluminescent Western blot detection with HRP-conjugated secondary antibodies, after the transfer the membrane was blocked in TBS-Tween with 10% milk for 1 hr at room temperature and blotted overnight at 4°C with primary antibodies in TBS-T with 5% milk. HRP-conjugated secondary antibodies were diluted 1:5000 in TBS-T with 5% milk and incubated at room temperature for an hour.
For fluorescence detection, after the transfer the membrane was blocked with the Odyssey® Blocking Buffer (PBS) for 1 hr at room temperature, followed by overnight incubation at 4°C with primary antibodies in Odyssey® Blocking Buffer (PBS) and PBS (1:1). IRDye secondary antibodies were used for detection at 1:5000 dilution and 1 hour incubation at room temperature. After extensive washes, the membrane was scanned with a LI-COR Odyssey CLx scanner.
For co-immunoprecipitation experiments (CoIP), cell pellets were thawed on ice, resuspended to 1 ml/10 cm plate of cell lysis buffer (5 mM PIPES pH 8.0, 85 mM KCl, 0.5% NP-40 and protease inhibitors), and incubated on ice for 10 min. Nuclei were pelleted in a tabletop centrifuge at 4°C, at 4000 rpm for 10 min, and resuspended to 0.5 mL/10 cm plate of low salt lysis buffer either with or without benzonase (600 U/ml) and rocked for 4 hours at 4°C. After the 4-hour incubation the salt concentration was adjusted to 0.2 M NaCl final and the lysates were incubated for another 30 minutes at 4°C. Lysates were then cleared by centrifugation at maximum speed at 4°C and the supernatants quantified by Bradford. In a typical CoIP experiment, 1 mg of protein was diluted in 1 mL of CoIP buffer (0.2 M NaCl, 25 mM HEPES pH 7.5, 1 mM MgCl2, 0.2 mM EDTA, 0.5% NP-40 and protease inhibitors) and pre-cleared for 2 hrs at 4°C with protein-G sepharose beads (GE Healthcare Life Sciences) before overnight immunoprecipitation with 4 mg of either normal serum IgGs or specific antibodies as listed above. Some pre-cleared lysate was kept at 4°C overnight as input. Protein-G-sepharose beads precleared overnight in CoIP buffer with 0.5% BSA were then added to the samples and incubated at 4°C for 2 hr. Beads were pelleted and all the CoIP supernatant was removed and saved for phenol-chloroform extraction of DNA. The beads were then washed extensively with CoIP buffer, and the proteins were eluted from the beads by boiling for 5 min in 2X SDS-loading buffer and analyzed by SDS-PAGE and Western blot.
Estimate of cohesin dimer-to-monomer ratio from CoIP experiments
Assuming that a dimeric state is responsible for the observed protein-based cohesin self-interaction, we calculated the percentage of cohesin molecules forming dimers from our CoIP experiments in the clonal cell line B4. In these cells one allele of Rad21 is tagged with Halo-V5 and the other with SNAP-3xFLAG, and the two proteins are expressed at virtually identical levels (Figure 2C). We also assumed that V5:V5 and FLAG:FLAG dimers are formed with the same likelihood as V5:FLAG dimers, the latter being the only ones that our assay probes for. Since we observed no difference when treating with benzonase, we averaged all Western Blot results from both the V5 and the FLAG reciprocal pull-downs (Figure 2E and F). We used the ImageJ “Analyze Gels” function (Schindelin et al., 2012) to measure pull-down and input (IN) band intensities (I) and used those numbers to calculate IP and CoIP efficiencies (%) as follows:
$$\%IP = \frac{I_{IP}/0.1}{I_{IN}/0.015} \times 100 \qquad \%CoIP = \frac{I_{CoIP}/0.9}{I_{IN}/0.015} \times 100$$
with 0.015 being the fraction of input loaded onto gel as a reference and 0.1 or 0.9 the fraction of the pull-down material loaded onto gel to quantify the IP or CoIP efficiency, respectively. Within the assumed scenario, we will use the V5 pull-down of Figure 2E to illustrate our calculations. The V5 antibody immunoprecipitates Rad21 V5 monomers (M_{V5}), V5:V5 dimers (D_{V5}), and V5:FLAG dimers (D_{V5-FLAG}). The %IP (i.e., the fraction of all V5 molecules that are pulled down) is thus the sum of the three terms:
$$\%IP_{V5} = M_{V5} + 2\,D_{V5} + D_{V5\text{-}FLAG}$$
where each D_{V5} contains two V5 molecules, and a D_{V5-FLAG} contains a single V5 molecule. Since we assumed an equal likelihood of V5 and V5-FLAG dimers, the equation becomes:
Since the total number of V5 and FLAG-tagged Rad21 molecules are the same: thus
Finally, adjusting for the efficiency of the V5 pull-down, the total percentage of Rad21 molecules in monomers can be calculated as: and
After performing the calculations described above, the resulting percentages of cohesin molecules in dimers for all the experiments were:
V5 IP, untreated: 11.23%
V5 IP, Benzonase: 7.60%
FLAG IP, untreated: 9.07%
FLAG IP, Benzonase: 10.25%
with an average of 9.54% ± 1.57% (standard deviation).
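As a quick check, the summary statistic can be recomputed from the four percentages listed above. This is a minimal sketch using only the Python standard library; it assumes the reported spread is a sample standard deviation (n − 1 denominator), which the text does not state explicitly.

```python
# Recompute the mean and spread of the four dimer percentages listed above.
# Assumption: the reported "standard deviation" is the sample standard
# deviation (n - 1 denominator); the source does not specify this.
from statistics import mean, stdev

dimer_percentages = {
    "V5 IP, untreated": 11.23,
    "V5 IP, Benzonase": 7.60,
    "FLAG IP, untreated": 9.07,
    "FLAG IP, Benzonase": 10.25,
}

values = list(dimer_percentages.values())
dimer_mean = mean(values)
dimer_sd = stdev(values)

print(f"mean = {dimer_mean:.2f}%, sample SD = {dimer_sd:.2f}%")
```

Working from the rounded values gives a mean of about 9.54% and a sample standard deviation of about 1.56%, in close agreement with the reported 9.54% ± 1.57%; the small difference in the last digit is what one would expect if the reported value was computed from unrounded percentages.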
DNA extraction and quantification
For DNA extraction, the CoIP supernatant was extracted twice with an equal volume of phenol-chloroform (UltraPure™ Phenol:Chloroform:Isoamyl Alcohol (25:24:1, v/v)). After centrifugation at room temperature and maximum speed for 5 minutes, the aqueous phase containing DNA was mixed with 2 volumes of 100% ethanol and precipitated for 30 minutes at −80°C. After centrifugation at 4°C for 20 minutes at maximum speed, DNA was re-dissolved in 25 μl of water and quantified by NanoDrop. About 100 ng of the untreated sample DNA, or an equal volume from the nuclease-treated samples, was used for relative quantification by quantitative PCR (qPCR) with SYBR Select Master Mix for CFX (Applied Biosystems, ThermoFisher) on a BIO-RAD CFX Real-time PCR system.
Primers for DNA quantification were as follows:
Actb promoter forward: CATGGTGTCCGTTCTGAGTGATC
Actb promoter reverse: ACAGCTTCTTTGCAGCTCCTTCG
Expression and purification of recombinant 3xFLAG-Halo-CTCF and Rad21-Halo-3xFLAG
Recombinant Bacmid DNAs for the fusion mouse proteins 3xFLAG-Halo-CTCF-His6 (1086 amino acids; 123.5 kDa) and His6-Rad21-Halo-3xFLAG (972 amino acids; 110.2 kDa) were generated from pFastBAC constructs according to manufacturer’s instructions (Invitrogen). Recombinant baculovirus for the infection of Sf9 cells was generated using the Bac-to-Bac Baculovirus Expression System (Invitrogen). Sf9 cells (~2 × 10^6/ml) were infected with amplified baculoviruses expressing Halo-CTCF or Rad21-Halo. Infected Sf9 suspension cultures were collected at 48 hr post infection, washed extensively with cold PBS, lysed in 5 packed cell volumes of high salt lysis buffer (HSLB; 1.0 M NaCl, 50 mM HEPES pH 7.9, 0.05% NP-40, 10% glycerol, 10 mM 2-mercaptoethanol, and protease inhibitors), and sonicated. Lysates were cleared by ultracentrifugation, supplemented with 10 mM imidazole, and incubated with Ni-NTA resin (Qiagen) for either 90 mins for Halo-CTCF or 16 hrs for Rad21-Halo. Bound proteins were washed extensively with HSLB with 20 mM imidazole, equilibrated with 0.5 M NaCl HGN (50 mM HEPES pH 7.9, 10% glycerol, 0.01% NP-40) with 20 mM imidazole, and eluted with 0.5 M NaCl HGN supplemented with 0.25 M imidazole. Eluted fractions were analyzed by SDS-PAGE followed by PageBlue staining.
Peak fractions were pooled and incubated with anti-FLAG (M2) agarose (Sigma) and 3X molar excess fluorogenic JF646 for 4 hr in the dark. Bound proteins were washed extensively with HSLB, equilibrated to 0.2M NaCl HGN, and eluted with 3xFLAG peptide (Sigma) at 0.4 mg/ml. Protein concentrations were determined by PageBlue staining compared to a β-Galactosidase standard (Sigma). HaloTag Standard (Promega) was labeled according to the method described above to determine the extent of fluorescent labeling.
Quantification of CTCF and Rad21 molecules per cell
The number of CTCF and Rad21 molecules per cell was quantified by comparing JF646-labelled cell lysates to known amounts of purified JF646-labelled protein standards (e.g. 3xFLAG-Halo-CTCF-His6 or His6-Rad21-Halo-3xFLAG) as shown in Figure 1A. JM8.N4 mouse embryonic stem cells (either C45 mRad21-Halo-V5; C59 FLAG-Halo-mCTCF, mRad21-SNAPf-V5; or C87 FLAG-Halo-mCTCF) were grown overnight on gelatin-coated P10 plates and human U2OS osteosarcoma C32 FLAG-Halo-hCTCF cells on P10 plates. Cells were then labelled with 500 nM (final concentration) Halo-JF646 dye (Grimm et al., 2015) in cell culture medium for 30 min at 37°C/5.5% CO2. Importantly, it has previously been shown that Halo-JF646 labeling is quantitative for cells grown in culture (Yoon et al., 2016). Cells were washed with PBS, dissociated with trypsin, collected by centrifugation and re-suspended in 1 mL PBS and stored on ice in the dark. Cells were diluted 1:10 and counted with a hemocytometer. Cells were then collected by centrifugation and resuspended in 1x SDS loading buffer (50mM Tris-HCl, pH 6.8, 100mM DTT, 2.5% beta-mercaptoethanol, 2% SDS, 10% glycerol) to a concentration of ~10,000-20,000 cells per μL. 5-8 biological replicates were collected per cell line.
Cell lysates equivalent to 5.0 × 10^4 to 1.5 × 10^5 cells were run on 10% SDS-PAGE alongside known amounts of purified JF646-labelled 3xFLAG-Halo-CTCF-His6 or His6-Rad21-Halo-3xFLAG. The protein standards were processed similarly to the cell lysates to account for any loss of JF646 fluorescence due to denaturation or SDS-PAGE, allowing for quantitative comparisons. JF646-labelled proteins were visualized on a Pharos FX-plus Molecular Imager (Bio-Rad) using a 635 nm laser line for excitation and a Cy5-bandpass emission filter. Band intensities were quantified using Image Lab (Bio-Rad). From the absolute protein standards, we calculated the fluorescence per protein molecule, such that we could normalize the cell lysate fluorescence by the fluorescence per molecule and the known number of cells per lane to determine the average number of molecules per cell.
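The normalization described above can be sketched in a few lines. This is an illustration under stated assumptions rather than the analysis actually used: it assumes band intensity is proportional to the number of labeled molecules (a fit through the origin), and all loaded masses, band intensities and cell numbers in the example are hypothetical. Only the molecular weight of 3xFLAG-Halo-CTCF-His6 (123.5 kDa) is taken from the text.

```python
# Sketch of in-gel fluorescence quantification against a purified standard.
# Assumptions (not from the source): intensity is proportional to molecule
# number, and every numeric input below except the 123.5 kDa molecular
# weight is a made-up example value.

AVOGADRO = 6.02214076e23  # molecules per mole

def molecules_from_mass(mass_ng, mw_kda):
    """Convert a loaded protein mass (ng) into a number of molecules."""
    grams = mass_ng * 1e-9
    grams_per_mole = mw_kda * 1e3
    return grams / grams_per_mole * AVOGADRO

def fluorescence_per_molecule(standard_molecules, standard_intensities):
    """Least-squares slope of intensity vs. molecules, forced through zero."""
    numerator = sum(m * i for m, i in zip(standard_molecules, standard_intensities))
    denominator = sum(m * m for m in standard_molecules)
    return numerator / denominator

def molecules_per_cell(lane_intensity, per_molecule, cells_in_lane):
    """Normalize lysate fluorescence by signal per molecule and cell count."""
    return lane_intensity / per_molecule / cells_in_lane

# Hypothetical standard lanes: 1, 2 and 4 ng of 3xFLAG-Halo-CTCF-His6.
standard_ng = [1.0, 2.0, 4.0]
standard_molecules = [molecules_from_mass(ng, 123.5) for ng in standard_ng]
standard_intensities = [1000.0, 2000.0, 4000.0]  # arbitrary units

per_molecule = fluorescence_per_molecule(standard_molecules, standard_intensities)

# Hypothetical lysate lane: 1.0e5 cells giving 4470 AU of JF646 signal.
estimate = molecules_per_cell(4470.0, per_molecule, 1.0e5)
print(f"estimated molecules per cell: {estimate:,.0f}")
```

Fitting the standards through the origin is one simple design choice; if the gel scanner shows a non-zero background or saturates at high loads, a fit with an intercept or a restricted linear range would be more appropriate.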
Fractional occupancy and mean density calculations
Next, we calculated the fractional occupancy of CTCF in JM8.N4 mouse embryonic stem cells. Previously (Hansen et al., 2017), using ChIP-Seq we found 68,077 MACS2-called peaks in wild-type mESCs and 74,374 peaks in C59 FLAG-Halo-mCTCF/mRad21-SNAPf-V5 double knock-in mESCs. If we take the mean, this corresponds to ~71,200 CTCF binding sites in vivo. This is per haploid genome. An “average” cell is halfway through the cell cycle and thus contains 3 genomes. In total, an “average” mES cell therefore contains ~213,600 CTCF binding sites. Previously (Hansen et al., 2017), we found that 48.9% and 49.3% of Halo-mCTCF molecules were bound to cognate binding sites in the C59 and C87 cell lines (two independent clones where CTCF has been homozygously Halo-Tagged), respectively. This corresponds to a mean of 49.1%. The average number of Halo-mCTCF molecules per cell was 217,600 ± 26,000 and 218,500 ± 22,700 in the C59 and C87 cell lines, respectively (mean across biological replicates ± standard deviation). This corresponds to a mean of ~218,000 molecules per cell. Thus, the average occupancy (i.e. fraction of time the site is occupied) per CTCF binding site is:
$$\text{occupancy} = \frac{218{,}000 \times 0.491}{71{,}200 \times 3} = \frac{107{,}038}{213{,}600} \approx 0.50$$
Thus, an average CTCF binding site is bound by CTCF 50% of the time in mES cells. Note that this analysis assumes that all binding sites are equally likely to be occupied. Most likely, some sites will exhibit higher and others lower fractional occupancy, as suggested by Figure 1D.
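The occupancy arithmetic can be reproduced directly from the numbers quoted above. The sketch below uses the rounded values given in the text (218,000 molecules per cell, 49.1% bound, ~71,200 sites per genome, 3 genome copies); the function and variable names are illustrative only.

```python
# Reproduce the average CTCF site occupancy from the values quoted in the text.
# Function and variable names are illustrative, not from the source.

def average_site_occupancy(molecules_per_cell, bound_fraction,
                           sites_per_genome, genome_copies):
    """Specifically bound molecules divided by binding sites per cell."""
    bound_molecules = molecules_per_cell * bound_fraction
    sites_per_cell = sites_per_genome * genome_copies
    return bound_molecules / sites_per_cell

ctcf_occupancy = average_site_occupancy(
    molecules_per_cell=218_000,  # mean of C59 and C87 clones
    bound_fraction=0.491,        # mean specifically bound fraction
    sites_per_genome=71_200,     # mean number of ChIP-Seq peaks
    genome_copies=3,             # "average" cycling cell, as assumed in the text
)
print(f"average CTCF site occupancy: {ctcf_occupancy:.1%}")
```

Because the result scales inversely with the assumed number of genome copies, the same function can be used to see how the estimate would shift for a G1 cell (2 copies) or a G2 cell (4 copies).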
Within the context of the loop extrusion model (Fudenberg et al., 2016; Sanborn et al., 2015), it is crucial to know the average density of extruding cohesin complexes (e.g. number of extruding cohesins per Mb). We found the average number of mRad21-Halo molecules per JM8.N4 mES cell to be ~86,900 ± 35,600 (mean across biological replicates ± standard deviation). Previously (Hansen et al., 2017), we found 39.8% of mRad21-Halo molecules to be topologically bound to chromatin in G1 phase and 49.8% in S/G2-phase. After DNA replication begins in S-phase, cohesin adopts multiple functions other than loop extrusion (Skibbens, 2016). Thus, we will use 39.8% as an estimate of the fraction of cohesin molecules that are topologically engaged and involved in loop extrusion throughout the cell cycle. The estimated size of the inbred C57BL/6J mouse genome, the strain background from which the JM8.N4 mES cell line is derived, is 2,716 Mb (Waterston et al., 2002). Importantly, using single-molecule tracking we found that essentially all endogenously tagged mRad21-Halo protein is incorporated into cohesin complexes (Hansen et al., 2017). Thus, we can assume that the number of Rad21 molecules per cell corresponds to the number of cohesin complexes per cell. Thus, we get an average density of “loop extruding” cohesin complexes of (assuming again, that an “average” cell contains 3 genomes):
$$\text{density} = \frac{86{,}900 \times 0.398}{2{,}716\ \text{Mb} \times 3} \approx 4.24\ \text{per Mb}$$
Thus, on average each megabase of chromatin contains 4.24 loop extruding cohesin molecules. We note that it is still not clear whether cohesin functions as a single ring or as a pair of rings (Skibbens, 2016). Thus, if cohesin functions as a single ring, the estimated average density is 4.24 extruding cohesins per Mb and if cohesin functions as a pair, the estimated average density is 2.12 extruding cohesin complexes per Mb. We also note that it is currently unclear whether or not the density of extruding cohesins is likely to be uniform across the genome.
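The density estimate follows from the quantities given above and can be recomputed as a check. The sketch below uses the values stated in the text (86,900 Rad21 molecules per cell, 39.8% topologically bound, a 2,716 Mb genome, 3 genome copies); the function name and the `rings_per_complex` parameter are illustrative and simply encode the monomer versus dimer scenarios.

```python
# Recompute the density of extruding cohesin complexes per Mb.
# Names are illustrative; rings_per_complex = 1 models a single ring,
# rings_per_complex = 2 models a dimer of rings.

def extruding_cohesin_density(rad21_per_cell, bound_fraction,
                              genome_size_mb, genome_copies,
                              rings_per_complex=1):
    """Extruding complexes per Mb of DNA in an 'average' cell."""
    extruding_rings = rad21_per_cell * bound_fraction
    total_dna_mb = genome_size_mb * genome_copies
    return extruding_rings / rings_per_complex / total_dna_mb

monomer_density = extruding_cohesin_density(86_900, 0.398, 2_716, 3)
dimer_density = extruding_cohesin_density(86_900, 0.398, 2_716, 3,
                                          rings_per_complex=2)

print(f"single ring: {monomer_density:.2f} per Mb")
print(f"dimer:       {dimer_density:.2f} per Mb")
```

This gives roughly 4.24 complexes per Mb for single rings and half that for dimers, corresponding to an average spacing on the order of 240 kb and 470 kb between extruding complexes, respectively.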
Conversion based calculation of absolute abundance of Halo-tagged cell lines
To obtain the absolute abundance of the Halo-Sox2 (Teves et al., 2016) and Halo-TBP (Teves et al., 2018) cell lines, we grew them side-by-side with the C45 Halo-Rad21 knock-in cell line. We labeled them with 500 nM Halo-TMR (Promega G8251) for 30 min at 37°C/5.5% CO2 in a tissue-culture incubator, washed out the dye (remove medium; add PBS; remove medium; add fresh medium) and then immediately prepared the cells for flow cytometry. We collected the cells through trypsinization and centrifugation, resuspended the cells in fresh medium, filtered the cells through a 40 μm filter and placed the cells on ice until their fluorescence was read out by flow cytometry (~20 min delay). Using an LSR Fortessa (BD Biosciences) flow cytometer, cells were gated using forward and side scattering. TMR fluorescence was excited using a 561 nm laser and emission read out using a 610/20 band pass filter. Finally, the absolute abundance of protein X was obtained according to:
$$n_X = \frac{I_X - I_{Background}}{I_{C45} - I_{Background}} \times n_{C45}$$
where n_X is the absolute abundance of the protein of interest (mean number of molecules per cell), I_X is the average measured fluorescence intensity of cell lines expressing protein X (in AU), I_Background is the average measured fluorescence intensity of cell lines that were not labeled with TMR, I_C45 is the average measured fluorescence intensity of the C45 cell line standard and n_C45 is the absolute abundance of C45 (~86,900 proteins per cell).
To quantify the abundance of Sox2 and TBP in mESCs, we performed 3 biological replicates and the measurements for each are shown in Figure 3 – Figure supplement 1.
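The conversion can be sketched as a small function. This assumes the background-subtracted ratio form implied by the variable definitions above, with the unlabeled-cell background subtracted from both the sample and the C45 standard; the fluorescence intensities in the example are hypothetical, and only the C45 abundance (~86,900 proteins per cell) comes from the text.

```python
# Sketch of the conversion from relative fluorescence to absolute abundance.
# Assumption: abundance scales linearly with background-subtracted mean
# fluorescence. The intensity values in the example are hypothetical.

C45_RAD21_PER_CELL = 86_900  # absolute abundance of the C45 standard (from the text)

def absolute_abundance(intensity_x, intensity_c45, intensity_background,
                       n_c45=C45_RAD21_PER_CELL):
    """Mean molecules per cell for protein X, calibrated against C45."""
    signal_x = intensity_x - intensity_background
    signal_c45 = intensity_c45 - intensity_background
    if signal_c45 <= 0:
        raise ValueError("standard must be brighter than the unlabeled background")
    return signal_x / signal_c45 * n_c45

# Hypothetical mean flow cytometry intensities (arbitrary units).
example = absolute_abundance(intensity_x=2100.0,
                             intensity_c45=1100.0,
                             intensity_background=100.0)
print(f"estimated molecules per cell: {example:,.0f}")
```

In practice the function would be applied per biological replicate, with the sample and the C45 standard labeled and measured side-by-side in the same session, and the replicate estimates then averaged.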
Acknowledgements
We thank Luke Lavis for generously providing JF dyes, Sheila Teves for sharing cell lines and Doug Koshland, Hugo Brandao and members of the Tjian-Darzacq lab for insightful comments on the manuscript. ASH is a postdoctoral fellow of the Siebel Stem Cell Institute. This work was supported by NIH grants UO1-EB021236 and U54-DK107980 (XD), the California Institute of Regenerative Medicine grant LA1-08013 (XD), and by the Howard Hughes Medical Institute (003061, RT).