Abstract
Chromosome structure is thought to be crucial for proper functioning of the nucleus. Here, we present a method for visualizing chromosomal DNA at super-resolution and then integrating Hi-C data to produce three-dimensional models of chromosome organization. We begin by applying Oligopaint probes and the single-molecule localization microscopy methods of OligoSTORM and OligoDNA-PAINT to image 8 megabases of human chromosome 19, discovering that chromosomal regions contributing to compartments can form distinct structures. Intriguingly, our data also suggest that homologous maternal and paternal regions may be differentially organized. Finally, we integrate imaging data with Hi-C and restraint-based modeling using a method called integrative modeling of genomic regions (IMGR) to increase the genomic resolution of our traces to 10 kb.
One Sentence Summary Super-resolution genome tracing, contact maps, and integrative modeling enable 10 kb resolution glimpses of chromosome folding.
Main text
In terms of organization, chromosomal DNA is among the most challenging of biological molecules to study. It is massive, differs in sequence and structure from one region to the next, and often comes in pairs of homologs that defy easy distinction. Nevertheless, much progress has been made via genetic and epigenetic analyses, including DNA-DNA proximity ligation assays, such as Hi-C and other Chromosome Conformation Capture technologies (1–8), as well as three-dimensional (3D) modeling (reviewed by (9, 10)). Recently, researchers have walked along chromosomes using widefield microscopy and fluorescent in situ hybridization (FISH) probes in rounds of hybridization (11), wherein each round targeted the central portion of a Hi-C defined self-interacting contact domain, also known as a topologically associating domain, or TAD (3–7). This strategy revealed the overall path of chromosomes, which was then used as a guideline for Hi-C contact-based modeling (bioRxiv, https://doi.org/10.1101/318493 May 2018). Other studies, using super-resolution microscopy, have explored a multitude of genomic features (12–18), some lending support for a physical basis for contact domains (15, 18). What remains relatively unexplored, is the physical nature of compartmental intervals, that is, the contiguous chromosomal regions whose loci share long-range contact patterns and collectively, generate the plaid “compartment” pattern often seen extending above and below the Hi-C diagonal (3).
Here, we present a method for visualizing chromosomal regions at super-resolution and then integrating the images with Hi-C interaction matrices via a data-driven computational protocol to produce 3D models of chromosome organization at the level of the single cell. We begin by describing our strategy for walking along 8.16 megabases (Mbs) of human chromosome 19 in primary skin fibroblasts of donor PGP1 (XY, PGP1f; Methods), using oligonucleotide-based Oligopaint FISH probes (19, 20) and imaging in sequential steps using the single-molecule localization microscopy methods of OligoSTORM and OligoDNA-PAINT (12, 20). The resulting images successfully resolved the structure of individual compartmental intervals, showing further that it can vary from cell to cell. We then describe a strategy for integrating images of the genome with Hi-C contact frequency data to produce 3D models at 10 kilobase (kb) resolution. As the PGP1 genome has been haplotyped and its pedigree is known, (Fig. 1A, B; (21); Methods) we also describe a strategy whereby the maternal and paternal homologs can be distinguished.
We considered two approaches for walking along the chromosome – one in which the sizes of the steps are in accordance with genomic features and another in which step sizes are uniform – initiating our studies with the first. Thus, we designed Oligopaint probes corresponding to the Hi-C defined features of compartments as well as smaller features, including contact domains, loops, and genes. Because a Hi-C map of PGP1f cells was not available when we initiated our study, we designed our walk based on the Hi-C data of IMR-90 lung fibroblasts (Fig. S1; (8)) Methods). Focusing on an 8.16 Mb region extending from chr19:7,400,000 to chr19:15,560,000, we then generated Oligopaint probes to image the region in 9 segments ranging in size from 360 kb to 1.8 Mb, including probes that would permit visualization of features as small as 10 kb (Table S1; also see Bintu, et al.).
Similar to fluorophore- (dye-) conjugated oligonucleotides (oligos) pioneered over two decades ago (22) as well as other iterations (Methods), Oligopaint probes consist of computationally designed oligos (Fig. 1C; (19)). In addition, they are single-stranded and carry a short region (~32-42 nucleotides, nts) of genomic homology as well as nongenomic sequences, called Mainstreet and Backstreet, at their 5’ and 3’ ends, respectively. Streets permit Oligopaints to be amplified, multiplexed, barcoded, and indirectly visualized via labeled secondary oligos (11, 12, 14, 19, 20, 23–26). Here, we use Streets to enable OligoSTORM and OligoDNA-PAINT (12, 14, 20). OligoSTORM melds Oligopaints with STORM (Stochastic Optical Reconstruction Microscopy; (27)) or dSTORM (direct STORM; (28)) and generates single events of fluorescence (localizations) via single dyes or dye-pairs. As for Oligo-DNA-PAINT, it combines Oligopaints with DNA-PAINT (DNA-points accumulation for imaging in nanoscale topography; (29, 30)) and generates localizations via transient hybridizations of short (~9-10 nts) dye-labeled oligos, called imager strands, to complementary sequences, called DNA docking sites, located on the Streets; here the transient binding of imager strands is interpreted as localizations.
Consisting of 45,407 species of oligos, our Oligopaint library incorporated three key features (Fig. 1C). First, all the oligos avoid sequences that differ between the parental genomes of PGP1f and, thus, they bind both homologs of chromosome 19. Second, both streets of each oligo carry 20-nt long barcodes corresponding to sub-regions of our 8.16 Mb walk; these barcodes confer multiple functionalities on the oligo. For example, when enabling OligoSTORM, these barcodes co-localize “bridge” oligos that recruit the labeled secondary oligos to be visualized by OligoSTORM. When enabling OligoDNA-PAINT, the bridges include a docking site for the imager strand. The barcodes also permit any targeted genomic region to be imaged twice, once via Mainstreet and then, again, via Backstreet. Finally, each oligo carries a 7-nt sequence (Fig. 1C, asterisk) that facilitates sequential rounds of imaging; in conjunction with a flanking barcode, these 7 nts generate a region which, when bound by a 27-nt “toehold” oligo (31–33), dislodge any previously bound bridge.
Our workflow (Fig. 1D; Methods) begins with T7 RNA polymerase-mediated amplification of the entirety of an Oligopaint library that is to be used for a walk. The totality of the resulting single-stranded oligos (11) is then hybridized to fixed cells in a single round of denaturation and hybridization. When implementing OligoSTORM, the sample is then placed on a 3D single-molecule localization microscope (Bruker Vutara 352) and scanned to identify nuclei to be imaged. Sequential rounds of imaging are then conducted with an automated microfluidic system (Fig. 1E), each round targeting one step of the walk by hybridizing a specific bridge oligo (Fig. 1E, labeled with ‘b’) along with its dye-conjugated oligos, after which images are acquired one nucleus at a time until all nuclei have been imaged; the next round is initiated through the use of toehold oligos (Fig. 1E, labeled with ‘t’), which dislodge bridges bound in the previous round. To traverse the depth of PGP1f nuclei, imaging is conducted in 100 nm increments along the axial (Z) dimension for up to 4 μm, well within the volumetric capability of the Vutara. This protocol has given an average localization precision of 11.14 +/− 1.47 nm and 11.02 +/− 1.22 nm along the lateral axes and 46.96 +/− 6.02 nm along the axial dimension, corresponding to average supported optical resolutions of 26 ± 3 nm and 112 ± 11 nm, respectively. All localizations are then subjected to drift correction with respect to fiducial beads, after which density-based spatial clustering of applications with noise (DBSCAN; (34)) extracts clusters of localizations most likely to represent genomic regions (Fig. 1E; Methods).
Using our workflow, we have walked the entirety of the 8.16 Mb region in over 60 chromosomes, separately imaging all 9 chromosomal segments via Mainstreet (CS1-9; Fig. 2A, B; Movie S1; Table S1). These studies also demonstrated the effectiveness of using Backstreet in conjunction with Mainstreet to image smaller features lying within larger regions, all in one imaging run (Table S1); for example, in a single 21-round imaging run that imaged CS1-9 using Mainstreet (Fig. 2A, rounds 1-9), we used Backstreet and to image four subregions comprising CS7 (Fig. 2A, rounds 10-13; Fig. S2), a loop, including its anchors and flanking regions in CS6 (Fig. 2C, rounds 14-18), and the Dnmt1 gene in CS3 (Fig. 2D, round 19). As a further demonstration of the capacity of barcodes, we have taken advantage of the propensity of chromosomes to lie within their own territories (35) and traced several chromosomes in parallel. For example, we have traced the 8.16 Mb region of chromosomes 19 while also walking along chromosomes 3 and 5 in uniform step sizes of 500 and 250 kb, respectively (Fig. 2E; Fig. S3 for all 23 steps; Table S2; Methods). We have also used our library to image CS7, 8, and 9 using OligoDNA-PAINT (Fig. 2F; Table S3), achieving an exciting supported resolution of 18 nm (Methods; (36)). Finally, to assess the reproducibility of our images, we have conducted a number of assessments using OligoSTORM (Methods). In one, we imaged CS1 using Mainstreet and then, imaged it again, 90 hours later using Backstreet, achieving an average spatial overlap of 80 ± 9% (n = 20; Methods) with minimal change in the average number of localizations (n=4,8371 for round 1 and 4,912 for round 21; Fig. 2G, S4). Comparable results were obtained (75% - 85%) when we imaged CS7 via Mainstreet and then reimaged it using barcodes on Backstreet to sequentially visualize four subregions comprising CS7 (Fig. 2H).
Intriguingly, we noticed early on that a number of consecutive steps of our walk appeared minimally entangled, suggesting that they correspond to distinct physical entities. To explore this, we further analyzed 38 OligoSTORM walks of CS1-9; these 38 represented the 19 nuclei with the highest image quality and for which we observed exactly two foci of signal per chromosomal segment and no evidence of aneuploidy (Methods). These nuclei were analyzed by convolving them with a Gaussian kernel (37), which enabled us to represent each chromosomal segment of each chromosome as a distribution of localization densities in 3D space (Methods). To assess entanglement of chromosomal segments, we used our density maps and calculated centers of mass, distances between centers of mass (Distance score, DS) and overlap (Entanglement score, ES; Fig. S5; Methods). The results revealed that chromosomal segments are not randomly placed with respect to each other. Although there was considerable variation in DS (Fig. S5B), the observed DS distributions and their mean distance scores (MDS) showed that the five central segments, CS3-7, were closer than expected in all pairwise combinations (Fig. 3A) except one (CS4 and CS7), which nevertheless trended in the same direction (Fig. S5B). Overall, these central segments were also distanced from the segments lying at the beginning (CS1 and CS2) and end (CS8 and CS9) of the walk. Importantly, even with only 38 walks, these differences from expectation were statistically significant for most pairwise comparisons (Fig. S5B). Moreover, segments at the edges of the central region were further than expected from segments just beyond the central region; CS3 was further from CS2, and CS7 was further from CS8 than expected. Interestingly, the beginning (CS1 and CS2) and end (CS8 and CS9) of the walk trended towards being closer than expected. Altogether, these observations suggested the center of the walk to be spatially separated from the ends, with the two ends being closer in space than expected.
Our analyses of entanglement also revealed a nonrandom pattern (Fig. 3B, S5C, D; Methods (37)). Although, again, there was considerable variation (Fig. S5D), the distributions of ES and their mean entanglement scores (MES) revealed that the central segments, CS3-7, were more entangled than expected or trended as such, while the segments lying at the boundaries of this region (CS3 and CS7) entangled less than expected with the segments lying upstream and downstream, respectively (Figs. 3B, S5D). The contiguous pairs of segments lying at the ends of the walk were also minimally entangled; CS1 and CS2 showed insignificant entanglement, while CS8 and CS9 resulted in the lowest of all the MES scores. Thus, it is especially noteworthy that CS1 and CS2, lying at one end of the walk, nevertheless showed significant entanglement with CS9 and CS8, respectively, at the other end (Figs. 3B, S5D). These observations aligned with our analyses of distance, suggesting a 3D clustering of the 9 chromosomal segments into two groups, one consisting of the central segments (CS3-7) and the other consisting of the ends of the walk (CS1, 2, 8, and 9). This was confirmed by unsupervised clustering of the MDS and MES matrices (Fig. S5E, F).
To further characterize the clustering, we calculated the surface area, volume, and sphericity for each chromosomal segment, corrected for genomic size (Fig. S6; Methods) and then, combining these features with the 8 measures of distance (DS) and 8 measures of entanglement (ES), conducted unsupervised clustering via Principal Component Analyses (PCA) for all 342 imaged chromosomal segments (38 chromosomes x 9 chromosomal segments) (Methods; Fig. S7). Two major clusters emerged, with the first two principal components accounting for 24.2% and 14.0% of the variability (Fig. 3C, S7A); the first cluster comprised 73.8% of all the images of CS3-7, while the second cluster comprised 75.0% of all the images of CS1, 2, 8 and 9. Considered together, the segments of cluster 1 had larger area and volume and were less spherical than the aggregate of the segments in cluster 2 (Fig. 3D-F). Interestingly, the segments that were primarily in cluster 1 harbored epigenetic markings of active chromatin, while those primarily in cluster 2 harbored markings of inactive chromatin (Fig. 3G; Methods). These findings align with the differing degrees of chromatin compactness that have been observed for different epigenetic states in Drosophila (14).
Our analyses also highlighted variability in the classifications for all chromosomal segments (Table S4). For example, while CS3-7 fell primarily in cluster 1 and CS2 and CS8 fell primarily in cluster 2, CS1 and CS9 resulted in much more mixed classifications, with ratios of cluster 1:cluster 2 being 42:58 and 37:63, respectively (Fig. 3C, bar graph). CS1 is particularly noteworthy in this regard; the distributions of PC1 values revealed that all segments were skewed towards one cluster type with the exception of CS1 (Fig. S7B-J), which was also the most mixed in terms of intra-nuclear and inter-nuclear variation (Fig. S7K). In brief, our data suggest that, while chromosomal segments can be broadly classified by their structural properties into categories reminiscent of active and/or inactive chromatin, they are also variable and dynamic in their character.
In sum, OligoSTORM derived density maps suggested a number of organizational traits for the imaged region (Fig. 3H). First, CS3-7, can form an internally entangled entity. Second, this region can be distanced from, as well as less entangled with, its flanking segments, suggesting the potential of distance, per se, to be an organizational principle. Third, the entire region can fold back on itself such that its ends come into proximity and entangle to a modest degree (see example in Fig. 2B). Fourth, CS1, CS2, CS8, and CS9 may each correspond to an individual entity, suggesting that the imaged region may be composed of at least 5 structurally distinct sections. Fifth, the two clusters of segments revealed by PCA may correspond to spatially separated compartments, active (A-type) and relatively inactive (B-type), reminiscent of those observed by Hi-C (3). Finally, while some chromosomal segments were predominantly active or inactive, others could be classified as in either state or mixed.
How well might our images correlate with population-based data from DNA-DNA proximity ligation based technologies? To address this, we generated an in situ Hi-C map for PGP1f cells (Fig. 4; Table S5; Methods) and assigned loci in the target region to A and B compartments based on the resulting map. The annotation assigned CS1, CS3-7, and CS9 to the A compartment, whereas CS8 and most of CS2 were assigned to the B compartment (Fig. 4, S8). Notably, the fact that CS3-7 fell in a single compartment meant that all loci in this extended interval exhibited enhanced contact frequencies with one another. These findings recapitulate several features of the imaging data: [i] five physical entities (CS1, CS2, CS3-7, CS8, and CS9) with distinctive chromatin states; [ii] a single, extended structure spanning CS3-7; [iii] elevated contact frequency between the endpoints of the target region; and [iv] strong correspondence between the chromosomal segments assigned to the A compartment (CS1, CS3-7, and CS9) and those that were frequently assigned to cluster 1 (CS1and CS3-7). This strong correspondence between the outcomes of two very different technologies, OligoSTORM and Hi-C, lends confidence to our image-based measurements of distance, entanglement, size, volume, sphericity, and structural variability as well as to the overall potential of OligoSTORM to elucidate the organization of large swaths of, if not entire, chromosomes. Moreover, our images specifically argue that compartmental intervals have the capacity to form distinct structures.
Next, we developed an integrative protocol for combining different modalities of data to model the genome at unprecedented genomic resolution. This integrative modeling of genomic regions (IMGR) was used here to integrate OligoSTORM and Hi-C data, producing, for the first time, a 3D model of two homologous regions in a single diploid nucleus (Figs. 3I, S9A; Movie S2; see Methods and Supplementary notes for rationale of IMGR, including rigid and flexible fittings and protocol to select models most likely to represent the image in question.). Population-based 3D modeling of Hi-C data has been used before to recapitulate diploid genomes (38), and 3D modeling of single-cell Hi-C data has been implemented for haploid reconstruction (39, 40). However, modeling homologous chromosomes has proven challenging due to structural variability between the two chromosome copies (39). In this study, we addressed diploidy through integrative modeling. We used, as input, an ensemble of Hi-C derived 3D models and the corresponding single-nuclei OligoSTORM density maps for each of the 9 chromosomal segments we had imaged; the models had been built as previously described (41, 42), using 10 kb-resolution PGP1f Hi-C maps. Then, via a protocol inspired by the fitting of proteins to cryoEM density maps (43), we implemented a two-step integrative process to fit each of the ensemble 3D models into the density map of the corresponding chromosomal segment (Fig. S9A; Methods). Thus, this modeling protocol achieved a resolution of 10 kb for a genomic region that had been imaged in step sizes ranging from 0.36 to 1.8 Mbs in size.
Models of the genomic resolution we have produced can yield insights into the 3D organization of specific loci in the modeled region. For example, chromosome 19 is extraordinarily rich in zinc-finger genes, which are clustered in 6 locations (44), and Figure S9B models the arrangement of the two clusters located in the region we imaged; one lies at one end of CS2 and another extends across CS4. Similarly, all genes in the imaged region that are classified by OMIM as disease-causing (45) can now be positioned in 3D (Fig. S9C). The models can also be used to map ChIP-seq data in 3D space (46). For instance, it appears that one end of each copy of the region is decorated by a large patch of H3K4me1, H3K27ac, and H3K4me3 (Fig. S9D-F). These patches suggest enhancer-promoter clusters and, consistent with this, RNA-seq and DNAse accessibility data indicate that the patches co-localized with foci of expressed genes and greater accessibility (Fig. S9G, H). In contrast, ChIP-seq data for facultative (H3K27me3) and constitutive (H3K9me3) heterochromatin suggest multi-foci patches of heterochromatin throughout (Fig. S9I, J).
Finally, we have explored the potential of our approach to address questions regarding the relationship between homologous chromosomal regions. Indeed, although not identified by haplotype, allelic differences in accessibility and volume have been noted at the level of individual genes (47). Will this difference be maintained across large chromosomal regions and, if so, will they reflect parental origin? Here, we assessed the difference between homologous regions by first assigning an ellipticity score to each 8.16 Mb region of both homologs for all 19 nuclei and then, for each nucleus, taking the ratio of the larger to the smaller score to obtain an ellipticity ratio (Methods). Intriguingly, the median ellipticity ratio was 2.5 (±0.54), which was significantly greater than for randomly selected pairs of homologous regions (1.3 ± 0.48; Fig. 5A, S10A, Table S6), suggesting that homologous regions of a single nucleus may differ nonrandomly in ellipticity and that the difference is, on average, greater between homologous regions within a nucleus than between nuclei.
To ascertain whether greater ellipticity might correlate with parental origin, we turned to the most recent of our 38 walks as, for these, we had been successful in assigning parental origin. Here, we had implemented homolog-specific Oligopaints (HOPs; (12)), which target single nucleotide variants (SNVs) that differ between the haplotypes of a genome and, thus, permit distinction of homologous genomic regions. While proven successful in Drosophila and mice with abundant SNVs (~2-5 SNVs per kb; (12)), their applicability to humans, which have a lower frequency of SNVs (~0.77 SNVs per kb for PGP1), was unknown. Thus, we generated two libraries, HOP-M and HOP-P, the former targeting 11,259 maternal-specific SNVs extending across the entirety of chromosome 19, and the latter targeting the analogous 11,259 paternal-specific SNVs (Fig. 5B; Methods). Excitingly, HOP-M and HOP-P proved successful in differentially labeling the homologs (Fig. 5C-G, S10B, C). Thus, having applied HOPs in our most recent imaging runs, we were able analyze 12 walks (6 nuclei) for which parental origin was determined and conclude that ellipticity was not absolutely correlated with parental origin (Fig. 5H). Interestingly, MDS and MES matrices suggest differences between maternal and paternal homologs, with the potential of CS3-7 being more spatially compartmentalized in maternal as versus paternal homologs (Fig. S10D-I); of note, maternal and paternal homologs produced matrices reminiscent of those of the entire set of 38 chromosomes, arguing that the homologs we distinguished by HOPs approximate those of the larger population (Figs. 3A, B, S10F, I). While these studies must be greatly expanded to substantiate any homolog-specific trend, our experience indicates that OligoSTORM should be useful for exploring the biology of homology.
In conclusion, we have described strategies for in situ super-resolution genome visualization through sequential rounds of imaging, followed by integration of the resulting images with Hi-C data to produce 3D models at 10 kb resolution in single nuclei. In conjunction with other efforts (e.g., refs (11–18)), these strategies should advance our understanding of genome organization and, when enhanced with HOPs, the impact of parental origin. Indeed, here we have observed patterns of chromosome organization that were otherwise unknown. In line with other studies, including a live-cell analysis correlating chromatin accessibility more with fluctuation than compaction (bioRxiv, https://www.biorxiv.org/content/early/2018/07/09/365700 July 2018), we have also observed significant structural variability and speculate, here, as to the potential of that variability. Such variability may reflect genomic function or may, itself, be a signature that distinguishes one genomic region, or chromatin state, from another. For example, while regulatory factors may recognize their binding regions by the conformation of their targets, it may be that some also, or solely, respond to the variations of that conformation, the number of conformations, and/or the speed with which their targets transition through conformations. If propagated along chromosomes or of sufficient magnitude to generate a turbulence that can be propagated through the nuclear milieu, dynamic variability of chromosomal regions may even function to convey information across nuclear distances, perhaps even perturb or promote phase separations (48, 49). Finally, we note that, as our strategies are applicable to entire chromosomes, it should be possible to implement them in studies of chromosomes as single, fully integrated units of structure and function, consistent with the unit of inheritance being as much the chromosome as it is the gene.
Funding
This work was supported by funds from Ministerio de Ciencia, Innovación y Universidades of Spain (IJCI-2015-23352) to I.F., Damon Runyon Cancer Research Foundation and Howard Hughes Medical Institute to B.J.B., Uehara Memorial Foundation Research to H.M.S., William Randolph Hearst Foundation to R.B.M., EMBO Long-Term fellowship to J.E, NSF (Center for Theoretical Biological Physics, Rice University) to M.D.P and J.N.O, NSF (CCF-1054898, CCF-1317291), NIH (1R01EB018659-01, 1-U01-MH106011-01), and Office of Naval Research (N00014-13-1-0593, N00014-14-1-0610, N00014-16-1-2182, N00014-16-1-2410) to P.Y., NIH (1DP2OD008540, U01HL130010, UM1HG009375, 4DP2OD008540), NSF (PHY-1427654), USDA (2017-05741), Welch Foundation (Q-1866), NVIDIA, IBM, Google, Cancer Prevention Research Institute of Texas (R1304), and McNair Medical Institute to E.L.A., Horizon 2020 Research and Innovation Programme (676556), European Research Council (609989), Ministerio de Ciencia, Innovación y Universidades of Spain (BFU2017-85926-P), CERCA, and AGAUR Programme of the Generalitat de Catalunya and Centros de Excelencia Severo Ochoa (SEV-2012-0208) to M.A.M-R., and NIH (5DP1GM106412, R01HD091797, R01GM123289) to C.-t.W.
Author contributions
G.N. conceived the project, led the design, performed experiments and analyses, and wrote the paper. I.F. conceived and performed analyses and integrative modeling and wrote the paper. C.P.E conceived and performed Hi-C experiments and analyses and wrote the paper. C.G.E conceived innovations to STORM imaging and data storage and analysis and wrote the paper. B.J.B, H.M.S., S.H.L., S.C.N, and R.B.M conceived several aspects of the project, performed experiments and analyses, and edited the paper. S.C., M.H., S.R., and N.C.D performed experiments. S.S.P.R, J.Y.K., P.S-V, and M.D.P. analyzed data. J.E., J.A.A, N.M.C.M, and H.Q.N contributed to discussions. M.D.P and J.N.O. contributed to Hi-C discussions. S.Ca and J.Sc conceived the Bruker Vutara image analysis software. J.St, P.Y., E.L.A, M.A.M-R, and C.-t.W participated in all aspects of the study and wrote the paper.
Competing interests
C.-t.W., B.J.B., R.B.M., S.C.N., S.C., and H.Q.N. hold or have patent filings pertaining to Oligopaints and related technologies, including other oligo-based methods for imaging. These technologies may be licensed to ReadCoor, a company in which C.-t.W. holds equity. P.Y. and B.J.B have filed patents related to DNA-PAINT. DNA-PAINT technology has been licensed to Ultivue Inc., a company in which P.Y. is an equity holder and cofounder. P.Y. is also a co-founder of NuProbe Global.
Data and materials availability
All data from the study are either in the manuscript, available upon request, or as follows. Streets and MATLAB code used for the generation and appending of streets can be downloaded at https://github.com/gnir/OligoLego. PGP1f cells are available from the Personal Genome Project (https://www.personalgenomes.org/us) upon acceptance of a standard MTA. Hi-C data will be placed in the Gene Expression Omnibus (GEO), where it will be associated with an accession number. All data pertaining to modeling will be available through http://3DGenomes.org/datasets.
Materials and Methods
Mining for Oligopaint targets
Akin to a growing category of oligo-based FISH probes (e.g., ref. (52–56)), Oligopaint FISH probes consist of oligos that have been computationally designed to target specific sequences of the genome ((19); see the Oligopaints website (https://oligopaints.hms.harvard.edu/) for preselected whole genome datasets of Oligopaint oligo targets for the human, mouse, zebrafish, Arabidopsis, Drosophila, and C. elegans genomes). We identified genomic sequences that would serve as good targets for Oligopaint oligos by applying Oligominer to a repeat-masked human hg19 assembly (57). Specifically, we used the ‘balance’ settings of 35–41 nts of genome homology, 42–47°C for TM, a simulated hybridization temperature of 42°C, and ‘LDA mode’ filtering. Candidate probes were further processed by the ‘kmerFilter’ script that calls the algorithm Jellyfish (58) to eliminate probes containing 18mers that occur in >5 times in hg19.
Mining the genome for homolog-specific Oligopaint (HOP) targets on chromosome 19 of PGP1
As the genome of PGP1 has been fully sequenced and phased (21, 59) (Kun Zhang and Daniel Jacobsen, unpublished; Bing Ren and Anthony Schmitt, unpublished), we were also able to design HOP probes for the homologs of chromosome 19; also see paper by O’Keefe, et al. (1996, ref.(60)). First, genomic sequences that would be good targets for HOPs (12) were discovered using the approach described above, except that the input hg19 sequence was not repeat-masked. These probe sequences were then intersected with the locations of phased heterozygous SNVs for PGP1 according to Lo, et al. (2013, ref.(59)) using the ‘intersectBed’ utility from BEDTools (61) and then processed with in-house software to generate two sets of haplotype-specific HOP probes in which each probe oligo targets ≥1 heterozygous SNV.
In order to ascertain which of the two HOP probes corresponds to maternal variants and which corresponds to paternal variants, we took advantage of the full genome sequence of PGP95 (courtesy of Rigel Chan and Elaine Lim), with whom PGP1 shares his maternal parent but not his paternal parent, and presumed that the homolog of chromosome 19 that carries one (or more) long blocks of variants in common with PGP95 would be the one that was maternally derived (Fig. 1B). First, heterozygous variants that identify the two PGP1 haplotypes were matched to variants from the full genome sequence of PGP95, and any that were homozygous in the PGP95 genome were discarded. All remaining PGP95 variants were then assigned as matching one of the PGP1 haplotypes (arbitrarily designated as H0 or H1) or neither, permitting us to identify blocks of contiguous matches to H0 or H1. The H1 haplotype included significantly more of the longest blocks shared between PGP1 and PGP95, thus identifying the H1 haplotype as maternally derived (Fig. S11). Then, as many long blocks corresponding to H1 were discovered to occupy a segment of PGP1 chromosome 19 at coordinates chr19:48,932,903-59,087,560, we designated the chromosome 19 homolog that carried these blocks the maternal chromosome. Accordingly, we designated the HOP probe corresponding to H1 as HOP-Maternal (HOP-M) and that corresponding to H0 as HOP-Paternal (HOP-P). The probes are efficient. Up to 96.5% of nuclei producing two fluorescent foci when imaged with HOP-M or HOP-P showed one of the signals to be stronger than the other, with the focus labeled more strongly with one probe being the more weakly labeled focus when imaged with the other probe; dye-swap experiments suggested that the difference between ratios for HOP-M and HOP-P is due to dye chemistries (Fig. S10B, C). Importantly, the corresponding ratio of signals obtained with probes targeting only the interstitial regions lying between SNVs approached 1.
Multiplexing of the Oligopaint library and design of streets
We designed our Oligopaint libraries based on Hi-C maps that were available at the time our study was initiated. In particular, Geoffrey Fudenberg used IMR-90 Hi-C contact frequencies from Rao, et al., (2014, ref.(8)) to identify compartmental intervals and contact domains in the 8.16 Mb region extending from chr19:7,400,000 (19p13.2) to chr19:15,560,000 (19p13.12), where the calling of compartmental intervals relied on ICE (62) and a 3-state Gaussian Hidden Markov Model (HMM) implemented in scikit-learn (63), while contact domain calling relied on (64). Having segmented the 8.16 Mb region, we then incorporated Mainstreets and Backstreets into our library such that they would permit individual genomic features to be separately imaged. In particular, streets were generated using an in-house MATLAB (MathWorks, Natick, MA) script, ‘MakingStreets’, available via GitHub (https://github.com/gnir/OligoLego). Street sequences were vetted for predicted performance when serving as primers in PCR reactions using Primer3Plus (65) with default settings, except that the melting temperature was set to be 57-59°C, and the GC content was set to be 40-60% (65). Street sequences were also checked for pairwise orthogonality using NUPACK (66) to estimate equilibrium hybridization yields between candidate streets and other potential target sites in 390 mM Na+ and 50% formamide at room temperature (RT). Toehold sequences were also validated using NUPACK. Candidate streets passing all upstream checks were then examined for orthogonality to the human genome by using bowtie2 (67) to filter sequences aligning at least once to hg38 with ‘—very-sensitive-local’ alignment mode. Mainstreet and Backstreet sequences were appended to Oligopaint oligos using an in-house MATLAB code available via GitHub (https://github.com/gnir/OligoLego).
Multiplexing Oligopaint libraries to walk along multiple chromosomes simultaneously
Here, we designed Mainstreet barcodes so that a) they could be shared across multiple chromosomes, such that one round of imaging could target anywhere from one up to the maximum number of chromosomes to be imaged, and b) imaging different chromosomes would be initiated in different rounds, thus permitting each chromosome to be identified by the round in which it first appears. For example, in an imaging scheme that introduces a new chromosome in every round, one could use the following scheme, wherein the walk along Chromosomes 1, 2, 3, 4, 5, and N are initiated in Rounds 1, 2, 3, 4, 5, and N, respectively.
Oligopaint probe synthesis
Oligopaint libraries were purchased from CustomArray (Bothell, WA) and amplified using in vitro transcription as previously described (11, 14, 20). Briefly, for each sub-pool to be amplified, we first optimized the template and primer concentrations using real-time PCR. We next performed large-scale limited-cycle PCR, and then used the product as template for in vitro transcription by T7 RNA polymerase (NEB, E2040S). The resulting ssRNA was then reverse transcribed (EP0752, ThermoFisher Scientific). RNA was digested by adding a mixture of 0.5 M NaOH and 0.25 M EDTA of equal volume to the reverse transcription reaction at 95° C for 10 minutes. Oligopaint ssDNA oligo probes were purified using the DNA Clean & Concentrator–100 kit (Zymo Research, Irvine, CA) paired with an EZ-Vac Vacuum Manifold (Zymo Research).
Culturing of cells
Human primary skin fibroblasts of donor PGP1 (GM23248, Coriell Institute; (68)) were grown at 37°C + 5% CO2 in serum-supplemented (10% v/v) Dulbecco’s Modified Eagle Medium (DMEM) (serum Gibco 10437; media Gibco 10564). The cells were also supplemented with 1% (v/v) MEM Non-Essential Amino Acids Solution (Gibco 11140050). Penicillin and streptomycin (Gibco 15070) were also added to the cell culture media to final concentrations of 50 units/ml and 50 μg/ml, respectively.
Sample preparation
40 mm coverslips #1.5 (Bioptechs, Butler, PA) were cleaned with a bath sonicator and stored in a 75% (v/v) ethanol solution at room temperature until used. In order to prepare PGP1 samples for imaging, the pre-cleaned coverslips were first allowed to fully dry in a tissue culture hood and then placed in sterile 150 mm tissue culture dishes (Falcon 353025). PGP1 cells were then deposited at ~15 % confluency onto the 150 mm dishes and allowed to grow in a mammalian tissue culture incubator until ~85% confluent (~5 days). At this point, the PGP1 cells were rinsed with 1x PBS, fixed using 4% (w/v) paraformaldehyde in 1x PBS for 10 minutes, rinsed with 1x PBS, permeabilized with 0.5% (v/v) Triton X-100 in 1x PBS for 10 minutes, rinsed with 1x PBS, and then stored in a cold room for up to 3 weeks.
3D DNA FISH
3D DNA FISH (69) was performed on fixed and permeabilized coverslips essentially as previously described (12, 19, 70). Briefly, coverglass samples were placed in small glass-bottom tissue culture dishes (MatTek P50G-1.5-30-F) and incubated with PBST (1x PBS + 0.1% v/v Tween-20) for 2 minutes. Samples were then incubated once with 1N HCL for 5 minutes, twice with 2x SSCT (2x SSC + 0.1% v/v Tween-20) for 1 minute, once with 2x SSCT + 50% (v/v) formamide for 2 minutes, and then once for 20 minutes with a pre-heated solution of 2x SSCT + 50% formamide at 60°C. Coverslips were then air-dried and incubated with a hybridization cocktail consisting of 2x SSCT, 50% formamide, 10% (w/v) dextran sulfate, 0.4 μg/μL of RNase A (ThermoFisher EN0531), and Oligopaint probe whose concentration was adjusted such that the final amount of probe added was equivalent to ~1.4 μM per every 1 Mb targeted (e.g. ~2.8 μM for probe targeting 2 Mb). In cases where the parental origin of the homolog was to be determined, HOPs probes were added at a concentration of 2.85 μM each. The hybridization reaction was then sealed beneath a 22 x 30 mm, #1.5 coverslip (VWR, Randor, PA) using rubber cement (Elmer’s, Westerville, OH), which was allowed to dry at 37°C for 7 minutes. Samples were then denaturated on top of a water-immersed heat at 80°C for 3 minutes. Coverslips were then allowed to hybridize for 2–3 nights at 47°C in a humidified chamber. Following hybridization, coverslips were washed 4 times with 2x SSCT at 60°C for 5 minutes, then twice with 2x SSCT at RT. Coverslips were then rinsed with 1x PBS and allowed to air dry. For the purposes of drift correction, fiducial markers (gold nano-urchins, GNUs; d=90 nm, 630 nm abs max) (Cytodiagnostics, Ontario, Canada) were sonicated for 10 minutes and then diluted 1:3 in PBS, pipetted onto the sample coverslips, and sandwiched with a blank, i.e. nontreated, 40 mm coverslip. GNU-coverslip sandwiches were centrifuged at 500 x g for 3 minutes. The sample coverslips were then washed for 2 minutes with 1x PBS.
HOPs imaging and processing
Images were obtained using the widefield mode on the Bruker Vutara 352 (Bruker Nano Inc.) microscope (described in the section in Methods on OligoSTORM imaging). SlowFade antifade reagent containing DAPI dye (ThermoFIsher Scientific) was used as the imaging buffer. Images were acquired in the Z-dimension, approximating 2-3 um across 4 different channels, with the following dyes: DAPI, Alexa Fluor 647, Alexa Fluor 488, and ATTO 561 with 405, 488, 561, and 639 nm lasers (Coherent), the emission filters (custom-made, Semrock) 465/30, 515/40, 600/50, respectively, and detected via an sCMOS camera (Hamamatsu Orca-flash 4.0 v3) with 100 ms exposure time. These four channels corresponded to nuclear staining, interstitial signals, and the two HOP probes, respectively. Images were analyzed using an inhouse MATLAB (Mathworks) script, which was written to detect and then derive the intensity ratio of the foci in each channel. Detection of signal from foci was validated by eye.
OligoSTORM imaging
OligoSTORM imaging was performed on a Bruker Vutara 352 commercial 3D biplane single molecule localization microscope (71–73), equipped with a 60x water objective (Olympus) with a numerical aperture (NA) of 1.2. For illuminating Oligopaint oligos with Alexa Fluor 647, we used a 639-nm Coherent Genesis laser with ~ 25 kW/cm2 at the back aperture of the objective, and for illuminating with Alexa Fluor 405, we used a 405-nm Coherent Obis laser for photoactivation with up to ~0.05 kW/cm2 at the back aperture (emission filters were described in the Methods section describing HOPs imaging and processing). Fluorescent detection was captured on an sCMOS camera (Hamamatsu Orca-flash 4.0 v2) using a 10 ms exposure time. At each imaging session, we imaged 5-10 nuclei at different X,Y stage locations, while taking up to 4 μm z-scan using 100 nm Z-steps at each location, with up to 21 loci per nucleus, collecting 42,000-150,000 frames for each locus. Image data was collected by a Bruker-provided network attached storage (NAS) system.
OligoDNA-PAINT
DNA-PAINT imaging was performed on a custom optical set-up based on a Nikon Ti Eclipse microscope featuring a Perfect Focus System and a custom-built TIRF illuminator. DNA-PAINT was performed using Highly Inclined and Laminated Optical sheet illumination (HILO; (74)) with 10% of a 1000-mW, 532-nm laser (MPB Communications) using a CFI Apo TIRF 100× oil (N.A. 1.49) objective at an effective power density of ~2 kW/cm2. The 532-nm laser excitation light was passed through a quarter-wave plate (Thorlabs, WPQ05M-532), placed at 45° to the polarization axis and directed to the objective through an excitation filter (Chroma ZET532/10x) via a long-pass dichroic mirror (Chroma ZT532RDC_UF2). Emission light was spectrally filtered (Chroma ET542LP and Chroma ET550 LP), directed into a 4f adaptive optics system containing a deformable mirror (MicAO 3DSR, Imagine Optic) and imaged on an sCMOS camera (Andor Zyla 4.2+) with 6.5-μm pixels using rolling shutter readout at a bandwidth of 200 MHz at 16 bit, resulting in an effective pixel size of 65 nm. In order to estimate the axial positions of single molecule emitters, optical astigmatism was applied using the deformable mirror to cause an asymmetric distortion in the observed point spread functions (75, 76). A total of 22,500–45,000 250-ms frames were acquired for each image by using 0.5–1 nM of Cy3B-labeled 10-mer oligos in 1× PBS solution + 500 mM NaCl, 10 nM PCD, 2.5 mM PCA, and 2 mM Trolox. Gold nanoparticles (40 nm; no. 753637; Sigma-Aldrich) were used as fiducial markers to facilitate drift correction. The lateral positions of single-molecule localization events were first determined using Picasso (77) by applying the least-square fitting algorithm, and the axial positions were then also determined using Picasso by fitting to a pre-established calibration curve (75, 76). Global lateral drift-independent localization precision was estimated by Nearest Neighbor based Analysis (NeNA) (78). Global axial localization precision was estimated by empirical analysis of sub-diffraction sized fiducial markers after fitting to an empirically derived calibration curve with a mean precision of < 20 nm.
OligoDNA-PAINT image reconstruction
Two-dimensional DNA-PAINT images were rendered using Picasso, with individual localization events being represented as single point sources blurred by an isotropic twodimensional Gaussian function whose “sigma” parameter reflects the average X-Y NeNA localization precision of the localizations occurring in the presented field of view and colored according to inferred axial position (78). Multicolor three-dimensional DNA-PAINT images were rendered using ViSP (79), with individual localization events being represented as single point sources blurred by an anisotropic three-dimensional Gaussian function whose X-Y “sigma” reflects the average NeNA localization precision of the localizations occurring in the presented field of view and Z “sigma” was informed by the precision of the calibration curve used for axial fitting as determined by Picasso.
Automated microfluidics
We exploited a fluidics system as previously described (11, 23). Briefly, Bioptech’s FCS2 flow chamber (v ~ 100 μL) was integrated with a peristaltic pump (Gilson minipuls3) and 3 valves (Hamilton HVXM 8-5), with the resulting dead volume in our system being ~700 μL. Integration of the fluidics system and the Bruker Vutara 352 microscope was programmed into SRX, the commercially available microscopy software package that controls the Bruker Vutara 352.
Sequential hybridization for multiple rounds of OligoSTORM
Secondary hybridizations were performed using 50 μM of secondary probe in 2x SSC +30% (v/v) formamide, and allowed to hybridize for 30 minutes at RT. Following hybridization, a wash solution (40% v/v formamide + 2x SSC) was introduced to the flow cell and incubated for 3 minutes without flow. Wash solution was then replaced with imaging buffer consisting of 10% (w/v) glucose, 2x SSC, 50 mM Tris, 1% (v/v) β-mercaptoethanol, and 2% (v/v) of a GLOX stock solution consisting of 75 mg/mL glucose oxidase (Sigma G2133-250KU), 7.5 mg/mL catalase (Sigma C40-500MG), 30 mM Tris, and 30 mM NaCl. A thin layer of mineral oil (Sigma M5904) was added at the top to prevent oxygen from penetrating the imaging buffer. Imaging was initiated after the imaging buffer was allowed to incubate for 2 minutes without flow.
OligoSTORM data processing
During localization analysis, the precision of each localization event was determined by calculating the Cramér-Rao Lower Bound, namely the inverse of the Fisher information of the measured point-spread function (80, 81). The Cramér-Rao Lower Bound is the lower bound of the variance of the estimation process, which is used to calculate the localization precision of each event.
For isolating localizations comprising structures of interest during each round of imaging (fluidic cycle), we first removed localizations of lesser quality according to the following criteria: initial filtering of the localization data consisted of applying a geometric mean goodness-of-fit filter calculated from three separate metrics from the localization data. The individual metrics consisted of calculating the ratio of the photons assigned to the Point Spread Function (PSF) in the localization routine to the total number of photons in the camera cutout around a localization event (first metric), total photons in the cutout around a localization event (second metric), and the offset in Z between the calculated localization position of the fluorophore and the physical position of the objective piezo (third metric). The goodness-of-fit metric of each localization event is calculated by sorting each metric through a self-normalization process (with a value of 0 and 1 being the lowest and highest, respectively, quality value to a given metric), multiplying the values of the three metrics together, and taking the cube-root. Localizations with a score closer to 1 are considered of higher quality as compared to localizations with a score closer to 0. An initial thresholding was set such that only localizations with a rank of 0.8 or higher were accepted for statistical analysis. Also, candidate localization events whose peak intensity varied no more than 2 pixels (pixel size ~100 nm) in up to 7 frames in a row were considered as one. Finally, we filtered out all localizations with an axial precision worse than 100 nm. We found that the number of localizations increases with genomic size (Fig. S12). Following localization filtering, we performed drift correction using a center-of-mass function between subsets of fiducial markers across the data recording (82). Finally, clusters of localizations were segmented using DBSCAN (34).
When determining the spatial overlap between two clusters, we used Bruker’s SRX software to generate surfaces of the calculated clusters based upon alpha shapes, using an alpha radius of 150 nm, which reflects the maximum particle distance used for the DBSCAN analysis for CS1, CS7, and the merged four contact domains that were imaged in rounds 10-13 (Fig. 2A, H). Average spatial overlap between two clusters was expressed as the average of the fraction of the first cluster overlapping the second and the fraction of the second cluster overlapping the first.
For comparison, we determined the spatial overlap of a cluster that was randomly divided into two clusters. In particular, we chose two nuclei as an example and divided the localizations obtained in rounds 1 and 21 of imaging into two, generating 8 clusters per diploid nucleus. We noticed that for the nucleus with the higher number of localizations (~6,500 localizations/cluster) the average spatial overlap was 90-92%, while for the nucleus with lower number of localizations (~3,380 localizations/cluster) the average spatial overlap was 85-89%. This implies that the spatial overlap between rounds is not expected to be more than 90% and is dependent upon the number of localizations.
Selection of nuclei for analysis
For OligoSTORM analysis, we chose only diploid nuclei for which the homologs were not too close to be distinguished with a single dye (as we used only Alexa647N for OligoSTORM), did not show high levels of anisotropy in Z, and also did not show evidence of aneuploidy. Note, PGP1 cells were confirmed to be karyotypically normal (XY; n=20; Cell Line Genetics, Madison, WI), and cell sorting experiments show that 72-86% of PGP1f cells are in G1.
Generating OligoSTORM density maps
DBSCAN-extracted localizations for each chromosomal segment were convolved with a Gaussian kernel to match the mean precision (37) and to reduce the anisotropy of the image acquisition. The density map of each homolog compartmental segment was represented by intensities at points i (ρi) on a cubic grid with fixed voxel size. For each localization, the density value of the nearest voxel was increased by a factor of 1. The obtained grid was convoluted with a Gaussian function using TEMPy (37), such that the density ρi was defined as: where x, y and z, and xn, yn, zn, are the Cartesian coordinates of grid point i, and localization n respectively. N is the total number of localizations, z is the number of localizations per grid point, and σ is set to be proportional to the mean precision. The iso-contour threshold for each map was set to be one sigma after shifting of the background peak to zero using TEMPy (37). The density map of the entire walk was also generated using the same approach.
Analysis of OligoSTORM density maps
A total of five structural measures were obtained from each OligoSTORM density map obtained for each chromosomal segment. First, the distance score (DS) between two chromosomal segments was calculated as the spatial distance between the center of mass of each OligoSTORM density map. Next, the DS was corrected for linear genomic distance using a power-law fit. Finally, the DS Z-score was calculated for each data point using the power-law function as expected value. Second, the entanglement score (ES) between two chromosomal segments was calculated as the fraction of overlapping voxels within the optimal iso-contour threshold with respect to the smaller of the two density maps as implemented in TEMPy (37, 83). As before, the ES was corrected for linear genomic distance using a power-law fit obtained by the ES Z-score. Third, the surface area of each OligoSTORM density map was calculated by summing over the area of the triangles on the surface on the convex hull using the “measure area” option in Chimera (50). Next, the surface area was corrected for genomic size (Fig. S6A) and transformed into a surface area Z-score. Fourth, volume was calculated within the optimal iso-contour threshold using the “measure volume” option in Chimera (50). As before, the volume was corrected for genomic size (Fig. S6B) and transformed into a volume Z-score. Fifth, a sphericity score (Ψ), which measures how closely the shape of an OligoSTORM density map approaches that of a mathematically perfect sphere, was calculated as: where VP and AP are the volume and surface area of the OligoSTORM density Map P. To take into account the bias due to genomic size, sphericity was corrected for genomic size (Fig. S6C) and transformed into a sphericity Z-score.
For each homolog, a feature vector was created, which is a binary 1×9 vector encoding the cluster types of each chromosomal segment in the region. Next, a per-nucleus profile of the compartment state was made for comparing the feature vectors of each of the homologs in a single nucleus. The compartment state profile is represented by a 1584:3429 vector whose values are determined by a comparison score set as: (i) 1, both compartment segments belong to cluster 1, (ii) 0, the compartment segments belong to different clusters, or (iii) −1, both compartment segments belong to cluster 2. The resulting matrix of 19 compartment state profiles was hierarchically clustered, resulting in five clusters.
The resulting matrix of structural features for the 342 chromosomal segments (that is, 9 segments in 2 homologs in 19 cells) was analyzed using the PCA implemented in the Python library Scikit-learn. The clustering resulted best in two major clusters that were named cluster 1 and cluster 2 (Fig. 3C, S7A).
ENCODE PGP1 data
Data available for RNA-seq, DNase-seq and five chromatin marks (H3K4me1, H3K27ac, H3K4me3, H3K27me3, H3K9me3) were downloaded from the ENCODE project (46). Mapped reads were obtained from samples ENCFF043JNP and ENCFF280JPQ for H3K4me1, ENCFF612WGP and ENCFF345SWP for H3K27ac, ENCFF397XVM and ENCFF179TEP for H3K4me3, ENCFF378EQH for RNA-seq, ENCFF523JNF and ENCFF050OVJ for DNase-seq, ENCFF800CCA and ENCFF876WHO for H3K27me3, and ENCFF644QGC for H3K9me3 (https://www.encodeproject.org). Next, coverage of the chr19:7 (19p13.2) to chr19:15.5 (19p13.12) visualized region was calculated using the SamTools bedcov command (84) at 10 kb resolution. All datasets, with the exception of RNA-seq and H3K9me3 ChIP-seq, had two replicates which were highly correlated (r > 0.99) and, in such cases, the average coverage per each bin was used. The final number of mapped reads for each dataset was 60,550,146 for H3K4m1, 52,286,921 for H3K27ac, 62,084,095 for H3K4m3, 200,776,708 for RNA-seq, 622,389,260 for DNAse, and 65,357,125 for H3K27me3, and 26,947,714 for H3K9me3.
Homolog ellipticity
The Cartesian coordinates of the location of each homolog in the cell population were fitted to an ellipsoid (85), and the ellipticity score for any 8.16 Mb region was then calculated to be the ratio between the two largest principal axes of the ellipsoid. When comparing homologous chromosomal regions in a single nucleus, we determined the ratio between the more elliptical homolog and the less elliptical one, calling that measure the ellipticity ratio. An ellipticity ratio equal to 1 indicates that the two homologs of a single nucleus had the same ellipticity score. For the evaluation of significance, a random ensemble of elongation ratios was constructed by calculating the ratio between two homologs randomly selected from the analyzed cell population.
Modeling of genomes
Integrative 3D modeling with TADbit
The chromosome 19 normalized Hi-C interaction matrices of each chromosomal segment and the entire region (chr19: 7400000:15560000) were used for modeling at a resolution of 10 kb as previously described (41, 42). The selected resolution was the highest possible with an MMP score predicted to result in reliable 3D models ((86) and Table S7). For each chromosomal segment, two independent ensembles of models were generated: (i) Isolated chromosomal segments and (ii) the chromosomal segments extracted from models of the entire region. Briefly, TADbit automatically generates 3D models (42) using a restraint-based modeling approach, where the experimental frequencies of interaction are transformed into a set of spatial restraints (41). For each chromosomal segment, a total of 5,000 models were generated, with only the best 1,000 models (that is, those that best satisfy the input restraints) used for further analysis. For each of the modeled chromosomal segments, the contact map obtained from the final 1,000 models resulted in a good correlation with the input Hi-C interaction matrix (Table S7), which is indicative of good model accuracy (86).
Rigid fitting
The resulting 3D models for each chromosomal segment ensemble were rigidly fitted in the corresponding OligoSTORM density maps using the ‘Fit_in_map’ global rigid-body fitting protocol in Chimera (50). A total of 100 random placements of each chromosomal segment were generated by randomly rotating and translating the fit within the reference OligoSTORM density map, followed by a step of local optimization using steepest ascent. Next, two scores were computed for each chromosomal segment fit: (i) the cross-correlation coefficient (CCC) between the OligoSTORM density map and the chromosomal segment fit and (ii) the connectivity score (ConS) of the chromosomal segment fit with that of its sequentially neighboring chromosomal segment. The CCC score is expressed by the following formula (37, 87): where M is all the voxels in the density grid of the map; P and T are the chromosomal segment fit map and OligoSTORM density map, respectively; and are the intensity of the density maps at voxel i; and and are the respective mean intensity values. The CCC ranges from 0 (no match) to 1 (perfect match). The ConS, which ranges from 0 to 1 was defined as: where dP,COM is the distance between the starting or end bin of the fit with the center of mass of the surface interface with its neighboring chromosomal segment, and max(dP,COM) is the maximal dP,COM in the ensemble of fits. The surface interface is defined in a multistep process: (i) identification of the surface area of the chromosomal segment in analysis that lies within a cut-off distance of 50 nm from its neighbor chromosomal segment and calculated by the “measure contactArea” option in Chimera (50), and (ii) selection of the density points of the chromosomal segment density map that are bounded by the identified surface area by using the “mask” command in Chimera (50). Finally, a combined score (Fig. S13) integrating both measures was used for calculating the goodness of fit:
For each chromosomal segment, the top-scoring fits (95th percentile) were chosen for further refinement.
Flexible refinement
To further improve the agreement between the fit and the OligoSTORM density maps, a flexible fitting method was implemented similarly to the protocols for conformational refinement of protein guided by cryoEM density (43). The protocol is a heuristic optimization that relies on simulated annealing molecular dynamics applied to a series of subdivisions of the fit into progressively smaller rigid bodies (bl). A rigid bod bly could be any set of particles in the fit (including loops, a sub-domain, a contact domain, compartment segment or larger region). A set of rigid bodies (B) was defined for each fit using TADbit’s TAD caller and selecting borders with strength higher than 5. Finally, the exact rigid bodies were manually adjusted by visual inspection of the rigid fitted 3D starting model (Table S8). During the refinement, the coordinates of each bl ∈ B were displaced in the direction that maximized their cross-correlation with the OligoSTORM density map and avoided the violation of the imposed restraints from the population Hi-C interaction matrix. The final scoring function was defined as: where ECCC(P) quantified the fit of the probe map (map generated from the 3D fitted model) (ρP) and the target OligoSTORM density map (ρT), and is defined as the negative sum of crosscorrelation coefficients between the OligoSTORM density map and the rigid bodies bl; EH_LB(P) quantified the lower-bound harmonic restraints which prevents two particles from getting closer than a given equilibrium distance; EH_UB(P) quantified the upper-bound harmonic restraints, which forces two particles to be closer than a given equilibrium distance; EEV(P) is an hard sphere excluded volume term (lower bound harmonic oscillator dropping at zero at 2r); and EC(P) is a harmonic bond that connects adjacent particles at a given equilibrium distance. The weights W1, W2, and W3 determine the relative importance of the corresponding terms and were set as default, which proved optimal for fitting of sub-tomogram averaging (88). The optimization protocol to identify the best fit consisted of simulated annealing rigid-body molecular dynamics followed by a final a conjugate-gradient minimization. A maximum of 50 cycles of simulated annealing molecular dynamics steps were performed, in each cycle the system was heated from 0K to 1000K for a maximum of 100 steps and cooled back to 0K for a maximum of 200 steps. At each step, the optimization was terminated if the change in CCC was < 0.001. Finally, a minimization step was performed comprising 200 conjugate-gradients steps with weights W1, W2 and W3 set to 1 and 200 conjugate-gradients steps with W1set to 0 and W2 and W3set to 1. The goodness-of-fit was assessed using the CCC score and the clash score (CLS). The latter defined as the number of serious clashes (that is, distance d<2r) per 1000 beads. The refined model was chosen as the one that maximizes the CCC and minimizes the clash score (Table S8).
PGP1F Hi-C
PGP1f cells were cultured as described in the ‘Cell Culture’ section of Methods and pelleted at low speed centrifugation (185 x g) to maintain cellular and nuclear integrity. We generated 2 in situ Hi-C libraries with three million cells each, utilizing the MboI restriction enzyme (NEB, R0147M), following the protocol described in (8). Briefly, the Hi-C protocol entails crosslinking cells with 1 % formaldehyde (wt/vol), cell permeabilization with nuclei intact, DNA digestion with an appropriate 4-cutter restriction enzyme, and 5’ -overhang filling with the incorporation of a biotinylated nucleotide. The resulting blunt end fragments are ligated, and the DNA is then sheared. Biotinylated ligation junctions are captured with streptavidin beads, and the resulting fragments are analyzed with paired-end sequencing.
PGP1f Hi-C data processing
The two Hi-C libraries were sequenced with 150 bp paired-end sequencing reads on an Illumina HiSeqX-10. The resulting sequencing data from both libraries (Table S9) was processed separately using Juicer (89) and mapped to the GRCh37/hg19 (hg19) human genome assembly (8, 89). To generate statistics and a Hi-C file for a combined map including both replicates (Table S5; Fig. S14), the mega.sh script was used (https://github.com/theaidenlab/juicer/wiki/Usage (89), with the following command: /gpfs0/juicer/scripts/mega.sh.
The combined Hi-C map generated a total of 691,620,560 reads, achieving a resolution of 5 kb, in alignment with Hi-C map resolution parameters described in (8). This combined map was used with ‘KR normalization’ (51, 89) for all subsequent analyses.
Assignment of manually identified compartments as ‘A’ or ‘B’
To label the two manually identified compartments as ‘A’ or ‘B’, we began by partitioning chromosome 19 into 100kb loci. These loci were assigned to two compartments based on the intrachromosomal contact matrix for chromosome 19 using the eigenvector method (Fig. S8) as implemented in Juicer (3, 89). Specifically, we used the following command: ‘java -jar /gpfs0/juicer/scripts/juicer_tools.jar eigenvector KR inter.hic 19 -p BP 100000’.
Visual inspection of the Hi-C map for chromosome 19 confirmed that the eigenvector corresponded reliably to the plaid pattern in the data (3) (Fig. S8 and interactive Fig. S8 http://bit.lv/2zCG2X6, for interactive figure documentation see: https://igvteam.github.io/juicebox.js/).
Next, we labeled the two compartments as ‘A’ and ‘B’ on the basis of GC content, which is known to be enriched in the ‘A’ compartment and in open chromatin more generally (90). The GC content track for human hg19 chr19 was downloaded from the UCSC Genome Browser (http://hgdownload.cse.ucsc.edu/goldenpath/hg19/gc5Base/) and encodes the raw data for the gc5Base track on hg19 (site last updated on 24-Apr-2009 14:48). We binned the data at 100 kb resolution for chromosome 19, using the following awk command: awk -v res=100000 ‘BEGIN{fname=“temp”}NF>2{split($2,c,”=“); if (fname!=c[2]){print tot ≫ fname”.txt”; close(fname”.txt”); m=res; tot=0} fname=c[2]; }NF==2 && $1>m{while ($1>m){print tot ≫ fname”.txt”; m=m+res; tot=0;}}$1<=m{tot+=$2; }END{print tot ≫ fname”.txt”; close(fname”.txt”);}’ my_file.wig
Finally, we calculated the Spearman rank correlation coefficient between the compartment eigenvector and the GC content track. We obtained a Spearman rho value of - 0.674928131634, which implies that the ‘A’ compartment corresponds to loci whose eigenvector entry is negative and the ‘B’ compartment corresponds to loci whose eigenvector is positive (Fig. S8, Interactive Figure http://bit.ly/2zCG2X6).
Accordingly, we labeled compartmental intervals 1, 3, 5, 7, and 9 (at coordinates 7 - 8.6 Mb, 9.3 – 9.5 Mb, 9.6 – 9.8 Mb, 9.9 – 14.6 Mb, and 15.2-15.5 Mb) as compartment ‘A’ and intervals 2, 4, 6, and 8 (with coordinates 8.7 – 9.3 Mb, 9.5 – 9.6 Mb, 9.8 – 9.9 Mb, and 14.6 – 15.2 Mb respectively), as compartment ‘B’. Note, compartmental intervals may differ from chromosomal segments by genomic coordinates and/or compartment classification.
Acknowledgments
We thank Geoffrey Fudenberg for calling contact domains and compartmental intervals in IMR-90 cells; Kun Zhang, Daniel Jacobsen, Bing Ren, and Anthony Schmitt for PGP1 haplotyping; Bogdan Bintu for help regarding microfluidics; Bogdan Bintu, Alistair Boettiger, Florian Schueder, and Xiaowei Zhuang for helpful input; and Jennifer Waters and the Nikon Imaging Center of Harvard Medical School. The authors also thank members of the Yin, Aiden, Marti-Renom, and Wu laboratories for countless valuable discussions. Finally, C.-t.W. thanks H.N., in particular, for conversations regarding the implications of structural variation and turbulence and, for remarkable cells and years of discussion, she also thanks PGP1 and PGP95.
References and Notes
- 1.↵
- 2.
- 3.↵
- 4.
- 5.
- 6.
- 7.↵
- 8.↵
- 9.↵
- 10.↵
- 11.↵
- 12.↵
- 13.
- 14.↵
- 15.↵
- 16.
- 17.
- 18.↵
- 19.↵
- 20.↵
- 21.↵
- 22.↵
- 23.↵
- 24.
- 25.
- 26.↵
- 27.↵
- 28.↵
- 29.↵
- 30.↵
- 31.↵
- 32.
- 33.↵
- 34.↵
- 35.↵
- 36.↵
- 37.↵
- 38.↵
- 39.↵
- 40.↵
- 41.↵
- 42.↵
- 43.↵
- 44.↵
- 45.↵
- 46.↵
- 47.↵
- 48.↵
- 49.↵
- 50.↵
- 51.↵
- 52.↵
- 53.
- 54.
- 55.
- 56.↵
- 57.↵
- 58.↵
- 59.↵
- 60.↵
- 61.↵
- 62.↵
- 63.↵
- 64.↵
- 65.↵
- 66.↵
- 67.↵
- 68.↵
- 69.↵
- 70.↵
- 71.↵
- 72.
- 73.↵
- 74.↵
- 75.↵
- 76.↵
- 77.↵
- 78.↵
- 79.↵
- 80.↵
- 81.↵
- 82.↵
- 83.↵
- 84.↵
- 85.↵
- 86.↵
- 87.↵
- 88.↵
- 89.↵
- 90.↵
- 91.↵
- 92.↵