Abstract
The mechanisms of formation of LADs, the lamina associated domains, and TADs, the topologically associating domains of mammalian chromatin, were investigated here by using as a starting point the observation that chromatin architecture relies on an isochore framework and by carrying out a new analysis of both isochore structure and the isochore/chromatin domain connection. This approach showed that LADs correspond to isochores from the very GC-poor, compositionally very homogeneous L1 family and from the “low-heterogeneity” L2 (or L2−) sub-family; in fact, LADs are compositionally flat, flexible chromatin structures (because of the nucleosome depletion associated with the frequent oligo-A’s) that attach themselves to the nuclear lamina in self-interacting clusters. In contrast, TADs correspond to the increasingly GC-richer isochores from the “high-heterogeneity” L2 (or L2+) sub-family and from the H1, H2 and H3 families. These isochores, making the framework of the individual chromatin loops or of the chromatin loop ensembles of TADs, were found to consist of single or multiple GC peaks. The self-interacting single or multiple loops of TADs appear to be shaped by the properties that accompany the increasing levels of GC and CpG islands in their isochore peak backbones, namely by an increasing bendability due to decreasing nucleosome density, which is accompanied by decreasing supercoiling and increasing nuclease accessibility. In conclusion, chromatin architecture appears to be encoded and molded by isochores, the DNA units of genome organization. This “isochore encoding/molding model” of chromatin domains represents a paradigm shift compared to previously proposed models. Indeed, the latter only rely on the properties of architectural proteins, whereas the new model is essentially based on the physico-chemical properties of isochores and on their differential binding of nucleosomes.
Introduction
In interphase nuclei, chromatin comprises two sets of domains that are largely conserved in mammals: LADs, the lamina associated domains (∼0.5 Mb median size), that are scattered over all chromosomes and correspond to GC-poor sequences (1-3), and TADs, the topologically associating domains (0.2-2 Mb in size), a system of GC-rich loops (4-7); many TADs can be resolved into contact domains (0.185 Mb median size; 6).
In spite of the recent, impressive advances in our understanding of chromatin domains (see refs. 8-14, for reviews), the problem of their formation mechanism(s) is still unsolved. Interesting models have been proposed (15-24), but no satisfactory solution has been reached so far. The currently predominant model for TAD formation is the “chromatin extrusion model” (18,19), which proposes that TADs emerge as a consequence of loop extrusion by loop extruding factors (including cohesin), which is limited by boundary elements (including CTCF).
Here, the problem of chromatin domain formation was approached by taking into account the observation that isochores (see ref. 25 for a review) make up the framework of chromatin domains (26) and by taking a new look at isochore structure and at the isochore/chromatin domain connection. This approach was applied to human chromosome 21, which was chosen because 1) this chromosome is a good representative of human chromosomes, allowing an extension of the results to all other chromosomes (as investigated in ref. 26); and 2) it is the smallest human chromosome, allowing a more expanded graphical presentation of data.
As far as nomenclature is concerned, although TADs comprise, by definition, all the topologically associating domains, in the context of this article TADs will indicate the chromatin domains other than LADs. The main reason for this choice is that the mechanisms of formation of the two sets of domains are different, even if based on the same fundamental DNA property.
Results
Isochore structure: a new analysis
Figure 1A shows the compositional profile of the DNA sequence from human chromosome 21 as seen through non-overlapping 100Kb windows. This window size was used because 100Kb is a plateau value under which the composition of DNA segments shows an increasing variance with decreasing size (27) due to several factors (for instance, the distribution of interspersed repeated sequences).
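The windowing step described above can be illustrated with a minimal sketch. The function name `gc_windows`, the choice to discard a trailing partial window, and the handling of unsequenced positions (N) are assumptions made for illustration; they are not taken from the methods of this work.

```python
def gc_windows(seq, window=100_000):
    """Return the GC percentage of each full, non-overlapping window of seq.

    Assumptions (not from the source): a trailing partial window is dropped,
    and positions other than A, C, G, T (e.g. N) are excluded from the count.
    A window with no A/C/G/T at all yields None.
    """
    seq = seq.upper()
    values = []
    for start in range(0, len(seq) - window + 1, window):
        block = seq[start:start + window]
        acgt = sum(block.count(base) for base in "ACGT")
        if acgt == 0:
            values.append(None)  # window consists only of unknown bases
            continue
        gc = block.count("G") + block.count("C")
        values.append(100.0 * gc / acgt)
    return values


# Toy sequence with a 4-base window so the result can be checked by eye.
toy = "GGCC" + "AATT" + "GCAT" + "NNNN"
profile = gc_windows(toy, window=4)
print(profile)  # [100.0, 0.0, 50.0, None]
```

With a real chromosome sequence the default `window=100_000` would give one GC value per 100Kb block, which is the quantity plotted in the profiles discussed here.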
Figures 1B and 1C display the isochore profile as obtained from the chromosome 21 sequence using either a sliding window approach (28,29; Fig. 1B) or a fixed window approach (27, 30; Fig. 1C). Both approaches flatten the compositional profiles by averaging, in two different ways, the fluctuating values of the large regions characterized by “fairly homogeneous” composition, the isochores. In the case of the sliding window approach, remnants of the fluctuations can still be seen as small spikes in GC-rich regions, whereas in the case of the fixed window approach the fluctuations disappear because of the strict averaging procedure applied.
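The flattening effect of the two averaging schemes can be seen in a small numerical sketch. This is a generic moving average and a generic block average, not the isochore-mapping procedures of refs. 27-30; the function names and the eight GC values are hypothetical.

```python
def sliding_mean(values, width):
    """Mean over a window of `width` points, advanced one point at a time."""
    return [sum(values[i:i + width]) / width
            for i in range(len(values) - width + 1)]


def fixed_mean(values, width):
    """Mean over consecutive, non-overlapping groups of `width` points."""
    return [sum(values[i:i + width]) / width
            for i in range(0, len(values) - width + 1, width)]


def gc_range(values):
    """Difference between the highest and the lowest value."""
    return max(values) - min(values)


# Hypothetical GC% values of eight consecutive 100Kb blocks (illustrative only).
points = [40.0, 48.0, 42.0, 50.0, 41.0, 49.0, 43.0, 47.0]
sliding = sliding_mean(points, 4)
fixed = fixed_mean(points, 4)

print(gc_range(points))   # 10.0
print(gc_range(sliding))  # 0.75
print(gc_range(fixed))    # 0.0
```

In this toy case the raw points span 10% GC, the sliding average retains small residual fluctuations, and the fixed average removes them altogether, which is the qualitative behavior described for Figs. 1B and 1C.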
A new, simpler, in fact elementary, approach was used here, namely plotting the individual GC values of 100Kb DNA blocks as points. This approach was suggested by two recent results: 1) correlations hold between isochore properties and the properties of chromatin domains (31); and, more precisely, 2) the framework of TADs and LADs is made up by GC-rich and GC-poor isochores, respectively (26). One may therefore imagine a possible topological similarity between the flat structure of LADs and the loops of TADs on the one hand, and the compositionally flat GC-poor and the compositionally heterogeneous GC-rich isochores, respectively, on the other. If such is the case, clearly a simple 100Kb point-by-point profile of GC levels is preferable not only to both sliding and fixed window approaches, that flatten the compositional profile, but also to the color bar plot of 100Kb DNA segments (Fig. 1A) for graphical clarity reasons.
The point-by-point profile (see Fig. 1D) expectedly showed the compositionally flat region 2 and the H1 and L2 peaks (a to f) of regions 1 and 3, that were already evident in Figs. 1A, 1B and 1C. It also led, however, to the discovery that the sequences of the H1 (region 4), H2 (region 5) and H3 (region 6) isochores were not simply fluctuating within the compositional borders of the corresponding families (see Supplementary Table S1), but consisted, in fact, of sets of GC peaks. Upon close inspection, these very evident peaks may be seen to correspond to the minute peaks of Fig. 1B, that were flattened by the sliding window approach. Expectedly, the peaks of the point-by-point plots covered a broader GC range compared with the flattened peaks of the sliding window approach, as shown by comparing Fig. 1D with Fig. 1B, and Fig. 1F with Fig. 1E (the high resolution compositional profiles of a multi-peak H2 isochore from chromosome 20).
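The peaks discussed here were identified on the plotted profiles. As a purely illustrative aid, a point-by-point series can be scanned for local maxima with a rule as simple as the one below; the function name `local_peaks`, the strict-neighbour criterion and the GC values are assumptions, not the procedure used in this work.

```python
def local_peaks(values):
    """Indices of points strictly higher than both of their neighbours."""
    return [i for i in range(1, len(values) - 1)
            if values[i] > values[i - 1] and values[i] > values[i + 1]]


# Hypothetical GC% values of nine consecutive 100Kb blocks (illustrative only).
points = [44.0, 47.5, 45.0, 44.5, 49.0, 46.0, 45.5, 48.0, 44.0]
peaks = local_peaks(points)
print(peaks)  # [1, 4, 7]
```

A rule this simple reports every local maximum, including noise; any real use would need a minimum peak height or width, which the source does not specify.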
In purely compositional terms (see Fig. 2 for a larger-scale presentation of the data of Fig. 1D), three different situations were found: 1a) a series of single peaks (regions 1 and 3) corresponding to an H1 isochore (a), and to several L2 isochores (b to f), in which latter case very few points were slightly beyond the “fixed” isochore family borders, but still within the “extended” borders of Supplementary Table S1; 1b) several very sharp H3 single peaks (region 6) that included sequences belonging to the H2 and even to the H1 family, in which case an overall GC range of 18% was reached; 2) a very homogeneous L1 isochore (region 2), in which the overall GC range barely reached 4% and all points were within the “fixed” GC borders of isochore family L1; and 3) two series of GC-rich multi-peak isochores that belonged to H1 (region 4) and H2 (region 5) families in which, again, very few points were slightly beyond the “fixed” isochore family borders. The striking difference between the compositional profiles of L1 and H3 isochores is shown in Fig. 2A. Expectedly, when using higher-resolution windows (50Kb; see Supplementary Figure S1) the compositional profiles of isochore peaks became broader in the GC level gradients and more complex, because of the presence of interspersed repeated sequences and CpG islands (see ref. 27).
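The two quantities used in this comparison, the overall GC range of a region and the family to which each 100Kb point belongs, can be computed as in the sketch below. The “fixed” borders of Supplementary Table S1 are not reproduced in this text; the values in `BORDERS` (37, 41, 46 and 53% GC) are those commonly quoted for the five human isochore families and are an assumption here, as are the function names and the example points.

```python
from bisect import bisect_right

# Assumed "fixed" GC% borders between families (not taken from this text).
BORDERS = [37.0, 41.0, 46.0, 53.0]
FAMILIES = ["L1", "L2", "H1", "H2", "H3"]


def family(gc):
    """Family of a single GC% value; a value on a border goes to the upper family."""
    return FAMILIES[bisect_right(BORDERS, gc)]


def summarize(points):
    """Overall GC range of a region and number of points per family."""
    counts = {}
    for gc in points:
        name = family(gc)
        counts[name] = counts.get(name, 0) + 1
    return {"range": max(points) - min(points), "families": counts}


# Hypothetical regions (illustrative only).
flat_region = [34.5, 35.0, 36.0, 35.5, 36.5]
sharp_peak = [42.0, 47.0, 55.0, 58.0, 50.0, 44.0]

print(summarize(flat_region))  # {'range': 2.0, 'families': {'L1': 5}}
print(summarize(sharp_peak))   # {'range': 16.0, 'families': {'H1': 2, 'H2': 2, 'H3': 2}}
```

The toy peak shows how a single sharp peak reaching H3 levels can contain points that fall, taken individually, in the H2 and H1 ranges.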
Isochores and LADs
It is well established (see refs. 1-3) that LADs 1) may cover ∼35% of the human genome; 2) comprise 1,100-1,400 discrete domains demarcated by CTCF sites and CpG islands; 3) have a median size of ∼0.5 Mb; 4) are scattered over all chromosomes; 5) can be subdivided into cLADs, i.e., “constitutive” LADs present in the four cell types originally tested, and fLADs, i.e., “facultative” LADs only present in some cell types (in fact, only ∼15% of the genome is involved in “stable contacts” present in most cells); 6) are characterized, in the case of cLADs, by conserved positions in syntenic regions of human and mouse chromosomes; and 7) show a correspondence of cLADs and ciLADs (the “constitutive inter-LADs”) with GC-poor and GC-rich isochores, respectively.
As shown in Figs. 3A (and 4A,4B), the major LAD of chromosome 21 corresponds to a large L1 isochore (which, incidentally, includes an exceptional GC-poor interLAD; see also the interval in the self-interacting domains corresponding to the L1 isochore in Fig. 4C). The other LADs correspond to the L1 isochores that separate the L2 peaks (to be described below and in the following section), and to a “valley” L2 isochore comprised between two H1 isochores (on the right side of Fig. 3A). Moreover, two LADs flank the centromere; in fact, this appears to be the rule for all human chromosomes, as judged by looking at the results of ref. 26.
In chromosome 20 (Fig. 3B), the largest LAD corresponds to an L2 isochore (interrupted by an interLAD) while several other LADs correspond to L2 valley isochores flanked by H1 isochores; among faint LADs, one (extreme right) corresponds to an H1 isochore comprised between two H2 isochores and two other ones flank the centromere. In the very GC-rich chromosome 19 (Fig. 3C), two LADs correspond to two H1 isochores flanking an H2 isochore and two other LADs correspond to L2 isochores flanking an H1 isochore; finally, two faint LADs flank the centromere.
These results show that LADs correspond not only to L1 isochores that represent ∼19% of the genome (incidentally, not too far from the ∼15% involved in “stable contacts”; see ref. 3), but also to L2 isochores and even to H1 isochores in the rare case of very GC-rich chromosomes.
As far as L2 isochores are concerned, it appears (see Supplementary Table S1 and Fig. 1) that 1) some isochores belong to a “low-heterogeneity” L2 sub-family, which may be called L2−, and show a flat profile (see, for example, the largest LAD of chromosome 20); and 2) some other isochores belong to a “high-heterogeneity” L2, or L2+, sub-family, are higher in average GC and have the shape of single peaks (see Figs. 1A-D and 3A). Now, as shown in Fig. 3, L2− isochores correspond to LADs, whereas L2+ isochores correspond to interLADs and TADs (see the following section). The remaining L2 isochores are generally present as valleys flanked by GC-richer isochores (see Fig. 3A,3B,3C; the relative amounts of L2 sub-families are presented in Supplementary Table S1).
Isochores and TADs
It should be recalled, as a preliminary remark, that the isochores from the five families (L1, L2, H1, H2 and H3) of the human genome (and other mammalian genomes; 32) are characterized not only by increasing GC levels and different short-sequence frequencies, but also by increasing compositional heterogeneities, increasing levels of CpG, CpG islands and Alu sequences and by decreasing levels of LINE sequences and of 5mC/CpG ratios (27,33-37). Moreover, at the chromatin level, GC increases are correlated with higher bendability (38), higher nuclease accessibility (39,40), lower nucleosome density (41) and lower supercoiling (42,43), all properties linked to DNA sequences.
The connection of the isochores of chromosome 21, as seen in Figs. 1D and 2, with chromatin loops can be described as follows (see Fig. 4A,4B,4C): 1) regions 1 and 3 show a series of H1 (a) and L2 (b to f) isochores in which latter case at least some of their single peaks appear to correspond to individual self-interactions; 2) region 2 is the GC-poorest L1 isochore which corresponds to two self-interactions (separated by an exceptional interLAD); 3) the multi-peak H1 isochores of region 4 correspond to a large interLAD region and to several self-interactions; the two short sequences X and Y, corresponding to LADs, separate region 4 from regions 3 and 5; 4) the small multi-peak H2 isochore (region 5) seems to correspond to a single self-interaction; this may be due to the dense packing of the peaks and/or to a lack of resolution; 5) a series of H3 isochores (red points comprised between two red arrows) correspond to a series of self-interactions comprised between the two red lines on the heat map; in this case, the six H3 isochore peaks correspond to at least three chromatin loops. In conclusion, the two classes of isochores, single-peak and multi-peak, essentially correspond to two classes, single-loop and multi-loop, respectively, of TADs (both of which also show inter-chromosomal interactions; 26).
The correspondence between isochore peaks and self-interactions is improved at a higher resolution of the heat map (compare the high-resolution Fig. S2A with the low-resolution Fig. S2B). Likewise, a very good match of isochore peaks with chromatin loops can be seen in the high-resolution heat map of the multi-peak H2 isochore of chromosome 20 (see Supplementary Fig. S3).
A very interesting correlation is shown in Fig. 4D, in that regions 1 and 3 to 6 correspond to A compartments (open chromatin) whereas region 2 and the short X and Y sequences correspond to B compartments (closed chromatin; see ref. 15). More precisely, the A compartment corresponds to multi-peak isochore TADs (regions 1,3,4,5,6), the B compartment to individual LADs (region 2,X,Y), the former being more frequent in telomeric regions.
Discussion and Conclusion
Encoding of chromatin domains by isochores
Very recent investigations showed that GC-poor and GC-rich isochores should be visualized as the framework of chromatin architecture or, in other words, as the DNA units that underlie LADs and TADs, respectively (26). This was an important step towards the idea that isochores encode chromatin domains. The present results provide conclusive evidence for this idea, by showing a precise match between the chromatin domains and the isochores of chromosome 21 and by generalizing these results to all human chromosomes.
Indeed, the compositional profiles, the heatmaps and the LAD maps (26) show that: 1) the isochores from the L1 family and the L2− sub-family correspond to LADs in all human chromosomes; 2) L2+ peaks emerging from an L1 background and corresponding to interLADs and TADs are also found in other human chromosomes, although less frequently than in chromosome 21; likewise, H3 peaks also corresponding to interLADs and TADs are present in most human chromosomes; 3) the spikes of the compositional profiles of H1 and H2 isochores of Fig. 1B, that reflect the peaks of Fig. 1D, are regularly present in H1 and H2 isochores from all human chromosomes and correspond to the peaks of point-by-point profiles (Cozzi P. et al., paper in preparation) and to heat map interactions. This general match is important in that the only alternative to the encoding proposed here is that the match of the thousands of LADs and TADs with the corresponding isochores is just a coincidence (and this cannot be quia absurdum).
Molding of chromatin domains by isochores
The present results also solve an important open problem, namely the mechanism of formation of chromatin domains. Indeed, LADs should be visualized as chromatin structures corresponding to GC-poor isochores that are flexible, because of the local nucleosome depletions linked to the richness of oligo-A sequences in the corresponding isochores (33,37,44,45; G. Lamolle, H. Musto, G. Bernardi; paper in preparation). LADs only twist and bend in order to adapt and attach themselves to (and even embed in) the lamina, which is reassembled after mitosis (3). Expectedly, this leads to self-interactions (see Fig. 4), as well as to interactions with other LADs from the same chromosomes (26; see, for example, the two LADs bracketed by black lines in Fig. 4). In the case of TADs, the GC gradient within each GC-rich isochore peak is accompanied by properties, increasing levels of CpG, CpG islands and Alu sequences, that lead to increasing nucleosome depletion and bendability and decreasing supercoiling (38-43). These factors constrain the corresponding chromatin to fold into loops.
The models for the formation of LADs and TADs developed in this investigation are presented in Fig. 5, which stresses a key point, namely the central role played by the compositional properties of isochores in the formation of chromatin domains. Indeed, the folding model presented here clearly relies on isochore sequences, their nucleosome depletion and the emerging local (in LADs) or extended (in TADs) flexibility of the chromatin fiber.
It should be stressed that this “isochore encoding/molding model” of chromatin domains represents a paradigm shift compared to previously proposed models. Indeed, the latter only rely on the properties of architectural proteins, whereas the new model is essentially based on the physico-chemical properties of isochores and on their differential binding of nucleosomes.
The “isochore encoding/molding model” of chromatin domains is, however, compatible 1) with both the requirements of CTCF binding to close chromatin loops into insulated TADs (46) and the lack of such requirements (47); 2) with the interaction of topoisomerase II beta with cohesin and CTCF at topological domain borders (48); 3) with an “insulation-attraction model” of TAD formation (9) in which the insulation observed at TAD boundaries may result from stiffness of the chromatin fiber caused by functional elements (CTCF binding sites, highly active transcription starting sites etc) associated with increased nucleosome density and specific local chromatin interactions due to “attractive forces” (not better specified but possibly linked to supercoiling); 4) with the “chromatin extrusion model” (18,19), if the initial attachment of the loop extruder (cohesin) were to coincide with the tips (highest GC levels) of TAD loops.
The “isochore encoding/molding model” vs the “chromatin extrusion model”
Although, as just mentioned, the “chromatin extrusion model” could overlap with the “isochore encoding/molding model”, a question may be raised about which one of the two models is better supported by facts. An answer may come by considering what happens in the case of the “mitotic memory”, namely the rapid and precise re-establishment of the original interphase chromatin domains at the exit from mitosis. In the first case, the basic information required for such quick re-establishment is already present in the sequences of isochores (CTCF may also play an important role in the process). In the second case, the formation of thousands of loops involves the attachment of loop extruding factors (possibly at specific sequences as suggested above) and an extrusion process, which requires a source of energy. It is obvious that the first model relying on well known intrinsic physical properties of DNA is to be preferred to the second one which involves thousands of conjectural loop extruders and an unknown source of energy.
A final point should be made to stress that the models under discussion here concern the basic evolutionarily stable chromatin domains, since, as it is well known, epigenetic modifications and environmental factors may cause changes in chromatin architecture; indeed, while self-associating domains are stable in mammals, chromatin interactions within and between domains may change during differentiation (49) and evolution (the latter subject will be discussed elsewhere).
Isochores as functional genome units
The present results lead to a new vision of isochores since they not only correspond to a fundamental level of genome structure and organization (50), but also to a set of functional genome units that encode and mold LADs and TADs. Several observations support the above conclusion, three of which are the following: 1) the evolutionary conservation of the isochore patterns in mammals (32); 2) TADs from cells of adult organisms are basic units of replication timing (51) and GC-rich and GC-poor isochores are replication units characterized by all early or all late replicons (52); 3) alterations of the architecture of chromatin domains (both LADs and TADs), known to lead to senescence and diseases (see the review papers cited in the Introduction), are due to changes in their isochore framework. This was predicted by previous investigations on “genomic diseases” (53,54), defined as diseases due to sequence alterations that do not affect genes or classical regulatory sequences, but other sequences that “cause regional changes in chromatin structure”.
The fact that alterations in the chromatin architecture, not affecting genes or classical regulatory sequences, lead to problems in transcription 1) represents the strongest and final objection to the idea of “junk DNA” (see ref. 55, for a review); and 2) has practical implications: indeed, screening even a small human population in terms of chromatin structure in view of detecting “genomic diseases” is simply not feasible, at least at the present time. The availability of LAD and TAD maps along sequenced human chromosomes may, however, make it possible to link alterations at the DNA sequence level with chromatin structure alterations. For instance, the maps of insertions, deletions and SNPs of Venter’s chromosomes (56), in combination with reference chromatin structure maps, might lead to the detection of problems in Venter’s chromatin domains.
The large-scale organization of the human genome
We can now consider a higher level of isochore and chromatin organization. At the DNA level, two “genome spaces” were defined on the basis of gene density (57,58): the gene-poor “genome desert” (L1+L2 isochores) and the gene-rich “genome core” (H1+H2+H3 isochores). In the interphase nucleus, the chromatin corresponding to the genome core showed an internal location and an open structure, whereas the chromatin corresponding to the genome desert showed a peripheral location and a closed structure (see Table 1); moreover, the former showed a preference for (generally GC rich) telomeric regions, the latter for centromeric regions, this preference explaining the polarity of chromosomes in the nucleus (59,60).
The recently proposed chromosome compartments, A and B, characterized by open and closed chromatin, respectively (15), show properties very similar to those just described. In conclusion, the two compartments, A and B (see the compartment profile of chromosome 21 in Fig. 4D) appear to correspond to the two genome spaces, the “genome core” and the “genome desert” (see Table 1). This conclusion is supported by the comparison of sub-compartments with isochore profiles (26).
The genomic code
The encoding of chromatin domains by isochores deserves the name of “genomic code”. This definition was originally coined (61,62; see also ref. 25) for two sets of compositional correlations: 1) those that hold among genome sequences (for instance, between coding and contiguous non-coding sequences) and among the three codon positions of genes and that reflect isochore properties; and 2) those that link isochores with all the structural/functional properties of the genome (25,26,31), the latter now including the properties of TADs and LADs. Here it is proposed that the definition of “genomic code” be applied to the encoding of chromatin domains by isochores, since this is in fact the basis for the second set of the correlations just mentioned. Interestingly, the genomic code may be visualized as the fourth, and last, pillar of molecular biology, the first one being the double helix (1951-1953), the second the regulation of gene expression in E. coli (1957-1961), and the third the genetic code (1961-1966). In contrast with the other pillars, the genomic code took decades to be established.
Acknowledgements
The author thanks Paolo Ascenzi for hospitality, Giacomo Bernardi, Oliver Clay, and, especially, Kamel Jabbari for critical reading, comments and discussions as well as Caterina Nuvoli for excellent technical help. This research was supported by the Kimura Prize for Molecular Evolution and Evolutionary Genomics conferred to the author (Tokyo, June 2016).
Footnotes
gbernardi{at}uniroma3.it