ABSTRACT
More than 1,000 genes are differentially expressed between male and female mouse liver. However, ∼90% of 4,000 sex-differential open chromatin regions harboring regulatory elements implicated in liver sex bias are distal enhancers. To elucidate the role of 3D nuclear organization and address the challenge of linking distal regulatory regions to sex-biased gene targets, we performed ChIP-seq for CTCF and cohesin, known mediators of 3D nuclear organization, and identified ∼1,000 binding sites showing significant differential occupancy by these factors in male and female mouse liver. Distal genomic regions showing sex-differential CTCF and/or cohesin binding were evaluated as potential factors in sex-dependent looping controlling sex-biased proximal effects on gene expression. Results were integrated with a study of cohesin depletion in mouse liver to evaluate the impact of cohesin loss on the expression of putative gene targets. Further, circularized chromosome conformation capture with sequencing (4C-seq) was used to discover striking sex-differences in 3D looping, linking distal sex-biased enhancers to proximal sex-biased gene expression for both male-biased (C9, Nudt7, Nox4) and female-biased genes (A1bg, Sult3a2). Our findings also illustrate the importance of a nested intra-TAD loop for insulation of sex-specific enhancer–promoter contacts for Nox4. These studies reveal how chromatin interactions and 3-dimensional genome organization guided both directly and indirectly by CTCF and cohesin looping contribute to liver sex bias.
INTRODUCTION
Human sex differences are observed in several non-reproductive tissues, notably brain [1], immune system [2], kidney [3], and liver [4]. The health impacts of sex differences in liver include a higher incidence of aggressive liver cancer in males [5], increased susceptibility to autoimmune hepatitis in females [6], and sex differences in metabolism of some pharmaceuticals and environmental chemicals [7]. Transcriptomic and epigenetic sex-bias is best characterized in mouse liver, where ∼1,000 genes [8], including many lncRNA genes [9, 10] and miRNAs [11], exhibit sex-biased expression that may be regulated by thousands of sex-specific DNase I hypersensitive sites (DHS) [12, 13]. This prior work has shown that the majority of sex-biased genes do not have gene-proximal sex-biased chromatin marks or DHS, suggesting most epigenetic sex bias occurs at distal elements such as enhancers [12, 13]. Although not directly tested to date, three-dimensional looping is one mechanism that could link the ∼4,000 mostly distal sex-biased enhancers to the ∼1,000 sex-biased genes.
Two major transcription factor (TF) regulators of 3D genomic architecture are CCCTC-binding factor (CTCF) and the multi-protein complex cohesin. CTCF has primarily been studied for its role in DNA looping and insulation [14, 15]. Cohesin, named for its role in sister chromatid cohesion [16], is the molecular motor powering DNA looping via a “loop extrusion” mechanism [17, 18]. Loss of CTCF or cohesin is embryonic lethal in mice [19, 20], however, in adult mouse liver the induced degradation of the cohesin loading factor Nipbl results in a dose-dependent loss of both cohesin binding and virtually all focal DNA looping [21]. In other systems, a similar loss of looping has been observed as a result of direct depletion of either cohesin [22] or CTCF [23]. Thus, both CTCF and cohesin are required for DNA looping.
Cohesin is loaded and unloaded from chromatin by specific protein complexes but lacks sequence-specific DNA binding [24], thus its role is largely dependent on its binding partners. When associated with mediator and tissue-specific transcriptional regulators, cohesin mediates shorter-range looping between enhancers and promoters—often referred to as enhancer-promoter loops or gene loops [25–27]. Regions bound by cohesin but not CTCF (cohesin-non-CTCF sites; CNC) tend to be highly tissue specific but are frequently weaker than sites where cohesin is bound together with CTCF [25, 28, 29]. When cohesin and CTCF are both bound, at cohesin-and-CTCF (CAC) sites, cohesin forms insulating loops, which have been variously characterized as insulated neighborhoods [26], loop domains [30], Topologically Associating Domains (TADs) [31], and intra-TAD loops [32]. The majority of insulating CAC-mediated insulated loops are conserved across multiple tissues or cell types [30–33], and are often even conserved across mammalian species [34]. A distinct subset of CAC sites are important for enhancer-promoter interactions [35, 36], and may specifically help structure super-enhancers around key constituents enhancers, known as hubs [37]. The role of CTCF binding in the absence of cohesin (lone CTCF sites) is less clear. Unlike cohesin, CTCF binds specific DNA motifs via its 11 zinc fingers [38], yet the function of lone CTCF sites is likely also dependent on additional interacting proteins, such as the TFs YY1 [39], BORIS [40], and STAT5 [41].
CAC-mediated insulating loops are typically anchored by a pair of highly-bound CTCF sites oriented by the non-palindromic CTCF motif [30]. Supporting this finding, inversion or deletion of a CAC-bound anchor results in a loss of looping [17, 42]. This stereotypical pattern of convergently-oriented CTCF binding enables the accurate prediction of CAC-mediated loops from CTCF/cohesin binding activity and motif orientation alone [17, 32, 43]. Computational prediction of loops linking enhancers and promoters, i.e., gene loops, is a more elusive goal, although some recent progress has been made [44, 45]. To directly measure looping at this scale, experimental methods based on chromatin conformation capture (3C) technology use formaldehyde crosslinking and restriction enzyme digestion, followed by proximity ligation to probe the interaction frequency between regions in the genome [46]. Circularized chromatin conformation capture with sequencing (4C-seq) is a variant of 3C that assesses all interactions between a single region of interest and the rest of the genome (“one-versus-all”) [47].
Here we took a multi-pronged approach to characterize the role of architectural proteins and 3D genome organization in the maintenance of mouse liver sex differences. First, we performed ChIP-seq for CTCF and cohesin using chromatin from male and female mouse livers and show that the majority of sex-biased binding of these factors occurs at distal intergenic sites. Only a minority of these binding sites are cis to sex-biased genes, which may cis-proximal (within 20 kb) or more cis-distal (> 20 kb, but within the same TAD) to a sex biased gene. Specific examples of cis proximal and distal CTCF/cohesin binding are shown for male-biased genes (proximal: Cml5, Nat8; distal, C8a, C8b) and female-biased genes (proximal: Fam84b; distal: Slc22a29). Using a publicly available cohesin depletion dataset for male liver [21], we find that distally-regulated male-biased genes are considerably more sensitive to cohesin loss than those with proximal sex bias. To directly measure differences in chromatin interactions, we use 4C-seq for six viewpoints at or nearby five genes showing liver sex bias in expression, ranging from 2-fold to 700-fold (female-biased: A1bg and Sult3a1; male-biased: C9, Nudt7, and Nox4). Finally, we show the importance of a nested intra-TAD loop for insulation of enhancer-promoter contacts for Nox4. Overall, we describe how 3-dimensional genome organization contributes to liver sex bias in both direct and indirect ways.
RESULTS
Global differences in CTCF/Cohesin binding between male and female mouse liver
To understand the differences between male and female 3D genome organization, we determined the distribution of CTCF and cohesin binding genome wide by ChIP-seq. Using chromatin prepared from mouse liver, we performed ChIP-seq using polyclonal antibodies to CTCF (n=4 livers per sex) and the cleavable cohesin complex protein Rad21 (n=3 livers per sex). We identified 975 CTCF peaks (750 on autosomal chromosomes; Table S1A) and 1,011 cohesin peaks (841 autosomal; Table S1B) showing >2-fold sex-difference in factor binding (Figure 1A). Peaks with sex differential binding for both CTCF and cohesin are represent only 13.4% of differential cohesin peaks and 16.3% of differential CTCF peaks (Figure 1B). A similar low percentage overlap of sex-differential CTCF binding with sex-differential cohesin binding was found using several alternative filters or approaches to define sex-differential binding (Figure S1A). Aggregate plots and heatmaps show that, on average, peaks with sex-biased CTCF binding show the same trend of sex bias in cohesin binding (and vice versa), although the magnitude is generally less than 2-fold (Figure 1C).
Location of sex-biased binding sites relative to genes and regulatory elements
Given the strong dependence of cohesin and CTCF function on the presence of their binding partners, we subdivided the 1,845 sex differential CTCF and/or cohesin sites into 4 groups: CAC sites with sex-differential binding of cohesin (ΔCoh) or CTCF (ΔCTCF), CNC sites with sex-differential cohesin binding, and Lone CTCF with sex-differential CTCF binding (see Methods). Sex-differential CAC sites comprise 45-66% of all sex-differential sites (Figure S1B). The sex-differential CAC sites are stronger peaks than CNC or lone CTCF sites, with the female-biased sites generally found at stronger peaks than the male-biased sites (Figure 2A, Figure S2A). This difference is reflected in the percent of peaks with a CTCF motif and the motif quality (Figures S2B,C). A higher fraction of male-biased Lone CTCF sites contain a CTCF motif than similar female-biased sites (Figure S2B), but there is no significant difference in normalized ChIP signal or motif score (Figure 2A, Figure S2C). As CTCF binding is inhibited by DNA methylation [48, 49], the increased fraction of male-biased “Lone CTCF” peaks could be due to the reported hypomethylation of DNA in male compared to female [50], which could open up additional binding sites (see Discussion).
Female-biased CTCF/cohesin binding sites are significantly closer to genes and enhancer DHS (eDHS) than male-biased sites (Figure 2B,C). As expected, sex-biased CNCs are more closely associated with eDHS than sex-biased CAC sites. Specifically, the majority of sex-biased CNCs are within ∼200 bp of an enhancer (median distance: 202 bp in males, 119 bp in females), compared to a median distance of 8.8 kb for male-biased CAC and 17.5 kb for female-biased CAC sites (Figure 2C, left). This supports the proposal that, although sex-biased CNCs show weaker binding than CACs, they are often present at active enhancers and likely functional.
We identified 137 CAC peaks with sex differences in both CTCF and cohesin binding, 84 of which are on sex chromosomes (Table S1C). Of these, only 17 overlap TAD or intra-TAD loop anchors in liver [32], and again a majority of these are on sex chromosomes (9 of 17; Table S1C). Consistent with this, ∼90% of intra-TAD loops and anchors predicted in male liver are also predicted in female liver (Figure S2D). Thus, CAC-mediated intra-TAD loops in mouse liver are largely conserved between the sexes, and therefore do not directly regulate sex-biased gene expression. Rather, TAD and intra-TAD scale insulating loops are likely to indirectly guide enhancer-promoter looping in a sex-biased manner, as is described below.
We compared the binding of sex differential sites to CTCF binding and gene expression across 15 mouse tissues using data for male liver from the ENCODE consortium [51, 52]. Strikingly, female-biased CAC sites (ΔCoh or ΔCTCF) showed greater overlap with CTCF binding sites from other mouse tissues (i.e. are less liver specific; Figure S3A), even though the ENCODE data used came from male liver. By comparison, female-biased Lone CTCF sites are more liver-specific than male-biased sites, while there was no difference between the highly liver-specific male- and female-biased CNC sites (Figure S2C). The higher tissue conservation for female-biased CAC peaks is most likely a direct reflection of their binding strength, as stronger peaks show greater conservation for both sex-differential and all CTCF peaks (Figure S3B). When comparing the tissue specificity in expression of the nearest liver-expressed gene (FPKM > 1; TSS within 20 kb of peak), female-biased CNCs are nearby liver-specific genes relative to all expressed genes or female-biased CACs (p=0.0013 and p=0.0071, MW test; Figure S3C). While the closest genes to male-biased CNCs are more tissue specific than liver-expressed genes overall, they are not significantly more tissue specific than genes neighboring male-biased CACs (p=0.0247 and p=0.1050, MW test; Figure S3C). Thus, it appears that sex-biased CNC peaks also participate in tissue-specific transcriptional regulation, as was described for CNC peaks generally [25, 28, 29].
Sex-biased CTCF/cohesin sites tend to be distal from sex-biased genes and sex-biased DHS or H3K27ac-marked regions (Figures 2D,E). Fewer than 20% of sex-biased CTCF/cohesin binding sites are within 20 kb of a sex-biased gene, while at most 53% of male-biased CNCs could be considered potential cis regulators of sex-biased genes (within the same intra-TAD or TAD loop as a sex-biased gene; Figure 2D). Sex-biased CNCs are significantly closer to sex-biased cis regulatory elements than CACs or Lone CTCF sites, however the majority of all sites remain quite distal (Figure 2E).
Proximal and distal sex-biased CTCF/Cohesin peaks and impact of cohesin depletion
Examples of sex-biased genes with proximal sex-biased CTCF/cohesin binding are shown in Figures 3A and 3B. These include the male-biased genes Cml5 and Nat8 (M/F expression ratios of 20.2 and 4.2, respectively; Figure 3A) as well as the female biased gene Fam84b (F/M expression ratio of 2.8; Figure 3B). Approximately 3 kb upstream of the TSS of Cml5 a male biased CNC overlaps a male-biased DHS (red arrow in Figure 3A), while a male biased CAC overlapping a male-biased DHS is positioned ∼12 kb upstream of the TSS of Nat8. Presumably, cohesin and/or CTCF play a role in looping these proximal male-biased enhancers to these neighboring genes to enhance the male-biased in gene expression. Even more proximal, the female-biased CNC shown in Figure 3B overlaps the TSS of Fam84b and female-biased H3K27ac accumulation at the promoter.
A significant fraction of sex-biased CTCF/cohesin occurs within the same TAD as a sex-biased gene (Figure 2D). Examples of sex-biased CTCF/cohesin binding sites that are distal yet within the same TAD as similarly sex-biased genes are shown in Figures 3C and 3D. Figure 3D shows the female-biased gene Slc22a29 (F/M ratio, 8.7). 33 kb upstream of the Slc22a29 TSS is a highly female-biased enhancer (F-biased DHS and H3K27ac mark accumulation) which also overlaps a female-biased CAC. Slc22a29 is the closest gene and TSS to this female-biased enhancer. Figure 3C shows two male biased genes, the complement C8 genes C8a (M/F = 3.3) and C8b (M/F = 2.8), which are linearly distal (∼1.5 Mb) from a cluster of strongly male-biased enhancers. Based on known TAD structure, we would predict they are more proximal spatially than the adjacent gene, Oma1 (also see screenshot in Figure S4A).
A recent study of cohesin function in male mouse liver revealed that one consequence of loss of cohesin binding (achieved by depleting its loading factor, Nipbl) is the loss of distal enhancer-promoter contacts in favor of local ectopic contacts—activating proximal genes [21]. Figure 3E shows the relative expression change for wild type and cohesin-depleted male mouse liver for the 3 genes shown in Figure 3C: Oma1, C8a, and C8b. Expression of C8a and C8b is dramatically reduced (98.2% and 82.3% reduction, respectively) upon loss of chromatin-bound cohesin, while expression of Oma1 increases modestly (+22%), perhaps by an enhancer hijacking mechanism. By comparison, there is no significant difference in expression between WT and cohesin-depleted mice for the more proximally-regulated male-biased genes Cml5 and Nat8 (Figures 3A, S4B).
Beyond specific examples of distally-regulated (C8a, C8b) or proximally-regulated (Cml5, Nat8) male-biased genes, we also sought to characterize the requirement of cohesin for proper expression of male-biased genes. As a general rule, male-biased genes with proximal sex-biased cis regulatory elements are less sensitive to loss of cohesin than male-biased genes dependent on distal sex-biased enhancers (Figure 3F). Conceivably, alternative transcription factors (e.g. Mediator[25] or YY1[53]) are sufficient to maintain enhancer-promoter loops over short genomic distances, while cohesin is required for the upkeep of distal interactions likely via a direct or indirect looping mechanism.
Using 4C-seq to directly assess sex-biased chromatin interactions in mouse liver
To more directly assess the role of enhancer promoter loops in epigenetic sex bias, we performed 4C-seq for 6 viewpoints at or near 5 sex-biased genes. The viewpoints in Figure 4 show merged results from 3 biological replicates (n=3 per sex), each library prepared from a separate CD-1 mouse liver (see Methods) and example libraries shown in Figure S5A,B.
Figure 4A shows a viewpoint anchored at a female-biased enhancer near the strongly female-biased gene A1bg (F/M expression ratio of 155) as well as 12 female-biased, mono-exonic lncRNAs [10]. In the female samples, robust interactions are observed between the viewpoint enhancer and three regions, indicated by red arrows. While the viewpoint region interacts with a stronger female-biased enhancer (right arrow) and the promoter of A1bg (middle), the strongest interaction is actually with a cluster of lncRNAs (lncRNAs 12590-93; left). Of the 12 lncRNAs in this screenshot, this interacting cluster represents more highly expressed (Figure S5C) and more consistently female-biased genes across various RNA-seq datasets (Figure S5D). Traces for 4C-seq biological replicates are shown in Figure S6A. The precise link between the female-biased expression of these lncRNAs and the female-bias in 3D interactions with a distal enhancer is not known, however, it does point to specific candidates for further characterization. It is possible that the 3D interactions are mediated by these nuclear-enriched lncRNAs (as described for Xist [54] and Firre [55]), or alternatively, their interaction is primarily that of a regulatory enhancer driving expression of several genes—including A1bg and multiple lncRNAs. The presence of female-biased CTCF binding at both the strong female-biased enhancer and the lncRNA cluster (left and right arrows) provides some mechanistic support for the latter interpretation. As CTCF is known to functionally interact with lncRNAs with high affinity [56], these two mechanisms are not necessarily mutually exclusive; cis-acting lncRNAs (expressed only in female liver) could help selectively guide CTCF binding and interactions unique to female liver.
Figure 4B shows 4C-seq results for an enhancer proximal to the highly female-biased gene Gm4794 (F/M ratio = 704; also known as Sult3a2) and the mono-exonic female-biased lncRNAs 8820 and 8821 (F/M = 34.2 and 90.3). This enhancer is also distal to two other female-biased protein coding genes, Sult3a1 (F/M = 288) and Rsph4a (F/M = 33.6). Female-biased interactions are observed with the proximal promoter of Gm4794 (left red arrow) and the strong female-biased enhancers overlapping lncRNA 8820. Unlike the enhancer neighboring A1bg, these interactions may be mediated by female-biased CNC peaks present at both the viewpoint enhancer and downstream interacting enhancer. Despite the absence of insulator elements (i.e. no intra-TAD loops present), all focal interactions are local within ∼35 kb of the viewpoint. In addition to the enhancer-promoter interactions indicated by the red arrows in Figure 4B, the viewpoint enhancer also shows weak interactions from the female-biased promoter upstream Sult3a1 to the end of the gene body (red bracket). Despite robust female-biased DHS and H3K27ac mark accumulation, this region lacks cohesin binding, which may account for the lack of direct interactions with the viewpoint enhancer. It is possible that Gm4794 and Sult3a1 both interact with the enhancer that overlaps with lncRNA 8820, but the indirect nature of this weak association cannot be captured by proximity ligation under standard 4C-seq conditions. Traces for biological replicates are shown in Figure S6B.
In the absence of sex-biased CTCF or cohesin binding, intra-TAD loops may indirectly coordinate enhancer promoter contacts preferential to one sex. To test this hypothesis, we performed 4C-seq for the promoter of the component complement gene C9 (M/F = 3.5; Figure 4C). This gene also overlaps an antisense lncRNA that is also male-biased (lncRNA 12340; M/F = 25.5). A strongly male-biased enhancer region lies ∼230 kb upstream of the TSS of C9, and the former shows male-biased DHS and H3K27ac mark accumulation, while the latter only has male-biased DHS. Both of these regions fall just outside a nested intra-TAD loop (< 10 kb upstream for enhancers and downstream for C9 promoter) insulating the lowly expressed gene Dab2 (FPKM 0.7 in M and 1.2 in F). In both male and female liver we observe that the promoter of C9 is interacting with the cluster of distal enhancers (Figure 4C; red arrow), with a stronger interaction observed in males. In general, only very weak, mostly nonfocal interactions are observed between the promoter of C9 and within the intra-TAD loops. This insulation allows a strong male biased enhancer to bypass a more proximal gene, Dab2, to drive expression of C9. Beyond male bias in expression, this is a clear mechanism allowing high expression of C9 (61.3 FPKM in M) compared to Dab2 (0.7 FPKM in M). However, the shorter isoform of Dab2 does show weak, male-specific interaction despite a lack of any sex-bias in gene expression. The movement of cohesin is at least in part linked to transcriptional activity [18, 57], thus higher levels of transcription in male liver could lead to less cohesin pausing at loop anchors and elevated interactions crossing these loop boundaries in males. Traces for individual replicates are shown in Figure S6C.
Given the interaction with a strong distal enhancer element, we hypothesized that the expression of C9 and lncRNA 12340 would be sensitive to loss cohesin binding. In fact, both genes show substantial reduction in expression in cohesin-depleted mouse liver (∼3-5-fold; p < 0.01 for both), while the insulated gene Dab2 increases in expression but not to a significant extent (p=0.219; Figure 4D).
Nudt7 is an example of a highly expressed, male-biased gene with distal male-biased enhancers within the same intra-TAD loop (M/F = 2.5). Although there are some weak male-biased DHS along the gene body of Nudt7 (Figure 4E), there is no apparent sex bias at the shared promoter with lncRNA 7430 (M/F = 4.0; FPKM 2.9). ∼22 and 34 kb upstream are two male-biased enhancers with male-biased DHS and H3K27ac mark accumulation, the latter also overlaps the TSS of the male-biased lncRNA 7423 (M/F 7.0; FPKM 2.75 in M). This cluster of enhancers also is one of 503 super-enhancers found in male and female liver and maintained despite epigenetic variation associated with circadian rhythms [32]. In male liver interactions between the viewpoint enhancer and both a neighboring male-biased enhancer as well as the promoter of Nudt7 (red arrows; Figure 4E). By comparison, no focal interactions (in either sex) are observed between the viewpoint enhancer and a sex independent enhancer 15.7 kb upstream of Nudt7. Traces for individual replicates are shown in Figure S6D. While the interaction with the promoter region is primarily with the TSS of Nudt7 and not lncRNA7430, both genes are downregulated upon cohesin depletion (Figure 4F). Expression of lncRNA 7423 is not significantly reduced, potentially due to the proximity of strong male-biased enhancers.
Sex-independent, nested intra-TAD loops restrict >Nox4 to proximal enhancer-promoter interactions
We next sought to characterize chromatin interactions for two male-biased enhancers neighboring the male-biased gene NADPH Oxidase 4 (Nox4; M/F expression ratio 7.7). In humans, Nox4 is substantially upregulated across many cancer types, including hepatocellular carcinoma (Figure S7A). In mice that spontaneously-develop liver tumors, Nox4 is substantially upregulated in tumors over adjacent control tissue (Figure S7B; [58]). Figure 5A shows both viewpoints, with the “VP Enh1” track showing interactions for an enhancer ∼11kb upstream of the TSS of Nox4 (above in red) and the “VP Enh2” track showing interactions for an enhancer ∼125 kb downstream of the TSS of Nox4 (below in green). Enh2 does not appear to interact with Enh1 or with the promoter of Nox4 in either sex. In fact, Enh2 does not appear to have any clear male-biased interactions. Comparatively, Enh1 shows interactions associated with “transcribed-like” chromatin states that are specific to male liver (yellow highlighting in Figure 5A). The upstream highlighted region does not correspond to any annotated gene, however the chromatin state in male, but not female, indicates a narrow enhancer (green) and transcribed-like chromatin state (purple; full chromatin state key in Figure S7C) [13]. Downstream, the red arrows indicate the position of the 3’ anchor of the shorter intra-TAD loop. Immediately upstream of this anchor we can see interactions in both male and female liver indicative of insulation within the intra-TAD loop. Just downstream of the anchor however we see interactions only for males (yellow highlighting; right), which is likely due to the increased movement of cohesin beyond this loop anchor most probably as a result of increased transcription of Nox4 in male liver [18, 57]. Both highlighted regions are in a transcribed chromatin state only in male liver. Traces for individual replicates are shown in Figure S7D,E.
These data support the predicted model of two nested intra-TAD loops, with the smaller enclosed loop insulated from the larger enclosing loop. Domain predictions from other mouse tissues support the conclusion that Enh1 and Enh2 are in separate domains and thus only Enh1 would be predicted to interact with the Nox4 promoter (Figure S7F; [26, 51]). Generally, the cis regulatory elements relevant for the regulation of Nox4 appear to be contained within the smaller intra-TAD loop. It is less clear what regulatory function Enh2 plays, as it does not interact with either Nox4 or the downstream inactive gene Tyr (FPKM < 0.01 in both sexes).
DISCUSSION
This work represents the first study of sex differences in autosomal 3D genome organization. We show that more than 1,000 CTCF or cohesin peaks are occupied at different levels between male and female mouse liver. While cohesin-non-CTCF (CNC) sites are closer to enhancers than other peak classes, only a minority of any sex-biased CTCF/cohesin peaks are proximal to sex-biased genes or enhancers. We present several examples of sex differential CTCF/cohesin binding putatively linked to sex-biased genes that are proximal (male-biased: Cml5, Nat8; female-biased: Fam84b) or distal (male-biased: C8a, C8b; female-biased Slc22a29). Using publicly available data, we show that these distally regulated male-biased genes are more sensitive to cohesin depletion, providing additional evidence that cohesin is directly or indirectly involved in linking distal sex-biased enhancers to proximal gene targets. To directly test for differences in chromatin interactions between male and female mouse liver, we performed circularized chromosome conformation capture with sequencing (4C-seq) for 6 regions neighboring 5 highly sex-biased genes. These highlight examples of the direct contribution of female-biased CTCF binding towards female-biased interactions for A1bg (F/M=155) as well as female-biased cohesin binding facilitating proximal enhancer-promoter interactions near Gm4794 (F/M= 704). The indirect contribution of CTCF and cohesin to liver sex differences is shown for the male-biased gene C9 (M/F=3.5), which interacts more strongly with a distal male-biased enhancer, bypassing a lowly expressed gene insulated by an intra-TAD loop. Intra-TAD loops also insulate the male-biased gene Nudt7 and an upstream super-enhancer and limit Nox4 to proximal enhancer regulation by restricting access to an intronic enhancer. Taken together, chromatin interactions link distal sex-biased enhancers to genes which is guided both directly and indirectly by CTCF and cohesin looping.
Despite identifying ∼1,000 sex-biased binding sites for CTCF or cohesin, the majority are distal from enhancers or genes with known sex bias. Although these distal sites lack any obvious link to sex-biased gene expression in wild type liver, they could have a priming effect and contribute to sex-specific responses to hepatic stressors, such as high fat diet [59] or xenobiotic exposure [60]. Just as short-term feeding of a high fat diet can leave a lasting “epigenetic memory” in the form of epigenetic modifications [61], sex-biased CTCF/cohesin binding sites may prime each sex for differential looping patterns in response to subsequent chemical exposure or dietary stressors. It should also be noted that our datasets likely capture only a subset of sex-biased CTCF/cohesin binding sites for adult mice at 8 weeks of age. Liver sex differences in DNA methylation increase with age [50, 62], and the binding of CTCF is highly sensitive to DNA methylation within its motif [48, 49, 63]. The hypomethylation of enhancers in male liver (lower methylation in males than females; [50]) could open up additional CTCF sites, an effect that would presumably be more pronounced in older mice. At 8 weeks of age, we found that the majority of male-biased CTCF binding is at Lone CTCF sites, which tend to be closer to both enhancers and TSS than CAC sites (Figure 2B,C).
There is substantial evidence that the majority of insulated, CAC-mediated loops are invariant across most tissues and cell types studied [30]. A significant minority of these loops are tissue-specific, typically on the order of 20-30% [32]. Using a computational method for loop prediction that we recently described [32], we do not find substantial evidence for the presence of sex-specific insulated intra-TAD loops. A cell-type-specific intra-TAD loop was described in erythroid cells, despite no obvious differences in CTCF or cohesin binding from embryonic cells [64]. If this phenomenon is widespread, then some, and perhaps many sex-biased intra-TAD loops could be formed by sex-independent CTCF/cohesin anchors via an unknown mechanism. In the absence of global assessment of all sex-biased looping (i.e. using Hi-C), we conclude that intra-TAD loops are primarily sex-independent features that contribute indirectly to sex-biased gene expression.
Cohesin-and-CTCF sites may directly link enhancers and promoters, in addition to their well-characterized role in domain-level organization [35, 42]. This finding suggests that in addition to the largely invariant structural features of TADs and insulated intra-TAD loops, such “non-anchor” CAC sites may contribute to tissue-specific enhancer–promoter or enhancer–enhancer interactions. As the majority of sex-differential CTCF/cohesin binding occurs at non-anchor CAC sites, long distance CAC-mediated interactions may link sex-biased enhancers to promoters of sex-biased genes. Specific examples include the enhancer–enhancer contacts observed for A1bg (in the context of female-biased CTCF binding; Figure 4A), and may also characterize the distal female-biased enhancer neighboring Slc22a29 (female-biased CTCF and cohesin binding at a robust female-biased enhancer; Figure 3D).
The interacts of the promoter of C9 with a male-biased enhancer >200 kb upstream is an interesting example of CAC-looping indirectly guiding enhancer–promoter contact. An intervening intra-TAD loop insulates the sex independent (and weakly-expressed) gene Dab2 gene, while bringing C9 and a cluster of male-biased enhancers in closer proximity in both male and female liver. In the absence of any sex-biased CTCF or cohesin binding to explain the increased contact in male liver, it is probable that other known looping factors augment direct enhancer–promoter interactions specifically in male liver. Mechanistically, this could be driven by protein (such as YY1[53] or mediator[25]) or non-protein factors such as enhancer RNAs (eRNAs) [65, 66]. Although data is only available for male mouse liver, 3 of the 4 distal upstream enhancers are actively transcribed to produce bidirectional eRNA transcripts [67] and 4 of 4 are bound by the protein YY1 [68].
Sex differences in mouse and human liver are linked to the sex-specific pattern of growth hormone secretion by the pituitary gland [69–71]. While we have demonstrated the presence of proximal and distal sex-biased looping in mouse liver, it is not clear how dynamic these loops are in response to factors that dysregulate sex-biased gene expression following alterations in hormone patterns [72] or xenobiotic exposure [60]. In an extreme example of dynamic looping, complete loss of cohesin followed by its reintroduction (achieved by auxin-induced targeted degradation and withdrawal) establishes that virtually all loops are lost after 40 min then regained within an hour of auxin withdrawal [22]. During macrophage differentiation, extensive changes are observed to gene expression, the enhancer landscape, and chromatin interactions linking them [73]; however IMR90 cells show extensive changes in enhancer activity and gene expression in response to TNFα treatment yet virtually no differences in DNA looping are observed [74]. Studies on circadian effects in mouse liver have shown selective interactions between circadian oscillating genes and oscillating enhancers, but did not directly show a related rhythmicity of 3D interactions (either no observed differences [75], or the comparison was not made [76]). Based on the sensitivity of distally-regulated genes to cohesin depletion, we predict that looping from distal sex-biased enhancers would be more responsive to alterations of the plasma growth hormone pattern. Alternatively, the majority of loops might not be directly responsive to changes in growth hormone stimulation, as in treated IMR90 cells [74], or that sex differences in 3D interactions are a result sex hormones or other factors.
Chromosome conformation capture technologies have been extended beyond the interaction analysis to study genomic rearrangements and copy number variation [77]. Here we also show that 3C-based assays may also be a useful tool to determine candidate lncRNAs for further analysis. The highly female-biased gene A1bg is nearby many lncRNAs, however, we observed that a subset of interacting lncRNAs was consistently female-biased across different studies, experiment types, and mouse strains (Figure S5C,D) — suggestive they may play a functional role. Currently, enhancer-associated lncRNAs are defined as those transcribed from enhancers and show interesting properties such as increased enhancer activity, tissue-specificity, and association with expression of nearby protein coding genes [78, 79]. Our 4C-seq results for the A1bg enhancer suggest that this model could be extended to enhancer-interacting lncRNAs as an approach to characterize or screen for candidate functional lncRNAs. This would likely best be accomplished by implementation of a more global interaction-based assay, such as Hi-C.
The vast majority of sex-biased CTCF and/or cohesin binding is intergenic and distal to TSS and sex-biased genes (Figure 2C,F). While most sex-biased CNC peaks overlap enhancers (median distance ∼0.2 kb; Figure 2D), only ∼25% of sex-biased CNC peaks are within 1 kb of a sex-biased enhancer (Figure 2G). Overall, we estimate that less than half of sex-biased CTCF/cohesin peaks contribute to sex-biased gene expression, as they are not at least within the same TAD as a sex-biased gene (Figure 2F). However, this does not mean that 3D organization in general does not contribute to the extensive transcriptomic and epigenetic sex bias in mouse liver. Previously described intra-TAD loops may indirectly facilitate male-biased chromatin interactions for genes like C9 and insulate the highly active and super-enhancer regulated Nudt7, all despite lacking sex-biased CTCF or cohesin binding. Similarly, the nested intra-TAD loops insulating Nox4 prevent distal enhancer-promoter interaction with an intronic enhancer in favor of more proximal regulation by a cluster of upstream enhancers. Highly female-biased genes, such as A1bg and Gm4794, show female-biased CTCF and cohesin binding, respectively, as well as female-biased chromatin interactions at distances ranging from 10 to 100 kb. Thus, CTCF and cohesin can contribute in both direct and indirect ways to guide sex-biased enhancer–promoter contacts in mouse liver.
MATERIALS AND METHODS
Animal Work and Tissue Processing
Adult CD-1 mice were purchased from Charles River Laboratories [cat#:Crl:CD1(ICR)] and sexes were housed separately in the Boston University Laboratory Animal Care Facility. Animals were treated and housed until 8 weeks of age using protocols specifically reviewed for ethics and approved by Boston University’s Institutional Animal Care and Use Committee (IACUC). Mice were then euthanized via cervical dislocation and their livers removed and rinsed in ice cold Phosphate Buffered Saline (PBS). Livers were homogenized in a Potter-Elvehjem homogenizer in high sucrose homogenization buffer (10 mM HEPES (pH 7.5), 25 mM KCl, 1 mM EDTA, 2 M sucrose, 10% glycerol, 0.05 mM DTT, 1 mM PMSF, 0.15 mM spermine, 0.2% (v/v) spermidine, 1 mM Na orthovanadate, 10 mM NaF, and 1X Roche Complete Protease Inhibitor Cocktail) to preserve nuclei structure. This suspension was layered on top of a 3 ml cushion of homogenization buffer and ultracentrifuged at 25,000 RPM for 30 min at 4°C in an SW41 Ti rotor to obtain clean nuclei in a loose pellet. The supernatant was removed, and the pelleted nuclei were resuspended in 1 ml of crosslinking buffer (10 mM HEPES buffer (pH 7.6), 25 mM KCl, 0.15 mM 2-mercaptoethanol, 0.34 M sucrose, 2 mM MgCl2). This nuclear suspension was transferred to a 1.5 ml Eppendorf tube at 30°C, formaldehyde was added to a final concentration of 0.8% (v/v), and incubated with occasional mixing for 9 min at 30°C. Crosslinking was halted by the addition of 110 µl of 1 M glycine, followed by a 5 min incubation at room temperature. The crosslinked suspension was then layered atop homogenization buffer and pelleted, as above, prior to proceeding with sonication.
Pelleted samples were resuspended in 1 ml of 4°C 1X Radioimmunoprecipitation assay (RIPA) buffer (50 mM Tris-HCl, pH 8.0, 150 mM NaCl, 1% IPEGAL, 0.5% deoxycholic acid, and 1X Roche Complete Protease Inhibitor Cocktail) containing 0.5% SDS by gentle pipetting. This material was then transferred to a 15 ml polystyrene tubes (BD Falcon, cat#:352095) and sonicated according to the manufacturer’s instructions with a Bioruptor Twin instrument (UCD-400). This consisted of 75 cycles at high intensity setting in a 4°C water bath, where 1 cycle is ON for 30 s and OFF for 30 s. The sonicated chromatin was spun at 18,000 x g for 10 min at 4°C to clear any debris, and the supernatant was snap frozen in liquid nitrogen. The bulk of this material was stored at −80°C prior to ChIP, while a small aliquot was reverse crosslinked to quantify the amount of sonicated chromatin and the fragment size distribution. This aliquot was adjusted to a final concentration of 0.2M NaCl and crosslinks were reversed for 6 h at 65°C in a thermocycler with heated lid. This sample was dilute 3-fold in water, then digested with RNase A (Novagen, cat#:70856) for 30 min at 37°C followed by proteinase K (Bioline, cat#: BIO-37084) for 2 h at 56°C. Fragment size distribution for the resulting DNA was assessed by 1% agarose gel electrophoresis to ensure the bulk of material was < 500 bp in size. Chromatin quantification was performed using a Quanti-iT PicoGreen assay (Invitrogen, cat#:Q33130), back calculating to determine the undiluted concentration of the bulk chromatinized DNA stored for downstream ChIP experiments.
Chromatin Immunoprecipitation and Sequencing (ChIP-seq) Experimental Protocol
All male ChIP samples in this study are from a previous work [32], deposited at GSE102997. Male and female biological replicates are from individual mouse livers with 8 total CTCF ChIP-seq samples per sex (n=4 males and n=4 females) and 6 total Rad21 ChIP-seq samples (n=3 males and n=3 females). All samples were processed using the same protocol and conditions throughout, as follows.
Immunoprecipitation of chromatin from mouse liver was performed as described previously [13]. Specifically, 5 µl of antibody to either CTCF (Millipore, cat#:07-729) or to the cohesin subunit Rad21 (Abcam, cat#:ab992) were mixed with 30 µl of Protein A Dynabeads (Invitrogen, cat#:1002D) and incubated in blocking solution (0.5% bovine serum albumin in PBS) for 3 h at 4°C. Beads were then washed with blocking solution, followed by incubation overnight with 70 µg of liver chromatin in 1X RIPA (containing 0.1% SDS). After washing with 1X RIPA (containing 0.1% SDS), formaldehyde crosslinks were reversed by incubation for 6 h at 65°C followed by RNase A digestion (Novagen, cat#:70856) at 37°C for 30 min then proteinase K digestion (Bioline, cat#:37084) at 56 °C for 2 h. The resulting crude DNA extract was purified using the QIAquick Gel Extraction Kit (Qiagen, cat#:28706) and quantified with the Qubit HS DNA kit (Invitrogen, cat#:Q32854).
Circularized Chromatin Conformation Capture with Sequencing (4C-seq) Experimental Protocol
Nuclei isolation and crosslinking were performed as above in tissue processing, through pelleting of crosslinked nuclei. Digestion, proximity ligation, and inverse PCR were carried out as described previously [32]. Pelleted nuclei were resuspended in Buffer A (15 mM Tris-HCl pH 8.0, 15 mM NaCl, 60 mM KCl, 1 mM EDTA pH 8.0, 0.5 mM EGTA pH 8.0, 0.5 mM spermidine, 0.3 mM spermine) and quantified using the Countess Automated Cell Counter (Invitrogen, cat#:C10227). Nuclei were aliquoted so that 10 million nuclei were used per individual 4C experiment, pelleted at 1,000 x g for 5 min at 4°C, and resuspended in 1X NEBuffer 3 (100 mM NaCl, 50 mM Tris-HCl, 10 mM MgCl2, 1 mM DTT, pH 7.9; NEB, cat#:B7003S). Primary restriction enzyme digestion of intact nuclei was conducted with 50,000 U of DpnII (NEB: #R0543) overnight at 37°C with agitation at 900 RPM. DpnII was inactivated with a final concentration of 2% SDS, then diluted 5-fold in 1X ligation buffer (Enzymatics, cat#:B6030). Proximity ligation was performed overnight at 16°C with 200U of T4 DNA ligase (Enzymatics, cat#:L6030). At this point, DNA was reverse crosslinked and purified using a standard phenol/chloroform cleanup according to the manufacturer’s protocol (VWR, cat#:0883). Secondary digestion of purified DNA was performed with 50 U of Csp6I (Thermo Scientific, cat#:ER0211) overnight at 37°C in 500 µl of 1X Buffer B (Fermentas, cat#BB5; 10 mM Tris-HCl (pH 7.5), 10 mM MgCl2, 0.1 mg/ml BSA). To halt enzymatic activity, Csp6I was heat inactivated for 30 min at 65°C. Samples were diluted 10-fold and secondary ligation 10-fold and secondary ligation was carried out as above, overnight at 16°C. Primary digestion, proximity ligation, secondary digestion, and secondary ligation were each independently verified by reverse crosslinking and gel electrophoresis as described in the sonication section of “Animal Work and Tissue Processing,” above. An example gel image is shown in Figure S5A. The final PCR template was purified by phenol/chloroform clean up, followed by QiaPrep 2.0 column cleanup (Qiagen, cat#:27115). This resulted in a standard, circularized 4C inverse PCR template amplified by specific viewpoint primers, below.
PCR reactions were performed using primer pair sequences for valid 4C-seq viewpoints (VPs) from the 4CSeqpipe primer database (http://www.wisdom.weizmann.ac.il/~atanay/4cseq_pipe/) with the addition of 5’ dangling truncated Illumina adapters (Table S3A). Candidate viewpoints were selected based on the following criteria. First, the VP was required to be within the same TAD as a sex-biased protein-coding or lncRNA gene with >3-fold M/F bias in expression. Second, the VP must either fall within 1 kb of the TSS sex biased gene or else overlap a sex-biased enhancer (minimum 2-fold M/F bias in DHS opening or K27ac mark accumulation). Third, the nonreading primer was required to uniquely map to the genome, while the reading primer was more stringently required to have > 89% unique sequence identity (i.e., no 18-mer within the primer mapped elsewhere in the genome). Amplification of 4C templates was performed with Platinum Taq DNA polymerase (Invitrogen, cat#:10966026) under the following conditions: 94°C for 2 min, 25 cycles at (94°C 30 s, 55°C 30 s, 72°C 3 min), then 4°C hold. To limit the effect of PCR artifacts, a total of 6 identical inverse PCR reactions per sample were processed in parallel and then pooled prior to library preparation. For Nox4 M1, M2, F1, and F2 samples, libraries were prepared independently for singular inverse PCR reactions which were then pooled at the raw read level (equivalent of a pool of 2 identical PCR reactions). Example pooled 4C libraries are shown in Figure S5B.
4C-seq samples were amplified using NEB barcoded primers such that each biological replicate had a unique barcode (NEB, cat#:E7335). To prevent overamplification, each sample was amplified with 5 additional cycles of PCR to fill in the full Illumina adapter sequence necessary for sequencing. Due to the low complexity in the first 20 bases of the sequence read, 4C libraries were multiplexed and sequenced with additional high complexity libraries (i.e. RNA or ChIP-seq). In practice, 4C libraries constituted no more than 15% of the total library pool, by molarity. Samples were sequenced on an Illumina HiSeq 2500 with either 50, 125, or 150 bp paired end read lengths at the New York Genome Center (50 and 125 bp reads) and Novogene (150 bp reads).
Computational Analysis of ChIP-seq Datasets
Sequence reads were split by barcode and mapped to the mm9 genome assembly with Bowtie2 (v2.2.9). All reads not uniquely mapped to the genome were removed for downstream analyses. Peak calling was performed using MACS2 (v2.1.1) with default parameters, then peaks overlapping blacklisted genomic regions (www.sites.google.com/site/anshulkundaje/projects/blacklists) were filtered out. Additionally, we removed spurious peaks that contain only PCR duplicated reads, defined as 5 or more identical sequence reads that do not overlap any other reads. All BigWig tracks for visualization in a genome browser were normalized for sequencing depth and sample quality, expressed as reads in peaks per million mapped reads (RIPM). In practice, the y-axis shows the read count from a given sample divided by the total number of reads in peaks (reads that overlap a called peak in any sample, or the union peak list), per million. Normalization was performed separately for CTCF and cohesin datasets. This approach is functionally similar to the common quality control metric known as Fraction of Reads in Peaks (FRiP) used by the ENCODE consortium and others [80]. All samples used in this study were of good quality with a minimum FRiP of 0.1034 (male Rad21 ChIP-seq; G133_M6) with a full listing in Table S3B.
Differential peaks were identified as described previously with some modifications [REF_andy]. DiffReps (v1.55.4) was used to identify in-peak differential sites with default parameters and a window size of 200 bp. This output was then filtered to require that differential sites overlap a peak from the per-factor union peak list, contain a minimum of 10 sequence reads, show a minimum 2-fold sex difference, and have an FDR < 0.05. Distance to features and overlap analysis were conducted using Bedtools (v2.26.0). TAD and Intra-TAD coordinates for mm9 were downloaded from reference [32], where TAD definitions are based on prior experimental Hi-C analysis in male mouse liver [34]. For browser visualization, a secondary “lenient” sex-biased peak list was also defined as follows. Biological replicates were merged by sex and factor (M CTCF, M cohesin, F CTCF, and F cohesin), then the merged samples with a higher FRiP (compared to the same factor, but opposite sex) was downsampled so that the mapped read files for each sex contained the same number of reads in peaks. In practice, this meant that the mapped reads from merged female cohesin and merged male CTCF were downsampled by a factor of 0.979869 and 0.745955, respectively. MAnorm was then used to compare single replicates from each sex to define the “lenient” sex-differential binding for a given factor (minimum 2-fold sex bias; p adj < 0.01; read count > 15 for the upregulated peak; [81]). The strict and lenient lists are mutually exclusive and do not overlap.
Differential peaks were then further subdivided into 4 groups [CAC(ΔCoh), CAC(ΔCTCF), CNC(ΔCoh), or Lone CTCF(ΔCTCF)] based on the following criteria, separately for male and females. Differential cohesin peaks are considered CAC if they overlap peaks in 3 or 4 of the CTCF ChIP-seq biological replicates (out of a total of 4). Alternatively, they are considered CNC if they overlap peaks in 0 or 1 of the CTCF-seq biological replicates (out of a total of 4; those overlapping 2 CTCF replicates were removed). Similarly, differential CTCF peaks are considered CAC if they overlap peaks in 2 or 3 of the cohesin ChIP-seq biological replicates (out of a total of 3). Differential CTCF peaks are considered “Lone CTCF” if they overlap peaks in 0 or 1 of the cohesin ChIP-seq biological replicates (out of a total of 3). Control “sex-independent peaks” for CTCF and cohesin were defined by ranking each peak according to the ratio of the RIPM normalized ChIP signal for the merged male over the RIPM normalized ChIP signal for the merged female sample (separately for CTCF and cohesin). The 1,000 peaks whose ratio was closest to 1 were defined as “sex-independent” CTCF or cohesin peaks.
Enhancer DHS were defined by a high ratio of H3K4me1 to H3K4me3 ChIP signal as defined previously [32]. CTCF motif discovery was performed using the FIMO option from MEME Suite (v4.10.0), and presence of a motif is defined as a motif score > 10.
Intra-TAD loops for female samples were identified exactly as described previously for male liver [32]. The pipeline was run with default parameters for all CAC sites for an initial loop count of 20,000, again identical to previous parameters for male liver [32]. Overall, we identified 9,724 loops in female (compared to 9,052 in male) with 10,273 loop anchors (compared to 9,052 in male). Considering the lack of sex-differential CAC at TAD or intra-TAD anchors we did considered the majority of these to be the result of technical noise. Of the ∼10% of loops that were unique to males or females they tended to be weaker than shared loops, narrowly meeting the cutoff in one sex but not the other. Reciprocal overlap between loops was performed using bedtools (bedtools intersect –wa –u –r –f 0.8) as described previously [32].
Computational Analysis of 4C-seq Datasets
Biological replicates were first demultiplexed by index read barcode. As fastq files for each biological replicate contained multiple viewpoints, reads were further split based on matches to the reading primer for each viewpoint (Table S3A). Then, 20 nts from the 5’ end of reads were removed (representing the reading primer) using the FASTX-Toolkit (v0.0.14) prior to mapping. For consistency, 25 nts were also trimmed from the 3’ end of 150 nt reads so that they were identical in length to the 125 nt libraries (after 5’ and 3’ trimming, both were 105 bases in length). Reads were then iteratively mapped to a reduced mouse genome using Bowtie2 (v2.2.2). Rather than mapping to the full genome, this reduced genome contained only 105 bp upstream and 105 bp downstream from any DpnII cut site in the genome (GATC). An iterative mapping strategy was used because some reads contain multiple ligation junctions, rendering the full-length read unmappable as described in some Hi-C pipelines [82]. Any non-uniquely mapped reads were trimmed by ∼10% of their total length (starting from the 3’ end of the read), and mapping was reattempted as above. This process was repeated to a minimum read length of 20 nt, with a stepsize of 2 bp (for shorter reads; 50 nt) or 10 bp (for longer reads; 125 or 150 nt) per iteration. The net effect of trimming and remapping increased the overall percent of uniquely mapping reads by 2-6% for shorter read lengths and 8-22% for longer read length libraries.
After mapping bedGraph files were smoothed using a sliding window of 11 restriction fragments taking the median value in the window. Smoothed BigWig tracks were normalized by total reads per million to account for differences in sequencing depth. Merged replicates shown in the main Figure panels were generated by taking the median signal at a given restriction fragment per viewpoint and sex. The window overlapping the viewpoint fragment was removed prior to BigWig generation for merged replicates. Tracks were visualized in the WashU genome browser with replicates overlaid using the Matplot feature. Table S3C shows the sequences used for demultiplexing 4C libraries as well as basic statistics for read mapping and quality control.
Analysis of Publicly Available Datasets
Plots showing the impact of cohesin depletion on expression of protein coding and lncRNA genes were based on data from [21], using RNA-seq datasets from n=4 (each) male mouse livers for WT and Nipbl-depleted replicates. Using RPKM normalized data, reads are expressed as expression relative to the mean of the WT group (set as 1). This was accomplished on a per gene basis by dividing the expression value for each individual replicate by the mean of the WT group plus a small psuedocount to avoid dividing by zero (1e-6). This data is presented as mean relative expression ± SD for all plots. The underlying values for all genes are shown in Table S2A.
The following datasets did not originate from this study but were used to generate screenshots: H3K27ac ChIP-seq data (Lau Corona et. al., in Review), Dnase-seq [12], male CTCF/cohesin [32], sex-biased lncRNAs [83], and M/F expression ratio [72]. Unless otherwise noted, FPKM and M/F expression ratios for protein coding genes are based on Ribo-depleted total RNA while lncRNA values are based on Ribo-depleted nuclear RNA. Bigwigs were RIPM normalized separately per experiment type as described above (“Computational Analysis of ChIP-seq Datasets”). Below DHS and H3K27ac BigWig signal tracks the regions with significant sex bias are indicated. Sex-biased DHS and H3K27ac-marked regions were split into groups on the principal of defining “strict” and “lenient” subclasses of sex-biased binding. In practice, DHS are shaded from dark to light for high, standard, and low stringency male-(blue) and female-biased (pink) DHS, as defined previously [12]. The low and high stringency bear the greatest resemblance to the “lenient” and “strict” definitions used here. For the H3K27ac sex-biased tracks, four total groups were defined based on the magnitude of the FC difference between male and female samples (M/F or F/M > 2.5 defined as “strict”; 1.5 < M/F or F/M < 2.5 defined as “lenient”). Both “strict” and “lenient” sex-biased H3K27ac peaks were defined using MAnorm (p adj < 0.05; read count > 15 for the upregulated peak; [81]). These cutoffs resulted in 1,583 female-biased H3K27ac peaks (380 strict, 1203 lenient) as well as 2,241 male-biased H3K27ac peaks (604 strict, 1637 lenient).
Pan-cancer human expression data was downloaded for Nox4 using the web software TIMER [84]. Only tissues with matched “Normal” and “Tumor” expression values are shown with otherwise default parameters. Nox4 expression for male C3H mouse liver and neoplasms was downloaded as processed, normalized RNA-seq counts from a previous study (E-MTAB-6972; [58]).
Basic Statistical Tests and Approaches
Boxplots, cumulative distribution plots, and statistical analyses were performed using GraphPad Prism 7. Unless otherwise indicated, all pairwise comparisons used two-tailed, nonparametric t tests. Comparisons of distributions were performed using a Kolmogorov–Smirnov (KS) test, while comparisons of values used a Mann-Whitney (MW) test. The results are annotated where **** indicates p ≤ 0.0001, *** p ≤ 0.001, ** p ≤ 0.01, * p ≤ 0.05, and ns p > 0.05.
FUNDING
This work was supported by National Institutes of Health grants [DK33765, ES024421 to D.J.W]; and by the National Science Foundation predoctoral fellowship [DGE-1247312 to B.J.M].
CONFLICT OF INTEREST
The authors declare that they have no competing interests.
Supplemental Figures
Figure S1: Additional features of sex-biased differential CTCF/cohesin peaks
Figure S2: Comparison of sex-biased CTCF/cohesin peaks
Figure S3: Tissue conservation features of sex-differential peaks
Figure S4: Additional C8a/C8b screenshot and cohesin depletion effects
Figure S5: Quality control of 4C-seq Samples and A1bg lncRNA analysis
Figure S6: Additional 4C-seq screenshots with replicates
Figure S7: High expression of Nox4 in HCC (both human and mouse model) and other Nox4 screenshots
ACKNOWLEDGEMENTS
We would like to thank Aram Shin for optimizing Rad21 and CTCF ChIP-seq experiments in this lab. Also, thanks to Tisha Melia and Andy Rampersaud for developing the pipelines used in ChIP-seq and RNA-seq analysis. Further we’d like to thank Nicholas O’Neill for useful comments and improvements to the 4C-seq pipeline and mapping strategy.
References
- 1.↵
- 2.↵
- 3.↵
- 4.↵
- 5.↵
- 6.↵
- 7.↵
- 8.↵
- 9.↵
- 10.↵
- 11.↵
- 12.↵
- 13.↵
- 14.↵
- 15.↵
- 16.↵
- 17.↵
- 18.↵
- 19.↵
- 20.↵
- 21.↵
- 22.↵
- 23.↵
- 24.↵
- 25.↵
- 26.↵
- 27.↵
- 28.↵
- 29.↵
- 30.↵
- 31.↵
- 32.↵
- 33.↵
- 34.↵
- 35.↵
- 36.↵
- 37.↵
- 38.↵
- 39.↵
- 40.↵
- 41.↵
- 42.↵
- 43.↵
- 44.↵
- 45.↵
- 46.↵
- 47.↵
- 48.↵
- 49.↵
- 50.↵
- 51.↵
- 52.↵
- 53.↵
- 54.↵
- 55.↵
- 56.↵
- 57.↵
- 58.↵
- 59.↵
- 60.↵
- 61.↵
- 62.↵
- 63.↵
- 64.↵
- 65.↵
- 66.↵
- 67.↵
- 68.↵
- 69.↵
- 70.
- 71.↵
- 72.↵
- 73.↵
- 74.↵
- 75.↵
- 76.↵
- 77.↵
- 78.↵
- 79.↵
- 80.↵
- 81.↵
- 82.↵
- 83.↵
- 84.↵