ABSTRACT
Temporal data on gene expression and context-specific open chromatin states can improve identification of key transcription factors (TFs) and the gene regulatory networks (GRNs) controlling cellular differentiation. However, their integration remains challenging. Here, we delineate a general approach for data-driven and unbiased identification of key TFs and dynamic GRNs, called EPIC-DREM. We generated time-series transcriptomic and epigenomic profiles during differentiation of mouse multipotent bone marrow stromal cells (MSCs) towards adipocytes and osteoblasts. Using our novel approach we constructed time-resolved GRNs for both lineages. To prioritize the identified shared regulators, we mapped dynamic superenhancers in both lineages and associated them to target genes with correlated expression profiles. We identified aryl hydrocarbon receptor (AHR) as a mesenchymal key TF controlled by a dynamic cluster of MSC-specific superenhancers that become repressed in both lineages. AHR represses differentiation-induced genes such as Notch3 and we propose AHR to function as a guardian of mesenchymal multipotency.
INTRODUCTION
Understanding the gene regulatory interactions underlying cell differentiation and identity has become increasingly important, especially in regenerative medicine. Efficient and specific reprogramming of cells towards desired differentiated cell types relies on understanding of the cell type-specific regulators and their targets (Rackham et al., 2016). Similarly, knowledge of the regulatory wiring in the intermediate stages might allow controlled partial dedifferentiation, and thereby endogenous regeneration, also in mammals (Aguirre et al., 2014).
Great progress has been made in reconstruction of GRNs for various cell types in recent years. While successful, many of the approaches derive their regulatory interactions from existing literature and databases, which may be limiting as the majority of enhancers harboring transcription factor (TF) binding sites are cell type-specific (Consortium et al., 2014). Thus, the regulatory interactions derived from existing databases and literature might be misleading and are likely to miss important interactions that have not been observed in other cell types. Therefore, context-specific expression data have been used to overcome such biases and allow a data-driven network reconstruction (Janky et al., 2014). In addition, other approaches taking advantage of time-series data, such as Dynamic Regulatory Events Miner (DREM) (Schulz et al., 2012), have been developed to allow hierarchical identification of the regulatory interactions. However, while time-series epigenomic data has been used in different studies to derive time point-specific GRNs (Ramirez et al., 2017) (Goode et al., 2016), systematic approaches that integrate the different types of data in an intuitive and automated way are missing.
The central key genes of biological networks under multi-way regulation by many TFs and signaling pathways were recently shown to be enriched for disease genes and are often controlled through so-called super-enhancers (SEs), large regulatory regions characterized by broad signals for enhancer marks like H3 lysine 27 acetylation (H3K27ac) (Galhardo et al., 2015) (Hnisz et al., 2013) (Parker et al., 2013) (Siersbaek et al., 2014). Hundreds of SEs can be identified per cell type, many of which are cell type- or lineage-specific and usually control genes that are important for the identity of the given cell type or condition. Thus, SE mapping can facilitate unbiased identification of novel key genes.
An example of lineage specification events with biomedical relevance is the differentiation of multipotent bone marrow stromal progenitor cells (MSCs) towards two mesenchymal cell types: osteoblasts and bone marrow adipocytes. Due to their shared progenitor cells, there is a reciprocal balance in the relationship between osteoblasts and bone marrow adipocytes. Proper osteoblast differentiation and maturation towards osteocytes is important in bone fracture healing and osteoporosis and osteoblast secreted hormones like osteocalcin can influence insulin resistance (Lee et al., 2007) (Silva & Kousteni, 2012). At the same time bone marrow adipocytes, that occupy as much as 70% of the human bone marrow (Fazeli et al., 2013), are a major source of hormones promoting metabolic health, including insulin sensitivity (Cawthorn et al., 2014). Moreover, increased commitment of the MSCs towards the adipogenic lineage upon obesity and aging was recently shown to inhibit both bone healing and the hematopoietic niche (Ambrosi et al., 2017).
Extensive temporal epigenomic analysis of osteoblastogenesis has recently been reported (Wu et al., 2017). However, a parallel investigation of two lineages originating from the same progenitor cells can help to understand both the lineage-specific and the shared regulators important for their (de)differentiation. To identify shared regulators of adipocyte and osteoblast commitment and to delineate a general approach for systematic unbiased identification of key regulators, we performed time-series epigenomic and transcriptomic profiling at 6 different time points over 15 days of differentiation of MSCs towards both lineages. We combined segmentation-based TF binding predictions from time point-specific active enhancer data (Schmidt et al., 2017) with probabilistic modeling of temporal gene expression data (Schulz et al., 2012) to derive dynamic GRNs for both lineages. By merging overlapping SEs identified using H3K27ac signal from different time points we obtained dynamic profiles of SE activity across the two differentiations and used these dynamic SEs to prioritize the key regulators identified through the network reconstruction. With this approach, we identify aryl hydrocarbon receptor (AHR) as a central regulator of multipotent MSCs under dynamic control from 4 adjacent SEs that become repressed in the differentiated cells. The AHR repression allows upregulation of many adipocyte- and osteoblast-specific genes, including Notch3, a conserved developmental regulator.
RESULTS
A subset of differentially expressed genes are shared between adipocyte and osteoblast differentiation
To identify shared regulators of MSC differentiation towards adipocytes and osteoblasts, and to delineate a general approach for a systematic unbiased identification of key regulators, we performed time-series ChIP-seq and RNA-seq profiling at 6 different time points over 15 days of differentiation of mouse ST2 MSCs (Figure 1A). Using ChIP-seq, genome-wide profiles of three different histone modifications, indicating active transcription start sites (TSS) (H3K4me3), active enhancers (H3K27ac), and on-going transcription (H3K36me3) were generated.
These data were complemented by corresponding time-series RNA-seq analysis. Importantly, at genome-wide level all the histone modifications showed good correlation with the RNA-seq data across the time points (Pearson correlation coefficients of approximately 0.5), further arguing for the reproducibility of the obtained results (Becker et al. in preparation). The successful differentiations were confirmed by induced expression of known lineage-specific marker genes and microscopic inspection of cellular morphology and stainings (Supplementary Figure S1). Interestingly, profiles of the adipogenic marker genes resembled those reported for the yellow adipose tissue (YAT) found in the bone marrow, rather than classic white adipose tissue (WAT) (Scheller et al., 2016), consistent with ST2 cells originating from bone marrow stroma (Figure S1A). Moreover, the expression profiles of Sp7 and Runx2 were consistent with those previously reported for mouse osteoblasts (Yoshida et al., 2012).
Principal component analysis of the obtained transcriptome profiles confirmed the specification of the cells towards two different lineages with differential temporal dynamics (Figure 1B). Osteoblastogenesis was accompanied by gradual and consistent progression towards a more differentiated cell type while adipogenesis showed more complex dynamics with a big transcriptome shift after one day of differentiation, followed by a more gradual progression during the following days. This is in keeping with the change in the composition of the differentiation medium from day 2 onwards (see Methods for details). In total, the adipocyte differentiation was characterized by 5156 significantly differentially expressed genes (log2FC≥1, FDR<0.05) compared to the MSCs (Figure 1C; Supplementary Table S1). During osteoblast differentiation 2072 genes were differentially expressed. 1401 of these genes were affected in both lineages. However, as illustrated by the top 100 genes with highest variance across the time points and depicted in the heatmap in Figure 1D, most genes exhibit either lineage-specific or opposing behavior between the lineages. Only a subset of genes showed similar changes in both lineages (Figure 1D), thus narrowing the list of genes that could serve as shared regulators of both differentiation or dedifferentiation processes.
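The selection of differentially expressed genes and of the genes shared between lineages can be illustrated with a minimal sketch. The gene names and statistics are invented toy values, the helper `significant_genes` is hypothetical, and the sketch assumes that the log2FC threshold applies to the absolute fold change; it is not the pipeline used in this study.

```python
def significant_genes(stats, min_abs_log2fc=1.0, max_fdr=0.05):
    """Return the genes with |log2FC| >= min_abs_log2fc and FDR < max_fdr.

    stats maps gene name -> (log2 fold change versus MSCs, FDR).
    Applying the cutoff to the absolute fold change is an assumption.
    """
    return {
        gene
        for gene, (log2fc, fdr) in stats.items()
        if abs(log2fc) >= min_abs_log2fc and fdr < max_fdr
    }


# Toy input, not real data.
adipocyte_stats = {
    "GeneA": (2.3, 0.001),
    "GeneB": (-1.5, 0.01),
    "GeneC": (0.4, 0.2),
    "GeneD": (1.2, 0.2),   # large change but not significant
}
osteoblast_stats = {
    "GeneA": (1.1, 0.03),
    "GeneB": (0.2, 0.6),
    "GeneC": (-1.8, 0.004),
}

adipocyte_de = significant_genes(adipocyte_stats)
osteoblast_de = significant_genes(osteoblast_stats)

# Genes affected in both lineages are the set intersection.
shared_de = adipocyte_de & osteoblast_de
print(sorted(shared_de))
```

Note that a gene in the intersection may still change in opposite directions in the two lineages, which is why the direction of change has to be inspected separately.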
Unbiased data-driven derivation of context-specific dynamic regulatory networks of adipocyte and osteoblast differentiation using EPIC-DREM
In order to take an unbiased and data-driven approach that can benefit from the time series profiles, we have developed a new method to predict condition-specific TF binding using footprint calling in H3K27ac data and TF motif annotation (Gusmao et al., 2016) (Roider et al., 2007) (Schmidt et al., 2017). Our approach uses a novel randomization strategy, which accurately accounts for differences in footprint lengths and GC-content bias, to assess the significance of TF binding affinity values for each condition or time point (Figure 2). These time point-specific predictions can be combined with the DREM approach (Schulz et al., 2012) to construct lineage-specific networks that are supported by epigenetics data (called EPIC-DREM, Figure 2).
We first used 78 TF ChIP-seq datasets from three ENCODE cell lines (GM12878, HepG2, and K562) to test the ability of our approach to prioritize condition-specific TF binding sites using this pipeline. Using a p-value cutoff < 0.05 we obtained accurate cell type-specific TF binding predictions with a median precision of ~70% without a major decrease in recall (Supplementary Figure S2) and, thus, used the same cut-off for our time-series differentiation data.
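For readers less familiar with the evaluation metrics, precision and recall of predicted binding sites against ChIP-seq peaks can be computed as in the following sketch. The region identifiers are placeholders and the function `precision_recall` is hypothetical; how predictions were matched to peaks in the actual benchmark is not specified here, so exact identifier matching is an assumption of the sketch.

```python
def precision_recall(predicted, observed):
    """Compute precision and recall for two collections of region identifiers.

    predicted: regions predicted to be bound by a TF
    observed:  regions supported by a ChIP-seq peak (treated as ground truth)
    """
    predicted, observed = set(predicted), set(observed)
    true_positives = len(predicted & observed)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(observed) if observed else 0.0
    return precision, recall


# Placeholder identifiers, not real data.
predicted_sites = {"r1", "r2", "r3", "r4"}
chip_peaks = {"r2", "r3", "r4", "r5", "r6"}

precision, recall = precision_recall(predicted_sites, chip_peaks)
print(precision, recall)
```

A stricter p-value cutoff typically removes false positives (raising precision) at the cost of true positives (lowering recall), which is the trade-off assessed in Supplementary Figure S2.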
We applied EPIC-DREM to the analysis for each timepoint of the two differentiation time series. Depending on the time point and lineage we predicted 0.6 to 1.4 million footprints per time point, consistent with previous reports on the presence of approximately 1.1 million DNase-seq footprints per cell type (Neph et al., 2012). These footprints were annotated with TFs that show expression in the differentiation series and associated to genes within 50 kb to obtain the TF scores per gene and per time point (see Methods and Figure 2). The full matrix of the time point-specific TF-target gene interactions per lineage can be downloaded at (Gerard et al., 2017).
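The association of annotated footprints to genes can be sketched as follows. This is a simplified illustration, not the implementation used here: it assumes that "within 50 kb" refers to the distance between the footprint midpoint and the TSS, and that per-footprint affinities are simply summed per TF and gene. The function `tf_gene_scores` and all coordinates are hypothetical.

```python
from collections import defaultdict

WINDOW = 50_000  # bp; distance to the TSS (an assumption of this sketch)


def tf_gene_scores(footprints, tss, window=WINDOW):
    """Sum TF affinities of footprints lying within `window` bp of a gene's TSS.

    footprints: list of (chrom, start, end, tf, affinity)
    tss:        dict mapping gene -> (chrom, position)
    Returns a dict mapping (tf, gene) -> summed affinity.
    """
    scores = defaultdict(float)
    for chrom, start, end, tf, affinity in footprints:
        midpoint = (start + end) // 2
        for gene, (gene_chrom, position) in tss.items():
            if gene_chrom == chrom and abs(midpoint - position) <= window:
                scores[(tf, gene)] += affinity
    return dict(scores)


# Toy input for one time point, not real data.
footprints = [
    ("chr1", 100_000, 100_020, "TF1", 0.8),
    ("chr1", 130_000, 130_020, "TF1", 0.5),
    ("chr1", 400_000, 400_020, "TF1", 0.9),  # further than 50 kb from any TSS
    ("chr2", 100_000, 100_020, "TF2", 0.7),
]
tss = {"GeneA": ("chr1", 120_000), "GeneB": ("chr2", 90_000)}

scores = tf_gene_scores(footprints, tss)
for (tf, gene), score in sorted(scores.items()):
    print(tf, gene, round(score, 3))
```

Repeating this per time point yields one TF-by-gene matrix per time point, which is the form of input described above. The nested loop is quadratic and only suitable for illustration; a genome-scale analysis would use an interval index.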
The derived matrix of the predicted time point-specific TF-target gene interactions was combined with the time series gene expression data to serve as input for DREM to identify bifurcation events, where genes split into paths of co-expressed genes (Figure 2 and 3). Knowing the time point-specific TF-target gene interactions allows split points and paths to be associated with the key TFs regulating them. Figure 3 shows the split points and the paths of co-expressed genes identified for adipocyte (Figure 3A) and osteoblast (Figure 3B) differentiation. The total number of TFs controlling the individual paths is indicated with the top TFs per path listed (based on their fold change (FC) and split score; see Supplementary Table S2 for all TFs per split).
Inspection of the identified key TFs confirmed that many known positive (e.g. CEBPA, CEBPB, PPARG, and STAT5A/STAT5B) and negative (e.g. HES1 and GATA2) regulators of adipocyte differentiation (Farmer, 2006) are among the TFs controlling the genes in the induced paths, while becoming up- or down-regulated themselves, respectively (Figure 3A). Interestingly, while not changing in their expression, TFs like E2F4 were annotated among those with highest split score during the first day of adipogenesis, consistent with E2F4’s role as an early repressor of adipogenesis whose activity is controlled by its co-repressor p130 (Farmer, 2006). At the same time E2F7 and E2F8 seem to be associated with regulation of genes first repressed at early adipogenesis, and then upregulated during the following days (Figure 3A).
The results of the regulatory network of osteoblastogenesis confirm many known regulators such as DLX3 (Hassan et al., 2004), DDIT3 (CHOP) (Shirakawa et al., 2006), ATF4, and FOXO1 (Long, 2011), while revealing many other factors that have not been previously associated with osteoblast differentiation (Figure 3B).
The analysis identified several TFs that could play a role in both lineages. In keeping with previous findings using the same ST2 cell line, ID4 was identified by EPIC-DREM as an activator of upregulated genes during osteoblastogenesis, while in adipocytes ID4 was controlling downregulated genes, thereby favouring the osteoblast lineage (Tokuzawa et al., 2010). Among other TFs implicated in several different splits in both lineages we found for example VDR, GLIS1, ARNT2 and AHR.
Taken together, the EPIC-DREM approach can identify many known key regulators of adipocyte and osteoblast differentiation and predicts additional novel regulators and the bifurcation events they control in an entirely unbiased manner relying only on the available time series data.
Identification of dynamic SEs in adipocyte and osteoblast differentiation
While EPIC-DREM can efficiently identify many of the main regulators of the differentiation time courses, it still yields a list of tens of TFs. In order to further prioritize the identified main regulators, we hypothesized that the key TFs of the differentiation processes would be controlled by SEs with dynamic profiles. To obtain such profiles of SEs across time points, we first identified all SEs with a width of at least 10 kb separately in each of the 10 H3K27ac ChIP-seq data sets from the two time courses. Next, to allow a quantification of the SE signals across time while also accounting for the changes in the width of the SEs, we combined all SEs from all time points with at least 1 nucleotide overlap into one broader genomic region, called a merged SE. Figure 4A illustrates how the 49 SEs found at the Cxcl12 locus, producing a chemokine essential for maintenance of the hematopoietic stem cells (HSCs) (Ambrosi et al., 2017), can be combined into one exceptionally large merged SE region that enables the quantification of the SE signal across time. The normalized read counts under the merged SE region were collected and the relative SE signals normalized to D0 are shown in Figure 4B. To confirm that the obtained profile is a reasonable estimate of the SE activity and to see whether it could indeed be regulating the Cxcl12 gene expression, we compared the SE signal profiles to the Cxcl12 mRNA profiles. As shown in Figure 4B, in both lineages the SE signal closely followed the mRNA expression profile with Pearson correlation coefficients of 0.83 and 0.89, respectively. Moreover, similar correlations could be seen for most other identified merged SEs, further confirming the applicability of our approach.
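The merging of SEs across time points and the comparison of the resulting signal profile with mRNA levels can be sketched as follows. This is an illustration under stated assumptions rather than the code used in this study: coordinates are treated as 1-based and inclusive, the functions `merge_super_enhancers` and `pearson` are hypothetical helpers, and all intervals and signal values are toy numbers.

```python
from math import sqrt


def merge_super_enhancers(intervals):
    """Merge (chrom, start, end) intervals that share at least 1 nucleotide.

    Coordinates are assumed to be 1-based and inclusive, so two intervals
    overlap when the next start is <= the current end.
    """
    merged = []
    for chrom, start, end in sorted(intervals):
        if merged and merged[-1][0] == chrom and start <= merged[-1][2]:
            # Overlaps the previous merged region: extend it if needed.
            merged[-1][2] = max(merged[-1][2], end)
        else:
            merged.append([chrom, start, end])
    return [tuple(region) for region in merged]


def pearson(x, y):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


# SE calls pooled from all time points (toy coordinates).
se_calls = [
    ("chrA", 100, 200),
    ("chrA", 150, 300),
    ("chrA", 300, 350),  # shares exactly one nucleotide with the previous call
    ("chrA", 301, 400),
    ("chrB", 100, 200),
]
merged = merge_super_enhancers(se_calls)
print(merged)

# Relative signal of one merged SE and its candidate target gene (toy values).
se_signal = [1.0, 0.8, 0.5, 0.4]
mrna_level = [1.0, 0.7, 0.6, 0.3]
print(round(pearson(se_signal, mrna_level), 2))
```

Because the merged region is fixed across time points, read counts collected under it are directly comparable between time points even when the individual SE calls differ in width.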
In total, we identified 1052 merged SEs across the two lineages (Supplementary Table S3). 120 and 79 merged SEs showed a dynamic profile (log2FC≥1 in at least one time point) in adipocyte and osteoblast differentiation, respectively (Figure 4C–4E). Consistent with the different dynamics in transcriptomic changes (Figure 1), the adipocyte SEs could be divided into four separate main profiles based on their dynamics (Figure 4C; Supplementary Figure S3) while most osteoblast SEs could be assigned into one of two simple profiles that either increase or decrease in signal over time (Figure 4D).
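The criterion for a dynamic profile can be made concrete with a short sketch. It assumes that the fold change is computed relative to D0 and that the threshold applies to the absolute log2 fold change; the helper `is_dynamic`, the SE names, and the signal values are hypothetical.

```python
from math import log2


def is_dynamic(signal_by_day, baseline="D0", min_abs_log2fc=1.0):
    """Return True if any time point differs >= 2-fold from the baseline.

    signal_by_day maps a time point label to a positive normalized signal.
    Using D0 as the baseline and the absolute log2FC are assumptions.
    """
    base = signal_by_day[baseline]
    return any(
        abs(log2(value / base)) >= min_abs_log2fc
        for day, value in signal_by_day.items()
        if day != baseline
    )


# Toy normalized read counts under two merged SEs, not real data.
merged_se_signal = {
    "SE_a": {"D0": 100.0, "D1": 90.0, "D3": 45.0},    # falls below half of D0
    "SE_b": {"D0": 100.0, "D1": 110.0, "D3": 120.0},  # stays within 2-fold
}

dynamic_ses = [name for name, sig in merged_se_signal.items() if is_dynamic(sig)]
print(dynamic_ses)
```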
25 dynamic SEs showed changes in both differentiation processes (Figure 4E). Based on the proximity, presence of H3K4me3-marked TSSs and correlation with the expression profiles, these SEs could be assigned to 18 separate genes, several of which have already been implicated in MSCs or cell types derived from them (Figure 4E) (e.g. (Ambrosi et al., 2017) (Gu et al., 2007) (Cui et al., 2007)). However, only three of the genes, aryl hydrocarbon receptor (Ahr), GLIS family zinc finger 1 (Glis1), and Meningioma 1 (Mn1), encode TFs and could serve as central nodes in GRNs. Interestingly, both AHR and GLIS1 were already predicted by EPIC-DREM as regulators of multiple bifurcation events in both lineages (Figure 3), while MN1 motif is not included in the motif collection currently used by EPIC-DREM. Moreover, Ahr is associated with four separate merged SEs, more than any other TF in our analysis. Therefore, we next focused on Ahr regulation.
Ahr is controlled by multiple SEs in MSCs and repressed with lineage-specific dynamics
The four adjacent merged SEs (SE283, SE284, SE285, and SE286) at the Ahr locus together cover a continuous region over 300 kb of active enhancer signal downstream of the Ahr gene in the MSCs (Figure 5A and 5B). All four SE regions follow similar dynamics across the time points (Figure 5A–5D) although SE283 did not pass the FC cut-off in the initial analysis in Figure 4. Moreover, all four SEs showed a very high correlation (r≥0.95) with Ahr mRNA levels as measured by RNA-seq (Figure 5C–5D, upper panels) and validated by RT-qPCR (Figure 5C–5D, lower panels). In adipogenesis Ahr expression is repressed already by D1 and remains repressed throughout the differentiation. Similarly, the signal from all SE regions is already reduced after 1 day. In contrast, in osteoblasts the SE regions first show an increase in signal on D1, followed by gradual reduction from D3 onwards, consistent with the Ahr mRNA levels. Finally, in both lineages the repression is accompanied by a decreased signal of H3K36me3 in the gene body of Ahr, confirming repression at transcriptional level.
While the newly identified SE cluster at the Ahr locus is also flanked by another active gene, Snx13, it is unlikely that the SEs contribute to its regulation; during the differentiations Snx13 shows very few changes in its expression or H3K36me3 signal (Figure 5A–5B and data not shown). Moreover, inspection of Hi-C data surrounding the Ahr locus in different mouse cell types indicates that Ahr and the four SEs are located in their own topological domain (TAD) separate from the Snx13 gene (Supplementary Figure S4) (“The 3D Genome Browser,”).
Interestingly, the SEs controlling Ahr appear to be specific for the multipotent MSCs as only weak H3K27ac signal could be detected in the corresponding genomic region in mouse 3T3-L1 pre-adipocytes that are more committed towards the white adipocyte lineage (Figure 5E, (Mikkelsen et al., 2010)). Importantly, the large SE domains downstream of the Ahr gene could be identified also in human MSCs, but not in other inspected human cell types, suggesting that the complex regulation of Ahr expression in these multipotent cells could be conserved and relevant also in human development (Supplementary Figure S5).
AHR regulates mesenchymal multipotency through repression of lineage-specific genes such as Notch3
Based on the above results, we reasoned that Ahr could play an important role in maintaining MSCs in a multipotent state. Indeed, previous work has separately shown that both adipogenesis and osteoblastogenesis can be inhibited by toxic compounds like dioxin that are xenobiotic ligands of AHR (Alexander et al., 1998) (Naruse et al., 2002) (Korkalainen et al., 2009).
While AHR is best known for its role as a xenobiotic receptor, it has been recently suggested that it could also play a role in stem cell maintenance in HSCs under the control of endogenous ligands (Gasiewicz, Singh, & Bennett, 2014). We therefore tested whether the endogenous activity of AHR is important for the maintenance of the appropriate transcriptional program of the MSCs by knocking down AHR in undifferentiated cells, or those differentiated for one day towards either lineage (Figure 6A, Supplementary Figure S6). Two days following the KD total RNA was extracted and subjected to RNA-seq analysis.
First, we took advantage of these context-specific KD data to ask whether the EPIC-DREM predicted primary AHR targets at the corresponding time points were indeed affected by depletion of AHR. As shown in Figure 6B, at each condition the predicted AHR targets were significantly more affected by AHR-KD than all genes on average. In particular, the top genes with the highest affinity scores for AHR regulation were clearly shifted towards more upregulation at each condition, arguing for functional relevance of the EPIC-DREM predictions and for AHR’s role as a transcriptional repressor.
The number of genes significantly affected by the AHR-KD compared to the respective control siRNA transfections was greatly dependent on the cellular condition and ranged from 614 genes in the undifferentiated cells to 722 and 1819 genes in the one day differentiated osteoblasts and adipocytes, respectively (Figure 6B–6D, Supplementary Table S4). The higher extent of genes affected in the differentiated cells, and especially in the adipocyte lineage, is well in keeping with the ratio of genes normally changing in the early osteoblasto- and adipogenesis. This suggests that early changes induced by the knock-down in the undifferentiated cells are amplified in the differentiating cells. To further elucidate the role of AHR in the undifferentiated cells, we asked how the genes deregulated were behaving later in the normally differentiated D15 adipocytes and osteoblasts. The scatter plots in Figure 6C indicate the transcriptome changes taking place upon AHR-KD in MSCs compared to changes of the same genes in the differentiated cell types. As indicated by the colour coding, approximately half of the genes affected by AHR-KD were also differentially expressed after normal differentiation (286 in adipocytes and 314 in osteoblasts). Moreover, genes that were upregulated in AHR-KD were also often induced in the differentiated cell types (Figure 6C). Therefore AHR might serve as a guardian of the multipotent state in the undifferentiated cells, with its downregulation allowing increased expression of the lineage-specific genes, as already suggested by the EPIC-DREM analysis (Figure 3).
To better understand the function of putative AHR target genes in the MSCs, we overlapped the differentially expressed genes from the undifferentiated cells with those identified in the two other KD experiments and obtained 266 high-confidence target genes that were affected in all three conditions (Figure 6D). Enrichment analysis for tissue-specific expression profiles of these genes in human and mouse gene atlas databases revealed smooth muscle, adipocytes, and osteoblasts as the most enriched cell types, where the AHR regulated genes are normally expressed (Figure 6E). Consistently, inspection of publicly available ChIP-seq data from various cell types revealed the other regulators of the AHR targets to include CEBPB, PPARG, and NR1H3, all of which are important regulators involved in induction of genes during adipocyte differentiation (Galhardo et al., 2014) (Nielsen et al., 2008).
Taken together, the above findings support a role for AHR as a regulator of lineage-specific genes that need to remain repressed in the multipotent MSCs. Among such genes induced upon AHR-KD we identified Notch3, a known regulator of cellular differentiation (Bray, 2016) (Figure 6C). In both differentiation time courses Notch3 expression showed an anti-correlating profile compared to Ahr, with Notch3 becoming induced while Ahr levels decreased (Figure 6F). The Notch3 induction was also accompanied by increased Notch4 levels in both lineages while the third abundantly expressed receptor, Notch1, was concomitantly downregulated (Supplementary Figure S6). Thus, it appears that a reprogramming of Notch signaling could be involved in the commitment of MSCs and this rewiring is at least partially under AHR-mediated control.
DISCUSSION
The ability to obtain unbiased GRNs and to identify their key nodes for any given cell state transition in a data-driven manner is becoming increasingly relevant for regenerative and personalized medicine. Understanding such dynamic networks can be improved by obtaining genome-wide time-series data sets such as transcriptomics or epigenomics data. To seamlessly integrate such data sets we have combined time point-specific high accuracy TF binding predictions with probabilistic modeling of temporal gene expression data and applied it to our own time-series data from mesenchymal differentiation (Figure 2 and Figure 3). Similar time series data collections have previously been used to study for example hematopoiesis (Goode et al., 2016) and myeloid differentiation (Ramirez et al., 2017). However, the derived dynamic GRNs have relied on experimentally identified TF binding sites, that cover only a fraction of all TF-target gene interactions, or on a sub-network of selected TF-TF interactions, respectively.
EPIC-DREM can reveal the key TFs controlling co-expressed gene sets of interest. Still, consistent with the co-operative nature of TF activity (Siersbaek et al., 2014), the number of putative master regulators is often very large. Recent work elucidating the role of SEs in controlling cell type-specific master regulators has provided researchers with a new tool for data-driven identification of such regulators (Hnisz et al., 2013) (Parker et al., 2013). We hypothesized that merged SEs with dynamic behavior during differentiation based on their H3K27ac signal would allow finding the genes, including the TFs, most relevant for the dynamic process. Indeed, quantification of merged SEs shows high correlation with expression levels of their target genes over time, both validating the approach and allowing for more accurate association of SEs to their target genes (Figure 4). A recent study applied a similar strategy for SE quantification and further showed that SE dynamics, as measured by MED1 occupancy, were predictive of enhancer looping to target genes, and highlighted H3K27ac as the histone modification that best predicted such loop dynamics, further supporting the validity of our approach (Siersbaek et al., 2017).
Combining EPIC-DREM results and dynamic SE profiling points towards two TFs with potentially significant roles in both lineages, Ahr and Glis1 (Figure 4). In addition, dynamic SE profiling supports a role for Mn1. Both Ahr and Glis1, and their SEs, show an overall reduction in signal during both differentiations, although with differential and lineage-specific dynamics (Figure 5 and data not shown), suggesting that they could play an important role in maintaining MSCs. In contrast, the SE downstream of Mn1 shows an increase in its signal in both lineages (data not shown), suggesting it could be a general driver of differentiation. Indeed, MN1 has already been shown to be targeted by VDR in osteoblasts and to be required for proper osteoblast differentiation (Meester-Smoor et al., 2005; Sutton et al., 2005; X. Zhang et al., 2009). Our results suggest a similar role also in adipocytes.
GLIS1 has not been functionally associated to adipocyte or osteoblast differentiation although recent work has implicated it as differentially expressed in brown adipocyte differentiation (Pradhan et al., 2017). However, consistent with a potential role in the multipotent progenitors, GLIS1 has been shown capable of promoting reprogramming of fibroblasts to induced pluripotent stem cells (iPSCs) (Maekawa et al., 2011). Further work will be needed to elucidate the role of GLIS1 in MSCs.
Unlike for GLIS1 and MN1, previous work has already linked AHR separately to inhibition of both adipocyte and osteoblast differentiation through studies on the biological impact of dioxin, an environmental toxin capable of activating AHR (Alexander et al., 1998) (Naruse et al., 2002) (Korkalainen et al., 2009). In 3T3-L1 adipocytes this inhibition is known to be mediated through overexpression of Ahr in a dioxin-independent manner (Shimba et al., 2001), while increased levels of Ahr expression in MSCs in rheumatoid arthritis are inhibitory of osteogenesis (Tong et al., 2017). Still, the biological function of AHR in the MSCs has remained unclear while maintenance of HSCs, located in the same niche of bone marrow with MSCs, has been suggested to depend on the normal function of AHR (Gasiewicz et al., 2014). Moreover, as HSC maintenance also depends on the cytokines and chemokines provided by the MSCs, AHR is likely to impact HSCs through its gene regulatory functions in both MSCs and HSCs (Ambrosi et al., 2017) (Boitano et al., 2010) (Jensen et al., 2003).
Here we show that repression of Ahr in MSC differentiation happens with lineage-specific dynamics and is accompanied by similar reduction in signal of an exceptionally large SE cluster downstream of Ahr. Together Ahr and the SEs form their own TAD in mouse cells and in human cells the SE signal is specific for MSCs. Analysing the contributions of the different constituents of the Ahr-SEs will be important for understanding which pathways converge to regulate Ahr in MSCs and whether they function in a synergistic manner as suggested for some other large SEs (Hnisz et al., 2015) (Dukler et al., 2016).
AHR depletion in the undifferentiated and early stage differentiated cells confirmed many of the EPIC-DREM predictions but also revealed an enrichment for lineage-specific genes among AHR targets, including Notch3 (Figure 6). Notch signaling has been implicated in numerous developmental processes with highly diverse outcomes (Bray, 2016) and also in MSCs Notch3 regulation is accompanied by changes in other Notch genes (Supplementary Figure S6). Interestingly, AHR has been recently linked to regulation of Notch signaling in mouse lymphoid cells and testis (Lee et al., 2011) (Huang et al., 2016). However, the affected Notch receptors varied depending on the cell type in question. Notch genes show different expression profiles across tissues and cell types and Ahr-mediated regulation of Notch signaling could be context-specific depending on the prevailing GRN or chromatin landscape. Curiously, Notch3 has a cell type-selective expression profile, favouring mesenchymal tissues like bone, muscle and adipose tissue (Wu et al., 2016).
Our approach for identification of the dynamic GRNs and SEs allows key regulator identification in various time series experiments involving cell state changes. Our current results together with previous data identify AHR as a likely guardian of mesenchymal multipotency and provide an extensive resource for further analyses of mesenchymal lineage commitment.
METHODS
Cell culture
The mouse MSC line ST2, established from Whitlock-Witte type long-term bone marrow culture of BC8 mice (Ogawa et al., 1988), was used during all experiments. Cells were grown in Roswell Park Memorial Institute (RPMI) 1640 medium (Gibco, Life Technologies, 32404014) supplemented with 10% fetal bovine serum (FBS) (Gibco, Life Technologies, 10270-106, lot #41F8430K) and 1% L-Glutamine (Lonza, BE17-605E) in a constant atmosphere of 37°C and 5% CO2. For differentiation into adipocytes and osteoblasts, ST2 cells were seeded 4 days before differentiation (day -4), reached 100% confluency after 48 hours (day -2) and were further maintained for 48 hours post-confluency (day 0). Adipogenic differentiation was subsequently initiated on day 0 (D0) by adding differentiation medium I consisting of growth medium, 0.5 mM isobutylmethylxanthine (IBMX) (Sigma-Aldrich, I5879), 0.25 μM dexamethasone (DEXA) (Sigma-Aldrich, D4902) and 5 μg/mL insulin (Sigma-Aldrich, I9278). From day 2 (D2) on, differentiation medium II consisting of growth medium, 500 nM rosiglitazone (RGZ) (Sigma-Aldrich, R2408) and 5 μg/mL insulin (Sigma-Aldrich, I9278) was added and replaced every 2 days until 15 days of differentiation. Osteoblastic differentiation was induced with growth medium supplemented with 100 ng/mL bone morphogenetic protein-4 (BMP-4) (PeproTech, 315-27). The same medium was replaced every 2 days until 15 days of osteoblastogenesis.
Gene silencing
Undifferentiated ST2 cells (day -1) were transfected with Lipofectamine RNAiMAX (Life Technologies, 13778150) according to the manufacturer’s instructions using 50 nM of gene-specific siRNAs against mouse Ahr (siAhr) (Dharmacon, M-044066-01-0005) or 50 nM of negative control siRNA duplexes (siControl) (Dharmacon, D-001206-14-05). Cells were collected 48 h post-transfection. Sequences of the siRNAs are listed in Supplementary Table S5.
Western blotting
After washing the cells with 1x PBS and adding 1x Laemmli buffer, the lysates were vortexed and the supernatants were heated at 95°C for 7 minutes. Proteins were subjected to SDS-PAGE (10% gel) and probed with the respective antibodies. The following antibodies were used: anti-AHR (Enzo Life Biosciences, BML-SA210-0025), anti-ACTIN (Merck Millipore, MAB1501). HRP-conjugated secondary antibodies were purchased from Cell Signaling. Signals were detected on a Fusion FX (Vilber Lourmat) imaging platform, using an ECL solution containing 2.5 mM luminol, 100 mM Tris/HCl pH 8.8, 0.2 mM para-coumaric acid, and 2.6 mM hydrogen peroxide.
RNA extraction and cDNA synthesis
Total RNA was extracted from ST2 cells using TRIsure (Bioline, BIO-38033). Medium was aspirated and 1000 μL of TRIsure was added to each well of a 6-well plate. To separate RNA from DNA and proteins, 200 μL of chloroform (Carl Roth, 6340.1) was added. To precipitate RNA from the aqueous phase, 400 μL of 100% isopropanol (Carl Roth, 6752.4) was added and RNA was incubated at -20°C overnight. cDNA synthesis was done using 1 μg of total RNA, 0.5 mM dNTPs (ThermoFisher Scientific, R0181), 2.5 μM oligo dT-primer (Eurofins MWG GmbH, Germany), 1 U/μL Ribolock RNase inhibitor (ThermoFisher Scientific, EO0381) and 1 U/μL M-MuLV Reverse transcriptase (ThermoFisher Scientific, EP0352) for 1 h at 37°C or 5 U/μL RevertAid Reverse transcriptase for 1 h at 42°C. The reverse transcription reaction was stopped by incubating samples at 70°C for 10 minutes.
Quantitative PCR
Real-time quantitative PCR (qPCR) was performed in an Applied Biosystems 7500 Fast Real-Time PCR System using Thermo Scientific Absolute Blue qPCR SYBR Green Low ROX Mix (ThermoFisher Scientific, AB4322B). In each reaction 5 μL of cDNA, 5 μL of primer pairs (2 μM) and 10 μL of the Absolute Blue qPCR mix were used. The PCR reactions were carried out under the following conditions: 95°C for 15 minutes followed by 40 cycles of 95°C for 15 seconds, 55°C for 15 seconds and 72°C for 30 seconds. To calculate the gene expression level, the $2^{-\Delta\Delta C_t}$ method was used, where $\Delta\Delta C_t = (C_t(\text{target gene}) - C_t(\text{housekeeping gene}))_{\text{tested condition}} - (C_t(\text{target gene}) - C_t(\text{housekeeping gene}))_{\text{control condition}}$. Rpl13a was used as a stable housekeeping gene and D0 or siControl were used as the control condition. Sequences of the primer pairs are listed in Supplementary Table S5.
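As an illustration of the relative quantification described above, the following minimal Python sketch computes a fold change from four Ct values. The function and argument names are hypothetical and the Ct values in the usage note are invented for illustration; this is not the authors' analysis script.

```python
def relative_expression(ct_target_test, ct_ref_test, ct_target_ctrl, ct_ref_ctrl):
    """Fold change of a target gene by the 2^(-ddCt) method.

    ct_ref_* are the Ct values of the housekeeping gene (Rpl13a in the text);
    *_ctrl values come from the control condition (D0 or siControl).
    """
    # Normalize the target gene to the housekeeping gene in each condition.
    delta_ct_test = ct_target_test - ct_ref_test
    delta_ct_ctrl = ct_target_ctrl - ct_ref_ctrl
    # Compare the tested condition with the control condition.
    delta_delta_ct = delta_ct_test - delta_ct_ctrl
    return 2.0 ** (-delta_delta_ct)
```

For example, with hypothetical Ct values of 24 (target) and 18 (housekeeping) in the tested condition and 26 and 18 in the control, the target crosses the threshold two cycles earlier after normalization, which corresponds to a 4-fold higher expression.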
Chromatin Immunoprecipitation
Chromatin immunoprecipitation of histone modifications was performed at the indicated time points of adipocyte and osteoblast differentiation. Cells were grown on 10 cm² dishes. First, chromatin was cross-linked with formaldehyde (Sigma-Aldrich, F8775-25ML) at a final concentration of 1% in the culture media for 8 minutes at room temperature. Then, the cross-linking reaction was quenched with glycine (Carl Roth, 3908.3) at a final concentration of 125 mM for 5 minutes at room temperature. The formaldehyde-glycine solution was removed and cells were washed twice with ice-cold phosphate-buffered saline (PBS) (Lonza, BE17-516F) containing cOmplete™ mini Protease Inhibitor (PI) Cocktail (Roche, 11846145001). Then, cells were lysed in 1.7 mL of ice-cold lysis buffer [5 mM 1,4-Piperazinediethanesulfonic acid (PIPES) pH 8.0 (Carl Roth, 9156.3); 85 mM potassium chloride (KCl) (PanReac AppliChem, A2939); 0.5 % 4-Nonylphenyl-polyethylene glycol (NP-40) (Fluka Biochemika, 74385)] containing PI and incubated for 30 minutes on ice. The cell lysates were then centrifuged at 660 xg for 10 min at 7°C and the pellet was resuspended in 400 μL of ice-cold shearing buffer [50 mM Tris Base pH 8.1 (Carl Roth, 4855.2); 10 mM Ethylenediamine tetraacetic acid (EDTA) (Carl Roth, CN06.3); 0.1 % Sodium Dodecylsulfate (SDS) (PanReac Applichem, A7249); 0.5 % Sodium deoxycholate (Fluka Biochemika, 30970)] containing PI. Chromatin was sheared with a sonicator (Bioruptor® Standard, Diagenode, UCD-200TM-EX) for 20 cycles at high intensity (30 s off and 30 s on) for the ST2 cells differentiated into adipocytes and osteoblasts, and for 25 cycles at high intensity (30 s off and 30 s on) for the ST2 cells differentiated into osteoblasts from day 9 onwards. The sheared cell lysate was then centrifuged at 20817 xg for 10 minutes at 7°C and the supernatant containing the sheared chromatin was transferred to a new tube.
For each immunoprecipitation 10 μg (for H3K4me3) or 15 μg (for H3K27ac and H3K36me3) of sheared chromatin and 4 μg as input were used. The sheared chromatin was diluted 1:10 with modified RIPA buffer [140 mM NaCl (Carl Roth, 3957.2); 10 mM Tris pH 7.5 (Carl Roth, 4855.2); 1 mM EDTA (Carl Roth, CN06.3); 0.5 mM ethylene glycol-bis(β-amino-ethyl ether)-N,N,N’,N’-tetraacetic acid (EGTA) (Carl Roth, 3054.3); 1 % Triton X-100 (Carl Roth, 3051.2); 0.01 % SDS (PanReac Applichem, A7249); 0.1 % sodium deoxycholate (Fluka Biochemika, 30970)] containing PI. The diluted sheared chromatin was incubated overnight with the manufacturer’s recommended amount of an antibody against H3K4me3 (Millipore, 17-614), 5 μg of an antibody against H3K27ac (Abcam, ab4729) or 5 μg of an antibody against H3K36me3 (Abcam, ab9050). The next day, the antibodies were captured using 25 μL of PureProteome™ Protein A Magnetic (PAM) Bead System (Millipore, LSKMAGA10) for 2 hours at 4°C on a rotating wheel. Afterwards, the PAM beads were captured using a DynaMag™-2 magnetic stand (Life Technologies, 12321D). The supernatant was discarded and the PAM beads were washed twice with 800 μL of Immunoprecipitation wash buffer 1 (IPWB1) [20 mM Tris, pH 8.1 (Carl Roth, 4855.2); 50 mM NaCl (Carl Roth, 3957.2); 2 mM EDTA (Carl Roth, CN06.3); 1 % Triton X-100 (Carl Roth, 3051.2); 0.1 % SDS (PanReac Applichem, A7249)], once with 800 μL of Immunoprecipitation wash buffer 2 (IPWB2) [10 mM Tris, pH 8.1 (Carl Roth, 4855.2); 150 mM NaCl (Carl Roth, 3957.2); 1 mM EDTA (Carl Roth, CN06.3), 1 % NP-40 (Fluka Biochemika, 74385), 1 % sodium deoxycholate (Fluka Biochemika, 30970), 250 mM of lithium chloride (LiCl) (Carl Roth, 3739.1)], and twice with 800 μL of Tris-EDTA (TE) buffer [10 mM Tris, pH 8.1 (Carl Roth, 4855.2); 1 mM EDTA (Carl Roth, CN06.3), pH 8.0].
Finally, the PAM beads and the inputs were incubated with 100 μL of ChIP elution buffer [0.1 M sodium bicarbonate (NaHCO3) (Sigma-Aldrich, S5761); 1 % SDS (PanReac Applichem, A7249)]. The cross-linking was reversed by adding 10 μg of RNase A (ThermoFisher, EN0531) and 20 μg of proteinase K (ThermoFisher, EO0491) at 65°C overnight. Then, the eluted chromatin was purified using a MinElute Reaction Cleanup Kit (Qiagen, 28206) according to the manufacturer’s instructions. The DNA concentration was measured using the Qubit® dsDNA HS Assay Kit (ThermoFisher, Q32851) and the Qubit 1.0 fluorometer (Invitrogen, Q32857) according to the manufacturer’s instructions.
ChIP-Seq
The sequencing of the ChIP samples was done at the Genomics Core Facility in EMBL Heidelberg, Germany. For sequencing, single-end and unstranded reads were used and the samples were processed in an Illumina CBot and sequenced in an Illumina HiSeq 2000 machine. In total, 979 572 918 raw reads were obtained. Raw read quality was assessed by fastqc [v0.11, (“FastQC,”)]. This quality control revealed that some reads contained parts of the adapters. These adapter sequences were removed from the genuine mouse sequences by AdapterRemoval (Lindgreen, 2012) [v1.5]. The PALEOMIX pipeline (Schubert et al., 2014) [v1.0.1] was used for all steps from FASTQ files to BAM files including trimming, mapping, and duplicate marking. This workflow ensures that all files are complete and valid. Retained reads were required to have a minimum length of 25 bp. Bases with unreliable Phred scores (0-2) were trimmed out. In total 31 909 435 reads were discarded (3.26%) and 947 663 483 reads were retained (96.74%). Trimmed reads were then mapped using BWA (Li & Durbin, 2009) [v0.7.10] with the backtrack algorithm dedicated to short sequences. The reference was the mouse genome GRCm38.p3 (mm10, patch 3) downloaded from NCBI. For validating and merging BAM files and for marking duplicates, we used the Picard tool suite [v1.119, (“Picard,”)]. Duplicates were marked but not removed. Only reads with a mapping quality of 30 were retained to ensure a unique location on the genome resulting in 661 364 143 reads (69.79% of the trimmed reads). The samples with a coverage of less than 8 million reads (mapping quality > 30) were excluded from the downstream analysis. Raw FASTQ and BAM files have been deposited in the European Nucleotide Archive with the accession number PRJEB20933.
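The read-retention rule stated above (trim bases with Phred scores 0-2, then keep reads of at least 25 bp) can be illustrated with a small Python sketch. It is not the AdapterRemoval or PALEOMIX implementation: the function names are hypothetical, and both the Phred+33 quality encoding and the trimming from both read ends are assumptions made for this example.

```python
MIN_LENGTH = 25     # minimum retained read length stated in the text
MAX_BAD_PHRED = 2   # Phred scores 0-2 are treated as unreliable

def phred_scores(quality_string, offset=33):
    """Decode a FASTQ quality string (Phred+33 encoding is assumed)."""
    return [ord(ch) - offset for ch in quality_string]

def trim_read(sequence, quality_string):
    """Trim unreliable bases from both ends; return None if the read is too short."""
    scores = phred_scores(quality_string)
    start, end = 0, len(scores)
    # Walk inwards from the 5' end past unreliable bases.
    while start < end and scores[start] <= MAX_BAD_PHRED:
        start += 1
    # Walk inwards from the 3' end past unreliable bases.
    while end > start and scores[end - 1] <= MAX_BAD_PHRED:
        end -= 1
    trimmed = sequence[start:end]
    return trimmed if len(trimmed) >= MIN_LENGTH else None
```

In this sketch a 30 bp read with three leading bases of quality `#` (Phred 2) is shortened to 27 bp and kept, whereas a read that falls below 25 bp after trimming is discarded.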
The ChIP-Seq peaks were called with Model-based analysis of ChIP-Seq (Zhang et al., 2008) (MACS) version 2.1.0 for H3K4me3, with HOMER (Heinz et al., 2010) for H3K27ac, and with SICER (Zang et al., 2009) version 1.1 for H3K36me3, using input from undifferentiated ST2 cells as control for IPs from D0 cells and input from D5 adipocyte- or osteoblast-differentiated cells for the IPs from the respectively differentiated cells.
RNA-Seq
The sequencing of the time course samples was done at the Genomics Core Facility in EMBL Heidelberg, Germany. For sequencing, single-end and unstranded reads were used and the samples were processed in an Illumina CBot and sequenced in an Illumina NextSeq machine.
The sequencing of the AHR knock-down samples was performed at the Luxembourg Center for Systems Biomedicine (LCSB) Sequencing Facility. The TruSeq Stranded mRNA Library Prep kit (Illumina) was used to prepare the library for sequencing with 1 μg of RNA as starting material according to the manufacturer’s protocol. The library quality was checked using an Agilent 2100 Bioanalyzer and quantified using Qubit dsDNA HS assay Kit. The libraries were then adjusted to 4 nM and sequenced on a NextSeq 500 (Illumina) according to the manufacturer’s instructions.
The obtained reads were quality checked using FastQC version 0.11.3 (“FastQC,”). Cutadapt version 1.8.1 (Martin, 2011) was used to trim low-quality read ends (-q 30 parameter), remove Illumina adapters (-a parameter) with an error tolerance of 10% (-e 0.1 parameter), and remove reads shorter than 20 bases (-m 20 parameter). Then, reads mapping to rRNA species were removed using SortMeRNA (Kopylova et al., 2012) with the parameters --other, --log, -a, -v, --fastx enabled. Next, the reads were quality checked again using FastQC version 0.11.3 to check whether bias could have been introduced by the removal of Illumina adapters, low-quality reads and rRNA reads. Then, the reads were mapped to the mouse genome mm10 (GRCm38.p3), with the gene annotation downloaded from Ensembl (release 79), using the Spliced Transcripts Alignment to a Reference (Dobin et al., 2013) (STAR) version 2.5.2b with the previously described parameters (Baruzzo et al., 2017). The reads were counted using the function featureCounts from the R package Rsubread (Liao et al., 2014) version 1.4.6-p3 and the statistical analysis was performed using DESeq2 (Love et al., 2014) version 1.14.1 in R 3.3.2 and RStudio (RStudio Team (2015). RStudio: Integrated Development for R. RStudio, Inc., Boston, MA).
EPIC-DREM analysis
To identify TFs that have a regulatory function over time, we designed a new computational workflow that combines the computational TF prediction method TEPIC (Schmidt et al., 2017) with DREM (Schulz et al., 2012), a tool to analyze the dynamics of transcriptional regulation.
We identified TF footprints in the H3K27ac signal using HINT-BC (Gusmao et al., 2016), which is included in the Regulatory Genomics Toolbox, version 0.9.9. Next, we predicted TF binding in those footprints using TEPIC, version 2.0. We used the provided set of 687 PWMs for Mus musculus and mouse genome version mm10 (GRCm38). As DREM requires a time point-specific prediction of binding of a regulator with its target, we needed to develop an approach to determine a suitable TF-specific affinity cut-off for each time point. For this, we created a matched set of random regions that mirrors the GC content and length distribution of the original sequences of the footprints. TF affinities $a_r$ calculated in the random regions are used to determine a suitable cut-off for the original affinities $a_o$ using the frequency distribution of the TF affinities. Affinities for TF $i$ are denoted by $a_{r_i}$ and $a_{o_i}$. Let $r \in R$ denote a randomly chosen genomic region that is screened for TF binding, and let $|r|$ denote its length. Analogously, let $o \in O$ denote a footprint that is screened for TF binding, and let $|o|$ denote its length. We normalize both $a_{r_i}$ and $a_{o_i}$ by the length of their corresponding region and obtain the normalized TF affinities $a'_{r_i}$ and $a'_{o_i}$:
$$a'_{r_i} = \frac{a_{r_i}}{|r|}, \qquad a'_{o_i} = \frac{a_{o_i}}{|o|}.$$
Using the distribution of $a'_{r_i}$ values we derive a TF-specific affinity threshold $t_i$ for a p-value cut-off of 0.05 (see section ChIP-seq validation of TEPIC affinity cut-off for how this p-value was chosen). For a TF $i$, we compute a binary affinity value $b_{o_i}$ from the original affinity $a'_{o_i}$ according to the cut-off $t_i$ with:
$$b_{o_i} = \begin{cases} 1 & \text{if } a'_{o_i} > t_i \\ 0 & \text{otherwise.} \end{cases}$$
The binary affinity values $b_{o_i}$ can be used to compute a binary TF-gene association $a_{g,i}$ between gene $g$ and TF $i$:
$$a_{g,i} = \begin{cases} 1 & \text{if } \sum_{o \in O_{g,w}} b_{o_i} > 0 \\ 0 & \text{otherwise,} \end{cases}$$
where $O_{g,w}$ denotes all footprint regions that occur within a window of size $w$ around the TSS of gene $g$.
Informally, a gene g is associated to TF i if there is a predicted binding site within a window of predefined size w around the gene’s TSS. Here, we use w = 50 kb. Together with gene expression estimates, the TF-gene associations can be directly used as input to DREM. In this analysis, we used version 2.0.3 of DREM. The entire workflow of EPIC-DREM is shown in Figure 2. TEPIC is available online at (“TEPIC”), DREM can be downloaded at (“DREM”).
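The thresholding steps above can be sketched in a few lines of Python. This is a simplified illustration and not the TEPIC code: all function names are hypothetical, the threshold is taken as an empirical quantile of the random-region affinities (one plausible reading of "using the frequency distribution"), and the window of size w is assumed to be centred on the TSS with footprints counted when they overlap it.

```python
from math import ceil

def normalize(affinity, region_length):
    """Length-normalize a raw TF affinity (a / |region|)."""
    return affinity / region_length

def affinity_threshold(random_affinities, p_value=0.05):
    """Empirical cut-off t_i: at most a fraction p_value of the normalized
    random-region affinities lie strictly above the returned value."""
    ranked = sorted(random_affinities)
    n = len(ranked)
    index = min(max(ceil((1.0 - p_value) * n) - 1, 0), n - 1)
    return ranked[index]

def tf_gene_association(footprints, tss, threshold, window=50_000):
    """Binary TF-gene association for one TF and one gene.

    footprints: iterable of (start, end, raw_affinity) on the gene's chromosome.
    Returns 1 if any footprint overlapping the window around the TSS has a
    normalized affinity above the threshold, else 0.
    """
    lo, hi = tss - window // 2, tss + window // 2
    for start, end, raw in footprints:
        in_window = start < hi and end > lo
        if in_window and normalize(raw, end - start) > threshold:
            return 1
    return 0
```

A matrix of such 0/1 values, computed per TF, gene and time point, corresponds to the kind of time point-specific TF-gene association input that the text describes passing to DREM.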
ChIP-seq validation of TEPIC affinity cut-off
To validate that the affinity threshold described above indeed results in an adequate separation between bound and unbound sites, we conducted a comparison to TF-ChIP-seq data. We obtained TF-ChIP-seq data from ENCODE for K562 (18 TFs), HepG2 (36 TFs), and GM12878 (24 TFs). In addition, we downloaded H3K27ac data for the mentioned cell lines from ENCODE. A list of all ENCODE accession numbers is provided in Supplementary Table S6. As described above, we called footprints using HINT-BC and calculated TF affinities in the footprints as well as in the randomly selected regions that match the characteristics of the footprints. To understand the influence of different thresholds, we calculated affinity thresholds for the following p-values: 0.01, 0.025, 0.05, 0.075, 0.1, 0.2, 0.3, 0.4, and 0.5. All affinities below the selected affinity threshold are set to zero; the remaining values are set to one. The quality of the discretization is assessed through the following “peak centric” validation scheme, as used before in (Cuellar-Partida et al., 2012). The positive set of the gold standard comprises all ChIP-seq peaks that contain a motif predicted by FIMO (Grant et al., 2011); the negative set contains all remaining ChIP-seq peaks. A prediction counts as a true positive (TP) if it overlaps the positive set, and as a false positive (FP) if it overlaps the negative set. The number of false negatives (FN) is the number of all entries in the positive set that are not overlapped by any prediction. For all TFs in all cell lines we calculate Precision (PR) and Recall (REC) according to
$$\mathrm{PR} = \frac{TP}{TP + FP}, \qquad \mathrm{REC} = \frac{TP}{TP + FN}.$$
As one can see, Precision is increasing with a stricter p-value threshold, while Recall is decreasing. We found that using 0.05 seems to be a reasonable compromise between Precision and Recall. The median Precision and Recall values calculated over all cell lines and all TFs are shown in Supplementary Figure S2A. Detailed results on a selection of TFs that are present in all three cell lines are shown in Supplementary Figure S2B.
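The peak-centric counting scheme can be made concrete with a short Python sketch. The function names are hypothetical, intervals are assumed to be half-open `(start, end)` pairs on a single chromosome, and the brute-force overlap search is chosen for readability rather than speed; it is not the evaluation code used in the study.

```python
def overlaps(a, b):
    """True if two half-open intervals (start, end) share at least 1 bp."""
    return a[0] < b[1] and b[0] < a[1]

def peak_centric_scores(predictions, positive_peaks, negative_peaks):
    """Precision and recall of predicted binding sites against ChIP-seq peaks.

    positive_peaks: peaks containing a motif; negative_peaks: all other peaks.
    """
    # A prediction is a TP if it overlaps a positive peak, an FP if it
    # overlaps a negative peak.
    tp = sum(any(overlaps(p, peak) for peak in positive_peaks) for p in predictions)
    fp = sum(any(overlaps(p, peak) for peak in negative_peaks) for p in predictions)
    # A positive peak not overlapped by any prediction is an FN.
    fn = sum(not any(overlaps(peak, p) for p in predictions) for peak in positive_peaks)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall
```

Under this scheme, predictions that overlap no ChIP-seq peak at all contribute to neither TP nor FP.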
Identification of dynamic merged SEs
In order to identify temporal SEs across both lineages, BedTools (Quinlan & Hall, 2010) version 2.24.0, Hypergeometric Optimization of Motif EnRichment (Heinz et al., 2010) (HOMER) version 4.7.2 and Short Time-series Expression Miner (Ernst & Bar-Joseph, 2006) (STEM) version 1.3.8 were used. First, the coverage of individual SEs was summarized using the genomeCoverageBed command with the -g mm10 and -bg parameters. Then, the unionBedGraphs command was used to combine the coverage of multiple SEs into a single map such that SE coverage is directly comparable across multiple samples. Finally, the mergeBed command was used to combine SEs overlapping by ≥ 1 bp into a single merged SE that spans all the combined SEs. To calculate the normalized read counts of the merged SEs, annotatePeaks.pl with the -size given and -noann parameters was used. Lastly, STEM was used to cluster SEs and identify their temporal profiles, and SEs with Maximum_Unit_Change_in_Model_Profiles_between_Time_Points 2 and Minimum Absolute Expression Change 1.0 were considered dynamic.
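The merging step can be illustrated with a minimal Python sketch that follows the rule as stated in the text (SEs overlapping by at least 1 bp are combined into one spanning region). It is not a reimplementation of mergeBed, whose exact handling of adjacent intervals may differ; the function name is hypothetical and half-open coordinates on a single chromosome are assumed.

```python
def merge_overlapping(intervals):
    """Merge half-open (start, end) intervals that overlap by >= 1 bp."""
    merged = []
    for start, end in sorted(intervals):
        if merged and start < merged[-1][1]:
            # Overlaps the previous merged region: extend it to span both.
            merged[-1][1] = max(merged[-1][1], end)
        else:
            # No overlap: start a new merged region.
            merged.append([start, end])
    return [tuple(region) for region in merged]
```

Pooling the SE coordinates from all time points of both lineages and passing them through such a function would yield one set of merged SEs on which read counts can be compared across samples.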
Enrichment analysis
EnrichR (14.4.2017) (Chen et al., 2013; Kuleshov et al., 2016) was used to perform gene enrichment analysis.
Availability of data and materials
The datasets generated and analysed during the current study are available in the European Nucleotide Archive with the accession number PRJEB20933. Scripts can be requested from authors directly.
COMPETING INTERESTS
The authors declare they have no competing interests.
FUNDING
This work was supported by funding from the University of Luxembourg. DG was supported by a fellowship from the Luxembourg National Research Fund (FNR) (AFR 7924045).
AUTHOR CONTRIBUTIONS
DG, TS and LS conceived the project and designed the experiments and analysis. DG performed all the experiments and DG and LS analyzed the results. DG and AG performed the RNA-seq and ChIP-seq analysis. FS and MHS developed the EPIC-DREM approach. DG, FS and MHS performed the EPIC-DREM analysis. MS performed the Western blot analysis. RH prepared the libraries and performed the sequencing for the AHR-KD experiments. PE developed the randomization method to derive control footprint regions. All authors commented on the manuscript.
ACKNOWLEDGEMENTS
We would like to thank Dr. Maria Bouvy-Liivrand for help with establishing the ST2 cell culture and differentiation and EMBL Gene Core at Heidelberg for support with high-throughput sequencing. The experiments presented in this paper were carried out using the HPC facilities of the University of Luxembourg (Varrette et al., 2014).
Footnotes
Deborah.Gerard{at}uni.lu; fschmidt{at}mmci.uni-saarland.de; Aurelien.Ginolhac{at}uni.lu; Martine.Schmitz{at}uni.lu; Rashi.Halder{at}uni.lu; pebert{at}mpi-inf.mpg.de; mschulz{at}mmci.uni-saarland.de; Thomas.Sauter{at}uni.lu; Lasse.Sinkkonen{at}uni.lu