Abstract
A reason microorganisms are so successful is their ability to rapidly adapt to constantly changing environments. Bacterial small RNAs (sRNAs) play an important role in adaptive responses by shaping gene expression profiles and integrating multiple regulatory pathways. This enables microorganisms to more efficiently counteract environmental insults than could be achieved by simply switching transcription factors on and off. Although a plethora of sRNAs have been identified, the majority have not been functionally characterized and their relative contribution in regulating adaptive responses remains unclear. To better understand how Escherichia coli acclimatizes to changes in nutrient availability, we performed UV cross-linking, ligation and sequencing of hybrids (CLASH) to uncover sRNA-target interaction networks. Using this proximity-dependent RNA-RNA ligation method, we uncovered thousands of sRNA-target duplexes associated with the RNA chaperone Hfq at specific growth stages, many of which have not been described before. Our work revealed that 3′UTR-derived sRNAs and sRNA-sRNA interactions are more prevalent than previously anticipated. We uncovered sRNA-target interaction networks that play a role in adaptation to changes in nutrient availability by enhancing the uptake of nutrients from the environment. We describe detailed functional analyses of a novel sRNA (MdoR), the first example of a bacterial 3′UTR-derived sRNA that functions as part of a mixed coherent feed forward loop. MdoR enhances the effectiveness of maltose uptake by (a) inactivating repressive pathways that block the accumulation of specific maltose transporters and (b) reducing the flux of general porins to the outer membrane. Our work suggests that many mRNAs encode regulatory sRNAs embedded within their 3′UTRs, allowing direct regulatory interactions between functionally related mRNAs.
Small RNA sponging interactions appear frequently and reveal that major nutritional stress responses are coordinated post-transcriptionally. This provides striking examples of how cells utilize sRNA regulatory networks to integrate multiple signals and regulatory pathways to enhance nutrient stress adaptation.
Microorganisms are renowned for their ability to adapt to environmental changes by rapidly rewiring their gene expression program. Complex responses are mediated through integrated transcriptional and post-transcriptional networks. Control at the transcriptional level dictates which genes are expressed1,2 and is well-characterised in Escherichia coli. Post-transcriptional regulation has not been extensively studied, but is key for regulating adaptive responses. By using riboregulators and RNA-binding proteins (RBPs), cells can efficiently integrate multiple pathways and incorporate additional signals into regulatory circuits. E. coli employs many post-transcriptional regulators, including small regulatory RNAs (sRNAs3), cis-acting RNAs4, and RBPs such as ribonucleases5, RNA chaperones and RNA helicases. The sRNAs are the largest class of bacterial regulators and often work in tandem with RBPs to regulate their RNA targets3,6. The base-pairing interactions are often mediated by RNA chaperones such as Hfq and ProQ, which help to anneal or stabilize the sRNA and sRNA-target duplex7–9. Small RNAs can repress or stimulate translation and transcription, as well as control mRNA stability3,7,10,11. Although sRNAs are important components of regulatory networks12, relatively little is known about how sRNAs help cells to efficiently respond to environmental challenges.
To understand sRNA-mediated adaptive responses, detailed knowledge of the underlying circuits is required. In E. coli, hundreds of sRNAs have been discovered, but only a limited number have been thoroughly characterised. A key step to unravel the roles of sRNAs in regulating gene expression is to identify their targets. To tackle this globally, high-throughput methods have been developed that have uncovered a plethora of sRNA-target interactions, many more than previously anticipated13–18.
During growth in rich media, E. coli are exposed to continuously changing conditions, such as carbon-source shifts, and fluctuations in nutrient availability, pH and osmolarity. Consequently, E. coli elicit complex responses that result in physiological and behavioural changes such as envelope composition remodelling, quorum sensing, nutrient scavenging, swarming and biofilm formation. Even subtle changes in the growth conditions can trigger adaptive responses. Accordingly, each stage of the growth curve is characterised by different physiological states driven by activation of different transcriptional and post-transcriptional networks. Moreover, growth phase dependency of virulence and pathogenic behaviour has been demonstrated in both Gram-positive and Gram-negative bacteria – in some cases a particular growth stage is non-permissive for the induction of virulence19,20. Although the exponential and stationary phases have been characterised in detail21,22, little is known about the transition between these two phases, which is a very dynamic stage of the cell population cycle, involving many concurrent changes that span multiple metabolic pathways. There are deep physiological differences between exponentially-growing cells and cells that have reached stationary phase, yet a complete view of how the timings and dynamics are achieved, and how gene expression is modulated during these changes remains elusive. During these stages the cell population switches between alternative carbon sources as their preferred sources are exhausted from the medium. To adapt, they need to rapidly rewire their transcriptome to be able to scavenge these alternative carbon sources23–25. For instance, as substrate availability varies, the expression of specific nutrient transporters and metabolic enzymes is regulated accordingly. The role of sRNAs in regulating this transition is even less clear. 
Most sRNAs are expressed and associate with Hfq in a growth-dependent manner, with the vast majority being primarily expressed during stationary phase, and only a few being bound by Hfq at the transition stage26. However, the function of these sRNAs in regulating the transition phase is still unclear.
Here we applied UV cross-linking, ligation and sequencing of hybrids (CLASH)27,28 on E. coli Hfq to uncover sRNA-target RNA interaction dynamics that take place during the entry from exponential to stationary phase. We demonstrate that the highly stringent purification steps make Hfq CLASH a robust method for direct mapping of Hfq-mediated sRNA-target interactions in E. coli. We highlight the dynamics of these interactions during the transition from exponential to stationary phases of growth in Lysogeny broth (LB). Our data revealed Hfq binding to 3′UTRs of hundreds of mRNA transcripts, suggesting that 3′UTR-derived sRNAs26,29–31 might be more prevalent than previously anticipated. Moreover, our analyses predicted a surprisingly large number of sRNA-sRNA interactions, several of which we experimentally validated. Our data suggests that CyaR, an sRNA that is primarily expressed during the transition from late exponential to stationary phase, plays an important role in enhancing nutrient uptake from the environment by antagonizing the sRNA GcvB, which targets mRNAs involved in amino acid and peptide uptake. This emphasizes the notion that sRNA-sRNA interactions play an important role in environmental adaptation. Our results expand on sRNA-target interactions uncovered by RNase E CLASH13 and RIL-seq18, and we show that Hfq CLASH can generate very robust results. We also characterized a novel 3′UTR-derived sRNA, which we refer to as MdoR (mal-dependent OMP repressor). Unlike the majority of bacterial sRNAs, MdoR is very transiently expressed during the progression from exponential to stationary phase. We demonstrate that MdoR is a degradation intermediate of the malG 3′UTR, the last transcript of the malEFG polycistron that encodes components of the maltose transport system. We show that MdoR directly downregulates a select number of mRNAs, including several encoding major porins. 
Analysis of the chimeric reads predicted that MdoR forms unusually long base-pairing interactions with many of its targets. We propose that MdoR is part of a regulatory network that, during the transition from exponential to stationary phase, stimulates accumulation of high affinity maltose transporters in the outer membrane by inactivating opposing pathways.
Results
Hfq CLASH in E. coli
To unravel the post-transcriptional networks that underlie the transition between exponential and stationary growth phases in E. coli, we performed CLASH27,28 (Fig. 1a) on Hfq to uncover growth-stage dependent sRNA regulatory networks. E. coli cells expressing an HTF (His6-TEV-3xFLAG)-tagged Hfq were grown in LB and an equal number of cells were harvested at different optical densities (OD600). Cell samples from seven different optical densities were subsequently subjected to Hfq CLASH (Fig. 1b). To generate high quality Hfq CLASH data we had to make a number of improvements to the original protocol used for RNase E CLASH13. The biggest problem that needed to be resolved was the seemingly rapid degradation of RNA during the cell lysis and first immunoprecipitation steps, which resulted in the recovery of relatively short cross-linked RNA fragments and poor recovery of chimeric reads. To minimize this degradation, we shortened the incubation time with anti-FLAG beads to 1-2 hours. Crucially, actively growing cells were UV irradiated for only seconds in the Vari-X-linker32 and harvested by rapid filtration to increase the recovery of short-lived protein-RNA and RNA-RNA interactions and to minimize changes in the transcriptome induced by DNA damage. Finally, to improve the detection of chimeras, all libraries were paired-end sequenced. As a negative control, CLASH was performed on an untagged strain. The control samples yielded ∼10-fold fewer reads, demonstrating the specificity of the approach. To complement the CLASH data and for normalization purposes, RNAseq data was generated from cells harvested at the same optical densities.
Based on the growth curve shown in Fig. 1b, we categorized OD600 0.4 and 0.8 as exponential growth phase; 1.2, 1.8 and 2.4 as the transition phase from exponential to stationary; and 3.0 and 4.0 as early stationary phase. Strikingly, we observed that in LB medium Hfq levels in stationary phase were up to 18-fold higher compared to early log phase (Fig. 1c). To determine the cross-linking efficiency, Hfq-RNA complexes immobilized on nickel beads were radiolabelled, resolved on NuPAGE gels and detected by autoradiography. The data show that the recovery of Hfq and radioactive signal was comparable at each OD600 (Fig. 1c), suggesting that the beads were saturated with Hfq during the purification.
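The growth-phase categories above can be written down as a small lookup, which is convenient when grouping samples by phase in downstream analyses. This is only an illustrative sketch: the function name `growth_phase` and the phase labels are hypothetical and not part of the original analysis; only the seven OD600 values and their grouping come from the text.

```python
def growth_phase(od600: float) -> str:
    """Map one of the seven sampled OD600 values to its growth phase.

    The grouping follows the categories stated in the text; the function
    itself is an illustrative helper, not part of the original pipeline.
    """
    phases = {
        "exponential": (0.4, 0.8),
        "transition": (1.2, 1.8, 2.4),
        "early stationary": (3.0, 4.0),
    }
    for phase, densities in phases.items():
        # Compare with a small tolerance to avoid floating-point surprises.
        if any(abs(od600 - od) < 1e-9 for od in densities):
            return phase
    raise ValueError(f"OD600 {od600} is not one of the seven sampled densities")
```

For example, `growth_phase(1.8)` returns `"transition"`, the stage at which MdoR expression is later reported to peak.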
Hfq binds to the transcriptome in a growth-stage dependent manner
Meta-analyses of the Hfq CLASH sequencing data revealed that the distribution of Hfq binding across mRNAs was very similar at each growth stage. We observed the expected Hfq enrichment at the 5′UTRs and at the 3′UTRs at each growth stage (see Supplementary Fig.1 for examples). After identifying significantly enriched Hfq binding peaks (FDR ≤ 0.05), we used the genomic coordinates of these peaks to search for Hfq binding motifs in mRNAs. The most enriched k-mers included poly-U stretches (Supplementary Fig.1b) that resemble the poly-U tracts characteristic of Rho-independent terminators found at the end of many bacterial transcripts33, confirming the motif uncovered in CLIP-seq studies in Salmonella34.
Given the established role of Hfq in sRNA stabilization and mediating sRNA-target interactions, it was logical to assume that changes in Hfq binding and availability directly affect sRNA steady-states. This would imply that the Hfq binding data should show a strong correlation with the RNA-seq data. To test this, we compared the Hfq cross-linking data to the RNA-seq data. K-means clustering of the TPM-normalized read count data revealed 7-9 different patterns of changes in read counts in the Hfq cross-linking and total RNAseq data (Fig. 2a). Reminiscent of recent work performed in Salmonella26, most sRNAs in E. coli appear to be preferentially expressed when the cells reach stationary phase (Fig. 2a). However, much to our surprise, the Hfq cross-linking profile did not always follow the same trend (Fig. 2a). The scatterplot analyses highlight that, globally, changes in sRNA expression are indeed not always recapitulated in their respective Hfq-binding profile (and vice versa) (Fig. 2b). The correlation between changes in sRNA expression levels and changes in Hfq cross-linking was poor at lower cell density (OD600 0.8: r = 0.27) but gradually improved as the cells approached stationary phase (OD600 3.0: r = 0.6; Fig. 2b). A very similar result was obtained when doing the same comparison for all Hfq-bound RNAs, including mRNAs (Supplementary Fig.2). Notable examples are ChiX and GcvB. In the case of ChiX, Hfq binding only shows modest changes during the growth phases (Fig. 2a; right heat map; cluster 8), whereas sRNA levels steadily increased (Fig. 2a; left heat map; cluster 3). In the case of GcvB, Hfq cross-linking decreased slightly (Fig. 2a; right heat map; cluster 8); however, the sRNA expression levels increased slightly at higher optical densities (Fig. 2a; left heat map; cluster 6). Some sRNAs, such as ArcZ and GlmZ, are close to the trend line in all growth conditions, suggesting Hfq plays an important role in their stabilization (Table S1).
However, particularly at low optical densities, many sRNAs deviated from the trend. A striking example is ArrS, an RNA that until recently18 was thought to act only as a cis-encoded sRNA. ArrS is very highly expressed throughout growth (especially during stationary phase) relative to OD600 0.4, but the increase in Hfq cross-linking is comparatively low.
A plausible explanation for the observed results is that Hfq levels were significantly lower during exponential phase (Fig. 1c) and that during this growth phase there may not be sufficient Hfq to bind all sRNAs. We conclude that the dynamics of sRNA expression and binding to Hfq are not always well correlated and that this may be linked to the availability of Hfq in the cell (also see discussion).
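The comparison described in this section (TPM normalization, followed by correlating changes in sRNA expression with changes in Hfq cross-linking) can be sketched as follows. This is a minimal illustration under stated assumptions, not the pipeline used in the study: it assumes that r is Pearson's correlation coefficient, that changes are expressed as log2 fold changes relative to a reference sample such as OD600 0.4, and that a pseudocount of 1 is added to avoid division by zero. All function names are hypothetical.

```python
import math


def tpm(counts, lengths_kb):
    """Transcripts per million: divide counts by length, then scale to 1e6."""
    rates = {gene: counts[gene] / lengths_kb[gene] for gene in counts}
    total = sum(rates.values())
    return {gene: 1e6 * rate / total for gene, rate in rates.items()}


def log2_fold_change(sample, reference, pseudocount=1.0):
    """log2(sample / reference) per gene; the pseudocount is an assumption."""
    return {
        gene: math.log2((sample[gene] + pseudocount) / (reference[gene] + pseudocount))
        for gene in sample
        if gene in reference
    }


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sd_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    sd_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (sd_x * sd_y)
```

In use, one would compute `log2_fold_change` separately for the RNA-seq and the Hfq cross-linking TPM values at a given OD600, then pass the two lists of per-sRNA values (in the same gene order) to `pearson_r`.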
Hfq CLASH robustly detects RNA-RNA interactions
To get a complete overview of the RNA-RNA interactions captured by Hfq CLASH, we merged the data from two biological replicates of CLASH growth phase experiments (Table S2). Overlapping paired-end reads were merged and analyzed using the hyb pipeline35. To select RNA-RNA interactions for further analysis, we took into consideration whether the chimera was found in independent experiments, the number of unique chimera counts (after collapsing the data), the minimum folding energy, and occurrence of the chimeras in both orientations (gene_A - gene_B and gene_B - gene_A). The distribution of combinations of transcript classes found in all discovered unique chimeric reads indicates sRNA-mRNA interactions as the most frequent Hfq-mediated interaction type (∼35%) (Fig. 3a). We suspect that this number might even be higher, as more than a quarter of the chimeras contained sRNA and fragments from intergenic regions. Manual inspection of several of these indicated that a number of the intergenic sequences were located near genes for which the UTRs were unannotated or short.
Analysis of the unique interactions uncovered at different growth stages reveals that the majority are growth stage-specific. A relatively small number of interactions were common to all growth conditions, and the clear majority occur at the transition between exponential and stationary phase and at the entry into stationary phase (Fig. 3b). Meta-analysis of the mRNA fragments of the chimeras revealed that the majority were strongly enriched in 5′UTRs, peaking near the translational start codon (Fig. 3c), consistent with the canonical mode of translational inhibition by sRNAs36 and demonstrating the robustness of the data. Some enrichment was also found in 3′UTRs of mRNAs (Fig. 3c). Motif analyses revealed a distinct sequence preference in 5′UTR and 3′UTR binding sites (Fig. 3d, Table S3). The 3′UTR-containing chimera consensus motif corresponds to poly-U transcription termination sites. The A-rich 5′UTR motif is more consistent with Hfq binding to Shine Dalgarno-like (ARN)n sequences37. Two examples of experimentally validated sRNA-target interactions are detailed in Fig. 4. ChiX is an abundant sRNA that down-regulates chiP expression by occluding the Shine-Dalgarno (SD) sequence38,39. In the absence of chitosugars, ChiX also represses chbC by base-pairing to an intercistronic region between chbB and chbC39–41. We investigated in detail the chimeras in our data that support the known ChiX-chiP and ChiX-chbC interactions (Fig. 4a,b). This example also illustrates that even without any statistical filtering of the reads, the Hfq CLASH data has remarkably low background, as shown by the sharp peaks at the known target sites and low background signals elsewhere on the coding sequences. Furthermore, in silico RNA secondary structure analysis of the chimeric sequences was sufficient to precisely map the base-pairing interaction between the sRNA and its target (as shown for ChiX and chiP in Fig. 4b), and to confirm the experimentally-validated interacting regions/sequences. The sRNAs ArcZ and RprA showed the highest number of different interactions and the vast majority overlapped the known seed sequence42,43, including a large number with relatively low read counts (2-5; Fig. 4c). This suggests that even low abundance chimeras could represent bona fide interactions. Consistent with this idea, comparing our data to RIL-seq (S-chimeras)18 and validated interactions found in the sRNATarbase 3 database44 showed that ∼50% of the interactions found in all three datasets had low read counts in the Hfq CLASH data (Fig. 4d). Therefore, unique chimeras that were found at least twice in the data were also considered for further analyses. A complete overview of chimeras with a minimum of two unique cDNA counts is provided in Table S2.
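Two of the selection criteria described in this section, the minimum of two unique cDNA counts and the occurrence of a chimera in both orientations, can be sketched as follows. This is an illustrative sketch only: the input format (one `(gene_a, gene_b)` tuple per collapsed chimeric read) and the function name `summarise_chimeras` are assumptions, and the real analysis was performed with the hyb pipeline, which also considers folding energy and reproducibility across experiments.

```python
from collections import Counter


def summarise_chimeras(chimeras, min_unique=2):
    """Count unique chimeras per gene pair and flag two-orientation support.

    `chimeras` is an iterable of (gene_a, gene_b) tuples, one per unique
    (collapsed) chimeric cDNA. Pairs are counted irrespective of orientation
    and kept only if supported by at least `min_unique` unique chimeras.
    """
    counts = Counter()
    orientations = {}
    for gene_a, gene_b in chimeras:
        pair = tuple(sorted((gene_a, gene_b)))
        counts[pair] += 1
        orientations.setdefault(pair, set()).add((gene_a, gene_b))
    return {
        pair: {
            "unique_reads": n,
            "both_orientations": len(orientations[pair]) == 2,
        }
        for pair, n in counts.items()
        if n >= min_unique
    }
```

With this threshold, a pair seen once in each orientation is retained, whereas a pair supported by a single unique chimera is dropped.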
Hfq CLASH predicts sRNA-sRNA sponging as a widespread layer of post-transcriptional regulation
We also uncovered a large pool (∼160) of sRNA-sRNA interactions, 95 of which were supported by at least two chimeric reads (Table S4). The sRNA-sRNA network is dominated by several abundant sRNAs that appear to act as hubs – sRNAs with many different interacting partners: ChiX, Spot42 (spf), ArcZ and GcvB (Table S4). The vast majority of the interactions were growth stage specific and the sRNA-sRNA networks show extensive rewiring across the exponential, transition and stationary phases (Fig. 5a). In many cases the experimentally-validated sRNA seed sequences were found in the chimeric reads, for both established and novel interactions. For example, the vast majority of ArcZ sRNA-sRNA chimeras contained the known and well conserved seed sequence (Fig. 5b, Supplementary Fig.3). The exception was the ArcZ-ssrS (6S RNA) interaction. Although we found many unique chimeras supporting this interaction, it involves a region upstream of the ArcZ seed sequence, which is not very well conserved (Supplementary Fig.3). Therefore, we did not study this interaction in more detail.
The sRNA-sRNA chimeras containing CyaR fragments were of particular interest, as the sRNA is primarily expressed during the transition from late exponential to stationary phase (Fig. 2a;45). In the case of CyaR, the known seed sequence as well as a conserved ∼25 nt fragment in the 5′ region was found in chimeras (Fig. 5b; Supplementary Fig.3). Similar seed sequences were identified in CLASH experiments using RNase E as a bait13, suggesting that this region represents a bona fide interaction site. Notably, we identified many ArcZ-CyaR chimeras containing the seed sequence from both sRNAs, suggesting that these sRNAs could inhibit each other’s activity. To validate these findings, we co-expressed the two sRNAs in E. coli and monitored their expression as well as expression of their target mRNAs37. The data were subsequently normalized to the results obtained with a scrambled sRNA to calculate fold changes in expression levels. Since it is difficult to predict directly from the CLASH data which sRNA in each pair acts as the decoy/sponge, we tested both directions. ArcZ overexpression not only significantly decreased the expression of its own mRNA targets (tpx, sdaC) but also CyaR (Fig. 5d, panel I). Concomitantly, we observed a substantial increase in CyaR targets nadE and yqaE (Fig. 5d, panel I). Over-expression of CyaR reduced the level of a direct mRNA target (nadE) but it did not significantly alter the level of ArcZ or ArcZ mRNA targets (tpx and sdaC; Fig. 5d, panel II). This suggests that the regulation is unidirectional. It is, however, important to note that although in this experiment the expression of CyaR increased ∼45-fold, it still did not reach the levels of ArcZ (Supplementary Fig.4). Therefore, it is possible that under the tested conditions the CyaR over-expression was not sufficient to see a sponging effect on ArcZ. Nevertheless, we conclude that ArcZ acts as a CyaR anti-sRNA and can trigger its degradation. 
We next analysed the CyaR-GcvB sRNA-sRNA interaction that was detected in our Hfq CLASH data. Overexpression of CyaR significantly reduced the expression levels not only of its targets (nadE and yqaE) but also of GcvB, with a concomitant increase in the levels of the GcvB target dppA (Fig. 5d, panel III). Conversely, although GcvB overexpression destabilised its target dppA46, CyaR levels were not significantly affected. However, levels of the CyaR targets nadE and yqaE increased significantly (albeit modestly), suggesting that in this case the regulation is bi-directional and that GcvB can act as a CyaR sponge (Fig. 5d, panel IV). All of the CyaR-GcvB chimeric reads contained fragments of the 5′ region of CyaR and a part of the CyaR seed sequence, and the data indicate that these base pair with the previously described R1 and R2 seed sequences (Supplementary Fig.4;46). In addition, we found chimeras containing the 5′ end of GcvB and a portion of the CyaR seed sequence. Interestingly, except for one nucleotide, the first 9 nucleotides in GcvB are identical to the R2 seed sequence (Supplementary Fig.3), suggesting that CyaR interaction with GcvB has alternative binding modes.
In conclusion, this work uncovered two new and functional sRNA-sRNA interactions in E. coli. The biological significance of these interactions is detailed in the discussion.
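The normalization to a scrambled-sRNA control used in the co-expression experiments above can be illustrated with a relative-quantification sketch. The text does not state how fold changes were calculated, so the following is an assumption: it uses the common 2^-ΔΔCt approach for RT-qPCR data, with a hypothetical reference gene and an assumed amplification efficiency of 100%. The function name and arguments are illustrative only.

```python
def fold_change_ddct(ct_target_test, ct_ref_test, ct_target_ctrl, ct_ref_ctrl):
    """Relative expression by the 2^-ddCt method (assumed, not from the text).

    ct_target_test / ct_ref_test: Ct of the target and of a reference gene in
        cells over-expressing the sRNA of interest.
    ct_target_ctrl / ct_ref_ctrl: the same measurements in cells expressing
        the scrambled control sRNA.
    Assumes perfect doubling per PCR cycle.
    """
    delta_test = ct_target_test - ct_ref_test
    delta_ctrl = ct_target_ctrl - ct_ref_ctrl
    return 2 ** -(delta_test - delta_ctrl)
```

A value below 1 indicates lower target levels than in the scrambled control, and a value above 1 indicates higher levels, as would be expected for the targets of a sponged sRNA.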
Hfq CLASH identifies novel sRNAs in untranslated regions
Two lines of evidence from our data indicate that many more mRNAs may be harbouring sRNAs in their UTRs or be involved in base-pairing among themselves. Firstly, almost 13% of the intermolecular chimeras mapped to mRNA-mRNA interactions (Fig. 3a, Table S5). Secondly, we observed extensive binding of Hfq in 3′UTRs near transcriptional terminators, indicating that E. coli 3′UTRs may contain many functional sRNAs, as observed in Salmonella31. We uncovered 3′UTRs of mRNAs base-pairing with other mRNAs in 5′UTRs, coding sequences and 3′UTRs (Fig. 6a). We identified 148 3′UTR-containing mRNA fragments that were involved in 375 unique interactions (minimum two unique reads), 59 of which were also present in the RIL-seq S-chimeras data18. Thirteen of the 3′UTRs were found as part of chimeras in the RIL-seq data, while 12 appeared stabilised upon transient inactivation of RNase E performed in Salmonella (TIER-seq data31; Fig. 3a, Table S5). Out of the 375 3′UTR-mRNA chimeras, 70 were 3′UTRs fused to 5′UTRs of mRNAs, suggesting that these may represent 3′UTR-derived sRNAs that base-pair with 5′UTRs of mRNAs, a region frequently targeted by sRNAs. Strikingly, many of the 3′UTR fragments in these chimeras mapped to cpxP, which in Salmonella is known to harbour the CpxQ sRNA (Table S5). Our analyses greatly increased the number of potential CpxQ mRNA targets. We also identified six mRNA 3′UTRs that were uncovered in all three datasets highlighted in Fig. 6a, suggesting these were most likely to contain sRNAs. Northern blot analyses confirmed the presence of sRNAs in the malG and gadE 3′UTRs (Fig. 6b, Supplementary Fig.5a). The latter was also recently experimentally confirmed in the RIL-seq data and was annotated as gadF18. Furthermore, significant Hfq cross-linking could be detected in the 3′UTRs of these transcripts (Fig. 6c). Our Hfq CLASH data revealed many more 3′UTRs that could potentially harbor sRNAs (Table S5).
For example, we could show that the 3′UTR of ygaM, which was found in chimeric reads in our data, also likely contains an sRNA as a short fragment originating from the ygaM 3′UTR could be detected that cross-linked robustly to Hfq (Fig. 6b,c and Supplementary Fig.5b). We conclude that 3′UTRs may be a larger reservoir of sRNAs than previously anticipated.
The majority of sRNAs are more abundant at higher cell densities (including the gadE- and ygaM-derived sRNAs and RybB; see Fig. 6b, 2a), most likely because Hfq is highly expressed at these stages (Fig. 1c). In sharp contrast, the sRNA fragment identified in the malG 3′UTR was expressed very transiently and peaked at an OD600 of 1.8 (Fig. 6b), reminiscent of what was observed for CyaR (Fig. 2a). We envisioned that the particularly transient expression of this sRNA may be correlated with a role in the adaptive responses triggered during the transition from exponential to stationary phase. We named it MdoR (mal-dependent OMP repressor) and characterized it in detail.
MdoR is a 104 nt sRNA that contains part of the malG coding sequence, including the stop codon, and a Rho-independent terminator (Supplementary Fig.6a,b). The full-length malG and malEFG transcripts recapitulate the same expression profile: both peak at OD600 of 1.8 and drop to almost undetectable levels at OD600 2.4, whereas a fraction of MdoR molecules remain stable at OD600 2.4, perhaps due to stabilization by Hfq (Fig. 7a,b). Our RNAseq data revealed that all genes involved in maltose transport and metabolism have the same cell-density dependent dynamics in expression (Fig. 7a). Clearly, under our growth conditions, the mal regulon is specifically induced as cells reach the late exponential/transition into stationary phase. The similarity in the shape of the MdoR expression profile shows that this sRNA is not made under different conditions than the transcription unit it overlaps, corroborating the hypothesis that it is not independently produced. Additionally, we identified shorter malG 3′UTR-containing fragments of intermediate length between malG and MdoR that could be degradation intermediates (Fig. 7b). The detection of these intermediate species suggests that the malEFG primary transcript is undergoing serial ribonucleolytic cleavage steps. The fact that a malG fragment was also detected in the Salmonella TIER-seq data31 suggests that this sRNA is released through RNase E processing or degradation. In addition, no clear promoter sequence could be found in the malG coding sequence, supporting the notion that MdoR is not generated as a primary transcript. Collectively, this suggests that MdoR is a product of ribonucleolytic processing and/or degradation of the malEFG or malG transcripts. To test this possibility, we used a temperature-sensitive RNase E mutant strain (rnets 47). The expectation was that MdoR levels would decrease at the non-permissive temperature.
Unfortunately, MdoR was undetectable after shifting the parental strain to 43°C, suggesting the sRNA is unstable at elevated temperatures. In the RNase E mutant, however, MdoR became stabilized at the non-permissive temperature (Supplementary Fig.6c). We conclude that RNase E can degrade MdoR; however, the data do not allow us to determine whether the sRNA is also released from the malG 3′UTR by the same nuclease. As an alternative approach to test whether these intermediate species and MdoR are degradation products, we asked whether these products contained mono-phosphates at their 5′ termini, a hallmark of many ribonuclease-cleaved RNAs. Total RNA extracted from cells at OD600 1.2 and 1.8, conditions in which MdoR is most abundant, was treated with Terminator 5′-Phosphate Dependent Exonuclease (5′-P Exo) (Fig. 7c). This exonuclease degrades RNA species with 5′ mono-phosphate but not 5′-triphosphate ends. MdoR could not be detected in RNA samples treated with 5′-P Exo (Fig. 7c, lanes 2 and 4), suggesting it bears a 5′ mono-phosphate. In contrast, RybB, an sRNA with a 5′ tri-phosphate transcribed from an independent promoter48,49, was a poor substrate for the exonuclease (Fig. 7c). As a positive control we included RaiZ, an sRNA with a 5′-mono-phosphate generated by RNase E processing8. Like MdoR, RaiZ was degraded in the presence of 5′-P Exo. 5′-P Exo treatment of the total RNA also reduced the levels of the longer intermediate species as well as the full-length malG. These data support a mechanism by which the full-length polycistronic RNA undergoes decay that is initiated at a site in the upstream malEFG region. It is known that constitutive processing of malEFG mRNA produced from a single promoter allows uncoupling of gene expression within the operon. The distal gene malE is clipped off and selectively stabilized, allowing it to be expressed at higher levels than other members of the operon50.
This maturation process involves the degradosome51 and turnover generates 5′-mono-phosphorylated mRNA fragments that promote further rounds of cleavage that ultimately degrade malG. The malG 3′UTR, however, would be less susceptible to degradation as it is stabilized by Hfq binding.
MdoR directly regulates the expression of major outer membrane porins and represses the envelope stress response pathway
The MdoR chimeras frequently contained 5′UTR fragments of two mRNAs encoding major porins, ompC and ompA (Fig. 8a-c). We also found MdoR chimeras containing fragments of a cis-encoded sRNA, OhsC (Fig. 8d), suggesting that either MdoR controls its expression by sponging, or vice versa. The most abundant and favourable interactions of MdoR with mRNAs (ompC and ompA) appear to use roughly the same region of the malG 3′UTR for base-pairing (Fig. 8b,c), indicating that the corresponding site on the predicted sRNA may be the main functional seed. The predicted interaction between MdoR and ompC is unusually long and consists of two stems interrupted by a bulge, suggesting that these two RNAs form a stable complex. Conservation analyses and in silico target predictions (CopraRNA52,53) indicate that the seed sequence predicted by CLASH is relatively well-conserved (Supplementary Fig.7a), and could be utilized for the regulation of multiple targets (Supplementary Fig.7b).
To verify the MdoR CLASH data we pulse-overexpressed the sRNA from a plasmid-borne arabinose inducible promoter followed by total RNA sequencing (Fig. 8e). To minimize secondary changes in gene expression, cells were harvested after 15 minutes of MdoR induction, by which time the sRNA was readily detectable. Note that induction was performed at OD600 = 0.4, when endogenous levels of MdoR are very low. As a control we used cells harbouring an empty vector. Differential gene expression analysis (DESeq2;54) identified ∼20 transcripts that were significantly overrepresented in the control data compared to the MdoR overexpression data (Fig. 8f; Table S6). Thus, these are likely downregulated by MdoR in vivo. This included the sigma factor rpoE (σE), which plays an important role in controlling gene expression during stress responses, including envelope stress55–58. The observed reduction in levels of the anti-σE protein RseA can be explained by the fact that σE and RseA are part of the same operon. Intriguingly, MdoR overexpression also reduced the levels of the sRNAs RyeA and MicA, the latter of which depends on σE for its expression59. In Salmonella, MicA down-regulates another outer membrane protein, LamB, a high affinity maltose/maltodextrin transporter60. Fragments of three mRNAs (ompC, ompA and ptsH) that were found in MdoR chimeric reads were also differentially expressed in the MdoR overexpression RNA-seq data, providing strong evidence that these are direct MdoR targets. PtsH is a component of the phosphoenolpyruvate-dependent sugar phosphotransferase system. Of note is that five unique MdoR-ptsH chimeric reads were identified in the CLASH data, reinforcing the notion that even chimeras with low unique read counts in CLASH can represent genuine interactions.
All the available data suggest that MdoR is part of a mixed feed forward loop that enhances the uptake of maltose/maltodextrin by maltose transporters (also see discussion). Firstly, by base-pairing with the 5′UTRs of the mRNAs, it reduces the flux of the more general porins such as OmpC and OmpA to the outer membrane surface. Secondly, we propose that MdoR stimulates the accumulation of the high-affinity maltose transporter LamB in the outer membrane by suppressing the inhibitory σE pathway and MicA. To further test this model, we performed additional validation experiments. Firstly, we confirmed the DESeq2 results for a number of the regulated genes (ompC, rpoE, micA and ryeA) by RT-qPCR (Fig. 9a). We included an MdoR mutant in which the seed sequence in stem 1 was changed into its complementary sequence (MdoR SM; Supplementary Fig.6a). As a control we also included the RybB sRNA, which is known to regulate both rpoE and ompC expression48,61,62. In all cases, the MdoR SM mutations abolished the negative regulatory effect on target expression (Fig. 9a).
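The seed-mutant design described above (changing a seed region into its complementary sequence) can be illustrated with a minimal sketch. The function name, the example sequence and the seed coordinates below are hypothetical and are not the actual MdoR sequence or the positions used in this study. The sketch assumes that "complementary sequence" means a base-by-base complement in the same orientation, not a reverse complement.

```python
# Complement map for RNA bases (A<->U, G<->C).
RNA_COMPLEMENT = str.maketrans("ACGU", "UGCA")


def complement_region(rna: str, start: int, end: int) -> str:
    """Return `rna` with the bases in [start, end) replaced by their complement.

    The order of the bases is kept (complement, not reverse complement).
    Coordinates are 0-based and `end` is exclusive.
    """
    rna = rna.upper()
    if not 0 <= start <= end <= len(rna):
        raise ValueError("region must lie within the sequence")
    if set(rna) - set("ACGU"):
        raise ValueError("sequence must contain only A, C, G and U")
    return rna[:start] + rna[start:end].translate(RNA_COMPLEMENT) + rna[end:]


# Hypothetical 9-nt RNA; positions 2-5 stand in for a seed region.
wild_type = "AUGGCUAAC"
seed_mutant = complement_region(wild_type, 2, 6)
print(wild_type, "->", seed_mutant)
```

Applying the same function to the matching region of a target would give the kind of compensatory mutant used to test whether base-pairing is restored.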
To demonstrate direct target regulation in vivo, we employed a well-established reporter system where an sRNA is co-expressed with a construct containing the mRNA target region fused to the coding sequence of superfolder green fluorescent protein (sfGFP)63,64 (Supplementary Fig.8b). Fusions for ompC, ompA and σE were constructed, but only the OmpC and OmpA-sfGFP reporters produced stable fusions that could be analysed. We also included an MdoR sRNA seed sequence mutant (MdoR SM) and an ompC mutant containing compensatory mutations (OmpC SM; Supplementary Fig.8a). GFP expression and cell density were quantified in plate readers. As positive controls we used the MicC and RybB sRNAs, which regulate E. coli ompC expression62,65. Fluorescence measurements confirmed that levels of OmpC-sfGFP and OmpA-sfGFP fusions were significantly lower in cells expressing MdoR (Supplementary Fig.8c). Importantly, MdoR over-expression did not change fluorescence in cells only expressing the GFP coding sequence (Supplementary Fig.8c). Mutating the MdoR seed region largely restored OmpA- and OmpC-sfGFP reporter levels to the levels of the no sRNA negative control. The MdoR SM mutant was still able to partially suppress expression of the OmpC SM-sfGFP mutant, suggesting that base-pairing interactions between these two mutants are less stable compared to the wild-type (Supplementary Fig.8c). The wild-type MdoR was also able to partially suppress expression of the ompC SM mutant. We suggest that the predicted base-pairing interactions between MdoR and ompC in the second stem (Supplementary Fig.8a) might be sufficient to partially suppress ompC expression. Regardless, the data strongly imply that MdoR directly regulates ompC expression.
To distinguish between RNA degradation and regulation at the level of translation, we performed two types of experiments. We first measured expression of the GFP reporters by RT-qPCR using GFP-specific primers. The results were essentially identical to the GFP fluorescence measurements (Fig. 9b); reduced ompC-sfGFP mRNA levels were observed in cells expressing MdoR, but not in cells expressing MdoR SM. MdoR was still able to partially suppress the ompC SM mutant, but MdoR SM was able to downregulate ompC SM significantly better. Finally, we also performed polysome profiling experiments to assess the level of ompC translation upon overexpression of MdoR. Although this did not noticeably affect 70S and polysome levels (Supplementary Fig.9a), we observed a significant (∼37%) reduction of ompC mRNA in the polysomal fractions (Supplementary Fig.9b). This suggests that MdoR can also repress translation of ompC. However, because in this experiment over-expression of MdoR reduced ompC mRNA levels ∼5-fold, we conclude that MdoR predominantly regulates OmpC protein expression by targeting the ompC mRNA for degradation.
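The comparison above mixes two ways of stating an effect size: a percentage reduction (∼37%) and a fold reduction (∼5-fold). The short sketch below converts between the two so that both can be read on the same scale. The function names are hypothetical. The code only restates the reported numbers in different units and makes no assumption about how the underlying measurements were normalised.

```python
def fold_reduction_to_percent(fold: float) -> float:
    """Convert an n-fold reduction into a percentage reduction.

    A 2-fold reduction leaves 1/2 of the signal, i.e. a 50% reduction.
    """
    if fold < 1.0:
        raise ValueError("a fold reduction must be >= 1")
    return 100.0 * (fold - 1.0) / fold


def percent_to_fold_reduction(percent: float) -> float:
    """Convert a percentage reduction into the equivalent fold reduction."""
    if not 0.0 <= percent < 100.0:
        raise ValueError("percent must be in [0, 100)")
    return 100.0 / (100.0 - percent)


# Values quoted in the text above.
print(f"5-fold reduction = {fold_reduction_to_percent(5.0):.0f}% reduction")
print(f"37% reduction = {percent_to_fold_reduction(37.0):.2f}-fold reduction")
```

On this scale, the ∼5-fold drop in ompC mRNA corresponds to roughly an 80% reduction, which is considerably larger than the ∼37% reduction seen in the polysomal fractions.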
MdoR enhances maltoporin expression during maltose fermentation
To further substantiate these results, we next switched to a more controlled system to investigate the effect of endogenous MdoR on its targets. To determine whether MdoR has a role in adaptation to maltose-metabolising conditions, single overnight cultures grown in glucose were split and (re)inoculated in fresh medium containing either glucose or maltose as the sole carbon source, and expression of several mal regulon genes and MdoR targets was quantified. We show that MdoR and its parental transcript malG are almost undetectable during growth in glucose, and highly expressed during growth in maltose (∼35-fold increase, Supplementary Fig.10a,b). This is consistent with catabolite repression of the mal regulon by glucose, and its induction by maltose66. Intriguingly, we observed that ompC mRNA levels are overall significantly lower during growth in maltose, compared to glucose (Supplementary Fig.10a). This suggests that porin expression is also regulated by nutrient source in E. coli. Similarly, MicA, a repressor of LamB synthesis, has reduced levels in maltose compared to glucose (Supplementary Fig.10a). We next mutated the entire seed sequence of MdoR (both stem 1 and 2) in its chromosomal context, to completely disrupt base-pairing with ompC mRNA. Notably, the mutant MdoR sRNA is less abundant than the wild-type and longer fragments that contain upstream malG regions could be more readily detected (Supplementary Fig.10b). We speculate that the mutation in the upstream MdoR sequence might have affected RNase E recruitment or cleavage, impairing MdoR processing. The MdoR mutant strain also accumulated significantly higher levels of MicA (Fig. 9c) and lower levels of lamB. OmpC mRNA levels were also slightly elevated in the mutant in maltose relative to the parental strain, although in this case a similar increase was also observed in glucose.
Collectively, the data strongly suggest a role for MdoR in enhancing the uptake of maltose when more favourable carbon sources become limiting.
Discussion
Microorganisms need to constantly adapt their transcriptional program to counteract changes in their environment, such as changes in temperature, cell density and nutrient availability. In bacteria, small RNAs (sRNAs) and their associated RNA-binding proteins are believed to play a key role in this process. By controlling translation and degradation rates of mRNAs upon stress imposition67–69, they can regulate the kinetics of gene expression as well as suppress noisy signals. A major challenge for bacteria is the transition from exponential growth to stationary phase, when the most favourable nutrients become limiting. To counteract this challenge, cells need to rapidly remodel their transcriptome to be able to efficiently metabolize alternative carbon sources. This transition is very dynamic and involves activation and repression of diverse metabolic pathways. However, it is unclear to what degree sRNAs contribute to this transition. The most useful pieces of information would be to know what sRNAs are upregulated during this transition phase and to identify their RNA targets. This would help to uncover the regulatory networks that govern this adaptation, as well as provide a starting point for performing more detailed functional analyses on sRNAs predicted to play a key role in this process. For this purpose, we performed UV cross-linking, ligation and sequencing of hybrids (CLASH28) to unravel the sRNA-target interactions during this transition. Using Hfq as a bait we uncovered thousands of unique sRNA-target interactions. Our data are consistent with previously published work13,18 but we also identified a large number of novel interactions, in particular sRNA-sRNA interactions. Our data indicate that sRNA sponging may be more prevalent than previously anticipated. 
We validated a number of our CLASH findings and uncovered a novel 3′UTR derived sRNA that plays a role in enhancing uptake of an alternative carbon source during the transition to stationary phase by inhibiting competing pathways.
Improving Hfq CLASH
Our S. cerevisiae Cross-linking and cDNA analysis data (CRAC;70) showed that a percentage of the cDNAs were formed by intermolecular ligations of two RNA fragments (chimeras) that were known to base pair in vivo28. These findings prompted the development of protocols that enable enrichment of sRNA-target chimeric reads using Hfq as an obvious bait. The initial Hfq UV cross-linking data (CRAC;37) did not yield sufficiently high numbers of chimeric reads to extract new biological insights, and it was proposed that duplexes formed by Hfq are rapidly transferred to the RNA degradosome, in line with observations from other groups71–73, greatly reducing the likelihood of capturing sRNA-target interactions with Hfq using CLASH13. However, a recent study demonstrated that Hfq can be used effectively as a bait to enrich for sRNA-target duplexes under lower-stringency purification conditions, suggesting that sRNA-mRNA duplexes are sufficiently stable on Hfq to allow purification18. This encouraged us to further optimize the CLASH method. We made a number of relatively small changes to the protocol that, when combined, enabled us to recover a large number of sRNA-target chimeric reads (detailed in Materials and Methods). We shortened various incubation steps to minimize RNA degradation and performed very long and stringent washes after bead incubation steps to remove background binding of non-specific proteins and RNAs. We very carefully controlled the RNase digestion step that is used to trim the cross-linked RNAs prior to making cDNA libraries; this ensured that we would recover longer chimeric RNA fragments. The resulting cDNA libraries were paired-end sequenced to increase the recovery of chimeric reads with high mapping scores from the raw sequencing data. These changes led to a substantial improvement in the recovery of chimeric reads (9.5% compared to 0.001%; 0.71% were intermolecular chimeras). Another important change is that we substantially reduced the UV cross-linking times required for Hfq cross-linking. We recently developed the Vari-X-linker32, which enables us to cross-link protein-RNA interactions in actively growing cells in seconds, and we used filtration devices to rapidly harvest the bacterial cells. We previously showed that filtration combined with short UV cross-linking times dramatically reduces noise introduced by the activation of the DNA damage response and significantly increases the recovery of short-lived RNA species32. Encouragingly, there is considerable overlap between published Hfq RIL-seq data and our CLASH data (Table S1). However, we also detected a large number of interactions that were not previously described. We speculate that many of these interactions represent short-lived RNA duplexes that are preferentially captured with our UV cross-linking and rapid cell filtration setup.
A major strength of CLASH is that the purification steps are performed under highly stringent and denaturing conditions. During the first FLAG affinity purification step the beads are extensively washed with high salt buffers and the second Nickel affinity purification step is done under completely denaturing conditions (6M guanidinium hydrochloride) to ensure that only RNAs that are covalently bound to the protein of interest are purified. These two purification steps, combined with the more extensive washes we have introduced, can generate high quality Hfq CLASH data. Consistent with its canonical role in RNA metabolism, meta analyses of the raw data showed a strong enrichment of Hfq in 5′- and 3′UTRs (Supplementary Fig.1a, c) and motif analyses of read clusters uncovered the known Hfq RNA-binding motifs (Supplementary Fig.1b and Fig. 3d). Moreover, when we mapped the mRNA fragments from the chimeric reads to the genome (without prior statistical filtering), discrete peaks were detected that predominantly mapped to UTRs and the background signal was low. This demonstrates the overall robustness and specificity of our improved CLASH protocol. Comparison of our Hfq CLASH data with the Hfq RIL-seq data and previously validated interactions (sRNATarbase3) revealed that ∼50% of the shared interactions were of relatively low abundance in our data (after collapsing the data; Fig. 4d). This was unexpected, as intuitively one would predict that chimeras with low read counts would have a higher likelihood of representing spurious interactions. Analysis of the ArcZ and ChiX chimeric fragments revealed that the vast majority overlapped with the known seed sequences (Fig. 4c), even chimeras with counts as low as two. Altogether, this suggests that many of the low abundance chimeras in CLASH data represent bona fide interactions. Therefore, in this manuscript we report all chimeras that have at least two read counts (see Supplementary Tables S1-S3).
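The reporting rule described above (collapse the data, then keep every chimera supported by at least two reads) can be sketched as follows. This is an illustrative sketch only, not the pipeline used in this study. It assumes that "collapsing" means removing reads with identical sequences for the same interaction, and the function name, input layout and example records are all hypothetical.

```python
from collections import Counter
from typing import Dict, Iterable, Tuple

# One record per chimeric read: (sRNA name, target name, read sequence).
ChimericRead = Tuple[str, str, str]


def collapse_and_filter(
    reads: Iterable[ChimericRead], min_count: int = 2
) -> Dict[Tuple[str, str], int]:
    """Count unique reads per interaction and keep well-supported ones.

    Identical (sRNA, target, sequence) records are collapsed to a single
    read first, so duplicates of one read do not inflate the count.
    """
    unique_reads = set(reads)
    counts = Counter((srna, target) for srna, target, _ in unique_reads)
    return {pair: n for pair, n in counts.items() if n >= min_count}


# Hypothetical input with placeholder names and sequences.
example_reads = [
    ("sRNA_A", "mRNA_x", "ACGUACGU"),
    ("sRNA_A", "mRNA_x", "ACGUACGU"),  # duplicate, collapsed
    ("sRNA_A", "mRNA_x", "CGUACGUA"),
    ("sRNA_B", "mRNA_y", "GGGAAACC"),  # single unique read, dropped
]
print(collapse_and_filter(example_reads))
```

With a threshold of two, an interaction seen in two distinct reads is retained, while one supported only by duplicates of a single read is not.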
Correlation between Hfq binding and steady state levels of RNA substrates
Hfq binding to sRNAs is important for stabilizing the sRNA as well as promoting sRNA-target interactions. We therefore assumed that cDNA counts obtained from the Hfq CLASH data would show a strong positive correlation with the steady state RNA levels. However, a strong correlation could only be observed at the end of exponential phase. The prevailing wisdom is that many sRNAs would be degraded if not associated with Hfq, so this was quite a surprising finding. We propose that this may be linked to the availability of Hfq, as the protein is ∼15 times less abundant at exponential phase as compared to stationary phase (Fig. 2c). Similar changes in Hfq expression at different growth phases have also been observed in pathogenic bacteria74,75. It is conceivable that at different growth stages sRNAs are packaged in different RNPs and that the composition of these complexes is dynamic. The E. coli sRNA IsrA/McaS has been shown to associate with a large number of different proteins, including Hfq, ProQ and CsrA, RNA chaperones that are known to bind and stabilize sRNAs and regulate sRNA-target interactions34,76–78. It is tempting to speculate that the composition of this RNP may be growth phase dependent. Therefore, Hfq may not be essential to stabilize all sRNAs at low cell densities and their stability may vary at different growth stages. A plausible model is that some sRNAs are sequestered and sufficiently stabilized by other RBPs (such as ProQ and CsrA77) during early growth stages and that Hfq can only stably associate with these RNPs once expression levels are sufficiently high.
The 3′UTR derived sRNA MdoR functions in a mixed feed forward loop by suppressing opposing pathways
We found a large number of chimeras that represent over a hundred distinct intermolecular interactions between 3′UTRs and other mRNA features, which implicate direct mRNA-mRNA communication. One mechanism by which mRNA crosstalk is achieved is by sRNA generation through degradation of the parental mRNA. These fragments were primarily described as decoys or sponges for other sRNAs79, but could act as trans-acting sRNAs as well29. Among the interactions in our data we found 70 interactions between 3′UTRs and 5′UTRs (Table S5). We speculate many of these represent novel 3′UTR-derived sRNAs that target 5′UTRs of mRNAs. These could either have been processed from 3′UTRs in an RNase E (or other RNase)-dependent manner or transcribed from an internal promoter. Some of these predictions were validated by our follow up work (discussed below) and, while this work was in progress, also by others18. One of the 3′UTR-derived sRNAs we uncovered (MdoR) was of particular interest as it is only detected during the transition from late exponential to early stationary phase. A model for how we believe MdoR functions in maltose uptake is shown in Figure 10a. We showed that MdoR directly regulates the expression of outer membrane porins (OmpA and OmpC) through direct base-pairing interactions and inhibits the membrane stress response activation by suppressing σE and the MicA sRNA. Our data do not allow us to determine whether MdoR downregulates σE and MicA directly, however, it is worth mentioning that a low number of chimeric reads between the 3′UTR of malG and MicA were found in the unfiltered RIL-Seq data18, suggesting it could well be direct. Unfortunately, we were unable to test direct interactions between MdoR and σE in vivo as all the σE-GFP fusions we generated were not stably expressed. 
Although several 3′UTR derived sRNAs have been described in diverse bacterial species26,29,31,79–81, MdoR is unique in the sense that it is not only a 3′UTR derived sRNA that targets multiple pathways, but it is also part of a mixed feed forward loop that enhances the uptake of the alternative carbon sources maltose and maltodextrin by suppressing opposing pathways (Fig. 10a). The genetic structure and transcriptional regulation of the mal regulon are well understood. However, its post-transcriptional regulation has remained largely unexplored. Our work uncovered new links between the maltose uptake (mal) regulon, envelope stress-responses and membrane composition/assembly pathways. While cells initially metabolise more favourable carbon sources such as glucose, these carbon sources are generally rapidly depleted and bacteria need to quickly switch to alternative carbon sources, such as maltose and maltodextrins. During maltose utilisation, transcription of the malEFG operon is activated by the MalT transcription factor and the transcript is processed by RNase E as well as other degradosome components51. Here we show that the MdoR sRNA is a product of malEFG processing, which is protected from further degradation by Hfq. Efficient uptake of maltose/maltodextrin not only requires the inner membrane transporters encoded by the malEFG operon, but also the high-affinity transporter LamB (Fig. 10a), which cooperates with the inner membrane proteins to import these carbon sources. LamB is significantly upregulated when cells start to utilize alternative carbon sources. To promote maltose uptake via LamB, MdoR downregulates MicA, an sRNA repressor of LamB. This suppression enhances the translation of lamB. Expression of OMPs, however, needs to be carefully coordinated as any changes in the protein composition of the outer membrane (OM), such as changes in the levels of LamB, can lead to induction of the σE envelope stress response82.
Therefore, to ensure efficient LamB production, we propose that MdoR (most likely indirectly) suppresses the σE pathway that negatively regulates lamB expression through MicA (Fig. 10a). This regulatory cascade triggers envelope remodelling and requires chaperones for correct insertion of LamB into the membrane. We propose that the MdoR-mediated downregulation of rpoE (∼30% reduction upon MdoR overexpression) moderates σE induction caused by the reorganisation of the membrane when maltodextrins are utilised. When ectopically expressed at high levels, MdoR reduces mRNA levels of ompC, ompA and other membrane proteins. Reducing the synthesis of OmpA and OmpC proteins may free up these resources enabling more efficient production of LamB. Due to the very high abundance and intrinsic stability of many of these OMP mRNAs (minute-long half-lives), we rationalise that even a mild reduction in their steady state can profoundly relieve the pressure on the OMP synthesis and assembly pathways83. The net outcome of mal regulon transcription, MdoR biogenesis and regulatory activity, is increased expression of high-affinity components of maltose-specific transport (MalE and LamB). This results in higher diffusion rates of maltodextrins inside the cell.
CLASH uncovers a large network of sRNA-sRNA interactions
Our analyses uncovered an unexpectedly large number of sRNA-sRNA interactions. The majority of these interactions involved known sRNA seed sequences, suggesting that these could represent bona fide sponging interactions that prevent sRNAs from base-pairing with their targets. About 40 sRNA-sRNA interactions involving sRNAs from the core genome were detected in the RNase E CLASH dataset, about a quarter of which were also detected in our Hfq data. The relatively low overlap between the datasets is not surprising, given the differences in the growth conditions (virulence inducing conditions for EHEC E. coli vs growth transitions for E. coli MG1655). Moreover, many of the sRNA-sRNA interactions recovered in association with RNase E are likely duplexes that are in the process of being degraded. Many of the sRNA-sRNA interactions unique to our Hfq data may represent sponging interactions between anti-sRNAs and seed sequences of sRNAs that interfere with target regulation and may not necessarily involve recruitment/activity of RNase E.
The chimeras containing CyaR fragments were of particular interest as CyaR is preferentially expressed during the transition from late exponential to stationary phase (Fig. 2a;45,84) and may therefore play an important role in adaptation to nutrient availability. We could show that over-expression of ArcZ reduced CyaR levels and that CyaR can down-regulate GcvB levels (Fig. 10b). Interestingly, over-expression of ArcZ in Salmonella showed a dramatic reduction in CyaR bound to Hfq and upregulation of CyaR targets, such as nadE42, suggesting that this activity is conserved between these two Gram-negative bacteria. The fate of these sRNA-sRNA duplexes may depend on where the interactions take place. Folding of the chimeric reads suggests that ArcZ preferentially base-pairs with CyaR at the 5′ end (Supplementary Fig.3a), which may alter secondary structures that normally help to stabilize the sRNA. The regulation appeared to be unidirectional (Fig. 10b). However, under the tested conditions ArcZ levels were always higher than CyaR levels, even after over-expressing the latter. Therefore, it is certainly possible that under conditions where ArcZ levels are lower, CyaR may be able to exert a regulatory effect.
Unlike previously reported sRNA decoys or sponges39,40,79, ArcZ and CyaR are transcribed from independent promoters, and target mRNAs associated with many different processes. Thus, these interactions would be expected to connect many different pathways. For example, ArcZ regulation of CyaR may connect adaptation to stationary phase/biofilm development45,85 to quorum sensing and cellular adherence45. CyaR expression is controlled by the global regulator Crp. Most of the genes controlled by Crp are involved in transport and/or catabolism of amino acids or sugars. It was therefore surprising that, as noted by the authors45, the mRNAs targeted by CyaR had little to do with these processes. Our data revealed a missing link and suggest that CyaR indeed plays a role in adaptation to nutrient availability by suppressing the activity of GcvB, which negatively regulates mRNAs involved in amino acid and peptide uptake (Fig. 10b). We propose that CyaR is part of a mixed coherent feed forward loop that stimulates amino acid uptake during the transition from late exponential to stationary phase (Fig. 10b). Interestingly, ArcZ downregulates the sdaCB dicistron, which encodes proteins involved in serine uptake and metabolism42. This operon has been shown to be regulated by Crp as well, suggesting that ArcZ can counteract the activity of Crp.
Materials and Methods
Bacterial strains and culture conditions
An overview of the bacterial strains used in this study is provided in Table S7. The E. coli MG1655, TOP10 or TOP10F’ strains served as parental strains. The E. coli K12 strain used for CLASH experiments, MG1655 hfq∷HTF, was previously reported37. Cells were grown in Lysogeny Broth (LB) or minimal medium with supplements (1x M9 salts, 2 mM MgSO4, 0.1 mM CaCl2, 0.03 mM thiamine, 0.2% carbon-source) at 37°C under aerobic conditions with shaking at 200 rpm. The media were supplemented with antibiotics where required at the following concentrations: ampicillin - 100 µg/ml, chloramphenicol - 25 µg/ml, kanamycin - 50 µg/ml. Where indicated, 0.2% glucose or maltose was used. For induction of sRNA expression from plasmids, 1 mM IPTG, 200 nM anhydrotetracycline hydrochloride or 0.2% L-arabinose were used.
Construction of sRNA expression plasmids
For the pulse-overexpression constructs, the sRNA gene of interest was cloned at the transcriptional +1 site under Para control by amplifying the pBAD+1 plasmid (Table S7) by inverse PCR using Q5 DNA Polymerase (NEB). The pBAD+1 template is derived from pBADmycHis A37. The sRNA genes and seed mutants (SM) were synthesized as ultramers (IDT; Table S7), which served as the forward primers, as described (Tree et al., 2014). The reverse primer (oligo pBAD+1_5P_rev) bears a monophosphorylated 5′-end to allow blunt-end self-ligation. The PCR reaction was digested with 10U DpnI (NEB) for 1h at 37°C and purified by ethanol precipitation. The sRNA-pBAD linear PCR product was circularized by self-ligation, performed as above. Ligations were transformed into DH5α competent cells. Positive transformants were screened by sequencing. The control plasmid pBAD+1 was constructed similarly by self-ligation of the PCR product generated from oligonucleotides pBAD+1_XbaI_fwd and pBAD+1_5P_rev. Small RNA overexpression constructs derived from pZA21MCS and pZE12luc (Expressys) were generated identically, using the ultramers indicated in Table S7 as forward primers and oligos pZA21MCS_5P_rev and pZE12_5P_rev as reverse primers, respectively, and transformed into E. coli TOP10F’.
Construction of mRNA-superfolder GFP fusions
Table S7 lists all the plasmids, gene fragments and primers used for cloning procedures in this work. To construct constitutively expressed, in-frame mRNA-sfGFP fusions for the fluorescence reporter studies, the 5′UTR, start codon and first ∼5 codons of target genes were cloned under the control of PLtetO-1 promoter in a pXG10-SF backbone as previously described63,64. Derivatives of the target–GFP fusion plasmids harboring seed mutations (SM) were generated using synthetic mutated gene-fragments (IDT, Table S7). To prepare the inserts, the target region of mRNA of interest was either amplified by PCR from E. coli genomic DNA or synthesized as g-blocks (IDT) and cloned using NheI and NsiI restriction sites. Transformants were screened by restriction digest analysis and verified by Sanger sequencing (Edinburgh Genomics).
Hfq UV Cross-linking, Ligation and AnalysiS of Hybrids (Hfq-CLASH)
CLASH was performed essentially as described13, with a number of modifications including changes in incubation steps, cDNA library preparation, reaction volumes and UV cross-linking. E. coli expressing the chromosomal Hfq-HTF were grown overnight in LB at 37°C with shaking (200 rpm), diluted to starter OD600 0.05 in fresh LB, and re-grown with shaking at 37°C in 750 ml LB. A volume of culture equivalent to 80 OD600 per ml was removed at the following cell-densities (OD600): 0.4, 0.8, 1.2, 1.8, 2.4, 3.0 and 4.0, and immediately subjected to UV (254 nm) irradiation for 22 seconds (∼500 mJ/cm2) in the Vari-X-linker32. Cells were harvested using a rapid filtration device32 onto 0.45 μm nitrocellulose filters (Merck Millipore) and flash-frozen on the membrane in liquid nitrogen. The following day, the membranes were washed with ∼15 ml ice-cold phosphate-buffered saline (PBS), and cells were harvested by centrifugation. Cell pellets were lysed by bead-beating in 1 volume per weight TN150 buffer (50 mM Tris pH 8.0, 150 mM NaCl, 0.1% NP-40, 5 mM β-mercaptoethanol) in the presence of protease inhibitors (Roche) and 3 volumes of 0.1 mm Zirconia beads (Thistle Scientific), by performing 5 cycles of 1 minute vortexing followed by 1 minute incubation on ice. One additional volume of TN150 buffer was added. To reduce the viscosity of the lysate and remove contaminating DNA, the lysate was incubated with RQ1 DNase I (10U/ml; Promega) for 30 minutes on ice. Two additional volumes of TN150 were added and mixed with the lysates by vortexing. The lysates were centrifuged for 20 minutes at 4000 rpm at 4°C and subsequently clarified by a second centrifugation step at 13.4 krpm for 20 min at 4°C. Purification of the UV cross-linked Hfq-HTF-RNA complexes and cDNA library preparation were performed as described70. Cell lysates were incubated with 50 μl of pre-equilibrated M2 anti-FLAG beads (Sigma) for 1-2 hours at 4°C.
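The sampling step above takes a fixed amount of cells at each density, so the culture volume removed shrinks as the OD600 rises. The sketch below works through that arithmetic. It rests on an assumption that is not stated explicitly in the text: that "80 OD600" denotes 80 OD units, where OD units = OD600 × volume in ml. The function and variable names are hypothetical.

```python
# Assumption: 80 OD units are harvested per time point,
# where OD units = OD600 * volume (ml).
OD_UNITS_PER_SAMPLE = 80.0

# Cell densities at which samples were taken (from the protocol above).
SAMPLING_OD600 = [0.4, 0.8, 1.2, 1.8, 2.4, 3.0, 4.0]


def harvest_volume_ml(od600: float, od_units: float = OD_UNITS_PER_SAMPLE) -> float:
    """Culture volume (ml) that contains `od_units` OD units at `od600`."""
    if od600 <= 0:
        raise ValueError("OD600 must be positive")
    return od_units / od600


volumes = {od: harvest_volume_ml(od) for od in SAMPLING_OD600}
total_volume_ml = sum(volumes.values())

for od, volume in volumes.items():
    print(f"OD600 {od:.1f}: harvest {volume:.1f} ml")
print(f"Total removed: {total_volume_ml:.1f} ml")
```

Under this assumption the seven samples add up to less than the 750 ml culture volume, so a single flask would be enough for the whole time course.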
The anti-FLAG beads were washed three times for 10 minutes with 2 ml TN1000 (50 mM Tris pH 7.5, 0.1% NP-40, 1M NaCl) and three times for 10 minutes with TN150 without protease inhibitors (50 mM Tris pH 7.5, 0.1% NP-40, 150 mM NaCl). For TEV cleavage, the beads were resuspended in 250 μl of TN150 buffer (without protease inhibitors) and incubated with home-made GST-TEV protease at room temperature for 1.5 hours. The TEV eluates were then incubated with a fresh 1:100 dilution preparation of RNaceIt (RNase A and T1 mixture; Agilent) for exactly 5 minutes at 37°C, after which they were mixed with 0.4g GuHCl (6M, Sigma), NaCl (300 mM), and Imidazole (10 mM). Note that the RNase digestion step needs to be carefully optimized to obtain high-quality cDNA libraries. The samples were then transferred to 50 μl Nickel-NTA agarose beads (Qiagen), equilibrated with wash buffer 1 (6 M GuHCl, 0.1% NP-40, 300 mM NaCl, 50 mM Tris pH 7.8, 10 mM Imidazole, 5 mM beta-mercaptoethanol). Binding was performed at 4°C overnight with rotation. The following day, the beads were transferred to Pierce SnapCap spin columns (Thermo Fisher), washed 3 times with wash buffer 1 and 3 times with 1xPNK buffer (10 mM MgCl2, 50 mM Tris pH 7.8, 0.1% NP-40, 5 mM beta-mercaptoethanol). The washes were followed by on-column treatment with TSAP (Thermosensitive alkaline phosphatase, Promega) for 1h at 37°C, using 8 U of phosphatase in 60 μl of 1xPNK in the presence of 80U RNasin (Promega). The beads were washed once with 500 μl wash buffer 1 and three times with 500 μl 1xPNK buffer. To add 3′-linkers (App-PE - Table S7), the Nickel-NTA beads were incubated in 80 μl 3′-linker ligation mix (1xPNK buffer, 1 µM 3′-adapter, 10% PEG8000, 30U Truncated T4 RNA ligase 2 K227Q (NEB), 60U RNasin). The samples were incubated for 4 hours at 25°C.
The 5′-ends of bound RNAs were radiolabeled with 30U T4 PNK (NEB) and 3 μl 32P-γATP (1.1 µCi; Perkin Elmer) in 1xPNK buffer for 40 min at 37°C, after which 100 mM cold ATP (Roche) was added to a final concentration of 1 mM, and the incubation prolonged for another 20 min to complete 5′-end phosphorylation. The resin was washed three times with 500 μl wash buffer 1 and three times with an equal volume of 1xPNK buffer. For on-bead 5′-linker ligation, the beads were incubated for 16h at 16°C in 1xPNK buffer with 40U T4 RNA ligase I (NEB) and 1 μl 100 μM L5 adapter (Table S7), in the presence of 1 mM ATP and 60U RNasin (Promega). The Nickel-NTA beads were washed three times with wash buffer 1 and three times with wash buffer 2 (50 mM Tris–HCl pH 7.8, 50 mM NaCl, 10 mM imidazole, 0.1% NP-40, 5 mM β-mercaptoethanol). The protein-RNA complexes were eluted in two steps in new tubes with 200 μl of elution buffer (wash buffer 2 with 250 mM imidazole). The protein-RNA complexes were precipitated on ice by adding TCA to a final concentration of 20%, followed by a 20-minute centrifugation at 4°C at 13.4 krpm. Pellets were washed with 800 μl acetone and air dried for a few minutes in the hood. The protein pellet was resuspended and incubated at 65°C in 20 μl 1x NuPage loading buffer (Novex), resolved on 4–12% NuPAGE gels, and visualised by autoradiography. The cross-linked protein-RNA complexes were cut directly from the gel and incubated with 160 μg of Proteinase K (Roche) in 600 μl wash buffer 2 supplemented with 1% SDS and 5 mM EDTA at 55°C for 2-3 hours with mixing. The RNA was subsequently extracted by phenol-chloroform extraction and ethanol precipitated. The RNA pellet was directly resuspended in RT buffer and was reverse transcribed in a single reaction with the SuperScript IV system (Invitrogen) according to the manufacturer’s instructions using the PE_reverse oligo as primer. The cDNA was purified with the DNA Clean and Concentrator 5 kit (Zymo Research) and eluted in 11 μl DEPC water.
Half of the cDNA (5 μl) was amplified by PCR using Pfu Polymerase (Promega) with the following cycling conditions: 95°C for 2 min; 20-24 cycles of 95°C for 20 s, 52°C for 30 s and 72°C for 1 min; and a final extension at 72°C for 5 min. The PCR primers are listed in Table S7. PCR products were treated with 40U Exonuclease 1 (NEB) for 1 h at 37°C to remove free oligonucleotides and purified by ethanol precipitation or with the DNA Clean and Concentrator 5 kit (Zymo Research). Libraries were resolved on a 2% MetaPhor agarose (Lonza) gel and 175-300 bp fragments were gel-extracted with the MinElute kit (Qiagen) according to the manufacturer’s instructions. All libraries were quantified on a 2100 Bioanalyzer using the High-Sensitivity DNA assay and on a Qubit 4 (Invitrogen). Individual libraries were pooled based on concentration and barcode sequence identity. Paired-end sequencing (75 bp) was performed by Edinburgh Genomics on an Illumina HiSeq 4000 platform.
RNAseq
E. coli MG1655 was cultured and harvested as described for the CLASH procedure. Total RNA was extracted using the guanidinium thiocyanate-phenol method. RNA integrity was assessed with the Prokaryote Total RNA Nano assay on a 2100 Bioanalyzer (Agilent). Genomic DNA was removed by incubating 10 μg of total RNA with 2U Turbo DNase (Ambion) in a 50 μl final volume for 30 minutes at 37°C in the presence of 10 U SuperaseIn RNase Inhibitor (Ambion). RNA was subsequently phenol-chloroform extracted and purified by ethanol precipitation. Ribosomal RNA was removed with the Ribo-Zero rRNA Removal Kit (Gram-Negative Bacteria) (Illumina) according to the manufacturer’s instructions. Successful rRNA depletion was verified on the Agilent 2100 Bioanalyzer. The RNA was fragmented for 5 min at 95°C in the presence of Superscript III buffer (Invitrogen), followed by a five-minute incubation on ice. Reverse transcription (RT) was performed with Superscript III (Invitrogen) in 20 μl reactions according to the manufacturer’s procedures, using 250 ng of ribosomal RNA-depleted RNA and 2.5 μM random hexamers (PE_solexa_hexamer, oligo 73, Table S7). The RNA and free primers were degraded using 20U of Exonuclease I (NEB) and 50U RNaseIf (NEB), and the cDNA was purified with the DNA Clean & Concentrator 5 kit (Zymo Research). Ligation of the 5′ adapter (P5_phospho_adapter, oligo 39) to the cDNA was performed using CircLigase II (EpiCentre) for 6 hours at 60°C, followed by a 10-minute inactivation at 80°C. The cDNA was purified with the DNA Clean & Concentrator 5 kit (Zymo Research). Half of the cDNA library was PCR-amplified with Pfu polymerase (Promega) using the P5 forward PCR oligonucleotide and barcoded BC reverse oligonucleotides (200 nM; Table S7) with the following cycling conditions: 95°C for 2 min; 20 cycles of 95°C for 20 s, 52°C for 30 s and 72°C for 1 min; and a final extension at 72°C for 5 min. The PCR products were treated with Exonuclease 1 (NEB) for 1 h at 37°C and purified by ethanol precipitation.
Libraries were resolved on a 2% MetaPhor agarose (Lonza) gel and 200-500 bp fragments were gel-extracted using the MinElute kit (Qiagen). All libraries were quantified on a 2100 Bioanalyzer using the High-Sensitivity DNA assay. Individual libraries were pooled in equimolar amounts. Paired-end sequencing (75 bp) was performed by Edinburgh Genomics on an Illumina HiSeq 4000 platform.
Small RNA over-expression studies
Individual TOP10F’ clones carrying pZA21 and pZE12-derived sRNA constructs and control plasmids combinations (Table S7) were cultured to OD600 0.1 and expression of sRNAs was induced with IPTG and anhydrotetracycline hydrochloride for one hour. Cells were collected by centrifugation for 30 seconds at 14000 rpm, flash-frozen in liquid nitrogen and total RNA was isolated as above. Genomic DNA was digested with Turbo DNase (Ambion), then the RNA was purified with RNAClean XP beads (Beckman Coulter). Gene expression was quantified by RT-qPCR (see below) using 10 ng total RNA as template, and expressed as fold change relative to the reference sample containing pJV30086 or pZA21.
For pulse-overexpression studies, overnight MG1655 cultures containing pBAD∷sRNA and empty pBAD+1 control plasmids were inoculated in fresh LB-ampicillin medium at a starting OD600 of 0.05 and grown aerobically at 37°C to OD600 0.4. Pre-induction (0 min) and post-induction samples were harvested. For induction, cultures were supplemented with L-arabinose, and cells were rapidly collected by filtration and flash-frozen in liquid nitrogen at the indicated time-points. RNA was extracted from three biological replicate time-series, followed by RNA-seq library preparation, next generation sequencing and DESeq2 analysis of differentially expressed genes.
GFP reporter system to quantify sRNA effect on target expression
A two-plasmid system63,64 was used, with modifications, to express each sRNA and the corresponding mRNA-sfGFP fusion. The sRNA and sfGFP-fusion plasmids were co-transformed into E. coli TOP10 cells by electroporation and cells were maintained on dual selection with ampicillin and chloramphenicol. In TOP10 cells, the mRNA-sfGFP constructs are constitutively expressed, whereas sRNA expression requires L-arabinose induction. The expression of sfGFP-fused targets in the presence or absence of sRNAs was quantified at the protein level by plate reader experiments and at the RNA level by RT-qPCR.
For the plate reader experiments, a single colony of a bacterial strain harbouring an sRNA-target-sfGFP combination was inoculated in a 96-well Flat Bottom Transparent Polystyrene plate (Fisher Scientific) with lid (Thermo Scientific) and cultured overnight at 37°C in 100 μl LB supplemented with antibiotics and L-arabinose to induce expression of sRNAs. The next day, each overnight inoculum was diluted 1:100 by serial dilution, in triplicate, in LB with freshly prepared L-arabinose to a final volume of 100 μl. Cultures were grown in a 96-well plate in an Infinite 200 Pro plate reader (Tecan) controlled by i-control software (Tecan) for 192 cycles at 37°C with 1 min orbital shaking (4 mm amplitude) every 5th minute. To monitor optical density over time, the following parameters were used: wavelength 600 nm, bandwidth 9 nm. Fluorescence was monitored with excitation wavelength 480 nm, bandwidth 9 nm and emission wavelength 520 nm, bandwidth 20 nm. Measurements were recorded at 5-minute intervals, by top reading. Raw data were processed following guidance from previous reports63. First, the OD600 range over which fluorescence increased linearly with OD600 was identified for each individual replicate. Only the linear range common to all triplicates was considered for further analysis. For each set of triplicates, the mean fluorescence was calculated at each OD600. To correct for background and cell autofluorescence, the mean fluorescence of a strain with plasmid pXG-0 was subtracted from that of all strains with GFP plasmids at the equivalent OD600. Finally, a curve was generated for each sample, plotting the background-corrected fluorescence (GFP) versus OD600. The experiments were performed for three biological replicates, and mean values and standard errors of the mean were calculated for each strain.
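The background-correction step can be illustrated with a minimal Python sketch. It is not the analysis script used for this study: the function names are hypothetical, and it assumes that fluorescence readings have already been restricted to the shared linear range and keyed by OD600 values that match exactly between replicates and the pXG-0 control.

```python
from statistics import mean

def mean_by_od(replicates):
    """Average fluorescence across replicates at each OD600 shared by all of them.

    replicates: list of dicts mapping OD600 -> fluorescence (assumed layout).
    """
    shared = set(replicates[0]).intersection(*replicates[1:])
    return {od: mean(rep[od] for rep in replicates) for od in sorted(shared)}

def background_corrected(sample_reps, control_reps):
    """Subtract the mean control (pXG-0) fluorescence at the equivalent OD600."""
    sample = mean_by_od(sample_reps)
    control = mean_by_od(control_reps)
    return {od: sample[od] - control[od] for od in sample if od in control}

# Toy numbers for illustration only (not measured data).
sample_reps = [
    {0.2: 110.0, 0.3: 160.0},
    {0.2: 120.0, 0.3: 170.0},
    {0.2: 130.0, 0.3: 180.0},
]
control_reps = [{0.2: 20.0, 0.3: 30.0}]
corrected = background_corrected(sample_reps, control_reps)
```

In practice OD600 readings rarely coincide exactly between wells, so real data would need binning or interpolation before this subtraction; that step is deliberately left out of the sketch.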
RT-qPCR
Total RNA (12.5 µg) was treated with 2U of Turbo DNase (Ambion) for 1 hour at 37°C in a 10 μl reaction in the presence of 2U SuperaseIn RNase inhibitor (Ambion). The DNase was inactivated by a 10-minute incubation at 75°C. Reverse transcription (RT) was performed in a single reaction for all target genes of interest using a mix of gene-specific RT primers at 3.5 μM concentration each. After addition of 2.5 μl RT primer mix, the RNA and primers were denatured at 70°C for 3 min, then snap chilled and incubated on ice for 5 min. RT was performed for 1 hour at 55°C with SuperScript III (Invitrogen) using 5 μl of RNA-RT primer mix in a 10 μl final volume (100 U Superscript III, 2.5 mM DTT, 1xFS Buffer, 0.75 mM dNTPs) in the presence of 1U RNasin (Promega). RT was followed by treatment with 5U RNase H for 30 min at 37°C to remove the RNA from the RNA-cDNA duplexes. The cDNA was diluted 10-fold with DEPC water. Quantitative PCR was performed on 50 ng of DNase I-treated total RNA using the Brilliant III UltraFast SYBR Green QPCR Master Mix (Agilent) and the Luna Universal One-Step RT-qPCR Kit (NEB) according to the manufacturer’s instructions. The qPCRs were run on a LightCycler 480 (Roche), and the specificity of the product was assessed by generating melting curves, as follows: 65°C for 60 s, then 95°C (0.11 ramp rate with 5 acquisitions per °C, continuous). The data analyses were performed with the IDEAS2.0 software at default settings: Absolute Quantification/Fit Points for Cp determination and Melt Curve Genotyping. The qPCR efficiency of primer pairs was assessed by generating standard curves from serial dilutions of template RNA or genomic DNA. Negative controls such as -RT or no-template controls were used throughout, and the qPCR for all samples was performed in technical triplicate. For samples whose technical triplicates had a Cp standard deviation > 0.3, outliers were discarded from the analyses.
To calculate the fold-change relative to the control, the 2^(-ΔΔCp) method was employed, using recA or 5S rRNA (rrfD) as the reference genes where indicated. Experiments were performed for a minimum of two biological replicates, and the mean fold-change and standard error of the mean were computed. Unless otherwise stated, the significance of the fold-change difference compared to the reference sample control (for which fold-change = 1) was tested with a one-sample t-test.
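As a worked illustration of the 2^(-ΔΔCp) calculation, the following Python sketch computes a fold-change from Cp values and summarises biological replicates. The function and argument names are hypothetical, the Cp values are invented for the example, and the base of 2 assumes perfect doubling of product per cycle.

```python
import math
import statistics

def fold_change(cp_target_sample, cp_ref_sample, cp_target_control, cp_ref_control):
    """Fold-change of a target gene relative to the control by 2^(-ddCp).

    dCp normalises the target to the reference gene (e.g. recA) within each
    condition; ddCp is the difference between sample and control dCp.
    """
    d_cp_sample = cp_target_sample - cp_ref_sample
    d_cp_control = cp_target_control - cp_ref_control
    return 2 ** -(d_cp_sample - d_cp_control)

def mean_and_sem(values):
    """Mean and standard error of the mean for two or more replicate values."""
    return statistics.mean(values), statistics.stdev(values) / math.sqrt(len(values))

# Invented Cp values: the target crosses threshold 2 cycles earlier in the
# sample than in the control, with an unchanged reference gene.
example_fc = fold_change(22.0, 18.0, 24.0, 18.0)   # 2 ** 2 = 4.0
example_mean, example_sem = mean_and_sem([4.0, 2.0])
```

A control sample compared with itself gives ΔΔCp = 0 and therefore a fold-change of 1, which is the null value tested by the one-sample t-test.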
Northern Blot analysis
Total RNA was extracted from cell lysates by GTC-Phenol extraction. For large RNA fragments, 10 μg of total RNA was resolved on a 1.25% BPTE-gel (pH 7) and transferred to a nylon membrane (HyBond N+, GE Healthcare) by capillary transfer. For short RNA fragments, 10 μg of total RNA was separated on an 8% polyacrylamide TBE-Urea gel and transferred to a nylon membrane by electroblotting for four hours at 50 V. All membranes were UV-crosslinked in a Stratalinker at intensity 1200. Membranes were pre-hybridised in 10 ml of UltraHyb Oligo Hyb (Ambion) for one hour and probed with 32P-labeled DNA oligonucleotides for 12-18 hours in a hybridization oven at 42°C. The sequences of the probes used for Northern blot detection are detailed in Table S7. Membranes were washed twice in 2xSSC with 0.5% SDS for 10 minutes, and visualized with a Phosphor imaging screen and FujiFilm FLA-5100 Scanner (IP-S mode). For detection of highly abundant species (5S rRNA), autoradiography was used for exposure.
Western blot analyses
The E. coli MG1655 Hfq∷htf strain was cultured in LB under the same conditions as for the CLASH and RNA-seq experiments. Cells were collected at OD600 0.4, 0.8, 1.2, 1.8, 2.4, 3 and 4. A volume of cell lysate containing 40 μg protein was run on PAGE gels for each OD600 and transferred to a nitrocellulose membrane. The membranes were blocked for one hour in blocking solution (5% non-fat milk in PBST (1X phosphate-buffered saline, 0.1% Tween-20)). To detect Hfq-HTF protein, the membrane was probed overnight at 4°C with the Rabbit anti-TAP polyclonal primary antibody (Thermo Fisher, 1:5000 dilution in blocking solution), which recognizes an epitope in the region between the TEV-cleavage site and His6. For the loading control, we used a rabbit polyclonal anti-GroEL primary antibody (Abcam, 1:300000 dilution) for 2 hours at room temperature. After three 10-minute PBST washes, the membranes were incubated for two hours with a Goat anti-rabbit secondary antibody (Thermo Fisher, 1:5000 in blocking solution) at room temperature. Finally, after three 10-minute PBST washes, the proteins were visualised using Pierce enhanced chemiluminescence (ECL, Thermo Fisher) according to the manufacturer’s instructions.
Primer extension analysis
One microgram of total RNA was reverse-transcribed with SuperScript III reverse transcriptase (Invitrogen) using 32P-radiolabelled oligonucleotides as primers (Table S7). Primers were added to the RNA and annealing was performed by heating the samples at 85°C for three minutes and then snap chilling them on ice. The RT was performed for one hour at 45°C, followed by Exonuclease I and RNaseIf (NEB) (0.5 μl each) treatment for 30 minutes at 37°C. Reactions were stopped by mixing with an equal volume of 2X RNA loading dye (NEB), followed by a 2-minute incubation at 95°C and snap chilling. The sequencing ladders were prepared with Sequenase v2.0 (USB/Affymetrix) according to the manufacturer’s instructions. Samples were resolved on 6% PAA/8M TBE-urea gels and visualized using the FLA5100 phosphoimager system.
Construction of the MdoR seed-mutant strain
To mutate the chromosomal copy of MdoR, we used the λRed system 87. We amplified the integration cassette from plasmid pKD4 with ultramers 895 and 896, containing homology regions to the coding sequence of malG, the desired MdoR sequence and to the region immediately downstream of the Rho-independent terminator, respectively (Fig S8). With this design, the scar after removal of the Kanr cassette was expected at a site outside the MdoR/malG sequence. The PCR product was electroporated into E. coli MG1655 strains carrying the pKD46 plasmid, from which λRed recombinase was induced with 10 mM L-arabinose. Correct replacement of the MdoR seed sequence was screened by colony PCR using primer pairs 725 & 909 and 726 & 910. The antibiotic resistance cassette was removed from substitution mutants by FLP-recombinase expressed constitutively from pE-FLP88. Successful allele replacement was confirmed by Sanger sequencing.
Polysome profiling analyses
Wild-type E. coli MG1655 containing empty pBAD plasmid and an isogenic strain containing pBAD∷MdoR were grown in LB until OD600 0.4, then treated with L-arabinose (15 minutes in total) to induce overexpression of MdoR, and cycloheximide (Sigma) at a final concentration of 100 µg/ml for 3 minutes. 200 ml of cells were harvested by rapid filtration and flash frozen. The cells were washed in ice-cold PBS supplemented with 100 µg/ml cycloheximide.
Polysomal profiling was performed according to previously described protocols89,90 with minor changes in the lysis buffer (10 mM NaCl, 10 mM MgCl2, 10 mM Tris-HCl (pH 7.5), 100 μg/ml cycloheximide, 1% (w/v) Na-Deoxycholate, 1U DNAse, 0.6U/mL RiboLock in DEPC water). Lysates were kept on ice for 30 min and centrifuged three times at 15000 g for 10 min. The supernatants were loaded on a linear 10%–30% [w/v] sucrose gradient and centrifuged for 4 hours using a SW41 rotor at 40000 rpm in a Beckman Optima XPN-100 Ultracentrifuge. Fractions of 1 mL in volume were collected while monitoring the absorbance at 254 nm with the UA-6 UV/VIS detector (Teledyne Isco). Fractions from the entire gradient (total RNA) and the fractions corresponding to ribosomes (70S) and polysomes (polysomal RNA) were pooled and RNA was purified by acid phenol–chloroform extraction as described previously91.
Terminator™ 5′-Phosphate-Dependent Exonuclease treatment
Ten micrograms of total RNA extracted from cell samples at OD600 1.2 and 1.8 were treated with Terminator 5′-Phosphate-Dependent Exonuclease (Epicentre) as per the manufacturer’s instructions using Buffer A. The reaction was terminated by phenol extraction and ethanol precipitation, and the RNA was loaded on 8% polyacrylamide-urea gels and transferred to nylon membranes that were probed for MdoR, RaiZ, RybB and 5S rRNA (Table S7).
Inactivation of RNase E
The E. coli rne-3071 (Table S7) and wild-type strains were grown in LB medium at 33°C to an OD600 of 1.5, then shifted to 43°C for 30 min to inactivate RNase E. Cells were harvested by rapid filtration (0.45 μm filters, Merck Millipore) and flash-frozen in liquid nitrogen. Detection of RNAs was performed by Northern blot.
Seed mutant studies
Wild-type MG1655 and seed mutant strains were grown overnight in minimal medium with glucose. Next day, each starter culture was split and inoculated at OD600 0.05 in fresh M9 medium with glucose or maltose as the sole carbon source. Growth was monitored and cells were harvested at OD600 0.5. Total RNA was extracted and gene expression was quantified by RT-qPCR or Northern Blot.
Computational analysis
Pre-processing of the raw sequencing data
Raw sequencing reads in fastq files were processed using a pipeline developed by Sander Granneman, which uses tools from the pyCRAC package92 (the entire pipeline is available at https://bitbucket.org/sgrann/). The CRAC_pipeline_PE.py pipeline first demultiplexes the data using pyBarcodeFilter.py and the in-read barcode sequences found in the L5 5′ adapters. Flexbar then trims the reads to remove 3′-adapter sequences and poor-quality nucleotides (Phred score <23). Using the random nucleotide information present in the L5 5′-adapter sequences, the reads are then collapsed to remove potential PCR duplicates. The reads were then mapped to the E. coli MG1655 genome using Novoalign (http://www.novocraft.com). To determine which genes the reads mapped to, we generated an annotation file in the Gene Transfer Format (GTF). This file contains the start and end positions of each gene on the chromosome as well as the genomic feature (i.e. sRNA, protein-coding, tRNA) it belongs to. To generate this file, we used the Rockhopper software93 on E. coli rRNA-depleted total RNA-seq data (generated by Christel Sirocchi) and a minimal GTF file obtained from ENSEMBL (without UTR information). The resulting GTF file contained information not only on the coding sequences, but also complete 5′ and 3′ UTR coordinates. We then used pyReadCounters.py with the Novoalign output files and the GTF annotation file as input to count the total number of unique cDNAs that mapped to each gene.
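The idea behind collapsing reads with the random adapter nucleotides can be sketched in a few lines of Python. This is an illustration of the principle only, not the pyCRAC implementation: the function name and the `(random_barcode, sequence)` input layout are assumptions made for the example.

```python
def collapse_duplicates(reads):
    """Keep one read per unique (random barcode, insert sequence) combination.

    reads: iterable of (random_barcode, sequence) tuples (assumed layout).
    Reads sharing both the random nucleotides and the insert sequence are
    treated as PCR duplicates of a single cDNA; first-seen order is preserved.
    """
    seen = set()
    unique = []
    for barcode, sequence in reads:
        key = (barcode, sequence)
        if key not in seen:
            seen.add(key)
            unique.append(key)
    return unique

# Toy reads: the first two are identical in barcode and sequence.
toy_reads = [
    ("ACG", "GGATTC"),
    ("ACG", "GGATTC"),
    ("TTA", "GGATTC"),
    ("ACG", "CCATGA"),
]
collapsed = collapse_duplicates(toy_reads)
```

Identical inserts carrying different random nucleotides are retained, because they most likely derive from independent ligation events rather than from amplification of one molecule.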
Normalization steps
To normalise the read count data generated with pyReadCounters.py and to correct for differences in library depth between time-points, we calculated Transcripts Per Million reads (TPM) for each gene. Briefly, for each time-point the raw counts for each gene were first divided by the gene length and then divided by the sum of all the values for the genes in that time-point to normalize for differences in library depth. Subsequently, only genes with a minimum of 5 TPM in all datasets compared were used. The TPM values for each OD600 studied were divided by the TPM values of the first sample (OD600 0.4). Thus, the fold-change starts at 1 for all samples in the OD600 series. These ratios were then log2-transformed. The log2-transformed fold-changes were used to compare RNA-seq and Hfq cross-linking profiles among samples, and to perform k-means clustering with the Python sklearn.cluster.KMeans class.
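The normalisation steps can be summarised in a short Python sketch. It is a minimal illustration under stated assumptions, not the original analysis code: function names, the dictionary-based input layout and the toy counts are hypothetical, and the scaling to one million follows the standard TPM definition.

```python
import math

def tpm(counts, lengths):
    """Transcripts per million for one sample.

    counts, lengths: dicts keyed by gene name (assumed layout).
    """
    rates = {gene: counts[gene] / lengths[gene] for gene in counts}
    total = sum(rates.values())
    return {gene: rate / total * 1e6 for gene, rate in rates.items()}

def log2_fold_changes(tpm_series, min_tpm=5.0):
    """log2(TPM / TPM of first sample) for genes with >= min_tpm in every sample.

    tpm_series: list of per-sample TPM dicts ordered by OD600; the first
    entry (OD600 0.4 in the text) is the reference.
    """
    reference = tpm_series[0]
    kept = [g for g in reference
            if all(sample.get(g, 0.0) >= min_tpm for sample in tpm_series)]
    return {g: [math.log2(sample[g] / reference[g]) for sample in tpm_series]
            for g in kept}

# Toy counts for two genes of equal length at two time-points.
lengths = {"geneA": 1000, "geneB": 1000}
sample_1 = tpm({"geneA": 100, "geneB": 300}, lengths)
sample_2 = tpm({"geneA": 200, "geneB": 200}, lengths)
profiles = log2_fold_changes([sample_1, sample_2])
```

Each gene's profile starts at 0 (a fold-change of 1) for the reference sample; the resulting rows are the kind of input that could then be passed to a clustering routine such as sklearn.cluster.KMeans.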
Hfq-binding coverage plots
For the analysis of the Hfq binding sites, the pyCRAC package92 was used (versions 1.3.2 and 1.3.3). The pyBinCollector tool was used to generate Hfq cross-linking distribution plots over genomic features. First, pyCalculateFDRs.py was used to identify the significantly enriched Hfq-binding peaks (minimum 10 reads, minimum 20 nucleotide intervals). Next, pyBinCollector was used to normalize gene lengths by dividing their sequences into 100 bins, and to calculate nucleotide densities for each bin. To generate the distribution profile for all genes individually, we normalized the total number of read clusters (assemblies of overlapping cDNA sequences) covering each nucleotide position by the total number of clusters that cover the gene. Motif searches were performed with pyMotif.py using the significantly enriched Hfq-binding peaks (FDR intervals). The 4-8 nucleotide k-mers with Z-scores above the indicated threshold were used for making the motif logo with the k-mer probability logo tool 94 with the -ranked option (http://kplogo.wi.mit.edu/).
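The length normalisation by binning can be illustrated with the following Python sketch. It is a simplified stand-in, not the pyBinCollector algorithm: the function name is hypothetical, the input is assumed to be a per-nucleotide coverage list for one gene, and each bin is normalised by the summed coverage of the gene rather than by a cluster count.

```python
def binned_density(per_nt_coverage, n_bins=100):
    """Project per-nucleotide coverage of one gene onto n_bins equal-width bins.

    Returns the fraction of the gene's total coverage falling in each bin, so
    that genes of different lengths can be compared on a common axis.
    Genes shorter than n_bins leave some bins empty.
    """
    length = len(per_nt_coverage)
    total = sum(per_nt_coverage)
    bins = [0.0] * n_bins
    for position, coverage in enumerate(per_nt_coverage):
        bins[position * n_bins // length] += coverage
    return [value / total if total else 0.0 for value in bins]

# Toy gene of four nucleotides split into two bins.
toy_profile = binned_density([1, 1, 2, 2], n_bins=2)
```

Because every gene profile sums to 1, averaging such profiles across genes gives each gene equal weight regardless of its expression level.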
Analysis of chimeric reads
Chimeric reads were identified using the hyb package with default settings35 and further analysed using the pyCRAC package92. To apply this single-end-specific pipeline to our paired-end sequencing data, we joined forward and reverse reads using FLASH 95, which merges overlapping paired reads into a single read. These were then analysed using hyb. The -anti option for the hyb pipeline was used to be able to use a genomic E. coli hyb database, rather than a transcript database. Uniquely annotated hybrids (.ua.hyb) were used in subsequent analyses. To visualise the hybrids in the genome browser, the .ua.hyb output files were converted to the GTF format. To generate distribution plots for the genes to which the chimeric reads mapped, the parts of the chimeras were clustered with pyClusterReads.py, and BEDtools96 (intersectBed) was used to remove clusters that map to multiple regions. To produce the coverage plots with pyBinCollector, each cluster was counted only once, and the number of reads belonging to each cluster was ignored.
sRNA density plots
To visualize the nucleotide read density of sRNA-target pairs for a given sRNA, we first merged the hyb datasets for all OD600 values and biological replicates, and retained the interactions represented by at least two chimeric reads in the unified dataset. For each sRNA-target pair in the filtered dataset, the hit counts at each nucleotide position for all chimeras were summed, regardless of the orientation of the chimeras. The count data were log2-transformed as log2(chimera count + 1), which avoids undefined values at nucleotide positions with 0 hits.
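A minimal Python sketch of the per-nucleotide density calculation is given below. The function name and the 0-based, half-open `(start, end)` fragment coordinates are assumptions made for the illustration; they do not reflect the hyb output format.

```python
import math

def chimera_density(rna_length, fragments):
    """log2(count + 1) of chimera fragments covering each nucleotide.

    rna_length: length of the RNA in nucleotides.
    fragments: iterable of (start, end) intervals, 0-based and half-open
    (assumed coordinate convention), pooled over both chimera orientations.
    """
    counts = [0] * rna_length
    for start, end in fragments:
        for position in range(max(start, 0), min(end, rna_length)):
            counts[position] += 1
    return [math.log2(count + 1) for count in counts]

# Toy example: two overlapping fragments on a 5-nt RNA.
toy_density = chimera_density(5, [(0, 3), (2, 5)])
```

Adding 1 before taking the logarithm maps uncovered positions to 0 instead of an undefined value, while leaving the ranking of covered positions unchanged.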
sRNA-sRNA network visualization
Only the interactions represented by at least two chimeric reads in the merged CLASH dataset were considered. For each such interaction, chimera counts corresponding to sRNA-sRNA in either orientation were summed, log2-transformed and visualized with the igraph Python package.
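The edge weights for such a network can be derived as in the following Python sketch, which uses only the standard library and stops short of the igraph plotting step. The function name and placeholder sRNA labels are hypothetical, and the sketch assumes that the two-read threshold is applied after summing both orientations.

```python
import math
from collections import Counter

def srna_network_edges(chimeras, min_count=2):
    """Undirected, log2-weighted edges from sRNA-sRNA chimeric reads.

    chimeras: iterable of (rna_5prime, rna_3prime) pairs, one per chimeric read.
    Counts for A-B and B-A are summed; pairs supported by fewer than
    min_count reads in total are dropped.
    """
    totals = Counter(tuple(sorted(pair)) for pair in chimeras)
    return {pair: math.log2(count)
            for pair, count in totals.items() if count >= min_count}

# Placeholder labels, not real interactions.
toy_chimeras = [
    ("sRNA_A", "sRNA_B"),
    ("sRNA_B", "sRNA_A"),
    ("sRNA_C", "sRNA_D"),
]
edges = srna_network_edges(toy_chimeras)
```

The resulting dictionary maps each sRNA pair to a weight that could be supplied as an edge attribute when building the graph for visualisation.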
Differential expression analyses
For the differential expression analyses DESeq2 was used54. Three MdoR pulse-overexpression datasets were compared to three pBAD Control overexpression datasets. Only differentially expressed genes that had an adjusted p-value of 0.05 or lower were considered significant.
Multiple sequence alignments and conservation analyses
The homologous sequences of MdoR in other enterobacteria were retrieved by BLAST. JalView was used for the multiple sequence alignments, using the MAFFT algorithm97.
Data and Code availability
The next generation sequencing data have been deposited on the NCBI Gene Expression Omnibus (GEO) with accession number GSE123050. The python pyCRAC and kinetic-CRAC software packages used for analysing the data are available from https://bitbucket.org/sgrann. The hyb pipeline for identifying chimeric reads is available from https://github.com/gkudla/hyb. The FLASH algorithm for merging paired reads is available from https://github.com/dstreett/FLASH2.
Author contributions
I.A.I, R.v.N and S.G conceived the experiments. All the authors contributed to designing and performing the experiments. I.A.I performed the majority of the experiments and carried out most of the bioinformatics data analyses. M.M. and G.V’s contributions were instrumental for generating the polysome analysis data. I.A.I and S.G drafted the manuscript and all the authors reviewed the manuscript and approved the final version.
Competing interest
The authors declare no competing financial interest.
Materials & Correspondence
All requests for code, materials and reagents should be sent to Sander Granneman (sgrannem@ed.ac.uk).
Acknowledgement
We are grateful to Lionello Bossi for his valuable feedback on the project and for very fruitful discussions. We would like to thank the members of the Granneman lab for critically reading the manuscript. This work was supported by grants from the Wellcome Trust (091549 to S.G and 102334 to I.A.I.), the Wellcome Trust Centre for Cell Biology core grant (092076), a Medical Research Council non Clinical Senior Research Fellowship (MR/R008205/1 to S.G.) and the Australian National Health and Medical Research Council Project grants (APP1067241 and APP1139315 to J.J.T). G.V is supported by IMMAGINA BioTechnology s.r.l. and the Provincia Autonoma di Trento, Italy. Next generation sequencing was carried out by Edinburgh Genomics (HiSeq4000), The University of Edinburgh. Edinburgh Genomics is partly supported through core grants from NERC (R8/H10/56), MRC (MR/K001744/1) and BBSRC (BB/J004243/1).