Abstract
MiRNA biogenesis is highly regulated at the post-transcriptional level; however, the role of sequence and secondary RNA structure in this process has not been extensively studied. A single G to A substitution in the terminal loop of pri-mir-30c-1, found in breast cancer patients, was previously described to result in increased levels of mature miRNA. Here, we report that this genetic variant directly affects Drosha-mediated processing of pri-mir-30c-1 in vitro and in cultured cells. Structural analysis of this variant revealed an altered RNA structure that facilitates the interaction with SRSF3, an SR protein family member that promotes pri-miRNA processing. Our results are compatible with a model whereby a genetic variant in pri-mir-30c-1 leads to a secondary RNA structure rearrangement that facilitates binding of SRSF3, resulting in increased levels of miR-30c. These data highlight that primary sequence determinants and RNA structure are key regulators of miRNA biogenesis.
MicroRNAs (miRNAs) are short non-coding RNAs that negatively regulate the expression of a large proportion of cellular mRNAs, thus affecting a multitude of cellular and developmental pathways1,2. The canonical miRNA biogenesis pathway involves two sequential processing events catalyzed by RNase III enzymes. In the nucleus, the Microprocessor complex, comprising the RNase III enzyme Drosha, the double-stranded RNA-binding protein, DGCR8, and additional proteins carries out the first processing event, which results in the production of precursor miRNAs (pre-miRNAs)3,4. These are exported to the cytoplasm, where a second processing event carried out by another RNase III enzyme, Dicer, leads to the production of mature miRNAs that are loaded into the RISC complex5.
Due to the central role of miRNAs in the control of gene expression, their levels must be tightly controlled. As such, dysregulation of miRNA expression has been shown to result in grossly aberrant gene expression and leads to human disease6–9. In particular, the Microprocessor-mediated step of miRNA biogenesis has been shown to be regulated by multiple signaling pathways, including the transforming growth factor-β (TGF-β) pathway as well as the p53 response, leading to activation of subsets of individual miRNAs10,11. Furthermore, altered miRNA expression has been associated with the progression of cancer12,13, where a global downregulation of miRNA expression is usually observed14,15. It was recently shown that miRNA biogenesis can also be regulated in a cell-density-dependent manner via the Hippo-signaling pathway, and that the observed perturbation of this pathway in tumors may underlie the widespread downregulation of miRNAs in cancer16. Thus, miRNA production is tightly controlled at different levels during the biogenesis cascade. Extensive evidence has shown that RNA-binding proteins (RNA-BPs) recognize the terminal loop of miRNA precursors and influence, either positively or negatively, the processing steps carried out by Drosha in the nucleus and/or Dicer in the cytoplasm. These include the hnRNP proteins, hnRNP K and hnRNP A1, as well as the cold-shock domain protein, Lin28, and the RNA helicases, p68/p7217. In the case of the multifunctional RNA-binding protein hnRNP A1, we have previously shown that it can act as an auxiliary factor by binding to the conserved terminal loop of pri-miR-18a and promoting its Microprocessor-mediated processing18,19. Conversely, the same protein can act as a repressor of let-7 production in differentiated cells20.
Several studies have shown that there is a correlation between the presence of polymorphisms in pri-miRNAs and the corresponding levels of mature miRNAs21; however, a mechanistic understanding of how sequence variation and RNA structure control miRNA biogenesis has not been explored in great detail. Screening of novel genetic variants in human precursor miRNAs linked to breast cancer identified two novel rare variants in the precursors of miR-30c and miR-17, resulting in conformational changes in the predicted secondary structures and also leading to altered expression of the corresponding mature miRNAs. These patients were non-carriers of BRCA1 or BRCA2 mutations, suggesting the possibility that familial breast cancer may be caused by variation in these miRNAs22. In particular, the single G to A substitution in primary miR-30c-1 (pri-mir-30c-1) terminal loop, which was also later observed in gastric cancer patients23, results in an increase in the abundance of the mature miRNA.
Here, we investigated the mechanism by which the pri-mir-30c-1 variant detected in breast cancer patients results in an increased expression of this miRNA. We found that this genetic variant directly affects the Microprocessor-mediated processing of this miRNA. A combination of structural analysis with RNA chromatography coupled to mass spectrometry revealed changes in the pri-miRNA structure that lead to differential binding of a protein factor, SRSF3, that has been previously reported to act as a miRNA biogenesis factor. These results provide a mechanism by which the pri-mir-30c-1 genetic variant results in an increased expression of the mature miR-30c. Altogether, these data highlight that primary sequence as well as RNA structure have a crucial role in the post-transcriptional regulation of miRNA biogenesis.
RESULTS
The G/A substitution in pri-miR-30c-1 affects the Microprocessor-mediated processing of the primary miRNA
In order to understand the mechanism underlying miR-30c deregulation in breast and gastric cancers, we investigated how the reported G27-to-A mutation observed in a Chinese population might affect miRNA biogenesis. It was previously shown that this substitution results in an increase in the abundance of the mature miRNA; however, the mechanism that leads to an increased expression is unknown22,23. First, we transiently transfected MCF7 breast cancer cells with constructs encoding 380 nucleotides (nt) of primary hsa-pri-mir-30c-1 (pri-miRNA), either in a WT version or bearing the G/A variant (Fig. 1a). We observed that the G/A substitution resulted in increased levels of mature miR-30c (Fig. 1b), resembling the situation observed in patients with this mutation. Furthermore, this was not due to increased transcription of the G/A-harboring pri-miRNA, as shown by unchanged pri-miRNA levels (data not shown). In order to dissect the precise step of the miRNA biogenesis pathway that is affected by the G/A substitution, we used an RNA version of pri-mir-30c-1 that has yet to undergo processing by the Microprocessor in the nucleus and by Dicer in the cytoplasm (pri-miRNA). As a counterpart, we transfected an RNA oligonucleotide that mimics the precursor miRNA (pre-miRNA), a sequence that arises upon processing by the Microprocessor. Importantly, we observed an approximately two- to three-fold increase in mature miR-30c levels when transfecting the G/A sequence derived from the pri-miRNA sequence, whereas no changes were detected following transfection of the pre-miRNA sequence (Fig. 1c). This experiment demonstrates that the G/A substitution exclusively affects the Drosha-mediated processing of the pri-miRNA. Moreover, we could recapitulate this result in an in vitro reaction. We found that in vitro transcribed pri-mir-30c-1 was readily processed in the presence of MCF7 total extracts, rendering a product of ~65 nts that corresponds to pre-mir-30c.
Notably, the processing of the G/A variant was increased, when compared to the WT version, as was observed in living cells (Fig. 1d). The effect of the G/A variation in the processing of pri-mir-30c-1 was also recapitulated using a purified Microprocessor (FLAG-Drosha/FLAG-DGCR8 complex) and a shorter in vitro transcribed substrate (153nt) (Supplementary Fig. 1). Altogether, these complementary approaches indicate an enhanced Microprocessor-mediated processing of the G/A variant sequence and this recapitulates what was previously observed in breast and gastric cancer patients.
The sequence of the terminal loop in pri-miR-30c-1 is highly conserved across species
The experiments described above confirmed a crucial role for the G27 residue in influencing miR-30c biogenesis. We analyzed the genomic variation in the hsa-pri-mir-30c-1 sequence across vertebrates, and detected substantial evolutionary constraint across the entire locus, as indicated by positive GERP (Genomic Evolutionary Rate Profiling) scores (Fig. 2a). Constrained residues, which highlight regions under purifying selection, are located in the mature miRNA sequences in both arms, as expected due to their effect on the regulation of gene expression. Interestingly, part of the terminal loop (TL), where G27 is embedded, also has a very high level of constraint, which is suggestive of a role of this sequence in miRNA biogenesis, as was previously described for a subset of miRNAs17,19,24. In addition, residues at the 5´ end (nts −1 to −8; −11 to −16 and −20 to −23) and the 3´ end (nts +1 to +16 and +18 to +25) are also highly constrained. Indeed, several of these residues were included as part of the stem in the in silico predicted RNA structure, suggesting their importance for maintaining the RNA secondary structure (Fig. 2b). These data led us to focus our attention on these invariant sequences as potentially having a crucial role in the regulation of miR-30c biogenesis.
RNA structural analysis reveals the specific requirement of G27 for maintaining the pri-miRNA structure
Next, in order to establish the importance of the G27 to A substitution in RNA structure, we performed structural analysis by selective 2´-hydroxyl acylation analyzed by primer extension (SHAPE)25. This approach allows quantitative RNA structural analysis at single nucleotide resolution and is mostly independent of base composition. While highly reactive residues are located at single-stranded regions, non-reactive nucleotides are involved in base pairs, non-Watson-Crick base pairs, tertiary interactions or single stacking interactions in the C2´-endo conformation26. To this end, in vitro transcribed RNA comprising 380 nt of pri-miR-30c-1 (either wild-type or the G/A variant sequence) was treated with N-methylisatoic anhydride (NMIA), which reacts with the 2´-hydroxyl group of flexible nucleotides (Fig. 3a). Gross modifications of SHAPE reactivity were observed in specific regions of the G/A variant, when compared with the wild-type sequence. The resulting profiles revealed a decreased SHAPE reactivity in the TL (residues 28-30), with a concomitant increase in the 5´ region (−18, −16, and −15) as well as in the 3´ end (nts +11, +16 and +19) (Fig. 3a,b). This result indicated the presence of different conformations in the pri-miRNA with the G-to-A substitution, as compared to its wild-type counterpart. In order to gain more insight into the folding and tertiary structure of this pri-miRNA, we assessed the solvent accessibility of each nucleotide by hydroxyl radical cleavage footprinting, with radicals generated by reduction of hydrogen peroxide by iron (II)27. Hydroxyl radicals break the accessible backbone of RNA with no sequence dependence. We defined buried regions as zones with more than two consecutive nucleotides having a reactivity (R) smaller than the mean of all reactivities, whereas exposed regions are those with more than two consecutive nucleotides having R larger than the mean of all reactivities.
We observed that the wild-type sequence presents two buried regions located between nts 8 to 40 and +9 to +25, as well as two exposed segments between nt −25 to 7 and nts 40 to +8 (Supplementary Fig. 2a). The G to A substitution caused changes in the exposure to solvent, with both the TL (nts 28 to 40) and also the 3´end region (nts +17 to +22) becoming solvent accessible. By contrast, the 5´end (nts −7 to 7) and a small region in miRNA-3p (nts 55 to 58) are no longer solvent accessible.
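The buried/exposed classification defined above can be sketched in a few lines of Python. This is only a minimal illustration under stated assumptions, not the analysis pipeline used in this study: the function name `classify_regions`, the `min_run` parameter and the reactivity values are hypothetical, and positions are reported as 0-based array indices rather than in the nucleotide numbering used in the figures.

```python
from itertools import groupby
from statistics import mean


def classify_regions(reactivity, min_run=3):
    """Group nucleotides into buried/exposed runs relative to the mean reactivity.

    A run qualifies only if it spans "more than two" consecutive nucleotides,
    i.e. at least `min_run` = 3. Returned positions are 0-based and inclusive.
    """
    avg = mean(reactivity)
    # Label each nucleotide; values exactly equal to the mean belong to neither class.
    labels = [
        "buried" if r < avg else "exposed" if r > avg else None
        for r in reactivity
    ]
    regions = {"buried": [], "exposed": []}
    pos = 0
    for label, group in groupby(labels):
        length = len(list(group))
        if label is not None and length >= min_run:
            regions[label].append((pos, pos + length - 1))
        pos += length
    return regions


# Made-up reactivity values for illustration only (not data from this study).
toy_profile = [0.1, 0.2, 0.1, 0.9, 0.8, 0.1, 0.9, 1.0, 0.95, 0.9]
regions = classify_regions(toy_profile)
print(regions)
```

In this toy profile, the two-nucleotide stretch above the mean and the isolated nucleotide below it are not assigned to any region, because neither run exceeds two consecutive nucleotides.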
Altogether, the SHAPE and hydroxyl radical data suggest that the G27A substitution is indeed affecting the RNA flexibility of pri-miRNA-30c-1, modifying both base-pairing interactions and solvent accessibility of the nucleotides located in the TL and in the basal region of the stem. This could be a consequence of the disruption of a long-distance interaction between those regions (Fig. 3c and Supplementary Fig. 2b), which could in turn modify the interaction with RNA-binding proteins important for miR-30c biogenesis.
SRSF3 binds to a basal region of hsa-pri-mir-30c-1
A working model that emerges from the data described above is that either a repressor of Microprocessor-mediated processing binds to the wild-type sequence or, alternatively, the change in RNA structure induced by the G/A sequence variation could lead to the binding of an activator. In order to identify RNA-BPs that differentially bind to either the pri-miR-30c-1 WT or G/A sequence, we performed RNase-assisted chromatography followed by mass spectrometry in MCF7 total cell extracts28. This resulted in the identification of twelve proteins that interact with the WT sequence and eight that bind to the G/A variant, seven of which were common to both substrates. Significantly, several of the common proteins were previously implicated in miRNA biogenesis and/or regulation, including the heat shock cognate 70 protein5, the hnRNP proteins, hnRNP A118,20 and hnRNP A2/B129, as well as the RNA helicase DDX174, and Poly [ADP-ribose] polymerase 1 (PARP)30 (Fig. 4a and Supplementary Fig. 3a).
We decided to focus on those interactors that showed preferential interaction with either the WT or the G/A variant sequence. In this category, the most likely candidate that interacted with the WT sequence is the RNA-binding protein FUS/TLS (fused in sarcoma/translocated in liposarcoma). Interestingly, this protein has been shown to promote miRNA biogenesis by facilitating the co-transcriptional recruitment of Drosha31; however, we could not validate the specific interaction between FUS and the miR-30c-1 WT sequence by immunoprecipitation followed by Western blot analysis (IP-WB) (Supplementary Fig. 3b). As for the G/A variant, there was a single exclusive interactor, SRSF3, which is a member of the SR family of splicing regulators. This family of proteins is involved in constitutive and alternative splicing, but some of its members have been shown to fulfil other cellular functions32,33.
Importantly, SRSF3 has also been reported to be required for miRNA biogenesis34. We could validate this interaction by RNA chromatography followed by Western blot analysis with an antibody specific for SRSF3 (Fig. 4b). We also observed preferential binding of endogenous SRSF3 protein to the G/A variant of pri-mir-30c-1, as shown by immunoprecipitation of SRSF3 followed by RT-qPCR quantification of the associated pri-miRNA (Fig. 4c). In order to analyze the interaction of SRSF3 with pri-miR-30c-1 wild-type and G/A sequences, we carried out toeprint and SHAPE assays using SRSF3 protein purified from MCF7 cells. Toeprint analysis was performed with fluorescently labeled antisense primers and capillary electrophoresis35. In this assay, bound SRSF3 blocks the reverse transcriptase and thereby reveals the site where SRSF3 is bound to the RNA. A prominent toeprint of SRSF3 with the G/A sequence was observed around nt A+25-G+24 (RNA size 170nt, position A+25) (Fig. 4d). Similarly, analysis of SHAPE reactivity in the presence of added purified SRSF3 revealed a dose-dependent protection from NMIA attack at specific RNA residues (nts −19, −18, +16, +18, +19) of the basal region of G/A (Fig. 4e,f). Of importance, a conserved CNNC motif (nts from +16 to +22), previously described as an SRSF3 binding site34,36,37, is located within the recognition site. Taken together, we conclude that the interaction of SRSF3 with pri-miR-30c-1 takes place at the CNNC motif at the basal region of the G/A variant.
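As an aside, locating CNNC motifs in an RNA sequence, including overlapping occurrences, can be sketched with a zero-width lookahead regular expression. The snippet below is a minimal illustration only: the function name `find_cnnc` and the toy sequence are hypothetical, the sequence is not the pri-mir-30c-1 sequence, and the reported positions are 0-based string indices rather than the +16 to +22 numbering used above.

```python
import re


def find_cnnc(seq):
    """Return 0-based start positions of every CNNC motif, overlaps included.

    N stands for any of A, C, G or U. DNA input is tolerated by mapping T to U.
    """
    rna = seq.upper().replace("T", "U")
    # The lookahead consumes no characters, so overlapping motifs are all reported.
    return [m.start() for m in re.finditer(r"(?=C[ACGU]{2}C)", rna)]


# Hypothetical toy sequence with two overlapping CNNC motifs (CUCC and CCAC).
toy_sequence = "AACUCCACGG"
positions = find_cnnc(toy_sequence)
print(positions)  # [2, 4]
```

In this toy example the two motifs span string positions 2 to 5 and 4 to 7, so together they occupy a six-residue window. A plain `re.findall("C..C", ...)` would report only the first of them, because non-overlapping matching skips the second.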
SRSF3 protein is responsible for the increased levels of the G/A variant of miR-30c
As previously described, SRSF3 was proposed to have a role in miRNA biogenesis by recognition of a CNNC motif located 17 nts away from the Drosha cleavage site34. Pri-mir-30c-1 has two overlapping CNNC motifs (residues from +16 to +21). Notably, accessibility around this region increased in the G/A variant, as determined by SHAPE and hydroxyl radical analysis (Fig. 3 and Supplementary Fig. 2). Furthermore, toeprint and SHAPE assays in the presence of purified SRSF3 protein confirmed the specific recognition of the CNNC motif by SRSF3 in a dose-dependent manner (Fig. 4). Next, we addressed whether the preferential binding of SRSF3 to the pri-mir-30c G/A variant sequence was responsible for its increased expression by comparing the mature levels of miR-30c-1 wild-type or G/A variant under variable levels of SRSF3 expression. To this end, we co-transfected pri-mir-30c-1 constructs in MCF7 cells under transient overexpression of SRSF3, or alternatively, transfected specific siRNAs to knock down endogenous SRSF3 protein (Supplementary Fig. 4). Of interest, we observed that reduced levels of SRSF3 drastically decreased the levels of the G/A miR-30c variant, without affecting the levels of wild-type miR-30c (Fig. 5a, compare WT vs G/A panels). By contrast, transient overexpression of SRSF3 significantly increased the levels of wild-type mature miR-30c, but had a more modest effect on the G/A variant sequence. Altogether, these experiments suggest that SRSF3 binding is limiting in the WT scenario and that it is essential to promote miRNA biogenesis in the G/A context.
To confirm the role of SRSF3 in the differential processing observed with the pri-mir-30c-1 G/A variant sequence, we proceeded to mutate the two consecutive CNNC motifs that are the natural binding sites for SRSF3 (Fig. 5b). We generated a set of mutants that affected either the first or second CNNC motif (mut1 and mut2, respectively) or a deletion of both motifs (ΔCNNC). CNNCmut1, carrying a double substitution (C+15U+17 to AA), led to a severe reduction in the levels of miR-30c expression only with the G/A variant sequence (Fig. 5b). A double substitution of the second CNNC motif, G+20C+21 to AA (CNNCmut2), behaved similarly, exclusively affecting the G/A variant. This experiment strongly suggests that the binding of SRSF3 is an important determinant of miR-30c expression. Finally, we could recapitulate the observation that SRSF3 binding is limiting for the processing of wild-type pri-mir-30c-1 in an in vitro system supplemented with purified SRSF3 protein (Fig. 6). First, we found that the FLAG-Drosha/FLAG-DGCR8 complexes used for the in vitro processing assays contained residual levels of SRSF3 protein (Supplementary Fig. 5). Thus, the relatively higher processing of the G/A variant can be explained by the preferential binding of SRSF3 present in the reaction to the G/A variant. Importantly, addition of purified SRSF3 protein increased the Microprocessor-mediated production of WT pre-mir-30c-1 (Fig. 6a), whereas addition of purified SRSF3 to the G/A variant (Fig. 6c) or to ΔCNNC variants that lack SRSF3 binding sites did not affect the processing activity (Fig. 6b, d). This is reminiscent of what was observed in MCF7 cells under variable levels of SRSF3.
DISCUSSION
The central role of miRNAs in the regulation of gene expression requires that their expression is tightly controlled. Indeed, the biogenesis of cancer-related miRNAs, including those with a role as oncogenes ('oncomiRs') or those with tumor suppressor functions, is often dysregulated in cancer13,15,38. Interestingly, some miRNAs have been shown to display both tumor suppressor and oncogenic roles, depending on the cell type and the mRNA targets39, as was described for miR-221, which exerts oncogenic properties in the liver40 but also acts as a tumor suppressor in erythroblastic leukemias41. Furthermore, miR-375 has been shown to display a dual role in prostate cancer progression, highlighting the importance of the cellular context on miRNA function42.
Despite a more comprehensive knowledge on the role of RNA-BPs in the posttranscriptional regulation of miRNA production, there is only circumstantial evidence on how RNA sequence variation and RNA structure impact on miRNA processing. There are several reports showing that a single nucleotide substitution in the sequence of precursor miRNAs could have a profound effect in their biogenesis. Nonetheless, there is limited information about SNPs in the terminal loop region (TL) of pri-miRNAs. A bioinformatic approach led to the identification of 32 such SNPs in 21 miRNA loop regions of human miRNAs43. Some studies have found a correlation between the presence of polymorphisms in pri-miRNAs and expression levels of their corresponding mature miRNAs, affecting cancer susceptibility, as shown for miR-15/16 in chronic lymphocytic leukemia (CLL)44,45, miR-146 in papillary thyroid carcinoma46 and miR196a2 in lung cancer47. Another example is the finding of a rare SNP in pre-miR-34a, which is associated with increased levels of mature miR-34a. This could be of biological significance since precise control of miR-34a expression is needed to maintain correct beta-cell function, thus this could affect type 2 diabetes susceptibility48. The emerging picture is that human genetic variation could indeed not only have a role in miRNA function by affecting miRNA seed sequences and/or miRNA binding sites in the 3’UTRs of target genes, but it can also contribute significantly to modulation of miRNA biogenesis21.
In this study, we focused on a rare genetic variation found in the conserved terminal loop of pri-mir-30c-1 (G27 to A variant) that was found in breast cancer and gastric cancer patients and leads to increased expression of miR-30c22,23. There is circumstantial evidence that miR-30c is involved in many malignancies, acting as a tumor suppressor49,50 or as an oncogene51–53. In order to understand the mechanism underlying miR-30c deregulation in breast cancer, we investigated how this mutation affects miRNA biogenesis. We show that the G-A substitution in pri-mir-30c-1 directly affects Drosha-mediated processing both in vitro and in cultured cells (Fig. 1, 5, 6 and Supplementary Fig. 1). The conservation of pri-mir-30c-1 sequences across vertebrate species highlights the importance of the primary sequence in the TL, 5´ and 3´ regions (Fig. 2), suggesting a crucial role in miRNA biogenesis. Indeed, conserved sequences in the TL have been shown to be important for recognition by auxiliary factors24 as well as for DGCR8 binding54, allowing efficient and accurate miRNA processing. It has also been shown that pri-miRNA tertiary structure is a major player in the regulation of miRNA biogenesis, as observed for the well characterized miR17-92 cluster55–57. Here, using SHAPE structural analysis, in conjunction with solvent accessibility analysis by hydroxyl radical cleavage, we found that the G/A sequence variation leads to a structural rearrangement of the apical region of the pri-miRNA affecting the conserved residues placed at the basal part of the stem (Fig. 3 and Supplementary Fig. 2). This demonstrates that pri-mir-30c-1 is organized as a complex and flexible structure, with the TL and the basal region of the stem potentially involved in a tertiary interaction. Further work is required to determine the existence of direct contacts between these regions.
Interestingly, we also observed that this RNA structure reorganization promotes the interaction with SRSF3, an SR protein family member that was demonstrated to facilitate pri-miRNA recognition and processing34 by recognizing the CNNC motif located 17 nts away from the Drosha cleavage site. Pri-mir-30c-1 has two overlapping CNNC motifs (residues from +16 to +21) (Fig. 5b). Notably, accessibility around this region increased in the G/A variant (Fig. 3). Furthermore, toeprint and SHAPE assays in the presence of purified SRSF3 protein clearly demonstrated that SRSF3 specifically recognizes the CNNC motif in a dose-dependent manner (Fig. 4).
Altogether, data presented here suggest that binding of SRSF3 to the wild-type sequence is limiting and that the structural reorganization induced by the G/A substitution makes the SRSF3 binding sites more accessible. Taking everything into account we propose a model whereby a genetic variant in a conserved region within the TL of pri-mir-30c-1 causes a reorganization of the RNA secondary structure promoting the interaction with SRSF3, which in turn enhances the Microprocessor-mediated processing of pri-mir-30c-1 leading to increased levels of miR-30c (Fig. 7). We conclude that primary sequence determinants and RNA structure are key regulators of miRNA biogenesis.
METHODS
Methods and any associated references are available in the online version of the paper.
AUTHOR CONTRIBUTIONS
N.F., R.A.C., S.M. and J.F.C. conceived, designed and interpreted the experiments and wrote the manuscript. N.F. performed most of the experiments and data analysis. R.A.C. carried out in vitro pri-miRNA processing and N.H. performed biochemical experiments. S.M. carried out some of the initial experiments. R.S.Y. provided bioinformatics analysis. J.F.C. supervised the whole project.
COMPETING INTERESTS STATEMENT
The authors declare that they have no competing financial interests.
SUPPLEMENTAL INFORMATION
Supplemental information includes 5 figures, Supplemental Experimental Procedures, and Supplemental References, and can be found with this article online.
ACKNOWLEDGEMENTS
We are grateful to David Fitzpatrick and Magdalena Maslon (MRC HGU) for discussions and to Javier Martinez (IMBA, Vienna) and Encarnación Martínez-Salas (CBM, Madrid) for critical reading of the manuscript. This work was supported by Core funding from the Medical Research Council and by the Wellcome Trust (Grant 095518/Z/11/Z).