Abstract
Gene duplication promotes adaptive evolution in two principal ways: allowing one duplicate to evolve a new function and resolving adaptive conflicts by splitting ancestral functions between the duplicates. In an apparent departure from both scenarios, low-expressing transcription factor (TF) duplicates commonly regulate similar sets of genes and act in overlapping conditions. To examine possible benefits of such apparently redundant duplicates, we studied the budding yeast duplicated stress regulators Msn2 and Msn4. We show that Msn2,4 indeed function as one unit, inducing the same set of target genes in overlapping conditions, yet this two-factor composition allows the unit's expression to be both environmentally responsive and low-noise, thereby resolving an adaptive conflict that inherently limits expression of single genes. Our study exemplifies a new model for evolution by gene duplication whereby duplicates provide adaptive benefit through cooperation rather than functional divergence: attaining two-factor dynamics with beneficial properties that cannot be achieved by a single gene.
Introduction
The number of Transcription Factors (TFs) expressed in eukaryotes increases rapidly with increasing genome size and organism complexity, ranging from ∼50 in obligate parasites to >1000 in higher eukaryotes1. Gene duplication played a major role in this evolutionary expansion2,3, as is evident from the fact that the number of DNA Binding Domains (DBDs) remained practically constant with increasing genome size. In fact, the majority of TFs belong to just a few DBD families2,3, the content of which increased rapidly with genome size. Understanding the adaptive forces that promote this duplication-dependent expansion is therefore of great interest.
Gene duplication can promote evolution by allowing one of the duplicates to adopt a novel function while the second duplicate maintains the ancestral function. More often, however, the two duplicates do not gain a new function but rather lose complementary subsets of ancestral functions4,5. Sub-functionalization not only explains duplicate maintenance, but can also promote adaptive evolution by enabling further optimization of each individual function and resolving adaptive conflicts of the ancestral gene6,7. Indeed, optimizing a dual-function protein is often constrained by conflicting requirements imposed by the different functions: a mutation that favors one function can perturb the other function, presenting an adaptive conflict that is only resolved upon duplication.
In the context of TFs, duplication may allow one factor to acquire a new set of target genes (neo-functionalization). Alternatively, the ancestral targets could split between the duplicates (sub-functionalization). In both scenarios, duplicate divergence would increase and refine the regulatory logic. Previous studies exemplified both scenarios8–10, but whether they are relevant for the majority of TF duplicates remained unclear. In fact, present-day genomes express a large number of TF duplicates that regulate highly similar, or even redundant, sets of targets.
Budding yeast provides a convenient platform for studying the possible adaptive role of apparently redundant duplicates. The yeast lineage underwent a Whole Genome Duplication (WGD) event about one hundred million years ago11, and while most duplicates generated in this event were lost, about 10% were retained, amongst which TFs are over-represented. Many of the retained TF duplicates show few signs of divergence and have retained a highly conserved DBD; accordingly, they bind the same DNA binding motif and regulate a highly similar set of genes. We reasoned that studying such duplicates might help shed light on possible benefits provided by TF duplication.
As a case in point, we investigated Msn2 and Msn4, a well-studied pair of TF duplicates in budding yeast, which activate a large number of targets of the environmental stress response12,13. Previous studies established the tight similarity between these factors14, but we still began our study by increasing experimental resolution in an attempt to detect target divergence. Our results, however, reinforced the conclusion that the two factors regulate the same set of target genes, translocate to the nucleus with precisely the same dynamics, and are expressed in an overlapping set of conditions.
Our search for differences between the duplicates pointed us in a different direction: the challenge that cells face when attempting to minimize noise in gene expression. Transcription is a stochastic process and is therefore characterized by random variations (noise) between genetically identical cells15,16. Cell to cell expression variability is deleterious for genes that require precise tuning of expression, such as dosage sensitive genes17,18, but beneficial when enabling processes not possible by deterministic dynamics, such as bet-hedging strategies19–21. Accordingly, noise levels vary greatly between genes22. Yet, the ability to tune expression noise through changes in the gene promoter is limited by mechanistic constraints, and in particular by the well-documented conflict between regulatory control and noise: genes that are regulated over a wide dynamic range also show a high level of expression noise15,23–25. Thus, while coding for low-noise expression is possible, it comes at the cost of lowering the dynamic range over which expression can be changed by regulatory signals.
Our study shows that through duplication, Msn2,4 resolved this conflict between environmental responsiveness and noise. Following duplication, Msn2 expression became highly stable, showing limited responsiveness to environmental conditions and low expression noise. By contrast, Msn4 expression accentuated the environmentally responsive expression of the unduplicated homologue. This resulted in an overall expression of the Msn2,4 unit that is responsive to the environment but is also low-noise in the low-expressing conditions. We provide evidence that this expression tuning is phenotypically adaptive, and define the genetic changes that correlate with the change in gene responsiveness and noise. Our results suggest that duplicates can promote adaptive evolution not only through functional divergence, as suggested by the neo- or sub-functionalization models, but also through effective cooperation, by attaining two-factor dynamics with emergent beneficial properties that cannot be achieved using a single gene.
Results
Low-noise (Poisson) distribution of MSN2 expression in individual cells
Msn2 and Msn4 are TF duplicates that regulate the stress response in budding yeast12,26. Stress genes show noisy expression, and we were therefore surprised to observe that Msn2 is expressed at very similar amounts across individual cells. In fact, Msn2-GFP was the least noisy of all proteins expressed at its level, as quantified in a study surveying >2500 GFP-fused proteins22 (Figure 1A). To examine whether this low noise is also seen at the MSN2 transcript level, we used the single-molecule Fluorescent In-Situ Hybridization (smFISH)27 technique (Figure 1B). The number of MSN2 transcripts in individual cells was well described by a Poisson distribution, as expected when individual mRNA transcripts are produced and degraded at constant rates28,29. This distribution represents the lower limit of gene expression noise, obtained in the absence of regulation and other noise-amplifying processes28.
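The Poisson expectation for constant production and degradation can be illustrated with a minimal stochastic simulation. The sketch below is not the analysis used in this study: the rates, the number of simulated cells, and the function names are hypothetical choices made only for illustration. It simulates transcript birth and death in independent cells and reports the Fano factor (variance divided by mean), which is expected to be close to 1 for a Poisson distribution.

```python
import random
from statistics import fmean, pvariance


def simulate_cell(k_make, k_decay, t_end, rng):
    """Gillespie simulation of one cell: transcripts are produced at a
    constant rate k_make and each transcript decays at rate k_decay.
    Returns the transcript count at time t_end."""
    t, n = 0.0, 0
    while True:
        total_rate = k_make + k_decay * n
        t += rng.expovariate(total_rate)  # waiting time to the next event
        if t > t_end:
            return n  # state just before the event that overshoots t_end
        if rng.random() * total_rate < k_make:
            n += 1  # production event
        else:
            n -= 1  # degradation event


def fano_factor(counts):
    """Variance divided by mean; equals 1 for a Poisson distribution."""
    mu = fmean(counts)
    return pvariance(counts, mu) / mu


# Hypothetical parameters (assumptions, not measured values):
# expected steady-state mean = k_make / k_decay = 5 transcripts per cell.
rng = random.Random(0)
counts = [simulate_cell(5.0, 1.0, 10.0, rng) for _ in range(2000)]
print(f"mean = {fmean(counts):.2f}, Fano factor = {fano_factor(counts):.2f}")
```

Applied to measured smFISH counts, a Fano factor well above 1 would indicate noise in excess of the Poisson limit.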
Increasing Msn2 expression promotes stress protection but reduces cell growth rate
Low expression noise characterizes genes coding for essential functions or components of large complexes29,30, for which expression tuning is beneficial30–32. By contrast, Msn2 is not essential, does not participate in large complexes, and is mostly inactive in rich media. To examine whether, and how, Msn2 expression level impacts cell fitness, we engineered a library of strains expressing Msn2 at gradually increasing amounts using synthetic promoters33. Measuring growth rates of the library strains using a sensitive competition assay (Figure 1C), we found that decreasing Msn2 expression to below its wild-type levels, and down to a complete deletion, had no detectable effect on growth rate within the resolution of our assay (0.5%). By contrast, increasing Msn2 abundance gradually decreased growth rate (Figure 1D,S1). Next, we tested the effect of Msn2 levels on the ability to proliferate in harsh stress, by incubating the library cells with high H2O2 concentrations (Figure 1C,E,S1). Here, increasing Msn2 levels was beneficial: cells that expressed high levels of Msn2 resumed growth faster than low-expressing ones. Therefore, increasing Msn2 expression better protects cells against stress, but reduces their growth rate. An optimal Msn2 level is therefore desirable to balance the need for rapid growth and stress protection, explaining the requirement for low-noise tuning of its gene expression.
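One common way to quantify growth-rate differences in a pairwise competition is to follow the ratio between a test strain and a co-cultured reference strain over several generations and take the slope of the log ratio. The sketch below assumes this approach for illustration; it is not a description of the analysis pipeline used here, and the function name and the synthetic numbers are hypothetical.

```python
import math


def relative_growth_difference(generations, test_counts, ref_counts):
    """Least-squares slope of ln(test/reference) against generations.
    A negative slope means the test strain grows more slowly than the
    reference; the unit is ln-ratio change per generation."""
    y = [math.log(t / r) for t, r in zip(test_counts, ref_counts)]
    n = len(generations)
    x_mean = sum(generations) / n
    y_mean = sum(y) / n
    sxx = sum((x - x_mean) ** 2 for x in generations)
    sxy = sum((x - x_mean) * (yi - y_mean) for x, yi in zip(generations, y))
    return sxy / sxx


# Synthetic example (NOT data from this study): a test strain losing
# 1% per generation relative to the reference strain.
gens = [0, 5, 10, 15, 20]
ref = [1000.0] * len(gens)
test = [1000.0 * math.exp(-0.01 * g) for g in gens]
slope = relative_growth_difference(gens, test, ref)
print(f"estimated growth difference per generation: {slope:.4f}")
```

Because the estimate uses all time points, small differences can be resolved more reliably than by comparing separately measured growth curves.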
The tradeoff between rapid growth and stress preparation depends on the contribution of these two parameters to the overall population fitness, as defined by the evolutionary history. This relative contribution, in turn, depends on growth conditions. For example, when cells encounter optimal growth conditions, maximizing division rate dominates, but when nutrients become limiting, or respiration is triggered, the importance of stress protection increases. Consistent with this, wild-type cells were better protected against H2O2 exposure at higher cell densities, as they approached stationary phase, resuming growth faster after stress induction (Figure 2A). We therefore expected Msn2 expression to increase with cell density. This, however, was not the case. Although Msn2 contributed to stress protection at all densities, its expression remained constant throughout the growth curve (Figure 2B).
Msn4 expression is environmentally sensitive and noisy
Msn4, the Msn2 duplicate, is also an activator of stress genes13,26. Msn4-GFP was undetectable in reported measurements34,35, suggesting that its expression level is low during rapid growth. We reasoned that Msn4 expression increases along the growth curve to promote stress protection. This was indeed the case: Msn4 expression increased with cell density, both at the transcript and the protein levels (Figure 2B,C). This higher expression was accompanied by an increased contribution to stress protection, as measured by introducing H2O2 to strains deleted of MSN2 at different cell densities (Figure 2A). Consistent with the control-noise tradeoff described above, this dynamic regulation of MSN4 was accompanied by a high level of expression noise, which significantly exceeded the Poissonian variance (Figure 2C).
Msn2 and Msn4 co-localize to the nucleus with the same dynamics in individual cells
Msn4 could collaborate with Msn2 in promoting stress protection by regulating the same set of genes or by inducing a distinct set of targets. Similarly, it could respond to the same, or to different, sets of post-translational factors. Since activation of Msn2,4 culminates in nuclear localization36,37, we first followed the nuclear translocation dynamics of fluorescently tagged Msn2 and Msn4 (Figure 3A). In response to stress, the two factors translocated to the nucleus within minutes, showing precisely the same kinetics within individual cells (Figure 3B,C-yellow shade, S2). Similarly, during the stochastic pulsing following stress37,38, translocation of the two factors was tightly synchronized within individual cells, but not between different cells (Figure 3C-pink shade). Deletion of one factor did not affect the dynamics of its duplicate (Figure S3).
Msn2 and Msn4 induce the same set of target genes
Next, we examined differences in Msn2,4 target genes. In rapidly growing cells, deletion of MSN2 strongly reduced stress gene induction, while deletion of MSN4 had little, if any, effect (Figure 4A, S5). Swapping the MSN2,4 promoters completely reversed the target induction capacity of these factors (Figure 4B, S6B). The identity of the targets remained the same: Msn4 driven by the MSN2 promoter induced precisely the same targets normally induced by Msn2. When tested in conditions where both factors are expressed at equivalent amounts, the two factors induced the same set of genes (Figure 4C, S6). Since a previous study39, which followed stress gene induction using fluorescence reporters, indicated some differences in the dependence of individual targets on Msn2,4, we specifically examined the genes reported to be differentially regulated. However, none of these genes showed any difference in their Msn2,4 dependency in any of the 6 conditions for which we performed tight time-course measurements (Figure S7). To further corroborate these results, we also measured the genome-wide binding profiles of the two factors, using the sensitive ChEC-seq method40. The two factors bound to precisely the same promoters, occupied precisely the same positions within individual promoters, and showed an identical preference for their common (known) DNA binding motif (Figure S8). This identity of Msn2,4 targets is consistent with the high conservation of their DNA binding domains (Figure 3D), and the identity of their in-vitro DNA binding preferences41 (Figure S9). We conclude that the Msn2,4 proteins are co-regulated by the same signals and with the same kinetics, and activate the same set of target genes with the same kinetics, essentially functioning as one TF.
Differential design of the Msn2,4 promoters explains the differences in their expression flexibility and noise
Msn2 expression is stable along the growth curve, while Msn4 is strongly induced. To examine whether this differential regulation in expression is specific to these conditions, or is rather a more general property of the two genes, we surveyed a dataset composed of thousands of transcription profiles12,42,43. Expression of Msn2 showed little variability under all conditions tested, while Msn4 was highly variable (Figure 5A,B). Expression of Msn2 and Msn4 therefore conforms to the general tradeoff between expression noise and regulatory control: Msn2 is stable across conditions and shows low cell to cell variability (noise), while Msn4 expression readily responds to environmental signals and is noisy.
Previous studies have defined promoter designs that encode flexible and noisy, or stable and low-noise, expression44–46. Flexible promoters tend to contain a TATA box and bind nucleosomes immediately upstream of their Transcription Start Site (TSS), while stable promoters lack a TATA box and display a Nucleosome Free Region (NFR) upstream of their TSS. Consistent with their differential flexibility, we find that the MSN4 promoter contains a TATA box, binds nucleosomes around its TSS and contains a large number of TF binding sites. By contrast, the MSN2 promoter does not contain a TATA box, displays an NFR immediately upstream of the TSS and is largely devoid of TF binding sites (Figure 5C; data from47–49).
When aligned by their coding frames, the nucleosome patterns along the upstream regions of the MSN2 and MSN4 promoters are highly similar. However, the location of the TSS is different: in MSN4, the TSS is positioned ∼105 bp away in a region that is nucleosome occupied, while in MSN2 the TSS is significantly further upstream and located on the border of an NFR. The resulting 5’UTR of MSN2 is exceptionally long (∼430 bp in length, found in only 2% of S. cerevisiae genes). Deleting this NFR region from the MSN4 promoter practically eliminated Msn4 induction along the growth curve. By contrast, deleting regions close to the ORF had little, if any, further effect (Figure S10). Furthermore, replacing this region in the MSN4 promoter with the corresponding region from the MSN2 promoter, including its new TSS and NFR, increased MSN4 expression and reduced its noise (Figure S11). Therefore, this promoter region accounts for the differential expression characteristics of MSN2 and MSN4.
A shift in the TSS following WGD event modified MSN2 promoter design
Msn2,4 were duplicated in the whole-genome duplication (WGD) event, ∼100 million years ago11, and were retained in all WGD species tracing to this event. To examine whether the differential promoter structure of MSN2,4 is conserved in other WGD species, we used available 5’ RNA data50 and further profiled TSS positioning in these species. The TSS positions of the MSN2 and MSN4 homologues were conserved in all post-WGD species (Figure 6A). Sequence analysis further indicated that the TATA box was conserved in all MSN4 homologues but absent from all MSN2 homologues (Figure 6A). We next profiled 5’ RNA in two non-WGD species. The transcript of the single MSN homologue has a short 5’UTR, similar to that of MSN4. This pattern of conservation is consistent with a scenario in which the stable MSN2 promoter evolved from an ancestral flexible promoter through a shift in the TSS to a distant, TATA-lacking position, at the boundary of a nearby NFR.
MSN4 accentuated the environmentally responsive but noisy expression of the non-WGD homologue, while MSN2 gained a stable, low-noise expression
To examine whether the differential expression flexibility of MSN2,4 is also conserved in the other post-WGD species, we used available expression data51 of 13 yeast species along their growth curves. In all post-WGD species, MSN4 expression increased along the growth curve, while MSN2 expression remained stable (Figure 6B). The same dataset also profiled non-WGD species, allowing us to also examine the expression of the single MSN homologue in these species. The single MSN homologue in the non-WGD species showed a moderate induction along the growth curve, with a dynamic range that was larger than that of MSN2, but lower than that of MSN4 (Figure 6B). To examine whether this intermediate regulation is also reflected in the expression noise, we introduced the MSN promoter from K. lactis, a non-WGD species, into S. cerevisiae, upstream of the MSN2 ORF, and measured expression noise using smFISH. As predicted, this promoter showed an intermediate noise level that was higher than that of MSN2 but lower than that of MSN4 (Figure 6C). In fact, when plotted on the noise-control curve, the three promoters all fell on the same line, consistent with a same-proportion change in noise and dynamic range of regulated expression. Therefore, our analysis suggests that MSN2 gained its stable, low-noise expression following the duplication event, likely by shifting its TSS, while MSN4 accentuated the regulated expression of the ancestral factor, likely through the acquisition of new binding sites for transcription factors, increasing its dynamic range and expression noise.
Discussion
Taken together, our study defines a novel role for the Msn2,4 duplication. We were initially surprised to find that these two duplicates regulate the same set of target genes, translocate to the nucleus with precisely the same dynamics, and are expressed in an overlapping set of conditions. What limits their replacement, in at least some species, by a single factor with more refined transcriptional control? Our data show that Msn2,4 function as one unit whose expression is both environmentally responsive and low-noise (Figure 6D), thereby resolving an inherent conflict that limits the tuning of individual gene expression. Msn2 provides the low-noise basal expression, whereas Msn4 is induced when additional amounts are needed.
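The logic of the two-factor unit can be illustrated with a toy calculation. The sketch below rests on simplifying assumptions that are not taken from the measurements: the two factors are treated as statistically independent (so their variances add), the stable factor is Poisson (Fano factor 1), the inducible factor has a constant Fano factor larger than 1, and all numerical values are hypothetical. It compares the squared coefficient of variation (CV²) of the summed two-factor expression with that of a single inducible gene covering the same mean levels.

```python
def cv2_single(mean, fano):
    """CV^2 of a single gene with the given mean and Fano factor."""
    return fano / mean


def cv2_two_factor(mean_stable, mean_inducible, fano_inducible):
    """CV^2 of the sum of a Poisson (Fano = 1) stable factor and an
    independent inducible factor with Fano factor fano_inducible.
    For independent variables, means and variances both add."""
    total_mean = mean_stable + mean_inducible
    total_var = mean_stable + fano_inducible * mean_inducible
    return total_var / total_mean ** 2


# Hypothetical values chosen only for illustration.
MEAN_STABLE = 4.0      # stable factor, same in all conditions
FANO_INDUCIBLE = 5.0   # noisy, regulated factor
for mean_inducible in (0.5, 4.0, 16.0):  # low, intermediate, high induction
    total = MEAN_STABLE + mean_inducible
    unit = cv2_two_factor(MEAN_STABLE, mean_inducible, FANO_INDUCIBLE)
    single = cv2_single(total, FANO_INDUCIBLE)
    print(f"total mean {total:5.1f}: two-factor CV^2 = {unit:.3f}, "
          f"single noisy gene CV^2 = {single:.3f}")
```

Under these assumptions, the summed expression approaches the Poisson limit when the inducible factor is low, while still spanning the full range of mean levels as the inducible factor increases.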
What evolutionary force could have promoted this new expression pattern? Since the MSN duplication traces to the WGD event, it is tempting to propose that its new expression characteristics were driven by the shift in metabolism: rapidly growing non-WGD species respire, while WGD species ferment. Following this metabolic change, genes needed in respiring cells may shift from being constitutively expressed to being Msn-dependent, as was indeed reported52. We propose that changes in the identity of Msn2,4-dependent genes accentuated its phenotypic effects on growth and drove selection for increased precision of Msn2 expression.
Gene duplication is a major source of evolutionary innovation4,5 that greatly contributes to the expansion of transcription networks2,3. A surprisingly large fraction of TF duplicates, however, retained a conserved DNA binding domain and bind to the same DNA motif (Figure S12), suggesting limited divergence in regulatory targets. These, and other duplicates of apparently redundant function53–56, do not comply with the accepted models of neo- or sub-functionalization explaining duplicate advantage. Our study suggests a third model whereby duplicates with redundant biochemical properties realize dynamic properties that are not possible, or are difficult, to achieve using a single factor. In the case of Msn2,4, duplication resolved a conflict between regulatory control and noise. In other cases, interactions between the factors may define a circuit with dynamic properties not implementable by a single gene55,57,58. Further studies will define the relative contribution of such circuit-forming mechanisms in explaining the retention of TFs or other duplicates.
Acknowledgments
We thank members of our lab for fruitful discussions and comments on the manuscript. We thank Nir Friedman and his group for their help and fertile discussions, especially Daphna Joseph-Strauss. We thank Yoav Breuer for his help with constructing and performing one of the smFISH experiments. We thank Alon Appleboim for his support and suggestions. This work was supported by the ISF and the Minerva Center.