Abstract
Mutations provide the variation that drives evolution, yet their effects on fitness remain poorly understood. Here we explore how mutations in the essential enzyme Adenylate Kinase (Adk) of E. coli affect multiple phases of population growth. We introduce a biophysical fitness landscape for these phases, showing how they depend on molecular and cellular properties of Adk. We find that Adk catalytic capacity in the cell (product of activity and abundance) is the major determinant of mutational fitness effects. We show that bacterial lag times are at a well-defined optimum with respect to Adk’s catalytic capacity, while exponential growth rates are only weakly affected by variation in Adk. Direct pairwise competitions between strains show how environmental conditions modulate the outcome of a competition where growth rates and lag times have a tradeoff, altogether shedding light on the multidimensional nature of fitness and its importance in the evolutionary optimization of enzymes.
Introduction
Random mutagenesis is often used to assess the distribution of fitness effects in simple experimental models such as propagating viruses and microbes evolving under antibiotic stress1, 2. However, the enormous size of sequence space severely constrains how much of the fitness landscape can be explored this way, and mechanistic and predictive insights from these experiments are further limited by a lack of knowledge of the molecular effects of mutations. Instead, a more targeted experimental approach relies on the concept of a biophysical fitness landscape, in which fitness effects of mutations are mapped through their effects on molecular traits of the mutated proteins. In this approach, biophysically-rational genetic variation is introduced on the chromosome, and the molecular and phenotypic effects of that variation are analyzed concurrently3–6. By mapping fitness effects to variation of molecular properties rather than to sequences of mutated proteins, we can dramatically reduce the dimensionality of the genotype-to-phenotype mapping. The underlying hypothesis is that variation in a small number of properly-selected molecular traits of mutated proteins can explain most of the resulting mutational variation in fitness, and that the relationship between these molecular traits and fitness is smooth and continuous. Several recent studies have supported this approach5–7.
The relationship between sequence variation and fitness is further confounded by the fact that multiple life-history traits contribute to fitness8, and the relative importance of these traits to the long-term evolutionary fate of a mutation may be highly dependent on environmental and ecological conditions. While multicellular organisms are generally described by a large number of traits (e.g., viability at various life phases, mating success, fecundity, etc.), unicellular microorganisms like bacteria and yeast are described by relatively fewer components of fitness, such as the time in lag phase, the exponential growth rate, and the overall yield at saturation. All these phases of growth contribute toward the outcome when in competition for limited resources, and hence determine fitness3,9. The relative importance of these different phases of bacterial growth in sculpting the fitness landscape depends on the conditions of growth and competition10–12.
Overall, the challenge in quantitatively characterizing the fitness landscape is twofold: Understanding fitness in terms of contributions from different phases of growth, and linking each of these components to molecular and cellular traits. In this work, we address both challenges by introducing biophysically-rational genetic variation in the adk locus that encodes the essential E. coli enzyme Adenylate Kinase (Adk), and projecting the ensuing variations of fitness effects (phenotypic components like growth rate and lag time) onto the biophysical traits of Adk. We find that a unique combination of molecular and cellular traits of Adk — the product of intracellular abundance and catalytic activity, which we term catalytic capacity — provides a reliable predictor of fitness effects across the full range of phenotypic variation. Furthermore, we find that the length of the lag phase is more sensitive to variation in Adk catalytic capacity than is the exponential growth rate, so that the lag phase of the wild-type E. coli appears to be optimal with respect to variation in Adk catalytic capacity.
Results
Biophysical properties of Adk mutants
Destabilizing mutations have been shown to cause a drop in intracellular protein abundance, mostly through a decrease in the folded fraction of the protein3. Hence, in order to sample a broad range of molecular and cellular traits of Adk protein below the wild-type levels, we chose a set of 21 missense mutations at 6 different positions of adk (Table S1 and Fig. 1). We selected residues such that their accessible surface area was less than 10% and they were at least 6 Å away from the catalytically-active sites of Adk, so that mutations at these residues were likely to destabilize the protein13. For most mutants, we chose amino acid mutations that appeared only at low frequency in an alignment of 895 homologous sequences of Adk. As intended, the purified mutant proteins were destabilized over a wide range (~17 °C in terms of Tm, and ~7.5 kcal/mol in terms of folding ΔG) (Table S1, Figs. 1B, S1, S2). In only one case (L209I) did we change the E. coli sequence to the consensus amino acid at that position, and we found it in fact stabilized the protein by ~1 kcal/mol (Table S1). Although most of the Adk mutants were less stable than the wild-type (WT), they nevertheless existed predominantly as monomers in solution (Fig. S3). However, several mutations at one position (V106H, V106N, and V106W) did have significant fractions of proteins present in higher oligomeric forms, in addition to the predominant monomeric species (Fig. S3). These proteins bound 4,4’-Dianilino-1,1’-binaphthyl-5,5’-Disulfonate (Bis-ANS) dye to a higher degree compared to the rest of the mutants (Fig. S4), indicating the presence of possible molten globule states in solution14. The proteostat dye that reports on protein aggregation4,15 also bound these mutants more strongly compared to others (Fig. S4), indicating a higher fraction of aggregated species. The catalytic efficiency (kcat/KM) of the mutant Adk proteins was distributed broadly, with most mutants showing a lower activity than E. coli WT (Table S1, Figs. 1C, S5).
Intracellular abundance of Adk follows prediction from Boltzmann distribution
We then incorporated each of the 21 adk mutations one-by-one into the E. coli chromosome using a genome-editing approach based on homologous recombination3,4. We measured the total intracellular abundance of WT and mutant Adk proteins using a quantitative western blot (Table S2). The sigmoidal dependence of total intracellular Adk abundance on folding stability ΔG (Fig. 1D) is well-described by the Boltzmann distribution for two-state unfolding proteins:
$$P_F = \frac{1}{1 + e^{\beta \Delta G}} \tag{1}$$
where PF is the fraction of folded molecules in the ensemble of intracellular Adk, ΔG is the folding free energy (negative when the folded state is favored), and β = 1/kBT, with Boltzmann constant kB and growth temperature T. The total measured abundance of a protein is its amount in the cytoplasm at steady-state, achieved by a balance between production and degradation. Since Adk is expressed from a constitutive promoter in the cells, it is generally safe to assume that the rates of production of all mutants are similar. Under this assumption, the sigmoidal dependence of abundance on stability indicates that the unfolded protein is degraded in the active medium of the cytoplasm.
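To make the shape of this dependence concrete, the following is a minimal sketch of the two-state folded fraction as a function of folding free energy. It assumes ΔG is given per mole in kcal/mol (so the gas constant R replaces kB), that ΔG is negative for a stable protein, and a temperature of 37 °C; the function name and the sample ΔG values are illustrative and are not taken from Table S1.

```python
import math

R_KCAL = 1.987204e-3  # gas constant in kcal/(mol*K); stands in for kB on a per-mole basis

def folded_fraction(delta_g_kcal, temp_k=310.15):
    """Two-state folded fraction P_F = 1 / (1 + exp(beta * dG)).

    delta_g_kcal is the folding free energy in kcal/mol
    (assumed convention: negative means the folded state is favored).
    """
    beta = 1.0 / (R_KCAL * temp_k)
    return 1.0 / (1.0 + math.exp(beta * delta_g_kcal))

# Illustrative stabilities only, not measured values.
for dg in (-8.0, -4.0, -2.0, 0.0, 2.0):
    print(f"dG = {dg:+.1f} kcal/mol -> P_F = {folded_fraction(dg):.4f}")
```

Under these assumptions the folded fraction stays close to 1 for strongly stable variants and falls off sigmoidally as ΔG approaches zero, which is the qualitative behavior the abundance data are compared against.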
Mutations in Adk affect lag times more than exponential growth rates
Mutations in Adk affect both intracellular abundance (via folding stability) and catalytic activity of the protein. Flux dynamics theory predicts, and experiments have confirmed, that the key enzymatic parameter determining the flux through an enzymatic reaction chain is the product of intracellular abundance and enzymatic efficiency kcat/KM, a quantity which we call “catalytic capacity”5,6,16. We therefore determined how two key components of bacterial growth, the exponential growth rate and the lag time (Fig. 2A), depend on the total catalytic capacity of Adk in E. coli cells (Fig. 2B, C; also see Methods and Fig. S6-S8 for estimation of growth parameters). We find that while only 3 out of 21 strains show a drop in growth rate greater than 5% of WT, 17 strains show an increase in lag time of more than 5% over the WT value (Table S2). This suggests that the mutations in Adk affect the lag phase more significantly than the exponential growth phase. One mechanism that can produce longer apparent lag times is a greater proportion of nonviable cells among those coming out of stationary phase, as described in a recent study17. However, this appears not to be the major cause in our case, as lag times are fairly consistent across replicates (error bars in Fig. 2C) and do not negatively correlate with the number of viable cells (Fig. S9). We also find that the variation in total catalytic capacity of Adk correlates better with the variation in lag times (Spearman’s rank correlation ρ = −0.44, p = 0.057) than with the variation in growth rates (Spearman’s rank correlation ρ = −0.08, p = 0.737) (Fig. S10). The variation in lag times is also better explained by the variation in catalytic capacity than by any of the Adk properties separately (stability, abundance, or activity) (Fig. S10). Surprisingly, growth rate appears to tolerate a rather large drop in catalytic capacity of Adk, while lag time does not.
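As an illustration of the quantities used in this comparison, the sketch below computes catalytic capacity as abundance times kcat/KM and a Spearman rank correlation against lag time. All numbers are hypothetical values normalized to WT = 1, chosen only to exercise the code; they are not the measurements in Tables S1 or S2, and the helper names are ours.

```python
import math

def catalytic_capacity(abundance, kcat_over_km):
    """Catalytic capacity = intracellular abundance x catalytic efficiency."""
    return abundance * kcat_over_km

def average_ranks(values):
    """Ranks starting at 1; tied values share their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        shared = (start + end) / 2 + 1
        for k in range(start, end + 1):
            ranks[order[k]] = shared
        start = end + 1
    return ranks

def spearman_rho(x, y):
    """Spearman's rho, computed as the Pearson correlation of the ranks."""
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = math.sqrt(sum((a - mx) ** 2 for a in rx))
    sy = math.sqrt(sum((b - my) ** 2 for b in ry))
    return cov / (sx * sy)

# Hypothetical strains, normalized to WT = 1 (illustrative only).
abundance = [1.0, 0.8, 0.5, 0.3, 0.1]
efficiency = [1.0, 0.9, 0.7, 0.8, 0.4]
lag_time = [1.00, 1.02, 1.10, 1.25, 1.60]

capacity = [catalytic_capacity(a, e) for a, e in zip(abundance, efficiency)]
print("relative catalytic capacity:", capacity)
print("Spearman rho (capacity vs lag):", spearman_rho(capacity, lag_time))
```

A rank-based statistic is a natural choice here because the relationship between capacity and lag time is expected to be monotonic but strongly nonlinear.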
WT E. coli is positioned at the cusp of the biophysical fitness landscape for lag time
Since almost all the designed mutants were destabilizing and therefore have lower catalytic capacity than E. coli WT, they only provide sampling in the lower range of catalytic capacity. In studies so far, no evidence exists for changes in intracellular protein abundance for stabilizing mutations. Hence to determine the dependence of growth rate and lag time on catalytic capacity above WT levels, we over-expressed WT Adk from a pBAD plasmid (see Supplementary Methods). We observed no significant change in either growth rate or lag time at higher than endogenous catalytic capacity (Fig. 2B, C, S8, and Table S3). This means that while the growth rate appears to be insensitive to large changes in Adk catalytic capacity both below and above the wild-type level, WT catalytic capacity appears to be situated at the threshold of optimizing lag time. Next, we attempted to quantitatively compare the position of WT on these two fitness landscapes. To that end, we used a simple reciprocal Michaelis-Menten-like function to fit the relative growth times (growth time is reciprocal of growth rate μ) and lag times (Fig. S11, also see Eq. 3 and Methods). The fitting parameter b which characterizes the onset of curvature on the landscape (analogous to KM in Michaelis-Menten equation for enzymatic rate) reports proximity of WT to the cusp on the landscape (see Methods). It was 0.006 for growth time and 0.019 for lag time as compared to normalized catalytic capacity of 1 for WT. This shows that WT is situated closer to the cusp in terms of lag time as compared to growth time or growth rate.
Shorter lag imparts advantage at low carrying capacity: A computational model
This data highlights the pleiotropic effects of mutations on different phases of bacterial population growth, which raises the question of how pleiotropy shapes the evolutionary fate of a mutation. We explore this issue by considering the outcome of binary competitions between strains18. We first simulated binary competitions over a wide range of growth rates and lag times in media conditions that allow for either a 5-fold (low carrying capacity) or a 500-fold (high carrying capacity) increase over the initial population (Fig. 3A; see Methods). We found that there is a significant tradeoff between lag times and growth rates in determining the winners of binary competitions, with lag playing a more important role at low carrying capacity (Fig. 3A), implying that a shorter lag provides a greater fitness advantage under strongly nutrient-limiting conditions.
Shorter lag imparts advantage at low carrying capacity: Experimental evidence
To realize varying nutrient conditions in binary competition experiments, we explored the growth of E. coli over a range of glucose concentrations, mimicking the variation of carrying capacity in simulations, and found that only the carrying capacities are proportional to glucose concentration, with minimal effects on lag time and growth rate (Fig. 4). This suggests that observing the outcome of the competition at different time snapshots in a nutrient-rich medium is equivalent to running the competition at different glucose concentrations (carrying capacities). To evaluate the predictions from simulations, we carried out two sets of binary competition experiments based on the overall distribution of growth rates and lag times (Fig. 3B). First, we selected strains exhibiting a tradeoff between growth rate (μ) and lag time (λ) (μ1 > μ2 and λ1 > λ2; inset of Fig. 5B). Second, we tested competition between strains that differ in their lag times but have nearly indistinguishable growth rates (μ1 ≈ μ2 and λ1 > λ2; inset of Fig. 5C). In all cases a strain with shorter lag time is expected to dominate at lower carrying capacity conditions (corresponding to the competition outcome at early time points); however, this advantage would be lost at later time points if its growth rate is lower than that of the competing strain (Fig. 5A). In the second scenario, the advantage due to short lag is expected to persist even at high carrying capacity conditions because the growth rates of the competing strains do not differ. We estimated the relative proportions of the two strains by a qPCR-based mismatch amplification mutation assay (MAMA) approach19 (see Methods and Fig. S12). As expected in the first scenario, L083F and V106H dominated at earlier time points when competed against A093I and L209I, respectively, due to their shorter lag times (λL083F < λA093I and λV106H < λL209I) (Fig. 5B). Eventually their fraction dropped below 0.5 at later time points (equivalent to high carrying capacity), where the growth rates determine the competition outcome (μL083F < μA093I and μV106H < μL209I) (Fig. 5B). Similarly, for the second scenario, despite having similar growth rates (μWT ≈ μY182V ≈ μL209A), the fraction of WT was always maintained above 0.5, as it spends a shorter time in the lag phase compared to Y182V and L209A (Fig. 5C). The early advantage to WT due to its shorter lag phase determined the competition fitness throughout the whole growth cycle.
Discussion
A complete mapping of mutational fitness effects would require sampling a practically infinite number of mutations, an impossible proposition. Instead, we can project fitness onto a fairly small number of molecular properties of proteins5–7,20. Within this paradigm, the identity of a particular mutation does not matter as much as its effect on essential biochemical and biophysical properties of the proteins in question. Our 21 engineered mutations in Adk, along with the overexpression data, allow us to outline the biophysical fitness landscape, covering a wide range of variation of the physical parameters of the Adk protein. This data shows that we can collapse several molecular phenotypes into a single effective parameter – the product of protein abundance and activity kcat/KM (catalytic capacity) – which quantitatively determines the biophysical fitness landscape to a great extent (Fig. 2B, C). That is, Fig. 2 indicates that the fitness effects of mutations can largely be predicted from their biophysical effects over a broad range of catalytic capacity. Indeed, Adk catalytic capacity explains the variation in lag times to a large extent (Fig. S10), validating the utility of a biophysical fitness landscape for mapping fitness effects.
These results illustrate how the evolutionary endpoint of molecular traits may depend fundamentally on the multidimensional nature of fitness, with the relative importance of different components of fitness depending on the environment and lifestyle of the organism. It has been argued that endogenous molecular traits are established as a result of mutation-selection balance21, with the final outcome depending on the relative strengths of selection and genetic drift as determined by the population structure22,23. Here we encounter a more complex situation where mutations in the essential enzyme Adk change multiple fitness components. In this case, the mutation-selection balance apparently resulted in disparate outcomes for the two fitness components with respect to the molecular trait, placing lag time at the cusp while keeping the exponential growth rate farther within the plateau region of its respective biophysical fitness landscape. Such an outcome may reflect different strengths of selection on growth and lag. The relative strength of selection on these fitness components depends crucially on the environmental conditions (e.g. nutrient availability, etc.)24. Our studies of binary competitions (Figs. 3 and 5) highlight this scenario by showing how the environmental parameter of carrying capacity can determine winners and losers in evolutionary dynamics. Although the lag time of a population can depend not only on the environment but also on the population’s specific history (e.g., how long it was previously in stationary phase), the fundamental role of Adk in metabolism suggests that its effects on lag time are likely to be common across conditions and histories. The deep connection between ecological history of species and optimization of biophysical traits of their proteins is a subject for valuable future studies.
Much of our current understanding of microbial cultures and fitness comes from experiments done in the laboratory, where strains are typically grown under a large supply of nutrients. The situation might be very different in the wild, however, where bacteria and other microbes have to survive under harsh conditions of nutrient starvation, extreme temperature, and other environmental stresses25–27. For example, E. coli is the predominant facultative anaerobe in the gastrointestinal tract28 which allows it to thrive in fluctuating environments of differing oxygen concentrations along the GI tract (e.g., the small vs. the large intestine). In these circumstances, organisms are likely to spend only a minute fraction of their life-cycle in the exponential growth phase, while undergoing many cycles of lag-growth-saturation as new resources become available and old ones are exhausted. It is therefore intuitive to expect that there has been strong selection in favor of organisms that can not only divide rapidly during exponential growth, but that can also wake up quickly from their lag phase and respond to newly available resources. Our study demonstrates how this selection may shape individual molecular traits.
This work highlights the relationship between various components of fitness and the molecular properties of modern enzymes, the endpoint of evolutionary selection. An interesting question, which is beyond the scope of the current work, is how modern variants emerged in evolutionary dynamics. To that end, mapping temporally reconstructed ancestral species onto the biophysical fitness landscape of Adk (and other enzymes) appears to be a promising direction for future research.
Methods
Selection of mutations: Mutations at relatively-buried positions generally result in decreased stability and lower fitness13,29. Hence we selected the sites for mutagenesis with side-chain accessibility of less than 10%. In addition, the selected sites were also away from the active-site residues, or active-site contacting residues, and a minimum of 6 Å away from the inhibitor Ap5A binding sites (PDB 1AKE). The structure of Adk is divided into three domains: LID (residues 118-160), NMP (residues 30-67), and Core (residues 1-29, 68-117, and 161-214). We define the active-site residues as those whose accessible surface area changes by at least 5 Å² in the presence of the inhibitor Ap5A. A similar criterion was used to define the residues contacting the active site. Altogether 4 residues from the LID domain, 3 from the NMP domain, and 28 from the Core domain satisfy these criteria. Of the 28 sites from the Core domain, we randomly chose 6 to mutate. We chose the identities of the mutations to span various sizes of the side chains and a range of conservation. We derived the conservation from the multiple sequence alignment of 895 sequences for Adk collated from the ExPASy database (as of Nov 2012).
Generation of mutant strains: We generated the strains with WT and mutant adk with chloramphenicol- and kanamycin-resistance genes on either end of the adk gene using the genome editing approach as described previously3. Since the adk gene is flanked by two repeat regions (REPt44 and REPt45) on the wild-type chromosome, we extended the homology required for recombination up to the middle of the adjacent genes.
Growth curve measurements and media conditions: WT and mutant strains were grown overnight at 30 °C from single colonies in a supplemented M9 medium (0.2% glucose, 1 mM MgSO4, 0.1% casamino acids, and 0.5 μg/ml thiamine). OD600 was measured for all the strains and then the cultures were normalized to whichever had the lowest OD. The normalized cultures were diluted 1:100 in fresh supplemented M9 media and the growth curves were monitored in triplicate using a Bioscreen C at 37 °C. We derived the growth parameters by fitting ln(OD) versus time with the four-parameter Gompertz function (see below). The error across replicates was 2-3% on average, and it did not improve significantly when the number of replicates was increased.
Fitting growth data and estimation of growth parameters: In our study, we define lag time (λ) as the time required to achieve the maximum growth rate (μ) (Fig. 2A). Growth time (τ) was defined as reciprocal of growth rate μ. Since it has the same units as lag time, it is more convenient to use for the statistical analysis and data fitting (Fig. S11).
We used two different methods to infer these parameters: A) direct analysis of growth curve derivatives and B) fits to the Gompertz function (Fig. S6).
In method A, we took the growth rate as the maximum value of
$$\frac{\ln \mathrm{OD}(t + \Delta t) - \ln \mathrm{OD}(t)}{\Delta t}$$
where Δt is 15 minutes. The lag time was then the earliest time at which this maximum growth rate was achieved.
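The derivative-based estimate can be sketched in a few lines. This is a minimal illustration, not the analysis code used for the paper: it assumes evenly or unevenly spaced (time, OD) samples, applies the OD600 ≥ 0.02 cutoff described in the Methods, and reports the start of the steepest interval as the lag time (the choice of interval start rather than midpoint is our assumption). The sample curve is synthetic.

```python
import math

def derivative_growth_parameters(times, ods, min_od=0.02):
    """Return (max growth rate, lag time) from finite differences of ln(OD).

    The lag time is reported as the start of the earliest interval
    that attains the maximum slope (an assumption of this sketch).
    """
    points = [(t, math.log(od)) for t, od in zip(times, ods) if od >= min_od]
    best_rate, best_time = float("-inf"), None
    for (t0, y0), (t1, y1) in zip(points, points[1:]):
        rate = (y1 - y0) / (t1 - t0)
        if rate > best_rate:  # strict '>' keeps the earliest maximum
            best_rate, best_time = rate, t0
    return best_rate, best_time

# Synthetic curve sampled every 15 min; ln(OD) rises by 0, 0.1, 0.3, 0.2, 0.05.
times = [0, 15, 30, 45, 60, 75]
cumulative = [0.0, 0.0, 0.1, 0.4, 0.6, 0.65]
ods = [0.03 * math.exp(c) for c in cumulative]

mu, lag = derivative_growth_parameters(times, ods)
print(f"max growth rate = {mu:.4f} per min, lag time = {lag} min")
```

Because the estimate depends on single 15-minute intervals, it is sensitive to measurement noise, which is consistent with the larger uncertainty reported for this method compared to the curve fit.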
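For method B we used the following four-parameter Gompertz function to fit ln(OD) vs. time plots:
$$\ln \mathrm{OD}(t) = \ln \mathrm{OD}_0 + K \exp\left[-\exp\left(-\frac{t - \lambda}{b}\right)\right] \tag{2}$$
where OD0 is the initial optical density, the carrying capacity is K, the maximum growth rate is μ=K/(b·exp(1)) and the lag time λ is the time taken to achieve the maximum growth rate.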
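The relation between the Gompertz parameters and the growth traits can be checked numerically. The sketch below assumes the form ln OD(t) = ln OD0 + K·exp[−exp(−(t − λ)/b)], which is the four-parameter Gompertz curve consistent with μ = K/(b·e) and with λ being the time of maximum growth rate; the parameter values are hypothetical and chosen only for illustration.

```python
import math

def gompertz_ln_od(t, ln_od0, K, b, lam):
    """ln(OD) at time t for the assumed four-parameter Gompertz form."""
    return ln_od0 + K * math.exp(-math.exp(-(t - lam) / b))

def max_growth_rate(K, b):
    """Analytical maximum slope of the curve, mu = K / (b * e)."""
    return K / (b * math.e)

def slope(t, ln_od0, K, b, lam, h=1e-3):
    """Central-difference estimate of d ln(OD) / dt at time t."""
    upper = gompertz_ln_od(t + h, ln_od0, K, b, lam)
    lower = gompertz_ln_od(t - h, ln_od0, K, b, lam)
    return (upper - lower) / (2.0 * h)

# Hypothetical parameters (time in minutes), not fitted values from the paper.
params = dict(ln_od0=math.log(0.02), K=4.0, b=90.0, lam=150.0)

print("analytical mu:", max_growth_rate(params["K"], params["b"]))
print("numerical slope at t = lambda:", slope(params["lam"], **params))
print("numerical slope 60 min earlier:", slope(params["lam"] - 60.0, **params))
```

The numerical slope at t = λ matches K/(b·e), and the slope is smaller on either side, which is why λ can be read directly as the time at which the maximum growth rate is reached.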
For both methods, we considered only data points with OD600 ≥ 0.02. The instantaneous derivatives of all growth curves show a distinct peak at OD600 values greater than 0.02 (Fig. S6), indicating monoauxic growth and confirming that the derived growth parameters are unaffected by excluding the lower-OD data.
The μ and λ estimated from the two aforementioned methods are strongly correlated (Pearson’s r = 0.80, p = 1.4 × 10⁻⁵ for μ and r = 0.71, p = 3.0 × 10⁻⁴ for λ) (Fig. S7). However, the uncertainty in the fitted parameters appears to be less than the uncertainty in the parameters obtained from the derivatives, which are limited by the low time-resolution of the experimental data (acquired at an interval of 15 min).
The growth rate (μ) and lag time (λ) appear to be statistically independent of each other across the Adk mutant strains (Spearman’s ρ = 0.31, p = 0.15, Fig. 3B). Hence it is conceivable that selection can act separately on these two traits, which is further illustrated by the different fitness landscapes observed when projected onto the axis of catalytic capacity (Fig. 2B, C).
Statistical tests for mutational variation in growth and lag phases: We estimated the monotonic relationship between various growth traits and molecular/cellular properties of Adk mutant proteins using Spearman’s rank correlation (Fig. S10). The agreement between growth parameters derived using instantaneous derivatives and Gompertz fits was estimated by Pearson’s correlation coefficient (Fig. S7). We excluded V106N from all statistical analyses and data fitting, as its lag time is ~13 s.d. away from the average lag time of all other strains.
Quantification of the location of WT on the fitness landscapes: A Michaelis-Menten-like elasticity curve function has been used previously5,6,16,20 to fit the dependence of growth rate on catalytic capacity. Since we are considering growth and lag times rather than rates, we use a reciprocal form of the Michaelis-Menten-like function for fitting relative growth time (τ/τWT) and relative lag time (λ/λWT) vs. catalytic capacity (Fig. S11):
$$y = \frac{a\,(x + b)}{x} \tag{3}$$
where y is the relative trait (τ/τWT or λ/λWT), x is the catalytic capacity normalized to WT, a is the asymptotic value of the trait for infinitely large catalytic capacity, and b is the catalytic capacity at which the trait equals twice the asymptotic value (2a). Since catalytic capacity is normalized by WT, b serves as a measure of how close WT is to the cusp of the respective landscape. For the fits in Fig. S11, we empirically set a = 1, which enables easy comparison of parameter b for the lag time and growth time plots.
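A short numerical sketch helps show how b locates the cusp. It assumes the reciprocal form y = a(x + b)/x with a = 1 and uses the fitted b values reported in the Results (0.006 for growth time, 0.019 for lag time); the function names and the 10% threshold are illustrative choices of ours.

```python
def reciprocal_mm(capacity, a=1.0, b=0.019):
    """Relative trait (time) at a given normalized catalytic capacity."""
    return a * (capacity + b) / capacity

def capacity_for_increase(fraction, b):
    """Capacity at which the trait exceeds its asymptote by `fraction` (a = 1).

    Solving 1 + b / x = 1 + fraction gives x = b / fraction.
    """
    return b / fraction

B_GROWTH_TIME = 0.006  # fitted b for growth time, from the Results
B_LAG_TIME = 0.019     # fitted b for lag time, from the Results

# At x = b the trait is exactly twice the asymptote.
print(reciprocal_mm(B_LAG_TIME, b=B_LAG_TIME))

# Normalized capacity below which each trait is more than 10% above its asymptote.
print("growth time:", capacity_for_increase(0.10, B_GROWTH_TIME))
print("lag time:   ", capacity_for_increase(0.10, B_LAG_TIME))
```

With these values, lag time begins to rise at roughly three times the catalytic capacity at which growth time does, which is the sense in which WT sits closer to the cusp for lag time.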
Simulation of binary competition: We simulated the competition of two strains by using the Gompertz function (Eq. 2) to model the growth of individual strains. The initial population (OD0) for both strains was equal, and growth ceases when $\sum_i (\mathrm{OD}_t/\mathrm{OD}_0)_i = K$, where K is the carrying capacity. We considered two different values of carrying capacities (5 and 500). We set μ1 and λ1 to values derived experimentally for WT Adk strain (Table S2), while the growth rates and lag times for the second competing strain were varied randomly across the intervals 0.005 to 0.030 min⁻¹ (for growth rate) and 50 to 300 min (for lag time).
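The logic of such a competition can be illustrated with a deliberately simplified sketch. Note the substitutions: instead of the Gompertz curve used in the paper, each strain here follows a plain lag-then-exponential law, and growth is assumed to stop when the summed fold changes of the two strains reach the carrying capacity. The strain parameters are hypothetical values inside the ranges quoted above, not the measured WT values.

```python
import math

def fold_change(t, mu, lam):
    """OD(t)/OD0 for a strain that waits lam minutes, then grows at rate mu."""
    return math.exp(mu * max(0.0, t - lam))

def compete(strain1, strain2, capacity):
    """Fraction of strain 1 when the summed fold change reaches `capacity`.

    Each strain is a (mu, lam) tuple, in 1/min and min.
    """
    if capacity <= 2.0 or min(strain1[0], strain2[0]) <= 0.0:
        raise ValueError("need capacity > 2 and positive growth rates")

    def total(t):
        return fold_change(t, *strain1) + fold_change(t, *strain2)

    lo, hi = 0.0, 1.0
    while total(hi) < capacity:   # bracket the stopping time
        hi *= 2.0
    for _ in range(60):           # bisection on the monotone total
        mid = 0.5 * (lo + hi)
        if total(mid) < capacity:
            lo = mid
        else:
            hi = mid
    n1 = fold_change(hi, *strain1)
    return n1 / (n1 + fold_change(hi, *strain2))

# Hypothetical tradeoff: strain 1 has the shorter lag but the slower growth.
short_lag_slow = (0.015, 100.0)
long_lag_fast = (0.020, 150.0)

for capacity in (5.0, 500.0):
    fraction = compete(short_lag_slow, long_lag_fast, capacity)
    print(f"carrying capacity {capacity:>5.0f}: fraction of strain 1 = {fraction:.3f}")
```

In this simplified model the short-lag strain finishes above one half at the low carrying capacity and below one half at the high one, reproducing the qualitative tradeoff; the quantitative boundaries in Fig. 3A depend on the Gompertz dynamics and are not reproduced here.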
Binary growth competition and quantification: The overnight cultures for individual strains were grown for 16 hours at 30 °C. These cultures were mixed in 1:1 proportion, diluted to an OD of 0.01 in fresh supplemented M9 media, and then regrown at 37 °C. The samples were drawn at different time points, and the OD was adjusted to 2.0, either by concentration or dilution. 5 µl of OD 2.0 culture was then diluted in 45 µl of lysis solution (QuickExtract DNA extraction solution (Epicentre)) to reach OD 0.2. Genomic DNA extracted from 50 µl of OD 0.2 culture was diluted 5000 times and used as template. The individual strains in the competition were differentially amplified using allele-specific primers and quantified by a qPCR-based mismatch amplification mutation assay method19 using the QuantiTect SYBR Green PCR kit (Qiagen). A 150 bp long non-mutagenic amplicon of the adk gene was amplified as a reference to quantify total genomic DNA. The fraction of the competing strains was determined using the following equation:
$$f_1 = 2^{-\left[\left(C_t^{1} - C_t^{\mathrm{ref}}\right)_{\mathrm{competition}} - \left(C_t^{1} - C_t^{\mathrm{ref}}\right)_{\mathrm{pure}}\right]}$$
where Ct represents the threshold cycle of qPCR, ref and 1 are the PCR reactions for amplifying the reference and the first allele in competition, while competition and pure represent the condition of culture.
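The threshold-cycle arithmetic can be sketched as follows. This is a hedged illustration that assumes the standard comparative-Ct form, in which the allele signal is normalized to the reference amplicon and then to a pure culture of the same strain, with perfect doubling per PCR cycle; the function name and the Ct values are hypothetical.

```python
def strain_fraction(ct_allele_comp, ct_ref_comp, ct_allele_pure, ct_ref_pure):
    """Fraction of allele 1 in a mixed culture from qPCR threshold cycles.

    Assumes 100% amplification efficiency (a factor of 2 per cycle).
    """
    delta_comp = ct_allele_comp - ct_ref_comp   # allele vs reference, mixed culture
    delta_pure = ct_allele_pure - ct_ref_pure   # allele vs reference, pure culture
    return 2.0 ** (-(delta_comp - delta_pure))

# Hypothetical Ct values: the allele appears one cycle later in the mixture
# than in the pure culture, relative to the reference amplicon.
print(strain_fraction(ct_allele_comp=19.0, ct_ref_comp=15.0,
                      ct_allele_pure=18.0, ct_ref_pure=15.0))
```

Normalizing to the reference amplicon corrects for differences in total template between samples, and normalizing to the pure culture corrects for the amplification offset of the allele-specific primers.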
Data availability: All raw data for growth curves of adk WT and mutant strains, as well as WT overexpression in E. coli BW27783 strains, is included as Dataset 1.
Acknowledgements
We thank Shimon Bershtein and Adrian Serohijos for helpful discussions.
Contributions
BVA and EIS - designed research; BVA, SB, JT and MMu - performed experiments; BVA, MMa, SB and EIS - analyzed the data; BVA, MMa, SB and EIS - wrote the paper; All authors edited and approved the final version.