Abstract
Mutations provide the variation that drives evolution, yet their effects on fitness remain poorly understood. Here we explore how mutations in the essential enzyme Adenylate Kinase (Adk) of E. coli affect multiple phases of population growth. We introduce a biophysical fitness landscape for multiple phases of bacterial growth, which shows how they depend on molecular and cellular properties of Adk. We find that Adk catalytic capacity in the cell (product of activity and abundance) is the major determinant of mutational fitness effects. We show that bacterial lag times are at an optimum for the endogenous enzyme, while exponential growth rates are only weakly affected by variation in Adk. Direct pairwise competitions between strains show how environmental conditions modulate the outcome of a competition where growth rates and lag times show a tradeoff, altogether shedding light on the multidimensional nature of fitness and its importance in the evolutionary optimization of enzymes.
Introduction
Random mutagenesis is often used to assess the distribution of fitness effects in simple experimental models such as propagating viruses and microbes evolving under antibiotic stress1,2. However, the enormous size of sequence space severely constrains how much of the fitness landscape can be explored this way, and mechanistic and predictive insights from these experiments are further limited by a lack of knowledge of the molecular effects of mutations. Instead, a more targeted experimental approach relies on the concept of a biophysical fitness landscape, in which fitness effects of mutations are mapped through their effects on molecular traits of the mutated proteins. In this approach, biophysically-rational genetic variation is introduced on the chromosome, and the molecular and phenotypic effects of that variation are analyzed concurrently3–6. By mapping fitness effects to variation of molecular properties rather than to sequences of mutated proteins, we can dramatically reduce the dimensionality of the genotype-to-phenotype mapping. The underlying hypothesis is that variation in a small number of properly-selected molecular traits of mutated proteins can explain most of the resulting mutational variation in fitness, and that the relationship between these molecular traits and fitness is smooth and continuous. Several recent studies have supported this approach5–7.
However, the relationship between sequence variation and fitness is further confounded by the fact that multiple phenotypic traits contribute to fitness, and the relative importance of these traits to the long-term evolutionary fate of a mutation8 may be highly dependent on environmental and ecological conditions. While a large number of traits (e.g., viability at various life phases, mating success, fecundity, etc.) determine fitness of multicellular organisms, relatively fewer components of fitness, such as the time in lag phase, the exponential growth rate, and the overall yield in stationary phase, determine fitness of unicellular microorganisms like bacteria and yeast. When in competition for limiting resources, all phases of growth contribute towards the outcome, and hence determine fitness3,9. The relative importance of these different phases of bacterial growth in sculpting the fitness landscape depends on the conditions of growth and competition10,11,12.
Overall, the challenge in quantitatively characterizing the fitness landscape is twofold: Understanding fitness in terms of contributions from different phases of growth, and linking each of these components to genotypic properties of cells. In this work, we address both challenges by introducing biophysically-rational genetic variation in the adk locus that encodes the essential E. coli enzyme Adenylate Kinase (Adk), and projecting the ensuing variations of phenotypic components onto the biophysical traits of Adk. To that end, we assess a comprehensive set of biophysical properties and fitness effects of Adk mutants. We find that a unique combination of molecular and cellular traits of Adk — a product of intracellular abundance and catalytic activity, which we term catalytic capacity — serves as a reliable predictor of fitness effects covering the full range of genotypic and phenotypic variation. Furthermore, we find that the length of the lag phase is more sensitive to variation in Adk catalytic capacity than is the exponential growth rate, so that the lag phase of the wild-type E. coli appears to be optimal with respect to broad variation of Adk catalytic capacity.
Results
Biophysical properties of Adk mutants
We chose a set of 21 missense mutations at 6 different positions of adk designed to sample a broad range of molecular and cellular traits of the protein (Table S1 and Fig. 1). We selected residues such that their accessible surface area was less than 10% and they were at least 6 Å away from the catalytically-active sites of Adk, so that mutations at these residues were likely to destabilize the protein13. For most mutants, we chose amino acid mutations that appeared only at low frequency in an alignment of 895 homologous sequences of Adk. As intended, the purified mutant proteins were destabilized over a wide range (∼17 °C in terms of Tm, and ∼7.5 kcal/mol in terms of folding ΔG) (Table S1, Figs. 1B, S1, S2). In only one case (L209I) did we change the E. coli sequence to the consensus amino acid at that position, and we found it in fact stabilized the protein by ∼1 kcal/mol (Table S1). Although most of the Adk mutants were less stable than the wild-type, they nevertheless existed predominantly as monomers in solution (Fig. S3). However, several mutations in one position — V106H, V106N, and V106W — did have significant fractions of proteins present in higher oligomeric forms, in addition to the predominant monomeric species (Fig. S3). These proteins bound 4,4’-Dianilino-1,1’-binaphthyl-5,5’-Disulfonate (Bis-ANS) dye to a higher degree compared to the rest of the mutants (Fig. S4), indicating the presence of possible molten globule states in solution14. The proteostat dye that reports on protein aggregation4,15 also bound these mutants more strongly compared to others (Fig. S4), clearly indicating a higher fraction of aggregated species. The catalytic efficiency (kcat / KM) of the mutant Adk proteins was distributed broadly with most mutants showing a lower activity than E. coli WT (Table S1, Figs. 1C, S5).
Intracellular abundance of Adk follows prediction from Boltzmann distribution
We then incorporated each of the 21 adk mutations one-by-one into the E. coli chromosome using a genome-editing approach based on homologous recombination3,4. We measured the total intracellular abundance of WT and mutant Adk proteins using a quantitative western blot (Table S2). The sigmoidal dependence of total intracellular Adk abundance on folding stability (ΔG) (Fig. 1D) is well described by the Boltzmann distribution for two-state unfolding proteins:
$$P_F = \frac{1}{1 + e^{\beta \Delta G}} \qquad (1)$$
where PF is the fraction of folded molecules in the ensemble of intracellular Adk, ΔG is the free energy of folding (negative for a stable protein), and β = 1/kBT, with the Boltzmann constant kB and the growth temperature T. The total measured abundance of a protein is its amount in the cytoplasm at steady state, achieved by a balance between production and degradation. Since Adk is expressed from a constitutive promoter in the cells, it is reasonable to assume that the production rates of all mutants are similar. Under this assumption, the sigmoidal dependence of abundance on stability indicates that the unfolded protein is degraded in the active medium of the cytoplasm.
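As a minimal sketch of the two-state relation described above, the snippet below computes the folded fraction from a folding free energy. It assumes that ΔG is the free energy of folding in kcal/mol (negative for a stable protein) and uses the molar gas constant as the per-mole analogue of kB. The function name `fraction_folded` and the default temperature of 37 °C are illustrative choices, not values taken from the study.

```python
import math

R_KCAL = 1.987204e-3  # gas constant in kcal/(mol*K); per-mole analogue of k_B


def fraction_folded(delta_g, temperature_k=310.15):
    """Two-state folded fraction P_F = 1 / (1 + exp(beta * dG)).

    Assumption: delta_g is the folding free energy in kcal/mol,
    negative for a stable protein. temperature_k defaults to 37 C.
    """
    beta = 1.0 / (R_KCAL * temperature_k)
    return 1.0 / (1.0 + math.exp(beta * delta_g))


if __name__ == "__main__":
    # Hypothetical stabilities spanning stable to unstable variants.
    for dg in (-6.0, -3.0, -1.0, 0.0, 1.0):
        print(f"dG = {dg:+.1f} kcal/mol -> P_F = {fraction_folded(dg):.3f}")
```

Under the stated assumption that only folded protein escapes degradation, total abundance would scale with this fraction, which gives the sigmoidal shape seen in Fig. 1D.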
Mutations in Adk cause more variation in lag times than exponential growth rates
Mutations in Adk affect both intracellular abundance (via folding stability) and catalytic activity of the protein. Flux dynamics theory predicts, and experiments have confirmed, that the key enzymatic parameter determining the flux through an enzymatic reaction chain is the quantity which we call “catalytic capacity,” defined as the product of intracellular abundance and enzymatic efficiency kcat / KM5,6,16. To that end, we determined how two key components of bacterial growth — the exponential growth rate and the lag time (Fig. 2A) — depend on the total catalytic capacity of Adk in E. coli cells (Fig. 2B,C; also see Methods and Fig. S6–S8 for estimation of growth parameters). We find that the variance in lag times across all strains is significantly greater than the variance in exponential growth times (reciprocal growth rates) (Brown-Forsythe test, p = 9×10−4), and the mean change in lag time (relative to wild-type) from each mutation is significantly greater than the mean change in growth time (Welch’s t-test, p = 3×10−7) (see Methods for details). This suggests that the mutations in Adk affect the lag phase more significantly than the exponential growth phase. One mechanism for producing longer apparent lag times is when a greater proportion of cells that come out of stationary phase are simply nonviable, as described in a recent study17. However, this appears not to be the major cause in our case, as lag times are fairly consistent across replicates (error bars in Fig. 2C), and do not negatively correlate with the number of viable cells (Fig. S9). We also find that the variation in total catalytic capacity of Adk correlates better with the variation in lag times (Spearman rank correlation ρ = −0.44, p = 0.057) than with the variation in growth rates (Spearman rank correlation ρ = −0.08, p = 0.737) (Fig. S10). 
The variation in lag times is also better explained by the variation in catalytic capacity than by any of the Adk properties separately (stability, abundance, or activity) (Fig. S10). Surprisingly, growth rate appears to tolerate a rather large drop in catalytic capacity of Adk, while lag time does not.
WT E. coli is positioned at the cusp of the biophysical fitness landscape for lag time
Since almost all the mutants have lower catalytic capacity than E. coli WT, they only sample the lower range of catalytic capacity. To determine the dependence of growth rate and lag time on catalytic capacity above WT levels, we over-expressed WT Adk from a pBAD plasmid (see Supplementary Methods). We observed no significant change in either growth rate or lag time at higher than endogenous catalytic capacity (Fig. 2B,C, Table S3). This means that while the growth rate appears to be insensitive to large changes in Adk catalytic capacity both below and above the wild-type level, WT catalytic capacity appears to be situated at the threshold of optimizing lag time. Next, we quantitatively compared the position of WT on these two fitness landscapes. To that end, we used a simple Michaelis-Menten-like function to fit the data in Fig. 2B,C (see Eqs. 3 and 4 and Methods). The fitting parameter c, which characterizes the onset of curvature on the landscape (analogous to KM in the Michaelis-Menten equation for enzymatic rate), reports the proximity of WT to the cusp on the landscape (see Methods). The fitted value of c was 0.005 for growth rate and 0.12 for lag time, compared to a normalized catalytic capacity of 1 for WT. This shows that WT is situated close to the cusp in terms of lag time and well inside the plateau in terms of growth rate. This also suggests that selection for lag time, rather than growth rate, was the predominant determinant of WT Adk catalytic capacity.
Shorter lag imparts advantage at low carrying capacity: A computational model
These data highlight the pleiotropic effects of mutations on different phases of bacterial population growth, which raises the question of how pleiotropy shapes the evolutionary fate of a mutation. We explore this issue by considering the outcome of binary competition between strains18. We first simulated binary competitions over a wide range of growth rates and lag times in media conditions that allow for either a 5-fold (low carrying capacity) or a 500-fold (high carrying capacity) increase over the initial population (Fig. 3A) (see Methods). We found a significant tradeoff between lag times and growth rates in determining the winners of binary competitions, with lag playing a more important role at low carrying capacity (Fig. 3A). This implies that a shorter lag provides a greater fitness advantage under strongly nutrient-limiting conditions.
Shorter lag imparts advantage at low carrying capacity: Experimental evidence
To realize varying nutrient conditions in binary competition experiments, we explored the growth of E. coli over a range of glucose concentrations, mimicking the variation of carrying capacity in simulations. We found that carrying capacity is proportional to glucose concentration, whereas lag time and growth rate are minimally affected (Fig. 4). This suggests that observing the outcome of the competition at different time snapshots in a nutrient-rich medium is equivalent to running the competition at different glucose concentrations (carrying capacities). To evaluate the predictions from simulations, we carried out two sets of binary competition experiments based on the overall distribution of growth rates and lag times (Fig. 3B). First, we selected strains exhibiting a tradeoff between growth rate (μ) and lag time (λ) (μ1 > μ2 and λ1 > λ2) (inset of Fig. 5B). Second, we tested competition between strains that differ in their lag times but have nearly indistinguishable growth rates (μ1 ≈ μ2 and λ1 > λ2) (inset of Fig. 5C). In all cases, a strain with a shorter lag time is expected to dominate at lower carrying capacity (corresponding to the competition outcome at early time points); however, this advantage would be lost at later time points if its growth rate is lower than that of the competing strain (Fig. 5A). In the second scenario, the advantage due to short lag is expected to persist even at high carrying capacity because the growth rates of the competing strains do not differ. We estimated the relative proportions of the two strains by a qPCR-based mismatch amplification mutation assay (MAMA) approach19 (see Methods and Fig. S11). As expected in the first scenario, L083F and V106H dominated at earlier time points when competed against A093I and L209I, respectively, due to their shorter lag times (λL083F < λA093I and λV106H < λL209I) (Fig. 5B).
Eventually their fraction dropped below 0.5 at later time points (equivalent to high carrying capacity), where the growth rates determine the competition outcome (μL083F < μA093I and μV106H < μL209I) (Fig. 5B). Similarly, for the second scenario, despite the similar growth rates (μWT ≈ μY182V ≈ μL209A), the fraction of WT always remained above 0.5 because WT spends a shorter time in the lag phase than Y182V and L209A (Fig. 5C). The early advantage to WT due to its shorter lag phase determined the competition fitness throughout the whole growth cycle.
Discussion
A complete mapping of mutational fitness effects would ideally require sampling a practically infinite number of mutations, an impossible proposition. Instead, we can project fitness onto a fairly small number of molecular properties of proteins. Within this paradigm, the identity of a particular mutation does not matter as much as its effect on essential biochemical and biophysical properties of the proteins in question. Our data, as well as previous studies5–7,20, validate this approach by showing that we can collapse several molecular phenotypes into a single effective parameter – the product of protein abundance and activity kcat / KM (catalytic capacity) – which quantitatively determines the biophysical fitness landscape to a great extent (Fig. 2B,C). That is, Fig. 2 indicates that the fitness effects of mutations can largely be predicted from their biophysical effects over a broad range of catalytic capacity, validating the utility of a biophysical fitness landscape to map variation in the adk locus to the phenotype. The 21 engineered mutations, along with the Adk overexpression data, allow us to outline the biophysical fitness landscape comprehensively, covering a wide range of variation of the physical parameters of Adenylate Kinase.
These results illustrate how the evolutionary endpoint of molecular traits may depend fundamentally on the multidimensional nature of fitness, with the relative importance of different components of fitness depending on the environment and lifestyle of the organism. It has been argued that endogenous molecular traits are established as a result of mutation-selection balance21, with the final outcome depending on the relative strengths of selection and genetic drift as determined by the population structure22,23. Here we encounter a more complex situation where mutations in the essential enzyme Adk change multiple traits with different effects on fitness. Apparently the mutation-selection balance resulted in disparate outcomes for the two traits, placing lag time at the cusp while keeping the exponential growth rate farther within the plateau region of its respective biophysical fitness landscape. Such an outcome can reflect different strengths of selection and drift as applied to different phenotypic traits. It is therefore possible that ecological conditions of E. coli put stronger emphasis on survival in low carrying-capacity or fluctuating environments, leading to the balance of selection and drift that keeps lag phase just on the cusp in Fig. 2, i.e., optimal with respect to point mutations that decrease catalytic capacity. Our studies of binary competitions (Figs. 3 and 5) highlight this scenario by showing how the environmental parameter of carrying capacity can determine winners and losers in evolutionary dynamics. Although the lag time of a population can depend not only on the environment but also on the population’s specific history (e.g., how long it was previously in stationary phase), the fundamental role of Adk in metabolism suggests that its effects on lag time are likely to be common across conditions and histories. 
The deep connection between ecological history of species and optimization of biophysical traits of their proteins is a subject for interesting future studies.
Much of our current understanding of microbial cultures and fitness comes from experiments done in the laboratory, where strains are typically grown under a large supply of nutrients. The situation might be very different in the wild, however, where bacteria and other microbes have to survive under harsh conditions of nutrient starvation, extreme temperature, and other environmental stresses24–26. In these circumstances, organisms are likely to spend only a minute fraction of their life-cycle in the exponential growth phase, while undergoing many cycles of lag-growth-saturation as new resources become available and old ones are exhausted. It is therefore intuitive to expect that there has been strong selection in favor of organisms that can not only divide rapidly during exponential growth, but that can also wake up quickly from their lag phase and respond to newly available resources. Our study demonstrates how this selection may shape individual molecular traits.
Our study highlights the relationship between various components of fitness and the molecular properties of modern enzymes, the endpoint of evolutionary selection. An interesting question, which is beyond the scope of the current work, is how modern variants emerged in evolutionary dynamics. To that end, mapping temporally reconstructed ancestral species onto the biophysical fitness landscape of Adk (and other enzymes) appears to be a promising direction for future research.
Methods
Selection of mutations
Mutations at relatively-buried positions generally result in decreased stability and lower fitness13,27. Hence we selected sites for mutagenesis with side-chain accessibility of less than 10%. In addition, the selected sites were away from the active-site residues and active-site contacting residues, and a minimum of 6 Å away from the binding sites of the inhibitor Ap5A (PDB 1AKE). We define the active-site residues as those whose accessible surface area changes by at least 5 Å² in the presence of the inhibitor Ap5A. A similar criterion was used to define the residues contacting the active site. Altogether, 4 residues from the LID domain, 3 from the NMP domain, and 28 from the Core domain satisfy these criteria. Of the 28 sites from the Core domain, we randomly chose 6 to mutate. We chose the identities of the mutations to span various side-chain sizes and a range of conservation. We derived the conservation from a multiple sequence alignment of 895 Adk sequences collated from the ExPASy database (as of Nov 2012).
Generation of mutant strains
We generated the strains with WT and mutant adk with chloramphenicol- and kanamycin-resistance genes on either end of the adk gene using the genome editing approach as described previously3. Since the adk gene is flanked by two repeat regions (REPt44 and REPt45) on the wild-type chromosome, we extended the homology required for recombination up to the middle of the adjacent genes.
Growth curve measurements and media conditions
WT and mutant strains were grown overnight at 30 °C from single colonies in supplemented M9 medium (0.2% glucose, 1 mM MgSO4, 0.1% casamino acids, and 0.5 μg/ml thiamine). OD600 was measured for all strains, and the cultures were then normalized to the lowest OD. The normalized cultures were diluted 1:100 in fresh supplemented M9 medium, and growth curves were monitored in triplicate using a Bioscreen C at 37 °C. We derived the growth parameters by fitting ln(OD) versus time with the four-parameter Gompertz function (see below). The error between replicates was 2–3% on average and did not improve significantly with more replicates.
Fitting growth data and estimation of growth parameters
In our study, we define lag time (λ) as the time required to achieve the maximum growth rate (μ) (Fig. 2A). We used two different methods to infer these parameters: A) direct analysis of growth curve derivatives and B) fits to the Gompertz function (Fig. S6).
In method A, we took the growth rate as the maximum value of
$$\frac{\ln \mathrm{OD}(t + \Delta t) - \ln \mathrm{OD}(t)}{\Delta t}$$
where Δt is 15 minutes. The lag time was then the earliest time at which this maximum growth rate was achieved.
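Method A can be sketched in a few lines. The example below is an illustration under stated assumptions: it uses a forward difference of ln(OD) between consecutive readings, assigns each slope to the start of its interval, and the function name `growth_rate_and_lag` and the sample readings are hypothetical.

```python
import math


def growth_rate_and_lag(times, ods):
    """Return (mu, lag): the largest finite-difference slope of ln(OD)
    and the earliest time at which that slope occurs.

    Assumptions: ODs are positive, times are sorted (e.g. every 15 min),
    and each slope is assigned to the start of its interval.
    """
    best_rate, best_time = float("-inf"), None
    for i in range(len(times) - 1):
        dt = times[i + 1] - times[i]
        rate = (math.log(ods[i + 1]) - math.log(ods[i])) / dt
        if rate > best_rate:  # strict '>' keeps the earliest maximum
            best_rate, best_time = rate, times[i]
    return best_rate, best_time


if __name__ == "__main__":
    # Made-up readings taken every 15 minutes.
    times = [0, 15, 30, 45, 60]
    ods = [0.02, 0.02, 0.03, 0.06, 0.09]
    mu, lag = growth_rate_and_lag(times, ods)
    print(f"mu = {mu:.4f} per min, lag = {lag} min")
```

Because the slope is only evaluated on the sampling grid, both estimates are limited by the 15-minute resolution of the readings.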
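For method B, we used the following four-parameter Gompertz function to fit ln(OD) vs. time plots (considering only points with OD600 ≥ 0.02):
$$\ln \mathrm{OD}(t) = \ln \mathrm{OD}_0 + K \exp\left[-\exp\left(-\frac{t - \lambda}{b}\right)\right] \qquad (2)$$
where OD0 is the baseline optical density, the carrying capacity is K, the maximum growth rate is μ = K / (b·exp(1)), and the lag time λ is the time taken to achieve the maximum growth rate.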
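The relations μ = K / (b·exp(1)) and "λ is the time of maximum growth rate" can be checked numerically. The sketch below assumes one four-parameter Gompertz form that is consistent with those relations, ln OD(t) = ln OD0 + K·exp(−exp(−(t − λ)/b)); this parameterization, the function names, and the example parameter values are assumptions for illustration.

```python
import math


def gompertz_ln_od(t, ln_od0, k, lam, b):
    """Assumed Gompertz form: ln OD(t) = ln OD0 + K*exp(-exp(-(t - lam)/b))."""
    return ln_od0 + k * math.exp(-math.exp(-(t - lam) / b))


def max_growth_rate(k, b):
    """Slope of the curve at its inflection point, mu = K / (b * e)."""
    return k / (b * math.e)


def numerical_slope(t, ln_od0, k, lam, b, h=1e-3):
    """Central-difference estimate of d ln(OD) / dt at time t."""
    upper = gompertz_ln_od(t + h, ln_od0, k, lam, b)
    lower = gompertz_ln_od(t - h, ln_od0, k, lam, b)
    return (upper - lower) / (2.0 * h)


if __name__ == "__main__":
    # Hypothetical parameters: baseline OD 0.02, K = 4, lag 150 min, b = 80 min.
    params = (math.log(0.02), 4.0, 150.0, 80.0)
    print("analytic mu :", max_growth_rate(4.0, 80.0))
    print("slope at lag:", numerical_slope(150.0, *params))
    print("slope before:", numerical_slope(130.0, *params))
    print("slope after :", numerical_slope(170.0, *params))
```

In practice the four parameters would be obtained with a nonlinear least-squares routine; the sketch only shows how μ and λ follow from them.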
The μ and λ estimated from the above methods are strongly correlated (Pearson’s r=0.80, p=1.4e-5 for μ, and r=0.71, p=3.0e-4 for λ) (Fig. S7). However, the uncertainty in the fitted parameters appears to be less than the uncertainty in the parameters obtained from the derivatives, which are limited by the low time-resolution of the experimental data (acquired at an interval of 15 min).
The growth rate (μ) and lag time (λ) appear to be statistically independent of each other across the Adk mutant strains (Spearman ρ = 0.31, p = 0.15, Fig. 3B). Hence it is conceivable that selection can act separately on these two traits, which is further illustrated by the different fitness landscapes observed when projected onto the axis of catalytic capacity (Fig. 2B,C).
Statistical tests of mutational variation in growth and lag phases
We compare the relative effects of mutations on growth and lag phases in two ways. First, we calculate the variances in exponential growth time (reciprocal 1/μ of growth rate, proportional to the maximum cell division time) and lag time; we use growth time rather than growth rate since it must have the same units (i.e., minutes) as lag time for comparison. These variances tell us how much each strain’s growth or lag time differs from the average across all strains. We then use the Brown-Forsythe test (since the growth and lag times are not normally distributed) to determine whether these variances are significantly different. Second, we calculate the mean absolute deviation of each mutant’s growth and lag times relative to the wild-type values. This tells us the average change in growth or lag time caused by a mutation.
The strength of selection on the growth or lag phase should be proportional to the difference in growth or lag time between the two competing strains (Manhart, Adkar, Shakhnovich, unpublished results). Therefore the variances in traits are proportional to the average selection coefficient between all pairs of strains for that trait, while the mean absolute deviation relative to wild-type is proportional to the average selection coefficient between each mutant and the wild-type. The statistical tests above, which determine whether the variances and mean absolute deviation are significantly different between growth time and lag time, also indicate which trait is under stronger average selection between the strains.
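The two comparisons described in this section can be sketched with the standard library alone. The example computes the Brown-Forsythe statistic (a one-way ANOVA on absolute deviations from each group's median) and the mean absolute change relative to wild-type. The function names and all numbers are hypothetical, and the sketch stops at the test statistic: a p-value would additionally need the F distribution with (k − 1, N − k) degrees of freedom, for example through `scipy.stats.levene` with `center='median'`.

```python
from statistics import mean, median


def brown_forsythe_w(*groups):
    """Brown-Forsythe statistic for equality of variances across groups."""
    # Absolute deviations of each value from its own group's median.
    z = []
    for g in groups:
        m = median(g)
        z.append([abs(v - m) for v in g])
    k = len(z)
    n_total = sum(len(zi) for zi in z)
    z_means = [mean(zi) for zi in z]
    grand_mean = sum(sum(zi) for zi in z) / n_total
    between = sum(len(zi) * (zm - grand_mean) ** 2 for zi, zm in zip(z, z_means))
    within = sum((v - zm) ** 2 for zi, zm in zip(z, z_means) for v in zi)
    return (n_total - k) / (k - 1) * between / within


def mean_abs_change(values, wild_type):
    """Average absolute difference between mutant values and wild-type."""
    return mean([abs(v - wild_type) for v in values])


if __name__ == "__main__":
    # Made-up growth times (1/mu) and lag times, both in minutes.
    growth_times = [50.0, 52.0, 51.0, 49.0, 53.0]
    lag_times = [150.0, 190.0, 170.0, 240.0, 160.0]
    print("W =", brown_forsythe_w(growth_times, lag_times))
    print("mean |change| in growth time:", mean_abs_change(growth_times, 50.0))
    print("mean |change| in lag time:", mean_abs_change(lag_times, 150.0))
```

Expressing both traits in minutes, as the text requires, is what makes the two groups directly comparable.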
Quantification of the location of WT on the fitness landscapes
As in previous work from our lab5,6 as well as earlier work16,20, we used the following Michaelis-Menten-like elasticity function to fit the landscape of growth rate vs. catalytic capacity (Fig. 2B):
$$\mu(x) = \frac{a\,x}{c + x} \qquad (3)$$
where x is the catalytic capacity, a is the saturation parameter, and c is the catalytic capacity at which μ = a/2. For the analogous landscape of lag time (Fig. 2C), the reciprocal form of Eq. 3 was used:
$$\lambda(x) = \frac{b\,(c + x)}{x} \qquad (4)$$
where b is the asymptote parameter and c is the catalytic capacity at which λ = 2b. In both Eqs. 3 and 4, c is a characteristic value of catalytic capacity at which the landscape transitions from the plateau to the curved part. Since catalytic capacity is normalized by the WT value, c serves as a measure of how close the WT is to the cusp on the respective landscape.
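To make the meaning of c concrete, the sketch below evaluates both landscapes using the fitted values reported in the Results (c = 0.005 for growth rate, c = 0.12 for lag time). The functional forms a·x/(c + x) and b·(c + x)/x are reconstructions consistent with the parameter definitions given here, and setting a = b = 1 is an arbitrary normalization chosen for illustration.

```python
def growth_rate(x, a, c):
    """Michaelis-Menten-like landscape: saturates at a, equals a/2 at x = c."""
    return a * x / (c + x)


def lag_time(x, b, c):
    """Reciprocal landscape: approaches b at large x, equals 2b at x = c."""
    return b * (c + x) / x


if __name__ == "__main__":
    C_GROWTH, C_LAG = 0.005, 0.12  # fitted c values quoted in the Results
    for x in (1.0, 0.5, 0.1):  # catalytic capacity relative to WT (WT = 1)
        mu_rel = growth_rate(x, 1.0, C_GROWTH) / growth_rate(1.0, 1.0, C_GROWTH)
        lag_rel = lag_time(x, 1.0, C_LAG) / lag_time(1.0, 1.0, C_LAG)
        print(f"x = {x:.1f}: growth rate x{mu_rel:.3f}, lag time x{lag_rel:.3f}")
```

With these values, a tenfold drop in catalytic capacity lowers the growth rate by only a few percent but roughly doubles the lag time, which is the sense in which WT sits near the cusp for lag time and deep in the plateau for growth rate.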
Simulation of binary competition
We simulated the competition of two strains by using the Gompertz function (Eq. 2) to model the growth of individual strains. The initial population (OD0) for both strains was equal, and growth ceases when ∑(ODt/OD0)i = K, where K is the carrying capacity. We considered two different values of carrying capacities (5 and 500). We set μ1 and λ1 to values derived experimentally for WT Adk strain (Table S2), while the growth rates and lag times for the second competing strain were varied randomly across the intervals 0.005 to 0.030 min−1 (for growth rate) and 50 to 300 min (for lag time).
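The logic of such a competition can be illustrated with a deliberately simplified model. Note that the sketch below does not use the Gompertz function: it swaps in a lag-then-exponential curve (no growth until the lag time, exponential growth afterwards) so that the result is easy to verify by hand. It keeps the stopping rule described above (growth ceases when the summed fold-increase of both strains reaches K). The function names and the example strain parameters are hypothetical.

```python
import math


def fold_increase(t, mu, lam):
    """Fold-increase of one strain: flat during lag, exponential afterwards."""
    return math.exp(mu * (t - lam)) if t > lam else 1.0


def compete(mu1, lam1, mu2, lam2, k, tol=1e-9):
    """Return (final fraction of strain 1, saturation time).

    Both strains start at equal density and growth rates must be positive.
    Saturation is the time at which the summed fold-increase equals k.
    """
    if k <= 2.0:
        raise ValueError("k must exceed 2, the summed fold-increase at t = 0")

    def total(t):
        return fold_increase(t, mu1, lam1) + fold_increase(t, mu2, lam2)

    lo, hi = 0.0, 1.0
    while total(hi) < k:  # expand the bracket until saturation is inside it
        hi *= 2.0
    while hi - lo > tol:  # bisection on the non-decreasing total(t)
        mid = 0.5 * (lo + hi)
        if total(mid) < k:
            lo = mid
        else:
            hi = mid
    f1 = fold_increase(hi, mu1, lam1)
    f2 = fold_increase(hi, mu2, lam2)
    return f1 / (f1 + f2), hi


if __name__ == "__main__":
    # Strain 1: short lag, slow growth. Strain 2: long lag, fast growth.
    for k in (5, 500):
        frac, t_sat = compete(0.015, 100.0, 0.025, 200.0, k)
        print(f"K = {k}: strain 1 fraction = {frac:.2f} at t = {t_sat:.0f} min")
```

In this toy example the short-lag strain holds the majority when K = 5 but ends in the minority when K = 500, mirroring the tradeoff described in the Results.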
Binary growth competition and quantification
The overnight cultures for individual strains were grown for 16 hours at 30 °C. These cultures were mixed in a 1:1 proportion, diluted to an OD of 0.01 in fresh supplemented M9 medium, and then regrown at 37 °C. Samples were drawn at different time points, and the OD was adjusted to 2.0 by either concentration or dilution. 5 μl of OD 2.0 culture was then diluted in 45 μl of lysis solution (QuickExtract DNA extraction solution, Epicentre) to reach OD 0.2. Genomic DNA extracted from the 50 μl of OD 0.2 culture was diluted 5000-fold and used as template. The individual strains in the competition were differentially amplified using allele-specific primers and quantified by a qPCR-based mismatch amplification mutation assay method19 using the QuantiTect SYBR Green PCR kit (Qiagen). A 150 bp non-mutagenic amplicon of the adk gene was amplified as a reference to quantify total genomic DNA. The fraction of the competing strains was determined using the following equation:
$$f_1 = \frac{2^{-(C_{t,1} - C_{t,\mathrm{ref}})_{\mathrm{competition}}}}{2^{-(C_{t,1} - C_{t,\mathrm{ref}})_{\mathrm{pure}}}}$$
where Ct represents the threshold cycle of qPCR, ref and 1 denote the PCR reactions amplifying the reference and the first allele in competition, and “competition” and “pure” denote the culture condition.
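A strain fraction of this kind can be computed from four threshold cycles. The sketch below assumes a standard ΔΔCt calculation with perfect doubling per PCR cycle: the allele-specific Ct is first referenced to the total-DNA amplicon, and the mixed culture is then compared to a pure culture of the same strain. The function name, argument names, and example Ct values are hypothetical.

```python
def strain_fraction(ct_allele_comp, ct_ref_comp, ct_allele_pure, ct_ref_pure):
    """Fraction of one strain in a mixed culture from qPCR threshold cycles.

    Assumptions: 100% amplification efficiency (signal doubles each cycle),
    and a pure culture of the strain defines a fraction of 1.
    """
    delta_comp = ct_allele_comp - ct_ref_comp  # allele vs. total DNA, mixture
    delta_pure = ct_allele_pure - ct_ref_pure  # allele vs. total DNA, pure
    return 2.0 ** (-(delta_comp - delta_pure))


if __name__ == "__main__":
    # Made-up values: the allele appears one cycle later in the mixture.
    print(strain_fraction(21.0, 18.0, 20.0, 18.0))
```

A one-cycle delay of the allele-specific signal relative to the pure culture corresponds to half the template, i.e. a 1:1 mixture.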
Contributions
BVA and EIS - designed research; BVA, SB, JT and MMu - performed experiments; BVA, MMa, SB and EIS - analyzed the data; BVA, MMa, SB and EIS - wrote the paper; All authors edited and approved the final version.
Acknowledgements
We thank Shimon Bershtein and Adrian Serohijos for helpful discussions.