Abstract
Predicting evolutionary change poses numerous challenges. Here we take advantage of the model bacterium Pseudomonas fluorescens in which the genotype-to-phenotype map determining evolution of the adaptive “wrinkly spreader” (WS) type is known. We present mathematical descriptions of three necessary regulatory pathways and use these to predict both the routes that evolution follows and the expected mutational targets. To test predictions, mutation rates and targets were determined for each pathway. Unanticipated mutational hotspots caused data to depart from predictions but the new data were readily incorporated into refined models. A mismatch was observed between the spectra of WS-causing mutations obtained with and without selection due to low fitness of previously undetected WS-causing mutations. Our findings contribute toward the development of mechanistic models for forecasting evolution, highlight current limitations, and draw attention to challenges in predicting locus-specific mutational biases and fitness effects.
Impact statement: Using a combination of genetics, experimental evolution and mathematical modelling, this work defines the information necessary to predict the outcome of short-term adaptive evolution.
Introduction
Adaptation requires the realization of beneficial mutations. As self-evident as this may be, predicting the occurrence of beneficial mutations and their trajectories to improved fitness is fraught with challenges (Lässig, et al. 2017). Nonetheless, progress has been made for phenotypically diverse asexual populations subject to strong selection. Effective approaches have drawn upon densely sampled sequence data and equilibrium models of molecular evolution to predict amino acid preferences at specific loci (Luksza and Lassig 2014). Predictive strategies have also been developed based on selection inferred from the shape of coalescent trees (Neher, et al. 2014). In both instances the models are coarse-grained and sidestep specific molecular and mutational details.
There is reason to bypass molecular details: because mutation is a stochastic process, the details are, for the most part, likely to be idiosyncratic and unpredictable. But an increasing number of investigations give reason to think otherwise, namely that adaptive molecular evolution might follow rules (Pigliucci 2010; Stern 2013; Laland, et al. 2015). This is particularly apparent in studies of parallel molecular evolution (Colosimo, et al. 2005; Woods, et al. 2006; Ostrowski, et al. 2008; Flowers, et al. 2009; Meyer, et al. 2012; Tenaillon, et al. 2012; Zhen, et al. 2012; Herron and Doebeli 2013; Galen, et al. 2015; Bailey, et al. 2017; Kram, et al. 2017; Stoltzfus and McCandlish 2017).
A standard starting position for predicting adaptive evolution recognises the importance of population genetic parameters, including mutation rate, generation time, population size and selection, and, more recently, information on the distribution of beneficial fitness effects. These factors alone, however, leave unconsidered the mechanistic details that arise from the genotype-to-phenotype map and from mutational biases. To what extent do these details matter?
Mutations arise randomly with respect to utility, but genetic architecture can influence the translation of mutation into phenotypic variation: the likelihood that a given mutation generates phenotypic effects depends on the genotype-to-phenotype map (Alberch 1991; Gompel and Prud’homme 2009; Stern and Orgogozo 2009; Rainey, et al. 2017). The function of gene products and their regulatory interactions thus provides information on likely mutational targets underpinning particular phenotypes. This is evident when considering a hypothetical structural determinant subject to both positive and negative regulation and whose over-expression generates a given adaptive phenotype. Assuming a uniform distribution of mutational events, mutations in the negative regulator (and not the positive activator) will be the primary cause of the adaptive phenotype. This follows from the fact that loss-of-function mutations are more common than gain-of-function mutations. Indeed, an emerging rule indicates that phenotypes determined by genetic pathways that are themselves subject to negative regulation are most likely to arise by loss-of-function mutations in negative regulatory components (McDonald, et al. 2009; Tenaillon, et al. 2012; Lind, et al. 2015; Fraebel, et al. 2017).
Mutation is not equally likely at each nucleotide of a given genome (Lind and Andersson 2008; Lynch 2010; Seier, et al. 2011; Foster, et al. 2015; Reijns, et al. 2015; Sankar, et al. 2016; Stoltzfus and McCandlish 2017). Numerous instances of mutational bias have been reported. Prime examples are simple sequence repeats such as homopolymeric nucleotide tracts or di-, tri- and tetrameric repeats that mutate at high frequency via slipped strand mispairing (Levinson and Gutman 1987). These readily identifiable sequences define contingency loci in obligate human pathogens and commensals (Moxon, et al. 1994) and are widespread in eukaryotic genomes (Tautz and Renz 1984). The behaviour of contingency loci can be further modulated by defects in components of methyl-directed mismatch repair systems (Richardson and Stojiljkovic 2001; Martin, et al. 2004; Hammerschmidt, et al. 2014; Heilbron, et al. 2014).
Certain palindromic structures also lead to mutational bias (Viswanathan, et al. 2000; Lovett 2004) and promote amplification events that increase mutational target size (Roth, et al. 1996; Kugelberg, et al. 2010; Reams and Roth 2015). Transition-transversion bias (Stoltzfus and McCandlish 2017) and elevated mutation rates at CpG sites (Galen, et al. 2015) can also skew the distribution of mutational effects. Further bias arises from the chromosomal neighbourhood of genes under selection (Steinrueck and Guet 2017), the location of genes with regard to interactions with DNA replication/transcription machineries (Sankar, et al. 2016), and environmental factors that affect not only mutation rate but also the spectra of mutational events (Krasovec, et al. 2017; Maharjan and Ferenci 2017; Shewaramani, et al. 2017).
Beyond the genotype-to-phenotype map and mutational biases, predicting adaptive evolution requires the ability to know a priori the fitness effects of specific mutations. There is currently much theoretical and empirical interest in the distribution of fitness effects (DFE) (Eyre-Walker and Keightley 2007), and particularly the DFE of beneficial mutations (Orr 2005), because of its implications for predicting the rate of adaptation and the likelihood of parallel evolution (de Visser and Krug 2014). However, knowledge of the shape of the distribution is insufficient to connect specific mutations to their specific effects, or to their likelihood of occurrence. Making such connections requires knowing how mutations map to their environment-specific fitness effects. This is a tall order. A starting point is to understand the relationship between all possible mutational routes to a particular phenotype and the subset realised by selection.
Here we take a bacterial system in which the genetic pathways underpinning evolution of the adaptive “wrinkly spreader” (WS) type are known, and use it to explore the current limits on evolutionary forecasting. Pseudomonas fluorescens SBW25 growing in static broth microcosms rapidly depletes the available oxygen, establishing selective conditions that favour mutants able to colonise the air-liquid interface. The most successful mutant class encompasses the WS types (Ferguson, et al. 2013; Lind, et al. 2017b). These types arise from mutational activation of diguanylate cyclases (DGCs), which causes over-production of the second messenger c-di-GMP (Goymer, et al. 2006; McDonald, et al. 2009), over-production of an acetylated cellulose polymer (Spiers, et al. 2002; Spiers, et al. 2003) and, ultimately, formation of a self-supporting microbial mat (Figure 1A).
McDonald et al. (McDonald, et al. 2009) showed that each time the tape of WS evolution is re-run, mutations generating the adaptive type arise in one of three DGC-encoding pathways (Wsp, Aws or Mws) (Figure 1A). Subsequent work revealed that when these three pathways are eliminated from the ancestral type, evolution proceeds along multiple new pathways (Lind, et al. 2015). Preferential usage of the Wsp, Aws and Mws pathways stems from the fact that they are subject to negative regulation and thus present a large mutational target relative to pathways subject to positive regulation or requiring promoter-activating mutations, gene fusion events or other rare mutations.
Given the repeatability of WS evolution, knowledge of the Wsp/Aws/Mws pathways, and genetic tools for mechanistic investigation (including the capacity to obtain WS mutants in the absence of selection), the WS system offers a rare opportunity to explore the feasibility of bottom-up strategies for evolutionary forecasting. Our findings show that mechanistic-level predictions are possible, but also draw attention to challenges that stem from the current inability to predict a priori locus-specific mutational biases and environment-specific fitness effects.
Results
Obtaining an unbiased measure of pathway-specific mutation rates to WS
Knowledge of the rate at which mutation generates WS types via each of the Wsp, Aws and Mws pathways, unbiased by the effects of selection, provides a benchmark against which the predictive power of null models can be appraised. To obtain such measures we first constructed a set of genotypes each containing just one of the three focal pathways: PBR721 carries the Wsp pathway but is devoid of Aws and Mws, PBR713 carries the Aws pathway but is devoid of Wsp and Mws, and PBR712 harbours the Mws pathway but is devoid of Wsp and Aws. Into each of these genotypes a promoterless kanamycin resistance gene was incorporated immediately downstream of the promoter of the cellulose-encoding wss operon and fused to an otherwise unaffected wss operon (Figure 1B).
In the ancestral SM genotype the cellulose promoter is inactive in shaken King’s Medium B (KB) broth and thus the strain is sensitive to kanamycin. When a WS-causing mutation occurs the wss promoter becomes active resulting in a kanamycin-resistant WS type (Fukami, et al. 2007; McDonald, et al. 2011). Individual growth of this set of three genotypes in shaken KB, combined with plating to detect kanamycin-resistant mutants, makes possible a fluctuation assay (Luria and Delbruck 1943; Hall, et al. 2009) from which a direct measure of the rate at which WS mutants arise can be obtained. Importantly, because WS types are maladapted in shaken broth culture, the screen for kanamycin-resistant clones allows the pathway-specific mutation rate to WS to be obtained without the biasing effects of selection for growth at the air-liquid interface (Figure 1B). The results are shown in Figure 2.
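The fluctuation assay converts counts of kanamycin-resistant colonies from many parallel cultures into a mutation rate. FALCOR (Hall et al. 2009) provides several estimators, among them the Ma-Sandri-Sarkar (MSS) maximum-likelihood method; the sketch below implements that method in plain Python. It is a minimal illustration, not necessarily the estimator or pipeline used here: the colony counts and cells-per-culture value are hypothetical, the function names are our own, and it assumes whole cultures were plated with no correction for plating efficiency.

```python
import math


def mss_probabilities(m, max_r):
    """Ma-Sandri-Sarkar recursion for the Luria-Delbruck distribution.

    Returns a list p where p[r] is the probability that a culture contains
    r mutants, given m, the expected number of mutational events per culture.
    """
    p = [math.exp(-m)]
    for r in range(1, max_r + 1):
        s = sum(p[i] / (r - i + 1) for i in range(r))
        p.append(m / r * s)
    return p


def log_likelihood(m, counts):
    """Log-likelihood of the observed mutant counts (one per culture) given m."""
    p = mss_probabilities(m, max(counts))
    return sum(math.log(p[r]) for r in counts)


def estimate_m(counts, lo=1e-4, hi=None, tol=1e-6):
    """Maximum-likelihood estimate of m by golden-section search."""
    if hi is None:
        hi = max(counts) + 10.0
    inv_phi = (math.sqrt(5) - 1) / 2
    a, b = lo, hi
    while b - a > tol:
        c = b - inv_phi * (b - a)
        d = a + inv_phi * (b - a)
        if log_likelihood(c, counts) > log_likelihood(d, counts):
            b = d  # maximum lies in [a, d]
        else:
            a = c  # maximum lies in [c, b]
    return (a + b) / 2


# Hypothetical data: kanamycin-resistant colonies from 12 parallel cultures.
counts = [0, 0, 1, 0, 3, 0, 0, 12, 1, 0, 2, 0]
cells_per_culture = 2e9  # hypothetical final number of cells per culture

m_hat = estimate_m(counts)
rate = m_hat / cells_per_culture  # usual approximation: m divided by final cell number
print(f"m = {m_hat:.3f}; mutation rate to WS ~ {rate:.2e}")
```

Golden-section search is used because, in practice, the MSS log-likelihood is unimodal in m; a full analysis would also report confidence intervals.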
The mutation rate was highest for the Aws pathway (6.5 × 10⁻⁹), approximately double that of Wsp (3.7 × 10⁻⁹) and an order of magnitude higher than that of the Mws pathway (0.74 × 10⁻⁹) (Figure 2). The rate at which WS mutants arose from the ancestral genotype in which all three pathways are intact (11.2 × 10⁻⁹) was approximately the sum of the rates for the three pathways (11.0 × 10⁻⁹), confirming that the Wsp, Aws and Mws pathways are the primary routes by which WS types evolve (Lind, et al. 2015). That the Aws pathway has the greatest capacity to generate WS is surprising given its smaller target size (three genes compared to seven genes in the Wsp pathway).
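The additivity check and fold differences in this paragraph are simple arithmetic on the reported rates. The snippet reproduces them, using the values from the text in units of 10⁻⁹; the dictionary layout is only for illustration.

```python
# Pathway-specific mutation rates to WS, in units of 1e-9 (values from the text).
rates = {"Aws": 6.5, "Wsp": 3.7, "Mws": 0.74}
ancestral = 11.2  # rate with all three pathways intact

total = sum(rates.values())
print(f"Sum of pathway rates: {total:.2f} (ancestral: {ancestral})")
print(f"Aws / Wsp = {rates['Aws'] / rates['Wsp']:.2f}")
print(f"Aws / Mws = {rates['Aws'] / rates['Mws']:.2f}")
for name, r in rates.items():
    print(f"{name}: {100 * r / total:.1f}% of the summed rate")
```

The Aws:Wsp ratio is about 1.8 and the Aws:Mws ratio about 8.8, which is what "approximately double" and "an order of magnitude" summarise.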
Modelling the genotype-to-phenotype map underpinning WS evolution
Much is known about the function and interactions among components of each of the three focal pathways. This knowledge allows development of models that capture the dynamic nature of each pathway and thus allow predictions as to the likelihood that evolution will proceed via each of the three pathways. An unresolved issue is the extent to which these models match experimental findings. Following a brief description of each pathway we describe the models.
The 8.4 kb Wsp pathway is a chemotaxis-like system (Goymer, et al. 2006; Guvener and Harwood 2007; Romling, et al. 2013; Micali and Endres 2016) composed of seven genes, with the first six (wspA-wspF) transcribed as a single unit and the last (wspR) from its own promoter (Bantinaki, et al. 2007). WspA (PFLU1219) is a methyl-accepting chemotaxis protein (MCP) that forms a complex with the CheW-like scaffolding proteins WspB (PFLU1220) and WspD (PFLU1222). WspA senses environmental stimuli and transmits the information via conformational changes in the WspA/WspB/WspD complex to affect the activity of WspE (PFLU1223), a CheA/Y hybrid histidine kinase response regulator. WspE activates both the DGC WspR (PFLU1225) and the CheB-like methylesterase WspF (PFLU1224) by transfer of an active phosphoryl group. The activity of WspA is modulated by methylation: the constitutively active CheR-like methyltransferase WspC (PFLU1221) transfers methyl groups to conserved glutamine residues on WspA, while phosphorylated WspF removes these groups. WS mutants are known to arise by mutations in the negative regulator WspF and also in the WspE kinase (McDonald, et al. 2009). In vitro manipulations of WspR that abolish repression of the DGC domain by the response regulator domain are known, but have never been observed in experimental populations (Goymer, et al. 2006).
The 2.3 kb aws operon contains three genes (awsXRO) transcribed from a single promoter. Homologous genes in Pseudomonas aeruginosa (yfiRNB, PA1121-1119) have been characterised in detail (Malone, et al. 2010; Malone, et al. 2012; Xu, et al. 2016). The outer membrane lipoprotein AwsO (PFLU5209) has a signal peptide and an OmpA domain, and binds to peptidoglycan. AwsO is thought to be the sensor whose activity is modulated in response to envelope stress (Malone, et al. 2012). AwsO sequesters the periplasmic protein AwsX (PFLU5211) at the outer membrane. AwsX functions as a negative regulator of the inner-membrane DGC AwsR (PFLU5210). Either increased binding of AwsX to AwsO or loss of negative regulation through disruption of the interaction between AwsX and AwsR can lead to WS (McDonald, et al. 2009; Malone, et al. 2010; Malone, et al. 2012).
The 3.9 kb mwsR gene (PFLU5329, known as morA (PA4601) in Pseudomonas aeruginosa) encodes a predicted membrane protein with both a DGC domain that produces c-di-GMP and a phosphodiesterase (PDE) domain that degrades c-di-GMP. Little is known of the molecular details determining its function, but both catalytic domains appear to be active (Phippen, et al. 2014). Deletion of the PDE domain results in a WS phenotype, with activity dependent on a functional DGC domain (McDonald, et al. 2009).
If the specific effect of changing each nucleotide (and each set of nucleotides) were known, models for each pathway would not be required, but such knowledge does not exist. We therefore take a simplifying approach in which attention focuses on interactions between components that correspond to reactions whose rates can increase, decrease or remain unaffected, depending on mutations in the component parts. Mutations may increase the rate of a reaction (enabling mutations), decrease it (disabling mutations) or leave it unaffected. The components and interactions are shown in Figure 3. Figure 3, along with figure supplements 1 to 3, describes the molecular reactions and the associated differential equations governing the dynamics of each pathway. An advantage of this simplifying approach is that a change to a reaction may encompass mutations in more than a single component. For example, mutations in either WspE or WspR may increase reaction r5 of the Wsp pathway (Figure 3A).
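To make the enabling/disabling abstraction concrete, the sketch below integrates a deliberately minimal, hypothetical pathway. It is not one of the Figure 3 models: the class, rate constants, ten-fold effect size and WS threshold are all assumptions chosen for illustration. A negative regulator X binds a DGC R (reaction A) and free R synthesises c-di-GMP (reaction B), which is degraded at a fixed rate.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class ToyPathway:
    """Hypothetical pathway: regulator X binds and inhibits DGC R; free R makes c-di-GMP."""
    k_bind: float = 10.0   # reaction A: X-R binding strength (arbitrary units)
    k_cat: float = 1.0     # reaction B: c-di-GMP synthesis by free R
    k_deg: float = 1.0     # c-di-GMP degradation (held fixed)
    x_total: float = 1.0   # regulator level
    r_total: float = 1.0   # DGC level


def simulate_cdg(p, t_end=20.0, dt=0.01):
    """Integrate dc/dt = k_cat * R_free - k_deg * c with explicit Euler steps."""
    r_free = p.r_total / (1.0 + p.k_bind * p.x_total)  # binding assumed at equilibrium
    c = 0.0
    for _ in range(int(t_end / dt)):
        c += dt * (p.k_cat * r_free - p.k_deg * c)
    return c


def apply_mutation(p, effects, fold=10.0):
    """effects = (change in reaction A, change in reaction B), each +1 (enabling),
    -1 (disabling) or 0 (no effect); a change multiplies or divides the rate by fold."""
    e_bind, e_cat = effects
    return replace(p, k_bind=p.k_bind * fold ** e_bind, k_cat=p.k_cat * fold ** e_cat)


WS_THRESHOLD = 0.3  # hypothetical c-di-GMP level above which the WS phenotype appears

wild_type = ToyPathway()
for effects in [(0, 0), (-1, 0), (1, 0), (0, 1), (0, -1)]:
    c = simulate_cdg(apply_mutation(wild_type, effects))
    print(effects, f"c-di-GMP = {c:.3f}", "WS" if c > WS_THRESHOLD else "not WS")
```

In this toy, disabling reaction A or enabling reaction B pushes c-di-GMP above the threshold, while the opposite changes do not. A disabling change in reaction A could come from mutations in either X or R, which illustrates why modelling reactions rather than genes lets one change stand for mutations in more than one component.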
Equipped with this set of mathematical descriptions, it is possible to consider combinations of enabling, disabling and no-effect changes to reaction rates and determine the likelihood that a WS type is generated. For the six reactions of the Wsp system this amounts to 3⁶ = 729 combinations. An example of one possible set of mutations, m_i, in Wsp is (1, −1, 0, 0, 0, 0): an increase in r1, a decrease in r2, and no change in r3, r4, r5 or r6 (Figure 3A).
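The combinatorial space described here can be enumerated directly. The sketch below lists every effect vector m_i for the six Wsp reactions and checks the count; the reaction labels follow Figure 3A, while the helper name is our own.

```python
from itertools import product
from math import comb

REACTIONS = ["r1", "r2", "r3", "r4", "r5", "r6"]  # Wsp reactions (Figure 3A)
EFFECTS = (1, -1, 0)  # enabling, disabling, no effect

combinations = list(product(EFFECTS, repeat=len(REACTIONS)))
print(len(combinations))  # 3**6 = 729, including the all-zero "no change" set


def describe(m):
    """Translate an effect vector m_i into words."""
    words = {1: "increase", -1: "decrease", 0: "no change"}
    return ", ".join(f"{r}: {words[e]}" for r, e in zip(REACTIONS, m))


print(describe((1, -1, 0, 0, 0, 0)))

# Number of combinations altering exactly k reactions: C(6, k) * 2**k.
for k in range(len(REACTIONS) + 1):
    n_k = sum(1 for m in combinations if sum(e != 0 for e in m) == k)
    assert n_k == comb(len(REACTIONS), k) * 2 ** k
    print(k, n_k)
```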
Predicting the pathways that evolution follows and genetic targets
To determine whether mutations producing WS occur more often in Wsp than in Aws or Mws, we adopt a Bayesian approach in which the probability that a particular pathway is used is decomposed into two terms: the probability that a particular set of mutations m_i occurs in Wsp (or Aws, or Mws), written P(m_i ∈ Wsp), and the probability that those mutations give rise to a wrinkly spreader, written P(WS | m_i ∈ Wsp) (or Aws, or Mws).
To estimate P(m_i ∈ Wsp) we assume fixed probabilities of enabling and disabling mutations and compute their product across reactions. Thus, the probability of m_i = (1, −1, 0, 0, 0, 0) is p_e p_d (1 − p_e − p_d)^4, where p_e is the probability of a mutation with an enabling effect and p_d is the probability of a mutation with a disabling effect. To accommodate the possibility of localised mutational bias, p_e and p_d can be adjusted for the affected reactions. The second term, P(WS | m_i ∈ Wsp), requires knowing both how gene products interact and how these interactions result in a phenotype. This term is estimated from the pathway dynamics represented in Figure 3 and Figure 3 – figure supplement 1 by repeated sampling from the space of all possible reaction rates, initial concentrations and magnitudes of effect (see Materials and Methods).
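The two-term decomposition can be sketched as follows, assuming that the probability of WS arising via a pathway is the sum over all m_i of P(m_i ∈ pathway) × P(WS | m_i ∈ pathway). The first term uses the product rule given above. For the second term we substitute a hypothetical two-reaction toy pathway (a negative regulator binding a DGC, and DGC catalysis) for the Figure 3 equations, and estimate P(WS | m_i) by sampling baseline rates and effect sizes from assumed log-uniform ranges. "WS" is arbitrarily defined as a more than two-fold rise in steady-state c-di-GMP; all names, ranges and thresholds are illustrative.

```python
import math
import random
from itertools import product

EFFECTS = (1, -1, 0)  # enabling, disabling, no effect


def prob_mutation_set(m, p_e, p_d):
    """First term, P(m_i in pathway): product of per-reaction probabilities."""
    factor = {1: p_e, -1: p_d, 0: 1.0 - p_e - p_d}
    return math.prod(factor[e] for e in m)


def toy_is_ws(m, rng, ws_fold=2.0):
    """One Monte Carlo draw for a hypothetical two-reaction pathway.

    m[0] acts on binding of a negative regulator to the DGC, m[1] on DGC
    catalysis. Steady-state c-di-GMP is k_cat / (1 + k_bind) (degradation
    rate fixed at 1); a draw counts as WS if m raises it more than ws_fold-fold.
    """
    k_bind = 10 ** rng.uniform(0, 2)   # sampled baseline binding strength
    k_cat = 10 ** rng.uniform(-1, 1)   # sampled baseline catalytic rate
    fold = 10 ** rng.uniform(0, 2)     # sampled magnitude of mutational effect
    c_wild_type = k_cat / (1 + k_bind)
    c_mutant = (k_cat * fold ** m[1]) / (1 + k_bind * fold ** m[0])
    return c_mutant > ws_fold * c_wild_type


def prob_ws_given_m(m, n_samples=2000, seed=1):
    """Second term, P(WS | m_i in pathway), estimated by repeated sampling."""
    rng = random.Random(seed)
    return sum(toy_is_ws(m, rng) for _ in range(n_samples)) / n_samples


def prob_ws_via_pathway(p_e, p_d, n_reactions=2):
    """Sum over all effect vectors of P(m_i) * P(WS | m_i)."""
    return sum(prob_mutation_set(m, p_e, p_d) * prob_ws_given_m(m)
               for m in product(EFFECTS, repeat=n_reactions))


p_e, p_d = 1e-3, 1e-2  # hypothetical; disabling changes assumed more common
print(prob_mutation_set((1, -1, 0, 0, 0, 0), p_e, p_d))  # p_e * p_d * (1 - p_e - p_d)**4
for m in product(EFFECTS, repeat=2):
    print(m, prob_ws_given_m(m))
print(f"P(WS via toy pathway) = {prob_ws_via_pathway(p_e, p_d):.2e}")
```

In the toy, only a disabling change in regulator binding or an enabling change in catalysis can yield WS, so P(WS | m_i) is zero for the opposite changes, and the pathway-level probability is dominated by the more common disabling route.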
The results of simulations are shown in Figure 4. Figure 4A shows that the Wsp pathway is predicted to be the target of mutation 1.3 to 2.1 times more often than the Aws pathway, while Figure 4B shows that the Mws pathway is predicted to be the target of mutation only 0.7 to 1.0 times as often as the Aws pathway. While these results agree with the experimental data showing Mws to be the least likely pathway to be followed, the predictions are at odds with the mutation rate data showing WS types to be approximately twice as likely to arise from mutation in Aws as in Wsp. The causes of this discrepancy are described in the following section.
In addition to predicting the preferred mutational pathways to WS, the Bayesian approach also predicts the genes likely to be affected by mutation (Figure 4 – figure supplements 1 and 2). Predictions as to specific genetic targets come from appraisal of the relative importance of each reaction in generating a wrinkly spreader. While it is recognised that a majority of WS mutations arise from mutations in negative regulators of DGC activity, such as WspF and AwsX (McDonald, et al. 2009; Lind, et al. 2015), further predictions are possible based on the impact of alterations in gene function on reaction rates. For example, in the Wsp pathway (Figure 4 – figure supplement 1), two reactions (r2 and r6) affect WspF function: r2 describes the rate of removal of methyl groups from the signalling complex and r6 the rate at which WspF is activated by transfer of active phosphoryl groups from the WspE kinase. Because loss-of-function (disabling) mutations are much more common than gain-of-function (enabling) mutations, both WspF and WspE are likely targets. The null model predicts that, in the region of parameter space in which Wsp is more likely than Aws, WS is generated through alteration of reaction r2 in 45% of cases (Figure 4 – figure supplement 1). The same is true for reaction r6. Thus the presence of a negative regulator is predicted to extend the mutational target size well beyond the gene itself. This is also true for Aws, where r3 is the main contributor to the WS phenotype when disabling change is more common than enabling change. Here mutations are predicted not only in the negative regulator AwsX, but also in the interacting region of the DGC AwsR (Figure 4 – figure supplement 2).
Loss-of-function mutations in negative regulators and their interacting partners are not the only predicted targets. For Wsp, reactions r1, r3, r4 and r5 are altered approximately 5% of the time in the parameter region where disabling mutations are more common than enabling mutations, but contribute more when the rate of enabling mutations is increased (Figure 4 – figure supplement 1). Based on the model, enabling mutations are likely to be found in WspC (increasing r1), WspABD (increasing r3), WspABD/WspE (increasing r4) and WspR (increasing r5) (Figure 3A). For Aws, enabling mutations are predicted to increase reaction r1 through constitutive activation of AwsO, r2 through increased binding of AwsO and AwsX, and r4 through increased formation of the active AwsR dimer (Figure 3B, Figure 4 – figure supplement 2).
In summary, high rates of WS mutations are predicted for wspF, wspE, wspA, awsX and awsR, with lower rates for wspC, wspR and awsO. Several of these predictions accord with previous experimental observations; notable, however, are predictions that evolution might also target wspA and wspR, two genes that have not previously been identified as mutational causes of WS types (McDonald, et al. 2009).
Analysis of mutants reveals sources of mutational bias
There are several reasons why predictions from the models might be out of kilter with experimental data on mutation rates. We first looked at the distribution of WS-generating mutations among the 109 mutants collected during the fluctuation assays. Of the 109 mutants, 105 harboured a mutation in wsp (46 mutants), aws (41 mutants) or mws (18 mutants) (Figure 5A, Figure 5 – source data 1). The remaining four had mutations in previously described rare pathways (PFLU0085, PFLU0183), again confirming that these non-focal pathways produce just a fraction of the total set of WS mutants (Lind, et al. 2015).
The distribution of mutations for each of the three pathways is indicative of bias. As shown in Figure 5B, almost 29% of all WS-causing mutations (adjusted for differences in mutation rates between the three pathways) were due to an identical 33 base pair in-frame deletion in awsX (Δt229-g261, ΔY77-Q87), while a further 13% were due to an identical point mutation (79 a→c, T27P) in awsR. At least 41 different mutations in Aws can lead to WS: if mutation rates were equal across these sites, the probability of observing 20 identical mutations would be extremely small. In fact, 10 million random samplings from the observed distribution of mutations failed to recover this bias. While the Wsp pathway also contains sites that were mutated more than once (six positions were mutated twice, one three times and one five times), sources of mutational bias in Wsp were less evident than in Aws (Figure 5B).
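Two pieces of arithmetic behind this paragraph can be sketched. First, the adjusted percentages are reproduced if each pathway's mutant spectrum is weighted by that pathway's share of the summed mutation rate (Figure 2), assuming the awsX deletion was seen 20 times and T27P 9 times among the 41 Aws mutants. Second, a uniform null model (every one of 41 Aws sites equally mutable) shows how improbable 20 identical hits would be. The weighting scheme and the null model are our assumptions for illustration; the simulation is not the resampling procedure reported above.

```python
import math
import random

# Mutation rates to WS (units of 1e-9) and numbers of sequenced mutants, from the text.
rates = {"Wsp": 3.7, "Aws": 6.5, "Mws": 0.74}
mutants = {"Wsp": 46, "Aws": 41, "Mws": 18}


def adjusted_fraction(pathway, count):
    """Fraction of all WS mutations attributed to one mutation when each
    pathway's spectrum is weighted by its share of the summed rate (assumed scheme)."""
    share = rates[pathway] / sum(rates.values())
    return share * count / mutants[pathway]


print(f"awsX 33 bp deletion: {adjusted_fraction('Aws', 20):.1%}")
print(f"awsR T27P: {adjusted_fraction('Aws', 9):.1%}")


def tail_probability(n_mutants, n_sites, k):
    """P(one given site receives >= k of n_mutants hits) under uniform mutability."""
    p = 1.0 / n_sites
    return sum(math.comb(n_mutants, j) * p**j * (1 - p) ** (n_mutants - j)
               for j in range(k, n_mutants + 1))


# Union bound over all 41 sites on seeing >= 20 identical Aws mutations.
bound = 41 * tail_probability(41, 41, 20)
print(f"P(>= 20 identical mutations | uniform rates) <= {bound:.1e}")


def max_hits_simulated(n_mutants, n_sites, n_reps, seed=0):
    """Largest single-site count seen across n_reps uniform simulations."""
    rng = random.Random(seed)
    largest = 0
    for _ in range(n_reps):
        hits = [0] * n_sites
        for _ in range(n_mutants):
            hits[rng.randrange(n_sites)] += 1
        largest = max(largest, max(hits))
    return largest


print(max_hits_simulated(41, 41, n_reps=10_000))
```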
The mathematical models presented above assumed no mutational bias, which accounts for the lack of fit between the mutation rate data and the model predictions. Nonetheless, such knowledge is readily incorporated by changing specific reaction rates within the models. For example, the mutational hotspot in awsX affects reactions r2 and r3 in the Aws differential equation system (Figure 3B, Figure 3 – figure supplement 2). A five-fold increase in the probability of enabling/disabling change in these reactions leads to the prediction that the Aws pathway is more likely to generate WS types than Wsp for most probability values (Figure 4D). The only area of parameter space in which evolution is more likely to utilise the Wsp pathway is for rare mutations that have a high probability of enabling change (≫ 10⁻²). One interesting consequence is that incorporating the hotspot changes the region of parameter space over which evolution of WS via mutations in the Wsp pathway is more likely than via the Aws pathway. In Figure 4A, evolution is most likely to proceed via the Wsp pathway when the probability of disabling change is greater than the probability of enabling change. In contrast, when the likelihood of producing WS types is affected by the mutational hotspot in awsX, evolution will proceed via Wsp only when the probability of enabling change is greater than the probability of disabling change (Figure 4D and Figure 4 – figure supplement 2).
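Incorporating a hotspot amounts to making p_e and p_d reaction-specific. The hypothetical sketch below assumes four Aws reactions (r1 to r4, those named in the text; the full model in Figure 3B may differ) and baseline probabilities chosen for illustration, then raises both probabilities five-fold for r2 and r3. It compares the probability of a single disabling change in r3 with and without the bias.

```python
import math
from itertools import product


def prob_mutation_set(m, p_e, p_d):
    """P(m_i) with reaction-specific enabling (p_e) and disabling (p_d) probabilities."""
    terms = []
    for effect, pe, pd in zip(m, p_e, p_d):
        terms.append({1: pe, -1: pd, 0: 1.0 - pe - pd}[effect])
    return math.prod(terms)


N_REACTIONS = 4                 # hypothetical: r1..r4 of the Aws model
BASE_PE, BASE_PD = 1e-3, 1e-2   # hypothetical baseline probabilities
HOTSPOT = {1, 2}                # zero-based indices of r2 and r3
BIAS = 5.0                      # five-fold increase at the hotspot

uniform_pe = [BASE_PE] * N_REACTIONS
uniform_pd = [BASE_PD] * N_REACTIONS
biased_pe = [BASE_PE * (BIAS if i in HOTSPOT else 1.0) for i in range(N_REACTIONS)]
biased_pd = [BASE_PD * (BIAS if i in HOTSPOT else 1.0) for i in range(N_REACTIONS)]

single_r3_loss = (0, 0, -1, 0)  # disabling change in r3 only
ratio = (prob_mutation_set(single_r3_loss, biased_pe, biased_pd)
         / prob_mutation_set(single_r3_loss, uniform_pe, uniform_pd))
print(f"bias raises P(disabling r3 only) {ratio:.2f}-fold")

# The biased probabilities still form a distribution over all 3**4 effect vectors.
total = sum(prob_mutation_set(m, biased_pe, biased_pd)
            for m in product((1, -1, 0), repeat=N_REACTIONS))
print(f"sum over all m_i = {total:.6f}")
```

The increase is slightly below five-fold because the no-change probability of r2 also falls when its mutation probabilities are raised.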
Analysis of mutants reveals mutational targets and effects
Wsp pathway
Mutations were identified in five genes of the seven-gene pathway, all of which were predicted by the null model (Figure 4 – figure supplement 1). The most commonly mutated gene was wspA (PFLU1219), with ten of 15 mutations (Figure 5) being amino acid substitutions (six unique) clustered in the region 352-420 at the stalk of the signalling domain. This region has been implicated in trimer-of-dimer formation for the WspA homologue in Pseudomonas aeruginosa (O’Connor, et al. 2012), which is critical for self-assembly and localization of Wsp clusters in the membrane. It is possible that these mutations stabilize trimer-of-dimer formation, change the subcellular location of the Wsp complex, or affect interaction with WspD (putative interface 383-420 in WspA) (Griswold, et al. 2002) and thus affect relay of the signal to WspE. We interpreted these effects as enabling mutations increasing r3 in Figure 3A. The four additional mutations were in-frame deletions in a separate region of the transducer domain (ΔT293-E299, ΔA281-A308). Although these mutations could also affect trimer-of-dimer formation, there are predicted methylation sites in the region (Rice and Dahlquist 1991) that regulate the activity of the protein via the methyltransferase WspC and the methylesterase WspF. Given that disabling mutations are more common than enabling mutations, it is likely that these mutations decrease r2 in Figure 3A by disrupting the interaction with WspF. We also identified a single mutation that fused the open reading frame of WspC, the methyltransferase that positively regulates WspA activity, to WspD, resulting in a chimeric protein (Figure 5, Figure 5 – source data). This mutation is likely to be a rare enabling mutation that increases the activity of WspC (increasing r1 in Figure 3A) by physically tethering it to the WspABD complex, thus allowing it to counteract the negative regulator WspF more effectively. Alternatively, the tethering may physically block the interaction with WspF (decrease of r2 in Figure 3A).
The second most commonly mutated gene in the wsp operon was wspE (PFLU1223) (Figure 5). Four amino acids were repeatedly mutated in the response regulator domain of WspE, and all cluster closely in a structural homology model made with Phyre2 (Kelley, et al. 2015). All mutated residues surround the phosphorylated active-site residue D682, and it is likely that they disrupt feedback regulation by decreasing phosphorylation of the negative regulator WspF (decreasing r6) rather than by increasing activation of WspR (r5 in Figure 3A).
Twelve mutations were detected in wspF (PFLU1224). These are distributed throughout the gene and include amino acid substitutions, in-frame deletions, a frame-shift and a stop codon (Figure 5). The pattern of mutations is consistent with both the role of WspF as a negative regulator of WspA activity and the well-characterised effect of loss-of-function mutations in this gene (Bantinaki, et al. 2007; McDonald, et al. 2009). The mutations are interpreted as decreasing r2 in Figure 3A. Five mutations were found in WspR (PFLU1225), the DGC output response regulator that produces c-di-GMP and activates cellulose production (Figure 5). All mutations were located in the linker region between the response regulator and DGC domains. Mutations in this region are known to generate constitutively active wspR alleles by relieving the requirement for phosphorylation (Goymer, et al. 2006). They may additionally affect subcellular clustering of WspR (Huangyutitham, et al. 2013) or shift the equilibrium from the dimeric form of WspR, which has low basal activity, towards an activated tetrameric form (De, et al. 2009). In our model these mutations increase reaction r5.
Aws pathway
Mutations were identified in all three genes of the Aws pathway, all of which were predicted by the null model. Mutations were most commonly found in awsX (25 of 41 mutations; Figure 5). The mutational hotspot described above produced in-frame deletions, likely mediated by 6 bp direct repeats (Figure 5 – source data 1). The deletions are consistent with a loss of function and a decrease in r3 (Figure 3B), while leaving the partially overlapping open reading frame of the downstream gene (awsR) unaffected.
The DGC AwsR was mutated in 14 cases, with an apparent mutational hotspot at T27P (9 mutants) in a predicted transmembrane helix (amino acids 19-41). The remaining mutations were amino acid substitutions in the HAMP linker and in the PAS-like periplasmic domain between the two transmembrane helices. These substitutions are distant from the output DGC domain (Figure 5) and their effects are difficult to interpret, but they could cause changes in dimerization (Malone, et al. 2012) or in the packing of HAMP domains, which could in turn alter transmission of conformational changes from the periplasmic PAS-like domain to the DGC domain, causing constitutive activation (Parkinson 2010). Such effects would increase r4 in Figure 3B. Mutations in the N-terminal part of the protein are easier to interpret based on the existing functional model (Malone, et al. 2012) and most likely disrupt interactions with the periplasmic negative regulator AwsX, resulting in a decrease in r3 in Figure 3B.
Two mutations were found in the outer membrane lipoprotein AwsO, between the signal peptide and the OmpA domain (Figure 5). Both were glutamine-to-proline substitutions (Q34P, Q40P), which, together with a previously reported G42V mutation (McDonald, et al. 2009), suggest that multiple changes in this small region can cause a WS phenotype. This is also supported by data from Pseudomonas aeruginosa, in which mutations at nine different positions in this region lead to a small colony variant phenotype similar to WS (Malone, et al. 2012). A functional model based on YfiBNR in P. aeruginosa (Malone, et al. 2012; Xu, et al. 2016) suggests that AwsO sequesters AwsX at the outer membrane and that mutations in the N-terminal part of the protein lead to constitutive activation and increased binding of AwsX. This would correspond to an increase in r2 in Figure 3B, which would relieve negative regulation of AwsR.
Mws pathway
The MwsR pathway (comprising just a single gene) harboured mutations in both the DGC and PDE domains, although only mutations in the C-terminal PDE domain were predicted (Figure 3C). Eleven of 18 mutations were identical in-frame deletions (ΔR1024-E1026) in the PDE domain, mediated by 8 bp direct repeats (Figure 5, Figure 5 – source data 1). It has been shown previously that deletion of the entire PDE domain generates the WS phenotype (McDonald, et al. 2009), suggesting a negative regulatory role corresponding to a decrease of r2 in the model in Figure 3C. One additional mutation was found in the PDE domain (E1083K), located close to R1024 in a structural homology model made with Phyre2 (Kelley, et al. 2015) but distant from the active-site residues (E1059-L1061). Previously reported mutations (A1018T, ins1089DV) (McDonald, et al. 2009) are also distant from the active site and cluster in the same region in a structural homology model. This suggests that loss of phosphodiesterase activity may not be the mechanism leading to the WS phenotype. This is also supported by the high solvent accessibility of the mutated residues, which indicates that major stability-disrupting mutations are unlikely and that changes in interactions between domains or in dimerization are more probable. Thus, the WS phenotype resulting from a deletion in the PDE domain is likely caused by disruption of domain interactions or dimerization rather than by loss of phosphodiesterase activity.
The remaining mutations within mwsR are amino acid substitutions in the GGDEF domain close to the DGC active site (residues 927-931), with the exception of a duplication of I978-G985. While these mutations might directly increase the catalytic activity of the DGC (increasing r1 in Figure 3C), such enabling mutations are considered to be rare. An alternative hypothesis is that they either interfere with c-di-GMP feedback regulation or produce larger conformational changes that alter inter-domain or inter-dimer interactions, similar to the mutations in the PDE domain. Based on these data we reject the current model of Mws function, which predicted that WS arises through mutations inactivating the PDE domain, thereby decreasing r2 (Figure 3C). We instead suggest that the mutations are likely to disrupt the conformational dynamics between the domains. They could be seen either as activating mutations causing constitutive activation, or as disabling mutations with a much reduced mutational target size, because they must specifically disrupt the interaction surface between the domains. In both cases the previous model led to an overestimation of the rate to WS for the Mws pathway.
Fitness of WS types
We measured the fitness of representative WS types carrying mutations in each of the mutated genes (wspA, wspC/D, wspE, wspF, wspR, awsX, awsR, awsO, mwsR) in 1:1 competitions against a GFP-marked reference WspF ΔT226-G275 deletion mutant (Figure 6). These fitness data should be interpreted with caution because the fitness of WS mutants has been shown to be frequency dependent, and some WS mutants are superior in early-phase attachment rather than in growth at the air-liquid interface (Lind, et al. 2015). Nevertheless, these competition experiments provide an estimate of fitness when several different WS mutants compete at the air-liquid interface, a likely situation given a mutation rate to WS of ∼10^-8 and a final population size of >10^10. The fitness data explain the over- or under-representation of some WS mutants when grown under selection (McDonald, et al. 2009) compared to those uncovered without selection (as reported here).
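As a back-of-envelope check on why several WS types are likely to compete, the expected number of independent WS-generating mutational events in one population can be approximated as the mutation rate to WS times the final population size. This treats the number of cell divisions as roughly equal to the final population size and ignores when during growth each mutation arises, so it is an order-of-magnitude estimate only:

```latex
\mu_{\mathrm{WS}} N \approx 10^{-8} \times 10^{10} = 10^{2}
```

On the order of a hundred independent WS mutants would therefore be expected to arise before selection at the air-liquid interface begins to act.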
The three wspF mutants, the wspC-wspD fusion, and the wspE mutants have similar fitness. In contrast, both wspA mutants are slightly less fit and both wspR mutants are severely impaired (Figure 6). This accords with previous work in which WS-generating mutations obtained under selection were detected in wspF and wspE, but not in wspA or wspR (Goymer, et al. 2006; McDonald, et al. 2009). All awsXRO mutants have similarly low fitness compared to the wspF reference strain (Figure 6), which explains why, under selection, these are found at lower frequencies than mutations in the wsp pathway (McDonald, et al. 2009), despite a roughly two-fold higher mutation rate to WS.
Differences of mutational spectra with and without selection
A final question concerns the outcome of the original experimental evolution under selection (McDonald, et al. 2009) and whether it can be explained by our detailed measurements of mutation rates, mutational targets and fitness. As indicated above, there are major differences in the spectrum of mutations isolated with and without selection (Figure 7). The most obvious difference is in the use of the Wsp pathway, which is the most commonly used pathway under selection (15/24) and yet produces WS types at a lower rate than the Aws pathway. The explanation lies in the lower fitness of Aws mutants (Figure 6). Similarly, fitness effects also explain differences in the spectrum of wsp mutations: no wspA mutations were found under selection, despite wspA being the most commonly mutated gene without selection (15/46). The previous failure to detect wspR mutants in a screen of 53 WS mutants (Goymer, et al. 2006) is similarly explained by the low fitness of WS types arising via mutations in this gene.
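The interplay between mutation rate and fitness described above can be illustrated with a deliberately simplified sketch. It assumes that every WS class arises at the same time, in proportion to its mutation rate, and then grows exponentially with a constant selection coefficient relative to a common reference; frequency dependence, clonal interference and the timing of mutation are all ignored. The function name and the numbers in the usage example are hypothetical and are not fitted to the data in Figures 6 and 7; only the two-fold rate difference between Aws and Wsp comes from the text.

```python
import math
from typing import Dict


def spectrum_after_selection(rates: Dict[str, float],
                             s: Dict[str, float],
                             generations: float) -> Dict[str, float]:
    """Hypothetical toy model: relative frequency of each WS class after selection.

    rates: relative mutation rate to WS for each class (arbitrary units)
    s: selection coefficient per generation relative to a common reference
    generations: number of generations of selection
    """
    # Each class starts in proportion to its mutation rate and grows as exp(s * t).
    weights = {k: rates[k] * math.exp(s[k] * generations) for k in rates}
    total = sum(weights.values())
    return {k: w / total for k, w in weights.items()}


# Hypothetical example: Aws arises twice as often as Wsp but is less fit.
print(spectrum_after_selection({"Wsp": 1.0, "Aws": 2.0},
                               {"Wsp": 0.0, "Aws": -0.1},
                               generations=20))
```

Even a modest fitness deficit, compounded over tens of generations, is enough in this toy model to reverse the ranking that mutation rates alone would predict.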
Discussion
The issue of evolutionary predictability and the relative importance of stochastic events compared to deterministic processes have a long history in evolutionary biology (Darwin 1872; Simpson 1949; Jacob 1977; Gould 1989; Conway Morris 2003; Orgogozo 2015). Recent interest has been sparked by an increasing number of observations that evolution, under certain circumstances, can be remarkably repeatable (Colosimo, et al. 2005; Shindo, et al. 2005; Jost, et al. 2008; Barrick, et al. 2009; Lee and Marx 2012; Meyer, et al. 2012; Zhen, et al. 2012; Herron and Doebeli 2013), but whether these cases are representative of evolutionary processes in general remains to be determined. A related question, with greater potential for practical applications, is whether it is possible to forecast short-term evolutionary events and, if so, which data are necessary to make successful predictions.
Our uniquely detailed knowledge of the WS experimental evolution system has provided a rare opportunity to disentangle the contributions of selection, mutational biases and genetic architecture to evolutionary outcomes, and consequently to explore the limits of evolutionary forecasting. A thorough understanding of the function of the molecular species and their interactions allowed development of a null model by defining the genotype-to-phenotype map, which successfully predicted mutational targets and the relative likelihood that evolution followed each of the three principal pathways. Importantly, genetic architecture is likely to be transferable between different species, which stands to allow the formulation of general principles and evolutionary rules (Lind, et al. 2015). Despite the simplicity of the mathematical null models, which contain only general information about functional interactions, we successfully predicted mutational targets, including previously unknown mutations in wspA. Without information about fitness and mutational biases, however, only order-of-magnitude predictions of mutant frequencies can be made. Thus, it is possible to predict that Wsp, which is subject to negative regulation, will be more common than a DGC that requires enabling mutations (Lind, et al. 2015), but not which of two pathways with differently wired negative regulation (Wsp and Aws) is likely to be dominant after selection. Insights from the null model combined with data on mutational targets also allowed us to reject our functional model of Mws.
Direct measurement of the fitness effects of a large number of mutations is difficult and time-consuming, and is typically only possible for microbial species. Therefore, future success in predicting the fitness effects of mutations rests on the ability to infer them from other parameters, such as estimated effects of mutations on thermodynamic stability or molecular networks, or from information concerning the evolutionary conservation of amino acid residues. Alternatively, it might be possible to extrapolate from a small number of mutations, either constructed and assayed directly in the laboratory or with fitness estimated from polymorphism data in natural populations. Recent work on predicting the fitness effects of random mutations in several genes suggests that, in many cases, large-effect mutations can be predicted using methods based on evolutionary conservation (Lind, et al. 2017a).
Interestingly, WS mutations in the same gene often have similar fitness effects (Figure 6). No general conclusions can be drawn from these few cases, but if mutations with similar functional effects (for example, disruption of a particular interaction) can be assumed to be equally fit, this would greatly reduce the number of specific mutants that need to be experimentally assayed for each gene. Several studies suggest that fitness distributions are often bimodal, with a significant proportion of complete loss-of-function mutations, which could explain the similar fitness effects of mutations in the same gene if they inactivate a particular biochemical reaction or interaction (Sanjuan 2010; Jacquier, et al. 2013; Sarkisyan, et al. 2016; Lind, et al. 2017a). The extent to which fitness effects are transferable between strains with different genetic backgrounds or between closely related species remains to be more fully investigated (Ungerer, et al. 2003; Pearson, et al. 2012; Wang, et al. 2014).
Estimates of genomic mutation rates are remarkably consistent across species (Drake 1991), and mutational biases, as evident in the types of base substitutions, are well characterized for a large number of bacterial species (Sung, et al. 2012; Wei, et al. 2014; Farlow, et al. 2015; Foster, et al. 2015; Long, et al. 2015). It is also known that molecular processes such as transcription and replication can introduce mutational biases (Beletskii and Bhagwat 1996; Hudson, et al. 2003; Lind and Andersson 2008; Reijns, et al. 2015; Zhao, et al. 2015), and that mutational hotspots caused by homopolymeric tracts and direct repeats can greatly increase local mutation rates (Streisinger and Owen 1985; Seier, et al. 2011). However, the distribution of mutation rates across a gene or operon remains largely unknown, and this gap currently hinders efforts to forecast adaptive evolution.
There are several cases of probable mutational hotspots in the spectrum of WS mutants found in this study before the influence of selection. One specific deletion (ΔY77-Q87) in awsX accounts for nearly half (20/41) of the mutations in the Aws pathway. Thus, despite the existence of hundreds of possible mutations leading to WS (this work; McDonald, et al. 2009; McDonald, et al. 2011; Lind, et al. 2015), a single mutation accounts for more than one quarter of all WS mutations. While the six base pair direct repeat flanking the deletion provides a convincing explanation for its increased rate, it is not clear why this deletion is ten times more common than the ΔP34-A46 deletion in the same gene, which is flanked by ten base pair repeats and contains five base pairs identical to those of the ΔY77-Q87 deletion (Figure 5 – source data 1). There are also instances where single base pair substitutions are overrepresented: the AwsR T27P mutation was found in nine cases, while eight other single base pair substitutions in Aws were found only once. Consider further that wspE (a gene of ∼2.3 kb), in which changes to only four specific amino acids repeatedly cause WS, and wspF (a gene of ∼1 kb), in which any mutation that disrupts function results in WS (Figure 5), contribute equally to the evolution of WS types. Together, these findings draw attention to the limited value of including mutational target size alone as a parameter in predictive models.
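Because repeat-mediated deletions such as ΔY77-Q87 recur at elevated rates, one simple way to flag candidate hotspots is to scan a sequence for pairs of identical k-mers (direct repeats) lying close together; slippage between the two copies would delete one copy plus the intervening sequence. The sketch below is a hypothetical helper, not a tool used in this study: the window size and the toy sequence are illustrative assumptions, k = 6 simply mirrors the six base pair repeat mentioned above, and it says nothing about why one repeat pair is used more often than another.

```python
from typing import List, Tuple


def direct_repeat_deletions(seq: str, k: int = 6,
                            max_gap: int = 60) -> List[Tuple[int, int, str, int]]:
    """Hypothetical scan for candidate deletions mediated by direct repeats.

    Returns tuples (first_copy_start, second_copy_start, repeat, deletion_length),
    where deletion_length = second - first is the number of bases removed if
    replication slippage joins the two copies (0-based coordinates).
    """
    seq = seq.upper()
    hits = []
    for i in range(len(seq) - k + 1):
        kmer = seq[i:i + k]
        # Look only for non-overlapping second copies within max_gap bases.
        for j in range(i + k, min(len(seq) - k, i + max_gap) + 1):
            if seq[j:j + k] == kmer:
                hits.append((i, j, kmer, j - i))
    return hits


# Toy sequence (not awsX) with one 6 bp direct repeat, GCTGAC.
for first, second, repeat, length in direct_repeat_deletions("TTGCTGACACGTTAGCAGCTGACTT"):
    print(first, second, repeat, length, "in-frame" if length % 3 == 0 else "frameshift")
```

Checking whether the deletion length is a multiple of three separates in-frame deletions, which remove amino acids without shifting the reading frame, from frameshifts.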
It is evident from these findings and from related studies (Pollock and Larkin 2004) that there is a need for detailed experimental measurement of local mutation rates in specific systems. Such investigations stand to contribute to understanding of the causes of mutational bias and of the extent to which biases might be conserved among related or even unrelated organisms. If local nucleotide sequence is the major determinant, an estimate of mutation rate will apply only to very closely related species, but if the dynamics of molecular processes such as transcription and replication (Sankar, et al. 2016) are major influences, then estimates might be applicable to a wider range of species.
Evolutionary forecasting is likely to be most successful for biological systems where there are experimental data on a large number of independent evolutionary events, such as influenza, HIV and cancer (Kouyos, et al. 2012; Fraser, et al. 2014; Lawrence, et al. 2014; Luksza and Lassig 2014; Neher, et al. 2014; Eirew, et al. 2015). Evolution might appear idiosyncratic, suggesting that every specific system requires detailed investigation, but our hope is that deeper knowledge of the distribution of fitness effects and of mutational biases will allow short-term forecasts to be produced by modelling, without the need for large-scale experimental studies. A major boost to further refinement of evolutionary forecasting is likely to come from combining coarse- and fine-grained approaches. Our demonstration that simple null models of functional networks can produce highly relevant quantitative predictions is an important step forward, allowing predictions to be tested directly in other experimental systems.
Materials and methods
Strains and media
The strains used in the study are all Pseudomonas fluorescens SBW25 (Silby, et al. 2009) or derivatives thereof. The reporter construct (pMSC), used for isolation of WS mutants before selection, fused the Pwss promoter to a kanamycin resistance marker (nptII) (Fukami, et al. 2007; McDonald, et al. 2011). P. fluorescens strains with deletions of the wsp (PFLU1219-1225), aws (PFLU5209-5211) and mws (PFLU5329) operons were constructed previously (McDonald, et al. 2011). All experiments used King's medium B (KB) (King, et al. 1954), solidified with 1.5% agar where required, and incubation was at 28°C. All strains were stored in glycerol saline at −80°C.
Fluctuation tests and isolation of WS mutants before selection
Strains with the pMSC reporter construct and either a wild-type genetic background or double deletions of aws/mws, wsp/mws or wsp/aws were used to estimate mutation rates to WS before selection. Overnight cultures were diluted to approximately 10^3 cfu/ml, and 60 independent 110 µl cultures were grown for 16-19 h (to an OD600 of 0.9-1.0) with shaking (200 rpm) in 96-well plates before plating on KB plates with 30 mg/l kanamycin. Viable counts were estimated by serial dilution and plating on KB agar. From each independent culture, one randomly chosen colony with WS colony morphology was restreaked once on KB agar. The assay was repeated at least four times for the double deletion mutants and twice for the wild-type strain. Mutation rates were estimated using the Ma-Sandri-Sarkar Maximum Likelihood Estimator (Hall, et al. 2009) available at www.keshavsingh.org/protocols/FALCOR.html.
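For readers who want to see what the estimator computes, the following is a minimal sketch of the Ma-Sandri-Sarkar approach, not the FALCOR implementation itself. It uses the MSS recursion for the Luria-Delbrück probabilities of observing r mutants per culture, p_0 = e^(−m) and p_r = (m/r) Σ_{i=0}^{r−1} p_i/(r − i + 1), and maximizes the log-likelihood over m. The golden-section search, the search bounds and the example counts are our own assumptions. The mutation rate would then be obtained by dividing m by the number of cells per culture (Nt), ignoring corrections such as plating efficiency.

```python
import math
from typing import List, Sequence


def luria_delbruck_pmf(m: float, max_r: int) -> List[float]:
    """p_0..p_max_r for m expected mutational events (Ma-Sandri-Sarkar recursion)."""
    p = [math.exp(-m)]
    for r in range(1, max_r + 1):
        p.append((m / r) * sum(p[i] / (r - i + 1) for i in range(r)))
    return p


def log_likelihood(m: float, counts: Sequence[int]) -> float:
    """Log-likelihood of the observed mutant counts (one per culture)."""
    p = luria_delbruck_pmf(m, max(counts))
    return sum(math.log(p[c]) for c in counts)


def mss_mle(counts: Sequence[int], lo: float = 1e-3, hi: float = 100.0,
            tol: float = 1e-6) -> float:
    """Maximum-likelihood m via golden-section search on log10(m) (our choice)."""
    a, b = math.log10(lo), math.log10(hi)
    g = (math.sqrt(5) - 1) / 2
    f = lambda x: log_likelihood(10 ** x, counts)
    c, d = b - g * (b - a), a + g * (b - a)
    fc, fd = f(c), f(d)
    while b - a > tol:
        if fc > fd:   # maximum lies in [a, d]
            b, d, fd = d, c, fc
            c = b - g * (b - a)
            fc = f(c)
        else:         # maximum lies in [c, b]
            a, c, fc = c, d, fd
            d = a + g * (b - a)
            fd = f(d)
    return 10 ** ((a + b) / 2)


# Hypothetical counts of WS colonies from 20 parallel cultures.
counts = [0, 0, 1, 0, 2, 0, 0, 5, 0, 1, 0, 0, 0, 3, 0, 0, 12, 0, 1, 0]
m_hat = mss_mle(counts)
n_t = 1e8  # hypothetical number of cells per culture
print(m_hat, m_hat / n_t)
```

Likelihood-based estimation is preferred over simple mutant-count averages because the Luria-Delbrück distribution is dominated by rare "jackpot" cultures, such as the culture with 12 mutants in the example.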
Sequencing
Mutations causing the WS phenotype were identified by Sanger sequencing of candidate genes in the remaining common pathway to WS, for example the wsp operon for the aws/mws deletion strain. In a few cases where no mutations were identified in the previously established WS target genes, we used genome sequencing (Illumina HiSeq, performed by Macrogen Korea).
Fitness assays
Competition assays were performed as previously described (Lind, et al. 2015) by mixing the WS mutant 1:1 with a reference strain labelled with green fluorescent protein (GFP) and measuring the ratio of the two strains before and after static growth for 24 h using flow cytometry (BD FACS Canto). We used a WspF ΔT226-G275 deletion mutant as the reference strain because WspF mutants are the most commonly found WS type when grown under selective conditions (McDonald, et al. 2009), and this in-frame deletion of 50 amino acids most likely represents a complete loss-of-function mutation with minimal polar effects on the downstream wspR. Selection coefficients per generation were calculated as s = ln[R(t)/R(0)]/t, as previously described (Dykhuizen 1990), where R is the ratio of the alternative WS mutant to WspF ΔT226-G275 GFP and t is the number of generations, determined using viable counts. Control competitions between isogenic WspF ΔT226-G275 reference strains with and without GFP were used to correct for the cost of the GFP marker. Control competitions were also used to determine the cost of the double deletions and the reporter construct relative to a wild-type genetic background; for example, an AwsX ΔY77-Q87 mutant in the Δwsp/Δmws background with pMSC was competed with a GFP-labelled AwsX ΔY77-Q87 mutant in the wild-type background. Competitions were performed in independently inoculated quadruplicates for each strain.
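The selection coefficient calculation can be sketched as follows. The formula s = ln[R(t)/R(0)]/t is taken from the text; estimating t as log2 of the fold change in viable counts, and correcting for the GFP marker by subtracting the selection coefficient from the unmarked-versus-marked reference control, are our assumptions about how these steps might be implemented. All function names and numbers are hypothetical.

```python
import math


def generations(n_initial: float, n_final: float) -> float:
    """Number of doublings estimated from viable counts (assumed log2 fold change)."""
    return math.log2(n_final / n_initial)


def selection_coefficient(r0: float, rt: float, t: float) -> float:
    """s = ln(R(t)/R(0)) / t, with R = WS mutant / GFP-labelled reference."""
    return math.log(rt / r0) / t


def marker_corrected(s_raw: float, s_marker_control: float) -> float:
    """Subtract the apparent advantage over the GFP-labelled reference that is
    measured in the unmarked-vs-GFP control competition (assumed correction)."""
    return s_raw - s_marker_control


# Hypothetical competition: ratio halves while the culture grows 64-fold.
t = generations(1e6, 6.4e7)
s = selection_coefficient(1.0, 0.5, t)
print(t, s, marker_corrected(s, 0.02))
```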
Homology models
Homology models of the structure of WspA, WspE, WspR, AwsR, AwsO and MwsR were made using Phyre2 in intensive mode (http://www.sbg.bio.ic.ac.uk/phyre2) (Kelley, et al. 2015).
Probability estimation in the mathematical models
The differential equation models describe the interactions between proteins in each of the three WS pathways. Solving the differential equations requires two pieces of information: i) the initial concentrations of the molecular species and ii) the reaction rates. Because this information is unavailable, a random-sampling approach was used to generate different random sets of initial concentrations and reaction rates. Each random set was used to establish a baseline of potential WS expression, making it possible to evaluate whether a given set of mutations results in a WS type. Effectively, this approach allows sampling of the probability distribution P(WS|m_i ∈ Wsp) used in our Bayesian model.
We randomly sampled 1,000 different sets of reaction rates and initial concentrations from uniform priors: reaction rates were sampled from a uniform distribution in log space (i.e. 10^U[−2,2]), and initial concentrations of reactants were sampled from a uniform distribution U[0,10]. For each set, the appropriate differential equation model was integrated and the steady-state concentration of the compound that corresponds to a wrinkly spreader (RR in Aws, R* in Wsp and D* in Mws) was computed. This served as a baseline for the non-WS phenotype, against which we determined whether combinations of mutations result in increased WS expression. After obtaining the baseline, we implemented particular combinations of enabling/disabling mutations (a given m_i). Ideally, a distribution linking enabling/disabling mutations to a fold change in reaction rates would be used, but this information is unavailable. Instead, the effect sizes for enabling and disabling mutations were sampled from 10^U[0,2] and 10^U[−2,0], respectively, and multiplied by the reaction rates. The differential equations were then solved for the same time that the baseline took to reach steady state. The final concentration of R* (Figure 3A), RR (Figure 3B) or D* (Figure 3C) was then compared to the baseline, and the number of times out of 1,000 that the WS-inducing compound increased served as an estimate of P(WS|m_i ∈ Wsp). The probability distribution stabilized by 500 random samples, and additional sampling did not produce significant changes (data not shown).
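The sampling procedure can be illustrated on a hypothetical one-step toy model, R ⇄ R*, with activation rate r1 and deactivation rate r2 and total R + R* conserved. This toy model is our own stand-in and is not one of the Wsp, Aws or Mws models in Figure 3. Because it is linear, the sketch uses its closed-form solution instead of numerical integration, and it defines "reaching steady state" as coming within 0.1% of the steady-state value. Both of these choices are assumptions. The priors (10^U[−2,2] for rates, U[0,10] for initial concentrations, 10^U[0,2] and 10^U[−2,0] for enabling and disabling effect sizes) follow the text.

```python
import math
import random


def active_at(t: float, r1: float, r2: float, r0: float, rs0: float) -> float:
    """Exact R*(t) for the toy model dR*/dt = r1*R - r2*R*, with R + R* conserved."""
    k = r1 + r2
    steady = (r0 + rs0) * r1 / k
    return steady + (rs0 - steady) * math.exp(-k * t)


def p_ws(mutated_rate: str, kind: str, n_samples: int = 1000, seed: int = 1) -> float:
    """Fraction of random parameter sets in which a mutation raises R* (toy estimate)."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_samples):
        rates = {"r1": 10 ** rng.uniform(-2, 2), "r2": 10 ** rng.uniform(-2, 2)}
        r0, rs0 = rng.uniform(0, 10), rng.uniform(0, 10)
        # Time for the baseline to come within 0.1% of its steady state.
        t_ss = math.log(1000) / (rates["r1"] + rates["r2"])
        baseline = active_at(t_ss, rates["r1"], rates["r2"], r0, rs0)
        # Enabling mutations multiply a rate by 10^U[0,2], disabling by 10^U[-2,0].
        exponent = rng.uniform(0, 2) if kind == "enabling" else rng.uniform(-2, 0)
        mutant = dict(rates)
        mutant[mutated_rate] *= 10 ** exponent
        # Solve the mutant for the same time and compare with the baseline.
        if active_at(t_ss, mutant["r1"], mutant["r2"], r0, rs0) > baseline:
            hits += 1
    return hits / n_samples


for rate in ("r1", "r2"):
    for kind in ("enabling", "disabling"):
        print(rate, kind, p_ws(rate, kind))
```

In this toy network, disabling the negative regulator (r2) always raises R*, whereas the outcome of changing r1 can depend on the sampled parameters because the mutant is compared with the baseline at a fixed time. Estimating such probabilities across random parameter sets is what the sampling approach is for.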
The absence of empirical data on reaction rates, initial concentrations and expected mutational effect sizes meant that the random sampling approach required estimates of parameter ranges. Parameter ranges were chosen to be broad enough to capture differences spanning several orders of magnitude while still allowing numerical solution of the differential equations. To assess the effect of these ranges on the results, the sampling procedure was repeated for Wsp under three different parameter regimes: i) an expanded range for initial concentrations, U[0,50]; ii) an expanded range for reaction rates, 10^U[−3,3]; and iii) a compressed range for mutational effect sizes, 10^U[−1,1]. This analysis shows that the qualitative results are robust to these changes (see Figure S1).
Source code for the mathematical modelling is deposited as supplemental material.
Competing interests
The authors declare no competing interests.
Author contributions
Peter A. Lind Conceptualization, Methodology, Investigation, Data Curation, Writing—original draft, Writing—review and editing, Visualization
Eric Libby Conceptualization, Methodology, Software, Formal analysis, Investigation, Writing—original draft, Writing—review and editing, Visualization
Jenny Herzog Methodology, Investigation, Data Curation
Paul B. Rainey Conceptualization, Methodology, Validation, Writing—original draft, Writing—review and editing, Visualization, Supervision, Project administration, Funding acquisition
Acknowledgements
This work was supported by Marsden Fund Council from New Zealand Government funding, administered by the Royal Society of New Zealand.