Abstract
The activity of a gene may be influenced or modified by other genes in the genome. Here, we show that co-evolution can be used to identify quasi-independent gene groups within larger cellular systems. Using folate metabolism as a case study, we show that co-evolution indicates a sparse architecture of interactions, with three small groups of genes co-evolving in the midst of others that evolve independently. For one such module – dihydrofolate reductase (DHFR) and thymidylate synthase (TYMS) – we use epistasis measurements and forward evolution to demonstrate both internal functional coupling and independence from the remainder of the genome. Mechanistically, the coupling is driven by a constraint on their relative activities, which must be balanced to prevent accumulation of a metabolic intermediate. Applying co-evolution analyses genome-wide reveals a number of other gene pairs with statistical signatures similar to DHFR/TYMS, suggesting that small adaptive units are a general feature of cellular systems.
Introduction
The collective action of enzymes in metabolism provides the basic materials for cells to grow and divide. Functional coupling between metabolic enzymes in a pathway can constrain enzyme activity, binding affinity, and expression level (Barenholz et al., 2017; Salvador and Savageau, 2003, 2006). Because the pattern of constraints within and between proteins remains largely unknown, our ability to rationally engineer new systems (Kim and Copley, 2012; Michener et al., 2014a; Michener et al., 2014b), understand how the cell adapts to perturbations (Kim et al., 2010; Long et al., 2018), and quantify the relationship between mutations and disease (Kondrashov et al., 2002; Zuk et al., 2012) is limited. In all of these instances, one needs to be able to answer the following basic question: if the activity or expression level of a particular enzyme is perturbed, which (if any) of the other enzymes in metabolism would require compensatory modifications? More generally, an ability to globally map such coupling between proteins and to identify groups of enzymes within a pathway that adapt and function as a unit would help render cellular systems more tractable and predictable.
To begin to address this problem, we used comparative genomics to study the pattern of selective constraints between enzymes in bacterial folate metabolism. The central premise of this approach is that functional couplings between proteins drive co-evolution of the associated genes, leaving behind a set of detectable statistical signatures in extant genomes. Co-evolution can be manifested in different ways, and here we use two measures: conservation of proximity on the chromosome (synteny; Huynen et al., 2000; Janga et al., 2005; Junier and Rivoire, 2013) and the coordinated loss and gain of genes across species (sometimes called phylogenetic profiling; Kim and Price, 2011; Pellegrini et al., 1999; Rivoire, 2013). Our analyses indicate three small groups of genes within folate metabolism that behave as independent co-evolutionary units: the genes within a group co-evolve with each other, but each group appears to evolve relatively independently from the remainder of folate metabolism (and the genome).
If functional constraints between enzymes indeed underlie co-evolutionary patterns, then co-evolutionary maps may provide an approach to understanding how pathways adapt to perturbation. The prediction is simple: when part of a co-evolutionary unit is perturbed, we expect compensatory mutations within genes in that unit, but not elsewhere in the pathway or genome. To test this hypothesis, we chose the most strongly co-evolving enzyme pair, consisting of dihydrofolate reductase (DHFR) and thymidylate synthase (TYMS). Using quantitative measurements of epistasis and forward evolution, we demonstrate that these two enzymes (1) are epistatically coupled through a shared metabolite and (2) adapt independently from the remainder of the genome. In particular, we observe that mutations in these two enzymes are sufficient for E. coli to acquire resistance to trimethoprim, an inhibitor of DHFR. Extending our statistical analyses of co-evolution genome-wide reveals additional gene pairs that strongly co-evolve with one another yet are mostly independent from the rest of the genome, suggesting that small adaptive units embedded within larger cellular systems may be a general feature of metabolic networks.
Results
An evolution-based map of functional coupling in folate metabolism
The core one-carbon folate metabolic pathway consists of thirteen enzymes that interconvert various folate species and produce methionine, serine, thymidine and purine nucleotides in the process (Fig. 1A and Table S1) (Green and Matthews, 2013). The input of the pathway is 7,8-dihydrofolate (DHF), produced by the bifunctional enzyme dihydrofolate synthase/folylpolyglutamate synthetase (FPGS) through the addition of L-glutamate to dihydropteroate. Once formed, DHF is reduced to 5,6,7,8-tetrahydrofolate (THF) by dihydrofolate reductase (DHFR), using NADPH as a cofactor. THF can then carry a diversity of one-carbon groups at the N-5 and N-10 positions, and subsequently serves as a one-carbon donor in several critical reactions, including the synthesis of serine and purine nucleotides (Fig. 1A, bottom square portion of pathway). The only step that oxidizes THF back to DHF is catalyzed by thymidylate synthase (TYMS), which converts deoxyuridine monophosphate (dUMP) to deoxythymidine monophosphate (dTMP) in the process.
This pathway is well conserved across bacteria, ensuring good statistics for our analysis. Due to the central role of these metabolites in cell growth and division, the function of folate metabolism can be readily assessed through quantitative growth rate measurements (Reynolds et al., 2011). Further, folate metabolism is a target of several well-known antibiotics (trimethoprim and sulfamethoxazole), and chemotherapeutics (methotrexate and 5-fluorouracil) (Ducker and Rabinowitz, 2016; Gangjee et al., 2007). These factors enable experimental strategies to measure gene function, adaptation, and epistatic coupling in vivo. Thus, folate metabolism provides a good model system to examine the use of co-evolution in identifying adaptive units, and to study the functional mechanisms underlying evolutionary coupling in a metabolic pathway.
To study co-evolution in this pathway, we computed two different statistical measures across 1445 bacterial genomes: 1) synteny, the conservation of chromosomal proximity between genes, and 2) co-occurrence, the coordinated loss and gain of genes across species. These measures are relatively straightforward to compute, provide two separate measures of evolutionary coupling, and have previously been established as reliable indicators of protein functional relationships (Huynen et al., 2000; Janga et al., 2005; Junier and Rivoire, 2016; Kim and Price, 2011; Pellegrini et al., 1999; Snel et al., 2002). Following previous work, we used clusters of orthologous groups of proteins (COGs) to define orthologs across species (Galperin et al., 2015). For all COG pairs, we then computed the strength of co-evolution (as a relative entropy) by either synteny or co-occurrence (see Supplemental Experimental Procedures for details).
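As an illustration of how a co-occurrence score of this kind can be computed, the following Python sketch scores the presence/absence profiles of two genes across genomes. It assumes one plausible formulation: the relative entropy between the observed joint presence/absence distribution and the distribution expected if the two genes were gained and lost independently (equivalently, the mutual information of the two profiles), with a hypothetical pseudocount to avoid zero probabilities. The exact statistic used in this study is described in the Supplemental Experimental Procedures and may differ; the function name and pseudocount value are illustrative only.

```python
import math
from collections import Counter


def cooccurrence_relative_entropy(profile_a, profile_b, pseudocount=0.5):
    """Relative entropy (bits) between the joint presence/absence distribution
    of two genes and the product of their marginal distributions.

    profile_a, profile_b: sequences of 0/1 (absent/present), one entry per genome.
    pseudocount: hypothetical regularizer added to each of the four joint states.
    """
    if len(profile_a) != len(profile_b):
        raise ValueError("profiles must cover the same genomes")
    if pseudocount <= 0:
        raise ValueError("pseudocount must be positive to avoid log(0)")
    states = [(0, 0), (0, 1), (1, 0), (1, 1)]
    counts = Counter(zip(profile_a, profile_b))
    total = len(profile_a) + pseudocount * len(states)
    joint = {s: (counts[s] + pseudocount) / total for s in states}
    # Marginal probability that each gene is absent (0) or present (1).
    p_a = {x: joint[(x, 0)] + joint[(x, 1)] for x in (0, 1)}
    p_b = {y: joint[(0, y)] + joint[(1, y)] for y in (0, 1)}
    # D_KL(joint || p_a * p_b), i.e. the mutual information of the two profiles.
    return sum(
        p * math.log2(p / (p_a[x] * p_b[y])) for (x, y), p in joint.items()
    )
```

Genes that are always gained and lost together score close to 1 bit for balanced profiles, whereas profiles that vary independently score near zero.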
These analyses reveal a sparse pattern of evolutionary coupling in which most folate metabolic genes are relatively independent from each other (Fig. 1B-C). As expected, we observe coupling between physically interacting gene products: the glycine cleavage system proteins H, P and T (gcvH, gcvP, and gcvT in E. coli). Together with lipoamide dehydrogenase (lpdA), these proteins form the glycine decarboxylase complex (GDC), a macromolecular complex that reversibly catalyzes either the degradation or the biosynthesis of glycine (Okamura-Ikeda et al., 1993). Notably, in addition to the GDC, lipoamide dehydrogenase is also part of the 2-oxoglutarate dehydrogenase and pyruvate dehydrogenase multienzyme complexes (Carothers et al., 1989). This more general functional role in multiple complexes may explain the weaker co-evolution of lpdA with gcvH, gcvP, and gcvT.
Interestingly, we also see evolutionary coupling between enzyme pairs with no evidence of physical interaction: 1) DHFR and TYMS, and 2) methionine synthase (MS) and methylenetetrahydrofolate reductase (MTHFR). DHFR and TYMS comprise the most strongly coupled gene pair in the folate cycle, and are largely decoupled from the rest of the genome (Fig. 1). Both pairs of enzymes catalyze consecutive reactions in folate metabolism, suggesting that a biochemical mechanism might underlie their functional coupling. Yet most enzymes that catalyze neighboring reactions do not show statistical correlation (Fig. 1). Instead, co-evolution provides a different representation of folate metabolism in which most genes are independent and a few interact to form evolutionarily constrained units. Because the sampling of genomes is limited, some false negatives are expected, that is, gene pairs that are functionally coupled but for which we do not see a strong co-evolutionary signal. Thus, whether evolutionary statistics can predict adaptive units within a given organism must be established by thorough experimental testing.
To this end, we focused on the DHFR and TYMS enzyme pair in E. coli. These proteins are the most strongly co-evolving pair in the pathway and are not known to physically interact, providing an opportunity to better understand how biochemical constraints might drive co-evolution. Although the genes encoding DHFR and TYMS are proximal along the chromosome in many bacterial species, they are approximately 2.9 megabases apart in E. coli. Our experiments therefore also test whether coupling detected by synteny persists when the genes are distant on the chromosome.
Coupling between DHFR and TYMS depends on enzyme activity
Does the co-evolution of DHFR and TYMS correspond to epistatic coupling in the folate metabolic network? To address this, we conducted quantitative measurements of genetic epistasis for a library of ten DHFR mutants in the background of either WT TYMS or TYMS R166Q, a catalytically inactive variant (twenty constructs in total). Here, we quantify epistasis as non-additivity in the mutational effect on fitness for pairs of mutations in DHFR and TYMS (Fig. 2A). The ten DHFR mutants were selected to span a range of catalytic activities; in prior work we measured the catalytic parameters (kcat and Km) of these mutants in vitro (Reynolds et al., 2011). We used a previously validated next-generation sequencing-based assay to measure the relative fitness of all twenty mutant combinations in a single, internally controlled experiment (Reynolds et al., 2011). In this system, DHFR and TYMS are expressed from a single plasmid that contains two DNA barcodes, one associated with DHFR and one with TYMS, that uniquely encode the identity of each mutant (Fig. 2B). The full library of mutants is transformed into the auxotrophic strain E. coli ER2566 ΔfolA ΔthyA and grown as a mixed population in a turbidostat. The turbidostat maintains the cell population at a fixed density in exponential phase for the duration of the experiment, with excellent control over media conditions. We sampled time points over a twelve-hour period and used next-generation sequencing to compute allele frequencies at each time point (Fig. 2C). By fitting a slope to the plot of allele frequencies versus time, we obtain a relative growth rate for each mutant in the population (Fig. 2C and Fig. S1). While many epistasis measurements rely on the extreme perturbation of complete gene knockout (Babu et al., 2011b; Typas et al., 2008), this approach permits a more complete examination of how coupling varies with the magnitude of the functional perturbation.
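The slope-fitting step can be sketched as follows. This minimal Python example assumes that each mutant's read count is compared with that of a reference construct at every time point, and that the relative growth rate is the least-squares slope of the natural log of that ratio against time. The function name and the choice of reference are hypothetical and are not taken from the published pipeline.

```python
import math


def relative_growth_rate(times_h, mutant_counts, reference_counts):
    """Least-squares slope of ln(mutant / reference) versus time.

    Assumes counts come from the same sequencing runs, so the ratio of counts
    equals the ratio of frequencies. Returns a rate in units of 1/hour.
    """
    if not (len(times_h) == len(mutant_counts) == len(reference_counts)):
        raise ValueError("time points and counts must have equal length")
    if len(times_h) < 2:
        raise ValueError("need at least two time points")
    ys = [math.log(m / r) for m, r in zip(mutant_counts, reference_counts)]
    n = len(times_h)
    t_mean = sum(times_h) / n
    y_mean = sum(ys) / n
    # Ordinary least-squares slope: cov(t, y) / var(t).
    sxx = sum((t - t_mean) ** 2 for t in times_h)
    sxy = sum((t - t_mean) * (y - y_mean) for t, y in zip(times_h, ys))
    return sxy / sxx
```

A mutant that grows faster than the reference yields a positive slope, and a slower one a negative slope; in practice one would also inspect the residuals for non-exponential behavior.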
The TYMS R166Q mutant is non-viable in the absence of exogenous thymidine. Epistasis was therefore measured under two supplementation conditions: one in which TYMS R166Q produces a growth defect (5 μg/ml thymidine), and one that fully rescues TYMS activity (50 μg/ml). Analysis of the DHFR mutants in the background of WT TYMS (grey points, Fig. 2D-E) shows that growth rate is relatively insensitive to a ~10-fold decrease in DHFR activity. This tolerance to small reductions in DHFR activity is preserved in the TYMS R166Q background, so epistasis is near zero (the mutational effects are nearly additive). As DHFR activity is diminished further, growth rate becomes a monotonic function of catalytic power: decreases in activity result in slower growth. In addition, a pattern of positive (buffering) epistasis emerges (upper bar plots in Fig. 2D-E): TYMS R166Q has the counter-intuitive consequence of partly, or even fully (at 50 μg/ml thymidine), restoring growth rate for low-activity DHFR mutants. The presence of epistasis, in which loss of function in TYMS buffers decreases in the catalytic activity of DHFR, is consistent with the evolutionary coupling of this enzyme pair.
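The epistasis calculation can be written compactly. The sketch below assumes additivity on the growth-rate scale: the expected growth rate of a DHFR/TYMS double mutant is the wild-type rate plus the two single-mutant effects, and epistasis is the deviation of the observed double-mutant rate from that expectation. The tolerance used to call a pair near-additive is a hypothetical parameter, and the function names are illustrative.

```python
def epistasis(g_wt, g_a, g_b, g_ab):
    """Deviation of the double mutant's growth rate from the additive expectation.

    g_wt: wild-type growth rate; g_a, g_b: single mutants; g_ab: double mutant.
    """
    expected = g_wt + (g_a - g_wt) + (g_b - g_wt)
    return g_ab - expected


def classify_epistasis(eps, tol=0.05):
    """Label an epistasis value; tol is a hypothetical noise threshold
    expressed in the same (relative growth rate) units as eps."""
    if eps > tol:
        return "positive (buffering)"
    if eps < -tol:
        return "negative (aggravating)"
    return "near-additive"
```

With invented numbers, a DHFR hypomorph growing at 40% of wild type and TYMS R166Q at 80% would be expected to grow at about 20% as a double mutant if their effects were additive; observing 90% instead gives an epistasis of about 0.7, i.e. strong buffering.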
Mechanism of DHFR/TYMS coupling
Given that there is no evidence for a physical association between DHFR and TYMS in bacteria, what is the mechanism underlying their coupling? The finding that epistasis between the two genes depends on enzyme activity suggests a simple hypothesis: their coupling arises from the need to balance the concentrations of key metabolites in the folate metabolic pathway. Support for this idea comes from prior work showing that treatment of E. coli with the DHFR inhibitor trimethoprim results in intracellular accumulation of DHF, which inhibits the upstream enzyme folylpoly-γ-glutamate synthetase (FP-γ-GS) (Kwon et al., 2008). FP-γ-GS catalyzes the polyglutamylation of reduced folates, an important modification that increases folate retention in the cell and promotes the use of reduced folates as substrates in a number of downstream reactions (McGuire and Bertino, 1981). Thus, DHF accumulation causes off-target enzyme inhibition and cellular toxicity, a likely explanation for the growth rate defect observed in hypomorphic DHFR alleles (Fig. 2D-E). Because DHF is a product of TYMS, loss-of-function mutations in TYMS could plausibly rescue growth in DHFR hypomorphs by preventing the accumulation of DHF.
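The idea that DHFR and TYMS activities must be balanced can be made concrete with a deliberately simplified toy model, which is our own illustration rather than a model fitted in this study. Assume TYMS produces DHF at a constant rate v_T and DHFR consumes it with Michaelis-Menten kinetics (maximal rate V_D, Michaelis constant K_m). Setting d[DHF]/dt = v_T - V_D[DHF]/(K_m + [DHF]) to zero gives a steady state [DHF]* = K_m·v_T/(V_D - v_T), which exists only when v_T < V_D. All parameter values are hypothetical, and the model ignores DHF inhibition of FP-γ-GS and every other folate reaction.

```python
def steady_state_dhf(v_tyms, vmax_dhfr, km_dhfr):
    """Steady-state [DHF] in a toy model: constant TYMS flux in,
    Michaelis-Menten DHFR flux out (all parameters hypothetical).

    Returns float('inf') when TYMS flux meets or exceeds DHFR capacity,
    i.e. when DHF would accumulate without bound.
    """
    if v_tyms < 0 or vmax_dhfr <= 0 or km_dhfr <= 0:
        raise ValueError("v_tyms must be >= 0; vmax_dhfr and km_dhfr > 0")
    if v_tyms >= vmax_dhfr:
        return float("inf")
    # Solve v_T = V_D * S / (K_m + S) for S.
    return km_dhfr * v_tyms / (vmax_dhfr - v_tyms)
```

Because [DHF]* depends only on the ratio v_T/V_D (scaled by K_m), lowering V_D from 1.0 to 0.6 with v_T = 0.5 and K_m = 1 raises [DHF]* five-fold, while a matching reduction of v_T to 0.3 returns it to its original value. This mirrors, qualitatively, the rescue of DHFR hypomorphs by TYMS R166Q.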
To test this hypothesis, we carried out liquid chromatography-mass spectrometry (LC-MS) profiling of folate pathway metabolites in DHFR/TYMS mutant combinations. Specifically, we selected five DHFR variants that span a range of catalytic activities (WT, G121V, F31Y.L54I, M42F.G121V, and F31Y.G121V), and measured the relative abundance of intracellular folates in the background of either wild-type or R166Q TYMS. The experiment was carried out for log-phase cultures in M9 glucose media supplemented with 0.1% amicase and 50 μg/ml thymidine, conditions in which the selected DHFR mutations display significant growth defects individually, but in which the corresponding DHFR/TYMS double mutants are restored to near wild-type growth. Current mass spectrometry methods can discriminate among the full diversity of folate species, which differ in oxidation, one-carbon modification, and polyglutamylation states, permitting a broad metabolic study of the effects of mutations (Lu et al., 2007).
The data confirm that intracellular DHF concentration increases in DHFR loss-of-function mutants (Fig. 3A, bottom four rows). In addition, we find evidence for a depletion of reduced polyglutamated folates (Glu ≥ 3), while several mono- and di-glutamated THF species accumulate (particularly THF, methylene-THF and 5-methyl-THF). This pattern of changes in the reduced folate pool is consistent with inhibition of FP-γ-GS by accumulated DHF (Fig. 3A, Fig. S2). It is also consistent with the growth rate defects observed in the DHFR loss-of-function mutants (Fig. 3B). What does the metabolite profile look like in the background of the TYMS loss-of-function mutant? As predicted, we find clear evidence that the metabolite profile is corrected in the TYMS R166Q background. Indeed, the concentrations of the reduced polyglutamated folates are restored to near-wild-type levels for most of the DHFR alleles (Fig. 3A, first four rows below WT/R166Q). These data show that coordinated decreases in the activity of DHFR and TYMS maintain balance in key intracellular metabolites, a condition associated with optimal growth. Thus, the coupling of DHFR and TYMS can be explained by a joint constraint on their catalytic activities, a biochemical mechanism that may have driven co-evolution of the DHFR/TYMS gene pair.
Forward evolution reveals adaptive independence of DHFR and TYMS from the rest of the genome
The analysis of co-evolution presented in Fig. 1 goes beyond the prediction of epistatic coupling between DHFR and TYMS. The finding that this enzyme pair co-evolves relatively weakly with the remainder of the pathway suggests that the two enzymes might represent an independent adaptive unit within folate metabolism. To test this, we conducted a global suppressor screen in which we inhibited DHFR with the common antibiotic trimethoprim and then examined the pattern of compensatory mutations by whole-genome sequencing. If DHFR and TYMS indeed represent a quasi-independent adaptive unit, then suppressor mutations should be found within these two genes (E. coli folA and thyA) with minimal contribution from other sites.
We used a morbidostat, a specialized device for continuous culture, to facilitate forward evolution in the presence of trimethoprim (Toprak et al., 2012; Toprak et al., 2013) (Fig. 4A-C). The morbidostat dynamically adjusts the trimethoprim concentration in response to bacterial growth rate and total optical density, thereby providing steady selective pressure as resistance levels increase (see Fig. 4 legend for details). The basic principle is that cells undergo regular dilutions with fresh media until they reach a target optical density (OD = 0.15); once this density is reached, they are diluted with media containing trimethoprim until their growth rate decreases. This approach makes it possible to obtain long trajectories of adaptive mutations in the genome with sustained phenotypic adaptation (Toprak et al., 2012). For example, in a single 13-day experiment, we observed resistance levels in our evolving bacterial populations that approach the solubility limit of trimethoprim in minimal (M9) media.
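The feedback rule described above can be sketched as a single decision per dilution cycle. This Python example is a simplified reading of the description, not the device's control software: the 0.15 OD threshold comes from the text, whereas the growth-rate cutoff, the volumes and the function names are hypothetical. It also shows how each dilution updates the drug concentration in a constant-volume, well-mixed vial.

```python
OD_THRESHOLD = 0.15  # target optical density stated in the text


def choose_medium(od, growth_rate, od_threshold=OD_THRESHOLD, min_growth_rate=0.0):
    """Return 'drug' if the culture is above the OD target and still growing
    faster than min_growth_rate (a hypothetical cutoff, per hour); otherwise
    return 'fresh'."""
    if od > od_threshold and growth_rate > min_growth_rate:
        return "drug"
    return "fresh"


def dilute(vial_conc, medium_conc, vial_ml, added_ml):
    """Drug concentration after adding added_ml of medium at medium_conc and
    removing the excess so the vial returns to vial_ml (well mixed)."""
    return (vial_conc * vial_ml + medium_conc * added_ml) / (vial_ml + added_ml)
```

Because drug is added only while the dense culture keeps growing, the vial concentration ratchets upward as resistance evolves, which is what sustains selective pressure over long trajectories.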
TYMS loss-of-function mutations have been observed in trimethoprim-resistant clinical isolates from multiple bacterial genera (King et al., 1983; Kriegeskorte et al., 2014). To emulate natural environments in which TYMS function is dispensable, we conducted evolution in three media conditions with increasing concentrations of exogenous thymidine (5, 10, and 50 μg/ml), the highest of which fully rescues TYMS activity. All conditions were also supplemented with 0.2% amicase, a source of free amino acids. By alleviating selective pressure on the entire pathway, we aim to expose a larger range of adaptive mutations without biasing the pathway towards a particular result; this is similar to the common practice of conducting second-site suppressor screens for essential genes under relatively permissive conditions (Forsburg, 2001). In this context, an absence of mutations outside the two-gene pair is all the more significant.
Over the 13 days of evolution, we estimated the trimethoprim resistance of each evolving population as the median drug concentration in the culture vial from the first dilution with drug to the end of the day (Fig. 4D and Fig. S3). To identify mutations causally related to trimethoprim resistance, we selected 10 single colonies from the endpoint of each of the three experimental conditions for phenotypic and genotypic characterization (30 strains in total, Fig. 5, Fig. S4 and Table S2). For each strain, we measured the trimethoprim IC50 and the dependence of growth rate on thymidine, and performed whole-genome sequencing. All 30 strains were confirmed to be highly trimethoprim resistant (Fig. 5A, Table S2). We were unable to measure IC50 values for strains 4, 5 and 10 from the 50 μg/ml thymidine condition: these three strains grew very slowly but were completely insensitive to trimethoprim. Further, strains from all three thymidine-supplemented conditions had become dependent on exogenous thymidine for growth, indicating a loss of function in the thyA gene that encodes TYMS (Fig. 5B and Fig. S4). We confirmed that this loss of function is not a simple consequence of neutral genetic variation in the presence of thymidine: cells grown in 50 μg/ml thymidine in the absence of trimethoprim retain TYMS function over similar time scales (Fig. S5A-B).
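The daily resistance estimate can be sketched as follows, assuming a time series of drug concentrations recorded in one vial over a single day and treating the first nonzero reading as the first dilution with drug. The function name and input format are hypothetical.

```python
import statistics


def daily_resistance_estimate(concentrations):
    """Median drug concentration from the first drug-containing reading
    to the end of the day; returns 0.0 if no drug was added that day."""
    first = next((i for i, c in enumerate(concentrations) if c > 0), None)
    if first is None:
        return 0.0
    return statistics.median(concentrations[first:])
```

Using the median rather than the maximum makes the estimate less sensitive to transient spikes in drug concentration within a day.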
Whole-genome sequencing of all 30 strains reveals a striking pattern of mutation (Fig. 5C and Tables S3-S4). Isolates from all three conditions acquire coding-region mutations either in both DHFR and TYMS or in TYMS alone. In particular, strains 4, 5, 7, and 10 in the 50 μg/ml thymidine condition contain mutations in TYMS but not DHFR, showing that one route to resistance is the acquisition of mutations in a gene not directly targeted by the antibiotic. All mutations identified in DHFR reproduce those observed in an earlier morbidostat study of trimethoprim resistance (Toprak et al., 2012). The mutations in TYMS (two insertion sequence elements, a frameshift mutation, a deletion of two codons, and a non-synonymous active-site mutation) are consistent with loss of function. Thus, the mutations in DHFR and TYMS are compatible with the proposed mechanism of coupling: reduced TYMS activity can buffer inhibition of DHFR.
In support of the adaptive independence of the DHFR/TYMS pair, we observe no other mutations in folate metabolism genes (Fig. 5C and Table S4). More generally, few other mutations occur elsewhere in the genome, and the majority of these are not systematically observed across clones. This suggests that they may be spurious variations not associated with the adaptive phenotype. One of the evolved strains contains only mutations in DHFR and TYMS (strain 1 in 50 μg/ml thymidine, Fig. 5C), indicating that variation in the DHFR/TYMS genes is sufficient to produce resistance. To verify this, we introduced several of the observed DHFR and TYMS mutations into a clean wild-type E. coli MG1655 background and measured the IC50. These data show that the DHFR/TYMS mutations are sufficient to reproduce the resistance phenotype measured for the evolved strains (Fig. S5). DHFR and TYMS show a capacity for adaptation through compensatory mutation that is contained within the two-gene unit.
Genome-wide analyses of co-evolution identify additional adaptive pairs
Our data for the DHFR/TYMS pair provide confidence that co-evolution might reasonably detect adaptive units – small groups of genes that collectively evolve with each other but adapt more independently from the remainder of the cell. To examine the prevalence of other pairs like DHFR/TYMS genome wide, we extended our analyses of gene synteny and co-occurrence to include all genes represented in E. coli. To ensure good statistics, we filtered the orthologs analyzed to those that co-occur in a sufficiently large number of genomes (2095 COGs, ~500,000 pairs in total) (see also Supplemental Experimental Procedures). We compared our analysis to: (1) metabolic annotations from KEGG (Kanehisa et al., 2012) and (2) the set of high-confidence binding interactions in E. coli reported by the STRING database (Szklarczyk et al., 2015). Consistent with intuition and prior work, evolutionarily coupled gene pairs show enrichment for physical complexes, enzymes in the same metabolic pathway, and more specifically, enzymes with a shared metabolite (Fig. 6A,B).
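One simple way to quantify enrichment of this kind is a fold enrichment: the fraction of annotated pairs (for example, KEGG shared-pathway or STRING binding pairs) among the most strongly coupled pairs, divided by the fraction among all pairs analyzed. The Python sketch below is illustrative only; the top-N cutoff, the data structures and the function name are assumptions, and the statistic behind Fig. 6A,B may be defined differently.

```python
def fold_enrichment(scores, annotated, top_n):
    """Fold enrichment of annotated pairs among the top_n most coupled pairs.

    scores: dict mapping a gene pair to its coupling (e.g. relative entropy).
    annotated: set of gene pairs with a known relationship.
    """
    if not scores or top_n <= 0:
        raise ValueError("need non-empty scores and a positive top_n")
    ranked = sorted(scores, key=scores.get, reverse=True)[:top_n]
    top_fraction = sum(pair in annotated for pair in ranked) / len(ranked)
    background = sum(pair in annotated for pair in scores) / len(scores)
    if background == 0:
        return float("nan")  # enrichment is undefined without annotated pairs
    return top_fraction / background
```

A value above 1 means that strongly co-evolving pairs are annotated more often than expected from the background rate.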
Overall, these data provide encouraging validation of the ability of co-evolutionary methods to detect interactions. But to investigate the possibility of adaptive units like DHFR/TYMS, we must identify groups of genes that are strongly coupled to each other and comparatively decoupled from the remainder of the genome. Here, we focus on the simplest case of a modular gene pair. Accordingly, in Figure 6C-D, we show scatterplots of all gene pairs, indicating the strength of coupling within each pair (as a relative entropy, along the x-axis) versus the strongest coupling of either gene to any gene outside the pair (along the y-axis). In this representation, gene pairs below the diagonal are more tightly coupled to each other than either gene is to any other gene in the dataset. Thus, we predict that gene pairs in the lower right corner may act as adaptive units. This simple plot is limited to the identification of two-gene adaptive units, and the analysis will need to be extended to identify communities within the complete weighted network of co-evolutionary relationships (Newman, 2010). However, this analysis clearly indicates the presence of other pairs like DHFR and TYMS, and suggests interesting candidates for initial experimental testing.
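The construction of this scatterplot can be sketched as follows. For every gene pair, the sketch takes the pair's own coupling as the internal value and the strongest coupling of either gene to any third gene as the external value, then flags pairs below the diagonal. The input format (a dictionary keyed by unordered gene pairs), the treatment of missing pairs as zero coupling, the function names and the coupling values in the test are assumptions for illustration.

```python
from itertools import combinations


def internal_vs_external(coupling):
    """coupling: dict mapping frozenset({gene_i, gene_j}) -> coupling strength.

    Returns (gene_a, gene_b, internal, external) for every pair of genes,
    treating pairs absent from the dict as having zero coupling.
    """
    genes = sorted({g for pair in coupling for g in pair})
    rows = []
    for a, b in combinations(genes, 2):
        internal = coupling.get(frozenset((a, b)), 0.0)
        # Strongest coupling of either gene to any gene outside the pair.
        external = max(
            (coupling.get(frozenset((g, k)), 0.0)
             for g in (a, b) for k in genes if k not in (a, b)),
            default=0.0,
        )
        rows.append((a, b, internal, external))
    return rows


def candidate_units(rows):
    """Pairs below the diagonal: more coupled to each other than to anything else."""
    return [(a, b) for a, b, internal, external in rows if internal > external]
```

This brute-force version scales cubically with the number of genes; for thousands of COGs, one would instead precompute each gene's strongest partner so that the external value for a pair is a short lookup.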
For illustrative purposes, we consider a simple definition for the prediction of adaptive units based on empirical cutoffs for internal and external coupling (Table S1). In this regime of the synteny analysis, we identify the gene pair accB/accC, which encodes two of the four subunits of acetyl-CoA carboxylase, the first enzymatic step in fatty acid biosynthesis. Overexpression of either accB or accC individually reduces fatty acid biosynthesis, but overexpressing the two genes in stoichiometric amounts rescues this defect (Abdel-Hamid and Cronan, 2007; Janssen and Steinbuchel, 2014). Constraints on relative expression have also been noted for the selA/selB and tatB/tatC gene pairs (Bolhuis et al., 2001; Rengby et al., 2004). The tatB/tatC genes encode components of the TatABCE twin-arginine translocation complex, while selA/selB are not known to bind but are both involved in selenoprotein biosynthesis. Thus, these co-evolving units may generally indicate groups of genes whose activity or expression is constrained relative to each other. This interpretation is consistent with previous work showing that synteny identifies stretches of genes larger than single operons that are functionally co-expressed (Junier and Rivoire, 2016). In this sense, the bifunctional fused form of DHFR/TYMS found in protozoa and plants could be regarded as an extreme case that guarantees stoichiometric expression (Beverley et al., 1986; Lazar et al., 1993). Taken together, these data support the idea that co-evolution analyses can be used to identify small adaptive units within cellular systems more generally.
Discussion and Conclusions
DHFR and TYMS represent an adaptive unit in folate metabolism
Our data indicate that DHFR and TYMS are epistatically coupled to each other through a constraint on relative activity mediated by metabolite concentration. Prior work has shown that inhibition of DHFR with trimethoprim (TMP) leads to rapid accumulation of DHF and inhibition of the upstream enzyme FP-γ-GS (Kwon et al., 2008). Using DHFR mutants of known catalytic activity, we show that reducing TYMS activity alleviates DHF accumulation and restores growth rate. This result is recapitulated in our forward evolution experiments, which show that inhibition of DHFR with TMP promotes TYMS loss-of-function mutations. These results provide a mechanistic explanation for the well-known clinical observation that trimethoprim resistance is often accompanied by loss-of-function mutations in TYMS (King et al., 1983). Further, mutations within the two enzymes are sufficient to confer TMP resistance, without a need for compensatory mutations elsewhere in the pathway or genome. Our comparative genomics analyses show that this functional coupling and relative adaptive independence are reflected in the statistics of both synteny and gene co-occurrence across 1445 bacterial genomes: DHFR and TYMS strongly co-evolve with one another, but less so with the remainder of the genome.
These findings are consistent with several pieces of prior work. The first comes from examining TYMS ortholog variation across species. For instance, the bacterium R. capsulatus lacks both thyA (TYMS) and folA (DHFR) homologs, and instead produces thymidine via thyX, a thymidylate synthase that generates THF (rather than DHF) in the process of thymidine production. When thyX is deleted from R. capsulatus, growth can only be complemented by the addition of both thyA and folA from R. sphaeroides; the thyA gene alone is insufficient. Computational simulation shows that in the absence of a high-activity DHFR (folA), thyA rapidly depletes the reduced (THF) folate pool by converting it to DHF (Leduc et al., 2007). These data thus also suggest a fitness constraint on the relative activities of folA and thyA.
Further, mathematical models of the eukaryotic folate cycle indicate that standard biochemical kinetics might result in modularity of the DHFR/TYMS pair (Leduc et al., 2007; Nijhout et al., 2004). In eukaryotic cells, thymidine synthesis is the rate-limiting step for DNA synthesis, and transcription of the TYMS and DHFR genes is greatly upregulated (via a common transcription factor) at the G1/S cell cycle transition (Bjarnason et al., 2001). In a model based on known biochemical parameters, increasing the activities of DHFR and TYMS 100-fold increases thymidine synthesis but only modestly changes the concentrations of other folates, indicating that biochemical rates and metabolic network structure give rise to modularity in different metabolite pools. This suggests that decoupling of the DHFR/TYMS pair from the remainder of folate metabolism might provide one route to metabolic homeostasis (Nijhout et al., 2004).
Taken together, our data indicate that DHFR/TYMS co-evolution is driven by a shared metabolic constraint. This is broadly consistent with the idea that co-evolution is not limited to detecting physical interactions but instead reflects the coupling of gene activities regardless of mechanism (Huynen et al., 2000; Snel et al., 2002). A toxic metabolic intermediate represents one case in which protein stoichiometry is important, but in principle any functional constraint on relative expression levels or biochemical flux might lead to similar evolutionary signatures. Recent work examining the potential small-molecule regulatory network of metabolism in both E. coli and humans indicates that many metabolic intermediates may have some inhibitory and/or regulatory capacity (Alam et al., 2017; Reznik et al., 2017). This suggests that constraints on intermediate formation could be a common driver of co-evolution between metabolic enzymes.
Comparative genomics as a strategy for identifying adaptive units in cellular systems
Co-evolution analyses are often evaluated by their ability to recapitulate known interactions or predict new ones (Huynen et al., 2000; Janga et al., 2005; Kim and Price, 2011; Pellegrini et al., 1999; Szklarczyk et al., 2015). For example, in Figure 6 we show that our two measures of evolutionary coupling enrich for physical complexes and for proteins in shared metabolic pathways. But in analogy to recent work on proteins (Halabi et al., 2009) and gene expression (Junier and Rivoire, 2016), these methods might be further developed to identify quasi-independent adaptive units within cellular systems (Li et al., 2014; Reynolds, 2014). Our case study of the DHFR/TYMS pair provides the first experimental test of whether signatures of co-evolution can reasonably be used to identify independent adaptive units at the systems scale. As in Figure 6, the pattern of (evolutionary) interactions can be examined for groups of two or more genes that are statistically coupled to one another but more decoupled from the rest of the genome, under the hypothesis that these co-evolving units might represent cooperative units of function and adaptation within extant species.
Experimentally testing the functional meaning of these units requires new methods that evaluate not only interactions but also the independence of a group of genes from the rest of the genome. The global suppressor screen shown here establishes one general experimental criterion for adaptive independence. The logic behind this approach is two-fold: 1) it directly examines whether past evolutionary statistics across prokaryotes relate to laboratory-timescale forward evolution in E. coli, and 2) it permits genome-wide screening for suppressor mutants. However, other experimental strategies for evaluating independence are possible and should be considered. One obvious criterion is modularity in genetic interactions: how do pairwise epistasis measurements compare to statistically identified adaptive units? Genome-wide measurements of epistasis have not yet been made for E. coli. However, high-throughput approaches for generating double mutants in different genes (e.g. eSGA and GIANT coli) have yielded large data sets of pairwise epistasis for several pathways, including cell wall biosynthesis, translation, and the DNA damage response (Babu et al., 2014; Babu et al., 2011a; Babu et al., 2011b; Butland et al., 2008; Gagarinova et al., 2016; Kumar et al., 2016; Typas et al., 2008). These datasets provide an interesting basis for further study but face several limitations: 1) epistasis is often inferred indirectly, because single-mutant growth rates are not always explicitly measured, and 2) fitness effects are often measured in the extreme limit of complete gene knockout. CRISPR interference (CRISPRi) overcomes many of these technical difficulties and is a promising avenue for introducing genetic knockdowns of varying magnitude in a high-throughput way (Bikard et al., 2013; Qi et al., 2013). The results presented here provide a starting point for conducting these and other experiments comprehensively.
We detect co-evolution between DHFR and TYMS by both synteny and co-occurrence, but can coupling through a shared metabolite result in detectable amino acid sequence co-variation? Direct Coupling Analysis (DCA) and Statistical Coupling Analysis (SCA) represent two distinct approaches for the analysis of amino acid sequence co-evolution. The former was developed with the goal of detecting local, physical contacts between amino acids (Hopf et al., 2014; Morcos et al., 2011; Ovchinnikov et al., 2014; Skerker et al., 2008), while the latter has identified co-evolving residue networks associated with function and allostery (McLaughlin et al., 2012; Raman et al., 2016; Reynolds et al., 2011; Suel et al., 2003). DCA has proven powerful in predicting physical protein interactions and specific contacts in macromolecular complexes (Feinauer et al., 2016; Hopf et al., 2014; Ovchinnikov et al., 2014). SCA has not yet been applied to inter-protein interactions but has been used to study co-evolution and allostery between domains (Lee et al., 2008; Reynolds et al., 2011; Smock et al., 2010). We used two web-server implementations of DCA to examine the DHFR/TYMS pair. One implementation (EVcomplex; Hopf et al., 2014) predicts no high-scoring contacts (EVcomplex score ≥ 0.8) between the two proteins, while the other (GREMLIN; Ovchinnikov et al., 2014) predicts that the two proteins may interact on the basis of a single high-scoring inter-protein residue pair. In either case, finding at most one coupling between DHFR and TYMS is consistent with interpreting DCA couplings as physical contacts, because these enzymes are not known to physically interact in bacteria. In contrast, applying SCA to the DHFR/TYMS pair reveals a number of co-evolving positions, but also detects co-evolution between DHFR/TYMS and several less obviously related proteins (Fig. S6). Thus, more work is needed to understand the significance of the inter-protein SCA correlations.
More generally, these results point to a clear need to develop appropriate statistical tools for examining co-evolution between functionally coupled proteins. The DHFR/TYMS pair can provide an experimentally powerful test case to carefully study the amino acid sequence constraints imposed between non-binding but epistatically coupled proteins.
Finally, our comparative genomics analysis has the capacity to identify co-evolving units distinct from expectations based on metabolic proximity or physical interaction maps. In many cases, the co-evolution observed between genes can be rationalized by involvement in a shared complex or metabolic pathway. However, the finding of independence is less intuitive: for example, DHFR and TYMS co-evolve with one another but not with other enzymes in the pathway that share a metabolite. We also observe that accB and accC form an independent co-evolving unit, even though they encode only two of the four proteins in the acetyl-CoA carboxylase complex. As mentioned earlier, experiments suggest that the expression levels of accB and accC are constrained relative to each other, but seemingly less so relative to the remainder of the complex (Abdel-Hamid and Cronan, 2007). Thus, co-evolutionary analyses have the potential to reveal new units of adaptation and function within larger cellular systems and complexes. Though comprehensive testing remains a matter for extensive future work, this raises the possibility that metabolism might be decomposed into small multi-gene units using evolutionary information. If so, these units would help focus mechanistic work, suggest strategies for engineering cellular systems, and provide a path towards predictive modeling of cellular phenotypes.
Conflict of Interest
The authors declare they have no conflict of interest.
Materials and Methods
Comparative genomics analyses
Synteny analysis was conducted using a simplified version of the methods described in (Junier and Rivoire, 2013, 2016). See the Supplemental Experimental Procedures for a detailed description of the synteny and co-occurrence calculations.
Forward evolution of trimethoprim resistance in the morbidostat
The morbidostat/turbidostat apparatus was constructed as described by Toprak and colleagues (Toprak et al., 2013). The founder strain for the forward evolution experiment was E. coli MG1655 modified by phage transduction to encode green fluorescent protein (egfp) and chloramphenicol resistance (cat) at the P21 attachment site. The goal of this modification was to prevent and detect contamination with other strains. Throughout the forward evolution experiment, cells were grown at 30°C in M9 media supplemented with 0.4% glucose and 0.2% amicase (Sigma); 30 μg/ml of chloramphenicol (Cam) was added for positive selection.
To begin the experiment, the founder strain was cultured overnight at 37°C in Luria Broth (LB) + 30 μg/ml Cam. This culture was washed twice with M9 and back-diluted into M9 + 30 μg/ml Cam supplemented with 5, 10, or 50 μg/ml thymidine (thy) for overnight adaptation in culture tubes at 30°C. The next day (henceforth referred to as day 0; day 1 is the end of the first day of adaptation), these overnight cultures were streaked onto LB agar plates, and two colonies per condition were chosen for whole genome sequencing (WGS) to obtain an accurate sequence for the founder strain. The remainder of the overnight cultures was used to inoculate four morbidostat tubes containing M9 media with varying thymidine supplementation (5, 10, and 50 μg/ml thy). The starting optical density was approximately 0.005. Initial antibiotic concentrations were 0, 11.5, and 57.5 μg/ml trimethoprim for media stocks A, B, and C, respectively. Each culture grew unperturbed until it surpassed an OD600 of 0.06, at which point it underwent periodic dilutions with fresh media. The dilution rate is given by $r_{dil} = f \ln\left(\frac{V + \Delta V}{V}\right)$, where f is the dilution frequency, V = 15 ml is the culture volume, and ΔV = 3 ml is the volume added per dilution. We chose a dilution frequency f = 3 h−1, giving $r_{dil} \approx 0.55$ h−1. Above OD600 = 0.15, these dilutions are used to introduce TMP into the culture (see also Fig. 5B). This allows controlled inhibition of DHFR activity in response to growth rate. Cycles of growth and dilution continued for ~22 hours, at which point the run was paused to make glycerol stocks, replenish media, and update TMP stock concentrations. Culture vials for the next day of evolution were filled with fresh media and inoculated using 300 μl from the previous culture. Complete trajectories of OD600 and drug concentration are shown in Fig. S4. Endpoint cultures were streaked onto LB agar plates supplemented with 30 μg/ml Cam and 50 μg/ml thymidine to obtain isolated colonies for whole genome sequencing.
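As a rough check on the numbers above, the sketch below evaluates the dilution-rate formula with the stated values (f = 3 h−1, V = 15 ml, ΔV = 3 ml). It also includes a per-dilution decision rule that is our own assumption. Under that rule, drug-containing medium is used only when OD600 is above 0.15 and has risen since the previous dilution. It is loosely modeled on the general morbidostat feedback logic described by Toprak and colleagues and is not a description of the control software used here. The function names are illustrative.

```python
import math


def dilution_rate(f_per_h: float, volume_ml: float, added_ml: float) -> float:
    """Effective dilution rate r_dil = f * ln((V + dV) / V), in 1/h."""
    return f_per_h * math.log((volume_ml + added_ml) / volume_ml)


def dilution_medium(od_now: float, od_prev: float, od_drug: float = 0.15) -> str:
    """Hypothetical medium choice at one scheduled dilution.

    Assumes dilutions have already started (OD600 first exceeded 0.06).
    Drug-containing medium is chosen only if OD600 is above od_drug AND has
    increased since the previous dilution; otherwise medium without added
    drug is used. This two-way rule is an illustrative simplification.
    """
    if od_now > od_drug and od_now > od_prev:
        return "drug"
    return "media"


print(round(dilution_rate(3.0, 15.0, 3.0), 2))  # 0.55
```

Coupling the drug input to both an OD threshold and a positive change in OD means that only cultures that are still growing receive more inhibitor. This is what lets drug pressure track resistance as it evolves.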
Whole genome sequencing
Two isolates were selected from each adapted day 0 culture, and ten clonal isolates (colonies) were randomly selected from the endpoint of each evolution condition, totaling 36 strains. Genomic DNA was isolated using the QIAamp DNA Mini Kit (Qiagen). The Nextera XT DNA Library Prep Kit (Illumina) was used to fragment and label each genome for paired-end sequencing using a v2 300-cycle MiSeq kit (Illumina). Average read length and coverage can be found in Table S3. Genome assembly and mutation prediction were performed using breseq (Deatherage and Barrick, 2014). The reference sequence was a modification of the E. coli MG1655 complete genome (accession no. NC_000913), edited to include the GFP marker and chloramphenicol resistance cassette in our founder strain. The modified reference sequence and all complete genome sequences from the beginning and endpoint of forward evolution are available in the NCBI BioProject database (accession number: PRJNA378892).
Measurements of thymidine dependence
All strains were grown overnight in LB + 5μg/ml thy, with the exception of the strains evolved in the 50μg/ml thy, which were supplemented with 50μg/ml thy to ensure viability. Cultures were then washed twice in M9 media without thymidine, and inoculated at an OD600=0.005 in 96-well plates containing M9 media supplemented with 10-fold serial dilutions of thymidine, ranging from 0.005 μg/ml to 50 μg/ml (in singlicate). OD600 was monitored in a Victor X3 plate reader at 30°C over a period of 20 hours. Growth was quantified using the positive integral of OD600 over time. This measure captures mutational or drug-induced changes in the duration of lag phase as well as perturbations in growth rate (Toprak et al., 2012). For each strain, we identified a start-time (t0) at the end of lag-phase for the fully-rescued 50μg/ml thy condition. We chose each t0 computationally as the last point before monotonic growth above the limit of detection. The log(OD600) versus time curves for all conditions are then vertically shifted (‘background-subtracted’), such that the function value at this start-time is zero. This curve is then numerically integrated from t0 to t0+10 hours using the trapezoid method.
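The growth metric described above can be sketched in Python as follows. This is a minimal illustration, not the analysis code used in this work, and several choices in it are assumptions:

- t0 is taken as the last point before `n_rising` consecutive, strictly increasing samples above the detection limit.
- `n_rising` defaults to 3.
- "Positive integral" is read as clipping negative background-subtracted values to zero.

In practice, t0 would be found on the 50 μg/ml thy curve, and the same index would be applied to every other condition for that strain, assuming all wells share one time grid.

```python
from typing import Sequence


def find_t0_index(log_od: Sequence[float], log_lod: float, n_rising: int = 3) -> int:
    """Index of the last point before sustained growth above the detection limit.

    Hypothetical rule: the first index i such that
    log_od[i] < log_od[i+1] < ... < log_od[i+n_rising], with all of
    log_od[i+1 .. i+n_rising] above log_lod.
    """
    for i in range(len(log_od) - n_rising):
        window = log_od[i:i + n_rising + 1]
        rising = all(window[k] < window[k + 1] for k in range(n_rising))
        above = all(y > log_lod for y in window[1:])
        if rising and above:
            return i
    raise ValueError("no sustained growth above the detection limit was found")


def positive_growth_integral(times: Sequence[float], log_od: Sequence[float],
                             t0_index: int, window_h: float = 10.0) -> float:
    """Trapezoid-rule integral of background-subtracted log(OD600) over [t0, t0 + window_h]."""
    baseline = log_od[t0_index]          # shift so the curve equals zero at t0
    t_end = times[t0_index] + window_h
    total = 0.0
    for k in range(t0_index, len(times) - 1):
        t_a, t_b = times[k], times[k + 1]
        if t_a >= t_end:
            break
        y_a, y_b = log_od[k] - baseline, log_od[k + 1] - baseline
        if t_b > t_end:                  # linearly interpolate at the window edge
            y_b = y_a + (y_b - y_a) * (t_end - t_a) / (t_b - t_a)
            t_b = t_end
        # Clip negative values to zero (assumed meaning of "positive integral").
        total += 0.5 * (max(y_a, 0.0) + max(y_b, 0.0)) * (t_b - t_a)
    return total
```

Because the curve is shifted to zero at t0 and integrated over a fixed window, a longer lag phase and a slower growth rate both lower the score. This is why the metric captures both kinds of perturbation.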
Measurements of trimethoprim resistance (IC50)
All strains were grown overnight in LB + 5 μg/ml thy, with the exception of the strains evolved in 50 μg/ml thy, which were supplemented with 50 μg/ml thy to ensure viability. Each strain was then washed into media conditions corresponding to the strain’s forward evolution condition and adapted for 5.5 hours at 30°C. The recovery cultures were used to inoculate 96-well plates containing M9 media with serial dilutions of TMP (in triplicate), with a starting OD600 = 0.005. OD600 was monitored using a Tecan Infinite M200 Pro microplate reader and Freedom Evo robot at 30°C over a period of at least 12 hours. The trimethoprim resistance of each strain was quantified by its absolute IC50, the drug concentration (μg/ml) at which growth is half-maximal. The relationship between growth and trimethoprim inhibition is modeled using the four-parameter logistic function $Y = d + \frac{a - d}{1 + (X/c)^{b}}$, where Y is growth, X is TMP concentration, a is the asymptote for uninhibited growth, d is the limit for inhibited growth, c is the concentration midway between a and d, and b captures sensitivity (Sebaugh, 2011). Growth was quantified using the positive integral of OD600 data over a 10 h period of growth (see also the methods for measurement of thymidine dependence). For each strain, we identify a start-time (t0) at the end of lag phase for the uninhibited 0 μg/ml TMP condition. Growth versus TMP concentration was fit to the above model using MATLAB. IC50 was calculated as the concentration X* for which growth Y(X*) = a/2.
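Given fitted parameters, the absolute IC50 follows in closed form. Setting Y(X*) = a/2 in the four-parameter logistic and solving gives $X^{*} = c\left(\frac{a}{a - 2d}\right)^{1/b}$, which exists only when d < a/2. The Python sketch below encodes this. The curve fit itself, done in MATLAB in this work, is omitted, and the function names are illustrative.

```python
def four_pl(x: float, a: float, b: float, c: float, d: float) -> float:
    """Four-parameter logistic: Y = d + (a - d) / (1 + (x / c)**b)."""
    return d + (a - d) / (1.0 + (x / c) ** b)


def absolute_ic50(a: float, b: float, c: float, d: float) -> float:
    """Concentration X* at which four_pl(X*) == a / 2.

    Closed form: X* = c * (a / (a - 2d))**(1/b). Assumes a > 0 and b > 0;
    a solution exists only if the inhibited limit d lies below a / 2.
    """
    if not (a > 0 and b > 0 and d < a / 2):
        raise ValueError("absolute IC50 undefined: need a > 0, b > 0 and d < a/2")
    return c * (a / (a - 2.0 * d)) ** (1.0 / b)
```

With d = 0, the absolute IC50 equals the midpoint c. A nonzero floor d shifts the absolute IC50 above c, because growth must fall further than the curve's midpoint to reach a/2.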
Growth without trimethoprim selection in 50μg/ml thymidine using the turbidostat
The founder strain for this experiment was identical to that used for evolution of trimethoprim resistance. Throughout the experiment, cells were grown at 30°C in M9 media supplemented with 0.4% glucose and 0.2% amicase (Sigma); 30 μg/ml of chloramphenicol (Cam) was added for positive selection. To begin the experiment, the founder strain was cultured overnight at 37°C in Luria Broth (LB) + 30 μg/ml Cam. This culture was washed twice with M9 and back-diluted into M9 supplemented with 50 μg/ml thymidine (thy) for overnight adaptation in culture tubes at 30°C. The next day (henceforth referred to as day 0; day 1 is the end of the first day of continuous culture), the overnight culture was used to inoculate three turbidostat tubes containing 17 ml of M9 supplemented with 50 μg/ml thy. The starting optical density was approximately 0.005. Each culture grew unperturbed until it reached an OD600 of 0.15, at which point it was diluted with 2.4 ml of fresh media. These cycles of growth and dilution continued for ~22 hours, at which point the run was paused to make glycerol stocks and replenish media. Culture vials for each following day of evolution were filled with fresh media and inoculated using 300 μl from the previous culture.
Epistasis Measurements
All relative growth rate measurements were performed in the E. coli folate auxotroph strain ER2566 ΔfolA ΔthyA (Lee et al., 2008). DHFR (folA) and TYMS (thyA) are provided on the plasmid pACYC-Duet1 (in MCS1 and MCS2, respectively), and each is under control of a T7 promoter. For these experiments, we use leaky expression (no IPTG induction). Each mutant plasmid (20 in total) is marked with a genetic barcode in a non-coding region between the two genes. Plasmids were transformed into the auxotroph strain, and each mutant was grown overnight in a separate LB + 30 μg/ml Cam + 50 μg/ml thy culture. Cultures were then washed twice in M9 media supplemented with 0.4% glucose, 0.2% amicase, and 30 μg/ml Cam, and adapted overnight at 30°C. All mutants were mixed in equal ratios based on OD600 and inoculated into the turbidostat at a starting OD600 = 0.1. Growth rates were measured under two conditions, 5 μg/ml and 50 μg/ml thy, with three replicates each. The turbidostat clamps the culture at a fixed OD600 = 0.15 by adding fresh media. Every 2 hours over the course of 12 hours, a 1 ml sample was removed, pelleted, and frozen for next-generation sequencing. Amplicons containing the barcoded region with appropriate sequencing adaptors (350 bp in total) were generated by two sequential rounds of PCR with Q5 polymerase. The barcoded region was sequenced with a single-end MiSeq run using a v2 50-cycle kit (Illumina). We obtained 14,348,937 reads. Data analysis was performed using a series of custom Python scripts to count barcodes, and MATLAB to fit relative growth rates.
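A common way to turn barcode counts from a pooled competition into relative growth rates is to regress the log ratio of each barcode's counts to a reference barcode against time. Because both lineages grow in the same culture, the slope estimates the difference in their growth rates. The sketch below assumes this approach; the original fits were done in MATLAB and may differ in detail. It also assumes a hypothetical reference barcode (for example, the wild-type DHFR/TYMS plasmid) and a small pseudocount to avoid log(0). The last function shows one common additive definition of epistasis for rates expressed relative to wild type. It is included for illustration and is not necessarily the definition used in this study.

```python
import math
from typing import Dict, List, Sequence


def ols_slope(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Ordinary least-squares slope of ys against xs."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    return sxy / sxx


def relative_growth_rates(times_h: Sequence[float],
                          counts: Dict[str, List[float]],
                          reference: str,
                          pseudocount: float = 0.5) -> Dict[str, float]:
    """Growth rate of each barcode relative to `reference`, in 1/h.

    counts maps barcode -> read counts at each sampled time point. The slope
    of ln(n_i / n_ref) versus time estimates r_i - r_ref.
    """
    ref = counts[reference]
    rates = {}
    for barcode, series in counts.items():
        log_ratio = [math.log((c + pseudocount) / (r + pseudocount))
                     for c, r in zip(series, ref)]
        rates[barcode] = ols_slope(times_h, log_ratio)
    return rates


def additive_epistasis(r_ab: float, r_a: float, r_b: float) -> float:
    """With rates relative to wild type (r_wt = 0), no epistasis means r_ab == r_a + r_b."""
    return r_ab - (r_a + r_b)
```

Normalizing to a barcode within the same culture cancels changes in total population size and sequencing depth between samples. Only the relative enrichment of each mutant over time is retained.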
Constructing DHFR/TYMS mutants in a clean genetic background
We followed the protocol for scarless genome integration using the modified λ-red system developed by Tas et al. (Tas et al., 2015). In this method, a tetracycline (Tet) resistance cassette (“landing pad”) is first integrated at the site targeted for mutagenesis. Then, the landing pad is excised by the endonuclease I-SceI and replaced with the desired mutation by λ-red mediated recombination. NiCl2 is used to counterselect against cells that retain the tetracycline cassette. Tas et al. provide a detailed protocol; here we give the specifics necessary for our experiments. For the λ-red machinery, we transformed the plasmid pTKRED (GenBank accession number GU327533) into electrocompetent E. coli MG1655 carrying a genomic egfp/cat resistance cassette (the forward evolution founder strain). For the Δ25-26 TYMS mutation, we introduced the tetA landing pad between genome positions 2,964,900 and 2,965,201 (genome NC_000913), corresponding to the N-terminus of the thyA gene. For the DHFR mutations (L28R, W30R, and P21L), the landing pad was recombined between genome positions 49,684 and 49,990 (genome NC_000913). To replace the Tet cassette, cells were induced with 2 mM IPTG and 0.4% arabinose and then transformed with 100 ng of dsDNA PCR product containing the mutation of interest (with appropriate homology arms). The transformed cells were outgrown for 3 days at 30°C in rich defined media (RDM, Teknova) containing 0.5% (v/v) glycerol in place of glucose. The media was supplemented with 6 mM or 4 mM NiCl2 for counterselection against tetA at the thyA or folA locus, respectively. The outgrowth culture was streaked onto agar plates and screened daily for the mutant of interest using LB supplemented with 50 μg/ml thy, 30 μg/ml spectinomycin, and either with or without 5–10 μg/ml Tet. All mutations were confirmed by Sanger sequencing of the complete folA and thyA open reading frames; for folA, the promoter region was also sequenced.
LC-MS Metabolite Measurements
Cells were cultured for metabolite analysis at 30°C in M9 media with 0.2% glucose, 0.1% amicase, 50 μg/ml thy, and 30 μg/ml Cam. In mid-log phase (OD600 ~0.2), E. coli culture (3 ml for nucleotide measurements and 7 ml for folate measurements) was filtered onto a nylon membrane (0.2 μm). The residual medium was quickly washed away by filtering warm saline solution (200 mM NaCl at 30°C) over the membrane loaded with cells, to exclude extracellular metabolites from LC-MS analysis. The membrane was immediately transferred to a 6 cm Petri dish containing 1 ml cold extraction solvent to quench metabolism. The solvent was −20°C 40:40:20 methanol/acetonitrile/water; for folate stability, the folate extraction solvent also contained 2.5 mM sodium ascorbate and 25 mM ammonium acetate (Lu et al., 2007). After washing the membrane, the cell extract solution was transferred to a microcentrifuge tube and centrifuged at 13,000 rcf for 10 min. The supernatant was transferred to a new microcentrifuge tube. Folate samples were prepared with an additional extraction: the pellet was resuspended in the cold extraction solvent and sonicated for 10 min in an ice bath. After the second extraction and centrifugation, the supernatant was combined with the initial supernatant. The metabolite extracts were dried under nitrogen flow and reconstituted in HPLC-grade water for LC-MS analysis. Metabolites were measured using stand-alone Orbitrap mass spectrometers (ThermoFisher Exactive and Q-Exactive) operating in negative ion mode with reverse-phase liquid chromatography (Lu et al., 2010). Exactive chromatographic separation was achieved on a Synergy Hydro-RP column (100 mm × 2 mm, 2.5 μm particle size, Phenomenex) with a flow rate of 200 μL/min. Solvent A was 97:3 H2O/MeOH with 10 mM tributylamine and 15 mM acetic acid; solvent B was methanol. The gradient was 0 min, 5% B; 5 min, 5% B; 7 min, 20% B; 17 min, 95% B; 20 min, 100% B; 24 min, 5% B; 30 min, 5% B.
Q-Exactive chromatographic separation was achieved on a Poroshell 120 Bonus-RP column (150 × 2.1 mm, 2.7 μm particle size, Agilent) with a flow rate of 200 μL/min. Solvent A was 10 mM ammonium acetate + 0.1% acetic acid in 98:2 water:acetonitrile, and solvent B was acetonitrile. The gradient was 0 min, 2% B; 4 min, 0% B; 6 min, 30% B; 11 min, 100% B; 15 min, 100% B; 16 min, 2% B; 20 min, 2% B. LC-MS data were analyzed using the MAVEN software package (Clasquin et al., 2012).
Acknowledgements
We thank members of the Reynolds lab for review of the manuscript, E. Toprak for extensive advice on morbidostat construction and operation, S. Benkovic for the ER2566 ΔfolA ΔthyA strain, T. Kuhlman for molecular biology reagents used in genome editing, T. Bergmiller for the GFP/Chloramphenicol resistance marker incorporated into our founder strain, and R. Ranganathan for discussions. This research was funded in part by the Gordon and Betty Moore Foundation’s Data-Driven Discovery Initiative through Grant GBMF4557 to K.R. and by the Green Center for Systems Biology at UT Southwestern Medical Center. A.S. was supported in part by NIH training grant 5T32GM8203-28. I.J. is supported by an ATIP-Avenir grant (Centre National de la Recherche Scientifique).