Abstract
The ability to predict cell behavior is complicated by an unknown pattern of functional interdependence among genes. Here, we use the conservation of gene proximity across species (synteny) to infer functional couplings between genes. For the folate metabolic pathway, we observe a sparse, modular architecture of interactions, with two small groups of genes coevolving in the midst of others that evolve independently. For one such module – dihydrofolate reductase and thymidylate synthase – we use epistasis measurements and forward evolution to demonstrate both internal functional coupling and independence from the remainder of the genome. Mechanistically, the coupling is driven by a constraint on their relative activities, which must be balanced to prevent accumulation of a metabolic intermediate. The results indicate an organization of cellular systems not apparent from inspection of biochemical pathways or physical complexes, and support the strategy of using evolutionary information to decompose cellular systems into functional units.
Introduction
The activity of one gene is often modified by the activity of other genes in the genome. This functional coupling between genes makes it difficult to predict cellular behavior as a whole from measurements of each gene (or protein) taken independently. As a consequence, our ability to rationally engineer new metabolic systems (Kim and Copley, 2012; Michener et al., 2014a; Michener et al., 2014b), and quantify the relationship between mutations and disease (Kondrashov et al., 2002; Zuk et al., 2012) is limited. Further, this interdependency amongst genes makes it non-trivial to understand how complex cellular systems are possible through an evolutionary process of stepwise variation with selection (Breen et al., 2012; Wagner and Altenberg, 1996; Weinreich et al., 2013). Thus, an ability to globally map functional couplings between genes and subsequently decompose cellular systems into quasi-independent modules - each module consisting of several genes engaged in cooperative function - would help render biological systems tractable and predictable.
However, it remains unclear if such a modular decomposition is possible, and if so, what the general strategy should be for finding it. A fundamental aspect of this problem is to distinguish functional couplings associated with core, conserved processes from those couplings that reflect species and/or environment specific adaptations. In this sense, we seek a general description of genetic interactions that can serve as a basis for guiding targeted experiments and modeling cellular systems. Here, we develop a map of pairwise gene interactions through statistical analysis of co-evolution across thousands of bacterial genomes. The central premise is that functional couplings between proteins drive co-evolution of the associated genes, regardless of details of the interaction mechanism. This co-evolution then leaves a set of detectable statistical signatures in extant genome sequences. Comparative study of natural genetic (co-) variation across genomes should then reveal fundamental functional interactions important to core cellular processes under evolutionarily relevant conditions, rather than those specific to particular species or environments.
Co-evolution can be manifested in different ways – correlations in amino acid sequence variation, coordinated loss and gain of genes across species, or constraints on relative chromosomal location. In this work, we focus on synteny, the conservation of chromosomal proximity between genes (Overbeek et al., 1999; Tamames, 2001). Synteny is a reliable indicator of functional relationships (Huynen et al., 2000; Janga et al., 2005; Overbeek et al., 1999; Rogozin et al., 2002), and the co-expression of genes (Junier and Rivoire, 2016; Korbel et al., 2004). As in prior work, we thus use synteny to infer functional couplings between genes. In addition, we also use the absence of synteny as a measure of independence, with the goal of decomposing cellular systems into groups of genes that co-evolve with each other, but are relatively independent from the rest of the genome.
We begin with a focused study of an experimentally powerful model system: folate metabolism. The folate metabolic pathway involves several interlocking enzymatic loops that catalyze the reactions necessary for synthesis of purine nucleotides, thymidine and a few amino acids. Analysis of gene synteny indicates that this pathway can be decomposed into small modules of one to three genes. Using quantitative measurements of epistasis and forward evolution, we present the first critical tests of these predictions: (1) epistatic coupling within a module and (2) adaptive independence of the module from the remainder of the genome. Motivated by these findings, we carry out a genome wide analysis of pairwise functional couplings between genes (2095 genes, 551,198 gene pairs), which recapitulates and extends the basic findings of our evolutionary analysis for the folate metabolic network. The results indicate a modular organization of genes into groups that is not obvious given knowledge of the underlying biochemistry or physical complexes. We suggest that such evolutionary modules might represent basic units of function within the cell.
Results
An evolution-based map of functional coupling in folate metabolism
The core one-carbon folate metabolic pathway consists of thirteen enzymes that interconvert various folate species and produce methionine, serine, thymidine and purine nucleotides in the process (Fig. 1A and Table S1) (Green and Matthews, 2013). The input of the pathway is 7,8-dihydrofolate (DHF), produced by the bifunctional enzyme dihydrofolate synthase/folylpolyglutamate synthetase (FPGS) through the addition of L-glutamate to dihydropteroate. Once DHF is formed, it is reduced to 5,6,7,8-tetrahydrofolate (THF) by the enzyme dihydrofolate reductase (DHFR) using NADPH as a co-factor. THF can then be modified by a diversity of one-carbon groups at both the N-5 and N-10 positions, and subsequently serves as a one-carbon donor in several critical reactions, including the synthesis of serine and purine nucleotides (Fig. 1A, bottom square portion of pathway). The only step that oxidizes THF back to DHF is catalyzed by the enzyme thymidylate synthase (TYMS), which converts deoxyuridine monophosphate (dUMP) to deoxythymidine monophosphate (dTMP) in the process. This pathway is well conserved across organisms, ensuring good statistics for our analysis. Further, the function of folate metabolism can be readily assessed through quantitative growth rate measurements (Reynolds et al., 2011), and due to the central role of these metabolites in cell growth and division, folate metabolism is a target of several well-known antibiotics (trimethoprim and sulfamethoxazole) and chemotherapeutics (methotrexate and 5-fluorouracil) (Ducker and Rabinowitz, 2016; Gangjee et al., 2007). These factors enable experimental strategies to measure gene function and epistatic coupling in vivo. Thus, folate metabolism provides a good model system to examine the use of synteny in identifying functional modules.
We studied the pattern of couplings between genes in the folate pathway through a quantitative analysis of synteny over 1445 bacterial genomes. The basic operation is to compute the frequency at which a particular pair of orthologous genes occur within a given distance along the chromosome across all genomes, and then calculate the significance of this observation (as a p-value) given a null model in which genes are randomly and uniformly distributed along the chromosome (Junier and Rivoire, 2016). Previous work has shown that this analysis identifies stretches of genes larger than single operons that tend to be co-expressed (Junier and Rivoire, 2016). Here, we convert the synteny p-value between any two genes i and j into a relative entropy Dij. This provides a measure of synteny that is independent of the number of genomes analyzed (see Supplemental Experimental Procedures for details).
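As a rough illustration of the operation described above, the following Python sketch scores a single gene pair. It is not the procedure used in this work (see Supplemental Experimental Procedures for that). It assumes a circular chromosome, so that two uniformly placed genes lie within distance d of each other with probability 2d/L; a single null probability shared by all genomes; a binomial tail for the p-value; and a Bernoulli relative entropy between the observed and null proximity frequencies as a stand-in for Dij. All function and parameter names are hypothetical.

```python
import math

def null_proximity_prob(max_dist, genome_len):
    """P(two uniformly placed genes lie within max_dist) on a circular chromosome (assumed null)."""
    return min(1.0, 2.0 * max_dist / genome_len)

def binomial_tail(k, n, p):
    """P(X >= k) for X ~ Binomial(n, p), summed in log space to avoid overflow."""
    if not 0.0 < p < 1.0:
        raise ValueError("p must lie strictly between 0 and 1")
    log_p, log_q = math.log(p), math.log1p(-p)
    total = 0.0
    for i in range(k, n + 1):
        log_pmf = (math.lgamma(n + 1) - math.lgamma(i + 1) - math.lgamma(n - i + 1)
                   + i * log_p + (n - i) * log_q)
        total += math.exp(log_pmf)
    return min(1.0, total)

def relative_entropy(f_obs, p_null):
    """Bernoulli relative entropy D(f_obs || p_null) in nats, with 0 * log 0 taken as 0."""
    def term(a, b):
        return 0.0 if a == 0 else a * math.log(a / b)
    return term(f_obs, p_null) + term(1 - f_obs, 1 - p_null)

def synteny_score(n_close, n_genomes, max_dist, genome_len):
    """Return (p-value, relative entropy) for one gene pair under the assumed null model."""
    p0 = null_proximity_prob(max_dist, genome_len)
    f = n_close / n_genomes
    return binomial_tail(n_close, n_genomes, p0), relative_entropy(f, p0)
```

In this simplified form the relative entropy depends only on the observed frequency of proximity and the null probability, not on the number of genomes, which mirrors the motivation given above for reporting Dij rather than a p-value.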
Examining synteny for genes comprising the folate pathway reveals a sparse pattern of evolutionary coupling in which most genes are relatively independent from each other (Fig. 1B). Consistent with intuition and expectations from prior work, we observe coupling between genes whose products physically interact: the glycine cleavage system proteins H, P and T (gcvH, gcvP, and gcvT in E. coli). Together with lipoamide dehydrogenase (lpdA), these enzymes form the glycine decarboxylase complex (GDC), a macromolecular complex that reversibly catalyzes either the degradation or biosynthesis of glycine (Okamura-Ikeda et al., 1993). Notably, lipoamide dehydrogenase also functions as part of both the 2-oxoglutarate dehydrogenase and pyruvate dehydrogenase multienzyme complexes (Carothers et al., 1989); this generality of function may underlie its evolutionary independence from gcvH, gcvP, and gcvT.
Interestingly, we also see evolutionary coupling of enzyme pairs with no evidence for physical interaction: 1) DHFR/TYMS and 2) methionine synthase (MS) and methylenetetrahydrofolate reductase (MTHFR). Indeed, DHFR and TYMS comprise the most strongly coupled gene pair in the folate cycle. Both pairs of enzymes catalyze consecutive reactions in folate metabolism, suggesting a possible mechanistic basis for functional coupling. However, we note that biochemical proximity of reactions is not a sufficient criterion for evolutionary coupling; many gene pairs that are locally linked in the biochemical network do not show statistical correlation (Fig. 1). Thus, our synteny analysis does not simply recapitulate the connections in a standard biochemical network map. Instead, it provides a different representation in which many genes are nearly independent and a few interact to form modular units. These interacting genes behave as evolutionary modules – they coevolve with one another, but are relatively independent from the rest of the metabolic pathway.
The DHFR/TYMS enzyme pair provides a good test case for the hypothesis that evolutionary modules represent near-independent functional units. The genes are highly coupled by co-evolution, but this coupling is not explained by the formation of a physical complex. Further, though the genes encoding DHFR and TYMS are proximal along the chromosome of many bacterial species, they are approximately 2.9 megabases apart in E. coli (∼4.6 Mbp total genome size). So, experiments in this model system provide an opportunity to test if statistical modularity over an ensemble of genomes corresponds to functional modularity even in the absence of chromosomal proximity in the selected instance.
Coupling between DHFR and TYMS depends on enzyme activity
Does the coevolution of DHFR and TYMS correspond to functional coupling in the folate metabolic network? To address this, we conducted quantitative measurements of genetic epistasis for a library of ten well-characterized DHFR mutants in the background of either WT TYMS or TYMS R166Q, a catalytically inactive variant (twenty constructs in total). We used a previously validated next-generation sequencing based assay to measure the relative fitness of all twenty DHFR/TYMS combinations in a single internally controlled experiment (Reynolds et al., 2011). In this system, DHFR and TYMS are expressed from a single plasmid that contains two DNA barcodes – one associated with DHFR and one with TYMS – that uniquely encode the identity of each mutant (Fig. 2A). The full library of mutants is transformed into the auxotrophic strain E. coli ER2566 ΔfolA ΔthyA, and grown as a mixed population in a turbidostat. The turbidostat allows us to maintain the cell population at a fixed density in exponential phase for the duration of the experiment, with excellent control over media conditions. We then sampled time points over a twelve-hour period, and used next-generation sequencing to compute allele frequencies at each time point (Fig. 2B). By fitting a slope to the plot of allele frequencies versus time, we obtain a relative growth rate for each mutant in the population (Fig. 2B and Fig. S1). The advantage of this approach is that we obtain a quantitative measure of growth rate variation for many mutations in parallel, and thus establish a more complete picture of how epistasis varies with the magnitude of the perturbation.
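The slope-fitting step can be sketched as follows. This is a minimal illustration under stated assumptions, not the analysis pipeline used here: it assumes exponential growth, so that the log ratio of a mutant's frequency to that of a reference allele (for example, the wild type) changes linearly with time, and it fits that line by ordinary least squares. The function name and the choice of reference allele are hypothetical.

```python
import math

def relative_growth_rate(times_h, mutant_freqs, reference_freqs):
    """Least-squares slope of ln(mutant / reference) versus time, in units of 1/hour.

    Assumes exponential growth, so the slope equals the growth-rate
    difference between the mutant and the reference allele.
    """
    y = [math.log(m / r) for m, r in zip(mutant_freqs, reference_freqs)]
    n = len(times_h)
    t_mean = sum(times_h) / n
    y_mean = sum(y) / n
    sxy = sum((t - t_mean) * (v - y_mean) for t, v in zip(times_h, y))
    sxx = sum((t - t_mean) ** 2 for t in times_h)
    return sxy / sxx
```

A negative slope indicates a mutant that grows more slowly than the reference; because every allele is measured in the same vial, the comparison is internally controlled.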
Analysis of the DHFR mutants in the background of WT TYMS (grey points, Fig. 2C-D) shows that growth rate depends monotonically on DHFR catalytic activity: decreasing DHFR activity corresponds to slower growth. The TYMS R166Q mutant is non-viable, unless the media is supplemented with thymidine (the product of TYMS). In the context of WT DHFR, TYMS R166Q results in a growth rate defect in media supplemented with low amounts of thymidine (5 μg/ml), and no growth rate defect in the presence of 50 μg/ml thymidine. However, in the context of the low activity DHFR mutants, TYMS R166Q has the counter-intuitive consequence of partly (Fig. 2C) or even fully restoring growth rate (Fig. 2D). That is, the TYMS R166Q mutation decreases fitness in the background of a high-activity DHFR, but increases fitness when paired with a low-activity DHFR. This epistasis – in which loss of function in TYMS buffers decreases in the catalytic activity of DHFR – is consistent with the evolutionary coupling of this enzyme pair.
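The reversal described above, in which the same TYMS mutation is deleterious in one DHFR background and beneficial in another, can be expressed as a simple check on a table of relative growth rates. The sketch below is illustrative only; the dictionary layout, allele labels, and function names are hypothetical, and no measured values are implied.

```python
def tyms_effect(growth, dhfr_allele):
    """Change in relative growth rate caused by TYMS R166Q in one DHFR background.

    `growth` maps (DHFR allele, TYMS allele) tuples to relative growth rates
    (a hypothetical data layout chosen for this example).
    """
    return growth[(dhfr_allele, "R166Q")] - growth[(dhfr_allele, "WT")]

def is_sign_epistasis(growth, dhfr_a, dhfr_b):
    """True if TYMS R166Q has effects of opposite sign in the two DHFR backgrounds."""
    return tyms_effect(growth, dhfr_a) * tyms_effect(growth, dhfr_b) < 0
```

For instance, a table in which R166Q lowers growth with wild-type DHFR but raises it with a low-activity DHFR allele would return True.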
Mechanism of DHFR/TYMS coupling
Given no evidence for the physical association of DHFR and TYMS in bacteria, what is the mechanism underlying their coupling? The finding that epistasis between the two genes depends on enzyme activity suggests a simple hypothesis: the coupling arises from the need to balance the concentration of key metabolites in the folate metabolic pathway. Support for this idea comes from prior work showing that treatment of E. coli with the DHFR inhibitor trimethoprim results in intracellular accumulation of DHF, which inhibits the upstream enzyme folylpoly-γ-glutamate synthetase (FP-γ-GS) (Kwon et al., 2008). FP-γ-GS catalyzes the polyglutamylation of reduced folates, an important modification that increases folate retention in the cell and promotes the use of reduced folates as substrates in a number of downstream reactions (McGuire and Bertino, 1981). Thus, DHF accumulation results in off-target enzyme inhibition and cellular toxicity, an explanation for the growth rate defect observed in hypomorphic DHFR alleles (Fig. 2C-D). Because DHF is a product of TYMS, it is logical that loss-of-function mutations in TYMS might rescue growth in DHFR hypomorphs by preventing the accumulation of DHF.
To test this hypothesis, we carried out liquid chromatography-mass spectrometry (LC-MS) profiling of folate pathway metabolites in DHFR/TYMS mutant combinations. Specifically, we selected five DHFR variants that span a range of catalytic activities (WT, G121V, F31Y.L54I, M42F.G121V, and F31Y.G121V), and measured the relative abundance of intracellular folates in the background of either wild-type or R166Q TYMS. The experiment was carried out for log-phase cultures in M9 glucose media supplemented with 0.1% amicase and 50 μg/ml thymidine, conditions in which the selected DHFR mutations display significant growth defects individually, but in which the corresponding DHFR/TYMS double mutants are restored to near wild-type growth. Current mass spectrometry methods allow discrimination between the full diversity of folate species, which differ in oxidation, one-carbon modification, and polyglutamylation states, permitting a broad metabolic study of the effects of mutations (Lu et al., 2007).
The data confirm that for DHFR loss-of-function mutants, intracellular DHF concentration increases (Fig. 3A, bottom four rows). In addition, we find evidence for a depletion of reduced polyglutamated folates (Glu ≥ 3), while several mono- and di-glutamated THF species accumulate (particularly for THF, Methylene THF and 5-Methyl THF). This pattern of changes in the reduced folate pool is consistent with inhibition of FP-γ-GS by DHF accumulation (Fig. 3A, Fig. S2). It is also consistent with the observed growth rate defects in the DHFR loss-of-function mutants (Fig. 3B). How does the metabolite profile look in the background of the corresponding TYMS loss-of-function mutant? As predicted, we find clear evidence that the metabolite profile is corrected in the background of TYMS R166Q. Indeed, the concentrations of the reduced polyglutamated folates are restored to near-wild-type levels for most of the DHFR alleles (Fig. 3A, top four rows). These data show that coordinated decreases in the activity of DHFR and TYMS maintain balance in key intracellular metabolites, a condition associated with optimal growth. Thus the coupling of DHFR and TYMS can be explained by a joint constraint on their catalytic activities – a biochemical mechanism for the coevolution of the DHFR/TYMS gene pair.
Forward evolution reveals independence of DHFR and TYMS from the rest of the genome
The analysis of coevolution presented in Fig. 1 goes beyond just the prediction of epistatic coupling between DHFR and TYMS. The lack of coupling to other folate metabolic genes suggests that they might act as a near-independent evolutionary module within folate metabolism. To test this, we carried out a genome-wide suppressor screen in which we make perturbations to one component of the two-gene unit and examine the pattern of compensatory mutations. If DHFR and TYMS act as a quasi-independent unit, then suppressor mutations should be found within the genetic loci encoding this pair of enzymes with minimal contributions from other sites. Practically, this experiment entails making a perturbation within the DHFR/TYMS module that reduces organismal growth rate, conducting forward evolution to generate an adaptive response, and performing whole genome sequencing of the output.
As a perturbation, we grew wild-type E. coli cells (strain MG1655) in the presence of trimethoprim, a common antibiotic and inhibitor of many prokaryotic DHFRs. To facilitate the evolution of resistance to trimethoprim, we used a morbidostat, a specialized device for continuous culture (Toprak et al., 2012; Toprak et al., 2013) (Fig. 4A-C). The morbidostat dynamically adjusts the trimethoprim concentration in response to bacterial growth rate and total optical density, thereby providing steady selective pressure as resistance levels increase (see Fig. 4 legend for details). The basic principle is that cells undergo regular dilutions with fresh media until they hit a target optical density (OD = 0.15); once this density is reached, they are diluted with media containing trimethoprim until growth rate is decreased. This approach makes it possible to obtain long trajectories of adaptive mutations in the genome with good statistics and sustained phenotypic adaptation (Toprak et al., 2012). For example, in a single 13-day experiment, we observe resistance levels in our evolving bacterial populations that approach the trimethoprim solubility limit in minimal (M9) media. We carried out evolutionary trajectories in four different media conditions, in which the concentration of exogenous thymidine was varied from none to an amount sufficient to rescue the knockout of TYMS (0, 5, 10, and 50 μg/ml thymidine). All conditions were also supplemented with amicase, a source of free amino acids. As shown in Fig. 5, these different environments can buffer genetic variation in the folate metabolic pathway to different extents. This offers a means to expose a larger range of adaptive mutations than one would observe under a single environment; in this context an absence of mutations outside of the two-gene module becomes more significant.
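The dilution rule described above can be summarized in a few lines of Python. This is a simplified sketch of the decision made at each dilution cycle, not the control software of the device: it assumes that drug-containing media is chosen whenever the culture is at or above the target optical density and is still growing, and that fresh media is used otherwise. The function name, the return labels, and the use of a positive net growth rate as the trigger are assumptions; the Fig. 4 legend gives the actual details.

```python
def choose_dilution(od, growth_rate, target_od=0.15):
    """Pick the medium for the next dilution cycle (simplified morbidostat rule).

    od          -- current optical density of the culture
    growth_rate -- net growth rate since the previous cycle (assumed trigger)
    target_od   -- target optical density (0.15 in the text)
    """
    if od >= target_od and growth_rate > 0:
        return "trimethoprim"  # culture is dense and still growing: raise drug level
    return "fresh"             # otherwise dilute with drug-free media
```

Under this rule the drug concentration rises only while the population keeps growing at the target density, so selective pressure tracks the current level of resistance.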
Over the 13 days of evolution, we estimate the trimethoprim resistance of each evolving population by computing the median drug concentration in the culture vial from the first dilution with drug to the end of the day. Following the median drug concentration, we see that populations supplemented with thymidine evolve trimethoprim resistance more rapidly (Fig. 4D and Fig. S3), suggesting that addition of thymidine to the media accelerates the acquisition of resistance, possibly by opening up new evolutionary paths. To identify the mutants causally related to trimethoprim resistance, we selected 10 single colonies from the endpoint of each of the four experimental conditions for phenotypic and genotypic characterization (40 strains in total, Fig. 5). For each strain, we measured the trimethoprim IC50, growth rate dependence on thymidine, and conducted whole genome sequencing. Consistent with the dynamic estimates of trimethoprim resistance, strains isolated from thymidine-supplemented conditions attained trimethoprim IC50s two orders of magnitude higher than their un-supplemented counterparts (Fig. 5A and Table S2). We were unable to measure IC50 values for strains 4, 5 and 10 from the 50 μg/ml thymidine condition: these three strains grew very slowly but were completely insensitive to trimethoprim. Further, strains from all three thymidine supplemented conditions now depend on exogenous thymidine for growth, indicating a loss of function in the thyA gene that encodes TYMS (Fig. 5B and Fig. S4). This loss of function is not a simple consequence of neutral genetic variation in the presence of thymidine; cells grown in 50 μg/ml thymidine in the absence of trimethoprim retain TYMS function over similar time scales (Fig. S5).
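The daily resistance estimate described at the start of this paragraph can be written as a short function. The sketch assumes that the drug concentration in the vial is recorded as one value per dilution cycle over the day, and that the first nonzero value marks the first dilution with drug; the function name and input layout are hypothetical.

```python
from statistics import median

def daily_resistance_estimate(drug_conc):
    """Median drug concentration from the first dilution with drug to the end of the day.

    drug_conc -- per-cycle drug concentrations for one day, in time order
                 (assumed layout). Returns 0.0 if no drug was added that day.
    """
    first = next((i for i, c in enumerate(drug_conc) if c > 0), None)
    if first is None:
        return 0.0
    return median(drug_conc[first:])
```

Using the median rather than the maximum makes the estimate less sensitive to brief excursions in drug concentration.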
Whole genome sequencing for all 40 strains reveals a striking pattern of mutation (Fig. 5C and Tables S3-S4). Consistent with a previous morbidostat-based study of trimethoprim resistance, under conditions of no thymidine we observe a mutation in the promoter of the folA gene that encodes DHFR, but no mutations in TYMS (Toprak et al., 2012). This mutation was previously shown to enhance trimethoprim resistance by increasing DHFR expression (Flensburg and Skold, 1987). In comparison, isolates from all three thymidine-supplemented conditions acquire coding-region mutations in both DHFR and TYMS, or even just in TYMS. For example, strains 4, 5, 7, and 10 in the 50 μg/ml thymidine condition contain mutations in TYMS but not DHFR – showing that one route to resistance is the acquisition of mutations in a gene not directly targeted by antibiotic. All mutations isolated in DHFR reproduce those observed in the earlier morbidostat study of trimethoprim resistance (Toprak et al., 2012). The mutations in TYMS – two insertion sequence elements, a frame shift mutation, loss of two codons, and a non-synonymous active site mutation - are consistent with loss of function. Thus, the mutations in DHFR and TYMS are consistent with the proposed mechanism of coupling: reduced TYMS activity can buffer inhibition of DHFR.
Consistent with the evolutionary independence of the DHFR/TYMS pair, we observe no other mutations in folate metabolism genes (Fig. 5C and Table S4). More generally, few other mutations occur elsewhere in the genome, and the majority of these are not systematically observed across clones. This result implies that they may be spurious variations not associated with the adaptive phenotype. One of the evolved strains contains only mutations in DHFR and TYMS (strain 1 in 50 μg/ml thymidine, Fig. 5C), indicating that variation in the DHFR/TYMS genes is sufficient to produce resistance. To establish this, we introduced several of the observed DHFR and TYMS mutations into a clean wild-type E. coli MG1655 background and measured the IC50. These data show that the DHFR/TYMS mutations are sufficient to reproduce the resistance phenotype measured for the evolved strains (Fig. S5). Thus, DHFR and TYMS show a capacity for adaptation through compensatory mutation that is contained within the two-gene unit. Consistent with the laboratory findings reported here, loss-of-function mutations in TYMS have been observed in a subset of trimethoprim-resistant gram-negative clinical isolates (including E. coli), indicating that resistance from modulation of the DHFR/TYMS gene pair is also relevant in a natural environment (King et al., 1983). From this, we conclude that DHFR and TYMS act as a quasi-independent adaptive module.
A global statistical analysis of modular synteny pairs in bacteria
Our focused study of the folate metabolic pathway shows that gene synteny can reveal functionally meaningful evolutionary modules within a cellular system. To examine the modular structure of the entire genome, we conducted a global analysis of pairwise synteny relationships amongst genes represented in E. coli. Following from previous work, we use clusters of orthologous groups of proteins (COGs) to define orthologs across species (Galperin et al., 2015). To ensure good statistics, we limit the COGs analyzed to those that co-occur in at least 100 effective genomes (2095 COGs, ∼500,000 pairs in total) (see also Supplemental Experimental Procedures). In Figure 6A, we show a scatterplot of gene pairs, indicating the strength of coupling within each pair (as a relative entropy, along the x-axis) versus the strongest coupling outside of the pair (along the y-axis). In this plot, points fall below the diagonal if the genes in the pair are more tightly coupled to each other than to any other gene in the dataset (see Table S5 for a list of pairs). One of these points (in red) corresponds to the DHFR/TYMS pair. Thus, these two enzymes are decoupled not only from the rest of folate metabolism, but also from all other genes in the genome-wide analysis.
These data reinforce observations made at the single pathway scale. Just as for the folate pathway (Fig. 1B), the pattern of coupling between genes at the genome scale is sparse, as demonstrated by the high density of points with weak coupling (on the left of the graph, along the y-axis). Analysis of the maximum coupling for each gene shows that 906 genes (43%) do not have significant coupling to any other gene in the genome (max(Dij) < 0.025), suggesting that many genes might behave as single gene modules (Fig. S6A). To understand the relationship between gene pairs coupled by synteny and functional or physical interaction of the associated gene products, we compared our analysis to metabolic annotations from KEGG (Kanehisa et al., 2012) and the set of high-confidence binding interactions in E. coli reported by the STRING database (Szklarczyk et al., 2015). As expected, coupled gene pairs show enrichment for physical complexes, enzymes in the same metabolic pathway, and more specifically, enzymes with a shared metabolite (Fig. 6A,C). But, as for the folate pathway, the vast majority of sequential reactions are not coupled. In general, the statistical analysis does not simply recover the local biochemical relationships in the metabolic pathway diagram. Instead, it identifies couplings between a subset of enzyme pairs.
A general definition for evolutionary modules depends on both strong internal coupling within a module and weak external coupling to other genes in the genome. Though it remains a matter for future work to experimentally test the relationship between both of these values and functional modularity, it is instructive to examine other gene pairs with patterns of evolutionary coupling similar to DHFR and TYMS. For illustrative purposes, we consider a simple definition of modular pairs based on empirical cutoffs (internal coupling > 1.0 and external coupling < 0.5) (dashed orange box in Fig. 6B). In this set, we observe enrichment for known functional and physical interactions beyond that for coupled gene pairs (Fig. 6C). Table S6 shows that this enrichment does not depend strongly on the choice of cutoff.
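The cutoff-based definition of modular pairs can be illustrated with a short Python sketch. It assumes that pairwise couplings are stored in a dictionary keyed by frozensets of two gene names, and it takes the external coupling of a pair to be the strongest coupling between either member and any gene outside the pair, following the description of the scatterplot axes. The data layout and function names are hypothetical; the default cutoffs are the illustrative values quoted above.

```python
def external_coupling(D, i, j):
    """Strongest coupling of gene i or gene j to any gene outside the pair.

    D maps frozenset({gene_a, gene_b}) -> coupling value (assumed layout).
    """
    vals = [d for pair, d in D.items() if len(pair & {i, j}) == 1]
    return max(vals, default=0.0)

def modular_pairs(D, internal_min=1.0, external_max=0.5):
    """Return gene pairs with internal coupling > internal_min and
    external coupling < external_max, as sorted (gene, gene) tuples."""
    out = []
    for pair, d_internal in D.items():
        i, j = sorted(pair)
        if d_internal > internal_min and external_coupling(D, i, j) < external_max:
            out.append((i, j))
    return sorted(out)
```

Scanning every pair against every other entry is quadratic in the number of pairs, which is acceptable for a toy example but would need per-gene indexing of maximum couplings to scale to a genome-wide table.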
The connection between synteny and co-expression (Junier and Rivoire, 2016; Korbel et al., 2004) leads to a natural interpretation of these evolutionary modules as groups of genes whose activity or expression is constrained relative to each other, but that are more independent from the rest of the pathway or system. The DHFR and TYMS pair is consistent with this interpretation – the cell can tolerate reductions in DHFR activity if they are accompanied by loss of function in TYMS. Study of other evolutionary modules from our analysis provides further support for this idea. For example, the gene pair accB/accC encodes two of the four subunits of acetyl-CoA carboxylase, the first enzymatic step in fatty acid biosynthesis. Overexpression of either accB or accC individually causes reductions in fatty acid biosynthesis, but overexpressing the two genes in stoichiometric amounts rescues this defect (Abdel-Hamid and Cronan, 2007; Janssen and Steinbuchel, 2014). Constraints on relative expression have also been noted for the selA/selB and tatB/tatC gene pairs (Bolhuis et al., 2001; Rengby et al., 2004).
Though the analysis presented here focuses on pairs, the concept of evolutionary modules extends to larger groups of genes. In this regard, we expect that some of the genes near the diagonal are in fact part of larger gene modules (e.g. the highly coupled ribosomal gene pair rpsC and rpmC, see also Table S5). Beyond a mere partition into independent modules, the evolutionary analysis in fact leads to a richer representation: a weighted network of synteny relationships. This network awaits further computational analysis and comprehensive testing, following from the approaches developed in this work.
Discussion and Conclusions
Metabolic constraints as an origin for co-evolution and modularity
Much prior work has demonstrated that physical protein interactions can drive coevolution, particularly via the acquisition of complementary interface mutations (Aakre et al., 2015; Hopf et al., 2014; Ovchinnikov et al., 2014; Podgornaia and Laub, 2015). Our analysis of the DHFR/TYMS pair demonstrates a different mechanism for coevolution: constraints on metabolite concentration can drive coordinated changes in enzyme activity. For the DHFR/TYMS pair, coupling appears to be driven by the need to constrain intracellular levels of the intermediate DHF. As a consequence, we see that treatment with trimethoprim experimentally can result in coordinated evolution of both genes. Additionally, recent work has shown that growth rate defects due to overexpression of E. coli DHFR can be partly rescued by increasing TYMS expression, consistent with a general constraint on the relative activities of these two genes (Bhattacharyya et al., 2016). Thus, coevolution is not limited to physical complexes, but more generally reflects the coupling of gene activities regardless of mechanism (Huynen et al., 2000; Snel et al., 2002).
While the mechanism of DHFR/TYMS coupling seems reasonably clear, how and why this pair is decoupled from the rest of metabolism is less obvious. Mathematical models of the folate cycle based on standard biochemical kinetics provide several useful insights (Leduc et al., 2007; Nijhout et al., 2004). First, in eukaryotic cells, thymidine synthesis is the rate-limiting step for DNA synthesis, and transcription of the TYMS and DHFR genes is greatly upregulated (via a common transcription factor) at the G1/S cell cycle transition (Bjarnason et al., 2001). Computationally increasing the activities of DHFR and TYMS 100-fold results in increased thymidine synthesis but only modestly changes the concentration of folate pools. Second, the bacterium R. capsulatus lacks both thyA (TYMS) and folA (DHFR) homologs, and instead produces thymidine via thyX, a thymidylate synthase that generates THF (rather than DHF) in the process of thymidine production. When thyX is deleted from R. capsulatus, growth can only be complemented by the addition of both thyA and folA from R. sphaeroides; the thyA gene alone is insufficient. Computational simulation shows that in the absence of a high-activity DHFR (folA), thyA rapidly depletes reduced folate pools by converting them to DHF. The results of these two computational studies are consistent with the idea that relative activities of DHFR and TYMS should be matched. Further, the results suggest that decoupling DHFR and TYMS from the remainder of folate metabolism provides a general strategy to maintain homeostasis independent of physiological or evolutionary variation in these two genes. That is, modularity might allow for adaptive variation in DHFR and TYMS activity while enabling robustness in the remainder of the pathway.
Using evolutionary statistics to decompose cellular systems
The central premise of this work is to use evolutionary statistics to infer couplings between genes, and identify near-independent adaptive modules. Prior work has largely focused on mapping functional couplings between genes in metabolic systems either computationally via flux balance analysis (Deutscher et al., 2006; He et al., 2010; Segre et al., 2005) or experimentally through high-throughput, quantitative assays of cell growth and epistasis (Babu et al., 2011; Collins et al., 2010; Costanzo et al., 2016; Typas et al., 2008). Though important, such studies cannot generally separate the species- or experiment-specific constraints between genes from the conserved constraints that represent the fundamental aspects of genome function. We propose that quantitative analysis of statistical relationships over an ensemble of diverse genomes can provide general models that serve to focus experimental study on the core processes of cellular systems.
Comparison of the genome-scale synteny analysis to existing large datasets (KEGG and STRING) provides encouraging validation of this approach – many of the coupled gene pairs identified by our analysis are consistent with known interactions, including physical complexes and consecutive reactions in metabolism. However, the data reported here shows that existing databases of metabolic structure, physical interactions or gene expression should not be seen as “gold standards” for validating and interpreting co-evolutionary data. Indeed, since coevolution can be driven by different mechanisms, the patterns of epistasis we deduce could extend beyond known physical or metabolic interactions to yield new principles of genome organization and function. Thus, a meaningful test of evolutionary statistical analyses requires new types of experiments that can test both the functional coupling of genes, and the independence of proposed multi-gene modules. Large-scale measurements of gene epistasis begin to address this, but in many cases, are limited to the extreme case of total gene knockout. The epistasis measurements for DHFR and TYMS illustrate how mutations across a range of perturbations to catalytic activity (and growth rate phenotype) can provide additional insight into the nature of gene interaction. The experimental methods developed here provide a clear technical framework for testing and guiding development of co-evolution based approaches.
The experimental data for DHFR and TYMS establish that statistical analysis of synteny across genomes has the capacity to identify functional modules in metabolism. However, other signals of co-evolution exist and should be considered. For example, correlations in gene presence (or absence) across bacterial species have been used to predict functional interactions (Pellegrini et al., 1999), and to identify modules of evolutionarily coupled genes (Kim and Price, 2011). In the case of the folate metabolic pathway, the pattern of coupling obtained by gene presence/absence echoes the modular decomposition observed by synteny (Fig. S6B,C). Again, we observe an overall sparse pattern of coupling, and the DHFR/TYMS gene pair forms an isolated evolutionary module. So, in this instance, the modularity of the DHFR/TYMS pair is identifiable by two distinct measures. More generally, further study is required to more carefully understand the relationship between different co-evolutionary signals, but it is possible that different measures may inform us about distinct aspects of the underlying biology. In summary, our results suggest the existence of a rich intermediate organizational layer between individual genes and complete pathways, consisting of multi-gene modules. This work establishes a viable path to decompose the genome into such functionally and evolutionarily meaningful gene groups using evolutionary information.
Author Contributions
Conceptualization, K.A.R., O.R., and I.J.; Methodology, K.A.R., O.R., I.J., and J.D.R.; Investigation, A.S., C.I., J.O.P., L.C., K.A.R., O.R., and I.J.; Writing – Original Draft, A.S. and K.A.R.; Writing – Review & Editing, A.S., C.I., J.O.P., J.D.R., O.R., I.J., and K.A.R.; Supervision, K.A.R.
Experimental Procedures
Statistical analysis of gene coevolution
Synteny analysis was conducted using a slightly modified version of the methods described in (Junier and Rivoire, 2013, 2016). See the Supplemental Experimental Procedures for a detailed description of the synteny and cooccurrence calculations.
Forward evolution of trimethoprim resistance in the morbidostat
The morbidostat/turbidostat apparatus was constructed as described by Toprak and colleagues (Toprak et al., 2013). The founder strain for the forward evolution experiment was E. coli MG1655 modified by phage transduction to encode green fluorescent protein (egfp) and chloramphenicol resistance (cat) at the P21 attachment site. The goal of this modification was to prevent and detect contamination with other strains. Throughout the forward evolution experiment, cells were grown at 30°C in M9 media supplemented with 0.4% glucose and 0.2% amicase (Sigma); 30 μg/ml of chloramphenicol (Cam) was added for positive selection.
To begin the experiment, the founder strain was cultured overnight at 37°C in Luria Broth (LB) + 30 μg/ml Cam. This culture was washed twice with M9, and back diluted into M9 + 30 μg/ml Cam supplemented with 0, 5, 10, or 50μg/ml thymidine (thy) for overnight adaptation in culture tubes at 30°C. The next day (henceforth referred to as day 0; day 1 is the end of the first day of adaptation), these overnight cultures were streaked onto LB agar plates: two colonies per condition were chosen for whole genome sequencing (WGS) in order to obtain an accurate sequence for the founder strain. The remainder of the overnight cultures was used to inoculate four morbidostat tubes containing M9 media with varying thymidine supplementation (0, 5, 10, and 50μg/ml thy). The starting optical density was approximately 0.005. Initial antibiotic concentrations were 0, 11.5 and 57.5 μg/ml trimethoprim for media stocks A, B, and C respectively. Each culture grew unperturbed until it surpassed an OD600 of 0.06, at which point it underwent periodic dilutions with fresh media. The dilution rate is given by the formula rdil = f · ln((V + ΔV)/V), where f is the dilution frequency, V = 15ml is the culture volume, and ΔV = 3ml is the volume added. We chose a dilution frequency f = 3 h⁻¹, to give rdil = 0.55 h⁻¹. Above OD600 = 0.15, these dilutions are used to introduce TMP into the culture (see also Fig. 3B). This allows controlled inhibition of DHFR activity in response to growth rate. Cycles of growth and dilution continued for a period of ∼22 hours, at which point the run was paused to make glycerol stocks, replenish media, and update TMP stock concentrations. Culture vials for the next day of evolution were filled with fresh media and inoculated using 300μl from the previous culture. Complete trajectories of OD600 and drug concentration are shown in Fig. S1. Endpoint cultures were streaked onto LB agar plates supplemented with 30 μg/ml of Cam and 50 μg/ml thymidine to obtain isolated colonies for whole genome sequencing.
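The quoted dilution rate can be checked numerically. The sketch below assumes the form r = f · ln((V + ΔV)/V), which reproduces the stated value of 0.55 for f = 3 h⁻¹, V = 15 ml, and ΔV = 3 ml; the function and variable names are illustrative and not taken from the morbidostat software.

```python
import math

def dilution_rate(f_per_hour, culture_volume_ml, added_volume_ml):
    """Effective dilution rate (1/h) for periodic dilution of a fixed-volume culture.

    Assumed form: r_dil = f * ln((V + dV) / V).
    """
    return f_per_hour * math.log(
        (culture_volume_ml + added_volume_ml) / culture_volume_ml
    )

# Parameters stated in the text: f = 3 per hour, V = 15 ml, dV = 3 ml.
r_dil = dilution_rate(3.0, 15.0, 3.0)
print(round(r_dil, 2))  # 0.55
```

A culture whose growth rate exceeds this value will increase in density despite the dilutions; this is the condition under which drug is introduced.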
Whole genome sequencing
Two isolates were selected from each adapted day 0 culture, and ten clonal isolates (colonies) were randomly selected from the endpoint of each evolution condition, totaling 48 strains. Isolation of genomic DNA was performed using the QIAamp DNA Mini Kit (Qiagen). The Nextera XT DNA Library Prep Kit (Illumina) was used to fragment and label each genome for paired-end sequencing using a v2 300-cycle MiSeq kit (Illumina). Average read length and coverage can be found in Table S3. Genome assembly and mutation prediction were performed using breseq (Deatherage and Barrick, 2014). The reference sequence was a modification of the E. coli MG1655 complete genome (accession no. NC_000913), edited to include the GFP marker and chloramphenicol resistance cassette in our founder strain. The modified reference sequence and all complete genome sequences from the beginning and endpoint of forward evolution are available in the NCBI BioProject database (accession number: PRJNA378892, see also Table S4).
Measurements of thymidine dependence
All strains were grown overnight in LB + 5μg/ml thy, with the exception of the strains evolved in the 50μg/ml thy, which were supplemented with 50μg/ml thy to ensure viability. Cultures were then washed twice in M9 media without thymidine, and inoculated at an OD600=0.005 in 96-well plates containing M9 media supplemented with 10-fold serial dilutions of thymidine, ranging from 0.005 μg/ml to 50 μg/ml (in singlicate). OD600 was monitored in a Victor X3 plate reader at 30°C over a period of 20 hours. Growth was quantified using the positive integral of OD600 over time. This measure captures mutational or drug-induced changes in the duration of lag phase as well as perturbations in growth rate (Toprak et al., 2012). For each strain, we identified a start-time (t0) at the end of lag-phase for the fully-rescued 50μg/ml thy condition. We chose each t0 computationally as the last point before monotonic growth above the limit of detection. The log(OD600) versus time curves for all conditions are then vertically shifted (‘background-subtracted’), such that the function value at this start-time is zero. This curve is then numerically integrated from t0 to t0+10 hours using the trapezoid method.
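The growth metric described above can be sketched in a few lines. This is an illustration under stated assumptions, not the analysis code used in the study: the start-time t0 is supplied by the caller, the natural logarithm is used, the baseline is taken at the first sample at or after t0, and "positive integral" is interpreted as clipping negative values to zero. The function name is hypothetical.

```python
import math

def growth_integral(times_h, od600, t0_h, window_h=10.0):
    """Trapezoid integral of baseline-shifted log(OD600) from t0 to t0 + window.

    Assumptions (not specified in the text): natural log, baseline taken at the
    first sample at or after t0, negative shifted values clipped to zero.
    """
    points = [
        (t, math.log(od))
        for t, od in zip(times_h, od600)
        if t0_h <= t <= t0_h + window_h
    ]
    if len(points) < 2:
        raise ValueError("need at least two samples inside the integration window")
    baseline = points[0][1]
    shifted = [(t, max(y - baseline, 0.0)) for t, y in points]
    area = 0.0
    for (t_a, y_a), (t_b, y_b) in zip(shifted, shifted[1:]):
        area += 0.5 * (y_a + y_b) * (t_b - t_a)  # trapezoid rule
    return area
```

For a culture growing exponentially at rate r from t0 onward, the shifted curve is the straight line r·(t − t0), so the 10-hour integral equals 50·r. A longer lag or a slower growth rate both reduce the value.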
Measurements of trimethoprim resistance (IC50)
All strains were grown overnight in LB + 5μg/ml thy, with the exception of the strains evolved in the 50μg/ml thy, which were supplemented with 50μg/ml thy to ensure viability. Each strain was then washed into media conditions corresponding to the strain’s forward evolution condition, and adapted for 4 hours at 30°C. The recovery cultures were used to inoculate 96-well plates containing M9 media sampling serial dilutions of TMP (in triplicate), with a starting OD600 = 0.005. OD600 was monitored using a Tecan Infinite M200 Pro microplate reader and Freedom Evo robot at 30°C over a period of at least 12 hours. The trimethoprim resistance of each strain was quantified by its absolute IC50, the drug concentration (μg/ml) at which growth is half-maximal. The relationship between growth and trimethoprim inhibition is modeled using the four parameter logistic function:
$$Y = d + \frac{a - d}{1 + (X/c)^{b}}$$
where Y is growth, X is TMP concentration, a is the asymptote for uninhibited growth, d is the limit for inhibited growth, c provides the concentration midway between a and d, and b captures sensitivity (Sebaugh, 2011). Growth was quantified using the positive integral of OD600 data over a 10h period of growth (see also the methods for measurement of thymidine dependence). For each strain, we identify a start-time (t0) at the end of lag-phase for the uninhibited 0μg/ml TMP condition. Growth versus TMP concentration was fit to the above model using MATLAB. IC50 was calculated as the concentration X* for which growth Y(X*) = a/2.
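The absolute IC50 follows in closed form once the four parameters are fit. The sketch below assumes the standard four-parameter logistic form Y = d + (a − d)/(1 + (X/c)^b), which matches the parameter definitions given above; the fitting itself was done in MATLAB and is not reproduced here. Function names are illustrative.

```python
def four_pl(x, a, b, c, d):
    """Four-parameter logistic: growth as a function of drug concentration x."""
    return d + (a - d) / (1.0 + (x / c) ** b)

def absolute_ic50(a, b, c, d):
    """Concentration X* at which four_pl(X*) equals a / 2.

    Solving a/2 = d + (a - d) / (1 + (X/c)**b) gives
    X* = c * (a / (a - 2*d)) ** (1/b), defined only when a > 2*d.
    """
    if a <= 2.0 * d:
        raise ValueError("growth never falls to a/2 (requires a > 2*d)")
    return c * (a / (a - 2.0 * d)) ** (1.0 / b)
```

When the inhibited limit d is zero, the absolute IC50 coincides with the midpoint parameter c. When d > 0, the absolute IC50 is larger than c, which is why the two quantities are distinguished.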
Growth without trimethoprim selection in 50μg/ml thymidine using the turbidostat
The founder strain for this experiment was identical to that used for evolution of trimethoprim resistance. Throughout the experiment, cells were grown at 30°C in M9 media supplemented with 0.4% glucose and 0.2% amicase (Sigma); 30 μg/ml of chloramphenicol (Cam) was added for positive selection. To begin the experiment, the founder strain was cultured overnight at 37°C in Luria Broth (LB) + 30 μg/ml Cam. This culture was washed twice with M9, and back diluted into M9 supplemented with 50μg/ml thymidine (thy) for overnight adaptation in culture tubes at 30°C. The next day (henceforth referred to as day 0; day 1 is the end of the first day of continuous culture), the overnight culture was used to inoculate three turbidostat tubes containing 17ml of M9 supplemented with 50μg/ml thy. The starting optical density was approximately 0.005. Each culture grew unperturbed until it reached an OD600 of 0.15, at which point it was diluted with 2.4 ml of fresh media. These cycles of growth and dilution persisted for a period of ∼22 hours, at which point the run was paused to make glycerol stocks and replenish media. Culture vials for each following day of evolution were filled with fresh media and inoculated using 300μl from the previous culture.
Epistasis Measurements
All relative growth rate measurements were performed in the E. coli folate auxotroph strain ER2566 ΔfolA ΔthyA (Lee et al., 2008). DHFR (folA) and TYMS (thyA) are provided on the plasmid pACYC-Duet1 (in MCS1 and MCS2, respectively) and are each under control of a T7 promoter. For these experiments, we use leaky expression (no IPTG induction). Each mutant plasmid (20 in total) is marked with a genetic barcode in a non-coding region between the two genes. Plasmids were transformed into the auxotroph strain, and each mutant was grown overnight in separate LB +30μg/ml Cam +50μg/ml thy cultures. Then, cultures were washed 2x in M9 media supplemented with 0.4% glucose and 0.2% amicase and 30μg/ml Cam, and adapted overnight at 30°C. All mutants were mixed in equal ratios based on OD600 and inoculated at a starting OD600 = 0.1 in the turbidostat. Growth rates were measured under two conditions: 5μg/ml thy and 50μg/ml thy, with three replicates each. The turbidostat clamps the culture to a fixed OD600 = 0.15 by adding fresh dilutions of media. Every 2 hours over the course of 12 hours a 1ml sample was removed, pelleted and frozen for next-generation sequencing. Amplicons containing the barcoded region with appropriate sequencing adaptors (350 basepairs in total size) were generated by two sequential rounds of PCR with Q5 polymerase. The barcoded region was sequenced with a single-end MiSeq run using a v2 50 cycle kit (Illumina). We obtained 14,348,937 reads. Data analysis was performed using a series of custom Python scripts to count barcodes, and MATLAB to fit relative growth rates.
Constructing DHFR/TYMS mutants in a clean genetic background
We followed the protocol for scarless genome integration using the modified λ-red system developed by Tas et al. (Tas et al., 2015). In this method, a tetracycline resistance cassette (“landing pad”) is first integrated at the site targeted for mutagenesis. Then, the landing pad is excised by the endonuclease I-SceI, and replaced with the desired mutation by λ-red mediated recombination. NiCl2 is used to counterselect against cells that retain the tetracycline cassette. Tas et al. provide a detailed protocol; here we give the specifics necessary for our experiments. For the λ-red machinery, we transformed the plasmid pTKRED (Genbank accession number GU327533) into electrocompetent E. coli MG1655 with a genomic egfp/cat resistance cassette (the forward evolution founder strain). For the Δ25-26 TYMS mutation, we introduced the tetA landing pad between genome positions 2,964,900 and 2,965,201 (genome NC_000913) corresponding to the N-terminus of the thyA gene. For the DHFR mutations (L28R, W30R, and P21L), the landing pad was recombined between genome positions 49,684 and 49,990 (genome NC_000913). In order to replace the Tet cassette, cells were induced with 2mM IPTG and 0.4% arabinose, and then transformed with 100ng of dsDNA PCR product containing the mutation of interest (with appropriate homology arms). The transformed culture was then grown out for 3 days at 30°C in rich defined media (RDM, Teknova) with glucose substituted for 0.5% v/v glycerol. The media was supplemented with 6 mM or 4mM NiCl2 for counterselection against tetA at the thyA locus or folA locus respectively. The outgrowth culture was streaked onto agar plates and screened daily for the mutant of interest using LB supplemented with 50 μg/ml thy, 30 μg/ml Spec, and +/-5-10 μg/ml Tet. All mutations were confirmed by Sanger sequencing of the complete folA and thyA open reading frame; for folA the promoter region was also sequenced.
LC-MS Metabolite Measurements
Cells were cultured in M9 0.2% glucose media containing 0.1% amicase, 50 μg/ml thy, and 30 μg/ml Cam at 30°C for metabolite analysis. In mid-log phase at OD600 ∼0.2, E. coli culture (3 ml for nucleotide measurement and 7 ml for folate measurement) was filtered on a nylon membrane (0.2 μm), and the residual medium was quickly washed away by filtering warm saline solution (200 mM NaCl at 30°C) over the membrane loaded with cells to exclude non-desirable extracellular metabolites from LC-MS analysis. The membrane was immediately transferred to a 6 cm Petri dish containing 1 ml cold extraction solvent (-20°C 40:40:20 methanol/acetonitrile/water; for folate stability, 2.5 mM sodium ascorbate and 25 mM ammonium acetate in folate extraction solvent (Lu et al., 2007)) to quench metabolism. After washing the membrane, the cell extract solution was transferred to a microcentrifuge tube and centrifuged at 13000 rcf for 10 min. The supernatant was transferred to a new microcentrifuge tube. Folate samples were prepared with an additional extraction: the pellet was resuspended in the cold extraction solvent and sonicated for 10 min in an ice bath. After the second extraction and centrifugation, the supernatant was combined with the initial supernatant. The metabolite extracts were dried under nitrogen flow and reconstituted in HPLC-grade water for LC-MS analysis. Metabolites were measured using stand-alone orbitrap mass spectrometers (ThermoFisher Exactive and Q-Exactive) operating in negative ion mode with reverse-phase liquid chromatography (Lu et al., 2010). Exactive chromatographic separation was achieved on a Synergi Hydro-RP column (100 mm×2 mm, 2.5 μm particle size, Phenomenex) with a flow rate of 200 μL/min. Solvent A was 97:3 H2O/MeOH with 10 mM tributylamine and 15 mM acetic acid; solvent B was methanol. The gradient was 0 min, 5% B; 5 min, 5% B; 7 min, 20% B; 17 min, 95% B; 20 min, 100% B; 24 min, 5% B; 30 min, 5% B.
Q-Exactive chromatographic separation was achieved on a Poroshell 120 Bonus-RP column (150×2.1 mm, 2.7 μm particle size, Agilent) with a flow rate of 200 μL/min. Solvent A was 10mM ammonium acetate + 0.1% acetic acid in 98:2 water:acetonitrile and solvent B was acetonitrile. The gradient was 0 min, 2% B; 4 min, 0% B; 6 min, 30% B; 11 min, 100% B; 15 min, 100% B; 16 min, 2% B; 20 min, 2% B. LC-MS data were analyzed using the MAVEN software package (Clasquin et al., 2012).
Synteny calculations
A. Starting dataset
Calculating synteny requires a collection of genomes where individual genes are assigned into orthology classes. The Clusters of Orthologous Groups of proteins (COGs) defined by Koonin and colleagues provide one well-established set of ortholog annotations (Galperin et al., 2015). The results presented here use all complete and COG-annotated bacterial genomes available in the NCBI database as of March 2015 (1445 genomes and 4764 COGs, this dataset is also used in Junier and Rivoire, 2016). A genome may contain more than one gene in the same COG, but for clarity, we start by presenting the calculations assuming that every orthology class maps to at most one gene in each genome.
B. Counting pairs in co-occurrence
Synteny is only relevant for the subset of genomes where both orthology classes are present. Thus, we begin by counting the number of genomes where orthology classes i and j co-occur. As previously published (Junier and Rivoire, 2016), we correct for the uneven phylogenetic distribution of sequenced genomes (strains) by introducing genome weights. To this end, we compute a distance between each pair of strains, based on the sequence similarity of a few conserved genes (δgh = 1 − Sgh, where Sgh is the average sequence similarity). The weight ws of strain s is then defined as 1/ns where ns is the number of strains within a given distance δ of s. Varying δ can provide information at different “phylogenetic depths” (Junier and Rivoire, 2013) but here we fix δ = 0.3, our results being generally invariant to this value.
The effective number of strains where orthology classes i and j co-occur is formally given by
$$M_{ij} = \sum_{s} w_s \, \mathbb{1}[i \cap s \neq \emptyset] \, \mathbb{1}[j \cap s \neq \emptyset] \tag{1}$$
where the sum is over the strains s and where 𝟙[X] is a generic indicator function with 𝟙[X] = 1 if and only if X is true. Hence, 𝟙[i ∩ s ≠ ∅] = 1 if i is represented in strain s and 0 otherwise.
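The weighting and counting steps can be illustrated as follows. This is a minimal sketch, assuming that each strain counts itself among its neighbours (so that ns ≥ 1) and that "within distance δ" means less than or equal to δ; the data layout (a square distance matrix and one set of COG identifiers per strain) and the function names are hypothetical.

```python
def strain_weights(distances, delta=0.3):
    """Weight of each strain: 1 / (number of strains within distance delta).

    distances is a square matrix (list of lists) with zeros on the diagonal,
    so every strain counts itself as a neighbour (an assumption of this sketch).
    """
    n = len(distances)
    weights = []
    for s in range(n):
        neighbours = sum(1 for t in range(n) if distances[s][t] <= delta)
        weights.append(1.0 / neighbours)
    return weights

def effective_cooccurrence(weights, cogs_per_strain, cog_i, cog_j):
    """Effective number of strains M_ij in which both orthology classes occur."""
    return sum(
        w
        for w, cogs in zip(weights, cogs_per_strain)
        if cog_i in cogs and cog_j in cogs
    )
```

Two closely related strains therefore contribute about as much as one isolated strain, which reduces the influence of heavily sequenced clades on the counts.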
C. Defining gene proximity
We measure the distance d(i,j) between the midpoints of two genes i and j in base pairs (and set d(i,j) = ∞ if they are on different chromosomes). Given a circular chromosome of length L, the greatest possible distance between genes is L/2 (on opposite sides of the circle). Thus, given a null model in which genes are randomly distributed along the chromosome, the probability of finding the gene pair within a genomic proximity d* is just the normalized value p* = d* / (L/2).
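As a small worked example of this null model, the sketch below computes the shorter arc between two gene midpoints on a circular chromosome and the corresponding probability d / (L/2). The function names and coordinates are illustrative.

```python
def circular_distance(mid_i, mid_j, chrom_len):
    """Shortest distance (bp) between two midpoints on a circular chromosome."""
    gap = abs(mid_i - mid_j) % chrom_len
    return min(gap, chrom_len - gap)

def proximity_probability(d, chrom_len):
    """Null-model probability that a randomly placed pair lies within d bp."""
    return d / (chrom_len / 2)

# Two genes near the origin of a 5 Mb chromosome, on either side of position 0.
d = circular_distance(100_000, 4_950_000, 5_000_000)
print(d)  # 150000
```

With d* = 50 kb on a 5 Mb chromosome, `proximity_probability` gives 0.02, the cutoff used in the later sections.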
D. Counting pairs in synteny
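The value p* provides a measure of significance for finding two genes at a distance d* in one genome. However, we are interested in the conservation of proximity across many species. To begin, we count the effective number of strains in which i and j are within a given distance d*:
$$X_{ij}(d^*) = \sum_{s} w_s \, \mathbb{1}[d_s(i,j) \le d^*] \tag{2}$$
where the sum is over the strains s in which i and j co-occur and ds(i,j) is the distance between the two genes in strain s.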
However, because p* = 2d*/L, the probability of finding two genes within distance d* depends on the chromosome length L, which varies between strains. In order for the probability of observing a positive event under the null model to be common for all strains, we instead consider the normalized distance 2ds(i,j)/Ls and compute:
$$X_{ij} = \sum_{s} w_s \, \mathbb{1}\!\left[\frac{2\, d_s(i,j)}{L_s} \le p^*\right] \tag{3}$$
For strains that contain multiple chromosomes, we take for Ls the sum of the lengths of its different chromosomes. This corresponds to a null model where the genes are randomly shuffled within and between chromosomes (or, up to boundary effects, to concatenating all the chromosomes into a single one). We take p* = 0.02, corresponding to d = 50 kb in the context of a chromosome of length 5 Mb. This cutoff is chosen to represent a length scale longer than those typical for gene coexpression and synteny, so that the choice of cutoff does not determine the results. Further, the results are robust with respect to the choice of p*.
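Finally, to account for the possibility that a single strain may contain multiple pairs of genes in two given orthology classes ij, we correct Eq. (3) by averaging over all these pairs:
$$X_{ij} = \sum_{s} \frac{w_s}{|i \cap s|\,|j \cap s|} \sum_{a \in i \cap s} \; \sum_{b \in j \cap s} \mathbb{1}\!\left[\frac{2\, d_s(a,b)}{L_s} \le p^*\right] \tag{4}$$
where the outer sum runs over the strains in which both classes are present, and where i ⋂ s is as before the set of genes in orthology class i and in strain s and |i ⋂ s| the size of this set. This formula is simpler than the one used in (Junier and Rivoire, 2016) but leads to similar results.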
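The paralog-averaged count can be sketched as below. This is an illustration under assumptions rather than the published implementation: each strain is a hypothetical dictionary with a weight, a map of chromosome lengths, and a map from orthology class to a list of (chromosome, midpoint) gene positions; the proximity test is "normalized distance ≤ p*"; and genes on different chromosomes are treated as infinitely far apart.

```python
import math

def pair_distance(gene_a, gene_b, chrom_lengths):
    """Circular distance between two (chromosome, midpoint) genes, inf across chromosomes."""
    chrom_a, mid_a = gene_a
    chrom_b, mid_b = gene_b
    if chrom_a != chrom_b:
        return math.inf
    length = chrom_lengths[chrom_a]
    gap = abs(mid_a - mid_b) % length
    return min(gap, length - gap)

def syntenic_count(strains, cog_i, cog_j, p_star=0.02):
    """Effective number of strains X_ij in which the two classes are proximal."""
    total = 0.0
    for strain in strains:
        genes_i = strain["genes"].get(cog_i, [])
        genes_j = strain["genes"].get(cog_j, [])
        if not genes_i or not genes_j:
            continue  # synteny is only defined where both classes are present
        genome_len = sum(strain["chrom_lengths"].values())  # L_s: summed chromosome lengths
        hits = sum(
            1
            for a in genes_i
            for b in genes_j
            if 2.0 * pair_distance(a, b, strain["chrom_lengths"]) / genome_len <= p_star
        )
        # Average over all gene pairs in the strain, then apply the strain weight.
        total += strain["weight"] * hits / (len(genes_i) * len(genes_j))
    return total
```

A strain with one proximal pair out of two possible pairs therefore contributes half of its weight to the count.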
E. Measuring significance
Now that we have counted the number of genomes in which i and j are proximal, we can assess the significance of this result. In a standard statistics “coin toss” problem, one computes the significance of obtaining X “tails” out of M “flips” (given a probability of tails p* = 0.5) using the binomial distribution. Here, we compute the significance of finding a pair of genes in proximity Xij times out of Mij genomes (given a probability of p* = 0.02) using the same approach:
$$\pi_{ij} = \sum_{k \ge X_{ij}}^{M_{ij}} \binom{M_{ij}}{k} (p^*)^{k} (1 - p^*)^{M_{ij} - k} = I(X_{ij},\, M_{ij} - X_{ij} + 1,\, p^*) \tag{5}$$
where I(a,b,x) is the regularized incomplete beta function.
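The binomial tail can be evaluated directly when the counts are integers. The sketch below makes that simplifying assumption (the effective counts in the text are weighted and generally non-integer, which is why the incomplete beta form is used there) and sums the terms in log space to avoid underflow. The function name is illustrative.

```python
import math

def binomial_tail(x, m, p):
    """P(K >= x) for K ~ Binomial(m, p), with integer x and m and 0 < p < 1.

    Integer counts are an assumption of this sketch; weighted (non-integer)
    counts require the regularized incomplete beta function instead.
    """
    if x <= 0:
        return 1.0
    total = 0.0
    for k in range(x, m + 1):
        log_term = (
            math.lgamma(m + 1) - math.lgamma(k + 1) - math.lgamma(m - k + 1)
            + k * math.log(p)
            + (m - k) * math.log1p(-p)
        )
        total += math.exp(log_term)
    return total
```

For the coin-toss case, `binomial_tail(2, 2, 0.5)` is 0.25. If SciPy is available, `scipy.special.betainc(x, m - x + 1, p)` evaluates the same tail and also accepts non-integer arguments.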
This relatively naive null model (which assumes a uniform distribution of genes along the chromosome, and treats weighted genomes as independent trials) provides a good description of the data for the majority of orthology class pairs - indicating that most gene pairs have no significant conservation of chromosomal proximity (Junier and Rivoire, 2016). A subset of pairs nevertheless deviate from the statistical expectations of the null model; these are the syntenic pairs of interest.
Finally, analysis of any large dataset inevitably leads to spurious false positives that simply occur by random chance. To account for this, we apply the Bonferroni principle: we set a threshold of significance of π* = 2/(N(N − 1)) ∼ 10⁻⁷, where N = 4764 is the number of orthology classes defined by COGs. That is, we choose a cutoff such that we should not find any significantly syntenic gene pair by chance among all ∼10⁷ possible gene pairs. This criterion is very stringent, and may be relaxed to set instead a false discovery rate (Junier and Rivoire, 2016).
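The threshold arithmetic can be verified in a few lines; the variable names are illustrative.

```python
n_cogs = 4764                          # number of orthology classes (COGs)
n_pairs = n_cogs * (n_cogs - 1) // 2   # number of distinct unordered pairs
pi_star = 1.0 / n_pairs                # equals 2 / (N (N - 1))

print(n_pairs)  # 11345466
```

The resulting threshold is just under 10⁻⁷, so fewer than one pair is expected to pass by chance across the roughly 10⁷ pairs tested.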
F. Degree of synteny
The p-values πij depend on the number of genomes in the dataset. It is more meaningful to define a measure of conservation that depends only on rescaled variables, here the frequencies fij = Xij/Mij. For these frequencies to be meaningful, we need, however, to restrict to cases where the number Mij of genomes where genes i and j co-occur is large. Here, we restrict to pairs of COGs with Mij ≥ 100. A degree of synteny is then given by the relative entropy:
$$D_{ij} = f_{ij} \ln\frac{f_{ij}}{p^*} + (1 - f_{ij}) \ln\frac{1 - f_{ij}}{1 - p^*} \tag{6}$$
In the limit of large Mij, e^(−Mij Dij) approximates the first term of the sum in Eq. (5) and therefore MijDij correlates with −ln πij. The maximal value of Dij is set by p*: as p* = 0.02 corresponds to −ln p* ≃ 4, the range of values for Dij is thus 0 to 4. Finally, since Mij ≥ 10² and π* = 10⁻⁷, any value of Dij larger than D* = −(ln 10⁻⁷)/10² ≃ 0.16 reports significant synteny.
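The degree of synteny can be computed directly from the two counts. The sketch below assumes the usual relative entropy (Kullback-Leibler divergence) between a Bernoulli distribution with parameter fij and one with parameter p*, with the convention 0 · ln 0 = 0; the function name is illustrative.

```python
import math

def degree_of_synteny(x_ij, m_ij, p_star=0.02):
    """Relative entropy between the observed proximity frequency and p_star."""
    f = x_ij / m_ij

    def term(q, r):
        # Convention: 0 * ln(0 / r) = 0.
        return 0.0 if q == 0 else q * math.log(q / r)

    return term(f, p_star) + term(1.0 - f, 1.0 - p_star)
```

A pair that is proximal in every genome where it co-occurs (f = 1) reaches the maximum −ln p*, about 3.9 for p* = 0.02, while a pair found in proximity exactly as often as the null model predicts (f = p*) scores zero.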
G. Application to E. coli
To analyze synteny relationships relevant to E. coli, we keep only the COGs i that are represented in its genome, and analyze COG pairs for which Mij ≥ 100 (2095 COGs in total). In Fig. 6B, we plot for each pair ij of these COGs their degree of synteny Dij (x-axis) against their maximal degree of synteny with any other COG, max k≠i,j (Dik, Djk) (y-axis). We define two-gene modules as all pairs where the within-pair coupling Dij > 1, and the maximum coupling outside of the pair max k≠i,j (Dik, Djk) < 0.5. In this figure, we use the STRING database to annotate physical interactions, taking a threshold of 700 and the largest score when multiple paralogs are present.
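The module criterion can be expressed as a short filter over a table of pairwise degrees of synteny. This is a sketch under assumptions: the table is a hypothetical dictionary keyed by `frozenset({i, j})`, pairs missing from the table are treated as having zero coupling, and the thresholds default to the values quoted above (within-pair coupling > 1, maximum outside coupling < 0.5).

```python
def two_gene_modules(synteny, inside=1.0, outside=0.5):
    """Return (pair, D_within, D_max_outside) for pairs that form isolated modules.

    synteny maps frozenset({i, j}) to the degree of synteny D_ij; pairs that are
    absent are treated as uncoupled (D = 0), an assumption of this sketch.
    """
    cogs = set().union(*synteny) if synteny else set()
    modules = []
    for pair, d_within in synteny.items():
        if d_within <= inside:
            continue
        # Strongest coupling of either member to any COG outside the pair.
        d_outside = max(
            (
                synteny.get(frozenset((member, other)), 0.0)
                for member in pair
                for other in cogs
                if other not in pair
            ),
            default=0.0,
        )
        if d_outside < outside:
            modules.append((tuple(sorted(pair)), d_within, d_outside))
    return modules
```

In the toy table `{A-B: 2.0, A-C: 0.1, C-D: 1.5, D-E: 0.9}`, only A-B qualifies: C-D is strongly coupled internally, but D is also coupled to E above the outside threshold.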
Acknowledgements
We thank members of the Reynolds lab for review of the manuscript, E. Toprak for extensive advice on morbidostat construction and operation, S. Benkovic for the ER2566 ΔfolA ΔthyA strain, T. Kuhlman for molecular biology reagents used in genome editing, T. Bergmiller for the GFP/Chloramphenicol resistance marker incorporated into our founder strain, and R. Ranganathan for discussions. This research was funded in part by the Gordon and Betty Moore Foundation’s Data-Driven Discovery Initiative through Grant GBMF4557 to K.R. and by the Green Center for Systems Biology at UT Southwestern Medical Center. A.S. was supported in part by NIH training grant 5T32GM8203-28. I.J. is supported by an ATIP-Avenir grant (Centre National de la Recherche Scientifique).