Abstract
The dynamics of genetic diversity in large clonally-evolving cell populations are poorly understood, despite having implications for the treatment of cancer and microbial infections. Here, we combine barcode lineage tracking, sequencing of adaptive clones, and mathematical modelling of mutational dynamics to understand diversity changes during experimental evolution. We find that, despite differences in beneficial mutational mechanisms and fitness effects between two environments, early adaptive genetic diversity increases predictably, driven by the expansion of many single-mutant lineages. However, a crash in diversity follows, caused by highly-fit double-mutants fed from exponentially growing single-mutants, a process closely related to the classic Luria-Delbruck experiment. The diversity crash is likely to be a general feature of clonal evolution; however, its timing and magnitude are stochastic and depend on the population size, the distribution of beneficial fitness effects, and patterns of epistasis.
In large clonally-evolving populations, lineages harboring beneficial “driver” mutations expand, compete with one another, and acquire further beneficial mutations, shaping genetic diversity [1–3]. Recent studies employing deep genomic sequencing have shown that large laboratory [4–6] and clinical [7–13] cell populations harbor high levels of genetic diversity that changes through time. In disease-relevant scenarios, such as cancer [7–11] and within-host microbial dynamics [13], the timescale over which diversity builds up is often short, such that dominant clones only accumulate a handful of driver mutations. When the supply of driver mutations is low, evolution is characterized by successive selective sweeps, wherein a single adaptive lineage periodically purges genetic diversity [14, 15]. However, when the supply of driver mutations is high, evolution is characterized by clonal interference, with multiple adaptive lineages expanding and competing through time [5, 16–19]. In the clonal interference regime, mutations often rise and fall together in cohorts [5, 11, 20]. However, due to the limited ability to detect low frequency mutations via genomic sequencing, it remains unclear what controls diversity changes through time in this regime, whether large purges of diversity might also occur as in the case of a selective sweep, and whether these diversity crashes are predictable across replicates and environments [21].
To address these questions, we introduced ~500,000 unique DNA barcodes into S. cerevisiae [22] and evolved populations of ~5 × 10^8 cells in triplicate for 304 generations under two well-mixed nutrient-limited environments: nitrogen-limitation (N-lim) and carbon-limitation (C-lim). Lineage abundances were tracked by barcode sequencing every ~8-24 generations. Lineages that harbor adaptive mutations were identified by finding those trajectories that deviate from a neutral expectation [22]. Tracking all adaptive lineages through time reveals the changing levels of adaptive lineage diversity (Figure 1). Initially, adaptive diversity expands, driven by thousands of independent mutations. This expansion is quantitatively different between environments; in N-lim, the expansion is slower and fewer lineages reach high frequencies. Later, however, similarities between environments emerge: a handful of lineages begin to dominate the population, causing a crash in adaptive lineage diversity.
We suspected that differences in lineage diversity dynamics between the two environments could be attributed, in part, to differences in the mutational Distribution of Fitness Effects (mDFE), defined as the distribution of mutation rates over fitness effects for single beneficial mutations arising on the ancestor. We have previously shown that high-resolution lineage tracking over short times can be used to infer the mDFE [22]. In C-lim, the mDFE results in ~10^4 beneficial mutations with fitness effects (s) above 3% entering the population over the first ~100 generations. This initially produces a quasi-deterministic expansion in diversity because the low fitness effect beneficial mutations that dominate early occur at a high rate. Later, the diversity expansion becomes more stochastic because the high fitness effect beneficial mutations that dominate during this time occur at a lower rate (i.e. the effective beneficial mutation rate decreases, SM2.1). To test whether these features might generalize to other environments we inferred the mDFE for N-lim (SM1). We find that the shape of the mDFE in N-lim is qualitatively different from that in C-lim (Figure 2). Most strikingly, the rates of mutation to higher fitness effects (s>5%) are ~3-fold lower in N-lim, and fall off rapidly, resulting in no detectable fitness effects above 8%. With time, the lineage dynamics become exponentially more sensitive to these differences at the higher fitness effects (expanded region Figure 2 and SM2). In C-lim, these mutations establish (escaping stochastic loss), expand, and compete over shorter timescales (Figure 1 A vs. C). In N-lim, the lower rate of mutation to higher fitness effects results in even more stochastic dynamics: the highly fit mutations occur in smaller numbers, causing larger variations between replicates (Figure 1 C vs. D, SM2.1).
To verify that single beneficial mutations are the determinants of the early diversity dynamics, we used barcode-directed whole-genome sequencing to identify hundreds of unique adaptive mutations that span a wide range of fitness effects. In C-lim, we previously found a near-comprehensive spectrum of adaptation-driving single-mutations by sequencing 418 adaptive clones isolated from generation 88 [23]. Here, we repeated this for N-lim, sequencing 291 adaptive clones picked from generation 192 and re-measuring fitness (SM3.2). In both environments, the majority of sequenced adaptive clones contain a single adaptive mutation (>75%), consistent with single-mutants being the determinants of early diversity dynamics. Lineages containing two adaptive mutations are crucial to the later-time diversity dynamics and we return to these below. Focusing on single-mutant clones, we find major differences in mutational mechanisms and the mutational targets of adaptation (Figure 2). Surprisingly, Ty transposition events play a major role in driving adaptation in N-lim but not C-lim. In both environments, adaptation is driven first by cells that undergo a frequent diploidization (Dip) event, and later by cells that acquire mutations in a small set of nutrient sensing pathway genes (Figure 2 and [16]). The majority of mutations in these recurrently mutated genes are putative loss-of-function (LoF) mutations, with a minority being putative gain-of-function (GoF) mutations (SM3.4). Additionally, there are a significant number of adaptive mutations in singly mutated genes that do not appear to function in nutrient sensing pathways, suggesting cells have many pathways to increase fitness in N-lim (Figure 2B).
In each evolution, we observed a lineage diversity crash whereby a handful of lineages outcompete all others (Figure 1). One possibility is that multiple adaptive clones within each lineage contribute to a lineage’s dominance. Alternatively, a single large clone within each lineage may be primarily responsible for this dominance. However, it is unclear how such large clones would arise. To investigate these questions, we simulated the diversity dynamics using the mDFE inferred in each environment, removing any lineages which were found to contain two adaptive mutations occurring in the same cell (double-mutants), and tracked both the lineage and clone diversity (SM4). To make comparisons of genetic diversity across a large number of simulations, we plot the Shannon entropy [24] of adaptive lineages and clones through time, which track one another closely until the time at which triple- and quadruple-mutant clones first expand within barcoded lineages (SM2.5). Simulations accounting only for the stochastic occurrence of, and competition between, single-mutants (single mutant model) predict that diversity should crash slowly, at odds with observations (Figure 3A). We therefore reasoned that the diversity crash is caused by the emergence of double-mutants. To test this, we modified our simulations to allow for multiple-mutants drawn from the single-mutant mDFE and whose fitness effects combine additively (additive model, Figure 3B). Additive model simulations produce diversity crashes that are caused by a handful of lineages that each contain a single dominant double-mutant clone (typically <5 lineages are >90% of the population at times beyond 150 generations, SM4).
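The diversity measure used in this comparison can be illustrated with a minimal sketch. The function below computes the Shannon entropy of a set of lineage (or clone) abundances; the use of the natural logarithm and the example abundances are assumptions made for illustration and are not taken from the simulations in SM4.

```python
import math


def shannon_entropy(abundances):
    """Shannon entropy (in nats) of lineage or clone abundances.

    The natural logarithm is an assumption; a different base only rescales the result.
    """
    total = sum(abundances)
    if total <= 0:
        raise ValueError("abundances must sum to a positive value")
    entropy = 0.0
    for count in abundances:
        if count > 0:  # a lineage with zero abundance contributes nothing
            p = count / total
            entropy -= p * math.log(p)
    return entropy


# Hypothetical abundances (not experimental data): many equally sized adaptive
# lineages before the crash, then three dominant lineages after it.
before_crash = [100] * 1000
after_crash = [45_000, 30_000, 20_000] + [5] * 1000

print(shannon_entropy(before_crash))
print(shannon_entropy(after_crash))
```

In this toy example the entropy falls once a few lineages hold most of the population, which is the signature of a diversity crash that the entropy plots are designed to capture.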
To understand why a handful of large double-mutant clones cause a diversity crash, we considered a model of the mutation acquisition process (SM2 and [17]). Single-mutants establish in large numbers because the product of the population size (N = 5 × 10^8) and the effective beneficial mutation rate (U_b ≳ 10^{−7}, see SM2.1) is large (N U_b > 50). However, since the probability of two mutations occurring concurrently is small (N U_b^2 ≪ 1), double-mutants must enter stochastically on the background of exponentially growing single-mutants. By considering when an expanding single-mutant clone will typically give rise to a double-mutant (following similar arguments presented in [15, 17]), we find that the distribution of sizes (n) of double-mutant clones is a power law ~1/n^{(2+Δ)/(1+Δ)}, where Δ is the ratio of the fitness effect of the second mutation to that of the first (Δ = s_2/s_1). If double-mutants are no fitter than single-mutants (Δ=0), this collapses to the classic Luria-Delbruck distribution of clone sizes, ~1/n^2 [25], whereby many double-mutant cells belong to a large number of small clones. However, when double-mutants are significantly fitter than single-mutants (Δ>1), most double-mutant cells belong to a handful of large clones (SM2). In this second case, a few lucky double-mutants with the earliest occurrence times go on to dominate the population, but stochasticity in their occurrence times results in variability in the timing and depth of the diversity crash between replicates (Figure 3F, SM4).
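To make the role of Δ concrete, the following sketch evaluates the power-law exponent (2+Δ)/(1+Δ) and estimates what fraction of double-mutant cells sit in large clones. It assumes, purely for illustration, a continuous clone-size density on the interval from 1 to a hypothetical maximum clone size n_max; the function names and cutoffs are not part of the model in SM2.

```python
import math


def clone_size_exponent(delta):
    """Exponent alpha of the clone-size power law ~1/n^alpha, alpha = (2+delta)/(1+delta)."""
    if delta < 0:
        raise ValueError("delta must be non-negative")
    return (2.0 + delta) / (1.0 + delta)


def cell_fraction_in_large_clones(delta, threshold, n_max):
    """Fraction of double-mutant cells in clones larger than `threshold`.

    Illustrative assumption: a continuous density ~1/n^alpha on [1, n_max].
    Cells are counted by weighting each clone by its size n, so the relevant
    integrand is n^(1 - alpha).
    """
    if not 1 <= threshold <= n_max or n_max <= 1:
        raise ValueError("require 1 <= threshold <= n_max and n_max > 1")
    alpha = clone_size_exponent(delta)
    if math.isclose(alpha, 2.0):
        # Luria-Delbruck case: every decade of clone size holds equal cell mass.
        return math.log(n_max / threshold) / math.log(n_max)
    power = 2.0 - alpha
    return (n_max**power - threshold**power) / (n_max**power - 1.0)


# Hypothetical cutoffs: clones larger than 10^3 cells, maximum clone size 10^6.
for delta in (0.0, 0.5, 2.0):
    print(delta, clone_size_exponent(delta),
          cell_fraction_in_large_clones(delta, 1e3, 1e6))
```

Under these assumptions, Δ=0 spreads the cells evenly across decades of clone size, whereas larger Δ concentrates almost all double-mutant cells in the largest clones.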
To validate that a handful of double-mutants cause the diversity crashes observed in our experiments, we examined the trajectories of lineages containing a confirmed double-mutant and verified that they do indeed go on to dominate (data in Figure 3D & E). Surprisingly, however, the sequencing of clones revealed that dominant double-mutants were not composed of two high fitness effect mutations (e.g. LoF+LoF) as would be predicted by our additive model simulations (SM4). Instead, dominant clones that were sequenced were Dip+GoF double-mutants (Dip+ras2 in C1; Dip+mep1 in N1), despite neither GoF mutation occurring at a high rate. We reasoned that this could be caused by epistasis: some classes of beneficial mutations combine non-additively and these interactions must play a crucial role in determining which mutations go on to dominate. To test this, we modified our additive model simulations (above) to ban second mutations that are implausible or unobserved (Dip+Dip, Dip+LoF, LoF+LoF, or LoF+GoF, SM4). Simulations using this “epistasis model” produce diversity crashes and lineage trajectories that are more consistent with observations (Figure 3C). Importantly, the epistasis model predicts that clones driving the diversity crash will usually be Dip+GoF, if the crash is deep, and LoF+Dip if the crash is shallow.
Lineage trajectories alone have limited power to distinguish between the additive and epistasis models (Figures 3B & C). To further test our models, we therefore asked if the dynamics of mutations, rather than lineages, are consistent with predictions of either model. We measured the abundance of diploids in the population every 8-24 generations using a colony-growth assay (SM6) and [23], not only for the 4 evolutions described above, but for two additional evolutions (1 in C-lim, 1 in N-lim) not characterized by lineage tracking (Figure 4B-E). Consistent with our observations, both models predict that replicate diploid trajectories will track each other closely, first, as large numbers of ancestral cells diploidize and expand, and second, as diploids begin to be out-competed by haploids that have acquired fitter LoF and GoF mutations. At later times, however, the models deviate. In C-lim, the additive model predicts that LoF+LoF or LoF+GoF double-mutants drive the continuing decline of diploids (Figure 4B). In N-lim, the additive model predicts that Dip+LoF mutations should expand fast enough that diploid trajectories never dip (Figure 4D). However, consistent with observations, the epistasis model predicts that the diploid trajectory will dip and subsequently recover, driven by LoF and GoF haploids that diploidize and by diploids that acquire GoF mutations (Figures 4C & E). Since this diploid recovery is driven by rare double-mutants, its timing and depth are predicted to be highly stochastic, resulting in large variations between replicates, in agreement with our data. In cases where diploids recover early (e.g. C1, N1 and N3), our model correctly predicts that the recovery is likely driven by Dip+GoF double-mutants, largely because this event has the highest Δ value (SM2).
We have shown that the genetic diversity in our experiments first increases quasi-deterministically, caused by a large number of single-mutants, and later crashes stochastically, caused by competition from a handful of highly-fit double-mutants that occur anomalously early. While the expansion rate and the timing and depth of the crash are influenced by the population size, the mDFE, and the pattern of epistasis between adaptive mutations, they are likely to be general features of clonal evolution. In environments such as our experiments where the mDFE lacks extremely high fitness effect mutations, mutations that cause the diversity expansion are being fed from a large, effectively-constant population, while mutations that cause the crash are being fed from a small, exponentially-growing population. In environments where extremely high fitness effect mutations are possible, such as in the presence of a growth-inhibiting drug [26], a diversity expansion and crash will still occur, but the crash will sometimes be driven by the expansion of a highly fit single-mutant (SM2.6 and SM2.7). That is, a diversity crash is likely to occur, whether driven by a well-described selective sweep of a single mutant [14, 15] or by a multiple mutant that occurs anomalously early. While we have focused on well-mixed populations, a qualitatively similar phenomenon has been described during the expansion of spatially structured populations [27]. More generally, our work highlights that while genetic diversity evolves in a highly stochastic way and depends on rare events, the statistics of these processes provide a means by which to forecast evolution [21].
While in our experiments the deterministic to stochastic transition occurs between first and second mutations, in small populations (N U_b < 1), first adaptive mutations are stochastic, and diversity crashes (of neutral mutations) will occur at nearly every adaptive event. In even larger populations (N U_b^2/s > 1 > N U_b^3/s^2), double-mutants will occur deterministically, but triple-mutants stochastically, and therefore the diversity crash will be caused by a few triple-mutants. More generally, for mDFEs that lack a small supply of relatively large fitness effects (whose dynamics can be characterized by a single “predominant fitness class” [17]), the diversity crash will be driven by clones harbouring k beneficial mutations, where k is the smallest integer for which N s (U_b/s)^k < 1, a parameter which also controls the steady-state range of fitnesses present in an adapting population [17]. Previous work has found that “beneficial cohorts”, multiple beneficial mutations co-occurring in clones that are at frequencies below the detection limit of whole-genome or whole-exome sequencing, commonly drive laboratory [5, 20] and clinical [11] clonal evolution. Our results suggest that, at least during the early stages of clonal evolution, these cohorts are expected, with cohort size being determined by k (SM2). Furthermore, theoretical work predicts that beneficial cohorts and fluctuations in genetic diversity should occur throughout evolution, driven by the stochastic occurrence of anomalously early and/or fit multiple-mutants [28, 29]. Experimental validation of these predictions, however, will likely require the development of double-barcoding technologies [30] or barcodes that continuously generate diversity through time [31–33].
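The criterion for k can be checked numerically with a short sketch. The function name is hypothetical, and the fitness effect s = 0.1 is an assumed illustrative value rather than a fitted one; N = 5 × 10^8 and a beneficial mutation rate of 10^{−7} follow the values quoted earlier in the text.

```python
def crash_driving_mutation_count(N, U_b, s, k_max=50):
    """Smallest integer k for which N * s * (U_b / s)**k < 1.

    N    population size
    U_b  effective beneficial mutation rate
    s    typical fitness effect (a single value is an illustrative simplification)
    """
    if not 0 < U_b < s:
        raise ValueError("requires 0 < U_b < s so that the supply shrinks with k")
    for k in range(1, k_max + 1):
        if N * s * (U_b / s) ** k < 1:
            return k
    raise ValueError("no k <= k_max satisfies the criterion")


# s = 0.1 is an assumed value for illustration only.
print(crash_driving_mutation_count(N=5e8, U_b=1e-7, s=0.1))   # experimental regime
print(crash_driving_mutation_count(N=1e6, U_b=1e-7, s=0.1))   # small population
print(crash_driving_mutation_count(N=1e14, U_b=1e-7, s=0.1))  # very large population
```

With these assumed values, the three calls correspond to the three regimes described above: crashes driven by double-mutants, by first adaptive mutations, and by triple-mutants, respectively.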
Methods
Experimental evolutions
Details of strain construction, media and equipment used in the evolutions can be found in the supplementary information of Levy et al. [14].
Barcode sequencing
Details of the DNA preps, PCR reactions and sequencing protocols can be found in the supplementary information of Levy et al. [14] for the Carbon condition. Further relevant details for the Nitrogen condition can be found in Sections 1.1 and 1.2 of the supplementary information accompanying this manuscript.
Identifying adaptive lineages
Adaptive lineages were identified using the same methods and code as outlined in the supplementary information of Levy et al. [14].
Isolation and sequencing of adaptive clones
All clones, including those from the Nitrogen conditions, were isolated and sequenced following the protocol outlined in detail in the “Star methods” section of Venkataram et al. [16]. For further details of the sequencing of the Nitrogen clones, we also refer the reader to Sections 3.1 and 3.2 of the supplementary information accompanying this manuscript.
Simulated lineage dynamics
Details of the simulations can be found in Sections 2 and 4 of the supplementary information accompanying this manuscript.
Acknowledgements
We wish to thank all members of the Levy, Sherlock and Fisher labs for useful discussions and comments. J.R.B. is supported by NSF PHY-1545840, Stand Up 2 Cancer and by the Louis and Beatrice Laufer Center; D.S.F. by NSF PHY-1305433, NSF PHY-1545840, Stand Up 2 Cancer and NIH R01 HG003328; G.S. by NIH grants R01 HG003328 and GM110275; and S.F.L. by NIH grant R01 HG008354 and by the Louis and Beatrice Laufer Center. All data are available on request.