Abstract
The contribution of pre-existing and de novo genetic variation towards clonal adaptation is poorly understood, but essential to design successful antibiotic or cancer therapies. To address this, we evolved genetically diverse populations of budding yeast, S. cerevisiae, consisting of ∼10^7 diploid cells with unique haplotype combinations. We studied the asexual evolution of these populations under selective inhibition of DNA replication and cell metabolism by time-resolved whole-genome sequencing and phenotyping. All populations underwent clonal expansions driven by de novo mutations, but remained genetically and phenotypically diverse. Despite the genetic diversity of the founder cells, we observed recurrent adaptive mutations. However, the founding fitness variance limited the scope for adaptive mutations to expand. The clones exhibited continued evolution by widespread genome instability, rendering recessive de novo mutations homozygous and refining pre-existing variation. Our results show that three intertwined processes dominate the adaptive response: exploiting genetic backgrounds, de novo mutations and genome instability.
I. INTRODUCTION
The adaptive response of a cell population can thwart therapeutic control of a wide spectrum of diseases, from bacterial and viral infections to cancer. A prototypical scenario arises when individuals in a population acquire heritable genetic or non-genetic changes to adapt and thrive in a new environment (Toprak et al., 2012; Marusyk et al., 2014; Balaban et al., 2004). Since the seminal findings by Luria and Delbrück that phage-resistant bacteria can acquire adaptive mutations prior to selection (Luria and Delbrück, 1943), measuring the fitness effects and dynamics of mutations has been key to map the principles of evolutionary adaptation (Barrick and Lenski, 2013). The focus has typically been on characterizing few mutations at a time under the implicit assumption that beneficial mutations are rare, treating pre-existing and acquired mutations separately. However, many mutations are often simultaneously present in a population, which result in fitness differences between individuals that selection can act upon (Parts et al., 2011; Lang et al., 2013; Levy et al., 2015). Recent findings indicate that genetic heterogeneity can thus play a major role in the development of resistant bacterial infections (Young et al., 2012; Lieberman et al., 2014) and in cancer recurrence (Gerlinger et al., 2012; Landau et al., 2013; Eirew et al., 2014). Given that mutations in asexual populations are physically linked in the genome, their fates are mutually dependent and selection can only act on these sets of variants in their entirety (Lang et al., 2011). As a result, genetic diversity can change the evolutionary fate of new adaptive mutations by limiting the number of backgrounds where they can still outcompete the fittest extant individuals. To what extent then can the adaptive response be attributed to genetic variation already present and how much to acquired? How do the aggregate effects of mutations influence adaptive trajectories?
To address these questions, we investigated the interaction between pre-existing (or background) genetic variation and new mutations in a population of diploid cells with unique combinations of alleles. The cells originate from two diverged S. cerevisiae strains (Fig. 1A). We carried out 12 rounds of random mating and sporulation (meiosis) between DBVPG6044, a West African palm wine strain (WA), and YPS128, a North American oak tree bark strain (NA) (Parts et al., 2011). The cross population (WAxNA) consisted of 10^7-10^8 unique haplotypes, with a pre-existing variant segregating every 230 bp on average. We further identified 82 SNPs acquired de novo during the crossing phase from genome sequences of 173 founder individuals, which is consistent with a mutation rate of approximately 3 × 10^−10 SNPs per site per cell division, close to empirical estimates in other yeast strains (Zhu et al., 2014). Starting from WA, NA and WAxNA founders, we asexually evolved populations in serial batch culture with inhibitors of DNA replication (hydroxyurea) and of cellular growth and metabolism (rapamycin) at concentrations impeding, but not ending, cell proliferation. These drugs were chosen as generic instances of strong selective pressures that act on fundamental growth-related pathways with known targets. We derived replicate lines of WA, NA (2 each in hydroxyurea and rapamycin) and WAxNA (6 in hydroxyurea, 8 in rapamycin and 4 in a control environment). We monitored evolutionary changes by whole-genome sequencing of populations after 2, 4, 8, 16 and 32 days, as well as clonal isolates at 0 and 32 days (Table S1). Finally, we measured growth phenotypes at the initial and final time point for a subset of populations, and quantified the relative fitness contributions of background and de novo variation using a genetic cross.
II. RESULTS
Two regimes of selection became readily apparent in both sequence and phenotype. Initially, there were local changes in the frequency of parental alleles under selection (Fig. 1B). Over time, macroscopic subclonal populations arose and expanded, depleting the pool of genetic diversity. These successful genotypes persisted in time, manifested by broad jumps in the allele frequency visible across the genome (Fig. 1B and Fig. S2). But what drives these expansions: is it the founder haplotypes themselves, de novo mutations relegating the parental variation to the role of passengers, or their combined action?
Selective effects on background variation
To determine the adaptive value of background variation, we identified regions where local allele frequencies changed over the time course of the selection experiments. Frequency differences over time indicate that selection is acting on background alleles, and linked passenger mutations also change in frequency by genetic hitchhiking (Illingworth et al., 2012). We performed a systematic scan for background variants under selection using data up to 4 days, when no population yet had detectable subclones which would distort this signal (‘Materials and methods’). A region of interest was found in chromosome VIII (coordinates ∼460-490 kb) in all WAxNA populations under rapamycin (Fig. 1C). We evaluated two candidate genes in this region by reciprocal hemizygosity, validating that the CTF8 NA allele increases rapamycin resistance (Fig. S10). CTF8 harbors two background missense variants and has previously been implicated in sensitivity to rapamycin, although the mechanism remains unknown (Parsons et al., 2004). Carrying the CTF8 NA allele confers a 36% growth rate advantage. KOG1, which falls within the same region and is a subunit of the TOR complex 1 (TORC1), differs by seven missense mutations between the parents. However, reciprocal hemizygous deletions only revealed a modest fitness difference between WA and NA sequences of KOG1. We did not find events that replicated across all populations in hydroxyurea.
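As a rough illustration of such a scan (not the model described in ‘Materials and methods’), the sketch below flags genomic windows in which the mean frequency of a parental allele shifts between two time points. The function name, the 10 kb window and the 0.1 threshold are hypothetical choices, and the frequencies are toy values.

```python
from statistics import mean

def scan_frequency_shift(positions, freq_t0, freq_t1, window=10_000, threshold=0.1):
    """Return (window_start, mean_shift) for every window whose mean
    allele-frequency change between two time points exceeds `threshold`."""
    shifts_by_window = {}
    for pos, f0, f1 in zip(positions, freq_t0, freq_t1):
        start = (pos // window) * window          # bin the marker by position
        shifts_by_window.setdefault(start, []).append(f1 - f0)
    hits = []
    for start in sorted(shifts_by_window):
        shift = mean(shifts_by_window[start])
        if abs(shift) > threshold:
            hits.append((start, shift))
    return hits

# Toy data: markers on one chromosome, frequency of one parental allele
# at an early and a later time point (illustrative values only).
positions = [455_000, 462_000, 468_000, 475_000, 481_000, 520_000]
freq_t0 = [0.50, 0.50, 0.50, 0.50, 0.50, 0.50]
freq_t1 = [0.52, 0.70, 0.72, 0.71, 0.69, 0.49]

hits = scan_frequency_shift(positions, freq_t0, freq_t1)
for start, shift in hits:
    print(f"window {start}-{start + 10_000}: mean shift {shift:+.2f}")
```

Averaging over linked markers in a window exploits hitchhiking: neighbouring passengers move together with the selected allele, so a consistent shift across a window is more informative than a shift at a single site.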
Pervasive selection of macroscopic subclones
To reconstruct clonal expansions in the WAxNA populations we used background genetic variants as markers. Using the cloneHD algorithm (Fischer et al., 2014), we inferred the subclonal genotypes and their frequency in the populations, both of which are unknown a priori (‘Materials and methods’). We found at least one subclone in all populations under selection, but none in the control environment (Fig. 2 and Fig. S4). No population became fully clonal during the experiment, with subclone frequencies stabilizing after 16 days in several rapamycin populations. Similarly, WA and NA populations under selection underwent adaptation as evidenced by de novo mutation frequencies (Fig. S3).
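The idea of using background variants as markers can be illustrated with a deliberately simplified model. The sketch below is not cloneHD: it assumes a single subclone whose genotype is already known, and models the observed population allele frequency at each marker as a mixture x = (1 − f)·b + f·g, where b is the bulk frequency, g the subclone allele fraction (0, 0.5 or 1) and f the subclone frequency, which is then estimated by least squares. All names and values are hypothetical.

```python
def estimate_subclone_fraction(observed, bulk, genotype):
    """Least-squares estimate of the subclone frequency f in the mixture
    x_i = (1 - f) * b_i + f * g_i, clipped to the interval [0, 1]."""
    num = sum((x - b) * (g - b) for x, b, g in zip(observed, bulk, genotype))
    den = sum((g - b) ** 2 for b, g in zip(bulk, genotype))
    if den == 0:
        raise ValueError("subclone genotype is indistinguishable from the bulk")
    return min(1.0, max(0.0, num / den))

# Toy example: four markers, a subclone at 30% frequency.
bulk = [0.5, 0.4, 0.6, 0.5]          # allele frequency in the bulk population
genotype = [1.0, 0.5, 0.0, 0.5]      # allele fraction in the diploid subclone
true_fraction = 0.3
observed = [(1 - true_fraction) * b + true_fraction * g
            for b, g in zip(bulk, genotype)]

estimate = estimate_subclone_fraction(observed, bulk, genotype)
print(f"estimated subclone frequency: {estimate:.2f}")
```

Markers where the subclone genotype equals the bulk frequency carry no information about f, which is why a large number of background variants is useful for this kind of reconstruction.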
To genetically characterize subclones, we isolated and sequenced 44 clones drawn from WAxNA populations after the selection phase (‘Materials and methods’; Fig. S6). From population and isolate sequence data, we observed 19 recurrent de novo mutations in the hydroxyurea targets RNR2 and RNR4 during hydroxyurea selection and in the rapamycin targets FPR1 and TOR1 during rapamycin selection (Table S2). Each of these driver mutations had a drug resistant growth rate phenotype (Figs. S9 and S10) and carried a private background combination of ∼31,000 passenger SNPs on average, compared to other sequenced isolates. FPR1 mutations were always homozygous and likely to inactivate the gene or inhibit its expression. In contrast, TOR1 mutations were heterozygous while we found RNR2 and RNR4 mutations in both heterozygous and homozygous constellations. All driver mutations occurred in highly conserved functional domains. The variant allele fractions of these mutations mirrored the inferred subclonal dynamics (Fig. 2A and C, and Figs. S3 and S4).
Clonal expansions were also evident by changes in the fitness distribution of cells. We established this by phenotyping 96 randomly isolated individuals from 3 populations per environment at 0 and 32 days, as well as the 44 sequenced individuals at 32 days (Fig. 2B and D). In rapamycin selection, the phenotype distribution became bimodal after 32 days, reflecting the fitness of subclones substantially improving with respect to the mean fitness of the bulk population. The clonal subpopulation divided on average twice as fast as the ancestral population. Sequenced isolates with driver mutations in FPR1 and TOR1 were on the leading edge of the phenotype distribution, far ahead of the bulk. Furthermore, the bulk component showed a 10% average improvement, possibly due to selection of beneficial genetic backgrounds. Conversely, bimodality was only detected in one population in hydroxyurea selection (WAxNA F12 1 HU 3), where the clonal peak grew 22% faster and the bulk 16% faster on average compared to the ancestral. Isolates with RNR2 driver mutations fell onto the leading edge of the fitness distribution. These six isolates originated from the same expanding subclone and two of them had 13% faster growth rate than the remaining four, although they all shared the same heterozygous RNR2 driver mutation. In both of these clones, we found a large region in chromosome II to have undergone loss of heterozygosity (LOH), offering a putative genetic cause for their growth advantage (Fig. 3C).
Diversification and genome instability
Surprisingly, we found several of the driver mutations to exist in homozygous rather than heterozygous states (Table S2). We hypothesized that genome instability, causing widespread LOH, can significantly contribute to adaptation in diploid populations. To detect mechanisms of genome instability, we used heterozygous genetic variants as markers. Firstly, we used the sequences of haploid individuals from the ancestral population, drawn before the last round of crossing, to create in silico diploid genomes and calculate the length distribution of homozygous segments. Similarly, we measured the length distribution of homozygous segments from evolved isolate genomes. We observed a significant increase of long homozygosity tracts in the evolved clones – a hallmark of LOH (Fig. 3A). Secondly, we directly counted LOH events in populations using multiple sequenced isolates from the same expanding subclone (‘Materials and methods’). We identified a minimum of 6 events per genome per clone (Fig. S6). Whilst this estimate is a lower bound and is limited due to the number of sequenced individuals per subclone, the LOH rates are substantial. An alternative route to homozygosity was observed in a single clone found to be haploid (clone C1 in WAxNA F12 1 RM 1) and therefore homozygous genome-wide. This haploid clone is closely related to a diploid clone (C3) from the same population and both clones share the same FPR1 W66* de novo mutation (Fig. S6B). These data are consistent with the appearance of the FPR1 heterozygous mutation in an ancestral diploid clone that took two independent routes – focal LOH or meiosis – to unveil the recessive driver mutation. Altogether, we find that genome instability can render de novo mutations homozygous as a necessary event in a multi-hit process towards drug resistance.
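The length distribution of homozygous segments can be computed from ordered marker genotypes along a chromosome. The following is a minimal sketch under stated assumptions: each marker is reduced to a homozygous/heterozygous flag, and a tract is measured from its first to its last homozygous marker, which gives a lower bound on its true extent. The function name and toy coordinates are hypothetical.

```python
def homozygous_tract_lengths(positions, is_homozygous):
    """Lengths (bp) of consecutive runs of homozygous markers along one
    chromosome, measured from the first to the last marker of each run."""
    tracts = []
    start = last = None
    for pos, hom in zip(positions, is_homozygous):
        if hom:
            if start is None:
                start = pos              # open a new tract
            last = pos
        elif start is not None:
            tracts.append(last - start)  # a heterozygous marker closes the tract
            start = None
    if start is not None:                # tract running to the chromosome end
        tracts.append(last - start)
    return tracts

# Toy chromosome: markers sorted by position; True marks a homozygous site.
positions = [100, 300, 500, 700, 900, 1100, 1300]
flags = [False, True, True, True, False, True, False]

tracts = homozygous_tract_lengths(positions, flags)
print(tracts)
```

Comparing the distribution of such lengths between in silico ancestral diploids and evolved isolates is what reveals an excess of long tracts.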
The stress environments themselves have an active role in accelerating genome evolution by chromosomal instability. Using a fluctuation assay, we investigated the effect of the genetic background and of the selective environment on chromosomal instability by tracking the loss of the URA3 marker. Consistent with previous studies (Barbera and Petes, 2006), replication stress induced by hydroxyurea caused an increase in LOH rates. We also observed a background-dependent increase in LOH in rapamycin (Fig. 3B).
Fitness effects of background and de novo variation
Finally, we sought to partition and quantify the individual fitness contributions of pre-existing and de novo genetic variation. To this end, we designed a genetic cross where background and de novo variants were re-shuffled to create new combinations (Fig. 4A). We randomly isolated diploids from both ancestral and evolved populations, sporulated these and determined whether the derived haploids contained wild-type or mutated RNR2, RNR4, FPR1 and TOR1 alleles. We then crossed haploids to create a large array of diploid hybrids where all genotypes (+/+, +/-, -/-) for each of these genes existed in an ensemble of backgrounds, thus recreating a large fraction of the genotype space conditioned on the presence or absence of driver mutations. We measured the growth rates of both haploid spores and diploid hybrids, estimating and partitioning the variation in growth rates contributed by the background genotype and de novo genotypes using a linear mixed model (‘Materials and methods’). After conditioning for RNR2, RNR4, FPR1 and TOR1 driver mutation status, a large fraction of the phenotypic variance still remained, reflecting the effect of the genetic backgrounds in which they emerged (Fig. 4C and D). In fact, under hydroxyurea exposure, background genetic variation accounted for an estimated 51% of the growth rate variance, more than twice the estimated 23% contributed by RNR2 and RNR4 de novo mutations. This result suggests that moderate-effect de novo mutations must arise on favorable genetic backgrounds to give rise to macroscopic subclones. In contrast, under rapamycin exposure, the pre-existing genetic variation accounted for only 22% of the variance, much less than the 70% attributed to FPR1 and TOR1 mutations. Such large-effect mutations can expand in a vast majority of backgrounds, explaining how they can almost entirely determine the motion of the fitness distribution (Fig. 2D).
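The logic of partitioning growth rate variance can be illustrated with the law of total variance. The sketch below is a simplified stand-in for the linear mixed model, not the model itself: it splits the total variance into a component between driver genotypes and a component within driver genotypes, where the latter lumps together background effects and measurement noise. The function name and the toy growth rates are hypothetical.

```python
from collections import defaultdict
from statistics import mean, pvariance

def partition_variance(growth_rates, driver_genotypes):
    """Fractions of growth-rate variance between and within driver genotypes,
    using Var(y) = Var(E[y | genotype]) + E[Var(y | genotype)]."""
    groups = defaultdict(list)
    for y, g in zip(growth_rates, driver_genotypes):
        groups[g].append(y)
    n = len(growth_rates)
    grand_mean = mean(growth_rates)
    between = sum(len(ys) * (mean(ys) - grand_mean) ** 2
                  for ys in groups.values()) / n
    within = sum(len(ys) * pvariance(ys) for ys in groups.values()) / n
    total = pvariance(growth_rates)
    return between / total, within / total

# Toy hybrids: three backgrounds without and three with a dominant driver.
growth = [1.0, 1.2, 0.8, 1.5, 1.7, 1.3]
genotype = ["+/+", "+/+", "+/+", "+/-", "+/-", "+/-"]

driver_frac, residual_frac = partition_variance(growth, genotype)
print(f"between driver genotypes: {driver_frac:.0%}, within: {residual_frac:.0%}")
```

A mixed model additionally separates the within-genotype component into a heritable background term and residual noise, which requires replicate measurements or the relatedness structure of the cross.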
The ensemble average over backgrounds showed that the mean effect of RNR2, RNR4 and TOR1 mutations was fully dominant and highly penetrant regardless of the background (Fig. 4E and F). In contrast, FPR1 mutants were recessive and only increased growth rate when homozygous, again irrespective of the background (Fig. 4F). Taken together, these data are consistent with the aggregation of small-effect, pre-existing variants that add up and substantially contribute to fitness in both selection environments.
III. DISCUSSION
Populations adapt to external challenges by exploiting pre-existing variation and discovering de novo solutions, but how these variants jointly influence evolution is not well understood. The rate of adaptation depends on multiple factors such as population size, mutation rate and ploidy (Barrick and Lenski, 2013; Selmecki et al., 2015; zur Wiesch et al., 2011). Our results show that sufficiently large populations could readily find beneficial de novo mutations, but their adaptive trajectories were simultaneously shaped by pre-existing and de novo variation with overlapping timescales. Our findings are relevant to understand large asexual populations with extensive pre-existing variation, such as pathogens and cancer, which easily reach sizes of billions of cells. Despite the large initial genetic heterogeneity, the same driver mutations recurred in multiple populations, indicating convergent evolution towards a restricted number of molecular targets. This is a key aspect to be able to predict the outcome of selection, and thus to what extent it could inform the design of second line therapies in the treatment of bacterial and viral infections or cancer. Notably, the substantial genotypic and phenotypic diversity that remained after selection could well be a potent substrate for populations to escape from the application of second line therapies. Clearly, whether or not these results hold more generally needs to be studied across systems.
Our results show that, not only was there a constant supply of new beneficial mutations, but selection concomitantly acted on pre-existing variation through its combined effects on fitness. New mutations entered the population and the bulk of the population steadily improved. In some cases, acquired mutations were sufficient to determine the motion of the fitness distribution, such as TOR1 mutations in rapamycin. Conversely, the pre-existing fitness variance influenced the fate of de novo drivers like RNR2 and RNR4 mutations, which needed to land on a favorable background to be competitive. Therefore, detecting a known driver mutation without a measurement of the background fitness distribution would be insufficient to predict its fate. Controlling and balancing the fitness effects of background and de novo mutations in response to these stress environments may be possible, as their effects can be modulated by inhibiting global regulators (Jarosz and Lindquist, 2010).
We observed a balance between the loss of diversity due to selection, and active diversification mechanisms that partially re-established and refined existing variants. The background not only contributed substantially to fitness, but was also continuously reconfigured by genome instability, diversifying the clones as they expanded – the moving target metaphor of resistance evolution captured in action. Although these rearrangements were mostly copy number neutral, they led to phenotypic change by changing scores of mutations from heterozygous to homozygous state in a single step. Such chromosomal rearrangements represent a key mechanism in shaping genome diversity in asexual organisms (Flot et al., 2014; Weir et al., 2016; Ford et al., 2015) and in somatic evolution of cancer (Stephens et al., 2011), where cells accumulate a genetic load during tumor development that LOH can phenotypically reveal. Recently developed genome editing approaches may enable localizing and measuring the fitness effect of specific LOH regions (Sadhu et al., 2016). Equally, chromosomal rearrangements also had a major effect on de novo variants, as recessive FPR1 mutations needed a second hit by LOH for resistance. Taken together, we hope our results will stir work towards better theoretical and empirical understanding of the complex interplay of selection simultaneously acting on pre-existing and de novo genetic variation, and of the role of genome instability continuously molding the genomes in a population.
IV. MATERIALS AND METHODS
The experimental methods and quantitative analyses of this study are presented in Section IV.A ‘Experimental methods’. In Section IV.B ‘Theory and data analysis’, we present the definition of the model for localization of drivers amongst hitchhiking passengers and the probabilistic inference method for subclonal reconstruction. Furthermore, we discuss the model for the estimation of variance components from background-averaged fitness measurements. Supplementary figures and tables are appended at the end of the ‘Materials and methods’. Code and data are available from the GitHub repository (https://github.com/ivazquez/gv-paper-2016).
A. Experimental methods
Evolution assays
In our study, we begin with two yeast strains which have diverged over millions of years (diversification phase), which are randomly mated by meiotic recombination to generate a large pool of recombinant mosaic haplotypes (crossing phase), followed by selecting a fraction of the population under stress without severe bottlenecks (selection phase).
Diversification phase
Parental strains were derived from a West African strain (DBVPG6044; MATα, ura3::KanMX, lys2::URA3, ho::HphMX) isolated from palm wine and a North American strain (YPS128; MATa, ura3::KanMX, ho::HphMX) isolated from oak tree (Table S3). Hereafter we refer to these strains as WA and NA, respectively. These strains were selected from two diverged S. cerevisiae lineages and feature 52,466 single-nucleotide differences uniformly distributed across the genome (see ‘Sequence analysis’).
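The number of single-nucleotide differences can be related to the average spacing between segregating sites with a one-line calculation. The genome length used below (approximately 12.07 Mb for the nuclear genome of the S288C reference) is an assumption introduced for illustration and is not stated in the text.

```python
# Assumed approximate nuclear genome length of the S288C reference (bp).
GENOME_LENGTH_BP = 12_070_000
# Number of single-nucleotide differences between WA and NA (from the text).
N_SEGREGATING_SNPS = 52_466

mean_spacing_bp = GENOME_LENGTH_BP / N_SEGREGATING_SNPS
print(f"mean spacing between segregating sites: {mean_spacing_bp:.0f} bp")
```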
Crossing phase
The selection experiments were carried out using WA, NA, WAxNA F2 and WAxNA F12 founder populations derived from hybrids between WA and NA. The WAxNA F2 and F12 populations were respectively generated from the F1 and F11 hybrids between WA (MATα, ura3::KanMX, lys2::URA3, ho::HphMX) and NA (MATa, ura3::KanMX, ho::HphMX).
The WAxNA F1/F11 diploid populations were expanded in YPDA and sporulated in solid KAc medium (2% potassium acetate, 2% agar) for 14 days at 23°C. Sporulation of diploids was confirmed by visual inspection of asci. Sporulation efficiency exceeded 90% after 14 days. Any remaining unsporulated cells were selectively removed using the ether protocol (Parts et al., 2011; Dawes and Hardie, 1974). The haploid population was subjected to mass mating according to the protocol described by Parts et al. (2011). Briefly, the asci were resuspended in 900 µl of sterile water and digested with 100 µl of zymolyase (10 mg ml−1) for 1 hour at 37°C. The cells were washed twice with 800 µl of sterile water, vortexed for 5 minutes to allow spore dispersion, plated in YPDA and incubated for 2 days at 23°C. The YPDA plates were replica plated in minimal medium to select diploid cells (MATa MATα, LYS2/lys2::URA3). The WAxNA F2/F12 generation was collected from the plates and used as a founder population for the selection experiments as well as stored at −80°C as a frozen stock.
Selection phase
In the selection phase, WA, NA, WAxNA F2 and WAxNA F12 founder populations (referred to as ancestral) were evolved asexually in two selective environments and one control environment. Where indicated, the selective media were supplemented with hydroxyurea (HU) at 10 mg ml−1 or rapamycin (RM) at 0.025 µg µl−1. We serially propagated multiple replicate populations over a period of 32 days (~54 generations), which we refer to as evolved populations. Every two days, 10% of the total cell population was transferred to fresh plates at constant drug concentration until day 34. WA and NA populations are labeled by their background, the environment in the selection phase and the selection replicate, e.g. NA RM 1. WAxNA populations are labeled by background, number of crosses, cross replicate, selection environment and selection replicate, e.g. WAxNA F12 2 HU 1. Time series samples are labeled from T0 to T32 and isolate clones carry a suffix, e.g. C1, C2, etc.
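A back-of-envelope estimate of the number of generations implied by a serial dilution regime is sketched below. It assumes that the population regrows to its pre-transfer size between transfers, so that a 1:10 dilution corresponds to log2(10) ≈ 3.3 doublings per passage; the function name and this regrowth assumption are ours, and the result is sensitive to how many transfers are counted.

```python
import math

def generations_elapsed(days, days_per_transfer=2, dilution_factor=10):
    """Approximate number of generations under serial transfer, assuming the
    population regrows by `dilution_factor` between consecutive transfers."""
    transfers = days // days_per_transfer
    return transfers * math.log2(dilution_factor)

per_passage = math.log2(10)
total = generations_elapsed(32)
print(f"{per_passage:.2f} generations per passage, about {total:.0f} over 32 days")
```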
We followed the evolution of these populations over the course of the experiment using whole-genome sequencing and phenotyping of the bulk population, of ancestral and of evolved isolates. Whole-population sequencing was performed after t = 0, 2, 4, 8, 16 and 32 days, and ancestral and evolved individuals were also sequenced (Table S1). Genomic DNA was extracted from the samples using the ‘Yeast MasterPure’ kit (Epicentre, USA). The samples were sequenced with Illumina TruSeq SBS v4 chemistry, using paired-end sequencing on Illumina HiSeq 2000/2500 at the Wellcome Trust Sanger Institute. Sequence data for the parental strains and the ancestral individuals were previously submitted to the SRA/ENA databases under study accession no. ERP000780 and the NCBI BioProject under accession no. PRJEB2608. Sequence data for the time-resolved populations and the evolved individuals have been submitted to the SRA/ENA databases under study accession no. ERP003953 and the NCBI BioProject under accession no. PRJEB4645.
Random and targeted clone isolation
We isolated ancestral and evolved individuals from representative selection experiments to characterize their individual genome sequences and their fitness. Each sample underwent serial dilution to attain a single-cell bottleneck, plated on YPDA. We isolated individuals from both ancestral populations (WAxNA F12 1, WAxNA F12 2) and 6 evolved populations (WAxNA F12 1 HU 2, WAxNA F12 1 HU 3, WAxNA F12 2 HU 3, WAxNA F12 1 RM 3, WAxNA F12 1 RM 4, WAxNA F12 2 RM 2) to measure the initial and final fitness distribution. 96 colonies were randomly picked from each population to span a range of fitness. We measured their growth rate using the high-resolution scanning platform described in ‘Growth phenotyping’.
Furthermore, we isolated individuals at the fitter end of the phenotype distribution, possibly harboring driver mutations. Since adaptation to one environment typically results in fitness gains or losses in other environments, we profiled 96 individuals from each selection experiment with an array of 6 different environments (YPD, HU at 10 mg ml−1, RM at 0.025 µg µl−1, galactose at 2%, heat at 40°C and sodium arsenite at 1.5 mM) to discriminate cells based on their phenotypic response. After visual inspection of shared effects across environments, we tested genetic markers by PCR/digestion and targeted resequencing of de novo mutations identified from the genome analysis of whole populations. In the hydroxyurea experiment, a heterozygous mutation in RNR4 was genotyped by PCR followed by BanI digestion. In the rapamycin experiment, heterozygous DEP1 and INP54 de novo mutations were genotyped using PCR, followed by AluI digestion and confirmed by Sanger sequencing in a subset of samples. We chose a total of 44 clones (22 per environment) for whole-genome sequencing (Table S1).
Engineered genetic constructs
We selected two genes in which we found putative loss-of-function mutations in hydroxyurea (RNR2, RNR4) and five genes in rapamycin (CTF8, DEP1, FPR1, TOR1, YNR066C) and engineered gene deletions to confirm whether their knockouts are beneficial. We also built hemizygous strains to determine the adaptive value of background variation in putative driver genes, by engineering in or out ancestral or evolved alleles in opposite backgrounds. For pre-existing variants, the test for reciprocal hemizygosity uses one-step PCR deletion with URA3 as a selectable marker. Starting from haploid versions of the WA and NA strains (either MATa, ho::HygMX, ura3::KanMX or MATα, ho::NatMX, ura3::KanMX), we deleted the candidate genes and constructed all possible combinations of reciprocal hemizygous strains (Fig. S8). The deletion in the haploid strain was confirmed by PCR and then crossed with the opposite mating type to generate the hemizygous hybrid strains (Steinmetz et al., 2002; Cubillos et al., 2011; Salinas et al., 2012). To test driver de novo mutations, we engineered reciprocal hemizygous deletions for two clones carrying heterozygous mutations in RNR2 and TOR1. The gene deletion was performed using the dominant selectable marker NatMX and we used Sanger sequencing to identify the deleted allele (wild-type or mutated copy).
Genetic cross
We sought to measure the fitness contributions of pre-existing and de novo mutations using a genetic cross. To do so, we designed a large-scale cross where both ancestral and evolved genetic backgrounds were re-shuffled in new combinations and tested for fitness with and without drugs. The genetic cross included the parents, ancestral and evolved isolates. The WA and NA haploid parents were used in MATa, ura3 and MATα, lys2 configurations. We derived haploid lines by sporulation on KAc medium from the ancestral and evolved clones. Only tetrads with four viable spores were chosen for continuation in the experiment. Spores were genotyped for mating type (MATa, MATα) using tester strains and for auxotrophies (ura3, lys2) by plating on dropout medium. We chose spores from tetrad configurations with the mating marker co-segregating as MATa, ura3 or MATα, lys2, allowing a systematic cross between all strains of opposite mating type. We then determined whether each spore inherited the wild-type or the mutated allele by Sanger sequencing of the candidate gene.
Eight ancestral haploid segregants (4 MATα, lys2 and 4 MATa, ura3) were randomly isolated from the ancestral population. For the hydroxyurea environment, we probed individually beneficial de novo mutations in RNR2 (Y169H) and RNR4 (R34I), which reside on different chromosomes of the S. cerevisiae genome. The RNR2 mutant was isolated from WAxNA F12 1 HU 3 (clone C3) and the RNR4 mutant from WAxNA F12 2 HU 1 (clone C1) at t = 32 days. For rapamycin, three evolved clones isolated at t = 32 days were used: one clone with no identifiable driver from WAxNA F12 2 RM 2 (clone C1), a homozygous FPR1 mutant (W66*) from WAxNA F12 2 RM 1 (clone C3), and a heterozygous TOR1 mutant (W2038L) from WAxNA F12 1 RM 2 (clone C3). For the hydroxyurea experiment, 21 tetrads were taken for crossing (12 for RNR2 and 9 for RNR4) resulting in 84 spores. For the rapamycin environment, 25 tetrads were used (1 without driver, 4 for FPR1, 20 for TOR1), resulting in 100 spores.
A genetic cross of size 48 × 48 in hydroxyurea yielded 2,304 hybrids, and 56 × 56 in rapamycin, giving 3,136 hybrids. We performed the genetic cross using the Singer RoToR HDA robot on YPDA plates. Subsequently, the hybrid populations were grown for two rounds on minimal medium to ensure colonies of solely diploid cells and avoid haploid leakage. A small number of crosses were not successful due to mating inefficiency or slow growth (56 in hydroxyurea and 654 in rapamycin), leaving a total of 2,248 and 2,482 hybrids, respectively. This was due to mistyping of the mating locus in one FPR1 spore and three TOR1 spores, which were excluded together with their derived hybrids. Phenotypic measurements of the crosses were carried out using the high-throughput method of yeast colony growth described in ‘Growth phenotyping’.
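The dimensions of the cross can be checked with simple arithmetic. The breakdown below is our inference rather than a stated design: it assumes that the spores from four-spore tetrads split evenly between the two mating types, and that each mating type is complemented by 4 ancestral segregants and 2 parental haploids (WA and NA). The function and parameter names are hypothetical.

```python
def cross_size(n_evolved_spores, n_ancestral_per_type=4, n_parents_per_type=2):
    """Number of strains per mating type and number of pairwise hybrids,
    assuming spores split evenly between MATa and MATalpha."""
    per_type = n_evolved_spores // 2 + n_ancestral_per_type + n_parents_per_type
    return per_type, per_type * per_type

# Hydroxyurea: 21 tetrads -> 84 spores; 56 crosses failed.
hu_per_type, hu_hybrids = cross_size(21 * 4)
hu_remaining = hu_hybrids - 56

# Rapamycin: 25 tetrads -> 100 spores; 654 crosses failed or were excluded.
rm_per_type, rm_hybrids = cross_size(25 * 4)
rm_remaining = rm_hybrids - 654

print(hu_per_type, hu_hybrids, hu_remaining)
print(rm_per_type, rm_hybrids, rm_remaining)
```

Under these assumptions the counts reproduce the 48 × 48 and 56 × 56 designs and the numbers of hybrids retained in each environment.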
Luria-Delbrück fluctuation assay
We performed a fluctuation test to determine the rate of loss-of-heterozygosity (LOH) in different backgrounds, by following the loss of a heterozygous URA3 marker that results in 5-FOA resistant colonies (Luria and Delbrück, 1943; Lang and Murray, 2008). In all strains tested the URA3 gene was deleted from its native location in chromosome V and inserted in the lys2 locus (lys2::URA3) in chromosome II (~470 kb) (Fig. 3B). This genotype is the same used in the crossing phase and therefore shared by all individuals in the population. Our system does not have dedicated markers to distinguish different mechanisms leading to LOH but instead gives an aggregate measurement of the total rate. The strains were first patched in URA dropout medium and then streaked for single colonies in plates with YPD or YPD supplemented with the drugs (HU at 10 mg ml−1 or RM at 0.025 µg ml−1). Colonies were grown for 3 days at 30°C. Cells were resuspended in water and cell concentration was measured by flow cytometry to obtain a correct dilution factor in the subsequent plating. Cells from each replicate were plated in YPDA to determine total number of colony-forming units and 5-FOA (1 g l−1) plates to count colonies that are URA3 defective. We confirmed the loss of the URA3 marker by diagnostic PCR. Four replicates for each experiment were used to determine the rate.
To ensure the absence of meiotic spores we inspected ~ 100 cells per sample. This control was introduced for two reasons. First, the NA parent is a very fast and efficient sporulator (Gerke et al., 2006). We observed the induction of meiosis even without the specific KAc environmental signal required in the laboratory strain S288C (and its derivatives) to initiate sporulation. Second, rapamycin has been shown to promote sporulation by modulating the nutrient sensing pathway (Zheng and Schreiber, 1997). We did not observe fully formed meiotic spores during this experiment, though we cannot exclude that meiotic events before the meiotic commitment point (e.g. double-strand breaks) may have occurred that could affect the LOH rate (Laureau et al., 2016).
We fitted the fluctuation data to a model of the Luria-Delbrück distribution. We determined the expected number of LOH events per culture m, such that the LOH rate can be estimated by $\mu = m/N$, where N is the average number of cells per culture. In the control environment, we observed a rate µ = 9.8 × 10−6 per cell per division in the WA background, consistent with previous reports (Barbera and Petes, 2006; Andersen et al., 2008). We observed a ten-fold higher rate in the NA background (µ = 3.24 × 10−5) and the WAxNA F1 hybrid had an intermediate rate (µ = 7.39 × 10−6). These data indicate that LOH rates can vary between genetic backgrounds. There was a sharp increase of LOH rates when colonies were grown in hydroxyurea, irrespective of the background tested. This finding is consistent with previous studies in the laboratory strain S288C reporting that replication stress promotes recombinogenic DNA lesions (Barbera and Petes, 2006). We also observed a background-dependent increase in LOH rate in the presence of rapamycin, especially in the NA founder. Rapamycin has been reported to trigger sporulation (Zheng and Schreiber, 1997) and it is likely that the formation of double-strand breaks during meiosis can increase the LOH rate.
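As an illustration of this kind of fit, the following is a minimal sketch of a maximum-likelihood estimate of m. It assumes the Luria-Delbrück probabilities are computed with the Ma-Sandri-Sarkar recursion and that the rate is taken as m divided by N. The function names, the colony counts and the culture size are hypothetical, and the analysis reported here may have used a different implementation.

```python
import math

def ld_pmf(m, r_max):
    """P(r mutant colonies), r = 0..r_max, given m expected events per culture."""
    p = [math.exp(-m)]
    for r in range(1, r_max + 1):
        p.append((m / r) * sum(p[i] / (r - i + 1) for i in range(r)))
    return p

def log_likelihood(m, counts):
    p = ld_pmf(m, max(counts))
    return sum(math.log(p[c]) for c in counts)

def estimate_m(counts, lo=1e-3, hi=50.0, steps=80):
    """Ternary search for the m that maximizes the likelihood (assumes one peak)."""
    for _ in range(steps):
        a = lo + (hi - lo) / 3
        b = hi - (hi - lo) / 3
        if log_likelihood(a, counts) < log_likelihood(b, counts):
            lo = a
        else:
            hi = b
    return (lo + hi) / 2

# Hypothetical 5-FOA-resistant colony counts from replicate cultures
counts = [0, 1, 0, 3, 2, 0, 1, 12]
m_hat = estimate_m(counts)
n_cells = 2.0e6            # hypothetical average number of cells per culture
mu_hat = m_hat / n_cells   # LOH events per cell per division
```

The sketch assumes that the whole culture is plated; with partial plating the counts would first need to be corrected for the plated fraction.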
Growth phenotyping
To carry out phenotype measurements we used a high-resolution scanning platform, Scan-o-matic, to monitor growth in a 1536-colony design on solid agar medium (Zackrisson et al., 2015). Solid media plates designed for use with the Singer RoToR HDA robot (Singer Ltd) were used throughout the experiment. Casting was performed on a leveled surface, drying for ~ 1 day. We designed a randomized experimental layout by distributing genotypes of interest over 1,152 positions across each plate, keeping every fourth position for 384 controls used for removal of spatial bias. Controls were interleaved in the pre-culture step using a custom-made RoToR pinning program.
We recorded phenotypic measurements using high-quality desktop scanners (Epson Perfection V700 PHOTO scanners, Epson Corporation, UK) connected via USB to a standard desktop computer. Scanner power supplies were separately controlled by power managers (GEMBIRD EnerGenie PowerManager LAN, Gembird Ltd, Netherlands) that immediately shut down the scanner lamp between scans. Images were acquired using SANE (Scanner Access Now Easy). We performed transmissive scanning at 600 dpi using 8-bit grey scale, capturing four plates per image. Plates were fixed by custom-made acrylic glass fixtures. Orientation markers ensured exact software recognition of fixture position. Each fixture was calibrated by scanner using a calibration model that provided positions for each feature of that fixture, relative to its orientation markers. Pixel intensities were normalized and standardized across instruments using transmissive scale calibration targets (Kodak Professional Q-60 Color Input Target, Kodak Company, USA). Scanners were maintained in a 30°C, high-humidity environment (incubation room) and kept covered in custom-made boxes during experiments to avoid light influx and minimize evaporation.
Experiments were run for 3 days and scans were continuously performed every 20 minutes. Each image stack was processed in a two-pass analysis. The first-pass was performed during image acquisition and was responsible for setting up the information needed for growth estimations. Positions in each image were matched to the fixed calibration model using the fixture orientation markers, allowing detection and annotation of plates and transmissive scale calibration strips. In the second-pass analysis, images were segmented to identify plate and transmissive scale calibration strip positions. The calibration strips were trimmed and the pixel intensities compared to the manufacturer’s supplied values, such that normalized pixel values remained independent of fluctuations in scanner properties over time and space. The colonies were detected using a virtual grid across each plate based on pinning format, and the grid was adjusted for the intersections to match the centre of the features detected. At every intersection, each colony and the surrounding area were segmented to determine the local background and pixel intensities. Differences in pixel intensity were converted to population size estimates n (t) by calibration to independent cell number estimates (spectrometer and FACS). Based on these, we obtained growth curves in physical units.
Raw measurements of population size were smoothed in a two-step procedure. First, a median filter identified and removed local spikes in each curve. Second, a Gaussian filter reduced the influence of remaining local noise. Since we expect a population to double in size during the average time taken to progress through the cell cycle, we use an exponential growth model defined as $n(t) = n(0)\,e^{\lambda t}$, where λ is the absolute growth rate. If the time that has passed is exactly the doubling time τ, it can be shown that within this time span the growth rate can be rewritten as $\lambda = \ln 2 / \tau$. It follows that λ can then be estimated from the linear fit of any two log-transformed measurements of n(t) in exponential phase, according to $\lambda = [\ln n(t_2) - \ln n(t_1)] / (t_2 - t_1)$. For quality control, the residuals of the model are then used to determine goodness-of-fit and to flag growth curves suspected to be of poor quality, which are visually inspected for artefacts. Rejection rates averaged approximately 0.3% across experiments.
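The smoothing and rate estimation can be sketched as follows. This is a simplified illustration that assumes a three-point median filter and an ordinary least-squares slope of ln n(t) against time. The Gaussian smoothing step is omitted, and the function names and synthetic data are hypothetical rather than part of Scan-o-matic.

```python
import math
from statistics import median

def median_filter(y, width=3):
    """Replace each point by the median of its neighbourhood to remove spikes."""
    h = width // 2
    return [median(y[max(0, i - h): i + h + 1]) for i in range(len(y))]

def growth_rate(t, n):
    """Least-squares slope of ln n(t) against t, i.e. the absolute growth rate."""
    x = [math.log(v) for v in n]
    t_mean = sum(t) / len(t)
    x_mean = sum(x) / len(x)
    num = sum((a - t_mean) * (b - x_mean) for a, b in zip(t, x))
    den = sum((a - t_mean) ** 2 for a in t)
    return num / den

def doubling_time(lam):
    """Doubling time implied by exponential growth at rate lam."""
    return math.log(2) / lam

# Synthetic exponential-phase curve sampled every 20 minutes (time in hours)
t = [i / 3 for i in range(10)]
n = [1.0e4 * math.exp(0.35 * ti) for ti in t]
lam = growth_rate(t, n)
tau = doubling_time(lam)
```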
To account for systematic errors caused by spatial variation within plates, we used an isogenic control at every fourth position. We defined a two-dimensional reference matrix for doubling time estimates of the 384 controls (on each 1,536 plate) to correct for structured spatial bias. Controls with extreme values were removed and the remaining control positions were used to interpolate a normalization surface. This surface was first smoothed with a kernel filter to exclude any remaining noisy measurements, and then with a Gaussian smoothing to soften the contours of the landscape. For a colony measured at position ij, the absolute growth rate was rescaled by taking the log-transformed difference between the observed estimate and the growth of the normalization surface, i.e. the relative growth rate is then $\lambda^{\mathrm{rel}}_{ij} = \log \lambda^{\mathrm{obs}}_{ij} - \log \lambda^{\mathrm{norm}}_{ij}$. To account for systematic bias between plates, the growth parameters were transformed by shifting the mean of the controls on each plate to match the mean of all plates in the experimental series.
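A minimal sketch of the two rescaling steps is given below, assuming natural logarithms and a normalization surface that has already been interpolated and smoothed. The base of the logarithm, the function names and the toy values are assumptions made for illustration only.

```python
import math

def relative_rate(observed, surface):
    """Log-transformed difference between observed and reference growth per position."""
    return [[math.log(o) - math.log(s) for o, s in zip(row_o, row_s)]
            for row_o, row_s in zip(observed, surface)]

def shift_to_series_mean(values, plate_control_mean, series_control_mean):
    """Shift one plate so that its control mean matches the mean over all plates."""
    offset = series_control_mean - plate_control_mean
    return [v + offset for v in values]

# Toy 1 x 2 plate: the first colony grows twice as fast as its local reference
rel = relative_rate([[2.0, 1.0]], [[1.0, 1.0]])
shifted = shift_to_series_mean([0.10, -0.20], plate_control_mean=0.05,
                               series_control_mean=0.0)
```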
Media composition
During the crossing phase, the cells were expanded and maintained in YPDA medium (2% glucose, 2% peptone, 1% yeast extract and 2% agar). WAxNA F1/F11 populations were sporulated in solid KAc medium (2% potassium acetate and 2% agar). WAxNA F2/F12 populations were then selected in minimal medium lacking uracil and lysine (0.67% of yeast nitrogen base (YNB), 2% glucose and 0.2% of dropout mix minus uracil and lysine). The selection phase of the experiments was carried out in YPDA medium supplemented with the drug. All selection experiments with drugs (as well as follow-ups) used media supplemented with hydroxyurea (HU) at 10 mg ml−1 or rapamycin (RM) at 0.025 µg ml−1, supplied by Sigma-Aldrich. The drug concentrations were chosen by serial dilution of dose levels in haploid and diploid WA and NA strains. We selected concentrations that maximized the differential growth between the two diploid parents by ten-fold in each environment (Fig. 4, C and D).
As part of the follow-up assays, we used antibiotic resistance as a selectable marker to engineer gene deletions and build hemizygous strains, plating in YPDA supplemented with the corresponding antibiotic (‘Engineered genetic constructs’). We supplemented YPDA medium with nourseothricin (Nat) at 100 µg ml−1, hygromycin B (Hyg) at 200 µg ml−1 and G418 at 400 µg ml−1. Transformations of reciprocal hemizygous strains also relied on URA3 as a selectable marker and were plated in minimal medium lacking uracil (0.67% YNB, 2% glucose and 0.2% dropout mix minus uracil). The fluctuation assay was carried out in YPDA, or YPDA supplemented with the drug (‘Luria-Delbrück fluctuation assay’). Colonies defective in the URA3 allele were selected in 5-FOA plates (YPDA medium supplemented with 5-fluoroorotic acid at 1 g l−1). In the genetic cross, the clones used were sporulated in solid KAc medium described above (‘Genetic cross’). Haploid strains were derived from dissected spores and genotyped for their mating type, URA3 /LYS2 auxotrophies and known de novo mutations. Strains were crossed in YPDA and selected in minimal medium depleted of uracil and lysine. Growth phenotyping on solid medium was performed in Singer PlusPlates (Singer Ltd). Each plate was cast with 50 ml of Synthetic Complete medium, composed of 0.14% YNB, 0.5% ammonium sulphate, 0.077% Complete Supplement Mixture (CSM, ForMedium), 2% (w/v) glucose and pH buffered to 5.8 with 1% (w/v) succinic acid. The medium was supplemented with 20 g l−1 of agar.
B. Theory and data analysis
Sequence analysis
Short-read sequences were aligned to the S. cerevisiae S288C reference genome (Release R64-1-1, downloaded from the Saccharomyces Genome Database on February 5, 2011). Sequence alignment was carried out with Stampy v1.0.23 (Lunter and Goodson, 2011) and local realignment using BWA v0.7.12 (Li and Durbin, 2009). After removing PCR duplicates, the median genome-wide DNA coverage was 94× across whole-population samples, 23× across ancestral isolates and 30× across evolved isolates (ranging from 9× to 150×; first quartile 24× and third quartile 91×).
We detected single-nucleotide variants where the WA and NA parents differ, which comprises the background variation segregating in the cross (52,466 sites). We obtained allele counts on these loci using GATK UnifiedGenotyper v3.5-0-g36282e4 (DePristo et al., 2011). These counts were polarized to report WA alleles at each locus, as neither of the parents is the reference genome. The allele counts for segregating variants were first processed using the filterHD algorithm, which takes into account persistence along the genome due to linkage and allows for jumps in allele frequency if there are emerging subclones in the populations.
To detect de novo mutations we used three different algorithms: GATK UnifiedGenotyper v3.5-0-g36282e4 (DePristo et al., 2011), Platypus v0.7.9.1 (Rimmer et al., 2014) and SAMtools v1.2-10 (Li, 2011). We focused on single nucleotide polymorphisms (SNPs) and small insertions and deletions (indels). We first performed calling of both SNPs and indels for all ancestral isolates, evolved isolates and the parents. Using BCFtools (Li, 2011), we subtracted parental variation from all derived samples (ancestral and evolved). We further subtracted variation found in ancestral isolates from all evolved samples to account for segregating variation that was missed. We then required each variant to be supported by more than six reads, to lie at a site covered by more than ten reads and to pass the default flags of the algorithms. For all isolates we further required that there is only a single alternative allele. We then used GATK UnifiedGenotyper to get variants identified by at least two of the algorithms. For whole-population samples we did not require only a single alternate allele and we changed Platypus filtering to also allow ‘allele bias’ calls. To detect allele frequency changes over time, we only considered loci where the minimum variant allele count across time points was less than two reads and the maximum was more than six reads. To avoid an increase of false positives in regions where genetic heterogeneity (e.g. variation in copy number) could cause difficulties in mixed samples, we used more stringent filters than for isolate samples on mapping and base quality biases and goodness of fit. Finally, to increase our sensitivity of detection of putative de novo variants in recurrent target genes, we looked for mutations in CTF8, RNR2, RNR4, FPR1 or TOR1 that were only called by a single algorithm.
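The filtering logic can be summarized in a short sketch that operates on sets of variant keys rather than on VCF files. The helper names, the caller labels and the toy coordinates are hypothetical. The actual pipeline relied on BCFtools and GATK for these steps, and this sketch does not reproduce their interfaces.

```python
from collections import Counter

def passes_read_filters(alt_reads, depth, min_alt=6, min_depth=10):
    """Variant seen in more than six reads at a site covered by more than ten."""
    return alt_reads > min_alt and depth > min_depth

def subtract_background(sample_calls, *background_calls):
    """Remove variants already present in the parents or ancestral isolates."""
    return set(sample_calls).difference(*background_calls)

def consensus(calls_by_caller, min_callers=2):
    """Keep variants reported by at least `min_callers` algorithms."""
    support = Counter(v for calls in calls_by_caller.values() for v in set(calls))
    return {v for v, n in support.items() if n >= min_callers}

# Toy variant keys: (chromosome, position, reference, alternative)
a = ("chrA", 100, "A", "G")
b = ("chrA", 250, "C", "T")
c = ("chrB", 40, "G", "GA")
calls = {"caller1": {a, b, c}, "caller2": {a, c}, "caller3": {a}}
parental = {c}
kept = subtract_background(consensus(calls), parental)
```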
Genome-wide scan of pre-existing variants under selection
We observed patterns of selective sweeps when a ‘driver’ allele with a significant fitness advantage starts to gain in frequency due to the selective pressure applied (Fig. 1B and Fig. S2). This movement also causes allele frequency changes at nearby loci containing ‘passenger’ alleles that are genetically linked with the driver, in a process called genetic hitchhiking.
To discern drivers and passengers, we consider a model of a population evolving in a regime of strong selection, where there is a favored allele (driver) at locus i, and a set of linked passengers. We have previously developed a computational approach to analyse selection acting on pre-existing genetic variation that results from a cross (Illingworth et al., 2012). Genetic drift plays a negligible role for allele frequency changes in the experiment as the population size ($\sim 10^7$ cells) is much larger than its duration (~200 generations). Therefore, we can assume that the allele frequencies change deterministically and the remaining noise is due to sampling caused by finite sequencing depth.
A selective sweep is then well approximated by a model of the frequency of the WA allele at locus i which satisfies the logistic equation,
The frequency of the NA allele at locus i is $1 - q_i(t)$, where $q_i(t)$ denotes the frequency of the WA allele. Here, the selection coefficient σi is the fitness difference between the alleles, and the pre-factor reflects a diploid population with additive selection. This growth model is a deterministic approximation to the stochastic evolution of qi(t), which is commonly described by the Wright-Fisher model with directional selection.
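To make the deterministic sweep concrete, the sketch below evaluates the closed-form solution of a logistic equation of the form dq/dt = c σ q (1 − q). The pre-factor c = 1/2 is an assumption chosen here to represent additive selection in diploids, since its exact value is not stated in the text above. The function and parameter names are hypothetical.

```python
import math

def wa_frequency(t, q0, sigma, prefactor=0.5):
    """Closed-form logistic trajectory of the WA allele frequency.

    Solves dq/dt = prefactor * sigma * q * (1 - q) with q(0) = q0.
    The value prefactor = 0.5 is an assumption, not taken from the source.
    """
    growth = math.exp(prefactor * sigma * t)
    return q0 * growth / (1.0 - q0 + q0 * growth)

def na_frequency(t, q0, sigma, prefactor=0.5):
    """With two alleles per locus, the NA frequency is the complement."""
    return 1.0 - wa_frequency(t, q0, sigma, prefactor)

# Example trajectory: WA allele starting at 30% with sigma = 0.2 per unit time
trajectory = [wa_frequency(t, 0.3, 0.2) for t in range(0, 33, 8)]
```

A positive σ drives the WA allele towards fixation and a negative σ favours the NA allele, while σ = 0 leaves the frequency unchanged.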
To account for the effects of linkage between mutations, we consider a model with two alleles possible at each locus, in which the driver mutation is at locus i and passengers at loci j. We refer to the two alleles at the i locus as α ∈ {WA, NA}, and the alleles at the j loci as b ∈ {WA, NA}.
According to our model, the dynamics of passenger alleles are fully specified by the motion of the local driver. The effect of the selected allele on existing variation at a passenger locus j is given by where the two-locus haplotype frequency is , denotes linkage disequilibrium.
We note that due to short-read sequencing of a mixed population we cannot directly measure two-locus haplotype frequencies or linkage disequilibrium, but we can parameterize $D_{ij}$ in terms of the recombination which took place during the crossing phase. After $N_c$ generations of sexual recombination, $D_{ij}(t) = (1 - \rho_{\mathrm{tot}})^{N_c} D_{ij}(t_0)$, where the total recombination rate $\rho_{\mathrm{tot}} = \rho\,\Delta_{ij}$ depends on the distance between the loci $\Delta_{ij}$ in base pairs (bp) and the local recombination rate ρ in units of (bp × gen)$^{-1}$.
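As a worked example of this decay, the following sketch computes the linkage disequilibrium remaining after the crossing phase. The numerical values (initial disequilibrium, recombination rate, distance and number of generations) are hypothetical and chosen only to illustrate the formula.

```python
def ld_after_crossing(d0, rho, distance_bp, n_generations):
    """Linkage disequilibrium left after n_generations of sexual recombination.

    rho is the local recombination rate per bp per generation, so that
    rho * distance_bp is the total recombination rate between the two loci.
    """
    rho_tot = rho * distance_bp
    return (1.0 - rho_tot) ** n_generations * d0

# Hypothetical values: two loci 10 kb apart, twelve rounds of crossing
d_final = ld_after_crossing(d0=0.25, rho=3e-6, distance_bp=10_000, n_generations=12)
```

With these illustrative numbers about 69% of the initial disequilibrium survives, so loci a few kilobases apart remain tightly coupled and passengers follow the driver closely.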
Therefore for a given driver locus and a set of passengers the model is fully specified by the strength of selection, the pairwise linkage structure (or recombination landscape), and allele frequencies at both driver and passenger loci at t = 0 days. Previously, we learned all these parameters via a maximum likelihood approach with a binomial noise model accounting for sequencing noise coupled with a search heuristic for proposing candidate driver locations.
Here we have made two extensions to this approach. Firstly, we utilize a recombination landscape inferred for this cross in a separate study to remove the need to estimate a local recombination rate from the allele frequency movement (Illingworth et al., 2013). Secondly, we use the posterior mean of the allele frequency at t = 0, as obtained with the filterHD algorithm, to fix the initial condition. As a result, for each driver-passenger model we only need to learn the strength of selection and we can therefore systematically scan through each of the 52,466 segregating sites and test the alleles at each locus to be under selection. The resulting log-likelihood score is compared to a null model where selection on the driver locus is set to zero. This null model corresponds to no frequency changes during the experiment (no parameters to be learned) and scoring the observations from different time points using t = 0 allele frequencies from filterHD.
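The scoring of a single candidate driver can be sketched as a comparison of binomial log-likelihoods with and without selection. This simplified illustration scores the driver locus only and omits the passenger loci. It assumes a logistic trajectory with a pre-factor of 1/2 and a grid search over the selection coefficient. The names, grid and read counts are hypothetical and do not reproduce the published implementation.

```python
import math

def logistic(t, q0, sigma, prefactor=0.5):
    growth = math.exp(prefactor * sigma * t)
    return q0 * growth / (1.0 - q0 + q0 * growth)

def binom_loglik(k, n, q):
    """Log-probability of k WA reads out of n at expected frequency q."""
    q = min(max(q, 1e-9), 1.0 - 1e-9)
    return (math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
            + k * math.log(q) + (n - k) * math.log(1.0 - q))

def score_locus(times, wa_counts, depths, q0, sigmas):
    """Return the best selection coefficient and its gain over the null model."""
    def loglik(sigma):
        return sum(binom_loglik(k, n, logistic(t, q0, sigma))
                   for t, k, n in zip(times, wa_counts, depths))
    best = max(sigmas, key=loglik)
    return best, loglik(best) - loglik(0.0)

# Synthetic locus: initial frequency fixed at 0.5, true sigma of 0.4 per day
times = [0, 2, 4]
depths = [100, 100, 100]
wa_counts = [round(n * logistic(t, 0.5, 0.4)) for t, n in zip(times, depths)]
sigma_grid = [i / 100 for i in range(-100, 101)]
best_sigma, gain = score_locus(times, wa_counts, depths, 0.5, sigma_grid)
```

Because the initial frequency is fixed and the null model has no free parameters, the gain is non-negative by construction and loci can be ranked by it directly.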
We performed a systematic driver scan including passengers within variable window sizes ± 2 kb, 5 kb, 10 kb, 30 kb, 50 kb. Emerging subclones result in global allele frequency changes that supersede the local signal, which is the hallmark of selection acting on pre-existing variation. In consequence, we only considered time points when populations had not yet become clonal, up to t = 4 days. For each scan we selected the top 200 loci (out of 52,466) and then required that a given window was identified to be among the top scoring ones in at least two populations. The remaining windows were merged if their passenger loci overlapped. Finally, we required that the region was not identified among those scoring highly in the control environment.
The scan identified a region of interest for rapamycin resistance, found in chromosome VIII around 460 kb–490 kb as discussed in the main text. The signal is visible in all rapamycin populations but not in the control. However, we were not able to localize it fully due to a low recombination rate in this region and possibly also due to the presence of multiple drivers. The top hits with different passenger window sizes show substructure in terms of peak location. Smaller windows contain multiple peaks, which then get merged to single peaks in larger windows. We note that theoretically the passenger window size should not matter provided the linkage model is good enough and there are no multiple drivers affecting the passenger dynamics. In summary, the region as a whole has strong support across populations to contain pre-existing variation where NA allele(s) are beneficial in rapamycin, albeit we cannot statistically map the signal more finely. From candidate genes in the region, we validated CTF8 to have a resistance phenotype (see ‘Genome-wide scan of pre-existing variants under selection’). The region also contains KOG1 (the gene has pre-existing missense variants in the population) which is part of the TOR pathway and so a plausible target of selection here. We did not find regions that replicated across all populations in hydroxyurea.
Reconstruction of subclonal composition
In the late stages of the selection experiment we identified global allele frequency changes of pre-existing, segregating variants caused by one or multiple de novo mutations (or a particularly favorable combination of the background variation itself) in clones that are under positive selection. During the selection phase, which is asexual, mutations in the genome of a cell are physically linked. Thus after a cell acquires a beneficial de novo mutation this can outweigh all its background variants, which become passengers (they may of course contribute to the fitness of that cell as well). At the genomic level, such an expanding subclone leaves a large imprint on the data at polymorphic sites, with long-range correlations reflecting the genotype of the cell hit by the beneficial de novo mutation. This movement with global, long-range correlations and sudden jumps corresponding to the expanding genotype is qualitatively different from the movement resulting from the localized sweep picture discussed in the previous section.
In this section, we describe how we extend and use the cloneHD algorithm (Fischer et al., 2014) to reconstruct the emerging subclone dynamics in a cell population. The cloneHD algorithm was developed to explain data from short-read DNA sequencing experiments of mixed cell populations (read depth and variant counts) under the following assumptions: (i) The cells evolve asexually (without recombination). This ensures that there are long-range correlations along the genome, which can, in principle, be reconstructed from short-read data. (ii) The population consists of a mixture of subclones, i.e. groups of genetically identical cells. The total number of subclones and their relative fractions in the population are unknown. The number of subclones, which can be reconstructed from real data, is small and depends on how different they are and what their population fractions are. (iii) Each subclone carries a unique copy number profile and genotype. Both of which are unknown. (iv) There is a distinct bulk component of the population which differs from the subclones, e.g. by having a different set of genotypes. Its fraction is also unknown. (v) When several samples are jointly analyzed, the same subclonal populations are assumed to be present in all samples. However, their frequencies in some of the samples can be zero.
Previously, cloneHD was used to explain subclonal heterogeneity found in human cancers. With a few extensions, this methodology can also be used for the yeast evolution experiment studied here. After the crossing phase, the populations evolve asexually under selective pressure. The rounds of crossing of the two original strains have produced a diverse pool of recombinants, where the genotype of each cell is – for all practical purposes – unique. This ancestral population of diploid cells is modeled here as the bulk component. Its allele frequency profile can be seen in Fig. 1B and Fig. S2.
At the later stages of the evolution, a small number of individual yeast cells start to outgrow the rest of the population, maybe due to a lucky combination of pre-existing variation or due to de novo mutations. These cells grow clonally to measurable fractions of the population and leave their fingerprint in the allele frequency profile genome wide. In the extreme case, a single cell grows clonally to take over the entire population and its individual genotype can be directly observed in the sequencing data. In the general case, there will be a mixture of subclones and bulk population as described above. As an added complication, subclone copy number profiles need not be pure diploid.
This scenario is already covered in principle in the model underlying cloneHD (see esp. Section 4 in the Supporting Information of Fischer et al. (2014)). In the current study, the population is sequenced at several time points such that there are multiple related samples available for inference with cloneHD. For the read depth at locus i and time point t, the emission probability is where Mt is the mean sequencing depth per haploid DNA, is the total copy number of subclone j at locus i, c0 is the total copy number of the reference compartment (2 for diploid) and is the frequency of subclone j (with ).
The number of WA reads determines the observed allele frequency and is assumed to be binomially distributed where gij is the genotype of subclone j at locus i and is the initial allele frequency spectrum. The only substantial difference to the situation in the cancer setup is that here the genotype of a particular subclone j is persistent across large regions of the yeast genome reflecting the haplotype structure resulting from the cross. In cancer, these correlations along the genome are missing since the above model is only applied to somatic point mutations which fall randomly on the available chromosomes. Altogether, the subclonal structure of the yeast cell populations can be reconstructed with cloneHD in cna+snv mode, where both the CNA and SNV data are modeled with persistence along the genome. The rest of the cloneHD workflow fully applies. First, the read depth and allele frequency data is analyzed with filterHD, thus finding a segmentation of both data tracks for all samples jointly (in later stages subclones are larger and the transition points become more prominent). This information and the initial allele frequency profile are provided to cloneHD together with the read depth and pre-existing variant allele data in cna+snv mode. The maximum likelihood set of subclonal genomes (including their copy number profiles and genotypes) and their cell fractions is then found by cloneHD at each time point. Figure S1 shows the general setup and the cloneHD reconstruction for simulated data in one population.
We assessed the ability of our algorithm to recover several features of interest from simulated jump-diffusion processes over a range of plausible parameters. For each parameter set, we simulate a 1 Mb region with L = 10,000 observations and 60 reads per locus on average, then compute maximum likelihood estimates using different numbers of subclones. Our choice of jump probability for simulated data is set to 4 × 10−5 per base. This reflects the size of linkage blocks with plausible recombination scenarios during crossing. The clones are added in a chosen background assuming the bulk has reached a steady profile. We would like to reconstruct three features: (i) the total number of subclones, (ii) their subclonal frequency, and (iii) obtain posterior estimates of subclonal genotypes.
The maximum-likelihood estimates of the subclonal fractions are approximately equal to the true values. The reconstruction is shown in Fig. S1B as a black solid line, which is the cloneHD solution for the mean posterior SNV emission rate. We can recapitulate the correct number of breakpoints and their location. The fidelity of our reconstruction to the true subclonal genotype is corroborated by the close correlation between our estimates from whole-population sequencing and the true genotypes derived from clonal isolates.
Subclonal dynamics
To ascertain the expansion of subclones throughout the experiment from whole-population sequencing, we determined the allele frequency of de novo mutations in WA, NA and WAxNA populations during the selection phase. We found that these mutations typically did not reach detectable frequency (i.e. between 1–5%) until more than 50 generations had passed, with steady increases thereafter (Figs. S3 and S4). Across populations, we found 66 point mutations spanning 41 unique loci by whole-population sequencing, out of which 50 fall onto coding sequence. These loci contain 32 functional driver mutations: 4 in RNR2, 10 in RNR4, 11 in FPR1, and 7 in TOR1. This includes two tri-allelic loci: one corresponding to FPR1 driver mutations W66* and W66S, and another to a SNP and an insertion in FPR1.
To reconstruct the subclonal composition of each WAxNA population we used cloneHD, providing the jumps found by filterHD and the posterior mean allele frequencies of the ancestral population to act as a bulk component for the inference (see ‘Reconstruction of subclonal composition’). As visual inspection did not reveal clear copy number aberrations in the samples we used cloneHD in snv mode. For each population, we systematically tried 0–4 subclones and determined the total data likelihoods under each model. The number of subclones per population are summarized in Table S1, together with the time evolution of subclone frequencies in Fig. S4. We required a log-likelihood gain greater than 20,000 units for the inclusion of an additional subclone. This conservative cut-off only allows genome-wide signals to be associated with a subclone. This is necessary as the bulk component of the population can also change throughout the experiment. This means that, with a less conservative cut-off, the algorithm could introduce artifactual subclones with suitable genotypes to improve fits in regions where selection acts on the bulk (see ‘Genome-wide scan of pre-existing variants under selection’).
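The model selection rule can be sketched as follows. The sketch assumes that subclones are added one at a time and that the search stops at the first increment whose log-likelihood gain does not exceed the cut-off. The stopping rule, the function name and the log-likelihood values are illustrative assumptions.

```python
def choose_n_subclones(loglik_by_n, min_gain=20000.0):
    """Pick the number of subclones from total log-likelihoods.

    loglik_by_n[k] is the total data log-likelihood with k subclones
    (k = 0, 1, 2, ...). An extra subclone is accepted only if it improves
    the log-likelihood by more than min_gain units.
    """
    chosen = 0
    for k in range(1, len(loglik_by_n)):
        if loglik_by_n[k] - loglik_by_n[k - 1] > min_gain:
            chosen = k
        else:
            break
    return chosen

# Hypothetical log-likelihoods for models with 0 to 4 subclones
logliks = [-5.0e6, -4.9e6, -4.895e6, -4.894e6, -4.8939e6]
n_subclones = choose_n_subclones(logliks)
```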
Adaptive mutations and genome instability
Overall, we identified 91 SNPs and indels in 173 ancestral haploid isolates and 140 point mutations in 44 evolved diploid isolates. We detected 82 SNPs and 1 indel across 22 evolved isolates in hydroxyurea (range 1–8 per isolate), containing 10 adaptive mutations in RNR2 and 12 in RNR4 (Figs. S5 and S6). There were 56 SNPs and 1 indel across 22 evolved isolates in rapamycin (range 0–6 per isolate), which contained 8 adaptive mutations in FPR1 and 5 in TOR1 (Table S2). 33 out of 36 mutations detected by whole-population sequencing across WAxNA populations could be found in clonal isolates. All de novo driver mutations found by clone sequencing were confirmed by targeted Sanger sequencing.
Sequence analysis revealed that 3 out of 4 unique variants in RNR2 (N151H, E154G and Y169H) and 2 out of 3 unique variants in RNR4 (R34G/I) mapped to a conserved domain of the ribonucleotide-diphosphate reductase small chain. FPR1 mutations occurred at W66, either introducing a premature stop codon or changing to serine. Previous studies indicate that the majority of non-synonymous changes in FPR1 affect protein stability (Koser et al., 1993). Furthermore, the premature stop at W66 truncated the residue required for rapamycin binding (Y89). We observed clones carrying the W66* mutation selected multiple times from the same founder population, indicating a pre-existent individual carrying a heterozygous mutation and independent LOH events that render the loss-of-function mutation homozygous (Fig. S6B). All five driver SNPs in TOR1 (S1972I/R, W2038L/C and F2045L) mapped to the FKBP12-rapamycin-binding (FRB) domain, which is ~100 aa long, providing a mechanistic explanation of the drug resistance (Fig. S6A). Previous studies have found dominant mutations in S1972 and equivalent mutations in the mammalian TOR (mTOR) have a similar effect on drug binding. Substitutions at W2038 with a similar dominant effect are equivalent by homology to those previously described in TOR2 (W2042) (Lorenz and Heitman, 1995).
Using background variants as markers, we could detect mis-segregation of chromosomes leading to loss-of-heterozygosity. The presence or absence of the WA or the NA allele provides a robust signal of heterozygosity or LOH that is not affected by sampling noise in coverage. We used cloneHD to genotype the sequenced isolate samples at segregating sites. We then grouped isolate sequences by subclone lineage, requiring at least 80% genotype similarity. In hydroxyurea, this resulted in 22 isolates stemming from 8 clonal backgrounds, with more than a single isolate each. In rapamycin, 22 isolates were assigned to 4 clonal backgrounds, with more than a single isolate each. For each clonal background, we inferred its ancestral genotype. In case of a locus with a unique genotype across all isolates this value was assigned to be the ancestral state. In all other cases the ancestral state was set to be heterozygous as lost alleles cannot be regained. We then annotated all the isolates from each clone for LOH events. In addition, we segmented the coverage depth as a function of genomic position with cloneHD and found instances of copy number gains (2n>3n) in chromosomes VIII, IX and X in hydroxyurea and chromosome IX in rapamycin, as well as whole-genome copy loss (2n>n) in rapamycin. Figure S6 shows the inferred ancestral genotypes and the derived SNPs, indels, LOH events and copy number variants, grouped by population and clonal background.
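The ancestral-state rule and the LOH annotation can be written compactly. The sketch below assumes that genotypes at segregating sites are coded as 'WA' or 'NA' for homozygous loci and 'het' for heterozygous loci. This coding, the function names and the toy genotypes are hypothetical.

```python
def similarity(g1, g2):
    """Fraction of loci at which two isolates share the same genotype."""
    return sum(a == b for a, b in zip(g1, g2)) / len(g1)

def ancestral_genotype(isolates):
    """Unique genotype across isolates is ancestral; otherwise heterozygous."""
    return [column[0] if len(set(column)) == 1 else "het"
            for column in zip(*isolates)]

def loh_loci(isolate, ancestor):
    """Indices where an ancestrally heterozygous locus has become homozygous."""
    return [i for i, (g, a) in enumerate(zip(isolate, ancestor))
            if a == "het" and g != "het"]

# Three toy isolates from one clonal background, four segregating sites
isolates = [
    ["het", "WA", "NA", "het"],
    ["WA",  "WA", "NA", "het"],
    ["het", "WA", "NA", "NA"],
]
ancestor = ancestral_genotype(isolates)
events = [loh_loci(iso, ancestor) for iso in isolates]
```

Under this rule an LOH event shared by every isolate of a clone is indistinguishable from ancestral homozygosity, which is one reason why the resulting counts are conservative.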
To exemplify the interaction between pre-existing and de novo variation, inspection of de novo mutations in the WAxNA F12 1 HU 3 population shows that one SNP in RNR2 spans six isolates, being part of an expanding subclone (see Figs. S4 and S6A). These isolates have further diversified by acquiring passenger SNPs and indels, and undergoing LOH. As discussed in the main text, clones C5 and C6 grow faster than the other four and both have a large LOH event in chromosome II that is not present in the other isolates, possibly providing the growth advantage. In the case of the WAxNA F12 1 RM 1 population, isolate C1 is haploid genome-wide whereas isolate C3 from the same expanding clone is diploid with prevalent LOH. Both individuals have a homozygous driver mutation in FPR1, as is required for the resistance phenotype.
To determine the rate of LOH events, we counted the number of independent events within a chromosome that have led to the gain or loss of an ancestral allele in the evolved isolate sequences. This estimate is challenging given the ancestral states contain both homozygous and heterozygous loci, so that the precise end points of individual LOH events are not clear. To obtain a lower bound, we counted whether any isolate had undergone LOH affecting ≥ 10 consecutive background variants, for each chromosome in each clone. We found 48 events in hydroxyurea and 24 events in rapamycin (0.38 per chromosome per clone). We excluded two haploid individuals from this counting as well as from the length distribution of homozygosity tracts in Fig. 3A. We detected two events of cross-contamination between populations, but sampled isolates from these populations are still independent samples from the corresponding clones and therefore valid for the LOH analysis. The contamination detected however means that those de novo mutations involved should not be counted to have arisen independently.
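The lower-bound count can be illustrated with a short sketch that flags a chromosome when any run of at least ten consecutive background variants shows LOH. The rate is then computed under the assumption that the denominator is the 16 yeast chromosomes multiplied by the 8 + 4 clonal backgrounds reported above. That interpretation of the denominator, and the function name, are assumptions.

```python
def has_loh_tract(loh_flags, min_run=10):
    """True if at least `min_run` consecutive background variants show LOH."""
    run = 0
    for flag in loh_flags:
        run = run + 1 if flag else 0
        if run >= min_run:
            return True
    return False

# Events reported in hydroxyurea and rapamycin
n_events = 48 + 24
# Assumed denominator: 16 chromosomes x (8 + 4) clonal backgrounds
n_chromosome_clone_pairs = 16 * (8 + 4)
rate = n_events / n_chromosome_clone_pairs
```

Under these assumptions the rate evaluates to 0.375, which agrees with the reported value of 0.38 per chromosome per clone after rounding.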
Population averaging vs. individual measurements
Here we describe how we measured the growth properties of an ensemble of cells in a heterogeneous population where multiple subclones, i.e. several haplotypes, may be present. An ensemble method typically measures the population average. However, since we found co-existing subclones, individual cells may be in states that are far from the population mean. Hence, we determined the intra-population growth rate of the populations at the start and the end of the selection phase (Fig. 2 and Fig. S7). For each population, we estimated the probability distribution of the growth rate λ at time t by sampling k isogenic individuals. With an ensemble of k = 96 individuals per time point, we took n = 32 replicate measurements per individual. The replicates were measured in two independent runs, evenly distributed over 16 experimental plates which were initiated from a single pre-culture plate and run in 4 scanners, all in parallel.
We modeled the probability distribution of the data as a mixture model of normal distributions, $p(\lambda) = \sum_{i} \pi_i \, \mathcal{N}(\lambda \mid \mu_i, \sigma_i^2)$, where $\pi_i$ are the mixing coefficients. We can interpret the mixing coefficients as the bulk and multiple clonal components. We determined the fraction $F_t$ of cells in the fitter, faster growth state by fitting $p(\lambda)$ to a mixture of two normal distributions, with five fitting parameters: the two means and variances, and the relative weights between them. In bimodal populations, the weights provide $F_t$ and the estimates are in good agreement with the average of the two inflection points surrounding the trough between the bulk and the clonal subpopulations.
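As an illustration of the two-component fit, the sketch below runs expectation-maximization (EM) on simulated growth rates. EM is one common way to fit such a mixture and is an assumption here, since the text does not state which fitting procedure was used. The simulated means, spreads and the 70/26 split of the 96 individuals are made up.

```python
import math
import random


def normal_pdf(x, mean, var):
    """Density of a normal distribution with the given mean and variance."""
    return math.exp(-0.5 * (x - mean) ** 2 / var) / math.sqrt(2.0 * math.pi * var)


def fit_two_gaussians(data, n_iter=200):
    """Fit a two-component 1D normal mixture by EM.

    Returns (weights, means, variances) for the two components.
    """
    data = sorted(data)
    n = len(data)
    half = n // 2
    # Initialise the means from the lower and upper halves of the data.
    means = [sum(data[:half]) / half, sum(data[half:]) / (n - half)]
    grand_mean = sum(data) / n
    grand_var = sum((x - grand_mean) ** 2 for x in data) / n
    variances = [grand_var, grand_var]
    weights = [0.5, 0.5]
    for _ in range(n_iter):
        # E-step: responsibility of each component for each data point.
        resp = []
        for x in data:
            p = [weights[c] * normal_pdf(x, means[c], variances[c]) for c in range(2)]
            total = max(p[0] + p[1], 1e-300)  # guard against underflow
            resp.append([p[0] / total, p[1] / total])
        # M-step: update weights, means and variances.
        for c in range(2):
            n_c = sum(r[c] for r in resp)
            weights[c] = n_c / n
            means[c] = sum(r[c] * x for r, x in zip(resp, data)) / n_c
            var_c = sum(r[c] * (x - means[c]) ** 2 for r, x in zip(resp, data)) / n_c
            variances[c] = max(var_c, 1e-12)
    return weights, means, variances


# Simulated population: 70 bulk and 26 faster-growing clonal individuals.
rng = random.Random(1)
growth_rates = [rng.gauss(0.0, 0.05) for _ in range(70)]
growth_rates += [rng.gauss(0.5, 0.05) for _ in range(26)]

weights, means, variances = fit_two_gaussians(growth_rates)
fast = max(range(2), key=lambda c: means[c])  # component with the larger mean
fraction_fast = weights[fast]  # plays the role of F_t
```

The weight of the component with the larger mean plays the role of $F_t$; with well-separated modes it should be close to the simulated fraction of 26/96.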
Validation of putative driver genes
To test candidate driver mutations, we measured the growth rate of engineered gene deletions described in ‘Engineered genetic constructs’ to confirm whether their knockouts are beneficial. We also measured the growth of hemizygous strains to test allelic differences in driver genes with pre-existing and newly acquired mutations. The engineered genetic constructs are listed in Table S4. We performed n = 64 replicate measurements of each construct in two independent runs, which were initiated from a single pre-culture plate, evenly distributed over 16 experimental plates and simultaneously run in 4 scanners. The growth rate of each of these strains λbg is shown in Figs. S9 and S10, labeled by genetic background b and genotype g.
We deleted one copy of RNR2 in WA and NA diploids. Sporulation of these strains resulted in tetrads with two viable spores and two unviable rnr2Δ mutants, indicating that this gene is essential in both backgrounds. RNR2 is also essential in the laboratory S288C background. Furthermore, diploids carrying a heterozygous RNR2 deletion show strong haploinsufficiency for hydroxyurea resistance (Fig. S9). In contrast to its interaction partner, RNR4 is not essential in the laboratory background. However, deleting this gene in diploid WA and NA backgrounds showed that it is essential in the WA background. The NA strain is viable after deletion, though with severe growth defects. Diploid strains hemizygous for RNR4 show increased sensitivity in both backgrounds due to dosage effects (Fig. S9).
FPR1 and TOR1 are not essential genes and we performed deletions in both haploids and diploids. FPR1 directly binds rapamycin, inhibiting the TOR pathway, and its deletion confers high resistance (Fig. S10). Deletion of one copy of FPR1 does not increase the growth rate in rapamycin, indicating that both copies of the gene need to be inactivated to drive resistance. Consistent with this observation, all mutations observed in FPR1 are homozygous (see ‘Luria-Delbrück fluctuation assay’ for mechanisms leading to homozygosity). Large colonies in the FPR1 plating assay all acquired double-hit events (de novo SNP or indel plus LOH) that inactivated both functional copies of the gene (inset in Fig. S10). Estimates of the number of colonies for parent and hybrid backgrounds follow a similar trend to the estimates obtained with the fluctuation test. In contrast, TOR1 deletion results in high sensitivity to rapamycin and a single deleted copy does not alter the drug response (Fig. S10).
Reciprocal hemizygosity tests in ancestral hybrids confirmed background-dependent effects in CTF8, with strong positive selection acting on the NA allele as predicted by our model of driver-passenger dynamics. KOG1, which is a component of the TOR signalling pathway, did not show any allelic differences, but deleting either copy caused haploinsufficiency in rapamycin. No allelic differences were observed for DEP1, INP54 and YNR066C, which are confirmed as passengers. We also deleted either the wild-type or the mutated allele of evolved mutant clones, generating pairs of clones identical throughout the genome except for the candidate driver mutation. The four genes harboring driver de novo mutations do not appear to show allelic differences between the two parental backgrounds as shown by the reciprocal hemizygosity test.
Background-averaged fitness effects
We carried out a genetic cross to reconstruct a fraction of the genotypes that a population can explore and examined the average mutational effect of beneficial variants in multiple genetic backgrounds. We isolated isogenic individuals from parents, ancestral and evolved populations. As described in ‘Genetic cross’, we sporulated these diploid cells and selected haploid segregants of each mating type (48 in hydroxyurea and 56 in rapamycin), parameterized by an index a or α. We crossed the MATa and MATα versions to create hybrids. The cross forms a two-dimensional lattice that is conveniently parameterized by the set of lattice positions a, α.
We obtained a set of measurements for the growth rate λ of individuals, each of which has a unique combination of background genotype b, de novo genotype d, sampling time t and auxotrophy x. Every haploid genome being crossed is an independent background indexed by $b_{\{a,\alpha\}} = 1, 2, \ldots, n_b$ ($n_b = 48$ in HU and $n_b = 56$ in RM, for either a or α), such that reshuffled diploid hybrids are parameterized by $b_{a\alpha}$. Genetic backgrounds are sampled before the cross (parents), before selection starts at t = 0 (ancestral) or after t = 32 days (evolved), such that $t_{\{a,\alpha\}} = 1, 2, \ldots, n_t$ ($n_t = 2$ for the parents; $n_t = 4$ at t = 0; $n_t = 42$ in HU and $n_t = 46$ in RM at t = 32). We denote de novo genotypes by $d_{\{a,\alpha\}} = 1, 2, \ldots, n_d$ ($n_d = 12$ for RNR2; $n_d = 9$ for RNR4; $n_d = 1$ without driver; $n_d = 4$ for FPR1; $n_d = 20$ for TOR1). Haploid spores are auxotrophic and the auxotrophy segregates with the mating locus, such that $x_{\{a,\alpha\}} \in \{$ura3-, lys2-$\}$, whereas diploid hybrids do not have amino acid deficiencies. To estimate the measurement error, we carried out $n_r$ replicate measurements of each unique spore ($n_r = 12$ in HU and $n_r = 6$ in RM) and of each hybrid genotype combination ($n_r = 3$). Replicates were initiated from the same pre-culture plate, evenly distributed over 32 plates and run in 4 scanners, all in parallel.
The data matrix shows the fitness effect of every de novo genotype d at each background b sampled at time t, averaged over measurement replicates and measured relative to the ancestral population (Fig. S11). Based on these measurements, we observed that de novo mutations are beneficial, yet their association with genetic backgrounds has idiosyncratic effects. The effects of de novo mutations are mediated by background fitness, as evidenced by the large phenotypic variance. Genetic crosses between different backgrounds need not give rise to a ‘symmetric’ phenotype, as we only enforce 2:2 segregation for the mating locus MATa/α. Whilst background variants will co-segregate with the mating locus, de novo mutations need not.
To examine the mean effects of functional genotypes in hydroxyurea (RNR2, RNR4) or rapamycin (FPR1, TOR1), we calculated an ensemble average of the growth rate λ over pairs of backgrounds baα with different degrees of relatedness, with 〈…〉 the mean over genetic backgrounds. We found that, on average, RNR2, RNR4 and TOR1 mutations are dominant and highly penetrant (Fig. 4D and F, and Fig. S12B and D). In contrast, FPR1 is recessive and only increases fitness when the mutation is homozygous and carries a fitness cost (Fig. 4F and Fig. S12D, respectively).
We partitioned the variation in fitness contributed by background and de novo driver mutations using linear mixed models. To model genetic backgrounds containing beneficial mutations we need to describe how likely a phenotype is in the presence or absence of any mutation. We restricted our model to pairs of individuals that are not closely related to avoid spurious correlations due to population structure, so we retained ancestral and evolved individuals and excluded the parents. We are interested in the aggregate effect across all mutations within a spore or hybrid rather than the effects of individual variants. As the data represent a finite sample from the distribution of all possible genetic backgrounds, the background contribution to the phenotype is naturally modeled as a random-effect term (i.e. individual genetic backgrounds are drawn at random from a population, and the variance of the underlying distribution is to be inferred). In addition, other systematic effects that potentially contribute to fitness are modeled as fixed-effect terms: (i) time t when the individual was sampled, i.e. at t = 0 (ancestral) or t = 32 days (evolved); (ii) de novo driver mutation status d of the individual, e.g. FPR1 driver mutation in homozygous state; and (iii) auxotrophy, denoted by x, e.g. ura3- or lys2-. We used the R-package lme4 to implement four nested linear mixed models outlined below (Bates et al., 2015).
Model 1
We first considered a model where we only included the background without other effects. This means that the observed growth rate $\lambda_b$ for a background $b$, conditioned on the random effect taking a value $\beta_b$, is distributed as
$$\lambda_b \mid \beta_b \sim \mathcal{N}(\beta_0 + x_b \beta_b, \, \sigma_\epsilon^2),$$
where $\beta_0$ is a constant shared by all backgrounds that needs to be inferred, $\sigma_\epsilon^2$ represents measurement noise, and $x_b$ is an element of the model design matrix (here 1 for each $b$, as every background is assigned a value). Finally, the background growth rate is distributed as $\beta_b \sim \mathcal{N}(0, \Sigma^2)$; its variance $\Sigma^2$ is a model parameter to be inferred. We note that for each background $b$ we have multiple measurement replicates of $\lambda_b$. Altogether, Model 1 has three model parameters, $\beta_0$, $\Sigma^2$ and $\sigma_\epsilon^2$.
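To make the structure of Model 1 concrete, the sketch below simulates a random-intercept model with one intercept, one background variance and one noise variance, then recovers the three parameters with a balanced one-way ANOVA (method-of-moments) estimator. This estimator is a simpler stand-in chosen for illustration, not the restricted maximum likelihood fit performed with lme4. All numerical values and function names are assumptions.

```python
import random


def simulate_model1(n_backgrounds, n_replicates, beta0, sd_background, sd_noise, seed=0):
    """Simulate replicate growth rates with a random background effect."""
    rng = random.Random(seed)
    data = {}
    for b in range(n_backgrounds):
        beta_b = rng.gauss(0.0, sd_background)  # random effect of background b
        data[b] = [beta0 + beta_b + rng.gauss(0.0, sd_noise) for _ in range(n_replicates)]
    return data


def anova_variance_components(data):
    """Method-of-moments estimates for a balanced one-way random-effects design.

    Returns (intercept, background variance, noise variance).
    """
    groups = list(data.values())
    n_b = len(groups)
    n_r = len(groups[0])
    group_means = [sum(g) / n_r for g in groups]
    grand_mean = sum(group_means) / n_b
    # Within-background mean square estimates the noise variance.
    ms_within = sum(
        (y - m) ** 2 for g, m in zip(groups, group_means) for y in g
    ) / (n_b * (n_r - 1))
    # Between-background mean square has expectation noise + n_r * background.
    ms_between = n_r * sum((m - grand_mean) ** 2 for m in group_means) / (n_b - 1)
    var_background = max((ms_between - ms_within) / n_r, 0.0)
    return grand_mean, var_background, ms_within


# Hypothetical settings: 48 backgrounds with 12 replicates each.
simulated = simulate_model1(48, 12, beta0=1.0, sd_background=0.3, sd_noise=0.1)
beta0_hat, var_bg_hat, var_noise_hat = anova_variance_components(simulated)
```

Because replicates of the same background share one draw of the random effect, the spread between background means carries the information about the background variance, while the spread within backgrounds informs the noise variance.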
Models 2, 3 and 4
Model 2 includes the same factors as Model 1, but the time of sampling t is nested as a fixed effect. Model 3 also accounts for de novo driver mutation status denoted by d. In addition, Model 4 includes a fixed effect accounting for amino acid deficiencies (or auxotrophy), denoted by x. Altogether the growth rate $\lambda_{btdx}$, conditioned on the random effect taking a value $\beta_b$, is distributed as:
$$\lambda_{btdx} \mid \beta_b \sim \mathcal{N}(\beta_0 + x_b \beta_b + x_t \beta_t + x_d \beta_d + x_x \beta_x, \, \sigma_\epsilon^2),$$
where $\beta_t$, $\beta_d$, $\beta_x$ are fixed-effect terms to be inferred, $x_t$, $x_d$, $x_x$ are elements of the model design matrix and $\sigma_\epsilon^2$ represents measurement noise. Compared to Model 1, Models 2, 3 and 4 have extra parameters $\beta_t$, $\beta_d$, $\beta_x$. The number of free parameters depends on how many unique levels each factor contains, e.g. how many driver mutations are sampled in the experiment.
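The likelihood for a data vector $\boldsymbol{\lambda}$ given the full model (Model 4) can then be written as
$$\mathcal{L}(\boldsymbol{\theta} \mid \boldsymbol{\lambda}) = \int p(\boldsymbol{\lambda} \mid \boldsymbol{\beta}_b, \boldsymbol{\theta}) \, p(\boldsymbol{\beta}_b \mid \boldsymbol{\theta}) \, d\boldsymbol{\beta}_b,$$
where $\boldsymbol{\theta}$ collects the model parameters, $\boldsymbol{\beta}_b$ is the vector of random effects, and the integrand is the product of the probability density given by Eq. 9 and the distribution over the random effects.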
Next, we applied all four models to the phenotypes of the genetic cross: a genetic cross based on hydroxyurea selection, measured in hydroxyurea and a control environment; and a genetic cross based on rapamycin selection, measured in rapamycin and a control environment, both for spores and hybrids. We fitted each model using restricted maximum likelihood with the R-package lme4 (Bates et al., 2015), summarized in Table S7. Using Akaike’s Information Criterion (AIC) for model selection, all conditions had a score supporting Model 4 apart from those selected and measured in hydroxyurea, where both spores and hybrids supported Model 3. We compared the fitted and observed values and in all cases the fits were good, as shown in Fig. S13 for Model 4.
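The model-selection step can be sketched with the standard definition AIC = 2k − 2 ln L, where k is the number of free parameters and L the maximized likelihood. The log-likelihoods and parameter counts below are invented for illustration and are not the values in Table S7.

```python
def aic(log_likelihood, n_params):
    """Akaike's Information Criterion; lower values indicate more support."""
    return 2.0 * n_params - 2.0 * log_likelihood


def select_by_aic(fits):
    """Pick the model with the lowest AIC.

    `fits` maps a model name to (maximized log-likelihood, number of parameters).
    """
    scores = {name: aic(ll, k) for name, (ll, k) in fits.items()}
    best = min(scores, key=scores.get)
    return best, scores


# Hypothetical fits for one condition (made-up numbers).
toy_fits = {
    "Model 1": (-520.0, 3),
    "Model 2": (-505.0, 4),
    "Model 3": (-470.0, 6),
    "Model 4": (-469.5, 7),
}
best_model, aic_scores = select_by_aic(toy_fits)
```

In this toy case Model 4 improves the log-likelihood only marginally over Model 3, so the penalty for its extra parameter leaves Model 3 with the lower AIC.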
We can assess the overall goodness-of-fit of the models by the proportion of variance explained. In particular, we would like to know the contribution of various model components to the overall fit, and to do so we obtain separate measures for the partial contributions of fixed and random effects (Gelman and Hill, 2006),
$$r^2 = \frac{\sigma_f^2 + \sigma_r^2}{\sigma_f^2 + \sigma_r^2 + \sigma_\epsilon^2},$$
where $\sigma_r^2$ is the variance contribution by random effects, any incremental fixed effect contributes additively to the fixed-effect variance, s.t. $\sigma_f^2 = \sigma_t^2 + \sigma_d^2 + \sigma_x^2$, $\sigma_\epsilon^2$ is the residual variance, and $r^2$ represents the proportion of variance explained by the fixed and random effects combined. Dropping the term $\sigma_r^2$ from the numerator, we can evaluate $r_f^2$, the proportion explained by the fixed-effects variance for linear mixed models, as described in (Gelman and Hill, 2006), and estimate the background contribution to the variance by $r^2 - r_f^2$. Then, to further partition the fixed-effect variance into individual contributions, we used the simpler models and their estimated $\sigma_f^2$. The variance components shown in Fig. 4D have been constructed this way. We note that modeling the background using fixed effects instead leads to a variance decomposition that is nearly identical to that with linear mixed models as described here. However, this requires a large number of parameters (one extra parameter per background), and thus describing the background by random effects is a better model for the data.
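A minimal sketch of this variance partition is given below. It assumes that the proportion of variance explained is the ratio of the fixed-effect plus random-effect variance to the total variance (fixed, random and residual), and that incremental fixed-effect contributions are obtained as differences between nested models. Both the assumed form and all numbers are illustrative and not taken from the fitted models.

```python
def variance_explained(var_fixed, var_random, var_residual):
    """Split the proportion of variance explained into its components.

    Returns (r2 total, r2 from fixed effects, r2 from the random background).
    """
    total = var_fixed + var_random + var_residual
    r2_total = (var_fixed + var_random) / total
    r2_fixed = var_fixed / total  # random-effect term dropped from numerator
    r2_background = r2_total - r2_fixed
    return r2_total, r2_fixed, r2_background


def incremental_fixed_effects(var_fixed_by_model):
    """Attribute fixed-effect variance to each factor added in nested models.

    `var_fixed_by_model` lists (factor added, fixed-effect variance) in order.
    """
    contributions = {}
    previous = 0.0
    for factor, var_fixed in var_fixed_by_model:
        contributions[factor] = var_fixed - previous
        previous = var_fixed
    return contributions


# Made-up variance estimates for one condition.
r2_total, r2_fixed, r2_background = variance_explained(
    var_fixed=0.04, var_random=0.05, var_residual=0.01
)
by_factor = incremental_fixed_effects(
    [("time", 0.010), ("driver", 0.035), ("auxotrophy", 0.040)]
)
```

With these toy numbers the background accounts for half of the total variance, and most of the fixed-effect variance is attributed to the factor added in the second nested step.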
Author Contributions
I.V.-G., J.W., V.M. and G.L. designed research; I.V.-G., F.S., J.L., A.F., B.B., J.H., A.B., E.A.P. and V.M. performed research; I.V.-G. and V.M. analyzed data; and I.V.-G., V.M. and G.L. wrote the paper.
V. ACKNOWLEDGMENTS
We thank Agnès Llored, Jordi Tronchoni and Martin Zackrisson for technical help; Elizabeth Gibson for support with library preparation and sequencing; and Erik Garrison, Daniel Kunz, Leopold Parts, David Posada and Magda Reis for critical reading of the manuscript. We also thank participants of the program on Evolution of Drug Resistance held at the Kavli Institute for Theoretical Physics (University of California, Santa Barbara) for discussions. I.V.-G. is a recipient of a Wellcome Trust PhD fellowship and a Sanger Early Career Innovation Award. This research was supported by the Wellcome Trust to I.V.-G. (grant number WT097678) and to V.M. (grant number WT098051), by Fundación Ibercaja to I.V.-G., by ATIP-Avenir (CNRS/INSERM), Fondation ARC (grant number SFI20111203947), FP7-PEOPLE-2012-CIG (grant number 322035), the French National Research Agency (grant numbers ANR-13-BSV6-0006-01 and 11-LABX-0028-01), Cancéropôle PACA (AAP emergence) and DuPont Young Professor Award to G.L. F.S. was supported by ATIP-Avenir (CNRS/INSERM), Becas Chile, CONICYT/FONDECYT (grant number 3150156) and MN-FISB (grant number NC120043) postdoctoral fellowships. A.F. was supported by the German Research Foundation (grant number FI 1882/1-1), J.L. by Fondation ARC (grant number PDF20140601375), B.B. by La Ligue Contre le Cancer (grant number GB-MA-CD-11287) and J.H. by the French National Research Agency (grant number 11-LABX-0028-01).