Abstract
The advent of next generation sequencing technologies has made whole-genome and whole-population sampling possible, even for eukaryotes with large genomes. With this development, experimental evolution studies can be designed to observe molecular evolution “in-action” via Evolve-and-Resequence (E&R) experiments. Among other applications, E&R studies can be used to locate the genes and variants responsible for genetic adaptation. Existing literature on time-series data analysis often assumes large population size, accurate allele frequency estimates, and wide time spans. These assumptions do not hold in many E&R studies.
In this article, we propose a method, Composition of Likelihoods for Evolve-And-Resequence experiments (Clear), to identify signatures of selection in small-population E&R experiments. Clear takes whole-genome sequencing of pools of individuals (pool-seq) as input, and properly addresses heterogeneous ascertainment bias resulting from uneven coverage. Clear also provides unbiased estimates of model parameters, including population size, selection strength, and dominance, while being computationally efficient. Extensive simulations show that Clear achieves higher power in detecting and localizing selection over a wide range of parameters, and is robust to variation in coverage. We applied the Clear statistic to multiple E&R experiments, including data from a study of D. melanogaster adaptation to alternating temperatures and a study of outcrossing yeast populations, and identified multiple regions under selection with genome-wide significance.
1 Introduction
Natural selection is a key force in evolution, and a mechanism by which populations can adapt to external ‘selection’ pressure. Examples of adaptation abound in the natural world [22], including classic examples such as lactose tolerance in Northern Europeans [9] and human adaptation to high altitudes [55, 69], as well as drug resistance in pests [15], HIV [24], cancer [27, 70], the malarial parasite [3, 44], and others [56]. In these examples, understanding the genetic basis of adaptation can provide valuable information, underscoring the importance of the problem.
Experimental evolution refers to the study of evolutionary processes of a model organism in a controlled [7, 10, 28, 37, 38, 46, 47] or natural [5, 8, 16, 17, 41, 50, 68] environment. Recent advances in whole-genome sequencing have enabled us to sequence populations at a reasonable cost, even for large genomes. Perhaps more important for experimental evolution studies, we can now evolve and resequence (E&R) multiple replicates of a population to obtain longitudinal time-series data, in order to investigate the dynamics of evolution at the molecular level. Although constraints such as small population sizes, limited timescales, and oversimplified laboratory environments may limit the interpretation of E&R results, these studies are increasingly being used to test a wide range of hypotheses [34] and have been shown to be more predictive than static data analysis [12, 18, 52]. In particular, longitudinal E&R data are being used to estimate model parameters including population size [33, 49, 60, 64, 65, 67], strength of selection [11, 29, 30, 40, 43, 57, 60], allele age [40], recombination rate [60], mutation rate [6, 60], and quantitative trait loci [4], and for tests of neutrality hypotheses [8, 13, 23, 60].
While many E&R study designs are being used [6, 53], we restrict our attention to adaptive evolution due to standing variation in fixed-size populations. This regime has been considered earlier, typically with D. melanogaster as the model organism of choice, to identify adaptive genes in longevity and aging [13, 51] (600 generations), courtship song [63] (100 generations), hypoxia tolerance [71] (200 generations), adaptation to new laboratory environments [26, 46] (59 generations), egg size [32] (40 generations), C virus resistance [42] (20 generations), and dark-fly [31] (49 generations).
The task of identifying selection signatures can be addressed at different levels of specificity. At the coarsest level, identification could simply refer to deciding whether some genomic region (or a gene) is under selection or not. In the following, we refer to this task as detection. In contrast, the task of site-identification corresponds to the process of finding the favored mutation/allele at the nucleotide level. Finally, estimation of model parameters, such as the strength of selection and dominance at the site, can provide a comprehensive description of the selection process.
In the effort to analyze E&R selection experiments, many authors chose to adapt existing tests, originally designed for static data, pairwise comparisons (two time points), and single replicates, to perform a genome-wide scan. For instance, Zhu et al. [71] used the ratio of the estimated population sizes of case and control populations to compute a test statistic for each genomic region. Burke et al. [13] applied the Fisher exact test to the last observation of data on case and control populations. Orozco-terWengel et al. [46] used the Cochran-Mantel-Haenszel (CMH) test [1] to detect SNPs whose read counts change consistently across all replicates of two-time-point data. Turner et al. [63] proposed the diffStat statistic to test whether the change in allele frequencies of two populations deviates from the distribution of change in allele frequencies of two drifting populations. Bergland et al. [8] calculated Fst between populations over time to quantify their differentiation from the ancestral population (two-time-point data) as well as from geographically distinct populations. Jha et al. [32] computed the test statistic of a generalized linear mixed model directly from read counts.
Alternatively, direct methods have been developed that analyze time-series data by taking a likelihood approach and estimating population genetics parameters. Bollback et al. [11] proposed a Hidden Markov Model (HMM) to estimate the selection coefficient s and population size by using a diffusion approximation to the continuous Wright-Fisher Markov process. Steinrücken and Song [57] proposed a general diploid selection model which takes into account the dominance of the favored allele and approximates the likelihood analytically. Mathieson and McVean [43] adapted HMMs to structured populations and estimated parameters using an Expectation Maximization (EM) procedure on discretized allele frequencies. Feder et al. [23] modeled increments in allele frequency with a Brownian motion process, and proposed the Frequency Increment Test (FIT). More recently, Topa et al. [62] proposed a Gaussian Process (GP) for modeling single-locus time-series pool-seq data. Terhorst et al. [60] extended GP to compute the joint likelihood of multiple loci under null and alternative hypotheses. Recently, Schraiber et al. [54] proposed a Bayesian framework to estimate parameters using Markov chain Monte Carlo sampling.
While existing methods have been successfully applied in their respective settings, they make assumptions that may not hold in E&R studies. First, they assume that the underlying population size is large, so that it is reasonable to model the dynamics of allele frequencies using continuous-state models. Second, a number of existing methods were originally designed for wide time spans, such as ancient DNA studies. Finally, they assume that input data come in the form of unbiased allele frequencies, which may not be valid for shotgun sequencing experiments.
Here, we consider a Hidden Markov Model (HMM), similar to those of Williamson et al. [67] and Bollback et al. [11], but under a “small-population-size” regime. Specifically, we use a discrete state (frequency) model. We show that for small population sizes, discrete models can compute the likelihood exactly, which improves statistical performance, especially for short-time-span experiments. Additionally, we add another level of sampling noise to the traditional HMM, allowing for heterogeneous ascertainment bias due to uneven coverage among variants. We show that for a wide range of parameters, Clear provides higher power for detecting selection, estimates model parameters consistently, and localizes the favored allele more accurately compared to state-of-the-art methods, while being computationally efficient.
2 Materials and Methods
Consider a panmictic diploid population with a fixed size of N individuals. Let ν = {νt}t∈𝒯 be the frequencies of the derived allele at generations t ∈ 𝒯 for a given variant, where at generations 𝒯 = {τi : 0 ≤ τ0 < τ1 < … < τT} samples of n individuals are chosen for pooled sequencing. The experiment is replicated R times. We denote the allele frequencies of the R replicates by the set {ν}R. To identify the genes and variants that are responding to selection pressure, we use the following procedure:
(i) Estimating population size. The procedure starts by estimating the effective population size, N̂, under the assumption that much of the genome is evolving neutrally.
(ii) Estimating selection parameters. For each polymorphic site, selection and dominance parameters s, h are estimated so as to maximize the likelihood of the time-series data, given N̂.
(iii) Computing likelihood statistics. For each variant, a log-odds ratio of the likelihood of selection model (s > 0) to the likelihood of neutral evolution/drift model is computed. Likelihood ratios in a genomic region are combined to compute the Clear statistic for the region.
(iv) Hypothesis testing. An empirical null distribution of the Clear statistic is calculated using genome-wide drift simulations, and used to compute p-values and thresholds for a specified FDR. We perform single locus hypothesis testing within selected regions to identify significant variants and report genes that intersect with the selected variants.
These steps are described in detail below.
2.1 Estimating Population Size
Methods for estimating population sizes from temporal neutral evolution data have been developed [2, 11, 33, 60, 67]. Here, we aim to extend these models to explicitly model the sampling noise that arises in pool-seq data. Specifically, we model the variation in sequence coverage over different locations, and the noise due to sequencing only a subset of the individuals in the population. In addition, many existing methods [11, 23, 60, 62] are designed for large populations, and model frequency as a continuous quantity. We show that smooth approximations may be inadequate for the small populations, low starting frequencies, and sparse sampling (in time) that are typical in experimental evolution (see Results, Fig 3A-C, and Fig 2). To this end, we model the Wright-Fisher Markov process for generating pool-seq data (Fig S1) via a discrete HMM (Fig 1-B). We start by computing a likelihood function for the population size given neutral pool-seq data.
Likelihood for Neutral Model
We model the allele frequency counts 2Nνt as being sampled from a Binomial distribution. Specifically, the initial frequency ν0 is drawn from π, and 2Nνt+1 | νt ∼ Binomial(2N, νt), where π is the global distribution of allele frequencies in the base population. Here, we simply assume that π is the site frequency spectrum of a fixed-size neutral population (Fig S2). Note that π may depend on the demographic history of the founder lines.
To estimate the frequency after τ transitions, it is enough to specify the 2N × 2N transition matrix P(τ), where P(τ)[i,j] denotes the probability of a change in allele frequency from i/2N to j/2N in τ generations. Under neutrality, the single-generation transition probabilities are Binomial, P[i,j] = C(2N, j) (i/2N)^j (1 − i/2N)^(2N−j), and P(τ) = P^τ.
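As an illustration, the neutral transition matrix and its τ-step power can be assembled directly from Binomial probability mass functions. The sketch below is a minimal Python implementation, not the Clear code itself; the function name `neutral_transition` is ours, and states are indexed over all 2N+1 possible allele counts, including the absorbing boundaries.

```python
import numpy as np
from scipy.stats import binom

def neutral_transition(N, tau=1):
    """Neutral Wright-Fisher transition matrix raised to tau generations.

    Row i is the Binomial(2N, i/2N) pmf over all possible derived-allele
    counts j, i.e. P[i, j] = C(2N, j) (i/2N)^j (1 - i/2N)^(2N - j).
    """
    counts = np.arange(2 * N + 1)
    freqs = counts / (2.0 * N)
    # Broadcasting: rows index the current count i, columns the next count j.
    P = binom.pmf(counts[None, :], 2 * N, freqs[:, None])
    return np.linalg.matrix_power(P, tau)
```

States 0 and 2N are absorbing (loss and fixation), so their rows are unit vectors; every row remains a probability distribution after taking matrix powers.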
Furthermore, in an E&R experiment, n ≤ N individuals are randomly selected for sequencing. The sampled allele frequencies, {yt}t∈𝒯, are also Binomially distributed: 2n·yt | νt ∼ Binomial(2n, νt).
We introduce the 2N × 2n sampling matrix Y, where Y[i, j] stores the probability that the sample allele frequency is j/2n given that the true allele frequency is i/2N.
We denote the pool-seq data for that variant as {xt = ⟨ct, dt⟩}t∈𝒯, where dt and ct represent the coverage and the read count of the derived allele, respectively. Let {λt}t∈𝒯 be the sequencing coverage at different generations. Then, the observed data are sampled according to dt ∼ Poisson(λt) and ct | dt, yt ∼ Binomial(dt, yt).
The emission probability for an observed tuple xt = ⟨dt, ct⟩, given the population state νt = i/2N, marginalizes over the unobserved sample frequency: Pr(xt | νt = i/2N) = Σj Y[i,j] · C(dt, ct) (j/2n)^ct (1 − j/2n)^(dt−ct).
For 1 ≤ t ≤ T, 1 ≤ j ≤ 2N, let αt,j denote the probability of emitting x1, x2, …, xt and reaching state j at τt. Then, αt can be computed using the forward procedure [19], αt,j = (Σi αt−1,i P(δt)[i,j]) · Pr(xt | νt = j/2N), where δt = τt − τt−1. The joint likelihood of the observed data from R independent observations is the product of the per-replicate likelihoods, Pr({x}R | N) = Πr Σj α(r)T,j, where x = {xt}t∈𝒯. The graphical model and the generative process by which data are generated are depicted in Fig 1-B and Fig S1, respectively.
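The forward recursion, together with the sampling and emission steps, can be sketched as follows. This is an illustrative re-implementation under our own naming (`sampling_matrix`, `emission`, `forward_loglik`), not the released Clear code; it assumes the transition matrices between consecutive sampling times are supplied precomputed.

```python
import numpy as np
from scipy.stats import binom

def sampling_matrix(N, n):
    # Y[i, j]: P(sample count j | population count i) = Binom(j; 2n, i/2N).
    i = np.arange(2 * N + 1)[:, None] / (2.0 * N)
    j = np.arange(2 * n + 1)[None, :]
    return binom.pmf(j, 2 * n, i)

def emission(Y, n, d, c):
    # P(<d, c> | population state i) = sum_j Y[i, j] * Binom(c; d, j/2n).
    read_pmf = binom.pmf(c, d, np.arange(2 * n + 1) / (2.0 * n))
    return Y @ read_pmf

def forward_loglik(obs, P_steps, Y, n, pi):
    """obs: list of (depth, derived-read count) per sampling time;
    P_steps: precomputed transition matrices between consecutive samples;
    pi: distribution over the 2N+1 initial allele-count states."""
    alpha = pi * emission(Y, n, *obs[0])
    for (d, c), P in zip(obs[1:], P_steps):
        alpha = (alpha @ P) * emission(Y, n, d, c)
    return np.log(alpha.sum())
```

The per-replicate log-likelihoods from `forward_loglik` are summed over replicates, since replicates evolve independently given N.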
Finally, the last step is to compute an estimate N̂ that maximizes the likelihood of all M variants in the whole genome. Let x(r)i denote the time-series data of the i-th variant in replicate r. Then, N̂ = arg maxN Σi Σr log Pr(x(r)i | N).
2.2 Estimating Selection Parameters
Likelihood for Selection Model
Assume that the site is evolving under selection constraints s ∈ ℝ, h ∈ ℝ+, where s and h denote the selection strength and dominance parameters, respectively. By definition, the relative fitness values of genotypes 0|0, 0|1 and 1|1 are given by w00 = 1, w01 = 1 + hs and w11 = 1 + s. Then, νt+, the frequency at time τt + 1 (one generation ahead), can be estimated using νt+ = (w11·νt² + w01·νt(1 − νt)) / (w11·νt² + 2w01·νt(1 − νt) + w00·(1 − νt)²).
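The one-generation update is a short deterministic formula; a minimal sketch, with our own function name and assuming the fitness parameterization above:

```python
def one_generation(nu, s, h):
    """Deterministic one-generation update of the derived-allele frequency
    under selection with fitnesses w00 = 1, w01 = 1 + h*s, w11 = 1 + s."""
    w00, w01, w11 = 1.0, 1.0 + h * s, 1.0 + s
    # Mean fitness of the population at frequency nu.
    w_bar = w11 * nu**2 + 2 * w01 * nu * (1 - nu) + w00 * (1 - nu) ** 2
    return (w11 * nu**2 + w01 * nu * (1 - nu)) / w_bar
```

With s = 0 the update is the identity, and the boundary frequencies 0 and 1 are fixed points, as required of a selection-only update.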
The machinery for computing the likelihood of the selection parameters is identical to that for population size, except for the transition matrices. Hence, here we only describe the definition of the transition matrix Qs,h of the selection model. Let Q(τ)s,h[i,j] denote the probability of transition from i/2N to j/2N in τ generations; then the single-generation entries are Binomial around the post-selection frequency, Qs,h[i,j] = C(2N, j) (νi+)^j (1 − νi+)^(2N−j), where νi+ is the one-generation update of i/2N (See [20], Pg. 24, Eqn. 1.58-1.59).
The maximum likelihood estimates are given by ŝ, ĥ = arg maxs,h Πr Pr(x(r) | N̂, s, h).
Using grid search, we first estimate N (Eq. 8), and subsequently we estimate the parameters s, h (Eq. 12, Fig S3). By broadcasting and vectorizing the grid-search operations across all variants, a genome scan over millions of polymorphisms can be done in significantly less time than iterating a numerical optimization routine for each variant (see Results and Fig 4).
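A grid search of this kind can be sketched generically. The helper below is illustrative only (the actual scan precomputes transition matrices per grid point and vectorizes with numba); it assumes a likelihood function that is already vectorized over variants.

```python
import numpy as np

def grid_search(loglik_fn, data, s_grid, h_grid):
    """Evaluate loglik_fn(data, s, h) on a (s, h) grid and return the
    per-variant argmax; loglik_fn must return one value per variant."""
    scores = np.stack([[loglik_fn(data, s, h) for h in h_grid]
                       for s in s_grid])          # (|s|, |h|, n_variants)
    flat = scores.reshape(len(s_grid) * len(h_grid), -1)
    best = flat.argmax(axis=0)                    # flat grid index per variant
    s_hat = np.asarray(s_grid)[best // len(h_grid)]
    h_hat = np.asarray(h_grid)[best % len(h_grid)]
    return s_hat, h_hat, flat.max(axis=0)
```

Because all variants share the same grid of transition matrices, the expensive matrix computations are done once per grid point rather than once per variant.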
2.3 Empirical Likelihood Ratio Statistics
The likelihood-ratio statistic for testing directional selection, computed for each variant, is given by H = log( Pr(x | s̄, h = 0.5, N̂) / Pr(x | s = 0, N̂) ), where s̄ = arg maxs Pr(x | s, h = 0.5, N̂). Similarly, we can define a test statistic for testing whether selection is dominant, D = log( Pr(x | ŝ, ĥ, N̂) / Pr(x | s̄, h = 0.5, N̂) ).
While extending the single-locus WF model to multiple linked loci can improve the power of the model [60], it is computationally and statistically expensive to compute the exact likelihood. In addition, computing the linked-loci joint likelihood requires haplotype-resolved data, which pool-seq does not provide. Here, similar to Nielsen et al. [45], we calculate a composite likelihood ratio score for a genomic region, 𝓗 = (1/|L|) Σℓ∈L Hℓ, where L is a collection of segregating sites and Hℓ is the likelihood ratio score for each variant ℓ in L. The optimal value of the hyper-parameter L depends upon a number of factors, including the initial frequency of the favored allele, recombination rates, linkage of the favored allele to neighboring variants, population size, coverage, and time since the onset of selection (duration of the experiment). In S1 Text, we provide a heuristic to compute a reasonable value of L, based on experimental data.
We work with a normalized value of 𝓗, given by 𝓗* = (𝓗 − μ𝒞)/σ𝒞, where μ𝒞 and σ𝒞 are the mean and standard deviation of 𝓗 values in a large region 𝒞. We found different chromosomes to have different distributions of 𝓗 values, and therefore decided to use single chromosomes as 𝒞.
2.4 Hypothesis Testing
Single-Locus tests
Under neutrality, log-likelihood ratios can be approximated by a χ² distribution [66], and p-values can be computed directly. However, Feder et al. [23] showed that when the number of independent samples (replicates) is small, the χ² distribution is a crude approximation to the true null distribution and results in more false positives. Following their suggestion, we first compute the empirical null distribution using simulations with the estimated population size (See Fig S1). The empirical null distribution of the statistic H is used to compute p-values as the fraction of null values that exceed the test score. Finally, we use Storey and Tibshirani’s method [59] to control the False Discovery Rate in multiple testing.
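Computing empirical p-values from a simulated null distribution amounts to a rank computation; a sketch with a hypothetical helper, using a standard +1 pseudo-count so that p-values are never exactly zero:

```python
import numpy as np

def empirical_pvalues(stats, null_stats):
    """p-value = fraction of simulated null values >= observed statistic.

    Sorting the null once lets each lookup be a binary search, which
    matters when millions of variants are tested against a large null."""
    null_sorted = np.sort(null_stats)
    # Count of null values >= each observed statistic.
    ge = len(null_sorted) - np.searchsorted(null_sorted, stats, side='left')
    # +1 pseudo-count keeps p-values strictly positive.
    return (ge + 1) / (len(null_sorted) + 1)
```

The resulting p-values can then be passed to any FDR procedure, such as Storey and Tibshirani's q-value method mentioned above.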
Composite likelihood tests
Similar to single-locus tests, we compute the null distribution of the 𝓗* statistic using whole-genome simulations with the estimated population size, and subsequently compute FDR. The simulations for generating the null distribution of 𝓗* are described next.
2.5 Simulations
We use the same simulation procedure for two purposes. First, we use them to test the power of Clear against other methods in small genomic windows. Second, we use the simulations to generate the distribution of null values for the statistic to compute empirical p-values. We mainly chose parameters that are relevant to D. melanogaster experimental evolution [35]. See also Fig 1-A for illustration.
I. Creating initial founder-line haplotypes. Using msms [21], we created neutral populations of F founding haplotypes with the command $./msms <F> 1 -t <2μWNe> -r <2rNeW> <W>, where F = 200 is the number of founder lines, Ne = 10^6 is the effective founder population size, r = 2 × 10^-8 is the recombination rate, and μ = 2 × 10^-9 is the mutation rate. The window size W is used to compute θ = 2μNeW and ρ = 2rNeW. We chose W = 50Kbp for simulating individual windows for performance evaluations, and W = 20Mbp for simulating D. melanogaster chromosomes for p-value computations.
II. Creating initial diploid population. An initial set of F = 200 haplotypes was created in step I, and each was duplicated to create F homozygous diploid individuals, simulating the generation of inbred lines. N diploid individuals were then generated by sampling with replacement from the F individuals.
III. Forward Simulation. We used forward simulations for evolving populations under selection. We consider selection regimes in which the favored allele is chosen from standing variation (not de novo mutations). Given the initial diploid population, the position of the site under selection, selection strength s, number of replicates R = 3, recombination rate r = 2 × 10^-8, and sampling times 𝒯 = {0, 10, 20, 30, 40, 50}, simuPop [48] was used to perform forward simulation and compute allele frequencies for all of the R replicates. For hard-sweep (respectively, soft-sweep) simulations we randomly chose a site with initial frequency ν0 = 0.005 (respectively, ν0 = 0.1) to be the favored allele. For generating the null distribution with drift for p-value computations, we used this procedure with s = 0.
IV. Sequencing Simulation. Given allele frequency trajectories we sampled depth of each site in each replicate identically and independently from Poisson(λ), where λ ∈ {30,100,300} is the coverage for the experiment. Once depth d is drawn for the site with frequency ν, the number of reads c carrying the derived allele are sampled according to Binomial(d,ν). For experiments with finite depth the tuple <c, d> is the input data for each site.
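Step IV is straightforward to sketch; the helper below is illustrative (our own name) and vectorizes the Poisson depth and Binomial read-count draws over sites:

```python
import numpy as np

def simulate_pool_seq(freqs, coverage, rng=None):
    """Simulate pool-seq read counts from true allele frequencies.

    Per site: depth d ~ Poisson(coverage), then derived-allele reads
    c ~ Binomial(d, freq). Returns (c, d) arrays matching freqs' shape."""
    rng = np.random.default_rng(rng)
    freqs = np.asarray(freqs, dtype=float)
    d = rng.poisson(coverage, size=freqs.shape)
    c = rng.binomial(d, freqs)
    return c, d
```

Applied to the allele-frequency trajectories from step III, this produces the ⟨c, d⟩ tuples that serve as input to the method for each site, replicate, and time point.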
3 Results
Modeling Allele Frequency Trajectories in Small Populations
We first tested the goodness of fit of the discrete versus continuous models in modeling allele frequency trajectories, under general E&R parameters. For this purpose, we conducted 100K simulations with two time samples 𝒯 = {0, τ}, where τ ∈ {1, 10, 100} is the parameter controlling the density of sampling in time. In addition, we repeated simulations for different values of starting frequency ν0 ∈ {0.005, 0.1} (i.e., hard and soft sweeps) and selection strength s ∈ {0, 0.1} (i.e., neutral and selection). Then, given initial frequency ν0, we computed the expected distribution of the frequency of the next sample ντ under the two models to make a comparison. Fig 2A-F shows that Brownian motion (the continuous model) is inadequate when ν0 is far from 0.5, or when sampling times are sparse (τ > 1). If the favored allele arises from standing variation in a neutral population, it is unlikely to have a frequency close to 0.5, and the starting frequencies are usually much smaller (see Fig S2). Moreover, in typical D. melanogaster experiments, sampling is sparse; often, the experiment is designed so that 10 ≤ τ ≤ 100 [26, 35, 46, 71].
In contrast to the Brownian motion approximation, discrete Markov chain predictions (Eq. 11) are highly consistent with empirical data for a wide range of simulation parameters (Fig 2A-M). Moreover, the discrete Markov chain can be modified to model the case when the allele is under selection.
Detection Power
We compared the performance of Clear against other methods for detecting selection. For each method, we calculated detection power as the percentage of true positives identified with a false-positive rate ≤ 0.05. For each configuration (specified by values of the selection coefficient s, starting allele frequency ν0, and coverage λ), the power of each method is evaluated over 2000 distinct simulations, half of which model neutral evolution and the rest positive selection.
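Power at a fixed false-positive rate can be computed by thresholding at the appropriate quantile of the neutral score distribution; a sketch with a hypothetical helper:

```python
import numpy as np

def power_at_fpr(neutral_scores, selected_scores, fpr=0.05):
    """Fraction of selection simulations whose score exceeds the
    threshold giving the requested false-positive rate on neutral runs."""
    # The (1 - fpr) quantile of neutral scores is the detection threshold.
    thresh = np.quantile(neutral_scores, 1 - fpr)
    return np.mean(np.asarray(selected_scores) > thresh)
```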
We compared the power of Clear with the Gaussian process (GP) [60], FIT [23], and CMH [1] statistics. FIT and GP convert read counts to allele frequencies prior to computing the test statistic. Clear shows the highest power in all cases, and its power stays relatively high even for low coverage (Fig 3 and Table S1). In particular, the difference in performance between Clear and other methods is pronounced when the starting frequency is low. The advantage of Clear stems from the fact that a favored allele with low starting frequency might be missed by low-coverage sequencing; in this case, incorporating the signal from linked sites becomes increasingly important. We note that methods using only two time points, such as CMH, do relatively well for high selection values and high coverage. However, the use of time-series data can increase detection power in low-coverage experiments or when the starting frequency is low. Moreover, time-series data provide means for estimating the selection parameters s, h (see below). Finally, as Clear is robust to changes in coverage, our results (Fig 3B,C) suggest that taking many samples with lower coverage is preferable to sparse sampling with higher coverage.
Site-identification
In general, localizing the favored variant using pool-seq data is a nontrivial task due to extensive linkage disequilibrium [61]. To measure performance, we sorted variants by their H scores and computed the rank of the favored allele for each method. For each setting of ν0 and s, we conducted 1000 simulations and computed the rank of the favored mutation in each simulation. The cumulative distribution of the rank of the favored allele over the 1000 simulations for each setting (Fig 5) shows that Clear outperforms the other statistics.
An interesting observation is the contrast between site-identification and detection [39, 61]. When selection strength is high, detection is easier (Fig 3A-F), but site-identification is harder, due to the high LD between flanking variants and the favored allele (Fig 5A-F). Moreover, site-identification becomes more difficult when the initial frequency of the favored allele is low, since at the onset of selection, LD between the favored allele and its nearby variants is high. For example, when coverage λ = 100 and selection coefficient s = 0.1, the detection power is 75% for a hard sweep, but 100% for a soft sweep (Fig 3B-E). In contrast, the favored site was ranked at the top in 14% of hard-sweep cases, compared to 95% of soft-sweep simulations.
Estimating Parameters
Clear estimates the effective population size N̂ and selection parameters ŝ and ĥ as a byproduct of the hypothesis testing. We computed the bias of the selection estimate (s − ŝ) and dominance estimate (h − ĥ) for Clear and GP over 1000 simulations in each setting. The distribution of the error (bias) for 100× coverage is presented in Fig 6 for different configurations. Fig S4 and Fig S5 provide the distribution of estimation errors for 30× and 300× coverage, respectively. For hard sweeps, Clear provides estimates of s with lower variance of bias (Fig 6A). For soft sweeps, GP and Clear both provide unbiased estimates of s with low variance (Fig 6B). Fig 6C-D shows that Clear provides unbiased estimates of h as well, when h ∈ {0, 0.5, 1, 2} and s = 0.1. We also tested whether Clear provides unbiased estimates of N, by estimating population size on 1000 simulations with N ∈ {200, 600, 1000}. As shown in Fig 7A-C, the maximum likelihood is attained at the true value of the parameter.
Running Time
As Clear does not compute the exact likelihood of a region (i.e., does not explicitly model linkage between sites), the complexity of scanning a genome is linear in the number of polymorphisms. Calculating the score of each variant requires 𝒪(TRN³) computation for 𝓗. However, most of the operations can be vectorized across all replicates to reduce the effective running time per variant. We conducted 1000 simulations and measured running times for computing the site statistics H, FIT, CMH, and GP with different numbers of linked loci. Our analysis reveals (Fig 4) that Clear is orders of magnitude faster than GP, and comparable to FIT. While slower than CMH in time per variant, the actual running times are comparable after vectorization and broadcasting over variants (see below).
These times can have practical consequences. For instance, to run GP in the single-locus mode on the entire pool-seq data of the D. melanogaster genome from a small sample (≈1.6M variant sites), it would take 1444 CPU-hours (≈1 CPU-month). In contrast, after vectorizing and broadcasting operations over all variants using the numba package, Clear took 75 minutes to perform a scan, including precomputation, while the fastest method, CMH, took 17 minutes.
3.1 Analysis of a D. melanogaster Adaptation to Alternating Temperatures
We applied Clear to the data from a study of D. melanogaster adaptation to alternating temperatures [26, 46], in which 3 replicate samples were chosen from a population of D. melanogaster maintained for 59 generations under alternating 12-hour cycles of hot stressful (28°C) and non-stressful (18°C) temperatures, and sequenced. In this dataset, sequencing coverage differs across replicates and generations (see S2 Fig of [60]), which makes variant depths highly heterogeneous (Fig S8).
We first filtered out heterochromatic, centromeric, and telomeric regions [25], as well as variants with a collective coverage of more than 1500 across all 13 populations: three replicates at the base population, two replicates at generation 15, one replicate at generation 23, one replicate at generation 27, three replicates at generation 37, and three replicates at generation 59. After filtering, we ended up with 1,605,714 variants.
Next, we estimated a genome-wide population size of N̂ = 250 (Fig 7-E), which is consistent with previous studies [33, 46]. The likelihood curves of Clear are sharper around the optimum compared to those of Bollback et al.’s [11] method (see Supplementary Fig. 1 in [46]). Also, chromosomes 3L and 3R appear to have smaller population sizes (Fig 7-D), with N̂ = 200 and 150, respectively. Others have made similar observations on this data. In particular, Jonas et al. [33] showed that the chromosome-wise population size varies even more when it is computed for each replicate separately (see Table 1 in [33]). For instance, N̂ is 131 for chromosome 3R replicate 1, while it is 328 for chromosome X replicate 2.
While it would be ideal to compute the Clear statistic for each replicate and chromosome separately, computing empirical p-values and significant regions becomes computationally intensive, as the empirical null distribution of each replicate and each chromosome needs to be computed. Hence, we use a single genome-wide estimate N̂ = 250 in all analyses, but we normalize the statistic 𝓗* separately for each chromosome.
We use a heuristic calculation (See S1 Text) to choose the sliding window size L as the distance at which the LD between the favored mutation and a site L/2 bp away remains strong. For D. melanogaster parameters, we obtained L = 30Kbp. We computed the normalized test statistic 𝓗* on sliding windows of size 30Kbp and step size 5Kbp over the genome (See Fig 8-A).
The empirical null distribution of 𝓗* was estimated by creating 100 whole-genome simulations (400K statistic values) as described in Section 2.5. Then, the p-value of the test statistic in each region of the experimental data was calculated as the fraction of null statistic values greater than or equal to the test statistic (see Fig S9). After correcting for multiple testing, we identified 5 contiguous intervals (Fig 8) satisfying FDR ≤ 0.05 and covering 2,829 polymorphic sites. We further performed single-locus hypothesis testing on the 2,829 sites to identify 174 individual variants with FDR ≤ 0.01 (Fig 8-B).
The final set of 174 variants falls within 32 genes (Table S3), including many serine protease inhibitors (serpins) and other genes involved in endocytosis. Recycling of synaptic vesicles is seen to be blocked at high temperature in temperature-sensitive Drosophila mutants [36]. This is also supported by GO enrichment analysis, where a single GO term, ‘inhibition of proteolysis’, is found to be enriched (corrected p-value: 0.0041). To test for dominant selection, we computed the D statistic on simulated neutral and experimental data, and computed p-values accordingly. After correcting for multiple testing, 96 variants were discovered with FDR ≤ 0.01 (Fig S10).
3.2 Analysis of Outcrossing Yeast Populations
We also applied Clear to 12 replicate samples of outcrossing yeast populations [14], where samples are taken at generations 𝒯 = {0, 180, 360, 540}. We observed significant variation in the genome-wide site frequency spectrum of certain populations over different time points for some replicates (Fig S11). The variation does not have an easily identifiable cause. Therefore, we focused the analysis on seven replicates, r ∈ {3, 7, 8, 9, 10, 11, 12}, whose genome-wide site frequency spectra remained consistent over the time range (Fig S12).
We estimated the population size to be N̂ = 2000 haplotypes, and computed ŝ, ĥ, and the H statistic accordingly. To compute p-values, we created 1M single-locus neutral simulations matching the experimental data’s initial frequencies and coverage. Setting the FDR cutoff to 0.05, only 18 and 16 variants show significant signal for directional and dominant selection, respectively (Fig S10). The selected variants for directional selection cluster in two regions, which match 2 of the 5 regions (regions C and E in Fig. 2-a in [14]) identified by Burke et al. in their preliminary analysis.
4 Discussion
We developed a computational tool, Clear, that can detect regions and variants under selection in E&R experiments. Using extensive simulations, we show that Clear outperforms existing methods in detecting selection, locating the favored allele, and estimating model parameters. Also, while being computationally efficient, Clear provides means for estimating population size and for hypothesis testing.
Many factors, such as small population size, finite coverage, linkage disequilibrium, finite sampling for sequencing, duration of the experiment, and the small number of replicates, can limit the power of tools for analyzing E&R data. Here, through discrete modeling, Clear estimates population size and provides unbiased estimates of s and h. It adjusts for the heterogeneous coverage of pool-seq data, and exploits the presence of linkage within a region to compute a composite likelihood ratio statistic.
It should be noted that, even though we described Clear for small fixed-size populations, the statistic can be adjusted for other scenarios, including changing population sizes when the demography is known. For large populations, transitions can be computed on sparse data structures, as for large N the transition matrices become increasingly sparse. Alternatively, frequencies can be binned to reduce dimensionality.
The comparison of hard and soft sweep scenarios showed that the initial frequency of the favored allele can have a nontrivial effect on the statistical power for identifying selection. Interestingly, while it is easier to detect a region undergoing strong selection, it is harder to locate the favored allele in that region.
There are many directions to improve the analyses presented here. In particular, we plan to focus our attention on other organisms with more complex life cycles, experiments with variable population sizes, and longer sampling time spans. As evolve-and-resequence experiments continue to grow, deeper insights into adaptation will go hand in hand with improved computational analysis.
Software and Data Availability
The source code and running scripts for Clear are publicly available at https://github.com/airanmehr/clear.
The D. melanogaster data were originally published in [26, 46]. The dataset of the D. melanogaster study, up to generation 37, was obtained from the Dryad Digital Repository (http://datadryad.org) under accession DOI: 10.5061/dryad.60k68. Generation 59 of the D. melanogaster study was accessed from the European Nucleotide Archive (http://www.ebi.ac.uk/ena/) under project accession number PRJEB6340. The dataset from the experimental evolution of yeast populations [14] was downloaded from http://wfitch.bio.uci.edu/~tdlong/PapersRawData/BurkeYeast.gz (last accessed 01/24/2017). UCSC browser tracks for the D. melanogaster and yeast data analyses are provided in Suppl. Data 1 and 2, respectively.
Conflict of interest
VB is a co-founder, has an equity interest, and receives income from Digital Proteomics, LLC (DP). The terms of this arrangement have been reviewed and approved by the University of California, San Diego in accordance with its conflict of interest policies. DP was not involved in the research presented here.
5 S1 Text: Choosing Window Size
In genome-wide scans for detecting selection, we apply the Clear statistic on sliding windows of length L bp, averaging the single-locus statistic values within each window to obtain the composite statistic. While the statistic is robust to variation in window size, choosing a very large window, over which LD has decayed, will weaken the composite signal, and choosing a small window will reduce the power of composite likelihoods. Here, we use a systematic calculation to choose L as the distance at which the LD between the favored mutation and a site L/2 bp away remains strong.
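The sliding-window averaging described above can be sketched as follows. The step size and the handling of empty windows are our own assumptions for the illustration; Clear's implementation may differ.

```python
import numpy as np

def composite_scan(positions, site_stats, L=30_000, step=10_000):
    """Average single-locus statistics within sliding windows of length L bp,
    advancing by `step` bp; returns (window_start, mean_statistic) pairs."""
    positions = np.asarray(positions)
    site_stats = np.asarray(site_stats, dtype=float)
    out = []
    for start in np.arange(0, positions.max() + 1, step):
        in_win = (positions >= start) & (positions < start + L)
        # empty windows score 0 by convention in this sketch
        out.append((start, site_stats[in_win].mean() if in_win.any() else 0.0))
    return out
```

Peaks of the resulting window-level statistic then mark candidate regions under selection.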
Consider a segregating site l bp away from the favored allele in a selective sweep, and let ρτ be the LD between the favored allele and that site τ generations after the onset of selection. Following Eqs. 30-31 in [58], ρτ can be expressed in terms of the initial LD ρ0, a ‘decay factor’ ατ = e−rτl due to recombination, and a ‘growth factor’ βτ due to selection, where K(τ) = 2ντ(1 − ντ) is the heterozygosity at the selected site and r is the recombination rate (crossovers/bp/generation). Under typical parameter settings, linkage to the favored allele is expected to increase after the onset of selection and then decrease due to crossover events (see Fig S13-A). Since ρ0 is unknown in pool-seq E&R experiments, we compute the value of l at which the ratio ρτ/ρ0 remains large.
In E&R scenarios, we let τ be the time of the last sampling. For a given s, we then compute the smallest such window size L over all possible starting frequencies, where ν̂τ depends on the initial frequency ν0 and the selection strength s (Eq. 9).
We used the D. melanogaster dataset parameters, N = 250, r = 2 × 10−8, and τ = 59, to compute the optimal window size for values of Ns ranging from weak to strong selection: Ns ∈ {20, 100, 200, 500}, i.e., s ∈ {0.08, 0.4, 0.8, 2}. We set L = 30 Kbp (see Fig S13-B) to provide good resolution for detecting weak selection.
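The ingredients of this calculation can be sketched numerically. Below, the deterministic logistic sweep trajectory stands in for Eq. 9 (an assumption of this sketch, since Eq. 9 is not reproduced here), and the decay factor ατ = e−rτl is evaluated at a site L/2 bp from the sweep for the parameters quoted above.

```python
import numpy as np

def nu_tau(nu0, s, tau):
    """Deterministic logistic sweep trajectory (standing in for Eq. 9):
    nu_t = nu0 * e^{s t} / (1 + nu0 * (e^{s t} - 1))."""
    g = np.exp(s * tau)
    return nu0 * g / (1.0 + nu0 * (g - 1.0))

def decay_factor(r, tau, l):
    """alpha_tau = exp(-r * tau * l): LD decay due to recombination."""
    return np.exp(-r * tau * l)

# D. melanogaster-like parameters from the text
r, tau = 2e-8, 59
for L in (10_000, 30_000, 100_000):
    alpha = decay_factor(r, tau, L / 2)  # site L/2 bp from the favored allele
    print(L, round(alpha, 4))
```

With r = 2 × 10−8 and τ = 59 the decay over these distances is mild, so the choice of L is driven mainly by the growth factor and the density of informative sites rather than by recombination alone.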
Acknowledgments
AI, AA, and VB were supported by grants from the NIH (1R01GM114362) and the NSF (DBI-1458557 and IIS-1318386). CS was supported by the European Research Council grant ArchAdapt.