Within-host Mycobacterium tuberculosis evolution: a population genetics perspective

Ana Y. Morales-Arce; Rebecca B. Harris; Anne C. Stone; Jeffrey D. Jensen

doi:10.1101/863894

ABSTRACT

The within-host evolutionary dynamics of TB remain unclear, and underlying biological characteristics render standard population genetic approaches based upon the Wright-Fisher model largely inappropriate. In addition, the compact genome combined with an absence of recombination is expected to result in strong purifying selection effects. Thus, it is imperative to establish a biologically-relevant evolutionary framework incorporating these factors in order to enable an accurate study of this important human pathogen. Further, such a model is critical for inferring fundamental evolutionary parameters related to patient treatment, including mutation rates and the severity of infection bottlenecks. We here implement such a model and infer the underlying evolutionary parameters governing within-patient evolutionary dynamics. Results demonstrate that the progeny skew associated with the clonal nature of TB severely reduces genetic diversity and that the neglect of this parameter in previous studies has led to significant mis-inference of mutation rates. As such, our results suggest an underlying de novo mutation rate that is considerably faster than previously inferred, and a progeny distribution differing significantly from Wright-Fisher assumptions. This inference largely reconciles the seemingly contradictory observations of both rapid drug-resistance evolution but extremely low levels of genetic variation in both resistant and non-resistant populations.

INTRODUCTION

Tuberculosis (TB) is a public health threat worldwide (WHO, 2018). Despite clear motivation for study, the observed within- and between-host evolutionary dynamics of Mycobacterium tuberculosis (M.TB) are not well understood, and results to date represent something of a paradox. On the one hand, drug resistance evolves rapidly (Fonseca et al. 2015; Eldholm et al. 2015); on the other, the genomic characteristics of M.TB do not appear conducive for such rapid adaptation, with inferred mutation rates being amongst the slowest of any human pathogen (Rocha et al. 2006; Ford et al. 2011; Ford et al. 2013; Colangeli et al. 2014; Payne et al. 2019; Menardo et al. 2019) and remarkably little genetic variation observed within or between hosts. Furthermore, purifying selection has been argued to play both a dominant as well as a weak role in shaping patterns of variation (Hershberg et al. 2008; Pepperell et al. 2013), and demographic estimates suggest a population history of TB that either matches or is uncorrelated with that of its human host (Comas et al. 2013; Bos et al. 2014; Brites et al. 2015; Eldholm et al. 2016).

To obtain a more robust understanding of TB evolutionary dynamics, it is essential to first appreciate that between-population observations are simply an aggregation of within-population processes. As such, studying the population genetics of within-patient data is critical to understanding the genetic differences observed between patients as well as their treatment outcomes. Fortunately, recent advances in sequencing technologies have allowed for more abundant and higher quality within-patient data. These published datasets have revealed a few common features of M.TB, including low-levels of genome-wide variation. For instance, Trauner et al. (2017) deep-sequenced twelve patients across four-time points and observed fewer than 50 polymorphic sites per patient genome-wide. In addition, the observed site frequency spectrum (SFS) is generally characterized by an abundance of rare variants (i.e., it is strongly left-skewed). These patterns have partly led to the suggestion that purifying selection effects may be wide-spread in the M.TB genome (Brown et al. 2016; Phelan et al. 2016; Mortimer et al. 2018).

Additional evolutionary factors likely contribute to these genomic patterns as well. For example, population bottlenecks may reduce genetic variation and alter the shape of the SFS (see review Thornton et al. 2007). Previous M.TB studies have investigated these effects separately in both the deep-time view of the population bottleneck and subsequent growth experienced by the host human population (Hershberg et al. 2008, Liu et al. 2018), as well as the shallow-time view of the population bottleneck and subsequent growth characterizing each novel transmission event and treatment (e.g., Trauner et al. 2017). Additionally, in fitting the left-skewed SFS, Pepperell et al. (2013) found that such a demographic history combined with a mix of both deleterious and neutrally-evolving sites produced the nearest fit to the observed SFS. Finally, given the lack of recombination in M.TB, related linkage effects (i.e., background selection (Charlesworth et al. 1993)) have similarly been discussed within these contexts (Pepperell et al. 2010; Copin et al. 2016).

While these studies have provided many important insights, there remains a relatively unexplored, though potentially highly significant, effect: clonality. Indeed, clonality and the related progeny distribution represents an important violation of commonly used evolutionary inference approaches based upon the Wright-Fisher (WF) model and the related Kingman coalescent (Eldon et al. 2006; Dos Vultos et al. 2008; Huillet et al. 2011; Lapierre et al. 2016). Specifically, progeny distributions under the WF model are Poisson distributed with a mean and variance of 1. Therefore, when an individual produces many offspring, far in excess of simple replacement in the next generation, the assumption that only two lineages coalesce at a time is violated, resulting in multiple-merger coalescent (MMC) events (see reviews of Tellier & Lemaire 2014; Irwin et al. 2016).

While perhaps abstract at first blush, this violation has very important implications for the study of sequence variation and diversity. Namely, as M.TB has been found to exhibit strong progeny skew owing to obligate clonal reproduction (Baker et al. 2004; Dos Vultos et al. 2008), the null model against which the above studies are comparing becomes incorrect. For example, under a multiple-merger model, the effective population size (N_e) no longer scales linearly with census size (N) as it does under the Kingman coalescent (Huillet et al. 2011). As a result, genetic diversity is a nonlinear function of the underlying population size - a result of interest given the strongly constrained and similar levels of variation observed across TB patients, regardless of infection time or resistance status. Similarly, under these progeny-skew models, the SFS is skewed towards an excess of low-frequency variants, generating a negative Tajima’s D even under equilibrium neutrality (Eldon et al. 2006; Birkner et al. 2013; Blath et al. 2016) – which appears of relevance to M.TB populations given the pervasively left-skewed SFS observed both within and between TB patients. Finally, the fixation probability of beneficial mutations under progeny skew may become much larger than under the WF model, owing to the increased probability of rapidly escaping stochastic loss (Der et al. 2011). This is fundamental to understanding the rapidly and independently evolving drug-resistance mutations in global TB populations - a result seemingly at odds with the previously inferred mutation rates (Sherman et al. 2011; Colangeli et al. 2014; Duchêne et al. 2016). In sum, the general theoretical expectations owing to progeny skew alone appear to qualitatively match empirical observations from M.TB; observations which, to date, have been attributed to alternate processes.

Recent progress has been made in utilizing these models to disentangle and even co-estimate patterns of demography, progeny skew, and selection. While there exist a variety of potential MMC models (see review of Tellier & Lemaire 2014), the so-called Ψ-coalescent has been a major focus of this literature given the straight-forward biological interpretation. Namely, the parameter Ψ represents the proportion of the next generation arising from a single parent (e.g., Ψ = 0.05 implies that one individual contributes offspring that comprise 5% of the next generation). In addition to which, recent experimental measures from viral populations are offering real-time insights in to such progeny distributions (Vahey and Fletcher 2019). Three results are of particular importance here. First, Eldon et al. (2015) demonstrated that population growth may be distinguished from multiple-merger coalescent events owing to progeny-skew, given differing expectations in the SFS. Second, Matuszewski et al. (2018) derived analytical expectations for the SFS under a multiple merger coalescent model with changing population size and further demonstrated that these parameters can indeed be accurately inferred jointly within a likelihood framework. Finally, building upon the two above results as well as the approximate Bayesian statistical framework developed by Foll et al. (2014, 2015), Sackman et al. (2019) recently extended these results and demonstrated an ability to co-estimate progeny skew, effective population size, as well as per-site selection coefficients from time-sampled polymorphism data.

Thus, a tremendous opportunity now exists to understand better the impact of mutation, genetic drift, and selection in dictating patterns of M.TB evolution and genomic variation under this more realistic coalescent model accounting for the underlying progeny distributions inherent to clonal reproduction. While the approximate Bayesian approach of Sackman et al. (2019) would appear ideal for this purpose, it is a time-sampled estimator reliant upon considerable levels of segregating variation in order to track changing allele frequencies (as is commonly observed in viral populations, for example). Thus, this approach is under-powered given the minimal levels of variation observed in M.TB. Further, as the underlying mutation rate itself is a question of great interest and importance in M.TB (Payne et al. 2019), it is desirable to additionally co-estimate this parameter rather than assume it to be known.

With this motivation, we developed a novel statistical approach utilizing the insights described above pertaining to infection dynamics and widespread purifying selection, while overlaying inference of underlying mutation rates, progeny distributions, and the severities of infection-related population bottlenecks. By re-analyzing published data within this framework, our results demonstrate the great importance of the multiple merger consideration and resolve many of the evolutionary paradoxes surrounding M.TB.

MATERIALS AND METHODS

Simulations

We conducted forward-in-time simulations using the SLIM version 3 software package (Haller & Messer 2019). M.TB populations were modeled using a genome size of 441,153 kb, equivalent to a tenth portion of the true genome size, for computational efficiency. As M.TB is a compact, highly-coding genome (Cole et al. 1998; Fleischmann et al. 2002), we assumed a distribution of fitness effects (DFE) characterized equally by deleterious (s = −0.01) and nearly neutral (s=-0.001) mutations. Mutation rate (μ) measured in vitro has been reported to be as slow as 2e-10 (Ford et al. 2011). In contrast, higher estimates ranging from 1e-9 to 9e-6 (Ford et al. 2013) have been proposed; therefore, our study considered the full extent of this range. Furthermore, it is important to note that previous experiments have measured only the neutral mutation rate, not the total mutation rate. In other words, the large input of strongly deleterious mutations - comprising a substantial component of the total mutation rate - has not been included in earlier estimates as these mutations are unlikely to be sampled as segregating variation or as fixed differences. However, as these mutations are important for shaping diversity via both purifying selection and background selection effects, and as our interest is in understanding the total rate at which all de novo mutations are input into the population, we considered the total rather than the neutral mutation rate. In order to infer this parameter within the context of an appropriate progeny-skew model, μ was drawn from a prior uniform distribution between 1e-9 and 9e-6 per site per generation.

Following an initial burn-in period of 10N generations, we considered a three-stage demographic model characterizing a single patient infection: moving forward in time, we describe 1) a neutral equilibrium population of size N, 2) an initial infection bottleneck leading to an instantaneous population reduction to size N2, and 3) a subsequent population size recovery to size N. In stage 1, we modeled a population of size N = 1,000. In order to quantify the effects of underlying assumptions pertaining to population size, additional simulations and inference were performed at N = 25,000. During stage 2, the severity of the population bottleneck (β) was sampled from ∼U[0.001, 0.1], where N2 = N*β - as the distribution of infection size in humans is unknown. However, it has been reported that in cattle TB (M. bovis) infection can be established by a single cell forming unit (Dean et al. 2005). During stage 2, the degree of progeny skew (or Ψ) was sampled from a prior distribution of ∼U[0, 0.2]. A value of 0 corresponds to the standard WF model. Progeny skew was simulated following the procedure of Sackman et al. 2019. In brief, one individual is chosen from the primary population A and founds a separate subpopulation B, the single generation unidirectional migration rate from B to A is set to Ψ, and the chosen individual thus contributes NΨ offspring to the following generation of A. A series of mate choice callbacks in SLiM force the migration rate to be exact rather than stochastic (see Supplementary Materials of Sackman et al. 2019). Subpopulation B is removed, the next generation begins, and a new individual is randomly sampled for the following generation. As such, each generation is a combination of N(1-Ψ) replacement events and a single sweepstakes event of magnitude NΨ. To emulate patient sampling at the onset of symptoms (approximately three months minimum (Behr et al. 2018)), we allowed stage 2 to run for 90 generations, assuming a generation time of 24 hours (Cole et al. 1998), and stage 3 to run for 910 generations before outputting genome alignments in ms format. Thus, the total generation time of our model was 11N.

Drawing from these prior distributions, 10,000 points (i.e., parameter combinations) were sampled. For each parameter combination, we conducted 1,000 replicates in order to characterize both the mean and variance. Summary statistics were calculated in the R package PopGenome version 2.6.1 (Pfeifer et al. 2014).

Data analysis and joint posterior estimates

For comparison to patient data, we examined the distribution and mean of segregating sites in samples published by Trauner et al. (2017). (https://zenodo.org/record/322377#.XO2CAy2ZNBw). A subset of 1,000 populations was simulated under the conditions described above. Given that patient data are subject to stringent filtering criteria, removing SNPs under a particular frequency cut-off, it was necessary to filter the simulated data to allow fair comparison. Thus, 100 genomes were sampled per each simulated population, and variants < 2% frequency were filtered out before the calculation of summary statistics (Table S1). Afterward, only genomes with ≥ 2 segregating sites were considered for further analyses, and the fraction of “invariable genomes” was noted for comparison to patient samples (Figure 2 and Table S1).

View this table:

Table S1. Simulation results.

Parameters simulated, ordered by descending mutation rate (μ) and corresponding values of progeny skew (Ψ) and bottleneck severity (N2). Given are the mean and standard deviation of the summary statistics from 1000 replicates.

RESULTS & DISCUSSION

Considering levels of variation

We here report the first joint consideration of mutation rate, purifying selection, infection history, and progeny-skew in M.TB populations. A correlation of summary statistics (Figure 1) demonstrates that while mutation rate (μ) increases the number of segregating sites as expected, progeny skew (Ψ) acts to reduce variation. Furthermore, as the Ψ-parameter is of relevance every generation (i.e., every reproduction event), the impact of this previously unconsidered progeny skew on levels of variation is in fact much stronger than the single bottleneck event associated with infection. Considering the full distribution of sampled μ values (Figure S1), it is apparent that Ψ drastically reduces the average proportion of segregating sites genome-wide even for fast mutation rates, and that the probability of producing genomes devoid of any variation will naturally increase as μ decreases.

Figure S1. Log scale distribution of segregating sites in unfiltered simulated populations as a function of mutation rate (μ) and progeny skew (Ψ).

For each parameter combination of μ and Ψ drawn from the prior (10,000 in total), 1,000 replicates were simulated, with the mean given by the black dot and the standard deviation given by the gray bars. Ψ proportions are given on the X-axis, while the proportion of segregating sites observed in the genome are given on the Y-axis. Each panel corresponds to a different order of magnitude of mutation rate: (Α) 1e-6 to 9e-6, (B) 1e-7 to 9e-7, (C) 1e-8 to 9e-8, and (D) 1e-9 to 9e-9. The colored lines correspond to two examples of the proportion of segregating sites observed genome-wide in empirical patient data: 20 segregating sites as a mean (orange) and 50 segregating sites as the maximum observed from patient_10 (blue) (Trauner et al. 2017).

Figure 1. Correlation heatmap of parameters and summary statistics.

Correlations are given between the parameters of interest (mutation rate (μ), progeny skew (Ψ), and bottleneck severity (N2)), and summary statistics (the mean and variance of the level of variation as measured by the number of segregating sites, and the mean and variance of the distribution of variation as measured by Tajima’s D). As shown, N2 values do not correlate with any of the summary statistics, as the effect of the single generation bottleneck is swamped by the per-generation reproductive skew. Further, as expected, values of μ positively correlate with the number of segregating sites, while Ψ acts to reduce variation and is thus negatively correlated. Finally, while μ would not be expected to strongly correlate with the shape of the SFS (here summarized by Tajima’s D) for neutral mutations, it does so here given that we explicitly account for the input of deleterious mutations (see Methods).

In order to consider a range of μ consistent with observed data - once pervasive purifying selection and progeny skew have been taken in to account - two examples of patient data collected by Trauner et al. (2017) representing low (20 segregating sites genome-wide) and high (50 segregating sites genome-wide) variation samples were plotted and compared to the simulated data. In order to compare simulated data with patient data, the same filtering steps must be applied. In this case, SNPs under 2% frequency were filtered in the empirical data, and thus, the simulated data were similarly filtered in order to be comparable (Figure 2).

Figure 2. Log scale distribution of segregating sites above 2% frequency, as a function of mutation rate (μ) and progeny skew (Ψ).

For each parameter combination (1,000 in total) of μ and Ψ drawn from the prior distributions, 1,000 replicates were simulated, with the mean given by the black dot and the standard deviation given by the gray bars. Each panel corresponds to a different order of magnitude of mutation rate range: (Α) 1e-6 to 9e-6, (B) 1e-7 to 9e-7, (C) 1e-8 to 9e-8, and (D) 1e-9 to 9e-9. The colored lines correspond to two examples of the proportion of segregating sites observed genome-wide in empirical patient data: 20 segregating sites as a mean (orange), and 50 segregating sites from patient_10 (blue) (Trauner et al. 2017). As shown, the range of segregating sites for the fast mutation rates (panels A and B), result in expectations much larger than that observed in patient data, regardless of Ψ. Conversely, the slowest mutation rate (panel D), result in too little variation, except under WF conditions (i.e., Ψ near 0) which are known to be violated given clonality. Thus, rates on the order of 1e-8 to 9e-8 (panel C) appear to well explain the range of variation observed in patient data, and further imply values of Ψ ranging roughly from 0.05 to 0.1, consistent with values previously estimated for within-host virus data (Sackman et al. 2019).

As shown, fast mutation rates (μ on the order of 1e-6 and 1e-7) routinely produce expected numbers of segregating sites far above that observed in patients, regardless of other parameter values, while slow mutation rates (μ on the order of 1e-9) generally result in too little variation to match observation. This result is of interest as M.TB mutation rates are generally believed to be exceedingly slow (Sherman et al. 2011) - though importantly, this inference has largely neglected the important contribution of these additional evolutionary processes. Thus, once accounting for the diversity-reducing effects inherent to clonality, as well as the extent of purifying selection effects inherent to a compact, non-recombining genome, it is evident that the de novo mutation rate is likely faster than previously believed, with mutation rates on the order 1e-8 well matching the range of observed data (Figure 2). In addition, in order to consider the impact of underlying assumptions pertaining to population size, simulations were re-performed with a 25x larger population size. As shown in Figure S2, owing to these diversity reducing effects, mutations rates on the order of 1e-8 remain the best fit to observed levels of variation, with slower mutation rates still producing too little variation and too many invariable genomes to be consistent with patient data.

Figure S2. Log scale distribution of segregating sites in filtered simulated populations of N= 25,000 as a function of mutation rate (μ) and progeny skew (Ψ).

For each parameter combination of μ and Ψ drawn from the prior (100 in total), 100 replicates were simulated, with the mean given by the black dot and the standard deviation given by the gray bars. Ψ proportions are given on the X-axis, while the proportion of segregating sites observed in the genome are given on the Y-axis. Each panel corresponds to a different order of magnitude of mutation rate: (Α) 1e-8 to 9e-8, and (B) 1e-9 to 9e-9. The colored lines correspond to two examples of the proportion of segregating sites observed genome-wide in empirical patient data: 20 segregating sites (orange) and 50 segregating sites (blue) (Trauner et al. 2017). As shown, inference remains consistent with the results of Figure 2, even under this 25x larger population size.

Considering distributions of variation

For the general range of μ identified above, the number of genome-wide segregating sites in simulated population data ranged from a minimum average of 2.1 to a maximum average of 78.7 SNPs (Table S1), depending on the combination of μ and Ψ drawn from the priors. Specifically, higher values of μ may be off-set by higher values of Ψ, resulting in a similar number of segregating sites for multiple parameter combinations. For example, for the patient sample containing 10 SNPs, simulation results demonstrate that μ = 8.13e-08 and a Ψ =0.06 would produce an average of 10.15 +/-4.64 SNPs, potentially suggesting a good fit to the data. However, μ = 3.1e-08 and a Ψ =0.02 can also generate similar results, yielding 10.40 +/-4.45 SNPs. More generally, this ridge in the posterior distributions (Figure 3A) between these two parameters suggests that they will be difficult to estimate independently if only levels of variation are used.

Figure 3. Posterior parameter estimates pertaining to patient data.

(A) Joint posterior distribution for the parameters μ and Ψ. Solid contour lines specify the highest posterior density intervals. Owing to the diversity-increasing effect of μ and diversity-decreasing effect of Ψ, there exists a ridge in the joint posterior. Regardless, owing to differing expectations in the SFS along this ridge, inference suggests μ values within the 1e-8 range in combination with Ψ < 0.10. (Β) Posterior density for the severity of the infection bottleneck (N2). The X-axis gives the number of genomes at the time of infection reduced from 1,000. While the posterior distribution is non-uniform, the observation that all tested values remain consistent with patient data strongly suggests that there is not sufficient information in the data to estimate this third parameter (i.e., size of the bottleneck, in addition to μ and Ψ).

Thus, while comparisons with general levels of variation are useful for identifying a range as in the above section, more information is needed to parse values further. Importantly, previous theoretical results (Eldon & Wakeley 2016; Matuszewski et al. 2018) have well described the effect of Ψ on the observed distribution of genetic variation (i.e., SFS). To utilize this information, a general summary of the SFS, Tajima’s D (1989), was calculated on the filtered simulated data. As shown (Figure S3), the shape of the SFS, and thus the value of the D- statistic, is related to the value of Ψ. As the degree of progeny skew initially increases, D becomes increasingly negative, as previously described. However, as progeny distributions become highly skewed, levels of variation are sharply reduced, resulting in an apparent increase in D values (Figure S3). Increasing D values could also be a result of the underlying filtering criteria, as after filtering simulations to match real data with segregating sites > 2%, D values increased proportionally (Figure S4). However, Tajima’s D is consistently negative in all mutation ranges, even after filtering.

Figure S3. The shape of the site frequency spectrum in unfiltered simulated populations, as summarized by Tajima’s D, as a function of mutation rate (μ) and progeny skew (Ψ).

For each parameter combination of μ and Ψ drawn from the prior distributions (10,000 in total), 1,000 replicates were simulated, with the mean given by the black dot and the standard deviation given by the gray bars. Ψ proportions are given on the X-axis, while the mean Tajima’s D is given on the Y-axis. Each panel corresponds to a different order of magnitude of mutation rate: (Α) 1e-6 to 9e-6, (B) 1e-7 to 9e-7, (C) 1e-8 to 9e-8, and (D) 1e-9 to 9e-9. As shown, the distribution of variation expected to be observed in the genome differs as a function of these two parameters - with a strong excess of rare mutations (negative D) observed as the WF model is initially violated at small values of Ψ, with a slight recovery towards 0 (and larger standard deviations) observed as Ψ gets larger, owing to the rapidly decreasing number of segregating sites associated with increased progeny skew.

Figure S4. The shape of the site frequency spectrum (SFS), as summarized by Tajima’s D, in simulated populations after filtering out segregating sites under 2% frequency, as a function of mutation rate (μ) and progeny skew (Ψ).

1,000 parameters combinations were used. For each parameter combination of μ and Ψ drawn from the prior distributions, 1,000 replicates were simulated, with the mean given by the black dot and the standard deviation given by the gray bars. Ψ proportions are given on the X-axis, while the mean Tajima’s D is given on the Y-axis. Each panel corresponds to a different order of magnitude of mutation rate: (Α) 1e-6 to 9e-6, (B) 1e-7 to 9e-7, (C) 1e-8 to 9e-8, and (D) 1e-9 to 9e-9. As shown, the distribution of variation expected to be observed in the genome differs as a function of these two parameters - with a strong excess of rare mutations (negative D) observed as the WF model is initially violated at small values of Ψ, with a slight recovery towards 0 (and larger standard deviations) observed as Ψ gets larger, owing to the rapidly decreasing number of segregating sites associated with increased progeny skew.

Parameter inference from patient samples

Thus, we next considered these results in light of published patient data. Recent publications have suggested that NGS technologies could facilitate personalized treatment in TB patients, allowing for improved outcomes (Copin et al. 2016; Cancino-Muñoz et al. 2019). To utilize such data, however, it is vital to understand the evolutionary dynamics shaping within-host M.TB diversity. As an illustrative example, we have re-examined the number of segregating sites in patient samples and estimated a mean ∼10 segregating sites per sample (Trauner et al. 2017) (Figure S5). Further, using the results and expectations obtained above regarding the level and distribution of variation, the patient data appear best fit by simulated populations with μ values ranging from 7.3e-9 to 3.8e-7, with the strongest posterior density at μ ∼ 6e-8 and Ψ ∼0.06 (Figure 3A). Notably, faster μ values could produce similar results, but only if Ψ proportions are in excess of 0.15 (Figure 3A).

Figure S5. The distribution of segregating sites in real versus simulated datasets.

Comparison of the number of segregating sites observed from (A) patient samples (Trauner et al. 2017), (B) simulated datasets with μ = 3.1e-8 and Ψ = 0.02, resulting in the same mean number of segregating sites as observed in patients, and (C) simulated datasets with μ = 3.7e-7 and Ψ = 0.18, also resulting in the same mean number of segregating sites as observed in patients. Thus, these two point estimates essentially demarcate the boundaries on the joint posterior ridge (Figure 3).

In addition, owing to the strong per-generation reductions associated with progeny skew, there remains no signal in the data to estimate the severity of the infection bottleneck accurately (Figure 3B). Specifically, while there is an increase in density towards stronger bottleneck values, the posterior distribution is not notably distinct from the prior distribution ∼U[0.001, 0.1] (Figure 3B). Apart from the population size reduction associated with infection, these results have important implications for the ability to detect other parameters of clinical interest - namely, the presence of selective sweeps associated with beneficial mutations (e.g., potentially owing to drug-resistance). First, while there is strong statistical power to infer both μ and Ψ from patient data, there is little power to detect isolated events in the past (Figure 3B). This result, though unexpected under standard WF assumptions, is intuitive given the non-WF progeny distributions related to clonality. Namely, the diversity reduction associated with a single bottleneck event multiple generations in the past is not discernible from the per-generation diversity reduction related to clonal reproduction. Further, given that a selective sweep is, in fact, a type of population bottleneck (Barton 1998), this result also demonstrates that detecting selective sweeps associated with resistance mutations in this non-recombining organism, based on levels and patterns of genomic variation, will be exceedingly difficult. However, this observation well reconciles the fact that levels of variation do not appear significantly different between resistant and non-resistant M.TB patient populations (Trauner et al. 2017) - that is, these additional evolutionary processes are shaping variation so strongly, that the presence or absence of a resistance-associated selective sweep does not result in strongly differentiable expectations.

Implications for characterizing the history of TB in humans

A topic of wide-spread interest in the literature pertains to the history of TB in the human host. This inference has primarily been made within a phylogenetic context, relying on the construction of a single consensus sequence per patient. While such a comparison of consensus sequences can be highly mis-leading when making evolutionary inference (see Renzette et al. 2017), these age-estimates also inherently rely on an accurate knowledge of mutation rates in order to invoke the ‘clock-like’ accumulation of neutral mutations as a proxy for time (Menardo et al. 2019). As our results demonstrate that previous mutation rate estimates have likely been downwardly-biased, it is of interest to consider what these revised mutation rates would imply for this evolutionary history. However, there are at least three difficulties in directly comparing population-level estimates with previous consensus-based phylogenetic inference. Firstly, estimates are generally given per year, whereas the preferred evolutionary rate is per generation (as given here). There is support for one generation per day as a conversion (Cole et al. 1998), though further study is necessary to quantify the correct scaling factor. Secondly, when invoking a divergence-based clock, previous studies are measuring the neutral mutation rate, given that the rate of mutation is equivalent to the rate of divergence for neutral mutations only (Kimura 1968). However, we are here interested in the total mutation rate (that is, the rate at which neutral and non-neutral mutations arise per generation); therefore, our rate must be parsed into neutral and non-neutral components to enable appropriate comparison. Similar to the first point, additional research is necessary in order to better quantify the distribution of fitness effects in M.TB, as understanding the fraction of total mutations represented by neutral mutations is necessary for the conversion. Finally, we here consider the alleles segregating within a population for inference (i.e., within a patient), whereas previous studies often call a consensus sequence per patient (i.e., per population) and make inferences based on a collection of such consensus sequences. Such a summary of population-level variation into a single sequence is difficult to interpret, though what is evident is that a great majority of rare alleles will be neglected, and thus only a small subset of total variation (i.e., common alleles) will be considered (Renzette et al. 2017). As such, we propose that future evolutionary inference pertaining to TB would benefit tremendously from a full consideration of within-patient diversity, as we demonstrate here. In sum, any direct comparison with consensus-based phylogenetic age estimates would be overly speculative at this juncture.

CONCLUSIONS

TB patient infection dynamics have remained enigmatic. We here argue that much of the difficulty in interpreting patterns of variation and evolution has owed to an inappropriate underlying null model, relying on classical expectations developed for organisms with very different underlying biological properties. Fortunately, recent theoretical extensions in non-WF and alternative coalescent models, more appropriate for clonally reproducing organisms, have created an opportunity to revisit existing M.TB patient data. By accounting for the pervasive purifying selection effects associated with this non-recombining, highly-coding genome, as well as the skewed progeny distributions inherent to clonal reproduction, we have provided improved insights into the evolutionary dynamics shaping within-host variation. Further, through a consideration of these diversity-reducing effects, results suggest an underlying de novo mutation rate that is considerably faster than previously inferred. This largely reconciles the seemingly contradictory observations of both rapid resistance evolution, but extremely low levels of population variation. Namely, the population mutation rate may indeed be sufficiently fast to provide a steady input of beneficial mutations, explaining the rapid resistance evolution clinically observed. However, recurrent purifying selection and progeny skew act together to rapidly eliminate segregating variation from the population, reconciling the minimal levels of variation observed as well as the general homogeneity in levels and distributions of variation in both resistant and non-resistant patient samples alike. Furthermore, the role of these per-generation evolutionary forces in shaping patterns of variation is sufficiently strong that periodic events, including the infection-associated bottleneck and selective sweeps centered on drug-resistance mutations, will be challenging to detect and quantify on top of these more common processes.

LITERATURE CITED

1.↵
Baker, L., T. Brown, M. C. Maiden, and F. Drobniewski. 2004. Silent nucleotide polymorphisms and a phylogeny for Mycobacterium tuberculosis. Emerg Infect Dis. 10: 1568–1577.
OpenUrl CrossRef PubMed Web of Science
2.↵
Barton, N. H. 1998. The effect of hitch-hiking on neutral genealogies. Genet Res Camb.72(2): 123–33.
OpenUrl CrossRef Web of Science
3.↵
Behr, M. A., P. H. Edelstein, and L. Ramakrishnan. 2018. Revisiting the timetable of tuberculosis. BMJ. 362: k2738.
OpenUrl FREE Full Text
4.↵
Birkner, M., J. Blath, and B. Eldon. 2013. Statistical properties of the site-frequency spectrum associated with Λ-coalescents. Genetics. 195: 1037–53.
OpenUrl Abstract/FREE Full Text
5.↵
Blath, J., M. Cronjäger Christensen, B. Eldon, and M. Hammer. 2016. The site-frequency spectrum associated with Ξ-coalescents. Theor Popul Biol. 110: 36–50.
OpenUrl CrossRef
6.↵
Bos, K. I., K. M. Harkins, A. Herbig, M. Coscolla, N. Weber et al. 2014. Pre-Columbian mycobacterial genomes reveal seals as a source of New World human tuberculosis. Nature. 514: 494–497.
OpenUrl CrossRef PubMed Web of Science
7.↵
Brites, D., and S. Gagneux. 2015. Co-evolution of Mycobacterium tuberculosis and Homo sapiens. Immunol Rev. 264: 6–24.
OpenUrl CrossRef
8.↵
Brown, T. S., A. Narechania, J. R. Walker, P. J. Planet, P. J. Bifani, et al. 2016. Genomic epidemiology of Lineage 4 Mycobacterium tuberculosis subpopulations in New York city and New Jersey, 1999–2009. BMC Genomics. 17: 947.
OpenUrl
9.↵
Cancino-Muñoz, I., M. Moreno-Molina, V. Furió, G. A. Goig, M. Torres-Puente, et al. 2019. Cryptic resistance mutations associated with misdiagnoses of multidrug-resistant tuberculosis. J Infect Dis. 220: 316–320.
OpenUrl
10.↵
Charlesworth, B., M. T. Morgan, and D. Charlesworth. 1993. The effect of deleterious mutations on neutral molecular variation. Genetics. 134: 1289–1303.
OpenUrl Abstract/FREE Full Text
11.↵
Colangeli, R., V. L. Arcus, R.T. Cursons, A. Ruthe, N. Karalus, et al. 2014. Whole genome sequencing of Mycobacterium tuberculosis reveals slow growth and low mutation rates during latent infections in humans. PLoS One. 9: 1–9.
OpenUrl CrossRef PubMed
12.↵
Cole, S. T., R. Brosch, J. Parkhill, T. Garnier, C. Churcher, et al. 1998. Deciphering the biology of Mycobacterium tuberculosis from the complete genome sequence. Nature. 393: 537–544.
OpenUrl CrossRef PubMed Web of Science
13.↵
Comas, I., M. Coscolla, T. Luo, S. Borrell, K. E. Holt, et al. 2013. Out-of-Africa migration and Neolithic coexpansion of Mycobacterium tuberculosis with modern humans. Nat Genet. 45:1176–1182.
OpenUrl CrossRef PubMed
14.↵
Copin, R., X. Wang, E. Louie, V. Escuyer, M. Coscolla et al. 2016. Within host evolution selects for a dominant genotype of Mycobacterium tuberculosis while T cells increase pathogen genetic diversity. PLoS Pathog. 12: e1006111.
OpenUrl CrossRef
15.↵
Dean, G. S., S. G. Rhodes, M. Coad, A. O. Whelan, P. J. Cockle, et al. 2005. Minimum effective dose of Mycobacterium bovis in cattle. Infect Immunity. 73: 6467–6471.
OpenUrl Abstract/FREE Full Text
16.↵
Der, R., C. L. Epstein, and J. B. Plotkin. 2011. Generalized population models and the nature of genetic drift. Theor Popul Biol. 80: 80–99.
OpenUrl CrossRef PubMed
17.↵
Dos Vultos, T. O., Mestre, J. Rauzier, M. Golec, N. Rastogi, et al. 2008. Evolution and diversity of clonal bacteria: The paradigm of Mycobacterium tuberculosis. PLoS One. 3: e1538.
OpenUrl CrossRef PubMed
18.↵
Duchêne, S., K. E. Holt, F. X. Weill, S. Le Hello, J. Hawkey, et al. 2016. Genome-scale rates of evolutionary change in bacteria. Microb Genomics. 2: 1–12.
OpenUrl
19.↵
Eldon, B., and J. Wakeley. 2006. Coalescent processes when the distribution of offspring number among individuals is highly skewed. Genetics. 172: 2621–33.
OpenUrl Abstract/FREE Full Text
20.↵
Eldon, B., M. Birkner, J. Blath, and F. Freund. 2015. Can the site-frequency spectrum distinguish exponential population growth from multiple-merger coalescents? Genetics. 199: 841–56.
OpenUrl Abstract/FREE Full Text
21.↵
Eldholm, V., J. Monteserin, A. Rieux, B. Lopez, B. Sobkowiak, et al. 2015. Four decades of transmission of a multidrug-resistant Mycobacterium tuberculosis outbreak strain. Nat Commun. 6: 1–9.
OpenUrl CrossRef PubMed
22.↵
Eldholm, V., J. H. O. Pettersson, O. B. Brynildsrud, A. Kitchen, E.M. Rasmussen, et al. 2016. Armed conflict and population displacement as drivers of the evolution and dispersal of Mycobacterium tuberculosis. Proc Natl Acad Sci U S A. 113:13881–6.
OpenUrl Abstract/FREE Full Text
23.↵
Fleischmann, R. D., D. Alland, J. A. Eisen, L. Carpenter, O. White, et al. 2002. Whole-Genome comparison of Mycobacterium tuberculosis clinical and laboratory strains. J Bacteriol. 184: 5479–5490.
OpenUrl Abstract/FREE Full Text
24.↵
Foll, M., Y. P. Poh, N. Renzette, A. Ferrer-Admetlla, C. Bank, et al. 2014. Influenza virus drug resistance: a time-sampled population genetics perspective. PLoS Genet. 10: e1004185.
OpenUrl CrossRef PubMed
25.↵
Foll, M., H. Shim, and J. D. Jensen. 2015. WFABC: A Wright-Fisher ABC-based approach for inferring effective population sizes and selection coefficients from time-sampled data. Mol Ecol Resour. 15(1): 87–98.
OpenUrl CrossRef PubMed
26.↵
Fonseca, J. D., G.M. Knight, and T. D. McHugh. 2015. The complex evolution of antibiotic resistance in Mycobacterium tuberculosis. Int J Infect Dis. 32: 94–100.
OpenUrl CrossRef PubMed
27.↵
Ford, C.B., P. L Lin, M. Chase, R.R. Shah, O. Iartchouk, et al. 2011. Use of whole genome sequencing to estimate the mutation rate of Mycobacterium tuberculosis during latent infection. Nat Genet. 43(5): 482–486.
OpenUrl CrossRef PubMed
28.↵
Ford, C. B., R.R. Shah, M. Kato-Maeda, S. Gagneux, M. B. Murray et al. 2013. Mycobacterium tuberculosis mutation rate estimates from different lineages predict substantial differences in the emergence of drug-resistant tuberculosis. Nat Genet. 45: 784–790.
OpenUrl CrossRef PubMed
29.↵
Haller, B. C., and P.W. Messer. 2019. SLiM 3: Forward genetic simulations beyond the Wright– Fisher model. Mol Biol Evol. 36: 632–637.
OpenUrl CrossRef
30.↵
Hershberg, R., M. Lipatov, P. M. Small, H. Sheffer, S. Niemann et al. 2008. High functional diversity in Mycobacterium tuberculosis driven by genetic drift and human demography. PLoS Biol. 6: e311.
OpenUrl CrossRef PubMed
31.↵
Huillet, T., and M. Möhle. 2011. Population genetics models with skewed fertilities: A forward and backward analysis. Stoch Model. 27: 521–554.
OpenUrl
32.↵
Irwin, K. K., S. Laurent, S. Matuszewski, S. Vuilleumier, L. Ormond, et al. 2016. On the importance of skewed offspring distributions and background selection in virus population genetics. Heredity. 117: 393–399.
OpenUrl CrossRef
33.↵
Kimura, M. 1968. Evolutionary rate at the molecular level. Nature. 217: 624–626.
OpenUrl CrossRef PubMed Web of Science
34.↵
Lapierre, M., C. Blin, A. Lambert, G. Achaz, and E. P. C. Rocha. 2016. The impact of selection, gene conversion, and biased sampling on the assessment of microbial demography. Mol Biol Evol. 33: 1711–1725.
OpenUrl CrossRef PubMed
35.↵
Liu, Q., A. Ma, L. Wei, Y. Pang, B. Wu, et al. 2018. China’s tuberculosis epidemic stems from historical expansion of four strains of Mycobacterium tuberculosis. Nat Ecol Evol. 2: 1982–1992.
OpenUrl
36.↵
Matuszewski, S., M. E. Hildebrandt, G. Achaz, and J. D. Jensen. 2018. Coalescent processes with skewed offspring distributions and nonequilibrium demography. Genetics. 208: 323–338.
OpenUrl Abstract/FREE Full Text
37.↵
Menardo, F., S. Duchêne, D. Brites, and S. Gagneux. 2019. The molecular clock of Mycobacterium tuberculosis. PLoS Pathog. 15: e1008067.
OpenUrl
38.↵
Mortimer, T. D., A. M. Weber, and C. S. Pepperell. 2018. Signatures of selection at drug resistance loci in Mycobacterium tuberculosis. mSystems. 3: e00108–17.
OpenUrl
39.↵
Payne, J. L., F. Menardo, A. Trauner, S. Borrell, S.M. Gygli, et al. 2019. Transition bias influences the evolution of antibiotic resistance in Mycobacterium tuberculosis. PLOS Biol. 17: e3000265.
OpenUrl
40.↵
Pepperell, C. S., V. H. Hoeppner, M. Lipatov, W. Wobeser, G. K. Schoolnik, and M. W. Feldman. 2010. Bacterial genetic signatures of human social phenomena among M. tuberculosis from an aboriginal Canadian population. Mol Biol Evol. 27: 427–40.
OpenUrl CrossRef PubMed Web of Science
41.↵
Pepperell, C. S., A. M. Casto, A. Kitchen, J.M. Granka, O. E. Cornejo, et al. 2013. The role of selection in shaping diversity of natural M. tuberculosis populations. PLoS Pathog. 9: e1003543.
OpenUrl CrossRef PubMed
42.↵
Pfeifer, B., U. Wittelsbürger, S. E. Ramos-Onsins, and M. J. Lercher. 2014. PopGenome: An efficient swiss army knife for population genomic analyses in R. Mol Biol Evol. 31: 1929–36.
OpenUrl CrossRef PubMed Web of Science
43.↵
Phelan, J., F. Coll, R. McNerney, D. B. Ascher, D. E. V. Pires, et al. 2016. Mycobacterium tuberculosis whole genome sequencing and protein structure modelling provides insights into anti-tuberculosis drug resistance. BMC Med. 14: 31.
OpenUrl CrossRef
44.↵
Renzette, N., S. P. Pfeifer, S. Matuszewski, T. F. Kowalik, and J. D. Jensen. 2017. On the analysis of intrahost and interhost viral populations: human cytomegalovirus as a case study of pitfalls and expectations. J Virol. 91: e01976–16.
OpenUrl CrossRef
45.↵
Rocha, E. P. C., J. Maynard Smith, L.D. Hurst, M. T. G. Holden, J. E. Cooper, et al. 2006. Comparisons of dN/dS are time dependent for closely related bacterial genomes. J Theor Biol. 239:226–35.
OpenUrl CrossRef PubMed Web of Science
46.↵
Sackman, A. M., R. B. Harris, and J.D. Jensen. 2019. Inferring demography and selection in organisms characterized by skewed offspring distributions. Genetics. 211: 1019–1028.
OpenUrl Abstract/FREE Full Text
47.↵
Sherman, D. R., and S. Gagneux. 2011. Estimating the mutation rate of Mycobacterium tuberculosis during infection. Nat Genet. 43: 400–401.
OpenUrl CrossRef PubMed
48.
Tajima, F. 1989. Statistical method for testing the neutral mutation hypothesis by DNA polymorphism. Genetics. 123: 585–595.
OpenUrl Abstract/FREE Full Text
49.↵
Tellier, A., and C. Lemaire. 2014. Coalescence 2.0: A multiple branching of recent theoretical developments and their applications. Mol Ecol. 23: 2637–2652.
OpenUrl CrossRef
50.↵
Thornton, K. R., J. D. Jensen, C. Becquet, and P. Andolfatto. 2007. Progress and prospects in mapping recent selection in the genome. Heredity. 98: 340–348.
OpenUrl CrossRef PubMed Web of Science
51.↵
Trauner, A., Q. Liu, L. E. Via, X. Liu, X. Ruan, et al. 2017. The within-host population dynamics of Mycobacterium tuberculosis vary with treatment efficacy. Genome Biol. 18: 71.
OpenUrl CrossRef
52.↵
Vahey, M. D., and D. A. Fletcher. 2019. Low Fidelity Assembly of Influenza A Virus Promotes Escape from Host Cells. Cell. 176: 281–294.
OpenUrl CrossRef
53.
World Health Organization (WHO). 2018. Global tuberculosis report 2018. World Health Organization, Geneva.

View the discussion thread.

Posted December 04, 2019.

Download PDF

Citation Tools

Subject Area

Genetics

Subject Areas

All Articles

Animal Behavior and Cognition (5200)
Biochemistry (11703)
Bioengineering (8722)
Bioinformatics (29127)
Biophysics (14932)
Cancer Biology (12048)
Cell Biology (17359)
Clinical Trials (138)
Developmental Biology (9406)
Ecology (14143)
Epidemiology (2067)
Evolutionary Biology (18268)
Genetics (12220)
Genomics (16766)
Immunology (11841)
Microbiology (28005)
Molecular Biology (11552)
Neuroscience (60808)
Paleontology (450)
Pathology (1864)
Pharmacology and Toxicology (3231)
Physiology (4939)
Plant Biology (10384)
Scientific Communication and Education (1679)
Synthetic Biology (2877)
Systems Biology (7333)
Zoology (1642)

[1] 1.↵
Baker, L., T. Brown, M. C. Maiden, and F. Drobniewski. 2004. Silent nucleotide polymorphisms and a phylogeny for Mycobacterium tuberculosis. Emerg Infect Dis. 10: 1568–1577.
OpenUrl CrossRef PubMed Web of Science

[2] 2.↵
Barton, N. H. 1998. The effect of hitch-hiking on neutral genealogies. Genet Res Camb.72(2): 123–33.
OpenUrl CrossRef Web of Science

[3] 3.↵
Behr, M. A., P. H. Edelstein, and L. Ramakrishnan. 2018. Revisiting the timetable of tuberculosis. BMJ. 362: k2738.
OpenUrl FREE Full Text

[4] 4.↵
Birkner, M., J. Blath, and B. Eldon. 2013. Statistical properties of the site-frequency spectrum associated with Λ-coalescents. Genetics. 195: 1037–53.
OpenUrl Abstract/FREE Full Text

[5] 5.↵
Blath, J., M. Cronjäger Christensen, B. Eldon, and M. Hammer. 2016. The site-frequency spectrum associated with Ξ-coalescents. Theor Popul Biol. 110: 36–50.
OpenUrl CrossRef

[6] 6.↵
Bos, K. I., K. M. Harkins, A. Herbig, M. Coscolla, N. Weber et al. 2014. Pre-Columbian mycobacterial genomes reveal seals as a source of New World human tuberculosis. Nature. 514: 494–497.
OpenUrl CrossRef PubMed Web of Science

[7] 7.↵
Brites, D., and S. Gagneux. 2015. Co-evolution of Mycobacterium tuberculosis and Homo sapiens. Immunol Rev. 264: 6–24.
OpenUrl CrossRef

[8] 8.↵
Brown, T. S., A. Narechania, J. R. Walker, P. J. Planet, P. J. Bifani, et al. 2016. Genomic epidemiology of Lineage 4 Mycobacterium tuberculosis subpopulations in New York city and New Jersey, 1999–2009. BMC Genomics. 17: 947.
OpenUrl

[9] 9.↵
Cancino-Muñoz, I., M. Moreno-Molina, V. Furió, G. A. Goig, M. Torres-Puente, et al. 2019. Cryptic resistance mutations associated with misdiagnoses of multidrug-resistant tuberculosis. J Infect Dis. 220: 316–320.
OpenUrl

[10] 10.↵
Charlesworth, B., M. T. Morgan, and D. Charlesworth. 1993. The effect of deleterious mutations on neutral molecular variation. Genetics. 134: 1289–1303.
OpenUrl Abstract/FREE Full Text

[11] 11.↵
Colangeli, R., V. L. Arcus, R.T. Cursons, A. Ruthe, N. Karalus, et al. 2014. Whole genome sequencing of Mycobacterium tuberculosis reveals slow growth and low mutation rates during latent infections in humans. PLoS One. 9: 1–9.
OpenUrl CrossRef PubMed

[12] 12.↵
Cole, S. T., R. Brosch, J. Parkhill, T. Garnier, C. Churcher, et al. 1998. Deciphering the biology of Mycobacterium tuberculosis from the complete genome sequence. Nature. 393: 537–544.
OpenUrl CrossRef PubMed Web of Science

[13] 13.↵
Comas, I., M. Coscolla, T. Luo, S. Borrell, K. E. Holt, et al. 2013. Out-of-Africa migration and Neolithic coexpansion of Mycobacterium tuberculosis with modern humans. Nat Genet. 45:1176–1182.
OpenUrl CrossRef PubMed

[14] 14.↵
Copin, R., X. Wang, E. Louie, V. Escuyer, M. Coscolla et al. 2016. Within host evolution selects for a dominant genotype of Mycobacterium tuberculosis while T cells increase pathogen genetic diversity. PLoS Pathog. 12: e1006111.
OpenUrl CrossRef

[15] 15.↵
Dean, G. S., S. G. Rhodes, M. Coad, A. O. Whelan, P. J. Cockle, et al. 2005. Minimum effective dose of Mycobacterium bovis in cattle. Infect Immunity. 73: 6467–6471.
OpenUrl Abstract/FREE Full Text

[16] 16.↵
Der, R., C. L. Epstein, and J. B. Plotkin. 2011. Generalized population models and the nature of genetic drift. Theor Popul Biol. 80: 80–99.
OpenUrl CrossRef PubMed

[17] 17.↵
Dos Vultos, T. O., Mestre, J. Rauzier, M. Golec, N. Rastogi, et al. 2008. Evolution and diversity of clonal bacteria: The paradigm of Mycobacterium tuberculosis. PLoS One. 3: e1538.
OpenUrl CrossRef PubMed

[18] 18.↵
Duchêne, S., K. E. Holt, F. X. Weill, S. Le Hello, J. Hawkey, et al. 2016. Genome-scale rates of evolutionary change in bacteria. Microb Genomics. 2: 1–12.
OpenUrl

[19] 19.↵
Eldon, B., and J. Wakeley. 2006. Coalescent processes when the distribution of offspring number among individuals is highly skewed. Genetics. 172: 2621–33.
OpenUrl Abstract/FREE Full Text

[20] 20.↵
Eldon, B., M. Birkner, J. Blath, and F. Freund. 2015. Can the site-frequency spectrum distinguish exponential population growth from multiple-merger coalescents? Genetics. 199: 841–56.
OpenUrl Abstract/FREE Full Text

[21] 21.↵
Eldholm, V., J. Monteserin, A. Rieux, B. Lopez, B. Sobkowiak, et al. 2015. Four decades of transmission of a multidrug-resistant Mycobacterium tuberculosis outbreak strain. Nat Commun. 6: 1–9.
OpenUrl CrossRef PubMed

[22] 22.↵
Eldholm, V., J. H. O. Pettersson, O. B. Brynildsrud, A. Kitchen, E.M. Rasmussen, et al. 2016. Armed conflict and population displacement as drivers of the evolution and dispersal of Mycobacterium tuberculosis. Proc Natl Acad Sci U S A. 113:13881–6.
OpenUrl Abstract/FREE Full Text

[23] 23.↵
Fleischmann, R. D., D. Alland, J. A. Eisen, L. Carpenter, O. White, et al. 2002. Whole-Genome comparison of Mycobacterium tuberculosis clinical and laboratory strains. J Bacteriol. 184: 5479–5490.
OpenUrl Abstract/FREE Full Text

[24] 24.↵
Foll, M., Y. P. Poh, N. Renzette, A. Ferrer-Admetlla, C. Bank, et al. 2014. Influenza virus drug resistance: a time-sampled population genetics perspective. PLoS Genet. 10: e1004185.
OpenUrl CrossRef PubMed

[25] 25.↵
Foll, M., H. Shim, and J. D. Jensen. 2015. WFABC: A Wright-Fisher ABC-based approach for inferring effective population sizes and selection coefficients from time-sampled data. Mol Ecol Resour. 15(1): 87–98.
OpenUrl CrossRef PubMed

[26] 26.↵
Fonseca, J. D., G.M. Knight, and T. D. McHugh. 2015. The complex evolution of antibiotic resistance in Mycobacterium tuberculosis. Int J Infect Dis. 32: 94–100.
OpenUrl CrossRef PubMed

[27] 27.↵
Ford, C.B., P. L Lin, M. Chase, R.R. Shah, O. Iartchouk, et al. 2011. Use of whole genome sequencing to estimate the mutation rate of Mycobacterium tuberculosis during latent infection. Nat Genet. 43(5): 482–486.
OpenUrl CrossRef PubMed

[28] 28.↵
Ford, C. B., R.R. Shah, M. Kato-Maeda, S. Gagneux, M. B. Murray et al. 2013. Mycobacterium tuberculosis mutation rate estimates from different lineages predict substantial differences in the emergence of drug-resistant tuberculosis. Nat Genet. 45: 784–790.
OpenUrl CrossRef PubMed

[29] 29.↵
Haller, B. C., and P.W. Messer. 2019. SLiM 3: Forward genetic simulations beyond the Wright– Fisher model. Mol Biol Evol. 36: 632–637.
OpenUrl CrossRef

[30] 30.↵
Hershberg, R., M. Lipatov, P. M. Small, H. Sheffer, S. Niemann et al. 2008. High functional diversity in Mycobacterium tuberculosis driven by genetic drift and human demography. PLoS Biol. 6: e311.
OpenUrl CrossRef PubMed

[31] 31.↵
Huillet, T., and M. Möhle. 2011. Population genetics models with skewed fertilities: A forward and backward analysis. Stoch Model. 27: 521–554.
OpenUrl

[32] 32.↵
Irwin, K. K., S. Laurent, S. Matuszewski, S. Vuilleumier, L. Ormond, et al. 2016. On the importance of skewed offspring distributions and background selection in virus population genetics. Heredity. 117: 393–399.
OpenUrl CrossRef

[33] 33.↵
Kimura, M. 1968. Evolutionary rate at the molecular level. Nature. 217: 624–626.
OpenUrl CrossRef PubMed Web of Science

[34] 34.↵
Lapierre, M., C. Blin, A. Lambert, G. Achaz, and E. P. C. Rocha. 2016. The impact of selection, gene conversion, and biased sampling on the assessment of microbial demography. Mol Biol Evol. 33: 1711–1725.
OpenUrl CrossRef PubMed

[35] 35.↵
Liu, Q., A. Ma, L. Wei, Y. Pang, B. Wu, et al. 2018. China’s tuberculosis epidemic stems from historical expansion of four strains of Mycobacterium tuberculosis. Nat Ecol Evol. 2: 1982–1992.
OpenUrl

[36] 36.↵
Matuszewski, S., M. E. Hildebrandt, G. Achaz, and J. D. Jensen. 2018. Coalescent processes with skewed offspring distributions and nonequilibrium demography. Genetics. 208: 323–338.
OpenUrl Abstract/FREE Full Text

[37] 37.↵
Menardo, F., S. Duchêne, D. Brites, and S. Gagneux. 2019. The molecular clock of Mycobacterium tuberculosis. PLoS Pathog. 15: e1008067.
OpenUrl

[38] 38.↵
Mortimer, T. D., A. M. Weber, and C. S. Pepperell. 2018. Signatures of selection at drug resistance loci in Mycobacterium tuberculosis. mSystems. 3: e00108–17.
OpenUrl

[39] 39.↵
Payne, J. L., F. Menardo, A. Trauner, S. Borrell, S.M. Gygli, et al. 2019. Transition bias influences the evolution of antibiotic resistance in Mycobacterium tuberculosis. PLOS Biol. 17: e3000265.
OpenUrl

[40] 40.↵
Pepperell, C. S., V. H. Hoeppner, M. Lipatov, W. Wobeser, G. K. Schoolnik, and M. W. Feldman. 2010. Bacterial genetic signatures of human social phenomena among M. tuberculosis from an aboriginal Canadian population. Mol Biol Evol. 27: 427–40.
OpenUrl CrossRef PubMed Web of Science

[41] 41.↵
Pepperell, C. S., A. M. Casto, A. Kitchen, J.M. Granka, O. E. Cornejo, et al. 2013. The role of selection in shaping diversity of natural M. tuberculosis populations. PLoS Pathog. 9: e1003543.
OpenUrl CrossRef PubMed

[42] 42.↵
Pfeifer, B., U. Wittelsbürger, S. E. Ramos-Onsins, and M. J. Lercher. 2014. PopGenome: An efficient swiss army knife for population genomic analyses in R. Mol Biol Evol. 31: 1929–36.
OpenUrl CrossRef PubMed Web of Science

[43] 43.↵
Phelan, J., F. Coll, R. McNerney, D. B. Ascher, D. E. V. Pires, et al. 2016. Mycobacterium tuberculosis whole genome sequencing and protein structure modelling provides insights into anti-tuberculosis drug resistance. BMC Med. 14: 31.
OpenUrl CrossRef

[44] 44.↵
Renzette, N., S. P. Pfeifer, S. Matuszewski, T. F. Kowalik, and J. D. Jensen. 2017. On the analysis of intrahost and interhost viral populations: human cytomegalovirus as a case study of pitfalls and expectations. J Virol. 91: e01976–16.
OpenUrl CrossRef

[45] 45.↵
Rocha, E. P. C., J. Maynard Smith, L.D. Hurst, M. T. G. Holden, J. E. Cooper, et al. 2006. Comparisons of dN/dS are time dependent for closely related bacterial genomes. J Theor Biol. 239:226–35.
OpenUrl CrossRef PubMed Web of Science

[46] 46.↵
Sackman, A. M., R. B. Harris, and J.D. Jensen. 2019. Inferring demography and selection in organisms characterized by skewed offspring distributions. Genetics. 211: 1019–1028.
OpenUrl Abstract/FREE Full Text

[47] 47.↵
Sherman, D. R., and S. Gagneux. 2011. Estimating the mutation rate of Mycobacterium tuberculosis during infection. Nat Genet. 43: 400–401.
OpenUrl CrossRef PubMed

[48] 48.
Tajima, F. 1989. Statistical method for testing the neutral mutation hypothesis by DNA polymorphism. Genetics. 123: 585–595.
OpenUrl Abstract/FREE Full Text

[49] 49.↵
Tellier, A., and C. Lemaire. 2014. Coalescence 2.0: A multiple branching of recent theoretical developments and their applications. Mol Ecol. 23: 2637–2652.
OpenUrl CrossRef

[50] 50.↵
Thornton, K. R., J. D. Jensen, C. Becquet, and P. Andolfatto. 2007. Progress and prospects in mapping recent selection in the genome. Heredity. 98: 340–348.
OpenUrl CrossRef PubMed Web of Science

[51] 51.↵
Trauner, A., Q. Liu, L. E. Via, X. Liu, X. Ruan, et al. 2017. The within-host population dynamics of Mycobacterium tuberculosis vary with treatment efficacy. Genome Biol. 18: 71.
OpenUrl CrossRef

[52] 52.↵
Vahey, M. D., and D. A. Fletcher. 2019. Low Fidelity Assembly of Influenza A Virus Promotes Escape from Host Cells. Cell. 176: 281–294.
OpenUrl CrossRef

[53] 53.
World Health Organization (WHO). 2018. Global tuberculosis report 2018. World Health Organization, Geneva.