Abstract
Recent reports have suggested that CRISPR-based gene drives are unlikely to invade wild populations due to drive-resistant alleles that prevent cutting. Here we develop mathematical models based on existing empirical data to explicitly test this assumption. We show that although resistance prevents drive systems from spreading to fixation in large populations, even the least effective systems reported to date are highly invasive. Releasing a small number of organisms often causes invasion of the local population, followed by invasion of additional populations connected by very low gene flow rates. Examining the effects of mitigating factors including standing variation, inbreeding, and family size revealed that none of these prevent invasion in realistic scenarios. Highly effective drive systems are predicted to be even more invasive. Contrary to the National Academies report on gene drive, our results suggest that standard drive systems should not be developed nor field-tested in regions harboring the host organism.
Introduction
CRISPR-based gene drive systems can bias inheritance of desired traits by cutting a wild-type allele and copying the drive system in its place1. Following reports of successful CRISPR gene drive systems in yeast2 and fruit flies3, scientists emphasized the need to employ strategies beyond traditional barrier containment as a laboratory safeguard4,5. These precautions were judged necessary to prevent unintended ecological effects, but also because any unauthorized release affecting a wild population could severely damage trust in scientists and governance, significantly delaying or even precluding applications of gene drive and other biotechnologies.
Drive resistance can result from mutations that block cutting by the CRISPR nuclease. Recent examinations of the phenomenon by experiments and deterministic models have generated substantial media attention6–9. Resistance can arise from standing genetic variation at the drive locus or because the drive mechanism is not perfectly efficient, and it is predicted to prevent drive fixation in wild populations unless additional mitigating strategies are employed1,9–12.
Recent articles highlighting the problem of resistance for gene drives have suggested that resistance will prevent drive invasion in wild populations—with some even implying that resistance could serve as an experimental safeguard. While resistance should prevent drive fixation, an allele can nonetheless spread to significant frequency without fixing. To clarify this point, we sought to quantify the likelihood and magnitude of spread in the most likely unauthorized release scenario—a small number of engineered individuals released into a wild population.
CRISPR gene drive systems function by converting drive-heterozygotes into homozygotes in the late germline or early embryo1 (Figure 1A). First, a CRISPR nuclease encoded in the drive construct cuts at the corresponding wild-type allele—its target prescribed by an independently expressed guide RNA (gRNA)—producing a double-strand break13. This break is then repaired either through homology-directed repair, producing a second copy of the gene drive construct, or through a nonhomologous repair pathway (non-homologous end joining, NHEJ, or microhomology-mediated end joining, MMEJ), which typically introduces a mutation at the target site14,15. Because the drive target is determined through sequence homology, such a mutation generally results in resistance to future cutting by the gene drive. Thus, the allele converts from a wild-type to resistant allele if it undergoes repair by a pathway other than homology-directed repair. Moreover, drive-resistant alleles are expected to exist in wild populations simply due to standing genetic variation7,8.
Deterministic models, which assume an infinite, well-mixed population, predict whether an allele is initially favored by selection, i.e., favored to increase in frequency when initially rare in a wild population16. Whether gene drives are initially favored by selection depends on two key parameters: the homing efficiency (P), or the probability of undergoing homology-directed repair instead of nonhomologous repair, and fitness (f), or the relative fecundity or death rate the drive and its cargo confer on their organism compared to the wild-type. Mathematically, drives are initially favored by selection if f(1 + P) > 1, i.e., if the inheritance bias of the drive exceeds its fitness penalty9,11,17. Given that the homing efficiencies of reported drive systems typically range from 0.37 to 0.99 (Table S1), current drive systems can clearly be initially favored by selection. Although the fitness parameter, f, is typically not measured in proof-of-concept studies, a substantial fitness cost is tolerable by all reported CRISPR drive constructs6,2,3,18,19 (Figure 1B).
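As a minimal illustration of the criterion f(1 + P) > 1, the sketch below checks the condition and computes the largest fitness cost, 1 − f, that a drive with homing efficiency P could bear while remaining initially favored. The function names are hypothetical and introduced only for illustration; the bound 1 − f < P/(1 + P) is a direct rearrangement of the inequality stated above.

```python
def initially_favored(f, P):
    """Return True if a drive with fitness f and homing efficiency P
    satisfies the deterministic invasion criterion f * (1 + P) > 1."""
    return f * (1 + P) > 1


def max_tolerable_cost(P):
    """Upper bound on the fitness cost (1 - f) under the criterion.

    f * (1 + P) > 1  is equivalent to  1 - f < P / (1 + P).
    """
    return P / (1 + P)


if __name__ == "__main__":
    # Range of homing efficiencies quoted in the text (Table S1).
    for P in (0.37, 0.99):
        print(f"P = {P:.2f}: cost must stay below {max_tolerable_cost(P):.3f}")
```

Under this rearrangement, the tolerable cost grows with P, which is why even the least efficient reported systems can carry a nontrivial fitness penalty.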
However, in finite populations, the fate of initially rare alleles is determined not only by selection but also by stochastic fluctuations20–22. Therefore, stochastic models are required to predict the probability that a drive spreads to some preset frequency when it is initially favored by selection. A previous, and arguably prescient, stochastic model of endonuclease drive containment found that homing-based drives, such as those subsequently developed using CRISPR, were among the likeliest to invade of the various drive alternatives23. To determine whether drives are still able to invade in the presence of resistance, we formulated a finite population, stochastic, Moran-based model that allows us to study small releases in finite and structured populations (Methods).
Results
Our model considers three distinct allelic classes: wild-type (W), gene drive (D), and resistant (R). Consistent with experiments, we assume that the drive invariably cuts the wild-type allele in the germline of a heterozygous WD individual, converting to a drive allele with probability P, or a resistant allele with probability 1–P. Each genotype, AB, has a relative reproductive rate, fAB, corresponding to its fitness in deterministic models, normalized such that the wild-type homozygote has fitness one (fWW=1), the drive confers a dominant cost (fDW=fDD=fDR<1), and resistance is neutral (fRR=1). This ordering of the parameters represents the worst-case scenario for drive spread (SI Section 2.6).
At the population level, our basic model considers N diploid individuals mating randomly. The process unfolds in discrete steps, during which parents are chosen for reproduction, an offspring is chosen according to the mechanism above, and another individual is replaced by the offspring (Figure 1C and Methods). These steps are repeated until one allele fixes. A generation is N time-steps, which corresponds to the mean lifespan of an individual.
Figure 1D shows typical simulations for drive efficiencies of 0.15, 0.5, and 0.9, which correspond respectively to a constitutively active drive system targeting a common insertion site, and conservative and high efficiency systems (based on previous experimental studies, Table S1, Figure 1B, SI Section 1). These simulations assume a dominant drive fitness cost of 10%, a population of size 500, and a release of 15 drive-homozygous individuals. (Note that the dynamics are similar for larger population sizes; see SI Section 2.1 and Figure S1.) In all three cases, the drive, on average, irreversibly alters a majority of the population, either via invasion of the drive itself or via spread of drive-created resistant alleles. We call the maximum frequency of drive alleles reached during a simulation the peak drive, and we say a drive has invaded if it reaches a frequency of 0.1. Importantly, although arbitrary, the choice of 0.1 is large enough to ensure peak drive on par with deterministic models (SI Section 2.6 and 2.7). Invasion is very unlikely when the drive is not initially favored by selection.
We next calculated the distribution of peak drive while varying the number of organisms released (Figures 1E and 1F). We find that these distributions are bimodal, with one mode centered around the initial frequency (corresponding to drift leading rapidly to extinction) and one centered roughly around the maximum values observed in the large-release scenarios in Figure 1D. The former mode shrinks rapidly as more organisms are released, and for the parameters studied, a release of 10 individuals nearly guarantees invasion with substantial peak drive (SI Section 2.6, Figure S7).
To understand the extent to which isolation might prevent invasion of other populations connected by gene flow, we introduced population structure. Our model consists of five subpopulations (or islands) that are equally connected by migration (Figure 2A, SI Section 7, and Methods). Typical dynamics are illustrated in Figure 2C. Figures 2B and 2D show the escape probability, or the probability of the drive invading (attaining a frequency of 0.1) at least one subpopulation other than its originating one, and Figure 2E shows the probability of invading a varying number of subpopulations.
Our results in Figure 2 suggest that if the migration rate is extremely low, then the drive is effectively contained in the initial subpopulation. If the migration rate is high, the drive is almost guaranteed to invade all subpopulations linked to the originating one. For intermediate migration rates, roughly those on the order of the inverse of the drive extinction time, both outcomes occur. In the scenario studied in Figure 2, a migration rate of 10⁻³, which corresponds to a single migration event every 2 generations on average (Methods), virtually guarantees escape for moderate drive efficiencies. For further details and analytical formulae allowing rapid estimation of escape probabilities, see SI Section 2.7.
Finally, we sought to understand the effects of additional mitigating factors that could potentially affect peak drive or invasion. We considered the most prominent factors that have arisen in previous papers, and we studied each by varying parameters in our basic model and developing model extensions. Our results are explored in detail in the Supplementary Information (Section 3).
First, we considered preexisting drive resistance resulting from standing genetic variation7,8 (SI Section 2.2). We find that increasing the proportion of the population that is initially resistant linearly decreases the mean peak drive (R² = 0.996). Using the parameters in Figure 1E and considering a release of 15 individuals, more than 50% preexisting resistance is required to contain average peak drive below 10% (Figure S2).
Second, we studied the effect of varying family size, which may be relevant to species such as mosquitoes with large egg batch sizes19,24. We extended the model so that k (adult) offspring are produced from a reproduction event, rather than one. We find that this effect scales the release and population sizes25 by a factor of 4/(2k + 6). For illustration, we estimated k for Anopheles gambiae to be roughly 10 (SI Section 2.3), so that a release of 7 individuals roughly corresponds to a release of 1 individual in our basic model. While this effect somewhat reduces the chance of drive invasion for small release sizes, it does not preclude it.
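The arithmetic behind the Anopheles gambiae illustration can be sketched as follows. The scaling factor 4/(2k + 6) and the estimate k ≈ 10 are taken from the text above; the function names are hypothetical, and the snippet only converts a release size in the family-size model into its rough equivalent in the basic model.

```python
def family_size_scale(k):
    """Scaling factor applied to release and population sizes when
    each reproduction event yields k adult offspring (as stated in the text)."""
    return 4 / (2 * k + 6)


def equivalent_basic_release(release, k):
    """Release size in the basic (single-offspring) model that roughly
    corresponds to `release` individuals in the family-size model."""
    return release * family_size_scale(k)


if __name__ == "__main__":
    k = 10  # rough estimate for Anopheles gambiae quoted in the text
    print(f"scale factor for k = {k}: {family_size_scale(k):.3f}")
    print(f"7 released individuals ~ {equivalent_basic_release(7, k):.2f} in the basic model")
```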
Third, we varied fitness and homing efficiency across the regime where the drive is initially favored by selection (Figure 1B) and recorded peak drive (SI Section 2.4, Figure S5). We find that peak drive is on average greater than 30% across the majority of the regime and almost always greater than 10%.
Fourth, we studied the effect of inbreeding, which has been shown in several recent theoretical studies26,8 to impede drive spread (SI Section 2.5). We extended the model to include a probability s of an individual selfing rather than mating with a second individual26. The model assumes no inbreeding depression and thus considers the worst-case scenario for drive26. We find that even in this scenario, high selfing probabilities are required to reduce peak drive and the probability of invasion for moderate drive costs.
There are a variety of other phenomena that could affect invasiveness, e.g., density dependence27, environment28, costly resistance29, local ecology, and even mating incompatibilities between some laboratory strains and wild individuals. Such effects should be carefully studied in subsequent papers. Most importantly, the drive architecture itself should affect invasiveness; we consider here only alteration-type drive systems, while others, e.g., sex-ratio distorters and genetic load drives, would be expected to yield different dynamics. In particular, population suppression drive systems may locally self-extinguish before invading new populations. However, for alteration drives, our key qualitative finding—that peak drive is difficult to reliably contain below a socially tolerable threshold following a very small release of organisms—appears robust to a variety of mitigating factors. Fundamentally, we exercise caution by omitting application-specific phenomena that might aid containment in particular instances but not in general.
Discussion
Our results suggest that current first-generation CRISPR gene drive systems are capable of far-reaching—perhaps, for species distributed worldwide, global—spread, even for very small releases. A simple, constitutively expressed CRISPR nuclease and guide RNA cassette targeting the neutral site of insertion—an arrangement that could occur accidentally—may be capable of altering many populations of the target species depending on the homing efficiency of the organism in question. More generally, resistance can be problematic for intentional applications of gene drives, but we find that it is not a major impediment to invasion of unintended populations.
Our results have numerous implications for future gene drive research. First, researchers interested in studying self-propagating gene drives may wish to refrain from constructing systems that are capable of invading wild populations. Invasion can be avoided by employing intrinsic molecular confinement mechanisms such as synthetic site targeting or split drive, as recommended by the National Academies4. Second, contrary to the National Academies’ recommendation of a staged testing strategy, the predicted invasiveness of current CRISPR-based drive systems may preclude field trials, possibly even on ostensibly isolated islands. The development of ‘local’, intrinsically self-exhausting gene drive systems30–34, sensitive methods of monitoring population genetics, and strategies for countering self-propagating drive systems and restoring populations to wild-type should be a correspondingly high priority.
Methods
Well-mixed finite population model
To model gene drives in finite populations, we introduce a Moran-type model with sexual reproduction (illustrated in Figure 1C). We consider a population of N individuals, each of which is diploid. We focus on a locus with three allelic classes: wild-type (W), CRISPR gene drive element (D) and drive-resistant (R). There are six possible genotypes: WW, WD, WR, DD, DR, and RR. We assign to each genotype α a reproductive rate fα.
The process proceeds in discrete time-steps, during each of which three events occur in succession (Figure 1C). First, two individuals are chosen without replacement for mating with probabilities proportional to their reproductive rates, so that genotype α is selected with probability
$$\frac{f_{\alpha} N_{\alpha}}{\sum_{\beta} f_{\beta} N_{\beta}}.$$
Here Nα is the number of individuals having genotype α, and the sum in the denominator is over all six genotypes. Second, after selecting the two parents, the offspring genotype is chosen randomly based on the genotypes of the two parents. To proceed, we introduce notation α = AB to mean that genotype α consists of alleles A and B, and we index these alleles via α1 = A and α2 = B. Note that we track only one genotype for each heterozygote, implicitly combining counts for genotypes AB and BA. Using this notation, the probability that an offspring of genotype γ = AB is chosen given a mating between parents of genotypes α and β is
$$\frac{p_{\alpha,A}\,p_{\beta,B} + p_{\alpha,B}\,p_{\beta,A}}{1 + \delta_{AB}}.$$
Here $p_{\alpha,A}$ is a gamete production probability (the probability that a parent with genotype α produces a gamete with haplotype A) and δAB is the Kronecker delta, defined by δAB = 1 if A = B (i.e., if the offspring under consideration is a homozygote), and δAB = 0 otherwise. The gamete production probabilities are determined by accounting for the gene drive process described in the main text. They are given by: $p_{AA,A} = 1$ for each homozygote AA, $p_{WD,D} = (1 + P)/2$, $p_{WD,R} = (1 - P)/2$, and $p_{AB,A} = p_{AB,B} = 1/2$ for the heterozygotes WR and DR. The remaining values not listed, e.g., $p_{WD,W}$, are zero. Third, an individual is chosen uniformly at random for death. Thus, the population size remains constant. The resulting counts become the starting abundances for the next iteration of the process. The process is initialized with a small number, i, of drive homozygotes (DD) and the remaining population, N − i, wild-type homozygotes (WW). The process continues as described above either until a specified number of time steps have elapsed or until one of the three alleles has fixed. Any of the alleles can fix, but typically either the wild-type or resistant alleles fix, due to the emergence of resistance.
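The well-mixed process described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the function and variable names are hypothetical, the drive cost is taken to be dominant with neutral resistance (as in Results), and the gamete rule for WD parents is inferred from the statement that the wild-type allele is always cut and converted to D with probability P or to R with probability 1 − P.

```python
import random

GENOTYPES = ["WW", "WD", "WR", "DD", "DR", "RR"]


def fitness(genotype, cost):
    # Assumed: dominant drive cost, neutral resistance.
    return 1.0 - cost if "D" in genotype else 1.0


def gamete(genotype, P, rng):
    if genotype == "WD":
        # Assumed: the W allele is always cut, becoming D (prob. P) or R (prob. 1 - P).
        converted = "D" if rng.random() < P else "R"
        return rng.choice(["D", converted])
    return rng.choice(genotype)  # Mendelian segregation otherwise


def canonical(a, b):
    # Track one genotype per heterozygote (AB and BA are combined).
    return "".join(sorted(a + b, key="WDR".index))


def pick_parent(counts, cost, rng):
    weights = [counts[g] * fitness(g, cost) for g in GENOTYPES]
    return rng.choices(GENOTYPES, weights=weights)[0]


def drive_frequency(counts, N):
    return (2 * counts["DD"] + counts["WD"] + counts["DR"]) / (2 * N)


def simulate(N=100, release=10, P=0.5, cost=0.1, max_generations=200, seed=0):
    """Run one trajectory; return (peak drive frequency, final genotype counts)."""
    rng = random.Random(seed)
    counts = {g: 0 for g in GENOTYPES}
    counts["DD"], counts["WW"] = release, N - release
    peak = drive_frequency(counts, N)
    for _ in range(max_generations * N):  # one generation = N time steps
        if any(counts[h] == N for h in ("WW", "DD", "RR")):
            break  # one allele has fixed
        first = pick_parent(counts, cost, rng)
        counts[first] -= 1  # choose the second parent without replacement
        second = pick_parent(counts, cost, rng)
        counts[first] += 1
        child = canonical(gamete(first, P, rng), gamete(second, P, rng))
        dead = rng.choices(GENOTYPES, weights=[counts[g] for g in GENOTYPES])[0]
        counts[dead] -= 1  # uniform death keeps N constant
        counts[child] += 1
        peak = max(peak, drive_frequency(counts, N))
    return peak, counts
```

Calling `simulate` repeatedly with different seeds and recording how often the returned peak reaches 0.1 would give a rough estimate of the invasion probability for a chosen release size.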
Finite population model with population structure
To study the effects of population structure on drive containment, we extended the well-mixed model from the previous section. We now consider l well-mixed subpopulations, each consisting initially of N/l individuals. The process proceeds in discrete time steps, as before. In each time step, we either migrate an individual from one population to another, or we choose a particular subpopulation and proceed through one mating and replacement iteration, as outlined above. More specifically, one step of the process proceeds as follows (illustrated in Figure S8). With probability m, we initiate a migration event. In this case, we perform three steps. First, we choose a source population with probability proportional to its size. Second, we choose an individual uniformly at random from the source population for migration. Finally, we move the chosen individual to a linked subpopulation uniformly at random. Otherwise, with probability 1 − m, we initiate a mating event as described in the well-mixed section. To carry this out, we first choose the population in which the event will occur. We choose this population with probability proportional to the square of its total fitness. We then step through one iteration of the well-mixed mating process within this subpopulation. Note that in this model the migration rate has a simple interpretation. The time between migrations is geometrically distributed with parameter m, so the mean time between migrations is 1/m time steps. Recall that a “generation” is equal to the mean lifespan of an individual, that is, N reproduction events or N/(1 − m) time steps. Then the typical time between migrations can be expressed in units of generations:
$$\frac{1}{m}\ \text{time steps} \times \frac{1 - m}{N}\ \frac{\text{generations}}{\text{time step}} = \frac{1 - m}{mN}\ \text{generations}.$$
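The timing argument above can be checked numerically. In the sketch below, the mean time between migrations, 1/m time steps, is divided by the length of a generation, N/(1 − m) time steps. The function names are hypothetical, and the value N = 500 is an assumption borrowed from the well-mixed simulations rather than a stated parameter of the structured model. The second helper illustrates the stated rule that the mating subpopulation is chosen with probability proportional to the square of its total fitness.

```python
import random


def generations_between_migrations(m, N):
    """Mean number of generations between migration events.

    Mean wait is 1/m time steps; one generation is N / (1 - m) time steps.
    """
    return (1 / m) / (N / (1 - m))


def choose_event(m, rng):
    """Decide whether a time step is a migration or a mating event."""
    return "migration" if rng.random() < m else "mating"


def choose_mating_subpopulation(total_fitnesses, rng):
    """Pick a subpopulation index with probability proportional to the
    square of its total fitness (sum of reproductive rates)."""
    weights = [w ** 2 for w in total_fitnesses]
    return rng.choices(range(len(total_fitnesses)), weights=weights)[0]


if __name__ == "__main__":
    # Assumed total population size of 500; m = 1e-3 as in the Figure 2 scenario.
    print(generations_between_migrations(1e-3, 500))
```

With these assumed values the result is close to 2 generations, in line with the figure quoted in Results.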
Deterministic model
To compare our stochastic simulations with deterministic results, we use a recently published model9. From that work, we employ the “previous drive” model, as it was designed to agree with the existing proof-of-concept CRISPR drive constructs that we consider here. Specifically, we consider the case of 1 guide RNA (n = 1 in that work’s notation), and zero production of costly resistant alleles (γ = 1).
Author contributions
K.M.E. and M.A.N. conceived the study. C.N., B.A. and M.A.N. created the mathematical models, and all authors analyzed the results. C.N. and B.A. wrote the manuscript with contributions from all authors.
Acknowledgments
We thank J. Wakeley for helpful discussions. C.N. received support from the NSF Graduate Research Fellowship Program under grant no. DGE1144152. K.M.E. was supported by the Burroughs Wellcome Fund (IRSA 73786).