Abstract
CRISPR is a newly discovered prokaryotic immune system. Bacteria and archaea with this system incorporate genetic material from invading viruses into their genomes, providing protection against future infection by similar viruses. The conditions for coexistence of prokaryotes and viruses are an interesting problem in evolutionary biology. In this work, we show an intriguing phase diagram of the virus extinction probability, which is more complex than that of the classical predator-prey model. As CRISPR incorporates genetic material, viruses are under pressure to evolve to escape recognition by CRISPR. When bacteria have a small rate of deleting spacers, a new parameter region in which bacteria and viruses can coexist arises, and it leads to a more complex coexistence pattern for bacteria and viruses. For example, when the virus mutation rate is low, the virus extinction probability changes non-monotonically with the bacterial exposure rate. The virus and bacteria co-evolution not only alters the virus extinction probability, but also changes the bacterial population structure. Additionally, we show that recombination is a successful strategy for viruses to escape from CRISPR recognition when viruses have multiple proto-spacers, providing support for a recombination-mediated escape mechanism suggested experimentally. Finally, we suggest that the reentrant phase diagram, in which phages can progress through three phases of extinction and two phases of abundance at low spacer deletion rates as a function of exposure rate to bacteria, is an experimentally testable phenomenon.
1. Introduction
Clustered Regularly Interspaced Short Palindromic Repeats (CRISPR) is a recently discovered immune system of prokaryotes. It is widely distributed in bacteria and archaea. Nearly half of bacteria and almost all archaea possess the CRISPR system [1, 2, 3, 4]. CRISPR is adaptive and heritable [5]: bacteria can acquire a short piece of invading DNA (termed a proto-spacer) and integrate this piece of exogenous DNA into the CRISPR locus. The nucleotide sequence in the CRISPR locus that originated from the invading DNA is called a spacer. The mechanism of the CRISPR system is categorized into three stages: the acquisition and integration of new spacers into CRISPR, expression and maturation of CRISPR RNAs (crRNAs), and CRISPR interference [6, 7, 8]. In the acquisition stage, proto-spacers from viruses (phages) or plasmids are integrated into the CRISPR locus. During the expression stage, CRISPRs are first transcribed to precursor CRISPR RNAs (pre-crRNAs). Pre-crRNAs are then processed by Cas (CRISPR-associated) proteins into mature crRNAs. In the interference stage, crRNAs guide Cas proteins to cleave the complementary DNA of invading plasmids or phages [9, 10, 11, 12, 13, 14].
The discovery of the CRISPR system challenged our understanding of the evolutionary dynamics of bacteria and phages [5, 15]. Several models were established to explain the interesting features of CRISPR and the co-evolution of prokaryotes and phages. Levin used an ecological model to investigate the question of why and how CRISPR is established and maintained in a bacterial population [16]. A similar model that considered the conjugational transfer of beneficial plasmids suggested that plasmids may be more likely to evade CRISPR-Cas immunity by inactivation of functional CRISPR-Cas rather than by mutation of the target proto-spacers [17]. He and Deem introduced a population dynamics model to explain the heterogeneous distribution of the spacer diversity in CRISPR [18], i.e. the decrease of spacer diversity with distance from the leader. A later paper considered a density-dependent phage growth model and showed that recombination allows viruses to evade CRISPR more effectively than does point mutation alone when more than one mismatch between the crRNA and proto-spacer is required for viruses to escape CRISPR recognition [19]. Childs et al. used an eco-evolutionary model of CRISPR with imperfect immunity to also show that both bacteria and phages were highly diversified by coevolution and that diversity decreased with distance from the leader [20]. In another paper, a metric, population-wide distributed immunity (PDI), was defined to quantify the immunity distributed among the host-viral population. This model showed that the number of viral proto-spacers, mutation rate, host spacer acquisition rate, and spacer number could change the host-viral population structure through distributed immunity [21]. Haerter et al. considered spatial effects. Their model showed that CRISPR and spatial self-organization stabilized the coexistence of bacteria and phages. Protected by CRISPR, bacteria could coexist with phages even when the diversity of phages was large [22].
In a follow-up paper, the fitness cost of spacers was taken into consideration. Due to the spatial inhomogeneity and the fitness cost of spacers, it was observed that evolution favors an intermediate number of spacers [23]. Weinberger et al. combined a population-genetic model with metagenomic sequencing to study the population dynamics of bacteria and phages [24]. They reported the gradual loss of bacterial diversity through selective sweeps in the host population. This model also showed that the trailer end of the spacer array was conserved even though the old spacers did not provide immunity against current phages. Increasing the spacer deletion rate repressed the bacterial immunity and led to a viral bloom. Weinberger et al. also examined why CRISPRs are more common in archaea than in bacteria [25] with a stochastic model of viral-CRISPR co-evolution. The model showed that a decreased viral mutation rate increases the prevalence of CRISPR in archaea, and CRISPR appeared only at an intermediate level of innate immunity. In a follow-up paper [26], a model with explicit population dynamics showed that CRISPR was ineffective for extremely large populations. Because mesophiles usually have larger population sizes, this model gave another explanation for the increased prevalence of CRISPR in hyperthermophiles compared to mesophiles. Finally, a phase diagram of bacteria and phages has been computed, with results similar to the classical predator-prey model [26], i.e. bacteria and phages coexist only when the virulence of phages is not too high and the immunity of bacteria is not too strong. The mean-field assumption of this approach, however, is in contrast to the strong stochastic effects seen in experiments [27].
Here we investigated the conditions under which bacteria and phages can coexist in a fully stochastic model of co-evolution. The competition here differs from that in the classical competitive exclusion principle [28, 29], which studies the competition between species that occupy the same ecological niche. In our model, bacteria and phages do not occupy exactly the same ecological niche but rather can coexist. Phages can hijack bacteria, and bacteria can gain immunity to avoid being infected. We studied the impact of different phage evolution strategies, namely point mutation and recombination, on the co-evolution of bacteria and phages. We found an interesting phase diagram of the extinction probability of phages, which cannot be explained by the classical predator-prey model. In the classical predator-prey model, bacteria and phages coexist within only one parameter region. Outside this region, bacteria and phages cannot coexist. In this paper, we find that bacteria and phages can coexist in several parameter regions. Indeed, the coexistence of bacteria and phages is re-entrant as a function of the exposure rate of phages to bacteria, for low phage mutation rates.
2. Method
We used a stochastic model to study the population dynamics of bacteria and phages. The bacteria have a rate of acquiring and losing spacers. The phages have multiple proto-spacers that can evolve by point mutation and recombination. Spacers and proto-spacers are expressed as a bit string. Each bit can be either “0” or “1”. The length of each spacer and proto-spacer is L bits. The number of proto-spacers in phages is np. CRISPR suppresses the phages, and unrecognized phages can infect and reproduce in bacteria. The co-evolving dynamics is described by seven events:
Bacteria reproduction: The growth rate of wild type bacteria that do not acquire any spacers is c0. Each spacer has a cost c. The growth rate of bacteria that carry a spacer array depends on c0, on the cost c, on the number of spacers in the array, on the density of healthy bacteria xB, on the density of infected bacteria xI, and on the maximum density of bacteria xBM.
Bacteria infection: Healthy bacteria can be infected by phages. The adsorption rate of phages to healthy bacteria is βxP xB, where β is the exposure rate, xP is the density of free phages, and xB is the density of healthy bacteria. Bacteria have a probability γ to acquire a new spacer from the invading phage genome. Each proto-spacer has probability γ/np to be acquired. The newly acquired spacer is always inserted at the leader-proximal end of CRISPR, and the phage is degraded. Old spacers are shifted to the distal end. The maximum number of spacers per bacterium is ns. If the number of spacers reaches ns, the oldest spacer is deleted when a new spacer is acquired. The alternative event, with probability 1 − γ, is no incorporation of a proto-spacer. In this case, if any spacer in the CRISPR matches any proto-spacer of the phage, the phage is killed. Otherwise, the bacterium becomes infected.
CRISPR deletes one spacer: A bacterium that possesses a spacer array deletes one spacer at a rate determined by the number of spacers in the array and by d, the rate of deleting one spacer. When one spacer is deleted, the other spacers are shifted towards the leader end.
Bacterial lysis: Infected bacteria lyse at a rate 1/τ, where τ is the latent time. When an infected bacterium lyses, b new phages are released, where b is the burst size. Each of the newborn phages can undergo point mutation or recombination.
Phage mutation: Phages released upon bacterial lysis can undergo point mutation. The rate of point mutation is μ per base per replication. A mutation flips the value of a nucleotide.
Phage recombination: Phages released upon bacterial lysis can undergo recombination. The rate of recombination is ν. A recombination occurs with another phage chosen randomly from the whole phage population, as a mean-field approximation to multiple infection. The recombination crossover probability is pc [19].
Phage degradation: Each phage has a decay rate δ.
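The branching logic of the infection event described above can be sketched in Python as follows. This is a minimal illustration and not the simulation code used for the results. The function name `infection_outcome`, the list representation with index 0 as the leader-proximal end, and the use of exact string equality as the matching rule are assumptions made for the sketch.

```python
import random

def infection_outcome(spacers, protospacers, gamma, n_s, rng=random):
    """Resolve one adsorption event for a healthy bacterium.

    spacers: list of bit strings, index 0 is the leader-proximal (newest) end.
    protospacers: list of bit strings carried by the adsorbed phage.
    Returns (new_spacers, outcome), where outcome is one of
    "acquired", "phage_killed" or "infected".
    """
    if rng.random() < gamma:
        # With probability gamma one proto-spacer is acquired; choosing
        # uniformly gives each proto-spacer probability gamma / n_p.
        new = rng.choice(protospacers)
        # Insert at the leader end; the oldest spacer is dropped beyond n_s.
        return ([new] + list(spacers))[:n_s], "acquired"
    # No acquisition: exact equality is the matching rule assumed here.
    if any(s in protospacers for s in spacers):
        return list(spacers), "phage_killed"
    return list(spacers), "infected"
```

Setting `gamma` to 0 or 1 makes the function deterministic, which is convenient for checking each branch in isolation.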
Initially, no bacteria have spacers. There are one or more strains of phages in the environment initially. Each strain of phages has np distinct proto-spacers.
The values of parameters are determined by the experimental data (see Supplementary Information). We used the Lebowitz-Gillespie algorithm [30, 31] to sample the stochastic process of the co-evolution of bacteria and phages. The master equation of this stochastic process is in the Appendix.
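For orientation, a direct-method stochastic simulation in the spirit of the algorithm cited above can be sketched as follows. The sketch is deliberately reduced: it omits spacers, mutation and recombination, tracks only the counts of healthy bacteria, infected bacteria and free phages, and assumes a logistic form for bacterial growth. The function name, the parameter dictionary and its keys are hypothetical and are not taken from the authors' code.

```python
import math
import random

def gillespie(B, I, P, params, t_end, seed=0):
    """Direct-method stochastic simulation of a reduced bacteria-phage system.

    B, I, P are integer counts of healthy bacteria, infected bacteria and
    free phages. Returns the final (t, B, I, P).
    """
    rng = random.Random(seed)
    c0, B_max, beta, V, tau, b, delta = (
        params[k] for k in ("c0", "B_max", "beta", "V", "tau", "b", "delta")
    )
    t = 0.0
    while t < t_end:
        rates = [
            max(c0 * B * (1.0 - (B + I) / B_max), 0.0),  # growth (assumed logistic)
            beta * P * B / V,                            # adsorption and infection
            I / tau,                                     # lysis after latent time
            delta * P,                                   # phage decay
        ]
        total = sum(rates)
        if total == 0.0:
            break  # nothing can happen any more
        # Exponentially distributed waiting time to the next event.
        t += -math.log(1.0 - rng.random()) / total
        # Pick the event with probability proportional to its rate.
        r = rng.random() * total
        if r < rates[0]:
            B += 1
        elif r < rates[0] + rates[1]:
            B -= 1
            I += 1
            P -= 1
        elif r < rates[0] + rates[1] + rates[2]:
            I -= 1
            P += b
        else:
            P -= 1
    return t, B, I, P
```

A full implementation would index bacteria by spacer array and phages by proto-spacer array, and would add the acquisition, deletion, mutation and recombination events listed above.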
3. Results
We examine the coexistence of phages and bacteria as a function of the phage mutation rate and bacterial exposure rate. Fig. 1 shows a phase diagram for the phage and bacterial populations. In Fig. 1(c), there are four transitions in the extinction probability of phages when the mutation rate of phages is small. In region (1), phages begin to emerge but the density of phages stabilizes at a low level. Bacteria and phages can coexist in this region. In region (2), the density of phages increases initially but then decreases to zero. In this region, phages have a high probability to go extinct. In region (3), the density of phages initially increases and then decreases, but in contrast to the behavior in region (2), phages can grow back and avoid extinction in this case. In region (4), phages rapidly go extinct after a sharp initial burst. The extinction probability of phages is high, and the extinction probability approaches a limit. In this region, bacteria and phages cannot coexist.
The four transitions for the extinction probability of phages as a function of β can be explained by Eq. 3.1 and Eq. 3.2. In region (1), because the density of phages is low and the value of β is small, the number of spacers in bacteria is almost 0, Fig. 1(e). Therefore, almost all bacteria are susceptible to phages. The equations of infected bacteria and phages can be approximated as
$$\frac{dx_I}{dt} = \beta x_P x_B - \frac{x_I}{\tau}, \qquad \frac{dx_P}{dt} = \frac{b}{\tau} x_I - \beta x_P x_B - \delta x_P, \tag{3.1}$$
where xI is the density of infected bacteria and xP is the density of phages. Solving Eq. 3.1, we find that at β* = δ/[xB (b − 1)] ≈ 10⁻¹² mL · min⁻¹ the replication rate of phages begins to exceed the phage decay rate, so phages emerge in the system.
In region (2), as β increases, the density of phages rapidly increases and bacteria begin to acquire spacers, Fig. 1(d) and (e). We can estimate the selection pressure on bacteria in this region. When xP ≈ 10⁹ mL⁻¹, which is the typical density of phages before bacteria acquire spacers in region (2), the infection rate of each bacterium that has no spacers is βxP ≈ 10⁻³ min⁻¹, which is the same order as the growth rate of bacteria. So the bacteria that acquire spacers dominate the bacterial population in a short time, and the density of phages will go down, eventually to zero.
In region (3), the phages increase first, then bacteria acquire spacers, leading the phages to decrease, which is similar to the behavior in region (2). But when the density of phages is low, bacteria will delete spacers due to the deletion rate and the cost of spacers. Because the mutation rate of phages is small, bacteria that acquire one or more spacers have immunity against most phages. Phages can only infect those bacteria that have lost all spacers. Here we define the proportion of bacteria that have lost all spacers as q. Then the density of susceptible bacteria is xB · q. Thus the equation of infected bacteria can be approximated as
$$\frac{dx_I}{dt} = q \beta x_P x_B - \frac{x_I}{\tau}. \tag{3.2}$$
In region (3), the value of q is roughly 0.1 from Fig. 1(e), so we find β* = δ/[xB (qb − 1)] ≈ 10⁻¹¹ mL · min⁻¹. Therefore, in region (3), phages can grow back when some portion of bacteria lose spacers. As the density of phages increases, the average number of spacers in bacteria also increases, which in turn represses the growth of phages, as in Figs. 1(d) and (e). So in this case, the density of phages fluctuates around a low value and eventually stabilizes.
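The two threshold estimates, β* = δ/[xB (b − 1)] for region (1) and β* = δ/[xB (qb − 1)] for region (3), can be checked numerically with a short Python sketch. The parameter values below (δ = 10⁻³ min⁻¹, xB = 10⁷ mL⁻¹, b = 100) are illustrative assumptions chosen only to be consistent with the orders of magnitude quoted in the text. They are not the values from the parameter table.

```python
def beta_threshold(delta, x_B, b, q=1.0):
    """Exposure rate above which phages can grow: delta / [x_B (q b - 1)].

    q is the fraction of bacteria that are susceptible (q = 1 when no
    bacterium carries a spacer, as in region (1)).
    """
    if q * b <= 1.0:
        raise ValueError("phages cannot grow: q * b must exceed 1")
    return delta / (x_B * (q * b - 1.0))

# Illustrative, assumed values (not taken from the parameter table).
delta = 1e-3   # phage decay rate, 1/min
x_B = 1e7      # density of healthy bacteria, 1/mL
b = 100        # burst size

beta_region1 = beta_threshold(delta, x_B, b)         # all bacteria susceptible
beta_region3 = beta_threshold(delta, x_B, b, q=0.1)  # 10% have lost all spacers
print(beta_region1, beta_region3)
```

With these assumed values the threshold rises by a factor of (b − 1)/(qb − 1) = 11 between the two cases, in line with the shift from about 10⁻¹² to about 10⁻¹¹ mL · min⁻¹.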
The density of free phages decreases due to two factors. One factor is decay. The other factor is CRISPR recognition and subsequent degradation. Therefore, the overall decay rate of phages is βxB + δ. At the left boundary of region (4), β is of the order of 10⁻⁹ mL · min⁻¹, and the overall decay rate of phages is βxB + δ ~ 10⁻² min⁻¹. Following the same argument as in region (3), the minimum value of q for which phages can grow back is q* = (βxB + δ)/(bβxB) ~ 10⁻². The time required for a fraction q* of bacteria to lose spacers is t > q*/d ≅ 1000 min, which is longer than the half-life of phages. Thus, before bacteria lose spacers, all phages are adsorbed into bacteria. Because bacteria have acquired spacers and the mutation rate of phages is small, phages that are adsorbed into bacteria are killed by CRISPR. Therefore, in region (4), phages go extinct rapidly after the initial burst. When β increases further, if bacteria acquire spacers, phages will go extinct rapidly. If bacteria do not acquire spacers, bacteria will go extinct, as in Fig. 1(f). The extinction probability of phages approaches a limit in Fig. 1 that is set by the probability that one of the initial bacteria acquired a spacer, which depends on the initial bacterial population.
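The order-of-magnitude argument for region (4) can be reproduced with a few lines of Python. The values of xB, δ, b and d below are assumptions picked to match the magnitudes quoted in the text (βxB + δ ~ 10⁻² min⁻¹, q* ~ 10⁻², q*/d ≅ 1000 min). They are illustrative only.

```python
import math

def q_star(beta, x_B, b, delta):
    """Minimum susceptible fraction for phage regrowth:
    (beta x_B + delta) / (b beta x_B)."""
    return (beta * x_B + delta) / (b * beta * x_B)

# Illustrative, assumed values (not taken from the parameter table).
beta = 1e-9    # exposure rate at the left boundary of region (4), mL/min
x_B = 1e7      # density of bacteria, 1/mL
delta = 1e-3   # phage decay rate, 1/min
b = 100        # burst size
d = 1e-5       # spacer deletion rate, 1/min

loss_rate = beta * x_B + delta          # overall phage loss rate, 1/min
q_min = q_star(beta, x_B, b, delta)     # about 1e-2
t_loss = q_min / d                      # time for a fraction q* to lose spacers
half_life = math.log(2.0) / loss_rate   # phage half-life, min
print(q_min, t_loss, half_life)
```

Under these assumptions the time for enough bacteria to lose their spacers is roughly an order of magnitude longer than the phage half-life, which is the comparison the argument relies on.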
From the above explanation of the four regions in Fig. 1, we have the conditions for which this interesting non-classical phase diagram of phage extinction exists. First, bacteria must possess the CRISPR adaptive immune system: if bacteria do not have CRISPR, bacteria and phages can only coexist when β is small, β ≈ 10⁻¹² mL · min⁻¹, and region (2) and region (4) will not exist. Second, bacteria must have some rate to lose the acquired immunity. If bacteria can accumulate an unlimited number of spacers, phages will eventually go extinct if the length of the proto-spacers is finite, and region (3) will not exist. Third, the rate of losing the adaptive immunity must be small. In region (2) and the left boundary of region (4), phages cannot grow back because the rate of losing spacers is small. If the rate of losing spacers is large, region (2) will disappear and the left boundary of region (4) will move towards higher β values, as shown in Fig. 2(a) and (b). In contrast, this phase diagram is not sensitive to the probability of acquiring new spacers. Increasing γ only changes the pattern of the extinction probability in high μ regions, making it more difficult for phages to escape from CRISPR recognition, as shown in Fig. 2(c) and (d). From the above results, we predict that when the deletion rate of spacers and the mutation rate of phages are small, decreasing the adsorption rate of phages can make phages go extinct. However, further decreasing the adsorption rate can allow phages to reemerge.
CRISPR changes the bacterial population structure. In Fig. 3, the Shannon entropy of the first spacer is used as a measure of the diversity. The diversity of spacers rises slowly when β is small, region (1) in Fig. 1(c). This is because the selection pressure on bacteria is small, and CRISPR gives bacteria little advantage. As β increases, the diversity of spacers increases faster because the density of phages is larger and the value of β is higher, making the adsorption of phages into bacteria more rapid. But the steady-state value of the diversity decreases, implying that the distribution of spacers becomes more biased. If the selection pressure on bacteria is larger, the bacteria that acquire spacers will dominate the population in a shorter time. When the bacteria that have spacers dominate the population, phages are repressed, and the density of phages stays at a low level. The process of acquiring spacers becomes slower, leading to a smaller steady-state value of spacer diversity.
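As an illustration of the diversity measure, the Shannon entropy of a sample of first spacers can be computed as in the following Python sketch. The function name and the use of the natural logarithm are assumptions, since the text does not state the base of the logarithm. The input is taken to be one first spacer per bacterium.

```python
import math
from collections import Counter

def first_spacer_entropy(first_spacers):
    """Shannon entropy (natural log) of the distribution of first spacers.

    first_spacers: iterable with one entry per bacterium, e.g. bit strings.
    Returns 0.0 for an empty sample.
    """
    counts = Counter(first_spacers)
    n = sum(counts.values())
    if n == 0:
        return 0.0
    # H = -sum_i p_i log p_i, with p_i the frequency of spacer type i.
    return -sum((k / n) * math.log(k / n) for k in counts.values())
```

A population dominated by a single spacer type gives an entropy near zero, whereas a population spread evenly over many spacer types gives a high entropy, which is the sense in which a lower steady-state value indicates a more biased distribution.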
Phages can undergo rapid recombination [32]. Recombination is compared to point mutation of phages in Fig. 4. Here there are two strains of phages initially, and so acquisition of two spacers is required for bacterial immunity. The corresponding limiting extinction probability appears in Fig. 4. Additionally, at very large β, bacteria with a finite number of spacers, ns, eventually go extinct when the spacer array by chance is entirely occupied by proto-spacers from only one strain of phage. Finally, the extinction probability of phages when phages have only recombination is lower than that when phages have only point mutation, because recombination can change several proto-spacers at once.
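The difference between the two escape routes can be made concrete with a small Python sketch of point mutation and recombination acting on bit-string proto-spacers. The sketch is illustrative only. In particular, it assumes that crossovers occur between whole proto-spacers, with probability `p_c` at each boundary, and the function names are hypothetical.

```python
import random

def mutate(protospacer, mu, rng=random):
    """Flip each bit of a proto-spacer string independently with probability mu."""
    return "".join(
        ("1" if bit == "0" else "0") if rng.random() < mu else bit
        for bit in protospacer
    )

def recombine(parent_a, parent_b, p_c, rng=random):
    """Build a child proto-spacer array from two parent arrays of equal length.

    Copying starts from parent_a; before each later position the template
    switches to the other parent with probability p_c (assumed granularity:
    one crossover site per proto-spacer boundary).
    """
    child, source, other = [], parent_a, parent_b
    for k in range(len(parent_a)):
        if k > 0 and rng.random() < p_c:
            source, other = other, source
        child.append(source[k])
    return child
```

A single crossover replaces every proto-spacer downstream of the crossover point in one step, whereas point mutation must alter the proto-spacers one bit at a time.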
4. Discussion
The cost of adding novel spacers is undetectable in some experiments [33, 34]. In our model, we set the cost of adding new spacers to a small value, consistent with the experimental data. We also found that the results are robust to changes in the cost of adding novel spacers in our model. When we set the cost of adding new spacers to zero or to 0.5, the results, which are shown in Fig. A1 and Fig. A2, are almost identical to those obtained when the cost of adding new spacers is 0.1.
Here we showed that bacteria can coexist with one phage strain because of the balance between acquisition and deletion of spacers. But this balance cannot always be achieved, and in some parameter regimes either the phages or the bacteria go extinct. For example, the study of [27] showed coexistence of phage and bacteria, whereas the study of [35] showed elimination of phage by bacteria for a bacterial population with sufficiently diverse CRISPR spacers. When the bacterial exposure rate varies, the coexistence of bacteria and phages shows an interesting pattern of reentrant phases. Thus, a testable prediction of our model is that when the bacterial exposure rate is low, phages go to extinction; increasing the bacterial exposure rate makes the phage population emerge in the system, but increasing the bacterial exposure rate still further can result in phage extinction. Phages can further reemerge if the bacterial exposure rate is increased more. Finally, phages go extinct when the bacterial exposure rate exceeds a critical threshold. The whole process is depicted in Fig. 1(c). The bacterial exposure rate may change due to variation in phage tail and host receptor affinities, or because of changes in temperature and ion densities, and changes to this exposure rate strongly influence the balance between acquisition and deletion of spacers.
When there is more than a single phage sequence, for example when the phage mutation rate increases, the coexistence of bacteria and phages is stabilized because the bacteria are less able to recognize the diverse phage strains. Here our approach has mimicked controlled environments such as laboratory and factory strains rather than natural environment strains such as those arising in the ocean [36, 37]. In natural environments, the diversity of bacteria and phages is likely rather large. In our current model, there is initially a single bacterial strain and one or two phage strains. As time elapses, the diversity of bacteria increases, but the diversity of phages remains low because the phage mutation rate is relatively small.
The boundaries of the phases that arose from the stochastic co-evolution process were explained by a mean-field analysis. In this way, we theoretically estimated the threshold of the bacterial exposure rate, β, at which the bacteria and phages can coexist and gained insight into why bacteria and phages coexist. The phase diagrams shown here are the results at steady state, when the average total densities of bacteria and phages no longer change with time and the density of each species fluctuates around its average value.
When multiple species of phage were present, recombination allowed the phage to more easily escape extinction by the CRISPR immune system. The phase diagram was shifted such that lower rates of recombination were as effective at immune evasion as were higher rates of mutation. These results support the interpretation of long-term bacterium-phage coevolution experiments, in which recombination among multiple phage strains enables phage persistence against the bacterial CRISPR system [38].
Other properties of the phage-bacteria coevolving system also affect the phase boundaries. For example, when the spacer diversity is sufficient, the phages can be driven to extinction. The boundary for extinction depends on the number of spacers in the CRISPR system, as seen, for example, by comparing the present results to previous results for a larger CRISPR array [19]. The ability of CRISPR to drive phages to extinction has been seen experimentally [35].
We note that high rates of bacterial exposure lead to phage persistence and bacterial extinction. High rates of exposure may result from effective contact between the phage and bacteria. High rates may also result from migration of naive bacteria to regions of high phage concentration and diversity. From the present results, we see that CRISPR will become a less effective protection mechanism at high exposure or migration rates. This result has been observed experimentally, where high bacterial migration rates induced a shift from CRISPR-mediated protection to a surface modification-mediated defense by bacteria [39]. The more specific CRISPR mechanism is effective when bacteria have enough time to incorporate the proto-spacers providing protection [19, 33].
In summary, we predict an interesting phase diagram of phage extinction probability. When the deletion rate of spacers in CRISPR is small, phages go extinct when the value of β is low, but phages can coexist with bacteria when β is even lower. CRISPR changes the evolution of bacteria and phages, accelerating the co-evolution of bacteria and phages. Finally, recombination is a more efficient mechanism for phages to escape the recognition of CRISPR than is point mutation when there are multiple proto-spacers in the phage. Future work may consider biotechnology applications, genome editing approaches, population-level bacterial control, or effects of recombination in the microbiome.
Authors’ contribution
P.H. wrote the code, collected and analyzed the data, and drafted the manuscript. Both P.H. and M.W.D. developed and analyzed the model. M.W.D. helped to draft the manuscript. All authors gave final approval for publication.
Competing interest
We declare we have no competing interests.
Funding
We received no funding for this study.
Acknowledgment
We thank Dr Jeong-Man Park, the Catholic University of Korea, for helpful discussions about the method of this paper.
Appendix
(a) Table of Parameters
The parameters used in the main text are listed in Table A1.
The values of c0 and xBM are estimated from [27]. The cost of spacers is low [17]; here we choose c = 0.1. The values of b and τ are estimated from [41]. The value of β is estimated from [40]. The value of γ is estimated from [20] and [25]. The value of d is estimated from [17]. The value of δ is estimated from [42]. The value of μ is estimated from [43]. The value of ν is estimated from [44] and is of the same order as the value of μ. The interference between proto-spacers and CRISPR spacers is governed by the PAM and the seed regions [45]. The length of the PAM is about 3 bp and the length of the seed region is 7 bp [45], so we set the length of spacers and proto-spacers to 10 bp. In the experiment to which we compare [27], the average number of spacers in CRISPR is small, on average 0.8 spacers per bacterium, so we set the maximum number of spacers to 6. The average number of spacers in our simulation is shown in Fig. A3. There are 27 spacers that account for between 82% and 99% of all spacers sampled on any individual day in the experiment to which we compare [27]. Here we set np to 30. We also tried np = 1. The results are qualitatively the same, as shown in Fig. A5. The volume V is set to mimic the typical volume of a droplet.
(b) Master Equation
The master equation of the stochastic process described in the main text, Eq. A1, is written in terms of the population of bacteria with a given spacer array, the population of infected bacteria invaded by phages with a given proto-spacer array, the population of phages with a given proto-spacer array, and the maximum population of bacteria. Eq. A1 uses several indicator functions. One takes the value 0 when a spacer array recognizes a proto-spacer array and 1 otherwise. The others take the value 1 when their respective conditions hold and 0 otherwise. The Hamming distance between two proto-spacer arrays also enters Eq. A1. The recombination pattern is denoted by a bit string in which each bit is either 0 or 1; one bit value at position k means there is a crossover at position k, and the other means there is no crossover at position k. A further indicator takes the value 1 if a proto-spacer array can be generated by the recombination pattern from two parent arrays and 0 otherwise.
The last term in Eq. A1 follows from two intermediate identities.
(c) Mean Field Equations
The corresponding mean-field equations for the densities of bacteria and phages, shown for illustrative purposes and not used in the simulations reported in the main text, are written in terms of the density of bacteria with a given spacer array, the density of infected bacteria invaded by phages with a given proto-spacer array, and the density of phages with a given proto-spacer array. The functions θ1, θ2, θ3 and θ4 are the same as those in Eq. A1.
(d) Varying the Cost of Adding New Spacers
The phase diagrams of the extinction probability of phages and bacteria do not change when the cost of adding novel spacers varies.
(e) Number of Spacers
The average number of spacers in our simulation does not reach ns in most of the parameter regime. In the range β ∈ [10⁻¹², 10⁻⁸] and μ ∈ [10⁻⁸, 10⁻⁶], the average number of spacers is between 0 and 2, which is in agreement with the experimental data in [27].
(f) Number of Proto-spacers
When np = 1, the pattern of the extinction probability of phages is qualitatively the same as Fig. 1 in the main text.
Author contributions: MWD conceived of the study and wrote the manuscript. PH carried out the research and wrote the manuscript.