Abstract
We estimate the number of spacers in a CRISPR array of a bacterium which maximizes its protection against a viral attack. The optimality follows from a competition between two trends: too few distinct spacers make the bacteria vulnerable to an attack by a virus with mutated corresponding protospacers, while an excessive variety of spacers dilutes the number of the CRISPR complexes armed with the most recent and thus most effective spacers. We first evaluate the optimal number of spacers in a simple scenario of an infection by a single viral species and later consider a more general case of multiple viral species. We find that depending on such parameters as the concentration of CRISPR-CAS interference complexes and its preference to arm with more recently acquired spacers, the rate of viral mutation, and the number of viral species, the predicted optimal array length lies within a range quite reasonable from the viewpoint of recent experiments.
Author summary CRISPR-Cas system is an adaptive immunity defense in bacteria and archaea against viruses. It works by accumulating in bacterial genome an array of spacers, or fragments of virus DNA from previous attacks. By matching spacers to corresponding parts of virus DNA called protospacers, CRISPR-Cas system identifies and destroys intruder DNA. Here we theoretically estimate the number of spacers that maximizes bacterial survival. This optimum emerges from a competition between two trends: More spacers allow a bacterium to hedge against mutations in viral protospacers. However, keeping too many spacers makes the older ones inefficient because of accumulation of mutations in corresponding protospacers in viruses. Thus, fewer CRISPR-Cas molecular machines are left armed with more efficient young spacers. We have shown that a higher efficiency of CRISPR-Cas system allows a bacterium to utilize more spacers, increasing the optimal array length. On contrary, a higher viral mutation rate makes older spacers useless and favors shorter arrays. A higher diversity in viral species reduces the efficiency of CRISPR-Cas but does not necessary lead to longer arrays. We think that our study provides a new viewpoint at a huge variety in the observed array lengths and adds relevance to evolutionary models of bacterial-phage coexistence.
INTRODUCTION
CRISPR-Cas systems provide prokaryotes with adaptive immunity against viruses and plasmids by targeting foreign nucleic acids [1–3]. Multiple CRISPR-Cas systems differing in molecular mechanisms of foreign nucleic acids destruction, cas genes, CRISPR repeats structure, and the lengths, numbers and origin of spacers have been discovered [4, 5]. Yet the current understanding of diversity and function of CRISPR-Cas systems is far from being complete. The origins and, therefore, the targets of most spacers remain unknown [6–8]. The ubiquity of CRISPR-Cas systems in archaea compared to less than 50% presence in bacteria is also not well-explained [4, 9]. Evolutionary reasons for plethora of distinct CRISPR-Cas systems types, often coexisting in the same genome, remain largely unexplored [5, 10, 11]. It is also not clear why CRISPR arrays of some CRISPR-Cas systems contain only one or few spacers, while others have dozens or even hundreds of them [10–15]. It is commonly accepted that the number of spacers in an array is a result of a compromise between better protection offered against abundant, diverse, and faster evolving viruses by a larger spacer repertoire and a higher physiological cost of maintaining a longer array [16]. However, even the largest of the CRISPR systems contribute only 1% to the total size of a prokaryotic genome [11], so it is hard to imagine that adding or removing a few spacers would affect the growth rate in a noticeable way. Indeed, while there are various acknowledged sources of fitness cost for maintaining a CRISPR-Cas system [17, 18], none of them significantly depends on the number of the CRISPR spacers [11, 19, 20].
Virtually all models of prokaryotic and viral coevolution driven by CRISPR immunity include some representation of the number of CRISPR spacers. In some models the array content is limited by a maximal number of spacers (see, for example, [21], where such number is 8), or the number of spacers is determined dynamically as a result of competition between spacer acquisition and loss (such as in [22, 23]). For a given set of environmental conditions, such as the abundance and variety of infecting viruses, the dynamic determination of the optimal number of spacers often manifests itself as dominance of bacterial subpopulation with such arrays. At the same time, the number of spacers plays a major role in determining the complexity of simulation because it is usually required to check all possible pairwise spacer-protospacer matches to determine the immune status of a pair of bacterial and viral strains.
In this study, we propose a somewhat different view at the optimality of the number of spacers in CRISPR array. In particular, we ask a question of a rather idealized nature: What would be the number of spacers that maximizes protection of a given bacterium (rather than, for example, the survival of a bacterial species) from viruses? We show that the number of CRISPR spacers is primarily limited by dilution” of CRISPR complexes carrying the most immune-active recently acquired spacers that target viral protospacers which had the least time to mutate. Our analysis requires a more detailed look at the kinetics of binding of CRISPR effector (a complex of Cas proteins with an individual protective CRISPR RNA, crRNA) to viral targets and distribution of crRNAs with particular spacers among the effectors. Since the origin and utility of the majority of spacers in each array are unknown, we made a simplifying assumption that all spacers in an array come from viral DNA and are used to repel viral infections. Another simplifying assumption we made is that instead of focusing on the actual evolution that occurs in ever-changing natural viral and bacterial communities, we compare the performance of arrays in their steady state for a given set of environmental parameters. We find that there exists a non-trivial optimal number of spacers, which maximizes the bacterial survival chances.
THE MODEL
Basic assumptions
Consider a prokaryotic cell with an active CRISPR-Cas system in a medium where phages capable of infection are present. The cell is attacked by individual viruses in a random and independent way: an attack is either repelled or kills the cell on a much shorter timescale than a typical time interval between subsequent attacks (Fig. 1). We assume that CRISPR-Cas immunity is the only protection available against the infection and each infection, which overcomes the CRISPR defense, results in cell death.
The CRISPR array consists of a number of spacers acquired during previous viral attacks and does not change over the timescale of analysis. Each spacer corresponds to a protospacer in DNA of viruses capable of infection. A match between a spacer and a protospacer is a necessary (but not sufficient) condition for efficient defense from infection. Protospacers may mutate, making now partially complementary spacer ineffective. Thus, it could be beneficial for a cell to pick up more than one spacer from each virus thus reducing the probability of failure of CRISPR-Cas system to recognize viral DNA [16]. This allows the cell to hedge against mutation in single protospacer leading to more reliable recognition of the virus and increased probability of survival. It is intuitively appealing to arm more CRISPR effectors with newer, more recently acquired spacers rather than with the older ones so that the corresponding protospacers would have had less time to mutate. The older the spacer, the higher is the probability that the next encountered virus will have a corresponding protospacer mutated leading to cell death. Indeed, there is a strong preference for spacers acquisition at one end of CRISPR array [24, 25]. As a result, spacers in natural arrays are ordered according to their age, with more recently acquired spacers located closer to promoter from which the array is transcribed. While the abundance of individual crRNAs is a complex function of their processing rate from pre-crRNA CRISPR-array transcripts and stability, promoter-proximal crRNAs are expected to be generally more abundant that promoter-distal ones. This effect is expected from transcription polarity and made more pronounced by the palindromic nature of CRISPR repeats, which should promote transcription termination by RNA polymerase. Thus comes the second element of selective pressure over the number of CRISPR spacers: A too long array will “dilute” the concentrations of CRISPR effector complexes armed with most recently acquired and thus most efficient spacers, replacing them with older spacers whose com plimentary protospacers had a longer time to accumulate mutations. For simplicity, we assume that a single mismatch between a spacer and its protospacer makes the spacer ineffective [3]. While the reality is more complex and certain mutations in a protospacer do not preclude its recognition by the spacer [26], mutations in protospacer adjacent motif [27, 28] or seed region [26]) indeed abolish CRISPR interference and it is mutations of this kind that we consider in our work.
The optimal number of spacers may be thought of as emerging from competition between the opposing “more reliable recognition” and “dilution” trends. We ignore the fitness cost of maintaining a CRISPR array, often considered to be consisting of two parts: spacer-number-independent and spacer-number-dependent [21, 22]. While duplication of CRISPR-Cas system DNA must have its cost, yet every new spacer constitutes a very small part of CRISPR-Cas DNA (which itself is a small part of cellular genome) and such cost is ignored.
To summarize, we try to determine the optimal number of spacers in a CRISPR system illustrated in Fig. 1 under the following simplifying assumptions:
The cutting of viral DNA is possible when there is a perfect match between the spacer and protospacer, and a single mismatch makes the spacer-protospacer pair useless for cell protection/CRISPR interference [26–28].
Probability for a CRISPR effector complex to contain crRNA with a particular spacer decreases exponentially with the age of the spacer.
The total number of CRISPR effector complexes in the cell is constant. While there is some evidence that cas genes expression might be regulated in vivo [29], the assumption that at given conditions expression levels are constant seems to be reasonable.
There is only a single copy of viral DNA inside the cell upon infection, i.e., the multiplicity of infections is low.
CRISPR arrays have a constant size and composition that do not change on the timescale of viral infection, i.e., there is no CRISPR adaptation.
A single encounter between CRISPR-effector and virus resolves on a shorter timescale than the time between subsequent encounters.
We do not take into account any fitness costs of maintaining an array of a given length [19, 20].
Probability of interference
Assume that a cell carries an array consisting of S spacers which we number in the direction of age such that the most recently acquired spacer is assigned number 1. The cell is being attacked by a virus and CRISPR defense comes into play. The probability Bi for CRISPR effector charged with crRNA with spacer i to bind to the corresponding protospacer (or the fractional occupancy of the protospacer) is controlled by competition between binding and dissociation events which are described by the first and second terms in the right-hand side of the following kinetic equation,
Here k+ and k− are the association and dissociation rate constants for a matching spacer-protospacer pair and Ci is the copy number (uniquely related to its concentration since the volume of the cell is constant) of CRISPR effectors carrying the ith spacer crRNA. The steady state binding probability (or the fraction of time the corresponding protospacer is recognized by CRISPR effector) is
Now we compute how C CRISPR effectors present in the cell pick up crRNAs with particular spacers. We have postulated that the number of effector complexes that acquired spacer i decreases exponentially with the age of i. That is, each next spacer is δ times less likely to be present in CRISPR effector complex than its younger neighbor. We will further refer to δ as “crRNA decay coefficient” since we assume that the exponential decrease in the number of crRNA molecules with a defined spacer causes the corresponding decrease in the number of CRISPR effector complexes with this crRNA [30]. Hence the number of effector complexes Ci with crRNA with spacer i is
We determine Ci from the condition that the total number of CRISPR effector complexes is C by summing the corresponding geometric progression
Substituting (4) into (2) produces a complete expression for the binding probability between the ith spacer-protospacer pair,
Here β ≡ Ck+/(k−) is the dimensionless coefficient which determines the “binding efficiency” of CRISPR effector. The larger β, the larger fraction of time the effector spends bound to matching protospacer. The biological meaning of β becomes clear if one considers a CRISPR array consisting of a single spacer. Then the binding probability becomes the function of β only,
In such a case, the binding probability depends on how β compares to 1: If β ≫ 1, the binding probability saturates to its maximum equal to 1, while if β ≪ 1, the binding probability becomes proportional to β. For β = 1 the binding probability is precisely 1/2.
Assume that binding of every CRISPR effector to its matching protospacer proceeds independently of binding by other effectors to theirs, i.e., protospacers are well-separated in viral genomes. The total rate of interference is then proportional to the sum of binding probabilities of matching spacer-protospacer pairs and the probability of survival of viral DNA P(t) decays with a simple exponential kinetics,
Here a is the viral DNA degradation rate constant, which we consider to be a fixed property of a CRISPR-effector universal for all spacer-protospacer pairs. Hence the probability of successful interference where τ is the effective time of interference, roughly equal to the time of the duplication of viral DNA. In other words, for successful termination of infection, the CRISPR effector complexes have to destroy the viral DNA before or during the first round of its duplication. Destruction of individual viral genomes at later times can not prevent the runaway viral DNA replication and productive infection. Introducing a dimensionless parameter χ ≡ τα which characterizes the interference efficiency, turns Eqs. (8 and 5) into
Constants β and χ have simple interpretations in terms of familiar Michaelis kinetics: The process of interference can be viewed as a transformation of viral DNA catalyzed by CRISPR effectors. The binding efficiency β corresponds to the inverse Michaelis constant per substrate concentration, S/KM in standard notations, (6,9). The interference efficiency χ corresponds to the maximal reaction rate in Michaelis kinetics, i.e. the rate at which viral DNA bound by CRISPR effector is degraded.
Survival probability
Assume that a virus infecting a cell at a given moment is drawn from a big pool with a probability of infection proportional to the concentration of its type v and that infections by different viruses are independent of each other. Then the probability Ak to experience k infections over time t is given by a Poisson distribution with the average number of infections rNt scaling linearly with time. where r is a proportionality coefficient considered to be the same for all viruses and N is concentration of the viral particles. To survive during a given time, each cell needs to repel all infections happening within this time, hence the probability of survival till time t is
Here I, defined in Eq. (9), is the probability to survive a single infection, i.e., the probability of successful CRISPR interference. From our assumption that viruses infect independently of each other it follows that the probability E(t) for a cell to survive in the medium with several different viruses with concentrations vj is given by the product of survival probability determined for each virus separately,
This is sketched in Fig. 2. The probability of CRISPR interference with a single infection Ij is defined as in (9) with the sum running over all spacers taken from the jth virus. In the following we use E(t) as the measure of overall CRISPR system performance.
Single viral species
To illustrate and further develop the general statement (12), consider a scenario of a single viral species infecting a cell that has a CRISPR array with just two spacers. The immunity depends on the mutation status of corresponding protospacers in viral population. In this model, the mutation status of the spacer will be defined as the fraction of mutated protospacers in the viral population. We denote by m1 and m2 the probabilities for the first and second protospacers to remain mutation-free and thus recognizable by CRISPR effectors. If the total concentration of viral particles is N the concentration of the wild type variant without any mutations is m1m2N, the concentration of the variant with mutation in the second protospacer is m1(1 – m2)N, the concentration of the variant with mutation in the first protospacer is m2(1 – m1)N, and the concentration of the variant with mutations in both protospacers, i.e., an escape mutant not subject to CRISPR interference, is (1 – m1)(1 – m2)N. From Eqs. (9 and 12) and our assumption that a mutation in protospacer renders the corresponding spacer completely inefficient, it follows that the survival probability in such case is
The last term in the exponent corresponds to the probability to experience no infection by viruses with both mutated protospacers (in which case I4 = 0 since such an infection would result in cell death). Transforming the expression in the exponent, we obtain
This expression has a simple probabilistic interpretation: The ith term in curly brackets describes the probability of failure of CRISPR effector complexes armed with the ith spacer crRNA. The product of such terms describes the probability of failure of all CRISPR effectors and thus the death of the cell. The expression (14) is the probability for the Poisson process of failures of CRISPR system to have zero counts or no failures at all, which translates into survival of the cell. Mutual independence of encounters with different mutation variants of the virus simplifies the survival probability of the cell to the probability of not to be affected by the average” encounter repeated rNt times. This simple interpretation allows us to generalize (14) to cases of arrays containing more than 2 spacers, replacing the upper limit of the product by an actual number of CRISPR spacers S,
To reduce the number of independent parameters in Eq. (15) and in the following expressions for the survival probability, we estimate mi. We assume that spacers were acquired to the array in a periodic fashion, that is, the time intervals tins between subsequent acquisition of spacers were the same. The probability for a protospacer to remain mutation-free decreases exponentially with time, and the “age” of the ith protospacer is proportional to i. Hence, the probability of a perfect match for the ith spacer-protospacer pair at the middle of the time interval between spacer acquisitions can be approximated as μi–1/2. Here 0 < μ < 1 is the probability for a protospacer in viral DNA not to undergo any mutations during tins and –1/2 in the exponent stands for assessing the cell survival probability in tins/2 time units after the acquisition of the last spacer, i.e. in the middle of the interval between spacer acquisitions. The parameter μ depends on genetic and environmental factors such as the rate of mutations in viral DNA, the size of the viral population, the size of protospacer, and the average rate at which cells acquire new spacers. Eq. (16), together with the binding probability (5), completely define the survival probability of a cell with a given number of spacers S as a function of dimensionless parameters μ, χ, δ and β. Note that the optimal number of spacers does not depend on the total time of observation t that was used for cell survival evaluation: In Eq. (16) the position of the maximum of E(t) is determined by the maximum of the product in the exponent and is independent of rNt.
A typical dependence of survival probability E(t) on the crRNA decay coefficient δ and the number of spacers S is shown in Fig. 3. We inferred the interference probability I1 ≈ 0.7 of a single spacer array from the experimental data [31] and set the binding efficiency β = 1 and the interference efficiency χ = 2 to reproduce the measured single-spacer interference probability. The probability for a protospacer not to mutate over the typical period between spacer acquisition was chosen to be μ = 0.9. The typical number of infections over the time of observation was rNt = 5. It follows from Fig. 3 that the survival is maximized for δ ≈ 0.7 and S = 6. In panel B the dependence of E(t) vs. S is shown for several values of d. Curiously, for low d, the survival E(t) does not noticeably decrease for large S. It happens because of the exponential suppression in frequencies of crRNA with older spacers in effector complexes: no matter how long the array is, only crRNA with the first few spacers are mainly used by effectors. Thus, an “automatic” cutoff is implemented.
Naturally, the optimal number of spacers depends on such parameters as protospacer mutation rate 1 – μ and the efficiency of effector binding to its targets β: In Fig. 4 we show how the plot of the typical case” shown above in Fig. 3 is affected by changes in these system parameters. An increase in the mutation rate shifts the optimum towards fewer spacers or stronger reliance of the CRISPR-Cas system on crRNA with the first spacer. In the extreme case this can lead to the optimal array containing one spacer only (Fig. 4, top-left corner). This corresponds to the case when there is a very high chance that older spacers have mutated, so the benefit from using the second spacer cannot overcome the decrease in the number of effector complexes loaded with crRNA containing the first, most recently acquired spacer. In contrast, an increase of CRISPR interference efficiency shifts the optimum towards more CRISPR spacers and more equal contribution of spacers of different age (Fig. 4, bottom-right corner). An increase in the binding efficiency leads to a larger fraction of time the effector spends bound to the protospacer ultimately leading to binding saturation. In this case the sharing of CRISPR effectors between crRNAs with different spacers is beneficial as it allows the effectors to reduce competition for the same protospacer. An increase in the CRISPR interference efficiency χ also leads to an increase in survival probability (data not shown).
For a more detailed study of the optimal number of spacers, we conducted the following calculations: for each set of array-independent” parameters μ, β, χ we analyzed the CRISPR efficiency in the whole range of the number of spacers S and crRNA decay coefficients δ. The number of spacers Sopt and crRNA decay coefficient δopt that maximized survival probability, as well as the maximal survival probability itself Emax(t) are plotted in Fig. 5. As discussed above, higher viral mutation rates lead to lower survival probability and fewer spacers (Fig. 5A). For very high mu tation probability (above 0.7) the CRISPR interference efficiency approaches zero for all values of other parameters. The mutation rate of viruses caps the CRISPR efficiency as the probability to survive the infection is constrained by the probability Imax that at least one of viral protospacers has not mutated.
On the other hand, a high binding β or interference efficiency χ lead to arrays with more spacers and higher survival probability (Fig. 5B, C). In this case, more CRISPR effectors can complex with crRNAs with older spacers without interfering with the binding to crRNAs with younger spacers due to the system saturation. Arrays with more spacers both increase the viral DNA degradation rate and, more importantly, reduce the chance that the cell becomes unprotected if some of protospacers mutate. This suggests a correlation between the optimal number of spacers Sopt and the maximal protective performance of CRISPR-Cas system Emax(t). Comparing the optimal number of spacers and maximal survival probability heat-maps shown in Fig. 6, one sees that the parameters that produce high survival probability indeed correspond to arrays with relatively many (more than 10 spacers) spacers.
Figs. 5 and 6 lead to a conclusion that there is a definite set of parameters for which CRISPR- Cas systems are efficient. The virus mutation probability should remain low on the timescale of spacer acquisition, while the binding of effector complexes to target protospacers and the rate of degradation of viral DNA should be high. This set of parameters favors arrays with more spacers. It implies a simple rule: the array can contain many spacers and be efficient or contain few spacers and be inefficient. In reality, the array composition could change on the timescale of viral infections, which may increase CRISPR interference efficiency. This, however, goes beyond the important assumption of our model of the static nature of the array and thus is beyond our present consideration. On the other hand, it shreds the light on the adaptive immunity as the only efficient way of CRISPR-related defense in the viral environments with fast mutation rates.
Multiple viral species
Consider now a more realistic scenario of a cell confronting several distinct viral species. Using the same logic as in the section above and, specifically considering infections by different viruses being independent of each other, we conclude that the survival probability is given by the Eq. (12), where the index of the product j enumerates all viral species, including their mutation variants, present in the system. The interference term associated with a viral species j not targeted by any spacer present in a given array is zero, Ij = 0. The corresponding term in the survival probability exp(‒rNtVj) describes the probability for a cell not to encounter such a virus till time t.
Similarly to the case of single viral species, we account for mutation variants of each virus and reduce (12) to the product running over only distinct viral species. In order to simplify further analysis we denote by vi the fraction of the ith virus in the total number of viruses N so that vi = Ni/N, where Ni is the number of viral particles of species i. This results in the following expression for survival probability of a cell with a given combination of spacers,
Here the sum over j counts all ν viral species while the product over i enumerates all spacers {Sj} taken from the jth virus. As in (15), we approximate mi by μi–1/2 assuming again that spacers are acquired in a periodic fashion, with equal times between acquisitions.
The equation (18) describes survival probability of a cell with a given CRISPR array characterized by sets of spacers {Sj} taken from viral species j. In order to evaluate the overall performance of a CRISPR array with S spacers, we need to enumerate survival probabilities for all combination of spacers in such an array. To do so, we assume that the probability to acquire a spacer from a given viral species is proportional to the fraction of such species in the total viral pool. Hence the probability of an array to have certain combination of spacers is where vk is the relative concentration of viral species from which the spacer k has been acquired. For example, an array of two spacers (a, b) in a system populated by two viral species 1 and 2 with relative concentrations v1 and v2 can be in any of the following four forms with corresponding probabilities: , and .
The average survival probability of a cell in a multiviral medium is a sum of survival probabilities corresponding to each combination of spacers Ec, weighted by the probability to acquire such a combination Pc, and the summation runs over all combinations of spacers.
A typical plot of E(t) is presented in Fig. 7. In this calculation we considered two species of viruses with the same population size v1 = v2 = 0.5. The values of other parameters were the same as in Fig. 3: The binding efficiency β =1, the interference efficiency χ = 2, the probability for a protospacer not to mutate over the typical period between spacer acquisition μ = 0.9, and the typical virus encounter number rNt = 5. Comparing to the single-virus case in Fig. 3, the total number of viral particles is the same, but the virus pool is now split between two species.
In general, the shape of the survival probability E(t) profile is similar to the single-virus case and E(t) reaches its maximum for certain δ and S. However, comparing the optimal number of spacers, crRNA decay coefficient, and survival probabilities between the single- and two-virus cases (Figs. 3A and 7), one sees that in the two-virus case the maximum is generally shifted towards arrays with more spacers, and E(t) is lower. For a given set of parameters the addition of the second virus does not significantly shift the optimal S and δ but drops the survival probability dramatically. If the virus mutation rate is lower and the CRISPR interference efficiency is higher, the presence of an additional viral species will affect the optimal S and δ more strongly. However, relating the model parameters to the experimental results [31], it is unlikely that the CRISPR efficiency in the multivirus environment can be significantly higher in vivo than the numbers shown in Fig. 7.
When the number of virus species in the total virus pool increases even without a change in the total viral particles concentration, the survival probability approaches zero (Fig. 8A). This occurs because the efficient number of spacers is limited by the virus mutation rate and the number of effector complexes present in the cell (encoded in the coefficient β). In other words, further increase in the number of spacers does not lead to any increase in protective function of CRISPR- Cas. Since an array of an effectively limited number of spacers has to contain spacers from more virus species, fewer spacers match each virus and the survival probability decreases.
Another observation is obtained considering the two-virus case and changing the ratio of those viruses in the pool (Fig. 8B). As expected, the survival probability reaches a maximum when the fraction of one virus approaches zero (which correspond to the single-virus case) and goes to a minimum when the two viruses are equally abundant.
This brings us to the conclusion that survival probability of a cell dramatically depends on the diversity of the viral pool.
DISCUSSIONS AND CONCLUSIONS
The function of CRISPR-Cas as prokaryotic adaptive immune system has been extensively studied from the point of view of molecular mechanisms. Its ecological role and its contribution to the “arms race” between prokaryotes and their viruses have been analyzed in many evolutionary dynamics models and found to be very complex and often unpredictable. In this work, we qualitatively explored the forces affecting the number of spacers in a CRISPR array. We found that more spacers in a CRISPR array targeting a virus decrease the chances of the virus to escape detection through simultaneous mutation in all targeted protospacers. Also, more spacers lead to more effective use of CRISPR effectors, distributing them between a larger number of target protospacers, which results in higher probability of viral DNA destruction. However, at the same time, more diverse crRNA repertoire results in fewer effector complexes charged with crRNAs containing recently acquired spacers that target protospacers least likely to mutate. The interplay of these forces leads to the optimum in the number of spacers per array, determined by the properties of the CRISPR system and viral environment in the following way: A better binding of the CRISPRS effectors to their targets and faster rate of target DNA degradation allow a bacterium to maintain more spacers in the array and increase its survival probability. Also, less frequent mutations in viral protospacers create an opportunity for hedging against those mutations by keeping more of previously acquired spacers. In contrast, a less efficient kinetics of binding and viral DNA cutting and faster-mutating viruses make arrays with fewer spacers more advantageous. A few comments on applicability of our results and biological insights that can follow from them are in order:
Effects of dynamics and environment
Our results were derived explicitly assuming a steady state of the CRISPR-virus dynamics. However, in previous research, both modeling and experimental, it was shown that CRISPR systems are far from being stable, undergoing periodic and irregular variations that play an important role in their function [21, 32]. While in our analysis we assumed that the viral environment is constant (except for appearance of mutant protospacers), the actual viral dynamics may affect the optimal lengths of CRISPR arrays. Changes in the environment could explain an increased efficiency of the shorter array in experimental condition comparing to the wild strain [33]. Moreover, the primed acquisition of spacers could happen on the timescale of a virus attack [34], which would invalidate our basic assumption of separation of timescales of viral attacks and spacer acquisition. These factors, not analyzed in our work, could affect the optimal number of CRISPR spacers and are subject to further analysis.
Comparison with existing results
Our results generally agree with the main findings of models existing in the field: We confirm that a higher diversity of viruses in the environment results in a dominance of viruses over the CRISPR system [22, 35]. This effect could be achieved by either a high number of virus species in the environment or a high mutation rate of viruses belonging to the single species (often associated with large viral population). However, here we have also shown that a diversity of virus species leads to arrays with more spacers while a higher viral mutation rate leads to arrays with fewer spacers. This agrees with a proposed hypothesis that a lower viral mutation rate leads to arrays with on average more spacers in thermophilic bacteria [35]. Another important note on comparing our model with existing ones is related to the definition of probability of CRISPR immunity failure. Some of the models used a binary approach to immunity failure [21]. Either the infected cell kills the virus or the virus kills the cell and reproduces normally. We define the CRISPR failure probability 1 – I as the probability of viral DNA not getting cut by CRISPR effectors/executors during viral DNA duplication cycle. Distinguishing between these two approaches is important as it affects the interpretation of parameters obtained from experiments. For example, a CRISPR-Cas system can remain active in doomed or dead cells, resulting in lower viral burst size and fewer secondary infections [31]. Our analysis based on [31] resulted in the estimate of the CRISPR failure probability around 30% compared to 10−5 in [21].
Importance of hairpins
One of important observations is that the equipartition of crRNA between CRISPR effector complexes is not optimal and a decrease of the fraction of older crRNA bound to effectors increases the overall efficiency of the immune response. While there is a limited pool of effectors, they serve better when binding to crRNAs with most recently acquired spacers. Since the probability that a spacer no longer matches the protospacer increases with time, Cas effectors should either have a higher affinity towards crRNA from younger spacers (which is impossible to accomplish) or crRNA containing more recent spacers should be more abundant. This latter may be implemented naturally owing to formation of hairpin by CRISPR repeats in the primary array transcripts [36, 37]. It is well known that hairpins have a potential to pause or terminate transcription elongation [38, 39]. The longer the array is, the more hairpins need to be transcribed and the higher the chance is that transcription would be terminated before the RNA polymerase reaches the end of the array. This could result in more abundant shorter pre-crRNAs that include only the younger spacers.
Fitness cost of CRISPR system
While in our study we ignored the fitness costs of an active CRISPR system, we find it important to discuss it as these were studied in various experimental works and included in some models [40]. It has been shown in a number of publications that the activity of CRISPR systems is under strong evolutional pressure. There are various factors that can contribute to the cost of CRISPR including genomic burden [41], the cost of maintenance of cas genes [19], self-immunity [42] and blockage of beneficial horizontal gene transfer (HGT) [17]. However genomic burden seems not to be significant in most cases as even the largest of the CRISPR systems contribute only 1% to the total size of a prokaryotic genome [11]. In the case of self-immunity, it seems to be related to the very process of acquisition of new spacers, thus, self-immunity only indirectly affects the length of the spacer array [43–45]. For the cost of gene maintenance [19] and blockage of HGT [20], it has been shown that an increase in the number of spacers also does not have any significant fitness cost. Thus, in this work, we considered that the fitness cost of CRISPR system did not affect the optimal number of spacers in CRISPR array. In other words, there is no additional fixed cost of the spacer apart from the one arising from Cas effector dilution. That resulted in separation of the number of spacers question from the overall fitness. The factors described in this work affect the optimum length of the CRISPR array and the total fitness benefit of CRISPR system. And this total fitness benefit now can be compared to the fitness cost of CRISPR-Cas system maintenance, that will give the answer whether the CRISPR system will be effective or tends to be knocked out [46].
CONCLUSIONS
We theoretically predict the optimal number of spacers in a CRISPR array which falls into reasonable range from the viewpoint of current experimental data and show that it depends on the interference efficiency of CRISPR effector, crRNA spacer-protospacer binding efficiency, and virus mutation rate.
Good (from the point of view of the cell) conditions, such as high interference and binding efficiencies and slow mutation of viral protospacers favor arrays with more spacers. Conversely, less favorable conditions shift the optimum to arrays with fewer spacers.
Arrays containing only a few (less than 10) spacers offer significantly less protection from viral attacks.
The majority of optimal array configurations have a non-uniform distribution of unique crRNAs among CRISPR effector complexes with a preference for crRNAs with more recently acquired spacers.
Fighting against multiple viral species shifts the optimum towards arrays with more spacers and dramatically decreases the maximum efficiency of the CRISPR system.
ACKNOWLEDGMENTS
Y.I. was supported by FONDECYT (Chile) grant 1151524 and by SkolTech during his stay there.