Abstract
Marine viruses are critical drivers of ocean biogeochemistry and their abundances vary spatiotemporally in the global oceans, with upper estimates exceeding 108 per ml. Over many years, a consensus has emerged that virus abundances are typically 10-fold higher than prokaryote abundances. The use of a fixed-ratio suggests that the relationship between virus and prokaryote abundances is both predictable and linear. However, the true explanatory power of a linear relationship and its robustness across diverse ocean environments is unclear. Here, we compile 5671 prokaryote and virus abundance estimates from 25 distinct marine surveys to characterize the relationship between virus and prokaryote abundances. We find that the median virus-to-prokaryote ratio (VPR) is 10:1 and 16:1 in the near-and sub-surface oceans, respectively. Nonetheless, we observe substantial variation in the VPR and find either no or limited explanatory power using fixed-ratio models. Instead, virus abundances are better described as nonlinear, power-law functions of prokaryote abundances - particularly when considering relationships within distinct marine surveys. Estimated power-laws have scaling exponents that are typically less than 1, signifying that the VPR decreases with prokaryote density, rather than remaining fixed. The emergence of power-law scaling presents a challenge for mechanistic models seeking to understand the ecological causes and consequences of marine virus-microbe interactions. Such power-law scaling also implies that efforts to average viral effects on microbial mortality and biogeochemical cycles using “representative” abundances or abundance-ratios need to be refined if they are to be utilized to make quantitative predictions at regional or global ocean scales.
Viruses of microbes have been linked to central processes across the global oceans, including biogeochemical cycling [9, 29, 39, 43, 47, 53] and the maintenance and generation of microbial diversity [3, 36, 39, 47, 52]. Virus propagation requires contacting and infecting cells. The per cell rate at which microbial cells – including bacteria, archaea, and microeukaryotes – are contacted by viruses is assumed to be proportional to the product of virus and microbial abundances [33]. If virus and microbe abundances were related in a predictable way it would be possible to infer the rate of contact, and potentially the relative importance of virus-induced cell lysis, from estimates of microbial abundance alone.
Virus ecology underwent a transformation in the late 1980s with the recognition that virus abundances, as estimated using culture-independent methods, were orders of magnitude higher than estimates via culture-based methods [4]. Soon thereafter, researchers began to report the “virus to bacterium ratio” (VBR) as a statistical proxy for the strength of the relationship between viruses and their potential hosts in both freshwater and marine systems [31]. This ratio is more appropriately termed the “virus-to-prokaryote ratio” (VPR) – a convention which we use here [1].
Observations accumulating over the past 25 years have observed wide variation in VPR, yet there is an emergent census that a suitable first-approximation is that VPR is 10 (see Table I). This ratio also reflects a consensus that typical prokaryote abundances are approximately 106 per ml and typical virus abundances are approximately 107 per ml [50, 57]. Yet, the use of a fixed ratio carries with it another assumption: that of linearity, i.e., if prokaryote abundance were to double, then viruses are expected to double as well. An alternative is that the relationship between virus and prokaryote abundance is better described in terms of a nonlinear relationship, e.g., a power-law.
In practice, efforts to predict the regional or global-scale effects of viruses on marine microbial mortality, turnover and even biogeochemical cycles, depend critically on the predictability of the relative density of viruses and microbial cells. Here, we directly query the nature of the relationship between viruses and prokaryotes via a large-scale compilation and re-analysis of abundance data across marine environments.
I. RESULTS
A. VPR exhibits substantial variation in the global oceans
In the compiled marine survey data (see Figure 1, Table S1 and the Materials and Methods), 95% of prokaryote abundances range from 5.0 ×103 to 4.1 × 106 per ml and 95% of virus abundances range from roughly 3.7 × 105 to 6.4 × 107 per ml (Figure 2A). Both prokaryote and virus concentrations generally decrease with depth as reported previously (e.g., see [18]). The median VPR for the near-surface samples (≤ 100m) is 10.5 and the median VPR for the sub-surface samples (> 100m) is 16.0. In that sense, the consensus 10:1 ratio does accurately represent the median VPR for the surface data. We also observe substantial variation in VPR, as has been noted in prior surveys and reviews (see Table 1). Figure 2B shows that 95% of the variation in VPR in the near-surface ocean lies between 1.4 and 160 and between 3.9 and 74 in the sub-surface ocean. For the near-surface ocean, 44% of the VPR values are between 5 and 15, 16% are less than 5 and 40% exceed 15. This wide distribution, both near-and sub-surface demonstrates potential limitations in utilizing the 10:1 VPR, or any fixed ratio, as the basis for a predictive model of virus abundance derived from estimates of prokaryote abundance.
B. Virus abundance does not vary linearly with prokaryote abundance
Figure 3 shows two alternative, predictive models of the relationship between logarithmically scaled virus and prokaryote abundances for water column samples. The models correspond to a fixed-ratio model and a power-law model. To clarify the interpretation of fitting in log-log space consider a fixed-ratio model with a 12:1 ratio between virus and prokaryote abundance, V = 12 ≤ B. Then, in log-log space the relationship is which we interpret as a line with y-intercept of log10 12 = 1.08 and a slope (change in log10V for a 1-unit change in log10 B) of 1. By the same logic, any fixed-ratio model will result in a line with slope 1 in the log-log plot and the y-intercept will vary logarithmically with VPR. The alternative predictive model is that of a power-law: V = cBα1. In log-log space, the relationship is: The slope, α1, of a fitted line on log-transformed data denotes the power-law exponent that best describes the relationship between the variables. The intercept, α0, of a fitted line on log-transformed data denotes the logarithmically transformed pre factor.
The 10:1 line has a residual squared error of -16% and -25% in the surface and deep samples, respectively (Table 2). In both cases, this result means that a 10:1 line explains less of the variation in virus abundance compared to a model in which virus abundance is predicted by its mean value across the data. In order to evaluate the generality of this result, we considered an ensemble of fixed-ratio models each with a different VPR. In the near-surface samples, we find that all fixed-ratio models explain less of the variation (i.,e., have negative values of R2) than does a “model” in which virus abundance is predicted to be the global mean in the dataset (Figure S1). This reflects the failure of constant ratio (i.e., linear) models to capture the cluster of high VPRs at low prokaryote density apparent in the density contours of Figure 2A and the shoulder of elevated high VPR frequency in Figure 2B. The largest contributor to this cluster of points is the Arctic SBI study (see Figure S1). Whereas, in the sub-surface samples, fixed-ratio models in which VPR varies between 12 and 22 do have positive explanatory power, but all perform worse than does the power-law model (Figure S1). In contrast, the best fitting power-law model explains 15% and 64% of the variation in the data, for near-and sub-surface samples respectively (Table 2). The best-fit power-law scaling exponent is 0.42 with 95% confidence intervals (CIs) of (0.39, 0.46) for near-surface samples and 0.53 with 95% CIs of (0.52, 0.55) for sub-surface samples. In other words, doubling prokaryote abundance along either regression line is not expected to lead to a doubling in virus abundance, but rather a 20.42 = 1.33 and 20.53 = 1.44 fold increase, respectively. The power-law model is an improvement over the fixed ratio model in both cases, even when accounting for the increase in parameters (Table 2). In the near-surface, refitting surface data without outliers improves explanatory power to approximately R2 = 0.3 in contrast to an R2 = 0.65 for the sub-surface (see Methods and Figure S2). In summary, the predictive value of a power-law model is much stronger in the sub-surface than in the near-surface, where confidence in the interpretation of power-law exponents is limited.
C. Study-to-study measurement variation is unlikely to explain the intrinsic variability of virus abundances in the surface ocean
Next, we explored the possibility that variation in methodologies affected the baseline offset of virus abundance measurements and thereby decreased the explanatory power of predicting virus abundances based on prokaryote abundances. That is, if V * is the true and unknown abundance of viruses, then it is possible that two studies would estimate and = V*(1 + ε2) where |ϵ1| and |ϵ2| denote the relative magnitude of study-specific shifts. We constrain the relative variation in measurement, such that the measurement uncertainty is 50% or less (see Materials and Methods). The constrained regression model improves the explanatory power of the model (see Table 2), but in doing so, the model forces 18 of the 25 studies to the maximum level of measurement variation permitted (Figure S3). We do not expect that differences in measurement protocols to explain nearly 2 orders of magnitude variation in estimating virus abundance given the same true virus abundance in a sample. Note that when sub-surface samples were analyzed through the constrained power-law model, there was only a marginal increase of 2% in R2 and, moreover, 9 of the 12 studies were fit given the maximum level of measurement variation permitted (Figure S3)[h!]. The constrained intercept model results suggest that the observed variation in virus abundance in the surface oceans is not well explained strictly by variation in measurement protocol between studies.
D. VPR decreases with increasing prokaryote abundance – a hallmark of power-law relationships
We next evaluate an ensemble of power-law models: Vi = ciNαi where the index i denotes the use of distinct intercepts and power-law exponents for each survey. The interpretation of this model is that the non-linear nature of the virus to prokaryote relationship may differ in distinct oceanic realms or due to underlying differences in sites or systems, rather than due to measurement differences. Figure 4 shows the results of fitting using the study-specific power-law model in the surface ocean samples. Study-specific power-law fits are significant in 18 of 25 cases in the surface ocean. The median power-law exponent for studies in the surface ocean is 0.50. Furthermore, of those significant power-law fits, the 95% distribution of the power-law exponent excludes a slope of one and is entirely less than one in 11 of 18 cases (see Figure 5). This model in which the power-law exponent varies with study is a significant improvement in terms of R2 (Table 2). For sub-surface samples, study-specific power-law fits are significant in 10 of 12 cases in the sub-surface (Figure S4). The median power-law exponent for studies in the sub-surface is 0.67. Of those significant power-law fits, the 95% distribution of the power-law exponent is entirely less than one in 6 of 10 cases (see Figure S5). A power-law exponent of less than one means that virus abundance increases less than proportionately given increases in prokaryote abundance. This study-specific analysis extends the findings that nonlinear, rather than linear, models are more suitable to describe the relationship between virus and prokaryote abundances. We find that the dominant trend in both near-surface and sub-surface samples is that VPR decreases as prokaryote abundance increases. The increased explanatory power by study is stronger for near-surface than for sub-surface samples. This increase in R2 comes with a caveat: study-specific models do not enable a priori predictions of virus abundance given a new environment or sample site, without further effort to disentangle biotic and abiotic factors underlying the different scaling relationships.
II. DISCUSSION
Viruses are increasingly considered as part of efforts to understand the factors controlling marine microbial mortality, productivity, and biogeochemical cycles [8, 9, 19, 39, 45, 55]. Quantitative estimates of these viral-induced effects can be measured directly, but are often inferred indirectly, using the relative abundance of viruses to prokaryotes. To do so, there is a consensus that assuming the virus-to-prokaryote ratio is 10 in the global oceans - despite observed variation - is a reasonable starting point. Here, we have re-analyzed the relationship of virus to prokaryote abundance in 25 marine survey datasets. We find that 95% of the variation in VPR ranges from 1.4 to 160 in the near-surface ocean and from 3.9 to 74 in the sub-surface. Although the 10:1 ratio accurately describes the median of the VPR in the surface ocean, the broad distribution of VPR implies that prokaryote abundance is a poor quantitative predictor of virus abundance. Moreover, increases in prokaryote abundance do not lead to proportionate increases in virus abundance. Instead, we propose that the virus to prokaryote abundance relationship is nonlinear, and that the degree of nonlinearity – as quantified via a power-law exponent – is typically less than 1. This sublinear relationship can be interpreted to mean that VPR decreases as an increasing function of prokaryote abundance, and generalizes earlier observations [57].
Power-law relationships between virus and prokaryote abundance emerge from complex feedbacks involving both exogeneous and endogenous factors. The question of exogenous factors could be addressed, in part, by examining environmental covariates at survey sites. For example, if prokaryote and virus abundances varied systematically with another environmental co-factor during a transect, then this would potentially influence the inferred relationship between virus and prokaryote abundances. In that same way, variation in environmental correlates, including temperature and incident ration, may directly modify virus life history traits [20, 44]. Likewise, some of the marine survey datasets examined here constitute repeated measurements at the same location (e.g., at the Bermuda Atlantic Time-series Study (BATS)). Time-varying environmental factors could influence the relative abundance of microbes and viruses. It is also interesting to note that viruses-induced mortality is considered to be more important at eutrophic sites [57], where microbial abundance is higher - yet the observed decline in VPR with prokaryote abundance would suggest the opposite.
It could also be the case that variation in endogenous factors, e.g., the life history traits of viruses and microbes, including variability in the structure of cross-infection [54], could drive changes in emergent relationships between virus and prokaryote abundances. Virus-microbe interactions can lead to intrinsic oscillatory dynamics. Indeed, previous observations of a declining relationship between VPR and prokaryote abundance have been attributed to changing ratios across phyto-plankton bloom events, including possible virus-induced termination of blooms [57]. Similar arguments were proposed in the analysis of tidal sediments [13]. Alternatively, observations of declining VPR with prokaryote density have been attributed to variation in underlying diversity [6]. Whatsoever the mechanism(s), it is striking that virus abundances in some surveys can be strongly predicted via alternative power-law functions of prokaryote abundances. Mechanistic models are needed to further elucidate these emergent macroecological patterns and relationships.
The present analysis separated the abundance data first according to depth and then according to survey as a means to identify different relationships between virus and prokaryote abundances in the global oceans. The predictive value of total prokaryote abundance is strong when considering sub-surface samples. In contrast, prokaryote abundance is not a strong predictor of virus abundance in the near-surface samples, when utilizing linear or nonlinear models. The predictive power of nonlinear models improved substantially in the near-surface when evaluating each marine survey separately. The minimal predictive value of prokaryoate abundances in the near-surface when aggregating across all surveys is problematic given that virus-microbe interactions have significant roles in driving prokaryote mortality and ecosystem functioning [9, 45, 53]. Indeed the aggregation of abundance measurements in terms of total prokaryote abundances may represent part of the problem. At a given site and time of sampling, each microbial cell in the community is potentially targeted by a subset of the total viral pool. In moving forward, understanding variation in virus abundance and its relationship to prokaryote abundance requires a critical examination of correlations at functionally relevant temporal and spatial scales, i.e., at the scale of interacting pairs of viruses and microbes. We encourage the research community to prioritize examination of these scales of interaction as part of efforts to understand mechanisms underlying nonlinear virus-microbe abundance relationships in the global oceans.
III. MATERIALS AND METHODS
A. Data source
Marine virus abundance data was aggregated from 25 studies (Table S1). A total of 5671 data points were aggregated. The data collection dates range from 1996 to 2012. Data primarily comes from coastal waters in the northern hemisphere and were collected predominately during the summer months, with the notable exceptions of long-term coastal monthly monitoring sites, i.e., the studies USC MO, BATS, and MOVE.
B. Data processing
Analyses of the data were performed using R version 3.1.1. Scripts and original data are provided at https://github.com/cwigington3/VIRBAC analysis.
C. Power-law model
A power-law regression model used the log10 of the predictor variable, prokaryote abundance per mL N, and the log10 of the outcome variable, virus abundance per mL V. The power-law regression was calculated using the equation log10 V = α0 + α1 log10 N. The α0 and α1 parameters were fit via OLS regression to minimize the sum of square error.
D. Constrained variable-intercept model
The constrained model is a “mixed-effects” regression model using the same predictor and outcome variables, log10 of prokaryote abundance per mL and the log10 virus abundance per mL, respectively. This model includes study-specific intercepts which were constrained such that the value for any of the intercepts were restricted to one standard error above or below the intercept value taken from the power-law model. The standard error value for this model came from the power-law model. The equation for this model is , where is the study-specific intercept and and α1 is the slope common to all studies, N is the predictor variable, and V is the outcome variable.
E. Variable slope and variable intercept model
A power-law model where the exponent and intercept varied with each study was evaluated using the same predictor variable, log10 prokaryote abundance per mL, and the same outcome variable, log10 virus abundance per mL. In this model, there was a study-specific α0 and α1 and a OLS regression calculated using the equation V = .
F. Bootstrapping model confidence intervals
Bootstrap analyses of the power-law model and mixed effects models were conducted to derive 95% confidence intervals surrounding the parameters estimated by the models. For all models the original dataset was sampled with replacement, by study, to arrive at a bootstramp sample dataset, this process was repeated 10,000 times. Distributions for all parameters were generated and the 2.5%, 50%, and 97.5% points were identified from among the 10,000 parameter estimates.
G. Outlier identification
Outliers in the data were identified by calculating the top and bottom 2% of estimated VPR amongst the entire 5,671 samples. The outliers corresponded to ratios below 1.81 and above 128. Those samples with virus to prokaryote ratios which fell outside of these bounds were considered outliers. There were 218 outlier samples taken at depths ≤ 100m and 10 outlier samples taken at depths >100m.
This work was supported by NSF grants OCE-1233760 (JSW) and OCE-1061352 (AB and SWW), a Career Award at the Scientific Interface from the Burroughs Wellcome Fund (JSW) and a Simons Foundation SCOPE grant (JSW). This work arose from discussions in the Ocean Viral Dynamics working group at the National Institute for Mathematical and Biological Synthesis, an Institute sponsored by the National Science Foundation, the U.S. Department of Homeland Security, and the Department of Agriculture through NSF Award EF-0832858, with additional support from The University of Tennessee, Knoxville.
References
- [1].↵
- [2].
- [3].↵
- [4].↵
- [5].
- [6].↵
- [7].
- [8].↵
- [9].↵
- [10].
- [11].
- [12].
- [13].↵
- [14].
- [15].
- [16].
- [17].
- [18].↵
- [19].↵
- [20].↵
- [21].
- [22].
- [23].
- [24].
- [25].
- [26].
- [27].
- [28].
- [29].↵
- [30].
- [31].↵
- [32].
- [33].↵
- [34].
- [35].
- [36].↵
- [37].
- [38].
- [39].↵
- [40].
- [41].
- [42].
- [43].↵
- [44].↵
- [45].↵
- [46].
- [47].↵
- [48].
- [49].
- [50].↵
- [51].
- [52].↵
- [53].↵
- [54].↵
- [55].↵
- [56].
- [57].↵
- [58].
- [59].
- [60].