Untangling the roles of parasites in food webs with generative network models

Abigail Z. Jacobs; Jennifer A. Dunne; Cris Moore; Aaron Clauset

doi:10.1101/019497

Abstract

Food webs represent the set of consumer-resource interactions among a set of species that co-occur in a habitat, but most food web studies have omitted parasites and their interactions. Recent studies have provided conflicting evidence on whether including parasites changes food web structure, with some suggesting that parasitic interactions are structurally distinct from those among free-living species while others claim the opposite. Here, we describe a principled method for understanding food web structure that combines an efficient optimization algorithm from statistical physics called parallel tempering with a probabilistic generalization of the empirically well-supported food web niche model. This generative model approach allows us to rigorously estimate the degree to which interactions that involve parasites are statistically distinguishable from interactions among free-living species, whether parasite niches behave similarly to free-living niches, and the degree to which existing hypotheses about food web structure are naturally recovered. We apply this method to the well-studied Flensburg Fjord food web and show that while predation on parasites, concomitant predation of parasites, and parasitic intraguild trophic interactions are largely indistinguishable from free-living predation interactions, parasite-host interactions are different. These results provide a powerful new tool for evaluating the impact of classes of species and interactions on food web structure and shed new light on the roles of parasites in food webs.

I. INTRODUCTION

Ecological networks, and food webs in particular, are a useful quantitative tool for evaluating and understanding the structure and function of complex ecosystems. Most food web studies have focused on the interactions among free-living species, and have omitted diverse and ecologically important groups like parasites, which contribute to ecosystem function [1]. Food web research is beginning to explicitly include parasites, but it remains unclear whether parasites and free-living species (Figure 1) play distinct roles in structuring food webs, and thus whether food web theory needs to be altered to account for different types of feeding interactions (Figure 2). Resolving this question would shed new light on the fundamental principles of trophic organization in ecosystems.

FIG. 1. Three versions of a food web, showing the progressive addition of information.

G_F indicates the interactions among free-living species only, G_FP adds parasites and their interactions with free-living species, and G_FPC adds concomitant links from free-living species to parasites.

FIG. 2. Interaction types.

The types discussed are paired with the precise interaction types in the data. Broadly, the types fall into two categories: predation (top row and bottom right) and parasitism (bottom left). The free-living to parasite edgeset E_{F →cP} includes both predation on parasites E_{F →P} and concomitant predation links E_C, thus is in G_FPC only.

Several recent studies have considered this question, but substantial ambiguity remains, in part because the competing hypotheses have primarily been tested indirectly, focusing on the impact that including parasites has on standard statistical measures of food web structure. For instance, many studies have argued that parasites alter food web structure in fundamental ways [2–8], partly as a result of parasites' unique characteristics, such as small body sizes compared to their hosts, trophic intimacy with their hosts, and often complex life cycles [2, 9–12]. One proposal to explain such differences posits a distinct inverse niche space that parasites occupy [13], which allows parasites and free-living species to follow different rules of interaction.

On the other hand, one recent study [14] showed that most of the changes to common network measures for food webs that result from adding parasites and their links are largely what we should expect simply from changing the food web's scale by increasing the number of species S and links L [15]. This study noted two exceptions. First, concomitant links, the feeding links connecting predators to the parasites of their prey [5], appear to alter the observed frequencies of certain motifs representing interactions among triplets of species [16]. Second, generalist parasites, which have multiple hosts, appear to have more complex trophic niches than generalist free-living predators [17]. The points of disagreement and agreement between this and past studies illustrate the complexity of the question of whether and how parasites alter food web structure, and demonstrate the need for statistically rigorous tools for addressing it [14].

A subsequent study further investigated the distributions of motifs among free-living species, parasites, and different types of interaction links [18]. This study showed that parasites have unique structural roles compared to free-living species: when concomitant links were excluded, parasites have diverse roles similar to free-living consumers that have both predators and prey (i.e., intermediate taxa), but when concomitant links were included, their roles were more constrained and different [18]. This study also found that concomitant links represent the most structurally diverse type of interaction.

A common feature of earlier work on these questions is the indirect nature in which they tested the null hypothesis that parasites and free-living species follow similar rules in how they fit into food webs [19]. This leaves open the possibility that parasites alter food web structure in subtle but important ways not apparent through existing approaches. In particular, most previous work has lacked either rigorous hypothesis testing or an explicit comparison of parasite interaction types to those among free-living species, or it has focused on changes in network-level statistics without controlling for confounding effects. A more direct approach would use an explicit but realistic null model of food web structure. We construct such a probabilistic null model based on the hypothesis that there is no difference in the structural roles of parasites and free-living species, but which nevertheless represents realistic food web structure. This approach can demonstrate whether a single set of rules is sufficient to simultaneously explain the interaction patterns of both parasites and free-living species. Here, we introduce a novel computational method for making just such a direct test, which we demonstrate by untangling the role of parasites within the well-studied Flensburg Fjord food web [14, 20].

Our approach is based on the probabilistic generalization of the empirically well-supported niche model [19], called the probabilistic niche model or PNM [17, 21]. The PNM enables the inference of an underlying niche structure that explains the observed links in a food web. Specifically, the model assumes a single underlying niche space, in which each species i is located at some niche location n_i and probabilistically feeds on species located near location c_i, the center of species i's feeding range of width r_i. Through particular patterns in these parameters, the PNM can capture a variety of empirically supported structural features, such as hierarchical feeding, compartmental structure, and body-size determined feeding niches [22–24]. The PNM can also capture cascade structure and inverse niche structure [13, 25] (see also Supporting Information S4, Table S4). Generalizing over these hypotheses, we can assess the statistical quality of the overall model as multiple types of taxa and interactions are introduced. These characteristics make the PNM an attractive and powerful method for testing hypotheses about the structural role of parasite species and their interactions in food webs.

However, fitting the PNM to an existing food web does not itself test the hypothesis that parasites and free-living species follow distinct rules for feeding. Instead, we first partition the types of interactions, e.g., predation among free-living species versus free-living predation on parasites versus concomitant links. We then test whether or not parasites and free-living species or their interactions are different by comparing models across the sequence of food webs created by adding these interaction types one at a time. If parasites follow distinct patterns, then adding parasites and their links to a free-living food web forces the model to fit a broader variety of interaction patterns, which will result in a decrease in the PNM's goodness-of-fit. On the other hand, if parasites play similar structural roles to free-living species, adding these taxa and their links will not impact the model's goodness-of-fit. Similarly, if concomitant links follow a pattern distinct from that of links among and between parasites and free-living species, adding them will result in another decrease in the PNM's goodness-of-fit. In this way, the PNM provides a mechanism-agnostic method for detecting heterogeneities in linkage patterns as more types of interactions are added, without having to identify the particular ecological mechanism at play.

Across different subsets of the data, we use three distinct goodness-of-fit measures to test the null hypothesis: one based on in-sample learning, one based on out-of-sample learning, and one based on statistical network feature similarity. A model that performs better under each of these measures more effectively captures the observed pattern of interaction in the food web. Conversely, if interaction types fail to be well represented, the goodness-of-fit measures will reveal this. By evaluating the null hypothesis several different ways, we increase the reliability of the overall test for differences between types of species and interaction types. Each of these tests requires estimating the best underlying niche structure for a food web, which is a difficult optimization problem related to finding the global maximum of a likelihood function characterized by many local optima. Previous work with the PNM [21] has used simulated annealing, which is inefficient and sensitive to initialization heuristics. Furthermore, initialization heuristics can induce a strong bias when applied to food webs containing parasites, making them unsuitable for testing the hypotheses of interest here. To resolve these technical difficulties, we use a sophisticated and more efficient optimization technique from statistical physics called parallel tempering to fit the PNM directly to the links contained in a food web, without any initialization heuristics. Compared to a number of alternative optimization techniques, this method achieves substantially better results on food web data.

We apply our method to the Flensburg Fjord food web, a single, well-resolved coastal food web, to demonstrate that this technique can address specific open questions about parasites in food webs. We use this data set, with and without parasites and with and without concomitant links, and trophically aggregated by species and over life stages [14, 20]. Here, we focus on demonstrating how to untangle the role of parasites or other types of taxa in food webs using generative models. We intentionally leave a comparative investigation across food webs for future work, and instead emphasize the development and demonstration of these methods, which can be applied to a wide range of questions about ecological network structure. We use these methods to specifically address the question of whether parasites and free-living species exhibit statistically distinguishable interactions or niches, and whether these methods can be used to recover previously hypothesized models.

II. RESULTS

The probabilistic niche model (PNM) is a generative model for food web structure. Each species i in the food web is represented by a triplet of parameters: n_i, c_i and r_i. The settings of these parameters for each species represent the underlying niche structure of the model. A directed consumer-resource link from species i to species j is generated with probability given by a generalized Gaussian function (Figure 3). Thus, the PNM defines a parametric probability distribution over all possible food web structures, and the particular settings of the parameters determine which types of structures are more or less likely to be generated. In observed food web data, the underlying niche structure is unknown, while the links are observed. For a given set of species and feeding links, we estimate the niche structure (i.e., the three parameters for each species) via maximum likelihood.

FIG. 3. Schematic of the probabilistic niche model (PNM).

Each species i has some position n_i in a latent niche space, represented here by a one-dimensional axis. Species i consumes each other species j (located at n_j) with a probability given by a Gaussian function centered at a preferred feeding location c_i and width r_i. Taxa whose niche positions are nearer the preferred feeding location are consumed with higher probability.

The likelihood function for the PNM is known to be rugged, exhibiting many local optima. This property makes it difficult to find the maximum likelihood niche structure via standard techniques, e.g., greedy optimization or gradient descent [26]. To circumvent this difficulty, we used a state-of-the-art optimization technique from statistical physics called parallel tempering [27] that is known to perform well on such functions. Parallel tempering is a Markov chain Monte Carlo (MCMC) technique in which we run a parallel set of MCMC simulations, distributed across a range of temperatures. Each chain evolves by sampling at the specified temperature, but can probabilistically move between more global and more local exploration by exchanging states with a chain at another temperature, higher or lower (Figure 4; Supporting Information S3). Each state thus takes a random walk across temperatures that simulates a tempering process, which improves the chains' ability to escape local optima.

FIG. 4. Schematic of the parallel tempering method for fitting the model.

In parallel tempering, individual Markov chain Monte Carlo simulations run in parallel but at different temperatures that range between uniform exploration (top, high temperature) and greedy exploration (bottom, low temperature). Chains run as usual MCMC within an epoch of time. At the end of each epoch, a uniformly random neighboring pair of chains will exchange states according to a standard Metropolis-Hastings rule. The result is an efficient combination of liberal and greedy exploration strategies for finding high-scoring maxima in the search space.
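The scheme described above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the authors' implementation: the function `parallel_tempering`, the temperature ladder `betas` (inverse temperatures), the proposal step, and the one-dimensional objective `toy_score` are all hypothetical stand-ins, with `toy_score` playing the role of the PNM log-likelihood.

```python
import math
import random

def parallel_tempering(score, init, propose, betas, epochs=200, steps=20, seed=0):
    """Maximise score(state) with one Metropolis chain per inverse temperature."""
    rng = random.Random(seed)
    states = [init(rng) for _ in betas]
    scores = [score(s) for s in states]
    top = max(range(len(betas)), key=lambda k: scores[k])
    best_state, best_score = states[top], scores[top]
    for _ in range(epochs):
        # Within an epoch, every chain runs ordinary Metropolis updates
        # (the proposal is assumed to be symmetric).
        for k, beta in enumerate(betas):
            for _ in range(steps):
                cand = propose(states[k], rng)
                cand_score = score(cand)
                log_u = math.log(1.0 - rng.random())  # u lies in (0, 1]
                if log_u <= beta * (cand_score - scores[k]):
                    states[k], scores[k] = cand, cand_score
                    if cand_score > best_score:
                        best_state, best_score = cand, cand_score
        # At the end of the epoch, one uniformly random neighbouring pair
        # exchanges states with the Metropolis-Hastings swap probability.
        k = rng.randrange(len(betas) - 1)
        log_ratio = (betas[k] - betas[k + 1]) * (scores[k + 1] - scores[k])
        if math.log(1.0 - rng.random()) <= log_ratio:
            states[k], states[k + 1] = states[k + 1], states[k]
            scores[k], scores[k + 1] = scores[k + 1], scores[k]
    return best_state, best_score

def toy_score(x):
    """Hypothetical rugged objective: global maximum 1.0 at x = 0, many local maxima."""
    return math.cos(5.0 * x) - 0.1 * x * x

betas = [0.1, 0.5, 2.0, 10.0]  # hot (exploratory) to cold (greedy)
best_x, best_f = parallel_tempering(
    toy_score,
    init=lambda rng: rng.uniform(-10.0, 10.0),
    propose=lambda x, rng: x + rng.gauss(0.0, 0.5),
    betas=betas,
)
print(best_x, best_f)
```

For the PNM, the state would be the full parameter vector and the score its log-likelihood; the swap rule stays the same.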

To evaluate the null hypothesis that parasites and free-living species follow similar feeding rules, we measured the quality of the estimated model, using three measures described below, as we added progressively more parasite information to a free-living food web. In this sequence, there are three versions of the food web: (i) FlensFree or G_F, containing all free-living species V_F and their predation links E_{F →F}; (ii) FlensPar or G_FP, containing all free-living species V_F and parasites V_P, the edges of G_F as well as parasite-host links E_{P →F}, predation among parasites E_{P →P}, and predation on parasites E_{F →P}; and (iii) FlensParCon or G_FPC, containing the same nodes and links as G_FP as well as concomitant links E_C, i.e., G_FPC includes predation and concomitant predation on parasites E_{F →cP}. The observed interaction types are made explicit in Figure 2.

If parasites and free-living species follow distinct sets of rules, the quality of the model will decline as we require it to fit increasingly distinct types of feeding patterns, i.e., from G_F to G_FP to G_FPC. Our measures of model quality are three-fold: (i) the goodness-of-fit for the model, formalized as an AUC statistic (see Supporting Information S2) on the observed predation links, which quantifies the ability of the model to correctly distinguish between observed predation links and observed non-feeding pairs; (ii) the fitted model's ability to generate synthetic food webs with statistically similar structure to the empirical data via standard network measures; and (iii) the out-of-sample prediction accuracy for missing links, formalized as an AUC statistic in which we remove a subset of links from the empirical web and measure the model's ability to accurately identify which edges were removed. We point out that these measures are comparable across food webs with different numbers of species S and links L.

Applying this method to the Flensburg Fjord food web, we found that the PNM fits the data well, consistently yielding high AUC scores for in-sample goodness-of-fit for each version of the web (Table I). This indicates that the model is able to find an underlying niche structure that differentiates the probabilities of observed consumer links from the probabilities of pairs of species that do not consume each other. However, for the two food web versions including parasites, the AUC is much lower on parasite-host links, E_{P → F}. On the other hand, goodness of fit is highest on E_{F →cP}, indicating that concomitant links are well captured by the PNM. We also find that trophic interactions among parasites E_{P →P} are fit as well as other types of predation links. Finally, the free-living species web G_F is slightly better fit than the food webs including parasites, measured across all observed edge types, which is consistent with scaling effects observed in Dunne et al. [14].

TABLE I. PNM goodness of fit.

Goodness-of-fit (AUC) statistics for models fitted to the entire Flensburg Fjord food web but evaluated on different subsets of predation links. These high values indicate the model can simultaneously explain predation among and between both free-living species and parasites. Parenthetical values indicate the standard error calculated on the optima found from 100 independent runs of the algorithm.

We compared statistical network properties of the empirical data to synthetic food webs drawn from the fitted PNM models. We found close agreement between the synthetic webs and the empirical ones for standard measures of network properties such as connectance and clustering coefficient; the average shortest path length between species in the data was slightly longer than in the resampled networks (Table II). Overall, this suggests that the probabilistic niche model generates an ensemble of networks that are structurally similar to the original data. This was true across G_F, G_FP, and G_FPC, suggesting that the model's ability to fit and generate structurally similar networks depends neither on scale nor on the inclusion of parasites.
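The resampling comparison can be illustrated with a small sketch: draw synthetic webs link by link from fitted link probabilities and compare a network statistic, here connectance L/S², with its expectation. The generalized Gaussian form with width scaled by r_i/2, the helper names, and the three-species parameter values below are assumptions made for illustration only.

```python
import math
import random

def link_prob(n_j, c_i, r_i, alpha=0.9999, e=2.0):
    """Assumed PNM link probability: generalized Gaussian centred at c_i."""
    z = abs((n_j - c_i) / (r_i / 2.0))
    return alpha * math.exp(-(z ** e))

def sample_web(n, c, r, rng):
    """Draw one synthetic web; adj[i][j] = 1 means species i consumes species j."""
    S = len(n)
    return [[1 if rng.random() < link_prob(n[j], c[i], r[i]) else 0
             for j in range(S)] for i in range(S)]

def connectance(adj):
    """Fraction of the S*S possible directed links that are present."""
    S = len(adj)
    return sum(map(sum, adj)) / (S * S)

# Hypothetical fitted parameters for a three-species chain (species 0 is basal).
n = [0.1, 0.5, 0.9]
c = [-1.0, 0.1, 0.5]
r = [0.05, 0.4, 0.4]

rng = random.Random(1)
sampled = sum(connectance(sample_web(n, c, r, rng)) for _ in range(500)) / 500
expected = sum(link_prob(n[j], c[i], r[i])
               for i in range(3) for j in range(3)) / 9
print(sampled, expected)
```

Other statistics, such as clustering coefficient or mean shortest path length, could be compared in the same way by swapping out `connectance`.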

TABLE II. Network properties of the original and resampled webs.

Network statistics of the original (observed) data and resampled networks from model fit to the data, using maximum likelihood estimates.

Comparing predictions of presence or absence of missing links, we applied the link prediction goodness-of-fit test to different subsets of each of the three webs. We conducted a strong test of the ability of a single underlying niche structure to model both free-living and parasitic feeding links. We simulated an out-of-sample test by removing a uniformly random 10% of observed links (i) from among free-living species E_{F →F}, (ii) from free-living species to parasites E_{F →P} (in G_FP) or E_{F →cP} (in G_FPC), (iii) from parasites to free-living species E_{P →F}, or (iv) from among parasites E_{P →P}, and fitting the model to the reduced food web (for G_F, this can only be done on E_{F →F}). We then measured its ability to correctly place higher probabilities on the missing edges than on all other non-predation links in the graph [28]. The AUC scores for each of these tests quantify the amount of information the niche structure of the remaining links contains about the missing links of a given type. We found that differences across the different subwebs are not significant, which suggests the extra information (parasites; concomitant links) is not violating the model's assumptions (Table III).
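The mechanics of this hold-out test can be sketched as follows. This is an illustration under assumptions: the helpers `hold_out` and `heldout_auc` are hypothetical, ties are counted as half a win, and the matrix `prob` is a made-up stand-in for the link probabilities of a PNM refitted to the reduced web (the fitting step itself is not shown).

```python
import random

def hold_out(links, frac, rng):
    """Withhold a uniformly random fraction of links; return (kept, withheld)."""
    k = max(1, round(frac * len(links)))
    withheld = set(rng.sample(sorted(links), k))
    kept = [edge for edge in links if edge not in withheld]
    return kept, sorted(withheld)

def heldout_auc(prob, withheld, non_links):
    """Chance that a withheld link is ranked above a true non-link."""
    wins = 0.0
    for i, j in withheld:
        for u, v in non_links:
            if prob[i][j] > prob[u][v]:
                wins += 1.0
            elif prob[i][j] == prob[u][v]:
                wins += 0.5  # assumed tie convention
    return wins / (len(withheld) * len(non_links))

# Hypothetical 7-species web in which each species eats all lower-indexed ones.
S = 7
links = [(i, j) for i in range(1, S) for j in range(i)]
non_links = [(i, j) for i in range(S) for j in range(S) if (i, j) not in links]

kept, withheld = hold_out(links, 0.10, random.Random(0))

# Stand-in for probabilities from a model refitted to `kept` only.
prob = [[0.8 if j < i else 0.1 for j in range(S)] for i in range(S)]
print(len(kept), len(withheld), heldout_auc(prob, withheld, non_links))
```

To withhold links of a single type, `links` would be restricted to that type (for example E_{P →F}) before calling `hold_out`, while `non_links` stays the set of all true non-links.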

TABLE III. Link prediction with links withheld by type.

Link prediction on subwebs using AUC. For each web G_F, G_FP, and G_FPC, ten percent of links are dropped from each subweb (listed in left column). The AUC is then calculated over the true non-links and false non-links (“missing links”) of the original web. Parenthetical values indicate the standard error calculated on the optima found from 100 independent runs of the algorithm.

Finally, we checked whether the model has overfitted the data, as a measure of the robustness of our results (Table IV). We removed a uniformly random 10% of observed links from each web and trained the model on the reduced food web. We then compared the probabilities of all true (observed) links to the probabilities of the true non-links. We also broke down the comparison of true links and non-links by link-type. If the model were overly sensitive to the missing links, then we expect the model to perform less well, predicting the missing links poorly. We found that the noisy in-sample AUC scores are comparable to the scores on the fully observed data (Table I), which suggests that the model is not overly sensitive to noise, i.e., not overfitting the data. As in Table I, parasite-host links E_{P →F} are least well fit by the model, but trophic interactions among parasites are comparably well fit to other predation link types; concomitant links are again well explained by the model.

TABLE IV. Robustness of the data and subsets, model fitted to noisy under-sampled data.

Robustness on the data and subsets, with the model fitted with 10% of the observed links withheld. Parenthetical values indicate the standard error calculated on the optima found from 100 independent runs of the algorithm.

As a number of previously hypothesized models of niche structure are special cases of the PNM, we examined the inferred parameter values for the corresponding patterns that would indicate support for two alternate models, the cascade model and the inverse niche model [13, 25, 29]. The cascade model requires that consumers only feed on those below them in the niche space, whereas the inverse niche model follows a niche model on predation among free-living species as in Ref. [19], with parasites feeding on free-living hosts above them in the niche space and with feeding range width decreasing with higher niche position for parasites. We consider 100 local optima for each web and search for these properties within each optimum and on average. In no case did every species in the model have an inferred n_i ≥ c_i, i.e., a strict cascade. On average, and for both free-living species and parasites, there is no statistically significant direction of feeding: the average n_i – c_i is not statistically different from zero for any type of species. In no case did every species follow the inverse niche model, where free-living species follow the cascade model (n_i ≥ c_i) and parasites follow an inverse cascade (n_i ≤ c_i) (see Supporting Information S4 and Table S4). Contrary to the inverse niche assumption, niche position and feeding range width are uncorrelated for parasites. Feeding niches were also not continuous [21, 30]. Looking past those specific models, all parameters n_i, c_i, and r_i were distributed significantly differently for free-living species than for parasites (KS test, p < 0.001; Supporting Information S4, Table S5).
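The parameter patterns checked here are simple inequalities on the fitted values, which the following sketch makes explicit. The function names and the toy parameter values are hypothetical; only the conditions n_i ≥ c_i (cascade) and n_i ≤ c_i for parasites (inverse cascade) come from the text.

```python
def is_strict_cascade(n, c):
    """True if every species feeds at or below its own niche position."""
    return all(n_i >= c_i for n_i, c_i in zip(n, c))

def is_inverse_niche(n, c, is_parasite):
    """True if free-living species feed downward and parasites feed upward."""
    return all((n_i <= c_i) if par else (n_i >= c_i)
               for n_i, c_i, par in zip(n, c, is_parasite))

def mean_offset(n, c):
    """Average n_i - c_i; positive values indicate downward feeding."""
    return sum(n_i - c_i for n_i, c_i in zip(n, c)) / len(n)

# Hypothetical fitted values for two free-living species and one parasite.
n = [0.8, 0.5, 0.2]
c = [0.5, 0.1, 0.6]
is_parasite = [False, False, True]

print(is_strict_cascade(n, c))              # the parasite feeds upward
print(is_inverse_niche(n, c, is_parasite))
print(mean_offset(n, c))
```

In the analysis described above, such checks would be applied to each of the 100 local optima and to the averages across them.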

III. DISCUSSION

We found little evidence of a structural distinction between parasites and free-living species with respect to the model's ability to learn the structure of predation in a real food web. The PNM accurately represented predation among free-living species, predation on parasites, concomitant predation, and predation among parasites. Furthermore, we found that predation is well explained by the niche model, regardless of consumer or resource type. The similarity of predation on parasites to predation among free-living species sheds light on the poorly-understood role of predation on parasites [5]. The PNM was able to successfully model predation even without additional allowances for secondary niches or separation by life stage [5, 13, 14]; other work suggests that separation by life stage would not improve the fit of the PNM [31]. Conversely, parasite-host interactions were less well described by the PNM. In this food web, parasites play similar roles to free-living predators, i.e., predation is predation, regardless of context or body size, but parasitism is a structurally distinct trophic strategy.

Our results showed that parasites occupy a broad range of niches, interspersed among the niches occupied by free-living species, but parasite niches are distributed differently than free-living niches. Despite this difference, separating parasites and free-living species may not be a necessary or meaningful distinction in describing the structure of predatory interactions. Other traits, such as niche width or relative abundance, may prove to be more useful features for modeling heterogeneities in food web link structure.

The general nature of the PNM also allows us to test for the signature of specific structuring mechanisms that represent alternative models of food web structure. For instance, the cascade model, the simplest and earliest food web model [25], embodied the notions that taxa feed with a fixed probability on species with lower niche values, and that their niche is non-contiguous. Hierarchical feeding is at the heart of subsequent niche and related models, although in a relaxed form [19, 22, 29], and the niche model further embodies contiguous feeding niches. Another example that is a special case of the PNM is the recent inverse niche model of free-living predation and parasitism on free-living hosts. The inverse niche model keeps a relaxed, contiguous feeding hierarchy for free-living species but reverses its direction for parasites, which feed on free-living taxa with higher niche values [13]. In our analysis of the Flensburg food web, we did not find evidence for the cascade or the inverse niche models using the PNM. The inverse niche model did not model predation on parasites or predation among parasites, E_{F →P}, E_{F →cP}, or E_{P →P}, so there is no explicit comparison possible for those interactions. However, the poor fit of links from parasites to free-living species E_{P →F} suggests that an alternate mechanism, perhaps similar to the inverse niche model, may be necessary to explain such parasite-host connections.

There has been disagreement about whether concomitant predation links should be included in food web data, in part because they represent a secondary form of trophic interaction compared to classic predation or parasitism (Figure 2) [3, 4, 32]. These secondary links embed information about trophic intimacy between parasites and hosts, and it is currently unclear what structural or functional roles these links play in food webs [5, 14]. Here, we found that concomitant links did not obscure the underlying niche structure of either free-living species or parasites, and including them led to no significant decrease in model fit. In fact, predation on parasites was easier to predict when concomitant links were included. Concomitant links were naturally represented by the PNM and appeared to follow similar patterns to other types of predation on parasites.

Previous work found that food webs including concomitant links deviated in motif frequencies [14, 18]. However, motifs and niches describe the network at fundamentally different levels, and so there is no conflict between such observations and our results. Concomitant links naturally close triangles and can create bidirectional links between parasites whose hosts also consume the parasites as concomitant prey. These bidirectional links are relevant as a mode of trophic parasite transmission and infection between free-living hosts. At the motif level, these triangles and bidirectional links obscure motif distributions, but here, they reinforce the niche structure and increase predictability (Supporting Information S5). Under the PNM, the global properties of the network, including our network measures and the distinction between links and non-links, are preserved when concomitant links are included. Dunne et al. [14] found that the roles of parasites as consumers were different from free-living species. We found that this difference splits by type of resource: specifically, when the resources are free-living, we find that the PNM represents these parasitic trophic interactions less effectively. When the resources are other parasites, the PNM represents these links easily, and as easily as when the consumer is free-living. Dunne et al. also found that parasites have more complex trophic niches, which reduces the goodness of fit for the PNM on predicting parasite consumer links. Complex trophic niches corroborate the lessened predictive ability of the PNM on parasite-host links.

Generative models, such as the PNM, are a sophisticated tool for investigating the structure of food webs, including whether different types of taxa and interactions follow distinct connectivity patterns. We united these techniques and applied them to a single food web. By applying these methods to a wide variety of food webs, one can assess the generality of these results. Applying such techniques to a broader range of data and to other types of trophic interactions and species will help characterize structural differences, generate novel ecological hypotheses, and support the iterative development and testing of ecological models and theory for parasites and other previously underrepresented taxa and interactions [7, 8, 14, 33]. Broadly, these methods provide a principled framework to detect heterogeneities in the roles of nodes and links in empirical network data.

IV. METHODS

A. Data

We use the Flensburg Fjord food web data [20] to demonstrate our methods. We consider three nested subsets of the data: G_F, G_FP, and G_FPC (Figure 1). We define the species set V_F as all free-living and basal taxa, and V_P as only parasites. We follow existing naming conventions to distinguish these data sets, which are constructed as follows:

FlensFree (G_F) contains only links between free-living and basal taxa.
G_F includes taxa V_F and predation links E_{F →F}
FlensPar (G_FP)
G_FP includes taxa V_F and V_P and predation links E_{F →F}; predation on parasites, excluding concomitant links, E_{F →P}; parasite-host links E_{P →F}; and predation among parasites E_{P →P}
FlensParCon (G_FPC)
G_FPC includes taxa V_F and V_P and predation links E_{F →F}; predation on parasites, including concomitant links, E_{F →cP}; parasite-host links E_{P →F}; and predation among parasites E_{P →P}

We consider four subsets of the webs in our analyses, corresponding to the four quadrants of Figure 2: links (i) among free-living species V_F × V_F; (ii) from free-living species to parasites V_F × V_P, possibly including concomitant links; (iii) from parasites to free-living species V_P × V_F; and (iv) among parasites V_P × V_P, representing the sets of potential consumer-resource relationships. The elements of the subgraph of V_F × V_P will vary dependent on the inclusion of concomitant links, either with edges E_{F →P} or E_{F →cP}. Due to trophic aggregation, the set of free-living species V_F in G_FP and G_FPC is not equivalent to the set of free-living species in G_F.
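The four link subsets can be obtained by classifying each directed link by the type of its consumer and its resource. The sketch below assumes links are stored as (consumer, resource) pairs; the function name `partition_links` and the taxa are hypothetical.

```python
def partition_links(edges, parasites):
    """Group (consumer, resource) links by consumer and resource type."""
    groups = {"F->F": [], "F->P": [], "P->F": [], "P->P": []}
    for consumer, resource in edges:
        key = ("P" if consumer in parasites else "F") + "->" + \
              ("P" if resource in parasites else "F")
        groups[key].append((consumer, resource))
    return groups

# Hypothetical toy web with a single parasite taxon.
parasites = {"trematode"}
edges = [
    ("crab", "snail"),       # free-living predation
    ("trematode", "snail"),  # parasite-host link
    ("bird", "trematode"),   # predation on a parasite
    ("bird", "crab"),        # free-living predation
]
groups = partition_links(edges, parasites)
for key, members in groups.items():
    print(key, members)
```

Whether the "F->P" group corresponds to E_{F →P} or E_{F →cP} depends only on whether concomitant links are present in `edges`.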

B. Probabilistic Niche Model

The probabilistic niche model (PNM) of Williams et al. [21] and Williams and Purves [17] is a probabilistic construction of the niche model for food webs [19] that creates quasi-interval webs [30]. For a food web of S species, each species i resides in an ecological niche located at n_i in the underlying one-dimensional niche space. Species i consumes other species in the food web with probability given by its feeding distribution, which is centered at i’s ideal feeding position c_i and has variance r_i, corresponding to i’s feeding range (Figure 3). We express the full vector of PNM parameters as θ = {α, n₁,…, n_S, c₁,…, c_S, r₁,…, r_S, e}.

The probability of species i consuming species j is given by

Pr(i → j | θ) = α exp(−|(n_j − c_i) / (r_i/2)|^e),

where α is an uncertainty parameter traditionally set to 0.9999 and e can take values other than 2 for a broader range of distributions [21]. When α is allowed to be a free parameter, we find values near 1, so we fix this parameter in practice. We allow e to be a free parameter, and find that its value decreases as we introduce parasite nodes and concomitant links (Table S3).
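As a minimal sketch, the link probability can be computed directly from the niche positions, feeding centers, and feeding ranges. The sketch assumes the functional form Pr(i → j | θ) = α exp(−|(n_j − c_i)/(r_i/2)|^e) from [21]. The function names and the example parameter values are hypothetical and are not fitted values from this study.

```python
import math


def pnm_link_probability(n_j, c_i, r_i, alpha=0.9999, e=2.0):
    """Probability that consumer i eats resource j (assumed PNM form).

    n_j: niche position of the resource
    c_i: feeding center of the consumer
    r_i: feeding range of the consumer (must be positive)
    """
    scaled_distance = abs((n_j - c_i) / (r_i / 2.0))
    return alpha * math.exp(-(scaled_distance ** e))


def pnm_probability_matrix(n, c, r, alpha=0.9999, e=2.0):
    """S x S matrix whose (i, j) entry is the probability that i eats j."""
    S = len(n)
    return [
        [pnm_link_probability(n[j], c[i], r[i], alpha, e) for j in range(S)]
        for i in range(S)
    ]


# Hypothetical parameters for a three-species web.
n = [0.1, 0.4, 0.8]    # niche positions
c = [0.05, 0.2, 0.45]  # feeding centers
r = [0.1, 0.3, 0.5]    # feeding ranges

P = pnm_probability_matrix(n, c, r)
for row in P:
    print(["%.4f" % p for p in row])
```

With e = 2 the feeding distribution is Gaussian-shaped around c_i; smaller values of e give heavier tails, so a consumer assigns more probability to resources far from its feeding center.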

C. Evaluating model performance

The log-likelihood for the food web data G given the parameters θ is defined as

ln L(G | θ) = Σ_{i,j} [ G(i, j) ln Pr(i → j | θ) + (1 − G(i, j)) ln(1 − Pr(i → j | θ)) ]

for the PNM [21]. Deviations of the data from the predictions made by the model can then be observed, either as non-edges predicted with high probability (G(i, j) = 0, Pr(i → j | θ) high) or as observed edges predicted with low probability (G(i, j) = 1, Pr(i → j | θ) low). Ranking all edges by predicted presence Pr(i → j | θ) and comparing such a ranking to the observed G(i, j) then describes the goodness of fit of the PNM. We calculate the AUC A on the ranked probabilities x_i and y_j, over sets of size S₁ and S₂, as

A = (1 / (S₁ S₂)) Σ_{i=1}^{S₁} Σ_{j=1}^{S₂} 1[x_i > y_j],

where 1[·] is the indicator function.

We evaluate the performance of the model using AUC, or the area under the receiver operating characteristic curve. The AUC measures the separation of the distributions of probabilities predicting true links from true non-links. Intuitively, the AUC is the probability that given a true link and a true non-link, we rank the true link higher than the true non-link. AUC has high natural variance on sets of disparate sizes, as it effectively oversamples the smaller set to calculate that probability. Sets such as links (of size L) and non-links (of size S² − L) will be of disparate sizes when connectance is low, typical of food webs (L/S² ≪ 1).
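The two evaluation quantities can be sketched as follows, assuming links are independent Bernoulli trials with the model's link probabilities, and computing the AUC by direct comparison of every link with every non-link. Counting ties as one half is a common convention that is assumed here, not something stated in the text. The function names and toy matrices are hypothetical.

```python
import math


def log_likelihood(G, P):
    """Bernoulli log-likelihood of a 0/1 adjacency matrix G given link
    probabilities P. Every P[i][j] must lie strictly between 0 and 1."""
    total = 0.0
    for g_row, p_row in zip(G, P):
        for g, p in zip(g_row, p_row):
            total += math.log(p) if g == 1 else math.log(1.0 - p)
    return total


def auc(G, P):
    """Probability that a randomly chosen true link receives a higher
    predicted probability than a randomly chosen true non-link.
    Ties are counted as 1/2 (an assumed convention)."""
    links = [p for g_row, p_row in zip(G, P)
             for g, p in zip(g_row, p_row) if g == 1]
    non_links = [p for g_row, p_row in zip(G, P)
                 for g, p in zip(g_row, p_row) if g == 0]
    score = 0.0
    for x in links:
        for y in non_links:
            if x > y:
                score += 1.0
            elif x == y:
                score += 0.5
    return score / (len(links) * len(non_links))


# Toy example (hypothetical): one observed link among four potential links.
G = [[0, 1],
     [0, 0]]
P = [[0.1, 0.9],
     [0.2, 0.3]]

print("log-likelihood:", log_likelihood(G, P))
print("AUC:", auc(G, P))
```

The pairwise loop costs on the order of L(S² − L) comparisons, which is acceptable for webs of a few hundred species. A rank-based computation gives the same value more cheaply if needed.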

D. Optimization for the PNM with parallel tempering

We use parallel tempering to find maximum likelihood estimates of the parameters. Parallel tempering, also known as replica exchange MCMC (Markov chain Monte Carlo), is an efficient and easily parallelizable optimization technique from statistical physics [27, 34].

In parallel tempering, Q replicas of the system (likelihood space) are explored using MCMC, under the Metropolis-Hastings algorithm. The replicas are taken over a range of mixing temperatures T₁,…, T_Q, and are run in parallel. After every t steps of the chain, a pair of replicas at adjacent temperatures T_k, T_{k+1} is allowed to switch locations in the space with probability based on their relative likelihoods and temperatures (Figure 3). The replicas switch with probability

min{1, exp[(1/T_k − 1/T_{k+1}) (ln L_{k+1} − ln L_k)]},

where L_k denotes the likelihood at the current location of the replica at temperature T_k.

Parallel tempering is a general MCMC method and meets the detailed balance condition. By combining high temperature (fast-mixing) and low temperature (slow-mixing, or locally hill-climbing) chains, parallel tempering allows us to more quickly survey the likelihood space and explore more diverse local optima [27]. See Supporting Information S3 for more details and guidelines.
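The procedure can be sketched on a toy one-dimensional problem. This is a minimal illustration under stated assumptions, not the implementation used for the PNM: each replica at temperature T targets the likelihood raised to the power 1/T, proposals are Gaussian random-walk steps, and swaps use the standard replica-exchange acceptance rule. The function names, the bimodal toy likelihood, the temperature ladder, and all tuning values are hypothetical.

```python
import math
import random


def swap_log_ratio(logl_k, logl_k1, T_k, T_k1):
    """Log of the swap acceptance ratio for replicas at temperatures T_k, T_k1."""
    return (1.0 / T_k - 1.0 / T_k1) * (logl_k1 - logl_k)


def accept(rng, log_ratio):
    """Metropolis rule: accept with probability min(1, exp(log_ratio))."""
    return rng.random() < math.exp(min(0.0, log_ratio))


def parallel_tempering(log_like, init, temps, n_sweeps=2000,
                       swap_every=10, step=0.5, seed=0):
    """Return the best (x, log-likelihood) seen by any replica."""
    rng = random.Random(seed)
    states = [init for _ in temps]
    logls = [log_like(x) for x in states]
    best_x, best_ll = init, logls[0]
    for sweep in range(1, n_sweeps + 1):
        # One Metropolis step per replica, tempered by its own temperature.
        for k, T in enumerate(temps):
            proposal = states[k] + rng.gauss(0.0, step)
            ll = log_like(proposal)
            if accept(rng, (ll - logls[k]) / T):
                states[k], logls[k] = proposal, ll
                if ll > best_ll:
                    best_x, best_ll = proposal, ll
        # Every swap_every sweeps, propose a swap between one adjacent pair.
        if sweep % swap_every == 0:
            k = rng.randrange(len(temps) - 1)
            ratio = swap_log_ratio(logls[k], logls[k + 1], temps[k], temps[k + 1])
            if accept(rng, ratio):
                states[k], states[k + 1] = states[k + 1], states[k]
                logls[k], logls[k + 1] = logls[k + 1], logls[k]
    return best_x, best_ll


def toy_log_like(x):
    """Bimodal toy surface: a minor mode near x = -3, a major mode near x = 3."""
    minor = 0.2 * math.exp(-0.5 * ((x + 3.0) / 0.5) ** 2)
    major = 0.8 * math.exp(-0.5 * ((x - 3.0) / 0.5) ** 2)
    return math.log(minor + major + 1e-300)  # small constant avoids log(0)


# Start every replica in the minor mode and let the hot chains find the major one.
best_x, best_ll = parallel_tempering(toy_log_like, init=-3.0,
                                     temps=[1.0, 2.0, 4.0, 8.0, 16.0])
print("best x:", best_x, "best log-likelihood:", best_ll)
```

A swap that would move the higher-likelihood location to the colder replica is always accepted, which is how good regions found by the hot, fast-mixing chains are handed down to the cold chains for local refinement.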

ACKNOWLEDGMENTS

We thank Rich Williams and Daniel Stouffer for helpful conversations. This work was supported in part by Grant #FA955012-1-0432 from the U.S. Air Force Office of Scientific Research (AFOSR) and the Defense Advanced Research Projects Agency (DARPA) (AZJ, CM, AC); the NSF GRFP award DGE 1144083 (AZJ); NSF grant IIS-1452718 (AC), and the Santa Fe Institute.

Footnotes

* abigail.jacobs@colorado.edu

References

[1] P. J. Hudson, A. P. Dobson, and K. D. Lafferty, Trends in Ecology & Evolution 21, 381 (2006).
[2] M. Huxham, S. Beaney, and D. Raffaelli, Oikos, 284 (1996).
[3] D. J. Marcogliese and D. K. Cone, Trends in Ecology & Evolution 12, 320 (1997).
[4] K. D. Lafferty, S. Allesina, M. Arim, C. J. Briggs, G. D. Leo, A. P. Dobson, J. A. Dunne, P. T. J. Johnson, M. Kuris, D. J. Marcogliese, N. D. Martinez, J. Memmott, P. A. Marquet, J. P. McLaughlin, E. A. Mordecai, M. Pascual, R. Poulin, and D. W. Thieltges, Ecology Letters 11, 533 (2008).
[5] P. T. J. Johnson, A. Dobson, K. D. Lafferty, D. J. Marcogliese, J. Memmott, S. A. Orlofske, R. Poulin, and D. W. Thieltges, Trends in Ecology & Evolution 25, 362 (2010).
[6] C. Fontaine, P. R. Guimarães, S. Kéfi, N. Loeuille, J. Memmott, W. H. Van Der Putten, F. J. Van Veen, and E. Thébault, Ecology Letters 14, 1170 (2011).
[7] S. Kéfi, E. L. Berlow, E. A. Wieters, S. A. Navarrete, O. L. Petchey, S. A. Wood, A. Boit, L. N. Joppa, K. D. Lafferty, R. J. Williams, et al., Ecology Letters 15, 291 (2012).
[8] J. R. Britton, Trends in Ecology & Evolution 28, 93 (2013).
[9] R. M. Thompson, K. N. Mouritsen, and R. Poulin, Journal of Animal Ecology 74, 77 (2005).
[10] A. D. Hernandez and M. V. Sukhdeo, Oecologia 156, 613 (2008).
[11] W. Kuang and W. Zhang, Network Biology 1, 171 (2011).
[12] R. M. Thompson, U. Brose, J. A. Dunne, R. O. Hall Jr., S. Hladyz, R. L. Kitching, N. D. Martinez, H. Rantala, N. Romanuk, D. B. Stouffer, and J. M. Tylianakis, Trends in Ecology & Evolution (2012).
[13] C. P. Warren, M. Pascual, K. D. Lafferty, and A. M. Kuris, Theoretical Ecology 3, 285 (2010).
[14] J. A. Dunne, K. D. Lafferty, A. P. Dobson, R. F. Hechinger, A. M. Kuris, N. D. Martinez, J. P. McLaughlin, K. N. Mouritsen, R. Poulin, K. Reise, et al., PLOS Biology 11, e1001579 (2013).
[15] N. Martinez and J. A. Dunne, in Ecological Scale: Theory and Applications, edited by D. Peterson and V. Parker (Columbia University Press, 1998) pp. 207–226.
[16] D. B. Stouffer, J. Camacho, W. Jiang, and L. A. N. Amaral, Proceedings of the Royal Society B: Biological Sciences 274, 1931 (2007).
[17] R. J. Williams and D. W. Purves, Ecology 92, 1849 (2011).
[18] A. R. Cirtwill and D. B. Stouffer, Journal of Animal Ecology 84, 734 (2015).
[19] R. J. Williams and N. D. Martinez, Nature 404, 180 (2000).
[20] C. D. Zander, N. Josten, K. C. Detloff, R. Poulin, J. P. McLaughlin, and D. W. Thieltges, Ecology 92 (2007).
[21] R. J. Williams, A. Anandanadesan, and D. Purves, PLOS ONE 5, e12092 (2010).
[22] M.-F. Cattin, L.-F. Bersier, C. Banašek-Richter, R. Baltensperger, and J.-P. Gabriel, Nature 427, 835 (2004).
[23] S. L. Pimm and J. H. Lawton, Journal of Animal Ecology, 879 (1980).
[24] G. Woodward, B. Ebenman, M. Emmerson, J. M. Montoya, J. M. Olesen, A. Valido, and P. H. Warren, Trends in Ecology & Evolution 20, 402 (2005).
[25] J. Cohen and C. Newman, Proceedings of the Royal Society of London. Series B. Biological Sciences 224, 421 (1985).
[26] T. Hastie, R. Tibshirani, and J. Friedman, The Elements of Statistical Learning, Vol. 2 (Springer, 2009).
[27] D. J. Earl and M. W. Deem, Physical Chemistry Chemical Physics 7, 3910 (2005).
[28] A. Clauset, C. Moore, and M. E. J. Newman, Nature 453, 98 (2008).
[29] D. Stouffer, J. Camacho, R. Guimera, C. Ng, and L. Nunes Amaral, Ecology 86, 1301 (2005).
[30] A. E. Zook, A. Eklof, U. Jacob, and S. Allesina, Journal of Theoretical Biology 271, 106 (2010).
[31] D. L. Preston, A. Z. Jacobs, S. A. Orlofske, and P. T. Johnson, Oecologia 174, 953 (2014).
[32] K. D. Lafferty, A. P. Dobson, and A. M. Kuris, Proceedings of the National Academy of Sciences 103, 11211 (2006).
[33] D. W. Thieltges, P.-A. Amundsen, R. F. Hechinger, P. T. J. Johnson, K. D. Lafferty, K. N. Mouritsen, D. L. Preston, K. Reise, C. D. Zander, and R. Poulin, Oikos (2013), doi:10.1111/j.1600-0706.2013.00243.x.
[34] M. E. J. Newman and G. T. Barkema, Monte Carlo Methods in Statistical Physics (Oxford: Clarendon Press, 1999).
