Abstract
Statistical techniques exist for inferring community assembly processes from community patterns. Habitat filtering, competition, and biogeographical effects have, for example, been inferred from signals in phenotypic and phylogenetic data. The usefulness of current inference techniques is, however, debated as the causal link between process and pattern is often lacking and processes known to be important are ignored. Here, we revisit current knowledge on community assembly across scales and, in line with several reviews that have outlined the features and challenges associated with current inference techniques, we identify a discrepancy between features of real communities and current inference techniques. We argue that mechanistic eco-evolutionary models in combination with novel model fitting and model evaluation techniques can provide avenues for more accurate, reliable and inclusive inference. To exemplify, we implement a trait-based and spatially explicit dynamic eco-evolutionary model and discuss steps of model modification, fitting, and evaluation as an iterative approach enabling inference from diverse data sources. This suggested approach can be computationally intensive, and model fitting and parameter estimation can be challenging. We discuss optimization of model implementation, data requirements and availability, and Approximate Bayesian Computation (ABC) as potential solutions to challenges that may arise in our quest for better inference techniques.
Introduction
Community assembly processes are difficult to observe directly in the field and revealing processes via manipulative experiments is not always feasible. Consequently, there is a considerable need to infer processes from observations, such as trait-distributions, species distributions, abundances, and phylogenies (Emerson and Gillespie 2008; Cavender-Bares et al. 2009; Vamosi et al. 2009; Cadotte et al. 2010; Pausas and Verdu 2010; Mouquet et al. 2012). As an example, the co-occurrence of species having similar niches (high phenotypic clustering) or dissimilar niches (low phenotypic clustering, also termed overdispersion) may reflect habitat filtering or ecological interactions, respectively (Webb et al. 2002). Other common methods quantify the correlation between species occurrence and abiotic factors (Legendre et al. 1997) or distance between habitats (Borcard et al. 1992) to infer habitat filtering or geographical contingencies.
Despite their widespread and frequent use, current assembly process inference techniques have limitations. Most methods rely on statistical models for one or a few processes, although it is well known that community assembly occurs via multiple processes (Ackerly et al. 2006; Ricklefs 2007; Leibold et al. 2010). Patterns observed in nature may be consistent with multiple explanations (Vellend 2010) and current techniques may thus fail to provide accurate inference, particularly if evolutionary processes and trophic interactions are poorly integrated (Emerson and Gillespie 2008; Pausas and Verdu 2010; Pontarp and Petchey 2016). Fundamental assumptions (e.g. that competition will result in overdispersed trait distributions), on which current inference techniques often rely, have also been questioned (Mayfield and Levine 2010). Such challenges and shortcomings (as well as advantages) of existing inference techniques are covered in several reviews (Emerson and Gillespie 2008; Cavender-Bares et al. 2009; Vamosi et al. 2009; Cadotte et al. 2010; Pausas and Verdu 2010; Mouquet et al. 2012; Adler et al. 2013). Our aim is thus not to review current inference techniques, though we outline the most relevant features of some of the common ones (Tables 1-2 and Online Appendix 1). Instead, we argue that a synthesis of existing modeling frameworks and statistical techniques has the potential to transform the practice of inference of process from pattern in ecology and evolutionary biology.
We set the stage by reviewing current knowledge on the diversity of community assembly processes that have been termed the “black box of community ecology” by Vellend (2010). With community assembly processes across spatiotemporal scales in mind, we thereafter emphasize the need for more holistic assembly-process inference. Such transformation, already underway, involves more mechanistic and complex models of community assembly. We highlight specific components of such a transformation including mechanistic modeling, parameter estimation, and model selection (see also Csillery et al. 2010; van der Plas et al. 2015; Cabral et al. 2017).
In a concrete example, we present a trait-based and spatially explicit dynamic eco-evolutionary community model that includes intra- and interspecific competition, trophic interactions, and dispersal, as well as trait evolution within trophic levels and co-evolution among trophic levels. We choose to implement our model as a differential-equation (Kot 2001) and matrix model, which keeps the computational cost tractable and gives us the flexibility to initiate simulations under different conditions, including or excluding particular processes. We then use this model to illustrate how steps of model modification (including or excluding processes) can be customized for a range of data sources including time series of population abundance, phylogenies, trait distributions, and spatial species distribution data. Furthermore, we argue that an iterative approach of model modification, model fitting, and model evaluation can answer calls for novel inference techniques.
Using complex models and sophisticated parameter estimation techniques comes with challenges associated with data requirements, computational costs, and statistical issues such as model fitting and selection. Many of these challenges are already recognized in other fields such as ecological forecasting and data assimilation (Luo et al. 2011; Niu et al. 2014). Here we synthesize and assess them for the purposes of process inference. We envision that attempts to overcome such challenges, through a combination of data collection, experimental work, and a well-defined inference framework, will be a worthwhile endeavor on the road towards mechanistic inference of multiple community assembly processes acting in concert.
Processes of eco-evolutionary metacommunity assembly
Before delving into the technical details, it is important to recognize the complex nature of community assembly across spatiotemporal scales. On short temporal and small geographical scales in communities with no trophic interactions, habitat filtering (Wilson et al. 1999) and limiting similarity (MacArthur and Levins 1967) have been viewed as the dominating assembly processes (Fig. 1). The abiotic environment may filter the community such that only species with traits that facilitate their survival within particular environments (e.g. temperature or levels of precipitation) can persist. Habitat filtering will thus cause a local community to become phenotypically clustered and, if traits are phylogenetically conserved (Blomberg et al. 2003), also phylogenetically clustered. In contrast, though the two processes are not mutually exclusive, competition can drive community overdispersion as superior competitors outcompete inferior ones. That said, the paradigm of habitat filtering and competition driving community clustering and overdispersion has been challenged by recent studies that show trait convergence among competing species (Mayfield and Levine 2010; Godoy et al. 2014; Kraft et al. 2015).
In trophic communities, both empirical (Alto et al. 2012) and theoretical (Pontarp and Petchey 2016) studies show that trait-dependent trophic interactions also structure communities (Fig. 1). When correlated with environmental conditions, antagonistic trophic interactions can amplify habitat-filtering effects and thus lead to community clustering (Fine et al. 2006). Conversely, pathogens can increase competitive exclusion and thereby promote trait overdispersion (Gilbert and Webb 2007). Mutualistic interactions are also important in shaping communities (Bascompte and Jordano 2007). Pollinators shared among closely related plant species can, for example, increase phylogenetic clustering (Sargent and Ackerly 2008), and plants in early succession stages can facilitate co-occurrence of distantly related species, which often leads to trait overdispersion (Valiente-Banuet and Verdu 2007). The bias of current inference methods toward habitat filtering and competition is thus somewhat surprising.
Expanding into geographical space, the spatial distribution of habitats in relation to the dispersal propensity of organisms drives metapopulation and metacommunity dynamics, which alongside local ecological processes (Fig. 1) structure both competitive and trophic communities (Hanski 1999; Holyoak et al. 2005). Asynchrony in metapopulation dynamics and spatial dynamics in local extinction and recolonization of habitats can prevent extinctions (Hanski 1999; Holyoak et al. 2005). Such dynamics can be driven by multiple ecological mechanisms, such as different competitive advantages in different patches (Chesson 2000b; Chesson 2000a), competition-colonization trade-offs (Tilman 1994) and density-dependent predation (Holt 1993). Metacommunity dynamics (Fig. 1) can, however, also lead to extinctions, shorter food chains, and ultimately less diverse communities (Holt 1997). One can speculate that metapopulation dynamics that allow species to persist where they would otherwise go extinct due to habitat filtering would reduce community clustering, and that metapopulation dynamics that counteract competitive exclusion would increase community clustering. Such speculation is, however, difficult to confirm, as most current process-inference techniques do not consider metacommunity dynamics.
On longer time scales evolutionary processes can become important for the assembly of local communities (Fig. 1) (Urban and Skelly 2006). The absolute time on which this occurs is case dependent, and ecological and evolutionary time scales can overlap (Cortez and Ellner 2010). Hence, a mix of ecological and evolutionary processes assembles communities (Ellner et al. 2011). Knowledge of such eco-evolutionary processes and their effects on community structure is constantly increasing as, for example, theory explains how species adapt gradually according to selection gradients in a fitness landscape defined by the abiotic environment, resource availability and the traits and abundances of interacting species (see e.g., Brännström et al. 2013). Many empirical studies also demonstrate the importance of evolutionary processes at local spatial scales and character displacement due to competition, for example in Darwin’s finches (Schluter et al. 1985), may be quite common (Keller and Seehausen 2012). Furthermore, predation-induced trait divergence (Reznick et al. 2008; Zeller et al. 2012) can lead to decreased community clustering (Prinzing et al. 2008). Despite the evidence for eco-evolutionary processes being important, evolutionary processes are, however, poorly integrated into current inference techniques (Emerson and Gillespie 2008; Pausas and Verdu 2010).
At larger spatial and longer temporal scales (Fig. 1) the “evolving metacommunity” framework becomes relevant (Urban 2011; Mittelbach and Schemske 2015). This framework takes into account spatial variation in abiotic conditions and resource availability as well as dispersal and sequential colonization of species into a local community. A type of “race” between ecological (e.g. colonization) and evolutionary (e.g. local adaptation) processes occurs. Species can colonize a local community, adapt to novel conditions, and monopolize niche space before subsequent species invade (Urban and De Meester 2009). Conversely, invasion of well-adapted species can constrain evolutionary processes, as niche space can be filled by well-adapted invaders rather than through local adaptation (Urban et al. 2012). This “race” between ecological and evolutionary processes determines community and metacommunity structure and can be detected by, for example, phylogenetic structure analysis (Pontarp et al. 2012). Nevertheless, much-needed knowledge of the assembly processes and structure of spatially distributed evolving communities that also include trophic interactions is lacking, though see Urban et al. (2008) for some conceptual examples.
The case for inclusive and mechanistic process inference
Despite the complex nature of community assembly, inference methods often aim to infer one or a few processes, assuming the absence or at least no important influence of all others (Table 1; see Online Appendix 1 for a detailed review of the most common methods and their limitations). These inference techniques have been praised and criticized (see also Table 2 and review in Online Appendix 1) and calls for more inclusive and mechanistic approaches have been made (Mittelbach and Schemske 2015). Research aimed at addressing such calls exists, including the use of multiple existing inference techniques on the same data (Blois et al. 2014). Others aim at extensions and improvement of existing methods (Helmus et al. 2007; Leibold et al. 2010; De Bie et al. 2012). Such attempts, although necessary, remain associated with many of the challenges described in Table 2.
In line with advances in other fields, such as macroecology (D’Amen et al. 2015; Cabral et al. 2017) and ecological forecasting (Niu et al. 2014; Urban et al. 2016), we suggest that rather than using and developing existing inference techniques, a more general conceptual and flexible methodological inference framework should be adopted, coupling more realistic models with appropriate methods for fitting them to observed patterns. We present a generic eco-evolutionary and trait-based model as an example, coupled with Bayesian methods including model formulation, model fitting, and model improvement as a unified process inference approach (Fig 2). The framework can make use of a priori knowledge about the biological system studied, and although general, it is flexible enough to explain case-specific conditions. Different types of data can be utilized and the modeled mechanistic detail can be adjusted in accordance with different ecological realities, data types and data availability. Improvement of the inference is facilitated through quantitative evaluation of the inference quality and reliability.
Implementing mechanistic and inclusive approaches
Developing eco-evolutionary models for inference
Model construction is the first essential step in the inference framework proposed here, and it requires knowledge of the natural history of the study system, experiments, as well as known theory (Fig. 2). With Figure 1 in mind, it also becomes obvious that multiple processes should be included in the models as well as some mechanistic detail of those processes. Data (e.g. diversity, size distributions or phylogenetic patterns) also dictate model construction as model output and data need to be comparable in subsequent parameter estimation and model selection steps (Fig. 2). It follows that for a community model to be useful as a general inference tool, it needs to include multiple processes, it should be flexible, and it should output multiple types of data.
Dynamic models, which underlie much of our current understanding of communities, can be suitable for inference as they involve well-established functional forms and are computationally tractable (Brännström et al. 2012; Urban et al. 2016) (Fig. 2). The models are often made mechanistic through trait-based ecological interactions (Dieckmann and Doebeli 1999; Doebeli and Dieckmann 2003; Heinz et al. 2009) and evolutionary dynamics (e.g. Geritz et al. 1998; Dieckmann and Doebeli 1999; Doebeli and Dieckmann 2003).
Countless dynamic models have been analyzed and have provided insights into ecology, evolution, and their interaction. As an example, Roughgarden (1972) used a trait-based model of a competitive community to study species co-existence. Evolutionary mechanisms such as mutation rates and mutation sizes have been studied in relation to trait evolution (Dieckmann and Law 1996). Others have implemented evolutionary mechanisms in models of co-evolution and trophic-community assembly (Ripa et al. 2009; Brännström et al. 2011). Spatial contingencies have been considered (Pontarp et al. 2015) and age- and stage-structured populations, environmental and demographic stochasticity and variation in spatial structure can be included (Brännström et al. 2012).
Few studies have, however, utilized dynamic modeling for inference and none of the models presented above are suitable for inference in general as they are specifically designed to answer specific scientific questions. Dynamic eco-evolutionary models can, however, be constructed in a general and flexible way, fitting the requirements for inference. For the sake of argument, we implement such a model and we discuss its utility below.
We base our model on the generalized Lotka–Volterra (GLV) equations (Case 2000) extended into geographical space (Levin 1974). See Figure 3 for an illustration of the model and its initial values in our examples and see Appendix 2 for a detailed formulation of our model and a description of the numerical implementation. Omitting space, for now, the per capita growth of n prey populations and m predator populations is formulated as equations 1 and 2, for i = 1 to n and k = 1 to m, where Vi and Pk denote prey and predator population size, respectively. The parameters r and d are the intrinsic prey growth rate and the predator death rate, respectively. The functions on the right-hand side of equations 1 and 2 are the trait-dependent functions of equations 3-5, where K(ui, uopt) represents the carrying capacity for a monomorphic population i of prey individuals with trait value ui in a habitat characterized by a resource distribution with its peak resource availability at the point uopt. It follows that the resource availability declines symmetrically as u deviates from uopt according to the width of the resource distribution (σK). The interaction, α(ui,uj), between a prey population i (defined by its trait ui) and its competitor populations j (defined by their traits uj) is modeled in a similar way, through a Gaussian function. Here, we standardize the competition coefficients so that, for a focal population i, αii = 1 and 0 < αij < 1 (ui ≠ uj). σα determines the degree of competition between individuals given certain utilization traits and can thus be viewed as the niche width of the prey. Equation 5 models the interaction, a(ui,zk), between a focal predator population k with trait value zk and a prey population i with trait value ui. The parameter bmax denotes the maximum attack rate, obtained when ui = zk, and this rate then falls off symmetrically as ui deviates from zk according to a Gaussian function with variance σa. Similar to the σα parameter, σa can be viewed as the niche width of the predator.
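Written out, equations 1-5 plausibly take the following form. This is a sketch reconstructed from the verbal definitions in the paragraph above, not a quotation of Appendix 2, which remains the authoritative formulation. Two symbols are assumptions introduced here because the text does not name them: a conversion efficiency c that turns consumed prey into predator growth, and a maximum carrying capacity K_0 at the resource optimum. The sketch also treats σ_K, σ_α and σ_a as widths (standard deviations) of the respective Gaussian functions.

```latex
\begin{align}
\frac{1}{V_i}\frac{dV_i}{dt} &= r\left(1 - \frac{\sum_{j=1}^{n}\alpha(u_i,u_j)\,V_j}{K(u_i,u_{opt})}\right) - \sum_{k=1}^{m} a(u_i,z_k)\,P_k \tag{1}\\
\frac{1}{P_k}\frac{dP_k}{dt} &= -d + c\sum_{i=1}^{n} a(u_i,z_k)\,V_i \tag{2}\\
K(u_i,u_{opt}) &= K_0\exp\!\left(-\frac{(u_i-u_{opt})^2}{2\sigma_K^2}\right) \tag{3}\\
\alpha(u_i,u_j) &= \exp\!\left(-\frac{(u_i-u_j)^2}{2\sigma_\alpha^2}\right) \tag{4}\\
a(u_i,z_k) &= b_{max}\exp\!\left(-\frac{(u_i-z_k)^2}{2\sigma_a^2}\right) \tag{5}
\end{align}
```

Under this form, α(u_i,u_i) = 1 and 0 < α(u_i,u_j) < 1 for u_i ≠ u_j, and the attack rate equals b_max when u_i = z_k, in agreement with the standardization and the maximum attack rate described in the text.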
We expand the non-spatial model described above into distinct patches or habitats distributed in space by implementing our model with a matrix formulation with vectors containing values for each population in each habitat, a community matrix that defines ecological interactions, and a dispersal matrix (Fig. 3 and Appendix 2). A fixed proportion of all local populations disperse between adjacent habitats. Furthermore, we follow an adaptive dynamics approach for the evolutionary dynamics (Geritz et al. 1998). In its full complexity, the model includes intra- and interspecific competition, trophic interactions, dispersal, trait evolution and in some cases evolutionary branching (Fig 3). The model provides us with information about population dynamics, equilibrium population sizes and trait distributions for each evolutionary step (Fig 4). Populations can also be assigned a species identity using, for example, a trait-based speciation definition (see also Pontarp et al. 2012; Pontarp et al. 2015). By registering the time and origin of all diversification events as well as trait distributions and abundance throughout evolutionary history we have all the information required to follow trait evolution, diversity, and phylogenetic and phenotypic community structure.
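To make the structure of such a model concrete, the following minimal Python sketch simulates trait-dependent competition, predation and dispersal among habitats, with the evolutionary (adaptive dynamics) step left out. It is an illustration under stated assumptions and not the implementation of Appendix 2: the function names, the parameter dictionary, the conversion efficiency `c`, the maximum carrying capacity `K0`, the linear chain of habitats, the nested-list layout (in place of explicit community and dispersal matrices) and the forward-Euler integration are all choices made here for brevity.

```python
import math


def gaussian(x, y, sigma):
    """Gaussian kernel in trait space; equals 1 when x == y."""
    return math.exp(-((x - y) ** 2) / (2.0 * sigma ** 2))


def disperse(X, m):
    """Move a fixed proportion m of every local population to adjacent
    habitats (assumed here to form a linear chain)."""
    n_hab = len(X)
    if n_hab < 2:
        return [row[:] for row in X]
    out = [[x * (1.0 - m) for x in row] for row in X]
    for h in range(n_hab):
        neighbours = [g for g in (h - 1, h + 1) if 0 <= g < n_hab]
        for g in neighbours:
            for i, x in enumerate(X[h]):
                out[g][i] += m * x / len(neighbours)
    return out


def step(V, P, u, z, u_opt, p, dt):
    """One forward-Euler step. V[h][i] and P[h][k] are prey and predator
    abundances in habitat h; u and z are prey and predator trait values."""
    new_V, new_P = [], []
    for h in range(len(V)):
        row_V = []
        for i, ui in enumerate(u):
            K = p["K0"] * gaussian(ui, u_opt[h], p["sigma_K"])
            comp = sum(gaussian(ui, uj, p["sigma_alpha"]) * V[h][j]
                       for j, uj in enumerate(u))
            pred = sum(p["b_max"] * gaussian(ui, zk, p["sigma_a"]) * P[h][k]
                       for k, zk in enumerate(z))
            growth = p["r"] * (1.0 - comp / K) - pred
            row_V.append(max(V[h][i] * (1.0 + dt * growth), 0.0))
        row_P = []
        for k, zk in enumerate(z):
            intake = sum(p["b_max"] * gaussian(ui, zk, p["sigma_a"]) * V[h][i]
                         for i, ui in enumerate(u))
            growth = p["c"] * intake - p["d"]
            row_P.append(max(P[h][k] * (1.0 + dt * growth), 0.0))
        new_V.append(row_V)
        new_P.append(row_P)
    return disperse(new_V, p["m"]), disperse(new_P, p["m"])


def simulate(V, P, u, z, u_opt, p, dt=0.01, steps=5000):
    """Iterate the ecological dynamics for a fixed number of Euler steps."""
    for _ in range(steps):
        V, P = step(V, P, u, z, u_opt, p, dt)
    return V, P


# Illustrative (hypothetical) parameter values.
params = {"r": 1.0, "d": 0.2, "c": 0.3, "K0": 100.0, "b_max": 0.01,
          "sigma_K": 1.0, "sigma_alpha": 0.5, "sigma_a": 0.5, "m": 0.01}

# Two habitats with different resource optima, two prey and one predator.
V_end, P_end = simulate(V=[[10.0, 10.0], [10.0, 10.0]], P=[[1.0], [1.0]],
                        u=[-0.5, 0.5], z=[0.0], u_opt=[-0.5, 0.5], p=params)
```

Processes can be switched off in this sketch by passing empty predator lists (no trophic interactions) or a single habitat (no dispersal), which mirrors the idea of including or excluding processes when tailoring the model to a system.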
Tailoring mechanistic models to specific systems
Our model, as it is presented above and in Appendix 2, includes multiple processes and it can produce different types of data output (Figs 3-4). Our model can thus be used as an inference tool for complex systems, by searching for and finding distributions of parameter values (and therefore the signs and strengths of processes) that give the best correspondence between the model output and observations.
Unfortunately, increases in model complexity are accompanied by several challenges. Complex models tend to be difficult to interpret and the parameter estimation becomes increasingly computationally expensive and data demanding. Recent estimation techniques were developed with such challenges in mind (see below), but too complex models can become intractable and separating model structure error from parameter error becomes problematic (Keenan et al. 2011). Thus, the model should be made as simple as possible and still provide adequate information about the modeled community (May 2004). This can be accomplished by evaluating the data at hand and through prior knowledge of the ecology, evolution and natural history of the study system.
Prior knowledge of the study system may suggest that in some cases a relatively simple model is sufficient. A purely ecological model may, for example, be adequate for newly established communities of organisms with low evolutionary potential (e.g., low phenotypic/genotypic variation, low mutation rate, or low population sizes). In such cases, the model presented above can be reduced to an ecological community model, outputting time series data and trait distribution data only (Fig. 4 a). This can be done by introducing species to a local community and allowing the community to assemble through ecological processes only, by omitting dispersal and the evolutionary algorithm altogether (Pontarp and Petchey 2016). For old communities or fast-evolving organisms (e.g., microbes) the full eco-evolutionary model may be more appropriate, with or without the spatial component. Similarly, space may be omitted for largely sessile organisms in largely closed communities, while the inclusion of the spatial component of the model may be best for dispersing organisms and more open communities.
Quantifying assembly processes through parameter estimation
Let us assume that prior information is available and data availability has guided us in our manipulation of the model such that we are relatively confident that we are modeling the correct processes. Now, model fitting and parameter estimation can provide information that is rarely provided by “traditional” inference approaches. Estimates of parameter distributions provide quantitative information about the processes with which the parameters are associated. By comparing estimates among parameters, or by sensitivity analysis, the relative strength and importance of different processes can be evaluated. Furthermore, the covariance between parameter distributions can inform about dependencies and redundancies between processes.
There are many methods for estimating model parameters and different approaches are appropriate for various types of models and data (Sokal and Rohlf 1995; Burnham and Anderson 2002; Hartig et al. 2011). These methods are reviewed elsewhere (Raupach et al. 2005; Williams et al. 2009; Luo et al. 2011). Different fitting techniques will likely be preferred, depending on what type of processes are modeled and on data availability. In simple cases where space and evolution are omitted, minimizing the sum of squared residuals may, for example, be preferred. Rather than review all possible fitting techniques for different model scenarios, we discuss issues that may arise when the model is complex and multiple data sources are available.
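For the simple case mentioned above, fitting by minimizing the sum of squared residuals can be sketched as follows. The example is hypothetical: it assumes a single population with logistic growth, forward-Euler integration, noise-free synthetic "observations" and a coarse grid search over candidate values. The names `logistic_series`, `ssr` and `fit_grid` are introduced here for illustration only; in practice a numerical optimizer would usually replace the grid.

```python
def logistic_series(r, K, n0, dt, steps):
    """Forward-Euler solution of dN/dt = r * N * (1 - N / K)."""
    n, series = n0, [n0]
    for _ in range(steps):
        n += dt * r * n * (1.0 - n / K)
        series.append(n)
    return series


def ssr(observed, predicted):
    """Sum of squared residuals between two equally long series."""
    return sum((o - p) ** 2 for o, p in zip(observed, predicted))


def fit_grid(observed, n0, dt, r_grid, K_grid):
    """Return (score, r, K) for the grid point with the smallest SSR."""
    best = None
    for r in r_grid:
        for K in K_grid:
            predicted = logistic_series(r, K, n0, dt, len(observed) - 1)
            score = ssr(observed, predicted)
            if best is None or score < best[0]:
                best = (score, r, K)
    return best


# Synthetic time series generated with known parameter values.
observed = logistic_series(0.8, 50.0, n0=5.0, dt=0.1, steps=100)
score, r_hat, K_hat = fit_grid(observed, 5.0, 0.1,
                               r_grid=[0.2, 0.4, 0.6, 0.8, 1.0],
                               K_grid=[30.0, 40.0, 50.0, 60.0])
```

Because the synthetic data are generated without noise and the true values lie on the grid, the search recovers them exactly; with real data the minimum would only approximate the underlying parameters.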
Approximate Bayesian Computation (ABC) is a model fitting technique with promise for overcoming the difficulty in fitting complex models to diverse data, and it could be the first option in such situations (Sisson et al. 2007; Beaumont 2010; Csillery et al. 2010; Blum et al. 2013). ABC takes priors for each model parameter as input, simulates data using the model, and evaluates the distance (often Euclidean distance) between model output and data through a set of summary statistics (Fearnhead and Prangle 2012). The search of parameter space for the best performing parameters given data can be accomplished using global optimization techniques such as Kalman filters (Kalman 1960), Markov chain Monte Carlo (Gao et al. 2011) or Sequential Monte Carlo (Sisson et al. 2007). Posterior distributions for the parameters are approximated by accepting or rejecting parameter combinations according to a threshold on the distance between the summary statistics of model output and data. The approach consequently does not rely on computing the likelihood of the model given data, as is done by more traditional frequentist or Bayesian fitting techniques. We thus view ABC as having great promise for parameter estimation and thereby process inference with complex process-based models.
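The accept/reject logic described above can be illustrated with a minimal rejection-ABC sketch in Python. Everything specific in it is an assumption made for illustration: a stochastic logistic model stands in for a full community simulator, the two summary statistics, the uniform priors, the scaling of the Euclidean distance by the observed summaries and the threshold `eps` are arbitrary choices, and plain rejection sampling is used instead of the MCMC or Sequential Monte Carlo variants cited in the text.

```python
import math
import random


def simulate(r, K, rng, n0=5.0, dt=0.1, steps=100, noise=0.01):
    """Stochastic logistic growth: a stand-in for a community simulator."""
    n, series = n0, [n0]
    for _ in range(steps):
        n += dt * r * n * (1.0 - n / K)
        n = max(n * (1.0 + rng.gauss(0.0, noise)), 1e-9)
        series.append(n)
    return series


def summaries(series):
    """Two summary statistics: abundance early in the growth phase and
    mean abundance over the last 20 time points."""
    tail = series[-20:]
    return (series[len(series) // 4], sum(tail) / len(tail))


def abc_rejection(observed, priors, n_draws, eps, rng):
    """Keep prior draws whose simulated summaries fall within eps of the data."""
    s_obs = summaries(observed)
    accepted = []
    for _ in range(n_draws):
        theta = {name: rng.uniform(lo, hi) for name, (lo, hi) in priors.items()}
        s_sim = summaries(simulate(theta["r"], theta["K"], rng))
        # Euclidean distance, each summary scaled by its observed value.
        dist = math.sqrt(sum(((a - b) / b) ** 2 for a, b in zip(s_sim, s_obs)))
        if dist <= eps:
            accepted.append(theta)
    return accepted


rng = random.Random(1)
observed = simulate(0.8, 50.0, rng)  # synthetic "data" with known parameters
priors = {"r": (0.1, 2.0), "K": (10.0, 100.0)}
posterior = abc_rejection(observed, priors, n_draws=3000, eps=0.15, rng=rng)
if posterior:
    r_mean = sum(t["r"] for t in posterior) / len(posterior)
    K_mean = sum(t["K"] for t in posterior) / len(posterior)
    print(len(posterior), round(r_mean, 2), round(K_mean, 1))
```

The accepted draws approximate the joint posterior of `r` and `K`; their spread and covariance are what would be interpreted as the strength of, and dependence between, the corresponding processes. Lowering `eps` tightens the approximation at the cost of fewer accepted draws.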
Inferring processes through model selection
Selecting among alternative model structures is the final essential step of inference. By iterating the model manipulation and model fitting steps, each time evaluating an increasingly complicated model, it is possible to circumvent potential problems of using an overly complex model from the start.
First, and before the models are fitted to real data, a theoretical model investigation can identify different processes that may give rise to similar patterns. If the models tell us that two processes give similar community patterns, it will be difficult to distinguish between those processes and additional information or even experimentation may be needed for successful inference. Second, it is possible to evaluate the intrinsic properties of the model versions and fitting techniques by testing them on simulated data produced by known parameter values. If the correct parameter values cannot be retrieved from simulated data, even though the model that underlies the patterns is known, correct inference on non-simulated (real) data, using that model and fitting technique, is unlikely. Third, while fitting models to data (Fig. 2) one can evaluate a model that includes fewer ecological processes against models that include more processes. The models are evaluated according to how well they represent the data (Chivers et al. 2014), thus guiding the inclusion or exclusion of particular processes, and providing inference about processes.
Model selection is relatively straightforward when the models have the same number of parameters; goodness-of-fit can guide the selection. When the models have different numbers of parameters, other model selection techniques can be used. The most widely used are a suite of information criteria rooted in information theory (Akaike 1974). Other model selection criteria are, however, also possible. Again, ABC is a useful approach for complex models (Toni et al. 2009). The model selection procedure is based on the same general ABC principles presented above, except that the summary statistics are defined somewhat differently (Prangle et al. 2014). The output from the ABC model selection is focused on the acceptance/rejection ratio between models rather than on posterior parameter distributions (Toni et al. 2009; Liepe et al. 2014).
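As a rough illustration of ABC-based model selection, the sketch below draws a candidate model at random for every simulation and reports, for each model, its share of the accepted simulations as an approximate posterior model probability. It is a simplified rejection scheme written for this text, not the algorithm of the cited studies: the two toy models (population growth with and without density dependence), the equal prior model probabilities, the summary statistics and the threshold are all assumptions.

```python
import math
import random


def grow(theta, rng, density_dependent, n0=5.0, dt=0.1, steps=100, noise=0.01):
    """Stochastic growth with or without density dependence (toy models)."""
    n, series = n0, [n0]
    for _ in range(steps):
        limit = (1.0 - n / theta["K"]) if density_dependent else 1.0
        n += dt * theta["r"] * n * limit
        n = max(n * (1.0 + rng.gauss(0.0, noise)), 1e-9)
        series.append(n)
    return series


def summaries(series):
    """Abundance early in the growth phase and mean abundance at the end."""
    tail = series[-20:]
    return (series[len(series) // 4], sum(tail) / len(tail))


def abc_model_selection(observed, models, n_draws, eps, rng):
    """Share of accepted simulations per model (equal prior model weights)."""
    s_obs = summaries(observed)
    names = sorted(models)
    accepted = {name: 0 for name in names}
    for _ in range(n_draws):
        name = rng.choice(names)
        density_dependent, priors = models[name]
        theta = {k: rng.uniform(lo, hi) for k, (lo, hi) in priors.items()}
        s_sim = summaries(grow(theta, rng, density_dependent))
        dist = math.sqrt(sum(((a - b) / b) ** 2 for a, b in zip(s_sim, s_obs)))
        if dist <= eps:
            accepted[name] += 1
    total = sum(accepted.values())
    return {name: (accepted[name] / total if total else 0.0) for name in names}


rng = random.Random(2)
# Synthetic "data" generated by the model with density dependence.
observed = grow({"r": 0.8, "K": 50.0}, rng, density_dependent=True)
models = {
    "with_competition": (True, {"r": (0.1, 2.0), "K": (10.0, 100.0)}),
    "without_competition": (False, {"r": (0.1, 2.0)}),
}
probs = abc_model_selection(observed, models, n_draws=4000, eps=0.15, rng=rng)
print(probs)
```

In this toy setting the model without density dependence cannot reproduce both the early growth and the final plateau, so nearly all accepted simulations come from the model with competition. With real data and more similar candidate models the ratio would be less clear-cut, and the choice of summary statistics would matter more.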
Discussion
Ecological communities are complex, with diverse processes and actors (e.g. Urban and Skelly 2006; Vellend 2010; Urban et al. 2012; Mittelbach and Schemske 2015) and it is clear that several of the current inference techniques are too simplistic (Emerson and Gillespie 2008; Cavender-Bares et al. 2009; Vamosi et al. 2009; Cadotte et al. 2010; Pausas and Verdu 2010; Mouquet et al. 2012; Adler et al. 2013). A novel, more mechanistic, more inclusive, and more unified approach for future assembly-process inference techniques is desirable as this will allow us to infer the causal link between multiple processes and community patterns (Mittelbach and Schemske 2015), rather than focusing on less informative phenomenological/statistical relationships. We identify Bayesian analyses of model formulation, model fitting, and model improvement as a unified process-inference approach (Fig 2).
Approaches similar to the ones presented above have been suggested for predicting community response to environmental change (D’Amen et al. 2015; Urban et al. 2016). Furthermore, in ecological forecasting, a set of ad hoc models is often constructed and the best performing model is used for prediction (Luo et al. 2011; Niu et al. 2014; Urban et al. 2016). Although the approaches have not previously been synthesized explicitly for process inference, inference does seem to be moving in the proposed direction. As an example, work on annual plants and parameterized models of competitor dynamics provides an understanding of how patterns of species coexistence are related to phylogenetic (Godoy et al. 2014) and phenotypic (Kraft et al. 2015) similarity. Massie et al. (2010) modeled trophic interactions and structured populations to infer drivers of community dynamics from phytoplankton population data. DeLong et al. (2014) inferred predator-prey interactions from microcosm experimental data to better understand ecological drivers of predator body size. On larger spatial scales, Carrara et al. (2012) used a spatially explicit population model to infer processes from microcosm metacommunities. Furthermore, Yoshida et al. (2007) parameterized an evolutionary predator-prey model and inferred interaction strength from community dynamics.
Studies that more explicitly use the proposed inference approach presented above also exist. Recently, van der Plas et al. (2015) published a modeling approach for estimating the relative importance of different community assembly processes. They used a trait-based (but not dynamic as described above) model and they simulated the assembly through processes like dispersal, habitat filtering and limiting similarity. They then fitted their model to community data using ABC, and by estimating parameters that are directly linked to the strength of the different processes, they inferred the relative strength of those processes. Jabot and Bascompte (2012) used a similar approach to contrast dispersal limitations and stochastic metacommunity dynamics against trophic interactions. They too used a simulation approach to assemble, in this case, network communities and they used ABC to parameterize the model given data. Ultimately they inferred how trophic interactions shape biodiversity.
May et al. (2013) went even further and used ABC for parameter estimation and model selection as an inference tool. They also used simulations, and they contrasted a metacommunity model, a mainland-island model and an island community model against each other. The best performing model, given vegetation survey data, was used to infer the role of connectivity through seed dispersal among habitat patches for regional community dynamics. Although they do not present their work as an iterative framework including model construction, parameter estimation and model evaluation, the studies presented here are excellent examples where more or less mechanistic models were used for process inference.
The proposed framework is general and from the literature reviewed above, we conclude that several modeling approaches and statistical techniques can be used. The synthesis of the relatively simplistic and flexible nature of dynamical modeling in combination with the powerful and flexible ABC does, however, seem particularly suitable. Dynamic modeling is simple in the sense that it is based on simple, often phenomenological population dynamical models (Brännström et al. 2012). The “skeleton” of such simple models is then extended to include detailed mechanisms through the inclusion of trait-based dynamics, complex functional forms and through a population- or individual-based implementation. The trade-off between realism, computational costs, and model tractability can be monitored and controlled as the models gain in complexity. By iterating over model construction and model evaluation several times, each time evaluating an increasingly complex version of the model, the optimal model for the study system and data can be found. Any model that includes several processes will, however, tend to be complex and computationally costly, and its likelihood function is often intractable, leading to the need for powerful fitting and model selection techniques like ABC.
It is also important to emphasize the empirical side of inference, namely the data to which the models are fitted. As noted above, data inform the models and are thus imperative for model fitting (Urban et al. 2016; Cabral et al. 2017). Data also dictate model construction, as the model output needs to be comparable with the data (e.g. diversity, size distributions or phylogenetic patterns). Furthermore, data provide knowledge of the natural history of the study system, which also informs model construction. A priori information about a particular system can narrow down the priors for ABC and thus facilitate parameter estimation by reducing the parameter space that needs to be searched. For certain systems, reasonable parameter values may already be available in the literature. In other cases, it might be possible to measure some parameters in independent studies. As an example, DeLong et al. (2014) conducted separate experiments to estimate functional responses before they fitted a full predator-prey model to protist microcosm data. Similarly, Kraft et al. (2015) constructed the functional form of their model from information provided by experiments before they inferred vital rates and pairwise competitive interactions. The work of Kraft et al. again illustrates the importance of prior and separate sources of knowledge and of high-quality data, and it highlights the value of combining experimental work with inference from observations. The experimental studies reviewed in this paper are examples of how mechanistic modeling and parameter estimation techniques combine to provide a better understanding of community assembly and dynamics in general, and to enable better inference of community assembly processes from observed macroscopic patterns.
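The benefit of informative priors noted above can be made concrete with a small sketch that compares ABC acceptance rates under a wide and a narrow prior. The simulator `simulate_stat` is a generic placeholder (a noisy summary statistic centred on the parameter), and the prior ranges, observed value and tolerance are assumptions for illustration only.

```python
import random


def simulate_stat(theta, rng):
    """Placeholder simulator: a noisy summary statistic centred on theta."""
    return rng.gauss(theta, 0.1)


def acceptance_rate(prior, observed_stat, n_draws, tolerance, rng):
    """Fraction of prior draws accepted by ABC rejection sampling."""
    accepted = 0
    for _ in range(n_draws):
        theta = rng.uniform(*prior)
        if abs(simulate_stat(theta, rng) - observed_stat) < tolerance:
            accepted += 1
    return accepted / n_draws


rng = random.Random(3)
observed = 2.0  # hypothetical observed summary statistic
wide = acceptance_rate((0.0, 10.0), observed, 5000, 0.1, rng)
narrow = acceptance_rate((1.5, 2.5), observed, 5000, 0.1, rng)
print(wide, narrow)
```

A higher acceptance rate means fewer simulations are needed for the same number of posterior samples, which matters when each simulation of an eco-evolutionary model is computationally expensive.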
Conclusion
We envision that process inference will continue to move away from simple statistical and non-mechanistic inference techniques toward approaches with a constant flow of information between experimental and field data, model construction, parameter estimation, and model selection. In this way, the challenges associated with inference (Table 2) may be avoided, and the full complexity of community structure can be increasingly considered and understood. This transformation in inference approach will come with technical challenges as well as increased demands on data from natural systems, computational power, and experimental progress. Many of these difficulties have, however, already been identified and to some extent resolved in other fields, and the solutions are thus ready to be applied in a more formal way to inferring processes of community assembly from macroscopic patterns.
Acknowledgements
We thank Jonathan Levine, Florian Altermatt and Miguel Verdú for comments and suggestions that improved the initial version of this paper.
Footnotes
Statement of authorship: MP and OP conceived the synthesis. All authors contributed to analysis and interpretation of the material and to the writing of the manuscript.
Author e-mails: mikael.pontarp{at}biol.lu.se, ake.brannstrom{at}umu.se, owen.petchey{at}ieu.uzh.ch