ABSTRACT
Aim Global indicators of change in the state of terrestrial biodiversity are often derived by intersecting observed or projected changes in the distribution of habitat transformation, or of protected areas, with underlying patterns in the distribution of biodiversity. However the two main sources of data used to account for biodiversity patterns in such assessments – i.e. ecoregional boundaries, and vertebrate species ranges – are typically delineated at a much coarser resolution than the spatial grain of key ecological processes shaping both land-use and biological distributions at landscape scale. Species distribution modelling provides one widely used means of refining the resolution of mapped species distributions, but is limited to a subset of species which is biased both taxonomically and geographically, with some regions of the world lacking adequate data to generate reliable models even for better-known biological groups.
Innovation Macroecological modelling of collective properties of biodiversity (e.g. alpha and beta diversity) as a correlative function of environmental predictors offers an alternative, yet highly complementary, approach to refining the spatial resolution with which patterns in the distribution of biodiversity can be mapped across our planet. Here we introduce a new capability – BILBI (the Biogeographic Infrastructure for Large-scaled Biodiversity Indicators) – which has implemented this approach by integrating advances in macroecological modelling, biodiversity informatics, remote sensing and high-performance computing to assess spatial-temporal change in biodiversity at ~1km grid resolution across the entire terrestrial surface of the planet. The initial implementation of this infrastructure focuses on modelling beta-diversity patterns using a novel extension of generalised dissimilarity modelling (GDM) designed to extract maximum value from sparsely and unevenly distributed occurrence records for over 400,000 species of plants, invertebrates and vertebrates.
Main conclusions Models generated by BILBI greatly refine the mapping of beta-diversity patterns relative to more traditional biodiversity surrogates such as ecoregions. This capability is already proving of considerable value in informing global biodiversity assessment through: 1) generation of indicators of past-to-present change in biodiversity based on observed changes in habitat condition and protected-area coverage; and 2) projection of potential future change in biodiversity as a consequence of alternative scenarios of global change in drivers and policy options.
INTRODUCTION
Continued growth in human populations around the world is intensifying demands on our natural environment. Coupled with the effects of anthropogenic climate change, the potential for large-scale modification and loss of our planet’s remaining biological diversity seems ever more likely (Pereira et al., 2010). To combat this ongoing decline, governments have agreed to multi-lateral policy goals which aim to limit, reduce or halt biodiversity loss and environmental degradation. The Convention on Biological Diversity (CBD) Strategic Plan for Biodiversity 2011-2020 and the associated Aichi Biodiversity Targets are one such policy framework that sets near-future targets across five strategic goals addressing ultimate drivers of biodiversity loss, proximate pressures, management responses, benefits to people, and implementation challenges (CBD, 2010). More recently the Sustainable Development Goals (SDGs) adopted by the United Nations promote a healthy and sustainable future both for humans and for our environment, including all “life on land” and “life below water” (UN, 2015), while the latest multi-lateral agreement to limit anthropogenic climate change, ratified in Paris in 2015, includes statements to limit the loss of natural habitat (through deforestation) with indirect consequences for biodiversity (Citroen et al., 2016).
Efficient planning of actions to achieve biodiversity-related goals and targets under these policy processes, and effective tracking of progress towards this achievement, requires the ability to measure the present state of biodiversity, detect trends of recent change, and project the potential future state of biodiversity expected under alternative policy options in a globally consistent way. Unfortunately our ability to report or project indicators of change for many aspects of biodiversity is still limited by an inability to observe or infer changes in ecological communities directly from currently available global datasets (Ferrier, 2011). Indicators of change employed in biodiversity assessments are most often derived by intersecting observed or projected changes in the distribution of habitat loss and degradation, or of protected areas, with underlying patterns in the distribution of biodiversity (e.g. Tittensor et al., 2014; Butchart et al., 2015).
Two sources of global data on terrestrial biodiversity patterns have been used most commonly in the derivation of protected-area and habitat indicators. The first of these is the World Wildlife Fund’s mapping of 867 terrestrial ecoregions, defined as “relatively large units of land containing a distinct assemblage of natural communities and species, with boundaries that approximate the original extent of natural communities prior to major land-use change” (Olson et al., 2001). Ecoregions have long provided a convenient and well-respected foundation for assessing changing patterns of protected-area coverage and habitat transformation around the world (e.g. Watson et al., 2016). However, as indicated by the above definition, ecoregions are typically delineated at a much coarser resolution than the spatial grain of key ecological processes shaping both land-use and biological distributions at the landscape scale (Londoño-Murcia et al., 2010; Calderón-Patrón et al., 2016; Serrano et al., 2018).
Using ecoregions as fundamental spatial units for assessing impacts of protected-area coverage and habitat transformation on biodiversity assumes that all biological elements (e.g. species) within an ecoregion will be equally affected by these activities. Yet, in reality, fine-scaled spatial heterogeneity in abiotic environmental attributes (e.g. terrain, soils, climate) within an ecoregion will tend to bias human uses to particular parts of the region (e.g. a greater likelihood of agriculture in flatter, more fertile environments) (Fig. 1a). Since these same environmental attributes shape natural distributions of species at landscape scale (Fig. 1b), impacts of any given land-use change within an ecoregion will tend to be biased towards a subset of the species occurring within that region (Fig. 1c). This means that protected-area or habitat indicators derived using ecoregions as the fundamental units of analysis risk under- or over-estimating implications of protection or habitat transformation for biodiversity contained within these regions (Ferrier et al., 2004; Londoño-Murcia et al., 2010).
The other major source of global data on biodiversity patterns commonly used for deriving indicators – i.e. extent-of-occurrence range maps for terrestrial vertebrate species (e.g. Jenkins et al., 2013) – presents similar spatial-resolution challenges. As for ecoregions, this data source has, over recent years, enabled the derivation of a wide variety of indicators, and has also underpinned numerous macroecological analyses of global biodiversity patterns. However the relatively coarse resolution of most range maps, and the reality that species occupy only those parts of their overall range offering suitable environmental conditions, has led some workers to suggest that these data should not be employed at a grid resolution finer than 1 degree, or approximately 100km x 100km near the equator (Hurlbert & Jetz, 2007). This again is a resolution far coarser than the spatial grain of key ecological processes shaping land-use and biological distributions at landscape scale.
Species distribution modelling (SDM) provides one widely used means of refining the resolution of mapped species distributions, by using fine-resolution environmental surfaces to characterise and spatially project a species’ niche space (Elith & Leathwick, 2009). This can be achieved either by using known occurrence records to fit a correlative model predicting occurrence of a given species as a mathematical function of multiple environmental variables, or through deductive modelling in which occurrence is predicted using simple rule-based descriptions of environmental suitability derived from expert knowledge (Ferrier, 2002). Distributions predicted using SDM can be used either directly in assessments, or combined with mapped species ranges, where available (e.g. for vertebrates), thereby providing refined mapping of the expected distribution of each species within its known range (Merow et al., 2017). However, regardless of the precise SDM technique employed, application of this general approach is restricted to species for which either there is a sufficient number of occurrence records available to develop a correlative model, or there is sufficient expert knowledge of the species’ habitat requirements to develop a deductive model. This capacity is therefore limited to a subset of species which is biased both taxonomically and geographically, with some regions of the world lacking adequate data to generate reliable SDMs even for better-known biological groups such as vertebrates, let alone for invertebrates and plants (Meyer et al., 2015).
Here we adopt an alternative, yet highly complementary, approach to integrating species-occurrence records with fine-scaled environmental surfaces. This allows us to refine the spatial resolution with which patterns in the distribution of biodiversity can be mapped across our planet. Rather than attempting to model distributions of individual species, this approach instead focuses on modelling, and thereby mapping, collective properties of biodiversity as a correlative function of environmental predictors. Macroecological modelling of spatial variation in alpha diversity, particularly of variation in local species richness, has a relatively long history of application in ecology and conservation biology (e.g. Francis & Currie, 1998). However, with increasing awareness that the total (gamma) diversity encompassed by any set of areas (e.g. in a conservation reserve system) will typically depend more on the extent to which these areas complement one another in terms of species composition, than it does on the richness of individual areas, macroecological modelling is now placing greater emphasis on modelling patterns of beta diversity in addition to those of alpha diversity (Ferrier & Guisan, 2006; D’Amen et al., 2017).
We here introduce a new capability for global biodiversity assessment – BILBI (the Biogeographic Infrastructure for Large-scaled Biodiversity Indicators) – underpinned by macroecological modelling of collective properties of biodiversity. The initial implementation of this infrastructure relies strongly on modelling of beta diversity patterns using an extension of one particular technique – generalised dissimilarity modelling (GDM; Ferrier et al., 2007) – applied to readily available biological and environmental datasets. The overall framework is, however, designed to be sufficiently generic and flexible to allow incremental refinement and addition of modelling techniques and datasets into the future. This capability is also intended to complement, rather than compete with, other approaches to global biodiversity assessment, including those focussed on individual species (e.g. Jetz et al., 2012). Species-level approaches will always play a vital role in biodiversity assessment for better-known biological groups, and especially for species of particular conservation concern within these groups. However the approach described here has potential to add significant value to such species-based assessments by: 1) allowing more effective use of data for highly-diverse biological groups, containing large numbers of species but with few records per species; and 2) enabling robust extrapolation of expected patterns across poorly-sampled regions, even where the particular species occurring in these regions are unknown or unsurveyed.
GENERAL FRAMEWORK
The BILBI modelling framework (Fig. 2) integrates advances in macroecological modelling, biodiversity informatics, remote sensing and high-performance computing to assess spatio-temporal change in biodiversity at 30-arcsecond (approximately 1km) grid resolution across the entire terrestrial surface of the planet, excluding Antarctica (above 60°S). Best-available data on observed occurrences of species within defined biological groups (e.g. all vascular plants) are used to fit correlative models describing patterns in the distribution of biodiversity as a function of fine-scaled spatial variation in climate, terrain and soils, within major habitat types (biomes) and biogeographic realms. These patterns are mapped as spatially-complete gridded surfaces by interpolating and, where necessary, extrapolating predictions from the fitted models. The resulting surfaces describe patterns in the spatial distribution of biodiversity which would be expected in the absence of anthropogenic habitat transformation. These modelled patterns then serve as the foundation for two subsequent pathways of analysis in the BILBI framework (Fig. 2).
In the first pathway these patterns of biodiversity distribution are overlaid with observed changes in pressures (direct drivers) – particularly changes in habitat condition resulting from land-use change – or in management responses, such as the establishment of protected areas, to generate indicators of biodiversity change (e.g. for reporting progress towards the CBD’s Aichi Targets). In the second pathway, observed changes are replaced by projected changes in pressures and responses into the future. This enables application of BILBI in translating alternative scenarios of global change, and associated policy or management options, into expected consequences for the future persistence of biodiversity. In assessing such scenarios the BILBI framework allows consideration both of impacts mediated by changes in habitat condition, resulting for example from projected land-use change, and of potential impacts of climate change on community composition. The latter is predicted through space-for-time substitution of climate covariates in BILBI’s correlative models of spatial biodiversity distribution.
In the remainder of this paper we describe our initial implementation of the BILBI framework, focusing primarily on the foundational modelling of spatial patterns in the global distribution of terrestrial biodiversity. The two analytical pathways flowing from this foundation – relating to indicator generation, and scenario analysis respectively (Fig. 2) – will be addressed in detail in subsequent papers. As alluded to above, our modelling of biodiversity patterns in the initial implementation of BILBI has focused on describing and predicting spatial turnover in species composition – i.e. patterns of beta diversity. However, our longer-term intent is to extend this approach to accommodate joint modelling of both alpha and beta diversity; and to integrate next-generation techniques for achieving this as they become operational.
INITIAL GLOBAL IMPLEMENTATION
Modelling compositional turnover using presence-only data
Generalised dissimilarity modelling is a nonlinear regression technique for modelling the turnover in species composition between two sites as a function of environmental differences between, and geographical separation of, these sites. This technique accommodates two types of nonlinearity commonly encountered in large-scaled analyses of compositional turnover. The curvilinear relationship between increasing environmental or geographical distance, and observed compositional dissimilarity, between sites is addressed through the use of appropriate link functions in a generalised linear modelling framework. Variation in the rate of compositional turnover at different positions along environmental gradients is addressed by transforming these gradients using smooth monotonic functions fitted to the training data (Ferrier et al., 2007). The response variable in a standard GDM model is typically a measure of between-site compositional dissimilarity, calculated from lists of species observed at each of the two sites, using indices such as Sørensen or βsim (e.g. Jones et al., 2013; König et al., 2017). However, one of the biggest challenges in applying GDM globally has been that a large proportion of available species-occurrence data are presence-only rather than presence-absence in nature. Most occurrence records accessible through major data infrastructures, such as the Global Biodiversity Information Facility (GBIF), have been generated through geo-referencing of specimens from natural-history collections, or from relatively opportunistic field observations of individual species, rather than from planned inventories systematically recording all species present at a given site (Isaac & Pocock, 2015). Such data are not well suited to estimating compositional dissimilarity between sites, particularly in areas with lower sampling effort. 
This is because estimates of compositional dissimilarity will be inflated, to a varying yet unknown extent, by false absences of species at each of the sites concerned (Beck et al., 2013).
In implementing the BILBI framework we have addressed this problem by modifying GDM to work with a binary response variable, defined in terms of matches versus mismatches in species identity, for pairs of individual species observations (where a “species observation” is the recorded presence of a particular species at a particular site). The probability that a species randomly drawn from site i has the same identity as a species randomly drawn from site j is expected to be a function of both the total number of species actually occurring at each of the two sites (alpha diversity) and the number of species at each site which are unique to that site, because they do not occur at the other site (beta diversity), following the expression:

$$p_{i,j} = \sum_{s=1}^{n} \frac{\delta_{s,i}}{\alpha_i} \cdot \frac{\delta_{s,j}}{\alpha_j} \qquad (1)$$

where s is the identity of an individual species belonging to the combined list of n species occurring at either one, or both, of the sites, $\delta_{s,i}$ equals 1 if species s occurs at site i and 0 otherwise, and $\alpha$ is the number of species found at a particular site. As the quantity being summed reduces to 0 when a given species is not shared by both sites, this equation can be simplified to $p_{i,j} = c / (\alpha_i \alpha_j)$, where c is the number of species shared between the two sites.
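The match probability described above can be checked numerically on a toy example. The following is a minimal sketch, assuming each site is represented by a complete set of species names; the function names and the example species lists are hypothetical and are not part of the BILBI implementation.

```python
from fractions import Fraction


def match_probability(site_i, site_j):
    """Probability that one species drawn at random from each site is the same species.

    Sums, over the combined species list, the product of the per-site draw
    probabilities (1/alpha if the species occurs at the site, 0 otherwise).
    """
    species_i, species_j = set(site_i), set(site_j)
    alpha_i, alpha_j = len(species_i), len(species_j)
    total = Fraction(0)
    for s in species_i | species_j:
        in_i = 1 if s in species_i else 0
        in_j = 1 if s in species_j else 0
        total += Fraction(in_i, alpha_i) * Fraction(in_j, alpha_j)
    return total


def match_probability_simplified(site_i, site_j):
    """Simplified form: shared species count divided by the product of richnesses."""
    species_i, species_j = set(site_i), set(site_j)
    shared = len(species_i & species_j)
    return Fraction(shared, len(species_i) * len(species_j))


# Hypothetical sites: 4 and 3 species, 2 of them shared.
site_a = {"sp1", "sp2", "sp3", "sp4"}
site_b = {"sp3", "sp4", "sp5"}
p_full = match_probability(site_a, site_b)
p_simple = match_probability_simplified(site_a, site_b)
```

Both forms give the same value because species that are not shared contribute nothing to the sum.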
Using this understanding, we fit a modified form of GDM in which the standard response variable, describing the compositional dissimilarity between two sites (on a continuous scale between 0 and 1), is now replaced by a binary response variable describing the match (0) or mismatch (1) in species identity of a randomly drawn pair of species observations from the two sites. The probability of a mismatch in species identity is the complement of $p_{i,j}$ from above – i.e. $1 - p_{i,j}$. This probability is modelled as a non-linear function of environmental covariates (predictors), in a similar manner to that employed in standard GDM model-fitting (Ferrier et al., 2007). However, due to the binary nature of the response when working with pairs of species observations, the negative exponential link function traditionally used in GDM is replaced by the logit link function. The overall form of an observation-pair GDM (obs-pairGDM) fitted to m environmental covariates is therefore:

$$\mathrm{logit}(1 - p_{i,j}) = a_0 + \sum_{l=1}^{m} \left| f_l(x_{l,i}) - f_l(x_{l,j}) \right| \qquad (2)$$

where $a_0$ is the model intercept and $\left| f_l(x_{l,i}) - f_l(x_{l,j}) \right|$ represents the separation of a pair of species observations i and j along a nonlinear function of environmental covariate l fitted using monotonic I-splines (Ferrier et al., 2007). The fitted probability is not yet an estimate of the turnover in species composition between two sites. As we showed above, $p_{i,j}$ is a function not only of compositional turnover, but also of the richness of species at the two sites concerned. However, if we can estimate the mean species richness of these sites then we can decompose $p_{i,j}$ into an estimate of compositional turnover between the sites ($d_{i,j}$) using:

$$d_{i,j} = 1 - \frac{p_{i,j}}{p_0} \qquad (3)$$

where $p_0$ represents an estimate of the probability that a randomly drawn pair of species from a pair of identical sites (i.e. compositional turnover equals 0) are the same. To enable fitting we estimate $p_0$ from the intercept of our model – i.e. the point where environmental separation between sites is 0 and thus the sites are treated as the same. In subsequent analyses, any prediction of $d_{i,j}$ between a pair of sites can be treated as the expected proportion of species occurring at one site which do not also occur at the other site (averaged across the two sites), and is therefore effectively an estimate of the Sørensen index.
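To illustrate why the decomposed quantity behaves like a Sørensen dissimilarity, the sketch below assumes the decomposition takes the form d = 1 − p/p0, which follows from p = c/(αi αj) when both sites hold the same number of species α (so that p0 = 1/α). It also assumes the model intercept sits on the logit scale of the mismatch probability. The function names and example sites are hypothetical.

```python
import math


def p0_from_intercept(intercept):
    """Match probability at zero environmental separation.

    Assumes the intercept is on the logit scale of the MISMATCH probability,
    so the match probability is its complement.
    """
    mismatch_at_zero = 1.0 / (1.0 + math.exp(-intercept))
    return 1.0 - mismatch_at_zero


def turnover_from_match_probability(p_ij, p_0):
    """Rescale a match probability by its value for identical sites."""
    return 1.0 - p_ij / p_0


def sorensen_dissimilarity(site_i, site_j):
    """Classical Sørensen dissimilarity from complete species lists."""
    a, b = set(site_i), set(site_j)
    return 1.0 - 2.0 * len(a & b) / (len(a) + len(b))


# Two hypothetical sites with equal richness (alpha = 4) sharing 2 species.
site_a = {"sp1", "sp2", "sp3", "sp4"}
site_b = {"sp3", "sp4", "sp5", "sp6"}
alpha = 4
p_ij = 2 / (alpha * alpha)   # shared species / (alpha_i * alpha_j)
p_0 = 1 / alpha              # identical sites: alpha shared / alpha**2
d_ij = turnover_from_match_probability(p_ij, p_0)
```

With equal richness at both sites the rescaled value coincides with the Sørensen dissimilarity computed directly from the species lists; where richness differs between sites the correspondence is only approximate.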
By fitting our models to pairs of individual species observations this method avoids the biases that can result from modelling community data where the inventory of species at sites is incomplete. We just need to satisfy the assumption that the particular species recorded as present at a given site constitute a random sample drawn from all species actually occurring at that site. In its current form the method also assumes that local species richness (alpha diversity) remains reasonably constant across the region of interest – i.e. that the number of species actually occurring at individual sites (1km cells in this study) does not vary markedly across the region – and therefore that the effect of alpha diversity on the response being modelled is accounted for by the model’s intercept. As we describe in the next section, we have taken considerable care to minimise violation of this assumption by fitting separate models for different biomes and biogeographic realms. Our team is also currently developing an extension of the above approach which relaxes this assumption, and thereby explicitly models $p_{i,j}$ as a function of variation in both alpha and beta diversity. Preliminary testing suggests that this approach holds considerable promise as a means of simultaneously modelling patterns of both species richness and compositional turnover from presence-only data.
Biological inputs
Compositional-turnover models covering the terrestrial surface of the planet (above 60°S) were developed for three biological groups – vascular plants, invertebrates and vertebrates. Species-occurrence data were obtained by first downloading all occurrence records accessible through GBIF for vascular plants, invertebrates and reptiles (as of March, 2014), and all records accessible through the Map of Life (MoL; Jetz et al., 2012) for birds, amphibians and mammals (as of May, 2014). All data were filtered to remove: records without accepted genus and species names (using the GBIF Backbone Taxonomy; https://www.gbif.org/dataset/d7dddbf4-2cf0-4f39-9b2a-bb099caae36c); records with a specified spatial precision coarser than 10km; and records falling outside of our land mask (i.e. in the ocean or in other large water bodies). Records falling ≤ 1 km from the coastline, but belonging to terrestrial species, were moved to the nearest terrestrial cell.
For each of the three broad biological groups for which compositional-turnover models were derived – i.e. vascular plants, invertebrates and vertebrates – occurrence records were further filtered, and partitioned into taxonomic sub-groups, based on rules specific to each broad group. The main purpose of this sub-grouping was to ensure that pairs of species observations used in model fitting were of species likely to be recorded by the same community of practice of scientists and/or naturalists. For vertebrates, records were partitioned into four sub-groups – mammals, birds, reptiles and amphibians – meaning, for example, that a record of a mammal species could only be paired with another mammal record, and not with a record of a bird, reptile or amphibian. All species in these four vertebrate sub-groups were used in model fitting, except those which obtain resources primarily from the marine environment (e.g. procellariiform seabirds and pinnipeds). For invertebrates, each of the 15 sub-groups employed was associated with a relatively strong global community of practice (therefore ensuring a reasonably stable taxonomy and geographic spread of records) and contained species which are predominantly terrestrial, or are in a terrestrial phase of their life cycle when sampled. These invertebrate sub-groups, mostly arthropods, were: ants, bees, beetles, bugs, butterflies, centipedes, dragonflies, grasshoppers, millipedes, moths, snails, spiders, termites, true flies, and wasps (see Table 1 for details). For vascular plants, all species were placed in a single sub-group.
All species records which passed the above filtering processes were then assigned to individual 30-arcsecond resolution grid-cells concordant with the spatial grain of the environmental surfaces used in the modelling. These data were then consolidated to remove replicate species records - i.e. only a single record of any given species was retained for any given cell. A final filter was then applied to remove cells with an extremely high number of species (relative to surrounding sites), likely representing the location of biological collections (e.g. museums) rather than actual species locations. The final pool of data used for model fitting contained 106,815,923 records of a given species in a given cell, for 411,348 species (Table 1).
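The assignment of records to grid cells and the removal of replicate records can be sketched as follows. This is an illustrative sketch only: it assumes a simple global latitude/longitude grid with its origin at 180°W, 90°N, and the function names and example coordinates are hypothetical. The final filter on unusually species-rich cells is not shown, since the threshold used is not stated here.

```python
import math

CELL_SIZE_DEG = 30.0 / 3600.0  # 30 arcseconds expressed in degrees


def cell_index(lon, lat):
    """Row and column of the 30-arcsecond cell containing a coordinate.

    Assumes a global grid whose origin is the north-west corner (180W, 90N).
    """
    col = math.floor((lon + 180.0) / CELL_SIZE_DEG)
    row = math.floor((90.0 - lat) / CELL_SIZE_DEG)
    return row, col


def consolidate(records):
    """Keep a single record of each species in each cell.

    records: iterable of (species, lon, lat) tuples.
    Returns a set of (species, row, col) tuples.
    """
    return {(species, *cell_index(lon, lat)) for species, lon, lat in records}


# Hypothetical records: the first two fall in the same cell for the same species.
records = [
    ("sp1", 10.001, 5.001),
    ("sp1", 10.002, 5.002),
    ("sp2", 10.001, 5.001),
    ("sp1", -60.5, -10.25),
]
cells = consolidate(records)
```

Using a set makes the consolidation step implicit: a second record of the same species in the same cell simply maps to an element that already exists.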
Environmental inputs
Environmental covariates were selected a priori from the suite of possible climate, terrain and soil predictors. Individual model selection of covariate sets was not performed as this would reduce the comparability of results between models and limit our ability to generate continuous surfaces of biological composition across large areas. This a priori selection aimed to capture ecologically limiting factors of major importance across the world, and drew particularly on variables which had made a significant contribution to models previously fitted by our team in GDM-modelling exercises across a range of scales and biological groups. Additional criteria were that the layers provided consistent global coverage, and that they were freely available.
A standard set of 15 environmental variables was employed in all of the fitted GDM models (Table 2). This set included five soil variables (bare ground, bulk density, clay, pH, silt), two terrain variables (topographic roughness index, topographic wetness index) and eight climate variables (annual precipitation, annual minimum temperature, annual maximum temperature, maximum monthly diurnal temperature range, annual actual evaporation, potential evaporation of driest month, maximum and minimum monthly water deficit). For the climate variables it was important that these could be consistently projected into the future, and so the WorldClim elevation-adjusted data set (Hijmans et al., 2005) was chosen over products incorporating remotely-sensed data. We further adjusted the temperature, evaporation and water-deficit variables for the radiative-shading effects of topography based on the GMTED2010 DEM (Danielson & Gesch, 2011). Details of this adjustment, and of the associated techniques we used to derive the evaporation and water-deficit variables, are provided in Reside et al. (2013a). All environmental grids were aligned to the WorldClim 30-arcsecond land-extent layer, with water bodies (defined as the Global Lakes and Wetlands Database v3 Lakes and Reservoirs: Lehner & Döll, 2004) masked out. Where necessary for the soil and terrain layers, minor information gaps were filled using a combination of extrapolation and appropriate values drawn from the literature. All methods developed and used to create these layers were designed to be applicable to both present-day climatic conditions and future climate scenarios generated from General Circulation Models (GCMs).
Model fitting
A suite of models, describing compositional-turnover for the three biological groups (invertebrates, vascular plants and vertebrates), was generated using the obs-pairGDM technique (e.g. Fig. 3a-d). These models were developed within the World Wildlife Fund’s nested biogeographic-realm, biome and ecoregion framework (Olson et al., 2001). A separate model was fitted for each possible pairing of the three biological groups with the 61 unique intersections of biomes and realms – referred to here as “bio-realms” – thereby yielding a total of 183 models (3 biological groups x 61 bio-realms) (Table 3).
This allowed consideration of major biogeographic discontinuities between realms and potential variation in the response of different species assemblages between biomes. However biomes were, as far as possible, not treated as closed systems. Model fitting drew on biological and environmental data from both the core biome and adjacent biomes within the same realm, and from adjacent ecoregions in neighbouring realms where the realm boundary was considered porous to the movement of species (e.g. the Nearctic/Neotropics divide in Florida) (Table 3). Models were then fitted to a combination of pairs of species observations, in which 50% of the pairs consisted of observations exclusively drawn from within the target bio-realm and the other 50% paired an observation from within the bio-realm with an observation from one of the buffering regions (Table 3).
We sampled a maximum of 1.5 million unique pairs of species observations to fit the model for each combination of broad biological group (vascular plants, invertebrates, vertebrates) and bio-realm. Each pair of species observations was always drawn from within the same taxonomic sub-group, rather than between sub-groups (see Table 1 for groupings). The 1.5 million pairs of observations were drawn as evenly as possible from the sub-groups within each broad biological group – e.g. for vertebrates, an equal number of observation pairs were drawn from the data for mammals, birds, reptiles and amphibians. Additionally, to maintain a reasonable level of data independence, and to avoid a small number of species observations unduly influencing the fitting of any model, each individual species observation was only used a maximum of 10 times in the sampling of observation pairs.
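The sampling rules described above (unique pairs, drawn within a sub-group, spread roughly evenly across sub-groups, with each observation used at most 10 times) can be sketched as a simple rejection sampler. This is an assumption-laden illustration rather than the procedure actually used: the function name, the round-robin treatment of sub-groups and the attempt limit are all hypothetical, and the split between within-bio-realm and buffer-region pairs is not shown.

```python
import random
from collections import Counter


def sample_pairs(observations, n_pairs, max_uses=10, seed=0):
    """Draw unique within-sub-group pairs of observation ids.

    observations: list of (obs_id, subgroup) tuples with sortable ids.
    Returns (sorted list of pairs, Counter of how often each id was used).
    """
    rng = random.Random(seed)
    by_group = {}
    for obs_id, subgroup in observations:
        by_group.setdefault(subgroup, []).append(obs_id)
    # Only sub-groups with at least two observations can supply a pair.
    groups = sorted(g for g, ids in by_group.items() if len(ids) >= 2)
    uses = Counter()
    pairs = set()
    attempts = 0
    max_attempts = 100 * n_pairs  # hypothetical safeguard against endless retries
    while groups and len(pairs) < n_pairs and attempts < max_attempts:
        # Visit sub-groups in turn so draws are spread roughly evenly.
        group = groups[attempts % len(groups)]
        attempts += 1
        a, b = rng.sample(by_group[group], 2)
        pair = (min(a, b), max(a, b))
        # Reject duplicates and observations that have reached the usage cap.
        if pair in pairs or uses[a] >= max_uses or uses[b] >= max_uses:
            continue
        pairs.add(pair)
        uses[a] += 1
        uses[b] += 1
    return sorted(pairs), uses


# Hypothetical input: 40 observations alternating between two sub-groups.
observations = [(i, "mammals" if i % 2 == 0 else "birds") for i in range(40)]
pairs, uses = sample_pairs(observations, n_pairs=50)
subgroup_of = dict(observations)
```

A rejection sampler is easy to reason about for small inputs; at the scale of 1.5 million pairs a more efficient bookkeeping scheme would likely be needed.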
Our models were fitted using three I-spline basis functions for each environmental covariate (with the exception of bare ground which used a single linear function due to it essentially being a binary variable) with knots placed at the 0, 0.5 and 1 quantiles of the distribution of values for that covariate within each target bio-realm. Where the environmental envelope of samples taken from buffering regions lay outside the envelope for the target biome, additional knots were added at the outer limits, resulting in 3, 4 or 5 knots depending on the structure of the data. This allowed the description of spatial turnover in species composition to be tuned primarily to the target bio-realm while also accounting for extensions of these patterns into neighbouring regions. We regarded the latter as being of particular importance for any subsequent use of these models to project impacts of climate change on biodiversity distribution.
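As an illustration of the covariate transformation, the sketch below evaluates three monotonic basis functions with knots at the minimum, median and maximum of a covariate. It assumes the piecewise-quadratic I-spline form commonly used in GDM software; whether this matches the exact basis used here is an assumption, and the function names, knot values and coefficients are hypothetical.

```python
import statistics


def i_spline(x, q1, q2, q3):
    """Piecewise-quadratic monotonic basis rising from 0 (at q1) to 1 (at q3)."""
    if x <= q1:
        return 0.0
    if x >= q3:
        return 1.0
    if x <= q2:
        return (x - q1) ** 2 / ((q2 - q1) * (q3 - q1))
    return 1.0 - (q3 - x) ** 2 / ((q3 - q2) * (q3 - q1))


def quantile_knots(values):
    """Knots at the 0, 0.5 and 1 quantiles of the covariate values."""
    return [min(values), statistics.median(values), max(values)]


def basis_values(x, knots):
    """One basis function per knot, repeating the end knots at the boundaries."""
    q = [knots[0]] + list(knots) + [knots[-1]]
    return [i_spline(x, q[k], q[k + 1], q[k + 2]) for k in range(len(knots))]


def transformed(x, knots, coefficients):
    """Fitted monotonic function: non-negative weighted sum of the basis values."""
    return sum(c * b for c, b in zip(coefficients, basis_values(x, knots)))


def ecological_distance(x_i, x_j, knots, coefficients):
    """Separation of two observations along the transformed covariate."""
    return abs(transformed(x_i, knots, coefficients) - transformed(x_j, knots, coefficients))


# Hypothetical covariate values and coefficients.
knots = quantile_knots([0.0, 2.0, 5.0, 7.0, 10.0])
coefficients = [0.5, 1.0, 0.2]
mid_basis = basis_values(5.0, knots)
```

Because every basis function is non-decreasing, any combination with non-negative coefficients is also non-decreasing, which is what allows the rate of turnover to vary along a gradient without ever becoming negative.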
As indicated above, the effect of major biogeographic discontinuities on spatial turnover in species composition was accounted for by fitting separate models for different bio-realms. The effect of geographic separation on compositional turnover at finer spatial scales was then addressed by including straight-line geographic distance between species observations as an additional covariate in models fitted within each bio-realm. Observed correlation between geographic distance and compositional turnover can result not only from the effects of geographic isolation per se, but also from high levels of spatial autocorrelation exhibited by many environmental variables (Warren et al., 2014). We therefore took considerable care to avoid geographic distance overwhelming, or masking, the effects of more direct environmental predictors within our models. We achieved this by fitting each model in a two-staged procedure; firstly fitting to the a priori selected set of environmental predictor variables alone, and then refitting the model with the combined effect of the environmental covariates now fixed as an offset, thereby allowing the addition of geographic distance to describe only variation not already accounted for by the environmental covariates. The effect of geographic distance was fitted using a linear function, rather than the complex splined functions used for the environmental covariates.
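The second stage of this procedure can be illustrated with a one-parameter logistic fit in which the environmental prediction from stage one enters as a fixed offset. The sketch below is a simplified illustration under stated assumptions: it uses plain Newton iterations, constrains the distance coefficient to be non-negative (a convention borrowed from standard GDM, assumed here), and all names and data are hypothetical.

```python
import math


def sigmoid(z):
    """Inverse of the logit link."""
    return 1.0 / (1.0 + math.exp(-z))


def fit_distance_coefficient(offsets, distances, mismatches, n_iter=50):
    """Fit a single linear coefficient for geographic distance.

    offsets: stage-one linear predictor (intercept plus environmental terms),
             held fixed for each observation pair.
    distances: geographic distance for each pair.
    mismatches: 1 if the pair is a species mismatch, 0 if a match.
    """
    beta = 0.0
    for _ in range(n_iter):
        gradient = 0.0
        curvature = 0.0
        for eta, d, y in zip(offsets, distances, mismatches):
            p = sigmoid(eta + beta * d)
            gradient += (y - p) * d
            curvature += p * (1.0 - p) * d * d
        if curvature == 0.0:
            break
        # Newton step, clipped so distance can only add to predicted turnover.
        beta = max(0.0, beta + gradient / curvature)
    return beta


# Hypothetical data: mismatches are more frequent among distant pairs.
offsets = [0.0] * 6
distances = [0.0, 0.0, 1.0, 1.0, 1.0, 1.0]
beta_up = fit_distance_coefficient(offsets, distances, [0, 1, 1, 1, 1, 0])
beta_down = fit_distance_coefficient(offsets, distances, [0, 1, 0, 0, 0, 1])
```

Holding the environmental terms fixed means distance can only explain variation the environmental covariates left unexplained; when distant pairs are no more dissimilar than the offset predicts, the coefficient stays at zero.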
To aid in model fitting efficiency, the number of matching species observations used was up-weighted to provide a sample set that included a 1:1 ratio of species matches and mismatches. However, this up-weighting has the effect of disrupting the relationship between our modelled property ($p_{i,j}$) and compositional turnover. Following model fitting, the predicted probabilities of obtaining a species mismatch ($p_{i,j}$) were corrected using: where ($w_m$) represents a pre-calculated weighting value derived from the ratio of species mismatches and species matches from a completely random sample of species observation pairs for each species × bio-realm combination ($m$). Adding eq. 4 into eq. 3 we then use: to derive our estimate of compositional turnover.
Example model outputs
Our fitted models greatly refine the mapping of beta-diversity patterns relative to more traditional surrogates such as ecoregions (Fig. 4a). By working at a biologically-relevant spatial resolution, and allowing biological composition to vary continuously across environmental and geographic space at this resolution, we are able to predict finer-scaled patterns of variation within ecoregions, or other large discrete units, and across the boundaries of these units (Fig. 4b). In a recent evaluation of the performance of GDM in mapping beta-diversity patterns across the Australian continent, Ware et al. (2018) demonstrated that GDM models fitted to scattered presence-only data on species occurrences, of the type employed here in our global modelling, achieved significantly better concordance with actual patterns of spatial variation in biological composition, based on independent test datasets for plants, invertebrates and vertebrates, than that achieved by best-available mapping of bioregions and major vegetation types.
When maps of habitat loss and protected-area boundaries are superimposed over our modelled patterns of compositional variation, simple visual inspection suggests strong levels of covariance between these layers. In other words, habitat loss and protection are not distributed randomly in relation to finer-scaled compositional variation, but are instead biased towards particular subsets of this variation. This level of covariance opens up considerable potential to derive protected-area and habitat indicators globally which account more effectively for biases in the distribution of habitat protection and loss playing out at a resolution below that of ecoregions (Fig. 4b). We outline current initiatives tapping into this potential in the following section.
While we prefer to present results of our modelling as continuous patterns of compositional turnover, and to employ these continuous predictions directly in any subsequent assessment, we also recognise that some applications may be constrained to working with biodiversity surrogates taking the form of a discrete classification. For such applications there is potential to derive discrete classes (i.e. ‘ecosystems’ or ‘biologically-scaled environmental domains’) from our modelling of continuous patterns of compositional turnover, through numerical classification (Fig. 5a). This involves using the predicted compositional similarity between pairs of grid-cells to numerically cluster similar cells into discrete classes (for further explanation of this approach see Ferrier et al., 2007; Leathwick et al., 2011). Classifications such as these can be generated for a hierarchy of different spatial domains ranging from, for example, whole realms or bio-realms (Fig. 5a) through to further subdivision of any given class within this larger domain (Fig. 5b). Relative to approaches that rely on classifying patterns based on environmental variables alone (e.g. Sayre et al., 2014), classifications derived from this approach have the benefit of incorporating best-available biological information through the scaling of environmental space based on modelled patterns in compositional turnover. In an evaluation of the performance of a GDM-based classification of New Zealand’s river and stream system, Leathwick et al. (2011) found that this approach achieved significantly better discrimination of independently observed biological patterns than did either rule-based or numerical classifications based on environmental variables alone.
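As a minimal sketch of this kind of numerical classification, the example below clusters grid-cells by average-linkage agglomeration of predicted dissimilarities. The clustering algorithm is a stand-in chosen for brevity and is not necessarily the one used in the cited studies; the dissimilarity form 1 - exp(-distance) in GDM-transformed space is the standard GDM assumption, and the cell coordinates are hypothetical transformed predictor values.

```python
import math


def gdm_dissimilarity(a, b):
    """Predicted dissimilarity between two cells in GDM-transformed space."""
    ecological_distance = sum(abs(x - y) for x, y in zip(a, b))
    return 1.0 - math.exp(-ecological_distance)


def average_linkage(cells, n_classes):
    """Merge the two closest clusters until n_classes remain.

    Returns a list of clusters, each a list of cell indices.
    """
    clusters = [[i] for i in range(len(cells))]

    def between(c1, c2):
        total = sum(gdm_dissimilarity(cells[i], cells[j])
                    for i in c1 for j in c2)
        return total / (len(c1) * len(c2))

    while len(clusters) > n_classes:
        candidates = [
            (between(clusters[a], clusters[b]), a, b)
            for a in range(len(clusters))
            for b in range(a + 1, len(clusters))
        ]
        _, a, b = min(candidates)
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]
    return clusters


# Hypothetical GDM-transformed predictor values for five grid-cells
cells = [(0.0, 0.1), (0.1, 0.0), (0.05, 0.05), (1.5, 1.2), (1.6, 1.1)]
classes = average_linkage(cells, n_classes=2)
```

Stopping the merging at different values of n_classes, or re-running the procedure on the members of a single class, gives the kind of hierarchy of classes described above.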
EMERGING APPLICATIONS TO GLOBAL BIODIVERSITY ASSESSMENT
As depicted in Fig. 2, our modelling of spatial patterns in the distribution of biodiversity within BILBI is intended to inform global biodiversity assessment through two subsequent pathways of analysis: 1) generation of indicators of past-to-present change in biodiversity based on observed changes in habitat condition and protected-area coverage; and 2) projection of potential future change in biodiversity as a consequence of alternative scenarios of global change (particularly changes in land-use and climate) and associated policy options. Considerable progress has already been made in applying the BILBI modelling framework to both of these activities over the past few years. While we here provide a brief overview of this work, detailed description of the techniques employed in, and the results obtained from, these applications is beyond the scope of this paper, and will be covered in a number of forthcoming publications.
The BILBI framework has been used to generate two global indicators for reporting progress against CBD Aichi Targets. Both of these are now included in the CBD’s official list of recognised indicators (https://www.cbd.int/doc/decisions/cop-13/cop-13-dec-28-en.pdf), and in the suite of indicators coordinated and promoted by the Biodiversity Indicators Partnership (https://www.bipindicators.net/). The first of these two indicators – the Protected Area Representativeness Index – assesses the extent to which terrestrial protected areas are “ecologically representative”, in accordance with Aichi Target 11 (https://www.bipindicators.net/indicators/protected-area-representativeness-index-parc-representativeness). This assessment is performed at a much finer ecological and spatial resolution than that typically employed in other assessments of protected-area representativeness. This is achieved by combining BILBI’s modelling of spatial turnover in biodiversity composition with the World Database on Protected Areas (UNEP & WCMC, 2016), and the analytical approach to assessing representativeness using continuous predictions of compositional turnover described by Ferrier et al. (2004) and Allnutt et al. (2008).
The second indicator – the Biodiversity Habitat Index – is intended to add value to existing assessments of the “rate of loss … of all natural habitats, including forests”, under Aichi Target 5, by translating the observed spatial distribution of habitat loss and degradation into expected impacts on retention of terrestrial biodiversity (https://www.bipindicators.net/indicators/biodiversity-habitat-index). This indicator is generated using the same analytical approach underpinning the Protected Area Representativeness Index, but now combining BILBI’s compositional-turnover modelling with best-available data on habitat condition in place of protected-area coverage. Initially the indicator was derived for forest biomes only, using spatially-explicit data on forest loss from the Global Forest Change dataset (Hansen et al., 2013). However, it is now being expanded to cover all terrestrial biomes across the planet by estimating condition through an extension of the Hoskins et al. (2016) statistical downscaling of coarse-resolution land-use data using 30-arcsecond environmental and remotely-sensed land-cover covariates. This work has adapted the Hoskins et al. (2016) approach to employ Version 2, in place of Version 1, of the Land Use Harmonisation product, thereby generating downscaled estimates of 12, rather than the original five, land-use classes (http://luh.umd.edu/), and to use MODIS Vegetation Continuous Fields (http://glcf.umd.edu/data/vcf/) as remote-sensing covariates in place of discrete land-cover classes. Applying this downscaling approach across multiple years provides an effective means of translating observed changes in remote-sensing covariates into estimated changes in the proportions of land-use classes occurring in each and every 30-arcsecond terrestrial grid-cell on the planet. These proportions can then, in turn, be translated into an estimate of habitat condition, for any given cell in any given year, using coefficients derived from global meta-analyses of land-use impacts on local retention of species diversity undertaken, for example, by the PREDICTS project (Hudson et al., 2014; Newbold et al., 2016).
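The translation from land-use proportions to habitat condition can be sketched as a weighted sum. The class names and coefficient values below are placeholders chosen for illustration only; they are not the PREDICTS-derived coefficients, and the function name is hypothetical.

```python
def habitat_condition(land_use_proportions, condition_coefficients):
    """Weighted mean condition of one grid-cell in one year.

    land_use_proportions: land-use class -> share of the cell (sums to 1)
    condition_coefficients: land-use class -> condition score in [0, 1]
    """
    total = sum(land_use_proportions.values())
    if abs(total - 1.0) > 1e-6:
        raise ValueError("land-use proportions must sum to 1")
    return sum(share * condition_coefficients[cls]
               for cls, share in land_use_proportions.items())


# Placeholder coefficients (illustrative values, not published estimates)
coefficients = {"primary": 1.0, "secondary": 0.7, "cropland": 0.3, "urban": 0.1}

# Hypothetical downscaled land-use proportions for a single cell
cell = {"primary": 0.5, "secondary": 0.2, "cropland": 0.25, "urban": 0.05}
condition = habitat_condition(cell, coefficients)
```

Repeating the calculation for each year of downscaled land-use estimates gives a per-cell time series of condition.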
In addition to generating the above indicators of past-to-present change, the BILBI framework is also being used to project potential future change in biodiversity as a consequence of alternative scenarios of global change and associated policy options. BILBI has already been employed, alongside a range of other biodiversity and ecosystem modelling approaches, in two major multi-model scenario analyses – one assessing the potential biodiversity and ecosystem-service impacts of selected combinations of land-use trajectories (based on Shared Socio-economic Pathways) and climate trajectories (based on Representative Concentration Pathways) (Kim et al., 2018); and the other assessing policy options to reverse ongoing biodiversity loss resulting from detrimental changes in land use (Leclere et al., 2018). To project the potential effects of climate change on beta-diversity patterns, space-for-time substitution is used to predict the level of compositional change expected over time, as a function of BILBI’s compositional-turnover models describing how species composition changes spatially along present-day environmental gradients (for further detail on GDM-based projection of compositional change under climate change see: Fitzpatrick et al., 2011; Blois et al., 2013). Fine-scaled projection of changes in habitat condition is achieved by linking statistical downscaling of present-day land use (described above) with coarse-resolution scenarios of land-use change. The species-area relationship (SAR) is then used to predict the proportion of species expected to persist over the longer term, as a function of the effective proportion of habitat remaining. In contrast to other SAR-based approaches, which work with discrete environmental classes or ecosystem types, this approach applies the SAR to biologically-scaled environments varying continuously across space and time (for further detail of this particular approach see: Ferrier et al., 2004; Allnutt et al., 2008).
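A minimal sketch of applying the SAR to continuously varying environments follows. It assumes that the effective proportion of habitat remaining for a focal cell is the similarity-weighted mean condition of all cells, with similarity taken as exp(-distance) in GDM-transformed space, and that the SAR exponent is z = 0.25. The exponent, the function names and the example values are illustrative assumptions rather than BILBI settings.

```python
import math


def similarity(a, b):
    """Predicted compositional similarity in GDM-transformed space."""
    return math.exp(-sum(abs(x - y) for x, y in zip(a, b)))


def proportion_species_persisting(cells, condition, z=0.25):
    """SAR-based persistence for each focal cell.

    For each focal cell, habitat condition is averaged over all cells,
    weighted by their predicted similarity to the focal cell, and the
    resulting effective habitat proportion is raised to the power z.
    """
    results = []
    for focal in cells:
        weights = [similarity(focal, other) for other in cells]
        effective = (sum(w * h for w, h in zip(weights, condition))
                     / sum(weights))
        results.append(effective ** z)
    return results


# Hypothetical transformed environments and habitat condition scores
cells = [(0.0,), (0.1,), (2.0,)]
condition = [1.0, 1.0, 0.2]
persistence = proportion_species_persisting(cells, condition)
```

In this example the third cell, whose environment is distinct and largely degraded, receives a lower persistence estimate than the first two, which share a well-conserved environment.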
Two other broad areas of potential application of the BILBI modelling framework are worth noting at this point. GDM-based modelling of compositional turnover has been applied previously in both of these contexts, but only at sub-global scales (Ferrier et al., 2007). Establishment of the BILBI framework now opens up new opportunities to extend these applications globally. The first of these would involve using the scaled multidimensional environmental space, resulting from the fitting of GDM models within BILBI, to extrapolate geographical distributions of individual species as a function of the density of occurrence records for any given species across this environmental space. When Elith et al. (2006) tested this approach by applying GDM, coupled with a simple kernel-regression technique, to data from six study-regions around the world, they achieved a level of predictive performance very similar to that of MaxEnt (which later became the world’s most widely applied SDM technique) and markedly better than that achieved by most of the other SDM techniques evaluated in that study. Considerable potential now exists to employ the GDM models already fitted within the BILBI framework as a foundation for extrapolating the potential global distribution of any species with a sufficient number of occurrence records from the > 400,000 species of plants, invertebrates and vertebrates contributing data to these GDMs.
The second of these potential applications would involve using the fitted GDM models within BILBI to help assess the adequacy of biological sampling across environmental and geographical space, and to identify gaps in this coverage to help prioritise future survey effort. Ferrier (2002) outlined an iterative strategy for survey gap-analysis and prioritisation which couples GDM modelling with p-median analysis, an operations-research technique originally adapted for use in conservation biology by Faith and Walker (1996). This strategy prioritises locations which can best fill gaps in the coverage of existing survey sites across a GDM-scaled multidimensional environmental and geographical space fitted to the biological data from those existing sites. Data collected at these new sites can then be used to test and refit the GDM, thereby enabling iterative refinement of both the modelling itself, and the prioritisation of remaining gaps in biological data coverage. While the potential of this survey gap-analysis strategy has already been demonstrated globally using an unscaled environmental-geographical space, as part of the trial GBIF-MAPA application (Flemons et al., 2007), implementation of the fully coupled GDM/p-median approach has, until now, been limited to selected regions and taxa (e.g. Ferrier et al., 2007; Bell et al., 2014). The establishment of global GDM models within the BILBI framework has therefore now opened up an unprecedented opportunity to apply this approach across the entire terrestrial surface of the planet. This would help to direct future biological survey and collection efforts to maximise not only the environmental and geographical coverage of occurrence records for known species, but also the likelihood of discovering new species previously unknown to science, especially within lesser-studied, yet hyper-diverse, taxa.
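The logic of p-median-based survey prioritisation can be sketched with a simple greedy heuristic. This is an illustrative simplification rather than the procedure of Ferrier (2002): the objective is the summed distance, in GDM-scaled space, from every grid-cell to its nearest surveyed site, and each step adds the candidate site that reduces this sum the most. All names and coordinates are hypothetical.

```python
def nearest_distance(point, sites):
    """Manhattan distance in GDM-scaled space to the closest site."""
    return min(sum(abs(a - b) for a, b in zip(point, s)) for s in sites)


def coverage_gap(cells, sites):
    """p-median objective: summed distance from each cell to its nearest site."""
    return sum(nearest_distance(c, sites) for c in cells)


def prioritise_sites(cells, surveyed, candidates, n_new):
    """Greedily pick candidate sites that most reduce the coverage gap.

    Assumes at least one site has already been surveyed.
    """
    sites = list(surveyed)
    remaining = list(candidates)
    chosen = []
    for _ in range(min(n_new, len(remaining))):
        best = min(remaining, key=lambda c: coverage_gap(cells, sites + [c]))
        chosen.append(best)
        sites.append(best)
        remaining.remove(best)
    return chosen


# Hypothetical one-dimensional GDM-scaled positions
cells = [(0.0,), (0.2,), (1.0,), (1.1,), (3.0,), (3.2,)]
surveyed = [(0.1,)]
candidates = [(1.0,), (3.0,), (0.2,)]
new_sites = prioritise_sites(cells, surveyed, candidates, n_new=2)
```

Here the most environmentally remote candidate is selected first, followed by the candidate covering the intermediate cells. In an iterative workflow the GDM would be refitted with data from the new sites before the next round of prioritisation.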
CONCLUSION
The BILBI modelling framework offers a means of making more effective use of scattered occurrence data for highly-diverse biological groups to map fine-scaled patterns in the distribution of biodiversity worldwide – including robust extrapolation of expected patterns across poorly-sampled regions, even where the particular species occurring in these regions are unknown or unsurveyed. This capability is intended to complement, rather than compete with, other approaches to global biodiversity assessment, including those employing discrete land classifications (e.g. ecoregions, ecosystem types) and those focussed on individual species within better-known biological groups, or on species of particular conservation concern.
While the initial implementation of BILBI relies strongly on modelling of beta diversity patterns using one particular technique (GDM) we have purposely designed the overall framework to be sufficiently generic and flexible to allow incremental refinement and addition of modelling techniques and datasets into the future. As noted earlier, work is already proceeding to incorporate mapping of alpha-diversity patterns, alongside the existing mapping of beta diversity, by extending our modelling approach to simultaneously model both species richness and compositional turnover from presence-only data. Also high on our list of priorities for model enhancement is to more rigorously account for the effects of biogeographical barriers on compositional turnover, at multiple scales. In its current form BILBI accounts for such effects only by assuming complete distinctiveness in species composition between major biogeographical realms, and through inclusion of straight-line spatial separation (or ‘isolation by distance’) as an explanatory variable, alongside environmental predictors, in the fitting of GDMs. Considerable potential exists to refine this approach by incorporating measures of ecological isolation (e.g. isolation of islands by intervening ocean, or isolation of mountains by intervening lowlands) into the modelling (Ferrier et al., 2007).
There is also considerable potential to refine our modelling of biodiversity patterns within BILBI through incorporation of new and emerging sources of environmental and biological data. This includes advances in the use of remote sensing to derive global environmental surfaces tailored specifically to the needs of biodiversity modelling (e.g. Wilson & Jetz, 2016). In terms of biological data, one of our highest priorities is to augment our existing focus on species-occurrence data (i.e. presence-only records) with greater use of inventory datasets, particularly regional and global compilations of survey-plot data for plants (e.g. Franklin et al., 2017). While the geographical coverage of such datasets is often patchy relative to species-occurrence datasets, they offer valuable potential to independently test the performance of existing BILBI models in predicting compositional turnover between rigorously-sampled sites, and to compare this performance with that of other modelling techniques (e.g. SDMs) and biodiversity surrogates (e.g. mapped ecoregions or ecosystems). Another priority is to make better use of information on phylogenetic relationships between species, where available for particular biological groups, to extend our current modelling of beta diversity to account for turnover in phylogenetic composition, rather than simply taxonomic composition (Rosauer et al., 2014).