Abstract
Microbes are embedded in complex microbiomes where they engage in a wide array of interspecific interactions. However, how these interactions shape diversification, and ultimately biodiversity, is not well understood. Two competing hypotheses have been put forward to explain how species interactions could influence diversification rates. Ecological Controls (EC) predicts a negative diversity-diversification relationship, where the evolution of novel types becomes constrained as available niches become filled. Diversity Begets Diversity (DBD) predicts a positive relationship, with diversity promoting diversification via niche construction and other species interactions. Using the Earth Microbiome Project, the largest standardized survey of global biodiversity to date, we provide support for DBD as the dominant driver of microbiome diversity. Only in the most diverse microbiomes does DBD reach a plateau, presumably because of increasingly saturated niche space. Genera that are strongly associated with particular biomes show a stronger DBD relationship than non-residents, consistent with prolonged evolutionary interactions driving diversification. Genera with larger genomes also experience a stronger DBD response, which could be due to a higher potential for metabolic interactions and niche construction. Our results provide evidence that microbiome diversity – and its potential for future diversification – is crucially shaped by species interactions.
One Sentence Summary: Making use of the most comprehensive biodiversity catalogue, the Earth Microbiome Project, we show that prokaryote diversification significantly correlates with microbiome diversity, pointing to the importance of species interactions in driving microbial evolution and community structure.
Main text
The majority of the Tree of Life consists of microbes (1–3) and the functioning of all Earth’s ecosystems is reliant on diverse microbial communities (4). High-throughput 16S rRNA gene amplicon sequencing studies continue to yield unprecedented insight into the taxonomic richness of microbiomes (e.g. (5, 6)), and abiotic drivers of community composition (e.g. pH (7, 8)) are increasingly uncovered. Microbe-microbe (biotic) interactions can also be important in determining community composition (9), but comparatively little is known about how such interactions (e.g. cross-feeding (10) and toxin-mediated interference competition (11, 12)) shape microbiome diversity.
The dearth of studies exploring how microbial interactions influence diversification stands in marked contrast to a long research tradition on biotic controls of plant and animal diversity (13, 14). In an early study of 49 animal (vertebrate and invertebrate) community samples, Elton plotted the number of species versus the number of genera and observed a ~1:1 ratio in each individual sample, but a ~4:1 ratio when all samples were pooled (14). Elton took this observation as evidence for competitive exclusion, which prevents related species, more likely to overlap in niche space, from co-existing. This concept, more recently referred to as niche filling or “ecological controls” (EC) (15), predicts that speciation (or, more generally, diversification) rates decrease with standing species diversity because available niches become filled over time (16). In contrast, the Diversity Begets Diversity (DBD) model predicts that diversity favors further diversification when species interactions create novel niches (17, 18). For example, niche construction (i.e. the physical, chemical or biological alteration of the environment by microbes) could influence the evolution of the species constructing the niche, and/or that of co-occurring species (19, 20). Evidence for the action of EC vs. DBD in plant and animal communities has been mixed (18, 21–24), and models suggest that intraspecific competition can promote diversification, while interspecific competition and predation inhibit diversification (25). Laboratory evolution experiments tracking the diversification of a focal bacterial lineage in communities of varying complexity have also yielded contradictory results, with support for EC, DBD, or intermediate scenarios (26–30).
To test whether natural microbial communities are shaped by EC or DBD dynamics, we used 2,000 microbiome samples each rarefied to 5,000 16S Amplicon Sequence Variants (ASVs; exact, error-corrected DNA sequences, directly comparable across samples (31)) archived in the Earth Microbiome Project (EMP), the largest available repository of biodiversity based on standardized sampling and sequencing protocols (32). All samples were rarefied because diversification rate estimates are highly sensitive to sampling effort (33, 34). In the absence of time series or a fossil record, microbial diversification must be inferred from extant diversity. Rather than taking a phylogenetic approach requiring complex assumptions (35, 36), we use the equivalent of the Species:Genus (S:G) ratios that Elton used three quarters of a century ago (14) to infer bacterial diversification rates. We use a range of taxonomic ratios (ASV:Genus, Genus:Family, Family:Order, Order:Class, and Class:Phylum) as proxies for diversification of a focal lineage of interest, from shallow to deep evolutionary time, and plot these as a function of the number of non-focal lineages (Genera, Families, Orders, Classes, and Phyla, respectively) with which the focal lineage could interact. A negative relationship is consistent with the EC hypothesis, whereas a positive relationship is consistent with the DBD hypothesis (Figure 1).
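The taxonomic-ratio proxy described above can be illustrated with a minimal sketch. It assumes each rarefied sample is available as a list of (child, parent) pairs, for example (ASV, genus) for the ASV:Genus ratio; the function name and input layout are hypothetical and are not taken from the authors' code.

```python
from collections import defaultdict

def ratio_vs_nonfocal(pairs):
    """For one sample, return {parent: (n_children, n_non_focal_parents)}.

    pairs: iterable of (child_id, parent_id) tuples observed in the sample,
           e.g. (ASV, genus) for ASV:Genus or (class, phylum) for Class:Phylum.
    The first value is the diversification proxy for the focal parent taxon;
    the second is the number of other parent taxa present in the same sample.
    """
    children_per_parent = defaultdict(set)
    for child_id, parent_id in pairs:
        children_per_parent[parent_id].add(child_id)
    n_parents = len(children_per_parent)
    return {
        parent: (len(children), n_parents - 1)
        for parent, children in children_per_parent.items()
    }

# Toy sample (hypothetical data): two ASVs in one genus, one ASV in another.
sample = [("asv1", "Bacillus"), ("asv2", "Bacillus"), ("asv3", "Vibrio")]
result = ratio_vs_nonfocal(sample)
```

Each (diversification, diversity) pair produced this way would be one observation in the diversity-diversification plots; a positive trend across samples is the DBD signature and a negative trend the EC signature.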
We used generalized linear mixed models (GLMMs) (37) to determine how the diversification of a focal lineage (e.g. its ASV:Genus ratio) is affected by the diversity of other lineages (e.g. non-focal genera) in the community. The effects of environment (as defined by the EMP Ontology ‘level 3 biomes’; Methods) and the identity of the focal lineage were included by fitting these as random effects on the slope and intercept. The submitting laboratory (identified by the principal investigator) and the EMP unique sample identifier (which records whether two taxa were part of the same sample) were also included as random effects. The DBD model was supported across all taxonomic ratios, all of which had significantly positive slopes for the diversity-diversification relationship (Table 1, Supplementary Data file 1 Section 1). In some cases, the variation due to lineage, lineage*environment, or environment*laboratory interactions could reverse the sign of the slope (Table 1), but the vast majority of slope estimates are positive, supporting DBD (Figure S1).
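One way to write down the structure of such a model is sketched below. This is an illustrative simplification rather than the exact specification used here: the link function g is left unspecified, the interaction random effects listed in Table 1 are omitted, and all symbols are introduced only for this sketch. Here y is the diversification proxy of focal lineage ℓ in sample s, x is the number of non-focal lineages in that sample, e(s) is the biome of the sample, and p(s) is the submitting laboratory.

```latex
\begin{aligned}
g\!\left(\mathbb{E}[y_{\ell s}]\right)
  &= \underbrace{\left(\beta_0 + a_{\ell} + a_{e(s)}\right)}_{\text{intercept}}
   + \underbrace{\left(\beta_1 + b_{\ell} + b_{e(s)}\right)}_{\text{slope}} x_{\ell s}
   + c_{p(s)} + d_{s}, \\
(a_{\ell}, b_{\ell}) &\sim \mathcal{N}(\mathbf{0}, \Sigma_{\text{lineage}}), \qquad
(a_{e}, b_{e}) \sim \mathcal{N}(\mathbf{0}, \Sigma_{\text{env}}), \\
c_{p} &\sim \mathcal{N}(0, \sigma^2_{\text{lab}}), \qquad
d_{s} \sim \mathcal{N}(0, \sigma^2_{\text{sample}}).
\end{aligned}
```

Under this notation, DBD corresponds to a positive fixed-effect slope (β₁ > 0) and EC to a negative one (β₁ < 0), while the random slope terms allow individual lineages or biomes to deviate from the overall trend.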
As an illustration, Figure 2 shows diversification within the Proteobacteria (Class:Phylum) as a function of all other Phyla in each sample across 17 biomes. All slopes were significantly positive (P < 0.001 after Bonferroni correction), except in hypersaline and non-saline sediments. The strongest DBD effect was found in animal proximal gut microbiomes (slope = 0.3, P = 1.03e–08). For each taxonomic ratio, the three most prevalent taxa followed positive slopes in most environments (Figures S2-S6), with only a few instances of significantly negative slopes (Alphaproteobacteria, Rhizobiales, Flavobacteriales, Flavobacteriaceae, Planctomyces in the plant rhizosphere, and Clostridium in non-saline sediment). Overall, the majority of significant slope estimates were positive (81.7-99.5% depending on the taxonomic ratio considered), favoring DBD as the dominant (but not universal) model supported by the data (Table S1).
To test for any potential confounding effects of data structure or sampling bias, we sought to remove any patterns of co-occurrence between ASVs in the same sample via permutation. We generated 2,000 simulated samples by drawing from the overall distribution of 155,002 unique ASVs across all samples, weighted by their abundance (total number of sequence counts). This resulted in a slightly negative diversity-diversification relationship (slope = –0.002; Pearson correlation = –0.61; P < 2.2e–16; Figure S7), indicating that the observed positive relationships (Table 1; Figure 2) are not an artifact of data structure.
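The null simulation can be sketched as follows. This is a minimal illustration under stated assumptions: reads are drawn with replacement from the pooled ASV pool in proportion to total abundance, and the function name, arguments, and toy data are hypothetical rather than taken from the authors' code.

```python
import random

def simulate_null_sample(asv_ids, abundances, depth, rng):
    """Draw `depth` reads from the pooled ASV pool, weighted by abundance.

    Assumption: sampling is with replacement, so co-occurrence structure
    among ASVs within real samples is discarded. Returns the set of ASVs
    present in the simulated sample.
    """
    reads = rng.choices(asv_ids, weights=abundances, k=depth)
    return set(reads)

# Toy pool (hypothetical counts); the analysis above used 155,002 ASVs.
rng = random.Random(0)
pool_ids = ["asv_a", "asv_b", "asv_c"]
pool_counts = [10, 0, 5]
present = simulate_null_sample(pool_ids, pool_counts, depth=50, rng=rng)
```

Taxonomic ratios would then be computed on each simulated sample exactly as for the real ones, so that any remaining diversity-diversification trend reflects data structure alone.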
The taxonomic approach considered thus far has certain limitations. First, not all taxonomic ranks have the same phylogenetic depth (38) and not all named taxa are monophyletic (39). Second, unclassified ASVs had to be excluded from taxonomic analysis. Although these limitations are not expected to bias the results toward either EC or DBD, we sought to validate the results with a taxonomy-independent approach, clustering ASVs at decreasing levels of nucleotide identity, from 100% identical ASVs down to 75% identity (roughly equivalent to phyla (40)). We estimated diversification as the mean number of descendants per cluster (e.g. number of 100% clusters per 97% cluster) and plotted this against the total number of non-focal clusters (97% identity in this example). For each of the six nucleotide divergence ratios tested, the relationship between diversity and diversification remained positive (Figure S8), consistent with DBD and suggesting that the taxonomic analyses were largely unbiased.
To exclude the possibility that our results were driven by abiotic confounders, we repeated the taxonomic analysis on a subset of 192 EMP samples for which measurements of four important abiotic drivers of diversity (7, 8, 15, 41), namely temperature, pH, latitude, and elevation, were available. We fitted a GLMM with diversification rate as the dependent variable, and with the number of non-focal lineages, the four abiotic factors and their interactions as predictors (fixed effects). As in the full dataset (Table 1), diversification was positively associated with diversity at all taxonomic ratios (Table S2). As expected, certain abiotic factors, alone or in combination with diversity, had significant effects on diversification. However, the effects of abiotic factors were always weaker than the effect of community diversity (Table S2; Supplementary Data file 1 Section 2). Although only a small subset of abiotic factors was considered, this analysis suggests that the DBD trend is unlikely to be mainly driven by variation in the abiotic environment.
The DBD hypothesis rests on the premise that species interactions drive diversification (15, 18). We therefore expect that lineages that are more tightly associated with a specific biome (i.e. long-term residents) are more likely to have had a long history of interaction and thus are more likely to experience DBD than lineages that are not tightly associated with that biome (i.e. poorly adapted migrants or broadly adapted generalists). To test this prediction, we clustered environmental samples by their genus-level community composition using fuzzy k-means clustering (Figure 3A). Three clusters were identified, which we refer to as ‘animal-associated’, ‘saline,’ and ‘non-saline’ clusters. All three clusters included some unexpected outliers (e.g. plant corpus grouping with animals), but were generally intuitive and consistent with known distinctions between host-associated vs. free-living (32), and saline vs. non-saline communities (42). Resident genera were defined as those with a strong preference for a particular environment cluster, using indicator species analysis (permutation test, P<0.05; Figure 3A; Figure S9; Supplementary Data file 2), and genera without a strong preference were considered generalists. For each environment cluster, we ran a GLMM with resident genus-level diversity (number of non-focal genera) as a predictor of diversification (ASV:Genus ratio) for residents, generalists, or migrants (residents of one cluster found in a different cluster) (Supplementary Data file 1 Section 3). Resident diversity had no significant effect on the diversification of generalists (z=0.646, P=0.518; z=0.279, P=0.780; z=0.347, P=0.729, respectively for animal-associated, saline and non-saline clusters), but did significantly increase resident diversification (z=7.1, P= 1.25e-12; z=3.316, P=0.0009; z=7.109, P=1.17e-12, respectively). 
Resident diversity significantly decreased migrant diversification in saline (z=-3.194, P=0.0014) and non-saline environment clusters (z=-2.840, P=0.0045), but had no significant effect in the animal-associated cluster (z=-0.566, P=0.571) (Figure 3B). These results suggest that diversity begets diversification among lineages sharing the same environment over a long evolutionary time period, but that this is not the case for lineages that do not consistently occur in the same microbiome and presumably interact less frequently. The diversification of migrants in a new environment might even be impeded, presumably because most niches are already occupied by residents.
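The resident/generalist classification relies on indicator species analysis with a permutation test. A minimal sketch of one common formulation, the IndVal statistic (specificity multiplied by fidelity), is given below. The exact statistic and software used for this analysis are not specified in the text, so this choice, the hard cluster labels (in place of fuzzy memberships), and all names are assumptions made for illustration.

```python
import random

def indval(abundance, groups, target):
    """IndVal-style statistic for one genus and one target cluster.

    abundance: per-sample abundance of the genus.
    groups:    per-sample cluster label (hard assignment, an assumption).
    Returns specificity * fidelity, a value between 0 and 1.
    """
    mean_by_group = {}
    for label in set(groups):
        vals = [a for a, g in zip(abundance, groups) if g == label]
        mean_by_group[label] = sum(vals) / len(vals)
    total = sum(mean_by_group.values())
    if total == 0:
        return 0.0
    # Specificity: share of the genus's mean abundance found in the target.
    specificity = mean_by_group[target] / total
    # Fidelity: fraction of target-cluster samples where the genus occurs.
    in_target = [a for a, g in zip(abundance, groups) if g == target]
    fidelity = sum(1 for a in in_target if a > 0) / len(in_target)
    return specificity * fidelity

def indval_pvalue(abundance, groups, target, n_perm=999, seed=0):
    """Permutation test: shuffle cluster labels and recompute the statistic."""
    rng = random.Random(seed)
    observed = indval(abundance, groups, target)
    shuffled = list(groups)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if indval(abundance, shuffled, target) >= observed:
            hits += 1
    return observed, (hits + 1) / (n_perm + 1)

# Toy genus (hypothetical data) found only in animal-associated samples.
abund = [5, 4, 6, 0, 0, 0]
labels = ["animal"] * 3 + ["saline"] * 3
stat, p = indval_pvalue(abund, labels, "animal")
```

In this framing, a genus with a high statistic and a permutation P below 0.05 for one cluster would be called a resident of that cluster, and a genus with no significant preference a generalist.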
The positive effect of diversity on diversification should eventually reach a plateau as niches, including those constructed by biotic interactions, become saturated. In the animal distal gut, a relatively low-diversity biome, we observed a strong linear DBD relationship at most sequence identity ratios; in contrast, the more diverse soil biome clearly attained a plateau (Figure S10). To further test the hypothesis that increasingly diverse microbiomes experience weaker DBD due to saturated niche space, we used a GLMM to model Family:Order, Order:Class and Class:Phylum diversification (as DBD slope variation by environment was statistically significant for these taxonomic ratios; Table 1). We included the interaction between diversity and environment type as a fixed effect. Consistent with our hypothesis, DBD slopes were significantly more positive in less diverse (often host-associated) biomes (Figure 4A, Figure S11, Supplementary Data file 1 Section 4). We also found evidence for a DBD plateau in the nucleotide identity analyses, in which cubic models provided slightly better fits than linear models across all biomes (Figure S8).
The Black Queen hypothesis posits that microbes embedded in complex communities can exploit extracellular public goods produced by other species, resulting in selection for loss of genes encoding these goods – as long as the essential trait is not lost from the community as a whole (43). Lineages that interact more frequently with other lineages through such public good exploitation can be expected to experience greater genome reduction and also to experience stronger DBD, because their survival and diversification are dependent on other community members. To test whether genome size is negatively correlated with DBD, we assigned genome sizes to 576 genera for which at least one whole-genome sequence was available and added an interaction term between genome size and diversity as a fixed effect to the GLMM (Methods). Contrary to the Black Queen prediction, we observed a slight but significant positive effect of genome size on the slope (z=2.5, P=0.01; Figure 4B, Supplementary Data file 1 Section 5). The positive relationship may even be stronger than estimated, because genus-level genome size estimates are likely quite noisy. This result supports a model in which biotic interactions (and resulting diversification) drive genome expansion (e.g. through the accumulation of toxin- and resistance-gene diversity during antagonistic coevolution (11)). Alternatively (or additionally), species with larger biosynthetic gene repertoires and greater opportunity to engage in niche construction (19) could be more prone to interact with other species, driving DBD.
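The logic of the genome-size test can be made explicit with a simplified fixed-effects sketch. The notation is introduced only for this illustration: y is the diversification proxy of focal genus ℓ in sample s, x is the number of non-focal genera, G is the genome size assigned to the genus, and g is an unspecified link function. Whether a main effect of genome size was included alongside the interaction is an assumption of this sketch.

```latex
g\!\left(\mathbb{E}[y_{\ell s}]\right)
  = \beta_0 + \beta_1 x_{\ell s} + \beta_2 G_{\ell}
  + \beta_3\, x_{\ell s} G_{\ell} + (\text{random effects}),
\qquad
\frac{\partial\, g\!\left(\mathbb{E}[y_{\ell s}]\right)}{\partial x_{\ell s}}
  = \beta_1 + \beta_3 G_{\ell}.
```

In this form, the DBD slope for a genus is β₁ + β₃G, so the Black Queen prediction corresponds to β₃ < 0 (smaller genomes, steeper slopes), whereas the reported positive effect of genome size on the slope corresponds to β₃ > 0.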
Using 10 million individual marker sequences, we demonstrate a pervasive positive relationship between prokaryotic diversity and diversification, which holds across a broad range of microbiome types. The strength of the DBD relationship declines with increasing microbiome diversity, which might be due to niche saturation, or potentially due to the fact that highly diverse communities prevent species from reliably interacting with each other. DBD appears to be particularly strong among deeply diverged lineages (e.g. phyla), suggesting the importance of DBD in the ancient diversification of bacterial lineages and supporting the view that high taxonomic ranks are ecologically coherent (44, 45). The very early stages of diversification are inaccessible at the resolution of 16S ASVs, but could be addressed in the future using genomic and metagenomic methods. Due to the correlational nature of our data, it is not possible to test whether the positive relationship between diversification and diversity is primarily due to the creation of novel niches via biotic interactions and niche construction (20), or potentially due to increased competition leading to adaptation to underexploited resources (12, 29). Regardless of the underlying mechanisms, our results point to the importance of biotic interactions in shaping microbiome diversity, which has important implications for microbiome function and stability (46, 47). The answer to the question ‘why are microbiomes so diverse?’ (48) might in large part be ‘because microbiomes are so diverse’ (23).
Funding
This project was made possible by an NSERC Discovery Grant and Canada Research Chair to BJS.
Author contributions
Conceptualization: BJS, MV. Data curation: NM. Formal analysis: NM, MV, BJS. Funding acquisition: BJS. Investigation: NM, MV, PL, BJS. Methodology: NM, MV, PL, BJS. Resources: BJS, PL. Supervision: PL, BJS. Software: NM. Visualization: NM. Writing - original draft: NM, MV, BJS. Writing - review & editing: NM, MV, PL, BJS.
Competing interests
None to declare.
Data and materials availability
All data are available from the Earth Microbiome Project (ftp.microbio.me), as detailed in the Methods. All computer code used for analysis is available at https://github.com/Naima16/dbd.git.
Acknowledgements
We thank Luke Thompson for assistance obtaining EMP data and Zofia Ecaterina Taranu, Vincent Fugère and Guillaume Larocque for advice on Generalized Linear Mixed Models. We are also grateful to Steven Kembel and Tom Battin for critical comments that improved the manuscript.