Abstract
For centuries, biologists have been captivated by the vast disparity in species richness between different groups of organisms. Variation in diversity is widely attributed to differences between groups in how fast they speciate or go extinct. Such macroevolutionary rates have been estimated for thousands of groups and have been correlated with an incredible variety of organismal traits. Here we analyze a large collection of phylogenetic trees and fossil time series and report a hidden generality amongst these seemingly idiosyncratic results: speciation and extinction rates follow a scaling law where both depend strongly on the age of the group in which they are measured, with the fastest rates in the youngest clades. Using a series of simulations and sensitivity analyses, we demonstrate that the time-dependency is unlikely to be a result of simple statistical artifacts. As such, this time-scaling has profound implications for the interpretation of rate estimates and suggests there might be general laws governing macroevolutionary dynamics.
Though once controversial, it is now widely accepted that both the traits of organisms and the environments in which they live can influence the pace of evolution of life on Earth (1–3). In particular, there is tremendous variation between groups of organisms in the rate at which species form and go extinct. This variation is reflected both in the wildly uneven diversity of clades in the fossil record and in the imbalanced shape of the tree of life (2–5). Our estimates of speciation and extinction rates vary by orders of magnitude when comparing different clades, locations, or time intervals.
In turn, researchers have suggested a tremendous array of mechanisms that may have accelerated or slowed the accumulation of biodiversity. These mechanisms include aspects of organisms, such as colour polymorphism (6), body size (7), and many others; the environment, including geographic region (8, 9), and temperature (10); and the interactions between the two (11). Taken as a whole, this growing body of work implicates a wide variety of factors that influence speciation and/or extinction rates and suggests that the growth of the Tree of Life has been largely idiosyncratic.
Despite the rapid growth of research on variation in speciation and extinction rates, few synthetic studies have been attempted. As a result, we know little about whether or not there are common factors that predict speciation and extinction rates across diverse taxa (2). One hurdle to synthesis is that studies, especially those using the tree of life, often focus more on relative than absolute diversification rates. That is, studies are focused more on whether speciation or extinction rates are higher in one part of a phylogenetic tree than another, rather than attempting to estimate those rates in absolute terms. This makes comparisons across studies difficult or impossible.
Another compelling reason to gather and compare estimates of speciation and extinction rates across clades is the potential for scale-dependence. In all other instances where macroevolutionary rates have been compared, these rates show a pattern of time-dependence, with the fastest rates estimated over the shortest timescales. This time-scale-dependence is apparent in rates of molecular (12) and trait evolution (13, 14), and has even been observed when estimating long-term rates of sedimentation (15). The prevalence of time-scaling of rate estimates in other types of data, and previous hints that there may be a similar pattern in speciation rates (16–20), suggests that a broader comparison is needed. Here we explore the time scaling of diversification rates using both phylogenetic data from the Tree of Life and paleobiological data from the fossil record and find evidence that there are indeed general scaling laws that govern macroevolutionary dynamics.
Results and Discussion
Using a Bayesian approach that allows for heterogeneity across the phylogeny (21), we estimated speciation and extinction rates across 104 previously published, time-calibrated molecular phylogenies of multicellular organisms, which collectively contained 25,864 terminal branches (SI Appendix, Table S1). As many other studies have reported, we found substantial variation across groups in both speciation (Fig. 1A) and extinction (Fig. S5) rates, with speciation rates ranging from 0.02 to 1.54 events per lineage per million years, a difference of nearly two orders of magnitude (16, 22–25). Remarkably, much of this variation in rates can be explained by time alone. We found a strong negative relationship between the mean rates of both speciation and extinction and the age of the most recent common ancestor of a group (regression on speciation rates: β = −0.536; P < 0.001; extinction rates: β = −0.498; P < 0.001; Fig. 2A, B). In general, no matter what the taxonomic identity, ecological characteristics, or biogeographic distribution of the group, younger clades appeared to be both speciating and going extinct faster than older groups.
We also recovered this same scaling pattern using an independent dataset of fossil time series, demonstrating that the time-dependency of diversification rates is a general evolutionary phenomenon and not simply a consequence of using extant-only data. We estimated origination and extinction rates from a curated set of fossil time series consisting of 17 orders of mammals, 22 orders of plants, and 51 orders of marine animals (mostly invertebrates), containing representatives from 6,144 genera (SI Appendix, Table S2), using the widely used per capita method (25). For fossil data we measured time as the duration over which a clade of fossil organisms existed, and analysed the formation and extinction of genera rather than species. Both origination (β = −0.237; P < 0.001; Fig. 2C) and extinction rates of fossil genera (β = −0.271; P < 0.001; Fig. 2D) were highly dependent on duration.
This result has profound consequences for how we measure and interpret rates of diversification. We suspect that many findings of diversification rate differences among groups will need to be reconsidered in light of our findings. The time-dependency of rates implies that the current practice in our field is incorrect: it is misleading to compare rates of speciation and extinction between groups, areas, traits, etc. that are of different ages (26–28). While we do not re-evaluate any previous studies here, we note that many of the regions of the world that are often recognized as hotspots of diversification (at least over the past several million years) are also precisely those regions that harbour young groups of organisms, such as the Páramo of the Andes (29), oceanic islands (30), or polar seas (23). There is a pressing need for novel methods that can fully account for the time-scaling of rates.
Potential causes of this ubiquitous pattern fall into two main categories. First, the underlying process of diversification may be consistent with that assumed by our current models, but our data or methods are biased. Second, the true dynamics of diversification over macroevolutionary timescales may be inadequately captured by the birth-death processes which underlie virtually all of our contemporary methods. Using a series of simulations and sensitivity analyses, we show that statistical biases are unlikely to account for the entirety of our pattern. We present results investigating a variety of potential artifacts, such as bias in the rate estimators, incomplete sampling, error in divergence time estimation, or acquisition bias (known as the “push of the past” in the macroevolutionary literature), all of which have been invoked when this pattern was previously glimpsed by biologists in studies of individual groups (16–18, 20, 31, 32). While all of these may generate a negative relationship between rate and time, none of the artifactual explanations we considered can fully account for the pattern we find in both phylogenetic and fossil time-series data.
We therefore suggest that the time-dependency of rates is indeed a real phenomenon that requires a biological explanation. Perhaps the simplest, and easiest to dismiss, is that there has been a true, secular increase in rates of diversification through time, such that, globally, evolution is faster now than it has ever been in the past. We fit variable rate models to each molecular phylogeny individually but find no evidence of widespread speedups within groups (Fig. 3A). While we do find support for shifts in diversification rates within trees (SI Appendix, Fig. S1), there is no clear temporal trend. This is inconsistent with the idea of a global speedup. And consistent with previous studies (31), we do not see an increase in rates through time for fossil data (Fig. 3B). Furthermore, previous studies have shown that heterogeneity in rates across groups cannot generate the patterns we observe under realistic diversification scenarios (32).
The lack of support for temporal trends within groups also rules out another commonly invoked explanation for the time-dependency of rates: that diversity dynamics are shaped by ecological limits (4, 33). If niche or geographic space is constrained, then diversification should slow down as diversity accumulates; this decoupling of clade age and size would then lead directly to time-dependent rates of diversification (32). In this scenario, younger clades are growing near exponentially while older clades have reached stationarity; the average rates of evolution would therefore appear much faster in younger clades than in older ones (19). This explanation predicts that slowdowns in rates should be ubiquitously observed within clades, which, as stated above, is incompatible with our findings. Furthermore, the signal for a slowdown should be most apparent in older clades, which is also not the case in either phylogenetic or paleobiological data (Fig. 3A, B).
Furthermore, such clade-based explanations depend critically on the premise that higher taxonomic groups (such as families, orders, etc.) are meaningful units for diversification analyses. This may be because taxonomists have actually identified true, independent evolutionary lineages with their own internal dynamics (34), such as those envisioned by models of taxon cycles (35, 36), or simply because taxonomic practice is biased in some yet-unknown way (32). We can evaluate the premise that named clades are special by subsampling our data. Using recently published megaphylogenies of birds (30), ferns (37), and angiosperms (38), we tested whether the slope of the time-dependency of diversification rates differed between named clades and clades descending from randomly chosen nodes of similar age. We found that it did not (Fig. 4), which we take as further evidence against hypotheses based on clade-specific ecological limits.
We favor another explanation: that speciation and extinction events are clustered together in time, with these clusters interspersed among long periods where species neither form nor go extinct. For inspiration, we turn to a result, strikingly similar to our own, from an entirely different field. Sadler (15) demonstrated that estimated rates of sediment deposition and erosion are also negatively correlated with the time interval over which they are estimated (this is now known in geology as the “Sadler effect”). This time dependency likely results from the unevenness of sedimentation: geological history is dominated by long hiatuses with no or negative sedimentation (i.e., erosion) punctuated by brief periods where large amounts of sediment accumulate (39). Under such a scenario, the mean rate tends to decrease the farther back in time one looks, because more and more hiatuses are observed.
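The logic of this argument can be illustrated with a toy simulation. The sketch below is not part of our analyses; all function names and parameter values are hypothetical. It assumes events occur only in instantaneous pulses separated by quiet periods, and that every measurement window starts at a pulse (much as a clade's crown age starts at a speciation event). Under these assumptions the mean estimated rate falls as the window lengthens, even though the long-term rate is constant.

```python
import bisect
import random


def pulsed_event_times(total_time, pulse_rate, events_per_pulse, rng):
    """Sorted event times when events occur only in instantaneous pulses.

    Pulses arrive as a Poisson process with rate `pulse_rate`; each pulse
    contributes `events_per_pulse` simultaneous events (an assumption).
    """
    times = []
    t = rng.expovariate(pulse_rate)
    while t < total_time:
        times.extend([t] * events_per_pulse)
        t += rng.expovariate(pulse_rate)
    return times


def anchored_rate(times, start, window):
    """Events per unit time in the half-open window [start, start + window)."""
    lo = bisect.bisect_left(times, start)
    hi = bisect.bisect_left(times, start + window)
    return (hi - lo) / window


def mean_rate_by_window(windows, total_time=100_000.0, pulse_rate=0.01,
                        events_per_pulse=5, seed=1):
    """Mean anchored rate for each window length, averaged over all pulses."""
    rng = random.Random(seed)
    times = pulsed_event_times(total_time, pulse_rate, events_per_pulse, rng)
    pulses = sorted(set(times))
    result = {}
    for w in windows:
        # Only use windows that fit entirely inside the simulated history.
        starts = [p for p in pulses if p + w <= total_time]
        rates = [anchored_rate(times, s, w) for s in starts]
        result[w] = sum(rates) / len(rates)
    return result


if __name__ == "__main__":
    for w, r in mean_rate_by_window([1.0, 10.0, 100.0, 1000.0]).items():
        print(f"window = {w:7.1f}  mean rate = {r:.3f}")
```

The decline here comes entirely from anchoring each window on a pulse; windows placed at random would recover the long-term mean rate at every timescale.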
We think that Sadler’s rationale could apply to diversification rates as well. There is evidence that extinction events recorded by the fossil record are much more clustered in space and time than we would predict under gradualism (40, 41). Indeed, many of the boundaries of the geological time-scale are defined by large-scale faunal and floral turnover -- the most widely known example of this is undoubtedly the mass extinction event that separates the Cretaceous from the Paleogene. And there is abundant evidence, including from our results, that origination and extinction rates tend to be highly correlated over macroevolutionary time (31, 42). We expect that if extinction events are concentrated in time, speciation rates will be as well.
We are not the first to propose that pulses of diversification may mislead rate estimators built on the assumption that diversity accumulates gradually; paleobiologists have found that short geological intervals tend to show higher rates than longer intervals and have argued that this is caused by the concentration of events at the interval boundaries (40, 41). And as we noted above, some phylogenetic analyses have uncovered similar patterns (16–20). However, explanations favored by previous work are idiosyncratic to the particulars of either phylogenetic or paleontological analysis, and do not explain the entirety of our observation of apparent time-dependency found across both phylogenetic trees and fossils.
Some researchers have suggested that clustered speciation and extinction events observed in the fossil record may be due to large-scale, and possibly regular, climatic fluctuations (43) or an emergent property of complex ecosystems (44). Perhaps a simpler explanation is that species evolve and diversify over a complex geographic landscape, and successful speciation and persistence seems to require the confluence of multiple factors at the right place and time. Most speciation events likely occur in lineages with limited or fragmented ranges but the resulting species are also highly prone to go extinct before they can leave their mark on macroevolutionary history (45). This verbal model shares much in common with Futuyma’s model for ephemeral divergence (46), which itself can be invoked to explain the long-observed negative time-dependency of rates of phenotypic evolution (13, 14) and could, potentially, explain a similar pattern in rates of molecular evolution (12). This scenario, consistent with our results, would imply that the scaling of rates of sedimentation, phenotypic divergence, molecular evolution, and diversification with time all might share a common cause.
Methods
Phylogenetic collection and data cleaning
We collected and curated 104 time-calibrated species-level phylogenies of multicellular organisms from published articles and from data repositories; these trees were fully bifurcating, ultrametric, and contained more than 7 tips and at least 1% of the clade’s total richness (SI Appendix, Tables S1, S3). Trees were checked to ensure branch lengths were in millions of years (47–50). For each group we compiled from the literature the total number of extant species.
Phylogenetic diversification analysis and regressions
Diversification rates were estimated for each phylogeny using BAMM with default priors and including sampling fraction (51). From the posterior distribution we calculated the mean and variance of speciation and extinction rates across all branches, and the frequency of shifts. Model mixing and convergence were assessed by examining the effective sample size in coda (52). In order to calculate the k parameter (see SI Appendix for more detail), we re-ran BAMM on each phylogenetic dataset and permitted only a single diversification regime per tree. We fit linear models on a log-log scale relating the mean and the variance of the rates to crown age (estimated from the tree heights). We observed qualitatively identical results when fitting the same linear regressions using the Maximum Likelihood Estimate (MLE) rates for the DR statistic (30).
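As a minimal sketch of the regression step (not the code used for the analyses), an ordinary least-squares fit on a log-log scale can be written as below. The function name and the example values are hypothetical; the slope corresponds to β in a relationship of the form rate ∝ age^β.

```python
import math


def loglog_slope(ages, rates):
    """Ordinary least-squares fit of log(rate) on log(age).

    Returns (slope, intercept). All ages and rates must be positive.
    """
    x = [math.log(a) for a in ages]
    y = [math.log(r) for r in rates]
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


if __name__ == "__main__":
    # Hypothetical crown ages (My) and rates that follow rate = 2 * age^-0.5.
    ages = [5.0, 20.0, 80.0, 150.0]
    rates = [2.0 * a ** -0.5 for a in ages]
    print(loglog_slope(ages, rates))
```

This sketch returns only the point estimates; it does not compute the P values reported with the slopes.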
Phylogenetic simulations
In order to explore the effect of the ‘push of the past’ we simulated trees with at least 6 taxa and the same ages as the trees in our empirical dataset; we used the diversification parameters from the oldest groups (> 150 My), where the curve between rates and age levels off. We repeated this procedure by setting μ = 0.5λ to acknowledge the difficulty in estimating extinction rates and their influence on macroevolutionary dynamics. We repeated both procedures 1000 times.
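The conditioning that produces the ‘push of the past’ can be sketched with a much simpler simulation than the one described above. The example below is an illustration under stated assumptions, not our simulation code: it tracks only lineage counts rather than full trees, starts each clade with two crown lineages, keeps only clades that reach a minimum size, and uses the simple estimator (ln N − ln 2) / age instead of a likelihood-based method. All names and parameter values are hypothetical.

```python
import math
import random


def simulate_crown_clade_size(age, lam, mu, rng):
    """Number of lineages alive after `age` time units, starting from two.

    Constant-rate birth-death process: each lineage splits at rate `lam`
    and goes extinct at rate `mu`. Returns 0 if the clade dies out.
    """
    n, t = 2, 0.0
    while n > 0:
        t += rng.expovariate(n * (lam + mu))
        if t >= age:
            break
        n += 1 if rng.random() < lam / (lam + mu) else -1
    return n


def mean_naive_rate(age, lam, mu, min_tips=6, n_clades=200, seed=0):
    """Mean of (ln N - ln 2) / age over clades that reach `min_tips` tips."""
    rng = random.Random(seed)
    estimates = []
    while len(estimates) < n_clades:
        n = simulate_crown_clade_size(age, lam, mu, rng)
        if n >= min_tips:  # smaller or extinct clades are never observed
            estimates.append((math.log(n) - math.log(2)) / age)
    return sum(estimates) / len(estimates)


if __name__ == "__main__":
    # Same true rates for every clade; only the clade age differs.
    for age in (5.0, 20.0, 50.0):
        print(age, round(mean_naive_rate(age, lam=0.1, mu=0.05), 3))
```

Because young clades can only pass the size filter by diversifying unusually fast, their estimated rates are inflated relative to older clades simulated under identical parameters.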
To evaluate whether the time-dependency was a consequence of using named higher taxonomic groups, we developed a novel algorithm for randomly sampling nodes from a tree given the age distribution of a set of named nodes and a tolerance. Using this, we were able to compare a temporally equivalent set of unnamed random clades with our empirical tree results. We computed the mean MLE for diversification rates of named clades (across several ranks) from megaphylogenies and compared it to that of random clades with equivalent ages; we repeated the latter process 1000 times.
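One way such age-matched sampling could work is sketched below. This is a hypothetical illustration, not the algorithm we implemented: it assumes node ages are available as a dictionary, treats the tolerance as a proportion of the named node's age, samples without replacement, and ignores whether sampled clades are nested within one another.

```python
import random


def sample_age_matched_nodes(node_ages, named_nodes, tolerance, rng):
    """Pair each named node with a random unnamed node of similar age.

    node_ages   : dict mapping node id -> node age
    named_nodes : list of node ids that correspond to named clades
    tolerance   : maximum proportional age difference (e.g. 0.1 for 10%)
    Returns a dict mapping named node -> sampled unnamed node. Named nodes
    with no unnamed node inside the tolerance are left out.
    """
    named = set(named_nodes)
    available = {n: a for n, a in node_ages.items() if n not in named}
    matched = {}
    for node in named_nodes:
        target = node_ages[node]
        candidates = sorted(
            n for n, a in available.items()
            if abs(a - target) <= tolerance * target
        )
        if not candidates:
            continue
        choice = rng.choice(candidates)
        matched[node] = choice
        del available[choice]  # sample without replacement
    return matched


if __name__ == "__main__":
    # Hypothetical node ages in My; "A" and "B" stand in for named clades.
    ages = {"A": 10.0, "B": 50.0, "u1": 9.5, "u2": 11.0, "u3": 48.0, "u4": 80.0}
    print(sample_age_matched_nodes(ages, ["A", "B"], 0.1, random.Random(0)))
```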
The reconstructed trees usually represent only a proportion of a group’s entire diversity, making diversification analysis sensitive to this sampling fraction; for our purposes, this is especially relevant as older groups tend to be more sparsely sampled than younger ones. Even though previous studies have shown that birth-death estimations are consistent when the sampling fraction is provided (53, 54), we wanted to rule out its effect. We simulated a tree corresponding to each tree in the dataset based on empirical ages, mean rates from phylogenies older than 150 My, and sampling fractions using the TreeSim R package (55). We re-estimated the MLE using diversitree (56), repeating this procedure 1000 times.
Errors in the ages of young clades can lead to overestimated diversification rates, producing a negative relationship between rates and time. To test this, we repeated the simulations as in the ‘push of the past’ section, but this time we added an error to clade ages. For each age, we drew a percentage error from a uniform distribution that modified the branch lengths (by addition or subtraction). Simulations were carried out with maximum error values from 10% to 90%. In all the simulated cases mentioned, we estimated the slope of the log(speciation rate) ~ log(crown age) regression resulting from each one of the trees, and then we compared these to our empirically estimated slope.
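The sketch below illustrates why symmetric error in ages inflates rates on average. It is not our simulation code, and it assumes one particular reading of the error model: each age is multiplied by (1 + e), with e drawn uniformly from [−m, m], where m is the maximum proportional error. Because a rate estimate scales as 1 / age, the expected inflation factor is E[1 / (1 + e)], which exceeds 1 for any m > 0. The function names are hypothetical.

```python
import math
import random


def perturb_age(age, max_error, rng):
    """Age after adding or subtracting a uniform proportional error."""
    return age * (1.0 + rng.uniform(-max_error, max_error))


def mean_rate_inflation(max_error, n_draws=100_000, seed=0):
    """Monte Carlo estimate of E[true_age / perturbed_age].

    A rate estimate proportional to 1 / age is multiplied by this factor
    on average when ages carry the error above. Requires max_error < 1.
    """
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_draws):
        total += 1.0 / perturb_age(1.0, max_error, rng)
    return total / n_draws


def exact_rate_inflation(max_error):
    """Closed form of E[1 / (1 + e)] for e ~ Uniform(-m, m), 0 < m < 1."""
    m = max_error
    return math.log((1.0 + m) / (1.0 - m)) / (2.0 * m)


if __name__ == "__main__":
    for m in (0.1, 0.5, 0.9):
        print(m, round(mean_rate_inflation(m), 3), round(exact_rate_inflation(m), 3))
```

The inflation factor does not depend on age under this error model, so whether dating error produces a negative rate-age slope depends on how error itself varies with clade age.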
Fossil collection and data cleaning
We collected and curated information on the first and last appearance of each genus for 39 orders of mammals (17 orders) and plants (22 orders) from online databases. Each order included in our analyses consisted of at least 10 genera, each genus had at least two occurrences, and only occurrences assigned to a unique geological stage were considered (57). We also included 51 orders, each with a minimum of 10 genera, from Sepkoski’s marine animal compendium. In total, our data consist of 6,144 genera distributed across 90 orders that were analysed individually (SI Appendix, Table S2).
Fossil diversification analysis and regressions
Diversification rates were estimated using Foote’s per-capita method (25), which uses the first and last occurrence of the genera to estimate the rates of origination (i.e., rate of appearance of new genera) and extinction for each geological stage. We divided the diversification rate (i.e., origination minus extinction) by the duration of each interval, then estimated the average diversification rate for the orders. We also estimated the correlation between the rate estimates and the time-bin ordination. We fit linear models on log-log scale between average diversification rate and clade duration.
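For readers unfamiliar with the per-capita method, a minimal sketch for a single stage is given below. It is an illustration rather than our analysis code. It assumes stages are numbered from oldest to youngest and that each genus is given as a (first stage, last stage) pair; the function name and example ranges are hypothetical. Genera confined to one stage do not enter the calculation.

```python
import math


def per_capita_rates(ranges, stage, duration):
    """Per-capita origination and extinction rates for one stage.

    ranges   : list of (first_stage, last_stage) index pairs, one per genus
    stage    : index of the focal stage (stages ordered oldest to youngest)
    duration : length of the focal stage in My
    Returns (origination, extinction), or (None, None) when no genus
    spans the whole stage and the rates are undefined.
    """
    # Genera that cross both the bottom and the top boundary of the stage.
    n_bt = sum(1 for first, last in ranges if first < stage < last)
    # Genera that cross the bottom boundary and last appear in the stage.
    n_bl = sum(1 for first, last in ranges if first < stage == last)
    # Genera that first appear in the stage and cross the top boundary.
    n_ft = sum(1 for first, last in ranges if first == stage < last)
    if n_bt == 0:
        return None, None
    origination = -math.log(n_bt / (n_bt + n_ft)) / duration
    extinction = -math.log(n_bt / (n_bt + n_bl)) / duration
    return origination, extinction


if __name__ == "__main__":
    # Hypothetical genus ranges across three stages (0, 1, 2).
    ranges = [(0, 2), (0, 2), (0, 1), (1, 2), (1, 1)]
    print(per_capita_rates(ranges, stage=1, duration=2.0))
```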
Code and data availability
All scripts for data curation and analysis are available on GitHub at https://github.com/mwpennell/macro-sadler. Since we used only previously published (and publicly available) phylogenetic and paleobiological data for this analysis, we will not re-publish the curated datasets; however, the final, curated datasets are available upon request (pennell{at}zoology.ubc.ca).
ACKNOWLEDGMENTS
We thank D. Schluter, S. Magallon, J. Davies, K. Kaur, B. Neto-Bradley, F. Mazel, J. Rolland, and J. Uyeda for comments on this work. These analyses were made possible by Compute Canada. This work was funded by an NSERC Discovery Grant to MWP and an NSF Grant DEB #1208912 to LJH.