Abstract
Given a phylogenetic tree of both extant and extinct taxa in which the fossil ages are the only temporal information (namely, in which divergence times are considered unknown), we provide a method to compute the probability distribution of any divergence time of the tree with regard to any speciation (cladogenesis), extinction and fossilization rates under the Fossilized-Birth-Death model.
We then use this method to obtain a probability distribution for the age of Amniota (the synapsid/sauropsid or bird/mammal divergence), one of the most frequently used dating constraints.
1 Introduction
Dating the Tree of Life (TOL from here on) has become a major task for systematics because timetrees are increasingly used for a great diversity of analyses, ranging from comparative biology (e.g., Felsenstein 1985, 2012) to conservation biology (Faith 1992), and including deciphering patterns of changes in biodiversity through time (e.g., Feng et al. 2017). Dating the TOL has traditionally been a paleontological enterprise (Romer 1966), but since the advent of molecular dating (Zuckerkandl and Pauling 1965), it has come to be dominated by molecular biology (e.g., Kumar and Hedges 2011). However, recent developments have highlighted the vital role that paleontological data must continue to play in this enterprise to get reliable results (Heath et al. 2014; Zhang et al. 2016; Cau 2017; Guindon 2018).
Most of the recent efforts to date the TOL have used the now-venerable (though still very useful) node dating method (e.g., Shen et al. 2016), even though several recent studies point out limitations of this approach (e.g., van Tuinen and Torres 2015), notably because uncertainties about the taxonomic affinities of fossils have rarely been fully incorporated (e.g., Sterli et al. 2013), incompleteness of the fossil record is difficult to take into consideration (Strauss and Sadler 1989; Marshall 1994; Marjanović and Laurin 2008; Nowak et al. 2013), or simply because it is more difficult than previously realized to properly incorporate paleontological data into the proper probabilistic framework required for node dating (Warnock et al. 2015). More recently, other methods that incorporate more paleontological data have been developed. Among these, tip (or “total evidence”) dating (Pyron 2011; Ronquist et al. 2012a,b) has the advantage of using phenotypic data from the fossil record (and molecular data, for the most recent fossils) to estimate the position of fossils in the tree and thus does not require fixing this parameter a priori; comparisons between tip and node dating are enlightening (e.g., Sharma and Giribet 2014). Another recently developed approach is the “fossilized birth-death process” (Stadler 2010), which likewise can incorporate much paleontological data, but using birth-death processes (in addition to the phenotypic or molecular data to place fossils into a tree). It was initially used to estimate cladogenesis, extinction and fossilization rates (Stadler 2010; Didier et al. 2012, 2017), but it can also be used to estimate divergence times (Heath et al. 2014).
Development of such methods is timely because much progress has been made in the last decades in understanding the phylogeny of extinct taxa, as shown by the growing number of papers that included relevant phylogenetic analyses among vertebrates (e.g., Romano and Nicosia 2015; Brocklehurst 2017) and other taxa (e.g., Bardin et al. 2017).
This contribution develops a new method to date the TOL using the fossilized birth-death process. This method relies on estimating parameters of the fossilized birth-death process through exact computations and using these data to estimate a probability distribution of node ages. This method currently requires input trees, though it could be developed to include estimation of these trees. These trees include fossils as terminal taxa or on internal branches, and each occurrence of each taxon in the fossil record must be dated. In this implementation, we use a flat probability distribution of fossil ages between two bounds, but other schemes could easily be implemented. Our new method thus shares many similarities with what we recently presented (Didier et al. 2017), but it is aimed at estimating nodal ages rather than rates of cladogenesis, extinction, and fossilization. Namely, in Didier et al. (2017), we proposed a method for computing the probability density of a phylogenetic tree of extant and extinct taxa in which the fossil ages were the only temporal information. The present work extends this approach in order to provide an exact computation of the probability density of any divergence time of the tree from the same information.
We then use these new developments to estimate the age of the divergence between synapsids and sauropsids, which is the age of Amniota, epitomized by the chicken/human divergence that has been used very frequently as a dating constraint (e.g., Hedges et al. 1996; Hugall et al. 2007). In fact, Müller and Reisz (2005: 1069) even stated that “The origin of amniotes, usually expressed as the ‘mammal-bird split’, has recently become either the only date used for calibration, or the source for ‘secondary’ calibration dates inferred from it.” The recent tendency has fortunately been towards using a greater diversity of dating constraints, but the age of Amniota has remained a popular constraint in more recent studies (e.g., Hugall et al. 2007; Marjanović and Laurin 2007; Shen et al. 2016). However, Müller and Reisz (2005: 1074) argued that the maximal age of Amniota was poorly constrained by the fossil record given the paucity of older, closely related taxa, and that this rendered its use in molecular dating problematic. Thus, we believe that a more sophisticated estimate of the age of this clade, as well as its probability distribution, will be useful to systematists, at least if our estimates provide a reasonably narrow distribution. Such data are timely because molecular dating software can incorporate detailed data about the probability distribution (both its kind and its parameters) of age constraints (Ho and Phillips 2009; Sauquet 2013), the impact of these settings on the resulting molecular age estimates is known to be important (e.g., Warnock et al. 2012), but few such data are typically available, though some progress has been made in this direction recently (e.g., Nowak et al. 2013).
We developed a computer program, written in C, that computes the probability distribution of divergence times from fossil ages and topologies. Its source code is available at https://github.com/gilles-didier/DateFBD.
2 Methods
2.1 Birth-death-fossil-sampling model
We consider here the model introduced in Stadler (2010) and referred to as the Fossilized-Birth-Death (FBD) model in Heath et al. (2014). Namely, speciations (cladogeneses) and extinctions are modelled as a birth-death process of constant rates λ and μ. Fossilization is modelled as a Poisson process of constant rate ψ running on the whole tree resulting from the speciation-extinction process. In other words, each lineage alive at time t leaves a fossil dated at t with rate ψ. Last, each lineage alive at the present time, i.e., each extant taxon, is sampled with probability ρ, independently of any other event. In practice, “sampled” may mean discovered or integrated into the study. Note that, in contrast with our previous works (Didier et al. 2012, 2017), which were based on the FBD model with full sampling of extant taxa (i.e., with ρ = 1), we shall consider uniform sampling of the extant taxa in the present work. In all that follows, we make the usual technical assumptions that λ, μ, ψ and ρ are non-negative, that λ > μ, and that ψ and ρ are not both null.
We shall not deal directly with the whole realizations of the FBD process (Figure 1-left) but rather, with the part of realizations which can be reconstructed from present time and which will be referred to as the (reconstructed) phylogenetic tree with fossils. Indeed, we have no information about the parts of the diversification process leading to extinct taxa that left no fossil record or non-sampled extant or extinct species (Figure 1-center). Let us state more precisely which information is assumed reconstructible. We make the assumption that the starting time of the diversification (i.e., the base of the root branch) and the fossil ages are known (in practice, we specify these as intervals, over which we sample randomly using a flat distribution for fossil ages, and we integrate for the root age) and that we are able to accurately determine the phylogenetic relationships between all the extant and extinct taxa considered, in other words, that we can reconstruct the actual tree topology of the species phylogeny (Figure 1-right). Note that under these assumptions, the only available temporal information in a reconstructed phylogenetic tree with fossils is given by the fossil ages, the starting time of the diversification process and the present time. Namely, all the divergence times of the reconstructed phylogeny are unknown.
Most of our calculations are based on the probabilities of the following basic events, already derived in Stadler (2010) and Didier et al. (2012, 2017) under the FBD model.
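The probability that a single lineage starting at time 0 has n descendants sampled with probability ρ at time t > 0 without leaving any fossil (i.e., neither from itself nor from any of its descendants) dated between 0 and t is given by

$$P(0,t) = \frac{\alpha\,(\beta-(1-\rho)) - \beta\,(\alpha-(1-\rho))\,e^{\omega t}}{\beta-(1-\rho) - (\alpha-(1-\rho))\,e^{\omega t}}$$

and, for all n ≥ 1,

$$P(n,t) = \frac{\rho^{n}\,(\beta-\alpha)^{2}\,e^{\omega t}\,\left(1-e^{\omega t}\right)^{n-1}}{\left[\beta-(1-\rho) - (\alpha-(1-\rho))\,e^{\omega t}\right]^{n+1}},$$

where α < β are the roots of −λx² + (λ + μ + ψ)x − μ = 0, which are always real (if λ is positive) and are equal to

$$\alpha = \frac{\lambda+\mu+\psi-\sqrt{(\lambda+\mu+\psi)^{2}-4\lambda\mu}}{2\lambda}, \qquad \beta = \frac{\lambda+\mu+\psi+\sqrt{(\lambda+\mu+\psi)^{2}-4\lambda\mu}}{2\lambda},$$

and ω = −λ(β − α).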
Let us write Po(t) for the probability that a lineage present at time t is observable, which is the complementary probability that it both has no descendant sampled at the present time T and lacks any fossil find dated after t. Namely, we have that Po(t) = 1 − P(0, T − t), where P(0, T − t) is the probability defined just above for n = 0 and a duration T − t.
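The two basic probabilities above can be evaluated numerically. The following is a minimal sketch, not the authors' C implementation: it assumes the closed forms obtained by solving the generating-function equation of a birth-death process with fossil sampling, with α, β and ω as defined in the text. The function names (`fbd_roots`, `prob_n_no_fossil`, `prob_observable`) and the parameter values used in the usage note are hypothetical.

```python
import math


def fbd_roots(lam, mu, psi):
    """Roots alpha < beta of -lam*x^2 + (lam + mu + psi)*x - mu = 0."""
    s = lam + mu + psi
    disc = math.sqrt(s * s - 4.0 * lam * mu)
    return (s - disc) / (2.0 * lam), (s + disc) / (2.0 * lam)


def prob_n_no_fossil(n, t, lam, mu, psi, rho):
    """Assumed closed form of P(n, t): a lineage has n descendants sampled
    (each with probability rho) after a duration t, with no fossil in between."""
    alpha, beta = fbd_roots(lam, mu, psi)
    omega = -lam * (beta - alpha)
    e = math.exp(omega * t)
    q = 1.0 - rho
    denom = (beta - q) - (alpha - q) * e
    if n == 0:
        return (alpha * (beta - q) - beta * (alpha - q) * e) / denom
    return rho ** n * (beta - alpha) ** 2 * e * (1.0 - e) ** (n - 1) / denom ** (n + 1)


def prob_observable(t, present, lam, mu, psi, rho):
    """Po(t) = 1 - P(0, T - t), with T the present time."""
    return 1.0 - prob_n_no_fossil(0, present - t, lam, mu, psi, rho)
```

Two sanity checks are consistent with this reading: without fossilization and with full sampling (ψ = 0, ρ = 1), the values P(n, t) sum to 1 over n, and for large t the value P(0, t) tends to α, the probability that a lineage goes extinct without leaving any observable trace.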
2.2 Tree topologies
We shall consider only binary, rooted tree topologies since so are those resulting from the FBD process. Moreover, all the tree topologies considered below are labelled in the sense that all their tips are unambiguously identified. From now on, “tree topology” will refer to “labelled-binary-rooted tree topology”.
Let us use the same notations as Didier (2018). For all tree topologies 𝒯, we still write 𝒯 for the set of nodes of 𝒯. We put L𝒯 for the set of tips of 𝒯 and, for all nodes n of 𝒯, 𝒯n for the subtree of 𝒯 rooted at n. For all sets S, |S| denotes the cardinality of S. In particular, |𝒯| denotes the size of the tree topology 𝒯 (i.e., its total number of nodes, internal or tips) and |L𝒯| its number of tips.
2.2.1 Probability
Let us define T(𝒯) as the probability of a tree topology 𝒯 given its number of tips under a lineage-homogeneous process with no extinction, such as the reconstructed birth-death-sampling process.
Theorem 1
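(Harding 1971). Given its number of tips, a tree topology 𝒯 resulting from a pure-birth realization of a lineage-homogeneous process has probability T(𝒯) = 1 if |𝒯| = 1, i.e., if 𝒯 is a single lineage. Otherwise, by putting a and b for the two direct descendants of the root of 𝒯, we have that

$$T(\mathcal{T}) = \frac{2\,|L_{\mathcal{T}_a}|!\,|L_{\mathcal{T}_b}|!}{\left(|L_{\mathcal{T}}|-1\right)\,|L_{\mathcal{T}}|!}\;T(\mathcal{T}_a)\,T(\mathcal{T}_b).$$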
The probability provided in Didier et al. (2017, Supp. Mat., Appendix 2) is actually the same as that just above though it was derived in a different way from Harding (1971) and expressed in a slightly different form (Appendix A).
From Theorem 1, T(𝒯) can be computed in linear time through a post-order traversal of the tree topology 𝒯.
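The post-order computation can be sketched as follows. This is an illustration only, under two assumptions: the recursion of Theorem 1 is taken in its usual form, T(𝒯) = 2 |L𝒯a|! |L𝒯b|! / ((|L𝒯| − 1) |L𝒯|!) × T(𝒯a) T(𝒯b), and a tree topology is encoded as nested pairs whose leaves are tip labels (a hypothetical encoding, not the one used in the authors' software). Exact fractions are used so that small cases can be checked by hand.

```python
from fractions import Fraction
from math import factorial


def topology_probability(tree):
    """Probability T of a labelled binary rooted topology given its number of tips.

    A tip is any non-tuple label; an internal node is a pair (left, right).
    """
    def visit(node):
        # Post-order: children first, then the node itself.
        if not isinstance(node, tuple):
            return Fraction(1), 1  # a single lineage has probability 1
        p_left, n_left = visit(node[0])
        p_right, n_right = visit(node[1])
        n = n_left + n_right
        split = Fraction(2 * factorial(n_left) * factorial(n_right),
                         (n - 1) * factorial(n))
        return split * p_left * p_right, n

    return visit(tree)[0]
```

With four tips, this sketch gives 1/9 for the balanced topology `(("a", "b"), ("c", "d"))` and 1/18 for the caterpillar `((("a", "b"), "c"), "d")`; the 3 balanced and 12 caterpillar labelled topologies then sum to 1, as expected.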
2.2.2 Start-sets
Following Didier (2018), a start-set of a tree topology 𝒯 is a possibly empty subset A of internal nodes of 𝒯 which verifies that if an internal node of 𝒯 belongs to A then so do all its ancestors.
Given a tree topology 𝒯 and a non-empty start-set A, the start-tree is defined as the subtree topology of 𝒯 made of all nodes in A and their direct descendants. By convention, the start-tree associated with the empty start-set is the subtree topology made only of the root of 𝒯.
For all internal nodes n of the tree topology 𝒯, we also consider the set of all the start-sets of 𝒯 that contain n.
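To make the definition concrete, the start-sets of a small topology can be enumerated directly. The sketch below assumes a hypothetical encoding in which `children` maps each internal node to its two direct descendants (tips do not appear as keys). It uses the fact that a non-empty start-set contains the root and is completed independently in each of the two subtrees. It is a brute-force illustration and not the counting and sampling algorithms of Didier (2018).

```python
from itertools import product


def start_sets_from(children, node):
    """Start-sets of the subtree rooted at `node` that contain `node`."""
    if node not in children:  # a tip cannot belong to a start-set
        return []
    left, right = children[node]
    left_options = [frozenset()] + start_sets_from(children, left)
    right_options = [frozenset()] + start_sets_from(children, right)
    return [frozenset({node}) | a | b
            for a, b in product(left_options, right_options)]


def start_sets(children, root):
    """All start-sets of the topology, including the empty one."""
    return [frozenset()] + start_sets_from(children, root)


def start_sets_containing(children, root, node):
    """Start-sets of the topology that contain the internal node `node`."""
    return [s for s in start_sets(children, root) if node in s]


# Balanced four-tip topology: r -> (x, y), x -> (a, b), y -> (c, d).
example = {"r": ("x", "y"), "x": ("a", "b"), "y": ("c", "d")}
```

For the example topology, `start_sets(example, "r")` contains five sets: the empty set, {r}, {r, x}, {r, y} and {r, x, y}. The number of start-sets can grow exponentially with the size of the tree, which is why exhaustive enumeration is only suitable for illustration.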
2.3 Patterns
As in Didier et al. (2017), our computations are based on three types of subparts of the FBD process which all start with a single lineage and end with three different configurations, namely
patterns of type a end at the present time T with n ≥ 0 sampled lineages,
patterns of type b end at a time e < T with a fossil find dated at e and n − 1 ≥ 0 lineages observable at e,
patterns of type c end at a time e < T with n > 0 lineages observable at e (the case where n = 0 is not required in the calculus).
The probability density of a pattern of any type is obtained by multiplying the probability density of its ending configuration, which includes its number of tips, by the probability of its tree topology given its number of tips provided by Theorem 1.
In this section, we provide equations giving the probability density of ending configurations of patterns of types a, b and c, first with regard to a given “punctual” starting time and, second, by integrating this probability density uniformly over an interval of possible starting times in order to take into account the uncertainty associated with the timing of the beginning of the diversification process.
2.3.1 Patterns of type a
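The probability density of the ending configuration of a pattern of type a is that of observing n ≥ 0 lineages sampled at time T by starting with a single lineage at time s without observing any fossil between s and T. We have that this probability density is P(n, T − s), i.e., the basic probability of Section 2.1 evaluated for a duration T − s.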
Integrating this probability density over all starting times s between a and b, we get that
2.3.2 Patterns of type b
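The probability density of the ending configuration of a pattern of type b is that of observing a fossil find dated at e with n − 1 ≥ 0 other lineages observable at time e by starting with a single lineage at time s without observing any fossil between s and e. From Didier et al. (2017), we have that this probability density is

$$\frac{\psi\,n\,P_o(e)^{n-1}\,(\beta-\alpha)^{2}\,e^{\omega(e-s)}\,\left(1-e^{\omega(e-s)}\right)^{n-1}}{\left[\beta-(1-P_o(e)) - (\alpha-(1-P_o(e)))\,e^{\omega(e-s)}\right]^{n+1}}.$$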
Integrating this probability density over all starting times s between a and b, we get that
2.3.3 Patterns of type c
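The probability density of the ending configuration of a pattern of type c is that of getting n > 0 lineages observable at time e by starting with a single lineage at time s without observing any fossil between s and e. From Didier et al. (2017), we have that this probability density is

$$\frac{P_o(e)^{n}\,(\beta-\alpha)^{2}\,e^{\omega(e-s)}\,\left(1-e^{\omega(e-s)}\right)^{n-1}}{\left[\beta-(1-P_o(e)) - (\alpha-(1-P_o(e)))\,e^{\omega(e-s)}\right]^{n+1}}.$$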
Integrating this probability density over all starting times s between a and b, we get that
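The ending-configuration densities of the three pattern types can be sketched numerically as follows. This illustration rests on assumptions that should be checked against Didier et al. (2017): type a is taken as P(n, T − s); type c as the same expression over the duration e − s with the sampling probability ρ replaced by Po(e); and type b as ψ n / Po(e) times the type c density, which is what the generating function of the process gives. The integration over starting times uses a plain midpoint rule as a stand-in for the explicit formulas of the text, and all function names are hypothetical.

```python
import math


def p_n(n, t, lam, mu, psi, rho):
    """Assumed closed form of P(n, t) (see Section 2.1)."""
    s = lam + mu + psi
    disc = math.sqrt(s * s - 4.0 * lam * mu)
    alpha, beta = (s - disc) / (2.0 * lam), (s + disc) / (2.0 * lam)
    e = math.exp(-lam * (beta - alpha) * t)
    q = 1.0 - rho
    denom = (beta - q) - (alpha - q) * e
    if n == 0:
        return (alpha * (beta - q) - beta * (alpha - q) * e) / denom
    return rho ** n * (beta - alpha) ** 2 * e * (1.0 - e) ** (n - 1) / denom ** (n + 1)


def p_obs(t, present, lam, mu, psi, rho):
    """Po(t) = 1 - P(0, T - t)."""
    return 1.0 - p_n(0, present - t, lam, mu, psi, rho)


def type_a(n, s, present, lam, mu, psi, rho):
    """n >= 0 lineages sampled at the present time, starting at s."""
    return p_n(n, present - s, lam, mu, psi, rho)


def type_c(n, s, e, present, lam, mu, psi, rho):
    """n > 0 lineages observable at e: 'sampling' probability is Po(e)."""
    return p_n(n, e - s, lam, mu, psi, p_obs(e, present, lam, mu, psi, rho))


def type_b(n, s, e, present, lam, mu, psi, rho):
    """Fossil find at e plus n - 1 other lineages observable at e (assumed form)."""
    po = p_obs(e, present, lam, mu, psi, rho)
    return psi * n * type_c(n, s, e, present, lam, mu, psi, rho) / po


def integrate_over_start(density, a, b, steps=2000):
    """Midpoint-rule integral of density(s) for s in [a, b]."""
    h = (b - a) / steps
    return h * sum(density(a + (i + 0.5) * h) for i in range(steps))
```

For instance, `integrate_over_start(lambda s: type_b(2, s, -4.0, 0.0, 0.2, 0.1, 0.05, 0.5), -12.0, -8.0)` integrates a type b density over starting times between −12 and −8 (times are negative ages here, an arbitrary convention of this sketch). Dividing the result by b − a would give the average under a uniform prior on the starting time.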
2.4 Basic trees
Following Didier et al. (2017), we split a reconstructed realization of the FBD process, i.e. a phylogenetic tree with fossils, by cutting the corresponding phylogenetic tree at each fossil find. The trees resulting from this splitting will be referred to as basic trees (Figure 3). If there are k fossils, this decomposition yields k + 1 basic trees. By construction, basic trees (i) start with a single lineage either at the beginning of the diversification process or at a fossil age, (ii) contain no internal fossil finds and (iii) are such that all their tip-branches either terminate with a fossil find or at the present time. Note that a basic tree may be unobservable (Figure 3). Since tips of basic trees are either fossil finds or extant taxa, they are unambiguously labelled. The set of basic trees of a phylogenetic tree with fossils is a partition of its phylogenetic tree in the sense that all its nodes belong to one and only one basic tree.
The interest of the decomposition into basic trees lies in the fact that a fossil find dated at a time t ensures that the fossilized lineage was present and alive at t. Since the FBD process is Markov, the evolution of this lineage posterior to t is independent of the rest of the evolution, conditionally on the fact that this lineage was present at t. It follows that the probability density of a reconstructed realization of the FBD process is the product of the probability densities of all its basic trees (Didier et al. 2017).
Remark that a basic tree is fully represented by a 3-tuple (𝒯, s, v), where 𝒯 is its topology, s is its starting time and v is the vector of its tip ages. Namely, for all tips ℓ ∈ L𝒯, vℓ is the age of ℓ. We assume that all fossil ages are strictly anterior to the present time. Under the FBD model, the probability that two fossils are dated at the exact same time is zero. In other words, if vℓ < T and vℓ′ < T with ℓ ≠ ℓ′, then vℓ ≠ vℓ′ almost surely. For all subsets 𝒮 ⊆ L𝒯, we put v[𝒮] for the vector made of the entries of v corresponding to the elements of 𝒮.
2.5 Probability distribution of divergence times
Let us consider a phylogenetic tree with fossils 𝓗 in which the only temporal information is the diversification starting time and the fossil ages, and a possibly empty set of time constraints 𝒞 given as pairs {(n1, t1), …, (nk, tk)} where n1, …, nk are internal nodes of the phylogenetic tree (each one occurring in a single pair of the set) and t1, …, tk are times. For any subset of nodes 𝒮, we write 𝒞[𝒮] for the set of the time constraints of 𝒞 involving nodes in 𝒮, namely 𝒞[𝒮] = {(nj, tj) ∈ 𝒞 | nj ∈ 𝒮}.
We shall see here how to compute the joint probability density of 𝓗 and the events τn1 ≤ t1, …, τnk ≤ tk, where τn denotes the divergence time of the node n. Computing the joint probability density of 𝓗 and the events τn1 ≥ t1, …, τnk ≥ tk is symmetrical.
Note that the probability distribution of the divergence time associated to a node n at any time t is given as the ratio between the joint probability density of 𝓗 and the event τn ≤ t to the probability density of 𝓗 (with no time constraint). Though calculating the distribution of the divergence time of n requires only the probability densities of 𝓗 and that of 𝓗 and τn ≤ t (i.e., 𝓗 without time constraint and with a single time constraint), we present here the more general computation of the joint probability density of a phylogenetic tree with fossils and an arbitrary number of time constraints since it is not significantly more complicated to write.
From the same argument as in the section above, the joint probability density of 𝓗 and the events τn1 ≤ t1, …, τnk ≤ tk is the product of the probability densities of the basic trees resulting from the decomposition of 𝓗 with the corresponding time constraints.
Namely, by putting (𝒯1, s1, v1), …, (𝒯m, sm, vm) for the basic trees of 𝓗, this joint probability density is the product, over all i between 1 and m, of the probability density of the basic tree (𝒯i, si, vi) with time constraints 𝒞[𝒯i], in other words, of the joint probability density of the basic tree (𝒯i, si, vi) and the events τn ≤ t for all pairs (n, t) in 𝒞[𝒯i].
In order to compute the probability density of a basic tree (𝒯, s, v) with a set of time constraints 𝒞, let us define its oldest age z as z = min{mini vi, minj tj} and its set of anterior nodes X as the union of the nodes nj such that tj = z and, if there exists a tip c such that vc = z, of the direct ancestor of c (under the FBD model, if such a tip exists, it is almost surely unique).
Theorem 2
Let (𝒯, s, v) be a basic tree, T the present time, 𝒞 a possibly empty set of time constraints, z the corresponding oldest age and X the set of anterior nodes. By setting , the probability density of the basic tree (𝒯, s, v) with time constraints 𝒞 is
Proof
Let us first remark that if z = T, then the time constraints are always satisfied: even if 𝒞 is not empty, z = T implies that all the times tj are posterior or equal to T, thus we always have τnj ≤ tj. Moreover, the fact that z = T implies that the basic tree (𝒯, s, v) contains no fossil. By construction, it is thus a pattern of type a, and its probability density is the product of T(𝒯) and of the probability density of the ending configuration of a pattern of type a starting at s with |L𝒯| sampled lineages.
Let us now assume that z < T (i.e., the subtree includes a fossil or a time constraint) and that there is a tip c associated to a fossil find dated at z (i.e., such that vc = z). Let us write the probability density as the product of the probability densities of the part of the diversification process which occurs before z and of that which occurs after z. Delineating these two parts requires determining the relative time positions of all the divergences with regard to z. Let us remark that some of the divergence times are constrained by the givens of the problem. In particular, all the divergence times of the ancestral nodes of c are necessarily anterior to z and so are the divergence times of the nodes nj (and their ancestors) involved in a time constraint such that tj = z. Conversely, the divergence times of all the other nodes may be anterior or posterior to z. We thus have to consider all the sets of nodes anterior to z consistent with the basic tree and its time constraints. From the definition of the set X of anterior nodes, the set of all these sets of nodes is exactly the set of the start-sets of 𝒯 that contain all the nodes of X. Since all these sets of nodes correspond to mutually exclusive possibilities, the probability density is the sum of the probability densities associated to all of them.
Let us first assume that there exists a fossil find dated at z. Given any set of nodes anterior to z, the part of diversification anterior to z is then the pattern of type b starting at s and ending at z with topology , while the part posterior to z is the set of basic trees starting from time z inside the branches bearing the tips of except c, i.e, , with the corresponding time constraints derived from 𝒞, i.e., 𝒞[𝒯m] for all (Figure 5). From the Markov property, the diversification occurring after z of all lineages crossing z is independent of any other events conditionally on the fact that this lineage was alive at z. The probability density of the corresponding basic trees has to be conditioned on the fact that their starting lineage are observable at z, i.e., it is for all nodes . The probability density of the ending configuration of the pattern of type . We have to be careful while computing the probability of the topology since its tips except c (the one associated to the oldest fossil find) are not directly labelled but only known with regard to the labels of their tip descendants while Theorem 1 provides the probability of a (exactly) labelled topology. In order to get the probability of , we multiply the probability of assuming that all its tips are labelled with the number of ways of connecting the tips of except the one with the fossil (i.e., ) to its pending lineages and the probability of their “labelling”. Since, assuming that all the possible labellings of 𝒯 are equiprobable, the probability of the “labelling” of the lineages pending from is the probability of the tree topology ΛA is eventually i.e., the product of the probability of the labelled topology with the number of ways of connecting the (not fossil) tips to the pending lineages and the probability of their “labelling”.
The case where z < T and where no fossil is dated at z is similar. It differs in the fact that the diversification occurring before z is a pattern of type c (Figure 5) and that the probability of is in this case i.e., the product of the probability of the labelled topology with the number of ways of connecting its tips to the pending lineages (i.e., ) and the probability of their “labelling” (i.e, ).
Theorem 2 allows us to express the probability density of a basic tree with a set of time constraints as a sum-product of probability densities of patterns of type a, b or c and of smaller basic trees, which can themselves be expressed in the same way. Since each basic tree involved in the right part of the equation of Theorem 2 contains at least one fewer time constraint or one fewer fossil find than the one in the left part, this recursive computation eventually ends. We shall not discuss algorithmic complexity issues here but, though the number of possibilities “anterior/posterior to the oldest age” to consider can be exponential, the computation can be factorized in the same way as in Didier (2018) in order to get a polynomial, namely cubic, algorithm.
In the case where the dataset is limited to a period which does not encompass the present (as for the dataset below), some of the lineages may be known to be observable at the end time of the period but data about their fate after this time may not have been entered into the database. As in Didier et al. (2017, Section “Missing Data”), adapting the computation to this case is done by changing the type of all the patterns of type a to type c.
Note that applying the computation with an empty set of time constraints yields the probability density of a phylogenetic tree with fossils (without the divergence times) as provided by Didier et al. (2017). On top of improving the computational complexity of the calculation, which was exponential in the worst case with Didier et al. (2017), the method provided here corrects a mistake in the calculation of Didier et al. (2017), which did not take into account the question of the labelling and the “rewiring” of the tips of internal basic trees, and thus missed the correcting factors in the sum-product giving the probability density that are provided here. Fortunately, this mistake does not much harm the accuracy of the rate estimation (Appendix B). In particular, rates estimated from the Eupelycosauria dataset of Didier et al. (2017) are less than 5% lower, thus essentially the same, with the corrected method.
For all phylogenetic trees with fossils 𝓗, the distribution Fn of the divergence time corresponding to the node n of 𝓗 is defined for all times t by

$$F_n(t) = \frac{\mathbf{P}(\mathcal{H},\ \tau_n \le t)}{\mathbf{P}(\mathcal{H})},$$

i.e., the joint probability density of observing 𝓗 and the divergence of n anterior or equal to t, divided by the probability density of 𝓗 with no constraint. It follows that Fn(t) can be obtained by applying the recursive computation derived from Theorem 2 on 𝓗 twice: once with the time constraint {(n, t)} and once with no time constraint.
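The ratio defining Fn can be illustrated with a short sketch. It assumes that some routine, here the hypothetical callable `joint_density`, returns the joint probability density of the tree and of a set of time constraints given as (node, time) pairs. That routine stands for the recursive computation of Theorem 2 and is not implemented here; the toy function below only serves to exercise the code.

```python
def divergence_cdf(joint_density, node, times):
    """Evaluate F_node(t) for each t as density(H, tau_node <= t) / density(H)."""
    unconstrained = joint_density(frozenset())
    return [joint_density(frozenset({(node, t)})) / unconstrained for t in times]


def toy_joint_density(constraints):
    """Placeholder only: pretends the divergence time is uniform on [-10, 0]."""
    value = 0.2  # arbitrary density of the unconstrained tree
    for _node, t in constraints:
        value *= min(1.0, max(0.0, (t + 10.0) / 10.0))
    return value
```

Calling `divergence_cdf(toy_joint_density, "n", [-10.0, -5.0, 0.0])` returns a non-decreasing list of values between 0 and 1. With a real implementation of the recursion, evaluating the same ratio on a grid of times would give the distribution of the divergence time at the chosen node.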
2.6 Dealing with time uncertainty
The computation presented in the previous section requires all the fossil ages and the origin of the diversification to be provided as punctual times. In a realistic situation, these times are rather given as time intervals corresponding to geological stages for fossil ages or to hypotheses for the origin of the diversification. Actually, we give the possibility for the user to provide only the lower bound for the origin of the diversification (in this case, the upper bound is given by the most ancient fossil age). This time uncertainty is handled in the following way. First, we sample each fossil age uniformly in the corresponding time interval as we did in Didier et al. (2017). Next, thanks to the fact that we have explicit formulas for integrating the probability densities of patterns of type a, b, c with regard to their starting time, we uniformly integrate from the lower bound of the origin to its upper bound if provided or to the oldest sampled fossil age otherwise.
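The sampling step described above can be sketched as follows. This is a minimal illustration under stated assumptions: times are encoded as negative ages in Ma (so that older means smaller), fossil age intervals are given as (older bound, younger bound) pairs, and the names `sample_fossil_times` and `origin_bounds` are hypothetical. Only the Hylonomus lyelli interval comes from the text; the second taxon is a made-up placeholder.

```python
import random


def sample_fossil_times(intervals, rng):
    """Draw one time per fossil, uniformly within its stratigraphic interval."""
    return {taxon: rng.uniform(lo, hi) for taxon, (lo, hi) in intervals.items()}


def origin_bounds(origin_lower, fossil_times, origin_upper=None):
    """Interval over which the origin of the diversification is integrated.

    If no upper bound is provided, the oldest sampled fossil time is used.
    """
    upper = origin_upper if origin_upper is not None else min(fossil_times.values())
    if not origin_lower < upper:
        raise ValueError("the origin lower bound must be older than its upper bound")
    return origin_lower, upper


intervals = {
    "Hylonomus lyelli": (-319.0, -317.0),  # interval given in the text
    "placeholder taxon": (-300.0, -295.0),  # hypothetical
}
```

Each call to `sample_fossil_times(intervals, random.Random(seed))` provides one set of punctual fossil ages on which the computation of the previous section can be run. The interval returned by `origin_bounds(-400.0, sampled_times)` is the one over which the pattern densities would then be integrated with regard to the starting time.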
3 Empirical Example
3.1 Dataset Compilation
Our dataset represents the fossil record of Cotylosauria (Amniota and their sister-group, Diadectomorpha) from their origin (oldest record in the Late Carboniferous) to the end of the Roadian, which is the earliest stage of the Middle Permian. It represents the complete dataset from which the example presented in Didier et al. (2017) was extracted. Because of computation speed issues that we have now overcome, Didier et al. (2017) presented only the data on Eupelycosauria, a subset of Synapsida, which is one of the two main groups of amniotes (along with Sauropsida). Thus, the dataset presented here includes more taxa (109 taxa, instead of 50). We also incorporated the ghost lineages that must have been present into the analysis. However, in many cases, the exact number of ghost lineages that must have been present is unclear because a clade that appears in the fossil record slightly after the end of the studied period (here, after the Roadian) may have been represented, at the end of that period, by a single lineage or by two or more lineages, depending on when its diversification occurred. This concerns, for instance, the smallest varanopid clade that includes Heleosaurus scholtzi and Anningia megalops (four terminal taxa). Our computations consider all possible cases; in this example, the clade may have been represented, at the end of the Roadian, by one to four lineages.
The data matrix used to obtain the trees is a concatenation of the matrix from Benson (2012), the study that included the highest number of early synapsid taxa, and of that of Müller and Reisz (2006) for eureptiles. However, several taxa studied here were not in our concatenated matrix. We specified conservatively their relationships to other taxa using a skeletal constraint in PAUP 4.0 (Swofford 2003), as we reported previously (Didier et al. 2017). The skeletal constraint reflects the phylogeny of Romano and Nicosia (2015) and Romano et al. (2017) for Caseasauria, Spindler et al. (2018) for Varanopidae, Brink et al. (2015) for Sphenacodontidae, and Brocklehurst (2017) for Captorhinidae. Note that our tree incorporates only taxa whose affinities are reasonably well-constrained. Thus, some enigmatic taxa, such as Datheosaurus and Callibrachion, were excluded because the latest study focusing on them only concluded that they were probably basal caseasaurs (Spindler et al. 2016). The recently described Gordodon was placed after Lucas et al. (2018). The search was conducted using the heuristic tree-bisection-reconnection (TBR) search algorithm, with 50 random addition sequence replicates. Zero-length branches were not collapsed because our method requires dichotomic trees. Characters that form morphoclines were ordered, given that simulations and theoretical considerations suggest that this gives better results than not ordering (Rineau et al. 2015), even if minor ordering errors are made (Rineau et al. 2018). The analysis yielded 100 000 equally parsimonious trees; there were no doubt more trees, but because of memory limitation, we had to limit the search at that number of trees. Benson (2012) had likewise found several (more than 15 000 000) equally parsimonious trees, and we had to add additional taxa whose relationships were only partly specified by a constraint, so we logically obtained more trees. 
We performed our analyses on a random sample of 100 equiparsimonious trees (one of these trees is displayed in Figure 6). As in our previous study (Didier et al. 2017), these trees do not necessarily represent the best estimate that could possibly be obtained of early amniote phylogeny if a new matrix were compiled, but that task would represent several years of work (Laurin and Piñeiro 2018) and is beyond the scope of the current study. The analyses performed here can be repeated in the future as our understanding of amniote phylogeny progresses. In our dating analyses, all taxa were considered to represent tips (no fossils were placed on internal branches). It is conceivable that a few of the fossils included in our dataset represent actual ancestors, but the sensitivity analyses carried out by Didier et al. (2017) suggest that this should have a negligible impact on our results. The data are available in the supplementary information.
3.2 Dealing with Fossil, Root Age, and Tree Topology Uncertainty When Estimating Nodal Ages
As in Didier et al. (2017), we used a flat distribution between upper and lower bounds on the estimate of the age of each fossil. For the taxa that were present in the analysis presented in Didier et al. (2017), the boundaries of the range of stratigraphic ages were not modified. However, our new analyses are based on a more inclusive set of taxa.
Our method also requires inserting a prior on the origin of the diversification, i.e., on the starting time of the branch leading to the root (here, Cotylosauria). We set only the lower bound of this origin and assume a flat distribution between this origin and the most ancient (sampled) fossil age (here that of Hylonomus lyelli between 319 and 317 Ma), a fairly basal eureptile sauropsid (Müller and Reisz 2006; Matzke and Irmis 2018). To study the robustness of our estimates to errors in this prior, we repeated the analysis with several origins that encompass the range of plausible time intervals. Recent work suggests that the Joggins Formation, in which the oldest undoubted amniote (Hylonomus) has been recovered (Carroll 1964; Davies et al. 2006; Falcon-Lang et al. 2006), is coeval with the early Langsettian in the Western European sequence, which is about mid-Bashkirian (Carpenter et al. 2015), around 317-319 Ma (Utting et al. 2010; Raine et al. 2015). In fact, recent work suggests more precise dates of between about 318.2 and 318.5 Ma (Utting et al. 2010; Rygel et al. 2015), but we have been more conservative in putting broader limits for the age of this formation, given the uncertainties involved in dating fossiliferous rocks. Thus, we have set the lower bound of the origin of diversification to 330 Ma, 340 Ma, 350 Ma, 400 Ma and 1 000 Ma to assess the sensitivity of our results to the older bound or the width of the interval. The lower bound of 1 000 Ma is of course much older than any plausible value, but its inclusion in our analysis serves as a test of the effect of setting unrealistically old lower bounds for the root age.
We deal with the uncertainty on the phylogenetic tree topology by uniformly sampling among the 100 equiparsimonious trees provided in the dataset. The distributions displayed below were obtained by sampling 10 000 times from the 100 equiparsimonious trees and the fossil age intervals (using a higher number of samples does not change the plots). Each computation for 10 000 samples with our dataset takes a few minutes on a desktop computer.
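The sampling scheme described above can be sketched as follows. This is only a minimal illustration, not our implementation: the function name, the tree placeholders and the second fossil interval are hypothetical, and only the Hylonomus lyelli interval (319 to 317 Ma) comes from the text.

```python
import random


def sample_replicate(trees, fossil_intervals, rng):
    """Draw one tree uniformly, and one age per fossil uniformly in its interval."""
    tree = rng.choice(trees)
    ages = {
        name: rng.uniform(youngest, oldest)
        for name, (youngest, oldest) in fossil_intervals.items()
    }
    return tree, ages


rng = random.Random(0)

# Placeholders standing for the 100 equiparsimonious trees of the dataset.
trees = [f"tree_{i}" for i in range(100)]

# Stratigraphic age bounds in Ma (youngest, oldest).
fossil_intervals = {
    "Hylonomus_lyelli": (317.0, 319.0),      # interval given in the text
    "hypothetical_fossil": (300.0, 305.0),   # made-up interval, for illustration
}

# 10 000 joint draws of a topology and a set of fossil ages.
replicates = [sample_replicate(trees, fossil_intervals, rng) for _ in range(10_000)]
```

Each replicate would then be passed to the divergence time computation, and the resulting densities averaged over replicates.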
3.3 Results
Estimation and influence of the speciation, extinction and fossilization rates
The maximum likelihood (ML) estimates of the speciation, extinction and fossilization rates (we assumed a full sampling of the extant taxa) were obtained on the dataset by dealing with data uncertainty as described just above, as we did in our previous study (Didier et al. 2017). Our estimates for the speciation, extinction and fossilization rates are 1.457135 × 10⁻¹ (standard deviation, SD: 7.068532 × 10⁻³), 1.372005 × 10⁻¹ (SD: 6.655568 × 10⁻³) and 2.301837 × 10⁻² (SD: 1.116616 × 10⁻³), respectively. To assess how the uncertainty on these rate estimates influences the divergence date estimates, we display in Figure 7 the distributions of the estimated age of Amniota obtained from the ML parameter estimates and from the ML parameter estimates plus or minus two SD, in all cases assuming a lower bound of 400 Ma for the interval of origin of diversification. This analysis shows an asymmetric impact: the resulting ages can decrease by about 9 Ma or increase by about 3 Ma around our best estimate of 334 Ma.
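As a worked illustration of the parameter values involved, the snippet below computes the rates at the ML estimates minus and plus two SD. It assumes that the three rates are shifted together in the same direction, which is only one possible reading of the procedure; the names used are hypothetical.

```python
# ML estimates and standard deviations reported in the text: (estimate, SD).
ML_ESTIMATES = {
    "speciation": (1.457135e-1, 7.068532e-3),
    "extinction": (1.372005e-1, 6.655568e-3),
    "fossilization": (2.301837e-2, 1.116616e-3),
}


def shifted_rates(estimates, k):
    """Return each rate shifted by k standard deviations (assumption: joint shift)."""
    return {name: value + k * sd for name, (value, sd) in estimates.items()}


low = shifted_rates(ML_ESTIMATES, -2)   # ML estimates minus two SD
high = shifted_rates(ML_ESTIMATES, +2)  # ML estimates plus two SD

for name in ML_ESTIMATES:
    print(f"{name}: {low[name]:.6f} to {high[name]:.6f}")
```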
3.3.1 Approximating divergence time distribution
Though the exact distributions can be directly computed in a reasonable time, it may be useful to fit them with standard distributions, for instance in order to use them in molecular dating software that does not implement their exact computation. Figure 8 shows that the divergence time distributions with lower bounds 1 000 to 350 Ma can be reasonably approximated by shifted reverse Gamma distributions with shape parameter α, scale parameter θ and location parameter δ, i.e., with density function:
$$f(x) = \frac{(\delta - x)^{\alpha - 1}\, e^{-(\delta - x)/\theta}}{\Gamma(\alpha)\,\theta^{\alpha}} \quad \text{for } x < \delta, \text{ and } f(x) = 0 \text{ otherwise.}$$
Table 1 displays the best fitting parameters of the shifted reverse Gamma distributions plotted in Figure 8.
Note that shifted reverse Gamma distributions do not always approximate divergence time distributions well. This is in particular the case for the distributions obtained from the lower bounds 330 and 340 Ma of the time origin, but also for those associated with several nodes in Figure 6.
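For readers who wish to plug such an approximation into other software, a shifted reverse Gamma density can be evaluated as in the sketch below. It assumes that the distribution is that of δ − G, where G follows a Gamma distribution with shape α and scale θ, and that times before the present are expressed as negative numbers. The parameter values are hypothetical and are not those of Table 1.

```python
import math


def shifted_reverse_gamma_pdf(x, alpha, theta, delta):
    """Density at x of delta - G, with G ~ Gamma(shape=alpha, scale=theta)."""
    y = delta - x
    if y <= 0:
        return 0.0  # no mass at or beyond the location parameter
    log_pdf = (
        (alpha - 1) * math.log(y)
        - y / theta
        - math.lgamma(alpha)
        - alpha * math.log(theta)
    )
    return math.exp(log_pdf)


def integrate(f, a, b, n=20_000):
    """Midpoint-rule approximation of the integral of f over [a, b]."""
    h = (b - a) / n
    return h * sum(f(a + (i + 0.5) * h) for i in range(n))


# Hypothetical parameters; times in Ma before present, written as negative numbers.
alpha, theta, delta = 3.0, 5.0, -318.0

# Sanity check: the density should integrate to about 1 over its support.
mass = integrate(lambda x: shifted_reverse_gamma_pdf(x, alpha, theta, delta), -500.0, delta)

# For alpha > 1, the mode lies (alpha - 1) * theta time units before delta.
mode = delta - (alpha - 1) * theta
```

Working on the log scale with `math.lgamma` avoids overflow for large shape parameters.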
3.3.2 Age estimates
Our results show remarkable robustness to variations in root age prior specification when the lower bound of the origin of diversification is far enough from the most ancient fossil age (Fig. 8). The probability distributions of the age of Amniota obtained with origins 1 000 or 400 Ma are so close to each other that the curves are superimposed along their entire course, so that only one of these two curves is visible. Assuming that the lower bound is 350, 340 or 330 Ma predictably yields slightly narrower distributions, but the peak density is at a barely more recent age (around 333 to 327 Ma). Whatever the time origin, the probability density of the age of Amniota always dwindles from its peak to near 0 before reaching the age of 355 Ma. More importantly, the curves show that when the specified time of origin is at least 350 Ma, the probability density falls to near 0 well before reaching the time of origin, which suggests that the latter does not strongly constrain the result. This is even more obvious when looking at the peak density, which shifts very little (about 1-2 Ma) between times of origin of 340, 350, 400, and 1 000 Ma. All this suggests that Amniota probably appeared approximately in the middle of the Carboniferous, which is fairly congruent with the fossil record.
4 Discussion
Our estimates of the rates of cladogenesis (speciation), extinction and diversification are fairly similar to those obtained for a subset of our data (50 taxa out of the 109 used here) in Didier et al. (2017). These moderate differences are not surprising given that we have expanded the taxonomic sample and made minor modifications to the method.
The method provided here is, to our knowledge, the first one able to compute the exact distributions of divergence times from fossil ages and diversification and fossilization rates. Previous approaches could only sample these distributions by using Markov chain Monte Carlo (MCMC) methods. Our computation is fast, with a time complexity cubic in the size of the phylogenetic tree, and can handle hundreds of taxa. Divergence time distributions obtained from fossil ages through methods such as ours are natural choices to calibrate evolutionary models of molecular data and to be used as priors for phylogenetic inferences.
Our analyses suggest that the fossil record of early amniotes is not as incomplete as previously feared. This is despite the fact that the fossil record of continental vertebrates is relatively poor in the Early Carboniferous (“Romer’s gap”), shortly before the first amniote fossil occurrence (Romer 1956; Coates and Clack 1995; Marjanović and Laurin 2013). Thus, there was a possibility that amniotes had a much older origin and an extended unrecorded early history in Romer’s gap. This had led Müller and Reisz (2005) to argue that the appearance of Amniota was too poorly documented in the fossil record to be useful as a calibration constraint for molecular dating studies. However, our results show that these fears were exaggerated; the fossil record of amniotes appears to start reasonably soon after their origin, with a gap of no more than about 30 Ma separating amniote origins from the first recorded fossil occurrence.
Our results about the age of Amniota should prove useful for a wide range of node-based molecular dating studies that can incorporate this calibration constraint. The main objection by Müller and Reisz (2005: 1074) against use of the age of Amniota in molecular dating (the uncertainty about the maximal age of the taxon) has thus been lifted; we now have a reasonably robust probability distribution that can be used as a prior in node dating. Indeed, our probability distributions for the age of Amniota probably make it the best-documented calibration constraint so far. The probability distributions are fairly well-constrained (with a narrower distribution than many molecular divergence age estimates) and show surprisingly little sensitivity to the maximal root age prior, which is reassuring. The uncertainty of the birth and death model parameters (speciation, extinction, and fossilization rates) also appears to generate fairly narrow distributions of nodal ages. A 95% probability density interval of the age of Amniota, using a 350 Ma maximal root age constraint, encompasses about 20 Ma (or about 30 Ma, if we also take into consideration uncertainties linked to diversification parameter estimates). By comparison, a 95% credibility interval of nodes of similar ages, such as Lissamphibia, in Hugall et al. (2007: table 3) encompasses 38 Ma or 56 Ma, depending on whether these are evaluated using the nucleotide or amino acid dataset. Some nodes in Hugall et al. (2007: table 3) are better constrained, probably because they are closer to a dating constraint. Thus, Hugall et al. report a 95% credibility interval for Tetrapoda that encompasses a range of 24 Ma and 32 Ma, for the nucleotide and amino acid datasets, respectively. This is only marginally broader than our 95% interval, but the widths of the intervals reported in Hugall et al. (2007: table 3) are underestimated because they reflect a point estimate (at 315 Ma) of the age of Amniota, which is used as the single calibration point. Pyron (2011: table 1) reports 95% intervals of 54 Ma for the age of Lissamphibia using total evidence (tip) dating. Similarly, Ronquist et al. (2012a: fig. 9) obtained 95% credibility intervals of about 50 Ma for major clades of Hymenoptera using tip (total evidence) dating, and substantially broader intervals using node dating. Thus, we believe that our estimates are fairly precise, when comparisons are made with molecular estimates that consider a similarly broad range of sources of uncertainty. Our estimates also suggest that the way in which this constraint (Amniota) was used in most molecular dating studies was not optimal. Indeed, most (e.g., Hedges et al. 1996; Hugall et al. 2007) have set a prior for this node centered around 310 to 315 Ma, whereas our analyses suggest that the probability peak is around 330-335 Ma. It will be interesting to see how much precision can be gained with these new data, and with similar data on other calibration constraints that can be obtained with our new method.
Our method can be applied to any clade that has a good fossil record and a sufficiently complex phenotype to allow reasonably reliable phylogenetic analyses to be performed. In addition to vertebrates, this includes at least many other metazoan taxa among arthropods (Ronquist et al. 2012b), echinoderms (Sumrall 1997) and mollusks (Merle et al. 2011; Bardin et al. 2017), as well as embryophytes (Corvez et al. 2012). With new calibration constraints in these taxa (and possibly others), the timing of diversification of much of the eukaryotic Tree of Life should become much better documented.
A Equivalence between two tree topology distributions
In Didier et al. (2017), the probability T†(𝒯) of a tree topology 𝒯 arising from a lineage homogeneous process (conditioned on its number of tips) was expressed as where R𝒯 = 1 if 𝒯 is made of a single lineage and, putting a and b for the two direct descendants of the root of 𝒯, otherwise.
If 𝒯 is made of a single lineage, both the probability T(𝒯) from Harding (1971) and T†(𝒯) are equal to 1.
Otherwise, by substituting Equation 2 in Equation 1, we get that
It follows that the probability T†(𝒯) of Didier et al. (2017) can be expressed with the same recursive formula as the probability T(𝒯) of Harding (1971), recalled in Theorem 1. Since, moreover, the recursive computations of the probabilities of Harding (1971) and Didier et al. (2017) have the same initial condition (i.e., that of trees made of a single lineage), they are equal for all topologies 𝒯. It is worth noting that these distributions have been derived in quite different ways. The arguments of Harding (1971) mainly rely on the labelling of the tree, while those of Didier et al. (2017) are essentially based on the relative order of its divergence times.
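For concreteness, the substitution argument can be sketched as follows. The sketch assumes the usual forms for labelled topologies under a lineage-homogeneous process: |𝒯| denotes the number of tips of 𝒯, and R𝒯 counts the possible relative orders of its divergence times. The notation is ours for this illustration and may differ from that of Equations 1 and 2 and of Theorem 1.

```latex
% Assumed closed form and ranking recursion (|\mathcal{T}| = number of tips)
\[
T^{\dagger}(\mathcal{T}) = \frac{2^{|\mathcal{T}|-1}\, R_{\mathcal{T}}}{|\mathcal{T}|!\,(|\mathcal{T}|-1)!},
\qquad
R_{\mathcal{T}} = \binom{|a|+|b|-2}{|a|-1}\, R_a R_b .
\]
% Substituting the recursion into the closed form, with |\mathcal{T}| = |a| + |b|
\[
T^{\dagger}(\mathcal{T})
= \frac{2^{|\mathcal{T}|-1}}{|\mathcal{T}|!\,(|\mathcal{T}|-1)!}
  \cdot \frac{(|\mathcal{T}|-2)!}{(|a|-1)!\,(|b|-1)!}\, R_a R_b .
\]
% Inverting the closed form for each subtree
\[
R_a = \frac{|a|!\,(|a|-1)!}{2^{|a|-1}}\, T^{\dagger}(a),
\qquad
R_b = \frac{|b|!\,(|b|-1)!}{2^{|b|-1}}\, T^{\dagger}(b),
\]
% and simplifying yields a recursion on the probabilities themselves
\[
T^{\dagger}(\mathcal{T})
= \frac{2\,|a|!\,|b|!}{(|\mathcal{T}|-1)\,|\mathcal{T}|!}\; T^{\dagger}(a)\, T^{\dagger}(b).
\]
```

As a quick check under these assumptions, a tree with three tips has |a| = 1 and |b| = 2, which gives 2·1·2/(2·6) = 1/3, as expected for three equiprobable labelled topologies.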
B Impact of the correction of the probability density on the rates estimation
In order to assess the impact of the error in the probability density of a phylogenetic tree with fossils on the diversification rate estimation, we simulated trees and fossils following an FBD model with given rates and determined their maximum likelihood estimates on the simulations by using the computation from Didier et al. (2017) and the corrected one presented here. Thanks to the improved complexity of the new algorithm, which we also adapted to re-implement our former computation, we relaxed the filtering of the simulated trees. Namely, we now rejected trees with fewer than 10 extant taxa or with more than 5 000 clades, against 20 extant taxa and 1 000 clades in Didier et al. (2017). Unlike for the simulations of Didier et al. (2017), we did not filter the simulated trees with regard to their complexity level (i.e., their expected computational time with the former algorithm).
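The simulation and filtering step can be illustrated by the sketch below. It is not our simulation code: it only tracks lineage, fossil and node counts rather than building the tree, it assumes that "clades" means the nodes of the complete tree (extinct lineages included), and the rates and duration are hypothetical.

```python
import random


def simulate_fbd_counts(birth, death, fossil, duration, rng, max_clades=5000):
    """Gillespie-style simulation of counts under a fossilized birth-death process.

    Returns (n_extant, n_fossils, n_clades), or None as soon as the number of
    clades exceeds max_clades.
    """
    t, alive, fossils, clades = 0.0, 1, 0, 1
    total = birth + death + fossil
    while alive > 0:
        # Waiting time to the next event among all living lineages.
        t += rng.expovariate(alive * total)
        if t >= duration:
            break
        u = rng.random() * total
        if u < birth:
            alive += 1
            clades += 2  # a speciation adds two child nodes to the tree
            if clades > max_clades:
                return None
        elif u < birth + death:
            alive -= 1  # extinction of one lineage
        else:
            fossils += 1  # fossil find on one lineage
    return alive, fossils, clades


def accepted(result, min_extant=10):
    """Filter from the text: at least 10 extant taxa, at most 5 000 clades."""
    return result is not None and result[0] >= min_extant


rng = random.Random(1)
# Hypothetical rates (per lineage and time unit) and duration.
results = [simulate_fbd_counts(0.2, 0.1, 0.05, 40.0, rng) for _ in range(200)]
kept = [r for r in results if accepted(r)]
```

The rate estimation would then be run on each retained simulation with both computations.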
Figure 9 displays the mean absolute error of the speciation, extinction and fossil discovery rates obtained from the corrected method provided in this work and from the former one of Didier et al. (2017), on the same set of simulated phylogenetic trees with fossils. Although the corrected computation yields slightly better estimates, the accuracy is essentially the same for the two methods.
Acknowledgments
We thank Michael C. Rygel (SUNY Potsdam) for sending papers about the stratigraphy of the various formations represented in the Joggins locality. P. Drapeau and M. Fau helped to compile the data for the empirical example.