The Prevalence and Impact of Model Violations in Phylogenetic Analysis

Suha Naser-Khdour; Bui Quang Minh; Wenqi Zhang; Eric Stone; Robert Lanfear

doi:10.1101/460121

Abstract

In phylogenetic inference we commonly use models of substitution which assume that sequence evolution is stationary, reversible and homogeneous (SRH). Although the use of such models is often criticized, the extent of SRH violations and their effects on phylogenetic inference of tree topologies and edge lengths are not well understood. Here, we introduce and apply the maximal matched-pairs tests of homogeneity to assess the scale and impact of SRH model violations on 3,572 partitions from 35 published phylogenetic datasets. We show that many partitions (39.5%) reject the SRH assumptions, and that for most datasets, the topologies of trees inferred from all partitions differ significantly from those inferred using the subset of partitions that do not reject the SRH assumptions. These results suggest that the extent and effects of model violation in phylogenetics may be substantial. They highlight the importance of testing for model violations and possibly excluding partitions that violate models prior to tree reconstruction. They also suggest that further effort in developing models that do not require SRH assumptions could lead to large improvements in the accuracy of phylogenomic inference. The scripts necessary to perform the analysis are available at https://github.com/roblanf/SRHtests, and the new tests we describe are available as a new option in IQ-TREE (http://www.iqtree.org).

Introduction

Phylogenetics is an essential tool for inferring evolutionary relationships between individuals, species, genes, and genomes. Moreover, phylogenetic trees form the basis of a huge range of other inferences in evolutionary biology, from gene function prediction to drug development and forensics (Eisen 1998; Farrell, et al. 2000; Mäser, et al. 2001; Gardner, et al. 2002; Yao, et al. 2003; Grenfell, et al. 2004; Yao, et al. 2004; Salipante and Horwitz 2006; Gray, et al. 2009; Brady and Salzberg 2011; Dunn, et al. 2011).

Most phylogenetic studies use models of sequence evolution which assume that the evolutionary process follows stationary, reversible and homogeneous (SRH) conditions. Stationarity implies that the marginal frequencies of the nucleotides or amino acids are constant over time, reversibility implies that the evolutionary process is stationary and undirected, and homogeneity implies that the instantaneous substitution rates are constant along the tree or over an edge (Felsenstein 2004; Yang and Rannala 2012; Jermiin, et al. 2017). However, these simplifying assumptions are often violated by real data (Foster and Hickey 1999; Tarrío, et al. 2001; Paton, et al. 2002; Goremykin and Hellwig 2005; Murray, et al. 2005; Bourlat, et al. 2006; Hyman, et al. 2007; Sheffield, et al. 2009; Nesnidal, et al. 2010; Nabholz, et al. 2011; Martijn, et al. 2018). Such model violation may lead to systematic error that, unlike stochastic error, cannot be solved simply by increasing the size of a dataset (Felsenstein 2004; Ho and Jermiin 2004; Jermiin, et al. 2004; Philippe, et al. 2005; Sullivan and Joyce 2005; Kumar, et al. 2012; Brown and Thomson 2017; Duchene, et al. 2017). As phylogenetic datasets are steadily growing in terms of taxonomic and site sampling, it is vital that we develop and employ methods to measure and understand the extent to which systematic error affects phylogenetic inference (systematic bias), and explore ways of mitigating systematic bias in empirical studies.

One approach to accommodate data that have evolved under non-SRH conditions is to employ models that relax the SRH assumptions. A number of non-SRH models have been implemented in a variety of software packages (Foster 2004; Lartillot and Philippe 2004; Blanquart and Lartillot 2006; Boussau and Gouy 2006; Jayaswal, et al. 2007; Knight, et al. 2007; Dutheil and Boussau 2008; Jayaswal, et al. 2011; Sumner, et al. 2012; Zou, et al. 2012; Groussin, et al. 2013; Jayaswal, et al. 2014; Nguyen, et al. 2015; Woodhams, et al. 2015). However, such models remain infrequently used, as searching for optimal phylogenetic trees under these models is computationally demanding (Betancur-r, et al. 2013) and the implementations are often not easy to use. As a result, the vast majority of empirical phylogenetic inferences rely on models that assume sequences have evolved under SRH conditions, such as the general time reversible (GTR) family of models implemented in the most widely-used phylogenetics software packages (Swofford 2001; Drummond and Rambaut 2007; Guindon, et al. 2010; Ronquist, et al. 2012; Bazinet, et al. 2014; Bouckaert, et al. 2014; Stamatakis 2014; Nguyen, et al. 2015; Höhna, et al. 2016).

Another approach to accounting for data that may have evolved under non-SRH conditions is to test for model violations prior to tree reconstruction. Here, one first screens datasets or parts of datasets, and then reconstructs trees exclusively from data that do not reject SRH conditions. A number of methods have been proposed to test for violation of SRH conditions in aligned sequences prior to estimating trees (Bowker 1948; Stuart 1955; Rzhetsky and Nei 1995; Kumar and Gadagkar 2001; Weiss and von Haeseler 2003; Ababneh, et al. 2006; Ho, et al. 2006), and there are also a posteriori tests for absolute model adequacy which are employed after trees have been estimated (Goldman 1993; Brown and ElDabaje 2009; Brown 2014; Duchene, et al. 2017; Brown and Thomson 2018).

Allowing the data to reject the model when the assumptions of the model are violated is an important approach to reducing systematic bias in phylogenetic inference (Philippe, et al. 2005; Brown 2014). Knowing in advance which sequences and loci are inconsistent with the SRH assumptions will allow us to choose more complex models for phylogeny reconstruction from these data or to omit some of these sequences and loci from downstream analyses (Kumar and Gadagkar 2001). The need for methods that assess the evolutionary process prior to phylogenetic inference becomes more important as the number of sequences and sites per dataset increases, because systematic bias has an increasing effect on inferences from larger phylogenetic datasets (Ho and Jermiin 2004; Jermiin, et al. 2004; Phillips, et al. 2004; Delsuc, et al. 2005).

In this paper we evaluate the extent and effect of model violation due to non-SRH evolution using 35 empirical datasets with a total of 3,572 partitions. We determine whether the SRH assumptions are violated by extending and applying the matched-pairs tests of homogeneity (Jermiin, et al. 2017) to each partition. We then compare the phylogenetic trees for each dataset estimated from all of the partitions, the partitions that reject the SRH assumptions, and the partitions that do not reject the SRH assumptions, in order to evaluate the effect of violating SRH conditions on phylogenetic inference.

Materials and Methods

Empirical datasets

In order to assess the impact of model violation in phylogenetics, we first gathered a representative sample of 35 partitioned empirical datasets that had been used for phylogenetic analysis in recent studies (Table 1). Within the constraints of selecting data that were publicly available and suitably annotated, i.e. such that all loci and all codon positions within protein-coding loci could be identified, we selected the datasets to provide as representative a sample as possible of the data types, taxa, and genomic regions most commonly used to infer bifurcating phylogenetic trees from concatenated alignments. These datasets include nucleotide sequences from nuclear, mitochondrial, plastid and virus genomes, and include protein-coding DNA, introns, intergenic spacers, tRNA, rRNA and ultra-conserved elements. The numbers of taxa and sites in these datasets range from 27 to 355 and from 699 to 1,079,052, respectively. The clades represented in these datasets include animals, plants and viruses. We partitioned all datasets to the maximum possible extent based on the biological properties of the data, i.e. we divided every locus and every codon position within each protein-coding locus into a separate partition. All partitioning information is available at the GitHub repository https://github.com/roblanf/SRHtests/tree/master/datasets, and the full details of every dataset are provided in Table 1 and in extended Table 11.

Table 1

Number of taxa, number of sites, clade and study reference for each dataset that has been used in this study

Workflow summary

Figure 1 outlines the workflow. For each partition in each dataset, we used two approaches based on the three matched-pairs tests of homogeneity to ask whether the evolution of the aligned sequences in the partition rejects the SRH assumptions. The three matched-pairs tests of homogeneity, described in more detail below, test three slightly different assumptions about the historical process that generated each aligned pair of sequences in a given partition. A significant result from any test suggests that the nature of the evolutionary process required to explain the aligned sequences violates at least one of the three SRH conditions (Jermiin, et al. 2017). For each test, we classify each partition as pass if the result of the test is non-significant or fail if the result of the test is significant. We then denote the original dataset as D_all, while the concatenation of pass partitions is denoted D_pass and the concatenation of fail partitions as D_fail (Fig. 1).

Fig. 1 Flow chart of methodology.

After applying the matched-pairs tests of homogeneity to each possible pair of sequences in each partition, we have two options: 1) apply a binomial test to the p-values of all possible pairs of sequences in each partition to derive a p-value for that partition; or 2) take the maximum test statistic value and compare it to a null distribution of maximum test statistics derived by permuting the sites of the alignment.

To investigate the impact of model violation on phylogenetic inference, we infer and compare three phylogenetic trees, T_all, T_pass and T_fail, estimated from D_all, D_pass and D_fail, respectively.

Matched-pairs tests of homogeneity

The three matched-pairs tests of homogeneity that are applied to pairs of sequences are the MPTS (matched-pairs test of symmetry), the MPTMS (matched-pairs test of marginal symmetry), and the MPTIS (matched-pairs test of internal symmetry). The statistics are computed on an m-by-m divergence matrix D (m is 4 for nucleotides and 20 for amino acids) with elements d_ij, where d_ij is the number of alignment sites having nucleotide (or amino acid) i in the first sequence and nucleotide (or amino acid) j in the second sequence.

The MPTS tests the symmetry of D by computing Bowker’s test statistic as the chi-square distance between D and its transpose:

$$S_B^2 = \sum_{i<j} \frac{(d_{ij} - d_{ji})^2}{d_{ij} + d_{ji}}$$

A p-value is then obtained by a chi-square test with m(m − 1)/2 degrees of freedom. A low p-value (<0.05) indicates that the assumption of symmetry is rejected and evolution is non-stationary or non-homogeneous (Jermiin, et al. 2017).
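As an illustration of the MPTS for a single pair of sequences, the following is a minimal Python sketch, not the IQ-TREE implementation. The function names (`divergence_matrix`, `bowker_statistic`, `chi2_sf`, `mpts`) are hypothetical, sites containing gaps or ambiguous characters are simply skipped (an assumption), and the degrees of freedom are fixed at m(m − 1)/2 as stated in the text.

```python
import math
from itertools import combinations

NUCLEOTIDES = "ACGT"

def divergence_matrix(seq1, seq2, alphabet=NUCLEOTIDES):
    """Return D, where D[i][j] counts sites with state i in seq1 and j in seq2."""
    index = {state: k for k, state in enumerate(alphabet)}
    m = len(alphabet)
    d = [[0] * m for _ in range(m)]
    for a, b in zip(seq1.upper(), seq2.upper()):
        # Assumption: sites with gaps or ambiguity codes are ignored.
        if a in index and b in index:
            d[index[a]][index[b]] += 1
    return d

def bowker_statistic(d):
    """Sum over i < j of (d_ij - d_ji)^2 / (d_ij + d_ji)."""
    total = 0.0
    for i, j in combinations(range(len(d)), 2):
        denom = d[i][j] + d[j][i]
        if denom > 0:  # cell pairs with no observations contribute nothing
            total += (d[i][j] - d[j][i]) ** 2 / denom
    return total

def chi2_sf(x, df):
    """Upper-tail probability of a chi-square variable with integer df."""
    if x <= 0:
        return 1.0
    half = x / 2.0
    if df % 2 == 0:
        # Closed form for even df: exp(-x/2) * sum_{k < df/2} (x/2)^k / k!
        term, total = 1.0, 1.0
        for k in range(1, df // 2):
            term *= half / k
            total += term
        return math.exp(-half) * total
    # Closed form for odd df: erfc term plus a finite sum of half-integer powers.
    total = math.erfc(math.sqrt(half))
    term = math.sqrt(half) / math.gamma(1.5)
    for k in range(1, (df - 1) // 2 + 1):
        total += math.exp(-half) * term
        term *= half / (k + 0.5)
    return total

def mpts(seq1, seq2, alphabet=NUCLEOTIDES):
    """Return (Bowker statistic, p-value) for one pair of aligned sequences."""
    m = len(alphabet)
    stat = bowker_statistic(divergence_matrix(seq1, seq2, alphabet))
    return stat, chi2_sf(stat, m * (m - 1) // 2)
```

Calling `mpts(seq_a, seq_b)` on two aligned nucleotide strings returns the statistic and its p-value; a pair would be flagged when the p-value falls below 0.05.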

The MPTMS tests the equality of nucleotide or amino acid composition between two sequences. To do so, the MPTMS computes Stuart’s test statistic based on the difference between the nucleotide or amino acid frequencies of the two sequences, u, and its variance-covariance matrix, V:

$$S_S^2 = u^T V^{-1} u$$

where u is the vector of marginal differences, u^T = (d_1• − d_•1, d_2• − d_•2, …, d_k• − d_•k), d_i• is the sum of d_ij over j, d_•j is the sum of d_ij over i, and k = m − 1.

V is the estimated variance-covariance matrix of u under the assumption of marginal symmetry, with the elements

$$v_{ij} = \begin{cases} d_{i\bullet} + d_{\bullet i} - 2d_{ii}, & i = j \\ -(d_{ij} + d_{ji}), & i \neq j \end{cases}$$

A p-value is obtained by a chi-square test with m − 1 degrees of freedom. A low p-value (<0.05) indicates that the stationarity assumption is rejected.

The MPTIS uses as its test statistic the difference between Bowker’s and Stuart’s statistics, $S_I^2 = S_B^2 - S_S^2$, where $S_B^2$ and $S_S^2$ denote the MPTS and MPTMS statistics, respectively. Hence, it is also chi-square distributed and one obtains a p-value with (m − 1)(m − 2)/2 degrees of freedom. A low p-value (<0.05) indicates that the homogeneity assumption is rejected.
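The marginal and internal symmetry statistics can be sketched in the same spirit. The Python below is an illustrative sketch under stated assumptions rather than the authors' code: the function names are hypothetical, the divergence matrix is passed as a list of lists, and V⁻¹u is obtained with a small Gaussian elimination routine so that only the standard library is needed.

```python
from itertools import combinations

def bowker_statistic(d):
    """Bowker's statistic: sum over i < j of (d_ij - d_ji)^2 / (d_ij + d_ji)."""
    total = 0.0
    for i, j in combinations(range(len(d)), 2):
        denom = d[i][j] + d[j][i]
        if denom > 0:
            total += (d[i][j] - d[j][i]) ** 2 / denom
    return total

def solve_linear(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    aug = [[float(v) for v in row] + [float(rhs)] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("V is singular; the statistic is undefined for this pair")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        rest = sum(aug[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (aug[r][n] - rest) / aug[r][r]
    return x

def stuart_statistic(d):
    """Stuart's statistic u^T V^{-1} u using the first k = m - 1 states."""
    m = len(d)
    k = m - 1
    row = [sum(d[i]) for i in range(m)]                       # d_i.
    col = [sum(d[i][j] for i in range(m)) for j in range(m)]  # d_.j
    u = [row[i] - col[i] for i in range(k)]
    v = [[row[i] + col[i] - 2 * d[i][i] if i == j else -(d[i][j] + d[j][i])
          for j in range(k)] for i in range(k)]
    x = solve_linear(v, u)  # x = V^{-1} u
    return sum(ui * xi for ui, xi in zip(u, x))

def internal_statistic(d):
    """Internal symmetry statistic: Bowker's minus Stuart's statistic."""
    return bowker_statistic(d) - stuart_statistic(d)
```

For nucleotides, the resulting statistics would be compared with chi-square distributions with 3 degrees of freedom each, following the degrees of freedom given in the text.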

The MPTS, MPTMS and MPTIS test different aspects of the symmetry with which substitutions accumulate between pairs of sequences. The MPTS is a comprehensive and sufficient test to determine whether the data comply with the SRH assumptions (Jermiin, et al. 2017), but it cannot provide any information about the source of a violation. Some information on the underlying source of model violation may be obtained by performing the other two tests of symmetry, the MPTMS and the MPTIS. If the violation of the SRH assumptions stems from differences in base composition between the sequences, this should affect the marginal symmetry of the sequence pair, which can in principle be detected by the MPTMS. If instead the violation stems from differences in substitution rates over time, this should affect the internal symmetry of the sequence pair, which can in principle be detected by the MPTIS. However, even after performing all three tests, it is difficult to ascertain which of the three SRH assumptions is violated during the evolutionary process, because the relationship between the SRH conditions and the three matched-pairs tests is neither bijective nor injective, i.e. there is no one-to-one correspondence between the three tests and violation of the three SRH conditions (Jermiin, et al. 2017).

The three matched-pairs tests of homogeneity were designed to ask whether any single pair of sequences rejects the SRH conditions (Jermiin, et al. 2017). To ask whether a given partition rejects SRH conditions, we developed two approaches that extend the matched-pairs tests of homogeneity to accommodate datasets with more than two sequences.

Extending the matched-pairs tests of homogeneity to multiple sequence alignments

There are many potential ways to extend the three matched-pairs tests of homogeneity to multiple sequence alignments. One approach is to compute the p-value for all pairs of sequences in an alignment, and then ask whether the distribution of the resulting p-values follows the distribution expected under the null hypothesis. In this approach, we apply the matched-pairs tests of homogeneity to every pair of sequences in an alignment, resulting in n(n − 1)/2 chi-square p-values for each partition, where n is the number of sequences in the partition (Fig. 1). Under the null hypothesis of SRH evolution, the marginal distribution of each p-value should be uniform on the interval [0,1], implying a 5% chance that any one of them falls below 0.05. In principle, we could use this logic to assess whether we observe more small p-values than we would expect by chance, in which case a partition would be deemed to fail the SRH conditions. Specifically, we could count how many chi-square p-values are less than 0.05 and compare the result to a Binomial distribution with n(n − 1)/2 trials and success probability 0.05. If the corresponding Binomial p-value were smaller than 0.05, we would classify the partition as fail; otherwise, the partition is labelled as pass.

Despite its appealing simplicity, this approach suffers from the serious drawback that it ignores the dependencies among p-values. P-values are dependent because many pairs of sequences will cross shared branches in the tree. Thus, a full accounting of the dependencies among p-values would require knowledge of the underlying phylogeny. Given that these tests aim to determine a priori whether it is feasible to build a reliable phylogeny, it would be inappropriate to use an estimated phylogeny as part of the test. Because of this limitation, we only present the results of this test in the supplementary material. To avoid confusion between the binomial and pairwise tests, we denote the binomial extensions of the MPTS, MPTMS, and MPTIS as BiSymTest, BiSymTest_mar, and BiSymTest_int, respectively.
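The binomial extension described above can be sketched in a few lines of Python. This is an illustrative sketch only: the function names are hypothetical, the pairwise p-values are assumed to have been computed already, and the use of a one-sided upper-tail probability is an assumption about how the binomial comparison is made. Like the approach itself, the sketch treats the pairwise p-values as independent.

```python
from math import comb

def binomial_upper_tail(k, n_trials, p=0.05):
    """P(X >= k) for X ~ Binomial(n_trials, p)."""
    return sum(comb(n_trials, i) * p ** i * (1 - p) ** (n_trials - i)
               for i in range(k, n_trials + 1))

def bisymtest(pairwise_pvalues, alpha=0.05):
    """Classify a partition from its n(n - 1)/2 pairwise p-values.

    Returns the binomial p-value and the label "pass" or "fail".
    """
    n_pairs = len(pairwise_pvalues)
    n_significant = sum(1 for pv in pairwise_pvalues if pv < alpha)
    # Assumption: one-sided test for an excess of small pairwise p-values.
    p_binom = binomial_upper_tail(n_significant, n_pairs, alpha)
    return p_binom, ("fail" if p_binom < alpha else "pass")
```

For a partition with 10 sequences, `pairwise_pvalues` would hold 45 values, one for each pair.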

Maximum statistic approach

The second approach to determining whether a given alignment rejects SRH conditions, which we call MaxSymTest, is to consider only the pair of taxa with the maximum test statistic value (which we denote as S_max). MaxSymTest assumes that model violations, if present, would occur along the path connecting these two taxa. MaxSymTest overcomes the non-independence issue because it uses data from just a single pair of sequences from an alignment. Furthermore, it does not require knowledge of the underlying tree topology. Because we do not assume any distribution for S_max, we assess its statistical significance as follows. We compute the null distribution of S_max by permuting the sequences at each alignment site independently. Specifically, for a single test of an S_max value, we generate 999 permuted alignments (Fig. 1) and use these to calculate the corresponding null distribution, comprising 999 S_max values. MaxSymTest then assigns a p-value as the fraction of permuted S_max values larger than or equal to the S_max from the original alignment. For convenience, we denote the maximum value approach of the MPTS, MPTMS, and MPTIS as MaxSymTest, MaxSymTest_mar, and MaxSymTest_int, respectively. We present the results of the MaxSymTest analyses in the main text. The supplementary information contains a side-by-side comparison of the MaxSymTest and the BiSymTest results.
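The permutation procedure can be illustrated with the short Python sketch below, shown for the MPTS statistic only. It is a sketch under stated assumptions, not the IQ-TREE implementation: the function names are hypothetical, the alignment is a list of equal-length strings, sites with non-ACGT characters are ignored within each pair, and the random seed is fixed only to make the example reproducible.

```python
import random
from itertools import combinations

def bowker_statistic(seq1, seq2, alphabet="ACGT"):
    """Bowker's statistic computed directly from two aligned sequences."""
    counts = {}
    for a, b in zip(seq1, seq2):
        # Only off-diagonal cells of the divergence matrix matter here.
        if a in alphabet and b in alphabet and a != b:
            counts[(a, b)] = counts.get((a, b), 0) + 1
    total = 0.0
    for a, b in combinations(alphabet, 2):
        d_ab, d_ba = counts.get((a, b), 0), counts.get((b, a), 0)
        if d_ab + d_ba > 0:
            total += (d_ab - d_ba) ** 2 / (d_ab + d_ba)
    return total

def s_max(alignment):
    """Maximum pairwise statistic over all pairs of sequences."""
    return max(bowker_statistic(x, y) for x, y in combinations(alignment, 2))

def permute_sites(alignment, rng):
    """Shuffle the characters among sequences within each site independently."""
    columns = [list(column) for column in zip(*alignment)]
    for column in columns:
        rng.shuffle(column)
    return ["".join(row) for row in zip(*columns)]

def maxsymtest(alignment, n_perm=999, seed=0):
    """Return (S_max, p-value) using a permutation null distribution."""
    rng = random.Random(seed)
    observed = s_max(alignment)
    null = [s_max(permute_sites(alignment, rng)) for _ in range(n_perm)]
    # p-value: fraction of permuted S_max values >= the observed S_max.
    p_value = sum(1 for s in null if s >= observed) / n_perm
    return observed, p_value
```

Because every pair of sequences is re-evaluated for each permuted alignment, the cost grows with both the number of permutations and the square of the number of sequences.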

Phylogenetic inference

We used IQ-TREE (Nguyen, et al. 2015) to infer up to seven phylogenetic trees for every dataset: T_all (all partitions from the original dataset; D_all); and T_pass and T_fail based on the D_pass and D_fail datasets from each of the three tests (MaxSymTest, MaxSymTest_mar, MaxSymTest_int), provided that there was at least one partition in each category. We ran IQ-TREE using the default settings with the best-fit fully-partitioned model (Chernomor, et al. 2016), which allows each partition to have its own evolutionary model and edge-linked rate determined by ModelFinder (Kalyaanamoorthy, et al. 2017), followed by 1,000 ultrafast bootstrap replicates (Hoang, et al. 2018).

Distance between trees

For each of the three tests (MPTS, MPTMS, MPTIS) we calculated the Normalised Path-Difference (NPD) and quartet distance (QD) (Steel and Penny 1993; Sand, et al. 2014) between all three possible pairs of trees (T_all vs. T_pass; T_all vs. T_fail; and T_pass vs. T_fail), as long as D_pass and D_fail were non-empty and so T_pass and T_fail had been estimated. The path-difference metric (PD) is defined as the Euclidean distance between the vectors of path lengths between all pairs of taxa in the two trees (Steel and Penny 1993; Mir and Russello 2010). In this study, because we are interested only in differences between topologies, we use the variant of the PD metric that ignores branch lengths. In order to compare path distances between trees with different numbers of taxa, we normalised PD (to obtain NPD) by the mean of a null distribution of PDs generated from 10,000 random pairs of trees with the same number of taxa (Bogdanowicz, et al. 2012). Thus, an NPD of zero indicates an identical pair of trees; an NPD of 1 indicates that a pair of trees is as similar as a pair of randomly-selected trees with the same number of taxa; and an NPD greater than 1 indicates a pair of trees that are less similar than a randomly-selected pair of trees with the same number of taxa. Since path differences are always non-negative, the NPD is also guaranteed to be non-negative.

The QD metric is defined as the fraction of quartets (subsets of four taxa) that induce different subtrees in the two trees being compared. QD ranges between 0 and 1, where 0 means that the two trees are identical and 1 means that they do not share any quartet subtrees. Compared with PD, the main advantage of QD is that its distribution is less sensitive to the underlying distribution on tree topologies (Steel and Penny 1993).
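To make the two metrics concrete, the following Python sketch computes the topology-only PD and a brute-force QD for small trees. It is illustrative only and is not the software used in this study: the function names are hypothetical, trees are assumed to be unrooted and bifurcating and are given as lists of edges, leaves are the nodes with exactly one neighbour, and the normalisation of PD by a null mean is left out.

```python
import math
from collections import deque
from itertools import combinations

def tree_from_edges(edges):
    """Build an adjacency dictionary from a list of (node, node) edges."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, []).append(v)
        adj.setdefault(v, []).append(u)
    return adj

def leaf_path_lengths(adj):
    """Number of edges on the path between every pair of leaves."""
    leaves = sorted(node for node, nbrs in adj.items() if len(nbrs) == 1)
    dist = {}
    for start in leaves:
        seen = {start: 0}
        queue = deque([start])
        while queue:  # breadth-first search from this leaf
            node = queue.popleft()
            for nbr in adj[node]:
                if nbr not in seen:
                    seen[nbr] = seen[node] + 1
                    queue.append(nbr)
        for other in leaves:
            if start < other:
                dist[(start, other)] = seen[other]
    return leaves, dist

def path_difference(adj1, adj2):
    """Euclidean distance between the two vectors of leaf-to-leaf path lengths."""
    leaves1, d1 = leaf_path_lengths(adj1)
    leaves2, d2 = leaf_path_lengths(adj2)
    if leaves1 != leaves2:
        raise ValueError("trees must share the same set of taxa")
    return math.sqrt(sum((d1[pair] - d2[pair]) ** 2 for pair in d1))

def quartet_topology(dist, a, b, c, d):
    """Index of the split (0: ab|cd, 1: ac|bd, 2: ad|bc) with the smallest sum."""
    sums = (dist[(a, b)] + dist[(c, d)],
            dist[(a, c)] + dist[(b, d)],
            dist[(a, d)] + dist[(b, c)])
    return sums.index(min(sums))

def quartet_distance(adj1, adj2):
    """Fraction of quartets whose induced topology differs between the trees."""
    leaves1, d1 = leaf_path_lengths(adj1)
    leaves2, d2 = leaf_path_lengths(adj2)
    if leaves1 != leaves2:
        raise ValueError("trees must share the same set of taxa")
    quartets = list(combinations(leaves1, 4))
    differing = sum(1 for q in quartets
                    if quartet_topology(d1, *q) != quartet_topology(d2, *q))
    return differing / len(quartets)
```

An NPD would then be obtained by dividing `path_difference` by the mean PD of random tree pairs with the same number of taxa. Enumerating all quartets is only practical for small trees; dedicated tools use much faster algorithms.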

Tree topology tests

The NPD and the QD give us measures of the differences between pairs of trees, but they do not tell us whether the differences are phylogenetically significant in the three datasets (D_pass, D_all, and D_fail) derived from a given test. For example, trees that differ due to stochastic error associated with small datasets may be very different, but such differences may not be statistically significant. To assess the significance of the differences between T_pass, T_all and T_fail, we used the weighted Shimodaira-Hasegawa (wSH) test (Shimodaira and Hasegawa 1999; Shimodaira 2002) implemented in IQ-TREE with 1,000 RELL replicates (Kishino, et al. 1990). Given the alignment (D_pass), the wSH test computes a p-value for each tree, where a low p-value (<0.05) implies that the corresponding tree has a significantly worse likelihood than the best tree in the set of T_pass, T_all and T_fail. We use D_pass for these tests because it is, by definition, the only dataset that does not reject the underlying assumptions of the SH test. As such, we can only compute wSH p-values when D_pass is non-empty. Thus, we performed two wSH tests for each of the three MaxSymTest variants: one that asks whether T_all can be rejected in favour of T_pass, and another that asks whether T_fail can be rejected in favour of T_pass. In cases where there were no partitions in D_pass or D_fail, we were unable to perform the wSH test.

Correlation between number of substitutions and model violation

We hypothesised that partitions with more substitutions may be more likely to violate the SRH assumptions, since substitutions form the raw data for the matched-pairs tests of symmetry. To assess this, we fitted a generalised linear mixed-effects model for each of the three tests using the glmer function from the lme4 package in R (Bates, et al. 2015). In this model, we treat each partition as a datapoint, the number of substitutions measured for that partition as a fixed effect, and the dataset from which that partition was taken as a random effect. This allows us to estimate the extent to which the number of substitutions in a partition correlates with whether a partition fails a given test of symmetry, after accounting for differences between the datasets. To calculate the R squared value we use the r.squaredGLMM function from the MuMIn package in R (Barton 2009; Nakagawa and Schielzeth 2013).

Software implementation

We implemented a new option --maxsymtest NUM (where NUM specifies the number of permutations) in IQ-TREE to perform the three MaxSymTest matched pairs tests of symmetry. In addition, the option --symtest-remove-bad allows users to remove from the final analysis partitions that fail the MaxSymTest. One can change the removal criterion to MaxSymTest_mar or MaxSymTest_int via the --symtest-type MAR|INT option. In addition, the cut-off p-value can be changed using the --symtest-pval NUM option, where the default value is 0.05.
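As a usage illustration, the Python sketch below assembles a command line from the options named above. It only builds the argument list and does not run anything. The helper name `build_iqtree_command`, the executable name `iqtree2`, the file names, and the `-s` (alignment) and `-p` (partition file) arguments are assumptions that may differ between IQ-TREE versions; only the four symmetry-test options are taken from the text.

```python
def build_iqtree_command(alignment, partitions, n_perm=1000,
                         remove_failing=False, test_type=None,
                         p_cutoff=None, executable="iqtree2"):
    """Assemble an IQ-TREE argument list for the MaxSymTest options.

    The executable name and the -s / -p arguments are assumptions;
    the --maxsymtest and --symtest-* options are those described in the text.
    """
    cmd = [executable, "-s", alignment, "-p", partitions,
           "--maxsymtest", str(n_perm)]
    if remove_failing:
        # Drop partitions that fail the test from the final analysis.
        cmd.append("--symtest-remove-bad")
    if test_type is not None:
        if test_type not in ("MAR", "INT"):
            raise ValueError("test_type must be 'MAR' or 'INT'")
        cmd += ["--symtest-type", test_type]
    if p_cutoff is not None:
        cmd += ["--symtest-pval", str(p_cutoff)]
    return cmd
```

The returned list could be passed to `subprocess.run` once the executable name and input options have been checked against the installed IQ-TREE version.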

Reproducibility

The GitHub repository https://github.com/roblanf/SRHtests contains the raw data and Python and R scripts necessary to perform all analyses reported in this study.

Results

Violation of SRH conditions is common across 35 empirical datasets

Across all 3,572 partitions analysed, 1,475 (40.8%) failed the MaxSymTest, 1,483 (41.5%) failed the MaxSymTest_mar, and 312 (8.7%) failed the MaxSymTest_int. In total, 1,804 (50.5%) partitions failed at least one test.

The proportion of partitions failing each test varied substantially among datasets (Fig. 2), but on average 44.9% of the partitions in each dataset failed the MaxSymTest, 41.8% failed the MaxSymTest_mar, and 8.2% failed the MaxSymTest_int.

Figure 2 The proportion of partitions that reject the null hypothesis of the MaxSymTest, MaxSymTest_mar and MaxSymTest_int (p-value < 0.05) in each dataset.

The fraction of failing partitions also varied with the type of genome (e.g. mitochondrial, chloroplast, or nuclear) and the type of locus (e.g. protein-coding, UCE, tRNA) from which the partition was sequenced, although we note that a substantial proportion of the partitions from almost every category failed at least one of the tests (Table 2). Similar results are obtained with the BiSymTest (Extended Table 1).

Table 2 The proportion of partitions that failed at least one of the three tests - MaxSymTest, MaxSymTest_mar, MaxSymTest_int

There were no clear differences in the substitution models that were selected for the partitions that pass or fail the tests (see Extended Tables 2-6). However, we note that the two most frequently selected substitution models (for >35% of the partitions) were relatively simple: K80 (Kimura 1980) and HKY (Hasegawa, et al. 1985). K80 has a single parameter, the transition to transversion rate ratio, whereas HKY additionally estimates the four base frequencies.

Model violation has a large influence on tree topologies

For MaxSymTest and MaxSymTest_mar, and according to two different tree distance metrics (NPD and QD), we find that the tree inferred from the original dataset (T_all) was more similar to the tree estimated from the failed partitions (T_fail) than to the tree estimated from the passed partitions (T_pass) (Table 3, Extended tables 7, 9-10). Furthermore, the mean NPD between T_pass and T_fail across all 35 datasets for the MaxSymTest was 0.71, i.e., they are 71% as dissimilar as random pairs of trees. This suggests that violations of SRH assumptions drive changes in tree topologies. The results of the wSH tests (Table 4, Extended Table 8) confirm that the differences between trees that we observe tend to be statistically significant. For example, when using the MaxSymTest, T_all is rejected in favour of T_pass in ~48% of the datasets, and T_fail is rejected in favour of T_pass in ~84% of the datasets.

Table 3 The proportion of datasets that have the highest NPD metric and QD metric respectively between the three comparisons (all-fail, all-pass, pass-fail) for MaxSymTest, MaxSymTest_mar, and MaxSymTest_int.

Table 4 The proportion of datasets that have a significant p-value in the weighted SH test when using D_pass as the input alignment for the test.

The number of substitutions explains less than a fifth of the variance in passing or failing the tests of symmetry

The number of substitutions in a partition explained 17% of the variation in whether or not a partition passed or failed the MaxSymTest (Extended figs. 6-7). This proportion is very similar for MaxSymTest_mar (18%) (Extended figs. 8-9), but is dramatically lower for the MaxSymTest_int (2%) (Extended figs. 10-11). Accordingly, the number of substitutions in a partition is a highly significant (p<2e-16) predictor of passing or failing any of the tests. However, the fact that the number of substitutions explains less than a fifth of the variation suggests that other factors, for example underlying differences in the extent to which partitions violate the SRH assumptions, are driving the remaining ~80% of the variation.

Model violation affects the internal relationships of Spiralia and the position of Xenacoelomorpha

To examine the effects of model violation in more detail, we selected a single dataset for closer consideration. Conflicting support across different analyses for the placement of Xenacoelomorpha, the clade that contains Xenoturbella and Acoelomorpha, in the tree of life has led to various hypotheses regarding the evolution of Bilateria (Cannon, et al. 2016a). It has been suggested that such inferences might be strongly affected by model violation and systematic error (Philippe, et al. 2011). To assess whether data that pass or fail the MaxSymTest show different signals regarding the evolution of the Bilateria, we examined in more detail the T_all, T_pass, and T_fail trees from a recent study that addressed this question (Cannon, et al. 2016a). This dataset comprises 76 metazoan taxa, 2 choanoflagellate outgroups, 212 genes and 424 partitions representing the first and second codon positions of the 212 genes (Cannon, et al. 2016b). The tree reconstructed from all of the partitions (T_all) is identical to the tree reconstructed from the partitions that pass the MaxSymTest (T_pass), and both are identical to the tree shown in the original paper from both DNA and amino acid data (Cannon, et al. 2016a), which places Xenacoelomorpha as the sister group of Nephrozoa (Deuterostomia and Protostomia) with 100% bootstrap support (Fig. 3, Extended figs. 1-3).

The tree reconstructed from the data that fail the MaxSymTest (T_fail, inferred from 143 partitions), on the other hand, shows Xenoturbella as the sister group to Nephrozoa with 93% bootstrap support (Fig. 3, Extended figs. 4-5). It also shows Platyzoa (Rotifera, Platyhelminthes and Gastrotricha) as the sister group of Acoelomorpha with 95% bootstrap support. It is clear from the results that the partitions that comprise D_fail support a very different set of relationships to those that comprise D_pass.

Discussion

In this paper, we show that model violation is prevalent and has a strong impact on tree reconstruction in many phylogenetic datasets. This impact varies considerably between different datasets and different types of partitions. The trees inferred from different groups of partitions from the same dataset often have topologies that are biologically and statistically significantly different.

Our results show great variation in the extent of model violation among different datasets and partitions. This is demonstrated by the different proportions of partitions that failed the matched-pairs tests of symmetry in each dataset, in each genomic context (codon position, rRNA, tRNA, UCE or other) and in each type of genome (nuclear, mitochondrial, plastid and virus). Model violations are most frequently observed in the third codon positions of viral, mitochondrial and nuclear genomes and in the intergenic spacers of plastid sequences. Yet, our results affirm that non-SRH evolution is not confined to these genomic regions. For example, in a dataset of first and second codon positions from 212 genes (Cannon, et al. 2016b), 34% of partitions showed significant evidence of violating the SRH assumptions according to the MaxSymTest. The tree inferred from the partitions that show significant model violation differs a great deal in its topology from the tree inferred from the partitions that do not show significant model violation, particularly with respect to the placement of the focal taxon Xenoturbella (Fig. 3). Looking at the results of the two other tests, MaxSymTest_mar and MaxSymTest_int, we noticed that all the partitions that failed the MaxSymTest also failed the MaxSymTest_mar, suggesting that those partitions violate the models mainly due to non-stationarity. Based on this observation, one hypothesis to explain the differences between the trees in Figure 3 is that the rearrangements of the tree in Figure 3b occur because the partitions that pass and fail the test differ in their GC content, and that the two trees tend to group together clades with similar GC content (e.g. as in Betancur-r, et al. 2013). However, it is hard to discern any clear evidence for this from the GC content of the clades presented in Figure 3.

Fig. 3 Maximum likelihood trees of Metazoan relationships based on analysis of Cannon 2016 dataset. a) the T_all inferred from all 424 partitions and T_pass inferred from 281 partitions that passed the MaxSymTest. b) the tree inferred from 143 partitions that failed the MaxSymTest.

Red numbers at the internal branches indicate the bootstrap support values that are less than 100% under the best fitting model. Numbers in curly brackets show the GC content of the group. Spiralia 1 consists of Rotifera, Platyhelminthes and Gastrotricha. Spiralia 2 consists of Bryozoa and Entoprocta. Spiralia 3 consists of Annelida, Lophophorata, Nemertea and Mollusca.

The results of our study also provide some insights into the likely cause of model violation in the datasets we examined. Figure 2 shows that violation of marginal symmetry (assessed with MaxSymTest_mar) was much more common than violation of internal symmetry (assessed with MaxSymTest_int). This suggests that non-stationarity, which is associated with marginal symmetry, is a more common cause of systematic bias than non-homogeneity in the datasets that we examined (see also Jayaswal, et al. 2005; Ababneh, et al. 2006; Song, et al. 2010). This result hints that the development and application of non-stationary models (e.g. Yang 1994; Roberts and Yang 1995; Yap and Speed 2005) may be an important avenue towards reducing systematic bias in future analyses. Moreover, our results show a clear preference for simple substitution models with a single transition/transversion ratio over more complex models such as GTR. This suggests that developing non-stationary models with a single parameter for the transition/transversion ratio might be sufficient to reduce systematic bias in phylogenetic analysis.

One limitation of the tests that we propose in this paper is that their power will be limited if there are few differences between the sequences being examined. Indeed, our analyses show that in our representative sample of more than 3500 partitions from published datasets, roughly 20% of the variance in whether a partition passes or fails a given test can be attributed to the number of observed differences between the sequences. The remaining ~80% of the variance could therefore be attributable to other processes, such as variation in the extent of model violation among partitions. This suggests that we should be cautiously optimistic: although a lack of power on small or slowly evolving partitions may induce some false negatives (i.e. failures to identify partitions that have evolved under non-SRH conditions), the tests we propose still have significant power to identify partitions that show evidence of model violation. It is possible that removing such partitions from phylogenetic analyses may improve the accuracy of results by reducing the overall burden of model violation on the inference of the tree topology. We hope that our implementation of these tests in the user-friendly software IQ-TREE will allow empirical phylogeneticists to continue to explore whether this is the case.

Acknowledgments

The authors would like to thank Lars Jermiin for discussions and three anonymous referees for providing thoughtful comments on this manuscript.

References

Ababneh F, Jermiin LS, Ma C, Robinson J. 2006. Matched-pairs tests of homogeneity with applications to homologous nucleotide sequences. Bioinformatics 22:1225–1231.
Anderson FE, Bergman A, Cheng SH, Pankey MS, Valinassab T. 2013. Data from: Lights out: the evolution of bacterial bioluminescence in Loliginidae. In: Dryad Data Repository.
Anderson FE, Bergman A, Cheng SH, Pankey MS, Valinassab T. 2014. Lights out: the evolution of bacterial bioluminescence in Loliginidae. Hydrobiologia 725:189–203.
Barton K. 2009. MuMIn: multi-model inference, R package version 0.12.0. http://r-forge.r-project.org/projects/mumin/.
Bates D, Mächler M, Bolker B, Walker S. 2015. Fitting Linear Mixed-Effects Models Using lme4. J. Stat. Softw. 67:1–48.
Bazinet AL, Zwickl DJ, Cummings MP. 2014. A gateway for phylogenetic analysis powered by grid computing featuring GARLI 2.0. Syst. Biol. 63:812–818.
Bergsten J, Nilsson AN, Ronquist F. 2013a. Bayesian tests of topology hypotheses with an example from diving beetles. Syst. Biol. 62:660–673.
Bergsten J, Nilsson AN, Ronquist F. 2013b. Data from: Bayesian tests of topology hypotheses with an example from diving beetles. In: Dryad Data Repository.
Betancur-r R, Li C, Munroe TA, Ballesteros JA, Ortí G. 2013. Addressing gene tree discordance and non-stationarity to resolve a multi-locus phylogeny of the flatfishes (Teleostei: Pleuronectiformes). Syst. Biol. 62:763–785.
Blanquart S, Lartillot N. 2006. A Bayesian compound stochastic process for modeling nonstationary and nonhomogeneous sequence evolution. Mol. Biol. Evol. 23:2058–2071.
Bogdanowicz D, Giaro K, Wrobel B. 2012. TreeCmp: Comparison of Trees in Polynomial Time. Evol Bioinform 8:475–487.
Bouckaert R, Heled J, Kühnert D, Vaughan T, Wu C-H, Xie D, Suchard MA, Rambaut A, Drummond AJ. 2014. BEAST 2: a software platform for Bayesian evolutionary analysis. PLoS Comp. Biol. 10:e1003537.
Bourlat SJ, Juliusdottir T, Lowe CJ, Freeman R, Aronowicz J, Kirschner M, Lander ES, Thorndyke M, Nakano H, Kohn AB. 2006. Deuterostome phylogeny reveals monophyletic chordates and the new phylum Xenoturbellida. Nature 444:85.
Boussau B, Gouy M. 2006. Efficient likelihood computations with nonreversible models of evolution. Syst. Biol. 55:756–768.
Bowker AH. 1948. A test for symmetry in contingency tables. J Am Stat Assoc 43:572–574.
Brady A, Salzberg S. 2011. PhymmBL expanded: confidence scores, custom databases, parallelization and more. Nat. Methods 8:367.
Broughton RE, Betancur RR, Li C, Arratia G, Orti G. 2013a. Data from: Multi-locus phylogenetic analysis reveals the pattern and tempo of bony fish evolution. In: Dryad Data Repository.
Broughton RE, Betancur RR, Li C, Arratia G, Orti G. 2013b. Multi-locus phylogenetic analysis reveals the pattern and tempo of bony fish evolution. PLoS Curr 5.
Brown JM. 2014. Detection of Implausible Phylogenetic Inferences Using Posterior Predictive Assessment of Model Fit. Syst. Biol. 63:334–348.
Brown JM, ElDabaje R. 2009. PuMA: Bayesian analysis of partitioned (and unpartitioned) model adequacy. Bioinformatics 25:537–538.
Brown JM, Thomson RC. 2017. Bayes Factors Unmask Highly Variable Information Content, Bias, and Extreme Influence in Phylogenomic Analyses. Syst. Biol. 66:517–530.
Brown JM, Thomson RC. 2018. Evaluating Model Performance in Evolutionary Biology. Annu Rev Ecol Evol S 49.
Brown RM, Siler CD, Das I, Min PY. 2012a. Data from: Testing the phylogenetic affinities of Southeast Asia’s rarest geckos: Flap-legged geckos (Luperosaurus), Flying geckos (Ptychozoon) and their relationship to the pan-Asian genus Gekko. In: Dryad Data Repository.
Brown RM, Siler CD, Das I, Min Y. 2012b. Testing the phylogenetic affinities of Southeast Asia’s rarest geckos: Flap-legged geckos (Luperosaurus), Flying geckos (Ptychozoon) and their relationship to the pan-Asian genus Gekko. Mol Phylogenet Evol 63:915–921.
Cannon JT, Vellutini BC, Smith J, 3rd., Ronquist F, Jondelius U, Hejnol A. 2016a. Xenacoelomorpha is the sister group to Nephrozoa. Nature 530:89–93.
Cannon JT, Vellutini BC, Smith J, Ronquist F, Jondelius U, Hejnol A. 2016b. Data from: Xenacoelomorpha is the sister group to Nephrozoa. In: Dryad Data Repository.
Chernomor O, von Haeseler A, Minh BQ. 2016. Terrace Aware Data Structure for Phylogenomic Inference from Supermatrices. Syst. Biol. 65:997–1008.
Cognato AI, Vogler AP. 2001a. Data from: Exploring data interaction and nucleotide alignment in a multiple gene analysis of Ips (Coleoptera: Scolytinae). In: Dryad Data Repository.
Cognato AI, Vogler AP. 2001b. Exploring data interaction and nucleotide alignment in a multiple gene analysis of Ips (Coleoptera: Scolytinae). Syst. Biol. 50:758–780.
Day JJ, Peart CR, Brown KJ, Bills R, Friel JP, Moritz T. 2013. Data from: Continental diversification of an African catfish radiation (Mochokidae: Synodontis). In: Dryad Data Repository.
Day JJ, Peart CR, Brown KJ, Friel JP, Bills R, Moritz T. 2013. Continental diversification of an African catfish radiation (Mochokidae: Synodontis). Syst. Biol. 62:351–365.
Delsuc F, Brinkmann H, Philippe H. 2005. Phylogenomics and the reconstruction of the tree of life. Nature Reviews Genetics 6:361.
Devitt TJ, Cameron Devitt SE, Hollingsworth BD, McGuire JA, Moritz C. 2013. Data from: Montane refugia predict population genetic structure in the Large-blotched Ensatina salamander. In: Dryad Data Repository.
Devitt TJ, Devitt SE, Hollingsworth BD, McGuire JA, Moritz C. 2013. Montane refugia predict population genetic structure in the Large-blotched Ensatina salamander. Mol. Ecol. 22:1650–1665.
Dornburg A, Moore JA, Webster R, Warren DL, Brandley MC, Iglesias TL, Wainwright PC, Near TJ. 2012a. Data from: Molecular phylogenetics of squirrelfishes and soldierfishes (Teleostei:Beryciformes: Holocentridae): reconciling more than 100 years of taxonomic confusion. In: Dryad Data Repository.
Dornburg A, Moore JA, Webster R, Warren DL, Brandley MC, Iglesias TL, Wainwright PC, Near TJ. 2012b. Molecular phylogenetics of squirrelfishes and soldierfishes (Teleostei: Beryciformes: Holocentridae): reconciling more than 100 years of taxonomic confusion. Mol Phylogenet Evol 65:727–738.
Drummond AJ, Rambaut A. 2007. BEAST: Bayesian evolutionary analysis by sampling trees. BMC Evol. Biol. 7:214.
Duchene DA, Duchene S, Ho SYW. 2017. New Statistical Criteria Detect Phylogenetic Bias Caused by Compositional Heterogeneity. Mol. Biol. Evol. 34:1529–1534.
Dunn M, Greenhill SJ, Levinson SC, Gray RD. 2011. Evolved structure of language shows lineage-specific trends in word-order universals. Nature 473:79.
Dutheil J, Boussau B. 2008. Non-homogeneous models of sequence evolution in the Bio++ suite of libraries and programs. BMC Evol. Biol. 8:255.
Eisen JA. 1998. Phylogenomics: improving functional predictions for uncharacterized genes by evolutionary analysis. Genome Res. 8:163–167.
Faircloth BC, Sorenson L, Santini F, Alfaro ME. 2013a. Data from: A phylogenomic perspective on the radiation of ray-finned fishes based upon targeted sequencing of ultraconserved elements (UCEs). In: Dryad Data Repository.
Faircloth BC, Sorenson L, Santini F, Alfaro ME. 2013b. A Phylogenomic Perspective on the Radiation of Ray-Finned Fishes Based upon Targeted Sequencing of Ultraconserved Elements (UCEs). PLoS One 8:e65923.
Farrell LE, Roman J, Sunquist ME. 2000. Dietary separation of sympatric carnivores identified by molecular analysis of scats. Mol. Ecol. 9:1583–1590.
Felsenstein J. 2004. Inferring phylogenies. Sunderland (MA): Sinauer Associates.
Fong JJ, Brown JM, Fujita MK, Boussau B. 2012a. Data from: A phylogenomic approach to vertebrate phylogeny supports a turtle-archosaur affinity and a possible paraphyletic Lissamphibia. In: Dryad Data Repository.
Fong JJ, Brown JM, Fujita MK, Boussau B. 2012b. A phylogenomic approach to vertebrate phylogeny supports a turtle-archosaur affinity and a possible paraphyletic lissamphibia. PLoS One 7:e48990.
Foster PG. 2004. Modeling compositional heterogeneity. Syst. Biol. 53:485–495.
Foster PG, Hickey DA. 1999. Compositional bias may affect both DNA-based and protein-based phylogenetic reconstructions. J. Mol. Evol. 48:284–290.
Gardner MJ, Hall N, Fung E, White O, Berriman M, Hyman RW, Carlton JM, Pain A, Nelson KE, Bowman S. 2002. Genome sequence of the human malaria parasite Plasmodium falciparum. Nature 419:498.
Goldman N. 1993. Statistical tests of models of DNA substitution. J. Mol. Evol. 36:182–198.
Goremykin V, Hellwig F. 2005. Evidence for the most basal split in land plants dividing bryophyte and tracheophyte lineages. Plant Syst. Evol. 254:93–103.
Gray RD, Drummond AJ, Greenhill SJ. 2009. Language Phylogenies Reveal Expansion Pulses and Pauses in Pacific Settlement. Science 323:479.
Grenfell BT, Pybus OG, Gog JR, Wood JLN, Daly JM, Mumford JA, Holmes EC. 2004. Unifying the Epidemiological and Evolutionary Dynamics of Pathogens. Science 303:327.
Groussin M, Boussau B, Gouy M. 2013. A branch-heterogeneous model of protein evolution for efficient inference of ancestral sequences. Syst. Biol. 62:523–538.
Guindon S, Dufayard J-F, Lefort V, Anisimova M, Hordijk W, Gascuel O. 2010. New algorithms and methods to estimate maximum-likelihood phylogenies: assessing the performance of PhyML 3.0. Syst. Biol. 59:307–321.
Hasegawa M, Kishino H, Yano T-a. 1985. Dating of the human-ape splitting by a molecular clock of mitochondrial DNA. J. Mol. Evol. 22:160–174.
Ho JW, Adams CE, Lew JB, Matthews TJ, Ng CC, Shahabi-Sirjani A, Tan LH, Zhao Y, Easteal S, Wilson SR. 2006. SeqVis: visualization of compositional heterogeneity in large alignments of nucleotides. Bioinformatics 22:2162–2163.
Ho SY, Jermiin L. 2004. Tracing the decay of the historical signal in biological sequence data. Syst. Biol. 53:623–637.
Hoang DT, Chernomor O, von Haeseler A, Minh BQ, Vinh LS. 2018. UFBoot2: Improving the Ultrafast Bootstrap Approximation. Mol. Biol. Evol. 35:518–522.
Höhna S, Landis MJ, Heath TA, Boussau B, Lartillot N, Moore BR, Huelsenbeck JP, Ronquist F. 2016. RevBayes: Bayesian phylogenetic inference using graphical models and an interactive model-specification language. Syst. Biol. 65:726–736.
Horn JW, Xi Z, Riina R, Peirson JA, Yang Y, Dorsey BL, Berry PE, Davis CC, Wurdack KJ. 2014a. Data from: Evolutionary bursts in Euphorbia (Euphorbiaceae) are linked with photosynthetic pathway. In: Dryad Data Repository.
Horn JW, Xi Z, Riina R, Peirson JA, Yang Y, Dorsey BL, Berry PE, Davis CC, Wurdack KJ. 2014b. Evolutionary bursts in Euphorbia (Euphorbiaceae) are linked with photosynthetic pathway. Evolution 68:3485–3504.
Hyman IT, Ho SY, Jermiin LS. 2007. Molecular phylogeny of Australian Helicarionidae, Euconulidae and related groups (Gastropoda: Pulmonata: Stylommatophora) based on mitochondrial DNA. Mol. Phylogen. Evol. 45:792–812.
Jayaswal V, Ababneh F, Jermiin LS, Robinson J. 2011. Reducing Model Complexity of the General Markov Model of Evolution. Mol. Biol. Evol. 28:3045–3059.
Jayaswal V, Robinson J, Jermiin L. 2007. Estimation of Phylogeny and Invariant Sites under the General Markov Model of Nucleotide Sequence Evolution. Syst. Biol. 56:155–162.
Jayaswal V, Wong TK, Robinson J, Poladian L, Jermiin LS. 2014. Mixture models of nucleotide sequence evolution that account for heterogeneity in the substitution process across sites and across lineages. Syst. Biol. 63:726–742.
Jermiin L, Ho SY, Ababneh F, Robinson J, Larkum AW. 2004. The biasing effect of compositional heterogeneity on phylogenetic estimates may be underestimated. Syst. Biol. 53:638–643.
Jermiin LS, Jayaswal V, Ababneh FM, Robinson J. 2017. Identifying Optimal Models of Evolution. In: Keith JM, editor. Bioinformatics. Melbourne: Humana Press, New York, NY. p. 379–420.
Kalyaanamoorthy S, Minh BQ, Wong TKF, von Haeseler A, Jermiin LS. 2017. ModelFinder: fast model selection for accurate phylogenetic estimates. Nat. Methods 14:587–589.
Kawahara AY, Rubinoff D. 2013a. Convergent evolution of morphology and habitat use in the explosive Hawaiian fancy case caterpillar radiation. J. Evol. Biol. 26:1763–1773.
Kawahara AY, Rubinoff D. 2013b. Data from: Convergent evolution in the explosive Hawaiian Fancy Cased caterpillar radiation. In: Dryad Data Repository.
Kimura M. 1980. A simple method for estimating evolutionary rates of base substitutions through comparative studies of nucleotide sequences. J. Mol. Evol. 16:111–120.
Kishino H, Miyata T, Hasegawa M. 1990. Maximum likelihood inference of protein phylogeny and the origin of chloroplasts. J. Mol. Evol. 31:151–160.
Knight R, Maxwell P, Birmingham A, Carnes J, Caporaso JG, Easton BC, Eaton M, Hamady M, Lindsay H, Liu Z. 2007. PyCogent: a toolkit for making sense from sequence. Genome biology 8:R171.
Kumar S, Filipski AJ, Battistuzzi FU, Kosakovsky Pond SL, Tamura K. 2012. Statistics and truth in phylogenomics. Mol. Biol. Evol. 29:457–472.
Kumar S, Gadagkar SR. 2001. Disparity index: a simple statistic to measure and test the homogeneity of substitution patterns between molecular sequences. Genetics 158:1321–1327.
Lartillot N, Delsuc F. 2012a. Data from: Joint reconstruction of divergence times and life-history evolution in placental mammals using a phylogenetic covariance model. In: Dryad Data Repository.
Lartillot N, Delsuc F. 2012b. Joint reconstruction of divergence times and life-history evolution in placental mammals using a phylogenetic covariance model. Evolution 66:1773–1787.
Lartillot N, Philippe H. 2004. A Bayesian mixture model for across-site heterogeneities in the amino-acid replacement process. Mol. Biol. Evol. 21:1095–1109.
Martijn J, Vosseberg J, Guy L, Offre P, Ettema TJ. 2018. Deep mitochondrial origin outside the sampled alphaproteobacteria. Nature.
Mäser P, Thomine S, Schroeder JI, Ward JM, Hirschi K, Sze H, Talke IN, Amtmann A, Maathuis FJM, Sanders D, et al. 2001. Phylogenetic Relationships within Cation Transporter Families of Arabidopsis. Plant Physiol. 126:1646.
McCormack JE, Harvey MG, Faircloth BC, Crawford NG, Glenn TC, Brumfield RT. 2013a. Data from: A phylogeny of birds based on over 1,500 loci collected by target enrichment and high-throughput sequencing. In: Dryad Data Repository.
McCormack JE, Harvey MG, Faircloth BC, Crawford NG, Glenn TC, Brumfield RT. 2013b. A phylogeny of birds based on over 1,500 loci collected by target enrichment and high-throughput sequencing. PLoS One 8:e54848.
Mir A, Russello F. 2010. The mean value of the squared path-difference distance for rooted phylogenetic trees. J Math Anal Appl 371:168–176.
Moyle RG, Oliveros CH, Andersen MJ, Hosner PA, Benz BW, Manthey JD, Travers SL, Brown RM, Faircloth BC. 2016a. Data from: Tectonic collision and uplift of Wallacea triggered the global songbird radiation. In: Dryad Data Repository.
Moyle RG, Oliveros CH, Andersen MJ, Hosner PA, Benz BW, Manthey JD, Travers SL, Brown RM, Faircloth BC. 2016b. Tectonic collision and uplift of Wallacea triggered the global songbird radiation. Nat Commun 7:12709.
Murray EA, Carmichael AE, Heraty JM. 2013a. Ancient host shifts followed by host conservatism in a group of ant parasitoids. Proc Biol Sci 280:20130495.
Murray EA, Carmichael AE, Heraty JM. 2013b. Data from: Ancient host shifts followed by host conservatism in a group of ant parasitoids. In: Dryad Data Repository.
Murray S, Jørgensen MF, Ho SY, Patterson DJ, Jermiin LS. 2005. Improving the analysis of dinoflagellate phylogeny based on rDNA. Protist 156:269–286.
Nabholz B, Künstner A, Wang R, Jarvis ED, Ellegren H. 2011. Dynamic Evolution of Base Composition: Causes and Consequences in Avian Phylogenomics. Mol. Biol. Evol. 28:2197–2210.
Nakagawa S, Schielzeth H. 2013. A general and simple method for obtaining R2 from generalized linear mixed-effects models. Methods in Ecology and Evolution 4:133–142.
Nesnidal MP, Helmkampf M, Bruchhaus I, Hausdorf B. 2010. Compositional Heterogeneity and Phylogenomic Inference of Metazoan Relationships. Mol. Biol. Evol. 27:2095–2104.
Nguyen LT, Schmidt HA, von Haeseler A, Minh BQ. 2015. IQ-TREE: a fast and effective stochastic algorithm for estimating maximum-likelihood phylogenies. Mol. Biol. Evol. 32:268–274.
Oaks JR. 2011a. Data from: A time-calibrated species tree of Crocodylia reveals a recent radiation of the true crocodiles. In: Dryad Data Repository.
Oaks JR. 2011b. A time-calibrated species tree of Crocodylia reveals a recent radiation of the true crocodiles. Evolution 65:3285–3297.
Paton T, Haddrath O, Baker AJ. 2002. Complete mitochondrial DNA genome sequences show that modern birds are not descended from transitional shorebirds. Proceedings of the Royal Society of London B: Biological Sciences 269:839–846.
Philippe H, Brinkmann H, Copley RR, Moroz LL, Nakano H, Poustka AJ, Wallberg A, Peterson KJ, Telford MJ. 2011. Acoelomorph flatworms are deuterostomes related to Xenoturbella. Nature 470:255.
Philippe H, Delsuc F, Brinkmann H, Lartillot N. 2005. Phylogenomics. Annu Rev Ecol Evol S 36:541–562.
Phillips MJ, Delsuc F, Penny D. 2004. Genome-Scale Phylogeny and the Detection of Systematic Biases. Mol. Biol. Evol. 21:1455–1458.
Rightmyer MG, Griswold T, Brady SG. 2013a. Data from: Phylogeny and systematics of the bee genus Osmia (Hymenoptera: Megachilidae) with emphasis on North American Melanosmia: subgenera, synonymies, and nesting biology revisited. In: Dryad Data Repository.
Rightmyer MG, Griswold T, Brady SG. 2013b. Phylogeny and systematics of the bee genus Osmia (Hymenoptera: Megachilidae) with emphasis on North American Melanosmia: subgenera, synonymies and nesting biology revisited. Syst. Entomol. 38:561–576.
Roberts D, Yang Z. 1995. On the use of nucleic acid sequences to infer early branchings in the tree of life. Mol. Biol. Evol. 12:451–458.
Ronquist F, Teslenko M, Van Der Mark P, Ayres DL, Darling A, Höhna S, Larget B, Liu L, Suchard MA, Huelsenbeck JP. 2012. MrBayes 3.2: efficient Bayesian phylogenetic inference and model choice across a large model space. Syst. Biol. 61:539–542.
Rzhetsky A, Nei M. 1995. Tests of applicability of several substitution models for DNA sequence data. Mol. Biol. Evol. 12:131–151.
Salipante SJ, Horwitz MS. 2006. Phylogenetic fate mapping. Proceedings of the National Academy of Sciences 103:5448.
Sand A, Pedersen CNS, Brodal GS, Johansen J, Holt MK, Mailund T. 2014. tqDist: a library for computing the quartet and triplet distances between binary or general trees. Bioinformatics 30:2079–2080.
Sauquet H, Ho SY, Gandolfo MA, Jordan GJ, Wilf P, Cantrill DJ, Bayly MJ, Bromham L, Brown GK, Carpenter RJ, et al. 2012. Testing the impact of calibration on molecular divergence times using a fossil-rich group: the case of Nothofagus (Fagales). Syst. Biol. 61:289–313.
Sauquet H, Ho SYW, Gandolfo MA, Jordan GJ, Wilf P, Cantrill DJ, Bayly MJ, Bromham L, Brown GK, Carpenter RJ, et al. 2011. Data from: Testing the impact of calibration on molecular divergence times using a fossil-rich group: the case of Nothofagus (Fagales). In: Dryad Data Repository.
Seago AE, Giorgi JA, Li J, Ślipiński A. 2011a. Data from: Phylogeny, classification and evolution of ladybird beetles (Coleoptera: Coccinellidae) based on simultaneous analysis of molecular and morphological data. In: Dryad Data Repository.
Seago AE, Giorgi JA, Li J, Ślipiński A. 2011b. Phylogeny, classification and evolution of ladybird beetles (Coleoptera: Coccinellidae) based on simultaneous analysis of molecular and morphological data. Mol. Phylogen. Evol. 60:137–151.
Sharanowski BJ, Dowling APG, Sharkey MJ. 2011a. Data from: Molecular phylogenetics of Braconidae (Hymenoptera: Ichneumonoidea) based on multiple nuclear genes and implications for classification. In: Dryad Data Repository.
Sharanowski BJ, Dowling APG, Sharkey MJ. 2011b. Molecular phylogenetics of Braconidae (Hymenoptera: Ichneumonoidea), based on multiple nuclear genes, and implications for classification. Syst. Entomol. 36:549–572.
Sheffield NC, Song H, Cameron SL, Whiting MF. 2009. Nonstationary Evolution and Compositional Heterogeneity in Beetle Mitochondrial Phylogenomics. Syst. Biol. 58:381–394.
Shimodaira H. 2002. An approximately unbiased test of phylogenetic tree selection. Syst. Biol. 51:492–508.
Shimodaira H, Hasegawa M. 1999. Multiple comparisons of log-likelihoods with applications to phylogenetic inference. Mol. Biol. Evol. 16:1114–1116.
Siler C, Brown RM, Oliveros CH, Santanen A. 2013. Data from: Multilocus phylogeny reveals unexpected diversification patterns in Asian Wolf Snakes (genus Lycodon). In: Dryad Data Repository.
Siler CD, Oliveros CH, Santanen A, Brown RM. 2013. Multilocus phylogeny reveals unexpected diversification patterns in Asian wolf snakes (genus Lycodon). Zool. Scr. 42:262–277.
Stamatakis A. 2014. RAxML version 8: a tool for phylogenetic analysis and post-analysis of large phylogenies. Bioinformatics 30:1312–1313.
Steel MA, Penny D. 1993. Distributions of Tree Comparison Metrics - Some New Results. Syst. Biol. 42:126–141.
Stuart A. 1955. A Test for Homogeneity of the Marginal Distributions in a Two-Way Classification. Biometrika 42:412–416.
Sullivan J, Joyce P. 2005. Model selection in phylogenetics. Annual Review of Ecology Evolution and Systematics 36:445–466.
Sumner JG, Fernandez-Sanchez J, Jarvis PD. 2012. Lie Markov models. J. Theor. Biol. 298:16–31.
Swofford DL. 2001. Paup*: Phylogenetic analysis using parsimony (and other methods) 4.0. B5.
Tarrío R, Rodríguez-Trelles F, Ayala FJ. 2001. Shared nucleotide composition biases among species and their impact on phylogenetic reconstructions of the Drosophilidae. Mol. Biol. Evol. 18:1464–1473.
Tolley KA, Townsend TM, Vences M. 2013a. Data from: Large-scale phylogeny of chameleons suggests African origins and Eocene diversification. In: Dryad Data Repository.
Tolley KA, Townsend TM, Vences M. 2013b. Large-scale phylogeny of chameleons suggests African origins and Eocene diversification. Proc Biol Sci 280:20130184.
Unmack PJ, Allen GR, Johnson JB. 2013a. Data from: Phylogeny and biogeography of rainbowfishes (Melanotaeniidae) from Australia and New Guinea. In: Dryad Data Repository.
Unmack PJ, Allen GR, Johnson JB. 2013b. Phylogeny and biogeography of rainbowfishes (Melanotaeniidae) from Australia and New Guinea. Mol Phylogenet Evol 67:15–27.
Wainwright PC, Smith WL, Price SA, Tang KL, Sparks JS, Ferry LA, Kuhn KL, Eytan RI, Near TJ. 2012. The evolution of pharyngognathy: a phylogenetic and functional appraisal of the pharyngeal jaw key innovation in labroid fishes and beyond. Syst. Biol. 61:1001–1027.
Wainwright PC, Smith WL, Price SA, Tang KL, Sparks JS, Ferry LA, Kuhn KL, Near TJ. 2012. Data from: The evolution of pharyngognathy: a phylogenetic and functional appraisal of the pharyngeal jaw key innovation in labroid fishes and beyond. In: Dryad Data Repository.
Weiss G, von Haeseler A. 2003. Testing Substitution Models Within a Phylogenetic Tree. Mol. Biol. Evol. 20:572–578.
Wood HM, Matzke NJ, Gillespie RG, Griswold CE. 2012. Data from: Treating fossils as terminal taxa in divergence time estimation reveals ancient vicariance patterns in the palpimanoid spiders. In: Dryad Data Repository.
Wood HM, Matzke NJ, Gillespie RG, Griswold CE. 2013. Treating fossils as terminal taxa in divergence time estimation reveals ancient vicariance patterns in the palpimanoid spiders. Syst. Biol. 62:264–284.
Woodhams MD, Fernandez-Sanchez J, Sumner JG. 2015. A New Hierarchy of Phylogenetic Models Consistent with Heterogeneous Substitution Rates. Syst. Biol. 64:638–650.
Worobey M, Han G, Rambaut A. 2014a. Data from: A synchronized global sweep of the internal genes of modern avian influenza virus. In: Dryad Data Repository.
Worobey M, Han GZ, Rambaut A. 2014b. A synchronized global sweep of the internal genes of modern avian influenza virus. Nature 508:254–257.
Yang Z. 1994. Estimating the pattern of nucleotide substitution. J. Mol. Evol. 39:105–111.
Yang Z, Rannala B. 2012. Molecular phylogenetics: principles and practice. Nat. Rev. Genet. 13:303–314.
Yao H, Kristensen DM, Mihalek I, Sowa ME, Shaw C, Kimmel M, Kavraki L, Lichtarge O. 2003. An accurate, sensitive, and scalable method to identify functional sites in protein structures. J. Mol. Biol. 326:255–261.
Yao Y-G, Bravi CM, Bandelt H-J. 2004. A call for mtDNA data quality control in forensic science. Forensic Sci. Int. 141:1–6.
Yap VB, Speed T. 2005. Rooting a phylogenetic tree with nonreversible substitution models. BMC Evol. Biol. 5:2.
Zou L, Susko E, Field C, Roger AJ. 2012. Fitting nonstationary general-time-reversible models to obtain edge-lengths and frequencies for the Barry–Hartigan model. Syst. Biol. 61:927–940.


Posted May 16, 2019.

Download PDF

Supplementary Material

Citation Tools

Subject Area

Molecular Biology

Subject Areas

All Articles

Animal Behavior and Cognition (5200)
Biochemistry (11703)
Bioengineering (8718)
Bioinformatics (29127)
Biophysics (14930)
Cancer Biology (12048)
Cell Biology (17353)
Clinical Trials (138)
Developmental Biology (9406)
Ecology (14143)
Epidemiology (2067)
Evolutionary Biology (18266)
Genetics (12219)
Genomics (16765)
Immunology (11841)
Microbiology (28003)
Molecular Biology (11551)
Neuroscience (60804)
Paleontology (450)
Pathology (1864)
Pharmacology and Toxicology (3229)
Physiology (4939)
Plant Biology (10383)
Scientific Communication and Education (1679)
Synthetic Biology (2877)
Systems Biology (7333)
Zoology (1642)

[1] ↵
Ababneh F, Jermiin LS, Ma C, Robinson J. 2006. Matched-pairs tests of homogeneity with applications to homologous nucleotide sequences. Bioinformatics 22:1225–1231.
OpenUrl CrossRef PubMed Web of Science

[2] Anderson FE, Bergman A, Cheng SH, Pankey MS, Valinassab T. 2013. Data from: Lights out: the evolution of bacterial bioluminescence in Loliginidae. In: Dryad Data Repository.

[3] Anderson FE, Bergman A, Cheng SH, Pankey MS, Valinassab T. 2014. Lights out: the evolution of bacterial bioluminescence in Loliginidae. Hydrobiologia 725:189–203.
OpenUrl CrossRef Web of Science

[4] ↵
Barton K. 2009. MuMIn: multi-model inference, R package version 0.12.0. http://r-forge.r-project.org/projects/mumin/.

[5] ↵
Bates D, Mächler M, Bolker B, Walker S. 2015. Fitting Linear Mixed-Effects Models Using lme4. 2015 67:48.
OpenUrl CrossRef

[6] ↵
Bazinet AL, Zwickl DJ, Cummings MP. 2014. A gateway for phylogenetic analysis powered by grid computing featuring GARLI 2.0. Syst. Biol. 63:812–818.
OpenUrl CrossRef PubMed

[7] Bergsten J, Nilsson AN, Ronquist F. 2013a. Bayesian tests of topology hypotheses with an example from diving beetles. Syst. Biol. 62:660–673.
OpenUrl CrossRef PubMed

[8] Bergsten J, Nilsson AN, Ronquist F. 2013b. Data from: Bayesian tests of topology hypotheses with an example from diving beetles. In: Dryad Data Repository.

[9] ↵
Betancur-r R, Li C, Munroe TA, Ballesteros JA, Ortí G. 2013. Addressing gene tree discordance and non-stationarity to resolve a multi-locus phylogeny of the flatfishes (Teleostei: Pleuronectiformes). Syst. Biol. 62:763–785.
OpenUrl CrossRef PubMed

[10] ↵
Blanquart S, Lartillot N. 2006. A Bayesian compound stochastic process for modeling nonstationary and nonhomogeneous sequence evolution. Mol. Biol. Evol. 23:2058–2071.
OpenUrl CrossRef PubMed Web of Science

[11] ↵
Bogdanowicz D, Giaro K, Wrobel B. 2012. TreeCmp: Comparison of Trees in Polynomial Time. Evol Bioinform 8:475–487.
OpenUrl

[12] ↵
Bouckaert R, Heled J, Kühnert D, Vaughan T, Wu C-H, Xie D, Suchard MA, Rambaut A, Drummond AJ. 2014. BEAST 2: a software platform for Bayesian evolutionary analysis. PLoS Comp. Biol. 10:e1003537.
OpenUrl

[13] ↵
Bourlat SJ, Juliusdottir T, Lowe CJ, Freeman R, Aronowicz J, Kirschner M, Lander ES, Thorndyke M, Nakano H, Kohn AB. 2006. Deuterostome phylogeny reveals monophyletic chordates and the new phylum Xenoturbellida. Nature 444:85.
OpenUrl CrossRef PubMed Web of Science

[14] ↵
Boussau B, Gouy M. 2006. Efficient likelihood computations with nonreversible models of evolution. Syst. Biol. 55:756–768.
OpenUrl CrossRef PubMed Web of Science

[15] ↵
Bowker AH. 1948. A test for symmetry in contingency tables. J Am Stat Assoc 43:572–574.
OpenUrl CrossRef PubMed Web of Science

[16] ↵
Brady A, Salzberg S. 2011. PhymmBL expanded: confidence scores, custom databases, parallelization and more. Nat. Methods 8:367.
OpenUrl CrossRef PubMed Web of Science

[17] Broughton RE, Betancur RR, Li C, Arratia G, Orti G. 2013a. Data from: Multi-locus phylogenetic analysis reveals the pattern and tempo of bony fish evolution. In: Dryad Data Repository.

[18] Broughton RE, Betancur RR, Li C, Arratia G, Orti G. 2013b. Multi-locus phylogenetic analysis reveals the pattern and tempo of bony fish evolution. PLoS Curr 5.

[19] ↵
Brown JM. 2014. Detection of Implausible Phylogenetic Inferences Using Posterior Predictive Assessment of Model Fit. Syst. Biol. 63:334–348.
OpenUrl CrossRef PubMed

[20] ↵
Brown JM, ElDabaje R. 2009. PuMA: Bayesian analysis of partitioned (and unpartitioned) model adequacy. Bioinformatics 25:537–538.
OpenUrl CrossRef PubMed Web of Science

[21] ↵
Brown JM, Thomson RC. 2017. Bayes Factors Unmask Highly Variable Information Content, Bias, and Extreme Influence in Phylogenomic Analyses. Syst. Biol. 66:517–530.
OpenUrl

[22] ↵
Brown JM, Thomson RC. 2018. Evaluating Model Performance in Evolutionary Biology. Annu Rev Ecol Evol S 49:null.

[23] Brown RM, Siler CD, Das I, Min PY. 2012a. Data from: Testing the phylogenetic affinities of Southeast Asia’s rarest geckos: Flap-legged geckos (Luperosaurus), Flying geckos (Ptychozoon) and their relationship to the pan-Asian genus Gekko. In: Dryad Data Repository.

[24] Brown RM, Siler CD, Das I, Min Y. 2012b. Testing the phylogenetic affinities of Southeast Asia’s rarest geckos: Flap-legged geckos (Luperosaurus), Flying geckos (Ptychozoon) and their relationship to the pan-Asian genus Gekko. Mol Phylogenet Evol 63:915–921.
OpenUrl CrossRef PubMed

[25] ↵
Cannon JT, Vellutini BC, Smith J, 3rd., Ronquist F, Jondelius U, Hejnol A. 2016a. Xenacoelomorpha is the sister group to Nephrozoa. Nature 530:89–93.
OpenUrl CrossRef PubMed

[26] ↵
Cannon JT, Vellutini BC, Smith J, Ronquist F, Jondelius U, Hejnol A. 2016b. Data from: Xenacoelomorpha is the sister group to Nephrozoa. In: Dryad Data Repository.

[27] ↵
Chernomor O, von Haeseler A, Minh BQ. 2016. Terrace Aware Data Structure for Phylogenomic Inference from Supermatrices. Syst. Biol. 65:997–1008.
OpenUrl CrossRef PubMed

[28] Cognato AI, Vogler AP. 2001a. Data from: Exploring data interaction and nucleotide alignment in a multiple gene analysis of Ips (Coleoptera: Scolytinae). In: Dryad Data Repository.

[29] Cognato AI, Vogler AP. 2001b. Exploring data interaction and nucleotide alignment in a multiple gene analysis of Ips (Coleoptera: Scolytinae). Syst. Biol. 50:758–780.
OpenUrl CrossRef PubMed Web of Science

[30] Day JJ, Peart CR, Brown KJ, Bills R, Friel JP, Moritz T. 2013. Data from: Continental diversification of an African catfish radiation (Mochokidae: Synodontis). In: Dryad Data Repository.

[31] Day JJ, Peart CR, Brown KJ, Friel JP, Bills R, Moritz T. 2013. Continental diversification of an African catfish radiation (Mochokidae: Synodontis). Syst. Biol. 62:351–365.
OpenUrl CrossRef PubMed

[32] ↵
Delsuc F, Brinkmann H, Philippe H. 2005. Phylogenomics and the reconstruction of the tree of life. Nature Reviews Genetics 6:361.
OpenUrl CrossRef PubMed Web of Science

[33] Devitt TJ, Cameron Devitt SE, Hollingsworth BD, McGuire JA, Moritz C. 2013. Data from: Montane refugia predict population genetic structure in the Large-blotched Ensatina salamander. In: Dryad Data Repository.

[34] Devitt TJ, Devitt SE, Hollingsworth BD, McGuire JA, Moritz C. 2013. Montane refugia predict population genetic structure in the Large-blotched Ensatina salamander. Mol. Ecol. 22:1650–1665.
OpenUrl CrossRef Web of Science

[35] Dornburg A, Moore JA, Webster R, Warren DL, Brandley MC, Iglesias TL, Wainwright PC, Near TJ. 2012a. Data from: Molecular phylogenetics of squirrelfishes and soldierfishes (Teleostei:Beryciformes: Holocentridae): reconciling more than 100 years of taxonomic confusion. In: Dryad Data Repository.

[36] Dornburg A, Moore JA, Webster R, Warren DL, Brandley MC, Iglesias TL, Wainwright PC, Near TJ. 2012b. Molecular phylogenetics of squirrelfishes and soldierfishes (Teleostei: Beryciformes: Holocentridae): reconciling more than 100 years of taxonomic confusion. Mol Phylogenet Evol 65:727–738.
OpenUrl CrossRef PubMed

[37] ↵
Drummond AJ, Rambaut A. 2007. BEAST: Bayesian evolutionary analysis by sampling trees. BMC Evol. Biol. 7:214.
OpenUrl CrossRef PubMed

[38] ↵
Duchene DA, Duchene S, Ho SYW. 2017. New Statistical Criteria Detect Phylogenetic Bias Caused by Compositional Heterogeneity. Mol. Biol. Evol. 34:1529–1534.
OpenUrl

[39] ↵
Dunn M, Greenhill SJ, Levinson SC, Gray RD. 2011. Evolved structure of language shows lineage-specific trends in word-order universals. Nature 473:79.
OpenUrl CrossRef PubMed Web of Science

[40] ↵
Dutheil J, Boussau B. 2008. Non-homogeneous models of sequence evolution in the Bio++ suite of libraries and programs. BMC Evol. Biol. 8:255.
OpenUrl CrossRef PubMed

[41] ↵
Eisen JA. 1998. Phylogenomics: improving functional predictions for uncharacterized genes by evolutionary analysis. Genome Res. 8:163–167.
OpenUrl FREE Full Text

[42] Faircloth BC, Sorenson L, Santini F, Alfaro ME. 2013a. Data from: A phylogenomic perspective on the radiation of ray-finned fishes based upon targeted sequencing of ultraconserved elements (UCEs). In: Dryad Data Repository.

[43] Faircloth BC, Sorenson L, Santini F, Alfaro ME. 2013b. A Phylogenomic Perspective on the Radiation of Ray-Finned Fishes Based upon Targeted Sequencing of Ultraconserved Elements (UCEs). PLoS One 8:e65923.
OpenUrl CrossRef PubMed

[44] ↵
Farrell LE, Roman J, Sunquist ME. 2000. Dietary separation of sympatric carnivores identified by molecular analysis of scats. Mol. Ecol. 9:1583–1590.
OpenUrl CrossRef PubMed Web of Science

[45] ↵
Felsenstein J. 2004. Inferring phylogenies: Sinauer associates Sunderland, MA.

[46] Fong JJ, Brown JM, Fujita MK, Boussau B. 2012a. Data from: A phylogenomic approach to vertebrate phylogeny supports a turtle-archosaur affinity and a possible paraphyletic Lissamphibia. In: Dryad Data Repository.

[47] Fong JJ, Brown JM, Fujita MK, Boussau B. 2012b. A phylogenomic approach to vertebrate phylogeny supports a turtle-archosaur affinity and a possible paraphyletic lissamphibia. PLoS One 7:e48990.
OpenUrl CrossRef PubMed

[48] ↵
Foster PG. 2004. Modeling compositional heterogeneity. Syst. Biol. 53:485–495.
OpenUrl CrossRef PubMed Web of Science

[49] ↵
Foster PG, Hickey DA. 1999. Compositional bias may affect both DNA-based and protein-based phylogenetic reconstructions. J. Mol. Evol. 48:284–290.
OpenUrl CrossRef PubMed Web of Science

[50] ↵
Gardner MJ, Hall N, Fung E, White O, Berriman M, Hyman RW, Carlton JM, Pain A, Nelson KE, Bowman S. 2002. Genome sequence of the human malaria parasite Plasmodium falciparum. Nature 419:498.
OpenUrl CrossRef PubMed Web of Science

[51] ↵
Goldman N. 1993. Statistical tests of models of DNA substitution. J. Mol. Evol. 36:182–198.
OpenUrl CrossRef PubMed Web of Science

[52] ↵
Goremykin V, Hellwig F. 2005. Evidence for the most basal split in land plants dividing bryophyte and tracheophyte lineages. Plant Syst. Evol. 254:93–103.
OpenUrl CrossRef

[53] ↵
Gray RD, Drummond AJ, Greenhill SJ. 2009. Language Phylogenies Reveal Expansion Pulses and Pauses in Pacific Settlement. Science 323:479.
OpenUrl Abstract/FREE Full Text

[54] ↵
Grenfell BT, Pybus OG, Gog JR, Wood JLN, Daly JM, Mumford JA, Holmes EC. 2004. Unifying the Epidemiological and Evolutionary Dynamics of Pathogens. Science 303:327.
OpenUrl Abstract/FREE Full Text

[55] ↵
Groussin M, Boussau B, Gouy M. 2013. A branch-heterogeneous model of protein evolution for efficient inference of ancestral sequences. Syst. Biol. 62:523–538.
OpenUrl CrossRef PubMed

[56] ↵
Guindon S, Dufayard J-F, Lefort V, Anisimova M, Hordijk W, Gascuel O. 2010. New algorithms and methods to estimate maximum-likelihood phylogenies: assessing the performance of PhyML 3.0. Syst. Biol. 59:307–321.
OpenUrl CrossRef PubMed Web of Science

[57] ↵
Hasegawa M, Kishino H, Yano T-a. 1985. Dating of the human-ape splitting by a molecular clock of mitochondrial DNA. J. Mol. Evol. 22:160–174.
OpenUrl CrossRef PubMed Web of Science

[58] ↵
Ho JW, Adams CE, Lew JB, Matthews TJ, Ng CC, Shahabi-Sirjani A, Tan LH, Zhao Y, Easteal S, Wilson SR. 2006. SeqVis: visualization of compositional heterogeneity in large alignments of nucleotides. Bioinformatics 22:2162–2163.
OpenUrl CrossRef PubMed Web of Science

[59] ↵
Ho SY, Jermiin L. 2004. Tracing the decay of the historical signal in biological sequence data. Syst. Biol. 53:623–637.
OpenUrl CrossRef PubMed Web of Science

[60] ↵
Hoang DT, Chernomor O, von Haeseler A, Minh BQ, Vinh LS. 2018. UFBoot2: Improving the Ultrafast Bootstrap Approximation. Mol. Biol. Evol. 35:518–522.
OpenUrl CrossRef PubMed

[61] ↵
Höhna S, Landis MJ, Heath TA, Boussau B, Lartillot N, Moore BR, Huelsenbeck JP, Ronquist F. 2016. RevBayes: Bayesian phylogenetic inference using graphical models and an interactive model-specification language. Syst. Biol. 65:726–736.
OpenUrl CrossRef PubMed

[62] Horn JW, Xi Z, Riina R, Peirson JA, Yang Y, Dorsey BL, Berry PE, Davis CC, Wurdack KJ. 2014a. Data from: Evolutionary bursts in Euphorbia (Euphorbiaceae) are linked with photosynthetic pathway. In: Dryad Data Repository.

[63] Horn JW, Xi Z, Riina R, Peirson JA, Yang Y, Dorsey BL, Berry PE, Davis CC, Wurdack KJ. 2014b. Evolutionary bursts in Euphorbia (Euphorbiaceae) are linked with photosynthetic pathway. Evolution 68:3485–3504.
OpenUrl CrossRef PubMed

[64] ↵
Hyman IT, Ho SY, Jermiin LS. 2007. Molecular phylogeny of Australian Helicarionidae, Euconulidae and related groups (Gastropoda: Pulmonata: Stylommatophora) based on mitochondrial DNA. Mol. Phylogen. Evol. 45:792–812.
OpenUrl PubMed Web of Science

[65] ↵
Jayaswal V, Ababneh F, Jermiin LS, Robinson J. 2011. Reducing Model Complexity of the General Markov Model of Evolution. Mol. Biol. Evol. 28:3045–3059.
OpenUrl CrossRef PubMed Web of Science

[66] ↵
Jayaswal V, Robinson J, Jermiin L. 2007. Estimation of Phylogeny and Invariant Sites under the General Markov Model of Nucleotide Sequence Evolution. Syst. Biol. 56:155–162.
OpenUrl CrossRef PubMed Web of Science

[67] ↵
Jayaswal V, Wong TK, Robinson J, Poladian L, Jermiin LS. 2014. Mixture models of nucleotide sequence evolution that account for heterogeneity in the substitution process across sites and across lineages. Syst. Biol. 63:726–742.
OpenUrl CrossRef PubMed

[68] ↵
Jermiin L, Ho SY, Ababneh F, Robinson J, Larkum AW. 2004. The biasing effect of compositional heterogeneity on phylogenetic estimates may be underestimated. Syst. Biol. 53:638–643.
OpenUrl CrossRef PubMed Web of Science

[69] ↵
Keith JM
Jermiin LS, Jayaswal V, Ababneh FM, Robinson J. 2017. Identifying Optimal Models of Evolution. In: Keith JM, editor. Bioinformatics. Melbourne: Humana Press, New York, NY. p. 379–420.

[70] Keith JM

[71] ↵
Kalyaanamoorthy S, Minh BQ, Wong TKF, von Haeseler A, Jermiin LS. 2017. ModelFinder: fast model selection for accurate phylogenetic estimates. Nat. Methods 14:587–589.
OpenUrl CrossRef PubMed

[72] Kawahara AY, Rubinoff D. 2013a. Convergent evolution of morphology and habitat use in the explosive Hawaiian fancy case caterpillar radiation. J. Evol. Biol. 26:1763–1773.
OpenUrl CrossRef PubMed

[73] Kawahara AY, Rubinoff D. 2013b. Data from: Convergent evolution in the explosive Hawaiian Fancy Cased caterpillar radiation. In: Dryad Data Repository.

[74] ↵
Kimura M. 1980. A simple method for estimating evolutionary rates of base substitutions through comparative studies of nucleotide sequences. J. Mol. Evol. 16:111–120.
OpenUrl CrossRef PubMed Web of Science

[75] ↵
Kishino H, Miyata T, Hasegawa M. (Kishino1990 co-authors). 1990. Maximum likelihood inference of protein phylogeny and the origin of chloroplasts. J. Mol. Evol. 31:151–160.
OpenUrl CrossRef Web of Science

[76] ↵
Knight R, Maxwell P, Birmingham A, Carnes J, Caporaso JG, Easton BC, Eaton M, Hamady M, Lindsay H, Liu Z. 2007. PyCogent: a toolkit for making sense from sequence. Genome biology 8:R171.
OpenUrl CrossRef PubMed

[77] ↵
Kumar S, Filipski AJ, Battistuzzi FU, Kosakovsky Pond SL, Tamura K. 2012. Statistics and truth in phylogenomics. Mol. Biol. Evol. 29:457–472.
OpenUrl CrossRef PubMed Web of Science

[78] ↵
Kumar S, Gadagkar SR. 2001. Disparity index: a simple statistic to measure and test the homogeneity of substitution patterns between molecular sequences. Genetics 158:1321–1327.
OpenUrl Abstract/FREE Full Text

[79] Lartillot N, Delsuc F. 2012a. Data from: Joint reconstruction of divergence times and life-history evolution in placental mammals using a phylogenetic covariance model. In: Dryad Data Repository.

[80] Lartillot N, Delsuc F. 2012b. Joint reconstruction of divergence times and life-history evolution in placental mammals using a phylogenetic covariance model. Evolution 66:1773–1787.
OpenUrl CrossRef PubMed Web of Science

[81] ↵
Lartillot N, Philippe H. 2004. A Bayesian mixture model for across-site heterogeneities in the amino-acid replacement process. Mol. Biol. Evol. 21:1095–1109.
OpenUrl CrossRef PubMed Web of Science

[82] ↵
Martijn J, Vosseberg J, Guy L, Offre P, Ettema TJ. 2018. Deep mitochondrial origin outside the sampled alphaproteobacteria. Nature.

[83] ↵
Mäser P, Thomine S, Schroeder JI, Ward JM, Hirschi K, Sze H, Talke IN, Amtmann A, Maathuis FJM, Sanders D, et al. 2001. Phylogenetic Relationships within Cation Transporter Families of Arabidopsis. Plant Physiol. 126:1646.
OpenUrl Abstract/FREE Full Text

[84] McCormack JE, Harvey MG, Faircloth BC, Crawford NG, Glenn TC, Brumfield RT. 2013a. Data from: A phylogeny of birds based on over 1,500 loci collected by target enrichment and high-throughput sequencing. In: Dryad Data Repository.

[85] McCormack JE, Harvey MG, Faircloth BC, Crawford NG, Glenn TC, Brumfield RT. 2013b. A phylogeny of birds based on over 1,500 loci collected by target enrichment and high-throughput sequencing. PLoS One 8:e54848.
OpenUrl CrossRef PubMed

[86] ↵
Mir A, Russello F. 2010. The mean value of the squared path-difference distance for rooted phylogenetic trees. J Math Anal Appl 371:168–176.
OpenUrl

[87] Moyle RG, Oliveros CH, Andersen MJ, Hosner PA, Benz BW, Manthey JD, Travers SL, Brown RM, Faircloth BC. 2016a. Data from: Tectonic collision and uplift of Wallacea triggered the global songbird radiation. In: Dryad Data Repository.

[88] Moyle RG, Oliveros CH, Andersen MJ, Hosner PA, Benz BW, Manthey JD, Travers SL, Brown RM, Faircloth BC. 2016b. Tectonic collision and uplift of Wallacea triggered the global songbird radiation. Nat Commun 7:12709.
OpenUrl CrossRef PubMed

[89] Murray EA, Carmichael AE, Heraty JM. 2013a. Ancient host shifts followed by host conservatism in a group of ant parasitoids. Proc Biol Sci 280:20130495.
OpenUrl CrossRef PubMed

[90] Murray EA, Carmichael AE, Heraty JM. 2013b. Data from: Ancient host shifts followed by host conservatism in a group of ant parasitoids. In: Dryad Data Repository.

[91] ↵
Murray S, Jørgensen MF, Ho SY, Patterson DJ, Jermiin LS. 2005. Improving the analysis of dinoflagellate phylogeny based on rDNA. Protist 156:269–286.
OpenUrl CrossRef PubMed Web of Science

[92] ↵
Nabholz B, Künstner A, Wang R, Jarvis ED, Ellegren H. 2011. Dynamic Evolution of Base Composition: Causes and Consequences in Avian Phylogenomics. Mol. Biol. Evol. 28:2197–2210.
OpenUrl CrossRef PubMed Web of Science

[93] ↵
Nakagawa S, Schielzeth H. 2013. A general and simple method for obtaining R2 from generalized linear mixed-effects models. Methods in Ecology and Evolution 4:133–142.
OpenUrl

[94] ↵
Nesnidal MP, Helmkampf M, Bruchhaus I, Hausdorf B. 2010. Compositional Heterogeneity and Phylogenomic Inference of Metazoan Relationships. Mol. Biol. Evol. 27:2095–2104.
OpenUrl CrossRef PubMed Web of Science

[95] ↵
Nguyen LT, Schmidt HA, von Haeseler A, Minh BQ. 2015. IQ-TREE: a fast and effective stochastic algorithm for estimating maximum-likelihood phylogenies. Mol. Biol. Evol. 32:268–274.
OpenUrl CrossRef PubMed

[96] Oaks JR. 2011a. Data from: A time-calibrated species tree of Crocodylia reveals a recent radiation of the true crocodiles. In: Dryad Data Repository.

[97] Oaks JR. 2011b. A time-calibrated species tree of Crocodylia reveals a recent radiation of the true crocodiles. Evolution 65:3285–3297.
OpenUrl CrossRef PubMed Web of Science

[98] ↵
Paton T, Haddrath O, Baker AJ. 2002. Complete mitochondrial DNA genome sequences show that modern birds are not descended from transitional shorebirds. Proceedings of the Royal Society of London B: Biological Sciences 269:839–846.
OpenUrl GeoRef PubMed

[99] ↵
Philippe H, Brinkmann H, Copley RR, Moroz LL, Nakano H, Poustka AJ, Wallberg A, Peterson KJ, Telford MJ. 2011. Acoelomorph flatworms are deuterostomes related to Xenoturbella. Nature 470:255.
OpenUrl CrossRef PubMed Web of Science

[100] ↵
Philippe H, Delsuc F, Brinkmann H, Lartillot N. 2005. Phylogenomics. Annu Rev Ecol Evol S 36:541–562.
OpenUrl

[101] ↵
Phillips MJ, Delsuc Fdr, Penny D. 2004. Genome-Scale Phylogeny and the Detection of Systematic Biases. Mol. Biol. Evol. 21:1455–1458.
OpenUrl CrossRef PubMed Web of Science

[102] Rightmyer MG, Griswold T, Brady SG. 2013a. Data from: Phylogeny and systematics of the bee genus Osmia (Hymenoptera: Megachilidae) with emphasis on North American Melanosmia: subgenera, synonymies, and nesting biology revisited. In: Dryad Data Repository.

[103] Rightmyer MG, Griswold T, Brady SG. 2013b. Phylogeny and systematics of the bee genus Osmia (Hymenoptera: Megachilidae) with emphasis on North American Melanosmia: subgenera, synonymies and nesting biology revisited. Syst. Entomol. 38:561–576.
OpenUrl CrossRef Web of Science

[104] ↵
Roberts D, Yang Z. 1995. On the use of nucleic acid sequences to infer early branchings in the tree of life. Mol. Biol. Evol. 12:451–458.
OpenUrl CrossRef PubMed Web of Science

[105] ↵
Ronquist F, Teslenko M, Van Der Mark P, Ayres DL, Darling A, Höhna S, Larget B, Liu L, Suchard MA, Huelsenbeck JP. 2012. MrBayes 3.2: efficient Bayesian phylogenetic inference and model choice across a large model space. Syst. Biol. 61:539–542.
OpenUrl CrossRef PubMed

[106] ↵
Rzhetsky A, Nei M. 1995. Tests of applicability of several substitution models for DNA sequence data. Mol. Biol. Evol. 12:131–151.
OpenUrl CrossRef PubMed Web of Science

[107] ↵
Salipante SJ, Horwitz MS. 2006. Phylogenetic fate mapping. Proceedings of the National Academy of Sciences 103:5448.
OpenUrl Abstract/FREE Full Text

[108] ↵
Sand A, Pedersen CNS, Brodal GS, Johansen J, Holt MK, Mailund T. 2014. tqDist: a library for computing the quartet and triplet distances between binary or general trees. Bioinformatics 30:2079–2080.
OpenUrl CrossRef PubMed

[109] Sauquet H, Ho SY, Gandolfo MA, Jordan GJ, Wilf P, Cantrill DJ, Bayly MJ, Bromham L, Brown GK, Carpenter RJ, et al. 2012. Testing the impact of calibration on molecular divergence times using a fossil-rich group: the case of Nothofagus (Fagales). Syst. Biol. 61:289–313.
OpenUrl CrossRef PubMed

[110] Sauquet H, Ho SYW, Gandolfo MA, Jordan GJ, Wilf P, Cantrill DJ, Bayly MJ, Bromham L, Brown GK, Carpenter RJ, et al. 2011. Data from: Testing the impact of calibration on molecular divergence times using a fossil-rich group: the case of Nothofagus (Fagales). In: Dryad Data Repository.

[111] Seago AE, Giorgi JA, Li J, Ślipiński A. 2011a. Data from: Phylogeny, classification and evolution of ladybird beetles (Coleoptera: Coccinellidae) based on simultaneous analysis of molecular and morphological data. In: Dryad Data Repository.

[112] Seago AE, Giorgi JA, Li J, Ślipiński A. 2011b. Phylogeny, classification and evolution of ladybird beetles (Coleoptera: Coccinellidae) based on simultaneous analysis of molecular and morphological data. Mol. Phylogen. Evol. 60:137–151.
OpenUrl CrossRef PubMed

[113] Sharanowski BJ, Dowling APG, Sharkey MJ. 2011a. Data from: Molecular phylogenetics of Braconidae (Hymenoptera: Ichneumonoidea) based on multiple nuclear genes and implications for classification. In: Dryad Data Repository.

[114] Sharanowski BJ, Dowling APG, Sharkey MJ. 2011b. Molecular phylogenetics of Braconidae (Hymenoptera: Ichneumonoidea), based on multiple nuclear genes, and implications for classification. Syst. Entomol. 36:549–572.
OpenUrl CrossRef Web of Science

[115] ↵
Sheffield NC, Song H, Cameron SL, Whiting MF. 2009. Nonstationary Evolution and Compositional Heterogeneity in Beetle Mitochondrial Phylogenomics. Syst. Biol. 58:381–394.
OpenUrl CrossRef PubMed Web of Science

[116] ↵
Shimodaira H. 2002. An approximately unbiased test of phylogenetic tree selection. Syst. Biol. 51:492–508.
OpenUrl CrossRef PubMed Web of Science

[117] ↵
Shimodaira H, Hasegawa M. 1999. Multiple comparisons of log-likelihoods with applications to phylogenetic inference. Mol. Biol. Evol. 16:1114–1116.
OpenUrl CrossRef Web of Science

[118] Siler C, Brown RM, Oliveros CH, Santanen A. 2013. Data from: Multilocus phylogeny reveals unexpected diversification patterns in Asian Wolf Snakes (genus Lycodon). In: Dryad Data Repository.

[119] Siler CD, Oliveros CH, Santanen A, Brown RM. 2013. Multilocus phylogeny reveals unexpected diversification patterns in Asian wolf snakes (genus Lycodon). Zool. Scr. 42:262–277.
OpenUrl CrossRef Web of Science

[120] ↵
Stamatakis A. 2014. RAxML version 8: a tool for phylogenetic analysis and post-analysis of large phylogenies. Bioinformatics 30:1312–1313.
OpenUrl CrossRef PubMed Web of Science

[121] ↵
Steel MA, Penny D. 1993. Distributions of Tree Comparison Metrics - Some New Results. Syst. Biol. 42:126–141.
OpenUrl CrossRef Web of Science

[122] ↵
Stuart A. 1955. A Test for Homogeneity of the Marginal Distributions in a Two-Way Classification. Biometrika 42:412–416.
OpenUrl CrossRef Web of Science

[123] ↵
Sullivan J, Joyce P. 2005. Model selection in phylogenetics. Annual Review of Ecology Evolution and Systematics 36:445–466.
OpenUrl CrossRef Web of Science

[124] ↵
Sumner JG, Fernandez-Sanchez J, Jarvis PD. 2012. Lie Markov models. J. Theor. Biol. 298:16–31.
OpenUrl CrossRef PubMed Web of Science

[125] ↵
Swofford DL. 2001. Paup*: Phylogenetic analysis using parsimony (and other methods) 4.0. B5.

[126] ↵
Tarrío R, Rodríguez-Trelles F, Ayala FJ. 2001. Shared nucleotide composition biases among species and their impact on phylogenetic reconstructions of the Drosophilidae. Mol. Biol. Evol. 18:1464–1473.
OpenUrl CrossRef PubMed Web of Science

[127] Tolley KA, Townsend TM, Vences M. 2013a. Data from: Large-scale phylogeny of chameleons suggests African origins and Eocene diversification. In: Dryad Data Repository.

[128] Tolley KA, Townsend TM, Vences M. 2013b. Large-scale phylogeny of chameleons suggests African origins and Eocene diversification. Proc Biol Sci 280:20130184.
OpenUrl CrossRef PubMed

[129] Unmack PJ, Allen GR, Johnson JB. 2013a. Data from: Phylogeny and biogeography of rainbowfishes (Melanotaeniidae) from Australia and New Guinea. In: Dryad Data Repository.

[130] Unmack PJ, Allen GR, Johnson JB. 2013b. Phylogeny and biogeography of rainbowfishes (Melanotaeniidae) from Australia and New Guinea. Mol Phylogenet Evol 67:15–27.
OpenUrl CrossRef PubMed

[131] Wainwright PC, Smith WL, Price SA, Tang KL, Sparks JS, Ferry LA, Kuhn KL, Eytan RI, Near TJ. 2012. The evolution of pharyngognathy: a phylogenetic and functional appraisal of the pharyngeal jaw key innovation in labroid fishes and beyond. Syst. Biol. 61:1001–1027.
OpenUrl CrossRef PubMed

[132] Wainwright PC, Smith WL, Price SA, Tang KL, Sparks JS, Ferry LA, Kuhn KL, Near TJ. 2012. Data from: The evolution of pharyngognathy: a phylogenetic and functional appraisal of the pharyngeal jaw key innovation in labroid fishes and beyond. In: Dryad Data Repository.

[133] ↵
Weiss G, von Haeseler A. 2003. Testing Substitution Models Within a Phylogenetic Tree. Mol. Biol. Evol. 20:572–578.
OpenUrl CrossRef PubMed Web of Science

[134] Wood HM, Matzke NJ, Gillespie RG, Griswold CE. 2012. Data from: Treating fossils as terminal taxa in divergence time estimation reveals ancient vicariance patterns in the palpimanoid spiders. In: Dryad Data Repository.

[135] Wood HM, Matzke NJ, Gillespie RG, Griswold CE. 2013. Treating fossils as terminal taxa in divergence time estimation reveals ancient vicariance patterns in the palpimanoid spiders. Syst. Biol. 62:264–284.
OpenUrl CrossRef PubMed

[136] ↵
Woodhams MD, Fernandez-Sanchez J, Sumner JG. 2015. A New Hierarchy of Phylogenetic Models Consistent with Heterogeneous Substitution Rates. Syst. Biol. 64:638–650.
OpenUrl CrossRef PubMed

[137] Worobey M, Han G, Rambaut A. 2014a. Data from: A synchronized global sweep of the internal genes of modern avian influenza virus. In: Dryad Data Repository.

[138] Worobey M, Han GZ, Rambaut A. 2014b. A synchronized global sweep of the internal genes of modern avian influenza virus. Nature 508:254–257.
OpenUrl CrossRef PubMed Web of Science

[139] ↵
Yang Z. 1994. Estimating the pattern of nucleotide substitution. J. Mol. Evol. 39:105–111.
OpenUrl CrossRef PubMed Web of Science

[140] ↵
Yang Z, Rannala B. 2012. Molecular phylogenetics: principles and practice. Nat. Rev. Genet. 13:303–314.
OpenUrl CrossRef PubMed

[141] ↵
Yao H, Kristensen DM, Mihalek I, Sowa ME, Shaw C, Kimmel M, Kavraki L, Lichtarge O. 2003. An accurate, sensitive, and scalable method to identify functional sites in protein structures. J. Mol. Biol. 326:255–261.
OpenUrl CrossRef PubMed Web of Science

[142] ↵
Yao Y-G, Bravi CM, Bandelt H-J. 2004. A call for mtDNA data quality control in forensic science. Forensic Sci. Int. 141:1–6.
OpenUrl CrossRef PubMed Web of Science

[143] ↵
Yap VB, Speed T. 2005. Rooting a phylogenetic tree with nonreversible substitution models. BMC Evol. Biol. 5:2.
OpenUrl CrossRef PubMed

[144] ↵
Zou L, Susko E, Field C, Roger AJ. 2012. Fitting nonstationary general-time-reversible models to obtain edge-lengths and frequencies for the Barry–Hartigan model. Syst. Biol. 61:927–940.
OpenUrl CrossRef PubMed
