Evaluating fast maximum likelihood-based phylogenetic programs using empirical phylogenomic data sets

Xiaofan Zhou; Xingxing Shen; Chris Todd Hittinger; Antonis Rokas

doi:10.1101/142323

Abstract

Phylogenetics has witnessed dramatic increases in the sizes of data matrices assembled to resolve branches of the tree of life, motivating the development of programs for fast, yet accurate, inference. For example, several different fast programs have been developed in the very popular maximum likelihood framework, including RAxML/ExaML, PhyML, IQ-TREE, and FastTree. Although these four programs are widely used, a systematic evaluation and comparison of their performance using empirical genome-scale data matrices has so far been lacking. To address this gap, we evaluated these four programs on 19 empirical phylogenomic data sets from diverse animal, plant, and fungal lineages with respect to likelihood maximization, topological accuracy, and computational speed. For single-gene tree inference, we found that the more exhaustive and slower strategy (ten RAxML searches per alignment) outperformed faster strategies (one tree search per alignment) using RAxML, PhyML, or IQ-TREE. Interestingly, single-gene trees inferred by the three programs yielded comparable coalescent-based species tree estimations. For concatenation-based species tree inference, IQ-TREE consistently achieved the best-observed likelihoods for all data sets, and RAxML/ExaML was a close second. In contrast, PhyML often failed to complete concatenation-based analyses, whereas FastTree was the fastest but exhibited lower likelihood values and topological accuracy in both types of analyses. Finally, data matrix properties, such as the number of taxa and the information content, sometimes substantially influenced the relative performance of the programs. Our results provide real-world gene and species tree phylogenetic inference benchmarks to inform the design and execution of large-scale phylogenomic data analyses.

Introduction

Phylogenetic analysis – that is, the identification of the tree best representing the evolutionary history of the underlying data – is of fundamental importance to many biological disciplines, including but not limited to systematics, molecular evolution, and comparative genomics (Felsenstein 2003; Xia 2013; Hamilton 2014; Yang 2014). However, finding the best tree is an exceptionally difficult task, both because the evaluation of each tree requires a considerable number of calculations (Bryant, et al. 2005) and because the number of candidate strictly bifurcating trees grows very rapidly with the number of sequences (Felsenstein 1978) – for example, there are ~8 × 10²¹ possible rooted topologies for a set of 20 taxa. Therefore, fast programs that employ heuristic algorithms to efficiently infer the best tree (or nearly as good alternatives) are of pivotal importance to phylogenetic analysis. This is evident from the success of the Neighbour-Joining (NJ) method, a distance-based clustering (instead of tree searching) algorithm (Saitou and Nei 1987) that is the most highly cited phylogenetic method (Van Noorden, et al. 2014). NJ and its variants (e.g. BIONJ, which takes the variance of distance estimation into consideration) (Gascuel 1997; Bruno, et al. 2000) were among the few available options for analyzing large data sets until the 2000s, and are still widely used today to quickly produce good starting points for more sophisticated methods (e.g. (Guindon, et al. 2010; Nguyen, et al. 2015)).
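The growth in the number of candidate trees can be checked with a few lines of Python. The sketch below uses the standard counting result that n labelled taxa admit (2n-3)!! rooted and (2n-5)!! unrooted strictly bifurcating topologies; the function names are illustrative and not taken from any of the programs discussed here.

```python
def double_factorial(k: int) -> int:
    """Return k!! = k * (k - 2) * (k - 4) * ... ; defined as 1 for k <= 1."""
    result = 1
    while k > 1:
        result *= k
        k -= 2
    return result


def num_rooted_trees(n_taxa: int) -> int:
    """Count rooted, strictly bifurcating, labelled topologies: (2n - 3)!!."""
    return double_factorial(2 * n_taxa - 3)


def num_unrooted_trees(n_taxa: int) -> int:
    """Count unrooted, strictly bifurcating, labelled topologies: (2n - 5)!!."""
    return double_factorial(2 * n_taxa - 5)


if __name__ == "__main__":
    for n in (4, 10, 20):
        print(n, num_rooted_trees(n), num_unrooted_trees(n))
```

For 20 taxa the rooted count is on the order of 8 × 10²¹, in line with the figure quoted above.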

It is now generally accepted that statistical methods, such as maximum likelihood (ML) (Felsenstein 1981), produce more reliable results than distance and parsimony methods (Yang and Rannala 2012; Whelan and Morrison 2017). However, ML-based methods are also computationally more expensive, necessitating the use of heuristic search algorithms for searching the enormity of tree space (Chor and Tuller 2005). Heuristic search algorithms typically adopt iterative, “hill-climbing” optimization techniques that involve three steps: (1) generate a quick starting tree (e.g. BIONJ tree, stepwise-addition parsimony tree, etc.); (2) modify the tree using certain topological rearrangement rules and evaluate the resultant trees under the ML criterion; and (3) replace the starting tree and repeat step 2 if the rearrangements identify a better tree, or otherwise terminate the search. The most common rearrangement algorithms for step 2 are Nearest-Neighbor-Interchange (NNI), where the four subtrees connected by a given internal branch are re-arranged to form two new, alternative topologies (Robinson 1971), and Subtree-Pruning-and-Regrafting (SPR), in which a given subtree is detached from the full tree and reinserted onto each of the remaining branches (Swofford, et al. 1996). SPR is more expansive in searching tree space than NNI since it can evaluate many more trees from one initial topology, but it is also much slower because of the extra tree evaluations.
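The three-step hill-climbing scheme and the NNI move can be illustrated on the smallest non-trivial case, a four-taxon tree with a single internal branch. This is a minimal sketch under stated assumptions, not the implementation used by any of the programs: the tree encoding, the function names, and the log-likelihood values in `TOY_LOGLIK` are all invented for illustration, and a real program would compute likelihoods from an alignment.

```python
def quartet(a, b, c, d):
    """Encode the unrooted four-taxon tree ab|cd as an order-independent value."""
    return frozenset({frozenset({a, b}), frozenset({c, d})})


def nni_neighbors(tree):
    """Return the two alternative topologies reachable by one NNI move."""
    left, right = sorted(tree, key=sorted)
    a, b = sorted(left)
    c, d = sorted(right)
    return [quartet(a, c, b, d), quartet(a, d, b, c)]


def hill_climb(start, neighbors, score):
    """Greedy search: move to the best neighbor until none improves the score."""
    current, current_score = start, score(start)      # step 1: starting tree
    while True:
        # step 2: rearrange the current tree and evaluate each result
        best = max(neighbors(current), key=score)
        best_score = score(best)
        # step 3: accept a strictly better tree, otherwise terminate
        if best_score <= current_score:
            return current, current_score
        current, current_score = best, best_score


# Made-up log-likelihoods for the three possible quartet topologies.
TOY_LOGLIK = {
    quartet("A", "B", "C", "D"): -105.2,
    quartet("A", "C", "B", "D"): -100.0,
    quartet("A", "D", "B", "C"): -103.7,
}

best_tree, best_score = hill_climb(
    quartet("A", "B", "C", "D"), nni_neighbors, TOY_LOGLIK.__getitem__
)
```

With only three possible topologies, every tree is a neighbor of every other, so this toy search cannot get stuck; on realistic trees the same loop can stop at a local optimum, which is why the choice of rearrangement (NNI versus SPR) matters.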

Four of the most popular fast ML-based phylogenetic programs that differ in their choices or implementations of rearrangement algorithms are PhyML (Guindon and Gascuel 2003; Guindon, et al. 2010), RAxML / ExaML (Stamatakis 2014; Kozlov, et al. 2015), FastTree (Price, et al. 2010), and IQ-TREE (Nguyen, et al. 2015). First introduced in the early 2000s, PhyML has been one of the most widely used programs for ML-based phylogenetic inference (Guindon and Gascuel 2003). The original algorithm was based solely on NNI and achieved performance comparable to that of other contemporary ML methods at much lower computational cost. The latest version of PhyML (version 20160530) performs hill-climbing tree searches using SPR rearrangements in early stages and NNI rearrangements in later stages of the tree search (Guindon, et al. 2010). Specifically, during the SPR-based search, candidate re-grafting positions are first filtered based on parsimony scores; the most parsimonious ones are then subject to approximate ML evaluation where branch-lengths are only re-optimized at the branches adjacent to the pruning and re-grafting positions. To accelerate the tree search, the best “up-hill” SPR move for each subtree is accepted immediately, potentially leading to the simultaneous application of multiple SPRs in one round. Once the search has converged to a single topology, the resultant tree is further optimized by NNI-based hill-climbing. Similar to the SPR stage, PhyML evaluates candidate NNIs only approximately by re-optimizing the five relevant branches, and may apply multiple NNI moves simultaneously at each round. The addition of the SPR algorithm in PhyML has significantly improved its accuracy, although at the cost of longer runtimes (Guindon, et al. 2010).

RAxML is another widely used program for fast estimation of ML trees (Stamatakis 2006, 2014). The latest version (8.2.10) implements the standard SPR-based hill-climbing algorithm and employs important heuristics to reduce the number of unpromising SPR candidates, including: 1) candidate re-grafting positions are limited to only those within a certain distance from the pruning position (known as the “lazy subtree rearrangement”) (Stamatakis, et al. 2005); and 2) if the re-grafting to a candidate position results in substantially worse likelihood, all branches further away from that point will be ignored (Stamatakis, et al. 2007). As in PhyML, the approaches of approximate pre-scoring of SPR candidates and simultaneous SPRs are also used by RAxML to speed up the analysis (Stamatakis, et al. 2005). In addition to RAxML, its sister program ExaML is specifically engineered for large concatenated data sets (Kozlov, et al. 2015). As RAxML has exhibited excellent performance in both accuracy and speed (Stamatakis 2006), it is considered by many to be the state-of-the-art fast ML phylogenetic program.

Although both PhyML and RAxML represent great advances in developing fast and accurate phylogenetic programs, efforts aimed at improving the speed of ML tree estimation continue. For example, the recently developed FastTree program can be orders of magnitude faster than either PhyML or RAxML / ExaML (Price, et al. 2010). FastTree (latest version 2.1.10) first constructs an approximate NJ starting tree which is then improved under the minimum evolution criterion using both NNI and SPR rearrangements, followed by ML-based NNI rearrangements to search for the final tree. With computational efficiency at the very heart of its design, FastTree makes heavy use of heuristics at all stages to limit the numbers of tree searches and likelihood optimizations. As a tradeoff, FastTree generates less accurate tree estimates than SPR-based ML methods (Price, et al. 2010). The substantial edge of the FastTree program in speed has made it very popular, particularly in analyses of very large phylogenomic data matrices.

An important weakness of pure hill-climbing methods is that they can be easily trapped in local optima. The IQ-TREE program, the most recent of the four fast ML-based phylogenetic programs, was developed to overcome this local optimum problem through the use of stochastic techniques (Nguyen, et al. 2015). Specifically, IQ-TREE (latest version 1.5.4) generates multiple starting trees instead of one and subsequently maintains a pool of candidate trees during the entire analysis. The tree inference proceeds in an iterative manner; at every iteration, IQ-TREE selects a candidate tree randomly from the pool, applies stochastic perturbations (e.g., random NNI moves) onto the tree, and then uses the modified tree to initiate an NNI-based hill-climbing tree search. If a better tree is found, the worst tree in the current pool is replaced and the analysis continues; otherwise, the iteration is considered unsuccessful and the analysis terminates after a certain number of unsuccessful iterations. IQ-TREE takes advantage of successful preexisting heuristics (e.g., lazy subtree rearrangement (Stamatakis, et al. 2005) and simultaneous NNIs (Guindon and Gascuel 2003)) and a highly-optimized implementation of likelihood functions (Flouri, et al. 2015) for better computational efficiency.
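The control flow of such a pool-based stochastic search can be sketched on a toy problem. The code below is a schematic illustration only, not IQ-TREE's algorithm or code: "trees" are replaced by positions on a made-up one-dimensional score landscape, and two details are assumptions of this sketch rather than statements from the description above, namely that "better" means better than the best tree currently in the pool, and that the stopping rule counts consecutive unsuccessful iterations.

```python
import random

# Made-up score landscape with several local optima (higher is better).
LANDSCAPE = [0, 2, 1, 0, 3, 5, 4, 1, 0, 6, 9, 7, 2]


def climb(i):
    """Deterministic hill-climbing: step to the better adjacent position."""
    while True:
        nbrs = [j for j in (i - 1, i + 1) if 0 <= j < len(LANDSCAPE)]
        best = max(nbrs, key=LANDSCAPE.__getitem__)
        if LANDSCAPE[best] <= LANDSCAPE[i]:
            return i
        i = best


def perturb(i, rng):
    """Random jump of up to three positions, clamped to the landscape."""
    return min(len(LANDSCAPE) - 1, max(0, i + rng.randint(-3, 3)))


def stochastic_search(pool, perturb, hill_climb, score,
                      max_unsuccessful=100, rng=None):
    """Perturb a random pool member, hill-climb, and keep improvements."""
    rng = rng or random.Random(0)
    pool = list(pool)
    unsuccessful = 0
    while unsuccessful < max_unsuccessful:
        candidate = rng.choice(pool)
        improved = hill_climb(perturb(candidate, rng))
        if score(improved) > max(score(t) for t in pool):
            # Success: replace the worst pool member and reset the counter.
            worst = min(range(len(pool)), key=lambda k: score(pool[k]))
            pool[worst] = improved
            unsuccessful = 0
        else:
            unsuccessful += 1
    return max(pool, key=score)


best = stochastic_search([0, 1], perturb, climb, LANDSCAPE.__getitem__,
                         rng=random.Random(42))
```

Because the pool's best score can only increase and the landscape is finite, the loop always terminates; the perturbation step is what lets the search leave a local optimum that plain hill-climbing would stop at.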

These four programs offer different tradeoffs between accuracy and speed in fast phylogenetic inference, and they may exhibit different behaviors toward diverse phylogenomic data sets whose properties (e.g. taxon number and gene number) and evolutionary characteristics (e.g. age of lineage, taxonomic range, and evolutionary rate) vary. Therefore, a good understanding of their relative performance across diverse empirical phylogenomic data matrices is critical to the success of phylogenetic inference when computational resources are limited. This is particularly relevant for large-scale studies using data matrices of ever-increasing data volumes and complexities. So far, these four programs have only been evaluated using simulated data (Guindon, et al. 2010; Price, et al. 2010; Liu, et al. 2011), which might not well approximate real data, and relatively small empirical data sets containing ~10 to ~200 gene alignments (Guindon, et al. 2010; Price, et al. 2010; Liu, et al. 2011; Money and Whelan 2012; Nguyen, et al. 2015; Chernomor, et al. 2016), which might lack generality. In these studies, RAxML and PhyML showed largely similar performance in identifying trees of higher likelihood scores (Guindon, et al. 2010; Money and Whelan 2012), while IQ-TREE exhibited improved efficiency compared to both RAxML and PhyML (Nguyen, et al. 2015; Chernomor, et al. 2016). On the other hand, FastTree was found to be much faster than RAxML and PhyML but reported lower likelihood scores for data sets with both small and large numbers of sequences (Guindon, et al. 2010; Price, et al. 2010; Liu, et al. 2011). However, it remains unclear if these patterns would hold for large empirical data sets and for species tree estimation based on genome-scale data.

To comprehensively evaluate the four fast ML-based phylogenetic programs (table 1), we used a large collection of 19 empirical phylogenomic data sets representing a wide range of properties, including data type, numbers of taxa and genes, and taxonomic range for diverse animal, plant, and fungal lineages (table 2; for details on the source of each data set, see supplementary table S1). For each of these data sets, we compared the performance of all programs for single-gene tree inference and for coalescent-based and concatenation-based species tree inference, the two major current approaches to inferring species phylogenies from phylogenomic data (Liu, et al. 2015). In the coalescent approach, the species tree is estimated by considering all individually inferred single-gene trees using coalescent methods that take into account that the histories of genes may differ from those of species due to incomplete lineage sorting (fig. 1A), whereas in the concatenation approach, the species tree is estimated from the supermatrix derived by concatenating all single-gene alignments (fig. 1B).

Figure 1.

Schematics of the (A) single-gene tree inference test as well as the coalescent-based and (B) concatenation-based species tree inference tests used to evaluate the performance of fast phylogenetic programs in phylogenomic analysis.

Table 1.

Overview of the four fast ML-based phylogenetic programs evaluated in this study.

Table 2.

Overview of the 19 phylogenomic data sets included in this study.

In single-gene tree estimation, we found that, although the more comprehensive analysis strategy (ten RAxML searches per alignment) performed considerably better than fast strategies (one tree search per alignment using RAxML, PhyML, or IQ-TREE), all produced results of comparable quality when the inferred gene trees were used for coalescent-based species tree inference. For the concatenation-based species tree inference, we found that, in some cases, IQ-TREE recovered trees with higher likelihood scores than RAxML/ExaML, although both showed the best performance for most data sets. Importantly, IQ-TREE exhibited comparable or better speed in both coalescent-based and concatenation-based species tree inference compared with RAxML/ExaML. In contrast, FastTree produced significantly worse single-gene and species trees than the other three programs even when allowed to run multiple times, whereas PhyML did not scale well to supermatrices because the concatenation-based species tree inferences failed to complete for multiple data sets. Overall, our benchmarking of the four fast ML-based phylogenetic programs against 19 state-of-the-art data matrices is highly informative for the design of efficient data analysis strategies in phylogenomic studies.

Results and Discussion

A comprehensive collection of empirical data

For a comprehensive evaluation of the four fast ML-based phylogenetic programs, we retrieved 19 data sets from 14 recently published phylogenomic studies (table 2; see supplementary table S1 for detailed sources of each data set), representing a wide range of characteristics: 1) they include both amino acid and nucleotide data sets (nine and ten, respectively); 2) they contain either many taxa (e.g. D6, 200 taxa and 259 genes (Prum, et al. 2015)), many genes (e.g. D5a, 48 taxa and 14,448 genes (Jarvis, et al. 2014)), or both (e.g. A2, 144 taxa and 1,478 genes (Misof, et al. 2014)); 3) they cover three major taxonomic groups (i.e. animals, plants, and fungi) and various depths within each group (e.g. data sets D1 (Song, et al. 2012), A4 (Chen, et al. 2015), and A6 (Whelan, et al. 2015) cover mammals, vertebrates, and metazoans, respectively); and 4) they consist of sequence data derived from different technologies (e.g. some data sets were built entirely on whole genome sequences (Song, et al. 2012; Jarvis, et al. 2014; Shen, et al. 2016b; Tarver, et al. 2016), while some others contained mostly transcriptome sequencing data (Misof, et al. 2014; Wickett, et al. 2014; Yang, et al. 2015)). In addition, these data sets were assembled and curated in state-of-the-art phylogenomic studies and thus are of high quality. Therefore, these data sets are well suited for benchmarking the performance of fast phylogenetic programs in the context of phylogenomics.

Performance Test I: Single-gene tree inference

In the first test, we examined the performance of four fast ML-based phylogenetic programs (i.e. RAxML, PhyML, IQ-TREE, and FastTree) in inferring single-gene trees (fig. 1A). We designed five strategies: four in which each program was used to infer each gene tree from a single starting tree (these were named RAxML, PhyML, IQ-TREE, and FastTree), as well as one (named RAxML-10) in which RAxML was used to infer each gene tree from ten replicates (five, including the starting tree used in RAxML, were obtained via parsimony and the other five were random starting trees).

The five strategies were compared for the likelihood scores and topologies of their single-gene tree inferences, as well as for their computational speeds. Since the true evolutionary histories are unknown for the empirical data used here, we identified the tree with the highest likelihood score for each alignment (hereafter referred to as the “best-observed” tree) among trees inferred by the five strategies and the trees reported in previous studies, if available. These “best-observed” trees were used as the reference in the comparisons of likelihood score and topology.

Likelihood score maximization

We first examined the performance of the five strategies in likelihood score maximization on single-gene alignments (supplementary table S2) by calculating the frequencies with which each of the five strategies had the highest score (fig. 2). RAxML-10 had the highest frequency of finding the highest likelihood scores (90.37%) and reported the highest likelihood scores in ≥80% of the alignments in all data sets except for D5b, highlighting the benefit of using multiple starting trees. IQ-TREE was the second best strategy with an overall frequency of 50.24%. Importantly, the performances of RAxML-10 and IQ-TREE varied substantially among data sets; whereas their performances were very similar on several data sets (e.g. A1 and D1), in others RAxML-10 outperformed IQ-TREE by large margins (e.g. A2, D2a, and D2b). RAxML and PhyML were third and fourth, respectively, and had considerably lower overall frequencies (38.71% and 25.36%). RAxML performed better than IQ-TREE on only four (A2, A5, D2a, and D2b) data sets, whereas PhyML performed better than IQ-TREE on only one (A5) data set. However, none of these three strategies (i.e. RAxML, PhyML, and IQ-TREE) performed well on these data sets. Between RAxML and PhyML, the former found higher likelihood scores more often than the latter.
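The frequency statistic used here can be sketched as follows. This is an illustrative reconstruction, not the authors' script: the input layout, the function name, the tolerance `tol` used to treat scores as tied, and the numbers in the toy data are all assumptions made for the example.

```python
from collections import Counter


def best_score_frequencies(loglik, tol=1e-6):
    """Fraction of alignments on which each strategy attains the top score.

    `loglik` maps alignment name -> {strategy name -> log-likelihood}.
    Strategies within `tol` of the best score all count as attaining it.
    """
    wins = Counter()
    for scores in loglik.values():
        best = max(scores.values())
        for strategy, value in scores.items():
            if best - value <= tol:
                wins[strategy] += 1
    strategies = {s for scores in loglik.values() for s in scores}
    return {s: wins[s] / len(loglik) for s in strategies}


# Invented log-likelihoods for four hypothetical alignments.
toy = {
    "aln1": {"RAxML-10": -1000.0, "IQ-TREE": -1000.0, "FastTree": -1012.5},
    "aln2": {"RAxML-10": -2500.3, "IQ-TREE": -2501.1, "FastTree": -2530.0},
    "aln3": {"RAxML-10": -800.2, "IQ-TREE": -799.9, "FastTree": -805.0},
    "aln4": {"RAxML-10": -640.0, "IQ-TREE": -640.0, "FastTree": -640.0},
}
freqs = best_score_frequencies(toy)
```

Because ties are credited to every strategy that reaches the top score, the frequencies need not sum to 100%, which is also the case for the percentages reported above.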

Figure 2. Performance of fast phylogenetic programs in single-gene tree inferences.

The bar-plots show the frequencies with which each of the five analysis strategies produced the best likelihoods for single-gene alignments in each of the (A) protein and (B) DNA data sets.

In comparison, the likelihood scores obtained by FastTree were much lower than those of the other four strategies; the program produced the highest likelihood scores in only 1.67% of all alignments. However, FastTree also had substantial advantages in computational speed compared to the others (see below). Since FastTree can initiate tree searches using distinct starting trees, we performed additional FastTree analyses for selected data sets, consisting of 100 tree searches for each alignment starting from 50 parsimony trees and 50 random trees. The results show that FastTree was still outperformed by other strategies even after compensating for the differences in runtime (supplementary table S3).

To further investigate the relative performance of RAxML, PhyML, and IQ-TREE, we carried out pairwise comparisons between each of them and the more comprehensive strategy RAxML-10. IQ-TREE found equally good or better likelihood scores than RAxML-10 for more than half of the alignments in 13 / 19 data sets, followed by RAxML (7 / 19) and PhyML (3 / 19) (supplementary fig. S1). Overall, IQ-TREE, RAxML, and PhyML found trees with likelihood scores equal to or better than those of RAxML-10 in 54.04%, 40.81%, and 25.80% of all alignments, respectively. A similar pattern was also observed in pairwise comparisons between RAxML, PhyML, and IQ-TREE (supplementary fig. S2); IQ-TREE found trees with higher likelihood scores more often than both RAxML (for 14 out of 19 data sets) and PhyML (for 15 out of 19 data sets), while RAxML outperformed PhyML for all data sets. Cumulatively, IQ-TREE found higher likelihood scores than RAxML for an additional 13.18% of alignments, while IQ-TREE and RAxML found higher likelihood scores than PhyML for an additional 35.08% and 28.30% of alignments, respectively.

Topological accuracy

We also assessed the topological accuracy of the five strategies by comparing their tree inferences on each alignment against the corresponding “best-observed” tree (i.e. the tree with the highest likelihood score, which was used to approximate the true ML tree). Overall, there was a strong positive correlation between the differences in likelihood scores and the topological distances (measured by the normalized Robinson-Foulds, or nRF, distance (Robinson and Foulds 1981)) when comparing inferred trees to the best-observed trees (Spearman’s correlations of 0.85 for all alignments and above 0.90 for most data sets, p-values < 2.2×10⁻¹⁶ in all cases). In other words, strategies that yielded likelihood scores closest or equal to the best-observed likelihood scores tended to be those whose topologies were also closest or identical to the best-observed topologies (supplementary table S4; see fig. 3 for data set A8 as an example).
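For readers unfamiliar with the nRF distance, the sketch below computes it for two small trees written as nested tuples. It is a minimal illustration under stated assumptions: the tree encoding and function names are invented, and the distance is normalized by the total number of non-trivial splits in the two trees, which equals the common 2(n-3) normalization when both trees are fully resolved; the normalization used in this study is not specified in the text above.

```python
def leaf_sets(tree, out):
    """Return the leaf set of `tree`; append each internal node's set to `out`."""
    if isinstance(tree, str):
        return frozenset([tree])
    leaves = frozenset().union(*(leaf_sets(child, out) for child in tree))
    out.append(leaves)
    return leaves


def bipartitions(tree):
    """Non-trivial splits of the unrooted tree, as the side lacking a reference taxon."""
    clades = []
    taxa = leaf_sets(tree, clades)
    ref = min(taxa)                      # fixed reference taxon for a canonical form
    result = set()
    for clade in clades:
        side = taxa - clade if ref in clade else clade
        if 2 <= len(side) <= len(taxa) - 2:
            result.add(side)
    return result


def nrf(tree1, tree2):
    """Normalized Robinson-Foulds distance between two trees on the same taxa."""
    s1, s2 = bipartitions(tree1), bipartitions(tree2)
    total = len(s1) + len(s2)
    return len(s1 ^ s2) / total if total else 0.0


t1 = ((("A", "B"), "C"), ("D", "E"))
t2 = ((("A", "C"), "B"), ("D", "E"))
t3 = ("A", ("B", ("C", ("D", "E"))))    # same unrooted topology as t1, rooted differently
```

Here `nrf(t1, t2)` is 0.5, since the two trees share one of their two non-trivial splits, and `nrf(t1, t3)` is 0 because the position of the root does not affect the splits.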

Figure 3. The performances of fast phylogenetic programs with respect to likelihood maximization and topological accuracy are positively correlated.

Dots in the scatter plot correspond to trees inferred by various analysis strategies from single-gene alignments in data set A8. Log-likelihood score differences between inferred trees and the “best-observed” trees are plotted against the corresponding topological distances. The log-likelihood score differences are shown in logarithmic scale (with the addition of a small value of 0.01). The violin plots on the top and right show the distributions of log-likelihood differences (top) and topological distances (right), respectively, for trees inferred by each strategy.

Among the five strategies, RAxML-10 showed the best performance in topological accuracy with median nRF distances of 0 for almost all data sets (supplementary table S4); this was unsurprising since RAxML-10 contributed most of the best-observed trees. IQ-TREE, RAxML, and PhyML also performed relatively well, with median nRF distances less than 0.01, 0.06, and 0.13, respectively, for most data sets. Here again, FastTree was behind the other strategies as it led to median nRF distances greater than 0.3 for most data sets.

Computational speed

To compare the computational speed of the five strategies, we plotted the runtimes of RAxML-10, PhyML, IQ-TREE, and FastTree against that of RAxML (fig. 4; supplementary table S5). We found strong positive correlations between the speeds of strategies over a wide range of runtimes (Spearman’s correlation ≥ 0.91 for all combinations of data types and strategies, p-values < 2.2×10⁻¹⁶ in all cases). As expected, RAxML-10, which conducts ten RAxML-based searches, took about ten times longer than RAxML (supplementary table S6). Interestingly, PhyML was ~1.5 times faster than RAxML on protein alignments, but ~3.1 times slower on DNA alignments. In contrast, IQ-TREE was faster than RAxML for both protein and DNA data (~1.6 and ~1.3 times faster, respectively). Lastly, FastTree was substantially more time-efficient than RAxML on both DNA alignments (~47.9 times faster) and protein alignments (~95.4 times faster). In addition, the time advantage of FastTree was greater for alignments requiring longer runtimes; for instance, our linear regression analysis suggests that FastTree might run ~162.0 times faster than RAxML on the largest single protein alignments but only ~9.6 times faster on the smallest ones.

Figure 4. Runtime comparisons of fast phylogenetic programs in single-gene tree inferences.

The runtimes required by each strategy to analyze a randomly selected subset of all protein (top row) and DNA (bottom row) alignments are plotted against the corresponding runtimes of RAxML. All runtimes (in seconds) are shown in logarithmic scale.

Overall, our results at the level of single-gene tree inference are consistent with previous, smaller-scale studies on the better efficiency of IQ-TREE relative to RAxML and PhyML (all using one search per alignment) (Nguyen, et al. 2015), and the inferior performance of FastTree in likelihood score maximization as compared to other programs (Guindon, et al. 2010). However, in contrast to previous observations (Guindon, et al. 2010), we found that RAxML consistently outperformed PhyML in all data sets. This difference might be due to the small number of alignments examined in the previous study (Guindon, et al. 2010) and the numerous updates of both programs since then.

Performance Test II: Coalescent-based species tree inference

In the second test, we assessed the fast ML-based phylogenetic programs in the context of the “two-step” coalescent-based species tree inference, in which single-gene trees were first estimated from individual alignments by each examined strategy and then used collectively to infer the species tree under a coalescent model (fig. 1A) (Liu, et al. 2015). Here, we used the single-gene trees produced in the Performance Test I as input for the ASTRAL program (Mirarab and Warnow 2015), which was used to infer coalescent-based species trees. The five strategies were then compared for the accuracy of their species tree inferences by using the species tree estimated from the best-observed gene trees as the reference.

We first determined for each data set the topological distances between the species tree inferred from the best-observed single-gene trees and the species trees estimated from the gene trees inferred by each of the five strategies. RAxML-10, RAxML, PhyML, and IQ-TREE displayed comparably high levels of topological accuracy in their species tree estimations (median nRF distances ranged between 0.02 and 0.04 across data sets), whereas FastTree had lower topological accuracy (a median nRF distance of 0.121). Nonetheless, for most strategies and data sets, the species tree estimates were much more accurate than the corresponding single-gene tree inferences (table 3; supplementary table S7).

Table 3.

Normalized Robinson-Foulds distances between the coalescent-based species trees estimated from gene trees inferred by various strategies and the “best-observed” gene trees.

We further assessed the confidence levels (measured by quartet-based posterior probability, or PP, support (Sayyari and Mirarab 2016)) of the incongruent bipartitions or splits identified in the abovementioned species tree comparison in the set of single-gene trees inferred by FastTree. Worryingly, the incongruent splits between the species tree inferred using FastTree-generated gene trees as input and the best-observed species tree received significantly higher PP support (fig. 5; see supplementary table S8 for the results of Wilcoxon rank-sum tests), the median PP values of which were 0.81 for protein data sets and close to 1 for DNA data sets. Both of these values were much higher than those of the other four strategies, which were all below 0.60 and 0.71 for protein and DNA data sets, respectively.

Figure 5. Incongruent splits in coalescent-based species trees estimated by RAxML-10, RAxML, PhyML, and IQ-TREE are weakly supported.

The violin plots show the distribution of local posterior probabilities for incongruent splits in coalescent-based species trees estimated by various analysis strategies. Here, incongruent splits are defined as the splits that are not present in species trees estimated from best-observed single-gene trees. The areas of violin plots are proportional to the total numbers of incongruent splits. The grey dots and bars in each violin plot indicate the median and the first/third quartiles of the local posterior probabilities, respectively.

Performance Test III: Concatenation-based species tree inference

In the third test, we examined the relative performance of the four programs in concatenation analysis of 17 taxon-rich and gene-rich supermatrices (we conducted concatenation analyses on 17, rather than 19, data matrices because: a) D5a and D5b correspond to different partitioning strategies from the same supermatrix (Jarvis, et al. 2014), and b) D2a does not have a corresponding supermatrix available from the original study (Misof, et al. 2014)) (fig. 1B; table 2). Here, we again focused on the programs’ performance on likelihood score maximization, topological accuracy, and computational speed. However, as PhyML required excessive runtime or memory, or crashed, on multiple data sets, its results are not included in the evaluation. In addition to our analyses, all the supermatrices have also been previously extensively analyzed using either RAxML or ExaML (e.g. (Jarvis, et al. 2014; Misof, et al. 2014; Wickett, et al. 2014)). Therefore, we included the reported likelihood scores and topologies – we refer to them as “RAxML/ExaML-published” trees – in our examination of relative performance.