Abstract
Phylogenetic comparative methods have been used to model trait evolution, to test selection versus neutral hypotheses, to estimate optimal trait values, and to quantify the rate of adaptation towards these optima. Several authors have proposed algorithms that calculate the likelihood for trait evolution models, such as the Ornstein-Uhlenbeck (OU) process, in time proportional to the number of tips in the tree. Combined with gradient-based optimization, these algorithms enable maximum likelihood (ML) inference within seconds, even for trees exceeding 10,000 tips. Despite its useful statistical properties, ML has been criticised for being a point estimator prone to getting stuck in local optima. As an elegant alternative, Bayesian inference explores all of the information in the data and compares it with prior knowledge but usually takes much longer, even on small trees. Here, we propose an approach that uses the full potential of ML and Bayesian inference while keeping the runtime within minutes. Our approach combines (i) a new algorithm for parallel traversal of the lineages in the tree, enabling parallel calculation of the likelihood, and (ii) a previously published method for adaptive Metropolis sampling. In principle, the combination of (i) and (ii) can be applied to any likelihood calculation on a tree that proceeds in a pruning-like fashion, leading to enormous speed improvements. We implement several variants of the parallel algorithm in the form of a generic C++ library, “SPLiTTree”, capable of automatically choosing the optimal algorithm for a given task and computing platform. We give examples of models of discrete and continuous trait evolution that are amenable to parallel likelihood calculation. As a complete showcase, we implement the phylogenetic Ornstein-Uhlenbeck mixed model (POUMM) in the form of an easy-to-use and highly configurable R package that calls the library as a back-end.
In addition to the above-mentioned uses of comparative methods, the POUMM allows the estimation of non-heritable variance and phylogenetic heritability. Using SPLiTTree, calculating the POUMM likelihood on a 4-core SIMD-enabled processor is up to 10 times faster than serial implementations written in C and hundreds of times faster than serial implementations written in R. By combining SPLiTTree likelihood calculation with adaptive Metropolis sampling, the time for Bayesian POUMM inference on a tree of 10,000 tips is reduced from several days to a few minutes.
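
To illustrate what a pruning-like calculation that can be parallelised might look like, the following Python sketch groups the nodes of a tree into batches whose members could, in principle, be processed at the same time. It is only a minimal illustration under stated assumptions and is not the SPLiTTree algorithm: the tree encoding (a dictionary mapping each node to its parent), the function names `pruning_generations` and `sum_to_root`, and the toy "sum towards the root" operation standing in for a likelihood update are all hypothetical.

```python
from collections import defaultdict


def pruning_generations(parent):
    """Group non-root nodes into batches that can be pruned independently.

    parent: dict mapping each non-root node to its parent node
            (hypothetical encoding chosen for this sketch).
    Returns a list of batches. Every node in a batch has had all of its
    children pruned in earlier batches, so the nodes of one batch do not
    depend on each other.
    """
    n_children = defaultdict(int)
    for child, par in parent.items():
        n_children[par] += 1

    # Start from the tips: nodes that have no children.
    batch = sorted(node for node in parent if n_children[node] == 0)
    batches = []
    while batch:
        batches.append(batch)
        next_batch = []
        for node in batch:
            par = parent[node]
            n_children[par] -= 1
            # A parent becomes ready once all of its children are pruned.
            # The root is absent from `parent` and is never pruned itself.
            if n_children[par] == 0 and par in parent:
                next_batch.append(par)
        batch = sorted(next_batch)
    return batches


def sum_to_root(parent, value):
    """Toy pruning pass: accumulate per-node values towards the root.

    This stands in for a real likelihood update. Here the loop over a
    batch runs serially; in a parallel version only the write into the
    shared parent would need synchronisation.
    """
    total = dict(value)
    for batch in pruning_generations(parent):
        for node in batch:
            par = parent[node]
            total[par] = total.get(par, 0) + total[node]
    return total


if __name__ == "__main__":
    # Root 0 has children 1 and 2; node 1 has the tips 3 and 4.
    tree = {1: 0, 2: 0, 3: 1, 4: 1}
    print(pruning_generations(tree))
    print(sum_to_root(tree, {0: 0, 1: 0, 2: 1, 3: 1, 4: 1})[0])
```

In this sketch the number of batches is bounded by the height of the tree, so a balanced tree offers far more parallelism per batch than a ladder-like one; how the published library handles unbalanced trees is not covered here.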