PT - JOURNAL ARTICLE
AU - Iakov I. Davydov
AU - Marc Robinson-Rechavi
AU - Nicolas Salamin
TI - State aggregation for fast likelihood computations in phylogenetics
AID - 10.1101/035063
DP - 2015 Jan 01
TA - bioRxiv
PG - 035063
4099 - http://biorxiv.org/content/early/2015/12/23/035063.short
4100 - http://biorxiv.org/content/early/2015/12/23/035063.full
AB - Codon models are widely used to identify the signature of selection at the molecular level and to test for changes in selective pressure during the evolution of genes encoding proteins. The large dimensionality of the Markov processes used to model codon evolution makes it difficult to use these models with large biological datasets. We propose here to use state aggregation to reduce the dimensionality of codon models and thus improve the computational performance of likelihood estimation on these models. We show that this heuristic speeds up computations for the M0 and branch-site models by up to 6.8 times. We also show through simulations that state aggregation does not introduce a detectable bias. We analysed a real dataset and found that aggregation provides predictions that are highly correlated with those of the full likelihood computations. Finally, state aggregation is a very general approach and can be applied to any continuous-time Markov process-based model with large dimensionality, such as amino acid and coevolution models. We therefore discuss different ways to apply state aggregation to Markov models used in phylogenetics.
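
The abstract describes state aggregation only at a high level. As a rough illustration of the general idea, and not the authors' implementation, the sketch below lumps states of a small continuous-time Markov rate matrix into fewer states, weighting the merged rows by stationary frequencies. The function name `aggregate_rate_matrix`, the 4-state toy matrix, and the choice of which states to merge are all assumptions made for this example; a codon model would start from a 61-state matrix instead.

```python
import numpy as np

def aggregate_rate_matrix(Q, pi, groups):
    """Lump the states of a CTMC rate matrix Q into the given groups.

    Hypothetical helper for illustration only. Rates into a group are
    summed over its members; rates out of a group are averaged over its
    members, weighted by their stationary frequencies pi.
    """
    Q = np.asarray(Q, dtype=float)
    pi = np.asarray(pi, dtype=float)
    n, m = Q.shape[0], len(groups)

    # V[i, a] = 1 if original state i belongs to aggregated state a.
    V = np.zeros((n, m))
    for a, members in enumerate(groups):
        V[members, a] = 1.0

    # Stationary frequency of each aggregated state.
    pi_agg = pi @ V

    # W[a, i] = pi[i] / pi_agg[a] for members i of group a, else 0.
    W = (V * pi[:, None]).T / pi_agg[:, None]

    return W @ Q @ V, pi_agg

# Toy reversible 4-state model: Q[i, j] = s[i, j] * pi[j] for i != j.
pi = np.array([0.1, 0.2, 0.3, 0.4])
s = np.array([[0.0, 1.0, 2.0, 1.0],
              [1.0, 0.0, 1.0, 3.0],
              [2.0, 1.0, 0.0, 1.0],
              [1.0, 3.0, 1.0, 0.0]])
Q = s * pi[None, :]
np.fill_diagonal(Q, -Q.sum(axis=1))  # rows of a rate matrix sum to zero

# Merge states 2 and 3 into one aggregated state (arbitrary choice).
Q_agg, pi_agg = aggregate_rate_matrix(Q, pi, [[0], [1], [2, 3]])
print(Q_agg.shape)
```

In this sketch the aggregated matrix is still a valid rate matrix (rows sum to zero) and keeps the aggregated stationary distribution, but transition probabilities computed from it are in general only an approximation of those of the full process, which is why the abstract calls the approach a heuristic.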