Abstract
Introduction
Homologous recombination occurs when a foreign DNA segment replaces a similar (homologous) segment in the genome of a prokaryotic cell. For a pair of genomes, recombination can distort phylogenetic reconstruction in two opposite directions: (i) one genome can recombine with a DNA segment similar to the other genome of the pair, which reduces their pairwise sequence divergence; (ii) one genome can instead recombine with a segment from an outgroup genome, which increases their pairwise divergence. Most phylogenetic algorithms cannot account for recombination, and those that do cannot account for all of its effects.
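A minimal toy sketch in Python makes the two effects concrete. It is not part of the CGP method. Pairwise divergence is measured as the fraction of differing aligned sites (a p-distance), and recombination is modeled as replacing one segment of a sequence with the homologous segment of a donor. The sequences, the helper names `p_distance` and `recombine`, and the segment boundaries are all hypothetical.

```python
def p_distance(x: str, y: str) -> float:
    """Pairwise divergence: fraction of aligned sites at which x and y differ."""
    if len(x) != len(y) or not x:
        raise ValueError("sequences must be non-empty and of equal length")
    return sum(a != b for a, b in zip(x, y)) / len(x)


def recombine(recipient: str, donor: str, start: int, end: int) -> str:
    """Replace recipient[start:end] with the homologous segment of the donor."""
    return recipient[:start] + donor[start:end] + recipient[end:]


if __name__ == "__main__":
    # Hypothetical toy genomes: A and B form the pair, C is an outgroup.
    A = "AAAAAAAAAA"
    B = "AACAAAACAA"
    C = "GGGGGGGGGG"
    print(p_distance(A, B))                      # 0.2 before recombination
    print(p_distance(recombine(A, B, 0, 5), B))  # 0.1: donor is the partner B
    print(p_distance(recombine(A, C, 0, 5), B))  # 0.6: donor is the outgroup C
```

Importing a segment from the partner genome halves the apparent divergence in this toy case. Importing the same stretch from the outgroup triples it. A method that assumes purely clonal descent would therefore misplace the pair's common ancestor in either direction.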
Results
We develop a fast algorithm that accounts for recombination and reconstructs ultrametric trees. Instead of considering individual positions of the genome sequences, we use a coarse-graining approach that divides each genome sequence into short segments. For each genome pair, our coarse-graining phylogenetic (CGP) algorithm counts the pairwise single-site polymorphisms (SSPs) in each segment to obtain the pairwise SSP distribution, and then fits this empirical SSP distribution to a theoretical one. We test our algorithm against other state-of-the-art algorithms on simulated and real genomes. For genomes with substantial recombination, such as E. coli, we show that the ages of internal nodes estimated by CGP are more accurate than those predicted by other algorithms, while the reconstructed tree topology is at least as accurate.
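The first step of this pipeline turns a genome pair into an empirical SSP distribution. Here is a minimal Python sketch of that step. It is an illustration under stated assumptions, not the authors' implementation. It assumes the two genomes are already aligned and of equal length, counts any mismatching aligned position as an SSP, uses non-overlapping segments of a fixed, user-chosen length, and drops a trailing partial segment. The theoretical SSP distribution that CGP fits is not reproduced here. `poisson_pmf` is only a hypothetical reference for comparison. It assumes independent mutations at a constant rate and no recombination.

```python
from collections import Counter
from math import exp, factorial
from typing import Dict, List


def segment_ssp_counts(seq_a: str, seq_b: str, segment_length: int) -> List[int]:
    """Count SSPs (mismatching aligned positions) in each non-overlapping segment.

    Assumes seq_a and seq_b are already aligned and of equal length; a trailing
    partial segment shorter than segment_length is dropped.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned and of equal length")
    if segment_length <= 0:
        raise ValueError("segment_length must be positive")
    counts = []
    for start in range(0, len(seq_a) - segment_length + 1, segment_length):
        end = start + segment_length
        counts.append(sum(a != b for a, b in zip(seq_a[start:end], seq_b[start:end])))
    return counts


def ssp_distribution(counts: List[int]) -> Dict[int, float]:
    """Empirical SSP distribution: fraction of segments carrying exactly m SSPs."""
    if not counts:
        return {}
    total = len(counts)
    return {m: n / total for m, n in sorted(Counter(counts).items())}


def poisson_pmf(m: int, mean: float) -> float:
    """Hypothetical recombination-free reference: Poisson probability of m SSPs."""
    return exp(-mean) * mean**m / factorial(m)


if __name__ == "__main__":
    # Hypothetical aligned toy pair, split into four segments of length 5.
    a = "AAAAACCCCCGGGGGTTTTT"
    b = "AAAAACCCCAGGTTTTTTTT"
    counts = segment_ssp_counts(a, b, segment_length=5)
    print(counts)                    # [0, 1, 3, 0]
    print(ssp_distribution(counts))  # {0: 0.5, 1: 0.25, 3: 0.25}
    mean = sum(counts) / len(counts)
    for m, freq in ssp_distribution(counts).items():
        print(m, freq, round(poisson_pmf(m, mean), 3))
```

Real genomes would use far longer sequences and many more segments. This sketch stops at building the empirical distribution and deliberately leaves out the fitting step, because the theoretical model used by CGP is not specified in this summary.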
Conclusion
We develop a phylogenetic algorithm that accounts for recombination. It reconstructs ultrametric trees more accurately than alternative algorithms and is substantially faster than current state-of-the-art algorithms for recombination-aware phylogenetic reconstruction.