TY - JOUR T1 - Approximate Likelihood Inference of Complex Population Histories and Recombination from Multiple Genomes JF - bioRxiv DO - 10.1101/077958 SP - 077958 AU - Champak R. Beeravolu AU - Michael J. Hickerson AU - Laurent A.F. Frantz AU - Konrad Lohse Y1 - 2016/01/01 UR - http://biorxiv.org/content/early/2016/09/28/077958.abstract N2 - The wealth of information contained in genome-scale datasets has substantially encouraged the development of methods inferring population histories with unprecedented resolution. Methods based on the Site Frequency Spectrum (SFS) are computationally efficient but discard information about linkage disequilibrium, while methods making use of linkage and recombination are computationally more intensive and rely on approximations such as the Sequentially Markov Coalescent. Overcoming these limitations, we introduce a novel Composite Likelihood (CL) framework which allows for the joint inference of arbitrarily complex population histories and the average genome-wide recombination rate from multiple genomes. We build upon an existing analytic approach that partitions the genome into blocks of equal (and arbitrary) size and summarizes the polymorphism and linkage information as blockwise counts of SFS types (bSFS). This statistic is a richer summary than the SFS because it retains information on the variation in genealogies contained in short-range linkage blocks across the genome. Our method, ABLE (Approximate Blockwise Likelihood Estimation), approximates the CL of arbitrary population histories via Monte Carlo simulations form the coalescent with recombination and overcomes limitations arising from analytical likelihood calculations. ABLE is first assessed by comparing it to expected analytic results for small samples and no intra-block recombination. The power of this approach is further illustrated by using whole genome data from the two species of orangutan and comparing our inferences under a series of models involving divergence and various forms of continuous or pulsed admixture with previous analyses based on the SFS and the SMC. Finally, we explore the effects of sampling (different block lengths and number of individuals) and find that accurate inference of demography and recombination can be achieved with reasonable computational effort. Our approach is also notably adapted to unphased data and fragmented assemblies making it particularly suitable for model as well as non-model organisms.Author Summary In this paper, we make use of the distribution of blockwise SFS (bSFS) patterns for the inference of arbitrary population histories from mutliple genome sequences. The latter can be whole genomes or of a fragmented nature such as RADSeq data. Our method notably allows for the simultaneous inference of demographic history and the genome-wide historical recombination rate. Additionally, we do not require phased genomes as the bSFS approach does not distinguish the sampled lineage in which a mutation occurred. As with the Site Frequency Spectrum (SFS), we can also ignore outgroups by folding the bSFS. Our Approximate Blockwise Likelihood Estimation (ABLE) approach implemented in C/C++ and taking advantage of parallel computing power is tailored for studying the population histories of model as well as non-model species. ER -