RT Journal Article
SR Electronic
T1 Approaches to estimating inbreeding coefficients in clinical isolates of Plasmodium falciparum from genomic sequence data
JF bioRxiv
FD Cold Spring Harbor Laboratory
SP 021519
DO 10.1101/021519
A1 Lucas Amenga-Etego
A1 Ruiqi Li
A1 John D. O’Brien
YR 2016
UL http://biorxiv.org/content/early/2016/05/03/021519.abstract
AB The advent of whole-genome sequencing has generated increased interest in modeling the structure of strain mixture within clinical infections of Plasmodium falciparum (Pf). The life cycle of the parasite implies that the mixture of multiple strains within an infected individual is related to the out-crossing rate across populations, making methods for measuring this process in situ central to understanding the genetic epidemiology of the disease. In this paper, we show how to estimate inbreeding coefficients using genomic data from Pf clinical samples, providing a simple metric for assessing within-sample mixture that connects to an extensive literature in population genetics and conservation ecology. Features of the P. falciparum genome mean that some standard methods for estimating inbreeding coefficients and related F-statistics cannot be used directly. Here, we review an initial effort to estimate the inbreeding coefficient within clinical isolates of P. falciparum and provide several generalizations using both frequentist and Bayesian approaches. The Bayesian approach connects these estimates to the Balding-Nichols model, a mainstay within genetic epidemiology. We provide simulation results on the performance of the estimators and show their use on ~1500 samples from the PF3K data set. We also compare the results to output from a recent mixture model for within-sample strain mixture, showing that inbreeding coefficients provide a strong proxy for the results of these more complex models. We provide the methods described in an open-source R package, pfmix.