PT  - JOURNAL ARTICLE
AU  - Benjamin M Peter
TI  - Admixture, Population Structure and F-statistics
AID  - 10.1101/028753
DP  - 2015 Jan 01
TA  - bioRxiv
PG  - 028753
4099  - http://biorxiv.org/content/early/2015/10/09/028753.short
4100  - http://biorxiv.org/content/early/2015/10/09/028753.full
AB  - Many questions about human genetic history can be addressed by examining the patterns of shared genetic variation between sets of populations. A useful methodological framework for this purpose is F-statistics, which measure shared genetic drift between sets of two, three and four populations and can be used to test simple and complex hypotheses about admixture between populations. Here, we put these statistics in the context of phylogenetic and population genetic theory. We show how measures of genetic drift can be interpreted as branch lengths, as paths through an admixture graph, or in terms of the internal branches in coalescent trees. We show that the admixture tests can be interpreted as testing general properties of phylogenies, allowing us to generalize applications to arbitrary phylogenetic trees. Furthermore, we derive novel expressions for the F-statistics, which enable us to explore the behavior of F-statistics under population structure models. In particular, we show that population substructure may complicate inference. Author Summary: For the analysis of genetic data from hundreds of populations, a commonly used technique is a set of simple statistics on data from two, three and four populations. These statistics are used to test hypotheses about the history of populations, in particular whether the data are consistent with a history in which a set of populations forms a tree. Here, we provide context for these statistics by deriving novel expressions and by relating them to approaches in comparative phylogenetics. These results are useful because they provide a straightforward interpretation of these statistics under many demographic processes and lead to simplified expressions. However, the results also reveal the limitations of F-statistics, in that population substructure may complicate inference.