TY  - JOUR
T1  - Effects of variable mutation rates and epistasis on the distribution of allele frequencies in humans
JF  - bioRxiv
DO  - 10.1101/048421
SP  - 048421
AU  - Arbel Harpak
AU  - Anand Bhaskar
AU  - Jonathan K. Pritchard
Y1  - 2016/01/01
UR  - http://biorxiv.org/content/early/2016/09/25/048421.abstract
N2  - The site frequency spectrum (SFS) has long been used to study demographic history and natural selection. Here, we extend this summary by examining the SFS conditional on the alleles found at the same site in other species. We refer to this extension as the “phylogenetically-conditioned SFS” or cSFS. Using recent large-sample data from the Exome Aggregation Consortium (ExAC), combined with primate genome sequences, we find that human variants that occurred independently in closely related primate lineages are at higher frequencies in humans than variants with parallel substitutions in more distant primates. We show that this effect is largely due to sites with elevated mutation rates causing significant departures from the widely-used infinite sites mutation model. Our analysis also suggests substantial variation in mutation rates even among mutations involving the same nucleotide changes. We additionally find evidence for epistatic effects on the cSFS: namely, that parallel primate substitutions at nonsynonymous sites are more informative about constraint in humans when the parallel substitution occurs in a closely related species. In summary, we show that variable mutation rates and local sequence context are important determinants of the SFS in humans.
ER  - 