Generalization of the Ewens sampling formula to arbitrary fitness landscapes

PLoS One. 2018 Jan 11;13(1):e0190186. doi: 10.1371/journal.pone.0190186. eCollection 2018.

Abstract

In considering the evolution of transcribed regions, regulatory sequences, and other genomic loci, we are often faced with a situation in which the number of allelic states greatly exceeds the size of the population. In this limit, the population eventually reaches a steady state characterized by mutation-selection-drift balance. Although new alleles continue to be explored through mutation, the statistics of the population, and in particular the probabilities of seeing specific allelic configurations in samples taken from the population, do not change with time. In the absence of selection, the probabilities of allelic configurations are given by the Ewens sampling formula, widely used in population genetics to detect deviations from neutrality. Here we develop an extension of this formula to arbitrary fitness distributions. Although our approach is general, we focus on a class of fitness landscapes, inspired by recent high-throughput genotype-phenotype maps, in which alleles can be in several distinct phenotypic states. This class of landscapes yields sampling probabilities that are computationally more tractable and can form a basis for inferring selection signatures from genomic data. Using an efficient numerical implementation of the sampling probabilities, we demonstrate that, for a sizable range of mutation rates and selection coefficients, the steady-state allelic diversity departs from the neutral expectation. Therefore, sampled allelic configurations may be used to infer selection coefficients, as well as other evolutionary parameters, from population data. We also carry out numerical simulations to test various approximations involved in deriving our sampling formulas, such as the infinite-allele limit and the "full connectivity" assumption inherent in the Ewens theory, in which each allele can mutate into any other allele. We find that, at least for the specific numerical examples studied, our theory remains sufficiently accurate even when these assumptions are relaxed. Thus, our framework establishes both theoretical and practical foundations for inferring selection signatures from population-level genomic sequence samples.
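For reference, the neutral baseline that the abstract generalizes is the standard Ewens sampling formula: for a sample of n genes in which a_j allelic types are each seen exactly j times (so that the sum of j·a_j equals n), P(a_1, ..., a_n) = n! / [θ(θ+1)...(θ+n-1)] × ∏_j (θ/j)^(a_j) / a_j!, where θ is the population-scaled mutation rate. The minimal sketch below evaluates only this neutral formula; it is not the selection-aware generalization developed in the paper, and the function names ewens_probability and configurations are hypothetical, chosen for illustration.

```python
from fractions import Fraction
from math import factorial, prod


def ewens_probability(a, theta):
    """Neutral Ewens sampling formula (standard textbook form, no selection).

    a[j-1] is the number of allelic types observed exactly j times, so the
    sample size is n = sum_j j * a_j. theta is the population-scaled mutation
    rate. Passing a Fraction for theta gives an exact rational result.
    """
    n = sum(j * a_j for j, a_j in enumerate(a, start=1))
    # Rising factorial theta * (theta + 1) * ... * (theta + n - 1).
    rising = prod(theta + i for i in range(n))
    p = factorial(n) / rising
    for j, a_j in enumerate(a, start=1):
        p *= (theta / j) ** a_j / factorial(a_j)
    return p


def configurations(n):
    """Yield every allelic configuration (a_1, ..., a_n) with sum_j j * a_j = n.

    Each configuration corresponds to one integer partition of n.
    """
    def partitions(remaining, max_part):
        # Generate partitions as non-increasing lists of part sizes.
        if remaining == 0:
            yield []
            return
        for part in range(min(remaining, max_part), 0, -1):
            for rest in partitions(remaining - part, part):
                yield [part] + rest

    for parts in partitions(n, n):
        a = [0] * n
        for part in parts:
            a[part - 1] += 1  # one more allele seen exactly `part` times
        yield tuple(a)


if __name__ == "__main__":
    theta = Fraction(3, 2)
    # Under neutrality the probabilities over all configurations sum to 1.
    total = sum(ewens_probability(a, theta) for a in configurations(6))
    print(total)  # 1
```

Exact rational arithmetic is used so that the normalization check is not blurred by floating-point rounding; for n = 2 the sketch reproduces the familiar result that two sampled genes carry the same allele with probability θ/(θ+1).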

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Alleles
  • Models, Genetic*
  • Mutation
  • Transcription, Genetic*

Grants and funding

PK acknowledges financial support from a Peter Lindenfeld fellowship awarded by the Department of Physics and Astronomy, Rutgers University. AVM was supported in part through a collaboration with Los Alamos National Lab (LANL-DOE 20150236ER). The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.