PT - JOURNAL ARTICLE
AU - Fernando Racimo
AU - Joshua G. Schraiber
TI - Nonparametric inference of the distribution of fitness effects across functional categories in humans
AID - 10.1101/002345
DP - 2014 Jan 01
TA - bioRxiv
PG - 002345
4099 - http://biorxiv.org/content/early/2014/02/04/002345.short
4100 - http://biorxiv.org/content/early/2014/02/04/002345.full
AB - Quantifying the proportion of polymorphic mutations that are deleterious or neutral is of fundamental importance to our understanding of evolution, disease genetics and the maintenance of variation genome-wide. Here, we develop an approximation to the distribution of fitness effects (DFE) of segregating single-nucleotide mutations in humans. Unlike previous methods, we do not assume that synonymous mutations are neutral, or rely on fitting the DFE of new nonsynonymous mutations to a particular parametric probability distribution, which is poorly motivated on a biological level. We rely on a previously developed method that utilizes a variety of published annotations (including conservation scores, protein deleteriousness estimates and regulatory data) to score all mutations in the human genome based on how likely they are to be affected by negative selection, controlling for mutation rate. We map this score to a scale of fitness coefficients via maximum likelihood using diffusion theory and a Poisson random field model. We then use our coefficient mapping to quantify the distribution of all scored single-nucleotide polymorphisms in Yoruba and Europeans. Our method serves to approximate the DFE of any type of segregating mutations, regardless of its genomic consequence, and so allows us to compare the proportion of mutations that are negatively selected or neutral across various genomic categories, including different types of regulatory sites.
We observe that the distribution of intergenic polymorphisms is highly leptokurtic, with a strong peak at neutrality, while the distribution of nonsynonymous polymorphisms is bimodal, with a neutral peak and a second peak at s ≈ −10^−4. Other types of polymorphisms have shapes that fall roughly in between these two.
Author Summary: The relative frequencies of polymorphic mutations that are deleterious, nearly neutral and neutral are traditionally called the distribution of fitness effects (DFE). Obtaining an accurate approximation to this distribution in humans can help us understand the nature of disease and the mechanisms by which variation is maintained in the genome. Previous methods to approximate this distribution have relied on fitting the DFE of new mutations to standard parametric probability distributions, like a normal or an exponential distribution. Here, we provide a novel method that does away with using parametric DFE approximations by relying on genomic scores designed to reflect the strength of negative selection operating on any site in the human genome. We use a maximum likelihood mapping approach to fit these scores to a scale of neutral and negative fitness coefficients. Finally, we compare the shape of the DFEs we obtain from this mapping across populations as well as different types of functional categories. We observe a highly leptokurtic distribution of polymorphisms, with a strong peak at neutrality, as well as a second peak of deleterious effects when restricting to nonsynonymous polymorphisms.