Abstract
In numerous applications, from working with animal models to mapping the genetic basis of human disease susceptibility, it is useful to know whether a single disrupting mutation in a gene is likely to be deleterious1–4. With this goal in mind, a number of measures have been developed to identify genes in which protein-truncating variants (PTVs), or other types of mutations, are absent or kept at very low frequency in large numbers of healthy individuals—genes that appear intolerant to mutation3,5–9. One measure in particular, pLI, has been widely adopted7. By contrasting the observed versus expected numbers of PTVs, it aims to classify genes into three categories, labelled null, recessive and haploinsufficient7. Here we discuss how pLI and similar measures relate to population genetic parameters and why they reflect the strength of selection acting on heterozygotes, rather than dominance or haploinsufficiency.
Experimental biologists and human geneticists are often interested in whether a single disrupting mutation, be it a protein-truncating variant (PTV) or a missense mutation, is likely to have a phenotypic effect. A related question is whether a single disrupting mutation is likely to have a deleterious effect, that is, whether it will reduce the fitness of its carrier. While the terms haploinsufficient and dominant are often used interchangeably, the relationship between effects on phenotypes and effects on fitness is not straightforward. For instance, a single mutation could lead to a clinically important phenotype, indicating that the gene is haploinsufficient or that there is a gain of function, yet have small or negligible effects on fitness unless homozygous. Examples include ELN and BRCA2, genes in which a single PTV leads to a severe disease, but where the fitness effect on heterozygotes is likely quite small because the disease is late onset (while homozygous PTVs are lethal)10–13. Conversely, a mutation in a highly pleiotropic gene can have very weak phenotypic effects, yet inflict a severe cost on fitness.
Following common practice in human genetics (e.g.,4), we refer to genes in which a single disrupting mutation has a discernable phenotypic effect in heterozygotes as haploinsufficient (at least with regard to that phenotype); we note, however, that a phenotypic effect of a single mutation could also be due to a gain of function. In turn, we describe genes in which a single disrupting mutation has a fitness effect in heterozygotes as at least partially dominant. More precisely, following the convention in population genetics, we denote the fitnesses 1, 1 – hs, and 1 – s as corresponding, respectively, to genotypes AA, AD, and DD, where D is the deleterious allele, h is the dominance coefficient, and s is the selection coefficient. A mutation is completely recessive if h is equal to 0 and at least partially dominant if h is not near 0. This definition of dominance differs from one often used in population genetics (where dominance is defined as h > 0.5), but has more direct relevance for the expected frequency of deleterious mutations14 (Box 1).
Estimating the strength of selection acting on a gene, as summarized by the selection coefficient (s) and dominance effects (h) of mutations, has a long tradition in population genetics15–18. In model organisms, these efforts have taken the form of mutation accumulation experiments and assays of gene deletion libraries15,19–21; in humans and other species, these parameters have been inferred from polymorphism data22–26. The statistical inferences are based on the notion of a mutation-selection balance, namely that the frequencies of deleterious alleles reflect a balance between the rate at which they are purged from the population and the rate at which they are replenished by mutation. Mutations with larger hs are purged more effectively and hence are expected at lower frequencies in the population—or, equivalently, are more likely to be absent from large samples (Box 1). Therefore, one way to identify genes whose loss is likely to reduce fitness is to assess whether disrupting mutations are found at lower frequencies than expected under some sensible null model.
Deleterious alleles are introduced into the population by mutation, then change in frequency due to the combined effects of genetic drift, demography and natural selection. Unless a disease mutation confers an advantage in some environments (e.g., the sickle cell allele in populations afflicted by malaria27), the frequency at which it is found in a population reflects a balance between the rate at which it is introduced by mutation and the rate at which it is removed by drift and purifying selection28–30.
This phenomenon is referred to as mutation-selection-drift balance and is modeled as follows (e.g., see31). Let u be the mutation rate from the wild-type allele A to the deleterious allele D. This mutation rate can be defined per site or per gene. In the latter case, it is obtained by summing the mutation rate to deleterious alleles across sites (this simple summing implicitly assumes that there is no complementation, i.e., that compound heterozygotes for deleterious alleles have the same fitness effects as homozygotes32). The fitnesses of diploid individuals carrying wild-type (A) or deleterious (D) alleles are given by

$$w_{AA} = 1, \qquad w_{AD} = 1 - hs, \qquad w_{DD} = 1 - s,$$

where s is the selection coefficient measuring the fitness of DD relative to AA and h is the dominance coefficient, such that hs is the reduction in fitness of AD relative to AA.
In the limit of an infinite, panmictic population (i.e., ignoring genetic drift and inbreeding), when h > 0 (and hs ≫ u), the equilibrium frequency of the deleterious D allele, q, is approximately29:

$$q \approx \frac{u}{hs}.$$
Notably, when h > 0, the equilibrium frequency q is determined by the strength of selection in heterozygotes, i.e., by the product hs. This is because deleterious homozygotes are too rare for selection against them to have an appreciable effect. Hence, in this approximation, different combinations of h and s with the same product hs yield the same value of q.
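The approximation q ≈ u/(hs), and its dependence on h and s only through their product, can be checked by iterating the deterministic recursion for allele frequency to equilibrium. The sketch below is our own illustration rather than code from the works cited. It assumes an infinite, randomly mating population and viability selection followed by one-way mutation A → D in each generation. It also uses hypothetical values: u = 10−6 and four (h, s) pairs that all share hs = 0.01.

```python
def next_generation(q: float, u: float, h: float, s: float) -> float:
    """Advance the D-allele frequency by one generation: selection, then mutation.

    Fitnesses AA = 1, AD = 1 - hs, DD = 1 - s; mutation is one-way (A -> D).
    """
    p = 1.0 - q
    w_bar = p * p + 2 * p * q * (1 - h * s) + q * q * (1 - s)
    # Frequency of D among gametes contributed by the surviving adults
    q_sel = (p * q * (1 - h * s) + q * q * (1 - s)) / w_bar
    # A fraction u of the remaining A alleles mutates to D
    return q_sel + u * (1.0 - q_sel)


def equilibrium(u: float, h: float, s: float, generations: int = 20_000) -> float:
    """Iterate the recursion from q = 0; 20,000 generations is ample when hs >= 0.01."""
    q = 0.0
    for _ in range(generations):
        q = next_generation(q, u, h, s)
    return q


u = 1e-6
for h, s in [(1.0, 0.01), (0.5, 0.02), (0.1, 0.1), (0.02, 0.5)]:  # all have hs = 0.01
    q = equilibrium(u, h, s)
    print(f"h={h:<5} s={s:<5} q={q:.4e}  u/(hs)={u / (h * s):.4e}")
```

Each pair should settle near u/(hs) = 10−4. The small residual differences come from selection against DD homozygotes, which the approximation neglects.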
For a completely recessive allele (h = 0), in turn, q is well approximated by29:

$$q \approx \sqrt{\frac{u}{s}}.$$
Here, the equilibrium frequency is necessarily determined by selection in homozygotes. In this limit of an infinite population size, the same frequency q of a recessive allele with s >0 can also arise from a dominant allele for some value of hs >0.
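The matching can be made explicit. Consider a completely recessive allele with selection coefficient s_rec (notation introduced here). Equate its approximation, q ≈ √(u/s_rec), with the approximation q ≈ u/(hs) for h > 0. This gives the heterozygous selection strength that yields the same equilibrium frequency:

```latex
\frac{u}{hs} \;=\; \sqrt{\frac{u}{s_{\mathrm{rec}}}}
\quad\Longrightarrow\quad
hs \;=\; \sqrt{u\, s_{\mathrm{rec}}}
```

Because √(u·s_rec) < s_rec whenever u < s_rec, the mimicking allele is always far less strongly selected than the recessive one. The condition hs ≫ u required by the first approximation holds as long as s_rec ≫ u.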
In a finite population, there is a distribution of deleterious allele frequencies rather than a single (deterministic) value for given values of h and s. This distribution was derived for a constant population size by Wright30 and is again a function of the product hs. This assumes that $2N_e hs \gg 1$, where N_e is the effective population size, and sets aside the case of sustained, high levels of inbreeding33. The resulting distribution can be highly variable, reflecting both stochasticity in the mutation process and variance due to the evolutionary process (i.e., due to genetic drift). Dramatic changes in population size, such as those experienced by human populations, can also have a marked effect on the distribution of deleterious allele frequencies. Regardless of these complications, distinguishing complete recessivity (h = 0) from small hs may not be feasible. Moreover, other than under complete recessivity, the expected allele frequency is a function of hs, not of h and s separately14.
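Wright's stationary distribution can also be approximated by simulation. The sketch below is a hypothetical constant-size Wright–Fisher model, not the simulation behind Fig 1. In each generation it applies deterministic selection with the fitnesses of Box 1, then one-way mutation A → D, then binomial sampling of 2N gene copies, across independent replicate populations. Population size, mutation rate, run length and seed are illustrative assumptions, chosen so that 2Nhs = 200 ≫ 1.

```python
import numpy as np


def simulate_wright_fisher(u, h, s, n_diploid=10_000, replicates=500,
                           generations=3_000, seed=1):
    """Return final D-allele frequencies in independent replicate populations.

    Hypothetical constant-size Wright-Fisher model. Each generation applies
    deterministic viability selection (fitnesses 1, 1 - hs, 1 - s), one-way
    mutation A -> D at rate u, then binomial sampling of 2N gene copies (drift).
    All replicates start at q = 0; `generations` should be many multiples of
    1/hs so that the replicates reach stationarity.
    """
    rng = np.random.default_rng(seed)
    two_n = 2 * n_diploid
    q = np.zeros(replicates)
    for _ in range(generations):
        p = 1.0 - q
        w_bar = p * p + 2 * p * q * (1 - h * s) + q * q * (1 - s)
        q_sel = (p * q * (1 - h * s) + q * q * (1 - s)) / w_bar
        q_mut = q_sel + u * (1.0 - q_sel)
        q = rng.binomial(two_n, q_mut) / two_n   # genetic drift
    return q


u = 1e-4
for h, s in [(0.5, 0.02), (0.1, 0.1)]:   # both pairs have hs = 0.01
    q = simulate_wright_fisher(u, h, s)
    print(f"h={h}, s={s}: mean q = {q.mean():.4f}, sd = {q.std():.4f}, "
          f"u/(hs) = {u / (h * s):.4f}")
```

The two parameter sets share hs = 0.01 but differ fivefold in h. With these settings they should give similar means close to u/(hs) = 0.01 and a similar spread across replicates. A modestly lower mean is expected for the smaller h, because selection against homozygotes is stronger there.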
To our knowledge, this population genetic approach was introduced as a tool for prioritizing human disease genes by Petrovski et al.5, who ranked genes by comparing the observed number of common PTVs and missense mutations to the total number of observed variants. This statistic was then supplemented by a number of others6,34–36, notably pLI, which is defined as the probability of being loss-of-function intolerant7. pLI is derived from a comparison of the observed number of PTVs in a sample to the number expected in the absence of fitness effects (i.e., under neutrality), given an estimated mutation rate for the gene. To build this score, Lek et al.7 assumed that the number of PTVs in a gene is Poisson distributed with mean λM, where M is the expected number of PTVs under neutrality (estimated for each gene based on a mutation model6 and the observed synonymous polymorphism counts). The authors considered three possibilities: a gene can be neutral with respect to fitness (λNull = 1), recessive (λRec = 0.463) or haploinsufficient (λHI = 0.089). The fixed values of λRec and λHI were obtained from the average reduction in the number of observed PTVs relative to the neutral expectation in genes classified as recessive and haploinsufficient, respectively. The classification was based on the phenotypic effects of mutations in the ClinGen dosage sensitivity gene list and a hand-curated gene set of Mendelian disorders37. Using this model, Lek et al.7 first estimated the proportion of human genes in each of the three categories and then, for any given gene, obtained the posterior probability that it belongs to each of the categories. Genes with a high probability (pLI > 0.9) of belonging to the category parameterized by λHI were classified as extremely loss-of-function intolerant7.
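One way to read the model just described is as a three-component Poisson mixture. The sketch below assumes that reading. It computes the posterior probability of each class for a single gene, given mixture weights, and estimates the weights across genes with an expectation-maximization (EM) loop. The λ values are those quoted above. The function names, the EM details and the example counts are hypothetical, and the sketch is not Lek et al.'s implementation. In particular, M is simply supplied here rather than derived from a mutation model.

```python
import math

# Rate multipliers quoted in the text: E[PTV count] = lambda * M for each class.
LAMBDAS = {"null": 1.0, "rec": 0.463, "hi": 0.089}


def poisson_log_pmf(k: int, mean: float) -> float:
    """log P(K = k) for K ~ Poisson(mean), with mean > 0."""
    return k * math.log(mean) - mean - math.lgamma(k + 1)


def class_posteriors(observed: int, expected: float, weights: dict) -> dict:
    """Posterior probability of each class for one gene, given mixture weights."""
    log_terms = {
        c: (math.log(weights[c]) if weights[c] > 0 else -math.inf)
        + poisson_log_pmf(observed, lam * expected)
        for c, lam in LAMBDAS.items()
    }
    top = max(log_terms.values())  # subtract the maximum to avoid underflow
    unnorm = {c: math.exp(v - top) for c, v in log_terms.items()}
    total = sum(unnorm.values())
    return {c: v / total for c, v in unnorm.items()}


def fit_weights(observed, expected, iterations: int = 200) -> dict:
    """EM estimate of the class proportions from per-gene (observed, M) pairs."""
    weights = {c: 1.0 / len(LAMBDAS) for c in LAMBDAS}
    for _ in range(iterations):
        posts = [class_posteriors(o, e, weights) for o, e in zip(observed, expected)]
        weights = {c: sum(p[c] for p in posts) / len(posts) for c in LAMBDAS}
    return weights


# Hypothetical genes: observed PTV counts and neutral expectations M
observed = [0, 12, 50]
expected = [50.0, 25.0, 50.0]
weights = fit_weights(observed, expected)
for o, m in zip(observed, expected):
    p_hi = class_posteriors(o, m, weights)["hi"]   # plays the role of pLI
    print(f"observed={o}, M={m}: posterior(HI) = {p_hi:.3f}")
```

With realistic data, the weights would be fit over all genes at once. The "hi" posterior is the quantity reported as pLI.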
The pLI measure has been broadly applied in human genetics to help identify genes in which a single disrupting mutation is likely of clinical significance4,38–45. pLI is also increasingly used in clinical annotation and in databases of mouse models as indicative of haploinsufficiency and dosage sensitivity46–50. In fact, however, pLI and related measures are not directly informative about dominance effects on fitness, let alone about the degree of haploinsufficiency with respect to a phenotype, and instead reflect only the strength of selection acting in heterozygotes.
The reason is that unless h is vanishingly small (or long-term inbreeding levels are very high), a reduction in the frequency of PTVs, and hence of PTV counts, is indicative of the strength of selection acting on heterozygotes, hs, and not of the two parameters h and s separately. This result derives from mutation-selection-drift balance theory developed by Haldane28,29, Wright30, and others51 (see Box 1). Intuitively, when heterozygotes suffer fitness effects, even subtle ones, deleterious alleles are kept at low frequency in the population. Homozygotes for the deleterious allele are then extremely rare, so the efficiency with which the allele is purged depends almost entirely on its effects in heterozygotes. Thus, the frequencies of PTVs, and therefore pLI and related measures, depend on the strength of selection acting on heterozygotes.
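The intuition can be quantified with a short calculation. Under Hardy–Weinberg proportions, D alleles carried by heterozygotes make up a fraction pq of all gene copies and are lost at rate hs. Those carried by homozygotes make up q² and are lost at rate s. The share of purging due to homozygotes is therefore q²s/(pq·hs + q²s) = q/(ph + q). The sketch below evaluates this share at the approximate equilibrium q ≈ u/(hs), using hypothetical values u = 10−6 and h = s = 0.1.

```python
def homozygote_share_of_purging(q: float, h: float) -> float:
    """Share of selectively removed D alleles that are lost via DD homozygotes.

    Assumes Hardy-Weinberg proportions and 0 < q < 1. D alleles in AD carriers
    (a fraction pq of gene copies) are lost at rate hs; D alleles in DD carriers
    (a fraction q^2) are lost at rate s. The ratio q^2 s / (pq hs + q^2 s)
    simplifies to q / (p h + q), so s cancels.
    """
    p = 1.0 - q
    return q / (p * h + q)


u, h, s = 1e-6, 0.1, 0.1         # hypothetical values
q = u / (h * s)                  # Haldane approximation for h > 0
print(f"q = {q:.1e}; share of purging via homozygotes = "
      f"{homozygote_share_of_purging(q, h):.2%}")
```

At these values the share is on the order of 0.1%, so heterozygotes account for essentially all of the purging. For h = 0 the function returns 1, since only homozygotes are then selected against.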
To illustrate this point, we modelled how the observed count of PTVs in a gene of typical length (and hence pLI) depends on h and s. We did so under a model with constant population size (Fig 1A) as well as under a more realistic model of human demographic history52 (Fig 1B). As can be seen, markedly different combinations of h and s lead to indistinguishable distributions of PTV counts (and hence of pLI values), so long as hs is the same (Fig 1A, B). More generally, the probability of observing a specific PTV count is maximized along a ridge corresponding to combinations of h and s that result in a given hs value (Fig 1C). One implication is that pLI can be near 1 when the dominance coefficient h is small, provided s is sufficiently large. More generally, pLI is not indicative of dominance or haploinsufficiency per se.
Although these considerations make clear that pLI should be thought of as reflecting hs, it was not designed to be an estimator of this parameter, and it has several problematic features as such. First, for a given value of hs, the expected value of pLI depends on gene length (Fig 1D). Second, for a typical gene length and a wide range of realistic values of hs, the distribution of pLI is highly variable and bimodal, covering most of the range from 0 to 1 (Fig 1E). Consequently, two genes with the same hs can be assigned radically different pLI values and, conversely, the same pLI value can reflect markedly different hs values (Fig 1E). Outside this range of hs values, pLI is almost uninformative about the underlying parameter. Below the range, pLI is ~0 for any value of hs; above it (when hs is large, approximately > 10%), pLI is always ~1. This tendency of pLI to take values of either 0 or 1 becomes more pronounced with increasing gene length (Fig 1D). Thus, if the goal is to learn about selective effects in heterozygotes, a direct estimate of hs under a plausible demographic model, together with a measure of uncertainty, is preferable (e.g.,9).
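The dependence on gene length can be illustrated without any population genetic simulation. The sketch below holds the ratio of observed to expected PTV counts fixed and scales M, which grows with gene length. It then computes the posterior probability of the haploinsufficient class under the three-class Poisson model with the λ values quoted earlier. The equal mixture weights are a hypothetical choice, and fixing the observed count at its expectation ignores Poisson sampling noise, so this is only a caricature of Fig 1D. For large M, the posterior is governed by whether the ratio falls below or above the value at which the recessive and haploinsufficient likelihoods coincide, (λRec − λHI)/ln(λRec/λHI) ≈ 0.23.

```python
import math

LAMBDAS = (1.0, 0.463, 0.089)    # null, recessive, haploinsufficient (from the text)
WEIGHTS = (1 / 3, 1 / 3, 1 / 3)  # hypothetical equal mixture weights


def pli(observed: int, expected: float) -> float:
    """Posterior probability of the haploinsufficient (last) class."""
    # The log(k!) term of the Poisson pmf is shared by all classes, so it is dropped.
    logs = [math.log(w) + observed * math.log(lam * expected) - lam * expected
            for w, lam in zip(WEIGHTS, LAMBDAS)]
    top = max(logs)
    terms = [math.exp(x - top) for x in logs]
    return terms[-1] / sum(terms)


for ratio in (0.2, 0.3):                  # observed / expected PTV count, held fixed
    values = [pli(round(ratio * m), m) for m in (5, 20, 80, 320)]
    print(f"ratio {ratio}: " + ", ".join(f"{v:.3f}" for v in values))
```

With a ratio of 0.2, the values should climb toward 1 as M increases. With 0.3 they should fall toward 0, even though the two depletions differ only modestly.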
Recasting pLI in a population genetic framework further helps to explain why the recessive assignments are less reliable7. Lek et al.7 aim to divide genes into three categories, two of which correspond to hs > 0 (pLI) and hs = 0, s = 0 (pNULL). Logically, the remaining category, pREC, should include the cases where hs = 0 but s > 0, i.e., complete recessivity, in which selection acts exclusively against homozygotes (Box 1). Regardless of the method used, however, it can be infeasible to distinguish this category from the hs > 0 case. The same expected allele frequency (and hence PTV count) can arise when h = 0 and when hs > 0 but small (see Box 1 and Fig 1F). As one example, ignoring genetic drift, for a typical mutation rate to disease alleles per gene of u = 10−6, the frequency of disease alleles would be 1% whether h = 0 and s = 10−2 or h = 1 and s = 10−4 (Box 1). In other words, strongly deleterious, completely recessive PTVs are hard to distinguish from those that are weakly selected and at least partially dominant.
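The arithmetic in this example follows directly from the two infinite-population approximations of Box 1: q ≈ u/(hs) for h > 0 and q ≈ √(u/s) for h = 0. A minimal check in Python, with function names of our own choosing:

```python
import math


def q_partially_dominant(u: float, h: float, s: float) -> float:
    """Infinite-population approximation for h > 0 (and hs >> u): q ~ u / (hs)."""
    return u / (h * s)


def q_recessive(u: float, s: float) -> float:
    """Infinite-population approximation for complete recessivity (h = 0): q ~ sqrt(u / s)."""
    return math.sqrt(u / s)


u = 1e-6  # mutation rate to disease alleles per gene, as in the example
print(f"h = 0, s = 1e-2: q = {q_recessive(u, s=1e-2):.4f}")
print(f"h = 1, s = 1e-4: q = {q_partially_dominant(u, h=1.0, s=1e-4):.4f}")
```

Both lines print q = 0.0100, i.e., a disease allele frequency of 1%.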
Why then, in practice, do pLI and related measures appear to successfully distinguish genes classified by clinicians as recessive vs dominant based on Mendelian disease phenotypes4,7,40? Mendelian disease genes consist mostly of cases in which mutations are known to cause a highly deleterious outcome, i.e., for which there is prior knowledge that s is likely to be large (even close to 1). When s is that large, a gene will be classified by pLI as haploinsufficient so long as h is not tiny, i.e., so long as fitness effects in heterozygotes are not small. For most genes, however, there is no prior knowledge about s. In that case, pLI, or any measure based on the frequency of PTVs, cannot reliably distinguish recessivity from dominance, let alone identify haploinsufficiency.
In summary, measures such as pLI and approaches based on related data summaries3,4,6,9,34,56,57 hold great promise for prioritizing genes in which mutations are likely to be harmful5 and learning about the fitness effects of mutations in heterozygotes9. Recasting these measures in terms of underlying population genetic parameters provides a natural framework for their interpretation and for the development of more reliable inferences.
Acknowledgments
We thank G. Coop, M.B. Eisen, M. Hurles, J.K. Pritchard, and Y. Shen for helpful discussions. This work was supported by GM128318 to ZF, GM126787 to JB, GM121372 to MP and GM115889 to GS. We acknowledge computing resources from Columbia University’s Shared Research Computing Facility project, which is supported by NIH Research Facility Improvement Grant 1G20RR030893-01, and associated funds from the New York State Empire State Development, Division of Science, Technology and Innovation (NYSTAR) Contract C090171.
References
- [1].
- [2].
- [3].
- [4].
- [5].
- [6].
- [7].
- [8].
- [9].
- [10].
- [11].
- [12].
- [13].
- [14].
- [15].
- [16].
- [17].
- [18].
- [19].
- [20].
- [21].
- [22].
- [23].
- [24].
- [25].
- [26].
- [27].
- [28].
- [29].
- [30].
- [31].
- [32].
- [33].
- [34].
- [35].
- [36].
- [37].
- [38].
- [39].
- [40].
- [41].
- [42].
- [43].
- [44].
- [45].
- [46].
- [47].
- [48].
- [49].
- [50].
- [51].
- [52].
- [53].
- [54].
- [55].
- [56].
- [57].