ABSTRACT
In small populations, genetic linkage between a polymorphic neutral locus and loci subject to selection, either against partially recessive mutations or in favor of heterozygotes, may result in an apparent selective advantage to heterozygotes at the neutral locus (associative overdominance), and a retardation of the rate of loss of variability by genetic drift at this locus. In large populations, selection against deleterious mutations has previously been shown to reduce variability at linked neutral loci (background selection). We describe analytical, numerical and simulation studies that shed light on the conditions under which retardation versus acceleration of loss of variability occurs at a neutral locus linked to a locus under selection. We consider a finite, randomly mating population initiated from an infinite population in equilibrium at a locus under selection, with no linkage disequilibrium. With mutation and selection, retardation only occurs when S, the product of twice the effective population size and the selection coefficient, is of order one. With S ≫ 1, background selection always causes an acceleration of loss of variability. Apparent heterozygote advantage at the neutral locus is, however, always observed when mutations are partially recessive, even if there is an accelerated rate of loss of variability. With heterozygote advantage at the selected locus, there is nearly always a retardation of loss of variability. The results shed light on experiments on the loss of variability at marker loci in laboratory populations, and on the results of computer simulations of the effects of multiple selected loci on neutral variability.
There has recently been much interest in the effects of selection at one locus on patterns of evolution and variation at linked neutral or nearly-neutral loci, with mounting evidence for such effects from surveys of genome-wide patterns of molecular variability and evolution (Cutter and Payseur 2013; Neher 2013; Charlesworth and Campos 2014). Attention has been especially directed at the possibility of enhanced neutral variability at nucleotide sites that are closely linked to sites under long-term balancing selection (Charlesworth 2006; Gao et al. 2015), and at the reduction in variability caused by the hitchhiking effects of directional selection, involving either positive selection (selective sweeps) (Maynard Smith and Haigh 1974) or negative selection (background selection) (Charlesworth et al. 1993). There seems little doubt that linkage effects play an important role in shaping patterns of variability across the genome (Cutter and Payseur 2013; Charlesworth and Campos 2014).
Recent studies have, however, made little or no reference to the classical work on associative overdominance (AOD), which dates back to over 40 years ago Following a proposal by Frydenberg (1963), who coined the term, it was shown by Sved (1968, 1971, 1972) and by Ohta and Kimura (Ohta and Kimura 1970; Ohta 1971, 1973) that linkage disequilibrium (LD) between a polymorphic neutral locus and a locus subject either to selection in favour of heterozygotes, or to selection against recessive or partially recessive mutant deleterious alleles, could result in apparent heterozygote advantage at the neutral locus. This is because such LD, generated by genetic drift in a randomly mating finite population, leads to an association between homozygosity at the two loci (Haldane 1949). If homozygosity at the selected locus results in reduced fitness, homozygotes at the neutral locus will also suffer reduced fitness. A heuristic treatment of this effect is given by Charlesworth and Charlesworth (2010, pp. 396-7, 403-4).
This effect of LD in a randomly mating population should be distinguished from the effect of identity disequilibrium (ID) in a population with a mixture of random mating and matings between close relatives (Haldane 1949; Cockerham and Weir 1968), as exemplified by species that reproduce by a mixture of outcrossing and self-fertilization. Here, AOD can arise because of variation among individuals in their inbreeding coefficients, which causes correlations among loci in their levels of homozygosity even if they are unlinked (Haldane 1949; Cockerham and Weir 1968). Theoretical models show that this type of AOD does not require either finite population size or LD among loci (Ohta and Cockerham 1974; Charlesworth 1991). In a finite randomly mating population, however, LD and ID are formally equivalent (Weir et al. 1980; Bierne et al. 2000).
Associations between heterozygosity at putatively neutral molecular marker loci and higher values of fitness components have frequently been found in natural populations of many different types of organism, leading to a debate as to whether AOD due to LD with random mating or to ID caused by variation in inbreeding levels is the main cause of these associations (David 1998; Hansson and Westerberg 2002). Current evidence appears to favor the latter model (Szulkin et al. 2010; Hoffman 2014), especially because the LD model seems to require a relatively small effective population size to produce substantial effects, unless linkage is very tight (Ohta 1971, 1973).
It has also been reported that the level of variability in both quantitative traits and marker loci in laboratory populations maintained with known effective population sizes can sometimes decline less rapidly over time than is predicted by the standard neutral model (e.g., Rumball et al. 1994; Gilligan et al. 2005; Latter 1998). In addition, laboratory populations and populations of domesticated animals and plants often have levels of quantitative trait variability that are surprisingly high, given their low effective population sizes (Johnson and Barton 2005; Hill 2010). Sved and Ohta proposed that AOD caused by randomly generated LD leads to a retardation in the loss of variability at neutral loci linked to loci under selection; Ohta suggested that this effect could be substantial when the effects of partially recessive mutations distributed over a whole chromosome are considered, provided that the population size is sufficiently small (Ohta 1971, 1973). Computer simulations of multi-locus systems have shown that a retardation of the rate of loss of neutral variability can indeed occur in small populations (Latter 1998; Pamilo and Palsson 1998, 1999; Wang and Hill 1999; Wang et al. 1999). However, an accelerated rate of loss was observed when the level of dominance of deleterious mutations is sufficiently high and/or selection is sufficiently strong, due to the background selection (BGS) effect of deleterious mutations (Charlesworth et al. 1993), which causes a reduction in the effective population size experienced by linked neutral variants
AOD due to LD is an attractive potential explanation for the maintenance of unexpectedly high levels of variability in small populations. However, we lack a quantitative theory concerning the conditions under which it produces noticeable effects on levels of variability, other than for the cases of selfing and sib-mating lines studied by Wang and Hill (1999). It is also unclear when retardation versus acceleration of the rate of loss of variability is likely to prevail in randomly mating populations, especially with multiple loci subject to mutation and selection. As a first step toward such a theory, we present some analytical, numerical and simulation results on the simplest possible model: a neutral locus linked to another locus subject to selection. We also present simulation results for a neutral locus surrounded by a small number of selected loci. Selection can either involve partially recessive deleterious mutations or heterozygote advantage.
Our main focus is on a small, randomly mating population, founded from a large initial equilibrium population, mimicking experiments on the rates of loss of neutral variability in laboratory populations. We derive approximate expressions for the apparent selection coefficients against homozygotes at the neutral locus, as well for the rate of loss of neutral variability. We show that, contrary to what seems to have been widely assumed, these apparent selection coefficients have nothing to do with the rate of loss of variability. In particular, with selection against deleterious mutations there is a wide range of parameter space in which BGS accelerates the loss of variability, but an apparent selective advantage to heterozygotes at the neutral locus is always observed. When selection is very weak, AOD always causes a retardation of the loss of neutral variability, unless mutations are close to semidominance in their fitness effects.
Theoretical Models and Methods
Apparent selection coefficients against homozygotes at a neutral locus caused by a linked locus subject to mutation to deleterious alleles
We assume that the neutral locus is segregating for two alleles, A1 and A2, with frequencies x and y = 1 - x, respectively, in a given generation. The selected locus has wild-type and deleterious mutant alleles, B1 and B2, respectively, with frequencies p and q = 1 - p. The selection and dominance coefficients at this locus are h and s, such that the relative fitnesses of B1B1, B1B2, and B2B2 are 1, 1 - hs and 1 - s. The mutation rates for B1 to B2 and B2to B1 are u and v, respectively. Let the haplotype frequencies of A1B1, A1B2, A2B1 and A2B2 be y1, y2, y3, and y4. We have x = y1 + y2, q = y2 + y4. The frequency of recombination between the two loci is c.
With random mating, the apparent fitnesses (denoted by tildes) of the three genotypes A1A1, A1A2 and A2A2 at the neutral locus, conditional on these haplotype frequencies, are as follows (Sved 1968; Ohta 1971):
The extent of apparent selection at a neutral locus, induced by LD in a randomly mating finite population, can be assessed by determining the expectations of these apparent fitnesses over the distribution of haplotype frequencies induced by drift, conditioning on segregation at the A locus (Ohta and Kimura 1970; Ohta 1971). In the present case, the starting point is assumed to be a population of infinite size, at equilibrium under mutation and selection at the B locus and with no LD between the two loci. It is thereafter maintained at a population size of N breeding individuals each generation. For simplicity, a Wright-Fisher population model is assumed, so that N is also the effective population size.
The expectations for a given generation t after the foundation of the population are most simply found by expressing the haplotype frequencies in terms of the products of the relevant allele frequencies and the coefficient of linkage disequilibrium, D = y1 y4 - y2 y3. The expectation of D, E{D}, remains at zero, since selection, mutation and drift do not affect the direction of association between the two loci; this applies generally to E{Dqi}, where i is an arbitrary non-negative integer (a proof is given in section S4 of the Supplementary Information). This does not, however, imply that quantities involving the expectation of products of D and other functions of the allele frequencies at the two loci can be ignored, which complicates the analyses.
Simple algebra (Ohta 1971) shows that the expected apparent selection coefficients against A1A1 and A2A2 homozygotes (neglecting second-order terms in s) over a set of replicate populations that are segregating at the A locus are given by:
For simplicity, from now on we will simply refer to these as the apparent selection coefficients.
Approximations for the apparent selection coefficients
It has previously been assumed that the terms in D in these expressions can be neglected, so that the extent of apparent overdominance at the neutral locus is given by the terms in D2 alone. This allows approximations for the apparent selection coefficients to be derived; two different approaches have been used, as described by Sved (1968), Ohta and Kimura (1970) and Ohta (1971), and by Bierne et al. (2000), respectively. We will present an alternative approach here, which takes into account the initial allele frequency at the neutral locus in the founding generation, as well as the subsequent effects of drift in creating a probability distribution around this frequency.
The measures of apparent overdominance in Equations 2 can be conveniently be written as: where r2 = D2/[x(1-x)pq] is the squared correlation coefficient in allelic state between the two loci (Hill and Robertson 1968). The difference between Equations 2 and 3 in the notation for the apparent selection coefficients is intended to emphasize the use of terms in D alone in Equations 3.
In order to obtain useful approximate expressions for these apparent selection coefficients at an arbitrary time t, we assume that the probability distributions of x, q and r are independent of each other, which is of course not exact. We can then write:
The quantity s(1 - 2h)E{pq} is equal to the expectation of the inbred load, B, caused by the deleterious mutations, i.e., the expected difference in fitness between the logarithms of the mean fitness of a randomly mating population and of a completely homozygous population with the same allele frequencies (Greenberg and Crow 1960). Because of the loss of variability due to drift, E{B} will in general be smaller than the inbred load for an infinite population, B* = p*q*s (1 - 2h), where q* is the equilibrium frequency of B2 under mutation and selection in an infinite population (Glémin et al. 2003). If v ≪ u, as assumed here, q* is given by the following expression (Crow and Kimura 1970, p.260):
If hs is ≫ u and h > 0, this reduces to the familiar result of Haldane (1927):
This yields the widely used formula for the inbred load in an infinite population:
Weakly selected loci may contribute substantially to the inbred load and associative overdominance in the initial population, so that q* could be substantially greater than zero. The more general expressions for q* and B* will, therefore, be used here.
An exact analytic treatment of how E{B} changes over time in a finite population is, unfortunately, very difficult. Since the expected change in allele frequency due to selection is a third degree function of q when h ≠ 0.5, the third and higher moments of q enter into the expected change per generation in B under drift, mutation and selection, so that closed expressions cannot be obtained without using a full solution to the diffusion equation under selection with arbitrary h, (e.g., Balick et al. 2015). The simplest way to obtain tractable analytical results is to make the assumption that departures caused by drift from q* are sufficiently small that the change in allele frequency due to selection per generation, (∆sq), can be linearized around q*, and hence equated to (∆sq)q* + (q - q*)(d∆sq/dq)q* (e.g., Charlesworth and Charlesworth 2010, p.355). If the mutational contributions to the change in allele frequency are included, the net change in q is equal to (q - q*) (d∆sq/dq)q* - (u + v). The expected change in q in a finite population is then zero, and a recursion equation for the variance of q, Vq, can easily be obtained, yielding a simple expression for the approximate value of E{pq} in a given generation (see Appendix).
An approximate linear recursion equation for the expectation of r2 can be obtained in a similar way, following the approach of Sved (1972) (see Appendix). The quantity σ2d = E{D2}/E{xypq} is also frequently used as a descriptor of LD in finite populations, since it can be calculated by using diffusion equations (Ohta 1971; Ohta and Kimura 1971), as discussed below. While a simple recursion for σ2d does not exist, a heuristic approach is to assume that it has a similar form to that for r2, replacing the equilibrium value of r2 given by Sved’s approach by the (smaller) equilibrium value of σ2d (see Appendix). We can then substitute the expected value of σ2d for the corresponding expectation of r2 in Equations 3. Both of these approaches ignore any effects of selection on LD.
We also need to obtain the expectations of x−1 and (1 - x)−1 for a given generation, conditioned on segregation at the neutral locus. This can be done using the diffusion equation solution for the probability distribution at a biallelic locus under pure drift (Kimura 1955), using the terms involving the first few eigenfunctions of the power series representation of the probability distribution, conditioned on an initial allele frequency x0 (see Appendix). The approximate expectations of x−1 and (1 - x)−1 can then easily be determined (see Appendix). As shown by Fisher (1930), this distribution is asymptotically close to a uniform distribution, with some slight deviations in the terminal classes, and with a mean allele frequency of 0.5 regardless of the value of x0. The asymptotic values of x−1 and (1- x)−1 can then be calculated, which both approach 5.18 asymptotically for the population size of 50 used in the numerical examples below (Equation A11); this implies that the apparent selection coefficients against each homozygote should approach the same asymptotic value, provided that any effects of mutation at the neutral locus can be ignored over the timescale under consideration.
Apparent selection coefficients induced by linkage to a locus with heterozygote advantage
The same machinery can be used when the selected locus is segregating for a pair of alleles where the B1B1 and B2B2 homozygotes have fitnesses 1 - s and 1 - t relative to a fitness of 1 for B1B2: (Sved 1968; Ohta and Kimura 1970).
The approximate apparent selection coefficients are then given by:
Using the same argument that led to Equations 4, these apparent selection coefficients can be approximated by:
The expectation of the inbred load B in a finite population is now equal to (s + t) E{pq}, so that Equations 4 and 9 both involve the same function of E{B}. An approximation for E{pq} can be found by linearizing the recursion relation for q around the deterministic equilibrium, q* = s/(s + t), similarly to the procedure for the mutation-selection balance case. The coefficient κ that measures the rate of approach to equilibrium is now equal to -st/(s + t) (Charlesworth and Charlesworth 2010, p.355). In this case, no mutation is allowed at the locus under selection; this is because we are considering a single nucleotide site instead of a whole gene sequence, and mutations to new variants are unlikely to occur over the timescale with which we are concerned.
The effect of selection at a linked locus on the amount of neutral variability
Lack of relevance of the apparent selection coefficients to the rate of loss of variability
It was assumed in previous theoretical studies that the apparent selection coefficients obtained by ignoring terms in D in Equations 2 and 8 provide a measure of the extent to which the loss of variability at a neutral locus, caused by drift, is retarded by linkage to a selected locus (Sved 1968; Ohta and Kimura 1970; Ohta 1971). But this assumption is mistaken, as can be seen as follows. Indeed, the apparent selection coefficients obtained from these equations may provide approximations for the apparent reductions in the fitnesses of the homozygotes at the neutral locus, but have no relation to the rate of loss of variability.
This can be seen as follows for the case of deleterious mutations at the B locus; a similar argument holds for the case of heterozygote advantage. The expected change in x given by Equations 2 (neglecting second-order terms in s) is:
In other words, the terms in D contribute nothing to the change in the frequency of allele B1 at the neutral locus, which is determined entirely by the expected product of D and the additive effect of B1 on fitness, a(q) = s[h + q(1 - 2h)]. A result equivalent to this is used in models of selective sweeps on neutral allele frequencies at linked sites (Charlesworth and Charlesworth 2010, p.410). It is an example of the Price equation (Price 1970), since Da is the covariance between fitness and the allelic state with respect to A1 at the neutral locus (Santiago and Caballero 1995).
Since the change in the expectation of the heterozygosity, 2x(1 - x), at the neutral locus caused by the change in allele frequency due to selection at the B locus is approximated by the expectation of 2(1 - 2x)∆x, the accompanying change in the expected heterozygosity, H = 2E{xy}, is given by:
Similar results apply to the case of heterozygote advantage. Here, a(q) = qt - ps = q(s+t) - s, so that: and
We cannot, therefore, use the apparent selection coefficients to assess the extent to which loss of variability at the neutral locus is affected by selection at a linked locus. Previous results using these selection coefficients (Ohta and Kimura 1970; Ohta 1971, 1973; Bierne et al. 2000) thus do not provide reliable measures of the extent to which AOD retards the loss of neutral variability.
How to calculate the rate of loss of variability
In order to solve this problem, we have extended the linear diffusion operator approach (Ohta and Kimura 1971; Ohta 1971) to include the effects of drift on allele frequencies at the selected locus, ignoring third-order and higher-order moments of q around q*. This requires the use of a 9-dimensional vector Y of functions of allele frequencies and D, with a corresponding 9 x 9 recursion matrix, R (see section 1 of the Supplementary Information for details). The elements of Y are as follows: Y1 = E{xy}, Y2 = E{xyq}, Y3 = E{xyq2}, Y4 = E{D(x - y)}, Y5 =E{D(x - y)q}, Y6 = E{D(x - y)q2}, Y7 = E{D2}, Y8 = E{D2q}, and Y9 = E{D2q2}.
Iteration of the R matrix provides an approximation for σ2d in an arbitrary generation (which can used instead of E{r2} in Equations 3) as well as for H, since σ2d = Y7/(Y2 - Y3) and H = 2Y1.
Computer simulations
One neutral and one selected locus
The theoretical predictions described above were compared with Monte Carlo simulation results for a population started in mutation-selection balance at the selected locus and in linkage equilibrium with a neutral locus, and then run for a chosen number of generations at a fixed population size. In each generation, the new haplotype frequencies were calculated using the standard deterministic equations, and 2N uniform random numbers were generated to sample each new haplotype from the cumulative distribution of haplotype frequencies.
Since the effects of a single selected locus on a linked neutral locus are very small, it was necessary to run a large number of replicate simulations (usually 107) to obtain tight confidence intervals on the mean heterozygosity at the neutral locus
Multiple loci with selection and mutation
To simulate multiple loci subject to mutation and selection, we assumed additive fitness interactions across the relevant loci. For weak selection, this should yield very similar results to a multiplicative fitness model. The state of the population in a given generation is represented by the haplotype frequencies with respect to n loci subject to mutation and selection, following the same rules as in the single locus case, with a neutral locus in the centre of the chromosome. The recombination frequency between the loci under selection is c; the frequency of recombination between the neutral locus and each of the adjacent selected loci is c/2. Complete interference is assumed, so that only single crossovers are allowed, generating a frequency of recombination between the two terminal loci of (n - 1)c.
The frequencies of diploid genotypes at the start of a given generation are assumed to be given by random combinations the frequencies of the haplotypes determined in the previous generation, thus tacitly assuming an infinite number of new zygotes. The frequencies of haplotypes after recombination were calculated deterministically, using an extension of the three-locus algorithm described by Crow and Kimura (1970, p.51). The haplotype frequencies after mutation and selection were then calculated deterministically. (With weak effects of selection, mutation and recombination on genotype frequencies, the order in which these processes take place has a negligible effect on the final outcome.) The haplotype frequencies after drift were obtained by choosing 2N haplotypes from a pool with the new haplotype frequencies, in the same way as above.
FORTRAN programs for all the cases mentioned here are available on request to BC.
Results
A single selected locus with mutation and selection
In order to generalize the results described below as far as possible, we have used the principle from diffusion equation theory that the outcome of the evolutionary process is determined by the products of the effective population size, Ne, and the deterministic parameters, measuring time in units of 2Ne generations (Ewens 2004, p. 157). In the present case, where Ne = N, the simulation results for a given N and a set of mutation, selection and recombination parameters can be applied to a system with a population size of KN and deterministic parameters that are a factor of K−1 times those used here. For this reason, the selection and mutation parameters in most of the material described below are scaled by a factor of 2N, thereby avoiding reference to a specific population size. We use S = 2Ns, C = 2Nc, U= 2Nu and V = 2Nv to denote these scaled parameters.
An additional point to note is that the assumption that the initial state of the population has D = 0 implies that only the first three elements of the initial Y vector are non-zero. Without loss of generality, the elements of Y can be divided by the product of the initial frequencies of A1 and A2, x0y0, and the resulting vector, Y’, can then be iterated using the R matrix. The ratio of the first element of Y’ in generation t (1 - 1/2N)t measures the ratio of the mean heterozygosity at the neutral locus relative to its value in the absence of selection at the B locus, which we denote by Hrel. This procedure means that the matrix predictions for Hrel and σ2d are independent of the initial allele frequencies at the neutral locus, since the initial value of Y’ is independent of these frequencies. If, however, D were non-zero in the initial generation, only the asymptotic behaviour of the system would be independent of the initial allele frequencies.
Accuracy of the matrix approximation
We first consider the accuracy of the 9 x 9 matrix approximation. Figure 1 shows the values of Hrel and 2∆E{xy} = ∆Hsel from Equation 11 over 2N generations, for two different dominance coefficients and with C = 0.1. We are assuming that selection and mutation at the B locus relates to deleterious mutations affecting the gene as a whole rather than individual nucleotide sites, whereas the alleles at the A locus represent a pair of variants chosen by an experimenter as neutral markers, and are not affected by mutation over the time course of the experiment. The selection and mutation parameters were chosen such that the equilibrium frequency of the deleterious mutation at the B locus was approximately 0.3. This may seem very high, but is consistent with the typical frequency per gene of putatively deleterious nonsynonymous mutations in Drosophila populations (Haddrill et al. 2010), although the selection coefficient and mutation rates used in the figure are probably much higher than is realistic.
Agreement between the simulation and matrix results is remarkably good for this parameter set. A value of h = 0.1 is associated with an elevation of Hrel over one in later generations (retardation of loss of neutral variability), whereas h = 0.45 is associated with Hrel less than one (acceleration of loss of neutral variability). This illustrates the conflict between AOD and BGS, noted previously in simulation results (Pamilo and Palsson 1999; Wang and Hill 1999); for sufficiently low h, AOD is the dominant force as far as the level of neutral variability is concerned, whereas BGS dominates when h is sufficiently close to ½. We provide some insights into the reason for this behavior in the next but one section.
This conflict implies the existence of a critical value of h, hc, where the two effects exactly cancel each other; the properties of hc are examined in more detail below. Unfortunately, there is no exact fixed value of hc, since the extent to which Hrel is greater or less than one depends on the generation in question, as shown in Figure 2, which uses the matrix results. As h increases from 0 to 0.5, Hrel at both generation N and 2N declines. The insert shows hc is larger for the earlier generations. However, for the selection and mutation parameters used in Figure 2, hc is close to 0.37 for all generations.
When q* is small and S is of order 1 or less, agreement with the simulations is good only in earlier generations of the process, reflecting effects of such high a level of dispersion of allele frequencies at the B locus around q* that our neglect of higher-order terms is inaccurate. In general, however, the matrix gives reasonably good approximations up to 2N generations, although the extent of retardation of the rate of loss of variability tends to be underestimated (or the rate of acceleration overestimated) by the matrix approximation with small q* (See Figure S2 in Supplementary Information, Section 9). We will often use results for this generation as a standard, since it is of similar order to the duration of the Drosophila experiments on AOD described in the Discussion.
Apparent selection coefficients and their relation to the rate of loss of variability
These conclusions are confirmed by the results displayed in Table 1, which also shows the simulation results for an additional measure of variability, the proportion of segregating neutral loci, Ps, at 0.5N and 2N generations, for a range of recombination rates. In addition, the exact and approximate apparent selection coefficients are displayed, together with the matrix and simulation values for Hrel. It can be seen that the approximate apparent selection coefficients are similar in magnitude to the exact ones, but tend to exceed them, especially with very close linkage and for the neutral allele with the lower initial frequency. This is mainly because of the inaccuracy of the assumption of independence between the probability distributions of allele frequencies at the neutral and selected loci. Variability at the two loci is positively correlated, so that the expectation of pqx−1 in Equation 3a will tend to be smaller than E{pq}E{x−1} in Equation 4a.
With an initial frequency of A1 of 0.5, the apparent selection coefficients for A1A1 and A2A2 are always equal, as expected from Equations 3 and 4. In contrast, for an initial frequency of 0.1 there is initially a stronger apparent selection coefficient against A1A1, again as expected, but the selection coefficients converge on equality as the mean neutral allele frequency at segregating sites approaches 0.5; these are also similar to the selection coefficients for the case of an initial frequency of 0.5. The apparent selection coefficients at generation 2N are quite close to the asymptotic values given by Equations 4 when the B locus has reached mutation-selection-drift equilibrium, if E{pq} is calculated using the linear approximation described above. For example, with h = 0.1, C = 0.1 and the selection and mutation parameters in Table 1, the asymptotic apparent selection coefficient for populations segregating at the A locus is 0.0014 when the equilibrium neutral value of σd2 is used as the estimate of E{r2} in Equations 4, whereas the observed value at 2N generations for the case with x0= 0.5 is 0.0013. In contrast, the asymptotic value predicted by Equation 7 of Ohta (1971) is 0.0024, a somewhat worse fit. This probably reflects the fact that Ohta assumed a fixed allele frequency at the neutral locus, as well as q* ≪ 1.
As expected, the magnitudes of the apparent selection coefficients decline sharply as the recombination rate increases, but always indicate heterozygote advantage, even with C = 10 and h = 0.45. But with h = 0.45, there is an acceleration of the loss of variability at the neutral locus when linkage is tight, despite the apparent selective advantage to heterozygotes. This confirms the above conclusion that the apparent selection coefficients have no relation to the extent to which mutation and selection at one locus affect the rate of loss of variability at a linked neutral locus, and show heterozygote advantage even when BGS is causing an accelerated loss of variability. Furthermore, if C is ≫ 1, little effect of selection on the rate of loss of variability can be detected (Table 1).
Conflict between AOD and BGS with respect to the rate of loss of variability
The existence of a conflict between AOD and BGS, and its evident dependence on the dominance coefficient, raises the question of how to make generalizations about the regions of parameter space in which one or other force is dominant. One way of approaching this problem is to study the properties of the leading eigenvalue of the R matrix, λ0, as a function of the relevant variables. If we equate λ0 to 1 - 1/(2Ne), where Ne is the effective population size in the presence of selection at the B locus, 1/(1 - λ0) provides a measure of 2Ne. Consider first the limiting case when S = 2Ns ≫ 1 and u ≪ hs, so that the expected frequency of A2 is close to u/(hs) (Equation 5b). The approximation for λ0 - 1 for this case (Equations S17 and S18 of section 2 of the Supplementary Information) shows that Ne is given by the standard equation for BGS at a single locus (Charlesworth and Charlesworth 2010, p.402), which relies on these conditions.
As argued by Whitlock and Barton (1997), the value of 2Ne obtained from 1/(λ0 - 1) gives the asymptotic value of the mean coalescent time, T2, for a pair of alleles at the neutral locus. Under the infinite sites model, the nucleotide diversity at mutation-drift equilibrium is equal to the product of the neutral mutation rate and 2Ne (Charlesworth and Charlesworth 2010, p.211), so that we can use these results to predict the effect of selection at the B locus on the equilibrium level of variability at the neutral locus. (See section S2 of the Supplementary Information, Equation S16, for a more rigorous derivation of this result in the present context.)
However, since the matrix approximation is inaccurate for later generations with S < 1 and small q*, this use of the leading eigenvalue to estimate Ne is limited in scope, and is not necessarily accurate for general q* and small S. The approximate approach described in the next section deals with the case when S is of order 1 or less.
Approximation for the rate of loss of variability with weak selection
The starting point for this analysis is Equation 11, which shows that the change in expected heterozygosity due to selection, ∆Hsel, is the sum of two terms, ∆1 = −2shY4 and ∆2 = −2s(1 - 2h)Y5, where Y4 = E{D(x - y)} and Y5 = E{D(x - y)q}. We know from previous work on BGS that an acceleration of loss of variability always occurs in haploid models with deleterious mutations and drift; these are equivalent to diploid models with h = 0.5, in which case ∆2 is necessarily zero, and the process is entirely driven by ∆1; a necessary and sufficient condition for acceleration in this case is thus ∆1 < 0. This suggests that with h < 0.5 there may be a conflict between the two terms, whenever ∆1 < 0 and ∆2 > 0. Larger h gives more weight to ∆1 relative to ∆2, so that selection is more likely to reduce variability when h is large, in agreement with the numerical results just described. It is, however, unclear from this argument why small S should favour AOD over BGS.
The complexity of the recursion relations means that it is impossible to obtain simple, exact expressions for Y4 and Y5, but useful approximate results for the case of weak selection (S < 1) can be obtained as follows. Using expressions S4d and S4e in the Supplementary Information, the changes per generation in these moments in the absence of selection are:
In the absence of selection, Y4 thus tends to zero, and Y5 tends to - Y7/[N(C + U + v) + 5], which is < 0 and < Y7 in magnitude. This suggests that the magnitude of ∆1 will be much smaller than that of ∆2 when selection is weak. This is confirmed by the following argument. When there is selection, approximate recursions for Y4 and Y5 can be obtained using the matrix approach described in section 1 of the Supplementary Information (Equations S5). We make the further simplification of dropping terms involving u, v and products of s or D2 with q and q2, on the assumption that these are small relative to similar terms involving s or D2. These assumptions will be violated when selection is strong in relation to drift (S > 1) or when q* is ≫ 0. In addition, for the neutral case, it is possible to show that Y6 and Y5 approach equality and Y8 approaches Y7/2 if q* is set to 0.5 (we neglect Y9, since it is considerably smaller than Y8): see Equations S31 of the Supplementary Information, section 6. Our numerical results show that these relations are approximately true under much wider conditions, so that we can replace Y6 with Y5, and Y8 with Y7/2 in Equations S5, allowing us to obtain a coupled pair of difference equations for Y4 and Y5 for a given value of Y7.
It is also convenient to rescale all terms by multiplying by 2N, implying a new time-scale of 2N generations. We can then write: where
If the changes in Y4 etc. are sufficiently slow, we can set ∆Y4 and ∆Y5 to zero, and solve the resulting linear equations to obtain approximate expressions for Y4 and Y5 as functions of Y7:
This shows that Y4 is O(S), whereas Y5 is of order one, so that ∆Hsel is dominated by ∆1 when S is small, provided that h < ½. The determinant is always positive for S ≤ 60; this condition is not very restrictive given our assumption of relatively small S, and we will assume positivity from now on. Given that X7 > 0, it can easily be shown that Y4 > 0 when h < ½; S < 2 is sufficient for Y5 < 0 when h = 0.5. For small values of h these conditions are less stringent; with h = 0, Y4 > 0 independently of S, and Y5 < 0 when S < 6. Furthermore, neither Y4 nor Y5 depend on q*, except through their common factor of Y7. Our numerical results show that these conclusion are correct, provided that S is of order 1.
From Equation 11, the sign of the change due to selection in the heterozygosity at the neutral locus, ∆Hsel, is the same as the sign of -[h Y4 + (1 - 2h) Y5]. Variability will be increased by selection if ∆Hsel > 0 and reduced if ∆Hsel < 0, so that these results imply that the critical upper value of h, hc, required for an increase in neutral variability due to AOD, is also independent of q* under the conditions we have assumed. The value of hc can be found by setting ∆Hsel = 0. Using Equations 11 together with the numerators of Equations 17a and 17b, we find that hc is the value of h that satisfies the following cubic equation in h:
With S close to zero, Equation 18 reduces to 12h - 6 = 0, i.e. hc = ½, suggesting that variability will always be increased for realistic h values when selection is very weak. For other cases, some insights into the properties of hc can be obtained by assuming C = 0, which approximates cases with low recombination rates. With S = 1, the solution to the cubic is approximately hc = 0.36, slightly smaller than the value of 0.37 obtained from the numerical results. Figure 3 shows a comparison of the values obtained from Equation 18 using Matlab symbolic calculations with the values from the matrix approximation for generation 2N with C = 0.1 (very similar patterns are obtained for C values of up to 1). It will be seen that they agree quite well when S < 1, regardless of the value of q*, but Equation 18 mostly underestimates hc as S increases beyond 1, except for values of q* much smaller or much greater than 0.1. For S ≪ 1, dominance coefficients close to V2 are compatible with an enhanced level of neutral variability, but BGS always prevails for values of S ≫ 3, even when h is very close to zero.
The quantity 2N ∆Hsel/H is equivalent to ε = Ne/N - 1, where Ne is the effective population size for the generation in question. For small S, we can approximate ε by a Taylor’s series around S = 0. We have:
The first term in this equation is derived entirely from the contribution of Y5 to ∆Hsel, since the contribution from Y4 is O(S2). This implies that selection will cause increased variability if h is slightly less than ½ when higher order terms in S are negligible, consistent with the conclusions reached above. With very weak selection, BGS thus does not operate unless mutations are close to being semidominant. The second-order term is always < 0 and increases in magnitude as S increases; under the most favourable situation (h = 0), S < 4(5 + C)/10 is necessary for ε > 0, which is slightly more restrictive than is indicated by the results in Figure 3. Figure 4 shows some examples of the extent to which this approximation match the simulation results; in general, it tends to underestimate the extent to which loss of heterozygosity is retarded by selection when S.
These approximations also imply that the magnitude of ∆Hsel is proportional to Y7 = E{D2}. If we approximate E{D2} by E{xy}E{pq}σ2d = H E{pq} σ2d/2, as was done when obtaining the Equations 4 for the apparent selection coefficients, Equations 17 and 18 give:
Since σ2d E{pq}is ≤ 0.25, the quadratic dependence of the denominator on C, and the inverse relation between σ2d and C, implies that there will only be a very small effect of selection on neutral variability when C ≫ 1, unless S ≫ C. In addition, the size of the effect is strongly determined by S and by E{pq}, since σ2d takes a value between 0 and 1.
Overall, these results suggest that retardation of loss of neutral variability caused by a single selected locus is unlikely to be important except for S values of the order of 1. In such cases, if linkage is sufficiently tight, there can be a significant retardation of loss of neutral variability, and an enhancement of the equilibrium level of variability even when h approaches ½.
Two selected loci with mutation and selection
From simulation studies of the effect of multiple, linked selected loci on the behaviour of neutral loci located among them, we would expect to see larger values of both the apparent selection coefficients and the level of neutral heterozygosity than with a single selected locus (Ohta 1971; Latter 1998; Pamilo and Palsson 1998, 1999; Wang and Hill 1999; Bierne et al. 2000). However, it is unclear whether the effects of multiple selected loci are approximately additive, or whether there is some degree of synergism.
With small population size, very close linkage, strong selection and a sufficiently high level of recessivity of deleterious mutations, simulations have shown that ‘crystallization’ of the population into two predominant and complementary haplotypes (+ − + − … and − + − +…, where + denotes wild-type and − mutant) with respect to the selected loci can occur, such that its behavior is very similar to that of a single locus with strong heterozygote advantage (Charlesworth and Charlesworth 1997; Pamilo and Palsson 1999; Palsson 2001). This can result in long-term maintenance of variability at the selected loci. This is strongly suggestive of a synergistic effect of multiple selected loci.
No analytical treatment of more than one selected locus has been done, so we have investigated this problem using simulations of a small number of selected loci. The results of simulations with two selected loci surrounding a neutral locus are shown in Table 2, using the same set of recombination frequencies between the neutral and nearest selected locus as in Table 1. As far as the apparent selection coefficients are concerned, with h = 0.1 and the lower two recombination rates, there is clear evidence that the apparent selection coefficients in generation 2N (but not necessarily in generation 0.5N) are somewhat larger than twice the apparent selection coefficients in the single locus case, indicating a degree of synergism. A similar pattern is found with four selected loci (results not shown).
It is more difficult to quantify the effect on Hrel, since this is strongly dependent on the generation in question. With close linkage and h = 0.1, the values of Hrel - 1 in generation 2N in Table 2 are slightly larger than twice the corresponding values in Table 2. Conversely, with h = 0.45 and close linkage, for which Hrel < 1, the absolute values of Hrel - 1 in generation 2N are less than twice the corresponding values in Table 1, although the apparent selection coefficients are slightly larger than twice the values with a single selected locus. A more rigorous test for additivity is described in section S5 of the Supplementary Information; it suggests that the effects of two loci on Ne/N - 1, as estimated from ∆Hsel, are somewhat greater than additive when there is a retardation of loss of variability, and less than additive when there is an acceleration.
A single selected locus with heterozygote advantage
The methods used for the case of mutation and selection can be used to model the case of linkage to a single selected locus with heterozygote advantage, simply by making appropriate modifications to the terms in the R matrix, and to the expressions for the changes in haplotype frequencies in the computer simulations (see Equations 9 and section 3 of the Supplementary Information). Figure 5 and Table 3 compare some examples of matrix and simulation results; these again suggest that the matrix predictions are accurate for this parameter range. Once again, the ratio of recombination to the strength of selection has to be sufficiently small for a substantial effect.
An approximate treatment can be applied for the case when S = 2Ns and T = 2Nt are both < 1, on the same lines as for Equations 15-20 for the mutation-selection balance case. The coefficients for the approximate recursion relations for Y4 and Y5 are as follows:
The equivalents of Equations 17 and 18 are given in section 3 of the Supplementary Information, which can be used to calculate ∆Hsel in the same way as for the mutation-selection case. The first-order term in S and T in ∆Hsel gives:
The inbred load in this case is (s + t)E{pq} (see Equations 9), so that this is identical in form to the first-order approximation for the mutation-selection balance case (see Equation 19). For sufficiently weak selection, therefore, heterozygote advantage should always lead to an enhancement of neutral variability at a linked locus.
We note, however, that the asymptotic rate of loss of variability at a locus with heterozygote advantage has a complex dependence on the deterministic equilibrium allele frequency and the population size, with extreme allele frequencies (outside the range 0.2 to 0.8) leading to an acceleration of loss of variability when S + T is sufficiently large (Robertson 1962). This suggests that a similarly complex pattern should be found for the dependence of the measure of retardation/acceleration based on the leading eigenvalue of the recursion matrix. Figure S3 of the Supplementary Information (section 9) shows that this is indeed the case, although there is only a small region of parameter space where acceleration rather than retardation of loss of variability occurs.
Under most circumstances, some retardation of the loss of variability at a neutral locus is, therefore, likely to be observed as a result of linkage to a selected locus, although its magnitude is small unless linkage is tight and S + T ≫ 1. For C ≫ S + T it is unlikely that any experimentally detectable effects could be observed. The same applies to the apparent selection coefficients against homozygotes at the neutral locus.
Discussion
Some general considerations
Our results shed some new light on the old question of the existence and properties of associative overdominance (AOD) at a neutral locus in a randomly mating population of finite size, resulting from randomly generated linkage disequilibrium with respect to a locus subject to selection. The pioneering work of Sved (1968, 1971, 1972), Ohta and Kimura (1970) and Ohta (1971) gave expressions for the apparent selection coefficients against homozygotes at a neutral locus, using the equivalents of our Equations 3 and 8. Our approximate estimates of the apparent selection coefficients for the case of a finite population founded from an infinite population at equilibrium with no LD between the selected and neutral loci (Equations 4 and 9) followed their approach in using only the terms involving D2 in these equations, which are the dominant terms.
However, as was noted by Sved (1968, p.552) for the case of heterozygote advantage and by Latter (1998) for the case of selection against deleterious mutations, these selection coefficients do not result in any change in allele frequency at the neutral locus. As mentioned above (Equation 10), this is a completely general conclusion: application of the Price equation (Price 1970) shows that the change Ax per generation in the frequency of allele A1 at the neutral locus is equal to aD, where a is the average effect of A1 on fitness (see also Santiago and Caballero 1995, p.1016), to the order of the approximations used here (neglect of second-order terms in s and 1/N). Neither the change in allele frequency nor the change in heterozygosity at the neutral locus are influenced by terms in D2.
There is thus no connection between apparent overdominance caused by linkage disequilibrium and any retardation of loss of variability by drift at a neutral locus. Such a connection seems to have been widely assumed to be the basis for a retardation of loss of variability (e.g. Latter 1998), although it had been pointed out by Charlesworth (1991) that AOD caused by identity disequilibrium has no effect on allele frequencies at the neutral loci concerned. As was noted by Bierne et al. (2000), for this case it is necessary to treat apparent overdominance at the neutral locus as a phenomenon that is entirely distinct from any retardation of loss of variability. This point is reinforced by the results in Tables 1 and 2 for the case of selection against recurrent deleterious mutations, which show that apparent overdominance can accompany an acceleration of loss of neutral variability when the dominance coefficient h is sufficiently large (see the entries with h = 0.45). Of course, h < ½ is a necessary condition for both retardation of loss of variability and for apparent overdominance. This is not restrictive, given the evidence that the dominance coefficients for slightly deleterious mutations are generally < ½ (Crow 1993; Manna et al. 2011).
Selection against deleterious mutations
Computer simulations of systems with many loci subject to mutation to deleterious alleles have shown that a noticeable retardation of loss of neutral variation at linked loci can occur (Latter 1998; Pamilo and Palsson 1998, 1999; Wang and Hill 1999; Wang et al. 1999), although only for certain ranges of values of selection and dominance coefficients. As pointed out by Pamilo and Palsson (1998, 1999), Wang and Hill (1999) and Wang et al. (1999), there is a conflict between the effect of AOD on neutral variability and the effect of background selection (BGS); the latter involves a reduction in the effective population size experienced by a neutral locus caused by its linkage to deleterious mutations (Charlesworth et al. 1993). Their multi-locus simulations, as well as the matrix-based investigation of a model of a single selected locus linked to a neutral locus sib-mating and selfing lines by Wang and Hill (1999), showed that retardation versus acceleration of the rate of loss of neutral variability is favoured by relatively weak selection (2Ns values of order four or less) and low dominance coefficients; see, for example, Figure 1 of Wang and Hill (1999).
Our analytical and numerical methods, employing a 9 x 9 matrix of ‘moments’ of functions of the allele frequencies at the two loci and D, as well as the weak selection approximations for the change in heterozygosity, have allowed us to investigate in detail the regimes in which retardation versus acceleration of loss of neutral variability occurs.
The main conclusions are that:
(1) Retardation rather than acceleration of loss of variability is favored by sufficiently low values of the dominance coefficient, h, when S = 2Ns is in the range 0.5 to 4.5 (Figure 3). The commonly used estimate of h = 0.25 for slightly deleterious mutations (Manna et al. 2011; Charlesworth 2015) is consistent with retardation when S ≤ 2.5, except when q* is close to 0.5.
(2) S < 0.5 allows retardation of loss even for h values approaching ½; this domain of S values can be thought of as the AOD limit.
(3) In contrast, there is always an acceleration of loss of variability when S > 4, even with very low h values; this constitutes the BGS limit. Despite this acceleration, apparent heterozygote advantage at the neutral locus is always observed, provided that h < ½.
(4) For q* ≪ 1and for S ≫ 1, the asymptotic behaviour of the expected heterozygosity at the neutral locus is well predicted by the Ne value given by the standard equation for BGS.
(4) The magnitude of the extent of retardation or acceleration of loss of variability is an increasing function of C/S, for a given h value. Equation (19) also shows that, when S ≪ 1, the effect is proportional to the inbred load scaled by 2N, for a given value of C.
Population genomic analyses of levels of nonsynonymous site variability in a variety of organisms have consistently shown that there is a wide distribution of selection coefficients against new deleterious mutations, and suggest a mean S for natural populations that is ≫ 1 (reviewed by Charlesworth 2015). The large effective population sizes of most populations that have been studied in this way imply, however, that the mean selection coefficient against a new deleterious mutation is likely to be very small. For example, the current estimate of the mean hs for new nonsynonymous mutations in D. melanogaster is approximately 0.001; with h = 0.25, the corresponding mean s is 0.004 (Charlesworth 2015). With N = 50, as used in the examples below, the mean S is 0.4. If there were no variation in s for new mutations, this would suggest that they fall well within the range where Equations 18 and 19 predict retardation of loss of variability. However, as discussed below we also have to take into account the fact that there is a wide distribution of s. A coefficient of variation for s for new mutations of approximately 2 is suggested by the Drosophila polymorphism data (Kousathanas and Keightley 2013).
There is also evidence for deleterious mutations with larger fitness effects than typical nonsynonymous mutations, such as insertions and deletions; these appear to make up only a small contribution to the spectrum of new mutations (Charlesworth 2015), and will be neglected in what follows, since they must be sparsely distributed across the genome. Their net effect is probably to tilt the balance slightly more in favor of BGS than if they are ignored (but see the section below on relevant data).
Effects of multiple loci subject to mutation and selection
We now consider how the effects of multiple loci subject to deleterious mutation affect a linked neutral locus; this question can asked about both the apparent selection coefficients and the extent of retardation or acceleration of loss of variability.
Apparent selection coefficients
With respect to the apparent selection coefficients, previous workers (Ohta 1971, 1973; Ohta and Cockerham 1974; Bierne et al. 2000) assumed that multiple loci combine approximately additively. Table 2, which describes results for a pair of selected loci with, suggests that this assumption may be somewhat conservative; in most of the examples shown there, the selection coefficients are somewhat larger than twice the corresponding single locus values, especially with a low initial frequency of the A1 allele.
The procedure of summing contributions over all sites when there are many selected loci, as used by Ohta (1971, 1973), may thus underestimate the apparent selection coefficients, although the departure from additivity is likely to be small for the very weak selection that is probably most common. An alternative to Equation 13 of Ohta (1973), based on Equations 3, is derived in section 6 of the Supplementary Information (Equations S31). We assume that the population is being studied at a sufficiently long time after establishment that its LD is close to equilibrium. We also assume a single chromosome with a uniform rate of recombination along the chromosome, as well as a linear genetic mapping function (this somewhat overestimates the amount of recombination, compared with the true situation with partial crossover interference). For a randomly placed marker, we have the asymptotic result: where Btot is the inbred load associated with homozygosity for the whole schromosome, and C* = 2NM, where M is the effective map length of the chromosome (taking into account any sex differences in recombination rates).
It is useful to compare this formula with the multi-locus simulation results of Latter (1998), who modeled a D. melanogaster autosome evolving in a population maintained at a size of 50 for 200 generations after its foundation from an equilibrium population. The chromosome was 100cM in length; the absence of crossing over in males means that the effective value of M is 0.5 Morgans, so that C* = 50 with N = 50. For a randomly placed marker, Equation 23 gives an approximate apparent selection coefficient of 0.312Btot.
Btot can be estimated as follows. The initial value of the net fitness (relative to a balancer chromosome) for D. melanogaster second chromosomes that were purged of major effect mutations by extracting them from a population maintained for many generations at an approximate size of 50 was 0.4 (Latter et al. 1995; Latter 1998). This is equivalent to the fitness of a homozygous chromosome relative to that of an outbred genotype in the initial population; the value of Btot for the initial population is then - ln(4) = 0.90, assuming multiplicative fitness effects. This assumption is consistent with the log-linear decline in fitness with inbreeding coefficient F in the small laboratory populations of D. melanogaster described by Latter et al. (1995).
The simulations of Latter (1998) adjusted the number of mutable loci to produce an initial Btot of 0.90. However, we need to take into account the reduction in Btot as variability is lost at the selected loci. A minimum estimate of Btot at generation t is provided by assuming that the ratio of E{pq} to its initial value follows the standard neutral result, 1 - Ft = exp(-t/2N). For N = 50 and t =200, this procedure yields Btot = 0.90 x (1 - F200) = 0.122, and a value of 0.038 for the apparent selection coefficient on a random marker, somewhat lower than the value of 0.05 given in Figure 6 of Latter (1998). The discrepancy probably reflects an overestimate of the loss of variability by use of the standard neutral value of Ft. Latter’s Figure 4 shows F200 = 0.7 for the selected loci in his simulations, giving Btot = 0.270 and an apparent selection coefficient of 0.084. The single locus approximations for the apparent selection coefficients often overestimate the true values for later generations (see Table 1), so that it is not surprising that Equation 23 overestimates the apparent selection coefficient.
Since the numerator of Equation (23) is only a logarithmic function of C*, the apparent selection coefficients are roughly inversely proportional to population size. Species with larger numbers of chromosomes, and with crossing over in both sexes, will have much smaller apparent selection coefficients for the same value of N. Detectable apparent selection coefficients arising from AOD in randomly mating populations are thus only likely to be found in very small populations, especially in organisms with large numbers of chromosomes such as most vertebrates. This suggests that AOD arising from identity disequilibrium (Szulkin et al. 2010) is a more likely explanation of heterozygosity/fitness relations in natural populations than LD due to drift in a randomly mating population (Hansson and Westerberg 2002), unless the population has been reduced to a very small number of breeding individuals.
Rate of loss of variability
This approach can also be applied to the rate of loss of neutral variability, using the weak selection approximation of Equation 19, which includes only the first and second-order terms in S in the expression for ε = Ne /N - 1 = 2N ∆Hsel, again assuming that LD is close to neutral equilibrium. The derivation of the approximate expected value of ε for a neutral marker placed randomly on a chromosome, using the same assumptions as before, is given in section S7 of the Supplementary Information (Equations S32-S35). The final result is: where CVs is the coefficient of variation of the distribution of s for new mutations. With an exponential distribution, CVs = 1.
We can compare the predictions of this equation with the simulation results in Latter’s Table 6 for a mean S of 0.5, which is similar to the value suggested by the population genomic analyses for a population size of 50 and meets the weak selection requirement. Latter used a measure (∆F) derived from net change in F for neutral loci over t generations, relative to the corresponding value without selection; this is approximately equivalent to the ratio of N to the harmonic mean of Ne, i.e., to 1/(1 + ε). For the case of an exponential distribution of s, he estimated ∆F as approximately 0.6 for t = 200, for three different dominance coefficients, 0, 0.1 and 0.2. With h = 0.2, the estimate of ε for a random marker with a population size of 50 is 1.35Btot; with the above estimate of Btot = 0.270 for generation 200, this gives ε = 0.365 and ∆F = 0.73, which is somewhat higher than Latter’s value. But the comparisons of the single locus simulation results with Equation 19 suggested that the approximation underestimates ε by about 30% (Figure 4); increasing ε by this amount to 0.474 give ∆F = 0.68. If the effects of different loci on Ne combine multiplicatively rather than additively, as is known to be the case for BGS (Charlesworth and Charlesworth 2010, p.402), ε should be replaced by exp(ε) - 1 = 0.606, and ∆F = 0.62, which is very close to the simulation value. The value of h has only a small effect on the results; for example, with h = 0, the uncorrected ε = 0.454 and ∆F = 0.69.
We can also use Equation 24 to determine the expectations with the population genomics estimates of mean S of 0.004 and CVs = 2, with h = 0.25. Substituting these values into Equation 24 with the same N and recombination parameters as before, Equation 24 gives a negative values of ε = −0.0663, and ∆F = 1.07. This implies that BGS would cause a reduction in Ne, to about 93% of the neutral value. This conclusion should be treated with caution, since the very wide distribution of selection coefficients in this case means that there will be a substantial range of values that violate the assumption of S < 1 required for the validity of Equation 19.
Relation to the data on loss of variability in small populations
Most of the data that are relevant to the question of whether AOD generated by linkage to deleterious mutations can retard the loss of neutral variability in small randomly mating populations comes from a few studies of the behavior of putatively neutral markers and quantitative traits in small laboratory populations of D. melanogaster. In order to avoid the complications of interpretation associated with high levels of inbreeding, we will not discuss results from studies of sib or first-cousin mating, although these suggest a significant retardation of loss of variability (Rumball et al. 1994), which is consistent with the conclusion that retardation is favored when S is small. Latter (1998) analyzed the results of Latter et al. (1995) for two allozyme markers, and found a ∆F value of 0.52 over 200 generations in populations maintained at a size of approximately 50 (see his Table 6), consistent with his multi-locus simulation results based on an exponential distribution of s.
Experiments using seven allozyme markers and two bristle traits in pedigreed populations ranging in size from sib-mating to 500, and maintained for 50 generations, showed that the regression coefficient of heterozygosity and genetic variance on pedigree inbreeding coefficient were significantly lower than the value of 1 expected with neutrality, suggesting a retardation of loss of variability (Gilligan et al. 2005). The facts that the results from populations with different sizes were pooled, and that there is non-linear relation between heterozygosity and inbreeding coefficient when AOD is acting, mean that a quantitative analysis of these results in terms of our model is impossible. In addition, a later analysis of 8 microsatellite loci over 48 generations for the same set of populations showed an acceleration of loss of variability by this criterion (Montgomery et al. 2010), which the authors interpreted as evidence for selective sweeps related to adaptation to the laboratory environment.
Different experiments have, therefore, given very different patterns, and it is certainly possible that any retardation of loss of variability due to AOD may be obscured by selective sweeps even if N is of the size that would otherwise lead to a retardation of loss of variability. However, a basic problem with experiments of this type is that the heterozygosity for a single neutral locus has a very high stochastic variance, even if allele frequencies are estimated with complete accuracy (Avery and Hill 1977); see Equation S39 of the Supplementary Information, section 8. This yields the asymptotic expression for large t:
This equation implies that the relative error increases as mean heterozygosity decreases. For example, at generation 100, if the initial heterozygosity was at the maximal value of 0.5 for a biallelic locus, the mean neutral H for N = 50 is 0.184, yielding an expected coefficient of variation for a single locus in a single population of 2.17. To obtain a coefficient of variation of 0.1, (2.17/0.1)2 = 471 independent heterozygosities would need to be measured, either from independent loci or populations, or from a combination of the two. Given that Figure 6 of Latter (1998) shows a difference of approximately 0.2 between the expected neutral F value and the value from simulations with AOD due to deleterious mutations, which corresponds to the difference in scaled H values, it is clear that many more replications than this would be necessary to obtain statistically significant results, especially as the distribution of H is far from normal. The differences between the different experiments could thus be purely stochastic, especially as the autocorrelations between H values in different generations means that the regression tests used by Gilligan et al. (2005) and Montgomery et al. (2010) are problematic.
It is possible that tight linkage of neutral markers to sets of very strongly and highly recessive deleterious variants that are in repulsion LD with each other could maintain variation as a result of the pseudo-overdominance that can arise in this circumstance: see the section on two selected loci above, and Charlesworth and Charlesworth (1997), Pamilo and Palsson (1999) and Palsson (2001). This is most likely to occur in very small populations; major effect mutations are relatively sparsely distributed, so that their effects on neutral variability are likely to be restricted to specific genomic regions, rather than evenly spread across the genome. Recent genomic investigations of the well-known Chillingham population of cattle, which have been maintained at very small population size for 350 years, are consistent with this interpretation. Genome-wide diversity is very low, and residual variability is localized to a number of specific genomic locations (Willams et al. 2016).
A full assessment of the possibility that a realistic distribution of mutational effects on fitness (DFE), which are currently becoming available from genome-wide polymorphism data (Charlesworth 2015), can cause significant retardation of the loss of neutral variability will require simulations that use realistic DFEs, coupled with experiments on the effects of reduced population size that exploit the high levels of replication that can be achieved using modern genomic technology for generating large numbers of markers. It would also be desirable to use populations that have been maintained for a long period in the laboratory at a large size as the initial population, to avoid possible confounding effects of selective sweeps. Current theory and data cannot convincingly answer the question of whether AOD due to deleterious mutations is a credible explanation for the presence of more than expected levels of variability in small populations.
Selection in favor of heterozygotes
Genome scans for signatures of balancing selection suggest that this is a relatively rare phenomenon compared with the number of genes in the genome (Charlesworth 2006; Gao et al. 2015), so that it is unlikely that a given neutral site will be closely linked to a locus maintained by heterozygote advantage. The exception is inversion polymorphisms in organisms such as Drosophila; these can cover relatively large proportions of the genome, and also suppress crossing over to a considerable extent outside their breakpoints (Krimbas and Powell 1992). They could thus have a substantial impact on the behavior of neutral variants in small populations. Thus, our conclusions concerning heterozygote advantage are relevant to experimental results on small populations where inversions are segregating, as is often the case with D. melanogaster.
Our main conclusion is that retardation of loss of variability is nearly always observed when there is heterozygote advantage, except when the allele frequency at the selected locus is close to the boundaries 0 or 1, and there is relatively strong selection, when it is known that heterozygote advantage tends to accelerate the loss of variability (Robertson 1962). As shown in Figure 4 and Table 3, a substantial degree of retardation can occur when c ≪ 10(s + t), especially with intermediate equilibrium allele frequencies. As with mutation and selection, with S + T < 1 and a fixed value of C the magnitude of retardation is proportional to the inbred load scaled by 2N (see Equation 22).
Acknowledgments
We thank Nick Barton and Bill Hill for useful discussions. Lei Zhao was supported by a China Scholarship Council (CSC) scholarship (#201506100068).
Appendix
Approximations for the expected frequency of B2 and expected heterozygosity at the selected locus
With weak selection, the deterministic change in the frequency q of allele B2 under selection and mutation (neglecting terms of order s2) is given by:
Because ∆q is third degree in q, it is impossible to obtain exact, closed equations for the changes in the expectations of q and q2, E{q}, and E{q2} under drift, selection and mutation. However, if it is assumed that second-order terms in the deviation of q from its deterministic equilibrium value q*, given by Equation 5a, can be ignored, approximate recursion relations can be obtained as follows. (The matrix calculations described in section 1 of the Supplementary Information, which neglect only third-order terms in (q - q*), yield better approximations.)
Taking the first derivative of Equation A1 with respect to q yields a linear deterministic recursion relation for the departure of q in generation t from the equilibrium frequency q* under mutation and selection, which is given by Equations 5: where
Since this is linear in q, the mean of q under drift, selection and mutation remains at q*.
The equilibrium expected heterozygosity at the selected locus is given by H2 = 2q*(1 - q*)A/(1+A), where A = −4Neκ (e.g., Charlesworth and Charlesworth 2010, p.355). The change in the expected heterozygosity at the selected locus at time t, H2t, is approximately equal to the expectation of 2(1 - 2qt)∆q(qt); combining this with Equations A2 yields the following recursion relation:
Because H2 is twice the expectation of pq, Equation A3 was used in Equations 4 for determining the approximate apparent selection coefficients.
Approximate neutral recursion relations for the expectation of r2
For the purely neutral case, Sved (1972) proposed the following approximation for the recursion relation for the expected correlation coefficient in generation t, E{rt2}: where c is the frequency of recombination between the two loci, assumed to be ≪ 0.5. Since mutation at the B locus reduces the coefficient of linkage disequilibrium D with the A locus by u + v each generation, this equation should be modified as follows:
This yields the equilibrium solution: where C’ = 2N(c + u + v).
If the initial population is in linkage equilibrium, Equations A4 yield the following homogeneous recursion relation:
This suggests the following expression for σd2 = E{D2}/E{xypq}: where the equilibrium value of σd2 is given by the following expression (Ohta and Kimura 1971):
The simulation results show that Equation A6a tends to overestimate E{r2} up to t = 0.5N; it is quite accurate for later generations up to t = N, and tends to underestimate E{r2} thereafter. Equation A5 gives a better approximation for the earlier and later generations, but a worse approximation for the intermediate generations.
Expectations of the reciprocals of the allele frequencies at the neutral locus
The expectations of x−1 and y−1 = (1 - x)−1, conditioned on segregation at locus A, for use in Equations 3 and 9, can be derived using the diffusion equation solution for the case of pure drift (Kimura 1955), under the assumption that departures from strict neutrality can be neglected as a first-order approximation. For this purpose, it is convenient to rescale time in units of 2N generations, such that T = t/(2N). Using the first three terms in the series expansion for the unconditional probability density of the frequency x of A1at time T, given an initial frequency of x0, we have:
The expected frequency of heterozygotes at locus A at time T is
To determine the conditional probability of x for a segregating population, we need the probability that the population is still segregating for A1 and A2 at time T, given by Equation 8.4.9 of Crow and Kimura (1970). Using the first two terms in the series, we have
The conditional probability density of x at time T is then given by φ/Ps, which can be used to determine the mean of x−1 at time T for segregating populations:
An equivalent expression for the expectation of y−1 can be obtained by interchanging x and y and x0 and y0 in these equations.
For sufficiently large T, Equation A7 implies that the conditional probability distribution becomes uniform, regardless of the initial frequency, so that its mean should be well approximated by where γ is Euler’s constant (approximately 0.5772) for the difference between the sum and integral of 1/x.
As was pointed out by Fisher (1930), there is a slight departure of the asymptotic distribution from a uniform distribution, due to the inaccuracy of the diffusion approximation near the frequencies 0 and 1, with fewer frequencies in the subterminal classes close to either boundary, although the distribution is still symmetrical about 0.5. Using the table of exact values computed by Fisher for the first 10 classes from 1/(2N) upwards, and from 1 - 1/(2N) downwards, the asymptotic conditional means are slightly greater than the above value, by 2.299 - [2.6556 x 2N/(2N - 1)]. Addition of this term to Equation A11 provides a slightly more accurate approximation for the long-term expectation of the reciprocals of the allele frequencies, which was used to calculate the asymptotic expected selection coefficients.