Abstract
Background selection is a process whereby recurrent deleterious mutations cause a decrease in the effective population size and genetic diversity at linked loci. Several authors have suggested that variation in the intensity of background selection could cause variation in FST across the genome, which could confound signals of local adaptation in genome scans. We performed realistic simulations of DNA sequences, using parameter estimates from humans and sticklebacks, to investigate how variation in the intensity of background selection affects different statistics of population differentiation. We show that, in populations connected by gene flow, Weir & Cockerham’s (1984) estimator of FST is largely insensitive to locus-to-locus variation in the intensity of background selection. Unlike FST, however, dXY is negatively correlated with background selection. We also show that background selection does not greatly affect the false positive rate in FST outlier studies. Overall, our study indicates that background selection will not greatly interfere with finding the variants responsible for local adaptation.
Introduction
Natural selection affects patterns of genetic diversity throughout the genome. How selection affects genetic diversity on a single isolated locus is relatively easy to model; however, when a large number of linked loci are considered, interactions between evolutionary pressures at different sites render the task of modelling much more difficult.
Maynard Smith & Haigh (1974) recognized the influence of selection on linked neutral sites, proposing that strong positive selection could reduce genetic diversity at nearby sites. This process is now referred to as a ‘selective sweep’. Much later, Charlesworth et al. (1993) proposed that deleterious mutations could also affect genetic diversity at nearby sites, because some haplotypes would be removed from the population as selection acts against linked deleterious alleles. They named this process background selection (BGS). Both selective sweeps and background selection affect genetic diversity; they both reduce the effective population size of linked loci. Empirical evidence of a positive correlation between genetic diversity and recombination rate has been reported in several species (Cutter and Payseur, 2013), including Drosophila melanogaster (Begun & Aquadro, 1992; Elyashiv et al., 2016), humans (Spencer et al., 2006), collared flycatchers, hooded crows and Darwin’s finches (Dutoit et al., 2017; see also Vijay et al., 2017).
BGS is also expected to affect FST (Charlesworth et al., 1997; Cutter & Payseur, 2013; Cruickshank & Hahn, 2014; Hoban et al., 2016). At low effective population size, different populations may randomly fix different alleles, increasing FST, while at high population size, allele frequency changes less through time and different populations are more likely to have comparable allele frequencies, keeping FST low. This negative relationship between effective population size Ne and FST is captured in Wright’s classical infinite island result; (Wright, 1943). One might therefore expect that loci under stronger BGS would show higher FST.
Many authors have also argued that, because BGS reduces the within-population diversity, it should lead to high FST (Cutter & Payseur, 2013; Cruickshank & Hahn, 2014; Hoban et al., 2016). Expressed in terms of heterozygosities, , where HT is the expected heterozygosity in the entire population and HS is the average expected heterozygosity within subpopulations (HS and HT are also sometimes called πS and πT; e.g. Charlesworth, 1998). All else being equal, a decrease of HS would indeed lead to an increase of FST. However, all else is not equal; HT is also affected by BGS (Charlesworth et al., 1997). Therefore in order to understand the effects of BGS on FST, we must understand the relative impact of BGS on both HS and HT.
Performing numerical simulations, Charlesworth et al. (1997) report that BGS reduces the within population heterozygosity HS slightly more than it reduces the total heterozygosity HT, causing a net increase in FST. The effect on FST reported is quite substantial, but, importantly, their simulations were not meant to be realistic. The authors highlighted their goal in the methods:
“The simulations were intended to show the qualitative effects of the various forces studied […], so we did not choose biologically plausible values […]. Rather, we used values that would produce clear-cut effects”.
For example, talking about their choice for the deleterious mutation rate of 8 × 10−4 per site:
“This unrealistically high value was used in order for background selection to produce large effects […]”
Much of the literature on the effect of BGS on FST is based on the results in Charlesworth et al. (1997), even though they only intended to show proof of concept (see also Zeng & Charlesworth, 2011 and Zeng & Corcoran, 2015). They did not attempt to estimate how strong of an effect BGS has on FST in real genomes.
It is important to distinguish two separate questions when discussing the effect of BGS on FST; 1) How does BGS affect the average genome-wide FST? and 2) How does locus-to-locus variation in the intensity of BGS affect locus-to-locus variation in FST? The second question is of particular interest to those trying to identify loci under positive selection (local selection or selective sweep). Locus-to-locus variation in FST potentially could be confounded with the FST peaks created by positive selection. In this paper, we focus on this second question.
The identification of loci involved in local adaptation is often performed via FST outlier tests (Lotterhos & Whitlock, 2014; Hoban et al., 2016). Other tests exist to identify highly divergent loci such as cross-population extended haplotype homozygosity (XP-EHH; Sabeti et al., 2007), comparative haplotype identity (Lange & Pool, 2016), cross-population composite likelihood ratio (XP-CLR; Chen et al., 2010). FST outlier tests, such as FDist2 (Beaumont & Nichols, 1996), BayeScan (Foll & Gaggiotti, 2008) or FLK (Bonhomme et al., 2010), look for genomic regions showing particularly high FST values to find candidates for local adaptation. If BGS can affect FST unevenly across the genome, then regions with a high intensity of BGS could potentially have high FST values that could be confounded with the pattern caused by local selection (Charlesworth et al., 1997; Cruickshank & Hahn, 2014). BGS could therefore inflate the false positive rate when trying to detect loci under local selection.
The potential confounding effect of BGS on signals of local adaptation has led to an intense effort trying to find solutions to this problem (Bank et al., 2014; Huber et al., 2016). Many authors have understood from Cruickshank and Hahn (2014) that dXY should be used instead FST in outlier tests (e.g. McGee et al., 2015; Yeaman, 2015; Whitlock & Lotterhos, 2015; Brousseau et al., 2016; Picq et al., 2016; Payseur & Rieseberg, 2016; Hoban et al., 2016; Vijay et al., 2017; see also Nachman & Payseur, 2012). FST is a measure of population divergence relative to the total genetic diversity, while dXY is an absolute measure of population divergence defined as the probability of non-identity by descent of two alleles drawn in the two different populations averaged over all loci (Nei, 1987; Nei, 1987 originally called it DXY but, here, we follow Cruickshank and Hahn’s, 2014 terminology by calling it dXY). The argument is that because FST is a measure of divergence relative to the genetic diversity and dXY an absolute measure of divergence and because BGS reduces genetic diversity, then BGS must affect FST but not dXY, a claim that we will investigate in this paper.
Whether BGS can affect genome-wide FST under some conditions is not in doubt (Charlesworth et al., 1997), but whether locus-to-locus variation in the intensity of BGS present in natural populations substantially affects variation in FST throughout the genome is very much unknown. Empirically speaking, it has been very difficult to measure how much of the genome-wide variation in genetic diversity is caused by BGS, as opposed to selective sweeps or variation in mutation rates (Cutter & Payseur, 2013; see also attempts in humans by Cai et al., 2009 and McVicker et al. 2009). We are therefore in need of realistic simulations that can give us more insight into how BGS affects genetic diversity among populations and how it affects the statistics of population divergence.
In this article, we investigate the effect of BGS in structured populations with realistic numerical simulations using parameter estimates from humans and stickleback. Our two main goals are 1) to quantify the impact of locus-to-locus variation in the intensity of BGS on dXY (Nei, 1987) and FST (Weir & Cockerham, 1984) and 2) to determine whether BGS inflates the false positive rate of FST outlier tests.
Methods
Our goal is to perform biologically plausible simulations of the local genomic effects of background selection. BGS is expected to vary with gene density, mutation rate and recombination rate across the genome. We used data from real genomes to simulate realistic covariation in recombination rates and gene densities. We chose to base our simulated genomes on two eukaryote genomes, sticklebacks and humans, because these two species have attracted a lot attention in studies of local adaptation and because sticklebacks have a variance in recombination rate which is almost 15 times higher than humans (data not shown), allowing us to test vastly different types of eukaryotic genomes. The recombination rate variation in humans is extremely fine scale, but it presents the potential issue that it is estimated from linkage disequilibrium data. As selection causes linkage disequilibrium to increase, estimates of recombination rate at regions under strong selection may be under-estimated, hence artificially increasing the simulated variance in the intensity of BGS. Although the recombination map for stickleback is much less fine scaled, the estimates are less likely to be biased as they are computed from pedigrees.
Our simulations are forward in time and were performed using the simulation platform SimBit version 3.69. The code and user manual are available at https://github.com/RemiMattheyDoret/SimBit. To double check our results, we also ran some simulations with SFS_code (Hernandez, 2008), confirming that we get consistent distributions of genetic diversity and of FST among simulations (results not shown). Generations are non-overlapping, individuals are hermaphrodites, mating is random within patches and selection occurs before dispersal.
Genetics
For each simulation, we randomly sampled a sequence of about 10 cM coming either from the stickleback (Gasterosteus aculeatus) genome or from the human genome (see treatments below) and used this genomic location to determine the recombination map and exon locations for a simulation replicate. For the stickleback genome, we used the gene map and recombination map from Roesti et al. (2013). Ensembl-retrieved gene annotations were obtained from Marius Roesti. For the human genome, we used the recombination map from The International HapMap Consortium (2007) and the gene positions from NCBI and positions of regulatory sequences on Ensembl (Zerbino et al., 2017). We excluded sex chromosomes to avoid complications with haploid parts of the genomes. As estimates of mutation rate variation throughout the genome are very limited, we assumed that the haploid mutation rate varies from site to site following an exponential distribution with mean of 2.5×10−8 per generation (Nachman & Crowell, 2000).
More specifically, we first randomly sampled a sequence of 105 nucleotides, which we will refer to as the focal region. All of the statistics (defined under the section Statistics below) are calculated only on the focal region of each simulation. Nucleotides that occur in locations determined to be exons in the sampled genomic map are subject to selection (see Selection), while all other nucleotides are assumed to be neutral. The focal region itself contained on average ∼0.44 genes for the human genome and ∼3.15 genes for the stickleback genome.
We simulated a 5 cM region on each side of the focal region (resulting in a window of 10 cM plus the recombination rate present in the specific focal region of 105 sites) in order to capture the local effects of background selection. In these 10 cM flanking regions, we only tracked exons. In the nearest 1 cM on each side of the focal region, as well within the focal region, we individually simulated each nucleotide with a biallelic locus. On the remaining outer 4 cM, to improve the speed and RAM usage of the process, we tracked the number of mutations in blocks of up to 100 nucleotides. For these blocks, we tracked only the number of mutations but not their location within the block. Ignoring recombination within a block likely had little effect on the results because the average recombination distance between the first and last site of a block is of the order of 10−6 cM. The expected number of segregating sites within a block is , which for a mutation rate per block of 10−6 and a population size of N = 10,000 is ∼0.42. The probabilities of having more than one mutation and more than two mutations (based on a Poisson approximation) are therefore only approximately 6.7% and 0.9%, respectively. Overall, the level of approximation used is very reasonable.
Selection
As we are interested in the effect of BGS, we modelled the effects of purifying selection against novel deleterious mutations. Each nucleotide in the exons (and regulatory sequences for the human genome) is subject to purifying selection with a selection coefficient against mutant alleles determined by a gamma distribution described below. For focal regions that include exons, statistics are computed over a sequence that is at least partially under direct purifying selection.
To create variance in selection pressures throughout the genome, each exon (and regulatory sequence for the human genome) has its own gamma distribution of heterozygous selection coefficients s. The mean and variance of these gamma distributions are drawn from a bivariate uniform distribution with correlation coefficient of 0.5 (so that when the mean is high, so is the variance) bounded between 10−8 and 0.2 for both the mean and the variance. These bounds were inspired by the methodology used in Gilbert et al. (2017). The gamma distributions are bounded to one. Figure S1 shows the overall distribution of selection coefficient s, with 2% of mutations being lethal and an average deleterious selection coefficient for the non-lethal mutations of 0.07. To improve the performance of our simulations, we used multiplicative dominance, where the fitness of heterozygotes is at locus i is 1-si and the fitness of the double mutant is (1-si) 2.
As a consequence of our parameter choices, our genome-wide deleterious mutation rate was about 1.6 in sticklebacks and about 3 in humans. 9.4% of the stickleback genome and 2.6% of the human genome was under purifying selection. For comparison, the genome-wide deleterious mutation rate is estimated at 2.2 in humans (Keightley, 2012) and 0.44 in rodents (Keightley & Gaffney, 2003). To our knowledge, there is currently no such estimation for sticklebacks. Note however that the above estimates cannot reliably detect mutations that are quasi lethal (s << 1/2N). By our distribution of selective coefficients, 49% of all deleterious mutations have a heterozygote selective coefficient lower than 1/2Ne when Ne = 1,000 (42% when Ne = 10,000).
It is worth noting however that, in rodents, about half of the deleterious mutation rate occurs in non-coding sequences (Keightley & Gaffney, 2003). Our simulations using human genome had all exons and all regulatory sequences under purifying selection. With our simulations based on the stickleback genome, however, only exons were under purifying selection. It is therefore possible that we would have over-estimated the deleterious mutation rate in gene-rich regions and under-estimated the deleterious mutation rate in other regions, especially in stickleback. This would artificially increase the locus-to-locus variation in the intensity of BGS in our simulations.
Demography
In all simulations, we started with a burn-in phase with a single population of N diploid individuals, lasting 5 × 2N generations. The population was then split into two populations of N individuals each with a migration rate between them equal to m. After the burn-in phase, each simulation was run for 5 × 2N more generations for a total of 10 × 2N generations.
Treatments
We explored presence and absence of deleterious mutations over two patch sizes, three migration rates, and two genomes.
We do not have a full factorial design. We considered a basic design and explored variations from this design. The basic design had a population size per patch of N = 1000, a migration rate of m = 0.005 and used the stickleback genome for its recombination map and gene positions. As deviations from this basic design, we explored modification of every variable, one variable at a time. The Large N treatment has N = 10000. The Human treatment uses the human genome for gene positions, regulatory sequences and recombination map. The treatments No Migration and High Migration have migration rates of m = 0 and m = 0.05, respectively.
To test the robustness of our results and because it may be relevant for inversions, we also had unrealistic simulations where recombination rate for the entire genome was set at zero. As a check against previous work, we qualitatively replicated the results Charlesworth et al. (1997) by performing simulations with similar assumptions as they used. We named this treatment CNC97. In our CNC97 simulations, N=2000, m=0.001, and 1000 loci were all equally spaced at 0.1 cM apart from each other with constant selection pressure with heterozygotes having fitness of 0.98 and double homozygotes fitness of 0.9 and constant mutation rate μ = 0.0004.
In all treatments (except Large N), we performed 4000 simulations; 2000 simulations with BGS and 2000 simulations without selection (where all mutations were neutral). For Large N, simulations took more memory and more CPU time. We therefore could only perform 2000 simulations for Large N; 1000 simulations with background selection and 1000 simulations without selection. That represents a total of 26,000 simulations for 7 treatments. A full list of all treatments can be found in table 1.
We set the generation 0 at the time of the split. The state of each population was recorded at the end of the burn-in period (generation −1) and at generations 0.001× 2N, 0.05 × 2N, 0.158 × 2N, 1.581× 2N and 5 × 2N after the split. For N=1000, the sampled generations are therefore −1, 2, 100, 316, 3162 and 10000.
Predicted intensity of Background Selection
In order to investigate the locus-to-locus correlation between the predicted intensity of BGS and various statistics, we computed B, a statistic that approximates the expected ratio of the coalescent time with background selection over the coalescent time without background selection . B quantifies how strong BGS is expected to be for a given simulation (Nordborg et al. 1996). A B value of 0.8 means that BGS has caused a drop of genetic diversity of 20% compared to a theoretical absence of BGS. Lower B values indicate stronger BGS.
Both Hudson & Kaplan (1995) and Nordborg et al. (1996) have derived theoretical expectations for B. We applied both methods and found that the predictions of the two formulas are highly correlated (Figure S2). Because the Hudson & Kaplan (1995) approach has been more popular in the literature (cited almost twice as often), we show only the B values computed following Hudson & Kaplan (1995): where ri is the recombination rate between the focal site and the ith site under selection, and si is the heterozygous selection coefficient at that site. ui is the mutation rate at the ith site. By this formula, B is bounded between 0 and 1, where 1 means no BGS at all and low values of B mean strong BGS. We computed B for all sites in the focal region and report the average B for the region.
For the stickleback genome, B values ranged from 2 × 10−6 to 1.0 with a mean of 0.937 (Figure S2). For the human genome, B values ranged from 0.45 to 1.0 with a mean at 0.975. There is indeed less variability and much fewer extremely low B values in the human genome. In the unrealistic No Recombination treatment, B values range from 0.00003 to 0.84 with a mean of 0.17.
FST outlier tests
In order to know the effect of BGS on outlier tests of local adaptation, we used a variant of FDist2 (Beaumont & Nichols, 1996). We chose FDist2 because it is a simple and fast method for which the assumptions of the test match well to the demographic scenario simulated here. Because the program FDist2 is not available through the command line, we rewrote the FDist2 algorithm in R and C++. Source code can be found at https://github.com/RemiMattheyDoret/Fdist2.
Our FDist2 procedure is as follows; first, we estimated the migration rate from the average ; Charlesworth, 1998) and then running 50000 simulations each lasting for 50 times the half-life to reach equilibrium FST given the estimated migration rate (Whitlock, 1992). For each SNP, we then selected the subset of FDist2 simulations for which allelic diversity was less than 0.02 away from the allelic diversity of the SNP of interest. The P-value is computed as the fraction of FDist2 simulations within this subset having a higher FST than the one we observed. The false positive rate is then defined as the fraction of neutral SNPs for which the P- value is lower than a given α value. The α values explored are 0.1, 0.05, 0.01, 0.001, 0.0001 and 0.00001.
For the outlier tests, to avoid issues of pseudo-replication, we considered only a single SNP per simulation whose minor allele frequency is greater than 0.05. Then, we randomly assembled SNPs from a given treatment into groups of 500 SNPs to create the data file for FDist2. We have 4000 simulations (2000 with BGS and 2000 without BGS) per treatment (Large N is an exception with only 2000 simulations total), which allowed 8 independent false positive rate estimates per treatment (4 estimates with BGS and 4 without BGS). In each treatment, we tested for different false positive rate with and without BGS with both a Welch’s t-test and a Wilcoxon test.
Statistics
FST and dXY are both measures of population divergence. In the literature there are several definitions of FST, and we also found potential misunderstanding about how dXY is computed. We want to clarify here these definitions and what we mean when we use the terms FST and dXY.
There are two main estimators of FST in the literature; GST (Nei, 1973) and θ (Weir & Cockerham, 1984). In this article, we focus on θ as an estimate of FST (Weir & Cockerham, 1984). There are also two methods of averaging FST over several loci. The first method is to simply take an arithmetic mean over all loci. The second method consists at calculating the sum of the numerator of θ over all loci and dividing it by the sum of the denominator of θ over all loci. Weir and Cockerham (1984) showed that this second averaging approach has lower bias than the simple arithmetic mean. We will refer to the first method as the “average of ratios” and to the second method as “ratio of the averages” (Reynolds et al. 1983; Weir & Cockerham, 1984). In this article, we use FST as calculated by “ratio of the averages”, as advised by Weir and Cockerham (1984). To illustrate the effects of BGS on the biased estimator of FST, we also computed FST as a simple arithmetic mean (“average of the ratio”), and we will designate this statistic with a subscript FST (average of ratios). dXY is a measure of genetic divergence between two populations X and Y. Nei (1987) defined dXY as where L is the total number of sites, Al is the number of alleles at the lth site and xl,k and yl,k are the frequency of the kth allele at the lth locus in the population X and Y respectively.
Some population genetics software packages (e.g., EggLib; De Mita and Siol, 2012) average dXY over polymorphic sites only, instead of averaging over all sites, as in Nei’s (1987) original definition of dXY. This measure averaged over polymorphic sites only will be called dXY-SNP; otherwise, we use the original definition of dXY by Nei (1987).
The statistics reported are the average FST, dXY, and within population genetic diversity . For each treatment and at each generation, we computed five independent Pearson correlation tests between B and FST, FST (average of ratios), dXY, dXY-SNP and HS. We compared our correlation tests with ordinary least squares regressions and robust regressions (using M-estimators; Huber, 1964), and the results were consistent.
Results
Correlations between the statistics HS, FST, FST (average of ratios), dXY, and dXY-SNP and B, are summarized in tables S1, S2, S3, S4 and S5, respectively. Figure 1 shows the means and standard errors for the treatments Default, High Migration, Large N, Human and No Migration. The same graphs for the treatments No Recombination and CNC97 can be found in Figure S3.
Genetic Diversity
Genetic diversity within populations (HS) is very similar among the treatments Default, High Migration and Human (around HS = 1.9 × 10−4) but is about 1.9 times lower in the No Migration treatment (HS = 1.0 × 10−4) and about 10 times higher in the Large N treatment (HS = 1.9 × 10−3; Figure 1).
B is significantly correlated with genetic diversity within populations (HS) for all treatments using the stickleback genome (and at almost all generations) but not with the Human treatment (table S1). Excluding the unrealistic treatments (No Recombination and CNC97), simulations with BGS have a genetic diversity 4% to 20% lower than simulations without BGS (Figure 1, right column and Figure S3, right column). In the Human treatment, there is no significant correlation of B and HS. Note that Pearson’s correlation coefficients between B and HS are always very small even when the effect is highly significant. The largest R2 observed in realistic simulations (excluding in the No Recombination and CNC97 treatments) between B and HS is R2 ≈ 0.0121.
Statistics of population divergence
Figure 2 shows the correlation between B and the statistics FST, dXY and HS for Default at the last generation. These graphs highlight the general tendencies of the treatments Default, High Migration, Large N and No Recombination. The strongest correlation with B is observed for the statistics dXY (P = 3.28 × 10−5, R = 0.093) and HS (P = 3.1 × 10−5, R = 0.093). In fact, the two statistics dXY and HS are very highly correlated (P < 2.2 × 10−16, R = 0.99). This high correlation explains the resemblance between the central and right graphs of figure 2. FST is not correlated with B (P = 0.99, R = 10−4). All correlation tests between B and the statistics FST, FST (average of ratios), dXY, dXY-SNP can be found in Tables S2, S3, S4 and S5, respectively.
The No Migration treatment is an exception to the other treatments. FST is not significantly correlated with B at early generations but become slightly correlated as divergence rises to 0.6 and higher. dXY shows an opposite pattern. dXY is very significantly correlated with B at early generations and seemingly independent of B at the last generation. Note that for both dXY and FST, all correlation coefficients are always very small. The largest R2 observed is R2=0.0121 in realistic simulations (found for FST No Migration and for dXY Large N; Tables S2, S4) and R2=0.0256 for the No Recombination treatment (found for dXY; Table S4).
As expected, in the CNC97 simulations, there is a strong difference between simulations with BGS and simulations without BGS for all three statistics (FST, dXY, and HS) at all generations (Welch’s t-tests; all P < 2.2×10−16; Figure S3).
FST calculated as advised by Weir and Cockerham (1984) was generally less sensitive to BGS than FST calculated as an average of ratios (compare tables S2 and S3). Figure S4 illustrates the sensitivity of FST (average of ratios) in the worst case, the No Recombination treatment. This sensitivity is driven largely by rare alleles and goes away when minor alleles below a frequency of 0.05 are excluded.
FST outlier tests
The observed false positive rate is relatively close to the α values except for No Migration (with and without BGS) and CNC97 (with BGS). With the exception of treatments No Migration and CNC97, there is no significant difference in false positive rates between simulations with BGS and those without BGS for α of 0.05 (Figure 3). In the treatment, No Migration, the false positive rates between simulations with and without BGS are significantly different for the latter generations. In this treatment, the false positive rate for both simulations with and without BGS are much higher than the α value of 0.05. Results remain very congruent for other α values (results not shown).
Discussion
Background selection reduces genetic diversity, both within and among populations, but the effect on FST is rather small. In simulations of interconnected populations with realistic parameters, FST is insensitive to BGS while the absolute measure of divergence dXY is affected by BGS. On the other hand, in highly diverged populations unconnected by migration, BGS can have a greater impact on FST and a lower impact on dXY. The effects of BGS, when observed, are always very small with R2 never over 1.1%.
BGS impacts both total and within population genetic diversity. Excluding the treatments No recombination and CNC97, we observe that simulations with BGS have a genetic diversity (whether HT or HS; HT data not shown) 6% to 16% lower than simulations without BGS. Messer & Petrov (2013) simulated a panmictic population, looking at a sequence of similar length inspired from a gene-rich region of the human genome, and reported a similar decrease in genetic diversity. Under the No Recombination treatment, this average reduction of genetic diversity due to BGS is 53%. Although empirical estimates are very complex and can hardly disentangle BGS from selective sweeps, our results are also comparable with empirical estimates. Reduction in genetic diversity in humans between regions under high BGS are estimated at 6% according to Cai et al. (2009) or 19-26% according to McVicker et al. (2009). In Drosophila melanogaster, where gene density is higher, the reduction in genetic diversity due to BGS is estimated at 36% when using Kim & Stephan (2000)’s methodology and is estimated at 71% reduction using a composite likelihood approach (Elyashiv et al., 2016) and is hence closer to our No Recombination treatment than to the other treatments. It is worth noting that, because we were interested in simulating variance among sites in effects of BGS, we only simulated local effects and therefore underestimate the expected genome-wide effect of BGS on HT.
In contrast to measures of heterozygosity, FST was generally not significantly correlated with B. The only exception is for the No Migration treatment, where, after many generations, as the average FST becomes very high (FST > 0.5), we observe a slight, yet significant, negative correlation between the expected effects of BGS, B, and FST (intense BGS lead to high values of FST). This highlights that FST is not completely insensitive to BGS, but FST is largely robust to BGS.
Future research is needed to attempt a theoretical estimate of the genome-wide effect of BGS. Our work has been restricted to the stickleback and human genomes. While these two genomes are good representatives of many cases of eukaryotic genomes, they are not good representatives of more compact genomes such as bacterial genomes or yeasts. Our simulations used randomly mating diploid populations. Non-random mating, selfing, and asexual reproduction could also affect our general conclusion, and potentially strongly increase the effects of BGS on FST (Charlesworth et al. 1997). We have explored two population sizes, but we could not explore population sizes of the order of a million individuals (like Drosophila melanogaster) and still realistically simulate such long stretch of DNA. It is not impossible that a much greater population size or a more complex demography could yield to BGS having a greater effect on FST than what we observed here (Torres et al. 2017).
Some have argued that, because BGS reduces the within population diversity, it should lead to high FST (Cutter & Payseur, 2013; Cruickshank & Hahn, 2014; Hoban et al., 2016). All else being equal, this statement is correct. However, BGS reduces HT almost as much as HS (Figure 4). It is therefore insufficient to consider only one component, and we must consider the ratio of these two quantities captured by the definition of FST, . This ratio, as we have shown, appears to be relatively robust to BGS. While genome-wide BGS might eventually be strong enough to cause departures with FSTvalues, it appears that locus-to-locus variation in the intensity of BGS is not strong enough to have much impact on FST as long as populations are not too highly diverged.
We also investigated the consequences of BGS on the widely-used but imperfect estimator, FST (average of ratios), for which FST measures for each locus are averaged to create a genomic average. It is well known that FST (average of ratios) is a biased way to average F!”over several loci (Weir & Cockerham, 1984); however, its usage is relatively common today. In our simulations, FST (average of ratios) is more affected by BGS than FST. Interestingly, FST (average of ratios) is most often higher with weaker BGS. The directionality of this correlation may seem unintuitive at first. To understand this discrepancy, remember that BGS affects the site frequency spectrum; we observed that BGS leads to an excess of loci with low HT (results not shown but see Charlesworth et al., 1995; see also contrary expectation in Stephan, 2010). Loci associated with very low HT also have low FST (figure S5), a well-known result described by Beaumont and Nichols (1996). As BGS creates an excess of loci with low HT and loci with low HT tend to have low FST, BGS can actually reduce FST (average of ratios). After filtering out SNPs with a minor allele frequency lower than 5%, most of the correlation between FST (average of ratios) and B is eliminated (Figure S4).
The absolute measure of divergence dXY is sensitive to BGS. Regions of stronger BGS are associated with low dXY. The effect, although significant, is of relatively small size. The expected dXY for neutral loci is dXY = 4Nµ + 2tµ (Nei, 1987), where t is the time in generation since the populations started to diverge. 4Nµ is the expected heterozygosity in the ancestral population (before splitting) and 2tµ is the expected number of mutations fixed over time in either population since the population split. BGS does not affect the rate of fixation of mutations arising after the populations diverged, but BGS affects the expected heterozygosity. Therefore, BGS should affect dXY by its effect on the expected heterozygosity, and this effect should be greater early in divergence when the 4Nμ term is large relative to the fixation term. This is consistent with the results of our simulations. This result is in agreement with Vijay et al. (2017) who reported a strong correlation between HS and dXY when FST is low (FST ≈ 0.02), but this correlation breaks down when studying more distantly related populations (FST ≈ 0.3).
Interestingly, dXY becomes less sensitive to BGS when FST becomes more sensitive. While our methodology does not allow us to test the efficiency of dXY in outlier tests, it is possible that dXY could be used for highly divergent lineages, but not in cases when divergence is relatively low. Cruickshank and Hahn (2014) suggested relying more on dXY than FST for finding highly divergent loci. Their conclusion was based on analysis of a dataset involving highly divergent populations only (FST values range from about 0.38 to about 0.8). Based on our simulations, their conclusion is not valid when populations are not highly diverged.
As BGS also leads to a reduction of the number of polymorphic sites, BGS has an even stronger effect on dXY-SNP than on dXY (Figure S4). (The measure that we call dXY- SNP is dxy improperly calculated based only on polymorphic sites, as is done in some software packages.) This result highlights the importance of not blindly trusting the output of a given software package.
Outside the effect of BGS on Ne, there are at least two other possible factors that can potentially affect the correlation between B and µ: the effect of deleterious mutations on the effective migration rate and the auto-correlation of µ. Because most deleterious mutations are recessive (García-Dorado and Caballero, 2000; Peters et al., 2003; Shaw & Chang, 2006), the offspring of migrants, who enjoy an increased heterozygosity compared to local individuals, will be at a selective advantage. The presence of deleterious mutations therefore lead to an increase in the effective migration rate (Ingvarsson & Whitlock, 2000). This increases the effective migration rate and hence, leads to a decrease in FST.
As mutation rate is auto-correlated throughout the genome, neutral sequences closely linked to sequences that frequently receive deleterious mutation are also likely to experience frequent neutral mutations. As a high mutation rate leads to low F values (, Wright 1943), autocorrelation in mutation rate may also impact the correlation between B and FST. This effect is likely to be negligible as long as m >> µ.
Recently, evidence of a correlation between recombination rate and FST has been interpreted as likely being caused by deleterious mutations rather than positive selection, whether the divergence between populations is very high (e.g. Cruickshank & Hahn, 2014), moderately high (Vijay et al., 2017) or moderately low (Torres et al., 2017). Here we showed the BGS is unlikely to explain all of these correlations. It can be hypothesized that positive selection (selective sweeps and local adaptation) could be the main cause of this correlation.
McVicker et al. (2009) attempted an estimation of B values in the human genome (see also Elyashiv et al., 2016). They did so using equations from Nordborg et al. (1996). As there is little knowledge about the strength of selection throughout the genome, to our understanding, this estimation of B values should be highly influenced by the effects of beneficial mutations as well as deleterious mutations. Torres et al. (2017) reused this dataset and found a slight association between B and FST among human lineages. It is plausible that this correlation between B and FST could be driven by positive selection rather than by deleterious mutations.
Our FDist2 analysis shows that the false positive rate does not differ in simulations with BGS or without BGS. The only exceptions concern the unrealistic CNC97 treatment and the No Migration treatment after many generations (Figure 3). The average FST at the last generation of the No Migration treatment is greater than 0.8. With such high FST both the simulation without BGS and with BGS lead to very high false positive rates (0.45 without BGS and 0.5 with BGS). This difference in false positive rates is significant, but the observed high false positive rate should make it clear that for such highly diverged populations, FST outlier tests are not recommended in general.
Many authors (e.g. Nachman & Payseur, 2012; Cutter & Payseur, 2013; McGee et al., 2015; Yeaman, 2015; Whitlock & Lotterhos, 2015; Brousseau et al., 2016; Picq et al., 2016; Payseur & Rieseberg, 2016; Hoban et al., 2016) have raised concerns that BGS can strongly reduce our ability to detect the genomic signature of local adaptation. Our analysis shows that BGS is not a strong confounding factor to FST outlier tests of populations that are not too highly diverged.
Funding
The work was funded by NSERC Discovery Grant RGPIN-2016-03779 to MCW and by the Swiss National Science Foundation via the fellowship Doc.Mobility P1SKP3_168393 to RMD.
Acknowledgments
Many thanks to Loren H. Rieseberg, Sarah P. Otto and Amy L. Angert for their help in discussing the design of the project and for feedback. Thanks to Sarah P. Otto, Darren E. Irwin, and Bret A. Payseur for helpful comments on the manuscript. We also thank Yaniv Brandvain and Tom Booker for their feedback and Marius Roesti for help with the Ensembl-retrieved gene annotations.