Background selection and the statistics of population differentiation: consequences for detecting local adaptation

Remi Matthey-Doret; Michael C. Whitlock

doi:10.1101/326256

Abstract

Background selection is a process whereby recurrent deleterious mutations cause a decrease in the effective population size and genetic diversity at linked loci. Several authors have suggested that variation in the intensity of background selection could cause variation in F_ST across the genome, which could confound signals of local adaptation in genome scans. We performed realistic simulations of DNA sequences, using parameter estimates from humans and sticklebacks, to investigate how variation in the intensity of background selection affects different statistics of population differentiation. We show that, in populations connected by gene flow, Weir & Cockerham’s (1984) estimator of F_ST is largely insensitive to locus-to-locus variation in the intensity of background selection. Unlike F_ST, however, d_XY is negatively correlated with background selection. We also show that background selection does not greatly affect the false positive rate in F_ST outlier studies. Overall, our study indicates that background selection will not greatly interfere with finding the variants responsible for local adaptation.

Introduction

Natural selection affects patterns of genetic diversity throughout the genome. How selection affects genetic diversity on a single isolated locus is relatively easy to model; however, when a large number of linked loci are considered, interactions between evolutionary pressures at different sites render the task of modelling much more difficult.

Maynard Smith & Haigh (1974) recognized the influence of selection on linked neutral sites, proposing that strong positive selection could reduce genetic diversity at nearby sites. This process is now referred to as a ‘selective sweep’. Much later, Charlesworth et al. (1993) proposed that deleterious mutations could also affect genetic diversity at nearby sites, because some haplotypes would be removed from the population as selection acts against linked deleterious alleles. They named this process background selection (BGS). Both selective sweeps and background selection affect genetic diversity; they both reduce the effective population size of linked loci. Empirical evidence of a positive correlation between genetic diversity and recombination rate has been reported in several species (Cutter and Payseur, 2013), including Drosophila melanogaster (Begun & Aquadro, 1992; Elyashiv et al., 2016), humans (Spencer et al., 2006), collared flycatchers, hooded crows and Darwin’s finches (Dutoit et al., 2017; see also Vijay et al., 2017).

BGS is also expected to affect F_ST (Charlesworth et al., 1997; Cutter & Payseur, 2013; Cruickshank & Hahn, 2014; Hoban et al., 2016). At low effective population size, different populations may randomly fix different alleles, increasing F_ST, while at high population size, allele frequency changes less through time and different populations are more likely to have comparable allele frequencies, keeping F_ST low. This negative relationship between effective population size N_e and F_ST is captured in Wright’s classical infinite island result; (Wright, 1943). One might therefore expect that loci under stronger BGS would show higher F_ST.

Many authors have also argued that, because BGS reduces the within-population diversity, it should lead to high F_ST (Cutter & Payseur, 2013; Cruickshank & Hahn, 2014; Hoban et al., 2016). Expressed in terms of heterozygosities, , where H_T is the expected heterozygosity in the entire population and H_S is the average expected heterozygosity within subpopulations (H_S and H_T are also sometimes called π_S and π_T; e.g. Charlesworth, 1998). All else being equal, a decrease of H_S would indeed lead to an increase of F_ST. However, all else is not equal; H_T is also affected by BGS (Charlesworth et al., 1997). Therefore in order to understand the effects of BGS on F_ST, we must understand the relative impact of BGS on both H_S and H_T.

Performing numerical simulations, Charlesworth et al. (1997) report that BGS reduces the within population heterozygosity H_S slightly more than it reduces the total heterozygosity H_T, causing a net increase in F_ST. The effect on F_ST reported is quite substantial, but, importantly, their simulations were not meant to be realistic. The authors highlighted their goal in the methods: “The simulations were intended to show the qualitative effects of the various forces studied […], so we did not choose biologically plausible values […]. Rather, we used values that would produce clear-cut effects”.

For example, talking about their choice for the deleterious mutation rate of 8 × 10⁻⁴ per site: “This unrealistically high value was used in order for background selection to produce large effects […]”

Much of the literature on the effect of BGS on F_ST is based on the results in Charlesworth et al. (1997), even though they only intended to show proof of concept (see also Zeng & Charlesworth, 2011 and Zeng & Corcoran, 2015). They did not attempt to estimate how strong of an effect BGS has on F_ST in real genomes.

It is important to distinguish two separate questions when discussing the effect of BGS on F_ST; 1) How does BGS affect the average genome-wide F_ST? and 2) How does locus-to-locus variation in the intensity of BGS affect locus-to-locus variation in F_ST? The second question is of particular interest to those trying to identify loci under positive selection (local selection or selective sweep). Locus-to-locus variation in F_ST potentially could be confounded with the F_ST peaks created by positive selection. In this paper, we focus on this second question.

The identification of loci involved in local adaptation is often performed via F_ST outlier tests (Lotterhos & Whitlock, 2014; Hoban et al., 2016). Other tests exist to identify highly divergent loci such as cross-population extended haplotype homozygosity (XP-EHH; Sabeti et al., 2007), comparative haplotype identity (Lange & Pool, 2016), cross-population composite likelihood ratio (XP-CLR; Chen et al., 2010). F_ST outlier tests, such as FDist2 (Beaumont & Nichols, 1996), BayeScan (Foll & Gaggiotti, 2008) or FLK (Bonhomme et al., 2010), look for genomic regions showing particularly high F_ST values to find candidates for local adaptation. If BGS can affect F_ST unevenly across the genome, then regions with a high intensity of BGS could potentially have high F_ST values that could be confounded with the pattern caused by local selection (Charlesworth et al., 1997; Cruickshank & Hahn, 2014). BGS could therefore inflate the false positive rate when trying to detect loci under local selection.

The potential confounding effect of BGS on signals of local adaptation has led to an intense effort trying to find solutions to this problem (Bank et al., 2014; Huber et al., 2016). Many authors have understood from Cruickshank and Hahn (2014) that d_XY should be used instead F_ST in outlier tests (e.g. McGee et al., 2015; Yeaman, 2015; Whitlock & Lotterhos, 2015; Brousseau et al., 2016; Picq et al., 2016; Payseur & Rieseberg, 2016; Hoban et al., 2016; Vijay et al., 2017; see also Nachman & Payseur, 2012). F_ST is a measure of population divergence relative to the total genetic diversity, while d_XY is an absolute measure of population divergence defined as the probability of non-identity by descent of two alleles drawn in the two different populations averaged over all loci (Nei, 1987; Nei, 1987 originally called it D_XY but, here, we follow Cruickshank and Hahn’s, 2014 terminology by calling it d_XY). The argument is that because F_ST is a measure of divergence relative to the genetic diversity and d_XY an absolute measure of divergence and because BGS reduces genetic diversity, then BGS must affect F_ST but not d_XY, a claim that we will investigate in this paper.

Whether BGS can affect genome-wide F_ST under some conditions is not in doubt (Charlesworth et al., 1997), but whether locus-to-locus variation in the intensity of BGS present in natural populations substantially affects variation in F_ST throughout the genome is very much unknown. Empirically speaking, it has been very difficult to measure how much of the genome-wide variation in genetic diversity is caused by BGS, as opposed to selective sweeps or variation in mutation rates (Cutter & Payseur, 2013; see also attempts in humans by Cai et al., 2009 and McVicker et al. 2009). We are therefore in need of realistic simulations that can give us more insight into how BGS affects genetic diversity among populations and how it affects the statistics of population divergence.

In this article, we investigate the effect of BGS in structured populations with realistic numerical simulations using parameter estimates from humans and stickleback. Our two main goals are 1) to quantify the impact of locus-to-locus variation in the intensity of BGS on d_XY (Nei, 1987) and F_ST (Weir & Cockerham, 1984) and 2) to determine whether BGS inflates the false positive rate of F_ST outlier tests.

Methods

Our goal is to perform biologically plausible simulations of the local genomic effects of background selection. BGS is expected to vary with gene density, mutation rate and recombination rate across the genome. We used data from real genomes to simulate realistic covariation in recombination rates and gene densities. We chose to base our simulated genomes on two eukaryote genomes, sticklebacks and humans, because these two species have attracted a lot attention in studies of local adaptation and because sticklebacks have a variance in recombination rate which is almost 15 times higher than humans (data not shown), allowing us to test vastly different types of eukaryotic genomes. The recombination rate variation in humans is extremely fine scale, but it presents the potential issue that it is estimated from linkage disequilibrium data. As selection causes linkage disequilibrium to increase, estimates of recombination rate at regions under strong selection may be under-estimated, hence artificially increasing the simulated variance in the intensity of BGS. Although the recombination map for stickleback is much less fine scaled, the estimates are less likely to be biased as they are computed from pedigrees.

Our simulations are forward in time and were performed using the simulation platform SimBit version 3.69. The code and user manual are available at https://github.com/RemiMattheyDoret/SimBit. To double check our results, we also ran some simulations with SFS_code (Hernandez, 2008), confirming that we get consistent distributions of genetic diversity and of F_ST among simulations (results not shown). Generations are non-overlapping, individuals are hermaphrodites, mating is random within patches and selection occurs before dispersal.

Genetics

For each simulation, we randomly sampled a sequence of about 10 cM coming either from the stickleback (Gasterosteus aculeatus) genome or from the human genome (see treatments below) and used this genomic location to determine the recombination map and exon locations for a simulation replicate. For the stickleback genome, we used the gene map and recombination map from Roesti et al. (2013). Ensembl-retrieved gene annotations were obtained from Marius Roesti. For the human genome, we used the recombination map from The International HapMap Consortium (2007) and the gene positions from NCBI and positions of regulatory sequences on Ensembl (Zerbino et al., 2017). We excluded sex chromosomes to avoid complications with haploid parts of the genomes. As estimates of mutation rate variation throughout the genome are very limited, we assumed that the haploid mutation rate varies from site to site following an exponential distribution with mean of 2.5×10⁻⁸ per generation (Nachman & Crowell, 2000).

More specifically, we first randomly sampled a sequence of 10⁵ nucleotides, which we will refer to as the focal region. All of the statistics (defined under the section Statistics below) are calculated only on the focal region of each simulation. Nucleotides that occur in locations determined to be exons in the sampled genomic map are subject to selection (see Selection), while all other nucleotides are assumed to be neutral. The focal region itself contained on average ∼0.44 genes for the human genome and ∼3.15 genes for the stickleback genome.

We simulated a 5 cM region on each side of the focal region (resulting in a window of 10 cM plus the recombination rate present in the specific focal region of 10⁵ sites) in order to capture the local effects of background selection. In these 10 cM flanking regions, we only tracked exons. In the nearest 1 cM on each side of the focal region, as well within the focal region, we individually simulated each nucleotide with a biallelic locus. On the remaining outer 4 cM, to improve the speed and RAM usage of the process, we tracked the number of mutations in blocks of up to 100 nucleotides. For these blocks, we tracked only the number of mutations but not their location within the block. Ignoring recombination within a block likely had little effect on the results because the average recombination distance between the first and last site of a block is of the order of 10⁻⁶ cM. The expected number of segregating sites within a block is , which for a mutation rate per block of 10⁻⁶ and a population size of N = 10,000 is ∼0.42. The probabilities of having more than one mutation and more than two mutations (based on a Poisson approximation) are therefore only approximately 6.7% and 0.9%, respectively. Overall, the level of approximation used is very reasonable.

Selection

As we are interested in the effect of BGS, we modelled the effects of purifying selection against novel deleterious mutations. Each nucleotide in the exons (and regulatory sequences for the human genome) is subject to purifying selection with a selection coefficient against mutant alleles determined by a gamma distribution described below. For focal regions that include exons, statistics are computed over a sequence that is at least partially under direct purifying selection.

To create variance in selection pressures throughout the genome, each exon (and regulatory sequence for the human genome) has its own gamma distribution of heterozygous selection coefficients s. The mean and variance of these gamma distributions are drawn from a bivariate uniform distribution with correlation coefficient of 0.5 (so that when the mean is high, so is the variance) bounded between 10⁻⁸ and 0.2 for both the mean and the variance. These bounds were inspired by the methodology used in Gilbert et al. (2017). The gamma distributions are bounded to one. Figure S1 shows the overall distribution of selection coefficient s, with 2% of mutations being lethal and an average deleterious selection coefficient for the non-lethal mutations of 0.07. To improve the performance of our simulations, we used multiplicative dominance, where the fitness of heterozygotes is at locus i is 1-s_i and the fitness of the double mutant is (1-s_i) ².

Figure S1:

Overall distribution of selection coefficient in the heterozygotes. There are 2% lethal mutations, and the average selection coefficient of the non-lethal mutations is approximately 0.07.

As a consequence of our parameter choices, our genome-wide deleterious mutation rate was about 1.6 in sticklebacks and about 3 in humans. 9.4% of the stickleback genome and 2.6% of the human genome was under purifying selection. For comparison, the genome-wide deleterious mutation rate is estimated at 2.2 in humans (Keightley, 2012) and 0.44 in rodents (Keightley & Gaffney, 2003). To our knowledge, there is currently no such estimation for sticklebacks. Note however that the above estimates cannot reliably detect mutations that are quasi lethal (s << 1/2N). By our distribution of selective coefficients, 49% of all deleterious mutations have a heterozygote selective coefficient lower than 1/2N_e when N_e = 1,000 (42% when N_e = 10,000).

It is worth noting however that, in rodents, about half of the deleterious mutation rate occurs in non-coding sequences (Keightley & Gaffney, 2003). Our simulations using human genome had all exons and all regulatory sequences under purifying selection. With our simulations based on the stickleback genome, however, only exons were under purifying selection. It is therefore possible that we would have over-estimated the deleterious mutation rate in gene-rich regions and under-estimated the deleterious mutation rate in other regions, especially in stickleback. This would artificially increase the locus-to-locus variation in the intensity of BGS in our simulations.

Demography

In all simulations, we started with a burn-in phase with a single population of N diploid individuals, lasting 5 × 2N generations. The population was then split into two populations of N individuals each with a migration rate between them equal to m. After the burn-in phase, each simulation was run for 5 × 2N more generations for a total of 10 × 2N generations.

Treatments

We explored presence and absence of deleterious mutations over two patch sizes, three migration rates, and two genomes.

We do not have a full factorial design. We considered a basic design and explored variations from this design. The basic design had a population size per patch of N = 1000, a migration rate of m = 0.005 and used the stickleback genome for its recombination map and gene positions. As deviations from this basic design, we explored modification of every variable, one variable at a time. The Large N treatment has N = 10000. The Human treatment uses the human genome for gene positions, regulatory sequences and recombination map. The treatments No Migration and High Migration have migration rates of m = 0 and m = 0.05, respectively.

To test the robustness of our results and because it may be relevant for inversions, we also had unrealistic simulations where recombination rate for the entire genome was set at zero. As a check against previous work, we qualitatively replicated the results Charlesworth et al. (1997) by performing simulations with similar assumptions as they used. We named this treatment CNC97. In our CNC97 simulations, N=2000, m=0.001, and 1000 loci were all equally spaced at 0.1 cM apart from each other with constant selection pressure with heterozygotes having fitness of 0.98 and double homozygotes fitness of 0.9 and constant mutation rate μ = 0.0004.

In all treatments (except Large N), we performed 4000 simulations; 2000 simulations with BGS and 2000 simulations without selection (where all mutations were neutral). For Large N, simulations took more memory and more CPU time. We therefore could only perform 2000 simulations for Large N; 1000 simulations with background selection and 1000 simulations without selection. That represents a total of 26,000 simulations for 7 treatments. A full list of all treatments can be found in table 1.

View this table:

Table 1:

Summary of treatments. For all treatments but CNC97, the average mutation rate was set to 2.5 × 10⁻⁸ per site, per generation and the mean heterozygous selection coefficient to 0.1.

We set the generation 0 at the time of the split. The state of each population was recorded at the end of the burn-in period (generation −1) and at generations 0.001× 2N, 0.05 × 2N, 0.158 × 2N, 1.581× 2N and 5 × 2N after the split. For N=1000, the sampled generations are therefore −1, 2, 100, 316, 3162 and 10000.

Predicted intensity of Background Selection

In order to investigate the locus-to-locus correlation between the predicted intensity of BGS and various statistics, we computed B, a statistic that approximates the expected ratio of the coalescent time with background selection over the coalescent time without background selection . B quantifies how strong BGS is expected to be for a given simulation (Nordborg et al. 1996). A B value of 0.8 means that BGS has caused a drop of genetic diversity of 20% compared to a theoretical absence of BGS. Lower B values indicate stronger BGS.

Both Hudson & Kaplan (1995) and Nordborg et al. (1996) have derived theoretical expectations for B. We applied both methods and found that the predictions of the two formulas are highly correlated (Figure S2). Because the Hudson & Kaplan (1995) approach has been more popular in the literature (cited almost twice as often), we show only the B values computed following Hudson & Kaplan (1995): where r_i is the recombination rate between the focal site and the i^th site under selection, and s_i is the heterozygous selection coefficient at that site. u_i is the mutation rate at the i^th site. By this formula, B is bounded between 0 and 1, where 1 means no BGS at all and low values of B mean strong BGS. We computed B for all sites in the focal region and report the average B for the region.

Figure S2:

Relationship between B values computed for all simulations using the stickleback genome using the methods of Hudson and Kaplan (1996) and Nordborg et al. (1997).

For the stickleback genome, B values ranged from 2 × 10⁻⁶ to 1.0 with a mean of 0.937 (Figure S2). For the human genome, B values ranged from 0.45 to 1.0 with a mean at 0.975. There is indeed less variability and much fewer extremely low B values in the human genome. In the unrealistic No Recombination treatment, B values range from 0.00003 to 0.84 with a mean of 0.17.

F_ST outlier tests

In order to know the effect of BGS on outlier tests of local adaptation, we used a variant of FDist2 (Beaumont & Nichols, 1996). We chose FDist2 because it is a simple and fast method for which the assumptions of the test match well to the demographic scenario simulated here. Because the program FDist2 is not available through the command line, we rewrote the FDist2 algorithm in R and C++. Source code can be found at https://github.com/RemiMattheyDoret/Fdist2.

Our FDist2 procedure is as follows; first, we estimated the migration rate from the average ; Charlesworth, 1998) and then running 50000 simulations each lasting for 50 times the half-life to reach equilibrium F_ST given the estimated migration rate (Whitlock, 1992). For each SNP, we then selected the subset of FDist2 simulations for which allelic diversity was less than 0.02 away from the allelic diversity of the SNP of interest. The P-value is computed as the fraction of FDist2 simulations within this subset having a higher F_ST than the one we observed. The false positive rate is then defined as the fraction of neutral SNPs for which the P- value is lower than a given α value. The α values explored are 0.1, 0.05, 0.01, 0.001, 0.0001 and 0.00001.

For the outlier tests, to avoid issues of pseudo-replication, we considered only a single SNP per simulation whose minor allele frequency is greater than 0.05. Then, we randomly assembled SNPs from a given treatment into groups of 500 SNPs to create the data file for FDist2. We have 4000 simulations (2000 with BGS and 2000 without BGS) per treatment (Large N is an exception with only 2000 simulations total), which allowed 8 independent false positive rate estimates per treatment (4 estimates with BGS and 4 without BGS). In each treatment, we tested for different false positive rate with and without BGS with both a Welch’s t-test and a Wilcoxon test.

Statistics

F_ST and d_XY are both measures of population divergence. In the literature there are several definitions of F_ST, and we also found potential misunderstanding about how d_XY is computed. We want to clarify here these definitions and what we mean when we use the terms F_ST and d_XY.

There are two main estimators of F_ST in the literature; G_ST (Nei, 1973) and θ (Weir & Cockerham, 1984). In this article, we focus on θ as an estimate of F_ST (Weir & Cockerham, 1984). There are also two methods of averaging F_ST over several loci. The first method is to simply take an arithmetic mean over all loci. The second method consists at calculating the sum of the numerator of θ over all loci and dividing it by the sum of the denominator of θ over all loci. Weir and Cockerham (1984) showed that this second averaging approach has lower bias than the simple arithmetic mean. We will refer to the first method as the “average of ratios” and to the second method as “ratio of the averages” (Reynolds et al. 1983; Weir & Cockerham, 1984). In this article, we use F_ST as calculated by “ratio of the averages”, as advised by Weir and Cockerham (1984). To illustrate the effects of BGS on the biased estimator of F_ST, we also computed F_ST as a simple arithmetic mean (“average of the ratio”), and we will designate this statistic with a subscript F_{ST (average of ratios)}. d_XY is a measure of genetic divergence between two populations X and Y. Nei (1987) defined d_XY as where L is the total number of sites, A_l is the number of alleles at the l^th site and x_l,k and y_l,k are the frequency of the k^th allele at the l^th locus in the population X and Y respectively.

Some population genetics software packages (e.g., EggLib; De Mita and Siol, 2012) average d_XY over polymorphic sites only, instead of averaging over all sites, as in Nei’s (1987) original definition of d_XY. This measure averaged over polymorphic sites only will be called d_XY-SNP; otherwise, we use the original definition of d_XY by Nei (1987).

The statistics reported are the average F_ST, d_XY, and within population genetic diversity . For each treatment and at each generation, we computed five independent Pearson correlation tests between B and F_ST, F_{ST (average of ratios)}, d_XY, d_XY-SNP and H_S. We compared our correlation tests with ordinary least squares regressions and robust regressions (using M-estimators; Huber, 1964), and the results were consistent.

Results

Correlations between the statistics H_S, F_ST, F_{ST (average of ratios)}, d_XY, and d_XY-SNP and B, are summarized in tables S1, S2, S3, S4 and S5, respectively. Figure 1 shows the means and standard errors for the treatments Default, High Migration, Large N, Human and No Migration. The same graphs for the treatments No Recombination and CNC97 can be found in Figure S3.

Figure 1:

Comparisons of means F_ST (left column), d_XY (central column), and H_S (right column) between simulations with (black) and without (grey) BGS. Similar graphs for treatments No Recombination and CNC97 are in figure S3. Error bars are 95% CI.

Figure S3:

Comparisons of means F_ST (left column), d_XY (central column), and H_S (right column) between simulations with (black) and without (grey) BGS for all unrealistic treatments. Realistic treatments (Default, No Migration, High Migration, Human and Large N) are in Figure 1 in the main text. Error bars are 95% confidence intervals.

Genetic Diversity

Genetic diversity within populations (H_S) is very similar among the treatments Default, High Migration and Human (around H_S = 1.9 × 10⁻⁴) but is about 1.9 times lower in the No Migration treatment (H_S = 1.0 × 10⁻⁴) and about 10 times higher in the Large N treatment (H_S = 1.9 × 10⁻³; Figure 1).

B is significantly correlated with genetic diversity within populations (H_S) for all treatments using the stickleback genome (and at almost all generations) but not with the Human treatment (table S1). Excluding the unrealistic treatments (No Recombination and CNC97), simulations with BGS have a genetic diversity 4% to 20% lower than simulations without BGS (Figure 1, right column and Figure S3, right column). In the Human treatment, there is no significant correlation of B and H_S. Note that Pearson’s correlation coefficients between B and H_S are always very small even when the effect is highly significant. The largest R² observed in realistic simulations (excluding in the No Recombination and CNC97 treatments) between B and H_S is R² ≈ 0.0121.

View this table:

Table S1:

Pearson’s correlation tests for the association between the coefficient of background selection (B) and H_S. P-values and R are computed on the simulations with BGS only.

Statistics of population divergence

Figure 2 shows the correlation between B and the statistics F_ST, d_XY and H_S for Default at the last generation. These graphs highlight the general tendencies of the treatments Default, High Migration, Large N and No Recombination. The strongest correlation with B is observed for the statistics d_XY (P = 3.28 × 10⁻⁵, R = 0.093) and H_S (P = 3.1 × 10⁻⁵, R = 0.093). In fact, the two statistics d_XY and H_S are very highly correlated (P < 2.2 × 10⁻¹⁶, R = 0.99). This high correlation explains the resemblance between the central and right graphs of figure 2. F_ST is not correlated with B (P = 0.99, R = 10⁻⁴). All correlation tests between B and the statistics F_ST, F_{ST (average of ratios)}, d_XY, d_XY-SNP can be found in Tables S2, S3, S4 and S5, respectively.

View this table:

Table S2:

Pearson’s correlation tests for the association between the coefficient of background selection (B) and F_ST. P-values and R are computed on the simulations with BGS only.

View this table:

Table S3:

Pearson’s correlation tests for the association between the coefficient of background selection (B) and F_{ST (average of ratios)}. P-values and R are computed on the simulations with BGS only.

View this table:

Table S4:

Pearson’s correlation tests for the association between the coefficient of background selection (B) and d_XY. P-values and R are computed on the simulations with BGS only.

View this table:

Table S5:

Pearson’s correlation tests for the association between the coefficient of background selection (B) and d_XY-SNP. P-values and R are computed on the simulations with BGS only.

Figure 2:

Correlation between B and F_ST, H_S, and d_XY for the last generation (5 × 2N generations after the split) of the Default treatment. Each grey dot is a single simulation where there is BGS. The large black dot is the mean of the simulations where BGS was artificially turned off. The P-values are computed from a Pearson’s correlation test. P-values and R are computed on the simulations with BGS (grey dots) only.

The No Migration treatment is an exception to the other treatments. F_ST is not significantly correlated with B at early generations but become slightly correlated as divergence rises to 0.6 and higher. d_XY shows an opposite pattern. d_XY is very significantly correlated with B at early generations and seemingly independent of B at the last generation. Note that for both d_XY and F_ST, all correlation coefficients are always very small. The largest R² observed is R²=0.0121 in realistic simulations (found for F_ST No Migration and for d_XY Large N; Tables S2, S4) and R²=0.0256 for the No Recombination treatment (found for d_XY; Table S4).

As expected, in the CNC97 simulations, there is a strong difference between simulations with BGS and simulations without BGS for all three statistics (F_ST, d_XY, and H_S) at all generations (Welch’s t-tests; all P < 2.2×10⁻¹⁶; Figure S3).

F_ST calculated as advised by Weir and Cockerham (1984) was generally less sensitive to BGS than F_ST calculated as an average of ratios (compare tables S2 and S3). Figure S4 illustrates the sensitivity of F_{ST (average of ratios)} in the worst case, the No Recombination treatment. This sensitivity is driven largely by rare alleles and goes away when minor alleles below a frequency of 0.05 are excluded.

Figure S4:

Correlations between B and F_ST, d_xy, F_ST (average of ratios), d_xy-SNP, and F_ST (average of ratios) after removing all loci that have minor allele frequency (MAF) lower than 0.05 (called MAF - F_ST (average of ratios)) for the treatment No Recombination only at the last generation (5 x 2N generations after the split). Each grey dot is a single simulation with BGS. The large black dot is the mean of all simulations without BGS. The P-values are computed from a Pearson’s correlation test. P-values and R are computed on the simulations with BGS (grey dots) only.

F_ST outlier tests

The observed false positive rate is relatively close to the α values except for No Migration (with and without BGS) and CNC97 (with BGS). With the exception of treatments No Migration and CNC97, there is no significant difference in false positive rates between simulations with BGS and those without BGS for α of 0.05 (Figure 3). In the treatment, No Migration, the false positive rates between simulations with and without BGS are significantly different for the latter generations. In this treatment, the false positive rate for both simulations with and without BGS are much higher than the α value of 0.05. Results remain very congruent for other α values (results not shown).

Figure 3:

Comparison of false positive rate (FPR) returned by FDist2 between simulations with BGS (black) and without BGS (grey) for all treatments by generation. The significance level is 0.05 and is represented by the horizontal dashed line. Significance based on a Welch’s t-test is indicated with stars (2×10⁻¹⁶’***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1). Significance levels are the same with Wilcoxon tests.

Discussion

Background selection reduces genetic diversity, both within and among populations, but the effect on F_ST is rather small. In simulations of interconnected populations with realistic parameters, F_ST is insensitive to BGS while the absolute measure of divergence d_XY is affected by BGS. On the other hand, in highly diverged populations unconnected by migration, BGS can have a greater impact on F_ST and a lower impact on d_XY. The effects of BGS, when observed, are always very small with R² never over 1.1%.

BGS impacts both total and within population genetic diversity. Excluding the treatments No recombination and CNC97, we observe that simulations with BGS have a genetic diversity (whether H_T or H_S; H_T data not shown) 6% to 16% lower than simulations without BGS. Messer & Petrov (2013) simulated a panmictic population, looking at a sequence of similar length inspired from a gene-rich region of the human genome, and reported a similar decrease in genetic diversity. Under the No Recombination treatment, this average reduction of genetic diversity due to BGS is 53%. Although empirical estimates are very complex and can hardly disentangle BGS from selective sweeps, our results are also comparable with empirical estimates. Reduction in genetic diversity in humans between regions under high BGS are estimated at 6% according to Cai et al. (2009) or 19-26% according to McVicker et al. (2009). In Drosophila melanogaster, where gene density is higher, the reduction in genetic diversity due to BGS is estimated at 36% when using Kim & Stephan (2000)’s methodology and is estimated at 71% reduction using a composite likelihood approach (Elyashiv et al., 2016) and is hence closer to our No Recombination treatment than to the other treatments. It is worth noting that, because we were interested in simulating variance among sites in effects of BGS, we only simulated local effects and therefore underestimate the expected genome-wide effect of BGS on H_T.

In contrast to measures of heterozygosity, F_ST was generally not significantly correlated with B. The only exception is for the No Migration treatment, where, after many generations, as the average F_ST becomes very high (F_ST > 0.5), we observe a slight, yet significant, negative correlation between the expected effects of BGS, B, and F_ST (intense BGS lead to high values of F_ST). This highlights that F_ST is not completely insensitive to BGS, but F_ST is largely robust to BGS.

Future research is needed to attempt a theoretical estimate of the genome-wide effect of BGS. Our work has been restricted to the stickleback and human genomes. While these two genomes are good representatives of many cases of eukaryotic genomes, they are not good representatives of more compact genomes such as bacterial genomes or yeasts. Our simulations used randomly mating diploid populations. Non-random mating, selfing, and asexual reproduction could also affect our general conclusion, and potentially strongly increase the effects of BGS on F_ST (Charlesworth et al. 1997). We have explored two population sizes, but we could not explore population sizes of the order of a million individuals (like Drosophila melanogaster) and still realistically simulate such long stretch of DNA. It is not impossible that a much greater population size or a more complex demography could yield to BGS having a greater effect on F_ST than what we observed here (Torres et al. 2017).

Some have argued that, because BGS reduces the within population diversity, it should lead to high F_ST (Cutter & Payseur, 2013; Cruickshank & Hahn, 2014; Hoban et al., 2016). All else being equal, this statement is correct. However, BGS reduces H_T almost as much as H_S (Figure 4). It is therefore insufficient to consider only one component, and we must consider the ratio of these two quantities captured by the definition of F_ST, . This ratio, as we have shown, appears to be relatively robust to BGS. While genome-wide BGS might eventually be strong enough to cause departures with F_STvalues, it appears that locus-to-locus variation in the intensity of BGS is not strong enough to have much impact on F_ST as long as populations are not too highly diverged.

Figure 4:

Regressions of total (H_T; upper line) and within (H_S; lower line) population expected heterozygosity on the coefficient of BGS (B) for the last generation of the Default treatment. The two regression lines are not exactly parallel with H_S tending to H_T as B goes to low values (more intense BGS).

We also investigated the consequences of BGS on the widely-used but imperfect estimator, F_{ST (average of ratios)}, for which F_ST measures for each locus are averaged to create a genomic average. It is well known that F_{ST (average of ratios)} is a biased way to average F_!”over several loci (Weir & Cockerham, 1984); however, its usage is relatively common today. In our simulations, F_{ST (average of ratios)} is more affected by BGS than F_ST. Interestingly, F_{ST (average of ratios)} is most often higher with weaker BGS. The directionality of this correlation may seem unintuitive at first. To understand this discrepancy, remember that BGS affects the site frequency spectrum; we observed that BGS leads to an excess of loci with low H_T (results not shown but see Charlesworth et al., 1995; see also contrary expectation in Stephan, 2010). Loci associated with very low H_T also have low F_ST (figure S5), a well-known result described by Beaumont and Nichols (1996). As BGS creates an excess of loci with low H_T and loci with low H_T tend to have low F_ST, BGS can actually reduce F_{ST (average of ratios)}. After filtering out SNPs with a minor allele frequency lower than 5%, most of the correlation between F_{ST (average of ratios)} and B is eliminated (Figure S4).

Figure S5:

Relationship between total heterozygosity H_T and F_ST on a site per site basis. Each dot represents a single site. The black line is a Local Polynomial Regression (LOESS). Data is a random subset from the Default treatment at the last generation.

The absolute measure of divergence d_XY is sensitive to BGS. Regions of stronger BGS are associated with low d_XY. The effect, although significant, is of relatively small size. The expected d_XY for neutral loci is d_XY = 4Nµ + 2tµ (Nei, 1987), where t is the time in generation since the populations started to diverge. 4Nµ is the expected heterozygosity in the ancestral population (before splitting) and 2tµ is the expected number of mutations fixed over time in either population since the population split. BGS does not affect the rate of fixation of mutations arising after the populations diverged, but BGS affects the expected heterozygosity. Therefore, BGS should affect d_XY by its effect on the expected heterozygosity, and this effect should be greater early in divergence when the 4Nμ term is large relative to the fixation term. This is consistent with the results of our simulations. This result is in agreement with Vijay et al. (2017) who reported a strong correlation between H_S and d_XY when F_ST is low (F_ST ≈ 0.02), but this correlation breaks down when studying more distantly related populations (F_ST ≈ 0.3).

Interestingly, d_XY becomes less sensitive to BGS when F_ST becomes more sensitive. While our methodology does not allow us to test the efficiency of d_XY in outlier tests, it is possible that d_XY could be used for highly divergent lineages, but not in cases when divergence is relatively low. Cruickshank and Hahn (2014) suggested relying more on d_XY than F_ST for finding highly divergent loci. Their conclusion was based on analysis of a dataset involving highly divergent populations only (F_ST values range from about 0.38 to about 0.8). Based on our simulations, their conclusion is not valid when populations are not highly diverged.

As BGS also leads to a reduction of the number of polymorphic sites, BGS has an even stronger effect on d_XY-SNP than on d_XY (Figure S4). (The measure that we call d_XY- SNP is d_xy improperly calculated based only on polymorphic sites, as is done in some software packages.) This result highlights the importance of not blindly trusting the output of a given software package.

Outside the effect of BGS on N_e, there are at least two other possible factors that can potentially affect the correlation between B and µ: the effect of deleterious mutations on the effective migration rate and the auto-correlation of µ. Because most deleterious mutations are recessive (García-Dorado and Caballero, 2000; Peters et al., 2003; Shaw & Chang, 2006), the offspring of migrants, who enjoy an increased heterozygosity compared to local individuals, will be at a selective advantage. The presence of deleterious mutations therefore lead to an increase in the effective migration rate (Ingvarsson & Whitlock, 2000). This increases the effective migration rate and hence, leads to a decrease in F_ST.

As mutation rate is auto-correlated throughout the genome, neutral sequences closely linked to sequences that frequently receive deleterious mutation are also likely to experience frequent neutral mutations. As a high mutation rate leads to low F values (, Wright 1943), autocorrelation in mutation rate may also impact the correlation between B and F_ST. This effect is likely to be negligible as long as m >> µ.

Recently, evidence of a correlation between recombination rate and F_ST has been interpreted as likely being caused by deleterious mutations rather than positive selection, whether the divergence between populations is very high (e.g. Cruickshank & Hahn, 2014), moderately high (Vijay et al., 2017) or moderately low (Torres et al., 2017). Here we showed the BGS is unlikely to explain all of these correlations. It can be hypothesized that positive selection (selective sweeps and local adaptation) could be the main cause of this correlation.

McVicker et al. (2009) attempted an estimation of B values in the human genome (see also Elyashiv et al., 2016). They did so using equations from Nordborg et al. (1996). As there is little knowledge about the strength of selection throughout the genome, to our understanding, this estimation of B values should be highly influenced by the effects of beneficial mutations as well as deleterious mutations. Torres et al. (2017) reused this dataset and found a slight association between B and F_ST among human lineages. It is plausible that this correlation between B and F_ST could be driven by positive selection rather than by deleterious mutations.

Our FDist2 analysis shows that the false positive rate does not differ in simulations with BGS or without BGS. The only exceptions concern the unrealistic CNC97 treatment and the No Migration treatment after many generations (Figure 3). The average F_ST at the last generation of the No Migration treatment is greater than 0.8. With such high F_ST both the simulation without BGS and with BGS lead to very high false positive rates (0.45 without BGS and 0.5 with BGS). This difference in false positive rates is significant, but the observed high false positive rate should make it clear that for such highly diverged populations, F_ST outlier tests are not recommended in general.

Many authors (e.g. Nachman & Payseur, 2012; Cutter & Payseur, 2013; McGee et al., 2015; Yeaman, 2015; Whitlock & Lotterhos, 2015; Brousseau et al., 2016; Picq et al., 2016; Payseur & Rieseberg, 2016; Hoban et al., 2016) have raised concerns that BGS can strongly reduce our ability to detect the genomic signature of local adaptation. Our analysis shows that BGS is not a strong confounding factor to F_ST outlier tests of populations that are not too highly diverged.

Funding

The work was funded by NSERC Discovery Grant RGPIN-2016-03779 to MCW and by the Swiss National Science Foundation via the fellowship Doc.Mobility P1SKP3_168393 to RMD.

Acknowledgments

Many thanks to Loren H. Rieseberg, Sarah P. Otto and Amy L. Angert for their help in discussing the design of the project and for feedback. Thanks to Sarah P. Otto, Darren E. Irwin, and Bret A. Payseur for helpful comments on the manuscript. We also thank Yaniv Brandvain and Tom Booker for their feedback and Marius Roesti for help with the Ensembl-retrieved gene annotations.

References

↵
Bank C, Ewing GB, Ferrer-Admettla A, Foll M, Jensen JD (2014) Thinking too positive? Revisiting current methods of population genetic selection inference. Trends in Genetics, 30, 540–546.
OpenUrl CrossRef PubMed
↵
Beaumont MA, Nichols RA (1996) Evaluating loci for use in the genetic analysis of population structure. Proceedings of the Royal Society B: Biological Sciences, 263, 1619–1626.
OpenUrl CrossRef
↵
Begun DJ, Aquadro CF (1992) Levels of naturally occurring DNA polymorphism correlate with recombination rates in D. melanogaster. Nature, 356, 519–520.
OpenUrl CrossRef PubMed Web of Science
↵
Bonhomme M, Chevalet C, Servin B et al. (2010) Detecting selection in population trees: the Lewontin and Krakauer test extended. Genetics, 186, 241–262.
OpenUrl Abstract/FREE Full Text
↵
Brousseau L, Postolache D, Lascoux M et al. (2016) Local adaptation in European Firs assessed through extensive sampling across altitudinal gradients in southern Europe. Plos One, 11.
↵
Cai JJ, Macpherson JM, Sella G, Petrov DA (2009) Pervasive hitchhiking at coding and regulatory sites in humans. PLoS Genetics, 5.
↵
Charlesworth B, Morgan MT, Charlesworth D (1993) The effect of deleterious mutations on neutral molecular variation. Genetics, 134, 1289–1303.
OpenUrl Abstract/FREE Full Text
↵
Charlesworth B (1998) Measures of divergence between populations and the effect of forces that reduce variability. Molecular Biology and Evolution, 15, 538–543.
OpenUrl CrossRef PubMed Web of Science
↵
Charlesworth B, Nordborg M, Charlesworth D (1997) The effects of local selection, balanced polymorphism and background selection on equilibrium patterns of genetic diversity in subdivided populations. Genetical Research, 70, 155–174.
OpenUrl CrossRef PubMed Web of Science
↵
Charlesworth D, Charlesworth B, Morgan MT (1995) The pattern of neutral molecular variation under the background selection model. Genetics, 141, 1619–1632.
OpenUrl Abstract/FREE Full Text
↵
Chen H, Patterson N, Reich D (2010) Population differentiation as a test for selective sweeps. Genome Research, 20, 393–402.
OpenUrl Abstract/FREE Full Text
↵
Cruickshank TE, Hahn MW (2014) Reanalysis suggests that genomic islands of speciation are due to reduced diversity, not reduced gene flow. Molecular Ecology, 23, 3133–3157.
OpenUrl CrossRef PubMed Web of Science
↵
Cutter AD, Payseur BA (2013) Genomic signatures of selection at linked sites: Unifying the disparity among species. Nature Reviews Genetics, 14, 262–274.
OpenUrl CrossRef PubMed
↵
Dutoit L, Vijay N, Mugal CF et al. (2017) Covariation in levels of nucleotide diversity in homologous regions of the avian genome long after completion of lineage sorting. Proceedings of the Royal Society B: Biological Sciences, 284, 20162756.
OpenUrl CrossRef PubMed
↵
Elyashiv E, Sattath S, Hu TT et al. (2016) A genomic map of the effects of linked selection in Drosophila. PLOS Genetics, 12.
↵
Foll M, Gaggiotti O (2008) A genome-scan method to identify selected loci appropriate for both dominant and codominant markers: A Bayesian perspective. Genetics, 180, 977–993.
OpenUrl Abstract/FREE Full Text
↵
García-Dorado A, Caballero A (2000) On the average coefficient of dominance of deleterious spontaneous mutations. Genetics, 155, 1991–2001.
OpenUrl Abstract/FREE Full Text
↵
Gilbert KJ, Sharp NP, Angert AL et al. (2017) Local adaptation interacts with expansion load during range expansion: Maladaptation reduces expansion load. The American Naturalist, 189, 368–380.
OpenUrl
↵
Hernandez RD (2008) A flexible forward simulator for populations subject to selection and demography. Bioinformatics, 24, 2786–2787.
OpenUrl CrossRef PubMed Web of Science
↵
Hoban S, Kelley JL, Lotterhos KE et al. (2016) Finding the genomic basis of local adaptation: Pitfalls, practical solutions, and future directions. The American Naturalist, 188, 379–397.
OpenUrl CrossRef PubMed
↵
Huber CD, Degiorgio M, Hellmann I, Nielsen R (2016) Detecting recent selective sweeps while controlling for mutation rate and background selection. Molecular Ecology, 25, 142–156.
OpenUrl CrossRef PubMed
↵
Huber PJ (1964) Robust estimation of a location parameter. The Annals of Mathematical Statistics, 35, 73–101.
OpenUrl
↵
Hudson RR, Kaplan NL (1995) Deleterious background selection with recombination. Genetics, 141, 1605–1617.
OpenUrl Abstract/FREE Full Text
↵
Ingvarsson PK, Whitlock MC (2000) Heterosis increases the effective migration rate. Proceedings of the Royal Society B: Biological Sciences, 267, 1321–1326.
OpenUrl CrossRef PubMed Web of Science
↵
International HapMap Consortium (2007) A second generation human haplotype map of over 3.1 million SNPs. Nature, 449, 851.
OpenUrl CrossRef PubMed Web of Science
↵
Keightley PD, Gaffney DJ (2003) Functional constraints and frequency of deleterious mutations in noncoding DNA of rodents. Proceedings of the National Academy of Sciences, 100, 13402–13406.
OpenUrl Abstract/FREE Full Text
↵
Keightley PD (2012) Rates and Fitness Consequences of new mutations in humans. Genetics, 190, 295–304.
OpenUrl Abstract/FREE Full Text
↵
Kim Y, Stephan W (2000) Joint effects of genetic hitchhiking and background selection on neutral variation. Genetics, 155, 1415–1427.
OpenUrl Abstract/FREE Full Text
↵
Lange JD, Pool JE (2016) A haplotype method detects diverse scenarios of local adaptation from genomic sequence variation. Molecular Ecology, 25, 3081–3100.
OpenUrl
↵
Lotterhos KE, Whitlock MC (2014) Evaluation of demographic history and neutral parameterization on the performance of F_ST outlier tests. Molecular Ecology, 23, 2178–2192.
OpenUrl CrossRef PubMed
↵
Maynard Smith J, Haigh J (1974) The hitch-hiking effect of a favourable gene. Genetical Research, 23, 23.
OpenUrl CrossRef PubMed Web of Science
↵
McGee MD, Neches RY, Seehausen O (2015) Evaluating genomic divergence and parallelism in replicate ecomorphs from young and old cichlid adaptive radiations. Molecular Ecology, 25, 260–268.
OpenUrl
↵
McVicker G, Gordon D, Davis C, Green P (2009) Widespread genomic signatures of natural selection in hominid evolution. PLoS Genetics, 5.
↵
Messer PW, Petrov DA (2013) Frequent adaptation and the McDonald-Kreitman test. Proceedings of the National Academy of Sciences, 110, 8615–8620.
OpenUrl Abstract/FREE Full Text
↵
Mita SD, Siol M (2012) EggLib: processing, analysis and simulation tools for population genetics and genomics. BMC Genetics, 13, 27.
OpenUrl
↵
Nachman MW, Crowell SL (2000) Estimate of the mutation rate per nucleotide in humans. Genetics, 156, 297–304.
OpenUrl Abstract/FREE Full Text
↵
Nachman MW, Payseur BA (2012) Recombination rate variation and speciation: Theoretical predictions and empirical results from rabbits and mice. Philosophical Transactions of the Royal Society B: Biological Sciences, 367, 409–421.
OpenUrl CrossRef PubMed
↵
Nei M (1973) Analysis of gene diversity in subdivided populations. Proceedings of the National Academy of Sciences, 70, 3321–3323.
OpenUrl Abstract/FREE Full Text
↵
Nei M (1987) Molecular evolutionary genetics. Columbia University Press, New York (N.Y.).
↵
Nordborg M, Charlesworth B, Charlesworth D (1996) The effect of recombination on background selection. Genetical Research, 67, 159.
OpenUrl CrossRef PubMed Web of Science
↵
Payseur BA, Rieseberg LH (2016) A genomic perspective on hybridization and speciation. Molecular Ecology, 25, 2337–2360.
OpenUrl
↵
Peters AD, Halligan DL, Whitlock MC, Keightley PD (2003) Dominance and overdominance of mildly deleterious induced mutations for fitness traits in Caenorhabditis elegans. Genetics, 165, 589–599.
OpenUrl Abstract/FREE Full Text
↵
Picq S, Mcmillan WO, Puebla O (2016) Population genomics of local adaptation versus speciation in coral reef fishes (Hypoplectrusspp, Serranidae). Ecology and Evolution, 6, 2109–2124.
OpenUrl
↵
Reynolds J, Weir BS, Cockerham CC (1983) Estimation of the coancestry coefficient: Basis for a short-term genetic distance. Genetics, 105, 767–779.
OpenUrl Abstract/FREE Full Text
↵
Roesti M, Moser D, Berner D (2013) Recombination in the threespine stickleback genome-patterns and consequences. Molecular Ecology, 22, 3014–3027.
OpenUrl CrossRef PubMed Web of Science
↵
Sabeti PC, Varilly P, Fry B et al. (2007) Genome-wide detection and characterization of positive selection in human populations. Nature, 449, 913.
OpenUrl CrossRef PubMed Web of Science
↵
Shaw RG, Chang S-M (2006) Gene action of new mutations in Arabidopsis thaliana. Genetics, 172, 1855–1865.
OpenUrl Abstract/FREE Full Text
↵
Spencer CCA, Deloukas P, Hunt S et al. (2006) The influence of recombination on human genetic diversity. PLoS Genetics, 2.
↵
Stephan W (2010) Genetic hitchhiking versus background selection: the controversy and its implications. Philosophical Transactions of the Royal Society B: Biological Sciences, 365, 1245–1253.
OpenUrl CrossRef PubMed
↵
Torres R, Szpiech ZA, Hernandez RD (2017) Human demographic history has amplified the effects of background selection across the genome.
↵
Vijay N, Weissensteiner M, Burri R et al. (2017) Genomewide patterns of variation in genetic diversity are shared among populations, species and higher-order taxa. Molecular Ecology, 26, 4284–4295.
OpenUrl CrossRef
↵
Weir BS, Cockerham CC (1984) Estimating F-statistics for the analysis of population structure. Evolution, 38, 1358.
OpenUrl CrossRef PubMed Web of Science
↵
Whitlock MC (1992) Temporal fluctuations in demographic parameters and the genetic variance among populations. Evolution, 46, 608.
OpenUrl CrossRef Web of Science
↵
Whitlock MC, Lotterhos KE (2015) Reliable detection of loci responsible for local adaptation: Inference of a null model through trimming the distribution of F_ST. The American Naturalist, 186.
↵
Wright S (1943) Isolation by distance. Genetics, 28, 114–138.
OpenUrl FREE Full Text
↵
Yeaman S (2015) Local adaptation by alleles of small effect. The American Naturalist, 186.
↵
Zeng K, Corcoran P (2015) The effects of background and interference selection on patterns of genetic variation in subdivided populations. Genetics, 201, 1539–1554.
OpenUrl Abstract/FREE Full Text
↵
Zeng K, Charlesworth B (2011) The joint effects of background selection and genetic recombination on local gene genealogies. Genetics, 189, 251–266.
OpenUrl Abstract/FREE Full Text
↵
Zerbino DR, Achuthan P, Akanni W et al.(2017) Ensembl 2018. Nucleic acids research, 46, D754–D761.
OpenUrl CrossRef

View the discussion thread.

Posted May 24, 2018.

Download PDF

Citation Tools

Subject Area

Evolutionary Biology

Subject Areas

All Articles

Animal Behavior and Cognition (5201)
Biochemistry (11715)
Bioengineering (8723)
Bioinformatics (29128)
Biophysics (14935)
Cancer Biology (12049)
Cell Biology (17359)
Clinical Trials (138)
Developmental Biology (9406)
Ecology (14144)
Epidemiology (2067)
Evolutionary Biology (18268)
Genetics (12221)
Genomics (16767)
Immunology (11843)
Microbiology (28014)
Molecular Biology (11560)
Neuroscience (60810)
Paleontology (450)
Pathology (1864)
Pharmacology and Toxicology (3231)
Physiology (4940)
Plant Biology (10384)
Scientific Communication and Education (1680)
Synthetic Biology (2878)
Systems Biology (7333)
Zoology (1642)

[1] ↵
Bank C, Ewing GB, Ferrer-Admettla A, Foll M, Jensen JD (2014) Thinking too positive? Revisiting current methods of population genetic selection inference. Trends in Genetics, 30, 540–546.
OpenUrl CrossRef PubMed

[2] ↵
Beaumont MA, Nichols RA (1996) Evaluating loci for use in the genetic analysis of population structure. Proceedings of the Royal Society B: Biological Sciences, 263, 1619–1626.
OpenUrl CrossRef

[3] ↵
Begun DJ, Aquadro CF (1992) Levels of naturally occurring DNA polymorphism correlate with recombination rates in D. melanogaster. Nature, 356, 519–520.
OpenUrl CrossRef PubMed Web of Science

[4] ↵
Bonhomme M, Chevalet C, Servin B et al. (2010) Detecting selection in population trees: the Lewontin and Krakauer test extended. Genetics, 186, 241–262.
OpenUrl Abstract/FREE Full Text

[5] ↵
Brousseau L, Postolache D, Lascoux M et al. (2016) Local adaptation in European Firs assessed through extensive sampling across altitudinal gradients in southern Europe. Plos One, 11.

[6] ↵
Cai JJ, Macpherson JM, Sella G, Petrov DA (2009) Pervasive hitchhiking at coding and regulatory sites in humans. PLoS Genetics, 5.

[7] ↵
Charlesworth B, Morgan MT, Charlesworth D (1993) The effect of deleterious mutations on neutral molecular variation. Genetics, 134, 1289–1303.
OpenUrl Abstract/FREE Full Text

[8] ↵
Charlesworth B (1998) Measures of divergence between populations and the effect of forces that reduce variability. Molecular Biology and Evolution, 15, 538–543.
OpenUrl CrossRef PubMed Web of Science

[9] ↵
Charlesworth B, Nordborg M, Charlesworth D (1997) The effects of local selection, balanced polymorphism and background selection on equilibrium patterns of genetic diversity in subdivided populations. Genetical Research, 70, 155–174.
OpenUrl CrossRef PubMed Web of Science

[10] ↵
Charlesworth D, Charlesworth B, Morgan MT (1995) The pattern of neutral molecular variation under the background selection model. Genetics, 141, 1619–1632.
OpenUrl Abstract/FREE Full Text

[11] ↵
Chen H, Patterson N, Reich D (2010) Population differentiation as a test for selective sweeps. Genome Research, 20, 393–402.
OpenUrl Abstract/FREE Full Text

[12] ↵
Cruickshank TE, Hahn MW (2014) Reanalysis suggests that genomic islands of speciation are due to reduced diversity, not reduced gene flow. Molecular Ecology, 23, 3133–3157.
OpenUrl CrossRef PubMed Web of Science

[13] ↵
Cutter AD, Payseur BA (2013) Genomic signatures of selection at linked sites: Unifying the disparity among species. Nature Reviews Genetics, 14, 262–274.
OpenUrl CrossRef PubMed

[14] ↵
Dutoit L, Vijay N, Mugal CF et al. (2017) Covariation in levels of nucleotide diversity in homologous regions of the avian genome long after completion of lineage sorting. Proceedings of the Royal Society B: Biological Sciences, 284, 20162756.
OpenUrl CrossRef PubMed

[15] ↵
Elyashiv E, Sattath S, Hu TT et al. (2016) A genomic map of the effects of linked selection in Drosophila. PLOS Genetics, 12.

[16] ↵
Foll M, Gaggiotti O (2008) A genome-scan method to identify selected loci appropriate for both dominant and codominant markers: A Bayesian perspective. Genetics, 180, 977–993.
OpenUrl Abstract/FREE Full Text

[17] ↵
García-Dorado A, Caballero A (2000) On the average coefficient of dominance of deleterious spontaneous mutations. Genetics, 155, 1991–2001.
OpenUrl Abstract/FREE Full Text

[18] ↵
Gilbert KJ, Sharp NP, Angert AL et al. (2017) Local adaptation interacts with expansion load during range expansion: Maladaptation reduces expansion load. The American Naturalist, 189, 368–380.
OpenUrl

[19] ↵
Hernandez RD (2008) A flexible forward simulator for populations subject to selection and demography. Bioinformatics, 24, 2786–2787.
OpenUrl CrossRef PubMed Web of Science

[20] ↵
Hoban S, Kelley JL, Lotterhos KE et al. (2016) Finding the genomic basis of local adaptation: Pitfalls, practical solutions, and future directions. The American Naturalist, 188, 379–397.
OpenUrl CrossRef PubMed

[21] ↵
Huber CD, Degiorgio M, Hellmann I, Nielsen R (2016) Detecting recent selective sweeps while controlling for mutation rate and background selection. Molecular Ecology, 25, 142–156.
OpenUrl CrossRef PubMed

[22] ↵
Huber PJ (1964) Robust estimation of a location parameter. The Annals of Mathematical Statistics, 35, 73–101.
OpenUrl

[23] ↵
Hudson RR, Kaplan NL (1995) Deleterious background selection with recombination. Genetics, 141, 1605–1617.
OpenUrl Abstract/FREE Full Text

[24] ↵
Ingvarsson PK, Whitlock MC (2000) Heterosis increases the effective migration rate. Proceedings of the Royal Society B: Biological Sciences, 267, 1321–1326.
OpenUrl CrossRef PubMed Web of Science

[25] ↵
International HapMap Consortium (2007) A second generation human haplotype map of over 3.1 million SNPs. Nature, 449, 851.
OpenUrl CrossRef PubMed Web of Science

[26] ↵
Keightley PD, Gaffney DJ (2003) Functional constraints and frequency of deleterious mutations in noncoding DNA of rodents. Proceedings of the National Academy of Sciences, 100, 13402–13406.
OpenUrl Abstract/FREE Full Text

[27] ↵
Keightley PD (2012) Rates and Fitness Consequences of new mutations in humans. Genetics, 190, 295–304.
OpenUrl Abstract/FREE Full Text

[28] ↵
Kim Y, Stephan W (2000) Joint effects of genetic hitchhiking and background selection on neutral variation. Genetics, 155, 1415–1427.
OpenUrl Abstract/FREE Full Text

[29] ↵
Lange JD, Pool JE (2016) A haplotype method detects diverse scenarios of local adaptation from genomic sequence variation. Molecular Ecology, 25, 3081–3100.
OpenUrl

[30] ↵
Lotterhos KE, Whitlock MC (2014) Evaluation of demographic history and neutral parameterization on the performance of F_ST outlier tests. Molecular Ecology, 23, 2178–2192.
OpenUrl CrossRef PubMed

[31] ↵
Maynard Smith J, Haigh J (1974) The hitch-hiking effect of a favourable gene. Genetical Research, 23, 23.
OpenUrl CrossRef PubMed Web of Science

[32] ↵
McGee MD, Neches RY, Seehausen O (2015) Evaluating genomic divergence and parallelism in replicate ecomorphs from young and old cichlid adaptive radiations. Molecular Ecology, 25, 260–268.
OpenUrl

[33] ↵
McVicker G, Gordon D, Davis C, Green P (2009) Widespread genomic signatures of natural selection in hominid evolution. PLoS Genetics, 5.

[34] ↵
Messer PW, Petrov DA (2013) Frequent adaptation and the McDonald-Kreitman test. Proceedings of the National Academy of Sciences, 110, 8615–8620.
OpenUrl Abstract/FREE Full Text

[35] ↵
Mita SD, Siol M (2012) EggLib: processing, analysis and simulation tools for population genetics and genomics. BMC Genetics, 13, 27.
OpenUrl

[36] ↵
Nachman MW, Crowell SL (2000) Estimate of the mutation rate per nucleotide in humans. Genetics, 156, 297–304.
OpenUrl Abstract/FREE Full Text

[37] ↵
Nachman MW, Payseur BA (2012) Recombination rate variation and speciation: Theoretical predictions and empirical results from rabbits and mice. Philosophical Transactions of the Royal Society B: Biological Sciences, 367, 409–421.
OpenUrl CrossRef PubMed

[38] ↵
Nei M (1973) Analysis of gene diversity in subdivided populations. Proceedings of the National Academy of Sciences, 70, 3321–3323.
OpenUrl Abstract/FREE Full Text

[39] ↵
Nei M (1987) Molecular evolutionary genetics. Columbia University Press, New York (N.Y.).

[40] ↵
Nordborg M, Charlesworth B, Charlesworth D (1996) The effect of recombination on background selection. Genetical Research, 67, 159.
OpenUrl CrossRef PubMed Web of Science

[41] ↵
Payseur BA, Rieseberg LH (2016) A genomic perspective on hybridization and speciation. Molecular Ecology, 25, 2337–2360.
OpenUrl

[42] ↵
Peters AD, Halligan DL, Whitlock MC, Keightley PD (2003) Dominance and overdominance of mildly deleterious induced mutations for fitness traits in Caenorhabditis elegans. Genetics, 165, 589–599.
OpenUrl Abstract/FREE Full Text

[43] ↵
Picq S, Mcmillan WO, Puebla O (2016) Population genomics of local adaptation versus speciation in coral reef fishes (Hypoplectrusspp, Serranidae). Ecology and Evolution, 6, 2109–2124.
OpenUrl

[44] ↵
Reynolds J, Weir BS, Cockerham CC (1983) Estimation of the coancestry coefficient: Basis for a short-term genetic distance. Genetics, 105, 767–779.
OpenUrl Abstract/FREE Full Text

[45] ↵
Roesti M, Moser D, Berner D (2013) Recombination in the threespine stickleback genome-patterns and consequences. Molecular Ecology, 22, 3014–3027.
OpenUrl CrossRef PubMed Web of Science

[46] ↵
Sabeti PC, Varilly P, Fry B et al. (2007) Genome-wide detection and characterization of positive selection in human populations. Nature, 449, 913.
OpenUrl CrossRef PubMed Web of Science

[47] ↵
Shaw RG, Chang S-M (2006) Gene action of new mutations in Arabidopsis thaliana. Genetics, 172, 1855–1865.
OpenUrl Abstract/FREE Full Text

[48] ↵
Spencer CCA, Deloukas P, Hunt S et al. (2006) The influence of recombination on human genetic diversity. PLoS Genetics, 2.

[49] ↵
Stephan W (2010) Genetic hitchhiking versus background selection: the controversy and its implications. Philosophical Transactions of the Royal Society B: Biological Sciences, 365, 1245–1253.
OpenUrl CrossRef PubMed

[50] ↵
Torres R, Szpiech ZA, Hernandez RD (2017) Human demographic history has amplified the effects of background selection across the genome.

[51] ↵
Vijay N, Weissensteiner M, Burri R et al. (2017) Genomewide patterns of variation in genetic diversity are shared among populations, species and higher-order taxa. Molecular Ecology, 26, 4284–4295.
OpenUrl CrossRef

[52] ↵
Weir BS, Cockerham CC (1984) Estimating F-statistics for the analysis of population structure. Evolution, 38, 1358.
OpenUrl CrossRef PubMed Web of Science

[53] ↵
Whitlock MC (1992) Temporal fluctuations in demographic parameters and the genetic variance among populations. Evolution, 46, 608.
OpenUrl CrossRef Web of Science

[54] ↵
Whitlock MC, Lotterhos KE (2015) Reliable detection of loci responsible for local adaptation: Inference of a null model through trimming the distribution of F_ST. The American Naturalist, 186.

[55] ↵
Wright S (1943) Isolation by distance. Genetics, 28, 114–138.
OpenUrl FREE Full Text

[56] ↵
Yeaman S (2015) Local adaptation by alleles of small effect. The American Naturalist, 186.

[57] ↵
Zeng K, Corcoran P (2015) The effects of background and interference selection on patterns of genetic variation in subdivided populations. Genetics, 201, 1539–1554.
OpenUrl Abstract/FREE Full Text

[58] ↵
Zeng K, Charlesworth B (2011) The joint effects of background selection and genetic recombination on local gene genealogies. Genetics, 189, 251–266.
OpenUrl Abstract/FREE Full Text

[59] ↵
Zerbino DR, Achuthan P, Akanni W et al.(2017) Ensembl 2018. Nucleic acids research, 46, D754–D761.
OpenUrl CrossRef

Background selection and the statistics of population differentiation: consequences for detecting local adaptation

Abstract

Introduction

Methods

Genetics

Selection

Demography

Treatments

Predicted intensity of Background Selection

FST outlier tests

Statistics

Results

Genetic Diversity

Statistics of population divergence

FST outlier tests

Discussion

Funding

Acknowledgments

References

Citation Manager Formats

Subject Area

F_ST outlier tests

F_ST outlier tests