Abstract
Short-scale local adaptation is a complex process involving selection, migration and drift. The expected effects on the genome are well grounded in theory, but examining them empirically has proven difficult, as it requires information about local selection, demographic history and recombination rate variation. Here, we use locally adapted and phenotypically differentiated Arabidopsis lyrata populations from two altitudinal gradients in Norway to test these expectations at the whole-genome level. Demography modelling indicated that populations within the gradients diverged less than 2 kya and that the sites are connected by gene flow. The gene flow estimates were, however, highly asymmetric, with migration from high to low altitudes being several times more frequent than vice versa. To detect signatures of selection for local adaptation, we estimated patterns of lineage-specific differentiation among these populations. Theory predicts that gene flow leads to a concentration of adaptive loci in areas of low recombination, a pattern we observed in both lowland-alpine comparisons. Although most selected loci displayed patterns of conditional neutrality, we found indications of genetic trade-offs, with one locus in particular showing high divergence and signs of selection in both populations. Our results further suggest that resistance to solar radiation is an important adaptation to alpine environments, while vegetative growth and bacterial defence were indicated as selected traits in the lowland habitats. These results provide insights into the genetic architectures and evolutionary processes driving local adaptation under gene flow. We also contribute to the understanding of traits and biological processes underlying alpine adaptation at northern latitudes.
Introduction
Population genetics theory by Haldane (1930) and Wright (1931) laid the basis for predicting how selection and drift influence adaptive variation in the presence of gene flow. More recently, theoretical studies have shown that the interplay between these factors can lead to characteristic changes in the genetic architectures underlying local adaptation (Griswold 2006; Yeaman and Whitlock 2011). An important prediction is a shift towards fewer, large-effect loci concentrated in areas of reduced recombination, as differences are more easily maintained in a set of correlated sites under strong selection (Slatkin 1975; Lenormand and Otto 2000; Griswold 2006). The adaptive loci are also expected to show antagonistic pleiotropy, wherein a single locus is kept polymorphic by differential selection in contrasting environments, because under the alternative scenario of conditional neutrality (an allele has a fitness effect in only one environment while being neutral in the other), the overall beneficial allele will eventually be fixed in all populations (Yeaman and Whitlock 2011; Wadgymar et al. 2017; Yoder and Tiffin 2017). These effects can make the footprint of natural selection more distinct (Pfeifer et al. 2018; Wang et al. 2018), but migration also hinders adaptive divergence by swamping beneficial alleles (Lenormand 2002). Therefore, short-scale local adaptation is inevitably restricted to environments where spatially varying selection can overcome the homogenizing effects of gene flow. For example, plant populations growing on toxic soils, either naturally occurring or contaminated by mine tailings, have exhibited such adaptation (Antonovics and Bradshaw 1970; Sambatti and Rice 2006; Arnold et al. 2016; Aeschbacher et al. 2017).
In the present study, we examine the genetic basis of local adaptation among lowland and alpine populations of a self-incompatible perennial plant, Arabidopsis lyrata. In mountainous environments, abiotic factors such as temperature and solar radiation can differ drastically among closely adjacent areas, producing steep environmental gradients (Gonzalo-Turpin and Hazard 2009; Fischer et al. 2013; Kubota et al. 2015; Günther et al. 2016), not unlike those found on toxic soils. Indeed, we have recently demonstrated that A. lyrata individuals from different altitudes exhibited population-specific phenotypes when grown at native common garden sites in Norway and in a novel habitat in Finland (Hämälä et al. 2018). Using whole-genome based demography modelling and a reciprocal transplant experiment, we further showed that despite evidence of gene flow, local alpine and lowland populations had the highest fitness at their home sites, satisfying the local vs. foreign criterion of local adaptation (Kawecki and Ebert 2004). This study system and the combined knowledge of local adaptation and demographic history, together with genome-wide recombination information from a high-density linkage map (Hämälä et al. 2017), provide us with a rare opportunity to study genomic patterns of adaptive divergence under gene flow (Savolainen et al. 2013).
Here, we extend our previous work to the genome level by examining variation potentially underlying the local adaptation. If alpine adaptation evolved after recent colonization from the lowland sites, the high-altitude populations are expected to show lower genetic diversity, a higher accumulation of deleterious mutations, and stronger population size contractions than the low-altitude populations. The ongoing gene flow then leads us to look for evidence of antagonistic pleiotropy (loci showing high differentiation and signs of selection in both populations) and for loci whose differentiation is maintained by reduced recombination. Furthermore, we expect to identify genes and biological processes underlying lowland and alpine adaptation. We search for signs of selection by applying a measure of lineage-specific differentiation, the population branch statistic (PBS) (Yi et al. 2010), to a set of whole-genome sequences. Unlike traditional differentiation measures, PBS can identify the selected lineage by combining information from two closely related populations and an outgroup. As this method searches for allele frequency differences among populations, it is more likely to detect large-effect loci than the small-effect loci underlying truly quantitative traits, because the signal of the latter may largely arise from linkage disequilibrium between populations (Latta 1998). We then explore the potential effects of selection with coalescent and forward simulations, utilizing parameter estimates available for these populations.
We specifically address the following questions: What do diversity patterns tell us about the demographic history of these populations, and do we find footprints of gene flow in this variation? Are selected sites more concentrated in areas of reduced recombination in populations that receive more migrants? Is the genetic architecture dominated by conditional neutrality, or do we find loci exhibiting variation consistent with antagonistic pleiotropy? Finally, with which phenotypes and biological processes are the outliers associated, and do we find evidence of adaptive convergence between the two alpine areas?
Results
We used whole-genome data from 47 resequenced A. lyrata individuals. Most of the samples were collected from four populations in two alpine areas in Norway, Jotunheimen and Trollheimen, each represented by a low- and a high-altitude population: Lom, Jotunheimen (J1, 300 m.a.s.l.; n = 9), Spiterstulen, Jotunheimen (J3, 1,100 m.a.s.l.; n = 12), Sunndalsøra, Trollheimen (T1, 10 m.a.s.l.; n = 5) and Nedre Kamtjern, Trollheimen (T4, 1,360 m.a.s.l.; n = 9). The abbreviations follow Hämälä et al. (2018). Based on patterns of microsatellite variation, populations within both alpine areas form distinct genetic clusters (Gaudeul et al. 2007). For some of the analyses, we also included samples from Germany (GER; n = 6) and Sweden (SWE; n = 6) as comparison groups. Data from the T4 population represent new collections for the current study, whereas the other samples came from two earlier studies (Mattila et al. 2017; Hämälä et al. 2018).
Alpine populations harbour lower genetic diversities than lowland populations
Within each area, the high-altitude populations had lower synonymous nucleotide diversity than the low-altitude populations (p < 0.001, Kolmogorov-Smirnov test [KST]) (Table 1). The high-altitude populations also had higher πN/πS ratios (p < 0.001, KST), likely indicating a greater accumulation of deleterious mutations. Furthermore, synonymous Tajima’s D estimates were more positive in the high-altitude populations (p < 1 × 10⁻¹⁶, KST), suggesting stronger population size contractions in their history. Differentiation levels, as estimated with FST, were lower between the neighbouring lowland and alpine populations than between other population pairs (Table 1). The Jotunheimen populations were less differentiated from each other (J1-J3; FST = 0.09) than the populations from Trollheimen (T1-T4; FST = 0.17).
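Nucleotide diversity (π), πN/πS and Tajima’s D are all summaries of the site frequency spectrum. As an illustration only, the sketch below computes per-site π and Tajima’s D for one window from derived-allele counts among n sampled sequences. It assumes called genotypes without missing data, whereas the analyses here may have worked from genotype likelihoods; the function name `window_pi_and_d` is hypothetical. A πN/πS ratio would follow from running it separately on nonsynonymous and synonymous sites and dividing the per-site values.

```python
import math
from typing import Sequence, Tuple


def window_pi_and_d(derived_counts: Sequence[int], n: int, n_sites: int) -> Tuple[float, float]:
    """Return (per-site pi, Tajima's D) for one window.

    derived_counts: derived-allele count at each site among n sampled sequences.
    n_sites: number of callable sites in the window (e.g. synonymous sites).
    """
    seg = [k for k in derived_counts if 0 < k < n]  # keep segregating sites only
    S = len(seg)
    # Mean number of pairwise differences, summed over sites
    pi_total = sum(2 * k * (n - k) / (n * (n - 1)) for k in seg)
    if S == 0:
        return 0.0, float("nan")  # D is undefined without segregating sites
    a1 = sum(1 / i for i in range(1, n))
    a2 = sum(1 / i ** 2 for i in range(1, n))
    b1 = (n + 1) / (3 * (n - 1))
    b2 = 2 * (n ** 2 + n + 3) / (9 * n * (n - 1))
    c1 = b1 - 1 / a1
    c2 = b2 - (n + 2) / (a1 * n) + a2 / a1 ** 2
    e1, e2 = c1 / a1, c2 / (a1 ** 2 + a2)
    # Tajima's D contrasts pi with Watterson's estimator S / a1
    d = (pi_total - S / a1) / math.sqrt(e1 * S + e2 * S * (S - 1))
    return pi_total / n_sites, d


# An excess of singletons gives D < 0; an excess of intermediate-frequency
# variants (as expected after a population contraction) gives D > 0.
print(window_pi_and_d([1] * 10, n=10, n_sites=100))
print(window_pi_and_d([5] * 10, n=10, n_sites=100))
```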
We then conducted a principal component analysis (PCA) and an admixture analysis to further evaluate the relationships between the studied individuals. PCA showed distinct divisions, with individuals clustering according to population and populations clustering according to geographical location (Fig. 1A). Consistent with the FST estimates, the Trollheimen populations were more clearly separated from each other than the populations from Jotunheimen (Fig. 1B). In fact, in an analysis including only the focal populations, variation along the first two principal components did not separate the J1 and J3 populations (separation occurs at PC4; Fig. S1). The admixture patterns were for the most part concordant with the PCA results. In general, individuals from the low-altitude populations J1 and T1 had higher admixture proportions (from their respective high-altitude neighbours) than individuals from the high-altitude populations J3 and T4 (from their low-altitude neighbours) (Fig. 1C).
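Genotype PCA is typically run on a standardized genotype matrix. The stdlib-only sketch below, which is not the software used in this study, centres each SNP on twice its allele frequency, scales it by the binomial standard deviation, builds the individual-by-individual relationship matrix, and extracts the leading eigenvector by power iteration. The function `pc1_scores` and the toy genotypes are hypothetical; a real analysis would use a linear algebra library and extract several components.

```python
import math
import random
from typing import List, Sequence


def pc1_scores(genotypes: Sequence[Sequence[int]], n_iter: int = 500, seed: int = 1) -> List[float]:
    """Leading principal axis of individuals from a 0/1/2 genotype matrix (rows = individuals)."""
    n = len(genotypes)
    standardized = []
    for j in range(len(genotypes[0])):
        col = [row[j] for row in genotypes]
        p = sum(col) / (2 * n)  # frequency of the counted allele
        if 0 < p < 1:  # monomorphic SNPs carry no information
            sd = math.sqrt(2 * p * (1 - p))
            standardized.append([(g - 2 * p) / sd for g in col])
    if not standardized:
        raise ValueError("no polymorphic SNPs")
    m = len(standardized)
    # n x n relationship matrix X X^T / m
    grm = [[sum(c[i] * c[k] for c in standardized) / m for k in range(n)] for i in range(n)]
    rng = random.Random(seed)
    v = [rng.uniform(-1, 1) for _ in range(n)]
    for _ in range(n_iter):  # power iteration converges to the top eigenvector
        w = [sum(grm[i][k] * v[k] for k in range(n)) for i in range(n)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    return v


G = [[0, 0, 0], [0, 0, 1], [2, 2, 2], [2, 1, 2]]  # two toy "populations" of two individuals
print([round(x, 3) for x in pc1_scores(G)])
```

The sign of a principal axis is arbitrary, so only the grouping of individuals (same versus opposite sign) is meaningful.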
Demography analysis reveals recent divergence and asymmetric gene flow
To quantify the levels of gene flow, and to estimate divergence times and effective population sizes, we conducted demography simulations based on site frequency spectra in fastsimcoal2 (Excoffier et al. 2013). Simulations involving the Jotunheimen populations J1 and J3 were done as part of an earlier study (Hämälä et al. 2018), and here we add parameter estimates for the Trollheimen populations T1 and T4. Maximum likelihood estimates (MLE) from the best-supported models (for model selection, see Table S1) indicated more recent divergence between the T1 and T4 populations (307 generations ago) than between the J1 and J3 populations (866 generations ago). The Trollheimen groups also had lower effective population size estimates (Ne: T1 = 1,862; T4 = 779) than the Jotunheimen groups (Ne: J1 = 3,370; J3 = 4,295). Although the levels of gene flow were clearly higher among the Jotunheimen populations, the population migration rates were heavily biased towards the low-altitude populations in both alpine areas (~36× higher from J3 to J1 than from J1 to J3, and ~2.4× higher from T4 to T1 than from T1 to T4) (Fig. 1D). As A. lyrata is insect-pollinated, the asymmetry in gene flow could result from greater seed dispersal from high to low altitudes, but because allele frequencies reflect events over extended periods of time, we cannot rule out other modes of migration that may have been in effect during colonization of the alpine areas. For all MLEs and their 95% confidence intervals, see Table S2.
Genome-wide patterns of population specific selection
We studied population-specific differentiation to examine the effects of gene flow on neutral and selected variants and to identify the important biological processes in each lowland and alpine habitat. To this end, we used the population branch statistic (PBS) (Yi et al. 2010) to distinguish loci that have been under directional selection since the adjacent low- and high-altitude populations diverged. We first applied a maximum likelihood model by Kim et al. (2011) to estimate allele frequencies for each population directly from genotype likelihoods. This method bypasses the need for genotype calling, leading to unbiased allele frequency estimates even at low coverage. The pairwise differentiation among three populations, two focal ones and an outgroup, was quantified using a relative measure, FST, and an absolute measure, dXY. As these measures can be inflated by reduced within-population genetic diversity and by unequal sample sizes, respectively, we focused on outliers that were confirmed by both measures. For outlier detection, we used a method based on simulated null distributions. Estimated demography parameters (divergence times, migration rates and effective population sizes, going back to the most recent common ancestor) and recombination information from a linkage map (Hämälä et al. 2017) were used to model the expected neutral differentiation statistics for these populations.
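PBS converts pairwise FST into branch lengths, T = −log(1 − FST), and solves for the length of the focal branch: PBS_A = (T_AB + T_AC − T_BC)/2. The sketch below illustrates this for a single window. It assumes Hudson’s FST estimator (ratio of averages) computed from per-site allele frequencies, with sample sizes given in chromosomes; the source does not state which FST estimator was used, and all function names are hypothetical. dXY is computed from the same frequencies, so that a window can be required to be an outlier under both measures.

```python
import math
from typing import Sequence


def hudson_fst(p1: Sequence[float], p2: Sequence[float], n1: int, n2: int) -> float:
    """Window FST (Hudson estimator, ratio of averages); n1, n2 = sampled chromosomes."""
    num = den = 0.0
    for a, b in zip(p1, p2):
        num += (a - b) ** 2 - a * (1 - a) / (n1 - 1) - b * (1 - b) / (n2 - 1)
        den += a * (1 - b) + b * (1 - a)
    return num / den if den > 0 else 0.0


def dxy(p1: Sequence[float], p2: Sequence[float]) -> float:
    """Mean between-population divergence, averaged over the supplied sites."""
    return sum(a * (1 - b) + b * (1 - a) for a, b in zip(p1, p2)) / len(p1)


def branch_length(fst: float) -> float:
    """T = -log(1 - FST); FST is capped just below 1 to avoid log(0)."""
    return -math.log(1 - min(fst, 0.999999))


def pbs(fst_focal_other: float, fst_focal_out: float, fst_other_out: float) -> float:
    """Length of the focal branch from three pairwise FST values."""
    return (branch_length(fst_focal_other) + branch_length(fst_focal_out)
            - branch_length(fst_other_out)) / 2


# Toy 50-SNP window: A is differentiated, while B and the outgroup C are similar.
pA, pB, pC = [0.9] * 50, [0.1] * 50, [0.15] * 50
n = 18  # e.g. 9 diploid individuals
f_ab = hudson_fst(pA, pB, n, n)
f_ac = hudson_fst(pA, pC, n, n)
f_bc = hudson_fst(pB, pC, n, n)
print(pbs(f_ab, f_ac, f_bc), pbs(f_ab, f_bc, f_ac))  # PBS for A, then for B
print(dxy(pA, pB))
```

Only the focal lineage that changed (A) receives a long branch; B's PBS stays near zero even though B is also highly differentiated from A, which is what lets PBS assign the signal to a lineage.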
Selection patterns between the neighbouring populations were examined by calculating PBS estimates in non-overlapping windows of 50 SNPs for the population trios J1-J3-GER and T1-T4-GER. The German population was chosen as the outgroup over the Swedish one because it has a more equal genetic distance to each of our focal populations (Fig. 1A and Table 1). Using the simulated neutral data, we first generated ~50,000 PBS samples for each population and used the distributions to find the limits of neutral variation in the observed data. The simulations predicted lower median estimates and more tightly centred distributions in the low-altitude populations (median: J1 = –0.007, T1 = –0.009; IQR: J1 = 0.050, T1 = 0.078) than in the high-altitude populations (median: J3 = –0.006, T4 = 0.002; IQR: J3 = 0.053, T4 = 0.081). Overall, the observed estimates corresponded well with the simulated data. Within both areas, the low-altitude populations had lower median estimates than the high-altitude populations (J1 = –0.009, J3 = –0.004, T1 = –0.017, T4 = 0.007), and their distributions were less dispersed around the median (IQR: J1 = 0.060, J3 = 0.063, T1 = 0.093, T4 = 0.108). The observed distributions did, however, have slightly longer lower and upper tails than the neutral ones (Fig. S2), indicating possible balancing and directional selection, respectively. Significant deviations from neutral expectations were determined by deriving p-values for the observed estimates from comparisons against quantiles of the simulated distributions. The p-values were subsequently transformed into false discovery rate based q-values to account for multiple testing. PBS estimates with a q-value below 0.05 were considered significant.
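A minimal sketch of this outlier test follows: each observed PBS value is compared with the simulated neutral PBS values to obtain an empirical upper-tail p-value, and the p-values are then FDR-adjusted. We assume a one-sided (upper-tail) test for directional selection and use the Benjamini-Hochberg procedure, which coincides with Storey’s q-value when the proportion of true null hypotheses is fixed at 1; the q-value method actually used in the study may differ. Function names are hypothetical.

```python
from bisect import bisect_left
from typing import List, Sequence


def empirical_pvalues(observed: Sequence[float], null: Sequence[float]) -> List[float]:
    """Upper-tail p-value of each observed statistic against a simulated null."""
    null_sorted = sorted(null)
    n = len(null_sorted)
    # (1 + number of null values >= obs) / (n + 1) avoids reporting p = 0
    return [(1 + n - bisect_left(null_sorted, x)) / (n + 1) for x in observed]


def bh_adjust(pvals: Sequence[float]) -> List[float]:
    """Benjamini-Hochberg adjusted p-values (step-up, kept monotone)."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    adjusted = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):  # walk from the largest p-value down
        i = order[rank - 1]
        running_min = min(running_min, pvals[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Usage: flag windows whose adjusted value falls below 0.05
null_pbs = [i / 1000 for i in range(-200, 200)]  # stand-in for simulated neutral PBS
obs_pbs = [0.05, 0.25, 0.30]
q = bh_adjust(empirical_pvalues(obs_pbs, null_pbs))
print([qi < 0.05 for qi in q])
```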
Our outlier detection model identified fewer selected loci in the low-altitude populations than in the high-altitude populations (bracketed values indicate percentages of the total data sets): J1 = 896 (1.1%), J3 = 948 (1.2%), T1 = 966 (1.3%) and T4 = 1,160 (1.6%) (Fig. S3). Although all outlier windows tended to localize in areas with lower than average per-base-pair recombination rates (genome-wide average = 3.7 × 10⁻⁸), this trend was more pronounced in the low-altitude populations (median estimates with 50 SNP windows): J1 = 2.13 × 10⁻⁸, J3 = 2.72 × 10⁻⁸, T1 = 2.37 × 10⁻⁸ and T4 = 2.56 × 10⁻⁸ (p < 0.003, KST). The same pattern also persisted with longer windows (150 SNPs): J1 = 2.10 × 10⁻⁸, J3 = 2.54 × 10⁻⁸, T1 = 2.14 × 10⁻⁸ and T4 = 2.8 × 10⁻⁸ (p < 0.009, KST).
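The Kolmogorov-Smirnov comparisons used throughout, such as the recombination rates of outlier windows above, rest on the maximum distance between two empirical cumulative distribution functions. As an illustration only, the sketch below computes the two-sample statistic and an asymptotic p-value from the Kolmogorov series, using the effective-sample-size correction given in Numerical Recipes. In practice a tested routine such as `scipy.stats.ks_2samp` would normally be used; the function name `ks_two_sample` is hypothetical.

```python
import math
from bisect import bisect_right
from typing import Sequence, Tuple


def ks_two_sample(x: Sequence[float], y: Sequence[float]) -> Tuple[float, float]:
    """Two-sample KS statistic D and an asymptotic two-sided p-value."""
    xs, ys = sorted(x), sorted(y)
    n, m = len(xs), len(ys)
    # D = max |F_x(v) - F_y(v)|; the supremum is reached at an observed value
    d = max(abs(bisect_right(xs, v) / n - bisect_right(ys, v) / m) for v in xs + ys)
    ne = n * m / (n + m)
    lam = (math.sqrt(ne) + 0.12 + 0.11 / math.sqrt(ne)) * d
    if lam < 0.2:
        return d, 1.0  # Q_KS(lambda) is indistinguishable from 1 here
    # Kolmogorov survival function: Q_KS = 2 * sum_j (-1)^(j-1) exp(-2 j^2 lambda^2)
    p = 2 * sum((-1) ** (j - 1) * math.exp(-2 * j * j * lam * lam) for j in range(1, 101))
    return d, min(max(p, 0.0), 1.0)


# Hypothetical per-bp recombination rates of outlier windows in two populations
low = [2.0e-8, 2.1e-8, 2.2e-8, 2.3e-8, 2.4e-8, 2.5e-8]
high = [2.6e-8, 2.7e-8, 2.8e-8, 2.9e-8, 3.0e-8, 3.1e-8]
print(ks_two_sample(low, high))
```

The asymptotic p-value is only approximate for small samples like the toy one here; for thousands of windows it is adequate.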
Evidence of allelic trade-off at XRN2 locus
Most selection outliers were found in only a single population (Table S3), likely indicating conditional neutrality. Some loci were, however, shared between the neighbouring low- and high-altitude populations, suggesting possible antagonistic pleiotropy: 42 out of 1,668 genes found within 5 kb of an outlier window were shared between J1 and J3 (based on 10,000 randomized data sets [randomization test from now on], the expected number of shared outliers is 22), while 67 out of 2,359 were shared between T1 and T4 (43 expected, randomization test). One locus (XRN2) stood out in particular as one of the top outliers in both T1 and T4, driven by higher than neutral differentiation in all three population comparisons (i.e. T1-T4, T1-GER and T4-GER). To examine whether the observed pattern might result from directional selection favouring different alleles in different populations (antagonistic pleiotropy), we examined allele frequencies, as well as pairwise nucleotide diversity (π) and Tajima’s D estimates around the gene. Reduced diversity surrounding the selected site is a classical signal of a rapid selective sweep, whereas Tajima’s D detects skews in the site frequency spectrum, producing negative estimates when the number of rare alleles exceeds neutral expectations, a pattern also consistent with a selective sweep. Allele frequencies showed clear evidence of differentiation, with T1 fixed for one allele and T4 nearly fixed for the alternative allele. Both populations also had reduced nucleotide diversity and negative Tajima’s D estimates in an ~8 kb region around the gene, indicating directional selection in each lineage (Fig. 2). To evaluate the likelihood of the observed patterns under our estimated demography parameters (Fig. 1D), we used forward genetic models in SLiM 2 (Haller and Messer 2017) to simulate nucleotide diversity and Tajima’s D under different selection scenarios.
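The expected number of shared outlier genes can be approximated by repeatedly drawing random gene sets of the observed sizes from a common gene universe and counting their overlap. The sketch below is one plausible form of such a randomization test; the exact resampling scheme used in the study is not described here, so the definition of the gene universe and the function name `shared_outlier_test` are assumptions.

```python
import random
from typing import Iterable, Set, Tuple


def shared_outlier_test(universe: Iterable[str], set_a: Set[str], set_b: Set[str],
                        n_rand: int = 10_000, seed: int = 1) -> Tuple[int, float, float]:
    """Observed overlap, mean randomized overlap and a one-sided empirical p-value."""
    genes = list(universe)  # random.sample needs a sequence
    observed = len(set_a & set_b)
    rng = random.Random(seed)
    overlaps = []
    for _ in range(n_rand):
        # Draw two random outlier sets with the observed sizes
        draw_a = set(rng.sample(genes, len(set_a)))
        draw_b = rng.sample(genes, len(set_b))
        overlaps.append(sum(g in draw_a for g in draw_b))
    expected = sum(overlaps) / n_rand
    p = (1 + sum(o >= observed for o in overlaps)) / (n_rand + 1)
    return observed, expected, p


# Toy example: 1,000 genes, two outlier sets of 100 genes sharing 50
universe = [f"g{i}" for i in range(1000)]
a = set(universe[:100])
b = set(universe[:50] + universe[500:550])
print(shared_outlier_test(universe, a, b, n_rand=2000))
```

Under random draws the mean overlap approaches |A|·|B|/N (here 10), so an observed overlap well above that value suggests more sharing than expected by chance.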
The recombination rate used in these simulations (3.7 × 10⁻⁸ per bp) was obtained from the corresponding region of the linkage map. We assumed constant population sizes so as not to confound the selection patterns with genetic bottlenecks. Like selection, a bottleneck can reduce genetic diversity, but given the highly localized nature of our findings (Fig. 2), it is an unlikely source of the observed patterns. We ran each simulation for 10 × Ne generations to approach mutation-drift balance, introduced the sweep mutation with a scaled selection coefficient α = 2Ne·s of 10², 10³ or 10⁴, and recorded the number of generations until π and D dropped below their initial neutral estimates. The simulations were conducted both as single-origin hard sweeps and as multiple-origin soft sweeps. For the soft sweeps, we assumed that 5% of the corresponding population carried the adaptive allele before the selection shift. Each simulation was repeated 300 times and the median value retained. Our models showed that under a hard sweep with strong selection (α > 10³), π and D can respond to selection in both populations within the estimated time frame (faster in T4, as it had a lower Ne estimate [Fig. S4]). The same, however, was not true for lower selection coefficients or for multiple-origin soft sweeps in general (Fig. 3).
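Because α = 2Ne·s, the same α implies a larger per-generation selection coefficient in a smaller population. The sketch below converts α to s for the Ne estimates of T1 and T4. As a rough deterministic check, it then iterates the standard diploid selection recursion (genotype fitnesses 1, 1 + hs and 1 + s, with h = 0.5 assumed) to count generations from a starting frequency to near fixation. It ignores drift, linkage and the π and D thresholds used in the SLiM runs, so it does not substitute for those simulations; the function names are hypothetical.

```python
from typing import Optional


def selection_coefficient(alpha: float, ne: float) -> float:
    """s implied by a population-scaled coefficient alpha = 2 * Ne * s."""
    return alpha / (2 * ne)


def generations_to_near_fixation(s: float, ne: float, p0: Optional[float] = None,
                                 h: float = 0.5) -> int:
    """Deterministic diploid sweep from p0 until p >= 1 - 1/(2Ne)."""
    if s <= 0:
        raise ValueError("s must be positive for a sweep")
    p = 1 / (2 * ne) if p0 is None else p0  # hard sweep: a single new copy
    target = 1 - 1 / (2 * ne)
    w_aa, w_ab, w_bb = 1 + s, 1 + h * s, 1.0  # fitness of AA, Aa, aa
    generations = 0
    while p < target:
        q = 1 - p
        w_bar = p * p * w_aa + 2 * p * q * w_ab + q * q * w_bb
        p = (p * p * w_aa + p * q * w_ab) / w_bar
        generations += 1
    return generations


for label, ne in (("T1", 1862), ("T4", 779)):  # Ne estimates from the demography models
    for alpha in (1e2, 1e3, 1e4):
        s = selection_coefficient(alpha, ne)
        print(label, int(alpha), round(s, 4), generations_to_near_fixation(s, ne))
```

In this recursion a sweep starting at 5% frequency reaches fixation sooner than one starting from a single copy, yet it leaves a weaker diversity footprint because several haplotypes hitchhike. This is why fixation time alone cannot distinguish the scenarios, and the SLiM runs tracked π and D instead.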
Biological processes show adaptive convergence
Most genes located within 5 kb of the outlier windows were population-specific (Table S3). Among the significant outlier loci, 49 out of 1,785 genes were shared between the two low-altitude populations (25 expected, randomization test), whereas 92 out of 2,240 genes were shared between the two high-altitude populations (38 expected, randomization test). We then conducted a Gene Ontology (GO) enrichment analysis to summarize the biological processes associated with our outlier genes. In contrast to individual loci, significantly enriched GO terms showed clear correspondence between the two low-altitude populations and between the two high-altitude populations (Fig. 4). In the J1 and T1 populations, ‘leaf development’, ‘shoot system development’ and ‘response to bacterium’ were among the most highly enriched GO categories, whereas the J3 and T4 outliers were enriched for the terms ‘response to radiation’, ‘response to light stimulus’ and ‘cellular response to stress’.
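The source does not specify the enrichment test. A common choice for GO analyses is a one-sided hypergeometric (Fisher’s exact) test of whether a term is over-represented among outlier genes relative to all annotated genes. The sketch below computes that tail probability exactly with `math.comb`. The function name and the toy numbers are hypothetical, and a correction for testing many GO terms, for example with an FDR procedure, would still be needed.

```python
from math import comb


def go_enrichment_p(k: int, n_outliers: int, n_term: int, n_universe: int) -> float:
    """P(X >= k) for X ~ Hypergeometric(n_universe, n_term, n_outliers).

    k: outlier genes annotated to the GO term
    n_outliers: number of outlier genes tested
    n_term: genes in the universe annotated to the term
    n_universe: all annotated genes considered
    """
    upper = min(n_term, n_outliers)
    # math.comb returns 0 when the lower argument exceeds the upper one
    hits = sum(
        comb(n_term, i) * comb(n_universe - n_term, n_outliers - i)
        for i in range(k, upper + 1)
    )
    return hits / comb(n_universe, n_outliers)


# Hypothetical term: 150 annotated genes among 27,000, 12 hits among 900 outliers
print(go_enrichment_p(12, 900, 150, 27000))
```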
Discussion
We found indications of lower genetic diversity, greater accumulation of deleterious mutations and stronger population size contractions in the high-altitude populations than in the low-altitude populations. Combined with the low divergence time estimates from the demography models, these results point towards recent colonization of the alpine habitats in southwestern Norway. In contrast, highland populations in regions not influenced by the last glacial maximum (~20,000 years ago) likely reflect adaptation over longer time scales [e.g. A. halleri in Japan (Kubota et al. 2015) and A. thaliana in Italy (Günther et al. 2016)], making our A. lyrata populations a particularly suitable system for studying recent adaptation among populations connected by gene flow.
Gene flow shapes patterns of neutral and adaptive variation
The effects of gene flow on the genetic patterns of local adaptation have been intensely studied at the theoretical level (Lenormand and Otto 2000; Griswold 2006; Yeaman and Whitlock 2011; Aeschbacher et al. 2017), but empirical support for a number of key predictions is still weak or missing. For example, under higher levels of gene flow, is adaptation attributable to fewer large-effect loci concentrated in areas of reduced recombination? The use of PBS allowed us to approach these questions by examining lineage-specific differentiation patterns among populations that receive different numbers of migrants. The demography modelling indicated ongoing but highly asymmetric gene flow between the neighbouring low- and high-altitude populations, while the Jotunheimen groups J1 and J3 exchanged migrants at a significantly higher rate than the Trollheimen groups T1 and T4. Concordant with these estimates, the low-altitude populations J1 and T1 showed PBS distributions with lower and less dispersed estimates than the high-altitude populations J3 and T4, and our simulation-based outlier detection model identified more adaptive loci in the high-altitude populations. Our results also showed that outlier windows in the low-altitude populations were more concentrated in areas of low recombination than those in the high-altitude populations. Similar patterns have previously been found among stickleback population pairs that presumably exchange migrants at different rates (Marques et al. 2016; Samuk et al. 2017). However, by combining estimates of bidirectional gene flow with lineage-specific selection analysis, we were able to examine these patterns not only between the Jotunheimen and Trollheimen study pairs, but also within them, making it possible to detect asymmetric effects and to better characterize the genetic architecture underlying local adaptation. Furthermore, despite a concerted effort (Renaut et al. 2013), significant associations between gene flow and the recombination-dependent localization of adaptive loci have not, to our knowledge, previously been reported in any other plant taxon. According to theory, if the genetic architecture underlying local adaptation with gene flow evolves only through de novo mutations, considerable time may be needed for these signature patterns to form (Yeaman and Whitlock 2011). As the observed differences in outlier localization have evolved recently, potentially during the last 1,000 generations, favourable genetic configurations were likely already present as standing genetic variation before the selection shift, leading to faster evolution of the genetic architecture.
Selection patterns indicate antagonistic pleiotropy between neighbouring populations
A major question in local adaptation research concerns the role of antagonistic pleiotropy in promoting adaptive divergence (Kawecki and Ebert 2004; Savolainen et al. 2013; Tiffin and Ross-Ibarra 2014; Wadgymar et al. 2017; Yoder and Tiffin 2017), and this issue is particularly relevant when the focus is on closely adjacent populations (Yeaman and Whitlock 2011). A traditional way to search for genetic trade-offs is to measure presumably adaptive trait variation in contrasting environments and to find associations between phenotypes and genotypes through genetic mapping (Fournier-Level et al. 2011; Anderson et al. 2013; Ågren et al. 2013). Although this approach has the potential advantage of linking phenotypes to fitness, focusing on preselected traits might cause important factors to be overlooked. Furthermore, the causative genes underlying the often wide quantitative trait loci (QTL) intervals have rarely been identified. Here, we used PBS scans to find loci showing patterns of opposing selection among the neighbouring low- and high-altitude populations, because under gene flow and unrestricted recombination, allele frequency differentiation can only be maintained by such fitness trade-offs.
Our analysis discovered one locus (XRN2) with clear evidence of an allelic trade-off between the T1 and T4 populations. In A. thaliana, XRN2 is known to be involved in various RNA processing tasks (Zakrzewska-Placzek et al. 2010), including posttranscriptional gene silencing (Gy et al. 2007), a defence response against viral mRNAs. Examination of nucleotide diversity and Tajima’s D estimates showed that the observed sequence patterns likely resulted from two rapid and opposing selective sweeps, while the forward simulations indicated that selection has to be strong to produce these patterns in just under 600 generations (the upper limit of the 95% confidence interval for the divergence time estimate). Furthermore, as suggested by our simulations, the highly negative Tajima’s D estimates are likely the result of a single or very few haplotypes being swept to high frequency in each population. The sweeps might, however, have started from standing genetic variation, but due to low Ne and strong drift, nearly all haplotypes were lost during the initial selection phase, producing hard-sweep-like patterns (Hermisson and Pennings 2017). Reduced recombination is not a likely factor here, because our linkage map, as well as the relatively short region affected by the selective sweeps (~8 kb), indicates that the recombination rate in this region is close to the genome-wide average. However, in light of the theory of selection-migration balance (Lenormand 2002), the near-fixation of alleles at this locus was likely facilitated not only by strong selection, but also by weak migration. Indeed, some loci were also found as outliers in both Jotunheimen populations, which are connected by stronger gene flow, but none exhibited the same extreme patterns of selective sweeps as XRN2 in the Trollheimen populations.
Based on expectations of mutational effects, conditional neutrality may actually be the dominant driver of local adaptation across all environments, as it can evolve through purely deleterious mutations (as opposed to antagonistic pleiotropy, which requires a beneficial effect in at least one environment) (Martin and Lenormand 2006). The method used here, however, searches for signals of positive selection, so the observed patterns are likely influenced by a lack of power to detect selection in both lineages, leading to some trade-off loci being identified as conditionally neutral (Wadgymar et al. 2017; Yoder and Tiffin 2017). The divergence between these populations might also be too recent for antagonistic pleiotropy to have become common, as its predicted prevalence is based on longer evolutionary time scales (Yeaman and Whitlock 2011). Although QTL mapping studies have suggested that genetic trade-offs may be more frequent between the long-diverged Swedish and Italian A. thaliana populations (Ågren et al. 2013; Price et al. 2018), the absence of gene flow between those populations makes comparisons with our results problematic. In any case, the apparent prevalence of conditional neutrality can result in a freer spread of alleles, which might benefit these A. lyrata populations under climate change (Fournier-Level et al. 2011; Hämälä et al. 2018). Furthermore, the discovery of a strong allelic trade-off at the XRN2 locus and the evaluation of the underlying processes required detailed information about local selection, demographic history and recombination rate variation, which has rarely been available for other study systems.
Resistance to solar radiation is selected for in alpine environments
Besides genome-wide patterns of adaptive divergence, our results revealed genes and biological processes that may underlie the colonization of alpine and lowland habitats in southwestern Norway. A major factor associated with highland environments is increased solar radiation intensity (Körner 2007). Indeed, an earlier study on A. halleri in Switzerland (Fischer et al. 2013) detected signs of directional selection on UV resistance genes in high-altitude populations. Our analysis also discovered a UV resistance gene, TT5 (Li et al. 1993), among the most significant outliers in J3, and a DNA break repair gene, RAD23C (Farmer et al. 2010), in T4 (DNA repair being a trait related to increased radiation levels). Furthermore, genes involved in response to light stimulus were discovered in both populations, and the top shared outlier between J3 and T4 was ADG1, a gene thought to be important in acclimation to high light intensities in A. thaliana (Schmitz et al. 2014). As indicated by our GO enrichment analysis, loci involved in ‘response to radiation’ were more numerous than expected among the outlier loci in the two alpine populations. Additional highly enriched GO categories were ‘response to light stimulus’ and ‘cellular response to stress’, which might also be linked to adaptation to increased solar radiation (‘response to high light intensity’ was also significantly enriched in T4). Selection on radiation resistance genes may not, however, be ubiquitous among highland populations, as shown by studies on A. halleri in Japan (Kubota et al. 2015) and on A. thaliana in Italy (Günther et al. 2016). Nevertheless, our results indicate that selection imposed by solar radiation can be a significant driver of adaptive divergence at northern latitudes, even though the high-altitude areas in Scandinavia are relatively low in elevation compared with the highest regions in the world.
Vegetative growth and bacterial defence are important traits in lowland adaptation
We also discovered notable selection patterns in the low-altitude populations. Genes involved in leaf development and vegetative growth were found among the top outliers in J1 and T1, and the corresponding ‘leaf development’ GO category was significantly enriched in both gene sets. ‘Shoot system development’, another significantly enriched term, is especially interesting, because we have previously shown that lowland populations generally produce longer flowering shoots than alpine populations, and that the trait was positively correlated with fitness at the Finnish sea-level field site (Hämälä et al. 2018). Bacterial defence seems to be another important trait in the lowland habitats. The GO term ‘response to bacterium’ was significantly enriched in both lowland populations, and the top outlier in J1 (SRT2) is involved in that process (Wang et al. 2010). Selection on bacterial defence genes has previously been discovered among populations from different altitudes in A. halleri (Fischer et al. 2013; Kubota et al. 2015) and in A. thaliana (Günther et al. 2016), but as those studies relied on traditional divergence outlier methods, the authors could not infer whether the selection target had been the low- or the high-altitude lineage. Furthermore, XRN2, the gene exhibiting an allelic trade-off between T1 and T4, is thought to be involved in immune responses in A. thaliana (Gy et al. 2007), but it also has a role in general RNA processing (Zakrzewska-Placzek et al. 2010), so the causal factor behind the trade-off may not be related to pathogen defence.
Although the outlier genes of the two high-altitude populations, and those of the two low-altitude populations, were largely involved in the same biological processes, almost all individual loci were population-specific, and none of the most significant ones have previously been found in altitude-related genome scans in other Arabidopsis species (Fischer et al. 2013; Kubota et al. 2015; Günther et al. 2016). This lack of correspondence at the gene level likely indicates that populations (and related species) living in similar environments have responded to the same environmental pressures through different genetic pathways, suggesting either subtle differences in the phenotypes under selection or that selection has acted on different genes underlying the same polygenic traits. Considering the recent divergence time estimate between populations from the two alpine areas (~2,400 years ago) and the growing evidence of convergent local adaptation even among more distantly related groups (Hohenlohe et al. 2010; Arnold et al. 2016; Yeaman et al. 2016), these results indicate that adaptation to altitude-specific environments in A. lyrata is not constrained to a few key genes, which might further aid adaptation to future climates.
Conclusions
By studying recently diverged, phenotypically differentiated and locally adapted A. lyrata populations, we have gained novel insights into the adaptive processes driving differentiation under ongoing gene flow. Our approach of combining estimates of bidirectional gene flow with lineage-specific selection analysis allowed us to examine population differences in the genetic architectures underlying local adaptation and to infer how asymmetric gene flow affects these patterns. Gene flow was heavily biased towards the low-altitude populations, with J1 and T1 receiving several times more migrants than J3 and T4. As predicted by theory, the lowland populations had more selection outliers in areas of reduced recombination than the alpine populations. Furthermore, we found a locus showing clear footprints of strong opposing selective sweeps in the Trollheimen populations, a pattern likely caused by antagonistic pleiotropy. However, most selected loci showed indications of conditional neutrality, potentially reflecting the recent divergence between these populations. Phenotypes associated with the outlier loci also revealed biological processes that may underlie alpine and lowland adaptation in northern Europe. Resistance to increased levels of solar radiation is likely an important factor for alpine populations, whereas in the lowland habitats, selected genes were involved in vegetative growth and bacterial defence. These results contribute to our understanding of the processes driving adaptive differentiation under gene flow, as well as of the traits and biological processes underlying alpine adaptation in northern latitudes.
Materials and methods
Study populations
We studied altitude adaptation among Norwegian A. lyrata populations from two alpine areas (Jotunheimen and Trollheimen). Each area was represented by one low- and one high-altitude population. The fitness and phenotypic variation of these populations were previously studied in Hämälä et al. (2018). In that study, each area was represented by four populations, abbreviated J1 to J4 and T1 to T4. Here, we retain the naming convention and call the populations J1 (300 m a.s.l.), J3 (1,100 m a.s.l.), T1 (10 m a.s.l.) and T4 (1,360 m a.s.l.). The distance between the J1 and J3 growing sites is approximately 25 km, and that between T1 and T4 approximately 30 km. The two alpine areas are roughly 100 km apart. We previously showed that individuals from J1 and J3 exhibited local superiority when grown at reciprocal common garden sites in Norway. The T1 and T4 individuals were not planted in their local environments, but when grown in a common garden in Finland, they expressed phenotypic differences consistent with altitude adaptation (Hämälä et al. 2018).
Whole-genome sequencing
Whole-genome data from two previously published studies were used (Mattila et al. 2017; Hämälä et al. 2018): nine individuals from J1, 12 from J3 and five from T1. For the present study, we also resequenced nine individuals from the T4 population, which exhibited more high-altitude-specific phenotypes (earlier flowering start and shorter flowering shoots) in our field experiments than the previously sequenced T3 population (Hämälä et al. 2018). In addition, a German population (abbreviated GER) of six individuals was used as an outgroup in the selection scan (see the ‘Selection scan’ section), and the German and a Swedish population (SWE; also of six individuals) were used as comparison groups in the admixture and principal component analyses (see the ‘Analysis of genetic diversity and population structure’ section). The German and Swedish individuals, as well as five individuals from J3, came from Mattila et al. (2017), whereas the other previously sequenced individuals were from Hämälä et al. (2018). In all three studies, DNA was extracted from fresh leaves using NucleoSpin Plant II kits (Macherey-Nagel), libraries for Illumina whole-genome sequencing were prepared with NEBNext master mix kits (New England Biolabs), and sequencing was done with Illumina HiSeq2000 (Mattila et al. 2017) and HiSeq2500 (Hämälä et al. 2018; this study) at the Institute for Molecular Medicine Finland, University of Helsinki, using PE100 chemistry. The median read coverage per individual ranged from 6× to 25×. The low coverage of some individuals, combined with the variable nature of short-read sequencing, means that a large part of the data set contains insufficient information to call genotypes with high confidence (Nielsen et al. 2011). Therefore, to reduce the bias caused by uncertain genotypes, we avoided SNP calling and based all estimated statistics on genotype likelihoods.
Sequence processing and allele frequency estimation
Low-quality reads and sequencing adapters were removed using Trimmomatic (Bolger et al. 2014). The reads were aligned to the A. lyrata v1.0 reference genome (Hu et al. 2011) with bwa-mem (Li and Durbin 2009). Duplicate reads were removed with Picard tools (https://broadinstitute.github.io/picard/) and indels realigned with GATK (DePristo et al. 2011). Likelihoods for the three possible genotypes at each biallelic site were then calculated with the GATK model in ANGSD (Korneliussen et al. 2014). We only used reads with mapping quality over 30, and sites needed to have quality over 20 and a sequencing depth of at least 4×. Allele frequencies for the selection scan were then estimated directly from the genotype likelihoods using the maximum likelihood model of Kim et al. (2011). The strict filtering associated with SNP calling (commonly, the top-ranking genotype must be ten times more likely than the others) can especially reduce the number of heterozygote calls in areas of low coverage (for a comparison between SNP calls and genotype likelihoods in our data, see Figure S2). The method used here circumvents that problem by taking genotype uncertainty into account, producing unbiased allele frequency estimates even at the minimum threshold coverage (Kim et al. 2011; Nielsen et al. 2011).
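The idea of estimating an allele frequency by maximum likelihood without calling genotypes can be illustrated with a short expectation-maximization (EM) routine. The Python sketch below is written in the spirit of Kim et al. (2011) and is not ANGSD's implementation; it assumes Hardy-Weinberg genotype priors and per-individual genotype likelihoods on a linear (not log) scale. The function name allele_freq_em and the example likelihoods are hypothetical.

```python
from typing import Sequence


def allele_freq_em(
    gls: Sequence[Sequence[float]], tol: float = 1e-10, max_iter: int = 1000
) -> float:
    """ML frequency of the focal allele at one biallelic site, by EM.

    gls[i] = (P(data_i | g=0), P(data_i | g=1), P(data_i | g=2)), where g is
    the number of focal-allele copies carried by individual i.
    """
    n = len(gls)
    p = 0.5  # starting value
    for _ in range(max_iter):
        # Hardy-Weinberg genotype priors given the current frequency
        priors = ((1 - p) ** 2, 2 * p * (1 - p), p ** 2)
        expected_copies = 0.0
        for gl in gls:
            w = [gl[g] * priors[g] for g in range(3)]
            # E-step: posterior expected number of focal-allele copies
            expected_copies += (w[1] + 2 * w[2]) / sum(w)
        p_new = expected_copies / (2 * n)  # M-step
        if abs(p_new - p) < tol:
            return p_new
        p = p_new
    return p


# Invented genotype likelihoods for four individuals at one site.
gls = [(0.90, 0.10, 0.00), (0.20, 0.70, 0.10), (0.00, 0.30, 0.70), (0.80, 0.20, 0.00)]
print(round(allele_freq_em(gls), 3))
```

Individuals with flat likelihoods, as expected at low coverage, contribute their Hardy-Weinberg expectation instead of a forced genotype call, so heterozygotes are not systematically lost as they can be with hard-threshold SNP calling.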
Analysis of genetic diversity and population structure
We studied genetic variation within populations by estimating three summary statistics with ANGSD (Korneliussen et al. 2014): nucleotide diversity π (Nei and Li 1979); the ratio of nonsynonymous to synonymous pairwise differences (πN/πS; estimated for 0-fold and 4-fold sites, respectively), which reflects the efficacy of selection and the accumulation of deleterious mutations (e.g. Chen et al. 2017); and Tajima’s D, which indicates how far the population is from mutation-drift equilibrium (Tajima 1989). Admixture analysis and principal component analysis (PCA) were conducted to further assess the genetic relationships between the study populations. ANGSD was used to estimate genetic covariances for the PCA and genotype likelihoods for the admixture analysis (both estimated for 4-fold sites). The latter was conducted with NGSadmix (Skotte et al. 2013), which estimates admixture proportions for each individual from the genotype likelihoods.
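For orientation, π, Watterson's θ and Tajima's D can all be derived from an unfolded site frequency spectrum (SFS). The sketch below uses hard SFS counts for a single region with n sampled chromosomes; it is an illustration only, whereas ANGSD works from an SFS estimated from genotype likelihoods. Both θ estimates returned are region totals and would be divided by the number of sites to obtain per-site values. The function name diversity_from_sfs is hypothetical.

```python
import math
from typing import Sequence


def diversity_from_sfs(sfs: Sequence[float], n: int) -> tuple[float, float, float]:
    """Return (pi, theta_W, Tajima's D) for an unfolded SFS.

    sfs[i - 1] = number of sites whose derived allele occurs i times among
    n sampled chromosomes (i = 1 .. n - 1). Both thetas are region totals.
    """
    if len(sfs) != n - 1:
        raise ValueError("an unfolded SFS needs n - 1 entries")
    seg = sum(sfs)  # number of segregating sites S
    pairs = n * (n - 1) / 2
    pi = sum(i * (n - i) * x for i, x in enumerate(sfs, start=1)) / pairs
    a1 = sum(1 / i for i in range(1, n))
    a2 = sum(1 / i**2 for i in range(1, n))
    theta_w = seg / a1
    # Constants of Tajima (1989) for the variance of pi - theta_W
    b1 = (n + 1) / (3 * (n - 1))
    b2 = 2 * (n**2 + n + 3) / (9 * n * (n - 1))
    c1 = b1 - 1 / a1
    c2 = b2 - (n + 2) / (a1 * n) + a2 / a1**2
    e1, e2 = c1 / a1, c2 / (a1**2 + a2)
    d = (pi - theta_w) / math.sqrt(e1 * seg + e2 * seg * (seg - 1))
    return pi, theta_w, d


# Invented SFS for n = 4 chromosomes with an excess of singletons.
print(diversity_from_sfs([30, 2, 1], n=4))
```

An SFS proportional to the neutral expectation (counts proportional to 1/i) gives π equal to Watterson's θ and D of zero, while an excess of rare variants, as in the example, gives a negative D.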
Demography simulations
Coalescent simulations based on site frequency spectra were used to estimate divergence times, migration rates and effective population sizes. Estimates involving the Jotunheimen populations J1 and J3 were inferred as part of an earlier study (Hämälä et al. 2018). Here, we conducted additional simulations for the Trollheimen populations T1 and T4 using the same method as in the earlier study. Briefly, derived site frequency spectra were estimated for 4-fold degenerate sites in ANGSD (Korneliussen et al. 2014), and the demography models were fitted to these in fastsimcoal2 (Excoffier et al. 2013). We tested four different migration models between the Norwegian populations: no migration, unidirectional migration (from 1 to 2 or from 2 to 1) and bidirectional migration. Simulations were repeated 50 times to obtain global maximum likelihood estimates of the parameters. Model selection was based on Akaike information criterion (AIC) scores. We then used 100 nonparametric bootstrap site frequency spectra to define 95% confidence intervals for the parameter estimates. For more information about the demography analysis, see Hämälä et al. (2018).
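AIC-based model choice and bootstrap intervals can be sketched as follows. This Python snippet assumes maximum likelihoods on a log10 scale (as fastsimcoal2 reports them) and, as one common choice, uses percentile intervals over the bootstrap replicates; the interval method actually used is not specified here. The likelihoods, parameter counts and bootstrap values are invented for illustration and are not results from this study.

```python
import math
from typing import Sequence


def aic(log10_lik: float, n_params: int) -> float:
    """AIC = 2k - 2 ln(L); the log10 likelihood is converted to natural log."""
    return 2 * n_params - 2 * log10_lik * math.log(10)


def percentile(values: Sequence[float], q: float) -> float:
    """Percentile by linear interpolation between order statistics (0 <= q <= 1)."""
    xs = sorted(values)
    pos = q * (len(xs) - 1)
    lo, hi = math.floor(pos), math.ceil(pos)
    return xs[lo] + (xs[hi] - xs[lo]) * (pos - lo)


# Hypothetical (log10 likelihood, number of parameters) pairs, for illustration only.
models = {
    "no_migration": (-5000.0, 4),
    "migration_1_to_2": (-4990.0, 5),
    "migration_2_to_1": (-4995.0, 5),
    "bidirectional": (-4985.0, 6),
}
best = min(models, key=lambda name: aic(*models[name]))  # lowest AIC wins

# Hypothetical estimates of one parameter from bootstrap site frequency spectra.
boot = [1.8, 2.1, 1.9, 2.4, 2.0, 2.2, 1.7, 2.3, 2.0, 1.9]
ci = (percentile(boot, 0.025), percentile(boot, 0.975))
print(best, ci)
```

The 2k penalty means that an extra migration parameter is only favoured when it raises the natural-log likelihood by more than one unit.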
Selection scan
Selected sites were detected by scanning the chromosomes for areas of unexpectedly high differentiation between the populations, a pattern indicative of directional selection (Lewontin and Krakauer 1973). The relative levels of differentiation were estimated with the FST measure of Hudson et al. (1992), $F_{ST} = 1 - H_W / H_B$, where $H_W$ and $H_B$ are the mean numbers of differences within and between populations, respectively. Here, however, we used a more specific formula, developed for the case of two populations and two alleles by Bhatia et al. (2013):

$$F_{ST} = \frac{N}{D} = \frac{(p_1 - p_2)^2 - \frac{p_1(1 - p_1)}{n_1 - 1} - \frac{p_2(1 - p_2)}{n_2 - 1}}{p_1(1 - p_2) + p_2(1 - p_1)}$$

where n_i is the sample size and p_i is the frequency of the minor allele in population i of the two populations compared. This estimator has been shown to be unbiased by unequal sample sizes and less prone to overestimating differentiation than the measures of Weir and Cockerham (1984) or Nei (1986) (Bhatia et al. 2013). The selection scan was conducted in non-overlapping windows of 50 SNPs (median length ~2 kb) to avoid biasing the estimates with unequal SNP numbers. FST for a window of n SNPs was calculated using the weighting method of Reynolds et al. (1983), summing the per-SNP numerators N_j and denominators D_j over the window before dividing:

$$F_{ST} = \frac{\sum_{j=1}^{n} N_j}{\sum_{j=1}^{n} D_j}$$

With small numbers of markers (such as within windows), this approach is often more reliable than averaging FST estimates over loci (Weir and Cockerham 1984; Weir and Hill 2002; Bhatia et al. 2013). FST can, however, be inflated by reduced within-population nucleotide diversity, caused for example by background selection or reduced recombination (Charlesworth 1998; Cruickshank and Hahn 2014). For this reason, we also estimated the absolute level of differentiation between the populations, an index commonly called dXY (Nei 1987; Cruickshank and Hahn 2014). To make it compatible with allele frequencies estimated from genotype likelihoods, dXY for a window of n SNPs was calculated by simply excluding the within-population component of FST, i.e. by averaging the between-population term:

$$d_{XY} = \frac{1}{n} \sum_{j=1}^{n} D_j = \frac{1}{n} \sum_{j=1}^{n} \left[ p_{1j}(1 - p_{2j}) + p_{2j}(1 - p_{1j}) \right]$$
This measure is insensitive to within-population levels of diversity, but it can be biased by unequal sample sizes. Therefore, to balance the shortcomings of FST and dXY, we only considered sites that were detected as outliers by both measures.
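The window statistics can be written compactly: the Bhatia et al. (2013) form of Hudson's FST per SNP, Reynolds-style weighting that sums numerators and denominators across the window, and dXY taken as the mean between-population term. This Python sketch assumes that p1 and p2 are frequencies of the same allele in both populations and that n1 and n2 are the sample sizes entering the estimator; the function names and example frequencies are hypothetical.

```python
from typing import Sequence


def hudson_terms(p1: float, p2: float, n1: int, n2: int) -> tuple[float, float]:
    """Numerator N and denominator D of the per-SNP Hudson FST estimator."""
    num = (p1 - p2) ** 2 - p1 * (1 - p1) / (n1 - 1) - p2 * (1 - p2) / (n2 - 1)
    den = p1 * (1 - p2) + p2 * (1 - p1)
    return num, den


def window_fst_dxy(
    p1s: Sequence[float], p2s: Sequence[float], n1: int, n2: int
) -> tuple[float, float]:
    """Ratio-of-sums FST and mean dXY over the SNPs of one window."""
    terms = [hudson_terms(a, b, n1, n2) for a, b in zip(p1s, p2s)]
    sum_num = sum(num for num, _ in terms)
    sum_den = sum(den for _, den in terms)
    fst = sum_num / sum_den     # sum(N_j) / sum(D_j), not a mean of ratios
    dxy = sum_den / len(terms)  # mean between-population term D_j
    return fst, dxy


# Invented frequencies of the same allele at three SNPs in two populations.
print(window_fst_dxy([0.10, 0.50, 0.90], [0.40, 0.45, 0.20], n1=18, n2=24))
```

Because the sample-size corrections are subtracted, single SNPs can have slightly negative numerators; summing before dividing keeps such SNPs from dominating a window estimate.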
A single differentiation measure, either FST or dXY, can detect localized selection, but it cannot distinguish which lineage has been the target. A recently developed method, the population branch statistic (PBS; Yi et al. 2010), overcomes this limitation by comparing differentiation estimates between two closely related populations and an outgroup. The FST and dXY values were first transformed into relative divergence times, T = –ln(1 – X), where X is the differentiation measure. PBS for population 1 was then estimated as:

$$\mathrm{PBS}_1 = \frac{T_{12} + T_{13} - T_{23}}{2}$$

where T_{ab} is the transformed divergence between populations a and b.
The obtained value quantifies the magnitude of allele frequency change in lineage 1 since its divergence from the closely related population 2 and the outgroup 3. Lineage-specific selection patterns among the focal populations were estimated by calculating PBS for the population trios J1-J3-GER and T1-T4-GER. The German population was chosen as the outgroup over the Swedish one because it has a more equal genetic distance to each of our focal populations (Figure 1A and Table 1).
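The transformation T = –ln(1 – X) and the PBS of Yi et al. (2010), (T12 + T13 – T23) / 2, amount to a few lines. In this sketch the differentiation values are assumed to lie in [0, 1), and negative values are not truncated; the function names and the example FST values for the J1-J3-GER trio are hypothetical.

```python
import math


def branch_time(x: float) -> float:
    """Transform a differentiation value X (0 <= X < 1) into T = -ln(1 - X)."""
    return -math.log(1 - x)


def pbs(x12: float, x13: float, x23: float) -> float:
    """PBS for population 1 from pairwise differentiation values.

    x12: population 1 vs. its sister population 2
    x13: population 1 vs. the outgroup 3
    x23: population 2 vs. the outgroup 3
    """
    return (branch_time(x12) + branch_time(x13) - branch_time(x23)) / 2


# Hypothetical window FST values for the trio J1 (1), J3 (2) and GER (3).
print(pbs(x12=0.35, x13=0.45, x23=0.20))
```

The same function gives PBS for the sister population by swapping roles, e.g. pbs(x12, x23, x13) for J3 in the J1-J3-GER trio.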
Outlier detection
We compared the PBS estimates against simulated samples to find sites that show higher differentiation than expected under neutrality. Neutral data were generated with coalescent models in ms (Hudson 2002), taking into account the genome-wide recombination rates and the demographic history of these populations. This approach can produce realistic approximations of the null distribution, which generally leads to fewer false positives than methods based on specific population genetic or statistical models (Lotterhos and Whitlock 2015; Hoban et al. 2016). The relevant maximum likelihood estimates from the demography models (i.e. divergence times, migration rates and effective population sizes, going back to the most recent common ancestor of these populations) were used for each population comparison. Recombination rates for sequences corresponding in size to the observed window lengths were drawn randomly from a linkage map published in an earlier study (Hämälä et al. 2017). The mutation rate was set to 7 × 10⁻⁹ per site per generation following Ossowski et al. (2010). Using the simulated data, we obtained ~50,000 neutral PBS estimates for each population, which constituted the null distributions for outlier testing. We defined p-values as the proportion of neutral estimates with a PBS value equal to or higher than the observed one. To address the multiple testing issue, p-values were transformed into false discovery rate-based q-values (Storey and Tibshirani 2003). We have implemented this PBS scan and outlier detection method in a new C program, PBScan, available at: https://github.com/thamala/PBScan
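The empirical p-values and their conversion to q-values can be sketched as below. This is not the PBScan code; the q-value function is a simplified version of the Storey and Tibshirani (2003) procedure that estimates the proportion of true null hypotheses (π0) from a single tuning value λ = 0.5, whereas the qvalue R package smooths π0 over a grid of λ values, so results can differ. The function names and toy data are hypothetical.

```python
from bisect import bisect_left
from typing import Sequence


def empirical_p(observed: float, null_sorted: Sequence[float]) -> float:
    """Share of neutral PBS values >= observed; null_sorted must be ascending."""
    n_ge = len(null_sorted) - bisect_left(null_sorted, observed)
    return n_ge / len(null_sorted)


def q_values(pvals: Sequence[float], lam: float = 0.5) -> list[float]:
    """Simplified Storey-Tibshirani q-values with a single-lambda pi0 estimate."""
    m = len(pvals)
    # pi0: estimated proportion of truly null tests, capped at 1
    pi0 = min(1.0, sum(p > lam for p in pvals) / (m * (1 - lam)))
    order = sorted(range(m), key=lambda i: pvals[i])
    q = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotone q-values
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pi0 * m * pvals[i] / rank)
        q[i] = running_min
    return q


null = sorted([0.0, 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9])  # toy null
print(empirical_p(0.85, null))
print(q_values([0.01, 0.02, 0.03, 0.6, 0.8]))
```

Sorting the simulated null once and using binary search keeps each p-value lookup logarithmic, which matters when tens of thousands of windows are compared against ~50,000 neutral values.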
For one especially interesting locus, we also examined which evolutionary scenarios may have produced the selection patterns by conducting forward simulations under different genetic models in SLiM 2 (Haller and Messer 2017). As with the neutral simulations, we used the estimated demography parameters, the mutation rate from Ossowski et al. (2010) and recombination information from Hämälä et al. (2017). Selection parameters (α = 2Nₑs = 10², 10³ and 10⁴) were based on Pennings and Hermisson (2006). Code for the SLiM 2 simulations is available at: https://github.com/thamala/SLiM2_scripts
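Because α is scaled by the effective population size, the selection coefficient in a forward simulation follows from s = α / (2Nₑ). The Nₑ value below is a hypothetical placeholder, not one of the demographic estimates, and the function name is illustrative.

```python
def selection_coefficient(alpha: float, ne: float) -> float:
    """Invert the population-scaled selection strength alpha = 2 * Ne * s."""
    return alpha / (2 * ne)


NE_HYPOTHETICAL = 10_000  # placeholder, not an estimate from this study
for alpha in (1e2, 1e3, 1e4):
    s = selection_coefficient(alpha, NE_HYPOTHETICAL)
    print(f"alpha = {alpha:g}: s = {s:g}")
```

With this placeholder Nₑ, the three α values correspond to s = 0.005, 0.05 and 0.5; a smaller Nₑ would imply proportionally stronger selection for the same α.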
Gene Ontology enrichment analysis
Windows with significant outlier status (q-value < 0.05) were annotated with SnpEff (Cingolani et al. 2012), and genes located within, or less than 5 kb from, an outlier window were included in a Gene Ontology (GO) (Ashburner et al. 2000) enrichment analysis. PANTHER tools (Mi et al. 2017) were used to detect significantly enriched terms (last accessed 19 July 2017).
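Selecting genes within, or less than 5 kb from, an outlier window is an interval-proximity query. A minimal Python sketch, assuming 1-based inclusive coordinates and a hypothetical Interval record (in practice gene coordinates would come from the genome annotation):

```python
from typing import NamedTuple


class Interval(NamedTuple):
    """Hypothetical genomic interval record (1-based, inclusive coordinates)."""
    chrom: str
    start: int
    end: int


def genes_near_windows(
    genes: dict[str, Interval], windows: list[Interval], flank: int = 5_000
) -> set[str]:
    """Names of genes overlapping, or closer than `flank` bp to, any window."""
    hits: set[str] = set()
    for name, gene in genes.items():
        for win in windows:
            if (
                gene.chrom == win.chrom
                and gene.start < win.end + flank
                and gene.end > win.start - flank
            ):
                hits.add(name)
                break
    return hits


# Invented coordinates for illustration.
windows = [Interval("scaffold_1", 10_000, 12_000)]
genes = {
    "geneA": Interval("scaffold_1", 11_000, 11_500),  # inside the window
    "geneB": Interval("scaffold_1", 16_000, 17_000),  # 4 kb downstream
    "geneC": Interval("scaffold_1", 20_000, 21_000),  # 8 kb downstream
    "geneD": Interval("scaffold_2", 11_000, 11_500),  # other scaffold
}
print(sorted(genes_near_windows(genes, windows)))
```

The nested loop is quadratic in the number of genes and windows; for genome-wide data, sorting both sets by position and sweeping, or an interval tree, would scale better.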
Acknowledgements
We thank H. Stenøien for sharing information about the A. lyrata growing sites in Norway, T. Toivainen and J. Tyrmi for sample collections and T. Mattila for providing part of the data set. We are also grateful to P. Ingvarsson and members of the Plant Genetics Group in Oulu for valuable discussions and comments on the manuscript. IT Center for Science (CSC) supplied computational resources. This work was supported by Biocenter Oulu (to T.H. and O.S.), the Eemil Aaltonen Foundation (to T.H.) and the Academy of Finland’s Research Council for Biosciences and Environment (decision 132611 to O.S.).