Ancient genomic variation underlies recent and repeated ecological adaptation

Thomas C. Nelson; William A. Cresko

doi:10.1101/167981

Abstract

Adaptation in the wild often involves the use of standing genetic variation (SGV), allowing rapid responses to selection on ecological timescales. Despite increasing documentation of evolutionarily important SGV in natural populations, we still know little about how the genetic and genomic structure and molecular evolutionary history of SGV relate to adaptation. Here, we address this knowledge gap using the threespine stickleback fish (Gasterosteus aculeatus) as a model. We demonstrate that adaptive genetic variation is structured genome-wide into distinct marine and freshwater haplogroups. This divergent variation averages six million years old, nearly twice the genome-wide average, but has been evolving over the 15-million-year history of the species. Divergent marine and freshwater genomes maintain regions of ancient ancestry that include multiple chromosomal inversions and extensive linked variation. These discoveries about ancient SGV demonstrate the intertwined nature of selection on ecological timescales and genome evolution over geological timescales.

The mode and tempo of adaptive evolution depend on the sources of genetic variation affecting fitness^1,2. While new mutation is ultimately the source of all genetic variation, recent studies of adaptation in the wild document adaptive genetic variation that was either segregating in the ancestral population as standing genetic variation (SGV)^3-5 or introgressed from a separate population or species^6,7. The use of SGV appears particularly important when dramatic responses to selection occur on ecological timescales, in dozens of generations or fewer³. When environments change rapidly, SGV can propel rapid evolution in ecologically relevant traits even in populations of long-lived organisms like Darwin’s finches⁸, monkeyflowers⁹, and threespine stickleback fish¹⁰.

Existing genetic variants have evolutionary histories that are often unknown but that may have significant impacts on subsequent adaptation^9,11. The abundance, genomic distribution, and fitness effects^10-14 of SGV are themselves products of evolution, and their unknown history raises fascinating questions for the genetics of adaptation in the wild. When did adaptive variants originally arise? How are they structured, across both geography and the genome? Which evolutionary forces shaped their distribution? And how does this evolutionary history of SGV potentially channel future evolutionary change?

Answers to these questions are critical for our understanding of the importance of SGV in nature and our ability to predict the paths available to adaptation on ecological timescales⁹. Biologists are beginning to probe evolutionary histories of SGV using genome-wide sequence variation across multiple individuals in numerous populations¹⁵, but this level of inference has been unavailable for most natural systems because of methodological limitations that remove phase information (e.g. pool-seq¹⁶) or produce very short reads (e.g. RAD-seq¹⁷). Here, we investigate the structure and evolutionary history of divergent SGV by implementing a novel haplotyping method based on restriction site-associated DNA sequencing (RAD-seq). This approach creates haplotypes of nearly 1 kb at thousands of densely sampled loci, allowing us to accurately measure sequence variation and estimate divergence times across the genome.

SGV is likely critical to adaptation in the threespine stickleback. Marine stickleback have repeatedly colonized freshwater lakes and streams^18,19, and adaptive divergence in isolated freshwater habitats is highly parallel at the phenotypic^20,21 and genomic levels^19,22 (but see Stuart et al.²³). In addition, analyses of haplotype variation at the genes eda^10,24 and atp1a1²⁴ present two clear results: separate freshwater populations share common ‘freshwater’ haplotypes that are identical-by-descent (IBD), and sequence divergence between the major marine and freshwater haplogroups suggests ancient origins, perhaps more than two million years ago in the case of eda¹⁰. While intriguing, it is not clear whether the deep evolutionary histories of these loci are outliers or representative of more widespread ancient history across the genome. To address this question, we use the new haplotype RAD-seq approach to assay genome-wide variation associated with adaptive divergence in two young freshwater ponds, which formed during the end-Pleistocene glacial retreat (c. 12,000 years ago^21,25; Fig. 1). Our results demonstrate a suite of adaptive variation structured into distinct marine and freshwater haplotypes that evolved over millions of years.

Figure 1. Stickleback sampling and RAD sequencing to measure haplotype variation.

A) Threespine stickleback sampling locations in this study. Colors represent habitat type: red: marine; blue: freshwater. B-D) We modified the original RAD-seq protocol to generate local haplotypes. Colored bars represent polymorphic sites. For a detailed description of haplotype construction, see Methods. B) Overlapping paired-end reads are anchored to PstI restriction sites. C) Paired reads mapping to each half-site are merged into contigs. Contigs mapping to the same restriction site are identified by alignment to the reference genome. D) Sequences from each half of a restriction site are phased to generate a single RAD locus.

RESULTS AND DISCUSSION

Parallel adaptation to freshwater environments has been a major theme of stickleback evolutionary history²⁶. Stereotypical morphological changes (e.g. bony armor²⁰ and craniofacial structures²⁷) presumably reflect adaptation to similar selective regimes^28,29 and are accompanied by parallel genomic divergence^19,22, which involves large regions spanning many megabases^24,30, including multiple chromosomal inversions¹⁹. The leading hypothesis for the genetics of parallel divergence in stickleback posits that distinct freshwater-adaptive haplotypes that are identical-by-descent (IBD) are shared among freshwater populations due to historical gene flow between marine and freshwater populations³⁰. To test for the presence of these haplotypes directly, we characterized the genomic architecture and evolutionary history of SGV by modifying the RAD-seq protocol³¹ to generate phased haplotypes similar in length to Sanger sequencing reads, each anchored to one of tens of thousands of PstI restriction sites spread across the genome (Fig. 1B-D). We sampled five fish (10 haploid genomes) each from Boot Lake (BL) and Rabbit Slough (RS), and four fish (8 genomes) from Bear Paw Lake (BP). After stringent data filtering (see Methods), this yielded a dataset of 57,992 RAD loci (locus = two tags representing one cut site), each spanning 694 potential variable sites, with a median of seven segregating sites per locus (range: 2-155, Suppl. Fig. 1, Suppl. Table 1). We then used these phased haplotypes to estimate genealogies at each RAD locus. By including haplotypes from all three populations in these genealogical analyses, we were able to jointly calculate population genetic statistics (F_st, π, d_xy), estimate the degree of lineage sorting within populations, and identify patterns of IBD among populations.

Supplementary Figure S1. RAD-seq effectively samples genome-wide sequence diversity.

Histograms of the distance between adjacent RAD loci (left: calculated as the distance between the centers of each restriction site) and the number of variable sites per locus show that most RAD loci were within 4 kb of their nearest neighbor and contained ≥ 7 variable sites. Means for each metric are shown as dashed vertical lines. Medians are solid lines. Each histogram is truncated to highlight the bulk of the distribution. Maximum values: distance = 455 kb; variable sites = 155.


Supplementary table 1.

Sequencing summary for threespine stickleback samples

We find that, indeed, parallel population genomic divergence in each freshwater site consistently involved haplotypes that were IBD between the two freshwater populations (Fig. 2). Background F_st between populations ranged from 0.139 to 0.226, with divergence between the freshwater populations BL and BP being highest (F_st(rs-bl) = 0.139, F_st(rs-bp) = 0.194, F_st(bl-bp) = 0.226; two-sided Mann-Whitney test for all pairwise comparisons: p ≤ 1×10⁻¹⁰). The degree and genomic distribution of pairwise F_st between the BL, BP, and RS populations were similar to those previously reported²², including marine-freshwater F_st outlier regions on chromosome 4 over a broad span in which the eda gene is embedded (orange triangle in Fig. 2A) and three regions now known to be associated with chromosomal inversions on chromosomes 1, 11, and 21 (yellow bars in Fig. 2; hereafter referred to as inv1, inv11, and inv21). The gene atp1a1 (green triangle in Fig. 2A) is contained within inv1. As expected, we found distinct haplogroups associated with marine and freshwater habitats at both eda and atp1a1 (Fig. 3, insets).
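
The Methods do not name the F_st estimator used. As a minimal sketch only, assuming Hudson's formulation (F_st = 1 - H_within / H_between, built from mean pairwise differences among aligned haplotypes), the per-locus calculation could look like the following. The function names and toy haplotypes are hypothetical, and each population needs at least two haplotypes.

```python
from itertools import combinations, product
from statistics import mean


def pairwise_diff(a: str, b: str) -> float:
    """Proportion of aligned sites at which two haplotypes differ."""
    if len(a) != len(b):
        raise ValueError("haplotypes must be aligned to equal length")
    return sum(x != y for x, y in zip(a, b)) / len(a)


def mean_within(haps: list[str]) -> float:
    """Mean pairwise difference among haplotypes from one population (needs >= 2)."""
    return mean(pairwise_diff(a, b) for a, b in combinations(haps, 2))


def mean_between(pop1: list[str], pop2: list[str]) -> float:
    """Mean pairwise difference between haplotypes drawn from two populations."""
    return mean(pairwise_diff(a, b) for a, b in product(pop1, pop2))


def hudson_fst(pop1: list[str], pop2: list[str]) -> float:
    """Hudson-style F_st = 1 - H_within / H_between for a single locus (assumed estimator)."""
    h_within = (mean_within(pop1) + mean_within(pop2)) / 2
    h_between = mean_between(pop1, pop2)
    return 1 - h_within / h_between if h_between > 0 else float("nan")


# Toy RAD locus: two marine vs. two freshwater haplotypes (hypothetical data).
marine = ["AAAA", "AAAT"]
freshwater = ["CCAA", "CCAT"]
print(round(hudson_fst(marine, freshwater), 3))  # 0.6
```

The toy populations share no haplotypes but still carry variation within each, so F_st is 0.6 rather than 1. Genome-wide summaries are often computed as a ratio of averages (summing numerators and denominators across loci) rather than an average of per-locus ratios; which aggregation was used here is not stated.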

Strikingly, this finding of habitat-specific haplogroups was not at all unique to these well-studied genes or chromosomal inversions. The two isolated freshwater populations shared IBD haplotypes within all common marine-freshwater F_st peaks even though IBD was rare elsewhere (Fig. 2B). Furthermore, at the majority (1129 of 2172, 52%) of RAD loci showing freshwater IBD, haplotypes from the marine RS population formed a separate clade. The result was a genome-wide pattern of reciprocal monophyly between marine and freshwater haplotypes. Notably, this is the same genealogical structure previously reported at eda^10,24 and atp1a1²⁴. These loci are therefore only a small part of a genome-wide suite of genetic variation with similar habitat-specific evolutionary histories, and their previously documented genealogies foreshadowed the much more extensive pattern across the genome revealed here. Hereafter, we refer collectively to this class of RAD loci as ‘divergent loci’.

Figure 2. The genealogical structure of parallel genomic divergence.

A) Genome-wide F_st for both marine-freshwater comparisons was kernel-smoothed using a normally distributed kernel with a window size of 500 kb. Inverted triangles indicate the locations of two genes known to show extensive marine-freshwater haplotype divergence, Eda and Atp1a1. Three chromosomal inversions are highlighted in yellow. B) Lineage sorting patterns were identified from maximum clade credibility trees for each RAD locus. Blue bars: haplotypes from both freshwater populations form a single monophyletic group; red: haplotypes from the marine population form a monophyletic group; black: a RAD locus is structured into reciprocally monophyletic marine and freshwater haplogroups.
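
The caption specifies a normal kernel with a 500 kb window but not how that window maps onto the kernel's standard deviation. The hypothetical sketch below assumes the window is the full width of a Gaussian truncated at ±3σ (so σ ≈ 83 kb); other conventions, such as treating 500 kb as σ itself, would produce smoother scans. The same routine would apply to the T_mrca scans in Fig. 4C.

```python
import math


def gaussian_kernel_smooth(positions, values, centers, window_bp=500_000):
    """Gaussian-weighted average of per-locus values around each center (bp).

    ASSUMPTION: window_bp is the full width of a Gaussian kernel truncated
    at +/- 3 sigma, i.e. sigma = window_bp / 6.
    """
    half_width = window_bp / 2
    sigma = half_width / 3
    smoothed = []
    for center in centers:
        weighted_sum = weight_total = 0.0
        for pos, val in zip(positions, values):
            dist = pos - center
            if abs(dist) > half_width:
                continue  # locus lies outside the truncated kernel
            weight = math.exp(-0.5 * (dist / sigma) ** 2)
            weighted_sum += weight * val
            weight_total += weight
        # NaN marks centers with no RAD loci inside the window
        smoothed.append(weighted_sum / weight_total if weight_total else math.nan)
    return smoothed
```

Weighting by distance rather than averaging within hard bins keeps the scan from jumping when a single locus crosses a bin boundary.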

Because the genealogical structure of divergence across the genome mirrors that at eda and atp1a1, we asked whether levels of sequence variation and divergence also showed consistent genomic patterns. At all RAD loci we therefore calculated π within each population and within the combined freshwater populations, as well as d_xy between marine and freshwater habitat types. Genome-wide diversity was similar across populations and habitat types (mean π_rs = 0.0032, π_bl = 0.0034, π_bp = 0.0026, π_fw = 0.0038) and comparable to previous estimates²². Likewise, genome-wide d_xy between habitat types was modest (0.0049) compared to π across all populations (π = 0.0042, two-sided Mann-Whitney test: p ≤ 1×10⁻¹⁰). Among divergent loci, however, diversity was reduced in both habitats (mean π_rs-divergent = 0.0012, π_fw-divergent = 0.0016, two-sided permutation test: p ≤ 1×10⁻⁴, Fig. 3, Suppl. Fig. 2), indicating natural selection in both habitats. Sequence divergence associated with reciprocal monophyly was nonetheless striking, averaging about 2.5 times the genome-wide mean (mean d_xy-divergent = 0.0124). This divergence spanned more than an order of magnitude (0.0013–0.0442), from substantially lower than the genome-wide average to roughly nine times the average. These findings indicate that much of the genetic variation underlying adaptive divergence was not just standing and structured by habitat, but has been segregating and accumulating far longer than the c. 12,000-year history of these freshwater populations.
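
A minimal sketch of the per-locus π and d_xy calculations, assuming Nei's per-site definitions and that sites with an ambiguous base ('N') in either haplotype are skipped pairwise. The missing-data rule, function names, and toy haplotypes are assumptions, not the pipeline used in the paper.

```python
from itertools import combinations, product


def _diff(a: str, b: str) -> tuple[int, int]:
    """Count differences and comparable sites, skipping sites with 'N' in either haplotype."""
    diffs = sites = 0
    for x, y in zip(a, b):
        if x == "N" or y == "N":
            continue
        sites += 1
        diffs += x != y
    return diffs, sites


def _mean_per_site(pairs) -> float:
    """Average the per-site difference over pairs that share at least one comparable site."""
    per_site = [d / s for d, s in pairs if s > 0]
    return sum(per_site) / len(per_site)


def nucleotide_diversity(haps: list[str]) -> float:
    """Nei's pi within one group: mean per-site difference over all haplotype pairs."""
    return _mean_per_site(_diff(a, b) for a, b in combinations(haps, 2))


def d_xy(group1: list[str], group2: list[str]) -> float:
    """Mean per-site difference between haplotypes from two groups."""
    return _mean_per_site(_diff(a, b) for a, b in product(group1, group2))


# Hypothetical 6-bp locus: marine (RS) vs. pooled freshwater (BL + BP) haplotypes.
marine = ["ACGTAC", "ACGTAT"]
freshwater = ["TCGAAC", "TCGAAC", "TCGAAT"]
print(nucleotide_diversity(marine), nucleotide_diversity(freshwater), d_xy(marine, freshwater))
```

For the toy locus the three values are about 0.167, 0.111, and 0.417: low diversity within each group alongside high divergence between them, the pattern described above for divergent loci. Applied to the reported means, 0.0124 / 0.0049 ≈ 2.5.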

Supplementary Figure S2. Relative (F_st) and absolute (d_xy) sequence divergence are positively correlated genome-wide in two instances of marine-freshwater divergence.

Points are 250 kb non-overlapping genomic windows. Left panel compares the marine Rabbit Slough population (RS) to the freshwater Boot Lake population (BL) (type-II linear model: r² = 0.314, permuted p-value [reduced major axis] = 0.01). Right panel compares RS to the freshwater Bear Paw Lake population (BP) (type-II linear model: r² = 0.311, permuted p-value [reduced major axis] = 0.01).

Figure 3. Extensive sequence divergence between marine and freshwater haplogroups accompanies reciprocal monophyly.

For each reciprocally monophyletic RAD locus, we calculated sequence variation within habitat types (π) and sequence divergence between them (d_xy). Each RAD locus is shown as a pair of lines connecting estimates of π and d_xy. Boxplots show distributions across all reciprocally monophyletic RAD loci: boxes span the lower to upper quartiles and mark the median; whiskers extend to 1.5× the interquartile range. Dashed lines are the genome-wide medians. Single RAD loci from within the transcribed regions of Eda and Atp1a1 are shown as gold and green lines, respectively, and presented as haplotype networks. Dots represent mutational steps. Circle sizes indicate the number of haplotypes and colors indicate population of origin as in Figure 1. Each network = 29 haplotypes.

These data clearly support the hypothesis of Schluter and Conte³⁰ that ancient haplotypes are ‘transported’ among freshwater populations. Much of the divergence we observed was ancient in origin, with levels of sequence divergence at some RAD loci exceeding that observed at eda (Fig. 3, gold line) and suggesting divergence at least two million years ago¹⁰. Our observation that sequence variation was consistently reduced in both habitat types emphasizes that alternative haplotypes at these loci are likely favored by selection in the marine population as well as in the freshwater populations. These alternative fitness optima, driven by different ecologies, provide a favorable landscape for the maintenance of variation^32,33, but they also create a more potent barrier to gene flow among freshwater populations if stickleback carrying freshwater-adaptive variation suffer fitness costs in the marine habitat. Conditional fitness effects through genetic interactions (e.g., dominance³⁴ or epistasis³⁵) and genotype-by-habitat interactions³⁶ could potentially extend the residence time of freshwater haplotypes in the marine habitat. Future work should consider the phenotypic effects of divergently adaptive variation in different external environments^36,37.

A steady accumulation of divergently adaptive variation between marine and freshwater stickleback genomes may also have been critical to the rapid divergence in the young pond populations we study here. We found reciprocal monophyly associated with a spectrum of sequence divergence, including a substantial fraction of divergent loci (11.0%, 124/1129) with d_xy below the genome-wide average. Thus, ongoing marine-freshwater ecological divergence has yielded continuing marine-freshwater genomic divergence. Moreover, while this younger variation is shared between the freshwater populations in this study, and localizes to genomic regions of divergence shared globally¹⁹, some adaptive variants may be distributed only locally (e.g. to southern Alaska or the eastern Pacific basin). In addition to the globally distributed suite of variation, there may also exist a substantial amount of regional variation contributing to stickleback genomic and phenotypic diversity.

Sequence divergence provides an important relative evolutionary timescale. However, to more directly compare the timescales of ecological adaptation and genomic evolution, we translated patterns of sequence variation into the time to the most recent common ancestor (T_mrca) of allelic variation, in years. To do so, we performed a de novo genome assembly of the ninespine stickleback (Pungitius pungitius), a member of the Gasterosteidae that diverged from the threespine stickleback lineage approximately 15 million years ago³⁸ (Fig. 4A, Suppl. Table 2). We then aligned our RAD dataset to this assembly and estimated gene trees for each alignment with BEAST³⁹, setting divergence to the ninespine stickleback at 15 MYA (see Methods).


Supplementary table 2.

Genome assembly statistics for Pungitius pungitius.

Figure 4. Marine-freshwater divergence has evolved over millions of years, affecting large genomic regions.

We performed Bayesian estimation of the time to the most recent common ancestor (T_mrca) of alleles at threespine stickleback RAD loci. We calibrated coalescence times within threespine stickleback by including a de novo genome assembly from the ninespine stickleback (Pungitius pungitius) and setting threespine-ninespine divergence at 15 million years ago. A) Maximum clade credibility RAD gene tree representative of the genome-wide average T_mrca. Branches within threespine are colored by population of origin. B) Kernel-smoothed densities of T_mrca distributions for all RAD loci containing a monophyletic group of threespine stickleback alleles (light gray) and those structured into reciprocally monophyletic marine and freshwater haplogroups. C) The genomic distribution of reciprocally monophyletic RAD loci (black, as in Figure 2) is associated with increased T_mrca at a genomic scale. T_mrca outlier windows (those exceeding 99.9% of permuted genomic windows) are shown as gray bars. Genome-wide T_mrca was kernel-smoothed using a normally distributed kernel with a window size of 500 kb. Inverted triangles indicate the locations of Eda and Atp1a1. Three chromosomal inversions are highlighted in yellow.

We find that the divergence of key marine and freshwater haplotypes has been ongoing for millions of years and extends back to the split with the ninespine stickleback lineage (Fig. 4B). The genome-wide mean T_mrca was 4.1 MY, and T_mrca for the vast majority of RAD loci was under 5 MY. In contrast, divergent loci averaged 6.4 MY and, remarkably, the most ancient ~10% (118 of 1129 loci) have estimated T_mrca over 10 MY. This deep genomic divergence not only underscores that the marine-freshwater transition has been occurring throughout the history of the threespine stickleback lineage, for which there is evidence in the fossil record going back 10 million years⁴⁰, but also demonstrates that at least some of the variation fueling those ancient events has persisted until the present day. In some genomic regions, then, marine and freshwater threespine stickleback are as divergent as threespine and ninespine stickleback, which are classified into separate genera.

Adaptive divergence has impacted the history of the stickleback genome as a whole (Fig. 4C). We identified 32.6 Mb, or 7.5%, of the genome as having elevated T_mrca (gray boxes in Fig. 4C; two-sided permutation test, p ≤ 0.001). Outside of the non-recombining portion of the sex chromosome (chr. 19), the oldest regions of the stickleback genome were those enriched for divergent loci. Patterns of ancient ancestry closely mirrored recent divergence in allele frequencies (Fig. 2A), and historical and contemporary marine-freshwater divergence appears to have impacted ancestry across much of the length of some chromosomes. Chromosome 4, for example, contains at least three broad peaks in T_mrca and a total of 5.9 Mb identified as genome-wide outliers (two-sided permutation test, p ≤ 0.001). This chromosome has been of particular interest because of its association with a number of phenotypes^20,41, including fitness⁴². We found that the major-effect armor plate locus eda lies in a local peak (mean T_mrca = 6.4 MYA) nested within a large region of deep ancestry spanning 8.1 Mb. Moreover, at least two other peaks distal to eda, centered at 21.4 Mb and 26.6 Mb (mean T_mrca = 6.8 MYA and 7.0 MYA, respectively), were also several million years older than the genomic average.
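
Fig. 4C describes outlier windows as those exceeding 99.9% of permuted genomic windows. One way to implement such a test is sketched below under explicit assumptions: T_mrca values are shuffled among RAD loci, windows are non-overlapping and of a hypothetical 250 kb, and null window means are pooled across all permutations. The function names and parameters are illustrative, not the authors' code.

```python
import random
from collections import defaultdict


def window_means(loci, values, window_bp):
    """Mean value per non-overlapping window; loci are (chromosome, position) pairs."""
    sums, counts = defaultdict(float), defaultdict(int)
    for (chrom, pos), val in zip(loci, values):
        key = (chrom, pos // window_bp)
        sums[key] += val
        counts[key] += 1
    return {key: sums[key] / counts[key] for key in sums}


def tmrca_outlier_windows(loci, tmrca, window_bp=250_000, n_perm=1000,
                          quantile=0.999, seed=0):
    """Flag windows whose mean T_mrca exceeds the `quantile` of a permutation null.

    The null shuffles T_mrca values among RAD loci, which keeps each window's
    locus count fixed, and pools window means across all permutations.
    """
    rng = random.Random(seed)
    observed = window_means(loci, tmrca, window_bp)
    shuffled = list(tmrca)
    null_means = []
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        null_means.extend(window_means(loci, shuffled, window_bp).values())
    null_means.sort()
    threshold = null_means[min(len(null_means) - 1, int(quantile * len(null_means)))]
    outliers = sorted(key for key, m in observed.items() if m > threshold)
    return outliers, threshold
```

Pooling null means mixes windows with different numbers of loci, so sparsely sampled windows are compared against a partly denser null; a per-window null distribution would be a stricter alternative.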

Intriguingly, genomic regions of elevated T_mrca remained outliers even after removing marine-freshwater relative divergence outliers as measured by F_st (Suppl. Fig. 3). We estimated that 7.5% of the genome had increased T_mrca even though only 1.9% of RAD loci (1129 of 57,992) were classified as divergent. When we removed these loci along with loci with extreme values of marine-freshwater F_st (F_st > 0.5), many of the regions in which they resided were still T_mrca outliers. It is possible that the remainder of this old variation is neutral with respect to fitness. However, we identified divergence outliers based on only a single axis of divergence: the marine-freshwater axis. Throughout the entire species range, populations are locally experiencing multiple axes of divergence, including lake-stream and benthic-limnetic axes⁴³, that often share a common genomic architecture^44,45. Our data may indicate underlying similarities in selection regimes. Alternatively, this co-localized ancient variation may represent the accumulation of adaptive divergence along multiple axes in the same genomic regions, whether or not the underlying adaptive variants are the same. Aspects of the genomic architecture, such as gene density or local recombination rates, may in part govern where in the genome adaptive divergence can occur^46-48. Multiple axes of divergence may therefore act synergistically to maintain genomic variation across the stickleback metapopulation.

Supplementary Figure S3. T_mrca outlier regions remain outliers after removing highly differentiated RAD loci.

Panel A is taken from Fig. 2 and shows the genomic distribution of reciprocally monophyletic (“divergent”; black bars) RAD loci. Panel B shows the distributions of T_mrca outlier regions (increased T_mrca) including all RAD loci (magenta boxes, “Y”). Below are the T_mrca outlier regions after removing divergent loci and any RAD locus with a marine-freshwater (RS vs. [BL+BP]) F_st > 0.5, which is approximately the top 7% of the F_st distribution. Panel C: Genome scans of T_mrca using all RAD loci (magenta) and excluding marine-freshwater outliers (gray).

Nevertheless, much of the ancient variation we observe may in fact itself be neutral, having been maintained by close linkage to loci under divergent selection between the marine and freshwater habitats³². Indeed, the broadest peaks of T_mrca we observe occur in genomic regions with low rates of recombination^47,49 in other stickleback populations, which would extend the size of the linked region affected by divergent selection. On ecological timescales, low recombination rates in stickleback are thought to promote divergence by making locally adapted genomic regions resistant to gene flow⁴⁷. Our results potentially extend the inferred impact of recombination rate variation on genomic variation to timescales that are 1000-fold longer, maintaining both multimillion-year-old adaptive variation and large stores of linked genetic variation. Future modeling efforts will be needed to explore the range of population genetic parameter values (e.g. selection coefficients, migration rates, and recombination rates) required to produce the extent of divergence we see here.

Lastly, our findings demonstrate that known chromosomal inversions maintain globally distributed, multilocus haplotypes. The three chromosomal inversions (inv1, inv11, and inv21; yellow bars in Fig. 4C) all showed sharp spikes in T_mrca. Genomic signatures of these inversions are distributed throughout the species range, including coastal marine-freshwater population pairs in the Pacific and Atlantic basins¹⁹ and inland lake-stream pairs in Switzerland⁴⁴. Despite our limited geographic sampling, our finding that all three of these inversions are over six million years old is further evidence of single, ancient origins of each, followed by their spread across the species range. Each inversion contained a high density of divergent RAD loci (inv1: 64% of loci divergent; inv11: 60%; inv21: 71%), but we also identified regions within these inversions in which haplotypes from marine or freshwater habitats, or both, were not monophyletic. inv1 and inv11 both contained two regions separated by loci in which neither habitat type was monophyletic; inv21, the largest of the three, contained ten such regions. Additionally, T_mrca and F_st decreased sharply to background levels outside of the inversions, demonstrating the potential for gene flow and recombination to homogenize variation in these regions. We interpret this as evidence that these inversions help maintain linkage disequilibrium among multiple divergently adaptive variants in regions susceptible to homogenization^11,50. The presence of these inversions, therefore, further supports the hypothesis that the recombinational landscape can influence where in the genome adaptive divergence can occur and emphasizes the degree to which gene flow among divergently adapted stickleback populations has impacted global genomic diversity.

CONCLUSIONS

Selection operating on two very different timescales, the ecological and the geological, has shaped genomic patterns of SGV in the threespine stickleback. Selection on ecological timescales drives phenotypic divergence in decades or millennia by sorting SGV across geography and throughout the genome^22,44,51,52. Our findings show that the persistence of this ecological diversity and local adaptation of stickleback has set the stage for long-term divergent selection and for the continual accumulation and maintenance of adaptive variation over millions of years. A number of genetic variants fueling contemporary, rapid adaptation may even have been present, and under selection, since before the threespine and ninespine stickleback lineages split. The extent to which ecological adaptation in a single population drew on haplotypes that have evolved over millions of years and persisted in multiple populations, many of which are now extinct, underscores the need to understand macroevolutionary patterns when studying microevolutionary processes, and vice versa.

METHODS

Sample collection and library preparation

Wild threespine stickleback were collected from Rabbit Slough (N 61.5595, W 149.2583), Boot Lake (N 61.7167, W 149.1167), and Bear Paw Lake (N 61.6139, W 149.7539). Rabbit Slough is an offshoot of the Knik Arm of Cook Inlet and is known to be populated by anadromous populations of stickleback that are stereotypically oceanic in phenotype and genotype^22,53. Boot Lake and Bear Paw Lake are both shallow lakes formed during the end-Pleistocene glacial retreat. Fish were collected in the summers of 2009 (Rabbit Slough), 2010 (Bear Paw Lake), and 2014 (Boot Lake) using wire minnow traps and euthanized in situ with Tricaine solution. Euthanized fish were immediately fixed in 95% ethanol and shipped to the Cresko Laboratory at the University of Oregon (Eugene, OR, USA). DNA was extracted from fin clips preserved in 95% ethanol using either Qiagen DNeasy spin column extraction kits or Ampure magnetic beads (Beckman Coulter, Inc) following manufacturer’s instructions. Yields averaged 1-2 μg DNA per extraction (~30 mg tissue). Treatment of animals followed protocols approved by the University of Oregon Institutional Animal Care and Use Committee (IACUC).

We designed our library preparation strategy to identify sufficient sequence variation for gene tree reconstruction and to simplify downstream sequence processing and analysis by taking advantage of the phase information captured by paired-end sequencing. We generated RAD libraries from these samples using the single-digest sheared RAD protocol from Baird et al. with the following specifications and adjustments: 1 μg of genomic DNA per fish was digested with the restriction enzyme PstI-HF (New England Biolabs), followed by ligation to P1 Illumina adaptors with 6 bp inline barcodes. Ligated samples were multiplexed and sheared by sonication in a Bioruptor (Diagenode). To ensure that most of our paired-end reads would overlap unambiguously and produce longer contiguous sequences, we selected a narrow fragment size range of 425-475 bp. The remainder of the protocol was per Baird et al.³¹. All fish were sequenced on an Illumina HiSeq 2500 using paired-end 250 bp sequencing reads at the University of Oregon’s Genomics and Cell Characterization Core Facility (GC3F).

Sequence preparation

Raw Illumina sequence reads were demultiplexed, cleaned, and processed primarily using the Stacks pipeline⁵⁴. Paired-end reads were demultiplexed with process_shortreads and cleaned with process_radtags using default criteria (throughout this document, names of scripts, programs, functions, and command-line arguments will appear in fixed-width font). Overlapping read pairs were then merged with fastq-join⁵⁵ (Fig. 1C). Pairs that failed to merge were removed from further analysis. In order to retain the majority of the sequence data for analysis in Stacks and still maintain adequate contig lengths, merged contigs were trimmed to 350 bp and all contigs shorter than 350 bp were discarded. We aligned these contigs to the stickleback reference genome^19,49 using bbmap with the most sensitive alignment settings (‘vslow=t’; http://jgi.doe.gov/data-and-tools/bbtools/) and used the pstacks, cstacks, and sstacks components of the Stacks pipeline, respectively, to call SNPs and haplotypes within individuals, create a catalog of RAD tags across individuals, and match tags across individuals. All data were then passed through the Stacks error correction module rxstacks to prune unlikely haplotypes. We ran the Stacks component program populations on the final dataset to remove loci genotyped in fewer than four individuals per population and to create output files for sequence analysis. We use the naming conventions of Baird et al.⁵⁶: a “RAD tag” refers to sequence generated from a single end of a restriction site, and the pair of RAD tags sequenced at a restriction site comprises a “RAD locus” (Fig. 1).
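
The length filter described above is simple but determines which merged read pairs contribute to each RAD tag. A sketch, assuming merged contigs are held as (id, sequence, quality) tuples and that trimming keeps the restriction-site end (the 5' end of read 1); both are assumptions about the data layout, and the function name is hypothetical.

```python
def trim_merged_contigs(contigs, length=350):
    """Keep merged read pairs of at least `length` bp, trimmed to exactly `length` bp.

    contigs: iterable of (read_id, sequence, quality) tuples.
    ASSUMPTION: trimming keeps the 5' end, i.e. the end carrying the PstI
    restriction-site remnant from read 1, so contigs stay anchored at the cut site.
    """
    kept = []
    for read_id, seq, qual in contigs:
        if len(seq) < length:
            continue  # shorter than the target length: discarded
        kept.append((read_id, seq[:length], qual[:length]))
    return kept
```

Fixing every contig to the same length means all RAD tags at a cut site cover the same stretch of sequence, which simplifies building stacks and comparing haplotypes across individuals.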

We used the program PHASE⁵⁷ to phase pairs of RAD tags originating from the same restriction site. We coded haplotypes present at each RAD tag, which often contain multiple SNPs, into multiallelic genotypes. This simplified the phasing input and reduced computing time. Custom Python scripts automated this process and are included as supplementary files. We required that each individual had at least one sequenced haplotype at each tag for phasing to be attempted. If a sample had called genotypes at only one tag in the pair, the sample was removed from further analysis of that locus. The resultant phased haplotypes were used to generate sequence alignments for import into BEAST.

We recovered a total of 236,787 RAD tags after filtering, mapping to 151,813 PstI restriction sites. At 84,974 restriction sites, we recovered and successfully phased adjacent RAD tags (169,948 RAD tags) into single RAD loci. We retained these 84,974 RAD loci for our analysis.

Ninespine stickleback genome assembly

In order to estimate the T_mrca of threespine stickleback RAD alleles, we used the ninespine stickleback (Pungitius pungitius) as an outgroup (Fig. 4A). RAD sequence analysis, however, relies on the presence of homologous restriction sites among sampled individuals and results in null alleles when mutations occur within a restriction site⁵⁸. Because this probability increases with greater evolutionary distance among sampled sequences, we used RAD-seq only to estimate sequence variation within the threespine stickleback. We then generated a contig-level de novo genome assembly from a single ninespine stickleback individual from St. Lawrence Island, Alaska (collected by J. Postlethwait) using DISCOVAR de novo (https://software.broadinstitute.org/software/discovar). We used this single ninespine stickleback haplotype to estimate threespine-ninespine sequence divergence and to time-calibrate coalescence times within the threespine stickleback. DISCOVAR de novo requires a single shotgun library of paired-end 250-bp sequence reads from short-insert-length DNA fragments. High-molecular-weight genomic DNA was extracted from an ethanol-preserved fin clip by proteinase K digestion followed by DNA extraction with Ampure magnetic beads. Purified genomic DNA was mechanically sheared by sonication and size selected to a range of 200-800 bp by gel electrophoresis and extraction, following the recommendations for de novo assembly with DISCOVAR de novo (https://software.broadinstitute.org/software/discovar/blog). This library was sequenced on a single lane of an Illumina HiSeq2500 at the University of Oregon’s Genomics and Cell Characterization Core Facility (GC3F: https://gc3f.uoregon.edu/).
Raw sequence read pairs were first quality filtered and cleaned of adaptor sequence contamination using the program process_shortreads, which is included in the Stacks analysis pipeline⁵⁹. We then assembled the draft ninespine stickleback genome using DISCOVAR de novo on the University of Oregon’s Applied Computational Instrument for Scientific Synthesis (ACISS: http://aciss-computing.uoregon.edu).

Alignment of RAD tags to the ninespine assembly

We incorporated the single ninespine stickleback haplotype into our sequence analyses by aligning one phased threespine stickleback RAD haplotype from each locus to the ninespine genome assembly. For the 59,254 RAD loci that aligned uniquely, we used a custom Python script to parse the output BAM file⁶⁰ and reconstruct the ninespine haplotype from the query sequence and alignment fields. The final dataset consists of 57,992 RAD loci that mapped to the 21 threespine stickleback chromosomes and aligned uniquely to the ninespine assembly.
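The custom script itself is not shown here. The following is a minimal sketch of one way to recover the reference (ninespine) bases from BAM alignment fields alone, assuming each record carries an MD tag alongside its CIGAR string (the SAM specification defines both). The function names are hypothetical. The output is projected onto query (threespine) coordinates: mismatched columns take the reference base from the MD tag, query insertions become '-', soft-clipped bases become 'N', and reference-only deletions are dropped because they have no threespine position. This design choice is an assumption, not necessarily what the authors did.

```python
import re

# One CIGAR operation: a length followed by an operation code (SAM spec).
CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")
# One MD-tag token: a run of matches, a deletion (^ + reference bases), or one mismatch.
MD_TOKEN = re.compile(r"(\d+)|\^([A-Za-z]+)|([A-Za-z])")


def md_aligned_reference(md):
    """Expand an MD tag into one entry per aligned (M/=/X) column.

    None means the reference base equals the query base; otherwise the entry
    is the mismatching reference base. Deleted reference bases are skipped.
    """
    columns = []
    for matches, _deleted, mismatch in MD_TOKEN.findall(md):
        if matches:
            columns.extend([None] * int(matches))
        elif mismatch:
            columns.append(mismatch.upper())
    return columns


def reconstruct_outgroup(query_seq, cigar, md):
    """Return the reference haplotype with one character per query base."""
    ref_cols = md_aligned_reference(md)
    out = []
    qpos = 0  # index into query_seq
    rcol = 0  # index into ref_cols
    for length, op in CIGAR_OP.findall(cigar):
        n = int(length)
        if op in "M=X":
            for _ in range(n):
                ref_base = ref_cols[rcol]
                out.append(query_seq[qpos] if ref_base is None else ref_base)
                qpos += 1
                rcol += 1
        elif op == "I":
            out.append("-" * n)  # bases present in the query only
            qpos += n
        elif op == "S":
            out.append("N" * n)  # soft-clipped, unaligned bases
            qpos += n
        # D and N consume reference only; H and P consume neither.
    if rcol != len(ref_cols) or qpos != len(query_seq):
        raise ValueError("CIGAR, MD tag and query sequence are inconsistent")
    return "".join(out)
```

With pysam, the three inputs would correspond to `AlignedSegment.query_sequence`, `AlignedSegment.cigarstring` and `AlignedSegment.get_tag("MD")`. Note that BAM stores SEQ in the reference's forward orientation, so reverse-strand alignments yield the reverse complement of the original RAD haplotype.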

Lineage sorting and time to the most recent common ancestor

Allelic divergence can arise through multiple modes of lineage sorting during adaptation. To identify patterns of lineage sorting associated with freshwater colonization, we analyzed gene tree topologies at all RAD loci using BEAST v. 1.7^39,61. We used the same parameters and priors for BEAST analyses across all RAD loci. Markov chain Monte Carlo (MCMC) runs of 1,000,000 states were specified, with trees logged every 100 states. We used a coalescent tree prior and the GTR+Γ substitution model with four rate categories and uniform priors on all substitution rates. We identified evidence of lineage sorting by selecting the maximum clade credibility (MCC) tree for each RAD locus with the program treeannotator and testing monophyly with the is.monophyletic() function in the R package ‘ape’⁶². For each MCC tree, we determined whether tips originating from marine (RS) or freshwater (BL+BP) populations formed monophyletic clades.
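The monophyly test can be illustrated with a small rooted-tree check. This is a plain-Python sketch, not ape's is.monophyletic(). It assumes a rooted tree written as nested tuples with string tip labels; the labels (RS1, BL1, NINE, and so on) are hypothetical. A set of tips is monophyletic when some node's full set of descendant tips equals exactly that set.

```python
def clade_tip_sets(tree):
    """Return the set of descendant tips for every node (tips, internal nodes, root)."""
    sets = []

    def walk(node):
        if isinstance(node, str):  # a tip
            tips = frozenset([node])
        else:                      # an internal node: union of its children's tips
            tips = frozenset().union(*(walk(child) for child in node))
        sets.append(tips)
        return tips

    walk(tree)
    return sets


def is_monophyletic(tree, tips):
    """True if `tips` are exactly the descendants of one node in the rooted tree."""
    return frozenset(tips) in set(clade_tip_sets(tree))


# Hypothetical gene trees: marine (RS) and freshwater (BL, BP) tips, ninespine outgroup
sorted_tree = ((("RS1", "RS2"), ("BL1", "BP1")), "NINE")
mixed_tree = ((("RS1", "BL1"), ("RS2", "BP1")), "NINE")
print(is_monophyletic(sorted_tree, ["RS1", "RS2"]), is_monophyletic(mixed_tree, ["RS1", "RS2"]))
```

The same check also expresses the outgroup requirement described in the next paragraph: all threespine tips together must form a clade that excludes the ninespine tip.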

To convert node ages estimated in BEAST into divergence times in years, we assumed a 15-million-year divergence time between threespine and ninespine stickleback at each RAD locus³⁸. The T_mrca of all alleles in each gene tree was set at 15 Mya, and each node age of interest was converted into years relative to the total height of the tree. Additionally, to use the ninespine stickleback as an outgroup, we required that threespine stickleback haplotypes at a RAD locus be monophyletic to the exclusion of the ninespine haplotype. This requirement reduced our analysis to 49,672 RAD loci for the analyses in Fig. 4 of the main text. RAD loci not showing this pattern of lineage sorting showed no evidence of a genome-wide correlation with marine-freshwater divergence and thus do not affect the conclusions of the main text. We used medians of the posterior distributions as point estimates of T_mrca for each RAD locus. Because any single RAD locus carries limited information, and because the stochasticity of the coalescent process means that the true T_mrca at any locus likely differs from the 15 My calibration^63-65, we do not rely heavily on T_mrca estimates at individual RAD loci. Rather, we use these estimates to understand broad patterns of ancestry throughout the threespine stickleback genome, both along chromosomes and genome-wide.
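The rescaling described above amounts to a proportion: a node's height divided by the root height, multiplied by the 15 My calibration. The sketch below assumes a hypothetical input format, with one (threespine T_mrca height, root height) pair per logged posterior tree. It also assumes that the rescaling is applied to each sample before the median is taken. The text specifies posterior medians but not the order of these two steps.

```python
from statistics import median

CALIBRATION_YEARS = 15e6  # assumed threespine-ninespine divergence (15 My)


def node_age_years(node_height, root_height, calibration=CALIBRATION_YEARS):
    """Rescale a node height to years, treating the root of the gene tree
    (the coalescence of all alleles, ninespine included) as `calibration` years old."""
    return node_height / root_height * calibration


def tmrca_point_estimate(posterior_samples, calibration=CALIBRATION_YEARS):
    """Median of the rescaled threespine T_mrca over posterior samples.

    posterior_samples: iterable of (threespine_mrca_height, root_height) pairs,
    one per logged tree (hypothetical format).
    """
    return median(node_age_years(h, root, calibration) for h, root in posterior_samples)


# Three hypothetical posterior samples, heights in substitutions per site
samples = [(0.02, 0.05), (0.01, 0.05), (0.03, 0.05)]
print(tmrca_point_estimate(samples))
```

Because the calibration is applied tree by tree, differences in substitution rate among loci cancel out. The trade-off is that any locus-specific deviation of the true threespine-ninespine coalescence from 15 My carries straight through into the estimate.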

We identified T_mrca outlier genomic regions by permuting T_mrca estimates and kernel smoothing their genomic distribution, using the same window sizes as those presented in the main text. Windows in which the observed T_mrca value exceeded 99.9% of permuted windows were considered outliers. This method controls for both the local density of RAD loci (poorly sampled regions have wider confidence bands) and the window size.
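The following sketch shows one way to implement this kind of permutation test. It rests on several assumptions not stated in the text: a Gaussian kernel, shuffling T_mrca values among fixed locus positions on one chromosome, and the hypothetical `bandwidth` and window-center parameters. Keeping positions fixed preserves the local density of RAD loci in every permutation. As a result, sparsely sampled windows vary more across permutations and need a larger observed value to be flagged.

```python
import math
import random


def kernel_smooth(positions, values, centers, bandwidth):
    """Gaussian-kernel weighted mean of `values` at each window center."""
    smoothed = []
    for c in centers:
        weights = [math.exp(-0.5 * ((p - c) / bandwidth) ** 2) for p in positions]
        total = sum(weights)
        if total == 0:
            smoothed.append(float("nan"))  # no RAD loci near this window
        else:
            smoothed.append(sum(w * v for w, v in zip(weights, values)) / total)
    return smoothed


def outlier_windows(positions, values, centers, bandwidth,
                    n_perm=1000, quantile=0.999, seed=1):
    """Flag windows whose smoothed T_mrca exceeds `quantile` of permuted windows."""
    rng = random.Random(seed)
    observed = kernel_smooth(positions, values, centers, bandwidth)
    exceed = [0] * len(centers)  # permutations each observed value beats
    shuffled = list(values)
    for _ in range(n_perm):
        rng.shuffle(shuffled)  # reassign T_mrca values to the same positions
        permuted = kernel_smooth(positions, shuffled, centers, bandwidth)
        for i, (obs, perm) in enumerate(zip(observed, permuted)):
            if obs > perm:
                exceed[i] += 1
    return [count / n_perm >= quantile for count in exceed]


# Hypothetical chromosome: loci every 1 kb, one 10-kb block of old T_mrca (My)
positions = [i * 1000 for i in range(200)]
values = [15.0 if 100 <= i < 110 else 3.0 for i in range(200)]
flags = outlier_windows(positions, values, [50_000, 104_500],
                        bandwidth=5_000, n_perm=200, quantile=0.99)
print(flags)
```

Windows without any nearby loci return NaN, and this sketch never flags them.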

Sequence diversity and haplotype networks

We quantified sequence diversity within and among populations and sequence divergence between populations using R (R Core Team⁶⁶). We used the R package ‘ape’⁶² to compute pairwise distance matrices for all alleles at each RAD locus. From these matrices we calculated the average pairwise nucleotide distance, π, within and among populations, and d_xy, the average pairwise distance computed using only between-population comparisons⁶⁷. We also calculated the haplotype-based F_st of Hudson et al.⁶⁸ as implemented in the R package ‘PopGenome’⁶⁹. We used permutation tests written in R to identify differences in within- and between-habitat variation at divergent RAD loci relative to the genome-wide distributions. Mann-Whitney-Wilcoxon tests implemented in R were used to identify differences in genome-wide diversity among populations and habitat types.
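The definitions of π and d_xy above can be written out directly. This plain-Python sketch is not the 'ape' or 'PopGenome' code. It uses p-distances (the proportion of differing sites) and skips gap and 'N' sites, both of which are assumptions. The F_st shown is the sequence-based Hudson form, 1 − H_w/H_b, where H_w is the mean within-population π and H_b is d_xy. The paper instead used PopGenome's haplotype-based statistic, so treat this as an analogue. The sequences are hypothetical.

```python
from itertools import combinations, product


def p_distance(a, b):
    """Proportion of differing sites between two aligned sequences.
    Sites with a gap ('-') or ambiguous base ('N') in either sequence are skipped."""
    sites = [(x, y) for x, y in zip(a, b) if x not in "-N" and y not in "-N"]
    if not sites:
        return float("nan")
    return sum(x != y for x, y in sites) / len(sites)


def mean(xs):
    xs = list(xs)
    return sum(xs) / len(xs)


def pi_within(seqs):
    """Average pairwise distance among sequences from one population (pi)."""
    return mean(p_distance(a, b) for a, b in combinations(seqs, 2))


def d_xy(pop1, pop2):
    """Average pairwise distance using only between-population pairs."""
    return mean(p_distance(a, b) for a, b in product(pop1, pop2))


def hudson_fst(pop1, pop2):
    """F_st = 1 - H_w / H_b, with H_w the mean of the two within-population pi values."""
    h_w = (pi_within(pop1) + pi_within(pop2)) / 2
    return 1 - h_w / d_xy(pop1, pop2)


# Hypothetical alleles at one RAD locus
marine = ["ACGTACGTAC", "ACGTACGTAA"]
freshwater = ["TCGTACGTTC", "TCGTACGTTA"]
print(pi_within(marine), d_xy(marine, freshwater), hudson_fst(marine, freshwater))
```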

We constructed haplotype networks of the RAD loci at eda and atp1a1 using the infinite sites model with the function haploNet() in the R package ‘pegas’⁷⁰. The atp1a1 network was constructed from a RAD locus spanning exon 15 of atp1a1 and including portions of introns 14 and 15 (chr1:21,726,729-21,727,381 [BROAD S1, v89]; chr1:26,258,117-26,257,465 [re-scaffolding from Glazer et al.⁴⁹]). The eda network spans exon 2 and portions of introns 1 and 3 of eda (chr4:12,808,396-12,809,030).
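Under the infinite sites model, each differing site between two haplotypes counts as one mutational step. The sketch below uses a simplified stand-in for a haplotype network, not pegas's haploNet() algorithm. It collapses identical alleles into haplotypes with copy counts, then joins the haplotypes with a minimum spanning tree (Prim's algorithm) on Hamming distance. The input sequences are hypothetical.

```python
from collections import Counter


def hamming(a, b):
    """Number of differing sites; under infinite sites, one site = one mutation."""
    return sum(x != y for x, y in zip(a, b))


def haplotype_mst(seqs):
    """Collapse identical sequences into haplotypes and join them with a
    minimum spanning tree on Hamming distance (Prim's algorithm)."""
    counts = Counter(seqs)  # haplotype -> number of copies (node size)
    haplotypes = sorted(counts)
    if not haplotypes:
        return counts, []
    in_tree = {haplotypes[0]}
    edges = []
    while len(in_tree) < len(haplotypes):
        # Shortest link from the current tree to a haplotype not yet in it;
        # comparing (distance, a, b) tuples breaks ties deterministically.
        dist, a, b = min(
            (hamming(a, b), a, b)
            for a in in_tree
            for b in haplotypes
            if b not in in_tree
        )
        edges.append((a, b, dist))
        in_tree.add(b)
    return counts, edges


counts, edges = haplotype_mst(["AAAA", "AAAA", "AAAT", "AATT", "TTTT"])
print(counts, edges)
```

A spanning tree shows only one shortest connection path. Full network methods can also draw alternative, equally short links (reticulations), which this sketch omits.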

Code availability

Scripts used to phase RAD tags, summarize gene trees, calculate population genetic statistics, and produce the figures and statistics presented in the paper are available at https://github.com/thomnelson/ancient-divergence. Scripts for processing raw sequence data are available from the authors upon request.

DATA AVAILABILITY

Raw sequence data supporting these findings are available on the Sequence Read Archive at PRJNAXXXXXX. The final datasets needed to reproduce the figures and statistics presented in the paper are available at https://github.com/thomnelson/ancient-divergence.

AUTHOR CONTRIBUTIONS

TCN and WAC conceived of the project and designed sampling, sequencing, and analysis. TCN prepared sequencing libraries, wrote software, and performed data analysis. TCN and WAC wrote the paper.

COMPETING FINANCIAL INTERESTS

The authors declare no competing financial interests.

ACKNOWLEDGEMENTS

We thank P. Phillips, M. Streisfeld, J. Postlethwait, and K. Sterner for valuable input and lively discussion throughout this project. We also thank K. Alligood, E. Beck, S. Bassham, M. Chase, M. Currey, M. Hahn, L. Fishman, C. Small, S. Stankowski, J. Willis, two anonymous reviewers, and members of the Cresko Lab and the Institute of Ecology and Evolution for advice and comments on previous versions of this manuscript. J. Postlethwait graciously donated ninespine stickleback tissue, collected under award XXXXXXXX. We acknowledge National Science Foundation awards NSF DEB 1501423 (WAC and TCN), NSF DEB 0949053 (WAC), and National Institutes of Health award NIH T32GM007413 (TCN).

LITERATURE CITED

[1] Wright, S. The roles of mutation, inbreeding, crossbreeding and selection in evolution. Proceedings of the Sixth International Congress on Genetics 1, 356–366 (1932).
[2] Orr, H. A. The genetic theory of adaptation: a brief history. Nature Reviews Genetics 6, 119–127, doi:10.1038/nrg1523 (2005).
[3] Barrett, R. D. H. & Schluter, D. Adaptation from standing genetic variation. Trends in Ecology and Evolution 23, 38–44, doi:10.1016/j.tree.2007.09.008 (2008).
[4] Domingues, V. S. et al. Evidence of adaptation from ancestral variation in young populations of beach mice. Evolution 66, 3209–3223, doi:10.1111/j.1558-5646.2012.01669.x (2012).
[5] Schrider, D. R. & Kern, A. D. Soft sweeps are the dominant mode of adaptation in the human genome. Molecular Biology and Evolution, doi:10.1093/molbev/msx154 (2017).
[6] Huerta-Sánchez, E. et al. Altitude adaptation in Tibetans caused by introgression of Denisovan-like DNA. Nature 512, 194–197, doi:10.1038/nature13408 (2014).
[7] Fontaine, M. C. et al. Mosquito genomics. Extensive introgression in a malaria vector species complex revealed by phylogenomics. Science 347, 1258524, doi:10.1126/science.1258524 (2015).
[8] Grant, P. R. & Grant, B. R. Unpredictable evolution in a 30-year study of Darwin’s finches. Science 296, 707–711, doi:10.1126/science.1070315 (2002).
[9] Wright, K. M., et al. Indirect Evolution of Hybrid Lethality Due to Linkage with Selected Locus in Mimulus guttatus. PLoS Biology 11, doi:10.1371/journal.pbio.1001497 (2013).
[10] Colosimo, P. F. et al. Widespread parallel evolution in sticklebacks by repeated fixation of Ectodysplasin alleles. Science 307, 1928–1933, doi:10.1126/science.1107239 (2005).
[11] Kirkpatrick, M. & Barton, N. Chromosome inversions, local adaptation and speciation. Genetics 173, 419–434, doi:10.1534/genetics.105.047985 (2006).
[12] Charlesworth, B., Morgan, M. T. & Charlesworth, D. The effect of deleterious mutations on neutral molecular variation. Genetics 134, 1289–1303 (1993).
[13] Linnen, C. R., et al. On the origin and spread of an adaptive allele in deer mice. Science 325, 1095–1098, doi:10.1126/science.1175826 (2009).
[14] Stankowski, S. & Streisfeld, M. A. Introgressive hybridization facilitates adaptive divergence in a recent radiation of monkeyflowers. Proceedings of the Royal Society B: Biological Sciences 282, 20151666, doi:10.1098/rspb.2015.1666 (2015).
[15] Pease, J. B., et al. Phylogenomics reveals three sources of adaptive variation during a rapid radiation. PLoS Biology 14, e1002379, doi:10.1371/journal.pbio.1002379 (2016).
[16] Schlotterer, C., et al. Sequencing pools of individuals - mining genome-wide polymorphism data without big funding. Nature Reviews Genetics 15, 749–763, doi:10.1038/nrg3803 (2014).
[17] Davey, J. W. et al. Genome-wide genetic marker discovery and genotyping using next-generation sequencing. Nature Reviews Genetics 12, 499–510, doi:10.1038/nrg3012 (2011).
[18] Bell, M. A. & Foster, S. A. in The Evolutionary Biology of the Threespine Stickleback (eds M. A. Bell & S. A. Foster) Ch. 1, 1–27 (Oxford University Press, 1994).
[19] Jones, F. C. et al. The genomic basis of adaptive evolution in threespine sticklebacks. Nature 484, 55–61, doi:10.1038/nature10944 (2012).
[20] Colosimo, P. F. et al. The genetic architecture of parallel armor plate reduction in threespine sticklebacks. PLoS Biology 2, 635–641, doi:10.1371/journal.pbio.0020109 (2004).
[21] Cresko, W. A. et al. Parallel genetic basis for repeated evolution of armor loss in Alaskan threespine stickleback populations. Proc Natl Acad Sci U S A 101, 6050–6055, doi:10.1073/pnas.0308479101 (2004).
[22] Hohenlohe, P. A. et al. Population genomics of parallel adaptation in threespine stickleback using sequenced RAD tags. Plos Genet 6, e1000862, doi:10.1371/journal.pgen.1000862 (2010).
[23] Stuart, Y. E. et al. Contrasting effects of environment and genetics generate a continuum of parallel evolution. Nature Ecology & Evolution 1, 0158, doi:10.1038/s41559-017-0158 (2017).
[24] Roesti, M., et al. The genomic signature of parallel adaptation from shared genetic variation. Mol Ecol 23, 3944–3956, doi:10.1111/mec.12720 (2014).
[25] Francis, R. C., et al. Historical and ecological sources of variation among lake populations of threespine sticklebacks, Gasterosteus aculeatus, near Cook Inlet, Alaska. Canadian Journal of Zoology 64, 2257–2265 (1986).
[26] Bell, M. A. & Foster, S. A. in The Evolutionary Biology of the Threespine Stickleback (eds M. A. Bell & S. A. Foster) Ch. 16, 472–486 (Oxford University Press, 1994).
[27] Kimmel, C. B. et al. Evolution and development of facial bone morphology in threespine sticklebacks. Proc Natl Acad Sci U S A 102, 5791–5796, doi:10.1073/pnas.0408533102 (2005).
[28] Reimchen, T. E. in The Evolutionary Biology of the Threespine Stickleback (eds M. A. Bell & S. A. Foster) Ch. 9, 240–276 (Oxford University Press, 1994).
[29] Arnegard, M. E. et al. Genetics of ecological divergence during speciation. Nature 511, 307–311, doi:10.1038/nature13301 (2014).
[30] Schluter, D. & Conte, G. L. Genetics and ecological speciation. Proc Natl Acad Sci U S A 106 Suppl 1, 9955–9962, doi:10.1073/pnas.0901264106 (2009).
[31] Baird, N. A. et al. Rapid SNP discovery and genetic mapping using sequenced RAD markers. Plos One 3, e3376, doi:10.1371/journal.pone.0003376 (2008).
[32] Charlesworth, B., Nordborg, M. & Charlesworth, D. The effects of local selection, balanced polymorphism and background selection on equilibrium patterns of genetic diversity in subdivided populations. Genetics Research 70, 155–174, doi:10.1017/S0016672397002954 (1997).
[33] Lenormand, T. Gene flow and the limits to natural selection. Trends Ecol Evol 17, 183–189, doi:10.1016/S0169-5347(02)02497-7 (2002).
[34] Otto, S. P. & Bourguet, D. Balanced polymorphisms and the evolution of dominance. American Naturalist 153, 561–574, doi:10.1086/303204 (1999).
[35] Phillips, P. C. Epistasis–the essential role of gene interactions in the structure and evolution of genetic systems. Nature Reviews Genetics 9, 855–867, doi:10.1038/nrg2452 (2008).
[36] McGuigan, K., et al. Cryptic genetic variation and body size evolution in threespine stickleback. Evolution 65, 1203–1211, doi:10.1111/j.1558-5646.2010.01195.x (2011).
[37] McCairns, R. J. S. & Bernatchez, L. Plasticity and heritability of morphological variation within and between parapatric stickleback demes. Journal of Evolutionary Biology 25, 1097–1112, doi:10.1111/j.1420-9101.2012.02496.x (2012).
[38] Aldenhoven, J. T., et al. Phylogeography of ninespine sticklebacks (Pungitius pungitius) in North America: glacial refugia and the origins of adaptive traits. Mol Ecol 19, 4061–4076, doi:10.1111/j.1365-294X.2010.04801.x (2010).
[39] Drummond, A. J., Suchard, M. A., Xie, D. & Rambaut, A. Bayesian phylogenetics with BEAUti and the BEAST 1.7. Mol Biol Evol 29, 1969–1973, doi:10.1093/molbev/mss075 (2012).
[40] Bell, M. A., Baumgartner, J. V. & Olson, E. C. Patterns of temporal change in single morphological characters of a Miocene stickleback fish. Paleobiology 11, 258–271 (1985).
[41] Miller, C. T. et al. Modular skeletal evolution in sticklebacks is controlled by additive and clustered quantitative trait loci. Genetics 197, 405–420, doi:10.1534/genetics.114.162420 (2014).
[42] Barrett, R. D., Rogers, S. M. & Schluter, D. Natural selection on a major armor gene in threespine stickleback. Science 322, 255–257, doi:10.1126/science.1159978 (2008).
[43] McKinnon, J. S. & Rundle, H. D. Speciation in nature: the threespine stickleback model systems. Trends in Ecology and Evolution 17, 480–488 (2002).
[44] Roesti, M., Kueng, B., Moser, D. & Berner, D. The genomics of ecological vicariance in threespine stickleback fish. Nature Communications 6, 8767, doi:10.1038/ncomms9767 (2015).
[45] Deagle, B. E. et al. Population genomics of parallel phenotypic evolution in stickleback across stream-lake ecological transitions. Proceedings of the Royal Society B: Biological Sciences 279, 1277–1286, doi:10.1098/rspb.2011.1552 (2012).
[46] Samuk, K. et al. Gene flow and selection interact to promote adaptive divergence in regions of low recombination. Molecular Ecology, doi:10.1111/mec.14226 (2017).
[47] Roesti, M., Moser, D. & Berner, D. Recombination in the threespine stickleback genome - Patterns and consequences. Molecular Ecology 22, 3014–3027, doi:10.1111/mec.12322 (2013).
[48] Aeschbacher, S., et al. Population-genomic inference of the strength and timing of selection against gene flow. Proceedings of the National Academy of Sciences USA 114, 7061–7066, doi:10.1073/pnas.1616755114 (2017).
[49] Glazer, A. M., et al. Genome assembly improvement and mapping of convergently evolved skeletal traits in sticklebacks with genotyping-by-sequencing. G3-Genes Genomes Genetics 5, 1463–1472, doi:10.1534/g3.115.017905 (2015).
[50] Guerrero, R. F., Rousset, F. & Kirkpatrick, M. Coalescent patterns for chromosomal inversions in divergent populations. Philosophical Transactions of the Royal Society B: Biological Sciences 367, 430–438, doi:10.1098/rstb.2011.0246 (2012).
[51] Hendry, A. P., Taylor, E. B. & McPhail, J. D. Adaptive divergence and the balance between selection and gene flow: lake and stream stickleback in the Misty system. Evolution 56, 1199–1216, doi:10.1554/0014-3820(2002)056[1199:ADATBB]2.0.CO;2 (2002).
[52] Lescak, E. A. et al. Evolution of stickleback in 50 years on earthquake-uplifted islands. Proceedings of the National Academy of Sciences USA 112, E7204–7212, doi:10.1073/pnas.1512020112 (2015).
[53] Cresko, W. A. et al. Parallel genetic basis for repeated evolution of armor loss in Alaskan threespine stickleback populations. Proceedings of the National Academy of Sciences USA 101, 6050–6055, doi:10.1073/pnas.0308479101 (2004).
[54] Catchen, J. M., et al. Stacks: building and genotyping Loci de novo from short-read sequences. G3 - Genes Genomes Genetics 1, 171–182, doi:10.1534/g3.111.000240 (2011).
[55] Aronesty, E. ea-utils: Command-line tools for processing biological sequencing data. (2011).
[56] Baird, N. A. et al. Rapid SNP discovery and genetic mapping using sequenced RAD markers. Plos One 3, 1–7, doi:10.1371/journal.pone.0003376 (2008).
[57] Stephens, M., Smith, N. J. & Donnelly, P. A new statistical method for haplotype reconstruction from population data. American Journal of Human Genetics 68, 978–989, doi:10.1086/319501 (2001).
[58] Arnold, B., et al. RADseq underestimates diversity and introduces genealogical biases due to nonrandom haplotype sampling. Mol Ecol 22, 3179–3190, doi:10.1111/mec.12276 (2013).
[59] Catchen, J., et al. Stacks: An analysis tool set for population genomics. Molecular Ecology 22, 3124–3140, doi:10.1111/mec.12354 (2013).
[60] Li, H. et al. The Sequence alignment/map (SAM) format and SAMtools. Bioinformatics 25, 2078–2079 (2009).
[61] Drummond, A. J. & Rambaut, A. BEAST: Bayesian evolutionary analysis by sampling trees. Bmc Evol Biol 7, 214, doi:10.1186/1471-2148-7-214 (2007).
[62] Paradis, E., Claude, J. & Strimmer, K. APE: Analyses of phylogenetics and evolution in R language. Bioinformatics 20, 289–290, doi:10.1093/bioinformatics/btg412 (2004).
[63] Kingman, J. F. C. The coalescent. Stochastic Processes and their Applications 13, 235–248, doi:10.1016/0304-4149(82)90011-4 (1982).
[64] Kingman, J. F. C. On the genealogy of large populations. Journal of Applied Probability 19, 27–43 (1982).
[65] Tajima, F. Evolutionary relationship of DNA-sequences in finite populations. Genetics 105, 437–460 (1983).
[66] R Core Team. R Foundation for Statistical Computing, Vienna, Austria (2016).
[67] Nei, M. Molecular Evolutionary Genetics. (Columbia University Press, 1987).
[68] Hudson, R. R., Slatkin, M. & Maddison, W. P. Estimation of levels of gene flow from DNA-sequence data. Genetics 132, 583–589 (1992).
[69] Pfeifer, B., et al. PopGenome: An efficient swiss army knife for population genomic analyses in R. Molecular Biology and Evolution 31, 1929–1936, doi:10.1093/molbev/msu136 (2014).
[70] Paradis, E. pegas: an R package for population genetics with an integrated-modular approach. Bioinformatics 26, 419–420, doi:10.1093/bioinformatics/btp696 (2010).

Posted July 25, 2017.

Subject Area

Evolutionary Biology

Subject Areas

All Articles

Animal Behavior and Cognition (5201)
Biochemistry (11715)
Bioengineering (8723)
Bioinformatics (29129)
Biophysics (14936)
Cancer Biology (12049)
Cell Biology (17359)
Clinical Trials (138)
Developmental Biology (9406)
Ecology (14144)
Epidemiology (2067)
Evolutionary Biology (18268)
Genetics (12221)
Genomics (16767)
Immunology (11843)
Microbiology (28014)
Molecular Biology (11560)
Neuroscience (60814)
Paleontology (450)
Pathology (1864)
Pharmacology and Toxicology (3231)
Physiology (4940)
Plant Biology (10384)
Scientific Communication and Education (1680)
Synthetic Biology (2878)
Systems Biology (7333)
Zoology (1642)

[1] ↵
Wright, S. The roles of mutation, inbreeding, crossbreeding and selection in evolution. Proceedings of the Sixth International Congress on Genetics 1, 356–366 (1932).
OpenUrl

[2] ↵
Orr, H. A. The genetic theory of adaptation: a brief history. Nature Reviews Genetics 6, 119–127, doi:10.1038/nrg1523 (2005).
OpenUrl CrossRef PubMed Web of Science

[3] ↵
Barrett, R. D. H. & Schluter, D. Adaptation from standing genetic variation. Trends in Ecology and Evolution 23, 38–44, doi:10.1016/j.tree.2007.09.008 (2008).
OpenUrl CrossRef PubMed Web of Science

[4] Domingues, V. S. et al. Evidence of adaptation from ancestral variation in young populations of beach mice. Evolution 66, 3209–3223, doi:10.1111/j.1558-5646.2012.01669.x (2012).
OpenUrl CrossRef PubMed Web of Science

[5] ↵
Schrider, D. R. & Kern, A. D. Soft sweeps are the dominant mode of adaptation in the human genome. Molecular Biology and Evolution, doi:10.1093/molbev/msx154 (2017).
OpenUrl CrossRef PubMed

[6] ↵
Huerta-Sánchez, E. et al. Altitude adaptation in Tibetans caused by introgression of Denisovan-like DNA. Nature 512, 194–197, doi:10.1038/nature13408 (2014).
OpenUrl CrossRef PubMed Web of Science

[7] ↵
Fontaine, M. C. et al. Mosquito genomics. Extensive introgression in a malaria vector species complex revealed by phylogenomics. Science 347, 1258524, doi:10.1126/science.1258524 (2015).
OpenUrl Abstract/FREE Full Text

[8] ↵
Grant, P. R. & Grant, B. R. Unpredictable evolution in a 30-year study of Darwin’s finches. Science 296, 707–711, doi:DOI 10.1126/science.1070315 (2002).
OpenUrl Abstract/FREE Full Text

[9] ↵
Wright, K. M., et al. Indirect Evolution of Hybrid Lethality Due to Linkage with Selected Locus in Mimulus guttatus. PLoS Biology 11, doi:10.1371/journal.pbio.1001497 (2013).
OpenUrl CrossRef PubMed

[10] ↵
Colosimo, P. F. et al. Widespread parallel evolution in sticklebacks by repeated fixation of Ectodysplasin alleles. Science 307, 1928–1933, doi:10.1126/science.1107239 (2005).
OpenUrl Abstract/FREE Full Text

[11] ↵
Kirkpatrick, M. & Barton, N. Chromosome inversions, local adaptation and speciation. Genetics 173, 419–434, doi:10.1534/genetics.105.047985 (2006).
OpenUrl Abstract/FREE Full Text

[12] Charlesworth, B., Morgan, M. T. & Charlesworth, D. The effect of deleterious mutations on neutral molecular variation. Genetics 134, 1289–1303 (1993).
OpenUrl Abstract/FREE Full Text

[13] Linnen, C. R., et al. On the origin and spread of an adaptive allele in deer mice. Science 325, 1095–1098, doi:10.1126/science.1175826 (2009).
OpenUrl Abstract/FREE Full Text

[14] ↵
Stankowski, S. & Streisfeld, M. A. Introgressive hybridization facilitates adaptive divergence in a recent radiation of monkeyflowers. Proceedings of the Royal Society B: Biological Sciences 282, 20151666, doi:10.1098/rspb.2015.1666 (2015).
OpenUrl CrossRef PubMed

[15] ↵
Pease, J. B., et al. Phylogenomics reveals three sources of adaptive variation during a rapid radiation. PLoS Biology 14, e1002379, doi:10.1371/journal.pbio.1002379 (2016).
OpenUrl CrossRef PubMed

[16] ↵
Schlotterer, C., et al. Sequencing pools of individuals - mining genome-wide polymorphism data without big funding. Nature Reviews Genetics 15, 749–763, doi:10.1038/nrg3803 (2014).
OpenUrl CrossRef PubMed

[17] ↵
Davey, J. W. et al. Genome-wide genetic marker discovery and genotyping using next-generation sequencing. Nature Reviews Genetics 12, 499–510, doi:10.1038/nrg3012 (2011).
OpenUrl CrossRef PubMed

[18] ↵
Bell, M. A. & Foster, S. A. in The Evolutionary Biology of the Threespine Stickleback (eds M. A. Bell & S. A. Foster) Ch. 1, 1–27 (Oxford University Press, 1994).

[19] ↵
Jones, F. C. et al. The genomic basis of adaptive evolution in threespine sticklebacks. Nature 484, 55–61, doi:10.1038/nature10944 (2012).
OpenUrl CrossRef PubMed Web of Science

[20] ↵
Colosimo, P. F. et al. The genetic architecture of parallel armor plate reduction in threespine sticklebacks. PLoS Biology 2, 635–641, doi:10.1371/journal.pbio.0020109 (2004).
OpenUrl CrossRef Web of Science

[21] ↵
Cresko, W. A. et al. Parallel genetic basis for repeated evolution of armor loss in Alaskan threespine stickleback populations. Proc Natl Acad Sci U S A 101, 6050–6055, doi:10.1073/pnas.0308479101 (2004).
OpenUrl Abstract/FREE Full Text

[22] ↵
Hohenlohe, P. A. et al. Population genomics of parallel adaptation in threespine stickleback using sequenced RAD tags. Plos Genet 6, e1000862, doi:10.1371/journal.pgen.1000862 (2010).
OpenUrl CrossRef PubMed

[23] ↵
Stuart, Y. E. et al. Contrasting effects of environment and genetics generate a continuum of parallel evolution. Nature Ecology & Evolution 1, 0158, doi:10.1038/s41559-017-0158 (2017).
OpenUrl CrossRef

[24] ↵
Roesti, M., et al. The genomic signature of parallel adaptation from shared genetic variation. Mol Ecol 23, 3944–3956, doi:10.1111/mec.12720 (2014).
OpenUrl CrossRef PubMed

[25] ↵
Francis, R. C., et al. Historical and ecological sources of variation among lake populations of threespine sticklebacks, Gasterosteus aculeatus, near Cook Inlet, Alaska. Canadian Journal of Zoology 64, 2257–2265 (1986).
OpenUrl

[26] ↵
Bell, M. A. & Foster, S. A. in The Evolutionary Biology of the Threespine Stickleback (eds M. A. Bell & S. A. Foster) Ch. 16, 472–486 (Oxford University Press, 1994).

[27] ↵
Kimmel, C. B. et al. Evolution and development of facial bone morphology in threespine sticklebacks. Proc Natl Acad Sci U S A 102, 5791–5796, doi:10.1073/pnas.0408533102 (2005).
OpenUrl Abstract/FREE Full Text

[28] ↵
Reimchen, T. E. in The Evolutionary Biology of the Threespine Stickleback (eds M. A. Bell & S. A. Foster) Ch. 9, 240–276 (Oxford University Press, 1994).

[29] ↵
Arnegard, M. E. et al. Genetics of ecological divergence during speciation. Nature 511, 307–311, doi:10.1038/nature13301 (2014).
OpenUrl CrossRef PubMed

[30] ↵
Schluter, D. & Conte, G. L. Genetics and ecological speciation. Proc Natl Acad Sci U S A 106 Suppl 1, 9955–9962, doi:10.1073/pnas.0901264106 (2009).
OpenUrl Abstract/FREE Full Text

[31] ↵
Baird, N. A. et al. Rapid SNP discovery and genetic mapping using sequenced RAD markers. Plos One 3, e3376, doi:10.1371/journal.pone.0003376 (2008).
OpenUrl CrossRef PubMed

[32] ↵
Charlesworth, B., Nordborg, M. & Charlesworth, D. The effects of local selection, balanced polymorphism and background selection on equilibrium patterns of genetic diversity in subdivided populations. Genetics Research 70, 155–174, doi:Doi 10.1017/S0016672397002954 (1997).
OpenUrl CrossRef PubMed Web of Science

[33] ↵
Lenormand, T. Gene flow and the limits to natural selection. Trends Ecol Evol 17, 183–189, doi:Doi 10.1016/S0169-5347(02)02497-7 (2002).
OpenUrl CrossRef

[34] ↵
Otto, S. P. & Bourguet, D. Balanced polymorphisms and the evolution of dominance. American Naturalist 153, 561–574, doi:Doi 10.1086/303204 (1999).
OpenUrl CrossRef Web of Science

[35] ↵
Phillips, P. C. Epistasis–the essential role of gene interactions in the structure and evolution of genetic systems. Nature Reviews Genetics 9, 855–867, doi:10.1038/nrg2452 (2008).
OpenUrl CrossRef PubMed Web of Science

[36] ↵
McGuigan, K., et al. Cryptic genetic variation and body size evolution in threespine stickleback. Evolution 65, 1203–1211, doi:10.1111/j.1558-5646.2010.01195.x (2011).
OpenUrl CrossRef PubMed Web of Science

[37] ↵
McCairns, R. J. S. & Bernatchez, L. Plasticity and heritability of morphological variation within and between parapatric stickleback demes. Journal of Evolutionary Biology 25, 1097–1112, doi:10.1111/j.1420-9101.2012.02496.x (2012).
OpenUrl CrossRef PubMed

[38] ↵
Aldenhoven, J. T., et al. Phylogeography of ninespine sticklebacks (Pungitius pungitius) in North America: glacial refugia and the origins of adaptive traits. Mol Ecol 19, 4061–4076, doi:10.1111/j.1365-294X.2010.04801.x (2010).
OpenUrl CrossRef PubMed

[39] ↵
Drummond, A. J., Suchard, M. A., Xie, D. & Rambaut, A. Bayesian phylogenetics with BEAUti and the BEAST 1.7. Mol Biol Evol 29, 1969–1973, doi:10.1093/molbev/mss075 (2012).
OpenUrl CrossRef PubMed Web of Science

[40] ↵
Bell, M. A., Baumgartner, J. V. & Olson, E. C. Patterns of temporal change in single morphological characters of a Miocene stickleback fish. Paleobiology 11, 258–271 (1985).
OpenUrl Abstract

[41] ↵
Miller, C. T. et al. Modular skeletal evolution in sticklebacks is controlled by additive and clustered quantitative trait loci. Genetics 197, 405–420, doi:10.1534/genetics.114.162420 (2014).
OpenUrl Abstract/FREE Full Text

[42] ↵
Barrett, R. D., Rogers, S. M. & Schluter, D. Natural selection on a major armor gene in threespine stickleback. Science 322, 255–257, doi:10.1126/science.1159978 (2008).
OpenUrl Abstract/FREE Full Text

[43] ↵
McKinnon, J. S. & Rundle, H. D. Speciation in nature:the threespine stickleback model systems. Trends in Ecology and Evolution 17, 480–488 (2002).
OpenUrl CrossRef Web of Science

[44] ↵
Roesti, M., Kueng, B., Moser, D. & Berner, D. The genomics of ecological vicariance in threespine stickleback fish. Nature Communications 6, 8767, doi:10.1038/ncomms9767 (2015).
OpenUrl CrossRef PubMed

[45] ↵
Deagle, B. E. et al. Population genomics of parallel phenotypic evolution in stickleback across stream-lake ecological transitions. Proceedings of the Royal Society B: Biological Sciences 279, 1277–1286, doi:10.1098/rspb.2011.1552 (2012).
OpenUrl CrossRef PubMed

[46] ↵
Samuk, K. et al. Gene flow and selection interact to promote adaptive divergence in regions of low recombination. Molecular Ecology, doi:10.1111/mec.14226 (2017).
OpenUrl CrossRef

[47] ↵
Roesti, M., Moser, D. & Berner, D. Recombination in the threespine stickleback genome - Patterns and consequences. Molecular Ecology 22, 3014–3027, doi:10.1111/mec.12322 (2013).
OpenUrl CrossRef PubMed Web of Science

[48] ↵
Aeschbacher, S., et al. Population-genomic inference of the strength and timing of selection against gene flow. Proceedings of the National Academy of Sciences USA 114, 7061–7066, doi:10.1073/pnas.1616755114 (2017).
OpenUrl Abstract/FREE Full Text

[49] ↵
Glazer, A. M., et al. Genome assembly improvement and mapping of convergently evolved skeletal traits in sticklebacks with genotyping-by-sequencing. G3-Genes Genomes Genetics 5, 1463–1472, doi:10.1534/g3.115.017905 (2015).
OpenUrl Abstract/FREE Full Text

[50] ↵
Guerrero, R. F., Rousset, F. & Kirkpatrick, M. Coalescent patterns for chromosomal inversions in divergent populations. Philosophical Transactions of the Royal Society B: Biological Sciences 367, 430–438, doi:10.1098/rstb.2011.0246 (2012).
OpenUrl CrossRef PubMed

[51]
Hendry, A. P., Taylor, E. B. & McPhail, J. D. Adaptive divergence and the balance between selection and gene flow: lake and stream stickleback in the Misty system. Evolution 56, 1199–1216, doi:10.1554/0014-3820(2002)056[1199:ADATBB]2.0.CO;2 (2002).

[52]
Lescak, E. A. et al. Evolution of stickleback in 50 years on earthquake-uplifted islands. Proceedings of the National Academy of Sciences USA 112, E7204–7212, doi:10.1073/pnas.1512020112 (2015).

[53]
Cresko, W. A. et al. Parallel genetic basis for repeated evolution of armor loss in Alaskan threespine stickleback populations. Proceedings of the National Academy of Sciences USA 101, 6050–6055, doi:10.1073/pnas.0308479101 (2004).

[54]
Catchen, J. M. et al. Stacks: building and genotyping loci de novo from short-read sequences. G3: Genes, Genomes, Genetics 1, 171–182, doi:10.1534/g3.111.000240 (2011).

[55]
Aronesty, E. ea-utils: Command-line tools for processing biological sequencing data. (2011).

[56]
Baird, N. A. et al. Rapid SNP discovery and genetic mapping using sequenced RAD markers. PLoS ONE 3, e3376, doi:10.1371/journal.pone.0003376 (2008).

[57]
Stephens, M., Smith, N. J. & Donnelly, P. A new statistical method for haplotype reconstruction from population data. American Journal of Human Genetics 68, 978–989, doi:10.1086/319501 (2001).

[58]
Arnold, B. et al. RADseq underestimates diversity and introduces genealogical biases due to nonrandom haplotype sampling. Molecular Ecology 22, 3179–3190, doi:10.1111/mec.12276 (2013).

[59]
Catchen, J. et al. Stacks: an analysis tool set for population genomics. Molecular Ecology 22, 3124–3140, doi:10.1111/mec.12354 (2013).

[60]
Li, H. et al. The Sequence Alignment/Map (SAM) format and SAMtools. Bioinformatics 25, 2078–2079 (2009).

[61]
Drummond, A. J. & Rambaut, A. BEAST: Bayesian evolutionary analysis by sampling trees. BMC Evolutionary Biology 7, 214, doi:10.1186/1471-2148-7-214 (2007).

[62]
Paradis, E., Claude, J. & Strimmer, K. APE: Analyses of phylogenetics and evolution in R language. Bioinformatics 20, 289–290, doi:10.1093/bioinformatics/btg412 (2004).

[63]
Kingman, J. F. C. The coalescent. Stochastic Processes and their Applications 13, 235–248, doi:10.1016/0304-4149(82)90011-4 (1982).

[64]
Kingman, J. F. C. On the genealogy of large populations. Journal of Applied Probability 19, 27–43 (1982).

[65]
Tajima, F. Evolutionary relationship of DNA sequences in finite populations. Genetics 105, 437–460 (1983).

[66]
R Core Team. R: A language and environment for statistical computing. R Foundation for Statistical Computing, Vienna, Austria (2016).

[67]
Nei, M. Molecular Evolutionary Genetics. (Columbia University Press, 1987).

[68]
Hudson, R. R., Slatkin, M. & Maddison, W. P. Estimation of levels of gene flow from DNA sequence data. Genetics 132, 583–589 (1992).

[69]
Pfeifer, B. et al. PopGenome: an efficient Swiss army knife for population genomic analyses in R. Molecular Biology and Evolution 31, 1929–1936, doi:10.1093/molbev/msu136 (2014).

[70]
Paradis, E. pegas: an R package for population genetics with an integrated-modular approach. Bioinformatics 26, 419–420, doi:10.1093/bioinformatics/btp696 (2010).