Abstract
Sympatric speciation illustrates how natural and sexual selection may create new species in isolation without geographic barriers. However, so far, all genomic reanalyses of classic examples of sympatric speciation indicate secondary gene flow occurred. Thus, there is a need to revisit criteria for demonstrating sympatric speciation in the face of widespread gene flow. We summarize theoretical differences between sympatric speciation and speciation-with-gene-flow models and propose genomic criteria for sympatric speciation: 1) timing of fine-scale introgression; 2) timing of selective sweeps of introgressed variation; 3) functional annotation of this introgressed variation; and 4) the absence of similar sweeps in outgroups. Monophyly is an insufficient criterion for sympatric speciation; we must take a locus-specific approach to investigate whether any introgression contributed to reproductive isolation.
What is sympatric speciation?
Sympatric speciation is the evolution of reproductive isolation within a single panmictic population without the aid of any geographic isolation [1]. It represents the most extreme and controversial endpoint on the divergence with gene flow continuum: panmictic gene flow and no initial divergence at the start of speciation [2–5]. In the context of theoretical speciation models, sympatric speciation is the most difficult process because the starting conditions involve no pre-existing divergence, potentially tied with physical linkage, among loci involved in reproductive isolation (i.e. ‘barrier’ loci [6,7]); instead, linkage disequilibrium (see Glossary) must build up through time within a population through the action of disruptive natural selection and strong assortative mating by ecotype, despite the countervailing eroding force of recombination [8–11].
We can thus distinguish different types of scenarios that will result in two sister species being found in sympatry based on whether secondary gene flow aided population divergence: 1) classic sympatric speciation without gene flow; 2) sympatric speciation in the presence of a) neutral secondary gene flow or b) after differential sorting of an ancestral hybrid swarm. In the latter case, it is important to distinguish whether the ancestral hybrid swarm population achieved panmixia before later divergence (i.e. sympatric divergence); otherwise, differential sorting of haplotypes within the hybrid swarm is better described by secondary contact speciation with gene flow models. 3) Speciation may be aided by secondary gene flow that a) triggers initial sympatric divergence or b) increases divergence after initial divergence in sympatry becomes stalled, an outcome of many sympatric speciation models without sufficiently strong disruptive selection [12–15]. Finally, 4) secondary contact after a period of allopatry between two populations can result in coexistence or reinforcement, if there is not collapse into a single admixed population [16–19]. We consider scenarios 1 and 2 to be examples of sympatric speciation, whereas scenarios 3 and 4 would be examples of speciation aided by secondary gene flow. Interestingly, hybrid swarm scenarios exist in a gray area, since substantial initial gene flow from multiple sources may increase ecological or preference variation within a population that is sufficient to trigger later sympatric divergence, even without segregating inversions or genetic incompatibilities [20–22]. So far, we know of no examples of scenario 1 within any case study of sympatric sister species examined using genomic tools; even long diverged species show some evidence of introgression in their past (e.g. [23]). 
In contrast, sympatric speciation with neutral gene flow (Scenario 2) and speciation aided by gene flow (Scenarios 3 and 4) frequently appear to operate concurrently even within a single sympatric adaptive radiation (e.g. [24–26]).
It is important to distinguish these scenarios because theoretical models predict that sympatric divergence unaided by any form of secondary gene flow is substantially more difficult than other speciation with gene flow scenarios (Box 1). Gene flow throughout the speciation process allows recombination to break down linkage disequilibrium among alleles associated with ecological divergence and assortative mating. There are actually three different classes of sympatric speciation models to consider: the most difficult process involves independently segregating loci for ecotype, female preferences, and male traits within the population, whereas sympatric divergence is much easier if any of these three types of traits are combined, such as assortative mating based on phenotype matching instead of separate loci for preference and traits [27,28] or “magic” traits (such as assortative mating based on microhabitat preference; [9,10,29]). Sympatric speciation by sexual selection alone is also theoretically possible if there is substantial preference variation either initially within the population or through secondary gene flow [20,30,31].
Any form of linkage disequilibrium among ecological and mate choice loci formed in allopatry, whether due to physical linkage, selection, or drift, can tend to shift the initial starting conditions of panmixia in favor of sympatric divergence [17]. However, linkage disequilibrium without physical linkage subsides within a few generations after secondary sympatry and thus may not allow sufficient time for the evolution of assortative mating within the population. In contrast, pre-existing physical linkage among ecological loci has been shown to increase the probability of divergence, especially when it captures already divergent alleles as is more likely after allopatric divergence [32,33]. Similarly, physical linkage can cause preference and trait alleles to mimic phenotype matching, although even tight linkage can break down over long timescales (shown in a model with population structure: [34]). Segregating inversions in the ancestral population are now well-known empirical examples of physical linkage promoting divergence in sympatry [35–37]. Sympatric divergence is also limited by many other restrictive conditions regarding the costs of female choosiness and strengths of disruptive selection and assortative mating (Box 1).
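The rapid decay of linkage disequilibrium between physically unlinked loci can be illustrated with the standard two-locus expectation under random mating, D_t = D_0 (1 - r)^t, where r is the recombination fraction. The following is a minimal sketch, not taken from any of the cited models: it assumes random mating and ignores selection, drift, and assortative mating, and the function names are hypothetical.

```python
import math


def ld_after(d0, r, generations):
    """Expected linkage disequilibrium D after `generations` of random mating.

    Uses the standard result D_t = D_0 * (1 - r)**t, where r is the
    recombination fraction (0.5 for unlinked loci). Selection, drift,
    and assortative mating are ignored (an illustrative assumption).
    """
    if not 0.0 <= r <= 0.5:
        raise ValueError("recombination fraction must be between 0 and 0.5")
    return d0 * (1.0 - r) ** generations


def generations_to_fraction(r, fraction):
    """Smallest whole number of generations until D <= fraction * D_0."""
    if not 0.0 < r <= 0.5:
        raise ValueError("r must be in (0, 0.5]")
    if not 0.0 < fraction < 1.0:
        raise ValueError("fraction must be in (0, 1)")
    return math.ceil(math.log(fraction) / math.log(1.0 - r))


# Unlinked loci (r = 0.5) versus tightly linked loci (r = 0.001)
unlinked = generations_to_fraction(0.5, 0.05)
linked = generations_to_fraction(0.001, 0.05)
```

Under these assumptions, unlinked loci lose 95% of their initial association within a handful of generations, whereas tightly linked loci retain it for thousands, which is consistent with the contrast drawn above between linkage disequilibrium with and without physical linkage.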
Despite extensive searches for examples of sympatric speciation in the wild, there are few convincing case studies due to the difficulty of ruling out historical allopatric scenarios (see below) and the new difficulty of ruling out a role of introgression in speciation. Furthermore, the role of magic traits or matching vs. preference/trait mechanisms is not fully understood in any existing case study. Thus, we still have very limited empirical tests of an extensive theoretical literature and diverse competing models of the notoriously difficult process of sympatric speciation [24,27,28,38–42].
The classic problem of sympatric speciation
There are four traditional criteria for demonstrating classic sympatric speciation (Scenario 1): 1) sister species which are reproductively isolated, 2) form a monophyletic group, 3) largely overlap in ranges, and 4) have biogeographic and evolutionary histories that make periods of allopatric divergence highly unlikely [1]. Very few case studies have been able to meet these rigorous criteria despite intense searches [1,5]. This has led to the prominent status of crater lake cichlid radiations as some of the best examples of sympatric speciation in the wild due to the uniform shape of isolated volcanic lakes which convincingly rule out phases of allopatry due to water level changes (Box 2; [43–45]).
The monophyly criterion assumes that monophyly arises only when a single ancestral population underlies the present-day daughter species. This is typically met by inferring a single phylogeny from one or more loci. This single point-estimate view of evolutionary history is problematic because it obscures the presence of non-bifurcating relationships among organisms (e.g. sister species which derived ancestry from multiple source populations due to extensive gene flow or hybrid speciation) and the real variation in evolutionary histories among genes across the genome itself (e.g. [46]). Few regions of the genome may initially contribute to reproductive isolation resulting in a heterogeneous genomic landscape of differentiation among incipient species [47], a pattern now extensively supported across case studies [48–51]. Therefore, monophyletic relationships are consistent with, but not exclusive to a scenario of sympatric speciation. Examining heterogeneous evolutionary histories across regions relevant to speciation is thus crucial for understanding the processes and conditions under which sympatric divergence can occur.
The ‘new’ problem of sympatric speciation
While genomics has increased our ability to resolve evolutionary relationships among organisms, it has also revealed more complex evolutionary histories of multiple colonization and extensive secondary gene flow in all examples of sympatric speciation that have been examined with genomic data so far [52–68]. Indeed, only a handful of genes may directly contribute to the speciation process whereas the rest of the genome is porous to gene flow while reproductive isolation is incomplete [47,69]. Examples of ‘classic’ sympatric speciation without secondary gene flow (Scenario 1) are now unknown after applying modern genomic tools to search for introgression (and paleogenomics is therefore unnecessary to provide historical point estimates of spatial isolation as recently suggested [70]). Instead, it is still possible that sympatric speciation occurs in the face of secondary gene flow in nearly all these examples (Scenario 2; [67]). Importantly, most evidence of secondary gene flow comes from genome-wide tests of introgression from outgroup lineages, not gene flow between diverging populations in sympatry (e.g. [64,65]). Therefore, introgression detected at the genome-wide level from lineages outside the speciation event tells us little about the divergence process among incipient sympatric species and how gene flow shaped the process of speciation.
The challenge of sympatric speciation in the genomic era is establishing or rejecting a functional role for the ubiquitous secondary gene flow present during the speciation process, in effect ruling out scenarios 3 and 4 in favor of scenario 2 (Fig. 1). Even if signatures of secondary gene flow are detected, speciation could still have occurred solely via mechanisms of sympatric speciation if that secondary gene flow did not play a causal role in divergence. Secondary gene flow could play a causal role if it introduced novel genetic variation or physically linked alleles (e.g. a segregating inversion) that promote speciation before the start of divergence, such as hybrid swarm (Scenario 2b; [21,22,25,71]), adaptive introgression (Scenario 3; [72–77]), transgressive segregation (Scenarios 2–3; [78,79]), or hybrid speciation (Scenario 4; [80]). We note a distinction between ancestral introgression of segregating haplotypes promoting speciation (speciation with gene flow models apply) versus sufficient time for recombination to break down these haplotypes after a hybrid swarm and create panmictic conditions before the start of divergence (sympatric speciation models apply) versus simply inflated ecological and preference variation within a population due to hybrid swarm (gray area; sympatric speciation more likely due to this initial gene flow). Here we propose and discuss new genomic criteria to help establish or reject a functional role of secondary gene flow in the speciation process (Fig. 1). This is necessary to identify putative cases of the sympatric speciation process when gene flow appears to be nearly universal in the wild, particularly among sympatric diverging populations.
New criteria for sympatric speciation in the genomic era
Although genome-wide analyses of introgression provide a starting point, ultimately consideration of the time of arrival and functional role of each introgressed region within extant sympatric sister species pairs will be necessary to distinguish between sympatric speciation with gene flow (Scenario 2) or speciation aided by secondary gene flow (Scenario 3; e.g. segregating inversions [35,81] or ancient balancing selection on regions containing multiple barrier loci [82,83]). We suggest four major types of genomic analyses as new criteria to help identify sympatric speciation with gene flow: 1) estimate the timing of introgression into sympatric sister species relative to their divergence time, 2) infer the presence and timing of selective sweeps within sympatric sister species, 3) annotate candidate adaptive introgression regions for functional elements or trait associations that may be relevant to speciation, and 4) if closely related non-speciating outgroups are available, confirm the lack of selective sweeps of these regions in outgroups. Combining these statistics will aid in distinguishing where case studies fall along the speciation with gene flow continuum and whether the starting conditions of panmixia in sympatric speciation models will apply (Fig. 1).
1) Secondary gene flow is constant across the speciation process or not concurrent with divergence times
Estimating the duration of gene flow and the timing of introgression relative to the timing of divergence between sympatric sister species will help distinguish between scenarios of sympatric speciation, speciation with gene flow, and secondary contact. If populations diverged in sympatry independent of any concurrent secondary gene flow (Scenario 2), we might expect to see weak concordance of the timing of gene flow with divergence times among species. This discordance could be in the form of discrete gene flow events that date well before or after divergence times among species (Fig. 1A). In the case of continuous gene flow from the time of colonization to the present, more information about the functional role and selection on introgressed regions will be needed.
The timing of introgression is also useful in ruling out other evolutionary phenomena that can leave similar genomic signatures. The random or biased assortment of ancestral variation among lineages during the speciation process can create similar phylogenetic patterns to introgression resulting from secondary gene flow [82,84]. Timing is important for differentiating introgression from the sorting of ancient ancestral polymorphisms due to processes such as balancing selection. For example, if genetic divergence in an introgressed region shared between sister species is greater (e.g. elevated Dxy) than expected given divergence time between the sister species, this pattern suggests differential sorting of ancestral variation and doesn’t rule out a scenario of sympatric speciation (Scenarios 1 & 2). If introgression after secondary contact did occur, genetic divergence in these regions between recipient sister species should be lower than expected given their divergence time. Increasingly sophisticated approaches for detecting fine-scale patterns of introgression are available to estimate the timing and duration of gene flow from genomic data (Box 3).
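The comparison of observed and expected divergence described above can be sketched in a few lines. This is a minimal illustration, not a published pipeline: `dxy` follows the glossary definition of Dxy, and `expected_dxy` uses the textbook expectation for a simple split without gene flow, 2μT + θ_anc (with θ_anc = 4N_anc μ). The function names, parameter values, and the assumption of a strict isolation model are all hypothetical choices made for illustration.

```python
from itertools import product


def dxy(pop_a, pop_b):
    """Average per-site pairwise differences between two populations.

    Only between-population comparisons are counted, as in the
    glossary definition of Dxy. Sequences must be aligned strings.
    """
    pop_a, pop_b = list(pop_a), list(pop_b)
    if not pop_a or not pop_b:
        raise ValueError("both populations need at least one sequence")
    length = len(pop_a[0])
    if any(len(s) != length for s in pop_a + pop_b):
        raise ValueError("sequences must be aligned to the same length")
    total = 0
    for a, b in product(pop_a, pop_b):
        total += sum(1 for x, y in zip(a, b) if x != y)
    return total / (len(pop_a) * len(pop_b) * length)


def expected_dxy(mu, split_generations, ancestral_theta):
    """Expected Dxy under a simple split with no gene flow (assumption).

    mu is the per-site, per-generation mutation rate; ancestral_theta
    is 4 * N_anc * mu for the ancestral population.
    """
    return 2.0 * mu * split_generations + ancestral_theta


# Toy alignment for one candidate region (hypothetical data)
observed = dxy(["AAAA", "AAAT"], ["AATT", "TATT"])
expected = expected_dxy(mu=1e-8, split_generations=10_000, ancestral_theta=0.001)
```

In practice the expectation would come from a fitted demographic model with confidence intervals; a region whose observed Dxy falls well above that expectation is a candidate for sorted ancestral variation, and one well below it is a candidate for recent introgression.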
2) Lack of selective sweeps or non-concurrent timing of sweeps in regions that have experienced gene flow
We can use information about selective sweeps of introgressed variation to further characterize the role of gene flow in sympatric divergence. When an allele is selectively favored in a population, positive selection may cause it to increase in frequency and form a localized selective sweep of reduced genetic variation surrounding the adaptive variant [85]. Such regions of high differentiation in recently diverged species are often targeted as candidates for speciation genes, although other processes not directly associated with speciation can lead to similar patterns of high heterogeneity in differentiation across a genome (reviewed in [86–88]). Indeed, there is still no evidence that these regions are associated with reproductive isolation or reduced gene flow, and they can also result from adaptive introgression [89,90]. If speciation was recent or ongoing, there may be strong signatures of a selective sweep for particular haplotypes in at least one of the sister species for regions involved in the divergence process (Fig. 1B). If secondary gene flow was neutral with respect to speciation, we may find no signatures of selective sweeps in those introgressed regions.
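The localized loss of variation that defines a sweep can be illustrated with a crude windowed scan of nucleotide diversity. This is only a sketch under simplifying assumptions (phased biallelic haplotypes coded as '0'/'1' strings, non-overlapping windows, an arbitrary threshold); the function names are hypothetical, and dedicated sweep-detection methods such as those cited in Box 3 model much more than a diversity trough.

```python
def window_diversity(haplotypes, window):
    """Nucleotide diversity (pi per site) in non-overlapping windows.

    `haplotypes` is a list of equal-length strings of '0'/'1'
    (ancestral/derived alleles) from one population.
    """
    n = len(haplotypes)
    if n < 2:
        raise ValueError("need at least two haplotypes")
    length = len(haplotypes[0])
    pis = []
    for start in range(0, length, window):
        end = min(start + window, length)
        het = 0.0
        for site in range(start, end):
            p = sum(h[site] == "1" for h in haplotypes) / n
            # unbiased per-site heterozygosity
            het += 2.0 * p * (1.0 - p) * n / (n - 1)
        pis.append(het / (end - start))
    return pis


def low_diversity_windows(pis, threshold=0.2):
    """Indices of windows with pi below `threshold` times the mean pi."""
    mean_pi = sum(pis) / len(pis)
    return [i for i, pi in enumerate(pis) if pi < threshold * mean_pi]


# Toy data: variation on both flanks, none in the middle window
haps = ["100010", "010001", "100001", "010010"]
pis = window_diversity(haps, window=2)
candidates = low_diversity_windows(pis)
```

A trough like this is not sufficient on its own, since low recombination and background selection can produce the same pattern, which is why the outgroup comparison in criterion 4 is needed.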
Importantly, a sweep of the same introgressed region in both sympatric sister species may be interpreted as adaptation to the same new environment, which may not contribute to reproductive isolation between the pair (dependent on their respective genetic backgrounds; e.g. [90,91]). However, this pattern is also consistent with the sweep of a region contributing to a ‘one-allele’ mechanism of mate choice [27,28,92], such as increased female choosiness in both sympatric sister species (e.g. [93]), which would contribute to reproductive isolation. Thus, selective sweeps of an introgressed region in both sympatric sister species do not rule out its role in aiding the speciation process.
Alternatively, if selective sweeps are detected, the timing of selective sweeps in the regions affected by this gene flow can give indirect evidence about the selective pressure underlying the sweep. It is challenging to infer something about the importance of an introgressed region if the timing of introgression predates the timing of the selective sweep because linkage disequilibrium among loci relevant to speciation may take time to build up, a process involved in most speciation models [9,10]. However, the absence of selective sweeps or introgression until long after population divergence would suggest that introgression was not relevant to the speciation process (Scenario 2a). Introgressed regions that have undergone soft selective sweeps and were important for divergence may easily be missed, but increasingly sensitive methods [94,95] are making it easier to detect them.
3) Weak support for causal role of secondary gene flow based on functional genetic analyses of variants in the region
Another potential source of evidence for the functional importance of gene flow can come from associations between variants in introgressed regions with traits involved in ecological and sexual isolation between sister species from genome-wide association studies (GWAS). However, many complex traits are driven by a large number of variants of small effect and ruling out a functional role for gene flow from any annotations is difficult (e.g. omnigenic model; [96]). The conservation of sequences within introgressed regions across taxa may also provide strong evidence of a functional role (e.g. PhastCons [97]). Finally, and most powerfully, genome editing and gene expression reporter systems are increasingly tractable in non-model systems (e.g. [98–101]). This is ultimately an asymmetric problem: finding evidence that an introgressed region may have contributed to reproductive isolation is far easier than demonstrating that no introgressed regions contributed to reproductive isolation in any way [67]. Finding evidence for sympatric speciation in the wild is now the difficult problem of functional genetic analyses of introgressed regions.
4) Similar patterns of selection or divergence in the introgressed regions in closely related outgroup populations
Thorough investigations of these same regions in outgroups to the sympatric species give added power to distinguish whether secondary gene flow aided sympatric divergence. If closely related species exist in similar environments and haven’t diversified in a similar manner but share signatures of selective sweeps in the same regions, then the observed introgression may have been neutral relative to speciation, e.g. due to adaptations to shared changes in climate or pathogens or shared regions of reduced recombination or increased background selection. For example, several studies comparing genomic landscapes of differentiation across closely related taxa have found that high differentiation observed in the same genomic regions across taxa reflects the action of linked selection across low-recombination regions rather than selection against gene flow at barrier loci [102–110].
Concluding Remarks
Sympatric speciation remains among the most controversial evolutionary processes, beloved by theorists and long sought after by empiricists. While evidence for this process appeared to be mounting using traditional criteria [5], genomic data has now cast doubt on all these putative examples due to the ubiquity of secondary gene flow. Furthermore, nearly all our existing case studies involve some form of automatic magic trait, such as assortative mating by habitat [35,111,112], along a depth gradient [63], or environment-induced phenology shifts [113]. We think an outstanding remaining question is whether sympatric divergence can occur in nature without the aid of some form of magic trait, as originally demonstrated to be possible in theory [9,114].
Future fine-scale investigations of introgression will likely continue to paint a complex picture of the role of secondary gene flow in nearly all speciation events. The highly polygenic and multi-dimensional nature of adaptation and mate choice suggests that an ‘all-of-the-above’ speciation scenario containing a mix of preference/trait, magic trait, and phenotype matching, each spread across a wide distribution of allelic effect sizes with varying times of arrival, will be the norm in nature. In contrast, although numerous and diverse, most speciation models continue to address these mechanisms in a piecemeal fashion with an assumption of large effect alleles. It remains unclear how different mechanisms, effect sizes, and times of arrival will interact and compete within a single model.
Interestingly, strict isolation of sympatric environments such as crater lakes may ultimately become less important, since even these isolated environments are not isolated from secondary gene flow [115,116]. Instead, recent sympatric divergence combined with well characterized introgression and functional annotations may be the new limiting factor for convincing case studies of sympatric divergence in the genomic era.
Why do we care whether speciation is sympatric?
An increasingly common claim in the empirical speciation literature is that there is no difference between ‘speciation with gene flow’ and ‘sympatric speciation’ (e.g. [70]). This contrasts sharply with the rich theoretical literature differentiating models of sympatric speciation with models of speciation with gene flow. Indeed, theory teaches us that we should care about the real differences between the process of sympatric speciation (i.e. population divergence in sympatry without the aid of introgression contributing to reproductive isolation) and other models of speciation with gene flow. Sympatric speciation is uniquely and notoriously difficult [117], in part because quite specific conditions of resource availability (e.g., [9,118]), mating traits and preferences (e.g., [31,119]), and search costs (e.g., [120]) must be met for it to occur.
Inferences from theoretical models predict that, under a scenario of speciation with gene flow (Scenario 3), introgression can make the process of speciation much easier in three ways. First, by introducing additional variation in ecological traits into the population, introgression could potentially facilitate a branching process due to competition for resources (although we are not aware of a model that assesses this precise situation, it can be inferred from the dynamics of [9]). Second, introgression of novel alleles for mating preferences may provide a boost in preference variation that could be an important trigger to aid the evolution of assortative mating under a preference/trait mechanism, which requires preference variation to be large ([20,31]). Indeed, we see exactly this pattern of secondary gene flow of olfactory alleles shortly before the rapid divergence of a Cameroon cichlid radiation in Lake Ejagham [66]. Third, secondary sympatry may lead to increased linkage disequilibrium between assortative mating and ecological loci or among ecological loci. It seems logical that this might facilitate sympatric speciation as this metric is often described as progress along the speciation continuum. However, initial linkage disequilibrium has been shown not to matter much in at least some scenarios [8], because without physical linkage, linkage disequilibrium will break down quickly. In contrast, physical linkage may enable these alleles to remain in association for a sufficient time for assortative mating to evolve within the population (e.g., [34]). Initial linkage disequilibrium may also increase the probability of allelic capture by an inversion or for selection for new mutations within the inversion that may affect both ecology and assortment [32]. Increased linkage disequilibrium among ecological loci may also increase the probability of sympatric divergence, but this is in effect similar to varying effect sizes of alleles at ecological loci (e.g. many small effect alleles within a region resemble a large-effect locus [121–124]).
The fundamental difference between sympatric speciation and speciation with gene flow, including secondary contact scenarios, lies in the fact that very often multiple equilibrium states exist in speciation models, such that loss of divergence and maintenance of divergence in the presence of gene flow are both possible outcomes, depending on the starting conditions of a population (this is nicely illustrated for one measure of divergence by [17], Fig. I). In such cases, speciation is much more easily reached from starting conditions that match those of two populations that have diverged largely in allopatry due to the large amount of allelic variation or pre-existing phenotypic bimodality and assortative mating. Even for scenarios of speciation with gene flow that are much easier, such as geographic separation between two incipient species that are undergoing gene flow, differentiation is much more difficult to reach or maintain from an initially homogeneous population than from an initially differentiated one [125,126].
Evidence for sympatric speciation from crater lake cichlid radiations
There are relatively few volcanic chains of crater lakes containing fishes in the tropics, notably found only in Cameroon, Nicaragua, Tanzania, Uganda, Madagascar, and Papua New Guinea [63,127,128]. Although sympatric radiations of endemic fishes are known from other isolated saline, alkali, and ancient lakes [129–132], only three lineages of cichlids have radiated in the world’s crater lakes (Fig. II). The most diverse is Barombi Mbo, Cameroon with eleven endemic species, followed by Lake Bermin, Cameroon with nine [43]. Nicaraguan crater lakes reach up to five species [45,55,133,134], the East African craters never exceed two sympatric species [63,68], and Madagascar’s several crater lakes each contain only a single endemic cichlid [128]. It remains unknown why regional and lineage diversity varies so greatly because there appears to be no relationship with crater lake size or age (up to approximately 5 km diameter and 2 million years old) until reaching the much larger sizes of the East African rift lakes ([135], but also see [134]).
In contrast to claims in a recent review [70], the evidence for sympatric speciation with secondary gene flow is rather consistently in favor and remarkably similar across all crater lake cichlid radiations. In all cases examined with genomic data so far, secondary gene flow was detected, but there was little evidence it came from substantial divergence in allopatry followed by secondary contact. Instead, nearly all studies have concluded sympatric divergence with periodic or continuous gene flow, frequently from an initial hybrid swarm population (i.e. introgression from multiple outgroup populations; [54,63–65,68]).
We think that the best evidence for secondary gene flow as a trigger of sympatric divergence in cichlids comes from a radiation of three Coptodon species in Lake Ejagham: demographic analyses of whole genomes suggest that this population did not diversify for 8,000 years in the face of frequent gene flow until an influx of olfactory receptor alleles coinciding with the first sympatric divergence event in the lake [66]. Similarly in Lake Victoria, segregating opsin alleles in riverine cichlid populations were differentially sorted among Lake Victorian cichlids and may have triggered their diversification [25].
Evidence for sympatric divergence in crater lake cichlids without the aid of secondary gene flow remains elusive. Malinsky et al. [63] showed that introgression and hybrid swarm predated the divergence of a shallow/deep-water pair of cichlids in Lake Massoko, Tanzania; however, these ancestral segregating haplotypes may have later aided sympatric divergence (which admittedly is very difficult to rule out – see text). Very recent sympatric divergence in some crater lakes on the order of thousands of years may also suggest that ongoing divergence occurred in sympatry without the aid of gene flow [133,136]; however, it remains unclear if this incipient divergence will become stalled as in other sympatric radiations [24]. Very rare secondary gene flow without a clear functional role into the Barombi Mbo cichlid radiation (< 1% introgressed regions in nearly all species) provides weak evidence of sympatric divergence, but more functional characterization and timing of introgression is needed [67]. The recent advent of transgenic reporters, CRISPR-Cas9, and in situ hybridization genetic tools within Nicaraguan crater lake cichlids provides much promise for future investigations of the role of introgression in sympatric divergence [99,137].
Tools for detecting and dating local gene flow and selective sweeps
1) Detecting and dating local secondary gene flow
While there are a variety of tests to detect gene flow on a local scale or within sliding genomic windows [23,138–142], dating gene flow events is still accomplished mostly at the level of the entire genome (although see [141] for dating introgression relative to divergence). Currently, three types of demographic coalescent modeling approaches can infer local gene flow timing based on different information from genomic data: 1) the distribution of allele frequencies from genotype data (site frequency spectrum: [143,144]), 2) the distribution of haplotype block lengths from phased genomes [145–148], and 3) variation in coalescent patterns among gene trees [149]. Alternative approaches using conditional random fields [150] and hidden Markov models [151,152] have also been used to detect and date gene flow events.
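As a concrete illustration of a windowed test for local gene flow, the sketch below computes Patterson's D (the ABBA-BABA statistic) in non-overlapping windows. This statistic is used here only as a familiar example and is not necessarily the approach of any reference cited above. The sketch assumes one haplotype per taxon coded as '0'/'1' strings, a tree of the form (((P1, P2), P3), outgroup), and hypothetical function names; real analyses use allele frequencies across many samples and a block jackknife for significance.

```python
def patterson_d(p1, p2, p3, outgroup):
    """Patterson's D from four aligned haplotypes ('0'/'1' strings).

    ABBA sites: P2 and P3 share an allele that differs from P1 and
    the outgroup. BABA sites: P1 and P3 share an allele that differs
    from P2 and the outgroup. Returns None when no informative sites
    are present.
    """
    abba = baba = 0
    for a, b, c, o in zip(p1, p2, p3, outgroup):
        if a == o and b == c and b != o:
            abba += 1
        elif b == o and a == c and a != o:
            baba += 1
    if abba + baba == 0:
        return None
    return (abba - baba) / (abba + baba)


def windowed_d(p1, p2, p3, outgroup, window):
    """Patterson's D in consecutive non-overlapping windows."""
    return [
        patterson_d(p1[i:i + window], p2[i:i + window],
                    p3[i:i + window], outgroup[i:i + window])
        for i in range(0, len(outgroup), window)
    ]


# Toy haplotypes (hypothetical data)
P1, P2, P3, OUT = "001000", "110100", "111100", "000000"
genome_wide = patterson_d(P1, P2, P3, OUT)
per_window = windowed_d(P1, P2, P3, OUT, window=3)
```

Positive values indicate excess allele sharing between P2 and P3; with so few informative sites per window the estimate is very noisy, which is one reason window-based variants of this statistic were developed.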
2) Dating selective sweeps and ages of beneficial alleles
Methods for estimating the age of a sweep of a beneficial allele exploit several different aspects about the pattern of variation surrounding the allele on its haplotypic background. These include heuristic approaches that use point estimates of mean haplotype length or the number of derived mutations within a chosen distance of the site [153–156] and model-based approaches that use demographic information and summary statistics of allele frequencies and linkage disequilibrium to model a distribution of ages that fit the observed data [157–160]. Alternatively, full sequence data about haplotype structure on chromosomes and models that leverage the length of ancestral haplotypes surrounding the beneficial allele and the accumulation of derived mutations can be used to estimate the age of beneficial alleles [161–163].
3) Functional genetic analyses of introgressed variants
Functional assessments of introgressed regions minimally involve searching an annotated reference genome for genes with relevant functions known for model organisms (or a pipeline for assembling and annotating the organism of interest if an annotated reference is not already available; reviewed in [164,165]). Introgressed regions that are unannotated can be searched for evidence of potential functional importance based on strong sequence conservation across taxa [97] or potential regulatory elements (reviewed in [166]). Additionally, genome-wide association studies (GWAS) can highlight variants in introgressed regions that may underlie complex traits of interest, including novel variants previously unknown in model organisms (GWAS reviewed in [167–169]). Functional validation of gene and regulatory element variants through genome-editing experiments is also becoming increasingly tractable for non-model organisms (e.g. [98,99,101,170]).
Glossary Box
Coalescence : The event of two sampled lineages from different populations merging back in time in a shared ancestral lineage.
Dxy: A measure of absolute genetic divergence between populations calculated as the average number of pairwise differences between sequences from two populations, excluding all the comparisons between sequences within populations.
Hybrid swarm : A genetically diverse population with unique allele combinations derived from the hybridization of multiple distinct taxa and subsequent backcrossing with hybrids and crossing between hybrids themselves.
Hidden Markov model : A statistical modeling approach used to infer hidden states from observed data along a sequence, where each hidden variable depends only on the state of the immediately preceding hidden variable.
Conditional random field : A statistical modeling approach similar to hidden Markov models except that each hidden variable can be conditional on regional hidden variables, not just the immediately previous one.
Incomplete lineage sorting : The imperfect sorting of ancestral alleles between diverging lineages that creates variable signatures about the evolutionary relationships among organisms.
Introgression : The movement and incorporation of genetic material from one distinct lineage into another upon hybridization between the two and subsequent backcrossing with one of the parent species.
Linkage disequilibrium : A non-random association of alleles at two or more loci.
Monophyletic : A group of lineages where the most recent ancestor of the group is not an ancestor of any lineages outside the group.
Secondary gene flow : Any gene flow event from non-sympatric populations after the initial colonization of the area that sympatric sister species diverged in. Introgression into the diverging sister species following such events potentially brings in variation that has evolved in allopatry that can aid the speciation process.
Transgressive segregation : The formation of extreme phenotypes in a segregating hybrid population that are outside the range of phenotypes observed in parental species.
Acknowledgements
We thank J. McGirr, M. St. John, J. Poelstra, B. Reinhard, and J. Hermisson for valuable discussion of content in this manuscript. This work was supported by the National Science Foundation DEB CAREER Grant #1749764 and by the University of North Carolina to CHM.