Abstract
While hybridization between species is increasingly appreciated to be a common occurrence, little is known about the forces that govern the subsequent evolution of hybrid genomes. We considered this question in three independent, naturally-occurring hybrid populations formed between swordtail fish species Xiphophorus birchmanni and X. malinche. To this end, we built a fine-scale genetic map and inferred patterns of local ancestry along the genomes of 690 individuals sampled from the three populations. In all three cases, we found hybrid ancestry to be more common in regions of high recombination and where there is linkage to fewer putative targets of selection. These same patterns are also apparent in a reanalysis of human-Neanderthal admixture. Our results lend support to models in which ancestry from the “minor” parental species persists only where it is rapidly uncoupled from alleles that are deleterious in hybrids, and show the retention of hybrid ancestry to be at least in part predictable from genomic features. Our analyses further indicate that in swordtail fish, the dominant source of selection on hybrids stems from deleterious combinations of epistatically-interacting alleles.
One sentence summary: The persistence of hybrid ancestry is predictable from local recombination rates, in three replicate hybrid populations as well as in humans.
Main text
Understanding speciation is central to understanding evolution, but so much about the process still puzzles us. The foundational work in evolutionary biology envisioned speciation as an ordered process during which reproductive barriers, once established, prevent gene flow between species (1). We now realize, however, that speciation is much more dynamic, with hybridization occurring both during and after the evolution of reproductive barriers and evidence of past hybridization with close relatives still visible in the genomes of myriad animal and plant species (2–9). The ubiquity of hybridization raises the question of how species that hybridize remain genetically and ecologically differentiated.
At least part of the answer is likely that selection filters out deleterious hybrid ancestry from the genome (1). For instance, in hominins and swordtail fish, individuals are less likely to carry hybrid ancestry near functionally important elements (6, 10, 11), presumably because it is especially deleterious in such regions. Aside from these observations, however, little is known about how admixed genomes evolve—or, in some cases, stabilize—following hybridization. Our understanding of the evolution of hybrid genomes is complicated by the existence of many possible modes of selection and by the fact that, in most systems, the location of sites under selection is unknown. Decades of experimental work have demonstrated that Dobzhansky-Muller incompatibilities (DMIs) are a central mechanism underlying reproductive isolation once species are formed (12–16), but the importance of DMIs in shaping the evolution of hybrid genomes remains unknown, as does the role of other modes of selection. Notably, it was recently pointed out that when there is introgression from a species with a lower effective population size, hybrids may suffer from increased genetic load (“hybridization load”) due to the introduction of weakly deleterious alleles (10, 17). Depending on the environment in which hybrids find themselves, alleles that underlie ecological adaptations in the parental species may also be deleterious in hybrids (18, 19). Complicating matters yet further, modes of selection on hybrid ancestry will likely vary from system to system, depending on the extent of genetic divergence and ecological differentiation between the parental species, as well as long-term differences in their effective population sizes.
One feature, however, is expected to play a central role in all these models: variation in recombination rates along the genome (10, 17, 20–22). Theory predicts that selection is more likely to weed out hybrid ancestry in regions of low recombination (23–25). Specifically, in models of DMIs, minor parent ancestry will persist preferentially in regions of higher recombination because it is more rapidly uncoupled from mutations that are incompatible with the prevalent (i.e., major parent) genetic background (Fig. 1). Similarly, in models of hybridization load, all else being equal, shorter linkage blocks will carry fewer weakly deleterious mutations and will therefore be less rapidly purged by selection (10, 17; Fig. S1). Previous studies have reported patterns potentially consistent with these expectations (26, 27), but without directly investigating ancestry patterns and their relationship with local recombination rates (28).
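The uncoupling argument can be illustrated with a minimal sketch. It assumes a deliberately simplified setting that is not the model analyzed in this paper: a neutral minor parent marker sits at a per-generation recombination fraction r from a single deleterious site, and every crossover between them separates the two. Under these assumptions, the probability that a lineage is still linked after t generations is (1 - r)^t. The function name and the example values of r are hypothetical; the 100 generations echo the upper bound on population age given in the text.

```python
def prob_still_linked(r: float, generations: int) -> float:
    """Probability that no crossover has separated two sites after
    `generations` meioses, given per-generation recombination fraction r.

    Simplification (an assumption of this sketch): every crossover between
    the two sites uncouples them, which ignores crossovers between two
    chromosomes of the same ancestry and ignores selection itself.
    """
    if not 0.0 <= r <= 0.5:
        raise ValueError("recombination fraction must lie in [0, 0.5]")
    if generations < 0:
        raise ValueError("generations must be non-negative")
    return (1.0 - r) ** generations


if __name__ == "__main__":
    # Hypothetical recombination fractions spanning low to high.
    for r in (0.0001, 0.001, 0.01):
        print(f"r={r}: P(still linked after 100 generations) = "
              f"{prob_still_linked(r, 100):.3f}")
```

In this toy setting, a marker in a high-recombination region escapes its deleterious neighbor far sooner than one in a low-recombination region, which is the qualitative expectation described above.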
Recombination shapes ancestry in swordtail fish
To test predictions about the role of recombination in filtering hybrid ancestry, we took advantage of a set of naturally occurring hybrid populations between two swordtail fish species, Xiphophorus birchmanni and a closely related species, X. malinche (Fig. 2; Supporting Information 1-3). The two species are ~0.5% divergent at the nucleotide level and incomplete lineage sorting between the two is relatively rare (29; Fig. 2A). We focused on three hybrid populations that formed independently between the two fewer than 100 generations ago (29), likely as a result of human-mediated habitat disturbance (30). Previous analyses of hybrid zones between these two species, including two of the three populations analyzed here, suggested that there are on the order of 100 pairs of unlinked DMIs segregating in hybrids (29, 31), with estimated selection coefficients ~0.03-0.05 (29), and potentially many more linked DMIs, indicating that swordtail hybrids may be experiencing widespread selection on DMIs.
To infer local ancestry patterns, we generated ~1X low coverage whole genome data for 690 hybrids sampled from the three hybrid populations (Supporting Information 1). We estimated local ancestry patterns for the 690 hybrids by applying a hidden Markov model (32); this approach is predicted to have high accuracy for these hybrid populations, given the marker density and time since mixture (29, 32, 33). Using ancestry calls at 1-1.2 million sites genome wide, we inferred that two of the hybrid populations derive on average 75-80% of their genomes from X. birchmanni, whereas individuals in the third population derive on average 72% of their genomes from X. malinche (Fig. 2; Supporting Information 1; 34). The median homozygous tract length for the minor parent ranges from 84 kb to 225 kb across the three populations, roughly matching expectations for hybrid populations of these ages and mixture proportions (Supporting Information 4).
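For readers unfamiliar with hidden Markov models of local ancestry, the following is a minimal sketch of the general idea only. It is not the method of reference 32: it assumes a single haploid chromosome with observed alleles, whereas the data here are diploid, low coverage reads. The function name, the fixed per-site switch probability, and the toy allele frequencies are all hypothetical.

```python
import math


def viterbi_ancestry(obs, freq_a, freq_b, switch_prob=0.01, prior_a=0.5):
    """Most likely ancestry path ('A' or 'B') along a haploid chromosome.

    obs     : list of observed alleles (0 or 1), one per site (non-empty)
    freq_a  : frequency of allele 1 at each site in parental species A
    freq_b  : frequency of allele 1 at each site in parental species B
    switch_prob : assumed probability of an ancestry switch between
                  adjacent sites (constant here for simplicity)
    """
    states = ("A", "B")
    freqs = {"A": freq_a, "B": freq_b}

    def log_emit(state, i):
        # Clamp frequencies so that log() never sees 0.
        p1 = min(max(freqs[state][i], 1e-6), 1.0 - 1e-6)
        return math.log(p1 if obs[i] == 1 else 1.0 - p1)

    log_stay = math.log(1.0 - switch_prob)
    log_switch = math.log(switch_prob)

    score = {"A": math.log(prior_a) + log_emit("A", 0),
             "B": math.log(1.0 - prior_a) + log_emit("B", 0)}
    backpointers = []
    for i in range(1, len(obs)):
        new_score, pointers = {}, {}
        for s in states:
            other = "B" if s == "A" else "A"
            stay = score[s] + log_stay
            switch = score[other] + log_switch
            if stay >= switch:
                best, pointers[s] = stay, s
            else:
                best, pointers[s] = switch, other
            new_score[s] = best + log_emit(s, i)
        score = new_score
        backpointers.append(pointers)

    # Trace back from the best final state.
    state = max(states, key=lambda s: score[s])
    path = [state]
    for pointers in reversed(backpointers):
        state = pointers[state]
        path.append(state)
    path.reverse()
    return path


if __name__ == "__main__":
    # Toy example: ten nearly fixed differences between the two species.
    fa, fb = [0.95] * 10, [0.05] * 10
    print("".join(viterbi_ancestry([1] * 5 + [0] * 5, fa, fb)))
```

The penalty on switching is what lets such a model ignore isolated discordant sites while still detecting true ancestry transitions, provided markers are dense relative to tract lengths.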
To consider the relationship between local ancestry and recombination rates, we inferred a fine-scale genetic map for X. birchmanni from patterns of linkage disequilibrium (LD) in unrelated individuals (Table S1; Supporting Information 2, 4-5). Based on our previous work on recombination in this taxon (35), we had a strong prior expectation that local recombination rates should be conserved between X. birchmanni and X. malinche (Supporting Information 6). We also generated crossover maps from hybrids based on inferred switch points between the two ancestries. Overall the hybrid and parental maps are consistent (Fig. S2), with the correlations between maps roughly comparable to what would be expected if the maps were in fact identical (Supporting Information 7).
In all three hybrid populations of swordtail fish, the probability of carrying ancestry from the minor parent increases with the local recombination rate (Fig. 3, Table 1). This association remains irrespective of the choice of scale (Fig. S3) and after thinning the SNP and ancestry variation data to control for possible differences in the ability to reliably infer recombination rates or the power to call hybrid ancestry across windows (Supporting Information 4). The preferential persistence of minor parent ancestry in regions of higher recombination is not expected under neutrality (Fig. S1) and instead indicates that minor parent ancestry was retained where it was more likely to have been rapidly uncoupled from the deleterious alleles with which it was originally linked (Supporting Information 5). This qualitative pattern can be generated under several models of selection, including selection against DMIs, selection against weakly deleterious alleles introduced by hybridization, or widespread ecological selection against loci that derive from the minor parent (Fig. 1, Fig. S1).
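The association described above can be summarized, for example, with a rank correlation between per-window recombination rate and per-window minor parent ancestry. The sketch below is an illustration under stated assumptions, not the analysis behind Table 1: the window values are made up, the variable names are hypothetical, and Spearman's rho is simply one reasonable choice of statistic.

```python
from statistics import mean


def average_ranks(values):
    """Return 1-based ranks, giving tied values the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def spearman(x, y):
    """Spearman's rho: Pearson correlation computed on average ranks."""
    return pearson(average_ranks(x), average_ranks(y))


if __name__ == "__main__":
    # Made-up per-window values, for illustration only.
    recomb_rate = [0.1, 0.5, 1.2, 2.0, 3.5, 0.8]
    minor_ancestry = [0.05, 0.10, 0.22, 0.20, 0.35, 0.15]
    print(f"Spearman's rho = {spearman(recomb_rate, minor_ancestry):.3f}")
```

Because a rank correlation does not assume a linear relationship, it is relatively insensitive to the heavy-tailed distribution that recombination rates typically show across windows.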
In principle, the retention of hybrid ancestry should be most accurately predicted from the exact number of deleterious alleles to which a minor parent segment was linked since hybridization occurred. The local recombination rate is one proxy for this (unknown) parameter, as is the number of coding base pairs nearby. In these data, both factors predict minor parent ancestry (Fig. S4; Fig. S5; Supporting Information 4), but local recombination is a stronger predictor and remains a predictor after controlling for the number of coding base pairs (Table 1; Table S2). These findings are consistent with those obtained in simulations mimicking the data structure (Supporting Information 4-5), presumably because the number of coding base pairs nearby is an extremely noisy proxy.
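One standard way to ask whether recombination rate still predicts ancestry after controlling for coding base pairs is a first-order partial correlation. The sketch below assumes this approach purely for illustration; the text here does not specify the exact procedure behind Table 1 and Table S2, and the function names and toy values are hypothetical.

```python
from statistics import mean


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def partial_correlation(x, y, z):
    """Correlation between x and y after removing the linear effect of z.

    Uses r_xy.z = (r_xy - r_xz * r_yz) / sqrt((1 - r_xz^2) * (1 - r_yz^2)).
    """
    r_xy = pearson(x, y)
    r_xz = pearson(x, z)
    r_yz = pearson(y, z)
    return (r_xy - r_xz * r_yz) / ((1 - r_xz ** 2) * (1 - r_yz ** 2)) ** 0.5


if __name__ == "__main__":
    # Made-up per-window values, for illustration only.
    recomb_rate = [0.2, 0.9, 1.5, 2.4, 3.1]
    minor_ancestry = [0.06, 0.12, 0.15, 0.25, 0.28]
    coding_bp = [5200, 800, 3100, 400, 1500]
    r = partial_correlation(recomb_rate, minor_ancestry, coding_bp)
    print(f"partial correlation (controlling for coding bp) = {r:.3f}")
```

If the partial correlation stays clearly positive, the recombination signal is not merely a by-product of gene density, which is the logic of the comparison described above.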
The source of selection
Controlling for the recombination rate, local ancestry is positively correlated between all pairs of hybrid populations, with weaker but significant correlations seen even between populations with different major parent ancestries (Fig. 4). These findings are expected from selection on the same underlying loci in independently formed populations (Supporting Information 8). Whereas both DMIs and hybridization load are predicted to drive positive correlations in local ancestry across populations regardless of the admixture proportions (Fig. S6), ecological selection against minor parent ancestry should lead to negative correlations in local ancestry and is thus inconsistent with the observed patterns (Fig. 4; Supporting Information 8).
Comparison among the three hybrid populations also provides a means to distinguish between the remaining two hypotheses. Analyzing genome sequences from X. malinche (5, 29) and X. birchmanni, we found that X. malinche has had a lower long-term effective population size than X. birchmanni (Fig. 2; Supporting Information 3), as seen both in the approximately four-fold lower average heterozygosity in X. malinche (0.03% vs 0.12% per base pair, respectively) and in estimates of effective population sizes over time from high coverage genome sequences (Fig. 2, Table S1). Consistent with a lower long-term effective population size, X. malinche carries significantly more putative deleterious alleles relative to the inferred ancestral sequence than does X. birchmanni, as measured by the number of derived, non-synonymous substitutions per haploid genome (a 2.5% excess, p=0.016 based on 1,000 bootstrap resamples; see Supporting Information 3; 36, 37). Because X. birchmanni and X. malinche source populations differ in the number of putatively deleterious variants, the three hybrid populations of swordtail fish provide an informative contrast: whereas DMIs should lead to selection against minor parent ancestry in all three populations, hybridization load should favor the major parent in populations 1 and 2 and the minor parent in population 3 (Fig. 2; Fig. 4).
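The logic of a bootstrap test for an excess of derived substitutions can be sketched as follows. This is a hedged illustration, not the procedure of Supporting Information 3: it assumes that the resampling unit is a gene, that counts are paired by gene across the two species, and that the p-value is the fraction of resamples showing no excess. All names and counts are hypothetical.

```python
import random


def bootstrap_excess(counts_a, counts_b, n_boot=1000, seed=0):
    """Bootstrap the excess of derived substitutions in species A over B.

    counts_a, counts_b : per-gene counts of derived, non-synonymous
                         substitutions (paired by gene; sum of counts_b
                         must be positive)
    Returns (observed_excess, p_value), where observed_excess is
    sum(a) / sum(b) - 1 and p_value is the fraction of resamples in which
    species A does not exceed species B.
    """
    rng = random.Random(seed)  # fixed seed for reproducibility
    n = len(counts_a)
    observed = sum(counts_a) / sum(counts_b) - 1.0
    n_no_excess = 0
    for _ in range(n_boot):
        # Resample genes with replacement, keeping the pairing intact.
        idx = [rng.randrange(n) for _ in range(n)]
        total_a = sum(counts_a[i] for i in idx)
        total_b = sum(counts_b[i] for i in idx)
        if total_a <= total_b:
            n_no_excess += 1
    return observed, n_no_excess / n_boot


if __name__ == "__main__":
    # Made-up per-gene counts, for illustration only.
    species_a = [3, 0, 2, 5, 1, 4, 2, 3]
    species_b = [2, 1, 2, 4, 1, 3, 3, 2]
    excess, p = bootstrap_excess(species_a, species_b)
    print(f"observed excess = {excess:.1%}, bootstrap p = {p:.3f}")
```

Resampling whole genes rather than individual sites keeps linked substitutions together, which is one common way to avoid overstating significance when sites are not independent.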
In this regard, the fact that minor parent ancestry also increases with recombination in the third hybrid population, which derives most of its genome from the parental species that has lower effective population size (Fig. 2, 3), indicates that hybrid incompatibilities are the dominant mode of selection shaping ancestry in the genome in these hybrid populations, rather than selection against hybridization load (Fig. 4; Fig. S7; Supporting Information 4-5). In principle, ecological selection favoring the major parent could also produce a positive correlation between recombination rate and ancestry (though not the positive correlations in ancestry across populations; Fig. 4). However, this scenario would require two of the hybrid populations to occur in more birchmanni-like environments and one in a more malinche-like environment, when available evidence suggests otherwise—notably, all of the hybrid populations are found in thermal environments that are mismatched to the environment where their major parent is found (Fig. 2; Supporting Information 5).
Moreover, in all three hybrid populations, minor parent ancestry is unusually low near previously mapped DMIs between the two parental species (29, 31), a pattern that should not arise from the approach used to identify DMIs (Supporting Information 5), but is expected from selection on epistatically-interacting alleles (Fig. 4; Supporting Information 4-5). Taken together, these lines of evidence indicate that DMIs are the main (though not necessarily sole) source of selection shaping the retention of hybrid ancestry in these three swordtail fish hybrid populations (Fig. 4).
Ancestry also interacts with the local recombination rate in hominins
To evaluate the generality of the relationship between recombination rate and ancestry seen in swordtails, we considered the only other case with similar genomic data available: admixture between humans and archaic hominins. Several studies have reported that the average proportion of Neanderthal ancestry decreases with the number of closely linked coding base pairs and with a measure of the strength of purifying selection at linked sites (6, 10, 17, 38), patterns for which both DMIs and hybridization load (due to the smaller effective population size of Neanderthals, 39) have been proposed as explanations (6, 10, 17). Reanalyzing the data, we found that the proportion of Neanderthal ancestry (the minor parental species) decreases in regions of the human genome with lower recombination rates (Fig. 3D; Table 1; Table S3). This relationship is seen for different window size choices and with any of three approaches to infer Neanderthal ancestry in the human genome (Table 1), and is not expected as a result of variation in the power to identify introgression along the genome (Supporting Information 9). The effect of local recombination rate on Neanderthal ancestry also persists after accounting for the number of coding base pairs nearby (Table 1; Supporting Information 9). Interestingly, the relationship between Neanderthal ancestry and local recombination rate is especially strong when excluding regions of unusually high frequency Neanderthal ancestry (e.g. top 1%; Fig. S8), possibly because these regions are enriched for cases of adaptive introgression (6, 38, 40, 41). Repeating these analyses for Denisovan ancestry, for which there is lower power to identify ancestry tracts, there is a much weaker but consistent trend (Table 1; Supporting Information 9).
As in swordtails, the persistence of Neanderthal ancestry in regions of higher recombination is not expected under neutrality (Fig. S1) but could be generated by selection against DMIs, weakly deleterious alleles introduced by Neanderthals (6, 10, 17, 38), or widespread ecological selection against Neanderthal ancestry (Fig. 1, Fig. S1). Unlike in the case of swordtails, however, these causes cannot be distinguished based on these data alone (6, 10, 17, 38, 42). Moreover, the conclusion about the source of selection reached for swordtail fish need not hold for hominins, in particular because modern humans and Neanderthals (Denisovans) were less diverged when they are thought to have interbred, and thus may have accumulated many fewer DMIs (43, 44).
The predictability of hybrid ancestry
Hybrid ancestry is predicted by the local recombination rate across three replicate admixture events between the same species pair in swordtail fish, as well as in two cases of admixture in hominins. In swordtail fish hybrids, several lines of evidence indicate that selection against hybrid incompatibilities is the dominant force shaping minor parent ancestry in the genome. In hominins, the source of selection remains unclear. Regardless of the precise mechanisms of selection on hybrids, the generality of these patterns reveals the retention of hybrid ancestry to be at least in part predictable from genomic features.
The relationship of minor parent ancestry to local recombination thus provides a useful tool for predicting where in the genome we might expect hybrid ancestry to persist preferentially. In particular, in hominins, meiotic recombination events are directed to the genome by binding of the PRDM9 protein, whereas in swordtail fish, they are not and instead are concentrated around CpG islands and other promoter-like features (35; Supporting Information 5-6). Accordingly, we found that in swordtail fish, minor parent ancestry is higher around CpG islands and transcription start sites whereas in humans, it is not (Fig. 5; Supporting Information 5). In other words, the mechanism by which recombination is directed to the genome shapes the retention of hybrid ancestry.
One implication is that the reliance on PRDM9 to direct recombination may not only impact reproductive isolation between species directly (as in mice, 45), but also indirectly. For example, if DMIs tend to occur between neighboring genes (46), hybrids between species with PRDM9-independent recombination may experience greater negative selection than hybrids between species that use PRDM9, because recombination events are more likely to uncouple negatively-interacting alleles. On the other hand, in species with PRDM9-independent recombination, genic regions have higher recombination rates and thus may be more likely to be uncoupled from a deleterious background, potentially providing more opportunities for adaptive introgression. As genomic data accumulate for hybridizing species across the tree of life (47–53), the importance of recombination mechanisms for the fate of hybrids can soon be systematically evaluated.
Acknowledgements
We thank Yaniv Brandvain, Erin Calfee, Graham Coop, Jonathan Pritchard, David Reich, Guy Sella, Sonal Singhal, Matthias Steinrücken and members of the Przeworski, Sella, and Pickrell labs for helpful discussions and/or comments on an early version of the manuscript. We thank the Federal Government of Mexico for permission to collect fish and Gaston Jofre for providing fish pictures. This project was supported by grant R01 GM83098 to MP, NSF DDIG DEB-1405232 to MS, and a Harvard Milton Fund grant to MS. MS was supported by a Hanna H. Gray Fellowship from the Howard Hughes Medical Institute.