Abstract
The Dobzhansky–Muller model posits that incompatibilities between alleles at different loci cause speciation. However, it is known that if the alleles involved in a Dobzhansky–Muller incompatibility (DMI) between two loci are neutral, the resulting reproductive isolation cannot be maintained in the presence of either mutation or gene flow. Here we propose that speciation can emerge through the collective effects of multiple neutral DMIs that cannot, individually, cause speciation—a mechanism we call emergent speciation. We investigate emergent speciation using a haploid neutral network model with recombination. We find that certain combinations of multiple neutral DMIs can lead to speciation. Complex DMIs and high recombination rate between the DMI loci facilitate emergent speciation. These conditions are likely to occur in nature. We conclude that the interaction between DMIs may be a root cause of the origin of species.
Introduction
Unravelling the ways in which reproductive barriers between populations arise and are maintained remains a central challenge of evolutionary biology. The Dobzhansky–Muller model posits that speciation is driven by intrinsic postzygotic reproductive isolation caused by incompatibilities between alleles at different loci (Dobzhansky, 1937; Muller, 1942; Orr, 1995). The kinds of strong negative epistatic interactions envisioned by this model are common between amino acid substitutions within proteins (Kondrashov et al., 2002; Kulathinal et al., 2004). Furthermore, Dobzhansky–Muller incompatibilities (hereafter DMIs) have been shown to cause inviability or sterility in hybrids between closely related species, although the extent to which any particular DMI has actually contributed to speciation remains an open question (Presgraves, 2010a,b; Maheshwari and Barbash, 2011; Seehausen et al., 2014).
In Figure 1A, we illustrate a simple version of the evolutionary scenario originally proposed by Dobzhansky (1937) with an incompatibility between neutral alleles at two loci (A and B) in a haploid. We refer to this interaction as a neutral DMI. An ancestral population is fixed for the ab genotype. This population splits into two geographically isolated (allopatric) populations. One population fixes the neutral allele A at the A locus, whereas the other fixes the neutral allele B at the B locus. The derived alleles are incompatible: individuals carrying one of the derived alleles are fit but individuals carrying both of them are not. Upon secondary contact between the populations, this neutral DMI creates postzygotic isolation between the two populations: if r is the recombination rate between the loci, then r/2 of haploid F1 hybrids between individuals from the two populations are unfit (inviable or sterile).
The neutral DMI described in the previous paragraph is unlikely to be an effective mechanism of speciation because it assumes that the populations diverge in perfect allopatry, and that the derived alleles go to fixation before secondary contact takes place. However, either mutation or gene flow can disrupt this process (Barton and Bengtsson, 1986; Bank et al., 2012) (Figures 1B and 1C): they lead to the production of individuals with the ancestral genotype (ab) and these individuals have an advantage because they are completely compatible with individuals carrying derived alleles (Ab and aB).
It is known that the reproductive barriers created by neutral DMIs can be strengthened in at least two ways. The first is selection favoring the derived alleles, in which case the DMI is no longer neutral (Gavrilets, 1997; Agrawal et al., 2011; Bank et al., 2012). This could happen if the derived alleles are involved in adaptation to different environments, a scenario known as ecological speciation (Schluter, 2009; Agrawal et al., 2011). The second is prezygotic isolation between the two populations. For example, the low fitness of hybrids can select against hybridization and cause the evolution of assortative mating between individuals carrying the same derived allele, a mechanism known as reinforcement (Dobzhansky, 1937; Felsenstein, 1981; Liou and Price, 1994; Servedio and Kirkpatrick, 1997).
Here we consider a new mechanism we call emergent speciation—that speciation emerges through the collective effects of multiple neutral DMIs that cannot, individually, cause speciation. Low fitness in hybrids between closely related species is often caused by multiple DMIs (Presgraves, 2003; Payseur and Hoekstra, 2005; Masly and Presgraves, 2007; Matute et al., 2010; Moyle and Nakazato, 2010; Schumer et al., 2014). However, it does not follow that any of these DMIs actually caused speciation: most of the DMIs may have accumulated after speciation had occurred by other means.
The majority of theoretical work on DMIs has relied on either population genetic models (Nei, 1976; Bengtsson and Christiansen, 1983; Wagner et al., 1994; Gavrilets and Hastings, 1996; Gavrilets, 1997; Agrawal et al., 2011; Bank et al., 2012), or models of divergence between populations (Werth and Windham, 1991; Orr, 1995; Lynch and Force, 2000b; Orr and Turelli, 2001; Welch, 2004; Fraïsse et al., 2014). Both classes of models include simplifying assumptions: the former consider only DMIs involving 2–3 loci, whereas the latter ignore polymorphism at the DMI loci. Both simplifications are problematic: reproductive isolation is often caused by multiple DMIs involving multiple loci (Presgraves, 2003; Payseur and Hoekstra, 2005; Masly and Presgraves, 2007; Matute et al., 2010; Moyle and Nakazato, 2010; Schumer et al., 2014), and many populations contain alleles involved in DMIs segregating within them (Cutter, 2012; Corbett-Detig et al., 2013). The few studies that have attempted to overcome these simplifications have either excluded DMIs (Flaxman et al., 2014) or have not represented DMIs explicitly (Barton and Bengtsson, 1986; Gavrilets et al., 1998; Gavrilets, 1999; Barton and de Cara, 2009) and, therefore, could not capture emergent speciation. We investigate emergent speciation using a haploid neutral network model (Schuster et al., 1994; van Nimwegen et al., 1999) with recombination (Xia and Levitt, 2002; Szöllősi and Derényi, 2008), which allows us to represent DMIs involving multiple loci (Gavrilets and Gravner, 1997; Gavrilets, 2004), and to take into account genetic variation at those loci (Cutter, 2012; Corbett-Detig et al., 2013).
A neutral network (Schuster et al., 1994; van Nimwegen et al., 1999) is a network of fit genotypes connected by mutational accessibility. Two genotypes are mutationally accessible if one genotype can be obtained from the other through a single mutation. For example, Figure 1A shows a neutral network where aB is connected to ab but not to Ab. All genotypes in the network are fit and have equal fitness. All genotypes outside the network are unfit but some may be mutationally accessible from genotypes in the network. For example, in the neutral network shown in Figure 1A, AB is unfit, and it is accessible from both aB and Ab, but not ab.
Neutral networks define “holey” adaptive landscapes with “ridges” of fit genotypes connecting distant genotypes (Gavrilets and Gravner, 1997; Gavrilets, 2004). They extend the neutral DMI model to multiple loci (Gavrilets and Gravner, 1997; Gavrilets, 2004); a neutral network of K genotypes with L loci, each with α alleles, can be constructed by taking the entire space of α^L genotypes and “removing” the α^L − K genotypes that carry incompatible combinations of alleles (e.g., the A and B alleles in the neutral network in Figure 1A). A single DMI of order ω (i.e., one involving alleles at ω loci) implies the removal of α^(L−ω) genotypes (2 ≤ ω ≤ L). Additional DMIs imply the removal of other genotypes, although the corresponding sets of genotypes to remove may overlap with each other. DMIs of order ω = 2 are designated simple, whereas those of order ω > 2 are designated complex (Cabot et al., 1994; Orr, 1995; Fraïsse et al., 2014). DMIs of order up to ω = 5 have been discovered in introgression studies (Fraïsse et al., 2014). The alleles of genotypes in the neutral network can be, for example, nucleotides, amino acids, insertions/deletions, or presence/absence of functional genes. Therefore, a neutral network can also be used to represent DMI-like scenarios such as the degeneration of duplicate genes (Werth and Windham, 1991; Lynch and Force, 2000b; Nei and Nozawa, 2011).
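The construction described above can be sketched in a few lines of Python. This is an illustration only: the function names `neutral_network` and `neighbors`, and the encoding of a DMI as a mapping from locus index to incompatible allele, are assumptions made for this example and are not taken from the original analysis.

```python
from itertools import product

def neutral_network(alleles_per_locus, dmis):
    """Return the fit genotypes that remain after removing every genotype
    that carries all alleles of at least one DMI.

    alleles_per_locus: one string of allele symbols per locus, e.g. ["aA", "bB"]
    dmis: list of dicts mapping locus index -> incompatible allele
    """
    fit = []
    for genotype in product(*alleles_per_locus):
        incompatible = any(
            all(genotype[locus] == allele for locus, allele in dmi.items())
            for dmi in dmis
        )
        if not incompatible:
            fit.append("".join(genotype))
    return fit

def neighbors(genotype, network):
    """Genotypes in the network that differ from `genotype` at exactly one locus."""
    return [g for g in network
            if sum(x != y for x, y in zip(g, genotype)) == 1]

# Two diallelic loci with a single simple DMI between alleles A and B.
two_locus = neutral_network(["aA", "bB"], [{0: "A", 1: "B"}])
print(two_locus)                   # ['ab', 'aB', 'Ab']
print(neighbors("aB", two_locus))  # ['ab']

# One DMI of order 3 on L = 4 diallelic loci removes 2**(4 - 3) = 2 genotypes.
four_locus = neutral_network(["aA", "bB", "cC", "dD"], [{0: "A", 1: "B", 2: "C"}])
print(len(four_locus))             # 14
```

The second example checks the counting rule for a single DMI: with α = 2, L = 4 and ω = 3, the genotype space of 16 loses 2 genotypes.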
We show that neutral networks defined by multiple simple and/or complex neutral DMIs can lead to the establishment of stable reproductive barriers between populations. Although the neutral network model includes its own simplifying assumptions, it captures the essence of the phenomenon of emergent speciation in the absence of other possible mechanisms of speciation. Thus, it allows us to identify and characterize some of the causes of emergent speciation, including the pattern of interactions between DMI loci and recombination. Furthermore, emergent speciation is a robust mechanism that we argue should operate under a broad range of conditions.
Results
A neutral DMI between two loci is not sufficient to cause speciation
Consider the neutral DMI illustrated in Figure 1A. Initially, two allopatric populations are fixed for the aB and Ab genotypes, respectively. The populations are maximally genetically differentiated at the two loci (GST = 1). The degree of reproductive isolation between the two populations is I = r/2, the proportion of haploid F1 hybrids between individuals from the two populations that are unfit (see Materials and methods for definitions of both GST and I).
How stable is the reproductive barrier between the two populations? To address this question we begin by investigating the effect of mutation within populations. If the alleles at each locus can mutate into each other (A↔a and B↔b) at a rate u per locus per generation, then the degree of reproductive isolation will decline exponentially according to the expression: I_t ≈ I_0 · e^(−2ut), where t is time in generations, and I_0 = r/2 is the initial reproductive isolation. For example, if u = 10^−3 and r = 0.2, then genetic differentiation and reproductive isolation will be eliminated within ∼4,000 generations (Figures 1B and 1C, m = 0). Any amount of gene flow between the two populations will further accelerate the erosion of the reproductive barrier (Figures 1B and 1C, m > 0). For example, if just 1 individual in 2,000 migrates from one population to the other every generation (m = 0.0005) then genetic differentiation and reproductive isolation will be eliminated within ∼2,000 generations.
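As a quick numerical check of the exponential decay expression above (initial isolation r/2, decaying at rate 2u per generation), the following sketch evaluates it for the quoted parameter values. The function name `isolation` and the sampled time points are illustrative choices, not part of the original analysis.

```python
import math

def isolation(t, u, r):
    """Approximate reproductive isolation after t generations of mutation,
    starting from I_0 = r / 2 and decaying as exp(-2 * u * t)."""
    return (r / 2) * math.exp(-2 * u * t)

u, r = 1e-3, 0.2
for t in (0, 1000, 2000, 4000):
    print(t, isolation(t, u, r))
```

With u = 10^−3 and r = 0.2, the value at t = 4,000 is 0.1 · e^(−8), which is below 10^−4 and therefore negligible relative to the initial value of 0.1.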
The evolution of a stable reproductive barrier between two populations, that is, speciation, requires the existence of more than one stable equilibrium (Barton, 1996; Gavrilets and Hastings, 1996). A single neutral DMI between two diallelic loci is not sufficient to cause speciation because, in the presence of mutation (0 < u < 0.5), it only contains one stable equilibrium for any level of recombination (Gavrilets, 2004), and populations will gradually evolve toward this equilibrium (Figure 1—figure supplement 1). Changes to the adaptive landscape can cause the appearance of two stable equilibria (Bank et al., 2012). For example, if the derived alleles confer an advantage (fitness: w_aB = w_Ab = 1 and w_ab = 1 − s), and if both r and s ≫ u, the genotype network will have two stable equilibria, one dominated by the aB genotype and the other by the Ab genotype (Figure 2). Two populations in different equilibria will show a degree of reproductive isolation of: I ≈ r(1 + s)/2 (Figure 2E).
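The two values of reproductive isolation quoted so far, r/2 for the neutral DMI and r(1 + s)/2 when the derived alleles are favored, can be recovered from a simple tally of F1 offspring classes. The sketch below makes two simplifying assumptions that are not stated in this form in the text: each population is treated as fixed for its derived genotype, and isolation is taken to be one minus the mean fitness of haploid F1 hybrids. The function name `f1_isolation` is hypothetical.

```python
def f1_isolation(r, s=0.0):
    """Reproductive isolation between populations fixed for aB and Ab.

    Assumes fitnesses w_aB = w_Ab = 1, w_ab = 1 - s and w_AB = 0, and
    defines isolation as 1 minus the mean fitness of haploid F1 hybrids.
    """
    fitness = {"aB": 1.0, "Ab": 1.0, "ab": 1.0 - s, "AB": 0.0}
    # A cross aB x Ab yields parental genotypes with probability 1 - r and
    # recombinant genotypes (ab and AB) with probability r, split evenly.
    offspring = {"aB": (1 - r) / 2, "Ab": (1 - r) / 2, "ab": r / 2, "AB": r / 2}
    mean_fitness = sum(offspring[g] * fitness[g] for g in offspring)
    return 1.0 - mean_fitness

print(f1_isolation(0.2))         # neutral case, equals r / 2
print(f1_isolation(0.2, s=0.1))  # with selection, equals r * (1 + s) / 2
```

Under these assumptions the mean hybrid fitness is (1 − r) + (r/2)(1 − s), so the isolation is exactly r(1 + s)/2; the approximation sign in the text is expected because populations at equilibrium are not completely fixed.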
Neutral networks based on multiple DMIs can show multiple stable equilibria
We began by investigating whether neutral networks contain multiple stable equilibria. To do this we generated ensembles of 500 random neutral networks of K genotypes with L loci and α alleles per locus for a range of values of K, L and α. None of the neutral networks considered could have been specified by a single DMI of any order (2 ≤ ω ≤ L). To construct a random neutral network, we generated K random genotypes with L loci and one of α alleles per locus, and kept the resulting network if it was connected. We ignored disconnected networks because, although they often contain multiple stable equilibria, a population is unlikely to shift from one equilibrium to another because such a shift requires rare multiple mutations (Gavrilets, 2004).
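The sampling procedure just described can be sketched as rejection sampling. The code below is an illustration under stated assumptions: genotypes are drawn without replacement so that the K genotypes are distinct, alleles are encoded as integers 0 to α − 1, and the additional check that a network cannot be specified by a single DMI is omitted. The names `is_connected` and `random_connected_network` are hypothetical.

```python
import random
from itertools import product

def is_connected(network):
    """Depth-first search over single-mutation links between fit genotypes."""
    network = list(network)
    seen, stack = {network[0]}, [network[0]]
    while stack:
        current = stack.pop()
        for g in network:
            if g not in seen and sum(x != y for x, y in zip(g, current)) == 1:
                seen.add(g)
                stack.append(g)
    return len(seen) == len(network)

def random_connected_network(K, L, alpha, rng, max_tries=100_000):
    """Sample K distinct genotypes until they form a connected network."""
    space = list(product(range(alpha), repeat=L))
    for _ in range(max_tries):
        candidate = rng.sample(space, K)
        if is_connected(candidate):
            return sorted(candidate)
    raise RuntimeError("no connected network found")

rng = random.Random(1)  # fixed seed so the example is reproducible
net = random_connected_network(K=6, L=4, alpha=2, rng=rng)
print(net)
```

Repeating the call with fresh random draws would give an ensemble; 500 networks per parameter combination is the ensemble size used in the text.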
For each neutral network, we constructed populations with different initial genotype frequencies and allowed each population to evolve independently until it reached equilibrium. We then evaluated the stability of the resulting equilibria (see Materials and methods). No neutral networks defined on L = 3 loci with α = 2 alleles per locus contain multiple stable equilibria. However, some neutral networks with L = 4 and α = 2, and with L = 3 and α = 3 contain multiple stable equilibria (Figure 3A; Figure 3—figure supplement 1A). Populations evolving independently to different stable equilibria become genetically differentiated and partially reproductively isolated from each other (Figure 3B–C; Figure 3—figure supplement 1B–C). Thus, speciation can emerge through the collective effects of multiple neutral DMIs that cannot, individually, cause speciation.
Larger, sparser neutral networks are more likely to contain multiple stable equilibria
The probability, PM, that a random neutral network from an ensemble shows multiple stable equilibria is correlated with properties of the network. PM increases with the size of the network, K (Figure 3A; Figure 3—figure supplement 1A). We have never found any random connected neutral network with K = 5 genotypes with multiple equilibria (PM ≈ 0), regardless of the values of L and α. In contrast, networks with K = 9 genotypes defined by L = 6 diallelic loci show PM ≈ 50%.
For random neutral networks of a given size, the topology of the network also influences PM. This is the reason why the relationship between PM and K is non-monotonic for L = 4 diallelic loci (Figure 3A): the genotype space consists of only 2^4 = 16 genotypes, which constrains the range of topologies that a random neutral network can take. Increasing either L or α increases the size of the genotype space and, therefore, alleviates the constraint (Figure 3A; Figure 3—figure supplement 1A).
Table 1 shows that PM is correlated with multiple network properties, including the average degree of a genotype, that is, the mean number of mutational neighbors it has in the neutral network. One complication is that different properties of neutral networks are not independent of each other (Table 1; Figure 3—figure supplement 3). Figure 3—figure supplement 2 shows two network correlates of PM that are, in turn, uncorrelated with each other (Figure 3—figure supplement 3; Table 1): the spectral radius and the degree assortativity. The spectral radius is the leading eigenvalue of the adjacency matrix and measures the mean degree of a population at equilibrium when r = 0 (van Nimwegen et al., 1999). The degree assortativity measures the extent to which nodes with a certain degree are connected with nodes with similar degree. Neutral networks with low spectral radius and negative degree assortativity—and more sparsely connected, spread out, modular networks—are more likely to show multiple stable equilibria (Table 1). However, the topology of a network is not sufficient to determine PM: the precise pattern of linkage between loci also influences whether a particular neutral network shows multiple stable equilibria (Figure 3—figure supplement 4).
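The two network statistics discussed above can be computed directly from the adjacency matrix. The sketch below is an assumption-laden illustration: the spectral radius is obtained by power iteration (on the adjacency matrix plus the identity, to avoid oscillation on bipartite networks), and degree assortativity is taken to be the Pearson correlation between the degrees at the two ends of each link, which is the standard definition but may differ in detail from the one used for Table 1. All function names are hypothetical.

```python
import math

def adjacency(network):
    """Adjacency matrix: 1 if two genotypes differ at exactly one locus."""
    n = len(network)
    return [[1 if sum(x != y for x, y in zip(network[i], network[j])) == 1 else 0
             for j in range(n)] for i in range(n)]

def spectral_radius(adj, iterations=2000):
    """Leading eigenvalue of the adjacency matrix of a connected network."""
    n = len(adj)
    v = [1.0] * n
    for _ in range(iterations):
        # One step of power iteration on A + I.
        w = [v[i] + sum(adj[i][j] * v[j] for j in range(n)) for i in range(n)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    # Rayleigh quotient of A for the converged unit vector.
    Av = [sum(adj[i][j] * v[j] for j in range(n)) for i in range(n)]
    return sum(v[i] * Av[i] for i in range(n))

def degree_assortativity(adj):
    """Pearson correlation between the degrees at the two ends of each link."""
    n = len(adj)
    degree = [sum(row) for row in adj]
    ends = [(degree[i], degree[j]) for i in range(n) for j in range(n) if adj[i][j]]
    mean = sum(x for x, _ in ends) / len(ends)
    var = sum((x - mean) ** 2 for x, _ in ends) / len(ends)
    if var == 0:
        return float("nan")  # undefined when every link end has the same degree
    cov = sum((x - mean) * (y - mean) for x, y in ends) / len(ends)
    return cov / var

# Three-genotype network defined by a single neutral DMI (AB removed).
adj = adjacency(["Ab", "ab", "aB"])
print(spectral_radius(adj))       # close to sqrt(2)
print(degree_assortativity(adj))  # -1.0
```

For this three-genotype chain the mean degree is 4/3, whereas the spectral radius is √2, reflecting the fact that a population at equilibrium concentrates on the better-connected genotype.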
Neutral networks based on complex DMIs are more likely to show multiple stable equilibria
A neutral network of a certain size (K) can be specified by either a few low-order DMIs or many high-order DMIs. To investigate the extent to which DMIs of different order (ω) can lead to multiple stable equilibria, we have exhaustively enumerated all possible combinations of simple DMIs (ω = 2) on L = 4 diallelic loci specifying connected neutral networks with K ≥ 6 genotypes. Of the 2,918 resulting neutral networks, none were found to contain multiple stable equilibria (PM ≈ 0).
This result is surprising because random neutral networks with K = 6 to 12 genotypes with L = 4 diallelic loci showed PM ≈ 12% (Figure 3A). One possibility is that simple DMIs are not sufficient to generate neutral networks with multiple stable equilibria, and that complex DMIs (ω > 2) are required.
To test this hypothesis, we generated additional ensembles of random neutral networks of K = 9 genotypes using random combinations of DMIs of order ω = 2 to 4 between L = 6 diallelic loci. We found that, although simple DMIs are capable of generating neutral networks with multiple stable equilibria, ∼97% of neutral networks generated by combinations of 5–14 simple DMIs have only one stable equilibrium. As expected, PM increases with the complexity (ω) of the DMIs (Figure 4).
The existence of multiple stable equilibria depends on the recombination rate
In the absence of recombination between the loci defining a neutral network, there is only one stable equilibrium (van Nimwegen et al., 1999). The genotype frequencies at equilibrium are given by the leading eigenvector of the mutation matrix M, where entry Mij is the mutation rate from genotype i to genotype j per generation (van Nimwegen et al., 1999). With recombination, however, multiple stable equilibria can occur (Figure 3A).
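The no-recombination equilibrium can be illustrated by iterating mutation and selection until genotype frequencies stop changing. The sketch below rests on assumptions that are not spelled out in this section: at most one locus mutates per generation, each locus mutates at rate u with the mutation equally likely to produce any of the other α − 1 alleles, and mutants that fall outside the network are removed by selection. The function name `equilibrium_no_recombination` is hypothetical.

```python
def equilibrium_no_recombination(network, u, alpha=2, generations=20000):
    """Equilibrium genotype frequencies on a neutral network when r = 0."""
    n = len(network)
    L = len(network[0])

    def linked(g, h):
        return sum(x != y for x, y in zip(g, h)) == 1

    # M[i][j]: per-generation probability that genotype i mutates into genotype j.
    M = [[u / (alpha - 1) if linked(network[i], network[j]) else 0.0
          for j in range(n)] for i in range(n)]
    for i in range(n):
        M[i][i] = 1.0 - L * u  # probability that no locus mutates

    p = [1.0 / n] * n
    for _ in range(generations):
        q = [sum(p[i] * M[i][j] for i in range(n)) for j in range(n)]
        total = sum(q)  # mean fitness: mutants outside the network are lost
        p = [x / total for x in q]
    return dict(zip(network, p))

freqs = equilibrium_no_recombination(["Ab", "ab", "aB"], u=1e-3)
print(freqs)
```

Under these assumptions M equals (1 − Lu) times the identity plus u/(α − 1) times the adjacency matrix, so its leading eigenvector coincides with that of the adjacency matrix. For the three-genotype network of a single neutral DMI this gives a frequency of √2 − 1 ≈ 0.414 for ab and about 0.293 for each of Ab and aB.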
To quantitatively investigate the relationship between the existence of multiple stable equilibria and the recombination rate between fitness loci (r) in a concrete example, we considered the neutral network shown in Figure 5A. This was one of the random neutral networks in the K = 6, L = 3 and α = 3 ensemble summarized in Figure 3—figure supplement 1. The neutral network is defined by 10 simple DMIs: A1–B3 (i.e., A1 and B3 are incompatible), A2–B2, A2–B3, A3–B2, A3–B3, B1–C1, B1–C3, B3–C1, B3–C2, and B3–C3. We examined how the number of stable equilibria in this neutral network changes with r while keeping the mutation rate constant (u = 10^−3). When the recombination rate is low (0 ≤ r ≤ 0.0019) the neutral network contains only one stable equilibrium regardless of initial conditions (Figure 5B). The equilibrium is symmetric in that the frequency of the A1B2 haplotype (red) is the same as that of the B1C2 haplotype (blue). Above a critical recombination rate (0.0019 < r ≤ 0.5) there are two stable equilibria and one unstable equilibrium. Populations evolve to the different equilibria depending on initial conditions (Figure 5C). The stable equilibria are asymmetric with an excess of genotypes containing either the A1B2 (red) or B1C2 (blue) haplotype, respectively (note, however, that these equilibria are symmetric with each other). The unstable equilibrium is symmetric, with equal frequencies of the A1B2 and B1C2 haplotypes. The critical point at which the equilibria bifurcate is approximately invariant with the r/u ratio (Figure 5—figure supplement 1).
The reproductive barriers generated by multiple neutral DMIs can persist in the presence of gene flow
If two allopatric populations evolve independently to the different stable equilibria of the neutral network in Figure 5A, they will become genetically differentiated and reproductively isolated to an extent that also depends on r (Figure 5D–E).
The reproductive barrier created by the neutral network in Figure 5A can persist in the presence of gene flow (Figures 5D–E, red). Introducing gene flow weakens the degree of genetic differentiation and of reproductive isolation at equilibrium, and increases the critical value of r required for the persistence of a reproductive barrier (Figures 5D–E, red). However, the maximum migration rate between two populations that allows the reproductive barrier to persist is lower than the mutation rate (m ≈ 0.00047, for r = 0.5; Figure 5—figure supplement 2, blue). Stable differentiation can occur in a stepping-stone model (Kimura, 1952) with higher local migration rates (Figure 6D), but the resulting reproductive barrier does not slow down the spread of a neutral allele at an unlinked locus appreciably (Barton and Bengtsson, 1986) (Table 2).
Larger neutral networks, involving complex incompatibilities between greater numbers of loci, can generate stronger reproductive barriers, capable of withstanding substantial gene flow. The neutral network shown in Figure 7A contains three stable equilibria. This was one of the random neutral networks in the K = 11 and L = 5 ensemble summarized in Figure 3. The neutral network is defined by 9 DMIs, 7 of which are complex: A–e (i.e., A and e are incompatible), B–e, A–b–D, a–B–d, A–C–D, B–c–d, b–C–D, C–D–e, and a–c–d–E. Populations at the equilibria at opposite ends of the network can show high levels of genetic differentiation and reproductive isolation (Figures 7D and 7E). If the fitness loci are unlinked (r = 0.5), then 50% of F1 hybrids between two populations at equilibrium are unfit. The maximum migration rate between two populations that allows the reproductive barrier to persist is almost two orders of magnitude higher than the mutation rate (m ≈ 0.0943, for r = 0.5; Figure 5—figure supplement 2, red). In a stepping-stone model, this neutral network can slow down the spread of a neutral allele at an unlinked locus to a greater extent than a single DMI with selection for the derived alleles (Figures 6C and 6E; Table 2). Thus, emergent speciation could, in principle, be an effective mechanism of either allopatric or parapatric speciation.
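The nine DMIs listed above can be checked against the stated network size by brute-force enumeration of the 2^5 = 32 genotypes. The sketch below is illustrative: the string encoding of DMIs and the names `LOCI`, `DMIS`, `is_fit` and `hamming` are conventions chosen for this example.

```python
from itertools import product

LOCI = "abcde"  # lower case and upper case denote the two alleles at each locus
DMIS = ["A-e", "B-e", "A-b-D", "a-B-d", "A-C-D",
        "B-c-d", "b-C-D", "C-D-e", "a-c-d-E"]

def is_fit(genotype, dmis=DMIS):
    """A genotype is unfit if it carries every allele of at least one DMI."""
    carried = set(genotype)
    return not any(set(dmi.split("-")) <= carried for dmi in dmis)

def hamming(g, h):
    """Number of loci at which two genotypes differ."""
    return sum(x != y for x, y in zip(g, h))

genotypes = ["".join(g) for g in product(*[(x, x.upper()) for x in LOCI])]
network = [g for g in genotypes if is_fit(g)]
links = [(g, h) for i, g in enumerate(network)
         for h in network[i + 1:] if hamming(g, h) == 1]

print(len(network))  # 11 fit genotypes
print(len(links))    # 10 single-mutation links
print(sum(len(d.split("-")) > 2 for d in DMIS))  # 7 complex DMIs
```

The enumeration recovers K = 11 genotypes. They are joined by only 10 single-mutation links, so this network is very sparsely connected.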
The probability of a stochastic shift from one stable equilibrium to the other decreases with the recombination rate
In the neutral network model speciation requires that a population undergo a stochastic shift from one stable equilibrium to another. One mechanism by which this could happen is the “founder effect” (Templeton, 1980; Carson and Templeton, 1984). In this scenario, a new allopatric population is founded by a few individuals from a larger source population. The new population then expands rapidly. The stochastic shift occurs during the short period of time while the population is small. We investigated the probability of a stochastic shift in the neutral network shown in Figure 5A (see Materials and methods). We found that the probability that a founder event causes a stochastic shift (PS) can be high when r is low, and declines as r increases (Figure 5F). A similar relationship between PS and r was observed for the neutral network in Figure 7A and for a single DMI with selection for the derived alleles (Figure 7F and 2F). In general, PS declined as the reproductive barrier became stronger (Table 2).
Discussion
Our main result is that, when it comes to multiple neutral DMIs, the whole can be greater than the sum of its parts. Although a single neutral DMI cannot lead to the evolution of stable reproductive isolation, the collective effects of certain combinations of multiple neutral DMIs can lead to the evolution of strong barriers to gene flow between populations—a mechanism we call emergent speciation.
Emergent speciation depends on two factors: the pattern of interactions between DMI loci and recombination. DMIs of higher order (ω), and involving greater numbers of loci (L), tend to promote emergent speciation (Figures 3 and 4). This relationship is mediated by several properties of the neutral networks specified by the DMIs: larger (K), more sparsely connected, spread out, modular neutral networks tend to facilitate emergent speciation (Figure 3; Figure 3—figure supplement 2; Table 1). Note that our results are conservative because we considered only connected networks. Real neutral networks might, in fact, be disconnected (Jiménez et al., 2013) which would be expected to further facilitate emergent speciation.
Increasing the recombination rate between DMI loci promotes emergent speciation in at least three ways. First, it causes the appearance of multiple equilibria (Figures 5B–C and 7B–C). Recombination has been shown to generate multistability in other evolutionary models (Bürger, 1989; Bergman and Feldman, 1992; Boerlijst et al., 1996; Higgs, 1998; Wright et al., 2003; Jacobi and Nordahl, 2006; Park and Krug, 2011), although earlier studies of the evolutionary consequences of recombination in neutral networks did not detect multiple equilibria (Xia and Levitt, 2002; Szöllősi and Derényi, 2008). Second, it increases genetic differentiation between populations at the different equilibria (Figures 5D and 7D). This pattern is consistent with the observation that increasing r reduces variation within a population at equilibrium in a neutral network (Xia and Levitt, 2002; Szöllősi and Derényi, 2008; Paixão and Azevedo, 2010). Third, it increases the degree of reproductive isolation between populations at different equilibria (Figures 5E and 7E). This is because, in our model, recombination is required to produce hybrids and consequently is the predominant source of selection. High r between fitness loci has been shown to promote speciation in other models (Felsenstein, 1981; Bank et al., 2012).
The precise pattern of recombination—that is, linkage—between loci can also determine the existence of multiple equilibria (Figure 3—figure supplement 4). This result indicates that certain chromosomal rearrangements may facilitate emergent speciation. Note that this mechanism of chromosomal speciation does not assume that different chromosomal rearrangements are polymorphic within populations and therefore is not based on suppression of recombination (Faria and Navarro, 2010).
How likely is emergent speciation to occur in nature? One recent study (Corbett-Detig et al., 2013) found evidence that multiple simple DMIs involving loci with high r are currently segregating within natural populations of Drosophila melanogaster. Corbett-Detig and colleagues surveyed a large panel of recombinant inbred lines (RILs) (Corbett-Detig et al., 2013). They found 22 incompatible pairs of alleles at unlinked loci in the RILs; of the 44 alleles, 27 were shared by two or more RILs, indicating that multiple DMIs are polymorphic within natural populations (Corbett-Detig et al., 2013). They also found evidence for multiple DMIs in RIL panels in Arabidopsis and maize (Corbett-Detig et al., 2013). Corbett-Detig and colleagues did not attempt to identify DMIs among linked loci or complex DMIs and therefore are likely to have underestimated the actual number and complexity of DMIs in the RILs. These observations suggest that the conditions for emergent speciation by multiple DMIs may indeed occur in nature, although the resulting neutral networks remain to be discovered.
There is strong evidence that DMIs contribute to reproductive isolation between closely related species, but it is difficult to determine the extent to which these DMIs actually caused speciation or are simply a by-product of divergence after speciation had occurred by other means (Presgraves, 2010a,b; Maheshwari and Barbash, 2011; Seehausen et al., 2014). One prediction of the emergent speciation hypothesis is that, if multiple DMIs contribute to speciation then DMIs fixed between species should have higher order (ω), on average, than DMIs segregating within species. Recent surveys have concluded that complex DMIs, as well as other forms of high-order epistasis, are widespread (Presgraves, 2010a; Weinreich et al., 2013; Fraïsse et al., 2014), but a systematic comparison between the complexity of DMIs in divergence and polymorphism remains to be carried out.
The neutral network model includes two central assumptions: neutrality within the network and complete unfitness outside it. Both assumptions are plausible in the case of speciation by reciprocal degeneration or loss of duplicate genes. Gene duplication followed by reciprocal degeneration or loss of duplicate copies in different lineages can act just like a DMI (Werth and Windham, 1991; Lynch and Force, 2000b), despite not involving an epistatic interaction (Nei and Nozawa, 2011). If the duplicates are essential genes, then genotypes carrying insufficient functional copies will be completely unfit. Gene duplications, degenerations and losses are common (Force et al., 1999; Lynch and Conery, 2000; Nei and Nozawa, 2011) and a substantial fraction of gene degenerations and losses are likely to be effectively neutral (Force et al., 1999; Lynch and Force, 2000a; Lynch and Conery, 2000). Following whole genome duplications, multiple gene degenerations or losses occur (Force et al., 1999; Lynch and Force, 2000a; Scannell et al., 2006; Nei and Nozawa, 2011), and the duplicates tend to be unlinked. Thus, we predict that emergent speciation will play a major role in speciation by reciprocal degeneration or loss of duplicate genes. This form of speciation appears to have contributed to the diversification of yeasts (Scannell et al., 2006).
The assumption of “in-network” neutrality is challenged by evidence that many DMI loci have experienced positive selection during their evolutionary history (Presgraves, 2010a,b; Maheshwari and Barbash, 2011). However, the neutral network model could still apply to some of those cases for two reasons. First, emergent speciation is robust to some variation in fitness among the genotypes in a neutral network (Figure 5—figure supplement 3). Second, neutral networks may approximate more complex scenarios where selection is weak or variable over time and/or space, or population sizes are small (Gavrilets, 2004). The assumption that “out-of-network” genotypes are completely unfit is contradicted by the observation that many DMIs cause only partial loss of fitness (Presgraves, 2003; Corbett-Detig et al., 2013; Schumer et al., 2014). However, our results also apply to partial DMIs. As long as the disadvantage of “falling off” the neutral network is substantial, partial DMIs are still expected to lead to the evolution of stable, albeit weaker, reproductive barriers (Figure 5—figure supplement 4). We conclude that emergent speciation is a robust mechanism that should operate under a broad range of conditions that violate the two central assumptions of the neutral network model.
The best studied examples of DMIs are in diploids (Presgraves, 2010b; Maheshwari and Barbash, 2011). Our model assumes haploidy, which means that it is mathematically equivalent to a diploid model where the incompatible haplotypes cause dominant incompatibilities, but where the same diploid genotypes involving cis and trans allele combinations (e.g., Ab/aB and ab/AB) may have different fitnesses. The latter is rare, and the former is unrealistic: DMIs in diploids tend to be recessive (Presgraves, 2003; Masly and Presgraves, 2007). Nevertheless, diploidy is likely to facilitate emergent speciation for three reasons. First, segregation in diploids has many of the same consequences as high recombination in haploids, regardless of the rate of recombination among linked loci (Otto, 2003). Second, diploids can show much stronger reproductive isolation than haploids. Strong reproductive isolation in haploids requires that a large proportion of recombinants carry incompatible combinations of alleles. This can only be achieved with large numbers of DMI loci and high recombination rate between them. In contrast, single DMIs can cause dramatic loss of fitness in F1 hybrids in diploids (Presgraves, 2010b; Maheshwari and Barbash, 2011). Third, diploidy may allow patterns of DMI interaction that increase the probability of stochastic shifts between stable equilibria (Wagner et al., 1994; Gavrilets, 2004).
Recombination does oppose emergent speciation in neutral networks in one crucial way: it reduces the probability of a stochastic shift (PS) between stable equilibria (Figures 5F and 7F). PS also appears to decrease with the strength of the reproductive barrier at equilibrium (Figures 5F and 7F). Similar observations have been made in other models (Figure 2F) (Wagner et al., 1994; Barton, 1996; Gavrilets, 2004), leading many to conclude that genetic drift alone cannot cause speciation (Barton, 1996; Seehausen et al., 2014). It does not follow, however, that emergent speciation is unlikely. Shifts between stable equilibria might be facilitated by transient changes in selection (Barton, 1996). Alternatively, populations could diverge in allopatry as envisaged in traditional DMI models (Dobzhansky, 1937; Muller, 1942; Orr, 1995).
Our results have broader implications for evolutionary theory. The neutral network model was originally developed in the context of RNA and protein sequence evolution (Lipman and Wilbur, 1991; Schuster et al., 1994; Huynen et al., 1996), and has played an important role in the study of the evolution of robustness and evolvability (Huynen et al., 1996; van Nimwegen et al., 1999; Ancel and Fontana, 2000; Wagner, 2008; Draghi et al., 2010). One limitation of much of this work is that it has been conducted using the asexual version of the neutral network model. Our finding that recombination promotes the appearance of multiple stable equilibria in neutral networks has clear implications for the evolution of robustness and evolvability that deserve further investigation. For example, Wagner (2011) has argued that recombination helps explore genotype space because it causes greater genotypic change than mutation. However, our results suggest that, depending on the structure of the neutral network, large sexual populations can get trapped in stable equilibria, therefore restricting their ability to explore genotype space.
We have found that multiple neutral DMIs can cause emergent speciation and that the conditions that promote emergent speciation are likely to occur in natural populations. We conclude that the interaction between DMIs may be a root cause of the origin of species. Continued efforts to detect DMIs (Payseur and Hoekstra, 2005; Masly and Presgraves, 2007; Schumer et al., 2014) and to reconstruct real neutral networks (Lee et al., 1997; Jiménez et al., 2013) will be crucial to evaluating the reality and importance of emergent speciation.
Materials and methods
Neutral network model
Organisms are haploid and carry $L$ loci with effects on fitness. Each locus can have one of $\alpha$ alleles. Out of the possible $\alpha^L$ genotypes, $K$ are fit, with equal fitness, and the remaining genotypes are completely unfit. The $K$ genotypes define a neutral network, where genotypes are connected if one genotype can be obtained from the other through a single mutation (i.e., they differ at a single locus).
Random neutral networks
Ensembles of random neutral networks were analyzed. Random neutral networks were generated by sampling $K$ genotypes at random from the $\alpha^L$ possible genotypes available (without replacement) and retaining the resulting network if it was connected.
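The sampling procedure just described can be sketched in a few lines of Python. This is a minimal illustration rather than the code used in the study: the function names, the retry limit `max_tries`, and the representation of genotypes as tuples of allele indices are all assumptions made for the example.

```python
import itertools
import random

def hamming(g, h):
    """Number of loci at which two genotypes carry different alleles."""
    return sum(a != b for a, b in zip(g, h))

def is_connected(genotypes):
    """True if every genotype can be reached from the first one
    through a chain of single-locus mutations within the set."""
    genotypes = list(genotypes)
    seen = {genotypes[0]}
    stack = [genotypes[0]]
    while stack:
        g = stack.pop()
        for h in genotypes:
            if h not in seen and hamming(g, h) == 1:
                seen.add(h)
                stack.append(h)
    return len(seen) == len(genotypes)

def random_neutral_network(L, alpha, K, rng, max_tries=10_000):
    """Sample K fit genotypes without replacement from the alpha**L
    possible genotypes and keep the sample only if it is connected.
    max_tries is a hypothetical safeguard, not part of the method."""
    all_genotypes = list(itertools.product(range(alpha), repeat=L))
    for _ in range(max_tries):
        fit = rng.sample(all_genotypes, K)
        if is_connected(fit):
            return sorted(fit)
    raise RuntimeError("no connected network found; try a larger K")

# Example: 5 fit genotypes out of the 2**3 = 8 possible ones.
network = random_neutral_network(L=3, alpha=2, K=5, rng=random.Random(1))
```

Rejection sampling of this kind becomes slow when $K$ is small relative to $\alpha^L$, because most random samples are then disconnected.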
Neutral networks specified by DMIs
To investigate the effect of the order ($\omega$) of a DMI, ensembles of neutral networks were generated by sampling combinations of $d$ random DMIs with pre-specified values of $\omega$ between alleles at $L$ diallelic loci (see Figure 4 for more details). Following Orr (1995), one allele at each locus was considered to be ancestral and compatible with other ancestral alleles, and no DMIs were allowed where all the $\omega$ incompatible alleles were ancestral.
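As an illustration of how a set of DMIs specifies a neutral network, the sketch below encodes each DMI as a set of (locus, allele) pairs and treats a genotype as fit unless it carries every allele of at least one DMI. The encoding (0 for the ancestral allele, 1 for the derived allele), the function names, and the requirement that the $d$ DMIs be distinct are assumptions made for this example; the details of the actual procedure are given in Figure 4.

```python
import itertools
import math
import random

def random_dmi(L, omega, rng):
    """One DMI of order omega: (locus, allele) pairs at omega distinct
    loci, rejecting the combination in which all alleles are ancestral (0)."""
    loci = rng.sample(range(L), omega)
    while True:
        alleles = [rng.randint(0, 1) for _ in loci]
        if any(alleles):
            return frozenset(zip(loci, alleles))

def random_dmis(L, omega, d, rng):
    """d distinct random DMIs of order omega (distinctness is an assumption)."""
    possible = math.comb(L, omega) * (2 ** omega - 1)
    if d > possible:
        raise ValueError("more DMIs requested than exist")
    dmis = set()
    while len(dmis) < d:
        dmis.add(random_dmi(L, omega, rng))
    return sorted(dmis, key=sorted)

def dmi_network(L, dmis):
    """Fit genotypes: those that do not carry all alleles of any DMI."""
    fit = []
    for g in itertools.product((0, 1), repeat=L):
        carried = set(enumerate(g))
        if not any(dmi <= carried for dmi in dmis):
            fit.append(g)
    return fit

# The two-locus neutral DMI of Figure 1A: derived A and derived B are incompatible.
simple = dmi_network(2, [frozenset({(0, 1), (1, 1)})])
```

Because no DMI consists solely of ancestral alleles, the all-ancestral genotype is always fit under this encoding.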
Network statistics
Algebraic connectivity
Second smallest eigenvalue of the Laplacian matrix of the network (Newman, 2010). Abbreviated as AC in Figure 3—figure supplement 3. Calculated using NetworkX (Hagberg et al., 2008).
Average degree and variance in degree
Mean and variance of the degree distribution, respectively (Newman, 2010). The degree of a genotype is the number of its fit mutational neighbors. Calculated using NetworkX (Hagberg et al., 2008).
Average Hamming distance
Average number of loci at which pairs of genotypes carry different alleles. Genotypes connected in the neutral network are at a Hamming distance of 1.
Average shortest path length
Average number of steps along the shortest path between pairs of genotypes (Newman, 2010). Abbreviated as PL in Figure 3—figure supplement 3. Calculated using NetworkX (Hagberg et al., 2008).
Degree assortativity
A measure of the correlation of the degree of linked genotypes (Newman, 2010). Abbreviated as DA in Figure 3—figure supplement 3 and Table 1. Calculated using NetworkX (Hagberg et al., 2008).
Estrada index
A centrality measure (Estrada and Rodríguez-Velázquez, 2005). Calculated using NetworkX (Hagberg et al., 2008).
Modularity
A measure of the extent to which the network displays community structure (Newman, 2006). A community is a group of densely interconnected nodes showing relatively few connections to nodes outside the community. Abbreviated as Q in Figure 3—figure supplement 3. Calculated using igraph (Csárdi and Nepusz, 2006) based on an exhaustive search over all possible partitions of the network.
Spectral radius
Leading eigenvalue of adjacency matrix (Newman, 2010). Measures the mean degree of a population at equilibrium in the absence of recombination (van Nimwegen et al., 1999). Abbreviated as SR in Figure 3—figure supplement 3 and Table 1. Calculated using NumPy (Oliphant, 2007).
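Two of the simpler statistics listed above, the degree moments and the average Hamming distance, can be computed directly from the list of fit genotypes. The sketch below uses only the standard library rather than NetworkX; the function names are hypothetical, and the use of the population variance (dividing by $K$ rather than $K - 1$) is an assumption.

```python
from itertools import combinations

def hamming(g, h):
    """Number of loci at which two genotypes carry different alleles."""
    return sum(a != b for a, b in zip(g, h))

def degree_stats(network):
    """Mean and (population) variance of the number of fit mutational
    neighbors per genotype."""
    degrees = [sum(hamming(g, h) == 1 for h in network) for g in network]
    mean = sum(degrees) / len(degrees)
    var = sum((d - mean) ** 2 for d in degrees) / len(degrees)
    return mean, var

def average_hamming_distance(network):
    """Average Hamming distance over all unordered pairs of fit genotypes."""
    pairs = list(combinations(network, 2))
    return sum(hamming(g, h) for g, h in pairs) / len(pairs)

# Two-locus neutral DMI network: ab, aB and Ab are fit, AB is not.
network = [(0, 0), (0, 1), (1, 0)]
mean_degree, var_degree = degree_stats(network)
mean_distance = average_hamming_distance(network)
```

For this three-genotype network the ancestral genotype has two fit neighbors and each derived genotype has one, so the mean degree is 4/3.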
Evolution
Evolution on a neutral network was modeled by considering an infinite-sized population of haploid organisms reproducing sexually in discrete generations. The state of the population is given by a vector of frequencies $\mathbf{p} = (p_1, p_2, \ldots, p_K)$, where $p_i$ is the frequency of genotype $i$. Genotypes outside the network are ignored because they are completely inviable (van Nimwegen et al., 1999). Individuals mate at random with respect to genotype to form a transient diploid that undergoes meiosis to produce haploid descendants. Selection takes place during the haploid phase. Mating, recombination, mutation and selection cause the population to evolve according to equation (1), where $\mathbf{p}_t$ is the state of the population at generation $t$, $M$ is the mutation matrix such that entry $M_{ij}$ is the mutation rate from genotype $i$ to genotype $j$ per generation, and $\mathbf{R}$ is a vector of recombination matrices such that entry $R^g_{ij}$ of matrix $R^g$ is the probability that a mating between individuals of genotypes $i$ and $j$ generates an individual offspring of genotype $g$. The diagonal elements of $M$ ($M_{ii}$) represent the probability that genotype $i$ does not mutate (including to unfit genotypes outside the neutral network). Values of $M_{ij}$ are set by assuming that each locus mutates with probability $u$ and that a genotype can only mutate simultaneously at up to a certain number of loci. Up to $L - 1$ crossover events can occur between two genotypes, each with probability $0 \le r \le 0.5$ per interval between adjacent loci. The recombination rate $r$ is the same for all pairs of adjacent loci. If $r = 0.5$, then there is free recombination between all loci.
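One generation of this life cycle can be sketched as follows. The code is an illustration under stated assumptions, not a transcription of the recursion used in the study: it assumes diallelic loci, at most one mutated locus per genotype per generation, and that mutation acts before mating and recombination. All function names are hypothetical.

```python
import itertools

def mutation_matrix(network, u):
    """M[i][j]: probability that genotype i becomes genotype j.
    Assumption: at most one locus mutates per generation."""
    L = len(network[0])
    M = []
    for g in network:
        row = []
        for h in network:
            d = sum(a != b for a, b in zip(g, h))
            if d == 0:
                row.append((1 - u) ** L)            # no locus mutates
            elif d == 1:
                row.append(u * (1 - u) ** (L - 1))  # exactly one locus mutates
            else:
                row.append(0.0)
        M.append(row)
    return M

def offspring_distribution(g, h, r):
    """Probability of each haploid offspring of parents g and h, with an
    independent crossover (probability r) in every interval between loci."""
    L = len(g)
    dist = {}
    for start in (0, 1):  # parent that contributes the first locus
        for crossovers in itertools.product((0, 1), repeat=L - 1):
            prob, parent = 0.5, start
            child = [(g, h)[parent][0]]
            for k, c in enumerate(crossovers, start=1):
                prob *= r if c else 1 - r
                parent ^= c  # switch template after a crossover
                child.append((g, h)[parent][k])
            child = tuple(child)
            dist[child] = dist.get(child, 0.0) + prob
    return dist

def next_generation(p, network, M, r):
    """Mutation, random mating, recombination, then selection
    (unfit offspring are discarded and frequencies renormalized)."""
    K = len(network)
    q = [sum(p[i] * M[i][j] for i in range(K)) for j in range(K)]
    index = {g: k for k, g in enumerate(network)}
    new = [0.0] * K
    for i, j in itertools.product(range(K), repeat=2):
        pairs = offspring_distribution(network[i], network[j], r).items()
        for child, prob in pairs:
            if child in index:
                new[index[child]] += q[i] * q[j] * prob
    total = sum(new)
    return [x / total for x in new]
```

With free recombination ($r = 0.5$), a cross between the genotypes encoded as `(1, 0)` and `(0, 1)` yields each of the four two-locus haplotypes with probability 1/4, so a fraction $r/2$ of the offspring carries both derived alleles.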
Equilibria
Given a neutral network, population genetic parameters $u$ and $r$, and a set of initial genotype frequencies $\mathbf{p}_0$, the population was allowed to evolve until the root-mean-square deviation of the genotype frequencies in consecutive generations fell below a fixed threshold. The final genotype frequencies were identified as an equilibrium $\hat{\mathbf{p}}$. Multiple initial conditions were used: (i) populations fixed for each of the $K$ genotypes in turn, and (ii) four independent sets of random frequencies. Two equilibria were judged identical if their genotype frequencies agreed to within a fixed tolerance. Only one of a set of identical equilibria was counted. This procedure does not guarantee the discovery of all equilibria; indeed, it likely underestimates the number of unstable equilibria.
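The search for equilibria amounts to iterating the one-generation map until consecutive states are indistinguishable, then discarding duplicates. The sketch below shows that logic for an arbitrary map `step`. The thresholds `1e-12` and `1e-6`, the use of the root-mean-square deviation to compare equilibria, and the toy two-genotype mutation model used in the example are placeholders chosen for illustration and are not values taken from the study.

```python
import math

def rmsd(p, q):
    """Root-mean-square deviation between two frequency vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)) / len(p))

def evolve_to_equilibrium(step, p0, tol=1e-12, max_generations=1_000_000):
    """Iterate p -> step(p) until consecutive generations differ by less
    than tol (a placeholder threshold)."""
    p = list(p0)
    for _ in range(max_generations):
        nxt = step(p)
        if rmsd(p, nxt) < tol:
            return nxt
        p = nxt
    raise RuntimeError("no convergence within max_generations")

def distinct_equilibria(equilibria, tol=1e-6):
    """Keep one representative of each group of equilibria that agree
    to within tol (a placeholder tolerance)."""
    unique = []
    for e in equilibria:
        if all(rmsd(e, kept) >= tol for kept in unique):
            unique.append(e)
    return unique

def toy_step(p, u=0.1):
    """Toy map: two genotypes with symmetric mutation at rate u."""
    return [(1 - u) * p[0] + u * p[1], u * p[0] + (1 - u) * p[1]]

# Start from populations fixed for each genotype in turn.
found = [evolve_to_equilibrium(toy_step, p0) for p0 in ([1.0, 0.0], [0.0, 1.0])]
unique = distinct_equilibria(found)
```

In the toy model both starting points converge to the same equilibrium, so only one equilibrium is counted.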
Stability analysis
For each equilibrium $\hat{\mathbf{p}}$, the eigenvalues $\lambda_i$ of the Jacobian matrix of the recursion in equation (1) were calculated at $\hat{\mathbf{p}}$. If $|\lambda_i| < 1$ for every $i$ (to within a tolerance of $10^{-8}$), the equilibrium was judged to be stable.
Gene flow
Gene flow was modeled as symmetric migration between two populations. Migration occurs at the beginning of each generation, such that a proportion m of each population is composed of immigrants from the other population. Then random mating, recombination and mutation take place within each population, as described above.
Stepping-stone model
A stepping-stone model (Kimura, 1952) was used to measure the rate of spread of a neutral allele across a reproductive barrier (Barton and Bengtsson, 1986). A number $n$ of populations are arranged in a line. Every generation a proportion $2m$ of a population emigrates to its two neighboring populations (except populations 1 and $n$, which have only one neighbor, so only a proportion $m$ of each of them emigrates) (see Figure 6A). Note that, unlike the stepping-stone model studied by Gavrilets (1997), our implementation allows the genotype frequencies of terminal populations (1 and $n$) to vary. When $n = 2$ this model reduces to the gene flow model described in the previous section.
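The migration step of the stepping-stone model can be written compactly. The following sketch assumes that all populations have the same size, so that a proportion $m$ of each population is replaced by immigrants from each of its neighbors; the function name and the data layout (a list of genotype-frequency vectors ordered along the line) are hypothetical.

```python
def migrate(populations, m):
    """Apply one round of nearest-neighbor migration.

    populations: list of genotype-frequency vectors, ordered along the line.
    Interior populations exchange a proportion m with each of two neighbors;
    terminal populations exchange a proportion m with their single neighbor.
    """
    n = len(populations)
    result = []
    for k, p in enumerate(populations):
        neighbors = [populations[j] for j in (k - 1, k + 1) if 0 <= j < n]
        stay = 1 - m * len(neighbors)  # proportion of residents after migration
        result.append([
            stay * x + m * sum(q[i] for q in neighbors)
            for i, x in enumerate(p)
        ])
    return result

# With n = 2 the step reduces to symmetric migration between two populations.
two = migrate([[1.0, 0.0], [0.0, 1.0]], m=0.1)
three = migrate([[1.0, 0.0], [1.0, 0.0], [0.0, 1.0]], m=0.1)
```

Migration is followed within each population by random mating, recombination and mutation, which are not shown here.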
Genetic differentiation
$G_{ST} = 1 - H_S/H_T$ was used to measure the genetic differentiation between two populations at a locus, where $H_S$ is the average gene diversity of the two populations, and $H_T$ is the gene diversity of a population constructed by pooling the two populations (Nei, 1976). The gene diversity of a population at a locus is defined as $H = 1 - \sum_i q_i^2$, where $q_i$ is the frequency of allele $i$. Values of $G_{ST}$ can vary between 0 (two populations with the same allele frequencies) and 1 (two populations fixed for different alleles). The overall genetic differentiation between two populations was quantified as the average $G_{ST}$ over all loci. If all genotypes in the neutral network contain the same allele at a locus, that locus is excluded from the calculation of average $G_{ST}$.
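A short worked example may help fix the definition. The sketch below computes the differentiation statistic at a single locus from the allele frequencies of two populations, taking gene diversity to be one minus the sum of squared allele frequencies. It assumes that the two populations contribute equally to the pooled population, and it returns 0 when the pooled population is monomorphic; both choices, like the function names, are assumptions made for illustration.

```python
def gene_diversity(q):
    """Gene diversity at a locus: 1 minus the sum of squared allele frequencies."""
    return 1.0 - sum(x * x for x in q)

def g_st(q1, q2):
    """Differentiation between two populations at one locus.

    q1, q2: allele-frequency vectors of the two populations.
    Assumes equal population sizes when pooling.
    """
    h_s = (gene_diversity(q1) + gene_diversity(q2)) / 2
    pooled = [(a + b) / 2 for a, b in zip(q1, q2)]
    h_t = gene_diversity(pooled)
    if h_t == 0:
        return 0.0  # convention chosen here for a monomorphic pooled population
    return 1.0 - h_s / h_t

same = g_st([0.5, 0.5], [0.5, 0.5])      # identical allele frequencies
fixed = g_st([1.0, 0.0], [0.0, 1.0])     # fixed for different alleles
partial = g_st([0.9, 0.1], [0.1, 0.9])   # partially differentiated
```

In the third case the within-population diversity is 0.18 and the pooled diversity is 0.5, giving a value of 0.64.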
Reproductive isolation
The degree of reproductive isolation is defined as (Barton, 1996; Palmer and Feldman, 2009): $I = 1 - \bar{w}_H/\bar{w}_S$, where $\bar{w}_H$ is the mean fitness of haploid F1 hybrid offspring from crosses between individuals from the two populations, and $\bar{w}_S$ is the average of the mean fitnesses of the individual populations. The calculation of $\bar{w}_H$ and $\bar{w}_S$ only takes into account the contribution of recombination, and ignores mutation. Values of $I$ can vary between 0 (the populations are undifferentiated or $r = 0$) and 1 (all F1 hybrids are unfit).
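The calculation can be illustrated with the two-locus neutral DMI of Figure 1A. The sketch below assumes that reproductive isolation takes the form of one minus the ratio of mean hybrid fitness to the average within-population mean fitness, which is consistent with the stated limits of 0 and 1; that form, the function names, and the representation of populations as dictionaries mapping genotypes to frequencies are assumptions of this example.

```python
import itertools

def offspring_distribution(g, h, r):
    """Probability of each haploid offspring of parents g and h, with an
    independent crossover (probability r) in every interval between loci."""
    dist = {}
    for start in (0, 1):  # parent that contributes the first locus
        for crossovers in itertools.product((0, 1), repeat=len(g) - 1):
            prob, parent = 0.5, start
            child = [(g, h)[parent][0]]
            for k, c in enumerate(crossovers, start=1):
                prob *= r if c else 1 - r
                parent ^= c
                child.append((g, h)[parent][k])
            child = tuple(child)
            dist[child] = dist.get(child, 0.0) + prob
    return dist

def mean_offspring_fitness(pop_a, pop_b, fit, r):
    """Fraction of fit offspring from random matings between pop_a and pop_b
    (recombination only; mutation is ignored)."""
    w = 0.0
    for (g, fg), (h, fh) in itertools.product(pop_a.items(), pop_b.items()):
        for child, prob in offspring_distribution(g, h, r).items():
            if child in fit:
                w += fg * fh * prob
    return w

def reproductive_isolation(pop1, pop2, fit, r):
    """Assumed form: I = 1 - w_H / w_S."""
    w_h = mean_offspring_fitness(pop1, pop2, fit, r)
    w_s = (mean_offspring_fitness(pop1, pop1, fit, r)
           + mean_offspring_fitness(pop2, pop2, fit, r)) / 2
    return 1.0 - w_h / w_s

fit = {(0, 0), (0, 1), (1, 0)}            # ab, aB and Ab are fit; AB is unfit
pop_Ab, pop_aB = {(1, 0): 1.0}, {(0, 1): 1.0}
isolation = reproductive_isolation(pop_Ab, pop_aB, fit, r=0.5)
```

For populations fixed for the two derived alleles, the sketch recovers the fraction $r/2$ of unfit F1 hybrids noted in the Introduction.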
Founder effect speciation
To simulate a founder event (Templeton, 1980; Carson and Templeton, 1984), a new population is founded from a sample of $N_0$ individuals from an infinite-sized population at one stable equilibrium. The population is then allowed to grow according to the equation $N_t = \lambda^t N_0$, where $t$ is the generation number and $\lambda$ is the finite rate of increase. At generation $t$, the expected vector of genotype frequencies is calculated using equation (1) and a random sample of size $N_t$ is drawn from a multinomial distribution with probabilities given by those expected frequencies. Once the population reaches $N_t > 10^4$ individuals, it is allowed to evolve deterministically to equilibrium. If the population evolves to a different equilibrium from that of the source population, it is counted as a shift.
For the adaptive landscape in Figure 2A, every simulation run was evolved to equilibrium. For the neutral network in Figure 5A, only simulation runs where at least one of the $N_0$ founder individuals carried the A1B2 haplotype were evolved to equilibrium. Similarly, for the neutral network in Figure 7A, only simulation runs where at least one of the $N_0$ founder individuals carried either the D allele or the abde haplotype were evolved to equilibrium. Thus, the estimates of the probability that a founder event causes a stochastic shift ($P_S$) for the neutral networks in Figures 5A and 7A slightly underestimate the true value because mutations occurring early during the population expansion phase could cause a shift.
To estimate the probability that a founder event causes a stochastic shift ($P_S$), as many tries ($\nu$) as required to get $\sigma$ successful shifts were run. The following unbiased estimator was used: $\hat{P}_S = (\sigma - 1)/(\nu - 1)$. 95% confidence intervals were calculated by parametric bootstrapping: for each estimate of $P_S$, $10^6$ random samples of $\sigma$ values from the negative binomial distribution with probability of success $P_S$ were generated and $P_S$ was recalculated; the confidence intervals were estimated as the 2.5% and 97.5% quantiles of the distribution of simulated $P_S$ values.
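The estimation procedure can be sketched as follows. The code assumes the standard unbiased estimator for inverse (negative binomial) sampling, $(\sigma - 1)/(\nu - 1)$, and uses far fewer bootstrap replicates than the $10^6$ described above so that it runs quickly. The function names, the nearest-rank quantile rule, and the default of 10,000 replicates are assumptions made for illustration.

```python
import math
import random

def estimate_ps(sigma, nu):
    """Unbiased estimate of the success probability when sampling continues
    until sigma successes are observed in nu tries (requires sigma >= 2)."""
    return (sigma - 1) / (nu - 1)

def trials_until(sigma, p, rng):
    """Simulate the number of tries needed to obtain sigma successes,
    as a sum of sigma geometric waiting times (inverse-transform sampling)."""
    if p >= 1.0:
        return sigma
    total = 0
    for _ in range(sigma):
        u = 1.0 - rng.random()  # uniform on (0, 1], avoids log(0)
        total += int(math.log(u) / math.log(1.0 - p)) + 1
    return total

def bootstrap_ci(sigma, nu, n_boot=10_000, seed=0):
    """Parametric bootstrap 95% interval for the shift probability."""
    rng = random.Random(seed)
    p_hat = estimate_ps(sigma, nu)
    sims = sorted(
        estimate_ps(sigma, trials_until(sigma, p_hat, rng))
        for _ in range(n_boot)
    )
    lower = sims[int(0.025 * (n_boot - 1))]  # nearest-rank quantiles
    upper = sims[int(0.975 * (n_boot - 1))]
    return lower, upper

p_hat = estimate_ps(sigma=10, nu=100)
lower, upper = bootstrap_ci(sigma=10, nu=100, n_boot=2000)
```

Fixing the number of successes rather than the number of tries keeps the relative precision of the estimate roughly constant across parameter values, which is convenient when the shift probability spans several orders of magnitude.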
Acknowledgments
We thank N. Barton, T. Cooper, J. Cuesta, J. Krug, A. Kalirad, S. Manrubia, I. Nemenman, and D. Weissman for helpful discussions. A. Kalirad and I. Patanam contributed to coding, testing, and documentation.
Additional information
Funding
Competing interests
The authors declare no competing financial interests.
Author contributions
The project was initiated by TP and RBRA. TP wrote preliminary code and collected preliminary data. This study was conceived, performed and interpreted primarily by RBRA, with contributions from both TP and KEB. The manuscript was written primarily by RBRA, with contributions from both TP and KEB.