Abstract
Even if a species’ phenotype does not change over evolutionary time, the underlying mechanism may change, as distinct molecular pathways can realize identical phenotypes. Here we use quantitative genetics and linear system theory to study how a gene network underlying a conserved phenotype evolves, as the genetic drift of small changes to these molecular pathways causes a population to explore the set of mechanisms with identical phenotypes. To do this, we model an organism’s internal state as a linear system of differential equations for which the environment provides input and the phenotype is the output, in which context there exists an exact characterization of the set of all mechanisms that give the same input–output relationship. This characterization implies that selectively neutral directions in genotype space should be common and that the evolutionary exploration of these distinct but equivalent mechanisms can lead to the reproductive incompatibility of independently evolving populations. This evolutionary exploration, or system drift, proceeds at a rate proportional to the amount of intrapopulation genetic variation divided by the effective population size (Ne). At biologically reasonable parameter values this process can lead to substantial interpopulation incompatibility, and thus speciation, in fewer than Ne generations. This model also naturally predicts Haldane’s rule, thus providing another possible explanation of why heterogametic hybrids tend to be disrupted more often than homogametic hybrids during the early stages of speciation.
Introduction
It is an overarching goal of many biological subdisciplines to attain a general understanding of the function and evolution of the complex molecular machinery that translates an organism’s genome into the characteristics on which natural selection acts, the phenotype. For example, there is a growing body of data on the evolutionary histories and molecular characterizations of particular gene regulatory networks [Jaeger, 2011, Davidson and Erwin, 2006, Israel et al., 2016], as well as thoughtful verbal and conceptual models [True and Haag, 2001, Weiss and Fullerton, 2000, Edelman and Gally, 2001, Pavlicev and Wagner, 2012]. Mathematical models of both particular regulatory networks and the evolution of such systems in general can provide guidance where intuition fails, and thus have the potential to discover general principles in the organization of biological systems as well as to provide concrete numerical predictions [Servedio et al., 2014]. There is a substantial amount of work studying the evolution of gene regulatory networks, in both abstract [Wagner, 1994, 1996, Siegal and Bergman, 2002, Bergman and Siegal, 2003, Draghi and Whitlock, 2015] and empirically inspired computational [Mjolsness et al., 1991, Jaeger et al., 2004, Kozlov et al., 2015, Crombach et al., 2016, Wotton et al., 2015, Chertkova et al., 2017] frameworks.
It is well known that in many contexts mathematical models can fundamentally be nonidentifiable and/or indistinguishable, meaning that there can be uncertainty about an inferred model’s parameters or even its claims about causal structure, despite access to complete and perfect data [Bellman and Åström, 1970, Grewal and Glover, 1976, Walter et al., 1984]. Models with different parameter schemes, or even different mechanics, can make equally accurate predictions and still not reflect the internal dynamics of the system being modelled. In control theory, where electrical circuits and mechanical systems are often the focus, it is understood that there can be an infinite number of “realizations”, or ways to reverse engineer the dynamics of a “black box”, even if all possible input and output experiments on the “black box” are performed [Kalman, 1963, Anderson et al., 1966, Zadeh and Desoer, 1976]. The inherent nonidentifiability of chemical reaction networks is sometimes referred to as “the fundamental dogma of chemical kinetics” [Craciun and Pantea, 2008]. In computer science, this is framed as the relationship among processes that simulate one another [Van der Schaft, 2004]. Finally, the field of inverse problems studies those cases in which, despite the existence of a theoretical one-to-one mapping between a model and behavior, tiny amounts of noise make inference problems nonidentifiable in practice [Petrov and Sizikov, 2005].
Nonidentifiability is a major barrier to mechanistic understanding of real systems, but viewed from another angle, this concept can provide a starting point for thinking about externally equivalent systems: systems that evolution can explore, so long as the parameters and structures can be realized biologically. These functional symmetries manifest in convergent and parallel evolution, as well as in developmental system drift: the observation that macroscopically identical phenotypes in even very closely related species can in fact be divergent at the molecular and sequence level [Kimura, 1981, True and Haag, 2001, Tanay et al., 2005, Tsong et al., 2006, Hare et al., 2008, Lavoie et al., 2010, Vierstra et al., 2014, Matsui et al., 2015, Dalal et al., 2016, Dalal and Johnson, 2017].
In this paper we outline a theoretical framework to study the evolution of biological systems, such as gene regulatory networks. We study the evolution of an optimally adapted population under stable conditions, that is, where the phenotype is under stabilizing selection. Even if the phenotype remains stable over evolutionary time, the underlying mechanism might not, as many distinct (and mutationally connected) molecular pathways can realize identical phenotypes. Below, we first apply results from system theory that give an analytical description of the set of all linear gene network architectures that yield identical phenotypes. Since these phenotypically equivalent gene networks are not necessarily compatible with one another, system drift may result in reproductive incompatibility between sufficiently isolated populations, even in the absence of any sort of adaptive, selective, or environmental change. We then use quantitative genetic theory to estimate how quickly reproductive incompatibility due to system drift manifests.
It is not a new observation that there is often more than one way to do the same thing, or that speciation can be the result of (nearly) neutral processes. The potential for speciation has been analyzed in models of traits under stabilizing selection determined additively by alleles at many loci [Wright, 1935, Barton, 1986, 2001], in related fitness landscape models [Fraïsse et al., 2016], and for pairs of traits that must match but whose value is unconstrained [Sved, 1981]. It has also been shown that population structure can allow long-term stable coexistence of incompatible genotypes encoding identical phenotypes [Phillips, 1996]. However, previous simulations of system drift in regulatory sequences [Tulchinsky et al., 2014] and in a regulatory cascade [Porter and Johnson, 2002] found that speciation could be rapid under directional selection, but found only equivocal support for speciation under purely neutral drift. Our model differs from previous work in that we use linear systems theory both to explore a much richer class of regulatory network models and to provide analytical expectations for large populations with complex phenotypes that would be inaccessible to population simulations.
Results
Using gene regulatory network dynamics as the basis for our models, we begin with a collection of n coregulating molecules, such as transcription factors, as well as external or environmental inputs. We write κ(t) for the vector of n molecular concentrations at time t. The vector of m “inputs” determined exogenously to the system is denoted u(t), and the vector of ℓ “outputs” is denoted ϕ(t). The output is merely a linear function of the internal state: $\phi_i(t) = \sum_j C_{ij}\kappa_j(t)$ for some matrix C. Since ϕ is what natural selection acts on, we refer to it as the phenotype (meaning the “visible” aspects of the organism), and in contrast refer to κ as the kryptotype, as it is “hidden” from direct selection. Although ϕ may depend on all entries of κ, it is usually of lower dimension than κ, and we tend to think of it as the subset of molecules relevant for survival. The dynamics are determined by the matrix of regulatory coefficients, A, a time-varying vector of inputs u(t), and a matrix B that encodes the effect of each entry of u on the elements of the kryptotype. The rate at which the ith concentration changes is a weighted sum of the concentrations as well as the input, $\dot\kappa_i(t) = \sum_j A_{ij}\kappa_j(t) + \sum_k B_{ik}u_k(t)$, or in matrix form:

$$\begin{aligned} \dot\kappa(t) &= A\kappa(t) + Bu(t), \\ \phi(t) &= C\kappa(t). \end{aligned} \tag{1}$$
Furthermore, we always assume that κ(0) = 0, so that the kryptotype measures deviations from initial concentrations. Here A can be any n × n matrix, B any n × m matrix, and C any ℓ × n matrix, where usually ℓ and m are less than n. We think of the system as the triple (A, B, C), which translates the (time-varying) m-dimensional input u(t) into the ℓ-dimensional output ϕ(t). Under quite general assumptions, we can write the phenotype as

$$\phi(t) = \int_0^t C e^{A(t-s)} B\, u(s)\, ds, \tag{2}$$

which is a convolution of the input u(t) with the system’s impulse response, which we denote as $h(t) \coloneqq Ce^{At}B$.
In terms of gene regulatory networks, Aij determines how the jth transcription factor regulates the ith transcription factor. If Aij > 0, then κj upregulates κi, while if Aij < 0, then κj downregulates κi. The ith row of A is therefore determined by genetic features such as the strength of j-binding sites in the promoter of gene i, factors affecting chromatin accessibility near gene i, or basal transcription machinery activity. The form of B determines how the environment influences transcription factor expression levels, and C might determine the rate of production of downstream enzymes. To demonstrate this approach, we apply it to construct a simple gene network in Example 1 below.
Example 1 (An oscillator)
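For illustration, we consider an extremely simplified model of oscillating gene transcription, as is found, for instance, in cell cycle control or the circadian rhythm. There are two genes, whose transcript concentrations are given by κ1(t) and κ2(t); gene-2 upregulates gene-1, while gene-1 downregulates gene-2 with equal strength. Only the dynamics of gene-1 are consequential to the oscillator (perhaps the amount of gene-1 activates another downstream gene network). Lastly, both genes are equally upregulated by an exogenous signal. The dynamics of the system are described by

$$\begin{aligned} \dot\kappa_1(t) &= \kappa_2(t) + u(t), \\ \dot\kappa_2(t) &= -\kappa_1(t) + u(t), \\ \phi(t) &= \kappa_1(t). \end{aligned}$$

In matrix form, the system’s regulatory coefficients are given as

$$A = \begin{bmatrix} 0 & 1 \\ -1 & 0 \end{bmatrix}, \qquad B = \begin{bmatrix} 1 \\ 1 \end{bmatrix}, \qquad C = \begin{bmatrix} 1 & 0 \end{bmatrix}.$$

Suppose the input is an impulse at time zero (a delta function), so that the phenotype is equal to the impulse response,

$$\phi(t) = h(t) = Ce^{At}B = \begin{bmatrix} 1 & 0 \end{bmatrix} \begin{bmatrix} \cos t & \sin t \\ -\sin t & \cos t \end{bmatrix} \begin{bmatrix} 1 \\ 1 \end{bmatrix} = \sin t + \cos t.$$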
The system and its dynamics are shown in Figure 1. We return to the evolution of such a system below.
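As a numerical check on Example 1, the sketch below evaluates h(t) = C e^{At} B from its Taylor series, whose coefficients are the Markov parameters C A^k B, and compares it with sin t + cos t. It is a minimal pure-Python sketch: the helper names `markov` and `impulse_response` are our own, and truncating the series at K = 60 terms is an assumption that is only adequate for small matrices and moderate t (for larger systems, a matrix exponential routine such as `scipy.linalg.expm` would be the usual choice).

```python
import math

# Example 1: gene 2 upregulates gene 1, gene 1 downregulates gene 2,
# both genes receive the input u, and only gene 1 is observed.
A = [[0.0, 1.0], [-1.0, 0.0]]
B = [1.0, 1.0]  # single input: the one column of B
C = [1.0, 0.0]  # single output: the one row of C


def markov(A, B, C, K):
    """Markov parameters C A^k B for k = 0..K-1 (single input, single output)."""
    params, v = [], list(B)
    for _ in range(K):
        params.append(sum(c * x for c, x in zip(C, v)))
        v = [sum(a * x for a, x in zip(row, v)) for row in A]  # v <- A v
    return params


def impulse_response(A, B, C, t, K=60):
    """h(t) = C e^{At} B from its truncated Taylor series sum_k (C A^k B) t^k / k!."""
    return sum(m * t**k / math.factorial(k) for k, m in enumerate(markov(A, B, C, K)))


print(markov(A, B, C, 4))  # [1.0, 1.0, -1.0, -1.0]
for t in (0.0, 1.0, 2.5, 2 * math.pi):
    print(f"t = {t:.3f}: h(t) = {impulse_response(A, B, C, t):+.6f}, "
          f"sin t + cos t = {math.sin(t) + math.cos(t):+.6f}")
```

The Markov parameters repeat with period four (1, 1, −1, −1), which is exactly the Taylor expansion of sin t + cos t at t = 0.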
Equivalent gene networks
As reviewed above, some systems with identical phenotypes are known to differ, sometimes substantially, at the molecular level; systems with identical phenotypes do not necessarily have identical kryptotypes. How many different mechanisms perform the same function?
Two systems are equivalent if they produce the same phenotype given the same input, i.e., have the same input–output relationship. We say that the systems defined by (A, B, C) and $(\tilde A, \tilde B, \tilde C)$ are phenotypically equivalent if their impulse response functions are the same: $h(t) = \tilde h(t)$ for all t ≥ 0. This implies that for any acceptable input u(t), if $(\kappa_u(t), \phi_u(t))$ and $(\tilde\kappa_u(t), \tilde\phi_u(t))$ are the solutions to equation (1) for these two systems, respectively, then

$$\phi_u(t) = \tilde\phi_u(t) \quad \text{for all } t \ge 0. \tag{3}$$
In other words, phenotypically equivalent systems respond identically for any input.
One way to find other systems phenotypically equivalent to a given one is by change of coordinates: if V is an invertible matrix, then the systems (A, B, C) and $(VAV^{-1}, VB, CV^{-1})$ are phenotypically equivalent because their impulse response functions are equal:

$$CV^{-1} e^{VAV^{-1}t} VB = CV^{-1} V e^{At} V^{-1} V B = Ce^{At}B = h(t).$$
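The identity can also be checked numerically. The sketch below is an illustration with an arbitrarily chosen V and helper names of our own: it transforms the oscillator of Example 1 and compares the Markov parameters C A^k B, which are the Taylor coefficients of h(t) at t = 0. It relies on a standard consequence of the Cayley–Hamilton theorem: two single-input, single-output systems of dimensions n and ñ have identical impulse responses if and only if their first n + ñ Markov parameters agree.

```python
import math

A = [[0.0, 1.0], [-1.0, 0.0]]  # oscillator of Example 1
B = [1.0, 1.0]
C = [1.0, 0.0]


def matmul(X, Y):
    """Product of two matrices stored as lists of rows."""
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y))) for j in range(len(Y[0]))]
            for i in range(len(X))]


def inverse2(V):
    """Inverse of an invertible 2 x 2 matrix."""
    (a, b), (c, d) = V
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]


def markov(A, B, C, K):
    """Markov parameters C A^k B, k = 0..K-1 (single input, single output)."""
    params, v = [], list(B)
    for _ in range(K):
        params.append(sum(c * x for c, x in zip(C, v)))
        v = [sum(a * x for a, x in zip(row, v)) for row in A]
    return params


def change_coordinates(A, B, C, V):
    """Return the transformed system (V A V^-1, V B, C V^-1)."""
    Vinv = inverse2(V)
    A2 = matmul(matmul(V, A), Vinv)
    B2 = [row[0] for row in matmul(V, [[b] for b in B])]
    C2 = matmul([C], Vinv)[0]
    return A2, B2, C2


V = [[2.0, 1.0], [1.0, 1.0]]  # an arbitrary invertible matrix (illustrative choice)
A2, B2, C2 = change_coordinates(A, B, C, V)
print("new system:", A2, B2, C2)
# Both systems have dimension 2, so agreement of 4 Markov parameters suffices; we check 8.
print(markov(A, B, C, 8))
print(markov(A2, B2, C2, 8))
```

Here A becomes [[−3, 5], [−2, 3]]: the regulatory coefficients differ substantially from those of Example 1, yet the phenotype is unchanged.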
However, not all phenotypically equivalent systems are of this form: systems can have identical impulse responses without being coordinate changes of each other. In fact, systems with identical impulse responses can involve interactions between different numbers of molecules, and thus have kryptotypes in different dimensions altogether.
This implies that most systems have at least n² degrees of freedom, where, recall, n is the number of components of the kryptotype vector. To see this, take an arbitrary n × n matrix Z and let V = I + εZ for small ε in the change of coordinates above: to first order in ε, this moves A in the direction ZA − AZ, B in the direction ZB, and C in the direction −CZ, and a perturbation of this form changes the phenotype only at second order in ε. If the columns of B and the rows of C are not all eigenvectors of A, then any such Z will result in a different system.
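The first-order argument can be checked numerically. The sketch below assumes the oscillator of Example 1 and an arbitrary illustrative choice of Z; it moves the system by ε in the direction (ZA − AZ, ZB, −CZ) and compares the resulting change in the impulse response with that of a generic perturbation (changing A21 alone). Halving ε should roughly quarter the first change (second order) but only halve the second (first order). The helper names and the grid of time points are our own choices.

```python
import math

# Oscillator of Example 1 and an arbitrary (illustrative) direction Z.
A = [[0.0, 1.0], [-1.0, 0.0]]
B = [1.0, 1.0]
C = [1.0, 0.0]
Z = [[0.0, 1.0], [0.0, 0.0]]


def matmul(X, Y):
    """Product of two matrices stored as lists of rows."""
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y))) for j in range(len(Y[0]))]
            for i in range(len(X))]


def h_coeffs(A, B, C, K=60):
    """Taylor coefficients (C A^k B) / k! of h(t) = C e^{At} B."""
    coeffs, v = [], list(B)
    for k in range(K):
        coeffs.append(sum(c * x for c, x in zip(C, v)) / math.factorial(k))
        v = [sum(a * x for a, x in zip(row, v)) for row in A]
    return coeffs


TS = [2 * math.pi * i / 200 for i in range(201)]  # time points in [0, 2*pi]
H0 = h_coeffs(A, B, C)


def max_change(A2, B2, C2):
    """Largest |h2(t) - h(t)| over TS, summing the difference of the Taylor series."""
    diff = [b - a for a, b in zip(H0, h_coeffs(A2, B2, C2))]
    return max(abs(sum(d * t**k for k, d in enumerate(diff))) for t in TS)


ZA, AZ = matmul(Z, A), matmul(A, Z)
ZB = [sum(z * b for z, b in zip(row, B)) for row in Z]
CZ = matmul([C], Z)[0]


def neutral_step(eps):
    """Move (A, B, C) by eps in the direction (ZA - AZ, ZB, -CZ)."""
    A2 = [[A[i][j] + eps * (ZA[i][j] - AZ[i][j]) for j in range(2)] for i in range(2)]
    B2 = [b + eps * zb for b, zb in zip(B, ZB)]
    C2 = [c - eps * cz for c, cz in zip(C, CZ)]
    return max_change(A2, B2, C2)


def generic_step(eps):
    """Move only A21 (the effect of gene 1 on gene 2) by eps."""
    A2 = [row[:] for row in A]
    A2[1][0] += eps
    return max_change(A2, B, C)


for eps in (0.02, 0.01):
    print(f"eps={eps}: neutral {neutral_step(eps):.2e}, generic {generic_step(eps):.2e}")
print("neutral ratio:", neutral_step(0.02) / neutral_step(0.01))  # near 4 (second order)
print("generic ratio:", generic_step(0.02) / generic_step(0.01))  # near 2 (first order)
```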
It turns out that in general there are more degrees of freedom, except if the system is minimal, meaning, informally, that it uses the smallest possible number of components to achieve the desired dynamics. Results in system theory show that any system can be realized in a particular minimal dimension (the dimension of the kryptotype, nmin), and that any two phenotypically equivalent systems of dimension nmin are related by a change of coordinates. Since gene networks can grow or shrink following gene duplications and deletions, these additional degrees of freedom can apply, in principle, to any system.
Even if the system is not minimal, results from systems theory explicitly describe the set of all phenotypically equivalent systems. We refer to $\mathcal{S}(A_0, B_0, C_0)$ as the set of all systems phenotypically equivalent to the system defined by (A0, B0, C0):

$$\mathcal{S}(A_0, B_0, C_0) = \left\{ (A, B, C) : Ce^{At}B = C_0 e^{A_0 t} B_0 \text{ for all } t \ge 0 \right\}.$$

These systems need not have the same kryptotypic dimension n, but must have the same input and output dimensions (m and ℓ, respectively).
The Kalman decomposition, which we now describe informally, elegantly characterizes this set [Kalman, 1963, Kalman et al., 1969, Anderson et al., 1966]. To motivate this, first note that the input u(t) only directly pushes the system in certain directions (those lying in the span of the columns of B). As a result, different combinations of input can move the system in any direction that lies in what is known as the reachable subspace. Analogously, we can only observe motion of the system in certain directions (those lying in the span of the rows of C), and so can only infer motion in what is known as the observable subspace. The Kalman decomposition then classifies each direction in kryptotype space as either reachable or unreachable, and as either observable or unobservable. Only the components that are both reachable and observable determine the system’s phenotype: that is, components that both respond to an input and produce an observable output.
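Concretely, the Kalman decomposition of a system (A, B, C) gives a change of basis P such that the transformed system $(PAP^{-1}, PB, CP^{-1})$ has the following form:

$$PAP^{-1} = \begin{bmatrix} A_{r\bar{o}} & \ast & \ast & \ast \\ 0 & A_{ro} & 0 & \ast \\ 0 & 0 & A_{\bar{r}\bar{o}} & \ast \\ 0 & 0 & 0 & A_{\bar{r}o} \end{bmatrix}, \qquad PB = \begin{bmatrix} B_{r\bar{o}} \\ B_{ro} \\ 0 \\ 0 \end{bmatrix},$$

and

$$CP^{-1} = \begin{bmatrix} 0 & C_{ro} & 0 & C_{\bar{r}o} \end{bmatrix}, \tag{4}$$

where each $\ast$ denotes a block that may be nonzero.

The impulse response of the system is given by

$$h(t) = CP^{-1}e^{PAP^{-1}t}PB = C_{ro}e^{A_{ro}t}B_{ro},$$

and therefore the system is phenotypically equivalent to the minimal system (Aro, Bro, Cro).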
This decomposition is unique up to a change of basis that preserves the block structure. In particular, the minimal subsystem obtained by the Kalman decomposition is unique up to a change of coordinates. This implies that there is no equivalent system with a smaller number of kryptotypic dimensions than the dimension of the minimal system. It is remarkable that the gene regulatory network architecture to achieve a given input–output map is never unique: both the change of basis used to obtain the decomposition and, once in this form, all submatrices other than Aro, Bro, and Cro can be changed without affecting the phenotype, and so represent degrees of freedom. (However, some of these subspaces may affect how the system deals with noise.)
Note on implementation: The reachable subspace, which we denote by $\mathcal{R}$, is defined to be the closure of span(B) under applying A, and the unobservable subspace, denoted $\bar{\mathcal{O}}$, is the largest A-invariant subspace contained in the null space of C. The four subspaces, labeled $ro$, $r\bar{o}$, $\bar{r}o$, and $\bar{r}\bar{o}$, are defined from these by intersections and orthogonal complements: $ro$ refers to the subspace that is both reachable and observable, while $\bar{r}\bar{o}$ refers to the unreachable and unobservable subspace, and similarly for $r\bar{o}$ and $\bar{r}o$.
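The subspaces in this note can be computed with elementary linear algebra. The sketch below is an illustration under stated assumptions: it uses a hypothetical three-gene network (the oscillator of Example 1 plus a third gene that is driven by genes 1 and 2 but neither feeds back nor is observed), computes the dimension of the reachable subspace as the rank of [B, AB, ..., A^(n−1)B], the dimension of the unobservable subspace as n minus the rank of the stacked rows C, CA, ..., CA^(n−1), and the minimal dimension nmin as the rank of the Hankel matrix of Markov parameters C A^(i+j) B. Exact rational arithmetic avoids rank errors from rounding; all names are our own.

```python
from fractions import Fraction

# Hypothetical three-gene network: Example 1's oscillator plus a third gene
# regulated by genes 1 and 2 that neither feeds back nor is observed.
A = [[0, 1, 0], [-1, 0, 0], [1, 1, -1]]
B = [1, 1, 0]
C = [1, 0, 0]
n = len(A)


def rank(rows):
    """Exact rank of a matrix (list of rows) by Gaussian elimination over the rationals."""
    M = [[Fraction(x) for x in row] for row in rows]
    r = 0
    for col in range(len(M[0])):
        pivot = next((i for i in range(r, len(M)) if M[i][col] != 0), None)
        if pivot is None:
            continue
        M[r], M[pivot] = M[pivot], M[r]
        for i in range(len(M)):
            if i != r and M[i][col] != 0:
                f = M[i][col] / M[r][col]
                M[i] = [a - f * b for a, b in zip(M[i], M[r])]
        r += 1
    return r


def A_times(v):
    """Column-vector product A v."""
    return [sum(a * x for a, x in zip(row, v)) for row in A]


def times_A(w):
    """Row-vector product w A."""
    return [sum(w[k] * A[k][j] for k in range(n)) for j in range(n)]


# Reachable subspace: spanned by B, AB, ..., A^(n-1) B.
reach = [B]
for _ in range(n - 1):
    reach.append(A_times(reach[-1]))
# Observability matrix: rows C, CA, ..., CA^(n-1); its null space is the unobservable subspace.
obs = [C]
for _ in range(n - 1):
    obs.append(times_A(obs[-1]))
# Hankel matrix of Markov parameters C A^(i+j) B; its rank is the minimal dimension.
markov, v = [], B
for _ in range(2 * n - 1):
    markov.append(sum(c * x for c, x in zip(C, v)))
    v = A_times(v)
hankel = [[markov[i + j] for j in range(n)] for i in range(n)]

print("dim reachable:", rank(reach))
print("dim unobservable:", n - rank(obs))
print("minimal dimension n_min:", rank(hankel))
print("Markov parameters:", markov)
```

The minimal dimension comes out as 2, matching the two-gene oscillator: the third gene is reachable but unobservable, so it lies outside the $ro$ subspace and does not affect the phenotype.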
For the remainder of the paper, we interpret $\mathcal{S}(A_0, B_0, C_0)$ as the neutral set in the fitness landscape, along which a large population will drift under environmental and selective stasis. Even if the phenotype is constrained and remains constant through evolutionary time, the molecular mechanism underpinning it is not constrained and likely will not be conserved.
Finally, note that if B and C are held constant (i.e., if the relationships between environment, kryptotype, and phenotype do not change), there are still usually degrees of freedom. Example 2 below gives the set of minimal systems equivalent to the oscillator of Example 1 that all share the same B and C matrices. The oscillator can also be equivalently realized by a three-gene (or larger) network, which has even more evolutionary degrees of freedom available, as in Figure 3.
Example 2 (All equivalent rewirings of the oscillator)
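The oscillator of Example 1 is minimal, and so any equivalent system is a change of coordinates by an invertible matrix V. If we further require B and C to be invariant, then we need VB = B and CV = C. Therefore the following one-parameter family (A(τ), B, C) describes the set of all two-gene systems phenotypically equivalent to the oscillator that share its B and C:

$$A(\tau) = \begin{bmatrix} -\tau & 1 + \tau \\ -\dfrac{1 + \tau^2}{1 + \tau} & \tau \end{bmatrix}, \qquad B = \begin{bmatrix} 1 \\ 1 \end{bmatrix}, \qquad C = \begin{bmatrix} 1 & 0 \end{bmatrix}, \qquad \tau \neq -1.$$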
The resulting set of systems is depicted in Figure 2.
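A quick check of the two-gene family, using the parameterization A(τ) = [[−τ, 1 + τ], [−(1 + τ²)/(1 + τ), τ]] (τ ≠ −1), which is what one obtains from V = [[1, 0], [τ/(1 + τ), 1/(1 + τ)]]; this V satisfies VB = B and CV = C. For several values of τ the sketch confirms that trace A(τ) = 0 and det A(τ) = 1 (so the eigenvalues stay at ±i) and that the Markov parameters match those of A(0), the original oscillator. By the Cayley–Hamilton argument, agreement of the first four Markov parameters already implies identical impulse responses for two-gene systems; the sketch checks eight. Function names are our own.

```python
import math

B = [1.0, 1.0]
C = [1.0, 0.0]


def A_of(tau):
    """Equivalent two-gene oscillator A(tau), defined for tau != -1."""
    return [[-tau, 1.0 + tau], [-(1.0 + tau**2) / (1.0 + tau), tau]]


def markov(A, B, C, K):
    """Markov parameters C A^k B, k = 0..K-1."""
    params, v = [], list(B)
    for _ in range(K):
        params.append(sum(c * x for c, x in zip(C, v)))
        v = [sum(a * x for a, x in zip(row, v)) for row in A]
    return params


reference = markov(A_of(0.0), B, C, 8)  # A(0) is the oscillator of Example 1
for tau in (-3.0, -2.0, -0.5, 0.3, 2.0):
    A = A_of(tau)
    trace = A[0][0] + A[1][1]
    det = A[0][0] * A[1][1] - A[0][1] * A[1][0]
    same = all(math.isclose(a, b, abs_tol=1e-9) for a, b in zip(markov(A, B, C, 8), reference))
    print(f"tau={tau:+.1f}  trace={trace:+.3f}  det={det:.3f}  same phenotype: {same}")
```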
Sexual reproduction and recombination
Parents with phenotypically equivalent yet differently wired gene networks may produce offspring with dramatically different phenotypes. If the phenotypes are significantly divergent then the offspring may be inviable or otherwise dysfunctional, despite both parents being well adapted. If this is consistent for the entire population, we would consider them to be separate species, in accord with the biological species concept [Mayr, 2000].
First, we must specify how sexual reproduction acts on these systems. Suppose that each of a diploid organism’s two genomes encodes a set of system coefficients. We assume that a diploid that has inherited systems (A′, B′, C′) and (A″, B″, C″) from its two parents has phenotype determined by the system that averages these two, ((A′ + A″)/2, (B′ + B″)/2, (C′ + C″)/2).
Each genome an organism inherits is generated by meiosis, in which each of its diploid parents recombines its two genomes; thus an F1 offspring carries one system copy from each parent, and an F2 is an offspring of two independently formed F1s. If the parents are from distinct populations, these are simply first- and second-generation hybrids, respectively.
Exactly how the coefficients (i.e., entries of A, B and C) of a haploid system inherited by an offspring from her diploid parent are determined by the parent’s two systems depends on the genetic basis of any variation in the coefficients. Thanks to the randomness of meiotic segregation, the result is random to the extent that each parent is heterozygous for alleles that affect the coefficients. Since the ith row of A summarizes how each gene regulates gene i, and hence is determined by the promoter region of gene i, the elements of a row of A tend to be inherited together, which will create covariance between entries of the same row. It is, however, a quite general observation that the variation seen among recombinant systems is proportional to the difference between the two parental systems.
Offspring of two phenotypically identical parents do not necessarily exhibit the same phenotype as their parents: in other words, $\mathcal{S}(A_0, B_0, C_0)$, the set of all systems phenotypically equivalent to a given one, is not, in general, closed under averaging or recombination. If sexual recombination among systems drawn from $\mathcal{S}(A_0, B_0, C_0)$ yields systems with divergent phenotypes, populations containing significant diversity in $\mathcal{S}(A_0, B_0, C_0)$ can carry genetic load, and isolated populations may fail to produce hybrids with viable phenotypes.
Hybrid incompatibility
Two parents with the optimal phenotype can produce offspring whose phenotype is suboptimal if the parents have different underlying systems. How quickly do hybrid phenotypes break down as genetic distance between parents increases? We will quantify how far a system’s phenotype is from optimal using a weighted difference between impulse response functions. Suppose that ρ(t) is a nonnegative, smooth, square-integrable weighting function and h0(t) is the optimal impulse response function, and define the “distance to optimum” of another impulse response function h(t) to be

$$D(h) = \left( \int_0^\infty \rho(t)\, \lvert h(t) - h_0(t) \rvert^2 \, dt \right)^{1/2}. \tag{5}$$
Consider reproduction between a parent with system (A, B, C) and another displaced by distance ε in the direction (X, Y, Z), i.e., having system (A + εX, B + εY, C + εZ). We assume both are “perfectly adapted” systems, i.e., both have impulse response function h0(t), and write hε(t) for the impulse response function of their offspring. A Taylor expansion of D(hε) in ε shows that the phenotype of an F1 hybrid between these two is at distance proportional to ε² from optimal, while F2 hybrids are at distance proportional to ε. This is because an F1 hybrid has one copy of each parental system, and therefore lies directly between the parental systems (see Figure 4): the parents both lie in the set of systems with impulse response h0, which is the valley defined by D, and so their midpoint differs from optimal only because of the curvature of that set. In contrast, an F2 hybrid may be homozygous for one parental type at some coefficients and homozygous for the other parental type at others; each coefficient of an F2 may equal that of either parent, or be intermediate between the two, so possible F2 systems may be as far from the optimal set as the parents are from each other. The precise rate at which the phenotype of a hybrid diverges depends on the geometry of the optimal set relative to segregating genetic variation.
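To see the ε² versus ε scaling concretely, the sketch below uses the two-gene oscillator family A(τ) = [[−τ, 1 + τ], [−(1 + τ²)/(1 + τ), τ]] of Example 2 (with the B and C of Example 1) and computes D from equation (5) with an assumed weighting ρ(t) = 1 on [0, 2π] and 0 elsewhere. The parents are A(0) and A(ε). The F1 is their average; for the F2 we take one possible genotype, homozygous for the first row of A(0) and the second row of A(ε), reflecting that rows of A tend to be inherited together. The weighting, the quadrature, and the helper names are illustrative choices, not part of the model.

```python
import math

B = [1.0, 1.0]
C = [1.0, 0.0]
T_MAX, N_STEPS = 2 * math.pi, 400  # assumed weighting: rho(t) = 1 on [0, 2*pi], 0 elsewhere


def A_of(tau):
    """Two-gene oscillator systems equivalent to Example 1 (tau != -1)."""
    return [[-tau, 1.0 + tau], [-(1.0 + tau**2) / (1.0 + tau), tau]]


def h_coeffs(A, K=60):
    """Taylor coefficients (C A^k B) / k! of h(t) = C e^{At} B."""
    coeffs, v = [], list(B)
    for k in range(K):
        coeffs.append(sum(c * x for c, x in zip(C, v)) / math.factorial(k))
        v = [sum(a * x for a, x in zip(row, v)) for row in A]
    return coeffs


def distance(A, A_opt):
    """Equation (5) with the assumed rho, computed by the trapezoid rule."""
    diff = [b - a for a, b in zip(h_coeffs(A_opt), h_coeffs(A))]
    dt = T_MAX / N_STEPS
    sq = [sum(d * (i * dt) ** k for k, d in enumerate(diff)) ** 2 for i in range(N_STEPS + 1)]
    return math.sqrt(dt * (sum(sq) - (sq[0] + sq[-1]) / 2))


def hybrid_distances(eps):
    """Distance to optimum of the F1 and of one F2 genotype between A(0) and A(eps)."""
    P1, P2 = A_of(0.0), A_of(eps)
    F1 = [[(a + b) / 2 for a, b in zip(r1, r2)] for r1, r2 in zip(P1, P2)]
    F2 = [P1[0][:], P2[1][:]]  # homozygous: row 1 from parent 1, row 2 from parent 2
    return distance(F1, P1), distance(F2, P1)


f1_a, f2_a = hybrid_distances(0.02)
f1_b, f2_b = hybrid_distances(0.01)
print(f"F1: {f1_a:.2e} -> {f1_b:.2e}, ratio {f1_a / f1_b:.2f}")  # near 4: D ~ eps^2
print(f"F2: {f2_a:.2e} -> {f2_b:.2e}, ratio {f2_a / f2_b:.2f}")  # near 2: D ~ eps
```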
Example 3 (Hybrid incompatibility: misregulation due to system drift)
Offspring of two equivalent systems from Example 2 can easily fail to oscillate. For instance, the F1 offspring between homozygous parents at τ = 0 and τ = −2 has phenotype $\phi_{F1}(t) = e^t$, rather than $\phi(t) = \sin t + \cos t$. However, the coefficients of these two parental systems differ substantially, probably more than would be observed between diverging populations. In Figure 5 we compare the phenotypes of F1 and F2 hybrids between more similar parents, and see increasingly divergent phenotypes as the difference between the parental systems increases. (In this example, the coefficients of A(ε) differ from those of A(0) by an average factor of 1 + ε/2; such small differences could plausibly be caused by changes to promoter sequences.) This divergence is quantified in Figure 6, which shows that the mean distance to the optimum phenotype of the F1 and F2 hybrid offspring between A(0) and A(ε) increases with ε² and ε, respectively.
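The F1 calculation can be reproduced exactly, assuming the parameterization A(τ) = [[−τ, 1 + τ], [−(1 + τ²)/(1 + τ), τ]] of Example 2 with B = [1, 1]ᵀ and C = [1, 0]. Averaging A(0) and A(−2) gives the lower-triangular matrix [[1, 0], [2, −1]], whose Markov parameters C A^k B all equal 1, so its impulse response is Σ t^k/k! = e^t. A short sketch, with helper names of our own:

```python
B = [1.0, 1.0]
C = [1.0, 0.0]


def A_of(tau):
    """Two-gene oscillator systems equivalent to Example 1 (tau != -1)."""
    return [[-tau, 1.0 + tau], [-(1.0 + tau**2) / (1.0 + tau), tau]]


def markov(A, K):
    """Markov parameters C A^k B, k = 0..K-1 (Taylor coefficients of h times k!)."""
    params, v = [], list(B)
    for _ in range(K):
        params.append(sum(c * x for c, x in zip(C, v)))
        v = [sum(a * x for a, x in zip(row, v)) for row in A]
    return params


P1, P2 = A_of(0.0), A_of(-2.0)
F1 = [[(a + b) / 2 for a, b in zip(r1, r2)] for r1, r2 in zip(P1, P2)]
print("A(-2) =", P2)              # [[2.0, -1.0], [5.0, -2.0]]
print("F1    =", F1)              # [[1.0, 0.0], [2.0, -1.0]]
print("parents:", markov(P2, 6))  # 1, 1, -1, -1, ...: sin t + cos t
print("F1:     ", markov(F1, 6))  # all ones: h(t) = e^t
```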
Haldane’s rule
This model naturally predicts Haldane’s rule, the observation that if only one hybrid sex is sterile or inviable, it is likely to be the heterogametic sex (e.g., the male in XY sex determination systems) [Haldane, 1922, Orr, 1997]. For example, consider an XY species with a two-gene network where the first gene resides on an autosome and the second gene on the X chromosome, so that the first row of A is inherited autosomally and the second row is X-linked (for simplicity, suppose that B and C are shared by all individuals). A male whose pair of haplotypes is (A, A) has phenotype determined by

$$\begin{bmatrix} A_{11} & A_{12} \\ 2 \cdot \tfrac{1}{2} A_{21} & 2 \cdot \tfrac{1}{2} A_{22} \end{bmatrix} = A$$

if dosage compensation upregulates heterogametes by a factor of two relative to homogametes (as with Drosophila), while a female homozygous for the haplotype $\tilde A$ has phenotype determined by $\tilde A$. An F1 male offspring of these two will have its phenotype determined by

$$\begin{bmatrix} \tfrac{1}{2}(A_{11} + \tilde A_{11}) & \tfrac{1}{2}(A_{12} + \tilde A_{12}) \\ \tilde A_{21} & \tilde A_{22} \end{bmatrix}.$$

If both genes resided on the autosomes, this system would only be possible in an F2 cross. More generally, if the regulatory coefficients for a system are split between the sex chromosomes and one or more autosomes, F1 males are effectively equivalent to purely autosomal-system F2 hybrids, and recall that F2s are significantly less fit on average than F1s (see Figure 6). Although many alleles will be dominant if the phenotype–fitness relationship is convex, the underlying mechanism does not depend on the dominance theory [Turelli and Orr, 1995] to explain Haldane’s rule: instead, it derives from the nature of segregation load.
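The argument can be illustrated with the oscillator under the assumptions of the two-gene example above: row 1 of A is autosomal, row 2 is X-linked, dosage compensation doubles the single X in males, and B and C are shared. The father comes from a population fixed for A(0) and the mother from one fixed for A(ε), with A(τ) = [[−τ, 1 + τ], [−(1 + τ²)/(1 + τ), τ]] as in Example 2. With an assumed weighting ρ(t) = 1 on [0, 2π], F1 daughters should deviate from the optimum at order ε², like autosomal F1s, while F1 sons deviate at order ε, like F2s. This is a sketch under these assumptions, not a general model of sex linkage; the helper names are our own.

```python
import math

B = [1.0, 1.0]
C = [1.0, 0.0]
T_MAX, N_STEPS = 2 * math.pi, 400  # assumed weighting: rho(t) = 1 on [0, 2*pi]


def A_of(tau):
    """Two-gene oscillator systems equivalent to Example 1 (tau != -1)."""
    return [[-tau, 1.0 + tau], [-(1.0 + tau**2) / (1.0 + tau), tau]]


def h_coeffs(A, K=60):
    """Taylor coefficients (C A^k B) / k! of h(t) = C e^{At} B."""
    coeffs, v = [], list(B)
    for k in range(K):
        coeffs.append(sum(c * x for c, x in zip(C, v)) / math.factorial(k))
        v = [sum(a * x for a, x in zip(row, v)) for row in A]
    return coeffs


def distance(A, A_opt):
    """Equation (5) with the assumed rho, computed by the trapezoid rule."""
    diff = [b - a for a, b in zip(h_coeffs(A_opt), h_coeffs(A))]
    dt = T_MAX / N_STEPS
    sq = [sum(d * (i * dt) ** k for k, d in enumerate(diff)) ** 2 for i in range(N_STEPS + 1)]
    return math.sqrt(dt * (sum(sq) - (sq[0] + sq[-1]) / 2))


def f1_distances(eps):
    """Distances to optimum of an F1 daughter and an F1 son.

    Father: population fixed for A(0); mother: population fixed for A(eps).
    Row 1 of A (gene 1) is autosomal; row 2 (gene 2) is X-linked.
    """
    dad, mom = A_of(0.0), A_of(eps)
    daughter = [[(a + b) / 2 for a, b in zip(r1, r2)] for r1, r2 in zip(dad, mom)]
    # Son: autosomal row averaged; one maternal X, doubled by dosage compensation.
    son = [[(a + b) / 2 for a, b in zip(dad[0], mom[0])], mom[1][:]]
    return distance(daughter, dad), distance(son, dad)


for eps in (0.02, 0.01):
    d, s = f1_distances(eps)
    print(f"eps={eps}: F1 female D={d:.2e}, F1 male D={s:.2e}")
```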
System drift and the accumulation of incompatibilities
Thus far we have seen that many distinct molecular mechanisms can realize identical phenotypes and that these mechanisms may fail to produce viable hybrids. Does evolution shift molecular mechanisms fast enough to be a significant driver of speciation? To approach this question, we explore a general quantitative genetic model in which a population drifts stochastically near a set of equivalent and optimal systems due to the action of recombination, mutation, and demographic noise. Although this is motivated by the results on linear systems above, the quantitative genetics calculations are more general, and only depend on the presence of genetic variation and a continuous set of phenotypically equivalent systems.
We will suppose that each organism’s phenotype is determined by its vector of coefficients, denoted by x = (x1, x2, …, xp), and that the corresponding fitness is determined by the distance of its phenotype to the optimum. The optimum phenotype is unique, but is realized by many distinct x: those falling in the “optimal set” $\mathcal{S}$. The phenotypic distance to optimum of an organism with coefficients x is denoted D(x). In the results above, x = (A, B, C) and D(x) is given by equation (5). The fitness of an organism with coefficients x will be exp(−D(x)²). We assume that in the region of interest the map D is smooth and that we can locally approximate the optimal set as a quadratic surface. As above, an individual’s coefficients are given by averaging its parentally inherited coefficients and adding random noise due to segregation and possibly new mutation. Concretely, we use the infinitesimal model for reproduction [Barton et al., 2017]: the offspring of parents at x and x′ will have coefficients (x + x′)/2 + ε, where ε is a random Gaussian displacement due to random assortment of parental alleles.
System drift
We work with a randomly mating population of effective size Ne. If the population variation has standard deviation σ in a particular direction, then, since each generation resamples from this diversity, the population mean coefficient will move a random distance of size $\sigma/\sqrt{N_e}$ per generation, simply because this is the standard deviation of the mean of a random sample [Lande, 1981]. Selection will tend to restrain this motion, but movement along the optimal set is unconstrained, and so we expect the population mean to drift along the optimal set like a diffusing particle. The amount of variance in particular directions in coefficient space depends on constraints imposed by selection and correlations between the genetic variation underlying different coefficients (the G matrix [Arnold et al., 2008]). It therefore seems reasonable to coarsely model the time evolution of population variation in regulatory coefficients as a “cloud” of width σ about the population mean, which moves as an unbiased Brownian motion through the set of network coefficients that give the optimal phenotype.
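A minimal simulation of this argument, assuming the infinitesimal model of reproduction described above, a single neutral coefficient, and no selection: in a population of N individuals whose within-population standard deviation is near σ, the variance of the per-generation change in the population mean should be close to σ²/N. (For this particular model the expected ratio works out to approximately N/(N + 1), slightly below 1, because drift erodes some within-population variance.) The parameter values, the seed, and the function name are illustrative.

```python
import random
import statistics


def mean_steps(N=50, sigma=1.0, generations=10_000, burn_in=200, seed=1):
    """Per-generation changes in the population mean of one neutral coefficient.

    Infinitesimal model: each offspring is the average of two parents drawn at
    random (with replacement) plus Gaussian segregation noise of variance
    sigma^2 / 2, which keeps the within-population variance near sigma^2.
    No selection acts on this coefficient.
    """
    rng = random.Random(seed)
    seg_sd = sigma / 2 ** 0.5
    pop = [rng.gauss(0.0, sigma) for _ in range(N)]
    prev, steps = statistics.fmean(pop), []
    for g in range(burn_in + generations):
        pop = [(pop[rng.randrange(N)] + pop[rng.randrange(N)]) / 2 + rng.gauss(0.0, seg_sd)
               for _ in range(N)]
        mean = statistics.fmean(pop)
        if g >= burn_in:
            steps.append(mean - prev)
        prev = mean
    return steps


N, sigma = 50, 1.0
steps = mean_steps(N, sigma)
ratio = statistics.pvariance(steps) / (sigma**2 / N)
print(f"var(step) / (sigma^2 / N) = {ratio:.3f}")  # close to N / (N + 1)
```

Because the steps are uncorrelated, the mean's displacement after T generations has standard deviation of roughly σ√(T/N), the diffusive behavior assumed in the text.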
Next, we calculate with some simplifying assumptions to give the general idea; multivariate derivations appear in Appendix A. There will in general be different amounts of variation in different directions; to keep the discussion intuitive, we only discuss σN, the amount of variation in “neutral” directions (i.e., directions along $\mathcal{S}$), and σS, the amount of variation in “selected” directions (perpendicular to $\mathcal{S}$). The other relevant scale we denote by γ, which is the scale on which distance to phenotypic optimum changes as x moves away from the optimal set, $\mathcal{S}$. Concretely, γ is given by $\gamma^{-2} = \frac{1}{2}\frac{d^2}{ds^2} D(x + sz)^2 \big|_{s=0}$, where x is optimal and z is a unit vector in a “selected” direction perpendicular to $\mathcal{S}$, so that $D(x + sz) \approx |s|/\gamma$ for small s. With these parameters, a typical individual will have a fitness of around exp(−(σS/γ)²). Of course, there are in general many possible neutral and selected directions; we take γ to be an appropriate average over possible directions.
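As an illustration of γ, the sketch below estimates it for the oscillator of Example 1, holding B and C fixed and using an assumed weighting ρ(t) = 1 on [0, 2π]. At A(0) the neutral direction (the derivative at τ = 0 of the family A(τ) = [[−τ, 1 + τ], [−(1 + τ²)/(1 + τ), τ]] of Example 2) is [[−1, 1], [1, 1]]; we step a distance s along a unit direction perpendicular to it and use γ ≈ s/D(A0 + sZ) for small s. The choice of perpendicular direction, the weighting, and the names are our own assumptions, and with B and C free there would be further neutral and selected directions.

```python
import math

B = [1.0, 1.0]
C = [1.0, 0.0]
A0 = [[0.0, 1.0], [-1.0, 0.0]]  # an optimal system: the oscillator of Example 1
T_MAX, N_STEPS = 2 * math.pi, 400  # assumed weighting: rho(t) = 1 on [0, 2*pi]

# The neutral direction at A0 (B, C fixed) is [[-1, 1], [1, 1]]. Z is a unit vector
# with zero Frobenius inner product with it, i.e. a "selected" direction.
Z = [[0.0, 0.0], [1 / math.sqrt(2), -1 / math.sqrt(2)]]


def h_coeffs(A, K=60):
    """Taylor coefficients (C A^k B) / k! of h(t) = C e^{At} B."""
    coeffs, v = [], list(B)
    for k in range(K):
        coeffs.append(sum(c * x for c, x in zip(C, v)) / math.factorial(k))
        v = [sum(a * x for a, x in zip(row, v)) for row in A]
    return coeffs


def distance(A, A_opt):
    """Equation (5) with the assumed rho, computed by the trapezoid rule."""
    diff = [b - a for a, b in zip(h_coeffs(A_opt), h_coeffs(A))]
    dt = T_MAX / N_STEPS
    sq = [sum(d * (i * dt) ** k for k, d in enumerate(diff)) ** 2 for i in range(N_STEPS + 1)]
    return math.sqrt(dt * (sum(sq) - (sq[0] + sq[-1]) / 2))


def gamma_estimate(s):
    """gamma ~ s / D(A0 + s Z) for small s, since D(x + s z) ~ |s| / gamma."""
    A = [[a + s * z for a, z in zip(ra, rz)] for ra, rz in zip(A0, Z)]
    return s / distance(A, A0)


for s in (1e-2, 1e-3, 1e-4):
    print(f"s={s:.0e}: gamma estimate {gamma_estimate(s):.4f}")
```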
Hybridization
The means of two allopatric populations each of effective size Ne separated for T generations will be a distance roughly of order $\sigma_N\sqrt{T/N_e}$ apart along $\mathcal{S}$. (Consult figure 4 for a conceptual diagram.) A population of F1 hybrids has one haploid genome from each, whose coefficients are averaged, and so will have mean system coefficients at the midpoint between their means. The distribution of F2 hybrids will have mean at the average of the two populations, but will have higher variance. The variance of F2 hybrids can be shown to increase linearly with the square of the distance between parental population means under models of both simple and polygenic traits. This is suggested by figure 4 and shown in Appendix B. Concretely, we expect the population of F1s to have variance $\sigma_S^2$ in the selected direction (the same as within each parental population), but the population of F2 hybrids will have variance of order $\sigma_S^2 + \omega\sigma_N^2 T/N_e$, where ω is a factor that depends on the genetic basis of the coefficients. If the optimal set has codimension q (that is, there are q selected and p − q neutral directions), then, using the polygenic model of Appendix B, ω is proportional to the number of neutral degrees of freedom: ω = (p − q)/8. If each trait is controlled by a single locus, as in figure 4, the value is similar.
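For the single-locus case, the growth of F2 variance with the square of the parental distance can be computed exactly. The sketch below assumes a hypothetical model in which each coefficient is controlled by one freely recombining locus, the two parental populations are fixed for coefficient vectors x1 and x2, and an individual's coefficients are the average of its two haploid genomes. Enumerating all F2 genotypes gives a per-coefficient variance of (x1,i − x2,i)²/8, so the total variance is |x1 − x2|²/8 and quadruples when the parental distance doubles. This is not the polygenic model of Appendix B, and the function name is our own.

```python
from fractions import Fraction
from itertools import product


def f2_variance(x1, x2):
    """Exact total variance (trace of the covariance matrix) among F2 hybrids.

    Hypothetical single-locus model: coefficient i is set by one locus, the
    parental populations are fixed for x1 and x2, loci assort independently,
    and an individual's coefficients are the average of its two haploid genomes.
    """
    p = len(x1)
    # All gametes an F1 (carrying one x1 genome and one x2 genome) can produce.
    gametes = [tuple(x1[i] if pick == 0 else x2[i] for i, pick in enumerate(picks))
               for picks in product((0, 1), repeat=p)]
    # An F2 combines two independent, equally likely gametes.
    f2 = [[(Fraction(a) + Fraction(b)) / 2 for a, b in zip(g, h)]
          for g, h in product(gametes, repeat=2)]
    n = len(f2)
    means = [sum(ind[i] for ind in f2) / n for i in range(p)]
    return sum(sum((ind[i] - means[i]) ** 2 for ind in f2) / n for i in range(p))


x1 = [0, 0, 0]
for scale in (1, 2, 4):
    x2 = [scale * 1, scale * 2, scale * -1]  # parental separation grows with scale
    dist2 = sum((b - a) ** 2 for a, b in zip(x1, x2))
    print(scale, f2_variance(x1, x2), Fraction(dist2, 8))  # the two columns agree
```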
What are the fitness consequences? A population of F2 hybrids will begin to be substantially less fit than the parentals once they differ from the optimum by a distance of order γ, i.e., once the additional segregation variance, of order $\omega\sigma_N^2 T/N_e$, becomes comparable to γ². This implies that hybrid incompatibility among F2 hybrids should appear on a time scale of Ne(γ/σN)²/(4ω) generations. The F1s will not suffer fitness consequences until the hybrid mean is further than γ from the optimum; as suggested by figure 4, Taylor expanding D² along the optimal set implies that this deviation of the mean from optimum grows with the square of the distance between the parental populations, that is, linearly in T/Ne, and so we expect fitness costs in F1s to appear on a time scale proportional to Ne generations, with a constant set by the curvature of the optimal set.
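For a more concrete prediction, suppose that the distribution among hybrids is Gaussian. A population whose trait distribution is Gaussian with mean μ and variance σ² has mean fitness

$$\bar W(\mu, \sigma) = \int \frac{e^{-(z-\mu)^2/(2\sigma^2)}}{\sqrt{2\pi\sigma^2}}\, e^{-z^2/\gamma^2}\, dz = \frac{1}{\sqrt{1 + 2\sigma^2/\gamma^2}} \exp\left(-\frac{\mu^2}{\gamma^2 + 2\sigma^2}\right). \tag{6}$$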
This assumes a single trait, for simplicity. A population of F2 hybrids will have, as above, an increased variance, which we denote σ². The mean diverges with the square of the distance between the parentals, so we set μ = cμγT/Ne, where cμ is a constant depending on the local geometry of the optimal set. The mean fitness in parental populations is as in equation (6) with μ = 0 and σ = σS. This implies that if we define $\bar W_{F2}(T)$ to be the mean relative fitness among F2 hybrids between two populations separated by T generations (i.e., the mean fitness divided by the mean fitness of the parents), then

$$\bar W_{F2}(T) = \sqrt{\frac{1 + 2(\sigma_S/\gamma)^2}{1 + 2(\sigma/\gamma)^2}}\, \exp\left(-\frac{c_\mu^2 (T/N_e)^2}{1 + 2(\sigma/\gamma)^2}\right). \tag{7}$$
If each of the q selected directions acts independently, the relative fitness of F2 hybrids will be the single-trait expression (7) raised to the power q; the expression for the correlated, multivariate case is given in Appendix A.1. We discuss the implications of this expression in the next section.
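Equation (6) is easy to check by simulation, and ratios of it give the relative fitness used in equation (7). The sketch below implements equation (6), compares it with a Monte Carlo average, and forms the hybrid-to-parent ratio, raised to the power q for q independent selected directions with identical parameters. The hybrid mean and standard deviation are inputs here (the values used are illustrative, not derived from the model), and the function names are our own.

```python
import math
import random


def mean_fitness(mu, sigma, gamma):
    """Equation (6): mean of exp(-(z/gamma)^2) for z ~ Normal(mu, sigma^2)."""
    s = 1.0 + 2.0 * sigma**2 / gamma**2
    return math.exp(-mu**2 / (gamma**2 + 2.0 * sigma**2)) / math.sqrt(s)


def relative_fitness(mu, sigma, sigma_s, gamma, q=1):
    """Hybrid mean fitness relative to parents (mean 0, sd sigma_s),
    for q independent selected directions with identical parameters."""
    return (mean_fitness(mu, sigma, gamma) / mean_fitness(0.0, sigma_s, gamma)) ** q


# Monte Carlo check of equation (6) with illustrative parameter values.
rng = random.Random(0)
mu, sigma, gamma = 0.5, 0.8, 1.0
samples = [math.exp(-(rng.gauss(mu, sigma) / gamma) ** 2) for _ in range(200_000)]
print(mean_fitness(mu, sigma, gamma), sum(samples) / len(samples))

# Illustrative only: hybrids with twice the parental variance and no shift in the mean.
print(relative_fitness(0.0, math.sqrt(2) * 0.5, 0.5, 1.0, q=1))
print(relative_fitness(0.0, math.sqrt(2) * 0.5, 0.5, 1.0, q=3))
```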
Speciation rates under neutrality
Equation (7) describes how fast hybrids become inviable as the time that the parental populations are isolated increases; what does this tell us about speciation rates under neutrality? From equation (7) we observe that time is always scaled in units of Ne generations, the population standard deviations are always scaled by γ, and the most important term is the rate of accumulation of segregation variance, 4ω(σN/γ)². All else being equal, this process will lead to speciation more quickly in smaller populations and in populations with more neutral genetic variation (larger σN). These parameters are related (larger populations generally have more genetic variation), but since these details depend on the situation, we keep them separate.
How does this prediction depend on the system size and constraint? If there are p trait dimensions, constrained in q dimensions, and if ω is proportional to p − q, then the relative fitness of F2 hybrids is, roughly, $(1 + 4(p - q)KT/N_e)^{-q/2}$, where K is a constant, so its initial rate of decline is proportional to q(p − q). Both the degree of constraint and the number of available neutral directions affect the speed of accumulation of incompatibilities: more unconstrained directions allow faster system drift, but more constrained directions imply greater fitness consequences of hybridization. However, note that in real systems, it is likely that γ also depends on p and q.
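The initial rate of decline follows by differentiating at T = 0: the derivative of (1 + 4(p − q)KT/Ne)^(−q/2) with respect to T/Ne at 0 is −2Kq(p − q). The sketch below confirms this with a finite difference for a few illustrative (p, q) pairs and an arbitrary K; it only restates the rough expression above and is not a fit to the model.

```python
def f2_fitness(T_over_Ne, p, q, K):
    """Rough F2 relative fitness (1 + 4 (p - q) K T/Ne)^(-q/2)."""
    return (1.0 + 4.0 * (p - q) * K * T_over_Ne) ** (-q / 2)


def initial_rate(p, q, K, h=1e-7):
    """Initial rate of decline per unit of T/Ne, by a forward finite difference."""
    return (f2_fitness(0.0, p, q, K) - f2_fitness(h, p, q, K)) / h


K = 0.1  # illustrative constant
for p, q in [(10, 1), (10, 5), (10, 9), (20, 5)]:
    print(p, q, round(initial_rate(p, q, K), 4), 2 * K * q * (p - q))  # columns agree
```

For fixed p the rate is largest when q is near p/2, reflecting the trade-off between neutral directions (faster drift) and constrained directions (larger fitness consequences).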
Now we interpret equation (7) in three situations plausible for different species, depicting how hybrid fitness drops as a function of $T/N_e$ in Figure 7. In all cases, the fitness drop for F1 hybrids is much smaller than that for F2 hybrids, so we work only with the first (square-root) term in equation (7).
Suppose that in a large, genetically diverse population, the amounts of heritable variation in the neutral and selected directions are roughly equal ($\sigma_N \approx \sigma_S$), but the overall amount of variation is (weakly) constrained by selection ($\sigma_N \approx \gamma$). If so, then the first term of equation (7) is $(1 + 2\omega T/N_e)^{-1/2}$. If also $\omega = 1$, then, for instance, after $0.1 N_e$ generations the average F2 fitness has dropped by roughly 10% relative to the parentals (since $1.2^{-1/2} \approx 0.91$).
Consider instead a much smaller, isolated population whose genetic variation is primarily constrained by genetic drift, so that $\sigma_N \approx \sigma_S \ll \gamma$. Setting $a = (\sigma_N/\gamma)^2$ to be small, the fitness of F2 hybrids is approximately $(1 + 4a\omega T/N_e)^{-1/2}$. In Figure 7, hybrid fitness appears to drop more slowly in this case, but since time is scaled by $N_e$, speciation may nonetheless occur faster than in a large population. However, at least in some models [Lynch and Hill, 1986], the amount of genetic variance in small populations at mutation-drift equilibrium is proportional to $N_e$, which would compensate for this difference, perhaps even predicting that the rate of decrease of hybrid fitness is independent of population size for small populations.
In the other direction, consider large metapopulations (or a “species complex”) among which heritable variation is strongly constrained by selection (i.e., there is substantial recombination load), so that $\sigma_S \approx \gamma$ but $a = (\sigma_N/\gamma)^2$ is large. Then the fitness of F2 hybrids is approximately $(1 + 2a\omega T/N_e)^{-1/2}$, which can decline extremely rapidly if $a$ is large.
For instance, consider two populations, each of one million organisms with 10 generations per year (a drosophilid species, perhaps). Under the “large population” scenario of Figure 7A, system drift would lead to a substantial fitness drop of around 10% in F2 hybrids in only 10,000 years. This drop may be enough to induce evolutionary reinforcement of reproductive isolation. If one thousand of these organisms are isolated (perhaps on an island, as in Figure 7B), then a similar drop could occur in around 120 years. On the other hand, if the population is one of several of similar size that have recently come into secondary contact after population re-expansion, the situation may be similar to that of Figure 7C with $N_e = 10^6$, and so the same drop could occur after 1,100 years. (However, hyperdiverse populations of this last type may not be stable on these time scales.)
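The arithmetic behind these scenarios can be reproduced with a short sketch. It assumes that the first (square-root) term of equation (7) has the form $(1 + 4\omega a_N (T/N_e)/(1 + a_S))^{-1/2}$, with $a_N = (\sigma_N/\gamma)^2$ and $a_S = (\sigma_S/\gamma)^2$. This is the result of adding segregation variance $4\omega\sigma_N^2 T/N_e$ to $\sigma_S^2$ in the single-trait Gaussian load. The function names are hypothetical, and the island value $a = 0.05$ is an illustrative choice, not a value read off Figure 7.

```python
def f2_sqrt_term(tau, a_N, a_S, omega=1.0):
    """Square-root term of the F2 relative fitness (assumed form).

    tau   : T / N_e, time since isolation in units of N_e generations
    a_N   : (sigma_N / gamma)**2, scaled neutral variation
    a_S   : (sigma_S / gamma)**2, scaled variation in selected directions
    omega : geometric constant controlling segregation-variance accumulation
    """
    return (1.0 + 4.0 * omega * a_N * tau / (1.0 + a_S)) ** -0.5


def tau_for_fitness(w, a_N, a_S, omega=1.0):
    """Invert f2_sqrt_term: the value of T / N_e at which it equals w."""
    return (w ** -2 - 1.0) * (1.0 + a_S) / (4.0 * omega * a_N)


def years_for_fitness(w, N_e, gens_per_year, a_N, a_S, omega=1.0):
    """Convert the time needed to reach relative F2 fitness w into years."""
    return tau_for_fitness(w, a_N, a_S, omega) * N_e / gens_per_year


# "Large population" case: sigma_N ~ sigma_S ~ gamma and omega = 1,
# so the term reduces to (1 + 2 T/N_e)^(-1/2).
print(round(f2_sqrt_term(0.1, 1.0, 1.0), 3))             # 0.913
print(round(years_for_fitness(0.9, 1e6, 10, 1.0, 1.0)))  # 11728 years to a 10% drop

# Island case with a hypothetical a = (sigma_N/gamma)^2 = 0.05 and N_e = 1000.
print(round(years_for_fitness(0.9, 1000, 10, 0.05, 0.05)))  # 123
```

Under these assumptions, fitness after 10,000 years in the large-population case is about 0.91, and a full 10% drop takes roughly 11,700 years. The island case gives about 123 years. Both agree with the rounded figures quoted above.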
Genetic variation in empirical regulatory systems
What is known about the key quantity above, the amount of heritable variation in real regulatory networks? The coefficient $A_{ij}$ from the system (1) measures how much the rate of net production of $i$ changes per change in concentration of $j$. It is generally thought that regulatory sequence change contributes much more to inter- and intraspecific variation than does coding sequence change affecting molecular structure [Schmidt et al., 2010]. In the context of transcription factor networks, $A_{ij}$ may be affected not only by the binding strength of molecule $j$ to the promoter region of gene $i$ but also by the effects of other transcription factors (e.g., cooperativity) and local chromatin accessibility [Stefflova et al., 2013]. For this reason, the mutational target size for variation in $A_{ij}$ may be much larger than the dozens of base pairs typically implicated in the handful of binding sites for transcription factor $j$ in a typical promoter region, and single variants may affect many entries of $A$ simultaneously.
Variation in binding site occupancy may overestimate variation in $A$, since it does not capture buffering effects (if, for instance, only one site of many needs to be occupied for transcription to begin), and variation in expression level measures changes in steady-state concentration (our $\kappa_i$) rather than the rate of change. Nonetheless, these measures likely give us an idea of the scale of variability. It has been shown that between human individuals, there is differential occupancy in 7.5% of binding sites of a transcription factor (p65) [Kasowski et al., 2010]. It has also been inferred that cis-regulatory variation accounts for around 2–6% of expression variation in human blood-derived primary cells [Verlaan et al., 2009], and that human population variation explained about 3% of expression variation [Lappalainen et al., 2013]. Allele-specific expression is indicative of standing genetic cis-regulatory variation; it has been observed in 7.2–8.5% of transcripts of a flycatcher species [Wang et al., 2017] and in 23.4% of genes studied in a baboon species [Tung et al., 2015]. Taken together, these results suggest that variation in the entries of $A$ may be on the order of at least a few percent between individuals of a population, doubtless varying substantially between species and between genes.
Discussion
In this paper, we use tools from quantitative genetics and control theory to study the evolution of a mechanistic model of the genotype-phenotype map, in which the phenotype is subject to stabilizing selection. In so doing, we provide an explicit model of phenogenetic drift [Weiss and Fullerton, 2000] and developmental system drift [True and Haag, 2001]. In this context, the Kalman decomposition [Kalman, 1963] gives an analytical description of all phenotypically equivalent gene networks. It also implies that nearly all systems are nonidentifiable, and that, in general, there exist axes of genetic variation unconstrained by natural selection. The independent movement of separated populations along these axes by genetic drift can lead to a significant reduction in hybrid viability, and thus precipitate speciation, at a speed dependent on the effective population size and the amount of genetic variation. In this model, at biologically reasonable parameter values, system drift is a significant, and possibly rapid, driver of speciation. This may be surprising, both because hybrid inviability here arises from recombining different yet functionally equivalent mechanisms, and because species are often defined by their unique adaptations or morphologies.
Consistent with empirical observation of hybrid breakdown (e.g., Plötner et al. [2017]), we see that the fitnesses of F2 hybrids drop at a much faster rate than those of F1s. Another natural consequence of the model is Haldane’s rule, that if only one F1 hybrid sex is inviable or sterile it is likely to be the heterogametic sex. This occurs because if the genes underlying a regulatory network are distributed among both autosomes and the sex chromosome, then heterogametic F1s show variation (and fitnesses) similar to that seen in F2 hybrids.
Is there evidence that this is actually occurring? System drift and network rewiring have been inferred across the tree of life [Wotton et al., 2015, Crombach et al., 2016, Dalal and Johnson, 2017, Johnson, 2017, Ali et al., 2017], and there is often significant regulatory variation segregating within populations. Transcription in hybrids between closely related species with conserved transcriptional patterns can also be divergent [Haerty and Singh, 2006, Maheshwari and Barbash, 2012, Coolon et al., 2014, Michalak and Noor, 2004, Mack and Nachman, 2016], and hybrid incompatibilities have been attributed to cryptic molecular divergence underlying conserved body plans [Gavin-Smyth and Matute, 2013]. Furthermore, in cryptic species complexes (e.g., sun skinks [Barley et al., 2013]), genetically distinct species may be nearly morphologically indistinguishable.
The origin of species not by means of natural selection?
As classically formulated, the Dobzhansky-Muller model of hybrid incompatibility is agnostic to the relative importance of neutral versus selective genetic substitutions [Coyne and Orr, 1998], and plausible mechanisms have been proposed whereby Dobzhansky–Muller incompatibilities could originate under neutral genetic drift [Lynch and Force, 2000] or stabilizing selection [Fierst and Hansen, 2009]. The same holds for the “pathway model” [Lindtke and Buerkle, 2015], which is closer to the situation here. However, previous authors have argued that neutral processes are likely too slow to be a significant driver of speciation [Nei et al., 1983, Seehausen et al., 2014]. This has led some to conclude that hybrid incompatibility is typically a byproduct of positive selection [Orr et al., 2004, Schluter, 2009] or a consequence of genetic conflict [Presgraves, 2010, Crespi and Nosil, 2013], two processes that typically act much more rapidly than genetic drift. However, our calculations suggest that even under strictly neutral processes, hybrid fitness breaks down as a function of genetic distance rapidly enough to play a substantial role in species formation across the tree of life. This is consistent with broad patterns such as the relationship between molecular divergence and genetic isolation seen by Roux et al. [2016], and the clocklike speciation rates observed by Hedges et al. [2015].
All of these forces – adaptive shifts, conflict, and network drift – are plausible drivers of speciation, and may even interact. Many of our observations carry over to models of directional selection – for instance, rapid drift along the set of equivalent systems could be driven by adaptation in a different, pleiotropically coupled system. Or, reinforcement due to local adaptation might provide a selective pressure that speeds up system drift. Furthermore, while the fitness consequences of incompatibility in any one given network may be small, the cumulative impact of system drift across the many different networks an organism relies on may be substantial. It remains to be seen how the relative strengths of these forces compare.
Fisher’s geometric model
There has been substantial work on models of quadratic stabilizing selection around a single optimum, i.e., Fisher’s geometric model [e.g. Simon et al., 2017]. Our model would have this form if viewed only in phenotype space, where there is a single optimum, but it differs by explicitly considering the evolutionary dynamics of the underlying genotype-derived regulatory matrices. This makes the resulting quantitative genetic model a generalization of Fisher’s geometric model, with distinct behavior due to the existence of neutral directions in genotype space. However, the two share many aspects: for instance, Barton [1989] also found incompatibilities accumulating with genetic drift. Fisher’s geometric model also predicts Haldane’s rule and many other empirically observed patterns [Fraïsse et al., 2016], but for different underlying reasons.
Nonlinearity and model assumptions
Of course, real regulatory networks are not linear dynamical systems. Most notably, physiological limits put upper bounds on expression levels, implying saturating response curves. It remains to be seen how well these results carry over into real systems, but the fact that most nonlinear systems can be locally approximated by a linear one suggests our qualitative results may hold more generally. Furthermore, nonidentifiability (which implies the existence of neutral directions) is often found in practice in moderately complex models of biological systems [Gutenkunst et al., 2007, Piazza et al., 2008].
The simple quantitative genetic model we use above has been shown to produce good predictions in many situations, even when its many simplifying assumptions are violated [Bürger and Lande, 1994, Turelli and Barton, 1994]. The calculations above should be fairly robust even to substantial deviations from normality. Larger effects on these predictions are likely to come from correlations arising from molecular constraint, genetic linkage, population structure, historical contingency, and so forth. Although such considerations would not change the qualitative predictions of this model, their combined effects could substantially change the predicted rate of accumulation of incompatibilities.
Finally, despite our model’s precise separation of phenotype and kryptotype, this relationship in nature may be far more complicated, as aspects of the kryptotype may be less “hidden” than we currently assume. For instance, by excluding such attributes from the phenotype, our model ignores the potential energetic costs associated with excessively large (non-minimal) kryptotypes, as well as the relationship between a specific network architecture and robustness to mutational, transcriptional, or environmental noise. More precise modeling will require a better mechanistic understanding not only of biological systems, but also of the nature of selective pressures and genetic variation in these systems.
A Genetic drift with a multivariate trait
For completeness, we provide a brief exposition of how a population evolves due to genetic drift in a quantitative genetics model, as in Lande [1981] or Hansen and Martins [1996]. These models do not directly represent the underlying genetic basis of the trait, but developing a more accurate model is beyond the scope of this paper.
Suppose that the population is distributed in trait space as a Gaussian with covariance matrix $\Sigma$ and mean $\mu$, whose density we write as $f(\cdot\,; \Sigma, \mu)$. Selection has the effect of multiplying this density by the fitness function and renormalizing, so that if the expected fitness of $x$ is proportional to $\exp(-\|Lx\|^2/2)$, then the post-selection distribution has density at $x$ proportional to $f(x; \Sigma, \mu)\exp(-\|Lx\|^2/2)$. By the computation below (“Completing the square”), the result is a Gaussian distribution with covariance matrix $(\Sigma^{-1} + L^T L)^{-1}$ and mean $(\Sigma^{-1} + L^T L)^{-1}\Sigma^{-1}\mu$.
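The selection step can be checked numerically. The sketch below uses plain Python with hand-rolled 2×2 matrix helpers. It compares the closed-form post-selection covariance $(\Sigma^{-1} + L^TL)^{-1}$ and mean $(\Sigma^{-1} + L^TL)^{-1}\Sigma^{-1}\mu$ with a Monte Carlo estimate, obtained by weighting Gaussian samples by $\exp(-\|Lx\|^2/2)$. The particular $\Sigma$, $\mu$, and single-row $L$ are arbitrary illustrative values, and the helper names are hypothetical.

```python
import math
import random


def inv2(m):
    """Inverse of a 2x2 matrix stored as nested lists."""
    (a, b), (c, d) = m
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]


def add2(m, n):
    """Elementwise sum of two 2x2 matrices."""
    return [[m[i][j] + n[i][j] for j in range(2)] for i in range(2)]


def matvec2(m, v):
    """Product of a 2x2 matrix and a 2-vector."""
    return [m[0][0] * v[0] + m[0][1] * v[1], m[1][0] * v[0] + m[1][1] * v[1]]


def select_closed_form(Sigma, mu, l):
    """Post-selection (covariance, mean) when L is the single row vector l."""
    LtL = [[l[i] * l[j] for j in range(2)] for i in range(2)]
    Q = inv2(add2(inv2(Sigma), LtL))
    return Q, matvec2(Q, matvec2(inv2(Sigma), mu))


def select_monte_carlo(Sigma, mu, l, n=100_000, seed=1):
    """Weight draws from N(mu, Sigma) by exp(-(l.x)^2/2); return weighted (cov, mean)."""
    rng = random.Random(seed)
    # Cholesky factor of the 2x2 covariance matrix.
    c00 = math.sqrt(Sigma[0][0])
    c10 = Sigma[1][0] / c00
    c11 = math.sqrt(Sigma[1][1] - c10 ** 2)
    sw = sx = sy = sxx = sxy = syy = 0.0
    for _ in range(n):
        z0, z1 = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)
        x = mu[0] + c00 * z0
        y = mu[1] + c10 * z0 + c11 * z1
        w = math.exp(-0.5 * (l[0] * x + l[1] * y) ** 2)  # fitness weight
        sw += w
        sx += w * x
        sy += w * y
        sxx += w * x * x
        sxy += w * x * y
        syy += w * y * y
    mx, my = sx / sw, sy / sw
    cov = [[sxx / sw - mx * mx, sxy / sw - mx * my],
           [sxy / sw - mx * my, syy / sw - my * my]]
    return cov, [mx, my]


Sigma = [[1.0, 0.3], [0.3, 0.5]]  # illustrative pre-selection covariance
mu = [0.5, -0.2]                   # illustrative pre-selection mean
l = [1.0, 0.5]                     # L as a single row: one selected direction
print(select_closed_form(Sigma, mu, l))
print(select_monte_carlo(Sigma, mu, l))
```

The two printed pairs should agree to within Monte Carlo error, which is on the order of $10^{-3}$ here.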
After selection, we have reproduction: suppose this occurs as in the infinitesimal model [Barton et al., 2017], so that each offspring of parents with traits $x$ and $y$ is drawn independently from a Gaussian distribution with mean $(x+y)/2$ and covariance matrix $R$. Here, $R$ is the contribution of “segregation variance”, i.e., random choices of parental alleles. If $\tilde\Sigma = (\Sigma^{-1} + L^T L)^{-1}$ is the covariance matrix of the parents post-selection, then the distribution of offspring will again be Gaussian, with mean equal to that of the parents and covariance matrix $\tilde\Sigma/2 + R$.
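In summary, a generation under this model modifies the mean ($\mu$) and covariance matrix ($\Sigma$) of a population as follows:

$$\mu' = (\Sigma^{-1} + L^T L)^{-1}\Sigma^{-1}\mu = (I + \Sigma L^T L)^{-1}\mu,$$

$$\Sigma' = \tfrac{1}{2}(\Sigma^{-1} + L^T L)^{-1} + R.$$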
What measures are stable under this transformation? The condition $\mu = \mu'$ reduces to $\Sigma L^T L \mu = 0$; if we assume $R$ and therefore $\Sigma$ are of full rank, then this happens if and only if $\mu$ is in the null space of $L$, i.e., if $\mu$ lies in a neutral direction. The condition $\Sigma' = \Sigma$ can also be solved, at least numerically. After rearrangement, it reduces to $\Sigma L^T L \Sigma + (I/2 - R L^T L)\Sigma = R$. Importantly, the mean $\mu$ affects neither how the covariance matrix moves nor its stable shape.
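Both conditions can be explored numerically. The sketch below is in plain Python, with hypothetical helper names and an illustrative $R$. It implements one generation of the update, $\mu' = (I + \Sigma L^TL)^{-1}\mu$ and $\Sigma' = (\Sigma^{-1} + L^TL)^{-1}/2 + R$. With selection acting only on the first coordinate, it iterates the covariance update to a fixed point. It then checks two things: that the result satisfies $\Sigma L^TL\Sigma + (I/2 - RL^TL)\Sigma = R$, and that a mean lying in the null space of $L$ is left unchanged.

```python
def inv2(m):
    (a, b), (c, d) = m
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]


def add2(m, n):
    return [[m[i][j] + n[i][j] for j in range(2)] for i in range(2)]


def mul2(m, n):
    return [[sum(m[i][k] * n[k][j] for k in range(2)) for j in range(2)] for i in range(2)]


def scale2(m, c):
    return [[c * m[i][j] for j in range(2)] for i in range(2)]


def matvec2(m, v):
    return [m[0][0] * v[0] + m[0][1] * v[1], m[1][0] * v[0] + m[1][1] * v[1]]


I2 = [[1.0, 0.0], [0.0, 1.0]]


def next_mean(Sigma, mu, LtL):
    """mu' = (I + Sigma L^T L)^{-1} mu."""
    return matvec2(inv2(add2(I2, mul2(Sigma, LtL))), mu)


def next_cov(Sigma, LtL, R):
    """Sigma' = (Sigma^{-1} + L^T L)^{-1} / 2 + R."""
    return add2(scale2(inv2(add2(inv2(Sigma), LtL)), 0.5), R)


def stationary_cov(LtL, R, iters=200):
    """Iterate the covariance update, starting from Sigma = 2R, until it settles."""
    Sigma = scale2(R, 2.0)
    for _ in range(iters):
        Sigma = next_cov(Sigma, LtL, R)
    return Sigma


def stationarity_residual(Sigma, LtL, R):
    """Largest entry of Sigma L^T L Sigma + (I/2 - R L^T L) Sigma - R."""
    lhs = add2(mul2(mul2(Sigma, LtL), Sigma),
               mul2(add2(scale2(I2, 0.5), scale2(mul2(R, LtL), -1.0)), Sigma))
    return max(abs(lhs[i][j] - R[i][j]) for i in range(2) for j in range(2))


LtL = [[1.0, 0.0], [0.0, 0.0]]  # selection acts only on the first coordinate
R = [[0.2, 0.05], [0.05, 0.3]]  # illustrative segregation covariance
Sigma = stationary_cov(LtL, R)
print(stationarity_residual(Sigma, LtL, R))  # rounding error only
print(next_mean(Sigma, [0.0, 1.0], LtL))     # neutral direction: unchanged
print(next_mean(Sigma, [1.0, 0.0], LtL))     # selected coordinate shrinks toward 0
```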
Above we have described the expected motion of the mean and covariance. However, random resampling will introduce noise. Suppose that a population of $N$ individuals behaves approximately as described above. By the above, we may expect that the covariance matrix stays close to a constant value $\Sigma$, computed from $R$ and $L$ as above, so that we need only consider the motion of the mean, $\mu$. Since we take a sample of size $N$ to construct the next generation, the next generation’s mean is drawn from a Gaussian distribution with mean $(I + \Sigma L^T L)^{-1}\mu$ and covariance matrix $\Sigma/N$. Defining $\Gamma = I - (I + \Sigma L^T L)^{-1}$, this can be written as

$$\mu' = \mu - \Gamma\mu + \frac{1}{\sqrt{N}}\epsilon,$$

where $\epsilon$ is a multivariate Gaussian with mean zero and covariance matrix $\Sigma$. Let $\mu(k)$ denote the mean in the $k$th generation, and suppose that $\mu$ differs from optimal by something of order $1/\sqrt{N}$. If $v$ denotes the rescaled process, then the previous equation implies that, as $N \to \infty$, $v$ converges to the solution of an Itô equation driven by $W(t)$, a multivariate white noise. This limit has an explicit solution as a multivariate Ornstein-Uhlenbeck process.
The asymptotic variance of this process in the direction $z$ is finite unless $\Gamma z = 0$, which occurs if and only if $Lz = 0$. This implies that at equilibrium, population mean trait values lie away from the optimal set by a Gaussian displacement of order $1/\sqrt{N}$, with a covariance matrix given by equation (8).
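A small simulation illustrates the contrast between selected and neutral directions. For illustration only, it assumes a diagonal stationary covariance $\Sigma = \mathrm{diag}(s_1, s_2)$ and selection only on the first coordinate ($L^TL = \mathrm{diag}(1, 0)$). Then $\Gamma = \mathrm{diag}(s_1/(1+s_1), 0)$, and each coordinate of the mean evolves separately under $\mu' = (I - \Gamma)\mu + \epsilon/\sqrt{N}$. Across replicate populations, the variance of the mean should settle near $s_1/(N(1 - (1-g)^2))$ in the selected coordinate, with $g = s_1/(1+s_1)$, but should grow like $s_2 k/N$ after $k$ generations in the neutral one. The parameter values and the function name are hypothetical.

```python
import random
import statistics


def simulate_means(s1, s2, N, generations, replicates, seed=2):
    """Replicate population means under mu' = (I - Gamma) mu + eps / sqrt(N)."""
    rng = random.Random(seed)
    g = s1 / (1.0 + s1)          # nonzero diagonal entry of Gamma (selected coordinate)
    sd1, sd2 = (s1 / N) ** 0.5, (s2 / N) ** 0.5
    selected, neutral = [], []
    for _ in range(replicates):
        x = y = 0.0              # start on the optimal set
        for _ in range(generations):
            x = (1.0 - g) * x + rng.gauss(0.0, sd1)  # restoring force toward the optimum
            y = y + rng.gauss(0.0, sd2)              # neutral: pure random walk
        selected.append(x)
        neutral.append(y)
    return statistics.pvariance(selected), statistics.pvariance(neutral)


s1, s2, N, K = 1.0, 1.0, 100, 200
var_sel, var_neu = simulate_means(s1, s2, N, K, replicates=2000)
g = s1 / (1.0 + s1)
print(var_sel, s1 / (N * (1.0 - (1.0 - g) ** 2)))  # stays O(1/N)
print(var_neu, s2 * K / N)                         # grows linearly in K
```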
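Completing the square. First note that if $A$ is symmetric, then

$$(x - y)^T A (x - y) = x^T A x - 2x^T A y + y^T A y,$$

and so if $B$ is also symmetric and $A + B$ is invertible,

$$(x - y)^T A (x - y) + x^T B x = \big(x - (A+B)^{-1}Ay\big)^T (A + B)\big(x - (A+B)^{-1}Ay\big) + y^T\big(A - A(A+B)^{-1}A\big)y.$$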
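Therefore, by substituting $A = \Sigma^{-1}$, $B = L^T L$, and $y = \mu$,

$$f(x; \Sigma, \mu)\, e^{-\|Lx\|^2/2} \propto \exp\left(-\tfrac{1}{2}\big(x - (\Sigma^{-1} + L^T L)^{-1}\Sigma^{-1}\mu\big)^T (\Sigma^{-1} + L^T L)\big(x - (\Sigma^{-1} + L^T L)^{-1}\Sigma^{-1}\mu\big)\right).$$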
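The identity used here holds for any symmetric $A$ and $B$ with $A + B$ invertible: $(x-y)^TA(x-y) + x^TBx = (x - (A+B)^{-1}Ay)^T(A+B)(x - (A+B)^{-1}Ay) + y^T(A - A(A+B)^{-1}A)y$. The numerical spot-check below is in plain Python. It uses random symmetric positive-definite 2×2 matrices, which are an arbitrary choice of test cases, and hypothetical helper names. Both sides should agree up to rounding error.

```python
import random


def inv2(m):
    (a, b), (c, d) = m
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]


def add2(m, n):
    return [[m[i][j] + n[i][j] for j in range(2)] for i in range(2)]


def mul2(m, n):
    return [[sum(m[i][k] * n[k][j] for k in range(2)) for j in range(2)] for i in range(2)]


def matvec2(m, v):
    return [m[0][0] * v[0] + m[0][1] * v[1], m[1][0] * v[0] + m[1][1] * v[1]]


def quad(x, M, y):
    """x^T M y for 2-vectors x, y and a 2x2 matrix M."""
    return sum(x[i] * M[i][j] * y[j] for i in range(2) for j in range(2))


def random_spd(rng):
    """Random symmetric positive-definite 2x2 matrix of the form M M^T + I."""
    a, b, c, d = (rng.uniform(-1, 1) for _ in range(4))
    return [[a * a + b * b + 1.0, a * c + b * d], [a * c + b * d, c * c + d * d + 1.0]]


def identity_gap(A, B, x, y):
    """Left-hand side minus right-hand side of the completing-the-square identity."""
    diff = [x[0] - y[0], x[1] - y[1]]
    lhs = quad(diff, A, diff) + quad(x, B, x)
    ApB_inv = inv2(add2(A, B))
    centre = matvec2(ApB_inv, matvec2(A, y))
    shifted = [x[0] - centre[0], x[1] - centre[1]]
    remainder = add2(A, [[-v for v in row] for row in mul2(mul2(A, ApB_inv), A)])
    rhs = quad(shifted, add2(A, B), shifted) + quad(y, remainder, y)
    return lhs - rhs


rng = random.Random(3)
gaps = []
for _ in range(1000):
    A, B = random_spd(rng), random_spd(rng)
    x = [rng.uniform(-2, 2) for _ in range(2)]
    y = [rng.uniform(-2, 2) for _ in range(2)]
    gaps.append(abs(identity_gap(A, B, x, y)))
print(max(gaps))  # rounding error only
```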
A.1 Gaussian load
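Suppose that a population has a Gaussian distribution in $d$-dimensional trait space with mean $\mu$ and covariance matrix $\Sigma$, and that the fitness of an individual at $x$ is $\exp(-\|Lx\|^2/2)$. Then, completing the square as above with $A = \Sigma^{-1}$, $y = \mu$, and $B = L^T L$, and defining $Q = (\Sigma^{-1} + L^T L)^{-1}$, the mean fitness is

$$\bar W = \int f(x; \Sigma, \mu)\, e^{-\|Lx\|^2/2}\, dx = \sqrt{\frac{\det Q}{\det \Sigma}}\, \exp\left(-\tfrac{1}{2}\mu^T\big(\Sigma^{-1} - \Sigma^{-1} Q \Sigma^{-1}\big)\mu\right).$$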
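Now suppose that $\Sigma = \sigma^2 I$ and $L = I/\gamma$. Then

$$Q = \left(\frac{1}{\sigma^2} + \frac{1}{\gamma^2}\right)^{-1} I = \frac{\sigma^2\gamma^2}{\sigma^2 + \gamma^2} I, \qquad \text{so} \qquad \frac{\det Q}{\det \Sigma} = \left(\frac{\gamma^2}{\sigma^2 + \gamma^2}\right)^d.$$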
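Also,

$$\Sigma^{-1} - \Sigma^{-1} Q \Sigma^{-1} = \left(\frac{1}{\sigma^2} - \frac{\gamma^2}{\sigma^2(\sigma^2 + \gamma^2)}\right) I = \frac{1}{\sigma^2 + \gamma^2} I,$$

so that the mean fitness is

$$\bar W = \left(\frac{\gamma^2}{\gamma^2 + \sigma^2}\right)^{d/2} \exp\left(-\frac{\|\mu\|^2}{2(\gamma^2 + \sigma^2)}\right).$$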
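For $\Sigma = \sigma^2 I$ and $L = I/\gamma$, the closed form $\bar W = (\gamma^2/(\gamma^2+\sigma^2))^{d/2}\exp(-\|\mu\|^2/(2(\gamma^2+\sigma^2)))$ can be compared with a direct Monte Carlo average of $\exp(-\|x\|^2/(2\gamma^2))$ over $x \sim N(\mu, \sigma^2 I)$. The sketch below uses $d = 2$, arbitrary illustrative values of $\sigma$, $\gamma$, and $\mu$, and hypothetical function names.

```python
import math
import random


def gaussian_load_closed_form(mu, sigma, gamma):
    """Mean fitness for Sigma = sigma^2 I and L = I / gamma, in d = len(mu) dimensions."""
    d = len(mu)
    s = gamma ** 2 + sigma ** 2
    return (gamma ** 2 / s) ** (d / 2) * math.exp(-sum(m * m for m in mu) / (2 * s))


def gaussian_load_monte_carlo(mu, sigma, gamma, n=200_000, seed=4):
    """Average of exp(-|x|^2 / (2 gamma^2)) over draws x ~ N(mu, sigma^2 I)."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n):
        sq = sum((m + sigma * rng.gauss(0.0, 1.0)) ** 2 for m in mu)
        total += math.exp(-sq / (2 * gamma ** 2))
    return total / n


mu, sigma, gamma = [0.5, -1.0], 0.8, 1.5
exact = gaussian_load_closed_form(mu, sigma, gamma)
approx = gaussian_load_monte_carlo(mu, sigma, gamma)
print(exact, approx)  # both close to 0.627
```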
B Evolution of segregation covariance
The model above does not describe how two diverging populations interact, since the amount of segregation variance, quantified by $R$, will not stay constant. To get an idea of how this might change, suppose that a trait is determined by $L$ unlinked, biallelic loci, and that the $i$th locus has two alleles with additive effects $\pm\alpha_i$, so that being homozygous for the + allele contributes $+2\alpha_i$ to the trait. For simplicity, we will neglect the effects of selection. If the + allele at locus $i$ is at frequency $p_i$ in a population, then the mean and genetic variance of the trait in a diploid population with random mating are

$$\bar z(p) = \sum_{i=1}^{L} 2\alpha_i(2p_i - 1) \qquad \text{and} \qquad V_G(p) = \sum_{i=1}^{L} 8\alpha_i^2 p_i(1 - p_i).$$
Segregation variance between two parents depends on the loci at which either is heterozygous, and each locus contributes independently since alleles are additive. If the alleles are at Hardy-Weinberg proportions, then since segregation acts like a fair coin flip, a heterozygous locus contributes $\alpha_i^2$ to the variance, and so the mean segregation variance, averaging across parents, is

$$R_0(p) = \sum_{i=1}^{L} 2 \times 2p_i(1 - p_i)\,\alpha_i^2 = \sum_{i=1}^{L} 4\alpha_i^2 p_i(1 - p_i).$$
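On the other hand, if the second parent came from a distinct population with frequencies $q_i$ (so that the offspring is an F1 hybrid), this would be

$$R_1(p, q) = \sum_{i=1}^{L} 2\alpha_i^2\big(p_i(1 - p_i) + q_i(1 - q_i)\big).$$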
If we assume that the populations are at equilibrium, $R_0(p) \approx R_0(q)$, and so $R_1(p, q) \approx R_0(p)$.
Now consider an F2 hybrid, where both parents are F1s and so each is heterozygous at locus $i$ with probability $p_i(1 - q_i) + (1 - p_i)q_i$. Then

$$R_2(p, q) = \sum_{i=1}^{L} 2\alpha_i^2\big(p_i(1 - q_i) + (1 - p_i)q_i\big).$$
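At a single locus, these averages can be checked by brute force. The method is to enumerate both parents' genotypes, compute the variance of the allele each parent transmits (a fair coin flip between its two alleles), and average over genotype frequencies. Applied to a within-population pair, a cross between populations, and a cross between two F1s, this should reproduce the single-locus terms $4\alpha^2 p(1-p)$, $2\alpha^2(p(1-p) + q(1-q))$, and $2\alpha^2(p(1-q) + (1-p)q)$ of $R_0$, $R_1$, and the F2 segregation variance (called $R_2$ here). The frequencies and effect size are illustrative, and the function names are hypothetical.

```python
from itertools import product


def transmitted_variance(genotype, alpha):
    """Variance of the allelic effect a parent passes on (a fair coin flip)."""
    effects = [allele * alpha for allele in genotype]  # alleles coded +1 / -1
    mean = sum(effects) / 2
    return sum(e * e for e in effects) / 2 - mean ** 2


def hw_genotypes(p):
    """Hardy-Weinberg genotype probabilities when the + allele has frequency p."""
    freq = {1: p, -1: 1 - p}
    return {(a, b): freq[a] * freq[b] for a, b in product((1, -1), repeat=2)}


def f1_genotypes(p, q):
    """F1 genotypes: one allele from a population at p, one from a population at q."""
    fp, fq = {1: p, -1: 1 - p}, {1: q, -1: 1 - q}
    return {(a, b): fp[a] * fq[b] for a, b in product((1, -1), repeat=2)}


def mean_segregation_variance(parents1, parents2, alpha):
    """Average over both parents' genotypes of the offspring's segregation variance."""
    total = 0.0
    for g1, w1 in parents1.items():
        for g2, w2 in parents2.items():
            total += w1 * w2 * (transmitted_variance(g1, alpha) + transmitted_variance(g2, alpha))
    return total


p, q, alpha = 0.3, 0.6, 0.7  # illustrative frequencies and effect size
R0 = mean_segregation_variance(hw_genotypes(p), hw_genotypes(p), alpha)
R1 = mean_segregation_variance(hw_genotypes(p), hw_genotypes(q), alpha)
R2 = mean_segregation_variance(f1_genotypes(p, q), f1_genotypes(p, q), alpha)
print(R0, 4 * alpha ** 2 * p * (1 - p))
print(R1, 2 * alpha ** 2 * (p * (1 - p) + q * (1 - q)))
print(R2, 2 * alpha ** 2 * (p * (1 - q) + (1 - p) * q))
```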
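Suppose that the two populations have drifted slightly apart, with frequency difference $p_i - q_i = 2\epsilon_i$. Then, since $p_i(1 - q_i) + (1 - p_i)q_i - p_i(1 - p_i) - q_i(1 - q_i) = (p_i - q_i)^2$, the F2 segregation variance $R_2(p, q)$ satisfies

$$R_2(p, q) = R_1(p, q) + \sum_{i=1}^{L} 8\alpha_i^2\epsilon_i^2 \approx R_0(p) + \sum_{i=1}^{L} 8\alpha_i^2\epsilon_i^2.$$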
If the frequencies have evolved neutrally in unconnected Wright-Fisher populations of effective size $N$ for $t$ generations from a common ancestor with allele frequency $u$, then $\epsilon$ has mean zero and variance roughly $2u(1 - u)t/N$. Still assuming the populations are at stationarity, so that $R_0$ is constant between the two, and taking the frequencies $p_i$ as a proxy for the ancestral frequencies $u_i$, this implies that we expect the F2 segregation variance to be

$$\mathbb{E}[R_2(p, q)] \approx R_0(p) + \frac{16t}{N}\sum_{i=1}^{L}\alpha_i^2 p_i(1 - p_i) = \left(1 + \frac{4t}{N}\right)R_0(p).$$
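On the other hand, since the difference in trait means is $\sum_i 4\alpha_i(p_i - q_i) = \sum_i 8\alpha_i\epsilon_i$, the expected squared difference in trait means between two such populations is

$$\mathbb{E}\Big[\Big(\sum_{i=1}^{L} 8\alpha_i\epsilon_i\Big)^2\Big] = 64\sum_{i=1}^{L}\alpha_i^2\,\mathbb{E}[\epsilon_i^2] \approx \frac{128t}{N}\sum_{i=1}^{L}\alpha_i^2 p_i(1 - p_i).$$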
This implies that under this model, the segregation variance in F2 hybrids between two populations is increased, relative to that within each population, by roughly one-eighth of the expected squared difference between their trait means.
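The one-eighth factor can also be checked by simulation. The sketch draws random allele frequencies $p_i$ and effects $\alpha_i$. It then perturbs the frequencies by independent Gaussian $\epsilon_i$ with mean zero and variance $2p_i(1-p_i)t/N$, which is the approximation stated above, using an illustrative $t/N = 0.001$. Finally, it compares the average excess of F2 over F1 segregation variance, $\sum_i 2\alpha_i^2(p_i - q_i)^2$, with the average squared difference in trait means. Under the stationarity assumption $R_1 \approx R_0$, this excess is the increase in F2 segregation variance. Parameter values and function names are hypothetical.

```python
import random


def seg_var_f1(p, q, alpha):
    """R_1(p, q): one parent from each population (Hardy-Weinberg within each)."""
    return sum(2 * a * a * (pi * (1 - pi) + qi * (1 - qi)) for pi, qi, a in zip(p, q, alpha))


def seg_var_f2(p, q, alpha):
    """F2 segregation variance: both parents are F1 hybrids between the populations."""
    return sum(2 * a * a * (pi * (1 - qi) + (1 - pi) * qi) for pi, qi, a in zip(p, q, alpha))


def trait_mean(p, alpha):
    """Population mean trait value with additive allelic effects +/- alpha_i."""
    return sum(2 * a * (2 * pi - 1) for pi, a in zip(p, alpha))


def one_eighth_check(n_loci=20, t_over_N=0.001, replicates=10_000, seed=5):
    """Ratio of mean extra F2 segregation variance to mean squared trait-mean difference."""
    rng = random.Random(seed)
    p = [rng.uniform(0.3, 0.7) for _ in range(n_loci)]
    alpha = [rng.uniform(0.1, 1.0) for _ in range(n_loci)]
    extra_seg = sq_mean_diff = 0.0
    for _ in range(replicates):
        # eps_i ~ Normal(0, 2 p_i (1 - p_i) t / N), independently across loci
        eps = [rng.gauss(0.0, (2 * pi * (1 - pi) * t_over_N) ** 0.5) for pi in p]
        q = [pi - 2 * e for pi, e in zip(p, eps)]  # so that p_i - q_i = 2 eps_i
        extra_seg += seg_var_f2(p, q, alpha) - seg_var_f1(p, q, alpha)
        sq_mean_diff += (trait_mean(p, alpha) - trait_mean(q, alpha)) ** 2
    return extra_seg / sq_mean_diff


ratio = one_eighth_check()
print(ratio)  # should be close to 1/8 = 0.125
```

The ratio does not depend on the variance assumed for $\epsilon_i$, only on the $\epsilon_i$ being independent with mean zero. This is why the one-eighth factor is insensitive to the exact drift approximation.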
Acknowledgements
We would like to thank Sergey Nuzhdin, Stevan Arnold, Michael Turelli, Patrick Phillips, Erik Lundgren and Hossein Asgharian for valuable discussion. We would also like to thank Nick Barton, Sarah Signor and Todd Parsons for very helpful comments on the manuscript. Work on this project was supported by funds from the Sloan Foundation and the NSF (under DBI-1262645) to PR.
Footnotes
jsschiff{at}usc.edu, plr{at}uoregon.edu