Abstract
We develop a new approach for the stochastic analysis of biochemical reaction systems with arbitrary distributions of waiting times between reaction events. Specifically, we derive a stationary generalized chemical master equation for a non-Markovian reaction network. Importantly, this equation allows one to transform the original non-Markovian problem into a Markovian one by introducing a mean reaction propensity function for every reaction in the network. Furthermore, we derive a stationary generalized linear noise approximation for the non-Markovian system, which is convenient for the direct estimation of the stationary noise in state variables. These equations can have broad applications, and two representative non-Markovian models provide evidence of their applicability.
Introduction
The theory of Markov processes is well established and has found applications in an array of scientific fields including biology, chemistry, physics, epidemiology, ecology, and finance [1-3]. An important foundation of this theory is the chemical master equation (CME) [1,2,4], which can be simulated using numerical methods [5-8] or solved analytically in some cases [1,2,9-14]. The mathematical tractability of Markov processes enables great simplifications in problem formulation, leading to important successes in the description of many stochastic processes ranging from gene regulation and mass transport to disease spreading and animal species interactions [1-3].
A Markovian biochemical process is memoryless, that is, the probabilities that future reaction events happen depend only on the present state of the system, independent of the prior history. However, many biochemical processes have memory, i.e., they are non-Markovian. Non-exponentially distributed waiting times [15-18] and time delays [19-22] between reaction events can lead to memory. Non-Markovity has also been confirmed by the increasing availability of time-resolved data on different kinds of interactions [23-29]. The continuous time random walk (CTRW) provides a systematic starting point to account for arbitrary waiting time distributions between reaction events [30-38].
Recently, a generalized chemical master equation (gCME) was derived that is capable of accounting for non-exponential inter-reaction times and the resulting non-Markovian character of reaction dynamics [38]. The problem with the gCME, however, is that the so-called “memory functions” are expressed only implicitly through waiting time distributions and/or distributed delays. This leads to notorious difficulties in obtaining the system’s behavior, greatly limiting applications of the gCME (although numerical calculations can sometimes provide useful information [39-44]). While a dynamic distribution exactly captures the stochastic behavior of a chemical reaction system, the steady-state distribution is an important quantity needed to characterize the stationary behavior of the underlying system. To assess how informative experimental data can be, one also needs to calculate or simulate aspects of the steady-state distribution [45-47].
In order to obtain information on the stationary behavior of a non-Markovian biochemical system, we introduce a mean reaction propensity function for every reaction to replace the memory function in the traditional treatment. Thus, the original “memory functions” are converted into memoryless functions expressed explicitly by integrals of the given waiting time distributions over the full time range. Importantly, we derive a stationary gCME (sgCME), which allows one to transform the original thorny non-Markovian problem into a mathematically tractable Markovian one. With this formulation, we further derive a stationary generalized linear noise approximation (sgLNA) for the original non-Markovian system, which allows the direct estimation of the stationary noise in state variables. These stationary equations can have broad applications and are particularly useful for the analytical derivation of stationary distributions in, e.g., a gene model of bursty expression with general waiting time distributions (an issue left unsolved in previous works).
General Theory
Consider a general chemical reaction network consisting of $N$ different species (denoted by $X_j$, $j = 1, 2, \ldots, N$) that participate in $L$ different reactions of the form

$$\sum_{j=1}^{N} r_{ji} X_j \longrightarrow \sum_{j=1}^{N} p_{ji} X_j, \qquad i = 1, 2, \ldots, L, \tag{1}$$

where $r_{ji}$ and $p_{ji}$ are stoichiometric coefficients, taking non-negative integer values, and the differences $s_{ji} = p_{ji} - r_{ji}$ are stored in an $N \times L$ stoichiometric matrix $S$. Denote by $n_j$ the molecule number of species $X_j$ and by $n = (n_1, \ldots, n_N)^T$ the state vector, where $T$ represents transpose. Let $\psi_i(t; n)$ be the probability density function (PDF) of the $i$th reaction waiting time (depending on the system state $n$), and $\Psi_i(t; n)$ be the corresponding cumulative distribution function. In the following, we consider only the non-Markovity resulting from non-exponential waiting times between reaction events, whereas the non-Markovity generated by distributed delays will be discussed later.
The time-evolution gCME of the above network has been established [31,33,34,38], but solving this equation has remained intractable to date. Instead, here we consider the steady-state behavior of the system. Assume that the stationary probability of the system exists, and denote it by $P(n)$. Then, based on the chemical CTRW theory [38], we can derive the following stationary equation (see the Supplemental Online Material [48] for its derivation)

$$\sum_{i=1}^{L}\left(\prod_{j=1}^{N} E_j^{-s_{ji}} - 1\right)\left[K_i(n)\,P(n)\right] = 0, \tag{2}$$

where $E_j^{-s_{ji}}$ is a step operator that removes $s_{ji}$ molecules from species $X_j$ in the $i$th reaction, i.e., $E_j^{-s_{ji}} f(n) = f(n_1, \ldots, n_j - s_{ji}, \ldots, n_N)$ for any function $f(n)$. The symbol $K_i(n)$ represents the mean propensity function of the $i$th reaction, accounting for the transition probability from a given state $n$ to any other state. Importantly, we can show that $K_i(n)$ is explicitly expressed by the given waiting time distributions, that is,

$$K_i(n) = \frac{\int_0^\infty \psi_i(t;n)\prod_{j\neq i}\left[1-\Psi_j(t;n)\right]dt}{\int_0^\infty \prod_{j=1}^{L}\left[1-\Psi_j(t;n)\right]dt}. \tag{3}$$
See the Supplemental Online Material [48] for the derivation of Eq. (3). Together, Eqs. (2) and (3) constitute the final sgCME, which governs the stationary behavior of the original non-Markovian reaction system.
First, we observe that the function $K_i(n)$ depends only on the state variable $n$, independent of the prior history. In other words, the mean reaction propensity functions associated with the original non-Markovian process are memoryless. Second, the sgCME has important implications. For example, if we compare Eq. (2) with the common CME for a reaction network of the same structure but with rate-limited reactions or exponentially distributed waiting times (i.e., a Markovian chemical network), then we find that Eq. (2) is exactly the stationary version of this CME. Moreover, if $K_i(n)$ is taken as the ‘reaction propensity function’ (possibly a rational rather than polynomial function of $n$) of the $i$th reaction in this Markovian network, then we successfully convert a non-Markovian problem into a Markovian one.
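The conversion described above can be illustrated with a standard Gillespie simulation in which the mean propensities play the role of ordinary reaction rates. The sketch below is only an illustration under stated assumptions: the function name `simulate_markov_surrogate`, the toy propensities (constant birth, linear death) and the run length are hypothetical choices, not taken from this work. Such a surrogate is expected to reproduce only the stationary statistics of the original system, not its transient dynamics.

```python
import random

def simulate_markov_surrogate(K, S, n0, t_end, seed=0):
    """Gillespie simulation of a one-species Markovian surrogate network.

    K    : list of callables; K[i](n) stands in for the mean propensity K_i(n)
           (hypothetical toy functions in the example below)
    S    : list of integer jumps; S[i] is the change in n caused by reaction i
    n0   : initial molecule number
    t_end: total simulated time
    Returns the time-averaged molecule number (a stationary-mean estimate).
    """
    rng = random.Random(seed)
    t, n, weighted_sum = 0.0, n0, 0.0
    while True:
        rates = [k(n) for k in K]
        total = sum(rates)
        if total <= 0.0:                      # absorbing state: nothing can fire
            weighted_sum += n * (t_end - t)
            break
        wait = rng.expovariate(total)         # exponential time to the next event
        if t + wait >= t_end:
            weighted_sum += n * (t_end - t)   # last, truncated sojourn
            break
        weighted_sum += n * wait              # time-weighted occupancy
        t += wait
        # choose which reaction fires, with probability rates[i] / total
        u, acc = rng.random() * total, 0.0
        for rate, jump in zip(rates, S):
            acc += rate
            if u < acc:
                n += jump
                break
        else:                                 # guard against round-off in acc
            n += S[-1]
    return weighted_sum / t_end

# Toy birth-death surrogate: constant birth propensity 5, death propensity n.
mean_n = simulate_markov_surrogate(
    [lambda n: 5.0, lambda n: float(n)], [+1, -1], n0=0, t_end=2000.0)
print(mean_n)
```

For this toy choice the stationary law is Poisson with mean 5, so the printed time average should lie close to 5, up to sampling error.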
In particular, if the waiting times for some reaction are exponentially distributed, e.g., if $\psi_i(t; n) = \lambda_i(n)\, e^{-\lambda_i(n) t}\, I_{(0,\infty)}(t)$ for some $i$, where $I_A(x)$ is the indicator function of set $A$, i.e., $I_A(x) = 1$ if $x \in A$ and $I_A(x) = 0$ otherwise, and $\lambda_i(n)$ represents the transition probability per unit time of this reaction, implying $\Psi_i(t; n) = \left(1 - e^{-\lambda_i(n) t}\right) I_{(0,\infty)}(t)$, then we find (see the Supplemental Online Material [48] for derivation)

$$K_i(n) = \lambda_i(n). \tag{4}$$
In other words, if the waiting times for some reaction follow an exponential distribution, then the corresponding Ki (n) is equal to the reaction propensity function of this reaction. If all the reaction waiting times are exponentially distributed, then the corresponding sgCME is reduced to the common stationary CME (sCME).
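The reduction to the ordinary propensity can be sketched in two lines. The calculation assumes that $K_i(n)$ is the probability that reaction $i$ fires first divided by the mean waiting time to the next reaction, in line with the verbal description of $K_i(n)$ given in this section; the shorthand $\bar\Psi_j = 1 - \Psi_j$ is introduced here only for brevity.

```latex
\begin{aligned}
K_i(n) &= \frac{\int_0^\infty \psi_i(t;n)\prod_{j\neq i}\bar\Psi_j(t;n)\,dt}
               {\int_0^\infty \prod_{j=1}^{L}\bar\Psi_j(t;n)\,dt},
\qquad \bar\Psi_j = 1-\Psi_j,\\[4pt]
\psi_i(t;n) &= \lambda_i(n)\,e^{-\lambda_i(n)t} = \lambda_i(n)\,\bar\Psi_i(t;n)
\;\Longrightarrow\;
K_i(n) = \frac{\lambda_i(n)\int_0^\infty \prod_{j=1}^{L}\bar\Psi_j(t;n)\,dt}
              {\int_0^\infty \prod_{j=1}^{L}\bar\Psi_j(t;n)\,dt}
       = \lambda_i(n).
\end{aligned}
```

Under this assumption only reaction $i$ needs to be exponential for the cancellation to occur; the waiting times of the other reactions may be arbitrary.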
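To help the reader further understand the physical meaning of the function $K_i(n)$, we state some facts. First, $\psi_i(t;n)\prod_{j\neq i}\left[1-\Psi_j(t;n)\right]dt$ represents the probability that the $i$th reaction happens within the infinitesimal waiting-time interval $[t, t + dt]$, so the integral $\int_0^\infty \psi_i(t;n)\prod_{j\neq i}\left[1-\Psi_j(t;n)\right]dt$ represents the cumulative probability that the $i$th reaction takes place over the full time range. Then, the equality $\int_0^\infty t\sum_{i=1}^{L}\psi_i(t;n)\prod_{j\neq i}\left[1-\Psi_j(t;n)\right]dt = \int_0^\infty\prod_{j=1}^{L}\left[1-\Psi_j(t;n)\right]dt$ always holds, so the integral $\int_0^\infty\prod_{j=1}^{L}\left[1-\Psi_j(t;n)\right]dt$ represents the mean waiting time until the next reaction, whichever it is, happens. Thus, the function $K_i(n)$ represents the mean transition probability of the $i$th reaction within the mean waiting time.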
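The interpretation above suggests a direct numerical recipe: integrate the first-firing probability of each reaction and divide by the mean waiting time to the next event. The following is a minimal sketch under that reading; the helper names `simpson` and `mean_propensities`, the truncation time `t_max`, and the example rates are assumptions made for illustration.

```python
import math

def simpson(f, a, b, steps):
    """Composite Simpson rule on [a, b]; `steps` must be even."""
    h = (b - a) / steps
    total = f(a) + f(b)
    for m in range(1, steps):
        total += (4 if m % 2 else 2) * f(a + m * h)
    return total * h / 3.0

def mean_propensities(pdfs, cdfs, t_max, steps=20000):
    """Approximate K_i for one fixed state n.

    pdfs[i](t), cdfs[i](t): waiting-time density and distribution function of
    reaction i at that state. The infinite integrals are truncated at t_max,
    which must be large enough for the survival product to be negligible.
    """
    def survival_product(t, skip=None):
        prod = 1.0
        for j, cdf in enumerate(cdfs):
            if j != skip:
                prod *= 1.0 - cdf(t)
        return prod

    # mean waiting time until the next reaction of any kind
    mean_wait = simpson(survival_product, 0.0, t_max, steps)
    K = []
    for i, pdf in enumerate(pdfs):
        # probability that reaction i is the one that fires first
        p_first = simpson(
            lambda t, i=i, pdf=pdf: pdf(t) * survival_product(t, skip=i),
            0.0, t_max, steps)
        K.append(p_first / mean_wait)
    return K

# Example: an Erlang-2 waiting time (rate a) competing with an exponential one (rate d).
a, d = 3.0, 2.0
pdfs = [lambda t: a * a * t * math.exp(-a * t), lambda t: d * math.exp(-d * t)]
cdfs = [lambda t: 1.0 - (1.0 + a * t) * math.exp(-a * t),
        lambda t: 1.0 - math.exp(-d * t)]
K = mean_propensities(pdfs, cdfs, t_max=20.0)
print(K)
```

In this example the exponential reaction should recover its own rate `d`, while the Erlang reaction receives a state-dependent effective rate that differs from the reciprocal of its mean waiting time.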
As an application of Eq. (2) with Eq. (3), we consider an interesting case where the waiting time for every reaction in the network is assumed to follow a power-law distribution of Pareto type, that is, $\psi_i(t;n) = \alpha_i(n)\,(\tau_i(n))^{\alpha_i(n)}\, t^{-(\alpha_i(n)+1)}\, I_{(0,\infty)}(t)$ [49]. This distribution is characterized by a positive scale parameter $\tau_i(n)$ and a positive shape parameter $\alpha_i(n)$. Note that if $\alpha_i(n) \in (0,1]$, such a waiting time is a parsimonious model for infinite-mean random variables due to the generalized central limit theorem [49]. In this case, the corresponding system exhibits weak ergodicity breaking [50,51], which is a common characteristic of anomalous transport in heterogeneous environments and can lead to a fractional-in-time differential equation [50]. In spite of this, we can show (see the Supplemental Online Material [48] for detail) that the corresponding $K_i(n)$ is finite, independent of the convergence of the raw moments of the power-law distribution.
Time delays, which account for the non-Markovian nature of many random processes, play a key role in many problems involving biochemical reactions or mass transport [19-22]. If a distributed delay is introduced into the above reaction network, then we can show (see the Supplemental Online Material [48] for derivation) a relation, Eq. (6), that involves the mean delay time of the $i$th reaction (the mean of its delay PDF), which is assumed to be finite. If we denote by $\tau_r$ the waiting time for the next reaction (i.e., the inter-reaction waiting time without delay), then its mean $\langle\tau_r\rangle$ represents the mean inter-reaction waiting time. In particular, if all the mean delay times are equal for $1 \le i \le L$ (their common value is called the mean global delay time), then we obtain Eq. (7) (see the Supplemental Online Material [48] for derivation), which indicates that the mean reaction propensity function of a reaction depends only on the mean global delay time, independent of the delay probability distribution. We point out that Eq. (7) is the generalization of a previous result in ref. [38], wherein an approximation was made.
Although we have derived a compact mathematical form of the sgCME for a general non-Markovian biochemical network (i.e., Eq. (2) with Eq. (3)), characterizing fluctuations in the individual reactive species is still difficult in some cases. On the other hand, it is well known that for a stochastic system, statistical quantities of the state variables, such as the first-order raw moments (or means) and the second-order central moments, are often of most interest since they characterize the fluctuations in the state variables in a simple way. Here, we present a stationary generalized linear noise approximation (sgLNA) for the above non-Markovian system. Recall that for a given Markovian reaction system, the equations governing the dynamics of the first-order raw moments (or means) and the second-order central moments of the state variables have been derived [1,2]. In order to calculate these statistical quantities in the above non-Markovian reaction system, we construct a Markovian biochemical process using $K_i(n)$. More precisely, we construct a Markovian reaction network that has the same structure as the original non-Markovian reaction network but takes $K_i(n)$ as the reaction propensity function of the $i$th reaction, where $i = 1, 2, \ldots, L$. For such a constructed Markovian system, we can easily derive its rate equations; at steady state (see the Supplemental Online Material [48]), they take the form [1,2]

$$S\,K(x) = 0, \tag{8}$$

which is called a stationary generalized reaction rate equation, where $x = (x_1, \ldots, x_N)^T$ with $x_j$ representing the concentration of reactive species $X_j$, $S = (s_{ji})$ is the $N \times L$ stoichiometric matrix, and $K(x) = (K_1(x), \ldots, K_L(x))^T$ is an $L$-dimensional vector. If the solution of Eq. (8) is denoted by $x_S$, then $x_S$ is the vector of the stationary mean concentrations of the state variables in the original non-Markovian system.
In order to derive analytical formulae for calculating the second-order central moments of the state variables in the original non-Markovian reaction system, we adopt the $\Omega$-expansion method [1,2]. Let $\Omega$ represent the volume of the system and write $n = \Omega x + \Omega^{1/2} z$. For convenience, we introduce two matrices $A_S$ and $D_S$, the entries of which are given by $(A_S)_{jk} = \sum_{i=1}^{L} s_{ji}\,\partial K_i(x)/\partial x_k\big|_{x = x_S}$ and $(D_S)_{jk} = \sum_{i=1}^{L} s_{ji}\, s_{ki}\, K_i(x_S)$, respectively. Then, the covariance matrix $\Sigma_S = \langle (x - x_S)(x - x_S)^T \rangle$ satisfies the following Lyapunov matrix equation (see the Supplemental Online Material [48] for derivation)

$$A_S \Sigma_S + \Sigma_S A_S^T + \Omega^{-1} D_S = 0. \tag{9}$$
From this algebraic equation, we can easily obtain the stationary second-order central moments of the state variables in the original non-Markovian system.
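A small numerical sketch of this step is given below. It assumes the standard linear-noise form of the two matrices, with Jacobian-type entries $\sum_i s_{ji}\,\partial K_i/\partial x_k$ for $A_S$ and $\sum_i s_{ji} s_{ki} K_i(x_S)$ for $D_S$, together with a Lyapunov equation $A_S\Sigma_S + \Sigma_S A_S^T + \Omega^{-1} D_S = 0$. The helper names, the finite-difference Jacobian and the example rates are illustrative assumptions.

```python
def jacobian(K, x, h=1e-6):
    """Central-difference Jacobian J[i][k] = dK_i/dx_k evaluated at x."""
    L, N = len(K(x)), len(x)
    J = [[0.0] * N for _ in range(L)]
    for k in range(N):
        xp, xm = list(x), list(x)
        xp[k] += h
        xm[k] -= h
        Kp, Km = K(xp), K(xm)
        for i in range(L):
            J[i][k] = (Kp[i] - Km[i]) / (2.0 * h)
    return J

def solve_linear(M, b):
    """Solve M y = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    aug = [list(M[r]) + [b[r]] for r in range(n)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(aug[r][c]))
        aug[c], aug[p] = aug[p], aug[c]
        for r in range(c + 1, n):
            f = aug[r][c] / aug[c][c]
            for q in range(c, n + 1):
                aug[r][q] -= f * aug[c][q]
    y = [0.0] * n
    for r in range(n - 1, -1, -1):
        y[r] = (aug[r][n] - sum(aug[r][q] * y[q] for q in range(r + 1, n))) / aug[r][r]
    return y

def lna_covariance(S, K, x_s, omega=1.0):
    """Solve A*Sigma + Sigma*A^T + D/omega = 0 for the stationary covariance.

    S   : N x L stoichiometric matrix (list of rows)
    K   : callable; K(x) returns the list of L mean propensities
    x_s : stationary concentrations, i.e. a solution of S K(x) = 0
    """
    N, L = len(S), len(S[0])
    Kx, J = K(x_s), jacobian(K, x_s)
    A = [[sum(S[j][i] * J[i][k] for i in range(L)) for k in range(N)] for j in range(N)]
    D = [[sum(S[j][i] * S[k][i] * Kx[i] for i in range(L)) for k in range(N)] for j in range(N)]
    # The unknown Sigma is flattened row-major: entry (j, k) -> index j*N + k.
    M = [[0.0] * (N * N) for _ in range(N * N)]
    rhs = [0.0] * (N * N)
    for j in range(N):
        for k in range(N):
            row = j * N + k
            for m in range(N):
                M[row][m * N + k] += A[j][m]   # contribution of (A Sigma)_{jk}
                M[row][j * N + m] += A[k][m]   # contribution of (Sigma A^T)_{jk}
            rhs[row] = -D[j][k] / omega
    flat = solve_linear(M, rhs)
    return [flat[j * N:(j + 1) * N] for j in range(N)]

# Exponential birth-death check: K1 = lam1, K2 = lam2 * x, so x_s = lam1 / lam2.
lam1, lam2 = 20.0, 1.0
Sigma = lna_covariance([[1, -1]], lambda x: [lam1, lam2 * x[0]], [lam1 / lam2])
print(Sigma)
```

For the exponential birth-death check the variance should coincide with the mean, as expected for a Poisson stationary law. For larger networks a dedicated Lyapunov solver from a numerical library would be preferable to the dense elimination used here.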
Implementations
Here we apply the above general theory to two representative examples. First, birth-and-death processes, with some straightforward additions such as innovation, are a simple, natural and formal framework for modeling a vast variety of biological processes such as population dynamics, speciation, and genome evolution [1,2,52]. Therefore, the first example we analyze is a generalized birth-death process, which constitutes a fundamental model of non-Markovian evolutionary dynamics. This process can be described by two reactions:

$$\varnothing \xrightarrow{\;\psi_1(t;n)\;} X, \qquad X \xrightarrow{\;\psi_2(t;n)\;} \varnothing,$$

where $\psi_1(t; n)$ and $\psi_2(t; n)$ are the waiting time distributions of birth and death, respectively, and $n$ represents the molecule number of species $X$. This model is general and includes almost all previously studied models of birth-death processes as special cases [1,2,52]. According to the above sgCME, the steady-state equation corresponding to this process is given by [48]

$$\left(E^{-1} - 1\right)\left[K_1(n)P(n)\right] + \left(E - 1\right)\left[K_2(n)P(n)\right] = 0, \tag{10}$$

where $E^{\pm 1} f(n) = f(n \pm 1)$ and $K_i(n)$ ($i = 1, 2$) can be obtained through Eq. (3). From Eq. (10), we can obtain the iterative relation $K_2(n)P(n) = K_1(n-1)P(n-1)$, from which we can further obtain the explicit expression of the stationary distribution

$$P(n) = P(0)\prod_{m=1}^{n}\frac{K_1(m-1)}{K_2(m)}, \qquad n = 1, 2, \ldots, \tag{11}$$

with $P(0)$ fixed by normalization.
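The iterative relation $K_2(n)P(n) = K_1(n-1)P(n-1)$ translates directly into a short numerical routine. The sketch below assumes that the state space can be truncated at some `n_max` beyond which the probability mass is negligible; the function name and the toy propensities are hypothetical.

```python
import math

def stationary_distribution(K1, K2, n_max):
    """Stationary P(n), n = 0..n_max, from K2(n) P(n) = K1(n-1) P(n-1).

    K1, K2: callables returning the birth and death mean propensities.
    n_max : truncation level; the tail beyond n_max is assumed negligible.
    """
    weights = [1.0]                       # unnormalized, with P(0) set to 1
    for n in range(1, n_max + 1):
        weights.append(weights[-1] * K1(n - 1) / K2(n))
    total = sum(weights)
    return [w / total for w in weights]

# Toy check with exponential waiting times: K1 = 5, K2 = n gives a Poisson law.
P = stationary_distribution(lambda n: 5.0, lambda n: float(n), n_max=100)
mean = sum(n * p for n, p in enumerate(P))
print(P[0], math.exp(-5.0), mean)
```

Any pair of mean propensities can be passed in place of the toy functions, provided $K_2(n) > 0$ for $n \ge 1$ and the resulting weights are summable.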
In particular, if $\psi_1(t; n) = \psi_1(t)\, I_{(0,\infty)}(t)$ and $\psi_2(t; n) = \lambda_2 n\, e^{-\lambda_2 n t}\, I_{(0,\infty)}(t)$, then $P(n)$ can be expressed explicitly through $\tilde{\psi}_1(\cdot)$, the Laplace transform of the function $\psi_1(t)$, and the reciprocal of the mean birth waiting time. Furthermore, if $\psi_1(t) = \frac{\lambda_1^{k_1}}{\Gamma(k_1)}\, t^{k_1 - 1} e^{-\lambda_1 t}$, in which $\Gamma(\cdot)$ is the common gamma function, then for $k_1 = 2$ we can obtain the stationary distribution $P(n) = \left({}_0F_1(; 2\lambda; \lambda^2)\right)^{-1} \lambda^{2n} / \left((2\lambda)_n\, n!\right)$, where we denote $\lambda = \lambda_1/\lambda_2$, $(2\lambda)_n$ is the Pochhammer symbol, and ${}_0F_1$ is a generalized hypergeometric function [53]. In addition, for general $k_1$ we can obtain the analytical mean and variance given by $\langle n \rangle = (2^{1/k_1} - 1)\lambda$ and $\Sigma = 2^{1/k_1 - 1}\lambda / k_1$, respectively. The Fano factor, which is defined as the ratio of the variance to the mean, is given by $F = 2^{1/k_1 - 1} / \left[(2^{1/k_1} - 1)\, k_1\right]$. In the Supplemental Online Material [48], we also analyzed the case in which waiting times for the birth process follow an exponential distribution and those for the death process follow a general distribution. In addition, we presented a numerical method for the case in which the waiting time distributions for birth and death processes are both general.
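For orientation, the special case of a state-independent birth waiting time and exponential death can be worked through as follows. This is a sketch that assumes $K_i(n)$ is the ratio of the first-firing probability of reaction $i$ to the mean waiting time for the next reaction, as described in the General Theory section; the symbol $\bar\lambda_1$ for the reciprocal of the mean birth waiting time is notation introduced here.

```latex
\begin{aligned}
\int_0^\infty \psi_1(t)\,e^{-n\lambda_2 t}\,dt &= \tilde\psi_1(n\lambda_2), \qquad
\int_0^\infty \bigl[1-\Psi_1(t)\bigr]e^{-n\lambda_2 t}\,dt
   = \frac{1-\tilde\psi_1(n\lambda_2)}{n\lambda_2},\\[4pt]
K_1(n) &= \frac{n\lambda_2\,\tilde\psi_1(n\lambda_2)}{1-\tilde\psi_1(n\lambda_2)} \quad (n\ge 1), \qquad
K_1(0) = \bar\lambda_1 \equiv \Bigl(\int_0^\infty t\,\psi_1(t)\,dt\Bigr)^{-1}, \qquad
K_2(n) = n\lambda_2,\\[4pt]
P(n) &= P(0)\prod_{m=1}^{n}\frac{K_1(m-1)}{K_2(m)}
      = P(0)\,\frac{\bar\lambda_1}{n\lambda_2}
        \prod_{m=1}^{n-1}\frac{\tilde\psi_1(m\lambda_2)}{1-\tilde\psi_1(m\lambda_2)} \quad (n\ge 1).
\end{aligned}
```

As a consistency check, for a gamma density with shape 2 and rate $\lambda_1$ one has $\tilde\psi_1(s) = \left(\lambda_1/(\lambda_1+s)\right)^2$, hence $K_1(n) = \lambda_1^2/(2\lambda_1 + n\lambda_2)$, and the product collapses to $\lambda^{2n}/\left((2\lambda)_n\, n!\right)$ with $\lambda = \lambda_1/\lambda_2$.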
Numerical results are shown in Fig. 1. From this figure, we observe that the results obtained by the sgCME and by the sgLNA are in good agreement. Similarly, the results obtained by Gillespie stochastic simulation (lines in Fig. 1(a)) are in good accord with those obtained by theoretical prediction (empty circles in Fig. 1(a)). In addition, we observe from Fig. 1(b) and 1(c) that the larger $k_1$ is, the smaller the mean and the Fano factor are, implying that non-Markovity can reduce the mean and noise of the outcome. Note that $k_1 = 1$ corresponds to the case of exponential waiting times.
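The dependence on $k_1$ can be tabulated directly from the closed-form mean and Fano factor quoted for the generalized birth-death process. The snippet below is a small sketch; the function names and the value of $\lambda = \lambda_1/\lambda_2$ are chosen here purely for illustration.

```python
def sglna_mean(k1, lam):
    """Stationary mean <n> = (2**(1/k1) - 1) * lam, with lam = lambda1 / lambda2."""
    return (2.0 ** (1.0 / k1) - 1.0) * lam

def sglna_fano(k1):
    """Fano factor F = 2**(1/k1 - 1) / ((2**(1/k1) - 1) * k1)."""
    return 2.0 ** (1.0 / k1 - 1.0) / ((2.0 ** (1.0 / k1) - 1.0) * k1)

lam = 20.0   # hypothetical value of lambda1 / lambda2, chosen for illustration
for k1 in (1, 2, 4, 8):
    print(k1, sglna_mean(k1, lam), sglna_fano(k1))
```

Both quantities decrease as $k_1$ grows, and $k_1 = 1$ gives a Fano factor of one, the Poisson value expected for exponential waiting times.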
As a simplification, the dynamics of gene expression is often described by coupled birth-death processes, where birth corresponds to protein synthesis while death occurs via degradation. Therefore, we next consider a model of gene self-regulation, which is described by

$$\varnothing \xrightarrow{\;\psi_1(t;n)\;} X, \qquad X \xrightarrow{\;\psi_2(t;n)\;} \varnothing,$$

where $X$ denotes the protein and $n$ represents the number of protein molecules [11,54,55]. We point out that this model belongs to the case discussed above in the absence of feedback, but it can be a more general birth-death process in other cases. For analytical consideration, we assume that the waiting time distributions for protein birth and death take the forms $\psi_1(t; n) = \frac{(\lambda_1(n))^{k_1}}{\Gamma(k_1)}\, t^{k_1 - 1} e^{-\lambda_1(n) t}\, I_{(0,\infty)}(t)$, where $\lambda_1(n)$ is a function of $n$, and $\psi_2(t; n) = n\lambda_2\, e^{-n\lambda_2 t}\, I_{(0,\infty)}(t)$, respectively. Furthermore, we consider the case in which the function $\lambda_1(n)$ is linear in $n$, e.g., $\lambda_1(n) = (n + n_0)\lambda_1$ (corresponding to linear feedback), where $\lambda_1$ is a constant and $n_0$ is a positive integer. By simple calculations, we can show

$$K_1(n) = \frac{n\lambda_2\,\bigl((n + n_0)\lambda_1\bigr)^{k_1}}{\bigl(n\lambda_2 + (n + n_0)\lambda_1\bigr)^{k_1} - \bigl((n + n_0)\lambda_1\bigr)^{k_1}}, \qquad K_2(n) = n\lambda_2.$$

In particular, if $k_1 = 1$ and $n_0 = 1$ (with $\lambda_1 < \lambda_2$), then $P(n) = (1 - \rho)^n \rho$ is a geometric distribution, where $\rho = 1 - \lambda_1/\lambda_2$ and $n = 0, 1, 2, \ldots$. For the case of nonlinear feedback, a similar analysis can be carried out, but the results are more complex.
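The geometric special case offers a convenient numerical check of the feedback model. The sketch below evaluates the birth propensity as $K_1(n) = n\lambda_2\,((n+n_0)\lambda_1)^{k_1} / [(n\lambda_2 + (n+n_0)\lambda_1)^{k_1} - ((n+n_0)\lambda_1)^{k_1}]$, which is how the expression in the text is read here, and uses $K_2(n) = n\lambda_2$. Two points are assumptions of this sketch rather than statements from the text: the value at $n = 0$ is taken as the limit $n_0\lambda_1/k_1$ of that expression, and the rates and truncation level are arbitrary.

```python
def K1_feedback(n, lam1, lam2, k1, n0):
    """Mean birth propensity for gamma(k1) birth times with rate (n + n0)*lam1
    competing with exponential death at total rate n*lam2."""
    a = (n + n0) * lam1          # birth rate parameter lambda_1(n)
    d = n * lam2                 # total death rate
    if n == 0:
        return a / k1            # assumed d -> 0 limit: reciprocal of the mean birth time
    return d * a ** k1 / ((a + d) ** k1 - a ** k1)

def stationary(lam1, lam2, k1, n0, n_max):
    """Truncated stationary distribution from K2(n) P(n) = K1(n-1) P(n-1)."""
    w = [1.0]
    for n in range(1, n_max + 1):
        w.append(w[-1] * K1_feedback(n - 1, lam1, lam2, k1, n0) / (n * lam2))
    total = sum(w)
    return [x / total for x in w]

lam1, lam2 = 0.5, 1.0            # hypothetical rates with lam1 < lam2
P = stationary(lam1, lam2, k1=1, n0=1, n_max=200)
rho = 1.0 - lam1 / lam2
print(P[:4], [(1 - rho) ** n * rho for n in range(4)])
```

With $k_1 = 1$ and $n_0 = 1$ the two printed lists should agree, reproducing the geometric law; other values of $k_1$ can be explored with the same routine as long as the weights remain summable.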
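In contrast to the gene model analyzed above, which corresponds to constitutive expression, there is another mode of expression, i.e., bursty gene expression. Therefore, we next consider the second example, which is an on-off model of bursty gene expression with non-exponential waiting times (see Fig. 2(a)). The biochemical process can be described by four reactions:

$$\text{off} \xrightarrow{\;\psi_{\mathrm{on}}(t;n)\;} \text{on}, \qquad \text{on} \xrightarrow{\;\psi_{\mathrm{off}}(t;n)\;} \text{off}, \qquad \text{on} \xrightarrow{\;\psi_{g}(t;n)\;} \text{on} + B\,\text{mRNA}, \qquad \text{mRNA} \xrightarrow{\;\psi_{\mathrm{deg}}(t;n)\;} \varnothing,$$

where the functions $\psi_{\mathrm{on}}(t; n)$, $\psi_{\mathrm{off}}(t; n)$, $\psi_g(t; n)$ and $\psi_{\mathrm{deg}}(t; n)$ are all waiting time distributions, the state vector is $n = (n_1, n_2, m)^T$ with $n_1$, $n_2$ and $m$ representing the numbers of DNA molecules in the off and on states and the number of mRNA molecules, respectively, and the burst size $B$ follows a geometric distribution $P_b(B = k) = b^k/(1 + b)^{k+1}$ ($k = 0, 1, \ldots$) with $b$ being the average burst size. For analytical consideration, we assume that the on and off waiting times follow Erlang distributions, $\psi_{\mathrm{on}}(t; n) = \frac{(\lambda_{\mathrm{on}} n_1)^{k_{\mathrm{on}}}}{\Gamma(k_{\mathrm{on}})}\, t^{k_{\mathrm{on}} - 1} e^{-\lambda_{\mathrm{on}} n_1 t}$ and $\psi_{\mathrm{off}}(t; n) = \frac{(\lambda_{\mathrm{off}} n_2)^{k_{\mathrm{off}}}}{\Gamma(k_{\mathrm{off}})}\, t^{k_{\mathrm{off}} - 1} e^{-\lambda_{\mathrm{off}} n_2 t}$, respectively, and that the transcription and degradation waiting times follow exponential distributions, $\psi_g(t; n) = \mu n_2\, e^{-\mu n_2 t}$ and $\psi_{\mathrm{deg}}(t; n) = \lambda_{\mathrm{deg}} m\, e^{-\lambda_{\mathrm{deg}} m t}$, respectively. Then, we can derive the analytical expressions of $K_i(n)$ ($i = 1, 2, 3, 4$) (see the Supplemental Online Material [48] for detail). Numerical results are shown in Fig. 2(b-d).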
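The geometric burst-size law $P_b(B = k) = b^k/(1+b)^{k+1}$ used in this model can be checked numerically to be normalized and to have mean $b$. The snippet below is a minimal sketch; the function name, the value of `b` and the truncation at 400 terms are illustrative choices.

```python
def burst_pmf(k, b):
    """Geometric burst-size law P(B = k) = b**k / (1 + b)**(k + 1), k = 0, 1, ..."""
    # written as (b/(1+b))**k / (1+b) to avoid large intermediate powers
    return (b / (1.0 + b)) ** k / (1.0 + b)

b = 3.0                                   # hypothetical mean burst size
ks = range(0, 400)                        # truncation; the tail is negligible here
total = sum(burst_pmf(k, b) for k in ks)
mean = sum(k * burst_pmf(k, b) for k in ks)
print(total, mean)
```

The first printed value should be close to one and the second close to `b`, consistent with $b$ being the average burst size.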
From Fig. 2(c), we observe that if $L_{\mathrm{on}} = 1$ is fixed, the mRNA mean is monotonically increasing in $L_{\mathrm{off}}$, but if $L_{\mathrm{off}} = 1$ is fixed, it is monotonically decreasing in $L_{\mathrm{on}}$. In both cases, the mRNA mean remains almost unchanged beyond a certain value of $L_{\mathrm{on}}$ or $L_{\mathrm{off}}$. However, the trend for the ratio of the mRNA variance to the square of the mRNA mean is opposite to that of the mRNA mean (compare Fig. 2(d) with Fig. 2(c)). Figure 2(b) shows stationary mRNA distributions in three special cases, which further confirm the trends in Fig. 2(c) and 2(d). These analyses indicate that non-Markovity plays a non-negligible role in affecting gene expression.
Conclusions
We have derived an exact sgCME and an sgLNA from the gCME for a general reaction network with arbitrary (exponential or non-exponential) waiting time distributions and/or with distributed delays. These equations retain analytical and/or numerical tractability, are general in scope, and are thus potentially applicable to a wide variety of problems that transcend pure physics applications. The derived sgCME is particularly useful in deriving stationary distributions in some thorny non-Markovian biochemical systems, as demonstrated in this article. The power of the sgCME can be further demonstrated by analyzing other examples such as non-Markovian random walks and diffusion on networks [56-63], and non-Markovian open quantum systems [64]. We expect that our analytical frameworks will be of use for studies of a variety of phenomena in the biological and physical sciences, and indeed in other areas where individual-based models with general waiting time distributions and/or delayed interactions are relevant.
This work was supported by grants 91530320, 11775314, 11475273, and 11631005 from the Natural Science Foundation of P. R. China; 2014CB964703 from the Science and Technology Department, P. R. China; and 201707010117 from the Science and Technology Program of Guangzhou, P. R. China.
References
- [1].
- [2].
- [3].
- [4].
- [5].
- [6].
- [7].
- [8].
- [9].
- [10].
- [11].
- [12].
- [13].
- [14].
- [15].
- [16].
- [17].
- [18].
- [19].
- [20].
- [21].
- [22].
- [23].
- [24].
- [25].
- [26].
- [27].
- [28].
- [29].
- [30].
- [31].
- [32].
- [33].
- [34].
- [35].
- [36].
- [37].
- [38].
- [39].
- [40].
- [41].
- [42].
- [43].
- [44].
- [45].
- [46].
- [47].
- [48].
- [49].
- [50].
- [51].
- [52].
- [53].
- [54].
- [55].
- [56].
- [57].
- [58].
- [59].
- [60].
- [61].
- [62].
- [63].
- [64].