Bayesian estimation for stochastic gene expression using multifidelity models

Huy D. Vo; Zachary Fox; Ania Baetica; Brian Munsky

doi:10.1101/468090

Abstract

The finite state projection (FSP) approach to solving the chemical master equation (CME) has enabled successful inference of discrete stochastic models to predict single-cell gene regulation dynamics. Unfortunately, the FSP approach is highly computationally intensive for all but the simplest models, an issue that is highly problematic when parameter inference and uncertainty quantification take enormous numbers of parameter evaluations. To address this issue, we propose two new computational methods for the Bayesian inference of stochastic gene expression parameters given single-cell experiments. First, we present an adaptive scheme to improve parameter proposals for Metropolis-Hastings sampling using full FSP-based likelihood evaluations. We then formulate and verify an Adaptive Delayed Acceptance Metropolis-Hastings (ADAMH) algorithm for use with reduced Krylov-basis projections of the FSP. We test and compare both algorithms on three example models and simulated data to show that the ADAMH scheme achieves substantial speedup in comparison to the full FSP approach. By reducing the computational costs of parameter estimation, we expect the ADAMH approach to enable efficient data-driven estimation for more complex gene regulation models.

Introduction

An important goal of quantitative biology is to elucidate and predict the mechanisms of gene expression. Evidence increasingly suggests that gene expression processes are inherently stochastic with substantial cell-to-cell variability.^1–3 In an isogenic population with the same environmental factors, much of this variability can be attributed to intrinsic chemical noise, which is captured well by the chemical master equation (CME).⁴ Predictive models for gene expression dynamics can be identified by fitting the solution of the CME to the empirical histogram of single-cell data at several experimental conditions or time-points.^5–8

The finite state projection (FSP),⁹ which approximates the dynamics of the CME with a finite system of linear ODEs, provides a framework to analyze full distributions of stochastic gene expression models with computable error bounds. It has been observed that the full distribution-based analyses using the FSP perform well, even when applied to realistically small experimental datasets on which summary statistics-based fits may fail.¹⁰ On the other hand, the FSP requires solving a large system of ODEs that grows quickly with the complexity of the gene expression network under consideration. Our present study borrows from model reduction strategies in other complex systems fields to alleviate this issue by reducing the computational cost of FSP-based parameter estimation.

There has been intensive research on efficient computational algorithms to quantify the uncertainty in complex models.¹¹ A particularly promising approach is to utilize multifidelity algorithms to systematically approximate the original system response. In these approximations, surrogate models or meta-models allow for various degrees of model fidelity (e.g., error compared to the exact model) in exchange for reductions in computational cost. Surrogate models generally fall into two categories: response surface and low-fidelity models.^12,13 We will focus on the second category that consists of reduced-order systems, which approximate the original high-dimensional dynamical system using either simplified physics or projections onto reduced order subspaces.^11,14,15 Reduced-order modeling has already begun to appear in the context of stochastic gene expression. When all model parameters are known, the CME can be reduced by system-theoretic methods,^16,17 sparse-grid/aggregation strategies,^18,19 tensor train representations^20–22 and hierarchical tensor formats.²³ Model reduction techniques have also been applied to parameter optimization by Waldherr and Haasdonk²⁴ who projected the CME onto a linear subspace spanned by a reduced basis, and Liao et al.²⁵ who approximated the CME with a Fokker-Planck equation that was projected onto the manifold of low-rank tensors.²⁶ While these previous works clearly show the promise of reduced-order modeling, there remains a vast reservoir of ideas from the broader computational science and engineering community that remain to be adapted to the quantitative analysis of stochastic gene expression.

In this paper, we introduce two efficient algorithms, which are based on the templates of the adaptive Metropolis algorithm²⁷ and the delayed acceptance Metropolis-Hastings (DAMH^28,29) algorithm, to sample the posterior distribution of gene expression parameters given single-cell data. The adaptive Metropolis approach automatically tunes parameter proposal distributions to more efficiently search spaces of unnormalized and correlated parameters. The DAMH provides a two-stage sampling approach that uses a cheap approximation to the posterior distribution at the first stage to quickly filter out unlikely parameters. Improvements to the DAMH allow algorithmic parameters to be updated adaptively and automatically by the DAMH chain.^30,31 The DAMH has been applied to the inference of stochastic chemical kinetics parameters from time-course data.³² Our algorithm is a modified version of DAMH that is specifically adapted to improve Bayesian inference from population snapshots of single-cell data, such as data arising from flow cytometry or fixed-cell microscopy experiments. We employ parametric reduced order models using Krylov-based projections,^33,34 which give an intuitive means to compute expensive FSP-based likelihood evaluations.^35,36 To improve the accuracy and the DAMH acceptance rate, we allow the reduced model to be refined during parameter space exploration. The resulting method, which we call the ADAMH-FSP-Krylov algorithm, is tested on three common gene expression models. We also provide a theoretical guarantee and numerical demonstrations that the proposed algorithms converge to equivalent target posterior distributions.

The organization of the paper is as follows. We review the background on the FSP analysis of single-cell data, and basic Markov chain Monte Carlo (MCMC) schemes in the Background section. In the Materials and Methods section, we introduce our method to generate reduced FSP models, as well as our way of monitoring and refining their accuracy. These reduced models give rise to an approximation to the true likelihood function, which is then employed to devise an Adaptive Delayed Acceptance Metropolis-Hastings with FSP-Krylov reduced models (ADAMH-FSP-Krylov). We make simple adjustments to the existing ADAMH variants in the literature to prove convergence, and we give the mathematical details in the supplementary materials. We provide empirical validation of our methods on three gene expression models, and we compare the efficiency and accuracy of the approaches in the Numerical Results section. Interestingly, we find empirically that the reduced model learned through the ADAMH run could fully substitute the original FSP model in a Metropolis-Hastings run without incurring a large difference in the sampling results. Finally, we conclude with a discussion of future work and the potential of computational science and engineering tools to analyze stochastic gene expression.

Background

Stochastic modeling of gene expression and the chemical master equation

Consider a well-mixed biochemical system with N ≥ 1 different chemical species that are interacting via M ≥ 1 chemical reactions. Assuming constant temperature and volume, the time-evolution of this system can be modeled by a continuous-time Markov process.⁴ The state space of the Markov process consists of integral vectors x ≡ (x₁,…,x_N)^T, where x_i is the population of the ith species. Each reaction channel, such as the transcription of an RNA species, is characterized by a stoichiometric vector ν_j (j = 1,…, M) that represents the change when the reaction occurs; if the system is in state x and reaction j occurs, then the system transitions to state x + ν_j. Given x(t) = x, the propensity α_j(x; θ)dt determines the probability that reaction j occurs in the next infinitesimal time interval [t, t + dt), where θ is the vector of model parameters.

Since the state space is discrete, we can index the states as x₁,…,x_n,… The time-evolution of the probability distribution of the Markov process is the solution of the linear system of differential equations known as the chemical master equation (CME):

$$\frac{d}{dt}p(t) = A(\theta)\,p(t), \qquad p(0) = p_0, \tag{1}$$

where the probability mass vector p = (p₁,p₂,…)^T is such that each component, p_i = P(t, x_i) = Prob{x(t) = x_i}, describes the probability of being at state x_i at time t, for i = 1,…, n,… The vector p₀ = p(0) is an initial probability distribution and A(θ) is the infinitesimal generator of the Markov process. Here, we have made explicit the dependence of A on the model parameter vector θ, which is often inferred from experimental data.

Finite State Projection

The state space of the CME could be infinite or extremely large. To alleviate this problem, the finite state projection (FSP⁹) was introduced to truncate the state space to a finite size. In the simplest FSP formulation, the state space is restricted to a hyper-rectangle

$$\{(x_1,\ldots,x_N) : 0 \le x_k \le n_k,\; k = 1,\ldots,N\}, \tag{2}$$

where the n_k are the maximum copy numbers of the chemical species.

The infinite-dimensional matrix A and vector p in eq. (1) are replaced by the corresponding submatrix and subvector. When the bounds n_k are chosen sufficiently large and the propensities satisfy some regularity conditions, the gap between the FSP and the original CME is negligible and computable.^9,37 Throughout this paper, we assume that the bounds n_k have been chosen appropriately and that the FSP serves as a high-fidelity model of the gene expression dynamics of interest. Our goal is to construct lower-fidelity models of the FSP using model order reduction and incorporate these reduced models in the uncertainty analysis for gene expression parameters.
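As an illustration of the truncation described above, the following minimal Python sketch builds the FSP matrix for a one-species birth-death process and reports the probability mass that leaks out of the truncated state space. The model, the function names, and the simple Taylor-based matrix exponential are assumptions made for illustration only; they are not the models or solvers used in this paper.

```python
import math
import numpy as np

def birth_death_fsp_matrix(k_prod, k_deg, n_max):
    """Truncated CME generator for 0 -> X (rate k_prod) and X -> 0 (rate k_deg * x)."""
    n = n_max + 1
    A = np.zeros((n, n))
    for x in range(n):
        # Diagonal holds the total outflow, including flow leaving the truncation.
        A[x, x] = -(k_prod + k_deg * x)
        if x + 1 < n:
            A[x + 1, x] = k_prod        # birth: x -> x + 1
        if x >= 1:
            A[x - 1, x] = k_deg * x     # death: x -> x - 1
    return A

def expm_taylor(A, n_terms=20):
    """Dense matrix exponential via scaling and squaring with a Taylor series."""
    norm = np.linalg.norm(A, 1)
    s = int(math.ceil(math.log2(norm / 0.5))) if norm > 0.5 else 0
    X = A / 2.0 ** s
    E = np.eye(A.shape[0])
    term = np.eye(A.shape[0])
    for j in range(1, n_terms + 1):
        term = term @ X / j
        E = E + term
    for _ in range(s):
        E = E @ E
    return E

def solve_fsp(k_prod, k_deg, n_max, t):
    """Return the FSP solution at time t and the FSP error 1 - sum(p)."""
    A = birth_death_fsp_matrix(k_prod, k_deg, n_max)
    p0 = np.zeros(n_max + 1)
    p0[0] = 1.0                          # start with zero molecules
    p = expm_taylor(t * A) @ p0
    return p, 1.0 - p.sum()
```

For this toy model started from zero molecules, the exact CME solution is a Poisson distribution with mean (k_prod/k_deg)(1 − e^(−k_deg t)), which gives a convenient check that the bound n_max is large enough.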

Bayesian inference from single-cell data

Data from smFISH experiments^5,8,38,39 consist of several snapshots of many independent cells taken at discrete times t₁,…,t_T. The snapshot at time t_i records gene expression in n_i cells, and the molecular populations observed in cell j at time t_i are collected in the data vector c_j,i, j = 1,…, n_i. Let p(t, x|θ) denote the entry of the FSP solution corresponding to state x at time t, with model parameters θ. The FSP-based approximation to the log-likelihood of the data set 𝓓 given parameter vector θ is given by

$$L(\mathcal{D}\mid\theta) = \sum_{i=1}^{T}\sum_{j=1}^{n_i} \log p(t_i, c_{j,i}\mid\theta). \tag{3}$$

It is clear that when the FSP solution converges to the true solution of the CME, the FSP-based log-likelihood converges to the true data log-likelihood. The posterior distribution of model parameters θ given the data set 𝓓 then takes the form

$$f_{\mathrm{posterior}}(\theta\mid\mathcal{D}) \propto \exp\big(L(\mathcal{D}\mid\theta)\big)\, f_0(\theta),$$

where L(𝓓|θ) is the log-likelihood in equation (3) and f₀ is the prior density that quantifies prior knowledge and beliefs about the parameters. When f₀ is a constant, the parameters that maximize the posterior density are equivalent to the maximum likelihood estimator. However, we also want to quantify our uncertainty regarding the accuracy of the parameter fit, and the MCMC framework provides a way to address this by sampling from the posterior distribution.
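The snapshot log-likelihood and the unnormalized log-posterior can be sketched in a few lines of Python. This is a minimal illustration for a single observed species; the function names and the data layout (one probability vector and one list of observed counts per time point) are assumptions, not part of the original implementation.

```python
import math

def fsp_log_likelihood(solutions, snapshots):
    """Sum of log-probabilities of the observed counts.

    solutions[i][x] : FSP probability of copy number x at the i-th time point
    snapshots[i]    : list of copy numbers observed in the cells at that time
    """
    total = 0.0
    for p_t, cells in zip(solutions, snapshots):
        for c in cells:
            prob = p_t[c] if 0 <= c < len(p_t) else 0.0
            if prob <= 0.0:
                return -math.inf         # observation outside the model's support
            total += math.log(prob)
    return total

def log_posterior(solutions, snapshots, log_prior_value):
    """Unnormalized log-posterior: log-likelihood plus log-prior."""
    return fsp_log_likelihood(solutions, snapshots) + log_prior_value
```

With a constant prior, maximizing `log_posterior` over the parameters is the same as maximizing the likelihood, as noted in the text.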

For convenience, we limit our current discussion to models and inference problems that have the following characteristics:

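The matrix A(θ) can be decomposed into $A(\theta) = \sum_{j=1}^{M} g_j(\theta) A_j$ (4), where g_j are continuous functions and A_j are independent of the parameters.
The support of the prior is contained in a bounded domain of the form $\Theta = [\theta_1^{\min}, \theta_1^{\max}] \times \cdots \times [\theta_d^{\min}, \theta_d^{\max}]$ (5), where d is the number of parameters.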

The first assumption means that the CME matrix depends “linearly” on the parameters, ensuring the efficient assembly of the parameter-dependent matrix. In particular, the factors A_j can be computed and stored in the offline phase before parameter exploration and only a few (sparse) matrix additions are required to compute A(θ) in the online phase. When the dependence on the parameters is nonlinear, more sophisticated methods such as the Discrete Empirical Interpolation method⁴⁰ could be applied, but we leave this development for future work in order to focus more on the parameter sampling aspect. Nevertheless, condition (4) covers an important class of models, including all models defined by mass-action kinetics. The second assumption means that the support of the posterior distribution is a bounded and well-behaved domain (in mathematical terms, a compact set). This allows us to derive convergence theorems more straightforwardly. In practice, condition (5) is not a severe restriction since it can be interpreted as the prior belief that physical parameters cannot assume infinite values.
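The offline/online split implied by the first assumption can be sketched as follows for a birth-death process with mass-action kinetics, where each coefficient function is simply the corresponding rate constant. The model and the function names are illustrative assumptions; dense matrices are used here for brevity where a practical implementation would use sparse storage.

```python
import numpy as np

def offline_factors(n_max):
    """Parameter-independent factors A_j for 0 -> X and X -> 0 on states 0..n_max."""
    n = n_max + 1
    A_birth = np.zeros((n, n))
    A_death = np.zeros((n, n))
    for x in range(n):
        A_birth[x, x] = -1.0
        if x + 1 < n:
            A_birth[x + 1, x] = 1.0
        A_death[x, x] = -float(x)
        if x >= 1:
            A_death[x - 1, x] = float(x)
    return [A_birth, A_death]

def assemble(theta, factors):
    """Online phase: A(theta) = sum_j g_j(theta) A_j with g_j(theta) = theta_j."""
    A = np.zeros_like(factors[0])
    for theta_j, A_j in zip(theta, factors):
        A += theta_j * A_j
    return A
```

The factors are computed once; every new parameter proposal then costs only a weighted sum of stored matrices.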

The Metropolis-Hastings and the adaptive Metropolis algorithms

The Metropolis-Hastings (MH) algorithm^41,42 is one of the most popular methods to sample from a multivariate probability distribution (Algorithm 1). The basic idea of the MH is to generate a Markov chain whose limiting distribution is the target distribution. To do so, the algorithm includes a probabilistic acceptance/rejection step. More precisely, let f denote the target probability density. Assume the chain is at state θ_i at step i. Let θ′ be a proposal from the pre-specified proposal density q(.|θ_i). The MH computes an acceptance probability of the form

$$\alpha(\theta_i, \theta') = \min\left\{1, \frac{f(\theta')\, q(\theta_i\mid\theta')}{f(\theta_i)\, q(\theta'\mid\theta_i)}\right\}$$

to decide whether to accept θ′ as the next state of the chain. If θ′ is rejected, the algorithm moves on to the next iteration with θ_i+1 := θ_i.

There could be many choices for the proposal density q (for example, see the survey of Roberts and Rosenthal⁴³). We will consider only the symmetric case where q is a Gaussian, that is,

$$q(\theta'\mid\theta) \propto \exp\left(-\tfrac{1}{2}(\theta'-\theta)^T \Sigma^{-1} (\theta'-\theta)\right),$$

where Σ is a positive definite matrix that determines the covariance of the proposal distribution. With this choice, the MH reduces to the original Metropolis Algorithm.⁴¹ For gene expression models, the MH has been combined with the FSP for parameter inference and model selection in several studies.^8,10

The appropriate choice of Σ is crucial for the performance of the Metropolis algorithm. Haario et al.²⁷ proposed an Adaptive Metropolis (AM) algorithm in which the proposal covariance Σ is updated at every step using the values visited by the chain. This is the version that we will implement for sampling the posterior distribution with the full FSP model. In particular, letting θ₁,…, θ_i be the samples accepted so far, the AM updates the proposal covariance using the formula

$$\Sigma_i = \begin{cases} \Sigma_0, & i \le n_0, \\ s_d\,\mathrm{Cov}(\theta_1,\ldots,\theta_i), & i > n_0. \end{cases}$$

Here, the function Cov returns the sample covariances. The constant s_d is assigned the value (2.4)²/d following Haario et al.²⁷ The matrix Σ₀ is an initial choice for the Gaussian proposal density, and n₀ is the number of initial steps without proposal adaptations. Using the adaptive Metropolis allows for more efficient search over un-normalized and correlated parameter spaces and eliminates the need for the user to manually tune the algorithmic parameters. In the numerical results that we will show, the adaptive Metropolis results in reasonable acceptance rates (19% – 23.4%). Although non-adaptive MH algorithms have been considered in the past,^8,10 to the best of our knowledge, this is the first adaptive MH algorithm to be proposed for Bayesian inference of gene expression models.
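A minimal Python sketch of an adaptive Metropolis sampler in the spirit of Haario et al. is given below. The function name, the default n0, and the small diagonal jitter added to keep the covariance positive definite are assumptions of this sketch. The sample covariance is recomputed from scratch at each step for clarity; a recursive update would be cheaper in practice.

```python
import numpy as np

def adaptive_metropolis(log_target, theta0, n_steps, sigma0, n0=200, seed=0):
    """Random-walk Metropolis with a Gaussian proposal whose covariance adapts to the chain."""
    rng = np.random.default_rng(seed)
    d = len(theta0)
    s_d = 2.4 ** 2 / d                    # scaling suggested by Haario et al.
    chain = np.empty((n_steps + 1, d))
    chain[0] = theta0
    logf = log_target(chain[0])
    n_accept = 0
    for i in range(n_steps):
        if i < n0:
            cov = sigma0                  # fixed proposal during the initial phase
        else:
            # Jitter term is an added numerical safeguard (assumption of this sketch).
            cov = s_d * np.atleast_2d(np.cov(chain[: i + 1].T)) + 1e-10 * np.eye(d)
        proposal = rng.multivariate_normal(chain[i], cov)
        logf_prop = log_target(proposal)
        # Symmetric proposal: the acceptance ratio reduces to f(proposal) / f(current).
        if np.log(rng.uniform()) < logf_prop - logf:
            chain[i + 1] = proposal
            logf = logf_prop
            n_accept += 1
        else:
            chain[i + 1] = chain[i]
    return chain, n_accept / n_steps
```

For example, `log_target` can be the unnormalized log-posterior evaluated in log₁₀ parameter space, with `sigma0` a small multiple of the identity.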

Materials and Methods

Delayed acceptance Metropolis-Hastings algorithm

Previous applications of the MH to gene expression have required 10⁴ to 10⁶ or more iterations per combination of model and data set,¹⁰ and computational cost is a significant issue when sampling from a high-dimensional distribution whose density is expensive to evaluate. A practical rule of thumb for balancing between exploration and exploitation for an MH algorithm with the Gaussian proposal is to have an acceptance rate close to 0.234, which was derived by Roberts et al.⁴⁴ as the asymptotically optimal acceptance rate for random walk MH algorithms. Assuming the proposal density of Algorithm 1 is tuned to have an acceptance rate of approximately 23.4%, one could achieve significant improvement to computation time if one can quickly screen out the remaining rejected proposals without evaluating the expensive posterior density.

Algorithm 1

Metropolis-Hastings
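The steps of the Metropolis-Hastings algorithm can be sketched as follows for a scalar parameter with a symmetric Gaussian random-walk proposal. This is an illustrative Python sketch under those simplifying assumptions, not a transcription of Algorithm 1; the function name and arguments are hypothetical.

```python
import math
import random

def metropolis_hastings(log_target, theta0, n_steps, step_size, seed=0):
    """Random-walk Metropolis for a scalar parameter; returns the chain and acceptance rate."""
    rng = random.Random(seed)
    chain = [theta0]
    logf = log_target(theta0)
    n_accept = 0
    for _ in range(n_steps):
        current = chain[-1]
        proposal = current + rng.gauss(0.0, step_size)
        logf_prop = log_target(proposal)
        # Symmetric proposal: q cancels in the acceptance ratio.
        log_alpha = min(0.0, logf_prop - logf)
        # 1 - random() lies in (0, 1], so the logarithm is always defined.
        if math.log(1.0 - rng.random()) < log_alpha:
            chain.append(proposal)
            logf = logf_prop
            n_accept += 1
        else:
            chain.append(current)         # rejection: the chain stays in place
    return chain, n_accept / n_steps
```

Every iteration evaluates the target density once, which is the cost that the delayed acceptance scheme discussed next tries to avoid for proposals that are likely to be rejected.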

The delayed acceptance Metropolis-Hastings (DAMH)²⁸ seeks to alleviate the computational burden of rejections in the original MH by employing a rejection step based on a cheap approximation to the target density (cf. Algorithm 2). Specifically, let f(.) be the density of the target distribution of the parameter θ. Let $\tilde f_\theta(.)$ be a cheap state-dependent approximation to f. At iteration i, let θ′ be a proposal from the current parameter θ using a pre-specified proposal density q(.|.). The DAMH promotes θ′ as a potential candidate for acceptance with probability

$$\alpha(\theta,\theta') = \min\left\{1, \frac{q(\theta\mid\theta')\,\tilde f_\theta(\theta')}{q(\theta'\mid\theta)\,\tilde f_\theta(\theta)}\right\}.$$

If θ′ fails to be promoted, the algorithm moves on to the next iteration with θ_i+1 := θ_i. If θ′ passes the first inexpensive check, then a second acceptance probability is computed using the formula

$$\beta(\theta,\theta') = \min\left\{1, \frac{\tilde q(\theta\mid\theta')\, f(\theta')}{\tilde q(\theta'\mid\theta)\, f(\theta)}\right\}, \qquad \tilde q(\theta'\mid\theta) = \alpha(\theta,\theta')\, q(\theta'\mid\theta),$$

and the DAMH algorithm accepts θ′ for the next step with probability β. In this manner, much computational savings can be expected if unlikely proposals are quickly rejected in the first step, leaving only the most promising candidates for careful evaluation in the second step. Christen and Fox show that the DAMH converges to the target distribution under conditions that are easily met in practice.²⁸ However, the quality of the approximation affects the overall efficiency. Poor approximations lead to many false promotions of parameters that are rejected at the expensive second step. On the other hand, the first step may falsely reject parameters that could have been accepted using the accurate log-likelihood evaluation. This leads to subsequent developments that seek appropriate approximations and ways to adapt these approximations to improve the performance of DAMH in specific applications.^30,45 Specifically, the adaptive DAMH variant in Cui et al., 2014⁴⁵ formulates the approximation via reduced basis models that can be updated on the fly using samples accepted by the chain. The adaptive version in Cui et al., 2011,³⁰ allows adaptations for the proposal density and the error model, with convergence guarantees.³¹ We will borrow these elements in our sampling scheme that we introduce below. However, the stochastic gene expression models that we investigate here differ from the models studied in those previous contexts, since our likelihood function incorporates intrinsic discrete state variability instead of external Gaussian noise.
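The two-stage accept/reject logic can be sketched in Python as follows. The sketch makes two simplifying assumptions that are not requirements of the DAMH: the proposal is a symmetric Gaussian random walk on a scalar parameter, and the cheap approximation does not depend on the current state. The function and argument names are hypothetical.

```python
import math
import random

def damh(log_f, log_f_approx, theta0, n_steps, step_size, seed=0):
    """Delayed acceptance MH; returns the chain and the number of full-model evaluations."""
    rng = random.Random(seed)
    theta = theta0
    chain = [theta0]
    log_f_cur = log_f(theta0)
    log_a_cur = log_f_approx(theta0)
    n_full = 0
    for _ in range(n_steps):
        prop = theta + rng.gauss(0.0, step_size)
        log_a_prop = log_f_approx(prop)
        # Stage 1: promote the proposal using only the cheap approximation.
        log_alpha_fwd = min(0.0, log_a_prop - log_a_cur)
        if math.log(1.0 - rng.random()) >= log_alpha_fwd:
            chain.append(theta)           # screened out without a full evaluation
            continue
        # Stage 2: correct with the expensive density so the target stays exact.
        log_f_prop = log_f(prop)
        n_full += 1
        log_alpha_rev = min(0.0, log_a_cur - log_a_prop)
        log_beta = min(0.0, (log_f_prop + log_alpha_rev) - (log_f_cur + log_alpha_fwd))
        if math.log(1.0 - rng.random()) < log_beta:
            theta, log_f_cur, log_a_cur = prop, log_f_prop, log_a_prop
        chain.append(theta)
    return chain, n_full
```

The returned count `n_full` shows how many expensive evaluations were needed; the better the approximation, the closer the second-stage acceptance rate is to one.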

Reduced-order models for the FSP dynamics

Projection-based model reduction

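We approximate the full parameter-dependent FSP dynamics, dp/dt = A(θ)p, with a sequence of reduced-order dynamics,

$$\frac{d}{dt} q^{(i)}(t) = B^{(i)}(\theta)\, q^{(i)}(t), \qquad B^{(i)}(\theta) = (\Phi^{(i)})^T A(\theta)\, \Phi^{(i)}, \qquad t \in [t_{i-1}, t_i], \tag{7}$$

$$q^{(1)}(0) = (\Phi^{(1)})^T p_0, \qquad q^{(i)}(t_{i-1}) = (\Phi^{(i)})^T \Phi^{(i-1)} q^{(i-1)}(t_{i-1}) \quad (i \ge 2). \tag{8}$$

Algorithm 2

Delayed Acceptance Metropolis-Hastings

Here, q⁽ⁱ⁾(t) ∈ ℝ^{r_i} denotes the reduced state, and i = 1,…, n_B indexes the user-specified subintervals [t_{i−1}, t_i] with t₀ = 0. Each matrix Φ⁽ⁱ⁾ ∈ ℝ^{n×r_i}, r_i ≤ n, has orthonormal columns that span the subspace onto which we project the full dynamics. Equation (8) implies that the solution at a previous time interval will be projected onto the subspace of the next interval. While this introduces some extra errors, subdividing the long time interval helps to reduce the subspace dimensions for systems with complicated dynamics. Given an ordered set of reduced bases {Φ⁽ⁱ⁾}, i = 1,…, n_B, the approximations to the full distributions are given by

$$p(t) \approx p_\Phi(t) = \Phi^{(i)} q^{(i)}(t), \qquad t \in [t_{i-1}, t_i].$$

Under assumption (4), the reduced system matrices B⁽ⁱ⁾(θ) in eq. (7) can be decomposed as

$$B^{(i)}(\theta) = \sum_{j=1}^{M} g_j(\theta)\, B_j^{(i)}, \qquad \text{where} \qquad B_j^{(i)} = (\Phi^{(i)})^T A_j\, \Phi^{(i)}.$$

This decomposition allows us to assemble the reduced systems quickly with complexity O(M r_i²), independent of the full dimension n.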

We build the reduced basis for the parameter-dependent dynamics by concatenation (see, e.g., Benner et al.¹⁵). Specifically, we assume that for any fixed parameter θ, we can construct a set of orthogonal basis matrices {V⁽ⁱ⁾(θ)}, i = 1,…, n_B. We can sample different bases from a finite set of ‘training’ parameters θ₁,…, θ_{n_train}. Then, through the iterative updates

$$\Phi^{(i,1)} = V^{(i)}(\theta_1), \qquad \Phi^{(i,j)} = \text{Gram-Schmidt}\big(\Phi^{(i,j-1)},\, V^{(i)}(\theta_j)\big), \quad j = 2,\ldots,n_{\mathrm{train}},$$

we obtain the bases Φ⁽ⁱ⁾ = Φ^(i,n_train) that provide global approximations for the full dynamical system across the parameter domain. The operation Gram-Schmidt implies that the columns in V⁽ⁱ⁾ (θ_j) are orthogonalized against the columns in Φ^(i,j−1) to produce a new matrix with orthonormal columns.
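The basis concatenation step can be sketched in Python as below. The function name and the drop tolerance for nearly dependent columns are assumptions of this sketch; the Gram-Schmidt pass is repeated once for numerical stability.

```python
import numpy as np

def extend_basis(Phi, V, tol=1e-10):
    """Orthogonalize the columns of V against Phi and append the new directions.

    Phi is assumed to have orthonormal columns. Columns of V that are
    (numerically) already in the span of the current basis are dropped.
    """
    cols = [Phi[:, k] for k in range(Phi.shape[1])]
    for k in range(V.shape[1]):
        v = V[:, k].astype(float)
        for _ in range(2):                # second pass re-orthogonalizes
            for q in cols:
                v = v - (q @ v) * q
        nrm = np.linalg.norm(v)
        if nrm > tol:
            cols.append(v / nrm)
    return np.column_stack(cols)
```

Calling `extend_basis` once per training parameter, in sequence, yields a single global basis for each time subinterval.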

Krylov subspace approximation for single-parameter model reduction

Consider a fixed parameter combination θ. Let the time points 0 < t₁ < … < t_{n_B} = t_f be given. Using a high-fidelity solver, we can compute the full solution at those time points, and we let p_i denote the full solution at time t_i. Our aim is to construct a sequence of orthogonal matrices V⁽ⁱ⁾ ≡ V⁽ⁱ⁾(θ) with i = 1,…, n_B such that the full model dynamics at the parameter θ on the time interval [t_{i−1}, t_i] is well-approximated by a projected reduced model on the span of V⁽ⁱ⁾.

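A simple and effective way to construct the reduced bases is to choose V⁽ⁱ⁾ as the orthogonal basis of the Krylov subspace

$$K_{m_i}\big(A(\theta), p_{i-1}\big) = \mathrm{span}\{p_{i-1},\, A(\theta)p_{i-1},\, \ldots,\, A(\theta)^{m_i-1} p_{i-1}\},$$

where p₀ is the initial distribution.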

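In order to determine the subspace dimension m_i, we use the error series derived by Saad³³ which we reproduce here using our notation as

$$p_i - \beta_i V^{(i)} e^{\tau_i H^{(i)}} e_1 = \beta_i \sum_{k=1}^{\infty} h^{(i)}_{m_i+1,m_i}\, \tau_i^{k}\, e_{m_i}^T \varphi_k(\tau_i H^{(i)})\, e_1\, A(\theta)^{k-1} v^{(i)}_{m_i+1}, \tag{14}$$

with τ_i = t_i − t_{i−1} and β_i = ‖p_{i−1}‖₂.

Here, $h^{(i)}_{m_i+1,m_i}$ and $v^{(i)}_{m_i+1}$ are the outputs at step m_i of the Arnoldi procedure (Algorithm 10.5.1 in Golub and Van Loan⁴⁶) to build the orthogonal matrix V⁽ⁱ⁾, e₁ and e_{m_i} are the first and last columns of the m_i × m_i identity matrix, and $\varphi_k(X) = \sum_{j=0}^{\infty} X^j/(j+k)!$ for any square matrix X. The matrix H⁽ⁱ⁾ = (V⁽ⁱ⁾)^T A(θ)V⁽ⁱ⁾ is the state matrix of the reduced-order system obtained via projecting A onto the Krylov subspace K_{m_i}. The terms $\varphi_k(\tau_i H^{(i)})e_1$ can be computed efficiently using Expokit (Theorem 1, Sidje³⁴). We use the Euclidean norm of the first term of the series (14) as an indicator for the model reduction error. Given an error tolerance ε_Krylov, we iteratively construct the Krylov basis V⁽ⁱ⁾ with increasing dimension until the error per unit time step of the reduced model falls below the tolerance, that is,

$$\frac{1}{\tau_i}\left\| \beta_i\, h^{(i)}_{m_i+1,m_i}\, \tau_i\, e_{m_i}^T \varphi_1(\tau_i H^{(i)})\, e_1\, v^{(i)}_{m_i+1} \right\|_2 \le \varepsilon_{\mathrm{Krylov}}.$$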
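The construction of one Krylov basis and its error indicator can be sketched in Python as follows. The sketch assumes the standard Arnoldi recurrence and evaluates the first term of the error series through an augmented matrix exponential; a simple Taylor-based exponential stands in for Expokit. All function names are hypothetical.

```python
import math
import numpy as np

def expm_taylor(A, n_terms=20):
    """Dense matrix exponential via scaling and squaring with a Taylor series."""
    norm = np.linalg.norm(A, 1)
    s = int(math.ceil(math.log2(norm / 0.5))) if norm > 0.5 else 0
    X = A / 2.0 ** s
    E = np.eye(A.shape[0])
    term = np.eye(A.shape[0])
    for j in range(1, n_terms + 1):
        term = term @ X / j
        E = E + term
    for _ in range(s):
        E = E @ E
    return E

def arnoldi(A, v, m):
    """Return V (n x m), H (m x m), the next subdiagonal entry h_{m+1,m}, and ||v||."""
    n = len(v)
    V = np.zeros((n, m + 1))
    H = np.zeros((m + 1, m))
    beta = np.linalg.norm(v)
    V[:, 0] = v / beta
    for j in range(m):
        w = A @ V[:, j]
        for i in range(j + 1):            # modified Gram-Schmidt
            H[i, j] = V[:, i] @ w
            w = w - H[i, j] * V[:, i]
        H[j + 1, j] = np.linalg.norm(w)
        if H[j + 1, j] < 1e-14:           # invariant subspace found: approximation is exact
            return V[:, : j + 1], H[: j + 1, : j + 1], 0.0, beta
        V[:, j + 1] = w / H[j + 1, j]
    return V[:, :m], H[:m, :m], H[m, m - 1], beta

def krylov_expv(A, v, tau, m):
    """Krylov approximation of exp(tau * A) v and the norm of the first error term."""
    V, H, h_next, beta = arnoldi(A, v, m)
    k = H.shape[0]
    aug = np.zeros((k + 1, k + 1))
    aug[:k, :k] = H
    aug[0, k] = 1.0                       # last column of exp(tau*aug) holds tau*phi_1(tau*H) e_1
    E = expm_taylor(tau * aug)
    approx = beta * (V @ E[:k, 0])
    first_term_norm = beta * abs(h_next * E[k - 1, k])
    return approx, first_term_norm
```

In an adaptive loop, m would be increased until `first_term_norm / tau` drops below the chosen tolerance.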

Adaptive Delayed Acceptance Metropolis with reduced-order models of the CME

The approximate log-likelihood formula

The reduced bases described above allow us to find reduced-cost approximations p ≈ p_Φ to the full FSP dynamics. We can then approximate the full log-likelihood of single-cell data in equation (3) by the reduced-model-based log-likelihood

$$\tilde L(\mathcal{D}\mid\theta) = \sum_{i=1}^{T}\sum_{j=1}^{n_i} \log \max\{\varepsilon_s,\; p_\Phi(t_i, c_{j,i}\mid\theta)\}, \tag{16}$$

where ε_s is a small constant, chosen to safeguard against undefined values. We need to include ε_s in our approximation since the entries of the reduced-order approximation are not guaranteed to be positive (not even in exact arithmetic). We aim to make the approximation accurate for parameters θ with high posterior density, and crude on those with low density, which should be visited rarely by the Monte Carlo chain.

One can readily plug in the approximation (16) to the DAMH algorithm. Since the resulting approximate posterior density is strictly positive for all parameters in the support of the prior, the chain will eventually converge to the target posterior distribution (Theorem 1 in Christen and Fox,²⁸ and Theorem 3.2 in Efendiev et al.²⁹). On the other hand, a major problem with the DAMH is that the computational efficiency depends on the quality of the reduced basis approximation. Crude models result in high rejection rates at the second stage, thus increasing sample correlation and computation time. Therefore, it is advantageous to fine-tune the parameters of the algorithm and update the reduced models adaptively to ensure a reasonable acceptance rate. This motivates the adaptive version of the DAMH that we discuss next.

Delayed acceptance posterior sampling with infinite model adaptations

We propose an adaptive version of the DAMH for sampling from the posterior density of the CME parameters given single-cell data (Algorithm 3). We have borrowed elements from the adaptive DAMH algorithms in Cui et al.^30,45 The first step proposal uses an adaptive Gaussian similar to the adaptive Metropolis of Haario et al.,²⁷ where the covariance matrix is updated at every step from the samples accepted so far. Here, we generate the proposals in log₁₀ space.

The reduced bases are updated as the chain explores the parameter domain. Instead of using a finite adaptation criterion to stop model adaptation as in Cui et al.,⁴⁵ we introduce an adaptation probability with which the reduced basis updates are considered. This means that there could be an infinite number of model adaptations that occur with diminishing probability as the chain progresses. This idea is taken from the “doubly-modified example” in Roberts and Rosenthal.⁴⁷ The advantage of the probabilistic adaptation criterion is that it allows us to prove ergodicity for the adaptive algorithm. The mathematical proofs are presented in the Appendix.

The adaptation probability a(i) is chosen to converge to 0 as the chain iteration index i increases. In particular, we use

$$a(i) = 2^{-i/I_0},$$

where I₀ is a user-specified constant. This formula means that the probability for an adaptation to occur decreases by half after every I₀ chain iterations. In addition, we further restrict the adaptation to occur only when the error indicator is above a threshold at the proposed parameters. As a consequence of our model updating criteria, the reduced-order bases will be selected at points that are close to the support of the target posterior distribution.
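The adaptation rule can be sketched in a few lines of Python, assuming the adaptation probability takes the form a(i) = 2^(−i/I₀), which halves every I₀ iterations as described. The function names and the way the error indicator and threshold are passed in are illustrative assumptions.

```python
import random

def adaptation_probability(i, i0):
    """Probability of considering a basis update at chain iteration i."""
    return 2.0 ** (-i / i0)

def should_adapt(i, i0, error_indicator, threshold, rng):
    """Update only if the error indicator is too large AND the coin flip succeeds."""
    if error_indicator <= threshold:
        return False
    return rng.random() < adaptation_probability(i, i0)
```

Because the probability decays geometrically, most basis updates are expected early in the chain, while late updates remain possible but increasingly rare.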

Algorithm 3

ADAMH-FSP-Krylov

Numerical Results

We conduct numerical tests on several stochastic gene expression models to study the performance of our proposed algorithms. The test platform is a desktop computer running Linux Mint and MATLAB 2017a, with 32 GB RAM and Intel Core i7 3.4 GHz quad-core processor.

We compare three sampling algorithms:

Adaptive Metropolis-Hastings with full FSP-based likelihood evaluations (AMH-FSP): This version is an adaptation of the Adaptive Metropolis of Haario et al.,²⁷ which updates the covariance of the Gaussian proposal density at every step. The algorithm always uses the FSP-based likelihood (3) to compute the acceptance probability, and the FSP is solved using the Krylov-based Expokit.³⁴ This is the reference algorithm by which we assess the accuracy and performance of the other sampling schemes. To the best of our knowledge, such an adaptive Metropolis scheme has not been used elsewhere for gene expression models.
Adaptive Delayed Acceptance Metropolis-Hastings with reduced FSP model constructed from Krylov subspace projections (ADAMH-FSP-Krylov): This is Algorithm 3 mentioned above. Similar to AMH-FSP, this algorithm uses a Gaussian proposal with an adaptive covariance matrix. However, it has a first-stage rejection step that employs the reduced model constructed adaptively using Krylov-based projection.
Adaptive Metropolis-Hastings with only reduced model-based likelihood evaluations (AMH-ROM): This is similar to AMH-FSP, but we instead use the approximate log-likelihood formula (16). The reduced model is constructed during the run of the ADAMH-FSP-Krylov, and therefore this variant can only be executed after the ADAMH-FSP-Krylov has terminated. We include this variant here in order to study the accuracy and potential speedup when leaving the acceptance/rejection decision fully to the reduced model.

We rely on two metrics for performance evaluation: total CPU time to finish each chain, and the multivariate effective sample size as formulated in Vats et al.⁴⁸ Given samples θ₁,…, θ_n, the multivariate effective sample size is estimated by

$$\mathrm{mESS} = n\left(\frac{|\Lambda_n|}{|\Sigma_n|}\right)^{1/d},$$

where d is the number of parameters, |·| denotes the determinant, Λ_n is an estimation of the posterior covariance using the sample covariance, and Σ_n the multivariate batch means estimator. An algorithm whose posterior distribution matches the full FSP implementation, but which has a lower ratio of CPU time per (multivariate) effective sample, will be deemed more efficient. We use the MATLAB implementation by Luigi Acerbi⁴⁹ for evaluating the effective sample size from the MCMC outputs.
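A minimal Python sketch of a multivariate effective sample size estimate in the spirit of Vats et al. is shown below. The batch size of floor(√n) and the function name are assumptions of this sketch; it is not the MATLAB implementation used for the reported results.

```python
import numpy as np

def multivariate_ess(samples):
    """Estimate n * (det(Lambda) / det(Sigma))**(1/d) with a batch means estimator for Sigma."""
    x = np.asarray(samples, dtype=float)
    n, d = x.shape
    b = int(np.floor(np.sqrt(n)))         # batch size (assumed choice)
    a = n // b                            # number of batches
    x = x[: a * b]                        # drop the remainder so batches are equal
    mean = x.mean(axis=0)
    lam = np.atleast_2d(np.cov(x.T))      # sample covariance
    batch_means = x.reshape(a, b, d).mean(axis=1)
    diff = batch_means - mean
    sigma = b * (diff.T @ diff) / (a - 1) # batch means estimate of the asymptotic covariance
    return a * b * (np.linalg.det(lam) / np.linalg.det(sigma)) ** (1.0 / d)
```

Dividing the total CPU time of a chain by this estimate gives the cost per effective sample used to compare the samplers.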

Implementation details

To achieve reproducible results for each example, we reset the random number generator to Mersenne Twister with seed 0 in MATLAB using the rng('default') command before simulating the single-cell observations with Gillespie’s Algorithm⁵⁰ and running the ADAMH-FSP-Krylov and AMH-FSP chains. The random number generator is then reset to the 'default' setting again before running the AMH-ROM chain.

Two-state gene expression

We first consider the common model of bursting gene expression^39,51–54 with a gene that can switch between ON and OFF states and an RNA species that is transcribed when the gene is switched on (Table 1). We simulate data at ten equally spaced time points from 0.1 to 1 hour, with 200 independent observations per time point. The gene states are assumed to be unobserved. We generate the reduced bases on subintervals generated by the time points in the set where Δt_data = 0.1 hr and Δt_basis = 0.01 hr. Thus, Δt_basis includes the observation times. We choose the basis update threshold as δ = 10⁻⁴. The prior distribution in our test is the log-uniform distribution on a rectangle, whose bounds are given in Table 2. The full FSP state space is chosen as

Table 1:

Two-state gene expression reactions and propensities. We assume that the time unit is hours (hr). Hence, parameters’ units are hr⁻¹. ([X] is the number of copies of the species X.)
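Synthetic snapshot data for a two-state model of this kind can be generated with Gillespie's algorithm, as sketched below in Python. The reaction set (gene activation, gene deactivation, transcription from the ON state, RNA degradation) is the standard bursting model and is an assumption here, as are the parameter names k_on, k_off, and gamma and the function names; only k_r appears by name in the text.

```python
import random

def simulate_two_state(k_on, k_off, k_r, gamma, t_end, rng):
    """One Gillespie trajectory started from gene OFF and zero RNA; returns (gene, rna) at t_end."""
    t, gene_on, rna = 0.0, 0, 0
    while True:
        rates = [k_on * (1 - gene_on),    # OFF -> ON
                 k_off * gene_on,         # ON -> OFF
                 k_r * gene_on,           # ON -> ON + RNA
                 gamma * rna]             # RNA -> 0
        total = sum(rates)
        if total == 0.0:
            return gene_on, rna
        t += rng.expovariate(total)       # waiting time to the next reaction
        if t > t_end:
            return gene_on, rna
        u = rng.random() * total          # choose which reaction fires
        if u < rates[0]:
            gene_on = 1
        elif u < rates[0] + rates[1]:
            gene_on = 0
        elif u < rates[0] + rates[1] + rates[2]:
            rna += 1
        else:
            rna -= 1

def simulate_snapshots(params, times, n_cells, seed=0):
    """RNA counts of n_cells independent cells at each time; the gene state is not recorded."""
    rng = random.Random(seed)
    return [[simulate_two_state(*params, t, rng)[1] for _ in range(n_cells)]
            for t in times]
```

For instance, `simulate_snapshots(params, [0.1 * (i + 1) for i in range(10)], 200)` mirrors the design of ten time points with 200 cells each, with `params` given in units of hr⁻¹.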

Table 2:

Two-state gene expression example. Bounds on the support of the prior distribution of the parameters, which we choose to be the uniform distribution. Parameter units are hr⁻¹.

We choose a starting point for the sampling algorithms using five iterations of MATLAB’s genetic algorithm with a population size of 100, resulting in 600 full FSP evaluations. We then refine the output of the genetic algorithm with a local search using fmincon with a maximum of 1000 further evaluations of the full model. This is a negligible cost in comparison to the 10,000 iterations that we set for the sampling algorithms.

We summarize the performance characteristics of the sampling schemes in Table 3. The ADAMH-FSP-Krylov requires less computational time (Fig. 1) without a significant reduction in the multivariate effective sample size. In terms of computational time, the ADAMH-FSP-Krylov takes less time to generate an independent sample. This is partly explained by observing that the first stage of the scheme filters out many unlikely samples with the efficient approximation, resulting in 78.34% fewer full evaluations in the second stage (cf. Table 3).

Figure 1:

Two-state gene expression example. (A) CPU time vs number of iterations for a sample run of the ADAMH-FSP-Krylov and the AMH-FSP. (B) Scatterplot of the unnormalized log-posterior evaluated using the full FSP and the reduced model. Notice that the approximate and true values are almost identical with a correlation coefficient of approximately 0.99853. (C) Distribution of the relative error in the approximate log-posterior evaluations at the parameters accepted by the ADAMH chain.

Table 3:

Two-state gene expression example. Performance of the Adaptive Delayed Acceptance Metropolis-Hastings with the Krylov-based reduced model (ADAMH-FSP-Krylov), the Adaptive Metropolis-Hastings with full FSP (AMH-FSP), and the Adaptive Metropolis-Hastings with the reduced model constructed by ADAMH-FSP-Krylov (AMH-ROM). The total chain length for each algorithm is 10000.

We observe from the scatterplot of log-posterior values of the parameters accepted by the ADAMH-FSP-Krylov that the reduced model evaluations are very close to the FSP evaluations. The majority of the approximate log-posterior values have a relative error below 10⁻⁴, with an average of 1.09 × 10⁻⁶ and a median of 8.49 × 10⁻⁸ across all 2152 accepted parameter combinations (Fig. 1 C). This accuracy is achieved with a reduced set of no more than 168 basis vectors per time subinterval that was built using solutions from only four sampled parameter combinations (Fig. 2). All the basis updates occur during the first tenth of the chain, and these updates consume less than one percent of the total chain runtime (Table 4).

Figure 2:

Two-state gene expression example. (A) Estimations of the marginal posterior distribution of the parameter k_r using the Adaptive Delayed Acceptance Metropolis-Hastings with Krylov reduced model (ADAMH-FSP-Krylov) and the Adaptive Metropolis-Hastings with full FSP (AMH-FSP). (B) 2-D projections of parameter combinations accepted by the ADAMH scheme (blue) and parameter combinations used for reduced model construction (red).

Table 4:

Two-state gene expression example. Breakdown of CPU time spent in the main components of ADAMH-FSP-Krylov.

From the samples obtained by the ADAMH-FSP-Krylov, we found that full and reduced FSP evaluations take approximately 0.25 and 0.09 seconds on average, allowing for a maximal speedup factor of approximately 100(0.25 – 0.09)/0.25 ≈ 65.73% for the current model reduction scheme. Here, the term reduced model refers to the final reduced model obtained from the adaptive reduced basis update of the ADAMH-FSP-Krylov. The speedup offered by the ADAMH-FSP-Krylov was found to be 100(2497.70 – 1424.32)/2497.70 ≈ 42.97%, or approximately two thirds of the maximal achievable improvement for the current model reduction scheme. To further investigate the speed and quality of the reduced model learned from the ADAMH-FSP-Krylov run, we performed another run of the adaptive Metropolis-Hastings algorithm with the log-likelihood evaluated solely using the reduced model constructed by the ADAMH-FSP-Krylov. Interestingly, we observe almost identical results using the reduced model alone in comparison to using the full model (Fig. 2 and Table 5), and the 65.03% reduction in computational effort matches the maximal estimated improvement very well.
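The percentages quoted above all follow the same relative-saving formula, 100 × (baseline − accelerated) / baseline. The short sketch below simply re-evaluates that formula; the function name `percent_time_saved` is hypothetical, and we assume that 2497.70 and 1424.32 are the total runtimes of the AMH-FSP and ADAMH-FSP-Krylov chains, respectively.

```python
def percent_time_saved(baseline_time: float, accelerated_time: float) -> float:
    """Percentage reduction in runtime relative to the baseline."""
    if baseline_time <= 0:
        raise ValueError("baseline_time must be positive")
    return 100.0 * (baseline_time - accelerated_time) / baseline_time


# Total chain runtimes quoted in the text (assumed: AMH-FSP, then ADAMH-FSP-Krylov).
chain_saving = percent_time_saved(2497.70, 1424.32)

# Per-evaluation timings as rounded in the text (full FSP, then reduced model).
per_eval_saving = percent_time_saved(0.25, 0.09)

print(f"chain-level saving: {chain_saving:.2f}%")
print(f"per-evaluation saving from the rounded timings: {per_eval_saving:.2f}%")
```

With the rounded per-evaluation timings the formula gives about 64%, so the 65.73% figure in the text presumably reflects timings with more digits than the two shown.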

Table 5:

Two-state gene expression example. True parameter values and the average values of the parameters visited by the ADAMH-FSP-Krylov and AMH-FSP chains. The “Start” column shows the starting point of both MCMC chains. This starting point is obtained from a numerical optimization procedure that seeks to maximize the full log-likelihood in equation (3). Parameter values are shown in log₁₀ scale. All parameters have unit hr⁻¹.

A gene expression model with spatial components

We consider an extension of the previous model to distinguish between the nuclear and cytoplasmic compartments in the cell, similar to a stochastic model recently considered for MAPK-activated gene expression dynamics in yeast.¹⁰ The gene can transition between four states {0, 1, 2, 3}, with transcription activated when the gene is in states 1 to 3. RNA is transcribed in the nucleus and later transported to the cytoplasm as a first-order reaction.

These cellular processes and the degradation of RNA in both spatial compartments are modeled by a reaction network with six reactions and three species (Table 6).

Table 6:

Spatial gene expression reactions and propensities. The gene is considered as one species with 4 different states G_i, i = 0,…, 3. We assume that the time unit is seconds (sec). Hence, the parameters’ units are sec⁻¹.

We simulated a data set of 200 single-cell measurements at five equally-spaced time points between 2 min and 10 min, that is, T_data = {2, 4, 6, 8, 10} (min). The time points for generating the basis are T_basis = T_data ∪ {j × 0.2 min, j = 1,…, 50}. We chose the basis update threshold as δ = 10⁻⁴. The prior distribution in our test is the log-uniform distribution on a rectangle, whose bounds are given in Table 7. The full FSP state space is chosen as

Table 7:

Spatial gene expression. Bounds on the support of the prior distribution of the parameters, which we choose to be the log-uniform distribution. All parameters have the same unit sec⁻¹.
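As a small illustration of the basis time grid T_basis = T_data ∪ {j × 0.2 min, j = 1,…, 50} defined above, the following sketch builds the union explicitly. The variable names are hypothetical, and the grid points are written as j / 5 rather than j × 0.2 only to avoid floating-point drift when removing duplicates.

```python
# Observation times in minutes, as listed for T_data.
t_data = [2, 4, 6, 8, 10]

# Extra grid j * 0.2 min for j = 1..50, written as j / 5 to avoid rounding drift.
t_grid = [j / 5 for j in range(1, 51)]

# Union of both sets, sorted, with duplicates removed.
t_basis = sorted(set(float(t) for t in t_data) | set(t_grid))

print(len(t_basis), t_basis[0], t_basis[-1])
```

Because every observation time is itself a multiple of 0.2 min, the union adds no new points to the fine grid in this particular setup.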

To find the starting point for the chains, we run five generations of MATLAB’s genetic algorithm (implemented in the function ga) with 600 full FSP evaluations. Then, we run another 500 steps of fmincon to refine the output of the ga solver. Using the parameter vector obtained by this combined optimization scheme as the initial sample, we run both the ADAMH-FSP-Krylov and the AMH-FSP for 10,000 iterations.

The acceleration obtained by using the reduced model is quite evident, with the ADAMH generating an effective sample about twice as fast as the AMH (Table 8). The log-posterior evaluations from the reduced model are accurate (Fig. 3 C and Table 9), with relative error below the algorithmic tolerance of 10⁻⁴, a mean of 1.11 × 10⁻⁵ and a median of 6.98 × 10⁻⁶. This accurate model was built automatically by the ADAMH scheme using just 18 points in the parameter space (Fig. 4), resulting in a set of no more than 438 vectors per time subinterval. All the basis updates occur during the first fifth of the chain, and these updates consume about 11.25% of the total runtime (Table 10). The high accuracy of the posterior approximation translates into a very high second-stage acceptance of 96.15% of the proposals promoted by the first-stage reduced-model-based evaluation. Such high acceptance rates in the second stage are crucial to the efficiency of the delayed acceptance scheme, since almost all of the expensive FSP evaluations are accepted.³⁰
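To make the role of the second-stage acceptance rate concrete, the following is a minimal sketch of a (non-adaptive) two-stage delayed acceptance Metropolis-Hastings loop on a toy one-dimensional problem. It is not the ADAMH-FSP-Krylov implementation: `log_target` and `log_approx` are hypothetical stand-ins for the full FSP and reduced-model log-posteriors, the proposal is a fixed symmetric Gaussian random walk, and no basis or proposal adaptation is performed.

```python
import math
import random


def log_target(x):
    """Stand-in for the expensive log-posterior (here a standard normal)."""
    return -0.5 * x * x


def log_approx(x):
    """Stand-in for the cheap reduced-model log-posterior (slightly too wide)."""
    return -0.5 * (x / 1.02) ** 2


def delayed_acceptance_mh(n_iter, step=1.0, x0=0.0, seed=0):
    rng = random.Random(seed)
    x = x0
    lp_x, la_x = log_target(x), log_approx(x)
    samples, promoted, accepted = [], 0, 0
    for _ in range(n_iter):
        y = x + rng.gauss(0.0, step)  # symmetric random-walk proposal
        la_y = log_approx(y)
        # Stage 1: screen the proposal with the cheap approximation only.
        if math.log(1.0 - rng.random()) < la_y - la_x:
            promoted += 1
            lp_y = log_target(y)  # the expensive evaluation happens only here
            # Stage 2: correct for the approximation error so that the chain
            # still targets the exact posterior.
            log_beta = (lp_y - lp_x) - (la_y - la_x)
            if math.log(1.0 - rng.random()) < log_beta:
                x, lp_x, la_x = y, lp_y, la_y
                accepted += 1
        samples.append(x)
    return samples, promoted, accepted


samples, promoted, accepted = delayed_acceptance_mh(5000)
second_stage_rate = accepted / promoted
print(f"full evaluations: {promoted} of {len(samples)} iterations")
print(f"second-stage acceptance rate: {second_stage_rate:.3f}")
```

In this sketch the expensive function is called only for proposals that pass the first stage, and the closer `log_approx` is to `log_target`, the closer the second-stage acceptance rate is to one, which mirrors the behaviour reported above.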

Figure 3:

Spatial gene expression. (A) CPU time vs number of iterations for a sample run of the ADAMH-FSP-Krylov and the AMH-FSP. (B) Scatterplot of the unnormalized log-posterior evaluated using the full FSP and the reduced model. Notice that the approximate and true values are almost identical, with a correlation coefficient of approximately 0.9979. (C) Distribution of the relative error in the approximate log-posterior evaluations at the parameters accepted by the ADAMH chain.

Figure 4:

Spatial gene expression. (A) Estimations of the marginal posterior distribution of the parameter k_r using the Adaptive Delayed Acceptance Metropolis-Hastings with the Krylov reduced model (ADAMH-FSP-Krylov), the Adaptive Metropolis-Hastings with full FSP (AMH-FSP), and the approximate Adaptive Metropolis-Hastings using the reduced order model learned from the ADAMH-FSP-Krylov run (AMH-ROM). The dashed vertical line marks the true parameter value. (B) 2-D projections of parameter combinations accepted by the ADAMH scheme (blue) and parameter combinations used for reduced model construction (red). The truncated appearance of the samples is the consequence of the upper bound on the support of the prior (see Table 7).

Table 8:

Spatial gene expression. Performance of the Adaptive Delayed Acceptance Metropolis-Hastings with Krylov-based reduced model (ADAMH-FSP-Krylov) vs the Adaptive Metropolis-Hastings with full FSP (AMH-FSP). The total chain length for each algorithm is 100,000. The ADAMH-FSP-Krylov scheme uses markedly fewer full evaluations than the AMH-FSP scheme.

Table 9:

Spatial gene expression. True parameter values and the average values of the parameters visited by the ADAMH-FSP-Krylov and AMH-FSP chains. The “Start” column shows the starting point of both MCMC chains. This starting point is obtained from a numerical optimization procedure that seeks to maximize the full log-likelihood in equation (3). Parameter values are shown in log₁₀ scale. All parameters have unit sec⁻¹.

Table 10:

Spatial gene expression example. Breakdown of CPU time spent in the main components of ADAMH-FSP-Krylov.

The close agreement between the first and second stages of the ADAMH algorithm suggests that the reduced model constructed by ADAMH can provide a reliable substitute for the full model. Upon finishing the ADAMH chain, we run another chain with 10,000 iterations using only the reduced-model-based evaluations, where the reduced model is the final model output from the ADAMH-FSP-Krylov run. We observe that the marginal posterior distributions sampled from this chain are not markedly different from the results of the other two chains (see Fig. 4 for a representative example).

From the posterior samples of the ADAMH chain, we estimate that an average full FSP evaluation would take 1.31 seconds, while an average reduced model evaluation takes 0.30 seconds, leading to an average speedup (in terms of total CPU time) of approximately 77.35%. The comparative runtimes shown in Table 8 confirm this estimate, with the AMH-ROM taking about 76.58% less time than the AMH-FSP chain. The speedup of the ADAMH-FSP-Krylov was approximately 45.91%.

Genetic toggle switch

The final model we consider in our numerical tests is the nonlinear genetic toggle switch⁵⁵ with the propensity functions listed in Table 11. We use the same parameters as those in Fox and Munsky.⁵⁶ Using stochastic simulations and the ‘true’ parameters given in Table 12, we generate data at 2, 6 and 8 hours, each with 500 single-cell samples. To build the reduced bases for the FSP reduction, we use the union of ten equally-spaced points between zero and 8 hrs and the time points of observations. The prior distribution in our test was chosen as the log-uniform distribution on a rectangle, whose bounds are given in Table 13. The full FSP size is set as the rectangle {0,…, 100} × {0,…, 100}, corresponding to 10,201 states.

Table 11:

Reaction channels of the genetic toggle switch example. We assume that the time unit is seconds (sec). Hence, the unit of the parameters k_0X, k_1X, γ_X, k_0Y, k_1Y, γ_Y is sec⁻¹. The other parameters (dimensionless) are fixed at a_yx = 2.6 × 10⁻³, a_xy = 6.1 × 10⁻³, n_yx = 3, n_xy = 2.1.

Table 12:

Genetic toggle switch. True parameter values and the average values of the parameters visited by the ADAMH-FSP-Krylov and AMH-FSP chains. The “Start” column shows the starting point of both MCMC chains. This starting point is obtained from a numerical optimization procedure that seeks to maximize the full log-likelihood in equation (3). Parameter values are shown in log₁₀ scale. All parameters have unit sec⁻¹.

Table 13:

Genetic toggle switch. Bounds on the support of the log-uniform prior. Parameters have the same unit sec⁻¹.

To find the starting point for the chains, we run five generations of MATLAB’s genetic algorithm with 600 full FSP evaluations. Then, we run another 1000 iterations of fmincon to refine the output of the ga solver. Using the parameter vector output by this combined optimization scheme as the initial sample, we run both the ADAMH-FSP-Krylov and the AMH-FSP for 100,000 iterations.

The efficiency of the ADAMH-FSP-Krylov is confirmed in Table 14, where the delayed acceptance scheme is 37.16% faster than the AMH-FSP algorithm, compared to a maximum potential savings of 59.82% when exclusively using the reduced FSP model.

Table 14:

Genetic toggle switch example. Performance of the Adaptive Delayed Acceptance Metropolis-Hastings with Krylov-based reduced model (ADAMH-FSP-Krylov) vs the Adaptive Metropolis-Hastings with full FSP (AMH-FSP). The total chain length for each algorithm was 100,000. The ADAMH-FSP-Krylov scheme uses markedly fewer full evaluations than the AMH-FSP scheme, and 98.36% of the parameters promoted by the first stage are accepted in the second stage.

Table 15:

Genetic toggle switch example. Breakdown of CPU time spent in the main components of ADAMH-FSP-Krylov.

Similar to the last two examples, we observe a close agreement between the first and second stages of the ADAMH run, where 98.36% of the proposals promoted by the reduced-model-based evaluations are accepted by the full-FSP-based evaluation. This high second-stage acceptance rate is explained by the quality of the reduced model in approximating the log-posterior values (Fig. 5 C). We also ran another chain using the reduced model output by the ADAMH, which yields similar results to the reference chain (Fig. 6) but with reduced computational time (Table 14). The accurate reduced model consists of no more than 634 basis vectors per time subinterval, with all the basis updates occurring during the first tenth of the chain.

Figure 5:

Genetic toggle switch example. (A) CPU time vs number of iterations for a sample run of the ADAMH-FSP-Krylov and the AMH-FSP. (B) Scatterplot of the unnormalized log-posterior evaluated using the full FSP and the reduced model. Notice that the approximate and true values are almost identical, with a correlation coefficient of approximately 0.9993. (C) Distribution of the relative error in the approximate log-posterior evaluations at the parameters accepted by the ADAMH chain.

Figure 6:

Genetic toggle switch example. (A) Estimations of the marginal posterior distribution of the parameter γ_X using the Adaptive Delayed Acceptance Metropolis-Hastings with Krylov reduced model (ADAMH-FSP-Krylov), the Adaptive Metropolis-Hastings with full FSP (AMH-FSP), and the approximate Adaptive Metropolis-Hastings using the reduced order model learned from the ADAMH-FSP-Krylov run (AMH-ROM). The dashed vertical line marks the true parameter value. (B) 2-D projections of parameter combinations accepted by the ADAMH scheme (blue) and parameter combinations used for reduced model construction (red).

From the samples obtained by the ADAMH, we found that Expokit takes 0.42 sec to solve the full FSP model and 0.17 sec to solve the reduced model.

Discussion and concluding remarks

There is a clear need for efficient computational algorithms for the uncertainty analysis of gene expression models. In this work, we proposed and investigated new approaches for Bayesian parameter inference of stochastic gene expression parameters from single-cell data that employ adaptive tuning of proposal distributions in addition to delayed acceptance MCMC and reduced-order modeling. Numerical tests confirm that the reduced model can be used to significantly speed up the sampling process without incurring much loss in accuracy.

A surprising observation from our numerical results is that once trained, the reduced model constructed by the ADAMH-FSP-Krylov closely matches the original FSP sampling results. This suggests that the ADAMH-FSP-Krylov algorithm could be used as a data-driven method to learn reduced representations of the full FSP-based model, which could then be successfully substituted for the full FSP model in subsequent Bayesian updates. In other words, it could be equally accurate but more efficient to cease full FSP evaluations in the ADAMH scheme once confident about the accuracy of the reduced model. In our numerical tests, the ADAMH basis updates were completed within the first 10-20% of the MCMC chain, at which point the remaining chain could have been sampled using only the reduced model. Perhaps other approaches to substitute function approximations into the expensive likelihood evaluations^57,58 could provide additional insights into the reduced order modeling approximations we have used.

While we have achieved a significant reduction in computational time with our implementation of the Krylov subspace projection, other model reduction algorithms may yet improve this performance.⁵⁹ For example, the reduced models discovered here achieved levels of accuracy (i.e., relative errors of 10⁻⁸ or less) that are much higher than one would expect to be necessary to compare models given far less accurate data. In light of this finding and the fact that parameter discrimination can be achieved at different levels of accuracy for different combinations of models and data,⁶⁰ we suspect that it could be advantageous to build less accurate models that can be evaluated in less time.

Our present work assumes the full FSP-based solution can be computed for use to learn the reduced model bases and to evaluate the second stage likelihood in the ADAMH-FSP-Krylov algorithm. For many problems, the required FSP state space can be so large that it would be impossible even to keep the full model in computer memory. Representing the FSP model in a low-rank tensor format²⁰ is a promising approach that we plan to investigate in order to overcome this limitation. Our current work has focused on using reduced models for uncertainty quantification, but the equally important task of finding optimal parameter fits should also benefit from reduced order modeling. For example, techniques from other engineering fields, such as trust-region methods,⁶¹ may provide valuable improvements to infer stochastic models from gene expression data. In time, a wealth of algorithms and insight remains to be gained by adapting computational technology from the broader computational science and engineering communities to analyze stochastic gene expression.

Acknowledgments

Research reported in this publication was supported by the National Institute of General Medical Sciences of the National Institutes of Health under award numbers R25GM105608 and R35GM124747. The work reported here was partially supported by a National Science Foundation grant (DGE-1450032). Any opinions, findings, conclusions or recommendations expressed are those of the authors and do not necessarily reflect the views of the National Science Foundation. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.

Appendix: Mathematical proofs

Preliminaries on adaptive MCMC algorithms

We will derive ergodicity results in the following sections based on Theorem 1 in the paper of Roberts and Rosenthal,⁴⁷ and we will use some proof techniques of Theorem 1 from Cui et al.³¹ for part of our analysis. All random variables we will discuss below will be of the form X: Ω → 𝓧 where 𝓧 is a metric space with the associated Borel σ-algebra B(𝓧).

Let 𝓧 be the parameter space, assumed to have a metric space topology, and π: B(𝓧) → [0,1] the target distribution to be sampled from by an adaptive MCMC algorithm. We will assume that π has a density f: 𝓧 → [0, ∞). Let K_γ denote a transition kernel that depends on an adaptation index γ ∈ 𝓨, and assume that each K_γ has π as an invariant distribution. We assume that for each fixed γ, an MCMC algorithm with K_γ as the Markov transition kernel will eventually converge to π, that is,

lim_n→∞ ∥K_γ^n(x, ·) − π(·)∥_TV = 0 for all x ∈ 𝓧,

where ∥μ − ν∥_TV = sup_B∈B(𝓧) |μ(B) − ν(B)| is the total variation distance between two probability measures on 𝓧.
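For intuition about the total variation distance used throughout this appendix, the sketch below evaluates it for two probability vectors on a small finite set, where the supremum over events reduces to half the ℓ₁ distance. The function names and the example vectors are hypothetical; the brute-force routine enumerates every subset only to check the closed form on this tiny example.

```python
from itertools import combinations


def total_variation(mu, nu):
    """TV distance between two probability vectors on the same finite set."""
    if len(mu) != len(nu):
        raise ValueError("mu and nu must have the same length")
    return 0.5 * sum(abs(m - n) for m, n in zip(mu, nu))


def total_variation_bruteforce(mu, nu):
    """Evaluate sup_B |mu(B) - nu(B)| by enumerating every subset B."""
    indices = range(len(mu))
    best = 0.0
    for size in range(len(mu) + 1):
        for subset in combinations(indices, size):
            gap = abs(sum(mu[i] for i in subset) - sum(nu[i] for i in subset))
            best = max(best, gap)
    return best


mu = [0.5, 0.3, 0.2]
nu = [0.2, 0.3, 0.5]
tv_formula = total_variation(mu, nu)
tv_sup = total_variation_bruteforce(mu, nu)
print(tv_formula, tv_sup)
```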

Let X_n be the random variable representing the state of the adaptive MCMC at iteration n, and let Γ_n be the random variable representing the choice of kernel for updating from X_n to X_n+1. The state of the algorithm is then modeled by the discrete-time stochastic process {(X_n, Γ_n)}, whose transition between steps is determined by the underlying rules of the algorithm. Finally, let 𝓖_n = σ ({X₀,…, X_n, Γ₀,…, Γ_n}) denote the filtration generated by {(X_n, Γ_n)}. Thus, each Γ_n+1 is a 𝓖_n+1 -measurable random variable.

Roberts and Rosenthal proved the following important result, which gives sufficient conditions for ergodicity of an adaptive MCMC.

Theorem 0.1

(Theorem 1 in Roberts and Rosenthal⁴⁷). Consider an adaptive MCMC algorithm with state space 𝓧 and adaptation index 𝓨, with transition kernels K_γ, γ ∈ 𝓨. The algorithm is ergodic if the following conditions hold

(Simultaneous uniform ergodicity) For every ε > 0, there exists N = N(ε) such that ∥K_γ^n(x, ·) − π(·)∥_TV ≤ ε for every x ∈ 𝓧, γ ∈ 𝓨, and n > N.
(Diminishing adaptation) lim_n→∞ D_n = 0 in probability, where D_n = sup_x∈𝓧 ∥K_{Γ_n+1}(x, ·) − K_{Γ_n}(x, ·)∥_TV is a 𝓖_n+1-measurable random variable.

We immediately get a useful corollary.

Corollary 0.2.

Consider an adaptive MCMC with state space 𝓧 and transition kernels K_γ, γ ∈ 𝓨 that are ergodic w.r.t π. Assume that the following conditions are satisfied:

The algorithm satisfies the diminishing adaptation condition.
𝓧 is a compact metric space.
𝓨 = 𝓨₁ ∪ … ∪ 𝓨_m, where each 𝓨_j is a compact metric space.
For each n = 1, 2,…, and on each set 𝓧 × 𝓨_j with the product metric space topology, the mapping (x, γ) ↦ ∥K_γ^n(x, ·) − π(·)∥_TV is continuous.

Then, the adaptive MCMC algorithm is ergodic.

Proof.

Our proof is a modification of the proof of Corollary 3 in Roberts and Rosenthal.⁴⁷ Fix a number ε > 0 and an index j ∈ {1,…, m}. For each n, let W_n be the set of all (x, γ) ∈ 𝓧 × 𝓨_j such that

∥K_γ^n(x, ·) − π(·)∥_TV < ε.

Since each kernel is ergodic, for every (x, γ) ∈ 𝓧 × 𝓨_j there exists some n such that (x, γ) ∈ W_n and that ∥K_γ^n′(x, ·) − π(·)∥_TV < ε for all n′ > n. We thus have

𝓧 × 𝓨_j = ∪_n W_n.

Due to continuity, each W_n is an open set. By compactness, there exists a finite subcover W_{n₁},…, W_{n_{r_j}} for 𝓧 × 𝓨_j. Choose N_j(ε) to be the maximum of all n₁,…,n_{r_j}. Then, choosing N(ε) = N₁(ε) + … + N_m(ε), we have ∥K_γ^n(x, ·) − π(·)∥_TV < ε for all n > N(ε) and (x, γ) ∈ 𝓧 × 𝓨. Thus, simultaneous uniform ergodicity is satisfied. Combining with diminishing adaptation, the preceding theorem shows that the algorithm is ergodic. □

Convergence of adaptive DAMH with diminishing model adaptations

In this section, we analyze the convergence of an adaptive variant of the DAMH. As seen in the pseudocode of Algorithm 4, this variant modifies the approximation and the proposal density at every step, using the samples accepted so far on the chain. The update of the approximate model occurs randomly, with the update probability at step n pre-specified as a(n).

Proposition 0.3.

Consider an adaptive delayed acceptance Metropolis-Hastings algorithm with the target distribution supported on a state space 𝓧, proposal adaptation space 𝓨, and approximation space 𝓩. Let f be the density of the target distribution π with respect to a finite reference measure λ, that is, π(dx) = f(x)λ(dx). Let {f_φ, φ ∈ 𝓩} be the family of approximations to f. Let q_γ be the first-step proposal densities. The algorithm is ergodic under the following conditions:

(i) 𝓧, 𝓨 are compact metric spaces, and 𝓩 = 𝓩₁ ∪ … ∪ 𝓩_m, where each 𝓩_j is a compact metric space.
(ii) For each fixed γ, φ, the transition kernel K_γ,φ is ergodic.
(iii) λ{x} = 0 for all x ∈ 𝓧.
(iv) The mapping (x, y, γ) ↦ q_γ(x, y) is continuous and uniformly bounded on 𝓧 × 𝓧 × 𝓨, which is a compact metric space equipped with the product space metric.
(v) For each y ∈ 𝓨, the mapping is continuous on each 𝓧 × 𝓩_j.
(vi) Diminishing adaptation: The chain (Γ_n, Φ_n) satisfies lim_n→∞ sup_x∈𝓧 ∥K_{Γ_n+1,Φ_n+1}(x, ·) − K_{Γ_n,Φ_n}(x, ·)∥_TV = 0 in probability.

Algorithm 4

Adaptive Delayed Acceptance MH with probabilistic approximation adaptation

Proof.

The ADAMH could be viewed as an adaptive MCMC algorithm with state space 𝓧 and adaptation space 𝓨 × 𝓩. In order to apply corollary 0.2, we will prove that for any fixed n = 1, 2,…, and fixed j = 1,…, m, the mapping (x, γ, φ) ↦ ∥K_γ,φ^n(x, ·) − π(·)∥_TV is continuous on 𝓧 × 𝓨 × 𝓩_j. In order to do so, we proceed as in the proof of theorem 1 in Cui et al.³¹ Fix (x, γ, φ) ∈ 𝓧 × 𝓨 × 𝓩_j and write f_φ for the approximation to f indexed by φ. The transition kernel for the DAMH associated with (x, γ, φ) is

K_γ,φ(x, dz) = q_γ(x, z) α_γ,φ(x, z) β_γ,φ(x, z) λ(dz) + (1 − ρ_γ,φ(x)) δ_x(dz),

where α_γ,φ(x, z) = min{1, [q_γ(z, x) f_φ(z)] / [q_γ(x, z) f_φ(x)]} is the first step acceptance probability, β_γ,φ(x, z) = min{1, [q_γ(z, x) α_γ,φ(z, x) f(z)] / [q_γ(x, z) α_γ,φ(x, z) f(x)]} is the second step acceptance probability, and ρ_γ,φ(x) = ∫ q_γ(x, z) α_γ,φ(x, z) β_γ,φ(x, z) λ(dz) is the overall probability for a proposal to be accepted.

Fix the value of z, then due to conditions (iv) and (v), g(x, z, γ, φ) = q_γ(x, z) α_γ,φ(x, z) β_γ,φ(x, z) is jointly continuous in (x, γ, φ) ∈ 𝓧 × 𝓨 × 𝓩_j. Furthermore, condition (iv) implies that the function z ↦ g(x, z, γ, φ) is uniformly bounded for (x, γ, φ) ∈ 𝓧 × 𝓨 × 𝓩_j. By the bounded convergence theorem, ρ_γ,φ(x) is jointly continuous in the three variables x, γ, φ.

By induction, we can show that the n-step transition kernel has the form

K_γ,φ^n(x, dz) = g_n(x, z, γ, φ) λ(dz) + (1 − ρ_γ,φ(x))^n δ_x(dz),

where g_n is an appropriate function that is jointly continuous in x, γ and φ. From condition (iii), δ_x and π are orthogonal measures. Therefore,

2 ∥K_γ,φ^n(x, ·) − π(·)∥_TV = (1 − ρ_γ,φ(x))^n + ∫ |g_n(x, z, γ, φ) − f(z)| λ(dz).

The integral on the right hand side is jointly continuous in x, γ, φ due to the bounded convergence theorem. This shows that ∥K_γ,φ^n(x, ·) − π(·)∥_TV is continuous in the variable (x, γ, φ) ∈ 𝓧 × 𝓨 × 𝓩_j. From this, conditions (i), (vi) and corollary 0.2 combined show that the algorithm is ergodic. □

Proposition 0.4.

Assume the ADAMH with probabilistic model adaptation satisfies conditions (i)-(v) in proposition 0.3. Assume further that the proposal is symmetric, that the approximate posterior adaptation probability a(n) → 0 as n → ∞, and that d_𝓨(Γ_n+1, Γ_n) → 0 in probability (here d_𝓨 denotes the metric on 𝓨). Then, the algorithm satisfies diminishing adaptation.

Proof.

All conditions for ergodicity in proposition 0.3 are satisfied, except for the diminishing adaptation that we will verify. Fix a value of n. Consider a fixed set of values of adaptivity parameters of the ADAMH chain up to iteration n.

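Fix an event A ∈ B(𝓧) and x ∈ 𝓧. We have

|K_{γ_n+1,φ_n+1}(x, A) − K_{γ_n,φ_n}(x, A)| ≤ D₁ + D₂,

where D₁ = |K_{γ_n+1,φ_n+1}(x, A) − K_{γ_n,φ_n+1}(x, A)| and D₂ = |K_{γ_n,φ_n+1}(x, A) − K_{γ_n,φ_n}(x, A)|.

We bound each term separately. First of all, we have D₂ = 0 if φ_n = φ_n+1 and D₂ ≤ K_{γ_n,φ_n+1}(x, A) + K_{γ_n,φ_n}(x, A) ≤ 2 if φ_n ≠ φ_n+1, with the latter event taking place with probability less than a(n).

Due to the symmetry of the proposal, the first and second step acceptance probabilities do not depend on the choice of γ. This and the uniform continuity of q_γ(x, y) give us D₁ ≤ C · d_𝓨(γ_n, γ_n+1), where C > 0 is independent of x, y, γ and φ.

Combining the bounds on D₁ and D₂ we get |K_{γ_n+1,φ_n+1}(x, A) − K_{γ_n,φ_n}(x, A)| ≤ C · d_𝓨(γ_n, γ_n+1) + 2χ(φ_n ≠ φ_n+1), where χ(E) = 1 if E is true and 0 otherwise. Taking the supremum over all x and A we get

D_n ≤ C · d_𝓨(Γ_n, Γ_n+1) + 2χ(Φ_n ≠ Φ_n+1).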

Fix a scalar ε > 0. The set of runs where D_n < ε includes sample chains where both events φ_n = φ_n+1 and C · d_𝓨(γ_n, γ_n+1) < ε hold. Therefore, the event [D_n > ε] is a subset of the event [C · d_𝓨(Γ_n, Γ_n+1) ≥ ε] ∪ [Φ_n ≠ Φ_n+1]. We therefore have

P(D_n > ε) ≤ P(C · d_𝓨(Γ_n, Γ_n+1) ≥ ε) + P(Φ_n ≠ Φ_n+1) ≤ P(C · d_𝓨(Γ_n, Γ_n+1) ≥ ε) + a(n).

The last right hand side of the inequality above converges to 0 as n → ∞. Therefore, D_n converges to 0 in probability. The diminishing adaptation condition is satisfied and the algorithm is ergodic. □

Regularity of the ROM-based likelihood approximation

Let 𝓢_j be the set of all n × j matrices Q such that Q^TQ = I_j×j. It is known that 𝓢_j with the metric defined by the induced matrix 2-norm is a compact metric space (indeed, it is the inverse image of I_j×j via the continuous mapping A ↦ A^TA). Let m_max be the maximum dimension allowed in the reduced basis and let Φ be a particular basis set constructed during a run of the ADAMH chain, then there exists a tuple (j₁,…, j_{n_B}) with 1 ≤ j_k ≤ m_max such that

Thus, the set of all possible choices of reduced basis set Φ is the finite union of all S_j with j bounded elementwise by m_max. Note that each S_j is a compact metric space with the product space topology. Thus, we can apply the theory developed in the previous section to show that the ADAMH-FSP-Krylov is ergodic. The following propositions concern the continuity in the change of the reduced-order approximations with respect to the change in basis.
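The defining property of the spaces 𝓢_j, namely Q^TQ = I for a matrix Q with orthonormal columns, can be checked numerically on a toy example. The sketch below uses modified Gram-Schmidt as an illustrative stand-in for the Krylov-based basis construction used in the paper; the function names, the tolerance and the example vectors are all hypothetical.

```python
import math


def orthonormalize(columns, tol=1e-12):
    """Modified Gram-Schmidt: orthonormal vectors spanning the input columns."""
    basis = []
    for col in columns:
        v = list(col)
        for q in basis:
            coeff = sum(qi * vi for qi, vi in zip(q, v))
            v = [vi - coeff * qi for qi, vi in zip(q, v)]
        norm = math.sqrt(sum(vi * vi for vi in v))
        if norm > tol:  # skip (numerically) linearly dependent columns
            basis.append([vi / norm for vi in v])
    return basis


def gram_matrix(basis):
    """Matrix of inner products, which equals Q^T Q for the basis Q."""
    return [[sum(a * b for a, b in zip(p, q)) for q in basis] for p in basis]


# Three vectors in R^4; the third is the sum of the first two.
columns = [[1.0, 1.0, 0.0, 0.0], [1.0, 0.0, 1.0, 0.0], [2.0, 1.0, 1.0, 0.0]]
Q = orthonormalize(columns)
G = gram_matrix(Q)
print(len(Q), G)
```

Here the dependent third vector is discarded, so the resulting basis has two columns and its Gram matrix is the 2 × 2 identity up to rounding.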

Proposition 0.5.

Fix a space S_j as above, and let Φ and Ψ be elements of this space. For every fixed θ ∈ Θ, the approximation to the FSP log-likelihood defined in eq. (16) and evaluated with the basis Ψ converges to its value with the basis Φ as Ψ → Φ in S_j.

Proof.

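From eq. (7), it is clear that the mapping Φ ↦ p_Φ(t_k) is continuous on S_j for all time points t_k. The approximate log-likelihood is a composition of the continuous mappings Φ ↦ p_Φ(t_k) with the continuous mapping from these approximate distributions to the log-likelihood value, and is therefore continuous. □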

Ergodicity of the ADAMH-FSP-Krylov algorithm

Proposition 0.6.

The ADAMH-FSP-Krylov algorithm is ergodic.

Proof.

We apply proposition 0.3 with 𝓧 = Θ. The proposal densities of the first step are Gaussian with γ being the modified empirical covariance matrix as in the adaptive Metropolis Algorithm.²⁷ Similar to the proof of Theorem 1 in Haario et al.,²⁷ we can take 𝓨 to be a closed, bounded subset of the set of positive definite matrices. The reduced model space is 𝓩 = ∪_j S_j the finite union of the compact spaces S_j with j ≤ m_max pointwise. These spaces satisfy condition (i), and the proposal density satisfies condition (iv).

The posterior density is and the approximate posterior densities are where these are the densities of the true and approximate posterior distributions with respect to the Lebesgue measure. From Theorem 1 in Christen and Fox,²⁸ condition (ii) is satisfied.

Condition (v) is then satisfied using proposition 0.5.

Since the empirical covariances are computed from values in a bounded set, the modification to the empirical covariance matrix γ at step n is O(1/n), so changes in Γ_n converge to 0 (see Haario et al.²⁷). Thus, the conditions in proposition 0.4 are satisfied. The algorithm therefore satisfies all sufficient conditions for ergodicity outlined in proposition 0.3. □
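The O(1/n) decay of the adaptation step can be illustrated in one dimension, where the empirical covariance reduces to a running variance. The sketch below is an assumption-laden toy: it uses independent uniform draws on a bounded interval instead of an actual MCMC chain, omits the regularization term of the adaptive Metropolis covariance, and the helper names are hypothetical.

```python
import random


def running_variance_path(values):
    """Return the sequence of empirical variances after each new value."""
    mean, m2, path = 0.0, 0.0, []
    for n, v in enumerate(values, start=1):
        delta = v - mean
        mean += delta / n         # Welford update of the running mean
        m2 += delta * (v - mean)  # running sum of squared deviations
        path.append(m2 / n)       # population variance of the first n values
    return path


def largest_change(path, start, stop):
    """Largest one-step change of the variance for indices in [start, stop)."""
    return max(abs(path[k + 1] - path[k]) for k in range(start, stop))


rng = random.Random(1)
values = [rng.uniform(-1.0, 1.0) for _ in range(20000)]  # bounded "chain"
path = running_variance_path(values)

early = largest_change(path, 10, 100)
late = largest_change(path, 10000, 19999)
print(f"largest early change: {early:.2e}, largest late change: {late:.2e}")
```

For values confined to [−1, 1], each one-step change of the variance after n samples is bounded by 4/(n + 1), which is the one-dimensional analogue of the diminishing change in Γ_n invoked in the proof.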

Footnotes

E-mail: Huy.Vo{at}colostate.edu; Munsky{at}colostate.edu

References

(1).↵
McAdams, H. H.; Arkin, A. Proc. Natl. Acad. Sci. U. S. A. 1997, 94, 814–819.
OpenUrl Abstract/FREE Full Text
(2).↵
Elowitz, M. B.; Levine, A. J.; Siggia, E. D.; Swain, P. S. Science 2002, 297, 1183–1186.
OpenUrl Abstract/FREE Full Text
(3).↵
Kaern, M.; Elston, T. C.; Blake, W. J.; Collins, J. J. Nature Rev. Genet. 2005, 6, 451–464.
OpenUrl CrossRef PubMed Web of Science
(4).↵
Gillespie, D. T. Physica A 1992, 188, 404–425.
OpenUrl CrossRef Web of Science
(5).↵
Neuert, G.; Munsky, B.; Tan, R. Z.; Teytelman, L.; Khammash, M.; Oudenaarden, A. V. Science 2013, 339, 584–587.
OpenUrl Abstract/FREE Full Text
(6).
Shepherd, D. P.; Li, N.; Micheva-Viteva, S. N.; Munsky, B.; Hong-Geller, E.; Werner, J. H. Anal. Chem. 2013, 85, 4938–4943.
OpenUrl CrossRef
(7).
Munsky, B.; Fox, Z.; Neuert, G. Methods 2015, 85, 12–21.
OpenUrl CrossRef
(8).↵
Gómez-Schiavon, M.; Chen, L.; West, A. E.; Buchler, N. E. Genome Biol. 2017, 18, 164.
OpenUrl
(9).↵
Munsky, B.; Khammash, M. J. Chem. Phys. 2006, 124, 044104.
OpenUrl CrossRef PubMed
(10).↵
Munsky, B.; Li, G.; Fox, Z. R.; Shepherd, D. P.; Neuert, G. PNAS 2018,
(11).↵
Peherstorfer, B.; Willcox, K.; Gunzburger, M. SIAM Review 2018, 60, 550–591.
OpenUrl
(12).↵
Asher, M. J.; Croke, B. F. W.; Jakeman, A. J.; Peeters, L. J. M. Water Resour. Res. 2015, 51, 5957–5973.
OpenUrl
(13).↵
Razavi, S.; Tolson, B. A.; Burn, D. H. Water Resour. Res. 2012, 48.
(14).↵
Pinnau, R. Model Order Reduction: Theory, Research Aspects and Applications; Springer Berlin Heidelberg, 2008; Vol. 13; pp 95–109.
OpenUrl
(15).↵
Benner, P.; Gugercin, S.; Willcox, K. SIAM R 2015, 57, 483–531.
OpenUrl
(16).↵
Peleš, S.; Munsky, B.; Khammash, M. J. Chem. Phys. 2006, 125, 1–13.
OpenUrl
(17).↵
Munsky, B.; Khammash, M. IEEE Trans. Aut. Contrl. 2008, 53, 201–214.
OpenUrl
(18).↵
Tapia, J. J.; Faeder, J. R.; Munsky, B. 2012 IEEE 51st IEEE Conf. Decis. Ctrl. (CDC) 2012, 836, 5361–5366.
OpenUrl
(19).↵
Vo, H. D.; Sidje, R. B. Solving the chemical master equation with aggregation and Krylov approximations. Proceedings of IEEE 55th Conference on Decision and Control. 2016; pp 7093–7098.
(20).↵
Kazeev, V.; Khammash, M.; Nip, M.; Schwab, C. PLoS Comput. Biol. 2014, 10.
(21).
Dolgov, S.; Khoromskij, B. N. Numer. Linear Algebra Appl. 2013, 22, 197–219.
OpenUrl
(22).↵
Vo, H. D.; Sidje, R. B. J. Chem. Phys. 2017, 147.
(23).↵
Dayar, T.; Orhan, M. C. Numer. Linear Algebra Appl. 2018, 25, e2158, e2158 nla.2158.
OpenUrl
(24).↵
Waldherr, S.; Haasdonk, B. BMC systems biology 2012, 6, 81.
OpenUrl
(25).↵
Liao, S.; Vejchodský, T.; Erban, R. J. R. Soc. Interface 2015, 12, 20150233.
OpenUrl CrossRef PubMed
(26).↵
Oseledets, I. V. SIAM J. Sci. Comput. 2011, 33, 2295–2317.
OpenUrl CrossRef
(27).↵
Haario, H.; Saksman, E.; Tamminen, J. Bernoulli 2001, 7, 223.
OpenUrl CrossRef Web of Science
(28).↵
Christen, J. A.; Fox, C. J. Comput. Graph. Stat. 2005, 14, 795–810.
OpenUrl
(29).↵
Efendiev, Y.; Hou, T.; Luo, W. SIAM J. Sci. Comput. 2006, 28, 776–803.
OpenUrl CrossRef
(30).↵
Cui, T.; Fox, C.; O’Sullivan, M. J. Water Resources Research 2011, 47.
(31).↵
Cui, T.; Fox, C.; O’Sullivan, M. Adaptive Error Modelling MCMC sampling for Large Scale Inverse Problems; 2012.
(32).↵
Golightly, A.; Henderson, D. A.; Sherlock, C. Statistics and Computing 2015, 25, 1039–1055.
OpenUrl
(33).↵
Saad, Y. SIAM J. Numer. Anal. 1992, 29, 209–228.
OpenUrl CrossRef Web of Science
(34).↵
Sidje, R. B. ACM Trans. Math. Softw. 1998, 24, 130–156.
OpenUrl CrossRef Web of Science
(35).↵
1. Langville, A.,
2. Stewart, W.
Burrage, K.; Hegland, M.; MacNamara, S.; Sidje, R. B. In 150^th Markov Anniversary Meeting, Charleston, SC, USA; Langville, A., Stewart, W., Eds.; Boson Books, 2006; pp 21–38.
(36).↵
Sidje, R. B.; Vo, H. D. Math. Biosci. 2015, 269, 10–16.
OpenUrl
(37).
Gauckler, L.; Yserentant, H. ESAIM. Math. Model. 2014, 48, 1757–1775.
(38).
Femino, A. M.; Fay, F. S.; Fogarty, K.; Singer, R. H. Science 1998, 280, 585–590.
(39).
Raj, A.; van Oudenaarden, A. Cell 2008, 135, 216–226.
(40).
Chaturantabut, S.; Sorensen, D. C. SIAM J. Sci. Comput. 2010, 32, 2737–2764.
(41).
Metropolis, N.; Rosenbluth, A. W.; Rosenbluth, M. N.; Teller, A. H.; Teller, E. J. Chem. Phys. 1953, 21, 1087–1092.
(42).
Hastings, W. K. Biometrika 1970, 57, 97–109.
(43).
Roberts, G. O.; Rosenthal, J. S. Probability Surveys 2004, 1, 20–71.
(44).
Roberts, G. O.; Gelman, A.; Gilks, W. R. Annal. Appl. Prob. 1997, 7, 110–120.
(45).
Cui, T.; Marzouk, Y. M.; Willcox, K. E. Int. J. Numer. Meth. Engnr. 2015, 102, 966–990.
(46).
Golub, G.; Van Loan, C. Matrix Computations, 4th ed.; Johns Hopkins University Press, 2012.
(47).
Roberts, G.; Rosenthal, J. S. J. Appl. Probability 2007, 44, 458–475.
(48).
Vats, D.; Flegal, J. M.; Jones, G. L. Multivariate Output Analysis for Markov Chain Monte Carlo; 2017.
(49).
Acerbi, L. multiESS. https://github.com/lacerbi/multiESS, 2018.
(50).
Gillespie, D. T. J. Phys. Chem. 1977, 81, 2340–2361.
(51).
Munsky, B.; Neuert, G.; van Oudenaarden, A. Science 2012, 336, 183–187.
(52).
Peccoud, J.; Ycart, B. Theoretical Pop. Biol. 1995, 48, 222–234.
(53).
Golding, I.; Paulsson, J.; Zawilski, S. M.; Cox, E. C. Cell 2005, 123, 1025–1036.
(54).
Iyer-Biswas, S.; Hayot, F.; Jayaprakash, C. Phys. Rev. E 2009, 79, 031911.
(55).
Gardner, T.; Cantor, C.; Collins, J. Nature 2000, 403, 339–342.
(56).
Fox, Z. R.; Munsky, B. bioRxiv 2018,
(57).
Conrad, P. R.; Marzouk, Y. M.; Pillai, N. S.; Smith, A. J. Amer. Stat. Assoc. 2016, 111, 1591–1607.
(58).
Conrad, P.; Davis, A.; Marzouk, Y.; Pillai, N.; Smith, A. SIAM/ASA J. Uncertainty Quantification 2018, 6, 339–373.
(59).
Benner, P.; Cohen, A.; Ohlberger, M.; Willcox, K., Eds. Model reduction and approximation: Theory and algorithms; SIAM, 2017.
(60).
Fox, Z.; Neuert, G.; Munsky, B. J. Chem. Phys. 2016, 145.
(61).
Qian, E.; Grepl, M.; Veroy, K.; Willcox, K. SIAM J. Sci. Comput. 2017,

Posted November 11, 2018.

[24] (24).↵
Waldherr, S.; Haasdonk, B. BMC systems biology 2012, 6, 81.
OpenUrl

[25] (25).↵
Liao, S.; Vejchodský, T.; Erban, R. J. R. Soc. Interface 2015, 12, 20150233.
OpenUrl CrossRef PubMed

[26] (26).↵
Oseledets, I. V. SIAM J. Sci. Comput. 2011, 33, 2295–2317.
OpenUrl CrossRef

[27] (27).↵
Haario, H.; Saksman, E.; Tamminen, J. Bernoulli 2001, 7, 223.
OpenUrl CrossRef Web of Science

[28] (28).↵
Christen, J. A.; Fox, C. J. Comput. Graph. Stat. 2005, 14, 795–810.
OpenUrl

[29] (29).↵
Efendiev, Y.; Hou, T.; Luo, W. SIAM J. Sci. Comput. 2006, 28, 776–803.
OpenUrl CrossRef

[30] (30).↵
Cui, T.; Fox, C.; O’Sullivan, M. J. Water Resources Research 2011, 47.

[31] (31).↵
Cui, T.; Fox, C.; O’Sullivan, M. Adaptive Error Modelling MCMC sampling for Large Scale Inverse Problems; 2012.

[32] (32).↵
Golightly, A.; Henderson, D. A.; Sherlock, C. Statistics and Computing 2015, 25, 1039–1055.
OpenUrl

[33] (33).↵
Saad, Y. SIAM J. Numer. Anal. 1992, 29, 209–228.
OpenUrl CrossRef Web of Science

[34] (34).↵
Sidje, R. B. ACM Trans. Math. Softw. 1998, 24, 130–156.
OpenUrl CrossRef Web of Science

[35] (35).↵
Langville, A.,
Stewart, W.
Burrage, K.; Hegland, M.; MacNamara, S.; Sidje, R. B. In 150^th Markov Anniversary Meeting, Charleston, SC, USA; Langville, A., Stewart, W., Eds.; Boson Books, 2006; pp 21–38.

[36] Langville, A.,

[37] Stewart, W.

[38] (36).↵
Sidje, R. B.; Vo, H. D. Math. Biosci. 2015, 269, 10–16.
OpenUrl

[39] (37).↵
Gauckler, L.; Yserentant, H. ESAIM. Math. Model. 2014, 48, 1757–1775.
OpenUrl

[40] (38).↵
Femino, A. M.; Fay, F. S.; Fogarty, K.; Singer, R. H. Science 1998, 280, 585–590.
OpenUrl Abstract/FREE Full Text

[41] (39).↵
Raj, A.; van Oudenaarden, A. Cell 2008, 135, 216–226.
OpenUrl CrossRef PubMed Web of Science

[42] (40).↵
Chaturantabut, S.; Sorensen, D. C. SIAM J. Sci. Comput. 2010, 32, 2737–2764.
OpenUrl

[43] (41).↵
Metropolis, N.; Rosenbluth, A. W.; Rosenbluth, M. N.; Teller, A. H.; Teller, E. J. Chem. Phys. 1953, 21, 1087–1092.
OpenUrl CrossRef Web of Science

[44] (42).↵
Hastings, W. K. Biometrika 1970, 57, 97–109.
OpenUrl CrossRef Web of Science

[45] (43).↵
Roberts, G. O.; Rosenthal, J. S. Probability Surveys 2004, 1, 20–71.
OpenUrl

[46] (44).↵
Roberts, G. O.; Gelman, A.; Gilks, W. R. Annal. Appl. Prob. 1997, 7, 110–120.
OpenUrl

[47] (45).↵
Cui, T.; Marzouk, Y. M.; Willcox, K. E. Int. J. Numer. Meth. Engnr. 2015, 102, 966–990.
OpenUrl

[48] (46).↵
Golub, G.; Van Loan, C. Matrix Computations, 4th ed.; John Hopkins University Press, 2012.

[49] (47).↵
Roberts, G.; Rosenthal, J. S. J. Appl. Probability 2007, 44, 458–475.
OpenUrl CrossRef Web of Science

[50] (48).↵
Vats, D.; Flegal, J. M.; Jones, G. L. Multivariate Output Analysis for Markov Chain Monte Carlo; 2017.

[51] (49).↵
Acerbi, L. multiESS. https://https://github.com/lacerbi/multiESS, 2018.

[52] (50).↵
Gillespie, D. T. J. Phys. Chem. 1977, 81, 2340–2361.
OpenUrl CrossRef Web of Science

[53] (51).↵
Munsky, B.; Neuert, G.; van Oudenaarden, A. Science 2012, 336, 183–187.
OpenUrl Abstract/FREE Full Text

[54] (52).
Peccoud, J.; Ycart, B. Theoretical Pop. Biol. 1995, 48, 222–234.
OpenUrl

[55] (53).
Golding, I.; Paulsson, J.; Zawilski, S. M.; Cox, E. C. Cell 2005, 123, 1025–1036.
OpenUrl CrossRef PubMed Web of Science

[56] (54).↵
Iyer-Biswas, S.; Hayot, F.; Jayaprakash, C. Phys. Rev. E 2009, 79, 031911.
OpenUrl

[57] (55).↵
Gardner, T.; Cantor, C.; Collins, J. Nature 2000, 403, 339–342.
OpenUrl CrossRef PubMed Web of Science

[58] (56).↵
Fox, Z. R.; Munsky, B. bioRxiv 2018,

[59] (57).↵
Conrad, P. R.; Marzouk, Y. M.; Pillai, N. S.; Smith, A. J. Amer. Stat. Assoc. 2016, 111, 1591–1607.
OpenUrl

[60] (58).↵
Conrad, P.; Davis, A.; Marzouk, Y.; Pillai, N.; Smith, A. SIAM/ASA J. Uncertainty Quantification 2018, 6, 339–373.
OpenUrl

[61] (59).↵
Benner, P.; Cohen, A.; Ohlberger, M.; Wilcox, K. e. Model reduction and approximation: Theory and algorithms; SIAM, 2017.

[62] (60).↵
Fox, Z.; Neuert, G.; Munsky, B. J. Chem. Phys. 2016, 145.

[63] (61).↵
Qian, E.; Grepl, M.; Veroy, K.; Willcox, K. SIAM J. Sci. Comput. 2017,
