Abstract
Biological organisms acclimatize to varying environmental conditions via active self-regulation of internal gene regulatory networks, metabolic networks, and protein signaling networks. While much work has been done to elucidate the topologies of individual networks in isolation, understanding of inter-network regulatory mechanisms remains limited. This shortcoming is of particular relevance to synthetic biology. Synthetic biological circuits tend to lose their engineered functionality over generational time, primarily due to the deleterious stress that they exert on their host organisms. To reduce this stress (and thus minimize loss of functionality) synthetic circuits must be sensitive to the health of the host organism. Development of integrated regulatory systems is therefore essential to robust synthetic biological systems. The aim of this study was to develop integrated gene-regulatory and metabolic networks which self-optimize in response to varying environmental conditions. We performed in silico evolution to develop such networks using a two-step approach: (1) We optimized metabolic networks given a constrained amount of available enzyme. Here, we found that a proportional relationship between flux control coefficients and enzyme mass holds in all linear sub-networks of branched networks, except those sub-networks which contain allosteric regulators. Network optimization was performed by iteratively redistributing enzyme until flux through the network was maximized. Optimization was performed for a range of boundary metabolite conditions to develop a profile of optimal enzyme distributions as a function of environmental conditions. (2) We generated and evolved randomized gene regulatory networks to modulate the enzymes of a target metabolic pathway. The objective of the gene regulatory networks was to produce the optimal distribution of metabolic network enzymes given specific boundary metabolite conditions of the target network. Competitive evolutionary algorithms were applied to optimize the specific structures and kinetic parameters of the gene regulatory networks. With this method, we demonstrate the possibility of algorithmic development of integrated adaptive gene and metabolic regulatory networks which dynamically self-optimize in response to changing environmental conditions.
Introduction
In synthetic biology living organisms have been engineered to perform novel functions that do not exist in nature. Some synthetic constructs include tunable oscillators [1], event counters [2], concentration detectors [3], edge detectors [4], linear amplifiers [5], and toggle-switches [6]. Synthetic genetic circuits, however, tend to lose their engineered functionality over time, due in large part to their deleterious effects on host fitness [7, 8]. Wild-type cells and mutant synthetic cells often out-compete their designed synthetic counter-parts. One possible method to improve this fitness stability is to optimize the expression of synthetic gene products to reduce the burden on the host cell. For this, the interaction between the synthetic circuits and host metabolic networks must be considered.
Most micro-organisms live in continuously changing environments, which makes maintaining fitness even more difficult. An example of this are the phototrophic micro-organisms cyanobacteria that are subject to the daily fluctuations in environmental light intensity. Recently, cyanobacteria have been engineered to convert carbon dioxide into biofuels with designed synthetic pathways [9, 10]. For efficient biofuel production, the synthetic pathways must be adaptive to this varying environmental condition. The adaptive responses, controlled by gene regulatory networks (GRNs), allow the host to quickly shift from one gene expression profile to another as needed. For instance, upon a shift from light to dark, the cyanobacterium Synechococcus sp. PCC 7002 responds by decreasing and increasing the transcription levels of genes essential to photosynthesis and carbohydrate degradation [11]. To understand the response mechanisms of such adaptive systems, it is important to adopt an integrative perspective. The nature of GRNs, protein signaling networks, and metabolic networks must be appreciated as an interconnected whole. In this study we adopt this perspective by considering GRNs within an integrated metabolic context.
The linkage between GRNs and their target metabolic networks are catalytic enzymes. Accurate expression of these catalytic enzymes for a wide range of environmental conditions gives robustness to the micro-organism. The concentrations of these enzymes are thought to be optimized for efficient cellular functions [12, 13]. Optimization is required since protein synthesis and degradation are energetically costly and resource-limited. The availability of amino acids, within the finite volume of a cell constrains the total enzyme concentration within the cell [14]. Additionally, the production of certain osmotically-active enzymes must be constrained to preserve the osmotic balance of the cell [15, 16].
Extensive research has been performed to examine the effect of enzyme mass constraints on metabolic fluxes. Several different optimization conditions have been considered in those studies: Metabolic fluxes were maximized for a fixed total enzyme concentration [17, 18]; intermediate concentrations and total osmolarity were minimized [16]; net enzyme concentrations were minimized [12, 13] or maintained below a certain level [14]. The studies which sought minimal total enzyme concentration as an optimization condition [12,13] show clear relationships between enzyme concentrations and flux control coefficients, namely, that flux control coefficient of any enzyme is proportional to that enzyme’s concentration in the optimized state. It was shown, however, that this relationship is valid for linear pathways but not for branched pathways. Consequently a new relationship was derived for general network topologies [13].
We aimed to design GRNs to reduce the burden a synthetic circuit exerts on the host cell, and to make synthetic systems robust to a wide range of environmental conditions. To achieve this aim, we considered the interaction between GRNs and metabolic networks, and investigated the structural properties of designed GRNs. These GRNs were designed to regulate the expression of catalytic enzymes such that metabolic fluxes remained maximized for various boundary metabolite concentrations. By taking an in silico evolutionary approach we obtained integrated networks composed of a GRN and a metabolic network which are robust and self-optimizing (Figure 1). We found that the integrated networks have interesting topological properties: the GRN typically shows shallow regulation and feed-forward structures, and boundary metabolites alone can be sufficient for regulation of the GRN.
Moreover, our numerical study discovered a new relationship similar to the one found in the linear pathway study [12, 13] which holds for general networks that include both linear and branched pathways. This relationship was found to be valid for each branch of a network which meets two conditions: (1) the branch itself is linear and (2) the branch does not include a metabolite which that allosterically regulates another branch. By investigating local linear pathways of a general network, this relationship can be used to determine whether pathway metabolites regulate other parts of the network.
Results
Model systems
A self-optimizing network that maximizes metabolic fluxes under a constraint in the total enzyme mass is composed of two reaction systems: a GRN and a metabolic network (Figure 1). The GRN controls the expression of enzymes that catalyze metabolic reactions and the metabolic network regulates promoter activities in the GRN via allosteric metabolite interactions. The metabolic and gene regulatory networks were described in a deterministic framework which does not consider the stochastic details of gene expression [19]. The networks were modeled with a set of ordinary differential equations with non-linear reaction rates (see Methods for details).
Flux control coefficients reflect enzyme concentrations
Previous studies which considered the total enzyme mass constraint revealed a relationship between flux control coefficients and enzyme concentrations in linear pathways. Brown [12] proposed that flux control coefficients [20–22] become proportional to enzyme concentrations in the optimized state for an n-step linear metabolic pathway: where c is a constant and every enzyme has identical mass. The flux control coefficient describes the sensitivity of the steady-state flux J to a perturbation in the concentration of the i-th enzyme Ei: The control coefficient has been used to quantify the degree of control specific reactions exert on fluxes within a network [23], as well as control of gene expression noise in gene regulatory networks [24, 25].
The above relationship can be extended to consider enzymes of unequal mass as follows: Where Mi = Eimi with mi the molar mass of the i-th enzyme. In the previous study [12], total enzyme mass was minimized for a given metabolic flux. Equation (S14) can also be derived for the complementary case where flux is maximized for a given total enzyme mass (SI, Section S1). For general metabolic networks, Klipp and Heinrich [13] found a similar relationship which can also be expressed with unequal enzyme mass (SI, Section S1): where C represents a matrix of flux control coefficients and M represents a vector of enzyme masses Mi.
We investigated the above results numerically and found an interesting fact: Equation (S14) holds for certain sub-networks of general networks which include linear and branched pathways. This previously-unobserved result is a hidden property of Equation (S15). First, consider the branched network shown in Figure 2(e) without allosteric regulation. We minimized the total enzyme concentration while all the fluxes were fixed (see Methods for details). The optimization procedure yielded Brown’s result for all branches (Figure 3(a)). This can be partly understood with the following intuitive argument: When the total mass of enzymes regulating each branch is minimized, the total mass of enzymes for the whole network must also be minimized. Since each branch flux is fixed, the optimization produces Brown’s result for each branch.
For the branched network shown in Figure 2(e), we examined various different regulatory configurations (Figure 3; Figures S3 – S5). Surprisingly, we observed the same result for all branches which are the target of inter-branch allosteric regulation, but not for those branches which are the source of allosteric regulation. Brown’s result also holds for all branches if a branch point corresponds to an allosteric metabolite (Figure S4). Mathematically, were equal for all metabolites within any linear segments of the networks that do not have any allosteric metabolite regulating other branches. We also numerically solved Equation (S15) and verified our simulation results (SI).
These findings suggest that our numerical relationship can be used to determine whether metabolic networks are in an optimized state by observing the control coefficients of each linear branch, excluding those with allosterically-active metabolites. In addition, this relationship can be leveraged to discover the origins of regulation (allosteric metabolites) within a network., under the condition that total enzyme mass is tightly constrained.
The connectivity theorem [20–22] suggests that reactions with low enzyme levels have correspondingly high flux control coefficients. But our results and previous studies [12, 13] show that reactions with higher control coefficients have higher enzyme levels. At first, this seems paradoxical. Consider the two-reaction linear pathway shown in Figure 2(a). We computed the control coefficients for each reaction over a range of different enzyme concentrations, where the net enzyme mass was fixed. Figures 4(a–b) illustrate that higher enzyme levels can lead to lower flux control coefficients.
The flux peaks at an intermediate E1 concentration (Figure 4(a)). The flux control coefficient for E1, decreases monotonically with the increase in E1 (Figure 4(b)). This illustrates that higher enzyme levels can lead to lower flux control coefficients of that particular enzyme. However, this relationship between an enzyme level and its corresponding flux control coefficient can only be observed with changes in the enzyme concentration, not with a concentration value in isolation. Equation (S14), on the other hand, describes the latter case. Consequently, the Brown’s result is biologically reasonable.
We have hitherto investigated the properties of flux control coefficients under the total enzyme mass constraint while optimizing metabolic fluxes. In the following sections, we will investigate how such an optimized state can be maintained under a wide range of environmental conditions.
In silico evolution of GRNs as a part of integrated self-optimizing networks
We composed gene regulatory networks and metabolic networks to obtain integrated networks which function robustly under a wide range of environmental conditions. We optimized each metabolic network over a wide range of boundary metabolite concentrations to determine corresponding optimal enzyme distributions (see SI, Section S5). These data were used then as inputs and outputs for in silico evolution of GRNs [26, 27]. Here the boundary metabolite concentration and total enzyme mass are considered environmental conditions.
In silico evolution of GRNs was performed by building N random GRNs from a given limited number of gene network building blocks and parameterizations. The building blocks were (1) genes that express enzymes, (2) inducible, repressible, and constitutive promoters, and (3) up to three additional regulator binding sites. An example of such a random GRN is shown in Figure 1.
Given a specific boundary metabolite condition, the constructed GRNs were trained to produce the corresponding optimal enzyme distribution. The fitness of each GRN was quantified as the relative error between the desired and trained output values, as expressed by Equation (4). After evaluating network fitness, some of the networks were selected to breed for a new generation (Figure 8(b)). Fitter networks were more likely to survive and mutate into even fitter GRN structures. Two selection methods — 1) elitism to guarantee survival of the fittest structures and 2) tournament selection to allow structures to mutate to fitter structures — were used to select GRNs for the next generation. The GRNs chosen with tournament selection were subject to mutations in parameter values and regulation patterns. Activators could be mutated into repressors (and vice versa), and regulators or genes could be inserted or deleted within predefined bounds. Typically, the numbers of genes and regulators allowed were two and three times of the number of enzymes regulating the connected metabolic networks, respectively. The process of fitness-scoring, network selection, and reproduction was iterated until the fitness score converged. Finally, the gene and metabolic network were integrated, resulting in a self-optimizing biochemical network. See the Methods for more details on this evolutionary approach.
For the metabolic networks shown in Figure 2(a–d), a GRN was trained to fit the desired input-output characteristics. One of the most surprising findings was that no additional proteins were required in the GRN to satisfy the desired input-output characteristics. For example, to control the input-output characteristics in the two-enzyme metabolic network shown in Figure 2(a), the gene networks were initially allowed to contain up to four genes. Two of these genes would express both the enzymes while other genes could express enzymes that do not interact with the metabolic network. Nonetheless, the evolutionary approach returned a GRN in which two genes were sufficient to maximize the metabolic flux in the trained regime of boundary metabolite concentrations. This result shows that the enzymes expressed in the GRNs can play dual roles of regulation on themselves as well as on the target metabolic networks, which leads to shallow regulation cascade structures.
The input-output performance of the evolved GRNs was evaluated by computing the degree of fitness (performance error; Equation (4)) for a wide range of boundary metabolite concentrations beyond the trained regime. The performance error is shown in Figure 5, where the dotted white boxes represent the trained regimes. Within the trained regime performance error could be lowered below 1% for all evolved GRNs, indicating excellent input-output performance within this regime. The performance errors over the trained regime were 0.57 % (SD = 0.38), 0.85 % (SD = 0.57), 0.51 % (SD = 0.24), and 0.66 % (SD = 0.25) for the four integrated GRN-metabolic networks referenced in Figure 5, respectively. Outside the trained regime (one or two orders of expansion), the error moderately increased up to approximately 20% for most cases.
Emerging regulation patterns in integrated self-optimizing networks
Besides seeking strong agreement between desired and trained gene expression levels, we were interested in determining general design features of successfully-evolved GRNs. Such design features are listed below.
Regulation by boundary metabolites is sufficient for optimal adaptation.
Because the optimal enzyme distribution changes for different boundary metabolite concentrations, direct or indirect regulation by boundary metabolites is essential. Based on our analysis of linear metabolic pathways, GRNs obtained by in silico evolution (e.g. Figures 6–7) showed accurate input-output performance for wide ranges of boundary metabolite concentrations. This implies that direct regulation by boundary metabolites alone may be sufficient for optimal adaptation to changes in environmental conditions.
Shallow network topologies allow coordinated expression.
In the five-enzyme system shown in Figure 7(b), enzyme E5 regulates the expression of three other enzymes, and in the six-enzyme system shown in Figure 7(c), enzyme E6 regulates four other enzymes. This emerging regulation motif, previously termed a single-input module (SIM) [28], is notable because it allows for coordinated expression with shallow network topologies. The SIM is also known to lead to fast responses to external signals [29–31]. Although the response speed was not considered in our GRN evolution, this result suggests that the frequent occurrence of the SIM in natural systems may be attributed to their utility for robust self-optimization under various environmental conditions.
Feed-forward loops can be essential to obtain optimal enzyme distributions.
Feed-forward loops (FFLs) were observed in the evolved networks. For example, Figure 7 (c) shows that boundary metabolite P inhibits the expression of E3, but activates the expression of E6, which in turn actives E3, completing a type-3 incoherent FFL. This implies a non-monotonic, bell-shape response profile between E3 and E6: As P increases, E3 and E6 increase together but after a certain threshold in P, E3 begins to decrease due to strong inhibition from P, while E6 still increases due to activation from P. The prevalence of incoherent FFLs suggests that this motif is an important component for regulation of complex enzyme distributions.
Discussion
This research investigated the design of robust and self-optimizing networks composed of gene regulatory networks and metabolic pathways under the constraint of limited enzyme resources within a cell. The obtained integrated networks adapted to maintain maximal network fluxes for a wide range of boundary metabolite concentrations. Here, boundary metabolite concentrations and the enzyme mass constraint were considered to be directly affected by environmental variations.
Under the total enzyme mass constraint, we computed the optimal enzyme distributions and discovered an interesting relationship between flux control coefficients and enzyme levels for linear sub-networks. The relationship was shown to hold for the sub-networks that (1) are linear and (2) do not allosterically regulate other branches. This relationship can serve as an indicator for the presence of such allosteric metabolites and for determining whether cellular conditions whether are tightly constrained by available enzyme mass.
Furthermore, in our study of self-optimizing integrated networks, we found that a minimal number of metabolites can be used as input signals for self-optimizing GRN controllers of metabolic networks. The boundary metabolite concentrations and corresponding optimal enzyme distributions were used as constraints in evolving GRNs based on genetic algorithms [32]. Here, the initial randomly-built GRNs were evolved in silico from a limited number of genetic building blocks and allowable parameterizations. The evolved GRNs were shown to successfully modulate expression levels of the enzymes in response to changes in boundary metabolite concentrations.
One of the important regulation patterns of the evolved GRNs was the SIM, which enabled shallow network regulation structures. This motif implies fast and coordinated regulation. The degree of fitness (performance error; Equation (4)) used in our evolutionary algorithm was related to the steady-state expression levels of the enzymes, and did not include the temporal response to the steady-state. The prevalence of the SIM in the evolved GRNs implies that accurate performance can be obtained by this motif without delaying the dynamic response. It would be interesting to include response time in our fitness function in future research, to investigate emerging GRN motifs related to rapid adaptation of GRNs under volatile environment conditions.
The input-output performance of the in silico evolved GRNs was shown to be accurate for wide ranges of boundary metabolite concentrations. The performance plots (Figure 5) show that our GRNs and metabolic networks can be optimized for a range of three or four orders of magnitude within∼20% error. This implies that flux optimization can be robustly achieved under significant environmental changes via feedback from GRNs.
In summary, we investigated self-optimizing feedback systems composed of a GRN and a metabolic network under the constraint of limited total enzyme mass. The self-optimizing systems generated accurately maintained optimal enzyme distributions to maximize the metabolic fluxes under significantly different boundary metabolite levels. Our theoretical study showed a possible approach for designing such robust synthetic systems. This work may eventually help provide rational design principles and computational methods for constructing synthetic circuits that are robust to a wide range of environmental conditions.
Methods
Our self-optimizing integrated GRNs and metabolic networks were designed by using two major algorithms for the optimization of metabolic networks and the in silico evolution of GRNs. The first algorithm determined the input-output characteristics of the GRNs and the second algorithm used them as a training data set. The pipeline for designing these self-optimizing networks is shown in Figure 8(a).
Optimization of metabolic fluxes under the total enzyme mass constraint
In our model systems, metabolic reactions were based on Michaelis-Menten and Hill kinetics. The enzymes regulating the metabolic networks were assumed to maintain their total enzyme mass at the level of 10−19 kg per cell, which corresponds to 600 enzymes (under the assumption that the molar mass of all the enzymes is 100 kDa). Under this enzyme mass constraint, we obtained enzyme distributions that maximize metabolic fluxes in linear metabolic networks by using an optimization algorithm based on the Brown’s result, Equation (S14). This optimization procedure was performed for a range of boundary metabolite concentrations to obtain the characteristics between these concentrations and optimal enzyme distributions. For each metabolic network there is only one enzyme distribution where Equation (S14) holds, given a set of kinetic parameters [12]
In the optimization method, (1) scaled flux control coefficients were computed at the steady-state, divided by the total enzyme mass, and subsequently ranked. (2) The enzyme concentration with the highest value was increased and the enzyme concentration with th lowest value was decreased such that the total enzyme mass remained constant. The amount of change in the enzyme concentration was kept sufficiently small (0.1 – 0.5%), since flux control coefficients change with enzyme concentrations. This procedure of steps (1) and (2) was iterated until the newly ranked values were identical within a given convergence error. See SI Section S2 for more details of this algorithm.
In silico evolution of GRNs
To obtain GRNs that regulate and maintain the metabolic networks at their optimal flux levels, we used the concentrations of the boundary metabolites as input signals of the GRNs and the optimized enzyme distributions as the output signals. These input-output characteristics were applied as a training data set to evolve and select desired GRNs. First, a population of random biochemical network topologies was built. Then, selection and mutation were used to evolve the topologies and perturb reaction parameters, to be trained to follow the input-output characteristics.
Specifically, random GRNs with a population size N was built. These GRNs were modeled based on Hill kinetics [27, 33, 34]. Up to three regulators were permitted to control gene expression by performing AND or OR operations [35] with non-competitive (independent) binding. For example, AND operation between two activators exists if both the activators cannot bind individually, but must be dimerized before they can bind to the promoter region. By contrast, OR operation exists when one activator is sufficient for activation, whereas presence of both activators enhances expression.
To evolve GRNs that optimize the distribution of the enzymes regulating the metabolic network, the degree of fitness was quantified by a performance error F in the input-output characteristics: where the absolute difference between the desired and trained levels of each (i-th) enzyme is divided by the desired enzyme level, and then summed over all training input-output pairs (the number of the pairs is X). Finally, this value is divided by the number of boundary metabolite conditions (X) and the number of enzymes (Y) to allow comparing models with a different number of boundary metabolite conditions and/or number of enzymes. In other words, the algorithm optimized percentage error per boundary metabolite condition per enzyme.
Our in silico approach for GRN evolution is schematically described in Figure 8(b). After evaluating the population fitness, a subset of the GRNs was selected to breed for a new generation. Tournament selection was used to select S ‘winning’ GRNs, and elitism was used to guarantee survival of the N−S fittest GRNs, for a new population of N GRNs. Tournament selection was performed by randomly selecting two GRNs and returning the fittest GRN. After performing tournament selection, the S ‘winners’ were subject to structural mutations (insertion, deletion, and transmutation of activators and repressors) and parameter mutations (bounded randomization of kinetic coefficients, as outlined below).
Parameters Vmax, Vdeg, Ka, and h (Hill coefficient) were permitted to be randomized between carefully determined boundaries. For example, a parameter value (pbefore) can be multiplied by a random value between zero and two to obtain a new one (pafter). These random values are drawn from a uniform distribution. As a result, the mean values of the parameter before and after the perturbation are identical and the standard deviation of the perturbed parameter value (pafter) is equal to . To prevent upwards drift over time a new random value between the minimum and maximum parameter value was assigned if a parameter value mutated to out of boundary values.
Optimization of fluxes for general metabolic networks
For more general metabolic pathways, a different optimization algorithm was used. In contrast to the previous optimization algorithm in which metabolic flux was optimized under the constraint of fixed total enzyme mass, this optimization algorithm was performed to minimize the total enzyme mass given the constraint of fixed metabolic fluxes. Because these fluxes are fixed in this algorithm, the change in the flux due to parameter perturbations must be zero: where k describes the independent flux and i the reaction number. This represents an under-determined system of linear equations, which means that this system cannot be solved for each δE. Thus, we semi-randomly selected V−W enzymes to perturb their concentrations, where V and W are the number of enzymes in the system and fixed fluxes, respectively. Then, the remaining enzyme concentrations were computed by using Equation (5). If the total enzyme quantity decreased, we accepted the proposed perturbations. Otherwise, the enzyme concentrations were reset to the previous values. As a consequence, the enzyme concentrations can be updated such that total enzyme mass decreases under the constraint of fixed fluxes.
Algorithm implementation
The software for performing the optimization algorithms for metabolic networks was developed in Python [36] by using the Python libraries NumPy [37], SymPy [38], and PySCeS [39]. All metabolic and gene regulatory networks were written in the PySCeS Modeling Description Language (MDL). The software for the in silico evolution of GRNs was developed in a combination of C++ and Python. This implementation makes use of the Mersenne-Twister pseudo-random number generator [40].
Acknowledgments
The authors gratefully acknowledge the contributions of NSF (No. 0827592 in Theoretical Biology, No. 1158573 in Molecular and Cellular Biosciences, No. 0527023 in Integrative Biological Research), NWO FOM/ALW 10TBSC24, and the Washington NASA Space Grant. This project was carried out within the research programme of BioSolar Cells, co-financed by the Dutch Ministry of Economic Affairs.