Abstract
Microbes comprise nearly half of all biomass on Earth. Almost every habitat on Earth is teeming with microbes, from hydrothermal vents to the human gastrointestinal tract. Those microbes form complex communities and play critical roles in maintaining the integrity of their environment or the well-being of their hosts. Controlling microbial communities can help us restore natural ecosystems and maintain healthy human microbiota. Yet, our ability to precisely manipulate microbial communities has been fundamentally impeded by the lack of a systematic framework to control them. Here we fill this gap by developing a control framework based on the new notion of structural accessibility. This framework allows identifying minimal sets of “driver species” through which we can achieve feasible control of the entire microbial community. We numerically validate our control framework on large microbial communities, and then we demonstrate its application for controlling the gut microbiota of gnotobiotic mice infected with Clostridium difficile and the core microbiota of the sea sponge Ircinia oros.
INTRODUCTION
Microorganisms form complex communities that play critical roles in maintaining the well-being of their hosts or the integrity of their environment1-4. Because of this close relationship, disrupting a microbial community can have severe consequences for the host or the environment. In humans, for example, a disruption of the gut microbiota (the aggregate of microorganisms residing in our intestine) has been associated with gastrointestinal diseases such as irritable bowel syndrome and Clostridium difficile infection (CDI)5, 6. A variety of non-gastrointestinal disorders as diverse as autism, obesity, and cerebral cavernous malformations have also been associated with disrupted gut microbiota5, 7. For agricultural crops, a disruption of the rhizosphere microbiota can reduce disease resistance and hence the overall crop yield8, 9. In the oceans, a disruption of the microbiota can impact global climate by altering carbon sequestration rates3, 4, 10. Driving these microbial communities back to their healthy states has the potential to bring novel solutions to prevent and treat complex human diseases, enhance sustainable agriculture, and regulate global warming11, 12. For example, inoculation of soil microbes can restore terrestrial ecosystems13, and Fecal Microbiota Transplantation (FMT) is so far the most successful therapy for patients with recurrent CDI, acting by restoring the disrupted gut microbiota14. Despite the success of these two empirical strategies, a broad application of microbial-manipulation strategies will only be possible if we can efficiently and systematically control large complex microbial communities15.
There are two main challenges to efficiently controlling a large complex microbial community. First and foremost, an efficient control method should manipulate only the minimal necessary number of species in the community. However, we still lack a method to systematically identify minimal sets of those “driver species” whose control can help us drive the whole community to desired states. Here, we use the term “species” in the general context of ecology, i.e., as a set of organisms adapted to a particular set of resources in the environment. It does not necessarily represent the lowest major taxonomic rank. In fact, one could equally organize microbes by strains, genera, or operational taxonomic units. Second, even if those driver species were known, calculating the control strategy that should be applied to them to drive the community towards the desired state remains difficult (e.g., it is hard to calculate by how much the abundance of those driver species needs to be increased or decreased). This second challenge arises not only from our insufficient knowledge of microbial dynamics and interactions, but also from the inherently complex dynamics that microbial communities often display.
To efficiently and systematically control large complex communities, here we develop a framework showing that the above two challenges can be addressed by focusing on the ecological network underlying the microbial community. We first introduce the new notion of “structural accessibility” and derive its graph-theoretical characterization. This theoretical result enables us to efficiently identify minimal sets of driver species of any microbial community purely from the topology of its underlying ecological network, even if some microbial interactions are missing and its population dynamics is unknown. Structural accessibility is a generalization of the notion of structural controllability16 —which only applies to systems with linear dynamics— to systems with nonlinear dynamics. Linear structural controllability is receiving increasing attention from the viewpoint of Network Science17. Once the driver species are identified, we systematically design feedback control strategies to drive a microbial community towards the desired state, even if the microbial dynamics is not precisely known. We numerically validated our control framework in large microbial communities, analyzing its performance for different parameters of the community we aim to control (e.g., the connectivity of its underlying ecological network), and with respect to errors in the ecological network used to identify the driver species. Finally, we demonstrate our framework by controlling the core microbiota of the sea sponge Ircinia oros, and restoring the gut microbiota of gnotobiotic mice infected by Clostridium difficile. Our results provide a rational and systematic framework to control microbial communities and other complex ecosystems based only on knowing their underlying ecological networks.
PROBLEM STATEMENT
In our modeling framework, we focus on exploring the impact that manipulating a subset of species has on the abundances of other species. We thus consider a microbial community whose state at time t is determined by the abundance profile x(t) ∈ ℝ^N of its N species, where the i-th entry x_i(t) of x(t) represents the abundance of the i-th species at time t. Let us assume that the state evolves according to some general population dynamics

$$\dot{x}(t) = f(x(t)), \tag{1}$$

where the function f: ℝ^N → ℝ^N models the intrinsic growth and inter/intra-species interactions of the community (see Supplementary Note 1 for details). For most microbial communities the function f is unknown and difficult to infer because of the many interaction mechanisms between microbes, such as cross-feeding and modulation by the host immune system18. Thus we assume that f(x) is some unknown meromorphic function (i.e., each entry f_i(x) is the quotient of analytic functions of x). This is a very mild assumption that is satisfied by most population dynamics models19.
Instead of knowing the population dynamics of the microbial community, we assume we know its underlying ecological network 𝒢 = (X, E). This network is defined as a directed graph where nodes X = {x_1, ⋯, x_N} represent species and edges (x_j → x_i) ∈ E denote that the j-th species has a direct ecological impact (e.g., direct promotion or inhibition) on the i-th species (Fig. 1a). Mapping these ecological networks requires performing mono-culture and co-culture experiments20, 21, using time-resolved abundance data and system identification techniques22, 23, or using steady-state abundance data via a recently developed inference method24. The accuracy of all these methods strongly depends on how informative the available data are25. Note that these ecological networks are different from correlation or co-occurrence based networks because correlation does not imply causation26. Correlation-based networks can be readily constructed from abundance profiles of different samples20, 27 and, under certain specific conditions28, they could be a proxy of the underlying ecological network.
Controlling a microbial community consists in driving its state from an initial value x_0 = x(0) ∈ ℝ^N at time t = 0 (e.g., a “diseased” state) towards a desired value x_d ∈ ℝ^N (e.g., a “healthier” state, Fig. 1b). We assume that the community will not naturally evolve to the desired state. To drive the microbial community, we consider a set of M control inputs u(t) ∈ ℝ^M that directly affect certain species, which we call actuated species (Fig. 1a). These control inputs encode a combination of M control actions that are simultaneously applied to the community at time t. We consider four types of control actions. If u_j(t) < 0, the j-th control action at time t is either a bacteriostatic agent or a bactericide, which decreases the abundance of the species it actuates by inhibiting their reproduction or directly killing them, respectively29. If u_j(t) > 0, the j-th control action at time t is either a prebiotic30 or a transplantation, which stimulates the growth of the species it actuates or engrafts a consortium of them, respectively. For the human gut microbiota, probiotics administration31 and FMTs14 are examples of transplantations. We introduce the controlled ecological network 𝒢_c = (X ∪ U, E ∪ B) of the community to specify which species are actuated by each control input. Here, the set U = {u_1, ⋯, u_M} is the set of control input nodes, and the edge (u_j → x_i) ∈ B denotes that the j-th control input actuates the i-th species (Fig. 1a).
Given a controlled ecological network describing the interactions between species and which species are actuated by the control inputs, we next introduce two control schemes describing how the control inputs will affect the species. The first control scheme models a combination of prebiotics (if u_j(t) > 0) and bacteriostatic agents (if u_j(t) < 0) as continuous control inputs modifying the growth of the actuated species (Fig. 1c):

$$\dot{x}(t) = f(x(t)) + g(x(t))\,u(t), \qquad t \in \mathbb{R}. \tag{2}$$
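The second control scheme considers a combination of transplantations (if u_j(t) > 0) and bactericides (if u_j(t) < 0) applied at discrete intervention instants 𝕋 = {t_1, t_2, ⋯}, rendering impulsive control inputs that instantaneously modify the abundance of the actuated species (Fig. 1d):

$$\begin{aligned} \dot{x}(t) &= f(x(t)), && t \notin \mathbb{T}, \\ x(t^+) &= x(t) + g(x(t))\,u(t), && t \in \mathbb{T}. \end{aligned} \tag{3}$$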
In the above equation, the symbol x(t^+) denotes the state “right after time t”, so a control input u(t) ≠ 0 applied at an intervention instant t makes x(t) “jump” at that time instant. Thus, control actions are classified as impulsive if they instantaneously modify the abundance of some species, and continuous otherwise (see Supplementary Note 1.2 for details).
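The impulsive scheme can be illustrated with a minimal simulation sketch. It assumes GLV growth and a constant susceptibility matrix, uses forward-Euler integration between intervention instants, and clips abundances at zero after each jump. The two-species parameters, the function names, and the clipping rule are illustrative assumptions, not taken from the paper.

```python
from typing import List, Sequence

Vector = List[float]
Matrix = Sequence[Sequence[float]]


def glv_rhs(x: Sequence[float], r: Sequence[float], A: Matrix) -> Vector:
    """GLV growth f_i(x) = x_i * (r_i + sum_j a_ij * x_j)."""
    n = len(x)
    return [x[i] * (r[i] + sum(A[i][j] * x[j] for j in range(n))) for i in range(n)]


def simulate_impulsive(x0, r, A, B, inputs, interval, dt=1e-3) -> List[Vector]:
    """Apply one impulsive input per intervention instant, then let the
    community evolve freely (forward Euler) until the next instant."""
    x = list(x0)
    n, m = len(x), len(B[0])
    steps = round(interval / dt)
    states = [list(x)]
    for u in inputs:
        # Jump x(t+) = x(t) + B u(t); clipping at zero is an added assumption.
        x = [max(0.0, x[i] + sum(B[i][j] * u[j] for j in range(m))) for i in range(n)]
        for _ in range(steps):
            dx = glv_rhs(x, r, A)
            x = [x[i] + dt * dx[i] for i in range(n)]
        states.append(list(x))
    return states


# Hypothetical two-species community: species 0 is extinct and actuated.
r = [1.0, 1.0]
A = [[-1.0, 0.0], [0.0, -1.0]]
B = [[1.0], [0.0]]
no_action = simulate_impulsive([0.0, 1.0], r, A, B, [[0.0]] * 3, interval=0.1)
transplant = simulate_impulsive([0.0, 1.0], r, A, B, [[0.5], [0.0], [0.0]], interval=0.1)
```

Without an intervention the extinct species stays extinct, whereas a single transplantation lets it engraft and grow. This is the kind of instantaneous jump in abundance that the impulsive scheme models.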
Both control schemes are characterized by the pair of functions {f, g}, describing the controlled population dynamics of the microbial community. As we have seen, the function f: ℝ^N → ℝ^N models the intrinsic growth and inter/intra-species interactions. The function g: ℝ^N → ℝ^{N×M} models the direct susceptibility of the species to the control actions. The i-th species is actuated by the j-th control input if the (i, j)-th entry of g(x) satisfies g_ij(x) ≢ 0. As in the uncontrolled community of Eq. (1), the function g(x) is typically unknown because the mechanisms of susceptibility to the control actions can be uncertain. Thus we assume that g is some unknown meromorphic function such that g_ij ≢ 0 iff (u_j → x_i) ∈ B.
Notice that when all species are directly controlled (i.e., each species is actuated by an independent control input so M = N and g(x) is full rank), the state of the whole microbial community can obviously be fully controlled. Fortunately, as we next show, controlling all the species in a community is far from being necessary. Indeed, several species can be indirectly controlled by the same control input when this signal is adequately propagated through the ecological network underlying the community. Thus, our first goal is to identify minimal sets of species that we need to actuate in order to drive the entire community. We call those species driver species. We will also study if the impulsive control scheme can be as effective as the continuous control scheme for controlling microbial communities. Indeed, the former is more feasible than the latter, especially for human-associated microbial communities. Finally, we will design the control inputs that should be applied to the identified driver species to drive the whole community towards the desired state.
IDENTIFYING DRIVER SPECIES
Driver species are characterized by the absence of autonomous elements
To understand when a set of actuated species is a set of driver species, consider a three-species community with the classical Generalized Lotka-Volterra (GLV) population dynamics (Fig. 2a). This toy community has one control input actuating the third species x_3. Actuating this species alone creates an autonomous element, namely a constraint between some species abundances that the control input cannot break, confining the state of the community to a low-dimensional manifold. More precisely, our mathematical formalism reveals that ξ = x_1 x_2 is an autonomous element for this microbial community (Example 2 in Supplementary Note 2). Indeed, differentiating ξ with respect to time yields dξ/dt = 0, which implies that the state of the community is constrained to the low-dimensional manifold {x ∈ ℝ^3 | x_1 x_2 = x_1(0) x_2(0)} for all control inputs (Fig. 2a right). Intuitively, an autonomous element exists because the control input cannot change the abundance of species x_1 without changing the abundance of species x_2 in a predefined way (i.e., x_2 = x_1(0) x_2(0)/x_1). It is thus impossible to drive the whole community in its three-dimensional state space, implying that x_3 alone cannot be a driver species for this community. Introducing a second control input actuating species x_1 eliminates this autonomous element by helping the system to jump out of the low-dimensional manifold. Hence, the community can be driven in any direction within its three-dimensional state space (Fig. 2b). This indicates that {x_1, x_3} is a minimal set of driver species for this community. In fact, by using these two driver species we can steer the community to any desired state with positive abundances (Example 6 in Supplementary Note 5).
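A numerical sketch can make the autonomous element concrete. The GLV parameters of Fig. 2a are not reproduced in the text, so the model below is a hypothetical three-species GLV chosen so that the per-capita growth rates of x_1 and x_2 cancel (F_1 + F_2 ≡ 0), which gives d(x_1 x_2)/dt = x_1 x_2 (F_1 + F_2) = 0. Impulsive inputs act on x_3 only; the integrator (classical Runge-Kutta) and all names are assumptions.

```python
from typing import List, Sequence


def rhs(x: Sequence[float]) -> List[float]:
    """Hypothetical GLV model: x3 promotes x1 and inhibits x2 equally strongly."""
    x1, x2, x3 = x
    return [x1 * x3, -x2 * x3, x3 * (1.0 - x3)]


def rk4_step(x: Sequence[float], dt: float) -> List[float]:
    """One classical fourth-order Runge-Kutta step."""
    k1 = rhs(x)
    k2 = rhs([xi + 0.5 * dt * k for xi, k in zip(x, k1)])
    k3 = rhs([xi + 0.5 * dt * k for xi, k in zip(x, k2)])
    k4 = rhs([xi + dt * k for xi, k in zip(x, k3)])
    return [xi + dt * (a + 2 * b + 2 * c + d) / 6.0
            for xi, a, b, c, d in zip(x, k1, k2, k3, k4)]


def xi_values(x0, jumps, interval=0.5, dt=1e-3) -> List[float]:
    """Track xi = x1 * x2 while impulsive inputs are applied to x3 only."""
    x = list(x0)
    values = [x[0] * x[1]]
    for u in jumps:
        x[2] = max(0.0, x[2] + u)  # impulsive control action on x3
        for _ in range(round(interval / dt)):
            x = rk4_step(x, dt)
        values.append(x[0] * x[1])
    return values


values = xi_values([1.0, 2.0, 0.5], jumps=[1.5, -0.3, 2.0])
print(values)  # xi stays at x1(0) * x2(0) up to integration error
```

Whatever inputs are applied to x_3, the product x_1 x_2 does not move, so the state never leaves the manifold x_1 x_2 = x_1(0) x_2(0).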
In the general case of N species and M control inputs, we define a set of actuated species as a set of driver species if the corresponding controlled population dynamics {f, g} of the microbial community lacks autonomous elements. For a given pair {f, g}, the absence of autonomous elements can be mathematically deduced using a formalism based on differential one-forms (Supplementary Note 2). Indeed, for the continuous control scheme of Eq. (2), the conditions for the absence of autonomous elements are well understood because they define when a system is accessible32. As a cornerstone concept in nonlinear control theory, accessibility has been instrumental for developing technological advances such as robotics. Since it is more natural to control microbial communities with impulsive control actions, in this paper we extended the study of autonomous elements to the impulsive control systems of Eq. (3). For this, we first introduced a mathematical definition of autonomous elements for impulsive control systems (Definition 3 in Supplementary Note 2). Using this definition, we characterized necessary and sufficient conditions for the absence of autonomous elements in a given controlled population dynamics (Theorem 2 in Supplementary Note 2).
To our surprise, we found that the conditions for the absence of autonomous elements for the continuous and the impulsive control schemes are identical (Remark 2 in Supplementary Note 2). This result suggests that, for controlling microbial communities, transplantations and bactericides (impulsive control actions) can be as effective as prebiotics and bacteriostatic agents (continuous control actions). Since impulsive control actions could be simpler to implement for many microbial communities such as the human gut microbiota, this result encourages the further development of microbiome-based therapies in the form of probiotic cocktails and FMTs.
Structural accessibility characterizes the generic absence of autonomous elements
For complex microbial communities such as the human gut microbiota, it is very difficult to choose an adequate pair {f, g} to model their controlled population dynamics. As the autonomous elements depend on such a pair, this might suggest that it is impossible to predict their presence and thus to identify the driver species of complex microbial communities. We now show that this seemingly unavoidable limitation can be overcome by focusing on the topology of the controlled ecological network of the community.
Define the graph associated with a meromorphic function pair {f, g} as follows. First, the edge (x_j → x_i) ∈ E_{f,g} exists if x_j appears in the right-hand side of ẋ_i(t) or x_i(t^+) in Eqs. (2) or (3), respectively. Second, the edge (u_j → x_i) ∈ B_{f,g} exists if g_ij ≢ 0. In this definition, the interaction x_j → x_i can originate in the uncontrolled population dynamics (i.e., f_i(x) depends on x_j) or, in a more general case, also in the controlled dynamics (i.e., the i-th row of g(x) depends on x_j). Using this definition and given a controlled ecological network, we can describe the class of all possible controlled population dynamics that the controlled microbial community can have. Mathematically, this class contains all base models {f*, g*} whose graph matches the controlled ecological network, together with all deformations {f, g} of each of those base models. The base models characterize the simplest controlled population dynamics that the community can have. We have chosen them as controlled GLV models with constant susceptibilities:

$$f^*_i(x) = x_i\Big(r_i + \sum_{j=1}^{N} a_{ij}\,x_j\Big), \qquad g^*_{ij}(x) = b_{ij}, \tag{4}$$

for i = 1, ⋯, N and j = 1, ⋯, M. The base models are parametrized by A = (a_ij) ∈ ℝ^{N×N}, r = (r_i) ∈ ℝ^N, and B = (b_ij) ∈ ℝ^{N×M}, representing the interaction matrix, the intrinsic growth rate vector, and the susceptibility matrix of the community, respectively. Thus, the base models in the class are all controlled GLV models whose graph matches the controlled ecological network. As a classical population dynamics model, the GLV model has been applied to microbial communities in lakes, soils, and human bodies14, 15, 20, 33–39. Notice that in a microbial community, any species that goes extinct cannot “resurrect” by itself without some external influence such as a transplantation or migration. Eq. (4) is the simplest population dynamics that satisfies this condition in the following sense: it is obtained by considering population dynamics of the form f_i(x) = x_i F_i(x), and then choosing the functions F_i(x) to be simple affine functions.
Next, we say that a meromorphic pair {f, g} is a deformation of a base model {f*, g*} if it satisfies the following three conditions: (i) it has the same graph as the base model; (ii) there exists a finite set of parameters θ ∈ ℝ^C such that {f, g} = {f(x; θ), g(x; θ)}; and (iii) the identity {f(x; 0), g(x; 0)} = {f*(x), g*(x)} holds. The minimal integer C ≥ 0 for which these conditions are satisfied is called the size of the deformation, quantifying the cardinality of the parameter set θ that is needed to obtain the deformation from the base model. A rather general class of controlled population dynamics can be described by deformations of the base model of Eq. (4), such as Eq. (5), defined for i = 1, ⋯, N. In Eq. (5), the parameters θ_{i,1} are migration rates from/to neighboring habitats, further parameters give the carrying capacities of the environment and the Allee constants, and the rest characterize the saturation of the functional responses40. Note that θ_{i,1} > 0 can also model species like C. difficile that sporulate into “inactive” forms and then recover. Note also that “higher-order” interactions can be described as deformations. For example, if species x_i is directly affected by species x_j and x_k, then a deformation can include the third-order interaction θ_i x_i x_j x_k. Similarly, deformations allow cases in which the susceptibility of the i-th species to the j-th control input is mediated by the abundance of other species. For example, the deformation g_ij(x; θ) = b_ij + θ_jik x_k models a case in which the i-th species is actuated by the j-th control input but its effect is mediated by the abundance of the k-th species.
We call the class structurally accessible if almost all of its base models and almost all of their deformations lack autonomous elements. This means that, except for a zero-measure set of “singularities”, all the controlled population dynamics that the community may take lack autonomous elements. The conditions under which the class is structurally accessible are fully characterized using our mathematical formalism (Supplementary Note 3), and they depend only on the underlying controlled ecological network. We first proved that, generically, increasing the size of a deformation cannot create autonomous elements (see Proposition 1 in Supplementary Note 3, and Fig. 2c for an illustration). This result reduces the search for autonomous elements to the deformations in the class with minimal size C = 0, that is, to all base models whose graph matches the controlled ecological network. Finally, we proved that the class is structurally accessible if and only if the controlled ecological network satisfies the following two conditions: (i) each species is the end-node of a path that starts at a control input node; and (ii) there is a disjoint union of cycles (excluding self-loops) and paths that cover all species nodes (see Theorem 3 of Supplementary Note 3). If these two graph conditions are satisfied, we also call the controlled ecological network structurally accessible.
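The two graph conditions can be checked with a short sketch. It reads condition (ii) as a bipartite matching problem in which every species needs its own predecessor, either another species or a control input, with self-loops excluded; equivalently, the covering paths are taken to start at control input nodes. This reading, the edge encoding, and the function names are assumptions of the sketch rather than the procedure of the Supplementary Notes.

```python
from collections import deque
from typing import Dict, Hashable, List, Set, Tuple

Edge = Tuple[int, int]        # (j, i) encodes x_j -> x_i
InputEdge = Tuple[str, int]   # (u, i) encodes u -> x_i


def all_species_reachable(n: int, edges: List[Edge], input_edges: List[InputEdge]) -> bool:
    """Condition (i): every species ends a path that starts at a control input."""
    successors: Dict[int, List[int]] = {j: [] for j in range(n)}
    for j, i in edges:
        successors[j].append(i)
    seen: Set[int] = {i for _, i in input_edges}
    queue = deque(seen)
    while queue:
        j = queue.popleft()
        for i in successors[j]:
            if i not in seen:
                seen.add(i)
                queue.append(i)
    return len(seen) == n


def covered_by_cycles_and_paths(n: int, edges: List[Edge], input_edges: List[InputEdge]) -> bool:
    """Condition (ii) as a maximum matching (augmenting paths, no self-loops)."""
    targets: Dict[Hashable, List[int]] = {}
    for j, i in edges:
        if j != i:
            targets.setdefault(("x", j), []).append(i)
    for u, i in input_edges:
        targets.setdefault(("u", u), []).append(i)
    matched_source: Dict[int, Hashable] = {}

    def augment(source: Hashable, visited: Set[int]) -> bool:
        for i in targets[source]:
            if i in visited:
                continue
            visited.add(i)
            if i not in matched_source or augment(matched_source[i], visited):
                matched_source[i] = source
                return True
        return False

    for source in targets:
        augment(source, set())
    return len(matched_source) == n


def is_structurally_accessible(n: int, edges: List[Edge], input_edges: List[InputEdge]) -> bool:
    return (all_species_reachable(n, edges, input_edges)
            and covered_by_cycles_and_paths(n, edges, input_edges))


# Hypothetical star network: x2 affects x0, x1 and itself.
star = [(2, 0), (2, 1), (2, 2)]
print(is_structurally_accessible(3, star, [("u1", 2)]))             # False
print(is_structurally_accessible(3, star, [("u1", 2), ("u2", 0)]))  # True
```

In the star example a single input on the hub reaches every species but cannot give x_0 and x_1 distinct predecessors, so condition (ii) fails until a second species is actuated.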
The notion of structural accessibility introduced above is a nonlinear counterpart of the notion of structural controllability for linear systems16. For linear systems we have {f(x), g(x)} = {Ax, B}, and the absence of autonomous elements is equivalent to their controllability32, that is, the intrinsic ability to drive the system between two arbitrary states. Controllability can be verified by the celebrated Kalman’s rank condition: rank(B, AB, A^2 B, …, A^{N−1} B) = N. Condition (i) above is necessary for both structural accessibility and linear structural controllability, requiring that the network contains paths that spread the influence of the control inputs to all species. However, for linear structural controllability, condition (ii) is sufficient but not necessary. More precisely, for linear structural controllability, the required disjoint union of cycles that cover the species nodes can also include self-loops due to intrinsic nodal dynamics (see Remark 4 in Supplementary Note 3).
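Kalman’s rank condition is easy to evaluate exactly for a small linear pair. The sketch below uses rational arithmetic to avoid floating-point rank decisions and, as an example, the 3×3 proxy interaction matrix quoted later in the text for the three-species community. The helper names are assumptions. The check applies to one specific choice of weights, whereas structural controllability concerns almost all weights.

```python
from fractions import Fraction as F
from typing import List

Matrix = List[List[F]]


def matmul(X: Matrix, Y: Matrix) -> Matrix:
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y))) for j in range(len(Y[0]))]
            for i in range(len(X))]


def rank(M: Matrix) -> int:
    """Rank by Gaussian elimination over the rationals."""
    M = [row[:] for row in M]
    rows, cols, rk = len(M), len(M[0]), 0
    for c in range(cols):
        pivot = next((r for r in range(rk, rows) if M[r][c] != 0), None)
        if pivot is None:
            continue
        M[rk], M[pivot] = M[pivot], M[rk]
        for r in range(rk + 1, rows):
            factor = M[r][c] / M[rk][c]
            M[r] = [a - factor * b for a, b in zip(M[r], M[rk])]
        rk += 1
        if rk == rows:
            break
    return rk


def kalman_rank(A: Matrix, B: Matrix) -> int:
    """Rank of the controllability matrix (B, AB, ..., A^(N-1) B)."""
    block, C = B, [row[:] for row in B]
    for _ in range(len(A) - 1):
        block = matmul(A, block)
        C = [c_row + b_row for c_row, b_row in zip(C, block)]
    return rank(C)


A = [[F("-0.5"), F(0), F("-0.1")],
     [F(0), F(-5), F(1)],
     [F(0), F(0), F(-1)]]
print(kalman_rank(A, [[F(0)], [F(0)], [F(1)]]))  # 3: actuating x3 gives full rank
print(kalman_rank(A, [[F(1)], [F(0)], [F(0)]]))  # 1: actuating x1 alone does not
```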
Identifying minimal sets of driver species in microbial communities
The above result provides a complete graph-characterization of driver species: a set of actuated species is a set of driver species (for all but a zero-measure set of controlled population dynamics that the community may have) if and only if its corresponding controlled ecological network is structurally accessible. We used this characterization to build an algorithm that identifies a minimal set of driver species from the ecological network of the community. More precisely, we mapped the satisfaction of the graph conditions (i) and (ii) into solving a maximum matching problem over the graph of the community with self-loops removed (Proposition 3 in Supplementary Note 4). This result provides a polynomial-time algorithm to identify one minimal set of driver species, making it feasible for large networks (Remark 5 in Supplementary Note 4).
Note that once a controlled ecological network is structurally accessible, it cannot lose its structural accessibility when new edges are added to it. This observation implies that a set of driver species remains valid even if new edges (e.g., new inter/intra-species interactions) are added to the ecological network of the community. Therefore, it is possible to find the driver species of a microbial community using an “incomplete” ecological network that only includes some of the ecological interactions (e.g., high-confidence interactions).
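The matching idea can be sketched as follows, under stated assumptions. Species left without a matched predecessor in a maximum matching of the network (self-loops removed) must be actuated to satisfy condition (ii). Further species are then added greedily until every species is reachable, as condition (i) requires. This is not the algorithm of Supplementary Note 4: the greedy step yields a valid set but is not guaranteed to be minimal in general, and all names are hypothetical.

```python
from typing import Dict, List, Set, Tuple

Edge = Tuple[int, int]  # (j, i) encodes x_j -> x_i


def unmatched_species(n: int, edges: List[Edge]) -> Set[int]:
    """Species without a matched predecessor in a maximum matching
    (augmenting-path method) of the network with self-loops removed."""
    successors: Dict[int, List[int]] = {j: [] for j in range(n)}
    for j, i in edges:
        if j != i:
            successors[j].append(i)
    matched_source: Dict[int, int] = {}

    def augment(j: int, visited: Set[int]) -> bool:
        for i in successors[j]:
            if i in visited:
                continue
            visited.add(i)
            if i not in matched_source or augment(matched_source[i], visited):
                matched_source[i] = j
                return True
        return False

    for j in range(n):
        augment(j, set())
    return set(range(n)) - set(matched_source)


def downstream(start: int, successors: Dict[int, List[int]]) -> Set[int]:
    """All species reachable from `start`, including `start` itself."""
    seen, stack = {start}, [start]
    while stack:
        for i in successors[stack.pop()]:
            if i not in seen:
                seen.add(i)
                stack.append(i)
    return seen


def candidate_driver_species(n: int, edges: List[Edge]) -> Set[int]:
    """Unmatched species, plus greedy additions until all species are reachable."""
    successors: Dict[int, List[int]] = {j: [] for j in range(n)}
    for j, i in edges:
        successors[j].append(i)
    drivers = unmatched_species(n, edges)
    reached: Set[int] = set()
    for d in drivers:
        reached |= downstream(d, successors)
    while len(reached) < n:
        best = max((v for v in range(n) if v not in reached),
                   key=lambda v: len(downstream(v, successors) - reached))
        drivers.add(best)
        reached |= downstream(best, successors)
    return drivers


print(candidate_driver_species(3, [(0, 1), (1, 2)]))          # chain: {0}
print(candidate_driver_species(3, [(0, 1), (1, 2), (2, 0)]))  # cycle: one species
```

Each driver species is assumed to receive its own independent control input. A cycle has a perfect matching, so the reachability step is what forces at least one actuated species there.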
DRIVING THE DRIVER SPECIES
Next we turn to the question of calculating the control signal u(t) that needs to be applied to a set of driver species to drive the whole community towards the desired state. We will show that impulsive control actions can make this calculation easier.
Calculating optimal control strategies for microbial communities with known population dynamics
To calculate the impulsive control inputs needed to drive the microbial community to the desired state x_d we adopt a model predictive control (MPC) approach41. First, based on the current state of the community x(t_k) at the intervention instant t_k, we use knowledge of its controlled population dynamics to predict the sequence of states that the community will take in response to a sequence of L impulsive control inputs U_{k,L} = {u(t_k), ⋯, u(t_{k+L−1})}. The prediction horizon L > 0 quantifies how far into the future we predict. We then choose u(t_k) = u*(t_k), where u*(t_k) is the first element of the optimal control input sequence U*_{k,L} calculated by solving the following optimization problem:

$$U^*_{k,L} = \arg\min_{U_{k,L} \in \Omega} J_{x_d}\big(x(t_k), U_{k,L}\big). \tag{6}$$

Here Ω ⊆ ℝ^{M×L} is a set that specifies constraints on the control inputs we can use, and J_{x_d} is some cost function penalizing deviations of the predicted trajectory from the desired state x_d. For example, the simplest cost function penalizes deviations of the predicted final state from the desired state. Penalizing the deviations of intermediate states can provide a smoother transition to the desired state.
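One receding-horizon step can be sketched on a toy model. The sketch replaces a dedicated global optimizer with exhaustive search over a small finite grid of admissible inputs, which plays the role of the constraint set Ω, and uses the simplest cost (squared distance of the predicted final state from x_d). The two-species GLV model, its parameters, the grid, and the function names are all illustrative assumptions.

```python
from itertools import product
from typing import List, Sequence, Tuple


def flow(x: Sequence[float], interval: float, dt: float = 1e-2) -> List[float]:
    """Free evolution of a hypothetical two-species GLV model (forward Euler)."""
    x1, x2 = x
    for _ in range(round(interval / dt)):
        dx1 = x1 * (0.5 - 1.0 * x1 + 0.5 * x2)  # species 1 is promoted by species 2
        dx2 = x2 * (1.0 - 1.0 * x2)             # species 2 is the actuated species
        x1, x2 = x1 + dt * dx1, x2 + dt * dx2
    return [x1, x2]


def predict(x: Sequence[float], inputs: Sequence[float], interval: float) -> List[float]:
    """Predicted state after each impulsive input followed by free evolution."""
    state = list(x)
    for u in inputs:
        state[1] = max(0.0, state[1] + u)  # jump, clipped at zero (assumption)
        state = flow(state, interval)
    return state


def cost(x: Sequence[float], xd: Sequence[float]) -> float:
    return sum((a - b) ** 2 for a, b in zip(x, xd))


def mpc_step(x, xd, grid, horizon, interval) -> Tuple[float, Tuple[float, ...]]:
    """Return the first input of the best sequence found by exhaustive search."""
    best = min(product(grid, repeat=horizon),
               key=lambda seq: cost(predict(x, seq, interval), xd))
    return best[0], best


x0, xd = [0.5, 0.0], [1.0, 1.0]    # species 2 is extinct in the initial state
grid = (-0.5, 0.0, 0.5, 1.0)       # admissible inputs
u_first, sequence = mpc_step(x0, xd, grid, horizon=2, interval=1.0)
print(u_first, sequence)
```

In closed loop, only `u_first` would be applied; the state would then be measured again at the next intervention instant and the search repeated, which is what creates the feedback.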
To choose the prediction horizon L in Eq. (6), we proved that it is possible to distinguish between two cases (Theorem 4 in Supplementary Note 5). The first case is when the community can be driven to the desired state using a finite number L of impulsive control actions. This number can be calculated from its controlled population dynamics. The second case is when the community can only be asymptotically driven to the desired state as time goes by, meaning that a “sufficiently large” L ≫ N should be used. This second case could be circumvented by increasing the number of driver species (Remark 8 in Supplementary Note 5). Note that by recalculating u*(t_k) at each intervention instant using the actual state of the community, the MPC method creates a feedback loop that enhances its robustness against prediction errors due to uncertainty in the dynamics41. For L = 1 the proposed MPC methodology is similar to the network control method of Ref. (40). Eq. (6) is a finite-dimensional optimization problem that can be solved using several algorithms such as “DIRECT”42. By contrast, for continuous control actions, the analogous optimization problem is defined over the infinite-dimensional space of all M-dimensional continuous functions. Solving such an optimization problem is considerably more difficult, significantly limiting our ability to calculate optimal continuous control actions.
We studied the performance of the above MPC strategy in the three-species microbial community with a solo driver species of Fig. 1. Given the dynamics of this community (see caption in Fig. 1), we find that L = 3 impulsive control inputs are sufficient to drive the whole community (Example 4 in Supplementary Note 5). To calculate the optimal control inputs we selected the cost function J_{x_d} in Eq. (6). Solving the optimization problem using DIRECT yields the MPC strategy u*(t_1) = −0.8815, u*(t_2) = 2.0089 and u*(t_3) = −10^{−4} (pink in Fig. 3a). We use this example to compare this MPC strategy with two other control strategies for driving this community. The first of these uses a transplantation to restore the abundance of the driver species (i.e., increase its abundance to its desired value), expecting that such a control action will drive the rest of the community to the desired state (purple in Fig. 3a). This control strategy is reminiscent of a probiotic administration that restores the “healthy” abundance of the driver species. The second ignores the driver species of this community, using two control inputs (instead of one) to set the abundance of the non-driver species to their desired values (blue in Fig. 3a).
Among the above control strategies, only the MPC applied to the driver species succeeds (Fig. 3b). In fact, this strategy succeeds in a somewhat unconventional way: although the driver species is more abundant in the desired state than in the initial state, the first control action further decreases its abundance. This first control action makes the non-driver species reach their desired abundances and, once that happens, the abundance of the driver species is finally increased to its desired value (pink in Fig. 3b). The transplantation strategy succeeds in driving species x_2 and x_3, but it fails to drive x_1 to the desired abundance because it approaches the desired state from an unstable direction (purple in Fig. 3b). Finally, not actuating the driver species results in the worst strategy, failing to drive a single species to the desired state (blue in Fig. 3b). This example demonstrates the importance of actuating the driver species.
Calculating control strategies for microbial communities with uncertain population dynamics or a large number of species
In general, solving the non-convex optimization problem of Eq. (6) becomes challenging as the number of species or the prediction horizon increases. Also, a prerequisite for solving this optimization problem is a reasonable knowledge of the controlled population dynamics of the community, which may not be available. To circumvent these two drawbacks, we next leverage the network underlying the controlled microbial community.
Consider that it is possible to obtain a weighted adjacency matrix Â ∈ ℝ^{N×N} from the ecological network of the community, providing a proxy for its interaction matrix. Without additional knowledge of the susceptibility matrix of the community, we assume it is possible to increase or decrease as desired the abundance of each driver species. Under this assumption, we define B̂ ∈ ℝ^{N×M} as a proxy for the susceptibility matrix, with b_ij = 1 if the j-th control input actuates the i-th driver species. Next, by rewriting the controlled population dynamics of the community as {f(x), g(x)} = {Âx + w_x(x), B̂ + w_u(x)}, we use the pair {Âx, B̂} to provide a linear prediction for the response of the community to the control inputs. Here, the nonlinear functions w_x(x) and w_u(x) represent perturbations whose magnitude depends on how well the linear pair approximates the true dynamics {f(x), g(x)} of the community. Using this linear pair for predicting the response of the community to impulsive control actions, we design a linear MPC by solving the optimization problem of Eq. (6) with the quadratic cost function

$$J_{x_d}\big(x(t_k), U_{k,L}\big) = \sum_{i=k}^{k+L-1} \Big[ \big(x(t_{i+1}) - x_d\big)^{\mathsf T} Q \,\big(x(t_{i+1}) - x_d\big) + u(t_i)^{\mathsf T} R\, u(t_i) \Big]. \tag{7}$$
In the above equation, the positive definite matrices Q = Q^T ∈ ℝ^{N×N} and R = R^T ∈ ℝ^{M×M} are design parameters. The matrix Q penalizes the deviations of the predicted trajectory from the desired state, and R quantifies the “cost” of using the control inputs. Under this scenario (i.e., a linear prediction model and quadratic cost), the solution to the optimization problem of Eq. (6) can be obtained in closed form43 even if L → ∞. This result enabled us to obtain the explicit form u(t_k) = Kx(t_k) for the linear MPC at time t_k, where K ∈ ℝ^{M×N} is computed by solving an algebraic Riccati equation (Supplementary Note 6). Since the Riccati equation can be efficiently solved for large N, the linear MPC can be calculated for large microbial communities. The above linear MPC has several other advantages: it requires minimal knowledge of the controlled population dynamics of the community (i.e., the weighted adjacency matrix of its underlying ecological network); it is robust to the perturbations (w_x, w_u) and other uncertainties (Remark 12 in Supplementary Note 6); and it also allows calculating the control signals for the continuous control scheme (Remark 10 in Supplementary Note 6).
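To illustrate how a Riccati equation yields the feedback gain, here is a scalar sketch (N = M = 1) under stated assumptions. The state is the deviation e = x − x_d, an impulsive input is followed by free linear evolution over an interval τ, so e(t_{k+1}) = F e(t_k) + G u(t_k) with F = exp(âτ) and G = F b̂, and the infinite-horizon gain is obtained by iterating the discrete-time Riccati recursion to a fixed point. The discretization, the parameter values, and the function name are assumptions and need not match the formulation of Supplementary Note 6.

```python
import math
from typing import Tuple


def lq_gain(a_hat: float, b_hat: float, tau: float, q: float, r: float,
            iterations: int = 10_000, tol: float = 1e-12) -> Tuple[float, float]:
    """Return (K, closed-loop multiplier) for u(t_k) = K * e(t_k)."""
    F = math.exp(a_hat * tau)  # free evolution between interventions
    G = F * b_hat              # effect of a jump, propagated over the interval
    P = q
    for _ in range(iterations):
        P_next = q + F * F * P - (F * G * P) ** 2 / (r + G * G * P)
        converged = abs(P_next - P) < tol
        P = P_next
        if converged:
            break
    K = -(G * P * F) / (r + G * G * P)
    return K, F + G * K


for r in (1e-4, 1e-2, 1.0):
    K, closed_loop = lq_gain(a_hat=-1.0, b_hat=1.0, tau=0.1, q=1.0, r=r)
    print(f"R = {r:g}: K = {K:.4f}, closed-loop multiplier = {closed_loop:.4f}")
```

With a small R the gain approaches −1, meaning that each intervention almost cancels the current deviation; as R grows the gain shrinks towards zero and the controller intervenes less.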
We used the above linear MPC for controlling the three-species community of Fig. 1, assuming its dynamics is unknown. Based on the ecological network of this community and its population dynamics (see Fig. 1 and its caption), we choose Â = (−0.5, 0, −0.1; 0, −5, 1; 0, 0, −1) as a proxy for its interaction matrix. Note that Â is a rather rough approximation of the linearization of the population dynamics at the desired state, given by (−0.37, 0, −0.05; 0, −5.31, 0.52; 0, 0, −1). Since {x_3} is a solo driver species for this community, we use B̂ = (0, 0, 1)^T. Choosing Q = diag(20, 1, 10), we compared the performance of three different linear MPCs obtained by using the values R = 10^{−4}, 10^{−3}, 10^{−2} (Fig. 3c). The performance of the linear MPC strongly depends on the selection of these parameters. For R = 10^{−4}, despite not using knowledge of the population dynamics, the performance of the linear MPC (pink in Fig. 3d) is very similar to the performance of the MPC that uses full knowledge of the nonlinear population dynamics (pink in Fig. 3b). The success of the linear MPC in driving a community with nonlinear population dynamics illustrates the robustness of the MPC strategy, since the controller succeeds despite having non-zero perturbations (w_x, w_u). As R increases, the performance of the linear MPC deteriorates, first using more interventions to reach the desired state (green in Fig. 3d), and finally failing to drive the system to the desired state (blue in Fig. 3d). Indeed, since R > 0 quantifies the “cost” of using control inputs, increasing R reduces the magnitude of the control inputs, to the point where they are not large enough to drive the system towards the desired state. We emphasize that, in general, the performance of the linear MPC also depends on the chosen linear prediction model and the desired state (Remark 11 in Supplementary Note 6).
Numerical validation of the control framework on large microbial communities
To systematically validate our control framework, we considered communities of N = 100 species having random Erdős-Rényi ecological networks with a prescribed connectivity c ∈ [0, 1], see Fig. 4a. The network edge-weights are chosen from a normal distribution with zero mean and standard deviation σ ≥ 0, where σ characterizes the typical interspecies interaction strength. Negative self-loops with weights −1 were added to each species to ensure stability, representing intraspecies interactions. We use this ecological network to identify the driver species of the community, and its corresponding weighted adjacency matrix as the interaction matrix to construct the linear MPC. The parameters Q = 20 × 10^4 I_{N×N}, R = 0.15 I_{M×M} of the linear MPC were fixed for all communities, and the intervention time instants were chosen such that t_{k+1} − t_k = 0.1. Next, we used Eq. (5) to numerically simulate the population dynamics of these communities. For this, we set the weighted adjacency matrix of the ecological network we built as the interaction matrix A in Eq. (5). We choose θi,j · = 0 for j = 1, ⋯, 6, and θij,7 uniformly at random from [0, θ_max], where θ_max is a parameter. Last, we choose the intrinsic growth rates r_i to ensure all generated random communities share the desired state x_d ∈ ℝ^N as an equilibrium point. Note all the constructed communities have nonlinear population dynamics, and their linearization at the desired state is not equal to the interaction matrix used for the linear MPC (see Supplementary Note 8 for details of this construction).
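The construction of a random community can be sketched in a few lines of Python. This is only an illustration under stated assumptions: edges are drawn independently for every ordered pair of distinct species, the values c = 0.1 and σ = 0.2 and the desired state of all ones are hypothetical, and the growth rates are chosen for a plain generalized Lotka-Volterra model dx_i/dt = x_i (r_i + Σ_j a_ij x_j) without the additional θ terms of Eq. (5). Function names are illustrative and do not come from the authors' repository.

```python
import random


def random_interaction_matrix(n, c, sigma, rng):
    """Directed Erdos-Renyi network with N(0, sigma) weights and -1 self-loops."""
    A = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i == j:
                A[i][j] = -1.0                   # intraspecies interaction
            elif rng.random() < c:
                A[i][j] = rng.gauss(0.0, sigma)  # interspecies interaction
    return A


def growth_rates_for_equilibrium(A, x_d):
    """Pick r = -A x_d so that x_d is an equilibrium of the plain GLV model."""
    return [-sum(a * x for a, x in zip(row, x_d)) for row in A]


RNG = random.Random(0)                 # fixed seed for reproducibility
N_SPECIES = 100
A_RANDOM = random_interaction_matrix(N_SPECIES, c=0.1, sigma=0.2, rng=RNG)
X_DESIRED = [1.0] * N_SPECIES          # hypothetical desired state
GROWTH = growth_rates_for_equilibrium(A_RANDOM, X_DESIRED)
```

With the nonlinear terms of Eq. (5) included, the growth rates would have to compensate for those terms as well; Supplementary Note 8 describes the construction actually used.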
To quantify the success of our control framework on a particular community, we generate 300 initial species abundances that are uniformly distributed at a distance d > 0 from the desired state (distance is measured using the Euclidean norm). Then, the success rate of our control framework at distance d is defined as the proportion of those initial conditions that are driven to the desired state only when the linear MPC is applied to a minimal set of driver species of the community (Fig. 4b-d). Namely, the success rate discards all initial conditions that naturally evolve to the desired state. Finally, we calculated the mean success rate by averaging the success rate over 100 randomly constructed ecological networks (see items 7 and 8 of Supplementary Note 8 for details).
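The definition of the success rate can be made concrete with a short Python sketch. It assumes that initial conditions are drawn uniformly on the sphere of radius d around the desired state by normalizing a Gaussian vector, and that the outcome of each simulation is already available as a Boolean; the function names are hypothetical and the sketch does not enforce non-negative abundances.

```python
import math
import random


def sample_at_distance(x_d, d, rng):
    """Point at Euclidean distance d from x_d, in a uniformly random direction."""
    direction = [rng.gauss(0.0, 1.0) for _ in x_d]
    norm = math.sqrt(sum(v * v for v in direction))
    return [x + d * v / norm for x, v in zip(x_d, direction)]


def success_rate(reached_with_control, reached_without_control):
    """Fraction of initial conditions that reach the desired state only under control.

    Initial conditions that converge without any control are not counted as successes.
    """
    hits = sum(1 for with_c, without_c
               in zip(reached_with_control, reached_without_control)
               if with_c and not without_c)
    return hits / len(reached_with_control)


# Usage with made-up outcomes for four initial conditions.
EXAMPLE_RATE = success_rate([True, True, False, True],
                            [False, True, False, False])
```

In this toy usage two of the four initial conditions are rescued by the controller, so EXAMPLE_RATE equals 0.5.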
The mean success rate of our control framework changes with the distance to the desired state, being close to 1 for small distances regardless of the parameters of the microbial community (Fig. 4e-f). This result agrees well with the theoretical prediction that success is guaranteed provided that the distance to the desired state is small enough. We next investigated how the success rate changes with the distance d for different interspecies interaction strengths, and for different connectivities of the ecological network underlying the community. The success rate decreases as the interspecies interaction strength increases, especially for large distances (Fig. 4e). Since increasing the interspecies interaction strength damages the stability of the population dynamics44, this result suggests that microbial communities become “harder” to control as they lose stability. The success rate of our control framework is also higher in microbial communities whose ecological networks have lower connectivity (Fig. 4f). Note that, in general, the size of a minimal set of driver species decreases as the network connectivity increases. Therefore, these observations suggest that the success rate may increase as the number of driver species increases. Indeed, regardless of the distance to the desired state, we find that our control framework attains a success rate > 0.8 provided that we drive at least 6 of the 100 species (Fig. 4g). This last result also suggests that the success rate of our control framework can be enhanced by directly controlling a few additional species.
Finally, we investigated the robustness of our control framework to errors in the ecological network used both for identifying the driver species and for calculating the linear MPC. Note that, although structural accessibility is insensitive to missing interactions in the ecological network, the calculated linear MPC is not. Additionally, structural accessibility can be lost if some interactions included in the ecological network do not really exist. To introduce errors in the ecological network, we randomly rewire each of its edges with probability p ∈ [0, 1]. This rewiring probability determines the percentage of error introduced to the ecological network (e.g., p = 0.05 corresponds to a 5% error). Our control framework is robust to these errors, in the sense that the success rate deteriorates but remains larger than zero despite large errors (Fig. 4h). However, just a 5% error decreases the success rate by about 30%. This result illustrates that our framework is feasible for controlling large microbial communities provided we have an accurate map of their ecological networks.
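One possible reading of the rewiring procedure is sketched below in Python. The details are assumptions made for illustration: each interspecies edge is moved, with probability p, to a uniformly chosen empty off-diagonal position while keeping its weight, and self-loops are never touched. The paper's exact rewiring rule may differ, and the names rewire and count_edges are hypothetical.

```python
import random


def count_edges(A):
    """Number of non-zero off-diagonal entries (interspecies interactions)."""
    n = len(A)
    return sum(1 for i in range(n) for j in range(n)
               if i != j and A[i][j] != 0.0)


def rewire(A, p, rng):
    """Return a copy of A in which each interspecies edge is moved with probability p."""
    n = len(A)
    B = [row[:] for row in A]
    edges = [(i, j) for i in range(n) for j in range(n)
             if i != j and A[i][j] != 0.0]
    for i, j in edges:
        if rng.random() < p:
            empty = [(k, l) for k in range(n) for l in range(n)
                     if k != l and B[k][l] == 0.0]
            if not empty:
                continue                # fully connected: nowhere to move the edge
            k, l = rng.choice(empty)
            B[k][l] = B[i][j]           # keep the weight, change the endpoints
            B[i][j] = 0.0
    return B


# Toy six-species ring with -1 self-loops, rewired with p = 0.5.
A_TRUE = [[-1.0 if i == j else (0.5 if (i + 1) % 6 == j else 0.0)
           for j in range(6)] for i in range(6)]
A_WITH_ERRORS = rewire(A_TRUE, 0.5, random.Random(1))
```

Because edges are moved rather than added or deleted, the number of interactions is preserved, so the error level is controlled by p alone.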
APPLICATION
Mapping the ecological network of a microbial community allows us to identify its driver species. We first considered the gut microbiota of germ-free mice that are pre-colonized with a mixture of human commensal bacterial type strains and then infected with Clostridium difficile spores22. We identified a minimal set of five driver species in this 14-species community: R. obeum (x1), P. mirabilis (x12), B. ovatus (x2), C. ramosum (x6) and A. muciniphila (x10), see Fig. 5a. We also used the ecological network underlying the core microbiota of the sea sponge Ircinia oros23, finding ten driver species in this twenty-species community (Fig. 5b).
We studied by simulation the efficacy of the identified driver species and the linear MPC method for these two microbial communities, assuming that their dynamics are uncertain (see Supplementary Note 7 for details of the dynamics used for the simulation). For the mouse gut microbiota, our framework succeeds in driving the whole community from an initial state where Clostridium difficile is overabundant towards a desired state with a better balance of species (Figs. 5c and 5d). Similar results were obtained for controlling the core microbiota of Ircinia oros, using the ten identified driver species to drive the twenty species constituting this microbial community (Figs. 5e and 5f). The success of our control framework shows again that the linear MPC method is robust enough to drive microbial communities despite the presence of the perturbations (w_x, w_u).
DISCUSSION
An influential method to understand and manage complex ecosystems has been identifying species with a “big impact” on the entire ecosystem, leading to notions such as keystone45, 46 or core47 species. In general, the keystone or core species of an ecosystem are not necessarily its driver species. For example, the driver species of an ecosystem do not depend on their abundance, while the definition of keystone species does depend on the abundance: keystone species are those whose removal causes a disproportionate deleterious effect relative to their abundance45.
It was suggested that the notion of controllability (the ability to drive a system between any two states) could help predict the success of ecosystem management strategies48. For microbial communities and many other biological systems, it is inadequate to use the notion of controllability because there are states that those systems cannot reach by their nature (e.g., those states corresponding to negative abundances). Additionally, since dynamic models for microbial communities and other complex ecosystems are nonlinear, uncertain, and often very difficult to infer, it is impossible to even test if those systems are controllable or not. The notion of structural accessibility at the basis of our framework overcomes these two limitations, generalizing the control-theoretic notion of accessibility32 to systems with uncertain dynamics and impulsive control inputs. As a result, our framework allows efficiently controlling microbial communities knowing only their underlying ecological networks. We note that our framework can be used to identify minimal sets of “driver variables” for biological systems beyond microbial communities when their underlying networks are known. For this, we just need to choose the adequate base model49 for each class of system. For example, we identified a single “driver protein” in the repressilator50 (a synthetic three-gene regulatory network that generates sustained oscillations), allowing us to eliminate those oscillations (Supplementary Note 8 and Fig. S2).
In this paper, we used a maximum-matching-based algorithm to identify a minimum set of driver species from the ecological network of a given microbial community. In principle, there could be multiple maximum matchings associated with the same network, yielding potentially different minimum sets of driver species. Note that those minimum sets of driver species share the same cardinality. We claim that a minimum set of driver species is optimal only in the sense that its cardinality is minimal. If the cost of choosing each species as a driver species is known, one can develop a combinatorial optimization scheme to select the best set of driver species. This is beyond the scope of the current work and is left for future work.
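The non-uniqueness of maximum matchings can be illustrated with a small Python sketch. It uses the classic rule from linear structural controllability, in which nodes left unmatched by a maximum matching are taken as driver nodes. This rule is an assumption made for illustration only: the structural accessibility conditions of this paper may require fewer driver species, and the code below is not the authors' Julia implementation. The toy network and the function names are hypothetical.

```python
def maximum_matching(n, edges):
    """Kuhn's augmenting-path algorithm on the bipartite view of a directed graph.

    edges holds (source, target) pairs; returns matched_by, where matched_by[v]
    is the source matched to target v, or None if v is unmatched.
    """
    successors = [[] for _ in range(n)]
    for src, dst in edges:
        successors[src].append(dst)
    matched_by = [None] * n

    def try_augment(src, seen):
        for dst in successors[src]:
            if dst in seen:
                continue
            seen.add(dst)
            if matched_by[dst] is None or try_augment(matched_by[dst], seen):
                matched_by[dst] = src
                return True
        return False

    for src in range(n):
        try_augment(src, set())
    return matched_by


def unmatched_nodes(n, edges):
    """Nodes that are not the target of any matching edge."""
    return {v for v, src in enumerate(maximum_matching(n, edges)) if src is None}


# Toy network: node 0 points to nodes 1 and 2 (no self-loops).
DRIVERS_A = unmatched_nodes(3, [(0, 1), (0, 2)])
DRIVERS_B = unmatched_nodes(3, [(0, 2), (0, 1)])
```

Listing the same two edges in a different order leads the algorithm to a different maximum matching, so DRIVERS_A and DRIVERS_B differ while having the same size, which mirrors the remark that minimum sets of driver species share the same cardinality.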
Rather counterintuitively, our mathematical formalism shows that increasing the complexity of the community’s population dynamics (measured by the size of the deformation) can only reduce the number of necessary driver species. In practice, however, increasing the complexity of the dynamics could render the design of the control strategies more difficult. Note that, in general, the design of control strategies can be expected to become more difficult as the number of driver species used decreases (see Remark 9 in Supplementary Note 5). Additionally, we note that although the minimal number of driver species decreases as the ecological network becomes denser, this condition is only sufficient. Indeed, the minimal number of driver species of a microbial community should be mainly determined by the degree distribution of the ecological network, since the maximum matching size of a directed network is largely determined by its degree distribution51.
For large communities with uncertain controlled population dynamics, we calculated the control actions using a linear prediction model with an infinite horizon. More sophisticated control algorithms, such as those based on reinforcement learning52 (RL), could provide better performance. Note that RL algorithms typically require specifying a priori the “driver variables” they can actuate53. Our characterization of minimal sets of driver species should help to efficiently apply RL methods for controlling microbial communities and other biological systems. In practice, the performance of the control algorithms can also be improved by using more detailed models that incorporate the dynamics of the susceptibility of species to the control actions (e.g., the pharmacokinetics of prebiotics). In such a case, different control actions could be modeled by different pairs {f, g} in Eqs. (2) or (3), making the conditions for the absence of autonomous elements different for continuous and impulsive control actions. We note that altering the ecological network of a microbial community or obtaining a “simplified” network, in the spirit of Refs.54 and 55, respectively, could be an alternative and complementary approach to controlling microbial communities (e.g., to reduce the number of necessary driver species).
Note also that our deterministic framework does not consider the effects of stochasticity due to, e.g., immigration in microbial communities. From a theoretical viewpoint, incorporating stochastic effects into the model will turn Eqs. (2) and (3) into controlled stochastic differential equations, which are the subject of a different scientific area. To the best of our knowledge, the characterization of the accessibility properties of that class of equations remains an open problem, and their analysis becomes intractable in practice. Indeed, the very notion of an autonomous element, which is the basis for the concept of accessibility, would need to be reformulated. We consider this to be beyond the scope of the current work and call for research by the control theory community in this area.
In conclusion, by identifying driver species, our framework shows that an accurate map of the ecological network underlying a microbial community opens the door to efficient and systematic control. The driver species can be identified despite missing interactions in the ecological network, but our methods to calculate the adequate control actions can be sensitive to them. The design of controllers that are robust to missing interactions will be a necessary step for controlling real microbial communities. To fully harvest the potential benefits of controlling microbial communities, a stronger synergy between microbiology, ecology, and control theory will be necessary.
Data availability
All the experimental datasets analyzed in this study are publicly available.
Code availability
A Julia implementation of the algorithm for identifying a minimal set of driver species, as well as all other functions necessary to reproduce the results of the paper, is provided at the GitHub repository: https://github.com/mtangulo/DriverSpecies.