Abstract
Microbial communities perform key functions for the host they associate with or the environment they reside in. Our ability to control those microbial communities is crucial for maintaining or even enhancing the well-being of their host or environment. But this potential has not been fully harvested due to the lack of a systematic method to control those complex microbial communities. Here we introduce a theoretical framework to rigorously address this challenge, based on the new notion of structural accessibility. This framework allows the identification of minimal sets of “driver species” through which we can achieve feasible control of the entire community. We apply our framework to control the core microbiota of a sea sponge and the gut microbiota of gnotobiotic mice infected with C. difficile. This control-theoretical framework fundamentally enhances our ability to effectively manage and control complex microbial communities, such as the human gut microbiota. In particular, the concept of driver species of a microbial community holds translational promise in the design of probiotic cocktails for various diseases associated with disrupted microbiota.
INTRODUCTION
Microbial communities (MCs) play vital roles in the well-being of their hosts or environment [1–4]. For example, the human gut microbiota, the aggregate of microorganisms that reside in our intestines, plays a very important role in human physiology and disease. Many gastrointestinal diseases such as inflammatory bowel disease, irritable bowel syndrome, and C. difficile infection, as well as a variety of non-gastrointestinal disorders as divergent as autism and obesity, have been associated with disrupted gut microbiota [5–7]. For soil microbiota, disruption may reduce the resistance of crops to diseases [8, 9]. For ocean microbiota, disruption may impact global climate by altering carbon sequestration rates in the oceans [3, 4]. Controlling these disrupted MCs to restore their healthy or normal states will be important for addressing a variety of challenges, including complex human diseases, global warming and sustainable agriculture [10–13].
In general, MCs can be affected by four types of control actions. Bacteriostatic agents and bactericides are antibiotics that decrease the abundance of targeted species* by inhibiting their reproduction or directly killing them, respectively [14]. Prebiotics are chemical compounds that selectively stimulate the growth of targeted species [15]. The fourth type of control action is transplantation, in which a certain combination of species (e.g., all species from a “healthier” MC) is introduced to an MC at discrete intervention time instants. For the human gut microbiota, probiotics administration [16] and fecal microbiota transplantations (FMTs) [17] are examples of transplantations. To date, the empirical use of those control actions has already shown efficacy in restoring disrupted MCs. For example, transplanting adequate soil species (i.e., soil inoculation) has been shown to be the key to successful restoration of terrestrial ecosystems [18]. FMT is so far the most successful therapy for treating patients with recurrent C. difficile infection [17]. But in order to harvest their full potential for restoring healthy MCs from disrupted ones, a systematic and rational control design method is required [10, 11, 19].
There are two fundamental challenges. First and foremost, we lack an efficient algorithm to identify minimal sets of species, which we call driver species, whose control can help us steer the whole community to desired states. Consequently, we don’t quite understand why microbiota transplantation works for some conditions but fails for others. Second, even if we have those driver species in hand, microbial dynamics are typically complicated and highly uncertain, rendering the systematic design of control strategies extremely difficult. Here, we develop a new control-theoretic framework that systematically addresses those two challenges. We first define the new notion of structural accessibility and derive a graph-theoretical characterization of it. This enables us to efficiently identify minimal sets of driver species of any MC purely from the topology of its underlying ecological network. Once the driver species are identified, the notion of structural accessibility also allows us to systematically design control strategies that steer an MC towards desired states. We demonstrate our framework in restoring the core microbiota of a sea sponge and the gut microbiota of gnotobiotic mice infected by C. difficile.
PROBLEM STATEMENT
Define the state of an MC as the abundance profile $x(t) \in \mathbb{R}^n$ of its n species. Consider that host or environmental factors influencing the MC remain constant during the time interval in which the control is to be performed (see Remark 1 in SI-1.2). Then, the evolution of the state along time t can be described by a general population dynamics model in the form of the ordinary differential equation
$$\dot{x}(t) = f(x(t)), \qquad (1)$$
where the function $f: \mathbb{R}^n \to \mathbb{R}^n$ models the intrinsic growth and inter/intra-species interactions of the n species in the MC.
The exact functional form of f is often unknown because species can interact via a multitude of mechanisms [20], forming complex ecological networks [21] that give rise to various population dynamics models even at the scale of two species [22]. This leads us to only assume that (i) the topology of the underlying ecological network of the MC is known; and (ii) f is some meromorphic function (i.e., its n entries $f_i(x)$ are quotients of analytic functions of x). The ecological network $\mathcal{G}$ of an MC has one state node $x_i \in X$ per species and an edge $(x_j \to x_i)$ iff the j-th species directly promotes or inhibits the growth of the i-th one (Fig. 1A). Those interactions can be directly inferred from co-culture experiments [21, 23], or indirectly inferred from time-resolved metagenomics data using system identification techniques [24, 25]. Assumption (ii) is very mild and is satisfied by most population dynamics models (Remark 2 in SI-1), including the classical Generalized Lotka-Volterra (GLV) model: $f(x) = \operatorname{diag}(x)\,[Ax + r]$, with $A \in \mathbb{R}^{n \times n}$ and $r \in \mathbb{R}^n$ the inter-species interaction matrix and the intrinsic growth rate vector, respectively [17, 19, 21, 26–32].
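As an illustration of the GLV model and of how the edges of the ecological network can be read from the zero-nonzero pattern of the interaction matrix, consider the following minimal sketch. The three-species matrix `A`, the growth rates `r`, and the function names are hypothetical and chosen only for illustration; species are indexed from 0, and `A[i][j]` is assumed to encode the effect of species j on species i.

```python
def glv_rhs(x, A, r):
    """Right-hand side f(x) = diag(x) (A x + r) of the GLV model."""
    n = len(x)
    return [x[i] * (sum(A[i][j] * x[j] for j in range(n)) + r[i]) for i in range(n)]


def ecological_edges(A):
    """Edges (j, i), meaning species j directly affects species i (A[i][j] != 0)."""
    n = len(A)
    return {(j, i) for i in range(n) for j in range(n) if A[i][j] != 0}


# Hypothetical interaction matrix and growth rates (illustrative values only).
A = [[-1.0, 0.0, 0.5],
     [0.3, -1.0, -0.2],
     [0.0, 0.0, -1.0]]
r = [1.0, 0.5, 0.8]
x = [0.2, 0.4, 0.6]

dx = glv_rhs(x, A, r)        # instantaneous growth of each species
edges = ecological_edges(A)  # includes self-loops from the diagonal of A
```

In this sketch the nonzero diagonal of `A` yields one self-loop per species, which reflects intra-species interactions.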
Consider now m different control inputs $u(t) \in \mathbb{R}^m$ applied to an MC with population dynamics as in Eq. (1), aiming to steer it from an initial state $x_0$ towards a desired state $x_f$ (Fig. 1B). To describe which species are actuated by (i.e., directly susceptible to) the control inputs, we introduce the controlled ecological network $\mathcal{G}_c$. This network contains the same set of state nodes X and inter-species interaction edges as the ecological network $\mathcal{G}$, as well as an additional set of control nodes $U = \{u_1, \dots, u_m\}$, one per control input, and edges of the form $(u_j \to x_i)$ denoting that the i-th species is actuated by the j-th control input (Fig. 1A).
We consider two control schemes to analyze the effect of the control inputs on the state of the MC. The first control scheme models a combination of prebiotics ($u_i(t) > 0$) and bacteriostatic agents ($u_i(t) < 0$) as continuous control signals modifying the growth of certain species (Fig. 1C):
$$\dot{x}(t) = f(x(t)) + g(x(t))\,u(t). \qquad (2)$$
The second control scheme models a combination of probiotic administrations or microbiota transplantations ($u_i(t) > 0$) and bactericides ($u_i(t) < 0$) as impulsive control signals (Fig. 1D), which instantaneously modify the abundance of the actuated species at discrete time instants $t_k$, $k = 0, 1, 2, \dots$:
$$\dot{x}(t) = f(x(t)), \quad t \neq t_k; \qquad x(t^+) = x(t) + g(x(t))\,u(t), \quad t = t_k. \qquad (3)$$
Here $x(t^+)$ denotes the value of x “right after time t”. Notice how the state x(t) “jumps” if u(t) ≠ 0 because of the instantaneous increase or decrease of the abundance of the actuated species (Fig. 1D). For simplicity’s sake, we assume that the control signals are periodically applied every τ > 0 time units (i.e., $t_{k+1} = t_k + \tau$ for all k).
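The impulsive scheme can be sketched numerically as follows. This is only an illustrative forward-Euler simulation under stated assumptions: the function name `simulate_impulsive`, the step size `dt`, the single-species logistic dynamics, and the constant dose are all hypothetical and are not taken from the model used in the paper. Between impulses the state follows the uncontrolled dynamics, and at each instant $t_k = k\tau$ the state jumps by $g(x)u$.

```python
def simulate_impulsive(f, g, u, x0, tau, t_end, dt=1e-3):
    """Integrate dx/dt = f(x) between impulses; at t_k = k*tau set x <- x + g(x) u(x)."""
    x = list(x0)
    steps_per_impulse = max(1, round(tau / dt))
    n_steps = round(t_end / dt)
    for step in range(n_steps):
        if step % steps_per_impulse == 0:  # impulse instant t_k
            G, uk = g(x), u(x)
            x = [x[i] + sum(G[i][j] * uk[j] for j in range(len(uk)))
                 for i in range(len(x))]
        fx = f(x)
        x = [x[i] + dt * fx[i] for i in range(len(x))]  # forward Euler step
    return x


# Hypothetical single species with logistic growth and a fixed "probiotic" dose.
logistic = lambda x: [x[0] * (1.0 - x[0])]
unit_g = lambda x: [[1.0]]
dose = lambda x: [0.1]

x_end = simulate_impulsive(logistic, unit_g, dose, [0.05], tau=1.0, t_end=5.0)
```

Passing a state-dependent `u` turns the same loop into an impulsive feedback controller; a negative entry of `u` would play the role of a bactericide.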
In both control schemes, the entry $g_{ij}(x)$ of the susceptibility matrix $g(x) \in \mathbb{R}^{n \times m}$ describes the susceptibility of the i-th species to the j-th control input. If $g_{ij} \not\equiv 0$, then the i-th species is actuated by the j-th control input. The detailed mechanism of the susceptibility of a species to certain control actions is typically complicated and not exactly known, so we only assume that g is some unknown meromorphic function.
In the trivial case when each species is directly actuated by an independent control input (e.g., g(x) is the identity matrix $I_{n \times n}$), the state of the MC can of course be fully controlled, indicating that the set of all species is a trivial set of driver species. This control strategy is certainly overkill, since it requires as many controls as species in the MC. Next we show that knowledge of the topology of the ecological network can actually be exploited to significantly reduce the number of driver species.
RESULTS
Given a particular pair of meromorphic functions {f, g} in Eqs. (2) or (3), the topology of its controlled ecological network $\mathcal{G}_{f,g}$ is defined as follows: it contains the edge $(x_j \to x_i)$ iff $x_j$ appears in the right-hand side of $\dot{x}_i$ for continuous control, or in the right-hand side of $\dot{x}_i$ or $x_i(t^+)$ for impulsive control. For both control schemes, it contains the edge $(u_j \to x_i)$ iff $g_{ij} \not\equiv 0$. Knowing only the topology of the controlled ecological network $\mathcal{G}_c$ of the MC, we are led to consider the class $\mathfrak{D}$ of all controlled dynamics of the form of Eqs. (2) or (3) with the same network topology (i.e., all meromorphic f’s and g’s such that $\mathcal{G}_{f,g} = \mathcal{G}_c$, see SI-3.1).
Structural accessibility
To identify minimal sets of driver species, we introduce the notion of structural accessibility of the class $\mathfrak{D}$ of controlled dynamics that share the same controlled ecological network. The class $\mathfrak{D}$ is structurally accessible if at least one of its pairs {f, g} lacks autonomous elements, that is, internal variables of the system, involving certain combinations of species, that are completely unaffected by the control inputs (Definitions 1 and 3 in SI-2). We further proved that this condition implies that almost all other pairs in $\mathfrak{D}$ lack autonomous elements as well (Theorem 3 in SI-3). The absence of autonomous elements guarantees that the dimension of the set of reachable states equals the dimension of the state space itself, so the control actions are effective enough to locally steer and independently manipulate the whole state of the MC (Fig. 2A). Therefore, if $\mathfrak{D}$ is structurally accessible, its corresponding set of actuated species is defined as a set of driver species, and the condition of structural accessibility of $\mathfrak{D}$ can be used to determine minimal sets of driver species.
By contrast, the class $\mathfrak{D}$ is not structurally accessible if all its pairs have autonomous elements. This implies that, for all population dynamics f and all susceptibility matrices g, the set of states that can be reached by using some control input u is constrained to a low-dimensional manifold, representing an underlying constraint between some species abundances that the control inputs cannot break (Fig. 2B). Consequently, those actuated species (encoded in the susceptibility matrix g) cannot be a set of driver species.
We also say that a pair {f, g} is structurally accessible if the class $\mathfrak{D}$ it belongs to is structurally accessible. We proved that if a pair {f, g} is structurally accessible but does have autonomous elements, then there always exists an infinitesimal deformation that can be made to {f, g} such that the new pair is free of autonomous elements (Proposition 1 in SI-3). If one had originally chosen {f, g} as an adequate dynamics model for a controlled MC, the new pair free of autonomous elements is as adequate as the original one, since it predicts essentially the same temporal behavior of the MC. Therefore, in practice, for a structurally accessible class $\mathfrak{D}$ of controlled microbial dynamics models, those pairs with autonomous elements can always be circumvented.
Characterizing the structural accessibility of the class $\mathfrak{D}$ requires us to define the notion of autonomous element for impulsive control systems for the first time (Definition 3 in SI-2), since this notion has been studied only for systems with continuous control inputs [33]. We then characterized the conditions for the absence of autonomous elements for a given pair {f, g}, finding that, surprisingly, such conditions are identical for systems with continuous and impulsive control (Theorem 2 and Remark 3 in SI-2). This indicates that, for controlling MCs, impulsive control actions can be as effective as continuous ones, making the use of continuous control unnecessary. Continuous control is in any case hard to implement for many MCs, especially the human gut microbiota. Note that other existing approaches that could steer an MC towards desired states, such as “clamping” the abundance of the species in the so-called Feedback Vertex Set of the ecological network to their values at the desired state [34], also require continuous actuation. This result encourages the further development of microbiome-based therapies such as probiotic cocktails and FMTs in an impulsive control manner.
Finally, we used the above conditions to derive a graph-theoretical characterization of the structural accessibility of the class $\mathfrak{D}$ (Theorem 5 in SI-3). We proved that $\mathfrak{D}$ is structurally accessible if and only if (i) each state (species) node in the controlled ecological network $\mathcal{G}_c$ is the end-node of a path that starts in the control input node set U; and (ii) $\mathcal{G}_c$ does not have pure dilations of the control inputs. Let S ⊆ X be a set of state nodes and denote its neighborhood set by T(S) ⊆ X ∪ U (i.e., the set of all nodes that point to S). The controlled ecological network $\mathcal{G}_c$ has a pure dilation of the control inputs (Definition 8 in SI-3) if there exists a set S of state nodes such that T(S) only contains control input nodes (i.e., T(S) ⊆ U) and the size of T(S) is smaller than the size of S, i.e., |T(S)| < |S| (Fig. 2B).
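The two graph conditions above can be checked with a short sketch such as the following. It assumes species are indexed 0..n-1, that `edges` contains pairs `(j, i)` for an edge from species j to species i, and that `actuation` contains pairs `(k, i)` meaning control input k actuates species i; these names and the data layout are hypothetical. The pure-dilation test is reformulated here as a bipartite matching problem via Hall's theorem, which is a choice made for this sketch rather than a procedure stated in the text.

```python
def reachable_from_controls(n, edges, actuation):
    """Condition (i): every state node ends a path that starts at a control node."""
    succ = {j: [] for j in range(n)}
    for j, i in edges:
        succ[j].append(i)
    seen = {i for _, i in actuation}
    stack = list(seen)
    while stack:
        j = stack.pop()
        for i in succ[j]:
            if i not in seen:
                seen.add(i)
                stack.append(i)
    return len(seen) == n


def has_pure_dilation(n, edges, actuation):
    """Condition (ii): is there a set S with T(S) inside U and |T(S)| < |S|?"""
    has_state_parent = {i for _, i in edges}  # self-loops count as state parents
    ctrl_parents = {i: set() for i in range(n)}
    for k, i in actuation:
        ctrl_parents[i].add(k)
    # Only nodes without state in-neighbours can belong to such a set S.
    candidates = [i for i in range(n) if i not in has_state_parent]
    match = {}  # control node -> state node it is matched to

    def augment(i, visited):
        for k in ctrl_parents[i]:
            if k not in visited:
                visited.add(k)
                if k not in match or augment(match[k], visited):
                    match[k] = i
                    return True
        return False

    # Hall's theorem: a pure dilation exists iff some candidate cannot be matched.
    return any(not augment(i, set()) for i in candidates)


def is_structurally_accessible(n, edges, actuation):
    return (reachable_from_controls(n, edges, actuation)
            and not has_pure_dilation(n, edges, actuation))


# Hypothetical 3-species network: species 2 affects 0 and 1, species 0 affects 1.
toy_edges = {(2, 0), (2, 1), (0, 1)}
ok = is_structurally_accessible(3, toy_edges, [(0, 2)])            # one input on species 2
bad = is_structurally_accessible(2, set(), [(0, 0), (0, 1)])       # one input, two isolated species
```

In the second call a single control input points to two species with no other in-neighbors, so |T(S)| = 1 < |S| = 2 and the check fails.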
Finding minimal sets of driver species
The above graph-theoretical characterization of structural accessibility provides a systematic method to identify minimal sets of driver species for any MC that has a known ecological network and meromorphic functions for its controlled population dynamics model. Algorithmically, a minimal set of driver species can be obtained by choosing one node in each of the root Strongly Connected Components (SCCs) of the ecological network $\mathcal{G}$ (Fig. 2A). Here, an SCC of $\mathcal{G}$ is a maximal subgraph such that there is a directed path between any two of its nodes. A root SCC is an SCC without incoming edges. If one node per root SCC is actuated by at least one independent control input, the resulting controlled ecological network cannot have pure dilations of the control inputs, rendering the set of all possible controlled dynamical systems structurally accessible. Note that the SCCs of general directed graphs can be found in linear time [35, pp. 552-557].
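The root-SCC selection can be sketched as follows, using a recursive version of Tarjan's algorithm for the SCCs. The function names, the 0-based species indexing, and the convention that `edges` contains pairs `(j, i)` for an edge from species j to species i are assumptions of this sketch; the recursion is suitable only for small networks.

```python
def strongly_connected_components(n, edges):
    """Tarjan's algorithm; returns a list of sets of node indices."""
    succ = {v: [] for v in range(n)}
    for j, i in edges:
        succ[j].append(i)
    index, low, on_stack, stack, sccs = {}, {}, set(), [], []

    def visit(v):
        index[v] = low[v] = len(index)
        stack.append(v)
        on_stack.add(v)
        for w in succ[v]:
            if w not in index:
                visit(w)
                low[v] = min(low[v], low[w])
            elif w in on_stack:
                low[v] = min(low[v], index[w])
        if low[v] == index[v]:  # v is the root of an SCC
            comp = set()
            while True:
                w = stack.pop()
                on_stack.discard(w)
                comp.add(w)
                if w == v:
                    break
            sccs.append(comp)

    for v in range(n):
        if v not in index:
            visit(v)
    return sccs


def minimal_driver_species(n, edges):
    """Pick one node (the smallest index) from each SCC without incoming edges."""
    sccs = strongly_connected_components(n, edges)
    comp_of = {v: c for c, comp in enumerate(sccs) for v in comp}
    has_incoming = {comp_of[i] for j, i in edges if comp_of[j] != comp_of[i]}
    return sorted(min(comp) for c, comp in enumerate(sccs) if c not in has_incoming)


# Hypothetical networks (illustrative only).
drivers_toy = minimal_driver_species(3, {(2, 0), (2, 1), (0, 1)})   # → [2]
drivers_mix = minimal_driver_species(4, {(0, 1), (1, 0), (1, 2)})   # → [0, 3]
```

Any node of a root SCC is an equally valid choice; taking the smallest index just makes the output deterministic. Self-loops do not affect the result, since they do not change the SCCs.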
Notice that once the controlled ecological network satisfies the conditions for structural accessibility, this cannot be undone by adding new edges to the network. Thus, for example, if we find the driver species of an ecological network that includes only the high-confidence interactions in an MC, the identified driver species remain driver species even if additional interactions exist. Note also that adding or removing self-loops does not change the driver species of the ecological network, since its SCCs remain unchanged.
Designing control signals that steer MCs to desired states
Next we discuss the design of the control signals u(t) that should be applied to those driver species to steer the entire MC towards some desired state. We will focus on feedback control signals u(x(t)), as they provide robustness against uncertainty in the system dynamics [36]. We consider two scenarios: when the controlled dynamics of the MC is approximately known, and when it is highly uncertain. In the first scenario, we show that the knowledge of the population dynamics can be used to “optimally” steer the MC towards a desired state. In the second scenario, the controller needs to be designed to cope with large uncertainties in the population dynamics of the MC.
If a structurally accessible pair {f, g} is known to adequately model the controlled population dynamics of the MC, then the absence of autonomous elements can be used to systematically build feedback controllers u(x(t)) that steer the system towards desired states using the so-called “feedback linearization” methodology [33, p. 119]. Such methodology yields controllers that steer the MC towards arbitrary desired states, except for a set of “singularities” of zero measure. In addition to steering the MC to desired states, the feedback controller can be designed to satisfy certain optimality conditions using a linear quadratic regulator [36]. For example, the controller can provide an optimal tradeoff between the convergence rate towards the desired state $x_f$ and the “control energy” that is spent (e.g., a proxy for the total abundance of species that is transplanted or eliminated by bactericides), by minimizing the quadratic index
$$J = \int_0^\infty \left[ z^\top(t)\, Q\, z(t) + v^\top(t)\, R\, v(t) \right] dt$$
for continuous control. For impulsive control, the above integral should be replaced by a sum over the intervention instants $t_k$. Here $z$ and $v$ are the linearized state and control inputs, respectively, produced by the feedback linearization methodology. The symmetric positive semi-definite matrices $Q$ and $R$ are design parameters: “large” R highly penalizes the control effort, while “large” Q enhances the convergence rate.
As a concrete example, consider the toy MC shown in Fig. 1. This small MC contains 3 species and the underlying ecological network is depicted in Fig. 1A. The network has only one root SCC, and this root SCC contains only one species, x3, corresponding to the only driver species that needs to be actuated. The controlled network satisfies the graph-theoretical criteria of structural accessibility. Hence the microbial system is structurally accessible. Indeed, a controller can be designed using the feedback linearization methodology (see Examples 1 and 2 in SI-2) to obtain a suitable control input u(t) (either continuous or impulsive) that can be applied to the driver species x3 to steer the whole MC towards the desired steady state (Fig. 1B). The obtained continuous (or impulsive) control input u(t) is shown in Fig. 1C (or Fig. 1D), respectively. The time evolution of the species abundances is also shown in Fig. 1C and Fig. 1D.
ℓ-structural accessibility
Without detailed knowledge of an adequate pair of functions {f, g} that models the controlled population dynamics of the MC, a systematic controller design for arbitrary MCs would be very challenging if not impossible [36, 37]. Indeed, the only general class of uncertain systems for which systematic controller design methods exist is linear systems. However, selecting driver species so that the class $\mathfrak{D}$ of controlled dynamics is structurally accessible does not always guarantee that those robust linear control techniques can be applied. This is because, given a structurally accessible class $\mathfrak{D}$, it is possible that only its nonlinear function pairs lack autonomous elements (Example 3 in SI-3). This implies that there exist states that cannot be reached using any linear controller, despite being infinitesimally close to the initial state of the MC.
To circumvent the above limitation and allow for the design of feedback controllers that can (at least locally) steer an arbitrary MC with uncertain dynamics, we introduce the notion of ℓ-structural accessibility. The class $\mathfrak{D}$ is said to be ℓ-structurally accessible if at least one of its linear pairs {Ax, B} is free of autonomous elements. We can also prove that this condition implies that almost all linear pairs in $\mathfrak{D}$ are free of autonomous elements (Proposition 7 in SI-3). Note that the zero-nonzero patterns of the matrices $A \in \mathbb{R}^{n \times n}$ and $B \in \mathbb{R}^{n \times m}$ are fully determined by the controlled ecological network $\mathcal{G}_c$. Note also that, for linear pairs {Ax, B}, the absence of autonomous elements is equivalent to their (linear) controllability [33], that is, the intrinsic ability to steer these linear systems between two arbitrary states, which can be verified by Kalman’s rank condition: $\operatorname{rank}[B, AB, A^2B, \dots, A^{n-1}B] = n$ [36].
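Kalman’s rank condition can be checked for a given pair (A, B) with a small sketch like the one below. It uses exact rational arithmetic from the Python standard library to avoid floating-point rank decisions; the helper names and the example matrices are hypothetical, with the nonzero pattern of `A` taken from an illustrative 3-species network without self-loops and unit weights assumed.

```python
from fractions import Fraction


def mat_mul(P, Q):
    """Product of two matrices stored as lists of rows."""
    return [[sum(P[i][k] * Q[k][j] for k in range(len(Q)))
             for j in range(len(Q[0]))] for i in range(len(P))]


def rank(M):
    """Rank by Gauss-Jordan elimination over exact fractions."""
    M = [[Fraction(v) for v in row] for row in M]
    rk = 0
    for col in range(len(M[0])):
        pivot = next((r for r in range(rk, len(M)) if M[r][col] != 0), None)
        if pivot is None:
            continue
        M[rk], M[pivot] = M[pivot], M[rk]
        for r in range(len(M)):
            if r != rk and M[r][col] != 0:
                factor = M[r][col] / M[rk][col]
                M[r] = [a - factor * b for a, b in zip(M[r], M[rk])]
        rk += 1
        if rk == len(M):
            break
    return rk


def is_controllable(A, B):
    """Kalman's condition: rank [B, AB, ..., A^(n-1) B] == n."""
    n = len(A)
    blocks, block = [], B
    for _ in range(n):
        blocks.append(block)
        block = mat_mul(A, block)
    C = [sum((blk[i] for blk in blocks), []) for i in range(n)]
    return rank(C) == n


# Hypothetical pair: species 2 affects 0 and 1, species 0 affects 1; input on species 2.
A = [[0, 0, 1],
     [1, 0, 1],
     [0, 0, 0]]
B = [[0], [0], [1]]
controllable = is_controllable(A, B)
```

Because ℓ-structural accessibility only requires one (and hence almost every) linear pair with the given zero-nonzero pattern to be controllable, a positive answer for a single numerical choice of weights is informative, whereas a negative answer for one particular choice is not conclusive.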
If the class $\mathfrak{D}$ is ℓ-structurally accessible and {f, g} is any pair that adequately models the population dynamics of the controlled MC, it is always possible to rewrite this pair as
$$f(x) = Ax + w_x(x), \qquad g(x) = B + w_u(x), \qquad (4)$$
where $(w_x, w_u) = (f - Ax,\; g - B)$ are some nonlinear functions, and the pair (A, B) is linearly controllable. The selection of (A, B) now becomes a part of the controller design, with better selections (i.e., those minimizing $w_x$ and $w_u$) leading to controllers that can steer the system to more remote states. Of course, the best selection would be the linearization of {f, g} at the initial state of the MC with zero control. This, however, requires exact knowledge of the uncertain {f, g}. To circumvent this requirement, we propose selecting some proxy (A, B) of this linearization; for example, the A matrix can simply be the weighted adjacency matrix of the ecological network. Note that, regardless of the selection, the controllability of (A, B) guarantees that optimal robust linear feedback controllers u = Kx can always be designed to locally steer the MC. Indeed, the feedback gain matrix $K \in \mathbb{R}^{m \times n}$ can be systematically designed using a variety of methodologies, such as linear quadratic regulators or $H_\infty$ control (see details in SI-4).
We proved that, with either continuous or impulsive control, the class $\mathfrak{D}$ is ℓ-structurally accessible iff it is structurally accessible and, additionally, its controlled ecological network has no dilations in the state nodes (Theorem 6 in SI-3). A dilation in the state nodes exists if there is a subset of state nodes S such that its neighborhood set T(S) ⊆ U ∪ X has smaller size than S. These conditions turn out to be equivalent to the conditions for linear structural controllability [38], which has recently received renewed attention in the context of network science [39, 40].
Finding minimal sets of ℓ-driver species
Minimal sets of ℓ-driver species can be efficiently found by solving the maximum matching problem on the directed graph of the ecological network [39]. Indeed, in addition to having a path that starts in U to each state node, it is necessary that X is covered by a disjoint union of cycles and paths (a.k.a. the cactus structure in linear structural control theory [38]). Notice that for general population dynamics models, each species node in the ecological network will usually have a self-loop due to the intrinsic growth and the intra-species interactions. Consequently, all state nodes are matched by themselves and the sets of ℓ-driver species and driver species coincide. This result suggests that, for most MCs, it might be enough to find their sets of driver species to design robust linear controllers.
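The matching step can be sketched as follows, using a simple augmenting-path (Kuhn) maximum matching in which a state node i is matched through an incoming edge from node j, and each node j can match at most one target. The function name and the edge convention `(j, i)` are hypothetical. The sketch only returns the nodes left unmatched, which must be actuated to remove dilations in the state nodes; the accessibility condition (one actuated node per root SCC) has to be enforced separately.

```python
def unmatched_species(n, edges):
    """State nodes left unmatched by a maximum matching of the directed graph."""
    preds = {i: [] for i in range(n)}
    for j, i in edges:
        preds[i].append(j)
    matched_by = {}  # source node j -> target node i matched through edge j -> i

    def augment(i, visited):
        for j in preds[i]:
            if j not in visited:
                visited.add(j)
                if j not in matched_by or augment(matched_by[j], visited):
                    matched_by[j] = i
                    return True
        return False

    return sorted(i for i in range(n) if not augment(i, set()))


# Hypothetical star network: species 0 affects species 1 and 2 (no self-loops).
star = {(0, 1), (0, 2)}
without_loops = unmatched_species(3, star)                              # → [0, 2]
with_loops = unmatched_species(3, star | {(0, 0), (1, 1), (2, 2)})      # → []
```

The second call illustrates the remark above: once every species has a self-loop, each node can be matched by itself, no node is left unmatched, and only the root-SCC condition remains.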
Using control to restore the gut microbiota of mice
We identified a minimal set of driver species and ℓ-driver species for the gut microbiota of germ-free mice that are pre-colonized with a mixture of human commensal bacterial type strains and then infected with C. difficile spores [24]. The underlying ecological network was inferred from time-resolved metagenomics data. This MC has fourteen strains (Fig. 3A), with R. obeum and P. mirabilis (x1 and x12) being the root SCCs of the ecological network, thus forming the minimal set of driver species of the community. As discussed earlier, if all species have intra-species interactions, this set of driver species would also be a set of ℓ-driver species.
To further validate our conclusion that a set of driver species or ℓ-driver species remains valid after adding new edges to the network, we consider finding a set of ℓ-driver species for the ecological network in which all self-loops are removed. This makes it necessary to add three more actuated species (for example, B. ovatus, C. ramosum and A. muciniphila, corresponding to {x2, x6, x10}) to obtain a disjoint union of cycles and paths covering all state nodes (Fig. 3A), providing a minimal set of ℓ-driver species. We validated by simulation the effectiveness of the above five identified ℓ-driver species for steering the system towards a desired state despite uncertainty in the controlled system dynamics and the presence of self-loops. To this end, the simulations use the GLV model, which implicitly adds a self-loop to each state node (see SI-5 for details on the dynamic model used for the simulation). To design the controller, we selected the A matrix in Eq. (4) as the weighted adjacency matrix of the ecological network without self-loops. The zero-nonzero pattern of B is determined by the selected driver species, while the values of its non-zero entries were randomly chosen. Linear optimal robust controllers were designed (see SI-4.4) and the resulting control signal was applied to the driver species to steer the system from an initial “diseased” state, in which C. difficile is overabundant compared to the rest of the species, towards a desired state with a more balanced abundance of species (Fig. 3B-D). The efficacy of the control signal also shows that the designed controller is robust enough to steer the system despite not accounting for the presence of self-loops.
Controlling the core microbiota of a sea sponge
We also identified and validated a minimal set of driver species and ℓ-driver species for controlling the core microbiota of the sea sponge Ircinia oros [25] (see SI-5 for details of the dynamic model used for the simulation). This MC has 20 species; 14 of them are driver species, and two more are needed to obtain a minimal set of ℓ-driver species (Fig. 3E-H).
DISCUSSION
The notion of structural accessibility we introduced here is a generalization of the notion of structural controllability of linear systems (see e.g. [38–40] and Remark 7 in SI-3). It is also a generalization, to systems with uncertain dynamics, of the control-theoretic notion of accessibility, which is defined as the absence of autonomous elements of a controlled system with known dynamics and is a keystone notion in nonlinear control [33].
Although it has been noted before that the notion of controllability could be used to predict the success of ecosystem management strategies [41], such a notion is not completely adequate for MCs and many other biological systems. First, for those systems, some states are unreachable simply due to their nature (e.g., states with negative abundances for MCs) and not due to ineffective control actions. Thus, it would be overkill to demand that the control actions provide controllability of the system. Second, precise dynamic models for some particular MCs might be unknown and very difficult to infer, making it impossible to analyze their controllability. The notion of structural accessibility that we introduced here overcomes those two limitations, opening the door for controlling other complex biological systems beyond MCs. Gene regulatory systems, for example, have highly uncertain and very nonlinear dynamics, but their underlying gene regulatory networks continue to be mapped in exquisite detail.
To achieve the ultimate goal of controlling MCs, there are still several steps to be taken to further enhance our control-theoretical framework. First of all, although structural accessibility is insensitive to missing edges in the ecological network of the MC, the design of feedback controllers is not. Hence, the systematic design of controllers that are robust to missing interactions will be necessary in order to control real MCs. Second, we could pose the design of controllers with certain constraints, such as only taking positive values. For impulsive control inputs, for example, this corresponds to constraining the control actions to be only microbiota transplantation or probiotic administration. Along this direction, our framework would provide the basis for a systematic design of probiotic cocktails that restore disrupted MCs.
In conclusion, by identifying driver species, our framework shows that correctly inferring the ecological networks underlying MCs could open the door for the systematic and rational design of control strategies to steer MCs towards desired states. It will be necessary to design robust control strategies that let us calculate adequate control signals despite large uncertainties in the dynamics of MCs. Therefore, in order to fully harvest the potential benefits of controlling microbial communities for ourselves and our environment, a stronger synergy between microbiology, ecology and control theory will be necessary.
Footnotes
* Electronic address: mangulo{at}im.unam.mx
† Electronic address: yyl{at}channing.harvard.edu
* In this paper the term “species” is used in the general context of ecology, i.e., a set of organisms adapted to a particular set of resources in the environment. It doesn’t necessarily represent the lowest major taxonomic rank. In fact, one could think of organizing microbes by strains, genera or operational taxonomic units as well.