Abstract
The dynamics of complex biological systems are driven by intricate networks, the current knowledge of which is often incomplete. Traditional systems biology modeling usually implements an ad hoc fixed set of differential equations with predefined functional forms. Such an approach often suffers from overfitting or underfitting and thus inadequate predictive power, especially when dealing with systems of high complexity. This problem could be overcome by deep neural networks (DNNs). Choosing the pattern formation of the gap genes in Drosophila early embryogenesis as an example, we established a differential equation model whose synthesis term is expressed as a DNN. The model fits the data perfectly and yields impressively accurate predictions of mutant patterns. We further mapped the trained DNN onto a simplified conventional regulation network, which is consistent with the existing body of knowledge. The DNN model could lay the foundation for an “in-silico embryo”, which can regenerate a great variety of interesting phenomena and on which one can perform all kinds of perturbations to discover underlying mechanisms. This approach can be readily applied to a variety of complex biological systems.
Introduction
The early embryogenesis of Drosophila is a well-studied model system in developmental biology, characterized by a rapid cascade of gene expression patterns1. Under the guidance of maternal-effect morphogens, a handful of gap genes form sophisticated spatial patterns across the embryo, serving as the blueprint for the future body plan. A large body of experimental and modeling effort has been devoted to uncovering the genetic interaction network and regulatory mechanisms underlying this pattern formation2–12, but mysteries still remain2,3,13.
Various mathematical models of gap gene expression have been constructed6–9. One kind of model8, as in most modeling approaches in systems biology, starts with a presumed network inferred from a body of experimental work, and/or simplified by the author’s opinion of what is important. Differential equations describing the rate of change of each gene’s expression are written down, with gene regulation modeled by specific mathematical functions, e.g. the Hill function. Recognizing that knowledge of gap gene regulation may be incomplete, another kind of model adopts a reverse engineering approach6,7. Genetic interactions are effectively expressed as a single-layer neural-network-like architecture, with no prior constraints on regulatory structures. Regulations then emerge from data fitting. Both kinds of modeling have had considerable success: certain important phenomena can be explained, gene expression data fitted, and the regulation network emerging from the second approach is broadly consistent with existing knowledge. However, these models have inadequate predictive power.
This weakness in predictive power is natural here: the complexity of real biological systems such as this one may well exceed the capacity of these kinds of models. For example, the expression of each gap gene is contributed by 2–5 regulatory modules9,14 (enhancers and shadow enhancers15,16), each of which is regulated differently, and dynamical switching can happen between different enhancers15. Within each module, the regulatory sequence usually bears 10–20 binding sites for different transcription factors17, with unknown cooperativity among them9,18. Furthermore, apart from the 4 gap genes (hunchback (hb), Krüppel (Kr), knirps (kni) and giant (gt)) on which most models and quantitative experiments focus, a number of other genes are very likely relevant to this process, as suggested by bioinformatics searches, and there may even be unknown factors, as suggested by experiment14. These and other unknown complexities may introduce strong nonlinearity into the equivalent regulation functions, making them almost impossible to express with predefined formulas.
This sort of dilemma is not uncommon when dealing with complex systems. On the one hand, we would like to simplify the system, but often have little idea how to simplify it, or whether it can in principle be simplified -- the models may easily be oversimplified. On the other hand, even if one manages to obtain equations with enough complexity, they typically contain too many parameters to avoid overfitting with a finite amount of data. In some cases, this problem can be alleviated by a recently developed adaptive modeling approach for dynamical systems19. But its applicability in more demanding situations, such as the spatiotemporal patterning here, has yet to be tested.
In this study, we try a different approach to this complex problem -- deep neural networks (DNNs)20–22. We hope that a DNN, instead of regulation equations with prefixed forms, can alleviate the dilemma of model capacity. For reasons not yet completely clear, neural networks have almost unlimited fitting power, yet hardly overfit even without any regularization techniques23. To a certain extent, a DNN is a kind of “self-adapting model”, adjusting its own capacity to fit while avoiding overfitting, thus overcoming the above-mentioned difficulty of traditional equation-based models. In a sense, our approach can be viewed as an upgraded version of the gene circuit models24, but the motivation, and thus the results, are different: instead of directly seeking a unique regulation network with prefixed regulation forms, we aim to mimic this complex system as accurately as possible, at the expense of using a black box. The DNN model is then validated with predictions of mutant patterns and can, in principle, be used as an “in-silico embryo” on which we can perform all kinds of perturbations, so as to discover possible underlying mechanisms in such an indirect manner.
Results
Model Setup
As the Drosophila embryo is at the syncytial stage when the gap gene patterns form (12th–14th nucleus cycle (nc)), the spatiotemporal dynamics of these expression patterns can in principle be described by equations with synthesis, degradation and diffusion terms. Since the diffusion constants of gap proteins are estimated to be ~1 μm²/s (around 10% embryo length within an hour)4,8,24, we neglect diffusion for simplicity (including diffusion does not improve the performance). The dynamic equations are:

∂gi(x, t)/∂t = Fi(g(x, t), m(x)) − λ gi(x, t),  (1)

where λ is the shared degradation rate.
Here, gi(x, t) stands for the expression level of gap gene i at spatial grid point x and time step t. Four gap genes, hb, Kr, kni and gt, are considered. mi(x) denotes the maternal morphogens, which are viewed as stable inputs throughout the relevant time period. The degradation rates of all 4 gap genes are set to be the same, as a single trainable parameter; thus all the regulation must be contained in the synthesis term Fi, which gives the synthesis rate of gap gene i from the current local expression levels of the gap genes and maternal morphogens. Assuming no prior knowledge of the regulation network or the functional form of Fi, we use a 4-layer fully connected neural network to represent Fi (i = 1, 2, 3, 4). Solving Eq. 1 numerically is then equivalent to a recurrent architecture with F as the recurrent block (Fig. 1).
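To make the recurrent structure concrete, the following is a minimal sketch of how the dynamic equations unroll into a recurrent computation at one spatial position. The one-hidden-layer network, random weights, step size, and rate constants here are illustrative stand-ins for the actual trained 4-layer DNN, not the paper’s implementation:

```python
import math
import random

random.seed(0)

N_GENES, N_MAT, HIDDEN = 4, 3, 8   # 4 gap genes, 3 maternal inputs
DT, STEPS = 0.5, 20                # illustrative Euler step and step count
LAMBDA = 0.1                       # shared degradation rate (trainable in the model)

# Hypothetical stand-in for the synthesis-term DNN: one hidden layer with
# random weights is enough to show the recurrent structure.
W1 = [[random.gauss(0, 0.5) for _ in range(N_GENES + N_MAT)] for _ in range(HIDDEN)]
W2 = [[random.gauss(0, 0.5) for _ in range(HIDDEN)] for _ in range(N_GENES)]

def synthesis(g, m):
    """F_i(g, m): synthesis rate of each gap gene from local concentrations."""
    x = g + m
    h = [math.tanh(sum(w * v for w, v in zip(row, x))) for row in W1]
    # softplus keeps synthesis rates non-negative
    return [math.log(1 + math.exp(sum(w * v for w, v in zip(row, h)))) for row in W2]

def simulate(g0, m):
    """Unroll the dynamics with explicit Euler steps (the recurrent architecture)."""
    g = list(g0)
    for _ in range(STEPS):
        f = synthesis(g, m)
        g = [gi + DT * (fi - LAMBDA * gi) for gi, fi in zip(g, f)]
    return g

# zero initial gap gene levels, fixed maternal inputs at this position
g_final = simulate([0.0, 0.0, 0.0, 0.0], [0.8, 0.3, 0.1])
```

Training would then backpropagate through the unrolled steps, exactly as for a recurrent network.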
Maternal factors Bicoid (Bcd), Caudal (Cad) and Torso-like (Tsl) are selected as explicit maternal input patterns1. Among them, the Cad pattern is assumed to be uniquely determined by Bcd, as suggested by both biological knowledge and most previous models (Eq. S1)25. Another important maternal-effect gene, Nanos (Nos), is assumed to take effect purely by shaping the initial condition of Hb26,27. The other gap genes all start from zero initial conditions (see Supplementary Information S1 for details).
The loss function for training is set to be the Euclidean distance between a selected set of experimental data Gi(x, t) and the corresponding model patterns gi(x, t), for wild-type (wt) and/or mutant (mut) systems.
Thus, the network F (the synthesis term) is trained to form the desired patterns from the given initial conditions and maternal inputs (see Supplementary Information S3 for details).
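A minimal sketch of this objective follows; the frame layout (flat lists of expression values per frame) is our illustrative choice, not the actual implementation in Supplementary S3:

```python
import math

def pattern_loss(model_frames, data_frames):
    """Euclidean distance between model patterns g_i(x, t) and the selected
    experimental frames G_i(x, t); each frame is a flat list of values here."""
    sq = sum((g - G) ** 2
             for mf, df in zip(model_frames, data_frames)
             for g, G in zip(mf, df))
    return math.sqrt(sq)

# identical patterns give zero loss
zero = pattern_loss([[0.2, 0.5, 0.1]], [[0.2, 0.5, 0.1]])
```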
Training
Overfitting could be a problem, as our DNN model has about 750 parameters while the quantitative dataset we can collect consists of only dozens of frames. Moreover, avoiding overfitting is more demanding in this study: unlike typical deep learning tasks, where test and training data are sampled from the same distribution and in most cases the features in the test data are completely reflected in the training set, here the wild type and the mutants are different.
Based on multiple trials, the model achieves the best performance if the training set uses only 7 frames of the wt gap gene expression time course (~5.5 min per frame, 8 to 41 minutes into nc14)5, together with a snapshot of the maternal factor triple mutant (Bcd−;Nos−;Tsl−) at around 40 minutes into nc14 (Fig. S4B)28. Unsurprisingly, given the powerful fitting capacity of the DNN model, many details in the data are well fitted (Fig. 2), e.g. the relative heights of different peaks, the anterior peak of Kni, the posterior peak of Hb, and the dynamic anterior shift of the abdominal patterns. Training also converges quickly: within only a couple of minutes on an ordinary desktop.
Prediction
Surprisingly, the trained DNN model yields excellent predictions of the gap gene profiles in almost all the mutants, including various double mutants28. The number, position and even the relative intensity of almost all the peaks in the gap gene profiles are well predicted (Fig. 3). Interestingly, some delicate details are also captured: (1) in Tsl− and Tsl−;Nos− mutants, the height of the anterior Kni peak drops by half compared with wt; (2) in the Bcd−;Nos− mutant, two symmetric small peaks of Gt exist; (3) in the Kr− mutant, the Kni peak changes position and lies under the Gt peak; (4) when the Bcd dosage is halved or doubled (Bcd1X or Bcd4X), the predicted posterior boundary of the anterior Hb domain shifts by −8% or +9.3%, very close to the experimentally measured values of −6.5% and +9.4%13, rather than the ±11.6% predicted by a simple threshold activation model29.
To make a more quantitative comparison, we mark the positions of important features in the expression profiles, i.e. the main peaks and their boundaries, and then compare these positions between the model predictions and the experiments (Fig. 4A; see Supplementary Information S5 for the detailed algorithm). The trained DNN model shows excellent performance: nearly 90% of the feature points are matched (Fig. 4B inset), and the matched features have similar experimental and predicted positions (main scatter plot in Fig. 4B). We trained the model eight times independently; the fraction of matched feature points in the resulting predictions is usually between 80% and 90% (see Supplementary Information S6 for detailed statistics).
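A toy version of such a feature-matching score might look like the following; the peak-detection rule, height threshold, and matching tolerance are hypothetical simplifications of the actual algorithm in Supplementary Information S5:

```python
def peak_positions(profile, min_height=0.2):
    """Local maxima above a threshold, returned as fractional positions
    along the anterior-posterior axis (0 = anterior, 1 = posterior)."""
    n = len(profile)
    return [i / (n - 1) for i in range(1, n - 1)
            if profile[i] > min_height
            and profile[i] > profile[i - 1] and profile[i] > profile[i + 1]]

def match_fraction(pred_peaks, exp_peaks, tol=0.05):
    """Fraction of experimental features with a predicted feature within tol."""
    if not exp_peaks:
        return 1.0
    hits = sum(any(abs(p - e) <= tol for p in pred_peaks) for e in exp_peaks)
    return hits / len(exp_peaks)

# a single peak at mid-embryo, matched within tolerance
peaks = peak_positions([0.0, 0.5, 1.0, 0.5, 0.0])
score = match_fraction(peaks, [0.52])
```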
Furthermore, the model makes other predictions on multiple mutants that also agree with published experiments (Fig. 5). Notably, it has been reported that Nos−, a severe mutant lacking almost all abdominal patterns, can be rescued by knocking out maternal Hb (mHb) completely33, and the trained DNN model predicts gap gene profiles very similar to wt if the initial Hb is set to exactly zero in a Nos− background (Fig. 5A). This result holds across eight different trainings (Fig. S6B).
Other examples of good predictions include: the Kr peak still exists and expands toward the anterior when Hb and Gt are knocked out simultaneously (Fig. 5B)34; Gt instead of Kr has uniform high expression if mHb is further knocked out in the maternal morphogen mutant Bcd−;Tsl− (Fig. 5C)35; in Bcd−;Tsl− embryos, the Kr pattern remains almost the same even if Gt, which is usually thought to strongly repress Kr, is knocked out (Fig. 5D)35; in Nos−;Tsl− embryos, a mutation in zygotic Hb (mHb unaffected) shifts the anterior boundary of Kr from 50% to about 40% (Fig. 5E)35.
Regulation Network
The excellent predictions of nontrivial experimental observations suggest that the trained DNN model might have faithfully captured the essential characteristics of the fly embryo’s developmental system. Thus, decoding the black box of the DNN model should help us understand the underlying mechanism. Here, the black box is a function calculating four output synthesis rates from seven input concentrations. Decoding means regenerating this input-output relation, at least partially, with a simpler and more understandable functional form. As preliminary trials, we tried to extract a simple gap gene regulation network from the DNN model and compared it with previous knowledge.
We have tried various methods to map the deep neural network onto a simple regulatory network (Fig. 6), for example, by measuring the outputs of one-hot inputs (setting one input to 1 and the rest to 0), calculating correlation functions between input and output dimensions, or fitting the black box with a linear model or with a single-layer neural network with shared bias values.
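As an illustration of the first of these probes, a one-hot scan over a black-box function can be sketched as follows (the toy black box here is arbitrary and merely stands in for the trained F):

```python
def one_hot_network(black_box, n_inputs, level=1.0):
    """Probe a black-box function with one-hot inputs and record the sign of
    each output's change from baseline: +1 activation, -1 repression, 0 none."""
    baseline = black_box([0.0] * n_inputs)
    signs = []
    for j in range(n_inputs):
        x = [0.0] * n_inputs
        x[j] = level
        out = black_box(x)
        signs.append([(o > b) - (o < b) for o, b in zip(out, baseline)])
    return signs  # signs[j][i]: inferred effect of input j on output i

# toy black box: output0 = x0 - x1, output1 = x1
toy = lambda x: [x[0] - x[1], x[1]]
net = one_hot_network(toy, 2)
```

Such a scan only sees each input in isolation, which is one reason the methods disagree on a strongly nonlinear black box.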
It is unsurprising that each method, with its limited plasticity, captures different aspects of the nonlinear black box, resulting in different network topologies. This result seems to undermine the legitimacy of representing such a complex system with just a simple regulation network. On the other hand, the extracted gap gene regulation network is qualitatively compatible with the one deduced from experimental evidence in the literature (Fig. 6E). Despite this similarity with existing knowledge, it is impossible to regenerate the gap gene patterns from these fitted regulation rules, suggesting that these representations are probably already oversimplified. There should be some “high order effects”36 that cannot be ignored.
At least part of the difficulty in mapping the DNN onto a simple regulatory network can be understood in terms of “inherent plasticity”. For example, it is commonly accepted that Cad and its repressor Bcd both activate Hb, forming an incoherent feed-forward (IFF) motif37,38 (Fig. 6E). But as shown in Fig. 6C, an IFF motif emerges in which both Bcd and Cad inhibit Hb. As Cad is set to be fully determined by Bcd in our model, these two ways of implementing an IFF motif could be functionally indistinguishable unless a Cad− mutant is introduced. Apart from this explicit 3-node example, such degeneracy (different regulatory structures with almost identical function) could exist in a more dispersive and obscure manner at larger network scales, making it difficult to reverse engineer a unique regulation network purely from a limited amount of data.
However, it should be noted that for simple problems, such as “how can a two-node reaction-diffusion system generate stripes”, this training-and-decoding methodology works well and recovers the Turing pattern mechanism (see Supplementary Information S7 for details). So whether such decoding yields a meaningful mechanism is obviously case- and data-dependent.
Higher Order Effects
Though it was evident above that there should be some irreducible higher order effects in gap gene pattern formation, visualization of some cross-sections of the high dimensional F shows smooth regulation functions, instead of the rugged landscape typical of overfitting. Also, among all the variation in the output F when generating the wt and mutant patterns, 86.5% can be explained by a linear model (measured with the Euclidean distance); i.e., after a regulation matrix W is fitted:

Fi(g) ≈ Σj Wij gj.
The distribution of the remaining errors can be plotted as a histogram (Fig. 7A). In most situations (for most inputs g), the error of the linear fitting is rather small, corresponding to simple monotonic regulatory logic, as in the cases shown in Fig. 7B. The distribution of each gene component of the inputs for which the linear fitting has large errors is plotted in the Fig. 7A inset. These histograms show roughly where the higher order effects are. For example, Tsl seems not to be involved in those higher order effects: the Tsl level is low (peaked around 0) in all the situations where the linear fitting fails, so the regulation function is almost linear when Tsl is high, which is sufficient but not necessary for concluding that the Tsl effect is almost additive.
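The “fraction of variation explained by a linear model” can be estimated along the following lines; the least-squares fit, the added bias column, and the synthetic input-output data are our assumptions for illustration, not the paper’s exact procedure:

```python
import numpy as np

rng = np.random.default_rng(0)

def explained_by_linear(G, F):
    """Fit F ≈ G @ W + b by least squares and report the fraction of the
    variation in F captured by the linear model (Euclidean sense)."""
    A = np.hstack([G, np.ones((G.shape[0], 1))])   # append bias column (assumption)
    coef, *_ = np.linalg.lstsq(A, F, rcond=None)
    resid = F - A @ coef
    total = np.sum((F - F.mean(axis=0)) ** 2)
    return 1.0 - np.sum(resid ** 2) / total

# sanity check on synthetic data: a genuinely linear map is fully explained
G = rng.random((200, 7))                # 200 sampled input states, 7 inputs
F_lin = G @ rng.random((7, 4)) + 0.3    # 4 synthesis-rate outputs
frac = explained_by_linear(G, F_lin)
```

Applied to states visited while generating the wt and mutant patterns, this kind of measure would give the quoted 86.5%.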
On the other hand, both Hb and Kr show some higher order effects (Fig. 7C). First, Hb is self-activating at low levels but self-inhibiting at high levels when Bcd > 0.1, corresponding to 0% to 38% embryo length. Interestingly, evidence for both self-activation39,40 and self-inhibition15,41 has been reported previously. Second, the self-activation of Kr and the inhibition of Kr by Kni act like an AND gate (keeping Hb = 0, Gt = 0, Bcd = 0.05, Tsl = 0). Third, both Hb and Kni activate Kr at low concentrations but inhibit it at higher concentrations (keeping Kr = 0.1, Gt = 0, Bcd = 0.05, Tsl = 0). This is in accordance with the reported dual regulation effect (both activation and inhibition) of Hb on Kr42. Notably, only the observation of the self-regulation of Hb, not the last two, is consistent across different training trials (Fig. S6B).
Discussion
No Better Prediction with More Training Data
Intuitively, with more training data added, the model parameters should be more tightly constrained, resulting in better solutions; however, this is not the case here. If extra snapshots of the three maternal morphogen mutants (Bcd−, Nos− and Tsl−) are added to the training set, the predictions are significantly degraded. Correctly predicted features drop from 88.9% and 86.1% to 82.6% and 72.6% for maternal factor mutants and gap gene mutants, respectively (Fig. 8A). Similar degradation reappears if we add other extra mutant profiles to the training set. We suspect that overfitting might be caused by some latent incompatibility within the dataset, as a DNN is capable of fitting all sorts of features. Such incompatibility may be reduced by more careful background removal, expression level normalization, embryo age estimation, etc. For comparison, if trained with wt data only, the model yields even better predictions on gap gene mutants (89.2% vs. the original 86.1%, Fig. 8B), but much worse ones on maternal factor mutants. Again, these results reflect some inherent plasticity: the model seems able to correctly predict gap gene mutants even without a correct “understanding” of the role of each upstream morphogen.
Robustness against Missing Factors
It can never be guaranteed in practice that no factors (hidden genes, gene modifications, small RNAs, etc.) are left unknown. Instead of hoping that the missing factors are unimportant, we can demonstrate that our model is insensitive to missing even important factors. Pretending that Kni is completely unknown, i.e. removing it from both the data and the model, we retrained the three-node model, and remarkably it still yielded good predictions of the features of the remaining gap genes (Fig. 9A).
The regulation network reconstructed by the methods discussed in Fig. 6, though rough, bears some hints of how Kni’s role was effectively absorbed by other genes (Fig. 9B). For example, the original double inhibition (Hb inhibits Kni and Kni inhibits Kr; Fig. 6C) is replaced by an effective activation of Kr by Hb.
This result demonstrates how the model can work robustly with missing factors. However, such robustness may hinder the model’s ability to discover new genes. Ideally, if an extra free node is provided to help pattern formation while an irreplaceable factor is missing, this free node should take over the role of the missing one. In simple cases like the three-node adaptation network, the buffering node automatically emerges when trained in this way (Supplementary S7). But this is not the case in more complex situations: here, an additional free node X neither helped with better prediction (Fig. 9C) nor showed the pattern or regulation of the original Kni (Fig. 9D-E).
It should be noted that, overall, introducing genes with known patterns usually helps prediction performance. As a good example, Cad significantly improves predictions, even though theoretically the effects of Cad could always be absorbed as a nonlinearity in the Bcd regulation function.
Alternative Mechanism
With previous models, it has been difficult to explain the global decline of the gap gene profiles after 40 minutes in nc14 without any change in external inputs6,8. It has been suggested that this phenomenon could be attributed to events associated with the maternal-zygotic transition, such as the decay of the Bcd gradient in nc1443, the turn-off of the Bcd transcriptional regulation of Hb44, or the switch of the Hb enhancer15. While we can capture the falling phase of the gap gene profiles if we introduce the shutdown of Bcd into our model in early nc14, surprisingly, we can also train a model on both the rising phase before 40 minutes and the falling phase from 40 to 58 minutes without any input change. The resulting model not only fits the declining phase well, but also has reasonably good predictions of the mutant profiles (78.3% of feature points in maternal factor mutants, and 85.1% in gap gene mutants, are correctly predicted).
In the same sense, our present model does not take into account many factors, such as diffusion, the lifetime of mRNAs, the time delay due to transcription and translation, and the degradation of the maternal morphogens, yet it still makes satisfactory predictions, suggesting that these effects are not irreplaceable for forming the main pattern structures.
Conclusion
Differential equation models have been widely and successfully used in simulating and understanding biological systems. However, it is evident that such models, in their conventional implementation, can often run into limitations when dealing with systems of high complexity. Part of the problem may come from the standard modeling procedure: (1) (qualitative) regulation relations are extracted or inferred from experimental observations/data, which are typically obtained by perturbing the system in a few limited and mostly qualitative ways (e.g. deleting, mutating and overexpressing genes of interest); and (2) predefined simple functional forms (e.g. Hill functions) are used to model the regulations with some parameters. Information can be lost in both steps, and the resulting model can be too restricted and confined to reflect the true essential dynamics of the system. Therefore, it may be worthwhile to use the available data differently. The approach we adopted here with a DNN takes the data in their entirety: the expression profiles of the gap genes. The fact that our model can acquire such impressive predictive power with only the wt dynamics data is also suggestive: there is a rich content of information in the dynamics of the system, as compared with the end phenotype.
Despite the interpretability challenges shared by all DNN models, our model did generate some new insights into the patterning system of the early fly embryo. More importantly, with such an in-silico model one can conceivably perform almost arbitrary perturbations and thought experiments, which would otherwise be difficult to perform in wet experiments and less reliable in conventional network models. In the near future, this approach may become a powerful lens providing novel insights and new perspectives, contributing to our understanding of complex systems.
Acknowledgements
We thank Xiaojing Yang, Ning Yang, and Xiao Li for helpful discussions. The work was supported by the Chinese Ministry of Science and Technology (Grant No. 2015CB910300) and the National Natural Science Foundation of China (Grant No. 91430217).
Footnotes
* Email: tangc{at}pku.edu.cn