Abstract
The concept of attractor of dynamic biochemical networks has been used to explain cell types and cell alterations in health and disease. We have recently proposed an extension of the notion of attractor to take into account metastable regimes, defined as long lived dynamical states of the network. These regimes correspond to slow dynamics on low dimensional invariant manifolds of the biochemical networks. Methods based on tropical geometry allow to compute the metastable regimes and represent them as polyhedra in the space of logarithms of the species concentrations. We are looking for sensitive parameters and tipping points of the networks by analyzing how these polyhedra depend on the model parameters. Using the coupled MAPK and PI3K/Akt signaling networks as an example, we test the idea that large changes of the metastable states can be associated to cancer disease specific alterations of the network. In particular, we show that for model parameters representing protein concentrations, the protein differential level between tumors of different types is reasonably reflected in the sensitivity scores, with sensitive parameters corresponding to differential proteins.
1 Introduction
Precision medicine is an emerging concept in healthcare that aims to adapt the therapy to patient specificity [10]. The need for this paradigm change is justified by the strong differences among patients suffering of seemingly the same disease which leads to strong variability of the response to treatment. The implementation of such a strategy relies on the development of new tools that allow to understand the origin of the inter-individual differences and to predict the individual response to a particular treatment strategy.
It is now largely considered that at least part of the inter-individual differences are genetic and therefore can be detected by genome sequencing. However, although differences in a DNA sequence, such as single nucleotide polymorphism and copy number variations, are relatively easy to detect, their consequences are very difficult to predict [30, 35, 11]. Mathematical models for cell physiology such as protein interaction networks and chemical reaction networks (CRN) can be used to understand the impact on the phenotype of a change in the protein function or expression level. Static protein interaction networks were used to understand differences between patients by mapping the genotype differences onto the network and identifying significantly enriched modules [8]. This method, based on the hypothesis of modularity of the biological function, has its limitations. Most importantly, it is unable to predict differences in the phenotype produced by alterations of the same pathway.
In oncology, the concentration of alterations in the same pathway is not uncommon, especially for hub signaling pathways such as MAPK and PI3K/Akt [34]. These pathways are deregulated in more than 60% of all cancers; these deregulations imply multiple changes involving several proteins. Pathway redundancy and multiple feed-back regulation are obstacles against cancer targeted therapies; inhibition of one oncogene can trigger compensatory effects elsewhere [19, 13]. Furthermore, deregulation of signaling is a dynamical phenomenon, affecting the time dependent response of signaling proteins to stimuli. Therefore, in order to predict the effect of perturbations on such complex systems, one needs quantitative dynamical models. Signaling pathway dynamics can be conveniently modeled by ordinary differential equations (ODEs) resulting from the chemical kinetics of CRNs.
In the context of CRN dynamics, differences between patients or between tumors can be modeled as differences of parameter values. It is reasonable to think that only the differences of the sensitive parameters, i.e., those parameters whose change induces significant effect, matter for the differences in phenotype. Sensitivity analysis is a general method allowing to detect sensitive parameters and the capacity of the model to resist changes [37]. Sensitivity analysis can be applied to steady states or to computation tree logic (CTL) formulas or ad hoc descriptions of properties of time dependent protein concentrations such as oscillations, peaks, etc. [28, 36, 18]. However, in general it is not clear which dynamical property is important for the biological function. Therefore, it is interesting to look for global methods allowing to test the effect of parameter changes on all the features of the dynamics.
A possible global description of a dynamical system can be the set of its point attractors (stable steady states). In the context of boolean network models (an alternative to ODE models) point attractors have been used to characterize cell types and changes of their number were interpreted as cell fate decisions [15]. However, the set of point attractors is difficult to compute for large networks. Moreover, the knowledge of point attractors is not sufficient for reconstructing the transient propagation of a signal through the network.
We have recently shown that biochemical networks in general, and signaling networks in particular, have also metastable regimes, defined as regions of the phase space where the system is slow and spends a long time [33]. A possible trajectory passes from one metastable regime to another, either stopping in a steady state or visiting periodically one or several slow metastable regimes along a limit cycle attractor. Thus, the set of metastable regimes and the possible transitions between them provides a richer picture of the qualitative dynamics of the network [33, 26]. We also showed how to use tropical geometry methods in order to compute, without simulating trajectories, all the metastable regimes of a given model with polynomial ODE dynamics [32, 33, 26]. In [33] we showed that data from aggressive and non-aggressive tumors correspond to different metastable regimes of the TGF-β signaling network.
In this paper, we provide a new application of tropical geometry methods. In computational models of signaling pathways, some proteins represent model variables and other proteins represent model parameters. Of course, not all model parameters can be associated to protein levels, many of them are kinetic parameters. In [33] we focused on those proteins that are model variables and directly compared their levels in metastable regimes to data. Here, we use tropical geometry to detect sensitive parameters that have significant global effect on the network dynamics. For those parameters that correspond to proteins we compare the sensitivity scores to differential protein levels from data. Changing sensitive parameters can bring the network to tipping points where the network behavior changes drastically. In order to test this idea in a biomedical framework we check if these sensitive parameters are significantly changed in cancer disease.
The structure of this paper is the following. The second section recalls the mathematical formalism of tropical geometry and introduces the branches of tropical equilibration that are proxies for metastable regimes. The third section is dedicated to the methods used for computing tropical equilibration branches and for performing sensitivity analysis. The fourth section presents the results of this analysis on a MAPK and PI3K/Akt model from the literature and the comparison of sensitive parameters with differential proteome data from different cancers. The last section is a discussion of the results.
2 Theory: Tropical equilibrations of chemical reactions networks with rational rate functions
In this section we introduce the main concepts relating to geometry and dynamics.
We consider chemical reaction networks described by rational kinetic laws (the reaction rates are fractions whose denominator and numerator are polynomials in the species concentrations). Examples of such kinetic laws include, but are not restricted to, mass action law, Michaelis-Menten law and its various generalizations, Hill law with integer Hill index.
After computing a common denominator of rates of reactions acting on the same species, the CRN kinetics are described by a system of differential equations: where kij, are products of kinetic constants and integer stoichiometric coefficients, xi > 0 are variable concentrations, αj = (), βj = () are multi-indices, and xαj = xβj = .
In general, the system (1) has a number of semi-positive conservation laws, i.e., linear combinations of variables that are constant on any trajectory: where Cij ≥ 0, > 0.
We replace now the exact values of parameters by their orders of magnitude which are supposed to be known. Usually, orders of magnitude are approximations of the parameters by integer powers of ten and serve for rough comparisons. Our definition of orders of magnitude is based on the equations kij = k̅ijεγij, , , where 0 < ε < 1. The exponents γ,γʹ, γʺ are considered to be integer or rational. For instance, the approximation produces integer exponents, whereas γij = round(d log(~kij)~/log(ε))/d, = round(d log(~~/log(ε))/d, = round(d log()/log(ε))/d produces rational exponents, where “round” stands for the closest integer (with half-integers rounded to even numbers) and d is a strictly positive integer. When ε = 1/10, our definition provides the usual decimal orders.
In this study, orders of magnitude of the kinetic parameters are supposed to be known. In contrast, species orders vary in time and have to be computed. To this aim, the species concentrations are first represented by orders of magnitude defined as
More precisely, one has xj = x̅jεaj, where x̅j has zero order (unity). The definition (4) restricts species orders to integers; however, rational orders are also acceptable. Because log(ε) < 0, (4) means that species orders and concentrations are anti-correlated (large orders mean small concentrations and vice versa).
Then, network dynamics are described by the rescaled ODE system where µij(a) =σij + 〈a, αj〉, (a) = + 〈a, βj〉 and 〈·, ·〉 stands for the dot product.
The rescaled conservation laws read where we considered that the Cij have zero order.
The numerator and denominator of the r.h.s. of each equation in (5) are sums of multivariate monomials in the concentrations. The orders µij indicate how large these monomials are in absolute value. A monomial of order µij dominates another monomial of order µijʹ if µij < µijʹ.
There is always at least one dominant monomial for the numerator and one for the denominator. We are interested in the situation when the denominator has at least two dominant monomials, one positive and the other negative. This situation implies compensation of large forces on variable i and as shown elsewhere [33], leads to a good proxy for metastable states. Furthermore, the orders of magnitude are constrained by the conservation laws. This justifies the following definition:
The tropical equilibration problem consists of the equality of the orders of at least two monomials, one positive and another negative, in the numerator of the differential equations of each species and in each semi-positive linear conservation law. This condition allows us to compute the concentration orders defined by (4). More precisely, we want to find a vector a such that min
Eq. (7) is related to the notion of a tropical hypersurface. A tropical hypersurface is the set of vectors a ∈ ℝn, such that the minimum minj(γij + 〈a, αj〉) is attained for at least two different indices j (with no sign conditions). Tropical prevarieties are finite intersections of tropical hypersurfaces. Therefore, our tropical equilibrations are subsets of tropical prevarieties.
In order to find the solutions of the system (7) we can explore combinatorially trees of solutions resulting from various choices of minimal terms and write down inequalities for each situation. Because a set of inequalities defines a polyhedron, the set of tropical equilibration solutions forms a set of polyhedra in ℝn.
To set these ideas down let us use a simple chemical network example, the Goldbeter-Koshland kinetics: where P, P*, E1, E2 represent a protein, its modified form, and two enzymes, respectively.
The two enzymatic reactions of the model have Michaelis-Menten kinetics and it results where k1, k2, Km1, Km2 are positive kinetic constants.
The system (8) has a semi-positive linear conservation law
Orders of variables and parameters are as follows: P = P̅εa1, P* = P̅* εa2, k1E1 = v̅1 εγv1, k2E2 = v̅2εγv2, Km1 = K̅m1εγm1, Km2 =K̅m2εγm2, P0 = P̅0εγp0.
After finding the common denominator of the fractions in (8) we get and the tropical equilibration equations for the Goldbeter-Koshland example read
By examining all the cases (10) and (11) we find values for a1 and a2, but also conditions that the parameters must satisfy for these solutions. We list below all the structurally stable solutions (solutions whose existence does not depend on equalities among parameter orders):
a1 = γp0, a2 = γp0 + γv1 − γv2 + γJ2 − γJ1 under the conditions γJ1 ≥ 0 and γv1 − γv2 ≥ max(γJ1, γJ1 − γJ2),
a2 = γp0, a1 = γp0 + γv2 − γv1 + γJ1 − γJ2 under the conditions γJ2 ≥ 0 and γv2 − γv1 ≥ max(γJ1, γJ1 − γJ2),
a1 = γp0, a2 = γp0 + γv1 − γv2 + γJ2 under the conditions γJ1 ≤ 0 and γv1 − γv2 ≥ max(−γJ2, 0),
a2 = γp0, a1 = γp0 + γv2 − γv1 + γJ1 under the conditions γJ2 ≤ 0 and γv2 − γv1 ≥ max(−γJ1, 0),
where γj1 = γm1 − γp0, γj2 = γm2 − γp0. For this model, for the given parameter values, there is only one tropical equilibration solution that follows closely the unique stable state state of the model when the model parameters are perturbed (see Fig. 1). Interestingly, the well-known zero-order ultrasensitivity [14] of this model, occurring when both enzymes are saturated (i.e., when γj1 ≤ 0, γj2 ≤ 0), corresponds to a jump discontinuity of the tropical solution (see Fig. 1).
Branches of tropical equilibrations. For each equation i, let us define in other words Ji denotes the set of indices j of monomials having the same minimal order for a given i.
We say that two tropical equilibrations a1, a2 are equivalent iff 𝕁i(a1) = 𝕁i(a2), for all i. Equivalence classes of tropical equilibrations are called branches. A branch B with index sets 𝕁i is minimal if, for any branch Bʹ of index sets , the relation ⊂ 𝕁i for all i implies Bʹ = B or Bʹ = 0̸. Closures of equilibration branches are defined by a finite set of linear inequalities, which means that they are polyhedral complexes. Minimal branches correspond to maximal dimensional faces of the polyhedral complex. The incidence relations between the maximal dimensional faces (n−1 dimensional faces, where n is the number of variables) of the polyhedral complex define the connectivity graph. More precisely, minimal branches are the vertices of this graph. Two minimal branches are connected if the corresponding faces of the polyhedral complex share an n − 2 dimensional face. In terms of index sets, two minimal branches with index sets 𝕁i and are connected if there are index sets of an existing non-minimal branch such that ⊂ and 𝕁i ⊂ for all i. A zero dimensional minimal branch is a point, like in the case of the Goldbeter-Koshland model when it corresponds to a steady state of the model. A non-zero dimensional minimal branch usually corresponds to a slow invariant attractive curve or surface (sufficient conditions for such a situation can be found in [27]). Examples of models with non-zero dimensional branches corresponding to attractive invariant manifolds can be found in [24, 32, 27, 33]. Slow attractive invariant manifolds are metastable regimes because the dynamical system defined by Eq. (1) spends considerable time on them before eventually leaving them for another invariant manifold.
3 Methods
3.1 Computation of minimal branches
For a given model and set of model parameters, the minimal branches were computed by the algorithm described in [31, 33]. The tropical solution set is the union of polyhedra where the polyhedra represent minimal branches.
3.2 Generation of perturbed parameter orders
For a given model, the nominal tropical solution set M was computed with the given model parameters. Thereafter, the perturbed tropical solution sets were computed by varying the orders of the given parameter kj. Here, ℓ = 6 and the perturbed parameter orders are γj −3, γj −2, γj −1, γj +1, γj +2, γj +3, respectively. The perturbation of multiple parameter orders in such a framework is straightforward.
3.3 Identification of parameter sensitivity scores
3.3.1 Computation of distances
For distance computation, we applied the following constraints to the concentration orders aj (cf. (4)), thereby defining a feasible region Γ within the polyhedra for sampling purposes: where lb and ub refers to upper and lower bounds, respectively. Finally, the distances between were computed and are represented by the vector Dj of dimension ℓ as defined in the following manner where T and denote the sets of representative point(s) sampled from the polyhedra in M ∩ Γ and ∩ Γ, respectively.~~ · ~~ is the Lp norm distance metric. The philentropy R-package with Minkowski distance method [12] was used to implement it. We tested our findings with different values of the Lp norm (as suggested in [7]). The idea of such a distance computation technique was motivated by collision detection approach in motion planning (cf. Chapter 5 in [20]). For a given M, T was computed as follows: where p is the number of polyhedra in the solution set M. RepresentativePoint(·) is a function that generates random sample point(s) satisfying the inequality conditions for the given polyhedron and computes the component-wise mean. In case of non-zero dimensional polyhedra, we sampled 3000 random samples points from an infinite number of feasible points.
If instead of ℝn distances one is interested in the effect of perturbations on a particular target variable Xm, then the sets Tk should be replaced by their projections on the corresponding axis. More precisely, instead of the concentration order vector a = (a1, … , an) ∈ Tk, consider the scalar am representing the concentration order of the species Xm. We will denote these projected distances as
3.3.2 Tropical sensitivity score
We denote the mean as the parameter sensitivity score for kj. Similarly, we define as the parameter sensitivity score for kj in the direction of the variable Xm. A higher score signifies a more sensitive parameter in a relative scale. However, sensitivity scores can not be compared between (17) and (16).
3.3.3 Availability
TROSS (Tropical Sensitivity Scores), the software to compute the sensitivity scores is available at https://github.com/JRC-COMBINE/TROSS. It uses PtCut to compute the tropical solutions [23].
4 Results
In this section, we test the idea that large changes of the metastable states can be associated with cancer disease specific alterations of the network. We selected a biochemical reaction network known to be involved in cancer and computed the tropical minimal branches with methods exposed in the Sections 2 and 3.1, thereby approximating its metastable states. Large changes in tropical solutions are achieved by changing the parameter values from a given set of nominal values and quantifying the resulting change through normalized sensitivity scores. The comparison between these scores and differential proteomics data obtained from different cancer conditions is presented.
More precisely, we perform a diagnostic test to check if the protein levels whose variations between different cancer conditions are large are also strongly sensitive parameters of the model and that proteins with small variation are low sensitivity parameters. The accuracy of such a test is based on the receiver operating characteristic (ROC) curve and the area under the ROC curve (AUC) measures if the ordering in the parameter sensitivity scores (sensitive/not sensitive) is actually preserved in the differential expression of proteins (high/low) in the data.
4.1 Biochemical reaction network
The biochemical model was obtained from the Biomodels database [21] with identifier BIOMD0000000146. The model, introduced in [16], was motivated by experimental works on the Heregulin stimulated ErbB receptor and demonstrates the Akt-induced inhibition of the MAPK pathway via phosphorylation of Raf-1. We have chosen this model because it is one of the few models in the Biomodels database showing crosstalk of two pathways important in cancer. A very similar model was used in [19] to investigate the possibilities of targeted combination therapy.
This CRN model has 33 species and 34 reactions: 21 reactions have Michaelis-Menten kinetics and 12 have mass action kinetic laws. The chemical kinetics is composed of 33 ODEs with rational r.h.s. terms. Using MATLAB Symbiology we computed 11 semi-positive conservation laws. Using the conservation laws and the denominator of common denominator expression for each ODE we generated a system of 44 polynomials for which we computed tropical equilibrations.
The CRN has 78 parameters to which we added 11 more parameters corresponding to the initial values of the conservation laws, computed from the initial concentrations of the species.
Among the 89 parameters not all of them amount to expression level of genes and proteins. The parameters that can be directly associated to genes and proteins were chosen according to the following simple method:
All the concentration laws have a meaning as total amount of kinases and were associated to the corresponding protein kinase: k79–k89. Exception is made by k80 that represents conservation of the complex AKT-PI-P and can not be directly associated to a protein.
Some other parameters belong to dephosphorylation reactions and are proportional to constant phosphatase concentrations: k77, k78. These parameters were associated to the corresponding protein phosphatase.
In the Sections 4.3.3 and 4.4.2 the sensitivity scores of the chosen parameters are compared to differential levels of the corresponding proteins found in databases. To this aim we associate to each parameter the protein whose level is represented by the value of the parameter. In models, variable and parameter names often correspond to protein families. For instance, the parameter k77 coding for a MAPK phosphatase, generically designates the DUSP family. Thus, the mapping from parameters to proteins is one-to-many.
4.2 Computation of distances
For the given (default) parameter values of this model, we obtained two minimal branches, each of which were of dimension 2. The existence of non-zero dimensional branches suggests that all of them are metastable regimes (cf. Section 2). The minimal branches along with the parameter perturbations were performed as per the steps described in Section 3 for ε = 1/11. For sampling, we fixed the ub = round(log(10−20) log(ε)) and lb = round (log(1010)/log(ε)), so that it covers a broad range of concentration orders applicable in different application scenarios. Furthermore, we normalized the parameter sensitivity scores between 0 and 1 in the following manner where D̅k̅ (cf. Eq. (16), (17)) and pk are the parameter sensitivity and normalized parameter sensitivity score of k-th parameter respectively. P = 89 (total number of model parameters). The histogram depicting the normalized parameter sensitivity scores is shown in Fig. 2 for the Euclidean distance, i.e., using the Lp norm with p = 2. For this distance, the distribution of sensitivity scores is rather uniform and most sensitive parameters can not be sharply separated from the others. We tested the robustness of the sensitivity scores by varying the Lp norm (cf. Supplementary Fig. 3). One can notice that a small value of the Lp norm preserves the order and sharpens the differences between sensitivity scores, emphasizing a few strong sensitivity parameters. The normalized sensitivity scores for various model parameters are presented in Fig. 2 for three distances D1 computed with Eq. (16), and D2, D3 computed with Eq. (17), for Lp norm with p = 2 (cf. Supplementary Table 3 for other values of the Lp norm). The distances D2 and D3 are in projection on a 1D axis, therefore do not depend on the value of the Lp norm, except for minor variations due to the random sampling of the polyhedra.
4.3 TCPA Proteomics Data
We focused on the tissue samples pertaining to The Cancer Genome Atlas (TCGA). TCGA is among the biggest collaborations with the aim of genomic research in cancer and comprises of more than two petabytes of genomic data [5], which makes it an important benchmarking dataset in the bioinformatics community. Since in this work we aimed to study the protein differential levels, we turn to The Cancer Proteome Atlas (TCPA), which contains 8167 tumor samples in total, mainly consisting of TCGA tumor tissue sample sets [6].
4.3.1 Data pre-processing
We selected L4 (Replicates-based normalization) data files from [22] with at least two groups of patient cohorts. This resulted in four cancer datasets, namely, Breast invasive carcinoma (BRCA), Pheochromocytoma and Paraganglioma (PCPG), Skin Cutaneous Melanoma (SKCM) and Thyroid carcinoma (THCA), respectively. BRCA dataset consisted of primary tumor, metastatic tumor and normal samples whereas the rest comprised of primary and metastatic samples. The samples are relevant to the TCGA Research Network.
4.3.2 TCPA Differential Protein Expression Analysis
Out of 89 model parameters, we could associate 9 of them (see Fig. 2) with the datasets where more than one protein may be mapped to the same parameter. Thereafter, we extracted the data for the 17 proteins associated with these 9 parameters and performed a Wilcoxon rank sum test [17] between the primary versus normal samples in BRCA and primary versus metastatic samples for other datasets. The P-values were adjusted based on Benjamini and Hochberg correction method [9]. P-values smaller than a threshold indicate significant differential protein expression. Only BRCA and SKCM datasets resulted in significant proteins, where significance is defined at various P-value thresholds, namely, 5e-05, 5e-04, 0.005, 0.05. In case several proteins are mapped to a single model parameter, the parameter is considered to be significant if it is mapped to at least one of the significant proteins. The adjusted P-values are reported in Table 1.
4.3.3 Validating parameter sensitivity scores with TCPA database
We compared model parameter sensitivity scores and differential protein expression. To this aim, we defined a diagnostic test to assess the ability of our method to correctly identify sensitive/insensitive parameters with respect to their differential expression in the proteomics data. We treated the normalized parameter sensitivity scores (cf. Section 4.2) as predicted class probabilities ranging from 0 to 1. The parameters with values close to 1 are more sensitive than the ones close to 0. Finally, we perform a diagnostic test to quantify the extent to which the predicted class probabilities overlap with the significantly (actual class label of 1) and non-significantly (actual class label of 0) expressed proteins in our datasets. The test is performed by computing the AUC of the ROC curve using R [29]. The major benefit of using AUC is that the method does not require cut points in order to define predicted class labels from predicted class probabilities [25]. An AUC above 0.5 means that the method performs better than random guessing. As the definition of significantly expressed proteins, as well as the Lp norm values, are arbitrary, i.e., depends on a pre-defined threshold, we investigated the changes in AUC values by varying the P-value thresholds while keeping the Lp value constant and vice versa. The findings are reported in Table 2 and Supplementary Tables 4, 5. The distribution of AUC values are shown in Supplementary Fig. 4, 5.
4.4 CPTAC Proteomics Data
CPTAC or Clinical Proteomic Tumor Analysis Consortium is another proteomic collaboration, containing TCGA patients. The CPTAC proteomic datasets relevant to the TCGA were downloaded from CPTAC data portal [1]. From all TCGA cancer datasets, only Breast, Ovarian and Colorectal cancers were available in CPTAC. Since we need relative values for our analysis, we used data derived from iTRAQ (isobaric Tags for Relative and Absolute Quantification) quantification method. More information about iTRAQ method and overall data analysis pipelines of CPTAC can be found in [2, 3]. The data consisted of log fold ratio of the samples (determined with respect to an internal reference).
4.4.1 CPTAC Differential Protein Expression Analysis
We focused on CPTAC datasets with at least two groups of patient cohorts and the availability of iTRAQ data type. This resulted in a single dataset, namely, Breast cancer dataset with four subtypes (Luminal A, Luminal B, Basal-like and HER2-enriched subtypes). Using the linear modeling of the Limma package, we computed the differentially expressed proteins among the four subtypes in a pairwise manner. For each protein we calculated average log2 expression, moderated t-statistic, P-value, q-value or adjusted P-value (based on Benjamini and Hochberg correction method), log-odds that the protein is differentially expressed, and estimate of the log2-fold-change by applying empirical Bayes moderation of the standard errors towards a common value. More information can be found on Limma’s documentation and vignettes available at Bioconductor [4]. The significant proteins are determined based on various P-value thresholds, namely, 5e-05, 5e-04, 0.005, 0.05 and subsequently the significant model parameters. The adjusted P-values are reported in Table 1.
4.4.2 Validating parameter sensitivity scores with CPTAC database
Out of 89 model parameters, we could map 11 parameters with the datasets. The AUC values were computed as per Section 4.3.3 and are reported in Table 2 and Supplementary Tables 4, 5. The distribution of AUC values are shown in Supplementary Fig. 6, 7, 5.
5 Discussion and conclusion
Computing the sensitivity scores based on tropical geometry is a flexible method for predicting sensitive parameters and tipping points of biochemical networks. The comparison between these predictions and differential proteomics data is summarized in the Table 2 and Supplementary Tables 4, 5.
We have used two types of sensitivity scores: an n-dimensional (nD) score based on the Lp distance between sets of tropical equilibration solutions and a 1-dimensional (1D) score based on distances between projections of these sets on a target variable. Among these scores, the one based on the distance D3, the 1D score targeting AKT-PI-PP, has the worst performance in terms of the AUC of the ROC curve. The nD score and the 1D score targeting MAPK-PP lead to reasonable agreement between predicted sensitive parameters and proteomic data.
We measure the strength of overlap of our scores with the protein measurement data using AUC values. An AUC above 0.5 means that our method performs better than random guessing. We achieved better AUC values for many datasets, e.g., Luminal A, Luminal B, Basal-like and HER2-enriched subtypes from CPTAC database as compared to BRCA and SKCM datasets from TCPA database. For associating our scores with protein measurement data, two important parameters need to be pre-defined, namely Lp norm for distance computation and the P-value threshold to assess protein significance. Furthermore, it has been suggested that in high dimensional setting, the proper choice of the Lp norm is important [7], which was also reflected in our histograms (cf. Fig. 2). Therefore, we systematically varied the P-value thresholds along with the Lp norm values resulting in different combinations and associated AUC values (cf. Fig. 4, 5, 6, 7). For a fixed Lp we average over the different P-value thresholds and report the median AUC values along with the standard deviations. In the presence of significant proteins at only a single P-value threshold, the standard deviations are denoted as NA (cf. Table 2 and Supplementary Table 4). Likewise, for fixed P-value thresholds, we average over different Lp norms. In the absence of significant proteins at a given threshold, AUC computation fails and is denoted by NA (cf. Table 2 and Supplementary Tables 5). We found that for most datasets a low value of p in the Lp resulted in better AUC values, except in SKCM. The empirical findings suggest that a proper selection of the Lp norm is important. Also, it was found that the choice of the P-value threshold is relevant.
The nD score with p = 0.01 for the Lp norm identifies SOS/GRB2 and SHC as sensitive parameters. EGFR is not signaled as sensitive parameter, except for much larger p = 2, which means that EGFR total quantity is not limiting in this model. Extra sensitive parameters such as MAPK, MEK, PP2A and DUSP are detected by the 1D score when MAPK-PP is the target. PP2A, AKT3 are sensitive parameters when the target is AKT-PI-PP. The findings for 1D scores are compatible with standard sensitivity analysis of the same model. For instance, [18] found that the most influential parameters on MAPK-PP are, in decreasing order, MAPK, SHC, MEK, DUSP, and the most influential parameters on AKT-PI-PP are PP2A, AKT, PI3K, EGFR. This result is nearly identical to our Table 2 (columns D2 and D3; our D2 score classifies also PP2A among the parameters influential on MAPK-PP).
Altogether, the findings for nD and 1D scores are compatible with the properties of the studied model. The sensitive parameter PP2A is responsible for the behavior of the model as a single pathway (MAPK) or as two pathways in crosstalk [16]. The negative loop produced by the repression of MAPK by the PI3K/Akt pathway (via Raf) reduces the model sensitivity with respect to the total EGFR, but does not affect the model sensitivity on parameters downstream of Raf, such as MEK, MAPK and DUSP. The system of coupled MAPK and PI3K/Akt signaling networks has a number of other negative loops, not represented in this model. These loops further increase the robustness of this system and reduce the effectiveness of targeted therapies.
Contrary to standard sensitivity analysis methods, tropical sensitivity analysis has the choice to tune the contrast between sensible parameters by the choice of the Lp norm, which can lead to well separated sensitivities where standard methods find a continuous distribution of sensitivities. Also, tropical sensitivity analysis does not use simulation of trajectories as standard sensitivity analysis does and is thus protected against errors coming from incomplete sampling of the trajectory set. Furthermore, as tropical analysis depends on orders of magnitude of the parameters, its results are expected to be valid for wide domains of model parameters. Finally, tropical sensitivity analysis can be also applied to categorical variables. For instance, one can compare patients with distinct types of mutations.
Tropical sensitivity analysis provides robust information on sensitive model parameters that are candidates for biologically important drivers. The extension of our approach to perturbation of multi parameter sets (e.g., combinatorial perturbations) remains a topic of future work. Furthermore, we aim to use drug-target databases for further in-silico validations of our sensitivity scores. We also aim to efficiently extrapolate our findings to gene expression databases which are more readily available than protein measurement databases. We also intend to study the effect of drug inhibitors in targeted therapy. In a tropical geometry setting, this can be studied by supplementing the model with variables representing the drugs and parameters representing the affinities of the drugs to the targets. Finally, a future topic is the study of the rich geometric structure of the perturbed metastable regimes resulting from the tropical sensitivity analysis. For instance, the geometrical description of the perturbed pathways could be used to characterize the pathway remodeling induced by the perturbation. We expect that refinement of these techniques will lead to a better understanding of tumor vulnerabilities that can be used to optimize targeted therapy.
Acknowlegements
This work was supported by the ANR/DFG grant ANR-17-CE40-0036 (project SYMBIONT) and CompSE profile area, RWTH Aachen University. We thank R. Larive and D. Santamaria for their critical reading of the manuscript and for useful discussions.