Abstract
BACKGROUND Recent clinical trials have shown that adaptive drug therapy can be more effective than a standard MTD-based policy in treating cancer patients. The adaptive therapy paradigm is not based on a preset schedule; instead, the doses are administered based on the current state of the tumor. However, the adaptive treatment policies examined so far have been largely ad hoc. In this paper we propose a method for systematically optimizing the rules of adaptive policies based on an Evolutionary Game Theory model of cancer dynamics.
METHODS Given a set of treatment objectives, we use the framework of dynamic programming to find the optimal treatment strategies. In particular, we optimize the total drug usage and time to recovery by solving a Hamilton-Jacobi-Bellman equation based on a mathematical model of tumor evolution.
RESULTS We compare the adaptive/optimal treatment strategy with the MTD-based treatment policy. We show that optimal treatment strategies can dramatically decrease the total amount of drugs prescribed as well as increase the fraction of initial tumor states from which recovery is possible. We also examine the optimization trade-offs between the total administered drugs and the recovery time.
CONCLUSIONS Adaptive therapy combined with optimal control theory is a promising concept in cancer treatment and should be integrated into clinical trial design.
1 Introduction
Intratumoral heterogeneity is being increasingly recognized as a cause of metastasis, progression and resistance to therapy1. While genetic instability, a hallmark of malignancy2, can result in this heterogeneity, it is being increasingly understood that eco-evolutionary factors, like selection and clonal interference, can also drive and maintain it3,4.
While sequencing technologies have enabled increasingly in-depth quantitative understanding of the genetic heterogeneity, relatively little experimental work has sought to directly quantify the eco-evolutionary interactions involved. As more studies come to light showing the efficacy of treatments based on eco-evolutionary trial designs, this lack of quantification is coming into focus.
In line with standard, cell-autonomous growth-based theories, conventional chemotherapy is given to patients at the maximum tolerated doses (MTD): the highest doses that most patients can safely tolerate. Although the MTD-based chemotherapy offers advantages in survival compared to no therapy, cures remain elusive, and side effects can be severe. In addition to the toxicity, it is known that relapse is nearly inevitable due to the emergence of therapeutic resistance: a process driven by Darwinian evolutionary dynamics in which the MTD-based chemotherapy kills off the chemotherapy-sensitive cells and chemo-refractory cells eventually dominate in the tumor. While it is unknown whether these resistant cells are present before therapy, or acquire resistance mutations during therapy, it is the process of variation and selection under standard therapy that drives the inevitable failure in the patient.
Metronomic chemotherapy has been proposed as a possible alternative to the MTD strategy5,6. Metronomic chemotherapy is given in an on/off fashion at frequent time intervals according to a set periodic schedule. The idea behind this method is to give less overall therapy, thereby increasing tolerability, and allowing time for therapy sensitive cells to regrow, allowing for resensitization of the tumor. Frustratingly, the results of clinical trials of metronomic chemotherapy have been “variable” 7 and many of them have not demonstrated significant efficacy8–10 as compared to standard therapy. Recent studies argue that metronomic chemotherapy highly depends on timing, and the right scheduling can improve the results of its usage11.
Based on the hypothesis that disease dynamics depend on the evolution of tumor heterogeneity as modulated by competition between subtypes, the idea of using adaptive therapy (AT) has been proposed12. AT is much like metronomic therapy, with an important difference. AT administers doses of therapy according to the current state of tumor growth and its anticipated evolutionary changes (trajectory). These can be estimated using direct (e.g., taking biopsies) or indirect (e.g., antigen testing, mathematical modeling) methods. Therefore, unlike the MTD-based or metronomic protocols, AT does not have a preset schedule; it adjusts the doses and timing before a tumor becomes chemotherapy-resistant, prolonging the time to this event. Recently, adaptive strategies have shown promise in pre-clinical trials of breast cancer13 and a phase 2 clinical trial in metastatic castrate-resistant prostate cancer14.
These two recent successes13,14 in adaptive therapy have been based on mathematical modeling of tumor evolution under therapy using a dynamical systems approach based on Evolutionary Game Theory (EGT) 15,16. This formalism explicitly considers interactions between sub-populations and models their fitness in frequency dependent terms. EGT has been used to theoretically consider many scenarios in cancer before, including therapy scheduling and timing in prostate cancer14,17–19; the use of tumor microenvironment targeting therapy in glioblastoma20; the trade-off between healthy tissue and cancer in multiple myeloma21,22; and drug resistance in general23–25. These theoretical studies, combined with the recent empiric realizations, suggest significant opportunities to improve therapy by using this evolutionarily enlightened approach. Nevertheless, therapeutic decisions in general practice are currently not based on this knowledge and continue to use the MTD paradigm.
Assuming that an oncologist has perfect information about the current state of a tumor, and a faithful mathematical model that can predict its trajectory (these, of course, are very strong assumptions), it is still not clear how he/she should adjust the schedule and doses. Based on the stage of the disease and the patient’s needs, the therapy can have different final goals: maximizing the patient’s duration of life, ensuring the best possible quality of the rest of life, decreasing the probability of new metastases appearing, decreasing the time/cost of the treatment, etc. Unfortunately, an oncologist can usually only focus on one or two of these goals, subject to some reasonable constraints on the secondary parameters. Thus, an important step toward optimizing AT is to define an objective of the therapy and “translate” it into mathematical language. The next step is to quantify how good each particular strategy is with respect to that chosen objective. Optimizing this objective is a mathematical goal which can be addressed by the tools of optimal control theory.
Optimal control theory, a branch of mathematics typically applied to problems in engineering, can be applied to a wide class of problems arising in oncology26. The first application of optimal control theory in cancer was by Swan and Vincent27, who found the optimal treatment strategy for multiple myeloma, with the objective of minimizing the total amount of drugs used, by applying the Pontryagin Maximum Principle (PMP)28. Since then, others have applied the PMP to various optimal cancer treatment problems: chemotherapy optimization under evolving drug resistance29–31, optimal scheduling of a vessel-disruptive agent32 and of MAPK-inhibitor33 input in cancer treatment, minimizing the amount of drugs prescribed in a tumor-immune model34, finding a compromise between drug toxicity and tumor repression for myeloma bone disease35, and many others.
While these approaches have offered benefits in their ability to formally optimize problems written as dynamical systems, the PMP method has several limitations. First, PMP yields only a necessary condition for an optimum, and any locally optimal trajectory of the control system satisfies PMP. Local optimality means that the trajectory is optimal when comparing it with its small perturbations, but there may well be a different trajectory that is even better (globally optimal, compared with all possible trajectories). Secondly, PMP provides a time-dependent (open-loop) control: given an initial state, the method provides an optimal treatment strategy as a function of time – therefore a treating oncologist has to follow it regardless of the changing state of the tumor. However, if the underlying model has been perturbed or includes some noise (like a tumor acquiring mutations, say), the control cannot adapt to these unexpected changes.
A different approach to an optimal control problem is the feedback (closed-loop) point of view. Using the Hamilton-Jacobi-Bellman (HJB) equations, one can obtain controls that depend on the current state of the dynamical system (the current distribution of sub-populations of cancer cells) rather than only the current time36,37. In this case, the treating oncologist’s decisions can be adjusted if something unexpected has happened to the trajectory. Moreover, the HJB equations guarantee that the resulting feedback treatment strategies are globally optimal. Despite these advantages of the HJB method over the PMP, there are only a few works38,39 which use the feedback control paradigm to find an optimal treatment strategy.
Here, we apply the HJB approach to solve for optimal treatment strategies for a model of lung cancer proposed by Kaznatcheev and colleagues40. In that paper the authors introduce an evolutionary game (a system of replicator-type equations) that models the dynamics of three sub-populations of tumor cells. The article highlights the importance of good scheduling in the polyclonal regime, when the game has cyclic dynamics. It gives an example of two different scheduling strategies with the same set of initial parameters that lead the system to opposite outcomes: putative recovery versus putative death of the patient. While several qualitatively different treatment schedules are presented, optimal therapy is not discussed. Given the growing interest in EGT in clinical applications14 and recent work connecting these models using direct in vitro parameterization41, we believe the optimization of therapies based on such models will become increasingly important and the HJB-based approach will be used far more often in the future.
2 Methods
In this article we focus on a model of cancer evolution that has been proposed in Kaznatcheev et al.40 and summarized in Box 1. This model considers interactions between three different sub-populations of cancer cells playing a modified version of the public goods social dilemma: glycolytic cells (GLY), vascular overproducers (VOP) and cells called defectors (DEF) which use both strategies to “cheat” on the others. GLY cells are anaerobic and produce lactic acid. Both VOP and DEF cells are aerobic. In addition, VOP cells spend extra energy to produce VEGF (a protein that improves the vasculature, benefiting both VOP and DEF). Based on the replicator model from EGT15,16 summarized in equations (1) and (3), the evolution of the tumor is described by tracking the changing proportions of GLY, VOP and DEF cells in the full population. The patient is viewed as recovered when the GLY proportion falls below some low threshold rb. (Below this recovery barrier, the validity of the replicator-based model is harder to justify and we assume that the GLY cells are essentially extinct.) Conversely, we assume that GLY cells suppress other tumor cells and a patient dies if the total proportion of aerobic cells (VOP and DEF sub-populations combined) falls below some low threshold (a failure barrier) fb.
Box 1: Mathematical model of the cancer sub-populations evolution
Subpopulation Proportions: (xG, xD, xV) for GLY, DEF, and VOP respectively.
Note: xG + xD + xV = 1.
Relative Subpopulation Fitness: (ψG, ψD, ψV), defined in the Materials and Methods of Kaznatcheev et al.40.
Full Population Averaged Fitness: ⟨ψ⟩ := xGψG + xDψD + xVψV.
Replicator equations to model the evolution of sub-populations:
ẋi = xi (ψi − ⟨ψ⟩), i ∈ {G, D, V}. (1)
Transformation/Reduction to a 2D system:
q = xV / (xV + xD), p = xG. (2)
Equivalent dynamics of (1) in reduced coordinates (see Appendix B in40):
Parameters:
ba, the benefit per unit of acidification;
bv, the benefit from oxygen per unit of vascularization;
c, the cost of producing VEGF;
n, the number of glycolytic (GLY) cells in the interaction group.
Conditions for homogeneous center equilibrium and periodic oscillations:
Process terminates as soon as either barrier is reached: p ≤ rb (recovery: the GLY proportion falls below the recovery barrier) or p ≥ 1 − fb (failure: the aerobic proportion xV + xD = 1 − p falls below the failure barrier).
Terminal set: {(q, p) : p ≤ rb or p ≥ 1 − fb}.
For a range of parameter values (4), this model predicts a heterogeneous regime* in cancer evolution with coexistent and oscillating proportions of GLY, VOP, and DEF. Without any treatment, these sub-populations follow cyclic dynamics and a patient never recovers; see Figure 1(a).
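The cyclic, non-recovering behavior of the replicator dynamics in Box 1 is easy to reproduce numerically. As a minimal sketch (the actual GLY/DEF/VOP fitness functions ψi are defined in the Materials and Methods of Kaznatcheev et al.40 and are not reproduced here), the snippet below substitutes a hypothetical rock-paper-scissors-style payoff matrix purely to illustrate equation (1):

```python
import numpy as np

def replicator_step(x, fitness, dt=1e-3):
    """One forward-Euler step of the replicator dynamics
    x_i' = x_i * (psi_i - <psi>), with <psi> = sum_i x_i * psi_i."""
    psi = fitness(x)
    avg = np.dot(x, psi)
    return x + dt * x * (psi - avg)

# Hypothetical stand-in fitness (NOT the GLY/DEF/VOP payoffs of the model):
def toy_fitness(x):
    # Rock-paper-scissors-like payoff matrix, chosen to produce cycling.
    A = np.array([[ 0.0,  1.0, -1.0],
                  [-1.0,  0.0,  1.0],
                  [ 1.0, -1.0,  0.0]])
    return A @ x

x = np.array([0.04, 0.9, 0.06])  # same fractions as the (x_D, x_G, x_V) of Section 3.1
for _ in range(10_000):
    x = replicator_step(x, toy_fitness)

# The simplex is invariant: proportions stay positive and sum to 1,
# so without therapy no sub-population ever goes extinct here.
print(round(float(x.sum()), 6), bool((x > 0).all()))  # 1.0 True
```

For the actual model one would substitute the fitness functions of Kaznatcheev et al.40 (or integrate the reduced (q, p) system) and use a higher-order ODE integrator.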
Following Section 4.1 in Kaznatcheev et al.40, we consider a cell-type-targeting therapy that preferentially penalizes the fitness of GLY cells; see formula (7) in Box 2. During the treatment, a doctor defines the timing of therapy and its intensity. This time-dependent intensity d(t) can vary between 0 (no therapy) and dmax > 0 (the maximum tolerated dose, MTD). Two extreme cases (d(t) = 0 versus d(t) = dmax for all times t) are illustrated in Figures 1(a) and 1(b) respectively. In the latter case, GLY cells become extinct and the patient recovers quickly; however, it is natural to ask whether this treatment strategy is optimal in some sense (i.e., could a similar recovery be achieved if the patient received therapy less often or at a lower intensity?). In the following section we will show that the MTD-based treatment can lead to an avoidably high cumulative amount of therapy (see Figure 2(c)) or might even fail to achieve a recovery in situations where AT-based treatment would otherwise have succeeded (see Figure 6), much like the early results from Zhang et al. in metastatic prostate cancer14.
Box 2: Evolution dynamics with control on therapy intensity
Time-dependent intensity of GLY-targeting therapy: d : ℝ₊ → [0, dmax].
Evolutionary dynamics as a controlled system:
One natural objective function to minimize is the total amount of therapy administered over the course of treatment (which in this case could be a surrogate for both toxicity and cost). This can be quantified as D = ∫0T d(t) dt, where the total time of treatment T is dependent on the initial cancer subpopulation fractions and on our chosen therapy policy d(·). However, this objective is problematic for two reasons. First, the minimum of D is clearly attained without any therapy (taking d(t) ≡ 0 implies D = 0 and T = +∞), even though the dynamics become cyclic and recovery is never achieved. Second, if we constrain our minimization to only those d(·) that lead to recovery, an optimal treatment policy does not exist. Instead, there is a sequence of treatment policies that lead to successively smaller D values but with an unbounded increase in the corresponding treatment times T. The idea of such policies is simple: travel along the therapy-free trajectories of Figure 1(a) for most of the time, but use short bursts of therapy only when the drugs are most effective. To approach the infimal D, one would need to use shorter and shorter bursts, resulting in policies that are hard to implement in practice and would require unrealistically long treatment times T. This would yield a situation like a chronic disease, where the tumor is never cured but always controlled.
In order to obtain a meaningful optimal policy, we penalize the treatment time by a time penalty σ > 0. The total time spent on the treatment, including the time when a patient skips the doses, is an important factor by itself: a longer treatment worsens the patient’s quality of life and incurs additional costs. Therefore, our objective is to minimize the sum of the therapy cost D and the treatment time cost σT, while guaranteeing that the treatment results in recovery. For every choice of σ > 0, the resulting treatment policies are thus Pareto-optimal with respect to D and T.
In this paper we consider an objective function that is equal to ∫0T d(t) dt + σT if a patient recovers, and is equal to +∞ otherwise. The value function, u, is defined as the minimum of this objective function (over the set of treatment policies), and any policy d(·) that realizes this minimum is called optimal.
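As a sanity check on how this objective behaves, the sketch below evaluates the cost of one recovering schedule numerically: the accumulated dose plus the time penalty σT, with +∞ assigned to non-recovering trajectories. The schedule itself is invented for illustration and is not taken from the paper’s simulations:

```python
import numpy as np

def treatment_cost(t, d, sigma, recovered):
    """Objective: integral_0^T d(t) dt + sigma * T if the patient recovers,
    +infinity otherwise (failure barrier reached, or no termination)."""
    if not recovered:
        return np.inf
    # Trapezoidal rule for the accumulated dose D.
    dose = 0.5 * np.sum((d[1:] + d[:-1]) * np.diff(t))
    return dose + sigma * t[-1]

# An illustrative bang-bang schedule (hypothetical numbers):
t = np.linspace(0.0, 10.0, 101)    # recovery at T = 10
d = np.where(t < 3.95, 3.0, 0.0)   # an MTD burst, then therapy off
cost = treatment_cost(t, d, sigma=0.01, recovered=True)
print(round(cost, 4))              # dose ~ 11.85 plus sigma*T = 0.1
```

Non-recovering policies receive infinite cost, so they are automatically excluded from the minimization.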
Due to the structure of this optimization problem, one can show that virtually all optimal treatment policies are bang-bang: at any given time t, they either administer therapy at the MTD rate (d(t) = dmax) or administer no drugs at all (d(t) = 0); see Section 6.1 in Supplementary Materials. For such policies, the objective function becomes a weighted sum of the total therapy time (when d(t) = dmax) and the total treatment time T, with dmax and σ as the corresponding weights. Moreover, this allows for a simple visual representation of any such policy, splitting the full state space into two parts (MTD dynamics versus no-therapy dynamics, shown in yellow and blue respectively in all figures throughout this paper), and reduces the application of therapy to something familiar to clinicians: on or off.
3 Results
3.1 Quantifying the benefits of optimal treatment strategies
In this section we apply optimal control theory to an example considered in Kaznatcheev et al.40. The details of the optimal control problem formulation are summarized in Box 3. We take the same model parameters as in Figure 2 of Kaznatcheev et al.40: ba = 2.5, bv = 2, c = 1, n = 4, strength of MTD dmax = 3, and initial state (q0, p0) = (0.6, 0.9), which by formula (2) is equivalent to (xD, xG, xV) = (0.04, 0.9, 0.06). We also use σ = 0.01 to incorporate a time penalty absent in the original model. We take rb = fb = 10^−1.5 as the recovery and failure barriers.†
Box 3. Objective function
Terminal time (or the total treatment time): T(q0, p0, d(·)) := min{t ≥ 0 : (q(t), p(t)) is in the terminal set}.
If the system never gets to the terminal set, we assume that T(q0,p0,d(·)) = +∞.
Terminal cost function g(q, p) : Δ → {0, +∞}, equal to 0 on the recovery barrier and to +∞ on the failure barrier.
Let T := T(q0, p0, d(·)) be the terminal time.
Treatment cost (objective) function: J(q0, p0, d(·)) := ∫0T (d(t) + σ) dt + g(q(T), p(T)). (10)
J is finite if the system (7) terminates at the recovery barrier, and is infinite otherwise (i.e., if the system terminates at the failure barrier or does not terminate at all).
Value function: u(q0, p0) := inf over d(·) of J(q0, p0, d(·)); it can be found by solving the Hamilton-Jacobi-Bellman (HJB) PDE
min over d ∈ [0, dmax] of { ∇u · f(q, p, d) + d + σ } = 0,
where f(q, p, d) denotes the right-hand side of the controlled dynamics (7).
The boundary conditions of the HJB equation: u = 0 on the recovery barrier and u = +∞ on the failure barrier.
Once u and its gradient are found through a numerical approximation, they can be used to obtain the optimal control in feedback form: d* = d*(q, p).
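The last step can be sketched as follows. Assuming the HJB Hamiltonian has the standard form min over d of {∇u · f + d + σ} for our running cost, the bang-bang feedback at a state reduces to comparing the two candidate values at d = 0 and d = dmax. The dynamics f and the gradient values below are hypothetical stand-ins for the numerically computed quantities:

```python
import numpy as np

def feedback_control(grad_u, f, state, d_max):
    """Pick d* in {0, d_max} minimizing d + grad_u . f(state, d).
    (The time penalty sigma is common to both candidates,
    so it does not affect the choice.)"""
    h_on  = d_max + np.dot(grad_u, f(state, d_max))
    h_off = np.dot(grad_u, f(state, 0.0))
    return d_max if h_on < h_off else 0.0

# Hypothetical 2D controlled dynamics, purely for illustration:
f = lambda s, d: np.array([-s[1], s[0] - d])
state = np.array([0.5, 0.5])
grad_u = np.array([1.0, 2.0])  # stand-in for a numerical gradient of u
print(feedback_control(grad_u, f, state, d_max=3.0))  # prints 3.0 (therapy on)
```

Evaluating this rule on a grid of states is what produces the yellow/blue therapy-on/off regions shown in the figures.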
In Figure 2 we compare the treatment cost (10) and treatment time (8) of trajectories corresponding to four different treatment strategies.
The first two strategies we consider are similar to those modeled in Kaznatcheev et al.40: 2(a) is an example of a bad policy that may cause a failure by stopping the therapy prematurely, while 2(b) is a good policy based on an ad hoc adjustment of the start time of the therapy. We also illustrate an “MTD-based” policy 2(c), which is analogous to the standard of care: using the MTD as long as possible. Even though both 2(b) and 2(c) lead to recovery, neither of these is optimal (with the MTD-based approach resulting in an excessive amount of drugs, captured by the higher cost). The policy minimizing our objective function can be found by solving the HJB equation; we illustrate it in 2(d).
The corresponding therapy on/off regions and the resulting vector field are shown in Figure 3(a). The zoomed version shows that trajectories can be prevented from crossing the failure barrier by using the MTDs just before crossing. In fact, a chattering control (with intermittent and sufficiently frequent use of MTDs) would be sufficient to guarantee this.
The level sets of u in Figure 3(b) show that value functions need not be smooth. Since the gradient of u is used to determine the optimal course of action (therapy on or off), there can actually be more than one optimal policy for initial states on a shockline (where that gradient is undefined). We show an example of such trajectories for an initial point (xD, xG, xV) ≈ (0.417, 0.311, 0.272) (solid green and red lines in Figure 3(b)). Both trajectories yield the same cost of 2.764. Moreover, non-smoothness of the value function often poses a challenge for methods based on the Pontryagin Maximum Principle (PMP)28 even if the initial state is not on a shockline. For example, perturbing the initial state to a nearby (xD, xG, xV) = (0.35, 0.3, 0.35), denoted by a cross in Figure 3(c), one sees two locally optimal trajectories, and PMP might yield either of these depending on the initial guess. The green one is, however, inferior to the globally optimal red trajectory, which is always recovered by solving the HJB equation.
3.2 Optimization trade-offs: total administered drugs versus time to recovery
The optimal therapy-on regions are clearly dependent on the specific values of all model parameters. Here we explore their dependence on σ and dmax. Recall that, for every policy leading to recovery, the overall cost of treatment is a sum of the “therapy cost” D (i.e., the total amount of drugs administered) and the treatment-time cost σT. Since the optimal control is bang-bang, this can be re-written as a weighted sum of the time-till-recovery T and the total drug therapy time. That is, for controls based on repeated therapy-off/MTD-level-therapy switches, we can re-write the overall cost as dmax Ton + σT, where Ton := D/dmax is the total therapy-on time, with the ratio between the weights (σ/dmax) representing the “relative importance” of T and Ton for the optimization. But the functional role of these weights is quite different: while σ can be chosen to reflect our preferences, the MTD rate dmax is dictated by the medical reality, which will be patient- and drug-specific. By varying dmax while keeping (σ/dmax) constant, we can study the role played by the MTD level in determining optimal policies under a fixed relative preference between the objectives. In Figure 4 we conduct this experiment for two different initial states and the same set of game parameters (ba = 2.5, bv = 2, c = 1, n = 4), demonstrating that the value of dmax strongly influences the optimal policies and the shape of the “therapy-on” yellow regions.
On the other hand, for any fixed (biologically dictated) dmax value, we can vary σ to study how the trade-off between the total amount of administered drugs and T affects the optimization. This experiment is conducted for the same two initial states in Figure 5. As we can see, a smaller σ entails a larger total time. Intuitively, this happens since it becomes “cheaper” to pause the therapy until we reach a “better” state in which to administer the drugs. A larger σ leads to a shorter time-to-recovery T, but also to an increase in the total amount of administered drugs and a larger therapy-on region (shown in yellow) in the state space.
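This trade-off can be made concrete using the weighted-sum form of the cost for recovering bang-bang policies, cost = dmax·Ton + σT. The two policies’ durations below are invented for illustration; the point is simply that increasing σ can flip which recovering policy is preferred:

```python
def overall_cost(t_on, t_total, d_max, sigma):
    """Weighted-sum cost of a recovering bang-bang policy:
    d_max * (total therapy-on time) + sigma * (time to recovery)."""
    return d_max * t_on + sigma * t_total

# Hypothetical outcomes of two recovering policies under d_max = 3:
#   "aggressive" uses more drug but recovers quickly;
#   "patient" uses less drug but waits for favorable states.
agg_lo = overall_cost(t_on=2.0, t_total=2.5, d_max=3.0, sigma=0.01)
pat_lo = overall_cost(t_on=1.2, t_total=6.0, d_max=3.0, sigma=0.01)

agg_hi = overall_cost(t_on=2.0, t_total=2.5, d_max=3.0, sigma=2.0)
pat_hi = overall_cost(t_on=1.2, t_total=6.0, d_max=3.0, sigma=2.0)

# Small sigma favors the patient policy; large sigma favors the aggressive one.
print(pat_lo < agg_lo, agg_hi < pat_hi)  # True True
```

The same comparison, performed over all admissible policies at every state, is what the HJB solver automates.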
3.3 “Incurable” states and periodic trajectories under MTD treatment
One might think that, despite being sub-optimal, an aggressive MTD-based strategy is at least always fully reliable and the resulting trajectories are guaranteed to reach the recovery zone from every initial configuration, as shown in Figure 2(c). Indeed, if dmax > ba/(n + 1), the MTD-based policy (d(t) ≡ dmax) guarantees that ṗ is always negative; see equation (7). But with dmax < ba/(n + 1) the recovery might not be attained with the constant use of MTDs (even if some other treatment policies are successful!).
Consider, for example, the following set of parameters: ba = 4, bv = 2, c = 1, n = 4; rb = fb = 10^−1.5; dmax = 0.3, σ = 0.03 and an initial state (xD, xG, xV) = (0.02, 0.8, 0.18). Under these parameters the MTD-based therapy has a periodic trajectory‡; see Figure 6(b). Since the treatment time is infinite, the cost (10) of such a policy is +∞. (In reality this would lead to the emergence of drug resistance and eventual failure, but this biological situation is not modeled in Kaznatcheev et al.40.)
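Following the footnote’s reduction (ba ↦ ba − (n + 1)dmax), a natural reading is that constant MTD is guaranteed to shrink the GLY fraction only when dmax exceeds ba/(n + 1); this is our interpretation of that reduction, not a formula stated in the original model. Checking both parameter sets used in this paper:

```python
def mtd_suffices(b_a, n, d_max):
    """Our reading of footnote (‡): constant MTD drives p down only
    when d_max > b_a / (n + 1); otherwise the MTD-controlled system
    reduces to the periodic uncontrolled dynamics."""
    return d_max > b_a / (n + 1)

print(mtd_suffices(b_a=2.5, n=4, d_max=3.0))  # Section 3.1 parameters: True
print(mtd_suffices(b_a=4.0, n=4, d_max=0.3))  # Section 3.3 parameters: False
```

This is consistent with the two examples: constant MTD cures in Section 3.1 (3 > 2.5/5 = 0.5) but cycles here (0.3 < 4/5 = 0.8).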
We can see that neither of the two extreme strategies (“no-drugs-at-all” in Figure 6(a) and the MTD-based “drugs-all-the-time” in Figure 6(b)) can bring the trajectory to the recovery zone. However, their adaptive combination can still achieve the objective. We show a trajectory corresponding to the optimal policy in Figure 6(c). With a larger failure zone (e.g., rb = fb = 10^−1), a previously successful MTD-based treatment might even result in death (Figure 6(d)), while the adaptive strategy still leads to recovery (Figure 6(e)).
For a fixed treatment policy, we define its corresponding “incurable” area to be the set of states starting from which it is impossible to cross the recovery barrier. For example, in Figure 6, the incurable area of the MTD-based policy includes the state (xD, xG, xV) = (0.02, 0.8, 0.18) (when rb = fb = 10^−1.5 or rb = fb = 10^−1). However, this state is not in the (dramatically smaller) incurable area of the adaptive/optimal policy; see Figure 7.
Starting from any incurable configuration, one could similarly pose a different control problem of maximizing the time until crossing the failure barrier. While we do not address it here, we note that the HJB approach would be quite suitable to find optimal treatment policies for this problem as well.
Of course, the “incurable areas” are highly dependent on model parameters. In section 6.4 of Supplementary Materials, we show that they can grow due to an increase in the MTD rate dmax or a decrease in vascularization benefits bv.
4 Discussion
By now it is widely accepted that cancer is an evolutionary process, and that variation and selection drive the emergence of drug resistance. While this new knowledge is driving cancer research forward, it has largely not yet affected clinical practice, with the majority of clinical protocols relying on MTD-based approaches, which invariably fail in the setting of most metastatic disease. This is changing with the advent of adaptive therapy: therapeutic strategies specifically designed to prescribe a changing regimen, one that adapts to an evolving tumor42.
To optimally design AT protocols, the underlying dynamics of tumor growth and treatment response must be known. While the methods of learning these dynamics are still in their infancy, the last decade has seen a flurry of activity using evolutionary game theory and other evolutionary models for a range of prescribed dynamics. These works have shown qualitative changes in tumor behavior in a range of treatment scenarios and, importantly, demonstrated that the order43, sequence40 and timing17 of therapy can drastically change the outcomes. As we come closer to the reality of evolutionarily designed therapeutic trials in the mainstream, it is important that we develop methods not just to improve our outcomes, but also to formally optimize them.
However, before using the mathematical tools for optimization, one also needs to choose a specific quantifiable criterion for comparing the outcomes. Once that criterion is selected and the underlying mathematical model is sufficiently accurate, the best treatment strategy can be found by the techniques of optimal control theory. In this paper, we show how this can be done for one particular heterogeneous cancer model previously described in Kaznatcheev et al.40. Given a set of model parameters (bv, ba, c, n) and treatment/recovery parameters (dmax, rb, fb), we can find an optimal treatment policy for any initial distribution of cells (q0, p0). We show that the optimal treatment policy can have multiple regimes: always on (d*(·) ≡ dmax), always off (d*(·) ≡ 0), or involving several separate treatment periods. For the latter, the challenge is to accurately approximate the on/off “switching curves” in the state space. We show that the optimal treatment policy depends heavily on the parameter σ, a “cost” term describing the relative importance of minimizing the total amount of drugs administered versus the total time to recovery. We further show that, for some parameter regimes, there are “incurable regions” in the state space – starting configurations that will not lead to a recovery regardless of the chosen treatment strategy – suggesting that an alternative therapy (or goal) should be considered. Moreover, for some other starting configurations, the “always on” strategy might not lead to a recovery even though recovery is actually achievable with some on/off hybrid strategies.
Like any other model, the approach in Kaznatcheev et al.40 is based on simplifying assumptions (e.g., only the subpopulation fractions are important, and no novel types can arise), which limit its practical applicability. But our message is broader, and we use this specific model primarily to illustrate the general optimization approach. With adaptive therapy trials now under way and showing early promise14, we propose that such methods will become increasingly integrated into trial design discussions, whether with increasingly detailed cancer evolution equations or even in a data-driven/equation-free framework.
Of the two main approaches of optimal control theory, the Pontryagin Maximum Principle (PMP) has been much more widely used in cancer treatment research up to now. In contrast, our approach here is based on dynamic programming and numerical methods for the Hamilton-Jacobi-Bellman (HJB) equations. The higher computational cost of these methods is balanced by several important practical considerations. First, they yield a policy in feedback form and are thus more robust to modeling/measurement errors. Second, they always return the globally optimal treatment strategies and avoid some of the pitfalls well-known for the PMP-based methods (e.g., see Figure 3(b)). With the advent of efficient numerical methods, we posit that the HJB equations will soon play a larger role in treatment optimization.
There are several obvious directions for extending our approach. First, the ability to optimize outcomes for a range of criteria will open new avenues to quantify physician/patient discussions concerning trade-offs and personalized therapy that have previously been only qualitative. In the current paper, computing optimal policies for different values of σ can be viewed as a small step in this direction. But there are many other possible optimization criteria of practical interest. Methods for approximating all Pareto-optimal policies are available44 but are usually more computationally challenging. For probabilistic cancer evolution models, one can also choose between optimizing different characteristics of the same random quantity (e.g., minimizing the average time-to-recovery versus maximizing the probability of recovery within the next year). Finally, one can also use the choice of criterion to promote robustness by systematically treating possible measurement/modeling errors as perturbations chosen by some adversarial player. Such “games-against-nature”, as described recently by Stankova and colleagues42, can be similarly treated by solving Hamilton-Jacobi-Isaacs equations.
Another important limitation of the above techniques is our assumed full knowledge of the system state. For example, we assumed that the exact subpopulation fractions are known at every point in time and can be used to decide whether to administer the drugs. In practice, one can periodically obtain an approximation of these quantities (e.g., from a repeat biopsy), but most of the time the decisions must be made based on less invasive measurements (e.g., PSA levels in the castrate-resistant prostate cancer model14 or cell-free circulating DNA, as recently shown by Khan and colleagues45). A rigorous treatment of such optimization challenges will require the framework of partially observable controlled processes; see e.g., Davis and Varaiya46.
With cancer evolution playing a larger and larger role in our thinking about therapy, and adaptive/evolutionary therapy coming to the fore, methods for optimizing these approaches will grow in importance. We suggest that the HJB method be used in these scenarios, and that clinicians and modelers begin discussions of further method development along these lines.
5 Conflict of Interest
The authors declare no conflict of interest.
Acknowledgement
MG acknowledges the support from The Institute for Data and Decision Analytics at The Chinese University of Hong Kong, Shenzhen during his visit when this research was completed.
JGS would like to thank the NIH Case Comprehensive Cancer Center (support grant P30CA043703) and the Calabresi Clinical Oncology Research Program, National Cancer Institute Award Number K12CA076917.
AV would like to thank the Simons Foundation for its fellowship support and the National Science Foundation (award DMS-1738010) for supporting development of numerical methods for Hamilton Jacobi equations. A part of this work was performed during a sabbatical leave at Princeton/ORFE, and AV is grateful to ORFE Department for its hospitality.
Footnotes
* Other tumor regimes (fully angiogenic and glycolytic) also exist outside of this parameter range. They are less interesting from the point of view of treatment strategies, but we still consider them for the sake of completeness in Section 6.3 of Supplementary Materials.
† This change in parameter values is meant to decrease the computational cost of our numerical approach (see Section 6.2 in Supplementary Materials). The original rb = fb = 10^−4 from Kaznatcheev et al.40 would require computations on a finer mesh.
‡ This is easy to prove by redefining the parameter ba := ba − (n + 1)dmax > 0 and reducing the MTD-based case to the periodic behavior of the original uncontrolled system (3) in the parameter regime (4).
References