Abstract
Mathematical modelling has been widely applied to better understand the transmission, treatment and prevention of infectious diseases. The comparison of different models is crucial for providing robust evidence for policy-makers because differences in model properties can influence model predictions.
In this study, two individual-based models for simulating HIV epidemics and sexually transmitted infection (STI) co-factor effects, i.e. models developed with the Simpact Cyan 1.0 and StepSyn 1.0 frameworks, were compared. Simpact Cyan 1.0 uses a continuous-time implementation and is updated each time an event happens, while StepSyn 1.0 uses discrete daily or weekly time steps and updates population, sexual links, and STI states at each time step. The two frameworks also differ in how stochastic processes are described, how individuals enter and leave the population, and how ties in the sexual network are formed and broken. We evaluated how these differences affect survival and HIV prevalence curves during the course of a heterosexual HIV epidemic with herpes simplex virus type 2 (HSV-2) as the contributing STI, using Yaoundé (Cameroon), 1989–1998, as a case study. To allow direct comparison, neither modelling framework was used to its full potential. For each model, 100 simulations were performed with parameters calibrated to HIV prevalence data from Yaoundé between 1989 and 1998. The median time profiles of both models matched the data equally well, but a slight difference in variability among simulations was observed. This can be explained by differences in model initialization and in the generation of the sexual network. Moreover, our example shows that similar HIV prevalence curves can be simulated using different combinations of sexual network and HIV transmission parameters.
We conclude that it is important to carefully consider differences in model properties, here in particular their initialization and the generation of sexual networks. We strongly recommend using different models to assess the robustness of evidence provided by mathematical modelling.
1. Introduction
Mathematical modelling has been widely applied to better understand the transmission, treatment and prevention of infectious diseases. The role of mathematical models in understanding the dynamics of the human immunodeficiency virus (HIV) epidemic has recently been reviewed by Geffen & Welte [1]. They present examples of HIV models that were used to estimate the size of the epidemic in specific subpopulations, such as the Black population in South Africa, and to estimate the impact of interventions, such as the use of antiretrovirals and condoms, on HIV transmission and AIDS.
When considering mathematical models, two commonly used types of implementation can be distinguished: compartmental and individual-based models (IBMs). While compartmental models simulate population counts, IBMs (also called agent-based models or micro-simulation models) keep track of the history of each individual in the population separately.
Recent applications of compartmental HIV models include testing the effect of different assumptions for HIV dynamics on the predicted impact of anti-retroviral therapy (ART) in men who have sex with men (MSM) [2], and studying the influence of concurrent partnerships on HIV dynamics [3]. IBMs have recently been applied to assess the influence of pre-exposure prophylaxis (PrEP) in MSM [4], to evaluate the long-term effect of early ART initiation [5] and to understand the factors underlying the emergence of HIV in humans [6].
In contrast to compartmental models, IBMs allow a high degree of individual heterogeneity [7] (e.g. regarding (sexual) risk behaviour, age demography and individual response to treatment). This is an advantage when heterogeneity matters for the particular question or process under study. Because individual heterogeneity is inherent in the transmission, prevention and treatment of HIV and other sexually transmitted infections (STI), IBMs are particularly suitable for identifying the most beneficial intervention for specific individuals. Furthermore, IBMs allow explicit modelling of the sexual relationships that jointly form the sexual network over which STI/HIV infections are transmitted [8].
Model comparisons are crucial for providing robust evidence for public health decision-making and policy [9]. To assess uncertainty, it is necessary to study how differences in model properties influence model predictions. Eaton et al. compared ten mathematical models aimed at studying HIV prevalence, incidence and ART [10]. However, eight of these models were compartmental and the remaining two were discrete-time IBMs, and the majority of the models did not include the co-factor effects of other STIs.
This study presents and compares two IBM frameworks for simulating HIV transmission dynamics with STI co-factor effects. The purpose of the comparison was to assess how the differences between the two modelling frameworks affect the simulation of an HIV epidemic influenced by an STI co-factor. As a case study we chose the HIV epidemic in Yaoundé (Cameroon), and we refer to the models built in the two frameworks as Simpact Yaoundé and StepSyn Yaoundé. Model options that were absent in one of the frameworks had to be turned off in the other framework to allow direct comparison. As representative use cases, we compare survival curves and fit both models to HIV prevalence time series from Yaoundé.
2. Methods
2.1. Simpact Cyan 1.0 modelling framework
Simpact Cyan 1.0³ [11] is a freely available framework for developing IBMs to simulate HIV transmission, progression and treatment. Models developed with Simpact Cyan 1.0 are event-driven. They are not updated at fixed time intervals but every time an event happens, which makes Simpact Cyan 1.0 a continuous-time simulation modelling framework. Events occur as a result of stochastic event processes, described by hazard functions. The timing of an event is determined using the modified next reaction method (mNRM) [12].
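The event-driven update can be illustrated with a minimal Python sketch of the modified next reaction method. The sketch assumes hazards that stay constant between events. It is not Simpact Cyan's C++ implementation, and the event names and hazard values are hypothetical. Each event process keeps an internal clock, which accumulates integrated hazard, and a next firing point drawn from an Exp(1) distribution. The process whose internal clock reaches its firing point first in real time triggers next. In a full model the hazards would be recomputed after every event, for example after a relationship forms. The sketch keeps them fixed for brevity.

```python
import math
import random


def mnrm(hazards, t_end, seed=1):
    """Modified next reaction method for independent event processes whose
    hazards (per year) are constant between events.

    hazards: dict mapping a hypothetical event name to its hazard.
    Returns a chronological list of (time, event_name) tuples.
    """
    rng = random.Random(seed)
    t = 0.0
    # Internal time T_k: hazard integrated so far for process k.
    internal = {k: 0.0 for k in hazards}
    # Next firing point P_k on the internal time axis, starting from Exp(1).
    next_fire = {k: -math.log(1.0 - rng.random()) for k in hazards}
    log = []
    while True:
        # Real-time delay until each active process reaches its firing point.
        delays = {k: (next_fire[k] - internal[k]) / h
                  for k, h in hazards.items() if h > 0}
        if not delays:
            break
        k_min = min(delays, key=delays.get)
        dt = delays[k_min]
        if t + dt > t_end:
            break
        t += dt
        # Advance every internal clock by its hazard integrated over dt.
        for k, h in hazards.items():
            internal[k] += h * dt
        log.append((t, k_min))
        # Only the process that fired draws a fresh Exp(1) increment.
        next_fire[k_min] += -math.log(1.0 - rng.random())
    return log


events = mnrm({"formation": 2.0, "dissolution": 0.5}, t_end=10.0, seed=3)
```

The main design point of the mNRM is the bookkeeping. Only the process that fired draws a new random number. Every other process keeps its pending firing point, and the method only re-evaluates the delays when hazards change.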
To initialize a model, a number of individuals is generated. The age of each person is drawn from an age distribution based on a population pyramid, and a person whose age is greater than or equal to the sexual debut age is marked as sexually active.
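A minimal sketch of this initialization step follows. The age bins and weights in `PYRAMID` and the debut age of 15 are hypothetical placeholders, not the values used in Simpact Yaoundé. The sketch first draws an age bin in proportion to its weight and then draws a uniform age within that bin.

```python
import random

# Hypothetical population pyramid: (lower age, upper age, relative weight).
PYRAMID = [(0, 15, 0.40), (15, 25, 0.25), (25, 40, 0.20), (40, 65, 0.15)]


def initialize_population(n, debut_age=15.0, seed=1):
    """Draw n ages from a binned population pyramid and flag everyone whose
    age is at least the sexual debut age as sexually active."""
    rng = random.Random(seed)
    weights = [w for _, _, w in PYRAMID]
    people = []
    for i in range(n):
        lo, hi, _ = rng.choices(PYRAMID, weights=weights, k=1)[0]
        age = rng.uniform(lo, hi)
        people.append({"id": i, "age": age,
                       "sexually_active": age >= debut_age})
    return people


pop = initialize_population(500, debut_age=15.0, seed=4)
```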
In Simpact Cyan 1.0, the user can specify which events are possible in the simulation, depending on the research question to be addressed. The possible events are: heterosexual and homosexual relationship formation and dissolution, conception and birth, HIV transmission, AIDS and non-AIDS mortality, HIV diagnosis and treatment, HIV treatment dropout, and STI co-factor transmission events. Furthermore, events describing the natural history of HIV are implemented in Simpact Cyan 1.0. For each event, the form and the parameters of the hazard function can be modified flexibly to suit the research question. Simpact Cyan 1.0 is implemented in C++ with R [13] and Python [14] interfaces.
2.2. StepSyn 1.0 modelling framework
StepSyn 1.0 is an IBM modelling framework that simulates epidemics of STIs, including HIV. It focuses on the epidemiological synergy and interactions between HIV and other STIs. Time is divided into fixed one-day or one-week intervals; in this study, we use only weekly time steps to reduce computation time. Formation and dissolution of sexual relationships, and STI and HIV transmission, are modelled as stochastic processes. For each STI, the life history is explicitly modelled, including several stages, recurrences, genital ulcers, and urethral discharge symptoms, and the effects of these symptoms on the HIV transmission probability are modelled explicitly. The parameters for the timing and duration of the STI stages and symptoms, and for the co-factor effects of genital ulcers and discharge on HIV transmission, are based on a literature review. To enable comparison of the two models, in the present study StepSyn is parameterized to track only HIV and herpes simplex virus type 2 (HSV-2). HSV-2 is modelled without explicit stages or recurrences, with a constant transmission probability and a constant co-factor effect on HIV. Heterosexual relationships can include marital and short-term links, and contacts between commercial sex workers (CSW) and clients; in the present study CSWs are not included. Individuals vary in their tendency to form short-term relationships, such that their number of non-marital partners within a year follows a power law distribution, whose parameters were derived from behavioural data gathered in the 4 Cities Study [15,16]. StepSyn is implemented in R [13].
2.3. Key model differences
The Simpact Cyan 1.0 and StepSyn 1.0 modelling frameworks were developed separately. Similarities and differences between the two modelling frameworks are outlined in Table 1. Model options that were absent in one of the frameworks were turned off in the other framework to allow comparison (see Table 2). In Simpact Cyan 1.0, model options were turned off in the parameter configuration file (the R file calling the C++ source code), while in StepSyn changes to both the parameter files and the R source code were necessary.
Simpact Cyan 1.0 is freely available. StepSyn is not yet freely available but will become so upon publication of a manuscript that describes the modelling framework in more detail (manuscript in preparation). Models developed with Simpact Cyan 1.0 and StepSyn 1.0 are both individual-based and both include HIV natural history. While StepSyn models can also include the natural history of other STIs, Simpact Cyan 1.0 models are restricted to a generic, constant STI co-factor effect on HIV. Within the StepSyn 1.0 framework, five different STIs (other than HIV) can be simulated, but for the purpose of the present study only HSV-2 and HIV were simulated (Table 2). Furthermore, StepSyn Yaoundé was run without modelling HSV-2 recurrences, considering only the chronic stage as infective and using a constant co-factor effect on HIV.
Models developed with Simpact Cyan 1.0 and StepSyn 1.0 are both accessible from R, but they differ in the way they are implemented. While the source code for StepSyn is R code, Simpact Cyan 1.0 is implemented in C++ with interfaces to Python and R. In this way, Simpact Cyan 1.0 combines the computational efficiency of C++ with the user-friendliness of R and Python.
Simpact Cyan 1.0 and StepSyn 1.0 also differ in the way the state of the model system is updated and, as a consequence, in the way stochastic processes are described. StepSyn implements discrete time steps of either one day or one week (one week in this study) and updates population, sexual links, and STI states at each time step. Stochastic processes are implemented through R random distribution functions that are called at each time step. Simpact Cyan 1.0 implements a continuous-time model in which the state of the model system is updated each time an event happens, and this requires the use of hazard functions. The hazard at a certain time point is defined as the event rate at that time point, conditional on model states such as still being alive at that time point. An advantage of the continuous-time implementation is that events that happen on different time scales can be integrated into a single simulation. A disadvantage is that model parameters in the literature are mostly described as probabilities and therefore have to be converted to hazards. Another drawback is that continuous-time models are computationally more complex and therefore require longer simulation times.
Another difference between the two modelling frameworks is the way individuals enter and leave the population. In Simpact Cyan 1.0, individuals enter the population by birth and leave it by non-AIDS or AIDS mortality; the mortality events are age-dependent. StepSyn 1.0 is not age-structured; individuals enter the population by immigration or sexual debut and leave it by emigration or by age-independent AIDS or non-AIDS mortality. In this study, the age-dependent parameters of Simpact Cyan 1.0 were set to 0 and no new individuals entered the population, so that individuals left the population only through AIDS mortality. Likewise, the StepSyn parameters for immigration, emigration, sexual debut, and background mortality were set to 0, so that individuals only left the population through AIDS mortality.
An extra option in Simpact Cyan 1.0 that is not implemented in StepSyn 1.0 is pregnancy. In Simpact Cyan 1.0, women are pregnant between a conception event and a birth event. This means that after a simulation run, one can measure how many women were pregnant at any given point in time in the simulation.
Formation and break-up of relationships in the sexual network are implemented differently in StepSyn 1.0 and Simpact Cyan 1.0. In Simpact Cyan 1.0, formation and break-up of relationships happen through formation and dissolution events. The timing of these events is sampled from the probability distribution implied by the specified hazard function. In StepSyn, formation and break-up of relationships are evaluated at each time step (one week in the current study), using an individual-specific probability of forming a new relationship.
An extra option in Simpact Cyan 1.0 that is not currently implemented in StepSyn 1.0 is the simulation of diagnosis and treatment, including treatment dropout, and the possibility to introduce treatment during the simulation through a so-called intervention event.
2.4. Data - city of Yaoundé
2.4.1. Data available for generating the sexual network in StepSyn 1.0
Since StepSyn 1.0 models explore behavioural differences between individuals, behavioural data on the distribution of the number of non-marital partners in the last 12 months were gathered for Yaoundé during the development of the StepSyn 1.0 modelling framework. These data were obtained from the surveys of the 4 Cities Study, and provided by the Institute of Tropical Medicine (ITM) in Antwerp, Belgium (Anne Buvé, unpublished database). These data were used to estimate the parameters of a power law distribution to generate the sexual network for the StepSyn Yaoundé model (see paragraph 2.5). This work was performed prior to the comparison study.
2.4.2. Data available for comparison of Simpact Cyan 1.0 with StepSyn 1.0
For the comparison of the two models, we used demographic data for 1997 and HIV prevalence data for 1989–1998 for the African city of Yaoundé (Cameroon). The adult male population in 1997 was estimated at 387,398 in the 4 Cities Study [17]. According to the surveys of the same study, 34.5% of men and 44.2% of women were married, and 7.2% of married men were polygamous [16]. Assuming that most polygamous men had two wives, we estimate the number of women as 387,398 × 0.345 × (1 + 0.072) / 0.442 = 324,152, implying an adult sex ratio (M:F) of 1000:836.74.
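The estimate can be checked with a few lines of arithmetic. Every married man has one wife, and each polygamous man is assumed to have exactly one additional wife. This gives the number of married women. Dividing by the proportion of women who are married then recovers the total number of women.

```python
men = 387_398            # adult men, 4 Cities Study estimate
p_married_men = 0.345    # proportion of men who are married
p_married_women = 0.442  # proportion of women who are married
p_polygamous = 0.072     # proportion of married men who are polygamous

# One wife per married man, plus one extra wife per polygamous man (assumed).
wives = men * p_married_men * (1 + p_polygamous)
women = wives / p_married_women
women_per_1000_men = 1000 * women / men

print(round(women))                   # 324152
print(round(women_per_1000_men, 2))   # 836.74
```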
Since one of our aims is to compare the HIV prevalence curves obtained by fitting the HIV transmission parameters of the two models to data, we gathered HIV-1 prevalence data for Yaoundé for the period 1989–1998. The only available prevalence data were for pregnant women, among whom prevalence increased from 0.7% in 1989 to 5.5% in 1998 (US Census Bureau, 2001, HIV/AIDS Profile, Cameroon, HIV/AIDS Surveillance Database [18]), as shown in Table 3.
Because StepSyn 1.0 does not represent pregnant women as a separate category of the population, and because we wanted to compare model simulations for males and females separately, the prevalence for pregnant women was converted to prevalence for males and females as follows. In 1997, during the 4 Cities Study, HIV-1 prevalence was 4.1% in men and 7.8% in women [19]. The prevalence for pregnant women in 1997 was estimated by fitting a smoothing spline through the data from Table 3, using the smooth.spline function of the stats package in R, and evaluating the spline at the time point corresponding to 1997 (see Figure S1). This gave a prevalence of 4.778% for pregnant women in 1997. The prevalence ratios men/pregnant women and women/pregnant women were then calculated by dividing 4.1 and 7.8, respectively, by 4.778. The prevalence ratios (0.858 for men and 1.632 for women) were multiplied by the data from Table 3 to obtain HIV prevalence for men and women separately for all years between 1989 and 1998 (see Table 4).
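The conversion after the spline step can be sketched as follows. The R `smooth.spline` fit is not reproduced here, because the full Table 3 series is not listed in this section. The sketch starts from the reported spline value of 4.778%. It is applied only to the two endpoint values quoted in the text, 1989 and 1998, and uses the unrounded ratios.

```python
PREV_PREGNANT_1997 = 4.778                  # spline estimate, pregnant women (%)
PREV_MEN_1997, PREV_WOMEN_1997 = 4.1, 7.8   # 4 Cities Study, 1997 (%)

ratio_men = PREV_MEN_1997 / PREV_PREGNANT_1997
ratio_women = PREV_WOMEN_1997 / PREV_PREGNANT_1997


def split_by_sex(prev_pregnant):
    """Scale a {year: prevalence %} series for pregnant women to men and women."""
    men = {yr: p * ratio_men for yr, p in prev_pregnant.items()}
    women = {yr: p * ratio_women for yr, p in prev_pregnant.items()}
    return men, women


# Only the two endpoint values quoted in the text are used here.
men_prev, women_prev = split_by_sex({1989: 0.7, 1998: 5.5})
```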
2.5. StepSyn 1.0 sexual network
Power law distributions (separately for men and women) were fitted to the data on the reported number of partners in the last 12 months from the 4 Cities Study. Each individual is assigned a preferred degree (number of partners within a period of 12 months) drawn from the appropriate distribution. The individual-specific preferred degrees (PD_i) are translated into probabilities of forming a new short-term relationship per week (PF_i) using the formula PF_i = PD_i / (MD + 52), with MD the mean duration of short-term relationships in weeks. With these parameters, the total weekly male demand for new relationships is higher than the female demand. This is partly because of the male-biased sex ratio and partly because women under-report their number of partners in the survey on which the distribution is based. The excess male demand is distributed uniformly among the females, a method that is acceptable for modelling purposes [20]. The number of weekly sex acts in married couples and in short-term relationships is drawn from two different Poisson distributions, whose means are consistent with the values found in the 4 Cities Study for Yaoundé [15,16].
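The formula can be read as follows. A partnership of mean duration MD weeks overlaps a given 12-month window if it starts anywhere in a span of roughly 52 + MD weeks. A weekly formation probability PF_i therefore yields about PF_i × (52 + MD) partners in the window. The sketch below illustrates this step in Python. Several elements are hypothetical rather than the fitted StepSyn values: the power-law sampler, which takes the floor of a Pareto draw shifted to start at 0, the exponent `alpha`, and `mean_duration_weeks`. The uniform redistribution of excess male demand is not implemented.

```python
import random


def preferred_degrees(n, alpha, seed=1):
    """Hypothetical discrete power-law sample of preferred degrees
    (non-marital partners per 12 months), starting at 0."""
    rng = random.Random(seed)
    # paretovariate returns values >= 1; floor and shift so degrees start at 0.
    return [int(rng.paretovariate(alpha)) - 1 for _ in range(n)]


def weekly_formation_prob(pd_i, mean_duration_weeks):
    """PF_i = PD_i / (MD + 52), capped at 1 for extreme preferred degrees."""
    return min(1.0, pd_i / (mean_duration_weeks + 52))


def weekly_step(pf, rng):
    """Indices of individuals who seek a new short-term partner this week."""
    return [i for i, p in enumerate(pf) if rng.random() < p]


rng = random.Random(2)
pref_deg = preferred_degrees(1000, alpha=1.5)                 # hypothetical alpha
pf = [weekly_formation_prob(d, mean_duration_weeks=26) for d in pref_deg]
seekers = weekly_step(pf, rng)
```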
2.6. Simpact 1.0 sexual network
In Simpact Cyan 1.0, formation and dissolution of relationships in the sexual network are simulated as discrete events. For man-woman pairs of sexually active persons, formation events are scheduled. When a formation event is triggered, a sexual relationship is established, and a dissolution event is scheduled. When the dissolution event is triggered, the relationship ceases to exist.
Formation and dissolution events are described by hazards. In the description of the hazards, parameters can be included describing dependence of relationship formation and dissolution on the number of relationships the person already has, the age of the person and the preferred age gap between the partners.
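To make the dependence concrete, the sketch below uses a simplified, hypothetical log-linear formation hazard for a single man-woman pair. This is not Simpact Cyan's exact parameterization, and the coefficient names and values are illustrative assumptions only. With negative coefficients, the hazard falls as the partners already have more relationships, and as the observed age gap moves away from the preferred gap.

```python
import math


def formation_hazard(n_rel_man, n_rel_woman, age_man, age_woman,
                     baseline=-1.0, beta_numrel=-0.5, beta_gap=-0.3,
                     preferred_gap=3.0):
    """Hypothetical formation hazard (per year) for one man-woman pair:
    exp(baseline
        + beta_numrel * (number of existing relationships of both partners)
        + beta_gap * |observed age gap - preferred age gap|)."""
    gap = age_man - age_woman
    log_h = (baseline
             + beta_numrel * (n_rel_man + n_rel_woman)
             + beta_gap * abs(gap - preferred_gap))
    return math.exp(log_h)


h0 = formation_hazard(0, 0, 28.0, 25.0)   # no relationships, preferred gap
```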
In this comparison study, the default settings of Simpact Cyan 1.0 for the formation and dissolution hazard were applied (the developers of Simpact Cyan 1.0 did not have access to the behavioural data described in paragraph 2.4.1).
Because Simpact Cyan 1.0 keeps track of the history of each individual, we are able to reconstruct the sexual network from the model output. The network can be reconstructed at a given time point by including all relationships a person has at that time point for each person in the population. Furthermore, a cumulative network over a certain period of interest can be reconstructed, including all relationships between individuals in the population during that period (e.g. all relationships people had during the past 12 months).
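Both reconstructions can be sketched from a relationship log. The sketch assumes a hypothetical log layout of (man_id, woman_id, formation_time, dissolution_time) tuples. Ongoing relationships have a dissolution time of infinity, and man and woman IDs are distinct. This is not the actual Simpact Cyan output file format. A relationship belongs to the snapshot at time t if it is active at t. It belongs to the cumulative network for a window if it is active at any moment in that window.

```python
from collections import Counter


def snapshot(relationships, t):
    """Relationships active at time t: formed at or before t, not yet dissolved."""
    return [(m, w) for m, w, start, end in relationships if start <= t < end]


def cumulative(relationships, t0, t1):
    """Relationships active at any moment of the window [t0, t1)."""
    return [(m, w) for m, w, start, end in relationships
            if start < t1 and end > t0]


def degrees(edges):
    """Number of distinct partners per person (repeated pairs counted once)."""
    deg = Counter()
    for m, w in set(edges):
        deg[m] += 1
        deg[w] += 1
    return deg


rels = [("m1", "w1", 0.0, 5.0), ("m1", "w2", 2.0, 3.0),
        ("m2", "w2", 4.0, float("inf"))]
```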
2.7. Converting transmission probability parameters to hazard parameters
While in the majority of the literature, and also in StepSyn 1.0, transmission parameters are described as probabilities, the parameters of Simpact Cyan 1.0 are described in terms of hazards. Transmission probability parameters were converted to hazard parameters using the following formula [21]:
$$F(t) = 1 - \exp\left(-\int_0^t \lambda(x)\,dx\right)$$
where F(t) is the cumulative distribution function and λ(x) is the hazard function.
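A common special case is a hazard that is constant over an interval of length Δt. The relation between F(t) and λ(x) then gives p = F(Δt) = 1 − exp(−λΔt), so λ = −ln(1 − p)/Δt. The sketch below applies this under the constant-hazard assumption. It is not necessarily the conversion used in the Supplementary Material example, and expressing time in years is a choice made for the sketch.

```python
import math


def prob_to_hazard(p, dt=1.0):
    """Constant hazard lambda such that 1 - exp(-lambda * dt) = p."""
    if not 0.0 <= p < 1.0:
        raise ValueError("p must lie in [0, 1)")
    return -math.log1p(-p) / dt


def hazard_to_prob(lam, dt=1.0):
    """Inverse conversion: probability of at least one event within dt."""
    return -math.expm1(-lam * dt)


# Example: a weekly probability of 0.002 expressed as a hazard per year.
lam_per_year = prob_to_hazard(0.002, dt=1 / 52)
```

`log1p` and `expm1` are used instead of `log(1 - p)` and `1 - exp(...)` because they stay accurate for the small probabilities typical of per-act or per-week transmission.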
An example of conversion of a transmission probability parameter to a hazard parameter is presented in the Supplementary Material.
2.8. Parameter fitting methodology
For four of the HIV transmission parameters in StepSyn Yaoundé, there is a corresponding parameter in Simpact Yaoundé (see Table 5). For both models, these four HIV transmission parameters (see Table 7) were fitted to the data in Table 4 with an iterative active learning approach [22], following the procedure described in [23]. Model performance was measured by the sum of squared relative errors [24], which was minimized. The remaining model parameters were taken from the literature.
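The performance measure can be written in a few lines. The sketch assumes that simulated and observed prevalences are matched by a key such as (sex, year), and that observed values are non-zero. Because each error is scaled by the observed value, a small absolute miss on the low early prevalences weighs as much as a proportionally equal miss on the later, higher values.

```python
def ssre(simulated, observed):
    """Sum of squared relative errors, matching values by key,
    e.g. (sex, year); observed values must be non-zero."""
    return sum(((simulated[k] - obs) / obs) ** 2 for k, obs in observed.items())


score = ssre({("men", 1997): 4.51, ("women", 1997): 7.8},
             {("men", 1997): 4.1, ("women", 1997): 7.8})
```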
For each of the four HIV transmission parameters, values were drawn from a uniform distribution, using the ranges from Table S1. These uniform distributions were used as initial (prior) probability distributions for the parameters in the parameter fitting procedure. We applied Latin Hypercube Sampling (LHS) [25] to select 10,000 parameter sets.
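A minimal Latin Hypercube Sampling sketch for independent uniform priors follows. The parameter names and ranges in the usage line are hypothetical, because the Table S1 ranges are not reproduced here. Each range is cut into n equal strata, each stratum is sampled exactly once per parameter, and the strata are shuffled independently across parameters. As a result, every parameter's marginal range is covered evenly even with relatively few runs.

```python
import random


def latin_hypercube(ranges, n, seed=1):
    """n parameter sets from independent uniform priors via Latin Hypercube
    Sampling. ranges: dict mapping parameter name -> (low, high)."""
    rng = random.Random(seed)
    columns = {}
    for name, (lo, hi) in ranges.items():
        width = (hi - lo) / n
        # One uniform draw inside each of the n strata of this parameter.
        values = [lo + (i + rng.random()) * width for i in range(n)]
        rng.shuffle(values)  # decouple strata across parameters
        columns[name] = values
    return [{name: columns[name][j] for name in ranges} for j in range(n)]


# Hypothetical parameter names and ranges; 10,000 sets as in the text.
param_sets = latin_hypercube({"param_a": (-3.0, 0.0), "param_b": (0.0, 3.0)},
                             n=10_000)
```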
In brief, the parameter fitting procedure examines the subset of model simulations with the lowest 1% of values of the sum of squared relative errors (the top 1% solutions). First, for each parameter, the density of the initial probability distribution is compared with the density of the distribution within the top 1% solutions, to determine which parameters are strongly influenced by the data. Second, classification trees and generalized additive models are applied to determine which patterns of parameter vectors characterize the subspace of the top 1% solutions. Finally, we apply the Maximal Information Coefficient (MIC) [26] to determine associations between the parameters within this subspace.
Based on the results of the analyses above, we can narrow the solution space and repeat the steps described above several times.
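One round of this loop can be sketched in a deliberately simplified form. The sketch keeps the best 1% of parameter sets by score and shrinks each parameter's uniform range to the span of the retained sets. This min-max narrowing is an assumption made for illustration. It stands in for the classification-tree, generalized additive model and MIC analyses, which guide the narrowing in the actual procedure.

```python
def top_fraction(param_sets, scores, fraction=0.01):
    """Parameter sets whose score (e.g. SSRE) is among the lowest `fraction`."""
    k = max(1, int(len(scores) * fraction))
    order = sorted(range(len(scores)), key=scores.__getitem__)
    return [param_sets[i] for i in order[:k]]


def narrowed_ranges(best_sets):
    """New uniform ranges spanning the retained sets, for the next round."""
    names = best_sets[0].keys()
    return {n: (min(s[n] for s in best_sets), max(s[n] for s in best_sets))
            for n in names}


sets = [{"a": float(i)} for i in range(200)]
scores = [abs(i - 50) for i in range(200)]
best = top_fraction(sets, scores)
new_ranges = narrowed_ranges(best)
```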
A more detailed description of the parameter fitting methodology is available in the Supplementary Material. The R scripts that were used for fitting the HIV transmission parameters are available from GitHub⁴.
3. Results
3.1. Survival curves
To compare how both models simulate disease progression to AIDS death, we simulated survival curves for both models using the literature parameter values in Table 6. For each model, 100 simulations were performed in which approximately 2% of the population was HIV-infected at initialization and the HIV transmission parameters were set to zero, so that no new infections occurred. The results show that the two models simulate similar survival curves (see Figure 1).
3.2. HIV prevalence curves
Both models were used to simulate HIV prevalence in Yaoundé between 1989 and 1998.
Before fitting the HIV parameters, we adapted the parameters for the STI (HSV-2) co-factor. In major epidemiological reviews of HSV-2 [31-33], the only data on Yaoundé that are mentioned are those collected by Buvé et al. [34] in the 4 Cities Study, in which HSV-2 seroprevalence was 50% in females. We therefore have no information about temporal trends of HSV-2 in Yaoundé. However, since the prevalence in Africa seems to have declined slightly between 2003 and 2012 [31], we assumed that the prevalence measured in Yaoundé in 1997 [34] reflects an epidemic not far from its peak. Accordingly, we adapted the parameters so that the seroprevalence of this virus first increases and then stabilizes at approximately 50% for females (see Figure 2), corresponding to the HSV-2 prevalence in 1997 described by Buvé et al. [34]. The corresponding parameters for the STI co-factor are presented in Table 8. An overview of the fitted HIV transmission parameters can be found in Table 7.
For each model, Figure 3 compares the median (with the 100% percentile interval, i.e. the full range across simulations) of 100 simulations with the fitted parameters to the estimated HIV-1 prevalence data for men and women in Yaoundé (Table 4). Figure 4 presents the median (with the 100% percentile interval) HIV prevalence over the period 1980–2005. For both simulation models, the median of the 100 simulations matches the data equally well. Furthermore, the two models give similar predictions of HIV prevalence for the period 1998–2005.
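The summaries plotted in Figures 3 and 4 can be computed per time point across runs, as sketched below. The sketch assumes that "100% percentile" denotes the full minimum-to-maximum range. It also assumes that every simulation reports prevalence on the same time grid.

```python
import statistics


def median_and_range(runs):
    """Per-time-point (median, minimum, maximum) across simulation runs.
    runs: list of equally long prevalence series, one per simulation."""
    return [(statistics.median(values), min(values), max(values))
            for values in zip(*runs)]


summary = median_and_range([[1, 2], [3, 6], [2, 4]])
```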
4. Discussion
In this study, two agent-based models were compared in terms of their model properties and their ability to simulate survival curves and the course of the HIV epidemic in Yaoundé, Cameroon. Model features that were absent in one of the frameworks were turned off in the other one to allow comparison using a comparable set of parameters. The two models generate similar survival curves for the same initial conditions (equal HIV seeds) (Figure 1), but the survival curves for Simpact Cyan 1.0 show more variation than those for StepSyn 1.0. This can be explained by the fact that in Simpact Cyan 1.0 the HIV seeds are drawn randomly from the whole population, so the percentage of HIV seeds among males and among females can vary between simulations. In contrast, in StepSyn 1.0 the percentage of HIV seeds is the same for males and females (see Figure 5, which shows 100 survival curves for a constant HIV seed of 2%).
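A small experiment illustrates this mechanism. The population sizes are hypothetical: 1000 men and 837 women, mirroring the sex ratio in Section 2.4.2. The sketch seeds 2% of the population either separately by sex or at random from the whole population. It then compares how much the seeded fraction among men varies across 100 replicates.

```python
import random
import statistics


def seed_fractions(n_men, n_women, seed_frac, stratified, rng):
    """Fractions of men and of women infected at t = 0 when seed_frac of the
    population is seeded either per sex (stratified) or at random from the
    whole population."""
    if stratified:
        k_m = round(seed_frac * n_men)
        k_w = round(seed_frac * n_women)
    else:
        total = round(seed_frac * (n_men + n_women))
        # Indices below n_men are men, the rest are women.
        chosen = rng.sample(range(n_men + n_women), total)
        k_m = sum(1 for i in chosen if i < n_men)
        k_w = total - k_m
    return k_m / n_men, k_w / n_women


rng = random.Random(7)
strat = [seed_fractions(1000, 837, 0.02, True, rng)[0] for _ in range(100)]
rand = [seed_fractions(1000, 837, 0.02, False, rng)[0] for _ in range(100)]
print(statistics.pstdev(strat), statistics.pstdev(rand))
```

Stratified seeding gives the same male fraction in every replicate. Whole-population seeding does not, and the resulting spread propagates into the simulated survival and prevalence curves.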
The slightly larger variation observed in the HSV-2 and HIV prevalence curves in Figures 2 and 4, respectively, can be explained in the same way.
When the models are calibrated to the same data, the STI (HSV-2) co-factor parameters are estimated higher for Simpact Cyan 1.0 than for StepSyn 1.0, while the baseline parameter is estimated lower (see Table 7). This can be explained by the difference in how the sexual network is generated. The distributions of the number of partners in the two models (see Figure 6) show that in StepSyn Yaoundé some individuals have a very high number of concurrent partnerships (> 10), a consequence of using a power law distribution to generate the sexual network. In Simpact Yaoundé, the maximum number of concurrent partnerships is lower (max. 6 partners in the example in Figure 6). In summary, the example of the HIV epidemic in Yaoundé shows that similar HIV prevalence curves can be simulated using different combinations of sexual network and HIV transmission parameters.
Both modelling frameworks have several advantages and limitations. Simpact Cyan 1.0 is implemented in C++, a programming language that is faster than R. The C++ code is linked to an R (and Python) user interface, so that people can use Simpact Cyan 1.0 without knowledge of C++ and take advantage of the strengths of R (e.g. its ease of use for data analysis and visualization) [39].
Despite being written in a faster language, a model developed with the current version of Simpact Cyan 1.0 is, as expected for a continuous-time event-driven model, slower than a model developed with StepSyn 1.0, a discrete-time model (see Table 1). This is because Simpact Cyan 1.0 has to be updated more frequently and has a higher mathematical complexity. Note also that StepSyn 1.0 becomes considerably slower when run with daily instead of weekly time steps (only weekly steps were run in this study). As a next step in the development of Simpact Cyan 1.0, the code will be refactored, which is likely to close the gap in computing time between the two models. The refactoring will include, among other things, selecting which output to keep track of, using more efficient packages, and modifying how relationships are scheduled in the core of the program. Note, however, that with parallel computing the computing times of the two models do not necessarily present a bottleneck.
Simpact Cyan 1.0 is a continuous-time modelling framework, which has the advantage that sexual behaviour and disease progression and transmission can be described more realistically. However, discrete-time models, like those developed with StepSyn 1.0, are easier to handle in terms of parameter estimation, because empirical data for fitting are only available at discrete time-intervals, and are often expressed in terms of probabilities and durations, instead of hazard functions [40].
Models in Simpact Cyan 1.0 can be stratified by age, while in StepSyn 1.0 this option is not available. However, in this study all age-related options in Simpact Cyan 1.0 were turned off. Ignoring age heterogeneity can, among other things, lead to underestimation of the initial growth rate of the HIV epidemic [41].
Both modelling frameworks include STI co-factor effects, which are known to strongly influence HIV infectivity [42]. Apart from STI co-factor effects, StepSyn 1.0 models can also include the natural history of several STIs. While StepSyn 1.0 mainly focuses on the natural history and transmission of STIs, Simpact Cyan 1.0 also has features to evaluate the effect of interventions, such as treatment with antiretroviral therapy.
5. Conclusion
In this study, we compared two individual-based models for simulating HIV transmission dynamics and STI co-factor effects. The models were used to simulate survival curves and HIV prevalence over time in Yaoundé (Cameroon). When calibrated to the same HIV prevalence data, simulations performed with both models fitted the data equally well. However, survival curves and HIV or STI prevalence curves simulated over a longer period showed more variation between simulations for Simpact Cyan 1.0 than for StepSyn 1.0. Differences in the initialization of the models and in the generation of the sexual network explain the observed differences in the variance of the output and in the estimated parameters of the two models. We strongly recommend using different models to assess the robustness of evidence provided by mathematical modelling.
Appendix A. Supplementary material
see Supplementary_material.pdf
Acknowledgements
The research conducted by DMH and NH in this study was funded by the Fonds Wetenschappelijk Onderzoek - Vlaanderen (Research Foundation – Flanders; FWO, http://www.fwo.be/en/) (grant agreements G0E8416N and G0B2317N).
The research done by JDS and AMV in this study has been supported in part by grants G.0692.14 and G0B2317N, funded by the FWO, Belgium.
PL was supported by a PhD grant of the FWO (1S31916N).
WD was supported by a postdoctoral fellowship from FWO (12L5816N).
Research done by VM in this study has been supported by the ELTE Institutional Excellence Program (1783-3/2018/FEKUTSRAT) of the Hungarian Ministry of Human Capacities. VM was also supported by the ÚNKP-18-4 New National Excellence Program of the Hungarian Ministry of Human Capacities and by a Bolyai János Research Fellowship of the Hungarian Academy of Sciences.
The authors gratefully acknowledge support from the FWO Scientific Research Community on Network Statistics for Sexually Transmitted Diseases Epidemiology.
The computational resources and services used in this work were provided by the VSC (Flemish Supercomputer Center), funded by the Research Foundation - Flanders (FWO) and the Flemish Government – department EWI.