Abstract
Antiretroviral therapy (ART) suppresses most viral replication in HIV-infected persons, allowing a nearly normal lifespan. Yet, if ART is stopped, even after several decades of treatment, HIV levels typically rebound, and there is a high risk of progression to AIDS and death. While HIV replication re-emerges from a set of persistently infected memory CD4+ T cells following ART cessation, the mechanism for persistence remains controversial. One hypothesis maintains that HIV replicates and infects new cells continuously in anatomic microenvironments called ‘drug sanctuaries’ where ART drug concentrations are low. Alternatively, HIV infected cells may persist due to the natural longevity of memory CD4+ T cells. A final, non-mutually exclusive possibility is that latently infected CD4+ T cells frequently proliferate to maintain a relatively constant reservoir volume. Here we re-analyze existing data with ecologic methods and find that after one year of ART, at least 99% of remaining HIV infected cells originate via proliferation rather than infection. Next, using mathematical models, we demonstrate that even if a drug sanctuary is assumed, nearly all newly generated infected cells will arise from proliferation rather than new infection within 6 months of ART. Our results suggest that targeting proliferation of reservoir cells rather than HIV replication may be an optimal strategy for HIV eradication.
Introduction
Antiretroviral therapy (ART) severely limits viral replication in previously uninfected cells allowing for elimination of most HIV-infected CD4+ T cells from the body [1]. Yet, some infected cells remain and decay at an extremely slow rate despite many years of treatment [2, 3]. There is an ongoing debate whether infected cells persist due to ongoing HIV replication and cell-to-cell spread within a small minority of infected cells [4, 5] or due to HIV integration within human chromosomal DNA in memory CD4+ T cells in the absence of HIV replication [3, 6, 7]. If the latter mechanism predominates, then prolonged cellular lifespan, frequent cellular proliferation or a combination of both, must be responsible for sustaining the number of infected cells at relatively stable levels over decades of ART.
The mechanisms sustaining HIV infection must be understood to optimize ongoing HIV cure strategies. Persistent viral replication implies the presence of a micro-anatomic drug sanctuary where antiviral levels are inadequate and would suggest the need for better ART delivery technologies to achieve HIV cure [8]. Alternatively, if HIV persists in a non-replicative state within a latent reservoir of memory CD4+ T cells, then the natural survival mechanisms of these cells would serve as ideal therapeutic targets. Infected cell longevity might be addressed by reactivating the lytic HIV replication cycle within these dormant cells [9], thus leading to their premature demise. Anti-proliferative therapies have been tested in HIV-infected patients [10,11], and may be useful if reservoir cells frequently undergo homeostatic or antigen driven proliferation [12].
The validity of these competing hypotheses hinges on studies of viral evolution. Due to the high mutation rate of HIV reverse transcriptase and high viral population size during primary and chronic infection [13], ongoing HIV replication in the absence of ART can be inferred when the molecular evolution rate exceeds zero [14, 15]. New strains accrue new synonymous mutations in keeping with the error prone viral replication enzyme, reverse transcriptase, and non-synonymous mutations due to ongoing natural selection in the context of a mounting cytolytic or antibody-mediated immune response [16]. A recent study documented HIV evolution during months 0–6 of ART in three patients, finding a molecular evolution rate equivalent to pre-ART time points. New viral mutations were noted across multiple anatomic compartments implying widespread circulation of replicating strains [4], and suggesting the possibility of a drug sanctuary.
Alternatively, in multiple other studies, viral evolution was not observed by sampling multiple anatomic compartments in participants who had been on ART for at least one year [17–20]. Sampled HIV DNA was instead often linked to distant historical time points on inferred phylogenetic trees [21–23]. That observation implied the existence of latently infected cells as the dominant mechanism of HIV persistence [3, 6, 7, 19, 20], though low-level intermittent viral replication cannot be completely excluded.
Furthermore, large clonal expansions of identical strains are commonly observed and accrue over time in consistently treated patients, demonstrating that cellular proliferation contributes to the generation of new infected cells [4, 12, 19, 24–26]. Multiple identical HIV DNA sequences have been observed in blood, gut-associated lymphoid tissue, and lymph nodes, represent up to 60% of observed sequences, and are evident even during the first several months of ART [19, 25, 26]. The majority of reservoir phylodynamic studies have relied on sequencing viral genes including env, gag and pol, an approach that overestimates clonality because mutations and signs of positive selection might go unobserved without whole-genome sequencing [14, 27]. Others have used a less-biased approach via integration site analysis, finding multiple identical HIV integration sites within human chromosomal DNA for multiple strains [28–32]. Given the extremely small probability that two infection events would result in HIV integration to the same site within the human chromosome, these studies provide definitive evidence that HIV reservoir cells undergo proliferation.
In summary, these previous findings suggest that the HIV reservoir may be maintained by proliferation with some continued contribution from viral replication early during ART. However, these conclusions are limited by the fact that current study protocols only allow sampling of a tiny fraction of the HIV-1 latent reservoir. Here, by using mathematical tools from ecology, we identify that the total number of viruses in the HIV reservoir far exceeds the total number of unique viral sequences. After a year of ART, we estimate that >99% of infected cells contain expansions of the same identical sequence. To reconcile observed viral evolution during early ART with clonal predominance after one year of treatment, we developed a mechanistic mathematical model. The model includes a drug sanctuary which is small enough to preclude continuous detection of HIV in blood during ART, but large enough to sustain cell-to-cell transmission. Our model includes the fundamental observation that activated CD4+ T cells, the critical targets for replicating HIV, decay slowly during ART [33, 34]. This decay is slower than concurrent decline in HIV RNA [35–37], and abnormal T cell activation persists more than a year [34]. Finally, we incorporate experimentally derived proliferation rates of cell types within the HIV reservoir including central memory (Tcm), effector memory (Tem) and naïve (Tn) CD4+ T cells [38, 39]. Our results demonstrate that a sanctuary could sustain a small pool of infected cells early during ART. However, within a year, new infected cells arise almost entirely from proliferation [21, 25]. Our framework explains why observations of evolution in the first 6 months of ART are susceptible to a lag effect. Incompletely sampled HIV sequences may give the appearance of ongoing replication, but likely actually represent a fossil record of archived historical infection events.
Our ecological inference uncovers the fossil signal, and our models demonstrate the proliferative process that sustains the reservoir.
Results
The observed frequency of clonal sequences increases with time on ART
HIV-1 establishes latency within memory CD4+ T cells by integrating a copy of its DNA into human chromosomal DNA within each infected cell [40]. Wagner et al. sampled HIV DNA in three participants at three successive time points from years 1–12 following ART initiation to examine whether cellular proliferation contributes to persistence of the HIV reservoir: 10–60% of infected cell samples appeared to be “identical sequence clones” [29], meaning that HIV-1 DNA was identified in precisely the same human chromosomal integration site in at least two infected cells, while also containing identical env sequences. The presence of identical chromosomal integration sites in multiple cells shows that new copies of HIV DNA were in all likelihood generated via cellular mitosis, rather than via viral infection of new cells and error prone reverse transcription.
To examine the underlying structure of Wagner et al.’s existing sequence (integration site) data, we organized each unique sequence into rank order abundance curves (Fig 1A). Using the rank order distribution curves, we note that certain sequences predominate: the most dominant sequence clone accounted for 3–15% (mean=10%) of total observed sequences. While identical sequence expansions accounted for the top 2–9 ranks across the 9 sampled time points, 60–96% (mean=73%) of viruses were identified in a specific chromosomal integration site in only a single cell; we define these sequences as “observed singletons”. The percentage of observed singleton sequences decreased slightly over time in the three study participants (Fig 1B) [29].
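As an illustration of how a rank order abundance curve and the observed singleton fraction can be computed from integration site observations, the following Python sketch may be helpful. The integration site labels, the sample, and the function names are hypothetical and chosen for illustration only; they are not taken from the study data or its code repository.

```python
from collections import Counter

def rank_abundance(site_ids):
    """Return observed clone sizes sorted from most to least abundant."""
    counts = Counter(site_ids)
    return sorted(counts.values(), reverse=True)

def singleton_fraction(site_ids):
    """Fraction of sampled cells whose integration site was seen exactly once."""
    abundances = rank_abundance(site_ids)
    n_singletons = sum(1 for n in abundances if n == 1)
    return n_singletons / len(site_ids)

# Hypothetical sample: 10 infected cells, each labelled by its integration site.
sample = ["chr1:100", "chr1:100", "chr1:100", "chr7:55", "chr7:55",
          "chr2:9", "chr3:4", "chr5:8", "chrX:1", "chr9:2"]

print(rank_abundance(sample))      # [3, 2, 1, 1, 1, 1, 1]
print(singleton_fraction(sample))  # 0.5
```

In this toy sample, one clone of three cells and one clone of two cells occupy the top two ranks, and half of the sampled cells are observed singletons.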
The total number of viruses in the HIV reservoir far exceeds the total number of truly unique chromosomal integration sites and sequences
Because fewer than 100 of the ∼10^6 infected cells are sampled in human HIV reservoir studies, it is possible that observed singleton sequences are in fact not singletons, but rather members of larger identical sequence clones. Only sampling of the entire reservoir would provide an empiric value for the proportion of viral sequences that are true singletons (true population size = 1) versus identical sequence clones (true population size ≥ 2).
To address the limited sample size of HIV infected cells during ART, we generated rarefaction curves to capture the relationship between sample size and observed number of unique HIV DNA sequences: these curves demonstrate a monotonic, but nonlinear increase in the total number of observed unique sequences with increased number of samples (Fig 2A). The decelerating increase in observed unique sequences occurs due to repeated detection of identical sequence clones.
Next, we used the Chao1 estimator, a non-parametric ecologic tool that utilizes the relative abundance of observed singleton and doubleton sequences (Fig 1A), as well as the total number of observed unique sequences [41], to estimate the actual number of unique sequences (“sequence richness”) in the reservoir. The sequence richness is also the asymptote of the sampling rarefaction curve, which would be reached at a higher number of samples, well beyond the observed data. Current sampling schemes remain on the steep, initial portion of the rarefaction curve (Fig 2A).
Theoretical values for richness range from 1, if all sequences (integration sites) in the reservoir are identical and originate from a single proliferative cell, to ∼10^6 (the total size of the reservoir, if all sequences in the reservoir are unique due to error-prone viral replication). In our samples, the estimated minimum number of unique reservoir sequences varied from 400–6,000 at one year of ART in the three study participants and contracted slightly as a function of time on ART (Fig 2B).
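The richness estimate can be sketched in a few lines of Python. This is a minimal illustration of the standard bias-corrected Chao1 estimator, not the study's own code; the function name and the toy abundance list are assumptions made for the example.

```python
def chao1(abundances):
    """Bias-corrected Chao1 richness estimate from observed clone sizes.

    abundances: list with one entry per observed unique sequence,
    giving the number of sampled cells that carried it.
    """
    r_obs = len(abundances)                      # observed richness
    f1 = sum(1 for n in abundances if n == 1)    # observed singletons
    f2 = sum(1 for n in abundances if n == 2)    # observed doubletons
    return r_obs + f1 * (f1 - 1) / (2 * (f2 + 1))

# Hypothetical sample: 7 unique sequences among 10 sampled cells.
observed = [3, 2, 1, 1, 1, 1, 1]
print(chao1(observed))  # 7 + 5*4 / (2*2) = 12.0
```

Because the estimator depends only on the observed richness and on the singleton and doubleton counts, it yields a lower bound on richness without requiring the total reservoir size.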
The vast majority of the HIV reservoir is clonal after a year of ART
The ratio of unique to total viral sequences represents an upper limit on the percentage of cells within the entire reservoir containing true singleton HIV sequences and integration sites: estimates for this ratio varied between 0.0003–0.008 in the nine samples (Fig 2C), suggesting that at least 99% of HIV infected cells originated via proliferation and not infection within a year of ART. For this analysis, we assumed contraction of the HIV reservoir with a 44-month half-life [2, 3]; because estimated richness contracted only slightly over the same period, the ratio of unique to total sequences increased slightly with time on ART (Fig 2C).
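The arithmetic behind this upper limit can be illustrated with a short Python sketch. It assumes, purely for illustration, a reservoir of 10^6 infected cells at ART initiation that decays exponentially with a 44-month half-life; the function names and the choice of starting time point are assumptions, not details taken from the study.

```python
def reservoir_size(months_on_art, size_at_start=1e6, half_life_months=44.0):
    """Assumed exponential contraction of the reservoir during ART."""
    return size_at_start * 0.5 ** (months_on_art / half_life_months)

def max_true_singleton_fraction(richness, months_on_art):
    """Unique-to-total ratio: an upper limit on the true singleton fraction."""
    return richness / reservoir_size(months_on_art)

# Richness bounds reported at one year of ART (400 to 6,000 unique sequences).
for richness in (400, 6000):
    ratio = max_true_singleton_fraction(richness, months_on_art=12)
    print(richness, round(ratio, 5), "clonal fraction >=", round(1 - ratio, 4))
```

Under these assumptions both ratios fall below 0.01, consistent with the statement that at least 99% of infected cells belong to identical sequence clones.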
Next, we estimated the proportion of the HIV reservoir consisting of identical sequence clones by modeling 300 possible rank abundance profiles representing the entire experimentally unobserved reservoir and testing these for fit to data. To vary the morphology of these curves, we used exponential, linear or power law rank order models (in Fig 3&S1). Each of these models can be summarized with a slope parameter and the abundance of the highest ranked sequence clone. We assumed that all true singleton sequences fall outside of these distributions. Therefore, the proportion of true singleton sequences represents a third fitting parameter in our simulated distributions.
We performed in silico experiments, drawing the number of samples (40–80) from our simulated reservoirs to compare with each sample collected from the three participants in Wagner et al. Then we assessed the fit of our simulated experiments to rank order abundance curves from the empiric data (Fig 1A). Rank order models were also assessed for their ability to recapitulate true HIV reservoir volume on ART (assuming a log-normal distribution with mean 10^6 infected cells and variance 1 log), as well as plausible ranges for sequence richness (Fig 2B).
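One such in silico experiment might be sketched as follows. This is a simplified illustration under stated assumptions, not the study's implementation: only the exponential rank order model is shown, cells are drawn with replacement (reasonable when the sample is far smaller than the reservoir), and all parameter values, names, and the random seed are hypothetical.

```python
import math
import random
from collections import Counter

def exponential_clone_sizes(top_clone_size, slope, n_clones):
    """Clone size decays exponentially with rank; every clone has >= 2 cells."""
    return [max(2, round(top_clone_size * math.exp(-slope * rank)))
            for rank in range(n_clones)]

def sample_reservoir(clone_sizes, n_true_singletons, n_samples, rng):
    """Draw cells weighted by clone size and return the observed rank abundance.

    True singletons are appended as extra entries of size 1, so their share
    of the reservoir acts as a third model parameter.
    """
    weights = clone_sizes + [1] * n_true_singletons
    drawn = rng.choices(range(len(weights)), weights=weights, k=n_samples)
    return sorted(Counter(drawn).values(), reverse=True)

# Hypothetical reservoir of roughly 10^6 cells spread over 1,000 clones.
sizes = exponential_clone_sizes(top_clone_size=10_000, slope=0.01, n_clones=1000)
observed = sample_reservoir(sizes, n_true_singletons=1000, n_samples=60,
                            rng=random.Random(0))
print(sum(sizes), observed)
```

The resulting observed rank abundance can then be compared with an empiric curve, for example by a sum of squared differences at each rank, and the comparison repeated across many candidate slopes, top clone sizes, and singleton fractions.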
An example of exponential, linear and power law distribution fit to the rank order distributions is demonstrated in Fig 3A. Exponential and linear models provided superior fit relative to power law models, which only fit the observed rank order distributions when unrealistically small reservoir volumes were assumed (Fig 3B&C). For the exponential and linear models, a narrow range of plausible rank order slopes allowed optimal fit to the rank order abundance data; regardless of rank order distribution selection, optimal fit occurred when fewer than 1% of sequences were assumed to be true singletons (Fig 3B).
Extrapolated, best-fitting distributions allow us to approximate the rank order distribution of the entire HIV reservoir (Fig 3C&S2). The exponential and linear models both predict a relatively slow decrease in clone size with increasing rank. In keeping with our estimates from the Chao1 estimator (Fig 2B), the estimated overall sequence richness ranged from 300 to 5,000 across the nine samples. The largest identical sequence clones comprised a large portion of the reservoir for those models: the highest ranked 10 identical sequence clones contained >10^4 cells each, while the highest ranked 100 clones typically had >10^3 cells (Fig 3C&S2). Therefore, observed singleton sequences in experimental data (Fig 1) are likely to be members of very large identical sequence clones. The first 10% of ranked sequences accounted for >90% of reservoir volume at each of the nine time points. Despite small sample size, observed experimental sequences make up >99.9% of the reservoir at each time point, highlighting the importance of a small number of extremely large sequence clones in maintaining the reservoir. Accordingly, simulated sampling rarefaction curves decelerate quickly, suggesting that detection of new clones after 1,000 samples would occur relatively infrequently (Fig 3D).
The estimated proportion of true singleton sequences (population size=1) was extremely low in each of the three study participants, and remained stable over time on ART (assuming concurrent contraction in reservoir volume) (Fig 3E). While this analysis does not mechanistically discriminate whether true singleton sequences are members of a dying proliferative clone or represent ongoing low-level streams of HIV replication and infection, it strongly suggests that nearly all cells comprising the HIV reservoir originated via proliferation rather than infection within a year of ART initiation.
Our mathematical model includes both sanctuary and reservoir properties
To reconcile evidence of ongoing viral evolution during months 0–6 of ART [4] with the extremely high proportions of identical sequence clones after one year of ART estimated above, we developed a viral dynamic model. Our model (Fig 4A) consists of differential equations that are described in detail in the Methods. Briefly, we classify rapid death δ1 and viral production within actively infected cells I1. Cells with longer half-life I2 are activated to I1 at rate ξ2. I2 may represent pre-integration infected memory CD4+ T cells that proliferate and die at rates α2, δ2 akin to effector memory cells (Tem) [39, 42], but their precise biology is tangential to our model. The state I3(j) represents latently infected reservoir cells of phenotype j, which contain a single HIV DNA provirus integrated into the human chromosome [40]. The probabilities of a newly infected cell entering I1, I2, I3(j) are τ1, τ2, τ3(j). Because we are focused on the role of proliferating reservoir cells, we assume sub-populations of I3 [12], including Tem, central memory CD4+ T cells (Tcm), and naïve CD4+ T cells (Tn), which proliferate and die at different rates depending on phenotype α3(j), δ3(j) obtained from the literature [12, 39]. I3 reactivates to I1 at rate ξ3 [43]. The majority of the model parameters are derived from the literature (Table 1).
ART potency ϵ ∈ [0,1] characterizes the decrease in viral infectivity β due to treatment [35]. Thus, the basic reproductive number becomes R0(1 – ϵ) on ART, which drops below 1 when ϵ > 0.95, meaning that each cell infects fewer than one other cell and consequently viral load declines until reaching undetectable levels. In this setting, only short stochastic chains of infection typically occur.
To allow viral evolution despite ART, we allowed the possibility of a drug sanctuary (IS) with a different reproductive number R0(1 – ϵS)∼8 when ϵS = 0. In the sanctuary, target cell limitation or a local immune response must result in a viral setpoint; otherwise, infected cells and viral load would grow infinitely. The amount of virus produced by the sanctuary VS is, by definition, extremely low relative to non-sanctuary regions because ART initiation consistently results in a >5 order of magnitude decrease in viral load [36]. We assume uniform mixing of V1 and VS in blood and lymph node homogenates in terms of sampling as well as infectivity to new cells [4]. Based on observations in treated patients, target cell availability in the sanctuary is assumed to decrease on ART at rate ζ [33, 34], decreasing the sanctuary reproductive number as R0(1 – ϵS)exp(−ζt).
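The reproductive number expressions above can be explored with a small Python sketch. The function names and the value of ζ used in the example are illustrative assumptions; only R0 ∼ 8 and the functional form R0(1 – ϵ)exp(−ζt) come from the text.

```python
import math

def reproductive_number(r0, efficacy, zeta=0.0, t_days=0.0):
    """R0 * (1 - efficacy) * exp(-zeta * t): reproductive number on ART."""
    return r0 * (1.0 - efficacy) * math.exp(-zeta * t_days)

def days_until_subcritical(r0, efficacy, zeta):
    """Days on ART until the sanctuary reproductive number falls to 1."""
    r_start = r0 * (1.0 - efficacy)
    if r_start <= 1.0:
        return 0.0  # already below the threshold at ART initiation
    return math.log(r_start) / zeta

# Outside the sanctuary: potent ART pushes the reproductive number below 1.
print(reproductive_number(r0=8, efficacy=0.95))

# Inside a sanctuary with no drug (efficacy 0) and an assumed zeta of 0.01/day.
print(days_until_subcritical(r0=8, efficacy=0.0, zeta=0.01))  # about 208 days
```

The second quantity is ln(R0(1 – ϵS))/ζ, so a slower loss of target cells (smaller ζ) proportionally lengthens the period during which a sanctuary could sustain chains of infection.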
The model accurately simulates viral dynamics during ART
Our first goal was to recapitulate average three-phase viral clearance dynamics during ART rather than discriminate subtle viral clearance rate differences between treated patients. Therefore, we fit to ultra-sensitive viral load measurements collected from multiple treated participants by Palmer et al. [36]. We included experimentally derived values for most parameter values in the model (Table 1), solving only for ξ2 and ξ3 by fitting to viral load. Simulations reproduce three phases of viral clearance (Fig 4B) and predict trajectories of infected cell compartments (Fig 4C). While a minority of cells infected pre-ART are reservoir cells (I3), the transition to I3 predominance is predicted to occur relatively early during ART. If a relatively large sanctuary is assumed with 0.1% of virus being produced in a sanctuary pre-therapy (which would allow persistent detection of sanctuary derived viral RNA in blood), then between 3–6 months, IS approximates I1 that originate from I2 and I3 through reactivation (Fig 4C). Sanctuary-derived HIV RNA also approaches reservoir-derived HIV RNA 6 months after ART before contracting to undetectable levels (Fig 4B). A small, contracting drug sanctuary provides one possible explanation for why diverging HIV DNA samples may be observed during the first 6 months of ART but not at 12 months [4].
Cellular proliferation sustains HIV infection during ART
We simulated the model under several sanctuary and reservoir conditions to assess the relative contributions of infection and cellular proliferation in sustaining HIV infected cells. We considered different reservoir compositions (Fig 5A) based on empiric evidence that effector memory (Tem), central memory (Tcm) and naïve (Tn) cells proliferate at different rates and that distributions of these cells in the HIV reservoir differ substantially among infected patients [12, 38, 39]. Because a drug sanctuary has not been observed, its true volume is unknown and may vary across infected persons. Thus, we also varied the potential size of a sanctuary relative to reservoir cells (Fig 5B).
Under all possible parameter regimes, the contribution of sanctuary viruses to generation of new infected cells was negligible after one year of ART. Regardless of assumed pre-treatment reservoir composition and sanctuary size, the model predicts that percentage of new infected cells generated from viral replication rather than cellular proliferation decreases during ART. This transition is more protracted with a higher ratio of drug sanctuary to reservoir volume but is not dependent on reservoir CD4+ T cell phenotypic distribution (Fig 5C).
ART immediately eliminated generation of reservoir cells (I3) by HIV replication and cell-to-cell transmission (Fig 5D). Within days of ART initiation, proliferation was the critical generator of long-lived, HIV-infected cells even if we assumed an unrealistically large drug sanctuary (Fig 5, right column). This finding captures both the rarity of infection events on ART even when a sanctuary is assumed, as well as the relatively frequent turnover of CD4+ T cells in HIV infected persons. It also argues against longevity of latently infected cells as a driver of reservoir persistence.
Observed HIV DNA sequence evolution during early ART represents a fossil record of prior replication events rather than contemporaneous infection
Two factors may lead to biased underestimation of infected cell proliferation. First, we demonstrated that sample size limitations may lead to overestimation of singleton sequence prevalence during ART (Fig 1B) and that nearly all observed singleton sequences in persons on fully suppressive ART for more than a year are likely to be clonal (Fig 3C&D). Second, due to the several-month lifespan of memory CD4+ T cells, observed HIV DNA-containing cells with sequence divergence may reflect historical rather than contemporaneous HIV infection events, particularly during the first 6 months of ART. In the case of a small-to-moderate drug sanctuary, the observed percentage of cells generated from infection rather than proliferation at 6 months of ART (10–25% in Fig 5E, left and middle columns) may be higher than the actual ongoing percentage of cells generated from infection, which approaches zero (Fig 5C, left and middle columns). Under this circumstance, single sequences at the distal end of the phylogenetic tree may be misinterpreted as new infection, when in fact these cells were generated via a relatively recent proliferation event or an infection event several months beforehand (Fig 6). After 12 months of ART, the observed and ongoing percentage of cells generated from infection equilibrate near zero for all parameter sets which is in keeping with our estimation of >99% identical clonal sequences (Fig 3). Critically, the lag between observed and actual infected cell generation is a central, emergent feature of the model, whether a drug sanctuary is assumed or not.
If the drug sanctuary is assumed to be extremely large in model simulations (Fig 5C-E, right column), then observed sequence diversity might temporarily underestimate the true percentage of infected cells being generated via HIV replication within a sanctuary during the first several months of treatment (Fig 5E, right column). However, such a large drug sanctuary is less realistic as it would result in continually detected viremia. A higher fraction of slowly proliferating Tn cells would also contribute slightly to temporary overestimation of the ratio of replication to proliferation (Fig 5E).
Different factors drive observed and ongoing reservoir dynamics during early ART
We used the model to perform sensitivity analyses to identify parameters of infection that predict a more protracted conversion from viral replication to proliferation as a source of new infected cells. Under a wide range of plausible parameter assumptions, new infected cells arose almost exclusively via proliferation and not HIV replication after a year of chronic ART (Fig 7A). Only one variable, clearance rate of activated CD4+ T cell target cells ζ in a drug sanctuary during ART had an important, persistent impact on the generation of new infected cells: an extremely slow clearance rate of activated cells from a drug sanctuary (0.002/day) allowed greater than 1% of new infected cells to be generated from HIV infection (rather than proliferation) after more than one year of ART (Fig 7A). If this clearance was slower than the latent cell clearance (|ζ| < |θL|), ultimately the sanctuary would overtake the reservoir, but these values are far out of the range expected by experiments, and moreover would occur at a stage with such small cell numbers in both compartments (I3∼IS∼0) that stochastic burnout would become important. Rapid disappearance of HIV replication as a source of new infection was predicted regardless of reservoir volume, sanctuary volume, and relative fraction of Tem, Tcm, and Tn.
At time points beyond 6 months of ART, the percentage of observed sequences which arose from replication usually exceeded the percentage of actual infected cells arising from replication (Fig 7A). The only exception to this trend was again when a very low rate of activated CD4+ T cell target cell depletion was assumed (Fig 7A). The percentage of observed sequences arising from replication rather than proliferation did not differ according to clearance rate of activated CD4+ T cell target cells during ART. Rather, a higher percentage of slowly proliferating reservoir cells (Tn), predicted a higher proportion of observed sequences that appear to have arisen from infection (Fig 7A). Therefore, the drivers of observed infected cell origin and actual ongoing infected cell origin differed completely after one year of ART.
We simulated 1,000 possible patients with a multi-parameter sensitivity analysis in which all parameter values were randomly selected. These results again showed a rapid transition to proliferation as the source of new (de novo) infected cells during year one of ART and a generally higher proportion of observed versus de novo infection events due to HIV replication (Fig 7B). The same parameters correlated with a higher percentage of contemporaneous and observed infection (Fig 7C) as in the single parameter sensitivity analysis.
Discussion
To eliminate remaining HIV infected cells during prolonged ART, it is necessary to understand the mechanisms by which they persist. We demonstrated that ongoing HIV replication and cell-to-cell transmission is not likely to play a meaningful role in maintaining the reservoir of HIV infected cells at a stable level, particularly after 6 months of ART. These results suggest that current ART delivery and activity is adequate for long-term suppression of viral replication, even if a small drug sanctuary exists during the first few months of therapy. Strategies that enhance delivery of antiviral small molecules to anatomic drug sanctuaries could decrease viral divergence early during therapy, but will not decrease the volume of the reservoir and are not likely to be an essential component of an HIV cure strategy.
Our results also suggest that the HIV reservoir does not persist merely due to the longevity of latently infected memory CD4+ T cells. Rather, when we estimate the richness of HIV sequences in the entire reservoir, we identify that within a year of ART initiation, >99% of HIV infected cells contain identical sequence clones. This trend is then recapitulated in our mathematical model, regardless of assumed reservoir composition, heterogeneity of proliferation rates within each subset of CD4+ T cells, or ART sanctuary size. Therefore, proliferation of reservoir cells with replication-competent HIV represents a major barrier to HIV cure [44], but perhaps also a therapeutic target for achieving reservoir reduction [10, 11]. Several licensed drugs, including mycophenolate mofetil and azathioprine, specifically target the proliferation of lymphocytes and may be useful for depleting the HIV reservoir.
Several other results emerge from our model. Using ecologic techniques, we demonstrate that observed estimates of sequence clone frequency are likely to be substantial underestimates. Current clinical studies only sample the “tip of the reservoir iceberg.” Most, if not all observed singleton sequences are likely to be members of large sequence clones. Thousands of contemporaneously sampled cells would be needed to observe all unique reservoir sequences. While this level of sampling might only be possible as part of an autopsy study, our analysis suggests that 1000 sampled cells would be adequate to capture a high proportion of reservoir sequence diversity.
Our modeling also describes the concept of a fossil record of infection which emerges early during ART. Singleton sequences that appear to be diverging from the transmitting/founder virus and exist on the most distal branches of phylogenetic trees, may represent a record of prior infection events rather than ongoing infection events. The fossil record is predicted to be a transient effect: within a year of fully effective ART, observed phylogenetic data is likely to be more reflective of contemporaneous events that generate new infected cells. Importantly, at these time points, little viral evolution is observed [17–23, 25, 26, 28, 29]. Regardless of the inclusion (or not) of the sanctuary into the model, the fossil record phenomenon exists, making more difficult the interpretability of evolutionary signals derived from phylogenetic trees early during ART.
Adequate levels of antiretroviral agents eliminate HIV replication [45, 46]. Therefore, a drug sanctuary can exist only in the context of inadequate delivery to tissue micro-environments. Our model cannot discriminate whether a small drug sanctuary is a real phenomenon during early ART. First, we are unable to disentangle whether the small number of singleton integrated proviruses with truly unique sequences and integration sites are smoldering remnants of previous clonal expansions or ongoing infection events. Second, the existence of a fossil record raises the possibility that no new infections are needed to observe ongoing viral evolution several months into ART. Nevertheless, our model at least demonstrates that a sanctuary must represent only a very small proportion of infected cells after 6 months of ART to account for the lack of persistently detected viremia. Moreover, our sensitivity analysis shows that even an extremely low estimate for clearance of activated CD4+ T cells will eventually result in elimination of a clinically meaningful sanctuary. In the absence of ART, HIV fuels ongoing infection by drawing more target cells to sites of viral replication, leading to a more rapid cycle of CD4+ T cell turnover [33]. Our model suggests that if ART decreases this process, even at a considerably slower rate than has been demonstrated empirically [33, 34], then the role of the sanctuary becomes trivial after less than a year of ART. Nevertheless, because chronic low-grade inflammation does persist on ART in many patients, it is possible that this mechanism allows a small but inconsequential drug sanctuary to persist.
A final conclusion of our model is that observed sequence data are likely to be misleading in ascribing the origin of infected cells, particularly during the first months of ART. In fact, the major variable that predicts a higher relative contribution of observed ongoing sequence evolution due to new infection (a larger proportion of slowly proliferating naïve CD4+ T cells in the reservoir) is not the same variable that predicts actual proportion of ongoing infection that sustains the reservoir (a slower decrease in availability of CD4+ T cell targets). Rather than cross-sectional observations of the reservoir, the most efficient way to probe reservoir dynamics is likely to be in the context of an effective therapy that perturbs reservoir volume and composition in ways that can be tested and validated with mathematical modeling. More importantly, assessments of reservoir dynamics are likely to be most reliable if performed after one year of ART.
In summary, we synthesize diverse observations from early and late time points during ART and conclude that a drug sanctuary may exist during early ART, but its importance in sustaining HIV diminishes due to a slow decrease in activated CD4+ T cells after one year on ART, and the high frequency of reservoir cell proliferation. Enhancing ART delivery would not decrease HIV reservoir volume. Rather, eradication strategies should focus on limiting proliferation of latently infected cells as nearly all reservoir cells arise from this mechanism after the first year of ART.
Methods
Computational code for all calculations and simulations can be found at https://github.com/dbrvs/reservoir_persistence
Rank abundance of HIV sequences and interpolation of species rarefaction curves
We use the rank abundance framework to analyze integration site data from Wagner et al. [29]. In Fig 1, integration site data have been placed in a rank-ordered histogram such that the sequences are arranged from most to least commonly observed. Using the data, we developed rarefaction curves to estimate the expected number of species that would be observed given a certain number of samples from this sample set. Calling the total number of samples $N$, the number of samples found to have sequence $s$ $N_s$, and the total number of distinct sequences (the richness) $R$, the expected number of unique sequences observed after $k$ samples is:
$$E[R_k] = R - \binom{N}{k}^{-1} \sum_{s=1}^{R} \binom{N - N_s}{k}.$$
This equation is used to interpolate the rarefaction curves in Fig 2A. Later, in Fig 3D, we extrapolate rarefaction curves using the complete rank abundance curves with the total size of the HIV reservoir $N = L$. Because the number of samples is much smaller than the size of the reservoir, we can approximate the combinatoric factors as
$$\binom{L - N_s}{k} \Big/ \binom{L}{k} \approx \left(1 - \frac{N_s}{L}\right)^{k},$$
such that, again with the total richness $R$, we have
$$E[R_k] \approx R - \sum_{s=1}^{R} \left(1 - \frac{N_s}{L}\right)^{k},$$
which avoids computation of large factorials.
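The two rarefaction calculations described above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' code: it assumes the classical rarefaction formula (expected richness when drawing k samples without replacement) and the small-sample approximation in which each combinatoric ratio is replaced by a power. The function names and the toy abundance list are hypothetical.

```python
from math import comb

def rarefaction_exact(counts, k):
    """Expected number of distinct sequences among k draws without replacement.

    counts[s] is the number of samples carrying sequence s (N_s in the text).
    """
    n_total = sum(counts)            # N, the total number of samples
    richness = len(counts)           # R, the number of distinct sequences
    denom = comb(n_total, k)
    # comb(n, k) is 0 when k > n, so very abundant sequences drop out naturally
    expected_missing = sum(comb(n_total - n_s, k) for n_s in counts) / denom
    return richness - expected_missing

def rarefaction_approx(counts, k, n_total=None):
    """Approximation for k much smaller than the population size.

    Each ratio C(N - N_s, k) / C(N, k) is replaced by (1 - N_s / N) ** k,
    which avoids large factorials. n_total may be set to the reservoir size L.
    """
    n_total = sum(counts) if n_total is None else n_total
    return len(counts) - sum((1 - n_s / n_total) ** k for n_s in counts)

# Toy rank-abundance data (hypothetical): one large clone, one small, two singletons
toy_counts = [5, 3, 1, 1]
curve = [rarefaction_exact(toy_counts, k) for k in range(sum(toy_counts) + 1)]
```

The exact form is suitable for interpolation within the observed sample, while the approximate form is the one that stays tractable when the population is on the order of the whole reservoir.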
Ecological analysis of integration site data
We employed common ecological estimators to infer the estimated richness, or number of distinct HIV sequences expected to be in the entire reservoir. The method, using the “Chao estimator” [41], notably does not require information about the entire size of the reservoir, taking its power from the ratio of the numbers of observed singletons $f_1$ and doubletons $f_2$, that is, the numbers of sequences measured exactly once or twice [47]. The bias-corrected Chao1 estimator for the richness $R_{\mathrm{est}}$ is expressed in terms of the observed richness $R_{\mathrm{obs}}$, which is equivalent to the maximum rank, as:
$$R_{\mathrm{est}} = R_{\mathrm{obs}} + \frac{f_1 (f_1 - 1)}{2 (f_2 + 1)}.$$
It is important to note that these estimates provide lower bounds on richness because arbitrarily many rare species may remain unsampled. The 95% confidence intervals are calculated by defining the ratio of singleton to doubleton sequences as $f_r = f_1/f_2$, and calculating the variance [48]:
$$\mathrm{var}(R_{\mathrm{est}}) = f_2 \left[ \frac{f_r^2}{2} + f_r^3 + \frac{f_r^4}{4} \right].$$
When f2 = 0, we assume that the variance is equivalent to the mean, providing the large error bounds for patient R1 time point 1 as seen in Fig 2B. In Fig 2C, we use a model for the exponential decay of the latent reservoir to show that the number of unique sequences likely is far fewer than the total population size, indicating clonality.
The model is the decoupled reservoir decay $L = L_0 \exp(\theta_3 t)$, where the net clearance rate $\theta_3$ is taken to correspond to a 44-month half-life [2, 3].
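The richness estimate described in this subsection can be illustrated with a short Python sketch. It assumes the standard bias-corrected Chao1 form and the standard Chao variance written in terms of the singleton-to-doubleton ratio, and it follows the text's rule of setting the variance equal to the estimate when no doubletons are observed. The function name and the example abundances are hypothetical.

```python
def chao1(abundances):
    """Bias-corrected Chao1 richness estimate and its variance.

    abundances: observed count for each distinct sequence (one entry per sequence).
    Returns (r_est, variance).
    """
    r_obs = len(abundances)                       # observed richness
    f1 = sum(1 for a in abundances if a == 1)     # singletons
    f2 = sum(1 for a in abundances if a == 2)     # doubletons
    r_est = r_obs + f1 * (f1 - 1) / (2 * (f2 + 1))
    if f2 > 0:
        fr = f1 / f2
        variance = f2 * (fr ** 2 / 2 + fr ** 3 + fr ** 4 / 4)
    else:
        # Rule stated in the text: with no doubletons, take variance equal to the mean
        variance = r_est
    return r_est, variance

# Hypothetical sample: two expanded clones, two doubletons, three singletons
estimate, variance = chao1([10, 4, 2, 2, 1, 1, 1])
```

Because only the singleton and doubleton counts enter the correction term, the estimate can be computed without knowing the total reservoir size, which is the property the text emphasizes.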
Parametric models to extrapolate the reservoir rank abundance curves
We developed models for the abundance $a$ of the reservoir using piecewise-defined distributions on the rank $r$ such that some part of the reservoir followed a monotonically decaying smooth function $f(r)$, while also leaving the possibility that many sequences were unique. Defining the abundance $a$ of each sequence ranked as $r$, we have:
$$a(r) = \begin{cases} f(r), & r \le r_{\max} \\ 1, & r_{\max} < r \le r_{\max} + u_{\max} \end{cases}$$
where we chose the functions $f(r) = a_0 r^{-\alpha}$ for the power law model, $f(r) = a_0 e^{-\beta (r-1)}$ for the exponential model, and $f(r) = a_0 + m(r - 1)$ for the linear model. Each model is parameterized by a single shape parameter ($\alpha$, $\beta$, or $m$), the initial value $a_0$, and the number of unique sequences $u_{\max}$. The value $r_{\max}$ is calculated by solving $f(r) = 1$. In Fig 3B, we explored the parameter range for the models and scored each model using a likelihood approach.
We scored each model on three equally weighted metrics: agreement with the expected reservoir size of ∼1 million cells [6, 7]; agreement with the expected richness derived from the Chao1 estimator; and similarity between the observed rank-abundance curve and simulated draws from the model distribution given the same number of samples as the observed data. Our fitting metric is the combined log-likelihood $\log(\mathcal{L}) = \log(\mathcal{L}_L \mathcal{L}_{\mathrm{Chao}} \mathcal{L}_{\mathrm{dist}})$, where each separate likelihood assumes a normal distribution $\mathcal{N}$. That is, we calculate the probability of finding $L$ given a log-normal distribution with mean 1 million and variance 1 log, such that $\mathcal{L}_L = \mathcal{N}(\log_{10}(L) - 6, 1)$. The likelihood of the richness is $\mathcal{L}_{\mathrm{Chao}} = \mathcal{N}(R - R_{\mathrm{est}}, \sigma_R)$, with $R_{\mathrm{est}}$ and $\sigma_R$ calculated for each individual and each time point as above. The likelihood of the simulated distribution $a_r$ is calculated using the observed rank-abundance distributions $d_r$ as $\mathcal{L}_{\mathrm{dist}} = \prod_r \mathcal{N}(a_r - d_r, 1)$. The best-fit distributions were plotted, and the true singleton frequency was calculated as $a_1/\sum_r a_r$.
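As a rough illustration of the piecewise rank-abundance models described in this subsection, the sketch below builds the power law variant: a decaying clonal part followed by a tail of unique sequences. It is not the authors' implementation. Truncating the solution of f(r) = 1 to an integer rank is an assumption made here for simplicity, and the function name and parameter values are hypothetical.

```python
def power_law_rank_abundance(a0, alpha, u_max):
    """Abundance of each rank: a0 * r**(-alpha) until it reaches 1, then u_max singletons."""
    # Solving f(r) = 1 gives r = a0 ** (1 / alpha); keep the integer part (assumption)
    r_max = int(a0 ** (1.0 / alpha))
    clonal = [a0 * r ** (-alpha) for r in range(1, r_max + 1)]
    unique = [1.0] * u_max           # sequences present exactly once
    return clonal + unique

# Hypothetical parameters, chosen only to make the shapes easy to inspect
abundance = power_law_rank_abundance(a0=100.0, alpha=1.0, u_max=50)
model_size = sum(abundance)          # total cells implied by the model (L in the text)
model_richness = len(abundance)      # distinct sequences implied by the model (R in the text)
```

The two summary quantities at the end are the ones that the size and richness likelihood terms in the text compare against the expected reservoir size and the Chao1 estimate.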
Mechanistic model for the persistence of the HIV reservoir
The canonical model for HIV dynamics describes the time-evolution of the concentrations of susceptible S and infected I CD4+ T cells and HIV virus V [35, 37]. Our model grows from the canonical model, simplifying with several approximations and extending the biological detail to simulate HIV dynamics on ART, including a long-lived latent reservoir and a potential drug sanctuary (see Fig 5). We include our assumptions here:
Perelson et al. first noticed and quantified a ‘biphasic’ clearance of HIV virus upon initiation of ART and showed that viral half-lives of 1.5 and 14 days correspond with the half-lives of two infected cell compartments [35, 37]. With longer observation times and single-copy viral assays, Palmer et al. found four phases of viral clearance after initiation of ART [36]. Because of uncertainty in distinguishing the third and fourth phases in that study, we focus on the first three decay rates and corresponding cellular compartments, attributing a mixture of the third and fourth phase decay to the clearance of the latent reservoir (half-life 44 months) as measured by Siliciano et al. and recently corroborated by Crooks et al. [2, 3]. With this in mind, we developed a mechanistic mathematical model that has three types of infected cells $I_1$, $I_2$, and $I_3$, which are meant to simulate productively infected cells, pre-integration infected cells, and latently infected cells, respectively.
A recent hypothesis about reservoir persistence suggests that there may be a small ART-free sanctuary (1 in $10^5$ infected cells) [4]. Thus, we include the state variable $I_S$, which is maintained at a constant set-point level prior to ART, where all new infected cells arise from ongoing replication. We opt for this simplification because it biases against our conclusions.
Next, many studies have demonstrated that HIV accelerates immunosenescence through abnormal activation of T cells [49–51]. However, ART results in a marked reduction of T cell activation and apoptosis, a potential signature of HIV susceptible cells [52]. By examining the decline of activation markers for CD4+ T cells, we can approximate the decay kinetics of activated T cells upon ART, inferring approximate decay kinetics of the target cells in our model [33, 34, 53]. A range of initial values exists (from ∼5–20% activation) depending on stage of HIV infection, yet after a year of ART, a large percentage of patients return to almost normal or slightly elevated T cell activation levels (2–3%). Because we assume that target cell depletion is minimal at viral load set-point, we can approximate that the susceptible cell concentration decreases over time as the immune activation decreases, i.e., $S = S(0)e^{-\zeta t}$. This single exponential decay is a simplification (the decay may be biphasic, but the data are not granular enough to discriminate this dynamic subtlety). From existing data, the decay constant should be in the range $\zeta \sim [0.002, 0.01]$. We extend this decay into the sanctuary, allowing the number of susceptible cells over the whole body to decrease so that we have $I_S = I_1(0)\varphi_S e^{-\zeta t}$, where $\varphi_S$ is the fraction of infected cells that are in a sanctuary.
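To give a feel for the numbers, the single-exponential decay of target cells and of the sanctuary can be evaluated directly. The sketch below assumes that ζ is expressed per day, which the text does not state explicitly, and the initial number of productively infected cells is a hypothetical placeholder rather than a fitted value.

```python
from math import exp, log

def susceptible_fraction(zeta, t_days):
    """S(t) / S(0) for the decay S = S(0) * exp(-zeta * t)."""
    return exp(-zeta * t_days)

def sanctuary_cells(i1_0, phi_s, zeta, t_days):
    """I_S(t) = I_1(0) * phi_S * exp(-zeta * t), following the expression in the text."""
    return i1_0 * phi_s * susceptible_fraction(zeta, t_days)

# Assumption: zeta is per day; I_1(0) = 1e6 cells is a hypothetical placeholder
for zeta in (0.002, 0.01):
    half_life_days = log(2) / zeta
    remaining = susceptible_fraction(zeta, 365)
    sanctuary = sanctuary_cells(1e6, 1e-5, zeta, 365)
    print(f"zeta={zeta}: half-life {half_life_days:.0f} d, "
          f"fraction left at 1 y {remaining:.3f}, sanctuary cells {sanctuary:.2f}")
```

Under these assumptions, the faster end of the range leaves only a few percent of the initial target cells after a year, which is the sense in which the sanctuary shrinks along with immune activation.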
Last, we use the quasi-static approximation that virus is proportional to the number of actively infected cells in all compartments, $V = n(I_1 + I_S)$, where $n = \pi/\gamma$. The model is thus where we use the overdot to denote the time derivative.
Comparing proliferation to ongoing replication: ‘de novo’ and ‘observed’ percentages
Using the ODE model (Eq 6), we calculate the total number of newly infected cells generated at time t by infection and by proliferation, both for the total model and within the latent reservoir, as the de novo infection percentage (due to replication):
By defining several tracking ODE equations, we calculate the observed percentage from ongoing replication over time. We assume that all infected cells at the initiation of ART were originally generated by ongoing replication and then solve for the number of cells generated by proliferation: and similarly for those cells whose origin was ongoing viral replication:
Then the observed infection percentage can be written or similarly, the percentage in any compartment can be calculated separately by removing the summation.
Sensitivity analysis
Using the parameter ranges $I_3(0) = [0.02, 2]$, $\varphi_S = [10^{-6}, 10^{-4}]$, $\zeta = [0.002, 0.02]$, $\epsilon = [0.9, 0.99]$, $\epsilon_S = [0, 0.9]$, $\varrho_j = [0, 0.5]$, we completed a local and a global sensitivity analysis. For the local analysis, we fixed all values as in Table 1 and modified one parameter at a time over each listed range. The global analysis was performed using $10^4$ Latin Hypercube samples of the complete 6-dimensional parameter space [54]. The key outcome, the percentage (observed and de novo) of infected cells due to ongoing replication at 1 year of ART, was correlated to each parameter using the Spearman correlation coefficient, defined as the covariance between the outcome and the parameter divided by the product of their standard deviations, with both variables rank-ordered by value.
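The global analysis described above combines two generic steps, Latin Hypercube sampling and a rank correlation, which can be sketched with the Python standard library alone. This is an illustration under stated assumptions, not the authors' pipeline: parameters are sampled uniformly on a linear scale (the text does not say whether a log scale was used), and `toy_outcome` is a hypothetical stand-in for the model output at 1 year of ART.

```python
import random
from math import exp

def latin_hypercube(ranges, n_samples, rng):
    """Draw n_samples points with exactly one point per equal-width stratum per parameter."""
    columns = []
    for lo, hi in ranges:
        strata = [(i + rng.random()) / n_samples for i in range(n_samples)]
        rng.shuffle(strata)                      # decouple strata across parameters
        columns.append([lo + u * (hi - lo) for u in strata])
    return [list(row) for row in zip(*columns)]

def ranks(values):
    """1-based ranks; tied values share their mean rank."""
    order = sorted(range(len(values)), key=values.__getitem__)
    out = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for idx in order[i:j + 1]:
            out[idx] = (i + j) / 2 + 1
        i = j + 1
    return out

def spearman(x, y):
    """Covariance of the ranks divided by the product of their standard deviations."""
    rx, ry = ranks(x), ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = sum((a - mx) ** 2 for a in rx) ** 0.5
    sy = sum((b - my) ** 2 for b in ry) ** 0.5
    return cov / (sx * sy)

def toy_outcome(phi_s, zeta):
    """Hypothetical placeholder outcome, NOT the model in the text."""
    return phi_s * exp(-zeta * 365)

rng = random.Random(0)
samples = latin_hypercube([(1e-6, 1e-4), (0.002, 0.02)], 200, rng)
outcomes = [toy_outcome(phi_s, zeta) for phi_s, zeta in samples]
rho_phi = spearman([s[0] for s in samples], outcomes)
rho_zeta = spearman([s[1] for s in samples], outcomes)
```

Rank correlation is a natural choice here because it only asks whether the outcome moves monotonically with a parameter, without assuming a linear relationship.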
A summary of model parameters from the literature. Death rates are calculated from clearance, proliferation, and activation rates as δi(j) = αi(j) − θi − ξi.
Author Contributions
DBR, ERD, and JTS conceptualized the model with help from AMS. Data was shared and discussed by TAW and SEP. DBR analyzed data and performed mathematical calculations and computational simulations. All authors contributed to the writing of the manuscript.
Acknowledgements
We gratefully acknowledge the VIDD faculty initiative at the Fred Hutchinson Cancer Research Center and the NIH National Institute of Allergy and Infectious Diseases (U19 AI096111 and UM1 AI12662 to JTS). This work was supported in part through SEP by the Delaney AIDS Research Enterprise (DARE) to Find a Cure (1U19AI096109 and 1UM1AI126611-01) and the Australian National Health and Medical Research Council (AAP1061681). The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.