Abstract
HIV-1 infection currently cannot be cured because the virus persists as integrated proviral DNA in long-lived cells despite years of suppressive antiretroviral therapy (ART). To characterize establishment, turnover, and evolution of viral DNA reservoirs we deep-sequenced the p17gag region of the HIV-1 genome from samples obtained after 3-18 years of suppressive ART from 10 patients. For each of these patients, whole genome deep-sequencing data of HIV-1 RNA populations before onset of ART were available from 6-12 longitudinal plasma samples spanning 5-8 years of untreated infection. This enabled a detailed analysis of the dynamics and origin of proviral DNA during ART. A median of 14% (range 0-42%) of the p17gag DNA sequences were overtly defective due to G-to-A hypermutation. The remaining sequences were remarkably similar to previously observed RNA sequences and showed no evidence of evolution over many years of suppressive ART. Most sequences from the DNA reservoirs were very similar to viruses actively replicating in plasma (RNA sequences) shortly before start of ART. The results do not support persistent HIV-1 replication as a mechanism to maintain the HIV-1 reservoir during suppressive therapy. Rather, the data indicate that viral DNA variants are turning over as long as patients are untreated and that suppressive ART halts this turnover.
Introduction
Combination antiretroviral therapy (ART) has had a dramatic effect on the morbidity and mortality of human immunodeficiency virus type 1 (HIV-1) infection. Even though ART is very effective in suppressing active virus replication, it cannot eradicate the infection because HIV-1 persists as integrated proviral DNA in long-lived cells that constitute a virus reservoir. Latently infected resting memory CD4+ T-lymphocytes (memory CD4 cells) represent the most solidly documented HIV-1 reservoir [1–3]. Thus, a small fraction of memory CD4 cells have fully functional integrated HIV-1 proviruses. These cells do not produce virus when they are in a resting state, but can be induced to produce virus upon activation in vitro and in vivo [1–4].
Because of their importance for HIV-1 cure efforts, many methods to quantify the HIV-1 reservoirs have been developed. The quantitative virus outgrowth assay (QVOA) represents the “gold standard” [4–6], but this assay underestimates the true size of the functional reservoir due to incomplete induction by PHA stimulation. Ho et al. [7] showed that the functional HIV-1 reservoir may be 60-fold larger than originally estimated. PCR-based assays are also commonly used for quantifying the HIV-1 reservoir, but these assays overestimate the size of the functional reservoir because they cannot distinguish between replication-competent and defective viral genomes. Quantification of the HIV-1 reservoir by PCR-based methods typically gives at least 100-fold higher numbers than the QVOA because of defective proviruses [4–6]. Many defective proviruses have large internal deletions [7, 8]. Defective proviruses are also the result of APOBEC editing, which induces G-to-A hypermutation [9–11].
The HIV-1 reservoir is established early during primary infection and is remarkably stable in both quantitative and qualitative terms. Early ART reduces the size and the genetic complexity of the reservoir [12–15]. Siliciano et al. [16] documented a half-life of 44 months for latently infected cells capable of producing replication-competent virus in the QVOA. Similarly, HIV-1 DNA levels and genetic compositions are very stable in patients on long-term suppressive ART [5, 13, 17–21]. Most studies indicate that the HIV-1 reservoir is maintained by the physiological homeostasis of CD4 memory that in part involves occasional expansions and contractions of individual CD4 cell clones [5, 12, 22]. However, some studies have suggested that persistent virus replication may be an important contributor to the maintenance of the HIV-1 reservoir [23, 24]. In particular, Lorenzo-Redondo et al. [25] recently reported evidence of rapid HIV-1 evolution in lymphoid tissue reservoirs.
Despite its significance for HIV-1 cure efforts, relatively little is known about the establishment and turnover of the HIV-1 reservoir before start of ART. In this study we have characterized how HIV-1 DNA reservoirs are established and maintained in 10 patients. Evolution of HIV-1 in these patients during 5-8 years prior to ART had been characterized by whole-genome deep sequencing in a recent study by Zanini et al. [26]. The patients were selected because they later went on to many years of fully suppressive ART. We now sequenced HIV-1 DNA from peripheral blood mononuclear cells (PBMCs) and compared these reservoir sequences to replicating HIV populations prior to ART. We found that HIV-1 DNA populations remained genetically stable for up to 18 years after start of suppressive ART, which provides evidence against viral evolution and replication as a mechanism to maintain HIV-1 reservoirs. Furthermore, we found that variants replicating shortly before start of therapy were overrepresented in the HIV-1 DNA reservoirs, indicating that proviral HIV-1 variants were turning over as long as the patients were untreated.
Results
Patients and samples
The study included 10 HIV-1-infected patients who were diagnosed in Sweden between 1990 and 2003. The patients were selected on the following criteria: 1) A relatively well-defined time of infection; 2) Treatment-naive for a minimum of 5 years; and 3) Subsequent suppressive ART (plasma HIV-1 RNA levels continuously < 50 copies/ml) for a minimum of 2 years. In a recent study we performed whole-genome deep sequencing of replicating HIV-1 RNA populations in 9 of the 10 patients covering the time period before they started ART (6-12 longitudinal plasma samples per patient spanning 5-8 years) [26]. Here we included plasma RNA sequences from the tenth patient. Patient characteristics are summarized in Table 1.
For the present study we obtained sequence data from HIV-1 DNA in viral reservoirs by deep sequencing of the p17gag region of the HIV-1 genome in DNA prepared from PBMCs. Longitudinal PBMC samples (1-3 samples per patient spanning up to 2.6 years) were obtained 3-18 years after start of suppressive ART (Table 1). We define viral DNA reservoirs as HIV p17gag sequences that were still present in PBMCs after a minimum of 2 years of suppressive ART. The HIV-1 DNA template numbers were quantified by limiting dilution. Identical p17gag sequences were merged into haplotypes while preserving their abundance. Minor haplotypes were merged with major haplotypes if they differed by only one mutation (see Materials and Methods). Processed sequence data will be made available at hiv.tuebingen.mpg.de. Raw sequencing reads from all HIV-1 DNA samples have been deposited in the European Nucleotide Archive and will be available under study accession number PRJEB13841 (sample accession numbers ERS1138001-ERS1138025).
Proviral DNA sequences reflect pretreatment RNA sequences
The HIV-1 DNA sequences recapitulate the diversity observed in RNA sequences before treatment, often with exact sequence matches (see Fig. 1 and Fig. S3). While we observed large variations in the abundance of haplotypes, with sequence read frequencies varying between 0.1% and 50% (see Fig. S1), the close match between RNA and DNA sequences confirms that we characterized proviral diversity in a specific and sensitive manner. Variation in haplotype abundance likely reflects clonal expansions [5, 13], independent integrations of identical sequences, and resampling of the same original DNA templates during sequencing. The exact contribution of these distinct mechanisms is difficult to dissect in our sequence data.
The estimated number of HIV DNA templates, the number of distinct haplotypes observed, and the fraction of haplotypes seen in multiple samples are given in Table S1. We typically recapture one third (median 0.29) of haplotypes observed at a frequency above 1% in another sample from the same patient.
Hypermutated sequences are frequent in HIV-1 reservoirs
We found that a substantial proportion (median 14%; range 0-42%) of the p17gag DNA sequences from the viral reservoirs were hypermutated and therefore replication incompetent (see Fig. S2), which is consistent with other reports [5, 6, 10, 13]. A small proportion of sequences had stop codons not obviously due to G-to-A hypermutation (average 3%, range 0-12%). It is likely that a proportion of sequences without overt inactivating mutations were also replication incompetent due to mutations or deletions outside of p17gag.
Hypermutation in the HIV-1 DNA sequences complicates comparison with non-defective DNA and RNA sequences. For this reason we excluded hypermutated sequences from the main analyses, but we also performed complementary analyses that included hypermutated sequences.
Lack of evidence of persistent replication in HIV-1 DNA reservoirs
It remains controversial whether or not HIV-1 reservoirs are maintained by persistent replication [5, 12, 20, 22–25]. We used the p17gag DNA sequences from viral reservoirs to search for evidence of sequence evolution, which would be expected to take place if the virus was replicating. The p17gag DNA sequences from viral reservoirs were obtained from 3.0 to 18.6 years after start of suppressive ART. HIV-1 RNA sequences from plasma samples obtained before start of therapy were used as reference materials.
Root-to-tip distances for plasma RNA populations and PBMC DNA populations were calculated relative to the major RNA haplotype in the first plasma sample. Fig. 2 shows temporal changes of root-to-tip distances in HIV-1 RNA and DNA populations obtained before and after start of suppressive ART, respectively. As previously shown, plasma HIV-1 RNA populations obtained before start of ART evolved at a relatively constant rate [26], as evidenced by a steady increase of average root-to-tip distances over time. In sharp contrast, HIV-1 DNA populations obtained after 3-18 years of suppressive therapy showed stable root-to-tip distances. Hypermutated DNA sequences showed larger root-to-tip distances, but these distances were also stable over time (Fig. S4).
Table 2 shows the rate of evolution before and after start of suppressive ART. Before start of therapy we observed statistically significant evolution of plasma RNA sequences with rates of 1 to 4 × 10⁻³/year in all 10 patients. In contrast, no statistically significant evolution was observed in DNA reservoirs during up to 18 years of suppressive ART.
Collectively, our results do not provide support for persistent HIV-1 replication as a mechanism to maintain the HIV-1 reservoir during suppressive therapy.
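The rate estimates discussed above rest on regressing root-to-tip distance against sampling time. The following Python fragment is only a minimal sketch of that idea, not the analysis code used for Table 2. It assumes that each sample is given as a dictionary mapping aligned haplotype sequences to read counts and that distance is the abundance-weighted fraction of mismatching sites; the function names `root_to_tip` and `evolutionary_rate` are hypothetical.

```python
def root_to_tip(haplotypes, founder):
    """Abundance-weighted mean per-site distance of one sample to the founder.

    haplotypes: dict mapping aligned sequence -> read count (assumed format).
    founder: reference sequence of the same length.
    """
    total = sum(haplotypes.values())
    weighted = 0.0
    for seq, count in haplotypes.items():
        mismatches = sum(a != b for a, b in zip(seq, founder))
        weighted += count * mismatches / len(founder)
    return weighted / total


def evolutionary_rate(times_years, distances):
    """Ordinary least-squares slope of distance against time (per year)."""
    n = len(times_years)
    mean_t = sum(times_years) / n
    mean_d = sum(distances) / n
    cov = sum((t - mean_t) * (d - mean_d)
              for t, d in zip(times_years, distances))
    var = sum((t - mean_t) ** 2 for t in times_years)
    return cov / var
```

A slope indistinguishable from zero across the on-treatment DNA samples would correspond to the stable root-to-tip distances reported here; assessing statistical significance requires a separate test that this sketch does not include.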
Time for deposition of reservoir DNA sequences
To investigate when PBMC HIV-1 DNA variants had been deposited in the viral reservoirs, we compared the on-treatment PBMC DNA sequences with the longitudinal pre-treatment plasma RNA sequences (Fig. 3 and Fig. S5). For each p17gag DNA sequence, we determined the RNA sample and haplotype that was the most likely source. By this procedure, most proviral sequences were assigned to the plasma samples closest to the start of treatment (Fig. 3, panel A). The representation of RNA haplotypes from earlier plasma samples dropped rapidly over 1-2 years. However, sequences representing earlier plasma sampling time points were also found as minor variants among the p17gag DNA sequences. Among these minor variants, DNA sequences matching the variant dominating the first plasma samples obtained within 6 months after infection were overrepresented in some patients (Fig. 3, panel B). Thus, the proportion of DNA sequences matching the initial plasma HIV-1 RNA haplotype was 14%, 2.4%, 42%, < 1%, and 6.9% in patients 2, 3, 6, 8 and 11, respectively.
The overrepresentation in the HIV-1 DNA reservoir of viral variants that replicated shortly before start of suppressive ART indicates that cells carrying defective and non-defective proviral variants were turning over as long as the patients were untreated and that suppressive ART halted this turnover. The width of the peak in Fig. 3A suggests a half-life on the order of one year.
Discussion
In this study we have investigated the composition and turnover of HIV-1 DNA sequences in viral reservoirs in patients on long-term suppressive therapy. Reservoir HIV-1 DNA populations were remarkably stable and showed no signs of ongoing replication. We also traced when during the course of HIV infection viruses in the DNA reservoirs had been deposited and found that they mainly derived from the last year(s) before start of suppressive therapy.
Our study provides evidence against persistent HIV-1 replication as a mechanism for maintenance of HIV-1 reservoirs during suppressive therapy. This is at variance with a recent report by Lorenzo-Redondo et al. [25] and a few earlier reports [23, 24], but agrees with several other earlier studies [5, 13, 17–21]. Lorenzo-Redondo et al. [25] compared genetic diversity of HIV-1 RNA in plasma samples at start of therapy with HIV-1 DNA sequences obtained from blood and tissues at baseline and at three and six months after the start of treatment. They report a signal of evolution between the different time points at an extraordinarily high rate (7.4–12 × 10⁻³ changes per site per year), about 5-fold higher than typically observed in gag and pol of replicating RNA populations. This observation is incompatible with the lack of observable changes in reservoir sequences over 20 times longer time intervals reported here. Without longitudinal data on the evolution of the HIV-1 population prior to treatment, the nature of the change reported by Lorenzo-Redondo et al. [25] is difficult to discern. One possible explanation for the apparent conflict between the two studies could be that after onset of therapy, short-lived cells that sampled the most recent circulating virus populations start to disappear, leaving longer-lived cells that sample deeper into the history of the infection. This scenario would not correspond to evolution but, on the contrary, to a sampling of earlier variants. Another difference is that Lorenzo-Redondo et al. [25] investigated HIV-1 DNA sequences in tissue as well as PBMC samples whereas we only studied PBMC samples. However, tissue and blood HIV-1 DNA variants should be well-mixed over the time frame that we investigate [5, 13, 25]. Both Lorenzo-Redondo et al. [25] and we studied DNA sequences in HIV-1 reservoirs, which are known to contain a high proportion of defective virus.
These proviruses serve as markers of T-cell clones, rather than replication-competent virus. Hence the absence of evolution or turnover of provirus that we found does not exclude the possibility that there is replication and evolution of replication-competent virus in reservoirs. However, if such replication exists, it happens at very low levels that do not contribute substantially to the pool of proviral DNA in PBMCs. To enrich for replication-competent and putatively evolving virus, QVOA followed by sequencing of virus released into supernatants should be performed, rather than sequencing of total HIV-1 DNA as done by Lorenzo-Redondo et al. [25], us and others [10, 13, 20]. In agreement with our finding of genetic stability in the DNA reservoirs, Josefsson et al. [13] and Stockenstrom et al. [5] have reported that defective HIV-1 DNA integrants present during long-term effective ART appear to be maintained by proliferation and longevity of infected cells rather than by ongoing viral replication.
Because we had access to detailed longitudinal data on the evolution of the plasma HIV-1 RNA population from time of infection to start of suppressive ART, we could trace when during the course of untreated HIV-1 infection the viruses in the DNA reservoirs had been deposited. We found that a majority of variants in the HIV-1 DNA reservoirs were derived from HIV-1 RNA variants that had actively replicated during the last year(s) before start of suppressive ART, with no evidence for evolution after treatment start. Frenkel et al.[28], in contrast to us, reported persistence of a greater number of early compared to recent viruses in a few children on suppressive ART; more research is warranted to assess the origin of this difference.
Defective HIV-1 proviruses can be regarded as unique in vivo labels of individual memory CD4 cell clones which can be used to track their fate similar to sequencing of T-cell receptors [29]. This strategy was used by Imamichi et al. [30] to demonstrate that a T-cell clone persisted more than 17 years. Similarly, prenatally formed T-cell receptors shared by twins have been reported to have lifetimes > 30 years [31]. During suppressive ART the turnover of infected memory CD4 cell clones is likely to follow the same dynamics as in uninfected people. In contrast, we observe a strong overrepresentation in the reservoirs of “late” HIV-1 RNA variants, which indicates that HIV-1 target cells, primarily CD4+ T-lymphocytes, were turning over with a half-life of about one year in absence of treatment. This turnover was dramatically slowed by suppressive ART. Earlier studies, based on different types of labelling of CD4 cells, have indicated a 3-4 fold increased rate of CD4 cell death in untreated HIV-1-infected patients as compared with uninfected persons and patients on suppressive ART [32–34]. The more dramatic difference we observe is likely explained by different methodologies. Earlier studies estimated the lifespan of individual cells whereas we primarily have estimated the lifespan of CD4 cell clones carrying defective proviruses (i.e. infected cells as well as their daughter cells).
Our study has several limitations. We have not sorted cells and therefore cannot investigate if there are differences in HIV-1 turnover between different types and subsets of cells, such as memory CD4 cells and their subsets. However, it is reasonable to assume that a majority of our HIV-1 DNA sequences came from memory CD4 cells because others have shown that these cells constitute the main HIV-1 reservoir [1–3]. We sequenced a relatively short region of the HIV-1 genome and therefore cannot reliably distinguish between replication-competent and defective viruses. While we observe no evolution in these proviral DNA sequences, we cannot rule out the possibility that a small subset of viruses indeed was replicating but remained undetected among the many replication-incompetent viruses. We observed large variations in the abundance of sequence haplotypes that likely reflect clonal expansions [5, 13], independent integrations of identical sequences, and resampling of the same original DNA templates during sequencing. With our sequencing method we could not exactly determine the relative contribution of these distinct mechanisms. We are attempting Primer ID sequencing [35] to better understand the in vivo dynamics of different viral haplotypes.
In summary, we provide compelling evidence against persistent viral replication as a mechanism to maintain the latent HIV-1 DNA reservoir during suppressive therapy. Furthermore, we show that most latently infected cells during long-term suppressive ART are infected shortly before ART start and that the rate of T-cell turnover is reduced upon starting suppressive ART.
Materials and methods
Ethical statement
The study was conducted according to the Declaration of Helsinki. Ethical approval was granted by the Regional Ethical Review board in Stockholm, Sweden (Dnr 2012/505 and 2014/646). Patients participating in the study gave written and oral informed consent to participate.
Patients
The study included 10 HIV-1-infected patients who were diagnosed in Sweden between 1990 and 2003. Prior to the present study the patients were included in a recent study on the population genomics of intrapatient HIV-1 evolution [26]. The patients were selected based on the following inclusion criteria: 1) A relatively well-defined time of infection based on a negative HIV antibody test less than two years before a first positive test or a laboratory-documented primary HIV infection; 2) No ART during a minimum of approximately five years following diagnosis; 3) Availability of biobank plasma samples covering this time period; and 4) Subsequent successful ART (plasma viral levels < 50 copies/ml) for a minimum of two years. As previously described, 6-12 plasma samples per patient were retrieved from biobanks and used for full-genome HIV-1 RNA sequencing [26]. The same patient nomenclature is used in both studies. For the present study the same patients were asked to donate 70 ml of fresh EDTA-treated blood on up to three occasions over a time period of 2.5 years. These blood samples were obtained 3-18 years after start of successful ART. Estimated time of infection (ETI) was calculated as previously described using clinical and laboratory findings including Fiebig staging and BED testing [26]. Information about the patients and the samples is summarized in Table 1.
HIV-1 RNA sequencing from plasma
Whole-genome deep sequencing of virus RNA populations in plasma samples obtained before start of therapy was performed as previously described [26]. In short, total RNA in plasma was extracted using RNeasy® Lipid Tissue Mini Kit (Qiagen Cat No. 74804) and amplified using a one-step RT-PCR with outer primers for six overlapping regions and Superscript® III One-Step RT-PCR with Platinum® Taq High Fidelity High Enzyme Mix (Invitrogen, Carlsbad, California, US). An optimized Illumina Nextera XT library preparation protocol was used together with a kit from the same supplier to build DNA libraries, which were sequenced on the Illumina MiSeq instrument with 2 × 250 bp or 2 × 300 bp sequencing kits (MS-102-2003/MS-10-3003). For the present study a part of the p17gag region of the HIV-1 genome (see below) was extracted from the entire full-genome RNA data set. The median number of high-quality reads covering the entire p17 sequence was 146 (inter-quartile range 56-400) and the cDNA template numbers are available in Zanini et al. [26].
HIV-1 DNA sequencing from PBMCs
Approximately 70 ml of fresh whole blood was obtained in 7 Vacutainer tubes with EDTA as anticoagulant. PBMCs were isolated by Ficoll-Paque PLUS (GE Healthcare Bio-Sciences AB, Uppsala, Sweden) centrifugation according to the instructions by the manufacturer. Total DNA was extracted from PBMCs using the OMEGA E.Z.N.A® Blood DNA Mini Kit (Omega bio-tek, Norcross, Georgia) or the QIAamp DNA Blood Mini Kit (Qiagen GmbH, Hilden, Germany) according to the instructions by the manufacturer. The amount of DNA was measured with Qubit® dsDNA HS Assay Kit (Invitrogen™, Eugene, Oregon, USA). Patient-specific nested primers (Integrated DNA Technologies) were used to amplify a 387-bp long portion of the p17gag gene corresponding to positions 787 to 1173 in the HxB2 reference sequence. The primers were designed based on the plasma RNA sequences from each patient (Tab. S2). Outer primers were used together with Platinum® Taq DNA Polymerase High Fidelity (Invitrogen™, Carlsbad, California, US) for the first PCR. The program started with a denaturation step at 94°C for 2 min followed by 15 PCR cycles of denaturation at 94°C for 20 s, annealing at 50°C for 20 s and extension at 72°C for 30 s and a final extension step at 72°C for 6 min. For the second PCR, 2.5 μl of the product from the first PCR was amplified with inner primers and the same cycle profile and enzyme as for the first PCR. Amplified DNA was purified using Agencourt AMPure XP (Beckman Coulter, Beverly, Massachusetts) and quantified using Qubit. For each sample the number of HIV-1 DNA templates used for sequencing was roughly quantified in triplicate by limiting dilution using the same PCR conditions, three dilutions (usually 0.5, 0.1, 0.02 μg of DNA) and Poisson statistics. Control experiments were performed to evaluate PCR-induced recombination using the plasmids NL4-3 and SF162, which were spiked in equal proportion into human DNA and amplified using the same PCR conditions as above. The results showed that there was minimal PCR-induced recombination in this short amplicon.
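The limiting-dilution quantification can be illustrated with a short Python sketch. This is one plausible reading of "Poisson statistics" and not necessarily the calculation that was performed: it assumes that a PCR reaction is negative with probability exp(-λd), where d is the DNA input in μg and λ is the template density per μg, and it picks the λ that maximizes the likelihood over a fixed grid. The function name `templates_per_ug` and the grid bounds are hypothetical.

```python
import math


def templates_per_ug(dilutions_ug, positives, replicates=3):
    """Maximum-likelihood template density (per ug DNA) from limiting dilution.

    dilutions_ug: DNA input per reaction at each dilution, in ug.
    positives: number of PCR-positive reactions at each dilution.
    Assumes a reaction is negative with probability exp(-density * input).
    """
    def log_likelihood(density):
        total = 0.0
        for dna_ug, n_pos in zip(dilutions_ug, positives):
            p_neg = math.exp(-density * dna_ug)
            total += (replicates - n_pos) * (-density * dna_ug)
            if n_pos:
                total += n_pos * math.log(1.0 - p_neg)
        return total

    # Log-spaced grid from 0.01 to 10,000 templates per ug (assumed bounds).
    grid = [10 ** (k / 100.0) for k in range(-200, 401)]
    return max(grid, key=log_likelihood)
```

If every reaction is positive or every reaction is negative, the likelihood has no interior maximum and the sketch simply returns the edge of the grid, so such dilution series only bound the template number.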
Sequencing and read processing
The HIV-specific primers were flanked by NexteraXT adapters. To construct sequencing libraries, indices and sequencing primers were added in 12-15 cycles of additional PCR. Amplicons were sequenced on an Illumina MiSeq machine with 2 × 250 cycle kits. Between 6,500 and 190,000 (median 35,000) paired-end reads were generated per sample. The overlapping paired-end sequencing reads were merged to create synthetic reads spanning the entire p17 amplicon. In case of disagreement between paired reads, the nucleotide on the read with the higher quality score was used. We counted the number of times a particular p17 sequence was observed and did subsequent analysis with read-abundance pairs. To reduce the influence of sequencing and PCR errors, we combined rare sequences (below frequency 0.002) with common sequences if they differed by no more than one position. Specifically, starting with the rarest sequences, we merged rare sequences with the most common sequence that was one base away. The cutoff 0.002 is the typical error frequency of the pipeline as determined earlier [26]. All analyses were done in Python using the libraries numpy, biopython, and matplotlib [36–38].
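The merging of rare sequences into common ones can be sketched in a few lines of Python. The fragment below is an illustration under stated assumptions rather than the pipeline itself: sequences are assumed to be aligned and of equal length, input is a dictionary of sequence to read count, and a rare sequence is folded into the most abundant sequence at Hamming distance one regardless of whether that neighbour is itself above the cutoff. The names `hamming` and `merge_rare_haplotypes` are hypothetical.

```python
def hamming(a, b):
    """Number of mismatching positions between two equal-length sequences."""
    return sum(x != y for x, y in zip(a, b))


def merge_rare_haplotypes(counts, cutoff=0.002):
    """Fold sequences below `cutoff` frequency into a neighbour one base away.

    counts: dict mapping sequence -> read count (assumed input format).
    Returns a new dict; the total read count is preserved.
    """
    merged = dict(counts)
    total = sum(merged.values())
    # Visit the rarest sequences first, as described in the text.
    for seq in sorted(counts, key=lambda s: counts[s]):
        if seq not in merged or merged[seq] / total >= cutoff:
            continue
        neighbours = [s for s in merged
                      if s != seq and len(s) == len(seq)
                      and hamming(s, seq) == 1]
        if neighbours:
            # Merge into the most abundant sequence that is one base away.
            target = max(neighbours, key=lambda s: merged[s])
            merged[target] += merged.pop(seq)
    return merged
```

Rare sequences without any neighbour at distance one are left untouched, so genuinely distinct minor variants survive the error correction.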
Hypermutation detection
To classify sequences into obvious hypermutants and sequences representative of circulating virus, we counted mutations at positions that are not variable in the RNA samples obtained prior to therapy. If more than 4 mutations were observed and at least half of them were G→A, the sequence was considered a hypermutant. The distribution of the different transition mutations relative to the closest genome found in RNA samples is shown in Fig. S2 for reads classified as hypermutants or not. Results obtained for sequences classified as non-hypermutants are very similar to results obtained when using only sequences without stop codons.
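The classification rule stated above translates directly into code. The Python fragment below is a minimal sketch, assuming that the DNA sequence is aligned to a reference carrying the invariant pre-treatment RNA nucleotide at each conserved position; the function name `is_hypermutant` and its arguments are hypothetical.

```python
def is_hypermutant(seq, reference, conserved_positions):
    """Apply the rule: more than 4 mutations, at least half of them G->A.

    seq, reference: aligned sequences of equal length.
    conserved_positions: indices that do not vary in pre-treatment RNA
    (assumed to be supplied by the caller).
    """
    n_mutations = 0
    n_g_to_a = 0
    for pos in conserved_positions:
        ref_nuc, nuc = reference[pos], seq[pos]
        if nuc != ref_nuc:
            n_mutations += 1
            if ref_nuc == "G" and nuc == "A":
                n_g_to_a += 1
    return n_mutations > 4 and 2 * n_g_to_a >= n_mutations
```

Restricting the count to conserved positions keeps ordinary within-patient diversity from being mistaken for APOBEC editing.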
Phylogenetic analysis
We reconstructed phylogenetic trees using the approximate maximum likelihood method implemented by FastTree [27]. Tips were annotated with frequency, source and sample date using custom Python scripts.
Statistical analysis
Root-to-tip distances were calculated as the average distance between a sample and the founder sequence approximated by the consensus sequence of the first RNA sample. To determine the rate of evolution in absence of treatment, this root-to-tip distance was regressed against time. To determine the rate of evolution on treatment, the root-to-tip distances of the last RNA sample and the DNA samples were regressed against time. To determine the most likely seeding time for a p17gag DNA sequence obtained from PBMCs, we calculated the likelihood of sampling this sequence given the SNP frequencies in each RNA sample and assigned the sequence to the sample where this likelihood was highest.
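The seeding-time assignment can be sketched as follows. This Python fragment is an illustration under assumptions that the text does not spell out: sites are treated as independent, each RNA sample is represented as a list of per-position dictionaries of nucleotide frequencies, and frequencies of unobserved nucleotides are floored at a small value (here 0.001) so that a single unseen variant does not give a likelihood of zero. The function names and the floor value are hypothetical.

```python
import math


def log_likelihood(seq, snp_freqs, floor=1e-3):
    """Log-probability of sampling `seq` from one RNA sample.

    snp_freqs: one dict per position mapping nucleotide -> frequency.
    Sites are treated as independent; frequencies are floored (assumption).
    """
    return sum(math.log(max(site.get(nuc, 0.0), floor))
               for nuc, site in zip(seq, snp_freqs))


def assign_seeding_sample(seq, rna_samples):
    """Return the label of the RNA sample under which `seq` is most likely.

    rna_samples: dict mapping sample label -> per-position frequency dicts.
    """
    return max(rna_samples,
               key=lambda label: log_likelihood(seq, rna_samples[label]))
```

Treating sites as independent ignores linkage between SNPs, so the assignment is most informative when the variants that distinguish time points are at clearly different frequencies.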
Acknowledgements
This work was supported by the European Research Council through grant Stg. 260686 and the Swedish Research Council through grant K2014-57X-09935. We would also like to express our gratitude to the study participants.