Abstract
Background Mycobacterium tuberculosis (MTB) whole genome sequencing data can provide insights into temporal and geographic trends in resistance acquisition and inform public health interventions.
Methods We curated a set of clinical MTB isolates with high quality sequencing and culture-based drug susceptibility data spanning four lineages and more than 20 countries. We constructed geographic and lineage specific MTB phylogenies and used Bayesian molecular dating to infer the most-recent-common-susceptible-ancestor age for 4,869 instances of resistance to 10 drugs.
Findings Of 8,550 isolates curated, 6,099 from 15 countries met criteria for molecular dating. The number of independent resistance acquisition events was lower than the number of resistant isolates across all countries, suggesting ongoing transmission of drug resistance. Ancestral age distributions supported the presence of old resistance, ≥20 years prior, in the majority of countries. A consistent order of resistance acquisition was observed globally starting with resistance to isoniazid, but resistance ancestral age varied by country. We found a direct correlation between country wealth and resistance age (R2= 0.47, P-value= 0.014). Amplification of fluoroquinolone and second-line injectable resistance among multidrug-resistant isolates is estimated to have occurred very recently (median ancestral age 4.7 years IQR 1.9-9.8 prior to sample collection). We found the sensitivity of commercial molecular diagnostics for second-line resistance to vary significantly by country (P-value <0.0003)
Interpretation Our results highlight that both resistance transmission and amplification are contributing to disease burden globally but are variable by country. The observation that wealthier nations are more likely to have old resistance suggests that programmatic improvements can reduce resistance amplification, but that fit resistant strains can circulate for decades subsequently.
Funding This work was supported by the NIH BD2K grant K01 ES026835, a Harvard Institute of Global Health Burke Fellowship (MF), Boston Children’s Hospital OFD/BTREC/CTREC Faculty Career Development Fellowship and Bushrod H. Campbell and Adah F. Hall Charity Fund/Charles A. King Trust Postdoctoral Fellowship (AD).
Evidence before this study Acquisition and spread of drug-resistance by Mycobacterium tuberculosis (MTB) varies across countries. Local factors driving evolution of drug resistance in MTB are not well studied.
Added value of this study We applied molecular dating to 6,099 global MTB patient isolates and found the order of resistance acquisition to be consistent across the countries examined, i.e. acquisition of isoniazid resistance first followed by rifampicin and streptomycin followed by resistance to other drugs. In all countries with data available there was evidence for transmission of resistant strains from patient-to-patient and in the majority for extended periods of time (>20 years).
Countries with lower gross wealth indices were found to have more recent resistance acquisition to the drug rifampicin. Based on the resistance patterns identified in our study we estimate that commercial diagnostic tests vary considerably in sensitivity for second-line resistance diagnosis by country.
Implications of all available evidence The longevity of resistant MTB in many parts of the world emphasizes its fitness for transmission and its continued threat to public health. The association between country wealth and recent resistance acquisition emphasizes the need for continued investment in TB care delivery and surveillance programs. Geographically relevant diagnostics that take into account a country’s unique distribution of resistance are necessary.
Introduction
Tuberculosis (TB) defines a global epidemic that takes more lives than any other infection due to a single pathogen1. The emergence of multidrug-resistant (MDR) and extensively drug-resistant (XDR) resistant TB presents a major hurdle to efforts in accelerating TB decline. Halting the transmission of drug-resistant (DR) TB has been a major focus of studies addressing this hurdle2. But the epidemic is ultimately defined by local factors that remain understudied in many parts of the world3. The study of geographic and temporal heterogeneity of the DR-TB epidemic can provide insights into these local factors as key drivers of MDR-TB prevalence and persistence in the community, including programmatic and bacterial factors. This understanding is key to future disease control and prevention of antibiotic resistance development.
Over the past decade, increased uptake of molecular and whole genome sequencing (WGS) technologies, and their application to Mycobacterium tuberculosis (MTB) clinical isolates has offered novel insights into pathogen biology and diversity in the context of human infection4–7. The application of WGS has allowed us to better understand the genetic determinants of drug resistance (DR) within MTB8. The detection of these genetic determinants using molecular technologies that include WGS is now increasingly adopted for TB resistance diagnosis in many parts of the world9 and is beginning to replace the more biohazardous and time consuming culture based drug susceptibility tests (DST). The study of isolates sampled from epidemiological outbreaks or from the same host over time has allowed the estimation of MTB’s molecular clock rate, or temporal rate of accumulation of fixed genome-level variation10, 11. The application of this rate to new WGS data from isolates collected for surveillance has helped improve transmission inference and molecular dating of specific evolutionary events such as resistance acquisition or lineage divergence10, 12–13.
We sought to use a large clinical collection of MTB WGS and resistance phenotype data to study how, when, and where resistance was acquired on a global scale. Using a Bayesian implementation of coalescent theory, we estimate and compare dates of resistance acquisition for MDR/XDR isolates across 15 different countries. We use the recency of resistance acquisition as a measure of fitness of the circulating strains in their respective environments and study the effect of country wealth, as a proxy for TB control programme funding, on the recency of resistance acquisition at a macro level. We also assess the distribution of unexplained MTB phenotypic resistance across 20 countries, to evaluate the accuracy and geographic heterogeneity of molecular detection of common MTB genetic resistance determinants, and discuss implications for DR-TB control.
Methods
Further details available in the supplementary material.
Data and quality control
We compiled a 10,299 MTB WGS dataset with culture based DST (phenotypic) data using public databases (Patric14, ReSeqTB15) and literature curation11–13, 16–26. A summary table with the phenotypic data is available online at https://github.com/farhat-lab/resdata.
Genomic analysis/variant calling
We used a previously validated genomic analysis pipeline for MTB described by Ezewudo et al.27 with modifications as detailed in the supplement.
Drug resistance definitions
Drugs were labelled as follows: isoniazid (INH), ethambutol (EMB), rifamycins (rifampicin or rifabutin) (RIF), streptomycin (STR), pyrazinamide (PZA), fluoroquinolones (FLQ) (includes moxifloxacin, ciprofloxacin, ofloxacin), second-line injectables (SLIs) (includes kanamycin, amikacin, capreomycin), ethionamide/prothionamide (ETH), and cycloserine (CYS). Para-aminosalicylic acid was not analysed due to the paucity of data. Isolates not tested for susceptibility to both INH and RIF were excluded from the assessment of DR frequency by country and lineage. Isolates resistant to both INH and RIF were labelled MDR. Those resistant to INH, RIF, FLQ and SLIs were labelled XDR.
Estimating resistance acquisition dates
Isolates were separated into 179 groups corresponding to a single drug, lineage and source country, referred to hereafter as a ‘group’. Genetic diversity was computed as the average pairwise genetic distance within a group. To accurately date resistance acquisition, a drug-geography-lineage group was analysed only if it consisted of at least 10 isolates, ≥20% of isolates were susceptible and ≥1 isolate was resistant. To exclude isolates that only represent outbreak settings and didn’t carry more long-term information about resistance, we excluded groups with a genetic diversity score <1 standard deviation from the mean genetic diversity score measured across all groups. Supplementary methods detail the phylogeny construction and the estimation of the age of the most recent susceptible common ancestor (MRSCA) in years prior to isolation of the clinical sample(s).
Distribution of Resistance Mutations
We compared the expected sensitivity and specificity of mutations captured by commercial diagnostics (summarized from the literature in Table S9) and those based on more extensive lists of mutations in DR genes that can be captured using targeted or whole genome sequencing. We used three mutations lists for the latter (1) a set of 267 common resistance-associated mutations that we previously determined using randomForests28 designated “RF-select WGS test”, (2) a set of mutations determined using direct association29 designated “DA-select WGS test”, and (3) any non-synonymous mutation or noncoding mutation in known DR regions (Table S10) in a “all WGS test”. We excluded previously described neutral/lineage associated mutations9.
Code
All code used in the analysis is publically accessible at https://github.com/farhat-lab/geo_dist_tb.
Results
Data and global lineage distribution
Of the 10,299 MTB clinical isolates with WGS and culture-based DST data available, 9,385 passed sequence quality criteria and of these 8,550 had country of origin data (Figure 1). The four major MTB lineages, 1-4, were well represented. A relatively high proportion, 42%, of United Kingdom (UK) isolates (n= 1873) belonged to Lineages-1 & 3 (Figure 2A). Overall, the non-Europe-America-Africa Lineages-1,3, and 2, comprised 40% of European isolates (n=3956) and 7% of North and South American isolates (n=1297).
Phenotypic resistance distribution
Of the 8,550 isolates, 568 isolates lacked either INH or RIF DST data. Out of the remaining 7,909 isolates, 5,022 were pan-susceptible, 2887 were resistant to one or more drugs (DR) and of these 1937 were resistant to INH and RIF (MDR) and 288 were MDR and resistant to an SLI and a FLQ (XDR). The 8,550 isolates originated from 52 countries. Of these, 23 countries were represented by >10 isolates with resistance data, 21/23 were found to have MDR isolates and 9/21 had XDR (Figure 2B). We compared the MDR frequency in our WGS based sample with the WHO reported MDR/RIF resistance (RR) rates for the latest year available30. Out of the 21 countries, the confidence interval for the MDR-TB proportion in our sample overlapped with that of the WHO in 4 (19%) countries, was higher in 14 (67%) and lower in 3 (14%) of countries (Table S4). MDR rates by lineage were 3% for Lineage-1 (n=439), 48% for Lineage-2 (n=1085), 4% for Lineage-3 (n=760) and 23% for Lineage-4 (n=3358).
Molecular dating of Resistance Acquisition
Of the 8,550 isolates, 2,451 isolates appeared in groups that did not meet our dating requirements (Methods). The remaining 6,009, included 1,547 isolates resistant to one or more drugs and were grouped into 179 country/lineage/drug combinations. We estimated 4,869 MRSCA dates for 10 drugs across these 179 groups. The number of independent resistance acquisition events i.e. unique MRSCA dates, was consistently lower than the total number of dated resistance isolates suggesting ongoing transmission of drug resistant isolates (Table S11). We estimated a lower bound on the burden of resistance due to transmission ranging by country from ≥14% to ≥52% pooled across drugs (Methods, Table S11). The proportion of INH or RIF resistance attributed to transmission was the highest among the 10 drugs at ≥43% and ≥46% respectively pooled across countries (pooled from Table S11).
We examined the relative order of phenotypic resistance acquisition on a global scale. For INH, we found that resistance to INH on average developed before resistance to other drugs (Figure 3A-B). Median MRSCA for INH was 11.4 years prior to isolation (IQR 6.3-16.2) vs. 7.6 years (IQR 3.0 – 16.0) for RIF, Wilcoxon P-value <10-14. Median MRSCA ages for RIF and STR resistance (7.6 years, IQR 3.0 – 16.0 and 7.7 years, IQR 3.4 – 13.0 respectively) were second oldest and not statistically significant from each other (Wilcoxon P-value 0.31). The dating supported that EMB resistance followed the acquisition of RIF (Wilcoxon P-value < 10-6) at a median MRSCA age of 5.0 years prior (IQR 2.1 – 12.5), and that this was followed by resistance to PZA, ETH, FLQ, SLIs, or CYS (Figure 3A), amongst which MRSCA ages did not significantly differ (Figure 3B). We found no significant correlation between the median MRSCA dates and the drug’s date of introduction into clinical use with R2= 0.04 (F-test with 1 DF P-value= 0.60, Table S12).
We assessed the frequency of recent resistance amplification to PZA, EMB, FLQs and SLIs among MDR, i.e. to pre-XDR/XDR, within five years of sample isolation. Among the 11 countries with both MDR and pre-XDR/XDR isolates, we identified four countries (Peru, Russia, Sierra Leone, South Africa) with recent resistance amplification to PZA and EMB (>1% of MDR). The rates of recent amplification ranged from 2% (95% CI 1% - 4%) for PZA in Russia to 33% (95% CI 26% - 41%) for EMB in South Africa (Figure 5). Peru, Romania and South Africa were also measured to have recent resistance amplification to FLQs and SLIs (Figure 5). The median MRSCA age for FLQ or SLI resistance acquisition among MDR isolates was 4.7 years (IQR 1.9-9.8) prior to sample collection.
We found RIF to have the highest proportion of old resistance (MRSCA >20 years prior to isolation) at 17%, 197/1184 out of the total dated RIF resistance acquisition events. Old resistance was well distributed geographically and for RIF occurred in 9 of 12 countries with available dating data (Figure 4). Old FLQ resistance constituted 8% (24/311) of total dated isolates and spanned 6 of the 7 countries with available data.
We compared the geographic distribution of MRSCA ages restricting to four key drug classes, namely INH, RIF, SLIs and FLQs, and the five countries with the largest number of resistant isolates (Figure 4). MRSCA ages did not differ between the UK and China across all four drug classes. These two countries had the oldest median MRSCA across the five countries and four drug classes except for INH. MRSCA ages in the UK were a median of 13.0 years (IQR 10.7-19.0) for RIF, a median of 10.6 years (IQR 7.3-11.2) for SLIs and a median of 8.4 years (IQR 6.7-18.4) for FLQs (Figure 4). South Africa most consistently had the youngest median MRSCA for the four drug classes, but its MRSCA distribution was not significantly different from that of Peru (for FLQs and SLIs) and Russia (for SLIs) (Table S3). A similar geographic/age pattern was observed for the drugs PZA and EMB across these five countries (Table S3).
We examined if the geographic resistance age differences correlated with resources available for TB control programs using the gross domestic product per capita as a proxy. We found GDP to correlate significantly with an older RIF MRSCA date with R2= 0.47 (F-test with 1 DF P-value= 0.014) (Figure 6 & Table S8).
Distribution of Resistance Mutations
We assessed the frequency of 267 resistance mutations previously determined to be important for resistance prediction28 and their geographic distribution among the 8,550 isolates with country of origin and WGS data meeting quality criteria (Figure 1). Resistance mutation prevalence varied significantly by country. The most frequent INH causing mutation31, katG S315T was more frequent among phenotypically INH resistant isolates by DST (pheno-R) in Russia (84%, n=526) than in Peru (67%, n=760) (Fisher P-value 1×10-12). The second most common INH resistance mutation −15 C>T fabG1/inhA promoter was more prevalent among INH pheno-R Peruvian isolates (20%) than in Russian isolates (8%) (Fisher P-value 7×10-9). Twenty four of the 267 resistance mutations (9%) varied geographically to a larger extent than the mutation fabG/inhA promoter −15C>T (standard deviation 11%, frequency range 0-39%, Table S13). The mutation I491F was recently described to be common in Estawini32 and is not detectable by line-probe or GeneXpert commercial molecular diagnostics. In our sample that did not contain data from Estawini, we calculated a standard deviation of 1% for the global frequency of I491F (range 0% - 4%) among RIF pheno-R isolates.
We calculated the proportion of pheno-R isolates that can be captured by the Hain Line-probe or GeneXpert commercial molecular diagnostics due to the presence of one or more mutations in their pooled target regions for the drugs INH, RIF, SLIs, and FLQ (Tables 1 and S14). Sensitivity was highest for RIF (90% of 2624) and lowest for FLQs (51% of 854). Specificity was consistently high (lowest for SLIs at 86%) (Table 1). Second-line sensitivity of commercial diagnostics differed significantly across countries (Table S14). FLQ sensitivity in Peru was 38% (n= 121) and 77% in South Africa (n=111) (Fisher P-value 1×10-6). A similarly low sensitivity for SLI resistance was seen in Peru compared with South Africa (Fisher P-value 3×10-4) (Table S14).
We examined if expanding the resistance mutation list to variants previously characterized in diverse global MTB genomic datasets using direct association29 or random forests28 can improve sensitivity and specificity in a “select WGS test.” The select WGS test improved sensitivity slightly for INH and SLIs with relatively preserved specificity (Table 1). In addition select WGS test allowed for prediction of resistance to other drugs not tested by commercial diagnostics: PZA, EMB and Streptomycin. For comparison, we assessed if including any non-silent variant in the resistance regions (excluding a select number of known lineage markers) was indeed inferior to the more informed ‘select WGS test” reported previously. We found that this “all WGS test” only modestly improved sensitivity and at the expense of a larger decrease in specificity (Figure 7).
Discussion
Using 8,550 clinical MTB sequences with culture-based DST, we examined geographic trends in the DR-TB epidemic. Geographically, MTB lineages 1-4 were each represented in the continents of Europe, Asia and Africa providing evidence of disease spread across borders, likely driven by human migration30. We found MDR-TB in nearly every country represented by more than 10 isolates. XDR isolates were found in half of these countries and spanning all five major continents. Lineage-2 had the highest percentage of MDR isolates in our sample followed by Lineages-4, 3 and 1. Using this diverse sample we dated more than 4869 resistance phenotypes across 4 lineages and 15 countries.
We found a consistent order of resistance acquisition globally among drug classes. The development of INH resistance was previously found to be a sentinel event heralding the development of MDR33. Our results corroborate these findings using phenotypic resistance data and across a larger geographically diverse sample. After INH, we find that MTB acquires resistance to RIF/STR then EMB followed by PZA, ETH, FLQ, SLIs, or CYS. We found no correlation between the age of resistance acquisition and the year of clinical introduction of the drug but there may be multiple other causes for the observed order of resistance acquisition. Differences in mutation rates across drug targets or resistance genes have been postulated but shown to be an unlikely explanation for INH resistance arising first34, 35. Pharmacokinetic difference may result in higher risk for under-dosing36 for some drugs and earlier resistance acquisition. Bacterial fitness costs are also variable across resistance mutations. For INH resistance, mutations like katG S315T carry a low fitness cost and likely contribute to resistance arising earliest for this drug33, 37, 38. The order of drug administration can explain dating differences between first-line (INH, RIF, EMB, PZA) and second (ETH, FLQ, SLIs) or third-line (CYS) resistance, as second-line drugs are usually only administered after resistance to first-line drugs is ascertained. Acquisition of resistance to INH first then RIF may also relate to their use for treatment of latent TB infection, leading to more exposure and selection pressure overall. However, because adoption of INH preventative therapy for latent TB remains low in many parts of the world, we expect it to be a lesser contributor to INH and RIF resistance rates39. Lastly, the observation of contemporaneous acquisition of RIF and STR resistance is likely best explained by the effects of Category II TB treatment initially recommended in 199140. Category II is no longer recommended by the WHO but consists of adding streptomycin to the first-line drug regimen after treatment failure. Our dating supports that streptomycin resistance amplified among patients failing due to recent RIF resistance and/or MDR acquisition.
Published evidence supports that most resistant cases of MTB result from recent resistance acquisition in the host or are related to transmission41. Reactivation of resistant MTB disease acquired remotely (>2 years prior) is much less likely42. Thus the identification of isolates with old resistance suggests high fitness for continued transmission between human hosts. Most countries with available data had isolates with resistance dated more than 20 years prior. This is also supported by our phylogenetic assessment where we estimate a lower bound of TB resistance due to transmission to range between 14-52% across countries with available data. As our approach cannot distinguish between resistance importation through human migration after transmission outside of the country and new resistance acquisition, these figures are underestimates of the true resistance burden due to transmission. Mathematical models of TB rates have previously predicted transmission to be a major driver of observed resistance rates43, we present here WGS based evidence of the high burden of resistance transmission. Mathematical models have also emphasized that drug resistant strain fitness is a key parameter that dictates how the resistance epidemic will unfold. Our results support that >14-52% of isolates are fit and successfully transmitting patient-to-patient and in most countries there have been uninterrupted chains of resistance transmission for >20 years. These data emphasize the need to contain resistance transmission through improved diagnosis, treatment and other preventative strategies such as infection control and vaccine development.
In addition to transmission, we find evidence for recent resistance amplification, especially to second-line drugs mediating the transition from MDR to XDR-TB. XDR has considerably worse treatment outcome than susceptible TB and incurs more than 25 times the cost44. We estimate that half of FLQ and SLI resistant isolates had acquired resistance within 4.7 years of isolation despite the promotion of directly-observed-therapy (DOT) by the WHO since 1994. As most FLQ and SLI resistant isolates are also MDR, our results also emphasize the need for better regimens to treat MDR that can prevent resistance amplification. By country, we found a significant correlation between the estimated age of resistance acquisition and per capita GDP, with more affluent countries having older ages of resistance. This unexpected correlation is likely driven by a combination of factors but the routine use of DST and close patient monitoring in the more well-resourced health systems are likely important contributors. Specifically, we found the UK and China to have the oldest resistance ages across the drugs. The Chinese national TB program budget was increased from $98 million in 2002 to $272 million in 200745 and a new policy for free TB diagnostics tests and drug use was introduced in 200458, 46. This increased investment can explain the observed low rates of recent resistance acquisition in China30, 47.
Likely due to geographic differences in MTB lineage, transmission and resistance acquisition rates, we find 10% of assayed resistance mutations to have high geographic variance. We also found commercial diagnostics to vary in sensitivity for second-line drugs. Given recent reports about the accuracy of WGS for confirming susceptibility of MTB29, we measured improvements in resistance sensitivity offered by including mutations outside of regions targeted by commercial diagnostics through direct association. This offered modest improvements in sensitivity with little to no change in specificity. We found a considerable number of indeterminate mutations in resistance regions, that when included with simple direct association improve sensitivity but at the expense of loss of specificity. The study of these variants through statistical models will likely further inform their diagnostic use in the future28, 48.
Our study has several limitations including the oversampling of DR isolates as evidenced by our comparison with WHO reported MDR rates. We tried to control for this by dating only in countries with at least 20% susceptible isolates and limiting dating of low diversity samples that represent unique outbreak settings and lack long term information about resistance acquisition. This may have resulted in underestimation of rates of recent resistance acquisition but despite this we were able to document recent resistance acquisition in many countries. Molecular dating is also reliant on the accurate estimation of the phylogenetic tree of MTB isolates and the molecular clock assumption. We thus used a rigorous approach to phylogenetic estimation and dating despite its computational and time cost49. Our analysis also assumes the accuracy of culture based phenotypic DST, even though test to test variability is known to exist. We justify this as our data was curated from ReseqTB15 and studies were phenotypic testing was performed in national or supranational laboratories with rigorous quality control.
Overall, our results support that DR rates are fuelled by both recent resistance acquisition and ongoing transmission, and suggest the need for better detection, treatment and health system investment. In the future, reassessment of these patterns will be enabled by the sharing of systematically collected isolate data, data that is increasingly generated as by-products of TB surveillance and resistance diagnosis29.
Authors’ contributions
Yasha Ektefaie conducted the data analysis, drafted and revised the manuscript.
All authors provided key edits to the manuscript.
Additionally:
Luca Freschi contributed to the data analysis.
Avika Dixit contributed to the data analysis.
Maha Farhat conceptualized the study, supervised the data analysis, reviewed, wrote and edited the manuscript.
Declaration of interests
Yasha Ektefaie: None
Luca Freschi: None
Avika Dixit: None
Maha Farhat: None
Supplementary Methods
Data curation and quality control
The isolate metadata including their geographic locations were downloaded using metatools_ncbi (https://github.com/farhat-lab/metatools_ncbi). We also performed literature curation to fill the gaps in the NCBI geographic location data. The resulting table of the geographic locations of the isolates is available in Supplementary File 1. We excluded isolates that did not meet WGS quality control criteria as detailed below, had no geographic information or were not tested for phenotypic resistance to one or more drugs.
Genomic analysis/variant calling
Briefly, reads were trimmed using PRINSEQ1 setting average phred score threshold to 20. Raw read data was confirmed to belong to MTB complex using Kraken2. Isolates with <90% mapping were excluded. Reads were aligned to H37Rv (GenBank NC000962.3) reference genome using BWA MEM3. Duplicate reads were removed using PICARD4. We excluded any isolates with coverage <95% of known drug resistance regions (katG, inhA & its promoter, rpoB, embA, embB, embC & embB promoter, ethA, gyrA and gyrB, rrs, rpsL, gid, pncA, rpsA, eis promoter) at 10x or higher. Variants were called using Pilon5 that uses local assembly to increase indel (insertions and deletions) call accuracy. This deviates from Ezewudo et al. that uses Samtools for variant calling6. The reference allele was implied if allele frequency was <75% or the Pilon filter was not PASS. Low confidence coordinates were filtered from all strains if >95% of strains did not have coverage of (trimmed reads) at least 10x at that site. Isolate lineage belonging to one of the seven main MTB lineages was confirmed using the Coll et al. SNP barcode7.
Drug resistance definitions and comparison with WHO reported resistance proportions
The ‘Mono’ resistant designation was given to isolates that were resistant to only one specific drug and susceptible to all others that were tested, with the exception of the INH-mono resistant category that encompassed any isolates that were resistant to INH and/or STR but not to others that were tested. The ‘Other-R’ category was reserved for isolates that were resistant to some drugs but were neither INH or STR mono resistant, nor MDR or XDR. Isolates were labelled susceptible if they were susceptible to all drugs tested.
To compute exact binomial confidence intervals for MDR proportions by country we used the python library statsmodel8. To assess overlap with World Health Organization (WHO) estimates we determined whether the confidence intervals of our proportion intersected with that of the WHO. We labelled our estimate as high if our confidence interval was higher than the WHO, low if it was lower, and the same if they intersected.
Estimating resistance acquisition dates and lower bounds of resistance transmission
A maximum likelihood tree was generated for each group via RAxML 8.2.119 with H37Rv (NC000962.3) as the outgroup, starting from a neighbour-joining seed tree and assuming a generalized time reversible (GTR) nucleotide substitution model with the Γ distribution used to model site rate heterogeneity10. We bootstrapped the maximum likelihood tree 1000 times. The maximum likelihood tree was dated using BEAST v1.10.411 assuming a relaxed molecular clock with a log normal distribution and a mean rate of 0.5 SNP per genome per year based on prior published data12. Sumtrees.py from the DendroPy library13 was then used to combine the output from the bootstrap analysis and that of BEAST to get our final dated phylogenetic tree with nodal bootstrap support.
We dated the most recent common ancestor between all the resistant isolates and their most closely related susceptible isolate. Accordingly, the dates of resistance acquisition will be referred to as the estimated age of the most recent susceptible common ancestor (MRSCA) in years prior to isolation of the clinical sample(s) throughout the text. We excluded resistant isolates with MRSCAs inferred at nodes with less than 50 bootstrap support.
We calculated the number of phylogenetically inferred resistance acquisition events (Naq) per country and lineage as the number of unique MRSCAs identified. This was compared with the total resistant isolates that could be dated (Ntd). Phylogenetically inferred unique resistance acquisition events for a particular country may be related to either in host evolution of new resistance or due to human migration/importation from another country with the latter still possibly related to transmission elsewhere. Thus, the following quantity represents a lower bound on the burden of resistance due to transmission for a particular country:
To estimate the order of resistance acquisition for different drugs we pooled the MRSCA dates by drug across countries and lineages. We compared the medians of the MRSCA distributions and performed pairwise Wilcoxon Rank Sum tests to assess for statistical significance, correcting for multiple testing using the Bonferroni approach. Interquartile ranges were calculated using the python package numpy14.
We correlated the median MRSCA date per drug pooled across countries against the date of drug introduction using linear regression as implemented in Microsoft Excel version 16.25.
Distribution of resistance mutations
We measured resistance mutations’ geographic variance by calculating the proportion of resistant isolates with the resistance mutation per country, excluding countries with less than 10 resistant isolates for a particular drug. We computed the standard deviation of this distribution of proportions across countries. To test cross-country differences in the proportion of INH phenotypically resistant isolates that contained specific mutations, we used a Fisher-test using the python library scipy15.
GDP/Programmatic Spending
To correlate countries’ gross domestic product (GDP) per capita against resistance acquisition dates, GDP per capita data for 2019 was gathered from the International Monetary Fund (IMF)16 and plotted against the median MRSCA date for RIF using Microsoft Excel version 16.25. F-value and significance was calculated via Anova17 in Excel.