Abstract
The global spread of antimicrobial resistance has been well documented in Gram-negative bacteria and healthcare-associated epidemic pathogens, often emerging from regions with heavy antimicrobial use. However, the degree to which similar processes occur with Gram-positive bacteria in the community setting is less well understood. Here we demonstrate the recent origin and global spread from the Indian subcontinent of a multidrug resistant Staphylococcus aureus lineage, sequence type 772 (Bengal Bay clone). Short-term outbreaks occurred following intercontinental transmission, typically associated with travel and family contacts, but ongoing endemic transmission was uncommon. Instrumental in the emergence of a single dominant clade in the early 1990s was the acquisition of a multidrug resistance integrated plasmid that did not appear to incur a significant fitness cost. The Bengal Bay clone therefore combines the multidrug resistance of traditional healthcare-associated clones with the epidemiological and virulence potential of community-associated clones.
Introduction
Methicillin-resistant Staphylococcus aureus (MRSA) is a major human pathogen with a propensity to acquire antibiotic resistance, which complicates treatment and allows persistence in environments under antibiotic selection pressure. Multidrug resistance has traditionally been associated with healthcare-associated strains, and the emergence of multidrug resistant MRSA capable of surviving in community reservoirs would pose a significant challenge to infection control and public health1. Although previous studies have highlighted localized or sporadic emergence of multidrug resistant strains in the community setting2–7, there has so far been limited evidence for resistance acquisition driving the emergence and global dissemination of an epidemic, community-associated lineage of MRSA (as opposed to healthcare-associated lineages). Given the heavy burden and costs associated with MRSA infections8,9, there is an urgent need to elucidate the patterns and drivers of the spread of novel virulent and multidrug-resistant MRSA.
In 2004, a novel S. aureus clone designated sequence type (ST) 772 was isolated from hospitals in Bangladesh10 and from a community setting in India11. The clone continued to be reported in community- and healthcare-associated environments in India, where it has become one of the dominant epidemic lineages of community-associated MRSA12. Like other S. aureus, ST772 primarily causes skin and soft tissue infections, but more severe manifestations such as bacteraemia and necrotising pneumonia have been observed. Its potential for infiltration into nosocomial environments13–16 and its resistance to multiple classes of commonly used antibiotics (aminoglycosides, β-lactams, fluoroquinolones, macrolides and trimethoprim)16–18 have made ST772 a serious public health concern in South Asia and elsewhere. Over the last decade, the clone has been isolated from community and hospital environments in Asia, Australasia, Africa, the Middle East and Europe (Supplementary Map 1, Supplementary Table 1). Reflecting its discovery, distribution and epidemiology, the lineage has been informally dubbed the “Bengal Bay clone”19. Despite clinical and epidemiological hints of a recent and widespread dissemination of ST772, a unified perspective on the global evolutionary history and emergence of the clone is lacking.
Here, we analyse whole genome sequences from a globally representative collection of 340 ST772 strains to elucidate the key events associated with the emergence and global spread of a multidrug resistant community-associated MRSA clone. We found that international travel and family connections to the Indian subcontinent were closely linked with the global spread of ST772. The integration of a multidrug resistance plasmid led to the emergence of a dominant clade (ST772-A) in the early 1990s with phenotypic assays suggesting that the mobile element has not incurred a significant fitness cost to this clade.
Results
We generated whole genome sequence data of 354 S. aureus ST772 isolates collected across Australasia, South Asia, Hong Kong, the Middle East and Europe between 2004 and 2013 (Supplementary Map 2, Supplementary Table 2). Fourteen isolates were excluded after initial quality control due to contamination (Supplementary Tables 2, 3). The remaining isolates mapped with 165x average coverage against the PacBio reference genome DAR4145 (ref. 18) from Mumbai (Supplementary Tables 2, 3). Phylogenetic analysis using core-genome SNPs (n = 7,063) revealed little geographic structure within the lineage (Figure 1a). Eleven ST772 methicillin-susceptible S. aureus (MSSA) and MRSA strains were basal to a single globally distributed clade (ST772-A, n = 329) that harbored an integrated resistance plasmid (IRP) described in the reference genome DAR4145 (ref. 18; Figures 1a, 1b). Population network analysis distinguished three subgroups within ST772-A (Figures 1a, 1c): an early-branching subgroup harboring multiple subtypes of the staphylococcal cassette chromosome mec (SCCmec) (A1, n = 81), a dominant subgroup (A2, n = 153) and an emerging subgroup (A3, n = 56) that exclusively harbors a short variant of SCCmec-V.
Emergence and global spread from the Indian subcontinent
Epidemiological and genomic characteristics of ST772 were consistent with an evolutionary origin on the Indian subcontinent. Sixty percent of isolates in this study were collected from patients with a family or travel background in Bangladesh, India, Nepal or Pakistan, while the background was unknown for 19% and linked to other countries for 21% (Figure 2a, Supplementary Table 2). We found significantly more isolates from India and Bangladesh among the basal strains than in clade ST772-A (Fisher’s exact test, 5/11 vs. 47/291, p = 0.026). In particular, three isolates from India and Bangladesh were basal in the (outgroup-rooted) maximum-likelihood phylogeny (Figure 1b, Supplementary Figure 1), including two MSSA samples from the original isolations in 2004 (RG28, NKD22). Isolates recovered from South Asia were genetically more diverse than isolates from Australasia and Europe, supporting an origin on the Indian subcontinent (Figure 2b, Supplementary Figure 2).
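The comparisons of proportions in this section rely on two-sided Fisher’s exact tests on 2 × 2 tables. As a minimal, dependency-free sketch (our own illustration, not the code used for this study), the test can be written in Python by summing the hypergeometric probabilities of all tables that share the observed margins and are no more probable than the observed table; the function name and the small tie tolerance are our assumptions.

```python
from math import comb

def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test for the 2 x 2 table [[a, b], [c, d]].

    Sums the hypergeometric probabilities of every table with the same row and
    column totals that is no more probable than the observed table.
    """
    row1, col1, n = a + b, a + c, a + b + c + d
    total = comb(n, col1)

    def prob(x):
        # probability that the top-left cell equals x, given fixed margins
        return comb(row1, x) * comb(n - row1, col1 - x) / total

    lo, hi = max(0, col1 - (n - row1)), min(row1, col1)
    p_obs = prob(a)
    # a small relative tolerance keeps tables tied with the observed one
    p = sum(q for q in map(prob, range(lo, hi + 1)) if q <= p_obs * (1 + 1e-7))
    return min(1.0, p)

# Rows: basal strains, ST772-A; columns: with trait, without trait.
p_origin = fisher_exact_two_sided(5, 11 - 5, 47, 291 - 47)  # India/Bangladesh
p_mssa = fisher_exact_two_sided(4, 11 - 4, 31, 291 - 31)    # MSSA
print(f"India/Bangladesh: p = {p_origin:.4f}")
print(f"MSSA:             p = {p_mssa:.4f}")
```

With the counts reported in this section, the sketch should give p-values close to the reported 0.026 and 0.028; in practice, `scipy.stats.fisher_exact` or R’s `fisher.test` implement the same two-sided definition.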
Consistent with a methicillin-susceptible progenitor, a significantly higher proportion of MSSA was found in the basal isolates (Fisher’s exact test, 4/11 vs. 31/291, p = 0.028) and MSSA isolates demonstrated a lower patristic distance to the root of the maximum likelihood phylogeny compared to MRSA (Supplementary Figure 3a). Although it appears that MSSA is proportionately more common in South Asia (Supplementary Figure 3b), it is also possible that the observed distribution may be related to non-structured sampling.
Phylogenetic dating suggests an initial divergence of the ancestral ST772 population in 1970 (age of root node: 1970.02, CI: 1955.43 – 1982.60) with a core-genome substitution rate of 1.61 × 10⁻⁶ substitutions/site/year after removing recombination (Figure 2c, 2d, Supplementary Figures 4, 5). This was followed by the emergence of the dominant clade ST772-A and its population subgroups in the early 1990s (ST772-A divergence, 1990.83, 95% CI: 1980.38 – 1995.08). The geographic pattern of dissemination is heterogeneous (Figure 1a). There was no evidence for widespread endemic dissemination of the clone following intercontinental transmission, although localised healthcare-associated outbreak clusters occurred in neonatal intensive care units in Ireland (NICU-1 and NICU-2, Figure 1a, Supplementary Figure 6)20 and have been reported from other countries in Europe16 and South Asia13–15. While some localised spread in the community was observed among our isolates, patients in local transmission clusters often had traveled to or had family in South Asia (19/27 clusters, Supplementary Figure 6).
Antibiotic resistance acquisition is associated with emergence and dissemination
We examined the distribution of virulence factors, antibiotic resistance determinants and mutations in coding regions to identify the genomic drivers in the emergence and dissemination of ST772. Nearly all isolates (336/340) carried the Panton-Valentine leucocidin (PVL) genes lukS/F, most isolates (326/340) carried the associated enterotoxin A gene (sea) and all isolates carried scn (Supplementary Table 5). This indicates nearly universal carriage, across all clades, of both a truncated hlb-converting prophage (the typically associated staphylokinase gene sak was present in only one isolate) and the PVL/sea prophage φ-IND772 (ref. 21). Amongst other virulence factors, the enterotoxin genes sec and sel, the gamma-hemolysin locus, egc cluster enterotoxins and the enterotoxin homologue ORF CM14 were ubiquitous in ST772 (Supplementary Table 7). We detected no statistically significant difference between core virulence factors present in the basal group and ST772-A (Supplementary Table 5, Supplementary Figure 7).
We noted a pattern of increasing antimicrobial resistance as successive clades of ST772 emerged. Predicted resistance phenotypes across ST772 were common for ciprofloxacin (97.4%), erythromycin (96.2%), gentamicin (87.7%), methicillin (89.7%), penicillin (100%) and trimethoprim (98.8%), with a corresponding resistome composed of acquired and chromosomally encoded genes and mutations (Figure 3a, Figure 3b, Supplementary Table 6). There was significantly less predicted resistance in the basal strains compared to ST772-A, including overall multidrug-resistance (≥ 3 classes, 8/11 vs. 291/291, Fisher’s exact test, p < 0.001) (Figure 3d). The key resistance determinants of interest were the SCCmec variants, an integrated resistance plasmid, and other smaller mobile elements and point mutations.
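Because the multidrug resistance criterion above counts antimicrobial classes rather than individual drugs, methicillin and penicillin together contribute a single β-lactam class. A minimal Python sketch of the criterion, assuming a hypothetical drug-to-class mapping limited to the six predicted phenotypes listed above:

```python
# Hypothetical mapping of the six predicted phenotypes to antimicrobial classes.
DRUG_CLASS = {
    "ciprofloxacin": "fluoroquinolones",
    "erythromycin": "macrolides",
    "gentamicin": "aminoglycosides",
    "methicillin": "beta-lactams",
    "penicillin": "beta-lactams",
    "trimethoprim": "folate pathway inhibitors",
}

def resistant_classes(predicted_resistant):
    """Distinct antimicrobial classes covered by the predicted resistant drugs."""
    return {DRUG_CLASS[drug] for drug in predicted_resistant if drug in DRUG_CLASS}

def is_multidrug_resistant(predicted_resistant, min_classes=3):
    """Multidrug resistance = predicted resistance to >= min_classes classes."""
    return len(resistant_classes(predicted_resistant)) >= min_classes

print(is_multidrug_resistant({"penicillin"}))                                   # one class
print(is_multidrug_resistant({"penicillin", "methicillin", "trimethoprim"}))    # two classes
print(is_multidrug_resistant({"penicillin", "ciprofloxacin", "trimethoprim"}))  # three classes
```

Under this mapping, a strain predicted resistant only to penicillin, like the earliest basal isolates that carried blaZ alone, is not classified as multidrug resistant.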
MRSA isolates predominantly harbored one of two subtypes of SCCmec-V: a short variant (5C2) or a composite cassette (5C2&5), which encodes a type 5 ccr complex containing ccrC1 (allele 8) between the mec gene complex and orfX (ref. 22; Supplementary Figure 8). Integration of the Tn4001 transposon encoding the aminoglycoside resistance gene aacA-aphD occurred across isolates with different SCCmec types (260/267), but not in MSSA (0/35). All MRSA isolates (n = 7) within the basal group carried the larger composite cassette SCCmec-V (5C2&5); two of these strains lacked ccrC, and one isolate carried a remnant of SCCmec-IV (Figure 1a).
The diversity of SCCmec types decreased as ST772-A diverged into subgroups (Figure 1a, c, Supplementary Table 6). ST772-A1 included MSSA (n = 30) as well as SCCmec-V (5C2) (n = 22) and (5C2&5) (n = 18) strains. Four isolates harbored a putative composite SCC element that included SCCmec-V (5C2), as well as pls and the kdp operon previously known from SCCmec-II. One isolate harbored a composite SCCmec-V (5C2&5) with copper and zinc resistance elements known from European livestock-associated CC398-MRSA (ref. 23). Another six isolates yielded irregular and/or composite SCC elements (Supplementary Table 6). In contrast, the dominant subgroups ST772-A2 and -A3 exclusively carried the short SCCmec-V (5C2) element. Eleven of these isolates (including all isolates in NICU-2) lacked ccrC and two isolates carried additional recombinase genes (ccrA/B2 and ccrA2).
ST772-A was characterized by the acquisition of an integrated multidrug resistance plasmid (IRP, Figure 3c) encoding the macrolide resistance locus msrA/mphC, as well as determinants of resistance to β-lactams (blaZ), aminoglycosides (aadE-sat4-aphA3) and bacitracin (bcrAB). Thus, predicted resistance to erythromycin was found only in ST772-A and in none of the basal strains (Fisher’s exact test, 289/291 vs 0/11, p < 0.001, Figure 3d). The mosaic IRP element was highly similar to a composite extrachromosomal plasmid in ST8 (USA300)24 and to an SCCmec integration in the J2 region of the ST80 reference genome (ref. 25; Figure 3c, Supplementary File 1). A search of closed S. aureus genomes (n = 274) showed that the element is rare and predominantly plasmid-associated in ST8 genomes (6/274), with one chromosomal integration in the ST772 reference genome and the SCCmec integration in the ST80 reference genome (Supplementary Table 9).
Three basal strains were not multidrug resistant, including two isolates from the original collections in India (RG28) and Bangladesh (NKD122) (Figure 1a, 3a). These two strains lacked the trimethoprim resistance determinant dfrG and the fluoroquinolone resistance mutations in grlA or gyrA, carrying only the penicillin resistance determinant blaZ on a Tn554-like transposon. However, seven of the strains more closely related to ST772-A did harbor mobile elements and mutations conferring trimethoprim (dfrG) and quinolone resistance (grlA and gyrA mutations). Interestingly, we observed a shift from the quinolone resistance grlA S80F mutation in basal strains and ST772-A1 to the grlA S80Y mutation in ST772-A2 and -A3 (Figure 3a).
Thus, the phylogenetic distribution of the key resistance elements suggests acquisition of the IRP by a PVL-positive MSSA strain in the early 1990s (ST772-A1 divergence, 1990.83, 95% CI: 1980.38 – 1995.08), followed by fixation of both the shorter variant of SCCmec-V (5C2) and the grlA S80Y mutation in a PVL- and IRP-positive MSSA ancestor in the late 1990s (ST772-A2 divergence, 1999.18, 95% CI: 1993.26 – 2001.56) (Figure 1a, Figure 2c).
Phenotypic comparison of basal strains and the derived ST772-A lineage
We found three other mutations of interest that were present exclusively in ST772-A strains (Supplementary Table 7). The first caused a non-synonymous change (L55P) in fbpA, encoding a fibrinogen-binding protein that mediates surface adhesion in S. aureus26. The second was a non-synonymous change (L67V) in the plc gene, encoding a phospholipase associated with survival in human blood cells and abscess environments in USA300 (ref. 27). The third was a non-synonymous mutation (S273G) in tet(38), encoding an efflux pump that promotes resistance to tetracyclines as well as survival in abscess environments and skin colonisation28. The functions of the genes harboring these canonical mutations suggest a possible modification of the clone’s ability to colonise and cause skin and soft tissue infections (SSTIs).
In light of these canonical SNPs, we selected five basal strains and 10 strains from ST772-A to screen for phenotypic differences that may contribute to the success of ST772-A. We assessed in vitro growth, biofilm formation, cellular toxicity and lipase activity (Figure 4, Supplementary Table 8). We found no statistically significant differences between the basal strains and ST772-A in these assays, apart from significantly lower lipase activity among ST772-A strains (Welch’s two-sided t-test, t = 3.4441, df = 6.0004, p = 0.0137, Figure 4e), which may be related to the canonical non-synonymous mutation in plc. However, it is increased rather than decreased lipase activity that has been associated with viability of S. aureus USA300 in human blood and neutrophils27. We found no difference in the median growth rate of ST772-A compared with the basal strains (Figure 4, Mann-Whitney, W = 27, p = 0.8537, Supplementary Table 8), although two ST772-A strains grew more slowly, suggesting possible strain-to-strain variability. Overall, acquisition of the resistance determinants on the IRP does not appear to have incurred a significant cost to the in vitro growth of ST772-A strains.
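The non-integer degrees of freedom reported for the lipase comparison follow from Welch’s unequal-variance t-test, which uses the Welch–Satterthwaite approximation; with five basal and ten ST772-A strains, the approximate df always lies between 4 and 13. The following Python sketch computes the Welch statistic and the Mann-Whitney U statistic (reported by R as W) on hypothetical placeholder values, not on data from this study; p-values would normally be obtained from `scipy.stats.ttest_ind(..., equal_var=False)` and `scipy.stats.mannwhitneyu`, or from R’s `t.test` and `wilcox.test`.

```python
from statistics import mean, variance

def welch_t(a, b):
    """Welch's t statistic and Welch-Satterthwaite degrees of freedom."""
    va, vb = variance(a) / len(a), variance(b) / len(b)  # squared standard errors
    t = (mean(a) - mean(b)) / (va + vb) ** 0.5
    df = (va + vb) ** 2 / (va ** 2 / (len(a) - 1) + vb ** 2 / (len(b) - 1))
    return t, df

def mann_whitney_u(a, b):
    """U statistic for sample a: pairs with a > b, ties counted as one half."""
    return sum((x > y) + 0.5 * (x == y) for x in a for y in b)

# Hypothetical placeholder measurements, NOT data from this study.
basal = [1.9, 2.4, 2.1, 2.6, 2.2]                            # five basal strains
st772a = [1.2, 1.5, 1.1, 1.6, 1.3, 1.4, 1.0, 1.7, 1.2, 1.5]  # ten ST772-A strains

t_stat, df_welch = welch_t(basal, st772a)
u_stat = mann_whitney_u(basal, st772a)
print(f"t = {t_stat:.3f}, df = {df_welch:.2f}, U = {u_stat}")
```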
Discussion
In this study, we used whole genome sequencing in combination with epidemiological and phenotypic data to investigate the drivers behind the emergence and spread of a multidrug resistant community-associated MRSA lineage from the Indian subcontinent. Our data suggest that the Bengal Bay clone has acquired the multidrug resistant phenotype of traditional healthcare-associated MRSA but retains the epidemiological and virulence potential of community-associated MRSA.
Within our dataset, the basal population of ST772 appears to have emerged from the Indian subcontinent in the 1960s to early 1970s. This basal population included strains from the original isolations of ST772 in Bangladesh and India in 2004. Recent studies have also detected ST772-MSSA and -MRSA in Nepal29 and ST772-MRSA in Pakistan30, but it is unclear whether the lineage was endemic in these countries prior to its emergence in India. Deeper genomic surveillance of ST772-MSSA and -MRSA in the region will be necessary to understand the local epidemiology and evolutionary history of the clone on the Indian subcontinent.
The establishment and expansion of a single dominant clade (ST772-A) occurred in the early 1990s and was associated with the acquisition of an integrated multidrug resistance mobile genetic element (MGE). The element is similar to a previously described extrachromosomal plasmid of USA300 (ref. 24) and a partially integrated element in the SCCmec of an ST80 reference genome25. While the element was found only once in the ST80 lineage31 and occurs predominantly on plasmids in closed ST8 (USA300) genomes, its distribution and contribution to the emergence of resistance in the ST8 lineage have so far not been addressed32,33. In contrast, the ubiquitous occurrence and retention of the element in ST772-A suggest that it was instrumental in the emergence of the dominant clade of the Bengal Bay clone. Importantly, our phenotypic assays show that acquisition of drug resistance on this element was not accompanied by a significant fitness cost to ST772-A. This raises the possibility that members of this clade can survive in environments where antibiotics are heavily used, such as hospitals or communities with poor antibiotic stewardship, while remaining at little disadvantage in environments with less antibiotic use, because their growth rate is comparable to that of non-resistant strains.
Furthermore, as ST772 diverged into its population subgroups in the 1990s, we observed replacement of the long composite SCCmec-V (5C2&5) element by the shorter SCCmec-V (5C2) and a shift in the fixed quinolone resistance mutation from grlA S80F to grlA S80Y. In light of earlier studies demonstrating a fitness advantage of a smaller SCCmec element34–36, fixation of the shorter SCCmec-V (5C2) may have contributed to the success of ST772. We speculate that these changes may have allowed the clone to retain its multidrug resistant phenotype without incurring a significant fitness cost. Further work is required to investigate the role of resistance dynamics in the evolution and fitness potential of ST772. While we assayed only a limited number of phenotypes, our data suggest that acquisition of antibiotic resistance was a key driver in the emergence and persistence of ST772-A.
Given the available epidemiological data, the phylogeographic heterogeneity and the clone’s limited success in establishing itself in regions outside its endemic range in South Asia (Figure 1a), there appears to be ongoing exportation of ST772 from the Indian subcontinent, associated with travel and family background in the region. This is supported by reports of MRSA importation by travelers, including direct observations of ST772 importation by returnees from India37. Our data suggest non-endemic spread within households and the community, including short-term outbreaks at two NICUs in Ireland. This pattern of limited endemic transmission is consistent with reports of small transmission clusters in hospitals and households during a comprehensive surveillance study of ST772 in Norway16. The rapid emergence, global exportation and patterns of local transmission, together with a relatively homogeneous genotype, emphasize the clone’s high transmissibility.
Overall, the pattern of spread mirrors other community-associated MRSA lineages such as USA300 (refs 38,39), ST80-MRSA (ref. 31) and ST596, in which clones emerge within a particular geographic region and are exported elsewhere but rarely become established and endemic outside their place of origin. In contrast, healthcare-associated MRSA clones such as CC22-MRSA-IV (EMRSA-15)40 and ST239-MRSA-III (refs 41,42) show much stronger phylogeographic structure, consistent with importation into a country followed by local dissemination through the healthcare system. The pattern of dissemination and the potential for survival of multidrug resistant clones outside healthcare settings, as demonstrated in this study, may have important implications for infection control of community-associated MRSA.
Considering the widespread use of antibiotics and associated poor antibiotic regulation, poor public health infrastructure, and high population density in parts of South Asia, the emergence and global dissemination of multidrug resistant bacterial clones (both Gram-positive and Gram-negative) is alarming, and perhaps not surprising. Here, we demonstrate that the acquisition of specific antimicrobial resistance determinants has been instrumental in the evolution and emergence of a multidrug resistant community-associated MRSA clone. Global initiatives and funding to monitor the occurrence of emerging clones and resistance mechanisms, and support for initiatives in antimicrobial stewardship at community, healthcare and agricultural levels are urgently needed.
Methods
Isolates
Isolates were obtained from Australia (21), Bangladesh (3), Denmark (70), England (103), Germany (16), Hong Kong (6), India (44), Ireland (28), Italy (2), Netherlands (4), New Zealand (17), Norway (3), Saudi Arabia (1), Scotland (29) and the United Arab Emirates (1) between 2004 and 2012 (Supplementary Table 2). The collection was supplemented with six previously published genome sequences from India22,43,44. Notable samples include the initial isolates from Bangladesh and India10,11, two hospital-associated (NICU) clusters from Ireland20 and longitudinal isolates from a single healthcare worker at a veterinary clinic sampled over two consecutive weeks (VET) 45. Geographic regions were designated as Australasia (Australia, New Zealand), East Asia (Hong Kong), South Asia (India, Bangladesh), Arabian Peninsula (Saudi Arabia, United Arab Emirates) and Europe (Denmark, England, Germany, Ireland, Italy, Netherlands, Norway and Scotland).
Clinical data and epidemiology
Anonymised patient data were obtained, where available, for the date of collection, clinical symptoms, geographic location, epidemiological connections based on family or travel history, and acquisition in nosocomial or community environments (Supplementary Table 2). Clinical symptoms were summarized as SSTI (abscesses, boils, ulcers, exudates, pus, ear and eye infections), urogenital infections (vaginal swabs, urine), bloodstream infections (bacteremia), respiratory infections (pneumonia, lung abscesses) or colonization (swabs from ear, nose, throat, perineum or environmental) (Supplementary Table 2, Supplementary Figure 9). Literature and sample maps (Supplementary Maps 1 and 2) were constructed with geonet, a wrapper for geographic projections with Leaflet in R (https://github.com/esteinig/geonet).
Where available, acquisition in community or healthcare environments was recorded in accordance with CDC guidelines. Under these guidelines, community-associated MRSA is classified as an infection in a person who has none of the following established risk factors for MRSA infection: isolation of MRSA more than 48 h after hospital admission; history of hospitalization, surgery, dialysis or residence in a long-term care facility within one year of the MRSA culture date; the presence of an indwelling catheter or a percutaneous device at the time of culture; or previous isolation of MRSA (refs 46,47; Supplementary Figure 9).
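The CDC definition above amounts to a simple rule: a case is community-associated only if none of the listed risk factors applies. A minimal Python sketch, in which the record type and its field names are hypothetical and the one-year healthcare exposures are collapsed into a single flag:

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class MrsaCase:
    """Hypothetical record of the CDC risk factors for one MRSA culture."""
    hours_after_admission: Optional[float]  # None if not cultured during a hospital stay
    healthcare_exposure_past_year: bool     # hospitalization, surgery, dialysis or long-term care
    indwelling_device_at_culture: bool      # indwelling catheter or percutaneous device
    previous_mrsa_isolation: bool

def is_community_associated(case: MrsaCase) -> bool:
    """True only when none of the established MRSA risk factors is present."""
    hospital_onset = (case.hours_after_admission is not None
                      and case.hours_after_admission > 48)
    return not (hospital_onset
                or case.healthcare_exposure_past_year
                or case.indwelling_device_at_culture
                or case.previous_mrsa_isolation)

print(is_community_associated(MrsaCase(None, False, False, False)))  # community
print(is_community_associated(MrsaCase(72.0, False, False, False)))  # > 48 h after admission
```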
A valid epidemiological link to South Asia was declared if either travel or family background could be reliably traced to Bangladesh, India, Nepal or Pakistan. If both categories (travel and family) were unknown, we conservatively declared the link as unknown; if one of them did not show a link to the region, we declared the link as absent. The longitudinal collection (n = 39) from a staff member at a veterinary hospital in England was treated as a single patient sample.
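One reading of the linkage rule above, sketched in Python with hypothetical argument names; each input is True when a link to Bangladesh, India, Nepal or Pakistan was reliably traced, False when the background showed no link and None when it was unknown:

```python
from typing import Optional

def south_asia_link(travel: Optional[bool], family: Optional[bool]) -> str:
    """Classify the epidemiological link from travel and family background.

    True = link traced, False = no link to the region, None = unknown.
    """
    if travel is True or family is True:
        return "present"  # either category suffices
    if travel is None and family is None:
        return "unknown"  # nothing is known about either category
    return "absent"       # at least one category shows no link, none shows a link

print(south_asia_link(True, None))   # present
print(south_asia_link(None, None))   # unknown
print(south_asia_link(False, None))  # absent
```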
Sequencing, quality control and assembly
Unique index-tagged libraries were created for each isolate, and multiplexed libraries were sequenced on the Illumina HiSeq with 100 bp paired-end reads. Read quality control was conducted with Trimmomatic48, Kraken49 and FastQC (https://www.bioinformatics.babraham.ac.uk/projects/fastqc). Quality control identified a large proportion of reads classified as Enterococcus faecalis in sample HWM2178 (Supplementary Table 3). In silico micro-array typing (see below) identified an additional 13 isolates with possible intra-specific contamination due to the simultaneous presence of agr I and II, as well as capsule types 5 and 8 (Supplementary Table 2). We excluded these isolates from all genomic analyses. Raw Illumina data were sub-sampled to 100x coverage and assembled with Shovill (https://github.com/tseemann/shovill), a pipeline that wraps SPAdes50, Lighter51, FLASH52, BWA MEM53, SAMtools54, KMC55 and Pilon56. Final assemblies were annotated with Prokka v1.11 (ref. 57). Samples from the veterinary staff member were processed and sequenced as described by Paterson et al.45.
MLST and SCC typing
In silico multi-locus sequence typing (MLST) was conducted using mlst (https://github.com/tseemann/mlst) on the assembled genomes with the S. aureus database from PubMLST (https://pubmlst.org/saureus/). Three single locus variants (SLVs) of ST772, namely ST1573, ST3362 and ST3857, were detected and retained for the analysis (Supplementary Table 2). Sequences of experimentally verified probe sets for SCC-related and other S. aureus-specific markers58,59 were searched with BLAST against the SPAdes assemblies (in silico micro-array typing), allowing prediction of the presence or absence of these markers and detailed typing of SCC elements. Four isolates that failed precise SCC classification were assigned as MRSA based on the presence of mecA on the probe array and detection of the gene with Mykrobe predictor60.
Variant calling
Samples passing quality control (n = 340) were aligned to the PacBio reference genome DAR4145 from Mumbai, and variants were called with the pipeline Snippy (available at https://github.com/tseemann/snippy), which wraps BWA MEM, SAMtools, SnpEff61 and Freebayes62. Core SNPs (n = 7,063) were extracted with snippy-core using default settings. We defined canonical SNPs for ST772-A as those present in all isolates of ST772-A but in none of the basal strains. Variants were annotated based on the reference genome DAR4145.
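Canonical SNPs as defined above reduce to set operations over per-isolate variant calls: the intersection across ST772-A minus the union across the basal strains. A minimal Python sketch, assuming variant calls have already been parsed into sets of hypothetical variant identifiers:

```python
def canonical_snps(calls, clade, basal):
    """Variants called in every clade isolate and in no basal isolate.

    `calls` maps isolate names to sets of variant identifiers.
    """
    in_every_clade_isolate = set.intersection(*(calls[name] for name in clade))
    in_any_basal_isolate = set().union(*(calls[name] for name in basal))
    return in_every_clade_isolate - in_any_basal_isolate

# Toy example with hypothetical isolates and variant identifiers.
calls = {
    "A_1": {"v1", "v2", "v3"},
    "A_2": {"v1", "v2"},
    "basal_1": {"v2"},
    "basal_2": set(),
}
print(canonical_snps(calls, ["A_1", "A_2"], ["basal_1", "basal_2"]))  # {'v1'}
```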
Phylogenetics and recombination
A maximum-likelihood (ML) tree under the General Time Reversible model of nucleotide substitution with among-site rate heterogeneity across 4 categories (GTR + Γ), ascertainment bias correction (Lewis) and 100 bootstrap (BS) replicates was generated based on 7,063 variant sites (core-genome SNPs) in RAxML-NG 0.5.0 (available at https://github.com/amkozlov/raxml-ng), which implements the core functionality of RAxML63. The tree with the highest likelihood out of ten replicates was midpoint-rooted and visualized with interactive Tree of Life (iTOL) (Figure 1a, 2a, Supplementary Figure 6, 12a)64. In all phylogenies (Figures 1a, 2a, 3a, Supplementary Figures 6, 10, 12a), samples from the veterinary staff member were collapsed for clarity.
A confirmation alignment (n = 351) was computed as described above to resolve the pattern of divergence in the basal strains of ST772. The alignment included the CC1 strain MW2 as outgroup, as well as another known SLV of CC1, sequence type 573 (n = 10). The resulting subset of core SNPs (n = 25,701) was used to construct an ML phylogeny with RAxML-NG (GTR + Γ) and 100 bootstrap replicates (Supplementary Figure 1). We also confirmed the general topology of our main phylogeny as described above using the whole genome alignment of 2,545,215 nucleotides generated by Snippy, masking sites that contained missing (-) or uncertain (N) characters across ST772 (not shown).
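The masking applied to the whole genome alignment can be expressed as a column filter. A minimal Python sketch, assuming the alignment is held in memory as a dictionary of equal-length upper-case strings (a simplification; real alignments would be read from a FASTA file):

```python
def mask_uncertain_sites(alignment, masked=("-", "N")):
    """Drop every alignment column in which any sequence has a gap or an N.

    `alignment` maps sequence names to equal-length, upper-case strings.
    """
    seqs = list(alignment.values())
    length = len(seqs[0])
    if any(len(s) != length for s in seqs):
        raise ValueError("sequences must all have the same length")
    keep = [j for j in range(length) if not any(s[j] in masked for s in seqs)]
    return {name: "".join(s[j] for j in keep) for name, s in alignment.items()}

toy = {"s1": "ACGTN", "s2": "AC-TA", "s3": "TCGTA"}
print(mask_uncertain_sites(toy))  # columns 3 and 5 are removed
```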
Gubbins65 was run on a complete reference alignment with all variant sites defined by Snippy to detect homologous recombination events, using a maximum of five iterations and the GTR + Γ model in RAxML (Supplementary Figure 10). A total of 205 segments were identified as recombinant, producing a core alignment of 7,928 SNPs. Phylogenies were visualized using iTOL, ape66, phytools67, ggtree68 or plotTree (https://github.com/holtlab/plotTree/). Patristic distances to the root of the phylogeny (Supplementary Figure 2) were computed with the distRoot function of adephylo69.
Dating analysis
We used LSD v0.3 70 to obtain a time-scaled phylogenetic tree. This method fits a strict molecular clock to the data using a least-squares approach. Importantly, LSD does not explicitly model rate variation among lineages and it does not directly account for phylogenetic uncertainty. However, its accuracy is similar to that obtained using more sophisticated Bayesian approaches 71, with the advantage of being computationally less demanding.
LSD typically requires a phylogenetic tree with branch lengths in substitutions per site, and calibrating information for internal nodes or for the tips of the tree. We used the phylogenetic tree inferred using maximum likelihood in PhyML72 (before and after removing recombination with Gubbins, as described above) under the GTR + Γ substitution model with 4 categories for the Γ distribution. We used a combination of nearest-neighbour interchange and subtree-prune-regraft moves to search tree space. Because PhyML uses a stochastic algorithm, we repeated the analyses 10 times and selected the run with the highest phylogenetic likelihood. To calibrate the molecular clock in LSD, we used the collection dates of the samples (i.e. heterochronous data). The position of the root can either be specified a priori using an outgroup or be estimated by optimising over all branches; we chose the latter approach. To obtain uncertainty around node ages and evolutionary rates, we used the parametric bootstrap approach with 100 replicates implemented in LSD.
An important aspect of analysing heterochronous data is that the reliability of estimates of evolutionary rates and timescales is contingent on whether the data have temporal structure. In particular, a sufficient amount of genetic change should have accumulated over the sampling time. We investigated the temporal structure of the data by conducting a regression of the root-to-tip distances of the maximum likelihood tree as a function of sampling time73, and a date-randomisation test74. Under the regression method, the slope of the line is a crude estimate of the evolutionary rate, and the extent to which the points deviate from the regression line determines the degree of clocklike behaviour, typically measured using R² (ref. 75). The date-randomisation test consists of randomising the sampling times of the sequences and re-estimating the rate each time. The randomisations correspond to the distribution of rate estimates under no temporal structure. As such, the data have strong temporal structure if the rate estimate using the correct sampling times is not within the range of those obtained from the randomisations76. We conducted 100 randomisations, which suggested strong temporal structure for our data (Supplementary Figure 3). We also verified that the data did not display phylogenetic-temporal clustering, a pattern that sometimes misleads the date-randomisation test77.
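The two temporal-signal checks above can be illustrated on simulated data. The Python sketch below fits the root-to-tip regression with NumPy, reports R² and the date at which the fitted line reaches zero distance, and runs a simplified date-randomisation test. Unlike the analysis described here, which re-estimated the rate with LSD for each randomisation, the sketch uses the regression slope as a crude stand-in; the simulated dates, rate and noise level are arbitrary assumptions.

```python
import numpy as np

def root_to_tip_regression(dates, distances):
    """Least-squares fit of root-to-tip distance against sampling date."""
    dates = np.asarray(dates, dtype=float)
    distances = np.asarray(distances, dtype=float)
    slope, intercept = np.polyfit(dates, distances, 1)
    residuals = distances - (slope * dates + intercept)
    r2 = 1.0 - np.sum(residuals ** 2) / np.sum((distances - distances.mean()) ** 2)
    root_date = -intercept / slope  # date at which the fitted line reaches zero distance
    return slope, r2, root_date

def date_randomisation(dates, distances, n_reps=100, seed=0):
    """Compare the observed slope with slopes obtained after permuting the dates."""
    rng = np.random.default_rng(seed)
    observed = root_to_tip_regression(dates, distances)[0]
    null = np.array([root_to_tip_regression(rng.permutation(dates), distances)[0]
                     for _ in range(n_reps)])
    # strong temporal structure: the observed slope lies outside the permuted range
    strong = bool(observed > null.max() or observed < null.min())
    return observed, null, strong

# Simulated example: tips sampled 2004-2013 under a clock rooted in 1970.
sim = np.random.default_rng(42)
sim_dates = sim.uniform(2004, 2013, size=200)
sim_dist = 1e-4 * (sim_dates - 1970) + sim.normal(0, 5e-5, size=200)
slope, r2, root = root_to_tip_regression(sim_dates, sim_dist)
observed, null, strong = date_randomisation(sim_dates, sim_dist)
print(f"rate ~ {slope:.2e}, R2 = {r2:.2f}, root ~ {root:.1f}, strong = {strong}")
```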
Results from this analysis (substitution rates, and node age estimates) using phylogenies before and after removing recombination were nearly identical (Supplementary Figure 4, 5). We therefore chose to present results from our analysis after removing recombination.
Nucleotide diversity
Pairwise nucleotide diversity and SNP distance distributions for each region with n > 10 (Australasia, Europe, South Asia) were calculated as outlined by Stucki et al.78. Pairwise SNP distances were computed using the SNP alignment from Snippy (n = 7,063) and the dist.dna function from ape with raw counts and deletion of missing sites in a pairwise fashion. An estimate of average pairwise nucleotide diversity per site (π) within each geographic region was calculated from the SNP alignments using raw counts divided by the alignment length. Confidence intervals for each region were estimated using 1000 bootstrap replicates across nucleotide sites in the original alignment via the sample function (with replacement) and 2.5% - 97.5% quantile range (Figure 2b).
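A minimal pure-Python sketch of the diversity estimate described above, treating '-' and 'N' as missing with pairwise deletion and dividing the mean pairwise SNP count by the number of alignment columns; function names are hypothetical, and the toy alignment is not study data:

```python
import itertools
import random
from statistics import quantiles

MISSING = {"-", "N"}

def pairwise_snp_counts(seqs):
    """Raw SNP counts for every pair, skipping sites missing in either sequence."""
    return [sum(1 for x, y in zip(s, t)
                if x not in MISSING and y not in MISSING and x != y)
            for s, t in itertools.combinations(seqs, 2)]

def nucleotide_diversity(seqs):
    """Mean pairwise SNP count divided by the alignment length (sites)."""
    counts = pairwise_snp_counts(seqs)
    return sum(counts) / len(counts) / len(seqs[0])

def bootstrap_ci(seqs, n_reps=1000, seed=1):
    """2.5% and 97.5% quantiles of pi over alignments resampled by site."""
    rng = random.Random(seed)
    n_sites = len(seqs[0])
    reps = []
    for _ in range(n_reps):
        cols = [rng.randrange(n_sites) for _ in range(n_sites)]  # with replacement
        reps.append(nucleotide_diversity(["".join(s[j] for j in cols) for s in seqs]))
    cuts = quantiles(reps, n=40, method="inclusive")  # cut points every 2.5%
    return cuts[0], cuts[-1]

region = ["AAAA", "AAAT", "AATT"]  # toy alignment for a single region
print(nucleotide_diversity(region), bootstrap_ci(region, n_reps=200))
```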
Population structure
We used the network-analysis and -visualization tool NetView79,80 (available at http://github.com/esteinig/netview) to delineate population subgroups in ST772. Pairwise Hamming distances were computed from the core SNP alignment generated by Snippy. The distance matrix was then used to construct mutual k-nearest-neighbour networks from k = 1 to k = 100. To restrict the choice of k to a range appropriate for detecting fine-scale population structure, we ran three commonly used community detection algorithms as implemented in igraph: fast-greedy modularity optimisation81, Infomap82 and Walktrap83. Using several algorithms accounted for differences in their mode of operation and resolution. By plotting the number of detected communities against k, we selected a parameter value at which the results of the community detection algorithms were approximately congruent (Supplementary Figure 11).
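The mutual k-nearest-neighbour construction is simple to express in code. In this graph, two isolates are joined only if each is among the other's k closest isolates. The sketch below builds the edge set from a precomputed Hamming distance matrix. Breaking ties by node index is an assumption of this sketch and not necessarily NetView's rule.

```python
def mutual_knn_edges(dist, k):
    """Undirected edges (i, j), i < j, where i and j are in each other's k nearest neighbours.

    dist is a symmetric matrix (list of lists) of pairwise Hamming distances.
    Ties in distance are broken by node index (an assumption of this sketch).
    """
    n = len(dist)
    knn = []
    for i in range(n):
        # Rank all other nodes by distance from i, then keep the k closest.
        others = sorted((j for j in range(n) if j != i), key=lambda j: (dist[i][j], j))
        knn.append(set(others[:k]))
    # Keep an edge only when the neighbour relation holds in both directions.
    return {(i, j) for i in range(n) for j in knn[i] if i < j and i in knn[j]}


# Toy example: four isolates at positions 0, 1, 10 and 11 on a line.
positions = [0, 1, 10, 11]
toy = [[abs(a - b) for b in positions] for a in positions]
print(sorted(mutual_knn_edges(toy, 1)))  # two separate pairs
print(sorted(mutual_knn_edges(toy, 2)))  # the pairs become linked through (1, 2)
```

Sweeping k and running a community detection algorithm on each resulting graph reproduces the shape of the parameter scan described above. In python-igraph, for example, `Graph.community_fastgreedy()` could be applied to each graph.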
Since we were interested in the large-scale population structure of ST772, we selected k = 40 and used the low-resolution fast-greedy modularity optimisation to delineate the final population subgroups. Community assignments were mapped back to the ML phylogeny of ST772 (Figure 1a). All subgroups agreed with the phylogenetic tree structure and were supported by bootstrap values ≥ 99% (Supplementary Figure 12). The only exception was isolate HW_M2760. Phylogenetic analysis placed it within ST772-A2, but network analysis assigned it to ST772-A3 (Supplementary Figures 11, 12). This appeared to be an artefact of the algorithm, as the isolate's location and connectivity in the network matched its phylogenetic position within ST772-A2. The network and communities were visualised using the Fruchterman-Reingold layout algorithm (Figure 1c), and samples from the veterinary staff member were excluded from Figure 1c (Supplementary Figure 11).
Local transmission clusters
We identified approximate transmission clusters using a network approach supplemented with the ML topology and patient data. The patient data included date and location of collection, family links between patients, and travel or family links to South Asia. We used pairwise SNP distances to define a threshold of 4 SNPs. This corresponds to the maximum SNP distance expected to accumulate within one year under a core genome substitution rate of 1.61 × 10⁻⁶ nucleotide substitutions/site/year. We then constructed the adjacency matrix for a graph in which isolates were connected by an undirected edge if their pairwise distance was less than or equal to 4 SNPs. Isolates without any such edge were removed from the graph, and the resulting connected components were mapped to the ML phylogeny. In each case, the clusters were also reconstructed in the phylogeny, where their isolates diverged from a recent common ancestor (gray highlights, Supplementary Figure 6). We then traced the identity of each connected component in the patient metadata and annotated each cluster with this information. Clusters from neonatal intensive care units (NICU) were also reconstructed under these conditions.
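The cluster definition can be sketched as a threshold graph followed by connected-component extraction. The sketch assumes a precomputed symmetric SNP distance matrix. Union-find is our choice for finding the components and is not necessarily how the original analysis did it. The isolate names in the example are hypothetical.

```python
def transmission_clusters(names, dist, threshold=4):
    """Connected components of the graph linking isolates within `threshold` SNPs.

    Isolates with no partner at or below the threshold are dropped, mirroring
    the removal of unconnected isolates from the graph.
    """
    n = len(names)
    parent = list(range(n))

    def find(i):
        # Follow parent pointers to the component root, halving paths as we go.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    linked = set()
    for i in range(n):
        for j in range(i + 1, n):
            if dist[i][j] <= threshold:  # undirected edge: distance <= threshold
                ri, rj = find(i), find(j)
                if ri != rj:
                    parent[rj] = ri
                linked.update((i, j))

    clusters = {}
    for i in sorted(linked):
        clusters.setdefault(find(i), []).append(names[i])
    return sorted(clusters.values())


# Hypothetical isolates A-F; F is more than 4 SNPs from everything else.
isolates = ["A", "B", "C", "D", "E", "F"]
snps = [
    [0, 2, 6, 12, 20, 30],
    [2, 0, 4, 10, 18, 30],
    [6, 4, 0, 10, 16, 30],
    [12, 10, 10, 0, 3, 30],
    [20, 18, 16, 3, 0, 30],
    [30, 30, 30, 30, 30, 0],
]
print(transmission_clusters(isolates, snps))
```

A and C are 6 SNPs apart but still fall into one cluster through B. This single-linkage behaviour is inherent to using connected components.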
Antimicrobial resistance, virulence factors and pan-genome
Mykrobe predictor was used to predict antibiotic susceptibility and to detect the associated resistance determinants and mutations. Mykrobe predictor has demonstrated sensitivity and specificity > 99% for predicting phenotypic resistance and is comparable to gold-standard phenotyping in S. aureus60. Predicted phenotypes were therefore taken as a strong indication of actual resistance phenotypes in ST772. Genotype predictions also reflect the multidrug resistance profiles (aminoglycosides, β-lactams, fluoroquinolones, MLS, trimethoprim) reported for this clone in the literature16–18,20,84,85. Most resistance-associated MGEs in the complete reference genome DAR4145 are mosaic-like and flanked by repetitive elements18. We therefore used specific diagnostic genes, present as complete single copies in the DAR4145 reference annotation18, to define the presence of the IRP (msrA) and Tn4001 (aacA-aphD). Mykrobe predictor simultaneously called the grlA mutations S80F and S80Y, both of which confer quinolone resistance. However, in all cases one of the two variants was covered at extremely low median k-mer depth (< 20). We therefore assigned the variant with the higher median k-mer depth at grlA (Supplementary Table 6).
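The rule used to resolve the conflicting grlA calls can be written down explicitly. The sketch below keeps the variant with the higher median k-mer depth and reports any discarded variant below the low-depth cut-off. The input layout, a mapping from variant label to supporting k-mer depths, is hypothetical and does not reproduce Mykrobe's output format.

```python
import statistics


def resolve_conflicting_calls(kmer_depths, low_depth=20):
    """Return the variant with the highest median k-mer depth and the low-depth discards.

    kmer_depths: hypothetical mapping such as {"S80F": [...], "S80Y": [...]}.
    """
    medians = {v: statistics.median(d) for v, d in kmer_depths.items()}
    best = max(medians, key=medians.get)
    low = sorted(v for v, m in medians.items() if v != best and m < low_depth)
    return best, low


print(resolve_conflicting_calls({"S80F": [3, 5, 4], "S80Y": [80, 95, 90]}))
```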
ARIBA86 with default settings and the core Virulence Factor Database (VFDB) were used to detect the complement of virulence factors in ST772. We corroborated and extended these results with detailed in silico microarray typing. This typing included the egc gene cluster and S. aureus-specific virulence factors such as the enterotoxin homologue ORF CM14. The two methods differed in their detection of several relevant virulence factors, including lukS/F-PVL (337 vs. 336 for in silico microarray typing vs. ARIBA), sea carried on the φ-IND772 prophage (336 vs. 326), sec (333 vs. 328) and sak (1 vs. 2). In silico microarray typing was based on assembled genomes and may therefore be prone to assembly errors. We therefore used the read-based typing results from ARIBA to assess the statistical significance of differences in virulence factor carriage between basal strains and ST772-A (Supplementary Figure 7).
Pan-genome analysis was conducted on Prokka-annotated assemblies using Roary87. We set a minimum protein BLAST identity of 95% and required a gene to be present in at least 99% of isolates to be considered core (Supplementary Figure 13). A gene synteny comparison between major SCCmec types was plotted with genoPlotR88 (Supplementary Figure 8). We also ran a nucleotide BLAST comparison of three sequences: the extrachromosomal plasmid 11809-03 of USA300, the integrated resistance plasmid in the ST772 reference genome DAR4145 and the integrated plasmid region in strain 11819-07 of ST80. The comparison was plotted with geneD3 (https://github.com/esteinig/geneD3/), showing segments > 1 kb (Supplementary File 1).
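The core threshold amounts to a simple frequency rule, illustrated below. The sketch assumes that gene clusters have already been formed, which Roary does using the 95% blastp identity setting. To our understanding, Roary's `-i` and `-cd` options control these two thresholds. The gene names and isolate counts in the example are hypothetical.

```python
def split_core_accessory(presence, n_isolates, core_pct=99.0):
    """Split gene clusters into core and accessory by the share of isolates carrying them.

    presence maps a gene cluster name to the set of isolates in which it was found.
    """
    core, accessory = [], []
    for gene, isolates in presence.items():
        pct = 100.0 * len(isolates) / n_isolates
        (core if pct >= core_pct else accessory).append(gene)
    return sorted(core), sorted(accessory)


# Hypothetical example with 200 isolates: 198/200 = 99% is core, 197/200 = 98.5% is not.
genes = {
    "geneA": set(range(200)),
    "geneB": set(range(198)),
    "geneC": set(range(197)),
    "geneD": {1, 2},
}
print(split_core_accessory(genes, 200))
```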
We identified three resistance regions in the DAR4145 reference genome that aligned to the 11819-07 and 11809-03 plasmid sequences (R1: 1456024-1459959 bp, R2: 1443096-1448589 bp and R3: 1449679-1453291 bp). We searched for these regions in all completed S. aureus genomes (including plasmids) in RefSeq (NCBI) and the NCTC3000 project. The search used nctc-tools (https://github.com/esteinig/nctc-tools) and nucleotide BLAST with a minimum of 90% coverage and identity (n = 273). Since the IRP is mosaic-like and composed of several mobile regions, we only retained query results in which all three regions were detected (Supplementary Table 9). We then traced the integration sites in these accessions to determine whether integration had occurred in the chromosome or in plasmids. Multi-locus sequence types were assigned using mlst (https://github.com/tseemann/mlst).
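The retention rule can be sketched as a filter over tabular BLAST output. A sequence is kept only if all three query regions hit it with at least 90% identity and 90% query coverage. The sketch assumes a custom tabular format with the columns `qseqid sseqid pident qcovs`, in that order, and assumes the queries are named R1, R2 and R3. It groups hits by subject sequence, so combining a chromosome and its plasmids into one assembly would need an extra mapping step.

```python
REGIONS = {"R1", "R2", "R3"}  # hypothetical query names for the three IRP regions


def subjects_with_all_regions(blast_lines, min_pct=90.0):
    """Subject sequences hit by every region at >= min_pct identity and query coverage.

    Assumes tab-separated lines with columns: qseqid, sseqid, pident, qcovs.
    """
    hits = {}
    for line in blast_lines:
        if not line.strip():
            continue
        qseqid, sseqid, pident, qcovs = line.rstrip("\n").split("\t")[:4]
        if float(pident) >= min_pct and float(qcovs) >= min_pct:
            hits.setdefault(sseqid, set()).add(qseqid)
    # Retain a subject only if all three mosaic regions were detected.
    return sorted(s for s, found in hits.items() if REGIONS <= found)


example = [
    "R1\tacc1\t99.5\t100",
    "R2\tacc1\t98.0\t95",
    "R3\tacc1\t92.0\t91",
    "R1\tacc2\t99.0\t100",
    "R2\tacc2\t99.0\t100",
    "R3\tacc2\t85.0\t100",  # fails the identity threshold
]
print(subjects_with_all_regions(example))
```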
Growth curves
S. aureus strains were grown overnight in 5 mL tryptic soy broth (TSB, Fluka) with shaking (180 rpm) at 37 °C. Overnight cultures were diluted 1:1000 in fresh TSB, and 200 µL was added to a 96-well plate (Costar) in triplicate. Growth was measured at 37 °C with shaking (300 rpm) using a FLUOstar fluorimeter (BMG Labtech) at an absorbance wavelength of 600 nm. Growth curves represent the mean of triplicate measurements.
Cell culture conditions
The monocyte-macrophage THP-1 cell line was maintained in suspension in 30 mL Roswell Park Memorial Institute (RPMI-1640) medium supplemented with 10% heat-inactivated fetal bovine serum (FBS), 1 μM L-glutamine, 200 units/mL penicillin and 0.1 mg/mL streptomycin. Cells were kept at 37 °C in a humidified incubator with 5% CO2. Cells were harvested by centrifugation at 700 × g for 10 min at room temperature. They were then re-suspended to a final density of 1–1.2 × 10⁶ cells/mL in tissue culture-grade phosphate-buffered saline. This typically yielded > 95% viable cells, as determined by easyCyte flow cytometry (Millipore).
Human erythrocytes were harvested from 10 mL of human blood collected in sodium heparin tubes (BD). Whole blood was centrifuged at 500 × g for 10 min at 4 °C. The supernatant (plasma) was aspirated, and cells were washed twice in 0.9% NaCl with centrifugation at 700 × g for 10 min. The cell pellet was gently re-suspended in 0.9% NaCl and diluted to 1% (v/v).
Cytotoxicity assay
To assess S. aureus toxicity, strains were grown overnight in TSB, diluted 1:1000 in 5 mL fresh TSB and grown for 18 h at 37 °C with shaking (180 rpm). Bacterial supernatants were prepared by centrifuging 1 mL of bacterial culture at 20,000 × g for 10 min. To assess toxicity to THP-1 cells, 20 μL of cells were incubated with 20 μL of bacterial supernatant for 12 min at 37 °C. Both neat and 30% diluted supernatant (in TSB) were used, as some S. aureus strains were considerably more toxic than others. Cell death was quantified by easyCyte flow cytometry using the Guava viability stain according to the manufacturer's instructions. Experiments were done in triplicate. To assess haemolysis, 150 µL of 1% (v/v) erythrocytes were incubated with 50 µL of either neat or 30% supernatant in a 96-well plate for 30 min at 37 °C. Plates were centrifuged for 5 min at 300 × g, and 75 µL of supernatant was transferred to a new plate. Absorbance was then measured at 404 nm using a FLUOstar fluorimeter (BMG Labtech). Haemolysis was normalised using the equation (At − A0) / (Am − A0). Here At is the haemolysis absorbance value of a strain, A0 is the minimum absorbance value (negative control, 0.9% NaCl) and Am is the maximum absorbance value (positive control, 1% Triton X-100).
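The haemolysis normalisation can be expressed as a short function. The sketch assumes the standard min-max form (At − A0) / (Am − A0), in which the 0.9% NaCl control maps to 0 and the 1% Triton X-100 control maps to 1. Averaging over technical replicates is included for convenience. The function names are hypothetical.

```python
import statistics


def normalised_haemolysis(a_strain, a_min, a_max):
    """Min-max scale an A404 reading between the negative (0) and positive (1) controls."""
    if a_max <= a_min:
        raise ValueError("positive control absorbance must exceed the negative control")
    return (a_strain - a_min) / (a_max - a_min)


def mean_normalised(replicates, a_min, a_max):
    """Mean normalised haemolysis across technical replicates of one strain."""
    return statistics.mean(normalised_haemolysis(a, a_min, a_max) for a in replicates)


# A reading halfway between the controls corresponds to 50% of maximal lysis.
print(normalised_haemolysis(0.55, 0.05, 1.05))
```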
Lipase assay
Bacterial supernatants used in the cytotoxicity assays above were also used to assess lipase activity, following the protocol published by Cadieux et al.89 with modifications. Briefly, the substrate was either 8 mM para-nitrophenyl butyrate (pNPB; short-chain substrate) or para-nitrophenyl palmitate (pNPP; long-chain substrate) (Sigma). The substrate was mixed with buffer (50 mM Tris-HCl (pH 8.0), 1 mg/mL gum arabic and 0.005% Triton X-100) in a 1:9 ratio to create the assay mixes. A standard curve was prepared using these assay mixes and para-nitrophenol (pNP) (Sigma), and 200 µL of each dilution was pipetted into one well of a 96-well plate (Costar). 180 µL of each assay mix was pipetted into the remaining wells of the plate, and 20 µL of harvested bacterial supernatant was added to each of these wells. The plate was placed in a FLUOstar Omega microplate reader (BMG Labtech) at 37 °C, and absorbance at 410 nm was read every 5 min for 1 h. Absorbance readings were converted to µM pNP released per min using the standard curve.
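The conversion from absorbance to release rate can be sketched in two steps. First, fit a linear standard curve of A410 against known pNP concentrations. Second, convert each kinetic reading to a concentration and take the slope over time. The sketch makes several assumptions: the curve is linear over the measured range, the whole 1 h time course is used for the rate, and no blank or dilution correction is applied. The source does not specify these details.

```python
def linear_fit(x, y):
    """Ordinary least-squares slope and intercept."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    slope = sxy / sxx
    return slope, my - slope * mx


def pnp_release_rate(std_conc_um, std_abs, times_min, sample_abs):
    """Rate of pNP release (uM/min) from an A410 time course and a pNP standard curve."""
    m, c = linear_fit(std_conc_um, std_abs)      # standard curve: A410 = m * [pNP] + c
    conc = [(a - c) / m for a in sample_abs]     # absorbance -> uM pNP
    rate, _ = linear_fit(times_min, conc)        # slope of uM pNP against minutes
    return rate


# Toy numbers: the standard curve has slope 0.01 per uM, and the sample releases 2 uM/min.
rate = pnp_release_rate(
    [0, 50, 100, 200], [0.05, 0.55, 1.05, 2.05],
    [0, 5, 10, 15, 20], [0.05, 0.15, 0.25, 0.35, 0.45],
)
print(round(rate, 6))
```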
Biofilm formation
Semi-quantitative measurements of biofilm formation on 96-well, round-bottom, polystyrene plates (Costar) were made using the classical crystal violet method of Ziebuhr et al.90. Bacterial cultures grown for 18 h in TSB were diluted 1:40 into 100 µL TSB containing 0.5% glucose. Perimeter wells of the 96-well plate were filled with sterile H2O. Plates were placed in a separate plastic container inside a 37 °C incubator and grown for 24 h under static conditions. After 24 h of growth, plates were washed five times in PBS, dried and stained with 150 μL of 1% crystal violet for 30 min at room temperature. After five further washes in PBS, the bound stain was solubilised in 200 μL of 7% acetic acid. Optical density at 595 nm was then measured using a FLUOstar fluorimeter (BMG Labtech). To control for day-to-day variability, a control strain (E-MRSA15) was included on each plate in triplicate, and absorbance values were normalised against it. Experiments were done using six technical replicates from two independent experiments.
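The plate-wise normalisation against the control strain can be sketched as follows. The sketch assumes that each well's OD595 is divided by the mean OD595 of the triplicate E-MRSA15 wells on the same plate. Normalised values are then pooled across plates per strain. The exact normalisation arithmetic is not stated in the text, so this ratio form is an assumption, and the data layout is hypothetical.

```python
import statistics


def normalise_plate(plate_od, control_od):
    """Divide each strain's OD595 values by the mean OD595 of the control wells on that plate."""
    ref = statistics.mean(control_od)
    return {strain: [od / ref for od in ods] for strain, ods in plate_od.items()}


def pooled_means(plates):
    """Mean normalised biofilm per strain across plates.

    plates is a list of (plate_od, control_od) pairs, one per plate or experiment.
    """
    pooled = {}
    for plate_od, control_od in plates:
        for strain, values in normalise_plate(plate_od, control_od).items():
            pooled.setdefault(strain, []).extend(values)
    return {strain: statistics.mean(v) for strain, v in pooled.items()}


plates = [
    ({"X": [1.0, 0.5]}, [0.5, 0.5, 0.5]),  # plate 1, control mean 0.5
    ({"X": [0.8]}, [0.4, 0.4, 0.4]),       # plate 2, control mean 0.4
]
print(pooled_means(plates))
```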
Statistical analysis
All statistical analyses were carried out in R or Python. Results were considered significant at p < 0.05, except for comparisons of proportions across the multiple virulence and resistance elements, for which we used a threshold of p < 0.01. Veterinary samples (n = 39) were restricted to one isolate (one patient, Staff_E1A) for the statistical comparisons between basal strains and ST772-A. These comparisons covered region of isolation and the proportions of resistance, virulence and MSSA (n = 302; main text, Figure 3d, Supplementary Figure 7). Differences in pairwise SNP distance and nucleotide diversity between regions were assessed using the non-parametric Kruskal-Wallis test and post-hoc Dunn's test for multiple comparisons with Bonferroni correction, as the distributions were not assumed to be normal (Figure 2b, n = 340, Supplementary Figure 2). Phenotypic data were tested for normality with Shapiro-Wilk tests. Depending on the outcome, they were then compared using either Welch's two-sided t-test or the non-parametric two-sided Wilcoxon rank-sum test (Figure 4, Supplementary Table 8).
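The regional comparison involves three groups (Australasia, Europe, South Asia), which allows the Kruskal-Wallis test to be shown in a few lines. The sketch is a pure-Python illustration, not the code used in the analysis, and it omits the post-hoc Dunn's test. It uses average ranks for ties and the standard tie correction. With three groups the reference chi-square distribution has 2 degrees of freedom, and its survival function is exactly exp(−H/2). For real analyses, a library routine such as `scipy.stats.kruskal` would be the usual choice.

```python
import math


def average_ranks(values):
    """1-based ranks, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j across a run of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def kruskal_wallis_three_groups(groups):
    """Tie-corrected Kruskal-Wallis H and its chi-square (df = 2) p-value."""
    if len(groups) != 3:
        raise ValueError("the closed-form p-value below assumes exactly 3 groups")
    pooled = [v for g in groups for v in g]
    n = len(pooled)
    ranks = average_ranks(pooled)
    total, start = 0.0, 0
    for g in groups:
        rank_sum = sum(ranks[start:start + len(g)])
        total += rank_sum ** 2 / len(g)
        start += len(g)
    h = 12.0 / (n * (n + 1)) * total - 3 * (n + 1)
    # Correct H for ties: divide by 1 - sum(t^3 - t) / (n^3 - n).
    counts = {}
    for v in pooled:
        counts[v] = counts.get(v, 0) + 1
    correction = 1 - sum(t ** 3 - t for t in counts.values()) / (n ** 3 - n)
    if correction == 0:
        raise ValueError("all observations are tied")
    h /= correction
    # For df = 2 the chi-square survival function is exactly exp(-x / 2).
    return h, math.exp(-h / 2)
```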
Code availability
Core analyses, including parameter settings, cluster resource configurations and versioned software distributions, are reproducible through the bengal-bay-0.1 workflow. The workflow, together with other scripts and data files, is available in our GitHub repository (https://github.com/esteinig/ST772) and can be executed with PathFinder (https://github.com/esteinig/pathfinder). The workflow is implemented in Snakemake92 and uses Anaconda virtual environments, including software distributed through the Bioconda91 channel. Analyses were conducted on the Cheetah cluster at the Menzies School of Health Research, Darwin.
Data availability
Short-read sequences have been deposited at ENA under accession numbers detailed in Supplementary Table 2. Additional isolates from India are available from the SRA under accession numbers SRR404118, SRR653209, SRR653212 and SRR747869-SRR747873. Outgroup strains used in the context phylogeny are available from ENA under accession numbers SRR592258 (MW2), ERR217298, ERR217349, ERR221806, ERR266712, ERR279022, ERR279023, ERR278908, ERR279026, ERR716976, ERR717011 (ST573). The ST772 reference genome DAR4145 is available at GenBank under accession number CP010526.1.
Author contributions
EJS, ST conducted the bioinformatics analysis; SD performed the dating analysis; SM, PS, PA performed in silico typing and provided bioinformatics support; DS provided support on the computing cluster; MY, ML, RM conducted phenotyping experiments; DAR, DW, AK, RG, ED, RE, SM, MI, MO, GC, AP, GB, AS, DC, AP, AM, HdL, HW, NK, HH, BS, FL, SP, SW, HA, LS, SH provided strains and relevant meta-data; EJS, ST, DAR, SM, MTGH wrote the manuscript; all authors contributed to critical review of the manuscript. ST directed the project with support from SB and JP.
Competing financial interests
There are no competing financial interests to declare.
Materials and correspondence
Steven Y.C. Tong
Acknowledgements
We thank the library construction, sequencing, and core informatics teams at the Wellcome Trust Sanger Institute. We also extend our gratitude to Anand Manoharan for comments on the manuscript and strains from India. ST is supported by an Australian National Health and Medical Research Council Career Development Award (#1145033). DAR is supported by NIH grant GM080602. DC and AS are supported by an Irish Health Research Board grant HRA-POR-2015-1051. MO is supported by an NHMRC project grant (#1065908).
References
- 1.
- 2.
- 3.
- 4.
- 5.
- 6.
- 7.
- 8.
- 9.
- 10.
- 11.
- 12.
- 13.
- 14.
- 15.
- 16.
- 17.
- 18.
- 19.
- 20.
- 21.
- 22.
- 23.
- 24.
- 25.
- 26.
- 27.
- 28.
- 29.
- 30.
- 31.
- 32.
- 33.
- 34.
- 35.
- 36.
- 37.
- 38.
- 39.
- 40.
- 41.
- 42.
- 43.
- 44.
- 45.
- 46.
- 47.
- 48.
- 49.
- 50.
- 51.
- 52.
- 53.
- 54.
- 55.
- 56.
- 57.
- 58.
- 59.
- 60.
- 61.
- 62.
- 63.
- 64.
- 65.
- 66.
- 67.
- 68.
- 69.
- 70.
- 71.
- 72.
- 73.
- 74.
- 75.
- 76.
- 77.
- 78.
- 79.
- 80.
- 81.
- 82.
- 83.
- 84.
- 85.
- 86.
- 87.
- 88.
- 89.
- 90.
- 91.
- 92.