ABSTRACT
Background Escherichia coli sequence type 131 (ST131) has emerged globally as the predominant lineage within this clinically important species, and its association with fluoroquinolone and extended-spectrum cephalosporin resistance impacts significantly on treatment. The evolutionary histories of this lineage, and of important antimicrobial resistance elements within it, remain poorly defined.
Results This study of the largest worldwide collection (n = 215) of sequenced ST131 E. coli isolates to date demonstrates that clonal expansion of two previously recognized antimicrobial-resistant clades, C1/H30R and C2/H30Rx, started around 25 years ago, consistent with the widespread introduction of fluoroquinolones and extended-spectrum cephalosporins in clinical medicine. These two clades appear to have emerged in the United States, with the expansion of the C2/H30Rx clade driven by the acquisition of a blaCTX-M-15-containing IncFII-like plasmid that has subsequently undergone extensive rearrangement. Several other evolutionary processes influencing the trajectory of this drug-resistant lineage are described, including sporadic acquisitions of CTX-M resistance plasmids, and chromosomal integration of blaCTX-M within sub-clusters followed by vertical evolution. These processes are also occurring for another family of CTX-M gene variants more recently observed amongst ST131, the blaCTX-M-14/14-like group.
Conclusions The complexity of the evolutionary history of ST131 has important implications for antimicrobial resistance surveillance, epidemiological analysis, and control of emerging clinical lineages of E. coli. These data also highlight the global imperative to reduce specific antibiotic selection pressures, and demonstrate the important and varied roles played by plasmids and other mobile genetic elements in the perpetuation of antimicrobial resistance within lineages.
IMPORTANCE
Escherichia coli, perennially a major bacterial pathogen, is becoming increasingly difficult to manage due to emerging resistance to all preferred antimicrobials. Resistance is concentrated within specific E. coli lineages, such as sequence type (ST) 131. Clarification of the genetic basis for clonally-associated resistance is key to devising intervention strategies.
We used high-resolution genomic analysis of a large global collection of ST131 isolates to define the evolutionary history of extended-spectrum beta-lactamase production in ST131. We documented diverse contributory genetic processes, including stable chromosomal integrations of resistance genes, persistence and evolution of mobile resistance elements within sub-lineages, and sporadic acquisition of different resistance elements. Both global distribution and regional segregation were evident. The diversity of resistance element acquisition and propagation within ST131 indicates a need for flexible approaches to control and for ongoing surveillance.
BACKGROUND
Resistance to extended-spectrum cephalosporins in extra-intestinal pathogenic Escherichia coli (ExPEC) represents a major clinical challenge and is commonly caused by the presence of extended-spectrum beta-lactamases (ESBLs). Most ESBL-associated E. coli infections are due to a recently emerged, globally distributed ExPEC clone, sequence type (ST) 131. ST131 corresponds to serogroup O25b [1, 2] and belongs to phylogenetic group B2 [3, 4]. It remains unclear which features of this clone have resulted in its recent widespread clinical dominance, although antimicrobial resistance and virulence factors are suspected contributors [5].
The blaCTX-M-15 beta-lactamase gene is the dominant ESBL gene in ST131, but other genetically divergent CTX-M genes also occur in this ST, particularly blaCTX-M-14/14-like variants, e.g. in Canada, China, and Spain [6, 7]. The almost contemporaneous identification of blaCTX-M in ST131 strains from multiple geographic locations suggests repeated acquisition via multiple horizontal gene transfer events [8]. Consistent with this, both blaCTX-M-15 and blaCTX-M-14/14-like variants occur on conjugative plasmids, especially multi-replicon IncFII plasmids additionally harboring FIA/FIB replicons [9].
Other data, however, suggest that the widespread distribution of these genes is mediated by clonal expansion of CTX-M-containing strains and global dissemination [10]. This is also a plausible hypothesis, since CTX-M plasmids can be inherited stably and blaCTX-M-15 and blaCTX-M-14 variants can also integrate into the chromosome [11-13]. Nevertheless, clonal expansion of E. coli strains with chromosomally integrated blaCTX-M has not yet been demonstrated.
Two recent studies used whole genome sequence (WGS) data to investigate the population structure of ST131. The first found that ST131 expansion in the US has been driven by a single sub-lineage, H30, defined by the presence of a specific fimbrial adhesin allele, fimH30. Within H30, nested clades have emerged: H30R, containing mutations in the chromosomal genes gyrA and parC that confer fluoroquinolone resistance, and H30Rx, containing the same gyrA and parC mutations but additionally associated with blaCTX-M-15 [11]. The second study [14], which included samples from six locations around the world, resolved the ST131 population structure into three clades, A, B, and C, with clade C comprising two sub-groups, C1 and C2, corresponding to the H30R and H30Rx clades. However, this study included only four isolates from Asia, where ESBL ExPEC prevalence may be highest [15]. Furthermore, neither study directly tested the competing hypotheses that ESBL dissemination in ST131 has occurred through multiple horizontal gene transfer events versus clonal expansion.
Here we used a broader set of ST131 WGS data, including many more isolates from Asia [16] and CTX-M-14/14-like-containing strains, alongside a subset of CTX-M plasmid sequences, to estimate the contribution of each potential route of dissemination to the worldwide prevalence of ST131.
RESULTS
The 215 ST131 genome sequences analyzed included 67 strains from various locations in Southeast Asia, 33 from Oxford in the United Kingdom, 11 from a global resistance surveillance program at AstraZeneca, 8 from Canada and 96 predominantly North American isolates previously reported by Price et al. [11] (details on new isolates in Supplementary Table S1; these strains included both human and animal, and clinical and carriage isolates).
Asian ST131 strains are consistent with the previously described core phylogeny, and the C1/H30 and C2/H30Rx clades emerged from a North American ancestor
For the 4,717,338 sites in the SE15 ST131 reference genome [17], the mean mapping call rate across the dataset was 93.3%. In total, 40,057 (0.85%) sites were variable, with 6,879 (0.15%) representing core, single nucleotide variants (SNVs) called in all 215 isolates. Overall, 611,770 (13%) sites were in recombinant regions, including 4,120 core SNVs, leaving 2,759 core, non-recombinant SNVs for phylogenetic analysis.
Consistent with the two previous WGS-based ST131 phylogenies [11, 14], the time-scaled phylogeny inferred from this ST131 dataset (which included >10 times more Asian isolates than considered previously) comprised three clades (Fig. 1), A (n = 25), B (n = 51), and C (n = 139), with C containing two sub-clades, C1 (n = 57) and C2 (n = 82), characterized by the absence vs. presence, respectively, of blaCTX-M-15 [14]. B and C1 were in fact paraphyletic groups rather than monophyletic clades, but we followed previous notation by calling them clades nonetheless. Isolates from all geographic regions were identified within each clade, although there were smaller, geographically restricted clusters within these (Fig. 1, tip color). This supports both global transmission and localized clonal expansion following specific introductions into a geographic locality.
The estimated time to most recent common ancestor (TMRCA) for the whole genomic dataset was ~130 years ago, when clade A diverged from clades B and C. Twenty-five years ago, clade C emerged out of the paraphyletic clade B, which was quickly followed by the split between sub-clades C1 and C2. The number of core SNVs separating the clades was approximately 250 for clades A vs. B/C, 50-60 for clades B vs. C1/C2, and 10-30 for clades C1 vs. C2. The evolutionary rate of ST131 was estimated in BEAST (see methods) at 2.46×10−7 mutations per site per year (95% CI: 2.18-2.75×10−7), equating to 1.00 (95% CI: 0.89-1.12) mutation per genome per year.
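The conversion from a per-site rate to a per-genome rate can be checked with simple arithmetic. The sketch below assumes (this is not stated explicitly in the text) that the relevant genome length is the SE15 reference length minus the sites in recombinant regions; under that assumption the product is close to one mutation per genome per year. The function and variable names are illustrative only.

```python
# Illustrative arithmetic only; the choice of denominator is an assumption.
REFERENCE_SITES = 4_717_338    # sites in the SE15 ST131 reference genome
RECOMBINANT_SITES = 611_770    # sites reported as lying in recombinant regions


def per_genome_rate(per_site_rate, n_sites):
    """Convert a per-site, per-year rate into mutations per genome per year."""
    return per_site_rate * n_sites


non_recombinant_sites = REFERENCE_SITES - RECOMBINANT_SITES

# Point estimate and 95% interval bounds reported for the BEAST analysis.
for label, rate in [("estimate", 2.46e-7), ("lower", 2.18e-7), ("upper", 2.75e-7)]:
    print(f"{label}: {per_genome_rate(rate, non_recombinant_sites):.2f}")
```

Small differences from the reported per-genome values would be expected from rounding of the published per-site rate.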
All possible geographic origins of the root of the ST131 lineage were inferred to be equally likely since the root is far back in time relative to the estimated migration rates. Clade A was inferred to originate in Southeast Asia with ~70% confidence (78% when the unsampled deme was included in the model – see methods), and the B/C clades from North America with ~88% confidence. The ancestral origin of C1/H30 and C2/H30Rx was strongly inferred as being in North America (98% confidence; 85% confidence when the unsampled deme was included in the model) with subsequent dissemination to Europe and Asia. Locations of more recent nodes are inferred with high confidence, as expected [18].
blaCTX-M, fimH and gyrA variants are strongly associated with specific ST131 clades
Overall, 105 (49%) ST131 isolates harbored blaCTX-M: blaCTX-M-15 was found in 74 isolates (34%), blaCTX-M-14 in 20 (9%), blaCTX-M-27 in 8 (4%), with one isolate with each of blaCTX-M-19, blaCTX-M-24, and blaCTX-M-55 (Fig. 1). blaCTX-M-15 was almost completely restricted to the C2 clade, as described previously [14], occurring in 69/82 (84%) C2 isolates but only sporadically in other clades (4/133; p < 0.001, Fisher’s exact test). blaCTX-M-14 and blaCTX-M-27 were also clustered within the two different clades A and C1, and completely absent from B and C2 (Fig. 1). Overall, the presence of shared blaCTX-M variants within clusters was constrained to those with a TMRCA of less than 25 years, suggestive of the emergence of blaCTX-M within ST131 after the widespread introduction of third generation cephalosporins in clinical practice.
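For readers wishing to reproduce the clade association test, a two-sided Fisher's exact test on the 2x2 table of blaCTX-M-15 presence by clade (69/82 in C2 versus 4/133 elsewhere) can be sketched as follows. This is a minimal standard-library implementation written for illustration; it is an assumption that a two-sided test was used, and the function name is hypothetical rather than the software used in the study.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact p-value for the 2x2 table [[a, b], [c, d]]."""
    row1, row2, col1 = a + b, c + d, a + c
    denom = comb(row1 + row2, col1)

    def prob(x):
        # Hypergeometric probability of observing x in the top-left cell.
        return comb(row1, x) * comb(row2, col1 - x) / denom

    observed = prob(a)
    lowest, highest = max(0, col1 - row2), min(row1, col1)
    # Sum the probabilities of all tables at least as unlikely as the observed one.
    return sum(
        p
        for p in (prob(x) for x in range(lowest, highest + 1))
        if p <= observed * (1 + 1e-9)
    )


# Rows: clade C2, all other clades; columns: blaCTX-M-15 present, absent.
p_value = fisher_exact_two_sided(69, 82 - 69, 4, 133 - 4)
print(p_value < 0.001)
```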
The most common fimH variant was fimH30 (n = 123; 57%), followed by fimH22 (n = 24; 11%) and fimH41 (n = 21; 10%), whereas 23 isolates had novel fimH variants, and one was fimH-null. As observed for blaCTX-M, fimH alleles were strongly associated with clade, with 21/25 (84%) isolates in clade A having fimH41, 23/51 (45%) in clade B having fimH22, and 122/139 (88%) in clade C having fimH30 (p < 0.001; Fisher’s exact test).
Fluoroquinolone resistance mutations in gyrA and parC were also clade-associated, with isolates in clades A and B typically having no or only single mutations in these genes’ quinolone-resistance determining regions (QRDR) (Fig. 1). In contrast, most clade C isolates had double mutations in both gyrA and parC, shown to confer high-level fluoroquinolone resistance [19] (132/139 [95%]). The seven clade C isolates without these mutations (5 in C1 and 2 in C2) were sporadic, with two having non-fimH30 variants, suggesting intermittent recombination events affecting gyrA, parC, and fimH. The emergence of this double mutation, high-level fluoroquinolone-resistant genotype dated to 25-40 years ago, consistent with the introduction of fluoroquinolones in clinical practice.
blaCTX-M-15 in clade C2 is present in a consistent but short flanking structure, frequently truncated by IS26 elements and within different genetic backgrounds
In four of the 74 blaCTX-M-15-containing isolates, blaCTX-M-15 was present on two different contigs (C1353, JJ2643, CD358, JJ2434). In another isolate (JJ2547), the assembled contig with blaCTX-M-15 contained a series of “N”s, suggesting possible uncertainty around the contig assembly or multiple locations of the gene. These five isolates were excluded from further analysis of flanking regions. In the 69 remaining isolates (3 in A, 1 in C1, 65 in C2), blaCTX-M-15 was found downstream of a homologous tract of 48bp preceded by an ISEcp1 right-end inverted repeat region (IRR-R), and upstream of a homologous tract of 46bp followed by ORF477. This is consistent with the introduction of an ISEcp1-blaCTX-M-15-ORF477 unit within ST131, and subsequent rearrangement events affecting this structure.
In clade C2, blaCTX-M-15 was integrated into the chromosome of 8/65 (12%) isolates, with four unique integration events, one of which was stably present in a sub-cluster of five isolates with TMRCA in 2002 and spread across two geographical regions (Fig. 2). All chromosomal integration events were associated with an intact ISEcp1 upstream of blaCTX-M-15. In three isolates the ISEcp1-blaCTX-M-15-ORF477 unit was flanked by 5bp target site duplications consistent with transposition, and in one isolate the ORF477 was truncated, suggestive of either one-ended transposition [20], or standard transposition followed by a deletion event (Fig. 2).
In 27 of the 57 remaining C2 isolates, blaCTX-M-15 appeared plasmid-associated, either present in plasmid transformants (n = 20) or flanked by likely plasmid-associated sequences in the contig assemblies (n = 7). In the remaining 30 isolates the location of blaCTX-M-15 could not be defined due to limitations of the short-read assemblies. In all 57 isolates the upstream sequence was either an intact or truncated ISEcp1 sequence, and in 51/57 (89%) isolates the sequence downstream of ORF477 was either an intact or truncated ISSWi1-like (Tn2-like) structure (Fig. 3). In 12 isolates distributed throughout clade C2, a continuation of the ISSWi1-like sequence was also observed upstream of the ISEcp1 sequence, consistent with the ISEcp1 element (flanked by a pair of 5bp repeats, all TCATA) being nested within a complete or partial ISSWi1-like transposon. In 40/57 (65%) isolates IS26 repeat regions truncated either or both of these upstream and downstream contexts (Fig. 3).
blaCTX-M-14 and blaCTX-M-27 are present in diverse genetic backgrounds, and within a common ISEcp1-IS903B transposition unit
For blaCTX-M-14, evidence of chromosomal integration and propagation by descent was also found: two related blaCTX-M-14 isolates had ISEcp1-mediated chromosomal integration of blaCTX-M-14 downstream of the gatY gene (clade A, isolates HFMK328 and HFMK347). Six isolates had plasmid-associated blaCTX-M-14 based on annotated flanking sequences/transformants, whereas for the rest (n = 14) the location of blaCTX-M-14 was uncertain due to limitations of the de novo assemblies.
In all these isolates ISEcp1 was consistently located upstream of blaCTX-M-14, as with blaCTX-M-15, but at only 43bp distance, and the downstream flanking sequences were composed of either intact or truncated IS903B elements. In clade A, the genetic flanking sequences surrounding blaCTX-M-14 were consistent with the host strain sub-cluster, and homologous over the observed contig length within this sub-cluster (Supplementary Figure S1, Clade A CTX-M-14 sub-cluster [i]), supporting a single blaCTX-M-14 plasmid acquisition event followed by either evolution with plasmid inheritance or subsequent transfer of a blaCTX-M-14-containing genetic unit within the sub-cluster. The flanking sequence for isolate la_5108_T in clade C1 also incorporated an ISEc23 element downstream of IS903B, and was homologous to that in clade A CTX-M-14 sub-cluster [i] (Supplementary Figure S2), suggesting horizontal transfer of this genetic unit between clades.
Six of eight isolates with blaCTX-M-27 were closely related in clade C1, again supporting a single plasmid acquisition event. However, they also all contained bilateral truncation of the ISEcp1-blaCTX-M-IS903B structure by IS26 elements, which occurred in four different contexts, suggesting frequent IS26-mediated blaCTX-M-27 transposition events within this sub-cluster (Supplementary Figure S2).
Plasmid replicon analysis demonstrates a degree of clade-associated plasmid segregation suggestive of ancient IncF plasmid acquisition events
The predominant replicon family was IncF, identified in 206/215 ST131 isolates (96%). Specific IncF variants differed in frequency, with FII found in 199/215 isolates (93%), FIB in 155/215 (72%), FIA in 145/215 (67%), and FIC in 17/215 (8%). Specific IncF replicons and combinations thereof were clade-associated (Table 1). A number of non-F Inc types were also identified; of these, IncH was associated with clade B and IncI with clade C1 (Table 1). Col-like plasmids were also common (189/215 isolates [88%]); however, there was no clear association of any Col-type with clade (Supplementary Figure S3).
A specific FII variant (GenBank: AY458016; pC15-1a; consistent with pMLST allele 2) was significantly associated with clade C2 (48/82 C2 isolates versus 18/153 non-C2 isolates, p < 0.001, Fisher’s exact test). Within clade C2 a further 23 isolates had eight different FII_AY458016-like variants containing up to 12 SNVs between them; almost all of these variants were in isolates with FIA-FIB-FII replicon combinations (Supplementary Figure S3). Of the 11 clade C2 isolates without FII_AY458016-like replicon variants, four contained a plasmid with a different FII replicon (GenBank: AJ851089; pRSB107, 35 SNVs different from FII_AY458016; consistent with pMLST allele 1), five had chromosomally integrated blaCTX-M-15 (of which four also contained an FII_AJ851089-like plasmid), one was blaCTX-M negative, and one contained deletions in blaCTX-M-15. There were only nine clade C2 isolates with FII_AY458016-like replicons but no blaCTX-M-15. The different FII replicon containing blaCTX-M-15 in clade C2, FII_AJ851089, was also clade-associated, being found predominantly in clades A (13/25 isolates, 52%) and C1 (41/57, 72%) rather than B (12/51, 24%) and C2 (8/82, 10%) (p < 0.0001, Fisher’s exact test) (Supplementary Figure S3). Overall, this strongly suggests the ancestral acquisition of the FII_AY458016 replicon within clade C2, its association with blaCTX-M-15 and the expansion of the clade, its evolution in the presence of FIA-FIB, and its sporadic loss.
Plasmid transformants demonstrate similarities and differences in blaCTX-M-15 plasmids from ST131 clades and other sequence types
Sequence data were generated for 30 transformed blaCTX-M plasmids (relevant source strains labeled “T” on Fig. 1): four from clade A, containing blaCTX-M-15 (n = 1), blaCTX-M-14 (n = 2), and blaCTX-M-27 (n = 1); one from clade B, containing blaCTX-M-55; three from clade C1, containing blaCTX-M-14 (n = 2) and blaCTX-M-24 (n = 1); 20 from clade C2, containing blaCTX-M-15; and two blaCTX-M-15 plasmids from non-ST131 isolates (Supplementary Table S2). A comparison of the mean percentage pairwise differences between all transformant pairs versus the divergence time of the two host strains demonstrated that all blaCTX-M transformant plasmids shared at least 10% homology but could be genetically divergent (Fig. 4); plasmids found in different STs could be very similar (up to ~90% sequence homology); and plasmid genetic similarity correlated with host strain divergence time for recently diverged host strains (up to ~30 years) but was much more variable for more remotely diverged host strains.
Most transformant plasmids were IncF, except in strains 11B00320_T and la_7619_T. BLASTn-based comparisons revealed that the clade A blaCTX-M-15 IncI transformant (11B00320_T; isolated in Mae Sot, Thai-Myanmar border) was circulating in a limited fashion (Fig. 5), but with substantial sequence homology to the other two clade A CTX-M-15-positive isolates (JJ2591, Minneapolis, USA and AZ779845, Spain). Although we did not have transformants or specific plasmid sequences for these, the blaCTX-M-15-containing contig assembled for JJ2591 was 88,693bp long and very similar to the 11B00320_T assembly, whereas the AZ779845 blaCTX-M-15-containing contig was 32,228bp long and likewise highly similar in structure (Fig. 5). These data suggest that an IncI-CTX-M-15 plasmid is responsible for sporadic, horizontal introductions of blaCTX-M-15 into ST131 with a wide geographic distribution.
Genetic comparisons amongst the blaCTX-M-14/14-like transformant plasmids revealed that three shared strikingly similar genetic structures, two of which (uk_8A9B_T, Oxford, UK and cam_1071_T, Siem Reap, Cambodia) were identified in clade A, in host strains with a TMRCA within the last 15 years, and one in clade C1 (la_5108_T, Vientiane, Laos) (Fig. 6). BLASTn-based comparisons across all 215 ST131 sequences demonstrated that many isolates in clades A (predominantly sub-cluster [i]) and C1 contained highly similar genetic sequences, as did small numbers of isolates in clade C2. One of these, a blaCTX-M-15 transformant plasmid in clade C2, 11B01979_T (isolated in Mae Sot, Thai-Myanmar border), also showed significant homology to uk_8A9B_T, cam_1071_T and la_5108_T (Fig. 6), suggesting that both blaCTX-M-14 and blaCTX-M-15 variants can be accommodated on the same plasmid background.
The isolates containing blaCTX-M-55 and blaCTX-M-24 (one SNV derivatives of blaCTX-M-15 and blaCTX-M-14, respectively) apparently resulted from discrete plasmid acquisition and/or blaCTX-M transposition events within ST131 (Supplementary Figure S4). These were not therefore shown to represent blaCTX-M evolution within established blaCTX-M-15 or blaCTX-M-14 plasmid backgrounds.
Nineteen of 20 blaCTX-M-15 transformant plasmids from clade C2 contained an FII_AY458016-like replicon, supporting the association of IncFII_AY458016 with blaCTX-M-15. When 17/20 IncFII_AY458016-containing transformant plasmids from clade C2 were compared with each other a significant degree of homology was evident (Fig. 7; excluding 8A16G_T, 11B01979_T, 19B19L_T – see methods). However, only eight coding sequences were shared with 100% nucleotide similarity, including: blaCTX-M-15, blaOXA-1, aac(6’)-Ib-cr, a glucose-1-phosphatase-like enzyme, a CAAX amino terminal protease self-immunity protein, a hypothetical phage protein, and a pemI/pemK plasmid addiction system. This lack of gene conservation suggests that significant genetic exchange and rearrangement occurs amongst these plasmids as they evolve within the sub-clade.
DISCUSSION
Our WGS analysis of the largest and most diverse collection of ST131 isolates to date (n = 215) demonstrates conclusively that the global emergence of drug-resistant clades (H30, H30Rx) occurred approximately 25 years ago, most likely in a North American context, and consistent with strong selection pressure exerted by the widespread introduction and use of fluoroquinolones and extended-spectrum cephalosporins. Although members of each ST131 clade have dispersed globally, within specific geographic regions smaller clonal ST131 outbreaks occur at all genetic levels (gene, flanking context, plasmid, and host strain), indicating that both horizontal gene transfer and clonal expansion have contributed to the global dissemination of this sequence type. The estimated molecular evolutionary rate of ST131 (1.00 mutation per genome per year) is similar to previous estimates from ST131 [21] and the species overall [22], strongly suggesting that ST131’s epidemiological success is not due to a higher-than-average mutation rate.
Our study shows that the apparent persistence of particular blaCTX-M variants within specific ST131 clades is due to diverse mechanisms. These include (i) acquisition of a blaCTX-M-containing plasmid by a specific host strain sub-cluster, followed by evolution and spread across geographic regions (e.g. clade A blaCTX-M-14 sub-cluster [i], Fig. 6 and Supplementary Figure S2); (ii) multiple discrete acquisition events involving blaCTX-M-containing plasmids (e.g. blaCTX-M-55, blaCTX-M-24; Supplementary Figure S4); (iii) horizontal transfer of common plasmid structures across clades (e.g. the IncI blaCTX-M-15 plasmid, Fig. 5); and (iv) chromosomal integration of blaCTX-M and evolution by descent (e.g. blaCTX-M-15, Fig. 2; blaCTX-M-14). Despite this high degree of genetic plasticity, we also found clear structuring of blaCTX-M variants and plasmid content, with the near-complete absence of blaCTX-M in clade B, and associations of blaCTX-M-14/14-like variants with clade A and clade C1/H30R, of blaCTX-M-15 with clade C2/H30Rx, and of specific combinations of IncF replicons with certain clades. This supports the hypothesis that some plasmid replicons are acquired and persist stably within clades. Although the evolutionary dynamics of plasmid-host combinations remain to be clearly elucidated, co-evolution of host and plasmid in the case of C2/H30Rx appears to have ameliorated costs to the host and facilitated persistence of the replicon [23, 24], with on-going conjugative exchange of genetic material. The relative contribution of changing environmental influences on this co-evolution is unclear; it may also be affected by a host-plasmid “arms race” in a micro-evolutionary version of the “Red Queen Hypothesis” (antagonistic co-evolution) [25, 26].
The almost ubiquitous presence of blaCTX-M-15 in clade C2/H30Rx is most striking, and is strongly associated with the acquisition of an IncFII_AY458016-like replicon. Previous smaller studies have found that blaCTX-M-15 is frequently part of a 2,971bp ISEcp1-blaCTX-M-15-ORF477 transposition unit, with blaCTX-M-15 located 48bp downstream of the ISEcp1 IRR-R, and that this is commonly nested within a Tn2-like element (ISSWi1) [27]. One hypothesis is that an IncFII_AY458016 ancestral plasmid was acquired by a fluoroquinolone-resistant C1 host strain approximately 25 years ago, and subsequently incorporated one of these blaCTX-M-15 transposition units. In response to the widespread clinical use of third-generation cephalosporins and fluoroquinolones, the C2/H30Rx clade has expanded, and within it blaCTX-M-15 has been mobilized through further transposition events (e.g. to the chromosome) and rearrangement/recombination amongst IncFII-like plasmids, much of this associated with IS26 [28] (Fig. 3). The persistence of the IncFII_AY458016-like replicon in C2 may be attributable, at least in part, to its association with a plasmid addiction system (pemI/pemK) [29], whereas its ongoing evolution is potentially linked to the concomitant presence of FIA/FIB replicons on blaCTX-M-15 plasmids (Supplementary Figure S3) [30]. Alternative hypotheses – either of multiple, C2-restricted acquisitions of different blaCTX-M-15-containing FII_AY458016-like plasmids, or of recurrent ISEcp1-blaCTX-M-15-ORF477 unit acquisitions – seem less likely, given that (i) there are no geographic or major genotypic distinctions between clades C1 and C2 to explain why this would occur, (ii) there is a degree of homology in the flanking contexts around the gene throughout the clade, and (iii) flanking context/transformed plasmid structures also appear to be consistent within C2 sub-clusters.
Our novel comparison of transformed, sequenced plasmids demonstrates, however, that a substantial degree of similarity can exist amongst blaCTX-M plasmids found in different clades and STs. This confirms that between-clade/ST transfer of these resistance plasmids occurs, and that care is needed when inferring plasmid evolution by descent (Fig. 4). Plasmid similarity across geography in the context of host strain phylogenetic clustering and homology in regions flanking blaCTX-M (as demonstrated here) is much more likely to represent plasmid acquisition and evolution by descent rather than multiple acquisition events, but still needs to be interpreted with caution, as it may, for example, represent exposure to a common, global, plasmid reservoir.
The main study limitation is the inability with short-read sequencing and limited transformant sequencing to assess fully the flanking regions and plasmid structures across the entire dataset. In particular, the BLASTn-based heatmaps across the wider dataset represent not genetic contiguity of plasmid structures within isolates as such, but instead overall plasmid sequence presence/absence. Similarly, results from de novo assemblies of these short-read data also must be interpreted cautiously, as these assembly methods are known to increase the number of SNVs when compared with mapping-based approaches, and may result in misinterpretations of genetic structures, particularly repetitive regions [31]. Further, again relating to the limitations of short-read data, the transformant plasmid sequences comprise multiple contigs, precluding certainty as to the plasmids’ exact structure. Wider use of long-read sequencing (e.g. PacBio) could help resolve this in future studies. Many of our H30Rx/C2 clade transformant plasmids were from a single UK center; however, the genetic flanking contexts identified here have also been found in plasmid sequences from other national and international locations [27, 32-34], suggesting that these are dispersed more widely and that our results are likely generalizable.
In summary, our analysis strongly suggests that the emergence of the C2/H30Rx clade within ST131 has been driven by the acquisition of a specific FII plasmid, which has subsequently undergone major genetic restructuring within its globally dispersing bacterial host. The initial acquisition event occurred approximately 25 years ago, possibly associated with the widespread introduction of third-generation cephalosporins and fluoroquinolones, which would have exerted significant selection pressure for persistence of chromosomal fluoroquinolone mutations and presence of blaCTX-M. Sporadic gain/loss events of other, non-FII blaCTX-M-15 plasmids have also occurred, but have not dominated. Similar processes may be driving the more recent emergence of sub-lineages of ST131 with blaCTX-M-14 and blaCTX-M-27, as described in Japan (18), although for blaCTX-M-14, these appear to have occurred on at least two occasions (clades A and C1/H30R; Fig. 1). This study highlights the global imperative to reduce antimicrobial selection pressures; the capacity of these resistance plasmids for genetic re-assortment; the important role of certain insertion sequences, such as IS26, in facilitating horizontal mobility of resistance determinants; and the possibility of targeting specific replicons in an attempt to limit the spread of important resistance gene mechanisms.
METHODS
Sample collection, sequencing and sequence read processing
Isolates were obtained from wider collections held in several centers: the Shoklo Malaria Research Unit, Mae Sot, Thailand; the Lao-Oxford-Mahosot Hospital Wellcome Trust Research Unit, Vientiane, People’s Democratic Republic of Laos; the Cambodia-Oxford Medical Research Unit, Angkor Hospital for Children, Siem Reap, Cambodia; the Microbiology Laboratory, Oxford University Hospitals NHS Trust, Oxford, UK. Strains were de-duplicated by individual prior to sequencing. In addition, seven isolates collected from clinical samples across Canada between 2006 and 2008, and one isolate recovered from poultry in 2006 were included. DNA was extracted as previously described [35]. Sequence data for the eight AstraZeneca strains had been generated from a series of isolates collected by International Health Management Associates, Inc, as part of a global resistance survey; that for the Price strains was as previously described [11]. Sequencing was performed using either the Illumina HiSeq or MiSeq (100 or 151bp paired-end reads [details for non-Price strains in Supplementary Table S1]). Correct sequence type was confirmed using BLASTn-based in silico MLST typing of de novo assembled WGS data [36].
Sequence data for all ST131 strains were mapped against the E. coli SE15 (ST131) reference (RefSeq: NC_013654) [17] and variants called using a validated in-house pipeline [37]. Alignments of core variable sites (base called in all sequences, excluding “N” or “-” calls) were reinserted into the reference to form an alignment of modified reference sequences.
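The definition of a core variable site used above (a base called in every sequence, with no “N” or “-” calls, and more than one allele observed) can be illustrated with a minimal sketch. The function name and the toy alignment are hypothetical and do not represent the in-house pipeline [37].

```python
def core_variable_sites(alignment):
    """Return 0-based column indices called in every sequence and variable among them.

    `alignment` maps sequence names to equal-length strings.
    """
    missing = {"N", "-"}
    sites = []
    for index, column in enumerate(zip(*alignment.values())):
        bases = {base.upper() for base in column}
        if bases & missing:
            continue  # not a core site: at least one sequence has no base call
        if len(bases) > 1:
            sites.append(index)  # core and variable
    return sites


# Toy alignment of three hypothetical isolates mapped to the same reference.
toy_alignment = {
    "isolate_1": "ACGTAT",
    "isolate_2": "ANGTAC",
    "isolate_3": "ACCTAC",
}
print(core_variable_sites(toy_alignment))
```

In the toy alignment, the second column is excluded because one isolate has an “N” call, even though the remaining calls agree.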
De novo assemblies were generated using Velvet with the VelvetOptimizer wrapper (n=211), or A5-MiSeq [38]. The latter was used in cases where the number of assembled bases was below the expected assembly size of 4-5.5Mb (n=4 [strains la_12107_3, can_70883, can_1731_01 and can_1070] in which median optimized assembly size with Velvet was 16,004 bases, and median number of contigs only six). Using A5-MiSeq, assemblies for these four strains were generated with an appropriate median size of 5,143,908 bp and 269 contigs.
Identification/characterization of blaCTX-M and genetic context, gyrA mutations and fimH typing
BLASTn of de novo assemblies was used to identify: (i) blaCTX-M presence and variant (in-house reference gene database) [35]; (ii) genetic context for blaCTX-M, by extracting and annotating contigs containing blaCTX-M variants using PROKKA and ISFinder (manual annotation) [39, 40]; (iii) chromosomal gyrA mutations in the quinolone-resistance determining region known to be responsible for conferring most resistance to fluoroquinolones; (iv) fimH presence and variant [41]; and (v) Inc type using the downloaded PlasmidFinder [42] and pMLST databases (available at http://pubmlst.org/plasmid/) [43]. Genetic contexts for blaCTX-M were classified as chromosomal if annotations for regions flanking blaCTX-M were found to be consistently chromosomal in other E. coli strains in GenBank, and plasmid if these were associated specifically with plasmids (e.g. tra genes); otherwise, they were classified as unknown. IncFII_AY458016-like sequences were extracted, aligned and visually inspected to confirm variant types using Geneious (Version 7.1.9; http://geneious.com) [44].
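The three-way classification of genetic context can be expressed as a small decision rule. The following Python sketch assumes that each annotated flanking feature has already been labelled by hand as "chromosomal", "plasmid" or "ambiguous"; these labels, the function name, and the handling of mixed evidence are assumptions made for illustration, since the classification in this study relied on manual annotation.

```python
def classify_context(flank_labels):
    """Classify the genetic context of a blaCTX-M-containing contig.

    flank_labels: one label per annotated flanking feature, each of
    'chromosomal', 'plasmid' or 'ambiguous' (hypothetical labels).
    """
    labels = set(flank_labels)
    if labels == {"chromosomal"}:
        # Flanking annotations are consistently chromosomal.
        return "chromosomal"
    if "plasmid" in labels and "chromosomal" not in labels:
        # At least one plasmid-specific feature (e.g. tra genes), none chromosomal.
        return "plasmid"
    # No flanking sequence, ambiguous features only, or conflicting evidence.
    return "unknown"
```

Treating conflicting evidence as "unknown" is the conservative choice in this sketch; a different tie-breaking rule could be substituted.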
ST131 chromosomal phylogenetic comparisons using ClonalFrame, BEAST and BASTA
ExPEC are recombinogenic, and contain recombination hotspots with higher than average recombination rates [45]. Recombination can obscure the true phylogenetic signal, and we therefore initially analyzed the alignment of sequences with ClonalFrame [46] to identify recombinant regions; any SNVs within these regions were excluded.
Using this modified alignment, mutation rate estimates across ST131 and a time-scaled phylogeny were calculated in BEAST [47]. The model parameters were: (i) a generalized time-reversible nucleotide substitution model, (ii) four relative rates of mutation across sites, allowing for all sites to be subject to mutation (i.e. the proportion of invariant sites fixed at 0%), (iii) a strict molecular clock estimating a uniform evolutionary rate across all branches of the tree, and (iv) a constant population size. Triplicate runs with 30 million iterations were performed, with 10% discarded as burn-in. Run convergence and mixing were assessed by inspecting the run log files in Tracer v1.5 [48]; adequate convergence of run statistics and mixing for each run, and effective sample sizes (ESS) greater than 200 for all parameters, were required for an analysis to be considered adequate, in line with recommendations in the BEAST tutorials on the developers’ website (http://beast.bio.ed.ac.uk). We explored the application of several other models in BEAST incorporating a relaxed clock and variable population growth (exponential, logistic and Bayesian skyride), but these either failed to converge, showed poor mixing, or had ESS estimates of <200, and were therefore not considered robust.
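To make the burn-in and ESS criteria concrete, the sketch below discards the first 10% of a sampled parameter trace and estimates the ESS from the sample autocorrelation, summing lags until the first non-positive value. This is a simplified textbook estimator written for illustration; it is not the algorithm implemented in Tracer, and the function names are hypothetical.

```python
def discard_burn_in(samples, fraction=0.10):
    """Drop the first `fraction` of the chain as burn-in."""
    start = int(len(samples) * fraction)
    return samples[start:]


def effective_sample_size(samples):
    """Estimate ESS as n / (1 + 2 * sum of positive-lag autocorrelations),
    truncating the sum at the first non-positive autocorrelation."""
    n = len(samples)
    if n < 2:
        raise ValueError("need at least two samples")
    mean = sum(samples) / n
    centered = [x - mean for x in samples]
    variance = sum(c * c for c in centered) / n
    if variance == 0:
        raise ValueError("chain is constant; ESS is undefined")
    rho_sum = 0.0
    for lag in range(1, n):
        cov = sum(centered[i] * centered[i + lag] for i in range(n - lag)) / n
        rho = cov / variance
        if rho <= 0:
            break
        rho_sum += rho
    return n / (1.0 + 2.0 * rho_sum)


def is_adequate(samples, threshold=200):
    """Apply the ESS > 200 rule to a trace after removing burn-in."""
    return effective_sample_size(discard_burn_in(samples)) > threshold
```

A strongly trending trace, such as a chain that has not yet reached stationarity, yields a very low ESS under this estimator and would fail the threshold.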
We used the phylogeographic method BASTA [18] in the Bayesian phylogenetic package BEAST 2.2.1 [49] to infer patterns and rates of migration between geographical regions from the genome alignment, collection dates, and sampling locations. Initially, we grouped samples into three discrete locations (North America, South-East Asia, and Europe) and disregarded samples from South America and Australasia because of the small sample numbers. Due to the non-random sampling scheme, we estimated only a single effective population size, equal for all locations, and a symmetric migration rate matrix. The analysis was run for 10^8 Markov chain Monte Carlo (MCMC) steps. We subsequently re-ran the analysis including a fourth, unsampled deme, using the same model parameters, to determine whether this altered the outcome.
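The consequence of assuming a symmetric migration rate matrix is that one rate is estimated per pair of demes rather than per direction. The Python sketch below builds such a matrix and counts the free rate parameters for the three-deme and four-deme analyses. The rate values are arbitrary placeholders, not estimates from this study, and the code is not part of BASTA or BEAST.

```python
from itertools import combinations


def symmetric_migration_matrix(demes, pair_rates):
    """Build a symmetric migration rate matrix as a nested dict.

    pair_rates maps frozenset({deme_a, deme_b}) to a single rate that is
    used in both directions; the diagonal is left at 0.0.
    """
    matrix = {a: {b: 0.0 for b in demes} for a in demes}
    for a, b in combinations(demes, 2):
        rate = pair_rates[frozenset((a, b))]
        matrix[a][b] = rate
        matrix[b][a] = rate
    return matrix


def n_free_rates(demes):
    """Number of pairwise rates under symmetry: d * (d - 1) / 2."""
    return len(list(combinations(demes, 2)))


demes = ["North America", "South-East Asia", "Europe"]
# Placeholder rates (hypothetical values for illustration only).
placeholder_rates = {
    frozenset(("North America", "South-East Asia")): 0.1,
    frozenset(("North America", "Europe")): 0.2,
    frozenset(("South-East Asia", "Europe")): 0.3,
}
matrix = symmetric_migration_matrix(demes, placeholder_rates)
```

With three demes this gives three rates; adding a fourth, unsampled deme increases the number to six.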
Plasmid transformations, sequencing and analyses
Plasmid transformants were generated from 30 strains chosen on the basis of tree topology and association with CTX-M variants, aiming to transform at least one plasmid from each of the major CTX-M variant clusters. Two blaCTX-M containing plasmids from non-ST131 E. coli (one ST617/blaCTX-M-15, one ST405/blaCTX-M-55) were also transformed and sequenced as an external comparison.
Plasmid DNA was extracted from sub-cultures of frozen stock grown overnight on blood agar, followed by selective culture of a single colony in Luria-Bertani (LB) broth with ceftriaxone at 1 μg/mL. DNA extraction was performed using the Qiagen plasmid mini-kit (Qiagen, Venlo, Netherlands), in accordance with the manufacturer’s instructions, with the addition of Glycoblue™ co-precipitate (Life Technologies, Carlsbad, USA) to the DNA eluates prior to isopropanol precipitation to enable better visualization of the DNA pellet. Plasmid DNA was re-dissolved in distilled water and then typically electroporated on the same day, or stored in the fridge prior to electroporation within 24 hours.
Commercially prepared DH10B E. coli (ElectroMAX™ DH10B™ Cells; Invitrogen/Life Technologies, Carlsbad, California, USA) were used as the recipient cell strain for plasmid electroporation, because of their high transformation efficiencies and the fact that the strain has been fully sequenced (NCBI RefSeq: NC_010473.1) [50]. Briefly, 2 μl of plasmid DNA (extracted as above) were mixed with 20 μl of electrocompetent cells in a small Eppendorf tube on ice. The mix was then pipetted into a pre-chilled 0.2cm electroporation cuvette (Bio-Rad, Hercules, California, USA), placed in the MicroPulser™ electroporator, and an electric shock was applied (Ec2 settings; typically 2.5 kV applied for less than 5msec). The shocked cells were immediately suspended in 9mls of pre-warmed SOC medium (Super Optimal Broth with Catabolite repression) in a clean Eppendorf tube, and incubated at 37°C, whilst being shaken at 220rpm, for one hour. One hundred microliters of the transformant cell suspension was then plated onto pre-warmed LB agar plates (Becton Dickinson, Franklin Lakes, New Jersey, USA), infused with ceftriaxone (1 μg/mL). pUC19 DNA was provided with the purchased cells and was used as a positive control for the success of transformation. Selective agars were incubated with known positive and negative control strains with each set of transformations.
Sequencing was performed on the Illumina HiSeq or MiSeq, generating 150 or 300-base paired-end reads (Supplementary Table S2). Sequencing reads from the isolate from which the transformed plasmid had been obtained were mapped back to the transformed plasmid assembly in order to ascertain the reliability of the assembly in each case. Reads were assembled using A5/A5-MiSeq [38], and assembled contigs annotated with PROKKA [39]. The median plasmid assembly size was 122,786 bp (range: 72,449-171,919 bp), with a median of 22 contigs in each assembly (range: 1-33). Using longer reads (300bp; MiSeq platform) resulted in a significantly smaller number of contigs per assembly (median 17 versus 25; ranksum p=0.003). Mapping was used to assess the reliability of our plasmid constructs and reflected the content present in each transformed and assembled resistance plasmid, with the exception of 8A16G_T.
A single strain (P46212) from the dataset was also sequenced using long-read technology (PacBio); the CTX-M-15 plasmid (pP46212) from this strain was assembled into a single, circularized contig as described [51].
Plasmid content across the dataset was investigated in a number of ways. Firstly, the transformant plasmid sequences generated were used as references, against which BLASTn-based comparisons for degree of presence/absence were made for the whole dataset. We used default BLASTn settings, with the respective reference divided into 100bp bins. Secondly, pairwise comparisons between each set of transformant assemblies were undertaken by identifying the extent of shared homology of sequences across a number of subsets representing different groupings on the main host tree, again using BLASTn. For each pair, two percentage-similarity statistics were generated, taking each member of the pair as a reference in turn, to account for differences in length, and using default BLASTn settings. The mean percentage pairwise divergence was then plotted against the time to most recent common ancestor (TMRCA) of the two host strains (derived from the time-scaled tree). Thirdly, for visualization, plasmid sequences were compared using ProgressiveMauve [52], with assembled contigs reordered with respect to the pP46212 PacBio-generated CTX-M-15 plasmid reference, using the “Move contigs” tool. For this, three transformants were excluded: 8A16G_T because of issues surrounding the assembly, 11B01979_T because it was virtually identical to blaCTX-M-14 plasmid transformants in clade A, and 19B19L_T because it lacked an FII replicon. Finally, annotated nucleotide sequences across transformant groups of interest were clustered using CD-Hit [53] [-c 1.0 -n 5 -d 0 -g 1], to identify whether any coding sequences were shared, and whether there might be any biological significance associated with these on the basis of their annotations.
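The first two comparisons can be sketched in a few lines of Python. The example assumes that BLASTn matches have already been parsed into (start, end) reference coordinates (1-based, inclusive), and that divergence is taken as 100 minus percentage similarity; both are assumptions made for illustration, and the function names are hypothetical rather than those of the scripts used in this study.

```python
def bin_coverage(ref_length, hits, bin_size=100):
    """Return, for each consecutive bin of the reference, the fraction of
    positions covered by at least one match.

    hits: iterable of (start, end) reference coordinates, 1-based inclusive;
    coordinates are tolerated in either order.
    """
    covered = bytearray(ref_length)
    for start, end in hits:
        lo, hi = sorted((start, end))
        lo = max(lo, 1)
        hi = min(hi, ref_length)
        for pos in range(lo - 1, hi):
            covered[pos] = 1
    fractions = []
    for bin_start in range(0, ref_length, bin_size):
        window = covered[bin_start:bin_start + bin_size]
        fractions.append(sum(window) / len(window))
    return fractions


def mean_pairwise_divergence(similarity_a_ref, similarity_b_ref):
    """Average the divergence obtained with each plasmid as reference in turn.

    Inputs are percentage similarities (0-100); divergence is assumed to be
    100 minus similarity.
    """
    return ((100.0 - similarity_a_ref) + (100.0 - similarity_b_ref)) / 2.0


# Toy example: a 250 bp reference with two matches.
toy_bins = bin_coverage(250, [(1, 100), (151, 200)])
toy_divergence = mean_pairwise_divergence(90.0, 80.0)
```

Averaging the two directional statistics means that a short plasmid fully contained within a longer one is not scored as identical to it.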
Sequencing data for the new isolates sequenced for this study have been deposited in the NCBI short read archive (BioProject number: PRJNA297860, 108 ST131 sequences and 30 blaCTX-M plasmid transformants).
FUNDING INFORMATION
JRJ has received grants and/or consultancies from Actavis, ICET, Jannsen/Crucell, Merck, Syntiron, and Tetraphase. JRJ, LBP, and ES have submitted patent applications pertaining to tests for specific E. coli strains. This material is based in part upon work supported by Office of Research and Development, Medical Research Service, Department of Veterans Affairs, grant #1 I01 CX000192 01 (JRJ) and NIH R01 AI106007 (EVS). ARM is supported through funding from the Canadian Institutes of Health Research (MOP-114879). NS was funded through a Wellcome Trust Clinical Research Fellowship during this study (099423/Z/12/Z).
The funders had no role in study design, data collection and interpretation, or the decision to submit the work for publication.
COMPETING INTERESTS
The authors have no specific conflicts of interest to declare.
SUPPLEMENTARY FILE LEGENDS
Supplementary Table S1. Details of new ST131 strains included in the analysis.
Supplementary Figure S1. Genetic contexts of clade A-associated blaCTX-M-14/14-like variants. Many contexts are limited by the extent of the assembled region around the blaCTX-M-14/14-like gene (marked with “X”). For all aligned, similarly colored regions, sequence homology is preserved; curly brackets cluster those isolates with identical flanking sequences. Flanking contexts not shown for isolates with known chromosomal integration, or for blaCTX-M negative/non-blaCTX-M-14/14-like isolates in the sub-clusters. Coloring of isolate names reflects geographic locations (blue = North America, red = Europe, green = South-East Asia, yellow = Australasia).
Supplementary Figure S2. Genetic contexts of clade C1-associated blaCTX-M-14/14-like variants. Many contexts are limited by the extent of the assembled region around the blaCTX-M-15 gene (marked with “X”). For all aligned, similarly colored regions, sequence homology is preserved; curly brackets cluster those isolates with identical flanking sequences. Flanking contexts not shown for isolates with known chromosomal integration, or for blaCTX-M negative/non-blaCTX-M-14/14-like isolates in the sub-cluster. Coloring of isolate names reflects geographic locations (blue = North America, red = Europe, green = South-East Asia, yellow = Australasia).
Supplementary Figure S3. Inc types identified in whole isolate sequencing data, plotted with respect to ST131 host strain phylogeny. Blast match (%) denotes a composite score of percentage matched length and percentage homology to reference Inc sequence, with highest percentage score/contig hit represented. Matches < 80% were excluded. Reference Inc sequences were downloaded from the PlasmidFinder database; those that were present (Blast match ≥ 80%) in at least one isolate are represented on the x-axis.
Supplementary Table S2. Details of transformed blaCTX-M plasmid sequences.
Supplementary Figure S4. BLASTn-based comparisons across the ST131 dataset, using la_12107–3 and la_5220–3_T as references. Color represents degree of presence/absence of corresponding reference sequence on an isolate-by-isolate basis per row. Rows/isolates arranged as in the Fig. 1 phylogeny.
ACKNOWLEDGEMENTS
We are grateful to the patients and staff at the healthcare, microbiology laboratory and research units contributing isolates to this study, including Prof Nicholas Day of the Mahidol-Oxford Tropical Medicine Research Unit, Bangkok, Thailand; Prof Paul Newton and Dr David Dance of the Lao-Oxford-Mahosot Hospital-Wellcome Trust Research Unit, Vientiane, Laos; the Ped Study Team and Microbiology Laboratory at Patan Hospital, Kathmandu, Nepal; and Prof Francois Nosten of the Shoklo Malaria Research Unit, Mae Sot, Thailand. We thank Prof Peter Donnelly and the staff at the Sequencing Center, Wellcome Trust Center for Human Genetics, Oxford, UK, for their sequencing work, and Dr Laura Matseje of the Public Health Agency of Canada for sharing her laboratory protocol for plasmid transformation. We are grateful to Prof Johann Pitout, Prof Nicholas Day, Dr Amy Mathers and Dr Chris Parry for their critical review of the draft manuscript.