ABSTRACT
The dissemination of carbapenem resistance in Escherichia coli has major implications for the management of common human infections. blaKPC, encoding a transmissible carbapenemase (KPC), has historically been associated largely with Klebsiella pneumoniae, a predominant plasmid (pKpQIL), and a specific transposable element (Tn4401, ~10 kb). Here we characterize the genetic features of the emergence of blaKPC in global E. coli isolates, 2008-2013, using both long- and short-read whole genome sequencing.
Amongst 43/45 successfully sequenced blaKPC-E. coli strains, we identified high diversity at every level examined: strains (n=21 sequence types; only 18% of annotated genes in the core genome); plasmids (≥9 replicon types); and blaKPC-associated mobile genetic elements (MGEs; 50% not within complete Tn4401 elements). We also found evidence of interspecies, regional and international plasmid spread. In several cases blaKPC was found on small, high copy number Col-like plasmids, which have previously been associated with horizontal transmission of resistance genes in the absence of antimicrobial selection pressures.
E. coli is a common human pathogen, but also a commensal in multiple environmental and animal reservoirs, and is easily transmissible. The association of blaKPC with a range of MGEs previously linked to the successful spread of widely endemic resistance mechanisms (e.g. blaTEM, blaCTX-M) suggests that blaKPC is likely to become similarly prevalent.
INTRODUCTION
Carbapenemases have emerged over the last 15 years as one of the most significant antimicrobial resistance threats in Enterobacteriaceae, a family that includes many major human pathogens1. These enzymes have broad-spectrum hydrolytic activity against most beta-lactams, and are commonly associated with other resistance mechanisms that confer cross-resistance to other antimicrobial classes2. The Klebsiella pneumoniae carbapenemase (KPC) enzyme, encoded by alleles of the blaKPC gene, represents one of the five major carbapenemase families, the others being the VIM, IMP and NDM metallo-beta-lactamases, and the OXA-48-like oxacillinases3. The first KPC-producer, a K. pneumoniae strain harbouring blaKPC-2, was identified in 1996 in the eastern USA; since then, KPC-2 and KPC-3 (H272Y [C814T] with respect to KPC-2) have become widespread, and are particularly entrenched in endemic hotspots in the USA, Greece, Israel, China and parts of Latin America4,5. The epidemic K. pneumoniae lineage ST258 is thought to have contributed significantly to the global dissemination of the blaKPC-2 and blaKPC-3 alleles6, although both have now been observed in several species of the family Enterobacteriaceae7,8.
Acquired carbapenem resistance in Escherichia coli was considered rare as recently as 2010, although the first cases of blaKPC in E. coli were observed as early as 2004-2005 in Cleveland (n=1, KPC-2)9, New York City (n=2, KPC-2) and New Jersey, USA (n=1, KPC-3)10, and in Tel Aviv, Israel (n=4, KPC-2)11,12. No apparent epidemiological links were observed between any of these cases. Genotyping was limited at the time, but suggested diversity in both the host E. coli strains and the blaKPC plasmid backgrounds. Since then, direct, plasmid-mediated transfer of blaKPC into E. coli within human hosts has been observed13, and clusters of KPC-producing E. coli have been identified in several geographic locations, from China to Puerto Rico14,15, in the context of clinical infections14,15, asymptomatic colonization16 and environmental isolates17.
More recently there has been significant concern around the identification of blaKPC in E. coli sequence type (ST) 131, a globally disseminated and clinically successful strain18-20. Notably, the H30R/C1 (fluoroquinolone-resistant) and H30Rx/C2 (fluoroquinolone- and extended-spectrum cephalosporin-resistant) sub-lineages of this strain have previously expanded globally in association with particular drug resistance mechanisms, including the extended-spectrum beta-lactamase (ESBL) gene blaCTX-M-15 (clade C2)21,22. Given the high rates of community- and healthcare-associated infections attributable to ST131 strains23, and their capacity to be carried asymptomatically in the gastrointestinal tract24, a stable association of ST131 with blaKPC could have dramatic consequences for the management of E. coli infections12.
Despite these concerns, detailed molecular epidemiological data on the genetic structures associated with blaKPC in E. coli, and on the extent to which these are shared with other Enterobacteriaceae, remain very limited. Here we used a combination of short-read (Illumina) and long-read (PacBio) sequencing to reconstruct the chromosome and plasmid sequences of 44 blaKPC-positive E. coli isolates obtained consecutively from global surveillance schemes (67 participating countries, 2008-2013), fully resolving the blaKPC-containing plasmids in 24 cases, and compared these data with other publicly available blaKPC plasmid sequences.
RESULTS
Global blaKPC-E. coli strains are diverse, even within the most prevalent ST, ST131, with evidence for local transmission
Forty-five isolates were obtained from 21 cities in 11 countries across four continents (2010-2013; previous laboratory typing results are summarized in Supplementary Table 1). One isolate was blaKPC-negative on sequencing (ecol_252), and for another (ecol_451) the whole genome sequencing (WGS) data were inconsistent with the laboratory typing result on two separate occasions; both isolates were therefore excluded from analyses. Twenty-one different E. coli STs, predicted in silico from WGS data, were represented amongst the remaining 43 isolates (Table 1), including ST131 [n=16], ST410 [n=4], ST38 [n=3], and ST10 and ST69 [n=2 each]; each of the remaining 16 isolates belonged to a distinct singleton ST.
Of 16,053 annotated open reading frames (ORFs) identified across all KPC-E. coli isolates, only 2,950 (18.4%) were shared by all isolates (“core”), and a further 222 (1.4%) were present in ≥95% but <100% of isolates (“soft core”25). At the nucleotide level there were 213,352 single nucleotide variants (SNVs) in the core genome, consistent with the previously observed diversity in the species26. Resistance gene profiles also varied markedly between strains, with some harbouring several beta-lactam, aminoglycoside, tetracycline and fluoroquinolone resistance mechanisms (e.g. ecol_224) and others containing blaKPC only (e.g. ecol_584; Figure 1). For the 16 KPC-ST131 strains, 4,071/7,910 (51%) ORFs were core, with 6,778 SNVs across the core genome of these isolates, again consistent with previous global studies of ST131 diversity21,22 (Supplementary Figure S1). Accessory genomes were highly concordant for some (e.g. ecol_356/ecol_276/ecol_875), but not all (e.g. ecol_AZ159/ecol_244), isolates that were closely related in their core genomes, suggesting that core and accessory genomes can follow markedly different evolutionary dynamics (Figure 1). The geographic distribution of isolates closely related in both their core and accessory genomes supports local transmission of particular KPC-E. coli strains (e.g. ecol_AZ166 and ecol_AZ167 [ST131, Beijing, China]). The similarity of the genetic motifs flanking blaKPC in these closely related isolate pairs is also consistent with this hypothesis, and less consistent with multiple independent acquisitions of blaKPC within the same genetic background, especially given the diversity in blaKPC flanking sequences observed across the rest of the dataset (see below).
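The core and soft-core categories above amount to a threshold rule on how many isolates carry each gene cluster. The Python sketch below applies the cut-offs as stated in the text (core: all isolates; soft core: ≥95% but <100%) to a hypothetical presence/absence mapping. The function names and toy data are illustrative, and this is not the Roary implementation itself.

```python
from collections import Counter
from typing import Dict, Set


def classify_cluster(n_present: int, n_isolates: int) -> str:
    """Assign a gene cluster to a category using the thresholds described in the text."""
    if n_present == n_isolates:
        return "core"  # shared by every isolate
    if 100 * n_present >= 95 * n_isolates:
        return "soft_core"  # >=95% but <100%; integer arithmetic avoids float rounding
    return "accessory"


def summarize(presence: Dict[str, Set[str]], n_isolates: int) -> Dict[str, float]:
    """Percentage of gene clusters in each category, to one decimal place."""
    counts = Counter(classify_cluster(len(carriers), n_isolates) for carriers in presence.values())
    total = len(presence)
    return {cat: round(100 * counts[cat] / total, 1) for cat in ("core", "soft_core", "accessory")}


# Hypothetical toy dataset: 20 isolates, three gene clusters.
isolates = {f"isolate_{i}" for i in range(20)}
toy = {
    "geneA": isolates,                    # 20/20 -> core
    "geneB": isolates - {"isolate_0"},    # 19/20 -> soft core
    "geneC": {"isolate_1", "isolate_2"},  # 2/20  -> accessory
}
print(summarize(toy, 20))

# The same percentage arithmetic applied to the counts reported in the text.
print(round(100 * 2950 / 16053, 1), round(100 * 222 / 16053, 1))
```

Counting isolates rather than gene copies matters here. A paralogous gene present twice in one genome still contributes only one carrier isolate.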
blaKPC genes appear restricted to plasmid contexts in E. coli, but may exist in multiple copies on single plasmid structures or in high copy number plasmids
Thirty-four isolates (79%) contained blaKPC-2, and nine isolates (21%) blaKPC-3. Chromosomal integration of blaKPC has been previously described in other Enterobacteriaceae, and in Pseudomonas and Acinetobacter spp., but remains rare8,27,28; there was no evidence of chromosomal integration of blaKPC in either the 18 chromosomal structures reconstructed from long-read sequencing or the blaKPC-containing contigs from all 42 evaluable isolates. blaKPC alleles were not segregated by ST.
Estimates of blaKPC copy number per bacterial chromosome ranged from <1 (ecol_879, ecol_881) to 55 (ecol_AZ152). In nine cases this estimate was ≥10 copies of blaKPC per bacterial chromosome (ecol_276, ecol_356, ecol_867, ecol_869, ecol_870, ecol_875, ecol_AZ150, ecol_AZ152, ecol_AZ159; Supplementary Table 2). In six of these isolates blaKPC was in a col-like plasmid context; in two the plasmid replicon type was unknown; and in one it was on an IncN plasmid. Higher plasmid copy number can increase the level of antibiotic resistance when the relevant gene is carried on a high-copy unit. High copy number plasmids are also postulated to be more likely to persist in descendant cells, because random segregation alone distributes them reliably at cell division without the need for partitioning systems29, and to be more likely to be transferred in any given conjugation event, either directly or indirectly30-32.
blaKPC and non-blaKPC plasmid populations across global blaKPC-E. coli strains are extremely diverse
Plasmid Inc typing across all isolates revealed a median of four plasmid replicon types per isolate (range: 1-6; IQR: 3-5), representing wide diversity (Table 1). However, IncN, col, IncFIA and IncI1 replicons were over-represented in certain STs (p<0.05; Table 1). Within the 18 isolates that underwent PacBio sequencing, we identified 53 closed, non-blaKPC plasmids, ranging from 1,459 bp to 289,903 bp (Supplementary Table 1; at least four additional, partially complete plasmid structures were present). Of these non-blaKPC plasmids, 10 (size: 2,571-150,994 bp) had <70% similarity (defined as percent sequence identity multiplied by the proportion of the query length showing homology) to any sequence available in GenBank, highlighting that a significant proportion of the “plasmidome” in KPC-E. coli remains incompletely characterized. For the other 43 plasmids, the top match in GenBank was a plasmid from E. coli in 35 cases, K. pneumoniae in 5 cases, and Citrobacter freundii, Shigella sonnei and Salmonella enterica in 1 case each (Supplementary Table 3).
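The text does not state how copy number was estimated. A common approach, assumed here, is to divide the mean read depth over blaKPC by the typical (median) read depth across the chromosome, on the premise that depth scales with copy number. The sketch below uses hypothetical depth values. `estimate_copy_number` and `is_high_copy` are illustrative names, and only the ≥10 threshold comes from the text.

```python
from statistics import mean, median
from typing import Sequence


def estimate_copy_number(gene_depths: Sequence[float], chromosome_depths: Sequence[float]) -> float:
    """Approximate gene copies per chromosome as a ratio of read depths.

    Assumes depth scales linearly with copy number; the chromosomal median is used as
    the single-copy baseline because it is robust to repeats and deletions.
    """
    if not gene_depths or not chromosome_depths:
        raise ValueError("depth profiles must be non-empty")
    baseline = median(chromosome_depths)
    if baseline <= 0:
        raise ValueError("chromosomal baseline depth must be positive")
    return mean(gene_depths) / baseline


def is_high_copy(copy_number: float, threshold: float = 10.0) -> bool:
    """Apply the >=10 copies per chromosome cut-off used in the text."""
    return copy_number >= threshold


# Hypothetical per-base depths, for illustration only.
chromosome = [38, 40, 40, 41, 42]
gene = [2150, 2200, 2250]
cn = estimate_copy_number(gene, chromosome)
print(cn, is_high_copy(cn))
```

Per-base depths of this kind could be taken from Illumina reads mapped to a closed assembly. Sequencing biases, such as uneven coverage with GC content, would add noise to any such ratio.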
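The similarity score defined above combines two quantities: percent identity, and the fraction of the query plasmid covered by the match. The Python sketch below computes the score and applies the <70% cut-off to a list of hits. The `Hit` structure and function names are hypothetical. The sketch assumes that overlapping local alignments have already been merged, so that covered query bases are not double-counted.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass
class Hit:
    """Summary of one database match for a query plasmid (hypothetical structure)."""
    subject: str
    percent_identity: float   # identity over the aligned region, e.g. 99.2
    aligned_query_bases: int  # query positions covered, after merging overlapping alignments
    query_length: int


def similarity(hit: Hit) -> float:
    """Percent identity multiplied by the proportion of the query covered (0-100 scale)."""
    coverage = min(hit.aligned_query_bases, hit.query_length) / hit.query_length
    return hit.percent_identity * coverage


def best_similarity(hits: Iterable[Hit]) -> float:
    """Highest similarity across all hits; 0 when there are no hits."""
    return max((similarity(h) for h in hits), default=0.0)


def poorly_characterized(hits: Iterable[Hit], threshold: float = 70.0) -> bool:
    """True when no database sequence reaches the similarity threshold."""
    return best_similarity(hits) < threshold


hits = [
    Hit("subject_A", 99.0, 50_000, 100_000),  # near-identical but covers half the query -> 49.5
    Hit("subject_B", 90.0, 75_000, 100_000),  # -> 67.5
]
print(best_similarity(hits), poorly_characterized(hits))
```

Multiplying identity by coverage penalizes short, near-identical matches. A plasmid that shares only one transposon with a database entry therefore scores low, even at 99% identity.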
Twenty-three blaKPC plasmid structures were fully resolved (five from Illumina data), ranging from 14,029 bp to 287,067 bp (median = 55,434 bp; IQR: 44,320-85,865 bp). These blaKPC-containing plasmids, and five additional cases where blaKPC was identified on a replicon-containing contig, were highly diverse based on Inc typing (Supplementary Table 1). IncN was the most common type (n=7; 30%), followed by small, col-like plasmids (n=5 [col-like plasmids with single replicons only]; 22%). Other less common types were: A/C2, FII(k), U (all n=2); and L/M, P, Q1 and R (all n=1). Four (14%) blaKPC plasmids were multi-replicon constructs, namely: col/repA, FIB/FII, FIA/FII, and FIA/FII/R.
Common IncN plasmid backbones have dispersed globally within E. coli
From GenBank, we selected all unique, fully sequenced IncN-blaKPC plasmid sequences (Supplementary Table 4) for comparison, dating from as early as 2005, around the time of the earliest reports of KPC-producing E. coli. The plasmid backbones and flanking sequences surrounding blaKPC in these 16 plasmid references and a subset of 12 study sequences (see “Methods”) were consistent with multiple acquisitions of two known IncN-Tn4401-blaKPC complexes in divergent E. coli STs: firstly, within a Plasmid-9 (FJ223607, 2005, USA)-like background, and secondly, within a Tn2/3-like element in a Plasmid-12 (FJ223605, 2005, USA)-like background.
In the first instance, genetic similarities were identified between Plasmid-9, pKPC-FCF/3SP, pKPC-FCF13/05, pCF8698, pKP1433 (a hybrid IncN plasmid), and the blaKPC plasmids from isolates ecol_516, ecol_517, ecol_656 and ecol_736 (this study). Plasmid-9 contains duplicate Tn4401b elements in reverse orientation, with four different 5bp flanking sequences, in an atypical arrangement within a group II intron33. The backbone structures of the other plasmids in this group are consistent with a separate acquisition of a Tn4401b element between the pld and traG regions of an ancestral version of the Plasmid-9 structure, generating a flanking TTCAG target site duplication (TSD) (labelled “Plasmid 9-like plasmid (hypothetical)” in Figure 2). International spread followed by local evolution, both within and across species, could account for the differences between plasmids, including: (i) nucleotide-level variation (observed in all plasmids); (ii) small insertion/deletion events (observed in all plasmids); (iii) larger insertion/deletion events mediated by transposable elements (e.g. pCF8698_KPC_2); and (iv) likely homologous recombination, resulting both in clustered variation within a similar plasmid backbone (e.g. ecol_656/ecol_736) and in more distinct rearrangements, including the formation of “hybrid” plasmids (e.g. pKP1433; Figure 2).
In Plasmid-12 (FJ223605), Tn4401b has inserted into a hybrid Tn2-Tn3-like element (with associated drug resistance genes including blaTEM-1, blaOXA-9 and several aminoglycoside resistance genes), albeit without a target site duplication, possibly as the result of an intramolecular, replicative transposition event generating mismatched target site sequences (L TSS = TATTA; R TSS = GTTCT). This complex is in turn located between two IS15DIV (IS15Δ)/IS26-like elements flanked by 8bp inverted repeats, which lie between the traI (891bp from the 3’ end) and pld loci (~28 kb; Figure 3A). The backbone components of the IncN Plasmid-12 are consistent with those seen in an NIH outbreak8 and, in rearranged form, in a University of Virginia outbreak (CAV1043; 2008)7. From this study, plasmids from ecol_224, ecol_881, ecol_AZ159 and ecol_422, and scaffolds from ecol_AZ151, ecol_744 and ecol_AZ150, all share near-identical structures with Plasmid-12, with clustered nucleotide-level variation in the traJ-traI genes, consistent with a homologous recombination event affecting this region, and evidence of sporadic insertion/deletion events (Figure 3A). However, the blaKPC-Tn4401 structures in these isolates are almost entirely degraded by the presence of other mobile genetic elements (MGEs), including Tn2/Tn3-like elements, ISKpn8/27 and Tn1721. In ecol_224, blaKPC-2 has been inserted into the IncN backbone as part of two inverted, repeated Tn3-like structures, flanked by a TTGCT TSD, and lies closer to traI (136bp from the 3’ end) than the IS15DIV (IS15Δ)/IS26-like complex in Plasmid-12 (Figure 3B).
Although it is not possible to trace the evolutionary history of this genomic region accurately with the available data, the presence of shared signatures of this structure in ecol_422, ecol_744, ecol_881, ecol_AZ159, ecol_AZ150 and ecol_AZ151 suggests a shared acquisition, followed by multiple rearrangements mediated by the large number of MGEs flanking blaKPC-2.
Col-like plasmids may represent an important vector of transmission for blaKPC in E. coli
Small col-like plasmids were the second most common type of plasmid carrying blaKPC in E. coli (n=5 [plasmids with single replicons only]), and three of these were identical (blaKPC-2, 16,559bp), all from ST131 isolates collected in Pittsburgh, USA, over a three-year period (ecol_276 [PacBio; 2010], ecol_356 [2011], ecol_875 [2013]). These three isolates additionally contained FIA, FIB, FII, X3 and X4 replicons, suggesting stable persistence of a clonal strain and its plasmids over time, consistent with both the SNV/core and accessory genome analyses (Figure 1, Figure S1).
The other two col-like plasmids essentially consist of short stretches of DNA encoding different mobilization genes (mbeA/mbeC/mbeD) linked to Tn4401/blaKPC modules. The 5bp sequences flanking Tn4401 were consistent with direct, intermolecular transposition in both cases (ecol_870: TGTTT-TGTTT; ecol_867: TGTGA-TGTGA). A col/repA co-integrate plasmid was also observed in this dataset (ecol_AZ161), in which Tn4401b was inserted between colE3 signature sequences and a Tn3 element (Tn4401 TSS: AGATA-GTTCT). The formation of such co-integrate plasmid structures in E. coli has also been described previously34, including a fused col/pKpQIL-like plasmid (pKpQIL being historically associated with blaKPC)35.
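The argument that high copy number plasmids persist without partitioning systems can be illustrated with a textbook idealization. Suppose each of N plasmid copies present at division independently goes to either daughter cell with probability 1/2. A plasmid-free daughter then arises with probability 2 × (1/2)^N = 2^(1-N). The Python sketch below computes this and checks it by simulation. It ignores multimer formation, copy-number control and unequal division, so it is illustrative only, and the function names are hypothetical.

```python
import random


def loss_probability(copies_at_division: int) -> float:
    """Probability that one daughter cell receives no plasmid copies under random segregation.

    Each copy independently goes to either daughter with probability 1/2, so a given daughter
    is plasmid-free with probability (1/2)**N; the two outcomes are mutually exclusive for
    N >= 1, giving 2 * (1/2)**N = 2**(1 - N).
    """
    if copies_at_division < 1:
        raise ValueError("need at least one plasmid copy at division")
    return 2.0 ** (1 - copies_at_division)


def simulate_loss(copies_at_division: int, trials: int, seed: int = 0) -> float:
    """Monte Carlo estimate of the same probability, as a check on the formula."""
    rng = random.Random(seed)
    losses = 0
    for _ in range(trials):
        to_first = sum(rng.random() < 0.5 for _ in range(copies_at_division))
        if to_first in (0, copies_at_division):  # one daughter received every copy
            losses += 1
    return losses / trials


for n in (2, 10, 20):
    print(n, loss_probability(n), simulate_loss(n, 10_000))
```

Under this model, each additional copy halves the chance of segregational loss. This is the intuition behind the stability of small, high copy number plasmids in the absence of selection.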
Col-like plasmids have been associated with KPC-producers in other smaller, regional studies19,36. Of concern, these small vectors have been shown to be responsible for the inter-species diffusion of qnr genes mediating fluoroquinolone resistance, even in the absence of any obvious antimicrobial selection pressure37. The significant association of col-like plasmids with particular E. coli STs (predominantly ST131) in this study could be one explanation for the disproportionate representation of blaKPC in this lineage.
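The text reports that some replicons are over-represented in particular STs (p<0.05) and that col-like plasmids are associated with ST131, but it does not name the statistical test. Fisher's exact test on a 2×2 table (ST of interest vs other STs; replicon present vs absent) is one standard choice and is assumed here. The dependency-free sketch below sums the hypergeometric probabilities of all tables with the observed margins that are no more probable than the observed table. Any correction for multiple comparisons across STs and replicons is not shown.

```python
from math import comb


def fisher_exact_two_sided(a: int, b: int, c: int, d: int) -> float:
    """Two-sided Fisher's exact test p-value for the 2x2 table [[a, b], [c, d]].

    Example layout: rows = ST of interest / all other STs;
    columns = replicon present / replicon absent.
    """
    row1, col1, n = a + b, a + c, a + b + c + d

    def table_prob(x: int) -> float:
        # Hypergeometric probability of x in the top-left cell given fixed margins.
        return comb(col1, x) * comb(n - col1, row1 - x) / comb(n, row1)

    lo, hi = max(0, row1 + col1 - n), min(row1, col1)
    p_obs = table_prob(a)
    probs = [table_prob(x) for x in range(lo, hi + 1)]
    # Sum tables no more probable than the observed one; the small relative
    # tolerance keeps ties from being dropped by floating-point rounding.
    return min(1.0, sum(q for q in probs if q <= p_obs * (1 + 1e-7)))


# Hypothetical counts for illustration: a replicon found in 9 of 16 ST131 isolates
# and in 3 of 27 isolates of other STs.
p = fisher_exact_two_sided(9, 7, 3, 24)
print(f"p = {p:.4g}")
```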
Diverse Tn4401 5bp target site sequences (TSSs) support high transposon mobility
Complete Tn4401 isoforms carrying blaKPC-2 or blaKPC-3 were observed in only 24/43 (56%) isolates, including Tn4401a/a-like (n=10; one isolate with a contig break upstream of blaKPC), Tn4401b (n=12) and Tn4401d (n=2) variants. Eleven different 5bp target site sequence (TSS) pairs were identified, of which 7 (64%) were not observed in any comparison plasmid downloaded from GenBank (Supplementary Table 5). Tn4401a had three different 5bp TSS pairs, Tn4401b seven, and Tn4401d one. Most pairs represented TSDs, but in three cases the 5bp TSSs flanking Tn4401 differed, so the data overall are consistent with both direct intermolecular and replicative intramolecular transposition events.
From the full set of GenBank plasmids and from in vitro transposition experiments carried out by others, 30 different 5bp TSS pairs have been characterized, seven of them in the experimental setting only38. The downloaded plasmids come from a range of species and time-points (2005-2014), although they may under-represent the wider diversity of Tn4401 insertion sites as a result of sampling biases. Nevertheless, our data are consistent with substantial Tn4401 mobility within E. coli following acquisition of diverse Tn4401 isoforms, and/or with multiple importation events into E. coli from other species.
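The TSS analysis above rests on two simple operations. Each left/right 5bp flank pair is labelled as a TSD (identical flanks) or as mismatched. The study's distinct pairs are then compared against a reference collection to find the fraction not seen elsewhere. The Python sketch below does both. The example flanks are taken from the text, but the reference set is hypothetical. The sketch assumes that all pairs are recorded in the same orientation relative to Tn4401, and the function names are illustrative.

```python
from typing import Iterable, Set, Tuple

TssPair = Tuple[str, str]  # (left, right) 5bp sequences flanking one Tn4401 insertion


def classify_tss(pair: TssPair) -> str:
    """Return "TSD" for identical flanks, "mismatched" otherwise."""
    left, right = (s.upper() for s in pair)
    if len(left) != 5 or len(right) != 5:
        raise ValueError("expected two 5bp flanking sequences")
    # Identical flanks: expected after direct, intermolecular transposition.
    # Different flanks: consistent with replicative, intramolecular transposition.
    return "TSD" if left == right else "mismatched"


def novel_fraction(study: Iterable[TssPair], reference: Set[TssPair]) -> float:
    """Share of distinct study TSS pairs absent from the (upper-case) reference collection."""
    distinct = {(left.upper(), right.upper()) for left, right in study}
    if not distinct:
        return 0.0
    return len(distinct - reference) / len(distinct)


# Flanks reported in the text for ecol_870, ecol_867 and ecol_AZ161.
study = [("TGTTT", "TGTTT"), ("TGTGA", "TGTGA"), ("AGATA", "GTTCT")]
reference = {("TGTTT", "TGTTT")}  # hypothetical reference set, for illustration only
print([classify_tss(p) for p in study])
print(round(novel_fraction(study, reference), 3))
```

Applied to the study's 11 distinct pairs, with 7 absent from the GenBank set, the same ratio gives the 64% reported above.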
The traditional association of blaKPC with Tn4401 has been significantly eroded in KPC plasmids in E. coli
Notably, in the other 19/43 (44%) isolates the Tn4401 structure had been degraded through replacement with other MGEs, only some of which have been previously described39,40. Two isolates had novel Tn4401Δb structures (upstream truncations by IS26 [ecol_270] or IS26-ΔIS5075 [ecol_584]). A Tn4401e-like structure (255bp deletion upstream of blaKPC) was present in three isolates (ecol_227, ecol_316, ecol_583); this was further characterized in one complete PacBio plasmid assembly (ecol_316) and represented a rearrangement at the site of the L TSS of the ISKpn7 element. In this plasmid, a second, partial Tn4401 element was present without blaKPC, consistent with an incomplete, replicative, intramolecular transposition event (GGGAA = L TSS and R TSS on the two Tn4401b elements, in reverse orientation). Other motifs flanking blaKPC included: hybrid Tn2/Tn3 elements-ISKpn8/27-blaKPC (n=1; ecol_224); IS26-ΔtnpR(Tn3)-ISKpn8/27-blaKPC-ΔTn1721-IS26 (n=5; ecol_AZ153-AZ155, ecol_AZ166, ecol_AZ167); ISApu2-tnpR(Tn3)-ΔblaTEM-blaKPC-korC-klcA-ΔTn1721-IS26 (n=1; ecol_542); IS26-tnpR(Tn3)-ΔblaTEM-blaKPC-korC-IS26 (n=1; ecol_545); hybrid Tn2/Tn3 elements + ΔblaTEM-blaKPC-ΔTn1721 (n=2; ecol_744, ecol_422); Tn3 elements-ΔblaTEM-blaKPC-ΔTn1721 (n=4; ecol_881, ecol_AZ151, ecol_AZ159, ecol_AZ150); and ΔTn3-Δ-ΔIS3000 (Tn3-like) (n=1; ecol_AZ152). We were unable to assess the flanking context of blaKPC in ecol_452 because of limitations of the assembly.
This diversity of independent MGEs around the blaKPC gene is a major concern, as it extends the means by which blaKPC can be mobilized. Interestingly, as observed previously41, all the degraded Tn4401 sequences in this dataset were associated with variable stretches of flanking Tn2/3-like sequences, suggesting that the insertion of Tn4401 into a Tn2/Tn3-like context may have enabled the latter to act as a hotspot for the insertion of other MGEs7. A particular worry is the association with IS26, which has been linked to the dissemination of several other resistance genes in E. coli, including CTX-M ESBLs22,42; can increase the expression of closely co-located resistance genes43; participates in co-integrate formation and hence plasmid rearrangement44; and, once present in a plasmid, favours further IS26-mediated transfer events into that plasmid44.
DISCUSSION
This study of an unselected set of KPC-E. coli, obtained from two global resistance surveillance schemes, has demonstrated that the genetic structures associated with blaKPC are highly diverse at all genetic levels, including: (i) host bacterial strain; (ii) plasmid types; (iii) associated transposable MGEs, including transposons and insertion sequences; and (iv) blaKPC alleles. This has previously been observed within institutional, poly-species outbreaks, particularly for non-E. coli Enterobacteriaceae7,8, as well as in a more recent study of nine KPC-E. coli from the US45. We have identified evidence of global and regional spread at the strain and plasmid levels, including signatures consistent with inter-species spread of plasmids in both these geographic contexts, over short timeframes. Although the geographic reach of our sampling exceeds that of any similar study to date, some limitations in the sampling consistency of both the SMART and AstraZeneca surveillance schemes have been observed20 (e.g. isolates from China were only submitted to these schemes in 2008, 2012 and 2013).
Given resource limitations, we used long-read sequencing on only a subset of isolates, so chromosomal and plasmid structures were completely resolved in fewer than half of the isolates. Nevertheless, we have still highlighted the extraordinary diversity amongst these strains. This study, along with other recent analyses using long-read sequencing to fully close important antimicrobial resistance plasmid structures7,8, also demonstrates the difficulty of making adequate evolutionary comparisons between these structures, given the absence of effective phylogenetic methods for characterizing their genetic histories when rearrangements are common and events are not restricted to single nucleotide mutations.
This study has demonstrated the particular association of blaKPC in E. coli with IncN plasmid structures, which have been associated with the spread of other antimicrobial resistance elements46, as well as with col-like plasmids, which are small, potentially highly mobile and generally high copy number units. It has also highlighted that the traditional association of blaKPC with Tn4401 has been eroded in E. coli, with the complete Tn4401 structure absent in 50% of strains investigated. This finding contrasts with the majority of global descriptions in K. pneumoniae, where blaKPC has been stably associated with largely intact Tn4401 isoforms for more than a decade. Instead, multiple other shorter mobile units, such as Tn2/Tn3-like elements and IS26, now appear to be commonly involved in the dispersal of blaKPC in E. coli. These MGEs have been associated with the spread of multiple resistance mechanisms, such as blaTEM and blaCTX-M, and may similarly contribute to the dispersal of blaKPC in E. coli. We did not undertake any functional assays of the dynamics of blaKPC transmission in E. coli to test this hypothesis; such experiments would be an important focus for future studies.
The global emergence and spread of blaKPC in E. coli has been driven by multiple mechanisms, including local and international spread of highly genetically related strains, exchange of plasmids with other Enterobacteriaceae and between E. coli lineages, transposition events within the species, and a breakdown of the traditional association of blaKPC with Tn4401. The genetic flexibility observed is impressive and concerning, particularly given that only a relatively small number of KPC-E. coli, collected over a short timeframe, were characterized.
The diversity observed in this study has major implications for both surveillance and the clinical epidemiology of E. coli. Tracking the spread of resistance genes in the context of such multi-level genetic variability is complicated, even with a high-resolution typing method such as whole genome sequencing. E. coli is both a common pathogen and a commensal in a wide range of environmental and animal reservoirs. Its association with MGEs (col-like plasmids, IS26) that have been shown to facilitate the dissemination of other successful resistance genes, even in the absence of antimicrobial selection pressures, may represent a situation that is difficult, if not impossible, to control.
METHODS
Isolate collection and sampling frames
Isolates were obtained from two global antimicrobial resistance surveillance schemes (The Merck Study for Monitoring Antimicrobial Resistance Trends [SMART], 2008-2012; AstraZeneca global surveillance study of antimicrobial resistance, 2012-2013; 417 institutions operating in 95 countries), as previously described20. Of 55,874 isolates collected, 45 (0.08%) were positive for blaKPC by PCR (n=7 from 2010, 10 from 2011, 13 from 2012, 15 from 2013; Supplementary Table 1). Isolates had previously been characterized using partial sequence-based typing methods (multi-locus sequence typing [MLST; Achtman scheme] and fimH typing), PCR for beta-lactamase genes, and strain/plasmid PFGE (Supplementary Table 1)20.
DNA extraction, sequencing and sequence data processing
All isolates were sequenced on the Illumina MiSeq; a subset of 18 was purposively selected for PacBio sequencing to represent a range of years of isolation, geographic locations, STs, plasmid sizes and resistance gene content (based on laboratory typing). DNA for sequencing was extracted from sub-cultures of bacterial stocks (frozen at -80°C) using the Qiagen Genomic-tip 100/G extraction kit, according to the manufacturer’s instructions (Qiagen, Hilden, Germany; catalogue no: 10243).
DNA libraries for MiSeq sequencing (300-base paired-end reads) were generated and normalized using Nextera XT DNA library preparation kits (Illumina, San Diego, CA, USA). PacBio sequencing on the subset of strains was performed as previously described47; in these cases, the same DNA extract was used for both Illumina and PacBio sequencing.
Short-read Illumina data were processed as previously described22. Core variable sites (sites with a base called in all sequences, i.e. excluding “N” or “-” calls), derived from mapping to the SE15 reference, were “padded” with invariant sites in proportions consistent with the length and GC content of the reference genome (4.72 Mb; 51% average GC content), and the resulting alignment was used to generate phylogenies. Phylogenies were inferred using RAxML (version 7.7.6)48 under a generalized time-reversible (GTR) model with four gamma rate categories (modelling rate heterogeneity across sites) and 100 bootstrap replicates. De novo assemblies of short-read Illumina data were generated using the A5-MiSeq pipeline (version 20140604)49.
Plasmid and chromosome structures were closed by resolving repeats at the ends of assembled, polished PacBio contigs. Illumina reads were mapped to the resulting assemblies using bwa-MEM version 0.7.9a-r786 with default settings50. Read pileups were visualized in Geneious51; mismatches between the sequence derived from mapping and the reference PacBio assemblies were inspected manually to identify the correct structure, resulting in a final consensus sequence used for subsequent analyses and submission to GenBank. Unmapped reads were de novo assembled using the A5-MiSeq pipeline (version 20140604)49 to capture small plasmids that may have been lost through the size selection of DNA fragments >7,000 bases performed prior to PacBio sequencing.
All plasmid structures and de novo assemblies were annotated using PROKKA52, with subsequent manual refinement of annotations for regions of interest using BLASTn53 and the NCBI bacterial and ISFinder databases54. Alignments of sequence structures were visualized and modified in Geneious.
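Maximum-likelihood branch lengths estimated from an alignment of variable sites alone are inflated, which is why the core SNV alignment was padded with invariant sites. The Python sketch below shows one way to compute and append such padding. It assumes, without the source saying so, that invariant sites are split equally between G and C, and equally between A and T, according to the reference GC fraction. The function names and toy alignment are hypothetical.

```python
from typing import Dict


def invariant_site_counts(genome_length: int, n_variable_sites: int, gc_fraction: float) -> Dict[str, int]:
    """Invariant A/C/G/T columns needed to bring a variable-site alignment up to genome length."""
    n_invariant = genome_length - n_variable_sites
    if n_invariant < 0:
        raise ValueError("more variable sites than genome positions")
    n_gc = round(n_invariant * gc_fraction)  # invariant sites assigned to G or C
    n_at = n_invariant - n_gc
    return {
        "A": n_at // 2,
        "C": n_gc - n_gc // 2,
        "G": n_gc // 2,
        "T": n_at - n_at // 2,
    }


def pad_alignment(alignment: Dict[str, str], counts: Dict[str, int]) -> Dict[str, str]:
    """Append the same block of invariant columns to every sequence."""
    padding = "".join(base * counts[base] for base in "ACGT")
    return {name: seq + padding for name, seq in alignment.items()}


counts = invariant_site_counts(genome_length=100, n_variable_sites=10, gc_fraction=0.51)
padded = pad_alignment({"isolate_1": "ACGTACGTAC", "isolate_2": "ACGTTCGTAC"}, counts)
print(counts, {name: len(seq) for name, seq in padded.items()})
```

For the SE15 reference, the call would be `invariant_site_counts(4_720_000, n_core_snvs, 0.51)`, where the hypothetical `n_core_snvs` is the number of core variable sites in the alignment.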
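Unmapped reads can be recovered from a SAM file by testing bit 0x4 ("segment unmapped") of the FLAG field, which is what `samtools view -f 4` does. The dependency-free Python sketch below applies the same test to SAM text lines and emits FASTQ records for a downstream assembler. The function name is hypothetical. Paired-end handling, such as also keeping the mapped mate of an unmapped read, is deliberately left out.

```python
from typing import Iterable, Iterator

UNMAPPED = 0x4  # SAM FLAG bit: segment unmapped
REVERSE = 0x10  # SAM FLAG bit: SEQ is reverse complemented

_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def unmapped_fastq(sam_lines: Iterable[str]) -> Iterator[str]:
    """Yield FASTQ records for reads whose FLAG has the unmapped bit set."""
    for line in sam_lines:
        if not line.strip() or line.startswith("@"):
            continue  # skip blank lines and header lines
        fields = line.rstrip("\n").split("\t")
        name, flag, seq, qual = fields[0], int(fields[1]), fields[9], fields[10]
        if not flag & UNMAPPED:
            continue
        if flag & REVERSE:
            # Restore the read to its original sequencing orientation.
            seq = seq.translate(_COMPLEMENT)[::-1]
            qual = qual[::-1]
        yield f"@{name}\n{seq}\n+\n{qual}"


sam = [
    "@HD\tVN:1.6",
    "r1\t4\t*\t0\t0\t*\t*\t0\t0\tACGT\tIIII",
    "r2\t0\tchr1\t100\t60\t4M\t*\t0\t0\tACGT\tIIII",
    "r3\t20\t*\t0\t0\t*\t*\t0\t0\tAACG\tABCD",
]
records = list(unmapped_fastq(sam))
print("\n".join(records))
```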
Core/accessory genome comparisons
These were undertaken using the pangenome pipeline Roary25, with the *.gff files generated from the PROKKA annotation of each Illumina de novo assembly as input (default settings). Comparisons were made separately for all isolates and for the ST131 subset. The output gene_presence_absence.csv files were processed using the pheatmap function in R. Resistance genes were identified using ResisType, an in-house tool [scripts available at: https://github.com/hangphan/resisType]. These were plotted on the maximum likelihood phylogenies using the ape package in R.
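The processing of Roary output can also be done outside R. The Python sketch below converts a gene_presence_absence.csv file into a 0/1 matrix suitable for a heatmap or for core/accessory thresholds. It rests on two assumptions, which are worth checking against your own file's header before relying on it. First, as in recent Roary versions, the per-isolate columns follow the metadata column named "Avg group size nuc". Second, a non-empty cell means the gene is present. The function name and the synthetic file are illustrative.

```python
import csv
import io
from typing import Dict, List, TextIO, Tuple

# Assumption: per-isolate columns follow this metadata column, as in recent Roary output.
LAST_METADATA_COLUMN = "Avg group size nuc"


def read_presence_absence(handle: TextIO) -> Tuple[List[str], Dict[str, List[int]]]:
    """Return (isolate names, {gene cluster: 0/1 presence per isolate})."""
    reader = csv.reader(handle)
    header = next(reader)
    first = header.index(LAST_METADATA_COLUMN) + 1  # raises ValueError if the column is missing
    isolates = header[first:]
    matrix: Dict[str, List[int]] = {}
    for row in reader:
        # A non-empty cell holds the locus tag(s) of the gene in that isolate.
        matrix[row[0]] = [1 if cell.strip() else 0 for cell in row[first:]]
    return isolates, matrix


# Tiny synthetic file with most metadata columns trimmed for brevity (illustrative only).
example = io.StringIO(
    '"Gene","Annotation","Avg group size nuc","ecol_A","ecol_B"\n'
    '"geneA","hypothetical protein","900","locus_001","locus_101"\n'
    '"geneB","hypothetical protein","600","","locus_102"\n'
)
isolates, matrix = read_presence_absence(example)
print(isolates, matrix)
```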
Comparisons with publicly available KPC plasmid sequences
All complete KPC RefSeq sequences available in GenBank in May 2015 were identified using the search terms “plasmid” + “KPC” + “complete sequence”. The resulting list was filtered manually to exclude any additional sequences present that were not complete plasmid sequences. In total 73 plasmid sequences were included (Supplementary Table 5).
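The keyword search returned records that then had to be filtered manually. A simple automated screen of record titles could shorten that manual step. The Python sketch below keeps titles that describe a complete plasmid sequence and excludes the rest, dropping repeated accessions. The keyword rules, accessions and titles are all hypothetical and are not the criteria used by the authors. Any such screen would still need the manual check described above.

```python
import re
from typing import Iterable, List, Tuple

# Words suggesting a record is a fragment rather than a closed plasmid (hypothetical rule set).
_FRAGMENT_WORDS = re.compile(r"\b(partial|contig|scaffold|draft)\b", re.IGNORECASE)


def is_complete_plasmid_title(title: str) -> bool:
    """Keep titles that describe a complete plasmid sequence and contain no fragment keywords."""
    t = title.lower()
    return "plasmid" in t and "complete sequence" in t and _FRAGMENT_WORDS.search(t) is None


def screen(records: Iterable[Tuple[str, str]]) -> Tuple[List[str], List[str]]:
    """Split (accession, title) pairs into kept and excluded accessions, skipping repeats."""
    kept: List[str] = []
    excluded: List[str] = []
    seen = set()
    for accession, title in records:
        if accession in seen:
            continue
        seen.add(accession)
        (kept if is_complete_plasmid_title(title) else excluded).append(accession)
    return kept, excluded


# Hypothetical accessions and titles, for illustration only.
records = [
    ("EXAMPLE_1", "Escherichia coli strain X plasmid pX1, complete sequence"),
    ("EXAMPLE_2", "Klebsiella pneumoniae strain Y plasmid pY1 KPC-2 region, partial sequence"),
    ("EXAMPLE_3", "Klebsiella pneumoniae strain Z chromosome, complete genome"),
    ("EXAMPLE_1", "Escherichia coli strain X plasmid pX1, complete sequence"),
]
kept, excluded = screen(records)
print(kept, excluded)
```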
For the IncN plasmid comparisons, we included the following from our dataset: (i) six cases where PacBio sequencing had fully resolved the blaKPC IncN plasmid; (ii) two cases where Illumina sequencing had fully resolved the blaKPC IncN plasmid; (iii) two cases where the IncN rep and blaKPC were co-located on the same, incomplete contig; and (iv) two cases where blaKPC was present in isolates containing an IncN rep, on contigs with plasmid backbones similar to those of the other IncN plasmids under scrutiny.
Availability of Data and Materials
The data sets (Illumina raw reads, PacBio assemblies) supporting the results of this article are available in NCBI’s GenBank/SRA, under the project accession: PRJNA316786 (https://www.ncbi.nlm.nih.gov/bioproject/?term=316786).
AUTHOR CONTRIBUTIONS
NS and JP conceived of the study. Significant contributions to sample collection, laboratory processing and sequencing were made by GP, LWA, LP, PB, MRM, NS and JP. Short-read (Illumina) sequencing was performed by LWA and LP; long-read (PacBio) sequencing by RS and AK. Sequence data processing and analysis were performed by AES, HTTP and NS. NS drafted the manuscript, which was reviewed and improved by all authors, including ASW, TEAP, DWC and AJM.
ADDITIONAL INFORMATION
FUNDING INFORMATION
NS is currently funded through a Public Health England/University of Oxford Clinical Lectureship; the sequencing work was also partly funded through a previous Wellcome Trust Doctoral Research Fellowship (#099423/Z/12/Z). Additional funding support was provided by a research grant from Calgary Laboratory Services (#10006465), and by the Health Innovation Challenge Fund (a parallel funding partnership between the Wellcome Trust [WT098615/Z/12/Z] and the Department of Health [grant HICF-T5-358]). This research was supported by the National Institute for Health Research (NIHR) Oxford Biomedical Research Center (BRC) Program, and the Health Protection Research Unit (NIHR HPRU) in Healthcare Associated Infections and Antimicrobial Resistance at the University of Oxford, in partnership with Public Health England (PHE).
The funders had no role in study design, data collection and interpretation, or the decision to submit the work for publication. The views expressed are those of the author(s) and not necessarily those of the NHS, the NIHR or the Department of Health.
Competing interest
The authors declare that they have no competing interests.
ACKNOWLEDGEMENTS
We acknowledge the contributions of the laboratory, healthcare and administrative teams contributing to the SMART and AstraZeneca global antimicrobial resistance surveillance programs, and the Modernising Medical Microbiology Informatics Group (MMMIG). For this study, the MMMIG consisted of Adam Giess, Carlos Del Ojo Elias, Milind Acharya, Nicholas Sanderson, Trien Do and Vasiliki Kostiou.