RT Journal Article
SR Electronic
T1 Identification of Klebsiella capsule synthesis loci from whole genome data
JF bioRxiv
FD Cold Spring Harbor Laboratory
SP 071415
DO 10.1101/071415
A1 Kelly L. Wyres
A1 Ryan R. Wick
A1 Claire Gorrie
A1 Adam Jenney
A1 Rainer Follador
A1 Nicholas R. Thomson
A1 Kathryn E. Holt
YR 2016
UL http://biorxiv.org/content/early/2016/08/24/071415.abstract
AB Background: Klebsiella pneumoniae and close relatives are a growing cause of healthcare-associated infections for which increasing rates of multi-drug resistance are a major concern. The Klebsiella polysaccharide capsule is a major virulence determinant and epidemiological marker. However, little is known about capsule epidemiology since serological typing is not widely accessible, and many isolates are serologically non-typeable. Molecular methods for capsular typing are needed, but existing methods lack sensitivity and specificity and fail to take advantage of the information available in whole-genome sequence data, which is increasingly being generated for surveillance and investigation of Klebsiella. Methods: We investigated the diversity of capsule synthesis loci (K loci) among a large, diverse collection of 2503 genome sequences of K. pneumoniae and closely related species. We incorporated analyses of both full-length K locus DNA sequences and clustered protein coding sequences to identify, annotate and compare K locus structures, and we propose a novel method for identifying K loci based on full locus information extracted from whole genome sequences. Results: A total of 134 distinct K loci were identified, including 31 novel types. Comparative analysis of K locus gene content detected 508 unique protein coding gene clusters that appear to reassort via homologous recombination, generating novel K locus types. Extensive nucleotide diversity was detected among the wzi and wzc genes, both within and between K loci, indicating that current typing schemes based on these genes are inadequate.
As a solution, we introduce Kaptive, a novel software tool that automates the process of identifying K loci from large sets of Klebsiella genomes based on full locus information. Conclusions: This work highlights the extensive diversity of Klebsiella K loci and the proteins that they encode. We propose a standardized K locus nomenclature for Klebsiella, present a curated reference database of all known K loci, and introduce a tool for identifying K loci from genome data (https://github.com/katholt/Kaptive). These developments constitute important new resources for the Klebsiella community for use in genomic surveillance and epidemiology. Abbreviations: K-type, capsule type; K locus, capsule synthesis locus; IS, insertion sequence; CDS, coding sequence; LPS, lipopolysaccharide.