Abstract
Epidemiological surveillance of bacterial pathogens requires real-time data analysis with a fast turn-around, and aims at generating two main outcomes: 1) species-level identification; and 2) variant mapping at different levels of genotypic resolution for population-based tracking, in addition to predicting traits such as antimicrobial resistance (AMR). With the recent advances and continual dissemination of whole-genome sequencing technologies, large-scale population-based genotyping of bacterial pathogens has become possible. Since bacterial populations often present a high degree of clonality in the genomic backbone (i.e., low genetic diversity), the choice of genotyping scheme can facilitate the understanding of ancestral relationships and can be used for prediction of co-inherited traits such as AMR. Multi-locus sequence typing (MLST) fits that purpose and can identify sequence types (ST) based on seven ubiquitous genome-scattered loci that aid in genotyping isolates beneath the species level. ST-based mapping also standardizes genotyping across laboratories and can be consistently used worldwide. However, ST-based algorithms, when using Illumina paired-end sequences, often rely on genome assembly prior to classification. This hinders rapid genotyping and scalability, which are essential aspects of genomic epidemiology. stringMLST is a kmer-based ST method with the capacity to solve both hurdles. Yet, a comprehensive scalable comparison of its use in contrast to a standard MLST program for a wide array of phylogenetically divergent public health-relevant bacterial pathogens is lacking. Herein, we first demonstrated that stringMLST is a fast tool that can be deployed for ST-based epidemiological inquiries of bacterial populations.
Additionally, we systematically evaluated and showed the impact of genome-intrinsic and -extrinsic features, as well as the optimal kmer length, in maximizing the performance of stringMLST on a species-by-species basis, and highlighted a few instances where this program may not be applicable in its current format. Furthermore, we integrated stringMLST as part of our freely available and scalable hierarchical-based population genomics platform called ProkEvo. Besides facilitating automatable and reproducible bacterial population-guided analysis, ProkEvo now offers a rapidly deployable genomic epidemiology tool for ST mapping, with specific guidance on how to optimize its performance, that can be widely applied by microbiological laboratories and epidemiological agencies.
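As a rough illustration of the idea of assembly-free, kmer-based ST calling described above, the following minimal sketch votes for alleles directly from read kmers and then looks up the resulting allelic profile. It is a toy under stated assumptions and not the stringMLST implementation: the function names, the two-locus scheme (real MLST schemes typically use seven loci), the allele sequences, the profile table, and the choice of k = 5 are all hypothetical and chosen only to keep the example small.

```python
from collections import Counter

def kmers(seq, k):
    """Return the set of distinct k-length substrings of seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}

def build_index(alleles, k):
    """Map each kmer to the (locus, allele) pairs whose sequence contains it."""
    index = {}
    for (locus, allele), seq in alleles.items():
        for km in kmers(seq, k):
            index.setdefault(km, set()).add((locus, allele))
    return index

def call_st(reads, alleles, profiles, k):
    """Vote for alleles using read kmers, then look up the allelic profile.

    Returns the ST label, or None if a locus has no support or the
    profile is not in the (hypothetical) profile table.
    """
    index = build_index(alleles, k)
    votes = Counter()
    for read in reads:
        for km in kmers(read, k):
            for hit in index.get(km, ()):
                votes[hit] += 1

    # Keep the best-supported allele for each locus.
    best = {}
    for (locus, allele), n in votes.items():
        if locus not in best or n > best[locus][1]:
            best[locus] = (allele, n)

    loci = sorted({locus for locus, _ in alleles})
    if any(locus not in best for locus in loci):
        return None
    profile = tuple(best[locus][0] for locus in loci)
    return profiles.get(profile)

# Hypothetical toy scheme with two loci and two alleles each.
ALLELES = {
    ("locA", 1): "ACGTACGTGG",
    ("locA", 2): "ACGTTCGTGG",
    ("locB", 1): "TTGACCATGA",
    ("locB", 2): "TTGACGATGA",
}
# Hypothetical profile table: (locA allele, locB allele) -> ST label.
PROFILES = {(1, 1): "ST1", (2, 1): "ST2", (2, 2): "ST3"}
# Toy "reads" drawn from locA allele 2 and locB allele 1.
READS = ["CGTTCGTG", "TTGACCATG"]

print(call_st(READS, ALLELES, PROFILES, k=5))
```

In this sketch the value of k controls how specific each kmer is to a single allele, which is one intuition for why kmer length could matter for performance on a species-by-species basis.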
Competing Interest Statement
The authors have declared no competing interest.