RT Journal Article
SR Electronic
T1 Neptune: A Bioinformatics Tool for Rapid Discovery of Genomic Variation in Bacterial Populations
JF bioRxiv
FD Cold Spring Harbor Laboratory
SP 032227
DO 10.1101/032227
A1 Eric Marinier
A1 Rahat Zaheer
A1 Chrystal Berry
A1 Kelly Weedmark
A1 Michael Domaratzki
A1 Philip Mabon
A1 Natalie Knox
A1 Aleisha Reimer
A1 Morag Graham
A1 Linda Chui
A1 The Canadian Listeria Detection and Surveillance using Next Generation Genomics (LiDS-NG) Consortium
A1 Gary Van Domselaar
YR 2016
UL http://biorxiv.org/content/early/2016/11/30/032227.abstract
AB The ready availability of vast amounts of genomic sequence data has created the need to rethink comparative genomics algorithms using “big data” approaches. Neptune is an efficient system for rapidly locating differentially abundant genomic content in bacterial populations using an exact k-mer matching strategy, while accommodating k-mer mismatches. Neptune’s loci discovery process identifies sequences that are sufficiently common to a group of target sequences and sufficiently absent from non-targets using probabilistic models. Neptune uses parallel computing to efficiently identify and extract these loci from draft genome assemblies without requiring multiple sequence alignments or other computationally expensive comparative sequence analyses. Tests on simulated and real data sets showed that Neptune rapidly identifies regions that are both sensitive and specific. We demonstrate that this system can identify trait-specific loci from different bacterial lineages. Neptune is broadly applicable for comparative bacterial analyses, yet will particularly benefit pathogenomic applications, owing to efficient and sensitive discovery of differentially abundant genomic loci.