Abstract
Neptune locates genomic signatures using an exact k-mer matching strategy while accommodating k-mer mismatches. The software identifies sequences that are sufficiently represented within “inclusion targets” and sufficiently absent from “exclusion targets”. Signature discovery is guided by probabilistic models rather than heuristic strategies. We evaluated Neptune on Listeria monocytogenes and Escherichia coli genome data sets and found that the signatures identified in these experiments are sensitive and specific to their respective data sets. In addition, the identified signatures provide a catalog of differential loci for research into group-specific traits. Neptune has broad implications for microbial characterization in public health applications because it enables efficient, ad hoc signature discovery based on differential genomics.
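
To make the inclusion/exclusion idea concrete, the following is a minimal sketch of exact k-mer selection; it is not Neptune's implementation. The function names (`kmers`, `candidate_kmers`) and the fixed thresholds `min_inclusion` and `max_exclusion` are hypothetical and stand in for the probabilistic models that Neptune uses. The sketch also ignores mismatch tolerance and reverse complements.

```python
def kmers(sequence, k):
    """Return the set of all length-k substrings of a sequence."""
    return {sequence[i:i + k] for i in range(len(sequence) - k + 1)}


def candidate_kmers(inclusion, exclusion, k, min_inclusion=1.0, max_exclusion=0.0):
    """Select k-mers common in inclusion targets and rare in exclusion targets.

    min_inclusion and max_exclusion are illustrative fixed fractions
    (an assumption of this sketch, not parameters taken from Neptune).
    """
    inclusion_sets = [kmers(s, k) for s in inclusion]
    exclusion_sets = [kmers(s, k) for s in exclusion]

    selected = set()
    # Only k-mers seen in at least one inclusion target can qualify.
    for kmer in set().union(*inclusion_sets):
        in_frac = sum(kmer in s for s in inclusion_sets) / len(inclusion_sets)
        if exclusion_sets:
            ex_frac = sum(kmer in s for s in exclusion_sets) / len(exclusion_sets)
        else:
            ex_frac = 0.0
        if in_frac >= min_inclusion and ex_frac <= max_exclusion:
            selected.add(kmer)
    return selected


if __name__ == "__main__":
    inclusion_targets = ["ACGTAC", "TACGTA"]  # toy sequences, not real genomes
    exclusion_targets = ["GTACGG"]
    print(candidate_kmers(inclusion_targets, exclusion_targets, k=3))  # {'CGT'}
```

In this toy example, the 3-mer `CGT` is the only one found in every inclusion sequence and in no exclusion sequence. A signature in the sense described above would be a longer region assembled from such k-mers and then scored, which this sketch does not attempt.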