RT Journal Article
SR Electronic
T1 The Northern Arizona SNP Pipeline (NASP): accurate, flexible, and rapid identification of SNPs in WGS datasets
JF bioRxiv
FD Cold Spring Harbor Laboratory
SP 037267
DO 10.1101/037267
A1 Jason W. Sahl
A1 Darrin Lemmer
A1 Jason Travis
A1 James M. Schupp
A1 John D. Gillece
A1 Maliha Aziz
A1 Elizabeth M. Driebe
A1 Kevin Drees
A1 Nathan Hicks
A1 Charles H.D. Williamson
A1 Crystal Hepp
A1 David Smith
A1 Chandler Roe
A1 David M. Engelthaler
A1 David M. Wagner
A1 Paul Keim
YR 2016
UL http://biorxiv.org/content/early/2016/01/25/037267.abstract
AB Whole genome sequencing (WGS) of bacteria is becoming standard practice in many laboratories. Applications for WGS analysis include phylogeography and molecular epidemiology, using single nucleotide polymorphisms (SNPs) as the unit of evolution. The Northern Arizona SNP Pipeline (NASP) was developed as a reproducible pipeline that scales well with the large amount of WGS data typically used in comparative genomics applications. In this study, we demonstrate how NASP compares to other tools in the analysis of two real bacterial genomics datasets and one simulated dataset. Our results demonstrate that NASP produces comparable, and often better, results to other pipelines, but is much more flexible in terms of data input types, job management systems, diversity of supported tools, and output formats. We also demonstrate differences in results based on the choice of the reference genome and choice of inferring phylogenies from concatenated SNPs or alignments including monomorphic positions. NASP represents a source-available, version-controlled, unit-tested method and can be obtained from tgennorth.github.io/NASP.