TY - JOUR
T1 - One Codex: A Sensitive and Accurate Data Platform for Genomic Microbial Identification
JF - bioRxiv
DO - 10.1101/027607
SP - 027607
AU - Samuel S. Minot
AU - Niklas Krumm
AU - Nicholas B. Greenfield
Y1 - 2015/01/01
UR - http://biorxiv.org/content/early/2015/09/25/027607.abstract
N2 - High-throughput sequencing (HTS) is increasingly used for a broad range of microbial characterization applications, such as microbial ecology, clinical diagnosis, and outbreak epidemiology. However, the analytical task of comparing short sequence reads against the known diversity of microbial life has proved to be computationally challenging. The One Codex data platform was created with two goals: analyzing microbial data against the largest possible collection of microbial reference genomes, and presenting those results in a format that applied end-users can readily consume. One Codex identifies microbial sequences using a “k-mer based” taxonomic classification algorithm, delivered through a web-based data platform, with a reference database that currently includes approximately 40,000 bacterial, viral, fungal, and protozoan genomes. To evaluate whether this classification method and its associated database provide quantitatively different performance for microbial identification, we created a large and diverse evaluation dataset containing 50 million reads from 10,639 genomes, as well as sequences from six novel organisms not included in the reference databases of any of the tested classifiers. Quantitative evaluation of several published microbial detection methods shows that One Codex has the highest degree of sensitivity and specificity (AUC = 0.97, compared to 0.82-0.88 for other methods), both when detecting well-characterized species and when detecting newly sequenced, “taxonomically novel” organisms.
ER -