TY  - JOUR
T1  - Bracken: Estimating species abundance in metagenomics data
JF  - bioRxiv
DO  - 10.1101/051813
SP  - 051813
AU  - Jennifer Lu
AU  - Florian P Breitwieser
AU  - Peter Thielen
AU  - Steven L Salzberg
Y1  - 2016/01/01
UR  - http://biorxiv.org/content/early/2016/05/05/051813.abstract
N2  - We describe a new, highly accurate statistical method that computes the abundance of species in DNA sequences from a metagenomics sample. Bracken (Bayesian Reestimation of Abundance after Classification with KrakEN) uses the taxonomy labels assigned by Kraken, a highly accurate metagenomics classification algorithm, to estimate the number of reads originating from each species present in a sample. Kraken classifies reads to the best matching location in the taxonomic tree, but does not estimate abundances of species. We use the Kraken database itself to derive probabilities that describe how much sequence from each genome is shared with other genomes in the database, and combine this information with the assignments for a particular sample to estimate abundance at the species level, the genus level, or above. Combined with the Kraken classifier, Bracken produces accurate species- and genus-level abundance estimates even when a sample contains multiple near-identical species.
ER  - 