Sigma: strain-level inference of genomes from metagenomic analysis for biosurveillance

Bioinformatics. 2015 Jan 15;31(2):170-7. doi: 10.1093/bioinformatics/btu641. Epub 2014 Sep 29.

Abstract

Motivation: Metagenomic sequencing of clinical samples provides a promising technique for direct pathogen detection and characterization in biosurveillance. Taxonomic analysis at the strain level can be used to resolve serotypes of a pathogen in biosurveillance. Sigma was developed for strain-level identification and quantification of pathogens using their reference genomes based on metagenomic analysis.

Results: Sigma provides not only accurate strain-level inferences, but also three unique capabilities: (i) Sigma quantifies the statistical uncertainty of its inferences, including hypothesis testing of identified genomes and confidence interval estimation of their relative abundances; (ii) Sigma enables strain variant calling by assigning metagenomic reads to their most likely reference genomes; and (iii) Sigma supports parallel computing for fast analysis of large datasets. The algorithm's performance was evaluated using simulated mock communities and fecal samples with spike-in pathogen strains.
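
The quantification and read-assignment ideas described above can be illustrated with a toy sketch. This is not Sigma's implementation (Sigma is written in C++, and the abstract does not specify its estimator or interval method). The sketch assumes a precomputed table of per-read likelihoods against each reference genome, estimates relative abundances with a generic expectation-maximization mixture model, assigns each read to its most probable genome, and uses a percentile bootstrap as a stand-in for confidence interval estimation. All function names and the toy data are hypothetical.

```python
import random


def estimate_abundances(likelihoods, n_iter=200, tol=1e-9):
    """EM estimate of mixture proportions (relative abundances).

    likelihoods[r][g] is an assumed, precomputed P(read r | genome g).
    """
    n_genomes = len(likelihoods[0])
    weights = [1.0 / n_genomes] * n_genomes
    for _ in range(n_iter):
        totals = [0.0] * n_genomes
        for row in likelihoods:
            joint = [w * p for w, p in zip(weights, row)]
            norm = sum(joint)
            if norm == 0.0:
                continue  # read matches no genome under current weights
            for g, j in enumerate(joint):
                totals[g] += j / norm  # posterior share of this read
        total = sum(totals)
        if total == 0.0:
            return weights  # no informative reads; keep current estimate
        new_weights = [t / total for t in totals]
        shift = max(abs(a - b) for a, b in zip(new_weights, weights))
        weights = new_weights
        if shift < tol:
            break
    return weights


def assign_reads(likelihoods, weights):
    """Assign each read to the genome with the highest posterior (None if no match)."""
    assignments = []
    for row in likelihoods:
        joint = [w * p for w, p in zip(weights, row)]
        best = max(range(len(joint)), key=joint.__getitem__)
        assignments.append(best if joint[best] > 0.0 else None)
    return assignments


def bootstrap_intervals(likelihoods, n_boot=200, alpha=0.05, seed=0):
    """Percentile bootstrap intervals, obtained by resampling reads with replacement."""
    rng = random.Random(seed)
    n_reads = len(likelihoods)
    samples = []
    for _ in range(n_boot):
        resampled = [likelihoods[rng.randrange(n_reads)] for _ in range(n_reads)]
        samples.append(estimate_abundances(resampled))
    intervals = []
    for g in range(len(likelihoods[0])):
        values = sorted(s[g] for s in samples)
        lower = values[int((alpha / 2) * (n_boot - 1))]
        upper = values[int((1 - alpha / 2) * (n_boot - 1))]
        intervals.append((lower, upper))
    return intervals


# Toy data (hypothetical): 6 reads unique to strain 0, 2 unique to strain 1,
# and 2 reads that match both strains equally well.
reads = [[1e-3, 0.0]] * 6 + [[0.0, 1e-3]] * 2 + [[1e-3, 1e-3]] * 2
weights = estimate_abundances(reads)
assignments = assign_reads(reads, weights)
intervals = bootstrap_intervals(reads)
```

On this toy input the estimate settles at 0.75 and 0.25, because each ambiguous read is shared in proportion to the current abundances, so the fixed point satisfies w0 = (6 + 2·w0) / 10. The two ambiguous reads are therefore assigned to strain 0.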

Availability and implementation: Sigma is implemented in C++; source code and binaries are freely available at http://sigma.omicsbio.org.

Supplementary information: Supplementary data are available at Bioinformatics online.

Publication types

  • Research Support, Non-U.S. Gov't
  • Research Support, U.S. Gov't, Non-P.H.S.

MeSH terms

  • Algorithms
  • Biosurveillance*
  • Computational Biology / methods*
  • DNA, Bacterial / analysis*
  • Genome, Bacterial*
  • Humans
  • Metagenomics / methods*
  • Sequence Analysis, DNA / methods*
  • Software*

Substances

  • DNA, Bacterial