PT - JOURNAL ARTICLE
AU - Stephen Nayfach
AU - Beltran Rodriguez-Mueller
AU - Nandita Garud
AU - Katherine S. Pollard
TI - An integrated metagenomics pipeline for strain profiling reveals novel patterns of transmission and global biogeography of bacteria
AID - 10.1101/031757
DP - 2016 Jan 01
TA - bioRxiv
PG - 031757
4099 - http://biorxiv.org/content/early/2016/08/02/031757.short
4100 - http://biorxiv.org/content/early/2016/08/02/031757.full
AB - We present the Metagenomic Intra-species Diversity Analysis System (MIDAS), which is an integrated computational pipeline for quantifying bacterial species abundance and strain-level genomic variation, including gene content and single nucleotide polymorphisms, from shotgun metagenomes. Our method leverages a database of >30,000 bacterial reference genomes which we clustered into species groups. These cover the majority of abundant species in the human microbiome but only a small proportion of microbes in other environments, including soil and seawater. We applied MIDAS to stool metagenomes from 98 Swedish mothers and their infants over one year and used rare single nucleotide variants to reveal extensive vertical transmission of strains at birth but colonization with strains unlikely to derive from the mother at later time points. This pattern was missed with species-level analysis, because the infant gut microbiome composition converges towards that of an adult over time. We also applied MIDAS to 198 globally distributed marine metagenomes and used gene content to show that many prevalent bacterial species have population structure that correlates with geographic location. Strain-level genetic variants present in metagenomes clearly reveal extensive structure and dynamics that are obscured when data is analyzed at a higher taxonomic resolution.