1 Abstract
The microbiome is the collection of microbes that live in symbiosis with a host. Whole genome sequencing of a host produces off-target reads that do not belong to the host, and these reads can be used for metagenomic inference of its microbiome. Such data have an advantage over barcoding methods because they allow higher taxonomic resolution and functional prediction of microbes. The growing volume of publicly available genomic sequencing data creates an opportunity to characterize reads that belong to the microbiome. However, characterizing these reads can be complex, and a robust analysis requires many steps. To address this, we developed MINUUR (Microbial INsights Using Unmapped Reads), a Snakemake pipeline that characterizes non-host reads from existing genomic data. We apply this pipeline to ten publicly available, high-coverage Aedes aegypti (Ae. aegypti) genomic samples. Using MINUUR, we describe species-level microbial classifications, predict microbe-associated genes and pathways, and identify bacterial metagenome-assembled genomes (MAGs) associated with the Ae. aegypti microbiome. Of these MAGs, 19 are high-quality representatives with over 90% completeness and under 5% contamination. In summary, we present an in-depth analysis of non-host reads from Ae. aegypti whole genome sequencing data within a reproducible, open-access pipeline.
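The high-quality criterion above (over 90% completeness and under 5% contamination) is a simple two-threshold filter. The minimal sketch below is illustrative only and is not MINUUR's code. It assumes that each MAG already has completeness and contamination estimates in percent, as produced by a quality-assessment tool such as CheckM (an assumption). The `MagQuality` record, the function names and the bin values are all hypothetical.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass
class MagQuality:
    """Hypothetical record of one MAG's quality estimates, in percent."""
    name: str
    completeness: float
    contamination: float


def is_high_quality(mag: MagQuality,
                    min_completeness: float = 90.0,
                    max_contamination: float = 5.0) -> bool:
    # Strict inequalities mirror "over 90% completeness and under 5% contamination".
    return mag.completeness > min_completeness and mag.contamination < max_contamination


def high_quality_mags(mags: Iterable[MagQuality]) -> List[MagQuality]:
    # Keep only the MAGs that pass both thresholds, preserving input order.
    return [mag for mag in mags if is_high_quality(mag)]


if __name__ == "__main__":
    # Made-up example values, not results from the study.
    bins = [
        MagQuality("bin_1", 97.2, 1.1),  # passes both thresholds
        MagQuality("bin_2", 88.0, 0.5),  # fails: completeness too low
        MagQuality("bin_3", 93.4, 6.2),  # fails: contamination too high
    ]
    for mag in high_quality_mags(bins):
        print(mag.name)
```

Exposing the thresholds as keyword parameters lets the same filter express looser tiers (for example, a medium-quality cut-off) without changing the logic. This sketch encodes only the two thresholds stated above.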
Competing Interest Statement
The authors have declared no competing interest.