Abstract
Background A few recent large efforts have significantly expanded the collection of human-associated bacterial genomes, which now contains thousands of genomes, including complete and draft reference genomes and metagenome-assembled genomes (MAGs). These genomes provide a useful resource for studying the functionality of the human-associated microbiome and its relationship with human health and disease. One application of these genomes is to serve as a universal reference for database search in metaproteomic studies when matched metagenomic/metatranscriptomic data are unavailable. However, a larger collection of reference genomes does not necessarily yield better peptide/protein identification, because an expanded search space often leads to fewer confident spectrum-peptide matches, in addition to a drastic increase in computation time.
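Why can a bigger database yield fewer identifications? A common way to control the false discovery rate (FDR) is target-decoy searching. This abstract does not say which FDR procedure is used, so treat the following as an illustration of the general effect and not as the authors' method. With more candidate sequences, random (decoy) matches reach higher scores, which pushes the score cutoff for a given FDR upward. The scores below are hypothetical toy values, and `max_fdr=0.25` is chosen only to fit the tiny example; real studies typically use about 1%.

```python
from typing import List, Tuple

PSM = Tuple[float, bool]  # (search score, is_decoy); higher score = better match


def fdr_cutoff(psms: List[PSM], max_fdr: float) -> float:
    """Lowest score cutoff whose estimated FDR (#decoys / #targets) is <= max_fdr.

    Returns infinity when no cutoff satisfies the bound.
    """
    targets = decoys = 0
    cutoff = float("inf")
    # Walk down the ranked list and remember the last score that still meets the bound.
    for score, is_decoy in sorted(psms, key=lambda p: p[0], reverse=True):
        if is_decoy:
            decoys += 1
        else:
            targets += 1
        if targets and decoys / targets <= max_fdr:
            cutoff = score
    return cutoff


def accepted_targets(psms: List[PSM], max_fdr: float) -> int:
    """Number of target PSMs scoring at or above the FDR-controlled cutoff."""
    cutoff = fdr_cutoff(psms, max_fdr)
    return sum(1 for score, is_decoy in psms if not is_decoy and score >= cutoff)


# Hypothetical toy scores for the same five correct target PSMs.
true_hits = [(9.0, False), (8.0, False), (7.0, False), (6.0, False), (5.0, False)]
# Small database: few random (decoy) matches, all scoring low.
small_db = true_hits + [(2.0, True), (1.0, True)]
# Large database: more candidates, so random matches reach higher scores.
large_db = true_hits + [(8.5, True), (7.5, True), (6.5, True)]

print(accepted_targets(small_db, max_fdr=0.25))  # 5
print(accepted_targets(large_db, max_fdr=0.25))  # 1
```

The correct matches are identical in both cases. Only the inflated decoy scores in the larger search space reduce the number of accepted identifications.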
Methods Here, we present a new approach that uses two steps to optimize the use of reference genomes and MAGs as the universal reference for analyzing human gut metaproteomic MS/MS data. The first step uses only the High Abundance Proteins (HAPs) (i.e., ribosomal proteins and elongation factors) for the MS/MS database search and, based on the identification results, derives the taxonomic composition of the underlying microbial community. The second step expands the search database by including all proteins from the identified abundant species. We call our approach HAPiID (HAPs guided metaproteomics IDentification).
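The two-step database construction can be sketched as follows. This is a minimal illustration under stated assumptions, not the HAPiID implementation. `toy_search` is a hypothetical stand-in for a real MS/MS search engine. Community composition is approximated by counting HAP-matched spectra per species. The rule that a species is "abundant" when it has at least `min_spectra` HAP spectra is an assumed criterion. All protein and species names are made up.

```python
from collections import Counter
from typing import Callable, Dict, Iterable, List

SearchFn = Callable[[List[str], Iterable[str]], Dict[str, str]]


def toy_search(spectra: List[str], db: Iterable[str]) -> Dict[str, str]:
    """Hypothetical stand-in for an MS/MS search engine.

    Each toy spectrum id is "<protein>:<n>"; it is matched to its source
    protein only if that protein is present in the search database.
    """
    allowed = set(db)
    hits = {}
    for s in spectra:
        protein = s.split(":")[0]
        if protein in allowed:
            hits[s] = protein
    return hits


def hap_guided_database(
    spectra: List[str],
    hap_to_species: Dict[str, str],
    proteomes: Dict[str, List[str]],
    search: SearchFn,
    min_spectra: int = 2,  # assumed abundance threshold (hypothetical)
) -> List[str]:
    # Step 1: search against HAPs only (ribosomal proteins, elongation factors).
    hap_hits = search(spectra, hap_to_species.keys())
    # Approximate community composition as HAP spectral counts per species.
    counts = Counter(hap_to_species[p] for p in hap_hits.values())
    # Keep species with enough HAP evidence.
    abundant = sorted(sp for sp, n in counts.items() if n >= min_spectra)
    # Step 2: expanded database = every protein of each abundant species.
    return [p for sp in abundant for p in proteomes[sp]]


# Hypothetical toy inputs: three species, HAPs mapped to their species.
haps = {"A_rpl2": "A", "A_tuf": "A", "B_rpl2": "B", "C_tuf": "C"}
proteomes = {
    "A": ["A_rpl2", "A_tuf", "A_susC"],
    "B": ["B_rpl2", "B_tuf", "B_gh1"],
    "C": ["C_tuf", "C_dnaK"],
}
spectra = ["A_rpl2:1", "A_tuf:1", "A_susC:1", "B_rpl2:1", "B_rpl2:2", "C_tuf:1"]

db = hap_guided_database(spectra, haps, proteomes, toy_search)
final_hits = toy_search(spectra, db)
print(db)  # ['A_rpl2', 'A_tuf', 'A_susC', 'B_rpl2', 'B_tuf', 'B_gh1']
print(len(final_hits))  # 5
```

Species C is dropped because only one of its HAPs was matched. The non-HAP protein `A_susC` cannot be found in step 1, but it is recovered in step 2 because its species was judged abundant.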
Results We tested our approach on human gut metaproteomic datasets from a previous study and compared it with MetaPro-IQ, the state-of-the-art reference database search method for metaproteomic identification in human gut microbiota studies. Our results show that our two-step method not only ran significantly faster but also identified more peptides. We further demonstrated the application of HAPiID to revealing the protein profiles of individual human-associated bacterial species, one or a few species at a time, using metaproteomic data.
Conclusions The HAP-guided profiling approach presents a novel and effective way to construct the target database for metaproteomic data analysis. The HAPiID pipeline built upon this approach provides a universal tool for analyzing human gut-associated metaproteomic data.
Competing Interest Statement
The authors have declared no competing interest.
Footnotes
Email addresses: Moses H. Stamboulian, mstambou{at}indiana.edu, Sujun Li, sujli{at}indiana.edu, Yuzhen Ye, yye{at}indiana.edu
List of abbreviations
- MAG: Metagenome Assembled Genomes
- MS: Mass Spectrometry
- HAP: Highly Abundant Protein
- HAPiID: Highly Abundant Protein guided Metaproteomic IDentification
- IBS: Irritable Bowel Syndrome
- IBD: Inflammatory Bowel Disease
- CDI: Clostridium difficile Infection
- FDR: False Discovery Rate
- HMP: Human Microbiome Project
- HBC: Human Gastrointestinal Bacteria Culture Collection
- RefSeq: Reference Sequence
- UMGS: Unclassified Metagenomic Species
- GTDBTK: Genome Taxonomy Database Toolkit
- FGS: FragGeneScan
- KEGG: Kyoto Encyclopedia of Genes and Genomes
- KoFamDB: KEGG Orthology family database
- Leu: Leucine
- Ile: Isoleucine
- IGC: Integrated reference Catalog of the human gut microbiome