Abstract
In shotgun metagenomics (SM), state-of-the-art bioinformatic workflows are referred to as high resolution shotgun metagenomics (HRSM) and require intensive computing and disk storage resources. The increased data output of the latest generation of high-throughput DNA sequencing systems allows for unprecedented sequencing depth at minimal cost and will require adaptations in HRSM workflow architecture. One such adaptation is to generate so-called shallow SM datasets, which contain less sequencing data per sample than classic high-coverage sequencing. While shallow sequencing is a promising avenue for SM, detailed benchmarks using real data are lacking. In this case study, we took two public SM datasets, one moderate and the other massive in size, and subsampled each at various levels to mimic shallow sequencing datasets of various sequencing depths. Our results suggest that shallow SM sequencing is a viable avenue to obtain sound results regarding microbial community structure and that high-depth sequencing does not provide additional elements for ecological interpretation. One area, however, where ultra-deep sequencing and maximizing the usage of all data was undeniably beneficial was the generation of metagenome-assembled genomes (MAGs). Finally, we include a proof-of-concept analysis showing that alpha diversity is the main driver of gut microbiome structure and demonstrate that this conclusion can be reached using shallow SM, validating this method as a viable and sound option for HRSM analyses.
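The subsampling step described above can be illustrated with a minimal sketch. It is not the authors' pipeline: it assumes reads are stored as uncompressed four-line FASTQ records and uses reservoir sampling as one possible way to draw a fixed number of reads uniformly at random in a single pass. The names `parse_fastq` and `subsample_reads`, and the default seed, are hypothetical.

```python
import random
from typing import Iterable, Iterator, List, Tuple

# One FASTQ record: header, sequence, separator ("+"), quality string.
FastqRecord = Tuple[str, str, str, str]


def parse_fastq(lines: Iterable[str]) -> Iterator[FastqRecord]:
    """Yield four-line FASTQ records from an iterable of text lines."""
    buffer: List[str] = []
    for line in lines:
        buffer.append(line.rstrip("\n"))
        if len(buffer) == 4:
            yield (buffer[0], buffer[1], buffer[2], buffer[3])
            buffer = []


def subsample_reads(
    records: Iterable[FastqRecord], n_reads: int, seed: int = 42
) -> List[FastqRecord]:
    """Draw up to n_reads records uniformly at random (reservoir sampling).

    The input is read once, so the full dataset never has to fit in memory.
    If fewer than n_reads records exist, all of them are returned.
    """
    rng = random.Random(seed)  # fixed seed (assumed) for reproducibility
    reservoir: List[FastqRecord] = []
    for i, record in enumerate(records):
        if i < n_reads:
            # Fill the reservoir with the first n_reads records.
            reservoir.append(record)
        else:
            # Keep record i with probability n_reads / (i + 1).
            j = rng.randint(0, i)  # randint is inclusive on both ends
            if j < n_reads:
                reservoir[j] = record
    return reservoir
```

Calling `subsample_reads(parse_fastq(handle), n_reads=500_000)` on an open FASTQ file handle would return a shallow subset of that sample; the read count here is only an example value. Because selection depends only on record position and the seed, using the same seed on both mate files of a paired-end library with identical read counts selects the same positions in each file.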
Competing Interest Statement
The authors have declared no competing interest.
Footnotes
List of abbreviations
- Tb: Tera-base
- TB: Terabyte
- Gb: Giga-base
- GB: Gigabyte
- HPC: High Performance Computing
- bp: base-pairs
- MB: Megabyte
- SM: Shotgun metagenomics
- SSM: Shallow Shotgun Metagenomics
- HRSMG: High resolution shotgun metagenomics
- KO: KEGG ortholog
- arm: all reads mapped