Abstract
Background Environmental DNA (eDNA) and metabarcoding allow the identification of a mixture of individuals and have launched a new era in bio- and eco-assessment. A number of steps are required to obtain taxonomically assigned (Molecular) Operational Taxonomic Unit ((M)OTU) tables from raw data. For most of these steps, a plethora of tools is available, and each tool’s execution parameters need to be tailored to each experiment’s idiosyncrasies. Adding to this complexity, such analyses frequently require the computational capacity of High Performance Computing (HPC) systems.
Software containerization technologies ease the sharing and running of software packages across operating systems; thus, they strongly facilitate pipeline development and usage. So do programming languages specialized for big data pipelines, which incorporate features like roll-back checkpoints and on-demand partial pipeline execution.
Findings PEMA is a containerized assembly of key metabarcoding analysis tools that requires little effort to set up, run and customize to researchers’ needs. Based on third-party tools, PEMA performs read pre-processing, clustering to (M)OTUs and taxonomy assignment for 16S rRNA and COI marker gene data. Owing to its simplified parameterisation and checkpoint support, PEMA allows users to explore alternative algorithms for specific steps of the pipeline without the need for a complete re-execution. PEMA was evaluated against previously published datasets and achieved results of comparable quality.
Conclusions Given its time-efficient performance and the quality of its results, it is suggested that PEMA can be used for accurate eDNA metabarcoding analysis, thus enhancing the applicability of next-generation biodiversity assessment studies.
List of abbreviations
- BDS: BigDataScript
- COI: Cytochrome Oxidase Subunit 1
- eDNA: Environmental DNA
- MOTU: Molecular Operational Taxonomic Unit (species equivalent for eukaryotes)
- HPC: High Performance Computing
- MCMC: Markov chain Monte Carlo
- MSA: Multiple Sequence Alignment
- OTU: Operational Taxonomic Unit (species equivalent for prokaryotes)
- PEMA: a Pipeline for Environmental DNA Metabarcoding Analysis
- SSU: Small Subunit