Abstract
One major limitation of microbial community marker gene sequencing is that it does not provide direct information on the functional composition of sampled communities. Here, we present PICRUSt2 (https://github.com/picrust/picrust2), which expands the capabilities of the original PICRUSt method [1] to predict the functional potential of a community based on marker gene sequencing profiles. This updated method and implementation includes several improvements over the previous algorithm: an expanded database of gene families and reference genomes, a new approach that is compatible with any OTU-picking or denoising algorithm, and novel phenotype predictions. In our evaluations, PICRUSt2 was more accurate overall than PICRUSt1 and other current approaches. PICRUSt2 is also more flexible and allows the addition of custom reference databases. We highlight these improvements as well as important caveats regarding the use of predicted metagenomes, which are related to the inherent challenges of analyzing metagenome data in general.
Footnotes
Additional validations are now included in the manuscript based on (1) new paired amplicon-metagenomics datasets and (2) a different validation approach. This approach, which was introduced by others, compares the output of differential abundance tests run on predicted metagenomes with the output of the same tests run on actual metagenomes from the same samples. It gives valuable insight into how interpretations can vary across prediction tools and different metagenomics workflows. We now highlight these results and provide clearer cautions regarding the use of metagenome predictions in general. In addition to these important changes, the main text of the manuscript has been shortened in accordance with requested revisions.
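The second validation approach can be illustrated with a minimal sketch. It assumes that a differential abundance test has already been run separately on the predicted and on the actual metagenome, and that each run yields a set of gene families called significant. The function name, the choice of precision, recall, and F1 as summary scores, and the KEGG-style identifiers are all hypothetical placeholders for illustration; they are not taken from the manuscript or from the PICRUSt2 code.

```python
def compare_significant_calls(predicted_hits, actual_hits):
    """Score agreement between two sets of significant gene families.

    predicted_hits: features called significant on the predicted metagenome.
    actual_hits: features called significant on the sequenced metagenome,
                 treated here as the reference (an assumption of this sketch).
    """
    predicted = set(predicted_hits)
    actual = set(actual_hits)

    # Features called significant in both analyses.
    true_positives = len(predicted & actual)

    # Guard against empty sets to avoid division by zero.
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(actual) if actual else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0

    return {"precision": precision, "recall": recall, "f1": f1}


# Hypothetical placeholder identifiers, not real results.
predicted_sig = ["K00001", "K00002", "K00003"]
actual_sig = ["K00002", "K00003", "K00004", "K00005"]

scores = compare_significant_calls(predicted_sig, actual_sig)
print(scores)
```

Treating the sequenced metagenome as the reference is itself a simplification, since calls from actual metagenomes can also vary between metagenomics workflows.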