Abstract
Testing for differential abundance is a crucial task in metagenome-wide association studies, complicated by technical or biological confounding and a lack of consensus regarding statistical methodology. Here, we developed a framework for benchmarking differential abundance testing methods based on implanting signals into real data. This strategy yields a ground truth for benchmarking while retaining the statistical characteristics of real metagenomic data, which we quantitatively validated in comparison to previous approaches. Our benchmark revealed serious issues with elevated false discovery rates or limited sensitivity for the majority of methods, with the exception of limma, linear models, and the Wilcoxon test. When we additionally modeled confounders, these issues were exacerbated, but linear mixed-effect models or the blocked Wilcoxon test effectively addressed them. Exploratory analysis of cardiometabolic disease cohorts illustrates the confounding potential of medications and the need to consider confounders to prevent spurious associations in real-world applications.
Competing Interest Statement
The authors have declared no competing interest.