Rarefaction, alpha diversity, and statistics

Amy Willis

doi:10.1101/231878

Abstract

Understanding the drivers of microbial diversity is a fundamental question in microbial ecology. Extensive literature discusses different methods for describing microbial diversity and documenting its effects on ecosystem function. However, it is widely believed that diversity depends on the number of reads sequenced. I discuss a statistical perspective on diversity, framing the diversity of an environment as an unknown parameter and examining the bias and variance of plug-in and rarefied estimates. I argue that by failing to account for both bias and variance, we invalidate analyses of alpha diversity. I describe the state of the statistical literature for addressing these problems, and suggest that measurement error modeling can address issues with variance, but that bias corrections are needed as well. I encourage microbial ecologists to avoid motivating their investigations with alpha diversity analyses that do not use valid statistical methodology.

The copyright holder for this preprint is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under a CC-BY-ND 4.0 International license.