PT - JOURNAL ARTICLE
AU - Yuqing Yang
AU - Ning Chen
AU - Ting Chen
TI - mLDM: a new hierarchical Bayesian statistical model for sparse microbial association discovery
AID - 10.1101/042630
DP - 2016 Jan 01
TA - bioRxiv
PG - 042630
4099 - http://biorxiv.org/content/early/2016/03/07/042630.short
4100 - http://biorxiv.org/content/early/2016/03/07/042630.full
AB - Interpretive analysis of metagenomic data depends on an understanding of the underlying associations among microbes from metagenomic samples. Although several statistical tools have been developed for metagenomic association studies, they suffer from compositional bias or fail to take into account environmental factors that directly affect the composition of a given microbial community. In this paper, we propose metagenomic Lognormal-Dirichlet-Multinomial (mLDM), a hierarchical Bayesian model with sparsity constraints to bypass compositional bias and discover new associations among microbes and between microbes and environmental factors. The mLDM model can 1) infer both conditionally dependent associations among microbes and direct associations between microbes and environmental factors; 2) consider both compositional bias and variance of metagenomic data; and 3) estimate absolute abundance for microbes. Conditionally dependent associations thus capture the direct relationships between microbial pairs and remove the indirect connections induced by other common factors. Empirical studies on both synthetic data and the TARA Oceans eukaryotic data show the effectiveness of the mLDM model in comparison with several state-of-the-art methodologies. Finally, mLDM is applied to western English Channel data and finds some interesting associations.