Abstract
Motivated by advances in causal inference for mediation analysis and by problems arising in the analysis of metagenomic data, we consider the effect of a treatment on an outcome transmitted through compositional mediators. Because such mediators are compositional and high-dimensional, standard mediation analysis is not directly applicable. We propose a sparse mediation model for high-dimensional compositional data that uses the algebraic structure of compositions in the simplex space and a constrained linear regression model to achieve sub-compositional coherence. Under this model, we develop a method for estimating the direct and microbial mediation effects of a randomly assigned treatment on an outcome. Tests for the total mediation effect of all bacterial taxa and for individual mediation effects are also proposed. We conduct extensive simulation studies to assess the performance of the proposed method and apply it to a real metagenomic dataset to investigate the effect of fat intake on body mass index (BMI) mediated through gut microbiome composition.