TY  - JOUR
T1  - MEACA: efficient gene-set interpretation of expression data using mixed models
JF  - bioRxiv
DO  - 10.1101/106781
SP  - 106781
AU  - Bin Zhuo
AU  - Duo Jiang
Y1  - 2017/01/01
UR  - http://biorxiv.org/content/early/2017/02/07/106781.abstract
N2  - Competitive gene-set analysis, also called enrichment analysis, is a widely used tool for functional interpretation of high-throughput biological data such as gene expression data. It aims at testing a known category (e.g. a pathway) of genes for enriched differential expression (DE) signals compared to genes not in the category. Most enrichment testing methods ignore the widespread correlations among genes, which has been shown to result in excessive false positives. We show, both theoretically and empirically, that existing methods to account for correlations, such as GSEA and CAMERA, can result in severely mis-calibrated type 1 error and/or considerable power loss due to the failure to properly accommodate the DE heterogeneity across genes. We propose MEACA, a new gene-set testing framework based on a mixed effects model. Our method flexibly incorporates the unknown distribution of DE effects, effectively adjusts for completely unknown, unstructured correlations among genes, and does not rely on time-consuming permutations. Compared to existing methods, MEACA enjoys robust type 1 error control in widely ranging scenarios and substantially improves power. Applications of MEACA to a Huntington’s disease study and a lymphoblastoid cell line data set demonstrate its ability to recover biologically meaningful relationships. MEACA is available as an R package.
ER  - 