RT Journal Article
SR Electronic
T1 MEACA: efficient gene-set interpretation of expression data using mixed models
JF bioRxiv
FD Cold Spring Harbor Laboratory
SP 106781
DO 10.1101/106781
A1 Bin Zhuo
A1 Duo Jiang
YR 2017
UL http://biorxiv.org/content/early/2017/02/07/106781.abstract
AB Competitive gene-set analysis, also called enrichment analysis, is a widely used tool for functional interpretation of high-throughput biological data such as gene expression data. It aims at testing a known category (e.g. a pathway) of genes for enriched differential expression (DE) signals compared to genes not in the category. Most enrichment testing methods ignore the widespread correlations among genes, which has been shown to result in excessive false positives. We show, both theoretically and empirically, that existing methods to account for correlations, such as GSEA and CAMERA, can result in severely mis-calibrated type 1 error and/or considerable power loss due to the failure to properly accommodate the DE heterogeneity across genes. We propose MEACA, a new gene-set testing framework based on a mixed effects model. Our method flexibly incorporates the unknown distribution of DE effects, effectively adjusts for completely unknown, unstructured correlations among genes, and does not rely on time-consuming permutations. Compared to existing methods, MEACA enjoys robust type 1 error control in widely ranging scenarios and substantially improves power. Applications of MEACA to a Huntington’s disease study and a lymphoblastoid cell line data set demonstrate its ability to recover biologically meaningful relationships. MEACA is available as an R package.