TY  - JOUR
T1  - Differential Expression Analysis for RNAseq using Poisson Mixed Models
JF  - bioRxiv
DO  - 10.1101/073403
SP  - 073403
AU  - Shiquan Sun
AU  - Michelle Hood
AU  - Laura Scott
AU  - Qinke Peng
AU  - Sayan Mukherjee
AU  - Jenny Tung
AU  - Xiang Zhou
Y1  - 2016/01/01
UR  - http://biorxiv.org/content/early/2016/09/05/073403.abstract
N2  - Identifying differentially expressed (DE) genes from RNA sequencing (RNAseq) studies is one of the most common analyses in genomics. DE analysis often represents the first step towards understanding the molecular mechanisms underlying disease susceptibility and phenotypic variation. However, identifying DE genes from RNAseq presents statistical and computational challenges that arise from several unique properties of the sequencing data. Specifically, gene expression estimates in RNAseq experiments are based on read counts that often display over-dispersion. In addition, gene expression levels are heritable and are influenced by the genetic structure of the study samples. Previous count-based methods for identifying DE genes rely on simple hierarchical Poisson models (e.g., negative binomial) to model over-dispersion, which is assumed to be independent among samples. However, these methods fail to account for the gene expression similarity induced by relatedness and/or population structure, which can cause inflation of test statistics and/or loss of power. To address this problem, we present a Poisson mixed model with two random effects terms to account for both independent over-dispersion and sample relatedness in RNAseq DE analysis. To make our method scalable, we develop a novel sampling-based inference algorithm, taking advantage of recently developed innovations in efficient mixed model optimization and a latent variable representation of the Poisson model. With simulations, we show that, in the presence of population structure, our method properly controls for type I error and is more powerful than several widely used approaches. We apply our method to identify DE genes associated with sex in a baboon data set and DE genes associated with type 2 diabetes status as well as fasting glucose levels in a human data set. In both data sets, our method detects at least 40% more DE genes compared with the next best approach while properly controlling for type I error. Our method is implemented in the MACAU software package, freely available at www.xzlab.org/software.html.
ER  - 