PT - JOURNAL ARTICLE
AU - Greg Finak
AU - Andrew McDavid
AU - Masanao Yajima
AU - Jingyuan Deng
AU - Vivian Gersuk
AU - Alex K. Shalek
AU - Chloe K. Slichter
AU - Hannah W. Miller
AU - M. Julianna McElrath
AU - Martin Prlic
AU - Peter S. Linsley
AU - Raphael Gottardo
TI - MAST: A flexible statistical framework for assessing transcriptional changes and characterizing heterogeneity in single-cell RNA-seq data
AID - 10.1101/020842
DP - 2015 Jan 01
TA - bioRxiv
PG - 020842
4099 - http://biorxiv.org/content/early/2015/06/15/020842.short
4100 - http://biorxiv.org/content/early/2015/06/15/020842.full
AB - Single-cell transcriptomic profiling enables the unprecedented interrogation of gene expression heterogeneity in rare cell populations that would otherwise be obscured in bulk RNA sequencing experiments. The stochastic nature of transcription is revealed in the bimodality of single-cell transcriptomic data, a feature shared across single-cell expression platforms. There is, however, a paucity of computational tools that take advantage of this unique characteristic. We present a new methodology to analyze single-cell transcriptomic data that models this bimodality within a coherent generalized linear modeling framework. We propose a two-part generalized linear model that allows one to characterize biological changes in the proportions of cells that are expressing each gene, and in the positive mean expression level of that gene. We introduce the cellular detection rate, the fraction of genes turned on in a cell, and show how it can be used to simultaneously adjust for technical variation and so-called “extrinsic noise” at the single-cell level without the use of control genes. Our model permits direct inference on statistics formed by collections of genes, facilitating gene set enrichment analysis. The residuals defined by such models can be manipulated to interrogate cellular heterogeneity and gene-gene correlation across cells and conditions, providing insights into the temporal evolution of networks of co-expressed genes at the single-cell level. Using two single-cell RNA-seq datasets, including newly generated data from Mucosal Associated Invariant T (MAIT) cells, we show how model residuals can be used to identify significant changes across biologically relevant gene sets that are missed by other methods and characterize cellular heterogeneity in response to stimulation.
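The abstract defines the cellular detection rate (CDR) as the fraction of genes turned on in a cell, and splits each gene's signal into two parts: the proportion of cells expressing it and its mean expression among the expressing cells. The sketch below computes both quantities from a small expression matrix. It is a plain-Python illustration under stated assumptions, not the MAST implementation (MAST itself is an R/Bioconductor package). The assumptions are that values are on a log scale, so a value greater than zero counts as "turned on", and that cells are rows and genes are columns. The function names `cellular_detection_rate` and `two_part_summary` and the toy matrix are hypothetical. In MAST, as we understand it, the CDR enters the model as a covariate in both parts; the code here only computes the raw quantities and fits no model.

```python
from statistics import mean

# Assumption: each inner list is one cell, each position one gene, values on a
# log scale (e.g. log2(TPM + 1)), so a value > 0 means the gene was detected.


def cellular_detection_rate(expr):
    """Return, for each cell, the fraction of genes with expression > 0."""
    return [sum(1 for v in cell if v > 0) / len(cell) for cell in expr]


def two_part_summary(expr, gene):
    """Return (detection frequency, positive mean) for one gene index.

    These mirror the two parts described in the abstract: the proportion of
    cells expressing the gene (discrete part) and the mean expression among
    cells where it is detected (continuous part). The positive mean is NaN
    when no cell expresses the gene.
    """
    values = [cell[gene] for cell in expr]
    positive = [v for v in values if v > 0]
    freq = len(positive) / len(values)
    pos_mean = mean(positive) if positive else float("nan")
    return freq, pos_mean


# Hypothetical toy data: 3 cells x 4 genes.
expr = [
    [0.0, 2.5, 1.0, 0.0],
    [3.0, 1.0, 2.0, 0.0],
    [0.0, 0.0, 4.0, 1.5],
]

print(cellular_detection_rate(expr))  # [0.5, 0.75, 0.5]
print(two_part_summary(expr, 0))      # (0.3333333333333333, 3.0)
```

Gene 0 is detected in only one of the three cells, but it is highly expressed where it appears. A single mean over all cells, (0 + 3.0 + 0) / 3 = 1.0, would blur these two features together, and the two-part summary keeps them separate.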