TY  - JOUR
T1  - Modelling dropouts allows for unbiased identification of marker genes in scRNASeq experiments
JF  - bioRxiv
DO  - 10.1101/065094
SP  - 065094
AU  - Tallulah S. Andrews
AU  - Martin Hemberg
Y1  - 2016/01/01
UR  - http://biorxiv.org/content/early/2016/07/21/065094.abstract
N2  - Single-cell RNASeq (scRNASeq) differs from bulk RNASeq in that a large number of genes have zero reads in some cells but relatively high expression in the remaining cells. We propose that these zeros, or dropouts, are due to failure of the reverse transcription, and we model the process using the Michaelis-Menten (MM) equation. We show that the MM equation provides an equivalent or superior fit to existing scRNASeq datasets compared to other models. In addition, identifying genes significantly to the right of the MM curve is a fast and accurate method to distinguish differentially expressed genes without prior identification of subpopulations of cells. We apply our method to a mouse preimplantation dataset and demonstrate that clustering the selected genes identifies biologically meaningful clusters. Furthermore, this feature selection makes it possible to overcome batch effects and cluster cells from five different datasets by their biological groups rather than by experimental origin.
ER  - 