Comparison of marker selection methods for high throughput scRNA-seq data

Anna C. Gilbert; Alexander Vargo

doi:10.1101/679761

Abstract

Here, we evaluate the performance of a variety of marker selection methods on scRNA-seq UMI counts data. We test on an assortment of experimental and synthetic data sets that range in size from several thousand to one million cells. In addition, we propose several performance measures for evaluating the quality of a set of markers when there is no known ground truth. According to these metrics, most existing marker selection methods show similar performance on experimental scRNA-seq data; thus, the speed of the algorithm is the most important consideration for large data sets. With this in mind, we introduce RANKCORR, a fast marker selection method with strong mathematical underpinnings that takes a step towards sensible multi-class marker selection.