Abstract
Single-cell RNA sequencing (scRNA-seq) has enabled researchers to study gene expression at cellular resolution. However, noise due to amplification and dropout may obstruct analyses, so scalable denoising methods are needed for increasingly large but sparse scRNA-seq datasets. We propose a deep count autoencoder network (DCA) to denoise scRNA-seq datasets. DCA accounts for the count distribution, overdispersion and sparsity of the data by using a zero-inflated negative binomial noise model, and it captures nonlinear gene-gene or gene-dispersion interactions. Our method scales linearly with the number of cells and can therefore be applied to datasets of millions of cells. Using simulated and real datasets, we demonstrate that DCA denoising improves a diverse set of typical scRNA-seq data analyses. DCA outperforms existing methods for data imputation in quality and speed, enhancing biological discovery.
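To make the noise model concrete, the following is a minimal sketch of a zero-inflated negative binomial (ZINB) log-likelihood in plain Python. It assumes the common mean-dispersion parametrization, with mean `mu`, dispersion `theta` and zero-inflation probability `pi`; the function names and the use of scalar parameters are illustrative choices, not taken from the DCA implementation, where such parameters would be predicted per gene and per cell by the network.

```python
import math


def nb_log_pmf(x, mu, theta):
    """Log-probability of count x under a negative binomial
    with mean mu > 0 and dispersion theta > 0 (assumed parametrization)."""
    return (
        math.lgamma(x + theta) - math.lgamma(theta) - math.lgamma(x + 1)
        + theta * math.log(theta / (theta + mu))
        + x * math.log(mu / (theta + mu))
    )


def zinb_log_pmf(x, mu, theta, pi):
    """Log-probability of count x under a ZINB: a point mass at zero
    with weight pi (0 <= pi < 1) mixed with a negative binomial."""
    log_nb = nb_log_pmf(x, mu, theta)
    if x == 0:
        # A zero can come from the dropout component or from the NB itself.
        return math.log(pi + (1.0 - pi) * math.exp(log_nb))
    # A nonzero count can only come from the NB component.
    return math.log(1.0 - pi) + log_nb


def zinb_nll(counts, mu, theta, pi):
    """Negative log-likelihood of a sequence of counts sharing one
    set of parameters (a simplification for illustration)."""
    return -sum(zinb_log_pmf(x, mu, theta, pi) for x in counts)
```

In this sketch, setting `pi` to zero recovers the ordinary negative binomial, and a larger `pi` makes observed zeros more likely, which is how the model can represent dropout-induced sparsity.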
List of abbreviations
- scRNA-seq: single-cell RNA sequencing
- tSNE: t-distributed stochastic neighbor embedding
- DCA: deep count autoencoder
- AE: autoencoder
- PCA: principal component analysis
- H1: human embryonic stem cells
- DEC: definitive endoderm cells
- MEP: megakaryocyte-erythroid progenitors
- GMP: granulocyte-macrophage progenitors
- MSE: mean squared error
- ZINB: zero-inflated negative binomial
- CITE-seq: Cellular Indexing of Transcriptome and Epitopes by sequencing
- NK: natural killer cells
- DPT: diffusion pseudotime
- ReLU: rectified linear unit