Learning interpretable latent autoencoder representations with annotations of feature sets

Sergei Rybakov; Mohammad Lotfollahi; Fabian J. Theis; F. Alexander Wolf

doi:10.1101/2020.12.02.401182

Abstract

Existing methods for learning latent representations of single-cell RNA-seq data are based on autoencoders or factor models. However, representations learned by autoencoders are hard to interpret, and representations learned by factor models have limited flexibility. Here, we introduce a framework for learning interpretable autoencoders based on regularized linear decoders. It decomposes variation into interpretable components using prior knowledge in the form of annotated feature sets obtained from public databases. It thereby offers an alternative to enrichment techniques and factor models for explaining observed variation in terms of biological knowledge. Benchmarking our model on two single-cell RNA-seq datasets, we demonstrate that it outperforms an existing factor model in scalability while maintaining interpretability.
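The abstract does not specify the regularizer, so the following is only a minimal sketch under stated assumptions, not the authors' implementation. It assumes that each latent dimension is tied to one annotated feature set and that the linear decoder is regularized by an L1 penalty on weights connecting a latent dimension to genes outside its set, so that each dimension's weights stay concentrated on its annotated genes. The toy mask and all names (`MASK`, `fit_decoder`, `proximal_step`, and so on) are hypothetical. To keep the example short, the latent codes are held fixed rather than produced by an encoder, and the decoder is fitted by proximal gradient descent.

```python
import numpy as np

# Hypothetical toy annotation: 3 feature sets (e.g. pathways) over 6 genes.
# MASK[k, g] == 1 means gene g is annotated to feature set k.
MASK = np.array([
    [1, 1, 1, 0, 0, 0],
    [0, 0, 1, 1, 0, 0],
    [0, 0, 0, 0, 1, 1],
], dtype=float)


def linear_decode(z, W, b):
    """Linear decoder: latent term k contributes z[:, k] * W[k, :] to the reconstruction."""
    return z @ W + b


def off_annotation_l1(W, mask, lam):
    """Assumed penalty: L1 norm of weights linking a latent term to genes outside its feature set."""
    return lam * np.abs(W * (1.0 - mask)).sum()


def soft_threshold(w, t):
    """Proximal operator of t * |w| (elementwise shrinkage toward zero)."""
    return np.sign(w) * np.maximum(np.abs(w) - t, 0.0)


def proximal_step(W, grad, mask, lr, lam):
    """Gradient step on the reconstruction loss, then shrink only the off-annotation weights."""
    W = W - lr * grad
    return np.where(mask == 1.0, W, soft_threshold(W, lr * lam))


def fit_decoder(x, z, mask, lam=0.1, lr=0.05, n_steps=500):
    """Fit W, b for fixed latent codes z by proximal gradient descent.

    Loss: mean over cells of the squared error summed over genes, plus off_annotation_l1.
    In an actual autoencoder, z would be the encoder output and be trained jointly.
    """
    n = x.shape[0]
    W = np.zeros((z.shape[1], x.shape[1]))
    b = np.zeros(x.shape[1])
    for _ in range(n_steps):
        resid = linear_decode(z, W, b) - x  # (cells, genes)
        grad_W = 2.0 * z.T @ resid / n      # gradient of the reconstruction loss w.r.t. W
        grad_b = 2.0 * resid.mean(axis=0)   # gradient w.r.t. the per-gene bias
        W = proximal_step(W, grad_W, mask, lr, lam)
        b = b - lr * grad_b
    return W, b


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    z = rng.normal(size=(200, 3))                     # toy latent codes, one column per feature set
    W_true = MASK * rng.normal(size=MASK.shape)       # toy weights that respect the annotation
    x = z @ W_true + 0.1 * rng.normal(size=(200, 6))  # toy expression matrix (cells x genes)
    W, b = fit_decoder(x, z, MASK)
    print("largest off-annotation weight:", np.abs(W[MASK == 0]).max())
```

On this toy data, the penalty drives the off-annotation weights to zero, so each row of `W` can be read as that feature set's contribution to the genes annotated to it. This reading is what makes a linear decoder interpretable in a way a deep nonlinear decoder is not.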

Competing Interest Statement

The authors have declared no competing interest.

Footnotes

Presented at the 15th Machine Learning in Computational Biology (MLCB) meeting. Copyright 2020 by the author(s).

The copyright holder for this preprint is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under a CC-BY-NC-ND 4.0 International license.