openTSNE: a modular Python library for t-SNE dimensionality reduction and embedding

Pavlin G. Poličar; Martin Stražar; Blaž Zupan

doi:10.1101/731877

Abstract

Summary: Point-based visualisations of large, multi-dimensional data from molecular biology can reveal meaningful clusters. One of the most popular techniques for constructing such visualisations is t-distributed stochastic neighbor embedding (t-SNE), for which a number of extensions have recently been proposed to address issues of scalability and the quality of the resulting visualisations. We introduce openTSNE, a modular Python library that implements the core t-SNE algorithm and its extensions. The library is orders of magnitude faster than existing popular implementations, including the one in scikit-learn. Also unique to openTSNE is the ability to map new data into existing embeddings, which, surprisingly, can help address batch effects.

Availability: openTSNE is available at https://github.com/pavlin-policar/openTSNE.

Contact: pavlin.policar{at}fri.uni-lj.si, blaz.zupan{at}fri.uni-lj.si


The copyright holder for this preprint is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under a CC-BY-ND 4.0 International license.