TY - JOUR T1 - Comparison of Principal Component Analysis and t-Stochastic Neighbor Embedding with Distance Metric Modifications for Single-cell RNA-sequencing Data Analysis JF - bioRxiv DO - 10.1101/102780 SP - 102780 AU - Haejoon (Ellen) Kwon AU - Jean Fan AU - Peter Kharchenko Y1 - 2017/01/01 UR - http://biorxiv.org/content/early/2017/01/28/102780.abstract N2 - Recent developments in technological tools such as next generation sequencing along with peaking interest in the study of single cells has enabled single-cell RNA-sequencing, in which whole transcriptomes are analyzed on a single-cell level. Studies, however, have been hindered by the ability to effectively analyze these single cell RNA-seq datasets, due to the high-dimensional nature and intrinsic noise in the data. While many techniques have been introduced to reduce dimensionality of such data for visualization and subpopulation identification, the utility to identify new cellular subtypes in a reliable and robust manner remains unclear. Here, we compare dimensionality reduction visualization methods including principle component analysis and t-stochastic neighbor embedding along with various distance metric modifications to visualize single-cell RNA-seq datasets, and assess their performance in identifying known cellular subtypes. Our results suggest that selecting variable genes prior to analysis on single-cell RNA-seq data is vital to yield reliable classification, and that when variable genes are used, the choice of distance metric modification does not particularly influence the quality of classification. Still, in order to take advantage of all the gene expression information, alternative methods must be used for a reliable classification. ER -