TY - JOUR
T1 - A reassessment of consensus clustering for class discovery
JF - bioRxiv
DO - 10.1101/002642
SP - 002642
AU - Yasin Senbabaoglu
AU - George Michailidis
AU - Jun Z. Li
Y1 - 2014/01/01
UR - http://biorxiv.org/content/early/2014/02/13/002642.abstract
N2 - Consensus clustering (CC) is an unsupervised class discovery method widely used to study sample heterogeneity in high-dimensional datasets. It calculates the “consensus rate” between any two samples as the frequency with which they are grouped together in repeated clustering runs under a certain degree of random perturbation. The pairwise consensus rates form a between-sample similarity matrix, which has been used (1) as a visual proof that clusters exist, (2) for comparing stability among clusters, and (3) for estimating the optimal number (K) of clusters. However, the sensitivity and specificity of CC have not been systematically studied. To assess its performance, we investigated the most common implementations of CC and compared CC with other popular methods that also focus on cluster stability and estimation of K. We evaluated these methods using simulated datasets with either known structure or known absence of structure. Our results showed that (1) CC was able to divide randomly generated unimodal data into pre-specified numbers of clusters, and was able to show apparent stability of these chance partitions of known cluster-less data; (2) for data with known structure, the proportion of ambiguously clustered (PAC) pairs infers the known number of clusters more reliably than several commonly used K-estimating methods; and (3) validation of the optimal K by choosing the most discriminant genes from the discovery cohort and applying them in an independent cohort often exaggerates the confidence in K due to inherent gene-gene correlations among the selected genes.
While these results do not yet prove that any of the published studies using CC has generated false positive findings, they show that datasets with subtle or no structure are fully capable of producing strong evidence of consensus clustering. We therefore recommend caution in using CC in class discovery and validation. Author Summary: Consensus clustering (CC) is rapidly becoming the algorithm of choice for unsupervised class discovery with genomic datasets. It has been used both as a visualization tool and an inference tool, and has been cited ∼600 times since its introduction in 2003. In a typical application, The Cancer Genome Atlas (TCGA) Research Network used CC to analyze gene expression data of glioblastoma and identified four subtypes. But as often occurs in this type of study, neither the strength of the evidence nor the sensitivity of the method was quantitatively evaluated. Here, by comparing the TCGA dataset with a series of randomly simulated datasets known to lack clusters, we highlight the potential for CC to generate false positive results in subtype discovery. We describe a CC-based summary statistic, the proportion of ambiguous clustering (PAC), as the measure to infer the optimal number of clusters. Using simulated data with a known number of clusters, we show that PAC outperforms commonly used methods such as CDF, Δ(K), Silhouette Width, GAP-PC and CLEST in scenarios closely resembling real studies. We conclude by making practical recommendations for conducting unsupervised class discovery using CC.
ER -
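
The abstract describes the consensus rate as how often two samples are grouped together across repeated, perturbed clustering runs, and PAC as the proportion of ambiguously clustered pairs. The following is a minimal sketch of both ideas, not the authors' implementation. Everything specific in it is an assumption made for illustration: the tiny k-means base clusterer, subsampling 80% of the samples as the perturbation, 100 runs, the 0.1 and 0.9 ambiguity thresholds, and the function names `kmeans_labels`, `consensus_rates`, and `pac`.

```python
import random
from itertools import combinations


def kmeans_labels(points, k, rng, n_iter=20):
    """Tiny Lloyd's k-means on a list of equal-length tuples (illustrative base clusterer)."""
    centers = rng.sample(points, k)
    labels = [0] * len(points)
    for _ in range(n_iter):
        # Assign each point to its nearest center (squared Euclidean distance).
        labels = [
            min(range(k), key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centers[c])))
            for p in points
        ]
        # Move each center to the mean of its members; leave it alone if the cluster is empty.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centers[c] = tuple(sum(col) / len(members) for col in zip(*members))
    return labels


def consensus_rates(points, k, n_runs=100, subsample=0.8, seed=0):
    """Per-pair consensus rate: runs where both were co-clustered / runs where both were drawn."""
    rng = random.Random(seed)
    n = len(points)
    together = {pair: 0 for pair in combinations(range(n), 2)}
    sampled = {pair: 0 for pair in together}
    for _ in range(n_runs):
        # Random perturbation (assumed here): cluster a random subsample of the samples.
        idx = sorted(rng.sample(range(n), max(k, int(subsample * n))))
        labels = kmeans_labels([points[i] for i in idx], k, rng)
        for (i, li), (j, lj) in combinations(zip(idx, labels), 2):
            sampled[(i, j)] += 1
            together[(i, j)] += li == lj
    # Pairs never drawn together have no defined rate and are skipped.
    return {p: together[p] / sampled[p] for p in together if sampled[p] > 0}


def pac(rates, lower=0.1, upper=0.9):
    """Fraction of pairs whose consensus rate lies strictly between the two thresholds."""
    values = list(rates.values())
    return sum(lower < v < upper for v in values) / len(values)


if __name__ == "__main__":
    rng = random.Random(1)
    # Simulated unimodal data with no cluster structure (hypothetical example input).
    unimodal = [(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(30)]
    for k in (2, 3, 4):
        print(k, pac(consensus_rates(unimodal, k)))
```

Under these assumptions, a low PAC at some K means most pairs are either almost always or almost never grouped together, so one plausible way to use the statistic is to compare PAC across candidate values of K and prefer the K with the lowest value.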