RT Journal Article SR Electronic T1 A Novel Clustering Method for Patient Stratification JF bioRxiv FD Cold Spring Harbor Laboratory SP 073189 DO 10.1101/073189 A1 Hongfu Liu A1 Rui Zhao A1 Hongsheng Fang A1 Feixiong Cheng A1 Yun Fu A1 Yang-Yu Liu YR 2016 UL http://biorxiv.org/content/early/2016/09/03/073189.abstract AB Patient stratification or disease subtyping is crucial for precision medicine and personalized treatment of complex diseases. The increasing availability of high-throughput molecular data provides a great opportunity for patient stratification. In particular, many clustering methods have been employed to tackle this problem in a purely data-driven manner. Yet, existing methods leveraging high-throughput molecular data often suffers from various limitations, e.g., noise, data heterogeneity, high dimensionality or poor interpretability. Here we introduced an Entropy-based Consensus Clustering (ECC) method that overcomes those limitations all together. Our ECC method employs an entropy-based utility function to fuse many basic partitions to a consensus one that agrees with the basic ones as much as possible. Maximizing the utility function in ECC has a much more meaningful interpretation than any other consensus clustering methods. Moreover, we exactly map the complex utility maximization problem to the classic K-means clustering problem with a modified distance function, which can then be efficiently solved with linear time and space complexity. Our ECC method can also naturally integrate multiple molecular data types measured from the same set of subjects, and easily handle missing values without any imputation. We applied ECC to both synthetic and real data, including 35 cancer gene expression benchmark datasets and 13 cancer types with four molecular data types from The Cancer Genome Atlas. We found that ECC shows superior performance against existing clustering methods. Our results clearly demonstrate the power of ECC in clinically relevant patient stratification.