TY - JOUR
T1 - Cross-population analysis of high-grade serous ovarian cancer reveals only two robust subtypes
JF - bioRxiv
DO - 10.1101/030239
SP - 030239
AU - Gregory P. Way
AU - James Rudd
AU - Chen Wang
AU - Habib Hamidi
AU - Brooke L. Fridley
AU - Gottfried Konecny
AU - Ellen L. Goode
AU - Casey S. Greene
AU - Jennifer A. Doherty
Y1 - 2016/01/01
UR - http://biorxiv.org/content/early/2016/01/29/030239.abstract
N2 - Background: Four gene expression-based subtypes of high-grade serous ovarian cancer (HGSC), variably associated with differential survival, have been previously described. However, in these studies, clustering heuristics were consistent with only three subtypes and reproducibility of the subtypes across populations and assay platforms has not been formally assessed. Therefore, we systematically determined the concordance of transcriptomic HGSC subtypes across populations. Methods: We used a unified bioinformatics pipeline to independently cluster (k = 3 and k = 4) five mRNA expression datasets with >130 tumors using k-means and non-negative matrix factorization (NMF) without removing “hard-to-classify” samples. Within each population, we summarized differential expression patterns for each cluster as moderated t statistic vectors using Significance Analysis of Microarrays. We calculated Pearson’s correlations of these vectors to determine similarities and differences in expression patterns between clusters. We identified sets of clusters that were most correlated across populations to define syn-clusters (SC), and we associated SC expression patterns with biological pathways using geneset overrepresentation analyses. Results: Across populations, for k = 3, moderated t score correlations for clusters 1, 2 and 3 ranged between 0.77-0.85, 0.80-0.90, and 0.65-0.77, respectively. For k = 4, correlations for clusters 1-4 were 0.77-0.85, 0.83-0.89, 0.51-0.76, and 0.61-0.75, respectively.
Within populations, comparing analogous clusters (k = 3 versus k = 4), correlations were high for clusters 1 and 2 (0.91-1.00), but lower for cluster 3 (0.22-0.80). Results were similar using NMF. SC1 corresponds to mesenchymal-like, SC2 to proliferative-like, SC3 to immunoreactive-like, and SC4 to differentiated-like subtypes reported previously. Conclusions: While previous single-population studies reported four HGSC subtypes, our cross-population comparison finds strong evidence for only two subtypes and our re-analysis of previous data suggests that results favoring four subtypes may have been driven, at least in part, by the inclusion of samples with low malignant potential. Because the mesenchymal-like and proliferative-like subtypes are highly consistent across populations, they likely reflect intrinsic biological subtypes and are strong candidates for targeted therapies. The other two previously described subtypes (immunoreactive-like and differentiated-like) are considerably less consistent and may represent either a single subtype or signal that is not amenable to clustering. Conflicts of Interest: The authors do not declare any conflicts of interest. Other Presentations: Aspects of this study were presented at the 2015 AACR Conference and the 2015 Rocky Mountain Bioinformatics Conference. Notes: Words: 3,392; Figures: 4; Tables: 4; Sup. Figures: 9; Sup. Tables: 5; Sup. Methods. Authors’ Contributions: Study concept and design: GW, JR, CG, JD. Original data collection and processing: CW, HH, BF, GK, EG. Data analysis: GW, JR, CG, JD. Manuscript drafting and editing: GW, JR, CG, JD. All authors read, commented on, and approved the final manuscript.
ER -