TY  - JOUR
T1  - Extracting a low-dimensional description of multiple gene expression datasets reveals a potential driver for tumor-associated stroma in ovarian cancer
JF  - bioRxiv
DO  - 10.1101/048215
SP  - 048215
AU  - Safiye Celik
AU  - Benjamin A Logsdon
AU  - Stephanie Battle
AU  - Charles W Drescher
AU  - Mara Rendi
AU  - Hawkins David R
AU  - Su-In Lee
Y1  - 2016/01/01
UR  - http://biorxiv.org/content/early/2016/04/12/048215.abstract
N2  - Background: Discovering patient subtypes and molecular drivers of a subtype are difficult and driving problems underlying most modern disease expression studies collected across patient populations. Expression patterns conserved across multiple expression datasets from independent disease studies are likely to represent important molecular events underlying the disease. Methods: We present the INSPIRE (INferring Shared modules from multiPle gene expREssion datasets) method to infer highly coherent and robust modules of co-expressed genes and the dependencies among the modules from multiple expression datasets. Focusing on inferring modules and their dependencies conserved across multiple expression datasets is important for several reasons. First, using multiple datasets will increase the power to detect robust and relevant patterns (modules and dependencies among modules). Second, INSPIRE enables the use of multiple datasets that contain different sets of genes due to, e.g., the difference in microarray platforms. Many methods designed for expression data analysis cannot integrate multiple datasets with variable discrepancy to infer a single combined model, whereas INSPIRE can naturally model the dependencies among the modules even when a large proportion of genes are not observed on a certain platform. Results: We evaluated INSPIRE on synthetically generated datasets with known underlying network structure among modules, and gene expression datasets from multiple ovarian cancer studies.
We show that the model learned by INSPIRE can explain unseen data better and can reveal prior knowledge on gene functions more accurately than alternative methods. We demonstrate that applying INSPIRE to nine ovarian cancer datasets leads to the identification of a new marker and potential molecular driver of tumor-associated stroma - HOPX. We also demonstrate that the HOPX module strongly overlaps with the genes defining the mesenchymal patient subtype identified in The Cancer Genome Atlas (TCGA) ovarian cancer data. We provide evidence for a previously unknown molecular basis of tumor resectability efficacy involving tumor-associated mesenchymal stem cells represented by HOPX. Conclusions: INSPIRE extracts a low-dimensional description from multiple gene expression data, which consists of modules and their dependencies. The discovery of a new tumor-associated stroma marker, HOPX, and its module suggests a previously unknown mechanism underlying tumor-associated stroma. Abbreviations: INSPIRE, INferring Shared modules from multiPle gene expREssion datasets; TCGA, The Cancer Genome Atlas; LDR, Low-Dimensional Representation; WGCNA, Weighted Gene Co-expression Network Analysis; GGM, Gaussian Graphical Model; EMT, Epithelial-Mesenchymal Transition; BIC, Bayesian Information Criterion; PPI, Protein-Protein Interaction; MSigDB, Molecular Signatures DataBase; PCA, Principal Component Analysis; PC, Principal Component; TOM, Topological Overlap Measure; MGL, Module Graphical Lasso; SLFA, Structured Latent Factor Analysis; GLasso, Graphical Lasso; UGL, Unknown Group L1 regularization; CV, Cross-Validation; TF, Transcription Factor; ChEA, ChIP Enrichment Analysis; GEO, Gene Expression Omnibus; CNV, Copy Number Variation; NBS, Network-Based Stratification; SAM, Significance Analysis of Microarrays; OV, OVarian cancer; AUC, Area Under the Curve; MSC, Mesenchymal Stem Cell; CAF, Cancer-Associated Fibroblasts; MAP, Maximum A Posteriori; LOOCV, Leave-One-Out Cross Validation; ROC, Receiver Operator Characteristic; GISTIC, Genomic Identification of Significant Targets in Cancer.
ER  - 