RT Journal Article
SR Electronic
T1 moGSA: integrative single sample gene-set analysis of multiple omics data
JF bioRxiv
FD Cold Spring Harbor Laboratory
SP 046904
DO 10.1101/046904
A1 Chen Meng
A1 Bernhard Kuster
A1 Bjoern Peters
A1 Aedín C Culhane
A1 Amin Moghaddas Gholami
YR 2016
UL http://biorxiv.org/content/early/2016/04/03/046904.abstract
AB Background: The increasing availability of multi-omics datasets has created an opportunity to understand how different biological pathways and molecules interact to cause disease. However, there is a lack of analysis methods that can integrate and interpret multiple experimental and molecular data types measured over the same set of samples. Result: To address this challenge, we introduce moGSA, a multivariate single sample gene-set analysis method. It uses multivariate latent variable decomposition to discover correlated global variance structure across datasets and calculates an integrated gene-set enrichment score using the most informative features in each data type. Integrating multiple diverse sources of data reduces the impact of missing or unreliable information in any single data type, and may increase the power to discover subtle changes in gene sets. We show that integrative analysis with moGSA outperforms existing single sample GSA methods on simulated data. We apply moGSA to two studies with real data. First, we discover similarities and differences in mRNA, protein and phosphorylation profiles of induced pluripotent and embryonic stem cell lines. Second, we report that integrating copy number variation and mRNA profiling data from 308 bladder cancers in The Cancer Genome Atlas with moGSA robustly identifies three molecular subtypes. Our method provides positive or negative gene-set scores (with p-values) for each gene set in each sample. We demonstrate how to assess the contribution of each data type or gene to a moGSA gene-set score.
With moGSA, there is no requirement to filter data to the intersection of features; therefore, all molecular features on all platforms may be included in the analysis. Conclusion: moGSA provides a powerful yet simple tool to perform integrated single sample gene-set analysis. Its latent variable approach is fundamentally different from existing single sample GSA approaches. It is an attractive approach for data integration and is particularly suited to integrated cluster or molecular subtype discovery. It is available in the Bioconductor R package “mogsa”.