TY - JOUR
T1 - A Graph Theoretical Approach to Data Fusion
JF - bioRxiv
DO - 10.1101/025262
SP - 025262
AU - Justina Žurauskienė
AU - Paul DW Kirk
AU - Michael PH Stumpf
Y1 - 2015/01/01
UR - http://biorxiv.org/content/early/2015/08/21/025262.abstract
N2 - The rapid development of high-throughput experimental techniques has resulted in a growing diversity of genomic datasets being produced and requiring analysis. A variety of computational techniques allow us to analyse such data and to model the biological processes behind them. However, it is increasingly being recognised that we can gain deeper understanding by combining the insights obtained from multiple, diverse datasets. We therefore require scalable computational approaches for data fusion. We propose a novel methodology for scalable unsupervised data fusion. Our technique exploits network representations of the data in order to identify (and quantify) similarities among the datasets. We may work within the Bayesian formalism, using Bayesian nonparametric approaches to model each dataset; or (for fast, approximate, and massive-scale data fusion) we can naturally switch to more heuristic modelling techniques. An advantage of the proposed approach is that each dataset can initially be modelled independently (and therefore in parallel), before applying a fast post-processing step in order to perform data fusion. This allows us to incorporate new experimental data in an online fashion, without having to rerun all of the analysis. The methodology can be applied to genomic-scale datasets, and we demonstrate its applicability on examples from the literature, using a broad range of genomic datasets, and also on a recent gene expression dataset from sporadic inclusion body myositis. Availability: Example R code and instructions are available from https://sites.google.com/site/gtadatafusion/.
ER -