Abstract
Summary Integrative omics is a central component of most systems biology studies, and computational methods are required to extract meaningful relationships across different omics layers. Various tools have been developed to facilitate integration of paired heterogeneous omics data; however, most existing tools allow integration of only two omics datasets. Furthermore, existing data integration tools do not incorporate the additional steps of identifying sub-networks or communities of highly connected entities and evaluating the topology of the integrative network under different conditions. Here we present xMWAS, an R package for data integration, network visualization, clustering, and differential network analysis of data from two or more omics platforms and from biochemical and phenotypic assays.
Availability https://sourceforge.net/projects/xmwas/
Contact kuppal2{at}emory.edu
1 Introduction
Technological advances have led to a major paradigm shift where multi-assay molecular profiling of biological samples is increasingly being used to understand molecular mechanisms for diseases and host responses to environmental exposures (Hawkins 2010, Cancer Genome Atlas Network 2008). Most cellular processes in a biological system are dependent on complex molecular interactions (Barabasi 2011). Integrative omics allows researchers to address such complexity and answer challenging biological questions, such as function of genetic variants and unknown metabolites, mechanisms of gene regulation, signaling and metabolic pathway responses to infection and toxicity (Hawkins 2010, Chandler 2016, Uppal 2016).
Numerous data-driven/unsupervised and knowledge-based tools allow integration of data from different omics technologies and other molecular assays (Wanichthanarak 2015, Meng 2016). Most existing data integration tools allow integration of only two datasets and do not allow identification of community structure and evaluation of network changes between different conditions. Community detection reveals topological modules comprised of functionally related biomolecules (Barabasi 2011, Yang 2016). Differential network analysis (DiNA) allows characterization of nodes that undergo changes in topological characteristics between different conditions, e.g. healthy vs disease (Lichtblau 2016).
To advance these capabilities, we present xMWAS, an R package that provides an automated workflow for integrative analysis of more than two datasets, differential network analysis, and community detection to improve our understanding of complex molecular interactions and disease mechanisms.
2 Implementation
xMWAS provides an automated framework for integrative and differential network analysis. Figure 1A provides an overview of the different stages of xMWAS. In stage one, xMWAS uses dimension reduction techniques such as Partial Least Squares (PLS), sparse Partial Least Squares (sPLS), and multilevel sparse Partial Least Squares (msPLS; for repeated measures) regression for pairwise integrative and association analysis between data matrices (Le Cao 2009, Liquet 2012, Gonzalez 2012). The sPLS and msPLS methods perform simultaneous data integration and variable selection using a LASSO penalty on the loading vectors, which reduces the complexity of the networks (Liquet 2012). The R package plsgenomics is used to determine the optimal number of latent components. The network() function in the mixOmics package is used to generate the association matrix, AXY, between matrices X and Y (Le Cao 2009, Gonzalez 2012). Student’s t-test is used to evaluate the statistical significance of association scores. Only the associations that satisfy the user-defined thresholds, e.g. |association score|>0.7 and p-value<0.01, are used for downstream analysis. The resulting graph, Gi=(V,E), where V is a set of nodes and E is a set of edges, is used to generate an edge list matrix, Li, such that each row in Li corresponds to an edge between nodes Xp and Yq. The same process is repeated to generate edge list matrices from all pairwise association analyses between datasets, e.g. Li=cor(X,Y), Lj=cor(X,Z), and Lk=cor(Y,Z).
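The thresholding step of stage one can be illustrated with a minimal Python sketch. It is not the xMWAS implementation: the function names, the toy scores, and the sample size are hypothetical, and the sketch assumes that an association score can be tested like a Pearson correlation, with t = r·sqrt((n−2)/(1−r²)) on n−2 degrees of freedom. The source states only that a Student's t-test is used, not the exact statistic.

```python
import math


def t_density(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    log_coef = math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
    coef = math.exp(log_coef) / math.sqrt(df * math.pi)
    return coef * (1 + x * x / df) ** (-(df + 1) / 2)


def two_sided_p(score, n_samples, steps=2000):
    """Two-sided p-value, treating the score as a correlation (assumption)."""
    df = n_samples - 2
    if abs(score) >= 1.0:
        return 0.0
    t = abs(score) * math.sqrt(df / (1 - score * score))
    # Simpson's rule (steps must be even) for the density integral on [0, t].
    h = t / steps
    total = t_density(0.0, df) + t_density(t, df)
    for k in range(1, steps):
        total += (4 if k % 2 else 2) * t_density(k * h, df)
    area = total * h / 3
    return max(0.0, 1.0 - 2.0 * area)


def build_edge_list(assoc, n_samples, score_cutoff=0.7, p_cutoff=0.01):
    """Keep only the pairs that pass both user-defined thresholds."""
    edges = []
    for (x_name, y_name), score in assoc.items():
        if abs(score) > score_cutoff and two_sided_p(score, n_samples) < p_cutoff:
            edges.append((x_name, y_name, score))
    return edges


# Hypothetical association scores between features of X and Y (10 samples).
scores_xy = {
    ("gene1", "met1"): 0.92,
    ("gene1", "met2"): 0.75,   # passes the score cutoff, fails the p-value cutoff
    ("gene2", "met1"): -0.95,
    ("gene2", "met2"): 0.10,
}
edges_xy = build_edge_list(scores_xy, n_samples=10)
print(edges_xy)
```

With only 10 samples, a score of 0.75 exceeds the 0.7 cutoff but does not reach p<0.01, which is why both thresholds are applied together.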
In stage two, the union of the individual edge list matrices from pairwise integrative analysis of the n datasets is used to generate a combined edge list matrix, e.g. Le = Li ∪ Lj ∪ Lk for three datasets. Matrix Le is used to generate the integrative network graph, G=(V,E), where V corresponds to nodes and E corresponds to edges or connections between the nodes, representing positive or negative associations between multiple datasets (Figure 1B). Network graphics are generated using the igraph package in R.
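The union step can be sketched in a few lines of Python. This is an illustration under stated assumptions rather than the package's code: the helper names and the toy edge lists are hypothetical, edges are treated as undirected, and a pair that appears more than once keeps the first score encountered.

```python
def combine_edge_lists(*edge_lists):
    """Union of pairwise edge lists; each undirected edge is kept once."""
    combined = {}
    for edges in edge_lists:
        for source, target, score in edges:
            key = frozenset((source, target))  # ignores edge direction
            combined.setdefault(key, (source, target, score))
    return list(combined.values())


def to_adjacency(edges):
    """Adjacency sets of the integrative network built from an edge list."""
    adjacency = {}
    for source, target, _score in edges:
        adjacency.setdefault(source, set()).add(target)
        adjacency.setdefault(target, set()).add(source)
    return adjacency


# Hypothetical pairwise edge lists for three datasets X, Y and Z.
edges_xy = [("gene1", "met1", 0.92)]
edges_xz = [("gene1", "cyt1", -0.81)]
edges_yz = [("met1", "cyt1", 0.88), ("cyt1", "met1", 0.88)]  # duplicate pair

combined = combine_edge_lists(edges_xy, edges_xz, edges_yz)
network = to_adjacency(combined)
print(len(combined), sorted(network["gene1"]))
```

The sign of each retained score distinguishes positive from negative associations in the combined network.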
In stage three, the multilevel community detection algorithm (Blondel 2008) is used to identify communities of nodes that are tightly connected with each other, but sparsely connected with the rest of the network (Figure 1C). Comparative studies for community detection algorithms show that the multilevel algorithm is suitable for both small and large networks with varying connectivity patterns (Yang 2016). The quality of the community structure is evaluated using the network modularity measure (Newman 2006).
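As an illustration of the modularity measure, the following Python sketch scores a given partition of an unweighted, undirected network using Q = Σc [Lc/m − (dc/2m)²], where m is the number of edges, Lc is the number of edges inside community c, and dc is the total degree of its nodes. It does not implement the multilevel algorithm itself, and the function name and toy network are hypothetical.

```python
def modularity(edges, community_of):
    """Modularity Q of a partition of an unweighted, undirected graph."""
    m = len(edges)
    internal = {}    # number of edges with both ends in community c
    degree_sum = {}  # total degree of the nodes in community c
    for a, b in edges:
        ca, cb = community_of[a], community_of[b]
        degree_sum[ca] = degree_sum.get(ca, 0) + 1
        degree_sum[cb] = degree_sum.get(cb, 0) + 1
        if ca == cb:
            internal[ca] = internal.get(ca, 0) + 1
    return sum(
        internal.get(c, 0) / m - (d / (2 * m)) ** 2
        for c, d in degree_sum.items()
    )


# Two triangles joined by a single bridge edge (toy network).
toy_edges = [
    ("a", "b"), ("b", "c"), ("a", "c"),
    ("d", "e"), ("e", "f"), ("d", "f"),
    ("c", "d"),
]
two_modules = {"a": 1, "b": 1, "c": 1, "d": 2, "e": 2, "f": 2}
one_module = {node: 1 for node in two_modules}

q_two = modularity(toy_edges, two_modules)
q_one = modularity(toy_edges, one_module)
print(round(q_two, 4), round(q_one, 4))
```

Splitting the toy network into its two triangles gives a higher Q than placing all nodes in a single community, which is the behaviour the measure is meant to capture.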
Differential network analysis is performed using the differential betweenness centrality and differential eigenvector centrality methods to identify nodes that undergo changes in their topological characteristics between conditions (Odibat 2012, Lichtblau 2016). Additional description of the software input and output is provided in Supplementary Section S1.
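The idea behind differential eigenvector centrality can be sketched in Python as follows. The source does not specify how the difference is computed or scaled, so this sketch makes several assumptions: centrality is obtained by power iteration and scaled so that the maximum is 1, nodes absent from a network receive a centrality of 0, and the differential score is the simple difference between conditions. All function names and the toy networks are hypothetical.

```python
def eigenvector_centrality(edges, iterations=200):
    """Eigenvector centrality by power iteration, scaled to a maximum of 1."""
    neighbours = {}
    for a, b in edges:
        neighbours.setdefault(a, set()).add(b)
        neighbours.setdefault(b, set()).add(a)
    if not neighbours:
        return {}
    score = {node: 1.0 for node in neighbours}
    for _ in range(iterations):
        # Iterating with (A + I) keeps the same leading eigenvector as A but
        # avoids oscillation on bipartite graphs.
        updated = {
            node: score[node] + sum(score[v] for v in neighbours[node])
            for node in neighbours
        }
        top = max(updated.values())
        score = {node: value / top for node, value in updated.items()}
    return score


def differential_centrality(edges_a, edges_b):
    """Per-node change in centrality from condition A to condition B."""
    cent_a = eigenvector_centrality(edges_a)
    cent_b = eigenvector_centrality(edges_b)
    nodes = set(cent_a) | set(cent_b)
    return {n: cent_b.get(n, 0.0) - cent_a.get(n, 0.0) for n in nodes}


# Toy networks: node "b" is the hub in controls, node "a" is the hub in infection.
control_edges = [("a", "b"), ("b", "c")]
infected_edges = [("a", "b"), ("a", "c")]

delta = differential_centrality(control_edges, infected_edges)
for node in sorted(delta):
    print(node, round(delta[node], 4))
```

Nodes with large absolute differences are the candidates whose topological role changes between conditions; the same pattern applies to betweenness centrality with a different centrality function.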
3 Example
We tested xMWAS in a three-way integrative analysis using cytokine, transcriptome, and metabolome datasets from a recently published study to examine H1N1 influenza virus infection-altered metabolic response in mouse lung (Chandler 2016). For comparisons, we used data from all samples (Supplementary Figure S1A), only control samples (Supplementary Figure S1B), and only H1N1 influenza samples (Supplementary Figure S1C). Supplementary Section S2 shows that the various stages of xMWAS capture biologically meaningful information and provide deeper insights into the underlying biology, which cannot be obtained by analyzing and exploring the different layers individually.
4 Conclusion
xMWAS provides a platform-independent framework for integrative network analysis of two or more datasets, identification of modules of functionally related biomolecules, and differential network analysis. The results show that xMWAS can improve our understanding of disease pathophysiology and complex molecular interactions across various functional levels.
Funding
This project was funded by National Institutes of Health grants ES025632, ES023485, ES019776, OD018006, HL095479, and EY022618. The project was also funded in part by federal funds from the US National Institute of Allergy and Infectious Diseases, National Institutes of Health, Department of Health and Human Services under contract # HHSN272201200031C.
Conflict of Interest
None declared.
Acknowledgements
The authors acknowledge members of the Clinical Biomarkers Laboratory, Emory University for testing and suggesting improvements to the software.