RT Journal Article
SR Electronic
T1 An analytic approach for interpretable predictive models in high dimensional data, in the presence of interactions with exposures
JF bioRxiv
FD Cold Spring Harbor Laboratory
SP 102475
DO 10.1101/102475
A1 Sahir R Bhatnagar
A1 Yi Yang
A1 Mathieu Blanchette
A1 Budhachandra Khundrakpam
A1 Alan Evans
A1 Luigi Bouchard
A1 Celia MT Greenwood
YR 2017
UL http://biorxiv.org/content/early/2017/01/23/102475.abstract
AB Computational approaches to variable selection have become increasingly important with the advent of high-throughput technologies in genomics and brain imaging studies, where the data has become massive, yet where it is believed that the number of truly important variables is small relative to the total number of variables. Although many approaches have been developed for main effects, less attention has been paid to interaction models. Here, starting from the hypothesis that a binary exposure variable can alter correlation patterns between clusters of high-dimensional variables, i.e. alter network properties of the variables, we explore whether such exposure-dependent clustering relationships can improve predictive modelling of an outcome or phenotype variable. Hence, we propose a modelling framework called ECLUST to test this hypothesis, and evaluate performance through extensive simulations. We see improved model fit in many scenarios. We further illustrate the framework through the analysis of three data sets from very different fields, each with high dimensional data, a binary exposure, and a phenotype of interest. Our method is available in the eclust CRAN package.