PT - JOURNAL ARTICLE
AU - Genevieve L Stein-O’Brien
AU - Jacob L Carey
AU - Wai-shing Lee
AU - Michael Considine
AU - Alexander V Favorov
AU - Emily Flam
AU - Theresa Guo
AU - Sijia Li
AU - Luigi Marchionni
AU - Thomas Sherman
AU - Shawn Sivy
AU - Daria A Gaykalova
AU - Ronald D McKay
AU - Michael F Ochs
AU - Carlo Colantuoni
AU - Elana J Fertig
TI - PatternMarkers & GWCoGAPS for novel data-driven biomarkers via whole transcriptome NMF
AID - 10.1101/083717
DP - 2016 Jan 01
TA - bioRxiv
PG - 083717
4099 - http://biorxiv.org/content/early/2016/10/28/083717.short
4100 - http://biorxiv.org/content/early/2016/10/28/083717.full
AB - Summary: Non-negative Matrix Factorization (NMF) algorithms associate gene expression with biological processes (e.g., time-course dynamics or disease subtypes). Compared with univariate associations, the relative weights of NMF solutions can obscure biomarkers. Therefore, we developed a novel PatternMarkers statistic to extract genes for biological validation and enhanced visualization of NMF results. Finding novel and unbiased gene markers with PatternMarkers requires whole-genome data. However, NMF algorithms typically do not converge for the tens of thousands of genes in genome-wide profiling. Therefore, we also developed Genome-Wide CoGAPS Analysis in Parallel Sets (GWCoGAPS), the first robust whole-genome Bayesian NMF, which uses the sparse MCMC algorithm CoGAPS. This software contains analytic and visualization tools, including a Shiny web application, patternMatcher, which are generalized for any NMF. Using these tools, we find granular brain-region and cell-type specific signatures with corresponding biomarkers in GTEx data, illustrating how GWCoGAPS and PatternMarkers ascertain data-driven biomarkers from whole-genome data. Availability: PatternMarkers & GWCoGAPS are in the CoGAPS Bioconductor package (3.5) under the GPL license. Contact: gsteinobrien{at}jhmi.edu; ccolantu{at}jhmi.edu; ejfertig{at}jhmi.edu