Abstract
Summary Non-negative matrix factorization (NMF) algorithms associate gene expression changes with biological processes (e.g., time-course dynamics or disease subtypes). Compared with univariate associations, the relative weights of NMF solutions can obscure biomarker identification. Therefore, we developed a novel PatternMarkers statistic to extract unique genes for biological validation and enhanced visualization of NMF results. Finding novel and unbiased gene markers with PatternMarkers requires whole-genome data. However, NMF algorithms typically do not converge for the tens of thousands of genes in genome-wide profiling. Therefore, we also developed GWCoGAPS, the first robust Bayesian NMF technique for whole-genome transcriptomics, which uses the sparse MCMC algorithm CoGAPS. The software also contains additional analytic and visualization tools, including a Shiny web application, patternMatcher, all of which are generalized for any NMF. Using these tools, we find granular brain-region and cell-type specific signatures with corresponding biomarkers in GTEx data, illustrating the unique ability of GWCoGAPS and PatternMarkers to detect data-driven biomarkers from whole-genome data.
Availability PatternMarkers and GWCoGAPS are available in the CoGAPS Bioconductor package as of version 3.5, under the GPL license.
Contact CColantu{at}jhmi.edu; ejfertig{at}jhmi.edu