RT Journal Article
SR Electronic
T1 PatternMarkers and Genome-Wide CoGAPS Analysis in Parallel Sets (GWCoGAPS) for data-driven detection of novel biomarkers via whole transcriptome Non-negative matrix factorization (NMF)
JF bioRxiv
FD Cold Spring Harbor Laboratory
SP 083717
DO 10.1101/083717
A1 Genevieve Stein-O’Brien
A1 Jacob Carey
A1 Wai-shing Lee
A1 Michael Considine
A1 Alexander Favorov
A1 Emily Flam
A1 Theresa Guo
A1 Lucy Li
A1 Luigi Marchionni
A1 Thomas Sherman
A1 Shawn Sivy
A1 Daria Gaykalova
A1 Ronald McKay
A1 Michael Ochs
A1 Carlo Colantuoni
A1 Elana Fertig
YR 2016
UL http://biorxiv.org/content/early/2016/10/26/083717.abstract
AB Summary: NMF algorithms associate gene expression changes with biological processes (e.g., time-course dynamics or disease subtypes). Compared with univariate associations, the relative weights of NMF solutions can obscure biomarker identification. Therefore, we developed a novel PatternMarkers statistic to extract unique genes for biological validation and enhanced visualization of NMF results. Finding novel and unbiased gene markers with PatternMarkers requires whole-genome data. However, NMF algorithms typically do not converge for the tens of thousands of genes in genome-wide profiling. Therefore, we also developed GWCoGAPS, the first robust Bayesian NMF technique for whole-genome transcriptomics, which uses the sparse MCMC algorithm CoGAPS. This software contains additional analytic and visualization tools, including a Shiny web application, patternMatcher, which are generalized for any NMF.
Using these tools, we find granular brain-region- and cell-type-specific signatures with corresponding biomarkers in GTEx data, illustrating the unique ability of GWCoGAPS and PatternMarkers to detect data-driven biomarkers from whole-genome data.
Availability: PatternMarkers and GWCoGAPS are in the CoGAPS Bioconductor package as of version 3.5 under the GPL license.
Contact: CColantu{at}jhmi.edu; ejfertig{at}jhmi.edu
Supplementary information: Supplementary data is available at Bioinformatics online.