RT Journal Article
SR Electronic
T1 System-wide automatic extraction of functional signatures in Pseudomonas aeruginosa with eADAGE
JF bioRxiv
FD Cold Spring Harbor Laboratory
SP 078659
DO 10.1101/078659
A1 Jie Tan
A1 Georgia Doing
A1 Kimberley A. Lewis
A1 Courtney E. Price
A1 Kathleen M. Chen
A1 Kyle C. Cady
A1 Barret Perchuk
A1 Michael T. Laub
A1 Deborah A. Hogan
A1 Casey S. Greene
YR 2016
UL http://biorxiv.org/content/early/2016/10/03/078659.abstract
AB Abundant public expression data capture gene expression across diverse conditions. These steady-state mRNA measurements could reveal the transcriptional consequences of cells’ genetic backgrounds or their responses to the environment. However, public data remain relatively untapped, in part because extracting biological signal as opposed to technical noise remains challenging. Here we introduce a procedure, termed eADAGE, that performs unsupervised integration of public expression data using an ensemble of neural networks as well as heuristics that, given a dataset, help users identify an appropriate level of model complexity. This ensemble modeling approach captures biological pathways more clearly than existing methods, enabling analyses that span entire public gene expression compendia such as that for the bacterium Pseudomonas aeruginosa. These analyses reveal a previously undiscovered feature of the phosphate starvation response apparent in public data: a sensor kinase, KinB, that is required for full activation of the response to phosphate at intermediate concentrations. Our molecular validation experiments confirm this role of KinB, and our screen of a histidine kinase knockout collection confirms the prediction’s specificity. Public data are captured from a broad range of conditions in diverse organism backgrounds and may provide a unique opportunity to identify these subtle and context-specific regulatory interactions. Algorithms that extract biological signal from these data, such as eADAGE, can highlight opportunities to discover mechanisms that are apparent from but unrealized in public data.