PT - JOURNAL ARTICLE
AU - Djordje Djordjevic
AU - Yun Xin Chen
AU - Shu Lun Shannon Kwan
AU - Raymond W. K. Ling
AU - Gordon Qian
AU - Chelsea Y. Y. Woo
AU - Samuel J. Ellis
AU - Joshua W. K. Ho
TI - GEOracle: Mining perturbation experiments using free text metadata in Gene Expression Omnibus
AID - 10.1101/150896
DP - 2017 Jan 01
TA - bioRxiv
PG - 150896
4099 - http://biorxiv.org/content/early/2017/06/16/150896.short
4100 - http://biorxiv.org/content/early/2017/06/16/150896.full
AB - Summary: There exist over 1.6 million publicly available gene expression samples across 79,000 data series in NCBI’s Gene Expression Omnibus database. Because experiment and sample types are not annotated with standardised ontology terms, this database remains difficult to harness computationally without significant manual intervention. In this work, we present an interactive R/Shiny tool called GEOracle that utilises text mining and machine learning techniques to automatically identify perturbation experiments, group treatment and control samples, and perform differential expression analysis. We present applications of GEOracle to discover conserved signalling pathway target genes and to identify an organ-specific gene regulatory network. Availability: GEOracle is available at http://georacle.victorchang.edu.au/ Contact: jho{at}victorchang.edu.au Supplementary information: Supplementary data are available at bioRxiv.