TY  - JOUR
T1  - Semi-Automated Identification of Ontological Labels in the Biomedical Literature with goldi
JF  - bioRxiv
DO  - 10.1101/073460
SP  - 073460
AU  - Christopher B. Cole
AU  - Sejal Patel
AU  - Leon French
AU  - Jo Knight
Y1  - 2016/01/01
UR  - http://biorxiv.org/content/early/2016/10/18/073460.abstract
N2  - Recent growth in both the scale and the scope of large publicly available ontologies has spurred the development of computational methodologies that can leverage structured information to answer important questions. However, ontological labels, or “terms”, have thus far proved difficult to use in practice; text mining, one crucial aspect of electronically understanding and parsing the biomedical literature, has historically had difficulty identifying these terms in the literature. In this article, we present goldi, an open source R package whose goal is to identify terms of variable length in free-form text. It is available at https://github.com/Chris1221/goldi or through CRAN. The algorithm works by identifying words, or synonyms of words, present in individual terms and comparing the number of words present to an acceptance function that decides whether a term has been found. In this article we present the theoretical rationale behind the algorithm, as well as practical advice for its use in Gene Ontology term identification and quantification. We additionally detail the available options and describe their respective computational efficiencies.
ER  - 