RT Journal Article
SR Electronic
T1 Semi-Automated Identification of Ontological Labels in the Biomedical Literature with goldi
JF bioRxiv
FD Cold Spring Harbor Laboratory
SP 073460
DO 10.1101/073460
A1 Christopher B. Cole
A1 Sejal Patel
A1 Leon French
A1 Jo Knight
YR 2016
UL http://biorxiv.org/content/early/2016/10/18/073460.abstract
AB Recent growth in both the scale and the scope of large publicly available ontologies has spurred the development of computational methodologies that can leverage structured information to answer important questions. However, ontological labels, or “terms”, have thus far proved difficult to use in practice; text mining, one crucial aspect of electronically understanding and parsing the biomedical literature, has historically had difficulty identifying “terms” in the literature. In this article, we present goldi, an open source R package whose goal is to identify terms of variable length in free-form text. It is available at https://github.com/Chris1221/goldi or through CRAN. The algorithm works by identifying the words, or synonyms of words, present in each individual term and comparing the number of words present to an acceptance function to decide whether the term is identified. We present the theoretical rationale behind the algorithm, as well as practical advice for its usage as applied to Gene Ontology term identification and quantification. We additionally detail the options available and describe their respective computational efficiencies.
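
The matching idea described in the abstract (count the words of a term, or their synonyms, that appear in the text, then compare that count to an acceptance function) can be illustrated with a minimal Python sketch. This is not goldi's code or API: the synonym table, the 75% threshold in `acceptance`, and all function names are hypothetical choices made for illustration only, and the package itself is written in R.

```python
import math
import re

# Hypothetical synonym table (an assumption); goldi's synonym handling may differ.
SYNONYMS = {
    "apoptosis": {"apoptotic"},
    "regulation": {"regulate", "regulates", "regulated"},
}


def tokenize(text):
    """Lowercase the text and return the set of alphabetic words it contains."""
    return set(re.findall(r"[a-z]+", text.lower()))


def term_words(term):
    """Return the words of a term, lowercased, in order."""
    return re.findall(r"[a-z]+", term.lower())


def acceptance(n_words):
    """Assumed acceptance function: minimum number of term words required.

    Terms of one or two words need every word present; longer terms need
    at least 75% of their words (rounded up). The threshold is illustrative.
    """
    return n_words if n_words <= 2 else math.ceil(0.75 * n_words)


def count_present(term, text_words):
    """Count the term's words found in the text directly or via a synonym."""
    present = 0
    for word in term_words(term):
        candidates = {word} | SYNONYMS.get(word, set())
        if candidates & text_words:
            present += 1
    return present


def identify_terms(terms, text):
    """Return the terms whose present-word count meets the acceptance function."""
    text_words = tokenize(text)
    return [
        term
        for term in terms
        if count_present(term, text_words) >= acceptance(len(term_words(term)))
    ]


terms = [
    "regulation of apoptosis",
    "DNA repair",
    "positive regulation of cell migration",
]
text = "This protein regulates the apoptotic response of neurons after DNA damage."
print(identify_terms(terms, text))
```

With these assumed inputs, only the first term reaches its threshold: all three of its words are matched (two through synonyms), whereas the other two terms each fall short of the required count.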