Abstract
Background Annotation of biomedical entities with ontology classes provides for formal semantic analysis and mobilisation of background knowledge in determining their relationships. To date, enrichment analysis has been routinely employed to identify classes that are over-represented in annotations across groups of entities, such as biosample gene expression profiles or patient phenotypes. These approaches, however, usually consider only univariate relationships, make limited use of the semantic features of ontologies, and provide limited information on, and evaluation of, the explanatory power of both individual and grouped candidate classes. Moreover, they do not solve the problem of deriving cohesive, characteristic, and discriminatory sets of classes for entity groups.
Results We have developed a new method, Klarigi, which introduces multiple scoring heuristics for identification of classes that are both compositional and discriminatory for groups of entities annotated with ontology classes. The tool includes a novel algorithm for derivation of multivariable semantic explanations for entity groups, makes use of semantic inference through live use of an ontology reasoner, and includes a classification method for identifying the discriminatory power of candidate sets. We describe the design and implementation of Klarigi, and evaluate its use in two test cases, comparing and contrasting methods and results with literature and enrichment analysis methods.
Conclusions We demonstrate that Klarigi produces characteristic and discriminatory explanations for groups of biomedical entities in two settings. We also show that these explanations recapitulate and extend the knowledge held in existing biomedical databases and literature for several diseases. We conclude that Klarigi provides a distinct and valuable perspective on biomedical datasets when compared with traditional enrichment methods, and therefore constitutes a new method by which biomedical datasets can be explored, contributing to improved insight into semantic data.
Competing Interest Statement
John Williams is an employee of Eisai, Inc. Eisai, Inc. had no role in the funding or design of this study.
Footnotes
Resampled to correct class removals. Added clinical and biological interpretation. Rewrote parts of the discussion, results, and introduction to clarify and better describe the approach and to compare it with other methods.
Abbreviations
- HPO: Human Phenotype Ontology
- IHPRF3: Hypotonia, infantile, with psychomotor retardation and characteristic facies 3
- OMIM: Online Mendelian Inheritance in Man
- OWL: Web Ontology Language
- GSEA: Gene Set Enrichment Analysis
- EGL: Exclusive Group Loading
- GO: Gene Ontology
- AUC: Area Under the receiver operating characteristic Curve
- IC: Information Content
- MIMIC-III: Medical Information Mart for Intensive Care III
- ML: Machine Learning