Rapid Knowledgebase Construction and Hypotheses Generation Using Extractive Literature Search

Shaked Launer-Wachs; Hillel Taub-Tabib; Yoav Goldberg; Yosi Shamay

doi:10.1101/2022.02.13.480241

Abstract

Although knowledgebases are increasingly important for structuring vast amounts of scientific knowledge and making it accessible to researchers, their construction entails expensive multi-year projects involving teams of bio-curators, computer scientists, or both. This cost restricts the coverage of existing knowledgebases to a limited set of popular topics, leaving a long tail of more specialized interests uncovered.

We present a methodology and a supporting tool that allow individual researchers or small teams, without a background in bio-curation or computer science, to mine the scientific literature and construct ad-hoc, personalized, literature-anchored knowledgebases that are tailored to their specific research interests and support their scientific goals. The time investment involved in creating a knowledgebase ranges from a few hours to a few weeks, depending on the desired coverage and accuracy.

We demonstrate the methodology by constructing knowledgebases for different purposes: a high-level overview of challenges and controversies in a field (the cancer frontiers knowledgebase); a mapping of the main concepts and interactions in a field, to support lab-internal hypothesis generation (the tissue engineering and regeneration knowledgebase and the cancer surgery and radiotherapy knowledgebase); and a comprehensive and accurate knowledgebase intended as an online, up-to-date resource for the wider research community (the cell specific drug delivery knowledgebase). In each case we show how the structured knowledgebase, coupled with effective visualizations, facilitates data exploration, hypothesis generation, and meta-analysis.

We implement the method as part of an open-source, web-based platform for knowledgebase construction, freely and publicly available at https://spike-kbc.apps.allenai.org.

Competing Interest Statement

The authors have declared no competing interest.

The copyright holder for this preprint is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under a CC-BY-NC-ND 4.0 International license.