TY - JOUR T1 - Biomedical Text Mining for Research Rigor and Integrity: Tasks, Challenges, Directions JF - bioRxiv DO - 10.1101/108480 SP - 108480 AU - Halil Kilicoglu Y1 - 2017/01/01 UR - http://biorxiv.org/content/early/2017/02/14/108480.abstract N2 - An estimated quarter of a trillion US dollars is invested in the biomedical research enterprise annually. There is growing alarm that a significant portion of this investment is wasted, due to problems in reproducibility of research findings and in the rigor and integrity of research conduct and reporting. Recent years have seen a flurry of activities focusing on standardization and guideline development to enhance the reproducibility and rigor of biomedical research. Research activity is primarily communicated via textual artifacts, ranging from grant applications to journal publications. These artifacts can be both the source and the end result of practices leading to research waste. For example, an article may describe a poorly designed experiment, or the authors may reach conclusions not supported by the evidence presented. In this article, we pose the question of whether biomedical text mining techniques can assist the stakeholders in the biomedical research enterprise in doing their part towards enhancing research integrity and rigor. In particular, we identify four key areas in which text mining techniques can make a significant contribution: plagiarism/fraud detection, ensuring adherence to reporting guidelines, managing information overload, and accurate citation/enhanced bibliometrics. We review the existing methods and tools for specific tasks, if they exist, or discuss relevant research that can provide guidance for future work. With the exponential increase in biomedical research output and the ability of text mining approaches to perform automatic tasks at large scale, we propose that such approaches can add checks and balances that promote responsible research practices and can provide significant benefits for the biomedical research enterprise.Supplementary information Supplementary material is available at BioRxiv. ER -