RT Journal Article
SR Electronic
T1 Synthesizer: Expediting synthesis studies from context-free data with natural language processing
JF bioRxiv
FD Cold Spring Harbor Laboratory
SP 053629
DO 10.1101/053629
A1 Lisa Gandy
A1 Jordan Gumm
A1 Benjamin Fertig
A1 Michael J. Kennish
A1 Sameer Chavan
A1 Ann Thessen
A1 Luigi Marchionni
A1 Xiaoxan Xia
A1 Shambhavi Shankrit
A1 Elana J Fertig
YR 2016
UL http://biorxiv.org/content/early/2016/05/16/053629.abstract
AB Today’s low cost digital data provides unprecedented opportunities for scientific discovery from synthesis studies. For example, the medical field is revolutionizing patient care by creating personalized treatment plans based upon mining electronic medical records, imaging, and genomics data. Standardized annotations are essential to subsequent analyses for synthesis studies. However, accurately combining records from diverse studies requires tedious and error-prone human curation, posing a significant barrier to synthesis studies. We propose a novel natural language processing (NLP) algorithm, Synthesize, to merge data annotations automatically. Application to patient characteristics for diverse human cancers and ecological datasets demonstrates the accuracy of Synthesize in diverse scientific disciplines. This NLP approach is implemented in an open-source software package, Synthesizer. Synthesizer is a generalized, user-friendly system for error-free data merging.