TY  - JOUR
T1  - Synthesizer: Expediting synthesis studies from context-free data with natural language processing
JF  - bioRxiv
DO  - 10.1101/053629
SP  - 053629
AU  - Lisa Gandy
AU  - Jordan Gumm
AU  - Benjamin Fertig
AU  - Michael J. Kennish
AU  - Sameer Chavan
AU  - Ann Thessen
AU  - Luigi Marchionni
AU  - Xiaoxan Xia
AU  - Shambhavi Shankrit
AU  - Elana J Fertig
Y1  - 2016/01/01
UR  - http://biorxiv.org/content/early/2016/05/16/053629.abstract
N2  - Today’s low-cost digital data provides unprecedented opportunities for scientific discovery from synthesis studies. For example, the medical field is revolutionizing patient care by creating personalized treatment plans based upon mining electronic medical records, imaging, and genomics data. Standardized annotations are essential to subsequent analyses for synthesis studies. However, accurately combining records from diverse studies requires tedious and error-prone human curation, posing a significant barrier to synthesis studies. We propose a novel natural language processing (NLP) algorithm, Synthesize, to merge data annotations automatically. Application to patient characteristics for diverse human cancers and to ecological datasets demonstrates the accuracy of Synthesize across scientific disciplines. This NLP approach is implemented in an open-source software package, Synthesizer. Synthesizer is a generalized, user-friendly system for error-free data merging.
ER  - 