TY  - JOUR
T1  - PEDLA: predicting enhancers with a deep learning-based algorithmic framework
JF  - bioRxiv
DO  - 10.1101/036129
SP  - 036129
AU  - Feng Liu
AU  - Hao Li
AU  - Chao Ren
AU  - Xiaochen Bo
AU  - Wenjie Shu
Y1  - 2016/01/01
UR  - http://biorxiv.org/content/early/2016/01/07/036129.abstract
N2  - Transcriptional enhancers are non-coding segments of DNA that play a central role in the spatiotemporal regulation of gene expression programs. However, systematically and precisely predicting enhancers on a genome-wide scale remains a major challenge. Although existing methods have achieved some success in enhancer prediction, they still suffer from a limited number of training samples, overly simple features, class-imbalanced data, and inconsistent performance across diverse cell types/tissues. Here, we developed a deep learning-based algorithmic framework named PEDLA (https://github.com/wenjiegroup/PEDLA), which can directly learn an enhancer predictor from massively heterogeneous data and generalize in ways that are mostly consistent across various cell types/tissues. We first trained PEDLA with 1,114-dimensional heterogeneous features in H1 cells and demonstrated that the framework integrates diverse heterogeneous features and gives state-of-the-art performance relative to five existing methods for enhancer prediction. We further extended PEDLA to learn continuously from 22 training cell types/tissues, and the results showed that PEDLA performed with superior consistency on both training and independent test sets. On average, PEDLA achieved 95.0% accuracy and a 96.8% geometric mean (GM) across 22 training cell types/tissues, as well as 95.7% accuracy and a 96.8% GM across 20 independent test cell types/tissues. Together, our work illustrates the power of harnessing state-of-the-art deep learning techniques to consistently identify regulatory elements at a genome-wide scale from massively heterogeneous data across diverse cell types/tissues.
ER  - 