TY  - JOUR
T1  - LMethyR-SVM: Predict human enhancers using low methylated regions based on weighted support vector machines
JF  - bioRxiv
DO  - 10.1101/054221
SP  - 054221
AU  - Jingting Xu
AU  - Hong Hu
AU  - Yang Dai
Y1  - 2016/01/01
UR  - http://biorxiv.org/content/early/2016/05/19/054221.abstract
N2  - Background: The identification of enhancers is a challenging task. Various types of epigenetic information, including histone modifications, have been utilized in the construction of enhancer prediction models based on a diverse panel of machine learning methods. However, DNA methylation profiles generated from whole-genome bisulfite sequencing (WGBS) have not been fully explored for their potential in enhancer prediction, despite the fact that low methylated regions (LMRs) have been implicated as distal active regulatory regions. Method: In this work we propose a prediction framework, LMethyR-SVM, using LMRs identified from cell-type-specific WGBS DNA methylation profiles based on an unlabeled-negative learning framework. In LMethyR-SVM, the set of cell-type-specific LMRs is further divided into three sets (reliable positive, likely positive, and likely negative) according to their resemblance to a small set of experimentally validated enhancers in the VISTA database, based on an estimated non-parametric density distribution. The prediction model is then trained by solving a weighted support vector machine. Results: We demonstrate the performance of LMethyR-SVM using the WGBS DNA methylation profiles derived from the H1 human embryonic stem cell type (H1) and the fetal lung fibroblast cell type (IMR90). The predicted enhancers are highly conserved, with a reasonable validation rate based on a set of commonly used positive markers including transcription factors, p300 binding, and DNase-I hypersensitive sites. In addition, we show evidence that a large fraction of the LMethyR-SVM predicted enhancers are not predicted by ChromHMM in the H1 cell type and that they are more enriched for the FANTOM5 enhancers. Conclusion: Our work suggests that low methylated regions detected from WGBS data are useful as complementary resources to histone modification marks in developing models for the prediction of cell-type-specific enhancers.
ER  - 