TY - JOUR
T1 - Modelling the transcription factor DNA-binding affinity using genome-wide ChIP-based data
JF - bioRxiv
DO - 10.1101/061978
SP - 061978
AU - Monther Alhamdoosh
AU - Dianhui Wang
Y1 - 2016/01/01
UR - http://biorxiv.org/content/early/2016/07/04/061978.abstract
N2 - Protein-DNA binding affinity is still poorly understood for many transcription factors (TFs). Although several approaches have been proposed in the literature to model the DNA-binding specificity of TFs, they still have limitations. Most methods require a cut-off threshold to classify a K-mer as a binding site (BS), and this threshold is usually chosen by hand rather than by a principled procedure. Other approaches combine prior knowledge of the biological context of regulatory elements in the genome with machine learning algorithms to build classifier models for TF binding sites (TFBSs). Notably, these methods deliberately select training and testing datasets that are highly separable. Hence, current methods do not actually capture the TF-DNA binding relationship. In this paper, we present a threshold-free framework based on a novel ensemble learning algorithm to locate TFBSs in DNA sequences. Our proposed approach creates TF-specific classifier models using genome-wide DNA-binding experiments and prior biological knowledge of DNA sequences and TF binding preferences. Systematic background filtering algorithms are used to remove non-functional K-mers from the training and testing datasets. To reduce the complexity of the classifier models, a fast feature selection algorithm is employed. Finally, the resulting classifier models are used to scan new DNA sequences and identify potential binding sites.
The analysis results show that our proposed approach is able to identify novel binding sites in the Saccharomyces cerevisiae genome. Contact: monther.alhamdoosh{at}unimelb.edu.au, dh.wang{at}latrobe.edu.au. Availability: http://homepage.cs.latrobe.edu.au/dwang/DNNESCANweb
ER -