RT Journal Article SR Electronic T1 LR-DNase: Predicting TF binding from DNase-seq data JF bioRxiv FD Cold Spring Harbor Laboratory SP 082594 DO 10.1101/082594 A1 Arjan van der Velde A1 Michael Purcaro A1 William Stafford Noble A1 Zhiping Weng YR 2016 UL http://biorxiv.org/content/early/2016/10/24/082594.abstract AB Transcription factors play a key role in the regulation of gene expression. Hypersensitivity to DNase I cleavage has long been used to gauge the accessibility of genomic DNA for transcription factor binding and as an indicator of regulatory genomic locations. An increasing amount of ChIP-seq data on a large number of TFs is being generated, mostly in a small number of cell types. DNase-seq data are being produced for hundreds of cell types. We aimed to develop a computational method that could combine ChIP-seq and DNase-seq data to predict TF binding sites in a wide variety of cell types. We trained and tested a logistic regression model, called LR-DNase, to predict binding sites for a specific TF using seven features derived from DNase-seq and genomic sequence. We calculated the area under the precision-recall curve at a false discovery rate cutoff of 0.5 for the LR-DNase model, a number of logistic regression models with fewer features, and several existing state-of-the-art TF binding prediction methods. The LR-DNase model outperformed existing unsupervised and supervised methods. Additionally, for many TFs, a model that uses only two features, DNase-seq reads and motif score, was sufficient to match the performance of the best existing methods.