TY  - JOUR
T1  - Whole Genome Regulatory Variant Evaluation for Transcription Factor Binding
JF  - bioRxiv
DO  - 10.1101/017392
SP  - 017392
AU  - Haoyang Zeng
AU  - Tatsunori Hashimoto
AU  - Daniel D. Kang
AU  - David K. Gifford
Y1  - 2015/01/01
UR  - http://biorxiv.org/content/early/2015/04/01/017392.abstract
N2  - Contemporary approaches to predict single nucleotide polymorphisms (SNPs) that alter transcription factor binding rely upon the sequence affinity of a transcription factor as represented by its canonical motif. WAVE (Whole-genome regulAtory Variants Evaluation) is a novel method for predicting more general regulatory variants that affect transcription factor binding, including those that fall outside of the canonical motif. WAVE learns a k-mer-based generative model of transcription factor binding from ChIP-seq data and scores variants using this model. The k-mers learned by WAVE capture more sequence features of transcription factor binding than a motif-based approach alone, including both a transcription factor’s canonical motif and associated co-factor motifs. WAVE significantly outperforms motif-based methods in predicting SNPs associated with allele-specific binding. Author Summary: Specific variations in our genome sequence can render us more susceptible to a genetic disease. Certain disease risks are caused by genetic variations that alter where transcription factors bind to the genome and regulate cellular function. Previous methods for identifying which genetic changes are significant have assumed that transcription factor binding depends on a single short sequence recognized by a transcription factor. Here we consider a more general model in which the binding of a factor may be up- or down-regulated by any number of short DNA sequences that are proximal to a binding site. Our method substantially improves the detection of genomic changes that are important for factor binding.
ER  -