PT  - JOURNAL ARTICLE
AU  - Haoyang Zeng
AU  - Tatsunori Hashimoto
AU  - Daniel D Kang
AU  - David K Gifford
TI  - GERV: A Statistical Method for Generative Evaluation of Regulatory Variants for Transcription Factor Binding
AID - 10.1101/017392
DP  - 2015 Jan 01
TA  - bioRxiv
PG  - 017392
4099 - http://biorxiv.org/content/early/2015/07/04/017392.short
4100 - http://biorxiv.org/content/early/2015/07/04/017392.full
AB  - The majority of disease-associated variants identified in genome-wide association studies (GWAS) reside in noncoding regions of the genome with regulatory roles. Thus, being able to interpret the functional consequence of a variant is essential for identifying causal variants in GWAS analysis. We present GERV (Generative Evaluation of Regulatory Variants), a novel computational method for predicting regulatory variants that affect transcription factor binding. GERV learns a k-mer based generative model of transcription factor binding from ChIP-seq and DNase-seq data, and scores variants by computing the change of predicted ChIP-seq reads between the reference and alternate allele. The k-mers learned by GERV capture more sequence determinants of transcription factor binding than a motif-based approach alone, including both a transcription factor’s canonical motif and associated co-factor motifs. We show that GERV outperforms existing methods in predicting SNPs associated with allele-specific binding. GERV correctly predicts a validated causal variant among linked SNPs, and prioritizes the variants previously reported to modulate the binding of FOXA1 in breast cancer cell lines. Thus, GERV provides a powerful approach for functionally annotating and prioritizing causal variants for experimental follow-up analysis. The implementation of GERV and related data are available at http://gerv.csail.mit.edu/