%0 Journal Article
%A Mingxiang Teng
%A Rafael A. Irizarry
%T Accounting for GC-content bias reduces systematic errors and batch effects in ChIP-Seq peak callers
%D 2016
%R 10.1101/090704
%J bioRxiv
%P 090704
%X ChIP-Seq technology is widely used in biomedical and basic science research. The main application is the detection of genomic regions that bind to a protein of interest. ChIP-Seq studies rely on peak calling algorithms that attempt to infer protein-binding sites by detecting genomic regions associated with more mapped reads (coverage) than expected by chance as a result of the experimental protocol’s lack of perfect specificity. We find that GC-content bias accounts for a substantial amount of variability in the observed coverage for ChIP-Seq experiments and that this variability leads to false positive peak calls. More concerning is that the GC-effect varies across experiments, with the effect strong enough to result in a substantial number of peaks called differently when different laboratories perform experiments on the same cell line. Although solutions have been proposed for GC-bias corrections in other Next Generation Sequencing (NGS) applications, accounting for GC-content in ChIP-Seq data is challenging because the binding sites of interest tend to be more common in high GC-content regions, which confounds real biological signal with the unwanted variability we want to remove. To address this challenge we introduce a statistical approach, based on a mixture model, that accounts for GC-content effects on both non-specific noise and signal induced by the binding site we seek to detect. The method can be used to account for this bias in binding quantification as well as to improve existing peak calling algorithms. We use this approach to show improved consistency across laboratories.
%U https://www.biorxiv.org/content/biorxiv/early/2016/11/30/090704.full.pdf