RT Journal Article
SR Electronic
T1 geneXtendeR: R/Bioconductor package for functional annotation of histone modification ChIP-seq data in a 3D genome world
JF bioRxiv
FD Cold Spring Harbor Laboratory
SP 082347
DO 10.1101/082347
A1 Bohdan B. Khomtchouk
A1 Derek J. Van Booven
A1 Claes Wahlestedt
YR 2016
UL http://biorxiv.org/content/early/2016/10/21/082347.abstract
AB Motivation: Functional genomic annotation of epigenetic histone modification ChIP-seq data is a computationally challenging task. Epigenetic histone modifications that have inherently broad peaks with a diffuse range of signal enrichment (e.g., H3K9me1, H3K27me3) differ significantly from narrow peaks that exhibit a compact and localized enrichment pattern (e.g., H3K4me3, H3K9ac). Varying degrees of tissue-dependent broadness of the specific epigenetic mark coupled with environmentally mediated 3D-folding of chromosomes for long-range communication make it difficult to accurately and reliably link sequencing data to biological function. Hence, it would be useful to develop a software program that can precisely tailor the computational analysis of a histone modification ChIP-seq dataset to the specific tissue-dependent, environmentally mediated characteristics of the data. Results: geneXtendeR is an R/Bioconductor package designed to optimally annotate a histone modification ChIP-seq peak input file with functionally important genomic features (e.g., genes associated with peaks) based on optimization calculations. geneXtendeR optimally extends the boundaries of every gene in a genome by some genomic distance (in DNA base pairs) for the purpose of flexibly incorporating cis- and trans-regulatory elements, such as enhancers and promoters, as well as downstream elements that are important to the function of the gene relative to an epigenetic histone modification ChIP-seq dataset. 
geneXtendeR computes optimal gene extensions tailored to the broadness of the specific epigenetic mark (e.g., H3K9me1, H3K27me3), as determined by a user-supplied ChIP-seq peak input file. As such, geneXtendeR maximizes the signal-to-noise ratio of locating genes closest to and directly under peaks that may be linked by epigenetic regulation. By performing a computational expansion of this nature, ChIP-seq reads that would initially not map strictly to a specific gene can now be optimally mapped to the regulatory regions of the gene, thereby implicating the gene as a potential candidate and making the ChIP-seq experiment more successful. Such an approach becomes particularly important when working with epigenetic histone modifications that have inherently broad peaks. We have tested geneXtendeR on 547 human transcription factor ChIP-seq ENCODE datasets and 215 human histone modification ChIP-seq ENCODE datasets, providing the analysis results as case studies. Availability: The geneXtendeR R/Bioconductor package (including detailed introductory vignettes) is available under the GPL-3 Open Source license and is freely available to download from Bioconductor at: https://bioconductor.org/packages/geneXtendeR/. Contact: b.khomtchouk{at}med.miami.edu Author Summary: geneXtendeR is an R/Bioconductor package for histone modification ChIP-seq analysis. It is designed to optimally annotate a histone modification ChIP-seq peak input file with functionally important genomic features (e.g., genes associated with peaks) based on optimization calculations. geneXtendeR optimally extends the boundaries of every gene in a genome by some genomic distance (in DNA base pairs) for the purpose of flexibly incorporating cis-regulatory elements, such as enhancers and promoters, as well as downstream elements that are important to the function of the gene (relative to an epigenetic histone modification ChIP-seq dataset). 
geneXtendeR computes optimal gene extensions tailored to the broadness of the specific epigenetic mark (e.g., H3K9me1, H3K27me3), as determined by a user-supplied ChIP-seq peak input file. As such, geneXtendeR maximizes the signal-to-noise ratio of locating genes closest to and directly under peaks. By performing a computational expansion of this nature, ChIP-seq reads that would initially not map strictly to a specific gene can now be optimally mapped to the regulatory regions of the gene, thereby implicating the gene as a potential candidate and making the ChIP-seq experiment more successful. Such an approach becomes particularly important when working with epigenetic histone modifications that have inherently broad peaks. Vignettes detailing input/output requirements, suggested biological workflows, and underlying package infrastructure are included as part of the R/Bioconductor package. We have tested geneXtendeR on 547 human transcription factor ChIP-seq ENCODE datasets and 215 human histone modification ChIP-seq ENCODE datasets, providing the analysis results as case studies.
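
The gene-extension idea described in the abstract can be illustrated with a minimal sketch. This is not the geneXtendeR implementation or its API: it is written in Python rather than R, and the names (Gene, Peak, extend_gene, count_peaks_on_genes), the symmetric extension on both sides of each gene, the omission of strand, and the toy coordinates are all assumptions made for illustration. It shows only how widening gene boundaries by a given number of base pairs changes how many peaks fall on a gene.

```python
from typing import NamedTuple


class Gene(NamedTuple):
    name: str
    chrom: str
    start: int
    end: int


class Peak(NamedTuple):
    chrom: str
    start: int
    end: int


def extend_gene(gene: Gene, extension: int) -> Gene:
    """Widen a gene by `extension` bp on both sides (assumption: symmetric,
    strand ignored), clamping the start at coordinate 0."""
    return gene._replace(start=max(0, gene.start - extension),
                         end=gene.end + extension)


def overlaps(gene: Gene, peak: Peak) -> bool:
    """True if the peak and the gene share a chromosome and at least one base."""
    return (gene.chrom == peak.chrom
            and peak.start <= gene.end
            and peak.end >= gene.start)


def count_peaks_on_genes(genes, peaks, extension: int) -> int:
    """Count peaks that overlap at least one gene after extension."""
    extended = [extend_gene(g, extension) for g in genes]
    return sum(1 for p in peaks if any(overlaps(g, p) for g in extended))


# Toy data (hypothetical coordinates, not from any real annotation).
genes = [
    Gene("geneA", "chr1", 10_000, 12_000),
    Gene("geneB", "chr1", 50_000, 55_000),
]
peaks = [
    Peak("chr1", 9_200, 9_600),    # just upstream of geneA
    Peak("chr1", 11_000, 11_500),  # inside geneA
    Peak("chr1", 47_500, 48_200),  # about 2 kb upstream of geneB
    Peak("chr2", 10_500, 10_900),  # different chromosome, never counted
]

# Peaks captured at each candidate extension (in bp).
counts = {ext: count_peaks_on_genes(genes, peaks, ext)
          for ext in (0, 500, 1000, 2000)}
print(counts)
```

Comparing the counts across candidate extensions gives a rough sense of how much additional signal each increment captures; how the package itself selects an optimal extension is described in its vignettes and is not reproduced here.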