Abstract
Genome-wide profiling of chromatin accessibility by DNase-seq or ATAC-seq has been widely used to identify regulatory DNA elements and transcription factor binding sites. However, enzymatic DNA cleavage exhibits intrinsic sequence biases that confound the analysis of chromatin accessibility profiling data. Existing computational tools are limited in their ability to account for such intrinsic biases. Here, we present the Simplex Encoded Linear Model for Accessible Chromatin (SELMA), a computational method for systematic estimation of intrinsic cleavage biases from genomic chromatin accessibility profiling data. We demonstrate that SELMA yields accurate and robust bias estimates from both bulk and single-cell DNase-seq and ATAC-seq data. We show that transcription factor binding inference from DNase footprints can be improved by incorporating SELMA-estimated biases. We also demonstrate that accounting for SELMA-estimated biases improves cell clustering of single-cell ATAC-seq data. SELMA can be integrated with existing bioinformatics tools to improve the analysis of chromatin accessibility sequencing data.
Competing Interest Statement
The authors have declared no competing interest.