Abstract
High-throughput reporter assays such as self-transcribing active regulatory region sequencing (STARR-seq) have made it possible to measure regulatory element activity across the human genome. These assays, however, also present substantial analytical challenges. Here, we identify technical biases that explain most of the variance in STARR-seq signals. We then develop a statistical model that corrects those biases and improves detection of regulatory elements. This approach substantially improves precision and recall over current methods, improves detection of both activating and repressive regulatory elements, and controls false discoveries despite strong local signal correlations.
Competing Interest Statement
The authors have declared no competing interest.
Footnotes
Authors’ email addresses: Young-Sook Kim: yk162{at}duke.edu; Graham D. Johnson: grahams.mailbox{at}gmail.com; Jungkyun Seo: jungkyun.seo{at}duke.edu; Alejandro Barrera: alejandro.barrera{at}duke.edu; William H. Majoros: william.majoros{at}duke.edu; Alejandro Ochoa: alejandro.ochoa{at}duke.edu; Andrew S. Allen: andrew.s.allen{at}duke.edu