Differential Transcript Usage Analysis Incorporating Quantification Uncertainty Via Compositional Measurement Error Regression Modeling

Scott Van Buren; Naim Rashid

doi:10.1101/2020.05.22.111450

Abstract

Differential transcript usage (DTU) occurs when the relative transcript abundance of a gene changes between different conditions. Existing approaches to analyze DTU often rely on computational procedures that can have speed and scalability issues as the number of samples increases. In this paper, we propose a new method, termed CompDTU, that utilizes compositional regression to model transcript-level relative abundance proportions that are of interest in DTU analyses. This procedure does not suffer from speed and scalability issues due to the relative computational simplicity, making it ideally suited for DTU analysis with large sample sizes. The method also allows for the testing of and controlling for multiple categorical or continuous covariates. Additionally, many existing approaches for DTU ignore quantification uncertainty present in RNA-Seq data, where prior work has shown that accounting for such uncertainty may improve testing performance. We extend our CompDTU method to incorporate quantification uncertainty using bootstrap replicates of abundance estimates from Salmon and term this method CompDTUme. Through several power analyses, we show that CompDTU improves sensitivity and reduces false positive results relative to existing methods. Additionally, CompDTUme results in further improvements in performance over CompDTU with sufficient sample size for genes with high levels of quantification uncertainty while maintaining favorable speed and scalability.