RT Journal Article
SR Electronic
T1 Modeling of RNA-seq fragment sequence bias reduces systematic errors in transcript abundance estimation
JF bioRxiv
FD Cold Spring Harbor Laboratory
SP 025767
DO 10.1101/025767
A1 Michael I. Love
A1 John B. Hogenesch
A1 Rafael A. Irizarry
YR 2015
UL http://biorxiv.org/content/early/2015/08/28/025767.abstract
AB RNA-seq technology is widely used in biomedical and basic science research. These studies rely on complex computational methods that quantify expression levels for observed transcripts. We find that current computational methods can lead to hundreds of false positive results related to alternative isoform usage. This flaw in the current methodology stems from a lack of modeling sample-specific bias that leads to drops in coverage and is related to sequence features like fragment GC content and GC stretches. By incorporating features that explain this bias into transcript expression models, we greatly increase the specificity of transcript expression estimates, with more than a four-fold reduction in the number of false positives for reported changes in expression. We introduce alpine, a method for estimation of bias-corrected transcript abundance. The method is available as a Bioconductor package that includes data visualization tools useful for bias discovery.