PT - JOURNAL ARTICLE
AU - Nicholas F. Lahens
AU - Ibrahim Halil Kavakli
AU - Ray Zhang
AU - Katharina Hayer
AU - Michael B. Black
AU - Hannah Dueck
AU - Angel Pizarro
AU - Junhyong Kim
AU - Rafael Irizarry
AU - Russell S. Thomas
AU - Gregory R. Grant
AU - John B. Hogenesch
TI - IVT-seq reveals extreme bias in RNA-sequencing
AID - 10.1101/005371
DP - 2014 Jan 01
TA - bioRxiv
PG - 005371
4099 - http://biorxiv.org/content/early/2014/05/21/005371.short
4100 - http://biorxiv.org/content/early/2014/05/21/005371.full
AB - Background: RNA sequencing (RNA-seq) is a powerful technique for identifying and quantifying transcription and splicing events, both known and novel. However, given its recent development and the proliferation of library construction methods, understanding the bias it introduces is incomplete but critical to realizing its value. Results: Here we present a method, in vitro transcription sequencing (IVT-seq), for identifying and assessing the technical biases in RNA-seq library generation and sequencing at scale. We created a pool of > 1000 in vitro transcribed (IVT) RNAs from a full-length human cDNA library and sequenced them with poly-A and total RNA-seq, the most common protocols. Because each cDNA is full length and we show IVT is incredibly processive, each base in each transcript should be equivalently represented. However, with common RNA-seq applications and platforms, we find ∼50% of transcripts have > 2-fold and ∼10% have > 10-fold differences in within-transcript sequence coverage. Strikingly, we also find > 6% of transcripts have regions of high, unpredictable sequencing coverage, where the same transcript varies dramatically in coverage between samples, confounding accurate determination of their expression.
To get at causal factors, we used a combination of experimental and computational approaches to show that rRNA depletion is responsible for the most significant variability in coverage and that several sequence determinants also strongly influence representation. Conclusions: In sum, these results show the utility of IVT-seq in promoting better understanding of bias introduced by RNA-seq and suggest caution in its interpretation. Furthermore, we find that rRNA depletion is responsible for substantial, unappreciated biases in coverage. Perhaps most importantly, these coverage biases introduced during library preparation suggest exon-level expression analysis may be inadvisable.