PT - JOURNAL ARTICLE
AU - K. Hayer
AU - A. Pizzaro
AU - N. L. Lahens
AU - J. B. Hogenesch
AU - G. R. Grant
TI - Benchmark Analysis of Algorithms for Determining and Quantifying Full-length mRNA Splice Forms from RNA-Seq Data
AID - 10.1101/007088
DP - 2014 Jan 01
TA - bioRxiv
PG - 007088
4099 - http://biorxiv.org/content/early/2014/07/14/007088.short
4100 - http://biorxiv.org/content/early/2014/07/14/007088.full
AB - The advantages of RNA sequencing (RNA-Seq) suggest it will replace microarrays for highly parallel gene expression analysis. For example, in contrast to arrays, RNA-Seq is expected to provide accurate identification and quantification of full-length transcripts. A number of methods have been developed for this purpose, but short, error-prone reads make it a difficult problem in practice. It is essential to determine which algorithms perform best, and where and why they fail. However, there is a dearth of independent and unbiased benchmarking studies of these algorithms. Here we use both simulated and experimental benchmark data to evaluate their accuracy. We conclude that most methods are inaccurate even on idealized data, and that no method is sufficiently accurate once complicating factors such as polymorphisms, intron signal, sequencing error, and multiple splice forms are present. These results point to the pressing need for further algorithm development.