RT Journal Article
SR Electronic
T1 Dumpster diving in RNA-sequencing to find the source of every last read
JF bioRxiv
FD Cold Spring Harbor Laboratory
SP 053041
DO 10.1101/053041
A1 Serghei Mangul
A1 Harry Taegyun Yang
A1 Nicolas Strauli
A1 Franziska Gruhl
A1 Timothy Daley
A1 Stephanie Christenson
A1 Agata Wesolowska-Andersen
A1 Roberto Spreafico
A1 Cydney Rios
A1 Celeste Eng
A1 Andrew D. Smith
A1 Ryan D. Hernandez
A1 Roel A. Ophoff
A1 Jose Rodriguez Santana
A1 Prescott G. Woodruff
A1 Esteban Burchard
A1 Max A. Seibold
A1 Sagiv Shifman
A1 Eleazar Eskin
A1 Noah Zaitlen
YR 2016
UL http://biorxiv.org/content/early/2016/05/13/053041.abstract
AB High throughput RNA sequencing technologies have provided invaluable research opportunities across distinct scientific domains by producing quantitative readouts of the transcriptional activity of both entire cellular populations and single cells. The majority of RNA-Seq analyses begin by mapping each experimentally produced sequence (i.e., read) to a set of annotated reference sequences for the organism of interest. For both biological and technical reasons, a significant fraction of reads remains unmapped. In this work we develop a read origin protocol (ROP) aimed at discovering the source of all reads, originated from complex RNA molecules, recombinant antibodies and microbial communities. Our approach can account for 98.8% of all reads across poly(A) and ribo-depletion protocols. Furthermore, using ROP we show that immune profiles of asthmatic individuals are significantly different from the control individuals with decreased average per sample T-cell/B-cell receptor diversity and that immune diversity is inversely correlated with microbial load.
This demonstrates the potential of ROP to exploit unmapped reads to better understand the functional mechanisms underlying the connection between immune system, microbiome, human gene expression, and disease etiology. The ROP pipeline is freely available at https://sergheimangul.wordpress.com/rop/