Abstract
Long-read sequencing technologies allow the systematic interrogation of transcriptomes from any species. However, functional characterization requires the determination of the correct 5’-to-3’ orientation of reads. Oxford Nanopore Technologies (ONT) allows the direct measurement of RNA molecules in the native orientation (Garalde et al. 2018), but sequencing of complementary-DNA (cDNA) libraries yields generally a larger number of reads (Workman et al. 2018). Although strand-specific adapters can be used, error rates hinder their detection. Current methods rely on the comparison to a genome or transcriptome reference (Wyman and Mortazavi 2018; Workman et al. 2018) or on the use of additional technologies (Fu et al. 2018), which limits the applicability of rapid and cost-effective long-read sequencing for transcriptomics beyond model species. To facilitate the interrogation of transcriptomes de-novo in species or samples for which a genome or transcriptome reference is not available, we have developed ReorientExpress (https://github.com/comprna/reorientexpress), a new tool to perform reference-free orientation of ONT reads from a cDNA library, with our without stranded adapters. ReorientExpress uses a deep neural network (DNN) to predict the orientation of cDNA long-reads independently of adapters and without using a reference.