RT Journal Article
SR Electronic
T1 Annotation and differential analysis of alternative splicing using de novo assembly of RNAseq data
JF bioRxiv
FD Cold Spring Harbor Laboratory
SP 074807
DO 10.1101/074807
A1 Clara Benoit-Pilven
A1 Camille Marchet
A1 Emilie Chautard
A1 Leandro Lima
A1 Marie-Pierre Lambert
A1 Gustavo Sacomoto
A1 Amandine Rey
A1 Cyril Bourgeois
A1 Didier Auboeuf
A1 Vincent Lacroix
YR 2016
UL http://biorxiv.org/content/early/2016/09/12/074807.abstract
AB Genome-wide analyses reveal that more than 90% of multi-exonic human genes produce at least two transcripts through alternative splicing (AS). Various bioinformatics methods are available to analyze AS from RNAseq data. Most methods start by mapping the reads to an annotated reference genome, but some start with a de novo assembly of the reads. In this paper, we present a systematic comparison of a mapping-first approach (FaRLine) and an assembly-first approach (KisSplice). These two approaches are event-based, as they focus on the regions of the transcripts that vary in their exon content. We applied these methods to an RNAseq dataset from a neuroblastoma SK-N-SH cell line (ENCODE) differentiated or not using retinoic acid. We found that the predictions of the two pipelines overlapped (70% of exon skipping events were common), but with noticeable differences. The assembly-first approach found more novel variants, including novel unannotated exons and splice sites. It also predicted AS in families of paralogous genes. The mapping-first approach found more lowly expressed splicing variants and was better at predicting exons overlapping repeated elements. This work demonstrates that annotating AS with a single approach leads to missing a large number of candidates. We further show that these candidates cannot be neglected, since many of them are differentially regulated across conditions and can be validated experimentally.
We therefore advocate for the combined use of both mapping-first and assembly-first approaches for the annotation and differential analysis of AS from RNAseq data.