TY  - JOUR
T1  - Challenges and solutions for transcriptome assembly in non-model organisms with an application to hybrid specimens
JF  - bioRxiv
DO  - 10.1101/084145
SP  - 084145
AU  - Ungaro Arnaud
AU  - Pech Nicolas
AU  - Martin Jean-François
AU  - McCairns R.J. Scott
AU  - Chappaz Rémi
AU  - Gilles André
Y1  - 2016/01/01
UR  - http://biorxiv.org/content/early/2016/10/28/084145.abstract
N2  - Analyses of high-throughput transcriptome sequences of non-model organisms are based on three main approaches: de novo assembly, genome-guided assembly, or direct read to genome mapping (DGM). We describe a flexible DGM pipeline, and demonstrate its performance by using simulated reads of lengths corresponding to those generated by the most common sequencing platforms, and over a realistic range of genetic divergence. We also evaluate the performance of a combined pipeline (de novo + DGM) via simulation and empirically, using data from two hybridizing Cyprinid fish species. Finally, we explore the assignation of F1 hybrid reads to their parental species, and discuss the implications of erroneous assignations on gene expression studies. Our DGM pipeline recovers 94.8% of the genes irrespective of read length at 0% divergence; however, the assignation rate of reads is negatively impacted both by increasing divergence and by decreasing read length. Likewise, our combined de novo + DGM pipeline outperforms de novo analyses alone at all levels of divergence and read length.
ER  - 