TY  - JOUR
T1  - The Impact of High-Performance Computing Best Practice Applied to Next-Generation Sequencing Workflows
JF  - bioRxiv
DO  - 10.1101/017665
SP  - 017665
AU  - Pierre Carrier
AU  - Bill Long
AU  - Richard Walsh
AU  - Jef Dawson
AU  - Carlos P. Sosa
AU  - Brian Haas
AU  - Timothy Tickle
AU  - Thomas William
Y1  - 2015/01/01
UR  - http://biorxiv.org/content/early/2015/04/07/017665.abstract
N2  - High Performance Computing (HPC) Best Practice offers opportunities to implement lessons learned in areas such as computational chemistry and physics in genomics workflows, specifically Next-Generation Sequencing (NGS) workflows. In this study we will briefly describe how distributed-memory parallelism can be an important enhancement to the performance and resource utilization of NGS workflows. We will illustrate this point by showing results on the parallelization of the Inchworm module of the Trinity RNA-Seq pipeline for de novo transcriptome assembly. We show that these types of applications can scale to thousands of cores. Time scaling as well as memory scaling will be discussed at length using two RNA-Seq datasets, targeting the Mus musculus (mouse) and the Axolotl (Mexican salamander). Details about the efficient MPI communication and the impact on performance will also be shown. We hope to demonstrate that this type of parallelization approach can be extended to most types of bioinformatics workflows, with substantial benefits. The efficient, distributed-memory parallel implementation eliminates memory bottlenecks and dramatically accelerates NGS analysis. We further include a summary of programming paradigms available to the bioinformatics community, such as C++/MPI.
ER  - 