RT Journal Article
SR Electronic
T1 The Impact of High-Performance Computing Best Practice Applied to Next-Generation Sequencing Workflows
JF bioRxiv
FD Cold Spring Harbor Laboratory
SP 017665
DO 10.1101/017665
A1 Pierre Carrier
A1 Bill Long
A1 Richard Walsh
A1 Jef Dawson
A1 Carlos P. Sosa
A1 Brian Haas
A1 Timothy Tickle
A1 Thomas William
YR 2015
UL http://biorxiv.org/content/early/2015/04/07/017665.abstract
AB High Performance Computing (HPC) Best Practice offers opportunities to implement lessons learned in areas such as computational chemistry and physics in genomics workflows, specifically Next-Generation Sequencing (NGS) workflows. In this study we will briefly describe how distributed-memory parallelism can be an important enhancement to the performance and resource utilization of NGS workflows. We will illustrate this point by showing results on the parallelization of the Inchworm module of the Trinity RNA-Seq pipeline for de novo transcriptome assembly. We show that these types of applications can scale to thousands of cores. Time scaling as well as memory scaling will be discussed at length using two RNA-Seq datasets, targeting the Mus musculus (mouse) and the Axolotl (Mexican salamander). Details about the efficient MPI communication and the impact on performance will also be shown. We hope to demonstrate that this type of parallelization approach can be extended to most types of bioinformatics workflows, with substantial benefits. The efficient, distributed-memory parallel implementation eliminates memory bottlenecks and dramatically accelerates NGS analysis. We further include a summary of programming paradigms available to the bioinformatics community, such as C++/MPI.