TY  - JOUR
T1  - Darwin: A Hardware-acceleration Framework for Genomic Sequence Alignment
JF  - bioRxiv
DO  - 10.1101/092171
SP  - 092171
AU  - Yatish Turakhia
AU  - Kevin Jie Zheng
AU  - Gill Bejerano
AU  - William J. Dally
Y1  - 2017/01/01
UR  - http://biorxiv.org/content/early/2017/01/15/092171.abstract
N2  - Genomics is set to transform medicine and our understanding of life in fundamental ways. But the growth in genomics data has been overwhelming, far outpacing Moore's Law. The advent of third generation sequencing technologies is providing new insights into the genomic contribution to diseases with complex mutation events, but these technologies have prohibitively high computational costs. Over 1,300 CPU hours are required to align reads from a 54× coverage of the human genome to a reference (estimated using [1]), and over 15,600 CPU hours to assemble the reads de novo [2]. This paper proposes “Darwin” - a hardware-accelerated framework for genomic sequence alignment that, without sacrificing sensitivity, provides 125× and 15.6× speedup over the state-of-the-art software counterparts for reference-guided and de novo assembly of third generation sequencing reads, respectively. For pairwise alignment of sequences, Darwin is over 39,000× more energy-efficient than software. Darwin uses (i) a novel filtration strategy, called D-SOFT, to reduce the search space for sequence alignment at high speed, and (ii) a hardware-accelerated version of GACT, a novel algorithm to generate near-optimal alignments of arbitrarily long genomic sequences using constant memory for trace-back. Darwin is adaptable, with tunable speed and sensitivity to match emerging sequencing technologies and to meet the requirements of genomic applications beyond read assembly.
ER  - 