Abstract
Motivation Modern genomic sequencing data is trending toward longer sequences with higher accuracy. Many analyses using these data will center on alignments, but classical exact alignment algorithms are infeasible for long sequences. The recently proposed WFA algorithm demonstrated how to perform exact alignment for long, similar sequences in O(sN) time and O(s2) memory, where s is a score that is low for similar sequences (Marco-Sola et al., 2021). However, this algorithm still has infeasible memory requirements for longer sequences. Also, it uses an alternate scoring system that is unfamiliar to many bioinformaticians.
Results We describe variants of WFA that improve its asymptotic memory use from O(s2) to O(s3/2) and its asymptotic run time from O(sN) to O(s2 + N). We expect the reduction in memory use to be particularly impactful, as it makes it practical to perform highly multithreaded megabase-scale exact alignments in common compute environments. In addition, we show how to fold WFA’s alternate scoring into the broader literature on alignment scores.
Availability All code is publicly available for use and modification at https://github.com/jeizenga/wfalm.
Contact jeizenga{at}ucsc.edu
Supplementary information Supplementary data are available online.
Competing Interest Statement
The authors have declared no competing interest.