Abstract
Graphical representation of DNA sequences is one of the most popular techniques of alignment-free sequence comparison. In this article, we propose a new method for extracting features of DNA sequences represented by binary images, in which we estimate the similarity between DNA sequences by the frequency histograms of local bitmap patterns on the images. Our method has linear time complexity for the length of DNA sequences, which is practical even for comparison of long sequences. We tested five distance measures to estimate sequence similarities and found that histogram intersection and Manhattan distance are most appropriate for our method among them.
Copyright
The copyright holder for this preprint is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under a CC-BY-NC-ND 4.0 International license.