RT Journal Article
SR Electronic
T1 Cookiecutter: a tool for kmer-based read filtering and extraction
JF bioRxiv
FD Cold Spring Harbor Laboratory
SP 024679
DO 10.1101/024679
A1 Ekaterina Starostina
A1 Gaik Tamazian
A1 Pavel Dobrynin
A1 Stephen O’Brien
A1 Aleksey Komissarov
YR 2015
UL http://biorxiv.org/content/early/2015/08/16/024679.abstract
AB Motivation: K-mer-based analysis is a powerful method used in read error correction and implemented in various genome assembly tools. Many read-processing routines involve extracting or removing sequence reads from the results of high-throughput sequencing experiments prior to further analysis. Here we present a new approach to sorting or filtering raw reads based on a provided list of k-mers. Results: We developed Cookiecutter, a computational tool for rapid extraction or removal of reads according to a provided list of k-mers generated from a FASTA file. Cookiecutter is based on an implementation of the Aho-Corasick algorithm and is useful in routine processing of high-throughput sequencing datasets. It can be used both to remove undesirable reads and to extract reads from a user-defined region of interest. Availability: The open-source implementation, with user instructions, can be obtained from GitHub: https://github.com/ad3002/Cookiecutter.