TY  - JOUR
T1  - Efficient cardinality estimation for k-mers in large DNA sequencing data sets
JF  - bioRxiv
DO  - 10.1101/056846
SP  - 056846
AU  - Luiz C. Irber, Jr.
AU  - C. Titus Brown
Y1  - 2016/01/01
UR  - http://biorxiv.org/content/early/2016/06/07/056846.abstract
N2  - We present an open implementation of the HyperLogLog cardinality estimation sketch for counting fixed-length substrings of DNA strings (“k-mers”). The HyperLogLog sketch implementation is in C++ with a Python interface, and is distributed as part of the khmer software package. khmer is freely available from https://github.com/dib-lab/khmer under a BSD License. The features presented here are included in version 1.4 and later.
ER  -