gencore: an efficient tool to generate consensus reads for error suppressing and duplicate removing of NGS data

Shifu Chen; Yanqing Zhou; Yaru Chen; Tanxiao Huang; Wenting Liao; Yun Xu; Zhihua Liu; Jia Gu

doi:10.1101/501502

Abstract

Summary This paper presents gencore, an efficient tool for eliminating errors and duplicates in next-generation sequencing (NGS) data. The tool clusters mapped sequencing reads and merges each cluster into a single consensus read. If the data contain unique molecular identifiers (UMIs), gencore uses them to identify reads derived from the same original DNA fragment. Compared with the conventional tool Picard, gencore greatly reduces mapping mismatches in the output data, which are mostly caused by errors. This error suppression makes gencore well suited to detecting ultra-low-frequency mutations in deep sequencing data. gencore is also about 3X faster than Picard and uses much less memory.
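The cluster-then-merge idea described in the Summary can be illustrated with a minimal Python sketch. This is not gencore's implementation (gencore is written in C++, and its actual clustering and scoring rules are not given in this abstract). The `Read` record, the `cluster_reads` and `consensus` functions, and the strict-majority rule are all hypothetical choices made for illustration, and the sketch assumes that reads in a cluster have equal length and carry no indels.

```python
from collections import Counter, defaultdict
from typing import NamedTuple


class Read(NamedTuple):
    """Hypothetical minimal record for a mapped read."""
    chrom: str
    start: int  # leftmost mapped position
    end: int    # rightmost mapped position
    umi: str    # empty string when the data carry no UMI
    seq: str


def cluster_reads(reads):
    """Group reads that share mapping coordinates and UMI.

    Assumption: reads from the same original fragment map to the same
    chromosome, start and end, and carry the same UMI when one is present.
    """
    clusters = defaultdict(list)
    for r in reads:
        clusters[(r.chrom, r.start, r.end, r.umi)].append(r)
    return clusters


def consensus(cluster):
    """Merge one cluster into a consensus sequence.

    Takes the most frequent base in each column and emits 'N' when no
    base has a strict majority. Assumes equal-length reads without indels.
    """
    out = []
    for i in range(len(cluster[0].seq)):
        counts = Counter(r.seq[i] for r in cluster)
        base, n = counts.most_common(1)[0]
        out.append(base if n * 2 > len(cluster) else "N")
    return "".join(out)


# Three duplicates of one fragment (one with an error at column 4)
# plus a read from a different fragment, distinguished by its UMI.
reads = [
    Read("chr1", 100, 107, "ACGT", "GATTACAT"),
    Read("chr1", 100, 107, "ACGT", "GATTACAT"),
    Read("chr1", 100, 107, "ACGT", "GATTGCAT"),
    Read("chr1", 100, 107, "TTAA", "GATTACAT"),
]
merged = {key: consensus(c) for key, c in cluster_reads(reads).items()}
```

In this toy input, the four reads collapse into two consensus reads, and the single mismatching base in the first cluster is outvoted by its two duplicates. A production tool would also need to weigh base qualities and handle indels and clipped alignments, which this sketch deliberately leaves out.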

Availability and Implementation gencore is an open-source tool written in C++. It is hosted on GitHub: https://github.com/OpenGene/gencore

Contact chen@haplox.com

The copyright holder for this preprint is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under a CC-BY-NC-ND 4.0 International license.