RT Journal Article
SR Electronic
T1 A tandem simulation framework for predicting mapping quality
JF bioRxiv
FD Cold Spring Harbor Laboratory
SP 103952
DO 10.1101/103952
A1 Ben Langmead
YR 2017
UL http://biorxiv.org/content/early/2017/01/30/103952.abstract
AB Read alignment is the first step in most sequencing data analyses. It is also a source of errors and interpretability problems. Repetitive genomes, algorithmic shortcuts, and genetic variation impede the aligner’s ability to find a read’s true point of origin. Aligners therefore report a mapping quality: the probability the reported point of origin for a read is incorrect. However, there is no established method for calculating mapping quality in a general way. We describe an accurate, aligner-agnostic framework for predicting mapping qualities that works by simulating a set of tandem reads, similar to the input reads in important ways, but for which the true point of origin is known. Alignments of tandem reads are used to build a model for predicting mapping quality, which is then applied to the input-read alignments. The model is automatically tailored to the alignment scenario at hand, allowing it to make accurate mapping-quality predictions across a range of scenarios. We implement this approach in a software tool called Qtip, which is accurate, low-overhead, and compatible with popular read aligners. Qtip is open source software available from https://github.com/BenLangmead/qtip.