Abstract
Short tandem repeats (STRs) are polymorphic genomic loci that are valuable in research, diagnostics and forensics. However, their polymorphic nature is a double-edged sword: during in vitro amplification, STRs undergo mutational processes that produce stutter noise, particularly in the shorter, more mutable repeat types. Stutter noise can be avoided with amplification-free library preparation, but such protocols are currently incompatible with single-cell analysis and with known targeted-enrichment protocols. To address this challenge, we designed a method for directly measuring in vitro noise. Using a synthetic STR sequencing library, we calibrated a proposed Markov model that predicts stutter patterns at any amplification cycle. With this model, we accurately genotyped even cases of severe amplification noise, in which as few as 3% of the reads reflect the original STR size.
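The abstract does not give the structure of the Markov model. The block below is only an illustrative sketch of how a per-cycle Markov model can predict stutter at any cycle count. It assumes that in each PCR cycle a fraction `efficiency` of molecules is copied, and that each new copy loses one repeat unit with probability `p_minus`, gains one with probability `p_plus`, or is copied faithfully. The functions `cycle_matrix`, `stutter_profile` and `most_likely_length`, all parameter names, and the maximum-likelihood genotyping step are hypothetical. They are not the authors' calibrated model or genotyping procedure.

```python
import math

# Hypothetical sketch: a per-cycle Markov model of PCR stutter over repeat counts.
# p_minus, p_plus and efficiency are illustrative assumptions, not calibrated values.

def cycle_matrix(n_states, p_minus, p_plus, efficiency):
    """Row-stochastic transition matrix over repeat counts 0..n_states-1.

    Templates keep their length; a fraction `efficiency` of molecules is copied,
    and each copy may slip by -1 or +1 repeat unit. Dividing by (1 + efficiency)
    turns the expected population growth into fractions. Moves past either end
    are clipped to the boundary state so every row still sums to 1.
    """
    if p_minus < 0 or p_plus < 0 or p_minus + p_plus > 1:
        raise ValueError("slippage probabilities must be non-negative and sum to <= 1")
    template_w = 1.0 / (1.0 + efficiency)
    copy_w = efficiency / (1.0 + efficiency)
    m = [[0.0] * n_states for _ in range(n_states)]
    for i in range(n_states):
        down, up = max(i - 1, 0), min(i + 1, n_states - 1)
        m[i][i] += template_w                          # unchanged template
        m[i][down] += copy_w * p_minus                 # copy lost one unit
        m[i][up] += copy_w * p_plus                    # copy gained one unit
        m[i][i] += copy_w * (1.0 - p_minus - p_plus)   # faithful copy
    return m

def step(dist, m):
    """One cycle: multiply the row vector `dist` by the transition matrix `m`."""
    n = len(dist)
    return [sum(dist[i] * m[i][j] for i in range(n)) for j in range(n)]

def stutter_profile(true_len, n_states, cycles, p_minus, p_plus, efficiency=1.0):
    """Expected fraction of molecules at each repeat count after `cycles` cycles."""
    dist = [0.0] * n_states
    dist[true_len] = 1.0
    m = cycle_matrix(n_states, p_minus, p_plus, efficiency)
    for _ in range(cycles):
        dist = step(dist, m)
    return dist

def most_likely_length(read_counts, cycles, p_minus, p_plus, efficiency=1.0):
    """Pick the original repeat count whose predicted profile best explains the reads
    (multinomial log-likelihood; a single-allele assumption made for illustration)."""
    n_states = len(read_counts)
    best, best_ll = None, -math.inf
    for length in range(n_states):
        prof = stutter_profile(length, n_states, cycles, p_minus, p_plus, efficiency)
        ll = sum(c * math.log(max(p, 1e-300)) for c, p in zip(read_counts, prof) if c)
        if ll > best_ll:
            best, best_ll = length, ll
    return best
```

For example, `stutter_profile(20, 41, 30, 0.05, 0.01)` returns the expected fraction of reads at each repeat count after 30 cycles for an allele with 20 repeats. Because each cycle applies the same transition matrix, predicting stutter at a different cycle count only changes how many times the matrix is applied.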