PT - JOURNAL ARTICLE
AU - Jungeui Hong
AU - David Gresham
TI - Incorporation of unique molecular identifiers in TruSeq adapters improves the accuracy of quantitative sequencing
AID - 10.1101/114603
DP - 2017 Jan 01
TA - bioRxiv
PG - 114603
4099 - http://biorxiv.org/content/early/2017/03/07/114603.short
4100 - http://biorxiv.org/content/early/2017/03/07/114603.full
AB - Quantitative analysis of next-generation sequencing data requires discriminating duplicate reads generated by PCR from identical molecules that are of unique origin. Typically, PCR duplicates are defined as sequence reads that align to the same genomic coordinates using reference-based alignment. However, identical molecules can be independently generated during library preparation. The false positive rate of coordinate-based deduplication has not been well characterized and may introduce unforeseen biases during analyses. We developed a cost-effective sequencing adapter design by modifying Illumina TruSeq adapters to incorporate a unique molecular identifier (UMI) while maintaining the capacity to undertake multiplexed sequencing. Incorporation of UMIs enables identification of bona fide PCR duplicates as identically mapped reads with identical UMIs. Using TruSeq adapters containing UMIs (TrUMIseq adapters), we find that accurate removal of PCR duplicates results in enhanced data quality for quantitative analysis of allele frequencies in heterogeneous populations and gene expression. Method Summary: TrUMIseq adapters incorporate unique molecular identifiers in TruSeq adapters while maintaining the capacity to multiplex sequencing libraries using existing workflows. The use of UMIs increases the accuracy of quantitative sequencing assays, including RNAseq and allele frequency estimation, by enabling accurate detection of PCR duplicates.
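
The distinction the abstract draws between coordinate-based deduplication and UMI-aware deduplication can be illustrated with a minimal Python sketch. This is not the authors' pipeline. The `Read` record, the function names, the 8-base UMI strings, and the example reads are all hypothetical, and the sketch assumes exact UMI matching on single-end reads, ignoring sequencing errors within the UMI.

```python
from collections import namedtuple

# Hypothetical minimal read record; a real pipeline would parse
# these fields from an alignment file rather than build them by hand.
Read = namedtuple("Read", ["name", "chrom", "pos", "strand", "umi"])


def dedup_by_coordinate(reads):
    """Keep the first read seen at each (chrom, pos, strand).

    This mirrors conventional coordinate-based deduplication, which
    treats every identically mapped read as a PCR duplicate.
    """
    seen = set()
    kept = []
    for read in reads:
        key = (read.chrom, read.pos, read.strand)
        if key not in seen:
            seen.add(key)
            kept.append(read)
    return kept


def dedup_by_coordinate_and_umi(reads):
    """Keep the first read seen at each (chrom, pos, strand, umi).

    Reads are called PCR duplicates only when both the mapping
    coordinates and the UMI are identical (exact match assumed).
    """
    seen = set()
    kept = []
    for read in reads:
        key = (read.chrom, read.pos, read.strand, read.umi)
        if key not in seen:
            seen.add(key)
            kept.append(read)
    return kept


# Made-up example reads, for illustration only.
reads = [
    Read("r1", "chrI", 1000, "+", "ACGTACGT"),
    Read("r2", "chrI", 1000, "+", "ACGTACGT"),  # same position, same UMI
    Read("r3", "chrI", 1000, "+", "TTGACCAA"),  # same position, different UMI
    Read("r4", "chrI", 2500, "-", "ACGTACGT"),  # different position
]

coordinate_only = dedup_by_coordinate(reads)
umi_aware = dedup_by_coordinate_and_umi(reads)

print([r.name for r in coordinate_only])  # ['r1', 'r4']
print([r.name for r in umi_aware])        # ['r1', 'r3', 'r4']
```

In this toy input, coordinate-only deduplication discards `r3` even though its distinct UMI marks it as an independently generated molecule, which is the kind of false positive the abstract describes. The UMI-aware version removes only `r2`.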