RT Journal Article SR Electronic T1 UPS-indel: a Universal Positioning System for Indels JF bioRxiv FD Cold Spring Harbor Laboratory SP 133553 DO 10.1101/133553 A1 Mohammad Shabbir Hasan A1 Xiaowei Wu A1 Layne T. Watson A1 Zhiyi Li A1 Liqing Zhang YR 2017 UL http://biorxiv.org/content/early/2017/05/23/133553.abstract AB Background Indels, though differing in allele sequence and position, are biologically equivalent when they lead to the same altered sequences. Storing biologically equivalent indels as distinct entries in databases causes data redundancy, and may mislead downstream analysis and interpretations. About 10% of the human indels stored in dbSNP are redundant. It is thus desirable to have a unified system for identifying and representing equivalent indels in publically available databases. Moreover, a unified system is also desirable to compare the indel calling results produced by different tools. This paper describes UPS-indel, a utility tool that creates a universal positioning system for indels so that equivalent indels can be uniquely determined by their coordinates in the new system, which also can be used to compare indel calling results produced by different tools.Results UPS-indel identifies nearly 15% indels in dbSNP (version 142) as redundant across all human chromosomes, higher than previously reported. When applied to COSMIC coding and noncoding indel datasets, UPS-indel identifies nearly 29% and 13% indels as redundant, respectively. Comparing the performance of UPS-indel with existing variant normalization tools vt normalize, BCFtools, and GATK LeftAlignAndTrimVariants shows that UPS-indel is able to identify 456,352 more redundant indels in dbSNP; 2,118 more in COSMIC coding, and 553 more in COSMIC noncoding indel dataset in addition to the ones reported jointly by these tools. Moreover, comparing UPS-indel to other state-of-the-art approaches for indel call set comparison demonstrates that UPS-indel is clearly superior to other approaches in finding indels in common among call sets.Conclusions UPS-indel is theoretically proven to find all equivalent indels, and is thus exhaustive. UPS-indel is written in C++ and the command line version is freely available to download at http://ups-indel.sourceforge.net. The online version of UPS-indel is available at http://bench.cs.vt.edu/ups-indel/.