PT - JOURNAL ARTICLE
AU - Peter A. Andrews
AU - Ivan Iossifov
AU - Jude Kendall
AU - Steven Marks
AU - Lakshmi Muthuswamy
AU - Zihua Wang
AU - Dan Levy
AU - Michael Wigler
TI - MUMdex: MUM-based structural variation detection
AID - 10.1101/078261
DP - 2016 Jan 01
TA - bioRxiv
PG - 078261
4099 - http://biorxiv.org/content/early/2016/09/30/078261.short
4100 - http://biorxiv.org/content/early/2016/09/30/078261.full
AB - Motivation: Standard genome sequence alignment tools primarily designed to find one alignment per read have difficulty detecting inversion, translocation and large insertion and deletion (indel) events. Moreover, dedicated split read alignment methods that depend only upon the reference genome may misidentify or find too many potential split read alignments because of reference genome anomalies. Methods: We introduce MUMdex, a Maximal Unique Match (MUM)-based genomic analysis software package consisting of a sequence aligner to the reference genome, a storage-indexing format and analysis software. Discordant reference alignments of MUMs are especially suitable for identifying inversion, translocation and large indel differences in unique regions. Extracted population databases are used as filters for flaws in the reference genome. We describe the concepts underlying MUM-based analysis, the software implementation and its usage. Results: We demonstrate via simulation that the MUMdex aligner and alignment format are able to correctly detect and record genomic events. We characterize alignment performance and output file sizes for human whole genome data and compare to Bowtie 2 and the BAM format. Preliminary results demonstrate the practicality of the analysis approach by detecting de novo mutation candidates in human whole genome DNA sequence data from 510 families.
We provide a population database of events from these families for use by others. Availability: http://mumdex.com/ Contact: andrewsp{at}cshl.edu (or paa{at}drpa.us) Supplementary information: Supplementary data are available online.