PT - JOURNAL ARTICLE
AU - Giuseppe Narzisi
AU - Jason A. O’Rawe
AU - Ivan Iossifov
AU - Yoon-ha Lee
AU - Zihua Wang
AU - Yiyang Wu
AU - Gholson J. Lyon
AU - Michael Wigler
AU - Michael C. Schatz
TI - Accurate detection of de novo and transmitted INDELs within exome-capture data using micro-assembly
AID - 10.1101/001370
DP - 2013 Jan 01
TA - bioRxiv
PG - 001370
4099 - http://biorxiv.org/content/early/2013/12/13/001370.short
4100 - http://biorxiv.org/content/early/2013/12/13/001370.full
AB - We present a new open-source algorithm, Scalpel, for sensitive and specific discovery of INDELs in exome-capture data. By combining the power of mapping and assembly, Scalpel searches the de Bruijn graph for haplotype-specific sequence paths (contigs) that span each exon. The algorithm reports a single path for homozygous exons, two paths for heterozygous exons, and multiple paths for more exotic variations. A detailed repeat composition analysis coupled with a self-tuning k-mer strategy allows Scalpel to outperform other state-of-the-art approaches for INDEL discovery. We extensively compared Scalpel with a battery of >10000 simulated and >1000 experimentally validated INDELs between 1 and 100bp against two recent algorithms for INDEL discovery: GATK HaplotypeCaller and SOAPindel. We report anomalies for these tools in their ability to detect INDELs, especially in regions containing near-perfect repeats which contribute to high false positive rates. In contrast, Scalpel demonstrates superior specificity while maintaining high sensitivity. We also present a large-scale application of Scalpel for detecting de novo and transmitted INDELs in 593 families with autistic children from the Simons Simplex Collection. Scalpel demonstrates enhanced power to detect long (≥20bp) transmitted events, and strengthens previous reports of enrichment for de novo likely gene-disrupting INDEL mutations in children with autism with many new candidate genes.
The source code and documentation for the algorithm are available at http://scalpel.sourceforge.net.