TY  - JOUR
T1  - Discovery of large genomic inversions using pooled clone sequencing
JF  - bioRxiv
DO  - 10.1101/015156
SP  - 015156
AU  - Marzieh Eslami Rasekh
AU  - Giorgia Chiatante
AU  - Mattia Miroballo
AU  - Joyce Tang
AU  - Mario Ventura
AU  - Chris T. Amemiya
AU  - Evan E. Eichler
AU  - Francesca Antonacci
AU  - Can Alkan
Y1  - 2015/01/01
UR  - http://biorxiv.org/content/early/2015/02/11/015156.abstract
N2  - Motivation: There are many different forms of genomic structural variation that can be broadly classified as copy number variation (CNV) and balanced rearrangements. Although many algorithms are now available in the literature that aim to characterize CNVs, discovery of balanced rearrangements (inversions and translocations) remains an open problem. This is mainly because the breakpoints of such events typically lie within segmental duplications and common repeats, which reduce the mappability of short reads. The 1000 Genomes Project spearheaded the development of several methods to identify inversions; however, they are limited to relatively short inversions, and there are currently no available algorithms to discover large inversions using high-throughput sequencing (HTS) technologies. Results: Here we propose to characterize large genomic inversions using a sequencing method (Kitzman et al., 2011) originally developed to improve haplotype resolution. This method, called pooled clone sequencing, merges the advantages of the clone-based sequencing approach with the speed and cost efficiency of HTS technologies. Using data generated with the pooled clone sequencing method, we developed a novel algorithm, dipSeq, to discover large inversions (>500 Kbp). We show the power of dipSeq first on simulated data, and then apply it to the genome of a HapMap individual (NA12878). We were able to accurately discover all previously known and experimentally validated large inversions in the same genome. We also identified a novel inversion and confirmed it using fluorescent in situ hybridization. Availability: An implementation of the dipSeq algorithm is available at https://github.com/BilkentCompGen/dipseq Contact: calkan{at}cs.bilkent.edu.tr, francesca.antonacci{at}uniba.it
ER  -