TY  - JOUR
T1  - Evaluating the genetic diagnostic power of exome sequencing: Identifying missing data
JF  - bioRxiv
DO  - 10.1101/068825
SP  - 068825
AU  - Pascual Lorente-Arencibia
AU  - Deyán Guacarán
AU  - Antonio Tugores
Y1  - 2016/01/01
UR  - http://biorxiv.org/content/early/2016/08/10/068825.abstract
N2  - A hurdle of exome sequencing is its limited capacity to represent the entire exome. To ascertain the diagnostic power of this approach we determined the extent of coverage per individual sample. Using alignment data (BAM files) from 15 exome samples, sequences of any length that were below a determined sequencing depth coverage (DP) were detected and annotated with the Ensembl exon database using MIST, a novel software tool. Samples sequenced at 50X mean coverage had, on average, up to 50% of the Ensembl annotated exons with at least one nucleotide (L=1) with a DP<20, improving to 35% at 100X mean coverage. In addition, almost 15% of annotated exons were never sequenced (L=50, DP<1) at 50X mean coverage, falling to 5% at 100X. The diagnostic utility of this approach was tested for hypertrophic cardiomyopathy, a genetically heterogeneous disease, where exome sequencing covered as much as 80% of all candidate gene exons at DP≥20. This report stresses the value of identifying, precisely, which sequences are below a specific depth in an individual’s exome, and provides a useful tool to assess the potential and pitfalls of exome sequencing in a diagnostic or gene discovery setting. COMPETING INTERESTS: The authors declare no conflicts of interest.
ER  -