RT Journal Article
SR Electronic
T1 Joint Variant and De Novo Mutation Identification on Pedigrees from High-Throughput Sequencing Data
JF bioRxiv
FD Cold Spring Harbor Laboratory
SP 001958
DO 10.1101/001958
A1 John G. Cleary
A1 Ross Braithwaite
A1 Kurt Gaastra
A1 Brian S. Hilbush
A1 Stuart Inglis
A1 Sean A. Irvine
A1 Alan Jackson
A1 Richard Littin
A1 Sahar Nohzadeh-Malakshah
A1 Minita Shah
A1 Mehul Rathod
A1 David Ware
A1 Len Trigg
A1 Francisco M. De La Vega
YR 2014
UL http://biorxiv.org/content/early/2014/01/24/001958.abstract
AB The analysis of whole-genome or exome sequencing data from trios and pedigrees has been successfully applied to the identification of disease-causing mutations. However, most methods used to identify and genotype genetic variants from next-generation sequencing data ignore the relationships between samples, resulting in significant Mendelian errors, false positives and false negatives. Here we present a Bayesian network framework that jointly analyses data from all members of a pedigree using Mendelian segregation priors, while retaining the ability to detect de novo mutations in offspring, and that scales to large pedigrees. We evaluated our method by simulations and by analysis of WGS data from a 17-individual, 3-generation CEPH pedigree sequenced to 50X average depth. Compared to singleton calling, our family caller produced more high-quality variants and eliminated spurious calls, as judged by common quality metrics such as Ti/Tv and Het/Hom ratios and concordance with dbSNP and SNP array data. We developed a ground truth dataset to further evaluate our calls by identifying recombination cross-overs in the pedigree and testing variants for consistency with the inferred phasing, and we show that our method significantly outperforms singleton and population variant calling in pedigrees. We identify all previously validated de novo mutations in NA12878, together with a 7X improvement in precision.
Our results show that our method is scalable to large genomics and human disease studies and allows cost optimization by rational sequencing capacity distribution.