RT Journal Article
SR Electronic
T1 Evaluation of the accuracy of imputed sequence variants and their utility for causal variant detection in cattle
JF bioRxiv
FD Cold Spring Harbor Laboratory
SP 085399
DO 10.1101/085399
A1 Hubert Pausch
A1 Iona M MacLeod
A1 Ruedi Fries
A1 Reiner Emmerling
A1 Phil J Bowman
A1 Hans D Daetwyler
A1 Michael E Goddard
YR 2016
UL http://biorxiv.org/content/early/2016/11/03/085399.abstract
AB Background The availability of dense genotypes and whole-genome sequence variants from various sources offers the opportunity to compile large data sets comprising tens of thousands of animals genotyped at millions of polymorphic sites, which may enhance the power of genomic analyses. The imputation of missing genotypes ensures that all individuals have genotypes for a shared set of variants. Results We evaluated the accuracy of imputation from dense genotypes to whole-genome sequence variants in 249 Fleckvieh and 450 Holstein cattle using Minimac and FImpute. The sequence variants of a subset of the animals were reduced to those included on the Illumina BovineHD genotyping array and subsequently inferred in silico using either within-breed or multi-breed reference populations. The accuracy of imputation varied considerably across chromosomes and dropped in regions where the bovine genome contains segmental duplications. Depending on the imputation strategy, the correlation between imputed and true genotypes ranged from 0.898 to 0.952. The accuracy of imputation was higher with Minimac than with FImpute, particularly for rare alleles. Using a multi-breed reference population increased the accuracy of imputation, particularly when FImpute was used to infer genotypes. When the sequence variants were imputed using Minimac, the true genotypes were more strongly correlated with predicted allele dosages than with best-guess genotypes. The computing costs to impute 23,256,743 sequence variants in 6958 animals were 10-fold higher with Minimac than with FImpute. 
Association studies with imputed sequence variants revealed seven quantitative trait loci (QTL) for milk fat percentage. When Minimac was used to infer genotypes, two known causal mutations in the DGAT1 and GHR genes were the most significantly associated variants at the QTL on chromosomes 14 and 20, respectively. Conclusions Population-based imputation of millions of sequence variants in large cohorts provides accurate genotypes and is computationally feasible. Using a reference population that includes individuals from many breeds increases the accuracy of imputation, particularly at low-frequency variants. Using allele dosages rather than best-guess genotypes as explanatory variables is advantageous for association studies with imputed sequence variants.