TY  - JOUR
T1  - Evaluating the accuracy of genomic prediction of growth and wood traits in two Eucalyptus species and their F1 hybrids
JF  - bioRxiv
DO  - 10.1101/081281
SP  - 081281
AU  - Biyue Tan
AU  - Dario Grattapaglia
AU  - Gustavo Salgado Martins
AU  - Karina Zamprogno Ferreira
AU  - Björn Sundberg
AU  - Pär K. Ingvarsson
Y1  - 2016/01/01
UR  - http://biorxiv.org/content/early/2016/10/15/081281.abstract
N2  - Background: Genomic prediction is a genomics-assisted breeding methodology that can increase genetic gains by accelerating the breeding cycle and potentially improving the accuracy of breeding values. In this study, we used 41,304 informative SNPs genotyped in a Eucalyptus breeding population involving 90 E. grandis and 78 E. urophylla parents and their 949 F1 hybrids to develop genomic prediction models for eight phenotypic traits: basic density and pulp yield, and circumference at breast height, height, and tree volume scored at ages three and six years. Using different genomic prediction methods, we assessed the impact of the composition and size of the training/validation sets and of the number and genomic location of SNPs on the predictive ability (PA). Results: Heritabilities estimated using the realized genomic relationship matrix (GRM) were considerably higher than estimates based on the expected pedigree, mainly due to inconsistencies in the expected pedigree that were readily corrected by the GRM. Moreover, the GRM more precisely captures Mendelian sampling among related individuals, such that the genetic covariance was based on the actual proportion of the genome shared between individuals. PA improved considerably with a larger training set and with closer relatedness between the training and validation sets. Prediction models trained on pure species parents could not predict well in F1 hybrids, indicating that model training has to be carried out in hybrid populations if one is to predict in hybrid selection candidates. The different genomic prediction methods provided similar results for all traits, so GBLUP or rrBLUP represents the better compromise between computational time and prediction efficiency. Only slight improvement in PA was observed for any trait when more than 5,000 SNPs were used. Using SNPs in intergenic regions provided slightly better PA than using SNPs sampled exclusively in genic regions. Conclusions: Training set size and composition, and the number of SNPs used, matter more for model prediction than the prediction method or the genomic location of SNPs. Furthermore, training the prediction model on pure parental species provides limited ability to predict traits in interspecific hybrids. Our results provide additional promising perspectives for the implementation of genomic prediction in Eucalyptus breeding programs. Abbreviations: BL, Bayesian LASSO; CBH, circumference at breast height; CDS, coding sequences; GBLUP, genomic best linear unbiased predictor; GEBV, genomic estimated breeding values; GRM, genomic relationship matrix; GS, genomic selection; IBD, identity by descent; IBS, identity by state; LD, linkage disequilibrium; MAS, marker-assisted selection; Ne, effective population size; PA, predictive ability; PCA, principal components analysis; QTLs, quantitative trait loci; RKHS, reproducing kernel Hilbert space; rrBLUP, ridge-regression best linear unbiased prediction; SNP, single-nucleotide polymorphism; TS, training set; VS, validation set.
ER  - 