RT Journal Article
SR Electronic
T1 Rfpred: A Random Forest Approach for Prediction of Missense Variants in Human Exome
JF bioRxiv
FD Cold Spring Harbor Laboratory
SP 037127
DO 10.1101/037127
A1 Fabienne Jabot-Hanin
A1 Hugo Varet
A1 Frederic Tores
A1 Alexandre Alcaïs
A1 Jean-Philippe Jaïs
YR 2016
UL http://biorxiv.org/content/early/2016/01/19/037127.abstract
AB Exome sequencing is becoming a standard tool for gene mapping of genetic diseases. Given the vast amount of data generated by Next Generation Sequencing techniques, identification of disease causal variants is like finding a needle in a haystack. The impact assessment and the prioritization of potential pathogenic variants are expected to reduce work in biological validation, which is long and costly. One of the possible approaches to determine the most probable deleterious variants in individual exomes is to use protein function alteration prediction. We propose in this paper to use a machine learning approach, the random forest, to build a new meta-score based on five previously described scores (SIFT, Polyphen2, LRT, PhyloP and MutationTaster) compiled in the dbNSFP database. The functional meta-score was trained on a dataset of 61 500 non-synonymous Single Nucleotide Polymorphisms (SNPs). The random forest method (rfPred) appears to be globally better than each of the classifiers separately or in combination in a logistic regression model, and better than a newly described score (CADD) on independent validation sets. RfPred scores have been pre-calculated for all the possible non-synonymous SNPs of the human exome and are freely accessible at the web-server http://www.sbim.fr/rfPred/