Modeling Linkage Disequilibrium Increases Accuracy of Polygenic Risk Scores
Abstract
Polygenic risk scores have shown great promise in predicting complex disease risk, and will become more accurate as training sample sizes increase. The standard approach for calculating risk scores involves LD-pruning markers and applying a P-value threshold to association statistics, but this discards information and may reduce predictive accuracy. We introduce a new method, LDpred, which infers the posterior mean causal effect size of each marker using a prior on effect sizes and LD information from an external reference panel. Theory and simulations show that LDpred outperforms the pruning/thresholding approach, particularly at large sample sizes. Accordingly, prediction R² increased from 20.1% to 25.3% in a large schizophrenia data set and from 9.8% to 12.0% in a large multiple sclerosis data set. A similar relative improvement in accuracy was observed for three additional large disease data sets and when predicting in non-European schizophrenia samples. The advantage of LDpred over existing methods will grow as sample sizes increase.
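To make the baseline concrete, the pruning/thresholding approach described above can be sketched roughly as follows. This is a minimal illustration and not the authors' implementation: the function names `prune_and_threshold` and `risk_score`, the default cutoffs, and the choice to visit markers in order of significance (so that the most significant marker in each LD cluster is the one kept) are all assumptions made for the example. It shows where information is discarded: every marker that fails the P-value threshold or is pruned for LD receives a weight of zero.

```python
import numpy as np

def prune_and_threshold(beta_hat, p_values, ld_r2, p_cutoff=0.05, r2_cutoff=0.2):
    """Hypothetical helper: zero out markers that fail the P-value threshold
    or that are in high LD with a more significant marker already kept."""
    beta_hat = np.asarray(beta_hat, dtype=float)
    p_values = np.asarray(p_values, dtype=float)
    ld_r2 = np.asarray(ld_r2, dtype=float)

    kept = []
    # Assumption: visit markers from most to least significant.
    for j in np.argsort(p_values, kind="stable"):
        if p_values[j] >= p_cutoff:
            break  # every remaining marker also fails the threshold
        # Keep the marker only if it is not in high LD with any kept marker.
        if all(ld_r2[j, k] < r2_cutoff for k in kept):
            kept.append(int(j))

    kept_idx = np.array(kept, dtype=int)
    weights = np.zeros_like(beta_hat)
    weights[kept_idx] = beta_hat[kept_idx]
    return weights

def risk_score(genotypes, weights):
    """Score each individual as the weighted sum of allele counts."""
    return np.asarray(genotypes, dtype=float) @ weights

# Toy data (made up for illustration): markers 0 and 1 are in strong LD.
beta_hat = [0.5, 0.4, -0.3, 0.1]
p_values = [1e-8, 1e-6, 1e-3, 0.5]
ld_r2 = np.eye(4)
ld_r2[0, 1] = ld_r2[1, 0] = 0.9
genotypes = [[2, 1, 0, 1], [0, 2, 1, 0]]

weights = prune_and_threshold(beta_hat, p_values, ld_r2)
scores = risk_score(genotypes, weights)
```

In this toy example marker 1 is dropped because of its LD with marker 0, and marker 3 is dropped by the threshold, so their estimated effects contribute nothing to the score. LDpred, by contrast, is described as retaining markers and reweighting them by their posterior mean effect given the LD structure.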