TY  - JOUR
T1  - Multi-ethnic polygenic risk scores improve risk prediction in diverse populations
JF  - bioRxiv
DO  - 10.1101/051458
SP  - 051458
AU  - Carla Márquez-Luna
AU  - Po-Ru Loh
AU  - South Asian Type 2 Diabetes (SAT2D) Consortium
AU  - The SIGMA Type 2 Diabetes Consortium
AU  - Alkes L. Price
Y1  - 2017/01/01
UR  - http://biorxiv.org/content/early/2017/02/01/051458.abstract
N2  - Methods for genetic risk prediction have been widely investigated in recent years. However, most available training data involve European samples, and it is currently unclear how to accurately predict disease risk in other populations. Previous studies have used either large European training samples or small training samples from the target population, but not both. Here, we introduce a multi-ethnic polygenic risk score that combines training data from European samples and training data from the target population. We applied this approach to predict type 2 diabetes (T2D) in a Latino cohort using both publicly available European summary statistics from a large sample and Latino training data from a small sample. We attained a >70% relative improvement in prediction accuracy (from R²=0.027 to R²=0.047) compared to methods that use only one source of training data, consistent with large relative improvements in simulations. We observed a systematically lower load of T2D risk alleles in Latino individuals with more European ancestry, which could be explained by polygenic selection in ancestral European and/or Native American populations. Application of our approach to predict T2D in a South Asian UK Biobank cohort attained a >70% relative improvement in prediction accuracy, and application to predict height in an African UK Biobank cohort attained a 30% relative improvement. Our work reduces the gap in polygenic risk prediction accuracy between European and non-European target populations. Author Summary: The use of genetic information to predict disease risk is of great interest because of its potential clinical application. Prediction is performed via the construction of polygenic risk scores, which separate individuals into different risk categories. Polygenic risk scores can also be applied to improve our understanding of the genetic architecture of complex diseases. The ideal training data set would be a large cohort from the same population as the target sample, but this is generally unavailable for non-European populations. Thus, we propose a summary-statistics-based polygenic risk score that leverages both a large European training sample and a training sample from the same population as the target population. This approach produces a substantial relative improvement in prediction accuracy compared to methods that use a single training population when applied to predict type 2 diabetes in a Latino cohort, consistent with simulation results. We observed similar relative improvements in applications to predict type 2 diabetes in a South Asian cohort and height in an African cohort.
ER  - 