RT Journal Article
SR Electronic
T1 Constraints on eQTL fine mapping in the presence of multi-site local regulation of gene expression
JF bioRxiv
FD Cold Spring Harbor Laboratory
SP 084293
DO 10.1101/084293
A1 Biao Zeng
A1 Luke R. Lloyd-Jones
A1 Alexander Holloway
A1 Urko M. Marigorta
A1 Andres Metspalu
A1 Grant W. Montgomery
A1 Tonu Esko
A1 Kenneth L. Brigham
A1 Arshed A. Quyyumi
A1 Youssef Idaghdour
A1 Jian Yang
A1 Peter M. Visscher
A1 Joseph E. Powell
A1 Greg Gibson
YR 2016
UL http://biorxiv.org/content/early/2016/10/29/084293.abstract
AB Expression QTL (eQTL) detection has emerged as an important tool for unravelling the relationship between genetic risk factors and disease or clinical phenotypes. Most studies use single marker linear regression to discover primary signals, followed by sequential conditional modeling to detect secondary genetic variants affecting gene expression. However, this approach assumes that functional variants are sparsely distributed and that close linkage between them has little impact on estimation of their precise location and magnitude of effects. In this study, we address the prevalence of secondary signals and bias in estimation of their effects by performing multi-site linear regression on two large human cohort peripheral blood gene expression datasets (each greater than 2,500 samples) with accompanying whole genome genotypes, namely the CAGE compendium of Illumina microarray studies, and the Framingham Heart Study Affymetrix data. Stepwise conditional modeling demonstrates that multiple eQTL signals are present for ~40% of over 3,500 eGenes in both datasets, and the number of loci with additional signals reduces by approximately two-thirds with each conditioning step. However, the concordance of specific signals between the two studies is only ~30%, indicating that the expression profiling platform is a large source of variance in effect estimation. Furthermore, a series of simulation studies imply that in the presence of multi-site regulation, up to 10% of the secondary signals could be artefacts of incomplete tagging, and at least 5% but up to one quarter of credible intervals may not even include the causal site, which is thus mis-localized. Joint multi-site effect estimation recalibrates effect size estimates by just a small amount on average. Presumably similar conclusions apply to most types of quantitative trait. Given the strong empirical evidence that gene expression is commonly regulated by more than one variant, we conclude that the fine-mapping of causal variants needs to be adjusted for multi-site influences, as conditional estimates can be highly biased by interference among linked sites.