RT Journal Article
SR Electronic
T1 Calling genotypes from public RNA-sequencing data enables identification of genetic variants that affect gene-expression levels
JF bioRxiv
FD Cold Spring Harbor Laboratory
SP 007633
DO 10.1101/007633
A1 Patrick Deelen
A1 Daria V. Zhernakova
A1 Mark de Haan
A1 Marijke van der Sijde
A1 Marc Jan Bonder
A1 Juha Karjalainen
A1 K. Joeri van der Velde
A1 Kristin M. Abbott
A1 Jingyuan Fu
A1 Cisca Wijmenga
A1 Richard J. Sinke
A1 Morris A. Swertz
A1 Lude Franke
YR 2014
UL http://biorxiv.org/content/early/2014/08/01/007633.abstract
AB Given increasing numbers of RNA-seq samples in the public domain, we studied to what extent expression quantitative trait loci (eQTLs) and allele-specific expression (ASE) can be identified in public RNA-seq data while also deriving the genotypes from the RNA-seq reads. 4,978 human RNA-seq runs, representing many different tissues and cell-types, passed quality control. Even though these data originated from many different laboratories, samples reflecting the same cell-type clustered together, suggesting that technical biases due to different sequencing protocols were limited. We derived genotypes from the RNA-seq reads and imputed non-coding variants. In a joint analysis on 1,262 samples combined, we identified cis-eQTL effects for 8,034 unique genes. Additionally, we observed strong ASE effects for 34 rare pathogenic variants, corroborating previously observed effects on the corresponding protein levels.
Given the exponential growth of the number of publicly available RNA-seq samples, we expect this approach will become relevant for studying tissue-specific effects of rare pathogenic genetic variants.
Abbreviations: eQTL, expression quantitative trait locus; ASE, allele-specific expression; ENA, European Nucleotide Archive; MAF, minor allele frequency; RNA-seq, RNA-sequencing; PCA, principal component analysis; QC, quality control; LCL, lymphoblastoid cell-line; FDR, false discovery rate; GoNL, Genome of the Netherlands; GQ, Phred-scaled genotype quality; DR2, estimated dosage r² after imputation.