Abstract
A test of association between a phenotype and a set of genes within a biological pathway can complement single-variant or single-gene association analysis and provide further insight into the genetic architecture of complex phenotypes. Although multiple methods exist for such gene-set analysis, most have low statistical power when only a small fraction of the genes are associated with the phenotype. Further, because existing methods cannot identify the genes likely to be driving an association signal, interpreting their results in terms of the underlying genetic mechanism is challenging. Here, we introduce Gene-set analysis Association Using Sparse Signals (GAUSS), a method for gene-set association analysis with GWAS summary statistics. In addition to providing a p-value for association, GAUSS identifies the subset of genes that has the maximal evidence of association and appears to drive it. Because it uses a pre-computed correlation structure among test statistics from a reference panel, the p-value calculation is substantially faster than other permutation- or simulation-based approaches. Our numerical experiments show that GAUSS can increase power over several existing methods while controlling type-I error under a variety of association models. Through the analysis of summary statistics from the UK Biobank data for 1,403 phenotypes, we show that GAUSS is scalable and can identify associations across many phenotypes and gene-sets.