Abstract
During the past decade, genome-wide association studies (GWAS) have successfully identified tens of thousands of genetic variants associated with complex traits and diseases. These studies have produced vast repositories of genetic variation and trait measurements across millions of individuals, providing tremendous opportunities for further analyses. However, privacy concerns and other logistical considerations often limit access to individual-level genetic data, motivating the development of methods that analyze summary association statistics. Here we review recent progress on statistical methods that leverage summary association data to gain insights into the genetic basis of complex traits and diseases.
Glossary
- INDIVIDUAL-LEVEL DATA
- Genome-wide SNP genotypes and trait values for each individual included in a GWAS.
- SUMMARY ASSOCIATION STATISTICS
- Estimated effect sizes and their standard errors for each SNP analyzed in a GWAS.
- Z-SCORES
- Association statistics that follow a standard normal distribution under the null; often computed as per-allele effect sizes divided by their standard errors.
- META-ANALYSIS
- A method for combining data from different studies in which summary association statistics from each study are jointly analyzed.
- MEGA-ANALYSIS
- A method for combining data from different studies in which individual-level data from each study are merged and jointly analyzed.
- SUMMARY LD INFORMATION
- In-sample correlations between each pair of typed SNPs analyzed in a GWAS; can be restricted to proximal pairs of typed SNPs to limit the number of pairs.
- TRANSCRIPTOME-WIDE ASSOCIATION STUDY (TWAS)
- A study that evaluates the association between expression of each gene and a trait of interest; predicted expression may be used instead of measured expression to improve practicality.
- MENDELIAN RANDOMIZATION
- A method that uses significantly associated SNPs as instrumental variables to quantify causal relationships between two traits.
- BURDEN TEST
- A gene-based rare variant test in which all rare variants in a gene are assumed to have the same direction of effect.
- OVERDISPERSION TEST
- A gene-based rare variant test in which rare variants in a gene are assumed to affect the trait in either direction.
- POSTERIOR PROBABILITY OF CAUSALITY
- The inferred probability that a SNP is causal, based on association data and optional prior information.
- POLYGENIC RISK SCORE
- A method of predicting a trait by summing, over all markers whose association P-value in a training sample falls below a threshold, the marginal effect estimated in the training sample multiplied by the marker genotype in a validation sample.
- LD SCORE REGRESSION
- A method of assessing trait polygenicity by regressing χ² association statistics against LD scores, where the LD score of each SNP is computed as the sum of squared correlations of that SNP with all SNPs, including itself.
- PLEIOTROPY
- The existence of shared genetic variant(s) with nonzero causal effect sizes for two traits.
- GENETIC CORRELATION
- The signed correlation across SNPs between causal effect sizes for two traits.
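Several of the glossary definitions above describe simple computations on summary statistics. As a minimal sketch, they can be written in plain Python as follows. All function names are hypothetical and are not taken from any published software. The inverse-variance weighting used for meta-analysis is one common fixed-effects choice and is an assumption here, since the glossary does not specify a weighting scheme. The LD matrix is assumed to be supplied as a full list of rows of pairwise correlations.

```python
import math


def z_score(beta, se):
    """Z-score: per-allele effect size divided by its standard error."""
    return beta / se


def two_sided_p(z):
    """Two-sided P-value of a z-score under a standard normal null."""
    return math.erfc(abs(z) / math.sqrt(2.0))


def meta_analyze(betas, ses):
    """Combine per-study estimates for one SNP.

    Assumption: inverse-variance-weighted fixed-effects meta-analysis.
    Returns the combined effect size and its standard error.
    """
    weights = [1.0 / se ** 2 for se in ses]
    total = sum(weights)
    beta = sum(w * b for w, b in zip(weights, betas)) / total
    return beta, math.sqrt(1.0 / total)


def ld_scores(r):
    """LD score of each SNP: sum of squared correlations with all SNPs,
    including itself. `r` is a square correlation matrix (list of rows)."""
    return [sum(x * x for x in row) for row in r]


def polygenic_risk_score(genotypes, betas, pvalues, threshold):
    """Sum of (training effect size * validation genotype) over markers
    whose training-sample P-value is below `threshold`."""
    return sum(
        g * b
        for g, b, p in zip(genotypes, betas, pvalues)
        if p < threshold
    )


if __name__ == "__main__":
    # Toy numbers chosen for illustration only; they are not real data.
    beta, se = meta_analyze([0.10, 0.20], [0.05, 0.05])
    z = z_score(beta, se)
    print(beta, se, z, two_sided_p(z))
    print(ld_scores([[1.0, 0.5], [0.5, 1.0]]))
    print(polygenic_risk_score([0, 1, 2], [0.1, -0.2, 0.3],
                               [0.5, 1e-9, 1e-10], 5e-8))
```

With equal standard errors across studies, the inverse-variance weights are equal and the combined effect reduces to the simple average of the per-study effects, which gives a quick sanity check on the sketch.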
Copyright
The copyright holder for this preprint is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under a CC-BY-ND 4.0 International license.