Dissecting the genetics of complex traits using summary association statistics

Bogdan Pasaniuc; Alkes L. Price

doi:10.1101/072934

Abstract

During the past decade, genome-wide association studies (GWAS) have successfully identified tens of thousands of genetic variants associated with complex traits and diseases. These studies have produced vast repositories of genetic variation and trait measurements across millions of individuals, providing tremendous opportunities for further analyses. However, privacy concerns and other logistical considerations often limit access to individual-level genetic data, motivating the development of methods that analyze summary association statistics. Here we review recent progress on statistical methods that leverage summary association data to gain insights into the genetic basis of complex traits and diseases.

Introduction

Genome-wide association studies (GWAS) have been broadly successful in identifying genetic variants associated with complex traits and diseases, explaining a significant fraction of narrow-sense heritability and occasionally pinpointing biological mechanisms¹. These studies have produced vast databases of genetic variation (typically at the level of common single nucleotide polymorphisms (SNPs) included on genotyping arrays) in millions of individuals across hundreds of complex traits. Further analyses of this data can yield important insights into the genetics of complex traits, but privacy concerns and other logistical considerations often restrict access to individual-level data. On the other hand, summary association statistics, defined here as per-allele SNP effect sizes (log odds ratios for case-control traits) together with their standard errors, are often readily available and can be used to compute z-scores (per-allele effect sizes divided by their standard errors; see Figure 1); we note that in some applications, allele frequencies may also be required. A partial list of publicly available summary association statistics from large GWAS is provided in Table 1. Summary statistics also offer advantages in computational cost, which does not scale with the number of individuals in the study. These advantages have motivated the recent development of many new methods for analyzing summary association data, often in conjunction with linkage disequilibrium (LD) information from a population reference panel such as 1000 Genomes².

Figure 1 Illustration of summary association statistics.

Per-allele SNP effect sizes (and their standard errors) are typically estimated by regressing the phenotype on the genotype values at the SNP of interest (top). At large sample sizes, the vector of z-scores (effect sizes divided by their standard errors) at a locus is approximated by a multivariate normal distribution with mean 0 and variance equal to the LD matrix V (bottom).
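
As a minimal sketch of the quantities in Figure 1, the snippet below converts a per-allele effect size and its standard error into a z-score and a two-sided p-value, assuming the large-sample normal approximation. The function name and the example values are illustrative assumptions and are not taken from any specific GWAS.

```python
import math

def z_and_p(beta, se):
    """Return (z-score, two-sided p-value) for one SNP.

    Assumes the large-sample normal approximation for beta / se.
    """
    if se <= 0:
        raise ValueError("standard error must be positive")
    z = beta / se
    # Two-sided tail probability of a standard normal: P(|Z| >= |z|)
    p = math.erfc(abs(z) / math.sqrt(2.0))
    return z, p

# Hypothetical SNP: log odds ratio 0.10 with standard error 0.05
z, p = z_and_p(0.10, 0.05)
```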

Table 1. Publicly available summary association statistics.

We provide a partial list of publicly available summary statistics from GWAS with sample size at least 20,000. *: includes specialty chip data; not suitable for analysis using LD score regression and its extensions.

Here, we review these summary statistic-based methods. First, we review methods for performing single-variant association tests, including meta-analysis, conditional association and imputation using summary statistics. Second, we review methods for performing gene-based association tests by incorporating transcriptome reference data or aggregating signals across multiple rare variants. Third, we review methods for fine-mapping causal variants, including integration of functional annotation and/or trans-ethnic data. Fourth, we review methods for constructing polygenic predictions of disease risk and inferring polygenic architectures. Finally, we review methods for jointly analyzing multiple traits. We conclude with a discussion of research areas where further work on summary statistic based methods is needed.

Single-variant association tests

Meta-analysis using fixed-effects or random-effects models

Large consortia often combine multiple GWAS into a single aggregate analysis to boost power for discovering SNP associations of small effect. Studies are combined either by jointly analyzing summary association results from each study (meta-analysis) or by re-analyzing individual-level data across all studies (mega-analysis)³. It has been shown that meta-analysis attains power for association similar to that of mega-analysis, with fewer privacy constraints and logistical challenges (since only summary association data is shared across studies)⁴. Meta-analysis is usually performed using fixed-effects approaches, which assume that true effect sizes are the same across studies. If true effect sizes are expected to differ across studies, this heterogeneity can be explicitly modeled using random-effects methods, which include an extra variance term in the model to account for heterogeneity. Traditional random-effects methods allow for heterogeneity under the null model, leading to low power even when heterogeneity is present. This motivated the development of a random-effects method based on a null model of no heterogeneity, which increases power over traditional random-effects methods⁵. Under this framework, a statistical test against a null model of no heterogeneity can be viewed as a summation of a fixed-effect component and a heterogeneity component, thus connecting fixed-effects and random-effects meta-analysis⁵. Subsequent work has introduced the concept of a posterior probability that each study has a non-zero effect, aiding interpretation and power when only a subset of studies has a non-zero effect⁶.
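
The fixed-effects and traditional random-effects estimators described above can be sketched for a single SNP as follows. The sketch uses inverse-variance weighting and, as a stand-in for the traditional random-effects approach, the DerSimonian-Laird moment estimator of the between-study variance; it does not implement the methods of refs. ⁵ and ⁶. Function names and example values are assumptions made for illustration.

```python
import math

def fixed_effects(betas, ses):
    """Inverse-variance-weighted fixed-effects estimate for one SNP."""
    w = [1.0 / s ** 2 for s in ses]
    beta = sum(wi * b for wi, b in zip(w, betas)) / sum(w)
    se = math.sqrt(1.0 / sum(w))
    return beta, se

def random_effects_dl(betas, ses):
    """Traditional random-effects estimate (DerSimonian-Laird tau^2)."""
    w = [1.0 / s ** 2 for s in ses]
    beta_fe, _ = fixed_effects(betas, ses)
    # Cochran's Q: dispersion of study effects around the fixed-effects mean
    q = sum(wi * (b - beta_fe) ** 2 for wi, b in zip(w, betas))
    k = len(betas)
    denom = sum(w) - sum(wi ** 2 for wi in w) / sum(w)
    tau2 = max(0.0, (q - (k - 1)) / denom) if denom > 0 else 0.0
    # The extra variance term tau2 is added to each study's sampling variance
    w_re = [1.0 / (s ** 2 + tau2) for s in ses]
    beta = sum(wi * b for wi, b in zip(w_re, betas)) / sum(w_re)
    se = math.sqrt(1.0 / sum(w_re))
    return beta, se, tau2

# Two hypothetical studies with discordant effect-size estimates
fe = fixed_effects([0.10, 0.20], [0.05, 0.05])
re = random_effects_dl([0.10, 0.20], [0.05, 0.05])
```

With heterogeneous inputs such as these, the random-effects standard error is larger than the fixed-effects standard error, which illustrates the loss of power noted above for traditional random-effects methods.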

Conditional association using LD reference data

Conditional association, in which the association between SNP and trait is evaluated after conditioning on the top SNP at a locus, can be used to identify multiple signals of association at a previously identified GWAS locus. Conditional association methods have traditionally required individual-level data in order to jointly fit multiple SNPs. Recent work has shown that conditional and joint association analysis of multiple SNPs can be approximated using only summary association statistics together with linkage disequilibrium (LD) information estimated from a population reference panel such as 1000 Genomes (see Box 1)⁷. This has enabled the discovery of new secondary associations at known loci for height, BMI, and other complex traits and diseases, increasing the variance explained by GWAS associations for these traits^8-10; for example, in a recent height GWAS, approximate conditional analysis using summary data identified 697 genome-wide significant SNPs, including 34 SNPs with r²>0.1 to a more significant SNP at the same locus⁸.
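
For intuition, conditioning one SNP on the top SNP at a locus can be approximated from the two marginal z-scores and their LD correlation r. The sketch below assumes standardized genotypes, small effect sizes, and a reference-panel r that matches the GWAS sample. It is a simplified two-SNP illustration, not the full joint-fitting procedure of ref. ⁷, and the function name is hypothetical.

```python
import math

def conditional_z(z_target, z_top, r):
    """Approximate z-score of a target SNP after conditioning on the top SNP.

    r is the LD correlation between the two SNPs (e.g. from a reference panel).
    """
    if abs(r) >= 1.0:
        raise ValueError("SNPs in perfect LD cannot be distinguished")
    # Remove the part of the target signal explained by the top SNP,
    # then rescale by the residual standard deviation.
    return (z_target - r * z_top) / math.sqrt(1.0 - r * r)

# A SNP whose signal is fully explained by tagging the top SNP (r = 0.8)
tagged = conditional_z(4.8, 6.0, 0.8)
# A SNP in weak LD with the top SNP retains most of its signal (r = 0.2)
secondary = conditional_z(5.0, 6.0, 0.2)
```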

Imputation using summary association statistics

A standard approach to boost association power in GWAS is to leverage LD information from a population reference panel to impute genotypes at variants not typed in the study¹¹. Imputation is traditionally performed using individual-level data, which requires substantial computational resources and can be logistically cumbersome when new reference panels become available, particularly for large consortia combining data from multiple studies. As an alternative to imputation using individual-level data, approaches have been developed to perform imputation directly at the level of summary statistics^12-18. The key insight of these approaches is that LD induces correlations between z-scores, which can be modeled using a multivariate normal (MVN) distribution with variance equal to the LD correlation matrix¹⁹. Thus, z-scores at untyped SNPs can be imputed from observations at typed SNPs using conditional means and variances of the MVN distribution. Imputation using summary statistics recovers >80% of the information from imputation using individual-level data at common variants^14-16, and is practical and efficient since the imputed summary statistics are linear combinations of the observed statistics (see Box 1). However, imputation using summary statistics cannot capture non-linear relationships between SNPs, which are modeled using haplotypes in imputation from individual-level data.
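
The conditional-mean calculation described above can be sketched with NumPy as follows. The imputed z-scores are linear combinations of the typed z-scores, with weights given by the LD between untyped and typed SNPs times the inverse of the LD among typed SNPs. The ridge term added to the diagonal is an assumption of this sketch (one simple way to stabilize a noisy reference-panel LD matrix), and the function and argument names are hypothetical.

```python
import numpy as np

def impute_z(z_typed, ld_tt, ld_ut, ridge=0.1):
    """Impute z-scores at untyped SNPs from typed SNPs under the MVN model.

    z_typed : (t,) observed z-scores at typed SNPs
    ld_tt   : (t, t) LD correlation matrix among typed SNPs
    ld_ut   : (u, t) LD correlations between untyped and typed SNPs
    ridge   : diagonal regularization (assumed value, not from the source)
    """
    z_typed = np.asarray(z_typed, dtype=float)
    ld_tt = np.asarray(ld_tt, dtype=float)
    ld_ut = np.asarray(ld_ut, dtype=float)
    reg = ld_tt + ridge * np.eye(ld_tt.shape[0])
    # weights = ld_ut @ inv(reg); reg is symmetric, so solve on the transpose
    weights = np.linalg.solve(reg, ld_ut.T).T
    z_imputed = weights @ z_typed
    # Null variance of each imputed statistic: diag(W V_tt W^T)
    var_imputed = np.einsum("ij,jk,ik->i", weights, ld_tt, weights)
    return z_imputed, var_imputed

# One typed SNP (z = 4) and one untyped SNP with r = 0.5, no regularization
z_imp, var_imp = impute_z([4.0], [[1.0]], [[0.5]], ridge=0.0)
```

Because the null variance of an imputed statistic is below 1 whenever tagging is imperfect, imputed statistics are typically rescaled by the square root of this variance before being interpreted as z-scores.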

Conditional association and imputation using summary statistics critically rely on accurate LD information from a population reference panel. Even in the best case where the reference population closely matches the GWAS population, the relatively small reference panel size (typically hundreds or at most thousands of individuals) makes accurate estimation of a large number of LD parameters a challenge. This motivates regularization of the estimated LD matrix, both to maximize accuracy and to ensure robustness in the case of imputation using summary statistics, as mis-estimation of the variance of imputed statistics can lead to false-positive associations. A simple approach to regularization is to set all correlations between distal SNPs to zero, based on a fixed distance threshold⁷ or approximately independent LD blocks inferred from the data²⁰. An alternative is to specify a prior distribution and compute Bayesian posteriors¹²; data can be combined across multiple ancestry reference panels to further boost accuracy^17,18. Singular value decomposition based approaches have also been proposed in other contexts¹⁰. In general, the accuracy of conditional association and imputation using summary statistics is reduced at low-frequency variants and when the LD structure between typed and imputed SNPs is mis-specified (e.g., when the ancestry of the GWAS sample does not exactly match the reference panel). We note that concerns about false-positive associations in imputation using summary statistics can be avoided entirely via the release of in-sample summary LD information, i.e. pairwise correlations between all typed SNPs.

Gene-based association tests

Gene-based association using transcriptome reference data

GWAS risk variants are significantly enriched for genetic variants that impact gene expression (eQTLs)²¹. This motivates the paradigm of transcriptome-wide association studies (TWAS), which evaluate the association between the expression of each gene and a complex trait of interest. Due to the limited availability of very large samples with measured gene expression and trait values, initial TWAS approaches integrated eQTL and GWAS data to identify susceptibility genes either via matching the association signals^22-24, via mediation analyses²⁵, or via assessing whether the same causal variant impacts both gene expression and trait under a single causal variant model^26-28.

More recent studies have leveraged predicted expression to improve the power of TWAS. Under this paradigm, transcriptome reference data is used to predict gene expression in the GWAS data set (using cis SNPs, e.g. within 1Mb of the transcription start site), followed by a test for association between predicted expression and trait. Although originally proposed using individual-level data²⁹, TWAS using predicted expression can also be performed using only summary association statistics and summary LD information^30,31. The key intuition is that the correlation between a weighted linear combination of SNPs (i.e. predicted gene expression) and trait is equivalent to a weighted linear combination of correlations between SNPs and trait (i.e. summary association statistics from GWAS) (see Figure 2). Since TWAS using predicted expression is conceptually similar to a test for non-zero genetic covariance between gene expression and trait³⁰, it can also be performed via a two-sample Mendelian randomization from summary statistics³¹. TWAS using predicted expression can increase power over a standard GWAS when there exist multiple causal variants whose effect on trait is mediated through expression. TWAS also reduces the multiple hypothesis burden by testing tens of thousands of genes instead of millions of SNPs. TWAS using predicted expression typically uses individual-level transcriptome reference data to predict gene expression, but can also be performed using only summary association statistics between SNPs and gene expression, albeit with a reduction in power³⁰. The potential power gains of TWAS are underscored by the recent identification of 71 new susceptibility genes across 28 complex traits, of which 17 have no GWAS association within 1 Mb³². However, TWAS is underpowered compared to standard GWAS when the true biological mechanism is independent of gene expression or when expression data in the most relevant tissue is not available.
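
The key intuition above can be sketched as follows: given expression-prediction weights for the cis SNPs of a gene, the gene-trait z-score is the weighted sum of GWAS z-scores divided by its standard deviation under the null, which depends on LD. The sketch assumes standardized genotypes and an LD matrix that matches the GWAS sample; the function and variable names are hypothetical.

```python
import numpy as np

def twas_z(weights, z_gwas, ld):
    """Gene-trait association z-score from summary data.

    weights : (m,) expression-prediction weights for the cis SNPs
    z_gwas  : (m,) GWAS z-scores at the same SNPs
    ld      : (m, m) LD correlation matrix of those SNPs
    """
    w = np.asarray(weights, dtype=float)
    z = np.asarray(z_gwas, dtype=float)
    v = np.asarray(ld, dtype=float)
    # Under the null, w @ z has variance w' V w
    null_var = w @ v @ w
    if null_var <= 0:
        raise ValueError("weights give zero predicted-expression variance")
    return float(w @ z / np.sqrt(null_var))

# Two uncorrelated SNPs that both contribute to predicted expression
gene_z = twas_z([1.0, 1.0], [2.0, 2.0], np.eye(2))
```

In this toy example neither SNP is individually striking (z = 2), but the combined gene-level statistic is larger, which illustrates how aggregating multiple expression-mediated signals can increase power.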

Figure 2 TWAS using predicted expression and summary data.

TWAS using predicted expression and summary data follows two steps. First, transcriptome reference data is used to build a linear predictor for gene expression, typically using SNPs from the 1Mb local region around the gene with regularized effect sizes (e.g. using BSLMM⁷⁸). Second, this predictor is applied to summary GWAS z-scores and gene-trait association z-scores are computed, testing the null model of no association between gene and trait.

Rare variant association tests

Although most GWAS of complex traits and diseases have focused on common variants that are typed on genotyping arrays or imputed from population reference panels, rare variant associations may also provide a rich source of biological insights, particularly for traits under strong negative selection^33,34. Because association tests of individual rare variants are likely to be underpowered, rare variant association tests generally aggregate evidence for association across multiple rare variants at a locus. In exome sequencing studies (or exome array studies), rare variants are aggregated at the gene level, making the gene the unit of association. This can be done either using burden tests, which assume that all rare variants in a candidate gene have the same direction of effect, or using overdispersion tests, which assume that rare variants in a candidate gene can impact a complex trait in either direction; hybrid omnibus tests are also possible³⁵. Recent studies have shown that both burden tests and overdispersion tests can be performed using only summary association statistics from each rare variant, together with summary LD information^36-38 (see Box 2). Roughly, burden tests are computed as weighted sums of single-variant z-scores and overdispersion tests are computed as weighted sums of squared single-variant z-scores (analogous to previous work on common variant overdispersion tests using summary statistics³⁹), with summary LD information used to specify appropriate null distributions in each case. However, a key limitation is that these studies require the use of in-sample summary LD information in preference to reference LD information to ensure appropriate null distributions and avoid false-positive associations. Thus, in contrast to summary statistic based methods for common variants (see above), both summary association statistics and in-sample summary LD information are required in order for these methods to be useful (see Discussion). 
An additional limitation is that, for case-control traits, asymptotic null distributions may not be valid when variant counts or case or control sample sizes are small, necessitating careful scrutiny of quantile-quantile plots.
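
The two families of tests can be sketched from single-variant z-scores and in-sample LD as follows. The burden statistic is a weighted sum of z-scores scaled to unit null variance. The overdispersion statistic is a weighted sum of squared z-scores whose null distribution is a mixture of 1-degree-of-freedom chi-square variables, with mixture coefficients given by the eigenvalues of the weighted LD matrix. The sketch stops short of computing the mixture p-value, and all names are hypothetical.

```python
import numpy as np

def rare_variant_stats(z, ld, weights):
    """Burden and overdispersion statistics for one gene.

    z       : (m,) single-variant z-scores
    ld      : (m, m) in-sample LD correlation matrix
    weights : (m,) per-variant weights (e.g. based on allele frequency)
    """
    z = np.asarray(z, dtype=float)
    v = np.asarray(ld, dtype=float)
    w = np.asarray(weights, dtype=float)
    # Burden: weighted sum of z-scores, standardized under the null
    burden_z = float(w @ z / np.sqrt(w @ v @ w))
    # Overdispersion: weighted sum of squared z-scores
    q = float(np.sum((w * z) ** 2))
    # Under the null, q ~ sum_k lambda_k * chi2_1
    lambdas = np.linalg.eigvalsh(np.diag(w) @ v @ np.diag(w))
    return burden_z, q, lambdas

# Two independent variants with opposite directions of effect
burden_z, q, lambdas = rare_variant_stats([2.0, -2.0], np.eye(2), [1.0, 1.0])
```

In this toy example the burden statistic is zero because the effects cancel, while the overdispersion statistic is well above its null mean (the sum of the eigenvalues), matching the distinction between the two tests drawn above.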

Fine-mapping

Fine-mapping using posterior probabilities of causality

Statistical fine-mapping aims to identify the causal variant(s) that are driving a GWAS association signal, enabling functional experiments to validate biological function. A straightforward approach to fine-mapping is to prioritize variants based on the strength of the marginal association statistics (i.e. ranking p-values)⁴⁰. This is an effective strategy in the case of a single causal variant, but can be suboptimal when multiple causal variants are present, as the SNP with the top p-value at the locus may be tagging multiple causal variants. An alternative is to compute the posterior probabilities of causality for every SNP in the region, based on the likelihoods of the observed z-scores conditional on each possible set of causal variant(s)⁴¹. These posterior probabilities can be used to construct a credible set of SNPs, defined as the smallest set of SNPs that contains the true causal variant(s) with a given probability (typically 90% or 99%). Initial studies approximated the posterior probabilities of causality under a single causal variant assumption. Under this assumption, posterior probabilities of causality can be estimated from z-scores without the need for LD information⁴²; this approach is both practical and computationally efficient. More recent studies have computed posterior probabilities of causality under a multiple causal variant assumption⁴³. As in the case of imputation using summary statistics, the likelihoods of the observed z-scores can be computed based on the multivariate normal (MVN) distribution with variance equal to the LD correlation matrix, with LD estimated from population reference panels using regularization techniques. Unlike imputation using summary statistics, which uses the null model of no association (i.e. a mean of 0 in the MVN), in fine-mapping the mean is a function of causal effect sizes, which can be heuristically approximated or integrated out using conjugate priors^43,44. 
These methods often restrict computations to a maximum number of causal variants (e.g. 3 or 6); more recent studies have shown that further speed-ups can be achieved through matrix factorizations⁴⁵ or stochastic search⁴⁶. Methods that model multiple causal variants generally improve the accuracy (and calibration) of credible sets at loci with multiple causal variants^43-47, with very limited decreases in accuracy at loci with only a single causal variant^43-49. A less accurate alternative is to use conditional association analysis to detect multiple signals of associations^7,50,51, followed by estimation of posterior probabilities of causality under a single causal variant assumption for each independent signal. In this case, special care is required in specifying the boundaries of each independent signal and the threshold for the conditional test.
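
Under the single causal variant assumption, posterior probabilities and credible sets can be sketched directly from z-scores. The sketch below assumes a flat prior over SNPs and a normal prior on the causal effect expressed in z-score units; the value of `prior_var` is an arbitrary illustrative choice, and treating it as identical across SNPs is a simplification. It does not implement the multiple causal variant methods discussed above, and all names are hypothetical.

```python
import math

def single_causal_posteriors(z_scores, prior_var=25.0):
    """Posterior probability that each SNP is the single causal variant."""
    shrink = prior_var / (1.0 + prior_var)
    # Log Bayes factor of N(0, 1 + prior_var) versus N(0, 1) at each z-score
    log_bf = [0.5 * z * z * shrink - 0.5 * math.log1p(prior_var)
              for z in z_scores]
    top = max(log_bf)  # subtract the maximum for numerical stability
    unnorm = [math.exp(v - top) for v in log_bf]
    total = sum(unnorm)
    return [u / total for u in unnorm]

def credible_set(posteriors, level=0.90):
    """Smallest set of SNP indices whose posteriors sum to at least level."""
    order = sorted(range(len(posteriors)),
                   key=lambda j: posteriors[j], reverse=True)
    chosen, cumulative = [], 0.0
    for j in order:
        chosen.append(j)
        cumulative += posteriors[j]
        if cumulative >= level:
            break
    return chosen

# Hypothetical locus: two SNPs with identical z-scores are indistinguishable
post = single_causal_posteriors([3.0, 3.0, 0.0])
cset = credible_set(post, level=0.90)
```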

Leveraging functional annotation data

Fine-mapping accuracy can be improved by integrating functional annotation data such as predicted regulatory elements from the ENCODE and ROADMAP Epigenomics projects^52,53. This approach is motivated by early studies showing that disease-associated variants are systematically enriched in chromatin marks that delineate active regulatory regions in disease-relevant cell types^54,55. Under this paradigm, a statistical model is developed to jointly estimate functional enrichment and update posterior probabilities of causality using functional annotations^44,49,56,57. Some integrative methods assume that SNPs are unlinked⁵⁷ or assume a single causal variant per locus^49,56, but a recent study built upon the multiple causal variant model of ref. ⁴³ to incorporate functional annotation data⁴⁴. In an analysis of rheumatoid arthritis summary association data, integrative fine-mapping using this approach reduced the average size of 90% credible sets by 10%⁵⁸. In addition to increasing fine-mapping accuracy, these studies have also provided insights into polygenic architectures (see below) by identifying tissue-specific functional annotations that are enriched for causal disease signals. This can also be achieved by conducting fine-mapping without integrating functional annotation data (typically under a single causal variant assumption) and then overlapping the resulting credible sets with functional annotation data to assess enrichment^59-61. Future integrative methods could increase fine-mapping resolution by integrating probabilistic functional annotations (e.g., ChIP-seq peak intensity) or modeling the strength of association between SNPs and chromatin marks in population-based studies^62,63.

Trans-ethnic fine-mapping

Fine-mapping accuracy can also be improved by leveraging differences in LD patterns across continental populations that have arisen due to differences in demographic events such as population bottlenecks (see Figure 3)^64-67. Intuitively, the set of tag SNPs linked to a causal variant will vary across populations, so that aggregating evidence of association across populations will dilute signals from tag SNPs and strengthen signals from causal variants. A standard approach to combining information across multiple studies is to compute posterior probabilities of causality from fixed-effects meta-analysis results^64,66,68,69. Alternatively, posterior probabilities can be computed from results of random-effects trans-ethnic meta-analysis methods^61,65. These approaches assume a single causal variant and thus do not require LD information from the underlying populations. More recent studies have introduced hierarchical probabilistic models that allow for multiple causal variants while incorporating LD information from population reference panels⁵⁸. These studies assume that causal variants are shared across populations but allow for heterogeneity in effect sizes across populations, and can also incorporate functional annotation data to further increase fine-mapping accuracy⁵⁸. In an analysis of rheumatoid arthritis summary association data in Europeans and Asians (see above), trans-ethnic fine-mapping reduced the average size of 90% credible sets by 25%, and by 32% when also integrating functional annotation data⁵⁸.

Figure 3 Leveraging functional annotation and trans-ethnic data to improve fine-mapping.

A sample locus with simulated fine-mapping data in Europeans and Africans is displayed. The top panel shows the 99% credible set (denoted in red) produced by leveraging functional annotation data (DNase I Hypersensitivity Sites, DHS) in trans-ethnic fine-mapping. The middle and bottom panels show the −log₁₀ p-values (left) and LD (right) in Europeans and Africans.

Polygenicity of complex traits

Polygenic risk prediction

Although the main focus of complex disease genetics is to gain insights about disease biology, genetics can also be leveraged to build predictions of disease risk, which may become clinically useful as sample sizes increase^70,71. A landmark study of schizophrenia showed that polygenic risk scores, constructed by summing the predicted effects of all markers below a P-value threshold in the training sample, produced predictions of schizophrenia risk in validation samples that were significantly better than random, and far more accurate than those based on the single genome-wide significant locus identified in the study⁷². This provided an early demonstration of the advantages of incorporating markers that do not attain genome-wide significance into polygenic risk scores to improve prediction accuracy for polygenic traits. One complexity of polygenic risk scores is that of LD between markers, which has historically been addressed by LD-pruning—either without regard to P-values⁷², or via informed LD-pruning⁷³ (clumping) that preferentially retains markers with more significant P-values. More recent work has shown that explicitly modeling LD using an LD reference panel and estimating posterior mean causal effect sizes can improve prediction accuracy from summary statistics⁷⁴. An alternative to summary statistic based methods is to fit effect sizes of all markers simultaneously using Best Linear Unbiased Prediction (BLUP) methods and their extensions^75-77, which require individual-level training data. Fitting all markers simultaneously is theoretically more appropriate and can produce more accurate predictions, although the relative advantage is small when overall prediction accuracies are modest (Box 3). 
In their simplest form, polygenic risk scores and BLUP methods assume infinitesimal (Gaussian) architectures in which all markers are causal, but these methods have been extended to increase prediction accuracy in the case of non-infinitesimal architectures; this has been accomplished for polygenic risk scores via restricting to markers below a P-value threshold⁷² or estimating posterior mean causal effect sizes under a point-normal prior⁷⁴, and for BLUP methods by estimating (joint-fit) posterior mean causal effect sizes under a normal mixture prior^78,79. Although polygenic risk scores must await even larger training sample sizes to attain clinical utility, appreciable prediction accuracies have been achieved for some traits, including a Nagelkerke R² of 0.25 (AUC: 75%) for schizophrenia⁷⁴. An important caveat is that it is critical when constructing and evaluating polygenic risk scores to avoid non-independence of training and validation samples (e.g. due to cryptic relatedness or shared population stratification), which could cause prediction accuracy to be overstated relative to what could be achieved in an independent validation sample^74,80.
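
The P-value thresholding form of a polygenic risk score can be sketched for one validation individual as follows. The sketch assumes that markers have already been LD-pruned or clumped and that genotypes are coded as counts of the effect allele; it does not model LD or estimate posterior mean effect sizes. All names and values are hypothetical.

```python
def polygenic_risk_score(genotypes, betas, p_values, p_threshold=0.05):
    """Sum of training-sample effect sizes times allele counts.

    Only markers with training P-value below p_threshold contribute.
    genotypes : effect-allele counts (0, 1 or 2) for one individual
    betas     : per-allele effect sizes from the training GWAS
    p_values  : training-sample association P-values
    """
    if not (len(genotypes) == len(betas) == len(p_values)):
        raise ValueError("inputs must have the same length")
    return sum(g * b
               for g, b, p in zip(genotypes, betas, p_values)
               if p < p_threshold)

# Hypothetical individual and training summary statistics for three markers
genotypes = [0, 1, 2]
betas = [0.1, -0.2, 0.3]
p_values = [0.01, 0.5, 0.001]
strict = polygenic_risk_score(genotypes, betas, p_values, p_threshold=0.05)
lenient = polygenic_risk_score(genotypes, betas, p_values, p_threshold=1.0)
```

Evaluating prediction accuracy across a grid of thresholds in an independent validation sample is how the threshold is usually chosen, subject to the caveat above about non-independence of training and validation samples.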

Inferring polygenic architectures

It is increasingly clear that most complex traits and diseases have highly polygenic architectures, with a large number of causal variants of small effect. In order to understand these polygenic architectures, it is of interest to infer parameters such as the heritability explained by SNPs and the number of causal variants. Both of these quantities have been estimated using accuracies of polygenic risk scores (see above), as a function of the P-value threshold used to constrain the set of markers employed^72,73. Computing polygenic risk scores requires individual-level data in the validation cohort, implying that these methods are not strictly summary statistic based. Recent work has shown that the information in polygenic risk scores can be derived from summary-level data in the training and validation cohorts to estimate the heritability explained by SNPs and the number of causal variants⁸¹; a limitation of this approach is that SNPs are assumed to be uncorrelated, which can be approximately achieved by LD-pruning but precludes analyses of dense marker panels. The heritability explained by SNPs can alternatively be estimated from the slope of LD score regression⁸², in which χ² statistics for each SNP are regressed against LD scores (sum of squared correlations with all SNPs), leveraging the fact that SNPs with higher LD scores are expected to contain more polygenic signal⁸³. This approach explicitly allows for LD between SNPs and can distinguish between polygenicity and confounding, but makes strong assumptions about effect sizes of rare variants and thus currently only produces robust estimates for common variants. Another recent method models LD while treating SNP effects as fixed rather than random (similar to ref. ⁸¹), enabling estimation of heritability explained by common SNPs in local regions as well as genome-wide¹⁰. 
Overall, summary statistic based methods provide a useful alternative to methods for estimating heritability explained by SNPs from individual-level data using restricted maximum likelihood (REML) and its extensions^84,85.
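
The LD score regression estimator can be sketched as follows. Under the model of ref. ⁸², the expected χ² statistic of SNP j is approximately N·h²·ℓ_j/M plus an intercept, where N is the sample size, M the number of SNPs, and ℓ_j the LD score, so the slope of the regression yields the SNP-heritability. The sketch uses unweighted least squares; regression weights and jackknife standard errors, which a practical analysis would need, are omitted. All names and values are hypothetical.

```python
import numpy as np

def ldsc_h2(chi2, ld_scores, n_samples, n_snps):
    """Estimate SNP-heritability from the slope of chi2 on LD score.

    Returns (h2, intercept). An intercept above 1 suggests confounding
    rather than polygenicity.
    """
    chi2 = np.asarray(chi2, dtype=float)
    ld_scores = np.asarray(ld_scores, dtype=float)
    # np.polyfit returns coefficients from highest degree: [slope, intercept]
    slope, intercept = np.polyfit(ld_scores, chi2, 1)
    h2 = slope * n_snps / n_samples
    return float(h2), float(intercept)

# Noise-free synthetic data generated with h2 = 0.4 and no confounding
n_samples, n_snps = 50_000, 1_000_000
ld_scores = np.array([10.0, 50.0, 100.0, 200.0])
chi2 = n_samples * 0.4 * ld_scores / n_snps + 1.0
h2, intercept = ldsc_h2(chi2, ld_scores, n_samples, n_snps)
```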

The increasingly available functional annotation data (see above) can also be used to identify functional annotations that are enriched for polygenic signals of disease heritability. A recent study accomplished this using a Bayesian hierarchical model that splits the genome into blocks and incorporates both coarse-scale functional annotations at the level of blocks and fine-scale functional annotations at the level of SNPs⁵⁶. This was the first study to quantify polygenic enrichments for cell-type-specific chromatin marks and DNase I hypersensitivity sites (DHS) across a broad set of complex traits and diseases. For example, polygenic signals for platelet volume and platelet count were enriched at DHS in CD34+ cells, which are on the cell lineage that leads to platelets, and polygenic signals for Crohn’s disease were depleted at repressed chromatin in LCL, an immune-related cell line. Functional enrichments can alternatively be estimated by stratified LD score regression⁸⁶, which generalizes LD score regression⁸² to regress χ² statistics for each SNP against LD scores with each functional category. Fine-mapping methods can also estimate functional enrichments, although these analyses are often restricted to disease-associated loci^44,49,58. Notably, all of these summary statistic based methods have been applied to a large number of overlapping functional annotations, whereas methods that analyze individual-level genotypes have only been applied to a small number of non-overlapping functional annotations^87,88. In addition, stratified LD score regression is not limited by the single causal variant per block assumption of the Bayesian hierarchical model, increasing power in settings of highly polygenic traits⁸⁶. 
Application of the method identified significant cell-type-specific enrichments for many highly polygenic traits, including enrichments for histone marks in brain for smoking behavior and educational attainment—even though the summary statistics analyzed contained only one and three genome-wide significant loci, respectively. One limitation of the method is limited power for functional categories spanning a small percentage of the genome, motivating additional work in this area. As both summary statistic and functional annotation data sets grow larger and richer, identifying enriched functional annotations using summary statistic data will likely continue to be a fruitful endeavor.

Cross-trait analyses

Many complex traits and diseases have a shared genetic etiology, either via shared genetic variant(s) with nonzero causal effect sizes (pleiotropy) or via a signed correlation between causal effect sizes (genetic correlation). Indeed, many instances of genetic variants with pleiotropic effects on multiple traits have been identified^89-94. A recent study applied a Bayesian framework to summary association statistics from pairs of traits to estimate, at each locus in the genome, the probability that an associated variant has pleiotropic effects on both traits⁹⁵. Pleiotropic SNPs can also be utilized as instrumental variables in Mendelian randomization analyses from summary statistics^96-98, with one such analysis showing that increased body mass index causally increases triglyceride levels⁹⁵.

An alternative approach to assessing the genetic overlap between two traits is to estimate the correlation between causal effect sizes across the two traits. Genome-wide genetic correlations can be estimated from individual-level data using bivariate REML⁹⁹. A recent study estimated genome-wide genetic correlations from summary data using the information in polygenic risk scores, although this approach required LD-pruning the data, which may lead to upward bias⁸¹. Another recent study estimated genome-wide genetic correlations from summary data using cross-trait LD score regression¹⁰⁰, which generalizes LD score regression to regress products of z-scores against LD scores for each SNP; this method produced estimates that were highly concordant with those from individual-level data⁹⁹. Fitting the underlying MVN model using maximum likelihood instead of linear regression has produced promising results in applications to estimating cross-trait and cross-population genetic correlations, and may also prove useful in other settings¹⁰¹. Although genetic correlation analyses restricted to associated variants have also produced important findings⁹⁵, the power of methods that leverage polygenic signals in genome-wide data is underscored by the discovery of significant genetic correlations involving traits with zero or few genome-wide significant loci, including a significant negative genetic correlation between smoking behavior and educational attainment¹⁰⁰.
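
Cross-trait LD score regression can be sketched in the same spirit. Under the model of ref. ¹⁰⁰, the expected product of the two z-scores at SNP j is approximately √(N₁N₂)·ρ_g·ℓ_j/M plus an intercept that absorbs sample overlap, where ρ_g is the genetic covariance. Dividing ρ_g by the geometric mean of the two SNP-heritabilities gives the genetic correlation. The sketch uses unweighted least squares and takes the heritabilities as given inputs; all names and values are hypothetical.

```python
import numpy as np

def cross_trait_ldsc(z1, z2, ld_scores, n1, n2, n_snps, h2_1, h2_2):
    """Estimate genetic covariance and correlation from z-score products.

    h2_1 and h2_2 are SNP-heritability estimates for the two traits,
    assumed to have been obtained separately.
    """
    products = np.asarray(z1, dtype=float) * np.asarray(z2, dtype=float)
    ld_scores = np.asarray(ld_scores, dtype=float)
    # np.polyfit returns coefficients from highest degree: [slope, intercept]
    slope, intercept = np.polyfit(ld_scores, products, 1)
    gen_cov = slope * n_snps / np.sqrt(n1 * n2)
    gen_corr = gen_cov / np.sqrt(h2_1 * h2_2)
    return float(gen_cov), float(gen_corr), float(intercept)

# Noise-free synthetic z-score products generated with genetic covariance 0.1
n1 = n2 = 50_000
n_snps = 1_000_000
ld_scores = np.array([10.0, 50.0, 100.0, 200.0])
products = np.sqrt(n1 * n2) * 0.1 * ld_scores / n_snps
z1 = np.sqrt(products)
z2 = np.sqrt(products)
gen_cov, gen_corr, intercept = cross_trait_ldsc(
    z1, z2, ld_scores, n1, n2, n_snps, h2_1=0.4, h2_2=0.25)
```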

Conclusion

Recently developed methods have made it possible to leverage summary association statistics to perform a wide range of analyses, many of which previously required individual-level data. As the availability of summary association statistics continues to grow (Table 1), summary statistics will continue to be broadly used in analyses involving single-variant association tests, gene-based association tests, fine-mapping, polygenic prediction and inferring polygenic architectures, and cross-trait analysis. The use of summary data will entail a slight loss of accuracy in some applications, such as imputation, where methods that analyze individual-level data can use haplotypes to model nonlinear structure, and polygenic prediction, where methods that analyze individual-level data can reduce noise by fitting all markers simultaneously; however, when summary statistics are available in larger sample size than individual-level data, the advantage of larger sample size will far outweigh those limitations. In addition, there are some settings where summary statistic based methods are the method of choice even when individual-level data is available, such as identifying functional annotations that are enriched for heritability, where methods that analyze individual-level data cannot currently handle a large number of overlapping annotations.

Despite considerable recent progress, there are some areas where further research on summary statistic based methods is needed. As population reference panels grow, more accurate modeling of rare and low-frequency variants will become possible, and it will be important to assess the limits of such efforts. It is also of interest to develop methods for inferring polygenic architectures from summary statistics that allow for different relationships between allele frequency and effect size. Identifying functional annotations that are enriched for heritability is an application that is particularly likely to produce important biological insights, and here there is a need for new methods that are well-powered for functional categories spanning a small percentage of the genome. As the number of functional annotations continues to increase, the integration of such data poses computational and statistical challenges in disentangling the correct functional annotations among many correlated ones.

We conclude by emphasizing the importance of making summary association statistics publicly available. A 2012 editorial in the journal Nature Genetics asked its authors to publish, or deposit in a database, summary association statistics for all SNPs analyzed¹⁰², broadly impacting the set of publicly available summary statistics in the years that followed (Table 1). The public release of summary statistics is a useful compromise in situations where sample consent restrictions or privacy concerns preclude the release of individual-level data in a public repository. Although even the release of summary statistics can in principle lead to privacy concerns¹⁰³, more recent work has shown that such privacy attacks have low power when the summary sample size exceeds the effective number of independent markers (currently estimated at 60,000 in typical GWAS data sets¹⁰⁴), implying that privacy concerns should not preclude the public release of summary statistics from large studies¹⁰⁵⁻¹⁰⁷. Indeed, some recent studies have created web portals where summary data can be publicly accessed and visualized⁶⁰. Finally, we note the potential benefits of publicly releasing summary statistics that include summary LD information (i.e. correlations) between each pair of proximal SNPs; however, the optimal approach to aggregating summary LD information across multiple cohorts in large-scale meta-analyses remains unclear, motivating future work in this area.

Box 1: Conditional association and summary statistic imputation using LD reference data

Let X be an N x M matrix of genotypes, standardized to mean 0 and unit variance, and Y be an N x 1 vector of standardized trait values, where M is the number of SNPs at the locus and N is the number of samples. Under a standard linear model, Y = Xβ + ∊. Let V be an M x M LD matrix of pairwise LD; V is equal to X^TX if individual-level data is available, but can otherwise be estimated from a population reference sample (with or without regularization).

Conditional association using LD reference data

We estimate the joint effects of all SNPs using least-squares as β̂_joint = (X^TX)^{-1}X^TY with var(β̂_joint) = σ_J²(X^TX)^{-1}, where σ_J² is the residual variance in the joint analysis. In a standard GWAS, however, each SNP is marginally tested one at a time, which can be expressed in matrix form as β̂_marg = D^{-1}X^TY with var(β̂_marg) = σ_M²D^{-1}, where D is the (nearly constant) diagonal matrix of V and σ_M² is the residual variance in the marginal analysis. It follows that:

β̂_joint = V^{-1}Dβ̂_marg, with var(β̂_joint) = σ_J²V^{-1}
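The relationship between marginal and joint estimates, β̂_joint = V^{-1}Dβ̂_marg, can be checked numerically. The sketch below is illustrative only: the toy locus, the effect size and the function name joint_from_marginal are assumptions, and V is computed in-sample, whereas in practice it would typically be estimated from a reference panel.

```python
import numpy as np

def joint_from_marginal(beta_marg, V):
    """Joint effect estimates from marginal ones: V^{-1} D beta_marg, D = diag(V)."""
    D = np.diag(np.diag(V))
    return np.linalg.solve(V, D @ beta_marg)

# Toy locus (hypothetical): 5 correlated SNPs, the first one causal.
rng = np.random.default_rng(0)
N, M = 2000, 5
cov = 0.6 * np.ones((M, M)) + 0.4 * np.eye(M)
X = rng.standard_normal((N, M)) @ np.linalg.cholesky(cov).T
X = (X - X.mean(axis=0)) / X.std(axis=0)        # standardize genotypes
Y = 0.2 * X[:, 0] + rng.standard_normal(N)
Y = (Y - Y.mean()) / Y.std()                    # standardize trait

V = X.T @ X                                     # in-sample LD matrix
beta_marg = (X.T @ Y) / np.diag(V)              # one-SNP-at-a-time estimates
beta_joint = joint_from_marginal(beta_marg, V)  # uses summary statistics and LD only
beta_direct = np.linalg.lstsq(X, Y, rcond=None)[0]  # least squares on individual-level data

print(np.round(beta_marg, 3))
print(np.round(beta_joint, 3))
```

With in-sample LD the two routes agree up to floating-point error; with reference-panel LD the agreement is only approximate.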

Summary statistic imputation using LD reference data

Let Z be a vector of z-scores (estimated effect sizes divided by their s.e.) obtained by marginally testing each SNP one at a time. Under the null hypothesis of no association, Z ∼ N(0,V). Let Z_t and Z_i partition the vector Z into T typed SNPs and M − T untyped SNPs, and let V_t,t (covariances among typed SNPs), V_i,i (covariances among untyped SNPs), and V_t,i (covariances among typed and untyped SNPs) partition the matrix V accordingly. It follows that

Z_i | Z_t ∼ N(V_i,t V_t,t^{-1} Z_t, V_i,i − V_i,t V_t,t^{-1} V_t,i), where V_i,t is the transpose of V_t,i.

The mean and variance of the conditional distribution can be used to impute summary association statistics at untyped SNPs.
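A minimal sketch of this imputation step uses the standard conditional multivariate normal formulas: mean V_i,t V_t,t^{-1} Z_t and covariance V_i,i − V_i,t V_t,t^{-1} V_t,i. The function name impute_z, the optional ridge term (one simple form of regularization) and the toy LD matrix are assumptions made for illustration.

```python
import numpy as np

def impute_z(z_typed, V, typed_idx, untyped_idx, ridge=0.0):
    """Conditional mean and covariance of untyped z-scores given typed z-scores.

    V is an LD (correlation) matrix over all SNPs; ridge is added to the
    diagonal of the typed block as a simple regularizer.
    """
    V_tt = V[np.ix_(typed_idx, typed_idx)] + ridge * np.eye(len(typed_idx))
    V_it = V[np.ix_(untyped_idx, typed_idx)]
    V_ii = V[np.ix_(untyped_idx, untyped_idx)]
    A = np.linalg.solve(V_tt, V_it.T).T      # V_it V_tt^{-1} (V_tt is symmetric)
    return A @ z_typed, V_ii - A @ V_it.T

# Toy example (hypothetical): 5 SNPs with LD decaying as 0.8^|i-j|; SNP 2 is untyped.
rho = 0.8
idx = np.arange(5)
V = rho ** np.abs(idx[:, None] - idx[None, :])
typed, untyped = [0, 1, 3, 4], [2]
z_typed = np.array([1.0, 2.0, 2.0, 1.0])

z_mean, z_cov = impute_z(z_typed, V, typed, untyped)
print(np.round(z_mean, 4), np.round(z_cov, 4))
```

For this particular LD structure the untyped SNP depends only on its two neighbours, so the result can be checked by hand: the mean is ρ/(1+ρ²) times the sum of the neighbouring z-scores, and the variance is (1−ρ²)/(1+ρ²).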

Box 2: Rare variant association tests using summary association statistics

Let X be an N x M matrix of genotypes, standardized to mean 0 and variance 1, and Y be an N x 1 vector of standardized trait values, where M is the number of rare variants (e.g. in a given gene being tested for association) and N is the number of samples. An M x 1 vector of z-scores (estimated effect sizes divided by their s.e.) can be computed as Z = X^TY/√N, with multivariate normal null distribution Z ∼ N(0,V), where V is an in-sample LD matrix.

Burden tests

Burden tests assume that all rare variants in a candidate gene have the same direction of effect. Burden tests may either assume that standardized effect sizes are the same for each rare variant¹⁰⁸ (i.e. per-allele effect sizes are proportional to 1/√(p_i(1 − p_i)), where p_i is the allele frequency), or apply weights or thresholds based on allele frequency or functional information¹⁰⁹,¹¹⁰. If w is an M x 1 vector of weights for each rare variant (including zero weights for rare variants excluded by a threshold), the test statistic for a weighted burden test is T_burden = w^TZ with null distribution T_burden ∼ N(0, w^TVw). This test statistic can naturally be extended to meta-analysis of burden tests from multiple cohorts (via inverse-variance weighting), and can be extended to variable threshold tests and binary traits³⁶⁻³⁸.
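As a minimal sketch of the weighted burden statistic T_burden = w^TZ and its null variance w^TVw, the example below standardizes the statistic and computes a two-sided normal P-value. The function name burden_test and the toy z-scores, LD matrix and weights are hypothetical.

```python
import math
import numpy as np

def burden_test(z, V, w):
    """Weighted burden test from z-scores: returns (standardized statistic, two-sided P)."""
    t = float(w @ z)                  # T_burden = w^T Z
    var = float(w @ V @ w)            # null variance w^T V w
    stat = t / math.sqrt(var)
    p = math.erfc(abs(stat) / math.sqrt(2.0))   # two-sided standard normal tail
    return stat, p

# Toy gene (hypothetical): three rare variants in weak LD, equal weights.
z = np.array([1.5, 2.0, 1.0])
V = np.array([[1.0, 0.1, 0.1],
              [0.1, 1.0, 0.1],
              [0.1, 0.1, 1.0]])
w = np.ones(3)

stat, p = burden_test(z, V, w)
print(round(stat, 3), round(p, 4))
```

Setting an entry of w to zero excludes that variant, which is how a frequency threshold would be expressed in this form.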

Overdispersion tests

Overdispersion tests assume that rare variants in a candidate gene can impact a complex trait in either direction, and can be computed as weighted sums of squared single-variant test statistics¹¹¹,¹¹². If W = diag(w₁,…, w_M) is an M x M diagonal matrix of weights for each rare variant, the test statistic for a weighted overdispersion test is T_overdispersion = Z^TWZ with null distribution T_overdispersion ∼ Σ_i µ_i χ²_1,i (a weighted sum of M independent χ² variables), where weights µ_i for each χ² (1 d.o.f.) distribution are given by eigenvalues of the matrix V^1/2WV^1/2. This test statistic can be extended to meta-analysis of overdispersion tests from multiple cohorts (via inverse-variance weighting), and can be extended to binary traits³⁶⁻³⁸.
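The overdispersion statistic and its mixture-of-χ² null distribution can be sketched as follows. This is an illustration under stated assumptions: the P-value is approximated by Monte Carlo sampling from the weighted χ² mixture, a simple stand-in for the analytic approximations used in practice, and the function name overdispersion_test and the toy inputs are hypothetical.

```python
import numpy as np

def overdispersion_test(z, V, w, n_draws=200_000, seed=0):
    """Weighted overdispersion test: returns (statistic, mixture weights, Monte Carlo P)."""
    W = np.diag(w)
    stat = float(z @ W @ z)                              # Z^T W Z
    evals, evecs = np.linalg.eigh(V)
    V_half = (evecs * np.sqrt(np.clip(evals, 0.0, None))) @ evecs.T   # symmetric V^{1/2}
    mu = np.linalg.eigvalsh(V_half @ W @ V_half)         # weights of the chi-square mixture
    rng = np.random.default_rng(seed)
    null = rng.chisquare(1, size=(n_draws, len(mu))) @ mu
    return stat, mu, float((null >= stat).mean())

# Toy gene (hypothetical): effects in opposite directions, equal weights.
z = np.array([2.1, -1.8, 0.5])
V = np.array([[1.0, 0.2, 0.0],
              [0.2, 1.0, 0.1],
              [0.0, 0.1, 1.0]])
w = np.ones(3)

stat, mu, p = overdispersion_test(z, V, w)
print(round(stat, 3), np.round(mu, 3), p)
```

Because the z-scores are squared, variants with opposite signs add to the statistic rather than cancelling, in contrast to a burden test.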

Box 3: Polygenic risk prediction using summary vs. individual-level data

Suppose that polygenic risk prediction for a quantitative trait is conducted using a training cohort with N unrelated samples, using M unlinked markers with SNP-heritability⁷ equal to h_g². We initially consider two polygenic risk prediction methods that assume infinitesimal (Gaussian) architectures: polygenic risk scores computed using marginal effects at all markers with no P-value thresholding (PRS_all), and fitting effect sizes of all markers simultaneously via Best Linear Unbiased Prediction (BLUP). We note that PRS_all requires only summary statistics from the training cohort, whereas BLUP requires individual-level data. The prediction R² for each method is given by⁸⁰,¹¹³:

R²_PRSall = h_g² / (1 + M/(N h_g²))

R²_BLUP = h_g² / (1 + M(1 − R²_BLUP)/(N h_g²))

These equations can naturally be extended to linked markers (using the effective number of unlinked markers¹⁰⁴) and case-control traits (using observed-scale SNP-heritability¹¹⁴). The relative advantage of BLUP over PRS_all is small when prediction R² is small in absolute terms, but grows larger when prediction R² is larger; this is illustrated in the figure below, which reports prediction R² at various training sample sizes based on M=60,000 unlinked markers and a fixed SNP-heritability. These results generalize to non-infinitesimal extensions of polygenic risk scores⁷²,⁷⁴ and BLUP⁷⁸,⁷⁹; in the latter case, the noise reduction from fitting all markers simultaneously remains equal to 1−R², corresponding to an increase in training sample size of 1/(1−R²).
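The comparison between the two methods can be made concrete with a short calculation. The sketch below assumes the infinitesimal-model expressions R²_PRSall = h_g²/(1 + M/(N h_g²)) and R²_BLUP = h_g²/(1 + M(1 − R²_BLUP)/(N h_g²)); the second is implicit in R²_BLUP and is solved here as a quadratic. The function names are hypothetical, and the SNP-heritability of 0.5 is an assumed illustrative value, not one taken from the text.

```python
import math

def r2_prs_all(n, m, h2):
    """Prediction R^2 for PRS_all under an infinitesimal model."""
    return h2 / (1.0 + m / (n * h2))

def r2_blup(n, m, h2):
    """Prediction R^2 for BLUP: solves R = h2 / (1 + a*(1 - R)) with a = m / (n*h2).

    Rearranged as a*R^2 - (1 + a)*R + h2 = 0; the smaller root satisfies R <= h2.
    """
    a = m / (n * h2)
    return ((1.0 + a) - math.sqrt((1.0 + a) ** 2 - 4.0 * a * h2)) / (2.0 * a)

M, H2 = 60_000, 0.5   # H2 is an assumed illustrative value
for n in (10_000, 100_000, 1_000_000):
    print(n, round(r2_prs_all(n, M, H2), 3), round(r2_blup(n, M, H2), 3))
```

Under these assumptions the gap between the two methods is negligible at small training sample sizes and widens as R² approaches the SNP-heritability.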

Box 3 Figure:

Competing interests

No competing interests

Key references

Ref. ⁵: This study introduced a powerful new random-effects meta-analysis method that employs a null model of no-heterogeneity.

Ref. ⁷: This study demonstrated that conditional association analysis can be performed using summary statistics.

Ref. ¹²: This was the first study showing that Gaussian imputation methods can be applied to summary-level genetic data.

Ref. ²⁶: This study introduced a method for performing TWAS using summary statistics by assessing whether a single causal variant impacts both gene expression and trait.

Ref. ³⁰: This study introduced a powerful method for performing TWAS using summary statistics by assessing the association between predicted gene expression (using all cis SNPs) and trait.

Ref. ³⁶: This was the first of three studies demonstrating that rare variant burden and overdispersion tests can be performed using summary statistics.

Ref. ⁴²: This study used posterior probabilities of causality to construct credible sets of causal disease-associated SNPs across multiple loci and diseases, under a single causal variant per locus assumption.

Ref. ⁵⁶: This study used a Bayesian hierarchical model to estimate posterior probabilities of causality and identify functional annotations enriched for disease heritability, under a single causal variant per locus assumption.

Ref. ⁵⁸: This study showed that fine-mapping accuracy can be improved by leveraging functional annotation data and trans-ethnic samples and modelling multiple causal variants per locus.

Ref. ⁷²: This study used polygenic risk scores to predict schizophrenia risk with appreciable accuracy, implicating a highly polygenic disease architecture.

Ref. ⁹⁵: This study applied a Bayesian framework to identify pleiotropic effects across a broad set of complex traits and diseases.

Ref. ¹⁰⁰: This study introduced a new method for estimating genome-wide genetic correlations from summary statistics.

Acknowledgements

We are grateful to H. Finucane, S. Gazal, N. Mancuso and H. Shi for helpful discussions. We are grateful to G. Kichaev and R. Johnson for help with Figure 3. This work was funded by NIH grants R01 HG006399, R01 MH101244, R01 GM105857 and R01 MH107649.

Glossary

INDIVIDUAL-LEVEL DATA: Genome-wide SNP genotypes and trait values for each individual included in a GWAS.
SUMMARY ASSOCIATION STATISTICS: Estimated effect sizes and their standard errors for each SNP analyzed in a GWAS.
Z-SCORES: Association statistics that follow a standard normal distribution under the null; often computed as per-allele effect sizes divided by their standard error.
META-ANALYSIS: A method for combining data from different studies in which summary association statistics from each study are jointly analyzed.
MEGA-ANALYSIS: A method for combining data from different studies in which individual-level data from each study are merged and jointly analyzed.
SUMMARY LD INFORMATION: In-sample correlations between each pair of typed SNPs analyzed in a GWAS; can be restricted to proximal pairs of typed SNPs to limit the number of pairs of SNPs.
TRANSCRIPTOME-WIDE ASSOCIATION STUDY (TWAS): A study that evaluates the association between expression of each gene and a trait of interest; predicted expression may be used instead of measured expression to improve practicality.
MENDELIAN RANDOMIZATION: A method that uses significantly associated SNPs as instrumental variables to quantify causal relationships between two traits.
BURDEN TEST: A gene-based rare variant test in which all rare variants in a gene are assumed to have the same direction of effect.
OVERDISPERSION TEST: A gene-based rare variant test in which rare variants in a gene are assumed to impact trait in either direction.
POSTERIOR PROBABILITY OF CAUSALITY: The inferred probability that a SNP is causal, based on association data and optional prior information.
POLYGENIC RISK SCORE: A method of predicting trait by summing the predicted marginal effects of all markers below a P-value threshold in a training sample, multiplied by marker genotypes in a validation sample.
LD SCORE REGRESSION: A method of assessing trait polygenicity by regressing χ² association statistics against LD scores for each SNP, computed as sums of squared correlations of each SNP with all SNPs including itself.
PLEIOTROPY: The existence of shared genetic variant(s) with nonzero causal effect sizes for two traits.
GENETIC CORRELATION: The signed correlation across SNPs between causal effect sizes for two traits.

References

↵
Visscher, P. M., Brown, M. A., McCarthy, M. I. & Yang, J. Five Years of GWAS Discovery. American journal of human genetics (2012).
↵
1000 Genomes Project Consortium et al. An integrated map of genetic variation from 1,092 human genomes. Nature 491, 56–65 (2012).
OpenUrl CrossRef PubMed Web of Science
↵
Evangelou, E. & Ioannidis, J. P. Meta-analysis methods for genome-wide association studies and beyond. Nat. Rev. Genet. 14, 379–389 (2013).
OpenUrl CrossRef PubMed
↵
Lin, D. Y. & Zeng, D. Meta-analysis of genome-wide association studies: no efficiency gain in using individual participant data. Genetic epidemiology 34, 60–66 (2010).
OpenUrl PubMed Web of Science
↵
Han, B. & Eskin, E. Random-effects model aimed at discovering associations in meta-analysis of genome-wide association studies. American journal of human genetics 88, 586–598 (2011).
OpenUrl CrossRef PubMed
↵
Han, B. & Eskin, E. Interpreting meta-analyses of genome-wide association studies. PLoS genetics 8, e1002555 (2012).
OpenUrl
↵
Yang, J. et al. Conditional and joint multiple-SNP analysis of GWAS summary statistics identifies additional variants influencing complex traits. Nature genetics 44, 369–75–S1–3 (2012).
OpenUrl CrossRef PubMed
↵
Wood, A. R. et al. Defining the role of common variation in the genomic and biological architecture of adult human height. Nature genetics 46, 1173–1186 (2014).
OpenUrl CrossRef PubMed
Locke, A. E. et al. Genetic studies of body mass index yield new insights for obesity biology. Nature 518, 197–206 (2015).
OpenUrl CrossRef PubMed
↵
Shi, H., Kichaev, G. & Pasaniuc, B. Contrasting the Genetic Architecture of 30 Complex Traits from Summary Association Data. American journal of human genetics 99, 139–153 (2016).
OpenUrl CrossRef
↵
Marchini, J. & Howie, B. Genotype imputation for genome-wide association studies. Nat. Rev. Genet. 11, 499–511 (2010).
OpenUrl CrossRef PubMed Web of Science
↵
Wen, X. & Stephens, M. Using linear predictors to impute allele frequencies from summary or pooled genotype data. Ann. Appl. Stat. 4, 1158–1182 (2010).
OpenUrl CrossRef PubMed
Kostem, E., Lozano, J. A. & Eskin, E. Increasing Power of Genome-Wide Association Studies by Collecting Additional Single-Nucleotide Polymorphisms. Genetics 188, 449–460 (2011).
OpenUrl Abstract/FREE Full Text
↵
Lee, D., Bigdeli, T. B., Riley, B. P., Fanous, A. H. & Bacanu, S. A. DIST: direct imputation of summary statistics for unmeasured SNPs. Bioinformatics 29, 2925–2927 (2013).
OpenUrl CrossRef PubMed Web of Science
Pasaniuc, B. et al. Fast and accurate imputation of summary statistics enhances evidence of functional enrichment. Bioinformatics 30, 2906–2914 (2014).
OpenUrl CrossRef PubMed Web of Science
↵
Xu, Z. et al. DISSCO: direct imputation of summary statistics allowing covariates. Bioinformatics 31, 2434–2442 (2015).
OpenUrl CrossRef PubMed
↵
Lee, D. et al. DISTMIX: direct imputation of summary statistics for unmeasured SNPs from mixed ethnicity cohorts. Bioinformatics 31, 3099–3104 (2015).
OpenUrl CrossRef PubMed
↵
Park, D. S. et al. Adapt-Mix: learning local genetic correlation structure improves summary statistics-based analyses. Bioinformatics 31, i181–9 (2015).
OpenUrl CrossRef PubMed
↵
Conneely, K. N. & Boehnke, M. So many correlated tests, so little time! Rapid adjustment of P values for multiple correlated tests. American journal of human genetics 81, 1158–1168 (2007).
OpenUrl CrossRef PubMed Web of Science
↵
Berisa, T. & Pickrell, J. K. Approximately independent linkage disequilibrium blocks in human populations. Bioinformatics 32, 283–285 (2016).
OpenUrl CrossRef PubMed
↵
Nicolae, D. L. et al. Trait-associated SNPs are more likely to be eQTLs: annotation to enhance discovery from GWAS. PLoS genetics 6, e1000888 (2010).
OpenUrl
↵
Nica, A. C. et al. Candidate Causal Regulatory Effects by Integration of Expression QTLs with Complex Trait Genetic Associations. PLoS genetics 6, e1000895 (2010).
OpenUrl
Xiong, Q., Ancona, N., Hauser, E. R., Mukherjee, S. & Furey, T. S. Integrating genetic and gene expression evidence into genome-wide association analysis of gene sets. Genome Res. 22, 386–397 (2012).
OpenUrl Abstract/FREE Full Text
↵
He, X. et al. Sherlock: detecting gene-disease associations by matching patterns of expression QTL and GWAS. American journal of human genetics 92, 667–680 (2013).
OpenUrl CrossRef PubMed
↵
Huang, Y. T., Liang, L., Moffatt, M. F., Cookson, W. O. C. M. & Lin, X. iGWAS: Integrative Genome-Wide Association Studies of Genetic and Genomic Data for Disease Susceptibility Using Mediation Analysis. Genetic epidemiology 39, 347–356 (2015).
OpenUrl CrossRef
↵
Giambartolomei, C. et al. Bayesian Test for Colocalisation between Pairs of Genetic Association Studies Using Summary Statistics. PLoS genetics 10, (2014).
Onengut-Gumuscu, S. et al. Fine mapping of type 1 diabetes susceptibility loci and evidence for colocalization of causal variants with lymphoid gene enhancers. Nature genetics 47, 381–386 (2015).
OpenUrl CrossRef PubMed
↵
Fortune, M. D. et al. Statistical colocalization of genetic risk variants for related autoimmune diseases in the context of common controls. Nature genetics 47, 839–846 (2015).
OpenUrl CrossRef PubMed
↵
Gamazon, E. R. et al. A gene-based association method for mapping traits using reference transcriptome data. Nature genetics 47, 1091–1098 (2015).
OpenUrl CrossRef PubMed
↵
Gusev, A. et al. Integrative approaches for large-scale transcriptome-wide association studies. Nature genetics 48, 245–252 (2016).
OpenUrl CrossRef PubMed
↵
Zhu, Z. et al. Integration of summary data from GWAS and eQTL studies predicts complex trait gene targets. Nature genetics 48, 481–487 (2016).
OpenUrl CrossRef PubMed
↵
Pavlides, J. M. W. et al. Predicting gene targets from integrative analyses of summary data from GWAS and eQTL studies for 28 human complex traits. Genome Med 8, 84 (2016).
OpenUrl
↵
Gibson, G. Rare and common variants: twenty arguments. Nat. Rev. Genet. 13, 135–145 (2011).
OpenUrl CrossRef PubMed
↵
Zuk, O. et al. Searching for missing heritability: designing rare variant association studies. Proceedings of the National Academy of Sciences of the United States of America 111, E455–64 (2014).
OpenUrl Abstract/FREE Full Text
↵
Lee, S., Abecasis, G. R., Boehnke, M. & Lin, X. Rare-Variant Association Analysis: Study Designs and Statistical Tests. The American Journal of Human Genetics 95, 5–23 (2014).
OpenUrl CrossRef PubMed
↵
Lee, S., Teslovich, T. M., Boehnke, M. & Lin, X. General Framework for Meta-analysis of Rare Variants in Sequencing Association Studies. The American Journal of Human Genetics (2013).
Hu, Y.-J. et al. Meta-analysis of gene-level associations for rare variants based on single-variant statistics. American journal of human genetics 93, 236–248 (2013).
OpenUrl CrossRef PubMed
↵
Liu, D. J. et al. Meta-analysis of gene-level tests for rare variant association. Nature genetics 46, 200–204 (2014).
OpenUrl CrossRef PubMed
↵
Liu, J. Z. et al. A Versatile Gene-Based Test for Genome-wide Association Studies. The American Journal of Human Genetics 87, 139–145 (2010).
OpenUrl CrossRef PubMed Web of Science
↵
Faye, L. L., Machiela, M. J., Kraft, P., Bull, S. B. & Sun, L. Re-ranking sequencing variants in the post-GWAS era for accurate causal variant identification. PLoS genetics 9, e1003609 (2013).
OpenUrl
↵
Stephens, M. & Balding, D. J. Bayesian statistical methods for genetic association studies. Nat. Rev. Genet. 10, 681–690 (2009).
OpenUrl CrossRef PubMed Web of Science
↵
Wellcome Trust Case Control, C. et al. Bayesian refinement of association signals for 14 loci in 3 common diseases. Nature genetics 44, 1294–1301 (2012).
OpenUrl CrossRef PubMed
↵
Hormozdiari, F., Kostem, E., Kang, E. Y., Pasaniuc, B. & Eskin, E. Identifying causal variants at loci with multiple signals of association. Genetics 198, 497–508 (2014).
OpenUrl Abstract/FREE Full Text
↵
Kichaev, G. et al. Integrating functional data to prioritize causal variants in statistical fine-mapping studies. PLoS genetics 10, e1004722 (2014).
OpenUrl CrossRef
↵
Chen, W. et al. Fine Mapping Causal Variants with an Approximate Bayesian Method Using Marginal Test Statistics. Genetics 200, 719–736 (2015).
OpenUrl Abstract/FREE Full Text
↵
Benner, C. et al. FINEMAP: efficient variable selection using summary data from genome-wide association studies. Bioinformatics 32, 1493–1501 (2016).
OpenUrl CrossRef PubMed
↵
Newcombe, P. J., Conti, D. V. & Richardson, S. JAM: A Scalable Bayesian Framework for Joint Analysis of Marginal SNP Effects. Genetic epidemiology 40, 188–201 (2016).
OpenUrl CrossRef
van de Bunt, M. et al. Evaluating the Performance of Fine-Mapping Strategies at Common Variant GWAS Loci. PLoS genetics 11, e1005535 (2015).
OpenUrl
↵
Li, Y. & Kellis, M. Joint Bayesian inference of risk variants and tissue-specific epigenomic enrichments across multiple complex human diseases. Nucleic acids research gkw627 (2016). doi:10.1093/nar/gkw627
OpenUrl CrossRef PubMed
↵
Udler, M. S. et al. FGFR2 variants and breast cancer risk: fine-scale mapping using African American studies and analysis of chromatin conformation. Human molecular genetics 18, 1692–1703 (2009).
OpenUrl CrossRef PubMed Web of Science
↵
Udler, M. S., Tyrer, J. & Easton, D. F. Evaluating the power to discriminate between highly correlated SNPs in genetic association studies. Genetic epidemiology 34, 463–468 (2010).
OpenUrl CrossRef PubMed
↵
ENCODE Project Consortium. An integrated encyclopedia of DNA elements in the human genome. Nature 489, 57–74 (2012).
OpenUrl CrossRef PubMed Web of Science
↵
Roadmap Epigenomics Consortium et al. Integrative analysis of 111 reference human epigenomes. Nature 518, 317–330 (2015).
OpenUrl CrossRef PubMed
↵
Maurano, M. T. et al. Systematic localization of common disease-associated variation in regulatory DNA. Science 337, 1190–1195 (2012).
OpenUrl Abstract/FREE Full Text
↵
Trynka, G. et al. Chromatin marks identify critical cell types for fine mapping complex trait variants. Nature genetics 45, 124–130 (2013).
OpenUrl CrossRef PubMed
↵
Pickrell, J. K. Joint analysis of functional genomic data and genome-wide association studies of 18 human traits. American journal of human genetics 94, 559–573 (2014).
OpenUrl CrossRef PubMed
↵
Chung, D., Yang, C., Li, C., Gelernter, J. & Zhao, H. GPA: a statistical approach to prioritizing GWAS results by integrating pleiotropy and annotation. PLoS genetics 10, e1004787 (2014).
OpenUrl
↵
Kichaev, G. & Pasaniuc, B. Leveraging Functional-Annotation Data in Trans-ethnic Fine-Mapping Studies. American journal of human genetics 97, 260–271 (2015).
OpenUrl CrossRef PubMed
↵
Farh, K. K.-H. et al. Genetic and epigenetic fine mapping of causal autoimmune disease variants. Nature 518, 337–343 (2015).
OpenUrl CrossRef PubMed
↵
Fuchsberger, C. et al. The genetic architecture of type 2 diabetes. Nature 536, 41–47 (2016).
OpenUrl CrossRef PubMed
↵
Liu, C.-T. et al. Trans-ethnic Meta-analysis and Functional Annotation Illuminates the Genetic Architecture of Fasting Glucose and Insulin. American journal of human genetics 99, 56–75 (2016).
OpenUrl CrossRef PubMed
↵
Grubert, F. et al. Genetic Control of Chromatin States in Humans Involves Local and Distal Chromosomal Interactions. Cell 162, 1051–1065 (2015).
OpenUrl CrossRef PubMed
↵
Waszak, S. M. et al. Population Variation and Genetic Control of Modular Chromatin Architecture in Humans. Cell 162, 1039–1050 (2015).
OpenUrl CrossRef PubMed
↵
Zaitlen, N., Pasaniuc, B., Gur, T., Ziv, E. & Halperin, E. Leveraging genetic variability across populations for the identification of causal variants. American journal of human genetics 86, 23–33 (2010).
OpenUrl CrossRef PubMed
↵
Morris, A. P. Transethnic meta-analysis of genomewide association studies. Genetic epidemiology 35, 809–822 (2011).
OpenUrl CrossRef PubMed
↵
Ong, R. T.-H., Wang, X., Liu, X. & Teo, Y. Y. Efficiency of trans-ethnic genome-wide meta-analysis and fine-mapping. European journal of human genetics: EJHG 20, 1300–1307 (2012).
OpenUrl
↵
Asimit, J. L., Hatzikotoulas, K., McCarthy, M., Morris, A. P. & Zeggini, E. Trans-ethnic study design approaches for fine-mapping. European Journal of Human Genetics 24, 1330–1336 (2016).
OpenUrl CrossRef PubMed
↵
Liu, C.-T. et al. Multi-ethnic fine-mapping of 14 central adiposity loci. Human molecular genetics 23, 4738–4744 (2014).
OpenUrl CrossRef PubMed
↵
Kuo, J. Z. et al. Trans-ethnic fine mapping identifies a novel independent locus at the 3' end of CDKAL1 and novel variants of several susceptibility loci for type 2 diabetes in a Han Chinese population. Diabetologia 56, 2619–2628 (2013).
OpenUrl CrossRef PubMed Web of Science
↵
Chatterjee, N., Shi, J. & Garcia-Closas, M. Developing and evaluating polygenic risk prediction models for stratified disease prevention. Nat. Rev. Genet. 17, 392–406 (2016).
OpenUrl CrossRef PubMed
↵
Chatterjee, N. et al. Projecting the performance of risk prediction based on polygenic analyses of genome-wide association studies. Nature genetics 45, 400–5–405e1–3 (2013).
↵
International Schizophrenia Consortium et al. Common polygenic variation contributes to risk of schizophrenia and bipolar disorder. Nature 460, 748–752 (2009).
OpenUrl CrossRef PubMed Web of Science
↵
Stahl, E. A. et al. Bayesian inference analyses of the polygenic architecture of rheumatoid arthritis. Nature genetics 44, 483–489 (2012).
OpenUrl CrossRef PubMed
↵
Vilhjalmsson, B. J. et al. Modeling Linkage Disequilibrium Increases Accuracy of Polygenic Risk Scores. American journal of human genetics 97, 576–592 (2015).
OpenUrl CrossRef PubMed
↵
Henderson, C. R. Best linear unbiased estimation and prediction under a selection model. Biometrics 31, 423–447 (1975).
OpenUrl CrossRef PubMed Web of Science
de los Campos, G., Gianola, D. & Allison, D. B. Predicting genetic predisposition in humans: the promise of whole-genome markers. Nat. Rev. Genet. 11, 880–886 (2010).
OpenUrl CrossRef PubMed
↵
Speed, D. & Balding, D. J. MultiBLUP: improved SNP-based prediction for complex traits. Genome research 24, 1550–1557 (2014).
OpenUrl Abstract/FREE Full Text
↵
Zhou, X., Carbonetto, P. & Stephens, M. Polygenic Modeling with Bayesian Sparse Linear Mixed Models. PLoS genetics 9, (2013).
↵
Moser, G. et al. Simultaneous Discovery, Estimation and Prediction Analysis of Complex Traits Using a Bayesian Mixture Model. PLoS genetics 11, e1004969 (2015).
OpenUrl
↵
Wray, N. R. et al. Pitfalls of predicting complex traits from SNPs. Nat. Rev. Genet. 14, 507–515 (2013).
OpenUrl CrossRef PubMed
↵
Palla, L. & Dudbridge, F. A. Fast Method that Uses Polygenic Scores to Estimate the Variance Explained by Genome-wide Marker Panels and the Proportion of Variants Affecting a Trait. American journal of human genetics 97, 250–259 (2015).
OpenUrl CrossRef PubMed
↵
Bulik-Sullivan, B. K. et al. LD Score regression distinguishes confounding from polygenicity in genome-wide association studies. Nature genetics 47, 291–295 (2015).
OpenUrl CrossRef PubMed
↵
Yang, J. et al. Genomic inflation factors under polygenic inheritance. European Journal of Human Genetics 19, 807–812 (2011).
OpenUrl CrossRef PubMed
↵
Yang, J. et al. Common SNPs explain a large proportion of the heritability for human height. Nature genetics 42, 565–569 (2010).
OpenUrl CrossRef PubMed Web of Science
↵
Loh, P. R. et al. Efficient Bayesian mixed-model analysis increases association power in large cohorts. Nature genetics 47, 284–290 (2015).
OpenUrl CrossRef PubMed
↵
Finucane, H. K. et al. Partitioning heritability by functional annotation using genome-wide association summary statistics. Nature genetics 47, 1228–1235 (2015).
OpenUrl CrossRef PubMed
↵
Yang, J. et al. Genome partitioning of genetic variation for complex traits using common SNPs. Nature genetics 43, 519–525 (2011).
OpenUrl CrossRef PubMed
↵
Loh, P.-R. et al. Contrasting genetic architectures of schizophrenia and other complex diseases using fast variance-components analysis. Nature genetics 47, 1385–1392 (2015).
OpenUrl CrossRef PubMed
↵
Cotsapas, C. et al. Pervasive Sharing of Genetic Effects in Autoimmune Disease. PLoS genetics 7, e1002254 (2011).
OpenUrl
Sivakumaran, S. et al. Abundant pleiotropy in human complex diseases and traits. American journal of human genetics 89, 607–618 (2011).
OpenUrl CrossRef PubMed
Styrkársdottir, U. et al. Nonsense mutation in the LGR4 gene is associated with several human diseases and other traits. Nature 497, 517–520 (2013).
OpenUrl CrossRef PubMed Web of Science
Denny, J. C. et al. Systematic comparison of phenome-wide association study of electronic medical record data and genome-wide association study data. Nature Biotechnology 31, 1102–1110 (2013).
OpenUrl CrossRef PubMed
Gusev, A. et al. Quantifying missing heritability at known GWAS loci. PLoS genetics 9, e1003993 (2013).
OpenUrl CrossRef
↵
Stefansson, H. et al. CNVs conferring risk of autism or schizophrenia affect cognition in controls. Nature 505, 361–366 (2014).
OpenUrl CrossRef PubMed Web of Science
↵
Pickrell, J. K. et al. Detection and interpretation of shared genetic influences on 42 human traits. Nature genetics (2016). doi:10.1038/ng.3570
OpenUrl CrossRef PubMed
↵
Voight, B. F. et al. Plasma HDL cholesterol and risk of myocardial infarction: a mendelian randomisation study. Lancet 380, 572–580 (2012).
OpenUrl CrossRef PubMed Web of Science
Burgess, S., Butterworth, A. & Thompson, S. G. Mendelian randomization analysis with multiple genetic variants using summarized data. Genetic epidemiology 37, 658–665 (2013).
OpenUrl CrossRef PubMed
↵
Burgess, S., Dudbridge, F. & Thompson, S. G. Combining information on multiple instrumental variables in Mendelian randomization: comparison of allele score and summarized data methods. Stat Med 35, 1880–1906 (2016).
OpenUrl CrossRef PubMed
↵
Lee, S. H. et al. Genetic relationship between five psychiatric disorders estimated from genome-wide SNPs. Nature genetics 45, 984–+ (2013).
OpenUrl CrossRef PubMed
↵
Bulik-Sullivan, B. et al. An atlas of genetic correlations across human diseases and traits. Nature genetics 47, 1236–1241 (2015).
OpenUrl CrossRef PubMed
↵
Brown, B. C., Asian Genetic Epidemiology Network-Type 2 Diabetes, C., Ye, C. J., Price, A. L. & Zaitlen, N. Transethnic Genetic-Correlation Estimates from Summary Statistics. American journal of human genetics 99, 76–88 (2016).
OpenUrl CrossRef PubMed
↵
Nature Genetics. Asking for more. Nature genetics 44, 733–733 (2012).
OpenUrl CrossRef PubMed
↵
Homer, N. et al. Resolving Individuals Contributing Trace Amounts of DNA to Highly Complex Mixtures Using High-Density SNP Genotyping Microarrays. PLoS genetics 4, e1000167 (2008).
OpenUrl CrossRef
↵
Yang, J., Zaitlen, N. A., Goddard, M. E., Visscher, P. M. & Price, A. L. Advantages and pitfalls in the application of mixed-model association methods. Nature genetics 46, 100–106 (2014).
OpenUrl CrossRef PubMed
↵
Sankararaman, S., Obozinski, G., Jordan, M. I. & Halperin, E. Genomic privacy and limits of individual detection in a pool. Nature genetics 41, 965–967 (2009).
OpenUrl CrossRef PubMed Web of Science
Visscher, P. M. & Hill, W. G. The Limits of Individual Identification from Sample Allele Frequencies: Theory and Statistical Analysis. PLoS genetics 5, e1000628 (2009).
OpenUrl
↵
Erlich, Y. & Narayanan, A. Routes for breaching and protecting genetic privacy. Nat. Rev. Genet. 15, 409–421 (2014).
OpenUrl CrossRef PubMed
↵
Madsen, B. E. & Browning, S. R. A groupwise association test for rare mutations using a weighted sum statistic. PLoS genetics 5, e1000384 (2009).
OpenUrl CrossRef
↵
Li, B. & Leal, S. M. Methods for detecting associations with rare variants for common diseases: application to analysis of sequence data. American journal of human genetics 83, 311–321 (2008).
Price, A. L. et al. Pooled association tests for rare variants in exon resequencing studies. American journal of human genetics 86, 832–838 (2010).
Neale, B. M. et al. Testing for an Unusual Distribution of Rare Variants. PLoS genetics 7, e1001322 (2011).
Wu, M. C. et al. Rare-Variant Association Testing for Sequencing Data with the Sequence Kernel Association Test. The American Journal of Human Genetics 89, 82–93 (2011).
Daetwyler, H. D., Villanueva, B. & Woolliams, J. A. Accuracy of Predicting the Genetic Risk of Disease Using a Genome-Wide Approach. PloS one 3, e3395 (2008).
Lee, S. H., Wray, N. R., Goddard, M. E. & Visscher, P. M. Estimating missing heritability for disease from genome-wide association studies. American journal of human genetics 88, 294–305 (2011).

Posted September 01, 2016.

Subject Area

Genetics
