Abstract
Our understanding of the genetic basis of human adaptation is biased toward loci of large phenotypic effect. Genome-wide association studies (GWAS) now enable the study of genetic adaptation in polygenic phenotypes. We test for polygenic adaptation among 187 world-wide human populations using polygenic scores constructed from GWAS of 34 complex traits. We identify signals of polygenic adaptation for anthropometric traits including height, infant head circumference (IHC), hip circumference and waist-to-hip ratio (WHR). Analysis of ancient DNA samples indicates that a north-south cline of height within Europe and a west-east cline across Eurasia can be traced to selection for increased height in two late Pleistocene hunter-gatherer populations living in western and west-central Eurasia. Our observation that IHC and WHR follow a latitudinal cline in Western Eurasia supports the role of natural selection driving Bergmann’s Rule in humans, consistent with thermoregulatory adaptation in response to latitudinal temperature variation.
Author’s Note on Failure to Replicate
After this preprint was posted, the UK Biobank dataset was released, providing a new and open GWAS resource. When attempting to replicate the height selection results from this preprint using GWAS data from the UK Biobank, we discovered that we could not. In subsequent analyses, we determined that both the GIANT consortium height GWAS data and another dataset that was used for replication were impacted by stratification issues that created, or at a minimum substantially inflated, the height selection signals reported here. The results of this second investigation, written together with additional coauthors, have now been published (https://elifesciences.org/articles/39725), along with another paper by a separate group of authors showing similar issues (https://elifesciences.org/articles/39702). A preliminary investigation shows that the other non-height based results may suffer from similar issues. We stand by the theory and statistical methods reported in this paper, and the paper can be cited for these results. However, we have shown that the data on which the major empirical results were based are not sound, and so these results should be treated with caution until replicated.
Main Text
Decades of research in anthropology have identified anthropometric traits that show evidence of biological adaptation to climatic conditions as humans spread around the world over the past hundred thousand years.1, 2, 3 However, it can be challenging to rule out non-heritable environmental factors,4, 5 as opposed to genetic variation, as the primary cause of these phenotypic differences.6 Even for phenotypes where there is some confidence that some of the phenotypic differences among populations are due in part to genetic differences, it is often hard to rule out genetic drift as an alternative explanation to selection.7, 8, 9 The development of population-genetic methods and genomic data resources during the last few decades has enabled the interrogation of adaptive hypotheses and has produced an expanding list of examples of plausible human adaptations.10, 11 However, such approaches are often inherently limited to detecting adaptation in genetically simple traits via large allele frequency changes at a small number of loci, whereas many adaptations likely involve highly polygenic traits and so are undetectable by most approaches.12, 13 Genome-wide association studies (GWAS) have now identified thousands of loci underlying the genetic basis of many complex traits.14, 15, 16 These studies offer an unprecedented opportunity to identify adaptation in recent human evolution by detecting subtle shifts in allele frequencies compounded over many GWAS loci.17, 18, 19, 20, 21, 22, 23
We conducted a broad screen for evidence of directional selection on variants that contribute to 34 polygenic traits by studying the distribution of their allele frequencies in a dataset of 187 human populations (2158 individuals across 161 populations from the Human Origins Panel24 and 2504 individuals across 26 populations of the 1000 Genomes phase 3 panel25), making use of prior large-scale GWAS for these traits (see Table S1). We divided the genome into 1700 non-overlapping and approximately independent linkage blocks26 and chose the SNP with the highest posterior probability of association within each block.27, 28 For each trait, we calculate a polygenic score for each population as a weighted sum of allele frequencies at each of these 1700 SNPs, with the GWAS effect sizes taken as the weights. Figure 1 shows the distribution of these scores for height across our population samples.
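The score construction described above can be sketched in a few lines of Python. This is a minimal illustration on made-up inputs: the names `polygenic_scores`, `effects` and `freqs` are hypothetical, and the factor of 2 (for diploid genotypes) is an assumption following the convention of Berg and Coop (2014).

```python
def polygenic_scores(freqs, effects, ploidy=2):
    """Return one polygenic score per population.

    freqs[m][l] is the frequency of the effect allele at SNP l in population m;
    effects[l] is the GWAS effect size of that allele (hypothetical values here).
    """
    return [ploidy * sum(a * p for a, p in zip(effects, pop)) for pop in freqs]


# Toy data: three SNPs (one per linkage block) and two populations.
effects = [0.02, -0.01, 0.03]
freqs = [
    [0.5, 0.2, 0.1],  # population A
    [0.6, 0.1, 0.3],  # population B
]
scores = polygenic_scores(freqs, effects)
print(scores)
```

Because the scores are linear in allele frequencies, small shifts in frequency at many loci can add up to a noticeable difference between populations.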
These polygenic scores should not be viewed as phenotypic predictions across populations. For example, the Maasai and Biaka pygmy populations have similar polygenic scores despite having dramatic differences in height.29 Discrepancies between polygenic scores and actual phenotypes may be expected to occur either because of purely environmental influences on phenotype, or due to gene-by-gene and gene-by-environment interactions. We also expect that the accuracy of these scores when viewed as predictions should decay with genetic distance from Europe (where the GWAS were carried out), due to changes in the structure of linkage disequilibrium (LD) between causal variants and tag SNPs picked up in GWAS, and because GWAS are biased toward discovering intermediate frequency variants, which will explain more variance in the region they are mapped in than outside of it. These caveats notwithstanding, the distribution of polygenic scores across populations is informative about the history of natural selection on a given phenotype,18 and a number of striking patterns are visible in their distribution. For example, there is a strong gradient in polygenic height scores running from east to west across Eurasia (Figure 1).
To explore whether patterns observed in the polygenic scores were caused by natural selection, we tested whether the observed distribution of polygenic scores across populations could plausibly have been generated under a neutral model of genetic drift. To understand this null model, consider that a neutrally evolving allele has the same expected frequency across a set of independently evolving sub-populations. However, due to genetic drift, individual sub-populations will deviate from this expected frequency, with the variance of the sub-population frequencies given by FST p (1 − p), where p is the ancestral allele frequency, and FST is Wright’s “fixation index,”30 which can be measured from genome-wide data.17, 31 Our polygenic scores sum the contributions of a large number of effectively unlinked loci, which under our null model will experience genetic drift independently. It follows that under a model of genetic drift, the polygenic score of each of a set of independent sub-populations will be normally distributed, with variance of VA FST, where VA is the additive genetic variance of polygenic scores in the ancestral population. Our test is based on a generalization of this simple relation in which we account for both variance and covariance among multiple populations that are non-independent due to common descent, migration, and admixture over the history of human evolution. Specifically, we model the joint distribution of polygenic scores as multivariate normal and use a generalized variance statistic (QX) to measure the over-dispersion of polygenic scores relative to the neutral prediction, which is taken as evidence in favor of natural selection driving differences among populations in polygenic scores (see Methods and our previous study18 for details).
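The single-locus drift variance quoted above can be checked with a small simulation. The sketch below assumes an idealized Wright-Fisher model with constant size and no migration, for which FST after t generations is 1 − (1 − 1/(2N))^t; the parameter values and function names are hypothetical and chosen only to keep the run time short.

```python
import random


def drift_once(p0, two_n, generations, rng):
    """Follow one sub-population: binomial resampling of 2N allele copies per generation."""
    p = p0
    for _ in range(generations):
        copies = sum(1 for _ in range(two_n) if rng.random() < p)
        p = copies / two_n
    return p


rng = random.Random(1)  # fixed seed so the run is reproducible
p0, two_n, generations, n_subpops = 0.5, 50, 10, 2000

finals = [drift_once(p0, two_n, generations, rng) for _ in range(n_subpops)]

# Variance of sub-population frequencies around the ancestral frequency.
empirical_var = sum((p - p0) ** 2 for p in finals) / n_subpops

# Neutral expectation: F_ST * p * (1 - p).
f_st = 1 - (1 - 1 / two_n) ** generations
expected_var = f_st * p0 * (1 - p0)

print(empirical_var, expected_var)
```

The two printed values should agree up to sampling noise, which is the relation the null model generalizes to sums over many loci and to correlated populations.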
Our approach is similar to classic tests of adaptation on phenotypes measured in common gardens, which rely on comparisons of the within and among-population additive genetic variance for phenotypes and neutral markers, i.e. QST/FST comparisons.32, 33, 34 Importantly, the neutral distribution we derive holds independent of whether the loci truly influence the trait in an additive manner (with respect to each other or the environment), and whether the GWAS loci are truly causal or merely imperfect tags. However, population structure in the original GWAS panels can confound signals of polygenic adaptation.18, 20 Modern methods are generally considered to be effective at controlling for the effects of population structure,35 and we proceed assuming that it has been adequately accounted for in the original GWAS panels.
We applied our test to each of the 34 traits across all populations, as well as within nine restricted regional groupings (Figure 2 and Table S3). Using our test across all populations as a general test for the impact of selection anywhere in the dataset, we find 5 signals of selection after controlling for multiple testing (p < 0.05/34). In each case of significant over-dispersion, the signal represents a small but systematic shift in allele frequency of a few percent across many loci, which would be undetectable by standard population-genetic tests for selection (see Table S6), such that the majority of the variance in polygenic scores is within populations as opposed to among populations (see Table S4). The traits involved include height, infant head circumference (IHC), hip circumference, waist-hip ratio (WHR), and type 2 diabetes (T2D). Although the sixth-strongest signal, waist circumference, failed to meet the multiple-testing correction, we include it in subsequent analyses due to its obvious relationship to WHR. We also found signals of selection on polygenic scores constructed for waist and hip circumference and waist-hip ratio when adjusted for BMI (Table S3), but we focus on the unadjusted versions for ease of interpretation. We do not replicate a previously reported signal of selection on BMI within Europe, but also note that the previous study used many more SNPs than we have in constructing polygenic scores, which likely explains the difference.20
The predominantly European ascertainment of GWAS loci can lead to apparent deviations from neutrality. Therefore all p values in Figure 2 and throughout the paper are derived from comparing test statistics against frequency-matched empirical controls, unless otherwise stated (see Text S1.3). This empirical matching is an important control. For example, the distribution of polygenic scores for schizophrenia shows a signal of over-dispersion under the naive null hypothesis, but not after controlling for the effects of ascertainment. More generally, the ascertainment and selection against disease phenotypes pose difficulties for the interpretation of tests of differentiation. Thus, although we see a signal of selection for decreased T2D polygenic scores in Europe, the interpretation of this signal likely requires the development of more explicit models of selection on disease traits (section S1.4).
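One way to picture frequency matching is to bin SNPs by allele frequency and draw each control SNP from the same bin as the GWAS SNP it stands in for. The sketch below is only an illustration of that idea: the bin width, the function names and the toy frequencies are assumptions, and the procedure actually used is described in Text S1.3.

```python
import random
from collections import defaultdict


def frequency_bin(freq, n_bins=20):
    """Map a frequency in [0, 1] to a bin index (5% wide bins by default)."""
    return min(int(freq * n_bins), n_bins - 1)


def matched_control_set(gwas_freqs, genome_freqs, rng, n_bins=20):
    """Draw one control SNP index per GWAS SNP from the same frequency bin.

    Raises IndexError if a bin contains no candidate control SNPs.
    """
    by_bin = defaultdict(list)
    for idx, freq in enumerate(genome_freqs):
        by_bin[frequency_bin(freq, n_bins)].append(idx)
    return [rng.choice(by_bin[frequency_bin(f, n_bins)]) for f in gwas_freqs]


# Toy example: two GWAS SNPs and five candidate control SNPs.
rng = random.Random(0)
gwas_freqs = [0.11, 0.52]
genome_freqs = [0.10, 0.105, 0.50, 0.51, 0.90]
controls = matched_control_set(gwas_freqs, genome_freqs, rng)
print(controls)
```

Repeating the draw many times and recomputing the test statistic on each control set would give an empirical null distribution against which the observed statistic can be compared.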
The Geography of Selection on Height
In addition to the known gradient of increased polygenic height scores in northern Europeans relative to southern Europeans (latitude correlation within Europe p = 6.3 × 10⁻⁶, see S2 and Methods for statistical details),17, 18, 19, 20, 36 we also find evidence that natural selection has impacted polygenic height scores well outside of modern Europe. Polygenic scores decline sharply from west to east across Eurasia in a way that cannot be predicted by a neutral model (longitude correlation across Eurasia, p = 4.46 × 10⁻¹⁵; Figure 1), and they are overdispersed within each of our four population clusters (north, south/central, east, and west) across Asia, as well as among Native Americans (Figure 2). Does this broadly Eurasian signal represent multiple independent episodes of selection on the genetic basis of height, or can it be explained by ancient selection on one or just a few populations, with modern signals reflecting variation in the extent to which modern populations derive ancestry from these ancient populations? For example, the signal of selection on height in East Asia is driven entirely by the Tu population sample, who have the highest polygenic height score among East Asian samples (p = 0.4329 for height in East Asia after the Tu are removed). Does this unusually high polygenic score reflect recent selection, or the fact that the Tu derive a proportion of their ancestry from an ∼800-year-old admixture event involving a population resembling modern Europeans37?
To test whether the height signal within Asia is due to a selective event shared with Europeans, we predicted the polygenic height scores across Asia given the deviation of European populations from the Asian mean, and each Asian sample’s genome-wide relationship to the European samples (see Figure 3, and Methods for details). We find that this prediction conditioned on Europeans is sufficient to explain most of the divergence between the Tu and the other East Asian populations in our dataset (see sky blue dots in Figure 3), and eliminates the signal of selection among East Asian populations (p = 0.099 after conditioning). In fact, all signals of differential selection on height across Asia can be eliminated using these conditional predictions (p = 0.2019 after conditioning). This suggests that most of the selected divergence in our polygenic height scores across Eurasia can be attributed either to events which are predominantly ancestral to modern Europeans (but which have impacted other regions via admixture), or which lie along an early lineage which has contributed ancestry broadly across Eurasia.
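The conditional prediction described above can be illustrated with the standard identity for conditioning a multivariate normal. As a sketch, assume the neutral model has the form $\vec{Z} \sim \mathrm{MVN}(\mu\vec{1}, 2V_A\mathbf{F})$ used in Berg and Coop (2014), and partition the populations into an Asian block (A) and a European block (E). The block labels and the partitioned notation are introduced here for illustration only; the exact procedure is the one given in Methods.

```latex
\mathbf{F} =
\begin{pmatrix}
  \mathbf{F}_{AA} & \mathbf{F}_{AE} \\
  \mathbf{F}_{EA} & \mathbf{F}_{EE}
\end{pmatrix},
\qquad
\mathbb{E}\left[\vec{Z}_A \mid \vec{Z}_E\right]
  = \mu\vec{1} + \mathbf{F}_{AE}\mathbf{F}_{EE}^{-1}\left(\vec{Z}_E - \mu\vec{1}\right),
\qquad
\mathrm{Cov}\left[\vec{Z}_A \mid \vec{Z}_E\right]
  = 2V_A\left(\mathbf{F}_{AA} - \mathbf{F}_{AE}\mathbf{F}_{EE}^{-1}\mathbf{F}_{EA}\right).
```

Under these assumptions, an Asian population that shares more genome-wide ancestry with Europeans (larger entries in $\mathbf{F}_{AE}$) is predicted to inherit more of the European deviation, and the test for over-dispersion is then applied to what remains after this prediction is subtracted.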
To gain further clarity about the history of selection on height, we examined polygenic height scores in a set of ancient DNA samples from Western Eurasia.19, 38, 39 In Figure 4A we plot estimates of the polygenic score through time for ancient and modern samples, and in Figure 4B a heatmap of signed p-values from our test of selection applied to pairs of populations (for more detail see Text S1.5). The earliest unambiguous signal of selection for increased height is found approximately 15,000 years ago in the Villabruna cluster of hunter-gatherers, who have significantly increased polygenic scores relative to earlier Pleistocene hunter-gatherers (e.g. Villabruna vs Ust’-Ishim p = 0.0015, Villabruna vs Kostenki14 p = 0.0244, Villabruna vs Vestonice p = 0.003). The Mal’ta sample also appears to have an elevated polygenic score, on par with modern Europeans, but it is not significantly different from the earlier Pleistocene hunter-gatherers in pairwise tests. Moving into the Holocene, the western, Scandinavian, and Caucasus hunter-gatherers (WHG, SHG, and CHG respectively) all have significantly increased polygenic height scores when compared to any of the early Pleistocene hunter-gatherers. While WHG and SHG share a significant amount of ancestry with the Villabruna cluster, CHG do not, having separated approximately 46kya (along with Mal’ta and the Eastern hunter-gatherers: EHG) from the lineage leading to Villabruna/WHG.40, 38 Many ancient samples have ancestry nested within this split between Villabruna/WHG and CHG, but seemingly do not inherit a signal of selection for increased height (including Pleistocene hunter-gatherers Kostenki14 and Vestonice41, 38). It is therefore unlikely that the signals we observe can be traced to a single selective event common to Villabruna/WHG/SHG and to CHG.
Instead, our results are potentially consistent with at least two independent episodes of selection for increased height among Pleistocene and/or Holocene hunter-gatherers: at least one in the west, affecting Villabruna, WHG, and SHG, and one in the east, affecting CHG (and potentially Mal’ta).
The Yamnaya-related steppe samples (STP) also show a signal of selection for increased polygenic height scores (e.g. STP–Ust’-Ishim p = 0.001, STP–Vestonice p = 0.004).19, 42 This signal is likely due to the fact that they draw ∼45% of their ancestry from a population related to the CHG,19 from whom they are not significantly different (STP–CHG p = 0.62). In turn, the central European Late Neolithic and Bronze Age samples (CLB, including the Corded Ware and Bell Beaker culture) share the high polygenic height signal, and draw much of their ancestry from the expansion of the Yamnaya Steppe people.43, 44 In contrast, many of the European and Near East early Neolithic samples show little difference in scores relative to the early Pleistocene hunter-gatherers and have significantly lower polygenic height scores than Villabruna/WHG/SHG and CHG samples and the populations with Yamnaya ancestry (e.g. Levant–SHG p = 0.001, Levant–CHG p = 0.01, Levant–STP p = 0.014). We do not find support for Mathieson and colleagues’19 suggestion of selection for reduced height in Iberian Neolithic samples relative to Anatolian Neolithic (p = 0.90, see also42).
Taken together, our results suggest that much of the variation we observe among modern Eurasian populations for polygenic height scores can be traced to variation in the amount of WHG and Yamnaya/CHG ancestry they have inherited. For example, modern Europeans can be described approximately as a mixture between WHG, Yamnaya, and early Neolithic farmers from Anatolia,43 and the variation in the relative proportion of ancestry derived from these three sources explains a substantial amount of the variation in polygenic height scores (see Figure S10).19, 42 Similarly, Yamnaya/CHG ancestry decays from west to east across both northern and southern Asia,40, 44 consistent with the cline of decreasing polygenic height scores moving from west to east across the continent.
Finally, we note that we can reject neutrality in pairwise comparisons between modern East Asian populations and certain ancient samples that do not appear to be involved in the signal of selection for increased height in the west (e.g. CHB–EHG p = 0.004, CHB–Levant p = 0.014, Mal’ta–CHB p = 0.006). As these ancient populations are distantly related to one another, and show no other signals of selection on height, this may indicate that selection drove polygenic height scores down somewhere in the history of East Asians. However, the interpretation of this signal is complicated by the fact that we cannot completely exclude that polygenic height scores were selected up in these ancient populations. Clarifying this signal will likely require investigation via more explicit models of human demographic history23 as well as the incorporation of height GWAS from East Asia.
Selection on Body Shape Polygenic Scores
As four out of the next five strongest signals beyond height also represent anthropometric traits, we focus the remainder of our efforts on these phenotypes. Due to genetic correlations between traits, it is possible that signals of selection on two (or more) distinct phenotypes actually represent only a single episode of selection, where one trait responds indirectly to selection on the correlated trait. Because the genetic correlation with height varies among these phenotypes (hip circumference: r = 0.39, IHC: r = 0.268, waist circumference: r = 0.22, and WHR: r = −0.08),45, 46 we expect a priori that signals for more tightly correlated phenotypes are more likely due to a correlated response to selection on height, whereas for example the WHR signal is more likely to be independent.
To test whether the new signals we observe represent selective events distinguishable from the height signal, we developed a multi-trait extension to our null model based on the quantitative-genetic multivariate-selection model of Lande and Arnold47 (see Methods and Supplementary Text Section S1.6). We condition on the observed polygenic height scores, and test whether the signal of selection on a second trait is still significant after accounting for a genetic correlation with height (a non-significant p-value is consistent with a correlated response to selection on height). Applying this test to our entire panel of populations, we find that conditioning on height ablates much of the signal for hip circumference (p = 0.0186 compared to p = 1.12 × 10⁻⁴ when not conditioning on height), whereas signals in IHC (p = 1.11 × 10⁻⁵ vs p = 5.37 × 10⁻⁸) and WHR (p = 3.57 × 10⁻⁸ vs p = 3.38 × 10⁻⁷) are less affected. Restricting to European populations only, height is better able to explain hip circumference (p = 0.1152 vs p = 3.4 × 10⁻³), waist circumference (p = 0.0104 vs p = 2.63 × 10⁻³), and IHC (p = 5.1 × 10⁻³ vs p = 1.41 × 10⁻⁸) signals, while the signal of selection on WHR again remains strong even after conditioning on height (p = 1.92 × 10⁻⁸ vs p = 6.03 × 10⁻¹⁰). WHR is genetically correlated within populations with hip (r = 0.316) and waist circumference (r = 0.729), but not with IHC (r = 0.01).45, 46 Conditioning on WHR is sufficient to explain waist circumference (global p = 0.1523 vs p = 3 × 10⁻³, Europe p = 0.5178 vs p = 2.6 × 10⁻³), but signals in hip circumference, IHC, and height are all independent of WHR (see Table S4). Together, these results suggest that we can distinguish the action of natural selection along a minimum of two phenotypic dimensions (i.e. height and WHR, or unmeasured phenotypes closely correlated to them).
The signal of selection observed for hip circumference is likely due at least in part to selection on height, and the waist circumference signal is probably due to selection on a combination of height and WHR (or closely correlated phenotypes; we provide additional evidence for this claim in supplement section S1.6.2). Whereas IHC shows some evidence of being influenced by selection on height, a correlated response to height seems not to fully explain this signal.
Signals of divergence for both IHC and WHR polygenic scores are confined mostly to Europe and West Asia. For both traits the null model gives a significantly improved fit to the data when conditioned on Europe to explain West Asia, and similarly when conditioned on West Asia to explain Europe (Table S5). This suggests that, as is the case for Eurasian height scores, a substantial fraction of the divergence in IHC and WHR polygenic scores among modern populations across western Eurasia reflects divergence among ancient populations and subsequent mixture rather than recent selection.
Bergmann’s Rule and Thermoregulatory Adaptation
For both IHC and WHR, the selective signal in Western Eurasia can be captured in large part by strong, positive latitudinal clines (p = 3.16 × 10⁻¹⁵ for IHC and p = 3.16 × 10⁻⁷ for WHR; Figure 6). These clines in polygenic scores support independent phenotypic evidence for larger and wider bodies and rounder skulls at high latitudes,48, 1, 49, 2, 50, 51, 3 consistent with Bergmann’s Rule,52, 53 and add genetic support for a thermoregulatory hypothesis for morphological adaptation, whereby individuals in colder environments are thought to have adapted to improve heat conservation by decreasing their surface area to volume ratio.
A broad range of selective mechanisms have been proposed to act on height variation.54 Because we do not detect any signal of selection on age at menarche, we think it unlikely that the height signal represents a correlated response due to life-history mediated selection on age at reproductive maturity.55 It has also been suggested that selection on height may be explained as a thermoregulatory adaptation.54 However, because the surface area to volume ratio is approximately independent of height,56, 2 the effect of height SNPs on this ratio is mediated almost entirely through their effect on circumference (hip and/or waist; see section S1.8). Because the signal of selection on height cannot be explained by conditioning on hip and waist circumference, it seems that the thermoregulation hypothesis cannot fully explain the signal of selection on height.
A second eco-geographic rule relevant to height is Allen’s rule,57 which predicts relatively shorter limbs in colder environments, again consistent with adaptation on the basis of thermoregulation. In support of this, human populations in colder environments are observed to have proportionally shorter legs compared to those in warmer environments.49, 58 However, we detect no signal of selection on polygenic scores for the ratio of sitting to standing height (SHR), a measure of leg length relative to total body height.59 Indeed, by combining our height SNPs with their effect on SHR, we find a strong signal that increases in both leg length and torso length underlie the selective signal on height from North to South within Europe, and from East to West across Eurasia (see S1.9). This again suggests that thermoregulatory concerns are unlikely to fully explain signals of selection on height.
Discussion
The study of polygenic adaptation provides new avenues for the study of human evolution, and promises a new synthesis of physical anthropology and human genetics. Here, we undertake a broad scan for evidence of polygenic adaptive divergence among modern human populations, with body size and shape phenotypes providing most of our strongest signals. We show for the first time that it is possible to reject a neutral model of evolution at height-associated loci in comparisons between populations outside of Europe. Using ancient DNA, we show that patterns seen across modern populations are consistent with two independent episodes of selection for increased height in Pleistocene hunter-gatherer populations that lived in western and west-central Eurasia during or shortly after the last glacial maximum, and then distributed ancestry widely across the continent. We also provide evidence for adaptive divergence of IHC and WHR in western Eurasia, independent of selection on height, and show that signals of selection on hip and waist circumference can likely be explained as correlated responses to selection on height and WHR (or some other closely correlated phenotypes).
It is conspicuous that the signals of adaptive divergence that we detect are mostly localized to western Eurasia, even in cases where it seems implausible that observed phenotypic differences could have been generated under neutrality (e.g. Maasai vs Biaka pygmy). However, the fact that we do not detect departures from neutrality in such cases should not necessarily be taken as evidence against selection. We should expect to be better-powered to detect selective events in populations more closely related to Europeans for two reasons. First, changes in the structure of linkage disequilibrium (LD) across populations should lead GWAS variants to tag causal variation best in populations genetically close to the European-ancestry GWAS panels.60 Second, gene-by-environment and gene-by-gene interactions can lead to changes in the additive effects of individual loci among populations,61 and therefore in the way that they respond to selection on the phenotype. We expect that these difficulties can be overcome or mitigated in the future through a combination of well-powered GWAS in multiple populations of non-European ancestry, access to a wider array of ancient DNA samples, and improved frameworks for the interpretation of signals of polygenic adaptation.23
The existence of latitudinal trends in the polygenic scores for WHR and IHC supports the notion that some of the clinal phenotypic variation in body shape typically thought to represent thermoregulatory adaptation can be attributed to genetic variation driven by selection, while the ability of simple models to unify signals across broad geographic regions again suggests that these patterns could have been generated by a limited number of selective events. Evidence for adaptation on the basis of specific environmental pressures is most convincing when multiple populations independently converge on the same phenotype in the face of the same environmental pressure, a pattern for which we currently lack evidence. Therefore, while our evidence is consistent with adaptation to temperature environments, alternative explanations (e.g. adaptation to diet) are plausible.
1 Methods
1.1 Population Genetics Datasets
We downloaded the 1000 Genomes phase 3 release data from the 1000 Genomes FTP portal.25 We also used data from the Human Origins fully public panel,24 which was imputed using the 1000 Genomes phase 3 as reference, using the Michigan imputation server,62 and restricting to SNPs with an imputation quality score (in terms of predicted r²) of 0.8 or greater (pers. comm. Joe Pickrell). The original genotype data can be downloaded from the Reich lab website (https://reich.hms.harvard.edu/datasets).
This combined dataset represents samples from 2504 people from 26 populations in the 1000 Genomes dataset and 2158 people across 161 populations from the Human Origins dataset, for a total of 4662 samples from 187 populations (S2). For global analyses we include all 187 populations. In regional analyses we exclude populations with significant recent (i.e. < 500 years) African/non-African admixture to avoid confounding admixture with signals of recent selection within regions (see S2 and S1 for the regions).
1.2 Selection of GWAS SNPs
We took public GWAS results for a set of traits28 and combined them with additional anthropometric traits from the GIANT consortium and a subset of Early Growth phenotypes contributed by EGG Consortium. Table S1 gives a full list of the traits included in this study and the relevant references. For each trait we selected a set of SNPs with which to construct our polygenic scores as follows. For each SNP, we calculated an approximate Bayes factor summarizing the evidence for association at that SNP via the method of Wakefield,63 following Pickrell et al (2016)28 (see their supplementary note section 1.2.1). We then used a published set of 1700 non-overlapping linkage disequilibrium blocks26 to divide the genome, after which we selected the single SNP with the strongest approximate Bayes factor in favor of association within each block to carry forward for analyses.
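The SNP-selection step can be sketched as follows. The approximate Bayes factor uses Wakefield's closed form for a normal prior on the effect size; the prior variance of 0.1, the function names and the toy summary statistics are assumptions made for illustration and are not taken from the study.

```python
import math


def wakefield_log_abf(beta, se, prior_var=0.1):
    """Log approximate Bayes factor in favor of association (Wakefield).

    beta and se are the GWAS effect estimate and its standard error;
    prior_var is the assumed prior variance of the true effect.
    """
    v = se ** 2
    z2 = (beta / se) ** 2
    shrink = prior_var / (v + prior_var)
    return 0.5 * math.log(1 - shrink) + 0.5 * z2 * shrink


def top_snp_per_block(snps):
    """Keep the SNP with the largest Bayes factor within each linkage block."""
    best = {}
    for snp in snps:
        score = wakefield_log_abf(snp["beta"], snp["se"])
        block = snp["block"]
        if block not in best or score > best[block][0]:
            best[block] = (score, snp["id"])
    return {block: snp_id for block, (_, snp_id) in best.items()}


# Toy summary statistics: three SNPs in two blocks.
snps = [
    {"id": "rs1", "block": 1, "beta": 0.01, "se": 0.01},
    {"id": "rs2", "block": 1, "beta": 0.05, "se": 0.01},
    {"id": "rs3", "block": 2, "beta": -0.03, "se": 0.01},
]
chosen = top_snp_per_block(snps)
print(chosen)
```

Note that every block contributes exactly one SNP, whether or not that SNP reaches genome-wide significance.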
1.3 Polygenic Scores and Null Model
Given a set of L SNPs associated with a trait (L ≈ 1700), we construct the vector of polygenic scores $\vec{Z}$ across all M = 187 populations by taking the sum of allele frequencies across the L sites (the vector $\vec{p}_\ell$ at site ℓ), weighting each allele’s frequency by its effect on the trait ($\alpha_\ell$) to give
$$\vec{Z} = 2\sum_{\ell=1}^{L} \alpha_\ell \vec{p}_\ell. \tag{1}$$
For each trait, we construct a null model for the joint distribution of polygenic scores across populations, assuming
$$\vec{Z} \sim \mathrm{MVN}\left(\mu\vec{1},\, 2V_A\mathbf{F}\right), \tag{2}$$
where $\mu = 2\sum_{\ell} \alpha_\ell \epsilon_\ell$ and $V_A = 2\sum_{\ell} \alpha_\ell^2 \epsilon_\ell (1 - \epsilon_\ell)$. Here $\epsilon_\ell$ is the mean of $\vec{p}_\ell$ across population samples (weighting all population samples equally), and $\mathbf{F}$ is the M × M population-level genetic covariance matrix.18 All polygenic scores are plotted in centered standardized form, $(Z_i - \mu)/\sqrt{2V_A}$. We use the Mahalanobis distance of $\vec{Z}$ from its distribution under the null,
$$Q_X = \frac{(\vec{Z} - \mu\vec{1})^{T}\,\mathbf{F}^{-1}\,(\vec{Z} - \mu\vec{1})}{2V_A}, \tag{3}$$
as a natural test statistic to assess the ability of the null model to explain the data (see Berg and Coop (2014)18 for an extended discussion). This test statistic should be χ² distributed with M − 1 degrees of freedom under neutrality. However, in practice we are concerned that the ascertainment of GWAS loci may invalidate our null model, so we compare the test statistic to an empirical null (see Section S1.3).
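The Mahalanobis-distance statistic can be sketched in plain Python. The sketch assumes the form Q_X = (Z − μ1)ᵀ F⁻¹ (Z − μ1) / (2 V_A) from Berg and Coop (2014), and takes μ, V_A and a positive-definite F as given inputs. It uses a hand-written Cholesky factorization so that it needs only the standard library, and all function names are hypothetical.

```python
import math


def cholesky(a):
    """Lower-triangular factor of a symmetric positive-definite matrix."""
    n = len(a)
    low = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(low[i][k] * low[j][k] for k in range(j))
            if i == j:
                low[i][j] = math.sqrt(a[i][i] - s)  # fails if a is not positive definite
            else:
                low[i][j] = (a[i][j] - s) / low[j][j]
    return low


def forward_solve(low, b):
    """Solve low * x = b for x by forward substitution."""
    x = []
    for i, row in enumerate(low):
        x.append((b[i] - sum(row[k] * x[k] for k in range(i))) / row[i])
    return x


def q_x(scores, mu, v_a, f):
    """Q_X: squared length of the decorrelated, centered scores, scaled by 2 * V_A."""
    centered = [z - mu for z in scores]
    rotated = forward_solve(cholesky(f), centered)
    return sum(r * r for r in rotated) / (2 * v_a)


# Toy example: two correlated populations.
f = [[1.0, 0.5], [0.5, 1.0]]
print(q_x([1.0, 1.0], mu=0.0, v_a=0.5, f=f))
```

Solving against the Cholesky factor avoids forming F⁻¹ explicitly, and the rotated scores it produces are uncorrelated under the neutral model.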
1.4 Latitudinal and Longitudinal Correlations
We also test for selection-driven correlations between geographic variables (e.g. latitude) and a subset of our polygenic scores (see Berg and Coop (2014)18 and Section S1.1 for more details of the test). We take the standardized geographic variable and polygenic scores, and then rotate these vectors by the inverse Cholesky decomposition of the relatedness matrix F. These rotated vectors are in a reference frame where the populations represent independent contrasts under the neutral model. We take as our test statistic the covariance of these rotated vectors. We calculate the significance of the statistic by comparing to a null distribution generated by calculating null sets of polygenic scores assembled from resampled SNPs with derived frequency matched to the CEU population sample so as to mimic the effects of the GWAS ascertainment.
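The rotation step can be sketched as below. For brevity the sketch standardizes both vectors by their own sample mean and standard deviation, which is a simplification of the standardization described in the text, and it takes the null statistics as a precomputed list rather than generating them from resampled SNPs. Function names are hypothetical.

```python
import math


def cholesky(a):
    """Lower-triangular factor of a symmetric positive-definite matrix."""
    n = len(a)
    low = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(low[i][k] * low[j][k] for k in range(j))
            if i == j:
                low[i][j] = math.sqrt(a[i][i] - s)
            else:
                low[i][j] = (a[i][j] - s) / low[j][j]
    return low


def forward_solve(low, b):
    """Apply the inverse Cholesky factor to b by forward substitution."""
    x = []
    for i, row in enumerate(low):
        x.append((b[i] - sum(row[k] * x[k] for k in range(i))) / row[i])
    return x


def standardize(values):
    """Center to mean zero and scale to unit (population) standard deviation."""
    mean = sum(values) / len(values)
    sd = math.sqrt(sum((v - mean) ** 2 for v in values) / len(values))
    return [(v - mean) / sd for v in values]


def rotated_covariance(geo, scores, f):
    """Covariance of the two vectors after rotating both by the inverse Cholesky factor of F."""
    low = cholesky(f)
    g = forward_solve(low, standardize(geo))
    z = forward_solve(low, standardize(scores))
    return sum(a * b for a, b in zip(g, z)) / len(g)


def empirical_p(observed, null_stats):
    """Two-sided empirical p-value with the usual +1 correction."""
    hits = sum(1 for s in null_stats if abs(s) >= abs(observed))
    return (hits + 1) / (len(null_stats) + 1)


identity = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
print(rotated_covariance([10.0, 20.0, 30.0], [1.0, 2.0, 3.0], identity))
```

With an identity relatedness matrix the rotation has no effect and the statistic reduces to an ordinary correlation; with shared ancestry, closely related populations are down-weighted because they do not provide independent evidence.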
1.5 Analysis of Ancient DNA
We included a combined dataset of 63 ancient Eurasian human population samples with date estimates from 45 kya to 2.5 kya.19, 38, 39 After combining these samples into pre-specified analysis clusters, we took a set of 19 populations that had < 10% of height SNPs missing (see Table S7 for a list of ancient populations included). We compare these to the modern population samples from the 1000 Genomes consortium data. We then took the subset of 724 of our 1700 height-associated GWAS SNPs with low levels of missing data in these 19 ancient populations (6.2% averaged over populations).
Polygenic height scores were calculated as in Eq. (1); for loci with no counts in an ancient population, we set the frequency to that in the combined rest of the sample. We construct the 95% credible intervals shown in Figure 4A by assuming that the posterior of the underlying population frequency is independent across loci and populations and follows a beta distribution, with a uniform prior distribution, which is updated by our binomial sample of ancient counts. Using the variance of the posterior distribution at each locus, we then calculated the variance of the polygenic score ($V_Z$), which follows from Eq. (1). The 95% credible-interval error bars in Figure 4A were then calculated as $\pm 1.96\sqrt{V_Z}$ for each population.
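The interval construction can be illustrated with a short sketch. It assumes a Beta(1, 1) prior updated by k allele counts out of n sampled chromosomes, the factor-of-2 convention for Eq. (1), and a point estimate built from the sample frequencies k/n. The function names are hypothetical, and loci with no counts (n = 0) are not handled here.

```python
def beta_posterior_variance(k, n):
    """Variance of Beta(k + 1, n - k + 1), the posterior under a uniform prior."""
    a, b = k + 1.0, n - k + 1.0
    return a * b / ((a + b) ** 2 * (a + b + 1.0))


def score_interval(effects, counts, totals):
    """Return (lower, score, upper) for one population.

    effects: effect sizes alpha_l.
    counts:  allele counts k_l at each locus.
    totals:  sampled chromosomes n_l at each locus (all > 0).
    """
    z = 2.0 * sum(a * k / n for a, k, n in zip(effects, counts, totals))
    # Independence across loci: variances add, scaled by (2 * alpha_l)^2.
    v_z = 4.0 * sum(a * a * beta_posterior_variance(k, n)
                    for a, k, n in zip(effects, counts, totals))
    half_width = 1.96 * v_z ** 0.5
    return z - half_width, z, z + half_width


# Toy example: one locus, 1 copy of the allele among 2 chromosomes.
print(score_interval([1.0], [1], [2]))
```

With small ancient sample sizes the posterior variance at each locus is large, which is why the resulting intervals are wide for sparsely sampled populations.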
For calculating $Q_X$ (Eq. (3)) for pairs of population samples, we restricted the SNP set to the loci that had counts in both samples. Our p-values are calculated assuming that the pairwise $Q_X$ statistic has a $\chi^2$ distribution with one degree of freedom. We also constructed a null by flipping the signs of the GWAS effects of the loci at random, and found the $\chi^2$ p-values to be well calibrated.
1.6 Two-Trait Conditional Tests
Because some of the traits we examine are genetically correlated with one another, we were concerned that signals of selection observed for one trait might reflect a response to selection on another correlated trait. To determine whether genetic correlations might be responsible for some of our signals, we developed a multitrait extension to our neutral model that accounts for genetic covariance among traits. The extension builds on the framework of Lande and Arnold.47
If $\vec{Z}_1$ and $\vec{Z}_2$ are vectors of polygenic scores for two different traits constructed according to equation (1), and the matrix $\mathbf{Z}$ contains these vectors as columns, then under neutrality the distribution of $\mathbf{Z}$ is approximately matrix normal,
$$\mathbf{Z} \sim \mathcal{MN}(\boldsymbol{\mu}, \mathbf{F}, \mathbf{G}),$$
where the matrix $\boldsymbol{\mu}$ contains the trait-specific means, $\mathbf{F}$ gives the population covariance structure among rows as in the single trait model, and $\mathbf{G}$ is the among trait additive genetic covariance matrix, the “G matrix” of multivariate quantitative genetics,47 estimated for a population ancestral to all populations in the sample. The diagonal elements of the 2 × 2 $\mathbf{G}$ matrix are given by the $V_A$ parameters from above in the single trait model and the off-diagonal element ($C_{A,12}$) corresponds to the additive genetic covariance between the two traits. Given this null model for the joint distribution of the two traits, we can construct a conditional model for the distribution of polygenic scores for trait 1, given the polygenic score observed for trait 2, as
$$\vec{Z}_1 \mid \vec{Z}_2 \sim \mathrm{MVN}(\vec{\mu}_{1|2}, V_{A,1|2}\mathbf{F}),$$
where
$$\vec{\mu}_{1|2} = \mu_1\vec{1} + \frac{C_{A,12}}{V_{A,2}}\left(\vec{Z}_2 - \mu_2\vec{1}\right) \qquad (6)$$
and
$$V_{A,1|2} = V_{A,1} - \frac{C_{A,12}^2}{V_{A,2}}. \qquad (7)$$
Given a value of $C_{A,12}$ we can then use these conditional means and variances in equation (3) to form a conditional $Q_X$ statistic and compare it to its null distribution. We take the failure to reject neutrality on the basis of the conditional $Q_X$ statistic as consistent with the hypothesis that any response to selection observed for trait 1 is a result of selection on trait 2. Some of the traits we study have non-linear allometric relationships with each other, but because our polygenic scores are linear by construction our tests are robust to this non-linearity (see Section S1.7).
We experimented with estimating $C_{A,12}$ on the basis of SNPs that overlap between the two traits in each genomic block. However, we were concerned that this approach to estimating genetic correlations would not provide a sufficient joint model for cases in which different SNPs within a block affect the two traits but are in linkage disequilibrium with one another, and therefore do not drift independently. To deal with this issue, we represent the genetic covariance among populations as $C_{A,12} = \rho\sqrt{V_{A,1}V_{A,2}}$, where $\rho$ represents the genetic correlation between the two sets of polygenic scores. We pursued a conservative strategy, testing a range of values for $\rho$ along a dense grid from −1 to 1 to ask whether any assumed genetic correlation between polygenic scores could plausibly allow one trait to be explained as a correlated response to another. As a further conservative measure, we set the genetic correlation used to calculate the conditional variance (Eq (7)) to zero, while allowing the $\rho$ used to compute the conditional mean (Eq (6)) to be non-zero. This is a conservative approach, as it fits our conditional prediction to the mean, but allows the variance of the null model to remain as large as in the unconditional model. The conditional two-trait p-values we present in the text, and the CIs shown in the two-trait Figure 5 and in the supplement, use this conservative approach. In practice our values of $\rho$ are consistent with estimates of genetic correlations obtained from the LD score approach,45, 46 given that our polygenic scores capture only a fraction of the total genetic variance for each trait.
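The conservative two-trait procedure can be sketched as follows, assuming the standard conditional-normal forms for the mean (Eq (6)) and variance (Eq (7)) and the parameterization $C_{A,12} = \rho\sqrt{V_{A,1}V_{A,2}}$. The function name, the `conservative` flag, the grid spacing, and the toy numbers are illustrative assumptions.

```python
def conditional_null(z2, mu1, mu2, v1, v2, rho, conservative=True):
    """Conditional mean vector and variance scale for trait 1 given trait 2.

    z2:       polygenic scores for trait 2, one per population.
    mu1, mu2: trait-specific means.
    v1, v2:   V_A parameters for the two traits.
    rho:      assumed genetic correlation between the two score sets.
    """
    c12 = rho * (v1 * v2) ** 0.5
    cond_mean = [mu1 + (c12 / v2) * (z - mu2) for z in z2]
    # Conservative variant: rho is treated as zero in the variance only,
    # so the null variance stays as large as in the unconditional model.
    cond_var = v1 if conservative else v1 - c12 ** 2 / v2
    return cond_mean, cond_var


# Dense grid of assumed correlations from -1 to 1 (spacing is arbitrary).
rho_grid = [i / 50.0 - 1.0 for i in range(101)]

# Toy example with made-up numbers.
mean, var = conditional_null([1.0, 3.0], 0.0, 2.0, 4.0, 1.0, 0.5)
print(mean, var)
```

Each value in `rho_grid` would yield its own conditional mean, which could then be used in place of the unconditional mean when forming the conditional $Q_X$ statistic.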
1.7 Single Trait Conditional Null Model
We also developed an extension of the null model for a single trait to test whether two (or more) signals of selection detected in different geographic regions might reflect a single ancestral event that occurred in an ancient population that has contributed ancestry broadly to modern populations.
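Assume for example that we have detected a signal of selection among the population samples from region A (e.g. Europe) and among the population samples from region B (e.g. Asia), and we would like to test whether the signal detected in region B is due to a selective event that is also responsible for generating a signal of selection in region A. We first reorganize our samples into two blocks for the two regions
$$\begin{pmatrix}\vec{Z}_A \\ \vec{Z}_B\end{pmatrix} \sim \mathrm{MVN}\left(\mu_B\vec{1},\; V_A\begin{pmatrix}\mathbf{F}_{AA} & \mathbf{F}_{AB} \\ \mathbf{F}_{BA} & \mathbf{F}_{BB}\end{pmatrix}\right),$$
where $\mu_B$ is the mean polygenic score in the set of populations being tested, the $\mathbf{F}_{\bullet,\bullet}$ refer to the sub-matrices of the relatedness matrix $\mathbf{F}$, and $\mathbf{F}$ itself has been recentered at the mean of the test set (i.e. region B). Then the conditional distribution of polygenic scores in region B given the polygenic scores observed in region A is
$$\vec{Z}_B \mid \vec{Z}_A \sim \mathrm{MVN}\left(\vec{\mu}_{B|A},\; V_A\mathbf{F}_{B|A}\right),$$
with
$$\vec{\mu}_{B|A} = \mu_B\vec{1} + \mathbf{F}_{BA}\mathbf{F}_{AA}^{-1}\left(\vec{Z}_A - \mu_B\vec{1}\right), \qquad \mathbf{F}_{B|A} = \mathbf{F}_{BB} - \mathbf{F}_{BA}\mathbf{F}_{AA}^{-1}\mathbf{F}_{AB}.$$
The conditional mean, $\vec{\mu}_{B|A}$, reflects the best predictions of population means in region B given the values observed in region A, whereas the conditional covariance matrix $\mathbf{F}_{B|A}$ reflects the scale and form of the variance around this expectation that arises from drift that is independent of drift in the ancestry of populations in region A.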
We can then test for over-dispersion of polygenic scores in region B given the observed polygenic scores in region A by using the conditional mean $\vec{\mu}_{B|A}$ and the conditional covariance $V_A\mathbf{F}_{B|A}$ in Eq. (3) to construct a conditional $Q_X$ score. We judge the statistical significance of this conditional $Q_X$ score by comparing it to a frequency-matched dataset, as with the standard test. We interpret a non-significant conditional $Q_X$ score for region B as evidence that any selective signal of overdispersion in B is well explained by genome-wide allele-sharing with A. We view this as evidence that the selection signal in B overlaps that in A, due to selection in shared ancestral populations and admixture.
In Figure 3 we plot the observed polygenic scores for Asia against the predicted polygenic scores for Asia (B), conditional on the Europe population sample polygenic scores (A). The error bars are 95% CIs for each population sample, obtained from the variances on the diagonal of $V_A\mathbf{F}_{B|A}$.
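The conditional prediction for region B given region A can be sketched with the standard conditional multivariate normal formulas, assuming both blocks share the mean $\mu_B$ after recentering $\mathbf{F}$ on region B. This is an illustration under those assumptions rather than the code used for Figure 3; the function names and toy matrices are hypothetical.

```python
def solve(A, B):
    """Solve A X = B by Gauss-Jordan elimination with partial pivoting."""
    n = len(A)
    aug = [list(A[i]) + list(B[i]) for i in range(n)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        pv = aug[col][col]
        aug[col] = [v / pv for v in aug[col]]
        for r in range(n):
            if r != col:
                f = aug[r][col]
                aug[r] = [v - f * w for v, w in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def conditional_region(F_AA, F_AB, F_BB, z_A, mu_B):
    """Return (mu_B|A, F_B|A) for region B given the scores in region A.

    F_AA: nA x nA block, F_AB: nA x nB block, F_BB: nB x nB block.
    """
    nA, nB = len(F_AA), len(F_BB)
    # One solve gives F_AA^{-1} F_AB (first nB columns) and
    # F_AA^{-1} (z_A - mu_B) (last column).
    rhs = [list(F_AB[i]) + [z_A[i] - mu_B] for i in range(nA)]
    W = solve(F_AA, rhs)
    cond_mean = [mu_B + sum(F_AB[i][j] * W[i][nB] for i in range(nA))
                 for j in range(nB)]
    cond_cov = [[F_BB[j][k] - sum(F_AB[i][j] * W[i][k] for i in range(nA))
                 for k in range(nB)] for j in range(nB)]
    return cond_mean, cond_cov


# Toy example: two populations in A, one in B (made-up numbers).
mean, cov = conditional_region([[2.0, 0.0], [0.0, 4.0]], [[1.0], [2.0]],
                               [[3.0]], [1.0, 2.0], 0.0)
print(mean, cov)
```

Under these assumptions, a 95% interval for population j in region B would be the conditional mean plus or minus 1.96 times the square root of $V_A$ multiplied by the j-th diagonal entry of the returned matrix.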
Acknowledgements
We thank the Coop Lab and Doc Edge, Iain Mathieson, Emily Josephs, Joe Pickrell, Molly Przeworski, David Reich, Jeff Ross-Ibarra, Guy Sella, and Tim Weaver for helpful discussions and feedback on earlier drafts. The work was supported in part by an NSF GRFP (to JJB), the UC Davis Anthropology department (XZ), and NIGMS-NIH R01 grant GM108779 to GC. JJB was also supported in part by R01 grants GM115889 to Guy Sella and GM121372 to Molly Przeworski.