Abstract
Sensitivity to external demands is essential for adaptation to dynamic environments, but comes at the cost of increased risk of adverse outcomes when facing poor environmental conditions. Here we identify genetic loci associated with phenotypic variability in key brain structures: amygdala, pallidum, and intracranial volumes. Variance-controlling loci included genes with a documented role in brain and mental health and were not associated with the mean anatomical volumes.
Phenotypic variability is key in evolution, and partly reflects inter-individual differences in sensitivity to the environment1. Genetic studies of human neuroanatomy have identified shifts in mean phenotype distributions (e.g., mean brain volumes) between groups of individuals with different genotypes2, and have documented genetic overlaps with common brain and mental disorders3. Despite the evolutionary relevance of phenotypic dispersion evidenced in multiple species and traits1,4, the genetic architecture of variability in human brain morphology is elusive.
Phenotypic variance across genotypes can be interpreted in relation to robustness, i.e., the persistence of a system under perturbations1,4, and evolvability, the capacity for adaptive evolution5. High phenotypic robustness is indicated by low variation in the face of perturbations, i.e., phenotypes are strongly determined by a given genotype. In contrast, lack of robustness corresponds to high sensitivity, yielding phenotypes with overall larger deviations from the population mean in response to environmental, genetic or stochastic developmental factors. Neither increased nor decreased robustness confers evolutionary advantages per se1, and their consequences for adaptation need to be understood in view of the genotype-environment congruence. Reduced robustness (and thus increased variability of trait expression) can be conducive to adaptive change5, and increased variability of phenotypic expression can in itself also be favored by natural selection in fluctuating environments6. Thus, recognizing genetic markers of sensitivity can aid in identifying individuals who are more susceptible to negative outcomes when exposed to adverse factors (either genetic or environmental) and to optimal outcomes in the presence of favorable factors. Such variance-controlling genotypes may be conceived as genomic hotspots for gene-environment and/or gene-gene interactions, with high relevance for future genetic epidemiology studies7.
To provide a proof-of-principle of the hypothesis of a genetic regulation of brain volume variability, we conducted a genome-wide association study of intragenotypic variability in seven key subcortical regions and intracranial volume (ICV) using a harmonized genotype and imaging data analysis protocol in a lifespan sample (n=19,093 individuals; 3 to 91 years, mean age 47.8 years; 48% male; see Methods and Supplementary Information). Genome-wide association statistics were computed for genetic effects on the variance and mean of the volumetric feature distributions. Consistent with previous large-scale analyses on genetics of neuroimaging volumetric measures2,8, features included bilateral (sum of left and right) amygdala, caudate nucleus, hippocampus, nucleus accumbens, pallidum, putamen and thalamus, as well as ICV. Over 92% of the included participants were healthy controls (n=17,590); the remaining 8% were diagnosed with a brain disorder (n=1,503; including psychosis, depression, and neurodegenerative disorders, Supplementary Information). The analyses were conducted in a two-stage protocol. First, for each genotype, we conducted a standard association test for the inverse-normal transformed (INT) brain volumes9, adjusting for scanning site, sex, age, age squared, diagnosis, and ICV (for the subcortical volumes only). Second, the residuals from that model were INT-transformed and submitted to genome-wide Brown-Forsythe tests (the median-based variant of Levene's test) to investigate whether specific alleles are associated with elevated or reduced levels of phenotypic variability. For relevant markers, variances explained by mean and variance models were estimated from the INT-transformed volumes before fitting regression models using a previously reported approach7.
A mega-analysis of 19,093 unrelated subjects of European ancestry identified two loci associated with differential levels of phenotypic variability at genome-wide significance (p<5×10⁻⁸), and one with marginal significance (p=5.7×10⁻⁸), overall on three of the eight volumetric features (pallidum, ICV, and amygdala). Genomic inflation factors (lambda) ranged between 1.005 and 1.102 for the different variance-GWAS (Supplementary Information), and test statistics were adjusted using genomic control (based on the observation that here the F-statistic of the Brown-Forsythe test can be approximated by a chi-squared statistic7). A conventional mean phenotype GWAS with an additive model on INT-transformed phenotypes showed 14 significant loci influencing six volumetric traits (amygdala [2], caudate [2], hippocampus [4], pallidum [2], putamen [3] and ICV [1]), and two loci (from amygdala and ICV analyses) were close to significant (p<10⁻⁷). Manhattan plots for both mean- and variance-GWAS are displayed in the Supplementary Information.
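The genomic control adjustment mentioned above can be illustrated with a minimal sketch. It assumes that each variance-GWAS test statistic has already been expressed as a chi-squared statistic with 1 degree of freedom; the function names are hypothetical and the code is not the pipeline used in this study.

```python
from math import erfc, sqrt
from statistics import NormalDist, median

# Median of a chi-squared distribution with 1 degree of freedom (about 0.455),
# obtained from the standard normal quantile function.
CHI2_1DF_MEDIAN = NormalDist().inv_cdf(0.75) ** 2


def genomic_inflation(chi2_stats):
    """Return lambda: observed median statistic over the expected null median."""
    return median(chi2_stats) / CHI2_1DF_MEDIAN


def genomic_control(chi2_stats):
    """Rescale statistics by lambda (only when lambda > 1) and return p-values.

    Assumption: statistics are 1-df chi-squared values.
    """
    lam = genomic_inflation(chi2_stats)
    scale = max(lam, 1.0)
    adjusted = [s / scale for s in chi2_stats]
    # Upper tail of a 1-df chi-squared variable: P(X > x) = erfc(sqrt(x / 2)).
    p_values = [erfc(sqrt(s / 2.0)) for s in adjusted]
    return lam, adjusted, p_values
```

Dividing by lambda only when it exceeds 1 is a common convention that avoids inflating statistics from a deflated scan; whether it was applied here is not stated in the text.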
Genome-wide significant loci included an intergenic locus around rs741078 associated with pallidum volume variance (chr4:10215987:A:G; minor allele frequency (MAF)=0.113; 13,691 bp from AC006499.6; p=2.1×10⁻⁸; variance explained by the variance model: 0.67%; variance explained by the mean model: 0.01%), and a locus in a TMPRSS15 intron associated with ICV variance (rs4482570; chr21:19762408:C:T; MAF=0.325; p=5.1×10⁻⁹; variance explained by the variance model: 0.19%; variance explained by the mean model: 0.01%). In addition, a variance-controlling locus for amygdala volume in an intron of SATB2 showed borderline significance (rs10497831; chr2:200142649:A:G; MAF=0.152; p=5.7×10⁻⁸; variance explained by the variance model: 0.09%; variance explained by the mean model: 0.001%). Results were consistent when re-analyzing the data from healthy controls only (excluding participants with neuropsychiatric diagnoses): p=8.6×10⁻⁹ (rs741078-pallidum), p=6.5×10⁻⁹ (rs4482570-ICV) and p=2.5×10⁻⁷ (rs10497831-amygdala). Figure 2 shows the phenotype distributions for the top hits under the two models, grouped by genotype and generated via the shift function10. In short, the shift function procedure was implemented in three stages: deciles of two phenotype distributions were estimated using the Harrell-Davis quantile estimator; 95% confidence intervals of the decile differences were then computed with a bootstrap estimate of the deciles' standard error; and multiple comparisons were controlled so that the type I error rate remained close to 5% across the nine confidence intervals. Decile-by-decile shift function analysis confirmed higher amygdala volume variance among homozygotes for the minor rs10497831 allele (GG) in relation to the other two genotypes (AA, AG). Similarly, minor allele homozygous subjects for rs741078 (GG genotype) showed higher pallidum volume variance than carriers of the major allele A.
The rs4482570 heterozygotes and minor allele homozygotes (CT, TT) had higher ICV variance than the participants with CC genotypes. Pathway analysis of the pallidum, ICV and amygdala variance-controlling summary statistics using MAGMA Gene-Set Analysis (via FUMA11) revealed significant enrichment for the term “neuron projection regeneration” (pallidum GWAS: 28 genes, β=0.6, standard error: 0.135, p=4×10⁻⁶, Bonferroni p=0.046; more detailed results are provided as Supplementary Information). No significant enrichment was found for ICV or amygdala.
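The three stages of the shift function procedure described above can be sketched as follows. This is a simplified illustration under stated assumptions, not the implementation used for Figure 2: it swaps in linear-interpolation deciles for the Harrell-Davis estimator and a Bonferroni-adjusted normal critical value for the original multiple-comparison correction. All function names are hypothetical.

```python
import random
from statistics import NormalDist, quantiles, stdev


def deciles(values):
    """Nine decile cut points (linear interpolation, NOT Harrell-Davis)."""
    return quantiles(values, n=10)


def bootstrap_decile_se(values, n_boot=200, rng=None):
    """Bootstrap standard error of each of the nine deciles."""
    rng = rng or random.Random(0)
    boot = [deciles(rng.choices(values, k=len(values))) for _ in range(n_boot)]
    # zip(*boot) regroups the bootstrap replicates decile by decile.
    return [stdev(column) for column in zip(*boot)]


def shift_function(group_a, group_b, alpha=0.05, n_boot=200, seed=0):
    """Decile differences (b minus a) with simultaneous confidence intervals.

    Returns nine tuples: (difference, lower bound, upper bound).
    """
    rng = random.Random(seed)
    se_a = bootstrap_decile_se(group_a, n_boot, rng)
    se_b = bootstrap_decile_se(group_b, n_boot, rng)
    # Assumption: Bonferroni-adjusted normal critical value over nine deciles.
    crit = NormalDist().inv_cdf(1 - alpha / (2 * 9))
    rows = []
    for qa, qb, sa, sb in zip(deciles(group_a), deciles(group_b), se_a, se_b):
        diff = qb - qa
        half_width = crit * (sa ** 2 + sb ** 2) ** 0.5
        rows.append((diff, diff - half_width, diff + half_width))
    return rows
```

When two genotype groups share a similar center but differ in spread, the decile differences change sign from the lower to the upper deciles, which is the pattern a variance-controlling locus is expected to produce.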
To our knowledge, this is the first evidence of genetic loci influencing variability of brain volumes beyond their mean value. A conceptually and methodologically similar approach revealed genetic control of the variance in body height and body mass index12. Adding to the notion that phenotypic spread in a population is related to genetic variability, our current findings show that the population variance of subcortical and intracranial volumes is partly under genetic control. Importantly, our findings on brain structure and the previous work on body mass index12 provide converging evidence supporting the notion that common genetic variants affecting the mean and the variance of a trait need not be correlated and may influence phenotypes through complementary mechanisms.
Variants associated with volumetric dispersion mapped to genes that have previously been linked to cognitive and mental health traits. Amygdala variability was related to a genotype in SATB2, which has been associated with intelligence13, and is expressed in adult and fetal brain, and related to syndromic neurodevelopmental deficits14. Similarly, the significant variance locus (21q21.1) for ICV spanned TMPRSS15, a gene reportedly linked to neurodevelopmental disruptions15 and to brain changes in post-traumatic stress disorder16. Moreover, pathway analysis with MAGMA Gene-Set Analysis using the full set of variance-controlling GWAS results from the pallidum analysis revealed significant enrichment for the “neuron projection regeneration” term, providing a plausible mechanism modulating neuroanatomical adaptation in response to distinct genetic and environmental factors.
Variance-controlling alleles can be interpreted as underlying distinct degrees of organismic robustness1. Relevance to medical genetics also comes from the observation that several disease phenotypes emerge beyond a phenotypic threshold, which could be reached by the influence of high variability phenotypes17. It is thus important to understand how the identified markers relate to brain variability under changing environments (robustness), how they interact with other genetic loci (epistasis) and how they relate to the clinical manifestation of disease. Similarly, variance-controlling loci can underlie variability from other genetic factors, potentially affecting evolutionary dynamics4. Identifying the mechanisms by which variance-controlling genotypes influence gene expression variance in relevant brain structures may provide a proof of principle for the functional relevance of the identified genotypes. This type of effect on expression has been shown in model organisms18, and the genomic loci identified here represent suitable candidates for targeted gene expression analysis in the human brain. The identification of specific genes involved in neural evolution and mental disorders suggests that brain variability in human populations is mediated by genetic factors. In so doing it also underscores the validity of gene-gene and gene-environment interactions in explaining heritability of complex human traits.
In summary, the results indicate that beyond associations with mean volumetric values, genotypic architecture modulates the variance of subcortical and intracranial dimensions across individuals. The lack of overlap between genetic associations detected by the standard additive genetic model and variance-controlling loci indicates independent mechanisms. These findings contribute to establishing the genetic basis of phenotypic variance (i.e., heritability), allow the identification of different degrees of brain robustness across individuals, and open new research avenues in the search for mechanisms controlling brain and mental health.
Methods
Participants
Data from 19,093 unrelated European-ancestry individuals were included (mean age 47.8 years, ranging from 3.2 to 91.4 years old; 48% male), recruited through 16 independent cohorts with available genome-wide genotyping and T1-weighted structural MRI. Extended information on each cohort reported in Supplementary Information includes recruitment center, genotyping and brain imaging data collection, sample-specific demographics, distribution of brain volumes and, when relevant, diagnoses (1,464 individuals had a diagnosis). Written informed consent was provided by the participants at each recruitment center, and the protocols were approved by the corresponding Institutional Review Boards.
Genotypes
Only participants with European ancestry (as determined by multidimensional scaling) were included in the final set of analyses, in recognition that the inclusion of subjects from other ethnicities can potentially add genetic and phenotypic confounding. Except for the UK Biobank cohort, all directly genotyped data were imputed in-house using standard methods with the 1000 Genomes European reference panel. After imputation, each genotyping batch underwent a quality control stage in which markers were excluded if they had MAF < 0.01, Hardy-Weinberg equilibrium p < 10⁻⁶ or INFO score < 0.8. When all samples were combined, over 5 million distinct markers passed quality control genome-wide. Additional filters on genotype frequencies were applied to the final merged dataset based on statistical considerations for genotype frequency in variance-controlling detection, as described below.
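As an illustration of the marker-level quality control, the sketch below treats the three thresholds as exclusion criteria, which is an assumption based on common practice. The `Marker` container, its field names and the example values are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Marker:
    rsid: str
    maf: float    # minor allele frequency
    hwe_p: float  # Hardy-Weinberg equilibrium test p-value
    info: float   # imputation INFO score


def passes_qc(marker, min_maf=0.01, min_hwe_p=1e-6, min_info=0.8):
    """Keep a marker only if it clears all three thresholds."""
    return (
        marker.maf >= min_maf
        and marker.hwe_p >= min_hwe_p
        and marker.info >= min_info
    )


# Hypothetical markers: the second fails on MAF, the third on INFO score.
markers = [
    Marker("rsA", 0.113, 0.50, 0.95),
    Marker("rsB", 0.005, 0.50, 0.95),
    Marker("rsC", 0.200, 0.50, 0.60),
]
kept = [m.rsid for m in markers if passes_qc(m)]
```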
Brain features
Three-dimensional T1-weighted brain scans were processed using FreeSurfer19 (v5.3.0; http://surfer.nmr.mgh.harvard.edu/). Eight well-studied volumetric features were selected for analysis, as literature findings on large datasets show that their mean population value is influenced by common genetic variation2: accumbens, amygdala, caudate, hippocampus, pallidum, putamen, thalamus and ICV. Cohort-wise distribution of values is summarized in Supplementary Information. Before the ensuing statistical analyses, outliers (more than ±3 standard deviations from the mean) were removed, and generalized additive models (GAM) were implemented in R (https://www.r-project.org) to regress out the effects of scanning site, sex, age, diagnosis and ICV (for subcortical volumes only). Hereafter, brain volumes correspond to residuals from those GAM fits unless otherwise specified.
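The outlier-removal step can be sketched as below. The sketch assumes that the mean and the sample standard deviation are computed over all values of one volumetric feature; whether this was done per cohort or on the pooled sample is not specified here. The subsequent GAM adjustment is not shown, and the function name is hypothetical.

```python
from statistics import mean, stdev


def remove_outliers(values, n_sd=3.0):
    """Keep values within n_sd sample standard deviations of the mean."""
    centre, spread = mean(values), stdev(values)
    return [v for v in values if abs(v - centre) <= n_sd * spread]
```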
Statistical analyses
Genome-wide association statistics were computed for genetic effects on the mean and variance of the volumetric feature distributions. For each marker, the distribution of each outcome phenotype was normalized via rank-based inverse normal transformation (INT) to prevent statistical artifacts. Scale transformations like INT have been shown to aid genetic discovery by constraining mean-effects and reducing the effect of phenotypic outliers, which reduces Type I error rates without sacrificing power9,12. In short, INT was applied to transform each subject’s phenotype ($y_i$) as $\mathrm{INT}(y_i) = \Phi^{-1}\left(\frac{\mathrm{rank}(y_i) - 0.5}{n}\right)$, where $\mathrm{rank}(y_i)$ is the rank within the distribution, $n$ stands for sample size (without missing values) and $\Phi^{-1}$ denotes the standard normal quantile function. Intuitively, all phenotype values are ranked and the ranks are mapped to percentiles of a normal distribution. Then, an additive genetic model was computed with $\mathrm{INT}(y) = \beta_0 + \beta_1\,\mathrm{SNP} + \beta_2 C_1 + \beta_3 C_2 + \beta_4 C_3 + \beta_5 C_4 + \varepsilon$, where $\mathrm{INT}(y)$ is the normalized phenotype variable, SNP is the relevant marker coded additively, and $\varepsilon$ stands for regression residuals. Four genomic principal components ($C_1$-$C_4$) were included to control for population stratification and cryptic relatedness, and to make the results comparable with a previous large-scale analysis of genetic variation and brain volumes2. Results from that analysis (mean-model) were contrasted with the statistics from the variance-model. The residuals $\varepsilon$ were again inverse normal transformed and used as input for the variance-model, which was based on the Brown-Forsythe test. Briefly, INT-transformed residuals were used to compute $z_{ij} = |\varepsilon_{ij} - \tilde{\varepsilon}_j|$, with $\tilde{\varepsilon}_j$ as the median of group $j$ (here, genotype), and these, in turn, to compute the F statistic $F = \frac{N - p}{p - 1} \cdot \frac{\sum_{j=1}^{p} n_j (\bar{Z}_{\cdot j} - \bar{Z}_{\cdot\cdot})^2}{\sum_{j=1}^{p} \sum_{i=1}^{n_j} (z_{ij} - \bar{Z}_{\cdot j})^2}$, where $N$ is the total number of observations, $n_j$ is the number of observations in group $j$, $p$ is the number of groups (2 or 3 different genotypes), $\bar{Z}_{\cdot j}$ denotes the mean of the $z_{ij}$ in group $j$ and $\bar{Z}_{\cdot\cdot}$ denotes the overall mean of the $z_{ij}$. To prevent increases in false positive rates arising from small groups20, only markers with a minimum (non-zero) genotype count of at least 100 were included.
This value was chosen based on literature about power and statistical considerations of genome-wide association studies for phenotypic variability20. The data were analyzed and visualized in R with the aid of appropriate packages. When relevant, significant markers were annotated and additionally inspected using FUMA11.
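The two core computations of the variance-model, rank-based INT and the Brown-Forsythe F statistic with a minimum genotype count, can be sketched as follows. This is a minimal illustration, not the code used in this study: the rank offset of 0.5, the averaging of tied ranks and the function names are assumptions.

```python
from statistics import NormalDist, mean, median


def inverse_normal_transform(values, offset=0.5):
    """Rank-based INT: Phi^-1((rank - offset) / n), averaging ranks over ties."""
    n = len(values)
    order = sorted(range(n), key=lambda i: values[i])
    ranks = [0.0] * n
    start = 0
    while start < n:
        end = start
        while end + 1 < n and values[order[end + 1]] == values[order[start]]:
            end += 1
        tied_rank = (start + end) / 2 + 1  # mean of 1-based ranks in the tie
        for k in range(start, end + 1):
            ranks[order[k]] = tied_rank
        start = end + 1
    inv_cdf = NormalDist().inv_cdf
    return [inv_cdf((r - offset) / n) for r in ranks]


def brown_forsythe_f(groups, min_count=100):
    """F statistic across genotype groups; None if any non-empty group is small."""
    groups = [g for g in groups if g]  # drop genotypes with zero count
    if len(groups) < 2 or min(len(g) for g in groups) < min_count:
        return None
    # Absolute deviations from each group's median.
    z = []
    for g in groups:
        med = median(g)
        z.append([abs(x - med) for x in g])
    n_total = sum(len(zj) for zj in z)
    p = len(z)
    grand_mean = sum(sum(zj) for zj in z) / n_total
    group_means = [mean(zj) for zj in z]
    between = sum(
        len(zj) * (mj - grand_mean) ** 2 for zj, mj in zip(z, group_means)
    )
    within = sum(
        (v - mj) ** 2 for zj, mj in zip(z, group_means) for v in zj
    )
    return (n_total - p) / (p - 1) * between / within
```

In use, the residuals of the mean-model would be passed through `inverse_normal_transform`, split by genotype, and handed to `brown_forsythe_f`; markers for which the function returns `None` would be skipped.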