Abstract
Incidence of glioma is approximately 50% higher in males. Previous analyses have examined exposures related to sex hormones in women as potential protective factors for these tumors, with inconsistent results. Previous glioma genome-wide association studies (GWAS) have not stratified by sex. Potential sex-specific genetic effects were assessed in autosomal SNPs and sex chromosome variants for all glioma, GBM and non-GBM patients using data from four previous glioma GWAS. Datasets were analyzed using sex-stratified logistic regression models and combined using meta-analysis. There were 4,831 male cases, 5,216 male controls, 3,206 female cases and 5,470 female controls. A significant association was detected at rs11979158 (7p11.2) in males only. Association at rs55705857 (8q24.21) was stronger in females than in males. A large region on 3p21.31 was identified with significant association in females only. The identified differences in effect of risk variants do not fully explain the observed incidence difference in glioma by sex.
Introduction
Glioma is the most common type of primary malignant brain tumor in the United States (US), with an average annual age-adjusted incidence rate of 6.0/100,000 [1]. Glioma can be broadly classified into glioblastoma (GBM, 61.9% of gliomas in adults 18+ in the US) and lower-grade glioma (non-GBM glioma, 24.2% of adult gliomas) with tumors such as ependymoma (6.3%), unclassified malignant gliomas (5.1%), and pilocytic astrocytoma (1.9%) making up the majority of other cases [1]. Many environmental exposures have been investigated as sources of glioma risk, but the only validated risk factors for these tumors are ionizing radiation (which increases risk), and history of allergies or other atopic disease (which decreases risk) [2]. These tumors are significantly more common in people of European ancestry, in males and in older adults [1]. The contribution of common low-penetrance SNPs to the heritability of sporadic glioma in persons with no documented family history is estimated to be ∼25% [3]. A recent glioma genome-wide association study (GWAS) meta-analysis validated 12 previously reported risk loci [4], and identified 13 new risk loci. These 25 loci in total are estimated to account for ∼30% of heritable glioma risk. This suggests that there are both undiscovered environmental risk (which accounts for ∼75% of incidence variance) and genetic risk factors (accounting for ∼70% of heritable risk) [3,4].
Population-based studies consistently demonstrate that incidence of gliomas varies significantly by sex. Most glioma histologies occur with a 30-50% higher incidence in males, and this male preponderance of glial tumors increases with age in adult glioma (Figure 1) [1]. Several studies have attempted to estimate the influence of lifetime estrogen and progestogen exposure on glioma risk in women [5,6]. Results of these analyses have been mixed, and it is not possible to conclusively determine the impact of hormone exposure on glioma risk. Male predominance in incidence occurs broadly across multiple cancer types and is also evident in cancers that occur in pre-pubertal children and in post-menopausal adults [7,8]. Together these observations suggest that other mechanisms in addition to acute sex hormone actions must be identified to account for the magnitude of sex difference in glioma incidence.
Though sex differences exist in glioma incidence, sex differences have not been interrogated in previous glioma GWAS. Sex-specific analyses have the potential to reveal genetic sources of sexual dimorphism in risk, as well as to increase power for detection of loci where effect size or direction may vary by sex [9,10]. The aim of this analysis is to investigate potential sex-specific sources of genetic risk for glioma that may contribute to observed sex-specific incidence differences.
Results
Study population
There were 4,831 male cases, 5,216 male controls, 3,206 female cases, and 5,470 female controls (Table 1). A slightly larger proportion of male cases were GBM (58.7% of male cases vs 52.5% of female cases). Controls were slightly older than cases. GBM cases had a higher mean age than non-GBM cases, which was consistent with known incidence patterns of these tumors. Male and female cases within histology groups had similar age at diagnosis. The proportion of non-GBM cases varied by study due to differing recruitment patterns and study objectives (see original publications for details of recruitment patterns and inclusion criteria [4,11-14]).
Previously discovered glioma risk regions
There were 5,934 SNPs within 500kb of 26 previously discovered glioma risk loci with IMPUTE2 information score (INF0)>0.7 and MAF>0.01 that were previously found to have at least a nominal (p<5×10−4) association with glioma [4], and results were considered significant at p<2.8×10−6 level (adjusted for 6,000 tests in each of three histologies [18,000 tests], see Figure 2a for schematic of study design). Among the 25 previously validated glioma risk loci, nine loci contained 10 SNPs with pM<2.8×10−6 and/or pF<2.8×10−6 in any histology: 1p31.3 (RAVER2), 5p15.33 (TERT), 7p11.2 (EGFR, 2 independent loci), 8q24.21 (intergenic region nearMYC), 9p21.3 (CDKN2B-AS1), 11q23.3 (PHLDB1), 16p13.3 (RHBDF1), 17p13.1 (TP53), and 20q13.33 (RTEL1) (Table 2). ORM and ORF were similar in the majority of these loci.
For one of two independent loci at 7p11.2 (rs 11979158), there was a significant association only in males for all glioma (ORM= 1.33 [95% CI=1.23-1.44], pM=4.87×10−12) and GBM (ORM=1.40 [95% CI=1.28-1.54], pM=1.26×10−12)) but the sex differences did not meet the significance threshold (overall pD=0.0055, and GBM pD=0.1184) (Figure 3, Table 2).
The previously identified SNP at 8q24.21 (rs55705857) was the most significant SNP in both males and females. Odds ratio for rs55705857 in all glioma was significantly higher in females (ORF=2.45 [95% CI=2.14-2.80], pF=1.22×10−39) as compared to males (ORM=1.56 [95% CI=1.40-1.75], pM=1.09×10−14) with pD=3.46×10−7. In non-GBM only, ORF (ORF=4.71 [95% CI=3.94-5.63], pF=1.85×10−65) was also elevated as compared to ORM (ORM=2.66 [95% CI=2.28-3.10)], pM=8.13×10−36) with pD=8.44x10−7 (Figure 3, Table 2). This association was further explored in a case-only analysis, where there was a significant difference between males and females overall (p=0.0012), and in non-GBM (p=0.0084) (Table 3).
Previous studies have found a strong association between rs55705857 and oligodendroglial tumors (particularly tumors with isocitrate dehydrogenase 1/2 (IDH1/2) mutation and loss of the 1p and 19q), so this association was further explored in the non-GBM (lower grade glioma [LGG]) histology groups (Table 4). For World Health Organization (WHO) grade II-grade III astrocytoma, effect was stronger in females (ORF=4.64 [95% CI=3.53-6.09], pF=2.15×10−28) as compared to males (ORM=2.87 [95% CI=2.31-3.56], pM=1.19×10−21) with pD=0.0065. For WH0 grade II-III Oligodendrogliomas effect was stronger than observed in WHO grade II-III astrocytomas, and effect size was stronger in females (ORF=12.15 [95% CI= 8.96-16.48], pF=3.68×10−58) as compared to males (ORM=5.47 [95% CI=4.16-7.19], pM=5.37×10−34) with pD=6.60×10−5. Oligoastrocytic tumors were not included in sub-analyses due to recent research that suggests that these tumors are not an entity that is molecularly distinct from oligodendrogliomas or astrocytomas [17].
Genome-wide scan of nominally significant regions
In a previous eight study meta-analysis, ∼12,000 SNPs (INFO>0.7, MAF>0.01) were identified as having a nominally significant (p<5×10−4) association with all glioma, GBM, or non-GBM [4]. A sex-stratified genome-wide scan was conducted within this set of SNPs and results were considered significant at pD<1.4×10−6 (adjusted for 12,000 tests in each of three histologies [36,000 tests], see Figure 2a for schematic of study design). Similar genome-wide peaks were observed between males and females (Figures 4-6). One large region within 3p21.31 (49400kb-49600kb, ∼200kb) was identified as being significantly associated with glioma and GBM in females only (Supplemental Figure 1). There were 243 SNPs with nominally significant associations within this region in the previous eight-study meta-analysis (p<5×10−4), and 32 of these had nominally significant sex associations (pF<5×10−6 or pM<5×10−6) in all glioma or GBM. The strongest association in females within this region was at rs9841110, in both all glioma (ORF=1.22 [95% CI=1.14-1.32], pF=5.55×10−8) with pD=1.77×10−4) and GBM only (ORF=1.27 [95% CI=1.16-1.38], pF=3.86×10−7) with pD=6.04×10−4), while there were no significant associations detected in males (Figure 3). No SNPs in this region were significantly associated with non-GBM. In a case-only analysis a marginally significant difference was detected between males and females overall (p=0.0520) and in GBM (p=0.0428) (Supplemental Table 1).
Agnostic scan of sex chromosome loci
SNPs on the sex chromosomes were analyzed in GICC only. There were 245,746 SNPs with INFO>0.7 and MAF>0.01 on the X chromosome after quality control and imputation, and results were considered significant at p<2×10−7 (corrected for 250,000 tests, see Figure 2b for a schematic of study design). No SNPs met this significance threshold. After quality control procedures were complete, there were 300 SNPs remaining on the Y chromosome. No significant signals were detected on the Y chromosome.
Combined analysis of germline variants and somatic characterization
Due to the lack of molecular classification data included in the GICC, MDA-GWAS, SFAGS-GWAS< and GliomaScan datasets, glioma data obtained from TCGA datasets (GBM and LGG) were used to explore the potential confounding due to molecular subtype variation with histologies. There were 758 individuals from the TCGA dataset available for analysis with available germline genotyping, molecular characterization, sex and age data (Supplemental Table 2). Overall, slightly more females (53.2%) as compared to males (47.2%) had IDH1/2 mutant glioma, but this difference was not statistically significant (p=0.1104) (Figure 7). When tumors were stratified by histological type, approximately equal proportions of males and females had IDH1/2 mutations present in their tumors (GBM: 6.0% in males, and 5.2% in females; LGG: 17.9% in males, and 17.7% in females). There were also no significant differences by sex in IDH/TERT/1p19q subtype (Supplemental Figure 2, overall p=0.2859), or panglioma methylation subgroup (Supplemental Figure 3, overall p=0.4153).
SNPs found to be nominally significant (p<5×10−4) in a previous 8 study meta-analysis, with imputation quality (r2) ≥0.7 were identified within the TCGA germline genotype data and D’ and r2 values in CEU were used to select proxy SNPs (Supplemental Table 3) [18].
A case-only analysis was conducted using sex as a binary phenotype for proxy SNPs in the TCGA dataset. In the overall meta-analysis, there was a nominally significant signal in the case-only meta-analysis for the proxy SNP in 3p21.31 in glioblastoma (Table 5). There was no significant association in the TCGA set, but RAF was elevated in females as compared to males in the GBM set, as well as in all IDH1/2 wild type gliomas (Table 5). MAF in LGG and IDH1/2 mutant glioma was similar among males and females. There was a nominally significant signal in the case-only meta-analysis for the proxy SNP at 7p11.2, but no significant association in the TCGA, but RAF was elevated in males as compared to females in the GBM set, as well as in all IDH1/2 wild type gliomas (Table 5). There was no significant signal detected in the overall case-only meta-analysis for the proxy SNP at 8q24.21, or within the TCGA set. Among both LGG and IDH1/2 mutant, RAF was elevated in females as opposed to males.
Sex-stratified genotypic risk scores
In order to estimate the cumulative effects of significant variants by sex, unweighted risk scores (URS) were calculated by summing all risk alleles for each individual using the 10 SNPs (rs12752552, rs9841110, rs10069690, rs11979158, rs55705857, rs634537, rs12803321, rs3751667, rs78378222, and rs2297440) found to be significantly associated with glioma in this analysis. GBM (URS-GBM) and non-GBM (URS-NGBM) specific URS were calculated only using sets of 6 SNPs in this set that were significantly associated with these histologies (URS-GBM: rs9841110, rs10069690, rs11979158, rs634537, rs78378222, and rs2297440, and URS-NGBM: rs10069690, rs55705857, rs634537, rs12803321, rs78378222, and rs2297440). See Methods for additional information on score calculation. Median URS, URS-GBM, and URS-NGBM were significantly different (p<0.0001) between cases and controls in both males and females in all histology groups (Supplemental Figure 4). There was no significant difference in median risk scores between male and female cases for any histology group. Glioma risk increased with increasing number of alleles in both males and females for the 10 SNPs included in the overall URS, as well as the 6 SNPs in the URS-GBM and 6 SNPs in URS-NGBM (Figure 9, Supplemental Table 4). Risk was higher in females (OR=3.97 [95% CI=2.42-6.80]) as compared to males (OR=1.74 [95% CI=1.21-2.53]) in all glioma for individuals for with 13-16 alleles, though the difference between these estimates were not statistically significant. Risk was also higher among females (OR=2.69 [95% CI=1.98-3.66]) as compared to males (OR=1.79 [95% CI=1.38-2.32]) in GBM for individuals with 8-11 risk alleles, as well as in non-GBM for individuals with 6-11 risk alleles (females: OR=2.83 [95% CI=2.12-3.78], males: OR=1.70 [95% CI=1.31-2.19]), though the difference between these estimates were not statistically significant. The estimates may underestimate actual risk due to varying effect sizes and alleles frequencies between risk variants.
Discussion
This is the first analysis of inherited risk variants in sporadic glioma focused specifically on sex differences, and the first agnostic unbiased scan for glioma risk variants on the X and Y sex chromosomes. Like many other types of chronic disease, there is a male preponderance of glioma. This incidence difference is not currently explained by known environmental or genetic risk factors.
One SNP at the 7p11.2 locus (rs11979158) showed significant association in males only, in both all glioma and GBM (Table 2). Effects were similar in all studies included in the analysis (Supplemental Table 5, Supplemental Figure 6). This variant is within one of two previously identified independent glioma risk loci located near epidermal growth factor receptor (EGFR) and is most strongly associated with risk for GBM.[4,19] Though EGFR is implicated in many cancer types and is a target for many anti-cancer therapies, this risk locus has not been previously associated with any other cancer type. Estrogen has been demonstrated to interact with EGFR as well as other growth factors [20]. Previous studies have not been definitive about the role of endogenous estrogen exposure in glioma risk, so it was not possible to determine the biological plausibility of this association [20]. Alternatively, cell intrinsic, hormone independent sex differences in EGF effects have been observed in a murine model of gliomagenesis in which EGF treatment was transforming for male but not female astrocytes that had been rendered null for neurofibromin and p53 function [21]. While this specific SNP was not genotyped on the germline genotyping array used for TCGA, a SNP in strong LD with rs11979158 (rs7785013, D’=1, r2=1 in CEU [18]) was evaluated. The association in the case-only analysis in TCGA was not statistically significant in any histology group, but a similar trend to that observed in the overall meta-analysis in sex-specific RAF was observed in both the overall GBM group, as well as in the IDH1/2 wild type group.
The association at 8q24.21 (rs55705857) is the strongest that has been identified by glioma GWAS to date,[4] with an odds ratio of 1.99 (95% CI=1.85-2.13, p=9.53×10−79) in glioma overall, and an odds ratio of 3.39 (95% CI=3.09-3.71, p=7.28×10−149) in non-GBM. Effects were similar in all studies included in the analysis (Supplemental Table 5, Supplemental Figure 7). The identified SNP, rs55705857, is located in an intergenic region near coiled-coil domain containing 26 (CCDC26, a long non-coding RNA). This analysis found a stronger association in females than males in all glioma and non-GBM, where female odds ratio estimates are ∼2x those of males (Table 2). ORs were higher in women than men in all studies included in the analysis, but the magnitude of the ORs varied between studies (Supplemental Table 5). Furthermore, the MAF for rs55705857 in the SFAGS-GWAS differed from the other three studies (See Supplemental Table 6 for MAF by study). Consequently, a sensitivity analysis was conducted to assess the effect of study heterogeneity on this estimate in non-GBM using only the GICC, MDA-GWAS, and GliomaScan datasets. The results of this analysis did not substantially change from (Main analysis pD=1.20×10−6 and sensitivity pD=1.49×10−5).
A histology-specific analysis found a similar sex differences in ORs for rs55705957 for both non-GBM astrocytoma, and oligodendroglioma (Table 4, see Supplemental Table 7 for study-specific estimates). Previous analyses have shown that this variant is strongly associated with IDH1/2 mutant tumors, particularly those that have 1p/19q deletions [22,23]. Data on IDH1/2 mutation and 1p/19q codeletion were not available for the combined four GWAS datasets used here. Hence, to assess potential differences in frequency of IDH1/2 mutation, the frequency of these mutations by sex was assessed within the combined TCGA GBM and LGG datasets [24-26]. Approximately the same proportion of males as females with histologically confirmed GBM had IDH1/2 mutations (5.2% vs 6.0%, respectively), so females may not be more likely than males to present with IDH1/2 mutant GBM (Figure 7). While this specific SNP was not genotyped on the germline genotyping array used for TCGA, a SNP in weak LD with rs55705857 (rs4636162, D’=1; r2=0.104, in CEU [18]) was able to be evaluated. There was no significant association in the overall case-only meta-analysis for this SNP, and the association in the case-only analysis in TCGA was not statistically significant in any histology group. Sex-specific RAF for this SNP was slightly higher in females as compared to males in the overall LGG group as well as the IDH1/2 mutant group.
A large region in 3p21.31 was identified that was associated with all glioma and GBM in females only (Table 2). Effects were similar in all studies included in the analysis (Supplemental Table 5, Supplemental Figure 7). The strongest association in this region was rs9841110, an intronic variant located upstream of dystroglycan 1 (DAG1) within an enhancer region. While this specific SNP was not genotyped on the germline genotyping array used for TCGA, a SNP in strong LD with rs9841110 (rs9814873, D’=1, r2=1 in CEU [18]) was able to be evaluated. The association in the case-only analysis in TCGA was not statistically significant in any histology group, but a similar trend to that observed in the overall meta-analysis in sex-specific RAF was observed in both the overall GBM group, as well as in the IDH1/2 wild type group. The identified risk allele at rs9841110 (C) is associated with significantly increased expression in glutathione peroxidase 1 (GPX1 1.3×10−7 in cerebellum, p=3.1×10−7, in frontal cortex), macrophage stimulating 1 receptor (MST1R [RON], p=4.3×10−5 in cerebellar hemisphere) and ring finger protein 123 (RNF123 [KPC1), and significantly decreased expression of macrophage stimulating 1 (MST1, p=4.8×10−6 in hypothalamus and –p=1.5×10−5 in cerebellum),and RNA binding motif protein 6 (RBM6, p=2.7×10−6 in cerebellum) in normal brain tissue [27]. Glioblastoma samples have elevated expression of GPX1 (fold change 2.79) and decreased expression of MST1R (fold change 0.44) as compared to normal tissue[25,28], and increased expression of GPX1 and MSTIR1 have been associated with poor prognosis in multiple cancer types [29,30].
Though this region has not previously been associated with glioma, previous GWAS have detected associations at 3p21.31 for a large variety of traits, including several autoimmune diseases as well as increased age at menarche [31-34]. Three variants previously associated with increased age at menarche (rs7647973-A: D’=1.0 and r2=0.1441; rs6762477-G: D’=0.66 and r2=0.1659; rs7617480-A: D’=1.0 and r2=0.1332) are in linkage disequilibrium with the identified risk allele at rs9841110 (C), with in CEU [18,32]. If lifetime estrogen exposure modifies glioma risk, it is reasonable that variants which increase age at menarche, which may potentially decrease total lifetime estrogen exposure, may also be related to glioma risk in females. Due to the complexity of measuring lifetime estrogen exposure (which is affected by age at menarche, age at menopause, parity, breast feeding patterns, and estrogen replacement therapy post-menopause) it is difficult to determine the ‘true’ effect that this exposure might have on glioma risk.
As compared to a model containing age at diagnosis and sex alone, the three SNPs (rs55705857, rs9841110 and rs11979158) identified as having sex-specific effects explain an additional 1.4% of trait variance within the GICC set. The variance explained by these SNPs varies by histology (0.6% in GBM, and 3.3% in Non-GBM). The variance explained by the addition of these three SNPs was higher in females for all glioma (1.3% in males and 2.2% in females), and non-GBM glioma (2.3% in males and 5.3% in females), and slightly higher in males for GBM (0.9% in males and 0.7% in females).
In order to compare the cumulative effects of glioma risk variants by sex, unweighted risk scores (URS) were generated by summing all risk alleles using the 10 SNPs found to be significantly associated with glioma in this analysis. GBM (URS-GBM) and non-GBM (URS-NGBM) specific URS were calculated using sets of 6 SNPs in this set that were associated with significantly associated with these histologies. Individuals with lower numbers of risk alleles had significantly lower risk of glioma, and those with higher numbers of alleles had increased risk for glioma, with statistically significant trends in each histology group). Males and females with low risk scores had similar odds of glioma, while females had increased odds in the upper strata of scores as compared to males. Development of risk scores that weight alleles by effect size, and use sex-specific estimates for variants for which effect size varies by sex (such as 7p11.2 and 8q24.21), may lead to better predictive values for risk scores.
This is the first sex-specific analysis of germline risk variants for glioma, and identifies three loci with sex-specific effects, and leverages multiple existing glioma GWAS datasets. While often not included in GWAS, sex-stratified analyses can reveal genetic sources of sexual dimorphism in risk, [9,10]. Sex variation in genetic susceptibility to disease is likely not due to sex differences in DNA sequence, but is likely to be related to sex-specific regulatory functions [35-37]. These analyses may not only contribute to understanding of sources of sex difference in incidence, but may also suggest mechanisms and pathways that vary by sex in contributions to gliomagenesis.
In addition to genetic sources of difference, there are likely several additional factors acting in combination which contribute to sex differences in glioma incidence. Sex differences in disease can also be linked to in-utero development, during which time gene expression and risk phenotypes are patterned through the action of X alleles that escape inactivation and genes on the non-pseudo-autosomal component of the Y chromosome, as well as the epigenetic effects of in utero testosterone. [38]. A previous analysis estimating heritability of brain and CNS tumors by sex using twins attempted to estimate sex-specific relative risks, but these analyses were limited by a small sample size [39]. Further investigation of the inheritance patterns of familial glioma by sex may also provide additional information about sex differences in this disease.
There are several limitations to this analysis. Individuals included in these datasets were recruited during different time periods from numerous institutions, with no central review of pathology. Molecular tumor markers were unavailable for all datasets, and as a result classifications are based on the treating pathologist using the prevailing histologic criteria at time of diagnosis. The variant at 8q24.21 has been shown to have significant association with particular molecular subtypes, and without molecular data it was not possible to determine whether the observed result is an artifact of varying molecular features by sex. Oligodendroglioma as a histology is highly enriched for IDH1/2 and 1p/19q co-deleted tumors (117/174, or ∼67% within the TCGA glioma dataset [24] and it is therefore likely that the analysis using only tumors classified as oligodendroglioma captured most of this molecular subtype. Males and females within histology groups have different frequencies of IDH1/2 mutation [24], which may have confounded the estimates for 8q24.21. The TCGA dataset was used to explore sex differences in allele frequency within molecular groups, but none of the identified SNPs were able to be directly validated within this set; however SNPs in strong LD were evaluated except for in 8q24.21. The 8q24.21 region is not well characterized on the array used for the TCGA genotyping, and as a result this region imputed poorly. No proxy SNP in strong LD with rs55705857 was able to be identified. Similar trends in RAF to those observed in the overall meta-analysis were seen in the TCGA set, though these differences were not statistically significant. Further interrogation in datasets with molecular classification where direct genotyping of these regions is warranted in order to confirm the sex-specific associations observed in this analysis.
Conclusions
Sex and other demographic differences in cancer susceptibility can provide important clues to etiology, and these differences can be leveraged for discovery in genetic association studies. This analysis identified potential sex-specific effects in 2 previous identified glioma risk loci (7p11.2, and 8q24.21), and 1 newly identified autosomal locus (3p21.31). Odds ratios for the highest strata of an unweighted risk score calculated by summing total risk alleles was higher in females as compared to males in all three histology groups. These significant differences in effect size may be a result of differing biological function of these variants by sex due to biological sex differences, or interaction between these variants and unidentified risk factors that vary in prevalence or effect by sex.
Materials and Methods
Study cohorts
This study was approved locally by the institutional review board (IRB) at University Hospitals Cleveland Medical Center and by each participating study site’s IRB. Written informed consent was obtained from all participants. In this study, data was combined from four prior glioma GWAS: Glioma International Case-Control Study (GICC), San Francisco Adult Glioma Study GWAS (SFAGS-GWAS), MD Anderson Glioma GWAS (MDA-GWAS), and National Cancer Institute’s GliomaScan (Figure 4A) [4,11-14]. The SFAGS-GWAS includes controls from the Illumina iControls dataset, and MDA-GWAS includes controls from Cancer Genetic Markers of Susceptibility (CGEMS) breast and prostate studies [15,16,40]. Details of data collection and classification are available in previous publications [4,11-14].
Genotyping and imputation of GWAS datasets
GICC cases and controls were genotyped on the Illumina Oncoarray [41]. The array included 37,000 beadchips customized to include previously-identified glioma-specific candidate single nucleotide polymorphisms (SNPs). SFAGS-GWAS cases and some controls were genotyped on Illumina’s HumanCNV370-Duo BeadChip, and the remaining controls were genotyped on the Illumina HumanHap300 and HumanHap550. MDA-GWAS cases were genotyped on the Illumina HumanHap610 and controls using the Illumina HumanHap550 (CGEMS breast [16,40]) or HumanHap300 (CGEMS prostate [15]). GliomaScan cases were genotyped on the Illumina 660W, while controls were selected from cohort studies and were genotyped on Illumina 370D, 550K, 610Q, or 660W (See Rajaraman et al. for specific details of genotyping) [14]. Details of DNA collection and processing are available in previous publications [4,12-14]. Individuals with a call rate (CR) <99% were excluded, as well as all individuals who were of non-European ancestry (<80% estimated European ancestry using the FastPop [42] procedure developed by the GAMEON consortium)). For all apparent first-degree relative pairs were removed (identified using estimated identity by descent [IBD]>.5), for example, the control was removed from a case-control pair; otherwise, the individual with the lower call rate was excluded. SNPs with a call rate <95% were excluded as were those with a minor allele frequency (MAF)<0.01, or displaying significant deviation from Hardy-Weinberg equilibrium (HWE) (p<1×10−5). Additional details of quality control procedures have been previously described in Melin et al [4]. All datasets were imputed separately using SHAPEIT and IMPUTE using a merged reference panel consisting of data from the 1,000 genomes project and the UK10K [43-47].
TCGA cases were genotyped on the Affymetrix Genomewide 6.0 array using DNA extracted from whole blood (see previous manuscript for details of DNA processing [25,26]), and underwent standard GWAS QC, and duplicate and related individuals within datasets have been excluded [4]. Ancestry outliers were identified in TCGA using principal components analysis in plink 1.9 [48]. Resulting files were imputed using Eagle 2 and Minimac3 as implemented on the Michigan imputation server (https://imputationserver.sph.umich.edu) using the Haplotype Reference Consortium Version r1.1 2016 as a reference panel [49-51]. Somatic characterization of TCGA cases was obtained from the final dataset used for the TCGA pan-glioma analysis [24], and classification schemes were adopted from Eckel-Passow, et al.[52] and Ceccarelli, et al.[24].
Sex-stratified scan of the autosomal chromosomes
The data were analyzed using sex-stratified logistic regression models in SNPTEST for all SNPs on autosomal chromosomes within 500kb of previously identified risk loci, and/or those found to be nominally significant (p<5×10−4) in a previous meta-analysis (Figure 2A) [4,53]. Sex-specific betas (βM and βF), standard errors (SEM and SEF), and p-values (pM and pF) were generated using sex-stratified logistic regression models that were adjusted for number of principal components that significantly differed between cases and controls within each study.
Estimation of sex difference and test of statistical significance
βD and SED were estimated using the sex-specific betas and standard errors separately for each dataset, as follows:
The difference between the groups was then tested using a z test.[54,55] Sex-stratified results and differences estimates from the four studies were separately combined via inverse-variance weighted fixed effects meta-analysis in META [56]. See Figure 2A for schematic of autosomal analysis methods. Case only-analyses were performed for SNPs found to be significant in agnostic analyses using sex as outcome for all glioma, GBM, and non-GBM by study and betas and standard errors were combined via inverse-variance weighted fixed effects meta-analysis in META [56].
Sex chromosome analysis
X and Y chromosome data were available from GICC set only. Males and females were imputed separately for the X chromosome using the previously described merged reference panel. X chromosomes were analyzed using logistic regression model in SNPTEST module ‘newml’ assuming complete inactivation of one allele in females, and males are treated as homozygous females (Figure 2B). For prioritized SNPs in the combined model, sex-specific effect estimates were generated using stratified logistic regression models. Y chromosome data were analyzed using logistic regression in SNPTEST (Figure 2B) [57]. Figures were generated using R 3.3.2, GenABEL, qqman, and ggplot.[58-61]
Analysis of TCGA germline and somatic data
Only newly diagnosed cases from TCGA GBM and LGG with no neo-adjuvant treatment or prior cancer were used. Demographic characteristics, molecular classification and somatic alterations data was obtained from Ceccarelli, et al [24]. Chi-square tests were used to compare the frequency of somatic alterations between age groups. SNPs found to be nominally significant (p<5×10−4) in a previous 8 study meta-analysis [4], with imputation quality >= 0.7 were identified within the TCGA genotype data and D’ and r2 values in CEU were used to select proxy SNPs [18]. Using these SNPs, a case-only analysis using sex as a binary phenotype was conducted using logistic regression in SNPTEST assuming an additive model to estimate beta, standard error, and p values [53]. Results were considered significant at p<0.003 (Bonferroni correction for 15 tests, for the three assessed loci in each of five histology groups).
Calculation of unweighted genetic risk scores
In order to estimate the cumulative effects of significant variants by sex, histology-specific unweighted risk scores were calculated using the SNPs found to be significantly associated with each outcome. Data from all four studies was merged, and any imputed genotypes with genotype probability > 0.8 were converted to hard calls. An overall unweighted risk score (URS) was generated using the sum of risk alleles at rs12752552, rs9841110, rs10069690, rs11979158, rs55705857, rs634537, rs12803321, rs3751667, rs78378222, and rs2297440. As risk alleles are known to have histology specific associations,[4] histologic specific scores were generated for GBM and non-GBM using only the SNPs found to have a significant association with each histology. GBM-specific URS (URS-G) was calculated by summing the number of risk alleles at rs9841110, rs10069690, rs 11979158, rs634537, rs78378222, and rs2297440. Non-GBM-specific (URS-N) specific URS was calculated by summing the number of risk alleles at rs10069690, rs55705857, rs634537, rs12803321, rs78378222, and rs2297440. Unweighted risk scores (URS) were calculated by summing all risk alleles for each individual. Differences in median scores between groups using were tested using Wilcoxon rank sum tests. Scores were compared against the median score for each set (URS: 10, URS-GBM: 6 alleles, URS-NGBM: 4 alleles). Odds ratios and 95% confidence intervals for each level of the score using sex-stratified logistic regression adjusted for age at diagnosis (for controls where only an age range was available, the mean value of the range was used), where each score was compared to the median score within the entire population as described in Shete et al. [13].
Calculation of trait variance explained by SNPs with sex-specific effects
In order to determine whether the identified SNPs with sex-specific effects more accurate estimate odds of glioma than sex alone, logistic regression models were used to estimate odds of all glioma, GBM, and non-GBM glioma based on sex using the GICC data only. Proportion of variance in odds of glioma explained by sex-specific SNPs was calculated using R2 estimated using the log likelihood of the null model (sex, age at diagnosis, and the first two principal components only) and the full model (including identified SNPs, rs9841110, rs11979158, rs55705857) [62], calculated as follows:
Proportion of variance explained was also calculated separately by sex for each histology (null model adjusted for age at diagnosis, and the first two principal components only).
Acknowledgements
The GICC was supported by grants from the National Institutes of Health, Bethesda, Maryland (R01CA139020, R01CA52689, P50097257, P30CA125123, P30CA008748, C. Thompson PI). Additional support was provided by the McNair Medical Institute and the Population Sciences Biorepository at Baylor College of Medicine.
In Sweden work was additionally supported by Acta Oncologica through the Royal Swedish Academy of Science (BM salary) and The Swedish Research council and Swedish Cancer foundation. We are grateful to the National clinical brain tumor group, all clinicians and research nurses throughout Sweden who identified all cases.
The UCSF Adult Glioma Study was supported by the National Institutes of Health (grant numbers R01CA52689, P50CA097257, R01CA126831, and R01CA139020), the Loglio Collective, the National Brain Tumor Foundation, the Stanley D. Lewis and Virginia S. Lewis Endowed Chair in Brain Tumor Research, the Robert Magnin Newman Endowed Chair in Neuro-oncology, and by donations from families and friends of John Berardi, Helen Glaser, Elvera Olsen, Raymond E. Cooper, and William Martinusen. This project also was supported by the National Center for Research Resources and the National Center for Advancing Translational Sciences, National Institutes of Health, through UCSF-CTSI Grant Number UL1 RR024131. Its contents are solely the responsibility of the authors and do not necessarily represent the official views of the NIH. The collection of cancer incidence data used in this study was supported by the California Department of Public Health as part of the statewide cancer reporting program mandated by California Health and Safety Code Section 103885; the National Cancer Institute’s Surveillance, Epidemiology and End Results Program under contract HHSN261201000140C awarded to the Cancer Prevention Institute of California, contract HHSN261201000035C awarded to the University of Southern California, and contract HHSN261201000034C awarded to the Public Health Institute; and the Centers for Disease Control and Prevention’s National Program of Cancer Registries, under agreement # U58DP003862-01 awarded to the California Department of Public Health. The ideas and opinions expressed herein are those of the author(s) and endorsement by the State of California Department of Public Health, the National Cancer Institute, and the Centers for Disease Control and Prevention or their Contractors and Subcontractors is not intended nor should be inferred. Other significant contributors for the UCSF Adult Glioma Study include: M Berger, P Bracci, S Chang, J Clarke, A Molinaro, A Perry, M Pezmecki, M Prados, I Smirnov, T Tihan, K Walsh, J Wiemels, S Zheng.
GliomaScan group comprised: Laura E. Beane Freeman, Stella Koutros, Demetrius Albanes, Kala Visvanathan, Victoria L. Stevens, Roger Henriksson, Dominique S. Michaud, Maria Feychting, Anders Ahlbom, Graham G. Giles Roger Milne, Roberta McKean-Cowdin, Loic Le Marchand, Meir Stampfer, Avima M. Ruder, Tania Carreon, Goran Hallmans, Anne Zeleniuch-Jacquotte, J. Michael Gaziano, Howard D. Sesso, Mark P. Purdue, Emily White, Ulrike Peters, Howard D. Sesso, Julie Buring.
UK10K data generation and access was organized by the UK10K consortium and funded by the Wellcome Trust.
We are grateful to all the patients and individuals for their participation and we would also like to thank the clinicians and other hospital staff, cancer registries and study staff in respective centers who contributed to the blood sample and data collection.
Footnotes
↵^ Membership of the GliomaScan Consortium is presented in the acknowledgements