Abstract
Over 90 regions of the genome have been associated with lung function to date, many of which have also been implicated in chronic obstructive pulmonary disease (COPD). We carried out meta-analyses of exome array data and three lung function measures: forced expiratory volume in one second (FEV1), forced vital capacity (FVC) and the ratio of FEV1 to FVC (FEV1/FVC). These analyses by the SpiroMeta and CHARGE consortia included 60,749 individuals of European ancestry from 23 studies, and 7,721 individuals of African Ancestry from 5 studies in the discovery stage, with follow-up in up to 111,556 independent individuals. We identified significant (P<2·8x10-7) associations with six SNPs: a nonsynonymous variant in RPAP1, which is predicted to be damaging, three intronic SNPs (SEC24C, CASC17 and UQCC1) and two intergenic SNPs near to LY86 and FGF10. eQTL analyses found evidence for regulation of gene expression at three signals and implicated several genes including TYRO3 and PLAU. Further interrogation of these loci could provide greater understanding of the determinants of lung function and pulmonary disease.
Introduction
Lung function measures are important predictors of mortality and morbidity and form the basis for the diagnosis of chronic obstructive pulmonary disease (COPD), a leading cause of death globally.1 Lung function is largely affected by environmental factors such as smoking and exposure to air pollution; however there is also a genetic component, with heritability estimates ranging between 39-66%.2–5 A number of large-scale genome-wide association studies (GWAS) of lung function have successfully identified single nucleotide polymorphisms (SNPs) influencing lung function at over 90 independent loci.6–13 Associations have also been identified in GWAS of COPD;14–18 however the identification of disease associated SNPs has been restricted by limited sample sizes. Many signals first identified in powerful studies of quantitative lung function traits, have been found to be associated with risk of COPD, highlighting the potential clinical usefulness of comprehensive identification of lung function associated SNPs.13
Low frequency (minor allele frequency (MAF) 1-5%) and rare (MAF<1%) variants, have been largely underexplored by GWAS to date. Exome arrays have been designed to facilitate the investigation of these low frequency and rare variants, predominately within coding regions, in large sample sizes. Alongside a core content of rare coding SNPs, the exome array additionally includes common variation including tags for previously identified GWAS hits, ancestry informative SNPs, a grid of markers for estimating identity by descent and a random selection of synonymous SNPs.19
Results
We carried out a meta-analysis of exome array data and three lung function measures: forced expiratory volume in one second (FEV1), forced vital capacity (FVC) and the ratio of FEV1 to FVC (FEV1/FVC). These analyses included 68,470 individuals from the SpiroMeta and CHARGE consortia in a discovery analysis, with follow-up in an independent sample of up to 111,556 individuals. All studies are listed with their study-specific sample characteristics in Table 1, with full study descriptions, including details of spirometry and other measurements described in the Supplementary Note. The genotype calling procedures implemented by each study (Supplementary Table 1) and quality control of genotype data are described in the Supplementary Methods. We have undertaken both single variant analyses, and gene-based associations, which test for the joint effect of several rare variants in a gene (see methods for details).
Meta-analyses of single variant associations
We first evaluated single variant associations between FEV1, FVC and FEV1/FVC and the 179,215 SNPs which passed study level quality control and were polymorphic in both consortia. These analyses identified 34 SNPs in regions not previously associated with lung function, showing association with at least one trait at overall P<10-5, and showing association with consistent direction and P<0·05 in both consortia (full results in Supplementary Table 2, quantile-quantile and Manhattan plots shown in Supplementary Figure 1). We followed up these SNP associations in a replication analysis comprising 3 studies with 111,556 individuals. Combining the results from the discovery and replication stages in a meta-analysis identified six SNPs in total that were independent to known signals and met the pre-defined significance threshold (P<2·8x10-7) overall in, or near to FGF10, LY86, SEC24C, RPAP1, CASC17 and UQCC1 (Table 2, Supplementary Figure 2). A SNP near to the CASC17 signal (rs11654749, r2=0·3 with rs1859962) has previously been associated with FEV1 in a genome-wide analysis of gene-smoking interactions, although this association was not replicated at the time;20 the present analysis provides the first evidence for independent replication of this signal. A seventh signal was also identified in LCT (Table 2, Supplementary Figure 2); whilst this locus has not previously been implicated in lung function, this SNP is known to vary in frequency across European populations,21 and we cannot rule out that this association is not an artefact of population structure. Our discovery analysis furthermore identified associations (P<10-5) in 25 regions previously associated with one or more of FEV1, FVC and FEV1/FVC (Supplementary Table 3).
Generally, the observed effect of the SNPs at the novel signals were similar in ever and never smokers; the exception was rs1448044 near FGF10, which showed a significant association with FVC only in ever smokers in our discovery analysis (ever smokers P=1·49x10-6; never smokers P=0·695, Supplementary Table 4 and Supplementary Figure 3). In the replication analysis, however, this association was observed in both ever and never smokers (ever smokers P=3·14x10-5; never smokers P=1·40x10-4, Supplementary Table 5). For rs1200345 (RPAP1) and rs1859962 (CASC17)), associations were most statistically significant in the analyses restricted to individuals of European Ancestry (Supplementary Table 4 and Supplementary Figure 3), as was the association with rs2322659 (LCT), giving further support that this association may be due to population stratification.
Meta-Analyses of gene-based associations
We undertook Weighted Sum Tests (WST)22 and Sequence Kernel Association tests (SKAT)23 to assess the joint effects of multiple low frequency variants within genes on lung function traits. In our discovery analyses of all 68,470 individuals, we tested up to 14,380 genes that had at least two variants with MAF<5% and met the inclusion criteria (exonic or loss of function [LOF], see methods for definitions) in both consortia. The SKAT analyses identified 16 genes associated (P<0·05 in both consortia and overall P<10-4) with FEV1, FVC or FEV1/FVC (Supplementary Table 6), whilst the WST analyses identified 12 genes (Supplementary Table 7). There was one gene (LY6G6D) that was identified in both analyses. These genes were followed up in UK Biobank, with two genes, GPR126 and LTBP4, showing evidence of replication in the exonic SKAT analysis (P<3·5x10-6); however conditional analyses in UK Biobank showed that both these associations were driven by single SNPs, that were identified in the single variant association analyses and have been previously reported in GWAS of these traits (Tables E6 & E7).
Functional characterization of novel loci
In order to gain further insight into the six loci identified in our analyses of single variant associations (excluding LCT), we employed functional annotation and assessed whether identified SNPs in these regions were associated with gene expression levels. One of the identified novel SNPs was nonsynonymous, three intronic and two were intergenic. We found evidence that three of the SNPs may be involved in cis-acting regulation of the expression of several genes in multiple tissues (Supplementary Table 8).
SNP rs1200345 in RPAP1 is a nonysynomous variant, predicted to be deleterious by both SIFT (deleterious) and Polyphen (possibly damaging) (Supplementary Table 9); RPAP1 is ubiquitously expressed, with high levels of protein detected in the lung (Supplementary Table 10). SNP rs1200345 or proxies (r2>0·8) were also found to be amongst the most strongly associated SNPs with expression levels of RPAP1 in several tissues including lung, and with a further six genes in lung tissue (Supplementary Table 8) including TYRO3, one of the TAM family of receptor tyrosine kinases. TYRO3 regulates several processes including cell survival, migration and differentiation and is highly expressed in lung macrophages (Supplementary Table 10). Evidence for associations with gene expression was found at two more of the novel signals (sentinel SNPs rs3849969 and rs6088813), implicating a further 16 genes. Of note, in blood eQTL databases, a proxy of a SNP in complete linkage disequilibrium (r2=1) with rs3849969 (rs3812637) was an eQTL for plasminogen activator, urokinase (PLAU).
Discussion
We undertook an analysis of 68,470 individuals from 23 studies with data from the exome array and three lung function traits, following up the most significant single SNP and gene-based associations in an independent sample of up to 111,556 individuals. The combined analyses of our discovery and replication single variant associations identified six SNPs meeting the pre-defined significance threshold (P<2·8x10-7). The replication stage results for these six SNPs also meet Bonferroni corrected significance for independent replication (P<1·47x10-3, corrected for 34 SNPs being tested).
One of these SNPs is in a region that has previously been implicated in lung function (near KCJN2/SOX9),20 whilst the remaining five SNPs, although all common, have not previously been identified in other GWAS of lung function. In a recent 1000 Genomes imputed analysis of lung function (which includes some of the studies contributing to the present discovery analysis), all of these SNPs showed at least suggestive association (2·97x10-3>P>1·28x10-5) with one or more lung function trait, but none reached the required threshold (P<5x10-6) to be taken forward for replication in that analysis.12
We further identified a seventh association (P<2·8x10-7) with rs2322659 in LCT. Given SNPs in this region are known to vary in frequency across European populations, we cannot dismiss the possibility that this association may be confounded by population stratification; hence we do not report this signal as a novel lung function locus. We undertook a look-up of associations in our discovery meta-analyses of 7 loci (including LCT) that were identified as showing differences in allele frequency between individuals from different regions in the UK,24 and subsequently across European populations.25 Aside from the association between the LCT locus and FVC, no significant associations were observed between SNPs at these loci and any lung function trait, in either the analyses restricted to EA individuals, or in the analysis of EA and AA individuals combined (Supplementary Table 11); this suggests population structure was generally accounted for adequately in our analyses.
One of the novel signals was with a nonsynonymous SNPs: rs1200345 in RPAP1, which is predicted to be deleterious. This SNP and proxies with r2>0·8 were also associated with expression in lung tissue of seven genes, including RPAP1 and the TAM receptor TYRO3. TAM receptors play a role in the inhibition of Toll-like receptors (TLRs)-mediated innate immune response by initiating the transcription of cytokine signalling genes (SOCS-1 and 3) which limit cytokine overproduction and inflammation.26,27 It has been shown that influenza viruses H5N1 and H7N9 can cause downregulation of Tyro3, resulting in an increased inflammatory cytokine response.27
Three further signals were with intronic SNPs in SEC24C, CASC17, and UQCC1. Two of these intronic SNPs have previously been implicated in GWAS of other traits: rs1859962 in CASC17 with prostate cancer28 and rs6088813 in UQCC1 with height.29 The CASC17 locus, near KCNJ2/SOX9 has also previously been implicated in lung function, showing significant association with FEV1 in a genome-wide analysis of gene-smoking interactions, however this association was not formally replicated.20 Whilst the individuals utilised in the discovery stage of this analysis overlap with those included in this previous interaction analysis, the replication stage of the present study provides the first evidence of replication for this signal in independent cohorts. In the present analysis, there was no evidence that the results differed by smoking status.
SNPs rs6088813 in UQCC1 and rs3849969 in SEC24C were identified as eQTLs for multiple genes. Notably, a SNP in complete linkage disequilibrium with rs3849969 (rs3812637, r2=1) is associated with expression of PLAU in blood. The plasminogen activator, urokinase (PLAU) plays a role in fibrinolysis and immunity, and with its receptor (PLAUR) is involved in degradation of the extra cellular matrix, cell migration, cell adhesion and cell proliferation.30 A study of preterm infants with respiratory distress syndrome, a condition characterised by intra-alveolar fibrin deposition, found PLAU and its inhibitor SERPINE1 to be expressed in the alveolar epithelium, and an increased ratio of SERPINE1 to PLAU was associated with severity of disease.31 Studies in mice have also shown that increased expression of PLAU may be protective against lung injury, by reducing fibrosis.32 PLAU has also been found to be upregulated in lung epithelial cells subjected to cyclic strain33 and in patients with COPD and lung cancer, PLAU was found to be expressed in alveolar macrophages and epithelial cells.30
The final two signals were with common intergenic SNPs close to LY86 and FGF10. LY86 (lymphocyte antigen 86) interacts with the Toll-like receptor signalling pathway, when bound with RP105 to form a heterodimer. 34 The sentinel SNP rs1294421 has previously been associated with waist-hip ratio,35 and an intronic SNP within LY86 (rs7440529, LD with rs1294421: r2=0·005) has previously been implicated in asthma in two studies of individuals of Han Chinese ancestry. 36,37 FGF10 is a member of the fibroblast growth factor family of proteins, involved in a number of biological processes, including embryonic development, cell growth, morphogenesis, tissue repair, tumor growth and invasion. Specifically, the FGF10 signalling pathway plays an essential role in lung development and lung epithelial renewal.38 A study in mice demonstrated that a deficiency in FGF10 resulted in a fatal disruption of branching morphogenesis during lung development.39
Our discovery analyses included individuals of both European and African ancestry. Two of the identified six novel signals showed inconsistent effects in the African and European ancestry individuals. For these SNPs, the associations in African Ancestry individuals were not statistically significant, and we report associations from the analysis restricted to European ancestry individuals only. For the remaining four SNPs similar effects were observed in both the European and African ancestry individuals (Supplementary Figure 3). We also examined the effects of the novel SNPs in ever smokers and never smokers separately and found these to be broadly similar, with the exception of rs1448044 in FGF10, which in the discovery analysis showed significant association with FVC in ever smokers, whilst showing no association in never smokers (P=0·695). In our replication stage analyses, similar effects were seen in both ever and never smokers for this SNP however, and the combined analysis of discovery and replication stages for this SNP, including both ever and never smokers, met the exome chip-wide significance level overall (P=4·22x10-9). We also considered whether this signal could be driven by smoking behaviour in our discovery stage as our primary analyses in SpiroMeta did not adjust for smoking quantity. We undertook a look-up of this SNP in the publicly available results of a GWAS of several smoking behaviour traits;40 there was only weak evidence that this SNP was associated with ever versus never smoking (P=0·039), and no evidence for association with amount smoked (cigarettes per day, P=0·10).
Through the use of the exome array, we aimed to identify associations with low frequency and rare functional variants, thereby potentially uncovering some of the missing heritability of lung function. However, whilst our discovery analyses identified single SNP associations with 23 low frequency variants (Supplementary Table 2), we did not replicate any of these findings. Eleven of these 23 SNPs we were unable to follow-up in our replication studies, due to them either being not genotyped, or monomorphic. Overall, our lack of convincing associations with rare variants is likely due to limited statistical power for identifying single variant associations, particularly if those variants exhibit only modest effects.41 We additionally employed SKAT and WST gene-based tests to investigate the joint effects of low frequency and rare variants within genes, on lung function traits. These analyses identified associations with a number of genes that did not appear to be driven by single SNPs. Replication of these signals proved difficult however, as again many SNPs included within the discovery stage of these analyses were not genotyped, or were monomorphic in the replication studies. This often meant a disparity in the gene unit being tested in our discovery and replication samples; hence the interpretation of these results was not straightforward. In the end, we were able to replicate only findings with common SNPs. This finding is in line with several other studies of complex traits and exome array data that have been unable to report robust associations with low frequency variants42–44 and it is clear that future studies, will require increasingly larger sample sizes in order to fully evaluate the effect of variants across the allele frequency spectrum. The identification of common SNPs remains important however, as such findings have the potential to highlight drug targets,45 and these variants collectively could have utility in risk prediction.
This study has identified six common SNPs, independent to signals previously implicated in lung function. Further interrogation of these loci could lead to greater understanding of lung function and lung disease, and could provide novel targets for therapeutic interventions.
Methods
Study Design, cohorts and genotyping
The SpiroMeta analysis included 23,751 individuals of European ancestry (EA) from 11 studies, and the CHARGE analysis comprised 36,998 EA individuals and a further 7,721 individuals of African ancestry (AA) from 12 studies. Follow-up analyses were conducted in an independent sample of up to 111,556 individuals from UK Biobank, the UK Household Longitudinal Study (UKHLS) and the Netherlands Epidemiology of Obesity (NEO) Study (Figure 1). All studies (excluding UK Biobank) were genotyped using either the Illumina Human Exome BeadChip v1or the Illumina Infinium HumanCoreExome-12 v1·0 BeadChip. UK Biobank samples were genotyped using the Affymetrix Axiom UK BiLEVE or UK Biobank arrays.
Statistical analyses
Consortium level analyses
Within the SpiroMeta Consortium, each study contributing to the discovery analyses calculated single-variant score statistics, along with covariance matrices describing correlations between variants, using RAREMETALWORKER46 or rvtests.47 For each trait, these summary statistics were generated separately in ever and never smokers, with adjustment for sex, age, age2 and height, and with each trait being inverse normally transformed prior to association testing. For studies with unrelated individuals, further adjustments were made for the first 10 ancestry principal components, whilst studies with related individuals utilised linear mixed models to account for familial relationships and underlying population structure.
Within the CHARGE Consortium, each study generated equivalent summary statistics using the R package SeqMeta.48 For each trait, summary statistics were generated in ever and never smokers separately, and in all individuals combined. The untransformed traits were used for all analyses, adjusted for smoking status and pack-years, age, age2, sex, height, height2, centre/cohort. Models for FVC were additionally adjusted for weight. Linear regression models, with adjustment for principal components of ancestry were used for studies with unrelated individuals, and linear mixed models were used for family-based studies.
Within each consortium we used the score statistics and variance-covariance matrices generated by each study to construct both single variant and gene-based tests using either RAREMETAL46 (SpiroMeta) or SeqMeta48 (CHARGE). For single variant associations, score statistics were combined in fixed effects meta-analyses. Two gene-based tests were constructed: first, the Weighted Sum Test (WST) using Madsen Browning weightings,22 and secondly, the Sequence Kernel Association Test (SKAT).23 We performed the SKAT and WST tests using two subsets of SNPs: 1) including all SNPs with an overall consortium-wide MAF<5% that were annotated as splicing, stopgain, stoploss, or frameshift (loss of function [LOF] analysis), and 2) including all SNPs meeting the LOF analysis criteria in addition to all other nonsynonymous variants with consortium wide MAF<5% (exonic analysis). Variants were annotated to genes using dbNSFP v2·649 on the basis of the GRCh37/hg19 database.
For both single variant and gene-based associations, consortium-level results were generated for ever smokers and never smokers separately, and in all individuals combined. Within the CHARGE Consortium, results were combined separately for the EA and AA studies and also in a trans-ethnic analysis of both ancestries.
Combined Meta-Analysis
The single variant association results from the SpiroMeta and CHARGE consortia were combined as follows: The genomic inflation statistic (λ) was calculated for SNPs with consortium-wide MAF>1%; where λ had a value greater than one, genomic control adjustment was applied to the consortium level P-values. The consortium-level results were then combined using sample size weighted z-score meta-analysis. The λ was again calculated for the meta-analysis results and genomic control applied, as appropriate. Since we were interested in identifying low frequency and rare variants, we applied no MAF or minor allele count (MAC) filter. We identified SNPs of interest as those with an overall P<10-5 and a consistent direction of effect and P<0·05 observed in both consortia. Where we identified a SNP within 1Mb of a previously identified lung function SNP, we deemed the SNP to represent an independent signal if it had r2<0·2 with the known SNP, and if it retained a P <10-5, when conditional analyses were carried out with the known SNP, or a genotyped proxy, using data from the SpiroMeta Consortium, or UK Biobank. Our primary meta-analysis included all individuals; we additionally carried out analyses in smoking subgroups (ever and never smokers), and in the subgroup of individuals of European ancestry only.
For genes which contained at least 2 polymorphic SNPs in both consortia, we combined the results of the consortium level gene based tests using either z-score meta-analysis (WST) or Fisher’s Method for combining P-values (SKAT). We identified genes of interest as those with P<0·05 observed in both consortia and an overall P<10-4. As in the analyses of single variant associations, our primary meta-analyses included all individuals, with secondary analyses undertaken in smoking and ancestry specific subgroups.
Replication Analyses
All SNP and gene-based associations were followed up for the trait with which they showed the most statistically significant association only. For associations identified through the smoking subgroup analyses, we followed up associations in the appropriate smoking strata; however no ancestry stratified follow-up was undertaken as replication studies included only a sufficient number of individuals of European Ancestry.
Single variant associations in UK Biobank were tested in ever smokers and never smokers separately using the score test as implemented in SNPTEST v2·5b4.50 Traits were adjusted for age, age2, height, sex, ten principal components and pack-years (ever smokers only), and inverse normally transformed. For UKHLS, analyses were undertaken analogously to the SpiroMeta discovery studies using RAREMETALWORKER, while for NEO, analyses were undertaken in the same way as was done in the CHARGE discovery studies using SeqMeta. The single variant results from all replication studies were combined using sample size weighted Z-score meta-analysis. Subsequently we combined the results from the discovery and replication stage analyses and we report SNPs with overall exome-wide significance of P<2·8x10-7 (Bonferroni corrected for the original 179,215 SNPs tested).
We followed up genes of interest (P<10-4) using data from UK Biobank only. Summary statistics for UK Biobank were generated using RAREMETALWORKER, with gene-based tests then constructed using RAREMETAL. Finally, we combined the results from the discovery analysis with the replication results in an overall combined meta-analysis using either z-score meta-analysis (WST) or Fisher’s Method (SKAT). We declared genes with overall P<3·5x10-6 (Bonferroni corrected for 14,380 genes tested) in our combined meta-analysis to be statistically significant. For these statistically significant genes, we carried out additional analyses using the UK Biobank data in which we conditioned on the most significantly associated individual SNP within that gene, to determine whether this was a true gene-based signal, or whether the association could be ascribed to the single SNP (if the conditional P<0·01, then association was deemed to not be driven by the single SNP).
Characterization of findings
In order to gain further insight into the loci identified in our analyses of single variant associations, we assessed whether these regions were associated with gene expression levels in various tissues (FDR of 5%, or q-value<0·05), by querying a publically available blood eQTL database 51 and the GTEx project 52 for the sentinel SNPs, or any proxy (r2>0·8). We further assessed SNPs of interest (and proxies) within a lung eQTL resource based on non-tumour lung tissues of 1,111 individuals.53–55 Descriptions of these resources and further details of the look-ups are provided in the Supplementary Methods. Moreover, all sentinel SNPs and proxies with r2>0.8 were annotated using ENSEMBL’s Variant Effect Predictor (VEP);56 potentially deleterious coding variants were identified as those annotated as ‘deleterious’ by SIFT57 or ‘probably damaging’ or ‘possibly damaging’ by PolyPhen-2.58 For all genes implicated through the expression data or functional annotation, we searched for evidence of protein expression in the respiratory system by querying the Human Protein Atlas.59
Author Contributions
Ordered Alphabetically:
ABW, AGE, AL, BMP, BS, CH, CP, DOMK, DPS, EZ, GGB, HS, IPH, JBJ, JK, KMB, LL, MAI, MAP, MDT, MK, NG, NMPH, OP, OTR, RdM, RGB, SBK, SG, SJL, SSR, TA, TBH, TH, TL, TR, TS, UG contributed to study concept and designs. AC, AJ, A.Manichaikul, BHS, BMP, BS, CP, DJP, DPS, EI, GGB, GTOC, IJD, JBJ, JGW, JK, JMS, KS, LAL, LL, LL, MAP, MI, MK, NG, NMPH, OP, OTR, PAC, RdM, RGB, RR, SBK, SE, SEH, SG, SK, SK, TA, TBH, TDP, TL, TNB, TR, UG, WT, WT contributed to phenotype data acquisition and quality control. AGE, AJ, AK, AK, ALT, ALT, A.Manichaikul, APM, AT, BMP, BP, CH, DOMK, EI, GD, HV, IJD, JAB, JCM, JGW, JL, KDT, KEN, KL, L-PL, LAL, LL, MAP, MI, MLG, NMPH, OP, RGB, RLG, RR, SBK, SE, SEH, SRH, SSR, SW, TBH, TDP, TH, TL, YL contributed to genotype data acquisition and quality control. DDS, KH, WT, YB contribute to eQTL data acquisition and quality control. ABW, ACM, AK, AK, ALT, A.Mahajan, A.Manichaikul, APM, AT, BP, BQ, CH, CMS, EA, HV, IPH, JAB, JCL, JD, JEH, JL, JM, JMJ, KL, L-PL, LL, LVW, MDT, MI, MO, NF, NMPH, OP, PAC, RLG, SE, SEH, SJL, SW, TDP, TH, TMB, VEJ, WG, WT, YL contributed to data analysis. All authors contributed to writing and/or critical review of the manuscript.
The ‘Understanding Society Scientific Group’ include the following: Understanding Society Scientific Group: Michaela Benzeval, Jonathan Burton, Nicholas Buck, Annette Jäckle, Meena Kumari, Heather Laurie, Peter Lynn, Stephen Pudney, Birgitta Rabe, Shamit Saggar, Noah Uhrig, Dieter Wolke.
Competing Interests
JK Consulted for Pfizer on nicotine dependence in 2012-2014. WT received fees to the Institution, all outside the submitted work, from Pfizer, GSK, Chiesi, Roche Diagnostics / Ventana, Biotest, Merck Sharp Dohme, Novartis, Lilly Oncology, Boehringer Ingelheim, grants from Dutch Asthma Fund. BMP serves on the DSMB of a clinical trial funded by the manufacturer (Zoll LifeCor) and on the Steering Committee of the Yale Open Data Access Project funded by Johnson & Johnson.
Acknowledgements
This research has been conducted using the UK Biobank Resource. This article presents independent research funded partially by the National Institute for Health Research (NIHR). The views expressed are those of the author(s) and not necessarily those of the NHS, the NIHR or the Department of Health. These data are from Understanding Society: The UK Household Longitudinal Study (UKHLS), which is led by the Institute for Social and Economic Research at the University of Essex and funded by the Economic and Social Research Council. The data were collected by NatCen and the genome wide scan data were analysed by the Wellcome Trust Sanger Institute. Information on how to access the data can be found on the Understanding Society website https://www.understandingsociety.ac.uk/. The authors would like to thank the staff at the Quebec Respiratory Health Network Tissue Bank for their valuable assistance with the lung eQTL dataset at Laval University. The content is solely the responsibility of the authors and does not necessarily represent the official views of the National Institutes of Health. A full list of principal CHS investigators and institutions can be found at CHS-NHLBI.org. The authors thank the staff and participants of the ARIC study for their important contributions. The authors of the NEO study thank all individuals who participated in the Netherlands Epidemiology in Obesity study, all participating general practitioners for inviting eligible participants and all research nurses for collection of the data. We thank the NEO study group, Pat van Beelen, Petra Noordijk and Ingeborg de Jonge for the coordination, lab and data management of the NEO study. SAPALDIA could not have been done without the help of the study participants, technical and administrative support and the medical teams and field workers at the local study sites. Local fieldworkers : Aarau: M Broglie, M Bünter, D Gashi, Basel: R Armbruster, T Damm, U Egermann, M Gut, L Maier, A Vögelin, L Walter, Davos: D Jud, N Lutz, Geneva: M Ares, M Bennour, B Galobardes, E Namer, Lugano: B Baumberger, S Boccia Soldati, E Gehrig-Van Essen, S Ronchetto, Montana: C Bonvin, C Burrus, Payerne: S Blanc, AV Ebinger, ML Fragnière, J Jordan, Wald: R Gimmi, N Kourkoulos, U Schafroth. Administrative staff: N Bauer, D Baehler, C Gabriel, R Gutknecht. SAPALDIA Team: Study directorate: NM Probst Hensch, T Rochat, N Künzli, C Schindler, JM Gaspoz. Scientific team: JC Barthélémy, W Berger, R Bettschart, A Bircher, G Bolognini, O Brändli, C Brombach, M Brutsche, L Burdet, M Frey, U Frey, MW Gerbase, D Gold, E de Groot, W Karrer, R Keller, B Knöpfli, B Martin, D Miedinger, U Neu, L Nicod, M Pons, F Roche, T Rothe, E Russi, P Schmid-Grendelmeyer, A Schmidt-Trucksäss, A Turk, J Schwartz, D. Stolz, P Straehl, JM Tschopp, A von Eckardstein, E Zemp Stutz. Scientific team at coordinating centers: M Adam, E Boes, PO Bridevaux, D Carballo, E Corradi, I Curjuric, J Dratva, A Di Pasquale, L Grize, D Keidel, S Kriemler, A Kumar, M Imboden, N Maire, A Mehta, F Meier, H Phuleria, E Schaffner, GA Thun, A Ineichen, M Ragettli, M Ritter, T Schikowski, G Stern, M Tarantino, M Tsai, M Wanner. MDT has been supported by MRC fellowships G0501942 and G0902313. MDT and LVW are supported by the MRC (MR/N011317/1). IPH is supported by the MRC (G1000861). ALW and SJL are supported by the Intramural Research Program of the NIH, National Institute of Environmental Health Sciences (ZIA ES 043012). We acknowledge use of phenotype and genotype data from the British 1958 Birth Cohort DNA collection, funded by the Medical Research Council grant G0000934 and the Wellcome Trust grant 068545/Z/02. APM is a Wellcome Trust Senior Fellow in Basic Biomedical Science (grant number WT098017) and also supported by Wellcome Trust grant WT064890. EI is supported by the Swedish Research Council (2012-1397), Knut och Alice Wallenberg Foundation (2013.0126) and the Swedish Heart-Lung Foundation (20140422). JK is supported by Academy of Finland Center of Excellence in Complex Disease Genetics grants 213506, 129680 and Academy of Finland grants 265240, 263278. The Finnish Twin Cohort is supported by the Welcome Trust Sanger Institute, UK. The Lothian Birth Cohort is supported by Age UK (The Disconnected Mind Project), the UK Medical Research Council (MR/K026992/1) and The Royal Society of Edinburgh. ÅJ is supported by the Swedish Society for Medical Research (SSMF), The Kjell och Märta Beijers Foundation, The Marcus Borgström Foundation, The Åke Wiberg foundation and The Vleugels Foundation. UG is supported by Swedish Medical Research Council grants K2007-66X-20270-01-3 and 2011-2354 and European Commission FP6 (LSHG-CT-2006-01947). SHIP is part of the Community Medicine Research net of the University of Greifswald, Germany, which is funded by the Federal Ministry of Education and Research, the Ministry of Cultural Affairs as well as the Social Ministry of the Federal State of Mecklenburg-West Pomerania, and the network ‘Greifswald Approach to Individualized Medicine (GANI_MED)’ funded by the Federal Ministry of Education and Research, and the German Asthma and COPD Network (COSYCONET) (grant no.01ZZ9603, 01ZZ0103, 01ZZ0403, 03IS2061A, BMBF 01GI0883). ExomeChip data have been supported by the Federal Ministry of Education and Research (grant no. 03Z1CN22) and the Federal State of Mecklenburg-West Pomerania. The University of Greifswald is a member of the Caché Campus program of the InterSystems GmbH. UKHLS is supported by grants WT098051 (Wellcome Trust) and ES/H029745/1 (Economic and Social Research Council). Y.B. holds a Canada Research Chair in Genomics of Heart and Lung Diseases. Lies Lahousse is a Postdoctoral Fellow of the Research Foundation - Flanders (FWO grant G035014N). The Rotterdam Study is funded by Erasmus Medical Center and Erasmus University, Rotterdam, the Netherlands Organization for Scientific Research (NOW), the Netherlands Organization for the Health Research and Development (ZonMw), the Research Institute for Diseases in the Elderly (RIDE), the Ministry of Education, Culture and Science, the Ministry for Health, Welfare and Sports, the European Commission (DG XII), and the Municipality of Rotterdam. Genotyping in the Rotterdam study was supported by Netherlands Organization for Scientific Research (NOW grants 175.010.2005.011; 911-03-305 012), the Research Institute for Diseases in the Elderly (RIDE2 grants 014-93-015) and Netherlands Genomics Initiative (NGI)/Netherlands Consortium for Healthy Aging (NCHA grant050-060-810). MESA/MESA SHARe is supported by HHS (HHSN268201500003I), NIH/NHLBI (contracts N01-HC-95159, N01-HC-95160, N01-HC-95161, N01-HC-95162, N01-HC-95163, N01-HC-95164, N01-HC-95165, N01-HC-95166, N01-HC-95167, N01-HC-95168, N01-HC-95169) and HIH/NCATS (contracts UL1-TR-000040, UL1-TR-001079, UL1-TR-001881, DK063491). MESA SHARe is funded by NIH/NHLBI contract N02-HL-64278, MESA Air is funded by US EPA (RD831697) and MESA Spirometry funded by NIH/NHLBI (R01-HL077612). SSR and BMP are supported by NIH/NHLBI grant rare variants and NHLBI traits in deeply phenotyped cohorts (R01-HL120393). Cardiovascular Health Study: This CHS research was supported by NHLBI contracts HHSN268201200036C, HHSN268200800007C, N01HC55222, N01HC85079, N01HC85080, N01HC85081, N01HC85082, N01HC85083, N01HC85086; and NHLBI grants U01HL080295, R01HL068986, R01HL087652, R01HL105756, R01HL103612, R01HL120393, and R01HL130114 with additional contribution from the National Institute of Neurological Disorders and Stroke (NINDS). Additional support was provided through R01AG023629 from the National Institute on Aging (NIA). The provision of genotyping data was supported in part by the National Center for Advancing Translational Sciences, CTSI grant UL1TR000124, and the National Institute of Diabetes and Digestive and Kidney Disease Diabetes Research Center (DRC) grant DK063491 to the Southern California Diabetes Endocrinology Research Center. The Atherosclerosis Risk in Communities (ARIC) study is carried out as a collaborative study supported by the National Heart, Lung, and Blood Institute (NHLBI) contracts (HHSN268201100005C, HHSN268201100006C, HHSN268201100007C, HHSN268201100008C, HHSN268201100009C, HHSN268201100010C, HHSN268201100011C, and HHSN268201100012C). Funding support for “Building on GWAS for NHLBI-diseases: the U.S. CHARGE consortium” was provided by the NIH through the American Recovery and Reinvestment Act of 2009 (ARRA) (5RC2HL102419). DOMK received funding from the Dutch Science Organisation (ZonMW-VENI Grant 916.14.023). The genotyping in the NEO study was supported by the Centre National de Génotypage (Paris, France), headed by Jean-François Deleuze. The NEO study is supported by the participating Departments, the Division and the Board of Directors of the Leiden University Medical Center, and by the Leiden University, Research Profile Area Vascular and Regenerative Medicine. SAPALDIA was supported by the Swiss National Science Foundation (grants no 33CS30-148470/1, 33CSCO- 134276/1, 33CSCO-108796, 324730_135673, 3247BO-104283, 3247BO-104288, 3247BO-104284, 3247-065896, 3100-059302, 3200-052720, 3200-042532, 4026-028099, PMPDP3_129021/1, PMPDP3_141671/1), the Federal Office for the Environment, the Federal Office of Public Health, the Federal Office of Roads and Transport, the canton’s government of Aargau, Basel-Stadt, Basel-Land, Geneva, Luzern, Ticino, Valais, and Zürich, the Swiss Lung League, the canton’s Lung League of Basel Stadt/Basel Landschaft, Geneva, Ticino, Valais, Graubünden and Zurich, Stiftung ehemals Bündner Heilstätten, SUVA, Freiwillige Akademische Gesellschaft, UBS Wealth Foundation, Talecris Biotherapeutics GmbH, Abbott Diagnostics, European Commission 018996 (GABRIEL), Wellcome Trust WT 084703MA. The Novo Nordisk Foundation Center for Basic Metabolic Research is an independent Research Center at the University of Copenhagen partially funded by an unrestricted donation from the Novo Nordisk Foundation (www.metabol.ku.dk). Generation Scotland received core support from the Chief Scientist Office of the Scottish Government Health Directorates [CZD/16/6] and the Scottish Funding Council [HR03006]. Genotyping of the GS:SFHS samples was carried out by the Genetics Core Laboratory at the Edinburgh Clinical Research Facility, University of Edinburgh, Scotland and was funded by the Medical Research Council UK. The Croatia KORCULA study was supported by the Ministry of Science, Education and Sport in the Republic of Croatia (108-1080315-0302). JD, JCL, WG and GTOC are supported by NIH/NHLBI Contract HHSN268201500001I. Genotyping, quality control and calling of the Illumina HumanExome BeadChip in the Framingham Heart Study was supported by funding from the National Heart, Lung and Blood Institute Division of Intramural Research (Daniel Levy and Christopher J. O’Donnell, Principle Investigators). The AGES study is supported by the NIH (N01-AG012100), the Iceland Parliament (Alþingi) and the Icelandic Heart Association. HABC was supported by NIA contracts N01AG62101, N01AG62103, and N01AG62106; NIA grant R01-AG028050, and NINR grant R01- NR012459 and was supported in part by the Intramural Research Program of the NIH, National Institute on Aging. The HABC genome-wide association study was funded by NIA grant 1R01AG032098- 01A1 and genotyping services were provided by the Center for Inherited Disease Research (CIDR). CIDR is fully funded through a federal contract from the National Institutes of Health to The Johns Hopkins University, contract number HHSN268200782096C.
This research used the ALICE and SPECTRE High Performance Computing Facilities at the University of Leicester.