1. Summary
An important aspect of age-related research is to find proteins in human blood that can be used to track physiological processes of aging. Here, we used a multiplexed affinity proteomics approach to search for proteins in human body fluids whose levels are associated with age. First, serum samples from 156 subjects aged 50-92 years were explored using a comprehensive bead array assay including 7,258 antibodies. We identified 16 age-associated profiles (adjusted P < 0.05) and followed up on the most significantly age-associated profiles in eight additional study sets (n = 4,044 individuals), analyzing both serum and plasma. As a result of a meta-analysis and antibody validation, we found that levels of histidine-rich glycoprotein (HRG), a plasma glycoprotein produced by the liver, consistently increased with age (P = 5.37 × 10⁻⁶). Higher levels of HRG were also associated with an increased risk of mortality during about 8.5 years of follow-up (interquartile range = 7.7-9.3) after blood sampling, with a hazard ratio of 1.25 per standard deviation (P = 6.45 × 10⁻⁵). Our multi-cohort affinity proteomics analysis found that blood levels of the multi-purpose HRG were associated with age and all-cause mortality. This combination suggests that elevated HRG levels could serve as an accessible molecular indicator for accelerated aging.
2. Introduction
Aging is the single most dominant risk factor for common diseases of the elderly and for death in the human population (López-Otín, Blasco, Partridge, Serrano, & Kroemer, 2013). Molecular insights on aging could enable direct identification of future treatments for various diseases and would increase our understanding of longevity and related mechanisms. However, many of the underlying molecular processes and changes in humans still remain poorly understood (López-Otín et al., 2013). Aging is most often studied using animal models or cell lines (López-Otín et al., 2013), despite the vast differences in lifespans (2 weeks to 100 years) (Gorbunova, Seluanov, Zhang, Gladyshev, & Vijg, 2014). Findings from these model organisms should preferably be translated into studies on humans. Recently, rejuvenation factors were found in mouse blood (Katsimpardi et al., 2014), which suggests the potential to find aging-governing molecules in human blood. Some proteomics studies on aging have proposed several candidates as age or mortality predictors (Barron, Lara, White, & Mathers, 2015; Jylhävä, Pedersen, & Hägg, 2017). In this study, we aimed to search for novel protein predictors of aging that are associated with both age and mortality. This poses a challenge to the analytical methods in terms of sample complexity and availability.
To study proteins at a wider scale, there are currently two major technological concepts available for measuring the proteome: mass spectrometry (MS) and affinity-based proteomics (Ayoglu et al., 2011). Both approaches have been used to study the plasma proteome (Schwenk et al., 2017), offering a unique window into human health and diseases. Even though affinity proteomics has suffered from a lack of binding reagents to the proteome (Stoevesandt & Taussig, 2012), antibody resources such as the Human Protein Atlas (HPA) (Uhlén et al., 2015) and aptamer-based platforms (Emilsson et al., 2018) now offer the possibility to apply affinity proteomics to broader discovery projects. The capacity to conduct near population-based studies that also incorporate genome-wide association studies (GWAS) is very attractive and has been demonstrated using different plasma proteomics assays (Johansson et al., 2013; Melzer et al., 2008; Suhre et al., 2017). Several interesting connections between proteins and genetic variants in humans have recently been identified in large-scale cohort studies (Emilsson et al., 2018; Sun et al., 2018). This two-omics approach has indeed provided novel insights into the links between distant molecular systems, and also indirect validation that mitigates uncertainty in the molecular assays.
Utilizing the in-house developed antibody assays based on the suspension bead arrays (Schwenk, Igel, Neiman, et al., 2010) we aimed to profile serum and plasma from a large number of individuals, studying the changes in age-related plasma protein levels. Our strategy was to explore, filter and rank plasma profiles associated with age in extended sets of samples and to confirm antibody selectivity by applying different validation assays (Supporting Figure 1).
3. Results
We analyzed human serum and plasma samples from 4,200 subjects for blood proteins associated with age. Using affinity proteomic assays initially based on a large set of antibody reagents, protein profiling was conducted with nine sets of samples to determine the most consistent age associations. Our aim was to describe proteins present in serum and plasma that could serve as indicators of biological age to increase our understanding of age-related phenotypes.
Screening for age-associated profiles
Age-associated protein profiles were first investigated in a set of 156 human subjects selected in 5-year age intervals from a Swedish twin cohort using our proteomic assays (Table 1). The sex-matched samples from the study set included 30 monozygotic (MZ) twin pairs (age 50-70). The average intraclass correlation within pairs of antibody profiles was small (ICC = 0.13) so twins were treated as unrelated. Minimal effects of the twin relationship were corroborated by a linear mixed model that considered the dependency.
Assays using a total of 7,258 HPA antibodies were applied to profile age-associated proteins in serum. For this screening, target inclusion depended purely on the availability of antibodies and not on their target antigens (Byström et al., 2014). This set of antibodies comprised targets from 6,370 protein-encoding genes (about 32% of the non-redundant human proteome), and profiles were obtained using antibody suspension bead arrays (Drobin, Nilsson, & Schwenk, 2013), which provided up to 384 profiles on 384 samples per batch. The acquired data were preprocessed through quality control, including outlier removal and normalizations to account for experimental variation across individual samples, assay plates and data batches (details in Experimental Procedures). Linear regression models were then used to determine the protein profiles that changed monotonically with increasing age. The models revealed that 16 out of 7,258 protein profiles were age-associated (adjusted P < 0.05) when screening sera of individuals aged 50 to 92.
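The adjusted P-value criterion of the screening can be illustrated with a minimal sketch. It assumes that the Bonferroni adjustment named under Statistical analysis was applied over all 7,258 antibody profiles; the raw P-values below are hypothetical and serve only as an illustration.

```python
def bonferroni_adjust(p_values, n_tests=None):
    """Multiply each raw P-value by the number of tests and cap the result at 1."""
    m = len(p_values) if n_tests is None else n_tests
    return [min(1.0, p * m) for p in p_values]


# Number of antibody profiles screened in sample set 1 (from the text).
N_PROFILES = 7258

# Hypothetical raw P-values for three profiles (illustration only).
raw = [1.0e-7, 5.0e-6, 1.0e-3]

adjusted = bonferroni_adjust(raw, n_tests=N_PROFILES)
significant = [p < 0.05 for p in adjusted]

# Equivalent threshold on the raw P-value scale (assumes 7,258 tests).
raw_threshold = 0.05 / N_PROFILES
```

Under this assumption, a profile passes the screening only if its raw P-value is below roughly 6.9 × 10⁻⁶.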
Study sets to replicate and confirm the discovered age-associations
Next, we aimed to follow up on this screening phase and focused on validating the most significant indications, accepting that other previously published age-associated proteins would not be included. Because the age range of set 1 covered life expectancy in Sweden (84 years for women and 80 years for men in 2015 (Statistics Sweden, 2016)), we considered it more likely that findings from this set might be related to accelerated aging and mortality. Concentrating on the three most significant findings from set 1 (Table 2), we investigated eight additional sample sets (sets 2-9) (Table 1). Out of 4,044 subjects in total, 829 subjects were from non-diseased control groups. The subjects were 3 to 93 years old at blood draw. Blood samples had been prepared either as serum or plasma (Table 1, Supporting Figure 2), hence differences originating from these preparation types were likely. Twelve sera in the otherwise plasma-based sample set 2 were not included in the meta-analyses. Sample sets 6 to 9 (729 subjects) were from four independent studies (Table 1) (Baldassarre et al., 2010; Gabrielson et al., 2017; Odeberg et al., 2014; Samnegård et al., 2005). Two other sample sets (sets 4 and 5) included 100 subjects who were selected for cancer-related studies and derived from the same twin cohort as set 1 (Lichtenstein et al., 2002; Magnusson et al., 2013). Apart from a single subject, there was no overlap between these and the individuals analyzed during the discovery. Sample set 3 (2,999 subjects) was again chosen from the population-based twin cohort (Lichtenstein et al., 2002; Magnusson et al., 2013), in which disease status was not considered during recruitment. Almost all of these subjects (98.5%, all except 44) were not included in sample set 1.
Discovery and confirmation of age-association of HRG in serum and plasma
Protein profiles were generated using antibody bead arrays, as this platform allows antibodies towards different targets to be combined whilst consuming only minute amounts of sample. Our analysis revealed consistent age-associated trends for HPA045005 across the eight sample sets used for replication (Supporting Figure 3). Accounting for differences in age ranges and spectra, the combined effect of age on HPA045005 in the nine sample sets was estimated using a random effects model and showed consistent association across sample sets (meta-analysis, P = 5.37 × 10⁻⁶, Figure 1). In contrast, the increasing trends of the second and third ranked candidate profiles, generated by HPA039928 and HPA029931, could not be replicated as consistently in all study sets (Supporting Figure 3).
Focusing on HPA045005, whose profile was the most significantly associated and replicated, we investigated whether the described protein levels in the circulation were controlled by genetic components. In a GWAS of sample set 3 (N = 2,592), a single locus at chromosome 3q27.3 was found to be associated with the antibody profile (P < 1.13 × 10⁻⁹ = 0.01 / 8,833,947) among ~8.8M genetic variants that were either genotyped by Illumina BeadChip for >700K single nucleotide variants or imputed from them (Figure 2A). The locus spans two genes, FETUB and HRG, in the human genome (Figure 2B). The most significantly associated genetic variant in the locus was the single nucleotide polymorphism (SNP) rs9898 (P = 2.35 × 10⁻⁹⁷, minor allele frequency (MAF) = 0.32, increment per minor allele (β) = 0.15), a missense variant that changes the amino acid sequence from proline to serine in the histidine-rich glycoprotein (HRG). Among the top 99 genome-wide significant SNPs (P < 1.13 × 10⁻⁹) having RefSNP (rs) numbers, four SNPs (rs1042464, rs2228243, rs10770, and rs9898) are non-synonymous and two SNPs (rs3890864 and rs56376528) are located near (<2 kbp) the transcription start site (Supporting Table 1). All are located in exons or upstream of HRG. This GWAS result indicates that the antibody describes the protein levels of HRG, a protein secreted by the liver and found in abundance in blood (Morgan, Koskelo, Koenig, & Conway, 1978). Associations of plasma HRG levels with SNPs have also been found in previous plasma profiling studies (Suhre et al., 2017).
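As an illustration of the random effects model, the following is a minimal sketch of inverse-variance pooling with the DerSimonian-Laird estimate of between-study variance, which is the method named under Statistical analysis. It is not the implementation of the R package used for the analysis; the function name is hypothetical, the two-sided P-value relies on a normal approximation, and the example inputs are invented.

```python
import math


def dersimonian_laird(effects, std_errors):
    """Pool per-study effects with a DerSimonian-Laird random effects model."""
    k = len(effects)
    # Fixed-effect (inverse-variance) weights and pooled estimate.
    w = [1.0 / se ** 2 for se in std_errors]
    sw = sum(w)
    fixed = sum(wi * yi for wi, yi in zip(w, effects)) / sw
    # Cochran's Q measures heterogeneity around the fixed-effect estimate.
    q = sum(wi * (yi - fixed) ** 2 for wi, yi in zip(w, effects))
    c = sw - sum(wi ** 2 for wi in w) / sw
    # Method-of-moments between-study variance, truncated at zero.
    tau2 = max(0.0, (q - (k - 1)) / c) if c > 0 else 0.0
    # Random-effects weights add tau2 to each within-study variance.
    w_star = [1.0 / (se ** 2 + tau2) for se in std_errors]
    pooled = sum(wi * yi for wi, yi in zip(w_star, effects)) / sum(w_star)
    se_pooled = math.sqrt(1.0 / sum(w_star))
    z = pooled / se_pooled
    p_value = math.erfc(abs(z) / math.sqrt(2.0))  # two-sided, normal approximation
    return {"pooled": pooled, "se": se_pooled, "tau2": tau2, "q": q, "p": p_value}


# Invented per-study slopes and standard errors (illustration only).
result = dersimonian_laird([0.0, 4.0], [1.0, 1.0])
```

In the study, the inputs would be the per-set regression slopes of the antibody signal on age and their standard errors.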
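The genome-wide significance threshold quoted above follows from a Bonferroni correction at a significance level of 0.01 over the tested variants. The short sketch below reproduces that arithmetic; the helper function name is hypothetical.

```python
# Values taken from the text: significance level and number of tested variants.
ALPHA = 0.01
N_VARIANTS = 8_833_947

# Bonferroni-corrected genome-wide threshold.
threshold = ALPHA / N_VARIANTS


def is_genome_wide_significant(p_value):
    """Return True when a variant passes the corrected threshold."""
    return p_value < threshold


# The reported P-value of rs9898 passes the threshold by a wide margin.
rs9898_significant = is_genome_wide_significant(2.35e-97)
threshold_text = f"{threshold:.2e}"
```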
Next, we confirmed the binding of HPA045005 to HRG by using a sandwich immunoassay. Beads with HPA045005, an additional anti-HRG antibody (HPA054598), as well as negative controls were combined to detect HRG. When analyzing full-length HRG in dilution series experiments, we found that pairing both HPA045005 and HPA054598 with a biotinylated version of HPA054598 allowed us to detect HRG in a concentration dependent manner (Supporting Figure 4A). Here, the data obtained from both sandwich assay pairs was substantially higher than the internal negative controls. To further elucidate the selectivity of HPA045005 over other proteins, we used a protein microarray (Sjoberg et al., 2012). Besides the antigen used to generate HPA045005, the array contained 12,412 other protein fragments. In this analysis HPA045005 exclusively bound to its corresponding antigen (Supporting Figure 4B). This indicated that the antibody does not generally cross-react in an unspecific manner, which points at a selective recognition of abundant HRG in serum and plasma. This molecular analysis supports the GWAS findings that there is an affinity of HPA045005 to the secreted liver HRG, when using the antibody in bead-based assays for the analysis of serum and plasma samples.
Mortality association and prediction
Finding the age association of HRG led us to study further age-related indications. We therefore accessed the Swedish death registry, which listed whether the subjects were still alive, with a follow-up time of ~8.5 years after donation of the serum samples. We chose the largest sample set of subjects at mid to old ages (sample set 3, N = 2,973, 48-93 years old), which had also revealed the age association of HRG. To test for all-cause mortality, a Cox proportional hazards model with age as the time scale was used. The Cox model, further adjusted for the effects of sex, revealed that HRG levels were significantly associated with mortality during follow-up (interquartile range (IQR) of follow-up = 7.7-9.3 years; P = 1.13 × 10⁻⁴). As protein levels increase with age, the HRG value was standardized using a linear regression model on age and age squared for each sex, followed by scaling, in order to account for the linear and quadratic effects of age and so that the hazard ratio (HR) estimated by the survival model could be quantified per standard deviation. The hazards model using the standardized HRG value affirmed the association (number of deaths = 362, P = 6.45 × 10⁻⁵), estimating that the risk of all-cause mortality increased 1.25 times per standard deviation (SD) of the HRG values compared to persons of the same age and sex. A Cox model stratified by sex suggested a stronger association in women (N = 1,602, deaths = 160, P = 2.13 × 10⁻⁵, HR = 1.35 per SD) than in men (N = 1,371, deaths = 202, P = 0.059, HR = 1.15 per SD). The comparison of the extreme subsets with standardized HRG levels in the upper and lower quartiles demonstrated that the difference in median age at death was 1.8 years in favor of the bottom quarter (P = 3.87 × 10⁻³, HR = 1.54, Figure 3). The difference was 1.9 years in men (86.9 vs. 85.0 years) and 0.6 years in women (89.6 vs. 89.0 years; Supporting Figure 5).
The potential influence of general inflammation on survival was also tested by models including clinically measured C-reactive protein (CRP) values. As for HRG, two Cox models were fitted to 1) CRP and 2) age-adjusted CRP. The latter was obtained using the same linear model as for HRG, including the adjustment for the same covariates. The outcomes of P = 0.024 (N = 2,971, HR = 1.07 per SD) and P = 0.023 (HR = 1.01) were far less significant than for HRG. Next, we included CRP as a covariate in the Cox model for HRG in order to determine whether inflammation in general would have an influence on HRG-related mortality. The CRP adjustment reduced the significance of the HRG association only from P = 6.45 × 10⁻⁵ to P = 1.05 × 10⁻⁴ (HR changed from 1.25 to 1.24 per SD). We also confirmed that none of the hazard models violated the proportionality assumption of the Cox model. The predictive power of HRG levels together with age for all-cause mortality was tested using a Cox model with the time from sample collection as the time scale. The Harrell's C-index of the model was 0.766.
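The age standardization of HRG can be sketched as follows: within each sex, the signal is regressed on age and age squared, and the residuals are divided by their standard deviation. This is a minimal illustration using only the Python standard library. It assumes that "scaling" means division by the sample standard deviation of the residuals; the function names and the toy data are hypothetical.

```python
import statistics


def solve_linear_system(a, b):
    """Solve a small dense system a x = b by Gaussian elimination with pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def quadratic_age_residuals(ages, values):
    """Residuals of a least-squares fit of values on age and age squared."""
    # Centering age leaves the residuals unchanged but improves conditioning.
    mean_age = statistics.fmean(ages)
    design = [[1.0, a - mean_age, (a - mean_age) ** 2] for a in ages]
    xtx = [[sum(r[i] * r[j] for r in design) for j in range(3)] for i in range(3)]
    xty = [sum(r[i] * y for r, y in zip(design, values)) for i in range(3)]
    beta = solve_linear_system(xtx, xty)
    return [y - sum(b * x for b, x in zip(beta, r)) for r, y in zip(design, values)]


def standardize_by_sex(ages, values, sexes):
    """Scaled residuals, fitted separately for each sex."""
    out = [None] * len(values)
    for sex in sorted(set(sexes)):
        idx = [i for i, s in enumerate(sexes) if s == sex]
        res = quadratic_age_residuals([ages[i] for i in idx], [values[i] for i in idx])
        sd = statistics.stdev(res)
        for i, r in zip(idx, res):
            out[i] = r / sd
    return out


# Toy data (illustration only): six women followed by six men.
ages = [50, 55, 60, 65, 70, 75] * 2
sexes = ["F"] * 6 + ["M"] * 6
signal = [1.0, 1.3, 1.1, 1.6, 1.4, 1.9, 0.9, 1.0, 1.4, 1.2, 1.7, 1.6]
z = standardize_by_sex(ages, signal, sexes)
```

By construction, the standardized values have mean zero and unit standard deviation within each sex, so a hazard ratio estimated on them reads as a ratio per SD relative to peers of the same age and sex.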
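For readers unfamiliar with Harrell's C-index, the sketch below computes it for right-censored data as the fraction of comparable pairs in which the subject with the earlier observed death has the higher risk score. It is a simplified illustration: tied event times are not given special treatment, the function name is hypothetical, and the toy data are invented. In the study, the risk score would be the linear predictor of the fitted Cox model.

```python
def harrell_c_index(times, events, risk_scores):
    """Concordance between risk scores and observed survival ordering."""
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        if not events[i]:
            continue  # a censored subject cannot anchor a comparable pair
        for j in range(n):
            # Pair is comparable when subject i died before subject j's time.
            if times[i] < times[j]:
                comparable += 1
                if risk_scores[i] > risk_scores[j]:
                    concordant += 1.0
                elif risk_scores[i] == risk_scores[j]:
                    concordant += 0.5  # ties in risk count as half
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


# Toy follow-up data (illustration only): 1 = death observed, 0 = censored.
times = [2, 4, 6, 8]
events = [1, 1, 0, 1]

c_perfect = harrell_c_index(times, events, [0.9, 0.7, 0.1, 0.2])
c_reversed = harrell_c_index(times, events, [0.1, 0.2, 0.9, 0.7])
c_constant = harrell_c_index(times, events, [1.0, 1.0, 1.0, 1.0])
```

A value of 0.5 corresponds to no discrimination and 1.0 to perfect ranking, which places the reported 0.766 between the two.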
4. Discussion
HRG, a multi-functioning protein in plasma
We analyzed the age-association of proteins with antibody-based assays in blood prepared as serum or plasma, and found increasing levels of HRG to be consistently associated with age. GWAS and sandwich assays demonstrated that HRG was the protein captured by the antibody in the exploratory discovery antibody assay.
According to mRNA sequencing data of human tissues, HRG is exclusively expressed in the liver (Uhlén et al., 2015). HRG has been known and described as an abundant protein in human blood plasma (Morgan et al., 1978; Poon, Patel, Davis, Parish, & Hulett, 2011). It has been characterized to interact with diverse molecules including heparin, immunoglobulin G (IgG), Zn2+, and complement components (Poon et al., 2011). HRG in plasma is known to be involved not only in the immune response toward foreign substances and the clearance of dead cells, but also in vascular biology including anti-coagulation (Poon et al., 2011). HRG levels have previously been linked to blood type and age (Drasin & Sahud, 1996), but because of its molecular composition and abundance, HRG has also been assigned to many different biological processes. Hence it is still difficult to pinpoint the most plausible mechanism behind increasing HRG levels in the process of aging.
In a genetic study of activated partial thromboplastin time (APTT), Houlihan et al. observed that the minor T allele of rs9898 was associated with shorter APTT, suggesting an elevated risk of thrombosis in these individuals (Houlihan et al., 2010). rs9898 was the SNP most significantly associated with HRG levels in sera of sample set 3. We observed that, as for APTT, HRG levels increase with the number of T alleles of the SNP. Houlihan et al. proposed a potential interaction between HRG and thrombosis, hence thrombosis could be a possible mediator between HRG and risk of mortality. Notwithstanding, no single genetic variant around HRG reached genome-wide significance for mortality risk in a study including the TwinGene cohort (Ganna et al., 2013), which served as the sample source for several study sets.
Limitation and variation
Our study was cross-sectional, and in many of the profiled sample sets, which included both serum and plasma, the ages of the participants covered the range of the average lifespan. There is thus still limited information with which to interpret the trend of increasing HRG levels at advanced age as a longitudinal change within individual subjects. On the other hand, these gradual alterations were repeatedly observed in multiple independent study sets derived from different Swedish cohorts, which strongly indicates that the association of HRG with age is robust. As we also found that an elevated level of HRG compared with same-aged peers was correlated with a higher risk of mortality, the age-dependent diversity may imply a time-wise transition along individual ages, possibly biological ages. We observed variation in the degree of the age-dependent transition, which is visible in Figure 2. To some extent, the variation can be explained by the shift of signal range in each assay, which was primarily developed to screen for possible associations and not standardized to determine absolute abundance levels. Seeing that the estimated slopes from sample sets 2, 3, and 9 were relatively lower than the values from the other sets, some parts of the variation might originate from differences in age range and in sample source, collection and selection procedures. For example, the individuals in sample sets 2, 7 and 9 were substantially younger (median age 40, 52 and 54 years, respectively) compared to all others (~65 years old). Sample sets 2, 3, and 9 were near population-based, while the others consisted of healthy individuals, except for sample set 1, in which older women and men were overrepresented because the same number of subjects was selected per age group.
Significance as a potential predictor of mortality
Several indicators in blood were found to be predictive of mortality risk in previous studies, and Barron et al. showed that three markers, CRP, N-terminal pro brain natriuretic peptide (NT-proBNP), and white blood cell (WBC) count, were statistically significant in meta-analyses (Barron et al., 2015). The HR estimate of HRG in this study (1.54 between top and bottom quarters) was comparable with the combined estimates in the meta-analysis (1.42 for CRP, 1.43 for NT-proBNP, and 1.36 for WBC count). In a previous study of CRP and mortality risk with a follow-up duration similar to that of this study (median 8.9 years), the HR per SD of CRP was estimated at 1.18, which is slightly smaller than our estimate of 1.25 for HRG (Schnabel et al., 2013). Compared with the questionnaire-derived measures examined in Ganna and Ingelsson's study, HRG (C-index = 0.766 with age) marginally outperformed the top predictors (max C-index = 0.74 including age) in that extensive population-based mortality study (Ganna & Ingelsson, 2015).
Other recent affinity proteomics approaches have also shown age-related signatures, highlighting GDF15 as well as other proteins of the coagulation system (Tanaka et al., 2018). While that study acknowledged the need for further validation, we made an extensive effort to confirm our observations across many different cohorts. Our strategy, on the other hand, was not to include previously known age-related proteins, and hence we did not shortlist antibodies that could be useful for screening plasma for these markers with our method.
In conclusion, an increased level of HRG in serum or plasma of older humans was discovered and adequately replicated in multiple sample sets by affinity proteomics. Appropriate molecular approaches were employed to characterize the identity of the discovered protein profile that paved the way to develop targeted assays for expanding the analysis of our primary data. The supporting evidence of HRG serving as a predictive indicator for all-cause mortality in several years suggests HRG in blood as an aging indicator.
5. Experimental Procedures
Cohort design and sample selection
a) Sample set 1 from TwinGene
A population-wide collection of blood from 12,614 twins born between 1911 and 1958 has been undertaken in a project called TwinGene. The primary aim of the TwinGene project has been to systematically transform the oldest cohorts of the Swedish Twin Registry (STR) into a molecular-genetic resource (Magnusson et al., 2013). From 2004 to 2008, a total of 21,500 twins (~200 twin pairs per month) were contacted with an invitation to the study that contained information about the study and its purpose, as well as consent forms and a health questionnaire. The study population was limited to those participating in the Screening Across the Lifespan Twin Study (SALT), which was a telephone interview study conducted in 1998-2002 (Lichtenstein et al., 2002). Other inclusion criteria were that both twins in the pair had to be alive and living in Sweden. Subjects who had declined to participate in future studies or had been enrolled in other STR DNA sampling projects were excluded from the study. When the signed consent forms had been returned, blood-sampling equipment was sent to the subjects, who were asked to visit local health-care facilities in the morning, after fasting from 20:00 the previous night, from Monday to Thursday and not on the day prior to a national holiday. This was to ensure that the sample tube would be delivered to the Karolinska Institutet (KI) Biobank by the following morning by overnight mail. After arrival, the serum was stored in liquid nitrogen.
The contribution of serum samples from the TwinGene study to sample set 1 consisted of: A) samples from 96 unrelated twins distributed in groups of 12 subjects (6 males and 6 females) in each age stratum of 50, 55, 60, 65, 70, 75, 80 and 85 years of age, and B) samples from 60 MZ twins (30 complete pairs) distributed in groups of 12 (3 male pairs, 3 female pairs) in each age stratum of 50, 55, 60, 65 and 70 years of age. In both groups, the width of the age intervals was approximately +/-3 months.
b) Sample set 2 from LifeGene
LifeGene is a prospective cohort study that includes collection of plasma and serum, tests of physical performance, as well as questionnaire responses regarding a wide range of lifestyle factors, health behaviors and symptoms (Almqvist et al., 2011). Participants respond to a web-based questionnaire and book a visit to a LifeGene test center, at which blood samples are taken. EDTA plasma was processed at the test center as follows: the EDTA tube with a gel plug was centrifuged and stored at -20°C prior to shipment in a cold chain. All samples were sent to KI Biobank for further separation into aliquots in REMP plates and frozen at -70°C. All participants or, in the case of children under the age of 11, their guardians, provided signed consent.
The sample set 2 cohort consisted of 5 male and 5 female samples randomly chosen from each of the ages <5, 10, 15, 20, 25, 30, 35, 40, 45, 50 and 55 (+/-3 months). For 12 participants, serum was also available.
c) Sample sets 3, 4, and 5 from TwinGene
Sample sets 3, 4, and 5 were selected from the same cohort, TwinGene (Magnusson et al., 2013), as sample set 1 (described above). Out of 132 microtiter 96-well plates used for storage of TwinGene samples, we selected the twelve plates with the largest age span (>20 years) among the samples in a plate and another twenty randomly chosen plates with a sufficient number of samples (>91). Sample set 3 consisted of the three thousand samples in the selected 32 storage plates. The data of one individual were removed from the analyses because the age of the subject was missing. Independently from this sample selection, sample sets 4 and 5 were age- and gender-matched controls for breast and prostate cancer studies, respectively. The mortality data were obtained by linking individuals in TwinGene to the data held by the Swedish tax authorities via personal identification number. The data were updated on 2015-01-10. Clinical blood chemistry assessments of hs-CRP of the samples in TwinGene were performed using the Synchron LX System (Beckman Coulter).
d) Sample sets 6 to 9
The sample sets 6 to 9 are described in Supporting Text.
e) Ethics
All the studies were approved by the Ethics Board of the corresponding hospital or institution, and conducted in agreement with the Declaration of Helsinki. The ethical approval document numbers are 2007/644-31/2 for TwinGene, 2009/615-31/1 for LifeGene, 03-115 and 2017/404-32 for IMPROVE, 95-397 and 02-091 for SCARF, EPN 2009/762 and LU 298-91 for CHAPS, and 2010/958-31/1 for Karma. All subjects, or their guardians, provided their informed consent to participation in individual studies.
Data acquisition - Assay design and SBA procedure
All 372 samples from sample set 1 and sample set 2 together were randomly allocated into wells in four 96-well plates. One sample from sample set 1 and one from sample set 2 were loaded into two more wells as repeated controls within a plate. Another sample from each cohort was transferred to two more wells on two different plates as a control to examine inter-plate variation. The data of each of those 4 samples were combined by taking the mean of the three measurements. All the human materials were biotinylated together with four negative controls that contained only buffer. For the entire set of 19 assays in the discovery stage, the samples were labeled two times.
The selected antibodies were divided into collections named bead arrays (BA) so that each BA consisted of 384 antibodies including positive and negative controls (anti-albumin and no antibody, respectively). For discovery, the selection of the affinity binders for one BA was determined by technical reasons such as the available amount. Every antibody in a BA was coupled to beads with a different colour code, as detailed together with the assay procedure in the Supporting Information and as described earlier (Drobin et al., 2013).
Quality control and preprocessing
Because an aliquot of mixed bead solution was suspended into each sample, all values of a sample that appeared to have failed within an assay (with 384 antibodies) were discarded, rather than a single measurement of that sample for one antibody. These were samples 1) that had median bead counts lower than twenty, 2) that had median values of median fluorescence intensities (MFIs) lower than the median of the negative controls (buffer only) in the same plate and assay, or 3) that were detected as outliers by robust PCA using the 'rrcov' R package (version 1.3-4) (Hubert, Rousseeuw, & Branden, 2005). The cutoff probability values in the outlier diagnostic plot were 0.025 for both score and orthogonal distance coordinates. Samples deviating beyond the cutoffs in both coordinates were classified as outliers, with alpha, the proportional tolerance, set to 0.9.
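The first two exclusion criteria can be written as a simple filter, sketched below. The robust PCA step is not covered. The sketch assumes that the negative-control median is taken over all buffer-only measurements pooled within one plate and assay; the function name, data layout and toy values are hypothetical.

```python
import statistics

# Threshold from the text: samples with a median bead count below 20 fail.
MIN_MEDIAN_BEAD_COUNT = 20


def flag_failed_samples(bead_counts, mfi, negative_control_mfi):
    """Return {sample_id: [reasons]} for samples failing criterion 1 or 2.

    bead_counts and mfi map a sample id to its per-antibody values within one
    plate and assay; negative_control_mfi lists the buffer-only wells.
    """
    pooled_negative = [v for well in negative_control_mfi for v in well]
    negative_median = statistics.median(pooled_negative)
    failed = {}
    for sample_id in mfi:
        reasons = []
        if statistics.median(bead_counts[sample_id]) < MIN_MEDIAN_BEAD_COUNT:
            reasons.append("low bead count")
        if statistics.median(mfi[sample_id]) < negative_median:
            reasons.append("signal below negative control")
        if reasons:
            failed[sample_id] = reasons
    return failed


# Toy data for three samples and three antibodies (illustration only).
bead_counts = {"s1": [40, 35, 50], "s2": [10, 15, 30], "s3": [45, 60, 38]}
mfi = {"s1": [300, 500, 450], "s2": [320, 410, 390], "s3": [90, 110, 80]}
negative_controls = [[100, 120, 95], [105, 98, 130]]

failed = flag_failed_samples(bead_counts, mfi, negative_controls)
```

All values of a flagged sample would then be dropped for that assay, in line with the sample-wise exclusion described above.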
The human samples were of two different types in terms of preparation method, plasma and serum. The two blood preparation types showed considerable dissimilarity, which was expected (Supporting Figure 2)(Schwenk, Igel, Kato, et al., 2010). Since such contrast was not of our research interest here, the data was split by the sample preparation type after quality control.
In order to minimize underlying sample-wise fluctuation, such as differences in total concentration and dilution, probabilistic quotient normalization (PQN) was applied (Dieterle, Ross, Schlotterbeck, & Senn, 2006). It normalizes the data assuming 1) that the majority of the measurements of a sample reflect the fluctuation and 2) that the fluctuation is not associated with our variables of interest, age and gender. The variation across the 96-well plates was minimized by the Multi-MA method, with the assumption that the mean of the observed values for each antibody within a plate is the same as in the other plates (Hong, Lee, Nilsson, Pawitan, & Schwenk, 2016). For each antibody, the means of the log-transformed measurements within each plate were positioned as one point in a 4-dimensional space, in which each axis corresponded to one plate. The vector that goes through the origin and (1, 1, 1, 1) is named A. The projection of each point onto the A axis was computed. All values for the point were then shifted along each plate axis by the corresponding element of the vector between the point and its projection. Only for the pilot analysis that used both sample sets 1 and 2, the difference in individual antibody profiles between the two blood preparation types was adjusted by transforming the data of the serum samples to have the same mean and variance as the plasma samples in the overlapping age range (50-60), trimming the upper and lower 5% to avoid the effect of outliers.
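The two normalization steps can be sketched in a few lines. The first function follows the usual PQN recipe with the per-antibody median across samples as the reference profile. The second is only one reading of the plate adjustment described above, in which the log values on each plate are shifted so that the plate mean moves onto the projection on the (1, ..., 1) axis; it is not the published Multi-MA implementation. Function names and toy data are hypothetical.

```python
import statistics


def pqn_normalize(data):
    """Probabilistic quotient normalization of a samples-by-antibodies table.

    data is a list of samples, each a list of positive intensities.
    """
    n_var = len(data[0])
    # Reference profile: median intensity of each antibody across samples.
    reference = [statistics.median([s[j] for s in data]) for j in range(n_var)]
    normalized = []
    for sample in data:
        quotients = [v / r for v, r in zip(sample, reference)]
        factor = statistics.median(quotients)  # most probable dilution factor
        normalized.append([v / factor for v in sample])
    return normalized


def align_plate_means(log_values, plates):
    """Shift the log values of one antibody so all plates share one mean."""
    by_plate = {}
    for v, p in zip(log_values, plates):
        by_plate.setdefault(p, []).append(v)
    plate_means = {p: statistics.fmean(vs) for p, vs in by_plate.items()}
    # Projecting the plate-mean vector onto the (1, ..., 1) axis gives a point
    # whose coordinates all equal the mean of the plate means.
    target = statistics.fmean(list(plate_means.values()))
    return [v + (target - plate_means[p]) for v, p in zip(log_values, plates)]


# Toy data (illustration only): one profile measured at three dilutions.
base = [100.0, 200.0, 400.0, 800.0]
table = [base, [2.0 * v for v in base], [0.5 * v for v in base]]
pqn_table = pqn_normalize(table)

# Toy log values of one antibody on two plates.
aligned = align_plate_means([1.0, 3.0, 5.0, 7.0], ["A", "A", "B", "B"])
```

In the toy example, PQN maps the three diluted copies back onto the same profile, and the plate adjustment removes the offset between plates A and B while preserving the differences within each plate.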
Data acquisition of replication sample sets
Data of other replication samples (sample sets 3-9) were acquired using the same protocol with a few variations. For each original study for sample sets 4-8, the samples were distributed into plates together with patient samples. The 383 other antibodies selected for each of the intended studies were included in the assays. Experiment and data preprocessing were conducted together with those additional samples and antibodies. Data of disease-free controls and for HPA045005 were extracted from the processed full data sets. Likewise, we obtained the data of the other top 2 candidates shown in Table 2.
Genome-wide association study
Genomic DNA from all available dizygotic twins and one member of each monozygotic twin pair was genotyped using the Illumina OmniExpress BeadChip (700K). The genotyping QC exclusion criteria were: genotypic or individual missingness > 0.03, minor allele frequency (MAF) < 0.01, Hardy-Weinberg equilibrium (HWE) P < 10⁻⁷, sex mismatch, heterozygosity (individuals with an F-statistic beyond ±5 SD from the sample mean), or cryptic relatedness. The 1000 Genomes reference panel (GRCh37/hg19, Phase 1, version 3) was used for imputation with Mach 1.0 and Minimac.
After matching genotypes to antibody profiles, GWAS was performed among 2,592 twins using PLINK. Analyses were restricted to autosomal SNPs with imputation quality (info or r2) higher than 0.4. The first four principal components were used as covariates in the linear regression model to control for population stratification. The --within option in PLINK was used to statistically adjust for relatedness (complete dizygotic twin pairs). Manhattan and quantile-quantile plots were drawn using the qqman package in R 3.4.1.
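The variant-level part of these exclusion criteria amounts to three threshold checks, sketched below. The sample-level criteria (sex mismatch, heterozygosity, cryptic relatedness) are not covered, and the class, function name and example values are hypothetical.

```python
from dataclasses import dataclass

# Thresholds taken from the text.
MAX_MISSINGNESS = 0.03
MIN_MAF = 0.01
MIN_HWE_P = 1e-7


@dataclass
class VariantStats:
    missing_rate: float  # fraction of individuals without a genotype call
    maf: float           # minor allele frequency
    hwe_p: float         # P-value of the Hardy-Weinberg equilibrium test


def variant_exclusion_reasons(v):
    """List the QC criteria that a variant fails (empty list means it passes)."""
    reasons = []
    if v.missing_rate > MAX_MISSINGNESS:
        reasons.append("missingness")
    if v.maf < MIN_MAF:
        reasons.append("low MAF")
    if v.hwe_p < MIN_HWE_P:
        reasons.append("HWE")
    return reasons


# Invented summary statistics for two variants (illustration only).
kept = variant_exclusion_reasons(VariantStats(0.001, 0.32, 0.4))
dropped = variant_exclusion_reasons(VariantStats(0.05, 0.005, 1e-9))
```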
The mutation types of the associated SNPs were obtained from UCSC table browser (https://genome.ucsc.edu) using human ‘GRCh38/hg38’ assembly and ‘snp150Common’ (dbSNP build 150, ≥1% MAF) table, which was accessed on 2018-09-20.
Other experimental details
Experimental details on antibody selection, bead array assays, sandwich immunoassays, and protein microarray analysis are available in Supporting Text.
Statistical analysis
The preprocessed intensity data were log-transformed ahead of the following analyses. To control the family-wise error rate, the Bonferroni method was employed in adjusting P-values unless otherwise specified. The linear association of an antibody signal with age was tested with ordinary linear regression using R. The meta-analysis was conducted using the inverse variance method with between-study variance estimated by the DerSimonian-Laird model (DerSimonian & Laird, 1986), as implemented in the R package "meta". We used a linear mixed model to address the correlation between twins, in which the response variable was the normalized antibody measurement and age was a fixed covariate. This model was fitted using the R package "lme4". For the association test for mortality, a Cox proportional hazards model was fitted to the survival data with age as the time scale and right censoring at the age on the date of the latest update of death information (Thiébaut & Bénichou, 2004). In the survival analysis for the two-group comparison, the subjects in sample set 3 were divided into two groups, the top and bottom quarters by the standardized HRG values, which were the scaled residuals of a linear model in which the normalized MFIs of HRG were regressed on age and age squared for women and for men separately. The hazard models were adjusted for sex if applicable and for CRP as described above. The proportionality assumption of the models was tested using Schoenfeld residuals (Grambsch & Therneau, 1994). Survival analyses, including computation of Harrell's C-index (Harrell, Califf, Pryor, Lee, & Rosati, 1982), were conducted using the R package "survival".
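The per-set association test can be illustrated by an ordinary least-squares fit of the log-transformed signal on age, which yields the slope, its standard error and the t-statistic. This is a sketch in Python rather than the R code used for the analysis; the function name and toy data are hypothetical, and the P-value (from a t distribution with n - 2 degrees of freedom) is omitted because it is not available in the Python standard library.

```python
import math


def ols_slope(x, y):
    """Slope, standard error and t-statistic of a simple linear regression."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # Residual variance uses n - 2 degrees of freedom.
    sse = sum((yi - intercept - slope * xi) ** 2 for xi, yi in zip(x, y))
    se = math.sqrt(sse / (n - 2) / sxx)
    return slope, se, slope / se


# Toy data (illustration only): ages and already log-transformed signals.
ages = [50, 60, 70, 80]
log_signal = [1.0, 1.3, 1.3, 1.6]

slope, se, t_stat = ols_slope(ages, log_signal)
```

The slope and standard error obtained per sample set are the quantities that an inverse-variance meta-analysis combines.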
7. Author Contributions
M-G.H. and J.M.S. designed the study. T.D-C., K.D., and R.S. acquired the proteomic data and analyzed the data. M-G.H. and X.C. analyzed GWAS data. M-G.H., T.D-C., W.L., Y.P., S.H., P.K.E.M., and J.M.S. performed statistical analyses. J.O., A.H., A.S., P.H., N.L.P., and P.K.E.M. generated phenotypic data and contributed the samples for the present study. All authors were involved in writing and reviewing the manuscript.
9. Supporting Information
Supporting Text - Materials and methods
Supporting Table 1. The HRG-associated SNPs that are non-synonymous or located near a transcription start site
Supporting Table 2. Statistics for survival analyses presented in Figure 3 and Supporting Figure 5
Supporting Figure 1. Study design
Supporting Figure 2. Difference between serum and plasma in sample sets 1 and 2
Supporting Figure 3. Protein profiles of top 3 proteins in every sample set
Supporting Figure 4. Additional results for molecular target
Supporting Figure 5. Survival curves for women and men, comparing two extreme quarters by HRG levels
6. Acknowledgments
We would like to thank Camilla Björk and Jens Mattsson from MEB at the Karolinska Institutet, everyone in the Affinity Proteomics group at SciLifeLab, and especially Claudia Fredolini, MariaJesus Iglesias, Matilda Dale, Sanna Byström, Martin Zwahlen, Björn Forsström, Björn Winckler, and Philippa Pettingill for supporting this work. We also thank the entire staff of the Human Protein Atlas for their efforts, and Hanna Tegel, Johan Rockberg, and their team for providing the recombinant HRG protein. This work was supported by ProNova VINN Excellence Centre for Protein Technology (VINNOVA, Swedish Governmental Agency for Innovation Systems), the Knut and Alice Wallenberg Foundation, and Science for Life Laboratory. We acknowledge the Swedish Twin Registry for access to samples and data. The Swedish Twin Registry is managed by Karolinska Institutet and receives funding through the Swedish Research Council under grant no. 2017-00641. LifeGene was supported by grants from the Swedish Research Council, Torsten and Ragnar Söderbergs Foundation, Stockholm County Council, and AFA Försäkringar.
The authors declare no conflict of interest.
Researchers interested in using STR data must obtain approval from a Swedish Ethical Review Board and from the Steering Committee of the Swedish Twin Registry. Researchers using the data are required to follow the terms of an agreement containing a number of clauses designed to ensure protection of privacy and compliance with relevant laws. For further information, contact Patrik Magnusson (Patrik.Magnusson{at}ki.se).