Abstract
Markers of biological ageing have potential utility in primary care and public health. We developed an elastic net regression model of age based on untargeted metabolic profiling across multiple platforms, including nuclear magnetic resonance spectroscopy and liquid chromatography-mass spectrometry in urine and serum (almost 100,000 features assayed), within a large sample (N = 2,238) from the UK occupational Airwave cohort. We investigated the determinants of accelerated ageing, including genetic, lifestyle and psychological risk factors for premature mortality. The metabolomic age model was well correlated with chronological age (r = 0.85 in independent test set) and DNA methylation age. Increased metabolomic age acceleration (p < 0.05) was associated with high alcohol use, overweight or obesity, low income, and depression. We also observed increases in DNA methylation age acceleration associated with anxiety, post-traumatic stress disorder and low income that were of a greater size than for metabolomic age acceleration.
Introduction
Ageing can be defined as the “time-dependent decline of functional capacity and stress resistance, associated with increased risk of morbidity and mortality” 1. Environmental stressors, including social adversity 2,3, psychological disorders 4,5, and genetic factors 6 may influence the ageing process, leading to differing ageing rates. Traditionally, quantitative assessment of “the rate of ageing” relies on the analysis of mortality curves of populations. However, at the level of a living individual, this method does not allow assessment of the state of ageing (i.e. the state of the functional decline) or prediction of the risk of morbidity and remaining life expectancy. Markers of ‘biological age’ (the ageing state typical of one’s chronological age) that can be assessed at any point in the lifespan may therefore have enormous potential in both personalised medicine and public health. Since ageing is a process that affects almost all tissues and organs of the body and involves cross-talk between multiple physiological systems, there has been increased research into composite markers of ageing, involving multiple parameters 7. Levine 8 employed 10 biomarkers representing multiple systems to develop a biological age score that could better predict mortality than chronological age. Belsky et al. 9 used a similar selection of biomarkers measured longitudinally in young adults to develop a biological age score and found that increased pace of ageing was associated with measures of functional decline such as cognitive ability. However, a synthetic indicator of biological age is still lacking. Modern ‘omics’ platforms have provided new opportunities for the systematic assessment of biological ageing. For example, Horvath 10 and Hannum et al. 11 employed genome-wide DNA methylation to develop highly predictive models of age based on multiple methylated CpG loci.
Furthermore, it has been shown that ‘age acceleration’, defined as having a greater DNA methylation age than chronological age, is associated with multiple risk factors of mortality such as low social class, smoking, and alcohol use 3 and is predictive of mortality 12,13. Agnostic metabolomics is a promising candidate technology to develop biomarkers of ageing. Several metabolomic studies have found strong associations between numerous metabolites and age, although in a limited sample size 14,15 or through employing targeted analyses that give limited coverage of the full metabolome 16,17,18. Only the study of Hertel et al. 18 combined a small set of markers to provide an overall assessment of biological ageing, observing that the predicted metabolomic age was associated with time to death, after adjustment for chronological age and other risk factors.
In the present study, we have employed untargeted metabolomics across multiple analytical platforms, providing unprecedented metabolome coverage (almost 100,000 features assayed), to develop a predictive model of age, within a large sample from the UK occupational Airwave cohort. A second cohort was used for longitudinal validation of selected metabolic age predictors. We explore the relationship between metabolomic age and DNA methylation age and lifespan associated genetic factors. Furthermore, we investigate the determinants of accelerated ageing, focussing on risk factors of premature mortality, and show that psycho-social risk factors including depression and low income are associated with accelerated metabolomic ageing.
Results
Building and validation of the metabolomic age model
The study population included 2,238 participants of the AIRWAVE cohort that had full metabolomic data. 60.5% of participants were male and mean age was 41.24 years (SD: 9.1, range: 19.2 – 65.2 years). Most participants (97.5%) were of white British ethnicity and 27.8% of participants were educated to degree level. The demographic characteristics of this sample are representative of the wider cohort (Elliott et al 2014). Further covariate information is provided in table 1. Metabolomic data were acquired from both urine and serum samples using multiple Nuclear Magnetic Resonance Spectroscopy (NMR) and Ultra-Performance Liquid Chromatography -Mass Spectrometry (UPLC-MS) platforms, providing in total nine different metabolomic data types (table s1). For purposes of constructing the main predictive model of age through elastic net regression, these data types were combined into one metabolomic dataset, giving a total of 98,824 metabolic features.
In the first stage of model building, an analysis by metabolomic platform (sequentially leaving one platform out each time) indicated that predictive performance (minimisation of mean squared error in 10-fold cross validation) was improved through using only the four following platforms (figure s1): Bruker IVDr Lipoprotein Subclass Analysis derived from NMR in serum (“sBiLISA”), lipid-targeted reverse-phase UPLC-MS in positive mode in serum (“sLPOS”), reverse-phase UPLC-MS in positive mode in urine (“uRPOS”) and hydrophilic interaction UPLC-MS in positive mode in urine (“uHPOS”), to give a total of 28,941 metabolic features. The final predictive model selected 525 predictors from across this set (see table s2 for list of predictors along with table s3 annotation information), including 8 lipoprotein subclasses from sBiLISA and 219, 104 and 194 features (retention time-m/z pairs) from the sLPOS, uHPOS and uRPOS platforms respectively. The model predicted age with high accuracy (mean absolute error (MAE) = 1.47 years) in the building data set (80% of data, n = 1,790), with a correlation between chronological age and predicted age of 0.96 (figure 1a). When this model was applied to the independent validation dataset, consisting of the remaining 20% of study participants (N = 448), the MAE was 3.80 years and the correlation between predicted age and chronological age was 0.85 (figure 1a).
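The two performance measures reported above (MAE and the correlation between chronological and predicted age) and the 80%/20% split can be illustrated with a minimal sketch. It is not the study's code: the function names, the random seed and the use of a simple shuffled split are assumptions made for illustration, and the elastic net fit itself is omitted.

```python
import math
import random


def mean_absolute_error(actual, predicted):
    """Average absolute difference between actual and predicted ages (years)."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def split_indices(n, train_fraction=0.8, seed=0):
    """Shuffle participant indices and split them into building and validation sets.

    The seed and the rounding-down rule are illustrative assumptions.
    """
    indices = list(range(n))
    random.Random(seed).shuffle(indices)
    cut = int(n * train_fraction)
    return indices[:cut], indices[cut:]


# Hypothetical toy ages, only to show how the metrics are called.
chronological = [30.0, 40.0, 50.0, 60.0]
predicted = [33.0, 41.0, 52.0, 58.0]
print(mean_absolute_error(chronological, predicted))
print(pearson_r(chronological, predicted))

building, validation = split_indices(2238)
print(len(building), len(validation))
```

With 2,238 participants, rounding the 80% cut down gives set sizes that match the 1,790 and 448 reported above, although the actual allocation procedure used in the study is not described here.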
Pathway enrichment analysis, using the Mummichog algorithm performed across the UPLC-MS derived model predictors, identified enrichment (p < 0.05) in eleven metabolic pathways (table 2): Vitamin E metabolism, Tryptophan metabolism, CoA Catabolism, Urea cycle/amino group metabolism, Lysine metabolism, Carnitine shuttle, Vitamin B5 - CoA biosynthesis from pantothenate, Biopterin metabolism, Drug metabolism - cytochrome P450, Tyrosine metabolism, and Aspartate and asparagine metabolism.
We examined changes in the concentrations of nine metabolites included in our age prediction model that were also available in an independent cohort, the Northern Finnish Birth Cohort 1966, in which serum NMR metabolomic data had been measured at two ages (31 and 46 years) among 2,144 individuals. Eight of these metabolites (89%) changed significantly with age, in the same direction as predicted in the metabolomic age model (table 3).
Metabolomic age, DNA methylation and genetic predictors of longevity
DNA methylation age was assessed for 1102 participants. Demographic characteristics for this sample were similar to those for participants with metabolomic age available (table s4). DNA methylation age predicted chronological age with a MAE of 4.37 years. DNA methylation age was strongly correlated with chronological age (r=0.91, figure 2a) and metabolomic age (n = 837, r = 0.85, figure 2b). Age acceleration scores were derived for both DNA methylation age acceleration (DNAmAA) and metabolomic age acceleration (mAA), as the difference, at a given age, between actual and predicted age. However, no correlation was observed between DNAmAA and mAA (r = 0.02, figure 2c).
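Age acceleration is described above as the difference, at a given age, between actual and predicted age. One common way to compute such a score, assumed here for illustration rather than taken from the study, is the residual from a linear regression of predicted age on chronological age: a positive residual means a participant is predicted to be older than is typical for their chronological age. The function names and toy values below are hypothetical.

```python
def ols_fit(x, y):
    """Ordinary least squares fit of y = intercept + slope * x."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((a - mean_x) ** 2 for a in x)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def age_acceleration(chronological, predicted):
    """Residual of predicted age after regressing it on chronological age.

    Assumption: acceleration is defined as this residual; the study text only
    states that it is a difference 'at a given age'.
    """
    intercept, slope = ols_fit(chronological, predicted)
    return [p - (intercept + slope * c) for c, p in zip(chronological, predicted)]


# Hypothetical toy data: four participants.
chronological = [30.0, 40.0, 50.0, 60.0]
predicted = [33.0, 41.0, 52.0, 58.0]
for age, acc in zip(chronological, age_acceleration(chronological, predicted)):
    print(age, round(acc, 2))
```

Because residuals of this kind are uncorrelated with chronological age by construction, two clocks can each track age closely while their acceleration scores remain unrelated, which is the pattern reported above for DNAmAA and mAA.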
Table 4 shows mean age acceleration scores by genotype for 11 single nucleotide polymorphisms (SNPs) that have robust and replicated associations with lifespan 6. Directions of association between genotype and mAA that were consistent with effects on lifespan (e.g. age acceleration increases with number of effect alleles associated with shorter lifespan or vice versa) were observed for SNPs in the BSND, TRAIP, FTO and APOC1 genes. Only SNPs in the FTO and APOC1 genes were nominally significant (p = 0.05 for both). For DNAmAA, directions of association with each genotype that were consistent with effects on lifespan were noted for SNPs in genes BSND and APOC1, with a nominally significant association observed for APOC1 (p = 0.03).
Risk factors of age acceleration
In bivariate analyses (table 1) we observed increased mAA (p < 0.05) among participants who were diabetic, heavy drinkers, overweight or obese, former smokers, or were suffering from depression, anxiety or PTSD. Clinical biomarkers associated with mAA included creatinine, total cholesterol, γ-glutamyl transferase (GGT), apolipoprotein B and glycated haemoglobin (% HbA1c). Regarding dietary intake in the week prior to sampling, those who reported high fish consumption and those in the second or fourth quintiles of the DASH score (compared to the first quintile, the least healthy dietary pattern) also had increased mAA. In bivariate analyses with DNAmAA (table s4), sex was associated at p < 0.05, with an increase in DNAmAA of 0.89 (interpretable as years of increase in DNA methylation age, 95% CI: 0.47, 1.30) in men compared to women. Clinical biomarkers associated with DNAmAA in bivariate analyses included creatinine, high density lipoproteins, GGT and apolipoprotein A.
Table 5 shows adjusted associations with mAA and DNAmAA for non-communicable disease and psychological risk factors (adjusted for sex, ethnicity, study centre, income, hypertension, diabetes, BMI, smoking, alcohol intake, physical activity, DASH score and fish consumption). We observed significant increases (p < 0.05) in mAA with overweight, obesity, heavy drinking, lower income, depressive symptoms and depression, ranging from 0.35 (95% confidence interval (CI): 0.01, 0.69) for low income compared to those with high income, to 0.97 (95% CI: 0.57, 1.37) for obesity compared to those of normal weight. Significant increases in DNAmAA were observed with heavy drinking, anxiety and PTSD, ranging from 0.92 (95% CI: 0.03, 1.80) for anxiety compared to those without anxiety symptoms, to 2.15 (95% CI: 0.31, 4.00) for PTSD compared to those who had not experienced trauma in the past six months.
Discussion
In an important proof of principle study, we have demonstrated in a large nationwide cohort study of working age adults that metabolomic profiling may be used to predict chronological age with high accuracy. We employed a wide range of metabolomic platforms to provide the broadest metabolome coverage yet presented in population based studies. We found that metabolomic age acceleration, defined as having a greater predicted metabolomic age than chronological age, was associated with NCD risk factors including low income, overweight and obesity and high alcohol intake. Mental well-being, particularly when assessed through reported symptoms of depression, was also strongly associated with metabolomic age acceleration. We did not observe an association between epigenetic age acceleration and metabolomic age acceleration, suggesting these measures capture separate aspects of the ageing process. We observed a different pattern of risk factors associated with epigenetic accelerated ageing including being male, heavy drinking, anxiety and PTSD.
The correlation between chronological and predicted age for our measure of metabolomic ageing (r = 0.85 in the validation dataset) was somewhat lower than that of the Hannum epigenetic age clock in our cohort (r = 0.91) but greater than reported for other biological ageing markers, including the measure based on urinary NMR data 18 (r = 0.53 in men and 0.61 in women in validation dataset), the blood transcriptomic clock 19 (r = 0.35-0.74 depending on cohort) and telomere length (r ∼ 0.3, 20). Biological ageing markers aim to better capture the body’s rate of decline or physiological breakdown than chronological age itself, and should therefore also be more predictive of mortality and age-related disease. The associations we observed between accelerated metabolomic ageing and factors known to increase risk of mortality suggest that metabolomic age may capture this physiological decline.
Strong associations with mAA were observed with overweight and obesity. These conditions are forms of metabolic dysregulation and their additional metabolic burden may increase the rate of decline of the metabolic systems of the body. Genetic predisposition to longevity is associated with low levels of abdominal visceral fat 21 and many different conditions that prolong lifespan in animal models also improve obesity-related conditions. Furthermore, obesity has been linked to telomere shortening, and drastic measures to combat morbid obesity like bariatric surgery can actually cause a recovery in telomere length 22. Much is now known about the ageing process at the molecular level primarily from experimental work. López-Otín et al.23 proposed nine ‘hallmarks of ageing’ that may all be expected to have detectable effects on the metabolome and overlap significantly with the effects of metabolic disorders 24. For instance, the hallmark ‘deregulated nutrient signalling’ refers to pathways that sense and respond to nutrient availability such as “insulin and IGF1 signalling” (IIS) pathway, which is altered among diabetics.
We observed multiple metabolomic pathways enriched among the predictors of our metabolomic clock that reflect fundamental metabolic processes and are closely related to these hallmarks. We observed enrichment of the pathway related to metabolism of Vitamin E, a potent anti-oxidant and anti-inflammatory agent that protects cell membranes from oxidative damage that can induce genome instability 25. As a primary hallmark, genome instability has far-reaching and complex consequences including altered nutrient sensing, energy metabolism and redox balance 26. Many human progerias are disorders accompanied by the hyperactivation of DNA repair machinery dependent on nicotinamide adenine dinucleotide (NAD+). This hyperactivation leads to NAD+ depletion, resulting in inhibition of the NAD+-dependent nutrient sensor sirtuin 1 (SIRT1) 27. Levels of NAD+ are also affected by other factors including circadian rhythm disruption, chronic inflammation 28 and tryptophan metabolism (also enriched among the metabolomic clock predictors). The functional impairment of SIRT1 29, which limits expression of nuclear genes encoding mitochondrial proteins, leads directly to the hallmark mitochondrial dysfunction. We observed enrichment among the metabolomic clock predictors of the pathways CoA catabolism, Vitamin B5 - CoA biosynthesis from pantothenate and lysine metabolism, which all maintain acetyl-coA levels necessary for mitochondrial reactions, and carnitine shuttle, which is required for the transport of fatty acids for beta-oxidation in the mitochondria. The enrichment of these pathways suggests the importance of the mitochondrial dysfunction hallmark in our metabolomic ageing model. SIRT1 also contributes to regulating the circadian oscillation of acetyl-coA levels 30 which has been linked to the ageing process 31 and epigenetic alterations through acetylation 32. 
Mitochondrial fitness further has impact on other ageing hallmarks 24, including genomic instability (dysfunctional mitochondria are major sources of genotoxic ROS), altered intercellular communication (ROS overgeneration is connected to the secretion of inflammatory mediators) and stem cell exhaustion (which are particularly sensitive to ROS 33). The observed enrichment of the urea cycle and aspartate and asparagine metabolism pathways will also result from perturbation to the Krebs and urea cycles following changes in mitochondrial fitness.
The enrichment of the tryptophan, tyrosine and biopterin metabolic pathways appears to relate to the hallmark ‘altered intercellular communication’. Tyrosine is required for signal transduction through incorporation into protein kinases, while tryptophan and biopterin are necessary for synthesis of neurotransmitters including dopamine, norepinephrine, epinephrine, serotonin and melatonin. Alterations to neurotransmitter levels may underlie the associations we observed between mAA and both depressive symptoms and depression. Both psychological distress and major depression had similar hazard ratios for mortality in a recent prospective study 4, which would be consistent with the observed increases in mAA for both depressive symptoms and depression. Anxiety was also associated with increased mAA, albeit a smaller increase than observed for depression. This is again consistent with the relative hazard ratios for mortality observed for anxiety and depression in a nation-wide prospective cohort, even after taking suicides and accidental deaths into account 34. While in this cross-sectional study we cannot disentangle the causal direction between depression and mAA, a study of biological ageing among elderly people found that accelerated biological age was associated with depressive symptoms at baseline and was also predictive of depressive symptoms at follow-up 35. Consistent evidence demonstrates a bi-directional association between depression and so-called metabolic syndrome, suggesting common pathological roots 36. Proposed pathophysiological commonalities include abnormal activation of the hypothalamic–pituitary–adrenal (HPA) axis and altered levels of circulating leptin and ghrelin, two peripheral hormones that are classically implicated in the homeostatic control of food intake.
A large body of research has investigated the concept of ‘allostatic load’ whereby repeated activation of the HPA axis leads to biological ‘wear and tear’ or physiological decline of downstream metabolic, immune and cardiovascular systems 37. Many studies have demonstrated a link between social adversity 38,39 and allostatic load and it is theorised that chronic stress associated with low socio-economic position leads to prolonged activation of the HPA axis. We observed that lower income was associated with increased mAA, which may similarly be considered to capture physiological decline.
We observed increases in DNAmAA associated with anxiety, PTSD and low income that were generally of greater size than for mAA. Meta-analyses have shown both PTSD 5 and low socio-economic position 3 to be associated with increases in DNAmAA. We did not observe any evidence for an association between depression and DNAmAA, suggesting the two ageing measures may be sensitive to separate dimensions of mental health. The DNA methylation clock has been shown to perform well as a marker of biological age since it is predictive of all-cause mortality, even after adjusting for chronological age and a variety of known risk factors, and is associated with physical measures of ageing such as frailty and cognitive decline 40. However, other biological ageing markers may add value in capturing different aspects of the ageing process. Peters et al. 19 reported that transcriptomic age was only moderately correlated with DNA methylation age and the different measures were associated with different ageing phenotypes. Similarly, Belsky et al. 41 report only weak correlations between telomere length, DNA methylation age, and a composite biomarker-based measure of biological ageing among young adults. While metabolomic and DNA methylation age were correlated in our study, there was no association between mAA and DNAmAA. DNAmAA has been shown to be predictive of cancer related mortality but not CVD 13,40 while the risk factors associated with mAA suggest it may be predictive of cardio-metabolic related disease. Accelerated transcriptomic age was found to be similarly associated with CVD risk factors, although it was not related to mental health 19. Further research into biological ageing may consider combining markers at different levels of biological organisation to provide a more complete picture of the ageing process.
To provide further external validation of the biological ageing markers we explored associations with genetic factors, selected from SNPs found to be robustly associated with lifespan 6. The strongest associations were observed for the FTO and APOC1 genes with mAA and the APOC1 gene for DNAmAA. APOC1 encodes a member of the apolipoprotein C1 family that plays a key role in lipoprotein metabolism, and it has been associated with multiple age-related disorders including cognitive decline 42, Alzheimer’s disease 43 and heart disease 44. FTO encodes an alpha-ketoglutarate-dependent hydroxylase and is best known for its associations with BMI 45 and obesity 46. While these findings suggest a role for genetic predisposition to longevity in our biological age markers, they require confirmation in larger, independent populations.
This study had some important limitations. The study was cross-sectional, based on a single biological sampling from participants at a wide range of ages over a single sampling period. It is therefore difficult to separate processes relating to the ageing itself from cohort effects associated with the different environment of people at different ages. This is particularly a problem for analyses of the metabolome which contains both endogenous metabolites related to physiological processes such as ageing and short-lived exogenous metabolites related to factors such as diet and medication. Indeed, we observed that fish consumption, which is associated with reduced risk of mortality 47, actually increased mAA, likely due to the confounding of our model by cohort effects. We addressed these points in two ways: Firstly, we validated some of the metabolomic age predictors that were available in an independent cohort at two timepoints (15 years apart) in the early adult life of the same individuals. We found that there were highly significant changes in levels of the majority of metabolites we checked, in the same direction as predicted by the metabolomic age model. Secondly, we adjusted associations with mAA for diet that had been assessed through a food diary in the week prior to sampling. Pathways that were enriched in our model were generally related to physiological processes known to be related to ageing, with the possible exception of the drug metabolism pathway. However, medication history was unavailable in this study.
The second limitation was the use of untargeted metabolomics. This limits the potential to apply the full model in separate metabolomic datasets due to differences in retention time and mass accuracy in different runs of spectral acquisition. Furthermore, full laboratory annotation of all predictors was outside the scope of the present study, and may not even be possible for some predictors without current database matches. However, the aim of the study was to develop an overall predictive model to assess metabolic ageing rather than identify individual predictors. Indeed, the nature of the variable selection method used means that an equally valid predictive model can be built on different sets of predictors. We used the Mummichog pathway analysis tool to extract information at the pathway level, as the algorithm bypasses laboratory annotation based on the assumption that misidentification will apply equally both to the feature set (metabolites included in the age prediction model) and the reference set (metabolites not selected into the model). The tool has been validated in separate datasets that have also undergone full laboratory annotation 48.
The main strengths of this study also relate to its use of untargeted metabolomics. We incorporated a range of MS platforms able to detect both lipophilic and hydrophilic molecules at low concentrations and NMR platforms able to detect larger structures such as lipoproteins that would be destroyed during MS acquisition. We also analysed both serum and urine that contain different sets of metabolites – more lipophilic molecules in serum and more polar molecules that are present at higher concentrations in urine. Together, we were able to assay a large portion of the metabolome that would not be possible with current targeted methods. Other strengths include the incorporation of genomic and DNA methylation data, the wide age range of participants including those in early adult life where ageing interventions may be most effective 49, and the use of validated psychological instruments. Future work will assess the effects of mAA on functional ageing measures and other health endpoints and assess metabolomic age in longitudinal, repeat samples.
In conclusion, we have developed a predictive indicator of ageing based on broad metabolomic analysis among working age adults. We found that while mAA, the difference between metabolomic and chronological age, was not related to DNAmAA, it was associated with mortality risk factors including obesity, diabetes, heavy alcohol use and psycho-social factors including depression, anxiety and lower income. Biological age acceleration may be an important mechanism linking psycho-social stress to age-related disease. Advances in life expectancies have led to an increased prevalence of age-related morbidities. Targeting the process of ageing itself, through changes in living conditions, behaviours or therapeutic interventions, may help more people experience healthy ageing.
Methods
Cohort and covariate information
The Airwave Health Monitoring Study is an occupational cohort of employees of 28 police forces from across Great Britain. Full details of the cohort and methods are available in Elliott et al 50. The study started recruitment in 2006 and now contains 53,280 participants. The study received ethical approval from the National Health Service Multi-Site Research Ethics Committee (MREC/13/NW/0588). At the baseline health screening, participants underwent health examination, self-completed a computer questionnaire and provided urine and blood samples. Blood samples were spun at the health clinic and the biological samples were stored in a Thermoporter (LaminarMedica) and sent overnight from the clinics for next-day analysis of standard clinical chemistry tests or were frozen at −80°C for long-term storage. DNA samples and plasma for metabolomic analysis were extracted from blood collected in EDTA tubes.
Important covariates in the analysis were categorised from self-report or clinical data as follows: Ethnicity was defined as ‘white’ or otherwise. Marital status was defined as living with partner or otherwise. Income was defined as low, medium or high, based on tertiles of total net household income after adjustment for the number of dependent household members. Education was defined as low (completed GCSEs or equivalent only), medium (completed ‘A’ levels or equivalent only) or high (completed university or higher degree). Alcohol use was classed as non-drinker, moderate drinker (≤ 14 alcohol units/week for women and ≤ 21 alcohol units/week for men) or heavy drinker (> 14 alcohol units/week for women and > 21 alcohol units/week for men). Hypertension was defined as either reported diagnosis or systolic blood pressure ≥ 140 mmHg or diastolic blood pressure ≥ 90 mmHg. Diabetic status was defined as normal (no diagnosis and HbA1c < 6.5%), or diabetic (diagnosis or HbA1c ≥ 6.5%). Physical activity was defined as low, moderate or high based on the scoring protocol of the International Physical Activity Questionnaire 51.
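The threshold-based definitions above for alcohol use, hypertension and diabetic status can be written as simple classification rules. The following is a minimal sketch, not the study's code: the function and argument names are hypothetical, and treating zero reported units as "non-drinker" is an assumption, since the text does not state how non-drinkers were identified.

```python
def alcohol_category(units_per_week, sex):
    """Classify alcohol use; sex is 'female' or 'male'.

    Assumption: a participant reporting zero units is a non-drinker.
    """
    if units_per_week == 0:
        return "non-drinker"
    weekly_limit = 14 if sex == "female" else 21
    if units_per_week <= weekly_limit:
        return "moderate drinker"
    return "heavy drinker"


def is_hypertensive(reported_diagnosis, systolic_mmhg, diastolic_mmhg):
    """Hypertension: reported diagnosis, or systolic >= 140, or diastolic >= 90."""
    return bool(reported_diagnosis) or systolic_mmhg >= 140 or diastolic_mmhg >= 90


def diabetic_status(reported_diagnosis, hba1c_percent):
    """Diabetic: reported diagnosis or HbA1c >= 6.5%; otherwise normal."""
    if reported_diagnosis or hba1c_percent >= 6.5:
        return "diabetic"
    return "normal"


# Hypothetical participants.
print(alcohol_category(15, "female"))      # above the 14-unit limit for women
print(alcohol_category(15, "male"))        # within the 21-unit limit for men
print(is_hypertensive(False, 139, 90))     # diastolic threshold reached
print(diabetic_status(False, 6.4))         # below the HbA1c threshold
```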
Psychological instruments
The Patient Health Questionnaire – 9 depression questionnaire was used to define participants as “normal (i.e. no depression)”, “minimal symptoms of depression” or as a “depression case” 52. The Hospital Anxiety and Depression Scale questionnaire was used to assess anxiety levels as “normal (i.e. no anxiety)”, “borderline” and “anxiety case” 53. Participants were defined as job strained if they had levels of ‘job demand’ above the median of the whole cohort and levels of ‘job latitude’ below the median of the cohort, based on a subset of six items from the Job Content Questionnaire 54. Subjective feelings of chronic fatigue were assessed as “low”, “medium” or “high” according to the questionnaire of 55. Participants were asked if they had experienced a work-related traumatic incident in the previous six months. Those who reported a traumatic incident were then asked to complete a brief screening instrument for post-traumatic stress disorder (PTSD) 56. Participants were thus classed into three categories: “not experienced traumatic incident in past 6 months”, “experienced traumatic incident in past 6 months without leading to PTSD”, and “experienced traumatic incident in past 6 months leading to PTSD”.
Assessment of Diet
Dietary intake was measured using validated 7-day estimated weight food diaries as fully described previously 57. Nutritional intake was calculated using Dietplan6.7 software (Forestfield Software, Horsham, UK), which is based on the McCance and Widdowson’s 6th Edition Composition of Foods UK Nutritional Data set (UKN), by a team of coders trained to match food and drink items to the UKN database code and a portion size.
Energy adjusted average consumption of fruit, vegetables, red meat, processed meat, wholegrain and dairy over the week was categorised into tertiles. For fish, consumption was divided into none, medium (below median consumption among consumers) and high (above median consumption). Total average energy consumption was categorised as low, medium and high based on separate tertiles for men and women. Two overall dietary scores were calculated: The Dietary Approaches to Stop Hypertension (DASH) diet score divided into quintiles 58 and the Mediterranean Diet score as a continuous measure 59.
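The fish consumption grouping described above (none, then below or above the median among consumers) can be sketched as follows. This is an illustration only: the function name and toy intakes are hypothetical, and assigning a value exactly equal to the median to the "high" group is an assumption, as the text does not say how ties were handled.

```python
from statistics import median


def fish_categories(intakes):
    """Group fish intake into 'none', 'medium' or 'high'.

    The cut point is the median intake among consumers only (intake > 0).
    Assumption: intakes equal to the median are placed in 'high'.
    """
    consumers = [value for value in intakes if value > 0]
    if not consumers:
        return ["none"] * len(intakes)
    cut_point = median(consumers)
    categories = []
    for value in intakes:
        if value == 0:
            categories.append("none")
        elif value < cut_point:
            categories.append("medium")
        else:
            categories.append("high")
    return categories


# Hypothetical energy-adjusted weekly intakes for six participants.
print(fish_categories([0, 10, 20, 30, 0, 40]))
```

Computing the median among consumers only keeps the large group of non-consumers from pulling the cut point to zero.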
Metabolomic data acquisition
Metabolomic analysis of serum and urine was performed at the National Phenome Centre, based at Imperial College London. Samples were randomly sorted into batches of 80 and thawed to 4°C, centrifuged to remove particulate matter, and the supernatant dispensed across dedicated 96-well plates for each assay. Study-Reference (SR) samples, a pool of all samples for each matrix in the study, and Long-Term Reference (LTR) samples, a pool of samples external to the study, were included in each analytical run to allow for quantification and correction of technical variation. Samples were prepared and analysed daily in batches of 80 study samples with the addition of 4 quality controls (2 SR and 2 LTR). Samples were maintained at 4°C during preparation for, and while awaiting, acquisition.
Acquisition of Nuclear Magnetic Resonance Spectroscopy (NMR) profiles (the NOESY experiment in urine and the CPMG experiment in serum) was conducted as described in 60. Lipoprotein parameters were generated by the Bruker B.I.-LISA (Bruker IVDr Lipoprotein Subclass Analysis) platform, derived from NMR of serum. Spectra were acquired at 600 MHz with Bruker Ascend 600 magnets and Avance III HD consoles configured to the Bruker IVDr specification (Bruker Corporation, Billerica, MA, USA).
Ultra-Performance Liquid Chromatography - Mass Spectrometry (UPLC-MS) acquisitions were conducted in batches of up to 1000 study samples, interleaved with alternating SR and LTR samples every five injections (16 per 80 samples). Each batch was flanked by a serial dilution of the SR sample to assess linearity of response. Multiple analytical experiments were performed to increase metabolomic coverage. Hydrophilic interaction chromatography was performed in both urine and serum as described in 61. Reversed-phase chromatography was performed on urine samples in both positive and negative modes as described in 61. Lipid-targeted reversed-phase chromatography was applied in serum ionised in both positive and negative modes as described in 62. All UPLC-MS profiling assays were acquired on Waters G2-S ToF mass spectrometers, with Acquity UPLC chromatography systems (Waters Corporation, Milford, MA, USA).
Metabolomic data processing
NMR spectra were automatically processed in TopSpin 3.2, followed by a suite of in-house scripts 60. Each spectrum was automatically checked, before all spectra were aligned to a common reference scale. Analytical quality was further assessed manually on four factors: line width of less than 0.9 Hz, quality of water suppression, even baseline signal and accurate chemical shift referencing. Urine samples were referenced to an internal spiked standard, 3-(trimethylsilyl)-2,2,3,3-tetradeuteropropionic acid (TSP), at 0 ppm. Plasma samples were referenced to the α-anomeric glucose doublet at 5.233 ppm. Spectra were aligned to a common reference scale, running from 10 to −1 ppm, and interpolated onto a common 20,000 point grid. Lipoprotein parameters were validated according to Bruker’s B.I.-LISA protocols.
Chromatograms and mass spectra instrument raw files were imported into Progenesis QI (Waters Corp. Milford, MA, USA) for retention-time alignment and feature detection. Progenesis QI was configured to align retention time to the central LTR sample of the acquisition. Peak detection was configured with a minimum chromatographic peak width of 0.01 minutes, and automatic noise detection set to the minimum threshold of 1. Peaks arising from isotopes and chemical adducts were automatically resolved according to the observed m/z and chromatographic peak shape, and peak areas were integrated. Further processing and filtering of UPLC-MS profiling datasets was conducted with in-house scripts to account for analytical run-order effects and to remove noise from each dataset. Analytical run-order effects were accounted for with an adaptation of the method described in 63. A robust LOWESS regression was generated per feature, based on the SR samples, in run order, with the window scaled to include 21 SR samples. The smoothed response values for each feature were then interpolated to the intermediate study sample injections using simple linear interpolation. Finally, the median intensity of each feature in each analytical batch was aligned. Extracted features spuriously arising from analytical noise were removed from the dataset by a pair of approaches, both applied on a per-feature basis. First, a serial dilution of the study reference sample was used to assess the linearity of response of each feature. Detected features were correlated to their expected intensity in the dilution series, and those features showing a Pearson’s r of less than 0.7 were excluded from further analysis. Second, the relative standard deviation (RSD) of each feature across the study reference samples was calculated, and those features where the RSD exceeded 30%, or the observed biological variance was less than 1.5 times the RSD, were excluded.
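The two noise filters described above can be sketched as follows. This is a minimal illustration and not the in-house scripts themselves: the function names and toy intensities are hypothetical, arrays are assumed to be laid out as samples by features, and "biological variance" is interpreted here as the RSD across study samples compared with the RSD across SR samples, which is an assumption. The LOWESS run-order correction is not shown.

```python
import numpy as np

def rsd(x):
    """Relative standard deviation (%) of each feature (column)."""
    x = np.asarray(x, dtype=float)
    return 100.0 * x.std(axis=0, ddof=1) / x.mean(axis=0)

def keep_features(dilution, expected, sr, study,
                  r_min=0.7, rsd_max=30.0, ratio_min=1.5):
    """Boolean mask of features passing the dilution-linearity and RSD filters."""
    dilution = np.asarray(dilution, dtype=float)
    expected = np.asarray(expected, dtype=float)
    # Filter 1: Pearson correlation with the expected dilution intensity.
    r = np.array([np.corrcoef(expected, dilution[:, j])[0, 1]
                  for j in range(dilution.shape[1])])
    # Filter 2: precision in SR samples, and study variation relative to it.
    sr_rsd, study_rsd = rsd(sr), rsd(study)
    return (r >= r_min) & (sr_rsd <= rsd_max) & (study_rsd >= ratio_min * sr_rsd)

# Toy data for three features (hypothetical values).
expected = [1, 2, 4, 8]                      # relative concentration in the dilution series
dilution = [[10, 5, 10], [20, 1, 20], [40, 4, 40], [80, 2, 80]]
sr = [[100, 100, 100], [102, 102, 200], [98, 98, 50], [100, 100, 150]]
study = [[50, 50, 50], [100, 100, 100], [150, 150, 150], [200, 200, 200]]
mask = keep_features(dilution, expected, sr, study)
```

In this toy example only the first feature is retained: the second does not track the dilution series and the third is too variable across the SR samples.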
Metabolomic age model
Untargeted NMR datasets were glog-transformed 64, the quantified B.I.-LISA data were log-transformed, and the UPLC-MS data were log-transformed, following unit addition to every value to allow transformation of zero values. Data were then mean centred and scaled to unit variance.
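A minimal sketch of the log transformation with unit addition, followed by mean centring and unit-variance scaling, is given below. The natural logarithm and the use of the sample standard deviation are assumptions, as is the function name; the glog transformation applied to the NMR data is not shown because its parameters are not given here.

```python
import numpy as np

def log_autoscale(X):
    """log(x + 1) transform, then mean-centre and scale each feature to unit variance."""
    X = np.asarray(X, dtype=float)
    logged = np.log(X + 1.0)                 # unit addition keeps zero intensities finite
    centred = logged - logged.mean(axis=0)
    # Assumption: sample standard deviation (ddof=1). Constant features would
    # give a zero divisor and should be removed beforehand.
    return centred / logged.std(axis=0, ddof=1)

# Hypothetical intensities for four samples and two features.
scaled = log_autoscale([[0.0, 120.0], [15.0, 80.0], [40.0, 300.0], [5.0, 0.0]])
```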
A predictive model of metabolomic age was constructed using elastic net regression 65 in the “glmnet” package 66 in R. The model was fitted on metabolic features from across all metabolomic datasets, using a multi-step process on 80% of the data (the training dataset). The remaining 20% was reserved for assessment of the predictive ability (Pearson’s correlation between predicted and chronological age) of the model in an independent dataset (the test dataset). The steps were as follows:
Step 1 Parameterisation: Elastic net model parameters, α (that defines mixing between lasso and ridge penalties) and λ (overall strength of penalty), were found following 10-fold cross validation. A line search across α, between 0 and 1 in 0.01 increments, was performed to find the minimum mean cross-validated error (MSE) using the optimal value of λ found using the ‘cvfit’ command for each α value.
Step 2 Leave platform out analysis: Due to potential redundancy between metabolomic datasets, we performed the parameterisation step above on data with one metabolomic platform left out each time. Platforms were removed from further analysis if the model performed better (lower MSE) with their exclusion. We continued this process, leaving further platforms out each time, until no improvement in MSE was observed.
Step 3 Stability analysis: Using the selected metabolomic datasets, we repeated elastic net regression on 100 subsamples of the training dataset (a random selection of 80% each time). The metabolic features selected in each model were stored for each iteration.
Step 4 Metabolomic data restriction: On the same subsample for 101 iterations, the number of metabolic features available to build an elastic net model was restricted by the percentage of iterations in step 3 in which a feature was selected, moving from 100% to 0% in 1% decrements for each subsequent iteration. The correlation between predicted and chronological age in the remaining 20% of the training set was stored for each iteration, and the percentage restriction value that gave the best correlation was chosen for the final metabolic feature restriction in step 5.
Step 5 Final model building: On the complete training dataset, a final elastic net model was constructed using metabolic features restricted to those present in a set percentage of models, as found in step 4.
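Steps 3 to 5 above can be sketched in Python as follows. This is an illustrative sketch under stated assumptions, not the analysis code: the study used the "glmnet" package in R, whereas here a small hand-written coordinate-descent solver for the same form of penalty stands in for it. The values of λ and α are fixed rather than tuned by cross-validation (step 1 is omitted), the data are simulated, and all function names are hypothetical.

```python
import numpy as np

def elastic_net(X, y, lam, alpha, n_iter=100):
    """Coordinate descent for (1/2n)||y - b0 - Xb||^2
    + lam * (alpha * |b|_1 + (1 - alpha) / 2 * |b|_2^2)."""
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xc = X - x_mean
    n, p = Xc.shape
    beta = np.zeros(p)
    resid = y - y_mean
    col_sq = (Xc ** 2).sum(axis=0) / n
    for _ in range(n_iter):
        for j in range(p):
            if col_sq[j] == 0.0:
                continue
            resid += Xc[:, j] * beta[j]      # partial residual without feature j
            rho = Xc[:, j] @ resid / n
            shrunk = np.sign(rho) * max(abs(rho) - lam * alpha, 0.0)  # soft threshold
            beta[j] = shrunk / (col_sq[j] + lam * (1.0 - alpha))
            resid -= Xc[:, j] * beta[j]
    return y_mean - x_mean @ beta, beta

def selection_frequency(X, y, lam, alpha, n_subsamples=100, frac=0.8, seed=0):
    """Step 3: fraction of random subsamples in which each feature is selected."""
    rng = np.random.default_rng(seed)
    n = X.shape[0]
    counts = np.zeros(X.shape[1])
    for _ in range(n_subsamples):
        idx = rng.choice(n, size=int(frac * n), replace=False)
        counts += (elastic_net(X[idx], y[idx], lam, alpha)[1] != 0)
    return counts / n_subsamples

def choose_threshold(X, y, freq, lam, alpha, thresholds, frac=0.8, seed=1):
    """Step 4: on one fixed split, refit with features restricted by selection
    frequency and keep the threshold with the best held-out correlation."""
    perm = np.random.default_rng(seed).permutation(X.shape[0])
    cut = int(frac * len(perm))
    fit, held = perm[:cut], perm[cut:]
    best_t, best_r = None, -np.inf
    for t in thresholds:
        keep = freq >= t
        if not keep.any():
            continue
        b0, beta = elastic_net(X[fit][:, keep], y[fit], lam, alpha)
        pred = b0 + X[held][:, keep] @ beta
        if np.std(pred) == 0.0:              # all coefficients shrunk to zero
            continue
        r = np.corrcoef(pred, y[held])[0, 1]
        if r > best_r:
            best_t, best_r = t, r
    return best_t, best_r

# Simulated training data: 3 informative features out of 20 (illustration only).
rng = np.random.default_rng(42)
X = rng.normal(size=(200, 20))
age = 45 + 3.0 * X[:, 0] - 2.0 * X[:, 1] + 1.5 * X[:, 2] + rng.normal(scale=0.5, size=200)
lam, alpha = 0.1, 0.5                        # assumed values, not tuned
freq = selection_frequency(X, age, lam, alpha, n_subsamples=20)
best_t, best_r = choose_threshold(X, age, freq, lam, alpha, np.linspace(1.0, 0.0, 11))
keep = freq >= best_t
final_intercept, final_beta = elastic_net(X[:, keep], age, lam, alpha)  # step 5
```

The demonstration uses 20 subsamples and an 11-point threshold grid to keep the run time short; the procedure described above used 100 subsamples and 101 thresholds.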
Metabolomic age acceleration (mAA) was defined as the difference between chronological age and predicted age, adjusted for chronological age, as previously defined for DNA methylation age acceleration 10. That is, we define mAA as the residuals of a linear regression of the difference between chronological and predicted age on chronological age itself.
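The residual definition above can be sketched in a few lines of Python. The sign convention (predicted minus chronological age, so that positive values indicate an older predicted age than expected) is an assumption, as are the function name and the example ages.

```python
import numpy as np

def age_acceleration(chron_age, predicted_age):
    """Residuals from regressing (predicted - chronological) age on chronological age."""
    chron = np.asarray(chron_age, dtype=float)
    diff = np.asarray(predicted_age, dtype=float) - chron   # assumed sign convention
    design = np.column_stack([np.ones_like(chron), chron])  # intercept + chronological age
    coef, *_ = np.linalg.lstsq(design, diff, rcond=None)
    return diff - design @ coef

# Hypothetical ages for five participants.
chron = np.array([30.0, 40.0, 50.0, 60.0, 70.0])
accel = age_acceleration(chron, [35.0, 42.0, 50.0, 58.0, 66.0])
```

By construction the residuals have zero mean and are uncorrelated with chronological age, which is the purpose of the adjustment.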
Metabolic feature and pathway annotation
Tentative annotations were provided for mass-spectrometry based metabolic features based on m/z searches across the Human Metabolome Database 67, for the ion forms M+2H, M+H+NH4, M+NH4, M+H, M+ACN+H, M+CH3OH+H, M+Na, M+K, 2M+H at ± 8 ppm mass tolerance.
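The following sketch illustrates how an observed m/z can be matched against candidate compounds for the listed ion forms within a ppm tolerance. It is not the search used in the study: the candidate list stands in for a database query, the neutral monoisotopic masses and adduct offsets are approximate values supplied for illustration, and the function names are hypothetical.

```python
# Adduct rules as (mass multiplier, m/z offset); offsets are approximate and
# already account for the charge state of the doubly charged forms.
ADDUCTS = {
    "M+2H": (0.5, 1.007276),
    "M+H+NH4": (0.5, 9.520550),
    "M+NH4": (1.0, 18.033823),
    "M+H": (1.0, 1.007276),
    "M+ACN+H": (1.0, 42.033823),
    "M+CH3OH+H": (1.0, 33.033489),
    "M+Na": (1.0, 22.989218),
    "M+K": (1.0, 38.963158),
    "2M+H": (2.0, 1.007276),
}

def ppm_error(observed, theoretical):
    """Signed mass error in parts per million."""
    return (observed - theoretical) / theoretical * 1e6

def annotate(observed_mz, candidates, tol_ppm=8.0):
    """Return (compound, adduct, ppm error) for every ion form within tolerance."""
    hits = []
    for name, neutral_mass in candidates.items():
        for adduct, (mult, offset) in ADDUCTS.items():
            err = ppm_error(observed_mz, mult * neutral_mass + offset)
            if abs(err) <= tol_ppm:
                hits.append((name, adduct, round(err, 2)))
    return hits

# Approximate neutral monoisotopic masses (illustrative stand-in for a database).
candidates = {"leucine": 131.094629, "citric acid": 192.027003}
hits = annotate(154.0845, candidates)        # hypothetical observed m/z
misses = annotate(154.0900, candidates)      # about 40 ppm away from leucine M+Na
```

A match on exact mass alone remains tentative, since isomers such as leucine and isoleucine share the same mass.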
For five UPLC-MS based metabolic features that were both tentatively annotated by exact mass within our metabolomic age model and also available in repeat measurements within the Northern Finnish Birth Cohort dataset, we performed further annotation procedures. Two of these annotations, for citrate (as an in-source fragmentation product) and leucine (M+Na ionic form), were supported by matching retention times and accurate mass to an internal reference standard database.
Significantly enriched metabolic pathways were predicted using the mummichog program 48. The algorithm searches tentative compound lists from metabolite reference databases against an integrated model of human metabolism to identify functional activity. Fisher’s exact tests and permutation are used to infer p-values for likelihood of pathway enrichment among significant features as compared to pathways identified among the entire compound set present in the reference list (the entire metabolome dataset), considering the probability of mapping the significant m/z features to pathways. Mummichog parameters were set to match against ions included in the ‘generic positive mode’ setting at ± 8 ppm mass tolerance.
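The Fisher's exact test component of such an enrichment analysis can be sketched as a one-sided hypergeometric tail probability. This illustrates the test only and is not the mummichog implementation, which additionally uses permutation to account for uncertain m/z-to-compound mapping; the function name and the counts are hypothetical.

```python
from math import comb

def enrichment_p(k, n_sig, K, N):
    """One-sided Fisher's exact p-value: P(X >= k), where X is the number of
    pathway compounds among n_sig significant compounds drawn from N compounds,
    K of which belong to the pathway."""
    upper = min(K, n_sig)
    total = comb(N, n_sig)
    tail = sum(comb(K, i) * comb(N - K, n_sig - i) for i in range(k, upper + 1))
    return tail / total

# Hypothetical counts: 10 compounds in total, 3 in the pathway, 3 significant,
# and all 3 significant compounds fall in the pathway.
p_all = enrichment_p(3, 3, 3, 10)
p_none = enrichment_p(0, 3, 3, 10)
```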
Metabolite validation in the Northern Finnish Birth Cohort 1966
The Northern Finnish Birth Cohort 1966 is a prospective birth cohort that sampled 12,058 live births in 1966, including 96.3% of all births in the regions of Oulu and Lapland in Finland 68. Fasting blood samples were collected at follow-up of participants at ages 31 and 46 yrs and stored at −80 °C for subsequent biomarker profiling. A high-throughput NMR metabolomics platform was used for the analysis of 87 metabolic measures 69. This metabolomics platform provides simultaneous quantification of routine lipids and lipid concentrations of 14 lipoprotein subclasses and major sub-fractions, and further quantifies abundant fatty acids, amino acids, ketone bodies and gluconeogenesis-related metabolites in absolute concentration units.
We assessed changes in nine metabolites that were available in this dataset and also included in our predictive model, between these two sampling points, using one-tailed t-tests.
DNA methylation analysis
For the microarray, bisulphite conversion of 500 ng of each DNA sample was performed using the EZ DNA Methylation-Lightning™ Kit according to the manufacturer’s protocol (Zymo Research, Orange, CA). Then, bisulfite-converted DNA was used for hybridization on the Infinium HumanMethylation EPIC BeadChip, following the Illumina Infinium HD Methylation protocol. Briefly, a whole genome amplification step was followed by enzymatic end-point fragmentation and hybridization to HumanMethylation EPIC BeadChips at 48°C for 17 h, followed by single nucleotide extension. The incorporated nucleotides were labelled with biotin (ddCTP and ddGTP) and 2,4-dinitrophenol (DNP) (ddATP and ddTTP). After the extension step and staining, the BeadChip was washed and scanned using the Illumina HiScan SQ scanner. The intensities of the images were extracted using the GenomeStudio (v.2011.1) Methylation module (1.9.0) software, which normalizes within-sample data using different internal controls that are present on the HumanMethylation EPIC BeadChip and internal background probes. The methylation score for each CpG was represented as a β-value according to the fluorescent intensity ratio representing any value between 0 (unmethylated) and 1 (completely methylated).
DNA methylation (DNAm) data were pre-processed and normalized using in-house software written for the R statistical computing environment, including background and color bias correction, quantile normalization, and Beta MIxture Quantile dilation (BMIQ) procedure to remove type I/type II probes bias, as described elsewhere 3. DNAm levels were expressed as the ratio of the intensities of methylated cytosines over the total intensities (β values). Cross-reactive and polymorphic probes - with minor allele frequency greater than 0.01 in Europeans 70 - were excluded. Methylation measures were set to missing if the detection p-value was greater than 0.01. Samples with the bisulfite conversion control fluorescence intensity lower than 10,000 for both type I and type II probes and those with total call rate lower than 95% were excluded. Finally, samples were excluded if the predicted sex (based on chromosome X methylation) did not match that self-reported.
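Two of the quality-control rules above, masking measurements by detection p-value and excluding samples by call rate, can be sketched as follows. The probes-by-samples array layout, the definition of call rate as the fraction of non-missing probes after masking, and the function name are assumptions; normalisation, probe filtering and the bisulfite and sex checks are not shown.

```python
import numpy as np

def mask_and_filter(beta, det_p, p_max=0.01, min_call_rate=0.95):
    """Set beta values with detection p > p_max to missing, then drop samples
    (columns) whose call rate is below min_call_rate."""
    beta = np.array(beta, dtype=float)       # copy so the input is left unchanged
    beta[np.asarray(det_p) > p_max] = np.nan
    call_rate = (~np.isnan(beta)).mean(axis=0)
    keep = call_rate >= min_call_rate
    return beta[:, keep], keep

# Toy data: 40 probes by 3 samples (hypothetical values).
beta_values = np.full((40, 3), 0.5)
det_p = np.zeros((40, 3))
det_p[0, 0] = 0.5                            # one failed probe in sample 0 (call rate 0.975)
det_p[:4, 1] = 0.5                           # four failed probes in sample 1 (call rate 0.90)
filtered, kept_samples = mask_and_filter(beta_values, det_p)
```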
DNA methylation age was computed according to the algorithm described by Hannum et al. 11 based on a set of 71 blood-specific age-associated CpG sites. We used this algorithm, rather than the algorithm of Horvath, since it was developed specifically for blood samples and found to be the most predictive of mortality 12. Age acceleration (AA) was defined as the difference between epigenetic and chronological age. Since AA could be correlated with chronological age and WBC percentage, we computed the so-called intrinsic epigenetic age acceleration 12, which is defined as the residuals from the linear regression of AA on chronological age and blood cell counts (measured using flow cytometry) for neutrophils, lymphocytes, monocytes and eosinophils.
Genotyping
Genotyping was performed on the Illumina Infinium HumanCoreExome-12v1-1 BeadChip, and quality control filters including call rate (>=97%) and heterozygosity rate (<=3SD from the mean) were applied to the samples. Duplicates and second-degree relatives were further excluded, and 14,062 samples of European ancestry based on principal component analysis remained. Markers were removed for high missing rate (>2%), deviation from Hardy-Weinberg equilibrium (P<1E-5) or minor allele frequency below 1%, resulting in 254,027 high-quality and common markers. Imputation was performed using the Haplotype Reference Consortium (HRC) panel (version r1.1 2016).
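The marker-level filters above can be illustrated with a short Python sketch. The Hardy-Weinberg test shown is a one-degree-of-freedom chi-square goodness-of-fit test, which is an assumption since the test used is not stated here (exact tests are also common); the function name and the genotype counts are hypothetical.

```python
import math
import numpy as np

def marker_passes(genotypes, max_missing=0.02, min_maf=0.01, hwe_p_min=1e-5):
    """Apply missing-rate, minor allele frequency and Hardy-Weinberg filters to one
    marker coded as 0/1/2 copies of an allele, with NaN for missing calls."""
    g = np.asarray(genotypes, dtype=float)
    missing_rate = np.isnan(g).mean()
    called = g[~np.isnan(g)]
    n = called.size
    observed = np.array([(called == k).sum() for k in (0, 1, 2)], dtype=float)
    q = (observed[1] + 2 * observed[2]) / (2 * n)      # frequency of the counted allele
    if missing_rate > max_missing or min(q, 1 - q) < min_maf:
        return False
    p = 1 - q
    expected = n * np.array([p * p, 2 * p * q, q * q])
    chi2 = ((observed - expected) ** 2 / expected).sum()
    hwe_p = math.erfc(math.sqrt(chi2 / 2))             # chi-square survival function, 1 df
    return hwe_p >= hwe_p_min

# Hypothetical markers, each genotyped in 1,000 samples.
in_equilibrium = [0] * 640 + [1] * 320 + [2] * 40      # matches HWE at q = 0.2
no_heterozygotes = [0] * 800 + [2] * 200               # strong deviation from HWE
rare_variant = [0] * 995 + [1] * 5                     # allele frequency 0.25%
many_missing = [float("nan")] * 50 + [0] * 590 + [1] * 320 + [2] * 40
```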
We selected 11 SNPs previously associated with lifespan 6 and tested their associations with both DNAmAA and mAA in bivariate linear models. DNAmAA or mAA was used as the dependent variable and the dosage of the effect allele for each SNP (i.e. 0, 1 or 2) was used as the independent variable.
Analysis of risk factors of biological age acceleration
We analysed associations between mortality risk factors, including psychosocial factors, and age acceleration scores in separate adjusted linear regression models. The adjustment set, chosen a priori and included in all models was: sex, ethnicity, study centre, income, hypertension, diabetes, BMI, smoking, alcohol intake, physical activity, DASH score and fish consumption.
Acknowledgements
OR was supported by an MRC Early Career Fellowship. This study was partly supported by the European Commission grant to the LIFEPATH project (Horizon 2020 grant number 633666). The Airwave Health Monitoring Study is funded by the Home Office (grant number 780-TETRA) with additional support from the National Institute for Health Research (NIHR) Biomedical Research Centre. The Airwave Study uses the computing resources of the UK MEDical BIOinformatics partnership (UK MED-BIO, supported by the Medical Research Council (MR/L01632X/1)). We thank all Airwave participants for their contributions. We thank the late professor Paula Rantakallio (launch of NFBC1966), the participants in the 31 yrs and 46 yrs studies and the NFBC project center. NFBC1966 received financial support from University of Oulu Grants no. 24000692, Oulu University Hospital Grant no. 24301140, ERDF European Regional Development Fund Grant no. 539/2010 A31592, University of Oulu Grant no. 65354, Oulu University Hospital Grant no. 2/97, 8/97, Ministry of Health and Social Affairs Grant no. 23/251/97, 160/97, 190/97, National Institute for Health and Welfare, Helsinki Grant no. 54121, Regional Institute of Occupational Health, Oulu, Finland Grant no. 50621, 54231. I.K. acknowledges support from the EU PhenoMeNal project (Horizon 2020, 654241).