Abstract
Saliva, as a biofluid, is inexpensive and non-invasive to obtain, and provides a vital tool to investigate oral health and its interaction with systemic health conditions. There is growing interest in salivary biomarkers for systemic diseases, notably cardiovascular disease. Whereas hundreds of genetic loci have been shown to be involved in the regulation of blood metabolites leading to unprecedented insights into the pathogenesis of complex human diseases, little is known about the impact of host genetics on salivary metabolites. Here we report the first genome-wide association study exploring 476 salivary metabolites in 1,419 subjects of European ancestry from the TwinsUK cohort (discovery phase). A total of 14 salivary metabolites were significantly associated (p<10−10) with genetic variants that mapped to 11 distinct loci, most of which replicated in the Study of Health in Pomerania (SHIP-2) cohort. Interestingly, while only a limited number of the loci that are known to regulate blood metabolites were also associated with salivary metabolites in our study, we identified several novel saliva-specific locus-metabolite associations, including associations for the AGMAT (with the metabolites 4-guanidinobutanoate and beta-guanidinopropanoate), ATP13A5 (with the metabolite creatinine) and DPYS (with the metabolites 3-ureidopropionate and 3-ureidoisobutyrate) loci. Our study suggests that there are biological pathways which are specific to the regulation of the salivary metabolome. In addition, some of our findings may have clinical relevance, such as the utility of the pyrimidine (uracil) degradation metabolites in predicting 5-fluorouracil toxicity and the role of the agmatine pathway metabolites as biomarkers of oral health.
Introduction
Metabolic reactions pervade every aspect of human physiology, and abnormalities in these reactions underlie a plethora of human diseases1. Investigating the genetic underpinnings of population-wide variation in metabolite levels can offer novel insights into human metabolism and disease, in addition to providing potential therapeutic targets to modulate metabolite levels. Large-scale genetic association studies have so far identified hundreds of loci that regulate the levels of metabolites in blood2–6 and, to a lesser extent, in other biospecimens7–9. Previous studies have shown that genetic variants, on average, explain a greater proportion of trait variance for metabolites than is generally observed for complex traits3,4, highlighting the utility of metabolites as intermediate traits for dissecting the genetics of complex diseases.
Saliva is an abundantly produced biofluid that can be obtained inexpensively and non-invasively, without the need for healthcare professionals. It is composed mainly of water (>99%), with several minor constituents such as mucus, digestive enzymes, cytokines, immunoglobulins, antibacterial peptides and low molecular weight metabolites10.
Recent advances in metabolomic profiling allow quantification of hundreds of metabolites belonging to diverse biochemical pathways in large population samples4,11. In 2015, the Human Metabolome Database (HMDB) incorporated data on the ‘salivary metabolome’, which included 853 salivary metabolites that were systematically characterized using a multiplatform approach12. Since saliva is separated from the systemic circulation by only a thin layer of cells, which allows passive and active exchange of substances13, it reflects not just oral health but the functioning of other organ systems as well14. Indeed, a number of studies have reported associations between oral health and systemic conditions such as cardiovascular diseases, diabetes, autoimmune diseases, mental health disorders and dementia, among others15–19. Therefore, investigation of salivary metabolites could not only provide novel biomarkers but also further our understanding of the biological pathways underlying oral as well as general health conditions.
Here we report a genome-wide association analysis (based on 1000 Genomes imputed data) for 476 salivary metabolites in the population-based TwinsUK study, followed by replication in the population-based Study of Health in Pomerania (SHIP-2) cohort.
Material and methods
(I) Discovery phase
Study population
The discovery phase of the study was conducted in the TwinsUK cohort, an adult twin registry comprising healthy volunteers, based at St. Thomas’ Hospital in London20. Twins gave fully informed consent under a protocol reviewed by the St. Thomas’ Hospital Local Research Ethics Committee. Subjects of European ancestry with available genotype data and with salivary metabolite profiles from a sample collected in the fasting state were included in our study (N = 1,419; mean age = 62.2 years; females = 92.7%).
Genotyping, imputation and QC
Subjects were genotyped in two different batches of approximately the same size, using two genotyping platforms from Illumina: 300K Duo and HumanHap610-Quad arrays. Whole genome imputation of the genotypes was performed using the 1000 Genomes reference haplotypes21, further details of which are provided in Moayyeri et al (2013). Stringent QC measures, including minimum genotyping success rate (>95%), Hardy-Weinberg equilibrium (p>10−6), minimum MAF (>0.5%) and imputation quality score (INFO>0.5), retained ~9.6 million variants for genome-wide analysis.
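As an illustration of how the four QC thresholds above combine, the following is a minimal sketch that keeps a variant only if it passes every filter. The `Variant` record, its field names and the toy variants are hypothetical, and in practice call rate applies to directly genotyped variants while INFO applies to imputed ones; the sketch simply applies all four to each record.

```python
from dataclasses import dataclass


@dataclass
class Variant:
    # Hypothetical per-variant summary record; field names are illustrative.
    rsid: str
    call_rate: float  # genotyping success rate, 0-1
    hwe_p: float      # Hardy-Weinberg equilibrium test p-value
    maf: float        # minor allele frequency, 0-0.5
    info: float       # imputation quality score


def passes_qc(v: Variant,
              min_call_rate: float = 0.95,
              min_hwe_p: float = 1e-6,
              min_maf: float = 0.005,
              min_info: float = 0.5) -> bool:
    """True only if the variant exceeds all four thresholds stated in the text."""
    return (v.call_rate > min_call_rate
            and v.hwe_p > min_hwe_p
            and v.maf > min_maf
            and v.info > min_info)


variants = [
    Variant("rsA", 0.99, 0.20, 0.12, 0.98),   # passes every filter
    Variant("rsB", 0.99, 1e-8, 0.30, 0.95),   # fails HWE
    Variant("rsC", 0.97, 0.50, 0.003, 0.90),  # fails MAF
    Variant("rsD", 0.99, 0.40, 0.05, 0.42),   # fails INFO
]
kept = [v.rsid for v in variants if passes_qc(v)]
```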
Saliva sample collection
Saliva samples were obtained by asking fasted volunteers to spit as much saliva as possible into an empty sterile pot over a period of 10 minutes. The samples were immediately refrigerated and then frozen at −80°C (usually within 4 hours of collection) before further processing. The samples were then shipped on dry ice to Metabolon Inc., Durham, USA, for metabolite profiling (see Supplemental methods (I) for further details on sample processing).
Metabolic profiling of saliva samples
Metabolite concentrations in the saliva samples were estimated using Ultrahigh Performance Liquid Chromatography-Tandem Mass Spectrometry (UPLC-MS/MS), i.e. chromatographic separation followed by full-scan mass spectrometry to record all detectable ions in the samples (see Supplemental methods (II) for further details). Based on their unique ion signatures (chromatographic and mass spectral), 997 distinct metabolites were identified, of which 823 had a known chemical identity at the time of analysis. The 823 known metabolites were broadly classified into 8 metabolic groups (amino acids, peptides, carbohydrates, energy, lipids, nucleotides, cofactors and vitamins, and xenobiotics) as described in the KEGG (Kyoto Encyclopedia of Genes and Genomes) database22, and these groups were further subdivided into 99 distinct biochemical pathways.
Raw metabolite values were normalised for the volume and osmolality of the saliva samples. The normalised values were then log-transformed and standardised to mean 0 and standard deviation 1. Of the 823 known metabolites, 476 were retained for analysis because they were measured in more than 80% of samples. Missing values for these 476 metabolites were imputed using the minimum value of the metabolite on the corresponding run day, and the resulting imputed dataset was used for all further analyses.
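The per-metabolite processing described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the pipeline actually used: the input is taken to be already normalised for volume and osmolality (the normalisation formula is not given in the text), the function name and toy data are hypothetical, and the steps are ordered so that the final values have exactly mean 0 and SD 1. Because the log transform is monotonic, imputing the run-day minimum before or after log-transforming yields the same imputed values.

```python
import math
from statistics import mean, pstdev
from typing import Dict, List, Optional


def preprocess_metabolite(values: List[Optional[float]],
                          run_days: List[str],
                          min_presence: float = 0.80) -> Optional[List[float]]:
    """Filter, impute, log-transform and standardise one metabolite.

    values   -- volume/osmolality-normalised measurements, None if missing
    run_days -- run-day label of each sample, in the same order as values
    Returns standardised log values, or None if the metabolite is excluded.
    """
    present = sum(v is not None for v in values)
    if present / len(values) <= min_presence:
        return None  # not measured in more than 80% of samples

    # Minimum observed value on each run day (assumes every run day has
    # at least one non-missing value for this metabolite).
    day_min: Dict[str, float] = {}
    for v, day in zip(values, run_days):
        if v is not None:
            day_min[day] = min(v, day_min.get(day, v))

    imputed = [v if v is not None else day_min[day]
               for v, day in zip(values, run_days)]
    logged = [math.log(v) for v in imputed]  # values must be positive
    mu, sd = mean(logged), pstdev(logged)    # assumes values are not all equal
    return [(x - mu) / sd for x in logged]


# Toy data (hypothetical): 10 samples over two run days, one missing value.
values = [1.0, 2.0, None, 4.0, 8.0, 2.0, 4.0, 1.0, 2.0, 4.0]
run_days = ["day1"] * 5 + ["day2"] * 5
z = preprocess_metabolite(values, run_days)
```

The missing third sample is filled with the day-1 minimum (1.0), so its standardised value equals that of the first sample.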
Genome-wide association analysis of salivary metabolites
(i) Primary genome-wide association analysis
For each of the 476 metabolites, a linear mixed model was fitted to test the association between the metabolite (dependent variable) and each genome-wide variant (independent variable), with age, sex and time of saliva sample collection included as covariates. The significance of each association was assessed using the score test implemented in GEMMA23, which uses a sample kinship matrix (estimated from a subset of ~500,000 variants) to account for the twin structure and relatedness in the TwinsUK data. A combined genome-wide and metabolome-wide significance threshold of p<10−10 (the conventional genome-wide threshold of 5×10−8 divided by the 476 metabolites tested, i.e. ≈1.05×10−10) was used to identify significant variant-metabolite associations. For each locus significantly associated with a metabolite, we report the variant with the lowest association p-value.
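Two quantities in this step can be made concrete: the Bonferroni-style threshold, and a kinship matrix of the kind a linear mixed model uses to absorb relatedness. The sketch below computes a centred genomic relatedness matrix, K = (1/p) Σ c_i c_iᵀ, where c_i is variant i's dosage vector minus its mean. This is an illustrative assumption about the form of the matrix; GEMMA estimates its own kinship matrix internally, and the toy dosages and function names here are hypothetical.

```python
from typing import List


def bonferroni_threshold(alpha: float, n_tests: int) -> float:
    """Divide a per-trait significance level by the number of traits tested."""
    return alpha / n_tests


def centered_kinship(genotypes: List[List[float]]) -> List[List[float]]:
    """Centred relatedness matrix K = (1/p) * sum_i c_i c_i^T.

    genotypes[i][j] is the allele dosage (0-2) of variant i in individual j;
    c_i is variant i's dosage vector minus its mean across individuals.
    """
    p = len(genotypes)
    n = len(genotypes[0])
    K = [[0.0] * n for _ in range(n)]
    for snp in genotypes:
        m = sum(snp) / n
        c = [g - m for g in snp]
        for a in range(n):
            for b in range(n):
                K[a][b] += c[a] * c[b]
    return [[x / p for x in row] for row in K]


threshold = bonferroni_threshold(5e-8, 476)  # about 1.05e-10
# Toy dosages (hypothetical): individuals 0 and 1 have identical genotypes.
K = centered_kinship([[0, 0, 2], [1, 1, 0], [2, 2, 1]])
```

Identical genotype vectors give equal off-diagonal and diagonal entries (K[0][1] equals K[0][0]), which is how monozygotic twins appear in such a matrix.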
(ii) Testing loci identified in the primary analysis for additional metabolite associations
Next, we focused on the loci identified in the primary analysis to look for additional variant-metabolite associations at those loci. For each such locus, we clumped all variants located within a 100 Mb block and in LD (r2 > 0.2), and checked for additional metabolite associations at a significance threshold of p<10−6 (corresponding to p=0.05 corrected for 476 metabolites and a prior assumption of about 100 associated loci, i.e. 0.05/(476×100) ≈ 1.05×10−6).
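A greedy LD-clumping procedure of the kind described above can be sketched as follows: sort variants by p-value, take the most significant remaining variant as an index variant, and absorb every other variant within the window whose r2 with it exceeds 0.2. The text does not name the clumping software, so this PLINK-style algorithm is an assumption. The `toy_r2` lookup, positions and p-values are hypothetical, and the window size is a parameter (a toy value is used below).

```python
from typing import Callable, Dict, List, Tuple


def clump(variants: List[Tuple[str, int, float]],
          r2: Callable[[str, str], float],
          window_bp: int,
          r2_max: float = 0.2) -> Dict[str, List[str]]:
    """Greedy LD clumping on one chromosome.

    variants -- (variant id, position in bp, p-value)
    r2       -- function returning LD r^2 between two variant ids
    Returns {index variant: [variants clumped into it]}.
    """
    remaining = sorted(variants, key=lambda v: v[2])  # most significant first
    clumps: Dict[str, List[str]] = {}
    while remaining:
        index_id, index_pos, _ = remaining.pop(0)
        members, rest = [], []
        for vid, pos, p in remaining:
            if abs(pos - index_pos) <= window_bp and r2(index_id, vid) > r2_max:
                members.append(vid)
            else:
                rest.append((vid, pos, p))
        clumps[index_id] = members
        remaining = rest
    return clumps


# Secondary-scan threshold: 0.05 / (476 metabolites x ~100 loci).
secondary_threshold = 0.05 / (476 * 100)

# Hypothetical LD values; unlisted pairs are treated as r^2 = 0.
LD = {frozenset(("a", "b")): 0.8, frozenset(("a", "c")): 0.1,
      frozenset(("b", "c")): 0.05}


def toy_r2(x: str, y: str) -> float:
    return LD.get(frozenset((x, y)), 0.0)


toy = [("a", 1_000, 1e-12), ("b", 2_000, 1e-9),
       ("c", 3_000, 1e-4), ("d", 500_000, 1e-7)]
result = clump(toy, toy_r2, window_bp=250_000)
```

Here "b" is absorbed into index variant "a" (r2 = 0.8), while "c" (low LD) and "d" (outside the toy window) become their own index variants.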
(iii) Testing the significantly associated loci using metabolite data from other biospecimens
For each locus-metabolite association, we further assessed the most significant variant-metabolite pair using measurements of the respective metabolite in serum4 and faecal samples7 from TwinsUK subjects (where the metabolite was measured in that biospecimen). We tested only serum and faecal samples from the same individuals who provided saliva samples, and only those collected within 5 years of the saliva sample, so that for a given individual the samples from different biospecimens were obtained reasonably close in time. Association testing for serum and faecal metabolites used the same model as described for salivary metabolites.
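The sample-matching rule above (same individual, collection dates at most 5 years apart) can be sketched as a simple filter. The subject IDs and dates are hypothetical, and converting 5 years to 5 × 365.25 days is an assumption, since the text does not specify how the interval was computed.

```python
from datetime import date
from typing import Dict, List


def paired_within(saliva: Dict[str, date],
                  other: Dict[str, date],
                  max_years: float = 5.0) -> List[str]:
    """IDs sampled in both biospecimens with collection dates <= max_years apart."""
    max_days = max_years * 365.25  # assumed conversion of years to days
    shared = saliva.keys() & other.keys()
    return sorted(i for i in shared
                  if abs((saliva[i] - other[i]).days) <= max_days)


# Hypothetical collection dates.
saliva_dates = {"T1": date(2012, 3, 1), "T2": date(2013, 6, 1), "T3": date(2011, 1, 1)}
serum_dates = {"T1": date(2009, 5, 1), "T2": date(2004, 1, 1), "T4": date(2012, 1, 1)}
matched = paired_within(saliva_dates, serum_dates)
```

Only "T1" qualifies: "T2" was sampled more than 9 years apart, and "T3" and "T4" appear in only one biospecimen.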
(iv) Conditional analysis for the significantly associated loci
(a) Detection of secondary association signals
We used approximate conditional analysis, as implemented in GCTA24, to test whether any of the associated loci harboured multiple distinct (i.e. secondary) association signals, at a “locus-wide” significance threshold of p<10−5. For each associated locus, all variants that surpassed the study-wide significance threshold (p<10−10) were conditioned on the most significantly associated variant at that locus, using the association summary statistics. For the conditional analysis, we used genotype data from the complete TwinsUK dataset (N=5,654) to model LD between variants.
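The idea behind conditioning from summary statistics can be illustrated with the two-variant case. If genotypes and the phenotype are standardised and there are no covariates, marginal effects b equal correlations with the trait, and the joint effects solve Rβ = b, where R is the LD correlation matrix. The sketch below is this simplified textbook calculation, not GCTA-COJO's implementation (which handles many variants, allele frequencies, covariates and a reference LD panel). The function name and input values are hypothetical.

```python
import math
from typing import Tuple


def joint_effect_given_lead(b_lead: float, b_other: float,
                            r: float, n: int) -> Tuple[float, float, float]:
    """Effect of a second variant after fitting the lead variant jointly.

    Assumes standardised genotypes and phenotype, marginal effects b_lead and
    b_other, LD correlation r (not r^2, |r| < 1) and sample size n.
    Returns (beta, standard error, two-sided normal p-value).
    """
    denom = 1.0 - r * r
    beta_lead = (b_lead - r * b_other) / denom
    beta_other = (b_other - r * b_lead) / denom
    # Residual variance of the standardised trait after fitting both variants.
    resid_var = 1.0 - (beta_lead * b_lead + beta_other * b_other)
    se = math.sqrt(resid_var / (n * denom))
    p = math.erfc(abs(beta_other / se) / math.sqrt(2.0))
    return beta_other, se, p


# A variant whose marginal signal is fully explained by LD with the lead variant.
beta, se, p = joint_effect_given_lead(0.30, 0.27, 0.9, 1419)
```

With r = 0.9, the second variant's marginal effect (0.27 = 0.9 × 0.30) is entirely explained by the lead variant, so its conditional effect is about 0 and its conditional p-value is far above 10−5. This is the pattern expected at a locus with a single underlying signal.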
(b) Adjustment for periodontal disease status
Since salivary metabolites are known to be associated with periodontal disease (PD)11, we additionally adjusted the most significant variant-metabolite pair for each locus-metabolite association for PD status. Self-reported gingival bleeding and a history of gum disease or tooth mobility were used as indicators of PD in TwinsUK25 (270 PD cases; 1,083 controls).
Expression quantitative trait locus (eQTL) analysis
We used the version 7 data release of the Genotype-Tissue Expression (GTEx) project (accessed 15 April 2019), which was based on RNA-Seq data obtained from 48 non-diseased tissue sites across ~1,000 individuals, to test whether the most significant variant at each associated locus had an eQTL effect on transcripts located within a 1 Mb window of the variant.
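Selecting the candidate transcripts for the eQTL look-up amounts to a window filter around the index variant. The sketch below treats the "1 Mb window" as ±1 Mb between the variant and each transcript's transcription start site. This follows GTEx's usual cis definition and is an assumption about the text's meaning. The gene names and coordinates are hypothetical.

```python
from typing import List, NamedTuple


class Transcript(NamedTuple):
    gene: str
    chrom: str
    tss: int  # transcription start site, bp


def cis_transcripts(chrom: str, pos: int, transcripts: List[Transcript],
                    window_bp: int = 1_000_000) -> List[str]:
    """Genes on the same chromosome whose TSS lies within window_bp of the variant."""
    return [t.gene for t in transcripts
            if t.chrom == chrom and abs(t.tss - pos) <= window_bp]


# Hypothetical annotation records.
txs = [Transcript("GENE_A", "1", 14_500_000),
       Transcript("GENE_B", "1", 15_900_000),
       Transcript("GENE_C", "1", 16_200_000),  # 1.2 Mb away: excluded
       Transcript("GENE_D", "2", 15_000_000)]  # other chromosome: excluded
hits = cis_transcripts("1", 15_000_000, txs)
```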
Annotation of associations using reference databases
We searched the NHGRI GWAS catalogue (accessed 15 April 2019) for previous disease associations of the significantly associated loci identified in our study. We also searched the OMIM database (accessed 15 April 2019) to check whether the candidate genes at the associated loci were causally linked to inborn errors of metabolism. In addition, we queried the HMDB12 and KEGG22 databases to identify biochemical pathways and known disease associations for the associated metabolites.
(II) Replication phase
The replication phase was performed in the Study of Health in Pomerania (SHIP-2), a population-based study of subjects of European ancestry conducted in north-eastern Germany. Further details of SHIP-2, including the cohort, genotyping and imputation, and saliva sample collection, are provided in Supplemental methods (III). Metabolic profiling of SHIP-2 saliva samples (N=1,000) was performed using the same process as described for TwinsUK.
Because saliva in SHIP-2 was collected by chewing on a piece of cotton, the samples represented stimulated saliva, and normalisation of the metabolite measurements for sample osmolality was therefore not considered necessary. The much narrower distribution of salivary osmolality in SHIP-2 compared with TwinsUK supports this rationale (Figure S1).
For each locus-metabolite association identified in the discovery phase, we tested the most significantly associated variant using a linear regression model fitted in R (version 3.5.2). The covariates in the association model were similar to those used in the discovery phase.
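The replication criterion applied in the Results (p<0.05 in SHIP-2 with the same direction of effect as in TwinsUK) can be written as a small check. This sketch derives the p-value from the effect estimate and its standard error using a normal approximation. That is an assumption: R's `lm` reports a t-test, which is nearly identical at N≈1,000. The function names and numbers are hypothetical.

```python
import math


def two_sided_p(beta: float, se: float) -> float:
    """Two-sided p-value for beta/se under a standard normal approximation."""
    return math.erfc(abs(beta / se) / math.sqrt(2.0))


def replicates(beta_discovery: float, beta_replication: float,
               se_replication: float, alpha: float = 0.05) -> bool:
    """Nominal significance in the replication cohort plus a consistent sign."""
    same_direction = (beta_discovery > 0) == (beta_replication > 0)
    return same_direction and two_sided_p(beta_replication, se_replication) < alpha
```

For example, `replicates(0.4, 0.25, 0.05)` is true (z = 5), whereas an effect of the opposite sign, or one with z = 1, does not count as a replication.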
(III) Testing the significantly associated salivary metabolites with phenotypes of interest
We then examined how the genetically regulated salivary metabolites related to relevant phenotypes. For this, we selected the metabolites whose associations were unique to saliva, i.e. those for which a genetic association had not previously been reported in blood, and tested them against phenotypes (diseases / traits / adverse drug effects) relating to the metabolite or its associated biochemical pathway. We obtained the relevant phenotype information from the TwinsUK database, selecting one twin per pair (N=1,426). The phenotype association analysis was performed in R (version 3.5.2) by fitting a linear regression model between the salivary metabolite and the disease / trait / adverse drug effect, adjusted for age and sex.
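Restricting the analysis to one twin per pair removes within-pair relatedness, so an ordinary linear regression can be used instead of a mixed model. The text does not state which twin was kept. The sketch below assumes the first listed member of each pair is retained, and the subject and pair IDs are hypothetical.

```python
from typing import Dict, List, Tuple


def one_per_pair(subjects: List[Tuple[str, str]]) -> List[str]:
    """Keep the first subject seen for each twin-pair ID.

    subjects -- (subject id, twin-pair id) tuples; selection rule is assumed.
    """
    chosen: Dict[str, str] = {}
    for sid, pair in subjects:
        chosen.setdefault(pair, sid)  # ignore later members of the same pair
    return list(chosen.values())      # dicts preserve insertion order


selected = one_per_pair([("A1", "P1"), ("A2", "P1"), ("B1", "P2"),
                         ("C1", "P3"), ("C2", "P3")])
```

A random choice within each pair would be an equally valid rule, and it is less sensitive to any ordering in the source table.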
Results
Identification of novel genetic loci regulating salivary metabolite levels
The primary genome-wide discovery analysis in TwinsUK identified 13 metabolites that were significantly associated with genetic loci after correcting for multiple testing (p<10−10). When we restricted the analysis to the loci identified in the primary stage, one additional metabolite was found to be associated (p<10−6). In total, 14 distinct locus-metabolite associations (hereafter referred to as ‘mQTLs’) were identified in the discovery phase. The significantly associated variants mapped to 11 distinct genetic loci, which are referred to by the name(s) of the overlapping or nearest gene(s) (Figure 1). Three of those loci (AGMAT, SLC2A9 and DPYS) were each associated with two metabolites (Table 1). In all three instances, the two metabolites regulated by the same locus were correlated (Pearson’s r2 for the metabolite pairs ranged from 0.42 to 0.84) (Figure S2). Conversely, none of the 14 associated metabolites had more than one significant locus. Quantile-quantile (QQ) plots for the significantly associated metabolites are provided in Figure S3.
Of the 11 genetic loci identified in the discovery phase, four (SLC2A9, DMGDH, FADS2 and ACADS) have previously been reported in association with blood metabolites2,4,5. The remaining seven loci were novel, i.e. they had no previously known associations with metabolites in blood or in any other biospecimen.
The AGMAT locus, one of the novel loci, was associated with the metabolites 4-guanidinobutanoate and beta-guanidinopropanoate. These metabolites are generated as intermediate products in the polyamine synthesis pathway, the main site of action of the enzyme agmatinase (encoded by the AGMAT gene)26. The most significantly associated variants for the two metabolites (rs10927806 and rs6690813, respectively) are in high LD with one another (r2=0.99), suggesting that the AGMAT locus shares the genetic regulation of the two metabolites. Likewise, the association we observed between the ATP13A5 locus and creatinine (a widely used marker of renal function) has not been reported previously. Another notable novel association concerned pyrimidine metabolism: the DPYS locus, which encodes dihydropyrimidinase, an enzyme involved in pyrimidine (uracil and thymine) degradation, was associated with the metabolites 3-ureidopropionate and 3-ureidoisobutyrate (products of pyrimidine degradation). The novel association of the TYMS/ENOSF1 locus with ribonate is also intriguing, since ribonate has previously been shown to be one of the substrates of reverse thymidylate synthase (rTS), the protein product of ENOSF127. This suggests that ENOSF1, the source of the antisense RNA to TYMS, is probably the functional gene mediating the observed association of the TYMS/ENOSF1 locus with salivary ribonate. The remaining previously unreported mQTLs in our study were the associations of the SLC2A9 locus with allantoin, the ABO locus with N-acetylglucosamine/N-acetylgalactosamine, the UGCG locus with glycosyl-N-stearoyl-sphinganine, and the MGP locus with gamma-carboxyglutamate.
Of the 14 mQTLs identified in our study, the most significant variant-metabolite pair could be tested in both the serum and faecal metabolite data in TwinsUK for nine mQTLs. The metabolites corresponding to the remaining five mQTLs were not present in the serum and faecal metabolite datasets. Of these nine, the associations for the ATP13A5 and DPYS loci did not replicate in serum (p>0.05). None of the associations except the one for the ACADS locus replicated in the faecal data (p>0.05) (Table S1). Thus, a comparison across all three biospecimens (for the significantly associated salivary metabolites that were also measured in serum and faecal samples in TwinsUK) indicates that the effects of the ATP13A5 and DPYS loci were specific to saliva (Figure S4).
We found no additional independent association signals for any of the mQTLs after conditioning on the most significant variant at each locus (conditional p>10−5 for all variants tested at each locus), a finding supported by the regional association plots (Figure S5). The apparent presence of a single association signal for each significantly associated metabolite might partly reflect our limited power to detect secondary signals at these loci.
Moreover, for every mQTL, the strength of association for the most significant variant-metabolite pair changed little after adjustment for PD status (Table S2). Hence, underlying PD does not appear to have a substantial effect on the associations observed in our study.
For 8 of the 11 associated loci, the most significant variant showed an eQTL effect on at least one transcript in at least one tissue in the GTEx database (no significant eQTL effects were observed for the ATP13A5, SLC2A9 and ABO loci) (Table S3). For 7 of those 8 loci (all except DPYS), the significant eQTL effect was on the overlapping or nearest gene transcript. For 5 of those 7 loci (AGMAT, FADS2, DMGDH, TYMS/ENOSF1 and UGCG), the eQTL effect was observed in one of the gut-related tissues. GTEx includes eQTL data for transcripts assayed in minor salivary gland from a small number of donors (N=97), but none of the significant eQTL effects that we observed were in salivary tissue.
Replication of the discovery phase findings
In SHIP-2, we could attempt replication for 9 of the 14 mQTLs identified in the discovery phase (the metabolites corresponding to the remaining five mQTLs were not measured in SHIP-2). In the initial replication analysis, which used salivary metabolite data not normalised for sample osmolality, 8 of the 9 discovery-phase associations replicated (p<0.05), with a direction of effect consistent with that observed in TwinsUK (Table 2). The association between the ABO locus and N-acetylglucosamine/N-acetylgalactosamine was the only discovery-phase finding that did not replicate in SHIP-2. When we repeated the replication analysis in SHIP-2 using metabolite data normalised for sample osmolality, all associations were weaker (Table S4).
Phenotypic associations for salivary metabolites
The majority of the loci associated in our study have been reported in relation to GWAS traits, inborn errors of metabolism and / or clinically relevant biochemical pathways (Table 3). We further tested the salivary metabolites associated with the DPYS, AGMAT and ATP13A5 loci in relation to specific phenotypes, using information available in the TwinsUK database (Table 4).
The AGMAT-associated metabolites (4-guanidinobutanoate and beta-guanidinopropanoate) are generated in the polyamine synthesis pathway (https://www.genome.jp/kegg-bin/show_pathway?hsa00330). This pathway also produces putrescine, which has been implicated in poor oral health and foul breath28,29. We therefore evaluated the relevance of the AGMAT-associated metabolites to oral health by testing them against PD status. Levels of both 4-guanidinobutanoate and beta-guanidinopropanoate were significantly higher in PD cases than in controls (p=0.0003 and p=0.0006, respectively). Since eGFR (estimated glomerular filtration rate) is calculated from serum creatinine, and these two commonly used measures of renal function are negatively correlated, we investigated the association between salivary creatinine and eGFR. We observed a similarly strong negative relationship between salivary creatinine and eGFR (p=1.6×10−11), consistent with homeostasis between creatinine concentrations in serum and saliva. Creatinine is also known to be a marker of muscle mass and strength30. We therefore tested salivary creatinine in relation to grip strength (a measure of muscle strength), and the results suggested a positive correlation between them (p=0.01).
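The negative relationship between creatinine and eGFR follows from how eGFR is constructed. The text does not say which eGFR equation TwinsUK used, so the sketch below assumes the 2009 CKD-EPI creatinine equation, with serum creatinine in mg/dL: eGFR = 141 × min(Scr/κ, 1)^α × max(Scr/κ, 1)^−1.209 × 0.993^age, multiplied by 1.018 for women and by 1.159 for Black individuals. Here κ is 0.7 (women) or 0.9 (men), and α is −0.329 (women) or −0.411 (men). The function name and example values are illustrative.

```python
def ckd_epi_2009(scr_mg_dl: float, age: float, female: bool,
                 black: bool = False) -> float:
    """eGFR (mL/min/1.73 m^2) from serum creatinine, 2009 CKD-EPI equation."""
    kappa = 0.7 if female else 0.9
    alpha = -0.329 if female else -0.411
    ratio = scr_mg_dl / kappa
    egfr = (141.0 * min(ratio, 1.0) ** alpha
            * max(ratio, 1.0) ** -1.209
            * 0.993 ** age)
    if female:
        egfr *= 1.018
    if black:
        egfr *= 1.159
    return egfr


# eGFR falls monotonically as serum creatinine rises (60-year-old woman).
curve = [ckd_epi_2009(s, 60, female=True) for s in (0.5, 0.7, 1.0, 1.5)]
```

Both exponents on the creatinine ratio are negative, so eGFR decreases at every creatinine level. This is why salivary creatinine that tracks serum creatinine is expected to correlate negatively with eGFR.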
Apart from these findings, the remaining phenotypic associations that we tested were largely negative, as follows:
The enzyme agmatinase (encoded by the AGMAT gene), which acts on the substrate agmatine, has been implicated in the pathophysiology of mood disorders31, and agmatine has also been proposed as a novel neuromodulator32. We therefore tested the AGMAT-associated metabolites in relation to a diagnosis of clinical depression or anxiety disorder. We also tested them against responses (on an ordinal scale) to questions in the Hospital Anxiety and Depression Scale (HADS) questionnaire33. However, neither analysis showed any significant association (Table S5).
ATP13A5, the locus that was associated with salivary creatinine, belongs to the family of ATPases that regulate the activity of HMG-CoA reductase34, the main site of action of the cholesterol-lowering class of drugs called statins. Statins are known to cause muscle dysfunction (myopathy) in a small fraction of patients35. Given that ATP13A5 and statins both act in the same biochemical pathway, we assessed whether salivary creatinine is also associated with statin usage, and could therefore be used as a biomarker for statin-induced myopathy. There was, however, no association between salivary creatinine and statin usage (p=0.22).
5-Fluorouracil (5-FU) is a pyrimidine analogue and a commonly used anticancer drug. It is eliminated from the body via the pyrimidine degradation pathway. Hence, variants in genes encoding pyrimidine degradation enzymes (for instance, DPYS) are known to be associated with the development of 5-FU toxicity36, which manifests mainly as gastrointestinal side effects. Since we could not directly assess the DPYS-associated metabolites (3-ureidopropionate and 3-ureidoisobutyrate) in relation to the gastrointestinal side effects of 5-FU toxicity, we tested them against a common phenotype of gastrointestinal dysfunction, irritable bowel syndrome (IBS), ascertained using the ROME-III questionnaire37. Levels of neither metabolite differed significantly between IBS “cases” and “controls” (p>0.05 for both). However, this negative finding does not rule out the possibility that these metabolites could be of clinical use in predicting 5-FU toxicity.
Discussion
Here we report a genome-wide association analysis of 476 metabolites measured in saliva samples from healthy, population-based cohorts of European ancestry. We identified a total of 11 distinct genetic loci that regulate the levels of 14 salivary metabolites, three of which were each associated with more than one metabolite.
The fact that saliva is reflective of the concentration of biochemicals in blood forms the basis for certain clinical applications of saliva that others have proposed previously such as therapeutic monitoring of drugs38, cortisol measurement39 and renal function monitoring40. Using salivary metabolite data, we replicated the associations for certain well-established genetic loci that are known to regulate the level of blood metabolites. Thus, our findings add further credence to the notion that, as biofluids, a certain degree of homeostasis exists between blood and saliva.
Additionally, we identified some novel associations in our study, which expand our knowledge of genetic influences on human metabolites. In particular, the association between the DPYS locus and pyrimidine metabolites is intriguing because of its clinical relevance. Mutations in pyrimidine catabolism genes such as DPYD (encoding dihydropyrimidine dehydrogenase) and DPYS (encoding dihydropyrimidinase) have been linked to inborn errors of metabolism41,42 [MIM:274270 and 222748, respectively]. They have also been linked to severe toxicity to the chemotherapeutic agent 5-FU36,43,44. Previous studies have demonstrated the applicability of salivary measurement of certain pyrimidine pathway metabolites (uracil and dihydrouracil) for evaluating 5-FU toxicity due to deficient DPYD activity45. In our study, variants in the DPYS gene were associated with the levels of specific pyrimidine metabolites (3-ureidopropionate and 3-ureidoisobutyrate). Studies exploring the utility of salivary measurements of these metabolites as non-invasive tools for predicting 5-FU toxicity resulting from mutations that affect DPYS activity are therefore warranted. We could not test 3-ureidopropionate and 3-ureidoisobutyrate in relation to the gastrointestinal side effects of 5-FU toxicity, but we found no association between these metabolites and IBS, a phenotype relevant to gut dysfunction. Similarly, the observed association between salivary ribonate and the ENOSF1 locus further points to a possible role for saliva as a tool for drug monitoring, since ENOSF1 overexpression has been shown to cause resistance of tumour cell lines to 5-FU27. The other notable novel finding was the association between the AGMAT locus and the metabolites 4-guanidinobutanoate and beta-guanidinopropanoate. The AGMAT-associated metabolites are produced in the polyamine synthesis pathway, which also generates putrescine, a compound that has been implicated in oral health.
Moreover, in our study, the AGMAT-associated metabolites were associated with PD status, a disease related to poor oral health. Together, these findings suggest that the AGMAT-associated metabolites might serve as potential biomarkers of oral health. On the other hand, although the agmatine pathway has previously been implicated in mood disorders, we did not find any association between the AGMAT-associated metabolites and either symptoms or a prior diagnosis of clinical depression or anxiety disorder. Thus, evidence from the AGMAT-associated metabolites does not support the hypothesis of a link between the pathways involved in maintaining oral health and the regulation of mood15,18. There were a few other novel associations of interest in our study for which we could not assess clinical significance. One example is gamma-carboxyglutamate, associated with the MGP locus, which has been implicated in abnormal vascular calcification in patients with cardiovascular disease46. The relevant phenotypic information was not available for a sufficient number of participants with salivary metabolite data.
The identification of novel genetic associations for salivary metabolites, i.e. genetic loci not previously reported in association with blood metabolites, points to the presence of regulatory pathways specific to the salivary metabolome. For most of the novel loci, the overlapping or nearest gene transcript was expressed in one or more gut-related tissues, including salivary glands. However, at these loci we found little evidence of cis-eQTL effects specific to salivary or other gut tissues. Such effects would have allowed us to attribute each locus's association with its salivary metabolite(s) to transcriptional regulation of overlapping or neighbouring genes. Interestingly, there is growing evidence that the human metabolome reflects an interaction between the host and the gut microbiome7,47,48. It will therefore be worth exploring whether the salivary metabolites associated in our study correlate with the composition of the salivary microbiome. Elucidating the biological processes that might mediate the observed associations between genetic loci and salivary metabolites in this manner will help broaden our understanding of the underlying pathways.
In summary, our study has provided an initial map of genetic loci that regulate the salivary metabolome. Based on what has been observed for other complex human traits, future studies with larger sample sizes are expected to uncover additional genetic loci with much smaller effects on salivary metabolites. Furthermore, our study also offered insights into hitherto unknown biological pathways involved in maintaining the levels of salivary metabolites. While we did explore, to an extent, the clinical relevance of a few salivary metabolites of interest, a more comprehensive analysis with a wider range of phenotypic domains will help broaden our understanding of the relationship between the salivary metabolome and systemic health conditions.
Supplemental Data
Supplemental data includes supplemental materials and methods, five supplemental tables and six supplemental figures.
Author Contributions
Conceived and designed the experiments: C.J.S., C.M., T.D.S., G.K., R.P.M., M.V.M., J.R., U.V.
Performed the experiments: R.P.M.
Analysed the data: A.N., K.S., Y.K., P.M.W., R.C.E.B., T.K., M.P., S.W., M.M.
Wrote the manuscript: A.N., C.J.S., C.M., K.S., R.P.M.
All authors revised the manuscript.
Declaration of Interests
R.P.M. is an employee of Metabolon, Inc. and, as such, has affiliations with or financial involvement with Metabolon, Inc.
Web Resources
1000 Genomes project: http://www.internationalgenome.org/
Metabolon: https://www.metabolon.com/
GEMMA: http://www.xzlab.org/software.html
KEGG: https://www.genome.jp/kegg/
GCTA: http://cnsgenomics.com/software/gcta/#COJO
GTEx: https://gtexportal.org/home/
NHGRI GWAS catalogue: https://www.ebi.ac.uk/gwas/
OMIM: http://omim.org/
HMDB: http://www.hmdb.ca/
LocusZoom: http://locuszoom.org/
Figure S1: Comparison of the distribution of saliva sample osmolality in TwinsUK and SHIP-2
The distribution of saliva sample osmolality in SHIP-2 was much narrower than in TwinsUK, consistent with the SHIP-2 samples representing stimulated saliva, unlike those in TwinsUK.
Figure S2: Plots demonstrating the correlation between the pairs of metabolites that were associated with the same locus in the discovery phase
The figure shows the correlation between each pair of metabolites associated with the same locus, as follows (i-iii): beta-guanidinopropanoate and 4-guanidinobutanoate (AGMAT); urate and allantoin (SLC2A9); 3-ureidopropionate and 3-ureidoisobutyrate (DPYS).
Figure S3: Quantile-quantile (QQ) plots of the genome-wide association analysis for the 14 salivary metabolites that were significantly associated in the discovery phase
The figure shows the QQ plots for the significantly associated salivary metabolites, as follows (i-xiv): 4-guanidinobutanoate; beta-guanidinopropanoate; creatinine; urate; allantoin; dimethylglycine; 3-ureidopropionate; 3-ureidoisobutyrate; N-acetylglucosamine/N-acetylgalactosamine; glycosyl-N-stearoyl-sphinganine; 1-(1-enyl-palmitoyl)-2-arachidonoyl-GPC; ethylmalonate; gamma-carboxyglutamate; ribonate.
Figure S4: A comparison of the effect of the index variant (from the salivary mGWAS) using measurements of the corresponding metabolite in faecal, serum and saliva samples of the TwinsUK subjects
The plots compare the effect of the index variant (from the salivary mGWAS) for the significantly associated salivary metabolites that were measured in all three biospecimens (faeces, serum and saliva), as follows (i-vii): 4-guanidinobutanoate; creatinine; urate; dimethylglycine; 3-ureidopropionate; ethylmalonate; ribonate.
Figure S5: Regional association plots for the genetic loci that were significantly associated with a salivary metabolite in the discovery phase
Regional association plots were created using the LocusZoom tool for the locus-metabolite associations (mQTLs) identified in the discovery phase, as follows (i-xiii): AGMAT and 4-guanidinobutanoate; AGMAT and beta-guanidinopropanoate; ATP13A5 and creatinine; SLC2A9 and urate; SLC2A9 and allantoin; DMGDH and dimethylglycine; DPYS and 3-ureidopropionate; DPYS and 3-ureidoisobutyrate; ABO and N-acetylglucosamine/N-acetylgalactosamine; UGCG and glycosyl-N-stearoyl-sphinganine; FADS2 and 1-(1-enyl-palmitoyl)-2-arachidonoyl-GPC; ACADS and ethylmalonate; TYMS/ENOSF1 and ribonate.
*The regional association plot for the MGP locus has not been included because LD information for variants at that locus was not well characterised in the 1000 Genomes dataset.
Figure S6: Boxplots demonstrating the distribution of the significantly associated salivary metabolites, stratified by the genotypes of the corresponding index variant
The 14 salivary metabolites identified in the discovery phase of our study mapped to 11 distinct genetic loci. Of these, three loci (AGMAT, SLC2A9 and DPYS) were each associated with two metabolites, and the remaining eight loci were each associated with one metabolite. The plots show the distribution of each significantly associated salivary metabolite with respect to the genotypes of the corresponding index variant (for each index variant, the number of observations for each genotype is provided), as follows (i-xiv): 4-guanidinobutanoate; beta-guanidinopropanoate; creatinine; urate; allantoin; dimethylglycine; 3-ureidopropionate; 3-ureidoisobutyrate; N-acetylglucosamine/N-acetylgalactosamine; glycosyl-N-stearoyl-sphinganine; 1-(1-enyl-palmitoyl)-2-arachidonoyl-GPC; ethylmalonate; gamma-carboxyglutamate; ribonate.
Acknowledgements
C.J.S. acknowledges funding from the Chronic Disease Research Foundation and the Wellcome Trust (grant WT081878MA). K.S. was supported by the Biomedical Research Program at Weill Cornell Medicine in Qatar, a program funded by the Qatar Foundation.
TwinsUK is funded by the Wellcome Trust, Medical Research Council, European Union, the National Institute for Health Research (NIHR)-funded BioResource, Clinical Research Facility and Biomedical Research Centre based at Guy’s and St. Thomas’ NHS Foundation Trust in partnership with King’s College London.
The SHIP cohort study is part of the Community Medicine Research Net (http://www.medizin.uni-greifswald.de/cm) of the University of Greifswald, Germany, which is funded by the German Federal Ministry of Education and Research (BMBF, grant nos. 01ZZ96030 and 01ZZ0701), the Ministry for Cultural Affairs and the Ministry for Social Affairs of the Federal State of Mecklenburg-West Pomerania (SHIP; http://www.medizin.uni-greifswald.de/cm).