Abstract
The environmental metabolome is a dominant and essential factor shaping microbial communities. Thus, we hypothesized that metagenomic datasets could reveal the quantitative metabolic status of a given sample. Using a newly developed bottom-up ecology algorithm, we predicted high-resolution metabolomes of hundreds of metagenomic datasets from the human microbiome, revealing body-site specific metabolomes consistent with known metabolomics data, and suggesting that common cosmetics ingredients are some of the major metabolites shaping the human skin microbiome.
Main text
Microbial communities constantly adapt to exploit available resources [Martiny 2015]. As a result, the presence of specific microbes or distributions of microbes allows us to infer environmental features. For example, the altered metabolic conditions in the microenvironment of colorectal cancer (CRC) tumors select for the outgrowth of specific species in the human CRC microbiome [Tjalsma 2012], allowing cancer detection [Zeller 2014]. Similarly, microbes can serve as biosensors for geochemical features such as solvent or uranium contamination [Smith 2015]. These and many other empirical examples of significant associations between the environment and the microbiota [Merrifield 2016, Adams 2015] suggest that the composition and metabolic potential of microbial communities could be used to reconstruct the metabolic environment through a reverse engineering strategy.
Shotgun metagenomics rapidly inventories the composition and genomic content of microbiomes, but obtaining similarly high-resolution metabolomic measurements of microbial environments remains challenging, among others due to the hundreds to thousands of biochemical compounds that may be difficult to distinguish using commonly used mass spectrometry-based technologies [Marcobal 2013]. Here we address this challenge by developing a novel approach that predicts the metabolic environment of a microbial community based on its metagenome, by using the inferred metabolism and abundance profiles of community members as input.
The main premise of our approach, which we named bottom-up ecology, is that the metabolic potential of a microbiome that is encoded in the genes of the species and their wiring into pathways, reveals how it can exploit the available metabolic resources. Using bottom-up ecology, we can quantitatively infer these resources by identifying the metabolic environment that yields growth of these microbes in the relative abundances as observed in the metagenome. Our approach is outlined in Figure 1 (see Supplementary Methods for details). First, we use reference genome sequences to generate genome-scale metabolic models of the species encountered (GSMMs) by using an established automated pipeline (1) [Henry 2010]. GSMMs provide a minimally biased description of the metabolic preferences of a microbial genome by integrating prior knowledge about protein functions based on information from many model and non-model organisms, and can be used to model the fluxes through the cellular metabolic network [Henry 2010]. Next, we determine the relative abundances of these organisms in the microbiome by mapping the metagenomic reads against the reference genomes (2). We can now ask the question, which metabolic environment would lead to relative growth rates of the GSMMs (3) that best correlate with the observed relative abundances (4), where growth is defined as the sum of all fluxes in biomass reactions in the GSMMs [Henry 2010]. Flux balance analysis (FBA) allows the growth or biomass production of GSMMs to be estimated given certain constraints. We constrain the GSMMs of all the microbes found in a metagenome by providing the same metabolic environment (modeled as an upper bound to the import reactions of metabolites) and assuming the same cellular objective of growth. Finally, we use semi-Markov chain sampling of the highly dimensional metabolome solution space to identify the metabolomic composition leading to the GSMM growth profile that optimally correlates with the abundance profile of the microbial genomes observed in the metagenome (5).
To test our bottom-up ecology approach in a well-controlled system, we generated nine in silico metabolic environments by sampling random metabolites according to a realistic distribution (see Supplementary Methods). Next, we inferred growth of GSMMs generated from the bacterial reference genomes of the Human Microbiome Project (HMP) [Consortium 2012], and assumed that the microbes with the highest growth rates would be the most abundant in the simulated metagenomes. Using only the predicted abundance values of the best growing species, we then re-inferred the environment based on their relative abundance profiles. As expected, the metabolomes inferred by using bottom-up ecology correlate with the metabolomes on which the species were grown, while this correlation is absent when comparing metabolomes predicted from non-matching species (Figure 2a). Importantly, correct inference of the metabolic environment depends on the information from the entire simulated microbial community. For example, the correlation values on the diagonal in the top-20 species plot were significantly lower than in the top-100 species plot (P-value=0.04, one-tailed paired T-test of nine values, see Figure 2a).
Next, we tested the ability of the bottom-up ecology algorithm to infer the metabolomic environment in multiple human body-sites. We used 175 metagenomic datasets from four different body-sites (oral, skin, stool, and vaginal) where the relative abundances were measured of over one thousand reference bacteria [Consortium 2012]. A principal component analysis of the resulting predicted metabolomic environments revealed four clusters corresponding to the origin of the metagenomes (Supplementary Figure 1a), as was reflected in the body-site specific microbiomes that were previously observed [Consortium 2012]. This clustering was independent of the initialization of our algorithm in the highly dimensional metabolome search space. For example, searches based on oral metagenomes that were initiated with predicted skin metabolomes quickly converged to the oral metabolome cluster (Supplementary Figures 1b–e). Thus, our bottom-up ecology algorithm predicted consistent and specific metabolomes for the human microbiome body-sites.
The metabolomes inferred from the HMP metagenomes allowed us to identify the most important metabolites in the different human body-sites. Due to the difficulties in obtaining quantitative, high-resolution metabolomic profiles of a sample [Marcobal 2013], there are relatively few experimental datasets that could be used for experimental benchmarking of our bottom-up ecology algorithm. From the Human Metabolome Database [Wishart 2013], we obtained three annotated metabolomes from saliva and one from fecal water. Moreover, we included one additional metabolome from a fecal incubator and one from vagina from recent literature [Kortman 2016, Vitali 2015]. We linked measured metabolites between these studies and our models (Supplementary File 1) and correlated the metabolomic profiles of these six independently measured metabolomes to the values of the metabolites inferred by bottom-up ecology from the 175 HMP metagenomic datasets. As shown in Figure 2b, the oral metagenomes are linked to the saliva metabolomes, the stool metagenomes to the fecal water and fecal incubator metabolomes, and the vaginal metagenomes to the vaginal metabolome (P-value≈0, one-tailed unpaired T-test). These results show that the metabolomic environment of a microbiome can be quantitatively captured in high-resolution by our bottom-up ecology algorithm.
To our knowledge, no high-throughput human skin metabolome has been measured to date. To identify the main metabolites on the human skin, we averaged the relative concentrations of all metabolites in the metabolomes predicted based on 50 skin metagenomes (Supplementary File 1). Interestingly, the top compounds shaping the skin microbiome include various ingredients found in cosmetic and hygiene products (Supplementary Table 1). For example, myristic acid is used as a fragrance ingredient, cleansing agent, and emulsifier, and is readily adsorbed by the skin. Similarly, citrate is a commonly used ingredient to adjust the acidity of cosmetics. Nicotinamide ribonucleotide, aspartate, and N-acetyl glucosamine are used in skin conditioner products. Moreover, N-acetyl glucosamine is a precursor to hyaluronic acid, a major component of skin structure, a pathway that responds to UV irradiation in skin [Averbeck 2007]. These results provide a first look into the metabolites shaping the human skin microbiome [Bouslimani 2015, Grice 2011].
Our bottom-up ecology approach exploits the fact that in an ecosystem, microbes are constantly competing for resources, leading to a relative abundance distribution reflecting their ability to exploit these resources. Metagenome-guided modeling enables a deeper understanding of microbial ecosystems by linking the environmental metabolome to the metabolic network of individual microbial populations [Levy 2013]. By explicitly modeling the fluxes of individual GSMMs that are matched with the species composition of the system in a probabilistic fashion, our approach provides a starting point for mechanistic models of microbial ecology, including the potential for systems with more complex crossfeeding networks [Garza 2015].
While our results show meaningful metabolomic profiles for diverse human body-site environments, there is still a relatively high level of noise (Figure 2b). First, it should be noted that the metagenomes and metabolomes used in this benchmark were measured and published independently in systems that may differ in unknown ways, e.g. a fecal incubator versus fresh stool. Second, automatically reconstructed GSMMs have ~70% agreement with experimentally measured microbial phenotypes [Henry 2010, Plata 2015], adding significant noise to the inferred metabolomic profiles. Third, our algorithm depends on the availability of reference genome sequences to derive these GSMMs, as do approaches that exploit environmental sequencing data to make inferences about the genetic functionality of a microbial ecosystem [Langille 2013, Aßhauer 2015]. Ongoing work addressing these points is expected to improve bottom-up ecology as an approach to predict metabolic features of microbial environments, also in environments that are less well-sampled than the human microbiome. Nevertheless, it is promising that we can already reconstruct the metabolomes of human body-site environments by starting only from the metagenomic content of the microbial community.
Author contributions
DRG created the algorithm and performed the experiments. All authors devised the study and wrote the manuscript.
Acknowledgments
We thank Maarten Kooyman from SURFsara for help implementing bottom-up ecology on the Netherlands Life Science Grid. DRG is supported by the Science Without Borders program of CNPQ/BRASIL. BED is supported by Netherlands Organization for Scientific Research (NWO) Vidi grant 864.14.004.