Abstract
The impact of host genetics on gut microbial dynamics is debated. No study to date has investigated the possible role of host genetics in shaping the gut microbiota in HIV-1 infected subjects. With the aim of generating preliminary data to inform future host genetic studies, we performed an exploratory host exome analysis of 147 subjects either infected or at risk of becoming infected with HIV-1 from the MetaHIV cohort in Barcelona. Using a DNA microarray chip, we sought to identify host genetic variants associated to three specific microbial features with a potentially inheritable component, and which were previously found to be associated with gut dysbiosis in HIV infection, i.e.: gut enterotype, presence of methanogenic archaea and microbial gene richness. After correction for multiple comparisons, we did not observe any statistically significant association between the host’s genetic landscape and the explored gut microbiome traits. These findings will help design future, adequately-powered studies to assess the influence of host genetics in the microbiome of HIV-1-infected subjects.
Introduction
The human gut microbiota performs essential functions to human health1, from nutrient digestion to immune homeostasis2. Perturbations of gut microbial configurations thus impact human health. They may occur due to a wide range of environmental factors, including diet3,4, diseases5-7, medication intake8, and are even influenced by age, geography9 and sexual behavior10. Over 20% of the inter-person microbiome variability is indeed associated with factors related to diet, drugs and anthropometric measurements11.
The host’s genetic background shapes the structure and influences the dynamics of the niche inhabited by the intestinal flora12. However, the influence of host genetics on the human microbiome is more debated13. To date, microbiome-wide association studies (MGWAs) integrating the effect of both environmental and host’s intrinsic factors on the microbiota have yielded disparate results 14,15. Most candidate genes associated with microbiota configurations participate in pathways involved in host immunity, microbial sensing, sugar digestion and host dietary preferences16-18. The abundance of intestinal Bifidobacterium has been associated with SNPs in the lactase-coding gene (LCT) in independent cohorts13,19,20. Moreover, an heritable component has been attributed to certain gut microbial taxa like Christensenellaceae21 and Methanobrevibacter22.
Understanding the reciprocal interaction between HIV-1 infection, the host immune system and the intestinal microbiota might provide clues to cure HIV-1 infection. In a crossectional study in 156 European subjects with different HIV-1 phenotypes (MetaHIV cohort) 23,24, we previously showed that men who have sex with men (MSM) were enriched in Prevotella and, after controlling for HIV-1 risk group, there was no consistent microbial dysbiosis pattern discernible by 16S rRNA sequencing25. Using shotgun metagenomics analyses in the same cohort, we identified a strong, dose-dependent relationship between gut microbial richness and nadir CD4+ T-cell counts24, which is the major surrogate marker of excess mortality, systemic inflammation and clinical complications in chronically HIV-1-infected subjects26,27. In our study, changes in microbial richness reflected multidimensional modifications in microbial species composition and functions, including increases in reactive oxygen and nitrogen-resistant species (ROS/RNS) like the Gram-negative generalists Bacteroides and Proteobacteria, and decreases in ROS/RNS-sensitive Gram-positive syntrophic microbes, including hydrogen-consuming methanogenic archaea.
A yet unanswered question is whether the host genetic background could have any influence in such gut microbiome shifts. To the best of our knowledge, no study has investigated the possible role of host genetics in the shaping the gut microbiota in HIV-1 infected subjects. With the aim of generating preliminary data to inform future larger studies, we performed a pilot, exploratory exome analysis of the MetaHIV cohort seeking to identify host genetic variants associated to 3 specific microbial features previously found to be important in this setting, and with a potential inheritable component, i.e.: gut enterotype28, presence of methanogenic archaea22 and microbial gene richness29.
Results
Population stratification in the MetaHIV cohort
We analyzed the exomes of 147 subjects from the MetaHIV cohort 23,24 using the Illumina Infinium CoreExome microarray, which covers 547,644 putative functional exonic variants (Table 1). Our study included 30 (20%) females and 117 (80%) males, most of them Caucasian (79.6%). The gut microbiota of these participants had been characterized using 16S rRNA gene amplicon sequencing23 and whole metagenome shotgun sequencing24 in previously published studies. Using 16S rRNA, 61 individuals from our cohort (41.5%) were classified as Bacteroides-rich, whereas 86 subjects (58.5%) were categorized as Prevotella-rich. Methanogenic archaea were detectable in 89 participants (60.5%).
From all variants covered by the Illumina® microarray, 296,475 passed the pre-specified quality filters and were thus selected for further analyses. From these, 294,411 (99.3%) were found in dbSNP30, and 143,429 (48.4%) were located within an annotated human gene (Table 1). A multidimensional scaling (MDS) plot based on identity-by-state (IBS) distances showed that our multi-ethnic cohort was stratified by genetic ancestry (Figure 1a). As expected, females and males also showed a different pattern of missing variants as estimated by an identity-by-missingness (IBM) matrix (Figure 1b).
Lack of significant associations between host genetics and gut microbiome composition
We assessed the interaction between QC-filtered genetic variants and three different gut microbial traits: gut enterotype (Bacteroides vs. Prevotella), methanogenic archaea (presence vs. absence) and gut microbial gene richness (as a continuous variable). Two independent logistic models were performed to test the association of genetic variants with the gut enterotype23 and the presence of methanogenic archaea24. A linear regression model was used for microbial gene richness. No significant associations were found between any of the three gut microbiome traits investigated and allelic variation under a canonical31 significance threshold of ~3×10-7.
Exploratory analysis using less stringent criteria
With the purpose of providing a pilot exploration of host genomic variants more likely to be associated with the gut microbiome traits in our small dataset, we arbitrarily reduced the stringency of the p-value threshold to p <1e-5. Using this threshold, three genetic variants were linked to methanogenic detection and five were associated to microbial gene richness (Table 2, Figure 2, Supplementary Figure 2). None of them contributed to the enrichment of any reactome pathways.
Putative associations involving innate immunity and major histocompatibility genes
To evaluate the interaction between the gut microbiome composition and the innate immune system, we extracted from the InnateDB database32 the filtered host genetic variants from our cohort located in protein-coding sequences involved in human responses to microbes. Our dataset covered 6,127 variants extracted from InnateDB, which were distributed along 1,702 genes. None of these variants were associated to the explored microbiome features. To explore whether the MHC complex might be related with gut microbial traits, we investigated a total of 205 genetic variants distributed along 24 HLA genes. None of them were associated to screened microbial feature under a p-value threshold of p<1e-5.
Discussion
Using a DNA-microarray-based genotyping strategy, we performed a pilot exploratory study to investigate whether the host genomic background could influence the gut microbiome composition in HIV-1-infected subjects 23,24. Following a GWAS-like analysis approach31,33, we found no statistically-significant associations between the covered host genetic background and Bacteroides vs. Prevotella clustering, detection of methanogens, or microbial gene richness. By setting a less stringent significant threshold, we identified a number of genetic associations with both methanogens presence and microbial gene richness. These should be seen as candidate associations that may exert small additive effects to the gut microbiome configuration and could be confirmed with properly-powered studies, including more participants and a genome-wide set of variants.
The small sample size of this study would only enable detecting variants associated with large effect sizes on gut microbiome composition. We thus suggest that, if the host genetics have any role in shaping the gut microbiome of people with or at risk of acquiring HIV-1, such effect is probably small. This is consistent with previous studies showing that the environment was, in fact, the main determinant of the gut microbiota composition in the general population11.
A study involving monozygotic and dizygotic human twins reported that the abundance of Methanobrevibacter smithii was more concordant between monozygotic than dizygotic twins, suggesting that the colonization of this group of microbes might have a genetic component34. In our study, the presence of methanogenic archaea was associated with three variants distributed along the same intron of LOC105369896, an uncharacterized long non-coding RNA (lncRNA) located in chromosome 12 (Table 2). Despite the vast majority of annotated lncRNA in the human genome have unknown functions, their participation in gene expression regulation, chromosome conformation, cell development and disease, has been widely reported35. A recent study in mice suggested that the lncRNA expression profile discriminates animals with different transplanted microbiota36. These results might be pinpointing to a putative role of lncRNA in human gut microbiota configuration, indicating that lncRNA could be involved in commensals recognition.
Microbial gene richness was the parameter with the higher number of linked genetic variants, although none of them clearly stood out. Two of these genetic variants are located within CDH12 gene, which encodes for cadherin-12 protein. Both of them are tagged as silent mutations, so we wouldn’t expect them to exert important phenotypic effects.
In mice, the major histocompatibility complex (MHC) was suggested to have an effect on the fecal microbiota composition by restricting the colonization of certain bacteria37. In our study, associations with HLA variants were not found. Nevertheless, in GWAS every association involving genes from the HLA complex should be interpreted with caution given the extreme polymorphism at both nucleotide and structural levels of this region38. Demonstration of the implication of HLA variation in human gut microbial configuration will thus require a specific well-powered study.
Unlike human cohorts, studies involving mice do allow for a better control of both genetic and environmental factors, that are important confounders18. In certain murine models, single mutations in genes implicated in innate immunity, including leptin or toll-like receptor 5 (TLR5), have important effects on microbiome composition 39,40. While it is important to assess the contribution of host genotype in shaping the gut microbiota, it has been widely evidenced that the microbial responses induced by diet changes in mice, which are rapid and reproducible; dominate the host genotype effect 41.
This study has a number of limitations. First, its small sample size hinders our ability to detect statistically-significant associations once canonical false discovery corrections are implemented. Unlike in mouse models, testing genetic associations in humans require much larger and random sampling to compensate for environmental confounders such as diet and lifestyle habits. Our cohort included females and males from different geographic origins, which resulted in an expected population stratification42. However, the issue of population structure was controlled for using statistical methods developed to adjust for false associations43,44. Furthermore, data from the MetaHIV cohort should not be extrapolated to the general population because it is a very specific, not randomly-selected cohort of subjects who are either infected or at risk of being infected with HIV-1. Stool microbiome data may not necessarily reveal the effect of the genetic makeup on the dynamics of the microbial populations linked to intestinal mucosal surfaces45. Finally, no biological validation of any of the potential associations has been pursued46.
With all such caveats in mind, in this pilot exploratory analysis we do not find any consistent association between the host genetic makeup and the gut microbiota configuration in subjects who are either infected or at risk of becoming infected with HIV-1. Larger, adequately powered studies will be required to assess if host genetic variants mainly encoding for innate immunity genes, which shape the structure of epithelial cell surfaces 47,18, have an impact on the gut microbiota composition. In conclusion, prolonged immune deficiency and environmental and lifestyle factors play a much more important role than host genetics in shaping gut microbial communities in people living with HIV 11,14,15,17,48.
Materials and Methods
Study subjects
This study included subjects from MetaHIV, a cross-sectional study cohort of subjects infected or at risk of becoming infected with HIV-1 (see 23 for a detailed description of the cohort). HIV-1-infected subjects were recruited from two tertiary HIV-1 clinics in Barcelona, Catalonia, Spain. Most HIV-1-negative controls were enrolled from a prospective cohort of HIV-negative MSM who attend quarterly medical and counseling visits at a community-based center in Barcelona49,50.
Ethics & Community Involvement
All subjects provided appropriate informed consent regarding genetic analyses. The study was reviewed and approved by the Institutional Review Boards of the Hospital Universitari Germans Trias i Pujol (reference PI-13-046) and the Hospital Vall d’Hebrón (reference PR(AG)109/2014). All participants provided written informed consent in accordance with the World Medical Association Declaration of Helsinki. The study concept, design, patient information and results were discussed with the HIVACAT Community Advisory Committee (CAC), who provided input on these aspects as well as on the presentation and dissemination of study results.
Sample collection
Participants provided a single blood samples from which peripheral blood mononuclear cells (PBMCs) were isolated. DNA derived from PBMCs was extracted using the QIAamp® DNA Blood minikit (QIAGEN Inc., Hilden, GER), and stored at -80ºC.
Host genotyping
We genotyped 547,644 genetic variants from 147 subjects using the human Infinium® CoreExome-24v1-0 BeadChip microarray and the Infinium® HTS Assay Protocol Guide (Illumina Inc., San Diego, CA, USA). Exome genotyping was carried out at the Institut de Medicina Predictiva del Cancer (IMPPC). Genotype calling was done with the GenomeStudio Software version 2011.1 (Illumina Inc.). The average call rate was 99.9%, and no gender discrepancy was observed based on assay experiments. PLINK software v1.0751 was used to exclude individuals with >90% missing genotyping data (--mind 0.1) and variants not called in at least 10% of samples (--geno 0.1). Variants with minor allele frequency (MAF) lower than 0.01, and those out of Hardy-Weinberg equilibrium (P<1 x 10-4) were also removed (-hwe 0.0001). A total of 296,475 variants on 147 individuals were included in further analysis (Supplementary Figure 1). Variants passing such filters were annotated following the support data of the Infinium Core Exome kit provided by Illumina and the Ensembl Variant Effect Predictor tool52 (Table 1). Variants located in sexual chromosomes (chromosomes 23 and 24), pseudoautosomal regions (chromosome 25), mitochondrial DNA (chromosome 26) and control probes (chromosome 0) were excluded from the current analysis. Gene expression was assessed using the Human Protein Atlas database 53.
Gut microbial characterization
Gut microbiota information was obtained from previously published studies using 16S rRNA23 and shotgun metagenomic sequencing24 of fecal samples from the same cohort. Based on those studies, we chose to investigate the association between the genetic makeup of the host and 3 specific gut microbiome traits, i.e.: (a) gut enterotype (Bacteroides vs. Prevotella), (b) detection of methanogenic archaea (Methanobrevibacter smithii, M. unclassified and Methanosphaera stadtmanae), and (c) gut microbial gene richness. Previous studies had suggested that HIV-1 infection was associated to shifts from Bacteroides to Prevotella predominance 54–60. Using 16S rRNA sequencing we previously observed, instead, that subjects enriched in Prevotella were mostly MSM, and there was no significant Bacteroides vs. Prevotella clustering by HIV-1 infection after controlling for HIV-1 risk group23. Using shotgun sequencing, we then found that low gene richness was linked to low nadir CD4+ T-cell counts and one of the most robust microbial markers of high microbial gene richness was, precisely, the detection of methanogenic archaea24.
Statistical analysis
Population stratification was tested by constructing a matrix based on pairwise identity-by-state (IBS) distances. Identity-by-state distances were represented using a multidimensional scaling plot (Figure 1a). Effects of systematic batches and gender on missingness were assessed by clustering individuals according to their identity-by-missingness (IBM) distances, i.e. the proportion of missing genotypes discordantly missing by different participants (Figure 1b).
Genetic associations with both qualitative (gut enterotype, absence/presence methanogens) and quantitative microbial traits (microbial gene richness) were evaluated using logistic and linear regression models respectively, setting the additive version of the models and adding sex and gender as covariates. Statistic p-values were corrected using the Bonferroni adjustment to prevent false discovery, given the number of exonic variants tested and the cohort size. Associations between each microbial species abundance and genetic variation were not performed because the abundance of the microbes was not normally distributed, and substantial inflation of type I error rate was expected in such tests61.
A standard genome-wide significance threshold of ~3×10-7 for the European population has been previously proposed for simulated cohorts of N=100 with equal number of cases and controls 31. But, since 1) our population was stratified by ethnical origin; 2) the number of cases and controls was slightly unbalanced; and 3) we did not have a genome-wide coverage of genetic variants, variants with a minimum unadjusted p-value threshold of 1e-5 were considered for further analyses.
A gene set enrichment analysis of reactome pathways62 was performed for genes linked to significant variants using a binomial test and correcting p-values for multiple testing (Benjamini-hochberg procedure).
Author contributions
R.P., and B.C. conceived and designed the study. R.P., B.M., J.Co., J.S., J.N., C. B., M.Cr. and B.C recruited the study participants and performed their clinical evaluations. Fecal DNA was extracted, amplified and sequenced by M.P, M.Ca., and M.R. under the supervision of M.N. and R.P. Y.G, and J.R performed the bioinformatic and statistical analyses, with the supervision of M.N., C.B. and R.P.Y.G. and R.P wrote the paper, which was reviewed, edited and approved by all authors.
Competing interests
The authors declare no competing interests.
Data availability
The datasets generated during and/or analyzed during the current study are available in the National Center for Biotechnology Information -NCBI repository (Bioproject accession number: PRJNA307231, SRA accession number: SRP068240).
Acknowledgements
This study was supported by philanthropic donations from Fundació Glòria Soler, the Gala SIDA 2015, 2016 editions, and People in Red – Barcelona 2017 edition. IrsiCaixa is supported by the RED de SIDA RD16/0025/0041 cofinanced by the ISCIII and the European Regional Development Fund (ERDF), “Investing in your future”. M.R. is funded through a FI-DGR grant (FI-B00184) from Agència de Gestió d’Ajuts Universitaris i de Recerca (AGAUR) at the Secretaria d’Universitats i Recerca del Departament d’Economia i Coneixement de la Generalitat de Catalunya. M.L.C. is funded through the grant MTM2015-64465-C2-1-R, Spanish Ministry of Economy and Competitiveness, Spain. We thank the members of the IGTP Genomics and Bioinformatics Core Facilities (Maria Pilar Armengol, Lauro Sumoy and Iñaki Martínez) for their contribution to this publication. The sponsors of the study had no role in study design, data collection, data analysis, data interpretation, or writing of the report. The corresponding author had full access to all study data, and had final responsibility for the decision to submit for publication.