Abstract
The airways of the lung carry microbiota that contribute to respiratory health1. The ecology of normal airway microbial communities, their responses to environmental events, and the mechanisms through which they cause or modify disease are poorly understood. Cigarette smoking is the dominant malign environmental influence on lung function, causing 11.5% of deaths globally2. Asthma is the most prevalent chronic respiratory disease worldwide3,4, but was uncommon 100 years ago5. The asthma pandemic is linked to urbanization, leading to considerations of protective microbiota loss (the “hygiene hypothesis”)6-8 and acquisition of strains that may damage the airway epithelium9. We therefore investigated oropharyngeal airway microbial community structures in a general population sample of Australian adults. We show here that airway bacterial communities were strongly organized into distinctive co-abundance networks (“guilds”), just seven of which contained 99% of all oropharyngeal operational taxonomic units (OTUs). Smoking was associated with diversity loss, negative effects on abundant taxa, profound alterations to network structure and marked expansion of Streptococcus spp. These perturbations may influence chronic obstructive pulmonary disease (COPD)10 and lung cancer11. In contrast to smokers, the loss of diversity in asthmatics selectively affected low-abundance but prevalent OTUs from poorly understood genera such as Selenomonas, Megasphaera and Capnocytophaga, without coarse-scale network disruption. The results open the possibility that replacement of specific organisms may mitigate asthma susceptibility.
We examined adults from the population of Busselton in Western Australia who were participating in a general health survey12. Direct sampling of the lung microbiota requires invasive procedures, such as bronchoscopy, that are not yet feasible in epidemiological studies. In healthy individuals, however, the oropharynx and the intra-thoracic airways share similar microbiota13,14. We consequently used oropharyngeal swabs for population sampling, accepting that the abundance of pathogens in the lower airways of diseased subjects is imperfectly reflected in the oropharynx13,15.
We submitted oropharyngeal swabs from 578 subjects to 16S rRNA gene qPCR and sequencing, the latter yielding 44,290,100 high-quality reads (see Supplementary Figure 1.1 for the analysis structure). After removal of 173 OTUs with a high probability of being contaminants and 13,472 rare OTUs present in only one sample or with fewer than 20 reads, 4,218 OTUs derived from 43,775,771 reads remained. To enable diversity analyses based on proportions, the samples were rarefied to a depth of 6,543 reads, retaining 529 samples containing 4,005 OTUs and 3,461,247 reads. For consistency, unrarefied data from these same 529 samples were used for DESeq tests of differences between subject groups and for network analyses. No systematic differences in results were seen when the larger sample was analyzed.
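Rarefaction means subsampling each sample's reads without replacement to a common depth, so that proportions are compared at equal sequencing effort; samples with too few reads are discarded. The Python sketch below illustrates that step under stated assumptions: a samples × OTUs integer count matrix in NumPy, a hypothetical helper name `rarefy`, and an arbitrary seed. It is not taken from the authors' R scripts.

```python
import numpy as np

def rarefy(counts, depth, rng):
    """Hypothetical helper: subsample each sample (row) to `depth` reads without replacement.

    counts: samples x OTUs matrix of non-negative integer read counts.
    Samples with fewer than `depth` reads are dropped; the boolean mask of
    retained samples is returned alongside the rarefied table.
    """
    counts = np.asarray(counts, dtype=np.int64)
    keep = counts.sum(axis=1) >= depth
    kept = counts[keep]
    rarefied = np.empty_like(kept)
    for i, row in enumerate(kept):
        # Drawing `depth` reads without replacement from this sample's pool
        # of reads is a multivariate hypergeometric draw.
        rarefied[i] = rng.multivariate_hypergeometric(row, depth)
    return rarefied, keep

rng = np.random.default_rng(2019)          # illustrative seed
table = np.array([[5000, 2000, 100],       # 7,100 reads: retained
                  [10, 20, 30],            # 60 reads: dropped
                  [6543, 0, 0]])           # exactly 6,543 reads: retained unchanged
rarefied, keep = rarefy(table, 6543, rng)
```

Because the draw is random, individual counts depend on the seed; only the per-sample totals are fixed at the chosen depth.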
The average age of the 529 subjects was 56 years (Supplementary Table 1). Sixty subjects were current smokers and 216 were ex-smokers (a mean of 18 years since quitting). The mean forced expiratory volume in one second (FEV1) and forced vital capacity (FVC) of the subjects were normal. There were 77 doctor-diagnosed asthmatics. Only one subject had a clinical diagnosis of COPD. The frequencies of asthma and of current smoking did not differ from those in the whole Busselton cohort.
Subjects who had taken antibiotics within six weeks of the study were excluded. The annual rate of antibiotic prescription in the Australian population is 254 per 1000, and half of these prescriptions are for respiratory infections16, so it is likely that many smokers had intermittently taken antibiotics. Asthma is not considered an indication for antibiotics in the Australian healthcare system. Inhaled corticosteroids, used in the maintenance treatment of asthma, do not seem to affect microbial diversity17.
Structure of the normal airway microbiome
An estimate of Bray–Curtis beta diversity (β) for the population gave a mean dissimilarity (M) between subjects of 0.51 ± 0.06 (mean ± SD) on a scale of 0–1, indicating that on average individual airway microbiomes shared about half of their read-weighted OTU composition.
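For two samples a and b, the Bray–Curtis dissimilarity is Σ|a_i − b_i| / Σ(a_i + b_i). When both samples are rarefied to the same depth N, this equals 1 − Σmin(a_i, b_i)/N, i.e. one minus the fraction of reads the two samples have in common, which is why a value near 0.5 corresponds to roughly half of the abundance-weighted composition being shared. The Python sketch below computes the mean and SD over all subject pairs; the function names are illustrative, and SciPy's `pdist(table, "braycurtis")` should give the same pairwise values.

```python
import numpy as np

def bray_curtis(a, b):
    """Bray-Curtis dissimilarity between two abundance vectors (0 = identical, 1 = disjoint)."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    return np.abs(a - b).sum() / (a + b).sum()

def mean_pairwise_bray_curtis(table):
    """Hypothetical helper: mean and sample SD of Bray-Curtis over all unordered pairs of rows."""
    table = np.asarray(table, dtype=float)
    n = table.shape[0]
    d = np.array([bray_curtis(table[i], table[j])
                  for i in range(n) for j in range(i + 1, n)])
    return d.mean(), d.std(ddof=1)

# Three toy subjects, each rarefied to 20 reads.
toy = [[10, 0, 10],
       [0, 10, 10],
       [10, 0, 10]]
mean_bc, sd_bc = mean_pairwise_bray_curtis(toy)
```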
Five phyla contained 98.4% of all OTUs (Table 1, Supplementary Table 2.2). Firmicutes (predominantly Streptococcus and Veillonella spp.) was the most common phylum, with 24 OTUs in the top 50 and 57.9% of all OTUs in the complete dataset. Bacteroidetes (predominantly Prevotella spp.) contained 14.1% of the OTUs, Proteobacteria (predominantly Neisseria and Haemophilus spp.) 12.3%, Actinobacteria 9.1% and Fusobacteria 4.9%. Overall, the 50 most abundant OTUs accounted for 92% of all reads (Supplementary Table 2.2).
Streptococcus spp. show high rates of clonal diversity and are poorly differentiated by standard culture and 16S rRNA gene sequences18,19. We therefore sequenced the methionine aminopeptidase gene (map) in 483 subjects to further differentiate between Streptococcus taxa19. After removal of map_OTUs present in only one sample, with fewer than 20 reads, or negatively correlated with qPCR abundance, 14,898 map_OTUs remained, suggesting substantial variation in streptococcal strains in the population. β diversity estimated in data rarefied to 7,700 reads gave M = 0.84 ± 0.06 (mean ± SD), indicating low similarity of streptococcal composition between subjects. The nine most prevalent map_OTUs were all identified as S. salivarius, with S. parasanguinis the next most prevalent (Supplementary Table 2.6.1). The potential pathogen S. mitis/pneumoniae was detected in 58% of subjects, although at low abundance. Tools for further exploration of streptococcal clades are clearly needed.
Microbial communities are formed through complex ecological interactions that can be exposed through network analyses20. On the assumption that correlations in the abundance of different taxa would capture co-ordinated growth, we applied weighted correlation network analyses (WGCNA)21 to the Busselton dataset.
We observed 13 discrete modules in which the abundances of members were strongly correlated. The WGCNA program labels modules with unique colour identifiers, but we have also named them according to their most abundant genera (Table 2). Just 13 OTUs remained unassigned to a module; these are referred to as the grey module. The 5 largest modules (in terms of abundance of members) contained 97.6% of all OTU sequence reads (Table 2). Individual hubs were strongly connected to their network vectors (module eigengenes) (range of P = 7.9E-266 to 1.9E-121) (Supplementary Table 2.4.1).
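In WGCNA, a module is summarized by its eigengene, the first principal component of its members' standardized abundances, and an OTU's connection to its module can be scored as its correlation with that eigengene (module membership); hubs are the members with the highest membership. The NumPy sketch below illustrates the idea only. It assumes a samples × OTUs matrix of already log-transformed abundances for one module, uses hypothetical helper names, and is not the WGCNA package's own implementation.

```python
import numpy as np

def module_eigengene(X):
    """Hypothetical helper: first principal component of standardized module abundances.

    X: samples x OTUs for the members of one module. Each column must vary
    across samples (a zero-variance column would divide by zero).
    """
    X = np.asarray(X, dtype=float)
    Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
    U, _, _ = np.linalg.svd(Z, full_matrices=False)
    eigengene = U[:, 0]
    # The sign of a singular vector is arbitrary; orient it to agree with
    # the module's average standardized abundance.
    if np.corrcoef(eigengene, Z.mean(axis=1))[0, 1] < 0:
        eigengene = -eigengene
    return eigengene

def module_membership(X, eigengene):
    """Hypothetical helper: Pearson correlation of each OTU (column) with the eigengene."""
    X = np.asarray(X, dtype=float)
    return np.array([np.corrcoef(X[:, j], eigengene)[0, 1]
                     for j in range(X.shape[1])])

# Toy module: three OTUs driven by one shared latent factor plus noise.
rng = np.random.default_rng(7)
latent = rng.normal(size=200)
X = latent[:, None] * np.array([1.0, 2.0, 3.0]) + 0.1 * rng.normal(size=(200, 3))
me = module_eigengene(X)
kme = module_membership(X, me)
hub = int(np.argmax(kme))  # index of the most strongly connected OTU
```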
The strengths of association suggest that these co-abundance modules may represent “guilds” of co-operating bacteria that occupy ecological niches on the mucosa.
The largest guild (turquoise module: Prevotella.1) accounted for 42.7% of reads (Table 2, Supplementary Table 2.4.1). The most common organisms were within the genera Prevotella, Veillonella, Actinomyces and Atopobium. These organisms resemble common mucosal commensals at other body sites, and perhaps represent a base microbial carpet. The smaller guild (cyan) on the same division (B) of the network dendrogram (Supplementary Figure 1.2) was almost entirely made up of Veillonella spp. and may occupy a related ecological niche.
The blue module (Streptococcus.2) contained 21.6% of reads, predominantly from the genera Streptococcus, Haemophilus and Veillonella. Network hubs included Lactobacillales and Gemella. The adjacent network (Neisseria: green) (Supplementary Figure 1.2) was dominated by Neisseria, with Porphyromonas and Capnocytophaga. This may suggest a normal guild that can be occupied by potentially pathogenic Proteobacteria.
The magenta module (Streptococcus.1) (19.4% of reads) was made up entirely of Streptococcus taxa (40%) and an unidentified Firmicutes OTU (60%) (Supplementary Table 2.4.1), which is also likely to be streptococcal (based on phylogenetic clustering, not shown). Network hubs were also Streptococcus, identifying a streptococcal-specific guild in the mucosa.
Although our study was well powered to map microbial community composition, only limited functional inferences could be drawn from genus assignments and from the relationships of the networks to each other (Supplementary Figure 1.2). Metagenomic shotgun sequencing remains problematic for respiratory samples because of the high proportion of human DNA, and we accept that a survey of the airway microbiota that includes systematic culture and genome sequencing is desirable.
The airway microbiome and clinical traits
A stepwise regression found that microbial diversity in individual airways was independently related to current cigarette smoking (R² = 6%, P < 0.001), a current diagnosis of asthma (additional R² = 1.4%, P < 0.005) and pack-years of smoking (additional R² = 0.8%, P = 0.04) (Supplementary Table 2.3), but not to age or sex. We partitioned the data into three subgroups: smokers with more than 10 pack-years (n = 159); asthmatics (n = 77); and unaffected subjects (n = 233).
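The text does not state the selection rule used in the stepwise regression. A common choice is forward selection: at each step, add the candidate predictor with the largest partial F statistic (equivalently the smallest P value), record the additional R² it contributes, and stop once no candidate reaches P < 0.05. The NumPy/SciPy sketch below implements that assumed rule on simulated data; the function names, the entry threshold and the toy predictors are all illustrative.

```python
import numpy as np
from scipy import stats

def _rss(X, y):
    """Residual sum of squares and parameter count for OLS of y on X plus an intercept."""
    design = np.column_stack([np.ones(len(y)), X])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    resid = y - design @ coef
    return float(resid @ resid), design.shape[1]

def forward_stepwise(X, y, alpha=0.05):
    """Hypothetical helper: forward selection by partial F-test.

    Returns a list of (column index, additional R^2, P value), one per step.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    n, p = X.shape
    tss = float(((y - y.mean()) ** 2).sum())
    selected, steps = [], []
    rss_cur, _ = _rss(X[:, np.array(selected, dtype=int)], y)  # intercept-only model
    while len(selected) < p:
        best = None
        for j in range(p):
            if j in selected:
                continue
            rss_new, k = _rss(X[:, np.array(selected + [j], dtype=int)], y)
            # Partial F for adding one term; all candidates at this step share the
            # same residual df, so the largest F is also the smallest P value.
            f_stat = (rss_cur - rss_new) / (rss_new / (n - k))
            if best is None or f_stat > best[1]:
                best = (j, f_stat, rss_new, n - k)
        j, f_stat, rss_new, df_resid = best
        p_val = float(stats.f.sf(f_stat, 1, df_resid))
        if p_val >= alpha:
            break
        steps.append((j, (rss_cur - rss_new) / tss, p_val))
        selected.append(j)
        rss_cur = rss_new
    return steps

# Simulated outcome driven strongly by column 0, weakly by column 1, not by column 2.
rng = np.random.default_rng(0)
X = rng.normal(size=(500, 3))
y = 2.0 * X[:, 0] + 0.5 * X[:, 1] + rng.normal(size=500)
steps = forward_stepwise(X, y)
```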
Smoking
A DESeq analysis to identify changes in specific taxa revealed marked effects of cigarette smoking (Figure 1, Supplementary Tables 2.5.1 and 2.5.2). The loss of diversity affected many abundant OTUs, including those in the genera Fusobacterium, Neisseria, Haemophilus, Veillonella and Gemella. By contrast, the OTUs that increased in smokers were generally highly abundant streptococci. Examination of map gene OTUs attributed the increases in abundance to S. parasanguinis (fold change 5.2, Padjusted = 1.75E-07), S. mitis/pneumoniae (3.62, 4.81E-09), S. salivarius (3.03, 5.59E-15) and S. thermophilus (2.53, 7.38E-05) (Supplementary Table 2.6.2).
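DESeq tests group differences with a negative binomial model after scaling each sample by a size factor. Only that normalization step is sketched here, using DESeq's median-of-ratios estimator: each sample's size factor is the median, over features detected in every sample, of its count divided by that feature's geometric mean across samples. The sketch assumes a samples × OTUs table and a hypothetical function name; it does not reproduce the full DESeq test. In sparse microbiome tables few OTUs are non-zero in every sample, which is why DESeq2 also offers a `poscounts` type in `estimateSizeFactors`.

```python
import numpy as np

def median_of_ratios_size_factors(counts):
    """Hypothetical helper: median-of-ratios size factors, one per sample (row)."""
    counts = np.asarray(counts, dtype=float)
    # Only OTUs non-zero in every sample have a finite log geometric mean.
    complete = (counts > 0).all(axis=0)
    if not complete.any():
        raise ValueError("no OTU is non-zero in every sample")
    log_counts = np.log(counts[:, complete])
    log_geo_means = log_counts.mean(axis=0)        # per-OTU log geometric mean
    return np.exp(np.median(log_counts - log_geo_means, axis=1))

table = np.array([[10, 20, 30, 0],
                  [20, 40, 60, 5]])   # second sample sequenced twice as deeply
size_factors = median_of_ratios_size_factors(table)
normalized = table / size_factors[:, None]
```

The last OTU is excluded from the estimate because it is absent from the first sample, but it is still normalized with the resulting size factors.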
To further explore the impact of smoking and asthma on the higher-order structure of the airway microbiome, co-abundance networks were constructed separately in the asthmatic and current-smoker subsets of the cohort and compared with the full dataset (representing the whole population), limiting the analysis to the 4,207 OTUs present in all three datasets.
The network structure of the communities was profoundly altered in current smokers. Whilst the largest guild (Prevotella.1: commensal carpet) was relatively preserved, other modules showed markedly lower levels of conservation and were strongly positively or negatively associated with smoking status, in terms of either module eigenvectors or hubs (Figure 2, Table 2, Supplementary Table 2.4.1). In smokers, 276 OTUs became disconnected from any module. These were most often Streptococcus (70 OTUs), unknown genera (41 OTUs) and Veillonella (35 OTUs).
Cigarette smoking has previously been shown to affect the airway microbiota22, but the extent, magnitude and specificity of the disruption seen here suggest that this disruption may itself have an independent capacity to damage human health. The loss of diversity may predispose smokers to the recurrent infections that lead to COPD10,22. Smoking is accompanied by substantial changes in the bowel flora23 that may mediate the influence of smoking on inflammatory bowel disease. Bacteria have known roles in the genesis of cancer in general24 and in lung cancer specifically11. Streptococcus spp. produce an array of potent toxins that act against human cells or tissues25, and the expansion of Streptococcus clades in smokers might be carcinogenic. Most patients with lung cancer have been heavy smokers, and smoking often continues after diagnosis. Our results may also suggest that the local lung microbiota should be considered a factor in lung cancer responses to immunotherapy26.
Asthma
Microbial diversity loss in asthmatics compared with non-smoking subjects differed qualitatively from the effects of smoking. Two OTUs, from the genera Neisseria and Rothia, were increased in abundance (Padjusted < 0.05) (Figure 3, Supplementary Table 2.5.3). Of these, the Neisseria OTU was abundant (4.7% of reads in the population) and showed a 2-fold increase, consistent with increases in Proteobacteria previously observed in the thoracic airways of asthmatics13,27.
Eighty-four OTUs were relatively less abundant in asthmatic subjects (Figure 3, Supplementary Table 2.5.4). In marked contrast to smokers, the affected organisms were often in poorly characterized or potentially fastidious genera, including Leptotrichia, Selenomonas, Megasphaera and Capnocytophaga. Some representatives of the more common genera Actinomyces, Prevotella and Veillonella were also less abundant. This spectrum does not match the organisms shown to be affected by inhaled corticosteroids17.
These taxa were not concentrated in individual networks. The modules did not correlate with the presence of asthma either in terms of their vectors or their hubs, indicating that the general structure of oropharyngeal microbial communities in asthmatics was preserved (Figure 2).
Divergent (but potentially complementary) theories have been offered on the mechanisms by which microbial diversity might prevent asthma. The “immune deviation” hypothesis suggests that the adaptive immune system needs exposure to infections in order to avoid inappropriate reactions28. An extension of this model is that absence of commensal organisms leads to loss of local or systemic tonic signals that normally down-regulate immune responses at mucosal surfaces29.
Our findings of reduced numbers of distinctive low-abundance organisms are consistent with immune modulation by such bacteria. Nevertheless, the lower thoracic airway microbiota in asthmatics consistently show significant excesses of potentially pathogenic Proteobacteria13,27 (or Streptococcus spp. in severe disease13,30,31), and we detected a significant increase in the abundance of a common proteobacterial OTU in the Busselton asthmatic subjects. It is therefore also possible that a diverse microbial community protects against asthma by inhibiting colonization by potential pathogens, through effects on their growth, adherence or biofilm formation32.
In each case, potential benefits appear likely from manipulating the specific airway microbiota found to be reduced in asthmatics. This provides a strong impetus to isolate and study the organisms that could provide protection.
Methods
Five hundred and seventy-eight Caucasian adults were recruited through the Busselton Health Study in Western Australia. Individuals with a diagnosis of cancer were excluded. Subjects completed a detailed questionnaire as previously described12. Subjects were classified as asthmatic if they answered yes to the question “Has your doctor ever told you that you have asthma”. Other diagnoses potentially influencing the microbiome were diabetes (n = 18) and gastroesophageal reflux (GERD, n = 36). No associations were found for diabetes or GERD in any analyses, so subjects with these diagnoses were included in the unaffected group.
Samples for microbial analysis were taken under direct vision, using sterile rayon swabs that were rubbed gently with an even pressure around the oropharynx five times, strictly avoiding contact with tonsils, palate or nose. Swabs were immediately frozen and stored at −80°C prior to transportation on dry ice to Imperial College London, UK.
DNA was extracted from swab heads using the MP Bio FastDNA Spin Kit for Soil (http://www.mpbio.com). Blank controls with no sample added were taken from each DNA extraction kit to test for contamination33.
PCR of the 16S rRNA V4 region was performed in quadruplicate using a custom indexed forward primer S-D-Bact-0564-a-S-15 (5’ AYT GGG YDT AAA GNG 3’), reverse primer S-D-Bact-0785-b-A-18 (5’ TAC NVG GGT ATC TAA TCC 3’) and a high-fidelity Taq polymerase master mix (Q5, New England Biolabs, Massachusetts, USA). Primer sequences were based on Klindworth et al.34, with dual barcoding as per Kozich et al.35 and adaptors from Illumina (California, USA). A mock community (Table S1) was included to assess sequencing quality. PCR cycling conditions were: 95°C for 2 minutes, followed by 35 cycles of 95°C for 20 seconds, 50°C for 20 seconds and 72°C for 5 minutes. Amplicons were purified, quantified and pooled in equimolar amounts, and the library was paired-end sequenced (Illumina MiSeq V2 reagent kit)35 as previously described36. Bacterial load was quantified by qPCR using the KAPA BioSystems SYBR Fast qPCR Kit with the same 16S rRNA V4 primers used for sequencing.
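Both 16S primers contain IUPAC degeneracy codes (Y = C or T, D = A, G or T, N = any base, V = A, C or G), so each is in practice a pool of specific oligonucleotides. The small Python sketch below, with an illustrative helper name, expands a degenerate primer into its variants; applied to the two primers above, it gives 48 forward and 12 reverse sequences.

```python
from itertools import product

# IUPAC nucleotide codes mapped to the bases they stand for.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}

def expand_primer(primer):
    """Hypothetical helper: every concrete sequence encoded by a degenerate primer (spaces ignored)."""
    seq = primer.replace(" ", "").upper()
    return ["".join(bases) for bases in product(*(IUPAC[b] for b in seq))]

forward = expand_primer("AYT GGG YDT AAA GNG")      # S-D-Bact-0564-a-S-15
reverse = expand_primer("TAC NVG GGT ATC TAA TCC")  # S-D-Bact-0785-b-A-18
```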
Data were analysed in the R environment, and details can be followed on GitHub: https://tinyurl.com/y2onjblt. Sequence processing was performed in QIIME (version 1.9.0)37. Community-level differences in alpha and beta diversity and Operational Taxonomic Unit (OTU)-level differences were analysed using phyloseq in R (version 3.2.0). A phylogenetic tree was generated from the representative sequences using the default parameters of the make_phylogeny command37. Taxonomy of OTUs was assigned by matching representative sequences against the 23 August 2013 release of the SILVA database38 using the default parameters of the assign_taxonomy command37. OTUs occurring in only one sample or with fewer than 20 reads in the whole dataset were removed. Weighted and unweighted UniFrac beta diversity measures and subsequent principal coordinates analysis of them were carried out using the beta_diversity_through_plots script37. For the purposes of alpha diversity calculations, the raw counts tables were rarefied to a depth of 6,543 reads. Significant differences in alpha diversity between datasets were assessed using Mann–Whitney U-tests.
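The abundance filter and the alpha-diversity comparison described above can be sketched in Python. The filter keeps OTUs seen in at least two samples and totalling at least 20 reads across the dataset. The Methods do not name the alpha-diversity index, so the Shannon index used here is an assumption; group values are compared with SciPy's Mann–Whitney U test. Function names and the toy values are illustrative.

```python
import numpy as np
from scipy.stats import mannwhitneyu

def filter_otus(counts, min_samples=2, min_reads=20):
    """Hypothetical helper: drop OTUs (columns) seen in < min_samples samples or with < min_reads reads overall."""
    counts = np.asarray(counts)
    prevalence = (counts > 0).sum(axis=0)
    total_reads = counts.sum(axis=0)
    keep = (prevalence >= min_samples) & (total_reads >= min_reads)
    return counts[:, keep], keep

def shannon(row):
    """Shannon index H = -sum(p * ln p) over the OTUs present in one sample (assumed index)."""
    row = np.asarray(row, dtype=float)
    p = row[row > 0] / row.sum()
    return float(-(p * np.log(p)).sum())

table = np.array([[15, 0, 10, 1],
                  [10, 0, 10, 0],
                  [0, 30, 0, 0]])
filtered, keep = filter_otus(table)   # keeps the first and third OTUs

# Hypothetical per-subject Shannon values for two groups of subjects.
group_a = [3.1, 3.2, 3.3, 3.4, 3.5]
group_b = [1.0, 1.1, 1.2, 1.3, 1.4]
result = mannwhitneyu(group_a, group_b, alternative="two-sided")
```

The second OTU has 30 reads but appears in only one sample, so it fails the prevalence criterion even though it passes the read threshold.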
Potential kit contaminant OTUs were identified as those with a negative Spearman’s correlation between OTU abundance and bacterial burden (logged qPCR copy number), with significance defined as a Bonferroni-corrected P < 0.05. OTUs subsequently of interest were cross-checked against a list of potential contaminants33.
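Reagent contaminants contribute a roughly constant amount of DNA, so they make up a larger share of low-biomass samples; this is the rationale for flagging OTUs whose abundance falls as bacterial load rises. The Python sketch below applies a per-OTU Spearman test with a Bonferroni correction. It assumes relative abundances (the Methods do not say whether raw or relative abundance was used) and a hypothetical function name. Because Spearman's correlation uses ranks, logging the qPCR copy number does not change the result.

```python
import numpy as np
from scipy.stats import spearmanr

def flag_contaminants(rel_abundance, qpcr_copies, alpha=0.05):
    """Hypothetical helper: flag OTUs negatively correlated with bacterial load.

    rel_abundance: samples x OTUs relative abundances (assumption).
    qpcr_copies: 16S copy number per sample from qPCR.
    """
    rel_abundance = np.asarray(rel_abundance, dtype=float)
    log_load = np.log10(np.asarray(qpcr_copies, dtype=float))
    n_otus = rel_abundance.shape[1]
    flags = np.zeros(n_otus, dtype=bool)
    for j in range(n_otus):
        rho, p = spearmanr(rel_abundance[:, j], log_load)
        # Bonferroni: multiply each P value by the number of OTUs tested.
        flags[j] = rho < 0 and min(p * n_otus, 1.0) < alpha
    return flags

load = np.logspace(3, 7, 30)                  # 30 toy samples spanning four orders of magnitude
contaminant = 1.0 / load                      # share shrinks as load rises
genuine = load / load.max()                   # share grows as load rises
contaminant[[0, 1]] = contaminant[[1, 0]]     # break perfect monotonicity slightly
genuine[[0, 1]] = genuine[[1, 0]]
flags = flag_contaminants(np.column_stack([contaminant, genuine]), load)
```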
We further differentiated Streptococcus spp. by sequencing the methionine aminopeptidase (map) gene19 in 483 samples (constrained to 5 sequencing runs with controls). Of these subjects, 234 had never smoked and 53 were current smokers. We used the barcoded primers map-up (5′ GCWGACTCWTGTTGGGCWTATGC 3′) and map-down (5′ TTARTAAGTTCYTTCTTCDCCTTG 3′). DNA from nine Streptococcus strains, with bacterial identity confirmed through Sanger sequencing, was used for positive controls (S. agalactiae (DSMZ-2134); S. constellatus subsp. constellatus (DSMZ-20575); S. infantis (DSMZ-12492); S. parasanguinis (DSMZ-6778); S. pneumoniae (DSMZ-20566); S. pseudopneumoniae (DSMZ-18670); S. pyogenes (DSMZ-20565); S. sanguinis (DSMZ-20567); and S. mitis (DSMZ-12643)). Analysis was performed in QIIME37, using a clustering level of 95% to define OTUs. We attributed the most common OTU sequences to streptococcal species by BLAST searches. Full details are online (http://hdl.handle.net/10044/1/63937).
Co-abundance networks of non-rarefied OTU abundances were analyzed using the WGCNA package39. Abundances were log transformed with 0.1 added to zeroes40, and an adjacency matrix was constructed from Spearman’s correlation coefficients with a soft-thresholding power (β) of 3 and converted to a topological overlap matrix. Hierarchical clustering based on topological overlap, with dynamic tree cutting, defined the co-abundance modules, with a minimum module size of 20 OTUs. The significance of Spearman’s correlations between module eigengenes and clinical variables was adjusted for multiple testing using the Benjamini and Hochberg method41. Module structure was compared between cohorts using the R package circlize (0.4.5).
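The network construction steps can be sketched in NumPy/SciPy: replace zero counts with 0.1 and log transform, compute OTU × OTU Spearman correlations, raise their absolute values to the soft-thresholding power β = 3, and convert the resulting adjacency into a topological overlap matrix (TOM), whose complement 1 − TOM is the dissimilarity passed to clustering. The unsigned network type and the natural log are assumptions, and the helper names are hypothetical. Dynamic tree cutting is not reproduced here. Because Spearman's correlation is rank-based, the log step leaves the correlations unchanged and is included only to mirror the description.

```python
import numpy as np
from scipy.stats import rankdata

def log_with_pseudocount(counts, pseudo=0.1):
    """Natural log of counts with zeros replaced by `pseudo` (log base is an assumption)."""
    counts = np.asarray(counts, dtype=float)
    return np.log(np.where(counts == 0, pseudo, counts))

def spearman_matrix(X):
    """OTU x OTU Spearman correlations: Pearson correlation of column-wise ranks (ties averaged)."""
    return np.corrcoef(rankdata(X, axis=0), rowvar=False)

def soft_threshold_adjacency(X, beta=3):
    """Unsigned adjacency |rho|^beta (the unsigned network type is an assumption)."""
    return np.abs(spearman_matrix(X)) ** beta

def topological_overlap(A):
    """Unsigned topological overlap matrix from an adjacency matrix with entries in [0, 1]."""
    A = np.array(A, dtype=float)
    np.fill_diagonal(A, 0.0)
    k = A.sum(axis=1)                      # connectivity of each OTU
    shared = A @ A                         # sum over u of a_iu * a_uj
    tom = (shared + A) / (np.minimum.outer(k, k) + 1.0 - A)
    np.fill_diagonal(tom, 1.0)
    return tom

# Toy data: two groups of three OTUs, each group tracking its own latent driver.
rng = np.random.default_rng(11)
driver_a = np.exp(rng.normal(size=100))
driver_b = np.exp(rng.normal(size=100))
counts = np.column_stack([rng.poisson(20 * d) for d in (driver_a,) * 3 + (driver_b,) * 3])
tom = topological_overlap(soft_threshold_adjacency(log_with_pseudocount(counts)))
dissimilarity = 1.0 - tom   # input to hierarchical clustering and tree cutting
```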
Author contributions
Overall study design: AWM, AJ, MH, JH, MJC, MFM, and WOCMC. Busselton Survey and sample collection: MH, JH, AJ, AWM. Microbial analysis strategy (laboratory and bioinformatics): MJC, MFM, PJ, EMT. Laboratory experiments: EMT, with assistance and advice from MJC, PJ, LC and MFM. Primary ecological analyses: EMT, with input from MJC and PJ. Network analysis: EMT and SWO. Secondary analyses: WOCC. EMT, MFM and WOCMC wrote the first draft of the paper. All authors have read and contributed to the final version of the paper.
Competing Interests
The authors have no competing interests to declare.
Materials & Correspondence
The raw data are available online from the European Nucleotide Archive at the European Bioinformatics Institute, under accession number PRJEB29091.
The R scripts for analysis are available at https://tinyurl.com/y2onjblt
Supplementary Material
Acknowledgements
The study was funded by a Wellcome Senior Investigator Award to WOCC and MFM (P46009), and the Asmarley Trust. The Busselton Healthy Ageing Study is supported by grants from the Government of Western Australia (Office of Science, Department of Health) and the City of Busselton, and from private donations to the Busselton Population Medical Research Institute. We thank the WA Country Health Service and the community of Busselton for their ongoing support and participation.