Abstract
Marine regions that experience low dissolved oxygen (DO) on seasonal to long-term time scales, a.k.a. dead zones, are increasing in number and severity around the globe, with deleterious ecological and economic effects. One of the largest of these occurs on the continental shelf of the northern Gulf of Mexico (nGOM) as a result of eutrophication-enhanced bacterioplankton respiration and strong stratification. The effects of this perturbation on microbial assemblages, and therefore on the underlying potential for biogeochemical cycling, have only begun to be explored. Here we reconstructed more than 70 high-quality genomes (33 of which are > 70% complete) from whole-community metagenomic data collected in the 2013 nGOM dead zone and combined them with metatranscriptomic expression data. From these, we predict roles for multiple organisms in various phases of nitrogen, carbon, and sulfur cycling and present evidence for electron donor-based niche partitioning. All but two genomes could be classified into 17 named bacterial and archaeal phyla, and many represented the most abundant organisms in the dead zone (e.g., Thaumarchaeota, Synechococcus), some of which belong to the uncultivated “microbial dark matter” (e.g., Marine Group II Euryarchaeota, SAR406). Surprisingly, we also recovered near-complete genomes belonging to Candidate Phyla that are usually associated with anoxic environments: Parcubacteria (OD1), Peregrinibacteria, Latescibacteria (WS3), and ACD39. This work provides an important biogeochemical map for multiple phyla that will help resolve the impacts of hypoxia on nutrient flow in the dead zone.
Introduction
Dissolved oxygen (DO) below 2 mg L−1 (~62.5 μM; “hypoxia”) becomes dangerous or lethal to a wide variety of marine life, including organisms of economic importance1. Hypoxia results from aerobic microbial metabolism combined with strong stratification that prevents reoxygenation of bottom waters. The respiring microorganisms are fueled primarily by autochthonous organic matter produced by phytoplankton responding to nitrogen input1. Dead zones are increasing in number and frequency as the use of nitrogen-based fertilizers grows, with a concomitant increase in their transport to coastal oceans via runoff2. In the nGOM, nitrogen delivery from the Mississippi and Atchafalaya Rivers leads to one of the world’s largest seasonal dead zones, with bottom-water hypoxia that can extend over 20,000 km² 1. Action plans to mitigate the nGOM dead zone have stressed that removing continued uncertainties and increasing our “understanding of nutrient cycling and transformations” remain vital for plan implementation3. These needs motivated our current study of the engines of dead zone nutrient transformation: microorganisms.
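The ~62.5 μM figure is the molar equivalent of the 2 mg L−1 hypoxia threshold. It follows from the molar mass of O2 (≈32.00 g mol−1), and the short derivation below only checks that arithmetic; it is not an additional result.

```latex
\[
\frac{2\ \mathrm{mg\,L^{-1}}}{32.00\ \mathrm{g\,mol^{-1}}}
  = \frac{2\times10^{-3}\ \mathrm{g\,L^{-1}}}{32.00\ \mathrm{g\,mol^{-1}}}
  = 6.25\times10^{-5}\ \mathrm{mol\,L^{-1}}
  = 62.5\ \mu\mathrm{M}
\]
```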
Much of our current knowledge regarding microbial contributions to regions of low DO comes from numerous reports describing naturally occurring, deep-water oxygen minimum zones (OMZs), such as those in the Eastern Tropical North and South Pacific, the Saanich Inlet, and the Arabian, Baltic, and Black Seas4–11. In many of these systems, continual nutrient supply generates permanent or semi-permanent decreases in oxygen, sometimes to the point of complete anoxia4. Under these conditions, anaerobic metabolisms such as nitrate and sulfate reduction and anaerobic ammonia oxidation become prevalent5,9,11–13. In contrast, the nGOM dead zone is distinguished by a seasonal pattern of formation, persistence, and dissolution1; benthic contributions to bottom-water oxygen consumption14,15; and a shallow shelf that places much of the water column within the euphotic zone16. While parts of the nGOM dead zone can become anoxic1,17, many areas maintain low oxygen concentrations even during peak hypoxia, while the upper water column remains oxygenated18–20.
The first studies of bacterioplankton assemblages in the dead zone showed that Thaumarchaea dominated low-DO waters21 and could be highly active22. Of the thousands of operational taxonomic units (OTUs), the abundances of only 15 correlated with DO (either positively or negatively), and many of the most abundant taxa were expected aerobes based on taxonomic affiliation21. Further, amoA and Thaumarchaeal 16S rRNA gene copies per liter of seawater were inversely correlated with DO. Consistent with this, aerobic ammonia oxidation, not nitrate reduction, was the dominant source of nitrite in the 2012 dead zone22. Nitrite accumulation in these hypoxic, but not anoxic, waters was unusually high, possibly due to a decoupling of ammonia and nitrite oxidation22. Thus, the nGOM dead zone has unique microbiological features that echo its physico-chemical distinctions.
However, although the nGOM dead zone is one of the largest zones of eutrophication-based hypoxia in the world and is relevant to comparisons with global oxygen minimum zones, we still know comparatively little about its microbial ecology and physiology. To improve our understanding of the metabolic roles played by dead zone taxa, we reconstructed nearly 80 bacterioplankton genomes from metagenomic sequences collected at six sampling sites spanning DO concentrations of 2.2–128.8 μM21, and we mapped metatranscriptomic sequences to these genomes to measure gene and genome activity. We confirm that many of the dominant and most active taxa have predicted aerobic metabolisms, implicate many organisms in the nitrogen, sulfur, and carbon cycles, recover members of Candidate Phyla (CP) that had not previously been reported in the system, and provide evidence for electron donor-based niche partitioning.
Results
Metagenomic assembly yielded high quality genomes from a wide range of taxa
We recovered a total of 77 bacterioplankton genomes from two separate assemblies (Table 1). Of these, 33 had predicted completeness > 70%, and only six had predicted contamination > 6% (Table S1). The genomes belonged to 17 bacterial and archaeal phyla and, in some cases, to multiple subclades within a phylum (Table 1). Multiple genomes represented dominant taxa previously observed with 16S rRNA gene surveys21: two Thaumarchaeota genomes that grouped closely with Nitrosopumilus taxa, six Marine Group II Euryarchaeota (MGII), three Synechococcus, five Nitrospina and SAR406 (Marinimicrobia) genomes, and several species from the Alpha- and Gammaproteobacteria (Figs. S1, S2). Actinobacteria, Proteobacteria, and Planctomycetes were represented by the largest numbers of genomes, although most genomes in the latter two phyla had low predicted completeness (Table S1). None of the Planctomycetes genomes grouped with the known anammox bacteria in the Brocadiaceae, which play important roles in some OMZs9,11; instead, they grouped with either the Planctomycetaceae or the Phycisphaerae (Fig. S1). In general, all of the most abundant phyla detected with 16S rRNA gene analysis in Gillies et al. (2015) were represented by at least three genomes, and there were many genomes from less abundant phyla, including Gemmatimonadetes and Ignavibacteriae (Table 1). Notably, we recovered genomes from several uncultivated microbial “dark matter” clades. In addition to the uncultivated SAR406 and MGII genomes, we reconstructed three SAR202 genomes, six genomes from three different Candidate Phyla (CP), namely Parcubacteria (formerly OD1; specifically Candidatus Uhrbacteria), Peregrinibacteria, and Latescibacteria (WS3), and one genome putatively assigned to ACD39 (Figs. S1, S3).
Roles in nitrogen and sulfur cycling
We identified potential bacterioplankton contributions to every phase of inorganic nitrogen cycling except nitrogen fixation (Fig. 1). Nitrosopumilus and Nitrospina spp. exclusively occupied roles in ammonia and nitrite oxidation, respectively, consistent with previous observations in the dead zone21,22. The Bin 71 Nitrosopumilus genome has predicted amoABC genes in an arrangement syntenic with that of Nitrosopumilus maritimus SCM1 (Walker, 2010), and we observed transcripts from these genes in all samples, supporting active nitrification. Nitrosopumilus spp. dominated the lowest-DO sites (Fig. 2) and showed activity proportionate to their abundance, based on rank-residuals (see Methods) for RNA vs. DNA recruitment rank (Fig. 3, Rows 21 & 28). In contrast, Nitrospina spp. displayed much lower abundances (Fig. 2), but one genome (Bin 25) demonstrated high activity relative to other taxa across all samples (Fig. 3, Row 75). Bin 25 was also the only genome with a nitrite oxidoreductase gene (nxrA)23, and we observed its expression in all samples with reliable data (Table S1). The presence of active but low-abundance Nitrospina corroborates Bristow et al. (2015), who found low overall expression of nxrA, with all transcripts predicted to belong to Nitrospina spp. Nitrite levels and temperatures in our samples also support their suggestion that Nitrospina spp. may have been nitrite- and/or temperature-limited22,24, leading to low rates of nitrite oxidation relative to ammonia oxidation and to the accumulation of nitrite to micromolar concentrations, an unusual phenomenon in a system that is not fully anoxic.
The 2013 dead zone data show a similar accumulation of nitrite to micromolar concentrations (Table S1). This abundant nitrite may have been reduced along two different paths. Two genomes each from the Bacteroidetes, Latescibacteria, and SAR406 had predicted nrfAH genes for nitrite reduction to ammonia (Table S1). We observed activity for these taxa (Fig. 3) and for the nrfAH genes (Fig. 4), indicating a possible nitrite/ammonia metabolic loop in three of the four hypoxic sites. Alternatively, multiple genomes contained predicted nirK and nirS genes for the reduction of nitrite to nitric oxide. Aggregate expression of nirK genes, found primarily in the Nitrospina and Nitrosopumilus genomes but also in one Flavobacteriaceae and one MGII genome, exceeded that of all other reductive nitrogen cycle genes (Fig. 4). The presence of nirK in the absence of other denitrification genes may indicate a nitrite detoxification mechanism25,26, or simply an alternative terminal electron-accepting process under low DO27.
We observed roles for many organisms in other parts of the nitrogen cycle as well. Although multiple genomes contained genes for nitrate reduction, and we observed expression of narGH in most samples (Fig. 4), complete denitrification capability was predicted for only one Flavobacteriaceae genome (Bin 10, Polaribacter sp.) (Fig. 1). Moreover, we observed expression of all required genes (nar, nap, nir, nor, nos) only in the lowest-DO sample (E2A; Fig. 4). However, our findings suggest that denitrification can occur via multiple taxa acting in concert, and many organisms appeared poised to reduce nitrous oxide via nosZ genes (Fig. 1). The pervasiveness of nosZ genes in the absence of other parts of the denitrification pathway suggests that these taxa may use nitrous oxide as a terminal electron acceptor. Given the paucity of genomes with nitric oxide reductases, Nitrosopumilus28,29 seems a plausible dominant source of nitrous oxide. We detected nosZ transcripts in nearly all samples (Fig. 4). A number of genomes also had predicted capacities to convert organic nitrogen compounds to inorganic forms of nitrogen (Fig. 1). Importantly, Synechococcus and one Rhodobacteraceae genome, closely related to Donghicola spp., encoded cyanate lyases, corroborating previous reports of similar organisms using cyanate as a nitrogen source30,31.
Fewer organisms had putative pathways for sulfur cycling (Fig. 1). Chromatiales, Rhodobacteraceae, and Rhodospirillales genomes contained dsrAB genes (Table S1) that phylogenetic analysis32 predicted to operate oxidatively (Fig. S4). Genomes in these same clades also contained aprAB and sat genes for sulfate/sulfite interconversion, although, given the oxidative dsrAB predictions, aprAB likely also function in the reverse (oxidative) direction. We observed putative thiosulfate oxidation genes in the same groups and in the Gemmatimonadaceae. Thus, sulfur oxidation was phylogenetically restricted. Chromatiales taxa were abundant in hypoxic samples (Fig. 2), similar to reports from the Eastern Tropical North Pacific (ETNP) OMZ33. However, these taxa contributed fewer transcripts than would be expected based on DNA abundance (Fig. 3), suggesting a limited influence. Two SAR406 genomes had predicted genes for thiosulfate reduction to sulfide (and/or polysulfide reduction34); we detected transcripts for these genes only in samples E2A and E4 (Table S1).
Illuminating metabolism within microbial dark matter
Several recovered genomes belonged to clades with no cultured representatives: Marinimicrobia, MGII, SAR202, and the CP Parcubacteria (OD1), Peregrinibacteria, and Latescibacteria (WS3). Marinimicrobia (a.k.a. SAR406 and Marine Group A) are widely distributed35,36 and frequently observed in oxygen minimum zones5,37, but nevertheless remain poorly understood from a functional perspective. Four Marinimicrobia genomes (estimated > 73% complete) informed our metabolic reconstruction of this group (Figs. 5, S5). The SAR406 genomes encode both high- and low-affinity cytochrome c oxidases38 (Table S1), extending previous observations of traits adaptive for a range of DO concentrations37. This is consistent with their predominance in lower-DO samples from the dead zone (Fig. 2). The genomes also encode potential for thiosulfate/polysulfide reduction to sulfide, as previously observed37; however, we additionally predict capabilities for nitrate reduction, nitrous oxide reduction, and nitrite reduction to ammonia (Fig. 1, Table S1). The genomes contain putative transporters for a variety of dissolved organic matter (DOM) components, including nucleosides, amino and fatty acids, and oligopeptides. They are also enriched in both glycoside hydrolases (GH) and carbohydrate-binding modules (CBM) (Table S1), the majority of which have secretion signals, indicating that they target external carbohydrate nutrients. This supports previous predictions of organic matter degradation by these taxa39. Thus, Marinimicrobia likely perform chemoorganoheterotrophic metabolism with a flexibility in terminal electron-accepting processes that implicates them in nitrogen and sulfur cycling.
Other well-known but as yet uncultivated taxa include SAR202 and MGII. Members of the SAR202 clade of Chloroflexi also inhabit a wide variety of marine environments40, frequently in deeper waters40–42, but remain functionally obscure. We recovered three SAR202 genomes (estimated > 83% complete) that, although present at low abundance (Fig. 2), nevertheless showed relatively high activity in some samples (Fig. 3). These organisms likely respire oxygen; however, nitrate and nitrous oxide reductases were recovered in two genomes, indicating a potential to use alternative terminal electron acceptors (Table S1). While transporter predictions support previous observations of DOM uptake41, the SAR202 genomes also contained predicted copies of 2-oxoglutarate synthase and citryl-CoA lyase, suggesting a possible carbon fixation route via the reductive TCA cycle. Our MGII reconstructions largely corroborate previous descriptions43,44 of these organisms as aerobic chemoorganoheterotrophs with transporters for a variety of DOM compounds (Table S1).
Recovery of Parcubacteria from a coastal marine system is unusual but not unprecedented: Parcubacteria single-cell amplified genomes (SAGs) have been identified from marine and brackish sources39. However, Peregrinibacteria have thus far been found only in terrestrial subsurface aquifers45–48. Consistent with previous reports of obligately fermentative metabolism in Parcubacteria and Peregrinibacteria39,45,47,49, we identified no respiratory pathways for these taxa, and they trended toward greater abundances in the lowest-DO samples (Fig. 2). Furthermore, we found no genes linking them to substantial roles in nitrogen or sulfur cycling.
Latescibacteria have been proposed to have a particle-associated, strictly anaerobic lifestyle, with many genes available for the breakdown of complex algal cell wall components and utilization of the resultant monomers39,50. While our carbon substrate predictions and flagellar gene identification largely substantiate these findings, the dead zone Latescibacteria encode cytochrome c oxidases and other elements of aerobic electron transport chains (Table S1). The discrepancy in predicted physiological roles between these and previously identified Latescibacteria may be due to the incompleteness of the single-cell genomes39 or perhaps horizontal gene transfer causing differentiation within the phylum. Indeed, the best BLASTP hits to the cytochrome c oxidase subunits in the Bin 50 genome matched an Acidobacteria strain (Table S1). While the possibility also exists that the entire contig was inappropriately binned, the remaining genes on the contig do not support its assignment to another specific taxon (Table S1). The even distribution of the dead zone Latescibacteria across varied DO concentrations also supports their predicted aerobic capability (Fig. 2), but this hypothesis needs further investigation.
One of the genomes with the highest transcriptional activity across samples was Bin 13 (Fig. 3), which grouped with CP ACD39 taxa (Fig. S1). However, this phylogenetic position was poorly supported, so we consider the designation putative. The genome encodes a predicted obligately aerobic chemoheterotrophic metabolism, with no predicted roles in the nitrogen or sulfur cycles (Fig. S7). Its high activity suggests the importance of a DO-consuming, organic carbon-based lifestyle even in low-DO waters.
Evidence for electron donor-based niche partitioning
Although some of the recovered CP taxa likely subsist via an obligately anaerobic lifestyle, the majority of groups in the dead zone had one or more genomes that encoded cytochrome c oxidases (and other electron transport chain components) for respiring oxygen (Table S1). Pervasive aerobic metabolism in an area of depleted oxygen seems counterintuitive. Yet even though DO was as low as 0.2 μM in the E2A sample, oxygen likely remained sufficiently available to sustain aerobic microbes. As little as 0.2–0.3 μM oxygen inhibited denitrification in OMZ populations by 50%51, and Bristow et al. (2015) experimentally verified that at DO levels as low as 20 μM, aerobic ammonia oxidation by marine Thaumarchaea, rather than nitrate reduction, led to nitrite accumulation in 2012 nGOM dead zone bottom waters. Most of the taxa responsible for the largest proportional numbers of transcripts (Fig. 3, stars) were from groups of predicted aerobes. All groups that could be assigned dissimilatory nitrate reduction genes (e.g., Bin 10 (Flavobacteriaceae), Bin 51-1 (Marinimicrobia)) also had cytochrome c oxidases (Table S1), making these taxa likely facultative, rather than obligate, anaerobes. Based on overall pathway reconstruction and gene expression, the likely external terminal electron acceptors appeared limited to oxygen, nitrate, nitrite, and nitrous oxide. Indeed, we observed expression of cytochrome c oxidase genes at levels similar to or greater than those of nitrate reductases (Fig. 4), indicating active aerobic respiration in dead zone bacterioplankton, even in dysoxic (1–20 μM DO5) water. Although nirK transcripts were observed at higher levels than any of these other gene sets, concurrent expression of all of these genes indicates possible co-reduction of multiple compounds, which can occur when electron donors are not limiting52.
In contrast to the limited number of terminal electron-accepting processes available, electron-donating processes occurred in much greater variety, spanning different forms of both lithotrophy and organotrophy. The dominant Thaumarchaea were predicted ammonia-oxidizing chemolithotrophs, and one of the most active genomes, Bin 25, represented nitrite-oxidizing Nitrospina spp. We identified the potential for sulfur chemolithotrophy in both Gammaproteobacteria (Chromatiales) and Alphaproteobacteria (Rhodobacteraceae and Rhodospirillales) genomes. Furthermore, the genomes collectively encoded a multitude of chemoorganoheterotrophic options. We identified transporters for the import of many different DOM components, from sugars to amino and carboxylic acids, differentially distributed across the taxonomic groups (e.g., oligopeptide transporter family vs. p-aminobenzoyl-glutamate transporter family; Table S1). Within carbohydrate metabolism, CAZy profiles indicated further differential preferences. For example, the chemolithotrophic Nitrosopumilus genomes, as expected, contained no predicted GH or polysaccharide lyases (PL) (see the CAZy database), whereas Marinimicrobia genomes were enriched in GH, and Latescibacteria genomes showed evidence of enrichment in both (similar to previous observations50). SAR202 genomes contained similar numbers of GH and carbohydrate esterases. The far greater diversity of metabolic pathways for electron-donating than for electron-accepting processes supports the hypothesis that niche differentiation within dead zone bacterioplankton occurs predominantly via specialization for different oxidizable substrates, rather than for different positions in the redox cascade that is important in OMZs4,5. Indeed, the diversity of compounds within DOM supplies myriad opportunities for microbial differentiation in the chemoorganoheterotrophic space53–56.
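The CAZy comparisons above amount to counting annotated families by class for each genome. The sketch below shows that tally. It assumes each genome's CAZyme annotations have already been reduced to a list of family labels such as "GH13" or "CBM48". The helper name and the example genomes are hypothetical, and the counts are illustrative rather than values from this study. The sketch does not predict CAZymes; it only tallies existing labels.

```python
import re
from collections import Counter

# CAZy class prefixes: glycoside hydrolases (GH), glycosyltransferases (GT),
# polysaccharide lyases (PL), carbohydrate esterases (CE),
# auxiliary activities (AA), and carbohydrate-binding modules (CBM).
CAZY_FAMILY = re.compile(r"^(GH|GT|PL|CE|AA|CBM)\d+")


def cazy_class_counts(family_labels):
    """Count CAZy family labels per class for one genome (hypothetical helper).

    Labels that do not look like CAZy families are ignored; subfamily
    labels such as "GH5_1" are counted under their class ("GH").
    """
    counts = Counter()
    for label in family_labels:
        match = CAZY_FAMILY.match(label)
        if match:
            counts[match.group(1)] += 1
    return counts


# Hypothetical example genomes, not data from this study.
genomes = {
    "chemolithotroph_example": [],
    "heterotroph_example": ["GH13", "GH16", "GH5_1", "CBM48", "PL7"],
}
for name, labels in genomes.items():
    counts = cazy_class_counts(labels)
    print(name, "GH:", counts["GH"], "PL:", counts["PL"], "CBM:", counts["CBM"])
```

A real enrichment comparison would also normalize by genome size or gene count, since incomplete bins will under-count families.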
Discussion
Our results provide the first system-level genomic reconstructions, and some of the most comprehensive metabolic reconstructions and assessments of active functions to date, for bacterioplankton in the nGOM dead zone. These genomes include the most abundant and/or active taxa observed throughout the region in 2012 22 and in 2013 21, as well as CP taxa that had gone previously undetected. The pervasiveness of aerobic respiration pathways and the limited number of alternative terminal electron-accepting processes support the view that bacterioplankton likely occupy different niches based on specialized oxidation of different electron donors, including various DOM components.
This has distinct implications for dead zone creation, maintenance, and dissolution, because the amount of oxygen consumed by a given taxon will depend greatly on the kinetics of its terminal oxidase(s), the reducing equivalents available from its electron donor of choice, and, in the case of organic compounds, the relative use of that substrate for growth vs. energy (bacterial growth efficiency57). How changes in electron donor availability affect the relative abundance and DO consumption rates of these organisms will be an important area of future research. Unfortunately, the relative oxygen consumption rates of different taxa across different substrates have been cataloged for only a small number of taxa58, and many taxa important to the dead zone remain uncultured. Customized cultivation strategies for nGOM bacterioplankton are showing some success in this area59 and can provide valuable additional data on respiration rates for individual organisms. We also know little about seasonal variation in bacterioplankton assemblages in the nGOM, or how it relates to electron donor availability. Previous work indicates, for example, that Thaumarchaea are less abundant in spring60,61, before DO decreases, than they were during the 2013 dead zone. This suggests that these taxa become competitive at a later seasonal stage, but whether that is caused by low DO, changes in competitiveness based on electron donor availability, or some other factor is unknown. Indeed, the lack of microbial time-series data in the nGOM leaves many additional open questions for these and other taxa. Which taxa respire the most oxygen? Are they abundant year-round, or, if not, when do they become important and what selects for them? How do changing DOM sources and components affect respiration rates at the individual-taxon and whole-community scales? What is the role of particle-associated taxa in DO consumption?
Answers will provide greater understanding of the dynamics resulting in low DO and deoxygenation trends in the world’s oceans, as well as provide context for comparative studies between dead zones and oxygen minimum zones around the globe.
Methods
Sample selection and nucleic acid processing
Six samples representing hypoxic (n=4) and oxic (n=2) DO concentrations were selected from among those previously reported21 at stations D1, D2, D3, E2, E2A, and E4 (Table S1). At each of these six stations, 10 L of seawater was collected and filtered with a peristaltic pump through a 2.7 μm Whatman GF/D pre-filter, and cells were concentrated on 0.22 μm Sterivex filters (EMD Millipore). Sterivex filters were sparged and filled with RNAlater. DNA and RNA were extracted directly from the filter by placing half of the Sterivex filter in a Lysing Matrix E (LME) tube containing glass/zirconia/silica beads (MP Biomedicals, Santa Ana, CA). Extraction followed the protocol described in Gillies et al. (2015), which combines phenol:chloroform:isoamyl alcohol (25:24:1) extraction and bead beating. Genomic DNA and RNA were stored at −80°C until purified. DNA and RNA were purified using the QIAGEN (Valencia, CA) AllPrep DNA/RNA Kit. DNA quantity was determined using a Qubit 2.0 Fluorometer (Life Technologies, Grand Island, NY). RNA with an RNA integrity number (RIN) ≥ 8 was selected for metatranscriptomic sequencing; RIN was determined from the 16S/23S rRNA ratio with the Agilent TapeStation, on a scale of 1–10, with 1 being degraded and 10 being undegraded RNA. rRNA was depleted from total RNA using a Ribo-Zero kit (Illumina), and the remaining mRNA was reverse transcribed to cDNA as described in Mason et al. (2012)62. RNA (as cDNA) and DNA libraries (six samples/lane) were linker barcoded and pooled for high-throughput Illumina HiSeq 2000 paired-end sequencing at Argonne National Laboratory.
Sequencing, assembly, and binning
DNA and RNA were sequenced separately, six samples per lane, with Illumina HiSeq 2000 chemistry to generate 100 bp paired-end reads at the Argonne National Laboratory Next Generation Sequencing facility. The data are available at the NCBI SRA repository under the BioSample accession numbers SAMN05791315, SAMN05791316, SAMN05791317, SAMN05791318, SAMN05791319, SAMN05791320, SAMN05791321, SAMN05791322, SAMN05791323, SAMN05791324, SAMN05791325, SAMN05791326 (the first six refer to DNA samples, the latter six to RNA). DNA sequencing resulted in a total of 416,924,120 reads, which were quality trimmed to 413,094,662 reads. Prior to assembly, adapters were removed using Scythe (https://github.com/vsbuffalo/scythe), and low-quality reads (Q < 30) were trimmed with Sickle (https://github.com/najoshi/sickle). Reads with three or more Ns or with an average quality score of less than Q20 and a length < 50 bp were removed. Genomes were reconstructed using two rounds of assembly. Metagenomic reads from all six samples were pooled, assembled, and binned using previously described methods63,64. Briefly, quality-filtered reads were assembled with IDBA-UD65 on a 1 TB RAM, 40-core node of the LSU High Performance Computing cluster SuperMikeII, using the following settings: --mink 65 --maxk 105 --step 10 --pre_correction --seed_kmer 55. Initial binning of the assembled contigs was performed using tetranucleotide frequency signatures computed on 5 kbp contig fragments. Bins were manually delineated and curated based on clusters within the resulting emergent self-organizing maps (ESOMs) (Figs. S8, S9). The primary assembly used all reads and produced 28,080 contigs ≥ 3 kb totaling 217,715,956 bp. Of these, 303 contigs were over 50 kb, 72 were over 100 kb, and the largest contig was just under 495 kb. Binning produced 76 genomes, but the Thaumarchaeota bin was very large and predicted to have high contamination from other taxa.
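The read filter and the tetranucleotide signatures used for initial binning can be sketched in Python. This illustrates the logic only; the pipeline that was actually run used Scythe, Sickle, and ESOM tools. The sketch makes three assumptions: the three read-removal criteria are applied independently, Phred scores are already decoded to integers, and the tail of a contig shorter than 5 kbp is dropped. All function names are hypothetical.

```python
from itertools import product

# All 256 tetranucleotides in a fixed (lexicographic) order.
TETRAMERS = ["".join(p) for p in product("ACGT", repeat=4)]
TETRAMER_INDEX = {kmer: i for i, kmer in enumerate(TETRAMERS)}


def passes_filter(seq, quals, max_n=2, min_mean_q=20, min_len=50):
    """Return True if a read survives the filter.

    Assumption: a read is removed if it has 3 or more Ns, OR is shorter than
    50 bp, OR has a mean Phred score below 20. `quals` holds integer scores.
    """
    if seq.upper().count("N") > max_n:
        return False
    if len(seq) < min_len:
        return False
    return sum(quals) / len(quals) >= min_mean_q


def fragments(contig, size=5000):
    """Cut a contig into consecutive `size`-bp fragments, dropping the remainder."""
    return [contig[i:i + size] for i in range(0, len(contig) - size + 1, size)]


def tetranucleotide_frequencies(fragment):
    """Return 256 tetranucleotide frequencies; windows with non-ACGT bases are skipped."""
    counts = [0] * len(TETRAMERS)
    seq = fragment.upper()
    for i in range(len(seq) - 3):
        index = TETRAMER_INDEX.get(seq[i:i + 4])
        if index is not None:
            counts[index] += 1
    total = sum(counts)
    return [c / total for c in counts] if total else [0.0] * len(counts)


# Each 5 kbp fragment becomes one 256-dimensional point for the ESOM.
contig = "ACGTTGCA" * 1500  # 12 kbp toy contig
vectors = [tetranucleotide_frequencies(f) for f in fragments(contig)]
print(len(vectors), len(vectors[0]))  # 2 256
```

Many tetranucleotide binning workflows also merge each tetramer with its reverse complement; that step is omitted here. The ESOM itself is trained with dedicated software and is not reproduced.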
Assuming an expected genome size of ~1.6 Mb and an average relative abundance within the community of 20% or greater, our initial sequencing run would have produced at least 5000x coverage for a hypothetical Thaumarchaeota genome. To reduce this burden on the assembly algorithm, a second round of assembly was performed with the same settings on a 5% random subset of the pooled DNA reads. To generate this subset, the split command was used to divide the original metagenomic file into sub-files 1/20th its size. One sub-file, chosen at random, was then assembled using the parameters stated above. The subassembly produced a mixed Thaumarchaeota bin (Fig. S9) that could be separated in two based on comparative read mapping from different samples (below). The two resulting genomes were 96% and 79% complete and grouped with other Nitrosopumilus genomes (Table 1, Figs. S1, S2).
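The coverage estimate and the 1/20 subsampling step can be checked with a short sketch. It assumes every quality-trimmed read is 100 bp, which is an upper bound because trimming shortens some reads. It also assumes reads are distributed in proportion to relative abundance. The chunking helper mimics `split` on record boundaries, where one record is assumed to be a complete read pair so that mates stay together. Helper names are hypothetical.

```python
import random


def expected_coverage(n_reads, read_len_bp, rel_abundance, genome_size_bp):
    """Mean coverage expected for one genome in a pooled metagenome."""
    return n_reads * read_len_bp * rel_abundance / genome_size_bp


def split_records(records, n_chunks=20):
    """Divide records into n_chunks contiguous chunks, like `split` on record boundaries."""
    if not records:
        return []
    size = -(-len(records) // n_chunks)  # ceiling division
    return [records[i:i + size] for i in range(0, len(records), size)]


# Values from the text: 413,094,662 quality-trimmed reads of at most 100 bp,
# a 20% relative abundance, and a ~1.6 Mb Thaumarchaeota genome.
coverage = expected_coverage(413_094_662, 100, 0.20, 1.6e6)
print(f"{coverage:.0f}x")  # 5164x, consistent with "at least 5000x"

# Stand-in records; one randomly chosen chunk holds ~5% of the reads.
chunks = split_records(list(range(1000)))
subset = random.choice(chunks)
```

Contiguous chunks are not a per-read random sample. If the pooled file was built by concatenating samples, one chunk may over-represent a single sample; per-record random sampling would avoid that.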
DNA and RNA mapping
Metagenomic and metatranscriptomic sequencing reads from each sample were separately mapped to binned contigs using BWA66 to compare bin abundance across samples and facilitate bin cleanup (below). Contigs within each bin were concatenated into a single fasta sequence and BWA was used to map the reads from each sample to all bins. All commands used for these steps are available in supplementary information.
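Concatenating each bin's contigs into one reference sequence, as described above, might look like the following minimal sketch. The function name, the sorted contig order, and the 60-character line width are assumptions. The text does not say whether spacers were inserted between contigs, so none are added here.

```python
def concatenate_bin(contigs, bin_name, width=60):
    """Return one FASTA record containing all contigs of a bin joined end to end.

    `contigs` maps contig ID -> nucleotide sequence. Contigs are joined in
    sorted-ID order without spacer characters.
    """
    sequence = "".join(contigs[contig_id] for contig_id in sorted(contigs))
    lines = [f">{bin_name}"]
    lines.extend(sequence[i:i + width] for i in range(0, len(sequence), width))
    return "\n".join(lines) + "\n"


record = concatenate_bin({"contig_2": "GGCC", "contig_1": "AATT"}, "bin_01")
print(record, end="")
```

Because each bin becomes a single reference sequence, `samtools idxstats` reports one mapped-read count per bin. A run of Ns between contigs is a common alternative when reads spanning an artificial junction are a concern.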
Bin QC
Bins were examined for contamination and completeness with CheckM67, and we attempted to clean bins with > 10% estimated contamination using a combination of methods. First, the CheckM modify command was used to remove contigs identified as outliers by GC content, coding density, and tetranucleotide frequency. In one case (Bin 51-1), the outlying contigs were sufficiently similar to one another to form a separate bin. Next, in bins that still showed > 10% contamination, contigs were separated according to their relative mean DNA read coverage across samples. In three cases (Bins 33, 40, 62), this was insufficient to reduce contamination. Final bins were evaluated with CheckM again to generate the statistics in Table S1 and the final bin placements in the CheckM concatenated gene tree (Fig. S2).
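The coverage-based cleanup step is described only qualitatively. The sketch below shows one plausible way to flag contigs whose coverage pattern across samples departs from the rest of the bin. Two choices are assumptions for illustration, not the criteria used in this study: Pearson correlation against the bin's per-sample median profile, and the 0.9 cutoff. Function names are hypothetical.

```python
import math
from statistics import median


def pearson(x, y):
    """Pearson correlation of two equal-length lists (0.0 if either is constant)."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / math.sqrt(sxx * syy)


def split_by_coverage_profile(coverage, min_r=0.9):
    """Separate contigs that track the bin's coverage profile from those that do not.

    `coverage` maps contig ID -> mean DNA coverage in each sample (same sample order).
    Each profile is scaled to sum to 1 so that high-coverage contigs do not
    dominate the per-sample median profile.
    """
    profiles = {}
    for contig_id, values in coverage.items():
        total = sum(values)
        profiles[contig_id] = [v / total for v in values] if total else list(values)
    n_samples = len(next(iter(profiles.values())))
    reference = [median(p[s] for p in profiles.values()) for s in range(n_samples)]
    kept, flagged = [], []
    for contig_id, profile in sorted(profiles.items()):
        target = kept if pearson(profile, reference) >= min_r else flagged
        target.append(contig_id)
    return kept, flagged


# Toy bin: three contigs share a profile across six samples; one does not.
cov = {
    "a": [10, 20, 30, 5, 5, 5],
    "b": [20, 40, 60, 10, 10, 10],
    "c": [30, 60, 90, 15, 15, 15],
    "x": [50, 5, 5, 5, 5, 50],
}
kept, flagged = split_by_coverage_profile(cov)
print(kept, flagged)
```

Flagged contigs would then be re-examined, and the bin re-evaluated with CheckM as described above.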
Ribosomal protein tree
The concatenated ribosomal protein tree was generated using 16 syntenic genes that have been shown to undergo limited lateral gene transfer (rpL2, 3, 4, 5, 6, 14, 15, 16, 18, 22, 24 and rpS3, 8, 10, 17, 19)68. Ribosomal proteins for each bin were identified with PhyloSift69. Amino acid alignments of the individual ribosomal proteins were generated using MUSCLE70 and trimmed using BMGE71 (with the following settings: -m BLOSUM30 -g 0.5). The curated alignments were then concatenated for phylogenetic analyses, and the phylogeny was inferred via RAxML (v8.2.8)72 with 100 bootstrap runs (with the following settings: mpirun -np 4 -npernode 1 raxmlHPC-HYBRID-AVX -f a -m PROTCATLG -T 16 -p 12345 -x 12345 -# 100). This number of bootstrap replicates is similar to the number determined by automated bootstrapping for this tree in a previous publication73, and the analysis required just over 56 hours of wall-clock time. The alignment is available in SI.
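Concatenating the trimmed per-protein alignments into a single supermatrix can be sketched as follows. Taxa that lack a given ribosomal protein are padded with gaps. This is an assumed, simplified version of the step. For example, it applies no minimum number of proteins per taxon, whereas the text notes that bins lacking enough ribosomal proteins were excluded from the tree. The function name and toy sequences are hypothetical.

```python
def concatenate_alignments(alignments, taxa):
    """Build a concatenated (supermatrix) protein alignment.

    `alignments` is a list of dicts, one per ribosomal protein, each mapping
    taxon name -> trimmed aligned sequence. Taxa missing from an alignment
    are padded with gap characters of that alignment's width.
    """
    rows = {taxon: [] for taxon in taxa}
    for aln in alignments:
        widths = {len(seq) for seq in aln.values()}
        if len(widths) != 1:
            raise ValueError("each alignment must be non-empty with equal-length rows")
        width = widths.pop()
        for taxon in taxa:
            rows[taxon].append(aln.get(taxon, "-" * width))
    return {taxon: "".join(parts) for taxon, parts in rows.items()}


# Two toy alignments (hypothetical sequences) for three taxa.
rpl2 = {"taxon_A": "MK-L", "taxon_B": "MKAL"}
rps3 = {"taxon_A": "QR", "taxon_C": "QK"}
matrix = concatenate_alignments([rpl2, rps3], ["taxon_A", "taxon_B", "taxon_C"])
for taxon, seq in matrix.items():
    print(taxon, seq)
```

The resulting matrix would then be written in a format RAxML reads, such as relaxed PHYLIP.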
Taxonomic assignment
Taxonomy for each bin was assigned primarily using the ribosomal protein tree. However, some bins did not have enough ribosomal proteins to be included in the tree, or had a poorly supported placement within it. For these bins, assignments were based on the concatenated marker gene tree from the CheckM analysis (Fig. S2) or, when available, on 16S rRNA gene sequences. 16S rRNA genes were identified via CheckM, and these sequences were blasted against the NCBI nr database to corroborate CheckM assignments. Several of the most abundant OTUs in Gillies et al. (2015) matched the elusive and important OM1 clade of marine Actinobacteria74, so we explicitly included in our ribosomal protein tree genome sequences from taxa that have previously been placed near this clade75,76. While several of our genomes grouped with Illumatobacter or the MedAcidi sequences, these are not expected to be true OM1 organisms, and 16S rRNA gene blast results from bins 29-1, 58, and 60 did not support classifying these taxa as OM1. However, their proximity to a close relative of the OM1 clade, Candidatus Actinomarina minuta, makes them useful contributions to future studies of the group. The SAR202 genomes had no representative reference genomes in either the ribosomal protein tree or the CheckM tree. For two of the three bins (43-1, 43-2), 16S rRNA gene sequences were available and were aligned with the sequences used to define the SAR202 clade40 (Fig. S10). Alignment, culling, and inference were completed with MUSCLE70, Gblocks77, and FastTree (version 2)78, respectively, using the FT_pipe script, which is provided in SI. For putative CP genomes, taxonomy was also evaluated by examining the taxonomic identification of each predicted protein sequence after a BLASTP search against the NCBI nr database.
After the BLASTP search, the number of assignments to the dominant one or two taxonomic names, along with the number of assignments to “uncultured bacterium,” was plotted for each genome according to bit score quartile. Quartiles were determined in R using the summary function. For the Parcubacteria and Peregrinibacteria genomes, the majority of high-quality BLAST hits were consistent with the ribosomal protein tree taxonomy (Figs. S1, S3). For the Bin 13 and WS3 genomes, BLAST hit identification did not return clear results; hits included a wide variety of taxa. This is consistent with what would be expected from novel genomes that have few or no members in the database; however, this analysis did not directly corroborate the ribosomal protein tree phylogeny. Bin 56 has two ribosomal protein operons, on scaffold_2719/Ga0113622_1153 and scaffold_21777/Ga0113622_1009. In the ribosomal protein tree, the former placed the organism in the Planctomycetaceae, while the latter (which was much smaller) placed it in CP WS3. The majority of BLASTP annotations to the nr database matched Planctomycetaceae taxa, as did the 16S rRNA gene sequences found in the genome, so this organism was designated a member of the Planctomycetes.
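The quartile tally described above can be sketched in Python as follows. This assumes one (taxon name, bit score) pair per predicted protein, for example from its top BLASTP hit, which the text does not specify. `statistics.quantiles(..., method="inclusive")` computes the same quartiles as R's default `quantile()` (type 7), which `summary()` reports; treating a score equal to a quartile boundary as belonging to the lower quartile is an assumption.

```python
from collections import Counter
from statistics import quantiles


def quartile_counts(hits, dominant_taxa):
    """Count BLASTP hit assignments per bit-score quartile.

    hits: (taxon_name, bit_score) pairs, one per predicted protein (assumed input).
    dominant_taxa: names tallied separately, e.g. the dominant one or two taxa
    plus "uncultured bacterium"; every other name is pooled as "other".
    """
    scores = [score for _, score in hits]
    # method="inclusive" matches R's default quantile() (type 7), used by summary()
    q1, q2, q3 = quantiles(scores, n=4, method="inclusive")
    counts = {quartile: Counter() for quartile in (1, 2, 3, 4)}
    for taxon, score in hits:
        quartile = 1 if score <= q1 else 2 if score <= q2 else 3 if score <= q3 else 4
        label = taxon if taxon in dominant_taxa else "other"
        counts[quartile][label] += 1
    return counts
```

Plotting `counts[4]` against `counts[1]` then shows whether the highest-scoring hits are dominated by the expected taxon, as was the case for the Parcubacteria and Peregrinibacteria genomes.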
Metabolic reconstruction
After binning, genomes were submitted individually to IMG79 for annotation. Genome accession numbers are in Table S1, and all genomes are publicly available. The metabolic reconstructions in Table S1 and Figs. S5-S7 and S11-S13 were derived from these annotations and from inspection with IMG’s analysis tools, including KEGG pathway assignments and transporter predictions. Transporters highlighted for DOM uptake were identified based on information in the Transporter Classification Database80. Carbohydrate-active enzymes (CAZymes) were predicted using the same routines as those used for updates of the Carbohydrate-Active enZymes database (www.cazy.org)81.
RPKM abundance of taxa and genes
Abundance of taxa within each sample was quantified from mapped reads using Reads-Per-Kilobase-Per-Million (RPKM) normalization82 according to $A_{ij} = \frac{N_{ij}}{L_i} \times \frac{1}{T_j}$, where $A_{ij}$ is the abundance of bin $i$ in sample $j$, $N_{ij}$ is the number of reads from sample $j$ that map to bin $i$, $L_i$ is the length of bin $i$ in kilobases, and $T_j$ is the total number of reads in sample $j$ divided by $10^6$. $N_{ij}$ was generated using the samtools66 idxstats function after mapping with BWA. The data in Fig. 2 were created by summing $N_{ij}/L_i$ over groups of taxa defined in Table S1 prior to multiplying by $1/T_j$.
RNA coverage was used to evaluate both bin and gene activity. Mean coverage for each supercontig was calculated using bedtools83, and bins were ranked from lowest mean recruitment (rank 1) to highest mean recruitment (rank n, where n is the number of bins). Bins with particularly high or low activity (transcript abundance) relative to their abundance (genome abundance) were identified using rank-residuals, calculated as follows: on a plot of DNA coverage rank vs. RNA coverage rank, the residual of each bin or gene was calculated from the identity (1:1) line. As the rank-residuals followed a Gaussian distribution, bins with a residual more than 1 s.d. above the rank-residual mean were classified as having higher-than-expected transcriptional activity, and bins with a residual more than 1 s.d. below the mean were classified as having lower-than-expected transcriptional activity. RPKM values were also calculated for every gene in every bin, analogously to those for bins, using RNA mapping values extracted with the bedtools multicov function. Sample E2 was omitted from gene-specific calculations because only 4,588 transcriptomic reads from this sample mapped successfully, compared to >100,000 from other samples. Of 140,347 genes, 17,827 had no evidence of expression in any sample and were removed from further analysis; 3,840 genes recruited reads in all remaining samples.
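A minimal Python sketch of the RPKM calculation $A_{ij} = (N_{ij}/L_i) \times (1/T_j)$, starting from `samtools idxstats` output (tab-separated reference name, sequence length, mapped reads, unmapped reads). It assumes a hypothetical `scaffold_to_bin` lookup and takes the bin length $L_i$ as the sum of its scaffold lengths; both are illustrative assumptions rather than the authors' scripts. `group_rpkm` mirrors the Fig. 2 procedure of summing $N_{ij}/L_i$ over a group before scaling by $1/T_j$.

```python
from collections import defaultdict


def bin_counts_from_idxstats(idxstats_lines, scaffold_to_bin):
    """Sum mapped reads (N_ij) and scaffold lengths (bp) per bin.

    idxstats_lines: lines of `samtools idxstats` output.
    scaffold_to_bin: hypothetical {scaffold name: bin name} lookup.
    """
    reads: dict[str, int] = defaultdict(int)
    length: dict[str, int] = defaultdict(int)
    for line in idxstats_lines:
        name, seq_len, mapped, _unmapped = line.rstrip("\n").split("\t")
        if name not in scaffold_to_bin:
            continue  # skips the '*' row (unplaced reads) and unbinned scaffolds
        bin_name = scaffold_to_bin[name]
        reads[bin_name] += int(mapped)
        length[bin_name] += int(seq_len)
    return reads, length


def rpkm(n_ij: int, length_bp: int, total_reads: int) -> float:
    """A_ij = (N_ij / L_i) * (1 / T_j), with L_i in kb and T_j in millions of reads."""
    return (n_ij / (length_bp / 1_000)) * (1 / (total_reads / 1_000_000))


def group_rpkm(members, reads, length, total_reads: int) -> float:
    """Sum N_ij / L_i over a group of bins, then multiply by 1 / T_j."""
    per_kb = sum(reads[b] / (length[b] / 1_000) for b in members)
    return per_kb * (1 / (total_reads / 1_000_000))
```

Because $1/T_j$ is shared by every bin in a sample, summing $N_{ij}/L_i$ first and scaling once gives the same result as summing the individual bin RPKM values.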
All calculations are available in Table S1 or in the R markdown document Per.gene.RPKM.Rmd in the Supplemental Information. Note that these RPKM values are abundance measurements from a small number of samples. While we can evaluate the relative expression of genes within those samples, our dataset lacks the statistical power to test for significant differential expression.
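The rank-residual classification described above can be sketched in Python as follows. Several details are assumptions not stated in the methods: the residual is taken as RNA rank minus DNA rank (so a positive residual means more transcription than genome abundance predicts), tied coverage values receive the average of their ranks, and the sample standard deviation is used.

```python
from statistics import mean, stdev


def ranks(values):
    """Rank from lowest (1) to highest (n); tied values share their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average_rank
        i = j + 1
    return result


def classify_activity(dna_coverage, rna_coverage):
    """Label each bin by its residual from the 1:1 line of DNA rank vs. RNA rank."""
    dna_rank, rna_rank = ranks(dna_coverage), ranks(rna_coverage)
    residuals = [r - d for d, r in zip(dna_rank, rna_rank)]  # assumed sign: RNA minus DNA
    mu, sd = mean(residuals), stdev(residuals)
    labels = []
    for e in residuals:
        if e > mu + sd:
            labels.append("higher")       # higher-than-expected transcription
        elif e < mu - sd:
            labels.append("lower")        # lower-than-expected transcription
        else:
            labels.append("as expected")
    return labels
```

Working on ranks rather than raw coverage makes the comparison insensitive to the very different dynamic ranges of DNA and RNA recruitment.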
nxrA and dsrAB phylogeny
Initial annotation of our bins identified putative nitrate reductases, but none was explicitly annotated as an nxr gene putatively involved in nitrite oxidation, which was unexpected given that Nitrospina spp. are nitrite-oxidizing chemolithotrophs. Because nxrA-type nitrite oxidoreductases are homologous to narG-type nitrate reductases (and polyphyletic)23, annotation of these genes was curated by phylogenetic analysis with known nxr genes. Reference sequences of known nxrA/narG homologs from Lüke et al. (2015) were searched against a database of all binned scaffolds (makeblastdb -dbtype nucl with default settings except -parse_seqids -hash_index) using tblastn with default settings (except -max_query_seqs 10000). Nine bins had hits with bit scores > 100 to at least one sequence in the reference file. All annotated protein sequences from these bins were downloaded and used to create a BLAST protein database (makeblastdb -dbtype prot with default settings except -parse_seqids -hash_index), against which the reference sequences were searched using blastp with default settings (except -max_query_seqs 10000). All BLAST suite commands used v. 2.2.28+. The eight proteins returning matches with bit scores > 100 were selected for phylogenetic analysis together with the original reference sequences. Alignment, culling, and inference were completed with the FT_pipe script. The protein from Nitrospina Bin 25 grouped closely with the nxrA gene from N. gracilis (Fig. S15), providing strong evidence that it is a functional nitrite oxidoreductase. To evaluate whether the predicted dsrAB genes in bins 23, 35, and 61-2 most likely function in the oxidative or reductive direction, they were aligned with a previously established reference dataset32, and phylogeny was again inferred using the FT_pipe script.
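The bit-score screen used above (hits > 100) can be reproduced from tabular BLAST output. The sketch below assumes the searches were written in the default 12-column tabular format (`-outfmt 6`), which the text does not state; in that format column 2 is the subject ID (here a binned scaffold) and column 12 is the bit score. The `scaffold_to_bin` lookup is hypothetical, and subject IDs are assumed to match scaffold names exactly.

```python
def bins_with_strong_hits(outfmt6_lines, scaffold_to_bin, min_bits=100.0):
    """Return the set of bins with at least one hit whose bit score exceeds min_bits.

    Assumes BLAST+ tabular output (-outfmt 6): qseqid, sseqid, pident, length,
    mismatch, gapopen, qstart, qend, sstart, send, evalue, bitscore.
    """
    bins = set()
    for line in outfmt6_lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and comment lines (e.g. from -outfmt 7)
        fields = line.rstrip("\n").split("\t")
        subject, bitscore = fields[1], float(fields[11])
        if bitscore > min_bits and subject in scaffold_to_bin:
            bins.add(scaffold_to_bin[subject])
    return bins
```

The same filter applies unchanged to the blastp step, where the subject IDs are protein identifiers rather than scaffolds and the lookup would map proteins to bins.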
Author contributions
JCT, BJB, and OUM designed the study. LEG, NNR, and JCT collected samples. NNR provided processed oceanographic data. LEG and OUM extracted, quantified, and determined the quality of nucleic acids. JCT, KWS, and BJB reconstructed the genomes. KWS, BJB, BT, BH, and JCT conducted downstream analyses. JCT led manuscript writing, and all co-authors evaluated and contributed edits.
Competing financial interests
The authors declare no competing financial interests.
Acknowledgements
Funding for this work was provided to JCT through the Oak Ridge Associated Universities Ralph E. Powe Junior Faculty Enhancement Award and the Louisiana State University Department of Biological Sciences. A portion of the funding for this work was provided by a Planning Grant award to OUM from Florida State University. Funding for the research vessel and collection of oceanographic data was provided by the National Oceanic and Atmospheric Administration, Center for Sponsored Coastal Ocean Research Award Number NA09NOS4780204 to NNR. The authors thank the crew of the R/V Pelican. Portions of this research were conducted with high performance computing resources provided by Louisiana State University (http://www.hpc.lsu.edu).
References Cited
- 1.
- 2.
- 3.
- 4.
- 5.
- 6.
- 7.
- 8.
- 9.
- 10.
- 11.
- 12.
- 13.
- 14.
- 15.
- 16.
- 17.
- 18.
- 19.
- 20.
- 21.
- 22.
- 23.
- 24.
- 25.
- 26.
- 27.
- 28.
- 29.
- 30.
- 31.
- 32.
- 33.
- 34.
- 35.
- 36.
- 37.
- 38.
- 39.
- 40.
- 41.
- 42.
- 43.
- 44.
- 45.
- 46.
- 47.
- 48.
- 49.
- 50.
- 51.
- 52.
- 53.
- 54.
- 55.
- 56.
- 57.
- 58.
- 59.
- 60.
- 61.
- 62.
- 63.
- 64.
- 65.
- 66.
- 67.
- 68.
- 69.
- 70.
- 71.
- 72.
- 73.
- 74.
- 75.
- 76.
- 77.
- 78.
- 79.
- 80.
- 81.
- 82.
- 83.