Abstract
Microbial methanogenesis may have been a major component of Earth’s carbon cycle during the Archaean Eon, generating a methane greenhouse that increased global temperatures enough for a liquid hydrosphere, despite the sun’s lower luminosity at the time. Evaluation of potential solutions to the “faint young sun” paradox by determining the age of microbial methanogenesis has been limited by ambiguous geochemical evidence and the absence of a diagnostic fossil record. To overcome these challenges, we utilize a temporal constraint: a horizontal gene transfer (HGT) event from within archaeal methanogens to the ancestor of Cyanobacteria, one of the few microbial clades with recognized crown group fossils. Results of molecular clock analyses calibrated by this HGT-propagated constraint show methanogens diverging within Euryarchaeota no later than 3.51 Ga, with methanogenesis itself likely evolving earlier. This timing provides independent support for scenarios wherein microbial methane production was important in maintaining temperatures on the early Earth.
Introduction
Methane is a greenhouse gas implicated in current and past climate change. Accumulation of atmospheric methane during the Archaean Eon has been proposed as one solution to the “faint young sun paradox”, contributing to increased global temperatures enough to maintain a liquid hydrosphere despite the lower luminosity of the sun at the time1,2. While microbial methanogenesis is generally assumed to be an extremely ancient pathway due to its phylogenetic distribution across much of Euryarchaeota3, there is only limited geochemical evidence for microbial methane production in the Archaean, in the form of carbon isotopic composition of kerogens ~2.7 Ga4 and methane-bearing fluid inclusions ~3.46 Ga5. Therefore, the time of the onset of microbial methane production and the relative contributions of microbial and abiogenic sources to Archaean atmospheric methane remain uncertain. The case for a microbial methane contribution would be strengthened by molecular clock estimates showing the divergence of methane-producing microbes predates their proposed geochemical signature. Few such studies have been conducted, resulting in a range of dates for the origin of microbial methanogenesis spanning the early Precambrian (e.g., 3.05-4.49 Ga1, 3.46-3.49 Ga6, ~3.45 Ga7, 2.97-3.33 Ga8, and a much younger 1.26-1.31 Ga9). Estimating divergence times requires calibration points from the geological record10: body or trace fossils attributable to a clade’s crown group, preferably by phylogenetic analysis11, or (more controversially) preserved traces of organic biomarkers that may be diagnostic for certain clades12,13. There is no such direct evidence, however, for Archaea in deep time (let alone methanogens nested deeply within Euryarchaeota). Recent molecular phylogenies suggest eukaryotes may have evolved from within a paraphyletic Archaea14. However, the lack of consensus as to the placement of eukaryotes15, and the long branch separating eukaryotes from all other groups, make direct fossil calibration of Archaea using crown group eukaryotic fossils problematic. Without geological constraints, confidence in divergence estimates rests entirely on the unconstrained rate models and root priors used, which are sensitive to lineage-specific rate changes16, and cannot be internally cross-validated.
In this work, we employ a horizontal gene transfer (HGT) event to date the divergence of methanogens. The genes were transferred from methanogens to the ancestor of Cyanobacteria, the microbial clade with the oldest fossils of likely crown-group affinity known in the entire Tree of Life. This extends the use of fossil-calibrated relaxed molecular clocks to archaeal evolution, enabling methods comparable to those validated in studies of metazoan evolution. As calibrations from the rock record are essential for accurate molecular clock inferences, particularly for ancient splits10, their inclusion permits more accurate and precise dating of the earliest methanogens.
HGT events represent temporal intrusions between genomes, establishing a cross-cutting relationship determining the relative age of the donor (older) and recipient (younger) clades17. Previous work has argued for the relative ages of clades18–20, or has used HGT events as secondary calibrations for molecular clock studies21,22. Caution must be applied when importing secondary divergence estimates from prior molecular clock studies, as they may propagate errors associated with the original estimate, leading to false precision23,24. Furthermore, basing a molecular clock solely on donor-recipient logic fails to incorporate the observed reticulating branch length. This is relevant as it is impossible to ascertain whether the HGT occurred near the divergence of the recipient’s total group, near the diversification of its crown group, or at any time along its stem lineage. As stem lineages can represent very long time intervals for major microbial clades, their omission may dramatically impact date inferences.
Previous phylogenetic analyses have shown that the smc, scpA, and scpB genes (together encoding proteins that form the SMC complex, required for chromosome condensation in many microbial groups), were transferred to Cyanobacteria from a euryarchaeal donor25,26. With improved taxon sampling and accounting for long branch attraction artifacts, we show strong phylogenetic support that SMC complex genes were transferred in a single evolutionary event from a sister lineage of Methanomicrobiales to the ancestor of Cyanobacteria (Fig. 1, Supplementary Figs. 1-4, Supplementary Table 1).
Results
To link the HGT event with the species topology of both methanogens and Cyanobacteria, we concatenated 1) aligned SMC complex sequences for Cyanobacteria and Euryarchaeota with 2) ribosomal protein sequences for Euryarchaeota (expected to reconstruct the Euryarchaeota species tree; listed in Supplementary Table 2) and 3) ribosomal protein sequences for Cyanobacteria as three separate partitions in a composite alignment. This composite alignment allows the reticulating branch length for SMC complex evolution to be included in dating analyses, while the topology and branch lengths of the respective euryarchaeal and cyanobacterial clades are inferred from the far more extensive site information within ribosomal datasets. The composite alignment maximally captures the sequence information required to infer divergences of the donor and recipient lineage, as well as sequence information supporting the length and placement of the reticulating branch.
Pairwise distances (Supplementary Fig. 5) suggest that the SMC complex genes are evolving slightly (~30%) faster than ribosomal genes, but at about the same rate in all taxa, including the reticulating branch. Thus inclusion of the HGT does not produce clade-specific lineage effects, and the HGT is appropriate for concatenation in a composite alignment. Observed heterotachy may have been imposed by the HGT event itself, which may impact rate estimates along this branch. To test this potential impact, simulated alignments were generated using artificially halved and doubled reticulating branch lengths (Supplementary Fig. 6). On average, doubling the reticulating branch length decreased the age of the cyanobacterial crown group by ~77 Myr, and increased the age of the methanogen donor clade by ~87 Myr. Halving the reticulating branch length increased the age of the cyanobacterial crown group by ~72 Myr, and decreased the age of the methanogen donor clade by ~62 Myr. Given the large variances associated with each of these age estimates, impacts on divergence times were relatively small.
Previous studies have shown Bayesian dating approaches may be robust to extensive missing sequence data27. We further explored the suitability of composite alignments with large blocks of missing data (where entire clades lack all 30 ribosomal sequences) using simulations (Supplementary Fig. 7). The mean age estimates for crown Cyanobacteria were slightly older when missing data were included, increasing the age of crown Cyanobacteria by 2.6%; however, the mean age estimates for the donor clade were not significantly affected. Therefore, missing data have a small impact on age estimates for crown Cyanobacteria, but this effect does not propagate to deeper nodes.
Divergence times estimated from the Euryarchaeota species tree alone (uncalibrated relaxed clock only) differed substantially from those estimated from the SMC gene tree (incorporating the HGT into the analysis) and from the composite alignment (Fig. 2A). Based on the species tree alone, methanogens are estimated to have diverged within the Paleoarchaean (mean 3.53 Ga ± SD of 163 Myr, minimum 3.24 Ga). Analyses of the SMC complex alone (mean 3.96 Ga ± 236 Myr, minimum 3.46 Ga) and the composite alignment (mean 3.94 Ga ± 228 Myr, minimum 3.51 Ga) both yield older age estimates for methanogens, in the Eoarchaean. Precision of the latter two analyses is similar for deeper nodes and slightly lower in the composite alignment for Cyanobacteria. Across all calibration sets, the effective prior distributions are similar to the posterior results for the root and methanogen node, but differ slightly for Cyanobacteria, where the posterior is included within a broader effective prior (Supplementary Table 3 and Supplementary Fig. 8). This indicates that prior specification is responsible for age results at most nodes, but the sequence data are also informative for some nodes. In the Euryarchaeota species tree alone, the donor node (Methanosarcinales + Methanomicrobiales) was not significantly older (2.46 Ga ± 158 Myr) than estimates for crown Cyanobacteria from other analyses (mean 2.32 Ga ± 180 Myr; below), which is unlikely as the donor node must be older than the recipient’s crown group22. This necessary adjustment makes up for the slight decrease in precision when the HGT partition is added, providing an additional calibration and increased accuracy. The advantage of adding ribosomal alignment blocks to the HGT is thus the incorporation of taxa and outgroups that lack the SMC complex genes, allowing us to infer the ages of more ancient nodes, including the divergence of methanogens, and their earliest diversifications.
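The donor-older-than-recipient constraint can be illustrated with a rough calculation. The following Python sketch is not part of the original analysis: it assumes the reported means and SDs can be treated as independent, roughly normal summaries (a simplification of the actual posteriors), and the helper name `age_gap_z` is hypothetical. It compares crown Cyanobacteria with the donor node as estimated from the species tree alone and from the composite alignment (3.10 Ga ± 195 Myr, reported in the Discussion).

```python
import math


def age_gap_z(older_mean, older_sd, younger_mean, younger_sd):
    """Gap between two node ages, in units of their combined SD.

    Assumes independent, roughly normal age estimates (a simplification).
    """
    combined_sd = math.sqrt(older_sd ** 2 + younger_sd ** 2)
    return (older_mean - younger_mean) / combined_sd


# Ages in Ga (mean, SD), taken from the text.
crown_cyanobacteria = (2.32, 0.180)
donor_species_tree = (2.46, 0.158)   # species tree alone
donor_composite = (3.10, 0.195)      # composite alignment

z_species = age_gap_z(*donor_species_tree, *crown_cyanobacteria)
z_composite = age_gap_z(*donor_composite, *crown_cyanobacteria)

print(f"species tree alone:  {z_species:.2f} combined SDs")
print(f"composite alignment: {z_composite:.2f} combined SDs")
```

Under these assumptions the species-tree donor node sits well under one combined SD above crown Cyanobacteria, whereas the composite-alignment estimate is separated by nearly three, which is the pattern the donor-recipient logic requires.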
Discussion
Divergence time estimates calibrated by a 2.0 Ga fossil akinete (rod-like resting cell28) are extremely old (Fig. 2B), with the age of Cyanobacteria (mean 2.93 Ga ± 161 Myr, minimum 2.62 Ga) substantially predating the Great Oxygenation Event (GOE; 2.33 Ga29), and with the age of the methanogen ancestor tipping into the Hadean (mean 4.33 Ga ± 240 Myr, minimum 3.88 Ga). In this analysis, the age of Euryarchaeota (mean 4.53 Ga ± 252 Myr, minimum 4.09 Ga) violates the maximum prior applied to the root, estimating a most likely ancestor age older than the oldest zircons (4.38 Ga30), and possibly older than the Earth itself. The maximum plausible fossil age for total-group Nostocales (before resulting mean estimates violate the root prior; Fig. 2B) corresponds to ~1.7 Ga, a similar age to proposed akinete material from the McArthur Group of Northern Australia31. As the validity of the 2.0 Ga microfossil has been questioned32, we also calibrated the same node on our tree with a younger 1.2 Ga fossil akinete33, which is supported by greater morphological evidence32 and provides the most conservative estimate, discussed more extensively below (Fig. 3). This calibration results in age estimates for Cyanobacteria (mean 2.32 Ga ± 180 Myr, minimum 1.97 Ga) very close to and potentially younger than the GOE, microbial methanogenesis in the Eoarchaean (mean 3.94 Ga ± 228 Myr, minimum 3.51 Ga), and a correspondingly early age for Euryarchaeota (mean 4.17 Ga ± 228 Myr, minimum 3.67 Ga).
Although only a single fossil calibration was used for this analysis, it may still improve accuracy and precision where among-lineage rate variation is accounted for jointly with the root prior34. In simulations, age estimates for Cyanobacteria are substantially more accurate when a fossil calibration is added (Fig. 2B), while deeper nodes in Euryarchaeota are less influenced. The 95% confidence intervals calculated from empirical fossils (1.2 and 2.0 Ga) overlap for the ages of Euryarchaeota and microbial methanogenesis, but not for Cyanobacteria, illustrating the importance of sensitivity analysis for clades such as Cyanobacteria with ghost ranges dependent upon phylogenetic interpretation of fossil discoveries35 and the use of (relatively) “safe but late” constraints23. Note that the GOE itself was not used as a calibration, as different age estimates of Cyanobacteria are contradictory about the relationship between the timing of oxygenic photosynthesis and the age of Cyanobacteria themselves36.
Within Euryarchaeota, Methanopyrales, Methanococcales, and Methanobacteriales diverge earlier than Methanomicrobiales, Methanosarcinales, and their relatives (Fig. 3). We conservatively estimate the emergence of methanogens as 3.94 Ga ± 228 Myr at the youngest, and the split between Methanosarcinales and Methanomicrobiales (the closest split to the HGT) at 3.10 Ga ± 195 Myr. Therefore, any proposed scenario of a late origin of microbial methanogenesis in the Mesoarchaean through Proterozoic8,9 violates the youngest possible calibrated molecular clock estimate (95% CI younger bound of 3.51 Ga), in addition to geochemical evidence5. Recently, archaeal clades outside of Euryarchaeota, the uncultured Bathyarchaeota and Verstraetearchaeota, were found to possess genes involved in methane metabolism37,38, thus the absolute origin of microbial methanogenesis could be substantially older than Euryarchaeota. Although our analysis is agnostic regarding the ancestral metabolism of Archaea, an older evolutionary history for microbial methanogenesis does not refute the hypothesis of an Archaean microbial methane greenhouse.
A substantial microbial methane greenhouse likely contributes to Archaean warming of the Earth only if 1) methanogenesis evolved early enough, which is consistent with our age estimates, and 2) the divergence time of methanogens predates the diversification of poorly characterized microbial taxa involved in anaerobic oxidation of methane (AOM; usually comprising communities of Bacteria and Archaea living together). AOM taxa are of interest because their metabolism can alter the carbon isotope signature of methane produced by microbes in the opposite direction from microbial methanogenesis5,39. Furthermore, AOM removes a substantial fraction of methane from sediments, which could effectively erase any geochemical signature of Eoarchaean microbial methanogenesis40,41. Divergence time estimates for AOM taxa alone could not directly support the existence of microbial methanogenesis, as they can also metabolize abiotic sources of methane42. Thus, comprehensive estimates of divergence times for both methanogenic Euryarchaeota and AOM taxa could together constrain the Archaean “methane greenhouse window”, by permitting a narrower, independent interpretation of isotopic data.
As in biostratigraphy, in which index fossils are used to correlate and calibrate rock formations worldwide, a well-supported HGT event from a clade of interest into a fossil-bearing clade permits a direct link between divergence estimates and geological history. Combining data from genes with both reticulate and vertical histories into a single alignment complements other recent developments in calibrating microbial evolution20,22,43–45. Our results strongly support the appearance of major methanogen lineages predating the emergence of crown group Cyanobacteria. Our divergence estimates for Euryarchaeota are consistent with previous hypotheses proposing a role for microbial methane in warming the Archaean Earth. With the growing importance of time-calibrated phylogenies in evolutionary inference46,47, these methodological developments help to overcome the limitations of the sparse microbial geologic record, and indicate their potential utility in resolving the comparative natural history of microbial clades across the entire Tree of Life.
Methods
Data Matrix Construction
The smc, scpA and scpB proteins form a complex required for chromosome condensation in many microbial groups. Genes encoding these proteins within Cyanobacteria have previously been identified as having been transferred from within Archaea25,26. We queried NCBI’s nr database using BLASTp for homologs of smc, scpA, and scpB proteins in each member of Euryarchaeota with a sequenced genome (except the species-rich Halobacteriales, for which we selected eight representatives), and representative Cyanobacteria from all orders. Previously reported SMC homologs within Aquificales likely representing an additional HGT from Thermococcales26 were also included. No scpB sequences were found in Aquificales or Halobacteriales. Protein sequences for each homolog were individually aligned in MUSCLE v3.748. The smc protein contained two large poorly aligned regions, representing coiled-coil domains25,49. These regions were removed via alignment masking using GUIDANCE50, leaving 729 aligned sites. For the two scp proteins (which are much shorter, with limited phylogenetic informativeness; Supplementary Table 4), we elected not to mask poorly aligned regions in light of recent work indicating trees resulting from this process may be of decreased quality51.
Phylogenetic Analysis
Individual gene trees were constructed with RAxML v1.8.952 using the LG4M + G substitution model53. All three HGT genes were concatenated with FASconCAT v1.054, analyzed in RAxML with 100 bootstrap replicates, and in PhyloBayes v3.3f using two chains and the CAT20 site-dependent model55,56. The CAT20 model was used because preliminary analyses using the full CAT model did not reach convergence. An automatic stopping rule was implemented, with tests of convergence every 100 cycles, until the default criteria of effective sizes and parameter discrepancies between chains were met (50 and 0.3, respectively). Trees and posterior probability support values were then generated from completed chains after the initial 20% of sampled generations were discarded as burn-in.
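The stopping rule and burn-in step can be sketched in a few lines of Python. This is an illustration only, not the PhyloBayes implementation: the function names are hypothetical, and treating the thresholds as inclusive (effective size at least 50, discrepancy at most 0.3) is an assumption.

```python
def discard_burn_in(samples, fraction=0.2):
    """Return the samples left after dropping the first `fraction` of them."""
    if not 0.0 <= fraction < 1.0:
        raise ValueError("fraction must be in [0, 1)")
    start = int(len(samples) * fraction)
    return samples[start:]


def chains_converged(effective_sizes, discrepancies,
                     min_effsize=50.0, max_discrepancy=0.3):
    """True when every monitored parameter meets both thresholds.

    Thresholds follow the values quoted in the text; whether the real
    comparison is strict or inclusive is an assumption here.
    """
    return (min(effective_sizes) >= min_effsize
            and max(discrepancies) <= max_discrepancy)


# Toy chain of 10 sampled generations (hypothetical values).
chain = list(range(10))
kept = discard_burn_in(chain)
print(kept)

# Hypothetical per-parameter diagnostics compared between two chains.
print(chains_converged([120.0, 75.5, 51.0], [0.05, 0.21, 0.29]))
print(chains_converged([120.0, 40.0, 51.0], [0.05, 0.21, 0.29]))
```

In practice the check would be repeated at each test interval (every 100 cycles in this study) until it first returns true.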
Composite Alignment
A composite alignment was constructed to date the origin of methanogens, by concatenating 1) aligned SMC complex sequences for Cyanobacteria and Euryarchaeota (1,778 amino acids) with 2) ribosomal sequences for Euryarchaeota (adding representatives of clades without identified SMC homologs, i.e. Methanobacteriales, Methanocellales, Methanopyrales, and Thermoplasmatales) and 3) ribosomal sequences for Cyanobacteria, as three separate partitions (14,366 amino acids total). Specifically, 30 ribosomal proteins (Supplementary Table 2) were identified by BLASTp, aligned separately in MUSCLE, then concatenated. Separate partitions for cyanobacterial and archaeal ribosomal proteins are used to provide more informative sites for estimating evolutionary relationships and rates in these groups, without introducing phylogenetic conflict with the HGT partition. Using this approach, only SMC complex sequences determine cyanobacterial placement ‘within’ Euryarchaeota along the reticulating HGT branch. Note that SMC sequences from Aquificales were omitted from these analyses, as this additional putative HGT event is uninformative in this investigation. The concatenated topology was estimated with RAxML using the LG4M + G model and PhyloBayes using CAT20.
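The structure of such a composite alignment can be sketched as follows. This is a minimal Python illustration under stated assumptions, not the pipeline used in this study (concatenation here was performed with existing tools): the function name, taxon labels, and toy sequences are hypothetical, and representing missing partitions as gap characters is an assumption.

```python
def build_composite(partitions, gap="-"):
    """Concatenate aligned partitions into one composite alignment.

    Each partition maps taxon name -> aligned sequence. A taxon absent
    from a partition receives an all-gap block of that partition's width.
    """
    taxa = sorted({taxon for part in partitions for taxon in part})
    composite = {taxon: "" for taxon in taxa}
    for part in partitions:
        widths = {len(seq) for seq in part.values()}
        if len(widths) != 1:
            raise ValueError("sequences in a partition must have equal length")
        width = widths.pop()
        for taxon in taxa:
            composite[taxon] += part.get(taxon, gap * width)
    return composite


# Toy partitions (hypothetical taxa and sequences).
smc_complex = {"Cyano_A": "MKV", "Methano_B": "MRV"}         # HGT partition
ribosomal_archaea = {"Methano_B": "GLST", "Methano_C": "GISS"}
ribosomal_cyano = {"Cyano_A": "AK"}

composite = build_composite([smc_complex, ribosomal_archaea, ribosomal_cyano])
for taxon, seq in composite.items():
    print(f"{taxon:<10} {seq}")
```

In this toy example only the first block links the cyanobacterial and archaeal taxa, mirroring the way the SMC partition alone determines cyanobacterial placement along the reticulating branch, while taxa lacking SMC homologs (here `Methano_C`) are still represented through their ribosomal block.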
Fossil Calibration
To produce a divergence estimate, we applied a time constraint within Cyanobacteria, derived from fossil resting cells (akinetes; genus Archaeoellipsoides) similar to the cyanobacterial clades Nostocales (morphological subsection IV) and Stigonematales (subsection V), from the 2.0 Ga Franceville Group of Gabon28,31,57,58. There are too few morphological characters to determine a crown-group position of this fossil32, so we assigned the fossil minimum age to total-group Nostocales (i.e. the clade in our tree including Nostocales and Stigonematales32, and their sister group Chroococcidiopsidales). As the affinities of Paleoproterozoic Archaeoellipsoides have been questioned32,59, we also tested a less controversial younger fossil of the same genus with a more similar size to members of total-group Nostocales, from the 1.2 Ga Dismal Lakes Group of Northwest Canada33,59. To measure the effect of using different Archaeoellipsoides fossil ages on divergence time estimates, we simulated calibrations for total-group Nostocales at 100 Myr intervals between 1.3 and 2.3 Ga (in addition to the empirical fossil dates at 1.2 and 2.0 Ga). Unlike some previous analyses57,58, we did not include the age of the GOE as a calibration on the age of Cyanobacteria. Our approach permits an estimate of the age of Cyanobacteria independent of the onset of atmospheric oxygenation36.
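The set of calibration ages tested in this sensitivity analysis can be enumerated with a short Python sketch. The function name is hypothetical and ages are expressed in Myr purely to keep the arithmetic in integers; the sketch lists the ages only and does not reproduce how each calibration was passed to the dating software.

```python
def calibration_ages_myr(start=1300, stop=2300, step=100,
                         empirical=(1200, 2000)):
    """Simulated calibration ages plus the empirical fossil ages, in Myr."""
    grid = set(range(start, stop + step, step))
    return sorted(grid | set(empirical))


ages = calibration_ages_myr()
print(len(ages), "calibration ages (Myr):", ages)
```

Because the 2.0 Ga empirical fossil age coincides with a grid point, the combined set contains the eleven simulated ages plus the 1.2 Ga fossil.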
Divergence Time Estimation
Divergence times were estimated in PhyloBayes using a fixed topology from the RAxML composite alignment result, the CAT20 substitution model, and the uncorrelated gamma multipliers (UGM) relaxed clock model16. The UGM model allows substitution rates to vary across the tree, and assumes there is no autocorrelation of evolutionary rates across deep branches16. Therefore, this model is suited to modeling rate changes associated with HGT events along a reticulating branch. Rates across sites followed a uniform distribution, and the prior on divergence times was uniform.
The root was calibrated with a gamma distributed prior with a mean of 3.9 Ga and SD 230 Myr (range from 4.36 to 3.44 Ga); this constraint was calculated as the mean of the maximum root age of 4.38 Ga (oldest zircons, approximating the age of habitable Earth30) and minimum of 3.46 Ga (oldest traces of microbial methane5). We selected a gamma distribution rather than uniform, because we do not assume it would be equally likely that the last common ancestor of Euryarchaeota diverged at either the maximum or minimum age (i.e. the tails are less likely). The superiority of “soft” calibration densities has been discussed previously60. It is not circular to use microbial methane traces as a younger bound on the root, because this constraint only presupposes the methane traces are 1) archaeal and 2) biogenic, and does not specifically constrain the age of any clade within Archaea, including methanogens (i.e., the ancestor of known methanogens may be either younger or older than 3.46 Ga, as the prior is not directly placed on its node). Each fossil age (above) was used as a hard-bound minimum constraint on a uniform age prior, which is appropriate (despite soft bounds on the root, above) due to the extreme antiquity and limited character information from these calibrations. Other validatory analyses, including varying the molecular clock model, resulted in minimal changes to divergence time estimates. Comparisons of estimated CIs to the effective prior61 were also made by removing sequence data using the -prior flag in PhyloBayes (Supplementary Fig. 8 and Supplementary Table 3).
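The arithmetic behind the root prior can be checked with a short Python sketch. It is illustrative only: the function name is hypothetical, the shape and scale follow the standard moment relations for a gamma distribution (mean = shape × scale, variance = shape × scale²), and reading the quoted range as the mean ± 2 SD is an inference from the numbers rather than a statement from the software documentation.

```python
def gamma_from_mean_sd(mean, sd):
    """Shape and scale of a gamma distribution with the given mean and SD."""
    shape = (mean / sd) ** 2
    scale = sd ** 2 / mean
    return shape, scale


# Bounds on the root age in Ga, from the text.
root_min, root_max = 3.46, 4.38
midpoint = (root_min + root_max) / 2   # 3.92, reported as 3.9 Ga

# Reported prior: mean 3.9 Ga, SD 230 Myr.
mean, sd = 3.9, 0.23
shape, scale = gamma_from_mean_sd(mean, sd)

# Assumption: the quoted range corresponds to mean +/- 2 SD.
lower, upper = mean - 2 * sd, mean + 2 * sd

print(f"midpoint of bounds: {midpoint:.2f} Ga")
print(f"gamma shape = {shape:.1f}, scale = {scale:.4f}")
print(f"mean +/- 2 SD: {lower:.2f} to {upper:.2f} Ga")
```

With a shape parameter this large the gamma density is nearly symmetric about its mean, which is consistent with the symmetric range quoted for the prior.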
Data Availability
Supplementary data files are available at Dryad (provisional link: http://datadryad.org/review?doi=doi:10.5061/dryad.m371v).
Author Contributions
J.M.W. and G.P.F. designed research and performed data analysis. J.M.W. drafted the manuscript with assistance from G.P.F.
Competing Interests
The authors declare no competing financial interests.
Acknowledgments
We thank D. Pisani and M. dos Reis for improving the manuscript with their helpful comments, D. Gruen, C. Magnabosco, D. Rothman, and B. Schirrmeister for discussions, and G. Shomo for assistance with the Engaging Cluster at MGHPCC. We acknowledge support from Simons Foundation Collaboration on the Origin of Life #339603 to G.P.F. and NSF EAR-1615426 to G.P.F. and J.M.W.
Footnotes
* e-mail: jowolfe@mit.edu