ABSTRACT
Diphthamide is a modified histidine residue that is uniquely present in archaeal and eukaryotic elongation factor 2 (EF-2), an essential GTPase responsible for catalyzing the coordinated translocation of tRNA and mRNA through the ribosome. In part due to the role of diphthamide in maintaining translational fidelity, it was previously assumed that diphthamide biosynthesis genes (dph) are conserved across all eukaryotes and archaea. Here, comparative analysis of new and existing genomes reveals that some archaea (i.e., members of the Asgard superphylum, Geoarchaea, and Korarchaeota) and eukaryotes (i.e., parabasalids) lack dph. In addition, while EF-2 was thought to exist as a single copy in archaea, many of these dph-lacking archaeal genomes encode a second EF-2 paralog missing key residues required for diphthamide modification and for normal translocase function, perhaps suggesting functional divergence linked to loss of diphthamide biosynthesis. Interestingly, some Heimdallarchaeota previously suggested to be most closely related to the eukaryotic ancestor maintain dph genes and a single gene encoding canonical EF-2. Our findings reveal that the ability to produce diphthamide, once thought to be a universal feature of archaea and eukaryotes, has been lost multiple times during evolution, and suggest that anticipated compensatory mechanisms evolved independently.
INTRODUCTION
Elongation factor 2 (EF-2) is a critical component of the translational machinery that interacts with both the small and large ribosomal subunits. EF-2 functions at the decoding center of the ribosome, where it is necessary for the translocation of messenger RNA and associated tRNAs (Spahn, et al. 2004). Archaeal and eukaryotic EF-2, as well as the homologous bacterial EF-G, are members of the highly conserved translational GTPase protein superfamily (Atkinson 2015). Gene duplications and subsequent neo-functionalizations have been inferred for eukaryotic EF-2 (eEF-2), with the identification of the spliceosome component Snu114 (Fabrizio, et al. 1997), and Ria1, a 60S ribosomal subunit biogenesis factor (Becam, et al. 2001). Bacterial EF-G is involved in both translocation and ribosome recycling and has undergone multiple duplications, including sub-functionalizations separating the translocation and ribosome recycling functions (Suematsu, et al. 2010; Tsuboi, et al. 2009) as well as neo-functionalizations including roles in back-translocation (Qin, et al. 2006), translation termination (Freistroffer, et al. 1997), regulation (Li, et al. 2014) and tetracycline resistance (Donhofer, et al. 2012). However, to date, archaea were thought to encode only a single essential protein within this superfamily, i.e. archaeal EF-2 (aEF-2) (Atkinson 2015).
Unlike bacterial EF-Gs, archaeal and eukaryotic EF-2s contain a post-translationally modified amino acid, which is synthesized by the addition of a 3-amino-3-carboxypropyl (ACP) group to a conserved histidine residue and its subsequent modification to diphthamide through the concerted action of 3 (in archaea) to 7 (in eukaryotes) enzymes (de Crécy-Lagard, et al. 2012; Schaffrath, et al. 2014). While diphthamide is perhaps best known as the target site of bacterial ADP-ribosylating toxins (Iglewski, et al. 1977; Jorgensen, et al. 2008) and as required for sensitivity to the antifungal sordarin (Botet, et al. 2008), its exact role remains a subject of investigation. Yeast mutants incapable of synthesizing diphthamide have a higher rate of translational frameshifts, suggesting that this residue plays a critical role in maintaining the reading frame during translation (Ortiz, et al. 2006). Furthermore, high-resolution cryo-EM structures of eEF-2 have indicated that diphthamide interacts directly with codon-anticodon bases in the translating ribosome and facilitates translocation by displacing ribosomal decoding bases (Anger, et al. 2013; Murray, et al. 2016). In addition, diphthamide has been proposed to play a role in the regulation of translation, as it represents a site for reversible endogenous ADP-ribosylation (Schaffrath, et al. 2014), and in the selective translation of certain genes in response to cellular stress (Argüelles, et al. 2014). Given its presumed role at the core of the translational machinery, it is not surprising that, with the sole exception of Korarchaeum cryptofilum (de Crécy-Lagard, et al. 2012; Elkins, et al. 2008), the diphthamide biosynthetic pathway has been considered universally conserved in archaea and eukaryotes. Indeed, while not strictly essential, loss of diphthamide biosynthesis has been shown to result in growth defects in yeast (Kimata and Kohno 1994; Ortiz, et al. 2006) and some archaea (Blaby, et al. 2010), and is either lethal or causes severe developmental abnormalities in mammals (Liu, et al. 2006; Webb, et al. 2008; Yu, et al. 2014).
In the current study, we explore the evolution and function of EF-2 and of diphthamide biosynthesis genes using genomic data from novel major archaeal lineages that were recently discovered using metagenomics and single-cell genomics approaches (Adam, et al. 2017; Hug, et al. 2016; Spang, et al. 2017). In particular, we report the presence of EF-2 paralogs in many archaeal genomes belonging to the Asgard archaea, Korarchaeota and Bathyarchaeota (Evans, et al. 2015; He, et al. 2016; Lazar, et al. 2016; Meng, et al. 2014; Spang, et al. 2015; Zaremba-Niedzwiedzka, et al. 2017) and the unexpected absence of diphthamide biosynthesis genes in several archaea and in parabasalid eukaryotes. Our findings reveal a complex evolutionary history of EF-2 and diphthamide biosynthesis genes, and point to novel mechanisms of translational regulation in several archaeal lineages. Finally, our results are compatible with scenarios in which eukaryotes evolved from an Asgard-related ancestor (Spang, et al. 2015; Zaremba-Niedzwiedzka, et al. 2017) and suggest the presence of a diphthamidated EF-2 in this lineage.
MATERIALS AND METHODS
Sampling and sequencing of ABR Loki- and Thorarchaeota
Sampling, DNA extraction, library preparation and sequencing were performed as described previously (Zaremba-Niedzwiedzka, et al. 2017). We chose the four deepest samples, at 125 and 175 cm below the sea floor (MM3/PM3 and MM4/PM4, respectively), as they showed the highest lokiarchaeal diversity in a maximum likelihood phylogeny of 5 to 15 ribosomal proteins (RP15) encoded on the same contig (Zaremba-Niedzwiedzka, et al. 2017). Adapters and low-quality bases were trimmed using Trimmomatic version 0.32 with the following parameters: PE -phred33 ILLUMINACLIP:NexteraPE-PE.fa:2:30:10:1:true LEADING:3 TRAILING:6 SLIDINGWINDOW:4:15 MINLEN:36 (Bolger, et al. 2014).
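To make the quality-trimming steps concrete, the following Python sketch applies a simplified version of LEADING:3, TRAILING:6, SLIDINGWINDOW:4:15 and MINLEN:36 in that order to a list of Phred scores. It is an illustration under stated assumptions, not Trimmomatic's own code: adapter clipping (ILLUMINACLIP) is not modelled, and the read is assumed to be cut at the start of the first 4-base window whose mean quality drops below 15, which may differ in detail from Trimmomatic's behaviour. The function names are hypothetical.

```python
from typing import List, Optional


def phred33_to_scores(qual: str) -> List[int]:
    """Decode a Phred+33 quality string into integer scores."""
    return [ord(c) - 33 for c in qual]


def simple_trim(scores: List[int],
                leading: int = 3,
                trailing: int = 6,
                window: int = 4,
                window_min_mean: float = 15.0,
                min_len: int = 36) -> Optional[slice]:
    """Return the slice of the read to keep, or None if the read is dropped.

    Simplified illustration of LEADING:3 TRAILING:6 SLIDINGWINDOW:4:15
    MINLEN:36; adapter clipping (ILLUMINACLIP) is not modelled.
    """
    start, end = 0, len(scores)
    # LEADING: remove 5' bases with quality below the threshold
    while start < end and scores[start] < leading:
        start += 1
    # TRAILING: remove 3' bases with quality below the threshold
    while end > start and scores[end - 1] < trailing:
        end -= 1
    # SLIDINGWINDOW (assumed cut point): scan 5'->3' and cut at the start of
    # the first window whose mean quality falls below the threshold
    for i in range(start, end - window + 1):
        if sum(scores[i:i + window]) / window < window_min_mean:
            end = i
            break
    # MINLEN: drop reads that are too short after trimming
    if end - start < min_len:
        return None
    return slice(start, end)


if __name__ == "__main__":
    read_quals = phred33_to_scores("I" * 40 + "+" * 10)  # 40 x Q40, then 10 x Q10
    print(simple_trim(read_quals))  # slice(0, 40, None)
```

Applying the steps in the order given matters: trimming the low-quality ends first can change which window first fails the sliding-window test.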
Assembly of ABR Loki- and Thorarchaeota
Samples from the same depth were assembled together using IDBA-UD (Peng, et al. 2012) (version 1.1.1-384, --maxk 124 -r <MERGED_READS>), producing four different assemblies (S1: MM1/PM1, S2: MM2/PM2, S3: MM3/PM3, S4: MM4/PM4). Assemblies S3 and S4 were particularly interesting as they showed the highest lokiarchaeal diversity. However, contigs of some lokiarchaeal members were highly fragmented, probably due to the low abundance of these organisms. In an attempt to produce longer contigs, we co-assembled the reads assigned to Asgard archaea from samples MM3, PM3, MM4 and PM4. Asgard archaea reads were identified using Clark (version 1.2.3, -m 0) (Ounit, et al. 2015) and Bowtie2 (version 2.2.4, default parameters) (Langmead and Salzberg 2012) against a customized Asgard archaea database. Classified reads were extracted and co-assembled using SPAdes (version 3.9.0, --careful) (Bankevich, et al. 2012).
In brief, the Asgard database was composed of Asgard genomes publicly available as of February 2017. Clark does not perform well when the organisms present in the samples of interest are not highly similar to those in the provided database. To increase classification sensitivity, we included in our database low-quality Asgard MAGs (with highly fragmented contigs) generated from assemblies S3 and S4 using CONCOCT (Alneberg, et al. 2014). Coverage profiles required by CONCOCT were estimated using kallisto (version 0.43.0, quant --plaintext) (Bray, et al. 2016). All available samples from the same location (MM1, PM1, MM2, PM2, MM3, PM3, MM4, PM4) were mapped independently against assemblies S3 and S4. For each assembly, MAGs were reconstructed using two different minimum contig length thresholds (2000 and 3000 bp). We used the number of contigs containing clusters of ribosomal proteins (ribocontigs) as a proxy for the microbial diversity present in the community. The maximum number of clusters (-c option in CONCOCT) was set to approximately 2.5 times the estimated number of species in the sample (Johannes Alneberg, personal communication), resulting in 900 and 600 for S3 and S4, respectively. Potential Asgard archaea bins were identified based on the presence of ribocontigs classified as Asgard archaea and were included in the database.
Binning of ABR Loki- and Thorarchaeota
Several binning tools with different settings were run independently. CONCOCT_2000: version 0.4.0, --read_length 200 and a minimum contig length of 2000. CONCOCT_3000: version 0.4.0, --read_length 200 and a minimum contig length of 3000. In both cases, coverage files were created by mapping all 8 samples against the co-assembly using kallisto. MaxBin2: version 2.2.1, -min_contig_length 2000 -markerset 40 -plotmarker (Wu, et al. 2016); the 8 samples were mapped against the co-assembly using Bowtie2, and coverage was estimated using the provided getabund.pl script. MyCC_4mer: 4mer -t 2000 (Lin and Liao 2016). MyCC_56mer: 56mer -t 2000. For both MyCC runs, coverage profiles were obtained as described in the authors' manual.
The results of these five binning methods were combined into a consensus: contigs were assigned to bins if they had been classified as the same organism by at least 3 out of 5 methods. The resulting bins were manually inspected and cleaned further using mmgenome (Albertsen, et al. 2013). Completeness and redundancy were computed using CheckM (Parks, et al. 2015).
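The 3-of-5 consensus rule can be sketched as a majority vote per contig. This minimal Python example assumes (the text does not say how this was done) that bins from the different tools have already been matched to shared organism labels; the function name, contig IDs and bin labels are hypothetical. With five tools and a threshold of three, at most one label can reach the threshold, so the vote can never be tied.

```python
from collections import Counter
from typing import Dict, Mapping


def consensus_bins(assignments: Mapping[str, Mapping[str, str]],
                   min_votes: int = 3) -> Dict[str, str]:
    """Assign a contig to a bin only if >= min_votes tools agree on its label.

    assignments: tool name -> {contig id -> organism label}, where labels are
    assumed to be already harmonised across tools (hypothetical input).
    """
    votes: Dict[str, Counter] = {}
    for tool_bins in assignments.values():
        for contig, label in tool_bins.items():
            votes.setdefault(contig, Counter())[label] += 1
    consensus: Dict[str, str] = {}
    for contig, counter in votes.items():
        label, n = counter.most_common(1)[0]
        if n >= min_votes:
            consensus[contig] = label
    return consensus


if __name__ == "__main__":
    # Hypothetical contig IDs and bin labels for the five binning runs
    runs = {
        "CONCOCT_2000": {"c1": "binA", "c2": "binA", "c3": "binB"},
        "CONCOCT_3000": {"c1": "binA", "c3": "binB"},
        "MaxBin2": {"c1": "binA", "c2": "binB"},
        "MyCC_4mer": {"c2": "binA", "c3": "binA"},
        "MyCC_56mer": {"c3": "binB"},
    }
    print(consensus_bins(runs))  # {'c1': 'binA', 'c3': 'binB'}
```

Contig c2 is left unbinned because its best label receives only two votes.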
Sampling and sequencing of OWC Thorarchaeota
Eight soil samples were collected from the Old Woman Creek (OWC) National Estuarine Research Reserve, and DNA was extracted as described previously (Narrowe, et al. 2017). Library preparation and five lanes of Illumina HiSeq 2x125 bp sequencing followed standard operating procedures at the US DOE Joint Genome Institute (GOLD study ID Gs0114821). For sample M3-C4-D3, extraction and library preparation were replicated and two lanes of sequencing were performed; reads were combined before downstream analysis. For 3 additional samples (M3-C4-D4, O3-C3-D3, O3-C3-D4), one lane of sequencing was performed. For the other 4 samples (M3-C5-D1, M3-C5-D2, M3-C5-D3, M3-C5-D4), DNA was sheared to 300 bp with a Covaris S220, metagenomic sequencing libraries were prepared using the Nugen Ovation Ultralow Prep kit, and all four samples were multiplexed on one lane of Illumina HiSeq 2x125 bp sequencing at the University of Colorado Denver Anschutz Medical Campus Genomics and Microarray Core.
Assembly and binning of OWC Thorarchaeota
For initial assembly of the 5 full-lane sequencing runs, adapter removal, read filtering and trimming were performed using BBDuk (sourceforge.net/projects/bbmap) with ktrim=r, minlen=40, minlenfraction=0.6, mink=11 tbo, tpe k=23, hdist=1 hdist2=1 ftm=5, maq=8, maxns=1, minlen=40, minlenfraction=0.6, k=27, hdist=1, trimq=12, qtrim=rl. Filtered reads were assembled using MEGAHIT (Li, et al. 2015) version 1.0.6 with --k-list 23,43,63,83,103,123.
The individual metagenome from the O3-C4-D3 sample was binned using Emergent Self-Organizing Maps (ESOM) (Dick, et al. 2009) of tetranucleotide frequency (5 kb contigs, 3 kb windows). BLAST hits of predicted proteins identified a Thorarchaeota population bin. All scaffolds containing a window in this bin were used as a mapping reference, and reads from the 9 OWC libraries were mapped to this bin using bbsplit with default parameters (sourceforge.net/projects/bbmap). The mapped reads were reassembled using SPAdes version 3.9.0 with --careful -k 21,33,55,77,95,105,115,125 (Bankevich, et al. 2012). Finally, the reads used as input to the reassembly were mapped to the assembled scaffolds using Bowtie 2 (Langmead and Salzberg 2012) to generate a coverage profile, which was used to manually identify bins using Anvi’o (Eren, et al. 2015). Proteins were predicted using Prodigal (Hyatt, et al. 2010) and searched against UniRef90 release 11-2016 (Suzek, et al. 2015), and the taxonomy of best BLAST hits was used to validate contigs as probable Thorarchaeota. Contigs with no top hit to the publicly available Thorarchaeota genomes were manually examined and removed if they could be assigned to another genome bin in the larger metagenomic assembly. Genome completeness and contamination were estimated using CheckM (Parks, et al. 2015).
Identification of diphthamide biosynthesis genes and EF-2 homologs in eukaryotes and archaea
The EGGNOG members dataset (available at http://eggnogdb.embl.de/#/app/downloads) was surveyed for sequences corresponding to the following clusters of orthologous groups (COG): EF-2, COG0480; DPH1/DPH2, COG1736; DPH3, COG5216; DPH4, COG0484; DPH5, COG1798; DPH6, COG2102; and DPH7, ENOG4111MMJ. For genomes not represented in EGGNOG, we manually inspected publicly available genomes as indicated by ‘orthology assignment source’ (Supplementary File S1). Similarly, an in-house arCOG dataset, modeled after the publicly available arCOGs from Makarova et al. (Makarova, et al. 2015), was queried for the corresponding COG distribution in relevant archaeal genomes. Finally, aEF-2 and aEF-2p genes in Thorarchaeota OWC Bins 2, 3 and 5 were identified using HMMER version 3.1b2 (hmmsearch --cut_tc) (Eddy 2011) against PFAM models PF00679 (EF-G_C) and PF03764 (EFG_IV). Conserved synteny surrounding the Thorarchaeota aEF-2p gene was used to further search for partial aEF-2p genes. In addition, all contigs with HMM hits to dph2 and dph5 in the full OWC assembly were manually examined for potential thorarchaeal dph genes; none were identified.
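As an illustration of the presence/absence survey, the sketch below tallies EF-2 family hits and missing dph orthologous groups for one genome, using the COG/ENOG identifiers listed above. It assumes each annotated protein has already been assigned a single orthologous-group ID; that input format and the function name are hypothetical. Membership in a group is only a first pass: COG0484, for instance, is the broad DnaJ chaperone family, so a hit to it does not by itself demonstrate a dph4 ortholog.

```python
from typing import List, Sequence, Tuple

# Orthologous-group IDs as listed in the Methods
EF2_COG = "COG0480"
DPH_COGS = {
    "DPH1/DPH2": "COG1736",
    "DPH3": "COG5216",
    "DPH4": "COG0484",
    "DPH5": "COG1798",
    "DPH6": "COG2102",
    "DPH7": "ENOG4111MMJ",
}
# The three dph genes known in archaea
ARCHAEAL_DPH = ("DPH1/DPH2", "DPH5", "DPH6")


def survey_genome(cog_hits: Sequence[str],
                  required: Sequence[str] = tuple(DPH_COGS)) -> Tuple[int, List[str]]:
    """Return (number of EF-2 family hits, sorted list of missing dph genes).

    cog_hits: one orthologous-group ID per annotated protein (hypothetical
    input format). A present group is only a candidate, not proof of function.
    """
    present = set(cog_hits)
    missing = sorted(name for name in required if DPH_COGS[name] not in present)
    return list(cog_hits).count(EF2_COG), missing


if __name__ == "__main__":
    # Toy genome: two EF-2 family proteins, dph5 only
    toy = ["COG0480", "COG0480", "COG1798", "OTHER"]
    print(survey_genome(toy, ARCHAEAL_DPH))  # (2, ['DPH1/DPH2', 'DPH6'])
```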
Phylogenetic analyses
Elongation factor 2: EF-2 and EF-2 paralogs of Asgard archaea, Korarchaeota and Bathyarchaeota were aligned with a representative set of archaeal EF-2, bacterial EF-G, and eukaryotic EF-2, EFL1 and snRNP homologs using mafft-linsi (Katoh and Standley 2013). Subsequently, poorly aligned ends were removed manually before the alignments were trimmed with trimAl 5% (Capella-Gutierrez, et al. 2009), yielding 871 aligned amino acid positions. Maximum likelihood analyses were performed using IQ-tree with the mixture model LG+C60+R4+F, which was selected among the C-series models based on its Bayesian information criterion (BIC) score by the model test built into IQ-tree. Branch supports were assessed using ultrafast bootstrap approximation as well as with the single branch test (-alrt option).
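Model choice by BIC can be written out explicitly. For a model with maximized log-likelihood ln L and k free parameters fitted to n alignment sites, BIC = -2 ln L + k ln n, and the model with the lowest value is preferred. The Python sketch below applies this rule; the function names, model names and likelihood values are placeholders rather than results from this study, and taking n as the number of aligned positions is an assumption about how the sample size is counted.

```python
import math
from typing import Dict, Tuple


def bic(log_likelihood: float, n_params: int, n_sites: int) -> float:
    """Bayesian information criterion: -2 lnL + k ln(n); lower is better."""
    return -2.0 * log_likelihood + n_params * math.log(n_sites)


def best_model(fits: Dict[str, Tuple[float, int]], n_sites: int) -> str:
    """fits: model name -> (maximised lnL, number of free parameters)."""
    return min(fits, key=lambda name: bic(fits[name][0], fits[name][1], n_sites))


if __name__ == "__main__":
    # Placeholder fits, not values from this study
    fits = {"model_A": (-1000.0, 10), "model_B": (-990.0, 30)}
    print(best_model(fits, n_sites=871))  # model_A
```

In this toy case the better likelihood of model_B does not pay for its 20 extra parameters, because each parameter costs ln(871), about 6.8 BIC units.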
Diphthamide biosynthesis proteins Dph1/Dph2 (IPR016435; arCOG04112) and Dph5 (IPR004551; arCOG04161): Dph1 and Dph2 homologs, as well as Dph5 homologs, of a representative set of eukaryotes were aligned with archaeal Dph1/2 and Dph5 homologs, respectively. Several DPANN genomes contain two genes encoding the CTD and NTD of Dph1/2 (Fig. 1, Supplementary File S1), so Dph1/2 homologs of these organisms had to be concatenated prior to aligning Dph1/2 sequences. Alignments were performed using mafft-linsi and trimmed with BMGE (Criscuolo and Gribaldo 2010) using the BLOSUM30 matrix and an entropy threshold of 0.55. This resulted in final alignments of 170 (Dph1/2) and 221 (Dph5) aligned amino acid positions. Maximum likelihood analyses were performed using IQ-tree (Nguyen, et al. 2015) with the mixture models yielding the lowest BIC: LG+C50+R+F (Dph1/2) and LG+C60+R+F (Dph5), respectively. Branch supports were assessed using ultrafast bootstrap approximation (Hoang, et al. 2018) as well as with the single branch test (-alrt flag).
Concatenated ribosomal proteins: A phylogenetic tree of co-localized ribosomal proteins was inferred using the rp15 pipeline as described previously (Zaremba-Niedzwiedzka, et al. 2017). In brief, archaeal ribosomal proteins encoded in the r-protein gene cluster (requiring a minimum of 11 ribosomal proteins) were aligned with mafft-linsi, trimmed with trimAl using the -gappyout option, concatenated and subjected to maximum likelihood analyses using IQ-tree with the LG+C60+R4+F model, chosen based on best BIC score as described above. Branch supports were assessed using ultrafast bootstrap approximation as well as with the single branch test (-alrt option) in IQ-tree.
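The concatenation step can be sketched as building a supermatrix from per-protein alignments. This minimal Python example keeps taxa carrying at least 11 of the ribosomal proteins, as stated above, and pads missing proteins with gaps. It assumes the per-protein alignments are already trimmed, non-empty and of uniform row length, and it does not check that the proteins come from the same contig, which the rp15 pipeline requires; the function name and toy data are hypothetical.

```python
from typing import Dict


def concatenate(alignments: Dict[str, Dict[str, str]],
                min_proteins: int = 11) -> Dict[str, str]:
    """Build a supermatrix from per-protein alignments.

    alignments: protein name -> {taxon -> trimmed aligned sequence}; every
    alignment is assumed non-empty with rows of equal length. Taxa carrying
    fewer than min_proteins proteins are dropped; missing proteins are padded
    with gaps so that all concatenated rows have the same length.
    """
    proteins = sorted(alignments)
    widths = {p: len(next(iter(alignments[p].values()))) for p in proteins}
    taxa = sorted({t for aln in alignments.values() for t in aln})
    supermatrix: Dict[str, str] = {}
    for taxon in taxa:
        n_present = sum(taxon in alignments[p] for p in proteins)
        if n_present < min_proteins:
            continue
        supermatrix[taxon] = "".join(
            alignments[p].get(taxon, "-" * widths[p]) for p in proteins)
    return supermatrix


if __name__ == "__main__":
    # Toy alignments for three proteins (not real data)
    alns = {"L2": {"t1": "AC", "t2": "A-"}, "L3": {"t1": "GGG"},
            "S10": {"t1": "M", "t2": "K"}}
    print(concatenate(alns, min_proteins=2))  # {'t1': 'ACGGGM', 't2': 'A----K'}
```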
Structural modeling of EF-2 homologs
Structural models of a/eEF-2 proteins and paralogs were generated using the I-TASSER standalone package version 5.1 (Yang, et al. 2015), and visualized and analyzed using UCSF Chimera version 1.11.12 (Pettersen, et al. 2004). The best structural hits to the PDB for each sequence’s top-scoring model were identified using COFACTOR (Roy, et al. 2012). The Drosophila melanogaster eEF-2 structure in complex with the ribosome (PDB: 4V6W) was used as a structural reference, onto which all models were superimposed using Chimera’s MatchMaker.
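For reference, the RMSD values reported in the Results (e.g., between the Heimdallarchaeote LC3 model and PDB 4V6W) summarize the distance between paired atoms after superposition: the square root of the mean squared distance over all pairs. The minimal Python function below computes this for coordinate pairs that are assumed to be already matched and superimposed. It does not reproduce MatchMaker's sequence-guided residue pairing or fitting, and the function name is hypothetical.

```python
import math
from typing import Sequence, Tuple

Coord = Tuple[float, float, float]


def rmsd(a: Sequence[Coord], b: Sequence[Coord]) -> float:
    """RMSD between paired coordinates that are already superimposed.

    a[i] and b[i] must be the same (e.g. C-alpha) atom pair; units follow the
    input (Angstrom for PDB coordinates). No fitting is performed here.
    """
    if len(a) != len(b) or not a:
        raise ValueError("need two non-empty coordinate lists of equal length")
    total = sum((x1 - x2) ** 2 + (y1 - y2) ** 2 + (z1 - z2) ** 2
                for (x1, y1, z1), (x2, y2, z2) in zip(a, b))
    return math.sqrt(total / len(a))


if __name__ == "__main__":
    model = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
    reference = [(1.0, 0.0, 0.0), (2.0, 0.0, 0.0)]
    print(rmsd(model, reference))  # 1.0
```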
Loop motif logos of EF-2 homologs
e/aEF-2 and paralog sequences which were used to generate the EF-2 tree were clustered at 90% amino acid identity using CD-HIT: version 4.6, -c 0.9 -n 5 (Fu, et al. 2012) and the sequence alignment was filtered to retain only cluster centroids. The conserved loop sequences were extracted from the filtered EF-2 alignment using Jalview version 2.10.1 (Waterhouse, et al. 2009), verified by cross-referencing to the structural models, and sequence logos generated on cluster centroids only using WebLogo: version 2.8.2 (weblogo.berkeley.edu) (Crooks, et al. 2004).
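The clustering and logo steps can be approximated as follows. The Python sketch performs CD-HIT-style greedy clustering directly on aligned rows (longest sequence first; a sequence becomes a new centroid unless it is at least 90% identical to an existing one) and computes per-column information content (log2 20 minus Shannon entropy), the quantity that sequence logos display as stack height. Both are simplifications and assumptions: CD-HIT computes identity from its own pairwise alignments normalized by the shorter sequence length, and WebLogo applies a small-sample correction that is omitted here. Function names and toy sequences are hypothetical.

```python
import math
from collections import Counter
from typing import Dict, List


def identity(a: str, b: str) -> float:
    """Fraction of identical residues over columns where neither row has a gap."""
    pairs = [(x, y) for x, y in zip(a, b) if x != "-" and y != "-"]
    if not pairs:
        return 0.0
    return sum(x == y for x, y in pairs) / len(pairs)


def greedy_centroids(aln: Dict[str, str], threshold: float = 0.9) -> List[str]:
    """CD-HIT-like greedy clustering of aligned rows, longest sequence first.

    A row becomes a new centroid unless it is >= threshold identical to an
    existing centroid (identity as defined above, a simplification).
    """
    order = sorted(aln, key=lambda k: len(aln[k].replace("-", "")), reverse=True)
    centroids: List[str] = []
    for name in order:
        if all(identity(aln[name], aln[c]) < threshold for c in centroids):
            centroids.append(name)
    return centroids


def column_information(column: str, alphabet_size: int = 20) -> float:
    """Information content (bits) of one column: log2(alphabet) - entropy.

    Gaps are ignored and no small-sample correction is applied.
    """
    residues = [r for r in column if r != "-"]
    if not residues:
        return 0.0
    n = len(residues)
    entropy = -sum((c / n) * math.log2(c / n) for c in Counter(residues).values())
    return math.log2(alphabet_size) - entropy


if __name__ == "__main__":
    # Toy aligned loop sequences (not real data)
    aln = {"s1": "MKVLHRG", "s2": "MKVLHRG", "s3": "MKALNRG", "s4": "MKVLH--"}
    kept = greedy_centroids(aln)
    print(kept)  # ['s1', 's3']
    columns = ["".join(aln[k][i] for k in kept) for i in range(len(aln["s1"]))]
    print([round(column_information(col), 2) for col in columns])
```

Restricting the logo to centroids keeps clusters of near-identical sequences from inflating the apparent conservation of a column.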
Accession Numbers
Taxonomy and accession numbers for all genes analyzed in this study are listed in Supplementary File S1.
RESULTS
Most Asgard archaea, Korarchaeota and Geoarchaea as well as parabasalids lack diphthamide synthesis genes
It was previously assumed that EF-2 of all eukaryotes and Archaea was uniquely characterized by the presence of diphthamide. To examine whether this assumption still holds when recently sequenced genomes are taken into account, we surveyed 337 archaeal and 168 eukaryotic genomes (File S1) for each of the three known archaeal (de Crécy-Lagard, et al. 2012) and seven eukaryotic (Su, et al. 2012a; Su, et al. 2012b; Uthman, et al. 2013) dph genes. While most archaeal genomes encode clear dph homologues, we failed to detect diphthamide biosynthesis genes in a large diversity of metagenome-assembled genomes (MAGs) of uncultured archaea, including newly assembled MAGs analyzed for this study (Fig. 1, Supplementary Fig. S1, Supplementary File S1). In particular, our analyses showed that, as reported for K. cryptofilum (de Crécy-Lagard, et al. 2012; Elkins, et al. 2008), all Korarchaeota and Geoarchaea as well as nearly all members of the Asgard archaea lack the conserved archaeal diphthamide biosynthesis genes dph1/2, dph5 and dph6. As an exception, Asgard archaea related to the Heimdallarchaeote LC3 clade were found to encode the complete archaeal diphthamide biosynthetic pathway (Fig. 1). Genes coding for Dph5 and Dph6 could not be detected in two Bathyarchaeota draft genomes (RBG_13_46_16b and SG8_32_3). However, it is unclear whether these two genomes are in the process of losing dph biosynthesis genes or whether the absence of dph5 and dph6 genes is due to the incompleteness of these draft genomes. The 168 eukaryotic genomes and high-quality transcriptomes surveyed for dph gene homologs include lineages that have undergone drastic genome reduction, such as microsporidians (Corradi, et al. 2010), diplomonads (Morrison, et al. 2007), and the degenerate nuclei (i.e., nucleomorphs) of secondary plastids in cryptophytes (Lane, et al. 2007) (Supplementary File S1).
We detected dph homologues in all eukaryotic genomes and transcriptomes except those of parabasalid protists, including animal pathogens such as Trichomonas vaginalis, Tritrichomonas foetus and Dientamoeba fragilis (Supplementary File S1). Unless these archaea and parabasalids possess alternative, as yet undiscovered diphthamide biosynthesis pathways, these findings suggest that their cognate EF-2 lacks the modified diphthamide residue. As a peculiarity, while the Dph1/2 protein is encoded by a single fusion gene in most other archaea, we found that in several members of the DPANN archaea (Castelle, et al. 2015; Rinke, et al. 2013) this protein is encoded by two genes that separately code for the N- and C-terminal domains. To our knowledge, this is the first systematic report of the widespread absence of diphthamide biosynthesis in diverse eukaryotes and archaea.
Various archaeal genomes that lack diphthamide biosynthesis genes encode an EF-2 paralog
To shed light on the implications of the potential lack of diphthamide in members of the Asgard archaea and Korarchaeota, we performed detailed analyses of eukaryotic and archaeal EF-2 homologs (Fig. 1). First, we found that the draft genomes of most Asgard archaea, some Korarchaeota (Kor 1 and 3), and a few Bathyarchaeota encode two distantly related EF-2 paralogs. In contrast, the genomes of K. cryptofilum, two novel marine Korarchaeota (Kor 2 and 4), Heimdallarchaeota LC2 and LC3, and Geoarchaea do not encode an EF-2 paralog. Given that the Heimdallarchaeota LC2 genome was estimated to be only 70-79% complete (Zaremba-Niedzwiedzka, et al. 2017), and based on phylogenetic analyses (see below), we consider it possible that this genome encodes an as-yet unassembled aEF-2 paralog. The presence of paralogous aEF-2 in most Asgard archaea and some Korarchaeota genomes corresponds with the absence of diphthamide synthesis genes (Figs. 1 and 2). Yet, even though the genomes of K. cryptofilum, Kor 2, Kor 4, Geoarchaea and Heimdallarchaeote LC2 lack dph genes, they do not encode an EF-2 paralog. In all other archaeal genomes, including that of Heimdallarchaeote LC3, the absence of an EF-2 paralog correlates with the presence of dph genes.
Archaea with two EF-2 family proteins encode only one bona fide EF-2
We next addressed whether residues and structural motifs shown to be necessary for canonical translocation are conserved in the various EF-2 proteins and EF-2 paralogs. Domain IV of EF-2, the anticodon mimicry domain, is critical for the concerted translocation of tRNA and mRNA (Ortiz, et al. 2006; Rodnina, et al. 1997). This domain includes three loops that extend out from the body of EF-2 and interact with the decoding center of the ribosome. Canonical residue positions are numbered here according to the sequence of the D. melanogaster structural model PDB 4V6W (Anger, et al. 2013). The first of these three loops (HxDxxHRG) contains the diphthamide-modified histidine, H701, and is highly conserved across archaea and eukaryotes (Ortiz, et al. 2006; Zhang, et al. 2008). High conservation is also seen in a second, adjacent loop (SPHKHN) in a/eEF-2 domain IV (S581-N586), which contains a lysine residue (K584) that interacts directly with the tRNA at the decoding center and is itself positioned by a stacking interaction between P582 and H585 (Murray, et al. 2016). The third loop appears to stabilize the diphthamide loop, partially via a salt bridge formed between a nearby glutamate residue (E660) and R702 in the diphthamide loop (Anger, et al. 2013). Both of these residues are highly conserved among archaea and eukaryotes.
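Reading residues such as K584, E660, H701, R702 and G703 from a multiple sequence alignment requires translating reference residue numbers into alignment columns. The Python sketch below does this by walking the reference row (here, the D. melanogaster sequence) and skipping its gap columns. The function names are hypothetical, ref_start is an assumed offset for when the first aligned reference residue is not residue 1, and the rows in the usage lines are toy sequences, not data from this study.

```python
from typing import Dict, Iterable


def reference_columns(ref_row: str, ref_start: int = 1) -> Dict[int, int]:
    """Map reference residue numbers to 0-based alignment columns.

    Gap columns in the reference row do not advance the residue number.
    ref_start is the residue number of the first non-gap reference character
    (an assumed offset; set it to match the reference structure's numbering).
    """
    mapping: Dict[int, int] = {}
    resnum = ref_start
    for col, residue in enumerate(ref_row):
        if residue != "-":
            mapping[resnum] = col
            resnum += 1
    return mapping


def residues_at(row: str, ref_row: str, positions: Iterable[int],
                ref_start: int = 1) -> Dict[int, str]:
    """Residue (or '-') found in `row` at each reference-numbered position."""
    cols = reference_columns(ref_row, ref_start)
    return {p: row[cols[p]] for p in positions}


if __name__ == "__main__":
    # Toy rows: a reference HxDxxHRG loop numbered from 696, and a query with
    # an insertion relative to the reference (not real sequences)
    ref = "HVD-ARHRG"
    query = "HVDTAKHTS"
    print(residues_at(query, ref, [701, 702, 703], ref_start=696))
    # {701: 'H', 702: 'T', 703: 'S'}
```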
Our analyses reveal that the sequence motifs in these loops are also strictly conserved among the EF-2 family proteins of the Heimdallarchaeote LC3 lineage and Geoarchaea, as well as in those Korarchaeota and Bathyarchaeota that lack an EF-2 paralog (Fig. 3, Supplementary Fig. S2a). Notably, this conservation is seen irrespective of the presence or absence of dph genes in those genomes. However, in most parabasalids (which lack dph genes), the bona fide EF-2 carries a glycine-to-asparagine substitution at residue 703 (Fig. 3, Supplementary Fig. S2b, Supplementary Fig. S3a), which may compensate for the lack of the diphthamide residue by contributing an amide group (Fig. 3, Supplementary Fig. S3b).
In contrast, in those Asgard archaea and Korarchaeota (Kor 1/3 clade) that encode two EF-2 family proteins, these domain IV motifs show reduced conservation even within the bona fide EF-2 copy. In the diphthamide loop, R702 is universally replaced by a threonine residue. In 21 of 22 aEF-2 proteins, there is a correlated mutation of E660 to either arginine or lysine (Supplementary Fig. S4). Structural homology modeling suggested that these correlated mutations likely prevent unfavorable electrostatic interactions between domain IV loops and maintain stabilization of the diphthamide loop (Supplementary Fig. S4). While G703 is conserved in most archaeal EF-2s, all Lokiarchaeota (except Lokiarchaeota CR_4) encode either a serine or a glutamine at this site (Fig. 3, Supplementary Fig. S2a). Furthermore, analysis of the second loop (S581-N586) revealed additional crucial mutations in the EF-2 of these archaea; notably, K584 is not conserved (Fig. 3, Supplementary Fig. S2a). Despite these modifications, which correlate with the presence of an EF-2 paralog in these archaea, there is still evidence for strong selection pressure maintaining many of the key conserved residues in these domain IV motifs, including H701, the target site of diphthamide modification (Fig. 3, Supplementary Fig. S2a).
In contrast, our analyses of the multiple sequence alignment and structural models suggest that the paralogous EF-2 (aEF-2p) proteins encoded by these archaea lack conservation in the stabilizing second loop (SPHKHN) as well as the first diphthamide loop (HxDxxHRG), including H701 (Fig. 3). Based on predicted fold conservation in domains I and II, and the overall conservation of the five sequence motifs (G1-G5) characterizing GTPase superfamily proteins (Atkinson 2015), aEF-2p likely maintains GTPase activity (Supplementary Fig. S5). However, given the apparent lack of conservation in key domain IV loops, it is unlikely that aEF-2p proteins can serve as functional translocases in protein translation.
EF-2 homologs of archaea experienced complex evolutionary history
To resolve the evolutionary history of EF-2, we performed phylogenetic analyses of archaeal EF-2 (aEF-2) and aEF-2p, bacterial EF-G and eukaryotic EF-2 family proteins, i.e., EF-2, Ria1 (or elongation factor-like, EFL1) and Snu114 (or U5 small nuclear ribonucleoprotein, snRNP/U5-116kD) (Fig. 2) (Atkinson 2015). First, our analyses revealed that sequences from all non-LC3 Asgard archaea and the Kor-1 and -3 marine Korarchaeota formed two distinct clades, one of which contains canonical aEF-2 proteins (as defined by conservation of the domain IV loop known to interact with the ribosomal decoding center during translocation) while the other comprises aEF-2p (Fig. 2). However, the phylogenetic placement of these protein clades relative to each other and within the phylogenetic backbone is not fully resolved due to lack of statistical support. This might be caused by modified (accelerated) evolutionary rates that appear to characterize the evolution of aEF-2 and aEF-2p in lineages that encode a paralog, as indicated by increased relative branch lengths for both the aEF-2 and aEF-2p clades (Fig. 2, Supplementary Files S2 and S3).
Second, bathyarchaeal EF-2 homologs were also found to form two separate clades. One of these clades is placed within the TACK superphylum and includes both canonical bathyarchaeal EF-2s and potential paralogs (i.e., RBG_13_46_16b and SG8_32_3). In contrast, the second clade consists of only two sequences (i.e., RBG_13_46_16b and AD8-1) and is placed as a sister group of all TACK, Asgard and eukaryotic EF-2 homologs (Fig. 2). In spite of this deep placement in the phylogenetic analyses, the second clade comprises the canonical EF-2 homologs of Bathyarchaeota genomes RBG_13_46_16b and AD8-1, based on analysis of key domain IV residues. Currently, only the more complete of these two draft genomes, RBG_13_46_16b, contains an aEF-2 paralog. Therefore, the current data are insufficient to resolve the puzzling pattern of EF-2 evolution in the Bathyarchaeota phylum.
Finally, in our analysis, eEF-2, Ria1 and Snu114 were found to form a highly supported monophyletic group that emerged as a sister group to the aEF-2 proteins encoded by the genomes comprising the Heimdallarchaeote LC3 clade (LC3 and B3). Close inspection of the EF-2 sequence alignment revealed that eukaryotic and LC3 EF-2 homologs share common indels to the exclusion of all other archaeal EF-2 family protein sequences (Supplementary Fig. S6, Supplementary Fig. S7). Notably, these highly conserved indels were found to be encoded by the genomic bins of two distantly related members of the Heimdallarchaeota LC3 lineage, which were independently assembled and binned from geographically distinct metagenomes (Spang, et al. 2015; Zaremba-Niedzwiedzka, et al. 2017). This refutes recent claims that these indels in Heimdallarchaeote LC3 may be the result of contamination from eukaryotes (Da Cunha, et al. 2017), while supporting the sister relationship of eukaryotes and Asgard archaea (Eme, et al. 2017; Spang, et al.; Spang, et al. 2015; Zaremba-Niedzwiedzka, et al. 2017). In addition, despite the low sequence identity of 39%, the high-confidence modeled structure of Heimdallarchaeote LC3 EF-2 was highly similar to Drosophila melanogaster eEF-2 (root-mean-square deviation (RMSD) of 1.3 Å across all 796 residues to D. melanogaster structural model PDB 4V6W (Anger, et al. 2013); Supplementary File S1). By comparison, the Heimdallarchaeote AB-125 model aligns less confidently to the Drosophila EF-2 structure (RMSD 16.4 Å). The observed phylogenetic topology and the presence of the full complement of dph biosynthesis genes in LC3 genomes (Figs. 1 and 2) support an evolutionary scenario in which Heimdallarchaeote LC3 and eukaryotes share a common ancestry, with EF-2 being vertically inherited from this archaeal ancestor.
DISCUSSION
The use of metagenomic approaches has led to an expansion of genomic data from a large diversity of previously unknown archaeal and bacterial lineages and has changed our perception of the tree of life, microbial metabolic diversity and evolution, as well as the origin of eukaryotes (Brown, et al. 2015; Castelle, et al. 2015; Hug, et al. 2016; Parks, et al. 2017; Spang, et al. 2015; Zaremba-Niedzwiedzka, et al. 2017). Since most of what is known about archaeal information-processing machineries is based on a few model organisms, we aimed to use this expansion of genomic data to investigate key elements of the translational machinery, EF-2 and diphthamide modification, across the tree of life.
Our analyses of archaeal EF-2 family proteins and the distribution of diphthamide biosynthesis genes have revealed unusual features of the core translation machinery in several archaeal lineages. These findings negate two long-held assumptions regarding the archaeal and eukaryotic translation machineries, with both functional and evolutionary implications. First, we show that diphthamide modification is not universally conserved across Archaea and eukaryotes. Second, we demonstrate that, much like Bacteria and eukaryotes (Atkinson 2015), the archaeal EF-2 protein family has undergone several gene duplication events, presumably coupled to functional differentiation of EF-2 paralogs, throughout archaeal evolution.
The evolution of archaeal diphthamide biosynthesis and EF-2 is especially intriguing in the context of eukaryogenesis. Recent findings based on comparative genomics indicate that eukaryotes evolved from a symbiosis between an alphaproteobacterium and an archaeal host that shares a most recent common ancestor with extant members of the Asgard archaea, possibly a Heimdallarchaeota-related lineage (Spang, et al. 2015; Zaremba-Niedzwiedzka, et al. 2017). Our study adds data supporting this scenario by revealing close sequence and predicted structural similarity, including shared indels, between canonical EF-2 proteins of the Heimdallarchaeote LC3 lineage and eukaryotic EF-2 proteins. Furthermore, phylogenetic analyses of EF-2 family proteins reveal that EF-2 of the Heimdallarchaeote LC3 lineage forms a monophyletic group with EF-2 family proteins of eukaryotes, and therefore suggest that the archaeal ancestor of eukaryotes was equipped with an EF-2 protein similar to the homologs found in this lineage. The subsequent evolution of the eukaryotic EF-2 family appears to have included at least two ancient duplication events leading to Ria1 and Snu114. Importantly, the presence of characteristic eukaryotic indels in EF-2 of all members of the Heimdallarchaeote LC3 lineage further strengthens this hypothesis and underlines that concerns raised about the quality of these genomic bins (Da Cunha, et al. 2017) are unjustified (Spang, et al.).
In addition, the LC3 clade is the only group within the Asgard archaea that encodes the full complement of archaeal diphthamide biosynthesis pathway genes. However, while phylogenetic analyses of Dph1/2 show weak support for a sister relationship between Heimdallarchaeota and eukaryotes, eukaryotic Dph5 appears to be most closely related to homologs of Woesearchaeota (Supplementary Fig. S8, Supplementary File S3), an archaeal lineage belonging to the proposed DPANN superphylum (Castelle, et al. 2015; Rinke, et al. 2013; Williams, et al. 2017), which comprises various additional lineages with putatively symbiotic and/or parasitic members (reviewed in Spang, et al. 2017). Notably, a previous study has also revealed an affiliation of some eukaryotic tRNA synthetases with DPANN archaea (Furukawa, et al. 2017). Given that several DPANN lineages infect or closely associate with other archaeal lineages, they may frequently exchange genes with their hosts, as has been shown for Nanoarchaeum equitans and its crenarchaeal host Ignicoccus hospitalis (Podar, et al. 2008). Following similar reasoning, the archaeal ancestor of eukaryotes (i.e. a relative of the Asgard archaea) may have acquired genes (e.g. dph5) from an ancestral DPANN/Woesearchaeota symbiont. However, further analyses and genomic data from additional members of the Asgard and DPANN archaea are needed to test this hypothesis and to clarify the origin of diphthamide biosynthesis genes in eukaryotes.
Furthermore, our findings have practical implications for phylogenetic and metagenomic analyses. EF-2 has been widely used as a phylogenetic marker to assess the relationships between Archaea, Bacteria and eukaryotes, both in single-gene analyses (Baldauf, et al. 1996; Elkins, et al. 2008; Hashimoto and Hasegawa 1996; Iwabe, et al. 1989) and in multi-gene alignments of universal single-copy genes (e.g. Guy, et al. 2014; Raymann, et al. 2015; Williams, et al. 2012). However, the presence of EF-2 paralogs in various Archaea and eukaryotes suggests that EF-2 should be excluded from such datasets. In addition, EF-2, Dph1/2, and Dph5 are part of single-copy marker gene sets routinely used to estimate the completeness and purity of archaeal metagenomic bins (Parks, et al. 2015; Wu and Scott 2012). The duplication of aEF-2, the absence of dph genes in most Asgard archaea, Geoarchaea and Korarchaeota, and the presence of two split genes encoding Dph1/2 in DPANN archaea make these genes unsuitable as markers; they should therefore be excluded from marker gene sets used to assess genome completeness.
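To illustrate how such genes can bias marker-based quality estimates, the following is a minimal Python sketch. It assumes a deliberately simplified counting scheme (completeness as the percentage of markers detected, redundancy as the percentage of extra marker copies); it is not the lineage-specific, collocated marker set algorithm of CheckM (Parks, et al. 2015), and the marker names, the function `estimate_bin_quality`, and the example genome are all hypothetical.

```python
from typing import Dict, Iterable, Tuple


def estimate_bin_quality(
    markers: Iterable[str],
    gene_counts: Dict[str, int],
    exclude: Iterable[str] = (),
) -> Tuple[float, float]:
    """Hypothetical, simplified completeness/redundancy estimate (in %).

    completeness = markers detected at least once / markers used
    redundancy   = extra copies beyond the first / markers used
    """
    excluded = set(exclude)
    used = [m for m in markers if m not in excluded]
    if not used:
        raise ValueError("marker set is empty after exclusions")
    present = sum(1 for m in used if gene_counts.get(m, 0) > 0)
    extra = sum(max(gene_counts.get(m, 0) - 1, 0) for m in used)
    return 100.0 * present / len(used), 100.0 * extra / len(used)


# Hypothetical marker set: seven generic markers plus EF-2, Dph1/2 and Dph5.
MARKERS = [f"marker{i}" for i in range(1, 8)] + ["EF-2", "Dph1/2", "Dph5"]

# Hypothetical, fully complete genome of a dph-lacking lineage that
# carries two EF-2 family genes (canonical aEF-2 plus a paralog).
counts = {f"marker{i}": 1 for i in range(1, 8)}
counts["EF-2"] = 2

print(estimate_bin_quality(MARKERS, counts))  # (80.0, 10.0)
print(estimate_bin_quality(MARKERS, counts, exclude=["EF-2", "Dph1/2", "Dph5"]))  # (100.0, 0.0)
```

In this toy case, the genuinely complete genome is scored as only 80% complete because of the missing dph genes, and as 10% redundant because of the EF-2 paralog; removing the three genes from the marker set restores the expected values.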
The observed absence of dph biosynthesis genes in various Archaea as well as in parabasalids is surprising, given that diphthamide was previously thought to be a conserved feature across Archaea and eukaryotes (Schaffrath, et al. 2014) and critical for ensuring translational fidelity (Ortiz, et al. 2006). We currently cannot exclude the possibility that dph-lacking archaea and parabasalids carry out the multi-step process of diphthamidylation using yet-unknown enzymes, and future proteomic studies will be needed to determine conclusively whether diphthamide is present in these taxa. However, it seems more likely that these groups have evolved one or more alternative mechanisms to fulfill the proposed roles of diphthamide in translation.
Many of the dph-lacking archaeal genomes encode two paralogs of the aEF-2 gene. Our sequence and structural modeling analyses imply that canonical aEF-2 proteins in these lineages, despite lacking diphthamide, are likely under strong selective pressure to maintain translocase function. In contrast, our analyses of aEF-2p suggest that, while this paralog is a member of the translational GTPase superfamily, it is unlikely to function in the same manner as canonical aEF-2. In fact, the complete lack of sequence conservation in the key domain IV loop residues of aEF-2p indicates that these paralogs are unlikely to act as translocases (Fig. 3, Supplementary Fig. S2a) (Ortiz, et al. 2006; Rodnina, et al. 1997) and instead perform alternative roles. For instance, aEF-2p may compensate for the absence of diphthamide in at least some dph-lacking lineages. However, other functions for aEF-2p, such as error-correcting back-translocation or ribosome recycling, also seem possible, given the sub- and neo-functionalizations observed in eukaryotic and bacterial EF-2/EF-G paralogs (Qin, et al. 2006; Tsuboi, et al. 2009). Alternatively, given the proposed regulation of translation via ADP-ribosylation of diphthamide (Schaffrath, et al. 2014) and a role of diphthamide in the response to oxidative stress (Argüelles, et al. 2014; Argüelles, et al. 2013), aEF-2p could perform another, as yet unknown role in translation regulation.
Currently, the consequences of the absence of dph biosynthesis genes in parabasalids and in several Archaea remain unclear. Future studies could address this question by examining translation in the genetically tractable parabasalid Trichomonas vaginalis, whose cell biology and metabolism have been extensively studied. In addition, additional sequencing data or enrichment cultures from members of the Asgard superphylum, Korarchaeota, and other novel archaeal lineages should improve our understanding of the evolution and function of EF-2 family proteins and of the absence of dph biosynthesis genes.
ACKNOWLEDGEMENTS
We thank Jordan Angle, Kay Stefanik, Rebecca Daly, and Kelly Wrighton for assistance with sampling of OWC sediments, and Felix Homa for computational support. Sequencing of OWC metagenomes was conducted in part by the U.S. Department of Energy Joint Genome Institute, a DOE Office of Science User Facility that is supported by the Office of Science of the U.S. Department of Energy under Contract No. DE-AC02-05CH11231. Sequencing of Aarhus Bay metagenomes was performed by the National Genomics Infrastructure sequencing platforms at the Science for Life Laboratory at Uppsala University, a national infrastructure supported by the Swedish Research Council (VR-RFI) and the Knut and Alice Wallenberg Foundation. We thank the Uppsala Multidisciplinary Center for Advanced Computational Science (UPPMAX) at Uppsala University and the Swedish National Infrastructure for Computing (SNIC) at the PDC Center for High-Performance Computing for providing computational resources. This work was supported by grants from the European Research Council (ERC Starting grant 310039-PUZZLE_CELL), the Swedish Foundation for Strategic Research (SSF-FFL5) and the Swedish Research Council (VR grant 2015-04959) to T.J.G.E. C.W.S. is supported by a European Molecular Biology Organisation long-term fellowship (ALTF-997-2015) and a Natural Sciences and Engineering Research Council of Canada postdoctoral research fellowship (PDF-487174-2016).