Abstract
Genes and genomes can evolve through interchanging genetic material, this leading to reticular evolutionary patterns. However, the importance of reticulate evolution in eukaryotes, and in particular of horizontal gene transfer (HGT), remains controversial. Given that metabolic pathways with taxonomically-patchy distributions can be indicative of HGT events, the eukaryotic nitrate assimilation pathway is an ideal object of investigation, as previous results revealed a patchy distribution and suggested one crucial HGT event. We studied the evolution of this pathway through both multi-scale bioinformatic and experimental approaches. Our taxon-rich genomic screening shows this pathway to be present in more lineages than previously proposed and that nitrate assimilation is restricted to autotrophs and to distinct osmotrophic groups. Our phylogenies show a pervasive role of HGT, with three bacterial transfers contributing to the pathway origin, and at least seven well-supported transfers between eukaryotes. Our results, based on a larger dataset, differ from the previously proposed transfer of a nitrate assimilation cluster from Oomycota (Stramenopiles) to Dikarya (Fungi, Opisthokonta). We propose a complex HGT path involving at least two cluster transfers between Stramenopiles and Opisthokonta. We also found that gene fusion played an essential role in this evolutionary history, underlying the origin of the canonical eukaryotic nitrate reductase, and of a novel nitrate reductase in Ichthyosporea (Opisthokonta). We show that the ichthyosporean pathway, including this novel nitrate reductase, is physiologically active and transcriptionally co-regulated, responding to different nitrogen sources; similarly to distant eukaryotes with independent HGT-acquisitions of the pathway. This indicates that this pattern of transcriptional control evolved convergently in eukaryotes, favoring the proper integration of the pathway in the metabolic landscape. Our results highlight the importance of reticulate evolution in eukaryotes, by showing the crucial contribution of HGT and gene fusion in the evolutionary history of the nitrate assimilation pathway.
Introduction
One of the most significant advances in evolution was the realization that lineages, either genes or genomes, can also evolve through interchanging genetic material, this leading to reticulate evolutionary patterns [1,2]. Reticulate evolution, and in particular horizontal gene transfer (HGT), is widely accepted as an important mechanism in prokaryotes [3]. However, its occurrence is still subject to controversy in eukaryotes, and its prevalence and mechanistic basis are active areas of study [4,5]. The finding of homologous genes in distantly related lineages may suggest the occurrence of HGT events [6]. However, taxonomically-patchy distributed genes can also be the result of secondary losses. Hence, the most accurate methodology for HGT detection consists of finding topological incongruences between the reconstructed phylogenetic trees and the species phylogeny [7].
Adaptation to new environments requires metabolic remodeling, and HGT of metabolic genes between prokaryotes occurs at a higher rate than that of informational genes [8]; which may facilitate the recipients’ rapid adaptation [9]. Numerous metabolic pathways in eukaryotes are of bacterial origin [6], transferred from endosymbionts [10]; and many proposed HGTs between eukaryotes also involve metabolic genes [11–13]. Hence, patchily distributed metabolic pathways make good candidate subjects for investigation into possible events of HGT in eukaryotes.
The eukaryotic nitrate assimilation pathway is strikingly patchily distributed [14]. The ability to use nitrate as a nitrogen source is not essential, but valuable in nitrate-rich environments [15,16]. In order to reduce nitrate to ammonium, a specific pathway is required, involving minimally a nitrate transporter, a nitrate reductase and a nitrite reductase (Nitrate Assimilation Proteins, NAPs) [17]. In eukaryotes, NAPs were characterized in plants and fungi and were later identified in other eukaryotes, including green and red alga, diatoms and Oomycota [14,18]. A study published a decade ago proposed that the nitrate assimilation pathway characteristic of many fungal species originated in a stramenopiles lineage leading to Oomycota, and then was transferred as a cluster from Oomycota to the root of Dikarya (Fungi). The authors also hypothesized that the acquisition of this metabolic pathway might have been an important innovation for the colonization of dry land by this fungal group [14]. However, the absence of genomic data from many eukaryotic groups left uncertainty surrounding this proposed HGT event as well as the degree to which HGT influenced the evolutionary history of this pathway in eukaryotes. We therefore performed an extensive survey of NAPs and NAP clusters in order to understand the origins and the evolution of the eukaryotic nitrate assimilation pathway.
Our updated taxon sampling extends the presence of this ecologically-relevant pathway to many previously unsampled lineages, showing a patchy distribution that overlaps with the distribution of autotrophy and osmotrophy in the eukaryotic tree. The reconstructed history indicates a pervasive role of HGT underlying this patchy distribution, with three independent bacterial transfers contributing to the origins of the pathway and at least seven well-supported transfers of NAPs and NAP clusters between eukaryotes. Our results do not agree with the proposed origin and transfer of a NAP cluster from Oomycota to dikaryotic fungi. Instead, we propose that the NAP cluster was assembled in a common ancestor of Alveolata and Stramenopiles, with Fungi vertically inheriting it from a stramenopiles transfer to an ancestral opisthokont. We also propose a horizontal origin of the Oomycota NAP cluster from Ichthyosporea, a group of unicellular relatives of animals. Gene fusion was also crucial in the evolution of this pathway, underlying the origin of the canonical eukaryotic nitrate reductase; as well as of a nitrate reductase of chimeric origins found in the NAP clusters of two ichthyosporeans. Finally, we demonstrate that this cluster is functional in the ichthyosporean Sphaeroforma arctica, with NAPs showing a strong co-regulation in response to environmental nitrogen sources. The similarities of this transcriptional control with that shown for many lineages with distinct horizontal acquisitions of the pathway indicate that this regulatory response has convergently evolved multiple times in eukaryotes.
Results
NAP genes in eukaryotes
The minimal metabolic pathway required to incorporate nitrate into the cell and reduce it into ammonium includes a nitrate transporter, a nitrate reductase and a nitrite reductase (Fig 1) [18]. The nitrate transporter NRT2 and the nitrate reductase EUKNR are involved in the first two steps of the pathway in all the eukaryotes where this metabolism has been studied. For the third and last step of the pathway, two nitrite reductases have been characterized in eukaryotes: a chloroplastic ferredoxin-dependent enzyme (Fd-NIR, characterized in land plants and green algae); and a cytoplasmic NAD(P)H dependent cytosolic enzyme (NAD[P]H-NIR, characterized in fungi).
We screened NAP genes in a taxon sampling designed to cover the broadest possible eukaryotic diversity (Fig 2). Among the 60 taxa with at least one NAP gene detected, 47 have the complete pathway (i.e. the transporter, the nitrate reductase and one of the two nitrite reductases; see Supplementary Fig 1 and Supplementary Table 1. All supplementary tables are in Supplementary File 2; supplementary files are accessible in https://figshare.com/s/d11b23d7928009e2d508). The distribution of NAP genes across eukaryotes is highly correlated, as it would be expected for genes involved in the same pathway (Supplementary Fig 2A). However, considering only taxa with at least one NAP gene, the two nitrite reductases, NAD(P)H-nir and Fd-nir, show an almost completely anti-correlated distribution (Supplementary Fig 2B). Fd-nir is restricted to autotrophic lineages (including facultative autotrophs), as expected for a chloroplastic enzyme [19]. In contrast, NAD(P)H-nir is mostly distributed along heterotrophs, except for the myzozoans Symbiodinium minutum and Vitrella brassicaformis, in which the Fd-nir is absent; and four Ochrophyta species, in which both nitrite reductases are present.
The widespread and patchy distribution of NAP genes is correlated with the distribution of different nutrient acquisition strategies within the eukaryotic tree (Supplementary Fig 2C). We found NAP genes in all the sampled autotrophs (see green circles in Fig 2). This includes taxa from groups whose plastid originated from a cyanobacterial endosymbiont (primary plastids): Glaucophyta, Rhodophyta and Chloroplastida; as well as algal groups whose plastid originated from an eukaryotic endosymbiont (complex plastids): Haptophyta, Cryptophyta, Chlorarachniophyta, S. minutum, V. brassicaformis and Ochrophyta [20]. Among heterotrophs, we found the complete pathway in Fungi and Oomycota, as reported in previous studies, but also in Teretosporea and Labyrinthulea. These groups are phylogenetically distant but share many analogous cellular and ecological features related to their proposed convergent evolution towards an osmotrophic lifestyle [21] (Fungal-like osmotrophs, see brown circles in Fig 2). We did not find the entire nitrate assimilation pathway in any of the phagotrophic lineages sampled (Supplementary Fig 2C).
The distinct origins of NAP genes in eukaryotes
Previous studies proposed a bacterial origin for the transporter and the two nitrite reductases [14]. However, which particular bacterial group(s) were the possible donors of these three NAP genes was not determined. We investigated the origin of Fd-nir, NAD(P)H-nir and nrt2 in eukaryotes using a comprehensive and taxonomically representative prokaryotic dataset.
The bacterial donors of Fd-nir, NAD(P)H-nir and nrt2
The reconstructed phylogenies of these three NAPs with prokaryotic homologs show in all cases a well-supported monophyletic clade that includes eukaryotic sequences branching distantly to archaeal ones (Fig 3). These suggest that each NAP descends from a single eukaryotic acquisition, and also that they were not vertically inherited from Archaea, but horizontally transferred from Bacteria to eukaryotes.
Our phylogeny supports that the eukaryotic nitrite reductase Fd-NIR descends from Cyanobacteria, as was previously suggested based on sequence-similarity analyses [22] (100% UFboot; Fig 3 and Supplementary Fig 3). Because of its cyanobacterial origin and given that Fd-NIR activity has been located in the chloroplast, it is tempting to propose that Fd-nir was transferred to eukaryotes from the cyanobacterial endosymbiont from which all the primary plastids originated. However, not all the proteins of plastidic activity originated from this organelle [23], so it remains unclear whether Fd-nir originated from the endosymbiont or not. If Fd-nir is of plastidic origin, we would then expect a similar phylogenetic position of the eukaryotic Fd-NIR in relation to Cyanobacteria than in the phylogenies of the photosystem II subunit III and the ribosomal protein L1; two genes of bona fide plastidic origin (encoded in the plastid genome of Cyanophora paradoxa [24]). The branching pattern of eukaryotic sequences in Fd-NIR and the two plastidic genes phylogenies suggest an early-branching cyanobacterial lineage as the donor in all cases (Supplementary Figs 3, 4 and 5). Notwithstanding whether plastids originated from an early or a deep cyanobacterial lineage [24,25], we interpret the similar phylogenetic relationships between eukaryotes and Cyanobacteria in all our phylogenies as moderate support for a plastidic origin for Fd-nir. In all the sampled taxa we found Fd-nir in genomic sequences corresponding to the nuclear genome. This indicates that Fd-nir would have been transferred to the nucleus before the divergence of all primary algal lineages, as indeed occurred with many proteins of plastidic activity [10].
A cyanobacterial origin is unlikely for NAD(P)H-nir and nrt2 (Fig 3). In the NAD(P)H-NIR phylogeny (Fig 3 and Supplementary Fig 6), the sister-group position of Planctomycetes to all eukaryotes suggest that this cytoplasmic nitrite reductase originated in eukaryotes through a HGT from this marine bacterial group. Finally, the phylogeny of NRT2 does not support any particular bacterial lineage as the donor of this nitrate transporter to eukaryotes (Fig 3 and Supplementary Fig 7).
EUKNR originated by gene fusion
In contrast to Fd-nir, NAD(P)H-nir and nrt2, the distribution of euknr is restricted to eukaryotes. The exclusive combination of protein domains shown by this nitrate reductase suggests a chimeric origin involving the fusion of different proteins. Hence, we used a sequence similarity network-based approach [2] to investigate which ancestral protein families were involved in EUKNR origins. A first network between EUKNR and similar eukaryotic and prokaryotic sequences was constructed (Fig 4A; see Materials and methods section for details about the network construction process). The topology of the network shows five different clusters, each one representing a specific protein family, namely, bacterial sulfite oxidases (SUOX), eukaryotic SUOX with a Cytochrome b5 (Cyt-b5) domain, eukaryotic SUOX without a Cyt-b5 domain, EUKNR, and NADH reductases (Fig 4A). The pattern connecting the EUKNR with the eukaryotic SUOX and NADH reductase clusters is characteristic of composite genes [26]; in which two unrelated gene families are connected in the network through an intermediate gene family. This suggests that EUKNR shares homology with both eukaryotic SUOX and NADH reductases [2]. Hence, a gene fusion between eukaryotic SUOX and NADH reductases would account for the origin of respectively the N-terminal and C-terminal EUKNR domains.
In the represented network, only eukaryotic SUOX without a Cyt-b5 domain are connected to EUKNR. This suggests that EUKNR are more related to SUOX without a Cyt-b5, a result in agreement with standard phylogenetic methods (EUKNR sequences branched closer to SUOX without Cyt-b5; see Supplementary Fig 8). To determine the origin of the Cyt-b5 region, we used the Cyt-b5 domain of the two EUKNR reference sequences to construct a second network including those eukaryotic and prokaryotic proteins that aligned to this specific EUKNR region (Fig 4C). The two Cyt-b5 EUKNR regions connected with a lower E-value with Cyt-b5 monodomain proteins than with proteins whose architectures contain other domains in addition to Cyt-b5 (e.g. SUOX). This strongly suggests that the Cyt-b5 region of EUKNR was not acquired from SUOX but rather originated from a third protein. We thus propose that EUKNR has a chimeric origin, evolving from a fusion of genes belonging to three distinct families: eukaryotic SUOX (without Cyt-b5), Cyt-b5 monodomain proteins, and NADH reductases.
Evaluating the impact of HGT in NAPs evolution
Some of the topologies shown in the phylogenies of NAPs (Fig 5) strongly disagree with the eukaryotic species tree (Fig 2), and hence would require a large number of ancestral paralogues and differential paralogue losses to be accounted by a strictly vertical inheritance scenario. In general (and with the exception of nrt2, see below), there is usually one copy of NAP genes per genome (see Supplementary Table 1). Therefore, we did not find any a priori reasons to hypothesize that the number of copies could have been greater in the ancestral genomes. To evaluate potential cases of HGT, we performed AU tests [27] (see all tested topologies and AU-test results in Supplementary Table 3), as well as additional phylogenetic inferences excluding conflicting taxa and increasing the taxon sampling by incorporating orthologues from the taxon-rich Marine Microbial Eukaryotic Transcriptome Sequencing Project (MMETSP) dataset [28] (MMETSP trees, see Materials and methods section).
Fd-NIR
This chloroplastic nitrite reductase is restricted to photosynthetic groups including all the primary algal groups (Glaucophyta, Rhodophyta, Chloroplastida), which belong to the Archaeplastida supergroup, as well as most of the sampled complex plastid algal groups, with the exception of the two photosynthetic myzozoans sampled (Alveolata, SAR). These complex plastid algal groups include Haptophyta, Ochrophyta (Stramenopiles, SAR), Bigellowiella natans (Chlorarachniophyta, Rhizaria, SAR), and Guillardia theta (Cryptophyta, Archaeplastida) (Figs 2 and 5).
In the inferred phylogenetic tree, Galdieria sulphuraria (Rhodophyta) Fd-NIR is the earliest branch within the eukaryotic clade (Fig 5, Supplementary Fig 9), in disagreement with the accepted eukaryotic tree (Fig 2). However, the low nodal support and the fact that it branches with other Rhodophyta sequences in the MMETSP tree suggest that this position is artefactual (Supplementary Fig 10). Surprisingly, sequences from three Chloroplastida branch together with sequences from Chondrus crispus and Pyropia yezoensis (Rhodophyta), and are hence separated from the rest of Chloroplastida (we rejected the monophyly of Chlorophyta) (p-AU 0.0019). This unexpected topology is also observed and well supported in the MMETSP tree, suggesting that it is unlikely to represent a phylogenetic artefact. Because we showed that all eukaryotic Fd-nir descend from a unique transfer from Cyanobacteria (Fig 4), this conflicting topology could only be explained either by ancestral duplication and differential paralogue loss or by HGT.
All Fd-NIR sequences from Ochrophyta, Chlorarachniophyta and Haptophyta form a monophyletic clade that branch within Archaeplastida, with strong support in the MMETSP tree (99% UFBoot). We rejected a topology constraining the monophyly of all Archaeplastida sequences (p-AU 0.0009). These results, together with Fd-NIR being a plastidic enzyme, supports a common origin of both Fd-nir and plastids in these complex plastid algal groups. The phylogenetic position of Haptophyta sequences within Ochrophyta is in agreement with recent studies suggesting an Ochrophyta origin of the Haptophyta plastid [29,30] (we rejected the monophyly of Ochrophyta sequences) (p-AU 0.0029). The position of Chlorarachniophyta Fd-NIR within Ochrophyta, however, is more difficult to explain. One could argue that this is due to low phylogenetic signal given that Chlorarachniophyta is the closest group to Ochrophyta among the taxa with Fd-NIR. In fact, we could not reject an alternative topology representing a vertical inheritance of Fd-nir in these two groups from a SAR common ancestor (B. natans as sister-group to the Ochrophyta + Haptophyta clade) (p-AU 0.2318). However, we consider the HGT scenario more parsimonious since the same topology was recovered in both the MMETSP tree and NRT2 tree (Fig 5, see below).
NAD(P)H-NIR
This cytoplasmic nitrite reductase is distributed in two eukaryotic supergroups, Opisthokonta and SAR (Figs 2 and 5). Within Opisthokonta, this gene is present in Fungi and Teretosporea; while in SAR, we found it in Labyrinthulea and Oomycota (Stramenopiles) and in some of their photosynthetic relatives from Ochrophyta (Stramenopiles) and Myzozoa (Alveolata). Again, the reconstructed phylogeny shows a topology discordant with the eukaryotic species tree (Fig 2), with sequences from Myzozoa, Ochrophyta and Labyrinthulea branching as the earliest clades, respectively, to a clade including all Opisthokonta and Oomycota sequences (Fig 5). A hypothetical scenario that could be proposed from this topology would imply that NAP(P)H-nir was transferred from Bacteria (Planctomycetes, see Fig 3) to a common ancestor of Alveolata and Stramenopiles, then vertically inherited in some Stramenopiles and transferred to Opisthokonta. Our test of alternative topologies rejected the monophilies of Opisthokonta, Stramenopiles and Teretosporea (p-AU of 0.0022, 0.0002 and 0.0033, respectively). While this altogether supports HGTs involving these groups, the exact number of transfers involving Opisthokonta and Stramenopiles lineages is uncertain. We thus evaluate two parsimonious scenarios as potential explanations for the evolutionary history of NAD(P)H-nir, which are based on three assumptions.
The first assumption is that NAD(P)H-nir originated in Opisthokonta from at least one HGT from Stramenopiles. Although we could not reject a vertical inheritance of all Opisthokonta sequences from a common eukaryotic ancestor (Stramenopiles + Opisthokonta clade as sister-group to other eukaryotes) (p-AU 0.1387); we consider this as poorly parsimonious given the gene distribution and the topology, not only of the NAD(P)H-NIR tree but also of other NAPs (Fig 5). The second assumption is that a member of Labyrinthulea was involved in an HGT event to Opisthokonta, given its sister-group position to the Opisthokonta + Oomycota clade (94% of UFBoot). Finally, a third assumption is that there was an HGT event between Ichthyosporea and Oomycota, given that they branch as sister-groups with high support (100% of UFBoot). Labyrinthulea and Oomycota are hence the two Stramenopiles groups that we assume as potential donors of NAD(P)H-nir to Opisthokonta. From this, we envision two parsimonious scenarios. A first one considers that all opisthokont sequences descend from a single labyrinthulean transfer, which would imply that Ichthyosporea subsequently transferred this gene to Oomycota. A second scenario considers that the transfer was from Oomycota to Ichthyosporea, and hence that two different Stramenopiles lineages transferred NAD(P)H-nir to Opisthokonta.
If the first scenario is correct, and Ichthyosporea transferred NAD(P)H-nir to Oomycota, a phylogeny excluding Ichthyosporea should show the Oomycota sequences to be more related to other Opisthokonta than to Ochrophyta (the closest taxonomic group to Oomycota in the NAD(P)H-NIR dataset, see Fig 2). In agreement with this, a phylogeny excluding Ichthyosporea shows Oomycota NAD(P)H-NIR branching within Opisthokonta with a strong support (UFBoot 96%, see Supplementary Fig 11). Moreover, an Oomycota + Ochrophyta clade is rejected by the test of alternative topologies (p-AU 0.0238). However, and unexpectedly, the same pattern did not occur when we removed sequences from Oomycota, given that ichthyosporean sequences branch with Labyrinthulea rather than with other Opisthokonta (Supplementary Fig 12). In this dataset without Oomycota, we could not reject the topology forcing Ichthyosporea and Ochrophyta together (to be expected if ichthyosporean sequences originated from the Stramenopiles lineage) (p-AU 0.0691). However, given that the Ichthyosporea + Labyrinthulea clade is poorly supported (UFBoot 77%, Supplementary Fig 12) and given that we rejected the Oomycota + Ochrophyta clade in the absence of Ichthyosporea, we consider these results altogether more consistent with a transfer from Ichthyosporea to Oomycota rather than in the opposite direction, thus supporting the first scenario proposed (i.e. a transfer from Labyrinthulea to Opisthokonta, and then from Ichthyosporea to Oomycota).
Lastly, we consider the unexpected position of Corallochytrium limacisporum (Corallochytrea, Teretosporea) NAD(P)H-NIR within the fungal clade (Fig 5) as a phylogenetic artefact rather than as a bona fide HGT event. First, the nodal support for the position is low (UFBoot 71%). In addition, C. limacisporum appears as a long branch taxa [31] in phylogenomic analyses [32,33], and hence is a problematic taxa when used in phylogenetic inferences. Moreover, topologies suggesting alternative scenarios representing an independent origin of its NAD(P)H-nir from Fungi could not be excluded, such as forcing the monophyly of fungal sequences or forcing the monophyly of C. limacisporum + Ichthyosporea + Oomycota (p-AU 0.3787 and 0.1879, respectively).
NRT2
This nitrate transporter is widely distributed among eukaryotes with nitrate and nitrite reductase genes (Supplementary Fig 2), with numerous species harboring lineage-specific duplications (see blue dots in Supplementary Fig 13). In particular, we found 66 species-specific duplications among the 56 species in which we found nrt2, with 36 duplication events occurring in Streptophyta (Chloroplastida). Again, to reconcile the recovered topology with the eukaryotic tree (Fig 2), a strictly vertical inheritance scenario would require a large number of ancestral duplications and differential paralogue losses. While obvious nrt2 paralogues are observed (Supplementary Fig 13), these must correspond to recent duplications given that sequences from the same species branch close to each other. Therefore, as with other NAP genes, an HGT agnostic scenario would be poorly supported given the absence of evident ancestral paralogues.
Sequences from Archaeplastida groups with primary plastids appear as the earliest clades of the tree (Fig 5), together with other eukaryotes (we rejected the monophyly of all Archaeplastida sequences) (p-AU 0.0008). The earliest-branching eukaryotic clade comprises only sequences from Glaucophyta (UFBoot 100%). The other eukaryotic NRT2 sequences branch in two clades that are strongly supported also in the taxon-rich MMETSP tree (Supplementary Fig 14). The first of these two clades includes all the Chloroplastida and Haptophyta sequences. It is unclear whether Haptophyta are more related to Chloroplastida [34] or to the SAR supergroup [35]. If Haptophyta were more related to SAR, its position in this tree could be interpreted as a support for a horizontal origin of nrt2 from Chloroplastida. Indeed, a previous study suggested that Haptophyta could have received genes of non-plastidic function from the green-plastid lineage [36]. The second clade includes Rhodophyta and Cryptophyta sequences branching as sister-group to a SAR + Opisthokonta clade. Even though Cryptophyta presumably belongs to the Archaeplastida supergroup, its position as sister-group to Rhodophyta is unexpected [34,35] and could represent an ancestral Archaeplastida paralogue conserved in both groups. However, the plastid proteomes of Cryptophyta show clear signatures of a Rhodophyta contribution [20,37], and hence nrt2 could have been transferred to Cryptophyta from a red algal endosymbiont. Since a red algal signal has also been found in some SAR plastid proteomes [20,37], we also propose a second transfer from Rhodophyta to a SAR common ancestor; although we cannot discard an alternative scenario involving a first transfer from Rhodophyta to Cryptophyta and then from Cryptophyta to a SAR common ancestor (p-AU of 0.2698).
The topology within the SAR + Opisthokonta clade resembles that of the NAD(P)H-NIR tree (Fig 5). Myzozoan sequences appear as the earliest-branching clade (Alveolata, SAR), with a clade including Ochrophyta sequences (Stramenopiles, SAR) branching as sister-group to a clade including the sequences from Oomycota and Labyrinthulea (Stramenopiles) and from Teretosporea and Fungi (Opisthokonta). However, this tree also includes sequences from B. natans (Chlorarachniophyta, Rhizaria, SAR) branching within the Ochrophyta clade, as in the Fd-NIR tree (Fig 5). Given that additional Chlorarachniophyta sequences robustly branch as sister-groups to and within Ochrophyta in the NRT2 and Fd-NIR MMETSP trees (Supplementary Figs 14 and 10, respectively), we propose that these two NAP genes were co-transferred from Ochrophyta to Chlorarachniophyta. Indeed, while all Chlorarachniophyta plastids presumably descend from a green algal endosymbiont [20], the chimeric signal of their plastid proteome suggests that other algal lineages could have contributed to the gene repertoire of this mixotrophic algal group [38].
As with the NAD(P)H-NIR phylogeny, our test of alternative topologies rejected the monophilies of Opisthokonta, Stramenopiles and Teretosporea (p-AU 0.0246, 0.0028 and 0.0373, respectively). This strongly supports HGT events involving these groups. There are two reasons in favor of at least one HGT event from Stramenopiles to Opisthokonta. Firstly, as with NAD(P)H-NIR, the earliest branching positions of other SAR lineages to the Stramenopiles + Opisthokonta clade suggests that it is more likely that the first donor of the clade was a member of the Stramenopiles, which would have inherited the genes from a common ancestor. Secondly, we rejected a topology compatible with a vertical inheritance scenario of this gene in Opisthokonta from a common ancestor to all eukaryotes (the Stramenopiles + Opisthokonta clade as sister-group to other eukaryotes, which would imply an HGT origin of nrt2 in Labyrinthulea and Oomycota) (p-AU 0.0014).
In contrast to the NAD(P)H-NIR phylogeny, Labyrinthulea sequences do not branch as a sister-group to Opisthokonta + Oomycota, appearing as the sister-group (together with C. limacisporum) to Fungi (Fig 5). Based on this topology, one possible interpretation is that nrt2 could have originated in Opisthokonta from an ancestral member of the Stramenopiles. Then, Oomycota and Labyrinthulea would have acquired nrt2 from Ichthyosporea and from an ancestor of C. limacisporum, respectively; instead of vertically inheriting nrt2 from that Stramenopiles ancestor. Because this hypothesis (H1) is too outlandish and the accuracy of protein trees is limited [39], we evaluated the following alternative potential scenarios for the origin of the nrt2 in Opisthokonta (Supplementary Fig 15): (H2) nrt2 would have been transferred from (i) Oomycota to a common ancestor of Opisthokonta, and then from (ii) an ancestor of C. limacisporum to Labyrinthulea; (H3) nrt2 would have been transferred from (i) Labyrinthulea to a common ancestor of Opisthokonta, and then from (ii) Ichthyosporea to Oomycota ‐the more parsimonious history for NAD(P)H-nir-; (H4) same as H3, but (ii) with Oomycota as a donor of the transfer to Ichthyosporea; and (H5) two labyrinthulean transfers would have occurred: a first transfer to common ancestor of Fungi and a second transfer to an ancestor of C. limacisporum. In this scenario, Oomycota would have transferred nrt2 to Ichthyosporea.
We discarded the hypothesis H2 given its incompatibility with the molecular clock-based divergence times proposed for the evolution of eukaryotes [40]. In particular, Oomycota could not have transferred nrt2 to a common ancestor of Teretosporea and Fungi because these two lineages would have diverged before Oomycota originated [40]. However, we could not discard any of the other hypotheses (detailed below). As with NAD(P)H-NIR, if H3 was true (the p-AU for this topology is 0.1818), we would expect the sequences from Oomycota to be more related to a Opisthokonta + Labyrinthulea clade rather than to Ochrophyta in a tree without Ichthyosporea (Supplementary Fig 16). The same applies for the ichthyosporean sequences in a tree without Oomycota (Supplementary Fig 17). If, however H4 or H5 were true, we would expect exactly the opposite topology (i.e. Oomycota more related to Ochrophyta in the absence of Ichthyosporea, and vice versa). Unfortunately, the topologies recovered show contradictory results (Supplementary Fig 16 and 17). To conclude, even though H3 could be favored given that it is the most parsimonious hypothesis for NAD(P)H-NIR phylogeny (see previous Results section), neither H4 nor H5 can be rejected, at least by the phylogenetic signal of NRT2.
EUKNR
The distribution of euknr was found to be very similar to that of nrt2 (Supplementary Fig 2). Euknr is present in most photosynthetic and non-photosynthetic organisms for which we inferred the capability to assimilate nitrate (Figs 2 and 5). Interestingly, we found the euknr (but no other NAP genes) also in Chromosphaera perkinsii (Ichthyosporea, Opisthokonta). The presence of euknr in an additional ichthyosporean besides Creolimax fragrantissima and Sphaeroforma arctica may be an indicator that their NAP genes were vertically inherited from an opisthokont ancestor and subsequently lost in the other ichthyosporeans, many of which have been described as parasitic species [41]. This was the scenario proposed by the hypothesis H3 (Supplementary Fig 15). Unfortunately, the phylogenetic signal does not allow to confidently infer the evolutionary history of this gene, including the eukaryotic lineage in which this nitrate reductase would have had originated (Fig 5 and Supplementary Fig 18).
Notwithstanding the weak support of the phylogeny, we found three unexpected and well-supported relationships between distantly related taxa. Firstly, Oomycota branches as the sister-group to the ichthyosporeans C. fragrantissima and S. arctica, as in the NRT2 and NAD(P)H-NIR trees (UFBoot 95%). This strongly indicates that a transfer of the whole pathway occurred between Oomycota and Ichthyosporea. Secondly, there is a clade that comprises several distantly related fungal sequences as well as a sequence from Acanthamoeba castellanii (Amoebozoa). However, sequences from these fungal taxa are also found in another clade that includes the A. nidulans sequence of bona fide nitrate reductase activity (named Anid_NaR in the euk_db dataset) [42][43]. Moreover, there is experimental evidence excluding that the A. nidulans euknr paralogue could function as a nitrate reductase [44] (Kaufmann, J. Fritz, D. Canovas, M. Gorfer and J. Strauss, personal communication). Thus, we propose that a fungal paralogue of euknr, of uncertain function, was transferred from an ancestral fungus to a lineage leading to A. castellanii. In fact, the finding of a gene transfer in A. castellanii is not surprising considering the extensive signatures of HGT found in this early amoebozoan lineage. Thirdly, B. natans branches in-between Chlorophyta, in agreement with its plastid being originated from a green algal endosymbiont [20].
Origin and evolution of NAP clusters
We then inquired the importance of NAP clustering in shaping the evolution of this pathway. To this end, we analyzed the distribution of the clusters within the eukaryotic tree (Fig 2). We found clusters in >56% of the sampled eukaryotes with at least two NAP genes in the genome (Supplementary Table 1). In particular, clusters were patchily distributed in Fungi, Teretosporea, Oomycota, Ochrophyta, Labyrinthulea, Myzozoa, Chlorarachniophyta, Chlorophyta, Rhodophyta and Haptophyta. The patchy distribution of the clusters within these groups suggests that many de-clustering events had occurred, assuming that de-clustering events are more parsimonious than de novo clustering events. NAP genes are always found unclustered in Cryptophyta, Glaucophyta and Streptophyta. While Cryptophyta and Glaucophyta are poorly represented in our dataset, the absence of clusters in Streptophyta (includes land plants) is remarkable since NAPs are found in the 9 sampled genomes of this group (Supplementary Table 1).
We found that >64% of the detected clusters include the whole pathway, nrt2, euknr and either NAD(P)H-nir or Fd-nir. In Ochrophyta, the only eukaryotic group with taxa showing both nitrite reductases in the same genome, we found clusters comprising nrt2 or euknr and either Fd-nir or NAD(P)H-nir, but never both (Fig 2). In agreement with the gene distribution, clusters with Fd-nir are found only in autotrophs. While clusters with NAD(P)H-nir are also found in autotrophs, in particular in two Ochrophyta (Stramenopiles, SAR) and in one Myzozoa (Alveolata, SAR) species; they are mostly distributed along osmotrophic taxa from Oomycota and Labyrinthulea (Stramenopiles, SAR) and from Fungi and Teretosporea (Opisthokonta) (NAP clusters with NAD(P)H-nir hereafter abbreviated as hNAPc, for “heterotrophic NAP clusters”).
The presence in two of the three primary algal groups of NAP clusters including Fd-nir leads to two potential scenarios. First, we can envision a unique origin of the cluster in an archaeplastidan ancestor. If Glaucophyta, where the NAPs are not clustered (Fig 2), was an earlier lineage than Rhodophyta and Chloroplastida [24], cluster formation could have occurred in the last common ancestor of Rhodophyta and Chloroplastida. If so, at least two de-clusterization events would have occurred: one in the lineage leading to C. crispus and P. yezoensis (Rhodophyta) and the other in the lineage leading to Streptophyta (Chloroplastida) (Fig 2). A second scenario would imply at least two independent clustering events, in the lineages leading to Cyanidioschyzon merolae and G. sulphuraria (Rhodophyta) and to Chlorophyta (Chloroplastida).
The tendency of NAP genes to be clustered in green and red algae lineages may have facilitated the transfer of the entire pathway during the endosymbiotic events involving these algal groups [20]. However, the phylogenetic signal of NAPs suggests that not all the clusters in complex plastid algae would have been acquired from a single endosymbiont, with at least two independent clustering events occurring in the lineages leading to Chrysochromulina sp. (Haptophyta) and B. natans (Chlorarachniophyta). In Chrysochromulina sp., the cluster would have had originated after the acquisition of nrt2 and Fd-nir from Chloroplastida and Ochrophyta, respectively (Fig 5). In B. natans, the cluster would have had originated after the acquisition of euknr and Fd-nir from Chloroplastida and Ochrophyta, respectively.
For hNAPc, we propose that this cluster could had been transferred between heterotrophs given that sequences from taxa bearing the cluster (Fig 2) branch close to each other in the NAP trees (Fig 5). This would have allowed transfers of the entire metabolic pathway, which we consider more parsimonious than individual transfers of the genes followed by multiple clusterization events. Thus, in agreement with NRT2 and NAD(P)H-NIR phylogenies, we propose that hNAPc would had been originated in a common ancestor of Alveolata and Stramenopiles and later transferred between Stramenopiles and Opisthokonta. The phylogenetic signal does not allow to infer the number and direction of hNAPc transfers that had occurred between Stramenopiles and Opisthokonta.
A tetrapyrrole methylase and the origin of NAPs in Opisthokonta
Given the uncertainty of the phylogenetic signal, we searched for additional features that could help clarify which of the proposed hypotheses for the origin of hNAPc in Opisthokonta is more parsimonious (Supplementary Fig 15). We checked intron positions, but we found them to be poorly conserved and not useful to clarify phylogenetic relationships (data not shown). We found, however, in the genomes of C. limacisporum, C. fragrantissima (Teretosporea, Opisthokonta) and Aplanochytrium kerguelense (Labyrinthulea, Stramenopiles, SAR) an additional protein annotated with a TP_methylase Pfam domain clustering with NAP genes. The three proteins showed highest similarity to Uroporphyrinogen-III C-methyltransferases (data not shown), a class of tetrapyrrole methylases involved in the biosynthesis of siroheme (which works as a prosthetic group for many enzymes, NAD(P)H-NIR included) [45]. A phylogenetic tree of this protein family showed a clade that includes the three proteins clustered with NAP genes as well as other eukaryotic proteins branching within a bacterial clade (Supplementary Figs 19 and 20). We, therefore, consider this group a subfamily of eukaryotic tetrapyrrole methylases, hereafter referred as “TPmet”.
We showed that NAP phylogenies support an origin of NAP genes in Opisthokonta through a transfer of hNAPc from Stramenopiles. Under this hypothesis, the finding of tpmet clustered with NAP genes (tpmet-hNAPc) in three Opisthokonta and Labyrinthulea taxa could suggest that tpmet may have been co-transferred in cluster with NAP genes also from Stramenopiles to Opisthokonta. If so, we could expect the phylogeny and the distribution of tpmet to resemble that of hNAPc and the corresponding genes. While the inferred tree is poorly informative given the low nodal support values (Supplementary Fig 20), the distribution of tpmet shows similarities with the distribution of hNAPc. In particular, we found tpmet in ~93% of taxa encoding NAD(P)H-nir, the defining gene of the cluster (see the previous Results section). However, only ~55% of the taxa with tpmet have NAD(P)H-nir. Interestingly, almost all the ~45% of taxa with tpmet that do not have NAD(P)H-nir correspond to Opisthokonta lineages without NAP genes, such as Nucleariidae, Choanoflagellata as well as many ichthyosporeans (Supplementary Fig 21). Thus, if tpmet emerged in Opisthokonta through a tpmet-hNAPc transfer, its presence in these early lineages would reinforce the single transfer scenario to a common ancestor of Opisthokonta from Stramenopiles, being NAP genes secondarily lost in those taxa. Moreover, the presence of tpmet-hNAPc in C. limacisporum (Corallochytrea, Teretosporea) and C. fragrantissima (Ichthyosporea, Teretosporea) favors a vertical inheritance of tpmet-hNAPc in Corallochytrea and Ichthyosporea from a common teretosporean ancestor rather than a horizontal origin of the ichthyosporean NAP genes from Oomycota; given that tpmet is not clustered with NAP genes in any Oomycota genome. Overall, among all the proposed scenarios, we propose H3 as the most parsimonious given the phylogenetic signal of NAPs and the distributions of tpmet and tpmet-hNAPc (see all the scenarios evaluated in Supplementary Fig 15).
A novel chimeric nitrate reductase in ichthyosporean NAP clusters
In C. fragrantissima and S. arctica, rather than the canonical nitrate reductase, we identified a gene clustered with nrt2 and NAD(P)H-nir that has a chimeric domain architecture consisting of (i) the first three Pfam domains of the EUKNR in the N-terminal region; and (ii) the first two Pfam domains of the NAD(P)H-NIR in the C-terminal region (Fig 6A). A domain architecture analysis of proteins from euk_db and prok_db (see Materials and methods section) showed this unexpected domain architecture to be restricted to these two ichthyosporeans. Phylogenetic analyses showed that the region containing the Oxidoreductase molybdopterin binding, Mo-co oxidoreductase dimerisation and Cytochrome b5-like Heme/Steroid binding Pfam domains correspond to the EUKNR family (Fig 5 and Supplementary Fig 18), and includes the nitrate reducing module characteristic of this nitrate reductase [46]. In contrast, the C-terminal region, corresponding to the Pfam domains Pyridine nucleotide-disulphide oxidoreductase and BFD-like [2Fe-2S] binding domain, branched within the NAD(P)H-NIR clade in a tree including all the eukaryotic and prokaryotic proteins containing this pair of domains (Fig 6B and Supplementary Fig 22). In that tree, the two sequences branched as sister-group to C. fragrantissima and S. arctica NAD(P)H-NIR proteins. Therefore, we propose that this chimeric gene originated after the replacement of the canonical C-terminal EUKNR region by the N-terminal region of the NAD(P)H-NIR in a common ancestor of these two ichthyosporeans (hereafter we refer to this gene as C. fragrantissima and S. arctica putative nitrate reductase, abbreviated as CS-pNR). This event should have occurred after the HGT event involving Ichthyosporea and Oomycota, since the nitrate reductase found in Oomycota does comprise the canonical domain architecture of EUKNR (Fig 6A).
S. arctica has a NAP cluster functional for nitrate assimilation
We sought for experimental evidence of nitrate assimilation in S. arctica, as a representative of an ichthyosporean NAP cluster including the putative uncharacterized nitrate reductase (CS-pNR). We developed a minimal growth medium in which the nitrogen source (N source) can be controlled (‘modified L1 medium – mL1’, see Materials and methods). We then tested growth of S. arctica in mL1 minimal medium with different N sources (Fig 7A). In all the minimal medium conditions (mL1 + different N sources), cells were smaller than in Marine Broth, used as the positive control. We observed a slight growth in the negative control (‘mL1’) after 168 hours, which we hypothesize can be due to the use of cell reserves, or the utilization of vitamins from the medium as N source. We detected a clearly stronger growth in mL1 supplemented with either NaNO3, (NH4)2SO4 or urea, compared to mL1 without any N source. The growth observed in mL1 + NaNO3 shows that S. arctica is able to assimilate nitrate.
The finding that S. arctica can grow using nitrate as the sole N source implies that this organism must have a nitrate reductase activity, and the CS-pNR is indeed a strong candidate to carry out this enzymatic activity, in line with the bioinformatics evidence (Fig 6). In general, NAP genes from different eukaryotic species had been shown to be co-regulated in response to environmental N sources [19,47–50]. Hence, a co-regulated expression of CS-pnr with nrt2 and NAD(P)H-nir would be consistent with their proposed role in nitrate assimilation. We thus measured the levels of expression of the three genes in S. arctica, in the presence of different N sources (Fig 7B). The three S. arctica genes were up-regulated either in mL1 without any N source as well as in mL1 + NaNO3. In contrast, we observed that the three genes were poorly expressed in mL1 + (NH4)2SO4 and in mL1 + urea. These results show that the cluster is functional in S. arctica and also that its expression is regulated in response to different N sources.
Discussion
Nitrate assimilation is restricted to autotrophs and fungal-like osmotrophs
Our screening of NAP genes provides an updated and comprehensive picture of the distribution of the nitrate assimilation pathway in eukaryotes (Fig 2). Besides the taxa included in previous studies [14,51], we describe the presence of the complete pathway in Haptophyta, Cryptophyta, Chlorarachniophyta, Myzozoa, Labyrinthulea and Teretosporea (Supplementary Fig 1). While all autotrophs analyzed have NAP genes, this is not the case for heterotrophs, where we only found NAP genes in taxa from those groups that have convergently evolved into a fungus-like osmotrophic lifestyle [21], that is Fungi, Ichthyosporea, Oomycota and Labyrinthulea (Fig 2 and Supplementary Fig 2). The absence of this pathway in phagotrophs is probably due to the fact that this nutrient acquisition strategy provides access to organic nitrogen sources, whose incorporation is energetically less demanding than nitrate. Thus, NAP genes would be less likely to be acquired and more prone to be lost in phagotrophic lineages, as presumably occurred with genes involved in the synthesis of certain amino acids [52].
HGT and the evolutionary history of NAPs in eukaryotes
The patchy distribution of this metabolic pathway (Fig 2) and the large number of non-vertical relations observed in our phylogenies (Fig 5) are not consistent with an exclusive scenario of vertical transmission and gene loss. We consider that some of the unexpected topologies found represent indeed bona fide gene transfers, because we consistently recovered them in more than one NAP phylogeny and/or because they fit with endosymbiotic events proposed for the acquisition of complex plastids [10,37]. Here we detail our proposed evolutionary scenario to account for the distribution and the phylogenetic signal of NAPs (Fig 8):
Three of the four NAP genes originated in eukaryotes through independent transfers from Bacteria. The nitrite reductases NAD(P)H-nir and Fd-nir were most likely transferred from Planctomycetes and Cyanobacteria (Fig 3), respectively; with Fd-nir showing signatures of a plastidic origin (Supplementary Fig 3) if, as proposed, this organelle originated from an early-cyanobacterial lineage [25,53]. The particular bacterial donor of the nitrate transporter nrt2 remains unclear (Fig 3). The nitrate reductase euknr originated through the fusion of three eukaryotic genes: a sulfite oxidase, a Cyt-b5 monodomain and a FAD/NAD reductase (Fig 4). Furthermore, we hypothesize that the pathway, including nrt2, Fd-nir and euknr, was established in an early-Archaeplastida ancestor, as discussed below. First, the three pathway activities most likely originated in the same eukaryotic ancestor. If this is the case, and the plastidic nitrite reductase Fd-NIR, and not NAD(P)H-NIR, was present in the original nitrate assimilation toolkit, this would imply an Archaeplastida origin of the pathway, given that it is well established that plastids originated in this group. The phylogenies of Fd-NIR and NRT2 are consistent with this hypothesis, as they show Archaeplastida sequences in the earliest branches within eukaryotes (Fig 5). The fact that the same topology is not observed in the EUKNR tree does not contradict our argument, since the EUKNR tree showed low statistical nodal support (Supplementary Fig 18). The alternative NAD(P)H-nir-early scenario, while still possible, is less parsimonious because it disagrees with the NRT2 phylogeny and requires of additional secondary losses of this gene.
We propose that NAP genes were transferred from Archaeplastida to other eukaryotic groups during the endosymbiotic events that led to the origin of complex plastids, as it has been shown for numerous genes not necessarily related to plastidic functions [10]. Consistent with this, our phylogenies suggest multiple NAP transfers between algal lineages (Fig 8), with sequences from the complex plastid algal groups branching as sister-groups to the early-branching Archaeplastida sequences in the Fd-NIR and NRT2 trees (Fig 5). For some transfers, the donor and the receptor lineages coincide with proposed endosymbiotic events. This is the case of euknr from Chlorophyta to Chlorarachniophyta [20], Fd-nir from Ochrophyta to Haptophyta [30] and nrt2 from Rhodophyta to Cryptophyta and to SAR [20]. Even though we also found some unexpected transfers between algal lineages, we also consider them as potential endosymbiotic gene transfers. The reason is that the origin of complex plastids is not clearly elucidated, partly due to the heterogeneous phylogenetic signal shown by the plastid proteomes [37]. Based on this heterogeneity, the target-ratchet model proposes that complex plastids resulted from a long-term serial association with different transient endosymbionts, all of which could have contributed in shaping the proteome of the host lineage [20]. Consistent with this model, we found NAP genes in Haptophyta and Chlorarachniophyta that would have been transferred from different potential algal endosymbionts (Fig 8).
From the gene distribution and the phylogenies (Fig 5), we parsimoniously propose that NAD(P)H-nir was transferred from Planctomycetes to a common ancestor of Alveolata and Stramenopiles. The advent of this cytoplasmic nitrite reductase would have resulted in a eukaryotic nitrate assimilation pathway independent from Fd-NIR, and hence independent of the chloroplast. We found NAP sequences from distinct osmotrophic lineages from Stramenopiles and Opisthokonta branching together in the trees (Fig 5), strongly suggesting HGT events involving these groups. Based in these phylogenies (Fig 5) but also in an analysis of the distribution and the gene composition of the clusters (Figs 2 and 6), we parsimoniously propose a first transfer of a NAP cluster from an ancestral stramenopiles (probably from Labyrinthulea) to a common opisthokont ancestor; and a second transfer of a cluster from Ichthyosporea (Teretosporea) to Oomycota (Stramenopiles) (see H3 and all the scenarios evaluated in Supplementary Fig 15). NAP genes would had been subsequently lost in multiple opisthokonts, mostly in phagotrophs but also in some ichthyosporean and fungal groups (Fig 8). Interestingly, the role of HGT in shaping the gene toolkits for osmotrophic functions is well documented in Oomycota and Fungi [12,54]. Our finding of HGT events involving taxa from these two groups but also from Teretosporea and Labyrinthulea extends the scope and potential importance of this mechanism in the acquisition of metabolic features associated to an osmotrophic lifestyle [21].
The hypothetical scenario that we propose for the origin of nitrate assimilation in Opisthokonta disagrees with previous results from Slot and Hibbet [14]. They recovered Oomycota as sister-group to Fungi, while in our trees with an updated taxon-richer dataset, Oomycota branches as sister-group to Ichthyosporea within the Opisthokonta + Stramenopiles clade (Fig 5). To evaluate whether the discrepancies with previous studies are the result of differences in the taxon sampling, we constructed trees excluding all Teretosporea and Labyrinthulea sequences. In agreement with the results of Slot and Hibbet, we then recovered the Oomycota sequences branching as sister-group to Fungi in the NRT2 and NAD(P)H-NIR phylogenies (Supplementary Figs 23 and 24, respectively). To be compatible with the molecular clock data [40] and the phylogenetic signal (Fig 5), an Oomycota origin of all NAP genes in Opisthokonta would require multiple transfers from this lineage, and hence it is poorly parsimonious (see H6 in Supplementary Fig 15). Instead, in agreement with the H3 scenario, we propose that in the results of Slot and Hibbet, Oomycota (Stramenopiles) appeared more related to Fungi (Opisthokonta) than to Ochrophyta (Stramenopiles) because Oomycota received the NAP genes from Opisthokonta, in particular from Ichthyosporea. Notwithstanding, the support for the proposed scenario is susceptible to change with the addition of further taxa, given the dependence of HGT inference to the taxon sampling used [7].
HGT of NAPs could be favored by the metabolic, genomic and ecological landscapes
While HGT in eukaryotes has been the subject of controversy, there is an increasing number of gene families where HGT has been shown to play a role [11,55,56]. The results presented here show the evolutionary history of the nitrate assimilation pathway as a striking example of the importance that gene transfer between eukaryotes may have in the evolution of a certain metabolic pathway. Among the transfers proposed by the most parsimonious scenario (Fig 8), we consider at least the following ones as bona fide transfers because of being well supported by the data (see Results section): 1) At least one transfer of a NAP cluster from an ancestral stramenopiles to Opisthokonta; 2) a NAP cluster transfer from Ichthyosporea to Oomycota or vice versa (although the first is more parsimonious); 3) a nrt2 transfer between Haptophyta and Chlorophyta; 4 and 5) a Fd-nir transfer from primary algae to SAR, and from Ochrophyta to Haptophyta; 6) a euknr transfer from Chlorophyta to Chlorarachniophyta; and 7) a transfer of a euknr paralogue of unknown function from Fungi to a lineage leading to A. castellanii.
We argue that NAP genes may be particularly prone to be successfully transferred. On a metabolic level, pathways downstream to nitrate assimilation and the enzymes involved in the synthesis of the molybdenum cofactor (required for the activity of a number of enzymes, EUKNR included) are widespread in eukaryotes [46,52,57]. This would facilitate the functional coupling of the newly transferred pathway to the metabolic network. On a genomic level, NAP genes are frequently organized in gene clusters in eukaryotic genomes (Fig 2). This would allow the acquisition of the whole pathway in a single HGT event [58], which is also more likely to be positively selected than separate transfers of individual components of the pathway. Moreover, the presence of the whole pathway in the same genomic region could also favor the evolution of a co-regulated transcriptional control after the HGT acquisition [59]. There are various reported examples of other metabolic gene clusters transferred between eukaryotes [60]. On an ecological level, nitrate concentrations have been fluctuating in the course of evolutionary history [61], and are still highly dependent on regional and seasonal changes [16]. Thus, in some circumstances NAP genes could be dispensable while in other circumstances their acquisition through HGT would be favored. This dynamic evolutionary fitness could imply that even one given eukaryotic lineage could have acquired and lost the faculty of nitrate assimilation more than once in the course of its evolution.
Nitrate assimilation in Ichthyosporea: a putative novel nitrate reductase
The presence of NAP genes in Ichthyosporea, described as animal symbionts or parasites [41] and phylogenetically related to Metazoa [32], was not previously reported. In the NAP clusters of C. fragrantissima and S. arctica, we found a putative nitrate reductase gene that probably originated in the common ancestor of these two ichthyosporeans (CS-pnr) from the fusion of the N-terminal region of the EUKNR with the C-terminal region of the NAD(P)H-NIR (Fig 6). The presence of the nitrate reducing module characteristic of EUKNR [46] strongly suggests the clustered CS-pNR is a functional nitrate reductase. The growth on nitrate as sole nitrogen source of S. arctica (mL1 + NaNO3, Fig 7A), in the absence of any other candidate enzyme in the genome, constitutes almost definitive proof for this function. This is further supported for by the strong transcriptional co-regulation of CS-pnr with nrt2 and NAD(P)H-nir in response to the availability of different nitrogen sources. In particular, these genes are poorly expressed on easily assimilable nitrogen sources (urea and ammonium) and highly expressed in nitrogen-free medium as well as in the presence of nitrate (Fig 7B).
The results from the RT-qPCR experiments can be most easily rationalized by a straight-forward repression process. However, specific induction by nitrate cannot be excluded. In the nicotinate assimilation pathway of A. nidulans, we see both specific induction and high expression under nitrogen starvation conditions, mediated by the same transcription factor [62]. It is possible that in this latter instance the intracellular inducer is generated by degradation of intracellular metabolites. Similarly, in the absence of any added nitrogen source, a high affinity nitrate transporter may scavenge residual nitrate present in the nitrate-free culture medium, as it has been specifically shown for A. nidulans [63], Hansenula polymorpha [64] and C. reinhardii [65]. In agreement with this, RNAseq data show that in A. nidulans, an organism where the nitrate-responsive transcription factor has been thoroughly studied [63], nitrate starvation results in high expression of the three genes in the NAP cluster [66].
The transcriptional regulation of NAPs has been characterized in land plants [47], Chlorophyta [19], Rhodophyta [48], Fungi [49,50]; and now also in the ichthyosporean S. arctica (Fig 7). The independent origins of NAP genes in some of these groups (Fig 8), together with the shown lineage-specific differences at the regulatory elements [19,67–69] suggests that natural selection promoted the evolution of analogous regulatory responses, favoring the integration of this pathway into the metabolic landscape after its acquisition through HGT.
Materials and methods
Phylogenetic screening of NAPs
An updated database of 174 eukaryotic proteomes (euk_db) was constructed (January 2017), using predicted protein sequences from publicly available genomic or transcriptomic projects [70–72]. The complete list of species, with the corresponding abbreviations, is available in Supplementary Table 1. All supplementary tables are in Supplementary File 2; supplementary files are accessible in https://figshare.com/s/d11b23d7928009e2d508). The phylogenetic relationships between all the sampled eukaryotes were constructed from recent bibliographical references (James et al., 2006; Ruhfel et al., 2014; Derelle et al., 2015; Sierra et al., 2015; Kurtzman et al., 2015; Derelle et al., 2016; He et al., 2016; Janouškovec et al., 2017; Kang et al., 2017; Mccarthy and Fitzpatrick, 2017; Muñoz-Gómez et al., 2017; Brown et al., 2018). Protein domain architectures from all euk_db sequences were obtained with PfamScan (a Hidden Markov Model [HMM] search-based tool) using Pfam A version 29 [85]. A database of prokaryotic protein sequences (prok_db) was constructed from Uniprot bacterial and archaeal reference proteomes (Release 2016_02) [72] with the aim of detecting potential prokaryotic contamination in euk_db as well as to investigate the prokaryotic origins of eukaryotic NAP genes.
For each NAP, we followed a multi-step procedure in order to maximize both sensitivity and specificity in the orthology assignation process (see Supplementary File 1 for detailed information about the particular strategy followed for each NAP). The overall strategy consisted first in identifying potential NAP family members in euk_db with BLAST (version 2.3.0+) [86] and HMMER (version 3.1b1) [87]. For BLASTP searches, we queried the databases using the NAP sequences from Chlamydomonas reinhardtii and A. nidulans [18], downloaded from Phytozome 11 [88] and NCBI protein databases [70], respectively [BLASTP: ‐evalue 1e-5, only non-default software parameters specified]. For HMMER searches [hmmsearch], we used the HMM Pfam domains MFS_1 for NRT2, Oxidored_molyb and Mo-co_dimer for EUKNR, and NIR_SIR for NAD(P)H-NIR and Fd-NIR. The candidate sequences retrieved from the BLASTP and hmmsearch analyses were submitted to cdhit (version 4.6) [89] [-c 0.99] to remove repeated/very recent paralogues (i.e. redundant sequences). We used the non-redundant candidate sequences to detect potential prokaryotic homologues in prok_db [-evalue 1e-5], to use them as outgroups to eukaryotic sequences and/or to detect potential euk_db contaminant sequences during the phylogenetic analyses. The non-redundant candidate sequences and the captured prokaryotic homologs were submitted to an iterative process in which we recursively performed phylogenetic inferences with the sequences non-discarded in the previous steps until we reached a set of bona fide NAP family members. The criteria to discard sequences in each step was mainly phylogenetic, but also assisted with functional information of each candidate sequence, predicted from their Pfam domain architecture and from their best-scoring BLASTP hit [‐evalue 1e-3] against the SwissProt database [72] (downloaded on July 2016). Those potential eukaryotic NAPs that branched separately from other eukaryotic sequences within a prokaryotic clade in the phylogenies were considered as contaminants if they correspond to a euk_db proteome generated from transcriptomic data obtained from cultures with bacterial contamination, or if they are encoded in potentially contaminant genomic scaffolds. Previous to all phylogenetic inferences, sequences were aligned with MAFFT (version v7.123b) [90] [mafft-einsi] and alignments were trimmed with trimAl (version v1.4.rev15) [91] using the ‐gappyout option. Maximum likelihood phylogenetic inference was done using RAxML (version 8.2.4) [92] with rapid bootstrap analysis (100 replicates) and using the best model according to BIC criteria in ProtTest analyses (version 3.2) [93].
In Supplementary Table 1, for each species, the columns corresponding to NAPs are colored in blue when at least 1 bona fide member has been identified. They are colored in light brown when all members identified are likely to correspond to bacterial contamination, and in red when no NAPs were identified.
Re-annotation of NAPs using TBLASTN
For those eukaryotes in which we detected an incomplete presence of the pathway (i.e. having only genes coding for some but not the three required steps, see Fig 1), we performed additional searches in the genomic sequences of the corresponding organism using the reference NAP protein sequences [TBLASTN: ‐evalue 1e-5]. This additional search allowed us to re-annotate two putative NAPs absent from euk_db (Fd-NIR in Aureococcus anophagefferens and NRT2 in Ostreococcus tauri) that were later incorporated in the phylogenies.
We also searched for potentially transferred prokaryotic nitrate and nitrite reductases, whose presence would suggest a replacement of their eukaryotic counterparts. While a putative ‘Copper containing nitrite reductase’ was found in the amoebozoan Acanthamoeba castellanii, we considered this sequence as an uncharacterized copper oxidase not necessarily involved in nitrite reduction. We were based in the fact that (1) the most similar sequences in euk_db and prok_db correspond to few distantly related eukaryotes without any NAPs or with already the complete eukaryotic pathway predicted such as C. reinhardtii and (2) the absence of the characteristic InterPro Nitrite reductase, copper-type signature.
Correlation between NAPs distribution and feeding strategies
We constructed phylogenetic profiles for each NAP gene family: vectors with presence/absence information (coded in “1” or “0”, respectively), with every position of the vectors corresponding to a certain species sampled in our euk_db dataset. These vectors were then used to quantify the correlation between the distributions of the different NAPs by computing the inverse of the Hamming distance between each pair of phylogenetic profiles. We also quantified the correlation between the distribution of the different NAPs and the distribution of the different nutrient acquisition strategies in eukaryotes. For that, we classified eukaryotes into ‘Autotrophs/Mixotrophs’ (i.e. strictly and facultative autotrophs) and ‘Non-autotrophs’. ‘Non-autotrophs’ (i.e. strictly heterotrophs) were further subclassified into ‘Phagotrophs’, ‘Fungal-like osmotrophs’ and ‘Others’ (Supplementary Table 1). The category ‘Phagotrophs’ include all heterotrophs that feed by phagotrophy. The category ‘Fungal-like osmotrophs’ include all heterotrophs that belong to eukaryotic groups with cellular and physiological features characteristic of a fungal-like osmotrophic lifestyle [21]. These include Fungi, Teretosporea, Oomycota and Labyrinthulea. The category ‘Others’ include all the heterotrophs not classified in any of these categories, all of them belonging to eukaryotic groups with a parasitic lifestyle.
Evolution of NAP genes
We used the bona fide eukaryotic NAP sequences identified to reconstruct the evolutionary history of the NAP gene families in eukaryotes. We excluded all the sequences with less than half of the median length of the corresponding NAP family in order to remove fragmented sequences that could mislead the alignment and the phylogenetic inference processes. Sequences were aligned and trimmed with MAFFT [mafft-einsi] and trimAl [-gappyout]. For the phylogenies, we used IQ-TREE (version 1.5.3) [94] instead of RAxML given that an approximately unbiased (AU) test can be performed in IQ-TREE [27]. AU test was used to evaluate whether the robustness of those branches indicating potential gene transfer events are significantly higher than other alternative topologies (10000 replicates; see all the alternative topologies tested and AU test results in Supplementary Table 3). Trees representing the alternatives topologies were constructed also with IQ-TREE, using the same alignments and evolutionary models and constraining the topologies with Newick guide tree files [-g option]. For bootstrap support assessment, we used the ultrafast bootstrap option (1000 replicates) because was shown to be faster and less biased than standard methods [95,96]. For model selection, we used ModelFinder, already implemented in IQ-TREE [97].
Moreover, and given that some eukaryotic groups were poorly represented due to the lack of genomic data, we constructed additional NAP trees incorporating orthologues from the Marine Microbial Eukaryote Transcriptome Sequencing Project (MMETSP) dataset [28]. We queried the reference NAP protein sequences against all the MMETSP transcriptomes [BLASTP: ‐evalue 1e-3]. MMETSP NAP orthologues were identified from the aligned sequences by means of Reciprocal Best Hits (RBH) [98] and best-scoring BLASTP hit against SwissProt database [-evalue 1e-3].
Comprehensive screening of NAPs in prokaryotes
We used the bona fide eukaryotic NAP sequences to capture potential prokaryotic orthologues of nrt2, NAD(P)H-nir and Fd-nir. First, we queried those sequences against prok_db with BLASTP [‐max_target_seqs 100, ‐evalue 1e-5]. Protein domain architectures were annotated with PfamScan, and those with clearly divergent architectures were discarded. The remaining prokaryotic sequences were aligned with eukaryotic NAPs using MAFFT [mafft-einsi]. The alignments were trimmed with trimAl [-gappyout] and the phylogenetic inferences were done with IQ-TREE [ultrafast bootstrap 1000 replicates, best model selected with ModelFinder]. Prokaryotic sequences were taxonomically characterized by aligning them against a local NCBI nr protein database (downloaded on November 2016), and only hits with more than 99% of identity and query coverage were considered [BLASTP: ‐task blastp-fast].
To ensure that the taxonomic representation of prok_db allow to detect signatures of genes likely to have been transferred from Alphaproteobacteria and Cyanobacteria (the putative donors of the mitochondria and the plastid, respectively [99]), we constructed control phylogenies using in each case two genes with a known plastidic (‘Photosystem II subunit III’ and ‘ribosomal protein L1’ [24]) (Supplementary Figs 4 and 5, respectively) and mitochondrial origin (‘Cytochrome c oxidase subunit III’ and ‘Cytochrome b’ [100]) (Supplementary Figs 25 and 26, respectively). For the mitochondrial and plastid control genes, the eukaryotic sequences used to query prok_db were retrieved from a subset of proteomes from plastid-bearing eukaryotes. For the detection of potential orthologues in prok_db, alignment and phylogenetic inference; we used the same procedure, software and parameters than with the NAP trees (see above).
Construction of sequence similarity networks
Sequence similarity network of full length EUKNR
The EUKNR protein sequences were aligned against a database including euk_db and prok_db [BLASTP: ‐max_target_seqs 10000, ‐evalue 1e-3]. Aligned sequences were concatenated with the EUKNR and redundant sequences were removed with cdhit before being aligned all-against-all with BLASTP [-max_target_seqs 10000, ‐evalue 1e-3]. We used Cytoscape (version v.321) [89] to construct a sequence similarity network from BLAST results, represented using the organic layout option. In the network, each aligned protein correspond to a node. Nodes are connected through edges if the corresponding sequences aligned with a lower E-value than the threshold value. A relaxed E-value threshold would lead to an over-connected network, with edges connecting very divergent proteins. On the other hand, a strong threshold would lead to an under-connected network, having only connections between strongly similar proteins. After exploring different thresholds, we chose an E-value of 1e-82 because it allows to represent only the most similar protein families to the N-terminal and C-terminal regions of EUKNR. We performed as well the following modifications in order to remove redundant and non-informative connections and to facilitate the analysis and interpretation of the network: (i) we removed self-loops and duplicate edges; (ii) we removed those nodes that were not connected to the EUKNR cluster or that were connected with a distance of more than two nodes; (iii) non-EUKNR sequence names were modified to include information of their protein domain architectures [PfamScan]; (iv) we removed nodes and edges corresponding to proteins with strong evidence of corresponding to miss-predicted proteins (e.g. spurious domain architectures). Nodes representing proteins that only connected with miss-predicted sequences were also removed (information about the list of proteins, their domain architecture and the particular reasons for their exclusion is available in Supplementary Table 2).
To validate whether EUKNR are more phylogenetically related to non-Cyt-b5 sulfite oxidases than to Cyt-b5 sulfite oxidases (see the corresponding Results section), we constructed a phylogenetic tree with the identified EUKNR sequences and the sulfite oxidases detected during the network construction process. MAFFT [mafft-einsi], trimAl [-gappyout] and IQ-TREE [ultrafast bootstrap 1000 replicates, best model selected with ModelFinder] were used for phylogenetic inference.
Sequence similarity network of EUKNR Cyt-b5 domain
The regions of the A. nidulans and C. reinhardtii EUKNR corresponding to the Cyt-b5 Pfam domain were aligned against a database including euk_db and prok_db [BLASTP: ‐max_target_seqs 10000, ‐evalue 1e-3]. Non-redundant and non-EUKNR sequences were concatenated with the two EUKNR Cyt-b5 sequences and an all-against-all alignment was performed [BLASTP: ‐max_target_seqs 10000, ‐evalue 1e-3]. Sequence names were modified to include information of their protein Pfam domain architectures [PfamScan]. A sequence similarity network was constructed with Cytoscape and represented with the organic layout option (as with full length EUKNR), removing self-loops and duplicate edges and using an E-value threshold of 1e-17. We also removed those nodes that were not connected to A. nidulans or C. reinhardtii EUKNR Cyt-b5 regions or that were connected with a distance of more than two nodes.
Detection of NAP clusters
For the detection of clusters of NAP genes, we scanned the genomes of those sampled eukaryotes with more than 1 NAP gene identified. We aligned the NAPs of each organism against the corresponding genomes using TBLASTN [-evalue 1e-3]. The genomic location of each NAP was annotated based on the TBLASTN hit with the highest score. Then, we looked for genomic fragments with more than 1 NAP genes annotated, and the genes were considered to be in a cluster when they were proximally located in that fragment. In the case of Corallochytrium limacisporum, the two NAP genes detected were found in terminal positions of two separate fragments of the genome assembly (nrt2 in scaffold99_len85036_cov0 and NAD(P)H-nir in scaffold79_len158446_cov0). To figure out whether these two genes are in different scaffolds because of an assembly artifact, we designed primers directed to the terminal regions of both fragments (ClimH_R73C and ClimH_F72C, see all the primers used in this work in Supplementary Table 4). These primers were used to check, by PCR, whether the two scaffolds are contiguous on the same chromosome. We obtained a PCR fragment of ~500 bp that was cloned into pCR2.1 vector (Invitrogen) and Sanger sequenced. BLAST analysis of the sequenced products (available in Supplementary Table 4) showed the presence of regions from both scaffolds in the extremes of the PCR fragment, confirming that nrt2 and NAD(P)H-nir are clustered in this species.
Furthermore, we investigated the genomic regions flanking the clusters of C. fragrantissima, S. arctica, C. limacisporum, Phytophthora infestans and A. kerguelense in order to find additional genes in the NAP clusters of Opisthokonta and SAR. Because we found a TP_methylase Pfam domain protein (TP_methylase) clustered with NAP genes in three of these genomes, we scanned the remaining SAR and Opisthokonta for the presence of additional clusters of NAP genes with a TP_methylase. To do that, we retrieved all the TP_methylase of each organism and aligned them against the corresponding genome [TBLASTN: ‐evalue 1e-3]. As with NAP genes, the genomic location of each TP_methylase was annotated considering the BLAST hit with the highest score.
Phylogenetic analyses of tetrapyrrole methylase proteins
All the TP_methylase in euk_db were retrieved and used to detect similar sequences in prok_db [BLASTP: 1e-3]. Among the aligned sequences from prok_db, only those with a detected TP_methylase Pfam domain were kept [PfamScan]. TP_methylase sequences from euk_db and prok_db were aligned with MAFFT and trimmed with trimAl [-gappyout]. Because there were >1000 sequences in the alignment, we used FastTreeMP (version 2.1.9) [101] for the construction of the phylogenetic tree instead of IQ-TREE. We kept for further analyses the sequences in the blue clade because it included the three TP_methylase proteins found in cluster with NAP genes (sequences pointed by arrows in Supplementary Fig 19). Because in that blue clade eukaryotic sequences were monophyletic and branched within a bacterial clade, we considered all the eukaryotic sequences of this clade as a particular eukaryotic TP_methylase protein family (TPmet). We used all TPmet sequences to capture potential prokaryotic homologs of this specific family in prok_db [BLASTP: - evalue 1e-3], which were incorporated to TPmet sequences for a second phylogenetic tree (Supplementary Fig 27). To that end, sequences were aligned with MAFFT [mafft-einsi], trimmed with trimAl [-gappyout], and the tree was inferred with IQ-TREE [ultrafast bootstrap 1000 replicates, best model selected with ModelFinder]. To get a higher phylogenetic resolution of TPmet and their prokaryotic relatives, a third and last phylogenetic inference (Supplementary Fig 20) was done with sequences labeled in blue in Supplementary Fig 27 (for phylogenetic inference, we used the same procedure as for the second tree).
We also constructed a Venn diagram to evaluate the coincidence between the phylogenetic distributions of TPmet and NAD(P)H-nir families along eukaryotes. In this analysis, we excluded the TPmet sequence belonging to N. vectensis (Nvec_XP_001617771) because it is located in a genomic fragment (NW_001825282.1) that most likely represents a contaminant scaffold. In particular, the phylogenetic tree revealed that this protein is identical to a region of the TPmet found in the choanoflagellate Monosiga brevicollis (Mbre_XP_001745780). We found that this M. brevicollis protein, as well as the TPmet protein found for the choanoflagellate Salpingoeca rosetta (Sros_PTSG_11107), are encoded in large genomic fragments (1259938 bp in the case of M. brevicollis), while the N. vectensis protein is found in a small genomic fragment (1325 bp). Moreover, this N. vectensis fragment entirely aligned without mismatches with the M. brevicollis fragment (CH991551, between the 280127-281451 positions), indicating that this most likely represents a contamination from the M. brevicollis genome.
Phylogenetic analyses of the C-terminal region of CS-pNR
Sequences from euk_db and prok_db as well as from MMETSP and Microbial Dark Matter database (MDM_db) [102] (downloaded in January 2017) were scanned for the co-presence Pyr_redox_2 and Fer2_BFD Pfam domains [hmmsearch]. Sequences with these pair of domains were retrieved and aligned with MAFFT [mafft-einsi]. We only kept the region of the alignment that correspond to Pyr_redox_2 and Fer2_BFD Pfam domains. The alignment was further trimmed with trimAl [‐gappyout], and IQ-TREE was used for the phylogenetic inference [ultrafast bootstrap 1000 replicates, best model selected with ModelFinder].
Cells and growth conditions
S. arctica JP610 was grown axenically at 12 ºC in 25 cm2 or 75 cm2 culture flasks (Corning) filled, respectively, with 5 mL or 20 mL of Marine Broth (Difco). For nitrogen limitation experiments, cells were incubated in modified L1 medium (mL1) [103], of the following composition (per liter): 35 g marine salts (Instant Ocean), 0.1 g dextrose, 5 g NaH2PO4.H2O, 1.17 x 10−5 M Na2EDTA.2H2O, 1.17 x 10−5 M FeCl3.6H20, 9.09 x 10−7 M MnCl2.4H20, 8.00 x 10−8 M ZnSO4.7H20, 5.00 x 10−8 M CoCl2.6H20, 1 x 10−8 M CuSO4.5H20, 8.22 x 10−8 M Na2MoO4.2H2O, 1 x 10−8 M H2SeO3, 1 x 10−8 M NiSO4.6H20, 1 x 10−8 M Na3VO4, 1 x 10−8 M K2CrO4, 2.96 x 10−8 M thiamine•HCl, 2.05 x 10−10 M biotin, 3.69 x 10−11 M cyanocobalamin. For nitrogen supplementation experiments, mL1 medium was supplemented with either 100 mM NaNO3, 100 mM (NH4)2SO4 or 100 mM urea as nitrogen source, as specified in the text. Photomicrographies were taken with a Nikon Eclipse TS100 equipped with a DS-L3 camera control unit (Nikon). Images were processed with imageJ.
RNA isolation, cDNA synthesis and real-time PCR analyses
The expression levels of S. arctica NAP genes in cultures with different nitrogen sources were analyzed using real-time PCR. S. arctica cells were grown for 10 days in 75 cm2 cell culture flasks (Corning) with 20 mL Marine Broth (Difco). Cells were scraped and collected by centrifugation at 4500 xg for 5 min at 12 ºC in 50 mL Falcon tubes (Corning). Supernatant was discarded and pellets were washed twice by resuspension with 20 mL of mL1 medium to wash out any trace of Marine Broth. An aliquot of the washed cells was collected as time 0. Cells were finally resuspended in mL1 medium, distributed equally into four 25 cm2 culture flasks, and supplemented with different nitrogen sources. Aliquots were collected at 6, 12 and 24 hours. At each time-point, cells were pelleted in 15 mL Falcon tubes (Corning), supernatant was discarded and the pellets were resuspended in 1 mL Trizol reagent (Invitrogen) and transferred to 1.5 mL microfuge tubes with safe lock (Eppendorf). Tubes were subjected to two cycles of freezing in liquid nitrogen and thawing at 50 ºC for 5 min. After this treatment, samples were kept at −20 ºC until further processing. To eliminate any trace of genomic DNA, total RNA was treated with Amplification Grade DNAse I (Roche) and precipitated with ethanol in the presence of LiCl. The absence of genomic DNA was confirmed using a control without reverse transcription. A total of 2.5 µg of pure RNA was used for cDNA synthesis using oligo dT primer and SuperScript III retrotranscriptase (Invitrogen), following the instructions of the manufacturer. cDNA was quantified using SYBR Green supermix (Bio-Rad) in an iQ cycler and iQ5 Multi-color detection system (Bio-Rad). Primer sequences are shown in Supplementary Table 4. The total reaction volume was 20 µL. All reactions were run in duplicate. The program used for amplification was: (i) 95 ºC for 3 min; (ii) 95 ºC for 10 s; (iii) 60 ºC for 30 s; and (iv) repeat steps (ii) and (iii) for 40 cycles. Real-time data was collected through the iQ5 optical system software v. 2.1 (Bio-Rad). Gene expression levels are expressed as number of copies relative to the ribosomal L13 subunit gene, used as housekeeping.
Supplementary files captions
All supplementary files are accessible in https://figshare.com/s/d11b23d7928009e2d508. These include:
Supplementary File 1. Supplementary methods for the ‘Phylogenetic screening of NAPs’ section.
Supplementary File 2. Includes all the supplementary tables.
Acknowledgements
We thank José Luis Maestro for his help in qPCR design and qPCR results interpretation, Meritxell Antó for technical assistance, and Michelle Leger for their valuable comments on the manuscript. EOP also warmly acknowledge the MCG C. Committee for their insightful advice and final approval of figures design. This work was supported by an ERC Consolidator Grant (ERC-2012-Co-616960), support from the Secretary’s Office for Universities and Research of the Generalitat de Catalunya (project 2014 SGR 619) and one grant from the Spanish Ministry for Economy and Competitiveness (MINECO; BFU2014-57779-P, with European Regional Development Fund support), all to IRT. SRN is a member of the Carrera del Investigador Científico from CONICET, Argentina. EOP was supported by a pre-doctoral FPI grant from MINECO.
References
- 1.↵
- 2.↵
- 3.↵
- 4.↵
- 5.↵
- 6.↵
- 7.↵
- 8.↵
- 9.↵
- 10.↵
- 11.↵
- 12.↵
- 13.↵
- 14.↵
- 15.↵
- 16.↵
- 17.↵
- 18.↵
- 19.↵
- 20.↵
- 21.↵
- 22.↵
- 23.↵
- 24.↵
- 25.↵
- 26.↵
- 27.↵
- 28.↵
- 29.↵
- 30.↵
- 31.↵
- 32.↵
- 33.↵
- 34.↵
- 35.↵
- 36.↵
- 37.↵
- 38.↵
- 39.↵
- 40.↵
- 41.↵
- 42.↵
- 43.↵
- 44.↵
- 45.↵
- 46.↵
- 47.↵
- 48.↵
- 49.↵
- 50.↵
- 51.↵
- 52.↵
- 53.↵
- 54.↵
- 55.↵
- 56.↵
- 57.↵
- 58.↵
- 59.↵
- 60.↵
- 61.↵
- 62.↵
- 63.↵
- 64.↵
- 65.↵
- 66.↵
- 67.↵
- 68.
- 69.↵
- 70.↵
- 71.
- 72.↵
- 73.↵
- 74.↵
- 75.↵
- 76.↵
- 77.↵
- 78.↵
- 79.↵
- 80.↵
- 81.↵
- 82.↵
- 83.↵
- 84.↵
- 85.↵
- 86.↵
- 87.↵
- 88.↵
- 89.↵
- 90.↵
- 91.↵
- 92.↵
- 93.↵
- 94.↵
- 95.↵
- 96.↵
- 97.↵
- 98.↵
- 99.↵
- 100.↵
- 101.↵
- 102.↵
- 103.↵