ABSTRACT
Budding yeasts are a diverse group of microbes that inhabit a range of environments by exploiting various metabolic traits. The complete genetic basis of these traits is mostly unknown, preventing their addition or removal in a chassis organism for fermentation, including the removal of aerobic ethanol accumulation in Saccharomyces cerevisiae for biomass-derived products. To help understand the molecular evolution of aerobic fermentation in yeasts, we created Analyzing Yeasts by Reconstructing Ancestry of Homologs (AYbRAH), an open source database of manually curated and predicted orthologs in 33 Dikarya fungi (https://github.com/kcorreia/aybrah). AYbRAH was used to evaluate the gain and loss of orthologs in budding yeast central metabolism and to refine the topology of recently published yeast species trees. The shift from a lipogenic-citrogenic to an ethanologenic lifestyle is distinguished by the emergence of the pyruvate dehydrogenase (PDH) bypass in budding yeasts, with a glucose-inducible acetyl-CoA synthetase. Flux balance analysis (FBA) indicates that the PDH bypass has a higher protein and phospholipid yield from glucose than ATP citrate lyase, the ancestral acetyl-CoA source in budding yeasts. Complex I subunits and internal alternative NADH dehydrogenase (NDI) orthologs have been independently lost in multiple yeast lineages. FBA simulations demonstrate that Complex I gives a substantially higher biomass yield than NDI. We propose that the repeated loss of Complex I in yeasts is a result of Complex I inhibition by a mitochondrial metabolite. Existing and future genome annotations can be improved by annotating genomes against manually curated ortholog databases, which we believe can improve the understanding of physiology and evolution across the tree of life.
1 INTRODUCTION
Yeasts are unicellular fungi that exploit diverse habitats on every continent, including the gut of wood-boring beetles, insect frass, tree exudate, rotting wood, rotting cactus tissue, soil, brine solutions, and fermenting juice (Kurtzman et al., 2011). The most widely studied yeasts are true budding yeasts, which span roughly 400 million years of evolution within the subphylum Saccharomycotina (Hedges et al., 2015) and possess a broad range of traits important to metabolic engineering. These include citrate and lipid accumulation in Yarrowia sp. (Aiba and Matsuoka, 1979) and Lipomyces sp. (Boulton and Ratledge, 1983), thermotolerance in multiple lineages (Ryabova et al., 2003; Banat et al., 1996), acid tolerance in Pichia sp. (Rush and Fosmer, 2013), methanol utilization in Komagataella sp. (Hazeu and Donker, 1983), osmotolerance in Debaryomyces sp. (Larsson and Gustafsson, 1993), xylose fermentation to ethanol in multiple yeast lineages (Schneider et al., 1981; Slininger et al., 1982; Toivola et al., 1984), alternative nuclear codon reassignments (Mühlhausen et al., 2016), glucose and acetic acid co-consumption in Zygosaccharomyces sp. (Sousa et al., 1998), and aerobic ethanol production (the Crabtree effect) in multiple lineages (De Deken, 1966; Van Urk et al., 1990; Blank et al., 2005; Christen and Sauer, 2011). The complete genetic basis of most of these traits is unknown, preventing their addition or removal in a chassis organism to improve fermentation techno-economics.
The Crabtree effect is one notable hindrance to using Saccharomyces cerevisiae for biomass-derived products, such as heterologous proteins (Hensing et al., 1995). The volumetric productivity of these fermentations could be improved if aerobic ethanol production were eliminated in a two-stage fermentation process (Venayak et al., 2015), yet the molecular mechanisms underlying the Crabtree effect are still not completely understood. At a fundamental level, the presence or absence of the Crabtree effect can be explained by a gain of function, a loss of function, or some combination of gains and losses of function relative to an ancestral yeast species without the Crabtree effect. However, most studies take an S. cerevisiae-centric view (Vemuri et al., 2007; Nilsson and Nielsen, 2016; Huberts et al., 2012; Vazquez and Oltvai, 2016), or do not consider the molecular evolution of multiple species (Blank et al., 2005; De Deken, 1966; Christen and Sauer, 2011).
If we could study the physiology of the mother of all budding yeasts, which we refer to as the Proto-Yeast, along with a Crabtree-negative and a Crabtree-positive yeast, we could undertake model-based studies to reverse engineer which gains and losses of function have occurred in each lineage. The Proto-Yeast no longer exists in its original state, so it cannot be studied directly, but we can study the metabolism of its living descendants. In recent years, dozens of diverse yeast genomes have been sequenced (Riley et al., 2016), paving the way for greater insight into the evolution of metabolism in the yeast pan-genome.
Genome-scale metabolic reconstructions provide a scaffold to study metabolism in yeasts (Österlund et al., 2013), but many of these reconstructions are understandably skewed to reflect the well-studied metabolism of S. cerevisiae. Errors in these reconstructions are of two types: commission and omission. In errors of commission, enzymes present in the metabolism of S. cerevisiae are often included in a non-conventional organism's metabolic reconstruction despite a lack of genomic evidence. Errors of omission occur when enzymes are left out of a non-conventional organism's metabolic reconstruction, likely because they are absent in S. cerevisiae. Specific examples of both types of errors in various yeast reconstructions can be found in the Supplemental Information. These errors can significantly alter flux analysis and impair our understanding of these yeasts' physiologies. The root of this problem is a lack of enzyme characterization in non-model organisms and a lack of high-quality orthology assignments. Because it is unlikely that every enzyme in every organism will be characterized, we argue that a transparent, open, curated ortholog database is critical for creating higher quality genome-scale reconstructions.
The distinction between orthologs, paralogs, ohnologs and xenologs is an important aspect of comparative genomics and of understanding the evolution of physiology. Briefly, orthologs arise from speciation and often have conserved function; paralogs and ohnologs emerge from duplication events (ohnologs specifically from whole-genome duplication) and may have novel function; xenologs derive from horizontal gene transfer between organisms (Jensen, 2001). Commonly used ortholog databases, such as eggNOG (Huerta-Cepas et al., 2016b), InParanoid (Sonnhammer and Östlund, 2015), KEGG orthology (Mao et al., 2005), PIRSF (Wu et al., 2004), Ensembl (Vilella et al., 2009), OrthoMCL (Fischer et al., 2011), OrthoDB (Kriventseva et al., 2015), and OMA (Altenhoff et al., 2015), often fail to separate homologs into true orthologs in gene families with many duplications (see Supplemental Information). Higher quality ortholog databases include PANTHER and the Yeast Gene Order Browser, but these cover a smaller selection of organisms (Mi et al., 2012; Byrne and Wolfe, 2005). In addition to these databases, ad hoc ortholog relationships are often assigned in genomics studies or genome-scale metabolic reconstructions, but these assignments lack transparency and are not continuously improved. To solve these problems, and ultimately to improve our understanding of yeast physiology and its evolution, we present Analyzing Yeasts by Reconstructing Ancestry of Homologs (AYbRAH). AYbRAH, derived from the Hebrew name Abra, mother of many, is an open source collection of predicted and manually curated orthologs, their function, and their origin. Using this database, we reviewed the gains and losses of orthologs in central metabolism within Dikarya, refined the topology of published yeast species trees, and focused on how the evolution of the pyruvate dehydrogenase (PDH) bypass and NADH dehydrogenase may have played a role in the Crabtree effect.
These studies, combined with modeling, provide an improved understanding of how the evolution of acetyl-CoA metabolism and the electron transport chain (ETC) impacted yeast physiology.
2 METHODS
AYbRAH curation
AYbRAH was created by combining several databases and programs in a pipeline to identify ortholog groups (Figure 1). Protein sequences from 33 organisms in Dikarya were downloaded from UniProt and MycoCosm (Table 1) (Grigoriev et al., 2013). In total, 212 836 proteins were clustered into putative Fungal Ortholog Groups (FOGs) with OrthoMCL (Fischer et al., 2011), using default parameters for BLASTP and OrthoMCL. FOGs were coalesced into Homolog Groups (HOGs) using Fungi-level OrthoDB identification codes (Kriventseva et al., 2015). MAFFT 7.245 (Katoh and Standley, 2013) was used to align the HOG sequences using a gap and extension penalty of 1.5. One hundred bootstrap trees were constructed for each HOG with PhyML 3.2.0 (Guindon et al., 2010), optimizing both tree topology and branch lengths. Consensus phylogenetic trees were generated with SumTrees in DendroPy 4.1.0 (Sukumaran and Holder, 2010), and figures were drawn with ETE3 (Huerta-Cepas et al., 2016a). Phylogenetic trees of genes in central metabolism were reviewed when OrthoMCL failed to create monophyletic ortholog groups because of underclustering or overclustering (see Supplemental Information). New ortholog groups were created from polyphyletic FOGs when there was evidence of a monophyletic ortholog group, either from manual inspection of its phylogenetic tree or with the help of an ETE3 script. Evolview (He et al., 2016) was used to map ortholog groups to the yeast species trees. Additional steps were required to assign proteins to a FOG when OrthoMCL had not assigned them or when the protein annotation was incomplete. For each species, a protein from each FOG of its closest relative was queried against the species' genome sequence with TBLASTN, using an expect value (E-value) threshold of 1e-20. Annotated proteins were then queried against the TBLASTN hits to determine which proteins were annotated but not assigned to a FOG by OrthoMCL, and which proteins were unannotated despite a match in the genome.
Both groups of proteins were assigned to a HOG by their best match, and to a FOG with pplacer (Matsen et al., 2010) after being added to the existing alignment with the MAFFT --add option.
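The curation step above hinges on whether a candidate ortholog group forms a single clade in its gene tree. The sketch below is a minimal illustration of that test, assuming a rooted gene tree written as nested tuples of protein names; it is not the ETE3 script used for AYbRAH, and the function names and toy protein names are hypothetical. ETE3 tree objects also provide a check_monophyly method that serves the same purpose on real Newick trees.

```python
def leaf_names(tree):
    """Return the set of leaf (protein) names below a (sub)tree."""
    if isinstance(tree, str):  # a string is a leaf, not a tuple of children
        return {tree}
    names = set()
    for child in tree:
        names |= leaf_names(child)
    return names


def clades(tree):
    """Yield the leaf set of every clade in a rooted tree, including single leaves."""
    yield leaf_names(tree)
    if not isinstance(tree, str):
        for child in tree:
            yield from clades(child)


def is_monophyletic(tree, group):
    """True if some clade of the rooted tree contains exactly the proteins in `group`."""
    target = set(group)
    return any(clade == target for clade in clades(tree))


# Hypothetical gene tree: a1/a2 form one clade and b1/b2 another.
gene_tree = ((("a1", "a2"), ("b1", "b2")), "c1")
print(is_monophyletic(gene_tree, {"a1", "a2"}))  # True
print(is_monophyletic(gene_tree, {"a1", "b1"}))  # False
```

A group that fails this test, such as {a1, b1} here, corresponds to the polyphyletic FOGs that were split during curation. For unrooted gene trees, the complement of the group would also need to be checked.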
Metabolic simulations
Flux balance analysis (FBA) was carried out in MATLAB with the consensus S. cerevisiae genome-scale metabolic model (Heavner and Price, 2015) to quantify yield differences between acetyl-CoA pathways and between matrix-oriented NADH dehydrogenases. Carbohydrate, protein, RNA, DNA, and phospholipid yields on glucose were computed for three acetyl-CoA pathways: the S. cerevisiae wild-type PDH bypass via pyruvate decarboxylase (PDC), NADP-dependent aldehyde dehydrogenase (NADP-ALD), and acetyl-CoA synthetase (ACS); the PDH bypass via NAD-ALD; and ATP citrate lyase (ACL). Biomass yield was computed with either Complex I or internal alternative NADH dehydrogenase (NDI) as the matrix NADH sink.
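For readers less familiar with FBA, each yield calculation above can be written in the standard textbook form below. This is a generic sketch, not the exact constraint set used in the study. S is the stoichiometric matrix of the consensus model, v is the flux vector, c selects the objective (for example a protein, phospholipid, or biomass drain reaction), q_glc is a fixed glucose uptake rate, and K is an illustrative set of reactions forced to zero to isolate one acetyl-CoA route or one NADH sink.

```latex
\begin{aligned}
\max_{\mathbf{v}} \quad & Z = \mathbf{c}^{\top}\mathbf{v} \\
\text{subject to} \quad & \mathbf{S}\,\mathbf{v} = \mathbf{0}, \\
& \mathbf{v}_{\min} \le \mathbf{v} \le \mathbf{v}_{\max}, \\
& v_{\mathrm{glc}} = q_{\mathrm{glc}}, \\
& v_j = 0 \quad \forall\, j \in \mathcal{K}, \\[4pt]
Y_{P/\mathrm{glc}} \quad & = \frac{Z^{*}}{q_{\mathrm{glc}}\, M_{\mathrm{glc}}}
\end{aligned}
```

Here Z* is the optimal objective flux in g gDW^-1 h^-1, q_glc is in mmol gDW^-1 h^-1, and M_glc is about 0.180 g/mmol, so the yield is in g product per g glucose. Comparing pathways then amounts to re-solving the same problem with a different set K.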
3 RESULTS & DISCUSSION
Impact of acetyl-CoA source on biomass precursor yields
We maximized the production of carbohydrates, protein, DNA, RNA, and phospholipids from glucose using an S. cerevisiae genome-scale model to understand how the acetyl-CoA source affects their yields. Carbohydrate, RNA, and DNA yields did not differ between the PDH bypass with NADP-ALD, the PDH bypass with NAD-ALD, and ACL. Protein and phospholipid yields were highest with the wild-type PDH bypass (NADP-ALD), at 0.53 g/g and 0.02 g/g, respectively. The PDH bypass via NAD-ALD marginally reduced both yields, by 0.01%. ACL reduced the protein and phospholipid yields by 2.9% and 0.07%, respectively (Figure 4). Hence, the choice of acetyl-CoA synthesis pathway affects protein and phospholipid yields, suggesting that the pathway a yeast uses may have consequences for its physiology.
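For reference, the cytosolic reactions that distinguish the routes compared above are summarized below. They use standard textbook stoichiometries with protons omitted, not the exact reaction definitions of the consensus model. The NAD-ALD variant of the PDH bypass differs only in using NAD+ instead of NADP+. ACS releases AMP and pyrophosphate, whereas ACL releases ADP and phosphate and depends on citrate supplied from the mitochondria.

```latex
\begin{aligned}
&\text{PDC:} && \text{pyruvate} \rightarrow \text{acetaldehyde} + \mathrm{CO_2} \\
&\text{ALD:} && \text{acetaldehyde} + \mathrm{NAD(P)^+} + \mathrm{H_2O} \rightarrow \text{acetate} + \mathrm{NAD(P)H} \\
&\text{ACS:} && \text{acetate} + \mathrm{ATP} + \mathrm{CoA} \rightarrow \text{acetyl\text{-}CoA} + \mathrm{AMP} + \mathrm{PP_i} \\
&\text{ACL:} && \text{citrate} + \mathrm{ATP} + \mathrm{CoA} \rightarrow \text{acetyl\text{-}CoA} + \text{oxaloacetate} + \mathrm{ADP} + \mathrm{P_i}
\end{aligned}
```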
Evolution of PDH bypass
The PDH bypass is a well-known pathway in S. cerevisiae (Pronk et al., 1996; van Rossum et al., 2016), but neither its phylogenetic origin nor the reason the ancestral ACL was lost in most budding yeasts has been explored. The primary source of cytosolic acetyl-CoA in Proto-Yeast was likely ACL (Hynes and Murray, 2010). ACL exists as a homo-oligomer in Basidiomycota (Shashi et al., 1990) and as a hetero-oligomer in Ascomycota following an ancient gene duplication (Figure 3). ACL was lost in the lineage sister to Nadsonia fulvescens following the acquisition of ACS2 (Figure 3). Another source of acetyl-CoA in Proto-Yeast was the phosphoketolase (PHK) pathway, which was lost in the budding yeasts sister to Lipomyces starkeyi. PHK also exists as a hetero-oligomer in the Pezizomycotina-Saccharomycotina clade. Proto-Yeast likely had one copy of ACS, ACS1, an enzyme expressed during growth on acetate (Connerton et al., 1990). A duplication in an ancestral species sister to Yarrowia lipolytica, which we refer to as the Proto-Fermenter, gave rise to ACS2 (Figure 4A), the glucose-inducible ACS of the PDH bypass (van den Berg et al., 1996; Zeeman and Steensma, 2003). ACS3, a propionyl-CoA synthetase, arose from an additional duplication of ACS1 in Pezizomycotina; it may have been present in Proto-Yeast. Schizosaccharomyces pombe (Van Urk et al., 1990), Blastobotrys adeninivorans, and N. fulvescens are the only yeasts in this study that have ACL yet use the PDH bypass as an acetyl-CoA source on glucose; it is not known when each source is preferred for cytosolic acetyl-CoA production in these species. Expression of PDC during aerobic growth on glucose would have been a prerequisite for the PDH bypass in yeast, but its expression was likely restricted to oxygen-limited conditions in Proto-Yeast (Sanchis et al., 1994).
All budding yeasts in this study have homologs of the PDH bypass genes (PDC, NAD- or NADP-ALD, and ACS), but the delimiting gene for an active PDH bypass in Saccharomycotina, and ultimately for glucose fermentation to ethanol, appears to be the ACS2 paralog. The absence of an active PDH bypass in L. starkeyi and Y. lipolytica, despite the higher protein and phospholipid yields it offers, indicates that microbes do not necessarily maximize their biomass yields given their metabolic networks (Segre et al., 2002; Fong et al., 2003), but likely require chance duplications to rewire function.
Expansion of aldehyde dehydrogenase family
NADP-ALD was absent in Proto-Yeast but emerged as cytosolic and mitochondrial enzymes through several duplications in budding yeasts (Figure 3). It is unclear whether ALD6.1 or ALD5 originated first from ALD2, but S. cerevisiae's ALD6 gene, annotated as ALD6.2 in AYbRAH, is paralogous to ALD6.1, the first cytosolic NADP-ALD in the budding yeast pan-genome. ALD2.4, another putative cytosolic NADP-ALD, also emerged within Taphrinomycotina but has not been characterized directly (Van Urk et al., 1990). The loss of cytosolic NADP-ALD in the CTG clade, with the exception of Debaryomyces hansenii, Cephaloascus albidus and Cephaloascus fragrans (Grigoriev et al., 2013), is puzzling because ALD6 has been shown to be indispensable for S. cerevisiae (Grabowska and Chelstowska, 2003). An alternative NADPH source that reduced the need for cytosolic NADP-ALD may have evolved in the CTG clade. The presence or absence of ALD2.3, the only mitochondrial ALD in the Pezizomycotina species covered in this study, appears to divide the fungal species into ethanologenic and citrogenic species, but further analysis with more species is required; the PDH bypass may be active in Neurospora crassa and Trichoderma reesei under some conditions.
Dekkera bruxellensis and D. hansenii are two Crabtree-positive yeasts with peculiar recent duplications in the ALD family. First, D. bruxellensis has a mitochondrial NADP-ALD, ALD6.3, which originated from a duplication of the cytosolic NADP-ALD ALD6.1; this complements the older mitochondrial NAD(P) paralog, ALD5, present in most yeasts sister to N. fulvescens. ALD6.3 is also present in other Crabtree-positive Dekkera species, but absent in Brettanomyces naardenensis, a closely related Crabtree-negative species (results not shown). Second, D. hansenii has a cytosolic NAD(P)-ALD, ALD5.2, derived from a duplication of the mitochondrial NAD(P)-ALD ALD5; ALD5.2 is the only known cytosolic NADP-ALD source in the CTG clade covered in this study. These yeasts also deviate from the typical Crabtree effect observed in Saccharomycetaceae. Both yeasts possess Complex I and alternative oxidase but lack NDI orthologs; D. bruxellensis accumulates large amounts of acetic acid when grown aerobically on glucose, for reasons that are not entirely understood; fructose 1,6-bisphosphate (F1,6P), an important metabolite elevated during overflow metabolism in many species, was found at a lower concentration in D. hansenii than in S. cerevisiae following glucose addition to non-fasted cells (Sánchez et al., 2006); and finally, both yeasts exhibit the Crabtree effect at a lower glycolytic flux than Saccharomycetaceae yeasts (Van Urk et al., 1990). It is uncertain how these paralogs may directly affect the Crabtree effect, but there is evidence that NADP(H) may play an underappreciated role. Most Crabtree-positive yeasts cannot metabolize xylose via NADPH-dependent xylose reductase; Kluyveromyces sp. are the only known lineage in Saccharomycetaceae that both have an NAD(P)-dependent glyceraldehyde 3-phosphate dehydrogenase (Verho et al., 2002) and can consume xylose; and NADPH-dependent nitrate reductase affects aerobic ethanol and acetic acid production in Pachysolen tannophilus and D. bruxellensis (Galafassi et al., 2013; Moktaduzzaman et al., 2015; Jeffries, 1983). Recent studies have shown the importance of metabolites in allosteric regulation and may offer a template for studying how NAD(P)H directly or indirectly affects the Crabtree effect (Hackett et al., 2016; Gerosa et al., 2015).
Evolution of phosphofructokinase and its possible connection to the PDH bypass
The emergence of ACS2 was followed by a duplication of PFK2, converting the ancestral homo-oligomeric 6-phosphofructokinase (PFK) into a hetero-oligomeric enzyme with altered allosteric regulation (Habison et al., 1983; Reuter et al., 2000; Flores et al., 2005). Both forms are activated by ATP at low concentrations and inhibited at higher concentrations. PFK in Pezizomycotina fungi and citrogenic yeasts is inhibited by citrate or phosphoenolpyruvate, in contrast to the recently discovered inhibition of pyruvate kinase by citrate in S. cerevisiae (Hackett et al., 2016); it is not known whether citrate inhibits pyruvate kinase in other budding yeasts. Hetero-oligomeric PFKs in Saccharomycotina are also activated by AMP and fructose 2,6-bisphosphate (Bär et al., 1997; Lorberg et al., 1999; Kirchberger et al., 2002), which has not been observed for Y. lipolytica's PFK (Flores et al., 2005). The physiological significance of these changes in allostery has not been explored in the literature. However, the emergence of the PDH bypass may mark a shift from negative control of upper glycolysis by citrate under nitrogen limitation in citrogenic/lipogenic yeasts to positive control of upper glycolysis by AMP under excess glucose in ethanologenic yeasts. Crabtree-positive yeasts in Saccharomycetaceae have been shown to have elevated levels of AMP, a product of acetyl-CoA synthetase, relative to Crabtree-negative yeasts (Christen and Sauer, 2011); however, further characterization is needed in other Crabtree-positive and citrogenic yeast lineages. The reduced demand for cytosolic citrate following the loss of ACL, together with the absence of citrate inhibition of upper glycolysis in S. cerevisiae, may have enabled F1,6P accumulation, which is an important source of metabolic control (Díaz-Ruiz et al., 2008) under excess glucose and nitrogen-limiting conditions (Hackett et al., 2016). Coarse-grained kinetic studies, similar to a recent investigation of the glycolytic imbalance in S. cerevisiae (van Heerden et al., 2014), could help elucidate the advantages of each set of allosteric interactions given the organism's metabolic network and environment. Ideal organisms for these studies include L. starkeyi as a proxy for Proto-Yeast; B. adeninivorans as a proxy for Proto-Fermenter, since it uses the PDH bypass, retains ACL, and has a homo-oligomeric PFK; and finally N. fulvescens, which gained a hetero-oligomeric PFK.
Biomass yield as a function of NADH dehydrogenase sink
The impact of using Complex I versus NDI as an NADH sink was assessed by optimizing for biomass production (40% protein content) with FBA. With proton-translocating Complex I, the biomass yield was 0.63 g biomass/g glucose, compared with 0.50 g biomass/g glucose with non-proton-translocating NDI (Figure 5D). Both values agree with experimental yields of yeasts grown on glucose (Van Hoek et al., 1998; Van Urk et al., 1990). These results suggest that the loss of Complex I carries a clear disadvantage with respect to biomass yield. Although a lower yield does not necessarily imply a lower growth rate, it does mean that less biomass is produced from each gram of glucose consumed.
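The arithmetic behind these yields can be made explicit. The sketch below uses only the two FBA yields quoted above. It assumes the textbook relation mu = Y * q_glc with maintenance neglected, so that at equal growth rate the required glucose uptake scales inversely with yield. The function names are illustrative.

```python
def relative_difference(y_test: float, y_ref: float) -> float:
    """Fractional change of y_test relative to y_ref."""
    return (y_test - y_ref) / y_ref


def glucose_uptake_ratio(y_high: float, y_low: float) -> float:
    """Ratio q_low / q_high of glucose uptake rates needed for the same growth rate.

    Assumes mu = Y * q_glc (maintenance neglected), so q_glc = mu / Y and,
    at equal mu, q_low / q_high = Y_high / Y_low.
    """
    return y_high / y_low


Y_COMPLEX_I = 0.63  # g biomass / g glucose, FBA value quoted in the text
Y_NDI = 0.50        # g biomass / g glucose, FBA value quoted in the text

print(f"Complex I yield advantage: {relative_difference(Y_COMPLEX_I, Y_NDI):.0%}")
print(f"Glucose uptake factor with NDI at equal growth rate: {glucose_uptake_ratio(Y_COMPLEX_I, Y_NDI):.2f}")
```

Under this assumption, a cell relying on NDI would need roughly 26% more glucose uptake to grow as fast as one using Complex I. This matches the 26% yield advantage in relative terms.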
Orientation of alternative NADH dehydrogenase
Phobius plots indicate that the position of transmembrane domains in an alternative NADH dehydrogenase (NDH2) can predict its membrane orientation (Figure 5B and Figure 5C). NDI orthologs, which are oriented toward the matrix, have C-terminal hydrophobic domains. External alternative NADH dehydrogenases (NDE), which are oriented toward the cytosol, have dominant N-terminal hydrophobic domains and sometimes a C-terminal hydrophobic domain (Figure 5C). S. pombe does not encode any NDI orthologs, but it can oxidize NADH within the mitochondrial matrix (Crichton et al., 2007); alternative translation start sites in its two NDE orthologs, which are conserved across other Schizosaccharomyces species, may produce additional isoforms (see Supplemental Information).
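Phobius is a hidden Markov model predictor of transmembrane topology and signal peptides, and it is not reproduced here. As a rough stand-in, the sketch below applies the orientation rule stated above to Kyte-Doolittle hydropathy averaged over 19-residue windows, where a mean above 1.6 is the classic cut-off for a candidate membrane-spanning segment. The one-third definition of the N- and C-terminal regions, the thresholds, and the function names are assumptions for illustration. Real NDE proteins may carry N-terminal targeting sequences that this simple rule does not model.

```python
# Kyte-Doolittle hydropathy values (Kyte and Doolittle, 1982).
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def hydrophobic_window_centres(seq, window=19, threshold=1.6):
    """Return centre positions of windows whose mean hydropathy exceeds threshold."""
    seq = seq.upper()
    centres = []
    for start in range(len(seq) - window + 1):
        segment = seq[start:start + window]
        # Unknown residues (e.g. X) contribute 0 to the window sum.
        mean = sum(KYTE_DOOLITTLE.get(aa, 0.0) for aa in segment) / window
        if mean > threshold:
            centres.append(start + window // 2)
    return centres


def predict_ndh2_orientation(seq, window=19, threshold=1.6, end_fraction=1 / 3):
    """Heuristic version of the rule in the text (hypothetical helper):
    an N-terminal hydrophobic segment suggests an external (NDE-like) enzyme;
    only C-terminal hydrophobic segments suggest an internal (NDI-like) enzyme.
    """
    n = len(seq)
    centres = hydrophobic_window_centres(seq, window, threshold)
    has_n_term = any(c < end_fraction * n for c in centres)
    has_c_term = any(c >= (1 - end_fraction) * n for c in centres)
    if has_n_term:
        return "external (NDE-like)"
    if has_c_term:
        return "internal (NDI-like)"
    return "undetermined"


# Synthetic sequences, not real NDH2 proteins.
nde_like = "S" * 10 + "L" * 25 + "S" * 300
ndi_like = "S" * 300 + "L" * 25 + "S" * 10
print(predict_ndh2_orientation(nde_like))  # external (NDE-like)
print(predict_ndh2_orientation(ndi_like))  # internal (NDI-like)
```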
Expansion and loss of Type II NADH dehydrogenase
Proto-Yeast likely had three NDH2 orthologs (Figure 5A):
- NDI0, the ancestral NDI (Duarte et al., 2003), orthologous to Escherichia coli's NDH-2 gene;
- NDE0, an ancestral NDE that originated from an NDI0 duplication after endosymbiosis;
- NDE1, an ortholog of S. cerevisiae's NDE1, derived from an NDE0 duplication before the emergence of fungi.

NDE1 is conserved in all species reviewed in this study, but internal NADH dehydrogenase has been repeatedly lost in yeasts. NDI0 was lost in the sister clade of L. starkeyi. Internal NADH dehydrogenase re-emerged as NDI1 from a duplication of NDE1 in the clade sister to Ascoidea; NDI1 was subsequently lost in the Pichiaceae and CTG clades. NDI also independently re-emerged in N. fulvescens, likely before the loss of its Complex I. NDE1.2 in S. pombe originated from an NDE1 duplication in Schizosaccharomyces. Unlike the NDE enzymes of S. cerevisiae, those of Proto-Yeast likely had NADPH dehydrogenase activity (Tarrío et al., 2006a,b; Melo et al., 2001; Carneiro et al., 2004). It is not known at what point in the evolution of S. cerevisiae the NADPH dehydrogenase activity was lost, or whether its loss contributed to the Crabtree effect.
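Repeated independent losses like these are typically inferred by mapping ortholog presence and absence onto a species tree and finding the fewest state changes that explain it. The sketch below shows Fitch small parsimony, a standard method swapped in here for illustration and not necessarily how gains and losses were mapped in this study. It is applied to one presence/absence character on a rooted binary tree written as nested tuples, with hypothetical taxa. Fitch treats gains and losses as equally likely. Dollo parsimony, which allows a single gain but multiple losses, is often preferred for complex genes.

```python
def fitch(tree, states):
    """Fitch small parsimony for one character on a rooted binary tree.

    tree: nested 2-tuples of taxon names; states: taxon -> state
    (e.g. 1 = ortholog present, 0 = absent).
    Returns (possible root states, minimum number of state changes).
    """
    if isinstance(tree, str):
        return {states[tree]}, 0
    left, right = tree
    left_set, left_cost = fitch(left, states)
    right_set, right_cost = fitch(right, states)
    shared = left_set & right_set
    if shared:
        return shared, left_cost + right_cost
    # Children disagree: at least one extra change is needed in this subtree.
    return left_set | right_set, left_cost + right_cost + 1


# Hypothetical toy data, not AYbRAH results.
presence = {"t1": 1, "t2": 1, "t3": 0, "t4": 0}
print(fitch((("t1", "t2"), ("t3", "t4")), presence))  # ({0, 1}, 1)
print(fitch((("t1", "t3"), ("t2", "t4")), presence))  # ({0, 1}, 2)
```

The same presence pattern needs one change on the first topology but two on the second, which is why an accurate species tree matters when counting independent losses.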
The loss of Complex I in the two most widely studied Crabtree-positive yeasts, S. cerevisiae and S. pombe, led some researchers to believe that the loss of Complex I was associated with aerobic fermentation (Gabaldón et al., 2005). Kerscher (2000) argued that, when a carbon source is abundant and rapid growth is essential, it is more advantageous for an organism to synthesize NDI, a single polypeptide, than Complex I, one of the largest known protein complexes (Katz et al., 1971; Schwitzguebel and Palmer, 1982). Under this hypothesis, repression of the ETC in fermentative yeasts would have led to the loss of Complex I, which was functionally replaced by NDI. However, the genome sequence of D. bruxellensis, a less-studied Crabtree-positive yeast, was found to contain Complex I subunits, prompting researchers to reconsider this relationship (Procházka et al., 2010). The Kerscher (2000) hypothesis is consistent with S. cerevisiae's physiology but does not account for the evolution and physiology of other yeasts (see Supplemental Information).
Repression and inhibition of internal NADH dehydrogenase may both lead to aerobic ethanol accumulation, but the underlying physiology and evolution can differ. Inhibition may lead to repression, but an organism can repress an enzyme for many reasons, such as repressing the ETC to accumulate ethanol that inhibits bacterial growth (Merico et al., 2007). Several studies indicate that inhibition is relevant to the ETC.
- First, substrates with an adenosine moiety were found to inhibit Complex I from bovine heart (Birrell and Hirst, 2013).
- Second, Complex I is expressed but inactive in exponentially growing cells on ethanol (Ohnishi, 1972). This is also compatible with a post-translational modification (PTM), but protein phosphorylation has only been found to stimulate respiration in glioma cells (Pasdois et al., 2003).
- Third, the short-term Crabtree effect in S. cerevisiae results in overflow metabolism within seconds of a glucose pulse in glucose-limited chemostats (Van Urk et al., 1988), and after more than 40 minutes with Kluyveromyces lactis (Kiers et al., 1998) and 60 minutes with Cyberlindnera jadinii, formerly Candida utilis (Van Urk et al., 1988). The ETC is not repressed under these conditions, yet ethanol accumulates before any changes in the proteome can occur; this is also consistent with metabolite inhibition or a PTM in the ETC or TCA cycle.
- Fourth, NADP was found to inhibit the oxidation of NADH in purified mitochondria from a relative of S. stipitis (Camougrand et al., 1988).
- Finally, neuronal cells exposed to rotenone, a Complex I inhibitor, show reduced glucose-derived acetyl-CoA and shift to lipogenesis from glutamate, which are characteristics of the Warburg effect (Worth et al., 2014).

These results suggest an underappreciated role of enzyme kinetics in the Crabtree effect.
F1,6P is a potent inhibitor whose concentration is frequently elevated during overflow metabolism in multiple organisms. F1,6P has been found to inhibit Complexes III and IV in S. cerevisiae at physiological concentrations, but not in C. jadinii, a well-studied Crabtree-negative yeast (Díaz-Ruiz et al., 2008; Hammad et al., 2016). Although the evidence for F1,6P inhibition of the ETC is strong, it is likely a single interaction in a cascade of events, since F1,6P accumulation must itself have a cause. Crabtree-positive and Crabtree-negative yeasts have been treated as a binary classification, but large-scale analysis within Saccharomycetaceae demonstrates that aerobic ethanol production occurs over a gradient (Hagman et al., 2013). C. jadinii shows early symptoms of the Crabtree effect: it accumulates ethanol after a glucose pulse in glucose-limited chemostats, and its Complex I is inactive during growth on glucose in batch and chemostat cultures. An inactive Complex I in C. jadinii, whether due to inhibition or repression, forces it to rely on Type II NADH dehydrogenase. This increases its glucose uptake to compensate for the lower P/O ratio relative to Complex I. The increase in glucose uptake contributes to the imbalance between upper and lower glycolysis, and hence to F1,6P accumulation. However, an inactive Complex I in C. jadinii during growth on non-fermentable carbon sources, and the lack of an increase in F1,6P concentration in D. hansenii (Sánchez et al., 2006), rule out F1,6P as the inhibitor of Complex I, or as a master regulator, in all Crabtree-positive species. F1,6P may inhibit the ETC of some Saccharomycetaceae species, but there are likely additional mitochondrial inhibitors of the ETC or Complex I.
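The link between increased glucose uptake, glycolytic imbalance, and F1,6P accumulation can be illustrated with a deliberately crude toy model. It is not a model from this study or from van Heerden et al. (2014). The assumptions are that a single F1,6P pool is fed at rate v_in by upper glycolysis and drained by lower glycolysis with hypothetical Michaelis-Menten kinetics (vmax, km). Setting the net rate to zero gives the steady-state pool size, which rises steeply as the inflow approaches the capacity of lower glycolysis.

```python
import math


def steady_state_f16bp(v_in: float, vmax: float, km: float) -> float:
    """Steady-state pool size F* of the toy model dF/dt = v_in - vmax * F / (km + F).

    Setting dF/dt = 0 gives F* = km * v_in / (vmax - v_in), valid for v_in < vmax.
    If v_in >= vmax the outflow cannot keep up and the pool grows without bound.
    """
    if v_in < 0 or vmax <= 0 or km <= 0:
        raise ValueError("expected v_in >= 0, vmax > 0, km > 0")
    if v_in >= vmax:
        return math.inf
    return km * v_in / (vmax - v_in)


# Hypothetical, dimensionless parameters: lower-glycolysis capacity vmax = 1, km = 1.
for v_in in (0.5, 0.8, 0.9, 0.99, 1.0):
    print(v_in, round(steady_state_f16bp(v_in, vmax=1.0, km=1.0), 2))
```

In this caricature, raising the inflow from 50% to 99% of the lower-glycolysis capacity increases the steady-state pool from 1 to about 99 arbitrary units, and no steady state exists beyond capacity. A modest rise in glucose uptake can therefore translate into a disproportionate rise in F1,6P.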
If a yeast consistently found itself in an environment where Complex I was persistently inactivated by inhibition, a single inactivating mutation in one subunit would have been enough to render Complex I defunct. Because the inhibited complex would provide little benefit, such a mutation would not be purged by selection, and its more than 37 genes could decay into pseudogenes and eventually disappear without a trace, despite Complex I having a higher biomass yield than NDI.
Additional results and discussion on the consensus species tree, gene duplications, subfunctionalization, neofunctionalization and horizontal gene transfers in central metabolism can be found in the Supplemental Information. These cover ribosomal subunit duplications, phosphoglycerate mutase, citrate synthase, glyceraldehyde 3-phosphate dehydrogenase, UDP-glucose-4-epimerase, isocitrate dehydrogenase, malate synthase, malate dehydrogenase, fumarase, fumarate reductase, succinate semialdehyde, homocitrate synthase, trifunctional enzyme C1-tetrahydrofolate synthase, NAD-glycerol 3-phosphate dehydrogenase, and alcohol dehydrogenase.
Benefits of an ortholog database for genomics
AYbRAH was used to evaluate the gains and losses of genes that may have contributed to the Crabtree effect. However, an open source curated ortholog database has additional benefits, ranging from identifying gene targets for functional characterization to improving functional gene annotation and streamlining genome-scale reconstructions.

First, a curated ortholog database can serve as a repository of orthologs that have been screened and orthologs that require screening. To engineer non-conventional organisms, we need to understand the orthologs that do not exist in model organisms and the orthologs whose function is not conserved with model organisms, rather than characterizing all the orthologs in a handful of model organisms.

Second, a curated ortholog database can be used to improve and simplify genome annotation. Genes from newly sequenced organisms can be mapped to curated ortholog groups, similar to what has been implemented with eggNOG-mapper (Huerta-Cepas et al., 2017) and YGAP (Proux-Wéra et al., 2012), and new ortholog groups can be created for de novo genes or genes from recent duplications. Pulling annotations from a curated ortholog database has the advantage of unifying gene names and descriptions across organisms, as has been proposed for ribosomal subunits (Ban et al., 2014). It can also reduce the number of genes that are unannotated, misannotated, or annotated as hypothetical proteins.

Finally, a curated ortholog database can be used to increase the quality and quantity of genome-scale reconstructions. Genome-scale reconstructions inherently require a great deal of curation for orthology assignment and functional annotation, which is often not transparent. Refocusing this effort on curating ortholog groups and their functions in an open source database for a pan-genome would allow improvements to be pushed to all reconstructions. It would also allow genome-scale reconstructions to be compiled for any taxonomic level, from kingdom to species.
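The annotation workflow described above can be sketched in a few lines. The sketch assumes each new gene already has its best hit against the curated groups, given as a (group id, bitscore) pair or None. Genes with a convincing hit inherit the group's curated name, and the rest are placed in provisional groups for later curation. The bitscore cut-off, group identifiers, and function name are hypothetical. A real pipeline would also need to cluster unmatched genes with each other, for example recent duplicates, rather than give each its own group.

```python
from itertools import count


def annotate_genes(best_hits, group_names, min_bitscore=100.0):
    """Map genes to curated ortholog groups by best hit, or open provisional groups.

    best_hits: gene id -> (group id, bitscore) of its best hit, or None.
    group_names: curated group id -> shared gene name/description.
    Returns gene id -> (group id, name).
    """
    new_group_ids = count(1)
    assignments = {}
    for gene in sorted(best_hits):  # sorted so provisional ids are reproducible
        hit = best_hits[gene]
        if hit is not None and hit[1] >= min_bitscore:
            group = hit[0]
            # Every member of a curated group inherits the same name.
            assignments[gene] = (group, group_names.get(group, "unnamed curated group"))
        else:
            # No convincing match: open a provisional group for manual curation.
            assignments[gene] = (f"NEW{next(new_group_ids):04d}", "provisional group (needs curation)")
    return assignments


hits = {"g1": ("OG1", 350.0), "g2": ("OG1", 40.0), "g3": None}
for gene, (group, name) in annotate_genes(hits, {"OG1": "acetyl-CoA synthetase"}).items():
    print(gene, group, name)
```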
4 CONCLUSION
We developed AYbRAH, an open source ortholog database for Fungi, to determine which gains and losses of metabolic genes may be relevant to the Crabtree effect in budding yeasts, with a focus on the PDH bypass and NADH dehydrogenase. Manual curation was required for gene families with high sequence similarity, which often arise from recent gene duplications. FBA simulations indicate that the PDH bypass has a higher yield of protein and phospholipids from glucose than ACL, the ancestral acetyl-CoA source in Proto-Yeast. All yeasts in this study have homologs of the PDH bypass genes, but the delimiting gene for its expression on glucose appears to be ACS2, a paralog of the ancestral acetate-inducible ACS. FBA simulations demonstrate that Complex I results in a 26% higher biomass yield than internal alternative NADH dehydrogenase when cells are grown on glucose. Complex I is bypassed for NDI in some yeast lineages, and each enzyme has been lost in multiple independent yeast lineages. We propose that the repeated loss of Complex I results from inhibition by a mitochondrial metabolite, which is consistent with the physiology of Crabtree-positive and Crabtree-negative yeasts. By reconstructing the evolution of homologs into orthologs at the pan-genome level, bioinformaticians can improve existing and future genome annotations and the accuracy of species trees. Both improvements can help biologists understand the evolution of physiology throughout the tree of life.
5 ABBREVIATIONS
ACS: acetyl-CoA synthetase
ALD: aldehyde dehydrogenase
ACL: ATP citrate lyase
AYbRAH: Analyzing Yeasts by Reconstructing Ancestry of Homologs
ETC: electron transport chain
F1,6P: fructose 1,6-bisphosphate
FBA: flux balance analysis
FOG: Fungal Ortholog Group
HOG: Homolog Group
NDE: external alternative NADH dehydrogenase
NDI: internal alternative NADH dehydrogenase
NDH2: alternative NADH dehydrogenase
NUO: Complex I
PDC: pyruvate decarboxylase
PDH: pyruvate dehydrogenase
PFK: 6-phosphofructokinase
PHK: phosphoketolase
PTM: post-translational modification
6 ACKNOWLEDGMENTS
The authors gratefully acknowledge Prof. Belinda Chang and Ryan Schott for their advice with the phylogenetic analysis. K.C. was supported by Bioconversion Network and NSERC CREATE M3.
Author contributions: K.C. and S.M.Y. performed the phylogenetic analysis. K.C. and R.M. discussed the results. K.C. wrote the manuscript. K.C. and R.M. reviewed the manuscript.