Abstract
Sexual development is a key evolutionary innovation of eukaryotes. In many species, it involves the interaction of compatible mating partners that can undergo cell and nuclear fusion and subsequent steps of development including meiosis. Mating compatibility in fungi is governed by mating type determinants, which are localized at mating type (MAT) loci. In basidiomycetes, the ancestral state is hypothesized to be tetrapolar (bifactorial), with two genetically unlinked MAT loci containing homeodomain transcription factor genes (HD locus) and pheromone and pheromone receptor genes (P/R locus), respectively. Alleles at both loci must differ between mating partners for completion of sexual development. However, there are also basidiomycete species with bipolar (unifactorial) mating systems, which can arise through genomic linkage of the HD and P/R loci. In the Tremellales, which comprise mostly yeast-like species, bipolarity is found only in the pathogenic Cryptococci, e.g. in the well-studied human pathogen Cryptococcus neoformans. Here, we describe the analysis of MAT loci from the Trichosporonales, a sister group to the Tremellales. We analyzed genome sequences from 29 strains comprising 24 species, including two new genome sequences generated in this study. Somewhat surprisingly, in all of the species analyzed, the MAT loci are fused and a single HD gene is present in each allele. This is similar to the organization in the pathogenic Cryptococci, which also have linked MAT loci and carry only one HD gene per MAT locus instead of the usual two HD genes found in the vast majority of basidiomycetes. However, almost all Trichosporonales strains analyzed carry either the combination of the HD gene SXI1 with the pheromone receptor allele STE3a, or the combination of the SXI2 gene and STE3α allele. This is in contrast to MAT alleles in C. neoformans, where SXI1 is linked with STE3α, and SXI2 is linked with STE3a. The differences in allele combinations as well as the existence of tetrapolar Tremellales sister species to the bipolar Cryptococci suggest that the fusion of the HD and P/R loci and the loss of one HD gene per allele occurred independently in the Trichosporonales and the pathogenic Cryptococci, supporting the hypothesis of convergent evolution at the molecular level towards fused mating-type regions in fungi. A phylogenetic analysis of divergence times suggests that the MAT fusion in the Trichosporonales is the oldest fusion of MAT loci observed to date.
Author summary
Sexual development in fungi is governed by mating-type (MAT) genes, and the corresponding MAT loci show similarities to sex chromosomes in animals and plants. One common feature is an evolutionary trend towards combining sex-associated genes on the same chromosome, which can evolve by selection because it facilitates linkage of favorable allele combinations. Here, we show that this occurred in the Trichosporonales, a sister group to the Tremellales, similar to the expanded, fused MAT loci discovered previously in the human pathogen Cryptococcus neoformans. Our data suggest that fusion of MAT loci occurred independently in the Trichosporonales and pathogenic Cryptococci, supporting the hypothesis of convergent evolution towards fused MAT regions in fungi.
Introduction
Sexual reproduction is pervasive among eukaryotic organisms, but despite its rather conserved core features (syngamy/karyogamy and meiosis), many aspects of sexual development show high evolutionary flexibility [1–3]. This includes the determination of compatible mating partners that can successfully undergo mating and complete the sexual cycle. In many species, compatibility is determined by one or more genetic loci that differ in compatible mating partners. The evolution of such genes has been studied in many systems including plants, animals, algae, and fungi [1, 4–6]. In animals and plants, genes that determine sexual compatibility are often found on sex chromosomes, which have distinct evolutionary histories compared to autosomes. In fungi, genetic systems for determining mating compatibility vary widely, ranging from single genetic loci on autosomes to chromosomes that have the hallmarks of sex chromosomes, e.g. suppressed recombination. Studies in different fungal groups have revealed that transitions towards larger, sex chromosome-like regions have occurred several times in fungal evolution, with some systems having evolved only recently [5, 7–9]. Thus, fungi are excellent model systems to study the evolution of genomic regions involved in mating and mating type determination.
Mating compatibility in fungi is governed by mating-type genes, which are located in mating-type (MAT) loci [10, 11]. While ascomycetes and the Mucoromycotina are bipolar (unifactorial), i.e. harboring only one MAT locus [12–14], in basidiomycetes, the ancestral state is considered to be tetrapolar (bifactorial), with two genetically unlinked MAT loci controlling mating-type determination at the haploid stage [15, 16]. One MAT locus usually contains tightly linked pheromone and pheromone receptor genes (P/R locus) involved in premating recognition, and the other contains genes encoding homeodomain (HD) transcription factors of class 1 and class 2 (HD locus), which determine viability after syngamy. Importantly, alleles at both loci must differ between mating partners for completion of the sexual cycle [17].
However, there are also many basidiomycete species with bipolar mating systems, which can arise through genomic linkage of the HD and P/R loci, or when the P/R locus loses its function in determining mating specificity [15, 17, 18]. Bipolarity through MAT loci fusion is found in the subphylum Ustilaginomycotina, e.g. in Malassezia species and Ustilago hordei, several Microbotryum species (subphylum Pucciniomycotina), as well as in the pathogenic Cryptococci from the class Tremellomycetes (subphylum Agaricomycotina) [1, 7, 19–23] (Fig. 1). In Microbotryum, several convergent transitions to linked MAT loci were shown within the genus [7]. One feature that known species with fused MAT loci have in common is that they are associated with plant or animal host species as pathogens or commensals. It has been hypothesized that the necessity to find a mating partner while associated with a host might have favored linkage of MAT loci, because having one linked instead of two unlinked MAT loci increases the compatibility of gametes derived from a single diploid parent. This would improve mating compatibility rates on a host where other mating partners might be difficult to find, e.g. when a host is initially colonized by a single diploid genotype or meiotic progeny from a single tetrad [4, 8, 15, 17, 24].
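The compatibility argument above can be illustrated with a small enumeration. The following is a minimal sketch under simplifying assumptions that are not taken from the study: a single diploid parent heterozygous at both loci, equally frequent haploid products, random pairing of those products, and either complete linkage or fully independent assortment. The function name and allele labels are hypothetical.

```python
from itertools import product


def compatible_fraction(linked: bool) -> float:
    """Fraction of pairings among haploid progeny of one diploid that are compatible.

    Assumed parent (illustrative only): HD alleles h1/h2 and P/R alleles p1/p2.
    Two progeny are compatible only if they differ at BOTH loci.
    """
    if linked:
        # Complete linkage: only the two parental haplotypes are produced.
        gametes = [("h1", "p1"), ("h2", "p2")]
    else:
        # Unlinked loci: four haplotypes at equal frequency.
        gametes = list(product(("h1", "h2"), ("p1", "p2")))

    # All ordered pairings of two randomly drawn progeny.
    pairs = list(product(gametes, repeat=2))
    compatible = sum(1 for a, b in pairs if a[0] != b[0] and a[1] != b[1])
    return compatible / len(pairs)


if __name__ == "__main__":
    print("tetrapolar (unlinked):", compatible_fraction(linked=False))
    print("bipolar (linked):     ", compatible_fraction(linked=True))
```

Under these assumptions, linkage doubles the chance that two progeny of the same parent can mate, which is the advantage hypothesized for hosts colonized by a single diploid genotype.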
Two additional evolutionary features can be associated with the linkage and expansion of the MAT loci. One is the recruitment of other development-associated genes into the fused MAT loci, and the second is suppression of recombination between the fused MAT loci that can extend along the MAT-containing chromosome [11, 23, 25]. Suppression of recombination is a hallmark of sex chromosomes in other eukaryotes as well, and thus might point towards convergent evolutionary transitions for the regulation of sexual development in eukaryotes [26, 27]. Suppression of recombination between the linked MAT loci would further increase compatibility under inbreeding conditions, and recruitment of sex-associated genes into the MAT locus might facilitate the inheritance of favorable allele combinations through genetic linkage [28].
A well-studied case of fused MAT loci is the one of Cryptococcus neoformans, a member of a group of closely related, pathogenic Cryptococcus species [1]. Fusion of the HD and P/R loci most likely occurred in the ancestor of the pathogenic Cryptococci, because other analyzed Tremellales species including the closely related, but non-pathogenic Cryptococcus amylolentus, Kwoniella heveanensis, Kwoniella mangrovensis, Cryptococcus wingfieldii, and Cryptococcus floricola are all tetrapolar [29–33]. The fused C. neoformans MAT locus encompasses more than 20 genes over a region spanning more than 100 kb and has two alleles designated a and α. In the majority of basidiomycetes, each MAT allele at the HD locus carries both the HD1 and the HD2 transcription factor genes, whereas in C. neoformans, the MATα locus contains only the HD1 gene SXI1α, and MATa contains only the HD2 gene SXI2a. Except for a gene conversion hotspot, the C. neoformans MAT locus displays suppressed meiotic recombination [23, 34, 35].
Among the Tremellomycetes, pathogenic Cryptococci are so far the only species for which fused MAT loci have been described [17, 23]. In a previous study, we analyzed the genome sequence of Cutaneotrichosporon oleaginosum strain IBC0246 (formerly Trichosporon oleaginosus), which belongs to the Trichosporonales, a sister order to the Tremellales within the Tremellomycetes class. Trichosporonales species are widely distributed in the environment and have been isolated from a variety of substrates including soil, decaying plant material, and water. Many species are saprobes, but some have also been found to be associated with animals including humans either as commensals or pathogens [36–38]. Despite their common occurrence in the environment, Trichosporonales are an understudied fungal group, and sexual reproduction has not yet been observed for any of the known species [39, 40]. Recently, several Trichosporonales were studied with respect to their biotechnological properties, including the oil-accumulating C. oleaginosum, which was first isolated from a dairy plant, and has the ability to metabolize chitin-rich and other non-conventional substrates [41–44]. The sequenced C. oleaginosum strain is haploid, and similar in genome size and gene content to genomes from the sister order Tremellales, and this was also the case for several other Trichosporonales genomes that have since been sequenced [43, 45–50]. Interestingly, C. oleaginosum showed some similarities to C. neoformans in the organization of MAT loci. This included recruitment of genes with diverse functions during mating into the HD and P/R loci, as well as the presence of only a SXI1 homolog at the HD locus [43]. However, in the draft genome assembly, HD- and P/R loci were situated on different scaffolds. Furthermore, a sexual cycle has not yet been described for C. oleaginosum. 
Thus, it was not possible to conclusively distinguish whether this species carries two unlinked or fused MAT loci, with the seemingly more likely possibility being a tetrapolar mating system as this is present in the majority of species from the sister group of Tremellales.
Since the analysis of the first C. oleaginosum genome, several more Trichosporonales genomes have been sequenced, although none were analyzed with respect to their MAT loci [45–51]. Here, we describe the sequencing of two additional Trichosporonales genomes for C. oleaginosum ATCC20508 and Vanrija humicola CBS4282, and the analysis of MAT loci organization in Trichosporonales genomes from 24 different species. Surprisingly, we found that all of the species analyzed have fused MAT loci that contain both P/R and HD genes. Furthermore, all the analyzed strains of each species contain only one of the two ancestral HD genes (SXI1 and SXI2), with almost all of the species carrying either the combination of the HD gene SXI1 with the pheromone receptor allele STE3a, or the combination of the SXI2 gene and the STE3α allele. This is in contrast to C. neoformans, where SXI1 is combined with STE3α in the MATα allele, and SXI2 is combined with STE3a in the MATa allele [34]. The differences in allele combinations as well as the existence of tetrapolar Tremellales sister species to the bipolar Cryptococci suggest that the fusion of the HD and P/R loci as well as the loss of one HD gene per allele occurred independently in the Trichosporonales and the pathogenic Cryptococci. This provides further evidence that convergent evolution leading to fused MAT loci is evolutionarily beneficial under certain circumstances, and has been selected for multiple times in basidiomycetes, with the MAT loci fusion in Trichosporonales representing the oldest such event observed to date.
Results
Trichosporonales species have fused HD and P/R mating type loci
Since 2015, genome sequences have been published for Trichosporonales species from the genera Apiotrichum, Cutaneotrichosporon, Takashimella, Trichosporon, and Vanrija [45–52] (Table 1). We analyzed the genomes of 29 isolates that belong to 24 Trichosporonales species for the organization of the MAT loci, and found that all of them contain fused MAT loci, with both mating-type determining genes (STE3 and SXI) located ~55 kb apart from each other. Except for four strains described below, all analyzed MAT loci comprise either the combination of the HD gene SXI1 with the pheromone receptor allele STE3a, or the combination of the SXI2 gene and STE3α allele (Table 1, Figs 1 and 2). Each MAT allele carries a single pheromone precursor gene in the vicinity of the STE3 gene (Fig 2, S1 Text). The SXI/STE3 combinations found in the Trichosporonales are different from the MAT alleles in C. neoformans, where the SXI1 gene is combined with STE3α in the MATα allele, and SXI2 is combined with STE3a in the MATa allele [23, 34]. To avoid confusion with the C. neoformans nomenclature and be compatible with allele designations in other basidiomycetes, we named the STE3α-containing allele A1, and the STE3a-containing allele A2 (Figs 1 and 2).
The fact that the genomes of 24 Trichosporonales species carry fused MAT loci with two allelic combinations distributed throughout the phylogenetic tree (Fig 1) suggests that the linkage event predates the diversification of the Trichosporonales clade. However, at the beginning of this study, only one MAT allele was known for each of the Trichosporonales species. To verify that both MAT alleles, A1 and A2, can be found in different strains within a single species, we analyzed additional Vanrija humicola strains (Table 2). V. humicola is a member of a phylogenetically early-branching lineage within the Trichosporonales, and the previously sequenced strain (JCM1457) carries the A2 allele (Figs 1 and 2) [50]. Several other strains obtained from culture collections were analyzed by RAPD genotyping, which revealed banding patterns similar to those from the type strain V. humicola JCM1457 (=CBS571, MAT A2) (S1 Fig). These strains were analyzed further by PCR for the mating type genes. Strains other than CBS4282 and CBS4283 gave PCR products when using oligonucleotides for SXI1 and STE3a derived from the JCM1457 (A2) genomic sequence. The two genes adjacent to the SXI gene in both MAT alleles are conserved in the analyzed Trichosporonales genomes (Fig 2), therefore we used oligonucleotide primers derived from these genes to amplify the predicted SXI2-containing region of CBS4282 and CBS4283 confirming the presence of an SXI2 gene in these strains. To assess this further, we sequenced the genome of CBS4282 using Illumina sequencing. The genome was assembled into 21 scaffolds with 5089 predicted genes. At 22.63 Mb, the assembly size is similar to that of strain JCM1457 (A2) (22.65 Mb, [50]). A k-mer analysis of the Illumina reads for CBS4282 showed a single peak as expected for a haploid genome (S1 Fig). 
Analysis of the MAT region revealed fused MAT loci carrying the SXI2 gene and STE3α allele, confirming that this strain is indeed an A1 strain, and thus showing that alternative alleles can be found in different strains of a single Trichosporonales species (Figs 2 and 3, S2 and S3 Figs, Table 2). This was further confirmed by an analysis of the recently published genome of a second Apiotrichum porosum strain [51], with the two A. porosum genome sequences available now representing the A1 and A2 alleles (Figs 2 and 3).
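The k-mer analysis mentioned above rests on a simple idea: in a haploid genome most k-mers occur at one copy per genome, so the histogram of k-mer multiplicities in the reads shows a single coverage peak. The sketch below is an illustration only and is not the tool or the parameters used in the study. The function names are hypothetical, and the naive peak counter assumes a smooth spectrum, whereas real data would need smoothing and a dedicated k-mer counter.

```python
from collections import Counter


def kmer_spectrum(reads, k):
    """Return a mapping: multiplicity -> number of distinct k-mers seen that often."""
    counts = Counter()
    for read in reads:
        read = read.upper()
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            if "N" not in kmer:  # skip k-mers containing ambiguous bases
                counts[kmer] += 1
    return Counter(counts.values())


def count_peaks(spectrum, min_multiplicity=2):
    """Count local maxima in the spectrum, ignoring multiplicities below the
    given threshold (low-multiplicity k-mers mostly stem from sequencing errors)."""
    if not spectrum:
        return 0
    top = max(spectrum)
    ys = [spectrum.get(m, 0) for m in range(min_multiplicity, top + 1)]
    peaks = 0
    for i, y in enumerate(ys):
        left = ys[i - 1] if i > 0 else 0
        right = ys[i + 1] if i + 1 < len(ys) else 0
        if y > 0 and y > left and y >= right:
            peaks += 1
    return peaks


if __name__ == "__main__":
    # Toy "haploid" example: five identical reads covering one short sequence.
    toy_reads = ["ACGTTGCAAGGCTTAC"] * 5
    spectrum = kmer_spectrum(toy_reads, k=4)
    print(dict(spectrum), "peaks:", count_peaks(spectrum))
```

A heterozygous diploid would typically add a second peak at about half the main coverage, which is why a single peak is read as support for haploidy.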
Alternative MAT alleles in Trichosporonales show significant chromosomal rearrangements, but each displays highly conserved structure across species
The gene order within the core MAT region between the SXI2 and STE3α/MFα genes is very well conserved between V. humicola CBS4282 (A1) and the A1 alleles of the other Trichosporonales species analyzed, whereas it differs from the V. humicola A2 strain JCM1457 by two inversions, one of which encompasses most of the core MAT region (Fig 2). The same is true for the A2 alleles, where the core MAT region is also rather conserved between V. humicola JCM1457 (A2) and other Trichosporonales (Fig 2).
In addition to the key mating-type determinants, the MAT loci of the Trichosporonales contain other genes previously shown to be required for mating and filamentation in other species (e.g. STE11 and STE20), and their common presence in the MAT loci of Tremellales suggests that these genes were anciently recruited to the MAT locus (Fig 2). In the Trichosporonales, STE11 and STE20 are found within the core MAT region (between the HD and P/R genes), and STE12 is in the vicinity of the core MAT region in most species (Fig 2). One exception turned out to be STE12 in the Cutaneotrichosporon lineage. In C. oleaginosum strains IBC0246 (A2) and ATCC20509 (A2), STE12 is located on a different scaffold than the MAT locus. While STE12 in IBC0246 (A2) is located at the end of its respective scaffold, its location in ATCC20509 (A2) is around position 1.1 Mb of a 2.5 Mb scaffold, indicating that even if this scaffold was linked to the MAT scaffold, the linkage would not be very tight.
To confirm that STE12 is indeed not linked to the MAT locus in C. oleaginosum, we sequenced the genome of the third of the three known strains of this species, ATCC20508. The genome was sequenced with Pacific Biosciences SMRT sequencing and assembled into eight contigs with 8208 predicted genes. CHEF gel electrophoresis of the C. oleaginosum strains shows a karyotype comprising seven chromosomes, with the sizes of the six largest chromosomes corresponding well to the sizes of the six largest contigs of the ATCC20508 assembly (S4 Fig). Similar to the other two strains, ATCC20508 carries the MAT A2 allele and STE12 on different contigs (contigs 1 and 3, respectively; S4 Fig), making it unlikely that they are located on the same chromosome. Additionally, analyses of the ATCC20508 (A2) and ATCC20509 (A2) genome assemblies showed that both the MAT locus and STE12 are located within large co-linear scaffolds in both strains, providing further evidence that STE12 is not linked to the MAT locus in this species (S4 Fig). Consistent with this, in the other Cutaneotrichosporon species analyzed, STE12 is either on the same scaffold as MAT, but at a distance of 400 to 750 kb (S5 Fig), or is present on different scaffolds than MAT. Thus, while this suggests that the location of STE12 in the vicinity of MAT likely represents the ancestral state in the Trichosporonales, its relocation to a different genomic location in the Cutaneotrichosporon species indicates that STE12 might not be fully linked to MAT in the Trichosporonales.
Another difference in MAT organization that can be observed in the genus Cutaneotrichosporon is the combination of HD genes and STE3 alleles. While all other strains of each species carry either A1 (STE3α+SXI2) or A2 (STE3a+SXI1) alleles, Cutaneotrichosporon dermatis carries an allele that has the overall genomic organization of A1, but has instead an SXI1 gene (S5 Fig). Because the A1 vs. A2 designation is based on the STE3 variant, this allele was called A1*. Interestingly, A1* alleles can also be found in Apiotrichum veenhuisii, whereas a corresponding A2* allele, in which STE3a is linked to the SXI2 gene, can be found in Apiotrichum gracile (Fig 1 and S5 Fig). Furthermore, both A1* and A2* alleles can be found in Cutaneotrichosporon mucoides, one of several hybrid species within the Trichosporonales (S6 Fig, S2 Text) [49, 50]. These MAT alleles seem to represent a derived state and might have been independently generated by recombination involving crossing-over between the MAT region adjacent to the SXI1/SXI2 genes that is co-linear between the A1 and A2 alleles (S5 Fig).
The fusion of the MAT loci in Trichosporonales is ancient
The overall high degree of conservation in gene order and allele combinations of HD and P/R loci in the Trichosporonales suggests that the fusion of the MAT loci occurred in the common ancestor of the Trichosporonales lineage. To determine at what time this fusion might have occurred, we estimated divergence times of the basidiomycete lineages included in Fig 1 using three calibration points (see S7 Fig and Materials and Methods). Based on this analysis, the Trichosporonales and Tremellales have a common ancestor dating to approximately 179 million years ago (MYA), whereas the earliest split within the Trichosporonales occurred approximately 147 MYA (S7 Fig). Thus, under the assumption of a MAT fusion in the common ancestor of the Trichosporonales, this fusion would have occurred between 179 and 147 MYA, making it the oldest observed MAT fusion event in basidiomycetes, because the MAT fusions in the pathogenic Cryptococci, the Ustilaginomycotina, and the Microbotryum lineage occurred more recently (S7 Fig). While our estimates of divergence times for the Microbotryum and Trichosporonales lineages agree well with recent analyses [7, 53], it has to be noted that previous estimates for the last common ancestor of the pathogenic Cryptococci resulted in earlier divergence times ranging from 40 to 100 MYA [54–57]. Nevertheless, even if the divergence time of the ancestor of the pathogenic Cryptococci were underestimated in our analysis, this would still place the last common ancestor of the Trichosporonales, and thus the likely MAT fusion in this group, at a much earlier time point.
Suppression of recombination is restricted to the fused MAT loci in V. humicola
Suppression of recombination is a hallmark of sex chromosomes of animals and plants and the mating-type chromosomes of algae and fungi [9, 25, 31, 34, 58–60]. One consequence of recombination cessation can be the accumulation of transposable elements as well as increased genetic differentiation between allelic sequences [26]. This was observed in C. neoformans, where the MATa and MATα alleles differ significantly in gene organization, and the MAT locus contains more remnants of transposons and other repeat sequences than other genomic regions except for the centromeres and rDNA repeats [23, 61]. Furthermore, both C. neoformans MAT alleles are highly rearranged in comparison with their Cryptococcus gattii counterparts [34].
In contrast, the MAT A1 and A2 alleles of V. humicola are overall co-linear apart from two inversions, and this is generally the case throughout the Trichosporonales (Figs 2 and 3A). An analysis of repetitive sequences shows that there is no accumulation of repeat regions within the MAT alleles of V. humicola (S1 Table). Thus, whereas the two inversions might impair meiotic pairing and therefore recombination in this region, the conserved gene order and lack of repeats made us consider if the V. humicola MAT loci, and more generally the Trichosporonales MAT loci, are regions of suppressed recombination.
To test this, we first analyzed phylogenetic trees for several genes present within or adjacent to the MAT loci of Trichosporonales (Fig 4). For genes in regions undergoing meiotic recombination, alleles associated with alternative mating types are expected to display a species-specific topology in a phylogenetic tree, whereas genes in MAT regions should cluster by mating type if recombination suppression predates speciation (i.e. with the A1 alleles of the different species branching together rather than each of the alleles clustering with the A2 allele from the same species) [30, 34]. In the Trichosporonales, only STE3 clearly shows a mating type-specific pattern with an ancient trans-species polymorphism in Trichosporonales and Tremellales (Fig 3B). None of the other genes tested showed a mating type-specific phylogenetic pattern for the Trichosporonales at such a deep phylogenetic level, indicating that recombination was not suppressed already at the base of the clade. However, for several genes within the core MAT locus, sequences from two V. humicola A2 strains (JCM1457 and UJ1) group apart from the A1 sequence (CBS4282) (Fig 3B), and therefore, these genes might be mating type-specific at the species level, although more strains from more species would have to be investigated to exclude this finding occurring by chance (Fig 3).
This finding is further supported by BLASTN analyses comparing alleles in the three available V. humicola genomes, which showed that within the core MAT region, alleles from the A2 strains (JCM1457 and UJ1) are more similar to each other than to the A1 strain CBS4282, whereas outside of this region this is not the case (Figs 4A and 4B). This suggests that the MAT locus of V. humicola might be a region of recent recombination suppression.
To further test whether recombination is suppressed between the MAT alleles in V. humicola, we analyzed levels of synonymous divergence (dS) between alleles on the MAT-containing scaffold as well as on three other scaffolds of CBS4282 (A1) and the two V. humicola A2 strains (S8 Fig). In the four longest scaffolds, genes within large regions of the scaffolds are more similar between CBS4282 (A1) and JCM1457 (A2) than between the two A2 strains, where stretches of low dS values occur much less frequently (S8 Fig). Interestingly, we also noted regions of high similarity between CBS4282 (A1) and JCM1457 (A2) that are interrupted by stretches containing more divergent alleles. This pattern is suggestive of ongoing recombination in natural V. humicola populations, even though sexual reproduction has not yet been observed.
In contrast to other genomic regions, including the co-linear regions outside of the core MAT region where overall divergence is lowest between CBS4282 (A1) and JCM1457 (A2) (Fig 4C and S8 Fig), most of the alleles in the core MAT region are slightly more divergent between the A1 strain and the two A2 strains than between the two A2 strains (Fig 4C). This is consistent with an absence of genetic exchange between the A1 and A2 alleles, as one would expect if recombination in this region is suppressed. These findings might be explained by a reduced recombination rate in the core MAT region carrying the inversions in alternate MAT alleles, which could lead to accumulation of mutations in the two MAT alleles and thus to elevated synonymous divergence between A1 and A2 alleles. However, divergence between A1 and A2 strains within the core MAT region is only moderately elevated for most genes, and the difference between the average dS values is only statistically significant when comparing the analysis of CBS4282 (A1) vs. UJ1 (A2) with the analysis of JCM1457 (A2) vs. UJ1 (A2) (Fig 4D). One possible explanation could be that genetic exchange within the inverted regions may not be completely inhibited, because exchange via non-crossover gene conversion or double crossover can still occur within the inverted regions [62]. This should result in a so-called suspension bridge pattern with divergence in the middle of an inverted region lower than towards the inversion breakpoints [63]. To test this, we performed BLASTN analyses on sliding windows of genomic sequences of the MAT locus and adjacent regions (S9 Fig). The results show less sequence similarity in the regions of putative inversion breakpoints between A1 and A2 strains compared to regions within the inversions and outside of the MAT region, consistent with the hypothesis that a certain amount of recombination is occurring within the inverted regions.
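The sliding-window comparison described above can be sketched as follows. Note that this is not the BLASTN-based procedure used in the study: as a simplification, it computes per-window identity between two sequences that are assumed to be already aligned to equal length, with "-" marking gaps. The function name, window size, and step are hypothetical choices for illustration.

```python
def sliding_identity(seq_a, seq_b, window, step):
    """Per-window fraction of identical positions between two pre-aligned sequences.

    Returns a list of (window_start, identity) tuples. Columns with a gap ("-")
    in either sequence are skipped; identity is None if a window has no
    comparable columns.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    results = []
    for start in range(0, len(seq_a) - window + 1, step):
        compared = 0
        matches = 0
        for x, y in zip(seq_a[start:start + window], seq_b[start:start + window]):
            if x == "-" or y == "-":
                continue
            compared += 1
            if x == y:
                matches += 1
        identity = matches / compared if compared else None
        results.append((start, identity))
    return results


if __name__ == "__main__":
    # Toy alignment: the middle window is fully divergent.
    a1 = "AAAAAAAAAAAA"
    a2 = "AAAATTTTAAAA"
    for start, identity in sliding_identity(a1, a2, window=4, step=4):
        print(start, identity)
```

In a suspension bridge pattern, windows close to the inversion breakpoints would show lower identity between A1 and A2 than windows in the middle of the inversion.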
Even though recombination within the MAT region might not be fully suppressed, possibly due to non-crossover gene conversion, the gene order within each A1 and A2 allele in different species is surprisingly well conserved (Fig 2). To test if this degree of gene order conservation extends beyond the MAT locus, we compared the MAT-containing scaffolds of Trichosporonales. As expected for a range of species separated by millions of years of evolution, synteny is conserved between closely related species, but not between species that diverged long ago (S10 Fig), except for the MAT region. This suggests that the MAT region of Trichosporonales is an ancient cluster of tightly linked loci (also known as ‘supergenes’) that seem to segregate as a stable polymorphism within the populations of each species. Thus, the Trichosporonales appear to contain very stable MAT alleles with respect to gene order, combined with (slightly) suppressed recombination between different alleles.
The MAT loci in the Trichosporonales have a significantly lower GC content compared to the overall genomic GC content
One curious observation made during the analysis of the Trichosporonales MAT region was a lower GC content in the MAT regions compared to the surrounding regions for the analyzed Vanrija, Cutaneotrichosporon, and Trichosporon strains (S10 and S11 Figs). One possible explanation for a lower GC content could be an accumulation of AT-rich transposable elements, but an analysis of repeats in the three V. humicola strains showed that there are only a few (strain CBS4282, A1) or no (strains JCM1457 and UJ1, both A2) repeats present within the MAT region of these strains (S1 Table). Another explanation might be a lower density of coding regions, which tend to have a higher GC content than non-coding regions. However, an analysis of the GC content only in the coding sequences within and around the MAT region of strain CBS4282 showed the same pattern of lower GC content within the MAT region (S11B Fig). Another possible explanation for the lower GC content might be the accumulation of mutations due to reduced recombination. Under the (simplistic) assumption that mutation frequencies for all nucleotide exchanges are similar, this would drive the GC content towards 50%. In genomes with an average GC content of more than 50%, this would appear as a region with lower GC content, and this would apply to the analyzed strains with an average genomic GC content of 58 to 63% (S11 Fig). It has been shown that spontaneous mutations tend to be AT-biased in many species including several fungi [64–68], and thus higher mutation rates would generally lead to a lower GC content in the corresponding regions. Both of these mutation-based explanations can be supported by an analysis of the codon usage within the MAT region of the three V. humicola strains (S2 Table). Among the codons for amino acids that can be encoded by more than one codon, there is a trend for GC-rich codons to be used less frequently in the MAT region compared to the genome-wide usage, consistent with (GC content-equalizing or AT-biased) mutations combined with selection for conserved protein sequences due to functional constraints. Additional studies of species with a GC content of less than 50% might be useful to test these hypotheses.
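The mutation-based reasoning above can be made concrete with a two-state model. The following is a minimal sketch, not an analysis from the study: it assumes that each G/C site becomes A/T with probability u per generation and each A/T site becomes G/C with probability v, and it ignores selection and every other force acting on base composition. The rates and function names are hypothetical.

```python
def gc_trajectory(gc0, u, v, generations):
    """Expected GC fraction over time under mutation alone.

    gc0: starting GC fraction (0..1)
    u:   per-generation probability that a G/C site mutates to A/T (assumed)
    v:   per-generation probability that an A/T site mutates to G/C (assumed)
    """
    gc = gc0
    trajectory = [gc]
    for _ in range(generations):
        # GC sites that stay GC plus AT sites that become GC.
        gc = gc * (1 - u) + (1 - gc) * v
        trajectory.append(gc)
    return trajectory


def gc_equilibrium(u, v):
    """GC fraction at which losses (gc * u) balance gains ((1 - gc) * v)."""
    return v / (u + v)


if __name__ == "__main__":
    # Equal rates: a region starting at 60% GC drifts towards 50%.
    print(gc_trajectory(0.60, u=0.01, v=0.01, generations=1000)[-1])
    # AT-biased mutation (u > v): the equilibrium lies below 50%.
    print(gc_equilibrium(u=0.02, v=0.01))
```

With equal rates the equilibrium is 50%, so a region in a genome with 58 to 63% GC would lose GC content, and an AT bias (u > v) pushes the equilibrium lower still. Both scenarios predict the same direction of change in GC-rich genomes, which is why species with a GC content below 50% would be informative.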
Discussion
Fused MAT loci evolved several times in basidiomycetes
In this study, we analyzed MAT loci from the Trichosporonales, and found that all of the analyzed species harbor fused MAT loci with a single HD gene, an arrangement that among the Tremellomycetes has so far only been found in the pathogenic Cryptococci [17]. Fused MAT loci have been identified previously in other basidiomycetes (Malassezia, Microbotryum, Sporisorium, Ustilago), but these MAT loci carry the ancestral arrangement of two HD genes [7, 17, 19–22, 69, 70]. An analysis of several strains from V. humicola showed that there are (at least) two MAT alleles, and homologs of both alleles are distributed throughout the Trichosporonales phylogenetic tree. This suggests that these alleles might be the most prevalent alleles in this group as the selection of sequenced strains was not based on and therefore most likely not biased towards certain MAT features. Indeed, if the mating system is predominantly inbreeding, rare alleles have no advantage and can be gradually lost by genetic drift. Thus, fusion of MAT loci, which can evolve by selection because it is beneficial in selfing mating systems, is predicted to lead to a reduction in the number of MAT alleles [4, 71]. In addition, theoretical modelling predicts that a combination of facultative and rare sexual reproduction, low mutation rates, and a small effective population size should lead to a reduction in the number of mating types [72]. Population sizes and mutation rates are, however, not known for Trichosporonales species, and no sexual development has been observed so far making it possible that asexual reproduction is the predominant form of propagation for many Trichosporonales.
Convergent evolution of fused MAT loci in Trichosporonales and Tremellales
Overall, our finding of fused MAT loci with a single HD gene in all of the Trichosporonales species investigated is most consistent with the hypothesis that a fusion event of the HD and P/R loci followed by loss of one HD gene per MAT allele occurred independently in the Trichosporonales and the pathogenic Cryptococci lineages. The alternative hypothesis that the fusion of the HD and P/R loci occurred in the common ancestor of the Trichosporonales and pathogenic Cryptococci is less parsimonious as it would imply multiple independent reversions to tetrapolarity in the non-pathogenic Cryptococci. In addition, while several mating-associated genes can be found in the MAT loci of both groups, the majority of genes within the MAT locus of the Trichosporonales is not found in the MAT locus in the Tremellales, consistent with independent fusion events in the two lineages. This hypothesis is also supported by the combination of the SXI1/SXI2 genes and the STE3a/α alleles in the A1 and A2 alleles of the Trichosporonales, which is different from those of the MATa and MATα alleles of the pathogenic Cryptococci. A model is proposed in Fig 5 to explain the current situation in the Tremellomycetes, where recruitment of several mating-associated genes into the P/R loci of a tetrapolar ancestor occurred first as these genes can also be found within the P/R loci of extant Tremellales with unlinked MAT loci [29–31]. After the split of the Tremellales and Trichosporonales lineages, fusion of the HD and P/R loci as well as the loss of one HD gene per MAT allele occurred in the ancestor of the Trichosporonales, whereas within the Tremellales this happened only in the ancestor of the pathogenic Cryptococci.
An independent fusion might suggest that similar selective pressures were acting to result in similar evolutionary trajectories in the Trichosporonales and pathogenic Cryptococci. In the latter group, it has been hypothesized that the pathogenic lifestyle might make it difficult to find a mating partner while associated with a host. In this case, linkage between the two MAT loci is expected to be favored by selection, as this increases the odds of compatibility between the gametes derived from a single diploid zygote [4, 7, 8, 17, 25]. Extant members of the Trichosporonales can be associated with hosts as commensals or pathogens, but this lineage also includes many soil- or water-associated saprobes [36, 37]. It is possible that the ancestor in which MAT fusion occurred was associated with a host, and that the fused MAT loci remained stable during evolution of species with saprobic lifestyles. However, a scarcity of compatible mating partners might also occur under other conditions, e.g. if a population is derived from few progenitor cells that propagated through mitotic cell divisions during favorable conditions and switched to sexual development when nutrients were depleted.
In addition to the fused MAT loci, the presence of only one HD gene per MAT allele in all examined Trichosporonales is similar to pathogenic Cryptococci. This was noted in the initial analysis of genome sequences for two Trichosporonales strains now designated MAT A2 [43], and the current study shows that this is a conserved feature throughout the Trichosporonales. Outside of the Tremellomycetes, the majority of analyzed basidiomycete species harbor at least two HD genes per MAT allele, one HD1 and one HD2 gene, which are not compatible with each other, irrespective of whether the mating system is tetrapolar or bipolar [15, 17]. An exception is the genus Wallemia, where a fused MAT exists with only one HD gene, but until now, only one MAT allele harboring a SXI1 homolog is known in this genus [17, 73]. Functionally, two compatible HD1 and HD2 genes from different MAT alleles are necessary and sufficient for sexual development not only in C. neoformans, where only two genes are present after mating [74, 75], but also in U. maydis and Coprinopsis cinerea, where two (U. maydis) or more (C. cinerea) compatible HD1/HD2 combinations can be present after mating [76–78]. The presence of multiple HD gene paralogs or increased HD allele diversity within a species is advantageous under outcrossing, because it allows more frequent dikaryon viability after mating. However, if a species is predominantly selfing, a single MAT locus with two alleles will provide the highest percentage of compatibility between gametes from a single tetrad [4, 15, 17]. In such cases, the loss of one HD gene per MAT locus should not be a problem from either a functional or evolutionary point of view, and thus this genomic configuration might be observed in other species with fused MAT loci, unless the HD genes have other functions unrelated to mating.
One common feature of fungal MAT loci and sex chromosomes in other eukaryotes is the recruitment of sex-associated genes. The presence of STE11 and STE20 within the core MAT loci of Trichosporonales and Tremellales suggests an ancient linkage of these genes to a MAT locus. In C. neoformans, STE20 is required for the formation of proper heterokaryotic filaments and basidia after mating, and STE11α is required for mating and filament formation [79, 80]. STE20 displays MAT-specific alleles [29, 30] and is also located within the P/R locus of other basidiomycete species, e.g. Leucosporidium scottii and red yeasts in the Pucciniomycotina [16].
Genomic signatures that distinguish the MAT loci from surrounding genomic regions in Trichosporonales
Another common feature of fungal MAT loci and the sex-determining regions in other eukaryotic groups is that recombination is usually suppressed in these regions. In several fungi, suppressed recombination is observed not only within the MAT locus itself, but also spreads outward from the MAT locus along the MAT-carrying chromosome, resulting in so-called evolutionary strata of stepwise recombination suppression, similar to findings in sex chromosomes of mammals and plants. Examples are the ascomycete Neurospora tetrasperma and the basidiomycete genus Microbotryum [7, 9, 25, 28, 81–85]. In the latter, it was recently shown that the linkage of HD and P/R loci and subsequent establishment of regions of suppressed recombination extending beyond the fused MAT loci evolved at least five times independently within the genus [7]. In contrast, the analysis of V. humicola did not yield indications for suppressed recombination beyond the core MAT region that lies between the SXI1/SXI2 and pheromone genes. Furthermore, the gene order within the MAT alleles and the HD/STE3 allele combinations in the analyzed Trichosporonales species are largely conserved, suggesting that the fusion of HD and P/R loci occurred once in a common ancestor. Regions outside of the MAT locus have in general undergone substantial rearrangements (e.g. translocations) in different species, as seen by the varying genomic locations of STE12 in the genus Cutaneotrichosporon and the lack of synteny outside of the MAT region in more distantly related Trichosporonales.
In Microbotryum, it was shown that evolutionary strata ranging from mating-type-specific to species-specific patterns can be distinguished along chromosomes with recently linked MAT loci [7, 25]. These analyses were performed based on alleles from several strains per species and of both mating types. One challenge for the phylogenetic analysis in the Trichosporonales is that most species are currently represented by a single strain. The analysis of additional alleles from more strains of each species will be necessary to fully evaluate the phylogenetic distribution of alleles.
In the Trichosporonales, the two MAT alleles, A1 and A2, differ in general by two inversions. Because such inversions suppress single-crossover recombination in the inverted region, and in this way keep locally adapted alleles linked, they may be under positive selection within the different species. However, if the MAT fusion is relatively ancient, i.e. occurred in the last common ancestor of the Trichosporonales, an open question is why the genetic differences in the two known alleles are not more pronounced. One explanation could be that recombination within the inverted regions might still be possible via non-crossover gene conversion or double crossover [63]. This would be consistent with the pattern observed in V. humicola of lower divergence between A1 and A2 alleles within the inverted regions compared to divergence at the inversion breakpoints. However, the evolutionary consequences of double crossover and gene conversion on genetic divergence within the inverted region likely differ: the former is more likely to transfer a large segment, whereas the latter only allows small segments of DNA to be exchanged. Hence, if an inversion captures multiple locally adapted loci, gene conversion may more easily erase genetic divergence while double crossovers are more likely to result in maladaptive associations. In C. neoformans, gene conversion occurs in a GC-rich intergenic region within the MAT locus and was proposed as a mechanism for maintaining functionality of those genes within the MAT locus that are essential [35]. Gene conversion has also been observed in the regions with suppressed recombination of the mating-type chromosomes of N. tetrasperma, in the mating type locus of the green algae Chlamydomonas reinhardtii, and in sex chromosomes of animals [86–92].
Gene conversion might be associated with an increase in the GC content as it has been observed in a number of eukaryotes that gene conversion can be biased towards GC [93–95], which would not be compatible with the observed lower GC content in the MAT region of Trichosporonales. However, a recent analysis of two fungal species found no evidence of a GC-bias in gene conversion [96], and therefore at present gene conversion as an explanation for low divergence between MAT alleles in Trichosporonales remains possible. It is tempting to speculate that the lower GC content observed in the MAT region of Trichosporonales is connected to the suppression of recombination or the (yet unknown) mechanisms that lead to the low levels of divergence between the MAT alleles. In C. neoformans, two regions of higher GC content outside of the MAT locus are associated with increased recombination [97]. A correlation between higher GC content and increased recombination has been demonstrated in humans, although it is not clear in this case what is cause and what is effect, whereas in the yeast Saccharomyces cerevisiae, a correlation was described at the kilobase range, but no long-range correlation was found [98–100].
Another hypothesis to explain the low levels of divergence between the MAT alleles could be same mating-type mating (“unisexual reproduction”, i.e. sexual reproduction without the need of a partner with a different mating type), which occurs in several pathogenic Cryptococci. Similar to reproduction after fusion of MATa and MATα cells, “unisexual reproduction” also entails diploidization and meiosis including genetic exchange [101–103]. Diploidization can occur without cell fusion, e.g. through endoreplication, or through fusion of two cells carrying the same MAT allele [104]. In the latter case, this would allow recombination within the MAT region, which could prevent degeneration of the MAT locus despite suppressed recombination when paired with a different MAT allele. Recombination within the MAT locus during “unisexual reproduction” was shown for C. neoformans [105]. So far, no sexual cycle has been observed in the Trichosporonales, and therefore it is not known what forms of sexual reproduction occur in this group. This is an important open question that requires detailed investigation in future studies.
Conclusions
In summary, we have shown that all analyzed Trichosporonales species contain fused mating type loci with a single HD gene. The two known MAT alleles differ by two inversions, but each allele is relatively stable across different species, making it more likely that the fusion of the MAT loci occurred in the last common ancestor of the Trichosporonales. Thus, this would be the most ancient fusion of MAT loci observed to date. The apparent evolutionary stability of the alleles A1 and A2 through many speciation events (even though neighboring regions underwent significant recombination) suggests strong selective pressure operates to maintain integrity of the alleles, probably due to retaining the advantages of fused MAT loci with few alleles. Another possible explanation might be that the observed combinations (STE3α and SXI2 in the A1, and STE3a and SXI1 in the A2 allele) have a higher fitness than the alternative combinations. However, the presence of the A1* and A2* alleles in several Trichosporonales species shows that these combinations also occur, making this hypothesis less likely. Mechanistically, a combination of gene conversion during meiosis after fusion of cells with different mating types to account for the limited suppression of recombination, and meiotic recombination by crossing over during “unisexual reproduction” might explain the apparent evolutionary stability of the MAT region while maintaining (at least) two distinct alleles.
Based on the phylogenetic distribution, gene content, and allele combinations in fused MAT loci in pathogenic Cryptococci of the sister order Tremellales, fusion of MAT loci as well as the loss of one HD gene per MAT locus occurred independently in the Trichosporonales and Tremellales. Thus, our data support a model of convergent evolution of the MAT locus at the molecular level in two different Tremellomycetes orders, similar to patterns observed in other fungal groups [1, 7, 21, 22]. Future analyses of additional Tremellomycetes groups that have not yet been analyzed with respect to their MAT loci will allow tests of the hypothesis that this evolutionary route has been repeated independently in other cases. It will be especially interesting to analyze earlier-branching sister lineages to the Tremellales and Trichosporonales with respect to their MAT loci configurations and the number of HD genes per MAT locus. Other important questions are the frequency and mechanisms of recombination in the region between the fused HD and P/R loci. These questions can be addressed once Trichosporonales species have been identified that undergo sexual reproduction in the laboratory, so that the progeny of genetic crosses can be analyzed.
Materials and Methods
Strains and growth conditions
C. oleaginosum and V. humicola strains used in this study are given in Table 2. For preparation of nucleic acids, strains were grown at 25°C on solid YPD medium or in liquid YPD culture with shaking (250 rpm).
CHEF (contour-clamped homogeneous electric field) electrophoresis analysis of the C. oleaginosum isolates
CHEF plugs of the C. oleaginosum isolates were prepared and electrophoresis was carried out as described in previous studies with slight modifications [29, 31]. Specifically, the plugs were run using two different sets of switching times to improve resolution of the larger (S4A Fig top, 20–30 minutes linear ramp switching time) and smaller (S4A Fig bottom, 120–360 seconds linear ramp switching time) chromosomes, respectively.
Genotyping and analysis of mating type genes in V. humicola strains
A collection of V. humicola isolates was genetically screened by modified RAPD using the primer 5’-CGTGCAAGGGAGCACC-3’ with an annealing temperature of 48°C (S1A Fig). The V. humicola strains showing a genotypic profile identical to that of the type strain CBS571 (Table 2) were further analyzed for the presence of the SXI1, SXI2, STE3, and MYO2 genes using PCR with the oligonucleotides given in S3 Table. PCR fragments were either sequenced directly or cloned into pDrive (Qiagen, Hilden, Germany) and sequenced.
Genome sequencing, assembly, and annotation of V. humicola CBS4282 and C. oleaginosum ATCC20508
Genomic DNA samples were extracted using a modified CTAB protocol as previously reported [31, 106]. Specifically, to enrich for high-molecular-weight DNA, the genomic DNA was picked out of the solution after precipitation instead of being spun down, and the samples were checked by CHEF for size and integrity, following the manufacturer’s protocol (BioRad, Hercules, CA, USA). Sequencing of the ATCC20508 and CBS4282 genomes was carried out at the Sequencing and Genomic Technologies Core Facility of the Duke Center for Genomic and Computational Biology, using a large-insert library (15-20 kb) and PacBio Sequel (2.0 chemistry) for ATCC20508, and generating 151 nt paired-end Illumina reads for CBS4282. The PacBio sequence reads for C. oleaginosum ATCC20508 were assembled using the HGAP4 assembly pipeline based on the Falcon assembler [107] included in the SMRT Link v5.0.1 software by Pacific Biosciences. The Illumina sequence reads of V. humicola CBS4282 were trimmed using Trimmomatic (v0.36) [108] with the following parameters to remove adapter contamination: ILLUMINACLIP:TruSeq3-PE-2.fa:2:30:10:1:TRUE LEADING:3 TRAILING:3 SLIDINGWINDOW:4:15 MINLEN:40. Trimmed reads were assembled with SPAdes (v3.11.1) [109], and contig sequences were improved using Pilon [110] based on the Illumina reads mapped to the assembly using Bowtie2 (v2.2.6) [111]. k-mer frequencies were analyzed based on the CBS4282 Illumina reads as described previously [112, 113]. Gene models for the newly sequenced strains, as well as for strains for which genome assemblies but no annotations were available from GenBank, were predicted ab initio using MAKER (v2.31.18) [114] with predicted proteins from C. oleaginosum as input [43].
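To illustrate what a k-mer frequency analysis of sequencing reads involves, the following minimal sketch counts k-mers in a set of reads and summarizes them as a frequency spectrum (how many distinct k-mers occur once, twice, and so on). It is not the pipeline of references [112, 113]: the function name `kmer_spectrum` is hypothetical, and, as a simplifying assumption, k-mers and their reverse complements are counted separately rather than merged into canonical k-mers.

```python
from collections import Counter


def kmer_spectrum(reads, k):
    """Return a Counter mapping multiplicity -> number of distinct k-mers.

    reads: iterable of DNA strings.
    k-mers containing N are skipped. Reverse complements are NOT merged
    (simplifying assumption for this sketch).
    """
    counts = Counter()
    for read in reads:
        seq = read.upper()
        for i in range(len(seq) - k + 1):
            kmer = seq[i:i + k]
            if "N" not in kmer:
                counts[kmer] += 1
    # Collapse per-k-mer counts into the frequency spectrum.
    return Counter(counts.values())
```

In a spectrum computed from real reads, peaks at multiples of the main coverage depth are commonly used to judge ploidy, heterozygosity, or repeat content; dedicated tools are far more memory-efficient than this dictionary-based version.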
Analysis of mating type regions, synonymous divergence, and repeat content
MAT regions in Trichosporonales genomes were identified by BLAST searches [115] against the well-annotated MAT-derived proteins from C. neoformans [34], and manually reannotated if necessary. The short and not well conserved pheromone precursor genes were not among the predicted genes, and were identified within the genome assemblies using custom-made Perl scripts searching for the consensus sequence M-X(15-60)-C-[ILMVST]-[ILMVST]-X-Stop.
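The consensus search described above can be sketched as a regular-expression scan over six-frame translations. The study used custom Perl scripts that are not reproduced here; the Python version below is an illustrative assumption, and the names `find_pheromone_candidates`, `translate`, and `revcomp` are hypothetical. "X" in the consensus is interpreted as any residue other than a stop codon.

```python
import re
from itertools import product

# Standard genetic code, codons enumerated in TCAG order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): a for c, a in zip(product(BASES, repeat=3), AMINO)}

# M-X(15-60)-C-[ILMVST]-[ILMVST]-X-Stop; the lookahead reports overlapping
# candidates that start at different methionines.
PATTERN = re.compile(r"(?=(M[^*]{15,60}C[ILMVST][ILMVST][^*]\*))")


def revcomp(seq):
    """Reverse complement of an upper-case DNA string."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def translate(seq):
    """Translate in frame 0; codons with ambiguous bases become 'X'."""
    return "".join(CODON_TABLE.get(seq[i:i + 3], "X")
                   for i in range(0, len(seq) - 2, 3))


def find_pheromone_candidates(dna):
    """Return (strand, frame, peptide) tuples matching the consensus."""
    dna = dna.upper()
    hits = []
    for strand, seq in (("+", dna), ("-", revcomp(dna))):
        for frame in range(3):
            protein = translate(seq[frame:])
            for match in PATTERN.finditer(protein):
                hits.append((strand, frame, match.group(1)))
    return hits
```

Because the pattern is short and degenerate, a scan like this is expected to return many false positives genome-wide; restricting the search to the MAT region and inspecting candidates manually would be a sensible follow-up.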
Synteny analysis of the genomes of V. humicola strains and C. oleaginosum strains was done with nucmer from the MUMmer package (v3.23) [116]. Synteny plots were drawn with Circos [117]. Synteny between MAT regions of different Trichosporonales species was based on bidirectional BLAST analyses of the corresponding predicted proteins. For the dS plots, alleles in V. humicola strains were identified by two-directional BLAST analysis, and MUSCLE (v3.8.31) [118] was used to align the two alleles per gene per strain pair. Synonymous divergence and standard errors were estimated with the yn00 program of the PAML package (v4.9) [119]. Analysis of transposable elements and other repeats in V. humicola genomes was performed with RepeatMasker (Smit AFA, Hubley R, Green P. RepeatMasker Open-4.0. 2013-2015, http://www.repeatmasker.org) based on the RepbaseUpdate library [120] and a library of de novo-identified repeat consensus sequences for each strain that was generated by RepeatModeler (Smit AFA, Hubley R. RepeatModeler Open-1.0. 2008-2015, http://www.repeatmasker.org/RepeatModeler/) as described [113].
Linear synteny comparison along the MAT-containing scaffolds (S10 Fig) was generated with Easyfig [121] using a minimum length of 500 bp for BLASTN hits to be drawn.
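The bidirectional BLAST analyses used above to pair proteins between MAT regions, and alleles between strains, follow the reciprocal-best-hit idea. A minimal sketch is given below under stated assumptions: BLAST results are assumed to have been parsed already into `(query, subject, bitscore)` tuples, bit score is assumed as the ranking criterion, and the function names are hypothetical.

```python
def best_hits(hits):
    """Map each query to its highest-scoring subject.

    hits: iterable of (query, subject, bitscore) tuples.
    On ties, the first hit encountered is kept.
    """
    best = {}
    for query, subject, bitscore in hits:
        if query not in best or bitscore > best[query][1]:
            best[query] = (subject, bitscore)
    return {query: subject for query, (subject, _) in best.items()}


def reciprocal_best_hits(a_vs_b, b_vs_a):
    """Return sorted (a, b) pairs that are each other's best hit."""
    forward = best_hits(a_vs_b)
    reverse = best_hits(b_vs_a)
    return sorted((a, b) for a, b in forward.items() if reverse.get(b) == a)
```

For example, if protein `a1` hits `b1` best and `b1` hits `a1` best, the pair is retained, whereas a protein whose best hit prefers a different partner is dropped.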
Species tree and gene genealogies
Of the 24 Trichosporonales species for which genome assemblies were available, we selected only 21 for phylogenetic analysis, as the remaining three species (Trichosporon coremiiforme, Trichosporon ovoides and Cutaneotrichosporon mucoides) were shown to be hybrids in previous studies [49, 50]. Additionally, we selected other well-studied bipolar and tetrapolar representatives belonging to the three major Basidiomycota lineages, plus two ascomycetes as an outgroup. To reconstruct the phylogenetic relationships among the selected members, the translated gene models of each species were clustered by a combination of the bidirectional best-hit (BDBH), COGtriangles (v2.1), and OrthoMCL (v1.4) algorithms implemented in the GET_HOMOLOGUES software package [122] to construct homologous gene families. The Cryptococcus neoformans H99 protein set was used as reference, and clusters containing inparalogs (i.e. recent paralogs, defined as sequences with best hits in their own genome) were excluded. A consensus set of 32 protein sequences was computed from the intersection of the orthologous gene families obtained by the three clustering algorithms. Protein sequences were individually aligned with MAFFT v7.310 [123] using the L-INS-i strategy, and poorly aligned regions were trimmed with TrimAl (-gappyout). The resulting alignments were concatenated to obtain a final supermatrix consisting of a total of 21,690 amino acid sites. We inferred a maximum-likelihood phylogeny using the LG+F+R5 model of amino acid substitution in IQ-TREE v1.6.5 [124]. Branch support values were obtained from 10,000 replicates of both ultrafast bootstrap approximation (UFBoot) [125] and the nonparametric variant of the approximate likelihood ratio test (SH-aLRT) [126].
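The concatenation of trimmed single-protein alignments into a supermatrix can be sketched as follows. This is an illustrative assumption rather than the script used in this study: alignments are assumed to be held in memory as dictionaries mapping taxon name to aligned sequence, and the function name `concatenate_alignments` is hypothetical. Taxa missing from an alignment are padded with gaps, which should not occur for a consensus set of orthologs present in all species but makes the sketch more general.

```python
def concatenate_alignments(alignments):
    """Concatenate per-gene alignments into one supermatrix.

    alignments: list of dicts {taxon: aligned_sequence}; all sequences
    within one dict must have the same length.
    Returns a dict {taxon: concatenated_sequence}.
    """
    taxa = sorted(set().union(*(aln.keys() for aln in alignments)))
    parts = {taxon: [] for taxon in taxa}
    for aln in alignments:
        lengths = {len(seq) for seq in aln.values()}
        if len(lengths) != 1:
            raise ValueError("sequences within one alignment differ in length")
        width = lengths.pop()
        for taxon in taxa:
            # Pad with gaps if this taxon lacks the gene (assumed behavior).
            parts[taxon].append(aln.get(taxon, "-" * width))
    return {taxon: "".join(chunks) for taxon, chunks in parts.items()}
```

The total length of each concatenated sequence equals the sum of the individual alignment widths, which is how a figure such as the number of amino acid sites in a supermatrix arises.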
For phylogenetic analysis of selected genes within and outside the MAT region in Trichosporonales, protein alignments were generated and trimmed as above and subsequently used to infer maximum likelihood phylogenies in IQ-TREE. Consensus trees were graphically visualized with iTOL v4.3.3 [127].
Divergence time analysis
Divergence times were estimated in MEGA-X [128] using the RelTime approach [129]. In contrast to other methods, RelTime does not require assuming a specific model for lineage rate variation and was shown to be as accurate as other approaches using relaxed and strict molecular clock models [130]. The reconstructed species tree, with branch lengths in units of substitutions per site, was used as input and transformed into an ultrametric tree with relative times. The final timetree was obtained by converting the relative node ages into absolute dates using three calibration constraints: 0.42 million years (MY) corresponding to the divergence between Microbotryum lychnidis-dioicae and Microbotryum silenes-dioicae [131]; 41 MY for the Ustilago - Sporisorium split; and 413 MY representing the minimum age of Basidiomycota. The latter two calibration points were obtained from the Timetree website (http://www.timetree.org/), which should be referred to for additional information and references.
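The conversion of relative node ages into absolute dates can be illustrated for the simplest case of a single calibrated node, where it reduces to proportional scaling. The sketch below is a deliberate simplification and not the RelTime implementation in MEGA-X, which handles several constraints (including minimum and maximum bounds) jointly. The function name `calibrate` and the node labels and relative ages in the usage note are hypothetical.

```python
def calibrate(relative_ages, calibrated_node, absolute_age):
    """Scale relative node ages so that one node matches a known age.

    relative_ages: dict {node: relative age on the ultrametric tree}.
    calibrated_node: key of the node with a known absolute age.
    absolute_age: age of that node, e.g. in million years (MY).
    """
    reference = relative_ages[calibrated_node]
    if reference <= 0:
        raise ValueError("calibrated node must have a positive relative age")
    scale = absolute_age / reference
    return {node: age * scale for node, age in relative_ages.items()}
```

For instance, with hypothetical relative ages of 1.0 for a root node and 0.25 for a calibrated split, fixing the split at 41 MY gives a scale factor of 164 and a root age of 164 MY. With more than one calibration, the constraints generally cannot all be satisfied by a single scale factor, which is why dedicated methods are needed.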
Data availability statement
The V. humicola CBS4282 genome sequence (BioProject PRJNA475686) has been deposited at DDBJ/ENA/GenBank under the accession QKWK00000000. The version described here is version QKWK01000000. Illumina reads have been deposited in the NCBI SRA database under accession SRP150316. The C. oleaginosum ATCC20508 genome sequence (BioProject PRJNA475739) has been deposited at DDBJ/ENA/GenBank under the accession QKWL00000000. The version described here is version QKWL01000000. Pacific Biosciences reads have been deposited in the NCBI SRA database under accession SRP150334.
Acknowledgements
MN would like to thank Swenja Ellßel and Silke Nimtz for excellent technical assistance, and Ulrich Kück and Christopher Grefen for support at the Botany Department of the Ruhr-University Bochum.