Abstract
Allopolyploidization, genome duplication through interspecific hybridization, is an important evolutionary mechanism that can enable organisms to adapt to environmental changes or stresses. The increased adaptive potential of allopolyploids can be particularly relevant for plant pathogens in their ongoing quest for host immune response evasion. To this end, plant pathogens secrete a plethora of molecules that enable host colonization. Allodiploidization has resulted in the new plant pathogen Verticillium longisporum that infects different hosts than haploid Verticillium species. To reveal the impact of allodiploidization on plant pathogen evolution, we studied the genome and transcriptome dynamics of V. longisporum using next-generation sequencing. V. longisporum genome evolution is characterized by extensive chromosomal rearrangements, between as well as within parental chromosome sets, leading to a mosaic genome structure. In comparison to haploid Verticillium species, V. longisporum genes display stronger signs of positive selection. The expression patterns of the two sub-genomes show remarkable resemblance, suggesting that the parental gene expression patterns homogenized upon hybridization. Moreover, whereas V. longisporum genes encoding secreted proteins frequently display differential expression between the parental sub-genomes in culture medium, expression patterns homogenize upon plant colonization. Collectively, our results illustrate the adaptive potential of allodiploidy mediated by the interplay of two sub-genomes.
Author summary
Hybridization followed by whole-genome duplication, so-called allopolyploidization, provides genomic flexibility that is beneficial for survival under stressful conditions or invasiveness into new habitats. Allopolyploidization has mainly been studied in plants, but also occurs in other organisms, including fungi. Verticillium longisporum, an emerging fungal pathogen on brassicaceous plants, arose by allodiploidization between two Verticillium spp. We used comparative genomics to reveal the plastic nature of the V. longisporum genomes, showing that parental chromosome sets recombined extensively, resulting in a mosaic genome pattern. Furthermore, we show that non-synonymous substitutions frequently occurred in V. longisporum. Moreover, we reveal that expression patterns of genes encoding secreted proteins homogenized between the V. longisporum sub-genomes upon plant colonization. In conclusion, our results illustrate the large adaptive potential upon genome hybridization for fungi mediated by genomic plasticity and interaction between sub-genomes.
Introduction
Cycles of polyploidization are hallmarks of eukaryotic genome evolution, where an initial increase of ploidy is commonly followed by reversions to original ploidy states [1]. For instance, all angiosperm plants share two rounds of ancient polyploidy [2]. The prevalence of polyploidy events in eukaryotic evolution is likely due to the high evolutionary potential of polyploids, as additional chromosome sets give leeway to functional diversification [3]. Consequently, polyploidy events are often followed by adaptive radiation and are dated near the base of species-rich clades in phylogenies [4,5]. In addition, polyploidy has been associated with increased invasiveness [6] and resistance to environmental stresses [7]. For instance, numerous plant species that survived the Cretaceous-Palaeogene mass extinction 66 million years ago underwent a polyploidization event which is thought to have contributed to their increased survival rates [8,9]. Polyploids may originate from the same species, i.e. autopolyploidization, or from different species as a result of interspecific hybridization, i.e. allopolyploidization. In general, allopolyploids are believed to have a higher adaptive potential than autopolyploids due to the combination of novel or diverged genes from distinct parental species [3].
The impact of allopolyploidization has mainly been investigated in plants, as approximately a tenth of all plant species consists of allopolyploids [10]. In contrast, allopolyploidization in fungi is far less intensively investigated [11]. Nonetheless, allopolyploidization impacted the evolution of numerous fungal species, including the economically important baker’s yeast Saccharomyces cerevisiae [12]. The increased adaptive potential enabled allopolyploid fungi to develop desirable traits that can be exploited in industrial bioprocessing [13]. For instance, at least two recent hybridization events between S. cerevisiae and its close relative Saccharomyces eubayanus gave rise to Saccharomyces pastorianus, a species with high cold tolerance and good maltose/maltotriose utilization capabilities, which is exploited in the production of lager beer that requires barley to be malted at low temperatures [14].
Allopolyploid genomes experience a so-called “genome shock” upon hybridization, inciting major genomic reorganizations that can manifest by genome rearrangements, extensive gene loss, transposon activation, and alterations in gene expression [15]. These early stage alterations are primordial for hybrid survival, as divergent evolution is principally associated with incompatibilities between the parental genomes [16]. Additionally, these initial re-organizations and further alterations in the aftermath of hybridization also provide a source for environmental adaptation [3]. Frequently, heterozygosity is lost for many regions in the allopolyploid genome [17]. This can be a result of the direct loss of a homolog of different parental origin (i.e. a homeolog) through deletion or gene conversion whereby one of the copies substitutes its homeologous counterpart [18]. Gene conversion and the homogenization of complete chromosomes played a pivotal role in the evolution of the osmotolerant yeast species Pichia sorbitophila [19]. In total, two of its seven chromosome pairs consist of partly heterozygous, partly homozygous sections, whereas two chromosome pairs are completely homozygous. Gene conversion may eventually result in chromosomes consisting of sections of both parental origins as “mosaic genomes” [20]. However, mosaic genomes can also arise through recombination between chromosomes of the different parents, such as in the hybrid yeast Zygosaccharomyces parabailii [21].
The redundancy of having one or more additional homeolog copies for most genes facilitates functional diversification in allopolyploids [22]. Consequently, accelerated gene evolution is generally observed upon allopolyploidization [22,23]. Nevertheless, recent allopolyploidization events in the fungal genus Trichosporon resulted in a general deceleration of gene evolution [24]. Thus, allopolyploidization is not always followed by accelerated gene evolution. Arguably, the environmental exposure upon allopolyploidization plays a pivotal role in the eventual speed and degree of gene diversification [25].
Allopolyploidization typically entails gene expression pattern alterations when parental genomes that evolved distinct transcriptional regulation merge [26]. Moreover, crosstalk between parental sub-genomes can lead to further gene expression alterations, leading to the emergence of novel expression patterns [27–29]. Nevertheless, in general, expression patterns are often conservatively inherited upon hybridization as the majority of allopolyploid genes are expressed similarly to their parental orthologs [30]. For instance, more than half of the genes in an allopolyploid strain of the fungal grass endophyte Epichloë retained their parental gene expression level [31].
Plant pathogens are often thought to evolve while being engaged in arms races with their hosts, with pathogens evolving to evade host immunity while plant hosts attempt to intercept pathogen ingress [32,33]. Due to the increased adaptation potential, allopolyploidization has been proposed as a potent driver in pathogen evolution [34]. Allopolyploids often have different pathogenic traits than their parental lineages, such as higher virulence [35,36] and altered host ranges [37,38]. Within the fungal genus Verticillium, allodiploidization resulted in the emergence of the species Verticillium longisporum [37,39]. V. longisporum infects brassicaceous plants, whereas other Verticillium spp. generally do not colonize plants of this family, with the exception of Arabidopsis thaliana [37,40–42]. Similar to haploid Verticillium spp., V. longisporum is thought to reproduce predominantly asexually, as a sexual cycle has never been described and populations are not outcrossing [39,43]. V. longisporum is sub-divided into three lineages, each representing a separate hybridization event [37]. The economically most important lineage A1/D1 originates from hybridization between Verticillium species A1 and D1 that have hitherto not been found in their haploid states. V. longisporum lineage A1/D1 is the main causal agent of Verticillium stem striping on oilseed rape and a worldwide emerging pathogen [44,45]. Lineage A1/D1 can be further divided into two genetically distinct populations, which have been named ‘A1/D1 West’ and ‘A1/D1 East’ after their relative geographic occurrence in Europe [39]. Nevertheless, both populations originate from the same hybridization event [39]. Conceivably, upon or subsequent to the hybridization event, V. longisporum encountered extensive genetic and transcriptomic alterations that might have facilitated a shift towards brassicaceous hosts. Here, we studied the impact of allodiploidization on the evolution of V. longisporum by investigating genome, gene and transcriptomic plasticity.
Results
V. longisporum displays a mosaic genome structure
The genomes of two V. longisporum strains were analysed to investigate the impact of hybridization on the genome structure. Previously, V. longisporum strains VLB2 and VL20, belonging to ‘A1/D1 West’ and ‘A1/D1 East’, were sequenced with the PacBio RSII platform and assembled into genomes of 72.9 and 72.3 Mb in size, respectively [39]. These genome sizes are approximately double those of the telomere-to-telomere sequenced V. dahliae strains JR2 (36.2 Mb) and VdLs17 (36.0 Mb) [46]. We used RepeatModeler (V1.0.8; Smit et al. 2015) in combination with RepeatMasker to determine that 14.28 and 13.90% of the V. longisporum strain VLB2 and VL20 genomes are composed of repeats, respectively (S1 Table). Intriguingly, this is more than double the repeat content of V. dahliae strain JR2, for which 6.49% of the genome was annotated as repeat using the same methodology. The V. longisporum genomes were also screened for the fungal telomere-specific repeats (TAACCC/GGGTTA) to estimate the number of chromosomes. In total, 29 and 30 telomeric regions were found in the VLB2 and VL20 genomes, respectively, that were consistently situated at the end of sequence contigs, suggesting that V. longisporum contains at least 15 chromosomes (S1 Table). Six out of 45 and 4 out of 44 sequence contigs in strains VLB2 and VL20, respectively, were flanked on both ends by telomeric repeats and therefore likely represent complete chromosomes (S1 Table). For comparison, V. dahliae strains have 8 chromosomes [46].
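The telomere screen described above can be sketched as follows. This is a minimal illustration, not the pipeline used in the study: the function names, the 100 bp window, the minimum of three tandem repeats and the toy contigs are all assumptions made for the example. It counts contig ends that carry tandem TAACCC (start) or GGGTTA (end) repeats and derives a lower bound on the chromosome number as half the number of telomeric ends, rounded up.

```python
def has_tandem(seq, motif, min_repeats):
    """True if `seq` contains at least `min_repeats` consecutive copies of `motif`."""
    return motif * min_repeats in seq.upper()


def telomeric_ends(contigs, window=100, min_repeats=3):
    """Map contig name -> (telomere at start, telomere at end).

    `window` and `min_repeats` are illustrative choices, not values from the study.
    """
    result = {}
    for name, seq in contigs.items():
        at_start = has_tandem(seq[:window], "TAACCC", min_repeats)
        at_end = has_tandem(seq[-window:], "GGGTTA", min_repeats)
        result[name] = (at_start, at_end)
    return result


# Toy contigs (hypothetical sequences, for illustration only).
contigs = {
    "contig_both": "TAACCC" * 5 + "ACGTTGCA" * 20 + "GGGTTA" * 5,
    "contig_one": "ACGTTGCA" * 20 + "GGGTTA" * 4,
    "contig_none": "ACGTTGCA" * 25,
}

ends = telomeric_ends(contigs)
n_ends = sum(start + end for start, end in ends.values())
complete = [name for name, (start, end) in ends.items() if start and end]
# Every chromosome has two telomeres, so n_ends telomeric ends imply
# at least ceil(n_ends / 2) chromosomes.
min_chromosomes = -(-n_ends // 2)
print(n_ends, complete, min_chromosomes)
```

Applied to the counts reported above, the same rounding gives at least 15 chromosomes for both 29 and 30 telomeric regions.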
In allodiploid organisms, determining parental origin is essential to investigate genome evolution in the aftermath of hybridization. As species D1 is phylogenetically more closely related to V. dahliae than species A1, and consequently has a higher sequence identity to it, V. longisporum genomic regions were previously provisionally assigned to either species D1 or A1 [39]. Here, we determined the parental origin of V. longisporum genomic regions more precisely. Because species A1 and D1 differ in their phylogenetic distance to V. dahliae, V. longisporum genome alignments to V. dahliae displayed a bimodal distribution with one peak at 93.1% and another peak at 98.4% sequence identity, representing the two parents, with a dip at 96.0% (S1 Fig). In order to separate the two sub-genomes, regions with an average sequence identity to V. dahliae of <96% were assigned to species A1, whereas regions with an identity of ≥96% were assigned to species D1 (Fig 1). In this manner, 36.2 Mb of V. longisporum strain VLB2 was assigned to species A1 and 35.7 Mb to species D1. For V. longisporum strain VL20, 36.3 Mb was assigned to species A1 and 35.2 Mb to species D1. Only 1.0 and 0.8 Mb of strains VLB2 and VL20, respectively, could not be aligned to V. dahliae and thus remained unassigned.
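The assignment rule above amounts to a single threshold on sequence identity. The sketch below shows that rule on toy data; the function names and the example regions are hypothetical, and only the 96% cut-off and the A1/D1/unassigned categories come from the text.

```python
def assign_subgenome(identity, threshold=96.0):
    """Assign a region to a parental sub-genome from its average identity (%) to V. dahliae.

    `identity` is None for regions that could not be aligned.
    """
    if identity is None:
        return "unassigned"
    return "D1" if identity >= threshold else "A1"


def summarize(regions, threshold=96.0):
    """Sum region lengths (bp) per assigned sub-genome."""
    totals = {"A1": 0, "D1": 0, "unassigned": 0}
    for length, identity in regions:
        totals[assign_subgenome(identity, threshold)] += length
    return totals


# Toy regions as (length in bp, % identity to V. dahliae); values are illustrative.
regions = [
    (120_000, 93.1),
    (80_000, 98.4),
    (50_000, 96.0),   # exactly at the threshold: assigned to D1 (>= 96%)
    (10_000, None),   # no alignment: unassigned
    (40_000, 95.9),
]
totals = summarize(regions)
print(totals)
```

A hard threshold works here because the identity distribution is bimodal with well-separated peaks; regions close to 96% are the ones most at risk of misassignment.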
To trace the chromosome sets of the original parents of the hybrid, the parental origin of individual contigs was determined. In total, 8 of the 10 largest contigs of V. longisporum strain VLB2 as well as strain VL20 consist of regions originating from both species A1 and species D1 (Fig 1). Thus, parental chromosome sets cannot be separated from one another as V. longisporum apparently evolved a mosaic genome structure in the aftermath of hybridization.
Genomic rearrangements are responsible for the mosaic genome
Typically, a mosaic structure of a hybrid genome can originate from gene conversion or from chromosomal rearrangements between DNA strands of different parental origin [17]. To analyse the extent of gene conversion, genes were predicted for the V. longisporum strains VLB2 and VL20. To aid gene annotation with the BRAKER1 1.9 pipeline [47], ∼2 Gb of filtered RNA-seq reads were generated from fungal cultures in liquid medium. In total, 19,123 and 18,784 genes were predicted for V. longisporum strains VLB2 and VL20, respectively, which is ∼90% higher than the number of genes that were predicted for V. dahliae strain JR2 using the same approach (9,909 genes) (S1 Table). As expected, the divergence of species A1 and D1 could also be observed at the gene level based on sequence identity and GC-content (Fig 1; S1 Fig). In total, 9,531 and 9,402 genes were assigned to the species A1 sub-genome of the strains VLB2 and VL20, respectively, whereas the number of genes in the species D1 sub-genomes was 9,468 and 9,243 for these strains, respectively (S1 Table). Thus, the number of genes is similar in the two sub-genomes for both V. longisporum strains and similar to the gene number identified in V. dahliae strain JR2. Over 80% of the V. longisporum genes are present in two copies, whereas almost all genes (97-98%) are present in one copy within each of the V. longisporum sub-genomes (S2 Fig). Moreover, of the 7,620 genes that are present in two copies in VLB2 and VL20, only 5 genes were found to be highly similar (<1% nucleotide sequence diversity) in VLB2, whereas the corresponding gene pair in VL20 was more diverse (>1%) (Fig 2A). In V. longisporum strain VL20, no highly similar copies were found that are more divergent in VLB2. Collectively, these findings indicate that the two copies of most genes present in V. longisporum are homeologs and that gene conversion only played a minor role during evolution of the mosaic genome.
Considering that gene conversion played a minor role during genome evolution, the mosaic genome structure of V. longisporum likely originated from rearrangements between homeologous chromosomes. To identify the location of genomic rearrangements, the genome of V. longisporum strain VLB2 was aligned to that of strain VL20 (Fig 2B). Extensive chromosomal rearrangements occurred between the two V. longisporum strains, as we observed 87 putative syntenic breaks. In order to confirm these breaks, long sequencing reads of VLB2 were aligned to the VL20 genome assembly to assess if synteny breaks were supported by read mapping (S3 Fig). In total, 60 synteny breaks could be confirmed by read mapping. As genomic rearrangements are often associated with repeat-rich genome regions [48], the synteny break points were tested for their association to these regions. In total, 34 of the 60 (57%) confirmed synteny break points were flanked by repeats, which is significantly more than expected from random sampling (mean = 18.5%, σ = 0.05%) (S4 Fig). In conclusion, it appears that chromosomal rearrangement, rather than gene conversion, is the main driver underlying the mosaic structure of the V. longisporum genome.
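The comparison with random sampling can be illustrated with a simple permutation test. This is a sketch under stated assumptions rather than the procedure used in the study: the genome is simplified to a single linear coordinate, and the flank size, number of permutations, repeat intervals and break positions are all hypothetical.

```python
import random


def near_repeat(pos, repeats, flank):
    """True if `pos` lies within `flank` bp of any repeat interval (start, end)."""
    return any(start - flank <= pos <= end + flank for start, end in repeats)


def flanked_fraction(positions, repeats, flank):
    """Fraction of positions that are flanked by a repeat."""
    return sum(near_repeat(p, repeats, flank) for p in positions) / len(positions)


def permutation_test(breaks, repeats, genome_length, flank=500, n_perm=1000, seed=1):
    """Compare the observed flanked fraction with randomly placed break points."""
    rng = random.Random(seed)
    observed = flanked_fraction(breaks, repeats, flank)
    null = []
    for _ in range(n_perm):
        sample = [rng.randrange(genome_length) for _ in breaks]
        null.append(flanked_fraction(sample, repeats, flank))
    # One-sided empirical P-value with the usual +1 correction.
    exceed = sum(f >= observed for f in null)
    p_value = (exceed + 1) / (n_perm + 1)
    return observed, sum(null) / n_perm, p_value


# Toy data (hypothetical coordinates on a 100 kb genome).
repeats = [(1_000, 2_000), (50_000, 51_000)]
breaks = [900, 1_500, 2_300, 49_800, 50_500, 51_200, 10_000, 30_000, 70_000, 90_000]
observed, null_mean, p_value = permutation_test(breaks, repeats, genome_length=100_000)
print(observed, null_mean, p_value)
```

In the toy data, six of ten break points fall near a repeat although repeats and their flanks cover only a few percent of the genome, so the empirical P-value is small.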
V. longisporum lost heterozygosity through deletions
In each of the V. longisporum isolates, 17% of the genes occur only in a single copy. Although gene conversion played a minor role in the aftermath of hybridization, loss of heterozygosity may occur through gene loss or, alternatively, single-copy genes may originate from parent-specific contributions. However, as 12% of the single-copy genes in strain VLB2 are present in two copies in strain VL20, and 16% of the single-copy genes in VL20 are present in two copies in VLB2, gene deletion seems to be an on-going process in V. longisporum evolution since both strains are derived from the same hybridization event [39]. Of the genes that were lost in the VLB2 divergence from VL20, 52% resided in the species A1 sub-genome, whereas 47% resided in the D1 sub-genome (1% remained unassigned). For VL20, 49% and 50% of these lost genes resided in the A1 and D1 sub-genome, respectively. Thus, gene loss occurs evenly across the two sub-genomes. We next determined the fraction of lost genes that encode secreted proteins as pathogen secretomes play pivotal roles in establishing symbioses with plant hosts [49]. In total, 12.9% and 8.7% of the genes that were lost in the divergence of VLB2 and VL20, respectively, encode secreted proteins. This is a significant enrichment for strain VLB2, where genome-wide 8.9% of the genes encode secreted proteins (Fisher’s exact test, P = 0.009) (S1 Table). Nevertheless, this enrichment is not found for strain VL20 of which 9.0% of the genes encode secreted proteins (Fisher’s exact test, P = 0.45) (S1 Table). Nonetheless, in general, V. longisporum strains VLB2 and VL20 contain 1.91 and 1.90 times the number of genes encoding secreted proteins compared to V. dahliae JR2 with similar contributions of the species A1 and D1 sub-genomes (S1 Table). This indicates that, although gene loss occurs, it hitherto impacted the secretome of V. longisporum only to a limited extent.
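For readers unfamiliar with the enrichment test used above, the following is a minimal standard-library sketch of a two-sided Fisher's exact test on a 2×2 table (for example, lost versus retained genes against secreted versus non-secreted). The function name is hypothetical, and the example table is a small textbook-style case, not data from this study, since the underlying gene counts are not given in the text.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]].

    Sums the hypergeometric probabilities of all tables with the same margins
    that are no more likely than the observed table.
    """
    row1, col1, n = a + b, a + c, a + b + c + d

    def prob(x):
        # Probability that the top-left cell equals x given fixed margins.
        return comb(col1, x) * comb(n - col1, row1 - x) / comb(n, row1)

    p_obs = prob(a)
    lo = max(0, row1 - (n - col1))
    hi = min(row1, col1)
    # Small relative tolerance guards against floating-point ties.
    return sum(p for p in (prob(x) for x in range(lo, hi + 1)) if p <= p_obs * (1 + 1e-9))


# Illustrative table only: rows = lost / retained, columns = secreted / non-secreted.
p = fisher_exact_two_sided(8, 2, 1, 5)
print(round(p, 6))
```

In practice the same test is available in standard statistics packages; the sketch only makes explicit what the reported P-values measure.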
Global acceleration of gene evolution upon allodiploidization
To investigate the evolution of genes subsequent to the allodiploidization event, we determined their rates of non-synonymous (Ka) and synonymous (Ks) substitutions. Substitution rates were determined for so-called best-reciprocal orthologs, which are genes that are present in a single copy in all Verticillium species (Fig 3A); thus, sub-genomes A1 and D1 of V. longisporum were considered separately. In total, 5,342 and 5,369 orthologous groups could be constructed using V. longisporum strain VLB2 and VL20, respectively. Subsequently, for every orthologous group, Ka/Ks ratios were determined for every branch of the Verticillium phylogeny leading to an extant species and compared with the Ka/Ks ratio obtained for the V. dahliae branch. In general, genes in clade Flavexudans spp. displayed higher Ka/Ks ratios than clade Flavnonexudans spp. (Fig 3; S5 Fig). V. nubilum and V. albo-atrum genes displayed the lowest Ka/Ks ratios of all Verticillium spp., which correlates with their relatively long evolutionary history without divergence of known sister species (Fig 3; S5 Fig). Of all Verticillium spp., V. longisporum sub-genome D1 was the only species of which genes displayed significantly higher Ka/Ks ratios than V. dahliae (Wilcoxon rank-sum test, P = 9.43e-12, VLB2 based). Thus, genes of the V. longisporum sub-genome D1 generally evolve faster than genes of other haploid Verticillium spp. In contrast, genes of the other V. longisporum sub-genome, A1, generally displayed lower Ka/Ks ratios than V. dahliae orthologs (Wilcoxon rank-sum test, P < 2.2e-16, VLB2 based). However, the absence of a phylogenetically closely related A1 sister species hampers the determination of putative differences in gene diversification rate before and after hybridization as Ka/Ks ratios can vary considerably between genes of haploid Verticillium spp. For instance, V. longisporum sub-genome A1 displays higher evolutionary speed, expressed by Ka/Ks, than V. alfalfae (Wilcoxon rank-sum test, P = 3.94e-14, VLB2 based) and a similar evolutionary speed to V. nonalfalfae (Wilcoxon rank-sum test, P = 0.09, VLB2 based), species that have the same last common ancestor with A1 as V. dahliae (Fig 3A).
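The branch-wise comparison of Ka/Ks ratios can be sketched as below. The sketch assumes that Ka and Ks have already been estimated per gene by dedicated software and only illustrates the final steps: forming the ratios and comparing two sets with a two-sided Wilcoxon rank-sum test. The test is implemented here with a normal approximation and tie correction, which is an assumption of this example and may differ from the exact settings used in the study. Function names and values are hypothetical.

```python
import math


def ka_ks_ratios(pairs):
    """Ka/Ks per gene from (Ka, Ks) pairs, skipping genes with Ks == 0."""
    return [ka / ks for ka, ks in pairs if ks > 0]


def rank_sum_test(x, y):
    """Two-sided Wilcoxon rank-sum test (normal approximation, tie-corrected).

    Returns (U statistic for x, P-value).
    """
    combined = sorted((v, g) for g, vals in enumerate((x, y)) for v in vals)
    n1, n2 = len(x), len(y)
    n = n1 + n2
    ranks = [0.0] * n
    tie_term = 0
    i = 0
    while i < n:
        j = i
        while j < n and combined[j][0] == combined[i][0]:
            j += 1
        midrank = (i + 1 + j) / 2  # average of ranks i+1 .. j
        for k in range(i, j):
            ranks[k] = midrank
        t = j - i
        tie_term += t ** 3 - t
        i = j
    r1 = sum(r for r, (_, g) in zip(ranks, combined) if g == 0)
    u1 = r1 - n1 * (n1 + 1) / 2
    var_u = n1 * n2 / 12 * ((n + 1) - tie_term / (n * (n - 1)))
    z = (u1 - n1 * n2 / 2) / math.sqrt(var_u)
    return u1, math.erfc(abs(z) / math.sqrt(2))


# Toy (Ka, Ks) values, for illustration only.
d1 = ka_ks_ratios([(0.03, 0.10), (0.05, 0.10), (0.04, 0.10), (0.02, 0.0)])
vd = ka_ks_ratios([(0.01, 0.10), (0.02, 0.10), (0.015, 0.10)])
u, p = rank_sum_test(d1, vd)
print(u, p)
```

With only a handful of genes the normal approximation is crude; it becomes reasonable for the thousands of orthologous groups analysed here.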
To find evidence for accelerated evolution in both V. longisporum sub-genomes, we determined the number of genes under positive selection in every Verticillium (sub-)genome using the above formed orthologous groups. Genes under positive selection were determined based on a Z-test and varied considerably from 209 in the V. longisporum sub-genome A1 to 3 in V. tricorpus (S6 Fig). Intriguingly, the V. longisporum sub-genomes A1 and D1 have the highest number of genes under positive selection (128 genes for species D1). The genomes of V. dahliae and V. zaregansianum also contain a considerable number of genes under positive selection: 103 and 99, respectively. To investigate whether particular functional gene properties are associated with genes under positive selection, the fractions of genes that encode secreted proteins were determined (S6 Fig). These fractions were not higher in the V. longisporum sub-genomes than in V. dahliae and V. zaregansianum. Furthermore, no major differences in Clusters of Orthologous Groups (COG) functional categories could be observed for genes under positive selection between sub-genome A1, sub-genome D1, V. dahliae and V. zaregansianum (S7 Fig). Thus, we conclude that V. longisporum genes globally diverge faster than genes of related haploid Verticillium spp.
Expression pattern homogenization in the hybridization aftermath
To investigate the impact of allodiploidization on gene expression patterns, the expression of V. longisporum genes was compared with V. dahliae orthologs of isolates grown in culture medium. To this end, expression of single copy V. dahliae genes was compared with V. longisporum orthologs that are present in two copies: one in the species A1 sub-genome and one in the species D1 sub-genome. In total, 7,469 and 7,411 of these expressed gene clusters were found for V. longisporum strain VLB2 and VL20, respectively. Reads were mapped to the predicted V. longisporum genes of which 51% and 50% mapped to species A1 homeologs and 49% and 50% to the species D1 homeologs, for strains VLB2 and VL20, respectively, and thus we observed no general dominance in expression for one of the sub-genomes. Over 60% of the V. dahliae genes are not differentially expressed compared to A1 and D1 orthologs, indicating that the majority of the genes did not evolve differential expression patterns (Fig 4). In total, 29.6% and 24.2% of the genes are differently expressed compared to V. dahliae orthologs in the species A1 and D1 sub-genomes, respectively (Fig 4, VLB2 based). The significantly higher fraction of differentially expressed A1 genes (Fisher’s exact test, P = 4.3e-13) corresponds to the more distant phylogenetic relationship of A1 with V. dahliae than of D1. Intriguingly, however, significantly fewer D1 genes (18.5%, VLB2 based) were differently expressed to A1 homeologs than to V. dahliae orthologs despite the larger phylogenetic distance between Verticillium species A1 and D1 (Fisher’s exact test, P < 2.2e-16). In general, the expression patterns of the V. longisporum A1 and D1 sub-genomes are more similar to each other (ρ = 0.90 for VLB2) than the expression patterns of sub-genome D1 and V. dahliae (ρ = 0.86 for VLB2) (Fig 5; S2 Table). 
This discrepancy in phylogenetic relationship and expression pattern similarities may indicate that the expression patterns of species A1 and D1 homogenized upon hybridization. Moreover, for V. longisporum strains VLB2 and VL20, homeolog expression patterns within the same strain (ρ = 0.90 for VLB2) correlated more than A1 and D1 expression patterns of different strains (ρ = 0.88 for VLB2 A1 and VL20 D1) (Fig 5; S2 Table). Thus, expression patterns of homeologs may have synchronized in the aftermath of hybridization.
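The pairwise comparison of expression patterns can be illustrated with a rank correlation. The sketch below assumes that ρ denotes Spearman's rank correlation coefficient, which this section does not state explicitly, and computes it as the Pearson correlation of midranks. The function names and expression values are hypothetical.

```python
import math


def midranks(values):
    """Ranks starting at 1, with ties receiving the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j < len(order) and values[order[j]] == values[order[i]]:
            j += 1
        for k in range(i, j):
            ranks[order[k]] = (i + 1 + j) / 2
        i = j
    return ranks


def spearman_rho(x, y):
    """Spearman's rank correlation: Pearson correlation of the midranks."""
    rx, ry = midranks(x), midranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    return cov / math.sqrt(var_x * var_y)


# Toy expression values per gene (hypothetical), in the same gene order.
a1_homeologs = [5.0, 120.0, 33.0, 0.5, 78.0, 12.0]
d1_homeologs = [7.0, 95.0, 40.0, 1.0, 60.0, 9.0]
vd_orthologs = [30.0, 80.0, 10.0, 2.0, 90.0, 4.0]

rho_homeologs = spearman_rho(a1_homeologs, d1_homeologs)
rho_d1_vd = spearman_rho(d1_homeologs, vd_orthologs)
print(rho_homeologs, rho_d1_vd)
```

A rank-based measure is a natural choice for expression data because it is insensitive to the strongly skewed distribution of expression levels.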
In planta secretome homogenization between V. longisporum sub-genomes
To assess potential gene expression differences upon host colonization, gene expression patterns of V. longisporum and V. dahliae orthologs were also investigated in planta. To this end, oilseed rape plants were inoculated with VLB2 and VL20, respectively. As observed previously, oilseed rape plants inoculated with VLB2 developed typical Verticillium symptoms including stunted plant growth and leaf chlorosis [50]. In contrast, oilseed rape plants inoculated with VL20 did not display any disease symptoms. Accordingly, VLB2 DNA could be detected in oilseed rape stems, whereas VL20 DNA remained under the detection limit. In addition, A. thaliana plants were inoculated with V. dahliae strain JR2, and V. longisporum strains VLB2 and VL20. However, only JR2 DNA could be detected in above-ground plant material. Consequently, total RNA sequencing was performed for oilseed rape plants inoculated with V. longisporum strain VLB2 and A. thaliana plants inoculated with V. dahliae strain JR2. In total, ∼1.5 Gb of filtered RNA-seq reads were generated from the Verticillium inoculated plant material. Similar to V. longisporum grown in culture medium, 49% and 51% of the reads mapped to the A1 and D1 sub-genomes of V. longisporum. Thus, also in planta there is no expression dominance of one of the V. longisporum sub-genomes. Furthermore, 16.4% and 15.1% of the V. longisporum genes were differently expressed from V. dahliae orthologs in sub-genome A1 and D1, respectively (Fig 6). Thus, in correspondence with V. longisporum grown in culture medium, a larger fraction of A1 orthologs was differently expressed to V. dahliae orthologs than D1 orthologs (Fisher’s exact test, P = 0.04).
To elucidate a putative association of gene expression differences between V. longisporum and V. dahliae with their distinct host ranges, the fraction of differently expressed genes that encode secreted proteins was determined. For Verticillium grown in culture medium, 16.5% of the differentially expressed genes between V. longisporum and V. dahliae encode secreted proteins, whereas this is only 5.3% for genes without differential expression (Fig 6A). This enrichment of genes encoding secreted proteins was also found for isolate VL20 (S8 Fig). Correspondingly, V. longisporum genes that were differently expressed from V. dahliae orthologs were enriched for Pfam domains associated with secretion and host colonization, such as Hce2 (PF14856), a domain found in putative effectors with homology to Cladosporium fulvum effector Ecp2 (Stergiopoulos et al. 2012; S3 Table). Similarly, genes encoding secreted proteins were also significantly enriched for differentially expressed genes between V. longisporum and V. dahliae in planta (Fig 6B). However, the fraction of 9.7% was significantly less than 16.5% for Verticillium grown in culture medium (Fig 6B). Thus, despite the colonization of V. longisporum and V. dahliae on a different host species, the enrichment of genes encoding secreted proteins is lower for Verticillium grown in planta compared with culture medium.
To see how the different sub-genomes contribute to the enrichment of genes encoding secreted proteins, we compared gene expression patterns of V. longisporum sub-genomes. For V. longisporum grown in culture medium, 19.4% of the genes that are differentially expressed between the A1 and D1 homeologs encode secreted proteins, whereas this is 5.8% for homeologs with similar expression levels (Fig 7). Thus, similar to V. longisporum and V. dahliae orthologs, differentially expressed homeologs are enriched for genes that encode secreted proteins. Intriguingly, 7.9% of the genes with differential homeolog expression in planta encode secreted proteins, which is a similar fraction as for homeologs without differential expression (8.4%; Fig 7). Thus, upon plant colonization, there is no enrichment of genes that encode secreted proteins for differentially expressed homeologs. This lack of enrichment may be due to increased expression differences in planta between homeologs that encode non-secreted proteins. Alternatively, expression levels of homeologs encoding secreted proteins may homogenize in planta (Fig 7). In total, 11.2% and 9.5% of the genes that are differently regulated in planta from culture medium encode secreted proteins in sub-genome A1 and D1, respectively (Fig 7). This is a significantly larger fraction than for genes without differential expression: 7.2% and 7.9% for sub-genomes A1 and D1, respectively. In sub-genome A1, this enrichment of genes encoding secreted proteins was both present for in planta up-regulated (11.6%, Fisher’s exact test, P < 1.74e-05) and down-regulated (10.9%, Fisher’s exact test, P < 4.23e-05) genes. In contrast, in sub-genome D1, the enrichment was present for in planta up-regulated genes (12.3%, Fisher’s exact test, P < 2.27e-05), but not for down-regulated genes (7.7%, Fisher’s exact test, P = 0.91). 
Thus, in general, genes encoding secreted proteins underwent expression alterations upon plant colonization relatively more frequently than genes that encode non-secreted proteins. Consequently, the lack of enrichment in planta is caused by an increased homogenization of homeolog expression patterns of genes encoding secreted proteins. This homogenization is illustrated by the expression pattern of genes encoding secreted proteins with a pectate lyase Pfam domain (PF03211), which is associated with host cell-wall degradation (S9 Fig) [52]. In total, 5 genes with a pectate lyase domain (Pect_ly_1, Pect_ly_2, Pect_ly_3, Pect_ly_4 and Pect_ly_5) were found with two homeologous copies in V. longisporum and all of them were predicted to be secreted. There was no differential expression between the homeologs in planta for all 5 genes, whereas in culture medium the homeologs are differentially expressed for Pect_ly_1, Pect_ly_4 and Pect_ly_5. Homogenization of homeolog expression was achieved through down-regulation for Pect_ly_1, as both homeologs were not expressed in planta. In contrast, similar levels of homeolog in planta expression were achieved for Pect_ly_4 and Pect_ly_5 by the relative increase in expression of the D1 and A1 homeolog, respectively.
Discussion
Hybridization is a powerful evolutionary mechanism, often leading to the emergence of new plant pathogens with distinct pathogenic features from their parents [34,53]. We demonstrate the genomic and transcriptomic plasticity of the allodiploid V. longisporum pathogen and illustrate its potential for divergent evolution. Firstly, the plastic nature of the V. longisporum genome is displayed by its mosaic structure (Fig 1). Mosaicism in V. longisporum is not driven by homogenization, which played a negligible role in the aftermath of hybridization (Fig 2A). Rather, the V. longisporum mosaic genome structure is caused by extensive genomic rearrangements after hybridization (Fig 2B). Genomic rearrangements are major drivers of evolution and facilitate adaptation to novel or changing environments [48]. Genomic rearrangements are not specific to the hybrid nature of V. longisporum as other Verticillium spp. similarly encountered extensive chromosomal reshuffling [54–56]. As expected, the majority of the synteny breaks between the genomes of V. longisporum strains VLB2 and VL20 reside in repeat-rich genome regions (S4 Fig) as, due to their abundance, repetitive sequences are more likely to act as a substrate for unfaithful repair of double-strand DNA breaks [48]. Nonetheless, in V. longisporum, 43% of the synteny breaks identified are not associated with repeat-rich regions. Conceivably, the presence of two genomes also provides homeologous sequences with sufficient identity to mediate unfaithful repair. Secondly, V. longisporum genes globally display accelerated evolution in comparison to orthologs of non-hybrid Verticillium spp. This is illustrated by the increased abundance of genes under positive selection and the more divergent evolution of D1 genes in comparison to its sister species V. dahliae (Fig 3; S5 and S6 Figs). 
The increased rate of divergence is likely a result of having two homeologs of most genes, as this redundancy gives leeway to functional diversification [57]. Previously, 29 genes in V. dahliae were determined to evolve under positive selection, whereas in this study more than three times as many were found (S6 Fig) [54]. Here, we obtained higher numbers as a higher P-value cut-off was used (P < 0.05 instead of P < 0.01). Moreover, we calculated positive selection based on the nucleotide substitutions along the V. dahliae species branch, whereas de Jonge et al. (2013) determined positive selection based on intraspecific substitutions, which are expected to yield lower Ka/Ks ratios for genes under positive selection than interspecific mutations [58]. Finally, allodiploidization resulted in transcriptomic alterations even though the majority of the V. longisporum genes were not differentially expressed relative to their V. dahliae orthologs (Fig 4). As species A1 and D1 have hitherto not been found in their haploid state, V. dahliae was used for expression pattern comparison as it only recently diverged from species D1 and therefore likely resembles D1 gene expression [37]. Unfortunately, species A1 currently lacks a known sister species that could be used in the gene expression comparison (Fig 3A). Despite the absence of the V. longisporum parents, expression patterns of the A1 and D1 sub-genomes seem to have homogenized upon hybridization, as they resemble each other more than the expression patterns of V. dahliae and species D1 do (Fig 5; S2 Table). V. longisporum genes that were differentially expressed relative to their V. dahliae orthologs are enriched for genes encoding secreted proteins (Fig 6; S8 Fig). However, that enrichment is higher for Verticillium grown in culture medium than in planta (Fig 6). This may be a consequence of the disappearance of expression differences between the V. longisporum homeologs, as homeologs that encode secreted proteins homogenized their expression upon host colonization (Fig 7).
Whole-genome duplication events are usually followed by extensive gene loss, often leading to reversion to the original ploidy state [59]. However, the so-called ‘haploidization’ of V. longisporum has only proceeded to a limited extent, as 80% of the genes are present in two copies (S2 Fig), whereas the haploid V. dahliae genome contains only 1% of its genes in two copies. Thus, the V. longisporum genome displays the symptoms of a recent allodiploid, with gene loss being an ongoing process that has so far progressed only marginally. However, the retention of both homeolog copies can also be evolutionarily advantageous, as the presence of an additional gene copy may facilitate functional diversification of V. longisporum genes in comparison to haploid Verticillium spp. (Fig 3; S5 and S6 Figs). Gene duplication is an important mechanism for plant pathogens, including V. dahliae, to evade host immunity [55,60,61]. The LS regions of V. dahliae, which are enriched for active transposable elements, are derived from segmental duplications [54,55]. However, instead of a specific duplication of LS regions, V. longisporum hybridization resulted in a whole-genome duplication. This likely caused a global acceleration of gene evolution, as genes encoding secreted proteins or genes associated with niche colonization were not enriched among the genes that evolve under positive selection (S6 and S7 Figs).
Expression divergence of a particular gene may evolve through mutations in regulatory sequences of the gene itself (cis effects), such as promoter elements, or through alterations in other regulatory factors (trans effects), such as chromatin regulation [27,62]. Conceivably, the surprisingly higher correlation of the sub-genome D1 expression pattern with A1 than with V. dahliae may originate from the disappearance of differences in trans regulators between species A1 and D1 upon hybridization, as in V. longisporum they reside in the same nuclear environment (Fig 5; S2 Table) [27]. Homogenization of expression patterns of homeologs has similarly been observed in the fungal allopolyploid Epichloë Lp1 [31]. Furthermore, the enrichment in genes encoding secreted proteins among V. longisporum genes that are differentially regulated upon infection indicates their importance for infection, as secreted proteins play important roles in pathogen-host interactions (Gupta et al. 2015; Fig 7). Upon plant infection, expression of homeologs encoding secreted proteins homogenized, illustrating gene expression crosstalk between the different sub-genomes of V. longisporum (Fig 7). Changes in homeolog expression ratios also occurred in synthetic allopolyploid Arabidopsis upon cold stress treatment [63]. Many of these ratio alterations were related to stress responses. Thus, conceivably, alterations in homeolog expression ratios facilitate the adjustment of allopolyploids to different environmental conditions.
Conclusion
Allodiploidization is an intrusive evolutionary mechanism that involves extensive alterations in genome, gene and transcriptome evolution. V. longisporum displays signatures of rapid diversifying evolution in the aftermath of hybridization, illustrated by extensive genomic rearrangements and accelerated gene evolution. Furthermore, the regulatory crosstalk between sub-genomes can adjust gene expression depending on the environment. Thus, in comparison to non-hybrid Verticillium spp., V. longisporum has a high adaptive potential that can contribute to host immunity evasion and to the further specialization towards brassicaceous plant hosts.
Material and methods
Genome analysis
Genome assemblies of the two V. longisporum strains (VLB2 and VL20) and V. dahliae strain JR2 were previously published [39,46]. Telomeric regions were determined based on the fungal telomeric repeat pattern TAACCC/GGGTTA (minimum three repetitions) [46]. Furthermore, additional repeats were identified and characterized using RepeatModeler (v1.0.8). De novo-identified repeats were combined with the repeat library from RepBase (release 20170127) [64]. The genomic coordinates of the repeats were identified with RepeatMasker (v4.0.6). Homologous genes were identified by nucleotide BLAST (v2.2.31+). Only hits in which both sequences covered at least 80% of each other were retained.
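The telomere criterion described above can be illustrated with a minimal sketch. It is not the script used in this study: the function name and the `window` parameter (how far from each sequence end a run is still counted as telomeric) are assumptions introduced for illustration only.

```python
import re

# Fungal telomeric repeat and its reverse complement, as given in the text.
TELOMERE_MOTIFS = ("TAACCC", "GGGTTA")
MIN_REPEATS = 3  # minimum number of consecutive repetitions, as in the text


def find_telomeric_repeats(sequence, window=200):
    """Return sorted (start, end, motif) tuples for tandem runs near a sequence end.

    `window` is an illustrative assumption, not a value from the study.
    """
    sequence = sequence.upper()
    hits = []
    for motif in TELOMERE_MOTIFS:
        # Match MIN_REPEATS or more back-to-back copies of the motif.
        pattern = re.compile("(?:%s){%d,}" % (motif, MIN_REPEATS))
        for match in pattern.finditer(sequence):
            near_start = match.start() < window
            near_end = match.end() > len(sequence) - window
            if near_start or near_end:
                hits.append((match.start(), match.end(), motif))
    return sorted(hits)
```

Runs with fewer than three repetitions, or runs located far from both sequence ends, are not reported by this sketch.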
RNA sequencing, gene annotation and function determination
To obtain RNA-seq data for Verticillium grown in culture medium, isolates JR2, VLB2 and VL20 were grown for three days in potato dextrose broth (PDB) with three biological replicates for every isolate. To obtain RNA-seq data from Verticillium grown in planta, two-week-old plants of the susceptible oilseed rape cultivar ‘Quartz’ were inoculated by dipping the roots for 10 minutes in a spore suspension of 1×10⁶ conidiospores ml⁻¹ of V. longisporum isolates VLB2 and VL20, respectively [50]. Similarly, three-week-old A. thaliana (Col-0) plants were inoculated with V. dahliae isolate JR2, VLB2 and VL20, respectively. After root inoculation, plants were grown in individual pots in a greenhouse under a cycle of 16 h of light and 8 h of darkness, with temperatures maintained between 20 and 22°C during the day and a minimum of 15°C overnight. Three pooled samples (10 plants per sample) of stem fragments (3 cm) and complete flowering stems were used for total RNA extraction for oilseed rape and A. thaliana, respectively. Total RNA was extracted based on TRIzol RNA extraction (Simms et al. 1993). cDNA synthesis, library preparation (TruSeq RNA-Seq short-insert library), and Illumina sequencing (single-end 50 bp) were performed at the Beijing Genomics Institute (BGI, Hong Kong, China). In total, ∼2 Gb and ∼1.5 Gb of filtered reads were obtained for the Verticillium samples grown in culture medium and in planta, respectively. RNA-seq data were submitted to the SRA database under accession number SRP149060.
Using RNA-seq data from the cultures grown in liquid medium, gene annotation was performed for JR2, VLB2 and VL20 with the BRAKER1 (v1.9) pipeline [47] using GeneMark-ET [65] and AUGUSTUS [66]. Predicted genes with internal stop codons were removed from the analysis. The secretome was predicted using SignalP (v4.1) [67], TargetP (v1.1) [68], and TMHMM (v2.0) [69] as described previously [70]. Pfam function domains were predicted using InterProScan [71]. Subsequently, Pfam enrichment was determined using hypergeometric tests, and significance values were corrected using the Benjamini-Hochberg false discovery method [72]. Clusters of Orthologous Group (COG) categories were determined for protein sequences using EggNOG (v4.5.1) [73].
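The enrichment statistics can be sketched as follows. This is a minimal illustration using only the Python standard library, not the code used in this study; the function names and the argument names (`k` genes with the domain in a set of `n` genes, `K` genes with the domain among `N` genes in total) are assumptions introduced here.

```python
from math import comb


def hypergeom_enrichment_p(k, n, K, N):
    """Upper-tail hypergeometric P-value: P(X >= k) when drawing n of N genes,
    of which K carry the Pfam domain of interest."""
    upper = min(n, K)
    total = comb(N, n)
    favourable = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return favourable / total


def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted P-values in the input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest P-value down so adjusted values stay monotonic.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted
```

In such a setup, one P-value would be computed per Pfam domain and the whole list passed to `benjamini_hochberg` before applying a significance cut-off.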
Parental origin determination
Sub-genomes were divided based on the differences in sequence identity of species A1 and D1 to V. dahliae. V. longisporum genomes of VLB2 and VL20 were aligned to the complete genome assembly of V. dahliae JR2 using NUCmer, which is part of the MUMmer package v3.23 [46,74]. Here, only 1-to-1 alignments longer than 10 kb and with a minimum of 80% identity were retained. Subsequent alignments were concatenated if they aligned to the same contig with the same orientation and order as the reference genome. The average nucleotide identity was determined for every concatenated alignment and used to divide the genomes into sub-genomes.
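The concatenation step can be sketched as below. This is an illustration under stated assumptions rather than the procedure as implemented in this study: the dictionary layout of an alignment block is hypothetical, blocks are assumed to be pre-sorted by their position in the V. longisporum assembly, and the average identity is assumed to be weighted by aligned length.

```python
def concatenate_alignments(blocks):
    """Merge consecutive blocks on the same contig and strand that keep reference order.

    Each block is a dict with keys: contig, strand ("+" or "-"), ref_start,
    ref_end and identity (percent). This layout is a hypothetical example.
    """
    merged = []
    for b in blocks:
        length = b["ref_end"] - b["ref_start"]
        if merged:
            last = merged[-1]
            same_track = (last["contig"], last["strand"]) == (b["contig"], b["strand"])
            if b["strand"] == "+":
                in_order = b["ref_start"] >= last["ref_end"]
            else:
                in_order = b["ref_end"] <= last["ref_start"]
            if same_track and in_order:
                # Extend the previous group; accumulate identity weighted by length.
                last["ref_start"] = min(last["ref_start"], b["ref_start"])
                last["ref_end"] = max(last["ref_end"], b["ref_end"])
                last["aligned_bp"] += length
                last["identical_bp"] += length * b["identity"] / 100
                continue
        merged.append({
            "contig": b["contig"], "strand": b["strand"],
            "ref_start": b["ref_start"], "ref_end": b["ref_end"],
            "aligned_bp": length,
            "identical_bp": length * b["identity"] / 100,
        })
    for group in merged:
        group["identity"] = 100 * group["identical_bp"] / group["aligned_bp"]
    return merged
```

Each merged group then carries a single average identity to V. dahliae, which is the quantity used to assign a region to one of the two sub-genomes.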
The parental origin determination based on sequence identities of the exonic regions of genes was performed by BLAST (v2.6.0+). Here, hits with a minimum subject and query coverage of 80% were used. Furthermore, similar to Louis et al. (2012), differences in GC-content between homologous genes present in two copies were calculated accordingly:
Gene conversion and genomic rearrangements
Genes occurring in multiple copies were identified using nucleotide BLAST (v2.6.0+) and the sequence identity between these genes was determined. Here, hits with a minimum subject and query coverage of 80% were used.
The VLB2 genome assembly was aligned to VL20 to identify synteny breaks using NUCmer, which is part of the MUMmer package v3.23 [74]. Subsequent alignments were concatenated if they aligned to the same contig with the same orientation and order as the reference genome. In order to confirm synteny breaks, filtered V. longisporum long sequencing reads of VLB2 [39] were aligned to the V. longisporum VL20 genome with the Burrows-Wheeler Aligner (BWA) [75] and further processed with the samtools package (v1.3.1) [76]. Synteny breaks were visualized using the R package Sushi [77] and the Integrative Genomics Viewer [78]. The association between synteny breaks and repeats was tested through permutation. First, the fraction of synteny breaks flanked by repeats was determined. Here, synteny breaks were assigned to a “repeat-rich” region if more than 10% of a 1 kb window around the break consisted of repeats. The V. longisporum VL20 genome assembly was divided into windows of 1 kb using BEDTools (v2.26.0). To estimate the significance of the synteny break/repeat association [79], 10,000 permutations were executed with the same number of windows as there were synteny breaks to determine the random distribution of repeat-rich regions.
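The permutation scheme can be sketched as follows. This is a minimal illustration, not the script used in this study: the function name, the fixed random seed and the add-one correction of the empirical P-value are assumptions introduced here.

```python
import random


def permutation_test(repeat_rich_flags, n_breaks, observed_repeat_rich,
                     n_permutations=10_000, seed=0):
    """Empirical P-value for the enrichment of synteny breaks in repeat-rich windows.

    repeat_rich_flags: one boolean per 1 kb genome window (True if >10% repeats).
    n_breaks: number of synteny breaks.
    observed_repeat_rich: number of breaks located in repeat-rich windows.
    """
    rng = random.Random(seed)  # fixed seed is an assumption, for reproducibility
    at_least_as_extreme = 0
    for _ in range(n_permutations):
        # Draw as many random windows as there are synteny breaks.
        sample = rng.sample(repeat_rich_flags, n_breaks)
        if sum(sample) >= observed_repeat_rich:
            at_least_as_extreme += 1
    # Add-one correction so the estimate is never exactly zero.
    return (at_least_as_extreme + 1) / (n_permutations + 1)
```

A small P-value indicates that randomly placed windows rarely contain as many repeat-rich regions as the windows around the observed synteny breaks.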
Phylogenetic tree
The following Verticillium strains were used as representatives for their species: V. albo-atrum = PD747, V. alfalfae = PD683, V. dahliae = JR2, V. isaacii = PD660, V. klebahnii = PD401, V. nonalfalfae = TAB2, V. nubilum = PD621, V. tricorpus = PD593 and V. zaregamsianum = PD739. The phylogenetic tree was constructed based on nucleotide sequences of the Benchmarking Universal Single-Copy Orthologs (BUSCOs) of fungi present in all Verticillium spp. and the out-group species Sodiomyces alkalinus [80]. To this end, previously published Verticillium and S. alkalinus assemblies were used [39,46,56,81]. In total, 277 orthologous groups were aligned using mafft (v7.271) with default settings [82,83]. Aligned genes were then concatenated, and the phylogenetic tree was inferred using RAxML (v8.2.0) with the GTRGAMMA substitution model [84]. The robustness of the inferred phylogeny was assessed by 100 rapid bootstrap approximations.
Gene divergence
Previously published annotations of the haploid Verticillium spp. were used to compare the evolutionary speed of orthologs [46,56]. The VESPA (v1.0b) software was used to automate this process [85]. The coding sequences for each Verticillium spp. were filtered and subsequently translated using the VESPA ‘clean’ and ‘translate’ functions. Homologous genes were retrieved by protein BLAST (v2.2.31+) querying a database consisting of all Verticillium protein sequences. Here, only hits with a minimum coverage of 70% were used. Homologous genes were grouped with the VESPA ‘best_reciprocal_group’ function. Only homology groups that comprised a single representative for every Verticillium spp. were used for further analysis. Protein sequences of each homology group were aligned with muscle (v3.8.31) [86]. The aligned protein sequences of the homology groups were converted to nucleotide sequences by the VESPA ‘map_alignments’ function. The alignments were used to calculate Ka/Ks for every branch of the species phylogeny using the codeml module of PAML (v4.8) with the following parameters: F3X4 codon frequency model, wag.dat empirical amino acid substitution model and no molecular clock [87]. To this end, the previously obtained phylogenetic tree topology was used: (((((V. klebahnii, V. isaacii), V. tricorpus), V. zaregamsianum), V. albo-atrum), (((V. dahliae, species D1), (V. alfalfae, V. nonalfalfae)), species A1), V. nubilum). When comparing the evolutionary speed between V. dahliae genes and their orthologs, extreme Ka/Ks values were discarded from further analysis, i.e. Ka/Ks < 0.0001 and Ka/Ks ≥ 2. Significance of positive selection was tested using a Z-test [88], and Z-values > 1.65 were considered significant with P < 0.05.
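The two numerical cut-offs above can be made explicit with a short sketch. It only illustrates the thresholds named in the text; the function names are assumptions, and the one-sided normal tail is used because a Z-value of 1.65 corresponds to P < 0.05 only in a one-sided test.

```python
from math import erfc, sqrt


def keep_ka_ks(ratio, lower=0.0001, upper=2.0):
    """True if a Ka/Ks ratio is retained, i.e. lower <= ratio < upper."""
    return lower <= ratio < upper


def one_sided_p(z):
    """Upper-tail P-value of a standard normal Z-score."""
    return 0.5 * erfc(z / sqrt(2))
```

For example, `one_sided_p(1.65)` is just below 0.05, whereas `one_sided_p(1.64)` is just above it, which is consistent with the Z > 1.65 criterion.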
Gene expression analysis
The RNA sequencing reads of the Verticillium strains VLB2, VL20 and JR2 were uniquely mapped to their previously assembled genomes using the Rsubread package in R [39,46,89,90]. To compare gene expression patterns, gene orthologs were retrieved by protein BLAST (v2.2.31+). Here, only hits with a minimum sequence identity of 70% and a minimum coverage of 80% were used. Only genes in single copy in V. dahliae with two orthologs in V. longisporum of different parental origin (one A1 copy and one D1 copy) were used for comparative analysis. The comparative transcriptomic analysis was performed with the package edgeR in R (v3.4.3) [90–92]. Expression patterns were corrected for putative length differences between orthologous genes. Genes were considered differentially expressed when the P-value was < 0.05 and the log2-fold-change was ≥ 1. P-values were corrected for multiple comparisons according to Benjamini and Hochberg [72].
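The idea of correcting for gene length before comparing orthologs, followed by the two thresholds, can be sketched as follows. This is not the edgeR procedure used in this study: the RPKM-style normalization, the pseudocount, the function names and the use of the absolute log2-fold-change (so that both up- and down-regulation are captured) are all assumptions made for illustration.

```python
from math import log2


def rpkm(count, gene_length_bp, library_size):
    """Reads per kilobase of gene per million mapped reads."""
    return count * 1e9 / (gene_length_bp * library_size)


def log2_fold_change(expr_a, expr_b, pseudocount=1.0):
    """log2 ratio of two expression values; the pseudocount avoids log(0)."""
    return log2((expr_a + pseudocount) / (expr_b + pseudocount))


def is_differentially_expressed(adjusted_p, lfc, p_cutoff=0.05, lfc_cutoff=1.0):
    """Apply the corrected P-value and log2-fold-change thresholds."""
    return adjusted_p < p_cutoff and abs(lfc) >= lfc_cutoff
```

With this normalization, an ortholog that is twice as long and collects twice as many reads at the same sequencing depth receives the same expression value, so length differences alone do not produce a fold change.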
Acknowledgements
The authors would like to thank the Marie Curie Actions program of the European Commission, which financially supported the research of J.R.L.D. Work in the laboratories of B.P.H.J.T. and M.F.S. is supported by the Research Council Earth and Life Sciences (ALW) of the Netherlands Organization of Scientific Research (NWO). The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript. We thank Sander Y.A. Rodenburg for sharing bioinformatics scripts.