Abstract
Plants and fungi display a broad range of interactions in natural and agricultural ecosystems ranging from symbiosis to parasitism. These ecological interactions result in coevolution between genes belonging to different partners. A well-understood example are secreted fungal effector proteins and their host targets, which play an important role in pathogenic interactions. Biotrophic smut fungi (Basidiomycota) are well-suited to investigate the evolution of plant pathogens, because several reference genomes and genetic tools are available for these species. Here, we used the genomes of Sporisorium reilianum f. sp. zeae and S. reilianum f. sp. reilianum, two closely related formae speciales infecting maize and sorghum, respectively, together with the genomes of Ustilago hordei, Ustilago maydis and Sporisorium scitamineum to identify and characterize genes displaying signatures of positive selection. We identified 154 gene families having undergone positive selection during species divergence in at least one lineage, among which 77% were identified in the two investigated formae speciales of S. reilianum. Remarkably, only 29% of positively selected genes encode predicted secreted proteins. We assessed the contribution to virulence of nine of these candidate effector genes in S. reilianum f. sp. zeae by deleting individual genes, including a homologue of the effector gene pit2 previously characterized in U. maydis. Only the pit2 deletion mutant was found to be strongly reduced in virulence. Additional experiments are required to understand the molecular mechanisms underlying the selection forces acting on the other candidate effector genes, as well as the large fraction of positively selected genes encoding predicted cytoplasmic proteins.
Introduction
Plants and fungi have a long history of coevolution since the emergence of pioneering land plants approximately 400 million years ago. The development of early plants was likely supported by associations with symbiotic fungi, as suggested by analyses of ribosomal RNAs and fossil records (Remy et al. 1994; Gehrig et al. 1996; Martin et al. 2017). Different forms of plant-fungus interactions have evolved, including mutualistic symbiosis where both plant and fungus benefit (Parniske 2008), and pathogenic interactions where fungal colonization greatly reduces plant fitness (Dean et al. 2012). Pathogenic interactions play critical roles in natural and agricultural ecosystems, and understanding the evolutionary mechanisms shaping them is of great importance to plant production, food security and protection of biodiversity in natural ecosystems (Fisher et al. 2012; Bagchi et al. 2014).
Secreted fungal effector proteins are key players in pathogenic interactions as they are involved in protecting and shielding growing hyphae, suppressing plant defense responses and changing plant physiology to support growth of the pathogen (Stergiopoulos and de Wit 2009; de Jonge et al. 2011; Giraldo and Valent 2013). Many effector proteins lack known functional domains, and expression of a subset of effectors is linked to plant colonization (Lo Presti et al. 2015; Toruno et al. 2016; Franceschetti et al. 2017; Lanver et al. 2017). Effector proteins with a strong effect on virulence phenotype are thought to coevolve with their plant targets either in an arms race or a trench-warfare scenario (Brown and Tellier 2011). In the former, fungal effectors manipulating the host are under positive directional selection, and plant targets evolve in response to changes in effector proteins (Rovenich et al. 2014). In the latter scenario, sets of alleles are maintained by balancing selection in both host and pathogen populations (Brown and Tellier 2011; Tellier et al. 2014). Several methods are available for identifying genomic regions under selection (Nielsen 2005; Aguileta et al. 2009; Aguileta et al. 2010). It has been proposed that genes with signatures of positive selection have important functions during host pathogen interaction or have contributed to host specialization (Tiffin and Moeller, 2006). It is therefore expected that the deletion of such genes reduces virulence when tested on a susceptible host.
Depending on the aim of the investigation, studies identifying genes with signatures of positive selection are carried out within or between species. Whilst studies on the population level focus on recent and ongoing selective processes and are instrumental in the understanding of adaptation, comparative genomic studies employing different species encompass a broader time span and provide insight into the underlying genetic basis of host specialization (Plissonneau et al. 2017). The signature of positive selection in such case typically takes the form of an excess of divergence between species due to increased fixation of mutations by selective sweeps compared to a neutral expectation (Yang and Nielsen, 1998). This is commonly measured by the ratio of non-synonymous (dN) over synonymous (dS) divergence and a dN / dS ratio > 1 is taken as evidence for positive selection under the assumption that synonymous substitutions are neutral while non-synonymous are not. Positive selection studies in a number of plant pathogen systems revealed that genes encoding secreted effector proteins are enriched in signatures of positive selection (Möller and Stukenbrock 2017). Such studies include investigations in diverse plant pathogens like Microbotryum species causing anther-smut disease of Caryophyllaceae species (Aguileta et al. 2010), the wheat pathogen Zymoseptoria tritici (Stukenbrock et al. 2011), the rust fungus Melampsora larici-populina (Hacquard et al. 2012), the rice blast fungus Magnaporthe oryzae (Huang et al. 2014), the wheat stem rust fungus Puccinia graminis f. sp. tritici (Sperschneider et al. 2014) the Irish potato famine pathogen Phytophthora infestans (Dong et al. 2014), a group of Fusarium species (Sperschneider et al. 2015) and a group of smut fungi parasitizing different grasses and a dicot host (Sharma et al. 2014; Sharma et al. 2015). Yet, as only a few genes under positive selection have been functionally studied, the link between the selected genotypes and their corresponding phenotypes are only beginning to be understood and only a few studies used evolutionary predictions to unravel the molecular mechanisms of host adaptation. For example, the population genomics study in the wheat pathogen Z. tritici which identified candidate effector genes under positive selection (Stukenbrock et al. 2011) was followed up experimentally, and in this case it was shown that the deletion of three of four candidate genes reduced virulence (Poppe et al. 2015). In the grey mold fungus Botrytis cinerea four positively selected genes were deleted without affecting virulence, and this finding was attributed to functional redundancy, the limited number of tested host plants, or experimental conditions different from natural infections (Aguileta et al. 2012). A study of the oomycete effector protein EpiC1 showed that a single amino acid substitution at a site under positive selection affected the binding affinity of different host proteases determining host specificity (Dong et al. 2014).
Smut fungi, belonging to the division of Basidiomycota, are a group of about 550 species parasitizing mostly grasses, including important crops like maize, sorghum, oat, barley and sugarcane (Begerow et al. 2014). In smut fungi, sexual reproduction is linked to pathogenic development and smut fungi therefore depend on successful plant colonization to complete their life cycle. As biotrophic pathogens, they require living plant tissue for establishing a successful interaction (Martinez-Espinoza et al. 2002). With few exceptions like Ustilago maydis, smut fungi usually develop symptoms only in the female or male inflorescence of their respective host plants. During the last ten years, quality draft genome sequences of prominent species were obtained, including U. maydis, the causative agent of smut disease on maize and teosinte (Kämper et al. 2006), Sporisorium reilianum causing head smut of maize and sorghum (Schirawski et al. 2010), Ustilago hordei infecting barley (Laurie et al. 2012), and Sporisorium scitamineum parasitizing sugarcane (Que et al. 2014; Taniguti et al. 2015;Dutheil et al. 2016). The head smut fungus S. reilianum occurs in two formae speciales that infect maize (S. reilianum f. sp. zeae) or sorghum (S. reilianum f. sp. reilianum) (Zuther et al. 2012). The concept of formae speciales is used in phytopathology to distinguish members of the same species based on their ability to colonize a certain host plant (in this example maize or sorghum) (Anikster, 1984). The divergence of U. hordei, U. maydis, S. scitamineum and S. reilianum was inferred to have occurred in the interval of seven and 50 million years ago (Munkacsi et al. 2007). The availability of genome sequences of several species with different host ranges, together with established tools for genetic manipulations (Brachmann et al. 2004; Kämper 2004; Khrunyk et al. 2010; Schuster et al. 2016) make this group of smut fungi particularly interesting to study the evolution of effector genes as well as their contributions to virulence, speciation and host specificity.
Here, we employed the genome of the recently sequenced strain S. reilianum f. sp. reilianum SRS1_H2-8 (http://www.ebi.ac.uk/ena/data/view/LT795054-LT795076) (Zuther et al. 2012) together with the genomes of U. maydis, U. hordei, S. scitamineum and S. reilianum f. sp. zeae to identify potential effector genes with signatures of positive selection. Candidate genes were individually deleted in S. reilianum f. sp. zeae and the phenotype of the deletion strains was assessed after infection of maize in order to understand their function with respect to virulence. We report that the deletion of one candidate gene, pit2, led to a strong reduction in virulence and we further discuss hypotheses on the origin of positive selection for the other candidate genes.
Materials and Methods
Construction of homologous protein families
Fungal species used in this study, their number of gene models, number of predicted secreted proteins and sources of genome data are listed in supplementary table S1. The predicted proteome of the five smut fungi U. hordei, U maydis, S. scitamineum, S. reilianum f. sp. zeae and S. reilianum f. sp. reilianum were used to perform an all-against-all blastp search (Altschul et al. 1990). The SiLiX algorithm was subsequently used to infer homology relationships based on the blast hits (Miele et al. 2011). Two parameters are considered to decide whether a Blast hit can be taken as evidence for homology: the percent identity between two sequences and the relative length of the hit compared to the total length of the two sequences, hereby referred to as “coverage”. In order to maximize the number of families comprising 1:1 orthologues (that is families that have an equal number of members in each species), SiLiX (Miele et al. 2011) was run with a range for coverage and identity thresholds between 5 % and 95 % in 5 % steps. An identity of 40 % and coverage between 5 % and 45 % lead to the maximum number of families with 1:1 orthologues (5,394; supplementary fig. S1) while settings with 40 % identity and 80 % coverage lead to 5,326 families with 1:1 orthologues (supplementary fig. S1). Since using a higher coverage had only a cost of 68 core families, the stricter criteria were applied for family clustering. Families with at least two members were aligned on the codon level using MACSE 1.01b (Ranwez et al. 2011) and on the protein level using PRANK v.100802 (Löytynoja and Goldman 2008). The resulting alignments were subsequently compared and column scores (CS) computed for each position in the alignment (Thompson et al. 1999). Only positions with CS of 100 % (that is, alignment columns identically found by both methods) and a maximum of 30 % gaps were retained for further analysis.
Estimation of genome-wide divergence values and divergence times
The five genomes of U. hordei, U. maydis, S. scitamineum, S. reilianum f. sp. zeae and S. reilianum f. sp. reilianum were aligned using the Multiz genome aligner from the TBA package (Blanchette et al. 2004) and projected on the U. maydis genome as reference. The resulting multiple genome alignment had a total size of 21 Mb and was further restricted to regions with homologous sequences in the five species (total length after this step: 14.3 Mb) and processed to remove coding regions. The final non-coding sequence alignment had a total length of 2.2 Mb, for which pairwise nucleotide similarities were computed in non-overlapping windows of 10 kb.
Gene families with exactly one member in each species were concatenated and pairwise protein sequence similarities computed using the seqinr package for R (Charif and Lobry 2007). Protein alignments were also used to infer dates of divergence, using a relaxed clock model. The PhyloBayes version 4.1 software (Lartillot et al. 2009) was used with the auto-correlated model of Thorne et al. (Thorne et al. 1998) under a GTR + CAT model. A unique calibration point was used, based on the divergence time of the most divergent lineage U. hordei, previously estimated to have occurred between 27 and 21 Myr (Bakkeren and Kronstad, 2007). A uniform prior was used on this interval for the Monte-Carlo Markov Chain. As convergence issues arise when large alignments (more than 20,000 positions) are used, we followed the PhyloBayes authors’ recommendation to conduct a jackknife procedure. We generated three datasets of ca 20,000 amino acids by randomly sampling families and concatenating the corresponding alignments. Two chains were run in each case and convergence was assessed. Sampling was performed after a burning of 10,000 iterations, and every 10 subsequent iterations. Chains were run to ensure that the minimum effective sample size was greater than 50 and maximum relative difference lower than 0.3 in at least one sample. Results are summarized in supplementary table S2 and supplementary fig. S2 shows the six chains for the three samples. In addition to the convergence of the two chains for each sample, our results reveal extremely consistent results between samples. Figure 1A shows estimates from one chain of the third data set, which shows a minimum effective sample size greater than 300.
Detection of positive selection
For gene families with at least three members, translated sequences were employed to create maximum likelihood phylogenetic trees using PhyML 3.0 (Guindon et al. 2010) with a minimum parsimony starting tree and the LG amino acid substitution model with a four-classes gamma distribution of site-specific substitution rate (Le and Gascuel 2008). The best tree topology obtained from nearest neighbor interchange (NNI) and subtree pruning recrafting (SPR) searches was kept (Guindon et al. 2010). BppML (Dutheil and Boussau 2008) was then used to re-estimate branch lengths from the codon alignment using the YN98 substitution model (Nielsen and Yang 1998). We next aimed at inferring the occurrence of positive selection for each gene family. This is typically achieved by measuring the ratio of non-synonymous vs. synonymous substitutions (dN/dS ratio) using models of codon sequence evolution (Yang, 2006). In particular, non-homogeneous models of sequence evolution estimate the dN/dS ratio independently in different lineages, yet at the cost of potential over-parametrization issues. In the manual of the PAML package, the authors state that such models should only be used for hypothesis testing and advise against using them for scans of positive selection. Dutheil et al. (2012) proposed a model selection approach (implemented in the TestNH package) allowing to select for the best non-homogeneous model supported by the data. They start by fitting the simplest (homogeneous) model and sequentially add parameters to model variation of selective regime among lineages. Because the number of possible models is large even for small data sets, two heuristic approaches have been introduced: the ‘free’ heuristic permits unconnected branches from the tree to evolve under the same regime, while the ‘joint’ heuristic restricts model sharing to connected branches (see Dutheil et al. 2012 for details). The choice of models to test is guided by statistics on the patterns of substitutions on the phylogenetic tree, an approach named substitution mapping (Romiguier et al. 2012). Apart from the model selection approach, the underlying models of codon sequence evolution are identical to the one originally described by Yang (Yang, 1998; Yang and Nielsen, 1998). Model selection was performed with the TestNH software, which contains two programs: (1) MapNH (Romiguier et al. 2012) was used for mapping substitutions on the previously inferred phylogenetic tree and (2) PartNH (Dutheil et al. 2012) was subsequently employed to fit time non-homogeneous models of codon substitutions. PartNH uses the previously inferred substitution maps in order to perform model comparisons and select a non-homogeneous model with minimal number of parameters. Both methods ‘free’ and ‘join’ were applied and compared to scan for positive selection. Finally, putative secreted effector proteins were identified by predicting secretion using SignalP 4.0 (Petersen et al. 2011) and proteins were considered as secreted if the program indicated the presence of a signal peptide but no transmembrane domain.
To detect residues under positive selection in homologues of pit2, the branch-site model with Bayes Empirical Bayes (BEB) analysis as implemented in PAML4 (Yang 2007) was applied. We employed information about family composition, alignment and phylogeny as outlined above and defined sr10529 and srs_10529 as foreground branches. A posterior probability threshold of > 95 % was used for the BEB analysis.
Association of positively selected genes with repeats in U. hordei
We tested whether genes under positive selection are located significantly closer to repetitive elements than average genes in the genome of U. hordei, which shows the highest content of repetitive elements in the group of smut fungi investigated here. For this analysis, only a group of “uncharacterized interspersed repeats” was investigated, because it was shown previously that this is the only category showing a strong association with candidate effector genes (Dutheil et al. 2016). Binary logistic regressions were conducted in R using the rms package (Harrell 2015). The ‘robcov’ function of the rms package was used in order to get robust estimates of each effect. The variable “distance to the closest interspersed repeat” was transformed by log(x+1) because of its extreme distribution.
Comparing dN/dS ratios of genes residing in virulence clusters
Previous work has identified several virulence gene clusters in U. maydis and some of them play important roles during pathogenic development (Kämper et al. 2006; Schirawski et al. 2010). In total, these clusters contain 163 genes, where 100 reside in clusters without virulence phenotype and 63 reside in clusters with virulence phenotype upon deletion. Both types of clusters contain each 32 genes for which a dN/dS ratio could be determined (the missing genes are part of families that do not have at least three members and were therefore not analyzed). The dN/dS ratios of all genes in clusters were compared between clusters with and without virulence phenotype (Wilcoxon Rank-Sum Test).
Gene Ontology terms enrichment analysis
All proteins in S. reilianum f. sp. zeae, in S. reilianum f. sp. reilianum and in U. hordei were considered for Gene Ontology (GO) term enrichment analyses. GO terms were assigned using iprscan 1.1.0 (http://fgblab.org/runiprscan; developed by Michael R. Thon) which links GO information provided by Interpro to each protein. In this way, 1,759 unique GO terms could be assigned to 4,130 proteins in S. reilianum f. sp. zeae, 1,744 unique GO terms could be assigned to 4,124 proteins in S. reilianum f. sp. reilianum and 1,757 unique GO terms could be assigned to 3,922 proteins in U. hordei (supplementary table S3). The Bioconductor package topGO (Alexa et al. 2006) was then used to link each GO term to the three major categories “Cellular Component”, “Biological Process” or “Molecular Function”. Enrichment analysis was performed by computing P values for each GO term using Fisher’s classic test with parent-child correction (Grossmann et al. 2007). Cytoplasmic proteins with and without signatures of positive selection were compared for the three species separately, and differences were considered to be significant at the 5 % level.
Strains and growth conditions
The Escherichia coli derivative Top 10 (Invitrogen, Karlsruhe, Germany) and the Saccharomyces cerevisiae strain BY4741 (MATa his3Δ1 leu2Δ met15Δ ura3Δ; Euroscarf, Frankfurt, Germany; kindly provided by M. Bölker, Marburg) were used for cloning purposes. Sporisorium reilianum strains used in this study are listed in supplementary table 4. They are derivatives of the haploid solopathogenic strain JS161 which is capable of plant colonization without the need of a mating partner, because it expresses a compatible pheromone/receptor pair (Schirawski et al. 2010) Escherichia coli was grown in dYT liquid medium (1.6 % (w/v) Trypton, 1.0 % (w/v) Yeast Extract (Difco), 0.5 % (w/v) NaCl) or YT solid medium (0.8 % (w/v) Trypton, 0.5 % (w/v) Yeast-Extract, 0.5 % (w/v) NaCl, 1.3 % (w/v) agar) supplemented with 100 mg/mL Ampicillin when needed. The yeast S. cerevisiae was maintained in YPD solid medium (1 % (w/v) yeast extract, 2 % (w/v) Bacto-Pepton, 2 % (w/v) Bacto-Agar, 2 % (w/v) glucose) and grown on SC URA- medium (1.7 % (w/v) Yeast Nitrogen Base without ammonium sulfate, 0.147 % (w/v) dropout-mix without Uracil, 2 % (w/v) glucose) for selecting transformants containing the plasmid pRS426 (Sikorski and Hieter 1989) (kindly provided by M. Bölker, Marburg) or derivatives of pRS426. Strains of S. reilianum were grown in liquid YEPSlight medium (1.0 % (w/v) yeast extract, 0.4% (w/v) peptone, 0.4% (w/v) sucrose) at 28°C on a rotary shaker at 200 rpm.
Construction of S. reilianum strains
Polymerase chain reactions were performed using the Phusion High-Fidelity DNA Polymerase (New England Biolabs). Templates were either JS161 genomic or indicated plasmid DNA. Restriction enzymes were obtained from New England Biolabs. Protoplastmediated transformation was used to transform S. reilianum following a method established for U. maydis (Schulz et al. 1990). Transformants were selected on RegAgar plates (1.0 % (w/v) yeast extract, 0.4 % (w/v) Bacto-Pepton, 0.4 % (w/v) Sucrose, 1 M Sorbitol, 1.5 % (w/v) Bactoagar) supplemented with 200 μg/mL Geneticin and true resistance was tested by growing single colonies on PD plates (3.9 % (w/v) Potato-Dextrose Agar, 1 % (v/v) Tris-HCl (1M, pH 8.0)) supplemented with 50 μg/mL Geneticin. Gene replacements with resistance markers were generated with a PCR-based method employing the previously described SfiI insertion cassette system (Brachmann et al. 2004; Kämper 2004) and were confirmed by Southern blot analysis. Genomic regions residing about 1 kb upstream (left border) or downstream (right border) adjacent to open reading frames of candidate genes were PCR-amplified using the listed primer pairs (supplementary table S5) and genomic DNA of JS161 as template. The resulting fragments were used for cloning plasmids containing the respective deletion constructs.
To obtain deletion constructs for the genes sr10529 and sr14347, PCR fragments containing the left and right borders of each gene were ligated to the hygromycin resistance cassette of pBS-hhn (Kämper 2004) via SfiI restriction sites and cloned into pCRII-TOPO (Life Technologies) to generate pTOPO Δsr10529 #1 and pTOPO Δsr14347 #1, respectively. Since the use of Geneticin as selection marker resulted in much less false positive transformants compared to the use of Hygromycin B, the hygromycin resistance cassettes in these plasmids were replaced by the Geneticin resistance cassette of pUMA 1057 (Brachmann et al. 2004) by ligation via SfiI restriction sites, yielding plasmids pTOPO Δsr10529 G418 and pTOPO Δsr14347 Gen #1, respectively. Deletion constructs were PCR-amplified from plasmids pTOPO Δsr10529 G418 and pTOPO Δsr14347 Gen #1 using the listed primers (supplementary table S5) and used to transform the S. reilianum strain JS161 to generate the gene deletion strains JS161Δsr10529 and JS161Δsr14347, respectively.
The drag and drop cloning method in yeast (Jansen et al. 2005) was used to generate plasmids pRS426 Δsr12968 Hyg #1, pRS426 Δsr14944 Hyg #2, pRS426 Δsr10059 Hyg #1, pRS426 Δsr10182 Hyg #1, pRS426 Δsr14558 Hyg #1 and pRS426 Δsr12897 Hyg #1 which contain deletion constructs for deleting the candidate genes sr12968, sr14944, sr10059, sr10182, sr14558 or sr12897. These plasmids are a derivate of plasmid pRS426, which can be maintained in E. coli and S. cerevisiae (Sikorski and Hieter 1989). PCR-amplified left and right borders of each candidate gene and the hygromycin resistance cassette were integrated in pRS426 by homologous recombination in S. cerevisiae. Subsequently, the hygromycin resistance cassette was replaced with the Geneticin resistance cassette by ligation via SfiI restriction sites, yielding plasmids pRS426 Δsr12968 Gen #1, pRS426 Δsr14944 Gen #3, pRS426 Δsr10059 Gen #1, pRS426 Δsr10182 Gen #1, pRS426 Δsr14558 Gen #1 and pRS426 Δsr12897 Gen #5, respectively. Gene deletion constructs were PCR-amplified from the respective plasmid using listed primers (supplementary table S5). The obtained deletion constructs were used to transform the S. reilianum strain JS161 to generate the gene deletion strains JS161Δsr12968, JS161Δsr14944, JS161Δsr10059, JS161Δsr10182, JS161Δsr14558 and JS161Δsr12897, respectively.
The drag and drop cloning method was also used to generate plasmid pRS426 Δsr12084 Gen #1. PCR-amplified left and right borders of sr12084 and the Geniticin resistance cassette were integrated in pRS426 by homologous recombination in S. cerevisiae. The gene deletion construct for deleting the candidate gene sr12084 was PCR-amplified from plasmid pRS426 Δsr12084 Gen #1 using primers sr12084_lb_fw/sr12084_rb_rv and transformed into the S. reilianum strain JS161 to generate the gene deletion strain JS161Δsr12084.
Virulence assays
The solopathogenic strain JS161 and derivatives thereof were grown in YEPSlight liquid medium to an optical density at 600 nm (OD600) of 0.8 - 1.0 and cell cultures were adjusted to an OD600 of 1.0 with sterile water prior to injection into one-week old maize (Zea mays) seedlings of the dwarf cultivar ‘Gaspe Flint’ (kindly provided by B. Burr, Brookhaven National Laboratories and maintained by self-pollination). Plants were sowed in T-type soil of ‘Fruhstorfer Pikiererde’ (HAWITA, Vechta, Germany) and grown in a temperature-controlled greenhouse (14h-/10h-light/dark cycle, with 28/20°C and 25,000 – 90,000 lux during the light period). Virulence symptoms were scored nine to ten weeks post infection according to previously described symptoms (Ghareeb et al. 2011) and the following categories were distinguished: the plant did not develop ears, the plant developed healthy ears shorter or equal to 1 cm or the plant developed healthy ears longer than 1 cm, the plant developed spiky ears, phyllody in ears or phyllody in tassels. Spore formation was only observed occasionally, and rarely the plant died due to the infection. Three independent infections were carried out per strain, mock treated plants were infected with water as control and at least three independent deletion strains were tested for virulence.
Results
Candidate effector genes are less conserved between species compared to other genes
We reconstructed families of homologous genes for the five smut fungi U. hordei, U. maydis, S. scitamineum, S. reilianum f. sp. zeae and S. reilianum f. sp. reilianum using the SiLiX clustering algorithm (Miele et al. 2011). We optimized the clustering parameters to maximize the occurrence of orthologues and minimize the number of paralogues within each family. In this way, we were able to reconstruct 8,761 families, among which 5,266 had at least one gene in each species (supplementary table S6). As a consequence, we found at least one homologous sequence in four species for 78 % of all genes. 5,254 gene families are found to have exactly one member in each species and were therefore taken as true orthologues (referred to as “core orthologous set” in the following). Considering that secreted proteins are putative effectors, we used SignalP (Petersen et al. 2011) to predict secretion of the encoded protein for each gene (supplementary table S1). We report that 920 (11 %) families contained only genes encoding a predicted secreted protein, while 7,657 (87 %) contained only genes encoding a protein not predicted to be secreted. The remaining 184 (2 %) families contained both predicted secreted and cytoplasmic proteins (supplementary table S6). The occurrence of families with both secreted and cytoplasmic proteins can be explained by (1) false negative predictions for secretion, as truncated C-terminal sequences were not removed from the data set, (2) wrong gene annotations or (3) gain or loss of a secretion signal peptide during effector evolution (Poppe et al. 2015). Among all predicted secreted proteins, 52 % have at least one orthologue in all other species, which is significantly less than the global 78 % proportion for all proteins (Chi-squared test, P value < 2.2×10−16). Genes encoding putative effector proteins are therefore less conserved across species than other genes, either because their sequence is evolving faster, preventing the recovery of homologous relationships, or because effector genes are created or lost at a higher rate. In U. hordei, we observe several species-specific family expansions. There were 17 families which encompassed five to 25 members, but no orthologue in other species (supplementary table S6). Moreover, we identified three families with up to 62 members in U. hordei, but only one member in up to three of the other species (supplementary table S6). Gene duplications in U. hordei have been hypothesized to be driven by mobile elements (Laurie et al. 2012).
The genomes of S. reilianum f. sp. zeae and S. reilianum f. sp. reilianum diverged around one million years ago
To establish a frame for our comparative analysis we first calculated sequence similarity of the five smut fungi for non-coding intergenic and protein sequences. In addition, we estimated divergence times by performing a molecular dating analysis based on the core orthologues set of the five pathogens, using advanced models of protein sequence evolution and Bayesian inference as implemented in the PhyloBayes package (Lartillot et al. 2009). As calibration point, we used the divergence time of U. hordei and U. maydis, previously estimated to be between 27 and 21 Myr (Bakkeren and Kronstad, 2007). In alignable intergenic regions U. hordei shares 57 % identity with S. reilianum f. sp. reilianum and 77 % identity in protein sequences (fig. 1B). Monte-Carlo Markov chains were run for three independent gene samples totaling more than 20,000 amino acid positions each, and two chains were run in each case to assess convergence. The resulting posterior distribution of divergence times were used to infer 95 % posterior intervals. The split between U. maydis and the Sporisorium species was estimated to have occurred around 20 Myr ago (95 % posterior interval 25 to 12 Myr; fig 1A and supplementary table S2). Sporisorium reilianum f. sp. reilianum shares 61 % nucleotide identity in alignable intergenic regions with U. maydis, and 79 % sequence identity at the protein level (fig. 1B). The divergence times of S. scitamineum and the two formae speciales of S. reilianum were calculated to be 13 Myr ago (95 % posterior interval 19 to 7 Myr; fig. 1A and supplementary table S2), which is consistent with the mean divergence estimated between the hosts sorghum and sugarcane (10 Myr with a posterior interval of 8 to 13 Myr, average over eight studies, source: timetree.org (Kumar et al. 2017)). Sporisorium reilianum f. sp. reilianum and S. scitamineum share 74 % non-coding nucleotide identity and 88 % identity at the protein level (fig. 1B). Finally, the two S. reilianum strains diverged 1.1 Myr ago (95 % posterior interval 2.4 to 0.4 Myr; fig. 1A) and share 98 % noncoding nucleotide identity and 99 % protein identity (fig. 1B). We note that the estimation of this divergence date varied with the gene set used, and was in some cases found to be older (1.7 Myr, with a 95 % posterior interval of 4.7 to 0.6 Myr, see supplementary table S2). The comparison of the five smut genomes therefore encompasses a broad evolutionary time, and the divergence times obtained are compatible with previous estimates from smaller data sets (Munkacsi et al., 2007). The speciation times of the investigated smut species largely predate the 10,000 years of crop plant domestication, which implies that adaptation to the agricultural host, if any, will be negligible when interpreting the inter-specific patterns of sequence divergence, as it represents a marginal proportion of the time since the divergence from the ancestral species.
Sporisorium reilianum contains the largest number of positively selected genes
To detect positive selection, 6,205 families with at least three members (orthologues and/or paralogues, see supplementary table S6) were, regardless of their species composition, aligned on the codon and amino acid level and a phylogenetic tree was inferred. Obtaining accurate alignments is critical for detecting positive selection since alignment errors frequently inflate the false discovery rate (Schneider et al. 2009; Jordan and Goldman 2012). We therefore developed a stringent bioinformatics pipeline for the filtering of sequence alignments by masking ambiguous alignment positions for further analysis (see Methods). To scan for positive selection, we employed a non-homogeneous model of sequence evolution allowing dN/dS ratios to vary along the phylogeny, in combination with two heuristic model selection procedures to avoid over-parametrization issues (Nielsen and Yang 1998; Dutheil et al. 2012). Model parameters could not be fitted by either one of the two methods in 1.7% of branches. The two model selection procedures led to highly consistent estimates of branch-specific dN/dS ratios (Spearman’s rank correlation coefficient equal to 0.85, p-value < 2.2×10−16). The distribution of dN / dS was highly skewed with a median value of 0.06, demonstrating the strong predominance of purifying selection throughout lineages and genes. The mean value of dN / dS ratios for lineages undergoing positive selection (dN / dS >1) was 4.1 (median 1.9). While a dN / dS ratio above one is indicative of positive selection, the absolute value of the ratio is a poor indicator of the strength of undergoing selection. In particular, high ratio values can be obtained because of low dS values, and the dN rate might include neutral substitutions, such as non-synonymous substitution that are conservative regarding certain biochemical properties of the amino-acids involved (Sainudiin et al. 2005). The largest number of genes with signs of positive selection was found in S. reilianum f. sp. zeae (84 genes, of which 25 encode predicted secreted proteins) and S. reilianum f. sp. reilianum (111 genes of which 27 encode predicted secreted proteins) (fig. 1C). In addition, a substantial number of positively selected candidate genes was also found in U. hordei (49, and of these, 22 genes are predicted to code for secreted proteins), but only very few in U. maydis (2 genes) and S. scitamineum (7 genes) (fig. 1C). A list of all proteins with their associated dN/dS ratios in each species is provided in supplementary table S3. Predicted secreted proteins were significantly enriched in the group of proteins under positive selection in U. hordei and in the two investigated formae speciales of S. reilianum (P values < 10−5; Fisher’s exact test). This corroborates results of earlier studies in other pathosystems that showed that predicted secreted proteins are often under positive selection, which can be attributed to their direct interaction with host proteins (Joly et al., 2010; Wicker et al. 2013; Poppe et al., 2015). Notably, all genes found under positive selection in the two strains of S. reilianum, in S. scitamineum and in U. maydis share orthologous genes in the other species (supplementary tables S3 and S6). In contrast, genes with signs of positive selection in U. hordei belong largely (36 out of 49 genes) to families showing species-specific expansions (supplementary tables S3 and S6).
Genes under positive selection in U. hordei are associated with uncharacterized interspersed repeats
Among the species compared here, the genome of U. hordei shows the highest fraction of repetitive elements (Laurie et al. 2012; Dutheil et al. 2016). Such elements are known to contribute to gene family expansions (Kazazian 2004), and have been suggested to contribute to adaptation by providing advantageous mutations, for instance by repeat-induced point mutations (RIP) leakage which was revealed in a species complex of Leptosphaeria (Rouxel et al. 2011; Grandaubert et al., 2014). As sequence signatures of RIP were found in LTR elements of U. hordei (Laurie et al., 2012), we tested whether genes under positive selection in U. hordei are physically associated with repetitive elements. We performed a binary logistic regression with the prediction of positive selection as a response variable (that is, whether the underlying branch has a dN/dS ratio higher than one) and we considered three putative explanatory variables for each analyzed gene: (1) whether the gene is predicted to encode a secreted protein, (2) whether the gene is duplicated and (3) the distance of the gene to the closest interspersed repeat. The complete linear model explains 50 % of the observed variance, and the three explanatory variables are all significant at the 0.1 % level (supplementary table S7). These results suggest that positively selected genes in U. hordei are associated with duplication events, and positive selection is more likely to occur at genes encoding putative effectors. In addition, the proximity of interspersed repeats increases the odds of positive selection, independently of the two other effects, and is confirmed by a stratification approach: the effect still holds when only duplicated genes are considered, or only genes encoding a secreted protein, or the combination of the two (supplementary table S7). This finding corroborates previous results obtained in other microbial plant pathogens where it was described that effector genes tend to localize in repeat rich regions and where it was suggested that such regions contribute to the rapid evolution of effector genes (Raffaele and Kamoun, 2012).
Positively selected genes encoding cytoplasmic proteins in S. reilianum and U. hordei
While we expect effector genes to be under positive selection, we find that the majority of positively selected genes in S. reilianum encodes cytoplasmic proteins (fig. 1C). To assess the putative functional role of these genes, we performed a Gene Ontology term enrichment analysis, comparing cytoplasmic proteins under positive selection to cytoplasmic proteins not under positive selection (table 1). This analysis revealed that genes with a potential role in metabolic processes, like sulfur compound metabolism, molybdopterin cofactor metabolic process, RNA metabolic process, organic cyclic compound metabolic process and oxidoreductase activity, as well as responses to starvation and extracellular stimuli are significantly overrepresented at the 5% level (Fisher’s classic test with Parent-Child correction; see table 1). This could indicate that cytoplasmic proteins under positive selection contribute to metabolic changes which might be needed to survive with the limited nutrients available on the surface or in the biotrophic interface of different host plants. A similar analysis for cytoplasmic proteins under positive selection in U. hordei was conducted, but only led to top-level categories (DNA integration, DNA metabolic process and isomerase activity; see table 1).
Virulence contribution of effector genes showing signs of positive selection in S. reilianum
Candidate effector genes inferred to be under positive selection in a particular species could play a critical role in pathogenicity. Therefore, we sought to assess the contribution to virulence of such candidate genes by creating individual deletion mutants. In total, we tested nine candidate genes with high dN/dS ratios and predicted to encode secreted proteins: three with signatures of positive selection only in S. reilianum f. sp. zeae, three with signatures of positive selection in S. reilianum f. sp. zeae as well as in S. reilianum f. sp. reilianum and three with signatures of positive selection only in S. reilianum f. sp. reilianum. All nine chosen candidate genes together with their characteristics are summarized in table 2. Deletion mutants were generated in the haploid solopathogenic strain JS161 of S. reilianum f. sp. zeae. This strain is capable of colonizing maize plants and cause disease without a compatible mating partner (Schirawski et al. 2010) but virulence is much reduced relative to infection with mating-compatible wild-type strains and spores are only rarely produced. Deletion mutants were also generated in strain JS161 in cases where positive selection was only detected in S. reilianum f. sp. reilianum (table 2), because no solopathogenic strain is presently available for S. reilianum f. sp. reilianum. For each gene at least three independent deletion mutants were generated and tested for virulence. To determine virulence, Gaspe Flint, a dwarf variety of corn, was infected and symptoms were scored in male and female flowers (fig. 2). Only the deletion of sr10529, a gene showing positive selection in both formae speciales of S. reilianum, showed a strong reduction in virulence (table 2 and fig. 2). The gene sr10529 in S. reilianum f. sp. zeae is orthologous to the previously identified and characterized gene pit2(UMAG_01375) in U. maydis.
Pit2 plays an essential role in virulence as inhibitor of a group of maize papain-like cysteine proteases that are secreted to the apoplast (Doehlemann et al. 2011; Mueller et al. 2013).Previous work identified a conserved domain of 14 amino acids (PID14) in Pit2 as required and sufficient for the inhibition of maize cysteine proteases (Mueller et al. 2013). When the branch-site model of PAML 4 (Yang 2007) was used to identify amino acid residues under positive selection in the Pit2 orthologues of the two S. reilianum species, only two residues residing in the PID14 domain were found under positive selection. However, 24 positively selected residues were detected outside this domain in the 57 amino acid long C-terminal part (fig. 3).
Discussion
We used evolutionary comparative genomics of five related smut fungi infecting four different host plants to identify genes with a signature of positive selection during species divergence, with a focus on genes encoding predicted secreted proteins, as such genes were suggested to contribute to virulence in various plant pathogenic microbes (Aguileta et al. 2010; Stukenbrock et al. 2011; Hacquard et al. 2012; Dong et al. 2014; Huang et al. 2014; Sharma et al. 2014) Our analysis revealed that positive selection is found between paralogous genes in U. hordei, where they belong to families with species-specific expansions. In contrast, genes under positive selection in the other four species belong to families of orthologous sequences. While we find evidence for a large set of genes under positive selection in the S. reilianum species, signatures for positive selection are hardly detectable in the more distant relatives U. hordei, U. maydis and S. scitamineum that diverged earlier. Finding evidence for positive selection over time spans of several millions of years is notoriously difficult (Gillespie 1994) because of two main reasons: (1) periods where genes are evolving under positive selection may occur episodically and may be followed by long episodes of purifying selection, leading to an average dN/dS below 1 on long periods of time and (2) fast evolving genes may diverge to an extent where their homology is difficult to infer and where they can no longer be aligned reliably. To overcome this problem, more genome information of species with intermediate branching points is needed (Gillespie 1994).
Predicted secreted proteins were about three times overrepresented in the set of positively selected genes, which illustrates the importance of secreted proteins in adaptation processes of smut fungi. This also corroborates results in other plant pathogenic microbes like Melampsora sp., Z. tritici and the wheat powdery mildew Blumeria graminis (Joly et al. 2010; Stukenbrock et al. 2011; Wicker et al. 2013). However, the majority of positively selected genes encodes cytoplasmic proteins (fig. 1C), suggesting that both secreted and non-secreted proteins are important targets of adaptation. A Gene Ontology analysis in S. reilianum showed that mainly processes related to metabolism and its regulation as well as responses to starvation and external stimuli are enriched in cytoplasmic proteins under positive selection. This points at a role of these proteins in adaptation to differences in nutrient availability in the respective host plants maize and sorghum as well as responses to cues originating from the respective host (Haueisen and Stukenbrock 2016). A study conducted in U. maydis has shown that the fungus induces major metabolic changes in the host plant upon infection during establishment of biotrophy and undergoes a series of developmental transitions during host colonization that are likely influenced by the host environment (Doehlemann et al. 2008). It is thus conceivable that the two S. reilianum accessions have adapted to their different hosts that differ significantly for example in their amino acid and vitamin composition (Etuk et al. 2012). Furthermore, recent studies in U. maydis suggested that intracellular changes of metabolism influence virulence, and therefore the underlying proteins could be targets of positive selection (Kretschmer et al. 2012; Goulet et al. 2017).
Out of nine deletions of positively selected genes, only one mutant, lacking sr10529, was affected in virulence. While six of the deleted genes are single genes in S. reilianum f. sp. zeae for which we failed to identify paralogs, sr12084 has two paralogs, sr14347 has five paralogs and sr10182 has ten paralogs. We restricted our analyses to generating deletion mutants in some of the genes under positive selection. This leaves open the possibility that the paralogous genes have redundant functions in virulence. Adapting the CRISPR-Cas9 technology allowing multiplexing (Schuster et al. 2017) to S. reilianum will be instrumental in testing this hypothesis in future studies. Alternatively, the candidate effectors we investigated may be needed under conditions which differ from those tested here. For example, S. reilianum f. sp. zeae can also systemically colonize maize plants via root infection (Mazaheri-Naeini et al. 2015), a colonization route we have not assessed in our experimental setup. Moreover, we employed only one maize cultivar for infection assays. Results from other pathosystems suggest that virulence effects can strongly depend on the host and pathogen genotypes, in particular in the presence of R and avr genes (Petit-Houdenot and Fudal, 2017). No avr-R gene interaction was described so far in the S. reilianum f. sp. zeae-maize pathosystem. Instead, quantitative virulence differences are observed when different host cultivars are infected (Lübberstedt et al. 1999). Knowing the expression profile of effector genes may assist the identification of differences in development of the mutants compared to wild type strains. Since we lack this information, we scored disease symptoms only in the inflorescences about nine weeks after infection. Additionally, it may be possible that small differences in virulence between JS161 and deletion mutants of candidate genes remain undetected due to the weak infection behavior of JS161. For example, spore formation is only rarely observed after infecting maize plants with the solopathogenic strain. In contrast, infections resulting from infections with two compatible haploid strains show spores in about 40 % of the infected plants (Zuther et al. 2012). This means that defects related to spore formation will not be evident in mutants of JS161. In three cases positive selection was detected in orthologous genes in S. reilianum f. sp. reilianum while candidate effector genes were for experimental reasons deleted in S. reilianum f. sp. zeae. Therefore, it cannot be excluded that these effectors might have a virulence function in S. reilianum f. sp. reilianum. In this case, the positively selected effector genes might have evolved during adaptation to the sorghum host and present host specificity genes. In summary, our virulence assays leave open the possibility that the eight candidate genes which did not show a contribution to virulence could play a role in pathogenicity under conditions not tested here. Alternatively, candidate effector proteins might also be positively selected for traits that are not directly linked to pathogenicity. Such traits could for instance involve competition with large numbers of other plant colonizing microbes (Zhan and McDonald 2013; Rovenich et al. 2014). Secreted proteins of S. reilianum could act for example as toxin or could efficiently utilize resources from the environment and thereby limit the growth of other microbes. In these cases, a contribution to virulence is not expected to be observed in the employed infection assay with the effector gene mutants. Moreover, our molecular dating analysis showed that the common ancestors of the investigated smut species originated before the beginning of crop domestication. Therefore, positive selection, whose signs we detect by our approach, has most likely occurred on ancestral host plants and not on the domesticated host maize. Consequently, some of the candidate effector genes under positive selection might not be important for the colonization of crop plants, but for infection of related wild species.
In U. maydis, we note that effector genes residing in clusters whose deletion affected virulence (Kämper et al. 2006; Schirawski et al. 2010) have similar dN/dS ratios as effector genes in clusters where the deletion had no effect on virulence (median dN/dS ratio 0.0619 vs.0.1094; Wilcoxon rank test with P value = 0.1848). Furthermore, orthologues of the effectors Pep1, Stp1 and Cmu1, which were shown to have important roles in pathogenicity of U. maydis (Djamei et al. 2011; Doehlemann et al. 2009; Schipper 2009) showed no signature of positive selection. These observations could suggest that certain fungal effector proteins are under evolutionary constraint and are therefore not free to accumulate non-synonymous mutations. Such effectors are conserved over long time spans (Schirawski et al. 2010; Hemetsberger et al. 2015; Sharma et al. 2015) and this illustrates that they are instrumental for successful infections in a large group of smut fungi. They probably target molecules shared by several host plants, for example housekeeping functions that cannot easily evolve in response to the binding of an effector.
One candidate gene (sr10529) under positive selection in both formae speciales of S. reilianum showed a strong contribution to virulence upon deletion. It is orthologous to the previously described protease inhibitor Pit2 in U. maydis, where the deletion also abolished virulence (Doehlemann et al. 2011; Mueller et al. 2013). Positively selected residues in the PID14 domain of Pit2 might reflect that different proteases need to be inhibited in maize and sorghum. Pit2 might thus contribute to determining the host range of the respective species. A role of cysteine protease inhibitors in host specificity was demonstrated in Phytophthora infestans, a pathogen of potato and its sister species Phytophthora mirabilis, which infects the ornamental plant Mirabilis jalapa. Positively selected orthologous protease inhibitors were shown to inhibit proteases specific to the respective host plants and this specificity could be traced back to a single amino acid substitution (Dong et al. 2014). Surprisingly, 24 positively selected sites in Pit2 were detected outside the PID14 domain in the 57 amino acid long C-terminal part in both S. reilianum f. sp. zeae and S. reilianum f. sp. reilianum. This finding raises the intriguing possibility that the C-terminus of Pit2 might possess a second function that is independent of protease inhibition. Earlier work has shown that the pit1 gene encoding a transmembrane protein is located next to the pit2 effector gene and both genes contribute similarly to virulence (Doehlemann et al. 2011). Furthermore, pit1 and pit2 are divergently transcribed, which makes it likely that the expression of pit1 and pit2 is co-regulated. In addition, this gene arrangement of pit1 and pit2 is conserved in U. hordei, U. maydis, S. scitamineum and S. reilianum (Sharma et al. 2015). This finding has led to the speculation that Pit1 and Pit2 somehow act together to govern virulence of U. maydis and related smut fungi. It was hypothesized that that Pit2 shuttles apoplastic maize proteins towards Pit1, thereby scavenging damage-associated molecules (Doehlemann et al. 2011). In this scenario, the positively selected amino acids in the C-terminus of Pit2 could have been selected for scavenging such molecules as adaptation to the two hosts. In future studies it will be highly interesting to complement the pit2 mutant of S. reilianum f. sp. zeae with the pit2 orthologue of S. reilianum f. sp. reilianum to see if this promotes virulence on sorghum.
Conclusions
Screens for genes with signs of positive selection are commonly used to identify candidate effector genes in various plant pathogenic microbes. However, it is currently largely open whether positively selected effector genes play indeed a role in virulence. Here, we used comparative genomics of five smut fungi and showed that only one out of nine genes under positive selection contributes to virulence of S. reilianum. Moreover, the majority of positively selected genes did not encode predicted secreted proteins. Our results leave open the possibility that many genes with signatures of positive selection contribute to virulence under conditions not tested in this study or are selected in traits that are not directly related to pathogenicity.
Acknowledgements
Our work was supported through the LOEWE program of the state of Hesse through SYNMICRO and through the Max Planck Society. GS was funded by the International Max Planck Research School (IMPRS) for Molecular, Cellular and Environmental Microbiology. We thank Michael Bölker for providing the yeast strain BY4741 and the plasmid pRS426 which were used for cloning purposes, and Benoit Nabholz and Nicolas Lartillot for guidance with the dating analysis. We are grateful to Markus Rampp (Max Planck Computing Center Garching) for assistance with computational work and to Stefan Schmidt for expert greenhouse service, as well as all present and former group members for lively discussions and Eva H. Stukenbrock for her comments on the manuscript.