Abstract
Horizontal Gene Transfer (HGT) in eukaryotic plastid and mitochondrial genomes is frequently observed, and plays an important role in organism evolution. In yeasts, recent mitochondrial HGT has been suggested between S. cerevisiae and S. paradoxus. However, few strains have been explored due to the lack of accurate mitochondrial genome annotations. Mitochondrial genome sequences are important to understand how frequently these introgressions occur and their role in cytonuclear incompatibilities. In fact, most of the Bateson-Dobzhansky-Muller genetic incompatibilities described in yeasts are driven by these cytonuclear incompatibilities. In this study, we explored the mitochondrial inheritance of several Saccharomyces species with worldwide distributions, isolated from different sources and geographic origins. We demonstrated the existence of recombination hotspots in the mitochondrial region COX2-ORF1, likely mediated by the transfer of two different types of ORF1, encoding a free-standing homing endonuclease, or facilitated by AT tandem repeats and GC clusters. These introgressions were shown to occur at both intra- and interspecific levels. Based on our results, we propose a model that involves several ancestral hybridization events among Saccharomyces strains in wild environments.
Introduction
Chloroplast and mitochondrial genomes are prone to introgressions and Horizontal Gene Transfers (HGT) (Keeling 2009; Hao et al. 2010), which likely play an important role in the evolution of eukaryotes (Andersson 2009). Mitochondria are involved in multiple cellular processes (Hatefi 1985; Green and Reed 1998; Starkov 2008). In yeasts, around 750 nuclear encoded proteins must coordinate with those encoded in the mitochondrial genome (Sickmann et al. 2003). Indeed, a new interdisciplinary field, the “mitonuclear ecology”, is devoted to the study of the evolutionary consequences due to mitonuclear conflicts (Hill 2015).
In yeasts, research on recombination in mitochondrial genomes has mostly focused on Saccharomyces cerevisiae (Dujon et al. 1974; Birky et al. 1982; Taylor 1986; MacAlpine et al. 1998), the model yeast species in the genus Saccharomyces (Hittinger 2013). The mechanism of mitochondrial genome inheritance is well known in Saccharomyces (Ling et al. 2007; Basse 2010; Ling et al. 2011). However, despite the recent mitochondrial genome characterization of a hundred S. cerevisiae strains, mostly clinical (Wolters et al. 2015), and a few S. paradoxus strains, and the detection of mitochondrial introgression between those two species (Wu et al. 2015; Wu and Hao 2015), little is known about the mitochondrial genomic properties of most Saccharomyces species.
In this study, we inferred the mitochondrial inheritance of 517 Saccharomyces strains, as well as 49 natural interspecific hybrids, isolated from different sources and geographic origins, by sequencing a mitochondrial gene, COX2. This gene has been used successfully both for phylogenetic purposes (Belloch et al. 2000; Kurtzman and Robnett 2003) and for the identification of mtDNA inheritance (Peris et al. 2012a; Peris et al. 2012b; Badotti et al. 2013; Peris et al. 2014; Pérez-Través et al. 2014; Rodríguez et al. 2014). We extended our work to the downstream gene ORF1, an unstudied gene encoding a putative free-standing homing endonuclease. A homing endonuclease gene (HEG) is a selfish element described to evolve neutrally, following three steps: i) invasion of an empty site (HEG-) (Colleaux et al. 1986; Burt and Koufopanou 2004), ii) accumulation of mutations that generate premature stop codons, or invasion by GC clusters that can disrupt the coding frame, and iii) final loss of the gene (Goddard and Burt 1999). We also sequenced COX3, which encodes cytochrome c oxidase subunit III, to demonstrate the presence of different mitochondrial recombination hotspots in the COX2-ORF1 region. We suggest molecular mechanisms that drive such recombinations. In addition, we propose a reticulate event model for the Saccharomyces mitochondrial genome as a starting hypothesis for future tests, when new complete Saccharomyces mitochondrial genomes become available.
Results
COX2 shows extensive reticulation among Saccharomyces species
To understand the mitochondrial inheritance of wild and domesticated Saccharomyces yeasts with a worldwide distribution (Figure 1), we sequenced the mitochondrial COX2 gene or retrieved its sequence from public databases, generating a set of 566 Saccharomyces strain sequences (Table S1). The COX2 sequence alignment contained 80 phylogenetically informative positions (see details in Supplementary Text). The COX2 phylogenetic tree failed to reconstruct the species tree (Figure S1), likely indicating the existence of conflicting data in our sequence alignment. A Median-Joining network showed thirteen haplogroups, supported by sequence inspection (Figure 2 and S2). S. cerevisiae strains were found in three haplogroups: C1a, C1b and C2. Three haplogroups differentiated S. paradoxus populations: P1 (Europe), P2 (America B and C) and P3 (Far East). Two haplogroups were found for S. mikatae: M1 (IFO1815) and M2 (IFO1816). S. kudriavzevii strains were split into K1 (European and Asia A - IFO1802) and K2 (IFO1803). Finally, each of the remaining species formed a single haplogroup: A (S. arboricola, CBS10644), E (S. eubayanus) and U (S. uvarum).
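As an illustration of how informative positions in an alignment can be counted, the following minimal sketch assumes the usual parsimony-informative definition (a column is informative when at least two character states each occur in at least two sequences). The exact criterion used for the COX2 alignment is given in the Supplementary Text, so this definition, the function name and the toy alignment are assumptions made only for the example.

```python
from collections import Counter


def parsimony_informative_sites(alignment, states="ACGT"):
    """Return 0-based indices of columns where at least two states
    each occur in at least two sequences."""
    length = len(alignment[0])
    if any(len(seq) != length for seq in alignment):
        raise ValueError("all sequences must have the same aligned length")
    informative = []
    for col in range(length):
        # Count only unambiguous nucleotides; gaps and ambiguity codes are ignored.
        counts = Counter(seq[col] for seq in alignment if seq[col] in states)
        shared = [base for base, n in counts.items() if n >= 2]
        if len(shared) >= 2:
            informative.append(col)
    return informative


# Hypothetical toy alignment, not data from this study.
toy_alignment = ["ACGTA", "ACGTT", "ATGAT", "ATGAA"]
print(parsimony_informative_sites(toy_alignment))  # prints [1, 3, 4]
```

Columns with a single variant sequence (singletons) are variable but not informative under this definition, which is why the count of informative positions is usually smaller than the count of polymorphic sites.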
Extensive analysis showed the presence of several recombinant haplotypes among strains from different species (Figure S2). New recombinations were identified in Far Eastern S. paradoxus (haplotype 64) and S. cerevisiae strains (haplotypes 2, 60 and 94) (Figure 2 and S2). We segmented the original COX2 alignment based on the most common recombination point (Figure S2), and a Neighbor-Net (NN) phylogenetic network was reconstructed for each COX2 segment (Figure S3). The 5’ end segment NN phylonetwork shows a clear differentiation of haplotypes by species, with the exception of haplotypes 69, 70 and 71 belonging to American S. paradoxus, which share identical sequences with some S. cerevisiae strains (Figure S3A). Haplotypes from natural hybrids (haplotypes 78, 79, 87, 88, 89 and 93) and S. eubayanus (haplotype 102) still appear in an ambiguous position in the network due to the presence of a different recombination point (Figure S2). The extensive incongruence found in the 3’ end segment of COX2 (Figure S3B) might also be due to recombination.
We were able to identify the potential donors for several hybrid haplotypes (COX2 haplotypes 87, 88 and 89): S. kudriavzevii (European or Asian) and a European S. paradoxus (Figure S2). S. cerevisiae x S. kudriavzevii hybrids isolated from wine (UvCEG) and a dietary supplement (IF6), and a S. cerevisiae x S. uvarum hybrid isolated from wine (S6U), inherited an already recombinant mitochondrial genome from S. cerevisiae (Figure S2, Table S1). The haplotypes of S. eubayanus x S. uvarum and S. cerevisiae x S. eubayanus hybrids (haplotypes 78, 79, 93) were also recombinant between S. eubayanus and S. uvarum.
S. mikatae IFO1815 and S. kudriavzevii IFO1803 haplotypes were also potentially recombinant (Figure S2). In the case of IFO1803, amino acid sequences indicate a recombination between S. kudriavzevii and S. uvarum, although the S. uvarum haplotype corresponds to an unknown strain. For S. mikatae IFO1815, a potential donor might be a S. cerevisiae strain from haplogroup C1b, but its amino acid sequence is similar to that of IFO1816.
All S. cerevisiae COX2 haplogroups are worldwide distributed
The recombination hotspot located in COX2 makes this gene a good candidate to differentiate closely related strains. Those strains sharing a similar recombination are expected to share an ancestor. To describe the phylogeography of S. cerevisiae, we explored the polymorphic COX2 gene of 418 S. cerevisiae from 7 continents, isolated from both human-associated (baking, beer, clinical, laboratory, fermentation, sake, wine and traditional alcoholic beverages) and wild environments (Figure 1, Table S1).
An association study between S. cerevisiae strain origins and their haplogroup distribution was performed (Figure S4A). Clinical and wine samples were significantly associated with haplogroup C2 (χ² test p-values 9.9 × 10⁻⁵ and 3 × 10⁻⁴, respectively). We applied a similar approach to infer the distribution of haplogroups taking into account their geographic origins (Figure S4B), revealing that European S. cerevisiae strains were highly associated with haplogroup C2 (χ² test p-value 9.9 × 10⁻⁵). No significant distribution bias was observed in the remaining strains, possibly due to the low number of wild isolates in our study.
Two types of the putative homing endonuclease ORF1 define two S. cerevisiae Haplogroups
To define the extension of the recombination, we sequenced the ORF1 region from 36 strains representative of the most frequent COX2 haplotypes, exploring a total of seventy-two ORF1 sequences (Table S1). The presence of ORF1 gene was confirmed for the six available Saccharomyces species, indicating its broad dissemination across the genus. ORF1 length was found to be highly polymorphic among strains and species, mostly due to differing content of AT-rich tandem repeats and GC clusters (see Supplementary Text, Figure S5), with lengths ranging from 1.3kbp (ZA17) to 1.5kbp (VRB).
The ORF1 phylogenetic tree (Figure S5) conflicts with the species tree (Figure S1), indicating a potential HGT from one species to another or the presence of recombinant sequences. To visualize the presence of conflicting data in the alignment, we reconstructed a NN phylogenetic network (Figure 3). Two groups of sequences were visualized in this network: the Type I group contains most Saccharomyces ORF1 sequences, except for sequences of S. kudriavzevii. The Type II group is comprised of S. cerevisiae haplotypes from the COX2 haplogroup C2 and the sequence of the Far Eastern S. paradoxus CECT11152 (M51, syn. IFO1804), isolated from Far East Russia, which also appeared as having a recombinant COX2 (Figure S2 and S3). Haplotypes corresponding to strains of S. cerevisiae, some European and Far Eastern S. paradoxus, and S. kudriavzevii, as well as some hybrids, were located in an ambiguous position in the phylonetwork, between the two main ORF1 groups (Type I and Type II), suggesting they correspond to recombinant forms.
Recombination analyses, performed with both the RDP and GARD programs, supported the presence of four partitions. A Kishino-Hasegawa test of phylogenetic congruence indicated that phylogenetic trees from each partition were incongruent with each other. The best partition model had 4 partitions (ΔAICc 143.724), and for each breakpoint the p-value was below 0.01. However, RDP and GARD disagreed on the location of the second breakpoint due to the presence of different recombination points, depending on the strain (Figure S6 and S7B). Some recombination breakpoints were located close to A+T-rich sequences or GC clusters (see Supplementary Text, Figure 4). One of the recombination points was located right at the beginning of the second LAGLIDADG domain of the encoded homing endonuclease (Figure 4). At least one of the recombination events involved some of the haplotypes located between the two main types in the ORF1 phylogeny (Figure S6 and S7). Further recombinations between Type I x Type I, Type I x Type II and Type II x Type II were detected (Figure S6 and S7).
The data indicate a recent recombination event between ORF1 from S. kudriavzevii and Far Eastern S. paradoxus, supported by the high identity between S. kudriavzevii and Far Eastern S. paradoxus (CECT11422 and CECT11424) ORF1 amino acid sequences for two segments of the alignment (Figure S6 and S7). For example, the genetic distance for the second region was 1.1% (S. par FE-S. kud IFO1802) and 2.4% (S. par FE-S. kud EU), much lower than the nuclear genetic differences, 13.66% and 13.55%, respectively, and much lower than the distance for the fourth ORF1 segment (Figure S6 and S7): 46.05% (S. par FE-S. kud IFO1802) and 47.91% (S. par FE-S. kud EU).
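The per-segment comparison above can be sketched as follows. This is only an illustration, assuming an uncorrected p-distance (the proportion of differing aligned positions); the distance measure actually used is not stated in this section, and the sequences and segment boundaries below are hypothetical.

```python
def p_distance(seq_a, seq_b, missing="-?"):
    """Uncorrected proportion of differing positions between two aligned
    sequences, skipping columns where either sequence has a missing symbol."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must have the same aligned length")
    compared = 0
    differences = 0
    for a, b in zip(seq_a, seq_b):
        if a in missing or b in missing:
            continue
        compared += 1
        if a != b:
            differences += 1
    if compared == 0:
        raise ValueError("no comparable positions in this segment")
    return differences / compared


def segment_distances(seq_a, seq_b, boundaries):
    """Compute the p-distance for each (start, end) segment,
    given as 1-based inclusive alignment coordinates."""
    return [p_distance(seq_a[start - 1:end], seq_b[start - 1:end])
            for start, end in boundaries]


# Hypothetical aligned sequences and segment boundaries, for illustration only.
strain_1 = "ACGTACGTAC"
strain_2 = "ACGTTTGTAC"
for (start, end), d in zip([(1, 4), (5, 10)],
                           segment_distances(strain_1, strain_2, [(1, 4), (5, 10)])):
    print(f"segment {start}-{end}: {100 * d:.1f}% divergence")
```

A sharp drop in divergence in one segment relative to its neighbours, as in the toy output, is the kind of pattern that points to a transferred region rather than uniform divergence.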
Particular attention must be given to the strains CECT11757 and L1528, representative of the C2xP3 recombinant COX2 haplogroup. Their COX2 sequence suggested that their ORF1 sequence should be closely related to type II; however, their ORF1 sequence was in fact related to type I. This result suggests that a second recombination event occurred in L1528, whose recombinant ORF1 contains a type II sequence from the second to the last segment (Figure S6 and S7). However, the case of CECT11757 ORF1 seems more complex. This ORF1 is a type I closely related to that from European S. paradoxus, which suggests a second recombination event with a European S. paradoxus ORF1 while maintaining the typical COX2 3’ end region of a Far Eastern S. paradoxus. Another scenario might be that the COX2 3’ end evolves under a different substitution rate, leading to homoplasy in that region, according to a “patchy-tachy” model (Sun et al. 2011). For this reason we extended the analysis to the ORF1 region, where most of the recombinations were well supported by the invasion of ORF1 or mediated by highly recombinogenic regions.
COX3 supports a mtDNA transfer from S. cerevisiae to American S. paradoxus strains and HGT/introgression in the COX2-ORF1 region
To improve the species assignment based on a mitochondrial gene, we sequenced the COX3 gene for those strains representative of the most frequent COX2 haplotypes (Table S1). The selection of COX3 for species assignment was based on its lack of introns or an overlapping homing endonuclease, minimizing a potential recombinant scenario.
The COX3 NJ phylogenetic tree was congruent with the species tree (Figure S1 and S8), except for American S. paradoxus. In addition, the position of S. mikatae was not well resolved. The COX3 MJ phylogenetic network assigned species by haplogroups (Figure 5). We could not define the S. eubayanus haplogroup because its COX3 sequence was not available. We found a high number of nucleotide substitutions in the S. cerevisiae COX3 sequences retrieved from Saccharomyces Genome Resequencing Project (SGRP) (H4, H9-11), indicative of assembling errors in these mitochondrial genomes; nevertheless, the topologies of the MJ network and the NJ tree were not affected.
The CECT11152 COX3 sequence was located within S. paradoxus haplogroup. This result confirms an HGT event involving the 3’ end COX2 and ORF1 region (Figure S3B), but not affecting the rest of the mitochondrial genome (Figure S8). A similar conclusion is reached for S. cerevisiae x S. kudriavzevii hybrids (CECT1102, CECT11011, CECT1990), where most of the mitochondrial genome is from S. kudriavzevii. The two representative strains from the COX2 haplogroup C2 (L1528 and CECT11757) possess an ORF1 closely related to the European S. paradoxus CECT10380 but have a S. cerevisiae COX3 sequence type, supporting another HGT for the COX2-ORF1 region (Figure 5 and S8).
Our two representatives from the American S. paradoxus, 120MX and CBS5313, displaying an identical 5’ end COX2 segment sequence to S. cerevisiae sequences (Figure S2A), also had a COX3 sequence closely related to the S. cerevisiae haplogroup. The COX3 haplotype 17 of 120MX was identical to S. cerevisiae COX3 sequences (Figure 5 and S8). This result suggests that the American S. paradoxus likely inherited a mitochondrial genome from S. cerevisiae.
Discussion
Recombinations in the COX2-ORF1 region might be mediated by ORF1, GC clusters and/or AT tandem repeats
Evidence of intraspecific mitochondrial recombination among S. cerevisiae strains has been shown (Dujon et al. 1974; Nunnari et al. 1997; Berger and Yaffe 2000). Indeed, a complete DNA recombination map in S. cerevisiae has been recently drawn (Fritsch et al. 2014). Mitochondrial recombination is initiated by a double-strand break (DSB) generated and resolved by mainly four nuclear encoded proteins (Lockshon et al. 1995; Ling and Shibata 2002; Ling et al. 2007; Ling et al. 2013) and it can be facilitated by the mitochondrial genome architecture, presence of GC clusters, or by the mobility of mitochondrial elements, such as introns or homing endonucleases (Dieckmann and Gandy 1987; Séraphin et al. 1987; Yang et al. 1998). We found that two COX2 S. cerevisiae and S. paradoxus groups of sequences are mainly driven by the presence of two highly divergent ORF1 gene sequences, type I and type II. The presence of two clear COX2 groups might be the result of the ORF1 invasion.
A recent study, in which fourteen S. cerevisiae and S. paradoxus mitochondrial genomes were sequenced, found HGTs between those two species mainly facilitated by GC clusters (Wu et al. 2015; Wu and Hao 2015). Our results showed recombination breakpoints close to GC clusters and AT tandem repeats in wild Saccharomyces strains, supporting the idea that these genomic elements facilitate mitochondrial recombination events. All three elements, GC clusters, AT tandem repeats and homing endonucleases, might be responsible for facilitating recombination in mitochondria.
The clearest example of ORF1 transfer was between Far Eastern S. paradoxus CECT11152 and S. cerevisiae, which is in agreement with the described HGT of GC cluster GC48 between S. cerevisiae and IFO1804 (CECT11152) (Wu and Hao 2015), suggesting that both transfers likely occurred at the same time. For this reason, we propose that ORF1 is likely a functional homing endonuclease similar to other S. cerevisiae free-standing homing endonucleases, such as ORF3 (ENS2) (Séraphin et al. 1987; Nakagawa et al. 1992). Following nomenclature rules (Belfort and Roberts 1997), we suggest renaming ORF1 to F-SceIIIα or F-SceIIIβ for S. cerevisiae, and replacing the two-letter species abbreviation with the first two letters of the species name for the ORF1 of each other species. Here “F” stands for freestanding homing endonuclease, “S” for Saccharomyces and “ce” for cerevisiae, and “III” is used because F-SceI and F-SceII are designated for Endo.SceI (ENS2, ORF3) and HO endo, respectively. The type I and type II group membership is designated by the suffixes α and β, respectively.
Wine S. cerevisiae domestication bottleneck fixed a mitochondrial variant
Wild Saccharomyces yeasts reproduce mainly by mitotic divisions (Tsai et al. 2008; Liti et al. 2009); however, some industrial strains have been shown to be hybrids (Lopandic et al. 2007; González et al. 2008; Sipiczki 2008; Dunn and Sherlock 2008; Peris et al. 2012c; Pérez-Través et al. 2014) or to have experienced some degree of sexual mating with closely related or fairly distant species, as supported by the presence of nuclear genome introgressions (Liti et al. 2006; Novo et al. 2009; Dunn et al. 2012; Almeida et al. 2014). The result of hybridization and HGT in industrial conditions can be adaptive (Belloch et al. 2008; Gibson et al. 2013). Most of these studies have focused on the nuclear genome, and little is known about the mitochondrial genotype of wild and industrial strains.
The low mutation rate of yeast’s mitochondrial coding sequences (Clark-Walker 1991) and assembling method improvements are turning the attention to the mitochondria, which have been recently used to estimate time divergence among Schizosaccharomyces pombe strains (Jeffares et al. 2015), and to infer the population structure of 100 S. cerevisiae (Wolters et al. 2015). Although the presence of recombinations or the utilization of highly polymorphic sequences is not well suited for time divergence estimation, mitochondrial sequence is useful for tracing evolutionary relationships between closely related strains (Bartelli et al. 2013; Wolters et al. 2015).
Two independent S. cerevisiae domestication events from wild isolates have been inferred (Legras et al. 2007), one for sake and another for wine strains (Fay and Benavides 2005; Liti et al. 2009; Schacherer et al. 2009). Indeed, S. cerevisiae wine domestication is thought to have originated in the Near East (Fay and Benavides 2005; Liti et al. 2009), probably from the wild S. cerevisiae stock from the Asian continent (Wang et al. 2012). The high frequency of European wine strains with haplogroup C2 suggests that the bottleneck during the domestication to winemaking fixed few COX2 variants from haplogroup C2. At the nuclear level, similar results were observed for the wine/European strains (Liti et al. 2009; Schacherer et al. 2009). Most clinical samples are derived from wine isolates, as the mitochondrial haplogroup indicates, in agreement with previous results of the 100 S. cerevisiae project (Strope et al. 2015; Wolters et al. 2015). With the expansion of domesticated wine S. cerevisiae strains together with vineyards throughout Europe by Phoenicians and Romans, and the migration to America after European colonization, these wine S. cerevisiae ORF1 sequences were able to recombine with other S. cerevisiae ORF1 sequences. Wild and wine S. cerevisiae crosses might have been possible, as wine isolates have been found in oak trees close to vineyards (Hyma and Fay 2013). It is also clear that the dietary supplement and wine S. cerevisiae x S. kudriavzevii hybrids (IF6 and UvCEG) and the winemaking S. cerevisiae x S. uvarum hybrid (S6U) inherited a recombinant mitochondrial genome from S. cerevisiae strains mostly associated with wine from Europe, South America and Africa. Before the hybridization of S. cerevisiae with other Saccharomyces species that generated the IF6/UvCEG and S6U hybrids, S. cerevisiae was introgressed with Far Eastern S. paradoxus, likely before migrating to Europe (Figure 6).
Introgressions occurring before hybridization might be also a potential scenario for some other hybrids, such as S. cerevisiae x S. kudriavzevii hybrids which showed introgressions from European S. paradoxus (Peris et al. 2012a) or introgressions from European S. uvarum into mitochondria of S. cerevisiae x S. eubayanus hybrids (Peris et al. 2014).
The impact of the haplogroup C2 on the winemaking process is of interest to understand the domestication of S. cerevisiae. Interestingly, most of our hybrids have inherited a non-cerevisiae mitochondrial genome (Peris et al. 2012a; Peris et al. 2012b; Peris et al. 2014; Pérez-Través et al. 2014). Previous work (Warren et al. 2013) has demonstrated how the inheritance of one of the parental mitochondrial genomes impacts respiration in hybrids, highlighting the importance of taking into account the fixation of the mtDNA during the generation of artificial hybrids for specific industrial processes.
Introgressions might influence diversification
All potential transfers detected in our study are summarized in Figure 6 which is a starting point to model mitochondrial introgressions in Saccharomyces genus. Some particular Saccharomyces lineages have some degree of introgression, such as S. kudriavzevii Asia B, S. kudriavzevii Asia A and European and S. mikatae IFO1815.
The clearest and most interesting introgression was detected in all American S. paradoxus from populations B and C. These strains have a COX2 sequence whose 5’ end is closely related to that of S. cerevisiae from haplogroup C1, and the two selected representative strains (120MX and CBS5313) shared a COX3 sequence with S. cerevisiae strains. Recent evidence of phylogenetic tree incongruence between S. cerevisiae and the American population B S. paradoxus strain YPS138 (Leducq et al. 2014) was also detected (Wu et al. 2015), supporting our hypothesis. Our results indicate that the American S. paradoxus has inherited the mtDNA from S. cerevisiae. Indeed, a recent study has described a S. cerevisiae from an unknown source with a mitochondrial genome closely related to S. paradoxus CBS432 (Wolters et al. 2015), which supports the hypothesis that wild strains can survive with foreign mitochondrial genomes. The increase in new isolates resulting from the improvement of isolation methods (Sampaio and Gonçalves 2008; Sylvester et al. 2015), along with the application of new methods for assembling mitochondrial genomes, will shed light on the mitochondrial inheritance of other American S. paradoxus and the extent of mitochondrial introgressions among other species. The presence of a foreign mitochondrial genome raises the question of whether mitochondrial introgression influences the diversification of species. We have previously shown that hybrid S. cerevisiae x S. kudriavzevii strains inheriting a mitochondrial genome from S. cerevisiae are more prone to lose S. kudriavzevii genes (Peris et al. 2012c), suggesting an evolutionary constraint imposed by the need for proper coordination of nuclear genes with mitochondrial genes. Indeed, most Bateson-Dobzhansky-Muller incompatibilities are between nuclear and mitochondrial genes (Lee et al. 2008; Chou et al. 2010; Hou et al. 2015).
In this way, the accommodation of nuclear genome to this new mitochondrial genome might drive the diversification of American S. paradoxus from the other S. paradoxus lineages.
Mitochondrial introgressions as evidence of ancestral hybridization events in wild environments
Saccharomyces double and triple hybrids have been isolated from many different industrial conditions, such as “ale” and lager beer, wine, cider, dietary supplements (Casaregola et al. 2001; Barros Lopes et al. 2002; González et al. 2008; Peris et al. 2012a; Pérez-Través et al. 2014) and clinical samples (Peris et al. 2012a). However, hybrids have not been isolated from natural samples, which has been taken to suggest that hybridization occurs only in artificial conditions, where hybrids are better adapted to those stressful environments (Belloch et al. 2008) through the acquisition of beneficial traits from the parents (Peris et al. 2012c; Gibson et al. 2013). HGT between S. cerevisiae and S. paradoxus may be mediated by the formation of heterokaryons due to pseudohyphae formation of Saccharomyces strains found in sympatric association (Wu et al. 2015), such as S. cerevisiae, S. paradoxus and S. kudriavzevii in Europe (Sampaio and Gonçalves 2008). However, we cannot rule out a complete hybridization followed by a rapid loss of one of the parental genomes through outcrossing with non-hybridized sibling strains. The diversity of S. cerevisiae x S. kudriavzevii hybrids (Peris et al. 2012a), most of them with a lower S. kudriavzevii genome content, suggests a rapid loss of S. kudriavzevii genes after hybridization while keeping traits important for low temperature fermentations (Peris et al. 2012c). In addition, several introgressions involving most Saccharomyces species suggest some gene flow among them (Liti et al. 2005; Muller and McCusker 2009; Almeida et al. 2014), which supports hybridization events in natural environments. Our study suggests that Asia is a hotspot of these hybridization events and that hybridization might be influenced by the presence of all Saccharomyces species, as the Asian origin model suggests (Bing et al. 2014; Liti 2015).
Material and Methods
Saccharomyces strains and culture media
A collection of 517 Saccharomyces strains and 49 natural hybrids with a worldwide distribution (Figure 1, Table S1) was used in this study. Species assignment for each strain was established by different authors using molecular techniques, such as sequencing of the 5.8S-ITS region, Restriction Fragment Length Polymorphisms (RFLP), a multilocus sequence approach or whole genome sequencing (Table S1). Hybrids were mostly characterized on the basis of restriction analysis of 35 different nuclear genes (González et al. 2008; Peris et al. 2012a; Peris et al. 2012b; Pérez-Través et al. 2014). Naumovozyma castellii sequences were included as outgroup references to root phylogenetic trees. Yeast strains were grown at 28 °C in YPD (2% glucose, 2% peptone and 1% yeast extract).
PCR amplification, sequencing and gene alignments
Total yeast DNA was extracted following the procedure described by Querol et al. (1992). A partial sequence (585bp) of the mitochondrial gene COX2 was amplified by PCR using the primers described in Belloch et al. (2000). COX3 and ORF1 gene sequences were amplified and sequenced, using primers described in Table S2, for seventy-two Saccharomyces strains, representatives of the most frequent COX2 haplotypes (Table S1). For the sequencing of ORF1 we followed a primer walking approach (Figure 4). Amplification of the ORF1 gene from IFO1815 failed, and the reference S. eubayanus strain yHCT76 was not available during the period of this study. Sequences were deposited in GenBank under Accession nos. JN676363-JN676823 and JN709044-JN709115. Gene sequence accession numbers from previously sequenced strains are shown in Table S1. Sequences from S. cerevisiae strains from the Saccharomyces Genome Resequencing Project (SGRP) were retrieved using the blast server (http://www.sanger.ac.uk/cgi-bin/blast/submitblast/s cerevisiae sgrp). A PSI-Blast search was run to retrieve ORF1 sequences from the closest sequences of non-Saccharomyces species.
COX2 and COX3 sequences were aligned using CLUSTALW, as implemented in MEGA v5 (Tamura et al. 2011), and manually trimmed. For ORF1 we used MUSCLE (Edgar 2004) to align the amino acid sequences, and the nucleotide sequence alignment was further refined by visual inspection in Jalview 4.0b2 (Waterhouse et al. 2009).
COX2, ORF1 and COX3 haplotype classification, and COX2 genetic diversity
DnaSP v5 (Librado and Rozas 2009) was used to calculate the number of haplotypes of COX2, ORF1 and COX3, and genetic statistics of COX2, such as the number of polymorphic sites (S), average number of differences between sequences (k), nucleotide diversity (π) and haplotype diversity (Hd), based on the species designation.
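For readers unfamiliar with these statistics, the sketch below computes them from a small alignment using their standard textbook definitions: S is the number of variable columns, k the mean number of pairwise differences, π = k divided by the number of sites compared, and Hd = n/(n-1) × (1 - Σ p_i²) with p_i the frequency of haplotype i. It is not the DnaSP implementation; the complete deletion of gapped columns, the function name and the toy sequences are assumptions made for the example.

```python
from collections import Counter
from itertools import combinations


def diversity_stats(seqs):
    """Return (S, k, pi, Hd) for a list of aligned sequences.

    Columns containing a gap in any sequence are dropped (an assumption).
    """
    n = len(seqs)
    if n < 2:
        raise ValueError("at least two sequences are required")
    # Transpose to columns and keep only gap-free columns.
    columns = [col for col in zip(*seqs) if "-" not in col]
    if not columns:
        raise ValueError("no gap-free columns to analyse")
    # S: number of polymorphic (segregating) sites.
    s_sites = sum(1 for col in columns if len(set(col)) > 1)
    # k: average number of nucleotide differences over all sequence pairs.
    pairs = list(combinations(range(n), 2))
    total_diff = sum(sum(col[i] != col[j] for col in columns) for i, j in pairs)
    k = total_diff / len(pairs)
    # pi: nucleotide diversity per site.
    pi = k / len(columns)
    # Hd: haplotype diversity from haplotype frequencies.
    haplotypes = Counter("".join(col[i] for col in columns) for i in range(n))
    hd = n / (n - 1) * (1 - sum((m / n) ** 2 for m in haplotypes.values()))
    return s_sites, k, pi, hd


# Hypothetical toy data, not sequences from this study.
S, k, pi, Hd = diversity_stats(["AAAA", "AAAT", "AATT", "AAAA"])
print(f"S={S} k={k:.3f} pi={pi:.3f} Hd={Hd:.3f}")
```

For the toy data this gives S = 2, k = 7/6, π = 7/24 and Hd = 5/6, which can be checked by hand from the six pairwise comparisons and the three distinct haplotypes.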
Phylogenetic analysis and detection of recombination
COX2, ORF1 and COX3 phylogenetic trees were reconstructed using the Neighbor-Joining (NJ) method in MEGA v5 (Tamura et al. 2011), with 10000 bootstrap pseudoreplicates for branch support. COX2 and COX3 median joining networks were reconstructed using PopART v1.7.2b (http://popart.otago.ac.nz). COX2 and ORF1 Neighbor-Net phylogenetic networks were also reconstructed using SplitsTree v4 (Huson and Bryant 2006) to explore the presence of incongruence in our dataset.
An alignment with representative sequences of each COX2 haplotype was used as an input for RDPv3.44 (Martin et al. 2010) to detect and define recombination points. Recombination points detected by two or more methods implemented in RDPv3.44 were considered significant, applying a Bonferroni correction for multiple comparisons. Although different recombination points were detected, we defined two COX2 segments using the most frequent recombination sites. The COX2 gene was divided into two segments, referred to as 5’-end (positions 1-496 in the alignment [124-620 in the reference COX2 gene sequence from the S288c strain, SGD ID: S000007281]) and 3’-end (from 497-end of alignment [621-708 in the reference COX2 sequence]). The Maximum Likelihood phylogenetic trees for both COX2 segments were reconstructed with the best fitted models, inferred using jModeltest (Posada 2008). Tree Puzzle v5.2 (Schmidt et al. 2002) was used to test the phylogenetic congruence of the two inferred phylogenetic trees with the Saccharomyces species tree topology (Borneman and Pretorius 2015). The statistical significance of these comparisons was assessed with the Shimodaira-Hasegawa (Shimodaira and Hasegawa 1999) and ELW (Expected-Likelihood Weights) (Strimmer and Rambaut 2002) tests.
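The segmentation step can be sketched in a few lines. The breakpoint of 496 alignment columns is the one given above; the function name, the dictionary-based representation of the alignment and the toy sequences are assumptions for illustration, and the sketch is not the procedure used in RDP.

```python
def split_alignment(records, breakpoint):
    """Split each aligned sequence into a 5' part (columns 1..breakpoint)
    and a 3' part (columns breakpoint+1..end), using 1-based columns."""
    lengths = {len(seq) for seq in records.values()}
    if len(lengths) != 1:
        raise ValueError("all sequences must have the same aligned length")
    if not 0 < breakpoint < lengths.pop():
        raise ValueError("breakpoint must fall inside the alignment")
    five_prime = {name: seq[:breakpoint] for name, seq in records.items()}
    three_prime = {name: seq[breakpoint:] for name, seq in records.items()}
    return five_prime, three_prime


# Hypothetical toy alignment; for the COX2 alignment described in the text
# the call would be split_alignment(cox2_records, 496).
toy = {"hap_1": "ACGTACGT", "hap_2": "ACGTTCGA"}
five, three = split_alignment(toy, 4)
print(five)   # {'hap_1': 'ACGT', 'hap_2': 'ACGT'}
print(three)  # {'hap_1': 'ACGT', 'hap_2': 'TCGA'}
```

Each returned dictionary can then be written out and analysed separately, so that a tree or network is built from sites that share one history on each side of the breakpoint.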
A concatenated alignment of COX2 position 621 to the end and the partial ORF1 sequences was generated. Indels, mostly due to AT repetitive regions, and GC clusters were removed. For the detection of recombinant sites we followed the approach described above. Four segments were defined from the most frequent recombination points. The first segment (224bp) comprises the COX2 region and the first 246 nucleotide positions of ORF1 (ending at nucleotide 292 in the S288c ORF1 gene, SGD ID: S000007282). The last nineteen nucleotides of COX2 are the first nineteen nucleotides of ORF1, because the two CDSs overlap. The second segment spans positions 247 to 644 of ORF1 (293-704 in S288c ORF1), the third spans positions 645 to 920 (706-980 in S288c), and the fourth runs from position 921 to the end of the alignment (981-1435 in S288c). Recombinant segments were also supported by the GARD (Genetic Algorithm Recombination Detection) method implemented in Datamonkey (Delport et al. 2010), which also performs a Kishino-Hasegawa test (Kishino and Hasegawa 1989).
The species tree was reconstructed by using a concatenated alignment of RIP1, MET2 and FUN14, generated with FASCONCAT v1.0 (Kück and Meusemann 2010). Gene sequences were retrieved from GenBank, from blast searches against SGRP S. cerevisiae sequences and the S. arboricola online blast database (http://www.moseslab.csb.utoronto.ca/sarb/blast/), or from blast searches against local databases generated with the Saccharomyces Sensu Stricto (SSS) genomes (http://www.saccharomycessensustricto.org/cgi-bin/s3.cgi?data=Assemblies&version=current). A Maximum-Likelihood (ML) phylogenetic tree was inferred using RAxML v8 (Stamatakis 2014) by performing 100 heuristic searches for the best gene tree, whose branch support was assessed with 1000 bootstrap pseudoreplicates.
Statistical analysis
A χ² test for detecting biased distributions of S. cerevisiae strains by country or isolation source among the COX2 haplogroups was performed in the R statistical package (Adler and Murdoch 2009). p-values were computed by Monte Carlo simulation with 10000 replicates.
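A Monte Carlo χ² test of this kind can be sketched as below. The sketch assumes a scheme that keeps both table margins fixed by shuffling column labels, and a p-value of (1 + number of simulated statistics at least as large as the observed one) / (replicates + 1), which is similar in spirit to the simulated p-value option of R's chisq.test; whether this exact option was used is not stated here. The contingency table and function names are hypothetical.

```python
import random
from collections import Counter


def chi2_statistic(table):
    """Pearson chi-squared statistic for an r x c table of counts.
    Assumes every row and column total is greater than zero."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            stat += (observed - expected) ** 2 / expected
    return stat


def monte_carlo_chi2(table, replicates=10000, seed=0):
    """Return (observed statistic, Monte Carlo p-value) with fixed margins."""
    rng = random.Random(seed)
    n_rows, n_cols = len(table), len(table[0])
    # Expand the table into one (row label, column label) pair per strain.
    rows = [i for i, row in enumerate(table) for _ in range(sum(row))]
    cols = [j for row in table for j, count in enumerate(row) for _ in range(count)]
    observed = chi2_statistic(table)
    hits = 0
    for _ in range(replicates):
        rng.shuffle(cols)  # break the association, keep both margins
        counts = Counter(zip(rows, cols))
        simulated = [[counts[(i, j)] for j in range(n_cols)] for i in range(n_rows)]
        if chi2_statistic(simulated) >= observed - 1e-9:
            hits += 1
    return observed, (hits + 1) / (replicates + 1)


# Hypothetical counts: rows = isolation source, columns = haplogroup (C2, other).
stat, p_value = monte_carlo_chi2([[30, 5], [10, 25]], replicates=2000)
print(f"chi2={stat:.2f} p={p_value:.5f}")
```

Under this convention the smallest attainable p-value is 1/(replicates + 1), roughly 1.0 × 10⁻⁴ for 10000 replicates, which is consistent with the lowest p-values reported for the haplogroup C2 associations.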
Acknowledgments
We thank C.P. Kurtzman for providing us with the S. mikatae strain NRRL Y-27342. We thank Chris Todd Hittinger and William G. Alexander for critical comments on the manuscript. This work was supported by a Spanish Government grant (AGL2009-12673-C02-02) and Generalitat Valenciana grants (PROMETEUS and ACOMP/2012) to EB, and by Spanish Government FEDER (AGL2012-39937-C02-01) and Generalitat Valenciana (PROMETEOII/2014/042) grants to AQ. DP acknowledges the Spanish Government for its Ministerio de Ciencia e Innovacion (MICINN) FPI fellowship. AA received a PROMEP Fellowship from SEP, Mexican government. LP acknowledges CSIC and the Spanish Ministry of Education and Science (MEC) for an I3P fellowship. SO acknowledges MEC for the postdoctoral research contract.