Abstract
Domestication provides an excellent framework for studying adaptive divergence. Using population genomics and phenotypic assays, we reconstructed the domestication history of the blue cheese mold Penicillium roqueforti. We showed that this fungus was domesticated twice independently. The population used in Roquefort originated from an old domestication event associated with weak bottlenecks and exhibited traits beneficial for pre-industrial cheese production (slower growth in cheese and greater spore production on bread, the traditional multiplication medium). The other cheese population originated more recently from the selection of a single clonal lineage, was associated with all types of blue cheese worldwide except Roquefort, and displayed phenotypes more suited to industrial cheese production (high lipolytic activity, efficient cheese cavity colonization ability and salt tolerance). We detected genomic regions affected by recent positive selection and putative horizontal gene transfers. This study sheds light on the processes of rapid adaptation and raises questions about the conservation of genetic resources.
Introduction
Mechanisms of adaptive divergence (population differentiation under selection) are key questions in evolutionary biology for the understanding of how organisms adapt to their environment and how biodiversity arises. Domestication studies can help us understand adaptive divergence as this process involves strong and recent selection for traits that can be easily identified. Furthermore, closely related non-domesticated populations are often available, making it possible to contrast their traits and genomes with those of domesticated populations. This approach has already proved to be powerful for reconstructing the origin of domesticated populations and the genetic architecture of traits selected by humans. It has been applied to maize and teosinte, and to dog breeds and wolves [1-6]. Independent domestication events from the same ancestral population are particularly interesting because they provide replicates of the adaptation process and insights into the predictability of evolution and its constraints [7-12]. Comparisons of domesticated varieties selected for different phenotypes have also proved to be a powerful approach for elucidating the mechanisms of adaptation, for example in dog breeds and pigeons [13, 14]. Studies on genetic diversity and subdivision in domesticated organisms also provide crucial information for the conservation of genetic resources. Ancient domestication processes were slow and involved contributions from large numbers of farmers. By contrast, recent breeding programs have been run by a small number of companies; this and international trade demands have resulted in a massive loss of genetic diversity in crops and breeds, potentially jeopardizing adaptive potential for improvement [15-17].
Fungi are interesting eukaryotic models for adaptive divergence studies, with their small genomes, easy access to the haploid phase and experimental tractability for in vitro experiments [18, 19]. Many fungi are used as food sources [20] and some have been domesticated for food production. Propagation of the latter is controlled by humans, and this has resulted in genetic differentiation from wild populations [21-24] and the evolution of specific phenotypes beneficial for humans [24, 25]. Saccharomyces cerevisiae yeasts domesticated for fermentation have provided important insight into adaptive divergence mechanisms, with different yeast lineages independently domesticated for different usages [23, 26, 27]. Studies about yeast adaptation for alcohol and cheese production have highlighted the proximal genomic mechanisms involved, including horizontal gene transfer, selective sweep, hybridization and introgression [25, 27-30].
Penicillium roqueforti, a filamentous fungus used in the dairy industry to impart the typical veins and flavor of blue cheeses, has recently emerged as an excellent model for studying adaptive divergence [31, 32]. Blue cheeses, including Roquefort, Gorgonzola and Stilton, are highly emblematic foods that have been produced for centuries [33]. The strongest genetic subdivision reported in P. roqueforti concerns the differentiation of a cheese-specific population that has acquired faster growth in cheese than other populations and better excludes competitors, thanks to very recent horizontal gene transfers, at the expense of slower growth on minimal medium [31, 34, 35]. Such genetic differentiation and recent acquisition of traits beneficial to cheesemaking in P. roqueforti suggest genuine domestication, i.e., adaptation under selection by humans for traits beneficial for their food production. A second population identified in P. roqueforti and lacking the horizontally-transferred regions includes strains isolated from cheese and other environments, such as silage, lumber and spoiled food, suggesting that adaptive divergence may have occurred [34-36]. The existence of further genetic subdivision separating populations according to the original environment, or protected designation of origin (PDO) for cheese strains, has been suggested but, because it was based on only a few microsatellite markers, the resolution power was low [34-36]. Secondary metabolite production (aroma compounds and mycotoxins) and proteolysis activity have been shown to differ between strains from different PDOs [37]. A high-quality P. roqueforti genome reference is available [32], allowing more powerful analyses based on population genomics.
Another asset of P. roqueforti as an evolutionary model is the availability of vast collections of cheese strains and of historical records concerning cheesemaking [38-43]. While the presence of P. roqueforti in cheeses was initially fortuitous, since the end of the 19th century, milk or curd has been inoculated with the spores of this fungus for Roquefort cheese production. Spores were initially multiplied on bread, before the advent of more controlled in vitro culture techniques in the 20th century [38-43]. Bread was inoculated by recycling spores from the best cheeses from the previous production [38-43]. This corresponds to yearly selection events from the 19th century until ca. 20 years ago, when strains began to be stored in freezers. After World War II, strains were isolated in the laboratory for industrial use and selected based on their technological and organoleptic impact in cheeses and the compounds produced [44], which has likely accelerated domestication. This history further suggests there may have been genuine domestication, i.e., an adaptive evolution triggered by human selection for cheese quality. Unintentional selection may also have been exerted on other traits, including growth and spore production on bread, the traditional multiplication substrate.
By sequencing multiple P. roqueforti genomes from different environments and analyzing large collections of cheese strains, we provide evidence for adaptive divergence. We identified four genetically differentiated populations, two including only cheese strains and two other populations including silage and food-spoiling strains. We inferred that the two cheese populations corresponded to two independent domestication events. The first cheese population corresponds to strains used for Roquefort production and arose through a weaker and older domestication event, with multiple strains probably originating from different cultures on local farms in the PDO area, presumably initially selected for slow growth before the invention of refrigeration systems. The second cheese population experienced an independent, more recent, domestication event associated with a stronger genetic bottleneck. The non-Roquefort cheese population showed a higher fitness for traits likely to be under selection for modern production of cheese (e.g. growth in salted cheese and lipid degradation activities), while the Roquefort cheese population showed greater spore production on bread, the traditional medium for spore production. The two cheese populations also had different volatile compound profiles, with likely effects on cheese flavor. Moreover, we detected genomic regions affected by recent positive selection and genomic islands specific to a single cheese population. Some of these genomic regions may have been acquired by horizontal gene transfers and have putative functions in the biochemical pathways leading to the development of cheese flavor.
Results
Two out of four populations are used for cheesemaking: one specific to the Roquefort PDO and a worldwide clonal population
We sequenced the genomes of 34 P. roqueforti strains from public collections [33], including 17 isolated from blue cheeses (e.g., Roquefort, Gorgonzola, Stilton), 17 isolated from non-cheese environments (mainly spoiled food, silage, and lumber), and 11 outgroup genomes from three Penicillium species closely related to P. roqueforti (Supplementary Table 1). After data filtering, we identified a total of 115,544 SNPs from the reads mapped against the reference P. roqueforti FM164 genome (48 scaffolds).
Three clustering methods free from assumptions about mating system and mode of reproduction separated the P. roqueforti strains into four genetic clusters (Figs 1, 2 and 3A), two of which almost exclusively contained cheese strains (the exceptions being two strains isolated from a brewery and brioche, Figs 1 and 2, probably corresponding to feral strains). One cluster contained both silage strains (N=4) and food-spoiling strains (N=4), and the last cluster contained mostly food-spoiling strains (N=5) plus strains from lumber (N=2) (Figs 1, 2, and Supplementary Table 1). Notably, these two clusters corresponding to strains from other environments did not include a single cheese strain. The two cheese clusters were not the most closely related, suggesting independent domestication events (Figs 1 and 2). Moreover, cheese clusters were much less diverse than non-cheese clusters, as shown by their shorter branch lengths in the tree, their low genetic diversity represented by small J values and more homogeneous colors in distance-based clustering (Figs 1, 2 and 4A). One of the two cheese clusters displayed a particularly low level of genetic diversity (Figs 1, 2 and 4A), with only 0.03% polymorphic sites, and a lack of recombination footprints (i.e., a higher level of linkage disequilibrium, as shown by r² values, Fig. 4B, and by the large single-color blocks along the genomes, Fig. 2). These findings suggest that the second cheese population is a single clonal lineage.
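The two summary statistics invoked above, the proportion of polymorphic sites and the linkage disequilibrium measure r², can be illustrated with a minimal sketch. It assumes haploid strains and biallelic sites coded 0/1, and it is not the pipeline used for the analyses reported here; the function names and the toy genotypes are hypothetical and serve only to show what each statistic measures.

```python
from itertools import combinations


def polymorphic_fraction(genotypes, n_sites_total):
    """Fraction of sites at which at least two strains carry different alleles.

    genotypes: list of strains, each a list of 0/1 alleles (haploid, biallelic).
    n_sites_total: number of sites surveyed, including invariant ones.
    """
    n_variable = sum(1 for site in zip(*genotypes) if len(set(site)) > 1)
    return n_variable / n_sites_total


def r_squared(site_a, site_b):
    """Squared allelic correlation between two biallelic haploid sites."""
    n = len(site_a)
    p_a = sum(site_a) / n
    p_b = sum(site_b) / n
    p_ab = sum(1 for a, b in zip(site_a, site_b) if a == 1 and b == 1) / n
    d = p_ab - p_a * p_b  # linkage disequilibrium coefficient D
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    if denom == 0:
        return None  # undefined when either site is monomorphic
    return d * d / denom


def mean_r_squared(genotypes):
    """Mean r² over all pairs of polymorphic sites, or None if fewer than two."""
    sites = [s for s in zip(*genotypes) if len(set(s)) > 1]
    values = [r_squared(a, b) for a, b in combinations(sites, 2)]
    return sum(values) / len(values) if values else None


# Hypothetical toy data: four strains, alleles coded 0/1.
clonal_like = [[0, 0, 0, 0], [0, 0, 0, 0], [1, 1, 0, 0], [1, 1, 0, 0]]
recombining_like = [[0, 0], [0, 1], [1, 0], [1, 1]]

print(polymorphic_fraction(clonal_like, 4))
print(mean_r_squared(clonal_like))
print(mean_r_squared(recombining_like))
```

In the toy clonal-like sample the two variable sites are perfectly associated, whereas in the recombining-like sample all four allele combinations occur and the association vanishes; this is the contrast that a lack of recombination footprints refers to.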
We used genome sequences to design genetic markers (Supplementary Table 2) for assigning a collection of 65 strains provided by the main French supplier of P. roqueforti spores for artisanal and industrial cheesemakers, 18 additional strains from the National History Museum collection in Paris (LCP) and 31 strains from the adjunct collection of the Université de Bretagne Occidentale (UBOCC, Supplementary Table 1) to the four genetic clusters. Out of these 148 strains, 55 were assigned to the more genetically diverse of the two cheese clusters. The majority of these were strains used for Roquefort PDO cheese production (N=30); three strains originated from Bleu des Causses cheeses (Supplementary Figure 1, Supplementary Table 1), produced in the same area as Roquefort and using similarly long storage in caves. The remaining strains of this cluster included samples from other blue cheeses (N=13), unknown blue cheeses (N=5) or other environments (N=4), the latter likely associated with feral strains. Because of the strong bias of usage toward Roquefort production, we refer to this cluster hereafter as the “Roquefort population”. Of the remaining 95 strains, 60 belonged to the second cheese cluster, which was less genetically diverse and contained mainly commercial strains used to produce a wide range of blue cheeses, but only one from the Roquefort PDO (Supplementary Figure 1, Supplementary Table 1). This cluster was therefore named the “non-Roquefort population”. The Roquefort population also included 13 strains used to inoculate other types of blue cheese (e.g. Gorgonzola or Bleu d’Auvergne), but strains from these types of cheeses were more common in the non-Roquefort population.
The non-Roquefort cluster contained strains carrying Wallaby and CheesyTer, two large genomic regions recently shown to have been transferred horizontally between different Penicillium species from the cheese environment and conferring faster growth on cheese [31, 32], whereas all the strains in the Roquefort cluster lacked those regions.
Two independent domestication events in Penicillium roqueforti for cheesemaking
By comparing 11 demographic scenarios in approximate Bayesian computation (ABC), we showed that the two P. roqueforti cheese populations (Roquefort and non-Roquefort) resulted from two independent domestication events (Fig. 5, Supplementary Figure 2). The highest posterior probabilities were obtained for the S4 scenario, in which the two cheese populations formed two lineages independently derived from the common ancestral population of all P. roqueforti strains (Fig. 5, model choice and parameter estimates in Supplementary Figure 2). We inferred much stronger bottlenecks in the two cheese populations than in the non-cheese populations, with the most severe bottleneck found in the non-Roquefort cheese population. Some gene flow (m=0.1) was inferred between the two non-cheese populations but none with cheese populations. The bottleneck date estimates in ABC had too large credibility intervals to allow inferring domestication dates (Supplementary Figure 2E). We therefore used the multiple sequentially Markovian coalescent (MSMC) method [45] to estimate times since domestication, considering that they corresponded to the last time there was gene flow between genotypes within populations, as this also corresponds to bottleneck date estimates in coalescence. The domestication for the Roquefort cheese population was inferred seven times longer ago than for the non-Roquefort cheese population, both domestication events being recent (ca. 760 versus 140 generations ago, Figure 5B-C). Unfortunately, generation time, and even generation definition, are too uncertain in the clonal P. roqueforti populations to infer domestication dates in years. In addition, the MSMC analysis detected two bottlenecks in the history of the Roquefort cheese population (Figure 5B).
Contrasting fitness traits between cheese populations
We tested whether different phenotypes relevant for cheesemaking had evolved in the two cheese clusters, relative to other populations (Figs. 3B and 6, Supplementary Table 3). We first produced experimental cheeses inoculated with strains from the different P. roqueforti populations to assess their ability to colonize cheese cavities, a trait that may have been subject to human selection to choose inocula producing the most visually attractive blue cheeses. The fungus requires oxygen and can therefore sporulate only in the cheese cavities, its spores being responsible for the characteristic color of blue cheeses. Strains from the non-Roquefort cheese population were the most efficient colonizers of cheese cavities (Supplementary Table 4); no difference was detected between strains from the Roquefort and non-cheese populations (Fig. 6E).
As P. roqueforti strains were traditionally multiplied on bread loaves for cheese inoculation, they may have been subject to unintentional selection for faster growth on bread. However, growth rate on bread did not significantly differ between populations (Fig. 6A, Supplementary Table 4).
We then assessed lipolytic and proteolytic activities in the P. roqueforti populations. These activities are important for energy and nutrient uptake, as well as for cheese texture and the production of volatile compounds responsible for cheese flavors [37, 46]. Lipolysis was faster in the non-Roquefort cheese population than in the Roquefort and silage/food spoiling populations (Fig. 6B, Supplementary Table 4). A strong population effect was found for proteolytic activity (Supplementary Table 4) although without significant differences in the post-hoc pairwise analysis (Fig. 6C). However, variances showed significant differences between populations (Levene test F-ratio=5.97, d.f.=3, P<0.0017), with the two cheese populations showing the highest variances, and with extreme values above and below those in non-cheese populations (Fig. 6C). Notably, proteolysis is a choice criterion for making different kinds of blue cheeses (https://www.lip-sas.fr/index.php/nos-produits/penicillium-roquefortii/18-penicillium-roquefortii). This suggests that some cheese strains may have been selected for higher and others for lower proteolytic activity. Alternatively, selection could have been relaxed on this trait in the cheese populations, leading to some mutations decreasing and others increasing proteolysis in different strains, thus increasing variance in the populations.
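To make the variance comparison concrete, the following minimal sketch computes a Levene-type statistic for equality of variances among groups. It assumes the mean-centered form of the test (whether the mean- or median-centered variant was used here is not stated), the function name and the proteolysis values are hypothetical, and the numbers are not data from this study. With four populations the numerator degrees of freedom are k-1=3, as in the test reported above.

```python
def levene_w(groups):
    """Levene's W statistic (mean-centered variant) and its degrees of freedom.

    groups: list of lists of measurements, one list per population.
    Returns (W, df_between, df_within); W is compared to an F distribution
    with (k - 1, N - k) degrees of freedom.
    """
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    # Absolute deviations of each observation from its own group mean.
    z = []
    for g in groups:
        center = sum(g) / len(g)
        z.append([abs(y - center) for y in g])
    z_means = [sum(zi) / len(zi) for zi in z]
    z_grand = sum(sum(zi) for zi in z) / n_total
    between = sum(len(zi) * (m - z_grand) ** 2 for zi, m in zip(z, z_means))
    within = sum((v - m) ** 2 for zi, m in zip(z, z_means) for v in zi)
    if within == 0:
        raise ValueError("W is undefined when all deviations are identical within groups")
    w = (n_total - k) / (k - 1) * between / within
    return w, k - 1, n_total - k


# Hypothetical proteolysis scores for four populations (illustrative only).
populations = [
    [2.0, 9.5, 1.0, 10.0, 5.5],  # wide spread
    [1.5, 8.0, 3.0, 9.0, 6.0],   # wide spread
    [5.0, 5.5, 4.5, 5.2, 4.8],   # narrow spread
    [5.1, 4.9, 5.3, 4.7, 5.0],   # narrow spread
]
w, df_between, df_within = levene_w(populations)
print(w, df_between, df_within)
```

A large W relative to the F distribution indicates that the spread of the trait differs among populations even when the group means do not differ significantly in pairwise tests.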
The ability of P. roqueforti strains to produce spores may also have been selected by humans, both unwittingly, due to the collection of spores from moldy bread, and deliberately, through the choice of inocula producing bluer cheeses. We detected no difference in spore production between the P. roqueforti populations grown on cheese medium or malt. However, we observed significant differences in spore production on bread medium. The Roquefort population produced the highest number of spores and significantly more than the non-Roquefort population (Fig. 6D, Supplementary Table 4).
High salt concentrations have long been used in cheesemaking to prevent the growth of spoiler and pathogenic microorganisms. We found that the ability to grow on salted malt and cheese media decreased in all P. roqueforti populations (Fig. 6F, Supplementary Table 4). We found a significant interaction between salt and population factors, and post-hoc tests indicated that the Roquefort population was more affected by salt than the other populations (Fig. 6F, Supplementary Table 4).
Volatile compound production was also investigated in the two cheese populations, as these compounds are important for cheese flavor [46]. We identified 52 volatile compounds, including several involved in cheese aroma properties, such as ketones, free fatty acids, sulfur compounds, alcohols, aldehydes, pyrazines, esters, lactones and phenols [47] (Supplementary Figure 3). The two cheese populations presented significantly different volatile compound profiles, differing by three ketones, one alcohol and two pyrazines (Supplementary Figure 3). The Roquefort population produced the highest diversity of volatile compounds (Supplementary Figure 3A).
Detection of genomic regions affected by recent positive selection and population-specific genomic islands
We identified five regions present in the genomes of strains from the non-Roquefort cheese population and absent from the other populations. We also detected five other genomic islands present in several P. roqueforti strains but absent from the non-Roquefort cheese strains (Supplementary Figure 4). Nine of these ten genomic regions were not found in the genomes of the outgroup Penicillium species analyzed here and they displayed no genetic diversity in P. roqueforti. No SNPs were detected even at synonymous sites or in non-coding regions, suggesting recent acquisitions, by horizontal gene transfer. Only FM164-C, one of the genomic islands specific to the non-Roquefort population, was present in the outgroup genomes, in which it displayed variability, indicating a loss in the other lineages rather than a gain in the non-Roquefort population and the outgroup species (Supplementary Figure 4A). The closest hits in the NCBI database for genes in the ten genomic islands were in Penicillium genomes. Most of the putative functions proposed for the genes within these genomic regions were related to lipolysis, carbohydrate or amino-acid catabolism and metabolite transport. Other putative functions concerned fungal development, including spore production and hyphal growth (Supplementary Figure 4). In the genomic regions specific to the non-Roquefort cheese population, we also identified putative functions potentially relevant for competition against other microorganisms, such as phospholipases, proteins carrying peptidoglycan- or chitin-binding domains and chitinases (Supplementary Figure 4) [48]. Enrichment tests were non-significant, probably due to the small number of genes in these regions.
Footprints of positive selection in P. roqueforti genomes were detected using an extension of the McDonald-Kreitman test [49] which identifies genes with more frequent amino-acid changes than expected under neutrality, neutral substitution rates being assessed by comparing the rates of synonymous and non-synonymous substitutions within and between species or populations to account for gene-specific mutation rates. We ran the test with three levels of population subdivision. First, no significant footprint of positive selection was detected for any gene by comparing the whole P. roqueforti species with P. paneum. In a second test, a set of 15 genes was identified as evolving under positive selection in the Roquefort cheese population but not in the other pooled P. roqueforti populations (Supplementary Figure 5B). Interestingly, eight of these 15 genes clustered at the end of the largest scaffold (Supplementary Figure 5A). In a third test, four genes were identified as evolving under positive selection in the non-Roquefort cheese population but not in the pooled non-cheese P. roqueforti populations (Supplementary Figure 5B). Two of these genes corresponded to a putative aromatic ring hydroxylase and a putative cyclin evolving under purifying selection in Roquefort and non-cheese P. roqueforti populations (Supplementary Figure 5B). Aromatic ring hydroxylases are known to be involved in the catabolism of aromatic amino acids, which are precursors of flavor compounds [50, 51].
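The logic of the classical McDonald-Kreitman test can be sketched as follows. This is the basic 2×2 version, not the extension [49] used for the analyses above: it contrasts non-synonymous and synonymous counts for fixed differences (Dn, Ds) and for polymorphisms (Pn, Ps), tests their independence with Fisher's exact test, and derives the proportion of adaptive substitutions α = 1 - (Ds·Pn)/(Dn·Ps). The function names and the counts are illustrative assumptions.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test P value for the 2x2 table [[a, b], [c, d]]."""
    row1, row2, col1 = a + b, c + d, a + c
    n = row1 + row2

    def prob(x):
        # Hypergeometric probability of x in the top-left cell, margins fixed.
        return comb(row1, x) * comb(row2, col1 - x) / comb(n, col1)

    lo = max(0, col1 - row2)
    hi = min(row1, col1)
    p_obs = prob(a)
    # Sum the probabilities of all tables at most as likely as the observed one.
    total = sum(p for p in (prob(x) for x in range(lo, hi + 1))
                if p <= p_obs * (1 + 1e-9))
    return min(1.0, total)


def mk_test(dn, ds, pn, ps):
    """Classical McDonald-Kreitman test for one gene.

    dn, ds: non-synonymous and synonymous fixed differences between groups.
    pn, ps: non-synonymous and synonymous polymorphisms within groups.
    Returns (alpha, p_value); alpha is None when Dn or Ps is zero.
    """
    p_value = fisher_exact_two_sided(dn, ds, pn, ps)
    alpha = None if dn == 0 or ps == 0 else 1 - (ds * pn) / (dn * ps)
    return alpha, p_value


# Illustrative counts: an excess of non-synonymous fixed differences.
alpha, p_value = mk_test(dn=7, ds=17, pn=2, ps=42)
print(alpha, p_value)
```

A positive α with a small P value indicates more amino-acid substitutions between groups than expected from the polymorphism within them, which is the signature interpreted as positive selection; α near zero is compatible with neutrality.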
Discussion
We report here the genetic subdivision of P. roqueforti, the fungus used worldwide for blue cheese production, with unprecedented resolution, providing insights into its domestication history. Population genomics studies on strains from various substrates and from a large collection of cheese strains identified four genetically differentiated populations, two of which are cheese populations originating from independent and recent domestication events. One P. roqueforti cheese population included all but one of the genotyped strains used for PDO Roquefort cheeses, produced in the French town of Roquefort-sur-Soulzon, where blue cheeses have been made since at least the 15th century, and probably for much longer [33, 38-43]. The strains from this Roquefort cheese population lack the horizontally-transferred Wallaby and CheesyTer genomic islands, unlike the non-Roquefort cheese population.
We observed that the two P. roqueforti cheese populations differed on several traits important for cheese production, probably corresponding to historical discrepancies. Indeed, the Roquefort population has retained moderate genetic diversity, consistent with soft selection during pre-industrial times on multiple farms near Roquefort-sur-Soulzon, where specific strains were kept for several centuries. The Roquefort cheese population grew slower in cheese [31] and had weaker lipolytic activity. Slow maturation is particularly crucial for the storage of Roquefort cheeses for long periods in the absence of refrigeration [41] because they are made of ewe’s milk, a product available only between February and July. During storage, cheeses could become over degraded by too high rates of lipolysis, thus likely explaining the low lipolysis activity in Roquefort strains. By contrast, most other blue cheeses are produced from cow’s milk, which is available all year. The Roquefort population showed greater sporulation on bread, which is consistent with unconscious selection for this trait when strains were cultured on bread in Roquefort-sur-Soulzon farms before cheese inoculation during the end of the 19th and beginning of the 20th centuries.
Lipolytic activity is known to impact texture and the production of volatile compounds affecting cheese pungency [54-59]. The Roquefort and non-Roquefort populations showed different volatile compound profiles, which also suggests different flavor profiles. The discovery of different phenotypes in the two cheese populations, together with the availability of a protocol for inducing sexual reproduction in P. roqueforti [36], paves the way for crosses to counteract degeneration after clonal multiplication and bottlenecks, for variety improvement and the generation of diversity.
Both cheese populations were found to have gone through bottlenecks. A previous study showed that these bottlenecks, together with clonal multiplication, decreased fertility, with different stages in sexual reproduction affected in the two populations identified here as the Roquefort and non-Roquefort lineages [52]. The non-Roquefort cheese population, despite suffering from a more severe and more recent bottleneck, was found to be used in the production of all types of blue cheese worldwide, including Gorgonzola, Bleu d’Auvergne, Stilton, Cabrales and Fourme d’Ambert. We showed that it grows more rapidly on cheese [31], exhibits greater ability to colonize cheese cavities, higher salt tolerance and faster lipolysis than the Roquefort population. These characteristics are consistent with the non-Roquefort cheese population resulting from a very recent strong selection of traits beneficial for modern, accelerated, production of blue cheese using refrigeration techniques, followed by a worldwide dissemination for the production of all types of blue cheese. Such drastic losses of genetic diversity in domesticated organisms are typical of strong selection for industrial use by a few international firms and raise concerns about the conservation of genetic resources, the loss of which may hinder future innovation. More generally, in crops, the impoverishment in genetic diversity decreases the ability of cultivated populations to adapt to environmental and biotic changes to meet future needs [15-17]. The PDO label, which imposes the use of local strains, has probably contributed to the conservation of genetic diversity in the Roquefort population (see “Cahier des charges de l’appellation d’origine protégée Roquefort”, i.e., the technical specifications for Roquefort PDO).
We inferred two bottlenecks in the Roquefort population, more ancient than in the non-Roquefort population, likely corresponding to a pre-industrial domestication event when multiple local farms multiplied their strains, followed by a second bottleneck when fewer strains were kept by the first industrial societies. For other blue cheeses, even if their production was also ancient, the high-performing non-Roquefort clonal lineage could have been recently chosen to fit modern industrial production demands due to the lack of PDO rules imposing the use of local strains. However, despite a much lower genome-wide diversity in domesticated populations, the diversity of proteolytic activity and volatile compounds was found to be higher in cheese than in non-cheese populations. In fact, different strains with more or less rapid proteolysis and lipolysis are sold for specific blue cheese types (e.g., milder or stronger), in particular by the French LIP company (https://www.lip-sas.fr/index.php/nos-produits/penicillium-roquefortii/18-penicillium-roquefortii). Such a high phenotypic diversity within the cheese populations is consistent with diversification of usage under domestication, in particular when different characteristics are desired according to cheese type. This has already been observed in relation to the diversification of crop varieties or breeds in domesticated animals [13, 14].
When studying adaptation in domesticated organisms, it is often useful to contrast traits and genomic variants between domesticated and closely related wild populations to determine the nature of the adaptive changes occurring under artificial selection [53, 54]. The only known non-cheese populations of P. roqueforti occur essentially in human-made environments (silage, food and lumber), consistent with the specific adaptation of these populations to these environments. The two non-cheese populations were inferred to have diverged very recently and they displayed footprints of recombination and marked differentiation from the cheese populations. Domesticated populations are expected to be nested within their source populations, suggesting that we have not yet sampled the wild population most closely related to the cheese strains. The high level of diversity and inferred demographic history of P. roqueforti indicate that most food-spoiling strains belong to differentiated populations and are not feral cheese strains. In addition, not a single cheese strain was found in the food-spoiling and silage populations. This was shown both by genome sequences and by the genotyping of a larger number of strains using a few selected markers in the present study, and by microsatellite markers in a previous work [35]. Consequently, although P. roqueforti spores from blue cheeses may occasionally spoil food, food-spoiling and silage strains are not used for cheesemaking and do not recombine with cheese strains. Such a lack of incoming gene flow into cheese populations allowed trait differentiation in cheese strains, as expected under domestication.
It came as a surprise that the two non-cheese populations split more recently from each other than from the cheese lineages. In particular, the non-Roquefort population diverged the earliest from the unidentified ancestral population, and this has likely occurred in another environment than cheese. Much more recently, selection in industrial times has likely only kept the most performant clonal lineage of this population for cheesemaking, losing most of the initial diversity, as indicated by the very strong and recent bottleneck inferred in this lineage. Possible scenarios to explain the existence of two separated clusters thriving in food and silage differentiated from cheese strains include the very recent adaptive differentiation of a population from silage on human food or vice versa. The finding that silage strains are only found in one cluster (the orange in our representation) suggests an adaptation to this ecological niche, although experiments will be required to test this hypothesis. Food spoiling strains are in contrast found in three clusters and may thus not constitute a specific population adapted to this environment and may instead represent migrants from several populations belonging to other ecological niches. Green and orange clusters may alternatively represent populations thriving in yet unidentified environments, dispersing to silage and food.
The history of blue cheese production may provide circumstantial clues as to the origin of P. roqueforti cheese populations. Indeed, the first blue cheeses likely resulted from the sporadic accidental contamination of cheeses with spores from the environment, such as moldy food.
However, this would not be consistent with the demographic history inferred here for cheese and food-spoiling strains, as the cheese strains were not found to be nested within the food-spoiling strains, some of which originated from moldy bread. Furthermore, old French texts suggest that the blue mold colonized the cheese from within [33, 39, 40], which would indicate that the milk or curd was contaminated. French cheese producers began to inoculate cheeses with P. roqueforti spores from moldy rye bread at the end of the 19th century [33, 39, 40]. Breads were specifically made with a 2:1 mixture of wheat and rye flour and were baked rapidly at high temperature (500°C), to yield a protective crust, around a moist, undercooked interior [38, 41]; the mold developed from the inside of the bread after one to five months in the Roquefort caves [33, 39, 40]. Surveys of the microorganisms present in their caves [41, 55, 56] and our unsuccessful attempts to obtain samples from a maturing cellar suggest that P. roqueforti spores did not originate from the caves, which were nevertheless crucial due to the ideal conditions provided for P. roqueforti development [41]. Bread may have been colonized from the environment or from rye flour if the source P. roqueforti population was a rye endophyte or pathogen. This last hypothesis would be consistent with the lifestyle of many Penicillium species, which live in close association with plants, often acting as plant pathogens or necrotrophs [57], and with the occurrence of a P. roqueforti population in lumber and silage. If this hypothesis is correct, then cheeses may historically have become contaminated with P. roqueforti from fodder during milking.
Comparison between non-cheese and cheese populations allowed us to identify specific traits and genes that have been under selection in cheese as opposed to other environments. Furthermore, the two independently domesticated P. roqueforti cheese populations, exhibiting different traits, represent a good model for studying the genomic processes involved in adaptation. We were able to identify candidate genes and evolutionary mechanisms potentially involved in adaptation to cheese in P. roqueforti. The horizontally-transferred CheesyTer genomic island probably contributes to the faster growth of the strains identified here as constituting the non-Roquefort population [31]. Indeed, CheesyTer includes genes with putative functions involved in carbohydrate utilization (e.g. β-galactosidase and lactose permease genes) that are specifically expressed at the beginning of cheese maturation, when lactose and galactose are available. This horizontal gene transfer may thus have been involved in adaptation to recently developed industrial cheese production processes in the non-Roquefort cheese population, conferring faster growth. We also identified additional genomic islands specific to the non-Roquefort cheese population, probably acquired recently and including genes putatively involved in fungal growth and spore production. In the genomic islands specific to the cheese populations, several genes appeared to be involved in lipolysis, carbohydrate or amino-acid catabolism and metabolite transport, all of which are important biochemical processes in the development of cheese flavor. In the Roquefort cheese population, a genomic region harboring genes with footprints for positive selection included several genes encoding proteins potentially involved in aromatic amino-acid catabolism corresponding to precursors of volatile compounds. Further studies are required to determine the role of these genes in cheese flavor development.
In conclusion, we show that P. roqueforti cheese populations represent genuine domestication. Of course, the domestication process in cheese fungi has been more recent than, and different from, that in emblematic crops or animals, and may not exactly fit some definitions of domestication. Nevertheless, we did observe strong genetic differentiation from non-cheese populations, strong bottlenecks and trait differentiation with likely benefits for cheese production. Furthermore, a previous study has shown that the non-Roquefort cheese strains have acquired genes conferring better growth in cheese [31]. The two independent domestication events identified here represent parallel adaptations to the same new environment, a particularly powerful situation for studies of adaptation [7, 8, 11]. Our findings concerning the history of P. roqueforti domestication shed light on the processes of adaptation to rapid environmental change, but they also have industrial implications and raise questions about the conservation of genetic resources in the agri-food context.
Methods
Isolation attempts of Penicillium roqueforti in ripening cellar and dairy environments
In order to investigate whether a wild P. roqueforti population occurred in ripening cellars or dairy environments that could be at the origin of the observed cheese populations, we sampled spores from the air in an artisanal cheese dairy company (GAEC Le Lévejac, Saint Georges de Lévejac, France, ca 60 km from Roquefort-sur-Soulzon, producing no blue cheese to avoid feral strains, i.e. dispersal from inoculated cheeses). Sampling was performed in the sheepfold, milking parlour, cheese dairy and ripening cellar. We also sampled spores from the air in an abandoned ripening cellar in the town of Meyrueis (ca 70 km from Roquefort-sur-Soulzon) where Roquefort cheeses used to be produced and stored in the early 19th century. In total, 55 Petri dishes containing malt (2% cristomalt, Difal) and 3% ampicillin were left open for six days as traps for airborne spores (35 Petri dishes in the abandoned ripening cellar and 20 Petri dishes in the artisanal cheese dairy company). Numerous fungal colonies were obtained on the Petri dishes. One monospore was isolated from each of the 22 Penicillium-like colonies. DNA was extracted using the Nucleospin Soil Kit (Macherey-Nagel, Düren, Germany) and a fragment of the β-tubulin gene was amplified using the primer set Bt2a/Bt2b [59], and then sequenced. Sequences were blasted against the NCBI database to assign monospores to species. Based on β-tubulin sequences, ten strains were assigned to P. solitum, six to P. brevicompactum, two to P. bialowienzense, one to P. echinulatum and two to the Cladosporium genus. No P. roqueforti strain could thus be isolated from this sampling procedure.
Genome sequencing and analysis
The genomic DNAs of cheesemaking strains obtained from public collections belonging to P. roqueforti, seven strains of P. paneum, one strain of P. carneum and one strain of P. psychrosexualis (Supplementary Table 1) were extracted from fresh haploid mycelium after monospore isolation and growth for five days on malt agar using the Nucleospin Soil Kit (Macherey-Nagel, Düren, Germany). Sequencing was performed using the Illumina HiSeq 2500 paired-end technology (Illumina Inc.) with an average insert size of 400 bp at the GenoToul INRA platform and resulted in a 50x-100x coverage. In addition, the genomes of four strains (LCP05885, LCP06096, LCP06097 and LCP06098) were used that had previously been sequenced using the ABI SOLID technology [32]. GenBank accession numbers are HG792015-HG792062.
Identification of presence/absence polymorphism of blocks larger than 10 kbp in genomes was performed based on coverage using mapping against the FM164 P. roqueforti reference genome. In order to identify genomic regions that would be lacking in the FM164 genome but present in other strains, we used a second assembled genome, that of the UASWS P. roqueforti strain collected from bread, sequenced using Illumina HiSeq shotgun and displaying 428 contigs (Genbank accession numbers: JNNS01000420-JNNS01000428). Blocks larger than 10 kbp present in the UASWS genome and absent in the FM164 genome were identified using the nucmer program v3.1 [68]. Gene models for the UASWS genome were predicted with EuGene following the same pipeline as for the FM164 genome [32, 69]. The presence/absence of these regions in the P. roqueforti genomes was then determined using the coverage obtained by mapping reads against the UASWS genome with the start/end positions identified by nucmer. The absence of regions was inferred when less than five reads were mapped. In order to determine their presence/absence in other Penicillium species, the sequences of these regions were blasted against nine Penicillium reference genomes (Supplementary Table 1). PCR primer pairs were designed using Primer3Plus (http://www.bioinformatics.nl/cgi-bin/primer3plus/primer3plus.cgi/) in the flanking sequences of these genomic regions in order to check their presence/absence in a broader collection of P. roqueforti strains based on PCR tests (Supplementary Table 2). For each genomic island, two primer pairs were designed when possible (i.e. when sufficiently far from the ends of the scaffolds and not in repeated regions): one yielding a PCR product when the region was present and another one giving a band when the region was absent, in order to avoid relying only on lack of amplification for inferring the absence of a genomic region. 
PCRs were performed in a volume of 25 μL, containing 12.5 μL template DNA (ten-fold diluted), 0.625 U Taq DNA Polymerase (MP Biomedicals), 2.5 μL 10x PCR buffer, 1 μL of 2.5 mM dNTPs and 1 μL of each primer (10 pM). Amplification was performed using the following program: 5 min at 94°C and 30 cycles of 30 s at 94°C, 30 s at 60°C and 1 min at 72°C, followed by a final extension of 5 min at 72°C. PCR products were visualized using stained agarose gel electrophoresis. Data were deposited at the European Nucleotide Archive (http://www.ebi.ac.uk/ena/) under the accession numbers PRJEB20132 for whole genome sequencing and PRJEB20413 for Sanger sequencing.
For each strain, reads were mapped using stampy v1.0.21 [60] against the high-quality reference genome of the FM164 P. roqueforti strain [32]. In order to minimize the number of mismatches, reads were locally realigned using the genome analysis toolkit (GATK) IndelRealigner v3.2-2 [61]. SNP detection was performed using the GATK Unified Genotyper [61], based on the reference genome in which repeated sequences were detected using RepeatMasker [62] and masked, so that SNPs were not called in these regions. In total 483,831 bp were masked, corresponding to 1.67% of the FM164 genome sequence. The 1% and 99% quantiles of the distribution of coverage depth were assessed across each sequenced genome and SNPs called at positions where depth values fell in these extreme quantiles were removed from the dataset. Only SNPs with less than 10% of missing data were kept. After filtering, a total of 115,544 SNPs were kept.
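The two filtering steps described above (removal of SNPs at positions with extreme coverage depth, then removal of SNPs with 10% or more missing data) can be illustrated with a minimal Python sketch. The data layout, the function names `depth_bounds` and `filter_snps`, and the choice to drop a SNP as soon as one strain shows an extreme depth are assumptions made for illustration; they are not taken from the pipeline actually used.

```python
from statistics import quantiles


def depth_bounds(depths):
    """Return the 1% and 99% quantiles of one genome's coverage depths."""
    cuts = quantiles(depths, n=100, method="inclusive")  # 99 cut points
    return cuts[0], cuts[-1]


def filter_snps(snps, depth_by_strain, max_missing=0.10):
    """Return the positions of SNPs passing the depth and missing-data filters.

    snps: list of (position, {strain: (allele or None, depth)})
    depth_by_strain: {strain: list of per-site depths across the genome}
    """
    bounds = {s: depth_bounds(d) for s, d in depth_by_strain.items()}
    kept = []
    for position, calls in snps:
        # Assumption: a single call at an extreme depth removes the whole SNP.
        extreme = any(
            allele is not None and not (bounds[s][0] <= depth <= bounds[s][1])
            for s, (allele, depth) in calls.items()
        )
        # Fraction of strains without a genotype call at this position.
        missing = sum(allele is None for allele, _ in calls.values()) / len(calls)
        if not extreme and missing < max_missing:
            kept.append(position)
    return kept
```

In this sketch the depth thresholds are computed separately for each genome, following the text; a real pipeline would read depths and genotype calls from the mapping and variant files rather than from in-memory lists.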
The strain tree was inferred by maximum likelihood using the RAxML program v7.0.3 [63] under the GTRCAT model using 6905 concatenated genes. To take into account possible differences in nucleotide substitution rates, the dataset was divided into two partitions, one including the 1st and 2nd codon positions and one including the 3rd codon positions. To assess node confidence, 1000 bootstraps were computed.
Population structure was assessed using a discriminant analysis of principal components (DAPC) with the Adegenet R package [64]. The genetic structure was also inferred along the genome by clustering the strains according to similarities of their genotypes, in windows of 50 SNPs, using the Mclust function of the mclust R package [65, 66] with Gower’s distance and a Gaussian mixture clustering with K=7 (as the above analyses indicated the existence of four P. roqueforti populations and there were three outgroup species).
The nucleotide diversity, genetic diversity and linkage disequilibrium were estimated using the θπ, θw and r² statistics, respectively, with the compute and rsq programs associated with libsequence v1.8.9 [67] on 1145 sliding windows of 50 kb with 25 kb of overlap distributed along the eleven longest scaffolds of the FM164 assembly (> 200 kb).
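For readers unfamiliar with these statistics, the following Python sketch shows how θπ (mean number of pairwise differences) and θw (Watterson's estimator) can be computed for one window of aligned haploid sequences, and how 50 kb windows with 25 kb of overlap can be laid out along a scaffold. It is an illustration only, not the libsequence implementation; the function names are hypothetical and r² is not covered.

```python
from itertools import combinations


def windows(scaffold_length, size=50_000, step=25_000):
    """Yield (start, end) coordinates of overlapping sliding windows."""
    for start in range(0, scaffold_length - size + 1, step):
        yield start, start + size


def theta_pi(haplotypes):
    """Mean number of differences over all pairs of sequences."""
    pairs = list(combinations(haplotypes, 2))
    diffs = sum(sum(a != b for a, b in zip(x, y)) for x, y in pairs)
    return diffs / len(pairs)


def theta_w(haplotypes):
    """Watterson's estimator: segregating sites divided by the harmonic number a_n."""
    n = len(haplotypes)
    segregating = sum(len(set(column)) > 1 for column in zip(*haplotypes))
    a_n = sum(1.0 / i for i in range(1, n))
    return segregating / a_n


# Toy window with four haploid sequences of four sites each.
example = ["AAAA", "AAAT", "AATT", "AATT"]
print(theta_pi(example), theta_w(example))
```

Both values are per window here; dividing by the number of sites in the window would give per-site estimates.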
To identify genes evolving under positive selection in P. roqueforti genomes, we used the method implemented in SnIPRE [49], a Bayesian generalization of the log-linear model underlying the McDonald-Kreitman test. This method detects genes in which amino-acid changes are more frequent than expected under neutrality, by contrasting synonymous and non-synonymous SNPs, polymorphic or fixed in two groups, to account for gene-specific mutation rates.
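The contrast underlying this approach can be illustrated with the classical McDonald-Kreitman table for a single gene. The Python sketch below is not SnIPRE, which fits a Bayesian log-linear model across all genes; it only computes the neutrality index, the derived proportion of adaptive substitutions α, and a two-sided Fisher's exact test from four counts. Function names and the example counts are hypothetical.

```python
from math import comb


def neutrality_index(dn, ds, pn, ps):
    """NI = (Pn/Ps) / (Dn/Ds); values below 1 suggest an excess of fixed
    non-synonymous changes, as expected under positive selection."""
    return (pn / ps) / (dn / ds)


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test on the 2x2 table [[a, b], [c, d]]."""
    row1, row2, col1 = a + b, c + d, a + c
    total = row1 + row2

    def prob(x):
        # Hypergeometric probability of observing x in the top-left cell.
        return comb(row1, x) * comb(row2, col1 - x) / comb(total, col1)

    p_obs = prob(a)
    low, high = max(0, col1 - row2), min(row1, col1)
    # Sum the probabilities of all tables at least as extreme as the observed one.
    return sum(p for p in map(prob, range(low, high + 1)) if p <= p_obs * (1 + 1e-9))


# Illustrative counts: fixed (Dn, Ds) and polymorphic (Pn, Ps) differences.
dn, ds, pn, ps = 7, 17, 2, 42
ni = neutrality_index(dn, ds, pn, ps)
alpha = 1 - ni  # estimated proportion of adaptive non-synonymous substitutions
p_value = fisher_exact_two_sided(dn, ds, pn, ps)
print(round(ni, 3), round(alpha, 3), p_value < 0.05)
```

A per-gene test of this kind has little power when counts are small, which is one motivation for model-based generalizations that share information across genes.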
Strain genotyping
We identified two genomic regions with multiple diagnostic SNPs that discriminate between the two cheese clusters. Two PCR primer pairs were designed (Supplementary Table 2) to sequence these regions in order to assign the 65 strains (Supplementary Table 1) that can be purchased at the Laboratoire Interprofessionnel de Production d’Aurillac (LIP) (the main French supplier of P. roqueforti spores for artisanal and industrial cheese-makers; https://www.lip-sas.fr/) to the identified clusters. PCR products were then purified and sequenced at Eurofins (France). Because one of the cheese clusters included strains carrying the Wallaby and CheesyTer genomic islands while the second cluster strains lacked these genomic regions [31], we used previously developed primer pairs to check for the presence/absence of CheesyTer and Wallaby [31].
Sequences were first aligned with those extracted from the sequenced genomes using the MAFFT software [70], and the alignments were checked visually, allowing assignment of LIP strains to one of the two cheese populations. A tree was then reconstructed using RAxML under the GTRCAT substitution model, with two partitions corresponding to the two fragments and 1000 bootstrap replicates [63].
Strain phenotyping
Experimental cheeses were produced in an artisanal dairy company (GAEC Le Lévejac, Saint Georges de Lévejac, France). The same ewe curd was used for all produced cheeses. Seven P. roqueforti strains were used for inoculation (two from each of the Roquefort, non-Roquefort and silage/food spoiler clusters, and one from the lumber/food spoiler cluster; their identity is given in Supplementary Table 1) using 17.8 mg of lyophilized spores. Three cheeses were produced for each strain in cheese strainers (in oval pots with opposite diameters of 8 and 9 cm, respectively), as well as a control cheese without inoculation. After 48 h of draining, cheeses were salted (by surface scrubbing with coarse salt), pierced and placed in a maturing cellar for four weeks at 11°C. Cheeses were then sliced into six equal pieces and a picture of each slice was taken using a Nikon D7000 (zoom lens: Nikon 18-105mm f:3.5-5.6G). Pictures were analyzed using the geospatial image processing software ENVI (Harris Geospatial Solution) (Fig. 6F). This software enables pixel classification according to their level of blue, red, green, and grey into two to four classes depending on the analyzed image. This classification allowed assigning pixels to two classes corresponding to the inner white part and the cavities of the cheese, respectively (Fig. 6F). For each picture, the percentage of pixels corresponding to the cavities was then quantified. Because the software could not reliably assign pixels to the presence versus absence of the fungus in cavities, we visually determined the cavity areas that were colonized by P. roqueforti using images. This allowed calculating a cheese cavity colonization rate. Because Penicillium spores have a high dispersal ability which could cause contaminations, we confirmed strain identity present in cheeses by performing Sanger sequencing of four diagnostic markers designed based on SNPs and specific to each strain (Supplementary Table 2). 
For each cheese, three random monospore isolates were genotyped, and no contamination was detected (i.e. all the sequences obtained corresponded to the inoculated strains).
To compare the growth rates of the different P. roqueforti clusters on bread (i.e. the traditional multiplication medium), 24 strains were used (eight from each of the Roquefort and non-Roquefort cheese clusters, five from the silage/food spoiler cluster, and three from the lumber/food spoiler cluster; the identities of the strains are shown in Supplementary Table 1). Each strain was inoculated in a central point in three Petri dishes by depositing 10 μL of a standardized spore suspension (0.7×10⁹ spores/mL). Petri dishes contained agar (2%) and crushed organic cereal bread including rye (200 g/L). After three days at 25°C in the dark, two perpendicular diameters were measured for each colony to assess colony size.
The lipolytic and proteolytic activities of P. roqueforti strains were measured as follows: standardized spore suspensions (2500 spores/inoculation) for each strain (N=47: 15 from the Roquefort cluster, 15 from the non-Roquefort cheese cluster, 10 from the silage/food spoiler cluster and seven from the lumber/food spoiler cluster, identity in Supplementary Table 1) were inoculated on the top of a test tube containing agar and tributyrin for the lipolytic activity measure (10 mL/L, ACROS Organics, Belgium) or semi-skimmed milk for the proteolytic activity measure (40 g/L, from large retailers). The lipolytic and proteolytic activities were estimated by the degree of degradation of the compounds, which changes the media from opaque to translucent. For each medium, three independent experiments were conducted. For each strain, duplicates were performed in each experiment and the degree of enzymatic activity level in the medium was marked. Measures were highly repeatable between the two replicates (Pearson’s product-moment correlation coefficient of 0.93 in pairwise comparison between replicates, P<0.0001). We measured the distance between the initial mark and the translucent hydrolysis front after 7, 14, 21 and 28 days of growth at 20°C in the dark.
A total of 47 strains were used to compare spore production between the four P. roqueforti clusters (Supplementary Table 1), 15 belonging to the non-Roquefort cluster, 15 to the Roquefort cluster, 10 to the silage/food spoiler cluster and seven to the lumber/food spoiler cluster. After seven days of growth on malt agar in Petri dishes of 60 mm diameter at room temperature, we scraped all the fungal material by adding 5 mL of tween water 0.005%. We counted the number of spores per mL in the solution with a Malassez hemocytometer (mean of four squares per strain) for calibrating the spore solution. We spread 50 μL of the calibrated spore solution (i.e. 7×10⁶ spores·mL⁻¹) for each strain on Petri dishes of 60 mm diameter containing three different media, malt, cheese and bread agar (organic “La Vie Claire” bread mixed with agar), in duplicates (two plates per medium and per strain). After eight days of growth at room temperature, we took off a circular plug of medium with spores and mycelium at the top, using Falcon 15 mL conical centrifuge tubes (diameter of 15 mm). We inserted the plugs into 5 mL Eppendorf tubes containing 2 mL of tween water 0.005% and vortexed for 15 seconds to detach spores from the medium. Using a plate spectrophotometer, we measured the optical density (OD) at 600 nm for each culture in the supernatant after a four-fold dilution (Supplementary Table 3).
To compare salt tolerance between P. roqueforti clusters, 26 strains were used (eight from the Roquefort cluster, ten from the non-Roquefort cluster, three from the silage/food spoiler cluster, and five from the lumber/food spoiler cluster; strain identities are shown in Supplementary Table 1). For each strain and each medium, three Petri dishes were inoculated by depositing 10 μL of standardized spore suspension (0.7×10⁹ spores/mL) on Petri dishes containing either only malt (20 g/L), malt and salt (NaCl 8%, which corresponds to the salt concentration used before fridge use to avoid contaminants in blue cheeses), only goat cheese, or goat cheese and salt (NaCl 8%). The goat cheese medium was prepared as described in a previous study [31]. Strains were grown at 25°C and colony size measured daily for 24 days.
Volatile production assays were performed on 16 Roquefort strains and 19 non-Roquefort cheese strains grown on model cheeses as previously described [37]. Briefly, model cheeses were prepared in Petri dishes and incubated for 14 days at 25°C before removing three 10 mm-diameter plugs (equivalent to approximately 1 g). The plugs were then placed into 22 mL Perkin Elmer vials that were tightly closed with polytetrafluorethylene (PTFE)/silicone septa and stored at −80°C prior to analyses [37]. Analyses and data processing were carried out by headspace trap-gas chromatography-mass spectrometry (HS-trap-GC-MS) using a Perkin Elmer turbomatrix HS-40 trap sampler, a Clarus 680 gas chromatograph coupled to a Clarus 600T quadrupole MS (Perkin Elmer, Courtaboeuf, France), and the open source XCMS package of the R software (http://www.r-project.org/), respectively, as previously described [71].
All phenotypic measures are reported in Supplementary Table 3. Statistical analyses for testing differences in phenotypes between populations and/or media (Supplementary Table 4) were performed with the R software (http://www.r-project.org).
Differences in volatile profiles among the two P. roqueforti cheese populations were analyzed using a supervised multivariate analysis method, orthogonal partial least squares discriminant analysis (OPLS-DA). OPLS is an extension of principal components analysis (PCA), that is more powerful when the number of explained variables (Y) is much higher than the number of explanatory variables (X). PCA is an unsupervised method maximizing the variance explained in Y, while partial least squares (PLS) maximizes the covariance between X and Y(s). OPLS is a supervised method that aims at discriminating samples. It is a variant of PLS which uses orthogonal (uncorrelated) signal correction to maximize the explained covariance between X and Y on the first latent variable, and components >1 capture variance in X which is orthogonal (uncorrelated) to Y. The optimal number of latent variables was evaluated by cross-validation [72]. Finally, to identify the volatile compounds that were produced in significantly different quantities between the two populations, a t-test was performed using the R software (http://www.r-project.org/).
Demographic modeling using approximate Bayesian computation (ABC)
The likelihoods of 11 demographic scenarios for the P. roqueforti populations were compared using approximate Bayesian computation (ABC) [73, 74]. The scenarios differed in the order of demographic events, and included 21 parameters to be estimated (Supplementary Figure 2). A total of 262 fragments, ranging from 5 kb to 15 kb, were generated from observed SNPs by compiling in a fragment all adjacent SNPs in complete linkage disequilibrium. The population mutation rate θ (the product of the mutation rate and the effective population size) used for coalescent simulations was obtained from data using θw, the Watterson’s estimator. Simulated data were generated using the same fragment number and sizes as the SNP dataset generated from the genomes. Priors were sampled in a log-uniform distribution (Supplementary Figure 2C). For each scenario, one million coalescent simulations were run and the following summary statistics were calculated on observed and simulated data using msABC [75]: the number of segregating sites, the estimators K [76] and θw [77] of nucleotide diversity, Tajima’s D [78], the intragenic linkage disequilibrium coefficient ZnS [79], FST [80], the percentage of shared polymorphisms between populations, the percentage of private SNPs for each population, the percentage of fixed SNPs in each population, Fay and Wu’s H [81], the number of haplotypes [82] and the haplotype diversity [82]. For each summary statistic, both average and variance values across simulated fragments were calculated.
The choice of summary statistics to estimate posterior parameters is a crucial step in ABC [83]. We chose the summary statistics based on their capacity to discriminate scenarios, by testing whether their values were significantly different among scenarios across simulations using a Kruskal-Wallis test. We finally kept 34 summary statistics for model choice: average and variance of shared polymorphism percentages between pairs of populations, variance of private SNP percentages in lumber/food spoiler and non-Roquefort populations, average and variance of fixed SNP percentages between pairs of populations, average of FST between pairs of populations, average of Fay and Wu’s H and number of haplotypes in the Roquefort population.
The posterior probability distributions of the parameters, the goodness of fit for each model and model selection (Supplementary Figure 2E) were calculated using a rejection-regression procedure [73]. Acceptance values of 0.005 were used for all analyses. Regression analysis was performed using the “abc” R package (http://cran.r-project.org/web/packages/abc/index.html).
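The rejection step of such a procedure can be sketched in a few lines of Python. This is a toy illustration under stated assumptions: the simulator is a hypothetical one-parameter model rather than a coalescent simulation, summary statistics are not standardized, and the regression adjustment applied after rejection is omitted. Only the log-uniform prior and the acceptance value of 0.005 mirror the text.

```python
import math
import random


def log_uniform(rng, low, high):
    """Draw one value from a log-uniform prior on [low, high]."""
    return math.exp(rng.uniform(math.log(low), math.log(high)))


def abc_rejection(observed, simulate, prior, n_sims, tolerance, rng):
    """Keep the fraction `tolerance` of simulations closest to the observed statistics."""
    draws = []
    for _ in range(n_sims):
        theta = prior(rng)
        stats = simulate(theta, rng)
        # Euclidean distance between simulated and observed summary statistics.
        distance = math.sqrt(sum((s - o) ** 2 for s, o in zip(stats, observed)))
        draws.append((distance, theta))
    draws.sort(key=lambda pair: pair[0])
    n_keep = max(1, round(n_sims * tolerance))
    return [theta for _, theta in draws[:n_keep]]


def toy_simulator(theta, rng):
    """Hypothetical model: one summary statistic equal to theta plus noise."""
    return (theta + rng.gauss(0.0, 0.1),)


rng = random.Random(42)
posterior = abc_rejection(
    observed=(5.0,),
    simulate=toy_simulator,
    prior=lambda r: log_uniform(r, 0.1, 100.0),
    n_sims=20_000,
    tolerance=0.005,
    rng=rng,
)
print(len(posterior), sum(posterior) / len(posterior))
```

For model choice, the same idea applies with the scenario index drawn alongside the parameters: the share of each scenario among the accepted simulations approximates its posterior probability before any regression correction.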
Estimate of time since domestication
The multiple sequentially Markovian coalescent (MSMC) software was used to estimate the domestication times of cheese populations. The estimate of the last time gene flow occurred within each cheese population was taken as a proxy of time since domestication as it also corresponds in such methods to bottleneck date estimates and is more precisely estimated. Recombination rate was set at zero because sexual reproduction has likely not occurred since domestication in cheese populations (see Results). Segments were set to 21*1+1*2+1*3 for the Roquefort population, which contains three haplotypes (Figure 1), and to 10*1+15*2 for the non-Roquefort population, which contains two closely related haplotypes (Figure 1). In both cases, MSMC was run for 15 iterations with otherwise default parameters. The mutation rate was set to 10⁻⁸.
Author contributions
TG and AB acquired the funding, designed and supervised the study. SL and AS produced the genomes. ED, AB and RdlV analyzed the genomes. ED, SL, JR, AS, MC, AT, EC, MLP and DR performed the experiments. ED, AB and TG analyzed the data from the experiments. ED, AB and AF performed ABC analyses. ED and TG wrote the manuscript with contributions from the other authors.
Acknowledgments
This work was supported by the ERC starting grant GenomeFun 309403 awarded to TG, the ANR FROMA-GEN grant (ANR-12-PDOC-0030) to AB, and an “Attractivite” grant from Paris-Sud University to AB. We thank Kamel Soudani for help with image analysis and Aurelien Tellier for advice concerning ABC analyses. We are grateful to Coralie Benel and Francis Roujon of GAEC Le Lévejac for assistance with cheesemaking and Paul Villain for experimental help. Sequencing was performed at GenoToul INRA platform. We thank INRA and MNHN for granting access to four genomes sequenced with the help of Joelle Dupont, Sandrine Lacoste, Yves Brygoo and Jeanne Ropars in the framework of the ANR ‘Food Microbiomes’ project (ANR-08-ALIA-007-02) coordinated by Pierre Renault.