ABSTRACT
The domestication history of grapes (Vitis vinifera ssp. vinifera) has not yet been investigated with genome sequencing data. We gathered data for a sample of 18 cultivars and nine putatively wild accessions to address three features of domestication history. The first was demography. We estimated that the wild and cultivated samples diverged ~22,000 years ago. Thereafter the cultivated lineage experienced a steady decline in population size (Ne), reaching its nadir near the time of domestication ~8,000 years ago. The long decline may reflect low intensity cultivation by humans prior to domestication. Ne of the wild sample fluctuated over the same timeframe, commensurate with glacial expansion and retraction. Second, we characterized regions of putative selective sweeps, identifying 309 candidate-selected genes in the cultivated sample. The set included genes that function in sugar metabolism, flower development and stress responses. Selected genes in the wild sample were enriched exclusively for functions related to biotic and abiotic stresses. A genomic region of high differentiation between wild and domesticated samples corresponded to the sex determination region, which included a candidate gene for a male sterility factor and additional genes that vary in gene expression among sexes. Finally, we investigated the cost of domestication. Despite the lack of a strong domestication bottleneck, grape accessions contained 5.2% more deleterious variants than wild individuals, and these were more often in a heterozygous state. We confirm that clonal propagation leads to the accumulation of recessive deleterious mutations, which is a likely cause of severe inbreeding depression in grapes.
INTRODUCTION
Grapevines (Vitis vinifera ssp. vinifera) are the most economically important horticultural crop in the world (Myles et al., 2011). The products of grape cultivation include table grapes, raisins, juice, wine and oil, and they contribute an estimated $162B annually to the American economy alone (Initiative, 2007). In addition to its economic value, V. vinifera is a model organism for the study of perennial fruit crops, both because it can be transformed and micropropagated via somatic embryogenesis (Kikkert et al., 2005, Wang et al., 2005) and because it has a relatively small genome. Its ~500 Mb genome is similar in size to that of rice (430 Mb) (Goff et al., 2002) and poplar (465Mb) (Tuskan et al., 2006).
Cultivated grapes (hereafter vinifera) have been a source of food and wine since their domestication ~8000 years ago (8.0kya) from their wild progenitor, V. vinifera ssp. sylvestris (hereafter sylvestris) (McGovern et al., 2003). The exact location of domestication remains uncertain, but most lines of evidence point to a primary domestication event in the Near-East (McGovern et al., 2003, Myles et al., 2011). Domestication caused morphological shifts that include larger berry and bunch sizes, higher sugar content, altered seed morphology, and a shift from dioecy to a hermaphroditic mating system (This et al., 2007). There is interest in identifying the genes that contribute to these morphological shifts. For example, several papers have attempted to identify the gene(s) that are responsible for the shift to hermaphroditism, which are located in a ~150kb region on chromosome 2 (Fechter et al., 2012, Picq et al., 2014).
Historically, genetic diversity among V. vinifera varieties has been studied with simple sequence repeats (SSRs) (Bowers et al., 1999). More recently, a group genotyped 950 vinifera and 59 sylvestris accessions with a chip containing 9,000 SNPs (Myles et al., 2011). Their data suggest that grape domestication led to a noticeable, but mild, reduction of genetic diversity. Still more recent studies have used whole-genome sequencing (WGS) to assess structural variation among grape varieties (Di Genova et al., 2014, Cardone et al., 2016, Xu et al., 2016). Surprisingly, however, WGS data have not been used to investigate the population genomics of grapes.
Several population genomic studies have focused on annual plants, but few have focused on perennial crops like grapes (Velasco et al., 2016, Wang et al., 2017). The distinction between annual and perennial crops is important, because perennial domestication is expected to differ from annual domestication in at least three aspects (Miller and Gross, 2011, Gaut et al., 2015). The first is time. Long-lived perennials have extended juvenile stages and are often propagated clonally. As a result, the number of sexual generations is much reduced for perennials relative to annual crops, even for perennials like grapes that were domesticated relatively early in human agricultural history. The second is the severity of the domestication bottleneck. Many (and perhaps most) perennial crops differ from annual crops in that they have not experienced a severe domestication bottleneck (Miller and Gross, 2011). The third is clonal propagation; many perennials are propagated clonally but most annuals are not. Clonal propagation maintains genetic diversity in desirable combinations but also limits opportunities for sexual recombination (Miller and Gross, 2011, Ramu et al., 2017).
The distinct features of perennial domestication likely affect genome-wide patterns of genetic diversity. For example, perennials may differ from annuals with respect to the ‘cost of domestication’ (Lu et al., 2006), which refers to an increased genetic load within cultivars. This cost originates partly from the fact that the decreased effective population size (Ne) during a domestication bottleneck reduces the efficacy of genome-wide selection (Charlesworth and Willis, 2009), which may increase the frequency and number of slightly deleterious variants (Lohmueller, 2014, Henn et al., 2015). The characterization of deleterious variants is important, because they are potential targets for crop improvement (Morrell et al., 2011). Thus far, putatively deleterious variants have been studied in several annual crops (Renaut and Rieseberg, 2015, Kono et al., 2016, Liu et al., 2017, Ramu et al., 2017) but not in perennials. If perennials do not typically experience strong domestication bottlenecks (Miller and Gross, 2011), their domestication may not result in an associated cost (Gaut et al., 2015).
Here we perform WGS on a sample of vinifera cultivars and putatively wild sylvestris accessions to focus on three sets of questions. First, what do the data reveal about the demographic history of cultivated grapes, specifically the timing and severity of a domestication bottleneck? Second, what genes bear the signature of selection in vinifera? Finally, has there been an accumulation of putatively deleterious variants in grapes relative to sylvestris, or have the unique features of perennial domestication permitted an escape from this cost?
RESULTS
Plant samples and population structure: We collected WGS data from nine putatively wild sylvestris individuals from the Near-East that represent a single genetic group (Myles et al., 2011), 18 vinifera individuals representing 14 cultivars, and one outgroup (Muscadinia rotundifolia) (Table S1). Our sylvestris accessions are a subset of the wild sample from reference (Myles et al., 2011), which was filtered for provenance and authenticity. We nonetheless label the sylvestris sample as ‘putatively’ wild, because it can be difficult to identify truly wild individuals. Reads were mapped to the Pinot Noir reference genome PN40024 (Jaillon et al., 2007), resulting in the identification of 3,963,172 and 3,732,107 SNPs across the sylvestris and vinifera samples (see Methods).
To investigate population structure, we applied principal component analysis (PCA) to genotype likelihoods (Korneliussen et al., 2014). Only the first two principal components (PCs) were significant (P<0.001); they explained 23.03% and 21.88% of the total genetic variance, respectively (Fig. 1A). PC1 separated samples of wine and table grapes, except for two accessions (Italia and Muscat of Alexandria) positioned between the two groups. PC2 divided wild and cultivated samples. Wine, table and wild grapes clustered separately in a neighbor joining tree, except for Muscat of Alexandria, which has been used historically for both wine and table grapes (Fig. 1B). Finally, STRUCTURE analyses revealed an optimal grouping of K=4 which separated sylvestris accessions, table grapes, wine grapes and the Zinfandel/Primitivo subgroup of wine grapes, and also identified admixed individuals (Fig. S1).
Nucleotide diversity and demographic history: We estimated population genetic parameters based on the sylvestris accessions (n=9) and on a cultivated sample of n=14 that included only one Thompson clone and one Zinfandel/Primitivo clone (Table S1). Both samples harbored substantial levels of nucleotide diversity across all sites (sylvestris: πw = 0.0147 ± 0.0011; vinifera: πc = 0.0139 ± 0.0014; Fig. S2). Although π was higher in sylvestris (πc/πw = 0.94 ± 0.14), vinifera had higher levels of heterozygosity and Tajima’s D values (0.5421 ± 0.0932; sylvestris, D = −0.4651 ± 0.1577; Fig. S2). Linkage disequilibrium (LD) decayed to r2 < 0.2 within 20 kilobases (kb) in both samples, but it declined more slowly for vinifera after ~20kb (Fig. S2).
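The contrast between the positive D of vinifera and the negative D of sylvestris can be illustrated with the standard Tajima (1989) formula applied to an unfolded site frequency spectrum. The sketch below is for intuition only: the estimates reported here were derived from genotype likelihoods, whereas this example assumes hard allele counts, and the function name and toy spectra are hypothetical.

```python
import math


def tajimas_d(sfs, n):
    """Tajima's D from an unfolded SFS.

    sfs[i-1] is the number of sites at which the derived allele is seen
    i times among n sampled chromosomes (i = 1 .. n-1).
    """
    if len(sfs) != n - 1:
        raise ValueError("sfs must have n-1 entries")
    seg_sites = sum(sfs)
    if seg_sites == 0:
        return 0.0
    # Mean pairwise differences (pi) summed over sites.
    pairs = n * (n - 1) / 2
    pi = sum(i * (n - i) * c for i, c in enumerate(sfs, start=1)) / pairs
    # Constants from Tajima (1989).
    a1 = sum(1 / i for i in range(1, n))
    a2 = sum(1 / i ** 2 for i in range(1, n))
    b1 = (n + 1) / (3 * (n - 1))
    b2 = 2 * (n * n + n + 3) / (9 * n * (n - 1))
    c1 = b1 - 1 / a1
    c2 = b2 - (n + 2) / (a1 * n) + a2 / a1 ** 2
    e1 = c1 / a1
    e2 = c2 / (a1 ** 2 + a2)
    variance = e1 * seg_sites + e2 * seg_sites * (seg_sites - 1)
    return (pi - seg_sites / a1) / math.sqrt(variance)


# Toy spectra for n = 18 chromosomes (e.g., nine diploid individuals).
rare_heavy = [100] + [0] * 16          # excess of singletons -> D < 0
intermediate = [0] * 8 + [100] + [0] * 8  # excess of mid-frequency -> D > 0
print(tajimas_d(rare_heavy, 18), tajimas_d(intermediate, 18))
```

An excess of intermediate-frequency variants inflates π relative to the number of segregating sites and yields positive D, whereas an excess of rare variants yields negative D.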
We inferred the demographic history of the vinifera sample using MSMC, which requires phased SNPs (Schiffels and Durbin, 2014). Assuming a generation time of 3 years (McGovern et al., 2003) and a mutation rate of 2.5 × 10−9 mutations per nucleotide per year (Koch et al., 2000), we converted scaled population parameters into years and individuals (Ne). Based on these analyses, vinifera experienced a continual reduction of Ne starting ~22.0kya until its nadir from ~7.0kya to 11.0kya (Fig. 2A), which corresponds to the time of domestication and implies a mild domestication bottleneck. Notably, there was no evidence for a dramatic expansion of Ne since domestication. MSMC results were similar across two separate analyses (Fig. 2A), based on n=4 samples of either table or wine grapes (Table S1), suggesting that analyses captured shared aspects of the samples’ histories. We also used MSMC to compute divergence times. The divergence between sylvestris and vinifera was estimated to be ~21kya (Fig. 2B), which corresponds to the onset of the decline of vinifera Ne. Divergence between wine and table grapes was estimated to be ~2.5kya, which is well within the hypothesized period of vinifera domestication (Fig. 2B).
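The conversion from scaled MSMC output to years and Ne can be sketched as follows, using the generation time and mutation rate stated above. The scaling rule (divide scaled time by the per-generation mutation rate; take the inverse coalescence rate and divide by twice that rate) follows the usual MSMC convention; the function name and the input values are hypothetical and are not taken from our output files.

```python
MU_PER_YEAR = 2.5e-9   # mutations per nucleotide per year (from the text)
GEN_TIME_YEARS = 3.0   # assumed generation time (from the text)


def scale_msmc(scaled_time, coalescence_rate,
               mu_per_year=MU_PER_YEAR, gen_time=GEN_TIME_YEARS):
    """Convert one MSMC time segment to (years, Ne).

    scaled_time      -- segment boundary in units of mutations per site
    coalescence_rate -- scaled coalescence rate (lambda) for the segment
    """
    mu_per_generation = mu_per_year * gen_time
    generations = scaled_time / mu_per_generation
    years = generations * gen_time
    ne = (1.0 / coalescence_rate) / (2.0 * mu_per_generation)
    return years, ne


# Hypothetical segment: a scaled time of 5.5e-5 maps to ~22,000 years.
years, ne = scale_msmc(5.5e-5, 1000.0)
print(round(years), round(ne))
```

Because time in years equals scaled time divided by the per-year mutation rate, the generation time affects the Ne estimates but not the dating of events.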
We repeated demographic analyses with SMC++, which estimates population histories and divergences without phasing (Terhorst et al., 2017)(Fig. 2C). This method yielded no evidence for a discrete bottleneck from ~7.0 to 10kya, but SMC++ and MSMC analyses had four similarities: i) an estimated divergence time of ~30kya that greatly predates domestication; ii) a slow decline in vinifera Ne since divergence; iii) no evidence for a rapid expansion in Ne after domestication; and iv) a ~2.6kya divergence of wine and table grapes (Fig. 2C, Fig. S3). We also used SMC++ to infer the demographic history of our sylvestris sample, revealing a complex Ne pattern that corresponds to features of climatic history (see Discussion).
Sweep mapping: We investigated patterns of selection and interspecific differentiation across the grape genome, using two metrics: CLR (Pavlidis et al., 2013) and FST. The CLR test identifies potential selection regions by detecting skews in the site frequency spectrum (sfs) within a single taxon, while FST focuses on regions of high divergence between taxa. All sweep analyses focused on sliding 20kb windows, reflecting the genome-wide pattern of LD decline (Fig. S2). Windows that scored in the top 0.5% were considered candidate sweep regions.
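The outlier criterion can be expressed compactly. The following is a minimal sketch, assuming one score (CLR or FST) per 20kb window; the function name is hypothetical, and details such as tie handling are simplifications rather than a description of our pipeline.

```python
def top_fraction_windows(scores, fraction=0.005):
    """Return sorted indices of windows scoring in the top `fraction`."""
    if not 0 < fraction <= 1:
        raise ValueError("fraction must be in (0, 1]")
    if not scores:
        return []
    # Number of windows to keep; at least one when any windows exist.
    keep = max(1, round(len(scores) * fraction))
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return sorted(ranked[:keep])


# Toy example: 1,000 windows with increasing scores; 0.5% keeps five.
toy_scores = [float(i) for i in range(1000)]
print(top_fraction_windows(toy_scores))
```

An empirical cut-off of this kind ranks windows relative to the genome-wide distribution; it identifies candidates but does not by itself assign a false-positive rate.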
CLR analyses within vinifera identified 117 20kb windows encompassing 309 candidate-selected genes (Table S2, Fig. S4). Among these genes, nine functional categories were identified as significantly overrepresented (P ≤ 0.01), including the “Alcohol dehydrogenase superfamily”, “Monoterpenoid indole alkaloid biosynthesis” and “Flower development” (Table S3). We identified genes involved in berry development and/or quality, including the SWEET1 gene, which encodes a bidirectional sugar transporter (Chong et al., 2014). SWEET1 is expressed primarily in young and adult leaves (Chong et al., 2014), but it was also overexpressed in full-ripe berries compared to immature berries (adj. P=9.4E-3; Fig. S5), suggesting its involvement in sugar accumulation during berry ripening. SWEET1 was also identified when we contrasted non-admixed table and wine grapes using FST. Based on haplotype structure, we hypothesize that the skewed SWEET1 sfs within vinifera is due to elevated divergence between wine and table grapes. Additional genes of interest within vinifera included: i) a leucoanthocyanidin dioxygenase (LDOX) gene (VIT_08s0105g00380) that peaks in expression at the end of veraison (adj. P=8.9E-10; Fig. S5) and may be involved in proanthocyanidins accumulation (Bogs et al., 2005, Blanco-Ulate et al., 2015, Savoi et al., 2016); ii) genes potentially involved in berry softening, such as two pectinesterase-coding genes and a xyloglucan endotransglucosylase/hydrolase gene that exhibited maximal expression in post-veraison berry pericarps (Fig. S5); and iii) flowering time genes, including a Phytochrome C homolog.
CLR analyses of the sylvestris sample were notable for three reasons. First, the top 0.5% of windows yielded far fewer (88 vs. 309) genes (Table S2). Second, candidate-selected regions within sylvestris were distinct from those in vinifera (Fig. 3A); none of the putatively selected regions overlapped between taxa. Third, candidate-selected genes were enriched primarily for stress resistance (Table S4), including flavonoid production (P=6.27E-3), ethylene-mediated signaling pathways (P=8.76E-6), and the stilbenoid biosynthesis pathway (P=1.93E-50). Stilbenoids accumulate in response to biotic and abiotic stresses (Bézier et al., 2002, Wang et al., 2010, Amrine et al., 2015).
We also applied FST to regions of divergent selection between taxa, yielding an additional 929 candidate-selected genes (Tables S2 & S5). One peak of divergence was located from 4.90Mb to 5.33Mb on chromosome 2, which coincides with the sex determination region. FST analyses detected two peaks in this region; the first was identified previously by QTL analyses (Riaz et al., 2006, Fechter et al., 2012), and the wider region was identified from diversity data (Picq et al., 2014). The two peaks contain 13 and 32 genes, respectively. In the first peak, six genes were overexpressed in female (F) compared to both male (M) and hermaphroditic (H) flowers (adj. P ≤ 0.05; Fig. S6; Table S6), representing a non-random enrichment of F expression under the peak (binomial; P<10−7). One of these genes had been identified as a candidate male sterility gene (VviFSEX) (Coito et al., 2017). The second peak included four genes with biased sex expression: one with higher F expression, two with higher H expression and one with higher M expression (Table S6). Altogether, sweep mapping identified numerous candidate genes that may prove useful both for understanding the phenotypic consequences of domestication and for targeted improvement.
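The binomial enrichment test for F-biased genes under the first peak asks how often six or more of 13 genes would show F overexpression by chance. A minimal sketch of the upper-tail calculation is given below; the genome-wide background rate of F-biased genes is not stated in the main text, so the value used here (2%) is a purely hypothetical placeholder and the printed probability is illustrative only.

```python
from math import comb


def binom_upper_tail(k, n, p):
    """P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, x) * p ** x * (1 - p) ** (n - x)
               for x in range(k, n + 1))


# Six F-overexpressed genes among 13 (from the text); the background
# rate of 0.02 is a hypothetical placeholder, not an estimate.
print(binom_upper_tail(6, 13, 0.02))
```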
Deleterious variants: Domesticated species accumulate more deleterious variants than their progenitors (Marsden et al., 2016, Liu et al., 2017, Ramu et al., 2017). This phenomenon appears to be common for annual crops, but it is not clear whether the unique features of perennial domestication produce a similar effect (Gaut et al., 2015). To examine the potential increase in the number and frequency of deleterious variants at nonsynonymous sites between our vinifera and sylvestris samples, we predicted deleterious SNPs using SIFT (Ng and Henikoff, 2003). A total of 33,653 nonsynonymous mutations were predicted to be deleterious in both samples. The number of derived deleterious variants was 5.2% higher, on average, for vinifera individuals than sylvestris individuals (Fig. 4A), and the ratio of deleterious to synonymous variants was also elevated in vinifera (Fig. S7). Most (~77%) deleterious variants were found in a heterozygous state in both samples, but the distribution by state differed between taxa, because deleterious variants were more often homozygous in sylvestris (P < 0.001, Fig. 4A). Cultivated accessions had a higher proportion of heterozygous deleterious variants (P = 0.002, Fig. 4A) and an elevated ratio of deleterious to synonymous variants (P < 0.001, Fig. S7).
Cassava, another clonally propagated crop, also had high levels of heterozygous deleterious variants (Ramu et al., 2017). To determine whether clonal propagation can drive this phenomenon, we performed forward simulations under two mating systems: outcrossing and clonal propagation. Each mating system was considered under three demographic models: a constant size population, a long ~30ky population decline similar to that inferred from SMC++ analysis, and a discrete bottleneck (see Methods). As expected (Simons et al., 2014), neither the mating system nor the demographic model had an effect on the accumulation of deleterious variants under an additive model (Fig. 5). Under a recessive model with outcrossing, a population bottleneck led to purging of deleterious variants (Fig. 5), as shown previously (Simons et al., 2014). However, clonal propagation drove the accumulation of deleterious variants under all demographic scenarios, with a bottleneck exacerbating the effect (Fig. 5). We conclude that clonal propagation drives the accumulation of deleterious, recessive alleles in heterozygous regions.
Selective sweeps can also drive the accumulation of deleterious variants, because they drag linked sites to high frequency (Fay and Wu, 2000, Hartfield and Otto, 2011). We examined the distribution of putatively deleterious variants in sweep regions compared to the remainder of the genome (i.e., the ‘control’). Sweep regions contained a significantly lower number of deleterious mutations when corrected for length (P < 0.001, Fig. 4D), but these variants were also found at significantly higher frequencies (P < 0.001, Fig. 4E) and in higher numbers relative to synonymous variants (P < 0.001; Fig. 4F), indicative of hitchhiking. All of these trends, including the number of deleterious variants per individual, the distribution by state, and the effects in sweep regions, were qualitatively similar using PROVEAN (Choi et al., 2012) to identify deleterious variants.
DISCUSSION
The Eurasian wild grape (Vitis vinifera subsp. sylvestris) is a dioecious, perennial, forest vine that was widely distributed in the Near East and the northern Mediterranean prior to its domestication (Zohary and Spiegel-Roy, 1975). The earliest archaeological evidence of wine production suggests that domestication took place in the Southern Caucasus between the Caspian and Black Seas ~6.0-8.0kya (McGovern et al., 1996, 2003). After domestication, the cultivars spread south by 5.0kya to the western side of the Fertile Crescent, the Jordan Valley, and Egypt, and finally reached Western Europe by ~2.8kya (Olmo, 1995, McGovern et al., 2003). Here, however, we are not concerned with the spread of modern grapes, but rather demographic history before and during domestication, the identity of genes that may have played a role in domestication, and the potential effects of domestication and breeding on the accumulation of deleterious variants.
A weak domestication bottleneck: We have gathered genome-wide resequencing data from a sample of table grapes, wine grapes and putatively wild grapes to investigate population structure and demographic history. These analyses lead to our first conclusion, which is that our sylvestris sample represents bona fide wild grapes, as opposed to feral escapees from domestication. This conclusion is evident from the fact that the sylvestris accessions cluster together in population structure analyses (Fig. 1), that they are estimated to have diverged from cultivated grapes ~22 to 30kya (Fig. 2), and that the set of putatively selected genes differ markedly between the vinifera and sylvestris samples (Fig. 3A). The divergence time between wild and cultivated samples suggests, however, that our sylvestris accessions likely do not represent the progenitor population of domesticated grapes.
Analyses of the vinifera data suggest that its historical population size has experienced a long decline starting from ~22.0 to ~30.0kya. MSMC analyses suggest that this decline culminated in a weak bottleneck around the estimated time of domestication (Fig. 2A). The potential bottleneck corresponds to the estimated time of grape domestication and the shift from hunter–gatherer to agrarian societies (Purugganan and Fuller, 2009). We note, however, that SMC++ analysis found no evidence for a distinct bottleneck, but instead inferred a consistent Ne decline (Fig. 2C). The question becomes, then, whether the domestication of vinifera included a discrete bottleneck. The evidence is mixed. The positive D for vinifera superficially suggests a population bottleneck, but forward simulations show that positive D values also result from a long population decline (Fig. S8). If there was a discrete bottleneck for grapes, we join previous studies in concluding that it was weak (Grassi et al., 2003, Barnaud et al., 2010, Myles et al., 2011), based on two lines of evidence. First, the diversity level in our vinifera sample is 94% that of sylvestris, representing a far higher cultivated-to-wild ratio than that of maize (83%) (Hufford et al., 2012), indica rice (64%) (Liu et al., 2017), soybean (83%) (Lam et al., 2010), cassava (71%) (Ramu et al., 2017) and tomato (54%) (Lin et al., 2014). This high ratio is consistent with a meta-analysis documenting that perennial crops retain 95% of neutral variation from their progenitors, on average, while annuals retain an average of 60% (Miller and Gross, 2011). Second, MSMC analyses suggest a ~2 to 3-fold reduction in Ne at the time of domestication (Fig. 2A). This implies that 33%-50% of the progenitor population was retained during domestication, a percentage that contrasts markedly with the <10% estimated for maize (Wright et al., 2005, Beissinger et al., 2016) and ~2% for rice (Zhu et al., 2007).
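The correspondence between a fold reduction in Ne and the fraction of the progenitor population retained is simple arithmetic, sketched here for the 2- to 3-fold range inferred from MSMC. The function name is hypothetical.

```python
def retained_fraction(fold_reduction):
    """Fraction of progenitor Ne retained after an x-fold reduction."""
    if fold_reduction < 1:
        raise ValueError("fold_reduction must be >= 1")
    return 1.0 / fold_reduction


# A 2- to 3-fold reduction corresponds to retaining 50% to ~33% of Ne.
for fold in (2, 3):
    print(fold, f"{retained_fraction(fold):.0%}")
```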
A protracted pre-domestication history: The protracted decline in Ne for vinifera prompts a question about its cause(s). One possibility is that it reflects natural processes that acted on vinifera progenitor populations. For example, climatic shifts may have contributed to the long Ne decline, because the Last Glacial Maximum (LGM) occurred between 33.0 and 26.5kya (Clark et al., 2009). If the LGM caused vinifera’s population decline, one might expect to see population recovery during glacial retraction from 19.0 to 20.0kya. We detect evidence of such recovery in sylvestris but not vinifera (Fig. 2).
A second possibility is that proto-vinifera populations experienced a long period of human-mediated management. The crop domestication literature espouses two views about the speed of domestication. One view, supported primarily by the fossil record, is that domestication is a slow process that takes millennia (Purugganan and Fuller, 2009, 2011, Fuller et al., 2014). A second view argues that domestication occurred much more rapidly, based on genetic evidence and population modeling (Gaut, 2015). The gap between these two views has been bridged, in part, by a recent study of African rice (Meyer et al., 2016). The study inferred an Ne decline that started ~15kya and reached its nadir ~4kya, when African rice was thought to have been domesticated. The authors hypothesized that the Ne decline reflects a protracted period of low-intensity management and/or cultivation prior to modern domestication. We propose that a similar period of low-intensity usage by humans may have contributed to the long Ne decline in vinifera, especially given the contrasting historical pattern of the sylvestris sample (Fig. 3C). It is difficult to prove this proposition, but we do note that humans have inhabited the Southern Caucasus mountains, with some sites bearing evidence of human habitation for > 20k years (Adler and Tushabramishvili, 2004).
A surprising feature of demographic inference is that there is no evidence for a post-domestication expansion of vinifera (Fig. 2). This observation contrasts sharply with studies of maize (Beissinger et al., 2016) and African rice (Meyer et al., 2016), both of which had > 5-fold Ne increases following domestication. We hypothesize that the lack of expansion in grapes relates to the dynamics of perennial domestication, specifically clonal propagation and the short time frame (in generations). Data from peach are consistent with our hypothesis, but peach also has extremely low historical levels of Ne (Velasco et al., 2016). Almond, which is another clonally-propagated perennial, exhibits ~2-fold Ne expansion after domestication (Velasco et al., 2016), but it also may have been propagated sexually prior to the discovery of grafting (Zohary and Hopf, 2000). Clearly more work needs to be done to compare demographic and domestication histories across crops with varied histories.
Our demographic inference has caveats. First, our study, along with all previous studies, has likely not measured genetic diversity from the precise progenitor population to vinifera. Indeed, such a population may be extinct or at least substantially modified since domestication. Second, our sample size is modest, but it is sufficient to infer broad historical patterns (Schiffels and Durbin, 2014). Consistent with this supposition, the two runs of MSMC with two different samples of n=4 yielded qualitatively identical inferences about the demographic history of vinifera. Larger samples will be necessary for investigating more recent population history. Finally, demographic calculations assume a mutation rate and a generation time that may be incorrect, and they also treat all sites equivalently. We note that masking selected regions provides similar inferences (Fig. S3). We also gain confidence from the fact that our observations are consistent with independent estimates about domestication times and glacial events.
Selective sweeps and agronomically important genes: Selective sweep analyses have identified genes and regions that have been previously suspected to mediate agronomic changes. One example is that of the SWEET1 gene, which is detected within potential sweep regions identified by vinifera CLR analyses and by FST between table vs. wine grapes (Figs. 3C & S4). The latter comparison needs to be considered with care, because we contrasted four Thompson clones, which exhibited no evidence of admixture, against a similarly non-admixed subset of wine grape accessions (Table S1). Even so, these analyses suggest that at least one difference between wine and table grapes is attributable to the SWEET1 sugar transporter, that 657 genes differentiate wine and table grapes (Table S2) and that the two groups diverged ~2.5kya (Fig. 2B and Tables S2 & S7).
One major change during domestication was the switch from dioecy to hermaphroditism (This et al., 2006). The sex-determining region resides on chromosome 2, based on QTL analyses that fine-mapped the sex locus between ~4.90 and 5.05 Mbp (Fechter et al., 2012, Picq et al., 2014). The region corresponds to a larger chromosomal segment from 4.75 Mb to 5.39 Mb, based on diversity data and segregation in multiple families (Hyma et al., 2015). With population genomic data, we identify a region from 4.90Mb to 5.33Mb that contains two discrete peaks of divergence (from ~4.90 to 5.05Mb and from ~5.2 to 5.3 Mb; Fig. 4B). We posit that the two peaks are meaningful, because a shift to hermaphroditism is thought to require two closely-linked loci: one that causes loss of M function and another that houses a dominant F sterility mutation (Charlesworth et al., 2005, Charlesworth, 2013). The first peak contains six genes overexpressed in F flowers, including VviFSEX, which may abort stamen development (Coito et al., 2017). We hypothesize that the second peak houses a dominant F sterility factor. The leading candidates for this function are four genes that are differentially expressed among sexes (Table S2), but none of the four are annotated with an obvious function in sex determination (Ramos et al., 2017) (Table S6).
The cost of domestication in a clonally propagated perennial: It has been unclear whether perennial crops demonstrate an increased burden of slightly deleterious mutations (Gaut et al., 2015), especially given that most have experienced only moderate bottlenecks (Miller and Gross, 2011). We find, however, that there is a cost associated with grape domestication: each vinifera accession contains 5.2% more putatively deleterious SNPs, on average, than the wild individuals in our sample. This difference exceeds that observed in dogs (2.6%) (Marsden et al., 2016) and rice (~3-4%) (Liu et al., 2017) but pales in comparison to cassava (26%), a clonally propagated annual (Ramu et al., 2017).
We have shown that clonal propagation leads to the accumulation of deleterious recessive mutations (Fig. 5 & S9), but it does not explain why cassava has had a more dramatic increase in deleterious variants (i.e., 26% vs. 5.2%) as a consequence of domestication. There are at least two reasons. The first is that cassava underwent a strong domestication bottleneck (Ramu et al., 2017). Forward simulations indicate that the deleterious burden is exacerbated with a discrete bottleneck compared to a protracted population decline (Fig. 5). Second, mutation rates vary between annuals and perennials (Gaut et al., 2011), so that cassava may harbor more post-domestication mutations that contribute to cost. Altogether, the high level of deleterious, recessive, heterozygous mutations in cultivated grape provides a genomic explanation for a well-known feature of grape breeding, which is severe inbreeding depression (Kole, 2011).
MATERIALS AND METHODS
For full materials and methods, please see SI Appendix, Supplementary Text. We collected leaf tissue for 13 individuals from 11 vinifera cultivars, nine sylvestris accessions and one accession of V. rotundifolia (subgenus Muscadinia) from the USDA grape germplasm collections in Davis, California (Table S1). DNA was extracted from leaf samples, Illumina paired-end sequencing libraries were constructed (TruSeq), and libraries were sequenced as 150-bp paired reads. Illumina raw reads for five other cultivars were gathered from the Short Read Archive (SRA) at NCBI (Table S1).
Reads were trimmed, filtered and mapped to the PN40024 reference (12X) (Jaillon et al., 2007). Local realignment was performed around indels; reads were filtered for PCR duplicates; and sites with extremely low or high coverage were removed. For population structure analyses, we used ANGSD (Korneliussen et al., 2014) to generate a BEAGLE file for the variable subset of the genome, and then applied NGSadmix (Skotte et al., 2013). To measure genome-wide genetic diversity and other population parameters, we estimated a genome-wide sfs from genotype likelihoods (Korneliussen et al., 2014). PCA analyses were also based on the sfs.
Functional regions were based on the V. vinifera genome annotation in Ensembl (v34). Nonsynonymous SNPs were predicted to be deleterious based on a SIFT score ≤ 0.05 (Kumar et al., 2009). The V. rotundifolia outgroup allele was submitted to prediction programs to avoid reference bias (Kono et al., 2016, Liu et al., 2017). The number of deleterious or synonymous alleles per individual or region was calculated as 2 × the number of homozygous variants + heterozygous variants (Henn et al., 2016).
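The per-individual allele count described above (2 × homozygous variants + heterozygous variants) can be sketched as follows. The genotype coding (0, 1 or 2 copies of the derived deleterious allele, None for missing calls), the function names and the toy data are assumptions made for illustration.

```python
def derived_allele_count(genotypes):
    """Count derived alleles as 2 x homozygous + heterozygous variants.

    genotypes -- list of per-site codes: 0, 1 or 2 copies of the derived
                 allele, or None for a missing call (assumed coding).
    """
    homozygous = sum(1 for g in genotypes if g == 2)
    heterozygous = sum(1 for g in genotypes if g == 1)
    return 2 * homozygous + heterozygous


def heterozygous_fraction(genotypes):
    """Fraction of variant sites (code 1 or 2) that are heterozygous."""
    variant = [g for g in genotypes if g in (1, 2)]
    if not variant:
        return 0.0
    return sum(1 for g in variant if g == 1) / len(variant)


# Toy individual: three heterozygous sites, one homozygous, one missing.
toy = [1, 0, 2, 1, None, 1]
print(derived_allele_count(toy), heterozygous_fraction(toy))
```

Counting alleles rather than sites lets individuals with different mixes of homozygous and heterozygous variants be compared on a common scale.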
We employed MSMC 2.0 to estimate Ne over time (Li and Durbin, 2011, Schiffels and Durbin, 2014), based on SNPs called in GATK v3.5 (DePristo et al., 2011)(see SI Appendix, Supplementary Text). Segregating sites within each sample were phased and imputed using Shapeit (Delaneau et al., 2013) based on a genetic map (Hyma et al., 2015). Demographic history was also inferred with SMC++, which analyzes multiple genotypes without phasing (Terhorst et al., 2017). SweeD (Pavlidis et al., 2013) was used to detect selective sweeps. FST values were averaged within 20 kbp non-overlapping windows using ANGSD (Korneliussen et al., 2014).
Functional categories were assigned to genes using VitisNet functional annotations (Grimplet et al., 2009). We tested functional category enrichment using Fisher’s Exact Test, with P ≤ 0.01 considered as significant. Gene expression data used SRA data for berry (SRP049306) and flower (SRP041212) samples. Reads were trimmed for quality and then mapped onto the PN40024 transcriptome (v.V1 from http://genomes.cribi.unipd.it/grape/) using Bowtie2 (Langmead and Salzberg, 2012). DESeq2 (Love et al., 2014) was used to normalize read counts and to test for differential expression.
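Enrichment of a functional category among candidate-selected genes can be sketched as a one-sided Fisher's Exact Test, which is equivalent to the upper tail of a hypergeometric distribution. The text does not specify whether the test was one- or two-sided, so the one-sided form is an assumption; the function name and all counts in the example other than the 309 candidate genes are hypothetical.

```python
from math import comb


def enrichment_p(k, n_selected, category_size, n_genes):
    """One-sided Fisher's exact P: probability of >= k category genes
    among n_selected genes drawn from n_genes, of which category_size
    belong to the category."""
    upper = min(n_selected, category_size)
    total = comb(n_genes, n_selected)
    tail = sum(comb(category_size, x) *
               comb(n_genes - category_size, n_selected - x)
               for x in range(k, upper + 1))
    return tail / total


# Hypothetical counts: 8 of 309 candidate genes fall in a category that
# contains 100 of 30,000 annotated genes.
p = enrichment_p(8, 309, 100, 30000)
print(p, p <= 0.01)
```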
Forward in time simulations were carried out using fwdpy11 (Thornton, 2014). Five hundred replicate simulations were run for each demographic and mating scheme model. The population decline model was based on SMC++ results and rescaled for computational performance. Three demographic models were simulated: constant population size, a linear population decline, and a discrete bottleneck. Two mating schemes were simulated: strict outcrossing for the whole simulation, and outcrossing with clonal propagation for the final 100 generations. Additional details are available from SI Appendix, Supplementary Text.
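The three demographic models can be pictured as per-generation population size trajectories. The sketch below only generates such trajectories; it does not use fwdpy11 or simulate mutation and selection. The function names, the sizes and the assumption that the population returns to its original size after the bottleneck are hypothetical, since the simulated parameters are given in the SI Appendix.

```python
def constant_size(n0, generations):
    """Constant population size for every generation."""
    return [n0] * generations


def linear_decline(n0, n_end, generations):
    """Population size falling linearly from n0 to n_end."""
    if generations == 1:
        return [n0]
    step = (n_end - n0) / (generations - 1)
    return [round(n0 + step * g) for g in range(generations)]


def discrete_bottleneck(n0, n_bottleneck, start, duration, generations):
    """Size n_bottleneck for `duration` generations from `start`,
    n0 otherwise (recovery to n0 is an assumption of this sketch)."""
    return [n_bottleneck if start <= g < start + duration else n0
            for g in range(generations)]


# Hypothetical parameters, chosen only to show the three shapes.
print(constant_size(1000, 5))
print(linear_decline(1000, 100, 10))
print(discrete_bottleneck(1000, 100, 3, 2, 8))
```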
ACKNOWLEDGEMENTS
We thank R. Gaut and R. Figueroa-Balderas for generating the data and sampling and also D. Seymour, Q. Liu, K. Roessler and E. Solares for comments. YZ is supported by the International Postdoctoral Exchange Fellowship Program; JS is supported by the NSF-GRFP; BSG is supported by the Borchard Foundation. DC is supported by J. Lohr Vineyards and Wines and by E. & J. Gallo Winery.