ABSTRACT
Octoploid strawberry (Fragaria × ananassa) is a valuable specialty crop, but profitable production and availability are threatened by many pathogens. Efforts to identify and introgress useful disease resistance genes (R-genes) in breeding programs are complicated by strawberry’s complex octoploid genome. Recently-developed resources in strawberry, including a complete octoploid reference genome and high-resolution octoploid genotyping, enable new analyses in strawberry disease resistance genetics. This study characterizes the complete R-gene collection in the genomes of commercial octoploid strawberry and two diploid ancestral relatives, and introduces several new technological and data resources for strawberry disease resistance research. These include octoploid R-gene transcription profiling, dN/dS analysis, eQTL analysis and RenSeq analysis in cultivars. Octoploid fruit transcript expression quantitative trait loci (eQTL) were identified for 77 putative R-genes. R-genes from the ancestral diploids Fragaria vesca and Fragaria iinumae were compared, revealing differential inheritance and retention of various octoploid R-gene subtypes. The mode and magnitude of natural selection of individual F. × ananassa R-genes was also determined via dN/dS analysis. R-gene sequencing using enriched libraries (RenSeq) has been used recently for R-gene discovery in many crops, however this technique somewhat relies upon a priori knowledge of desired sequences. An octoploid strawberry capture-probe panel, derived from the results of this study, is validated in a RenSeq experiment and is presented for community use. These results give unprecedented insight into crop disease resistance genetics, and represent an advance towards exploiting variation for strawberry cultivar improvement.
INTRODUCTION
Cultivated strawberry (Fragaria ×ananassa) is an important specialty crop that is cultivated world-wide for its sweet and flavorful fruit. However, marketable yields and post-harvest quality are significantly affected by disease. The strawberry fruit presents a vulnerable target for microbial pathogens (Farzaneh et al., 2015), as it is soft, moist, carbohydrate rich, and subject to damage from forces as seemingly innocuous as rain (Herrington et al., 2011). Genetic disease resistance has been a long-standing breeding priority. While breeders have made progress in producing varieties with tolerance to some pathogens, growers remain dependent on exogenous crop protection strategies to reduce pathogen loads (Cordova et al., 2017).
Plant R-genes are mediators of resistance to specific pathogens via effector triggered immunity, which results in the hypersensitive response and cell death (Amil-Ruiz et al., 2018). R-genes require a high degree of regulation to maintain homeostatic transcript levels to mitigate off-target protein interactions (Hammond-Kosack and Jones, 1997). For this reason, many classes of functional R-genes are expressed at low levels unless elicited by pathogens (Lai and Eulgem, 2017), contributing to the challenges of R-gene genomic and functional annotation. About 60% of characterized plant R-genes contain nucleotide-binding (NB-ARC) and leucine-rich-repeat (LRR) domains, and are referred to NLR genes (Funk et al., 2018). Plant R-genes are frequent targets for genetic improvement via breeding and genetic engineering (Djian-Caporalino et al., 2014, Baumgartner et al., 2015), and gene editing methods may accelerate their introduction into already-elite varieties. However, progress has been hindered because relatively few R-genes conferring novel resistance have been characterized (Amil-Ruiz et al., 2018). This problem is appreciable in strawberry, where the genetic complexity of octoploid cultivars presents unique challenges for functional identification and cloning of causal variants. An analysis of diploid R-genes across the Rosaceae genus was previously conducted (Arya et al., 2014). New genetic resources for high-resolution genotyping in octoploid strawberry have resulted in the recent identification of several disease resistance loci (Roach et al., 2016, Mangandi et al., 2017, Anciro et al., 2018, Pincot et al., 2018, Salinas et al., 2018, Verma et al., 2018). However, the specific genes mediating resistance in these QTL intervals typically remain unresolved, as genomic resources for octoploid strawberry have not kept pace with genetic mapping.
Cultivated strawberry shares common ancestors with the extant diploid species F. vesca, F. iinumae, F. nipponica, and F. viridis (Edger et al., 2019). A high-quality octoploid strawberry genome has been recently developed (Edger et al., 2019), enabling new kinds analyses and improved resolution compared with previous studies involving Fragaria NLRs (Jia et al., 2015, Zhong et al., 2018). Analysis of this F. ×ananassa ‘Camarosa’ genome identified the repertoire of octoploid R-gene sequences and further demonstrated a general genomic retention bias towards F. vesca-like sequences (Edger et al., 2019).
This research compares R-genes from octoploid strawberry with its diploid ancestors and provides additional analysis into the genetic control of R-gene expression and retention patterns. Additional bias towards retention of F. vesca-like R-genes was detected in octoploid strawberry, beyond the bias observed in non-R-gene coding sequences. This finding provides insight into potential practical drivers of biased gene retention. Conserved domains were compared to describe specific R-gene phylogenic relationships. The octoploid genome was used to assemble 61 fruit transcriptomes, and used to discover subgenomic expression quantitative trait loci (eQTL) for R-genes expressed in octoploid fruit. Data from the octoploid ‘Camarosa’ strawberry gene expression atlas (Sánchez-Sevilla et al., 2017) was also used to determine R-gene transcript accumulation throughout the strawberry plant.
Resistance gene enrichment and sequencing (RenSeq) is an advantageous method for sequencing R-genes (Andolfo et al., 2014), and is likely to be very useful for de novo resolution of causal mutations (Witek et al., 2016). This method can be used to identify casual mutations within existing disease resistance QTL. For this purpose, a novel octoploid strawberry RenSeq capture probe library was designed using the R-genes identified in this analysis. This panel was experimentally validated using the University of Florida breeding germplasm. The results demonstrate robust capture and resequencing of octoploid and diploid R-genes using only short second-generation sequence reads and with relatively deep genomic multiplexing.
This report characterizes the complete R-gene collection in the genomes of commercial octoploid strawberry and two diploid ancestral relatives, providing the genome-level resolution necessary for fully exploiting genetic disease resistance in strawberry. This research introduces several new technology and data resources that now may be applied in study of strawberry disease resistance.
RESULTS
Octoploid and Diploid R-genes
The genomes of octoploid ‘Camarosa’, diploid F. vesca, and diploid F. iinumae were analyzed for R-gene signatures. The F.iinumae genome was selected to represent the closely-related ‘old world’ diploid ancestors F.iinumae, F. nipponica and F. viridis, which each have highly similar but fragmented genomic assemblies.
Putative R-genes were identified based on protein domain and motif analysis, which identified gene models with traditional NLR-type domains, including coiled coil (CC), Toll Interleukin Receptor-like (TIR), Leucine Rich Repeat (LRR), and Nucleotide Binding - APAF-1 (apoptotic protease-activating factor-1), R proteins and CED-4 (Caenorhabditis elegans death-4 protein) (NB-ARC). Gene models with NLR-type domains that are not highly specific to NLR sequences (e.g. LRR domains) were included if there was also supporting evidence of an additional NLR-associated motif. BLAST2GO annotated disease resistance associated genes not meeting these criteria were analyzed manually, leading to the intentional inclusion of many putative Receptor-like Kinase (RLK-type) R-genes in this analysis.
Octoploid F. ×ananassa ‘Camarosa’ carries 1,962 putative resistance genes (1.82% of all genes), including 975 complete or truncated NLR genes (Table 1). NLR gene content is similar in genic proportion to the 367 complete or truncated NLR genes in F. vesca (1.09% of all genes) and 387 in F. iinumae (0.5% of all genes) (Table 1). Traditional NLR domains including Coiled Coil (CC), Toll Interleukin Receptor-like (TIR), Leucine Rich Repeat (LRR), and Nucleotide Binding - APAF-1 (apoptotic protease-activating factor-1), R proteins and CED-4 (Caenorhabditis elegans death-4 protein) (NB-ARC) domains comprise the majority of domain classes in all predicted resistance gene models in diploid and octoploid strawberry accessions (Figure 1). In many categories, the three genomes show somewhat dissimilar ratios of relative NLR-subtype content (Table 1). These include biases towards TIR-only proteins in F. vesca and CNL-type R-genes in F. innumae. Octoploid ‘Camarosa’ is proportionally intermediate for many NLR categories, relative to F. vesca and F. innumae. A high proportion of TIR-NB and TIR-NB-LRR-type NLR-genes is observed in ‘Camarosa’. However, the overall proportion of TIR-containing genes is similar. The Resistance to Powdery Mildew 8 (RPW8) domain, a disease resistance domain associated with broad-spectrum mildew resistance in Arabidopsis, appears frequently in strawberry and is present in 136 (13.9%) of octoploid NLRs (Table 1). Basic trends in NLR-subtype genomic content in ‘Camarosa’ does not more strongly resemble either F. vesca or F. innumae.
In ‘Camarosa’, 750 genes contain at least one NB-ARC domain, which is the most characteristic domain of NLR-type R-genes (Table 1). The ratio of ‘Camarosa’ NB-ARC-containing genes to total predicted gene content (1:144) is higher than in F. vesca (1:171) and F. iinumae (1:262), possibly indicating diversifying selection of NLR genes in octoploid F. ×ananassa. A substantial number of atypical domains are present on strawberry R-genes, including malectin-like carbohydrate-binding domains, RNA-binding domains, transcription factor-like WRKY and F-box domains, and several types of protein kinase domains (Figure 1).
Tandem clusters of R-genes were observed in all three of the analyzed strawberry genomes. The phenomena of R-gene expansion through tandem duplication is exemplified in the RPW8-containing R-gene class. Of the seventy-one RPW-containing R-genes in F. vesca, all but seven reside in one of a few genomic clusters (Figure S1). The major RPW cluster observed in F. vesca chromosome 1 is strongly retained in ‘Camarosa’ (Figure S2). Similar R-gene hotspots are observed throughout the diploid and octoploid strawberry reference genomes. Genome annotations for all NLR domains in ‘Camarosa’ and F. vesca genotype Hawaii 4 v2.0 (Tennessen et al., 2014) are provided in File S1. Annotations are also available on the JBrowse web-based genome browser at the Genome Database for the Rosaceae (www.rosaceae.org).
Phylogenetic Analysis of Strawberry R-genes
The conserved NB-ARC domains from ‘Camarosa’, F. vesca, and F. iinumae were compared via maximum likelihood analysis to examine evolutionary trends among NLR genes. NB-ARCs from all three genomes phylogenetically organized mostly according to their extended R-gene domain structures, with TNLs, CNLs, and RPW-associated NB-ARCs forming clades based on this criteria (Figure 2). Minor NLR subtypes, such as WRKY-associated NLR genes, also sorted into a unique subclade based only on NB-ARC sequence. Multiple distinct clades with identical domain architectures were detected, and in a few cases these subclades are relatively distant from one another.
R-gene Transcript Accumulation
Raw RNAseq expression data from different tissues of ‘Camarosa’, derived from the octoploid strawberry gene expression atlas (Sánchez-Sevilla et al., 2017), were reassembled based on the ‘Camarosa’ genome. A majority of ‘Camarosa’ NLR genes are predominantly expressed in the roots and leaves (Figure 3A-B). Comparatively few NLRs are predominantly or specifically expressed in the mature receptacle. Most NLR type R-genes are broadly specific to only one or two tissues. Expressed NLR genes from root, leaf, green and white receptacles show remarkably poor overlap. Overall NLR transcript accumulation is correlated across all stages of receptacle development, with strongest expression in the earlier stages and decreasing with maturity in both the receptacle and achene. Complete R-gene expression values for each tissue are provided in Table S1.
Mature receptacle transcriptomes from 61 field-grown individuals of three octoploid populations reveal broadly stable R-gene expression levels (SD 0.09). R-genes comprise 1.8% of the predicted gene models in the ‘Camarosa’ genome, but represent an average of only 0.48% of total transcripts in the mature receptacle (Figure 3C). Minute but statistically significant absolute differences were observed between each of the three populations [F (2,59) = 19.06, p < .00001]. To explore possible biases in the gene expression analysis caused by confounding environmental factors, principal component analysis (PCA) was performed on all RNAseq assemblies including two replicates of ‘Mara des Bois’ fruit harvested in different seasons (Figure S2). Total transcript-accumulation variation clusters most strongly according to familial relationship, with the ‘Mara des Bois’ replicates showing similar expression patterns. A measured amount of variation due to environmental influence can also be seen, as the two ‘Mara des Bois’ RNAseq replicates cluster somewhat more closely with their co-harvested progeny.
R-gene eQTL in 61 Strawberry Fruit Transcriptomes
eQTL analysis was performed to evaluate heritable genotypic effects on R-gene transcription using 61 octoploid IStraw35 genotypes and mature fruit transcriptomes. This analysis identified 77 R-gene-like sequences with at least one highly-significant locus explaining differential expression (3.9% of octoploid genome-predicted R-genes). These R-genes include 53 NB-ARC containing genes, comprised of 25 TNL’s, 14 CNL’s, 3 NBS-RPW proteins, and 11 NB-ARC-only proteins (File S1). The majority of remaining eQTL genes are TIR, LRR, and RPW-only genes. As the ‘Camarosa’ genomic locus of each transcript is known, cis vs trans eQTL status was determined. Of 77 significant R-gene eQTL transcripts, all 77 R-genes are regulated via a cis-genetic locus (Table 2), of which 24 R-genes are also under regulation of an additional second trans-eQTL (Table 3). No solely trans-eQTL were discovered among this set of R-genes. The most significant IStraw35 SNP marker name and position for each R-gene transcript is provided with the eQTL phase, minor allele frequency, p-value (FDR-adjusted), heritability estimate, expression in parental lines, and BLAST2GO description.
A representative eQTL R-gene (maker-Fvb5-1-snap-gene-191.37) is detailed in Figure 6. Analysis by BLAST2GO indicates an “RPM1-like disease resistance gene” whose Arabidopsis thaliana homolog confers resistance to Pseudomonas syringae. This gene is hereafter referred to as “FaRPM1.1”. An unusual double NB-ARC structure is predicted for FaRPM1.1 (Figure 6A). An eQTL was detected for this gene relative to chromosome 5 on the F. vesca genome position (Figure 6B). This eQTL is superficially analogous to the physical position of octoploid FaRPM1.1 on chromosome 5, homoeolog 1 (Figure 6C). The significant eQTL markers are not included in the ‘Holiday’ × ‘Korona’ octoploid genetic map, impeding a recombination-based subgenomic genetic association (van Dijk et al., 2014). However, the associated marker physical sequences match uniquely to the ‘Camarosa’ chromosome 5 homoeolog 1 locus, within several kilobases of the FaRPM1.1, confirming a cis-eQTL designation (Figure 6D).
Evolutionary Pressure on F. ×ananassa R-genes
Elevated median dN/dS ratios were observed across all 1,962 predicted F. × ananassa R-genes (0.4232) compared to non R-genes (0.3526) (Figure 4). Fewer R-genes exhibited extremely low dN/dS ratios, indicating that high degrees of R-gene conservation are less common. However, a similar rate of hypervariable genes (dN≫dS) was observed between R-genes and non R-genes. A complete list of dN/dS ratios for each F. ×ananassa R-gene is provided in Supplementary Table 1.
R-gene dN/dS values were compared against transcript accumulation across various strawberry tissues and receptacle stages. R-genes with low transcript accumulation across all tissues were correlated with higher dN/dS ratios (Pearson’s r = −0.69, p < .0001) (Figure S5). In other words, R-genes with poor evidence of expression also have higher ratios of non-synonymous mutation capable of altering amino acid sequences and affecting protein function.
Subgenome Dominance in Octoploid Strawberry
Polyploidization is associated with rapid genome remodeling events to establish a new homeostasis, including selective gene loss and methylation. While R-gene expansiveness is often considered evolutionarily favorable, genes that are stoichiometrically or dosage sensitive are more commonly retained in duplicate after polyploidization (Edger and Pires, 2009, Birchler and Veitia, 2012, Edger et al., 2017a). The ‘Camarosa’ octoploid genome, in comparison with the genomes from its diploid F. vesca-like and F. iinumae-like ancestors, has provided an ideal platform to study the general biological phenomena of post-hybridization genome remodeling and subgenome dominance (Edger et al., 2019). To gauge R-gene post-hybridization retention specifically, a gene-focused baseline assessment of subgenome dominance in the ‘Camarosa’ octoploid genome was necessary. Putative gene ancestry was predicted based on gene-by-gene sequence comparisons to determine the closest ‘Camarosa’ gene homologs in F. vesca (Fragaria_vesca_v2.0.a2.cds) and F. iinumae (FII_r1.1cds), which is representative of the highly similar ‘old world’ subgenomes. This gene-by-gene putative orthology analysis was selected over a total comparison of homeologous chromosomes, as extensive genetic transfer from the F. vesca-like subgenome has strongly converted all subgenomes to contain F. vesca-like genes over time (Tennessen et al., 2014), and because the F. iinumae FII_r1.1 genome is incompletely assembled and is not amenable to whole-genome alignment. By this facile coding-sequence comparison method, a significant bias towards the retention and/or expansion of F. vesca-like genes is observed in the ‘Camarosa’ genome (Figure 5A), with an even stronger bias towards F. vesca-like fruit gene expression (Figure 5B) consistent with previous analyses (Edger et al., 2019). Of 108,087 F. ×ananassa ‘Camarosa’ predicted gene models, 68,664 genes (63.5%) were most similar to an F. vesca gene model, with 35,377 (32.7%) most similar to an F. iinumae gene model, with a minority of genes not closely matching either. A single homoeologous chromosome with significantly more F. vesca-like genes (∼80% F. vesca-like) was seen in every chromosomal group. In a majority of cases, this putative F. vesca-derived chromosome possesses the greatest total gene content of the chromosome group. In 61 fruit transcriptomes, 73.7% of total transcripts derived from a gene sequence most similar to F. vesca, corresponding to a 10.2% expression increase relative to the baseline genomic retention bias. This bias towards the expression of F. vesca-like sequences was seen on every subgenome (Figure 5B, yellow highlight).
NLR-gene Subgenome Dominance
Significant gene retention bias towards R-genes that are more F. vesca-like is observed in ‘Camarosa’ gene models (Figure 6). Of the 750 predicted NLR-gene models, 71.7% more closely resemble a F. vesca gene rather than an F. iinumae gene (Figure 6A). This is somewhat higher than the baseline retention bias towards F. vesca-like genes in octoploid (63.8%) from this analysis. In every chromosome group, the F. vesca-like homoeologous chromosomes (yellow highlight) retained the greatest number of NLRs. Of expressed R-genes, 1,125 demonstrate the highest sequence identity with an F. vesca gene, 444 show highest sequence identity with an F. iinumae gene, and 2 (an RPW-only gene, and an LRR_8-only gene) are without significant matches to either diploid genome. While F. vesca-like genes contribute the most to total mature fruit NLR expression (70.5% of transcripts), this is proportional to F. vesca-like NLR genome content (71.3%) and is similar in magnitude to general F. vesca expression bias (73.7%) (Figure 6B). In other words, F. vesca-like NLR genes are retained in the octoploid genome somewhat above the baseline bias, but do not experience the additional expression bias that is a generic feature of F. vesca-like transcripts.
RenSeq for Strawberry Resistance Genes
A panel of sequence capture probes was designed based on putative R-gene sequences discovered in the genomes of F. ×ananassa ‘Camarosa’, F. vesca genotype Hawaii 4, F. iinumae, and de novo fruit transcriptomes from F. ×ananassa cultivars ‘Mara des Bois’ and ‘Florida Elyana’. Benchtop RenSeq capture on genomic DNA was performed on a collection of sixteen strawberry genotypes, including twelve F. ×ananassa advanced breeding selections, three F. ×ananassa disease-resistant cultivars, and a diploid F. vesca. As a preliminary validation of capture efficiency with this novel RenSeq probe panel, multiplexed Illumina sequencing was performed on captured R-gene genomic libraries. An average of 2.60 million reads (2 × 100bp) was obtained for each of sixteen libraries from a single lane. Reads from octoploid and diploid lines were mapped to their respective annotated genomic references. An average R-gene resequencing depth of 26x was achieved in the ‘Camarosa’ RenSeq line and 30x in the F. vesca, with similar coverage ranges in the other diverse octoploid accessions (Figure 8). In the ‘Camarosa’ RenSeq line, 68% of reads mapped to an annotated resistance gene, while an additional 20% of reads mapped to a non-R-gene gene model. In F. vesca this efficiency was lower, where 36% of reads mapped to an annotated R-gene. A FASTA of RenSeq probes is provided for use in File S2. Example probe coverage is detailed in Figure S6.
DISCUSSION
These results provide a characterization of the R-gene complement of cultivated octoploid strawberry and the relationship to the extant diploid relatives, F. vesca and F. iinumae. Commercial strawberry is hypothesized to contain a single F. vesca-like subgenome, and three highly-similar ‘old world’ subgenomes which are likely derived from F. iinumae, F. viridis, and F. nipponica (Edger et al., 2019). Polyploidization is associated with massive genome remodeling events including gene loss (Edger et al., 2017a, Edger et al., 2018). Linkage-map comparisons in octoploid and diploid strawberry have uncovered extensive unidirectional homoeologous exchanges which have broadly converted the three ‘old world’ F. innumae-like subgenomes to be more F. vesca-like (Tennessen et al., 2014). This finding explains the difficulties of clear ancestral delineation of strawberry homoeologs (Vining et al., 2017). Recent analysis of the octoploid genome reveals that biased homologous exchanges have converted other subgenomes to be more like the dominant F. vesca-like subgenome (Edger et al., 2019). The present gene-level homology and expression analysis shows the majority of F. vesca-like dominance is derived from F. vesca-like genes residing on alternate subgenomes. For NLR genes in particular, the bias towards F. vesca-like genomic retention was more pronounced. Unlike general octoploid genes, expression of F. vesca-like and F. iinumae-like NLRs is proportional to their genomic representation. This finding provides potential insight into the practical drivers of subgenome conversion. Consolidation of redundant genes and maintenance of stoichiometrically sensitive genes has been hypothesized as a driver for gene retention bias. NLRs are involved in consequential and sensitive protein-level interactions, including signaling functions requiring homo- and hetero-dimerizations. Avoidance of dysfunctional NLR molecular interactions may have contributed to the observed biases in NLR retention and expression, post-polyploidization.
Multiple distinct NLR clades with identical domain architectures were detected, likely distinguishing intra-subgenome homologs from different subgenomes. These likely reflect broad ancestral sequence divergences prior to hybridization. Comparison of R-genes in octoploid and diploid strawberry reveals enrichment of different subtypes. The ‘Camarosa’ genome shows a large increase in complete TNL-type R-genes and a concomitant decrease in truncated TIR-only genes, relative to its diploid ancestral relatives. The F. iinumae genome shows a considerably larger amount of CNL-types. TNLs have been nearly eliminated from most monocot genomes in bias towards CNL-types (Nepal et al., 2017). The reasons for emerging divisions in TNL/CNL content in plant genomes remains unclear. In hybrid F. ×ananassa, it is possible that relatively high number of complete TNL genes is a result of higher rate of retention post-polyploidization in this category. Many of the non-classical domains discovered in ‘Camarosa’ R-genes have also been found and characterized in the R-genes of other species. These include an LRR/Malectin-like RLK protein, which mediates powdery mildew resistance in barley and wheat (Rajaraman et al., 2016). Atypical R-gene domains physically associated with NB-ARCs have been implicated in a variety of active disease resistance functions, including signal transduction and defense gene activation, and serving as decoy endogenous sequences to bait pathogen effectors into direct interaction and detection (Khan et al., 2016).
A large proportion of strawberry NLR genes from octoploid and diploid genomes are associated with RPW8 domains. It has been suggested that the RPW8 domain emerged with the earliest land plants and subsequently merged with NLR genes, however their prevalence across plant genomes varies widely (Zhong and Cheng, 2016). The RPW domain appears to have been completely lost in monocots, and is rare in many other species. R-gene genomic studies frequently neglect to assess the presence of NLR-associated RPW domains. Two NBS-RPW8 proteins conferring mildew-resistance have been described in the Arabidopsis thaliana genome (Xiao et al., 2001) that retained their function when expressed in grape (Hu et al., 2018). The AtRPW8.2 gene was recently shown to induce the expression of defense-related genes when expressed in strawberry leaves (Cui et al., 2017). This R-gene subtype has apparently expanded in strawberry, possibly due to unusually high mildew disease pressure exerted on strawberry species and intense selection for resistance. However, R-gene domain content is not reliably predictive of resistance specificity, and close R-gene paralogs are known to confer resistance to pathogens in entirely different kingdoms (Wen et al., 2015). Interestingly, the strawberry RPW8 domain is frequently found in association with NB-ARC-containing genes but never with TIRs. The purpose of RPW8 gene expansion in strawberry remains an interesting open question.
Octoploid NLR transcript accumulation is low throughout the strawberry plant, but is particularly low in the mature receptacle. This is a somewhat unexpected result due to the many pathogens targeting this susceptible organ. It is possible that only certain R genes are highly upregulated in the response to pathogen attack. Another possibility is that resistance based on the hypersensitive response may be less effective at mature stages, where cell wall disruption has already initiated with ripening and the intercellular environment is conducive to pathogen growth. Transcriptional response to Botrytis cinerea infection in the mature octoploid receptacle led to differential expression of over 1,500 genes, including secondary metabolism and pathogenesis-related (PR) genes, but only 15 NLR genes (Xiong et al., 2018). In the present study, elevated NLR transcription in the green, white, and turning stages suggest NLR-based resistance may be more prevalent at these earlier developmental stages. The highest levels of NLR expression were seen in the roots and leaves, indicating this mode of resistance may be more common in these tissues. NLR expression overlaps poorly between tissues, supporting the concept that NLRs are optimized for each tissue. It would be interesting to examine the patterns of tissue specific expression of R-genes against different strawberry pathogens.
The genetics of differential fruit expression of R-genes in strawberry cultivars was examined via eQTL analysis. In many cases, the identified genetic markers described presence/absence of R-gene expression. The identified eQTLs were often due to a cis variant at a single detectable locus, very close to the physical position of the gene itself. This is suggestive of a mutation in a cis-regulatory element, such as the gene promoter or 5’-UTR, or a genic presence/absence structural variation. Such presence/absence variation affects nearly 20% of genes in the Brassica oleracea pangenome and is a major contributor of agronomic trait diversity (Golicz et al., 2016). As these strawberry R-gene eQTL are derived from crosses of cultivars with differing ranges of pathogen susceptibility, these eQTL genes represent strong candidates for functional disease resistance and potential genetic improvement. These disclosed R-gene eQTL marker sequences may be cross-referenced with existing disease-resistance QTL to potentially identify causal R-genes. As categories of R-genes are expressed at very low levels unless induced by pathogens (Lai and Eulgem, 2017), the genotype × pathogen interaction may have lowered confidence values or introduced possible type II errors in eQTL detection. However, the reproducibility of cis-eQTL tends to be particularly high in related populations (Peirce et al., 2006). Additional replicates and infected/non-infected challenge conditions will likely reveal additional eQTL associations and greatly improve the confidence of heritability estimates, and may be used to validate pathogen-induced R-gene candidates.
F. ×ananassa predicted R-genes (NLRs and other R-gene types) have elevated average dN/dS ratios compared to non R-genes, indicating greater overall tendency towards divergent selection. R-genes with very low dN/dS ratios are likely to be conserved disease resistance genes. This active evolutionary selection is highly indicative of function. Of particular interest are strawberry R-genes demonstrating both low dN/dS values and low transcript levels across all tissues (Table S1). Many functional R-genes are expressed at low levels, either constitutively or until elicited by the proper pathogen (Lai and Eulgem, 2017). Such R-genes may be difficult to distinguish from pseudogenes on a purely transcriptional bases. Low dN/dS values demonstrate selective pressure to maintain these sequences, offering evidence of maintained function despite low expression. The results of this combinatorial analysis can be used help identify novel sources of R-gene-based resistance which may be otherwise difficult to detect. It should be noted that this analysis is performed in the context of a single cultivar, which has undergone several centuries of artificial selection. It is possible that wild octoploid species may reveal different and more natural patterns of disease-resistance selection. More sequenced accessions from geographically diverse wild and cultivated germplasm are needed. Further analysis on the octoploid pangenome will reveal more detailed selection patterns, and more importantly, reveal recent selection sweep events which may have occurred in certain R-gene groups.
Many R-genes were discovered clustered in the genomes of both octoploid and diploid strawberry, highlighting the challenges of resolving individual R-genes via association mapping and positional cloning. The difficulty of isolating functional R-genes from strawberry disease resistance QTL was the principle motivator of this analysis. A thorough identification of R-genes in the octoploid genome is necessary for future genomics and genetics analysis in strawberry disease-resistance breeding programs. Additionally, this information is prerequisite for creating a RenSeq probe panel, to facilitate targeted R-gene sequencing in breeding programs.
A novel strawberry RenSeq capture-probe library was developed based on the R-gene sequences identified from genomic and transcriptomic resources. This 39,501-probe panel was experimentally validated using octoploid and diploid genomes and resulted in an average ∼20× R-gene resequencing depth per genomic library, using only multiplexed short reads. RenSeq assembly in ‘Camarosa’ and the F. vesca genotype Hawaii 4 resulted in significant coverage of R-genes. Interestingly, the capture efficiency (R-gene reads over total reads) was somewhat lower in F. vesca, likely representing the saturation of capture probes in the smaller F. vesca genome. Similar rates of perfect sequence matching along the entire read in ‘Camarosa’ and F. vesca (66.24% and 69.68%, respectively) indicates that theoretical octoploid reference sequence errors are not likely promoting RenSeq assembly error in ‘Camarosa’. However, 14.4% of mapped ‘Camarosa’ R-gene reads have an equally valid alternative R-gene mapping locus, compared with just 4.83% in F. vesca. This difference indicates that homoeologous sequence redundancy is an appreciable issue for mapping short-reads in polyploids, even with an isogenic (but not haplotype-specific) mapping reference. Longer sequencing read-lengths, spanning less well-conserved non-coding sequences, will assist in de novo resolution of similar loci in octoploid strawberry. Combining RenSeq with longer-read third generation sequencing technologies will allow for improved de novo assembly of R-gene loci, and will greatly facilitate causal mutation detection within disease resistance QTL in octoploid strawberry.
MATERIALS AND METHODS
Plant Populations and Genetic Materials
Three pedigree-connected and segregating strawberry populations were created from crosses ‘Florida Elyana’ × ‘Mara de Bois’, ‘Florida Radiance’ × ‘Mara des Bois’, and ‘Strawberry Festival’ × ‘Winter Dawn’ (Figure S7). These cultivars and 54 progeny were selected for RNAseq and Istraw35 SNP genotyping analysis (Verma et al., 2017), and were used to identify expressed genes and R-gene eQTL. De novo assemblies of ‘Mara des Bois’ and ‘Florida Elyana’ were also used to help design RenSeq capture probes.
For RenSeq, 14 disease resistant octoploid cultivars and elite breeding lines were selected from the University of Florida breeding program, and supplemented with ‘Camarosa’ and with the ancestral diploid F. vesca. The RenSeq lines are F. vesca genotype Hawaii 4, ‘Camarosa’, Sweet Sensation® ‘Florida127’, ‘Florida Elyana’, 11.28-34, 11.77-96, 11.98-41, 12.115-10, 12.121-5, 13.26-134, 13.42-5, 13.55-195, 14.100-58, 14.100-59, 14.101-154, and 14.101-225.
Identification of R-genes in Strawberry spp.
R-genes were predicted from the strawberry octoploid ‘Camarosa’ draft genome “F_ana_Camarosa_6-28-17.rm” (Edger et al., 2019), the diploid F. vesca reassembly “Fragaria_vesca_v2.0.a2” (Tennessen et al., 2014), and the diploid F. innumae assembly “FII_r1.1” (Hirakawa et al., 2014). Domain-level analysis was performed using the CLC Genomics Workbench 11 HMM implementation to search for Pfam-v29 domains on translated gene models from all genomic and transcriptomic strawberry resources. Motif search was performed on all translated gene models, using 56 R-gene-associated motifs collected from (Lukasik and Takken, 2009, Jupe et al., 2012, Van Ghelder and Esmenjaud, 2016). The CLC Genomics Workbench 11 (CLC Bio, Denmark) pattern discovery tool was trained on a preliminary list of strawberry R-genes, and novel motifs were reiterated back to all protein models. The Ncoils sequence analysis algorithm (Lupas et al., 1991) was used to detect coiled-coil domains, and the output was parsed into GFF3 format for protein list reannotation. BLAST2GO annotation (Conesa et al., 2005) was performed to assign putative functions to all genes and confirm sequence association with disease resistance in a cross-referenced database.
Protein models containing canonical R-gene domains (eg. NB-ARC domain) were selected for inclusion as R-genes, as were gene models with more common domains (eg. LRR) with supporting evidence of an R-gene-associated motif. BLAST2GO annotated disease resistance associated genes not meeting the domain and motif-level criteria were analyzed for potential inclusion, leading to the inclusion of many LRR-containing RLK putative R-genes.
NB-ARC Phylogenetic Analysis
NB-ARC domains were extracted from F. iinumae, F. vesca, and F. ×ananassa ‘Camarosa’. The CIPRES Science Gateway (Miller et al., 2010) was utilized for full-length protein sequence alignment using MUSCLE v3.7 (Edgar, 2004) and Maximum likelihood analysis using RAxML v8.2.10 (Stamatakis, 2014). Tree construction was performed using the PROTGAMMA rate distribution model with 100 bootstrap replicates, and rooted with human APAF-1. This process was replicated five times using different random number seeds. Trees were visualized in CLC Genomic Workbench 11 with a 50% threshold bootstrap value. Word clouds were generated per clade based on the relative domain content of the full proteins.
Subgenome Dominance in Octoploid Strawberry R-genes
The closest homolog for each F. ×ananassa ‘Camarosa’ gene in either Fragaria_vesca_v2.0.a2.cds or FII_r1.1cds was determined via BLAST analysis (e-value threshold < 0.1, word size=25, match=1, mismatch=l, existence=0, extension=2). F. vesca-like and F. iinumae-like gene counts and TPMs were independently calculated for each octoploid chromosome. This process was performed first on all genes in the ‘Camarosa’ genome to establish the baseline gene retention and expression bias, and then repeated using only predicted NLR genes.
dN/dS Analysis
dN and dS values were computed using a set of custom scripts (https://github.com/Aeyocca/ka_ks_pipe/). Orthologous genes between the F. ×ananassa and F. vesca v4 (Edger et al., 2017b) genomes were identified using the compara module in JCVI utilities library (Tang et al., 2015). Filtering of the JCVI utilities output was performed using a custom perl script to identify the best syntenic ortholog and best blast hit below e-value 1e-4. Alignment of each orthologous gene pair was performed using MUSCLE v3.8.31 (Edgar, 2004), followed by PAL2NAL (v14) (Suyama et al., 2006) to convert the peptide alignment to a nucleotide alignment. Finally, dN and dS values were computed between those gene pairs using codeml from PAML Version 4.9h (Yang, 2007) with parameters specified in the control file found in the GitHub repository listed above.
Tissue-specific Transcriptome Analysis
Raw short read RNAseq libraries from various ‘Camarosa’ tissue (Sánchez-Sevilla et al., 2017) with the study reference PRJEB12420 were download from the European Nucleotide Archive (https://www.ebi.ac.uk/ena). The complete 54 library RNAseq experiment consisted of six independent green receptacle libraries, six white receptacle libraries, six turning receptacle libraries, six red receptacle libraries, three root libraries, three leaf libraries, and six achene libraries each for all corresponding fruit stages. Raw RNAseq reads were assembled to the ‘Camarosa’ reference using the same pipeline as previously described for fruit transcriptome population analysis. Expression values from biologically-replicated libraries were averaged. Clustvis (Metsalu and Vilo, 2015) was used for tissue-based RNAseq clustering and heatmap visualization using correlation distance and average linkage with scaling applied using default parameters.
Fruit Transcriptome Analysis
61 fruit transcriptomes were sequenced via Illumina paired-end RNAseq (Avg. 65million reads, 2×100bp), and consisted of parents and progeny from crosses of ‘Florida Elyana’ × ‘Mara de Bois’, ‘Florida Radiance’ × ‘Mara des Bois’, and ‘Strawberry Festival’ × ‘Winter Dawn’. Reads were trimmed and mapped to the F. ×ananassa octoploid ‘Camarosa’ annotated genome using CLC Genomic Workbench 11 (mismatch cost of 2, insertion cost of 3, deletion cost of 3, length fraction of 0.8, similarity fraction of 0.8, 1 maximum hit per read). Reads that mapped equally well to more than one locus were discarded from the analysis. RNAseq counts were calculated in Transcripts Per Million (TPM). Three-dimensional principle component analysis (PCA) was performed on all RNAseq assemblies, including two replicates of ‘Mara des Bois’ fruit harvested three years apart and sequenced independently (Figure S2). Transcript abundances were normalized via the Box-Cox transformation algorithm performed in R (R. Development Core Team, 2014) prior to eQTL analysis. The BLAST2GO pipeline was used to annotate the full ‘Camarosa’ predicted gene complement.
Genotyping and Genetic Association of Octoploid Fruit R-genes
The Affymetrix IStraw35 Axiom® SNP array (Verma et al., 2017) was used to genotype 60 individuals, including six parental lines from three independent biparental RNAseq populations (Figure S7). Sequence variants belonging to the Poly High Resolution (PHR) and No Minor Homozygote (NMH) marker classes were included for association mapping. Mono High Resolution (MHR), Off-Target Variant (OTV), Call Rate Below Threshold (CRBT), and Other marker quality classes, were discarded and not used for mapping. Individual marker calls inconsistent with Mendelian inheritance from parental lines were removed. The F. vesca physical map was used to orient marker positions as current octoploid maps do not include a majority of the available IStraw35 markers. A genome-wide analysis study (GWAS) was performed using GAPIT v2 (Tang et al., 2016) performed in R. R-gene eQTL were evaluated for significance based on the presence of multiple co-locating markers of p-value < 0.05 after false discovery rate correction for multiple comparisons. Cis vs trans eQTL determinations were made by corroborating known ‘Camarosa’ physical gene position with the eQTL position the F. vesca map. In the example case of FaRPM1.1, subgenomic localization was confirmed via BLAST of the associated markers to the correct ‘Camarosa’ homoeologous chromosome.
RenSeq Probe Design and Validation
A panel of 39,501 of 120mer-length capture probes were designed based on the set of discovered strawberry R-genes from F. ×ananassa ‘Camarosa’, F. vesca genotype Hawaii 4, F. iinumae genomes, and de novo fruit transcriptomes from F. ×ananassa ‘Mara des Bois’ and ‘Florida Elyana’. A proprietary algorithm was used to select for capture probes of ideal hybridization thermodynamics and screened for potential off-target capture in the intergenic regions of ‘Camarosa’ and F. vesca (Rapid Genomics LLC, Gainesville FL). Probes were designed to not span exon-exon junction, to facilitate cross-utility for both genomic and cDNA libraries (Figure S6). A minimum baseline of 1x probe coverage was provided across the length of every predicted R-gene coding sequence, and additional probes were designed against conserved R-gene domains in order to promote capture of unknown and divergent R-genes across diverse octoploid accessions. RenSeq capture was performed on genomic libraries from fifteen octoploid disease-resistant cultivars and advanced breeding selections, and F. vesca, based on conditions set by (Jupe et al., 2014), with optimizations provided by Rapid Genomics LLC. Captured libraries were sequenced via 16× multiplexed Illumina Hiseq (2 × 100bp) and mapped to their respective annotated genomic references using CLC Genomic Workbench 11 (CLCBio, Aarhus, Denmark) (Similarity fraction = 0.9, Length fraction = 0.9, Match score = 1, Mismatch cost = 2, Insertion cost = 3, Deletion cost = 3).
Data Availability
Supplementary figures, tables, files, and raw data are available at FigShare. Custom scripts used for performing dN/dS analysis are available at Github: https://github.com/Aeyocca/ka_ks_pipe/. Raw short read RNAseq data from fruit transcriptomes are available from the NCBI Short Read Archive under project SRP039356 (http://www.ncbi.nlm.nih.gov/sra/?term=SRP039356). Raw short read RNAseq data from the ‘Camarosa’ gene expression atlas (Sánchez-Sevilla et al., 2017) are available at the European Nucleotide Archive (https://www.ebi.ac.uk/ena) with the study reference PRJEB12420. Results derived from these data are compiled in Table S1. File S1 contains BED files for annotating the octoploid genome (Edger et al., 2019) with R-genes and R-gene domains. Renseq probe sequences are provided in File S2. Istraw35 markers, map positions, and sample genotypes used in eQTL analysis are available in File S3.
CONFLICT OF INTEREST
The authors declare no conflicts of interest.
ACKNOWLEDGEMENTS
We acknowledge Aristotole Koukoulidis for ncoils prediction and annotation, Matthew Robinson for assistance with data normalization, Rapid Genomics LLC for technical assistance in probe generation and sequence capture, Anne Schwartz, Max Hogshead, Kiran Sharma and Nadia Mourad for assistance in data compilation, Ben Harrison and Max Hogshead for assistance in gDNA isolation of eQTL lines, and Dr. Alan Chambers and Dr. Jeremy Pillet for RNA isolation and RNAseq line selection.