ABSTRACT
Host-parasite coevolution can maintain high levels of genetic diversity in traits involved in species interactions. In many systems, host traits exploited by parasites are constrained by use in other functions, leading to complex selective pressures across space and time. Here, we study genome-wide variation in the staple crop Sorghum bicolor (L.) Moench and its association with the parasitic weed Striga hermonthica (Delile) Benth., a major constraint to food security in many African countries. We hypothesize that sorghum landraces are subject to geographic selection mosaics within parasite-prone areas and selection against resistance where S. hermonthica is never found. Supporting this hypothesis, multiple independent loss-of-function alleles at sorghum LOW GERMINATION STIMULANT 1 (LGS1), a locus known to impact resistance, are broadly distributed among African landraces and geographically associated with S. hermonthica occurrence, suggesting a role in local adaptation to parasite pressure. However, the low frequency of these alleles within S. hermonthica-prone regions and their absence elsewhere indicate potential trade-offs restricting their distribution. LGS1 impacts stereochemistry of strigolactones, hormones controlling plant architecture, belowground signaling with other organisms, and abiotic stress tolerance. Supporting trade-offs, transcriptome profiling of nutrient-stressed roots revealed differential regulation of several strigolactone biosynthesis and signaling genes in LGS1-deficient sorghum compared to a susceptible line. Signatures of balancing selection surrounding LGS1 and candidates from analysis of genome-wide associations with parasite distribution support long-term maintenance of diversity in parasite resistance genes. Our study of host resistance evolution across smallholder agroecosystems provides a valuable contrast to both industrial farming systems and natural communities.
SIGNIFICANCE STATEMENT
Understanding co-evolution in crop-parasite systems is critical to management of myriad pests and pathogens confronting modern agriculture. In contrast to wild plant communities, parasites in agricultural ecosystems are usually expected to gain the upper hand in co-evolutionary ‘arms races’ due to limited genetic diversity of host crops in cultivation. Here, we develop a framework for studying associations between genome diversity in global landraces (traditional varieties) of the staple crop sorghum with the distribution of the devastating parasitic weed Striga hermonthica. We find long-term maintenance of diversity in genes related to parasite resistance, highlighting an important role of host adaptation for co-evolutionary dynamics in smallholder agroecosystems.
INTRODUCTION
Host-parasite interactions can be powerful and dynamic selective forces maintaining genetic variation in natural populations (1). In wild plant pathosystems, long-term balancing selection often maintains diverse resistance alleles in host populations over evolutionary time (2–4). Resistance polymorphism can be maintained by negative frequency-dependent selection, which drives cycling of resistance and virulence alleles when rare alleles provide a selective advantage (i.e. fluctuating Red Queen dynamics sensu (5)) (2, 6). Costs of resistance can also generate allele frequency cycles and maintain diversity over gradients of parasite pressure (7).
In contrast to wild plant communities where fluctuating Red Queen dynamics have frequently been observed, low host diversity in agricultural settings is often assumed to permit runaway ‘arms races’ in fast-evolving parasites (4, 8). Relative to smallholder farms, however, industrial scale farming accounts for a fraction of global production for many food crops (9). The dynamics of host adaptation to parasites across diverse smallholder agricultural systems remains poorly known, despite relevance for identifying novel resistance alleles and managing crop genetic resources (e.g. preserving germplasm both ex situ and in situ; Jensen et al. 2012). Are co-evolutionary dynamics in smallholder farming systems more similar to natural plant pathosystems, where high connectivity among genetically diverse patches can help promote evolution of host resistance (11)?
Current approaches for identifying and studying the evolution of resistance alleles often involve scoring large panels of diverse individuals (in genome-wide association studies, GWAS) or many recombinant individuals deriving from controlled crosses (in linkage mapping) and using DNA sequence information to identify genomic regions associated with parasite susceptibility. These mapping studies have revealed numerous insights into mechanisms of plant-pathogen dynamics (12), but require extensive phenotyping and genotyping effort for adequate statistical power. To understand the genetic basis of response to abiotic stress, another ‘bottom-up’ approach is to leverage analysis of the extensive genetic variation present in landraces, traditional crop or livestock varieties which have spread and subsequently adapted to diverse agro-ecological environments following domestication (13–16). Compared to modern improved varieties, which may have lost many resistances due to population bottlenecks and selection for performance in optimal environments (17), landraces and crop wild relatives may be a rich source of alleles conferring resistance to sympatric parasites under a hypothesis of local adaptation to environment of origin. Furthermore, landraces can be studied to test the hypothesis that a natural, putatively environmentally-adapted allele identified from a limited set of experimental environments and genetic backgrounds is indeed adaptive across a wide range of similar environments and across diverse genetic backgrounds (14). Genotype-environment association (GEA) analyses of georeferenced landraces have been a powerful strategy for understanding the genetic basis of local adaptation to gradients of abiotic stressors (14, 15). To our knowledge, similar approaches to investigate adaptation to biotic stress at continent scale have yet to be exploited (but see (16) for an example in Ugandan cattle).
In this study, we extend GEA to biotic gradients to evaluate the frequency and distribution of alleles in genomes of sorghum that confer resistance to Striga hermonthica (Delile) Benth., a root-parasitic weed native to Africa. Sorghum bicolor (L.) Moench is the world’s fifth most important crop and was domesticated from the wild progenitor Sorghum arundinaceum (Desv.) Stapf in Africa more than 5000 years ago (18). Sorghum is particularly important in arid and semiarid regions due to its drought tolerance compared to maize and rice. While plant drought responses are influenced by many processes, in recent years the role of strigolactones, plant hormones that regulate shoot branching (19), root architecture (20) and response to abiotic stress (21), has received particular attention. Strigolactones exuded into the rhizosphere, particularly under nutrient limitation (22) and drought (23), promote interactions with beneficial arbuscular mycorrhizal fungi (24) and are also required as germination stimulants for many root-parasitic plants. In addition to sorghum, S. hermonthica attacks maize, rice, millet, sugarcane, and wild grasses and is considered one of the greatest biotic threats to food security in Africa, costing billions of dollars (USD) in crop losses annually through yield loss and abandonment of fields (25).
Here, we evaluate the hypothesis that a geographical selection mosaic across gradients of biotic interactions maintains genetic diversity in S. hermonthica resistance in sorghum landraces (26, 27). To identify genomic signatures of host adaptation to parasites, we first develop species distribution models for S. hermonthica and search for statistical associations between sorghum genotype and prediction from the parasite model at collection location of each sorghum landrace. We characterize diversity and geographic distribution of loss-of-function alleles at the sorghum resistance gene LOW GERMINATION STIMULANT 1 (LGS1) (28). Loss of LGS1, a gene of unknown function with a sulfotransferase domain, alters stereochemistry of the dominant strigolactone in sorghum root exudates, from 5-deoxystrigol to orobanchol, which does not strongly stimulate parasite germination (28, 29). Our ecological genetic analyses of diverse sorghum landraces suggest that LGS1 loss-of-function mutations are adaptive across a large region of high S. hermonthica prevalence in Africa. However, we also present evidence supporting potential trade-offs related to LGS1 loss-of-function. Because endogenous strigolactones are integral to host plant development, rhizosphere communications, and stress response, we predicted that LGS1 loss-of-function may adversely impact fitness- or domestication-related traits through altered hormone signaling and perception. We compare root transcriptomes of sorghum lines producing high levels of 5-deoxystrigol or orobanchol to determine if structural variation in SLs impacts expression of strigolactone biosynthesis and signaling pathway components. After focused analyses on LGS1, we perform genome-wide tests of association with parasite distribution. We investigate patterns of polymorphism surrounding candidate resistance genes with evidence of locally-adaptive natural variation to determine whether balancing selection has maintained diversity in S. hermonthica resistance over evolutionary time.
RESULTS
S. hermonthica distribution model
We predicted that host alleles affecting resistance or tolerance would be strongly associated with the geographic distribution of parasites. To identify regions of likely S. hermonthica occurrence in the absence of continent-wide surveys, we built MaxEnt species distribution models (SDMs) (30). The optimal model showed good ability to predict occurrence with an AUC value of 0.86 (Fig. 1). A high degree of overlap was observed between models generated using all S. hermonthica records (n=1,050) and a subset of 262 occurrences that were observed specifically in fields of sorghum (Schoener’s D = 0.82; I = 0.97). Annual rainfall and total soil N were the most informative variables for predicting S. hermonthica occurrence (Table S1). Compared to all cells in the study background, distributions of environmental values for locations with high habitat suitability (HS) were generally restricted to a narrower range of intermediate values of precipitation and soil quality (Fig. S1-2). Locations with the highest HS scores exhibited mean annual rainfall ranging from ~500-1300 mm/year and soil nitrogen ranging from ~400-1000 mg/kg (10th-90th percentiles for all cells with HS >0.5, Table S1). Soil clay content also contributed substantially to the sorghum-only model, and clay content in locations with the highest HS scores ranged from 12-29% (10th-90th percentile, HS > 0.5, all-occurrence model) or up to 36% (sorghum-only model, Table S1).
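The overlap statistics reported above can be illustrated with a minimal sketch. It assumes two habitat suitability surfaces evaluated on the same grid cells, each rescaled to sum to one, and uses the Hellinger-based form I = 1 - 0.5 * sum((sqrt(p) - sqrt(q))^2); the function names and toy values are hypothetical, and this is not the code used for the analysis.

```python
import math


def _as_distribution(scores):
    """Rescale non-negative suitability scores so they sum to 1."""
    total = sum(scores)
    if total <= 0:
        raise ValueError("suitability scores must have a positive sum")
    return [s / total for s in scores]


def niche_overlap(hs_a, hs_b):
    """Return (Schoener's D, I) for two suitability surfaces on the same cells."""
    if len(hs_a) != len(hs_b):
        raise ValueError("both surfaces must cover the same grid cells")
    p = _as_distribution(hs_a)
    q = _as_distribution(hs_b)
    # Schoener's D: one minus half the summed absolute differences.
    d = 1.0 - 0.5 * sum(abs(a - b) for a, b in zip(p, q))
    # I: one minus half the squared Hellinger-type distance.
    i = 1.0 - 0.5 * sum((math.sqrt(a) - math.sqrt(b)) ** 2 for a, b in zip(p, q))
    return d, i


# Toy suitability values for three grid cells (not from the study).
all_records = [0.2, 0.6, 0.2]
sorghum_only = [0.1, 0.5, 0.4]
d, i = niche_overlap(all_records, sorghum_only)
print(round(d, 3), round(i, 3))
```

Both statistics range from 0 (no overlap) to 1 (identical surfaces), so values such as D = 0.82 and I = 0.97 indicate that the two models predict broadly similar distributions.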
LGS1 associations with S. hermonthica occurrence
We predicted that sorghum resistance alleles would be more common in parasite-prone regions. Evaluating this prediction also allowed us to validate our SDM-based genotype-environment association approach by characterizing associations between S. hermonthica distribution and genetic variation at LGS1 (Sobic.005G213600), a known locus causing resistance to Striga spp. (28).
Using whole genome sequencing data (~25x coverage) from 143 sorghum landraces, we found evidence for three naturally occurring mutations resulting in LGS1 loss-of-function (Fig. 2). Two ~30 kb deletions were identified between positions 69,958,377-69,986,892 (n = 5 accessions) and 69,981,502-70,011,149 (n = 4 accessions, Table S2; Fig. 2A). These two deletions appear to be identical to previously described lgs1-2 and lgs1-3 (28), although breakpoint positions reported by our structural variant caller differed slightly. No SNPs in a separate genotyping-by-sequencing (GBS) dataset of >2000 sorghum landraces tagged lgs1-2 or lgs1-3 (Fig. S3), and so we imputed large deletions identified from the WGS dataset to the GBS dataset based on patterns of missing data (see Methods). Deletion call imputations were validated by testing root exudate from a subset of sorghum accessions for their ability to induce S. hermonthica germination (Fig. S4). We tested four genotypes with likely deletion alleles, and these genotypes stimulated significantly fewer Striga seeds to germinate compared to eight other genotypes that did not show strong evidence for deletions (linear mixed-effects model with genotype random effect, deletion genotypes stimulated germination of 8.43 fewer seeds out of 75 total, Wald 95% CI=1.49,15.36). Two of the genotypes stimulated low germination (P721 and SC1439) and had all 14 GBS SNPs missing from LGS1, but SNP calls outside of the gene model were inconsistent with lgs1-2 or lgs1-3 alleles, perhaps due to the presence of an additional uncharacterized deletion allele.
In addition to lgs1-2 and lgs1-3, we identified a previously unknown high-impact two bp insertion resulting in a frameshift variant in the beginning of the LGS1 coding region (position 69,986,146, allele frequency in WGS dataset = 8%). The frameshift was linked to a SNP genotyped in the GBS dataset (T/A at position 69,985,710; D’ = 0.93, r2 = 0.84, Fig. 2B). All nine accessions with the frameshift in the WGS dataset also shared a 315 bp deletion (positions 69,984,268-69,984,583) overlapping 143 bp of the 3’ untranslated region in the 1580 bp second exon of LGS1.
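For readers less familiar with the linkage statistics quoted above, the following sketch computes |D'| and r2 for a pair of biallelic sites. It assumes phased haplotypes coded 0/1 (a reasonable simplification for largely homozygous inbred accessions, but an assumption nonetheless); the function name and toy data are hypothetical.

```python
def pairwise_ld(haplotypes):
    """Return (|D'|, r^2) for two biallelic sites.

    haplotypes: list of (a, b) pairs coded 0/1, one pair per haplotype.
    """
    n = len(haplotypes)
    p_a = sum(a for a, _ in haplotypes) / n          # frequency of allele 1 at site A
    p_b = sum(b for _, b in haplotypes) / n          # frequency of allele 1 at site B
    p_ab = sum(1 for a, b in haplotypes if a == 1 and b == 1) / n
    if p_a in (0.0, 1.0) or p_b in (0.0, 1.0):
        raise ValueError("both sites must be polymorphic")
    d = p_ab - p_a * p_b                             # raw disequilibrium coefficient
    if d >= 0:
        d_max = min(p_a * (1 - p_b), (1 - p_a) * p_b)
    else:
        d_max = min(p_a * p_b, (1 - p_a) * (1 - p_b))
    d_prime = abs(d) / d_max
    r2 = d * d / (p_a * (1 - p_a) * p_b * (1 - p_b))
    return d_prime, r2


# Toy example: the rarer allele at site A always occurs with allele 1 at site B.
toy = [(1, 1), (0, 0), (0, 0), (0, 1)]
print(pairwise_ld(toy))
```

A high |D'| with a lower r2, as in the toy data, is the pattern expected when two variants rarely recombine but differ in allele frequency.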
Although our approach is not well-powered to detect heterozygous deletions or small loss-of-function variants unlinked to SNPs in the GBS dataset, we find evidence that at least three of six independent LGS1 loss-of-function alleles characterized here and elsewhere (28) are present at low to intermediate frequency. Among accessions with SNP calls in LGS1, 7.0% of accessions exhibited SNP calls consistent with homozygous lgs1-2 and lgs1-3 in the GBS dataset (Supplemental Data File S1). Homozygous deletion frequency in African landraces for the GBS dataset was 7.5% compared to 5.3% in non-African landraces. The SNP tagging the frameshift is present at an allele frequency of 15% in the GBS dataset and is found in 16% of African landraces but only 7% of non-African landraces. LGS1 loss-of-function alleles were found in diverse races and regions for both datasets (Fig. S4, Table S2, Supplemental Data File S1), suggesting that these mutations have had time to spread and that their benefit is not strongly masked by epistasis. Most landrace accessions with LGS1 deletions and botanical race assignments from the GBS dataset were guinea (12 accessions), caudatum (8 accessions), or durra-caudatum hybrids (17 accessions, Table S2, Supplemental Data File S1).
LGS1 loss-of-function alleles were significantly more common among landraces with high parasite HS scores (Fig. 2C-D). However, correlations with population structure reduced power to detect associations with these resistance alleles after accounting for kinship (Fig. S5). The median S. hermonthica HS score was 0.20 for accessions homozygous for lgs1-2, 0.54 for accessions homozygous for lgs1-3, 0.25 for accessions with the frameshift, and 0.09 for accessions without evidence for LGS1 loss-of-function. The difference in S. hermonthica HS score between lgs1-3 and LGS1 intact accessions was statistically significant before accounting for relatedness (p < 0.001, Wilcoxon rank sum test vs. p = 0.10, MLM). Frameshift associations with HS were also statistically significant prior to correction for relatedness (p < 0.001, Wilcoxon rank sum; p = 0.69, MLM) and were stronger with the sorghum-only model compared to the all-occurrence model (p = 0.33, MLM). We observed modest support for associations between lgs1-2 and S. hermonthica HS before correcting for relatedness (p = 0.06, Wilcoxon rank sum test; p = 0.53, MLM), and stronger associations with the sorghum-only model (p = 0.26, MLM).
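The uncorrected comparisons above rest on the Wilcoxon rank sum test. A minimal sketch of that test follows, assuming a two-sided normal approximation with midranks for ties and no continuity correction; statistical packages may instead use exact distributions for small samples, so p-values can differ slightly. The HS scores shown are toy values, and the test does not account for relatedness among accessions.

```python
import math


def rank_sum_test(x, y):
    """Two-sided Wilcoxon rank sum test (normal approximation, tie-corrected).

    Returns (U statistic for x, z score, p-value).
    """
    pooled = sorted((v, g) for g, vals in enumerate((x, y)) for v in vals)
    n = len(pooled)
    ranks = [0.0] * n
    tie_term = 0
    i = 0
    while i < n:
        j = i
        while j < n and pooled[j][0] == pooled[i][0]:
            j += 1
        midrank = (i + 1 + j) / 2            # average of ranks i+1 .. j
        for k in range(i, j):
            ranks[k] = midrank
        t = j - i
        tie_term += t ** 3 - t
        i = j
    n1, n2 = len(x), len(y)
    w = sum(r for r, (_, g) in zip(ranks, pooled) if g == 0)
    u = w - n1 * (n1 + 1) / 2
    var_u = n1 * n2 / 12 * ((n + 1) - tie_term / (n * (n - 1)))
    if var_u <= 0:
        raise ValueError("all observations are tied")
    z = (u - n1 * n2 / 2) / math.sqrt(var_u)
    p = math.erfc(abs(z) / math.sqrt(2))     # two-sided tail of the standard normal
    return u, z, p


# Toy HS scores for accessions with and without a loss-of-function allele.
hs_lof = [0.5, 0.6, 0.7, 0.8]
hs_intact = [0.1, 0.2, 0.3, 0.4]
print(rank_sum_test(hs_lof, hs_intact))
```

Because closely related landraces tend to share both alleles and collection regions, a significant result here is expected to weaken once kinship is modeled, which is the contrast the MLM p-values illustrate.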
Upregulation of SL biosynthesis pathway genes in an LGS1-deficient line
Changes in strigolactones (and indeed any hormone) are potentially pleiotropic given the many known downstream functions of SLs, suggesting tradeoffs may be associated with LGS1 variation. We found multiple SL biosynthesis genes were up-regulated in nutrient-stressed roots of a resistant sorghum line (SRN39), which produces high levels of orobanchol, compared to a susceptible sorghum line (Shanqui Red) that produces mainly 5-deoxystrigol (28), a dominant SL in sorghum root exudates that strongly stimulates S. hermonthica germination (29). We first confirmed differences in expression for genes located in the LGS1 region deleted in SRN39. Two gene models in the deletion region were differentially expressed with low expression in Shanqui Red and no reads detected in SRN39 (Sobic.005G213500, p = 0.001; Sobic.005G213700, p = 0.008). The two other gene models in the deleted region, Sobic.005G213766 and Sobic.005G213832, had no reads detected in either line in roots under nutrient stress. LGS1 was differentially expressed between lines, with low expression in the susceptible line and no expression in SRN39 (p = 0.03).
In Arabidopsis and rice, the strigolactone biosynthesis pathway undergoes negative feedback regulation, in which low strigolactone levels stimulate upregulation of several biosynthesis genes (31, 32). We asked if low endogenous 5-deoxystrigol might also influence SL biosynthesis gene expression in high-orobanchol producing sorghum genotypes. Four of eleven expressed SL biosynthesis genes had higher transcript levels in the LGS1-deficient sorghum line SRN39 compared to the susceptible line (Fig. S6, Table S3). These included two genes with homology to NSP1 (Sobic.001G341400, p < 0.001 and Sobic.002G372100, p = 0.03), CCD7 (Sobic.006G170300, p <0.001), and LBO (Sobic.003G418000, p = 0.01). Differences between lines in expression of seven other SL biosynthesis genes were not statistically significant. Among 17 genes with probable roles in SL perception and downstream signaling, SMXL3 (Sobic.008G042800) was differentially expressed between lines and was downregulated in the LGS1-deficient line SRN39 (p = 0.03). This change in SMXL3 suggests possible downstream pleiotropy given that regulation of SL-dependent processes such as shoot development and root growth depends on the ubiquitination and degradation of SMXL repressor proteins (33, 34).
Overall, 2937 transcripts were differentially expressed in roots of the two lines under nutrient stress. The majority of transcripts were more highly expressed in LGS1-deficient line SRN39 compared to Striga-susceptible Shanqui Red (1290 downregulated vs. 1647 upregulated, Supplemental Data File S2). No GO term was overrepresented among differentially expressed genes, but genes involved in response to a biotic stimulus exhibited the highest enrichment score (GO:0009607; uncorrected p = 0.002; FDR corrected p = 1).
Parasite-associated SNPs across the sorghum genome
We performed genome-wide tests of association with predicted parasite HS score. A scan of 317,294 SNPs across 2070 sorghum landraces revealed 97 loci exhibiting significant associations with S. hermonthica distribution at a false discovery rate of 5% (Fig. 3A, Table S4). Of SNPs exceeding the threshold for significance, 45 were present within 1 kb of a predicted gene model, and four were predicted to cause an amino acid change (Table S4). Three outlier SNPs were in QTL previously associated with Striga resistance (35) including one intron variant in the uncharacterized gene model Sobic.001G227800 (Table S4). Another outlier SNP was present in an intron of Sobic.007G090900, a gene model with high homology to SMAX1/D53, which is degraded in an SL-dependent manner to control downstream SL signaling and is associated with tillering and height in rice (36). SNPs among those with the strongest associations to parasite occurrence were also found in genes related to suberin and wax ester biosynthesis (Sobic.007G091200) including phenylalanine ammonia-lyase (Sobic.006G148800). Phenylalanine ammonia-lyase is highly upregulated in the resistant rice line Nipponbare compared to a susceptible line during infection with S. hermonthica (37) and is associated with increased lignin deposition and post-attachment resistance (38).
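To make the 5% false discovery rate threshold concrete, the sketch below applies the Benjamini-Hochberg step-up procedure to a list of p-values. The text above does not state which FDR procedure was applied, so the choice of Benjamini-Hochberg is an assumption, and the function name and p-values are hypothetical.

```python
def benjamini_hochberg(pvalues, alpha=0.05):
    """Return a list of booleans marking tests rejected at FDR level alpha."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda idx: pvalues[idx])
    # Find the largest rank k such that p_(k) <= (k / m) * alpha.
    cutoff = 0
    for rank, idx in enumerate(order, start=1):
        if pvalues[idx] <= rank / m * alpha:
            cutoff = rank
    rejected = [False] * m
    for idx in order[:cutoff]:
        rejected[idx] = True
    return rejected


# Toy p-values for four SNPs (not from the study).
print(benjamini_hochberg([0.01, 0.02, 0.03, 0.5]))
```

With hundreds of thousands of SNPs, the per-SNP threshold implied by this procedure is far stricter than 0.05, which is why only a small number of loci exceed it.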
Across all gene models tagged by SNPs in the genome-wide analysis, no GO term met the threshold for significance after correction for multiple comparisons. The strongest enrichment scores were in genes with GO terms related to cell wall organization (GO:0071555; corrected p = 0.13, mean p score for 48 genes = 0.27), cell wall (GO:0005618, corrected p = 0.17, mean p score for 70 genes = 0.29), and pectinesterase activity (GO:0030599, corrected p = 0.22, mean p score for 39 genes = 0.26). The strongest SNP association to parasite occurrence in a pectinesterase gene model was in Sobic.002G138400 (SNP S2_21521798, Table S4). The allele associated with parasite occurrence at S2_21521798 is not predicted to cause an amino acid change, but was in strong linkage disequilibrium (r2 >0.9) with SNPs up to 204.4 kb upstream of the gene model, which encompasses a region of likely substantial structural variation based on visual inspection of read alignments to the reference. Overall, SNPs in genes related to SL biosynthesis and signaling (Table S3) showed a non-significant enrichment for associations with S. hermonthica distribution (uncorrected p = 0.09).
Signatures of selection in candidate regions
We further investigated three candidate genes with polymorphism that exhibited distinct geographic patterns and had known or potential roles in S. hermonthica resistance. Elevated Tajima’s D values can indicate an excess of shared polymorphism across SNPs at a locus, expected for regions of the genome under balancing selection, whereas strongly negative values can indicate an excess of low frequency polymorphism, expected under either purifying or positive selection. Two 5 kb genomic regions, spanning SNPs in LGS1 (SNP S5_69986146 in gene model Sobic.005G213600) and a pectinesterase gene (SNP S2_21521798 in gene model Sobic.002G138400, MAF=0.275) exhibited elevated values of Tajima’s D compared to 1000 randomly sampled 5 kb windows containing or overlapping gene models (Fig. 4; p = 0.06, Sobic.005G213600; p = 0.02, Sobic.002G138400). Regions of elevated Tajima’s D were localized to relatively small windows centered on SNPs associated with Striga habitat suitability, and larger window sizes produced weaker signals (data not shown). We looked for evidence of trans-species polymorphism and found no reads mapping to the pectinesterase gene and no evidence for the LGS1 loss-of-function alleles characterized in our study in previously sequenced accessions of S. propinquum (Kunth) Hitchc. (n=2) and S. arundinaceum (as synonym S. bicolor subsp. verticilliflorum (Steud.) de Wet ex Wiersema & J.Dahlb.) (n=2)(39).
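The window-based comparison described above can be sketched as follows. The code computes Tajima's D from per-site allele counts in a window and an empirical p-value as the proportion of background windows with D at least as large as the focal window. The exact form of the empirical p-value, the function names, and the toy counts are assumptions made for illustration.

```python
import math


def tajimas_d(allele_counts, n):
    """Tajima's D for one window.

    allele_counts: count of one allele at each site among n sampled chromosomes.
    """
    seg = [k for k in allele_counts if 0 < k < n]   # keep segregating sites only
    s = len(seg)
    if n < 4 or s == 0:
        raise ValueError("need at least 4 chromosomes and 1 segregating site")
    a1 = sum(1.0 / i for i in range(1, n))
    a2 = sum(1.0 / i ** 2 for i in range(1, n))
    b1 = (n + 1) / (3.0 * (n - 1))
    b2 = 2.0 * (n * n + n + 3) / (9.0 * n * (n - 1))
    c1 = b1 - 1.0 / a1
    c2 = b2 - (n + 2) / (a1 * n) + a2 / a1 ** 2
    e1 = c1 / a1
    e2 = c2 / (a1 ** 2 + a2)
    # Mean pairwise differences, summed over sites.
    pi = sum(2.0 * k * (n - k) / (n * (n - 1)) for k in seg)
    return (pi - s / a1) / math.sqrt(e1 * s + e2 * s * (s - 1))


def empirical_p(observed, background):
    """Proportion of background windows with D >= the observed value."""
    return sum(1 for d in background if d >= observed) / len(background)


# Toy windows with 4 chromosomes: intermediate-frequency vs. singleton variants.
print(tajimas_d([2, 2], 4), tajimas_d([1, 1, 1], 4))
```

Intermediate-frequency variants raise mean pairwise diversity relative to the number of segregating sites and push D upward, whereas an excess of rare variants pushes it downward.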
We did not observe strong departures from the neutral expectation for the region surrounding a gene with homology to SMAX1 (gene model Sobic.007G090900 tagged by SNP S7_14459084, p=0.6). The minor allele at S7_14459084 was at low frequency in the GBS dataset (MAF = 0.014) and most common in West Africa (Fig. 3C), which is not well sampled in the WGS dataset. The signal of association with S. hermonthica occurrence extended more than 7.5 Mb on Chromosome 7 (Fig. 3A), but we did not observe evidence to suggest that S7_14459084 tags an incomplete or soft sweep in either the GBS or WGS datasets according to the haplotype-based statistic nSL (40).
DISCUSSION
Pests, pathogens, and parasites threaten human health and food security in a changing world but understanding mechanisms of resistance across diverse taxa remains challenging. Here, we evaluate the hypothesis that geographic selection mosaics maintain genetic diversity in host resistance alleles across gradients of parasite occurrence in smallholder farming systems. We extended genotype-environment association analyses to biotic environmental gradients using species distribution models to model high resolution variation in parasite occurrence at continent scales. We report strong associations with parasite occurrence for novel candidate resistance loci in the sorghum-Striga hermonthica pathosystem and characterize diverse loss-of-function mutations in the known sorghum gene LGS1. Geographic distribution of loss-of-function alleles suggests that LGS1-conferred resistance is stable across environments and genetic backgrounds. However, the intermediate frequency and paucity of LGS1 loss-of-function alleles outside of parasite-prone areas, combined with LGS1-associated changes in sorghum strigolactone biosynthesis and endogenous signaling, suggest there may be trade-offs associated with LGS1 loss-of-function.
Our results support the hypothesis that spatial variation in selective pressures controls geographic clines in the frequency of host resistance alleles. The patterns we characterized are likely representative of long-term averaged conditions as opposed to a snapshot of coevolution because our parasite distribution model used occurrence records spanning more than 150 years, and the landraces we studied were collected across the last several decades. Negative frequency-dependent selection and rapid coevolutionary dynamics, for example, could decouple parasite abundance and the frequency of host resistance alleles. Our approach may be helpful for identifying host alleles whose resistance phenotype is conditional on local parasite genotypes (e.g. as might be suggested by patterns of distribution for polymorphism in SMAX1, Fig. 3D). Many parasites including S. hermonthica are highly genetically diverse, so that host resistance phenotype differs depending on local parasite genotypes (41). This host by parasite genotype interaction might obscure GWAS for host resistance depending on the parasite genotype used (42). Combining knowledge of parasite population structure with the spatial perspective of parasite-associated host genomic variation presented here could facilitate complementary, inexpensive detection of genomic regions contributing to resistance across diverse biotic environments.
Our study revealed evidence for locally-adaptive natural variation in genes related to cell wall modification. Cell-wall modifying enzymes including pectinesterases are highly expressed in the developing haustorium of parasitic plants and in the host-parasite interface (43–46). Pectinesterases de-esterify pectin in plant cell walls, making it accessible to other cell-wall degrading enzymes and loosening cell walls. However, some studies have suggested that in the presence of Ca2+, de-esterified pectin forms egg-box structures and instead stiffens cell walls (47). Rigidification of sorghum cell walls by their own pectinesterases (such as Sobic.002G138400, Table S4) or reduced activity could help defend against parasitic invaders. Notably, Yang et al. (2015) reported haustorium specific expression and positive selection on pectinesterase inhibitors in parasitic plant lineages and a pectinesterase inhibitor showed exceptionally high host species-associated expression in field populations of S. hermonthica (Lopez-Perez et al., in press). Parasite pectinesterase inhibitors could interact with host pectinesterases or help maintain integrity of parasite cell walls in the face of high pectinesterase expression during haustorial invasion. The SNP in a pectinesterase gene most strongly associated with S. hermonthica distribution does not code for an amino acid change but is linked with other SNPs across a region spanning substantial structural variation in the WGS dataset, potentially influencing sorghum pectinesterase expression levels during haustorial invasion.
Our results also suggest that LGS1 loss-of-function alleles may be adaptive in S. hermonthica-prone regions, but that costs of resistance may limit their distribution. Loss-of-function alleles are relatively uncommon but higher in frequency and broadly distributed where parasites occur (Fig. 2C). Associations for the known locus LGS1 were not statistically significant after correcting for kinship, likely due to covariation with genomic background (Fig. S4), which has been shown to substantially reduce power to detect causal loci in locally-adapted sorghum landraces (14) and in simulations (48). The diversity of loss-of-function variants reported here (Fig. 2A-B) and elsewhere (28), their wide geographic distribution (Fig. 2C), and an excess of high frequency polymorphism localized to LGS1 (Fig. 4) are consistent with long-term maintenance of LGS1 diversity under balancing selection. The underlying evolutionary processes remain unknown but could include negative frequency dependent selection or spatiotemporally variable selection favoring different alleles in different environments, depending on the relative costs of resistance (4, 49). Costs of resistance linked to SL structural changes could include reduced ability to interact with AM fungi, increased susceptibility to other S. hermonthica ecotypes more sensitive to orobanchol (e.g. S. hermonthica that parasitize rice, an orobanchol-exuder), or impacts on endogenous strigolactone signaling. Consistent with the last hypothesis, we find increased expression of SL biosynthesis genes in a sorghum line with low levels of 5-deoxystrigol (5DS), one of the primary SLs produced by sorghum (29, 50). This result could suggest transcriptional feedback in response to deficits in endogenous 5DS in sorghum, as previously observed in Arabidopsis and rice in response to low overall SL levels (31, 32). Feedback regulation could be mediated by repressors of SL signaling such as SMXL3, a homolog of which was downregulated in the 5DS-deficient line (Table S3). Costs of resistance may also differ among particular loss-of-function alleles, depending on biological pathways influenced by other genes in deleted regions.
Another factor potentially contributing to natural variation in SL biosynthesis and signaling is the role of human selection against S. hermonthica resistant varieties during domestication and diversification of cultivated sorghum. While impacts of SL structural variation on endogenous function remain poorly known, natural variation in levels of strigolactone production correlates with increased tillering in rice (51, 52), a trait subject to strong selection over the course of cereal domestication (53). One of the most well-studied domestication genes, Teosinte branched 1 (Tb1), controls branching and was a major target of selection during maize domestication (54, 55). Tb1 activity is decoupled from strigolactone signaling in maize, enabling strigolactone-independent branching regulation (56). However, in pearl millet, signatures of selection on Tb1 orthologs are weaker (57), consistent with observations that Tb1 orthologs do not act independently of strigolactones in other cereals (58). There is also evidence that Tb1 regulates the maize domestication gene Teosinte glume architecture 1 (59), a homolog of which also exhibited parasite-associated genetic variation in our study (Table S4).
Taken together, this study provides evidence of locally-adaptive natural variation in sorghum parasite resistance genes across African smallholder farming systems. We report long-term maintenance of diversity in known and novel candidates implicated in pre- and post-attachment resistance to the parasitic plant Striga hermonthica. However, the possibility of LGS1-driven tradeoffs or the existence of orobanchol-sensitive S. hermonthica populations (for example, from rice) suggest potential pitfalls with widespread deployment of the LGS1 loss-of-function allele in sorghum cultivation. Our findings highlight the complexity of interacting abiotic, biotic, and human pressures shaping genome polymorphism across environments in cultivated species.
METHODS
Species distribution models
Genome-environment association approaches identify putatively locally adaptive genetic loci where allelic variation is strongly associated with home environments (60). To employ this approach with biotic gradients, we required information on local parasite pressure for each sorghum landrace. We used species distribution models (SDMs) to estimate habitat suitability of Striga hermonthica at the location of each georeferenced sorghum landrace, under the assumption that modeled habitat suitability scores are a reasonable proxy of parasite success averaged over the long term and in comparison with sites where the parasite never occurs.
S. hermonthica SDMs were constructed with Maxent, a machine learning tool for predicting habitat suitability for a species of interest given a set of environmental variables and presence-only data (30). We compiled 1,369 occurrence records for S. hermonthica from three sources: records downloaded from the Global Biodiversity Information Facility (www.gbif.org); newly digitized records for specimens housed in the collections of the Royal Botanic Gardens Kew, the National Museum of Natural History in Paris, the French Agricultural Research Centre for International Development, the University of Montpellier, and the Botanical Garden of Lyon; and observations from published studies (61–76) (Supplemental Data File S3). Records within 0.01 degree (~1 km) of another observation were excluded to reduce sampling bias. To characterize the background environment across the study extent, we sampled 10,000 points at random from a 500 km radius surrounding the locations of the 1,050 occurrences in the final dataset. We also created ‘sorghum-only’ models based on a subset of S. hermonthica records (n = 262) that were annotated as occurring specifically on sorghum.
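The spatial thinning of occurrence records can be illustrated with a minimal sketch. It assumes a greedy rule in which a record is dropped when it lies within 0.01 degree of an already retained record in both latitude and longitude; the function name `thin_records`, the order-dependent rule, and the toy coordinates are hypothetical and are not taken from the analysis itself.

```python
def thin_records(records, min_sep=0.01):
    """Greedy thinning of (lat, lon) records.

    A record is kept only if no previously kept record lies within
    `min_sep` degrees in BOTH latitude and longitude (assumed rule).
    """
    kept = []
    for lat, lon in records:
        too_close = any(
            abs(lat - klat) < min_sep and abs(lon - klon) < min_sep
            for klat, klon in kept
        )
        if not too_close:
            kept.append((lat, lon))
    return kept


# Toy coordinates (invented for illustration only).
occurrences = [(12.000, 8.000), (12.005, 8.004), (12.500, 8.000), (12.000, 8.020)]
thinned = thin_records(occurrences)
print(len(occurrences), "records before thinning;", len(thinned), "after")
```

Because the rule is greedy, the result depends on record order; a production workflow might instead thin to one record per raster cell.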
Environmental variables were chosen based on prior knowledge of the ecology of S. hermonthica (61). Bioclimatic and topographic variables (annual rainfall, mean temperature of the wettest quarter, isothermality, potential evapotranspiration [PET], and topographic wetness index) were obtained from the CHELSA (77) and ENVIREM (78) datasets. Soil variables (clay content, nitrogen, and phosphorus) were based on continental and global-scale soil property maps (79, 80). We explored additional ecologically relevant variables but did not include them in the final model due to high correlation across the study background with annual rainfall (correlated with soil pH, aluminum, and precipitation seasonality) or soil clay content (correlated with sand fraction), as indicated by Pearson coefficients (|r| > 0.7).
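The correlation screen can be sketched as follows. This is a minimal illustration, assuming variables are considered in a user-supplied priority order and a candidate is dropped when its absolute Pearson correlation with any retained variable exceeds 0.7. The function names, the priority ordering, and the toy values are hypothetical.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def filter_correlated(variables, priority, threshold=0.7):
    """Keep variables in priority order, skipping any whose |r| with an
    already kept variable exceeds `threshold`."""
    kept = []
    for name in priority:
        if all(abs(pearson_r(variables[name], variables[k])) <= threshold
               for k in kept):
            kept.append(name)
    return kept


# Toy background samples (invented values, five points per variable).
background = {
    "annual_rainfall": [200, 400, 600, 800, 1000],
    "soil_pH": [8.0, 7.4, 6.9, 6.1, 5.5],   # strongly tied to rainfall here
    "clay_content": [30, 12, 25, 18, 22],
}
selected = filter_correlated(
    background, ["annual_rainfall", "soil_pH", "clay_content"]
)
print(selected)
```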
SDMs were implemented and evaluated with ENMeval, using the ‘checkerboard2’ method for data partitioning, which is designed to reduce spatial autocorrelation between testing and training records (81). The distribution model with the lowest ΔAICc was selected for further comparisons with genome variation in sorghum. Two niche overlap statistics, Schoener’s D (Schoener 1968) and the I similarity statistic (83), were calculated for the all-occurrence and sorghum-only models using the R package dismo (84).
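For readers unfamiliar with the two overlap statistics, the sketch below computes them for a pair of suitability surfaces flattened to lists. It assumes the commonly used definitions D = 1 − ½ Σ|p1 − p2| and I = 1 − ½ Σ(√p1 − √p2)², where each surface is first rescaled to sum to one; it is a plain-Python illustration, not the dismo implementation, and the function names and toy values are hypothetical.

```python
from math import sqrt


def _normalize(values):
    """Rescale non-negative suitability scores so they sum to one."""
    total = sum(values)
    return [v / total for v in values]


def schoener_d(s1, s2):
    """Schoener's D: 1 minus half the summed absolute differences."""
    p1, p2 = _normalize(s1), _normalize(s2)
    return 1 - 0.5 * sum(abs(a - b) for a, b in zip(p1, p2))


def similarity_i(s1, s2):
    """I statistic based on the squared Hellinger distance (assumed form)."""
    p1, p2 = _normalize(s1), _normalize(s2)
    return 1 - 0.5 * sum((sqrt(a) - sqrt(b)) ** 2 for a, b in zip(p1, p2))


# Toy suitability values for four grid cells (invented).
all_occurrence = [0.9, 0.6, 0.2, 0.1]
sorghum_only = [0.8, 0.7, 0.1, 0.1]
print(round(schoener_d(all_occurrence, sorghum_only), 3),
      round(similarity_i(all_occurrence, sorghum_only), 3))
```

Both statistics range from 0 (no overlap) to 1 (identical predictions).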
Sorghum LGS1 loss-of-function alleles
Fine-scale natural variation in sorghum LGS1 was characterized using whole genome sequencing (WGS) data from a set of 143 georeferenced landraces from the sorghum bioenergy association panel (BAP) (85). The BAP includes both sweet and biomass sorghum types, accessions from the five major sorghum botanical races (durra, caudatum, bicolor, guinea, and kafir), and accessions from Africa, Asia, and the Americas. The BAP accessions were sequenced to approximately 25x coverage and genotyped as part of the TERRA-REF project (www.terraref.org). This whole genome sequencing derived dataset is referred to throughout the manuscript as the ‘WGS dataset’ to distinguish it from the GBS dataset used for the GEA.
We characterized three loss-of-function alleles in LGS1 using data from the WGS dataset. Frameshift and nonsense mutations were identified using SnpEff v4.3t for SNP calls and small indels in Sobic.005G213600 available from TERRA-REF (86). High impact variants were manually checked to remove those near the 3’ end of the coding region and two linked variants that were present in the same lines and did not cause a frameshift when combined. To characterize large deletion variants, we aligned quality trimmed reads to the Sorghum bicolor v3.1 reference genome (DOE-JGI, http://phytozome.jgi.doe.gov/) with BWA MEM v0.7.17 (87). Duplicates were removed with SAMBLASTER v0.1.24 (88) and structural variants were called for each landrace with LUMPY v0.2.13 (89). SVTYPER v0.6.0 was used to call genotypes for structural variants ≤ 1 Mb spanning Sobic.005G213600 (90).
Following characterization of LGS1 deletion breakpoints using the WGS dataset, we imputed deletion calls to the GBS dataset. We considered the LGS1 region to be deleted if at least one SNP was called in the 5 kb region flanking positions of deletion breakpoints, but all data were missing between breakpoints. We considered the LGS1 region to be present if at least one SNP was called within the Sobic.005G213600 gene model. Fifteen low-coverage samples, with missing data extending 5 kb into flanking regions, were excluded.
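The decision rule for imputing deletion calls can be sketched as below. This is a minimal illustration under stated assumptions: each sample is represented by the positions of its non-missing SNP calls, the coordinates are invented, and the function name `call_lgs1_state` is hypothetical. The "ambiguous" outcome (calls between the breakpoints but none inside the gene model) is not described in the text and is added here only so that every case returns a value.

```python
def call_lgs1_state(called_positions, del_start, del_end,
                    gene_start, gene_end, flank=5000):
    """Classify one sample from the positions of its called SNPs."""
    in_gene = any(gene_start <= p <= gene_end for p in called_positions)
    between = any(del_start <= p <= del_end for p in called_positions)
    in_flank = any(
        del_start - flank <= p < del_start or del_end < p <= del_end + flank
        for p in called_positions
    )
    if in_gene:
        return "present"      # at least one call inside the gene model
    if not between and in_flank:
        return "deleted"      # flanks covered, nothing between breakpoints
    if not between:
        return "excluded"     # missing data extends through the flanks
    return "ambiguous"        # assumption: case not covered by the text


# Hypothetical coordinates for illustration only.
DEL = (100_000, 134_000)
GENE = (110_000, 112_000)
for calls in ([98_000, 111_000], [98_000, 136_000], [50_000, 200_000]):
    print(calls, call_lgs1_state(calls, *DEL, *GENE))
```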
Experimental validation of LGS1 deletion and frameshift alleles
LGS1 loss-of-function alleles were validated by testing 12 accessions from the Sorghum Association Panel (SAP) (91) for their ability to stimulate S. hermonthica germination (Table S5). Three accessions were previously reported as resistant to S. hermonthica, two accessions were susceptible, and seven accessions had unknown resistance (28, 92). Root exudates were harvested 43 days after planting (see Supplemental Materials & Methods for a detailed description of plant growth conditions). Germination trials were conducted using seed of Striga hermonthica collected on sorghum by Steven Runo (Kenyatta University) in western Kenya and assayed in the USDA quarantine lab at the University of Virginia. Seeds were surface sterilized with 0.5% bleach and rinsed three times with sterile distilled water, and 75 seeds were transferred with 500 µL diH2O to 12-well microtiter plates. After 10 days of preconditioning in the dark at 30°C, 2.25 mL of fresh root exudate was applied to each well. Exudate from five biological replicates (sorghum individuals) per sorghum genotype, with three technical replicates (wells) per biological replicate, was tested. GR24 at two concentrations (1 ppm and 0.1 ppm) and water were used as positive and negative controls, respectively. Germinated seeds were counted under a stereomicroscope after 66 hours of incubation. We tested significance of deletion alleles using linear mixed effects models in the R package lme4 (93), where deletion was a fixed effect and genotypes were random effects.
LGS1 and expression of genes in strigolactone synthesis and signaling pathways
To assess the impact of putatively locally adaptive variation at LGS1, we studied root transcriptomes of two lines that differ in the presence or absence of LGS1. We generated TagSeq libraries using root tissue from five replicate individuals each of sorghum lines Shanqui Red (PI 656025) and SRN39 (PI 656027), grown under nutrient-deficient conditions. Shanqui Red is susceptible to S. hermonthica, whereas SRN39 is resistant as a result of a ~34 kb deletion spanning LGS1 and four adjacent genes (28). Plants were grown under the same conditions as for root exudate collection. Twenty-five days after planting, all potting components were carefully washed from roots, and 2-5 ml of the fine root tissue was sampled, placed into labeled vials, and stored on ice for 1 day during shipment to the University of Texas at Austin, where samples were stored at −80°C until RNA extraction.
RNA extractions were performed with TRIzol Reagent after grinding root tissues in liquid nitrogen using mortar and pestle. We used the 3’-TagSeq approach (94) with several modifications to construct cDNA libraries. This method focuses on the 3’ end of transcripts enriched in a size range of 400-500 bp fragments.
cDNA libraries were sequenced on a single lane of an Illumina HiSeq 2500 analyzer at the Genomic Sequencing and Analysis Facility at the University of Texas at Austin. We recovered between 3 and 5 million raw single-end 150 bp reads per sample and compared expression differences between SRN39 and Shanqui Red using the TagSeq v2.0 pipeline (https://github.com/Eli-Meyer/TagSeq_utilities/). A detailed description of the library preparation method and bioinformatic analysis is given in Supplemental Materials & Methods.
Count data were analyzed in DESeq2 with FDR correction (α=0.05) (95). Enrichment analysis was performed as for genome-wide association analysis, except only the set of gene models with non-zero expression were used as the background.
Genome-environment associations
We performed a genome-wide scan for SNPs in the sorghum genome strongly associated with values of habitat suitability estimated by our S. hermonthica distribution model. Sorghum genotypic information was extracted from a public dataset of accessions genotyped using genotyping-by-sequencing (GBS) (14, 96–98). This dataset comprises a diverse set of worldwide accessions including germplasm from the SAP (91), the Mini-Core Collection (99), and the Bioenergy Association Panel (BAP) (85). Beagle 4.1 was used to impute missing data based on the Li and Stephens (2003) haplotype frequency model (100). The average missing rate in the non-imputed dataset is 0.39 (98). After excluding sorghum accessions with missing coordinates and SNPs with minor allele frequency less than 0.01, the dataset, hereafter referred to as the ‘GBS dataset’, included 1,547 African landraces among 2,070 georeferenced accessions in total, genotyped at 317,294 SNPs. At each location of a georeferenced accession in the GBS dataset, we extracted logistic output from the S. hermonthica distribution model (Supplemental Data File S4-5) as the ‘phenotype’. To account for regions where predicted habitat suitability is high but S. hermonthica has not been recorded, we cropped model predictions to within 200 km of any occurrence record and set values outside of this range to zero, deriving for each grid cell an S. hermonthica occurrence score ranging from zero to one; more than half of sorghum accessions are from locations with parasite habitat suitability (HS) scores greater than zero (Fig. 1B). Genome-wide associations for each SNP with S. hermonthica occurrence were computed for the GBS dataset using a mixed linear model (MLM) fit with GEMMA v0.94 (101). To take into account relatedness among individuals, we used a centered kinship matrix (-gk 1) generated from all 317,294 SNPs before calculation of p score statistics (-lmm 3).
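The cropping step that converts habitat suitability into an occurrence score can be sketched as follows. This is a minimal illustration, assuming great-circle (haversine) distances on a spherical Earth of radius 6,371 km and a per-site rather than per-raster-cell calculation; the function names and toy coordinates are hypothetical.

```python
from math import radians, sin, cos, asin, sqrt

EARTH_RADIUS_KM = 6371.0  # assumed mean Earth radius


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two points in degrees."""
    phi1, phi2 = radians(lat1), radians(lat2)
    dphi = phi2 - phi1
    dlam = radians(lon2 - lon1)
    a = sin(dphi / 2) ** 2 + cos(phi1) * cos(phi2) * sin(dlam / 2) ** 2
    return 2 * EARTH_RADIUS_KM * asin(sqrt(a))


def occurrence_score(suitability, site, occurrences, max_km=200.0):
    """Return the suitability score if any occurrence record lies within
    `max_km` of the site, otherwise zero."""
    lat, lon = site
    near = any(haversine_km(lat, lon, olat, olon) <= max_km
               for olat, olon in occurrences)
    return suitability if near else 0.0


# Toy example: one landrace site and two sets of occurrence records (invented).
print(occurrence_score(0.8, (0.0, 0.0), [(0.0, 1.0)]))  # record ~111 km away
print(occurrence_score(0.8, (0.0, 0.0), [(0.0, 3.0)]))  # record ~334 km away
```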
P scores were adjusted for multiple comparisons using the Benjamini and Hochberg (1995) procedure (FDR = 0.05). To visualize genomic regions previously implicated in resistance to S. hermonthica, locations of QTL from Haussmann et al. (2004) in the S. bicolor v3.0 genome were downloaded from the Sorghum QTL Atlas (http://aussorgm.org.au) (102). We tested for associations with LGS1 loss-of-function mutations using the same procedure and kinship matrix as for the genome-wide association analysis.
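The Benjamini and Hochberg adjustment can be written in a few lines. The sketch below is a generic step-up implementation returning adjusted p-values in their original order; the function name and toy p-values are hypothetical, and it is not the code used to produce the results reported here.

```python
def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])  # indices by ascending p
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Toy p-values (invented).
raw = [0.01, 0.04, 0.03, 0.005]
adj = benjamini_hochberg(raw)
significant = [q <= 0.05 for q in adj]
print(adj, significant)
```

A SNP is then declared significant at FDR = 0.05 when its adjusted value is at most 0.05.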
We identified gene functions enriched for associations with parasite distribution using the gene score resampling method in ErmineJ (103). This method places higher value on gene scores than their relative ranking and does not require choice of a threshold for significance. For each gene model, we used the lowest p score from GEMMA of any SNP within 1 kb of gene model boundaries, and enrichment analyses were performed using the mean of all gene scores in the gene set. Gene sets were created using GO terms for all gene models in the Sorghum bicolor v3.0 genome (annotation version 3.1). We also created a custom gene set comprising 30 gene models implicated in strigolactone biosynthesis and signaling (Table S3). This gene set included nine gene models annotated as belonging to Phytozome pathway Sbicolor PWY-7101 (D27, CCD7, and CCD8 homologs), and all sorghum gene models annotated with best BLASTP hits to Arabidopsis MAX1, LBO (104), AtD14, SMAX1, SMXL3/4/5/6/7, NSP1, and NSP2 (105). Enrichment analysis was performed with 200,000 iterations, excluding gene sets with fewer than 5 or more than 200 genes.
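Assigning a score to each gene model from SNP-level results can be sketched as below. This minimal example assumes genes are given as (chromosome, start, end) and SNPs as (chromosome, position, p score); the function name, gene names, and toy values are hypothetical, and genes with no SNP in the window are simply left unscored.

```python
def gene_scores(genes, snps, window=1000):
    """Map each gene to the lowest p score of any SNP within `window` bp
    of its boundaries.

    genes: dict of name -> (chrom, start, end)
    snps:  list of (chrom, position, p_score)
    """
    scores = {}
    for name, (chrom, start, end) in genes.items():
        hits = [p for c, pos, p in snps
                if c == chrom and start - window <= pos <= end + window]
        if hits:  # genes without nearby SNPs receive no score
            scores[name] = min(hits)
    return scores


# Toy inputs (invented coordinates and p scores).
genes = {
    "geneA": ("Chr05", 10_000, 12_000),
    "geneB": ("Chr05", 50_000, 51_000),
}
snps = [
    ("Chr05", 9_500, 0.2),
    ("Chr05", 11_000, 0.001),
    ("Chr05", 13_500, 1e-6),   # outside the 1 kb window of geneA
    ("Chr02", 10_500, 1e-9),   # different chromosome
]
print(gene_scores(genes, snps))
```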
Signatures of selection in candidate genomic regions
We predicted functional impacts for candidate SNPs of interest and performed scans for selection in 1 Mb regions surrounding each focal SNP. Linkage disequilibrium between sites was determined with vcftools v0.1.15 (--geno-r2 parameter). To identify regions under balancing selection among a subset of African landraces, we used Tajima’s D calculated with vcftools in non-overlapping 5 kb windows, excluding SNPs with more than 70% missing data. P-values for candidate regions under selection were calculated based on the empirical distribution of Tajima’s D for 1000 randomly sampled 5 kb windows that overlapped or fully encompassed gene models. We searched for sweeps using the nSL statistic with selscan v1.2.0a (106).
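An empirical p-value of this kind can be computed as the fraction of randomly sampled windows whose statistic is at least as extreme as the candidate window. The sketch below assumes an upper-tail test, since elevated Tajima’s D is the pattern expected under balancing selection; the choice of tail, the function name, and the toy values are assumptions rather than details given in the text.

```python
def empirical_p_upper(observed, null_values):
    """Fraction of null windows with a statistic >= the observed value."""
    n_extreme = sum(1 for v in null_values if v >= observed)
    return n_extreme / len(null_values)


# Toy null distribution of Tajima's D from eight genic windows (invented).
null_d = [-1.5, -0.5, 0.0, 0.5, 1.0, 2.0, 2.5, 3.0]
print(empirical_p_upper(2.0, null_d))
```

With 1000 sampled windows, the smallest non-zero p-value obtainable in this way is 0.001.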
SUPPLEMENTAL MATERIAL & METHODS
Collection of sorghum root exudates
Accessions were grown individually in two-gallon pots with 70% potting mix (Premier pro-mix PGX) and 30% medium commercial grade sand in a greenhouse at 85°F during the day and 75°F at night with a 16-hour photoperiod. Pots were fertilized with 1x strength Miracle-Gro (Miracle-Gro® Water Soluble All Purpose Plant Food, The Scotts Company, LLC., Marysville, OH) once at fourteen days after planting. Forty-three days after planting, potting components were carefully washed from roots and whole plants were placed in separate flasks with a 1:5 ratio of root:DI water (v/v). Flasks were sealed with parafilm to prevent evaporation and placed into darkness at room temperature. After 48 hours, root exudate from each plant was centrifuged at 9,000 rpm for 10 minutes before removing supernatant for germination assays.
Root RNA extraction and DNase treatment
For RNA extraction, root tissues were ground in liquid nitrogen using mortar and pestle. Ground tissue powder was homogenized using an equal volume (w/v) of TRIzol™ reagent (Invitrogen, Cat#15596018) and 1 mL of homogenate was used in RNA extraction. RNA extraction generally followed the TRIzol™ Reagent user guide. In brief, 200 μL of chloroform:isoamyl alcohol 24:1 (Sigma, C0549) was added to 1 mL of homogenate and incubated for 10 min on a rotator mixer at room temperature. The homogenate was centrifuged at 12,000 RCF for 15 min at 4°C. The clear upper aqueous phase was transferred to new 1.5 mL Eppendorf tubes and combined with an equal volume (v/v) of isopropanol to precipitate nucleic acid. Samples were mixed well by inverting the tubes several times and centrifuged as described above to obtain an RNA pellet. The supernatant was discarded and the pellet washed with 75% alcohol. The pellet was air dried for 10 min at room temperature and suspended in 30 μL of 10 mM Tris-Cl (pH 8.0). RNA samples were left overnight in a 4°C refrigerator to dissolve the pellet. RNA samples were then treated with DNase I (Ambion™, AM2222) to remove contaminating DNA. In brief, 25 μL of RNA was incubated with 2 units of DNase I and 3 μL of 10X DNase I buffer for 30 minutes at 37°C using a water bath. The nucleic acid was re-precipitated, air dried, and dissolved in 25 μL of 10 mM Tris-Cl (pH 8.0). The RNA was quantified using a NanoDrop® ND-1000 Spectrophotometer, and ~100 ng of RNA was analyzed by 1% gel electrophoresis to verify that the RNA was intact and free of genomic DNA contamination.
Library preparation and sequencing
We used the 3’-TagSeq approach (Meyer et al., 2011) with several modifications to construct cDNA libraries. This method focuses on the 3’ end of transcripts enriched in a size range of 400-500 bp fragments. Briefly, 1 μg of total RNA from each sample was prepared in a volume of 10 μL and incubated with 8 μL of degradation buffer (0.5 mM dNTP mix, 1 mM DTT, 1X first strand buffer, 1 μM 3ILL-30TV oligo-dT primer to target 3’ ends of cDNA) at 70°C for 16 min to achieve the desired range of RNA fragments (300-600 bp). The entire fragmented RNA was then used in first strand cDNA (FS cDNA) synthesis with RNA oligo primer (S-ILL-swMW) and SMARTScribe Reverse Transcriptase (Clontech). Then, cDNA was amplified with 16 cycles using Titanium Taq Polymerase, 3ILL and 5ILL primers, and 10 μL of the FS cDNA as template. The amplified cDNA products were purified using the NucleoFast PCR Clean-up kit (Macherey-Nagel) and quantified using the Qubit dsDNA High Sensitivity Assay Kit (Invitrogen) on a Qubit® 2.0 Fluorometer.
Next, 50 ng of purified cDNA from each sample was individually barcoded with four cycles of barcoding PCR. We used Illumina-specific barcodes (ILL-BC) and multiplexed those using Illumina TruSeq universal adapters (TruSeq_Un) for use on the Illumina platform. Equal volumes of barcoded cDNA libraries were pooled together and purified using the PureLink Quick Gel Extraction and PCR Purification Combo kit (Invitrogen). The purified cDNA library pool was separated on 1.5% agarose in 1X TBE buffer (pH 8.2) to excise the fragment in the 400-500 bp size range. The excised gel fragment was purified using the gel extraction kit mentioned above. The resulting library pool was loaded into a single lane of an Illumina HiSeq 2500 analyzer at the Genomic Sequencing and Analysis Facility at the University of Texas at Austin. We recovered between 3 and 5 million raw single-end 100 bp reads per sample. All primers used in the library preparation are given below.
Bioinformatic analysis of TagSeq data
Briefly, we excluded reads containing more than 20 bp with Phred quality scores < 20, homopolymer runs longer than 30 bp, or at least one 12-mer match to Illumina adaptor sequences. Reads were trimmed to remove non-template bases introduced during library prep, identified based on occurrence of a ‘GGG’ motif within the first 10 bp. Reads were then mapped to the set of 34,211 primary transcripts from the S. bicolor v3.1 genome with SHRiMP v.2.2.3 (David et al. 2011) and we kept only unique alignments and those with more than 40 bp matching the reference.
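The three read-exclusion criteria can be illustrated with a short sketch. It assumes reads are available as a sequence string plus a list of Phred scores, and it omits the ‘GGG’-based trimming and the mapping step; the function names and toy reads are hypothetical and this is not the TagSeq pipeline code. The 5ILL primer sequence listed among the oligos is used here purely as an example adaptor.

```python
def longest_homopolymer(seq):
    """Length of the longest run of a single repeated base."""
    longest = run = 0
    prev = None
    for base in seq:
        run = run + 1 if base == prev else 1
        longest = max(longest, run)
        prev = base
    return longest


def adaptor_kmers(adaptors, k=12):
    """All k-mers contained in the supplied adaptor sequences."""
    return {a[i:i + k] for a in adaptors for i in range(len(a) - k + 1)}


def passes_filters(seq, quals, kmers, k=12,
                   q_cutoff=20, max_low_q=20, max_run=30):
    """Apply the three exclusion criteria to one read."""
    if sum(q < q_cutoff for q in quals) > max_low_q:
        return False  # more than 20 bases with Phred score < 20
    if longest_homopolymer(seq) > max_run:
        return False  # homopolymer run longer than 30 bp
    if any(seq[i:i + k] in kmers for i in range(len(seq) - k + 1)):
        return False  # at least one 12-mer match to an adaptor
    return True


kmers = adaptor_kmers(["CTACACGACGCTCTTCCGATCT"])
clean_read = "ACGT" * 10
print(passes_filters(clean_read, [30] * 40, kmers))
```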
Oligos used in library preparations
TruSeq_Un1: AAT GAT ACG GCG ACC ACC GAG ATC TAC ACA TCA CGA CAC TCT TTC CCT ACA CGA CGC TCT TCC GAT CT
TruSeq_Un2: AAT GAT ACG GCG ACC ACC GAG ATC TAC ACA CTT GAA CAC TCT TTC CCT ACA CGA CGC TCT TCC GAT CT
ILL-BC (8 six-mer barcodes, here shown as NNNNNN): CAAGCAGAAGACGGCATACGAGATNNNNNNGTGACTGGAGTTCAGACGTGTGCTCTTCCGATC
S-ILL-swMW: rArCrCrCrCrArUrGrGrGrGrCrUrArCrArCrGrArCrGrCrUrCrUrUrCrCrGrArUrCrUrNrNMW rGrGrG
3ILL-30TV: ACGTGTGCTCTTCCGATCTAATTTTTTTTTTTTTTTTTTTTTTTTTTTTTTV
5ILL: CTACACGACGCTCTTCCGATCT
Acknowledgments
We thank the many collectors, volunteers, and herbarium curators who made this work possible and are particularly grateful to Marie-Hélène Weech and the staff of the Royal Botanic Gardens Kew and the Muséum national d’Histoire naturelle. We thank Steven Runo for providing S. hermonthica seeds and Alice MacQueen for comments that improved the manuscript. Whole genome sequence data used here are from the TERRA-REF experiment, funded by the Advanced Research Projects Agency-Energy (ARPA-E), U.S. Department of Energy, under Award Number DE-AR0000594. This material is based on work supported by a National Science Foundation Postdoctoral Research Fellowship in Biology to ESB under Grant No. 1711950. The views and opinions of authors expressed herein do not necessarily state or reflect those of the United States Government or any agency thereof.