Abstract
Viral infection requires a swift reaction from the host, which includes a transcriptional response. This transcriptional response is shaped by genetic variation in the host population, resulting in phenotypic differences in viral susceptibility within the infected population. However, regulatory genetic variation in antiviral responses has not been well investigated in natural host populations. Here, we infect genetically diverse isolates of the model organism Caenorhabditis elegans with Orsay virus to determine their transcriptional response to infection. The Hawaiian C. elegans isolate CB4856 shows low viral susceptibility, despite lacking the upregulation of Intracellular Pathogen Response (IPR) genes, which are known to counteract viral infection. We subsequently investigated whether differences in IPR timing could explain viral susceptibility, yet find that the low viral susceptibility of CB4856 is unlikely to result from accelerated IPR gene activation. Instead, our data suggest that regulatory genetic variation, in particular within two key IPR regulators, determines the host transcriptional defence. Genetic analysis of 330 wild isolates reveals that CB4856 belongs to a minority of strains that show high genetic diversity within the pals-genes that are part of the IPR. The two IPR regulators, pals-22 and pals-25, are located in a genomic region shaped by balancing selection, suggesting that different evolutionary strategies in IPR regulation exist. Nevertheless, the vast majority of wild isolates harbour little genetic variation in the pals-genes. Together, this suggests that the worldwide conservation of the IPR host transcriptional defence in C. elegans results from a high evolutionary pressure that pathogens could impose.
Introduction
The continuous battle between host and virus drives the emergence of host genetic variation in antiviral mechanisms such as transcriptional responses. Regulatory genetic variation affects viral susceptibility after infection, making some individuals within the population more resistant than others (Wang et al. 2018; Franco et al. 2013; Piasecka et al. 2018; van Sluijs et al. 2017). Yet, the universality and mode-of-action of genetic diversity in shaping antiviral transcriptional responses within natural populations remain largely unknown.
Caenorhabditis elegans and its natural pathogen Orsay virus (OrV) are used as a powerful genetic model system to study host-virus interactions (Félix et al. 2011). OrV is a positive-sense single-stranded RNA virus infecting C. elegans intestinal cells where it causes local disruptions of the cellular structures (Félix et al. 2011; Franz et al. 2013). Two major groups of antiviral genes respond to viral infection in C. elegans: genes related to the RNA interference (RNAi) pathway (Félix et al. 2011; Tanguy et al. 2017; Sterken et al. 2014; Sarkies et al. 2013; Ashe et al. 2013, 2015; Guo et al. 2013) and genes related to the Intracellular Pathogen Response (IPR) (Bakowski et al. 2014; Reddy et al. 2017, 2019; Chen et al. 2017). The RNAi pathway activity is controlled by the gene sta-1, which in turn is activated by the viral sensor sid-3 that is hypothesized to directly interact with the Orsay virus (Tanguy et al. 2017). Subsequently, the antiviral RNAi components dcr-1, drh-1, and rde-1 degrade the viral RNA (Sarkies et al. 2013), but the RNAi genes themselves remain equally expressed during infection (Chen et al. 2017; Sarkies et al. 2013). The IPR counteracts infection by intracellular pathogens (including OrV) and increases the ability to handle proteotoxic stress (Bakowski et al. 2014; Reddy et al. 2017, 2019). The gene pals-22 cooperates with pals-25 to control the IPR pathway by functioning as a molecular switch between growth and antiviral defence: pals-22 promotes development and lifespan, whereas pals-25 stimulates pathogen resistance. Together pals-22 and pals-25 regulate a set of 80 genes that are upregulated upon intracellular infection, including 25 genes in the pals-family and several members of the ubiquitination response (Reddy et al. 2017, 2019). The expression of pals-22 and pals-25 themselves does not change following OrV infection.
In total, the pals-gene family contains 39 members, mostly found in five genetic clusters on chromosomes I, III, and V. Recently, a third antiviral defence was identified which degrades the viral genome after uridylation by the gene cde-1 (Le Pen et al. 2018). Together, these antiviral pathways are key in controlling OrV infection in C. elegans.
The transcriptional responses following infection have so far been studied in the C. elegans laboratory strain N2, in RNAi-deficient mutants in the N2 background such as rde-1 and dcr-1, and in the RNAi-deficient wild isolate JU1580 (Tanguy et al. 2017; Sarkies et al. 2013; Reddy et al. 2017; Chen et al. 2017; Bakowski et al. 2014; Ashe et al. 2013). These studies indicate the presence of intraspecies variation in the transcriptional response. For example, C. elegans rde-1 and dcr-1 mutants and JU1580 are deficient in the RNAi pathway and all show high viral susceptibility, yet their transcriptional responses differ substantially (Sarkies et al. 2013; Félix et al. 2011). Further, several pals-genes differ in expression levels between N2 and the RNAi mutants on the one hand and JU1580 on the other hand (Sarkies et al. 2013). However, the link between the genetic background and the transcriptional responses is not fully understood. Studying the transcriptional responses in diverse genetic backgrounds will provide insight into the natural variation within the antiviral defence mechanisms.
Here we studied the transcriptional response of the C. elegans Hawaiian isolate CB4856 in comparison to the response in N2. CB4856 has high genetic diversity compared to N2 (Thompson et al. 2015) and the transcriptional profile of this strain has been well studied under multiple conditions (e.g. Snoek et al. 2017; Capra et al. 2008; Li et al. 2006; Viñuela et al. 2012). We found that CB4856 is less susceptible to viral infection than the N2 strain, but lacks upregulation of the antiviral IPR genes. The temporal dynamics of the transcriptional response in three genetic backgrounds, N2, JU1580, and CB4856, demonstrate that gene expression patterns for multiple genes responding to OrV infection are highly dynamic, but do not show evidence for an accelerated activation of established antiviral IPR members in CB4856. The strains N2 and CB4856 differ in (basal) gene expression of several pals-genes, including pals-22 and pals-25 that control the IPR transcriptional response. Our data suggest that regulatory genetic variation within the pals-family underlies this difference. In contrast, most strains among a set of 330 wild isolates show little genetic variation for genes in the pals-family, suggesting that high selective pressure exists to conserve the IPR transcriptional response.
Materials and Methods
Nematode strains and culturing
C. elegans wild-type strains N2 (Bristol) and CB4856 (Hawaii) were kept on 6-cm Nematode Growth Medium (NGM) dishes containing Escherichia coli strain OP50 as food source (Brenner 1974). Strains were kept in maintenance culture at 12°C and the standard growing temperature for experiments was 20°C. Fungal and bacterial infections were cleared by bleaching (Brenner 1974). The strains were cleared of males prior to the experiments by selecting L2 larvae and placing them individually in a well of a 12-well plate at 20°C. Thereafter, the populations were screened for male offspring after 3 days and only the 100% hermaphrodite populations were transferred to fresh 9-cm NGM dishes containing E. coli OP50 and grown until starved.
Orsay virus infection assay
Orsay virus stocks were prepared according to the protocol described before (Félix et al. 2011). After bleaching, nematodes were infected using 20, 50, or 100µL Orsay virus/500µL infection solution as previously described (Sterken et al. 2014). Mock infections were performed by adding M9 buffer instead of Orsay virus stock (Brenner 1974). The samples for the viral load and transcriptional analysis were infected in Eppendorf tubes with 50µL Orsay virus/500µL infection solution 26 hours post bleaching (L2-stage) (8 biological replicates per treatment per genotype). The nematodes were collected 30 hours after infection. The samples for the transcriptional analysis of the time-series were infected with 50µL Orsay virus/500µL infection solution at 40 hours post bleaching (L3-stage). The nematodes were collected at time points: 1.5, 2, 3, 8, 10, 12, 20.5, 22, 24, 28, 30.5, or 32 hours post-infection (1 biological replicate per treatment per genotype per time point). Viral loads of the samples were determined by RT-qPCR as described by (Sterken et al. 2014).
RNA isolation
The RNA of the samples in the transcriptional analysis (infected 26hpb and collected 56hpb) was isolated using Maxwell® 16 Tissue LEV Total RNA Purification Kit, Promega according to the manufacturer’s instructions including two modifications. First, 10 μL proteinase K was added to the samples (instead of 25 μL). Second, after the addition of proteinase K samples were incubated at 65°C for 10 minutes while shaking at 350 rpm. Quality and quantity of the RNA were measured using the NanoDrop-1000 spectrophotometer (Thermo Scientific, Wilmington DE, USA).
The RNA of the samples in the time series was isolated using the RNeasy Micro Kit from Qiagen (Hilden, Germany). The ‘Purification of Total RNA from Animal and Human Tissues’ protocol was followed, with a modified lysing procedure: frozen pellets were lysed in 150 µl RLT buffer, 295 µl RNAse-free water, 800 µg/ml proteinase K and 1% β-mercaptoethanol. The suspension was incubated at 55°C at 1000 rpm in a Thermomixer (Eppendorf, Hamburg, Germany) for 30 minutes or until the sample was clear. After this step the manufacturer’s protocol was followed. Quality and quantity of the RNA were measured using the NanoDrop-1000 spectrophotometer (Thermo Scientific, Wilmington DE, USA) and RNA integrity was determined by agarose gel electrophoresis (3 μL of sample RNA on 1% agarose gel).
cDNA synthesis, labelling and hybridization
The ‘Two-Color Microarray-Based Gene Expression Analysis; Low Input Quick Amp Labeling’ protocol, version 6.0, from Agilent (Agilent Technologies, Santa Clara, CA, USA) was followed, starting from step five. The C. elegans (V2) Gene Expression Microarray 4X44K slides, manufactured by Agilent, were used.
Data extraction and normalization
The microarrays were scanned by an Agilent High Resolution C Scanner with the recommended settings. The data was extracted with Agilent Feature Extraction Software (version 10.7.1.1), following the manufacturer’s guidelines. Normalization of the data was executed separately for the transcriptional response data (infected at 26 and collected at 56 hours post bleaching) and the transcriptional response of the time series. For normalization, R (version 3.3.1, x64) with the limma package was used. The data was not background corrected before normalization (as recommended by Zahurak et al. 2007). Within-array normalization was done with the Loess method and between-array normalization was done with the Quantile method (Smyth and Speed 2003). The obtained single channel normalized intensities were log2 transformed and the transcriptional response data (infected 26hpb) was batch corrected for the two different virus stocks that were used for infection. The obtained (batch corrected) log2 intensities were used for further analysis using the package ‘tidyverse’ (1.2.1) in R (3.3.1, x64).
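As an illustration of the between-array step only, the idea behind quantile normalization can be sketched in a few lines of Python. This is a minimal sketch and not the limma implementation used in the study: the function name `quantile_normalize` is hypothetical, the input is assumed to be one list of intensities per array with spots in the same order, and tied values are ranked by position rather than averaged.

```python
def quantile_normalize(arrays):
    """Give every array the same intensity distribution.

    `arrays` is a list of equally long lists, one per microarray, with
    spots in the same order. Ties are broken by position (an assumption;
    dedicated packages treat ties more carefully).
    """
    n_spots = len(arrays[0])
    if any(len(a) != n_spots for a in arrays):
        raise ValueError("all arrays must have the same number of spots")

    # Reference distribution: mean of the r-th smallest value across arrays.
    sorted_arrays = [sorted(a) for a in arrays]
    reference = [
        sum(s[r] for s in sorted_arrays) / len(arrays) for r in range(n_spots)
    ]

    normalized = []
    for a in arrays:
        # Spot indices ordered from lowest to highest intensity.
        order = sorted(range(n_spots), key=lambda i: a[i])
        out = [0.0] * n_spots
        for rank, spot in enumerate(order):
            out[spot] = reference[rank]
        normalized.append(out)
    return normalized
```

After this step every array contains exactly the same set of values, so remaining differences between arrays reflect only the ranking of spots within each array.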
The transcriptome datasets generated are deposited at ArrayExpress (E-MTAB-7573 and E-MTAB-7574). The data of the 12 N2 mock samples of the time series has previously been described (Snoek et al. 2015).
Principal component analysis
A principal component analysis (PCA) was conducted on the gene-expression data of both the transcriptional response experiment and the time series. For this purpose, the data was transformed to a log2 ratio with the mean, using $R_{i,j} = \log_2\left(y_{i,j} / \bar{y}_i\right)$, where R is the log2 relative expression of spot i (i = 1, 2,…, 45220) in strain j (N2, CB4856, or JU1580), y is the intensity (not the log2-transformed intensity) of spot i in strain j, and $\bar{y}_i$ is the mean intensity of spot i. The principal component analyses were performed independently per experiment. The transformed data was used in a principal component analysis, where the first six axes that explain above 4.9% of the variance were further examined.
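The log2 ratio with the mean can be illustrated for a single spot as follows. This is a minimal sketch, assuming the mean is taken over all samples of that spot within one experiment; the function name `log2_ratio_with_mean` is hypothetical, and the PCA itself is not shown.

```python
import math


def log2_ratio_with_mean(intensities):
    """Return log2(y / mean(y)) for the raw intensities of one spot.

    `intensities` holds the untransformed intensity of the spot in each
    sample; values must be positive.
    """
    mean_intensity = sum(intensities) / len(intensities)
    return [math.log2(y / mean_intensity) for y in intensities]
```

A spot with intensities 1, 4 and 7 has a mean of 4, so the sample at intensity 1 receives a value of −2 (four-fold below the mean) and the sample at intensity 4 receives 0.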
Linear model
The log2 intensity data of the nematodes that were infected 26hpb and collected 56hpb was analysed using the linear model $Y_i = G + T + G \times T + \varepsilon$, with Y being the log2 normalised intensity of spot i (1, 2,…, 45220). Y was explained over genotype (G; either N2 or CB4856), treatment (T; either infected or mock), the interaction between genotype and treatment (G × T) and the error term ε. The significance threshold was determined by the p.adjust function, using the Benjamini & Hochberg correction (FDR < 0.1 for T and G × T, FDR < 0.05 for G) (Benjamini and Hochberg 2009).
The log2 intensity data for samples of the time series was analysed using the linear model $Y_i = D + G + T + \varepsilon$, with Y being the log2 normalised intensity of spot i (1, 2,…, 45220). Y was explained over development (D, time of isolation: 1.5, 2, 3, 8, 10, 12, 20.5, 22, 24, 28, 30.5, or 32 hours post-infection), genotype (G; either N2 or CB4856), treatment (T; either infected or mock) and the error term ε. The significance threshold was determined by the p.adjust function, using the Benjamini & Hochberg correction (FDR < 0.05) (Benjamini and Hochberg 2009).
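The multiple-testing step applied to the per-spot p-values of these models can be sketched as below. This is a plain-Python illustration of the Benjamini & Hochberg adjustment, written to follow the same step-up logic as the BH method of p.adjust; the function name `benjamini_hochberg` is hypothetical and the model fitting itself is not shown.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    n = len(p_values)
    # Visit p-values from largest to smallest so that the running minimum
    # keeps the adjusted values monotone.
    order = sorted(range(n), key=lambda i: p_values[i], reverse=True)
    adjusted = [0.0] * n
    running_min = 1.0
    for k, idx in enumerate(order):
        rank = n - k  # rank of this p-value when sorted ascending
        running_min = min(running_min, p_values[idx] * n / rank)
        adjusted[idx] = running_min
    return adjusted
```

A spot would then be called significant for a given term when its adjusted value falls below the stated threshold, for example 0.1 for treatment and 0.05 for genotype.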
Functional enrichment analysis
Gene group enrichment analyses were performed using a hypergeometric test and several databases with annotations. The databases used were: the WS258 gene class annotations, the WS258 GO-annotation, anatomy terms, phenotypes, RNAi phenotypes, developmental stage expression, and disease related genes (www.wormbase.org) (Lee et al. 2018; Stein et al. 2002); the MODENCODE release 32 transcription factor binding sites (www.modencode.org) (Gerstein et al. 2000), which were mapped to transcription start sites (as described by (Tepper et al. 2013)). Furthermore, a comparison with previously identified genes involved in OrV infection was made using a custom made database (Table S1).
Enrichments were selected based on the following criteria: size of the category n>3, size of the overlap n>2. The overlap was tested using a hypergeometric test, of which the p-values were corrected for multiple testing using Bonferroni correction (as provided by p.adjust in R, 3.3.1, x64). Enrichments were calculated based on unique gene names, not on spots.
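The enrichment test described above can be sketched as follows. This is a minimal illustration under stated assumptions: the function names are hypothetical, the test is taken to be a one-sided upper-tail hypergeometric test on unique gene names, and the annotation databases are not shown.

```python
from math import comb


def hypergeom_upper_tail(overlap, category_size, selected, universe):
    """P(X >= overlap) for a category of `category_size` genes when
    `selected` genes are drawn from a `universe` of genes."""
    total = comb(universe, selected)
    upper = min(category_size, selected)
    favourable = sum(
        comb(category_size, k) * comb(universe - category_size, selected - k)
        for k in range(overlap, upper + 1)
    )
    return favourable / total


def passes_size_filter(category_size, overlap):
    """Selection criteria from the text: category n > 3 and overlap n > 2."""
    return category_size > 3 and overlap > 2


def bonferroni(p_values):
    """Bonferroni adjustment: multiply by the number of tests, cap at 1."""
    n = len(p_values)
    return [min(1.0, p * n) for p in p_values]
```

For instance, finding all three selected genes inside a four-gene category drawn from ten genes gives a probability of 4/120 before correction.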
Probe alignment
Probe sequences of the pals-genes (C. elegans (V2) Gene Expression Microarray 4X44K slides, Agilent) were aligned to the genome sequence of CB4856 (PRJNA275000) using the BLAST function of Wormbase (WS267).
Genetic variation analysis
Data on C. elegans wild isolates were obtained from the CeNDR website (release 20180527) (Cook et al. 2017). The data was further processed using custom-made scripts (https://git.wur.nl/mark_sterken/Orsay_transcriptomics). In short, the number of polymorphisms in the pals-family within a strain was compared to the total number of natural polymorphisms found in that strain. The N2 strain was used as the reference strain. A chi-square test (FDR < 0.0001) was used to determine whether strains showed less or more variation than expected within the pals-gene family.
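One way to set up such a per-strain comparison is a 2 × 2 chi-square test, sketched below. The layout of the table is an assumption made for illustration and is not taken from the authors' scripts: rows are pals-family sites versus all other variable sites, columns are sites where the strain differs from N2 versus sites where it does not. The function names are hypothetical, no continuity correction is applied, and the counts in any example are invented.

```python
from math import erfc, sqrt


def chi_square_2x2(a, b, c, d):
    """Pearson chi-square test (1 degree of freedom, no continuity
    correction) on the table [[a, b], [c, d]].

    Assumed layout: a = pals sites variant in the strain, b = pals sites
    not variant, c = other sites variant, d = other sites not variant.
    Returns (statistic, p_value).
    """
    n = a + b + c + d
    denominator = (a + b) * (c + d) * (a + c) * (b + d)
    if denominator == 0:
        raise ValueError("table has an empty row or column")
    statistic = n * (a * d - b * c) ** 2 / denominator
    # Survival function of a chi-square variable with 1 degree of freedom.
    p_value = erfc(sqrt(statistic / 2.0))
    return statistic, p_value


def variant_ratio(a, b, c, d):
    """Fraction of pals sites that are variant, relative to the fraction
    of other sites that are variant (> 1 means more variation in pals)."""
    return (a / (a + b)) / (c / (c + d))
```

The resulting p-values, one per strain, would still need a multiple-testing correction before applying the FDR threshold.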
The number of polymorphisms within the pals-gene family was further specified per gene. Tajima’s D values were calculated per gene within the C. elegans genome using the PopGenome package (Pfeifer et al. 2014). The number of polymorphisms within the pals-gene family was compared to the geographical origin of the strain obtained from the CeNDR database (Cook et al. 2017). The data were visualised using the packages ‘maps’ (3.3.0) and ‘rworldmap’ (1.3-6) using custom-written scripts (https://git.wur.nl/mark_sterken/Orsay_transcriptomics).
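For readers unfamiliar with the statistic, Tajima's D for one gene can be computed from aligned haplotype sequences as sketched below. This is a plain-Python illustration of the standard formula (Tajima 1989), not the PopGenome implementation; the function name `tajimas_d` is hypothetical and the input is assumed to be equally long, gap-free sequences.

```python
from itertools import combinations
from math import sqrt


def tajimas_d(sequences):
    """Tajima's D for aligned, equally long sequences.

    Returns None when there are no segregating sites. At least four
    sequences are required, because the variance term is zero for fewer.
    """
    n = len(sequences)
    if n < 4:
        raise ValueError("need at least four sequences")
    length = len(sequences[0])
    if any(len(s) != length for s in sequences):
        raise ValueError("sequences must be aligned to the same length")

    # S: number of segregating (polymorphic) sites.
    seg_sites = sum(1 for site in zip(*sequences) if len(set(site)) > 1)
    if seg_sites == 0:
        return None

    # pi: mean number of pairwise differences.
    pairs = list(combinations(sequences, 2))
    pi = sum(sum(x != y for x, y in zip(s1, s2)) for s1, s2 in pairs) / len(pairs)

    a1 = sum(1.0 / i for i in range(1, n))
    a2 = sum(1.0 / i ** 2 for i in range(1, n))
    b1 = (n + 1) / (3.0 * (n - 1))
    b2 = 2.0 * (n * n + n + 3) / (9.0 * n * (n - 1))
    c1 = b1 - 1.0 / a1
    c2 = b2 - (n + 2) / (a1 * n) + a2 / a1 ** 2
    e1 = c1 / a1
    e2 = c2 / (a1 ** 2 + a2)

    variance = e1 * seg_sites + e2 * seg_sites * (seg_sites - 1)
    return (pi - seg_sites / a1) / sqrt(variance)
```

Two haplotypes at intermediate frequency, such as `["AA", "AA", "TT", "TT"]`, give a positive value, whereas a sample in which every variant is a singleton gives a negative value.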
Results
The C. elegans strains N2, JU1580, and CB4856 differ in viral susceptibility
Three strains with a different genetic background were exposed to Orsay virus (OrV) to determine their viral susceptibility. Different levels of OrV (20, 50, or 100µL OrV/500µL infection solution) were used to investigate which amount was needed to reach the maximum viral load at 30 hours after infection. The strains were infected 26 hours post bleaching and were collected as young adults before the start of egg-laying (Fig 1A). In line with previous research the strain JU1580 was more susceptible than the strain N2 (Sterken et al. 2014; Félix et al. 2011). CB4856 showed a lower viral load than N2 and JU1580, therefore being the least susceptible of the three strains (Fig 1B). The maximum viral load could be reached using 50µL OrV/500µL infection solution for all three strains which was therefore used in subsequent experiments (Fig 1B). The genotypes N2 and CB4856 were used for further transcriptional analysis.
The transcriptional response upon Orsay virus infection depends on the genetic background
We infected 26 hour old (L2) nematodes of the strains N2 and CB4856 and allowed the infection to develop for 30 hours. Young adult nematodes (before start of egg-laying) were isolated for transcriptomic analysis. In parallel, the same experiment was conducted with a mock-infection at 26 hours (Figure 1A). First, the transcriptional response of the strains N2 and CB4856 after OrV infection was analysed by means of a principal component analysis (PCA). Genotype explained the main difference in gene expression patterns (36.1%), which is in line with previous results, for example (Snoek et al. 2017; Capra et al. 2008; Li et al. 2006). The PCA showed that the effect of OrV infection is relatively small, explaining less than 5% of the total variance (Fig S1). Therefore, only a mild transcriptional response to infection is to be expected in further analyses.
Gene expression analysis by a linear model showed that 18 genes were differentially expressed upon infection by OrV (FDR < 0.1) (Fig 2A) and 15 genes were differentially expressed by a combination of both treatment and genotype (FDR < 0.1) (Fig 2B). These two groups of genes were largely overlapping (Fig 2C), because most genes only respond to infection in the genotype N2 (Fig 2D). Moreover, among the 7541 genes that were differentially expressed between N2 and CB4856 under both mock and infected conditions were 180 genes known to be involved in OrV infection (hypergeometric test, FDR < 0.05). As CB4856 populations showed lower viral loads, the lack of a transcriptional response may result from fewer infected nematodes or infected cells, but it may also result from a difference in (the timing of) the transcriptional response. The genes that were found to respond to OrV infection in the N2 strain were enriched for the pals-family gene class and were associated with dopaminergic neurons, amphid cells and sheath cells (FDR < 0.05), which was in agreement with previous studies (Chen et al. 2017; Sarkies et al. 2013).
Time-dependent transcription of Orsay virus response genes
The transcriptional response of three genotypes, N2, CB4856, and JU1580, was measured over a time-course to investigate the dynamics of the transcriptional response. Nematodes were infected in the L3 stage (40 hours after bleaching) and collected 1.5-32h after infection (Fig 3A). Thereby, a high resolution timeseries of transcriptional data for three genotypes and two treatments was obtained. We hypothesised that the low viral susceptibility of CB4856 could result from early activation of IPR genes, thereby counteracting the viral infection more promptly than N2.
A PCA indicated that time of isolation explains most of the variance (38.4%) in the gene expression for all three strains, both with and without infection. This fits previous results, where the transcriptional response over the course of the L4 stage was shown to be highly dynamic (Snoek et al. 2015). The similarity between mock and infected samples suggested that the infection with OrV does not affect the transcriptional patterns of development. Genotype (either N2, CB4856, or JU1580) explained another large part of the variance (21.8%) (Fig S3). The effect of virus on global gene expression was relatively small and explained less than 4.9% of the variance.
Subsequently, the time series data was analysed using a linear model to identify variation in gene expression over time as a result of development, genotype, or infection. Over 10,000 genes were differentially expressed over time (FDR < 0.05), confirming that development was a major determinant of gene expression (Snoek et al. 2015). None of the genes were affected by infection when analysed with the linear model. Yet, a linear model will only detect linear expression dynamics. Therefore, we investigated the temporal expression of several infection-responsive genes found in the N2 and CB4856 experiment in more detail (Figure 3B, Figure S3). The average increase in gene expression over time was estimated by calculating the correlation coefficient (r) for all genes responding to OrV infection (excluding best-24 and F58F6.3, which showed downregulation upon infection before) (Figure S3B). Gene expression for infection-responsive genes in N2 increased roughly 40% (rmock = 0.00 and rinfected = 0.40). In JU1580 there was a smaller increase of about 20% (rmock = −0.04 and rinfected = 0.14). In contrast, in CB4856 the genes that respond to OrV infection in N2 became lower expressed over time under both mock and infected conditions (rmock = −0.25 and rinfected = −0.21). Subsequently, the dynamics per gene were investigated and showed distinct expression patterns depending on the genetic background (Figure 3B, S3A). Gene expression of F26F2.1 and F26F2.4 rose gradually in the genotypes N2 and JU1580 after infection (F26F2.1: Δr = 0.64 and Δr = 0.76, respectively; F26F2.4: Δr = 0.88 and Δr = 0.74, respectively) but not in the genotype CB4856 (F26F2.1: Δr = −0.06; F26F2.4: Δr = −0.19), indicating that these IPR genes were not activated in CB4856. Also within the pals-gene family, expression patterns differed between CB4856 on the one hand and N2 and JU1580 on the other.
For example, over the timeframe of 30 hours expression of the gene pals-11 increased most in N2 (Δr = 1.23), whereas in CB4856 expression peaked at ∼20 hours after infection, reaching the highest maximum throughout the investigated period. In general, expression of the genes responding to OrV infection was more dynamic in CB4856 than in N2 and JU1580, showing more fluctuation within the measured timeframe (Figure S3A). Together, this indicates that genetic variation can determine the developmental or infection response patterns for OrV-response genes. Of note, the IPR genes in CB4856 were similarly expressed under mock and infected conditions, even though the temporal dynamics may vary when compared to N2. Therefore, we conclude that the viral resistance of CB4856 is unlikely to result from an accelerated IPR, but instead results from genetic variation acting on basal gene expression.
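The correlation-based summary used in this section can be sketched as below. This is a minimal illustration with hypothetical function names; it assumes that r is the Pearson correlation between expression and time of collection, and that Δr is the difference between the infected and mock correlations, which is an interpretation of the values reported above rather than a definition given in the text.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation coefficient between two equally long series."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two series of equal length (at least 2 points)")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant series")
    return cov / sqrt(var_x * var_y)


def delta_r(times, mock, infected):
    """Assumed definition: r(time, infected) minus r(time, mock)."""
    return pearson_r(times, infected) - pearson_r(times, mock)
```

Under this reading, a gene whose expression rises steadily after infection but stays flat or declines in mock-treated animals yields a large positive Δr.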
Transcription of genes in the pals-family is not linked to OrV infection in CB4856
Members of the pals-gene family were previously shown to be involved in the IPR transcriptional response defending against OrV infection and other intracellular pathogens in the strain N2 (Sarkies et al. 2013; Chen et al. 2017; Reddy et al. 2017, 2019). However, in the strain JU1580 several of the pals-genes were not differentially expressed after infection (Sarkies et al. 2013) just as they were not differentially expressed in the genotype CB4856 (Fig 2C). Interestingly, the RNAi deficient strain JU1580 is highly susceptible upon viral infection (Sterken et al. 2014; Félix et al. 2011; Ashe et al. 2013), whereas CB4856 is more resistant than both JU1580 and N2 (Fig 1B). To gain a better understanding of this unexpected result, the transcriptional response in the pals-gene family was explored in more detail from the genotypes N2 and CB4856 30h after infection.
Many of the pals-gene family members were more highly expressed after infection in N2, but not in the strain CB4856 (Fig 4A). In agreement with literature, the pals-genes in the cluster on chromosome III (pals-17 until pals-25) were not differentially expressed after OrV infection (Leyva-Díaz et al. 2017; Reddy et al. 2017). Two of these genes, the paralogs pals-22 and pals-25, regulate the IPR transcriptional response (Reddy et al. 2017, 2019) and for both genes the expression is higher in N2 than in the CB4856 genotype under mock as well as infected conditions (Fig 4A), which may affect gene expression of other IPR members. Multiple pals-genes with unknown function were highly expressed under mock conditions in CB4856, but not in N2. The most extreme example was pals-14. Upon infection in N2 the expression of pals-14 increased roughly 4-fold, after which the pals-14 expression in N2 equalled that in CB4856 under both uninfected and infected conditions (Fig 4B).
Genetic variation could underlie differences in gene expression. Therefore, the number of polymorphisms between N2 and CB4856 was calculated per pals-gene and compared to the total genetic variation present for that gene in natural populations (Cook et al. 2017). The strain CB4856 harbours much of the total genetic variation known to occur in nature for the genomic pals-clusters on chromosome III and V (Cook et al. 2017) (Fig 4C). Given that our experiment used microarrays, differential expression could be due to hybridisation errors (Snoek et al. 2017; Alberts et al. 2007). Therefore, the probe sequences were compared to the genome of CB4856, showing that most probes align at the correct location (Table S2). The alignment shows that the probes for at least 20 out of 31 investigated pals-genes, including pals-14, pals-22, and pals-25, were highly likely to hybridise to their CB4856 target (>95% overlap on the expected chromosomal location) (Thompson et al. 2015). The remaining 11 pals-genes that had low alignment scores were not taken along in further analyses (Table S2).
Most of the genetically diverse pals-genes on chromosome III and V have previously been shown to display local regulation of gene expression (cis-QTL) (Table S3). Moreover, at least 10 genes across different pals-clusters were regulated by genes elsewhere in the genome (trans-eQTL) (Table S3). Most of the eQTL were consistently found across multiple studies, environmental conditions and labs (Li et al. 2006; Rockman et al. 2010; Sterken et al. 2014; Li et al. 2010; Viñuela et al. 2010, 2012; Snoek et al. 2017). More specifically, pals-22 and pals-25 that regulate expression of other IPR genes were genetically distinct and differentially expressed between N2 and CB4856 in these studies (Li et al. 2006; Rockman et al. 2010; Sterken et al. 2014; Li et al. 2010; Viñuela et al. 2010, 2012; Snoek et al. 2017). Therefore we conclude that regulatory genetic variation within pals-gene family members plays a role in the difference in the IPR between the strains N2 and CB4856.
The pals-family regulating the IPR transcriptional response upon intracellular pathogen infection experiences selective pressure
Regulatory genetic variation within the pals-family was linked to differences in the IPR transcriptional response between N2 and CB4856. Therefore, the total genetic variation within the pals-family was investigated for the 330 wild isolates in the CeNDR database (Cook et al. 2017). We examined whether the pals-family contained more or less genetic variation than an average gene family in C. elegans using a chi-square test. For 48 wild isolates the genetic variation in the pals-family is higher than expected (chi-square test, FDR < 0.0001 and ratio pals-gene variants/total variants > 1), but 204 of the 330 analysed strains contain less variation than expected (chi-square test, FDR < 0.0001 and ratio pals-gene variants/total variants < 1) (Figure 5A).
To investigate the genetic diversity of the pals-genes within C. elegans, we tested whether DNA sequence divergence appeared random or whether selective forces were acting on the pals-family by computing Tajima’s D values per gene (Figure 5B). Overall, Tajima’s D values in C. elegans populations were low as a result of overall low genetic diversity (TDmean = −1.08, TDmedian = −1.12) (Andersen et al. 2012). Yet, four pals-genes (pals-17, pals-18, pals-19, and pals-30) show positive Tajima’s D values, which indicate either that balancing selection acts on these genes or that rare alleles are scarce. The most extreme example is pals-30, which has a Tajima’s D value of 4.8, the highest value of all C. elegans genes. In total, 11 out of 39 pals-genes have values that fall within the 10% highest Tajima’s D values for C. elegans (TD > −0.42), including the IPR regulators pals-22 and pals-25.
Subsequently, the genetic diversity within each pals-gene was analysed to explore if balancing selection might be acting on genes within this family (Figure S4, Table S4). Several pals-genes contain hardly any genetic variation and are therefore conserved on a worldwide scale. This conserved group contains the pals-5 gene which acts downstream in the IPR (Reddy et al. 2017). Other pals-genes are highly variable and can contain hundreds of polymorphisms in a single gene. Interestingly, for most genes in this group few haplotypes exist worldwide. For example, for the gene pals-25 strains harbour either an N2-like allele, an allele containing ∼30 polymorphisms (CB4856 belongs to this group) or an allele containing ∼95 polymorphisms. In total, 19 out of 24 highly variable pals-genes show a clear grouping within 2 or 3 haplotypes supporting that balancing selection could be acting on these genes. In conclusion, individual pals-genes are either conserved or few variants occur for genetically distinct pals-genes. In particular the pals-genes with a division into a few haplotypes show atypically high Tajima’s D values when compared to other C. elegans genes. Thus, the pals-gene family is undergoing evolutionary selection distinct from other genes in the genome possibly resulting from balancing selection.
Some geographical locations may encounter higher selective pressures than others. Therefore, a geographical map was constructed to compare the C. elegans isolation location and the genetic variation found in the pals-genes. However, after mapping the amount of natural variation to the geographical location no clear pattern could be found (Figure S5). Interestingly, some strains that were isolated from the same location, show highly different rates of genetic diversity within the pals-family even though these strains may encounter similar (amounts of) pathogens. For example strain WN2002 was isolated in Wageningen and contains 3 times more polymorphisms in the pals-family than in other genes. Strain WN2066 was isolated from the same compost heap as WN2002, yet contains only 0.27% variation in the pals-family compared to an overall genetic variation of 2.67%. Therefore, it is unclear what drives global conservation versus local divergence of the IPR transcriptional response.
Discussion
In this study we used genetically distinct strains of the nematode C. elegans to measure the transcriptional response upon Orsay virus infection in a genotype- and time-dependent manner. We found that genetic variation in C. elegans determines the Intracellular Pathogen Response (IPR): a transcriptional response that counteracts pathogens by increased proteostasis (Reddy et al. 2017, 2019). In the reference strain N2 this response leads to activation of several members in the pals-gene family that in turn activate ubiquitination-related genes such as skr-3, skr-4, skr-5, and cul-6 (Reddy et al. 2019). In contrast, genes in the pals-family are not upregulated in the strains JU1580 (Sarkies et al. 2013) and CB4856. The pals-cluster on chromosome III contains the gene pair pals-22 and pals-25 that together control the IPR (Reddy et al. 2019). Both pals-22 and pals-25 are expressed at lower levels in CB4856 than in N2, which likely underlies the difference in IPR between both strains. Multiple cis-QTL indicate that the expression of the pals-genes on chromosome III and V is controlled by local regulatory genetic variation (Li et al. 2006; Rockman et al. 2010; Sterken et al. 2014; Li et al. 2010; Viñuela et al. 2010, 2012; Snoek et al. 2017). Taken together, the genetic variation found in the pals-genes of CB4856 regulates the expression of IPR genes. An investigation of the worldwide variation within 330 C. elegans isolates reveals that the pals-genes show uncharacteristic patterns of genetic variation for this species. Most of the pals-genes are either conserved or exist as only a few haplotypes, causing their genetic conservation to be high on a worldwide scale. Population genetic analyses reveal that these genes are experiencing selective pressure, which could be a result of balancing selection. Therefore, we argue that genetic variation in the pals-gene family regulates an evolutionarily important transcriptional response to environmental stress, such as infection.
The exact cellular function of most IPR members is unclear
The function of most pals-genes is still unclear, even though many of them become differentially expressed after intracellular infections. The biochemical function of the one common factor in this family, the so-called ‘protein containing ALS2cr12 (ALS2CR12) signature’, is still unknown. Besides functioning as an intracellular defence pathway, the pals-genes are involved in handling proteotoxic stress. More specifically, the gene pair pals-22 and pals-25 forms a molecular switch between anti-stress and developmental pathways (Reddy et al. 2019). The expression of 22 pals-genes is enriched in dauers and/or L4 males, which suggests that these pals-genes may have additional functions besides their role in the IPR (Gerstein et al. 2014; Leyva-Díaz et al. 2017). Indeed, one of the pals-genes, pals-22, is known to have dual biological functions: it controls the IPR and silences repetitive RNA (Leyva-Díaz et al. 2017), although whether both functions are executed via a similar mechanism remains unknown. Moreover, there may be functional redundancy within the pals-genes themselves, even though many pals-genes have highly dissimilar DNA and protein sequences (Leyva-Díaz et al. 2017). The gene pair pals-22 and pals-25 has been shown to regulate gene expression of other members of the IPR; however, in pals-22;pals-25 double mutants the other IPR genes can still respond to infection (Reddy et al. 2019).
IPR genes of the pals-family are under selection
Multiple pals-genes show high Tajima’s D values, in particular the genes in the first and second cluster on chromosome III (at 0.1 and 1.4 Mb), all but one of which fall within the top 10% of genes with the highest Tajima’s D values. Positive Tajima’s D values indicate either balancing selection or a lack of rare alleles, whereas negative Tajima’s D values indicate an excess of rare alleles, as expected after a recent selective sweep or population bottleneck (Tajima 1989). For C. elegans most of the genes show negative Tajima’s D values due to a recent selective sweep affecting chromosomes I, IV, V, and X that greatly reduced the genetic variation within the species (Andersen et al. 2012). After a selective sweep, novel genetic variation arises depending on the mutation rate and spreads depending on the outcrossing and migration rate of the species. The mutation rate in C. elegans is comparable to that of other multicellular species such as Drosophila (Denver et al. 2009; Haag-Liautard et al. 2007; Denver et al. 2004); however, the outcrossing rate in C. elegans is exceptionally low. Estimated outcrossing rates in the wild range from 0.1% to 1% per generation (Andersen et al. 2012; Barrière and Félix 2005, 2007). In agreement, males are hardly found in wild populations and outbred populations show lower fitness (Barrière and Félix 2005; Richaud et al. 2018; Yon Rhee et al. 2008; Dolgin et al. 2007), although higher male frequencies can be beneficial under stress conditions (Teotonio et al. 2012; Morran et al. 2011, 2009b, 2009a).
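To make the interpretation of the sign of Tajima’s D concrete, the statistic can be sketched as follows, using the standard definition from Tajima (1989). The function name, its inputs (minor-allele counts per biallelic site) and the example data are hypothetical; this is an illustrative sketch and not the code used for the analyses in this study.

```python
from math import sqrt


def tajimas_d(n, allele_counts):
    """Tajima's D for n sampled sequences.

    allele_counts holds, per biallelic site, the number of sequences
    carrying one of the two alleles. Sites with a count of 0 or n are
    not segregating and are ignored."""
    seg = [k for k in allele_counts if 0 < k < n]
    s = len(seg)
    if n < 4 or s == 0:
        raise ValueError("need at least 4 sequences and 1 segregating site")
    # Mean number of pairwise differences (pi), summed over sites.
    pi = sum(2 * k * (n - k) / (n * (n - 1)) for k in seg)
    a1 = sum(1 / i for i in range(1, n))
    a2 = sum(1 / i ** 2 for i in range(1, n))
    b1 = (n + 1) / (3 * (n - 1))
    b2 = 2 * (n * n + n + 3) / (9 * n * (n - 1))
    c1 = b1 - 1 / a1
    c2 = b2 - (n + 2) / (a1 * n) + a2 / a1 ** 2
    e1 = c1 / a1
    e2 = c2 / (a1 ** 2 + a2)
    # Difference between the two diversity estimators, scaled by its
    # expected standard deviation.
    return (pi - s / a1) / sqrt(e1 * s + e2 * s * (s - 1))


# Hypothetical samples of 10 sequences with 5 segregating sites each.
only_rare = tajimas_d(10, [1, 1, 1, 1, 1])      # every variant is a singleton
intermediate = tajimas_d(10, [5, 5, 5, 5, 5])   # every variant at 50%
print(only_rare < 0, intermediate > 0)
```

In this toy example, a sample in which all variants are singletons yields a negative value, whereas a sample in which all variants segregate at intermediate frequency yields a positive value, matching the interpretation given above.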
As C. elegans mostly reproduces by selfing, genetic drift is proposed to be the main source of natural variation (Cutter 2010). In agreement, there has been little evidence for selection in C. elegans, but balancing selection can occur when environmental conditions determine the fitness of the strain. In that case, the benefit of the allele depends on the present environment and several alleles may be actively maintained in the population (Greene et al. 2016; Teotonio et al. 2017). The pals-genes with relatively high Tajima’s D values on chromosome III are located in a region estimated to have diverged 10^6 generations ago. Despite this early divergence, few haplotypes occur for this region, which could result from long-term balancing selection (Thompson et al. 2015). The transcriptional regulators of the IPR, pals-22 and pals-25, fall within this region and are expected to have robust transcription within the IPR pathway for most C. elegans isolates. This is in line with our results, which show that only a minority of strains, including CB4856, carry distinct regulatory genetic variants. The transcriptome in C. elegans was shown to experience strong stabilizing selection, thus favouring robust transcription of genes on a population level despite genetic variation (Denver et al. 2005). Yet, some strains potentially harbour regulatory genetic variation which may be beneficial in certain environments. Possibly, when environmental stress is present, constant growth may be preferred over the pals-22/pals-25 molecular switch controlling growth and anti-stress programs. Therefore, the pals-gene evolution could result from an environmental factor such as pathogen presence.
Finding out which environmental factor could explain the population genetic patterns within the pals-genes of the IPR will be challenging. The IPR pathway has been shown to respond to multiple environmental stressors including intestinal and epidermal pathogens, but also heat stress (Reddy et al. 2019, 2017). Despite the increasing amount of ecological data for both C. elegans (Cook et al. 2017) and its pathogens (Richaud et al. 2018; Zhang et al. 2016), it is not yet sufficient to draw any firm conclusions on whether co-occurrence of host and pathogen drives evolution within the pals-family. However, some evidence exists that host-pathogen interactions can affect the genotypic diversity at a population level. In Orsay (France), the location where Orsay virus was found, diversity in pathogen susceptibility potentially explains the maintenance of several minority genotypes. These genotypes are outcompeted in the absence of the intracellular pathogen Nematocida parisii, but perform better in the presence of the pathogen (Richaud et al. 2018). Additional transcriptional and genotypic investigation of the strains isolated in Orsay could reveal if IPR activity explains these observations. Moreover, C. elegans populations may experience different heat stress on the microscale they live in than expected based on the overall weather conditions at the sampling location (Petersen et al. 2014). Experimental evolution studies hold the potential to bridge the gap between the lab and the field by investigating if the presence of intracellular pathogens or application of heat stress invokes any genetic and transcriptional changes within the pals-family (Gray and Cutter 2014; Teotonio et al. 2017).
The IPR response is genotype dependent
The viral susceptibility of C. elegans does not link directly to the IPR transcriptional response, as CB4856 and JU1580 both lack upregulation of most pals-genes but differ strongly in viral susceptibility. Relative to the reference strain N2, JU1580 harbours less genetic variation within the pals-family than CB4856 (6% and 11%, respectively). Moreover, the expression pattern over time for pals-22 and pals-25 was similar between N2 and JU1580 (data not shown). Thus, the absence of a (strong) IPR in JU1580 could have another source than in CB4856. We hypothesise that crosstalk with the RNAi response is necessary to activate the IPR, since upregulation of pals-genes is also absent in rde-1, sta-1, and sid-3 mutants in the N2 background (Tanguy et al. 2017; Chen et al. 2017). The genes sta-1 and sid-3 are involved in activation of the RNAi pathway, whereas rde-1 is involved further downstream in the RNAi pathway (Tanguy et al. 2017; Chen et al. 2017). A connection between the RNAi pathway and pals-22 has been suggested before by results from Leyva-Díaz et al. showing that pals-22 silences transgenes in wild-type animals. Yet when the RNAi gene rde-4 is mutated, pals-22 can no longer execute its silencing function (Leyva-Díaz et al. 2017). Antiviral RNAi in C. elegans functions via both rde-4 dependent and rde-4 independent mechanisms. In the rde-4 dependent mechanism, rde-4 was proposed to process viral dsRNAs together with drh-1 (Guo et al. 2013). A dependency of the IPR on the RNAi pathway may explain why IPR activation would be lacking in the drh-1 deficient strain JU1580. Furthermore, most OrV transcriptional studies so far have focused on genes with a function in the IPR or the RNAi pathway. Yet transcriptional studies have yielded several genes with functions that do not (yet) link to either of the two pathways (Tanguy et al. 2017; Sarkies et al. 2013; Chen et al. 2017). The function of these genes in OrV infection could shed further light on the complexity of host-virus interactions.
CB4856 resistance to OrV does not result in a clear transcriptional response
The CB4856 strain does not show a transcriptional response to OrV once the maximum viral load is reached. There are several non-exclusive explanations as to why CB4856 does not show transcriptional changes. First, the viral loads may not be high enough to induce transcriptional changes in CB4856. Yet some of the CB4856 samples analysed contain viral loads equalling the viral loads of N2 samples that show transcriptional changes. Second, some pals-genes in CB4856 vary in temporal gene expression. Higher expression of IPR genes upon infection may speed up the process of counteracting viral infection to minimize its impact, thereby allowing for quick transcriptional recovery. Last, the IPR transcriptional antiviral response in CB4856 may be less efficient than that of N2, but this may be compensated by other genes. N2 and CB4856 are among the most genetically diverse C. elegans strains, resulting in a large set of different protein variants (Thompson et al. 2015). Possibly some CB4856 variants, for example one encoding an antiviral protein, could have a higher binding affinity to their target than the N2 variant. In that case, upregulation of the CB4856 variant may not be necessary to effectively counteract the viral infection.
Future perspectives
This study built a large dataset for the transcriptional response upon OrV infection in the genotypes N2 and CB4856 and temporal transcriptional data for three genotypes. For the first time the OrV transcriptional response was measured in the genotype CB4856 and in a temporally detailed manner for this genotype and the genotypes N2 and JU1580. However, the study design also contains some limitations. First, the use of microarrays that are designed for the N2 genotype is estimated to affect about 1600 spots (3.5%) due to hybridisation differences (Snoek et al. 2017). Subsequent studies could make use of RNA-seq techniques, although these data also need to be handled carefully to ensure that genetic variants are mapped to the correct reference genes in the N2 or CB4856 genome (The C. elegans Sequencing Consortium 1998; Thompson et al. 2015; Piskol et al. 2013; Deelen et al. 2015). Second, although we did not detect any transcriptional changes in the strain CB4856, there may be local changes in the intestinal cells and in certain individuals. Orsay virus does not infect all individuals within the population and its cellular tropism is limited to the intestinal cells (Félix et al. 2011; Franz et al. 2013). Therefore, local changes will be largely diluted in a population readout. Transcriptional techniques that identify gene expression within infected cells, such as single-cell RNA-seq or TOMO-seq (Ebbing et al. 2018; Trapnell et al. 2017), will provide more details about the transcriptional response in the affected cells. Third, the samples for the time series were taken in a temporally highly detailed manner by investigating 12 timepoints within 30 hours. Yet, the effect of development is large, confounding the identification of the relatively small effect of viral infection over time. Further research investigating the effect of development on gene expression and infection over time may reveal additional information on the temporal expression of genes responding to OrV infection.
This study provides insights into the natural context of the evolutionarily conserved genetic response and the plastic transcriptional response after infection. Our results show that genetic variation within C. elegans natural isolates correlates with diversity in the transcriptional response upon OrV infection, thus indicating how genetic variation can shape the transcriptional response after infection. We show that little genetic variation occurs worldwide within clusters of pals-genes that regulate the IPR transcriptional response. Therefore, we suggest that genes that function in the IPR transcriptional response experience an evolutionary pressure such as the presence of intracellular pathogens. However, the importance of the transcriptional response appears to vary as well, as the activity of the IPR does not directly link to the viral susceptibility. This study provides new insights into the diversity of ways in which hosts can develop both genetic and transcriptional responses to protect themselves from harmful infections.
Declarations
Availability of data and materials
All strains used can be requested from the authors. The transcriptome datasets generated are deposited at ArrayExpress (E-MTAB-7573 and E-MTAB-7574).
Competing interests
The authors declare that they have no competing interests.
Funding
LvS was funded by the NWO (Nederlandse Organisatie voor Wetenschappelijk Onderzoek) (824.15.006). MGS was funded by the Graduate School Production Ecology & Resource Conservation (PE&RC).
Authors’ contributions
BLS, GPP, JEK and MGS conceived and designed the experiments. LvS, KB, FP, TB, JAGR, and MGS conducted the experiments. LvS and MGS conducted transcriptome and main analyses. LvS, GPP, JEK, and MGS wrote the manuscript. All authors read and provided comments on the manuscript.
Acknowledgements
The authors want to thank Erik Andersen for hosting and sharing natural variation data on CeNDR and his advice on population genetic analyses.
Supplementary figures
Figure S1 Principal component analysis for gene expression in (un)infected C. elegans N2 and CB4856 – Principal component analysis for the gene expression data obtained for the nematodes that were infected 26 hours post bleaching and collected 30 hours post infection. PC axes that explain at least 5% of the total variation are shown. The genotype (N2 or CB4856) is indicated by colour and the treatment (mock or OrV infection) is indicated by shape.
Figure S2 Principal component analysis for gene expression in (un)infected C. elegans N2, CB4856 and JU1580 – Principal component analysis for the gene expression data obtained for the nematodes that were infected 40 hours post bleaching and collected at 1.5, 2, 3, 8, 10, 12, 20.5, 22, 24, 28, 30.5, or 32 hours post infection. PC axes that explain at least 4.9% of the total variation are shown. The genotype (N2, CB4856, or JU1580) is indicated by colour, treatment (mock or OrV infection) is indicated by shape and the timepoint is indicated by size.
Figure S3 Analysis of C. elegans expression patterns for OrV-response genes in the time series data – A) Gene expression patterns for N2 mock, N2 infected, CB4856 mock, CB4856 infected, JU1580 mock, and JU1580 infected nematodes of the 30 genes responding to OrV infection that were found in the dataset of infected N2 and CB4856 nematodes (infected at 26 and collected at 56 hours post bleaching). B) Correlation coefficients of the gene expression of N2 mock, N2 infected, CB4856 mock, CB4856 infected, JU1580 mock, and JU1580 infected nematodes over time per OrV response gene.
Figure S4 Genetic variation per pals-gene per C. elegans strain – The total number of polymorphisms within the pals-gene family is plotted against the number of known polymorphisms per pals-gene. Each dot represents a strain of the CeNDR database and the colour of the strain indicates if the pals-family within this strain is depleted or enriched in polymorphisms as determined by the chi-square test (FDR < 0.0001).
Figure S5 Geographical distribution of natural variation within the C. elegans pals-gene family – A) Location of CeNDR isolates worldwide. The amount of natural variation (%) within the pals-pathway is indicated by the colour of the dot. All natural isolates have been grouped in a quantile (the first quantile exhibits least natural variation in the pals-pathway, the fourth exhibits most natural variation). B) Zoomed in representation of Figure 4A of the strains collected in Europe. C) Zoomed in representation of Figure 4A of the strains collected on Hawaii.
Supplementary tables
Table S1 Overview of C. elegans genes involved in OrV infection described in literature
Table S2 Alignment of C. elegans CB4856 sequence to probes that amplify the pals-genes
Table S3 cis- and trans-eQTL found for the pals-genes
Table S4 Conservation, haplotype number and Tajima’s D value per pals-gene