ABSTRACT
Exercise is recommended by health professionals across the globe as part of a healthy lifestyle to prevent and/or treat the consequences of obesity. While the overall health benefits of exercise and an active lifestyle are well understood, very little is known about how genetics impacts an individual’s inclination for and response to exercise. To address this knowledge gap, we investigated the genetic architecture underlying natural variation in activity levels in the model system Drosophila melanogaster. Activity levels were assayed in the Drosophila Genetic Reference Panel 2 fly strains at baseline and in response to a gentle exercise treatment using the Rotating Exercise Quantification System. We found significant, sex-dependent variation in both activity measures and identified over 100 genes that contribute to basal and induced exercise activity levels. This gene set was enriched for genes with functions in the central nervous system and in neuromuscular junctions and included several candidate genes with known activity phenotypes such as flightlessness or uncoordinated movement. Interestingly, there were also several chromatin proteins among the candidate genes, two of which were validated here and shown to impact activity levels. Thus, the study described here reveals the complex genetic architecture controlling basal and exercise-induced activity levels in D. melanogaster and provides a resource for exercise biologists.
INTRODUCTION
Obesity is a disease associated with significantly higher all-cause mortality relative to normal weight (Carmienke et al. 2013; Flegal et al. 2013; Masters et al. 2013). This increase in mortality can be attributed largely to the elevated incidence of cardiovascular disease (Ebbert et al. 2014; Molica et al. 2015; Sisnowski et al. 2015), cancer (Berger 2014; Garg et al. 2014; Nakamura et al. 2014; Park et al. 2014), and diabetes (Nomura et al. 2010; Riobo Servan 2013; Polsky and Ellis 2015) observed in obese individuals. The increasing prevalence of obesity is a serious international public health issue that has warranted action from legislators worldwide (Hartemink et al. 2006). The upswing in rates of the disease throughout the United States caused the national medical expenditure dedicated to treating obesity-related illnesses in adults to increase by 29% between 2001 and 2015 (Biener et al. 2018). Therefore, strategies to counter the broadening obesity epidemic are needed to ensure the medical and financial wellbeing of society at large.
Exercise is among the most common treatments for obesity, which also include surgical procedures, medications, and other lifestyle modifications (Baretic 2013; Wyatt 2013; Kushner 2014; Martin et al. 2015). Given the relatively risk-free nature of exercise as a method of weight loss compared with many other treatment options, it is widely considered to be an essential component of treatment regimes for obesity (Mcqueen 2009; Laskowski 2012; Fonseca-junior et al. 2013). In addition to treating obesity, exercise imparts a number of health benefits including improved muscle function (Andersen et al. 2015; Coen et al. 2015; Kim et al. 2015) and cartilage integrity (Tonevitsky et al. 2013; Breit et al. 2015; Blazek et al. 2016), increased insulin sensitivity (Mitrou et al. 2013; Brocklebank et al. 2015), and prevention of many chronic conditions (Booth et al. 2012). These benefits led exercise to be recognized as an important facet of a healthy lifestyle, with government agencies such as the U.S. Department of Health and Human Services issuing specific exercise recommendations for adults, youths, and children (e.g., adults should do at least 150 minutes of moderate-intensity or 75 minutes of vigorous-intensity aerobic activity per week) (DHHS 2008). Thus, exercise is an important component of many people’s lives, whether to treat obesity or to improve overall health.
Despite the growing relevance of exercise, there is a lack of clarity concerning a number of factors that influence its physiological effects (Karoly et al. 2012), most notably genetic background. Although exercise has gained considerable popularity as both a lifestyle choice and a treatment for obesity, there are large differences in how individuals respond to exercise, and it is not universally effective (Keith et al. 2006; McAllister et al. 2009). In fact, exercise provides no metabolic improvements to certain individuals (Bouchard et al. 1999; Bouchard et al. 2011; Stephens and Sparks 2015), revealing an extreme disparity in exercise response that can likely be accounted for, at least in part, by genetic variation. Moreover, existing data suggest a relationship between exercise-induced improvements to muscle metabolism and exercise performance in humans (Larew et al. 2003). Thus, while some genes have been identified as contributors to physical activity traits of an individual (Stubbe et al. 2006; de Geus et al. 2014), the genetic architecture controlling exercise responses has yet to be characterized.
As an emerging model organism for exercise studies, Drosophila melanogaster possesses several characteristics advantageous for elucidating the relevant genetic architecture (Piazza et al. 2009; Tinkerhess et al. 2012; Sujkowski et al. 2015; Mendez et al. 2016; Watanabe and Riddle 2017). Traditional obstacles for exercise studies, including difficulties in controlling for essential variables such as age, sex, fitness, and diet, can be addressed using D. melanogaster. Furthermore, Drosophila is an established model system with a well-characterized genome and ample tools for genetic studies. These tools include the Drosophila Genetic Reference Panel 2 (DGRP2), a fully sequenced population of 200 genetically diverse isogenic lines for quantitative genetic studies (Mackay et al. 2012; Huang et al. 2014), in addition to large collections of mutants and RNAi knockdown lines. In addition, a high degree of genetic (Schneider 2000) and functional (Hewitt and Whitworth 2017) conservation exists between Drosophila and humans, particularly in areas such as energy-related pathways (Edison et al. 2016) and disease genes (Pandey and Nichols 2011; Ugur et al. 2016), which allows findings from Drosophila to be tested in mammalian model systems. Together, these features make Drosophila an excellent choice for studies of exercise genetics.
Several innovative studies demonstrate that exercise treatments of Drosophila produce significant physiological and behavioral responses, including increased lifespan and improved climbing ability (Piazza et al. 2009; Tinkerhess et al. 2012; Sujkowski et al. 2015; Mendez et al. 2016; Watanabe and Riddle 2017). For example, the Power Tower prompts exercise by exploiting the negative geotaxis of Drosophila: it repeatedly drops the fly enclosures, causing the animals to fall to the base and attempt another climb. Treatments with the Power Tower improve mobility in aging animals (Piazza et al. 2009). The TreadWheel exploits negative geotaxis through slow rotation of fly enclosures to stimulate a response; responses to prolonged exercise on the TreadWheel include, for example, changes in triglyceride and glycogen levels in the animals (Mendez et al. 2016). The Rotating Exercise Quantification System (REQS) is an offshoot of the TreadWheel that records the activity levels of flies as they exercise, facilitating the comparison of different exercise regimes and allowing for the normalization of exercise levels (Watanabe and Riddle 2017). The REQS validation study also demonstrated that there is significant variability among different Drosophila genotypes in how they respond to the rotational exercise stimulation (Watanabe and Riddle 2017), suggesting that genetic factors contribute to the difference in exercise levels observed.
In this study, we investigate the genetic factors contributing to the level of exercise induced through rotational stimulation. Using the REQS, we measured basal activity levels (without rotation) as well as induced exercise levels (with rotation) in 161 genetically diverse strains from the DGRP2. Next, we used a genome-wide association study (GWAS) to identify the genetic variants responsible for the approximately 500-fold variation in activity levels observed within the DGRP2 lines. We identified over 100 annotated genes that contribute to basal and induced activity levels. The loci that control activity levels are different for the untreated and exercise-treated conditions and often also differ between males and females. Additional characterization of candidate genes validates the results of our GWAS and confirms that genes with functions in the central nervous system as well as some chromatin proteins impact Drosophila activity levels. Together, our findings provide key insights into the number and types of genetic factors that control basal and exercise-induced activity levels, provide an array of candidate genes for follow-up studies, and identify chromatin modifiers as a new class of proteins linked to exercise.
MATERIALS AND METHODS
Drosophila Lines and Husbandry
The DGRP2 fly lines used in this study were obtained from the Bloomington Drosophila Stock Center or from our collaborator Dr. Laura Reed (University of Alabama). Fly lines for the follow-up analysis of candidate genes (Supplemental Table S1) were obtained from the Bloomington Drosophila Stock Center, with the exception of the wde mutant stock, which was obtained from Dr. Wodarz (Georg-August-Universität Göttingen, Germany). Drosophila were grown on media consisting of a cornmeal-molasses base with the addition of Tegosept, propionic acid, agar, yeast, and Drosophila culture netting (Mendez et al. 2016). All flies used for the assays were reared in an incubator at 25°C and ~70% humidity with a 12-hour light-dark cycle for a minimum of three generations prior to the start of the experiments. To minimize density effects, animals were grown in vials established by mating seven male and ten female flies for each line. The resulting progeny were collected as virgins, aged 3-7 days, separated by sex, and then used for the exercise experiments.
Exercise Quantification Assay
Basal and induced exercise activity levels were determined for a total of 161 strains (155 basal and 151 induced) from the DGRP2 using the REQS (Mackay et al. 2012; Huang et al. 2014; Watanabe and Riddle 2017; Watanabe and Riddle 2018) (Supplemental Table S2). For each DGRP2 line, 100 male and 100 female virgin flies aged 3-7 days were used. The animals were anesthetized with CO2, divided into groups of 10 (n=10 groups per sex per line), and loaded into the vials of the REQS at 9AM [for additional details see (Watanabe and Riddle 2018)]. The REQS was moved into the incubator at 10AM, and the vials were rotated to a vertical orientation similar to how flies are reared in a laboratory setting. The animals were allowed to recover from anesthetization for one hour. The basal activity level of the animals was measured from 11AM to 12PM, keeping the REQS in a static position without rotation. At 12PM, the rotational feature of the REQS was turned on at 4 rotations per minute (rpm) to induce exercise in the animals, and activity levels were monitored during this exercise until 2PM. Measurements were taken at five-minute intervals during both the basal and induced activity phases.
Statistics
Basal and induced activity levels were calculated as the average activity level per five-minute interval of each vial/genotype/sex combination. A GLM (generalized linear model with gamma distribution and log link) was fitted in SAS 9.4 (SAS Institute Inc. 2013) to investigate the impact of the factors vial, genotype, sex, and treatment on activity levels. As the initial analysis showed no effect of “vial”, “vial” was removed from the final model. Descriptive statistics were generated in SPSS 25 (IBM 2017) and R (R Development Core Team 2018). Custom Perl scripts were used for the SNP classification analysis in addition to R (R Development Core Team 2018).
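As an illustration of the per-interval averaging described above, the following minimal Python sketch computes the mean number of crossings per five-minute interval per fly for a single vial. The function name, the example counts, and the normalization by 10 flies per vial are assumptions made for illustration; this is not the analysis code used in the study.

```python
from statistics import mean

FLIES_PER_VIAL = 10  # the assay loads groups of 10 flies per vial


def mean_activity(interval_counts, flies_per_vial=FLIES_PER_VIAL):
    """Average crossings per five-minute interval per fly for one vial.

    interval_counts: crossings recorded in each five-minute interval
    (12 intervals for the one-hour basal phase, 24 for the two-hour
    induced phase).
    """
    if not interval_counts:
        raise ValueError("need at least one interval")
    return mean(interval_counts) / flies_per_vial


# Hypothetical crossings for one vial during the one-hour basal phase.
basal_counts = [120, 95, 110, 130, 90, 105, 100, 115, 125, 85, 110, 135]
basal = mean_activity(basal_counts)
print(f"basal activity: {basal:.2f} crossings/five minutes/fly")
```

Averaging these per-vial values across the ten vials of a genotype/sex combination would give the line means used as phenotypes.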
Genome Wide Association Study (GWAS)
Basal and induced activity levels were separately analyzed by calculating the average activity level per five-minute interval of each genotype/sex combination. These phenotypic values (Supplemental Table S3) were used for two separate GWASs using the DGRP2 webtool (http://dgrp2.gnets.ncsu.edu/) developed by Dr. Trudy Mackay at NC State University (Mackay et al. 2012; Huang et al. 2014). Genetic variants that met a significance threshold of p<10^-5 in any of the analyses (mixed model, simple regression model, female data, male data, combined data, and sex difference analysis) were considered candidate loci (Supplemental Table S3). Candidate genes for follow-up and validation were selected based on significance level, mutant availability from Drosophila stock centers, and reports on FlyBase (Gramates et al. 2017) consistent with phenotypes that might be linked to exercise/activity.
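The candidate-locus rule above (significant at p<10^-5 in any of the analyses) can be sketched as a simple filter. The analysis labels, variant identifiers, and p-values below are hypothetical, and the data structure is an assumption; the actual DGRP2 webtool output format is not reproduced here.

```python
THRESHOLD = 1e-5  # significance threshold stated in the text


def candidate_variants(pvalues_by_analysis, threshold=THRESHOLD):
    """Map each variant to the analyses in which it passes the threshold.

    A variant is a candidate if p < threshold in at least one analysis.
    """
    hits = {}
    for analysis, pvalues in pvalues_by_analysis.items():
        for variant, p in pvalues.items():
            if p < threshold:
                hits.setdefault(variant, []).append(analysis)
    return hits


# Hypothetical p-values from three of the analyses.
results = {
    "mixed_female": {"var_A": 3e-6, "var_B": 2e-4},
    "mixed_male": {"var_A": 5e-3, "var_C": 8e-7},
    "regression_combined": {"var_B": 9e-6},
}
hits = candidate_variants(results)
for variant, analyses in sorted(hits.items()):
    print(variant, analyses)
```

Keeping the list of analyses per variant makes it easy to see later whether a locus was detected in one sex only, in the combined data, or in the sex-difference analysis.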
Gene Ontology (GO)
GO analysis was performed on the genes associated with the genetic variants that met the significance threshold of p<10^-5, separately for the basal and induced GWAS results. Genetic variants lacking association with a specific gene were removed from the list, and duplicate genes were removed as well. FlyBase gene IDs were retrieved for the gene sets using the ID conversion function of the Database for Annotation, Visualization and Integrated Discovery (DAVID) (Huang da et al. 2009b; Huang da et al. 2009a). GO analysis was carried out using PANTHER (Protein Analysis Through Evolutionary Relationships) (Mi et al. 2013; Mi et al. 2017). The “biological process” and “cellular component” enrichment terms were used.
Functional analysis of candidate genes
Candidate genes were selected for follow-up experiments based on their highly significant p-values in the GWASs, as well as on the annotation available on FlyBase. Mutant lines for eIF5B, Su(z)2, wde, Cirl, nwk, cpo, SmydA-9, and Prosap (see Supplemental Table S4 for genotypes) were characterized for basal and induced activity levels using the same protocol as described above in the section entitled “Exercise Quantification Assay.” For UAS/Gal4 lines, the UAS transgene was brought together with the nervous system GAL4 driver P{w[+mC]=GAL4-elav.L}2 through a single generation cross.
Data availability
All data necessary for confirming the conclusions of this article are represented fully within the article and its tables, figures, and supplemental files, with the exception of the genotype data for the DGRP2 population and the GWAS model. This information can be found at http://dgrp2.gnets.ncsu.edu/.
RESULTS
The DGRP2 shows extensive variation in basal and exercise-induced activity levels
To investigate the genetic basis of variation in activity levels, both basal and exercise induced, we focused on the DGRP2 collection of Drosophila strains. The DGRP2 is a collection of 200 inbred lines of Drosophila derived from wild-caught females, representing genetic variation that is present in a natural population. To measure activity levels, we used the REQS, as it allowed us to record basal activity of the animals without rotation and induced activity levels during rotation. The rotation induces exercise (higher activity levels) through the animals’ negative geotaxis response. Using 161 strains from the DGRP2, we measured basal activity of the animals as well as the activity during rotationally-induced exercise in a single experiment (Figure 1). The output from the REQS is the average activity level per 10 flies per five-minute interval, which was estimated based on a one-hour recording for the basal activity and a two-hour recording for the exercise activity.
Figure 2 shows the results from this set of experiments, with data from females (A+B) and males (C+D) shown separately. All activity measures described are given as crossings/five minutes/fly. We found a wide range of average activity levels for both basal and exercise-induced activity. The highest level of basal activity in males was found in line 57, which exhibited 144.5 +/- 9.4 activity units (mean +/- SEM), while for females line 595 had the highest basal activity with 58.55 activity units (+/-3.9). The highest performing line was the same for exercise-induced activity in both sexes, line 808, with an average of 155.25 activity units (+/-3.56) in males and an average of 133.94 activity units (+/-3.60) in females. Interestingly, for males the lowest basal and induced activity levels were both measured in line 383, with 0.275 activity units (+/-0.1) for basal activity and 1.63 activity units (+/-0.79) for exercise-induced activity. The low performer in the females was different between basal and induced activity: line 390 females had the lowest basal activity with 0.53 activity units (+/-0.14), while line 32, with an average of 1.22 activity units (+/-0.26), had the lowest exercise-induced activity level. Looking across all factors, the variation in mean activity levels illustrated in Figure 2 ranges from a low of 0.275 +/- 0.1 activity units (line 383, basal males) to a high of 155.25 +/- 3.56 activity units (line 808, induced males). Thus, our highest activity measurement showed an approximately 500-fold increase from the lowest measurement, demonstrating that there is extensive variation in animal activity based on genotype, sex, and treatment within the DGRP2 population.
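The fold range quoted above can be checked directly from the extreme line means reported in this section. The short Python sketch below is only a worked calculation using those values; the dictionary layout is an assumption for illustration.

```python
# Extreme line means reported in the text (crossings/five minutes/fly),
# keyed by (condition, sex, extreme).
extremes = {
    ("basal", "male", "high"): 144.5,      # line 57
    ("basal", "female", "high"): 58.55,    # line 595
    ("induced", "male", "high"): 155.25,   # line 808
    ("induced", "female", "high"): 133.94, # line 808
    ("basal", "male", "low"): 0.275,       # line 383
    ("induced", "male", "low"): 1.63,      # line 383
    ("basal", "female", "low"): 0.53,      # line 390
    ("induced", "female", "low"): 1.22,    # line 32
}

lowest = min(extremes.values())
highest = max(extremes.values())
fold_range = highest / lowest
print(f"lowest={lowest}, highest={highest}, fold range={fold_range:.0f}")
```

The ratio of the highest to the lowest mean comes out at roughly 565, which is the basis for describing the spread as approximately 500-fold.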
Several general trends can be discerned from the bar graphs in Figure 2: 1) across the 161 lines, female flies tend to have a lower basal activity level than males (compare A to C); 2) rotational stimulation tends to increase activity levels (compare A to B and C to D); and 3) while basal activity levels tend to be lower in females, exercise activity levels tend to be more similar between the sexes (compare A and C to B and D). However, there are exceptions to these general trends. For example, when comparing male to female activity, of the 155 lines measured for basal activity, 27 lines displayed significantly higher activity levels in males, but one genotype (line 83) exhibited higher activity levels in females (40.35 female activity, 4.87 male activity, p=0.044). Similarly, for the induced activity levels, 12 lines showed significantly greater activity in males than in females, while one line showed the opposite trend, with higher activity levels in females than in males (line 796, 122.01 female activity, 75.6 male activity, p<0.001). When observing the effects of exercise, as expected, 41 genotypes demonstrated significantly increased activity, with only three lines showing decreased activity. Thus, most genotypes showed increased activity when rotated, and males generally exhibited higher activity levels than females.
Sex, genotype, and exercise treatment impact activity
Next, we investigated the factors influencing the variation in activity levels we observed in Figure 2 utilizing a generalized linear model (GLM) analysis. Specifically, the GLM examined the impact of treatment (with or without rotation), sex, and genotype, as well as the interactions between these factors. Our results indicate that activity levels were significantly impacted by treatment, illustrating that rotation is indeed able to increase activity levels above baseline in this genetically diverse population of fly strains (p<0.001, Table 1). In addition, sex and genotype significantly impacted activity levels, as suggested by the descriptive data illustrated in Figures 2 and S1 (p<0.001). Interaction effects between treatment, sex, and genotype also impacted the activity levels measured by the REQS, indicating that sex, treatment, and genotype must all be considered together to predict activity phenotypes (Table 1). Our descriptive statistics combined with the GLM analysis thus indicate that for the activity phenotype there is tremendous variation between the DGRP2 lines, some of which is due to genetics (genotype effect). This finding suggested that individual genes underlying the variation in activity levels might be identified by a GWAS.
GWAS identifies over 300 genetic variants impacting activity levels
To identify the genetic factors impacting basal and exercise-induced activity levels in the DGRP2 population, we carried out a GWAS. The analysis was run separately for the basal and exercise-induced activity levels (155 and 151 lines, respectively) using the DGRP2 GWAS Webtool (http://dgrp2.gnets.ncsu.edu/). The webtool uses two different models for the analysis, a simple regression model and a more complex mixed model, and the analysis is carried out for male data only, female data only, combined data from both sexes, and for the difference between sexes. Together, this set of GWASs identified over 300 genetic variants [single nucleotide polymorphisms (SNPs), multiple nucleotide polymorphisms (MNPs), and deletions (DEL)] that impact basal and exercise-induced activity levels (p<10^-5; Figures 3 and S2, Tables 2 and S2), illustrating the importance of genetic factors for exercise phenotypes.
The genomic variants identified by the GWAS are distributed throughout the genome, and with the exception of the small 4th chromosome, all chromosome arms contain genetic variants contributing to the basal and induced exercise activity phenotypes (Table S2). Comparing the chromosomal distribution of the genetic variants identified as significant in the basal activity or induced exercise activity analysis to that of the overall distribution of variants using chi-square analysis, we found no significant deviations from the expectation, indicating that the variants are not clustered in any particular way in the genome. Overall, the results of this GWAS demonstrate that a large number of loci contribute to the two activity phenotypes measured and that the genetic architecture underlying the phenotypes is complex.
Basal activity variant distributions differ significantly from genome-wide set
Next, we investigated what types of genetic variants were identified as significant in the GWAS. In order to compare the classifications of the variants from the basal and induced analyses to the entire genome, the variants from the induced and basal activity analyses were first characterized based on their genomic context (Introns, Exons, Upstream, Downstream, UTR, and Unknown). We then compared the classification distributions from the basal and induced analyses to a genome-wide set using a chi-square test. We found that only the basal activity variant distribution was significantly different from the genome-wide set (p=5.413*10^-7), with a greater number of exon, upstream, and unknown variants (Figure 4). The classification of the variants contributing to induced exercise activity was not significantly different from the genome-wide classification of variants, possibly due to the smaller number of variants detected in this analysis and a concomitant reduced power to detect differences. Thus, the GWASs carried out here identify a diverse set of genetic variants as contributing to the activity phenotypes under study, and the overrepresentation of exon variants in the basal activity analysis suggests that these variants might indeed represent genes important for animal activity and exercise.
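The comparison of variant classes against the genome-wide set can be illustrated with a chi-square goodness-of-fit calculation. The Python sketch below computes only the test statistic and degrees of freedom; the observed counts and genome-wide fractions are hypothetical placeholders, not the values underlying Figure 4, and the study's own analysis was done with Perl and R.

```python
def chi_square_gof(observed, expected_fractions):
    """Chi-square goodness-of-fit statistic and degrees of freedom.

    observed: class -> count among the significant variants.
    expected_fractions: class -> fraction in the genome-wide set
    (fractions should sum to 1).
    """
    total = sum(observed.values())
    statistic = 0.0
    for cls, obs in observed.items():
        expected = total * expected_fractions[cls]
        statistic += (obs - expected) ** 2 / expected
    return statistic, len(observed) - 1


# Hypothetical counts for the six classes used in the text.
observed = {"intron": 100, "exon": 40, "upstream": 20,
            "downstream": 15, "utr": 10, "unknown": 15}
# Hypothetical genome-wide fractions for the same classes.
genome_wide = {"intron": 0.55, "exon": 0.10, "upstream": 0.10,
               "downstream": 0.10, "utr": 0.05, "unknown": 0.10}

statistic, dof = chi_square_gof(observed, genome_wide)
print(f"chi-square = {statistic:.2f}, df = {dof}")
```

With six classes there are five degrees of freedom, so a statistic above about 11.07 would indicate a deviation from the genome-wide distribution at the 0.05 level. In this made-up example most of the statistic comes from the excess of exon variants.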
Genetic variants impacting activity levels differ between males and females
The genetic variants identified as significantly associated with the two activity phenotypes differed between the analyses, with some linked to activity in males, some linked to activity in females, and some linked to the difference in activity levels between males and females. While the separate analyses of both male and female data led to the identification of genetic variants, more genetic variants were identified in males than in females, and the combined analysis recovered more variants identified in males than in females. For example, the mixed model analysis of basal activity levels identified 53 genetic variants in females and 195 genetic variants in males; in the combined analysis, 46 variants were identified as significant, 22 of which overlap with the variants identified using the male data, while only one variant also occurs in the female analysis (p<10^-5). These results illustrate the importance of collecting data in both sexes, as the genetic factors contributing to both basal and induced exercise activity levels differ between males and females. The results also suggest that the difference between males and females is under genetic control.
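The overlap counts described above amount to set intersections between the variant lists from the separate analyses. A minimal Python sketch follows; the variant identifiers are hypothetical and the function name is an assumption for illustration.

```python
def overlap_summary(female, male, combined):
    """Count how many combined-analysis variants recur in each sex."""
    return {
        "combined_total": len(combined),
        "shared_with_male": len(combined & male),
        "shared_with_female": len(combined & female),
        "combined_only": len(combined - male - female),
    }


# Hypothetical sets of significant variant identifiers.
female_hits = {"v1", "v2", "v3"}
male_hits = {"v3", "v4", "v5", "v6"}
combined_hits = {"v4", "v5", "v7"}

summary = overlap_summary(female_hits, male_hits, combined_hits)
print(summary)
```

Variants that appear only in the combined analysis, or only in one sex, are the ones that would be missed if data were collected from a single sex.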
GWAS discovers positive and negative effect variants impacting activity levels
The genetic variants identified as significantly associated with basal and/or induced exercise activity include loci with positive and negative effects on the phenotype. In the analysis of the induced exercise activity data, only genetic variants with negative impacts on activity levels were identified in either sex. For basal activity in males, the majority of variants negatively impacted activity, and only 1.5% of variants (three out of 195; mixed model) increased activity levels. These rare genetic variants leading to increased activity levels were associated with the genes bdg and slo. In females, the results are similar, with the majority of variants (63; mixed model) impacting basal activity levels negatively and positive effect variants being rare (three variants associated with CG32521 and CG8420). Overall, we found that the vast majority of the variants discovered had a negative effect on activity.
Terms related to the central nervous system and muscle function are over-represented in the GO analysis
Next, we focused on the genetic variants identified in the GWAS as significant that were associated with genes and asked what types of genes contributed to the basal and induced activity phenotypes (146 genes for basal activity; 47 genes for induced exercise activity). To do so, we explored the gene ontology (GO) terms associated with the gene sets linked to basal and induced activity levels. In order to determine which gene classes were over- or under-represented, GO analysis was carried out using the PANTHER (Protein Analysis Through Evolutionary Relationships) tools (Mi et al. 2013; Mi et al. 2017). Specifically, enrichment analysis was used to identify biological processes, cellular components, or molecular functions over-represented within the GWAS gene set relative to the genome as a whole.
In the GO term enrichment analysis for the genes contributing to basal activity levels, 30 GO terms were identified in the “biological process” category and 17 in the “cellular component” category. Among the “biological process” and “cellular component” GO terms, the largest fold enrichments were seen for two terms related to neurons, “axonal growth cone” (26.24-fold enrichment; p=3.3*10^-4) and “neuron recognition” (7.51-fold enrichment; p=5.79*10^-5) (Figure 5). Other significantly enriched GO terms include “neuromuscular junction”, “synaptic transmission”, and “behavior” (Figure 5). The GO term enrichment analysis of the gene set associated with exercise-induced activity identified a 27-fold enrichment for genes involved in the Alzheimer disease-presenilin pathway (p=2.01*10^-4). Thus, the GO term enrichment analysis identifies a variety of terms associated with neuronal function and behavior as characterizing the gene set involved in controlling basal and exercise-induced activity levels.
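A fold enrichment such as those reported above is the ratio of the observed fraction of genes carrying a GO term in the GWAS gene set to the fraction expected from the reference genome. The Python sketch below shows only that ratio; it does not reproduce the statistical test PANTHER applies, and all counts other than the 146 basal genes are hypothetical.

```python
def fold_enrichment(hits_in_set, set_size, hits_in_reference, reference_size):
    """Observed fraction of annotated genes divided by the reference fraction."""
    observed_fraction = hits_in_set / set_size
    expected_fraction = hits_in_reference / reference_size
    return observed_fraction / expected_fraction


# 146 genes were linked to basal activity (from the text). The other
# numbers are hypothetical: 6 of them carry the term, versus 40 of
# 14000 genes in an assumed reference list.
fold = fold_enrichment(hits_in_set=6, set_size=146,
                       hits_in_reference=40, reference_size=14000)
print(f"fold enrichment = {fold:.2f}")
```

Because the ratio depends on small counts, a large fold enrichment for a rare term can rest on only a handful of genes, which is why the p-value is reported alongside it.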
Candidate gene analyses support the GWAS results and suggest that the chromatin proteins WDE and SU(Z)2 impact animal activity
In order to assess the success of the GWAS analysis, we surveyed the information available on FlyBase (Gramates et al. 2017) to determine if altered activity phenotypes have been described for the candidate genes identified here. Mutants in 25 of the candidate genes have been described as either “flightless” or “flight defective” on FlyBase (Gramates et al. 2017), five were described as “uncoordinated,” and one (Cirl) was described as “hyperactive” and “sleep defective” (Van Der Voet et al. 2015). Based on their molecular functions and previously described phenotypes, 10 genes were selected for follow-up and mutant/RNAi lines were obtained: eIF5B (eukaryotic translation initiation factor 5B; Lasko 2000; Johnstone and Lasko 2004), Su(z)2 (a member of the Polycomb Repressive Complex 1 [PRC1]; Wu and Howe 1995; Lo et al. 2009; Nguyen et al. 2017), wde (a cofactor of the H3K9 histone methyltransferase EGG; Koch et al. 2009; Yu et al. 2015), Cirl (a calcium-independent receptor for α-latrotoxin with a hyperactivity phenotype; Andersen et al. 2015; Scholz et al. 2015; Scholz et al. 2017), nwk (an SH3 adaptor protein regulating synaptic growth; Coyle et al. 2004b; O’Connor-Giles et al. 2008; Rodal et al. 2008a), SmydA-9 (an arthropod-specific SET and MYND domain protein; Thompson and Travers 2008), cpo (controls various neurological functions; Bellen et al. 1992a; Bellen et al. 1992b; Glasscock and Tanouye 2005), Prosap (also an SH3 domain-containing protein involved in neuronal development; Diagana et al. 2002; Harris et al. 2016; Wu et al. 2017), jus (a gene with links to seizures; Zhang et al. 2002; Marley and Baines 2011; Dean et al. 2018), and Jarid2 (a Jumonji C domain-containing lysine demethylase; Sasai et al. 2007; Herz et al. 2012). Two of the mutant lines obtained (jus and Jarid2) were very sickly, and we were unable to carry out the standard exercise quantification assay with them. Thus, our analysis focuses on eIF5B, Su(z)2, wde, Cirl, nwk, cpo, SmydA-9, and Prosap.
Because the mutant and RNAi lines we obtained for this analysis were in unknown genetic backgrounds, we compared their activity measurements to our results from the DGRP2 strains, with the understanding that this strain collection provided a good overview of activity levels, both basal and induced, within Drosophila in general (Figure 6 and Table S1). As with the initial analysis of the DGRP2, activity levels were measured for each sex independently. For the eight candidate genes assayed in both sexes, the basal activity levels for nine sex/genotype combinations are in the top or bottom 20% of the phenotypic distribution. For the induced activity levels, seven of the sex/genotype combinations are amongst the 20% most extreme phenotypes measured in the DGRP2 (Figure 6 and Table S2). One of these genes stands out, as in both sexes, mutants in eIF5B showed some of the lowest basal and induced activity levels we recorded. These animals generally were rather sickly, suggesting that the DGRP2 must harbor a weak allele of this locus. For other mutants, the results were more mixed, with some affecting one sex more strongly than the other, or affecting only basal and not induced exercise or vice versa. For example, Su(z)2 knockdown in the nervous system led to low basal and induced activity levels in females, but did not consistently impact males. In contrast, Cirl mutant animals showed relatively low induced activity in males, but relatively high levels in females, and relatively normal basal activity levels. wde mutant animals showed the opposite pattern: high basal activity in males, low basal activity levels in females, and relatively normal activity levels after exercise induction. These results are not surprising, as a significant number of the SNPs identified in the GWAS only impacted one sex, and most candidate genes were specific to either basal or induced activity.
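The comparison of mutant lines against the DGRP2 distribution can be sketched as a percentile check: a mutant value counts as extreme if it falls in the top or bottom 20% of the DGRP2 line means. The Python example below is a minimal illustration under that reading; the reference values, the function names, and the exact tail definition are assumptions, not the procedure used to produce Figure 6.

```python
def percentile_rank(value, reference):
    """Fraction of reference values that lie strictly below value."""
    below = sum(1 for x in reference if x < value)
    return below / len(reference)


def is_extreme(value, reference, tail=0.20):
    """True if value falls in the bottom or top `tail` of the reference."""
    rank = percentile_rank(value, reference)
    return rank <= tail or rank >= 1 - tail


# Hypothetical DGRP2 line means (crossings/five minutes/fly) for one sex.
dgrp2_means = [2, 8, 15, 22, 30, 41, 55, 70, 90, 120]

low_mutant = is_extreme(4.0, dgrp2_means)     # near the bottom
mid_mutant = is_extreme(35.0, dgrp2_means)    # near the median
high_mutant = is_extreme(100.0, dgrp2_means)  # near the top
print(low_mutant, mid_mutant, high_mutant)
```

Running the check separately for each sex and for basal and induced activity mirrors the sex/genotype combinations counted in the text.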
Overall, we see that approximately half of the sex/genotype combinations in our candidate mutants or knockdown study exhibit phenotypes at the extreme ends of the DGRP2 phenotypic distribution. This finding further validates the GWAS analysis presented here and suggests that these candidate genes are worthy of further study.
DISCUSSION
As exercise is widely recommended as part of a healthy lifestyle and as a treatment for obesity, we explored the contribution of genetic variation to exercise in Drosophila using the DGRP2 strain collection. We found extensive variation in activity levels in this population, both basal and exercise-induced, which was dependent on genotype and sex. The GWASs identified over 300 genetic variants and more than 150 genes that contributed to the basal and exercise-induced activity phenotypes. Some of these genes had previously been associated with exercise performance or activity. This group of genes includes couch potato [cpo], which encodes an RNA-binding nuclear protein expressed in the central nervous system that is essential for normal flight behavior (Bellen et al. 1992b; Glasscock and Tanouye 2005; Schmidt et al. 2008). nervous wreck [nwk] encodes a very different kind of protein than cpo, but it has also been shown previously to impact animal activity: mutants in this FCH and SH3 domain-containing adaptor protein show paralysis due to problems at the synapse of neuromuscular junctions (Coyle et al. 2004a; Rodal et al. 2008b). Another gene in this group is bedraggled [bdg], which encodes a putative neurotransmitter transporter; null mutants of bdg are described as flightless and uncoordinated (Rawls et al. 2007). The importance of the factors identified in our study is further underscored by the GO term enrichment analysis, which revealed terms associated with the functions of the central nervous system and its interaction with the musculature. Together, these findings illustrate that the GWAS described here was successful in detecting variants associated with factors involved in the control of animal activity levels and behavior.
In addition to genes such as cpo, nwk, and bdg that would have been expected to impact basal activity or geotaxis-induced exercise activity, other candidate genes represent pathways with no clear link to animal activity or behavior. For example, variants in several chromatin proteins were identified as impacting animal activity levels, including SmydA-9, a SET domain protein; Su(z)2, a Polycomb group protein related to PSC; Jarid2, a Jumonji domain protein that interacts with PRC2; and Wde, an essential co-factor of the H3K9 methyltransferase Egg. In addition, several sperm proteins were identified as linked to activity (e.g., S-Lap8n, Sfp36F, Sfp51E), as were several members of the immunoglobulin superfamily (e.g., Side-II, CG31814, CG13506), but how they might contribute to an activity phenotype is unclear. As several of these genes are among the 36 genes identified here that were also identified by Schnorrer and colleagues as essential for normal muscle development (Schnorrer et al. 2010), it is likely they represent novel pathways linked to activity. Because it includes both unexpected and expected gene classes, the gene set identified here as contributing to both basal and exercise-induced activity levels provides a rich resource for researchers interested in using Drosophila as a tool for the study of exercise.
Our study also revealed that there were significant differences in the genes contributing to activity levels in males and in females, and that several genes could be identified that were responsible for the difference between the sexes. This finding was surprising, as we anticipated that the basic metabolic and sensory pathways involved in the control of animal activity would be conserved between males and females. Mackay and colleagues in 2012 discovered significant differences in the gene networks controlling starvation response and chill coma recovery time, but not startle response, and for all three traits, a large portion of the genetic variants that are identified in one sex show no significant association with the trait in the other sex (Ober et al. 2012). Garlapow and colleagues also found strong sex-specific differences in the factors controlling food intake in the DGRP2 population (Garlapow et al. 2015), and Morozova and colleagues detected sex differences in the networks controlling alcohol sensitivity (Fochler et al. 2017). These findings suggest that for many “basic” traits such as animal activity, sex-specific differences in the underlying genetic networks occur frequently, highlighting the importance of studying both sexes.
When the list of candidate genes identified here is compared with those uncovered in other studies of activity traits, substantial overlap can be seen. For example, Jordan and colleagues investigated the genetic basis of the startle response and negative geotaxis in the DGRP2 population using a GWAS (Jordan et al. 2012). They identified approximately 200 genes associated with each of these activity phenotypes, a number of genes similar to that identified in our study. As our exercise system relies on negative geotaxis, it is not surprising that 24 genes identified by Jordan and colleagues were also identified in the basal activity analysis presented here, and that an additional 17 genes from the exercise-induced activity analysis also overlapped with the gene set identified by Jordan and colleagues. Five genes were identified as candidates in all three analyses (CG15630, CG33144, ed, nmo, and SKIP) (Jordan et al. 2012). The overlap seen between the candidate genes identified in the two studies indicates that, despite the different activity traits measured, shared pathways exist.
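The kind of comparison described above can be illustrated with a minimal sketch in Python. It is not the procedure used in this study: the function name and the `hypothetical_gene_*` entries are placeholders introduced only for illustration, and the only real identifiers are the five genes reported as shared across all three analyses.

```python
def overlap_summary(basal, induced, reference):
    """Return pairwise and three-way overlaps between candidate gene sets."""
    basal, induced, reference = set(basal), set(induced), set(reference)
    return {
        "basal_and_reference": basal & reference,
        "induced_and_reference": induced & reference,
        "all_three": basal & induced & reference,
    }


# The five genes reported in the text as candidates in all three analyses.
shared = {"CG15630", "CG33144", "ed", "nmo", "SKIP"}

# Toy gene lists: every "hypothetical_gene_*" entry is a placeholder, not a real result.
basal = shared | {"hypothetical_gene_A", "hypothetical_gene_B"}
induced = shared | {"hypothetical_gene_C"}
jordan = shared | {"hypothetical_gene_A", "hypothetical_gene_D"}

result = overlap_summary(basal, induced, jordan)
print(sorted(result["all_three"]))
```

In practice the three inputs would be the full candidate gene lists from each analysis, and gene identifiers would need to be harmonized to a single annotation release before comparison.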
Interestingly, a second activity study utilizing the DGRP2 identified a completely independent set of candidate genes, showing no overlap with the genes identified here. The study by Rohde and colleagues used video-tracking to monitor the activity of male Drosophila from the DGRP2 in a shallow petri dish by measuring distance traveled in a 5-minute interval (Rohde et al. 2018). This “2D” activity study focused on specific groups of genes linked to their phenotype, specifically genes involved in transmembrane transport. While this study used a very different algorithm to identify candidate genes, even when their phenotypic measures are analyzed with the standard GWAS tool our study used, we find no overlap in the candidate gene sets identified. This lack of overlap in candidate genes identified by two activity studies illustrates the complexity of the pathways that impact basic animal behaviors such as activity. Given current understanding, genes ranging from those in basic energy metabolism pathways, to those controlling the development of muscles and sensory organs, to those impacting the processing of sensory information are all involved in controlling activity levels. Thus, it is not surprising that studies using different activity types will identify distinct sets of candidate genes; rather, these findings highlight that additional innovative studies are needed to come to a comprehensive understanding of the genes involved in animal activity, both basal and in response to stimulation.
The functional analysis of candidate genes presented here demonstrates that the REQS can be used successfully to identify genes involved in controlling basal and exercise-induced activity levels in the DGRP2. Many of the candidate genes identified show relatively small impacts in the DGRP2, likely due to the presence of weak rather than null alleles in this wild-derived population. Despite the fact that most of the candidate genes identified are sex-specific and have weak effects, the functional analysis of candidate genes showed that several clearly impact activity levels in the basal and exercise assays. With additional analyses, including several alleles per gene, RNAi knockdown in tissues other than the central nervous system, and overexpression lines, further candidate genes will likely be confirmed as impacting activity levels in Drosophila.
Although our analysis identified genes from many different categories of cell and molecular functions, one category of note with regard to exercise response is chromatin modifiers. Our assays on wde and Su(z)2 mutants suggest that this group of genes might indeed play a role in controlling exercise activity levels, and possibly exercise response. Changes in an organism’s activity levels have profound impacts, both acute and long-term. Gene expression changes and alterations to the epigenome have been identified as immediate consequences of exercise. For example, there are several studies documenting DNA methylation changes as well as changes in histone acetylation following exercise. These epigenetic changes are one possible mechanism that might mediate the long-term consequences of exercise, many of which can persist even in the absence of further exercise. Thus, it is of interest that two proteins linked to different histone methylation marks, histone 3 lysine 9 methylation (Wde) and histone 3 lysine 27 methylation (Su(z)2), were identified as contributing to the variation in activity levels between individuals in our study, and future studies on the role of these marks with regard to exercise are needed.
In summary, our study has identified promising candidate genes contributing to the variation in activity levels seen between genetically distinct individuals. Many of the genes identified have not been linked to exercise previously, and they thus present novel avenues for exploration to exercise biologists. Given the clear homology relationships between many of the Drosophila genes identified and mammalian genes, the gene set presented here also provides new research directions for exercise biology studies in rodents. Using the results from model systems such as Drosophila as a guide, translational scientists will be able to accelerate biomarker development to eventually allow medical professionals to prescribe individualized exercise treatments for obesity-related diseases and to guide athletes of all kinds.
TABLES
Table 1. Treatment, sex, and genotype show strong effects on activity levels. General linear model analysis of variance reveals the impact of treatment (+/- rotational stimulation), sex, and genotype on the activity phenotype measured.
Table 2. Summary of GWAS results. The table lists the number of genetic variants detected as significant by the different analysis types used in this study.
Supplemental Table S1. Mutant strains and candidate gene analysis. This table contains the genotypes and Bloomington stock numbers of the mutants analyzed, as well as the raw activity data collected from these strains.
Supplemental Table S2. List of DGRP lines included in this study. Table is separated by treatment and sex for each line. Y = Yes, N = No.
Supplemental Table S3. Phenotypic measures used in the GWAS analysis. Summary data as submitted to the DGRP2 GWAS webtool as well as raw data are provided.
Supplemental Table S4. Candidate genetic variants identified as significant by the GWAS. This table includes results from the basal activity GWAS and the exercise-induced activity GWAS.
Supplemental Table S5. Detailed GO term enrichment analysis results for genetic variants associated with basal activity. This table includes the complete GO term enrichment analysis results for the “Cellular component” and “Biological Process” categories for genetic variants associated with basal activity.
ACKNOWLEDGEMENTS
The work described was supported partially by Award Number P30DK056336 from the National Institute of Diabetes and Digestive and Kidney Diseases through a pilot grant from the UAB Nutrition and Obesity Research Center (NORC) to NCR. The content is solely the responsibility of the authors and does not necessarily represent the official views of the National Institute of Diabetes and Digestive and Kidney Diseases or the National Institutes of Health.
We would like to thank our undergraduate students Michael Azar, Makayla Nixon, Jamie Hill, and Anthony Foster for their contributions to various aspects of these experiments. The wde mutant stock w;wdeTD63/Cyo[twi:GFP] was obtained from Dr. A. Wodarz (Georg-August-Universität Göttingen, Germany), and Dr. L. Reed (University of Alabama) graciously shared many of the DGRP2 strains with us. Additional stocks obtained from the Bloomington Drosophila Stock Center (NIH P40OD018537) were used in this study.