Abstract
The omnigenic model suggests the existence of core networks of genes for quantitative traits, which are influenced by modifiers that may encompass most, if not all, expressed genes in the genome. We have studied pupation site choice behaviour in Drosophila to test this model. Based on a GWA analysis of the Drosophila Genetic Reference Panel (DGRP) stocks, we identify candidate genes and show for disrupted versions of the genes that most are indeed involved in the phenotype. These candidate genes also allowed us to identify a core network, and we experimentally confirm the involvement of other members of this core network in the trait. Intriguingly, when randomly choosing 20 non-network genes, we also find an involvement in the trait for most of them. Comparison of phenotypic effect sizes suggests that the core network genes have on average stronger effects. Our data thus confirm the predictions of an omnigenic genetic architecture.
Introduction
Organismal phenotypes, i.e. traits measured in whole organisms, usually have a quantitative distribution, and their genetic architecture can be studied by genome wide analysis (GWA) approaches. In the past years, these approaches have revealed that such phenotypes have a polygenic architecture in the sense that many genes of moderate or small effect contribute to the phenotype. The in-depth analysis of the extensive data collected in the framework of the studies on human height has shown that the general practice of using a genome wide statistical significance cut-off to declare loci as being involved in a phenotype leaves most of the heritability of the trait unaccounted for (Wood et al., 2014; Yang et al., 2011, 2010). Hence, the attention has turned towards the loci falling below this cut-off. A study focussing on these small effect loci has suggested that all, or at least almost all, genes can be expected to contribute to the phenotype. This has led to the suggestion of an omnigenic model for quantitative traits (Boyle et al., 2017; Liu et al., 2019). It assumes that there is a set of core genes forming pathways with special relevance for the phenotype, or disease etiology in human disease studies. These are expected to be modified by many, if not all, other genes expressed in the same cells. Although the effect sizes of these other genes are expected to be smaller than those of the core genes, in sum they explain more of the genetic variance or heritability than the set of core genes. However, an explicit test of this model is still lacking.
We assess here predictions of the omnigenic model using an ecologically relevant quantitative behavioural trait in Drosophila melanogaster, namely pupation site choice. The pupal stage is a life history stage found in holometabolous insects undergoing transformation between larval and adult stages (Jones and Reiter, 1975; Price, 1970). The choice of pupation site is known to directly influence the probability of successful eclosion of adult flies (Joshi and Mueller, 1993; Rodriguez et al., 1992; Sokolowski, 1985), as pupae are exposed to many biotic and abiotic risks while immobilized during the 3-4 days of metamorphosis. Patterns of pupation site choices in a number of Drosophila species have been extensively investigated (Erezyilmaz and Stern, 2013; Markow, 1979; Vandal et al., 2012, 2008), and substantial variation has been found both within and between species. Several environmental factors, such as temperature (Schnebel and Grossfield, 1992), light (Markow, 1981; Schnebel and Grossfield, 1986), humidity (Casares et al., 1997; Sokal et al., 1960) and food medium (Harini, 2013; Hodge and Caslaw, 1998) have been shown to contribute to the variation. Biotic factors were also identified, including sex (Casares and Carracedo, 1987), larval development time (Welbergen and Sokolowski 1994), and larval density in the vial (Joshi and Mueller, 1993; Sokolowski and Hansell, 1983).
Previous genetic association studies to explore the genetic basis of pupation site choice in Drosophila were consistent with it being a complex behavioural trait with a polygenic basis (Bauer and Sokolowski, 1985; Erezyilmaz and Stern, 2013; Riedl et al., 2007; Sokolowski and Bauer, 1989). However, although there have been multiple studies on this trait and its genetics, they have so far been hampered by limited sample size, as well as limited genetic resolution. Here we adapted an automated method for pupal phenotyping (Reeves and Tautz, 2017) to determine the genetics of pupation height choice, while controlling for environmental factors. The automated phenotyping allows analysis of large numbers of pupae, maximising the potential to reach statistical significance even for small effect sizes.
For assessing the genetic contribution to this behavioural trait, we use the extensive genetic resources available for Drosophila. This includes the Drosophila Genetic Reference Panel (DGRP) as a resource for association mapping, as well as the gene disruption lines maintained at the Bloomington Drosophila Stock Centre (BDSC).
The DGRP consists of a population of more than 200 highly inbred Drosophila melanogaster strains derived from a wild population (Huang et al., 2014; MacKay et al., 2012), which has been successfully applied to identify the association between a broad range of phenotypes and their underlying genetic basis (Dembeck et al., 2015; Durham et al., 2014; Huang et al., 2014), including several behavioural traits (Lee et al., 2017; Rohde et al., 2017; Shorter et al., 2015; Xue et al., 2017). Together with the well-resolved richness in genetic polymorphisms and the rapid decay of linkage disequilibrium (LD) in these strains (Huang et al., 2014), this makes the DGRP a well-suited panel to study the genetic basis of pupation height choice in Drosophila melanogaster at a fine scale resolution. A potential drawback of the lines is that they are limited in number, implying that genetic effect sizes of specific loci need to be large enough to be captured with a genome-wide significance threshold. However, as indicated above, it has become increasingly clear that the use of such a threshold is too stringent anyway. The efforts to avoid false positive associations by such thresholds are blocking the way towards understanding the genetic networks underlying a complex trait. An alternative approach is therefore to use lower statistical thresholds and to combine this with a direct test of the candidate genes that are uncovered in this way. The BDSC gene disruption lines cover almost all genes of Drosophila and can therefore be used to test most of the GWA candidate genes for a possible involvement in the trait.
In the present study, we go beyond a standard GWA combined with experimental gene confirmation, in order to explore two further aspects. First, we ask what the candidate genes reveal about the possible core network of the trait, and then use further genes from the predicted network to test them for their effects. Second, following the predictions of the omnigenic model, we also explore the effects of genes randomly chosen among genes expressed in the respective life stage. We find that not only a large fraction of the network predicted genes, but also of the randomly chosen genes have an effect on the pupation height phenotype, whereby the phenotypic effect sizes of the core network genes are higher than those of the randomly chosen genes. We conclude that these findings support predictions of an omnigenic genetic architecture for quantitative traits.
Results
Phenotyping
The acquisition of phenotype data from a large number of individuals is a prerequisite for high resolution genetic mapping studies. Instead of using the mostly manual measurement based approach from previous studies on pupation height (Bauer and Sokolowski, 1985; Erezyilmaz and Stern, 2013; Riedl et al., 2007; Sokolowski and Bauer, 1989), we adapted an earlier image-analysis based phenotyping pipeline (Reeves and Tautz, 2017). This pipeline was initially developed for the high-throughput measurement of pupal case length, and was shown to be capable of automatically detecting pupae with high precision (Reeves and Tautz, 2017). We modified it to measure pupation height choice, defined as the vertical distance in millimetres (mm) from the pupation site (pupal center) to the food surface in the vial. Figure 1 shows an example of the automated measurement of pupation height.
Density of individuals within the vial in which they develop is a common major environmental covariate of many Drosophila traits, including pupation site choice (Joshi and Mueller, 1993; Sokolowski and Hansell, 1983). In the present study, individual density variation was controlled in an indirect manner through limiting the number of parents used per vial (10 females for wildtype strains, and 15 females for inbred strains), and restricting the number of nights they remained before being removed (1-2 nights). Still, some variation was apparent that needed to be addressed. Based on a set of test experiments, we defined minimal sampling rules of at least 15 measured pupae per vial and at least six replicate vials per strain. Further, in order to minimize the influence of individual density within vials, an average slope (0.145) across all tested stocks was used to correct the mean estimate of pupation height in all vials.
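The density correction and sampling rules described above can be sketched as a simple linear adjustment. This is a minimal sketch only: the text reports the average slope (0.145) but not the exact formula, so the direction of the adjustment, the units of the slope (assumed here to be mm per pupa), the reference density of 30 pupae and the function names are all assumptions made for illustration.

```python
# Illustrative only: the source gives the slope and the sampling rules,
# not the correction formula. REFERENCE_DENSITY is a hypothetical value.
DENSITY_SLOPE = 0.145      # average slope across stocks (from the text)
REFERENCE_DENSITY = 30     # hypothetical common density to adjust to
MIN_PUPAE_PER_VIAL = 15    # minimal sampling rule from the text
MIN_VIALS_PER_STRAIN = 6   # minimal sampling rule from the text


def density_corrected_mean(heights_mm):
    """Return the vial mean adjusted to the reference density, or None
    if the vial has fewer pupae than the minimal sampling rule allows."""
    n = len(heights_mm)
    if n < MIN_PUPAE_PER_VIAL:
        return None
    raw_mean = sum(heights_mm) / n
    # Assumed form: remove the linear density trend relative to the reference.
    return raw_mean - DENSITY_SLOPE * (n - REFERENCE_DENSITY)


def strain_mean(vials):
    """Average the corrected vial means; require enough replicate vials."""
    corrected = [m for m in map(density_corrected_mean, vials) if m is not None]
    if len(corrected) < MIN_VIALS_PER_STRAIN:
        return None
    return sum(corrected) / len(corrected)
```

Vials or strains that fail the sampling rules return None rather than a value, so that under-sampled data cannot silently enter a downstream analysis.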
Variation in pupation height
Two distinct sets of strains were used to explore the variance in pupation height choice. The first dataset consisted of 14 natural wild-type Drosophila melanogaster strains collected from different parts of the world (Supplementary file 1A). The second dataset was from the Drosophila Genetic Reference Panel (DGRP) (Huang et al., 2014; MacKay et al., 2012) and included 198 lines (Supplementary file 1B). In order to correct for environmental factors, especially cryptic differences in humidity (Casares et al., 1997; Sokal et al., 1960), two wildtype stocks (S-317 and S-314) representing two extremes of pupation height from the first strain set and showing a consistent trend, were continually re-measured to act as controls throughout all experiments. The estimates of pupation height for the strains were corrected based on the average pupation height of the two control stocks across all rounds of experiments.
Figure 2 shows the profiles of corrected pupation height from the wildtype and DGRP sets of strains. We observe large variation of pupation height among strains, ranging from pupation height of only 15 mm above the food medium up to the very highest possible position adjacent to the plug surface of the vial (50 mm). The global wildtype lines showed no obvious geographical clustering of pupation heights. The spread of pupation height among the DGRP stocks exceeds that of the wildtype stocks, suggesting that they capture at least a major part of the existing variation in D. melanogaster.
Sexual dimorphism and parental effects
Sexual dimorphism, the condition where sexes from the same species exhibit different characteristics for morphological or behavioural traits, is a commonly observed phenomenon. Regarding pupation site choice in Drosophila, a controversy on the existence of sexual dimorphism has persisted for several decades. Early studies reported no sexual dimorphism in pupation site choice in Drosophila (Markow, 1979; Sokolowski and Bauer, 1989; Welbergen and Sokolowski, 1994), while the results from other studies showed that males pupated significantly higher than females (Casares and Carracedo, 1987; Riedl et al., 2007).
To address the sexual dimorphism question, we analysed a distinct dataset incorporating both pupation height and sex information for more than 4,000 randomly selected individuals (2,340 females and 1,935 males) from 728 vials, generated from the 4-way pedigree dataset reported by Reeves and Tautz (2017). Since this earlier study had not recorded the level of the food surface, the pupation height of each sexed pupa was calculated as its deviation from the average pupation height of the vial in which it was located (Figure 3A) (Reeves and Tautz, 2017). As shown in Figure 3B, there was a significant difference in pupation height between males and females (Wilcoxon rank sum test: P-value 1.5E-07), with male individuals pupating on average around 2 mm higher than females. This result is roughly in line with the observed sexual dimorphism reported in two previous studies (Casares and Carracedo, 1987; Riedl et al., 2007). The difference in pupation site choice between the sexes may be due to their distinct developmental timing, as females generally pupate later than males, and later larvae tend to pupate lower, possibly due to a response to diminishing levels of humidity inside the vials (Casares and Carracedo, 1987).
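The within-vial deviation used here can be illustrated with a short sketch. It assumes that the recorded vertical position increases upwards and that each vial is given as a list of (sex, position) pairs; the data layout and function names are hypothetical and not taken from the original pipeline.

```python
from statistics import mean


def height_deviations(vials):
    """vials: dict mapping vial id -> list of (sex, vertical_position_mm).
    Returns a list of (sex, deviation), where deviation is the pupa's position
    minus the mean position of all pupae in the same vial."""
    out = []
    for pupae in vials.values():
        vial_mean = mean(pos for _, pos in pupae)
        out.extend((sex, pos - vial_mean) for sex, pos in pupae)
    return out


def mean_deviation_by_sex(deviations):
    """Average the within-vial deviations separately for each sex."""
    by_sex = {}
    for sex, dev in deviations:
        by_sex.setdefault(sex, []).append(dev)
    return {sex: mean(devs) for sex, devs in by_sex.items()}
```

Because each pupa is compared only with the other pupae in its own vial, differences between vials (including the unrecorded food level) cancel out before the sexes are compared.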
A parental bias, by which the phenotype of an individual depends more upon the mother’s or father’s phenotype or genotype, can be observed for some traits. This can be the result of inheritance of genetic material in the cytoplasm (e.g., mitochondria, Wolbachia bacteria), sex chromosomes, or imprinted gene regions. Previous studies on this aspect for pupation site selection provided two opposing views, with one suggesting the pupation site choice in Drosophila fits a simple additive model of inheritance without any parental bias (Sokolowski and Bauer, 1989), while the others found a significant maternal effect on pupation site selection (Garcia-Flores et al., 1989; Singh and Pandey, 1993). To address this question, two pairs of DGRP inbred lines with each pair representing two extremes of pupation height were randomly chosen, and were reciprocally crossed to test for parental biases. As shown in Figure 3C and 3D, the pupation heights for offspring from both directions lie between their parental stocks, with no significant differences (Wilcoxon rank sum test: P-values 0.11 and 0.17) on pupation height choice in reciprocal crosses. This finding supports the additive model of inheritance on pupation site selection in Drosophila melanogaster (Sokolowski and Bauer, 1989).
Wolbachia pipientis is a maternally transmitted endosymbiotic bacterium that infects around 53% of DGRP strains. It was reported to have a significant effect on some traits, e.g., acute and chronic resistance to oxidative stress (Huang et al., 2014). Two different approaches were used to explore a possible effect of Wolbachia infection on pupation height. First, a statistical comparison of infected and non-infected strains among all tested strains showed no significant difference in pupation height (Wilcoxon rank sum test: P-value 0.29; Figure 3 - figure supplement 1). Second, an experimental comparison of three randomly selected Wolbachia-infected DGRP strains before and after removal of Wolbachia by tetracycline treatment showed no significant difference in pupation height choice for any of the tested strains (Figure 3 - figure supplement 2). Accordingly, Wolbachia infection status of the DGRP strains was not incorporated in the association analysis below, as both the indirect and the direct evidence described above reveal no strong effect on pupation height choice.
Heritability and chromosome effects
An estimate of the total genetic component of pupation height choice, i.e., broad sense heritability (H2), was achieved by determining the proportion of total variance in the mean strain height measurements compared to the average within each strain (Schmidt et al., 2017). The additive genetic impact, or narrow sense heritability (h2), is reported here as “SNP heritability” (Wray et al., 2013), i.e., the estimate of the proportion of phenotypic variance explained by all available SNPs (or genetic variants) in the DGRP stocks.
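One common way to obtain such a broad sense heritability estimate for inbred lines is a one-way analysis of variance, with H2 taken as the among-strain variance component divided by the sum of the among-strain and within-strain components. The following is a minimal sketch of that calculation, assuming a balanced design with replicate vial means as observations; it is not necessarily the exact procedure of Schmidt et al. (2017), and the function name is hypothetical.

```python
from statistics import mean


def broad_sense_heritability(strains):
    """strains: dict mapping strain name -> list of replicate vial means (mm).
    Assumes a balanced design (same number of replicates per strain)."""
    groups = list(strains.values())
    k = len(groups)                 # number of strains
    n = len(groups[0])              # replicates per strain
    if any(len(g) != n for g in groups):
        raise ValueError("balanced design assumed")
    grand = mean(x for g in groups for x in g)
    # Mean squares from a one-way ANOVA
    ms_between = n * sum((mean(g) - grand) ** 2 for g in groups) / (k - 1)
    ms_within = sum((x - mean(g)) ** 2 for g in groups for x in g) / (k * (n - 1))
    # Variance components: among-strain (genetic) and within-strain (residual)
    var_g = max((ms_between - ms_within) / n, 0.0)
    return var_g / (var_g + ms_within)
```

The among-strain component is truncated at zero so that sampling noise cannot produce a negative heritability estimate.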
All estimates are shown in Table 1. Values of H2 of 0.64 (0.70 for wildtype strains) and h2 of 0.46 (SE: 0.2) based on the estimates from DGRP inbred stocks imply higher heritability than that from previous estimates within this species (Casares and Carracedo, 1986; Garcia-Flores et al., 1989; Singh and Pandey, 1993).
Partitioning the variance by chromosome reveals that all chromosomes apart from chromosome 4 contribute a substantial part to the variance of pupation height (Table 1 - table supplement 1). The minimal contribution from chromosome 4 can be ascribed to the limited number of genetic variants within this chromosome. In line with previous reports (Bauer and Sokolowski, 1985; Sokolowski and Bauer, 1989), we find a somewhat higher contribution of autosomes to the variance of pupation height and also a slightly larger effect from chromosome 2 compared to chromosome 3 (Bauer and Sokolowski, 1985).
Genome-wide association analysis
The GWA analysis was based on the genetic variants of DGRP freeze 2 (Huang et al., 2014). Variants with more than 20% missing values or a minor allele frequency below 5% were excluded from further analysis. The linear regression model implemented in PLINK (Purcell et al., 2007) was used for the genome wide association analysis, including the assessment of possible covariates.
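The two variant filters can be written down as a short sketch. It assumes that each inbred line carries a homozygous call coded 0 (reference) or 2 (alternate), with None for a missing call; this coding and the function name are illustrative assumptions, and the actual filtering in the study would have been carried out with the tools cited above.

```python
def passes_qc(genotypes, max_missing=0.20, min_maf=0.05):
    """genotypes: one call per line, coded 0, 2, or None (missing).
    Returns True if the variant survives both filters from the text."""
    if not genotypes:
        return False
    called = [g for g in genotypes if g is not None]
    if not called:
        return False
    missing_rate = 1 - len(called) / len(genotypes)
    if missing_rate > max_missing:        # more than 20% missing: exclude
        return False
    alt_freq = sum(called) / (2 * len(called))
    maf = min(alt_freq, 1 - alt_freq)     # frequency of the rarer allele
    return maf >= min_maf                 # below 5%: exclude
```

For example, a variant called in all of 100 lines with 10 alternate homozygotes has a minor allele frequency of 0.10 and is retained, whereas one with only 3 alternate homozygotes is dropped.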
A previous study had shown a possible role of larval size on their pupation site choice (Vandal et al., 2012). Here we use pupal case length as an indicator of larval size (Reeves and Tautz, 2017). We find that there is indeed a weak, but significant negative correlation between pupal case length and pupation height (Figure 4 - figure supplement 1, panel A, Pearson’s correlation test, R-square 0.02, P-value 3.7E-07). Therefore, a second GWA was performed using pupation height as phenotype and pupal case length as covariate. This analysis revealed an extremely strong correlation between P-values of GWA with and without pupal case length as a covariate (Figure 4 - figure supplement 1, panel B, Pearson correlation test, R square 0.99, P-value < 2.2E-16) and no major difference in the q-q plots (Figure 4 - figure supplement 1, panels C and D). Based on these analyses, we conclude that the identified significant genetic variants on pupation height are mostly independent from its association with pupal case length.
To investigate whether any cryptic population structure could contribute to the observed variation in pupation site choice of inbred stocks, PLINK (Purcell et al., 2007) was used to identify major principal components (PC) of genetic variants in the DGRP strains. Only one major cluster was found in these stocks based on PC analysis (Huang et al., 2014), and no obvious clusters of strains sharing the same pupation height were found (Figure 4 - figure supplement 2). Moreover, only three of the top 20 PCs showed significant (Pearson’s correlation test, P-value < 0.05) correlations with pupation height, and each of them explains only a low fraction (3% to 6%, based on R2 values) of the observed phenotypic variance (Figure 4 - figure supplement 3); these slight correlations seem more likely to come from a few outlier individuals.
Given their role as a driving force in population divergence and speciation (Hoffmann and Rieseberg, 2008), major genomic inversions in DGRP strains might contribute to the observed population structure and the association between population structure and pupation height. A systematic test for correlations between genomic inversion status in DGRP strains and the first two principal components of genomic variation showed significant effects only from In(2L)t and In(3R)Mo (Figure 4 - figure supplement 4), indicating their possible roles in population subdivisions (Hoffmann and Rieseberg, 2008). However, there is no significant association between pupation height and any of the tested genomic inversions, including In(2L)t and In(3R)Mo.
The original DGRP lines were constructed such that population structure effects should be minimized, but some genetic relatedness leading to cryptic population structure might still exist (Mackay and Huang, 2018). Hence, for identifying candidate loci, one could use different types of correction of population structure. While such corrections should reduce the number of false positive associations, they compromise also the power to detect true associations (Price et al., 2010). Hence, we decided to use the uncorrected data for a first list of candidate loci from which we chose genes for further functional analysis (Figure 4).
Candidate genes from the GWA analysis
When using genome-wide permutation based thresholds (P-value < 5E-08, see Methods), we find no significant genetic variants associated with pupation height (Figure 4). We therefore decided to use the more permissive significance cut-off of P-value < 1E-05, which is a nominal threshold frequently used in Drosophila quantitative trait genetic studies (Lee et al., 2017). At this cut-off, we found 28 significant genetic variants (25 SNPs and 3 indels) to be associated with pupation height in the DGRP strains, corresponding to 71 associated genes that are located within 5 kb upstream or downstream (default setting in SnpEff (Cingolani et al., 2012)) of these genetic variants (Table 2).
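The assignment of genes to variants by a 5 kb flanking window can be sketched as an interval overlap test. This is a simplified stand-in for the annotation step, not a reimplementation of SnpEff; the gene tuple layout, the function name and the gene names used in the usage note are hypothetical.

```python
WINDOW_BP = 5_000  # flanking distance quoted in the text


def genes_near_variant(chrom, pos, genes, window=WINDOW_BP):
    """genes: iterable of (name, chrom, start, end) with 1-based inclusive
    coordinates. Returns the sorted names of genes on the same chromosome
    that overlap the interval [pos - window, pos + window]."""
    lo, hi = pos - window, pos + window
    return sorted(
        name for name, g_chrom, start, end in genes
        if g_chrom == chrom and start <= hi and end >= lo
    )
```

With placeholder genes geneA (2L:1000-2000) and geneB (2L:8000-9000), a variant at 2L:3500 is assigned to both, whereas a gene starting at 2L:20000 or any gene on another chromosome arm is not.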
To identify additional candidate genes associated with the variants, we examined the long range linkage disequilibrium (LD) between pairs of detected candidate variants and with other genetic variants found in the DGRP strains. No significant linkages between physically distant (>=1Mb, with r2 >= 0.8) genetic variants were found, suggesting the associations in this population are not confounded by long range LD. LD blocks were then calculated for each significant genetic variant with a commonly used threshold r2 = 0.8 (Pallares et al., 2014), and 10 significant LD blocks were found with an average block size of 5.3 kb (Table 2 - table supplement 1). No pairs of identified significant variants were found in the same LD block. This finding is in line with the observation of a rapid decay of LD in the panel strains reported in the original DGRP resource reference (Huang et al., 2014). Including the additional genes identified in the above LD blocks, we identified in total 81 candidate genes associated with pupation height variation in Drosophila melanogaster. None of these candidate genes overlaps with the pupation height QTL at 56A01-C11 previously reported by Riedl et al. (2007).
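The pairwise r2 statistic underlying these LD comparisons can be sketched as the squared Pearson correlation between two genotype vectors. The sketch assumes homozygous calls coded 0 or 2 per inbred line, with None for missing calls, and drops lines missing at either variant; the coding and the function names are illustrative assumptions rather than the software actually used.

```python
def ld_r2(a, b):
    """Squared Pearson correlation between two variants' genotype vectors
    (same line order; entries 0, 2, or None for missing)."""
    pairs = [(x, y) for x, y in zip(a, b) if x is not None and y is not None]
    n = len(pairs)
    if n < 2:
        return float("nan")
    mx = sum(x for x, _ in pairs) / n
    my = sum(y for _, y in pairs) / n
    sxx = sum((x - mx) ** 2 for x, _ in pairs)
    syy = sum((y - my) ** 2 for _, y in pairs)
    sxy = sum((x - mx) * (y - my) for x, y in pairs)
    if sxx == 0 or syy == 0:
        return float("nan")   # monomorphic among the jointly called lines
    return sxy * sxy / (sxx * syy)


def in_same_block(a, b, threshold=0.8):
    """True if the two variants reach the r2 threshold quoted in the text."""
    return ld_r2(a, b) >= threshold
```

An undefined r2 (too few jointly called lines, or a monomorphic variant) compares as False against the threshold, so such pairs are never placed in the same block.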
Phenotype confirmations
The advantage of Drosophila as a model system is that one can use mutant alleles that have been constructed in a common co-isogenic background to test whether different alleles in genes implicated by the GWA analyses indeed affect pupation site choice. Gene disruption lines were available for 16 candidate loci (with disruption in the coding region of at least 1 gene for each locus) within the 28 identified associated loci. Nine constitute transposon insertion mutations (Bellen et al., 2011) and seven constitute small deficiencies (Supplementary file 1C). Experimental tests generally involved replicated phenotypic comparisons between the co-isogenic progenitor stock and homozygous or heterozygous disruptions of the target genes (see Methods).
An overview of the measured pupation height differences compared to the respective progenitor stocks is provided in Figure 5. Twelve out of 16 tested candidate loci showed a significant difference, five with an increased height, and seven with a decreased one. The eight transposon insertion stocks that are homozygous viable (completely viable or semi-viable) show a decreased (Smr, Scrib, Ocnus, Lip4 and Send1) or increased (CG11891, CG42613, and Pellino) pupation height, two of which only as a tendency (Wilcoxon rank sum test, P-value > 0.05).
The other two stocks that show no significant overall change (Cp62Bc and Arl6IP1/nonA-l/Fas1) are deletion stocks that are homozygous lethal, i.e., the fact that we did not find an effect on pupation height in hemizygous state does not rule out the possibility that other alleles would show it. Two other homozygous deletion lethal strains (corresponding to genes CG7029 and Oatp74D/Edin) do show a significant influence on pupal height status, implying a haplo-insufficient or dominant effect. Four further homozygous lethal strains which could only be tested under conditions where 50% of the pupae should be hemizygotes (see Methods for details), (corresponding to one gene CG33970 from transposon insertion disruption, and three loci CG15270, CG13449/CG7650/RhoGAP71E/Comm3 and CG7567 from DNA segment deletion disruption) show significant effects as well, indicating a particularly strong haplo-insufficient or dominant effect. However, each of the deletions affects multiple genes (Supplementary file 1C), leaving it open whether another gene in the region causes the effect.
Expression and network analysis
Pupation follows the third instar larval (L3) stage in Drosophila. Hence, one can ask whether genes involved in pupation site choice are more likely to have a substantial expression at this stage, especially in the central nervous system, given that this is a behavioural phenotype. To test this hypothesis, the gene expression profiling data from the Drosophila modENCODE project (Brown et al., 2014) were explored, which measured the genome-wide gene expression for five tissues in L3 stages in Drosophila melanogaster, including central nervous system (CNS), digestive system, fat tissue, imaginal disc and salivary glands. As expected, larger fractions of expressed genes (RPKM > 0) for the identified candidate gene dataset among all tested tissues in L3 stages were found, compared with the expression profiling of total annotated protein-coding genes as controls (Figure 6A). We also find that the median expression levels of candidate genes compared to the randomly selected genes are significantly higher in the CNS (Wilcoxon rank sum test, P-value=0.006), but not other tissues (Figure 6B). This observation is consistent with the above prediction that the identified candidate genes for pupation site choice are enriched in the CNS of third instar larvae, where they could have a direct influence on behaviour.
For 18 out of 81 (22%) genes in the 28 candidate regions, their genetic interactions have been documented in Flybase v6.19 (http://www.flybase.org) (Attrill et al., 2016). We used this information to construct a computationally predicted network of genetically interacting genes, allowing one intermediate gene (i.e., a non-candidate gene connecting two candidate genes). This analysis revealed a network of 7 candidate genes from the GWA analyses and 17 computationally recruited intermediate genes (Figure 7). The probability that this network would have arisen when the same number of genes are randomly sampled is very low (p < 2.2E-16).
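The rule of allowing one intermediate gene can be sketched as a search for non-candidate genes that interact with at least two candidate genes. This is a minimal sketch of the idea only; the input format, the function name and the gene labels in the usage note are hypothetical, and the actual network was derived from the FlyBase interaction data cited above.

```python
from collections import defaultdict
from itertools import combinations


def build_network(candidates, interactions):
    """candidates: iterable of candidate gene names.
    interactions: iterable of (gene_a, gene_b) undirected interaction pairs.
    Returns (edges, intermediates): edges linking candidates directly or via
    exactly one non-candidate gene, and the recruited intermediate genes."""
    candidates = set(candidates)
    neighbours = defaultdict(set)
    for a, b in interactions:
        neighbours[a].add(b)
        neighbours[b].add(a)

    edges, intermediates = set(), set()
    for c1, c2 in combinations(sorted(candidates), 2):
        if c2 in neighbours[c1]:
            edges.add((c1, c2))                 # direct candidate-candidate link
        for mid in (neighbours[c1] & neighbours[c2]) - candidates:
            intermediates.add(mid)              # non-candidate bridging two candidates
            edges.add(tuple(sorted((c1, mid))))
            edges.add(tuple(sorted((c2, mid))))
    return edges, intermediates
```

With placeholder candidates A, B and C and interactions A-X, X-B, B-C and C-Y, the gene X is recruited as an intermediate because it bridges A and B, whereas Y is not, since it touches only one candidate.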
Given this network, we asked whether functional tests with genes from the network would confirm their involvement in the pupation height choice phenotype. Hence, we analysed the phenotypic effects of six computationally predicted genes (Scrib, Pnt, Egfr, E2f1, p53 and Ras85D) in the above network (Scrib has been tested in the section of phenotype confirmations, see Figure 5), via direct comparisons of pupation height status between the co-isogenic progenitor stock and the respective gene disruption lines (Table 3) (strain details in Supplementary file 1D). Five of the six genes showed indeed a significant phenotypic effect on pupation height (P-value < 0.05), and the results were consistent for the disruption of different alleles for the same gene.
Phenotypic effects of randomly chosen genes
The omnigenic model (Boyle et al., 2017) predicts that most, if not all, genes may modify the core network of a given quantitative trait, at least when they are expressed in the relevant developmental stage and organ(s). We have therefore set out to use our phenotyping pipeline to test this prediction. We selected 20 genes from the panel of Drosophila gene disruption stocks (Bellen et al., 2011) using the following criteria: 1) with expression in the CNS at the third instar larval stage (RPKM > 0), 2) with transposon (Minos) disruption in the gene coding region, 3) derived from the same co-isogenic progenitor stock, and 4) homozygous disruption viable (strain details in Supplementary file 1E). The latter criterion biases against essential genes (approximately 30% are not homozygous viable), but otherwise the selection was essentially random. Intriguingly, we find again that the majority of genes (12 of 20) show strongly significant effects (p < 0.01) and six additional ones marginally significant effects (0.01 < p < 0.05) on the pupation height choice phenotype (Table 3).
Figure 8 compares the phenotypic effect sizes of the strains tested in this study with their respective pupation height GWAS p-values. It shows that the genes picked because of their GWA significance do not necessarily have the largest phenotypic effects. However, the genes predicted from the network analysis (i.e., Egfr, E2f1, p53 and Ras85D), while not picked from GWA significance, show on average stronger effects on pupation height than other genes (Figure 8A, Wilcoxon rank sum test P-value: 2.5E-05). One could argue, though, that these genes are general cell regulators that might affect many phenotypes. Hence, we also checked their effects on pupal case length, which we measure with the same setup. We find that the effect sizes of these four genes on pupal case length are not particularly pronounced (Figure 8B) and are indistinguishable from average effect sizes of the randomly selected genes (Wilcoxon rank sum test P-value: 0.61). This supports the notion that the predicted network genes might indeed constitute the “core” genes of pupation height choice in Drosophila melanogaster.
However, one has to keep in mind that these strains may 1) have different genetic backgrounds, 2) have disruptions in different regions, either coding regions or regulatory regions, 3) in some cases have been tested only for haplo-insufficient effects, and 4) in some cases include deletions of multiple genes that required more complex crosses to detect their phenotypes (Supplementary file 1C). The first two factors seem to play only minor roles in the phenotype differences, given the observation that similar phenotypic effects can be found for the disruption of the same gene in different genetic backgrounds (experimental result for Pnt, see Table 3), and in different gene regions (experimental result for p53, see Table 3). After removing gene disruption stocks tested only for haplo-insufficient effects and those carrying segmental deletions, the genes predicted from the network analysis still show on average stronger phenotypic effects on pupation height (Wilcoxon rank sum test P-value: 6.7E-05, see Figure 8 - figure supplement 1A), but not on pupal case length (Wilcoxon rank sum test P-value: 0.45, see Figure 8 - figure supplement 1B).
Discussion
We have established a phenotyping pipeline for a behavioural trait in Drosophila that has allowed us to test predictions of the omnigenic model for quantitative traits (Boyle et al., 2017; Liu et al., 2019). Although it is debated whether the term is more useful than the long established terms “polygenic” and “infinitesimal” (Wray et al., 2018), the analysis by (Boyle et al., 2017) has certainly sparked new interest in this almost century-old question. Moreover, it implicitly includes the concept of core networks and their modifiers, which is a step forward compared to the previous definitions. But independent of the relative novelty, we are only now coming into a phase where predictions from these models can be directly tested. Most of the evidence has so far come from human studies, which are often focussed on disease questions and their associated special considerations and limitations (Wray et al., 2018). But for well-developed genetic model systems, such as Drosophila, one can do direct genetic experiments.
There is a fast increasing number of studies based on the DGRP panel that show high heritability of traits (H2 > 0.5), but at the same time a polygenic architecture, even for cases where candidate genes have been predicted. This includes, for example, taste sensitivity to sugars (H2 of 0.63) (Uchizono and Tanimura, 2017), sensitivity to lead toxicity (H2 of 0.76) (Zhou et al., 2016), aggression (H2 of 0.69) (Shorter et al., 2015), DDT resistance (H2 of 0.8) (Schmidt et al., 2017), and adult foraging behaviour (h2 of 0.52) (Lee et al., 2017). Hence, our finding of an H2 of 0.64 (h2 of 0.46) for pupation site choice, with no major loci of genome wide significance, is well within the framework of these other studies. It therefore seems safe to assume that most studies on quantitative traits in this system will yield similar results.
But apart from stating that many genes of small effects are involved in a quantitative trait, the omnigenic concept makes two predictions. The first is that although very many loci may influence a trait, there would still be a set of core genes in a closely interacting network, the action of which is essential, while the other genes are modifiers. The second is that essentially all genes expressed in the relevant tissue and stage may be involved in the trait. We find that both of these predictions are fulfilled in our tests. We can identify a core network that makes predictions for other relevant genes in the network (Figure 7 and 8). And we find that a large fraction of essentially randomly chosen genes have an effect on the phenotype (Figure 8 and Table 3).
Identification of a core network
Using the ad hoc GWA threshold of p < 1.0E-05, we were able to identify a set of 81 candidate genes within 28 associated genetic loci, with significantly higher expression in the CNS at the L3 stage. Further, an interacting network was predicted among them, and the phenotypic effects on pupation height choice of five gene components from the network were experimentally confirmed. These include the well-studied gene scribble, which encodes a scaffolding protein that is part of the conserved machinery regulating apicobasal polarity and organizes the synaptic architecture (Roche et al., 2002). This gene has also been reported to be associated with several other behavioural traits in Drosophila melanogaster, including olfactory behaviour (Ganguly et al., 2003), adult foraging (Lee et al., 2017) and sleep (Harbison et al., 2013). Another well-studied gene, Egfr, which encodes the transmembrane tyrosine kinase receptor for signalling ligands in the TGFα family, has also been found to function in neuronal development and behavioural traits in Drosophila (King et al., 2014; Potdar and Sheeba, 2013). Ras85D encodes a protein that acts downstream of several cell signals, most notably from Receptor Tyrosine Kinases (RTK), and has been reported to be involved in pupal size determination (Li et al., 2016). Another component of the network is p53, which is a general regulator of the cell cycle, but which has also been found to be involved in central nervous system development in Drosophila (Bauer et al., 2007) and behavioural traits, such as the entrainment of the circadian rhythm in mice (Hamada et al., 2014).
GWA versus genetics
The GWA association signal for the randomly chosen genes falls below any significance threshold that one would normally consider using. Accordingly, none of them would have been identified as candidate genes. Most of them have been little studied so far and almost half of them have not even been named as yet (Table 3). Hence, they are indeed likely to act mostly as modifiers of other pathways. The reason why they have not shown up in the GWA could simply be that they do not include segregating variants of sufficient effect size in the population from which the DGRP was derived. In fact, one can expect that most genetic variants present in a natural Drosophila population are unlikely to represent gene-disabling mutations. And the variants that are gene disabling should be rare, i.e., should seldom occur as homozygotes. Hence, testing these genes as homozygous gene disruptions is rather unnatural. Still, an effect of such a disruption is an indication that the gene is involved in some form in the phenotype. But this is a general issue when comparing gene effects from a GWA analysis with those from classic genetic analyses. The former trace the effects of naturally occurring variants, the latter those of gene disruptions. These converge only for human genetic disease studies, but not for studies on natural genetic variation of quantitative traits. One has to keep this dichotomy of genetic views in mind when placing GWA results in the context of classic genetic results.
Conclusion
Our data confirm three major components of the omnigenic genetic architecture, namely that the trait under investigation is polygenic, that there is an underlying core network and that many randomly chosen genes can influence the trait. It should be possible to apply this test also to other phenotypes or genetic systems where the necessary stocks or experimental procedures are available. Evidently, if almost any random gene is involved in a given phenotype, why should one then do a GWA in the first place? Hence, it should be of special interest to study whether GWA p-values provide generally a guide to underlying core networks, as we have found here.
Materials and Methods
Drosophila strains
The list of wildtype strains, DGRP inbred lines, transposon insertion/deficiency stocks and their progenitor lines used in this study, together with their detailed information, is provided in Supplementary file 1. Flies were reared under standard culture conditions (cornmeal–molasses–agar medium, 24°C, 55–75% relative humidity, 12 h light/dark cycle). A HOBO® data logger was placed in the incubator to monitor and record environmental changes, i.e., temperature, light and humidity, throughout the experimental period.
Automated phenotyping of pupation height
A previously established automated pupal case length detection pipeline was adopted and modified for the automatic screening of pupation height measurements (Reeves and Tautz, 2017).
In brief, standard food was dispensed into vials of 28.5 mm diameter and 95 mm height (Genesee Scientific), and the food height (defined as the distance from the surface of the food to the bottom of the vial) was manually measured and recorded for each vial. Once the food vials had fully cooled, 10.1 cm × 10.5 cm squares of overhead projector film (nobo, plain paper copier film, 33638237) were slid into each vial, lining its entire vertical wall. Approximately 10 healthy female flies (15 for inbred stocks) and 5 healthy male flies were introduced into each vial, and a custom-printed semi-transparent label (GA International Inc.) carrying a unique barcode was affixed to the outside of each vial. Adult flies were removed from the vials after 1-2 days, and the vials were kept under the same incubation conditions (see above) for another 8-9 days to allow the offspring to reach the pupal stage. In general, by the 10th day after the parents were initially introduced, the majority of offspring in the vials were present as pupae attached to the transparent film. The film was gently removed from each vial, the food on the lower part was scraped away and any larvae or pupae at the white puparium stage (P1) were removed. The film was then placed into a pre-made plastic frame, which holds the film flat for photographing with bottom illumination in a light-tight box. Batches of the resulting images were then passed to the image analysis procedure.
The open-source, public domain image analysis software CellProfiler (v2.1.0) was applied for the simultaneous recognition of pupae and the measurement of a variety of attributes, with a customized pipeline adopted and slightly modified from Reeves and Tautz (2017). In brief, any “primary object” clearly distinct from the background was first identified without restriction on its size (module “Identify primary objects”). Using the module “Untangle Worms”, identified objects composed of multiple touching pupae were disentangled into distinct pupae. Furthermore, the resulting putative pupae were shrunk and re-propagated outwards for a more precise detection of the edges of each pupa based on boundary changes in pixel intensity (module “Identify secondary objects”). Finally, distinct attributes for the pupae were calculated and a specific confidence class was assigned to each pupa based on its size attribute. The digital outlines of pupae were overlaid onto a cropped version of the original image for easy visualization. In contrast to the pipeline described in Reeves and Tautz (2017), the methods used to distinguish clumped objects and to draw dividing lines between clumped objects were changed to be based on “shape”, as this increases the power of CellProfiler to resolve pupae in close proximity. A manual check on 40 randomly selected films showed that the CellProfiler pipeline can successfully identify 96% of true pupae (sensitivity), with an accuracy of 81% for identified putative pupae. To further improve the detection accuracy, an additional refinement criterion was defined based on the size attributes of “true” pupae from manual curation. Applying the new criterion, the accuracy for pupae detection was improved to around 99.85%, with the loss of only a tiny fraction (< 0.7%) of true positive results.
In addition, a 1 euro cent coin (16.25 mm diameter) was included in each image, to control for changes in camera coordinates and to convert measurements from pixels to millimetres. The pupation height for each pupa was calculated by subtracting the vial food height from the vertical coordinate measurement (CellProfiler parameter: Areashape_Y). Overlaid images and files with a variety of attribute measurements were imported into a FileMaker database (v14, FileMaker Inc.). The quality filtering of pupae and related analyses were conducted with the tools implemented in the database.
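The unit conversion and height calculation described above can be sketched in a few lines of Python. This is a minimal illustration, not the FileMaker implementation used in the study: the function names are hypothetical, and it is assumed that the vertical coordinate is measured upwards from the bottom of the vial, so that subtracting the food height gives the distance above the food surface.

```python
COIN_DIAMETER_MM = 16.25  # diameter of the 1 euro cent coin, as stated in the text


def mm_per_pixel(coin_diameter_px: float) -> float:
    """Scale factor derived from the coin's apparent diameter in the image."""
    if coin_diameter_px <= 0:
        raise ValueError("coin diameter in pixels must be positive")
    return COIN_DIAMETER_MM / coin_diameter_px


def pupation_height_mm(y_px: float, coin_diameter_px: float,
                       food_height_mm: float) -> float:
    """Vertical position converted to mm, minus the vial food height.

    Assumption: y_px is measured upwards from the vial bottom.
    """
    return y_px * mm_per_pixel(coin_diameter_px) - food_height_mm
```

Because the scale factor is recomputed from the coin in every image, small changes in camera distance between imaging sessions do not propagate into the millimetre values.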
Treatment of confounding factors
Pupal density in the vial is a biotic factor that could affect the pupation site selection preference of third instar larvae (Joshi and Mueller, 1993; Sokolowski and Hansell, 1983). Here, individual density was controlled by limiting the number of parents used per vial and by restricting the number of nights they remained before being cleared (see above). To further reduce possible bias from small sample sizes, only vials with a pupal density of at least 15 were considered reliable, and the measurement for each stock had to include at least 6 such reliable vials. All of the tested stocks exhibited a uniformly positive relationship between individual density and the pupation height estimate. The following equation was used to correct the influence of individual density in the vial on the mean estimate of pupation height:
O: pupation height vial mean to be corrected
D: automated estimate of individual density in the vial to be corrected
M: average vial density across whole experiment (set as 70)
S: slope of regression of individual density against pupation height mean vial (set as 0.145)
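As an illustration of how the symbols O, D, M and S could combine, the following Python sketch assumes a simple linear adjustment to the reference density, i.e. corrected height = O - S × (D - M). This form is an assumption inferred from the symbol definitions and the reported positive density effect; the exact expression used in the study may differ, and the function name is hypothetical.

```python
def density_corrected_height(o: float, d: float,
                             m: float = 70.0, s: float = 0.145) -> float:
    """Adjust a vial mean pupation height to the reference density.

    o: uncorrected vial mean pupation height (O)
    d: automated estimate of individual density in the vial (D)
    m: average vial density across the experiment (M, set as 70 in the text)
    s: regression slope of height on density (S, set as 0.145 in the text)

    Assumed form: a vial that is more crowded than the reference is
    corrected downwards, a sparser vial upwards.
    """
    return o - s * (d - m)
```

Under this assumed form, a vial at exactly the reference density of 70 is left unchanged, and a vial with 90 individuals has 0.145 × 20 = 2.9 subtracted from its mean.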
To correct the influence from the change of incubator humidity and other cryptic abiotic factors, two wildtype stocks representing two extreme sides on pupation height (S-317 and S-314) were included and measured in each round of experiments for the phenotyping measurements of DGRP inbred stocks. The correction on incubation environment change was achieved with the following equation:
O: pupation height of target DGRP stock from original measurement with correction of pupae density
H: pupation height measurement of high stock (S-314) for current round of experiment with correction of pupae density
Hu: average pupation height measurement of high stock (S-314) across all rounds of experiments with correction of pupae density
L: pupation height measurement of low stock (S-317) for current round of experiment with correction of pupae density
Lu: average pupation height measurement of low stock (S-317) across all rounds of experiments with correction of pupae density
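One way the symbols O, H, Hu, L and Lu could enter such a correction is a two-point linear rescaling, in which the high and low reference stocks of the current round are mapped onto their across-round averages. The Python sketch below assumes this form; it is a hypothetical reconstruction based on the symbol definitions only, and the function name is not from the study.

```python
def environment_corrected_height(o: float, h: float, hu: float,
                                 l: float, lu: float) -> float:
    """Rescale a density-corrected height using the two reference stocks.

    o:  density-corrected height of the target DGRP stock (O)
    h:  height of the high stock S-314 in the current round (H)
    hu: average height of the high stock across all rounds (Hu)
    l:  height of the low stock S-317 in the current round (L)
    lu: average height of the low stock across all rounds (Lu)

    Assumed form: linear interpolation so that l maps to lu and h maps to hu.
    """
    if h == l:
        raise ValueError("reference stocks must differ within the round")
    return lu + (o - l) * (hu - lu) / (h - l)
```

With this assumed mapping, a target stock that pupates exactly as high as the low (or high) reference stock in a given round receives the long-term average of that reference stock, and intermediate values are scaled proportionally.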
Automatic measurement of pupal case length
The measurement of pupal case length followed the procedure described in (Reeves and Tautz, 2017). In brief, the pupal case length was defined as the length of the major axis of the ellipse that has the same normalized second central moments as the region of identified pupae, measured with the “Areashape_MajorAxisLength” index in CellProfiler. As the pupal case lengths are relatively robust to the pupal density in the vial and the minor change of incubator humidity, the measurement for pupal case length was not corrected for these factors.
Wolbachia infection, sex and maternal/paternal effect test
Two different approaches were used to test whether Wolbachia infection has any effect on pupation site choice: 1) As an indirect approach, pupation height was compared between Wolbachia-infected and Wolbachia-uninfected stocks. 2) Three randomly chosen DGRP inbred lines with Wolbachia infection were used to create Wolbachia-free stocks through two generations of tetracycline treatment (by adding an appropriate volume of 100 mg/ml tetracycline suspended in 99% ethanol to the surface of the solid prepared food), and were then reared for at least another two generations on standard food to avoid any detrimental parental effects (Zeh et al., 2012). Meanwhile, half of the flies from the same three strains were reared on standard food throughout the experiment as controls. Genomic DNA from the above 6 stocks was extracted individually using the DNeasy blood and tissue kit (Qiagen), and the purity and concentration of the resulting DNA were measured with a NanoDrop ND-1000 spectrophotometer (Thermo Fisher). A diagnostic PCR to test for the presence of the Wolbachia wsp gene was done using the primers wsp81F (5’-tggtccaaaatgtgagaaac-3’) and wsp691r (5’-aaaattaaacgctactcca-3’) (Richardson et al., 2012). The conditions for this diagnostic reaction were 35 cycles of 94°C for 15 seconds, 55°C for 30 seconds and 72°C for 1 minute. The expected PCR product length is around 630 bp. A standard (1%) agarose gel electrophoresis was used to test for the presence of the PCR product, with the broad range Quick DNA Marker (NEB #N0303) as loading ladder. Pupation height was then measured and compared between the three Wolbachia-infected and the three Wolbachia-free lines. In addition, the same procedure was applied to three randomly selected Wolbachia-uninfected DGRP lines to exclude the possibility that the tetracycline treatment itself had an influence on pupation height.
One previously published dataset (Reeves and Tautz, 2017), consisting of pupation height and sex information for individual pupae, was used to test for sexual dimorphism in pupation site choice. In brief, 2,340 female pupae and 1,935 male pupae from 728 vials were randomly selected and their pupation site coordinates were measured and recorded. For every sexed pupa, the deviation from the corresponding vial average was calculated, and the average deviations of the two sexes were compared.
A reciprocal crossing approach was used to detect any maternal effect (e.g., a genetic effect from mitochondria) or paternal effect on pupation site selection. Two pairs of DGRP inbred lines, each combining a high and a low pupation height line (DR_21 and DR_99; DR_73 and DR_81), were selected for this analysis. The pupation height of F1 offspring from both directions of the cross, i.e., virgin females from the high stock crossed with males from the low stock, and vice versa, was measured and compared with the phenotype of their parental stocks.
Estimates of heritability
The broad sense heritability (H2) was estimated from the variance components of a linear model of the form: Phenotype = Population mean + Line effect + error (Schmidt et al., 2017). Total phenotypic variance was estimated as Genetic Variance (Gv) + Environmental Variance (Ev), and H2 was thus estimated as Gv/(Gv + Ev). This was implemented in IBM SPSS Statistics (version 22), with pupation height as the dependent variable and DGRP inbred line names as a random factor. Additionally, the measured number of pupae in each vial and the average humidity during each round of experiments were included as covariates in the statistical model, to assess whether they substantially influenced the estimate.
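To make the variance component calculation concrete, the following Python sketch estimates H2 for a balanced one-way design (the same number of replicate measurements per inbred line) using the classical ANOVA method-of-moments estimator. This is an illustrative stand-in and an assumption: the estimator used internally by SPSS may differ (e.g., a likelihood-based method), covariates are not included, and the function name is hypothetical.

```python
def broad_sense_heritability(lines: dict) -> float:
    """Estimate H2 = Gv / (Gv + Ev) from a balanced one-way layout.

    lines maps each inbred line name to a list of replicate measurements.
    Gv is the among-line variance component, Ev the within-line variance.
    """
    groups = list(lines.values())
    k = len(groups)
    if k < 2:
        raise ValueError("need at least two lines")
    n = len(groups[0])
    if n < 2 or any(len(g) != n for g in groups):
        raise ValueError("need a balanced design with >= 2 replicates per line")

    grand_mean = sum(sum(g) for g in groups) / (k * n)
    line_means = [sum(g) / n for g in groups]

    # Mean squares among lines and within lines.
    ms_between = n * sum((m - grand_mean) ** 2 for m in line_means) / (k - 1)
    ms_within = sum((x - m) ** 2
                    for g, m in zip(groups, line_means) for x in g) / (k * (n - 1))

    gv = max(0.0, (ms_between - ms_within) / n)  # negative estimates set to 0
    ev = ms_within
    if gv + ev == 0:
        raise ValueError("no phenotypic variance in the data")
    return gv / (gv + ev)
```

For example, two lines with replicate values (1, 3) and (5, 7) give a mean square among lines of 16 and within lines of 2, hence Gv = 7, Ev = 2 and H2 = 7/9.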
The narrow sense heritability was estimated as the proportion of variance in a phenotype explained by all available genetic variants used for mapping, an estimate that is often called “SNP heritability” (Wray et al., 2013). In practice, a genetic relationship matrix (GRM) between pairs of inbred strains was built from all the DGRP annotated genetic variants using GEMMA (Version 0.96) (Zhou and Stephens, 2012), and the narrow sense heritability (denoted as PVE) was then calculated from this GRM with the univariate linear mixed model (Zhou and Stephens, 2012) implemented in GEMMA.
Principal component analysis (PCA) and genome-wide association analysis
The genetic variant information and major genomic inversion status were retrieved from DGRP freeze 2 (Huang et al., 2014). Genetic variants with missing values above 20% or a minor allele frequency below 5% were excluded from further analysis, leaving 1,903,028 genetic variants that passed these criteria. To assess the possible influence of population structure on pupation site selection, the PCA module from PLINK v1.90 (Purcell et al., 2007) was used to identify the top principal components (PCs) from the filtered genetic variant data. The projection length of each strain on the top 20 PCs was used to test the influence of cryptic population structure on pupation site selection.
The linear regression model implemented in PLINK (Purcell et al., 2007) was used to perform association analysis for the above filtered genetic variants. The R package “qqman” (Turner, 2014) was exploited for the visualization of GWA results in a Manhattan plot and qq-plots. Linear regression models used in this study include:
Pupation height ∼ genotype
Pupation height ∼ genotype + pupae case length
To define a genome-wide significance threshold, we randomly assigned phenotypes to individuals 1,000 times (thus preserving genetic structure) and performed the mapping in PLINK, recording the lowest SNP association p-value for each permuted data set. The significance threshold (P-value < 5E-8) was then defined as the 5th percentile of these values across the 1,000 permutations. As this stringent threshold returned no significant genetic variants, a more permissive significance threshold of p-value < 1E-5 was applied in practice. The associated genes for each genetic variant were predicted by SnpEff (Cingolani et al., 2012) with default parameters. In brief, all protein-coding genes within 5 kb up- or downstream of a target genetic variant were taken as its associated genes.
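The permutation logic can be illustrated with a small Python sketch. As a simplification, it records the largest squared correlation (r2) between any variant and the shuffled phenotype instead of the lowest p-value; for a simple linear regression with a fixed sample size and no missing genotypes the two are monotonically related, so the 95th percentile of the maximum r2 corresponds to the 5th percentile of the minimum p-value. The study itself used PLINK; the function names and data layout here are hypothetical.

```python
import random


def r_squared(x, y):
    """Squared Pearson correlation between two equally long numeric lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0  # monomorphic variant or constant phenotype
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    return sxy * sxy / (sxx * syy)


def permutation_threshold(genotypes, phenotype, n_perm=1000, alpha=0.05, seed=0):
    """Genome-wide threshold on the r2 scale from phenotype permutations.

    genotypes: list of variants, each a list of allele counts per line
    phenotype: list of trait values in the same line order
    Shuffling only the phenotype keeps the genotype structure intact.
    """
    rng = random.Random(seed)
    shuffled = list(phenotype)
    best_per_permutation = []
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        best_per_permutation.append(max(r_squared(g, shuffled) for g in genotypes))
    best_per_permutation.sort()
    index = min(len(best_per_permutation) - 1,
                int((1 - alpha) * len(best_per_permutation)))
    return best_per_permutation[index]
```

A variant whose observed r2 exceeds the returned value would be genome-wide significant at roughly the chosen alpha, under the assumptions stated above.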
The genotypic linkage disequilibrium (LD) for each pair of significant genetic variants was tested by calculating the squared correlation estimator r2. Moreover, the r2 values between each significant genetic variant and all other genetic variants were also computed. A significant genetic region (QTL) was defined by the positions of the most distant downstream and upstream genetic variants showing a minimum r2 of 0.8 to the significant genetic variant. PLINK (Purcell et al., 2007) was used for all r2 calculations. All the associated genes as defined above, together with the genes within the LD regions, were considered as candidate genes for further analysis.
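The region definition can be sketched as follows in Python. The example computes r2 as the squared correlation of genotype allele counts and returns the outermost positions of variants with r2 of at least 0.8 to a given significant variant. It is a minimal sketch that assumes all variants lie on the same chromosome arm and have no missing genotypes; the study used PLINK, and the function names are hypothetical.

```python
def ld_r2(g1, g2):
    """Squared correlation of allele counts between two variants."""
    n = len(g1)
    m1, m2 = sum(g1) / n, sum(g2) / n
    s11 = sum((a - m1) ** 2 for a in g1)
    s22 = sum((b - m2) ** 2 for b in g2)
    if s11 == 0 or s22 == 0:
        return 0.0  # r2 is undefined for monomorphic variants; treated as 0
    s12 = sum((a - m1) * (b - m2) for a, b in zip(g1, g2))
    return s12 * s12 / (s11 * s22)


def ld_region(lead_index, positions, genotypes, min_r2=0.8):
    """Return (start, end) spanned by variants in LD with the lead variant.

    positions: base-pair position of each variant
    genotypes: allele counts per line for each variant, in the same order
    The lead variant itself is always part of the region.
    """
    lead = genotypes[lead_index]
    linked = [positions[lead_index]]
    for pos, geno in zip(positions, genotypes):
        if ld_r2(lead, geno) >= min_r2:
            linked.append(pos)
    return min(linked), max(linked)
```

Genes overlapping the returned interval would then be added to the candidate list alongside the genes within 5 kb of the significant variant itself.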
Functional validation experiments
Two types of gene disruption were used to disrupt candidate genes: 1) transposon insertion in the candidate gene region (Bellen et al., 2011); 2) deletion of a DNA segment within which the candidate genes are located. Detailed information about the gene disruption stocks and their progenitors can be found in Supplementary file 1. The transposon insertion either introduces a premature stop codon that disrupts protein synthesis of the candidate gene, or disrupts regulatory elements and could thereby alter the expression of the candidate gene. The deficiency stocks carry a deletion of an approximately 10 kb DNA segment within which the candidate gene is located. Based on the gene disruption type and the selection marker on the balancer chromosome, the functional validation experiments were conducted as follows:
1) Transposon insertion lines: homozygous insertion complete-viable
The pupation height of the transposon insertion lines and their corresponding progenitor lines was directly measured and compared.
2) Transposon insertion lines: homozygous insertion semi-lethal (or semi-viable)
Only virgin flies homozygous for the insertion were selected from these transposon insertion strains for experimental validation. The pupation height of the transposon insertion lines and their corresponding progenitor lines was then measured and compared.
3) Homozygous deficiency complete-lethal: with detectable marker (Tubby) at pupae stage
Four out of seven deficiency stocks segregate balancer chromosomes with the Tb selection marker (short, rounded pupae), due to the complete lethality of the homozygous large DNA segment deletion. The pupation site choices of the background stock (BG line) and of the F1 generation from the cross of each segment deletion stock (virgin females) with its BG stock (males) were measured with the phenotyping pipeline described above. The pupation heights of hemi-deletion individuals without the Tb marker (no balancer chromosome present) were compared with those from the BG stocks. The absence of the Tb marker in an individual pupa was determined by its pupal case length (> 73 pixels for Areashape_major_len from the output of CellProfiler), on the basis of the clear distinction between individual pupae with and without the Tb marker. Moreover, a manual check was done to classify ambiguous individuals.
4) Homozygous insertion/deficiency complete-lethal: without detectable marker at pupae stage
One transposon insertion line and three deficiency lines segregate balancer chromosomes with no detectable marker at the pupal stage (curly wing or stubble), due to the complete lethality of the homozygous insertion or large DNA segment deletion. Virgin females from these insertion or deficiency stocks were crossed with males from the BG lines to generate the F1 generation. Virgin F1 females without the screening markers at the adult stage were selected and backcrossed with males from the BG lines. The pupation height of the F2 generation was measured and compared with that of the progenitor stocks. It is worth noting that the significance of the phenotypic effects of candidate genes detected with this approach is likely to be underestimated, as only half of the individuals in the experimental group are expected to carry the hemi-deletion.
Expression and genetic interaction network analysis
The gene expression profiling data from the Drosophila modENCODE project (Brown et al., 2014) were used for expression analysis. These were generated by measuring genome-wide gene expression for 5 tissues at the third instar larval stage, including central nervous system (CNS), digestive system, fat, imaginal disc and salivary glands. The expression level of each gene within each tissue was measured in units of RPKM. The fraction of expressed genes (RPKM > 0) among the candidate genes was compared with that among all annotated protein-coding genes. Moreover, the average gene expression level of the expressed candidate genes was also compared with that of randomly selected gene sets of the same size (1000 random draws).
The genetic interaction database was directly downloaded from Flybase V6.19 (Attrill et al., 2016). A network was extracted whose edges were either direct connections between candidate genes or connections bridged by only one gene not in the candidate gene list. The significance of the size of the largest cluster among the subnetworks was assessed by a randomization test, in which we randomly extracted subnetworks 1000 times with the same number of input genes. The p-value was determined by dividing the number of instances where the size of the largest cluster exceeded the observed largest size by the total number of randomizations (Zhou et al., 2016).
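The randomization test can be sketched in Python as follows. The sketch makes several assumptions that are not specified in the text: a bridging gene is any non-input gene that interacts with at least two input genes, cluster size counts both input and bridging genes, and random gene sets are drawn from all genes present in the interaction table. All function names are hypothetical.

```python
import random
from collections import defaultdict


def build_adjacency(edges):
    """Undirected adjacency map from (gene_a, gene_b) interaction pairs."""
    adj = defaultdict(set)
    for a, b in edges:
        if a != b:
            adj[a].add(b)
            adj[b].add(a)
    return dict(adj)


def largest_cluster(genes, adj):
    """Size of the largest connected cluster among the input genes.

    Edges are either direct input-input links or links via one bridging
    gene; links between two bridging genes are not followed.
    """
    genes = set(genes)
    bridges = {g for g in adj if g not in genes and len(adj[g] & genes) >= 2}
    nodes = genes | bridges
    seen, best = set(), 0
    for start in nodes:
        if start in seen:
            continue
        seen.add(start)
        stack, size = [start], 0
        while stack:
            node = stack.pop()
            size += 1
            for nb in adj.get(node, ()):
                if nb in nodes and nb not in seen and (node in genes or nb in genes):
                    seen.add(nb)
                    stack.append(nb)
        best = max(best, size)
    return best


def cluster_size_p_value(candidates, edges, n_random=1000, seed=0):
    """Observed largest cluster size and its randomization p-value."""
    adj = build_adjacency(edges)
    observed = largest_cluster(candidates, adj)
    universe = sorted(adj)
    k = len(set(candidates))
    rng = random.Random(seed)
    exceed = sum(largest_cluster(rng.sample(universe, k), adj) > observed
                 for _ in range(n_random))
    return observed, exceed / n_random
```

The p-value follows the definition in the text: the fraction of random gene sets whose largest cluster exceeds the observed one.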
Moreover, the phenotypic effects of six genes (Scrib, Pnt, Egfr, E2f1, p53 and Ras85D) in the above network were further checked via direct comparisons of pupation height between the co-isogenic progenitor stock and the transposon disruption stock of each target gene (Supplementary file 1D). Where no disruption line in the gene coding region was available, lines with a transposon disruption in gene regulatory regions (e.g., intron or UTR) were selected for experimental verification. The phenotyping tests were conducted with the same procedure as the functional validation experiments above.
Phenotypic effect test on random stocks
Twenty gene disruption stocks from the panel of the Drosophila gene disruption project (Bellen et al., 2011) were selected using the following criteria: 1) expression in the CNS at the third instar larval stage (RPKM > 0), 2) disruption in the gene coding region by the same type of transposon (Minos), 3) derivation from the same co-isogenic progenitor stock, and 4) viability of the homozygous disruption. The phenotyping tests were conducted with the same procedure as the functional validation experiments above. The GWA association p-value of a gene was taken as the lowest p-value among all genetic variants within the gene and within 5 kb up- or downstream of it.
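The gene-level GWA p-value defined above amounts to a minimum over a positional window. A minimal Python sketch, assuming variants are given as (position, p-value) pairs on the same chromosome arm as the gene (the function name and data layout are hypothetical):

```python
def gene_level_p_value(gene_start, gene_end, variants, flank=5000):
    """Lowest association p-value within the gene plus flanking sequence.

    gene_start, gene_end: gene coordinates in base pairs
    variants: iterable of (position, p_value) pairs on the same chromosome arm
    flank: window added on both sides of the gene (5 kb in the text)
    Returns None if no variant falls inside the window.
    """
    lo, hi = gene_start - flank, gene_end + flank
    hits = [p for pos, p in variants if lo <= pos <= hi]
    return min(hits) if hits else None
```

For a gene spanning positions 5,000 to 7,000, a variant at position 12,001 lies just outside the 5 kb window and would not contribute, however small its p-value.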
Data availability
All the Drosophila melanogaster strains used in this study are publicly available through either the Bloomington Drosophila Stock centre (https://bdsc.indiana.edu/) or the EHIME Drosophila stock centre (https://kyotofly.kit.jp/cgi-bin/ehime/index.cgi). Detailed information about the wildtype strains, DGRP inbred lines and gene disruption stocks is provided in Supplementary file 1. The primers used for Wolbachia infection detection are listed in the text of this section above.
Competing interests
DT: Senior editor, eLife. The other authors declare that no competing interests exist.
Supporting information
Supplementary file 1: Information for the fly stocks used in this study.
Acknowledgements
We are grateful to Wen Huang for help with the Drosophila Genetic Reference Panel (DGRP) stocks. We thank the members of the Tautz lab for helpful discussions and suggestions. We thank Anita Moeller and Elke Blohm Sievers for their excellent technical help and advice in conducting the experiments. This work was supported by institutional funding through the Max-Planck Society.