Abstract
Recent theory posits that adaptive evolution of reproductive proteins should depend on rates of female remating. In particular, selection on reproductive proteins is proposed to be weak unless females remate frequently, in which case cryptic female choice and sperm competition impose stronger selection. Here, we test these predictions by explicitly examining the role of selection in the molecular evolution of sperm genes in Lepidoptera, the butterflies and moths. Males of this order produce both fertilizing eupyrene sperm and a secondary apyrene type that lacks DNA. Based on population genetic analyses in two species, the monandrous Carolina sphinx moth and the highly polyandrous monarch butterfly, we see evidence for increased selection in fertilizing sperm, but only in the polyandrous species. This signal comes primarily from a decrease in non-synonymous polymorphism in sperm proteins compared to the rest of the genome, indicative of strong purifying selection. Investigation of the distribution of fitness effects of new non-synonymous mutations in monarch sperm confirms stronger selection on sperm proteins in monarchs, with very few neutral variants and weakly deleterious variants and a preponderance of strongly deleterious variants. Additionally, sperm genes in the monarch show an elevation of beneficial variants compared to the rest of the genome, suggesting a role for increased positive selection. Our results suggest that sperm competition can be a powerful selective force at the sequence level as well.
Introduction
With the rise of genomic data, the study of molecular evolution has broadened from considering one or a few genes and their protein products [1–3], to leveraging whole-genome data to look for evidence of selection across gene classes or genomic regions [4–6]. Such studies share the common goal of linking molecular evolutionary patterns to selective processes and more broadly determining the relative importance of adaptive and neutral evolution at the sequence level [7,8]. Throughout the history of these studies, one common finding is that reproductive proteins often diverge remarkably rapidly between species [9–11]. This pattern raises two questions: does rapid divergence result from adaptation, and if so, what is the source of selective pressure that drives this adaptation?
In many studies, adaptation is posited to explain rapidly evolving reproductive proteins [9,11,12]. Mechanistically, sexual antagonism or speciation through establishment of post-copulatory, pre-zygotic isolation are proposed to create strong directional selection, leading to the elevated divergence so often seen [10,13]. For certain reproductive proteins, like sperm-egg interaction pairs, there is compelling evidence for adaptive co-evolution [14,15], but, more commonly, claims of adaptive evolution lack evidence for a selective mechanism. Moreover, this framework does not explain cases in which divergence of reproductive proteins does not substantially differ compared to the rest of the genome. Recently, an alternative hypothesis was posited by Dapper and Wade [16]: reproductive proteins may diverge more quickly than other proteins due to relaxed selection and an increased influence of genetic drift; adaptive evolution may only be a strong force in systems with high rates of polyandry-mediated selection.
As explained by Nearly Neutral Theory [8], the efficacy of selection on a set of proteins depends on the effective population size (Ne) on which selection can act. Genes encoding reproductive proteins are typically expressed in only half of a given population (males or females), thus cutting in half the size of the population under selection [17,18]. Furthermore, proteins involved in postcopulatory events, like sperm competition, may only experience selection in the presence of a competitor’s proteins, which substantially decreases the opportunity for selection if females seldom mate more than once in a breeding season. This logic predicts that reproductive proteins can diverge more quickly than the rest of the genome due to relaxed selection. Adaptive evolution, particularly in sperm proteins, should only be a significant force in species with high rates of polyandry and may in fact slow rates of protein divergence in these taxa [16]. Thus, observation of divergence data alone is not sufficient to demonstrate adaptation is occurring.
Evidence for adaptation can be gleaned from rates of divergence, however, when they are paired with rates of polymorphic variation in proteins within species, as first formalized by McDonald and Kreitman [1]. From these two quotients, one can estimate the proportion of divergences fixed by adaptation (α) as well as the distribution of fitness effects of new mutations [19] to assess the role of selection in molecular evolution [20]. Studies using these McDonald-Kreitman frameworks to directly test for adaptation in reproductive proteins are rarer than the observation of elevated divergence and have mainly focused on secondary reproductive proteins, i.e. seminal fluid proteins [21–23] and female reproductive tract proteins [24], rather than proteins expressed by eggs and sperm themselves [but see 23 for an example of a sperm protein under positive selection]. Here, we examine the molecular evolution of sperm proteins and genes with sex-limited expression in a pair of lepidopteran insects (a moth and a butterfly) to test the importance of polyandry on adaptive evolution of reproductive proteins.
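As a minimal sketch of the count-based estimate described above, the standard McDonald-Kreitman estimator can be written as α = 1 − (Ds·Pn)/(Dn·Ps), where Dn and Ds are non-synonymous and synonymous divergences and Pn and Ps are non-synonymous and synonymous polymorphisms. The function name and the counts below are hypothetical and chosen only for illustration; they are not data from this study.

```python
def mk_alpha(dn, ds, pn, ps):
    """Proportion of adaptive substitutions: alpha = 1 - (Ds * Pn) / (Dn * Ps).

    dn, ds: non-synonymous and synonymous divergence counts
    pn, ps: non-synonymous and synonymous polymorphism counts
    """
    if dn == 0 or ps == 0:
        raise ValueError("alpha is undefined when Dn or Ps is zero")
    return 1.0 - (ds * pn) / (dn * ps)


# Hypothetical counts (illustration only).
# Excess non-synonymous divergence relative to polymorphism gives alpha > 0.
adaptive = mk_alpha(dn=40, ds=100, pn=20, ps=100)

# Equal ratios (Dn/Ds == Pn/Ps) give alpha == 0, the neutral expectation.
neutral = mk_alpha(dn=20, ds=100, pn=20, ps=100)

# Excess non-synonymous polymorphism gives a negative value, often read as a
# sign of segregating weakly deleterious variants.
negative = mk_alpha(dn=10, ds=100, pn=20, ps=100)
```

In this formulation a drop in Pn alone, with divergence unchanged, is enough to raise α.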
Sperm Dimorphism
Sperm proteins make an appealing subject for the study of reproductive protein evolution, because (aside from the obvious fact that they are an important class of reproductive proteins) sperm cells demonstrate the same striking pattern of rapid divergence at the morphological level that is so well-documented for reproductive proteins at the sequence level [26]. Sperm cells range from small and plentiful to gigantic [27] or super-structure-forming [28] and indeed, this variation exists at every level, from variability within individual males to fixed differences between species [29–33]. In many independently evolved cases, males of some taxa, including the Lepidoptera, consistently produce two different sperm types in a phenomenon known as sperm dimorphism. In all cases examined, only one of the two is capable of fertilization [30,34–37], leaving the adaptive significance of these non-fertilizing gametes unclear.
Males of nearly all species of butterflies and moths produce two distinct sperm types: one fertilizing sperm type (eupyrene) and a second type (apyrene) that lacks a nucleus and nuclear DNA [38]. The function of apyrene sperm is poorly understood, but it is incapable of fertilizing eggs. Nevertheless, its production is hormonally regulated and occurs in a developmentally predictable way, implying a novel gain of function rather than loss of fidelity in spermatogenesis [39], and evidence from organismal studies suggests that it plays some functional role(s) in reproduction. Males can control the ratio of the two sperm types in their ejaculate and typically transfer to females 10-20 times as much apyrene sperm as eupyrene sperm [40], leading some to suggest that apyrene sperm play a role in sperm competition [29,41].
As discussed above, sperm competition should create a distinctive pattern of molecular evolution for involved proteins [16]. We perform the first molecular evolutionary analyses of dimorphic sperm, assessing patterns of both polymorphism and divergence among sperm proteins from both eupyrene and apyrene sperm using proteomic datasets of two species: the monarch butterfly, Danaus plexippus [42], and the Carolina sphinx moth, Manduca sexta [43]. North American monarchs spend time at incredibly high density in overwintering colonies in Mexico [44] and, owing to these unique population dynamics, have some of the highest female remating rates observed in Lepidoptera. Female monarchs mate up to 14 times in the wild [45,46], creating ample opportunity for sperm competition. In contrast, Carolina sphinx moths are typically monandrous [47], making sperm competition rarely relevant as a selective force. Taking advantage of this contrast, we investigate the differences in strength of selection between the two sperm morphs to infer functional significance. To complete these analyses, we have generated the first published set of whole-genome resequencing data for Manduca sexta from a wild population (summarized in Table S1). To test the general predictions for relaxed selection in sex-limited proteins, we used RNA-seq gene expression datasets from previously published data for Carolina sphinx moths [48] and newly generated data for the monarch butterfly (summarized in Table S2).
Results
Differences Between Sperm Proteins and the Background Genome
First, we considered the sperm proteome as a whole and compared adaptive evolution of genes found in sperm to those in the background genome, defined as all autosomal protein coding genes not present in the sperm proteome. Z-linked genes were excluded from the analysis because mixed-sex sampling caused variable allele counts between autosomes and sex chromosomes; in both species autosomal genes contained >90% of the sperm proteome [49]. We counted and classified synonymous and non-synonymous single nucleotide polymorphisms within species and divergences to a congener (Danaus gilippus for the monarch, and Manduca quinquemaculata for the Carolina sphinx). These quantities were used to estimate the proportion of adaptive substitutions (α) per gene class for both the sperm proteome and the background genome. We found no difference in α between the sperm proteome and the rest of the genome in the Carolina sphinx (p = 0.40892 by permutation testing, Figure 1A, left); for monarchs, however, the sperm proteome showed a significantly greater proportion of adaptive substitutions than the rest of the genome (p = 0.00006, Figure 1A, right). Note that in the strict sense, negative α values are not defined and likely point to an abundance of weakly deleterious variants within populations or complex demographic histories [50]; nevertheless, these confounding variables should not differentially affect genes within species unless there are true differences in selection in gene sets.
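The comparison of α between gene classes by permutation can be sketched as follows. This is an illustrative scheme under stated assumptions, not necessarily the exact procedure used here: we assume each gene is summarized as a tuple of counts (Dn, Ds, Pn, Ps), that class-level α is computed from counts summed across genes, and that class labels are shuffled among genes to build a two-sided null distribution. All function names and the toy counts are hypothetical.

```python
import random


def class_alpha(genes):
    """Class-level alpha from summed per-gene (dn, ds, pn, ps) counts."""
    dn = sum(g[0] for g in genes)
    ds = sum(g[1] for g in genes)
    pn = sum(g[2] for g in genes)
    ps = sum(g[3] for g in genes)
    return 1.0 - (ds * pn) / (dn * ps)


def permutation_test(focal, background, n_perm=10000, seed=0):
    """Two-sided permutation p-value for the difference in class-level alpha.

    Gene labels are shuffled while class sizes are kept fixed.
    """
    rng = random.Random(seed)
    observed = class_alpha(focal) - class_alpha(background)
    pooled = list(focal) + list(background)
    k = len(focal)
    extreme = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)
        diff = class_alpha(pooled[:k]) - class_alpha(pooled[k:])
        if abs(diff) >= abs(observed):
            extreme += 1
    return observed, extreme / n_perm


# Toy data (illustration only): the focal class has reduced Pn.
focal = [(10, 20, 2, 20)] * 5
background = [(5, 20, 10, 20)] * 5
obs, p = permutation_test(focal, background, n_perm=2000)
```

Summing counts before computing α, rather than averaging per-gene values, avoids undefined estimates for genes with zero Dn or Ps, which is one reason gene-class estimates are commonly used.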
To better understand the relative roles of polymorphism and divergence in sperm and background genes, we investigated the individual components of α: non-synonymous polymorphism (Pn), synonymous polymorphism (Ps), non-synonymous divergence (Dn), and synonymous divergence (Ds). We compared the scaled estimates of each (e.g. non-synonymous polymorphisms per non-synonymous site) to the background genome within each species using a Wilcoxon-Mann-Whitney test (Figure 1B). We found no differences between sperm and the background for any class of variants in M. sexta (Pn: W = 3014100, p = 0.5964; Ps: W = 2879300, p = 0.1830; Dn: W = 3068300, p = 0.2009; Ds: W = 2895700, p = 0.2686). The signal for elevated α in monarchs came primarily from non-synonymous polymorphism, which was greatly depressed in sperm (W = 3062400; p = 3.224 × 10⁻¹¹), while other classes were comparable between sperm and the background genome (Ps: W = 2684200, p = 0.2720; Dn: W = 2506400, p = 0.1300; Ds: W = 2544400, p = 0.3437).
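For readers unfamiliar with the W statistic, the following sketch shows how it can be computed for two samples of scaled estimates. It assumes the common convention in which W counts the pairs where a value from the first sample exceeds a value from the second, with ties counted as one half; whether the values reported here follow that convention is an assumption. The helper names and toy values are hypothetical, and the p-value would in practice come from a statistics package.

```python
def scaled_estimate(count, sites):
    """Variants per site, e.g. non-synonymous polymorphisms per non-synonymous site."""
    return count / sites


def wilcoxon_w(x, y):
    """Mann-Whitney W for sample x against sample y.

    Counts pairs (a, b) with a > b; ties contribute 0.5.
    """
    w = 0.0
    for a in x:
        for b in y:
            if a > b:
                w += 1.0
            elif a == b:
                w += 0.5
    return w


# Toy per-gene scaled Pn values (illustration only).
background_pn = [scaled_estimate(c, 1000) for c in (4, 5, 6)]
sperm_pn = [scaled_estimate(c, 1000) for c in (1, 2, 3)]
w = wilcoxon_w(background_pn, sperm_pn)  # every background value is larger
```

W ranges from 0 to the product of the two sample sizes, so values near either extreme indicate that one sample is consistently larger than the other.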
Next, we leveraged orthology, as established by Whittington et al. [42], to test for differences in mating system while controlling for the effects of sperm proteome content. A substantial number of proteins, hereafter referred to as sperm homologs, are found in the sperm proteomes of both species and offer the opportunity to directly assess the selective pressures experienced by the same genes with conserved function in species with different levels of postcopulatory selection. Nearly half of the monarch sperm proteome (~42%, 216 genes, Figure 2A) shares an ortholog in the sperm proteome of M. sexta; reciprocally, there are 236 genes (37%) in the Carolina sphinx sperm proteome that share an ortholog in the monarch sperm proteome (due to a few one-to-several orthologs). We tested for differences in adaptive evolution between sperm homologs (containing an ortholog in the other species’ sperm proteome) and proteins unique to one sperm proteome (orthology outside of sperm or no detectable orthology). In Carolina sphinx moths, genes of these two classes did not differ in the proportion of adaptive substitutions with permutation testing (p = 0.6174, Figure 2B). In monarchs, we detected an increased proportion of adaptive substitution in the sperm homologs compared to unique proteins (p = 0.0372, Figure 2B). Comparing between species, sperm homologs had much higher α values in monarchs than in Carolina sphinx moths (p = 0.00008), while genes with unique expression in either species did not show differences between species (p = 0.5922).
Patterns of Adaptive Evolution in Sex-Specific Tissues
We also sought to test the hypothesis that reproductively involved genes with sex-limited expression are under relaxed selection compared to constitutively expressed genes. To do so, we compiled a set of RNA-seq expression data from a combination of published data and newly sequenced RNA from several somatic tissues (head, thorax, and gut) and reproductive structures (testes and accessory glands) in adult males. With these data, we calculated the tissue specificity metric, SPM [51], of every gene in the genome assembly with a sliding cutoff. This metric ranges from ubiquitous expression (near 0) to single-tissue specific (1). Owing to the non-independence of estimates at each point, e.g. all genes that pass a strict threshold of SPM > 0.9 are represented in every estimate at lower thresholds, these data were not significance-tested. Nevertheless, there are no obvious differences in α between genes with testes-specific expression (sex-limited) and the rest of the genes in the genome at any specificity threshold in either species. Moreover, results from both species’ sperm proteomes hold at all SPM thresholds. In monarchs, sperm genes show consistently higher α than the whole genome (Figure 3A). For the Carolina sphinx moth, sperm genes do not diverge from the whole genome, except at very high specificity thresholds, where sperm genes show lower α (Figure 3B). Finally, we note a general trend for increasing rates of adaptive evolution in more tissue-specific genes in monarchs but not in the Carolina sphinx moth. This result mirrors the inferred distribution of fitness effects; there is little evidence for adaptive evolution, as indicated by positively selected variants, in the Carolina sphinx moth background genome.
Distribution of fitness effects
Based on different patterns of selection between sperm proteins and the background genome in monarch butterflies, we investigated the distribution of fitness effects (DFE) of new non-synonymous mutations in these genes. Using the same samples as above, we generated site frequency spectra to estimate the DFEs and α for both the Carolina sphinx and monarch background genome, whole sperm proteome, and sperm homolog subset using a more complex likelihood model in the program polyDFE [52]. Owing to reduced power in the subsets of the proteome, particularly apyrene-specific genes, sperm dimorphism was not addressed with these methods.
The distributions of fitness effects of new non-synonymous mutations suggest stronger selection on sperm genes than the rest of the genome in monarchs but not Carolina sphinx moths. In the Carolina sphinx moth, the DFE is quite consistent across the background genome, the whole sperm proteome, and the sperm homologs (Figure 4, left panels). By contrast, the DFE in monarchs differs substantially between the background genome and the sperm proteome. Relative to the background, the sperm proteome shows a dearth of weakly deleterious and effectively neutral variants, with a concurrent increase in strongly deleterious and beneficial variants. This pattern is even more exaggerated in the sperm homologs, where almost no neutral variants are detectable (Figure 4, right). Indeed, the decrease in weakly deleterious variants in monarch sperm genes should weaken the downward bias in the simple α calculation compared to the background, so the observed difference in α from our previous analysis is likely exaggerated. However, the increase in positively selected variants in both the whole sperm proteome and sperm homologs suggests that there is a true increase in adaptive evolution compared to the background genome.
Using these DFEs to estimate α with polyDFE, we see a slight increase for the sphinx moth sperm proteome compared to the background, but this pattern is not localized in the sperm homologs. Moreover, we see a much larger difference in selection on sperm protein variants compared to the rest of the genome only in monarch butterflies. Here, upwards of 90% of substitutions are inferred to be a result of adaptive evolution in both the whole sperm proteome and the shared sperm orthologs (Supplemental Figure 1). We note that estimates of α are influenced by the ways in which demography is (or is not) accounted for [53], so the values obtained with this more complex likelihood method differ from our estimates based solely on count data. Likely, the point estimates here are closer to the true proportions of adaptive substitutions than are the values generated from our count data, but polyDFE may possess its own biases. In any case, we are more interested in relative patterns between classes of genes than accurately estimating the true, absolute value of α per se.
Molecular evolution in dimorphic sperm
To address the question of apyrene sperm function, we next considered the different subsets of the sperm proteomes. The two datasets consisted of three classes of sperm proteins: unique to eupyrene sperm, unique to apyrene sperm, or found in both cell types (henceforth “shared”). We assessed differences in selective pressures between the sperm morphs with another series of permutation tests, both comparing parts of the sperm proteome to the background genome and comparing parts of the proteome to each other. We did not estimate DFEs here owing to reduced power in the subsets of the proteome.
As expected based on the whole-proteome results, neither eupyrene-specific (p = 0.55912), shared (p = 0.4647), nor apyrene-specific proteins (p = 0.96496) differed from the background genome in the Carolina sphinx moth (Figure 5). In monarchs, both eupyrene-specific proteins (p = 0.00018) and shared proteins (p = 0.01038) showed elevated α, but apyrene-specific proteins did not evolve differently from the background genome (p = 0.55934).
In the Carolina sphinx, α did not vary between apyrene-specific and eupyrene-specific proteins (p = 0.7271), between apyrene-specific and shared (p = 0.7176) or eupyrene-specific and shared proteins (p = 0.9979). Similarly, neither apyrene nor eupyrene sperm differed significantly from the shared set in monarchs (p = 0.6332 & p = 0.6234, respectively). There was, however, a trend for increased α in eupyrene-specific proteins compared to apyrene-specific proteins (p = 0.0986). In these analyses, we did not investigate the role of orthology due to a loss of statistical power that would result from further subdividing our datasets.
Demographic estimates
Finally, to contextualize our results with population dynamics, we estimated population size history using site frequency spectra from 4-fold degenerate sites in the two species’ genomes (Figure S2). Both have effective population sizes near 2,000,000, as expected of herbivorous invertebrates with high dispersal potential, numerous host plants, and a large range over North America. We also recovered a population size increase in monarch butterflies in the recent past, which has been previously reported with genomic data [54]. We note that our inferred timing of this event differs from that of the previous authors, who used mutation rate estimates from Drosophila melanogaster. Such parameter differences affect the estimated time of events, but not the trajectories.
Discussion
We found a signal for elevated adaptive evolution in the sperm proteome compared to the background genome in monarch butterflies, but not in Carolina sphinx moths (Figure 1). In particular, this difference is greatest for monarch sperm genes with a sperm homolog in the sphinx moth (Figure 2B), suggesting that the same genes experience stronger selection in the polyandrous species. As discussed above, our estimates of α likely do not reflect the true proportion of adaptive substitutions, but the relative values within species are still meaningful. This point is best illustrated by the underlying patterns of polymorphism and divergence. The primary driver of the difference in α was not increased divergence, but rather a reduction in non-synonymous polymorphism in sperm proteins (Figure 1B, top left). This pattern suggests an increased role of purifying selection on sperm protein variants in monarchs.
Purifying selection is less commonly invoked in reproductive protein evolution. Adaptive explanations for reproductive protein evolution are common [9–11,13], but research demonstrating the selective processes is much more limited. Our results in monarch sperm proteins show a strong influence of purifying selection, evidence for adaptation, and a lack of elevated divergence that is consistent not only with Dapper and Wade’s molecular predictions [16] but also with the mechanics of selection through sperm competition.
Targets of selection in sperm competition
The elevated α in sperm homologs in monarchs (Figure 2) suggests that these genes, which have had conserved sperm function since the divergence of the two species some 100 million years ago [55], are under stronger selection in the species with more sperm competition. According to recent gene ontology analyses, such genes are enriched for core traits in sperm, such as mitochondrial function, respiration, and flagellar structure (Whittington et al., in submission). In a similar vein, proteins shared between the two sperm types and those unique to fertilizing (eupyrene) sperm show an elevated α compared to the background genome in monarchs (Figure 5). Sperm proteins shared between morphs are enriched for structural proteins that give rise to the sperm tail and thus impact motility (Whittington et al., in submission), while those expressed only in eupyrene sperm doubtless include important mediators of fertilization. On the morphological scale, variation in sperm traits like swimming ability, longevity, and overall viability affects sperm competition outcomes [57,58] and has a polygenic basis in other taxa [59]. For traits like longevity and motility there is a threshold below which fertilization becomes significantly impaired, but in the absence of competitor alleles, there is a larger range of effectively neutral trait values, allowing for more variation to be maintained in the population. In the presence of competitor alleles, however, marginal differences in fertilization success come under selection, leading to the removal of deleterious variants through sperm competition.
This process echoes findings in the gametes of other taxa. Similarly strong purifying selection has been observed in genes expressed in pollen, the main male-male competitors in flowering plants [60]. On the morphological scale, this pattern can also be observed in passerine birds, in which species with higher rates of sperm competition show less intraspecific and intra-male variation in sperm length compared to sperm of less polyandrous species [61,62]. Our results are, to our knowledge, the first indication that intensifying selection and decreasing intraspecific variation extend to the molecular level in sperm thanks to post-copulatory male-male competition.
Stronger selection from competition may include even the event of fertilization itself. Lepidopteran eggs are known to possess multiple micropyle openings for sperm [63] and eupyrene sperm possess structures resembling an acrosome (while their apyrene counterparts do not) [64]. This rare combination of male and female gamete structures is also found in sturgeon, in which the multiple micropyles give several sperm potential access to the egg nucleus and there is competition among sperm to initiate karyogamy via the acrosome reaction [65]. If a similar dynamic exists in Lepidoptera, acrosomal proteins in eupyrene sperm would be likely targets for selection in polyandrous systems.
Whatever the mechanics of fertilization are, paternity outcomes in polyandrous species are often bimodally distributed [66,67], including in monarch butterflies [68]. For females that mate twice, one of the two males typically fathers most, if not all, of the observed offspring produced by the female, but there is little consistency in whether it is the first or second male. With these dynamics, fitness differences between winning and losing sperm phenotypes are large and selection can reliably remove less successful genotypes.
Evidence of this can be seen in the estimated distribution of fitness effects of new mutations in monarch sperm proteins. Compared to the background genome, we see a decrease in the proportion of effectively neutral and weakly deleterious mutations (−10 ≤ s ≤ 0) and an increase in both strongly deleterious (s ≤ −10) and beneficial (s > 0) mutations (Figure 4, right). The increase in apparently beneficial mutations follows the logic above. In the absence of competition, not only are mildly suboptimal variants effectively neutral, but novel beneficial variants have no point of comparison for fertilization efficiency. Thus, more efficient sperm variants should have no selective advantage in monandrous species unless they markedly increase fitness in a single mating. This reasoning is supported by the estimated distribution of fitness effects for the corresponding gene sets in the Carolina sphinx moth; in this species, we see little variation in the DFE between the background genome and the sperm proteome (Figure 4, left). These results between species fit well with the prediction that reproductive protein evolution depends on rates of polyandry in a species [16].
Evolution of genes with sex-limited expression
One corollary to this model of reproductive protein evolution is the prediction that genes with sex-limited expression can diverge more quickly than the rest of the genome under relaxed selection in the smaller effective population size of males or females compared to the population as a whole. Although we recovered Dapper and Wade’s predictions [16] for variable selection on sperm proteins, we did not observe a strong pattern of difference in the adaptive evolution of genes with testes-specific expression, our proxy for sex-limited expression, compared to the background genome (Figure 3). This pattern was not assessed statistically, but holds across a broad range of filtering stringency, suggesting that it is not an artifact of the ways in which we define sex-limited expression.
To explain this discrepancy between theory and observation, we turn to Nearly Neutral Theory. As mentioned above, large populations have more efficient selection than small populations and a smaller range of slightly deleterious mutations that behave neutrally [8]. Mutations with a selective effect less than 1/Ne are expected to behave neutrally. For instance, one commonly cited estimate for human population size is Ne ≈ 10,000 over evolutionary history [69]. Based on this, mutations with selective effects less than 0.0001 should behave neutrally for alleles expressed in both sexes, while those with effects of 0.0002 are effectively neutral for alleles only expressed in one sex. And indeed, there is evidence that genes expressed only in men have a higher mutational load than those expressed in both sexes [70]. Chimpanzees, another species with a similar effective population size [71], also show increased non-synonymous divergence in reproductive proteins [72]. Broadly, male reproductive protein evolution appears to depend more on effective population sizes than intensity of sperm competition in the great apes in general [73], as one would expect for species with relatively small effective population sizes.
In contrast to mammals, the effective population sizes of most insect species are orders of magnitude higher. Using neutral site frequency spectra, we estimated effective populations near 2,000,000 for both North American monarchs and Carolina sphinx moths (Figure S2). Consequently, selection is much more effective in these populations; mutations with effects above 5 × 10⁻⁷ should be subject to selection in both sexes and those above 1 × 10⁻⁶ should be subject to selection if expression is sex-limited. Thus, even selection on alleles with sex-limited expression in these insects should be 100 times stronger than selection on the entire human population. Even if there is a two-fold difference in selection, the magnitude of the difference should be minuscule, and the effects of mating system more apparent.
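The arithmetic behind these thresholds can be reproduced with a short sketch. It uses the rule of thumb stated above, that mutations with selective effects below 1/Ne behave neutrally, and assumes that sex-limited expression halves the population on which selection acts. The function name is hypothetical.

```python
def neutral_threshold(ne, sex_limited=False):
    """Selective effect below which a mutation behaves neutrally (1 / Ne).

    If expression is sex-limited, selection acts on half of the population.
    """
    effective = ne / 2 if sex_limited else ne
    return 1.0 / effective


human_ne = 10_000        # commonly cited long-term human estimate
insect_ne = 2_000_000    # estimate for both study species

human_both = neutral_threshold(human_ne)                      # 1e-4
human_limited = neutral_threshold(human_ne, sex_limited=True) # 2e-4
insect_both = neutral_threshold(insect_ne)                    # 5e-7
insect_limited = neutral_threshold(insect_ne, sex_limited=True)  # 1e-6

# Selection on sex-limited alleles in these insects reaches effects 100 times
# smaller than selection on alleles expressed in both sexes in humans.
fold_difference = human_both / insect_limited
```

The two-fold penalty of sex-limited expression is the same in both cases, but in absolute terms it shifts the threshold by only 5 × 10⁻⁷ in the insects compared with 1 × 10⁻⁴ in humans.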
Sperm dimorphism
Finally, we investigated eupyrene (fertilizing) and apyrene (non-fertilizing) sperm, the ubiquitous lepidopteran cell-type of unknown functional significance. Research to-date has proposed four main hypotheses for apyrene sperm [reviewed in 29]: active sperm competition agents, passive competition agents, nutrient nuptial gifts, or necessary facilitators of fertilization. Our characterization of the molecular evolution of genes unique to eupyrene or apyrene sperm allowed us to re-evaluate these hypotheses using strength of selection to infer function.
In particular, our data are inconsistent with apyrene sperm having an active role in sperm competition. Apyrene sperm proteins did not show a distinct pattern of evolution compared to the background in either species (Figure 5). If these sperm were involved in interfering with competitors’ sperm, as some have suggested [29,41,74], we would expect evidence for stronger selection in apyrene sperm compared to the background genome in monarch butterflies; instead, apyrene sperm genes evolve similarly to the background genome and there was more adaptive evolution in eupyrene-specific and shared-sperm proteins (Figure 5, bottom), likely due to fertilization dynamics coupled with sperm competition as discussed above. So apyrene proteins engaging in sperm competition seems unlikely, but apyrene sperm may still have adaptive significance without specialized molecular function.
The filler hypothesis also relates to sperm competition, but posits that apyrene sperm are employed proactively, to fill the female’s sperm storage organ and delay remating, thus decreasing the risk of sperm competition, rather than impacting its outcome [29]. Both in monarchs and the butterfly Pieris napi, female time to remating increases with the number of apyrene sperm received from males [40,75]. Such observations are somewhat confounded by the size of the spermatophore nuptial gift that males provide during mating, but apyrene sperm themselves have been proposed as a form of nutritional nuptial gift [76,77]. Under both the nutrient and filler hypotheses, the actual sequence of apyrene sperm proteins should be less important than their physical presence and abundance, so factors affecting the rate of apyrene sperm production would be more likely targets for selection in polyandrous species than the protein sequences themselves.
Finally, apyrene sperm appear to capacitate fertilization in Bombyx mori [78]; the mechanism here is unclear and untested in other taxa, but it could conceivably involve proteins that interact with the female reproductive tract to make conditions more favorable for eupyrene sperm. In such a case, these proteins would behave more akin to the broader class of reproductive proteins with sex-limited expression and evolve independently of rates of polyandry in a species. If there is an evolutionarily conserved capacitation effect in our study taxa, it is possible that this function is governed by a small subset of apyrene-specific proteins. Because our methods aggregate signal for selection across multiple genes or sites to counteract high variance in variant counts within genes [80], the importance of these few genes could be lost in the heterogeneous selection on different proteins.
Conclusions
Research on molecular evolution in reproductive proteins has focused less on non-adaptive variation and more on explaining positive selection. Our investigation of the sperm proteome in two Lepidoptera demonstrates a pattern of stronger purifying selection on sperm genes in a species with higher rates of sperm competition and provides important insights into the influence of post-copulatory selection on the molecular evolution of reproductive proteins. In a polyandrous species, sperm genes experience a strikingly different selective environment than the rest of the genome, with strong purifying selection reducing variation in sperm genes. In contrast, sperm genes in the monandrous species hold as much deleterious variation as other parts of the genome. Our new molecular findings fit well with established literature on sperm morphology cataloging the effects of sperm competition on variation in sperm traits [61,62]. The results on genetic variation are visible with both straightforward count-based and sophisticated model-based population genetic analyses in the sperm proteome as a whole. The evolution of non-fertilizing sperm, however, does not show such strong differences, but this lack of pattern itself argues against apyrene sperm as active agents of sperm competition. Rather, we suggest, they may play a passive role in reducing the risk of competition. The mechanism by which apyrene sperm capacitate fertilization in some species cannot be resolved with genomic approaches alone and will likely require functional experiments to understand fully.
Materials and Methods
Sources of data
We used gene sets from the published genomes of each species [81,82] with sperm genes identified from their respective proteomes [42,43]. We inferred selection from patterns of polymorphism and divergence from congeners using whole genome Illumina resequencing data for both species: a previously published dataset for North American monarch butterflies [54] and a new dataset of North Carolinian sphinx moths. Focal moths were collected with a mercury vapor light trap in July of 2017 in Rocky Mount, North Carolina (see supplemental table S1 for sequencing summary statistics).
Divergences were called by comparison to the queen butterfly (Danaus gilippus, previously published) for monarchs, and the five-spotted hawkmoth (Manduca quinquemaculata, sequenced for this project) for the Carolina sphinx moth.
In both focal species, we used twelve wild-caught individuals for sampling of polymorphism. In the case of Carolina sphinx moths, these were twelve males caught over the course of three nights. The sex-biased sampling reflects a sex bias in dispersal and collection at the light trap. In the case of monarchs, samples were selected based on depth of sequencing coverage in the published dataset and included 8 females and 4 males from the panmictic North American migratory population. These samples added the complication of unequal sampling between the autosomes (n = 24) and the Z sex chromosome (n = 16): because females are the heterogametic (ZW) sex in Lepidoptera, the 8 females contribute one Z chromosome each and the 4 males contribute two each. Despite the male-biased gene accumulation on the Z chromosome, the vast majority of sperm genes (92% in the Carolina sphinx, 90% in the monarch) are autosomal in both species [49]. Due to the reduced power for statistical testing and the limited inference to be gained from Z-linked genes, we focused on the autosomal genes in both species in subsequent analyses.
SNP-based methods
After quality-trimming, we aligned sequenced reads to the reference genome with bowtie2 [83] for conspecifics, or with stampy [84] and an increased allowance for substitution for heterospecific alignments. Alignments were taken through GATK’s best practices pipeline [85], including hard filtering, to yield a set of high-quality SNPs both within and between species. The effect class of each polymorphism and divergence (synonymous, non-synonymous, intergenic, etc.) was determined using custom databases for the two species created with SnpEff [86]. Annotated SNPs were curated to remove false divergences (ancestral polymorphism). Differences in adaptive evolution were then assessed with α, the proportion of substitutions driven by adaptive evolution, calculated from an estimator of the neutrality index [80]; this form of α corrects the inherent bias in a ratio of ratios while also allowing summation across multiple genes to reduce noise associated with small numbers in count data. For any set of i genes with non-zero counts of synonymous polymorphism (P) and divergence (D):
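A hedged reconstruction of the estimator follows. It assumes that the statistic described above is the Tarone-Greenland-weighted neutrality index (NI_TG), which matches the description of an estimator that removes the ratio-of-ratios bias and sums across genes. The subscripts s and n (synonymous and non-synonymous) are notation introduced here for illustration.

```latex
\[
\alpha = 1 - NI_{TG}, \qquad
NI_{TG} = \frac{\sum_{i} D_{s,i}\, P_{n,i} \,/\, (P_{s,i} + D_{s,i})}
               {\sum_{i} P_{s,i}\, D_{n,i} \,/\, (P_{s,i} + D_{s,i})}
\]
```

Under this form, each gene's contribution is weighted by its total synonymous count, so genes with few variants add little noise to the summed estimate.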
This statistic was calculated with custom R scripts in R version 3.3.3 [87].
Assessment of adaptive evolution and statistical significance
In each analysis we calculated α for two biologically meaningful sets of genes, e.g. the sperm proteome and the background genome, and took the absolute difference of the two point estimates as the test statistic. To determine significance, we combined the two sets and randomly assigned genes to two new sets of sizes equal to the originals. The difference in α between these two random sets was computed, and the process was repeated for 50,000 permutations to build a distribution of differences between the point estimates of two gene sets of these relative sizes. The p-value was taken as the proportion of permutations in which the absolute difference between the two random sets exceeded that between the original sets.
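The permutation procedure above can be sketched in Python as follows. This is a minimal illustration, not the authors' R scripts: the function names, the tuple layout `(Pn, Ps, Dn, Ds)`, and the choice to skip genes with `Ps + Ds == 0` are assumptions made here, and `alpha` assumes the NI_TG form of the estimator.

```python
import random

def alpha(genes):
    """Return 1 - NI_TG for a list of (Pn, Ps, Dn, Ds) count tuples.

    Assumption: genes with no synonymous variants (Ps + Ds == 0) are skipped,
    one reading of the 'non-zero counts' requirement in the text.
    """
    num = 0.0
    den = 0.0
    for pn, ps, dn, ds in genes:
        if ps + ds == 0:
            continue
        num += ds * pn / (ps + ds)
        den += ps * dn / (ps + ds)
    return 1.0 - num / den

def permutation_test(set_a, set_b, n_perm=50_000, seed=0):
    """Permutation p-value for the absolute difference in alpha between two gene sets."""
    rng = random.Random(seed)
    observed = abs(alpha(set_a) - alpha(set_b))
    pooled = list(set_a) + list(set_b)
    n_a = len(set_a)
    greater = 0
    for _ in range(n_perm):
        # Reassign genes at random to two sets of the original sizes.
        rng.shuffle(pooled)
        diff = abs(alpha(pooled[:n_a]) - alpha(pooled[n_a:]))
        if diff > observed:
            greater += 1
    return observed, greater / n_perm
```

A call such as `permutation_test(sperm_genes, background_genes)` would return the observed difference and its p-value, where both arguments are hypothetical lists of per-gene counts. A random split that leaves the NI_TG denominator at zero would raise a division error, so real data may need an extra guard.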
These analyses were applied within species at several levels: differences between the sperm proteome and background genome, differences between sperm homologs and unique proteins in the proteome, and, finally, differences between the two sperm morphs. The whole-proteome comparison is relatively straightforward. For the comparison of sperm homologs to unique proteins, we considered orthology in the same manner as we did in a previous investigation of genomic architecture in these two species [49]. Sperm proteins were divided into one of two classes based on orthology and expression: sperm homolog, or unique if the ortholog was not found in the sperm proteome or there was no detectable ortholog. In these analyses, the sperm proteome was taken as a whole, agnostic of sperm dimorphism. Finally, to examine differences between dimorphic sperm, we considered the gene product’s location in the proteome: unique to eupyrene sperm, unique to apyrene sperm, or shared between both types. For these analyses, we did not consider orthology status owing to the reduction in power that would accompany multiple layers of subdivision of the dataset.
Site-frequency-based methods
We also investigated molecular evolution with site-frequency-spectrum-based approaches as complementary evidence. We used the population genetics software suite ANGSD [88] to generate site frequency spectra at putatively neutral (four-fold degenerate) and selected (zero-fold degenerate) sites in the genome. We unfolded site frequency spectra using parsimonious inference of the ancestral state of alleles. These unfolded spectra were fed into the SFS-based tool polyDFE [52] to examine rates of adaptive evolution with a more complex likelihood model that corrects for effects of demography and misattribution of ancestral state. We compared sites from the background genomes to sites from the sperm proteomes to see if estimates of α or the distribution of fitness effects of new mutations differed between these two gene sets in each species. Divergence counts were omitted here to simplify the likelihood computation for these large datasets and to remove any error from misattributed divergence. To test the robustness of results, input data were bootstrapped 100 times to obtain confidence intervals for parameter estimates. Processing of model inputs and outputs was accomplished with custom R scripts.
Investigation of sex-limited and tissue-specific expression
Next, we investigated the robustness of results from the sperm proteome analyses using RNA-seq data in these taxa. For Manduca sexta, there exists a wealth of tissue-specific data at multiple developmental timepoints [48]. Because we were primarily interested in sperm involvement, we focused on sequencing from adult males, specifically RNA from the testes, head, thorax, and gut. Expression (as measured by fragments per kilobase of transcript per million mapped reads, FPKM) was averaged across biological replicates where available in this species. Monarchs had no comparable published data, so we sequenced the head, thorax, gut, testes, and accessory gland of three adult males.
To localize the signal for specific expression, we calculated tissue specificity as SPM (specificity metric), a ratio ranging from 0 to 1 that describes the proportion of a gene’s expression limited to a focal tissue [51]. For instance, a gene with an SPM value of 0.8 for the testes shows 80% of its total expression across the sampled tissues in the testes. Rather than strictly defining a single threshold for tissue-specific expression compared to general expression, we implemented a sliding cut-off. For a series of SPM thresholds ranging from 0 to 1 in increments of 0.05, all genes of a given class that showed specificity higher than the threshold were included in a calculation of α. This methodology created substantial non-independence between point estimates and precluded significance testing, but still allowed us to investigate whether results from the sperm proteome were sensitive to filtering of the included genes.
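The sliding cut-off can be illustrated with a short Python sketch. It takes the description above literally and treats SPM as a gene's share of summed expression across the sampled tissues; the published definition in [51] may differ in detail. All function names and the input layout (dictionaries keyed by tissue or gene) are assumptions made for illustration.

```python
def spm(expression, tissue):
    """Share of a gene's summed expression (e.g. FPKM) found in `tissue`.

    Assumption: SPM is computed as a simple proportion, following the
    worked example in the text (0.8 means 80% of expression in the tissue).
    """
    total = sum(expression.values())
    return expression[tissue] / total if total > 0 else 0.0

def sliding_thresholds(step=0.05):
    """Thresholds from 0 to 1 inclusive in increments of `step`."""
    n = round(1 / step)
    return [round(i * step, 10) for i in range(n + 1)]

def genes_above(spm_by_gene, threshold):
    """Genes whose specificity is strictly higher than the threshold."""
    return [gene for gene, value in spm_by_gene.items() if value > threshold]

def sliding_subsets(spm_by_gene, step=0.05):
    """Pair each threshold with the gene subset that would enter the alpha calculation."""
    return [(t, genes_above(spm_by_gene, t)) for t in sliding_thresholds(step)]
```

Each subset returned by `sliding_subsets` would then be passed to the α calculation. Because every subset contains all of the more stringent subsets, the resulting point estimates are not independent, as noted above.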
We first compared the effect of SPM threshold value on all genes in the genome that had non-zero expression in sampled tissues. These genes were evaluated based on the maximum SPM across all tissues. For sperm proteins, we considered only genes identified in the sperm proteome and ranked them by SPM in the testes. Finally, for male-limited non-sperm genes, we excluded sperm proteome genes and considered again those ranked by specificity in the testes (or testes and accessory glands for monarchs). These analyses were completed with custom R scripts.
Demographic estimates
Finally, to contextualize the previous analyses, we characterized present and historical population size from genomic data. Using folded four-fold degenerate frequency spectra, we estimated neutral coalescence patterns with stairway plot [89]. For generation time, we used the widely cited estimates of four generations per year for monarchs and three for the Carolina sphinx moth. For mutation rate, we chose the estimate of 2.9 × 10⁻⁹ from the butterfly Heliconius melpomene, the closest relative with a spontaneous mutation rate estimate [90].
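As a small illustration of the inputs described above, the Python sketch below shows one way to fold an unfolded site frequency spectrum and to convert a time measured in generations into years. It is not the authors' pipeline: the function names are hypothetical, and the text does not state whether folded spectra were derived from unfolded ones or generated directly.

```python
def fold_sfs(unfolded):
    """Fold an unfolded SFS into minor-allele-count classes.

    unfolded[i] is the number of sites with i derived alleles among n sampled
    chromosomes (i = 0..n). Class i and class n - i are merged because, without
    an ancestral state, they cannot be told apart.
    """
    n = len(unfolded) - 1
    folded = []
    for i in range(n // 2 + 1):
        j = n - i
        folded.append(unfolded[i] if i == j else unfolded[i] + unfolded[j])
    return folded

def generations_to_years(generations, generations_per_year):
    """Convert a time in generations to years given generations per year."""
    return generations / generations_per_year
```

For example, with four generations per year as used for monarchs, `generations_to_years(1200, 4)` gives 300 years; the 1200 here is an arbitrary illustrative value, not a result from the study.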
Acknowledgments
This project was funded by the NSF DDIG (DEB-1701931). Manduca sequences can be found with the NCBI accession SRP144217. Thank you to Jacobus de Roode for use of the monarch image, Elizabeth Moore for facilitating collaboration between Kansas and North Carolina, Tawny Scanlan for comments on sperm biology, and Amanda Pierce and Tom de Man for housing during field collection.