Abstract
Interactions between herbivorous insects and their host-plants are a central component of terrestrial food webs and a critical topic in agriculture, where a substantial fraction of potential crop yield is lost annually to pests. Important insights into plant-insect interactions have come from research on specific plant defenses and insect detoxification mechanisms. Yet, much remains unknown about the molecular mechanisms that mediate plant-insect interactions. Here we use multiple genome-wide approaches to map the molecular basis of herbivory from both plant and insect perspectives, focusing on butterflies and their larval host-plants. Parallel genome-wide association studies in the Cabbage White butterfly, Pieris rapae, and its host-plant, Arabidopsis thaliana, pinpointed a small number of butterfly and plant genes that influenced herbivory. These genes, along with much of the genome, were regulated in a dynamic way over the time course of the feeding interaction. Comparative analyses, including diverse butterfly/plant systems, showed a variety of genome-wide responses to herbivory, yet a core set of highly conserved genes in butterflies as well as their host-plants. These results greatly expand our understanding of the genomic causes and evolutionary consequences of ecological interactions across two of Nature’s most diverse taxa, butterflies and flowering plants.
Introduction
Butterflies and moths, and the host-plants that their larvae feed on, comprise two of the largest groups of species on earth, the Lepidoptera and Angiosperms. The extreme diversity of these two groups has arisen from the co-evolutionary interactions between them, wherein evolution proceeded via reciprocal adaptations as each clade evolved in response to changes in the other1,2. In their seminal paper, Ehrlich and Raven3 used the diffuse evolutionary relationships among butterflies and their host-plants to formally introduce the concept of co-evolution. In the 50 years since Ehrlich and Raven3, research on the molecular basis of herbivory between butterflies and their host-plants has revealed important insights into specific chemicals and defense pathways utilized by plants, such as furanocoumarins, glucosinolates, and cardenolides4. Research on butterflies, in turn, has revealed specific genes and gene families involved in host-plant detoxification, such as cytochrome P450 enzymes5,6, the nitrile specifier protein of pierid butterflies7, and the Na(+), K(+) –ATPase of the monarch butterfly8. Yet, despite these important advances, our understanding of the molecular genetic basis of these co-evolutionary interactions remains limited. Genomic approaches, including genome-wide association (GWA) studies and transcriptomics, provide a means to move beyond candidate genes and pathways to uncover the molecular determinants of this fundamental ecological interaction in an unbiased way9,10. Here we utilize these methods in a diversity of butterflies and their respective host-plant to uncover the genetic basis of herbivory.
Results
Genome wide associations with herbivory: the host-plant, Arabidopsis thaliana
First, we investigated the genetics of butterfly/host herbivory by mapping associated variants in parallel GWA studies focused on the flowering plant Arabidopisis thaliana and its natural insect herbivore, the Cabbage White butterfly, Pieris rapae. For these experiments, we used either 96 natural accessions of A. thaliana and a single lab strain of P. rapae (for the plant GWAS), or the offspring of 96 field-caught females of P. rapae and a single accession of A. thaliana (for the butterfly GWAS). For both experiments, we measured herbivory as the amount of weight gained and the amount of leaf surface area eaten by second instar larvae over a period of 72 hours. Arabidopisis thaliana GWA resulted in a total of 90 associated SNPs that were in linkage-disequilibrium with 389 genes, a gene set enriched for plant defense (Supplementary data 1). A subset of 12 well-supported candidate genes contained three or more associated SNPs each (Fig. 1a), eight of which we were able to functionally validate using SALK T-DNA mutants (Fig. 1c). This validated gene set includes well-known and novel defense genes. For instance, the cytochrome P450 gene CYP79B2 is involved in the conversion of tryptophan to indole-3- acetaldoxime, a precursor of indole glucosinolates and indole-3-acetic acid11. Indole glucosinolates are important secondary metabolites used for defense by Arabidopsis, and other species in the plant family Brassicaceae, and they have been shown to deter herbivores and pathogens 12,13. Insects that feed on Brassicaceae have evolved various physiological strategies against toxic effects of glucosinolates14. The larvae of P. rapae, for instance, redirect the hydrolysis pathway catalyzed by myrosinase7, and instead of producing toxic isothiocyanates, hydrolysis is redirected toward the formation of nitriles by the butterfly’s nitrile specifier protein (NSP)15,16. Another functionally validated gene, phosphoribosylanthranilate isomerase 3 (PAI3), catalyzes a step in the L-tryptophan synthesis pathway17, to produce the precursor of indole glucosionolates, although there is no direct evidence that it influences glucosinolates. The genes PROPEP1 and PROPEP3 belong to the AtPep (endogenous danger peptides) gene family. They are associated with activation of danger- or damage-associated molecular patterns (DAMPs) immunity in plants against both pathogen and herbivore attack18. The other four genes validated in our study do not have known roles in defense and include an uncharacterized cytochrome P450 (CYP705A33), Importin alpha (IMPA-1), a CTP synthase (AT4G20320), and an S-adenosyl-L-methionine-dependent methyltransferase (AT1G29470). Interestingly, the A. thaliana genes we found associated with herbivory did not overlap with those from a recent genome-wide association study of methionine-derived glucosinolates in A. thaliana19.
Pieris rapae genome sequencing
We began our butterfly GWAS by assembling a high-quality reference genome sequence for the Cabbage White butterfly, Pieris rapae. For the reference genome, we combined next-generation DNA sequencing data from one PCR-free paired-end Illumina library, three mate-pair libraries (3 kbp, 7 kbp, 40 kbp), and a Dovetail Chicago library to generate a final assembly of 320 Mbp, with a N50 of 11.5 Mbp, spanning the 25 P. rapae chromosomes20. Annotation was performed using a diverse set of RNA-Seq data. This chromosomal-level assembly of the butterfly genome (Supplementary Fig. 1) provided an essential genomic foundation with which to explore the insect side of the plant-insect interaction.
Genome wide associations with herbivory: the herbivore, Pieris rapae
For GWAS, we performed our larval feeding trials with the offspring of wild-caught females and used whole genome resequencing to genotype 96 unrelated larvae at genome-wide single-nucleotide polymorphisms (SNPs). Pieris rapae GWA revealed just two strongly associated regions in the genome (Fig. 1b), the largest of which encompassed a total of 16 significantly associated SNPs distributed across 98 kbp on chromosome 10, spanning seven genes (Fig. 1d). One of these genes, Glyoxalase 1 (Glo1), a lactoylglutathione lyase, is a central enzyme in the glyoxalase pathway present in all organisms. Glyoxalase detoxifies cytosolic methyglyoxyl, a toxic by-product of metabolism, in a two-step process that utilizes glutathione. Glutathione itself is an important metabolite involved in multiple biological processes in plants and animals21. In addition to Glo1, the 98 kbp associated region also contained the following genes: Resilin, a Retrovirus-related Pol polyprotein POLX, the ATP-dependent RNA helicase DHX35, a THAP domain-containing protein, and two uncharacterized genes. A second associated region in the genome, on chromosome 15, contained just one gene, peroxidasin (Supplementary Fig. 2). Peroxidasin genes play important roles in morphogenesis, cell adhesion, formation of extracellular matrix and defense against pathogens22,23. Our results suggest this gene plays a role in larval growth, possibly in the context of host-plant detoxification.
Herbivory-induced differential genes expression experiments
To further interrogate the GWA genes, and explore genome-wide patterns of gene expression throughout the plant-insect interaction, we next used transcriptomics to measure differential gene expression in P. rapae and A. thaliana over the time course of their interaction. For these experiments, butterfly and plant were allowed to interact for a period of time, after which we harvested tissue from both organisms, as well as controls, for RNA-seq. For these experiments, the plant and butterfly genome sequences were used as references to analyze RNA-seq data. To capture gene expression changes in response to oviposition, we compared gene expression profiles of (a) leaves with eggs (Fig. 2a,b) vs. leaves without eggs 72 h after oviposition, and (b) eggs deposited on leaves vs. eggs deposited on wax paper 72 h after oviposition (Fig. 2c). To capture gene expression changes of the plant in response to larval feeding, we compared (c) leaves after 24 h of larval feeding from plants previously exposed to eggs vs. leaves after 24 h of larval feeding from plants not previously exposed to eggs (Fig. 2d) vs. control leaves never exposed to eggs or larvae. We also analyzed leaves 24 h after mechanical wounding (Fig. 2e) to compare gene expression changes associated with larval feeding vs. wounding. To identify gene expression changes in the insect associated with feeding, we compared (d) larvae after 24 h of feeding on leaves from plants exposed to eggs vs. larvae after 24 h of feeding on leaves from plants not exposed to eggs vs. larvae after 24 h of feeding on artificial diet (Fig. 2f). We also measured gene expression in butterfly pupae and adults, for comparison.
The genes identified in our butterfly and plant GWA experiments showed patterns of expression consistent with a role in herbivory (Fig. 3). In A. thaliana, eight of 12 genes identified in the GWA study had the highest expression in leaves exposed to eggs and larvae, including both cytochrome P450 genes and PROPEP3. PROPEP1 and PAI3 showed elevated expression in leaves with eggs only, followed by leaves with larvae only. The remaining two genes, IMPA-1 and RABE1b (one of the four genes that was not validated) had other expression patterns, showing elevated expression in mechanically wounded leaves and the leaf control, respectively (Fig. 3a). In P. rapae, six of the seven genes in the associated region on chromosome 10, and peroxidasin on chromosome 15 (Supplementary Fig. 2), had the highest expression in larval treatments—the larva is the herbivorous life stage of the butterfly—and were expressed at lower levels in eggs, pupae and adults. Notably, this pattern of expression was distinct from physically adjacent genes that were located outside the associated regions (Fig. 3b and 3c). The one exception, POLX, which was not a candidate gene based on function, showed similar levels of expression across developmental stages. Resilin expression was elevated specifically in larvae feeding on leaves, as opposed to those eating artificial diet, indicative of a role directly in the plant-insect interaction.
The genome-wide responses of Arabidopisis and Pieris, in terms of differential gene expression, were also striking. In particular, A. thaliana showed a massive response to butterfly oviposition with approximately 50% of the plant’s genes changing expression, mostly up-regulated, in response to a female butterfly laying eggs (Fig. 2g). The response of the plant to larval feeding, in contrast, was more modest. The egg is the first life stage of the herbivore that is in contact with the plant and previous studies have shown that plants launch various defenses against insects even before larval hatching24. Egg-induced plant defense strategies include plant-mediated desiccation of eggs, egg dropping, egg crushing, and egg killing24. We found that a total of 14,563 genes were differentially expressed in Arabidopsis leaves after P. rapae egg deposition (Fig. 2i). Genes belonging to defense and stress responses were enriched in all treatments (Supplementary data 2). In addition to genes involved in production of glucosinolates and glutathiones, these include protein kinases, proteolytic enzymes, oxidoreductases, peroxidases, NBS-LRR and defense signaling transcription factors that are known to be involved in release of reactive oxygen species, production of pathogenicity-related proteins, activation of systemic acquired resistance, cell wall modification, and programmed cell death in pathogens25. These results mirror, and greatly extend, previous work with microarrays and a different species of Pieris, P. brassicae showing an elevated response of A. thaliana to insect oviposition with induction of genes involved in the hypersensitive-like response and pathogenesis-related genes as well as callose and reactive oxygen species accumulation 26-28.
Similar to the plant, P. rapae eggs exhibited an elevated response in comparison to the larval stages (Fig. 2h). Of all the genes differentially expressed between eggs oviposited on leaves and wax paper, approximately 50% of them were uncharacterized. Among the genes with predicted functions, genes responding to stress, oxidoreductases, and proteolytic enzymes were abundant (Supplementary data 3). In the next stage of their interaction, larvae feeding on the plant, we analyzed gene expression patterns in larvae feeding on plants exposed to eggs as well as those feeding on plants not previously exposed to eggs. We found that larvae feeding on plants exposed to eggs had a smaller number of differentially expressed genes compared to larvae feeding on plants that were not exposed to eggs (Fig. 2j). This mirrored gene expression patterns in the plant (Fig. 2i), in which the added effect of herbivory after oviposition was less than the effect of herbivory alone. There were 120 differentially expressed larval genes in common between the two treatments, and among the genes with putative functions, genes involved in defense, stress response and proteolysis were overrepresented (Supplementary data 4). Interestingly, glutathione s-transferases (GSTs) frequently show elevated expression in the larvae of Lepidoptera29-31 and other insects32, but we did not detect differential expression of GSTs in P. rapae.
Comparative transcriptomics across butterflies and plants
The strong response of A. thaliana to butterfly eggs was a particularly striking result. To determine if this was a general property of butterfly-plant interactions, we expanded our analysis of gene expression to three additional plant/insect systems: Medicago sativa/Colias eurytheme, Citrofortunella microcarpa/Papilio polytes and Passiflora oerstedii/Heliconius cydno (Fig. 4a). For this comparative analysis, we generated de novo transcriptomes for all organisms to use as references to analyze patterns of differential gene expression using RNA-seq data. We also reanalyzed our A. thaliana/P. rapae data using de novo reference transcriptomes, as opposed to the genome sequences, for consistency. Interestingly, we found substantial variation in the responses of both plants and butterflies, and no plant showed the same, elevated response to oviposition that we saw in A. thaliana (Fig. 4b, c). To investigate which, if any, components of the defense network were conserved among plants or among butterflies, we extracted and compared common differentially expressed genes across the four systems. Among plants, common gene families that were differentially expressed in leaves after oviposition were protein kinases, proteases, heat shock proteins, esterases, myb transcription factors and nac transcription factors (Supplementary data 5), all of which have previously been implicated in defense against pathogens33,34. During larval feeding, leaves of the four plant species shared protein kinases, proteases, heat shock proteins, esterases, myb transcription factors, nac transcription factors, and in addition, glutathione s-transerfases, cytochrome P450s, xyloglucan endotransglycosylases, dof zinc finger proteins and wrky transcription factors (Supplementary data 6). Based on protein alignments, we identified a small set of true orthologs among the differentially expressed genes, 11 in total, which were shared across all plant species. Leaves after oviposition did not have any shared orthologs but the larval feeding treatments had 4 or 5 orthologs each (Supplementary data 7), the majority of which have putative roles in defense.
Among butterfly species, the common gene families that were differentially expressed in eggs 72 h following oviposition were proteases, heat shock proteins, esterases, chemosensory proteins and cuticular proteins (Supplementary data 8), all of which have previously been implicated in detoxification pathways35,36. In larvae feeding on plants, the genes differentially expressed belonged to gene families such as chitinases, proteases, cuticular proteins, lipases and Osiris (Supplementary data 9). There was a small set of differentially expressed genes, 17 in total, that were true orthologs shared among all butterfly species (Supplementary data 10). Among this set of conserved genes, Osiris 9 (Osi9) stood out because it was up-regulated in all four butterfly species specifically in larvae feeding on leaves with previous exposure to eggs. Osi9 is a transmembrane protein that is a member of the 24 gene Osiris family that is unique to insects 37. The function of Osiris genes has been mysterious37,38 but the Osiris gene cluster was recently associated with detoxification of the Morinda citrifolia host-plant in both Drosophila sechellia39,40 and Drosophila yakuba41. Previous work has shown that Lepidoptera appear to have multiple copies of Osi937 and we detected multiple paralogs of Osi9 in the P. rapae genome and across our assembled transcriptomes (Supplementary Fig. 3). However, the copy of Osi9 that was up-regulated in larvae on all four butterfly species was the same, Osiris 9E (Supplementary Fig. 3). To further explore the potential role of Osi9E in butterflies, we surveyed spatial and temporal patterns of Osi9E expression in the four butterfly species and found elevated expression during larval stages, as opposed to pupae and adult, and expression specifically in the larval gut (Supplementary Figs. 4 and 5). These results suggest Osi9E expression is up-regulated in response to contact with host-plant tissue.
Discussion
Butterflies and their larval host-plants provide a historically significant example of co-evolution and important prior research has explored the molecular genetic basis of this ubiquitous ecological interaction5-8,15,16,30,31. Here we used two genome-wide approaches, genome-wide association studies and transcriptomics, to characterize the genetic basis of herbivory in butterflies and plants simultaneously. Our genome-wide association studies uncovered a relatively small number of well-supported herbivory genes in both A. thaliana and P. rapae. The A. thaliana GWAS yielded 12 genes that contained three or more associated variants, eight of which we were able to validate with knock-out lines, and this gene set contained both established and novel plant defense genes. P. rapae GWAS revealed just one strongly associated region of the genome that contained only seven genes. One of these genes, Glyoxalase 1, stands out as a particularly good candidate gene because it utilizes glutathione to detoxify methylglyoxal, a toxic by-product of cellular metabolism. Glutathione is a defensive compound in plants: glutathione concentrations increase in plants during oxidative stress42-45 and it has been shown to be involved in defense against pathogens46 and insect feeding47. Furthermore, glutathione levels are known to vary among Arabidopsis accessions48 and we found natural sequence variation in the GST genes in our mapping panel (Supplementary data 11). It is intriguing to consider the possibility that P. rapae may somehow be using the glyoxalase pathway to detoxify host-plant glutathione. A second P. rapae GWA peak, while not statistically significant, was prominent and contained just a single gene, peroxidasin. Peroxidasin is critical for basement membrane integrity and tissue development, and this gene has been implicated in phagocytosis and defense22,23. The combined P. rapae GWA results suggest that we may have identified genes related to host-plant detoxification and/or larval metabolism and growth.
Previous research on Pieris rapae and related butterflies has identified the nitrile-specifier protein (NSP) as a critical component in host-plant detoxification7. The origin of NSP appears to be the key innovation that allowed the ancestor of the butterfly subfamily Pierinae to colonize and detoxify host-plants in the order Brassicales, all of which produce glucosinolates (i.e., mustard oils)15,16. Prior to our experiments, we hypothesized that genetic variation at NSP may also influence detoxification capacity in contemporary populations and that we may see a strong association with NSP in our butterfly GWA study. This was not the case. Furthermore, NSP was not differentially expressed in our transcriptomic experiments with P. rapae, although it was expressed in all larvae. In contrast, glucosinolate pathway genes did emerge in the Arabidopsis GWAS and many were differentially expressed in our transcriptomic studies, suggesting that glucosinolate defense remains an active front in this co-evolutionary arms race, at least on the plant side of the interaction.
Expanding our analysis to four diverse butterfly/plant systems, we found that over evolutionary time, the molecular dynamics of the plant-insect interaction change dramatically. The number, timing, and identity of genes expressed in plants and butterflies throughout the plant/insect interaction differed considerably across the four systems we studied. This finding highlights that the specific genetic underpinnings of herbivore-plant dynamics derived from a model system like Arabidopsis may not always be generalizable to other systems. However, such a result is also expected because the process of co-evolution should drive each system along a very different evolutionary trajectory, especially over the long times scales that separate the butterfly and plant species we are studying. What is surprising, then, is that we did identify a core set of orthologous genes that were differentially expressed in response to herbivory in all butterfly/plant systems. A total of 11 orthologs were differentially expressed in all plant species and 17 in butterflies, but this summary belies further complexity as a number of these genes, while all differentially expressed, were regulated in opposite directions among systems, being upregulated in one species and downregulated in another. One gene that stood out among the conserved orthologs in butterflies was Osiris 9E, both because it was consistently expressed at high levels in larvae of all four butterfly species and because Osiris genes were recently implicated in host-plant detoxification in Drosophila39-41. Furthermore, Osiris 9 and a number of other Osiris genes have been shown to be differentially expressed in the larvae of the fly Scaptomyza flava in response to feeding on glucosinolates49. These results suggest that Osiris genes are ancient players in insect/plant interactions.
As a whole, our results provide a comprehensive portrait of the molecular genetic dynamics mediating insect/host interactions. Importantly, we are able to not only corroborate previous findings related to the identity of particular defense genes and pathways in Arabidopsis, but also expand this to other plants as well as the insect herbivores. The results from Pieris and other butterflies pinpoint specific genes that appear important for herbivory and these results further hint at even deeper evolutionary ties among herbivorous insects as a whole. Important next steps will be to functionally characterize these genes and identify the molecular and cellular mechanisms by which they impact herbivory. A complete understanding of the molecular genetic basis of the pervasive, antagonistic relationship between caterpillars and their host-plants promises to inform our understanding of ecology, evolution, and human agriculture.
Methods
Genome assembly and annotation
Genomic DNA was isolated from single, seventh-generation inbred female Pieris rapae pupae, using ethanol precipitation50. This inbred line of P. rapae was established using a singly-mated wild female caught in August 2013 near Rochester, Pennsylvania, USA (+40° 44’ 45”, -80° 9’ 45”). Three Illumina libraries were prepared, one PCR-free DNA library (180bp) and two mate-pair libraries (3kb and 7kb).The 180bp library was sequenced in two lanes and the two mate-pair libraries were sequenced in one lane each on Illumina HiSeq High Output mode, PE 100bp. Genomic DNA was isolated as described above from another inbred P. rapae pupae, a sibling of the first sample described above, and a 40kb mate pair library was constructed and sequenced by using a Lucigen NxSeq 40 kbp Mate-Pair Cloning Kit. For the final scaffolding step variable insert size libraries of 100bp – 100,000bp, using DNA from a third sibling, were generated using the Chicago and HiRise methods51 and these were sequenced by Centrillion Biosciences Inc. (Palo Alto, CA, USA), Illumina HiSeq High Output mode, PE100bp.
Genome size was estimated at 280 Mbp from unique k-mer distribution of the raw data using Jellyfish v2.1.352 and a custom R script. For the 3kb and 7kb libraries, Nextclip (version 0.8)53 was used to look for the absence of linker sequence in either read in a pair and discard those reads as potential contamination of non-mate pair sequence. All read sets were then quality filtered, the ends trimmed of adapters and low quality bases, and screened of common contaminants using bbduk v34.94 (https://github.com/BioInfoTools/BBMap/blob/master/sh/bbduk.sh). For contig generation and scaffolding, the 180bp, 3kb, and 7kb reads were assembled using AllpathsLG (version 50960)54. The best assembly was obtained by using a random subset of 56 (33%) million reads from the initial 3kb and 7kb libraries with the full set of 162 million reads from the 180bp library, more input data resulted in reduced performance and quality of assembly. AllpathsLG was run with haploidify = true option to compensate for the high degree of heterozygosity present in the P. rapae data. The assembly was composed of 347 million bases contained in 8,113 scaffolds with a N50 of 141,106 bp. Complete conserved single copy ortholog content was assessed at 87% by CEGMA (version 2.5)55. A second scaffolding step using SSPACE v256 and the 3kb, 7kb, and 40kb libraries together brought up the assembly size and the N50 to143, 392 bp. A final scaffolding step was undertaken by Dovetail genomics using the custom library and the HiRise scaffolding pipeline51, which improved the N50 to 3,706,409 bp. Chromosomal relationships of scaffolds were inferred from alignment using LAST (version 714)57 to the chromosomal structure of a closely related species Pieris napi58, and validated with scaffold junction spanning mate pair and syntenic blocks (Supplementary Fig. 1). The final P. rapae assembly contained 323,179,347 bp in 25 chromosomes and 2,747 unplaced scaffolds and a N50 of 11,535,178 bp.
The genome annotation involved the following pipeline: a) Collection of reference proteins from Uniprot database59 and assembly of high-confidence transcript sequences from previously published RNA-seq data60 using tophat2 (version 2.0.9)61,62 and the cufflinks package (version 2.2.1)63, b) Modelling of repeat sequences to mask the genome using RepeatMasker package (4.0.3) (Smit, AFA, Hubley, R & Green, P. RepeatMasker Open-3.0.1996-2010 <http://www.repeatmasker.org>) and RepeatModeler package (1.0. 8) (Smit, AFA, Hubley, R. RepeatModeler Open-1.0.2008-2015 <http://www.repeatmasker.org>), c) Evidence-based gene build to generate training models for ab-initio gene finders using the Maker package (version 2.31-6)64, d) Manual curation of gene models and training of the Augustus gene finder (version 2.7)65, e) Re-annotation of the evidence-based annotation using ab-initio predictions, and finally, f) Functional annotation of the refined gene build using Blast matches against Uniprot/Swissprot and results from InterproScan, condensed and reconciled using ANNotation Information Extractor(Annie)( Tate, R., Hall, B., DeRego, T., & Geib, S. (2014). Annie: the ANNotation Information Extractor (Version 1.0)).
Sample collection and data analysis for GWAS
Host-plant GWAS: The growth chamber conditions for growing Pieris rapae and Arabidopsis thaliana were 23°C day/21°C night and 60% relative humidity on a 16 hour photoperiod. Three replicates of each of the 96 accessions of Arabidopsis thaliana that are listed in Supplementary Table 1 were grown until they were almost ready to bolt. After taking a picture of each plant, two 5 day old, lab-grown P. rapae larvae were weighed and placed on each plant and then the plant was enclosed in a plastic sleeve bag. After 72 h, the larvae were weighed and a new picture of the plant was taken in order to record plant surface area eaten by the larvae. The weight gained by the larvae and the total surface area eaten were calculated and used as the phenotype data for GWAS. The SNP information for the 96 accessions from the 250K SNP data66 and the phenotype data were fit into a multivariate linear mixed model in GEMMA (version 0.94.1) for association studies67. The pipeline involves converting the SNP file into a PLINK binary PED file and generating a relatedness matrix file using default parameters. The average initial weight of the larvae was used as a covariate for the analysis. For the functional validation of the candidate genes, knock out T-DNA mutants from SALK and wild type plants were grown and assayed as described above.
Herbivore GWAS: 96 P. rapae females were collected from various locations across the US Midwest during June-July 2014 and raised in the lab green house. These butterflies included 57 from around the University of Chicago campus, 19 from Schaumburg, IL, and 16 from North Dakota, 3 from downtown Chicago, and 1 from Carolina Biological Supply. The growth chamber conditions for growing P. rapae and A. thaliana were 23°C day/21°C night and 65% relative humidity on a 16 hour photoperiod. Eggs were collected from each female and two 5 day old larvae from each family were weighed and placed together on a Col-0 A. thaliana plant and this was performed in triplicate. Leaf area eaten and weight gain phenotypes were assayed as described for the Arabidopsis GWAS. For genotyping, we selected one random GWAS larva from each family (a total of 96 larvae) and sequenced the genome of each sample at 5X coverage and analyzed using the P. rapae genome assembly. Genomic DNA was extracted from skin tissue of the larvae using the VDRC Drosophila genomic extraction (https://stockcenter.vdrc.at/images/downloads/GoodQualityGenomicDNA.pdf) and sequencing libraries were prepared using KAPA Hyper Prep Kits (KR0961 – v1.14). Barcoded libraries were pooled and sequenced on an Illumina HiSeq 2500 to generate paired-end 100 bp data. After sequencing, PE100 bp Illumina reads were trimmed using Trimmotatic (version 0.36)68, aligned to the P. rapae reference genome using bowtie2, using a –very-sensitive-local option, and the aligned SAM files were prepared for calling SNPs using the default parameters of PICARD tools (version 1. 141)(http://broadinstitute.github.io/picard). Finally, UnifiedGenotyper, a variant calling program in GATK (version 3.4)69,70 was used to call SNPs. The number of reads generated and the number of SNPs called per sample is listed in Supplementary Table 2.The SNP information for the 96 larvae (a total of 18,603,675 SNPs) and the phenotype data were fit into a multivariate linear mixed model in GEMMA (version 0.94.1) for association studies67. The pipeline involves converting the SNP file into a PLINK binary PED file and generating a relatedness matrix file using default parameters. Again, the average initial weight of the larvae was used as a covariate for the analysis. The statistical significance for both the Arabidopsis and Pieris GWAS was determined by simpleM, a multiple testing correction method for genetic association studies using correlated SNPs71.
Arabidopsis and Pieris RNA-seq
Pieris rapae was raised on Arabidopsis thaliana at 23°C and 65% relative humidity on a 16 hour photoperiod. Samples at different time points were collected from both the plant and insect in TRIzol Reagent. The time points are listed in Supplemental Table 3. RNA was extracted using TRIzol Reagent and the RNA-seq libraries were generated using an Illumina TruSeq Stranded Total RNA Kit with Ribo-Zero Plant for Arabidopsis and Illumina TruSeq Stranded mRNA Library Prep Kit for Pieris. These libraries were sequenced on an Illumina HiSeq 2500 to generate SR50bp reads. All the SR50bp reads were processed with the Trimmomatic (version 0.36)68 QC pipeline. The trimmed SR50bp reads were aligned to their respective genomes using STAR (version 2.4.2)72 software. The Cuffdiff pipeline (version 2.2.1)63 was used to call the differentially expressed genes at using an FDR-corrected p-value of 0.001. The differentially expressed set of genes in Arabidopsis thaliana and Pieris rapae were annotated using the annotations available for their respective genomes.
Comparative transcriptome analysis
Each butterfly species was reared on its natural host-plant under favorable environmental conditions in the laboratory greenhouse. Pieris rapae was raised on Arabidopsis thaliana at 23°C and 65% relative humidity on a 16 hour photoperiod. Colias eurytheme was raised on Medicago sativa at 26°C and 65% relative humidity on a 16 hour photoperiod. Papilio polytes was raised on Citrofortunella microcarpa at 26°C and 65% humidity on a 16 hour photoperiod. Heliconius cydno was raised on Passiflora oerstedii at 26°C and 65% relative humidity for a 13 hour photoperiod. Samples at different time points were collected from both plants and herbivores in TRIzol Reagent. These are the same time points as those listed in Supplemental Table 3. RNA was extracted using TRIzol Reagent, and RNA-seq libraries were generated using Illumina TruSeq Stranded Total RNA Kit with Ribo-Zero Plant for the four plant species and Illumina TruSeq Stranded mRNA Library Prep Kit for the four butterfly species. These libraries were sequenced on an Illumina HiSeq 2500 to generate SR50bp reads. In addition, one library from each of the host-plant species and two libraries from of each butterfly species (the egg and a larval stage) were sequenced with PE100bp in order to generate a reference transcriptome for each species. The number of reads generated from each library is listed in Supplemental Table 4.
The SR50bp and PE100bp reads were processed with the Trimmomatic QC pipeline (version 0.36)68. All trimmed PE100bp reads and the trimmed SR50bp reads were assembled together for each host-plant and herbivore system to generate a reference transcriptome. After excluding different isoforms, the number of unique coding sequences found in each plant species was: A. thaliana, 18,105; M. sativa, 43,843; C. microcarpa, 20,367; and P. oerstedii, 22,073. The number of unique coding sequences in each butterfly species was: P. rapae, 12,037; C. eurytheme, 16,277; P. polytes, 15,176; and H. cydno, 19,914. Trimmed SR50bp reads from each species were aligned to their respective reference transcriptome and differential expression analyses were performed with the Trinity package (version r20140717)73, which includes bowtie2 for alignment, RSEM for transcript abundance and edgeR for different expression analysis. Default parameters and an FDR-adjusted p-value of 0.001 were used for analyses. The differentially expressed set of genes from all four host-herbivore systems were annotated using Blast2GO (version 3.3)74 using default settings. The annotated genes were manually parsed to find the common gene families. To detect true orthologs that were differentially expressed across all plants or all butterflies, we analyzed the protein sequences of the differentially expressed genes with Proteinortho (version 5.11)75,76 using a p-value of 0.001.
Osiris 9 paralog detection
All Osiris 9 protein sequences from Bombyx mori, Drosophila melanogaster and Danaus plexippus were extracted from the Osiris protein database37. Using them as the query, the homologous sequences were extracted from the de novo transcriptomes of all the four butterfly species and the P. rapae genome using tblastn similarity searches. A multiple sequence alignment of all extracted Osiris 9 protein sequences, including the Osiris 11 and Osiris 14 from B. mori, was generated using MAFFT v6.847b77 with the L-INS-i algorithm. A phylogenetic tree was reconstructed using FastTree (version 2.1.10), which infers approximately maximum likelihood phylogenies from alignments of protein sequences78. All the default options and the JTT+CAT model were used for tree inference.
Osiris 9E gene spatial and temporal assay
For the temporal assay of Osiris 9E expression in butterflies, whole body tissue samples were collected at ten stages of development for all four butterfly species, except for adults where wings and eyes were excluded. Multiple individuals were pooled at early instar stages (24 hours to 72 hour instar) because of their small size. For the rest of the stages, a single individual was used for each RNA extraction. The developmental stages were: 24 hour larvae, 48 hour larvae, 72 hour larvae, 2nd instar larva, 3rd instar larva, 4th instar larva, 5th instar larva, pre-pupa, 72 hour pupa, and adult. For the spatial assay of Osiris 9E expression in larvae, the head, foregut, midgut, and skin were dissected out for each individual. RNA was extracted using TRIzol Reagent, and cDNA for each sample was synthesized using a High-Capacity cDNA Reverse Transcription Kit (Applied Biosystems) using 500 ng/ul of RNA. RT-qPCR for the temporal and spatial samples of each species was performed using SsoAdvanced SYBR Green Supermix (Bio-Rad) and a CFX96 Optical Reaction Module thermocycler (Bio-Rad). Elongation Factor 1 alpha ( EE1α) served as a normalizing gene to measure the relative expression of Osiris 9E in each sample for all four butterfly systems, the primer sequences for each species are in Supplemental Table 5.
Author Contributions
SN designed the project, collected and analyzed data for the GWAS, RNA-seq and comparative transcriptome studies, and co-wrote the manuscript. JH and CW sequenced and assembled the Pieris rapae genome. KD, CS and WZ helped collect and raise butterflies and contributed to the GWAS experiments. KD analyzed temporal and spatial expression patterns for Osi9E expression. NM generated the inbred P. rapae line used for genome sequencing, and NM, CM and NC generated the RNA-seq data used for P. rapae genome annotation. ES-R collected the North Dakota P. rapae individuals for GWAS. JB provided Arabidopsis accessions for GWAS. MRK designed and directed the project, and wrote the manuscript. All authors contributed to editing the manuscript.
Competing financial interests
The authors declare no competing financial interests.
Materials & Correspondence
Correspondence to: M. R. Kronforst
Acknowledgments
We thank E. Westerman, R. Marquez, L. Southcott, A. Russell and G. Garcia for assistance with experiments and D. Samac for generously providing Medicago sativa seeds. N. Saleh helped with generating the inbred P. rapae material for the sequencing the initial genome assembly. We also thank Cristina Sahagun for assisting with the butterfly photography. This project was funded by NIH grant GM108626 and NSF grant IOS-1452648 to MRK, as well as the Pew Charitable Trust and Neubauer funds from the University of Chicago and University of Pittsburgh.
Footnotes
Present address: INRA, Institute of Ecology and Environmental Sciences of Paris iEES-Paris, Route de Saint-Cyr, 78026 Versailles Cedex, France
References
- 1.↵
- 2.↵
- 3.↵
- 4.↵
- 5.↵
- 6.↵
- 7.↵
- 8.↵
- 9.↵
- 10.↵
- 11.↵
- 12.↵
- 13.↵
- 14.↵
- 15.↵
- 16.↵
- 17.↵
- 18.↵
- 19.↵
- 20.↵
- 21.↵
- 22.↵
- 23.↵
- 24.↵
- 25.↵
- 26.↵
- 27.
- 28.↵
- 29.↵
- 30.↵
- 31.↵
- 32.↵
- 33.↵
- 34.↵
- 35.↵
- 36.↵
- 37.↵
- 38.↵
- 39.↵
- 40.↵
- 41.↵
- 42.↵
- 43.
- 44.
- 45.↵
- 46.↵
- 47.↵
- 48.↵
- 49.↵
- 50.↵
- 51.↵
- 52.↵
- 53.↵
- 54.↵
- 55.↵
- 56.↵
- 57.↵
- 58.↵
- 59.↵
- 60.↵
- 61.↵
- 62.↵
- 63.↵
- 64.↵
- 65.↵
- 66.↵
- 67.↵
- 68.↵
- 69.↵
- 70.↵
- 71.↵
- 72.↵
- 73.↵
- 74.↵
- 75.↵
- 76.↵
- 77.↵
- 78.↵
- 79.