Abstract
We report on the variable venom composition of a population of the Caucasus viper (Vipera kaznakovi) in Northeastern Turkey. We applied a combination of venom gland transcriptomics, as well as de-complexing bottom-up and top-down venomics, enabling the comparison of the venom proteomes from multiple individuals. In total, we identified peptides and proteins from 15 toxin families, including snake venom metalloproteinases (svMP; 37.8%), phospholipases A2 (PLA2; 19.0%), snake venom serine proteinases (svSP; 11.5%), C-type lectins (CTL; 6.9%) and cysteine-rich secretory proteins (CRISP; 5.0%), in addition to several low-abundance toxin families. Furthermore, we identified intra-species variations of the V. kaznakovi venom composition, and found that these were mainly driven by the age of the animals, with lower svSP abundance in juveniles. On a proteoform level, several small molecular weight toxins between 5 and 8 kDa in size, as well as PLA2s, drove the difference between juvenile and adult individuals. This study provides first insights into venom variability of V. kaznakovi and highlights the utility of intact mass profiling for a fast and detailed comparison of snake venoms of individuals from a community.
Biological Significance
Population level and ontogenetic venom variation (e.g. diet, habitat, sex or age) can cause a loss of antivenom efficacy against snake bites from wide ranging snake populations. The state of the art for the analysis of snake venoms is de-complexing bottom-up proteomics. While useful, these approaches have the significant drawback of being time-consuming and costly, and consequently are often applied to pooled venom samples. To overcome these shortcomings and to enable rapid and detailed profiling of large numbers of individual venom samples, we integrated an intact protein analysis workflow into a transcriptomics-guided bottom-up approach. The application of this workflow to snake individuals of a local population of V. kaznakovi revealed intra-species variations in venom composition, which are primarily explained by the age of the animals, and highlighted svSP abundance to be one of the molecular drivers for the compositional differences.
Highlights
First community venomic analysis of a local population of the Caucasian viper (Vipera kaznakovi).
The venom gland transcriptome of V. kaznakovi identified 46 toxin genes relating to 15 venom toxin families.
Bottom-up venomics revealed the identification of 25 proteins covering 7 toxin families mainly dominated by snake venom metalloproteinases (svMP).
Community venomics by top-down mass profiling revealed ontogenetic shifts between juvenile and adult snakes.
1. Introduction
Venomics is considered an integrative approach, combining proteomics, transcriptomics and genomics to study venoms [1]. Although the term was initially used to describe the mass spectrometry-based proteomic characterization of venoms [2,3], genomic [4,5] or, more commonly, venom gland transcriptomic sequencing [6–14] has also been used to characterize venom compositions. These molecular approaches provide an overview of venom composition by supplying the nucleotide sequences of venom toxin-encoding genes (among others) and, in the case of transcriptomics, an estimate of their relative expression in the venom gland. Furthermore, (translated) protein sequence databases are crucial for the robust annotation of tandem mass spectra from proteomic analyses in peptide/protein spectrum matching (PrSM). A bibliographic search for the keyword “Snake venomics” in PubMed identified 147 hits between 2004 and 2018, with a particularly rapid expansion in the application of venomics approaches in recent years.
Initial proteomic analyses of snake venoms included the combination of multidimensional separation techniques (chromatographic and gel electrophoresis), N-terminal Edman degradation, and de novo sequencing by tandem mass spectrometry of tryptic peptides gathered by in-gel digestion of SDS-PAGE bands [2,15]. Since these initial studies, the proteomic characterization of snake venoms has become much more comprehensive due to technical advances in mass spectrometry and next generation nucleotide sequencing. Several complementary strategies were developed to unveil the venom proteomes of more than 100 snake species [16]. Most of these studies applied the so called ‘bottom-up’ proteomics whereby intact proteins are typically digested with trypsin before tandem mass spectrometry analysis. Many workflows perform venom decomplexation prior to the digestion either by liquid chromatography (LC) or gel electrophoresis, or a combination of both [17]. The direct, in-solution digestion, or so called ‘shotgun proteomics’, allows for a fast qualitative overview, but suffers from a less quantitative breakdown of snake venom composition [17,18]. For example, in shotgun experiments, the problem of protein inference often does not permit the differentiation of the numerous toxin isoforms present in venom [19]. Thus, the chromatographic or electrophoretic separation of venom samples greatly aids in differentiating between toxin isoforms (paralogs). In addition, decomplexing prior to trypsin digestion often does not allow for a clear identification of differential post-translational modified variants, so-called proteoforms [20].
A logical way to bypass this problem is to omit the digestion step and analyze intact proteins directly by tandem mass spectrometry, an approach called top-down proteomics. Recently, top-down protein analysis has been applied alone or in combination with other venomics approaches to study the venom of the King Cobra (Ophiophagus hannah) [21,22], the entire genus of mambas (Dendroaspis spp.) [23,24], the brown tree snake (Boiga irregularis) [6], the Okinawa Habu Pit Viper (Protobothrops flavoviridis) [25] and several viper species from Turkey [26–28]. In the case of viperid species, top-down analysis typically only reveals a partial characterization of the venom, as a number of the main toxin components, such as high molecular weight snake venom metalloproteinases (svMPs) (>30 kDa), are challenging to ionize efficiently by denaturing electrospray ionization (ESI) and might only provide few observable fragments in tandem MS [29]. A possible way to overcome difficulties in the ionization of high molecular weight proteins is the application of native ESI, as described by Melani et al. [22]. However, native top-down mass spectrometry typically requires a special type of mass spectrometer with extended mass range and more extensive sample preparation, which makes this type of analysis more technically challenging. In most of the aforementioned studies, the top-down workflow was performed with a front-end LC-based sample decomplexation. This allows for the generation of MS1 mass profiles (XICs) of intact proteoforms. Typically, the MS1 information is accompanied by tandem MS (MS2) information acquired in data-dependent acquisition (DDA) mode. The MS2 fragment spectra are then matched to a translated transcriptome/genome database in order to identify the proteins.
In the case that there are not enough MS2 fragment peaks of a particular proteoform, the intact molecular mass can still enable identification, especially if the intact mass can be associated with information from complementary experiments, such as retention time, mass range of SDS-PAGE and/or bottom-up protein IDs of decomplexed bottom-up venomics [26]. The additional information gained through exact intact protein masses can be particularly informative to differentiate between isoforms or proteoforms. Furthermore, the simple sample preparation, high sensitivity and fast analysis time allow for a rapid comparison of the venom composition. The quantitative capabilities of top-down approaches [30–32] thereby offer great potential for comparative venomic studies of individuals. While most snake venom compositions reported so far [16] were determined on a single pool of venom sourced from different numbers of individuals, several studies have shown correlations between different ecological, geographical, genetic and/or developmental factors and the venom proteome, e.g. different diets [33–36], regional separation of populations [37–39], sex [40–42] or age [43–46]. In addition to improving our understanding of the heritability of venom toxins [47] and the evolutionary processes underpinning population level venom variations [48], venomics is an important approach for characterizing regional and intraspecific variations in the venom composition of medically important snake species, which has considerable relevance for the development and clinical utility of snakebite therapies, known as antivenoms [49,50].
Here, we applied a top-down venomics approach to demonstrate intraspecific venom variation in a local population of the medically relevant Caucasian viper (Vipera kaznakovi). The Caucasian viper is a subtropical, medium-sized viper species with a distribution range mainly at the Caucasian Black Sea coast in the Artvin and Rize provinces of Turkey. V. kaznakovi feeds predominately on small vertebrates (mice, lizards etc.) and also insects [51]. In a previous shotgun proteomics study of this species, Utkin and coworkers described the venom of V. kaznakovi to be composed of phospholipase A2 (PLA2), svMP, snake venom serine proteases (svSP), cysteine-rich secretory proteins (CRISP), C-type lectins (CTL), L-amino acid oxidase (LAAO), vascular endothelial growth factor (VEGF), disintegrins (Dis), phospholipase B (PLB) and nerve growth factors (NGF), as well as a number of other proteins of lower abundance [52]. In this study, we pursue a more in-depth approach to characterizing the venom of this species. We use a combination of venom gland transcriptomics, decomplexing bottom-up proteomics and comparative top-down proteomics to broadly characterize the venom composition of this species, and also to investigate intraspecific variation in toxins at the level of individual snakes.
2. Material & Methods
2.1. Sampling
Venom samples of V. kaznakovi were collected from 6 adult (2 female, 4 male) and 3 juvenile specimens (unknown sex). All specimens were captured in late June 2015 in their natural habitat and released back into their natural environment after venom extraction. The V. kaznakovi individuals were collected in Artvin province in Turkey near the Georgian border, with 6 individuals sampled from Hopa district, 2 individuals from Borçka district and 1 specimen in the Arhavi district. An additional female individual found in Borçka district was collected for venom gland dissection for transcriptomics analysis. Ethical permission (Ege University Animal Experiments Ethics Committee, 2013#049) and special permission (2015#124662) for the sampling of wild-caught V. kaznakovi were received from the Republic of Turkey, Ministry of Forestry and Water Affairs.
2.2. Sample storage and preparation
Crude V. kaznakovi venom was extracted by using a parafilm-covered laboratory beaker without exerting pressure on the venom glands. Venom samples were centrifuged at 2000 × g for 10 min at 4 °C to remove cell debris. Supernatants were immediately frozen at −80 °C, lyophilized and samples were stored at 4 °C until use.
2.3. Determination of lethal dose (LD50)
Lethal potency (LD50) of venoms to mice (milligrams of dry weight per kg) was determined by an up-and-down method as recommended by the Organization for Economic Cooperation and Development (OECD) guidelines (Test No. 425) [53,54]. Groups of five mice (n = 15; age, 8 to 10 weeks; female 8 and male 7 individuals) were used per venom dose. Various venom concentrations (5, 2 and 1 mg/kg, milligrams of protein per kg calculated from dry weight venom by Bradford assay) were diluted in ultrapure water to a final volume of 100 μL and injected by the intraperitoneal (IP) route. Control mice (n = 5; female 2 and male 3 individuals) received a single IP injection of sterile saline (0.9%, 0.1 mL). All assays and procedures involving animals strictly followed the ethical principles in animal research adopted by the Swiss Academy of Medical Sciences [55]. Additionally, they were approved by a local ethics committee (2013#049). The mortality was recorded 24 h after injection. The median lethal dose was determined by a nonlinear regression fitting procedure (GraphPad Prism, version 5.01, GraphPad Software Inc., San Diego, CA, USA).
2.4. RNA isolation and purification
Venom glands were dissected from a wild caught adult female specimen of V. kaznakovi in Kanlidere, Hopa district (Artvin province) and processed as previously described [9,24]. Briefly, immediately following euthanasia, venom glands were dissected and were immediately flash frozen in liquid nitrogen and stored cryogenically prior to RNA extraction. Venom glands were next homogenized under liquid nitrogen and total RNA extracted using a TRIzol Plus RNA purification kit (Invitrogen), DNAse treated with the PureLink DNase set (Invitrogen) and poly(A) selected using the Dynabeads mRNA DIRECT purification kit (Life Technologies), as previously detailed [9,24].
2.5. RNA sequencing, assembly and annotation
RNA-Seq was performed as previously described [9,24]. The RNA-Seq library was prepared from 50 ng of enriched RNA material using the ScriptSeq v2 RNA-Seq Library Preparation Kit (Epicentre, Madison, WI, USA), following 12 cycles of amplification. The resulting sequencing library was purified using AMPure XP beads (Agencourt, Brea, CA, USA) and quantified using the Qubit dsDNA HS Assay Kit (Life Technologies), before the size distribution was assessed using a Bioanalyser (Agilent). The library was then multiplexed and sequenced (alongside other sequencing libraries not reported in this study) on a single lane of an Illumina MiSeq, housed at the Centre for Genomic Research, Liverpool, UK. The V. kaznakovi library amounted to 1/6th of the total sequencing lane. The ensuing read data was quality processed by (i) removing any adapter sequences using Cutadapt (https://code.google.com/p/cutadapt/) and (ii) trimming low quality bases using Sickle (https://github.com/najoshi/sickle). Reads were trimmed if bases at the 3’ end matched the adapter sequence for 3 bp or more, and further trimmed with a minimum window quality score of 20. After trimming, reads shorter than 10 bp were removed.
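The quality-trimming step can be illustrated with a minimal sketch. It is not Sickle's actual algorithm: the function name `trim_read` and the window size of 5 bases are assumptions made for illustration, and only the quality threshold (20) and the minimum read length (10 bp) are taken from the text above.

```python
def trim_read(qualities, window=5, min_quality=20, min_length=10):
    """Return how many bases to keep from the 5' end (0 = discard the read).

    Hypothetical helper: slides a fixed-size window along the Phred scores
    and cuts the read at the start of the first window whose mean quality
    drops below min_quality. The window size is an assumption.
    """
    if not qualities:
        return 0
    keep = len(qualities)
    last_start = max(len(qualities) - window, 0)
    for start in range(last_start + 1):
        chunk = qualities[start:start + window]
        if sum(chunk) / len(chunk) < min_quality:
            keep = start
            break
    # Reads that end up shorter than the minimum length are removed.
    return keep if keep >= min_length else 0


if __name__ == "__main__":
    # Hypothetical read: 20 good bases followed by 10 poor ones.
    print(trim_read([30] * 20 + [5] * 10))
```

In practice the trimming tool itself should be used; the sketch only makes explicit how a window threshold and a length filter interact.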
For sequence assembly we used VTBuilder, a de novo transcriptome assembly program previously designed and validated for constructing snake venom gland transcriptomes [56]. Paired-end read data was entered into VTBuilder and executed with the following parameters: min. input read length 150 bp; min. output transcript length 300 bp; min. isoform similarity 96%. Assembled contigs were annotated with the BLAST2GO Pro v3 [57] using the blastx-fast algorithm with a significance threshold of 1e−5, to provide BLAST annotations (max 20 hits) against NCBI’s non redundant (NR) protein database (41 volumes; Nov 2015) followed by mapping to gene ontology terms, and Interpro domain annotation using default parameters. Following generic annotation, venom toxins were initially identified based on their BLAST similarity to sequences previously identified in the literature or in molecular databases as snake venom toxins, and then manually curated for validation.
2.6. Venom proteomics (bottom-up)
The crude venom (1 mg) was dissolved to a final concentration of 10 mg/mL in aqueous 3% (v/v) acetonitrile (ACN) with 1% (v/v) formic acid (FA) and centrifuged at 16,000 × g for 5 min to spin down insoluble content. The supernatant was loaded onto a semi-preparative reversed-phase HPLC with a Supelco Discovery BIO Wide Pore C18-3 column (4.6 × 150 mm, 3 μm particle size) using an Agilent 1260 Low Pressure Gradient System (Agilent, Waldbronn, Germany). The column was operated at a flow rate of 1 mL/min with ultrapure water (solution A) and ACN (solution B), both including 0.1% (v/v) FA. A standard separation gradient was used with solution A and solution B, starting isocratically (5% B) for 5 min, followed by linear gradients of 5–40% B for 95 min and 40–70% B for 20 min, then 70% B for 10 min, and finally re-equilibration at 5% B for 10 min. Peak detection was performed at λ = 214 nm using a diode array detector (DAD). After the chromatographic separation of the crude venom, the collected and vacuum-dried peak fractions were separated by SDS-PAGE (12% polyacrylamide). Subsequently, the Coomassie-stained bands were excised and submitted to in-gel trypsin digestion, including reduction with fresh dithiothreitol (100 mM DTT in 100 mM ammonium hydrogencarbonate, pH 8.3, for 30 min at 56 °C) and alkylation with iodoacetamide (55 mM IAC in 100 mM ammonium hydrogencarbonate, pH 8.3, for 30 min at 25 °C in the dark). The resulting peptides were then extracted with 100 μL aqueous 30% (v/v) ACN containing 5% (v/v) FA for 15 min at 37 °C. The supernatant was vacuum-dried (Thermo speedvac, Bremen, Germany), redissolved in 20 μL aqueous 3% (v/v) ACN with 1% (v/v) FA and submitted to LC-MS/MS analysis.
The bottom-up analyses were performed with an Orbitrap XL mass spectrometer (Thermo, Bremen, Germany) via an Agilent 1260 HPLC system (Agilent Technologies, Waldbronn, Germany) using a reversed-phase Grace Vydac 218MSC18 (2.1 × 150 mm, 5 μm particle size) column. The pre-chromatographic separation was performed with the following settings: After an isocratic equilibration (5% B) for 1 min, the peptides were eluted with a linear gradient of 5–40% B for 10 min, 40–99% B in 3 min, held at 99% B for 3 min and re-equilibrated in 5% B for 3 min.
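As a reading aid, the semi-preparative gradient described above can be written as a small lookup that returns the solvent B percentage at a given run time. The names `GRADIENT` and `percent_b` are hypothetical, and the instantaneous step from 70% B back to 5% B at the start of re-equilibration is an assumption, since the text does not state a ramp time.

```python
# (duration in min, %B at segment start, %B at segment end), from the text above.
GRADIENT = [
    (5, 5.0, 5.0),     # isocratic start
    (95, 5.0, 40.0),   # first linear gradient
    (20, 40.0, 70.0),  # second linear gradient
    (10, 70.0, 70.0),  # isocratic hold
    (10, 5.0, 5.0),    # re-equilibration (assumed instantaneous step down)
]

TOTAL_RUN_MIN = sum(duration for duration, _, _ in GRADIENT)


def percent_b(t_min):
    """Return %B at time t_min (minutes) by linear interpolation within a segment."""
    if t_min < 0 or t_min > TOTAL_RUN_MIN:
        raise ValueError("time outside the programmed gradient")
    elapsed = 0.0
    for duration, start_b, end_b in GRADIENT:
        if t_min <= elapsed + duration:
            fraction = (t_min - elapsed) / duration
            return start_b + fraction * (end_b - start_b)
        elapsed += duration
    return GRADIENT[-1][2]


if __name__ == "__main__":
    for t in (0, 52.5, 110, 125, 135):
        print(t, percent_b(t))
```

The same segment table applies to the top-down separation in Section 2.7, which uses identical gradient steps at a lower flow rate.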
2.7. Community venom profiling (top-down)
The top-down MS analysis was performed by dissolving the crude venoms in ultrapure water containing formic acid (FA, 1%) to a final concentration of 10 mg/mL, followed by centrifugation at 20,000 × g for 5 min. Aliquots of 10 μL dissolved venom samples were submitted to reverse-phase (RP) HPLC-high-resolution (HR)-MS analyses. RP-HPLC-HR-MS experiments were performed on an Agilent 1260 HPLC system (Agilent, Waldbronn, Germany) coupled to an Orbitrap LTQ XL mass spectrometer (Thermo, Bremen, Germany). RP-HPLC separation was performed on a Supelco Discovery Biowide C18 column (300 Å pore size, 2 × 150 mm column size, 3 μm particle size). The flow rate was set to 0.3 mL/min and the column was eluted with a gradient of 0.1% FA in water (solution A) and 0.1% FA in ACN (solution B): 5% B for 5 min, followed by 5–40% B for 95 min, and 40–70% B for 20 min. Finally, the gradient was held isocratic with 70% B for 10 min and re-equilibrated at 5% B for 10 min. ESI settings were: 11 L/min sheath gas; 35 L/min auxiliary gas; spray voltage, 4.8 kV; capillary voltage, 63 V; tube lens voltage, 135 V; and capillary temperature, 330 °C. MS/MS spectra were obtained in data-dependent acquisition (DDA) mode. FTMS measurements were performed with 1 μscan and 1000 ms maximal fill time. AGC targets were set to 10^6 for full scans and to 3 × 10^5 for MS/MS scans, and the survey scan as well as both data-dependent MS/MS scans were performed with a mass resolution (R) of 100,000 (at m/z 400). For MS/MS the two most abundant ions of the survey scan with known charge were selected. Normalized CID energy was set to 30% for the first, and 35% for the second, MS/MS event of each duty cycle. The default charge state was set to z = 6, and the activation time to 30 ms. Additional HCD experiments were performed with 35% normalized collision energy, 30 ms activation time and z = 5 default charge state. The mass window for precursor ion selection was set to 2 or 6 m/z.
A window of 3 m/z was set for dynamic exclusion of up to 50 precursor ions with a repeat of 1 within 10 s for the next 20 s.
2.8. Bioinformatic analysis
The LC-MS/MS data files (.raw) obtained from the in-gel digestion were converted to mascot generic format (.mgf) files via MSConvert GUI of the ProteoWizard package (http://proteowizard.sourceforge.net; version 3.0.10328) and annotated by DeNovo GUI [58] (version 1.14.5) with a mass accuracy of 10 ppm for precursor mass and 0.2 m/z for fragment peaks. A fixed modification carbamidomethyl cysteine (C +57.02 Da) was selected. Resulting sequence tags were examined manually and searched against the non-redundant Viperidae protein database (taxid: 8689) using the basic local alignment search tool (BLAST) [59].
For peptide spectrum matching, the SearchGUI software tool was used with X!Tandem as the search engine [60]. The MS2 spectra were searched against the non-redundant NCBI Viperidae protein database (taxid: 8689, 3rd Nov 2017, 1727 sequences), our in-house Vipera kaznakovi toxin sequence database (translated from our venom gland transcriptomic analyses; 46 toxin sequences) and a set of proteins found as common contaminants (cRAP, 116 sequences), containing in total 1889 sequences. Mass accuracy was set to 10 ppm for the precursor mass and 0.2 m/z for the MS2 level. Alkylation of Cys was set as a fixed modification, and acetylation of the N-terminus and of Lys as well as oxidation of Met were allowed as variable modifications. The false discovery rate (FDR) was estimated through a target-decoy approach and a cut-off of 1% was applied. All PSMs were validated manually and at least 2 PSMs were required for a protein ID to be considered.
For the top-down data analysis, the .raw data were converted to .mzXML files using MSConvert of the ProteoWizard package (http://proteowizard.sourceforge.net; version 3.0.10328), and multiply charged spectra were deconvoluted using the XTRACT algorithm of the Xcalibur Qual Browser version 2.2 (Thermo, Bremen, Germany). For isotopically unresolved spectra, charge distribution deconvolution was performed using the software tool magic transformer (MagTran).
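The target-decoy cut-off can be illustrated with a simplified sketch. It is not the procedure implemented in SearchGUI or its companion tools: the function name `score_threshold_at_fdr`, the "higher score is better" convention, and the decoys/targets estimator are assumptions chosen for clarity, and the example scores are hypothetical.

```python
def score_threshold_at_fdr(psms, max_fdr=0.01):
    """Return the lowest score cut-off whose estimated FDR stays within max_fdr.

    psms: iterable of (score, is_decoy) pairs; higher scores are assumed better.
    FDR at a cut-off is estimated as (#decoys) / (#targets) among all PSMs
    scoring at or above it. Returns None if no cut-off satisfies the limit.
    """
    ranked = sorted(psms, key=lambda psm: psm[0], reverse=True)
    targets = 0
    decoys = 0
    threshold = None
    for score, is_decoy in ranked:
        if is_decoy:
            decoys += 1
        else:
            targets += 1
        if targets > 0 and decoys / targets <= max_fdr:
            threshold = score
    return threshold


if __name__ == "__main__":
    # Hypothetical scores: ten target hits, one decoy, one further target hit.
    example = [(100 - i, False) for i in range(10)] + [(50, True), (40, False)]
    print(score_threshold_at_fdr(example, max_fdr=0.05))
```

With real data the list would contain thousands of PSMs, and the 1% limit stated above would be passed as `max_fdr=0.01`.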
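The principle behind charge distribution deconvolution can be sketched as follows. This is not the XTRACT or MagTran algorithm; it only shows the underlying arithmetic for positive-mode ESI, in which a protein of neutral mass M carrying z protons appears at m/z = M/z + m_p. The function names and the example mass of 13,800 Da are hypothetical.

```python
PROTON_MASS = 1.007276  # Da, mass of a proton


def neutral_mass(mz, charge):
    """Neutral mass of an ion observed at m/z `mz` with `charge` protons attached."""
    return charge * (mz - PROTON_MASS)


def charge_from_adjacent_peaks(mz_high, mz_low):
    """Infer the charge z of the peak at mz_high.

    Assumes mz_high (charge z) and mz_low (charge z + 1) are neighbouring
    members of the same charge-state series. Setting
    z * (mz_high - p) = (z + 1) * (mz_low - p) gives
    z = (mz_low - p) / (mz_high - mz_low).
    """
    if mz_high <= mz_low:
        raise ValueError("mz_high must be larger than mz_low")
    return round((mz_low - PROTON_MASS) / (mz_high - mz_low))


if __name__ == "__main__":
    # Hypothetical protein of 13,800 Da observed at charge states 10+ and 11+.
    mz_10 = 13800.0 / 10 + PROTON_MASS
    mz_11 = 13800.0 / 11 + PROTON_MASS
    z = charge_from_adjacent_peaks(mz_10, mz_11)
    print(z, neutral_mass(mz_10, z))
```

Because only the spacing of the charge-state envelope is needed, this calculation does not require isotopic resolution.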
2.9. Multivariable statistics
Principal component analysis (PCoA), using the relative percentages of the major toxin families as well as different proteoforms as a variable, was applied to explain determinants of compositional variation among venoms. PCoA was performed in R (R Foundation for Statistical Computing, 2016) with the extension Graphic Package rgl, available from https://www.R-Project.org.
2.10. Data sharing
Mass spectrometry proteomics data (.mgf, .raw and results files and search database) have been deposited to ProteomeXchange [61] with the ID PXD010857 via the MassIVE partner repository under project name “Venom proteomics of Vipera kaznakovi” and massive ID MSV000082845.
Raw sequencing reads and the assembled contigs generated for the venom gland transcriptome (.fastq and .fasta, respectively) have been deposited in the NCBI sequence read archive (SRA) under accession SRR8198764 and linked to the BioProject identifier PRJNA505487.
3. Results and Discussion
3.1. Field work and venom toxicity
The medium-sized Caucasian viper (Vipera kaznakovi) mainly inhabits the forested slopes of mountain peaks with a distribution range from the Caucasian Black Sea coast provinces of northeastern Turkey, through Georgia to Russia (Figure 1). V. kaznakovi feeds predominately on small vertebrates (mice, lizards, etc.) or insects, and a specific characteristic of this species is the completely black coloration with an orange to red zigzag stripe on the upper side of the body (Figure 1).
During our fieldwork in June 2015 we collected nine V. kaznakovi individuals (6 adults and 3 juveniles) in their natural habitat, whose venom was extracted by using a parafilm-covered laboratory beaker before the snakes were released back into their natural environment. The different V. kaznakovi individuals were found in Hopa (6 spec.), Borçka (2 spec.) and Arhavi (1 spec.) districts of the Artvin province (Figure 1). The mean LD50 value of venom pooled from all collected V. kaznakovi individuals was assessed by the intraperitoneal (IP) route using a random sample of five Swiss mice for each of three venom doses (5, 2 and 1 mg/kg); the results are summarized in supplemental table 1. The mean LD50 value obtained for the pooled V. kaznakovi venom was calculated as ~2.6 mg/kg and can be categorized as slightly weaker toxicity in this model compared to other related viper species (0.9–1.99 mg/kg) [62–65].
3.2. Venom gland transcriptomics
The V. kaznakovi venom gland transcriptome resulted in 1,742 assembled contigs, of which 46 exhibited gene annotations relating to 15 venom toxin families previously described in the literature (Figure 2). The majority of these contigs (33) encode genes, expressing toxin isoforms relating to four multi-locus gene families, namely the svMPs, CTLs, svSPs and PLA2s (Figure 2). Moreover, these four toxin families also exhibited the highest expression levels of the toxin families identified; in combination accounting for >78% of all toxin expressions (Figure 2). These findings are consistent with many prior studies of viperid venom gland transcriptomes [10,12,49,66,67].
The svMPs were the most abundantly expressed of the toxin families, accounting for 33.4% of the total toxin expression, and were encoded by 17 contigs (Figure 2). However, these contig numbers are likely to be an overestimation of the total number of expressed svMP genes found in the V. kaznakovi venom gland, as six of these contigs were incomplete and non-overlapping in terms of their nucleotide sequence, and therefore likely reflect a degree of low transcriptome coverage and/or under-assembly. Of those contigs that we were able to identify to svMP class level (e.g. P-I, P-II or P-III [68,69]), ten exhibited structural domains unique to P-III svMPs, one to P-II svMPs and one to a short coding disintegrin. Interestingly, the svMP contig that exhibited the highest expression level encoded for the sole P-II svMP (5.1% of all venom toxins), whereas the short coding disintegrin, which exhibited 98% identity to the platelet aggregation inhibitor lebein-1-alpha from Macrovipera lebetina [70], was more moderately expressed (2.1%). Interestingly, we found no evidence for the representation of the P-I class of svMPs in the V. kaznakovi venom gland transcriptome.
The CTLs were the next most abundant toxin family, with six contigs representing 27.5% of all toxin gene expression (Figure 2). Interestingly, one of these CTLs, which exhibits the closest similarity to snaclec-7 from Vipera ammodytes venom (GenBank: APB93444.1), was by far the most abundantly expressed toxin identified in the venom gland transcriptome (15.4% of all toxins) (Figure 2). We identified lower expression levels for the multi-locus svSP and PLA2 toxin families, which accounted for 9.2% and 8.1% of the toxins expressed in the venom gland transcriptome, respectively, and were encoded by seven and three contigs (Figure 2). Of the remaining toxin families identified, only two exhibited expression levels >3% of the total toxin expression; CRISPs were encoded by two contigs amounting to 5.4% of total toxin expression, and LAAO by a single contig representing 4.23% (Figure 2). The remaining nine, lowly expressed, toxin families identified in the venom gland transcriptome are displayed in Figure 2, and combined amounted to 12.1% of total toxin expression.
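The figures quoted in this section can be cross-checked with a few lines of arithmetic. The sketch below uses only the percentages and contig counts stated in the text; the variable names are illustrative, and the small shortfall from 100% presumably reflects rounding of the individual values.

```python
# Share of total toxin expression (%) as stated in Section 3.2.
EXPRESSION = {
    "svMP": 33.4,
    "CTL": 27.5,
    "svSP": 9.2,
    "PLA2": 8.1,
    "CRISP": 5.4,
    "LAAO": 4.23,
    "nine minor families (combined)": 12.1,
}

# Contig counts stated for the four multi-locus families.
CONTIGS = {"svMP": 17, "CTL": 6, "svSP": 7, "PLA2": 3}

MAJOR_FAMILIES = ("svMP", "CTL", "svSP", "PLA2")

major_share = sum(EXPRESSION[family] for family in MAJOR_FAMILIES)
total_share = sum(EXPRESSION.values())
major_contigs = sum(CONTIGS.values())

if __name__ == "__main__":
    print(f"four major families: {major_share:.1f}% of toxin expression")
    print(f"all families: {total_share:.2f}%")
    print(f"contigs in the four major families: {major_contigs}")
```

The four multi-locus families thus account for 78.2% of toxin expression and 33 of the 46 toxin contigs, in line with the ">78%" and "33" given above.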
3.3 Decomplexed proteomics of pooled venom
To broadly characterize the venom composition of V. kaznakovi, in an initial experiment, we performed bottom-up analysis of pooled venom by reversed phase-HPLC separation (Figure 3A) and direct online intact mass analysis by ESI-HR-MS (Figure 3B). The prominent bands of the subsequent separation by SDS-PAGE (Figure 3C) were excised followed by trypsin in-gel digestion and LC-MS/MS analysis. During the first analysis we did not have a species-specific transcriptome database available, hence the spectra were analyzed by de novo sequencing. The resulting sequence tags were searched against the NCBI non-redundant viperid protein database using BLAST [59]. The 57 sequence tags resulted in the identification of 25 proteins covering 7 toxin families (Table 1), namely svMP, PLA2, svSP, CTL, CRISP, VEGF and LAAO.
De novo sequencing of MS/MS spectra of native small peptides (peaks 1–9) resulted in four additional sequence tags and the identification of a svMP inhibitor (svMP-i) and two bradykinin potentiating peptides (BPP). When we obtained the assembled transcriptome data, we re-analyzed the MS/MS data from the tryptic peptides by peptide spectrum matching (PSM) using the translated protein sequences of the transcriptome as well as the NCBI Viperidae protein database. PSM resulted in 114 peptide matches in total, which doubled the number of annotated spectra in comparison to the de novo annotation. The analysis revealed the same seven major toxin families as identified by the tryptic de novo tags, and the number of identified proteins improved slightly, from 25 to 29. Not surprisingly, most of the peptide matches were from the transcriptome-derived sequences and only six protein IDs came from other viperid sequences from the NCBI database. Relative quantification through integration of the UV-HPLC peaks and densitometric analysis of the SDS-gels revealed that the most abundant toxin families were svMP (37.7%), followed by PLA2 (19.0%), svSP (9.6%), LAAO (7.1%), CTL (6.9%), CRISP (5.0%), and VEGF (0.3%). In the small molecular mass range < 2 kDa, svMP-i contributed 12.6%, BPP 2.0%, and unknown peptides 4.0% to the overall venom composition (Figure 3D).
Comparing the abundance of venom toxins (Figure 3D) with transcriptomic predictions of expression (Figure 2A), we observed an overall positive correlation, but we noted some major differences, particularly relating to the CTLs: transcriptomic expression levels showed CTLs to be the second most abundant toxin family (27.5% of all toxin contigs) while proteomic analysis shows a much lower occurrence (6.9%). Interestingly, some of the molecular masses observed for CTLs (~20 kDa) during SDS-PAGE did not correspond to the expected molecular mass derived from the transcriptome sequence. As reported in other studies, we assume that some of the observed CTLs are hetero-dimers [71]. SvMPs showed highly consistent profiles, as both the most abundantly expressed (33.4%) and translated (32.7%) toxin family. Similarly, the svSPs (9.2%) and CRISPs (5.4%) exhibited transcription levels highly comparable to their relative protein abundance in venom (9.6% and 5.02%). A lower transcription level was shown for PLA2 (8.1%) in contrast to the two times higher protein level (19.0%). As anticipated, with the exception of VEGF (2.0% T; 0.4% P) and svMP-i (1.7%; 12.6%) as part of the peptidic content, other lowly expressed ‘toxin’ families could not be assigned on the proteomic level.
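To make the comparison in this paragraph easier to follow, the sketch below ranks the toxin families by how strongly protein abundance (P) departs from transcript expression (T), using a log2 ratio. The values are exactly those quoted in the paragraph above; the choice of a log2 ratio as the discordance measure and all names are illustrative assumptions, not part of the original analysis.

```python
import math

# (transcript %, protein %) as quoted in the paragraph above.
LEVELS = {
    "CTL": (27.5, 6.9),
    "svMP": (33.4, 32.7),
    "svSP": (9.2, 9.6),
    "CRISP": (5.4, 5.02),
    "PLA2": (8.1, 19.0),
    "VEGF": (2.0, 0.4),
    "svMP-i": (1.7, 12.6),
}


def log2_ratios(levels):
    """log2(protein / transcript) per family; 0 means perfect agreement."""
    return {
        family: math.log2(protein / transcript)
        for family, (transcript, protein) in levels.items()
    }


ratios = log2_ratios(LEVELS)
# Most discordant families first.
ranked = sorted(ratios, key=lambda family: abs(ratios[family]), reverse=True)

if __name__ == "__main__":
    for family in ranked:
        print(f"{family:7s} log2(P/T) = {ratios[family]:+.2f}")
```

On these numbers svMP-i, VEGF, CTL and PLA2 show the largest departures, while svMP, svSP and CRISP agree closely between the two data sets.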
The observed discrepancies between proteomic abundance and transcriptomic expression (e.g. CTLs and PLA2s) may be influenced by many factors, e.g. post-genomic factors acting on toxin genes [49], such as the regulation of expression patterns by microRNAs (miRNA) [7,72], degradation processes [73], systematic or stochastic variations [74] or technical limitations in the experimental approach, including the possibly lower sensitivity of the proteomics workflow. Perhaps most importantly, we compared the toxin transcription level of a single individual (adult female) to a pooled venom protein sample (n=9). Thus, while it is possible that these differences are predominately due to the above-mentioned regulatory processes, it seems likely that intra-specific venom variations also influence our findings. Due to understandable sampling/ethical restrictions relating to the sacrifice of individuals, we were unable to sequence venom gland transcriptomes of multiple specimens of V. kaznakovi.
The previous proteomic characterization of the V. kaznakovi venom by Utkin and coworkers was performed by in-solution trypsin proteolysis followed by nanoLC-MS/MS [52]. The PSM against a full NCBI Serpentes database identified 116 proteins from 14 typical snake venom protein families. The semi-quantitative venom composition showed PLA2 (41%) as the most abundant component, followed by svMPs (16%), CTL (12%), svSP (11%), CRISP (10%), LAAO (4%), VEGF (4%) and other lowly abundant proteins (< 1%) [52]. Besides the additional detection of lowly abundant proteins, the main differences to our results are the considerably higher levels of PLA2 and the lower abundance of svMPs (~ 4 fold difference for both protein families). The additional detection of lowly abundant proteins could be technical in nature, as the nanoLC-MS/MS setup used in the study by Utkin et al. is typically more sensitive than the LC-MS/MS setup we used. The major differences in protein abundance, in turn, could be explained by the different quantification methods applied (UV abundance vs. summed peptide abundance [52]). Furthermore, the observed variations could also be biological in nature, i.e. the result of intra-specific venom variation, as the animals were collected in different geographic regions (Krasnodar Territory, Russia [51], with a distance of ~ 400 km to our collection site). However, as in most other venom proteomics studies, the composition was determined from a pooled venom sample (15 individuals [52]), which has the potential to offset variation among individuals. In order to robustly assess the extent of intra-specific (e.g. population level) variations in V. kaznakovi venom, analysis of a representative group of individuals is necessary.
3.4 Community venom profiling
It seems understandable that many venom studies are undertaken using pooled venom samples, given the costs and analysis time associated with decomplexing bottom-up venomics studies. In our case, we fractionated pooled venom from V. kaznakovi into 25 fractions and further separated the protein-containing fractions (MW > 5 kDa) by SDS-PAGE. This multidimensional separation resulted in 25 digested peptide samples, which we analyzed by LC-MS/MS, requiring ~ 10 h of MS run time (25 min/sample) at an estimated cost of ~ $2,000 ($80/sample). Multiplying this effort and cost by numerous venom samples from individuals would of course make such a study comparatively expensive. Hence, many previous studies investigating venom variability within a species have used pooled venom for in-depth proteomic analysis, and then illuminated individual variability by comparing HPLC chromatograms and/or SDS-PAGE images [50,75,76]. This allows, at best, a comparison at the protein family level (if protein families are clearly separated by HPLC or SDS-PAGE). As an alternative, top-down or shotgun proteomics would allow for a differential comparison at the protein, or potentially proteoform, level with a single LC-MS/MS run per individual.
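The scaling argument above can be made concrete with a short calculation. The following is a minimal sketch that uses only the per-sample figures quoted in the text (25 fractions, 25 min and $80 per run); the function names are hypothetical, and applying the same per-run time and cost to a single intact-mass run per individual is an assumption made purely for illustration.

```python
def ms_effort(n_runs, minutes_per_run=25.0, usd_per_run=80.0):
    """Return (instrument hours, cost in USD) for a number of LC-MS runs.

    The defaults are the per-sample figures quoted in the text.
    """
    return n_runs * minutes_per_run / 60.0, n_runs * usd_per_run


def decomplexing_effort(n_venoms, fractions_per_venom=25):
    """Effort if every venom is fractionated and each fraction is run separately."""
    return ms_effort(n_venoms * fractions_per_venom)


def single_run_effort(n_venoms):
    """Effort with one LC-MS run per venom.

    Assumption: a single run costs the same time and money as one fraction run.
    """
    return ms_effort(n_venoms)


if __name__ == "__main__":
    scenarios = [
        ("pooled venom, decomplexing", decomplexing_effort(1)),
        ("9 individuals, decomplexing", decomplexing_effort(9)),
        ("9 individuals, single run each", single_run_effort(9)),
    ]
    for label, (hours, usd) in scenarios:
        print(f"{label}: {hours:.1f} h, ${usd:,.0f}")
```

Under these assumptions, instrument time and cost grow with the number of fractions per venom, which is why a single run per individual is attractive for population-level comparisons.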
However, shotgun approaches are likely to suffer from the aforementioned issues with protein inference, while top-down approaches have the drawback of not resolving high molecular mass proteins. This is particularly the case if the identification and comparison of proteins are based on Protein Spectrum Matching (PrSM), as high molecular weight toxins may not result in isotope resolved peaks and sufficient precursor signal, and thus are unlikely to provide sufficient fragment ions. However, a comparison by MS1 mass profiling only [77] would eliminate the problem of insufficient MS/MS fragments and isotope resolution, as spectra can be easily deconvoluted based on their charge state distribution. Such an approach could be particularly interesting for laboratories that are equipped with low resolution mass spectrometers.
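Deconvolution from a charge state distribution can be illustrated as follows. This is a minimal sketch, not the software used in this study: it assumes that the supplied m/z values are consecutive charge states of one protonated protein, and all function names are hypothetical. For two neighbouring peaks with charges z and z+1, the charge follows from z = (m/z[z+1] − m_p) / (m/z[z] − m/z[z+1]), where m_p is the proton mass, and the neutral mass from M = z · (m/z[z] − m_p).

```python
PROTON_MASS = 1.007276  # Da


def theoretical_mz(neutral_mass, charge):
    """m/z of a protein carrying `charge` protons."""
    return (neutral_mass + charge * PROTON_MASS) / charge


def charge_of_first_peak(mz_first, mz_next):
    """Charge z of the peak at mz_first, given the neighbouring (z + 1) peak at mz_next.

    Requires mz_first > mz_next.
    """
    return round((mz_next - PROTON_MASS) / (mz_first - mz_next))


def deconvolute(mz_peaks):
    """Return (lowest charge, mean neutral mass) for one charge state envelope.

    Assumption: the peaks are consecutive charge states of a single protein.
    """
    peaks = sorted(mz_peaks, reverse=True)  # highest m/z has the lowest charge
    if len(peaks) < 2:
        raise ValueError("at least two charge states are required")
    z0 = charge_of_first_peak(peaks[0], peaks[1])
    masses = [(z0 + i) * (mz - PROTON_MASS) for i, mz in enumerate(peaks)]
    return z0, sum(masses) / len(masses)


if __name__ == "__main__":
    # Simulated envelope for a 13667.91 Da protein, m/z rounded to 0.01
    # to mimic limited mass accuracy.
    envelope = [round(theoretical_mz(13667.91, z), 2) for z in range(8, 12)]
    z0, mass = deconvolute(envelope)
    print(f"lowest charge: {z0}, neutral mass: {mass:.2f} Da")
```

Because the calculation relies only on the spacing of the charge state peaks, it does not require isotope resolution, which is the reason such profiling remains feasible on low-resolution instruments.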
In order to explore the potential of venom comparison by top-down mass profiling, we analyzed the venoms of nine V. kaznakovi individuals by LC-MS, using the same chromatographic method as for the initial HPLC separation in our decomplexing bottom-up venom analysis. Chromatographic peak extraction of all individuals resulted in 119 consensus extracted ion chromatograms (XIC), or so-called ion features. The alignment of XICs by retention time and mass enabled the comparison of samples between individuals, as well as a comparison with the mass profile of the pooled venom sample for protein-level annotation. An overview of all resulting features, including annotations, is shown in supplemental table 1. Looking at the binary distribution of ion features, individual venoms contained between 62 and 107 features, with a slightly higher average feature number in juveniles vs. adults. Comparing the total ion currents (TIC) of the LC-MS runs, the individual with the lowest feature number also had the lowest overall signal. Hence, it is likely that the lower number of features in this individual was due to lower overall signal intensity and therefore might not be biologically representative. For further statistical evaluation we thus normalized feature abundance to TIC. Matching the features to the pooled bottom-up venomics results yielded an annotation rate between 83.4% and 93.5% of the features (based on XIC peak area). As anticipated, the annotation rate is slightly lower than the relative annotation of the pooled sample (96.0%; based on the UV214 peak area). The comparison of protein family venom compositions is shown in Figure 4 and supplemental table 2. The highest variance was observed for the svSP, CTL and LAAO toxin families (Figure 5A). Taking the age of the individuals into account, the abundance of svSPs was generally higher in the adult individuals than in the juveniles (average of 21.7% vs. 5.5%), but no significant difference between male and female individuals, or between different geographic regions, was observed. The svSPs play a significant role in mammalian envenomation by affecting the hemostatic system through perturbing blood coagulation, typically via the induction of fibrinogenolytic effects [78,79]. Taking this into account, a possible explanation is that the lower svSP concentration in juveniles results from differences in diet, as young animals typically prey on insects before switching to feed upon small mammals and lizards as they become adults [80–82]. Despite their observed variations in abundance, no significant differences between the individual groups could be observed for the CTL and LAAO toxin families (Figure 5A). However, there was evidence that the svSP concentration is correlated with levels of LAAO, as the three individuals with the lowest svSP abundance showed the highest content of LAAO (Figure 5A). Whether this is a true biological effect, or perhaps the result of differences in ion suppression of the co-eluting compounds, will need further investigation. We also observed variations between the PLA2 levels identified in the venoms, which ranged from 6.5% to 25.1%, but in all cases remained lower than those previously reported by Kovalchuk et al. (41%) [52].
In order to investigate the intra-species differences by multivariate statistics, we performed a principal coordinate analysis (PCoA) using the Bray-Curtis dissimilarity metric. The PCoA plots of protein-level and proteoform-level data are shown in Figure 5. Clustering of individuals in protein-family level PCoA space (Figure 5B) was only observed for the juvenile individuals. As expected from the univariate statistics, no significant separation based on sex or region could be observed.
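The normalization and dissimilarity steps described above can be sketched in a few lines. This is an illustrative sketch under stated assumptions, not the analysis pipeline of this study: the summed feature area of each sample is used as a stand-in for the TIC, the function names are hypothetical, and the peak areas are invented toy values. The Bray-Curtis dissimilarity between two abundance profiles u and v is computed as Σ|u_i − v_i| / Σ(u_i + v_i).

```python
def normalize_to_total(features):
    """Scale one sample's feature areas so that they sum to 1.

    Assumption: the summed feature area stands in for the TIC.
    """
    total = sum(features)
    if total <= 0:
        raise ValueError("sample has no signal")
    return [x / total for x in features]


def bray_curtis(u, v):
    """Bray-Curtis dissimilarity between two non-negative abundance profiles."""
    numerator = sum(abs(a - b) for a, b in zip(u, v))
    denominator = sum(a + b for a, b in zip(u, v))
    return numerator / denominator if denominator else 0.0


def dissimilarity_matrix(samples):
    """Pairwise Bray-Curtis dissimilarities of normalized samples."""
    normalized = [normalize_to_total(s) for s in samples]
    return [[bray_curtis(a, b) for b in normalized] for a in normalized]


if __name__ == "__main__":
    # Invented toy peak areas for three ion features (not data from this study).
    strong_signal = [100.0, 300.0, 0.0]
    weak_signal = [10.0, 30.0, 0.0]  # same composition, ten-fold lower signal
    different = [0.0, 100.0, 300.0]
    for row in dissimilarity_matrix([strong_signal, weak_signal, different]):
        print([round(d, 3) for d in row])
```

In this toy example, the two samples with identical relative composition but different overall signal become indistinguishable after normalization, which mirrors the motivation for normalizing before statistical evaluation. A matrix of this kind is the input that a PCoA would subsequently embed into a low-dimensional space.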
Since an explanation for not resolving phenotype differences could be the reduction of variables through the binning of proteoforms, we used proteoform abundance as the input matrix for PCoA. This analysis revealed a separation between juveniles and adults, as well as between male and female snakes (Figure 5C). To investigate the toxin variants underpinning these separations, we used univariate comparisons of the two groups and plotted the fold change of toxin abundance (log2) vs. the statistical significance (-log10 p-value, t-test), as shown in supplemental figure 2. Besides the above-mentioned differences in svSP, the most significant (p-value < 0.05, log2 fold change > 2 or < -2) differences between juvenile and adult individuals were the higher abundances of small proteins with masses of 7707.26 Da, 5565.02 Da and 5693.10 Da in the juvenile group, all of which were unidentified in our proteomic analyses. Furthermore, several smaller peptides with masses of 589.27 Da, 1244.56 Da and 575.26 Da, as well as a putative PLA2 with a mass of 13667.91 Da, were more abundant in the juveniles. In contrast, a putative PLA2 with a mass of 13683.86 Da was of lower abundance in the juvenile group. While we observed fewer significant changes between the venom toxins of male and female individuals, the observed masses of the differential features indicated that these toxins belong to different protein families than those differentiating juvenile and adult snakes. Two toxins with masses of 22829.66 Da and 24641.23 Da were more abundant in male individuals and could be putatively annotated as hetero-dimeric CTLs. Another toxin, with a mass of 13549.87 Da, was also more abundant in the male group and, according to its mass range, is most likely a PLA2.
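The selection criteria for differential features (p-value < 0.05 and log2 fold change > 2 or < -2) can be sketched as follows. This is a hedged illustration rather than the procedure used in this study: to stay within the Python standard library, an exact two-sided permutation test on the difference of group means is substituted for the t-test, the function names are hypothetical, and the abundances are invented toy values.

```python
import math
from itertools import combinations
from statistics import mean


def log2_fold_change(group_a, group_b, pseudocount=1e-9):
    """log2 ratio of mean abundances; the pseudocount avoids division by zero."""
    return math.log2((mean(group_a) + pseudocount) / (mean(group_b) + pseudocount))


def permutation_p_value(group_a, group_b):
    """Exact two-sided permutation p-value for the difference of group means.

    Note: this is a stand-in for the t-test mentioned in the text.
    """
    pooled = list(group_a) + list(group_b)
    n_a, n_b = len(group_a), len(group_b)
    observed = abs(mean(group_a) - mean(group_b))
    total = sum(pooled)
    hits = count = 0
    for indices in combinations(range(len(pooled)), n_a):
        sum_a = sum(pooled[i] for i in indices)
        difference = abs(sum_a / n_a - (total - sum_a) / n_b)
        count += 1
        if difference >= observed - 1e-12:
            hits += 1
    return hits / count


def is_differential(group_a, group_b, p_cutoff=0.05, lfc_cutoff=2.0):
    """Apply both thresholds; returns (flag, log2 fold change, p-value)."""
    lfc = log2_fold_change(group_a, group_b)
    p = permutation_p_value(group_a, group_b)
    return (abs(lfc) > lfc_cutoff and p < p_cutoff), lfc, p


if __name__ == "__main__":
    # Invented toy abundances of one feature (not data from this study).
    juveniles = [8.0, 9.0, 10.0, 9.0]
    adults = [1.0, 1.5, 0.5, 1.0, 1.0]
    flag, lfc, p = is_differential(juveniles, adults)
    print(f"differential: {flag}, log2FC: {lfc:.2f}, p: {p:.4f}")
```

Requiring both thresholds at once keeps features that are both large in effect and unlikely to arise by chance, which corresponds to the outer regions of a volcano plot.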
4. Concluding remarks
Here we describe the detailed analysis of the venom composition of Vipera kaznakovi by a combination of venom gland transcriptomics and decomplexing bottom-up and top-down venom proteomics, revealing the presence of 15 toxin families, of which the most abundant were svMPs (37.7%), followed by PLA2s (19.0%), svSPs (9.5%), CTLs (6.9%) and CRISPs (5.0%). Intact mass profiling enabled the rapid comparison of venom sourced from multiple individuals. The higher sensitivity of direct intact protein analysis by LC-MS, in comparison to decomplexing bottom-up venomics, enabled us to work with multiple venom samples and with low amounts of material (< 0.5 mg venom). This allowed us to capture the snakes, perform venom extractions and then immediately release the animals back into the field. Our approach revealed intraspecific venom variation in Vipera kaznakovi, including both ontogenetic differences between juvenile and adult snakes and, to a lesser extent, sexual differences between adult males and females. The most significant difference in venom proteome composition was found between the adult and juvenile groups, with svSP toxins exhibiting the greatest variance. In addition, individuals within all groups showed a generally high relative variance of CTL and LAAO concentrations. svMPs, on the other hand, appeared to be consistently the most abundant venom component in all V. kaznakovi individuals analyzed in our study. However, as the statistical power of a relatively small sample size (n=9) is limited, it would be interesting to extend this study to a larger sample cohort, ideally covering all geographical regions (from Northeastern Turkey to Georgia and Russia) of the V. kaznakovi distribution zone.
The workflow applied herein would be well suited for an extensive venom analysis at the population level, and will hopefully enable venom researchers to more easily expand their experimental approach towards robust comparisons of intra-species venom variation, and not only characterize pooled venom samples.
Author contributions
D.P., A.N. and R.D.S. planned the study. D.P., A.N., B.G., M.K. and P.H. collected the animals and prepared venom and venom gland tissue samples. A.N. performed the determination of acute lethal dose. D.P., P.H. and B.-F.H. performed the toxin separation and acquired the mass spectrometry data. G.W., S.C.W. and N.R.C. constructed the transcriptome. D.P., B.-F.H. and N.R.C. performed the data analysis. A.N., N.R.C. and R.D.S. acquired funding and provided materials and instruments for the study. D.P., B.-F.H. and R.D.S. wrote the manuscript. All authors read, discussed and approved the manuscript.
Acknowledgements
We thank Sabah Ul-Hasan and Anthony Saviola for critical reading of the paper draft, and Robert Harrison for assistance with transcriptomics. This study was supported by the Deutsche Forschungsgemeinschaft (DFG) through the Cluster of Excellence ‘Unifying Concepts in Catalysis’ (UniCat), a postdoctoral research scholarship to D.P. (PE 2600/1), the Scientific and Technical Research Council of Turkey (TÜBİTAK) under Grant 114Z946, and a Sir Henry Dale Fellowship (200517/Z/16/Z) jointly funded by the Wellcome Trust and the Royal Society to N.R.C.