Abstract
Peste des petits ruminants virus (PPRV) is an acute, highly contagious viral disease of small ruminants. It is endemic to sub-Saharan Africa, Asia and the Arabian Peninsula. The disease is a major constraint to food security, causing significant economic losses to subsistence farmers in affected areas.
The nucleoprotein of morbilliviruses is highly immunogenic and produced in large quantities in virus infected cells. This makes it a suitable target for the host’s immune response. In this study, B-cell and T-cell epitopes of PPRV Nig/75/1 nucleoprotein were predicted using a suite of in silico tools. Forty-six T-cell epitopes were predicted, of which 38 were MHC-I binding while eight were MHC-II binding. Of the 19 B-cell epitopes predicted, 15 were linear epitopes while four were discontinuous epitopes. Homology modelling of PPRV-N was done to elucidate the 3D structure of the protein and conformational epitopes. Conservation analysis of the discontinuous epitopes gave an indication into the similarity of the selected epitopes with other isolates of PPRV.
Predicted epitopes may form an important starting point for serological screening and diagnostic tools against PPRV. Experimental validation of the predicted epitopes will assist in selection of promising candidates for consideration as antigen-based diagnostic tools. Such diagnostic tools would play a role in the global fight and possible eradication of PPR.
Introduction
Peste des petits ruminants virus (PPRV) highly contagious transboundary viral disease of goats and sheep widely distributed in sub-Saharan Africa and the Arabian Peninsula (Shaila et al., 1996). The disease causes significant economic losses, especially to subsistence farmers who do not have easy access to vaccination. It is considered one of the main constraints to small ruminants’ production and is listed as a reportable disease by the Office International des Epizooties (OIE) (Berhe et al., 2003). Morbidity and mortality rates vary but can be as high as 100% and 90%, respectively in immunologically naive populations (Pope et al., 2013). The disease is caused by Peste des petit ruminant virus (PPRV); a single-stranded, negative sense RNA virus of the genus Morbillivirus, in the family Paramyxoviridae (Diallo et al., 1994).
The nucleoprotein of morbilliviruses is the most abundant structural protein and an important regulator of replication and transcription (Ismail et al., 1995). It is highly immunogenic in spite of its internal location (Yu et al., 2015) and expressed to a very high level in morbillivirus-infected cells (Choi et al., 2004). It consists of 525 amino acids with an estimated molecular weight of 58 kDa. (Diallo et al., 1994). After infection anti-N antibodies are produced indicating that there is a direct release of morbillivirus nucleoprotein into the extracellular compartment, where it binds to B-cell receptors (Bodjo et al., 2007). The nucleoprotein protein is therefore a good antigen candidate for the development of differential tests for differentiating infected animals from ones vaccinated (DIVA) with fusion (F) or haemagglutinin (H) based-recombinant marker vaccines (Choi et al., 2004; Choi et al., 2005). As a result the N protein is suitable for serologic screening of naturally infected from vaccinated animals (Couacy-Hymann et al., 2002) and as a diagnostic antigen towards which antibodies are directed during infection (Dechamma et al., 2006). N protein-specific T cells were previously found to comprise the bulk of the virus specific memory cells in the paramyxovirus family (Mitra-Kaushik et al., 2001). To underscore the importance of N protein in the immune response against morbilliviruses, most epitope mapping has been on the nucleoprotein protein (Mitra-Kaushik et al., 2001; Choi et al., 2004; Choi et al., 2005; Dechamma et al., 2006; Bodjo et al., 2007; Yadav et al., 2009; Yu et al., 2015). Though suspected to be unimportant for humoral protection against morbillivirus infection (Couacy-Hymann et al., 2002), the N protein is crucial in development of PPRV serological tests for diagnosis and disease surveillance (Yadav et al., 2009). Descriptions of PPRV N epitopes have been made for B-cell epitopes (Choi et al., 2005; Bodjo et al., 2007) and for both T-helper and B-cell epitopes (Dechamma et al., 2006).
In this study, the 3D model of PPRV nucleoprotein (also known as PPRV gP1) was determined and an integrated in silico approach used to predict both B-cell and T-cell epitopes. The predicted epitopes may form an important starting point for serological screening and diagnostic tools against PPRV.
Materials and methods
Retrieval of protein sequences and conservation analysis
All available full amino acid sequences of the PPRV nucleoprotein (PPRV N) were retrieved from the National Center for Biotechnology Information (NCBI) (http://www.ncbi.nlm.nih.gov/nucleotide) database in FASTA format. Conserved regions of PPRV N were determined with the web version of Clustal Omega (McWilliam et al., 2013) multi sequence alignment (MSA) tool from the European Bioinformatics Institute (EMBL-EBI) (http://www.ebi.ac.uk/Tools/msa/clustalo/) in default parameters. Redundant sequences were excluded from the alignment. Even though an NCBI reference sequence for PPRV nucleoprotein exists: accession number (YP_133821.1) (Bailey et al., 2005), the sequence for PPRV strain Nigeria/75/1, GenBank accession number: CAA52454.1 (Diallo et al., 1994) was the one used throughout this study based on locally available technical support with the particular strain.
T cell epitope prediction MHC class I prediction
MHCI binding predictions were made with artificial neural networks (ANNs) based methods such as the allele specific NetMHC (Lundegaard et al., 2008) and pan-specific NetMHCpan (Hoof et al., 2009). NetMHC results were obtained from the NetMHC 4.0 Server (Nielsen and Andreatta, 2016), accessed from (http://www.cbs.dtu.dk/services/NetMHC/) while NetMHCpan results were obtained from NetMHCpan version 3.0 server (http://www.cbs.dtu.dk/services/NetMHCpan/). Predictions were made based on the selection of between 9mer and 14mer mouse alleles: H-2-Db,H-2-Dd,H-2-Kb,H-2-Kd, H-2-Ld, H-2-Qa1 and H-2-Qa2 with threshold for strong binders set at 0.5 and for weak binders at 2. Cross binding epitopes were selected as putative epitopes.
Furthermore, proteosomal processing predictors based upon neural network architecture such as NetChop (Kesmir et al., 2002; Saxova et al., 2003; Nielsen et al., 2005), NetCTL (Larsen et al., 2005; Larsen et al., 2007) and NetCTLpan (Stranzl et al., 2010) were utilised. Epitope candidates based on peptide processing within the cell were determined from the IEDB website (http://tools.iedb.org/main/tcell/). Parameters selected for NetChop analysis were C terminal (3.0) with a 0.5 threshold. For NetCTL the weight on C terminal cleavage was set at 0.15, weight on TAP transport efficiency 0.05, selected supertype B7 and threshold varied between 0.2 and 0.75. For NetCTLpan (NetCTLpan 1.1 Server; http://www.cbs.dtu.dk/services/NetCTLpan/) epitope candidates were predicted for different mouse alleles based on the following settings: weight on C terminal cleavage (0.225), weight on TAP transport efficiency (0.025), threshold for epitope identification (1.0), threshold for showing predictions (-99.9) High scoring epitopes recognised by multiple alleles were selected.
MCH class II prediction
MHCII binding predictions for mouse H-2-I locus (H2-IAd and H2-IEd alleles) were made using the IEDB analysis resource Consensus tool (Wang et al., 2008; Wang et al., 2010). This tool combines any three of the four methods; NN-align (Nielsen and Lund, 2009), SMM-align (Nielsen et al., 2007), Sturniolo (Sturniolo et al., 1999) and NetMHCIIpan (Nielsen et al., 2008) to predict MHC Class II epitopes. Strong binding peptides and those binding to multiple MHC class II molecules were selected.
B-cell epitope predictions Prediction of linear B-cell epitopes from antigen sequence properties
A variety of tools were used to predict linear epitopes along the 525 amino acid PPRV N protein. IEDB B-cell epitope prediction tools used included BepiPred (Larsen et al., 2006), Chou and Fasman beta-turn prediction (Chou and Fasman, 1979), Emini accessibility prediction (Emini et al., 1985), Karplus and Schulz flexibility prediction (Karplus and Schulz, 1985), Kolaskar and Tongaonkar antigenicity prediction method (Kolaskar and Tongaonkar, 1990) and Parker hydrophilicity plot (Parket et al., 1986). B-cell epitope prediction methods; BCPred (EL-Manzalawy et al., 2008) and implementation of the AAP method (Chen et al., 2007) from the BCPREDS server (http://ailab.ist.psu.edu/bcpred/) were also used. Results acquired from the BCPREDS server were compared with BepiPred obtained results. Epitopes with high scores, and those common between different prediction methods were selected as likely to be antigenic.
Structural epitope prediction
Structural predictions of B-cell epitopes were done with the ElliPro antibody epitope prediction tool (http://tools.iedb.org/ellipro/). ElliPro, is a web-tool that allows the prediction and visualization of antibody epitopes in a given protein sequence or structure (Ponomarenko et al., 2008). Structural epitopes were generated from the PDB file of PPRV N 3D protein model generated by Phyre2 (Kelley et al., 2015). Epitope structure predictions were performed based on default parameters (minimum score value 0.5 and maximum distance of 6Å). High scoring three dimensional epitope structures were viewed with the Jmol applet (http://www.jmol.org/).
3D Modelling of PPRV N
Searching the Protein Databank Archive (RCSB PDB) (http://www.rcsb.org/pdb/home/home.do) revealed that there were no records of 3D structures for PPRV N. The three dimensional protein structure of PPRV N was predicted with the Protein Homology/analogY Recognition Engine V 2.0 (Phyre2 V2.0) (Kelley et al., 2015) server (http://www.sbg.bio.ic.ac.uk/phyre2/html/page.cgi?id=index) in ‘Intensive’ modelling mode. Phyre2 uses advanced remote homology detection methods to build 3D models, predict ligand binding sites and analyse the effect of amino acid variants in protein sequences (Kelley et al., 2015).
Ligand prediction
The ligand binding site on the modelled 3D structure of PPRV N was predicted with 3DLigandSite (http://www.sbg.bio.ic.ac.uk/3dligandsite). 3DLigandSite is a high performing web server for automatic prediction of ligand-binding sites for protein that have not been solved (Wass et al., 2010).
Protein structure quality and validation
The resulting 3D model was analysed and viewed with Swiss-PdbViewer Deep View Version 4.1 (http://spdbv.vital-it.ch/) and the Jmol viewer (http://www.jmol.org/). Different tools were also employed for stereochemical analyses and model quality evaluation. Structural evaluation and validation were performed using the Structure Analysis and Verification Server (http://services.mbi.ucla.edu/SAVES/) to obtain ERRAT (Colovos and Yeats, 1993), VERIFY 3D (Bowie et al., 1991) and PROVE (Pontius et al., 1996) scores. In addition Ramachandran plots of the model were generated using the RAMPAGE web tool (Lovell et al., 2002) available at (http://mordred.bioc.cam.ac.uk/˜rapper/rampage.php).
Results and discussion MHC prediction
T-cell recognition of peptide/major histocompatibility complex (MHC) is a prerequisite for cellular immunity. Different MHC molecules bind peptide fragments from pathogens and in association with T-cell receptor (TCR) molecules present them to appropriate T-cells (Zepp,2016). Peptide fragments that bind specific MHC are therefore targets for vaccine and immunotherapy (Tong et al., 2007). As such their accurate prediction is important in the design of diagnostics and vaccines. MHC I binds and presents epitopes which are derived from proteolytically degraded intracellular proteins about 8–11 residues long while MHC II epitopes are derived from extracellular sources and are much longer on average (up to 25 residues long) (Michalik et al., 2016). Differences in the structures of MHC molecules mean that peptides bind differently to the binding pockets of each MHC class (Wang et al., 2008). Different prediction tools may therefore be required to exploit differences in structure and binding affinities of the two MHC classes. MHC class I epitopes (Tables 1 and 2; Figure 1) and MHC class II epitopes (Table 3) were predicted using NetMHC, NetCTLpan, NetCHOP and NetCTL. Selected epitopes were those binding multiple alleles (Tables 1 and 2) and those taken to be strongly binding (Table 3) based on binding threshold. Most epitopes described as either binding different types of alleles, or strong binders (less than 2% binding threshold) corresponded with epitopes and minimal motifs described by Yu et al., (2015) through fine mapping and conservation analysis. The distribution of epitopes within the nucleoprotein is graphically displayed in Figure 1. The observed overlap between predicted and experimentally determined epitopes implies a high likelihood that described epitopes are antigenic.
Prediction of linear B-cell epitopes from PPRV N sequence properties
The humoral immune response recognises antigenic determinants in pathogenic proteins to activate and generate memory B-cells. These antigenic regions are called B-cell epitopes and can be used to develop diagnostic tools and vaccines against pathogens (Dhanda et al., 2016). Consequently, reliable prediction of antibody, or B-cell epitopes is important in the design of vaccines and immunodiagnostics (Ponomarenko et al., 2008). Unlike T-cell epitopes, B-cell epitopes are not presented in the context of MHC molecules (Michalik et al., 2016). Additionally, they often exist as discontinuous epitopes and are known to be difficult to predict (Kringelum et al., 2012). Tools for identifying B-cell epitopes rely on primary sequence information and functional characteristics of the epitope (Ponomarenko and van Regenemortel, 2009; Sun et al., 2015). This is in contrast with sequence-based methods which rely on descriptors such as ability to form linear secondary structures, hydrophilicity (Hopp and Woods, 1981), hydrophobicity (Kyte and Doolittle, 1982), surface accessibility (Emini et al., 1985). Methods which depend on amino acid composition are accurate for continuous epitope prediction (Ponomarenko and van Regenmortel, 2009).
The methods employed in this study take advantage of various amino acid properties to determine areas of the protein likely to be antigenic. BepiPred predictions for linear B-cell epitopes (Table 3, Figure 2A) rely on a combination of hidden Markov models and propensity scales. The BepiPred method has been found to perform better than both random predictions and propensity scale methods (Larsen et al., 2006).
A comparison of sequence based methods for epitope prediction (Figure 2A-2E) indicates that the different methods predict similar areas of the PPRV nucleoprotein as likely to be antigenic. This overlap in epitope prediction increases the chances that identified epitopes will provide the required immune response for vaccine design, or binding specificity in diagnostics. Chou and Fasman beta-prediction method (Figure 2B) uses automated prediction of the chain reversal regions of globular proteins using bend frequencies and β-turn conformational parameters (Chou and Fasman, 1979).
Emini accessibility prediction (Figure 2C) compares sequences by predicted surface features based upon indices of surface probability (Emini et al., 1985). Karplus and Schulz flexibility plot (Figure 2D) takes chain flexibility as indicative of an antigenic determinant and as a basis for selecting cross reacting peptides (Karplus and Schulz, 1985).
Epitopes predicted by the Kolaskar and Tongaonkar antigenicity prediction method (Figure 2E) are based on physicochemical properties of amino acid residues and their frequencies of occurrence in experimentally known segmental epitopes (Kolaskar and Tongaonkar, 1990). Applying this method to a large number of proteins has been shown to perform better than most of the known methods; predicting antigenic determinants by up to 75% accuracy (Kolaskar and Tongaonkar, 1990).
Epitopes predicted by the Parker hydrophilicity method (Figure 2F) were also similar to those predicted by other methods used in the study, or previously described in the literature (Yu et al., 2015). Parker hydrophilicity method is based on peptide retention times during high-performance liquid chromatography (HPLC) in which HPLC parameters showed high correlation with antigenicity (Parket et al., 1986).
B-cell epitopes predicted in this study compare favourably with morbillivirus epitopes mapped by Bodjo et al., (2007) who described Rinderpest virus epitopes that mapped to an area between amino acids 120 and 149 and in the carboxy terminal region between aa 421-525 (Bodjo et al., 2007). Similarly, Choi et al., (2004) described morbillivirus epitopes in the aa regions 1-149, 440-452 and 479-486 which are reflected in figures 3A, 3B, 3C, 3D and 3F of this study (Choi et al., 2004). Based on the conserved nature of the morbillivirus nucleoprotein, and the immunological similarity of both rinderpest and PPRV it is possible that the epitopic regions may occur on the same protein regions. PPRV specific N protein epitopes (Choi et al., 2005; Dechamma et al., 2006) were also described within the amino acid regions 1-262 and 448-521. The advantage of in silico methods over experimental methods is that in silico methods are quicker and span the entire genome making it easier to predict antigenic areas not recognisable as epitopic experimentally.
Structural epitope prediction
Sequence-based epitope prediction methods are inadequate in identifying discontinuous B-cell epitopes (Michalik et al., 2016). Discontinuous epitopes are best described by structural methods which incorporate the peptides’ three dimensional structure (Kringelum et al., 2012) and functional properties of the epitope (Sun et al., 2015). In this study, the structure based tool ElliPro was used to predict linear and discontinuous epitopes of PPRV N. ElliPro (from Ellipsoid and Protrusion), utilises three algorithms to approximate the protein shape as an ellipsoid; calculate the residue protrusion index (PI); and cluster neighbouring residues based on their PI values (Ponomarenko et al., 2008). Five PPRV N epitopes with protrusion indices (PI) above 0.7 (Table 5) were selected and Jmol visualisation of the epitope residues shown in Figure 3 (A-E). A similar approach was used for discontinuous epitopes, where epitopes with PI scores above 0.6 were selected. Results are shown in Table 6, with visualisation of the structures in Figure 4A-D. In both instances, the epitopes are similar to those described by other methods showing the reliability of combined methods in predicting possible antigenic sites. Conservancy results for the epitopes (Tables 7 and 8) show the sequence identity of the epitopes to various isolates and accessions of PPR. Conservancy ranged from about 24% to almost 90% of the 49 PPRV strains for linear epitopes (Table 7). For discontinuous epitopes (Table 8) conservancy ranged from 16% to 98%. Conserved sequences within different strains are possibly in parts of the protein which are indispensable for protein function and not prone to mutations (Diallo et al., 1994). Epitopes from the pathogen’s conserved protein sequences are likely to be effective for various isolates of the pathogen (Sette et al., 2001). Epitopic residues with high sequence identity and high PI score are likely to be antigenic for different PPRV strains, and therefore suitable for further analysis.
The 3D model of PPRV N was predicted with Phyre2 in the ‘Intensive’ mode. Phyre2 is a homology modelling tool for prediction and analysis of protein structure, function and mutations. Phyre2 uses Hidden Markov Models (HMM) to build 3D models, predict ligand binding sites and analyse the effect of amino acid variants on the submitted protein sequence (Lawrence et al., 2015). The intensive mode attempts to create a complete full-length model of a sequence through a combination of multiple template modelling and simplified ab initio folding simulation (Lawrence et al., 2015). Three templates were selected to model the final protein model (Figure 5) based on heuristics that maximise confidence, percentage identity and alignment coverage. Two of the templates; c5e4v (PDB Title: crystal structure of measles n0-p complex) and c4uftB (PDB Title: structure of the helical measles virus nucleocapsid) were modelled with 100% confidence. These templates had 81% and 85% identity respectively with the final model.
The third template, c1t6oB (PDB Title: nucleocapsid-binding domain of the measles virus p protein2 (amino acids 457-507) in complex with amino acids 486-5053 of the measles virus n protein) was modelled with 78% confidence, and 80% identity. The confidence of the predicted final model is quite high, with 414 residues (79%) modelled at >90% accuracy. However, 94 residues were modelled by ab initio. Ab initio modelling, even though highly unreliable, enables modelling of entire model. Models with a confidence value >90% are considered good quality models. The predicted model is taken to adopt the overall fold shown and that the core of the protein is modelled at an accuracy of 2–4 Å root mean square deviation (RMSD) from the native, true structure (Lawrence et al., 2015).
In addition to the 3D model, ligand binding site of the predicted structure was determined using the 3DLigandSite web server (Wass et al., 2010) and viewed in Jmol. The predicted binding site occurs between amino acid residues 376VAL to 440ALA of the structure and is shown in blue colour while the rest of the protein is shown as a grey cartoon (Figure 6A). A close up of the ligand binding site (Figure 6B) indicates the residues predicted to form part of the binding site comprising 376VAL, 377SER, 380ILE, 381ALA, 401THR, 405ARG, 408ARG, 412PRO, 428GLU, 429SER, 430PRO, 434THR, 435ARG, 436GLU, 437GLU, 438VAL, 439LYS and 440ALA. The ligands that form the cluster used for the prediction are displayed with non metal ions shown as wireframes.
Protein structure quality and validation
Integrity of the 3D model was tested with the SAVES server (http://services.mbi.ucla.edu/SAVES/) to obtain ERRAT, VERIFY 3D and PROVE scores. The ERRAT program differentiates between correctly and incorrectly determined regions of protein structures based on characteristic atomic interactions (Colovos and Yeates, 1993). Analysis of frequency distribution and positioning of amino acids in the PPRV N model resulted in a low ERRAT score of 60.665% (Figure 7A). VERIFY 3D Determines the compatibility of an atomic model (3D) with its own amino acid sequence (1D) by assigning a structural class based on its location and environment (Bowie et al., 1991). VERIFY 3D results (Figure 7B) showed that 73.71% of the residues had an averaged 3D-1D score >= 0.2. This score is more than the low 65% mark, but less than the accepted 80% of residues having a 3D-1D score above 0.2. The Z-score of a protein is the difference in energy between the native fold and the average of an ensemble of misfolds in the units of the standard deviation of the ensemble (Zhang and Skolnick, 1998). Z-score for PPRV N (Figure 7C) was predicted with the PROVE (PROtein Volume Evaluation) program (Pointius et al., 1996). The Z-score locates the modelled structure in relation to highly resolved and well-refined protein structures submitted to the PDB (Pointius et al., 1996). The Z-score for PPRV N was found to be −0.095 indicating acceptable model quality. Ramachandran plots (Figure 8) of PPRV N model generated using the RAMPAGE web tool (Lovell et al., 2003)http://mordred.bioc.cam.ac.uk/˜rapper/rampage.php) revealed that the phi-psi torsion angles for 87% of residues are in the favoured region, while 8.4% of the residues are in the allowed region. The number of residues in outlier region was found to be 4.6%. The generally low model quality scores may be as a result of low sequence coverage from templates used for modelling and from lack of PPRV crystal structures solved at near atomic resolution in the PDB.
Conclusion
PPRV is the causal agent of peste des petits ruminants; a highly contagious viral disease with high mortality in small ruminants. The nucleoprotein of PPRV was subjected to epitope prediction using different in silico prediction and 3D modelling tools. Predicted epitopes compared favourably to previously described PPRV N epitopes in the literature. Accurate prediction of epitopes is an important part of designing N-protein specific diagnostic immunoassays for PPRV. The next step after in silico prediction would be experimental validation of the predicted epitopes and selection of promising candidates for consideration as antigen-based diagnostic tools. Such diagnostic tools would play a role in the global fight and possible eradication of PPR.