ABSTRACT
Background Genetic polymorphisms in the ADAMTS13 gene are associated with thrombotic thrombocytopenic purpura or TTP, a life-threatening microangiopathic disorder. This study aims to predict the possible pathogenic SNPs of this gene and their impact on the protein structure and function using insilico methods.
Methods SNPs retrieved from the NCBI database were analyzed using several bioinformatics tools. The different algorithms applied collectively to predictthe effect of single nucleotide substitution on both structure and function of the ADAMTS13 protein.
Results Fifty one mutations were found to be highly damaging to the structure and function of the protein. Of those, thirty five were novel nsSNPs not previously reported in the literature.
Conclusion According to our analysis we found thirty five nsSNPs effects on ADAMTS13 protein leading to thrombotic thrombocytopenic purpura using computational approach. Bioinformatics tools are vital in prediction analysis, making use of increasingly voluminous biomedical data thereby providing markers for screening or for genetic mapping studies.
INTRODUCTION
Thrombotic thrombocytopenic purpura (TTP), also known as Upshaw – Schulman syndrome (USS) is one of microangiopathic disorder which is a life threating thrombotic microangiopathy [1–4], caused by widespread of microthrombi composed from highly reactive high molecular von Willebrand factor (vWF) and platelets due to deficiency of the vWF-cleaving metalloprotease encoded by ADAMTS13 gene [1, 5, 6]. ADAMTS13 (a disintegrin and metalloprotease with thrombospondin type 1 repeats, member 13), is highly specific multi-domain plasma reprolysin-like metalloprotease [4, 7–9], located on chromosome 9 (9q34) [10], which is normally synthesized in the kidney (major side of production), liver and endothelium cells [11, 12]. The disease is characterized by thrombocytopenia, hemolytic anemia, neurologic signs, fever, organ ischemia and subsequently impaired function of different organs especially the kidneys and the central nervous system (CNS) [4, 5, 11, 13–16].
The congenital TTP is usually occurring due to mutations in ADAMTS13, while the acquired occur due to autoantibodies against ADAMTS13 [11–13]. Several studies revealed that the single nucleotide polymorphisms may be associated with reduced ADAMTS13 secretion and activity, and some mutations in ADAMTS13 gene responsible from phenotype expression of TTP characteristics [2, 7, 10–13, 17–22]. Although there are some researchers studied the mutations in ADAMTS13 [10, 12, 14, 16, 20, 21, 23] but the genetic mechanism of some mutations remain unclear. Our study aimed to define the impact of various SNPs in ADAMTS13gene using a bioinformatics approach and assessing the effect of these SNPs on the structure and function of ADAMTS13 protein that may be implicated in the disease susceptibility. Recently, bioinformatics approach make huge differences in science through providing massive information about genes, proteins, domains, SNPs, and many more. It is valuable to know genes functions and their interaction with other genes so the pathways in normal and abnormal case, genetic evolution to know who is closest to whom, proteins function so the structure and proprieties and SNPs and its association with disorder which all poured in health line. This study is regarded as the first computational study to predict the effects of SNPs on ADAMTS13 protein function and structure.
METHODS
Data mining
The data for human ADAMTS13 gene were collected from the National Center for Biological Information (NCBI) website [24]. The SNP information (SNP ID) of the ADAMTS13 gene was retrieved from the NCBI dbSNP (http://www.ncbi.nlm.nih.gov/snp/) and the protein ID and its sequence was collected from Swiss-Prot databases with the accession number: (Q76LX8). (http://expasy.org/) [25].
SIFT
SIFT (Sorting Intolerant from Tolerant) is an online computational tool [26] to detect a harmful non-synonymous single-base nucleotide polymorphism (nsSNP). SIFT predicts whether an amino acid replacement affects protein function. SIFT prediction is based on the grade of conservation of amino acid residues in sequence alignments resulting from closely related sequences, collected through PSI-BLAST. SIFT can be applied to naturally occurring non-synonymous polymorphisms or laboratory induced missense mutations, the genetic mutation that causes a single amino acid substitution (AAS) in a protein sequence subsequently changing the carrier’s phenotype and health status. SIFT expects whether these substitutions affect protein function by using sequence homology, SIFT predicts the effects of all possible substitutions at each position in the protein sequence. Furthermore, the algorithm performs a comprehensive search in protein repositories to find the tolerance of each candidate compared to the conserved counterparts. Non-synonymous reference SNPs identity (nsSNPs ID) were downloaded from online dbSNPs of NCBI and then submitted to SIFT. Results are expressed as damaging (deleterious) or benign (Tolerated) depending on cutoff value 0.05; as values below or equal to (0.0-0.04) predicted to be damaging or intolerant while (0.051) is benign or tolerated, then the damaging SNPs were reanalyzed by Polyphen software which predicts the effect of mutations on both structural and functional sides. It is available at (http://sift.bii.a-star.edu.sg/).
Polyphen-2
Polyphen-2 is a software tool [27] used to predict the possible impact of an amino acid substitution on both structure and function of a human protein by analysis of multiple sequence alignment and protein 3D structure, in addition, it calculates position-specific independent count scores (PSIC) for each of two variants, and then calculates the (PSIC) scores difference between two variants. The higher a PSIC score difference, the higher the functional impact a particular amino acid substitution is expected to have. Prediction outcomes could be categorized as probably damaging, possibly damaging or benign according to the value of PSIC as it ranges from (0_1); values closer to zero considered benign while values closer to 1 considered probably damaging and also it can be indicated by a vertical black marker inside a color gradient bar, where green is benign and red is damaging, nsSNPs that predicted to be intolerant by Sift has been submitted to Polyphen as protein sequence in FASTA format that obtained from UniproktB /Expasy after submitting the relevant ensemble protein (ESNP) there, and then we entered position of mutation, native amino acid and the new substituent for both structural and functional predictions. PolyPhen version 2.2.2 is available at the website. (http://genetics.bwh.harvard.edu/pph2/index.shtml)
Provean
Provean is a software tool which predicts whether an amino acid substitution or indel has an impact on the biological function of a protein. It is useful for purifying sequence variants to identify nonsynonymous or indel variants that are predicted to be functionally important [28]. Variants with a score equal to or below −2.5 are considered “deleterious” while variants with a score above −2.5 are considered “neutral.” It is available at (https://rostlab.org/services/snap2web/).
SNAP2
Functional effects of mutations are predicted with SNAP2 [29]. SNAP2 is a trained classifier that is based on a machine learning device called “neural network”. It distinguishes between effect and neutral variants/non-synonymous SNPs by taking a variety of sequence and variant features into account. The most important input signal for the prediction is the evolutionary information taken from an automatically generated multiple sequence alignment. Also, structural features such as predicted secondary structure and solvent accessibility are considered. If available also annotation (i.e. known functional residues, pattern, regions) of the sequence or close homologs are pulled in. In a cross-validation over 100,000 experimentally annotated variants, SNAP2 reached sustained two-state accuracy (effect/neutral) of 82% (at an AUC of 0.9). In our hands, this constitutes an important and significant improvement over other methods. It is available at (https://rostlab.org/services/snap2web/).
PHD-SNP
PHD-SNP is an online Support Vector Machine (SVM) based classifier optimized to predict if a given single point protein mutation can be classified as disease-related or as a neutral polymorphism [30]. It is available at: (http://http://snps.biofold.org/phd-snp/phdsnp.html)
SNPs&Go
SNPs&Go is an algorithm developed in the Laboratory of Biocomputing at the University of Bologna directed by Prof. Rita Casadio. SNPs&Go is an accurate method that, starting from a protein sequence, can predict whether a variation is disease related or not by exploiting the corresponding protein functional annotation. SNPs&Go collects in unique framework information derived from protein sequence, evolutionary information, and function as encoded in the Gene Ontology terms, and outperforms other available predictive methods [30]. It is available at (http://snps.biofold.org/snps-and-go/snps-and-go.html)
Pmut
PMUT a web-based tool for the annotation of pathological variants on proteins, allows the fast and accurate prediction (approximately 80% success rate in humans) of the pathological character of single point amino acidic mutations based on the use of neural networks [31]. It is available at (http://mmb.irbbarcelona.org/PMut).
I-Mutant 3.0
I-Mutant 3.0 Is a neural network based tool for the routine analysis of protein stability and alterations by taking into account the single-site mutations [32]. The FASTA sequence of protein retrieved from UniProt is used as an input to predict the mutational effect on protein stability. It is available at (http://gpcr2.biocomp.unibo.it/cgi/predictors/I-Mutant3.0/I-Mutant3.0.cgi).
Project Hope
Project hope (version 1.0 is an online webserver used to search protein 3D structures by collecting structural information from a series of sources, including calculations on the 3D protein structure (if available), sequence annotations in UniproktB and predictions from DAS servers. Protein sequences were submitted to project hope server in order to analyze the structural and conformational variations that have resulted from single amino acid substitution corresponding to single nucleotide substitution. It is available at (http://www.cmbi.ru.nl/hope) [33].
UCSF (University of California at San Francisco) Chimera
UCSF Chimera (https://www.cgl.ucsf.edu/chimera/) is a highly extensible program for interactive visualization and analysis of molecular structures and related data, including density maps, supramolecular assemblies, sequence alignments, docking results, trajectories, and conformational ensembles. High-quality images and animations can be generated. Chimera includes complete documentation and several tutorials. Chimera is developed by the Resource for Biocomputing, Visualization, and Informatics (RBVI), supported in part by the National Institutes of Health (P41-GM103311) [34].
GeneMANIA
We submitted genes and selected from a list of data sets that they wish to query. GeneMANIA approach to know protein function prediction integrate multiple genomics and proteomics data sources to make inferences about the function of unknown proteins [35]. It is available at (http://www.genemania.org/).
RESULTS
ADAMTS13 gene investigation using dbSNP/NCBI
This study identified the total number of nsSNP in Homo sapiens located in the coding region of ADAMTS13 gene, were investigate in dbSNP/NCBI Database [24]. Out of 1291 SNPs there are 821 nsSNPs (missense mutations).
Functional analysis of ADAMTS13 gene using SIFT, Polyphen-2, Provean, SNAP2, PHD-SNP, SNP&GO, PHD-SNP and P-Mut softwares
821 nsSNPs (missense mutations) were submitted to SIFT server, PolyPhen-2 server, Provean sever and SNAP2 respectively, 323 SNPs were predicted to be deleterious in SIFT server. In PolyPhen-2 server, our result showed that 491 were found to be damaging (133 possibly damaging and 358 probably damaging showed deleterious). In Provean server our result showed that 393 SNPs were predicted to be deleterious. While in SNAP2 server our result showed that 418 SNPs were predicted to be Effect. The differences in prediction capabilities refer to the fact that every prediction algorithm uses different sets of sequences and alignments. In table (2) we have submitted four positive results which are 66 results from SIFT, PolyPhen-2, Provean and SNAP2 to observe the disease causing one by SNP&GO, PHD-SNP and P-Mut servers.
In SNP&GO, PHD-SNP and P-Mut softwares were used to predict the association of SNPs with the disease. According to SNP&GO, PHD-SNP and P-Mut (57, 66 and 57 SNPs respectively) were found to be disease related SNPs. We selected the triple disease related SNPs only in 3 softwares (51 SNPs) for further analysis by I-Mutant 3.0, Table (3).
Prediction of Change in Stability due to Mutation Using I-Mutant 3.0 Server
I-Mutant result revealed that the protein stability decreased which destabilize the amino acid interaction (46). While five were found to be increased the protein stability (Table 3).
Interactions of ADAMTS13 gene with other Functional Genes illustrated by GeneMANIA
GeneMANIA revealed that ADAMTS13 has this function: metallopeptidase activity. The genes co-expressed with, share similar protein domain, or participate to achieve similar function are illustrated by GeneMANIA and shown in figure (52) Table (4) and (5).
Novel deleterious SNPs in ADAMTS13 gene revealed by this study
DISCUSSION
Our in-depth analysis of the SNPs in the ADAMTS13 gene found 51 amino acid changes that are highly pathogenic according to our insilico analysis. According to the dbSNP database some of these SNPs are listed as per their clinical impact as benign, pathogenic, and untested or no information is available. One of the aims of our study is to confirm or refute these claims based on computational analysis. 35 deleterious SNPs, not previously mentioned in the literature, were determined to be novel mutations and are listed in Table (6).
R398H, C508Y, C951G, C1024G are listed as being pathogenic by dbSNP. Several studies [14, 21, 22, 36–38] have found R398H mutation to be detrimental. Similarly, wet lab studies confirmed the pathogenicity of C508Y[21], C951G and C1024G[36] mutations. H234Q, D235Y, C311Y, P353L, W390C, G525D and C908Y are categorized as ‘untested’ in dbSNP. H234Q mutation in the ADAMTS13 gene found to be damaging through insilico methods in this study was also found to the cause of repeated renal failure in an adult patient [39]. D235Y mutation has also been studied in patients with congenital TTP [40]. Similarly, C311Y and P353L mutations in the ADAMTS13 gene have been found in patients with congenital TTP in more than one study[14, 37]. W390C mutation has been implicated in TTP and HUM (hemolytic uremic syndrome) [5]. G525D and C908Y ADAMTS13 mutations were found in Japanese patients with Upshaw-Shulman syndrome [2]. No clinical information was provided by dbSNP for the following mutations: R398C, R409W, R421C, G761S and C908S. However, several studies have indeed been conducted. A paper published in 2012 studied congenital TTP patients in the UK and confirmed the pathogenicity of R398C and R409W mutations and their impact on ADAMTS13 activity [22]. Sequencing of the ADAMTS gene in patients with deep vein thrombosis identified R421C mutation [41]. G761S mutation was identified in patients with impaired renal function [40], while C908S mutation was one of 10 ADAMTS13 gene mutations in six families with congenital TTP [38].
CONCLUSION
According to our analysis, we found 35 nsSNPs effects on ADAMTS13 protein leading to thrombotic thrombocytopenic purpura using computational approach. Bioinformatics tools are vital in prediction analysis, making use of increasingly voluminous biomedical data thereby providing markers for screening or for genetic mapping studies.
CONFLICT OF INTEREST
The authors declare that there is no conflict of interest regarding the publication of this paper.
DATA AVAILABILITY
All relevant data used to support the findings of this study are included within the manuscript and supplementary information files.
ACKNOWLEDGMENT
The authors wish to acknowledgment the enthusiastic cooperation of Africa City of Technology – Sudan.