Abstract
Background Retinitis Pigmentosa (RP) refers to a group of inherited disorders characterized by the death of photoreceptor cells leading to blindness. The aim of this study is to identify the pathogenic SNPs in the IDH3A gene and their effect on the structure and function of the protein.
Method we used different bioinformatics tools to predict the effect of each SNP on the structure and function of the protein.
Result 20 deleterious SNPs out of 178 were found to have a damaging effect on the protein structure and function.
Conclusion this is the first in silico analysis of IDH3A gene and 20 novel mutations were found using different bioinformatics tools, and they could be used as diagnostic markers for Retinitis Pigmentosa.
1. Introduction
Retinitis Pigmentosa (RP) refers to a group of inherited disorders characterized by the death of photoreceptor cells (rods and cons) leading to blindness (1–12) It affects about 2.5 million people worldwide (5) and symptoms usually starts with night blindness followed by vision loss, progressive restriction of the peripheral visual field and abnormalities in the electroretinogram (1–4, 13). RP is classified as nonsyndromic (70%-80%) (14, 15), syndromic or systemic (4, 16). The most frequent form of syndromic RP is Usher syndrome (14, 17) which involves neurosensory hearing loss, and Bardet-Biedl syndrome (18, 19) which involves RP, obesity, renal abnormalities and mental retardation.
RP has three patterns of inheritance; autosomal dominant (adRP), autosomal recessive (arRP) and X-linked pattern (14, 20–22) and the most reported genes for each case are RHO gene, USH2A and RPGR respectively (4, 21–24) yet the gene we are working on, Isocitrate Dehydrogease 3, (IDH3A) is a novel gene identified as the cause of typical arRP (25) located in the human chromosome 15q25.1 (26). The IDH3A function is to catalyze the oxidative decarboxylation of isocitrate (27) which is the key rate-limiting step of the tricarboxylic acid cycle. It also plays a central role in the aerobic energy production and cellular respiration thus mutations in this gene will affect the nervous system (28, 29) and may cause cancer (30, 31).
Although the RP is the most common inherited disease of the retina (14, 32–36) there is still no effective therapeutic strategy and the exact pathogenesis and etiology of the disease is not clear (3, 5, 32, 37).
The aim of this study is to identify the pathogenic SNPs in the IDH3A gene and their effect on the structure and function of the protein, which might help in the overall understanding of the pathogenesis of the disease and could be used as diagnostic markers. This is the first in silico analysis in the coding region of IDH3A gene to prioritize SNPs for further genetic mapping studies. Utilization of in analysis softwares expedites the process of identifying the deleterious SNPs with no cost, and also facilitates future genetic studies.(38)
2. Method
2.1 Data Mining
The data concerning the IDH3A human gene was procured from the National Center for Biotechnology Information (NCBI)(http://www.ncbi.nlm.nih.gov/) (39). The SNPs information of the IDH3A gene was collected from the NCBI dbSNP database (http://www.ncbi.nlm.nih.gov/snp/) (40) and the IDH3A protein sequence was obtained from UniProt database (https://www.uniprot.org/) (41).
2.2 SIFT
SIFT algorithm predicts damaging and tolerated (non-damaging) substitutions based on sequence homology and physical properties of sequence submitted. It also predicts the functional consequences of amino acid substitutions caused due to nsSNPs. SIFT is based on the premise that protein evolution is correlated with protein function. It uses multiple alignment information to predict tolerated and deleterious substitutions for every position of the sequence of interest. (42, 43)
SIFT score ranges from 0 to 1. The amino acid substitution is predicted to be damaging if the score is ≤ 0.05, and tolerated if the score is > 0.05. (https://sift.bii.a-star.edu.sg/)
2.2 PolyPhen-2
PolyPhen-2 (polymorphism phenotyping v2), available as software and via a Web server, predicts the possible impact of amino acid substitutions on the stability and function of human proteins using structural and comparative evolutionary considerations. PolyPhen-2 uses eight sequence-based and structure-based predictive features which were selected automatically by an interactive greedy algorithm. Majority of these features involves comparison of a property of the wild-type residue and the corresponding property of the mutant.
PolyPhen-2 uses multiple sequence alignment by selecting the homologues sequences for the analysis using a clustering algorithm and then constructs and refines their multiple alignments. The functional significance of an amino acid replacement is predicted from its individual features by Naïve Bayes classifier. (44, 45)
Prediction outcomes are benign, possibly damaging or probably damaging according to the PSIC value which ranges from 0 to 1. Values closer to zero are considered benign while values closer to one are considered probably damaging. PolyPhen-2 is available at (http://genetics.bwh.harvard.edu/pph2/).
2.3 PROVEAN
Provean predicts the functional effect of protein sequence variations, including single amino acid substitutions and small insertions and deletions. The prediction is based on the change, caused by a given variation, in the similarity of a query sequence to a set of its related protein sequences. For this prediction the algorithm is required to compute a semi global pair-wise sequence alignment score between the query sequence and each of the related sequences (46, 47). Provean is available at this website (http://provean.jcvi.org/index.php).
2.4 PhD-SNP
A predictor of human deleterious SNPs based on an online Support Vector Machine (SVM) classifier explicitly designed for human dataset correlated with disease inducing mutations. The SVM classifies mutations into disease related (desired output set to 0) and neutral polymorphism, based on protein sequence and profile information. The reliability index value (RI) is evaluated and the probability of the amino acid being deleterious is obtained. A probability> 0.5 is considered disease associated whereas ≤0.5 is considered neutral. PhD-SNP is available at this website (http://snps.biofold.org/phd-snp/phd-snp.html).
2.5 SNP&GO
SNP&GO is a method for the prediction of deleterious SNPs using protein functional annotation. The server is based on SVM classifier that discriminates between disease related and neutral SNPs. It has two components, one is sequence based and the other is structure based. The output gives the mutated residue, the prediction (either disease or neutral), RI (probability of disease related class) and the information about the prediction method. If the probability is > 0.5 then the variation is disease associated. SNP&GO is available at this website (http://snps.biofold.org/snps-and-go/index.html) (48, 49).
2.6 P-Mut
P-Mut uses a robust methodology to predict disease-associated mutations. It allows fast and accurate prediction based on the use of neural networks (NNs) trained with a large database of neutral mutations and pathological mutations of mutational hot spots which are obtained by alanine scanning, massive mutations and genetically accessible mutations. The final output is displayed as a pathogenecity index ranging from 0 to 1 and a confidence index ranging from 0 to 9(50). P-Mut server is available at this website (http://mmb.irbbarcelona.org/PMut/).
2.7 I-Mutant 3.0
I-Mutant is an SVM-based tool for the automatic prediction of protein stability changes upon single point mutations. The predictions are performed starting either from the protein structure or, more importantly, from the protein sequence. I-Mutant correctly predicts 80% or 77% of the dataset, depending on the usage of structural or sequence information, respectively. The server was also trained to predict the value of the free energy stability change upon single point mutation, starting from the protein structure or sequence (51). (http://gpcr2.biocomp.unibo.it/cgi/predictors/I-Mutant3.0/I-Mutant3.0.cgi)
2.8 MUpro
MUpro is a SVM-based tool for the prediction of protein stability changes upon nsSNPs. The value of the energy change is predicted, and a confidence score between −1 and 1 for measuring the confidence of the prediction is calculated. The accuracy for SVM using sequence information is 84.2%. MUpro is available at (http://mupro.proteomics.ics.uci.edu/).
2.9 Chimera 1.8
Chimera is a highly extensible program of interactive visualization and analysis of molecular structures and related data, including density maps, supramolecular assemblies, sequence alignments, docking results, trajectories and conformational ensembles. High quality images and animations can be generated. Chimera is segmented into a core that provides basic services and visualization, and extensions that provides higher level functionalities (52). The protein structure was obtained from RaptorX server (http://raptorx.uchicago.edu/). Chimera is available at (http://www.cgl.ucsf.edu/chimera/).
2.10 GeneMania
GeneMania is a flexible, user-friendly web interface for generating hypothesis about gene function, analyzing gene lists and prioritizing genes for functional assays. Given a query gene list, GeneMania finds functionally similar genes using a wealth of genomics and proteomics data. Another use of GeneMania is gene function prediction by finding the genes that are likely to share functions with the query based on their interactions with it (53–55). (https://genemania.org/).
2.11 Project HOPE
Project HOPE is next-generation web application for automatic mutant analysis and it explains the molecular origin of a disease related phenotype caused by mutations in human proteins. HOPE collects information from data sources such as the protein’s 3D structure and the UniProt database of well-annotated protein sequences. A decision scheme is used to process these data and to predict the effect of the mutation on the 3D structure and the function of the protein. Then a report is produced which explains and illustrate the effect of the mutations (56). (http://www.cmbi.ru.nl/hope/method/)
2.12 BioEdit
BioEdit is a user-friendly sequence alignment editor which enables the user to manually edit a multiple sequence alignment in order to obtain a more reasonable or expected alignment. The sequences are entered in a FastA format and the output file is provided in a suitable user-designated format. BioEdit is available for download at (http://www.mbio.ncsu.edu/bioedit/bioedit.html).
3. Result
4. Discussion
20 novel mutations have been found to affect the stability and function of IDH3A gene using bioinformatics tools. The softwares used were based on different aspects and used different parameters to determine the pathogenecity of the SNPs and to give clues about the effect of the mutation on the molecular level. Multiple softwares were used in this study and the results were compared in order to minimize the errors and confirm the results. (Figure 1)
The total number of the nsSNP −571-in Homo sapiens located in the coding region of the IDH3A gene was obtained from the dbSNP/NCBI Database; out of this number 178 nsSNPs (missense mutations) were submitted to SIFT server, Polyphen-2 server and provean server respectively to predict the effect of the SNPs on the function of the protein. SIFT server predicted 91 SNPs to be deleterious, polyphen-2 server predicted 90 SNPs to be deleterious (68 probably damaging and 22 possibly damaging) while Provean server predicted 100 deleterious SNPs.
The triple positive results-which are the ones affecting the function of the protein-were found to be 24 SNPs (see table 1); they were submitted to PHD-SNP, SNP&GO and P-MUT for further analysis. 20 deleterious SNPs were predicted by PHD-SNP and SNP&GO while P-MUT predicted 23 deleterious SNPs, so the triple positive result −20 SNPs-(see table 2) was submitted to I-Mutant and MUpro to investigate their effect on the stability.
The 20 SNPs were all found to be decreasing the stability of the protein except for 2 SNPs (F184L and A302V) which were predicted by I-Mutant to increase the stability of the protein (see table 3).
The sequence of IDH3A was aligned against 6 members of its homologues using Bioedit software, and all the SNPs were found to be in a conserved region agreeing with the result obtained from Project Hope which revealed that the SNPs are all found in a domain. This suggests that the SNPs will have an effect on the structure and function of the protein. As an example, P304H mutation was found in a conserved region (see figure 3). Project HOPE uncovered that the mutant (Histidine) is bigger than the wild-type (Proline) and the wild-type was buried in the core of the protein so the mutant will probably not fit. In addition to that, there is a difference in the hydrophobicity between the two and the mutation will cause loss of hydrophobic interactions in the core of the protein. This mutation was confirmed pathogenic in our study and that comes with agreement to another study that confirmed the association of P304H mutation with sever encephalopathy in infancy. (28)
GeneMANIA revealed that IDH3A interacts with so many genes and functions in the cellular respiration and energy derivation by oxidation of organic compounds in the tricaboxylic acid cycle, and abnormalities in this function was found to play a key role in the pathogenesis of Bipolar Disorder, therefore IDH3A could potentially be a novel therapeutic target for bipolar disorder. (29) The genes co-expressed with, share similar protein domain, or participate to achieve similar function were illustrated by GeneMANIA and shown in figure (2) Tables (4 & 5). UCSF Chimera was used to visualize and analysis of molecular structures. (Figures 4-23)
Also aberrant expression of IDH3A promotes malignant tumor growth so could be used as a novel target for the diagnosis and treatment of cancers (30, 31).
Our study identified 20 deleterious SNPs that affect the stability and the function of the protein. All these SNPs were retrieved from the NCBI as untested and this study confirmed them damaging. Our study is the first in silico analysis of IDH3A gene. It was based on functional analysis while all previous studies (25, 28) were based on exome sequencing. The 20 deleterious SNPs might be considered as a potential novel target in the diagnosis of Retinitis Pigmentosa and associated diseased.
5. Conclusion
In this study the effect of the SNPs of IDH3A gene was thoroughly investigated through different bioinformatics prediction softwares. 20 novel mutations were found to have a damaging impact on the structure and function of the protein and may thus be used as diagnostic markers for Retinitis Pigmentosa and could help in the overall understanding of the disease.
Conflict of interest
The authors declare that they have no competing interest. The authors declare that there is no conflict of interest regarding the publication of this paper.
Acknowledgement
The authors wish to acknowledge the enthusiastic cooperation of Africa City of Technology-Sudan.