Abstract
A number of beta-lactamase variants can deactivate ceftazidime, the most commonly used antibiotic for treating infections caused by Gram-negative bacteria. In this study, we attempt to develop a method that predicts ceftazidime-resistant strains of bacteria from the amino acid sequences of their beta-lactamases. We obtained beta-lactamase proteins from the β-lactamase database, corresponding to 87 ceftazidime-sensitive and 112 ceftazidime-resistant bacterial strains. All models developed in this study were trained, tested, and evaluated on this dataset of 199 beta-lactamase proteins. We generated 9149 features for the beta-lactamases using Pfeature and selected relevant features using different algorithms in the scikit-learn package. A wide range of machine learning techniques (KNN, DT, RF, GNB, LR, SVC, and XGB) was used to develop prediction models. Our random forest-based model achieved the best performance, with an AUROC of 0.80 on the training dataset and 0.79 on the validation dataset. The study also revealed that ceftazidime-resistant beta-lactamases are rich in amino acids with non-polar side chains, whereas ceftazidime-sensitive beta-lactamases are rich in amino acids with polar side chains and charged residues. Finally, we developed a web server, “ABCRpred”, that allows the scientific community working on antibiotic resistance to predict the resistance or susceptibility of beta-lactamase protein sequences. The server is freely available at http://webs.iiitd.edu.in/raghava/abcrpred/.
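The workflow summarized in the abstract (feature selection followed by a random forest classifier scored by AUROC) can be illustrated with a minimal scikit-learn sketch. This is not the authors' code: the data below are synthetic stand-ins generated with `make_classification` rather than Pfeature descriptors, and the feature count, the ANOVA-based `SelectKBest` selector, the 80/20 split, the five folds, and all hyperparameters are assumptions chosen only for illustration.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import StratifiedKFold, cross_val_score, train_test_split
from sklearn.pipeline import Pipeline

# Synthetic stand-in for the 199-protein dataset (assumption: real features
# would come from Pfeature; 300 random features are used here for speed).
X, y = make_classification(
    n_samples=199,
    n_features=300,
    n_informative=20,
    weights=[87 / 199],  # roughly 87 "sensitive" vs 112 "resistant"
    random_state=0,
)

# Hold out a validation set (assumed 80/20 stratified split).
X_train, X_val, y_train, y_val = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0
)

# Feature selection sits inside the pipeline so that it is refit on each
# training fold and never sees the held-out data.
model = Pipeline(
    [
        ("select", SelectKBest(score_func=f_classif, k=50)),
        ("rf", RandomForestClassifier(n_estimators=200, random_state=0)),
    ]
)

# Cross-validated AUROC on the training portion.
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
cv_auc = cross_val_score(model, X_train, y_train, cv=cv, scoring="roc_auc")

# AUROC on the held-out validation portion.
model.fit(X_train, y_train)
val_auc = roc_auc_score(y_val, model.predict_proba(X_val)[:, 1])

print(f"mean CV AUROC: {cv_auc.mean():.2f}")
print(f"validation AUROC: {val_auc:.2f}")
```

Because the data are synthetic, the printed scores say nothing about ceftazidime resistance; the sketch only shows how training and validation AUROC values of the kind reported above are typically computed.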
Key Points
Ceftazidime is commonly used to treat infections caused by Gram-negative bacteria.
Beta-lactamase inactivates ceftazidime by hydrolysis, making bacteria resistant to the antibiotic.
Resistant and sensitive variants of beta-lactamase are compared.
Sensitive and resistant strains of bacteria are classified based on their beta-lactamases.
Prediction models have been developed using different machine learning techniques.
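The comparison of resistant and sensitive variants rests on amino acid composition. A minimal sketch of how such a comparison could be computed is given below. The residue groupings (`NONPOLAR`, `POLAR_OR_CHARGED`) follow one common textbook convention and are an assumption, as are the function names; the study itself used Pfeature, and the two short sequences are made-up examples, not real beta-lactamases.

```python
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

# Assumed grouping; other classifications place G, C, Y, or H differently.
NONPOLAR = set("AVLIMFWPG")
POLAR_OR_CHARGED = set("STCYNQDEKRH")


def aa_composition(seq):
    """Return the percentage of each of the 20 standard amino acids in seq."""
    counts = Counter(ch for ch in seq.upper() if ch in AMINO_ACIDS)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("sequence contains no standard amino acids")
    return {aa: 100.0 * counts[aa] / total for aa in AMINO_ACIDS}


def group_fraction(seq, group):
    """Return the summed percentage of the residues in group within seq."""
    comp = aa_composition(seq)
    return sum(comp[aa] for aa in group)


# Hypothetical toy sequences, for illustration only.
toy_a = "MAVLLIGAFWPV"
toy_b = "MSKDERTNQHSY"

for name, seq in [("toy_a", toy_a), ("toy_b", toy_b)]:
    print(
        name,
        f"non-polar: {group_fraction(seq, NONPOLAR):.1f}%",
        f"polar/charged: {group_fraction(seq, POLAR_OR_CHARGED):.1f}%",
    )
```

Averaging these percentages over the resistant and the sensitive sets would give a simple group-level comparison of the kind described in the abstract.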
Competing Interest Statement
The authors have declared no competing interest.
Footnotes
Emails of Authors: LM: lucymary20{at}gmail.com, AD: anjalid{at}iiitd.ac.in, SP: sumeetp{at}iiitd.ac.in, SSU: salman007usmani{at}gmail.com, NS: neelams{at}iiitd.ac.in, GPSR: raghava{at}iiitd.ac.in
Authors’ Biographies
Lubna Maryam is currently working as a Post-Doctoral Fellow in the Department of Computational Biology, Indraprastha Institute of Information Technology, New Delhi, India.
Anjali Dhall is currently working as a Ph.D. scholar in Bioinformatics in the Department of Computational Biology, Indraprastha Institute of Information Technology, New Delhi, India.
Sumeet Patiyal is currently working as a Ph.D. scholar in Bioinformatics in the Department of Computational Biology, Indraprastha Institute of Information Technology, New Delhi, India.
Salman Sadullah Usmani completed his Ph.D. in Bioinformatics at CSIR-IMTECH, Chandigarh, India, and is now working as a Research Associate-I in the Department of Computational Biology, Indraprastha Institute of Information Technology, New Delhi, India.
Neelam Sharma is currently working as a Ph.D. scholar in Bioinformatics in the Department of Computational Biology, Indraprastha Institute of Information Technology, New Delhi, India.
G.P.S. Raghava is currently working as Professor and Head of the Department of Computational Biology, Indraprastha Institute of Information Technology, New Delhi, India.