Abstract
Motivation Accurate prediction of protein contact maps is essential for accurate protein structure and function prediction. As a result, many methods have been developed for protein contact map prediction. However, most of these methods rely on evolutionary information derived from protein sequences, which may not exist for many proteins because they lack sequence homologs. Moreover, generating evolutionary profiles is computationally intensive and time-consuming. Therefore, we developed a contact map predictor that takes the output of the pre-trained language model ESM-1B as input, together with a large training set and an ensemble of residual neural networks.
Results We showed that the proposed method significantly improves on the single-sequence-based predictor SSCpred, with a 15% improvement in F1-score on the independent CASP14-FM test set. It also outperforms the evolutionary-profile-based methods TrRosetta and SPOT-Contact, with respective improvements of 48.7% and 48.5% in F1-score on the proteins in the SPOT-2018 set without homologs (Neff=1). The new method provides a much faster and reasonably accurate alternative to profile-based methods, which is particularly useful for large-scale prediction.
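For readers unfamiliar with the metric, the F1-scores quoted above are the harmonic mean of precision and recall over residue pairs. The following is a minimal sketch of how such a score could be computed for one protein, not the evaluation code used in this work. The function name `contact_f1`, the probability cutoff `threshold`, and the minimum sequence separation `min_sep` are hypothetical choices made for illustration; the abstract does not state the values actually used.

```python
def contact_f1(pred_prob, true_contact, threshold=0.5, min_sep=6):
    """F1-score over residue pairs (i, j) with j - i >= min_sep.

    pred_prob    : L x L nested list of predicted contact probabilities
    true_contact : L x L nested list of 0/1 native contacts
    threshold    : assumed cutoff for calling a predicted contact
    min_sep      : assumed minimum sequence separation between residues
    Only the upper triangle (j > i) is read.
    """
    n = len(true_contact)
    tp = fp = fn = 0
    for i in range(n):
        for j in range(i + min_sep, n):
            predicted = pred_prob[i][j] >= threshold
            actual = bool(true_contact[i][j])
            if predicted and actual:
                tp += 1          # correctly predicted contact
            elif predicted:
                fp += 1          # predicted contact that is not native
            elif actual:
                fn += 1          # native contact that was missed
    if tp == 0:
        return 0.0               # avoids division by zero
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Toy 4-residue example (illustrative values only)
toy_pred = [
    [0.0, 0.9, 0.1, 0.2],
    [0.0, 0.0, 0.8, 0.1],
    [0.0, 0.0, 0.0, 0.7],
    [0.0, 0.0, 0.0, 0.0],
]
toy_true = [
    [0, 1, 0, 1],
    [0, 0, 1, 0],
    [0, 0, 0, 0],
    [0, 0, 0, 0],
]
toy_f1 = contact_f1(toy_pred, toy_true, threshold=0.5, min_sep=1)
```

In the toy example, two of the three native contacts are recovered and one predicted contact is wrong, so precision and recall are both 2/3. A benchmark-level score would then be aggregated over all proteins in a test set; how that aggregation is done here is not specified in the abstract.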
Contact jaspreetsingh2{at}griffithuni.edu.au, k.paliwal{at}griffith.edu.au, and zhouyq{at}szbl.ac.cn
Competing Interest Statement
The authors have declared no competing interest.