ABSTRACT
Protein fold recognition is key to studying protein structure and function. As a representative pattern recognition task, it can be improved through two main categories of approaches: 1) extracting more discriminative descriptors, and 2) designing more effective distance metrics. Existing protein fold recognition approaches focus on the first category, seeking a robust and discriminative descriptor that represents each protein sequence as a compact feature vector, so that different protein sequences are separated as much as possible in the fold space. These methods have brought substantial improvements to protein fold recognition. So far, however, little attention has been paid to the second category. In this paper, we address not only the first category but also the second, namely how to measure the similarity between two proteins more effectively. First, we employ deep convolutional neural network techniques to extract discriminative fold-specific features from the potential protein residue-residue relationships; we name this method SSAfold. Second, because different feature representations usually follow different distributions, the similarity measure needs to vary with the feature distribution. Until now, almost all protein fold recognition methods have applied the same metric strategy to all protein features, ignoring differences in feature distribution. We therefore present a new protein fold recognition method based on a siamese network, which we name PFRSN. The objective of PFRSN is to learn a set of hierarchical nonlinear transformations that project protein pairs into the same fold feature subspace, so that the distance between positive protein pairs is reduced and that between negative protein pairs is enlarged as much as possible. The experimental results show that SSAfold and PFRSN are highly competitive.
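The pairwise objective described above (pull positive pairs together, push negative pairs apart) can be illustrated with a small sketch. The abstract does not state which loss PFRSN uses, so the margin-based contrastive loss below is an assumption chosen only because it is a common choice for siamese networks. The function names, the `margin` parameter, and the toy embeddings are hypothetical, and the embedding network itself is omitted.

```python
import math


def euclidean_distance(u, v):
    """Euclidean distance between two embedding vectors of equal length."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def contrastive_loss(emb_a, emb_b, same_fold, margin=1.0):
    """Margin-based contrastive loss for one pair of embeddings.

    Assumption: this is a generic siamese-network loss, not necessarily
    the one used by PFRSN.
    - Positive pair (same fold): penalize any distance between the two.
    - Negative pair (different folds): penalize only if closer than `margin`.
    """
    d = euclidean_distance(emb_a, emb_b)
    if same_fold:
        return 0.5 * d ** 2
    return 0.5 * max(0.0, margin - d) ** 2


# Toy 2-D embeddings standing in for the output of the shared network.
pos = contrastive_loss([0.1, 0.2], [0.1, 0.2], same_fold=True)
neg_close = contrastive_loss([0.0, 0.0], [0.3, 0.4], same_fold=False)
neg_far = contrastive_loss([0.0, 0.0], [3.0, 4.0], same_fold=False)

print(pos, neg_close, neg_far)
```

In this sketch an identical positive pair and a negative pair farther apart than the margin both incur zero loss, while a negative pair inside the margin is penalized, which is the behaviour the stated objective asks for.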
Competing Interest Statement
The authors have declared no competing interest.
Footnotes
Author Biography: Ke Han received her M.S. degree in computer science from Nanjing University of Science and Technology in 2009. She is currently a PhD candidate in the School of Computer Science and Engineering at Nanjing University of Science and Technology and a member of the Pattern Recognition and Bioinformatics Group. Her research interests include pattern recognition, machine learning and bioinformatics.
Yan Liu received his M.S. degree in computer science from Yangzhou University in 2019. He is currently a PhD candidate in the School of Computer Science and Engineering at Nanjing University of Science and Technology and a member of the Pattern Recognition and Bioinformatics Group. His research interests include pattern recognition, machine learning and bioinformatics.
Dong-Jun Yu received the PhD degree from Nanjing University of Science and Technology in 2003. He is currently a full professor in the School of Computer Science and Engineering, Nanjing University of Science and Technology. His research interests include pattern recognition, machine learning and bioinformatics. He is a senior member of the China Computer Federation (CCF) and a senior member of the China Association of Artificial Intelligence (CAAI).