TY  - JOUR
T1  - Accurate De Novo Prediction of Protein Contact Map by Ultra-Deep Learning Model
JF  - bioRxiv
DO  - 10.1101/073239
SP  - 073239
AU  - Sheng Wang
AU  - Siqi Sun
AU  - Zhen Li
AU  - Renyu Zhang
AU  - Jinbo Xu
Y1  - 2016/01/01
UR  - http://biorxiv.org/content/early/2016/09/06/073239.abstract
N2  - Motivation: Protein contacts contain key information for understanding protein structure and function, and thus contact prediction from sequence is an important problem. Exciting progress has recently been made on this problem, but the predicted contacts for proteins without many sequence homologs are still of low quality and not very useful for de novo structure prediction. Method: This paper presents a new deep learning method that predicts contacts by integrating both evolutionary coupling (EC) information and sequence conservation information through an ultra-deep neural network formed by two deep residual neural networks. The first residual network conducts a series of 1-dimensional convolutional transformations of sequential features; the second residual network conducts a series of 2-dimensional convolutional transformations of pairwise information, including the output of the first residual network, EC information and pairwise potential. This neural network allows us to model very complex relationships between sequence and contact map, as well as long-range interdependency between contacts, and thus obtain high-quality contact prediction. Results: Our method greatly outperforms existing contact prediction methods and leads to much more accurate contact-assisted protein folding. For example, on the 105 CASP11 test proteins, the L/10 long-range accuracy obtained by our method is 83.3%, while those obtained by the state-of-the-art methods CCMpred and MetaPSICOV (the CASP11 winner) are 43.4% and 60.2%, respectively. On the 398 membrane proteins, the L/10 long-range accuracy obtained by our method is 77.3%, while those obtained by CCMpred and MetaPSICOV are 51.8% and 61.2%, respectively. Ab initio folding using our predicted contacts as restraints can yield correct folds (i.e., TMscore>0.6) for 224 of the 579 test proteins, while folding using MetaPSICOV- and CCMpred-predicted contacts can do so for only 79 and 62 of them, respectively. Further, our contact-assisted models also have much better quality (especially for membrane proteins) than template-based models. Using our predicted contacts as restraints, we can fold 240 of the 398 membrane proteins with TMscore>0.5. However, when the training proteins of our method are used as templates, homology modeling can do so for only 10 of the membrane proteins.
ER  - 