PT  - JOURNAL ARTICLE
AU  - Hiroyuki Kurata
AU  - Sho Tsukiyama
TI  - ICAN: interpretable cross-attention network for identifying drug and target protein interactions
AID  - 10.1101/2022.08.04.502877
DP  - 2022 Jan 01
TA  - bioRxiv
PG  - 2022.08.04.502877
4099  - http://biorxiv.org/content/early/2022/08/06/2022.08.04.502877.short
4100  - http://biorxiv.org/content/early/2022/08/06/2022.08.04.502877.full
AB  - Drug–target protein interaction (DTI) identification is fundamental for drug discovery and drug repositioning, because therapeutic drugs act on disease-causing proteins. However, the DTI identification process often requires expensive and time-consuming tasks, including biological experiments involving large numbers of candidate compounds. Thus, a variety of computational approaches have been developed. Of the many approaches available, chemo-genomics feature-based methods have attracted considerable attention. These methods compute the feature descriptors of drugs and proteins as the input data to train machine learning and deep learning models to enable accurate prediction of unknown DTIs. In addition, attention-based learning methods have been proposed to identify and interpret DTI mechanisms. However, improvements are needed in both prediction performance and DTI mechanism elucidation. To address these problems, we developed an attention-based method designated the interpretable cross-attention network (ICAN), which predicts DTIs using the Simplified Molecular Input Line Entry System (SMILES) representations of drugs and amino acid sequences of target proteins. We optimized the attention mechanism architecture by exploring cross-attention versus self-attention, attention layer depth, and selection of the context matrices from the attention mechanism. We found that a plain attention mechanism that decodes drug-related protein context features without any protein-related drug context features effectively achieved high performance.
ICAN outperformed state-of-the-art methods in several respects and revealed with statistical significance that some weighted sites in the cross-attention weight matrix represent experimental binding sites, thus demonstrating the high interpretability of the results. Key points: We created the interpretable cross-attention network (ICAN), which is composed of nn.Embedding of FCS label-encoding vectors of the SMILES of drugs and the amino acid (AA) sequences of target proteins, cross-attention mechanisms, and a CNN output layer. ICAN decoded drug-related protein context features without any protein-related drug context features, achieving high prediction performance despite the plain attention mechanism. In comparison with seven state-of-the-art methods, ICAN provided the highest PRAUC for the imbalanced datasets (DAVIS and BindingDB). Statistical analysis of attention-weight matrices revealed that some weighted attention sites corresponded to experimental binding sites, demonstrating the high interpretability achievable with ICAN. Competing Interest Statement: The authors have declared no competing interest.