PyCoM: a python library for large-scale analysis of residue-residue coevolution data

Bioinformatics. 2024 Mar 29;40(4):btae166. doi: 10.1093/bioinformatics/btae166.

Abstract

Motivation: Computational methods to detect correlated amino acid positions in proteins have become a valuable tool to predict intra- and inter-residue protein contacts, protein structures, and effects of mutation on protein stability and function. While there are many tools and webservers to compute coevolution scoring matrices, there is no central repository of alignments and coevolution matrices for large-scale studies and pattern detection leveraging on biological and structural annotations already available in UniProt.

Results: We present a Python library, PyCoM, which enables users to query and analyze coevolution matrices and sequence alignments of 457 622 proteins, selected from UniProtKB/Swiss-Prot database (length ≤ 500 residues), from a precompiled coevolution matrix database (PyCoMdb). PyCoM facilitates the development of statistical analyses of residue coevolution patterns using filters on biological and structural annotations from UniProtKB/Swiss-Prot, with simple access to PyCoMdb for both novice and advanced users, supporting Jupyter Notebooks, Python scripts, and a web API access. The resource is open source and will help in generating data-driven computational models and methods to study and understand protein structures, stability, function, and design.

Availability and implementation: PyCoM code is freely available from https://github.com/scdantu/pycom and PyCoMdb and the Jupyter Notebook tutorials are freely available from https://pycom.brunel.ac.uk.

MeSH terms

  • Amino Acids
  • Databases, Protein
  • Proteins* / chemistry
  • Sequence Alignment
  • Software*

Substances

  • Proteins
  • Amino Acids