A new formulation of protein evolutionary models that account for structural constraints

Mol Biol Evol. 2014 Mar;31(3):736-49. doi: 10.1093/molbev/mst240. Epub 2013 Dec 3.

Abstract

Despite the importance of a thermodynamically stable structure with a conserved fold for protein function, almost all evolutionary models neglect site-site correlations that arise from physical interactions between neighboring amino acid sites. This is mainly due to the difficulty of formulating a computationally tractable model: once sites no longer evolve independently, standard site-wise rate matrices can no longer be used. Here, we introduce a general framework, based on factor graphs, for constructing probabilistic models of protein evolution with site interdependence. Conveniently, efficient approximate inference algorithms, such as Belief Propagation, can be used to calculate likelihoods for these models. We fit an amino acid substitution model of this type that accounts for both solvent accessibility and site-site correlations. Comparisons of the new model with rate matrix models and alternative structure-dependent models demonstrate that it fits the sequence data better. We also examine evolution within a family of homohexameric enzymes and find that site-site correlations between most contacting subunits contribute to a higher likelihood. In addition, we show that the new substitution model has a similar mathematical form to the one introduced in Rodrigue et al. (Rodrigue N, Lartillot N, Bryant D, Philippe H. 2005. Site interdependence attributed to tertiary structure in amino acid sequence evolution. Gene 347:207-217), although with different parameter interpretations and values. We also perform a statistical analysis of the effects of amino acids at neighboring sites on substitution probabilities and find a significant perturbation of most probabilities, further supporting the significant role of site-site interactions in protein evolution and motivating the development of new evolutionary models similar to the one described here. Finally, we discuss possible extensions and applications of the new substitution model.
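The abstract notes that likelihoods for factor-graph models can be computed by belief propagation instead of by summing over every joint state. As a minimal sketch of that idea, assuming a hypothetical toy chain of sites with made-up unary factors (standing in for per-site residue preferences) and pairwise factors between neighboring sites, the Python below compares brute-force summation with sum-product message passing. On a loop-free chain the two agree exactly. The model in the paper couples sites across a phylogeny and a structural contact map, where the graph has loops and belief propagation is only approximate, so this illustrates the inference principle and is not the authors' implementation.

```python
import itertools
import random

# Hypothetical toy factor graph for illustration only (not the paper's model):
# N sites in a chain, each with a unary factor over K states, plus a pairwise
# factor linking each pair of neighboring sites i and i+1.

def brute_force_Z(unary, pairwise):
    """Sum the product of all factors over every joint assignment (K**N terms)."""
    n, k = len(unary), len(unary[0])
    total = 0.0
    for states in itertools.product(range(k), repeat=n):
        weight = 1.0
        for i, s in enumerate(states):
            weight *= unary[i][s]
        for i in range(n - 1):
            weight *= pairwise[i][states[i]][states[i + 1]]
        total += weight
    return total

def chain_bp_Z(unary, pairwise):
    """Sum-product message passing along the chain (exact here: a chain has no loops)."""
    n, k = len(unary), len(unary[0])
    # fwd[b]: summed weight of all assignments to sites 0..i with site i in state b
    fwd = list(unary[0])
    for i in range(1, n):
        fwd = [
            unary[i][b] * sum(fwd[a] * pairwise[i - 1][a][b] for a in range(k))
            for b in range(k)
        ]
    return sum(fwd)

random.seed(0)
K, N = 3, 5  # hypothetical toy alphabet size and number of sites
unary = [[random.uniform(0.1, 1.0) for _ in range(K)] for _ in range(N)]
pairwise = [[[random.uniform(0.1, 1.0) for _ in range(K)] for _ in range(K)]
            for _ in range(N - 1)]
print(brute_force_Z(unary, pairwise), chain_bp_Z(unary, pairwise))
```

Brute force multiplies out K**N assignments, while message passing on a chain costs on the order of N*K**2 operations; with 20 amino acids and hundreds of sites, only the latter kind of computation is feasible.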

Keywords: approximate inference algorithms; factor graphs; protein evolution; protein structure; site-site correlations; substitution models.

MeSH terms

  • Amino Acid Substitution / genetics
  • Crystallography, X-Ray
  • Databases, Protein
  • Evolution, Molecular*
  • Homogentisate 1,2-Dioxygenase / chemistry
  • Humans
  • Likelihood Functions
  • Models, Genetic*
  • Phylogeny
  • Proteins / chemistry*
  • Proteins / genetics*
  • Statistics as Topic

Substances

  • Proteins
  • Homogentisate 1,2-Dioxygenase