RT Journal Article SR Electronic T1 Improving Protein Docking with Constraint Programming and Coevolution Data JF bioRxiv FD Cold Spring Harbor Laboratory SP 002329 DO 10.1101/002329 A1 Ludwig Krippahl A1 Fábio Madeira YR 2014 UL http://biorxiv.org/content/early/2014/02/03/002329.abstract AB Background: Constraint programming (CP) is usually seen as a rigid approach, focusing on crisp, precise distinctions between what is allowed as a solution and what is not. At first sight, this makes it seem inadequate for bioinformatics applications that rely mostly on statistical parameters and optimization. The prediction of protein interactions, or protein docking, is one such application. This apparent problem with CP is particularly evident when constraints are provided by noisy data, as is the case when using the statistical analysis of Multiple Sequence Alignments (MSA) to extract coevolution information. The goal of this paper is to show that this first impression is misleading and that CP is a useful technique for improving protein docking even with data as vague and noisy as the coevolution indicators that can be inferred from MSA. Results: Here we focus on the study of two protein complexes. In one case we used a simplified estimator of interaction propensity to infer a set of five candidate residues for the interface and used that set to constrain the docking models. Even with this simplified approach and considering only the interface of one of the partners, there is a visible focusing of the models around the correct configuration. Considering a set of 400 models with the best geometric contacts, this constraint increases the number of models close to the target (RMSD <5Å) from 2 to 5 and decreases the RMSD of all retained models from 26Å to 17.5Å. For the other example we used a more standard estimate of coevolving residues, from the Co-Evolution Analysis using Protein Sequences (CAPS) software.
Using a group of three residues identified from the sequence alignment as potentially co-evolving to constrain the search, the number of complexes similar to the target among the 50 highest scoring docking models increased from 3 in the unconstrained docking to 30 in the constrained docking. Conclusions: Although only a proof-of-concept application, our results show that, with suitably designed constraints, CP allows us to integrate coevolution data, which can be inferred from databases of protein sequences, even though the data is noisy and often “fuzzy”, with no well-defined discontinuities. This also shows, more generally, that CP in bioinformatics need not be limited to the crisper cases of finite domains and explicit rules but can also be applied to a broader range of problems that depend on statistical measurements and continuous data.