PT - JOURNAL ARTICLE
AU - James H. Collier
AU - Lloyd Allison
AU - Arthur M. Lesk
AU - Peter J. Stuckey
AU - Maria Garcia de la Banda
AU - Arun S. Konagurthu
TI - Statistical inference of protein structural alignments using information and compression
AID - 10.1101/056598
DP - 2016 Jan 01
TA - bioRxiv
PG - 056598
4099 - http://biorxiv.org/content/early/2016/06/02/056598.short
4100 - http://biorxiv.org/content/early/2016/06/02/056598.full
AB - Structural molecular biology depends crucially on computational techniques that compare protein three-dimensional structures and generate structural alignments (the assignment of one-to-one correspondences between subsets of amino acids based on atomic coordinates). Despite its importance, the structural alignment problem has not been formulated, much less solved, in a consistent and reliable way. To overcome these difficulties, we present here a framework for precise inference of structural alignments, built on the Bayesian and information-theoretic principle of Minimum Message Length (MML). The quality of any alignment is measured by its explanatory power: the amount of lossless compression achieved to explain the protein coordinates using that alignment. We have implemented this approach in the program MMLigner (http://lcb.infotech.monash.edu.au/mmligner), which distinguishes statistically significant alignments, a capability not available elsewhere. We also demonstrate the reliability of MMLigner’s alignment results compared with the state of the art. Importantly, MMLigner can also discover different structural alignments of comparable quality, a challenging problem for oligomers and protein complexes.