ABSTRACT
Summary: Recombinant protein production is a widely used technique in the biotechnology industry and biomedical research, yet only a quarter of target proteins are soluble and can be purified. Failures are largely due to low protein expression and solubility. We have discovered that global structural flexibility, which can be modelled by normalised B-factors, accurately predicts the solubility of 12,216 recombinant proteins expressed in Escherichia coli. We have optimised these B-factors and derived a new set of values for solubility scoring that further improves prediction accuracy. We call this new predictor the ‘Solubility-Weighted Index’ (SWI). Importantly, SWI outperforms many existing protein solubility prediction tools. We have also developed ‘SoDoPE’ (Soluble Domain for Protein Expression), a web interface that allows users to choose a protein region of interest for predicting and maximising both protein expression and solubility.
Availability: The SoDoPE web server and source code are freely available at https://tisigner.com/sodope and https://github.com/Gardner-BinfLab/TIsigner, respectively.
The code and data for reproducing our analysis can be found at https://github.com/Gardner-BinfLab/SoDoPE_paper_2019.
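As a rough illustration of the sequence-level scoring summarised above, the sketch below averages per-residue weights over a sequence and scans fixed-length windows for the highest-scoring region. It rests on two assumptions not stated in this abstract: that the score is the arithmetic mean of per-residue weights, and that a region is chosen by a simple sliding window. The weight table `TOY_WEIGHTS` and the functions `flexibility_score` and `best_window` are hypothetical names with placeholder values; they are not the published normalised B-factors, the optimised SWI weights, or the SoDoPE implementation.

```python
from statistics import mean

# Placeholder weights for illustration only: evenly spaced values from 0 to 1
# in alphabetical order of the 20 standard amino acids. These are NOT the
# normalised B-factors or the optimised SWI weights from the paper.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
TOY_WEIGHTS = {aa: i / (len(AMINO_ACIDS) - 1) for i, aa in enumerate(AMINO_ACIDS)}


def flexibility_score(sequence, weights=TOY_WEIGHTS):
    """Return the arithmetic mean of per-residue weights (assumed scoring rule)."""
    seq = sequence.upper()
    unknown = set(seq) - set(weights)
    if not seq or unknown:
        raise ValueError(f"empty sequence or unsupported residues: {sorted(unknown)}")
    return mean(weights[aa] for aa in seq)


def best_window(sequence, length, weights=TOY_WEIGHTS):
    """Return (start, score) for the highest-scoring window of the given length.

    Ties are resolved in favour of the earliest window.
    """
    if not 1 <= length <= len(sequence):
        raise ValueError("window length must be between 1 and the sequence length")
    scores = [
        (start, flexibility_score(sequence[start:start + length], weights))
        for start in range(len(sequence) - length + 1)
    ]
    return max(scores, key=lambda item: item[1])
```

With a real weight table substituted for `TOY_WEIGHTS`, the same two functions could be used to compare candidate regions of a protein before choosing a construct, although the actual tool may score and select regions differently.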