Abstract
Copper is essential for life but toxic in excess; therefore, all organisms tightly control its intracellular abundance. Bacteria have whole operons devoted to copper resistance, with genes that code for efflux pumps, oxidases, etc. Recently, the CopM protein of the copMRS operon was described as an important element for copper tolerance in Synechocystis. This protein consists of a domain of unknown function, and was suggested to act as a periplasmic/extracellular copper binder. This work describes a bioinformatic characterization of CopM, including structural models based on homology modeling and on residue coevolution, to help expand on the recently reported experiments. The protein is predicted to be membrane-anchored, not secreted. Two disordered regions are predicted, both possibly involved in protein-protein interactions. The 3D models reveal a 4-helix bundle fold with several potential copper-binding sites, most of them largely buried inside the lumen of the bundle. Some of the predicted copper-binding sites involve residues from the disordered regions, suggesting that copper binding could induce structuring of these disordered regions and thus modulate the interactions they mediate.
Copper is an element essential for life, but it becomes toxic at concentrations exceeding organismal requirements.1,2 Therefore, living organisms have evolved complex homeostatic systems that monitor copper concentrations, manage its intake and distribution to the proteins that require it, and prevent its toxic effects.1,3 Bacteria carry whole operons and regulons devoted to conferring tolerance against Cu(I) and Cu(II). The proteins encoded in these genetic elements achieve protective roles through varied mechanisms, including efflux systems, oxidation of the more toxic Cu(I) to Cu(II), metal chelation and other less understood functions.4–9
Recently, a novel protein dubbed CopM was identified in the copMRS operon of a copper-resistance regulon of Synechocystis.10,11 The CopM protein localizes to the periplasm and the extracellular medium,11 and plays an important role in resistance according to the effects observed in knockout strains.11 In vitro, it can tightly bind one equivalent of Cu(I) or, less tightly, of Cu(II).11 However, its amino acid sequence includes several histidine and methionine residues that could allow it to bind more metal equivalents.12 Indeed, other periplasmic copper-tolerance proteins that function as metal sponges, like E. coli PcoE, retain one metal ion upon purification but accept more on titration.13
This work reports structural models of CopM based on classical homology modeling and on a protocol that uses evolutionary couplings to model a protein’s 3D structure. These two methods, based on fundamentally different concepts and data, lead to essentially the same structural features. Specifically, CopM is predicted to be a 4-helix bundle that brings several His and Met side chains into proximity, giving rise to many potential copper-binding sites, with two disordered regions that could become ordered upon copper binding. Together with basic bioinformatic analyses of CopM’s sequence, the emerging model supports the hypothesis that the protein works as a copper sponge, possibly to rapidly buffer sudden increases in environmental copper. A potential additional function is to interact differentially with other proteins depending on its metallation level, possibly to modulate responses beyond initial chelation.
Results and Discussion
Analysis of Synechocystis CopM sequence. The amino acid sequence of full-length CopM from Synechocystis sp. PCC6803 (Uniprot ID Q55943_SYNY3) is shown in Fig. 1. It consists of a DUF305 domain (domain of unknown function number 305, PF03713, underlined in Fig. 1). The sequence begins with a short hydrophobic segment (italics in Fig. 1) previously regarded as a signal peptide for secretion; however, a computational prediction specifically designed to tell transmembrane regions from signal peptides suggests that this is a transmembrane helical region.14 As such, the protein is predicted to reside mostly in the periplasm, anchored to the membrane, consistent with the report that < 30% of it leaks out in the first 24 h of growth.11
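The idea of locating a hydrophobic N-terminal segment can be illustrated with a sliding-window hydropathy scan. The sketch below uses the Kyte-Doolittle scale, which is not the predictor cited above and cannot by itself tell a signal peptide from a transmembrane helix. The toy sequence, the window size and the threshold are illustrative assumptions, not CopM data.

```python
# Kyte-Doolittle hydropathy values for the 20 standard amino acids.
KD = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def hydropathy_profile(seq, window=9):
    """Return the mean hydropathy of every full window along the sequence."""
    scores = [KD[aa] for aa in seq.upper()]
    return [sum(scores[i:i + window]) / window
            for i in range(len(scores) - window + 1)]


def hydrophobic_windows(seq, window=9, threshold=1.6):
    """Return 0-based start indices of windows whose mean exceeds the threshold.

    The window size and threshold are assumed values chosen for illustration.
    """
    profile = hydropathy_profile(seq, window)
    return [i for i, score in enumerate(profile) if score > threshold]


# Hypothetical toy sequence (NOT CopM): an 18-residue hydrophobic stretch
# followed by a polar, His/Met-containing tail.
toy = "MKLLLIVAALLVAGLASA" + "DEHKSGHMDEKRHHDE"
print(hydrophobic_windows(toy))
```

In this toy case only windows that overlap the N-terminal stretch pass the threshold. A dedicated predictor is still needed to decide whether such a segment is a cleavable signal peptide or a membrane anchor.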
Secondary structure predictions (red in Fig. 1) reveal four main helical elements with high confidence plus two short regions of helical propensity. Disorder predictions with Disopred3 (ref. 15; lowercase in Fig. 1) suggest two main unstructured regions, to which the short segments of helical propensity map. Both disordered regions are predicted by Disopred3 to be involved in interactions with other proteins.
Homology and coevolution-based modeling. Models of proteins and their dynamics often lead to new hypotheses and explanations. In a first exploratory step, two models of the globular domain of CopM (residues 25-196) were built using either classical homology modeling (with I-TASSER, ref. 17) or a tool that analyzes evolutionary couplings throughout a protein’s sequence to fold it into a consistent 3D structure (EVFold, refs. 18–20). These two protocols are completely unrelated and rely on fundamentally different data and methods. They independently produced similar models (Fig. 2 top), both of high confidence, indicating that the global features of the models should not differ much from those of the “true” structure. The high confidence of the homology model stems from the availability in the Protein Data Bank of X-ray structures for two DUF305 domains with sequence identity > 20% to CopM (PDB ID 3BT5 and 2QF9). In turn, the high confidence of the coevolution-based prediction stems from the large number of sequences (2586) compiled by EVFold in an alignment with standard parameters, which reaches good coverage of the sequence and of the amino acid space at each position of the sequence.
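The notion of residue coevolution can be illustrated with a deliberately simple statistic. The sketch below scores pairs of alignment columns by mutual information; this is a much cruder measure than the evolutionary couplings computed by EVFold and is used here only as a stand-in. The four-sequence alignment is a hypothetical toy, not the CopM alignment.

```python
from collections import Counter
from math import log


def column(msa, i):
    """Return the residues found at alignment position i."""
    return [seq[i] for seq in msa]


def mutual_information(msa, i, j):
    """Mutual information (in nats) between alignment columns i and j."""
    n = len(msa)
    count_i = Counter(column(msa, i))
    count_j = Counter(column(msa, j))
    count_ij = Counter(zip(column(msa, i), column(msa, j)))
    mi = 0.0
    for (a, b), c in count_ij.items():
        # p(a,b) * log( p(a,b) / (p(a) * p(b)) ), written with raw counts
        mi += (c / n) * log(c * n / (count_i[a] * count_j[b]))
    return mi


def ranked_pairs(msa, min_separation=1):
    """Return (score, i, j) for all column pairs, highest score first."""
    length = len(msa[0])
    pairs = [(mutual_information(msa, i, j), i, j)
             for i in range(length)
             for j in range(i + min_separation, length)]
    return sorted(pairs, reverse=True)


# Hypothetical toy alignment: columns 0 and 2 vary together,
# column 1 varies independently, column 3 is fully conserved.
toy_msa = ["HAMK", "HGMK", "MALK", "MGLK"]
score, i, j = ranked_pairs(toy_msa)[0]
print(i, j, round(score, 3))
```

Only the covarying pair of columns gets a nonzero score in this toy case. Real alignments need corrections for phylogenetic bias and indirect correlations, which is what direct-coupling approaches address.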
The largest differences between the EVFold and I-TASSER models map to the two predicted disordered regions, which are themselves heterogeneous even among the different models produced by each program. This is likely due to a combination of true dynamics, lack of homologous elements in the X-ray structures used by I-TASSER, and a low sequence complexity that prevents extraction of direct couplings by EVFold. To improve the definition of these regions, a new set of models was built by feeding the coevolution information from EVFold into I-TASSER. This procedure results in four similar top models (pairwise RMSDs between 0.9 and 4.1 Å, Z-scores from -1.61 to -0.51) that are better packed than the EVFold-only model and that reproduce the EVFold contacts better than the I-TASSER-only model. These models show variability mainly in the first 15-20 N-terminal residues (Fig. 2 bottom and Fig. 3).
All the obtained models display a 4-helix bundle topology, with an overall fold that brings most Met and His side chains inside the lumen of the bundle (Fig. 3 and Fig. 4). This results in many potential metal-binding sites inside the protein, some of them rather isolated but others forming a continuous network of potential sites, most remarkably a central cluster of buried histidines (Fig. 4). The two disordered regions also form arrangements reminiscent of copper-binding sites, by combining their methionine and histidine residues with those of the 4-helix bundle.
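One simple way to enumerate potential sites of this kind in a model is to group His and Met residues whose side chains lie close together. The sketch below is one such approach under stated assumptions: each residue is reduced to a single reference atom, the 6 Å cutoff is an arbitrary illustrative value, and the coordinates are hypothetical rather than taken from the CopM models.

```python
from itertools import combinations
from math import dist


def candidate_sites(residues, cutoff=6.0, min_size=2):
    """Group His/Met residues into proximity clusters.

    residues: list of (name, number, (x, y, z)), one reference atom each.
    Two residues are linked when their atoms lie within `cutoff` angstroms;
    clusters are the connected groups with at least `min_size` members.
    """
    ligands = [r for r in residues if r[0] in ("HIS", "MET")]
    neighbors = {k: set() for k in range(len(ligands))}
    for a, b in combinations(range(len(ligands)), 2):
        if dist(ligands[a][2], ligands[b][2]) <= cutoff:
            neighbors[a].add(b)
            neighbors[b].add(a)

    seen, clusters = set(), []
    for start in neighbors:
        if start in seen:
            continue
        # Depth-first walk over residues linked to `start`.
        seen.add(start)
        stack, members = [start], []
        while stack:
            k = stack.pop()
            members.append(k)
            for m in neighbors[k] - seen:
                seen.add(m)
                stack.append(m)
        if len(members) >= min_size:
            clusters.append(sorted((ligands[k][:2] for k in members),
                                   key=lambda r: r[1]))
    return clusters


# Hypothetical coordinates (NOT from the CopM models).
toy = [
    ("HIS", 10, (0.0, 0.0, 0.0)),
    ("MET", 14, (4.0, 0.0, 0.0)),
    ("HIS", 50, (8.0, 0.0, 0.0)),
    ("LEU", 51, (2.0, 0.0, 0.0)),   # ignored: not His/Met
    ("MET", 90, (30.0, 0.0, 0.0)),  # isolated: no partner within cutoff
]
print(candidate_sites(toy))
```

Linking residues transitively, as done here, reports a chain of nearby side chains as a single network, which matches the distinction made above between isolated sites and continuous clusters.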
Based on the structural models and sequence predictions, it is possible that the two predicted disordered regions are truly unstructured and dynamic in the metal-free protein, and that they become folded upon copper binding, as observed for other copper-binding proteins.21,22 An open, flexible state in the absence of copper could facilitate metal uptake by the many internal sites of the protein. After this initial role as a “copper sponge” capable of buffering sudden increases in environmental copper, structuring of the disordered regions induced by metal binding could trigger a second response by altering interactions with other proteins.
One last interesting point stems from two pairs of sequence segments that are predicted with high significance to interact according to EVFold’s residue-residue couplings, but that none of the three modeling procedures could satisfy, not even the one based exclusively on EVFold data (Fig. 5). In the few examples where such a situation was reported in the literature, the couplings turned out to correspond either to homodimerization surfaces (ref. 19) or to alternative conformations such as intermediate and hidden states.23 Given the location of these regions in the models, it is in this case more likely that these sequence segments mediate dimerization. An alternative conformation is less likely because it would entail very large rearrangements, although it cannot be ruled out, so both possibilities deserve exploration.
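Couplings that a model fails to satisfy can be listed by comparing each predicted pair against the corresponding distance in the model. The following sketch assumes one representative atom per residue and an 8 Å contact cutoff; both choices, as well as the coupling scores and coordinates, are hypothetical and not taken from the EVFold output for CopM.

```python
from math import dist


def unsatisfied_couplings(couplings, coords, cutoff=8.0):
    """Return couplings whose residues are farther apart than `cutoff`.

    couplings: list of (i, j, score) for predicted residue pairs.
    coords: dict mapping residue number to (x, y, z) of a reference atom.
    Output tuples are (i, j, score, distance), strongest coupling first.
    """
    violated = []
    for i, j, score in couplings:
        d = dist(coords[i], coords[j])
        if d > cutoff:
            violated.append((i, j, score, round(d, 1)))
    return sorted(violated, key=lambda v: v[2], reverse=True)


# Hypothetical model coordinates and coupling scores.
toy_coords = {
    1: (0.0, 0.0, 0.0),
    2: (5.0, 0.0, 0.0),
    3: (20.0, 0.0, 0.0),
    4: (0.0, 12.0, 0.0),
}
toy_couplings = [(1, 2, 0.9), (1, 3, 0.8), (2, 4, 0.5)]
print(unsatisfied_couplings(toy_couplings, toy_coords))
```

Strong couplings that remain in such a list across all models are the candidates for intermolecular contacts or alternative conformations discussed above.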
Conclusions
Coevolution-based methods for modeling protein structures and their complexes are rapidly becoming established (refs. 18–20, 24–26) and are starting to be used for real-world applications.27,28 Together with increasingly reliable tools for homology modeling and sequence-based predictions, coevolution-based methods have enormous potential to explain existing results and drive the generation of new hypotheses. Hopefully, the models and analyses described here will fulfill this goal, in this case contributing to a better understanding of how CopM and possibly other similar copper-tolerance proteins work.
Footnotes
luciano.abriata{at}epfl.ch