Abstract
In recent years, solving protein structures by single particle cryogenic electron microscopy (cryoEM) has become a crucial tool in structural biology. While exciting progress is being made towards the visualization of smaller and smaller macromolecules, the median protein size in both eukaryotes and bacteria is still beyond the reach of single particle cryoEM. To overcome this problem, we implemented a platform strategy in which a small protein target was rigidly attached to a large, symmetric base via a selectable adapter. Seven designs were tested. In the best construct, a designed ankyrin repeat protein (DARPin) was rigidly fused to tetrameric rabbit muscle aldolase through a helical linker. The DARPin retained its ability to bind its target, the 27 kDa green fluorescent protein (GFP). We solved the structure of this complex to 3.0 Å resolution overall, with 5 to 8 Å resolution in the GFP region. As flexibility in the DARPin limited the overall resolution of the target, we describe strategies to rigidify this element.
Author summary
Single particle cryogenic electron microscopy (cryoEM) is a technique that uses images of purified proteins to determine their atomic structure. Unfortunately, the majority of proteins in the human and bacterial proteomes are too small to be analyzed by cryoEM. Over the years, several groups have suggested the use of a platform to increase the size of small protein targets. The platform is composed of a large protein base and a selectable adapter that binds the target protein. Here we report a platform based on tetrameric rabbit muscle aldolase that is fused to a Designed Ankyrin Repeat Protein (DARPin). Phage display libraries can be used to generate DARPins against target proteins. The residues mutated in a phage display library to generate a DARPin against a new target do not overlap with the DARPin-base fusion in the platform, thus changing the DARPin identity will not disrupt the platform design. The DARPin adapter used here is capable of binding Green Fluorescent Protein (GFP). We report the structure of GFP to 5 to 8 Å local resolution by single particle cryoEM. Our analysis demonstrates that flexibility in the DARPin-aldolase platform prevents us from achieving higher resolution in the GFP region. We suggest changes to the DARPin design to rigidify the DARPin-aldolase platform. This work expands on current platforms and paves a generally applicable way toward structure determination of small proteins by cryoEM.
Introduction
Single particle cryoEM can reveal the structures of large macromolecular complexes to near atomic resolution. To solve a protein structure by single particle cryoEM, purified proteins are rapidly frozen in a thin layer of vitreous ice. A transmission electron microscope is used to collect projection images of the protein. Individual proteins are identified in the ice and their orientations are computationally determined. The projection images are then combined to calculate a 3D reconstruction of the protein.
A fundamental challenge in single particle cryoEM is that small proteins do not produce enough contrast in noisy projection images to precisely determine their orientation. Richard Henderson estimated that with ideal images, a 3 Å structure could be reconstructed for a 40 kDa protein (1). Unfortunately, real electron micrographs are imperfect, so this theoretical minimum of macromolecular size has never been reached. The smallest protein to be solved to near atomic resolution so far by cryoEM is hemoglobin (64 kDa) (2), but the median protein sizes in both bacteria (27 kDa) and eukaryotes (36 kDa) are roughly half that (3). Consequently, many proteins in biology are beyond the reach of high-resolution structure determination by single particle cryoEM.
Over the years, several strategies to overcome the size limit problem in single particle cryoEM have been suggested. Two major themes have emerged to increase the target mass and improve its orientation determination. First, the target can be decorated with antibody fragments (6) (7). Second, the target can be rigidly attached to a large platform protein. The platform is typically composed of a base protein and an adapter. The purpose of the base protein is to increase the molecular weight, which facilitates accurate particle picking and precise particle orientation determination. The adapter can be customized (a covalent fusion between the target and the base) or general (a selectable adapter that facilitates non-covalent binding of the target to the platform base). Covalent approaches have utilized direct fusions between the target protein and the base either via a flexible linker adapter (4) or a helical junction adapter (5) (8). For the platform to be successful, the adapter must be rigidly attached to the base. The flexible linker adapter was therefore insufficient to determine the structure of the target (4), but the use of a helix-forming peptide linker (8)(9) or direct concatenation of two helices (5)(10) has shown promise. Most recently, Liu et al. demonstrated that a rigid, continuous α-helix could be formed by linking the terminal α-helices of a designed ankyrin repeat protein (DARPin) and a nanocage subunit through a helix-forming peptide linker (9) (8). Notably, Liu et al. were able to resolve the 17 kDa DARPin to 3.5 to 5 Å local resolution (8). Unfortunately, these strategies are limited to target proteins with a terminal α-helix, and their implementation requires that the length of the helical junction adapter be customized for each new target.
Utilizing a non-covalent platform strategy with a selectable adapter (like an antibody or a DARPin) has the potential to be generally applicable, regardless of the structure of the target, since the selectable adapter could be raised against any target using phage display, while the invariant nature of the adapter framework region would allow the one-time optimization of a rigid attachment point between it and the base. Along these lines, Liu et al. suggested that their DARPin-nanocage could display a small protein for structure determination by cryoEM (8), but so far no group has demonstrated this.
Here we describe the outcomes of a variety of new designs and report the structure of the first small protein visualized through a base/selectable-adapter platform approach.
Results
Platform strategy and design
The goal of our study was to design a generally applicable platform to solve small protein structures by single particle cryoEM. We explored several candidate base proteins and selectable adapters (Fig S1). We favored bases that were easy to purify and that had already been solved to high resolution by single particle cryoEM. We reasoned that oligomeric, symmetric bases (either globular proteins or helical tubes) would be best.
As selectable adapters, we first considered antibody fragments (Fabs and scFvs). Fabs have a flexible elbow connecting two immunoglobulin regions, whereas scFvs are made up of one immunoglobulin region. The Fab elbow could introduce flexibility, so we preferred the smaller, more compact scFv. However, because the beta sandwich immunoglobulin fold of a scFv could be difficult to rigidly fuse to the surface of a platform base, we identified Staphylococcus Protein A (PrA) as a linker that could bind the invariant region of a scFv (11). As PrA is a three-helix bundle, we reasoned that it could be rigidly attached to a base via a helical linker. Thus in one of our designs, the C-terminal helix of PrA was fused to the N-terminal helix of the base protein. Since PrA is capable of binding the invariant scFv framework, the base-PrA:scFv interfaces would not need to be redesigned for each new target. Unfortunately, in our biochemical experiments, we observed that the PrA:scFv interaction did not remain stable through a gel filtration column, indicating that the binding affinity was not strong enough for our purposes. Further mutagenesis to the PrA:scFv interface may strengthen the interaction. Regardless, a fundamental concern with this design is that two non-covalent binding interactions are required (PrA:scFv, and scFv:target), which could lead to occupancy issues. As a result, we moved to DARPins as our selectable adapter (Fig 1B).
In our designs, the final α-helix of the DARPin C-terminal cap (C-cap) was directly fused to the first α-helix of the base (Fig 1C). All DARPin libraries use a C-cap to stabilize the protein, so we expect it will be straightforward to swap in any DARPin built on the same framework (Fig 1B). In the base-DARPin platform design, only one non-covalent interaction is required (between the DARPin and the target), which results in a more predictable and stable complex. We chose a DARPin that formed a stable complex with GFP as a first test case (12) and screened several base-DARPin candidates.
Screening base candidates
We performed expression trials for five of our base-DARPin candidates (Fig S1). These bases included β-galactosidase (β-gal) (13), the vipA/vipB helical tube (14), the E. coli ribosome (15), TibC (16), and aldolase (17).
Because β-gal tetramerization requires the N- and C-termini of each subunit (18), an internal DARPin insertion was used, flanked by a helix-forming peptide (at the DARPin N-cap) and a flexible linker (at the DARPin C-cap) (9). Biochemically, the β-gal-DARPin platform formed a stable complex with GFP, but no cryoEM density was observed for the DARPin or GFP in our 3 Å reconstruction. This suggests that the helical linker was flexible relative to the β-gal base.
We therefore focused on bases with a terminal α-helix that could be rigidly fused to the DARPin. The vipA/vipB, ribosome L29, TibC, and aldolase proteins all had long terminal α-helices to facilitate direct fusion. In our experiments, the helical tube vipA-DARPin/vipB platform exhibited poor expression in E. coli, while the L29-PrA fusion did not integrate well into ΔL29 E. coli ribosomes (15) (Fig S1). The purified TibC-DARPin platform formed a stable complex with GFP, but the complex demonstrated aggregation and preferred orientation on plunge frozen grids. In contrast, the DARPin-aldolase platform was well-behaved.
In our DARPin-aldolase platform, the C-terminal α-helix of the DARPin was directly concatenated to the N-terminal α-helix of aldolase (Fig 1C) (S2 Fig A). The D2 symmetry of the DARPin-aldolase platform provided extensive space for the target and could potentially accommodate a globular protein of up to 740 kDa without steric clash (Fig 1E, 1F) (S1 Movie). The purified GFP:DARPin-aldolase complex was stable in a gel filtration column with an apparent 1:1 stoichiometry of DARPin-aldolase to the target (GFP) (S2 Fig B and C).
CryoEM analysis of the GFP:DARPin-aldolase complex
To solve the structure of GFP bound to the DARPin-aldolase platform, we collected 1,681 micrographs on a Titan Krios (Fig S3). Because the thin ice caused a slight preferred orientation, an additional 1,180 micrographs were collected at 26° tilt (see Materials and Methods) (19). High quality micrographs were selected after CTF correction (Fig S4) and particles were autopicked in RELION (Fig S3). After 2D classification in cryoSPARC, classes with strong secondary structure were selected for reconstruction. The GFP:DARPin-aldolase complex reconstruction yielded an overall resolution of 3 Å with C1 symmetry (Fig S5B). Further classification suggested too much conformational heterogeneity to apply D2 symmetry. The aldolase core and the helical linker were resolved to near atomic resolution (Fig 2B, 2C, 2D). The DARPin and GFP exhibited a local resolution of 4 to 8 Å, with discontinuous regions of higher resolution of 3.5 Å (Fig 2D) (S2 Movie). Although the resolution in the GFP and DARPin portion was not sufficient to build a model or assign sequence de novo, the static X-ray structures of GFP and the DARPin could be reliably docked into the map (Fig 2A).
DARPin framework caused conformational heterogeneity
Because of the 5 to 8 Å local resolution range in the GFP portion of the map (Fig 2D), we suspected that part of the GFP:DARPin-aldolase complex was flexible. To better understand the conformational heterogeneity in the data, a mask was generated around a single DARPin/GFP unit and RELION particle symmetry expansion was used to consider each subunit individually (Fig 3A, Fig S3) (21). The symmetry expanded particles were subjected to 3D classification without alignment, a strategy in which the orientation parameters determined in the previous refinement are used to classify the particles into subsets. For this focused classification, a spherical mask that encompassed the aldolase surface was used to increase the signal. The resulting five classes showed reasonable GFP:DARPin conformations (Fig 3B), but subsequent refinements were still limited to 5 to 6 Å overall, which suggested that additional conformational heterogeneity remained within the subsets. The majority of the particles (54%) were classified into class 2 (yellow), which appeared to lack a DARPin (Fig 3B). Class 2 was subjected to an additional round of 3D classification, which revealed several reasonable but lower resolution GFP:DARPin conformations (Fig S6). To investigate the heterogeneity in the focused classes, we compared each class to class 4 (Fig 3C, 3D). In the different classes, the GFP:DARPin density shows a clear rocking around the Y axis (Fig 3C) and around the Z axis (Fig 3D) relative to the aldolase base. At this point, we wondered whether any of these displacements could be attributed to the aldolase subunit. We performed a similar focused classification experiment with a mask around the aldolase subunit and the helical linker, but no rotation or shift was observed in the resulting subsets (data not shown). Thus, we concluded that the displacement likely arose in the second helix of the C-cap, which is fused into the helical linker, and in other regions of the DARPin distal to the linker.
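The symmetry expansion step described above can be illustrated with a minimal Python sketch. It is not the RELION implementation: the function names are hypothetical, and the right-multiplication of each particle rotation by the symmetry operator is an assumed convention that depends on how a given package defines orientations. The sketch only shows that D2 has four operators (the identity and three perpendicular two-fold rotations), so every particle image contributes four symmetry-related views, one per DARPin/GFP unit.

```python
import numpy as np


def d2_operators():
    """Return the four rotation matrices of point group D2:
    the identity plus 180-degree rotations about x, y and z."""
    return [
        np.diag([1.0, 1.0, 1.0]),    # identity
        np.diag([1.0, -1.0, -1.0]),  # two-fold about x
        np.diag([-1.0, 1.0, -1.0]),  # two-fold about y
        np.diag([-1.0, -1.0, 1.0]),  # two-fold about z
    ]


def symmetry_expand(particle_rotations, operators):
    """Emit one (particle index, operator index, rotation) triple per
    particle and symmetry operator.

    Assumption: the expanded orientation is R @ S; other software may
    use S @ R or transposed matrices.
    """
    expanded = []
    for p_idx, rot in enumerate(particle_rotations):
        for s_idx, sym in enumerate(operators):
            expanded.append((p_idx, s_idx, rot @ sym))
    return expanded


if __name__ == "__main__":
    ops = d2_operators()
    # Two hypothetical particle orientations, for illustration only.
    particles = [np.eye(3), ops[1]]
    print(len(symmetry_expand(particles, ops)))  # 4 copies per particle
```

After expansion, each copy can be classified with a mask around a single subunit, which is why the classification considers every DARPin/GFP unit individually.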
Discussion
In this study, we designed and tested a variety of platforms capable of non-covalently binding a small target protein via a selectable adapter for structure determination by single particle cryoEM. In our best construct, we resolved our target protein (GFP) to 5 to 8 Å resolution.
Our DARPin-aldolase platform has several advantages over other strategies. It is simple to express and purify. Aldolase has D2 symmetry and allows attachment of four targets without steric clash. Aldolase can be reconstructed to 2.6 Å resolution with even a 200 keV microscope (17). Because DARPins can be readily generated against a wide range of small protein targets, the attachment of a DARPin to aldolase promises to be a generally applicable strategy. A recent study of the insulin degrading enzyme (IDE) bound to Fabs was able to isolate several IDE conformations using different Fabs (22). It stands to reason that different DARPins could also stabilize different conformations of the target. Because switching DARPins in the platform would be done by straight-forward DNA manipulations, our DARPin-aldolase platform has the potential to resolve a series of conformations of the target protein.
Our biochemistry experiments suggested that the purified GFP:DARPin-aldolase complex was very stable, and clear secondary structure was apparent in the 2D classes, yet heterogeneity remained. Because the aldolase base and the helical linker region were resolved to near atomic resolution (Fig 2B and 2C), the heterogeneity likely began in the DARPin C-cap. The DARPin against GFP used here was from a first generation DARPin library. The C-cap of the first generation DARPins was reported to be less stable than the other repeat modules (23). While the crystal structure contained a well-resolved C-cap, the heterogeneity observed here suggests that it is not yet sufficiently rigid to serve as an attachment point in a cryoEM platform (Fig 3). However, recent DARPin phage display libraries contain DARPins with reduced surface entropy and a more stable C-cap sequence (23). Additional stabilizing surface interactions could also be introduced in future designs (28) (29), as could a second attachment point between the DARPin and the base (for instance, at both the N- and C-terminal caps of the DARPin). Together, such improvements could allow the DARPin-aldolase platform to reveal the structures of many small proteins to near atomic resolution.
Materials and Methods
Computational design
The α-helix fusion was designed computationally by manually docking the rabbit muscle aldolase structure (PDB code: 5VY5) and the GFP/DARPin complex (PDB code: 5MA6). To rigidly join the aldolase and DARPin moieties, we truncated the C-terminal flexible loop of the DARPin and the N-terminal flexible loop of aldolase, exposing the two terminal α-helices. The two terminal α-helices were manually concatenated to form an ideal α-helix using the α-helix building tool in UCSF Chimera (30). The model was inspected for the orientation of the DARPin relative to aldolase to ensure that there was no steric clash and that enough space remained for target protein attachment. All structural design figures were generated using PyMOL 1.8 (https://pymol.org).
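To illustrate why the length of a fused helix matters for the relative orientation of the two partners, the sketch below generates Cα positions for an idealized α-helix. It is not the Chimera tool used here. It assumes textbook helix parameters (about 1.5 Å rise and 100° twist per residue, with a Cα radius of roughly 2.3 Å), and the function names are hypothetical.

```python
import numpy as np

# Textbook parameters for an ideal alpha-helix (assumed values).
RISE_PER_RESIDUE_A = 1.5       # translation along the helix axis, in Å
TWIST_PER_RESIDUE_DEG = 100.0  # 3.6 residues per turn
CA_RADIUS_A = 2.3              # approximate distance of Cα from the axis, in Å


def ideal_helix_ca(n_residues):
    """Return an (n_residues, 3) array of Cα coordinates for a
    right-handed ideal alpha-helix running along +z."""
    i = np.arange(n_residues)
    theta = np.deg2rad(TWIST_PER_RESIDUE_DEG) * i
    return np.column_stack([
        CA_RADIUS_A * np.cos(theta),
        CA_RADIUS_A * np.sin(theta),
        RISE_PER_RESIDUE_A * i,
    ])


def junction_rotation_deg(n_inserted):
    """Rotation about the helix axis (modulo 360) that inserting
    n_inserted residues at the junction imposes on the fused partner."""
    return (TWIST_PER_RESIDUE_DEG * n_inserted) % 360.0


if __name__ == "__main__":
    ca = ideal_helix_ca(10)
    print(ca.shape)
    print(junction_rotation_deg(1), junction_rotation_deg(4))
```

Under these assumptions, adding or removing a single residue at the junction rotates the fused partner by about 100° and shifts it by about 1.5 Å along the helix axis, which is why the register of the concatenated helices has to be checked for steric clash.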
Cloning, protein expression, and purification of the recombinant DARPin-aldolase platform and GFP
The DARPin sequence was DARPin 3G86.32 (Fig S2A) (12). The cDNAs encoding GFP and our DARPin-aldolase fusion were synthesized by IDT. The GFP and DARPin-aldolase cDNAs were PCR-amplified and inserted into the pACYCDuet and pET21b vectors, respectively, for recombinant expression in E. coli, producing untagged GFP and a C-terminally His-tagged DARPin-aldolase chimeric fusion. GFP and DARPin-aldolase were coexpressed in E. coli BL21(DE3) (Lucigen) using autoinduction medium with trace elements (Formedium) at 30 °C overnight. Cells were harvested by centrifugation, and the protein complex was purified by Ni-NTA affinity chromatography (Qiagen) and Superdex 200 chromatography (GE Healthcare). The purified GFP-DARPin-aldolase complex was concentrated to 2.5 mg/ml in a buffer containing 25 mM Tris-HCl pH 8.0 and 150 mM NaCl.
CryoEM sample preparation and data collection
Electron microscopy grids were prepared at Scripps Research Institute. Briefly, 3 µL of the 2.5 mg/ml GFP-DARPin-aldolase complex was applied to a plasma cleaned Au UltraFoil grid (300 mesh, R2/2, Quantifoil) in a cold room (4°C, ≥95% relative humidity). The grid was manually blotted with filter paper (Whatman No.1) for approximately 3 seconds before plunging into liquid ethane using a manual plunger (17). The grids were screened for ice thickness and sample distribution on a Talos Arctica operating at 200 kV with a Falcon 3 direct electron detector (FEI). Micrographs of the GFP-DARPin-aldolase complex were collected on a Titan Krios microscope (FEI) operating at 300 kV with an energy filter (Gatan) and equipped with a K2 Summit direct electron detector (Gatan). For untilted data, SerialEM was used for automated image acquisition (31). After calculating an efficiency score from early refinements using cryoEF (19), additional data were collected at 26° tilt using EPU software (FEI). A nominal magnification of 165,000x was used for data collection, corresponding to a pixel size of 0.865 Å at the specimen level, with the defocus ranging from -1.0 µm to -3.0 µm. Movies were recorded in super-resolution mode with a total dose of ~40 e-/Å2, fractionated into 20 frames (0° tilt images) or 40 frames (26° tilt images), at a dose rate of 8.4 electrons per pixel per second.
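The exposure parameters quoted above are related by simple arithmetic, sketched below in Python. The sketch assumes that the 0.865 Å pixel size and the dose rate of 8.4 electrons per pixel per second both refer to the same (physical) detector pixel. Neither this assumption nor the resulting exposure time is stated in the text, and the function names are hypothetical.

```python
# Values taken from the text.
PIXEL_SIZE_A = 0.865          # Å per pixel at the specimen level
DOSE_RATE_E_PER_PX_S = 8.4    # electrons per pixel per second
TOTAL_DOSE_E_PER_A2 = 40.0    # approximate total dose, e-/Å^2


def dose_rate_per_area(dose_rate_px, pixel_size_a):
    """Convert a dose rate per pixel into a dose rate per Å^2."""
    return dose_rate_px / pixel_size_a ** 2


def exposure_time_s(total_dose, rate_per_area):
    """Exposure time needed to accumulate the total dose."""
    return total_dose / rate_per_area


def dose_per_frame(total_dose, n_frames):
    """Dose carried by each movie frame, in e-/Å^2."""
    return total_dose / n_frames


if __name__ == "__main__":
    rate = dose_rate_per_area(DOSE_RATE_E_PER_PX_S, PIXEL_SIZE_A)
    print(f"dose rate: {rate:.1f} e-/A^2/s")  # about 11.2
    print(f"exposure:  {exposure_time_s(TOTAL_DOSE_E_PER_A2, rate):.1f} s")  # about 3.6
    print(dose_per_frame(TOTAL_DOSE_E_PER_A2, 20))  # untilted movies
    print(dose_per_frame(TOTAL_DOSE_E_PER_A2, 40))  # 26° tilt movies
```

Under these assumptions, the untilted movies carry about 2 e-/Å2 per frame and the tilted movies about 1 e-/Å2 per frame.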
Image processing and structure analysis
Movies were decompressed and gain corrected with IMOD (32). Movies were then motion corrected using MotionCor2 (33) and exposure filtered in accordance with relevant radiation damage curves (34). Micrographs with high CTF Figure of Merit scores and a maximum resolution better than 3.6 Å were selected for further processing. Particles were autopicked using 2D classes as references and extracted in RELION (35), and initial 2D classification was performed in cryoSPARC (36). High quality 2D classes were selected for further processing. The initial model was generated de novo and subsequent 3D refinements were performed using cryoSPARC. A script from the UCSF PyEM package (https://github.com/asarnow/pyem) was used to convert the cryoSPARC coordinates into RELION format. Duplicate particles were removed and particles were analyzed by 3D refinement, Bayesian Particle Polishing and CTF Refinement in RELION. The data were binned to 1.5 Å/pixel, refined with D2 symmetry, and symmetry expanded. Symmetry expanded particles were used in 3D classification without alignment. All reconstructions were analyzed using UCSF Chimera. The initial model was built by rigidly docking individual protein structures into the EM map using Chimera. The model was then fit and adjusted manually in UCSF Chimera and Coot (37). The figures were generated using UCSF Chimera, and local resolution and final Fourier shell correlation were calculated using ResMap (38) and cryoSPARC.
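For readers unfamiliar with the Fourier shell correlation (FSC), the following NumPy sketch shows how the curve is computed from two cubic maps. It is a generic illustration, not the cryoSPARC or ResMap implementation: it applies no masking or noise substitution, and the function names are hypothetical. It also shows that data binned to 1.5 Å/pixel cannot carry information beyond the Nyquist limit of 3.0 Å.

```python
import numpy as np


def fourier_shell_correlation(map1, map2):
    """Return the FSC between two cubic maps of equal size,
    one value per integer-radius shell up to Nyquist."""
    assert map1.shape == map2.shape and len(set(map1.shape)) == 1
    n = map1.shape[0]
    # Centered Fourier transforms of both maps.
    f1 = np.fft.fftshift(np.fft.fftn(map1))
    f2 = np.fft.fftshift(np.fft.fftn(map2))
    # Integer distance of every voxel from the Fourier origin.
    grid = np.indices(map1.shape) - n // 2
    radius = np.rint(np.sqrt((grid ** 2).sum(axis=0))).astype(int)
    n_shells = n // 2
    fsc = np.zeros(n_shells)
    for r in range(n_shells):
        shell = radius == r
        num = np.real(np.sum(f1[shell] * np.conj(f2[shell])))
        den = np.sqrt(np.sum(np.abs(f1[shell]) ** 2)
                      * np.sum(np.abs(f2[shell]) ** 2))
        fsc[r] = num / den if den > 0 else 0.0
    return fsc


def shell_resolution(shell_index, box_size, pixel_size_a):
    """Resolution in Å corresponding to a Fourier shell index (> 0)."""
    return box_size * pixel_size_a / shell_index


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Synthetic noise map, for illustration only.
    volume = rng.normal(size=(32, 32, 32))
    print(fourier_shell_correlation(volume, volume)[:4])
    # Nyquist shell of a 32-voxel box sampled at 1.5 Å/pixel.
    print(shell_resolution(16, 32, 1.5))
```

In practice the two inputs would be independently refined half maps, and the reported resolution is the shell at which the curve drops below a chosen threshold.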
Data deposition
The density map of the GFP:DARPin-aldolase complex has been deposited in the Electron Microscopy Data Bank (EMDB) under accession code EMD-9277, and the corresponding model in the Protein Data Bank (PDB) under accession code 6MWQ.
Acknowledgement
We thank Dr. Mark Herzik Jr. and Mengyu Wu at The Scripps Research Institute for help with sample preparation. We also thank Dr. Songye Chen, Dr. Andrey Malyutin, Dr. Rebecca Voorhees, and Dr. Bil Clemons at Caltech for technical assistance. We are also grateful to all members of the Jensen laboratory for discussion and technical assistance. This work was supported by funds from NIH NIGMS P50 082545. CryoEM work was performed in the Beckman Institute Resource Center for Transmission Electron Microscopy at Caltech.