Abstract
Despite the immense importance of enzyme-substrate reactions, there is a lack of generic and unbiased tools for identifying the molecular components participating in these reactions on a cellular level. Here we developed a universal method called System-wide Identification of Enzyme Substrates by Thermal Analysis (SIESTA). The approach assumes that enzymatic post-translational modification of substrate proteins changes their thermal stability, and applies the concept of specificity to reveal potential substrates. For selenoprotein thioredoxin reductase 1, SIESTA confirmed several known protein substrates and suggested novel candidates. For poly-(ADP-ribose) polymerase-10, SIESTA revealed a number of putative substrates, which were confirmed by targeted mass spectrometry and functional assays. Wider application of SIESTA can enhance our understanding of the role of enzymes in homeostasis and disease, and facilitate drug discovery.
One sentence summary: Monitoring protein stability can reveal changes due to specific enzymatic post-translational modifications at the proteome level.
Introduction
At least a third of all proteins possess enzymatic activity. BRENDA (1), one of the most comprehensive enzyme databases, comprises >9 million protein sequences and encompasses >7000 classes of enzyme-catalyzed reactions (http://genexplain.com/brenda/). Many of these enzymes catalyze the modifications of protein substrates. Transient modulation of protein post-translational modifications (PTMs) controls numerous cellular processes, with PTM occupancies often acting as independent regulators with respect to protein expression levels (2). PTMs induce a host of downstream effects, such as changes in protein function, stability, homeostasis, localization and cellular diversification (3). Not surprisingly, mechanisms and kinetics of protein modifications have become a vibrant research area (4). An important aspect of PTM research is the characterization of enzyme-substrate associations, which is essential for our understanding of cell biology and disease mechanisms. Moreover, many high-throughput screening assays rely upon modified substrates as a readout (5). The lack of information on the physiological substrates of enzymes hampers the development of effective therapeutics, e.g. in Parkinson’s disease (6) and cancer (7).
For a thorough understanding of homeostasis and disease, all substrates need to be determined for every PTM-catalyzing enzyme. However, conventional techniques used for identifying specific substrates are enzyme-specific, labor-intensive and often not straightforward. Such experiments include the use of genetic and pharmacologic perturbations (8), substrate-trapping mutants (9), affinity purification-mass spectrometry (10), peptide (11) or protein arrays (12), tagging of the client proteins by substrate analogues using engineered enzymes (13), peptide immunoprecipitation (14) or sophisticated computational tools (15). Most of these techniques are specifically designed for a certain enzyme or enzyme class, which limits their applicability. Engineering mutant enzymes can alter the biology of the system, potentially introducing a bias and increasing the risk of false positive discoveries. Therefore, an unbiased, universal and system-wide method that does not involve modification of the enzymes or their co-factors would be a significant methodological advancement useful in a wide range of studies.
Any interaction with a small molecule, metabolite, protein or nucleic acid, as well as a genetic mutation, can affect protein stability (16-19). PTMs can also alter thermal stability of the substrate protein (20-22). A cell lysate contains most potential substrates for any cellular enzyme. Therefore, proteome-wide monitoring of the thermal stability changes (19) in the cell lysate upon addition of a recombinant enzyme and, where required, its co-factor has the potential to reveal enzyme substrates. However, the concomitant protein-enzyme and protein-co-factor interactions will interfere with and can mask modification-specific thermal stability changes of the substrates. This problem is addressed in our method of System-wide Identification of Enzyme Substrates by Thermal Analysis (SIESTA). SIESTA identifies specific thermal stability changes induced in substrate proteins by a combination of enzyme and co-factor as compared to the changes induced by either enzyme or co-factor alone. The idea of specific response is borrowed from our method of Functional Identification of Target by Expression Proteomics (FITExP) (23). FITExP reveals specific changes in protein abundances by contrasting proteome responses to a given drug with those to all other drugs and controls. We have recently shown that excellent specificity can be achieved in some cases with a few contrasting treatments, while in more general cases >10 contrasting treatments are required (24). Since, in our experience, thermal stability changes are more specific than the proteome abundance responses, the number of contrasting treatments in SIESTA can be as low as three: “control” (cell lysate incubated with vehicle), “enzyme” (cell lysate incubated with an added purified enzyme, recombinant or cell-isolated), and “co-factor” (cell lysate incubated with an added co-factor). These treatments are contrasted with “enzyme + co-factor” to reveal specifically shifting proteins (Fig. 1).
SIESTA identified known and putative TXNRD1 substrates
As a proof of principle of the SIESTA approach, we first selected an enzymatic reaction involving disulfide bond reduction. Since such a reaction should destabilize substrate proteins and lead to negative ΔTm, the asymmetry between positive and negative values is easy to verify. For this reaction we employed human selenoprotein thioredoxin reductase 1 (TXNRD1), a key oxidoreductase that catalyzes the reduction of several substrate proteins using NADPH as a co-factor (25). HCT116 cell lysate was therefore treated with vehicle, NADPH (1 mM), TXNRD1 (1 μM), or both, with all experiments done in duplicate. In LC-MS/MS analysis, 5864 proteins were quantified, of which 5637 were quantified with at least two peptides (Table S1).
Changes in Tm after NADPH treatment revealed stabilization of several known NADPH-interacting proteins, as expected (Fig. 2A-B; listed in Table S2). The analysis of specific ΔTm shifts in the TXNRD1+NADPH treatment revealed that in the presence of NADPH, TXNRD1 destabilized 13 known substrate proteins (Fig. 2C). Several novel candidate substrate proteins were also found. In general, the expected asymmetry in melting temperature shifts in favor of destabilization was well pronounced (Fig. 2C). An orthogonal partial least squares discriminant analysis (OPLS-DA) model (26) contrasting TXNRD1+NADPH with all other treatments was also used to reveal the specifically destabilized proteins (Fig. 2D). Such modeling showed its utility in FITExP (23, 24), and its main advantage is the ability to use an arbitrary number of contrasting treatments.
Examples of melting curves for proteins destabilized by TXNRD1 are shown in Fig. 2E. The 28 identified substrates (Table S3) mapped to the following INTERPRO Protein Domains and Features pathways: “thioredoxin-like fold” (8 proteins, p < 3e-11) and “peroxiredoxin, C-terminal” (4 proteins, p < 3e-08). GO and KEGG terms included “oxidoreductase activity” (13 proteins, p < 5e-10) and “glutathione metabolism” (4 proteins, p < 0.0002), in very good agreement with the known physiological roles of TXNRD1 (25).
GPX1 was the protein showing the strongest destabilization (Fig. 2C-E). GPX1 is a cytosolic selenoprotein usually considered to be glutathione-dependent (27), but at least some GPX isoenzymes are known to be directly reduced by TXNRD1 (28). Several peroxiredoxins (PRDXs), the highly abundant thioredoxin-dependent peroxidases, were also destabilized by the addition of TXNRD1 and NADPH, presumably through the action of thioredoxin present in the cell lysate (29). TXNL1 (or TRP32) (30), NXN (31), COPS5 (or Jab-1) (32) and NFKB1 (33) are also well-known substrates of TXNRD1 or TXNRD1-dependent enzymes. GSTO1 and GSTO2 have thioredoxin-like domains with glutaredoxin-like activities (34) and are also likely to be TXNRD1-dependent. RNASET2 is an extracellular ribonuclease, dependent upon correct disulfide formation in the endoplasmic reticulum, which in turn is dependent upon TXNRD1 (35). There are also links between ETHE1 and the thioredoxin system, as this protein affects polysulfide status (36). Other identified proteins are of interest as potential TXNRD1 substrates; their validation was outside the scope of this work.
SIESTA identified many novel putative substrates for PARP10
We next selected a more challenging system involving poly-(ADP-ribose) polymerase-10 (PARP10), a member of the PARP family of proteins that performs mono-ADP ribosylation of proteins (37). ADP-ribosylation is involved in cell signaling, DNA repair, gene regulation and apoptosis (38). Identification of PARP family substrates by mass spectrometry has generally proven challenging, as ADP-ribosylation is a glycosidic modification that can be easily lost during protein extraction or sample processing. It is also highly labile in the gas phase, which hampers its detection by MS/MS. Different strategies have thus been used to enrich the modified peptides for mass spectrometric analysis and to apply “gentle” MS/MS methods (39, 40). Unlike poly-ADP-ribosyltransferases, the mono-ADP-ribosyltransferases such as PARP10 have not been studied to a great extent, and this enzyme is in need of further functional characterization.
In contrast to the TXNRD1 system, where protein substrates were mainly destabilized, we expected both destabilization and stabilization of substrate proteins with the PARP10 system in SIESTA. Overall, 5194 proteins were quantified, of which 4979 were quantified with at least two peptides (Table S4). As a verification of experiment quality, several proteins known to interact with NAD were identified by comparing the ΔTm shifts in the NAD vs vehicle treatments (Fig. S1A-B, Table S5).
In total, 28 proteins changed their stability only when both PARP10 and NAD were present (12 proteins were destabilized and 16 stabilized), i.e. were identified by SIESTA as potential PARP10 substrates (Fig. 3A, Table S6). Melting curves for some of these proteins are shown in Fig. 3B. An OPLS-DA model contrasting “PARP10+NAD” Tm vs. those from all other treatments is given in Fig. S1C.
In GO terms analysis, the identified potential protein substrates of PARP10 mapped to “ribonucleoprotein complex” (7 proteins, p < 0.02), “nuclear part” (15 proteins, p < 0.02) and “nucleus” (19 proteins, p < 0.05), which is in line with known PARP10 functions and its cellular location (41). The stabilized proteins did not map to any common pathway, while three of the 12 destabilized proteins mapped to “double-stranded RNA binding” (p < 0.02) and six to “poly(A) RNA binding” (p < 0.04). Several proteins, such as ILF2 and ILF3, have already been reported as PARP10 substrates (42), which serves as an additional confirmation of the method.
The majority of the identified PARP10 substrates were novel, and to rank them for validation priority, their melting curves were manually examined and OPLS-DA loadings were sorted. Based on the availability of high purity recombinant proteins, the destabilized PDRG1 and HDAC2 as well as the stabilized CASP6 and RFK putative substrates were chosen for verification of the presence of mono-ADP-ribosylation as a result of PARP10 catalysis. After incubation with recombinant PARP10 and NAD, the above proteins were digested and analyzed with LC-MS/MS. Every higher energy collision dissociation (HCD) MS/MS event triggered in data-dependent acquisition was investigated in real time for the presence of signature ions of adenine (m/z 136.0623), adenosine minus water (adenosine − H2O, m/z 250.094) and adenosine monophosphate (AMP, m/z 348.0709). Detection of these signature ions then triggered a second MS/MS event using electron-transfer dissociation (ETD) with a supplementary HCD activation. The obtained RFK sequence coverage was 94%, and ADP-ribose moieties were found in four positions: on Glu140, Glu131, Glu113 and Arg14, ordered from highest to lowest peptide score (Table S7). The ETD MS/MS spectrum of a peptide with Glu140 is shown in Fig. 3C. The PDRG1 sequence coverage was 74%, and the protein was found modified with ADP-ribose in three locations: on Glu110, Glu75 and Asp32 (Table S7, Fig. S1D).
Since the HDAC2 sequence coverage with trypsin digestion was low despite supplementary LysC digestion, we used an in vitro chemiluminescence assay to validate the mono-ADP-ribosylation of HDAC2. Two different PARP10 catalytic domain constructs were used for that analysis, and both of them indeed significantly modified HDAC2 (Fig. 3D). These results verified RFK, PDRG1 and HDAC2 as novel PARP10 substrates.
Caspase-6 showed the largest specific stabilization (10.4 °C, Fig. 3A-B), but its modification was not verified in either of the two in vitro assays. It should also be noted that PARP10 was suggested to be a substrate for caspase-6 during apoptosis (43). PARP10 has a major cleavage site at D406 that is preferentially recognized by caspase-6 (43). The large specific thermal stabilization might therefore indicate that PARP10 binding induces a conformational change in caspase-6 and thus an increase in its stability, as has been reported for other caspase-6 substrates (44). The reason why caspase-6 stabilization was not observed upon PARP10 addition in the absence of NAD may be that auto-modified PARP10 (in the presence of NAD) is required for effective caspase-6 binding.
In summary, we demonstrated SIESTA to be a general approach for unbiased identification of protein substrates for specific enzymes in a proteome-wide manner. Besides confirming several known specific substrates for both TXNRD1 and PARP10, we uncovered a number of interesting potential novel candidates for these proteins, implicating them in important cellular processes.
SIESTA will likely be able to identify most types of protein substrates for a wide range of enzymes. The spatial resolution of the method can be increased by sub-cellular fractionation of the lysate prior to analysis. Furthermore, it should be possible to discover cell- or tissue-specific substrates by comparing lysates from different sources. The ease of identifying enzyme-specific substrates offered by this technology can enhance our understanding of enzyme systems and disease, accelerate the construction of high-throughput assays and thus facilitate drug discovery.
Materials and methods
Experimental Design
A SIESTA layout with four parallel treatments was designed (Fig. 1), with vehicle, enzyme, co-factor or both being added to a cell lysate of choice and incubated. Subsequently, the thermal stability of the incubated mixture was monitored at the proteome level. The specific melting temperature shifts resulting from the combined action of enzyme and co-factor were identified by filtering out the shifts due to enzyme alone or co-factor alone. The top candidate substrate proteins were validated by targeted tandem mass spectrometry and in vitro chemiluminescence.
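The filtering logic of the four-treatment layout can be illustrated with a minimal sketch. It is not the authors' implementation: the function name, the input layout and both thresholds (`min_shift`, `max_background`) are hypothetical, and in the actual analysis shifts were assessed statistically rather than with fixed cut-offs.

```python
from typing import Dict

def specific_shifts(
    tm: Dict[str, Dict[str, float]],
    min_shift: float = 2.0,        # hypothetical cut-off for the combined treatment (°C)
    max_background: float = 1.0,   # hypothetical cut-off for single treatments (°C)
) -> Dict[str, float]:
    """Return proteins whose Tm shifts only when enzyme and co-factor are both present.

    `tm` maps protein -> treatment -> Tm in °C, with treatments
    "control", "enzyme", "cofactor" and "both".
    """
    hits = {}
    for protein, t in tm.items():
        d_both = t["both"] - t["control"]
        d_enzyme = t["enzyme"] - t["control"]
        d_cofactor = t["cofactor"] - t["control"]
        # Keep a protein only if the combined treatment shifts it
        # while neither single treatment does.
        if (abs(d_both) >= min_shift
                and abs(d_enzyme) <= max_background
                and abs(d_cofactor) <= max_background):
            hits[protein] = d_both
    return hits

# Made-up Tm values for three illustrative proteins.
example = {
    "substrate_like": {"control": 50.0, "enzyme": 50.2, "cofactor": 49.9, "both": 46.5},
    "cofactor_binder": {"control": 52.0, "enzyme": 52.1, "cofactor": 55.0, "both": 55.2},
    "unaffected": {"control": 48.0, "enzyme": 48.1, "cofactor": 47.9, "both": 48.2},
}
hits = specific_shifts(example)
print(hits)
```

In this toy input the co-factor binder is shifted by the co-factor alone and is therefore filtered out, which is the masking effect the contrasting treatments are meant to remove.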
Cell culture
Human colorectal carcinoma HCT116 (ATCC, USA) cells were grown at 37 °C in 5% CO2 using McCoy’s 5A modified medium (Sigma-Aldrich, USA) supplemented with 10% FBS superior (Biochrom, Berlin, Germany), 2 mM L-glutamine (Lonza, Walkersville, MD, USA) and 100 units/mL penicillin/streptomycin (Gibco, Invitrogen). Low-number passages (<10) were used for the experiments.
Recombinant proteins
Human TXNRD1 was expressed recombinantly in E. coli and purified as described earlier (45). PARP10 full length protein and catalytic domain construct were produced as detailed before (46). RFK (ab89009) and PDRG1 (PRO-007) were purchased from Abcam and Prospec, respectively, while Caspase-6 (ALX-201-060-U100) and HDAC2 (BML-SE533-0050) were obtained from Enzo.
SIESTA experiment
Cells were cultured in 175 cm2 flasks, trypsinized, washed twice with PBS, counted and resuspended in 50 mM HEPES pH 7.5, 2 mM EDTA (for TXNRD1) or in 50 mM HEPES pH 7.5, 100 mM NaCl and 4 mM MgCl2 (for PARP10), both with complete protease inhibitor cocktail (Roche). The cells were then freeze-thawed 5 times. The cell lysates were centrifuged at 21,000 g for 20 min and the soluble fraction was collected. The protein concentration in the lysate was measured using Pierce BCA assay (Thermo), and the lysate was equally distributed into 8 aliquots (1 mL each). For TXNRD1, each pair of samples was incubated with vehicle, 1 mM NADPH, 1 μM TXNRD1, or with TXNRD1+NADPH at 37 °C for 30 min. For PARP10, each pair of samples was incubated with vehicle, 100 μM NAD, 400 nM PARP10, or with PARP10+NAD at 37 °C for 1 h. Each replicate was then aliquoted into 10 PCR microtubes and incubated for 3 min in a SimpliAmp Thermal Cycler (Thermo) at temperature points of 37, 41, 44, 47, 50, 53, 56, 59, 63, and 67 °C. Samples were cooled for 3 min at room temperature and afterwards kept on ice. Samples were then transferred into polycarbonate thickwall tubes and centrifuged at 100,000 g and 4 °C for 20 min.
The soluble protein fraction was carefully transferred to new Eppendorf tubes. Protein concentration was measured in the samples treated at the lowest temperature points (37 and 41 °C) using Pierce BCA Protein Assay Kit (Thermo). The volume corresponding to 50 μg of protein at the lowest temperature points was then transferred from each sample to new tubes, and urea was added to a final concentration of 4 M. Dithiothreitol (DTT) was added to a final concentration of 10 mM and samples were incubated for 1 h at room temperature. Subsequently, iodoacetamide (IAA) was added to a final concentration of 50 mM and samples were incubated at room temperature for 1 h in the dark. The reaction was quenched by adding an additional 10 mM of DTT. Proteins were precipitated using methanol/chloroform. The dry protein pellet was dissolved in 8 M urea, 20 mM EPPS (pH 8.5) and diluted to 4 M urea. LysC was added at a 1:100 w/w ratio and samples were incubated at room temperature overnight. Samples were diluted with 20 mM EPPS to a final urea concentration of 1 M, and trypsin was added at a 1:100 w/w ratio, followed by incubation for 6 h at room temperature. Acetonitrile (ACN) was added to a final concentration of 20% and TMT reagents were added 4x by weight to each sample, followed by incubation for 2 h at room temperature. The reaction was quenched by addition of 0.5% hydroxylamine. Samples were combined, acidified by TFA, cleaned using Sep-Pak cartridges (Waters) and dried using a DNA 120 SpeedVac™ Concentrator (Thermo). Samples were then resuspended in 0.1% TFA and fractionated into 8 fractions using Pierce™ High pH Reversed-Phase Peptide Fractionation Kit (Thermo).
Samples dissolved in buffer A (0.1% formic acid and 2% ACN in water) were loaded onto a 50 cm EASY-Spray column (75 μm internal diameter, packed with PepMap C18, 2 μm beads, 100 Å pore size) connected to the EASY-nLC 1000 (Thermo) and eluted with a gradient from 5% to 38% of buffer B (98% ACN, 0.1% FA, 2% H2O) at a flow rate of 250 nL/min. The eluent was ionized by electrospray, with molecular ions entering an Orbitrap Fusion mass spectrometer (Thermo Fisher Scientific). The survey mass spectrum was acquired at a nominal resolution of 120,000, with the m/z range from 400 to 1600. Precursors were isolated in a 3 s cycle time using a 0.7 Da isolation width and fragmented via HCD with excitation set at 40%. MS2 spectra were acquired at 60,000 resolution, 105 ms maximum injection time and AGC target of 1e5. Dynamic exclusion was set to 60 s.
Data processing
The raw LC-MS data (SIESTA) were analyzed by MaxQuant, version 1.5.6.5 (47). The Andromeda search engine matched MS/MS data against the Uniprot complete proteome database (human, version UP000005640_9606, 92957 entries). TMT10-plex on the MS/MS level was used for quantification of protein abundances. Cysteine carbamidomethylation was used as a fixed modification, while methionine oxidation was selected as a variable modification. Trypsin/P was selected as enzyme specificity. No more than two missed cleavages were allowed. A 1% false discovery rate was used as a filter at both protein and peptide levels. For all other parameters, the default settings were used. After removing all the contaminants, only proteins with at least two peptides were included in the final dataset.
Network mapping
For GO term and pathway analyses, STRING version 10.5 (http://string-db.org) protein network analysis tool was used (48). Medium confidence threshold (0.4) was used to define protein-protein interactions. In-built gene set enrichment analysis with the whole genome as a background was used to identify enriched gene ontology terms and pathways.
Validation of mono-ADP-ribosylation by targeted tandem mass spectrometry
Recombinant RFK (5 μg) and PDRG1 (5 μg) were diluted with 50 mM HEPES (pH 7.5), 0.5 mM TCEP, 100 mM NaCl, 100 μM NAD, 4 mM MgCl2 and incubated with 400 nM of PARP10 for 1 h. Proteins were reduced with 10 mM DTT for 30 min and alkylated with 50 mM IAA for 30 min in the dark. Afterwards, 1 M urea was added to the samples and LysC (overnight) and trypsin (6 h) were added sequentially at 1:100 w/w to protein. After acidification, samples were cleaned using StageTips. Samples were dissolved in 0.1% FA and 1 μg of each sample was analyzed with LC-MS using a 1 h gradient.
The chromatographic separation of peptides was achieved using a 50 cm Easy C18 column connected to an Easy1000 LC system (Thermo Fisher Scientific). The peptides were loaded onto the column at a flow rate of 1000 nL/min, and then eluted at 300 nL/min for 50 min with a linear gradient from 4% to 26% ACN/0.1% formic acid. The eluted peptides were ionized with electrospray ionization and analyzed by an Orbitrap Fusion mass spectrometer. The survey mass spectrum was acquired at the resolution of 120,000 in the m/z range of 300-1750. The first MS/MS event data were obtained with HCD at 32% excitation for ions isolated in the quadrupole with an m/z width of 1.6 at a resolution of 30,000. Mass trigger filters targeting adenine, adenosine and AMP ions were used to initiate a second MS/MS event using ETD MS/MS with HCD supplementary activation at 30% collision energy and with a 30,000 resolution. Samples treated with NAD but no PARP10 were used as negative controls.
Spectra were converted to Mascot generic format (MGF) using in-house written RAWtoMGF v. 2.1.3. The MGF files were then searched against the UniProtKB human database (v. 201806), which included 71434 sequences. Mascot 2.5.1 (Matrix Science) was used for peptide sequence identification. Enzyme specificity was set to trypsin, allowing up to two missed cleavages. C, D, E, K, N, R and S residues were set as variable ADP-ribose acceptor sites. Carbamidomethylation was set as a fixed modification on C and oxidation as a variable modification on M.
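The mass-trigger logic described above can be sketched as a simple check of an HCD fragment list against the three signature m/z values reported in this work. This is an illustration only, not the instrument method: the function names, the 20 ppm tolerance and the requirement of at least one matching ion are assumptions, and the m/z 250.094 ion is interpreted here as adenosine after loss of water.

```python
# Signature ions of ADP-ribosylated peptides (m/z values from the text).
SIGNATURE_IONS = {
    "adenine": 136.0623,
    "adenosine - H2O": 250.094,  # assumed reading of the m/z 250.094 ion
    "AMP": 348.0709,
}

def matched_signature_ions(fragment_mz, tol_ppm=20.0):
    """Return the names of signature ions found in a list of fragment m/z values.

    tol_ppm is a hypothetical mass tolerance; it is not taken from the method.
    """
    matches = []
    for name, target in SIGNATURE_IONS.items():
        tol = target * tol_ppm * 1e-6  # absolute tolerance in m/z units
        if any(abs(mz - target) <= tol for mz in fragment_mz):
            matches.append(name)
    return matches

def should_trigger_etd(fragment_mz, tol_ppm=20.0, min_matches=1):
    """Decide whether a follow-up ETD scan would be requested.

    min_matches=1 is an assumption; the required number of ions is not stated.
    """
    return len(matched_signature_ions(fragment_mz, tol_ppm)) >= min_matches

# Made-up HCD fragment list containing two of the three signature ions.
spectrum = [129.1022, 136.0621, 348.0712, 512.2801]
print(matched_signature_ions(spectrum))
print(should_trigger_etd(spectrum))
```

A spectrum without any of the three ions, such as `[129.1022, 512.2801]`, would not request the second MS/MS event under these assumptions.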
In vitro mono-ADP-ribosylation assay
Hexahistidine-tagged PARP10 catalytic domain (auto-modification) or protein substrate (substrate protein modification) was immobilized on Ni2+-chelating microplates (5-PRIME). TEV-cleaved PARP10 catalytic domain was used for evaluation of substrate protein modification. Mono-ADP-ribosylation was assessed after incubation with 100 μM NAD+ (including 2% biotinylated NAD+, Trevigen) prior to chemiluminescence detection of biotinyl-ADP-ribose in a Clariostar microplate reader (BMG Labtech) as described in detail before (46).
Statistical Analysis
Curve normalization and fitting was done by an in-house R package, available on GitHub (https://github.com/RZlab/SIESTA). Briefly, after removing the contaminant proteins and those quantified with fewer than two peptides, protein abundances in temperature points 41-67 °C were normalized to the total proteome melting curve similar to Franken et al. (49). Individual protein abundances were scaled so that the lowest temperature intensity was set to 1. For each protein in each replicate, a sigmoid curve was fitted using the non-linear least squares method according to the formula:
where Pl is the high-temperature plateau of the melting curve, Tm is the melting temperature and b is the slope of the curve.
Difference in Tm between samples was assessed using a modified t-test. The t-statistic was calculated assuming a normal distribution of Tm estimation errors with non-equal variances between replicates. The p-values derived from the t-test were adjusted for multiple comparisons using the Benjamini–Hochberg method. The melting curves for all putative candidates were also manually inspected to exclude false positives.
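As an illustration of the three parameters defined above, the sketch below fits a melting curve on the ten temperature points of the protocol. The functional form f(T) = Pl + (1 − Pl) / (1 + exp(b·(T − Tm))) is an assumption chosen only to be consistent with those definitions (value near 1 at low temperature, plateau Pl at high temperature, midpoint at Tm) and may differ from the expression used in the R package. A coarse grid search stands in for the non-linear least squares routine, and all names and grid ranges are hypothetical.

```python
import math

TEMPS = [37, 41, 44, 47, 50, 53, 56, 59, 63, 67]  # °C, temperature points of the protocol

def melt(T, tm, b, pl):
    """Assumed sigmoid: close to 1 at low T, plateau pl at high T, midpoint at tm."""
    return pl + (1.0 - pl) / (1.0 + math.exp(b * (T - tm)))

def fit_melt(temps, y):
    """Least-squares fit by grid search over tm and b.

    For fixed tm and b the model is linear in pl, so pl has a
    closed-form least-squares solution (clamped to [0, 1]).
    """
    best = None
    for tm10 in range(400, 651):            # tm from 40.0 to 65.0 °C, step 0.1
        tm = tm10 / 10.0
        for b100 in range(10, 201, 5):      # b from 0.10 to 2.00, step 0.05
            b = b100 / 100.0
            s = [1.0 / (1.0 + math.exp(b * (T - tm))) for T in temps]
            den = sum((1.0 - si) ** 2 for si in s)
            pl = sum((yi - si) * (1.0 - si) for yi, si in zip(y, s)) / den
            pl = min(max(pl, 0.0), 1.0)
            sse = sum((yi - (si + pl * (1.0 - si))) ** 2 for yi, si in zip(y, s))
            if best is None or sse < best[0]:
                best = (sse, tm, b, pl)
    return {"tm": best[1], "b": best[2], "pl": best[3]}

# Noise-free synthetic curve with made-up parameters, then recovery by the fit.
true = {"tm": 52.3, "b": 0.45, "pl": 0.08}
y = [melt(T, **true) for T in TEMPS]
fit = fit_melt(TEMPS, y)
print(fit)
```

Under this assumed form the curve passes through (1 + Pl)/2 at T = Tm, so Tm is the midpoint between the scaled low-temperature intensity and the plateau. With real, noisy data a proper optimizer would be preferable to the grid used here.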
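The two statistical steps can be sketched as follows. This is a generic illustration, not the code of the R package: the Welch form of the t-statistic is assumed as one common way to handle unequal variances, the replicate Tm values and p-values are made up, and converting the statistic to a p-value requires a t-distribution function that is not included here.

```python
import math
from statistics import mean, variance

def welch_t(x, y):
    """Welch t-statistic and Welch-Satterthwaite degrees of freedom.

    Assumes each group has at least two values and a non-zero variance.
    """
    vx = variance(x) / len(x)
    vy = variance(y) / len(y)
    t = (mean(x) - mean(y)) / math.sqrt(vx + vy)
    df = (vx + vy) ** 2 / (vx ** 2 / (len(x) - 1) + vy ** 2 / (len(y) - 1))
    return t, df

def benjamini_hochberg(pvalues):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted

# Made-up duplicate Tm values (°C) for one protein in two treatments.
t_stat, df = welch_t([46.4, 46.8], [50.1, 49.9])
print(round(t_stat, 3), round(df, 3))

# Made-up raw p-values for four proteins.
adj = benjamini_hochberg([0.01, 0.04, 0.03, 0.005])
print(adj)
```

With duplicates the degrees of freedom are small, which is one reason manual inspection of the melting curves remains a useful safeguard.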
Funding
The research is funded by grants from the Knut and Alice Wallenberg Foundation (grant KAW 2015.0063) and VINNOVA (grant Oxidocurin) awarded to R.Z.; E.A. is supported by grants from Karolinska Institutet, Swedish Cancer Society and Swedish Research Council, and H.S. by Swedish Research Council (grant 2015-4603). K.N. was supported by a stipend from the Wenner-Gren foundation.
Author contributions
Conceptualization, R.A.Z. and A.A.S.; methodology and experiment design, A.A.S., R.A.Z., J.A.W., H.S., E.A., S.R. and T.K.; project organization, training, resources and funding acquisition, R.A.Z., H.S. and E.A.; SIESTA experiments, A.A.S. and P.S.; targeted mass spectrometry for ADP-ribose confirmation, A.A.S. and A.V.; data analysis and visualization, A.A.S., C.B. and A.C.; production and testing of enzymes, A.G.T. and Q.C.; in vitro mono-ADP-ribosylation assays, K.N.; writing of the original draft, A.A.S. and R.A.Z.; review and editing, A.A.S., R.A.Z., H.S., E.A. and J.A.W.
Competing interests
The authors declare no conflicts of interest.
Data and materials availability
Excel files containing the analyzed data are provided in Supplementary Materials. The mass spectrometry data were deposited to the ProteomeXchange Consortium (http://proteomecentral.proteomexchange.org) via the PRIDE partner repository (50) with the dataset identifier PXD010554.
List of Supplementary Materials
Materials and Methods
Tables S1 to S7
Fig. S1
Acknowledgements
We are grateful to Marie Ståhlberg and Carina Palmberg for their assistance in different proteomic experiments.