Abstract
Protein acetylation is a widespread post-translational modification that is implicated in many cellular processes. Emergent technologies such as mass spectrometry have catalogued thousands of sites throughout the cell; however, identifying regulatory acetylation marks has proven to be a daunting task. Here, we reveal the dynamic nature of site-specific acetylation in response to serum stimulation. An improved method of quantifying acetylation stoichiometry was developed and validated, providing a detailed landscape of dynamic acetylation stoichiometry within cellular compartments. The nuclear compartment displayed significantly higher median stoichiometry, in line with the existence of known acetyltransferases. Site-specific, temporal acetylation using a serum starved-refed cell culture model uncovered distinct clusters of acetylations with diverse kinetic profiles. Of particular note, dynamic acetylation sites on protein translational machinery suggest a major regulatory role.
Introduction
Reversible lysine acetylation has emerged as a widespread regulatory modification, rivaling phosphorylation in scope (1). Protein acetylation was first discovered on the N-terminal lysine residues of histone proteins localized to the nucleus (2, 3) and has since been identified throughout the cell, including the cytoplasm (4), mitochondria (5), endoplasmic reticulum (6), and peroxisomes (7). In the nucleus, histone acetylation is associated with active gene expression, acting in part to open chromatin and allow access to transcriptional machinery. Bromodomain-containing proteins recognize and bind acetyl-lysine residues for recruitment of larger multisubunit complexes and activation of gene transcription (8, 9). Acetylation of cytoplasmic proteins affects diverse cellular processes, which include cell migration, cytoskeleton dynamics, metabolism, and aging. Mitochondrial protein acetylation has been linked to metabolic regulation, oxidative stress, OXPHOS, and mitochondrial gene expression (10).
Reversible acetylation is catalyzed by the enzymatic activity of lysine acetyltransferases (KATs) and deacetylases (KDACs); however, mounting evidence suggests that a significant proportion of acetyl-lysine sites result from a nonenzymatic mechanism (11–13). We previously measured second-order rate constants for nonenzymatic acetylation and found that mimicking the chemical conditions of the mitochondrial matrix in vitro could explain a nonenzymatic mechanism, especially for those in vivo sites with low acetylation stoichiometry (14). With the number of in vivo acetylation sites reaching over 20,000 in human cells (15), a way to prioritize the functional and regulatory acetylation sites is needed, whether or not acetylation is enzyme catalyzed. Therefore, quantifying acetylation stoichiometry would provide critical information to understand regulatory acetylation networks.
Lysine acetylation has not been associated with signaling cascades, where the acetylation event of one acetyltransferase leads to acetylation of a second acetyltransferase to transmit a biological signal. Such cascades are used to amplify and propagate a signal down a signal transduction pathway. Rather, evidence suggests that acetylation can modulate protein-protein and protein-DNA interactions, cellular localization, enzyme activity and stability (10). In mitochondria, protein acetylation generally serves as an inhibitory mark for enzymatic function. In this regard, acetylation appears to function as a rheostat to modulate the degree of a biochemical process. However, it is not known what level of stoichiometry is needed for a biological effect.
Cells adjust their signaling networks in response to environmental changes. Phosphorylation cascades are well known examples of this response. Cells also respond to external stimuli with dynamic reversible acetylation. For example, upon glucose stimulation, the acetyltransferase, p300, acetylates cytosolic phosphoenolpyruvate carboxykinase (PCK1), which increases the conversion of phosphoenolpyruvate to oxaloacetate, the anaplerotic reaction of PCK1 (16). Activation of cytoplasmic signal transducer and activator of transcription 3 (STAT3) by CBP/p300 upon growth factor and insulin stimulation regulates mitochondrial metabolism (17).
How dynamic acetylation is regulated at the proteome level remains unknown. In this study, we improved upon our method to quantify acetylation stoichiometry at the proteome level, which was benchmarked using cellular proteomes with defined stoichiometry. We quantified steady state as well as dynamic acetylation stoichiometry in response to growth factor stimulation in MCF7 cells, which ranged from less than 1% to 99%. Acetylation stoichiometry distribution across cellular compartments was significantly different. The nuclear compartment, which contains the majority of documented lysine acetyltransferases, displays the highest acetylation stoichiometry, while mitochondrial and cytoplasmic proteins displayed the lowest stoichiometry. RNA binding proteins including ribosomal proteins were enriched in high acetylation stoichiometry. To understand how acetylation and proteome dynamics are concomitantly modulated in cells, MCF7 cells were synchronized by serum depletion/refeeding and monitored for changes in the proteome and site specific acetylation, revealing rapid and dynamic changes in acetylation and protein expression profiles. Quantifying acetylation stoichiometry dynamics will be a critical tool for prioritizing the ever-increasing number of detected lysine acetylation sites and towards a deeper understanding of this regulatory modification.
Results
DIA acetylation stoichiometry method optimization
We previously reported a method to determine lysine acetylation stoichiometry across an entire proteome (18). This method employs an isotopic chemical acetylation approach to label all unmodified lysine residues within a sample and, upon proteolytic digestion coupled to LC-MS/MS, has been utilized to quantify proteome-wide acetylation stoichiometry in various biological conditions (18–21) as well as in in vitro, nonenzymatic acetylation kinetics (14). Here, we utilize an improved method to quantify acetylation stoichiometry using peptide prefractionation coupled with data-independent acquisition (DIA) mass spectrometry to quantify global acetylation stoichiometry (Figure 1A). Briefly, a sample is chemically acetylated using isotopic acetic anhydride and digested with trypsin and GluC. The sequential digestion of the acetyl-proteome generates shorter peptides for MS analysis. Peptides are then prefractionated offline using high pH reverse phase (HPRP) chromatography, analyzed using nano-LC-MS/MS in DIA mode and analyzed using Spectronaut™ v11 (22–25). A project specific spectral library is generated from a 12C-acetic anhydride labeled sample and acquired in data-dependent acquisition (DDA) mode. In order for the spectral library to include both the light and heavy acetyl-lysine fragment ions, a novel, standalone software was developed to be used with Spectronaut™ which generates the heavy labeled fragment ion in silico from the 12C-AcK spectral library. This process assures that for a given peptide all heavy and light combinations are contained within the spectral library. Combining offline prefractionation and DIA analysis addresses unique limitations of the original study as previously discussed (21). HPRP prefractionation reduces interferences caused by coeluting peptides and has the added benefit of increasing the depth of the acetylome coverage. 
DIA is used to measure multiple light and heavy fragment ions of a precursor, which are used for stoichiometry calculation, thus allowing for multiple measurements of a unique lysine site even when multiple lysines are present on a peptide (Figure 1B, C).
To assess the accuracy and precision of our improved workflow, we generated a proteome-wide stoichiometry curve encompassing 1 – 99% acetylation stoichiometry. A whole cell lysate was labeled with light (12C-) and heavy (D6-) acetic anhydride, mixed at varying ratios, digested with trypsin, and subjected to our DIA workflow. Because the light and heavy acetyl peptides differ by only 3 Da, high acetylation stoichiometry leads to an underestimation of stoichiometry, as the M+3 natural abundance isotopic peak of the light peptide adds to the heavy signal. To account for this, we incorporated a natural abundance correction to all heavy acetyl lysine fragment ions. This correction was performed by subtracting the M+3 isotopic peak of the light acetyl lysine fragment ion from the M+0 isotopic peak of the heavy acetyl lysine fragment ion (Figure 2A). This global correction improved the precision of the stoichiometry quantification, especially at the higher stoichiometry values (Figure 2B). An alternative method to assess the precision of the quantification is to measure the ratio of the light and heavy fragment ions. Quantification of the light/heavy ratios corresponding to stoichiometry profiles between 20 and 80% displayed the highest precision (Figure 2C). This is because the light and heavy fragment ions have abundances near a 1:1 ratio. In contrast, stoichiometries at the extreme ends of the curve (< 5% and > 95%) displayed the lowest precision, since quantitation in these conditions requires the measurement of fragment ions that differ in abundance by more than 20-fold (Figure 2C). Principal component analysis (PCA) also detects the increase in stoichiometry across the experimental conditions (Figure 2D). The first principal component (PC1) represents the variability associated with the input stoichiometry and, as expected, increases in a linear fashion. The stoichiometry curve analysis quantified stoichiometry for ~1400 acetyl lysine sites.
The number of acetyl sites quantified in at least nine conditions was 616. Linear regression analysis using acetyl lysine sites quantified in at least nine conditions shows high reproducibility of our method (Figure 2E) with a median R² of 0.94, after correction for multiple regression analysis (Figure 2E inset). This global analysis with well-defined input stoichiometries highlights the quantitative nature of this method and shows that it is applicable to querying the acetylation stoichiometry of an entire proteome.
Our improved stoichiometry workflow enables the quantitation of different lysines from a single peptide, removing any ambiguity of site quantification. As an example, the histone H3 peptide (containing K18 & K23), KAcQLATKAcAAR, has fragment ions that are unique to each lysine site. K18 is quantified by the fragment ions b2-b3, while y4-y8 are specific for K23 (Figure 1C). Obtaining high quality and high coverage of b- and y-ions will provide quantification of multiple lysines on the same peptide. Therefore, we optimized the normalized collision energy (NCE) for higher energy collisional dissociation (HCD) fragmentation (26) and evaluated the number of peptide spectral matches (PSMs) (Figure 3A) as well as the global b- and y-ion coverage (Figure 3B) across a wide range of NCEs (15-50 in 5 unit increments) using a chemically acetylated, trypsin and GluC digested proteome. To determine the global b- and y-fragment ion coverage, we counted each fragment ion identified for a given PSM and normalized to the peptide length. As y-ions increase with higher NCE, the proportion of b-ions begins to decline at a similar rate (Figure 3B). An NCE of 25 was used to balance the frequency of b-ions and y-ions as well as the number of PSMs (Figure 3A). A negligible number of PSMs with a C-terminal lysine are seen (blue bar), demonstrating our ability to chemically acetylate the entire proteome to near completion.
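The site-specific assignment of fragment ions described above can be illustrated with a minimal sketch. The function name and the rule used here (a b- or y-ion is treated as site-specific when it contains exactly one lysine) are assumptions made for illustration and are not taken from our analysis scripts. The sketch lists the theoretically site-specific ions; the ions actually used for quantification depend on which fragments are observed.

```python
def site_specific_ions(peptide):
    """Map each lysine position (1-based) to the b/y ions that contain
    that lysine and no other lysine (hypothetical helper)."""
    n = len(peptide)
    lysines = [i for i, aa in enumerate(peptide, start=1) if aa == "K"]
    result = {pos: [] for pos in lysines}
    for k in range(1, n):  # fragment lengths 1 .. n-1
        b_cover = [p for p in lysines if p <= k]      # b_k spans residues 1..k
        y_cover = [p for p in lysines if p > n - k]   # y_k spans residues n-k+1..n
        if len(b_cover) == 1:
            result[b_cover[0]].append(f"b{k}")
        if len(y_cover) == 1:
            result[y_cover[0]].append(f"y{k}")
    return result


# Histone H3 peptide containing K18 (position 1) and K23 (position 6)
ions = site_specific_ions("KQLATKAAR")
for position, fragment_ions in ions.items():
    print(position, fragment_ions)
```

For this peptide, the y4-y8 series is assigned to the second lysine (K23), in agreement with the example in the text, while the b-ions shorter than six residues report on K18.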
Subcellular distribution of acetylation stoichiometry
Studies measuring (or estimating) acetylation stoichiometry in mammalian systems are limited and most have focused on the mitochondrial or nuclear compartments of the cell (20, 27). Thus, there has not been a clear picture of the acetylation stoichiometry distribution across the cell. To address this, we utilized our quantitative stoichiometry approach using MCF7 cells, a breast cancer cell line and quantified a wide range of stoichiometry (< 1% up to 97%) with high correlation between acetyl lysine fragment ions (red) and peptides (blue) between the three biological replicates (Figure 4A). To identify biological processes that are enriched in acetylation, we applied a biological pathway analysis tool that we recently developed termed quantitative site set functional score analysis (QSSA) to our acetylation stoichiometry dataset (28). This tool was developed for PTM pathway enrichment analysis taking into account the number of modified sites in a given pathway as well as the relative change across conditions (fold-change) and was adapted for acetylation stoichiometry datasets. The stoichiometry from MCF7 cells was divided into quartiles (for stoichiometry ranges, see Materials and Methods). Each quartile was used as input for the QSSA. Gene Ontology processes that were enriched in this study include Metabolic Pathways, Ribosome, Spliceosome, and Protein Processing in Endoplasmic Reticulum (Figure 4B). Enrichment of metabolic pathways is a hallmark of previous acetylation studies (3, 28, 29) and processes associated with RNA protein complexes have been previously identified in a wide range of organisms including E. coli, B. subtilis, P. falciparum, and S. cerevisiae (18, 30–32), confirming our QSSA results.
Mass spectrometry is inherently biased towards quantifying highly abundant peptides and proteins. Therefore, we wanted to quantify the levels of acetylation where high as well as low abundance proteins were measured, specifically within subcellular compartments. To do this, we performed differential centrifugation and acid extraction on MCF7 cells to enrich for histone proteins, the non-histone nuclear fraction, mitochondria, and the cytosolic fraction. Each fraction was treated with a combination of proteases in order to digest the proteins of each subcellular compartment to individual amino acids. The relative abundance of acetyl-lysine and unmodified lysine can then be measured using mass spectrometry (33) (Figure 4C). This approach reduces sample complexity and provides a global and unbiased view of the entire pool of modified and unmodified lysine residues. Acetylation is significantly enriched on histone proteins and the nuclear fraction compared to the cytoplasmic and mitochondrial fractions (Figure 4D). We then asked whether our acetylation stoichiometry data agreed with the targeted lysine amino acid quantification results. The numbers of acetyl-lysine-containing peptides measured were disproportionate across the subcellular fractions analyzed. We therefore grouped each protein accession into a cytoplasmic or nuclear fraction. Using this grouping and a non-parametric analysis, the nuclear fraction contains more acetylation sites with a higher stoichiometry compared to the cytoplasmic fraction (p = 0.00027) (Figure 4E).
Acetylation and proteome dynamics
Protein acetylation can be modulated by the enzymatic activity of acetyltransferases, deacetylases and, non-enzymatically, by the levels of acetyl-CoA (10, 29, 36). Additionally, changes in acetylation stoichiometry can be observed in conjunction with changes in protein abundance. Therefore, understanding dynamics of not only protein acetylation, but also proteome dynamics will be important to understand the interplay between these two processes. The majority of studies quantifying acetylation dynamics utilize an antibody based workflow to enrich for the acetyl-peptides (37, 38). Using an enrichment strategy, it is necessary to account for changes in protein abundance in order to accurately report changes in acetylation by analyzing a sample of the proteome which was not subjected to the immuno-enrichment procedure. Our protocol does not utilize an enrichment step. Instead, we chemically modify all free lysine residues using acetic anhydride, a step which is analogous to the alkylation of cysteine residues with iodoacetamide. Therefore, precursor abundance data collected from our acetylation stoichiometry workflow can also be used to estimate protein abundance using label free quantification techniques.
To understand cellular protein and acetylation dynamics, a serum starve-replete cell culture model system was used which has the benefits of synchronizing the cells upon serum starvation followed by robust changes in signaling pathways that occur upon serum replenishment (17, 39, 40). MCF7 cells were grown to 70-80% confluency and serum starved for 24 hours. Cells were then replenished with fresh media containing serum and harvested at 0, 1, 2, 4 hours (Figure 5A). To verify activation of major signaling pathways, we monitored the level of phosphorylation of ribosomal protein S6 (Figure 5B, C), as a proxy for mTOR signaling activation (39).
Acetylation stoichiometry was quantified using our DIA-MS approach followed by a pattern recognition analysis using fuzzy c-means clustering (41). This analysis identified four distinct clusters where acetylation is dynamic with varied profiles (Figure 5D, E). Over two thirds of the acetylation sites identified in this clustering analysis were found in clusters 1 and 3 which display rapid changes upon growth factor stimulation. These clusters correspond to acetylation levels that rapidly increased and returned to pretreatment baseline levels (Cluster 1) as well as trends where acetylation rapidly decreased and remained at a consistent level (Cluster 3), respectively. Gene ontology (GO) enrichment analysis was performed on the acetylation stoichiometry clusters demonstrating an enrichment of biological processes associated with cell growth such as transcription and translation (Figure 5F). Protein abundance was determined by label free quantification using MSstats with our chemically acetylated proteome sample (42) followed by the clustering analysis using fuzzy c-means (Figure 5G, H) (41).
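To illustrate how fuzzy c-means assigns a site to more than one cluster, the following minimal sketch computes only the membership step of the algorithm for a single time-course profile, given fixed cluster centers. It is not the clustering software used in this study (41). The fuzzifier value m = 2, the example centers, and the example profile are all hypothetical.

```python
import math


def fuzzy_memberships(profile, centers, m=2.0):
    """Membership of one profile in each cluster (standard fuzzy c-means
    membership update); m is the fuzzifier and must be > 1."""
    dists = [math.dist(profile, c) for c in centers]
    zeros = [d == 0.0 for d in dists]
    if any(zeros):
        # Profile coincides with one or more centers: share membership equally
        return [z / sum(zeros) for z in zeros]
    power = 2.0 / (m - 1.0)
    return [1.0 / sum((d_i / d_j) ** power for d_j in dists) for d_i in dists]


# Hypothetical cluster centers over the 0, 1, 2, 4 hour time points
centers = [
    [0.0, 1.0, 0.0, 0.0],  # rapid increase, return to baseline
    [1.0, 0.0, 0.0, 0.0],  # rapid decrease, remains low
    [0.0, 0.3, 0.6, 1.0],  # steady increase
]
profile = [0.0, 1.0, 0.2, 0.0]  # hypothetical scaled stoichiometry profile
memberships = fuzzy_memberships(profile, centers)
print([round(u, 3) for u in memberships])
```

The memberships sum to one, so a site with an intermediate profile contributes partially to several clusters rather than being forced into a single one.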
To investigate the co-trends of protein and acetylation dynamics, we compared the trends of acetylation with protein clusters (Figure 5I). Observed trends in protein acetylation can be the result of several biochemical processes such as: acetylation, deacetylation, changes in protein abundance or any combination of these processes. For example, an increase in protein abundance without lysine acetylation (or deacetylation) would result in a trend showing a decrease in acetylation stoichiometry. In fact, we do observe this relationship when comparing acetylation cluster 3 and protein clusters 2 and 4. Lastly, acetylation stoichiometry cluster 4 represents acetylation sites which increase over time, which follows the same trend as the phosphorylation of ribosomal protein S6 (Figure 5C).
Discussion
Integrated stoichiometry workflow
We previously developed a method to quantify acetylation stoichiometry in an entire proteome, where a protein extract is chemically modified with an isotopic acetic anhydride, digested, and analyzed by high resolution mass spectrometry (18). In this study, we integrated our workflow for quantifying acetylation stoichiometry with Spectronaut™ for the targeted analysis of the light and heavy acetyl fragment ions using a novel spectral library. A spectral library which contains all light and heavy acetyl-lysine feature pairs was generated. The samples for the project specific spectral library were generated by chemically labeling a sample with 12C-acetic anhydride followed by proteolytic digestion and analysis by DDA-MS. The heavy acetyl fragment ions were generated in silico using a 3 Dalton shift. This step ensures that all light and heavy acetyl lysine pairs are used for the targeted analysis of acetylation stoichiometry.
Acetylation stoichiometry distribution in the cell
Using our integrated stoichiometry workflow, we quantified the distribution of acetylation stoichiometry across MCF7 cells, observing a significantly higher proportion of proteins with higher acetylation stoichiometry within the nuclear compartment. Binning the stoichiometry into equal sized quartiles and performing a PTM pathway enrichment analysis (28), we found that metabolic pathways were enriched for all levels of acetylation. Additionally, KEGG pathways associated with RNA binding such as the Ribosome, Spliceosome and RNA Degradation display a high QSSA enrichment score. It was recently shown that the lysine acetyltransferases, NAT8/NAT8B, are localized in the lumen of the endoplasmic reticulum and function to acetylate properly folded proteins as proteins traverse through the secretory pathway (43). NAT8/NAT8B use acetylation to signal correctly folded proteins. In this context, acetylation stoichiometry would be expected to be higher due to the localization of the acetyltransferases. Indeed, our QSSA results identify higher acetylation stoichiometry in this cellular compartment. Additionally, amino acid quantification from digested subcellular proteomes agreed with the site-specific acetylation distribution. Interestingly, well-characterized acetyltransferases reside in the nuclear compartment, highlighting the strong regulatory role of epigenetic and gene expression systems.
Integrated analysis of protein and acetylation dynamics
Understanding how acetylation is regulated in the cell is an active area of investigation, and quantifying the dynamics is a critical component of understanding this regulatory modification. To understand dynamic acetylation and signaling events, we quantified acetylation and protein dynamics upon serum starvation followed by growth factor stimulation. Acetylation and protein groups were clustered using fuzzy c-means, which allows a protein (or lysine site) to belong to multiple groups, reflecting the multiple functions of proteins within a cell. We identified several patterns in acetylation dynamics upon serum replenishment. Acetylation cluster 1 contains acetyl sites that start at baseline stoichiometry, increase rapidly upon serum repletion, and then return to baseline. Acetylation cluster 3 contains acetyl sites that begin elevated, decrease rapidly, and remain low. Finally, acetylation cluster 4 contains acetyl sites that begin at baseline and increase steadily. We quantified an acetylation site of ribosomal protein S6 (K64Ac) within cluster 4. Interestingly, the phosphorylation profile of ribosomal protein S6 also appears very similar to the profile of acetylation cluster 4 (Figure 5C). Whether there is any cross-talk between acetylation and phosphorylation of ribosomal protein S6 is unclear. The method described in this study provides a valuable way to decipher regulatory acetylation events. Determining proteome-wide acetylation kinetics will help distinguish functional from spurious acetylation events and prioritize those regulatory lysine sites.
Methods
Cell Culture conditions
MCF7 cells were grown using DMEM supplemented with 10% FBS. For global acetylation stoichiometry and single amino acid analysis, MCF7 cells were harvested at ~80% confluency. Four hours prior to harvesting, cells were washed with PBS and the medium was replaced with fresh medium. MCF7 cells were cultured for a total of 48 hours.
Sample preparation
Protein chemical acetylation and digestion
An equal amount of protein (200 μg) was resuspended into 25-30 μL of urea buffer (8 M urea (deionized), 500 mM ammonium bicarbonate pH = 8.0, 5 mM DTT). Incubation steps throughout the sample preparation were carried out using the Eppendorf ThermoMixer® C. Each sample was incubated at 60 °C for 20 minutes while shaking at 1500 RPM. Cysteine alkylation was carried out by incubating with 50 mM iodoacetamide for 20 minutes. Chemical acetylation of unmodified lysine residues was performed as previously described (14, 18, 33). Briefly, ~20 μmol of the “light” 12C-acetic anhydride (Sigma) or “heavy” D6-acetic anhydride (Cambridge Isotope Laboratories) was added to each sample and incubated at 60 °C for 20 minutes at 1500 RPM. The pH of each sample was raised to ~8 using ammonium hydroxide and visually checked with litmus paper. Two rounds of chemical acetylation were performed for each sample to ensure near complete lysine acetylation. To hydrolyze any O-acetyl esters formed during the chemical acetylation, the pH of the sample was raised to ~8.5 and each sample was incubated at 60 °C for 20 minutes at 1500 RPM. For protein digestion, the urea concentration of each sample was diluted to ~2 M by adding 100 mM ammonium bicarbonate pH = 8.0, followed by addition of trypsin (Promega) at a final ratio of 1:100. The sample was digested at 37 °C for 4 hours while shaking at 500 RPM. If a second digestion using gluC (Promega) was performed, the urea concentration was further diluted to ~1 M using 100 mM ammonium bicarbonate pH = 8.0 and the sample was digested with gluC (1:100) at 37 °C overnight while shaking at 500 RPM. Each sample was acidified by addition of 15 μL of acetic acid.
Digesting protein sample to single amino acids
For complete digestion of proteins, which converts all unmodified lysine residues to free lysine and all N-ε-acetylated lysine residues to acetyl-lysine, 20 μg of sample was diluted into 50 μL of digestion buffer (50 mM ammonium bicarbonate, pH 7.5, 5 mM DTT, in LC–MS grade water). A sample with 50 μL digestion buffer without protein was also included as a procedural blank. The samples were digested to single amino acids by sequential treatment with three enzymes. First, samples were treated with 0.4 μg Pronase and incubated for 24 hr at 37°C. The Pronase activity was then stopped by heating to 95°C for 5 min. After cooling down to ambient temperature, samples were treated with 0.8 μg aminopeptidase and incubated at 37°C for 18 hr. Aminopeptidase activity was again stopped by heating samples to 95°C for 5 min and cooling down. Finally, samples were digested with 0.4 μg prolidase and incubated at 37°C for 3 hr. To extract the resulting single amino acids, 200 μL LC–MS grade acetonitrile (ACN) was added to each sample. The mixture was vortexed for 5 sec and spun at maximal speed for 5 min, and the supernatant was saved for analysis by LCMS.
Offline High pH Reverse Phase (HPRP) Prefractionation
Chemically acetylated peptides were resuspended into ~2mL of HPRP buffer A (100 mM Ammonium Formate pH = 10) and injected onto a preequilibrated Phenomenex Gemini® NX-C18 column (5μm, 110 Å, 150 × 2.0mm) with 2% buffer B (10% Buffer A, 90% acetonitrile). Peptides were separated with a Shimadzu LC-20AT HPLC system using a 2% - 40% buffer B linear gradient over 30 minutes at 0.6 mL/min flow rate, collecting 24 fractions throughout the length of the gradient. Fractions were dried down using a speedvac and pooled by concatenation into 6 final fractions as described previously (44).
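The pooling of 24 fractions into 6 final fractions can be sketched as follows. This is a minimal illustration that assumes an interleaved concatenation scheme, in which every sixth fraction is combined; the exact pooling order follows the cited protocol (44) and may differ. The function name is hypothetical.

```python
def concatenate_fractions(n_fractions=24, n_pools=6):
    """Assign collected fractions to pools by interleaving (assumed scheme):
    fraction f goes to pool ((f - 1) mod n_pools) + 1."""
    pools = {p: [] for p in range(1, n_pools + 1)}
    for f in range(1, n_fractions + 1):
        pools[(f - 1) % n_pools + 1].append(f)
    return pools


pools = concatenate_fractions()
for pool, fractions in pools.items():
    print(f"final fraction {pool}: collected fractions {fractions}")
```

Under this assumption, each final fraction combines early, middle, and late eluting peptides, which spreads peptides evenly across the subsequent low pH LC-MS runs.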
Mass spectrometry
Liquid chromatography
Peptides were separated with a Dionex Ultimate 3000 RSLCnano HPLC using a Waters Atlantis dC18 (100 μm × 150 mm, 3μm) C18 column. The mobile phase consisted of 0.1% formic acid (A) and acetonitrile with 0.1% formic acid (B). Peptides were eluted with a linear gradient of 2 – 35% B at a flow rate of 800 nL/min over 90 minutes. Peptides were injected by nanoelectrospray ionization (Nanospray Flex™) into the Thermo Fisher Q Exactive™ Hybrid Quadrupole-Orbitrap™ Mass spectrometer.
Data-dependent acquisition mass spectrometry
For data-dependent acquisition (DDA), the MS survey scan was performed in positive ion mode with a resolution of 70,000, AGC of 3e6, maximum fill time of 100 ms, and scan range of 400 to 1200 m/z in profile mode. Data-dependent MS/MS was performed in profile mode with a resolution of 35,000, AGC of 1e6, maximum fill time of 200 ms, isolation window of 2.0 m/z, normalized collision energy of 25, dynamic exclusion of 30 seconds, and a loop count of 20.
Data-independent acquisition mass spectrometry
For data-independent acquisition (DIA), the MS survey scan was performed in profile mode with a resolution of 70,000, AGC of 1e6, and maximum fill time of 100 ms in the scan range between 400 and 1000 m/z. The survey scan was followed by 30 DIA scans in profile mode with a resolution of 35,000, AGC of 1e6, 20 m/z window, and NCE of 25 or 30. For both DDA and DIA methods, the source voltage was set at 2000 V and capillary temperature at 250 °C.
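The DIA isolation scheme implied by these settings can be written out as a short sketch. It assumes that the 30 windows are contiguous and non-overlapping and that they tile the 400 to 1000 m/z survey range; window overlap or placement offsets, if any were used, are not captured here. The function name is hypothetical.

```python
def dia_windows(start=400.0, end=1000.0, width=20.0):
    """Return (lower, upper) m/z bounds of contiguous DIA isolation windows
    (assumed layout: no overlap, fixed width)."""
    n_windows = round((end - start) / width)
    return [(start + i * width, start + (i + 1) * width) for i in range(n_windows)]


windows = dia_windows()
print(len(windows), windows[0], windows[-1])
```

With a 20 m/z width, 30 windows are exactly what is needed to cover the 600 m/z survey range.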
LCMS analysis of single amino acids
The abundances of free lysine, acetyl-lysine, and other amino acids from completely digested protein samples were analyzed using a Thermo Fisher Q Exactive™ Hybrid Quadrupole-Orbitrap™ Mass spectrometer coupled to a Dionex UltiMate 3000 UHPLC system. Samples are separated using a 5 μm polymer 150 × 2.1 mm SeQuant® ZIC®-pHILIC column, with the following gradient of solvent A (ACN) and solvent B (10 mM ammonium acetate in water, pH 5.5) at a flow rate of 0.3 mL/min: 0-2 min, 10% solvent B; 2-14 min, linearly increase solvent B to 90%; 14-17 min, isocratic 90% solvent B; 17-20 min, equilibration with 10% solvent B. Samples are introduced to the mass spectrometer by heated electrospray ionization. Settings for the ion source are: 10 aux gas flow rate, 35 sheath gas flow rate, 1 sweep gas flow rate, 3.5 kV spray voltage, 320°C capillary temperature, and 300°C heater temperature. Analysis is performed under positive ionization mode, with scan range of 88–500 m/z, resolution of 70 K, maximum injection time of 40 ms, and AGC of 1E6.
To quantify absolute levels of lysine and acetyl-lysine, an external calibration curve was run in the same sequence with the experimental samples. The lysine standards ranged from 10 to 200 μM, and the acetyl-lysine standards ranged from 0.5 to 10 μM. The signal from the procedural blank was subtracted from the signal of each sample.
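As an illustration of external calibration with blank subtraction, the sketch below fits a straight line to standard signals by ordinary least squares and converts a blank-subtracted sample signal to a concentration. The linear model, the function names, and all numerical values (standard signals, sample signal, blank signal) are hypothetical and are shown only to make the calculation explicit.

```python
def fit_line(conc, signal):
    """Ordinary least squares fit of signal = slope * conc + intercept."""
    n = len(conc)
    mean_x = sum(conc) / n
    mean_y = sum(signal) / n
    sxx = sum((x - mean_x) ** 2 for x in conc)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(conc, signal))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def concentration(sample_signal, blank_signal, slope, intercept):
    """Subtract the procedural blank, then invert the calibration line."""
    return ((sample_signal - blank_signal) - intercept) / slope


# Hypothetical lysine standards spanning the 10 to 200 uM range
lys_std_conc = [10.0, 50.0, 100.0, 200.0]
lys_std_signal = [2.0e4 * c + 1.0e3 for c in lys_std_conc]  # made-up response
slope, intercept = fit_line(lys_std_conc, lys_std_signal)

# Hypothetical sample and procedural blank peak areas
lys_conc = concentration(1.2015e6, 5.0e2, slope, intercept)
print(round(lys_conc, 2), "uM")
```

The same two functions would be applied separately to the acetyl-lysine standards, since the two analytes have different calibration ranges.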
Data Processing
Generating 12C-AcK and D3-AcK Spectral Library
The spectral library consists of a catalogue of high quality MS/MS fragmentation spectra resulting from data-dependent acquisition (DDA) MS runs. For our workflow, we performed DDA runs on an MCF7 lysate which was chemically acetylated with 12C-acetic anhydride, digested with trypsin and gluC, followed by HPRP prefractionation (see above). Prior to MS analysis, iRT peptides (Biognosys) were spiked into each sample following the manufacturer’s guidelines. The database search was performed using MaxQuant version 1.5.4.1 with lysine acetylation and methionine oxidation as variable modifications and cysteine carbamidomethylation as a fixed modification. The MaxQuant search results were imported into Spectronaut to build the 12C-AcK library. The 12C-AcK spectral library was then exported as a spreadsheet from Spectronaut and imported into a custom spectral library modifier, which completes the spectral library for all combinations of light and heavy acetylated peptides. With this in silico approach to inflate the 12C-AcK library, every acetylated peptide precursor is represented by 2^n versions differing in the number and position of heavy/light acetylated lysines, where n is the number of acetylation sites in the peptide. The spectral library was completed with the corresponding precursor m/z values and fragment m/z values. The most intense fragment ions selected from the initial MS2 spectrum were cloned to the other peptide precursor versions. All peptide precursor versions have identical retention times, and hence the iRT was also cloned.
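The in silico inflation of the library can be sketched as an enumeration of all light/heavy label combinations for a peptide with n acetylation sites. This is a minimal illustration and not the custom spectral library modifier itself. The function name and the example precursor m/z are hypothetical, and the per-site mass shift is assumed to be the exact mass difference between three deuterium and three hydrogen atoms (nominally 3 Da).

```python
from itertools import product

# Assumed per-site shift for a D3-acetyl group versus a light acetyl group:
# three deuterium atoms (2.014102 Da) replace three hydrogen atoms (1.007825 Da)
D3_SHIFT = 3 * (2.014102 - 1.007825)


def light_heavy_versions(n_sites, light_mz, charge):
    """Enumerate all 2^n light (L) / heavy (H) label patterns for a peptide
    and the precursor m/z of each pattern."""
    versions = []
    for labels in product("LH", repeat=n_sites):
        n_heavy = labels.count("H")
        mz = light_mz + n_heavy * D3_SHIFT / charge
        versions.append(("".join(labels), mz))
    return versions


# Hypothetical doubly charged precursor with two acetylation sites
versions = light_heavy_versions(n_sites=2, light_mz=500.0, charge=2)
for pattern, mz in versions:
    print(pattern, round(mz, 4))
```

Fragment m/z values would be shifted in the same way, using the number of heavy acetyl groups contained in each fragment, while retention time is copied unchanged across all versions.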
DIA MS data analysis
Data from DIA-MS was analyzed using Spectronaut 10. Thermo raw files were converted to HTRMS files with the Spectronaut Raw to HTRMS converter using the default settings and input into Spectronaut. The Spectronaut default settings for quantitation were used with slight modification: Identification-Qvalue score was set to 0.1 and Workflow-Unify peptide peaks was selected. The latter setting causes Spectronaut to use the same integration boundaries for all light/heavy versions of one acetylated peptide within one LC-MS run. Specifically, for a given acetylated peptide precursor, Spectronaut selects the best signal (by q-value) among the 2^n versions in the spectral library (see above) and then transfers the integration boundaries of the best scoring peptide precursor to the other peptide precursors. Because the 2^n peptide precursor versions differ only in the number of heavy versus light acetylated lysines, the retention time is expected to be identical. The spectral libraries which were completed as described above for all the light/heavy peptide precursor versions were used with this workflow. A Spectronaut output file containing all the fragment ion peak areas along with the corresponding peptide and protein identification was exported and used to compute the lysine site stoichiometry. A list containing all the data categories used for downstream stoichiometry analysis is found in supplemental information.
Stoichiometry data processing
Data processing was performed in R v3.5.0 (http://www.r-project.org/) using an in-house R script, which is available in the supplementary information. The stoichiometry preprocessing pipeline consists of two major steps: quantification of fragment ion stoichiometry and correction for natural-abundance isotopes.
Quantifying site-specific stoichiometry
DIA MS measures the abundances of multiple peptide fragment ions, so our approach allows quantitation of multiple lysines within a peptide. The acetylation stoichiometry of each unique lysine site is quantified by matching light and heavy fragment ion pairs and using the equation:

$$\text{Stoichiometry} = \frac{XIC_L}{XIC_L + XIC_H} \qquad (1)$$

where $XIC_L$ is the peak area of the light fragment ion and $XIC_H$ is the peak area of the heavy fragment ion.
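As a minimal sketch of this step, the Python function below treats stoichiometry as the light fraction of the total light plus heavy fragment ion signal. That form is assumed from the definitions of the light and heavy peak areas. The function name and peak areas are hypothetical, and the in-house script is written in R.

```python
def site_stoichiometry(xic_light, xic_heavy):
    """Fraction of a fragment ion pair's signal carried by the light (endogenous) acetyl form."""
    total = xic_light + xic_heavy
    if total <= 0:
        return None  # no usable signal for this fragment ion pair
    return xic_light / total

# Hypothetical light/heavy peak areas for two fragment ions covering the same lysine.
pairs = [(2.0e5, 1.8e6), (1.0e5, 9.0e5)]
values = [site_stoichiometry(light, heavy) for light, heavy in pairs]
print(values)
```

Several fragment ions that report on the same lysine give independent estimates, which can then be compared or summarized per site.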
Isotopic purity correction
The mass shift between the light and heavy AcK peptides is 3 Da, so the M+0 peak of the heavy AcK peptide overlaps with the M+3 peak of the light AcK peptide. We therefore correct for the overlap of the isotopic distributions of the peptide pairs. This is done using an in-house R script together with the R package BRAIN v1.16.0 (Baffling Recursive Algorithm for Isotopic DistributioN calculations), available from Bioconductor, the open-source software project (http://www.Bioconductor.org/) (45). To correct for the natural abundance of the 13C isotope, the M+0 and M+I abundances, where I represents the isotopic mass shift (+1 or +3), were used to calculate the correction coefficient:

$$\text{Correction coefficient} = \frac{M{+}I}{M{+}0}$$

The correction coefficient is used to calculate the correction value:

$$\text{Correction value} = \text{Correction coefficient} \times XIC_L$$

where $XIC_L$ is the peak area of the light fragment ion. Finally, the corrected heavy peak area ($CorrXIC_H$) is calculated:

$$CorrXIC_H = XIC_H - \text{Correction value}$$

where $XIC_H$ is the peak area of the heavy fragment ion. The corrected stoichiometry is quantified using equation 1, substituting $CorrXIC_H$ for $XIC_H$.
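The correction can be sketched as follows, under several stated assumptions. The correction coefficient is taken as the ratio of the M+I to the M+0 isotopic abundance of the light fragment, the correction value as that coefficient times the light peak area, and the corrected heavy area as the heavy area minus the correction value. Clipping the corrected area at zero is an added safeguard and is not described in the text. In the actual workflow the isotopic abundances come from BRAIN, whereas here they are hypothetical inputs.

```python
def corrected_stoichiometry(xic_light, xic_heavy, m0, mi):
    """Stoichiometry after removing the light isotopic envelope's contribution to the heavy peak.

    m0, mi: theoretical relative abundances of the M+0 and M+I isotopic peaks
    of the light fragment ion (hypothetical inputs in this sketch).
    """
    coefficient = mi / m0                      # assumed form of the correction coefficient
    correction_value = coefficient * xic_light
    corr_heavy = max(xic_heavy - correction_value, 0.0)  # clipping is an added assumption
    total = xic_light + corr_heavy
    return xic_light / total if total > 0 else None

# Hypothetical fragment ion pair and isotopic abundances.
uncorrected = 1.0e6 / (1.0e6 + 1.05e6)
corrected = corrected_stoichiometry(xic_light=1.0e6, xic_heavy=1.05e6, m0=0.60, mi=0.03)
print(round(uncorrected, 4), round(corrected, 4))
```

Because the overlap inflates the heavy signal, the correction raises the stoichiometry estimate. The effect grows with the light peak area and matters most for highly acetylated sites.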
MSstats
Protein abundance summarization was performed using MSstats v3.12.0 with the Spectronaut output as input. The function “SpectronauttoMSstatsFormat” was used with the following arguments: intensity set to “PeakArea”, filter_with_Qvalue set to TRUE, qvalue_cutoff set to 0.01, useUniquePeptide set to TRUE, fewMeasurements set to “remove”, removeProtein_with1Feature set to FALSE, and summaryforMultipleRows set to “max”. The dataProcess function was then run using the default arguments.
NCE Optimization
To quantify site-specific acetylation stoichiometry from peptides containing multiple lysines, the fragmentation spectra of precursor ions must have high b- and y-ion coverage. To this end, we compared and optimized the number of peptide spectral matches (PSMs) as well as the b- and y-ion coverage of MCF7 peptides (chemically acetylated with 12C-acetic anhydride followed by trypsin and gluC digestion) on a Q-Exactive MS across NCE settings of 15, 20, 25, 30, 35, 40, 45, and 50. For all NCE conditions, precursors between 400 and 1200 m/z were selected for fragmentation. MS1 scans were acquired at a resolution of 70,000 with a 3e6 AGC target and 100 ms max IT in profile mode. MS2 scans were acquired at a resolution of 35,000 with a 1e6 AGC target and 200 ms max IT in profile mode with 15 s dynamic exclusion. The database search was performed using MaxQuant version 1.5.4.1, followed by data analysis in R.
Stoichiometry curve
We determined the accuracy and precision of our stoichiometry method by generating an 11-point stoichiometry curve using a complex sample. For this, we used a lysate of HEK293 cells that were grown under standard culture conditions and harvested by centrifugation. The packed cell volume was resuspended in urea buffer (6-8 M urea, 100 mM ammonium bicarbonate, pH 8.0) and lysed by sonication. Protein concentration was measured using Bradford reagent (Bio-Rad).
To quantify stoichiometry ranging from 1 to 99%, we varied the amount of starting material chemically acetylated with 12C-acetic anhydride or D6-acetic anhydride, using a total of 200 μg of protein for each stoichiometry point. For example, to generate a sample that is 10% acetylated, we labeled 20 μg of HEK293 lysate with 12C-acetic anhydride and 180 μg of HEK293 lysate with D6-acetic anhydride. The starting protein amounts were varied to generate stoichiometries of 1, 5, 10, 20, 40, 50, 60, 80, 90, 95, and 99% acetylation. After chemical acetylation, the two labeled samples for each stoichiometry point were pooled, digested with trypsin, and subjected to offline HPRP prefractionation as outlined above.
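The mixing scheme follows directly from the 200 μg total per point and can be tabulated with a short Python sketch. The function and variable names are hypothetical.

```python
TOTAL_UG = 200.0  # total protein per stoichiometry point, as stated above
POINTS = [1, 5, 10, 20, 40, 50, 60, 80, 90, 95, 99]  # target % acetylation

def mixing_amounts(percent, total_ug=TOTAL_UG):
    """Return (light-labeled ug, heavy-labeled ug) for a target stoichiometry in percent."""
    light = total_ug * percent / 100.0
    return light, total_ug - light

table = {p: mixing_amounts(p) for p in POINTS}
for percent, (light, heavy) in table.items():
    print(f"{percent:>3}%  light {light:6.1f} ug  heavy {heavy:6.1f} ug")
```

The 10% point reproduces the worked example in the text: 20 μg labeled with 12C-acetic anhydride and 180 μg labeled with D6-acetic anhydride.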
Bioinformatics
Functional annotation
Functional annotation of enriched gene ontology (GO) terms was assessed using DAVID v6.8 (46, 47). For enrichment analysis, the background was set to all proteins identified in the DDA spectral library, totaling 2400 unique protein IDs. The list of high-confidence acetylation stoichiometries, consisting of lysine sites with at least 2 fragment ion quantitation values within ± 5% stoichiometry, was ordered from lowest to highest stoichiometry and split into 5 quantile groups of near-equal size: (A < 0.0095, n = 97), (B 0.0096 - 0.0246, n = 97), (C 0.0247 - 0.0464, n = 96), (D 0.0465 - 0.2741, n = 96), (E 0.2747 - 0.9704, n = 96). Each group was analyzed using DAVID for the following GO terms: GOTERM_BP_DIRECT, GOTERM_CC_DIRECT, and GOTERM_MF_DIRECT. A Benjamini-Hochberg-corrected p-value < 0.05 was used, corresponding to < 1% FDR for all GO terms. To generate the GO term scatter plot, we used the −log10-transformed p-value, fold enrichment, and coverage from the DAVID output. The scatter plot was generated in R with the ggplot2 v2.1.0 package.
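The split into five groups can be sketched in Python as below. The 482 sites are the sum of the group sizes reported above. The stoichiometry values are random placeholders and not the measured data, and the splitting rule (earlier groups absorb the remainder) is an assumption that happens to reproduce the reported group sizes.

```python
import random

def split_into_bins(sorted_values, n_bins=5):
    """Split an ordered list into n_bins contiguous groups of near-equal size.

    When the length is not divisible by n_bins, the first groups get one extra item.
    """
    base, extra = divmod(len(sorted_values), n_bins)
    bins, start = [], 0
    for i in range(n_bins):
        size = base + (1 if i < extra else 0)
        bins.append(sorted_values[start:start + size])
        start += size
    return bins

random.seed(0)
sites = sorted(random.random() for _ in range(482))  # placeholder stoichiometries
bins = split_into_bins(sites)
print([len(b) for b in bins])
```

Each group would then be submitted separately for enrichment analysis against the same background list.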
Subcellular localization assignment
To assign protein subcellular localization, we used the MitoCarta (34, 35) and UniProt (http://www.uniprot.org/) databases. For “Mitochondrial” assignment of proteins, we used the MitoCarta database. Additionally, we used the “Subcellular location” or “GO - Cellular component” annotations from UniProt to assign proteins to the “Mitochondrial”, “Nuclear”, and “Cytoplasmic” pools. Proteins from other subcellular locations, such as the endoplasmic reticulum, Golgi apparatus, and cell membrane, were assigned to the “Nuclear” fraction because these compartments are likely to sediment in the “Nuclear” spin, which occurs at 1,000 × g, during differential centrifugation.
Quantitative Site set functional Score Analysis (QSSA)
The intersection of the KEGG pathway map (48) and the proteins in the spectral library detected with < 1% FDR was used as the gene set background. Acetylation coverage for each pathway (p) was calculated as the number of acetyl sites identified (n_ack) divided by the total number of lysines in the pathway (n_k), counted using protein sequences from UniProt. The extent of acetylation was taken into account by summing the acetylation stoichiometry (s) across all conditions and all sites in each pathway. To allow acetylation coverage and stoichiometry to be combined, the standard score (z-score) of each quantity was taken across pathways. The overall pathway score was then calculated as the sum of the two z-scores.
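The scoring scheme described above can be sketched in Python as follows. The pathway names and numbers are hypothetical, and the use of the population standard deviation for the z-score is an assumption, since the text does not specify which estimator was used.

```python
from statistics import mean, pstdev

def z_scores(values):
    """Standard scores across pathways (population standard deviation assumed)."""
    mu, sd = mean(values), pstdev(values)
    return [(v - mu) / sd if sd > 0 else 0.0 for v in values]

def qssa_scores(pathways):
    """Sum of the coverage z-score and the summed-stoichiometry z-score per pathway."""
    names = list(pathways)
    coverage = [pathways[p]["n_ack"] / pathways[p]["n_k"] for p in names]
    summed = [sum(pathways[p]["stoichiometries"]) for p in names]
    z_cov, z_sum = z_scores(coverage), z_scores(summed)
    return {p: z_cov[i] + z_sum[i] for i, p in enumerate(names)}

# Hypothetical pathways: acetyl sites found, lysines in the pathway, site stoichiometries.
pathways = {
    "pathway_A": {"n_ack": 10, "n_k": 100, "stoichiometries": [0.5, 0.5]},
    "pathway_B": {"n_ack": 20, "n_k": 100, "stoichiometries": [1.0, 1.0]},
    "pathway_C": {"n_ack": 30, "n_k": 100, "stoichiometries": [1.5, 1.5]},
}
scores = qssa_scores(pathways)
print(scores)
```

Standardizing both quantities before summing puts coverage (a ratio) and summed stoichiometry (an unbounded sum) on a common scale, so neither dominates the pathway score.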
Author contributions
J.B., J.F., and J.M.D. designed the research study. J.B. and A.L. performed acetylation stoichiometry mass spectrometry experiments and analyzed the data. J.F. performed single amino acid mass spectrometry experiments and analyzed the data. I.L., T.G., O.M.B., and L.R. developed software to analyze DIA MS data and to customize the spectral library. J.B., A.L., and M.J.S. performed bioinformatic analysis. J.B. and A.L. designed the figures and drafted the manuscript. J.B., A.L., J.F., M.J.S., L.R., and J.M.D. edited the manuscript. J.M.D. is the corresponding author.
Additional Information
The authors I.L., T.G., O.M.B. and L.R. are employees of Biognosys AG (Zurich, Switzerland). Spectronaut is a trademark of Biognosys AG.
Acknowledgements
We thank Greg Barrett-Wilt and Greg Sabat at the University of Wisconsin-Madison Biotechnology Center for use of the Mascot database server. This work was supported, in whole or in part, by National Institutes of Health (NIH) Grant GM065386 (J.M.D.), NIH National Research Service Award T32 GM007215 (J.B. and A.L.), and the National Science Foundation Graduate Research Fellowship Program (NSF-GRFP) DGE-1256259 (J.B.).