Abstract
Acyl-CoAs are essential for life, serving as fundamental cellular building blocks for lipid biosynthesis, energy metabolism, and reversible protein acetylation. Each of these functions is physically dependent on acyl-CoA/protein interactions. However, despite the known ability of these interactions to influence biology and disease, their global scope and selectivity remain unknown. Here we describe the development and application of CATNIP (CoA/AcetylTraNsferase Interaction Profiling) to globally analyze acyl-CoA/protein interactions in endogenous human proteomes. First, we demonstrate the ability of CATNIP to identify acetyl-CoA-binding proteins through unbiased clustering of competitive dose-response data. Next, we apply this method to profile diverse protein-CoA metabolite interactions, enabling the identification of biological processes susceptible to altered acetyl-CoA levels. Finally, we utilize systems-level analyses to assess the features of novel protein networks that may interact with acyl-CoAs and demonstrate a strategy for high-confidence annotation of direct acetyl-CoA binding proteins and AT enzymes in human proteomes. Overall, our studies illustrate the power of integrating chemoproteomics and systems biology analysis methods and provide a novel resource for understanding the diverse signaling roles of acyl-CoAs in biology and disease.
Introduction
Acyl-CoAs are essential for life. These metabolites serve as fundamental cellular building blocks in the biosynthesis of lipids, intermediates in energy production via the TCA cycle, and essential precursors for reversible protein acetylation. Each of these functions is physically dependent on acyl-CoA/protein interactions, which can regulate protein activity via a variety of mechanisms (Fig. 1). For example, the interaction of acyl-CoAs with lysine acetyltransferase (KAT) active sites allows them to serve as enzyme cofactors or, alternatively, competitive inhibitors.1-2 Binding of acyl-CoAs to the allosteric site of pantothenate kinase (PanK) enzymes can exert positive or negative effects on CoA biosynthesis.3 Acyl-CoAs can also non-enzymatically modify proteins, a covalent interaction that often causes enzyme inhibition.4-5 These examples illustrate the ability of acyl-CoA signaling to influence biology and disease. However, the global scope and selectivity of these metabolite-governed regulatory networks remain unknown.
A central challenge of studying acyl-CoA/protein interactions is their pharmacological nature.6 These transient binding events are invisible to traditional next-generation sequencing and proteomic methods. To address this, our group recently reported a competitive chemical proteomic (“chemoproteomic”) approach to detect and analyze acyl-CoA/protein binding.7-8 This method applies a resin-immobilized CoA analogue (Lys-CoA) as an affinity matrix to capture CoA-utilizing enzymes directly from biological samples. Pre-incubating proteomes with acyl-CoA metabolites competes capture and allows their relative binding affinities to enzymes of interest to be assessed. In our initial application of this platform we studied the susceptibility of KATs to metabolic feedback inhibition by CoA, evaluating competition by quantitative immunoblot.8 The signal amplification afforded by immunodetection enables the capture of extremely low abundance KATs to be readily quantified; however, owing to its targeted nature, it is best suited to the study of specific CoA-utilizing enzymes, rather than broad profiling or discovery applications. We reasoned that such applications could be enabled by integrating CoA-based affinity reagents with i) multidimensional chromatographic separation, to efficiently sample rare KAT enzymes, ii) quantitative LC-MS/MS proteomics, for unbiased identification of CoA-interacting proteins, and iii) systems analysis of the acyl-CoA-binding proteins identified, for data-driven analysis of putative interaction networks. We term this approach CATNIP (CoA/AcetylTraNsferase Interaction Profiling). Here we describe the development and application of CATNIP to globally analyze acyl-CoA/protein interactions in endogenous human proteomes. First, we demonstrate the ability of CATNIP to identify acetyl-CoA-binding proteins through unbiased clustering of competitive dose-response data.
Next, we apply this method to profile diverse protein-CoA metabolite interactions, enabling the identification of biological processes susceptible to altered acetyl-CoA levels. Finally, we utilize systems-level analyses to assess the features of novel protein networks that may interact with acyl-CoAs and demonstrate a strategy for high-confidence annotation of direct acetyl-CoA binding proteins and AT enzymes in human proteomes. Overall our studies illustrate the power of integrating chemoproteomics and systems biology analysis methods and provide a novel resource for understanding the diverse signaling roles of acyl-CoAs in biology and disease.
Results
Validation of CATNIP for the global study of acyl-CoA/protein interactions
In order to deeply sample acyl-CoA/protein interactions on a proteome-wide scale, we initially set out to integrate CoA-based protein capture methods with LC-MS/MS (Fig. 2a). In this workflow, whole cell extracts are first incubated with Lys-CoA Sepharose. This affinity matrix enables active site-dependent enrichment of many different classes of CoA-binding proteins,8 making it ideal for broad profiling studies. Next, enriched proteins are subjected to tryptic digest and analyzed using MudPIT (multidimensional protein identification technology), a proteomics platform that combines strong cation exchange and C18 reverse phase chromatography to pre-fractionate tryptic peptides, followed by ionization and data-dependent MS/MS.9 The separation afforded by this approach significantly decreases sample complexity, allowing the identification of rare, low abundance peptides from complex proteomic mixtures. To facilitate the identification of acyl-CoA/protein interactions, competition experiments are performed in which proteomes are pre-incubated with a CoA metabolite prior to capture.10 Decreased enrichment in competition samples compared to controls (as assessed by quantitative spectral counting) signifies that the CoA metabolite interacts with a protein of interest. These interacting proteins can then be further classified into pharmacological or biological networks using either conventional metrics (fold-change, gene ontology, etc) or systems-based analysis tools.
As an initial model, we explored the utility of CATNIP to globally profile acetyl-CoA/protein interactions in unfractionated HeLa cell proteomes. Proteomes were pre-incubated with acetyl-CoA or vehicle (buffer) control, followed by enrichment using Lys-CoA Sepharose. These experiments assessed competition at 3, 30, and 300 μM acetyl-CoA, which spans the physiological concentration range of acetyl-CoA in the cytosol and mitochondria. Protein capture in each condition was quantified using distributed normalized spectral abundance factor (dNSAF), a label-free metric that normalizes spectral counts relative to overall protein length (Fig. S1a-c, Table S1).11 Each condition was analyzed in triplicate, constituting 12 experiments, >144 hours of instrument time, and over 1.1 million non-redundant peptide spectra collected. We limited our analysis to high-confidence protein identifications (>4 spectral counts in vehicle [0 μM] sample). The capture of Uniprot annotated CoA-binding proteins or members of AT complexes (termed ‘AT interactors’) did not correlate with overall protein abundance or gene expression,12 consistent with the ability of chemoproteomic methods to sample functional activity metrics (e.g. unique pharmacology, active-site folding/conformation, integration into complexes, posttranslational modification) rather than raw quantity (Fig. S1d-i).13
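The length normalization underlying this metric can be illustrated with a minimal sketch. The code below computes the plain normalized spectral abundance factor (NSAF), in which each protein's spectral count is divided by its length and then by the sum of that ratio over all proteins in the run. It is an assumption-laden simplification: the distributed variant (dNSAF) used in this study additionally apportions spectra shared between proteins, which is not implemented here, and the protein names, counts, and lengths are hypothetical.

```python
def nsaf(spectral_counts, lengths):
    """Plain NSAF: (SpC / L) for each protein, divided by the sum of SpC / L.

    Simplified stand-in for dNSAF; shared-peptide redistribution is omitted.
    """
    saf = {p: spectral_counts[p] / lengths[p] for p in spectral_counts}
    total = sum(saf.values())
    if total == 0:
        raise ValueError("no spectra recorded in this run")
    return {p: value / total for p, value in saf.items()}


def high_confidence(vehicle_counts, min_counts=4):
    """Keep proteins with more than `min_counts` spectra in the vehicle sample."""
    return {p for p, c in vehicle_counts.items() if c > min_counts}


# Hypothetical spectral counts and protein lengths (amino acids).
counts = {"PROT_A": 40, "PROT_B": 10, "PROT_C": 10}
lengths = {"PROT_A": 2000, "PROT_B": 250, "PROT_C": 500}

values = nsaf(counts, lengths)
kept = high_confidence({"PROT_A": 40, "PROT_B": 10, "PROT_C": 3})
```

In this toy example the shortest protein receives the largest NSAF despite having fewer spectra than the longest one, which is the intended effect of length normalization.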
To analyze acetyl-CoA binding in a systematic manner throughout the proteome, we first grouped proteins into subsets based on their dose-dependent competition profiles. Chemoproteomic capture data from 0, 3, 30, and 300 μM acetyl-CoA competition was transformed, plotted in two dimensions, and subjected to k-means clustering. Eight protein clusters were identified, each of which exhibited a distinct dose-dependent competition signature (Fig. 2b-c, Fig. S2a-b). The capture of proteins within clusters 1-3 was antagonized by acetyl-CoA in a dose-dependent fashion. Proteins in cluster 1 displayed hypersensitivity to acetyl-CoA competition, while proteins in clusters 2 and 3 exhibited moderate and partial competition, respectively. The remaining clusters exhibited more complicated capture profiles, consisting of either dose-dependent and dose-independent antagonism (cluster 5), or mixed agonist/antagonist behavior (clusters 4, 6-8, Fig. 2c, Fig. S2b). To determine which of these competition signatures were most characteristic of acetyl-CoA binding, we first analyzed each cluster for the presence of known CoA-binding proteins and AT interactors. Cluster 1, composed of proteins whose capture is hypercompetitive to pre-incubation with acetyl-CoA, contains only 7% of the total proteins identified in this experiment. However, 25% of proteins in this cluster are annotated CoA-binding proteins and AT interactors, a disproportionate enrichment (Fig. 2d, Fig. S2c). Clusters 2 and 3 were also relatively enriched in annotated CoA binders and AT interactors, while all other subsets were not (Fig. 2d). Examining our entire dataset, we found the total number of CoA-binding proteins and AT interactors competed 2-fold by acetyl-CoA almost doubled going from 3 to 30 μM, but was only modestly increased by higher concentrations of competitor (Fig. 2e). Proteins in clusters 1 and 2 exhibit almost complete loss of capture at 30 μM acetyl-CoA (Fig. 2c).
This suggests that most acetyl-CoA-binding sites accessible to our method are saturated at the intermediate concentration used here (∼30 μM), in line with literature measurements of binding affinity and Michaelis constants.14 Clusters 1 and 2 include proteins that bind to acetyl-CoA directly (CREBBP, NAA10), allosterically (PANK1), and indirectly via protein-protein interactions (NAA25, JADE1; Table S2). This indicates that proteins with disparate modes of acetyl-CoA interaction can display similar dose-dependent competition signatures. Gene ontology analysis of annotated CoA binders in clusters 1 and 2, whose enrichment was hypercompetitive to acetyl-CoA pre-incubation, revealed an enrichment in terms related to histone and N-terminal acetyltransferases as well as CoA biosynthetic enzymes (Fig. 2f, Fig. S2d-e). The strong enrichment of KATs likely results from the propensity of our bisubstrate Lys-CoA capture agent to interact with this enzyme family.15 A similar analysis of proteins in cluster 3, which exhibits partial competition by acetyl-CoA, identified a disproportionate number of mitochondrial CoA-binding enzymes (Fig. S2f). This decreased sensitivity to acyl-CoA competition may reflect evolutionary adaptation to the unique subcellular concentrations of metabolites in mitochondria, where acetyl-CoA is found at millimolar concentrations.16 Overall, these studies validate the ability of CATNIP to detect bona fide acetyl-CoA/protein interaction signatures, and establish key parameters necessary for its useful application in studying the pharmacology of CoA metabolites.
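The grouping of proteins by dose-dependent competition profile can be sketched as follows. This is an illustrative outline rather than the pipeline used in this study: the transformation applied before clustering is not specified in the text, so scaling each profile to its vehicle value is an assumption, the two-dimensional projection step is omitted, and the profiles and protein labels are hypothetical. A plain Lloyd's k-means is written out with the standard library so the example is self-contained.

```python
import random


def normalize(profile):
    """Scale capture at 0, 3, 30, 300 uM to the vehicle (0 uM) value (assumed transform)."""
    vehicle = profile[0]
    return [x / vehicle for x in profile]


def sq_dist(a, b):
    """Squared Euclidean distance between two profiles."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, k, n_iter=50, seed=0):
    """Plain Lloyd's algorithm; returns one cluster label per point."""
    rng = random.Random(seed)
    centers = rng.sample(points, k)
    labels = [0] * len(points)
    for _ in range(n_iter):
        # Assignment step: nearest center for every point.
        labels = [min(range(k), key=lambda c: sq_dist(p, centers[c])) for p in points]
        # Update step: each center becomes the mean of its members.
        new_centers = []
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                new_centers.append([sum(col) / len(members) for col in zip(*members)])
            else:
                new_centers.append(centers[c])  # keep an empty cluster's center
        if new_centers == centers:
            break
        centers = new_centers
    return labels


# Hypothetical capture values at 0, 3, 30, and 300 uM competitor.
profiles = {
    "PROT_1": [10.0, 2.0, 0.5, 0.2],    # strongly competed
    "PROT_2": [20.0, 5.0, 1.0, 0.5],    # strongly competed
    "PROT_3": [8.0, 8.5, 7.5, 8.0],     # essentially unaffected
    "PROT_4": [15.0, 14.0, 15.5, 14.5],  # essentially unaffected
}
names = list(profiles)
labels = kmeans([normalize(profiles[n]) for n in names], k=2)
clusters = dict(zip(names, labels))
```

With real data the number of clusters (eight in the analysis above) and the initialization would need to be chosen and checked for stability; the toy data here separate cleanly into competed and non-competed groups.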
Applying CATNIP to define the unique pharmacological signatures of acetyltransferase enzymes
In addition to acetyl-CoA (1), cells produce a physiochemically diverse range of CoA metabolites whose concentrations directly reflect the metabolic state of the cell. Many of these species make regulatory interactions with proteins, including the long chain fatty acyl (LCFA) palmitoyl-CoA, a classic feedback inhibitor of acetyl-CoA carboxylase,17 short chain fatty acyl (SCFA) butyryl-CoA, which can potently inhibit KATs or be used as a substrate,2, 18 and negatively charged succinyl-CoA, which can covalently inhibit many mitochondrial enzymes.4-5 However, despite their physiological relevance, few studies have interrogated the comparative pharmacology of acyl-CoA/enzyme interactions. We hypothesized that the ability of CATNIP to report on the binding affinity of ligands relative to Lys-CoA could address this gap and enable the generation of pharmacological fingerprints of acyl-CoA-protein interactions across the proteome. To explore this hypothesis, we performed competitive chemoproteomic capture experiments in the presence of additional metabolites including: i) CoA (2), a feedback inhibitor of acetyltransferases, ii) butyryl-CoA (3), a short chain fatty acyl-CoA, iii) crotonyl-CoA (4), a SCFA-CoA containing a latent acrylamide electrophile, iv) acetic-CoA (5), a stable analogue of malonyl-CoA which has recently been shown to be a hyperreactive metabolite capable of covalent protein modification, and v) palmitoyl-CoA (6), a LCFA-CoA which we have previously shown can potently inhibit KATs in vitro (Fig. 3a, Table S3). For these experiments, CoA metabolites were equilibrated with proteomes (1 h) prior to Lys-CoA capture. A dosage of 30 μM was selected to enable a comparison of each ligand’s competition profile to that of acetyl-CoA, which showed substantial interaction with proteins in clusters 1-3 at this concentration. 
For palmitoyl-CoA a lower concentration was used (3 μM) in order to ensure solubility and reflect the limited free (non-protein/membrane bound) quantities of LCFA-CoA likely to be present in cells.
As an initial rough measure of acyl-CoA selectivity, we performed a global analysis of proteins displaying robust interaction (>2-fold decreased capture) with competitors. Evaluating 1757 proteins quantified in Lys-CoA enrichments, we found that 1566 (89%) were >2-fold competed by at least one CoA/acyl-CoA metabolite (1-6, Fig. 3b, Table S3). In general the majority of proteins found to interact with acetyl-CoA (1) also displayed competition by 2-6, suggestive of ligand-binding promiscuity amongst CoA-binding proteins. Examining physiochemically distinct ligands 3-6, a handful of selective interactions were observed for each acyl-CoA (Fig. 3c). Notably, butyryl-CoA (3) showed substantial overlap with protein interactors of 4-6. This may be suggestive of its metabolic stability in lysates, or ability to make high affinity interactions with many classes of CoA-binding proteins at the concentration applied. To compare the magnitude of protein-ligand interactions, we plotted the competition (log2 fold change, competitor v. control) of each individual ligand relative to acetyl-CoA (Fig. 3d). Most proteins interacted more strongly with acetyl-CoA (1) than other ligands, with the exception of butyryl-CoA (3). This is consistent with the fact that acetyl-CoA (1) and butyryl-CoA (3) exhibited the greatest number of unique interaction partners in our comparative analysis. This promiscuous binding also extended to known CoA-binding proteins, including KATs (Fig. S3). Notable exceptions were HADHB, which was found to interact only with butyryl-CoA, as well as ECHS1, which was found to interact only with crotonyl-CoA (Table S3). 
HADHB encodes the thiolase subunit of the mitochondrial trifunctional protein, which is involved in the oxidation of fatty acids 8 carbons or longer.19 The ability of this enzyme to specifically interact with butyryl-CoA but not acetyl-CoA could represent a mechanism allowing cells to sense blockade of the terminal steps of SCFA-CoA catabolism, triggering feedback inhibition of fatty acid oxidation in a manner that product inhibition does not. ECHS1 encodes a short-chain enoyl-CoA hydratase (crotonase), and was the only protein found to be competed 2-fold by crotonyl-CoA, but not the structurally related acetyl- and butyryl-CoA. Other enzymes with crotonase folds did not display this selective inhibition profile (Table S4). Selective crotonyl-CoA dependent capture is consistent with the substrate specificity of this enzyme, which shows rapid turnover of crotonyl-CoA relative to longer chain enoyl-CoA thioesters.20 To facilitate a more granular analysis, we grouped CoA-binding proteins by biological function or fold and compared their quantitative metabolite-binding signatures upon interaction with 1-6. Histone, lysine, and GNAT acetyltransferases displayed a diversity of ligand binding signatures (Fig. 3e). For example, the capture of enzymes such as CREBBP and NAT10 was strongly competed by multiple metabolites, while others (KAT8 and HAT1) displayed an apparent preference for selective interaction with acetyl-CoA (Fig. 3e). Selectivity did not correlate with enzyme function/fold, capture abundance, or acetyl-CoA interaction cluster (Fig. 3e, Table S4), suggesting this metabolite interaction fingerprint represents a unique and intrinsic feature of individual enzymes.
The promiscuous ligand binding of CREBBP is notable, as this KAT and its homologue EP300 have been found to utilize several acyl-CoAs as alternative cofactors.21-22 The PANK family of proteins catalyzes the phosphorylation of pantothenate (vitamin B5) to phosphopantothenate, which constitutes a key step in CoA biosynthesis. Previous biochemical studies have found PANK1 to be allosterically inhibited by acetyl-CoA but not CoA, while PANK2 is strongly inhibited by both ligands.23-24 We found acetyl-CoA interacted more strongly with each enzyme, but did not observe substantial disparity between CoA binding to the two enzyme isoforms. This may reflect differential binding of metabolites to these enzymes in the complex proteomic milieu compared to biochemical assays or, alternatively, a limitation of our method, which uses a single concentration of ligand that may saturate both selective and non-selective interactions. Overall, these studies validate the ability of chemoproteomics to study acyl-CoA/protein interactions and provide an initial snapshot of the proteome-wide binding selectivity of acyl-CoA metabolites.
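The competition metric used throughout this section (log2 fold change of competitor versus control, with >2-fold decreased capture counted as a robust interaction) can be written out as a short sketch. The pseudocount, the ligand keys, and all capture values below are hypothetical choices made for illustration and are not taken from the dataset.

```python
import math


def log2_fold_change(competitor, control, pseudocount=1e-6):
    """log2(competitor / control); the small pseudocount guards against zeros."""
    return math.log2((competitor + pseudocount) / (control + pseudocount))


def competed(competitor, control, fold=2.0):
    """True when capture decreases by more than `fold` relative to control."""
    return log2_fold_change(competitor, control) < -math.log2(fold)


def interacting_ligands(capture, control_key="vehicle"):
    """Return the set of ligands that compete capture of one protein >2-fold."""
    control = capture[control_key]
    return {
        ligand
        for ligand, value in capture.items()
        if ligand != control_key and competed(value, control)
    }


# Hypothetical abundance values for two proteins.
promiscuous = {"vehicle": 8.0, "acetyl-CoA": 1.0, "butyryl-CoA": 1.5, "crotonyl-CoA": 2.0}
selective = {"vehicle": 8.0, "acetyl-CoA": 7.0, "butyryl-CoA": 6.0, "crotonyl-CoA": 1.0}

promiscuous_hits = interacting_ligands(promiscuous)
selective_hits = interacting_ligands(selective)
```

A protein such as the second toy example, competed by a single ligand only, corresponds to the kind of selective profile described for ECHS1 and crotonyl-CoA above.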
Evaluating the dynamic activity of acetyltransferases in response to metabolic perturbation
Coenzyme A (2) is one of the most abundant metabolites in cells. In addition to functioning as an obligate precursor for acyl-CoA biosynthesis, CoA can also serve as a potent feedback inhibitor of members of the AT superfamily. Previously, we used quantitative immunoblotting of chemoproteomic capture experiments to probe the sensitivity of eight ATs to product inhibition by testing their relative binding to acetyl-CoA (cofactor) and CoA (inhibitor).8 The success of this approach inspired us to apply CATNIP to extend this comparison proteome-wide. Capture experiments were performed in the presence of escalating doses of CoA (3, 30, 300 μM), transformed, and clustered using the same pipeline as in our acetyl-CoA binding experiments above (Table S5). Two clusters (3 and 8) exhibited readily interpretable dose-dependent competition profiles, with several additional clusters (1, 2, and 5) displaying hypersensitivity at low concentrations (3 μM) of CoA (Fig. S4a-b). Dose-dependent cluster 3 contained KAT2A, CREBBP, and PANK2, all of which have been shown to be biologically or biochemically susceptible to metabolic feedback inhibition by CoA.3, 25-26 Further examination of this cluster revealed three proteins that were most sensitive to CoA, exhibiting >50% loss of capture in the presence of 3 μM ligand and >80% loss of capture in the presence of 30 μM ligand: ACLY, NAT6, and NAT10 (Fig. 4a). The unusually strong CoA interaction profile of these three proteins was distinct from that of other proteins in the cluster and within the KAT superfamily, most of which are competed much more efficiently by acetyl-CoA (Fig. 4b, Fig. S4c).
The binding of ACLY to both CoA and acetyl-CoA is consistent with the reversible activity of the enzyme, which has been previously observed in biochemical assays.27 NAT6 (NAA80) is a recently de-orphanized enzyme which has been determined to acetylate the N-terminus of actin, whose metabolic sensitivity has not been explored.28 NAT10 is an RNA acetyltransferase that has been found to catalyze acetylation of cytidine in ribosomal, transfer, and messenger RNA, forming the minor nucleobase N4-acetylcytidine (ac4C).29-31 The identification of NAT10-CoA interactions by CATNIP is consistent with our previous studies, which have established that NAT10 binds acetyl-CoA and CoA with similar affinities and may be susceptible to metabolic feedback inhibition.8 Compounding this effect, no related N4-acylations of cytidine (e.g. butyrylation) have been identified in RNA, suggesting NAT10 may additionally interact with SCFA- and LCFA-CoAs as inhibitors.
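The sensitivity criteria used to single out ACLY, NAT6, and NAT10 (>50% loss of capture at 3 μM CoA and >80% loss at 30 μM) amount to a simple threshold test on each dose-response profile. The sketch below assumes profiles are stored as a mapping from CoA concentration (μM) to capture, with 0 as vehicle; the numbers are hypothetical.

```python
def fractional_loss(capture, vehicle):
    """Fraction of vehicle capture lost in the presence of competitor."""
    return 1.0 - capture / vehicle


def hypersensitive(profile, low_cutoff=0.5, mid_cutoff=0.8):
    """Apply the >50% loss at 3 uM and >80% loss at 30 uM criteria.

    `profile` maps CoA concentration in uM to capture; key 0 is the vehicle.
    """
    vehicle = profile[0]
    return (
        fractional_loss(profile[3], vehicle) > low_cutoff
        and fractional_loss(profile[30], vehicle) > mid_cutoff
    )


# Hypothetical capture values at 0, 3, 30, and 300 uM CoA.
strong = {0: 10.0, 3: 4.0, 30: 1.5, 300: 1.0}   # 60% and 85% loss
gradual = {0: 10.0, 3: 7.0, 30: 1.0, 300: 0.5}  # 30% and 90% loss

strong_flag = hypersensitive(strong)
gradual_flag = hypersensitive(gradual)
```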
To explore the metabolic inhibition of NAT10 in greater detail, we determined the metabolic source of the acetate group post-transcriptionally introduced into ac4C in proliferating cancer cell lines (Fig. 4c). Treatment of cells with isotopically-labeled acetyl-CoA precursors, followed by RNA digest and analysis by LC-MS/MS, revealed that the majority of NAT10-dependent cytidine acetylation stems from glucose-derived acetyl-CoA (Fig. 4d). Since the production of glucose-derived acetyl-CoA in human cells is highly dependent on ACLY activity, and ACLY perturbation can drastically influence the ratio of acetyl-CoA to inhibitory CoA metabolites,32 we next examined how stable knockout of ACLY impacted ac4C levels in RNA. Analysis of wild-type and ACLY knockout human glioblastoma cells33 revealed similar levels of ac4C in total RNA (Fig. 4e). However, LC-MS analysis of poly(A)-enriched RNA fractions from these cell lines indicated an ACLY-dependent decrease in ac4C. ACLY-dependent deposition is also observed for another acetyl-CoA derived RNA nucleobase, 5-methoxycarbonylmethyl-2-thiouridine (mcm5s2U), whose production is catalyzed by the AT enzyme Elp3 (Fig. S4d-e).34 The observation that ac4C and mcm5s2U are sensitive to the metabolic state of the cell is consistent with the findings of Balasubramanian and coworkers, who reported that starvation conditions reduced NAT10-dependent ac4C levels in transfer RNA.35 The ability of ACLY perturbation to influence the acetylation of poly(A) RNA, but not total RNA, suggests inhibitory CoA/acyl-CoAs may interact in a distinct manner with different functional forms of NAT10. These studies illustrate the ability of CATNIP to guide the identification of novel acetylation events that are sensitive to the metabolic state of the cell.
Unbiased CATNIP analysis reveals annotation and mechanistic features of acyl-CoA-binding
The annotation of the cellular acyl-CoA binding proteome has never been directly assessed using experimental methods. Therefore, we next set out to develop an unbiased workflow for analysis of CATNIP binding data that could enable the de novo identification of known acyl-CoA dependent enzymes and ask what, if any, uncharacterized proteins share these properties. To differentiate acyl-CoA interacting proteins from background, our initial criteria were: 1) significant competition (p ≤ 0.05) of enriched proteins by three or more CoA ligands, and 2) absence of enriched proteins in the ‘CRAPome’ common contaminant database (Fig. 5a).36 Of 1764 proteins detected in Lys-CoA Sepharose capture experiments, 672 (38.1%) passed these cut-offs (Table S6), including the majority of annotated ATs that were enriched by Lys-CoA (Fig. 5b). Proteins not identified were mostly found to be poorly expressed by RNA Seq (Fig. 5b)12 and did not display obvious structural similarities in GNAT consensus elements (Fig. S5a).1 To examine whether unique patterns of acyl-CoA binding in this filtered dataset are associated with distinct biological processes, we further analyzed these proteins using Topological Data Analysis (TDA).37 TDA functions as a geometric approach that can be used to identify shared properties of complex multidimensional datasets that may not be apparent by other methods, and has previously been used to detect biologically-relevant modules in protein complexes from immunoprecipitation LC-MS/MS data.38 Therefore, we applied TDA to analyze the multidimensional CoA metabolite competition profiles for each protein in our filtered subset, and then annotated the TDA clusters with enriched pathways identified by gene ontology analysis using DAVID (https://david.ncifcrf.gov) and ConsensusPathDB (http://cpdb.molgen.mpg.de/). 
This analysis revealed that histone acetyltransferases formed a distinct cluster relative to PANK2 and PANK3, which are allosterically regulated by CoA metabolites, as well as many proteins involved in RNA metabolism and cell cycle whose association with CoA metabolites has not been previously characterized (Fig. 5c). This suggests these processes may be subject to differential crosstalk by levels of CoA metabolites and demonstrates the utility of TDA for clustering and visual representation of ligand-protein interaction networks.
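The two initial filtering criteria (significant competition by three or more CoA ligands, and absence from a common contaminant list) can be expressed as a small predicate. This sketch takes per-ligand p-values as given, since the statistical test used to generate them is not described in this section; the protein names, p-values, and contaminant set are hypothetical stand-ins rather than CRAPome entries.

```python
def passes_filter(protein, p_values, contaminants, alpha=0.05, min_ligands=3):
    """Criterion 1: p <= alpha for at least `min_ligands` ligands.
    Criterion 2: the protein is not in the contaminant set.
    """
    n_significant = sum(1 for p in p_values.values() if p <= alpha)
    return n_significant >= min_ligands and protein not in contaminants


# Hypothetical per-ligand competition p-values.
competition_p = {
    "PROT_A": {"acetyl-CoA": 0.01, "CoA": 0.03, "butyryl-CoA": 0.04, "crotonyl-CoA": 0.20},
    "PROT_B": {"acetyl-CoA": 0.01, "CoA": 0.30, "butyryl-CoA": 0.04, "crotonyl-CoA": 0.20},
    "PROT_C": {"acetyl-CoA": 0.01, "CoA": 0.02, "butyryl-CoA": 0.03, "crotonyl-CoA": 0.04},
}
contaminant_set = {"PROT_C"}  # stand-in for a contaminant database lookup

filtered = [
    protein
    for protein, p_values in competition_p.items()
    if passes_filter(protein, p_values, contaminant_set)
]
```

Only the first toy protein survives: the second is competed by too few ligands, and the third is removed as a contaminant despite strong competition.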
Next, we sought to extend our approach to explore the annotation of the human CoA-binding proteome. Specifically, we wished to incorporate additional criteria allowing us to differentiate proteins that directly bind to CoA metabolites, such as ATs, from proteins that are indirectly captured via protein-protein interactions, such as non-catalytic members of AT complexes. Such an approach would potentially provide a pipeline for novel AT discovery, as well as insights into how well the CoA-binding proteome is currently characterized. To accomplish this, we first classified proteins in our statistically significant filtered subset based on their dose-dependent acetyl-CoA competition profiles determined above, whose clustering we found could highlight protein subsets enriched in known ATs and CoA-binding proteins (Fig. 2b, d). For the purposes of this discussion we focus on cluster 1, which was found to be the most enriched in these protein classes. Approximately 7% (45/672) of the filtered proteins resided in cluster 1, whose capture is hypersensitive to competition by acetyl-CoA (Fig. 5d, Table S7). This included 12 direct CoA-binders, 15 AT-interactors, and 18 proteins whose interaction with CoA metabolites had not been previously characterized. Amongst this protein subset, terms related to histone acetyltransferases and CoA biosynthesis were clearly differentiated as the most highly enriched biological process (Fig. S5d). To further differentiate direct and indirect CoA interactions, we next assessed these 45 proteins for sites of high stoichiometry acetylation. We reasoned this criterion may further enrich our analysis for proteins that directly bind CoA metabolites, since acyl-CoA interactions can underlie both enzymatic and non-enzymatic autoacylation. Using a recently published dataset,39 we identified 6 out of 42 proteins that contain a modified lysine lying in the top 10% of all acetylation stoichiometries measured in HeLa cells (>0.17% stoichiometry, Fig. 5d). Five of these proteins were Uniprot annotated CoA binders (ACLY, CREBBP, HADH, NAT10, NAA10), while one was a member of an AT complex (NAA15). These analyses suggest a multi-pronged approach assessing i) statistically significant multi-ligand competition, ii) dose-response clustering, and iii) acetylation stoichiometry may prove most useful for annotation of the CoA-binding proteome, with the caveat that additional stringency will also lead to filtering of some ‘true’ positives. These findings also imply the acyl-CoA binding proteome interrogatable by this analysis is well-annotated.
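The acetylation stoichiometry criterion can be sketched as a percentile cut-off applied across all measured sites, followed by flagging proteins that carry at least one site above it. The nearest-rank percentile rule used here is an assumption (the published dataset may define its top 10% differently), and the site values and protein names are hypothetical.

```python
def top_fraction_threshold(values, top_percent=10):
    """Nearest-rank cut-off: values strictly above it fall in the top `top_percent`%."""
    ordered = sorted(values)
    n = len(ordered)
    # Integer ceiling of n * (100 - top_percent) / 100, avoiding float rounding.
    rank = -(-n * (100 - top_percent) // 100)
    return ordered[rank - 1]


def high_stoichiometry_proteins(site_stoichiometry, top_percent=10):
    """Return proteins with at least one site above the pooled cut-off."""
    pooled = [s for sites in site_stoichiometry.values() for s in sites]
    cutoff = top_fraction_threshold(pooled, top_percent)
    flagged = {
        protein
        for protein, sites in site_stoichiometry.items()
        if any(s > cutoff for s in sites)
    }
    return cutoff, flagged


# Hypothetical per-site acetylation stoichiometries (percent), ten sites in total.
sites = {
    "PROT_A": [0.01, 0.50],
    "PROT_B": [0.02, 0.03, 0.04],
    "PROT_C": [0.05, 0.06, 0.07, 0.08, 0.09],
}
cutoff, flagged = high_stoichiometry_proteins(sites)
```

In the full workflow this flag would be intersected with the cluster 1 membership and the multi-ligand competition filter described above.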
Finally, we asked whether CATNIP binding data may be able to provide insight into the mechanisms (enzymatic versus non-enzymatic) responsible for high stoichiometry lysine acetylation of the proteins identified above. Our previous studies have provided evidence that lysine malonylation can serve as marker of non-enzymatic acylation in the nucleocytosolic space due to the hyperreactive nature of malonyl-CoA.4 However, the extent to which malonylation reflects specific binding of malonyl-CoA, lysine reactivity, or some combination thereof remains unknown. Examining the six proteins above, only one (NAT10) exhibited statistically significant competition by the malonyl-CoA surrogate acetic-CoA. In line with this, while 5/6 of these proteins were found to harbor sites of lysine malonylation,40 only in the case of NAT10 were the high stoichiometry acetylation and malonylation sites found on the same residue (K426). This lysine lies within NAT10’s GNAT domain and is highly conserved from eukaryotes to bacteria (Fig. S5f). Analyzing the position of K426 using the structure of a NAT10 orthologue shows it lies proximal to the acetyl-CoA binding site,41 potentially priming it for non-enzymatic acetylation (Fig. 5f). Such a non-enzymatic mechanism would reconcile the paradoxical finding that NAT10 undergoes functional lysine acetylation in its active site, but its only biochemically validated substrates are RNA cytidine residues. While further work will be needed to evaluate the impact of K426 malonylation on NAT10 activity, our studies demonstrate the potential for interfacing CATNIP datasets with analyses of lysine malonylation to identify proximity-dependent non-enzymatic acylation mechanisms.
Discussion
Chemoproteomics has recently emerged as a powerful method for the interrogation of metabolite signaling. Here we describe the development and application of CATNIP, a systems chemoproteomic approach for the high-throughput analysis of acyl-CoA/protein interactions. We first validate the ability of CATNIP to identify protein subsets enriched in CoA-binding, and then apply this method to probe the selectivity of acyl-CoA/protein interactions, visualize novel acyl-CoA interactive biological networks, and characterize the interplay between direct acyl-CoA binding and covalent lysine acylation. CATNIP identified a strong interaction of the RNA cytidine acetyltransferase NAT10 with the feedback metabolite CoA as well as several additional acyl-CoA cofactors. Furthermore, in cell models where acetyl-CoA biosynthesis is impaired we found that a subset of cytidine acetylation was decreased, implying these CoA metabolites may be capable of interacting with NAT10 as endogenous inhibitors. Of note, the relative abundance of ac4C is ∼8-fold lower in oligo(dT)-enriched RNA than total RNA, and no data regarding the stoichiometry of these targets has been reported. Therefore, additional work will be needed to validate this finding, as well as to understand what effect acetyl-CoA metabolism has on the acetylation of specific RNA targets and pathogenic NAT10 activity. These studies highlight the ability of CATNIP to identify biological processes conditionally regulated by acetyl-CoA and provide a novel hypothesis generation tool for the functional interrogation of metabolite-protein interactions in biology and disease.
To explore the utility of CATNIP for discovery applications, we developed an unbiased workflow applying chemoproteomic data to the de novo annotation of acetyl-CoA binding proteins. Critical to this endeavor was the integration of CATNIP and acetylation stoichiometry datasets,39 which allowed the identification of a protein subset highly enriched in CoA-binders and AT interactors that was not apparent from either method alone (Fig. S4d). An interesting finding was the absence of any ‘unexpected interactors,’ i.e. unannotated proteins with CATNIP profiles indicative of CoA-binding, within this highly curated subset. This suggests the current CoA-binding proteome is well-annotated, with the caveat that this conclusion is entirely dependent on the unique workflow applied here, and therefore does not preclude the discovery of novel acyl-CoA-binding proteins by new experimental methods (e.g. structurally distinct capture probes) or computational analyses. With regard to the latter, it is important to note that many authentic acyl-CoA-binding proteins sampled by CATNIP do not exhibit high stoichiometry acetylation sites (e.g. ATAT1) or fall outside of dose-dependent cluster 1 (e.g. KAT2A). Our studies demonstrate how acetylation stoichiometry may serve as a useful guide to high-confidence annotation of acyl-CoA binding, while simultaneously raising the possibility of mining additional CoA binders and AT interactors from CATNIP data.
Acyl-CoA/protein interactions can play many potential functional roles (Fig. 1). Inspired by recent chemoproteomic studies showing that inositol polyphosphate binding can trigger non-enzymatic protein pyrophosphorylation,42 we wondered whether acyl-CoA binding may similarly be a major driver of non-enzymatic lysine acylation. Examining lysine malonylation, a putative non-enzymatic PTM derived from the electrophilic metabolite malonyl-CoA,4 we identified NAT10 as a unique case in which these PTMs could be correlated with proximity to an acyl-CoA binding site. However, this approach is far from predictive and, even in our curated dataset of high confidence acyl-CoA-binding proteins, we found many sites of malonylation mapping far from the annotated active site (Fig. 5e, Fig. S5f).40 Although additional studies are needed, our data suggest that for many non-enzymatic acylations, factors independent of acyl-CoA binding affinity, such as lysine nucleophilicity, surface accessibility, and exposure to high local concentrations of electrophilic CoAs, may be important determinants of covalent modification.
Finally, it is important to note some limitations of our current method, as well as steps that may be taken to optimize it for future applications. To facilitate the development of CATNIP, our initial study employed ion trap mass spectrometers for protein identification. For future experiments, we propose using higher resolution instruments to simultaneously perform protein identification and identification of PTMs such as lysine acetylation on enriched proteins, which may be indicative of activity, or using tandem-mass tag (TMT) workflows that enable multiplexed measurements in a single LC-MS/MS run. Transitioning CATNIP to higher resolution instruments will be important for improving the throughput and quantitative applications of our method. An important characteristic of CATNIP is that it reports on relative, rather than absolute, binding affinities due to differences in the inherent binding affinity of individual proteins to the Lys-CoA capture matrix. This means CATNIP is best suited to gauging the comparative pharmacology of individual acyl-CoA binding proteins (i.e. for a series of ligands, which ones interact strongly with a protein of interest), rather than rank order comparisons of absolute ligand-protein binding affinity across the proteome. Such biases are an intrinsic feature of chemoproteomic methods and extend even to label-free approaches such as LiP-MS and CETSA,43-44 whose detection of protein-ligand interactions requires ligand binding to alter proteolytic or thermal stability, respectively. Future studies of acyl-CoA/protein interactions will likely benefit from the integration of multiple approaches. Spike-in controls whose affinity for the CATNIP matrix has been determined may also prove useful for quantitative measurements. Clustering analysis indicated that many CoA binders and AT interactors display similar competition profiles, implying CATNIP as currently constituted is not able to discriminate between direct and indirect interactors.
In addition to using acetylation stoichiometry as an orthogonal measure for the de novo assignment of direct acyl-CoA binding, it may be possible to distinguish indirect binding based on susceptibility to ionic competition (i.e. high salt) or by complementing matrix-based pulldown with covalent capture using clickable photoaffinity probes.7 Alternatively, this may be solved by optimized computational analysis, in which the proteins identified from multiple competitive ligands are compared using topological scoring (TopS)45 to determine enrichment of proteins and direct interactions from a range of concentrations or ligand types. Although we focused here on studying the interactions of proteins with endogenous acyl-CoA metabolites, multiple classes of drug-like KAT inhibitors have recently been reported,46-47 and we anticipate our method will be immediately useful for understanding the pharmacological specificity and potency of these small molecule chemical probes. Such studies are underway, and will be reported in due course.
Supporting Information
Additional data including Figures S1-S5, Tables S1-S8, and experimental methods are available in the supporting information.
Materials and Methods
General materials and methods
NHS-Activated Sepharose 4 Fast Flow resin was purchased from GE Healthcare Life Sciences (71-5000-14 AD). Amine-functionalized Lys-CoA-Ahx was synthesized as described previously.8 U-13C6-glucose (CLM-1396) and U-13C2-acetate (CLM-440) were purchased from Cambridge Isotope Laboratories. Acetyl-CoA (A2056), butyryl-CoA (B1508), crotonyl-CoA (28007) and palmitoyl-CoA (P9716) were purchased from Sigma. Coenzyme A (CoA; C7505-51) was purchased from United States Biological. Acetic-CoA was synthesized from 2-bromoacetic acid and CoA in a single step as described previously.7 Prior to use, all acyl-CoAs were analyzed for purity by LC-MS and re-purified via HPLC if necessary. CoAs were quantified using the molar extinction coefficient (ε) for Coenzyme A of 15,000 M-1 cm-1 at a λmax of 259 nm. HeLa cells used to prepare proteomic extracts were grown by Cell Culture Company (formerly National Cell Culture Center, Minneapolis, MN). TRIzol reagent (#15596026) and oligo-(dT)25 Dynabeads (#61005) were purchased from ThermoFisher Scientific. Analyses of Lys-CoA and all acyl-CoAs were performed using a Shimadzu 2020 LC-MS system.
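The quantification of CoA stocks from absorbance follows the Beer-Lambert law. The short Python sketch below illustrates the arithmetic using the extinction coefficient stated above; the function name, the 1 cm path length, the dilution factor, and the example absorbance reading are illustrative assumptions and were not part of the reported procedure.

```python
# Beer-Lambert estimate of CoA concentration from absorbance at 259 nm.
# Only the extinction coefficient comes from the text; everything else
# (path length, dilution, example reading) is a hypothetical illustration.
EPSILON_COA_259 = 15_000.0  # M^-1 cm^-1


def coa_concentration_molar(absorbance_259, path_length_cm=1.0, dilution_factor=1.0):
    """Return the stock concentration (M) given A259 of a diluted sample."""
    if path_length_cm <= 0 or dilution_factor <= 0:
        raise ValueError("path length and dilution factor must be positive")
    diluted = absorbance_259 / (EPSILON_COA_259 * path_length_cm)
    return diluted * dilution_factor


# Hypothetical reading: A259 = 0.45 for a 1:100 dilution in a 1 cm cuvette.
stock_mM = coa_concentration_molar(0.45, path_length_cm=1.0, dilution_factor=100.0) * 1e3
print(f"Estimated stock concentration: {stock_mM:.2f} mM")
```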
Preparation of Lys-CoA Sepharose resin
Lys-CoA Sepharose (1) was prepared using NHS-Activated Sepharose 4 Fast Flow resin essentially according to the manufacturer’s protocol (GE Healthcare Life Sciences, Instructions 71-5000-14 AD).8 Briefly, amine-functionalized Lys-CoA-Ahx was prepared as a 3.4 mM solution in PBS. Resin was washed with cold 1 mM HCl prior to coupling, before addition of the ligand solution at a ratio of 2:1 resin:ligand volume. The pH was adjusted to ∼7-8 by addition of 20x PBS, and the mixture was then rotated at 4°C overnight. The resin was pelleted at 1400 rcf for 3 minutes, and the supernatant was discarded prior to addition of 3 resin volumes of 0.1 M Tris-HCl [pH 8.5], and the mixture was rotated for 3 hr at room temperature. Resin was washed 3x each with alternating solutions of 0.1 M Tris-HCl [pH 8.5] and 0.1 M sodium acetate, 0.5 M NaCl [pH 4.5] (6 washes total). Resin was stored as a 33% slurry in aqueous 20% EtOH at 4°C.
Procedure for CATNIP affinity capture, competition and LC-MS/MS studies
Affinity capture using Lys-CoA Sepharose was carried out essentially as previously reported.2, 8 Briefly, 33 µl of capture resin was washed once with 1 ml of PBS, prior to addition of 500 µl of clarified lysates (1.5 mg/ml, pretreated with vehicle or competitor ligand for 30 min on ice). This mixture was rotated for 1 hr at room temperature, pelleted at 1400 rcf, and supernatant discarded. Sepharose capture resins were subjected to a series of mild washes using ice cold wash buffer (50 mM Tris-HCl [pH 7.5], 5% glycerol [omitted in LC-MS/MS experiments], 1.5 mM MgCl2, 150 mM NaCl, 3 × 500 µl). Following the last wash, enriched resin was collected on top of centrifugal filters (VWR, 82031-256). For LC-MS/MS analysis of captured proteins, enriched resin was transferred from centrifugal filters to fresh 1.7-ml tubes using 400 µl of tryptic digest buffer (50 mM Tris-HCl [pH 8.0], 1 M urea). Digests were initiated by addition of 0.4 µl of 1 M CaCl2 and 4 µl of trypsin (0.25 mg/ml) and allowed to proceed overnight at 37°C with shaking. After extraction, tryptic peptide samples were acidified to a final concentration of 5% formic acid, lyophilized, and frozen at −80°C until LC-MS/MS analysis.
MudPIT LC-MS/MS analysis and database searching of Lys-CoA enriched proteomes
Lyophilized peptide samples from Lys-CoA Sepharose enriched HeLa proteomes were analyzed independently in triplicate by Multidimensional Protein Identification Technology (MudPIT), as described previously.48-49 Briefly, dried peptides were resuspended in 100 µL of Buffer A (5% acetonitrile (ACN), 0.1% formic acid (FA)) prior to pressure-loading onto 100 µm fused silica microcapillary columns packed first with 9 cm of reverse phase (RP) material (Aqua; Phenomenex), followed by 3 cm of 5-µm Strong Cation Exchange material (Luna; Phenomenex), followed by 1 cm of 5-µm C18 RP. The loaded microcapillary columns were placed in-line with a 1260 Quaternary HPLC (Agilent). The application of a 2.5 kV distal voltage electrosprayed the eluting peptides directly into LTQ linear ion trap mass spectrometers (Thermo Scientific) equipped with a custom-made nano-LC electrospray ionization source. Full MS spectra were recorded on the eluting peptides over a 400 to 1600 m/z range followed by fragmentation in the ion trap (at 35% collision energy) on the first to fifth most intense ions selected from the full MS spectrum. Dynamic exclusion was enabled for 120 sec.50 Mass spectrometer scan functions and HPLC solvent gradients were controlled by the XCalibur data system (Thermo Scientific).
RAW files were extracted into .ms2 file format51 using RawDistiller v. 1.0, in-house developed software52. RawDistiller D(g, 6) settings were used to abstract MS1 scan profiles by Gaussian fitting and to implement dynamic offline lock mass using six background polydimethylcyclosiloxane ions as internal calibrants52. MS/MS spectra were first searched using ProLuCID53 with a mass tolerance of 500 ppm for peptide and fragment ions. Trypsin specificity was imposed on both ends of candidate peptides during the search against a protein database combining 36,628 human proteins (NCBI 2016-06-10 release), as well as 193 usual contaminants such as human keratins, IgGs and proteolytic enzymes. To estimate false discovery rates (FDR), each protein sequence was randomized (keeping the same amino acid composition and length) and the resulting “shuffled” sequences were added to the database, for a total search space of 73,642 amino acid sequences. A mass of 15.9949 Da was differentially added to methionine residues.
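The construction of the shuffled decoy database described above can be illustrated with a short Python sketch. It randomizes each sequence while preserving amino acid composition and length, appends the decoys to the targets, and checks the stated search space size. The accession prefix, the toy sequences, and the fixed random seed are assumptions made for illustration only; this is not the software used in this study.

```python
import random
from collections import Counter


def shuffled_decoy(sequence, rng):
    """Randomize residue order, keeping amino acid composition and length."""
    residues = list(sequence)
    rng.shuffle(residues)
    return "".join(residues)


def build_search_space(targets, seed=0):
    """Return targets plus one shuffled decoy per target sequence.

    `targets` maps accession -> sequence. The "SHUFFLED_" prefix is a
    hypothetical naming convention chosen for this sketch.
    """
    rng = random.Random(seed)
    database = dict(targets)
    for accession, sequence in targets.items():
        database["SHUFFLED_" + accession] = shuffled_decoy(sequence, rng)
    return database


# Toy sequences, invented for illustration.
toy_targets = {"P1": "MKTAYIAKQR", "P2": "GAVLIMFWPS"}
search_space = build_search_space(toy_targets)

# Size of the search space reported in the text:
# (36,628 human proteins + 193 contaminants) targets, each with one decoy.
expected_total = 2 * (36_628 + 193)
print(len(search_space), expected_total)
```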
DTASelect v.1.954 was used to select and sort peptide/spectrum matches (PSMs) passing the following criteria set: PSMs were only retained if they had a DeltCn of at least 0.08; minimum XCorr values of 1.9 for singly-, 2.7 for doubly-, and 2.9 for triply-charged spectra; peptides had to be at least 7 amino acids long. Results from each sample were merged and compared using CONTRAST54. Combining all replicate injections, proteins had to be detected by at least 2 peptides and/or 2 spectral counts. Proteins that were subsets of others were removed using the parsimony option in DTASelect on the proteins detected after merging all runs. Proteins that were identified by the same set of peptides (including at least one peptide unique to such protein group to distinguish between isoforms) were grouped together, and one accession number was arbitrarily considered as representative of each protein group.
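The PSM acceptance criteria listed above can be expressed as a simple filter. The Python sketch below applies the stated DeltCn, XCorr, and peptide length thresholds to toy records; the `PSM` record type, its field names, and the rejection of charge states above +3 are assumptions of this sketch and do not describe DTASelect internals.

```python
from dataclasses import dataclass

# Thresholds taken from the text.
MIN_XCORR_BY_CHARGE = {1: 1.9, 2: 2.7, 3: 2.9}
MIN_DELTCN = 0.08
MIN_PEPTIDE_LENGTH = 7


@dataclass
class PSM:
    """Hypothetical minimal peptide/spectrum match record."""
    peptide: str
    charge: int
    xcorr: float
    deltcn: float


def passes_filters(psm):
    """Return True if the PSM meets all stated criteria."""
    min_xcorr = MIN_XCORR_BY_CHARGE.get(psm.charge)
    if min_xcorr is None:
        # Charge states not listed in the text are rejected (assumption).
        return False
    return (
        psm.deltcn >= MIN_DELTCN
        and psm.xcorr >= min_xcorr
        and len(psm.peptide) >= MIN_PEPTIDE_LENGTH
    )


# Toy PSMs, invented for illustration.
toy_psms = [
    PSM("AAGLLDEVK", 2, 3.1, 0.12),   # passes
    PSM("AAGLLDEVK", 2, 2.5, 0.12),   # XCorr too low for a 2+ spectrum
    PSM("AAGLLK", 1, 2.4, 0.20),      # peptide too short
    PSM("TLSDYNIQK", 3, 3.0, 0.05),   # DeltCn too low
]
retained = [psm for psm in toy_psms if passes_filters(psm)]
print(len(retained))
```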
NSAF7 55 was used to create the final reports on all detected peptides and non-redundant proteins identified across the different runs. Spectral and protein level FDRs were, on average, 0.31±0.10% and 1.0±0.35%, respectively. QPROT (Choi et al., 2015) was used to calculate a log fold change and false discovery rate for the dosed samples compared to the vehicle control.
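For readers unfamiliar with the normalized spectral abundance factor (NSAF), the Python sketch below shows the basic calculation: each protein's spectral count is divided by its length, and the result is normalized to the sum over all proteins in the run. This is a simplified illustration and not the NSAF7 implementation; in particular, the distributed values (dNSAF) used in this study additionally apportion spectra from shared peptides among proteins, which is omitted here. Protein names and values are invented.

```python
def nsaf(spectral_counts, lengths):
    """Compute NSAF per protein for one run.

    spectral_counts: {protein: spectral count}
    lengths: {protein: sequence length in amino acids}
    """
    saf = {p: spectral_counts[p] / lengths[p] for p in spectral_counts}
    total = sum(saf.values())
    if total == 0:
        raise ValueError("no spectral counts in this run")
    return {p: value / total for p, value in saf.items()}


# Toy run: protein_B has 3x the counts but is 3x longer, so NSAF is equal.
toy_counts = {"protein_A": 10, "protein_B": 30}
toy_lengths = {"protein_A": 100, "protein_B": 300}
toy_nsaf = nsaf(toy_counts, toy_lengths)
print(toy_nsaf)
```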
Partitioning clustering
To group proteins based on their abundance profile across the four treatment conditions (i.e. 0, 3 µM, 30 µM and 300 µM), each individual protein was first normalized in each condition to the highest value across the four conditions (i.e. the highest value equals 100%). To spatially map the proteins in the dataset, t-distributed stochastic neighbor embedding (t-SNE), a nonlinear visualization method, was applied. Then, k-means clustering was applied to this transformed matrix using the Hartigan-Wong algorithm and a maximum number of iterations set at 50000. To determine the best partition, the number of clusters, k, was increased stepwise from 3 to 20. The optimal number of clusters was obtained when k=8, after carefully inspecting all the clusters and their silhouette and Hartigan indexes. All computations were run in the R environment using the k-means function for the partition and the daisy function to compute all the pairwise dissimilarities (Euclidean distances) between observations in the dataset for the silhouette.
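Two steps of this procedure, the per-protein normalization and the silhouette evaluation of a candidate partition, are sketched below in Python rather than R. The sketch assumes cluster labels are already available (for example from a k-means run) and does not implement t-SNE or the Hartigan-Wong algorithm. The profiles and labels are invented toy values.

```python
import math


def normalize_to_peak(profile):
    """Scale one protein's profile so its highest value equals 100%."""
    peak = max(profile)
    if peak <= 0:
        return [0.0 for _ in profile]
    return [100.0 * value / peak for value in profile]


def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def mean_silhouette(points, labels):
    """Mean silhouette width of a partition, using Euclidean distances."""
    clusters = {}
    for index, label in enumerate(labels):
        clusters.setdefault(label, []).append(index)
    if len(clusters) < 2:
        raise ValueError("silhouette needs at least two clusters")
    widths = []
    for i, label in enumerate(labels):
        own = [j for j in clusters[label] if j != i]
        if not own:
            widths.append(0.0)  # convention for singleton clusters
            continue
        a = sum(euclidean(points[i], points[j]) for j in own) / len(own)
        b = min(
            sum(euclidean(points[i], points[j]) for j in members) / len(members)
            for other, members in clusters.items()
            if other != label
        )
        widths.append((b - a) / max(a, b) if max(a, b) > 0 else 0.0)
    return sum(widths) / len(widths)


# Toy abundance profiles at 0, 3, 30 and 300 µM competitor (invented values):
# two dose-dependently competed proteins and two unaffected proteins.
raw_profiles = [
    [200, 150, 60, 20],
    [50, 40, 17.5, 6],
    [100, 98, 95, 97],
    [97, 100, 96, 99],
]
normalized = [normalize_to_peak(p) for p in raw_profiles]
good_partition = mean_silhouette(normalized, [0, 0, 1, 1])
poor_partition = mean_silhouette(normalized, [0, 1, 0, 1])
print(round(good_partition, 3), round(poor_partition, 3))
```

In practice the mean silhouette width would be compared across values of k, with higher values indicating better separated clusters.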
Dose response curves
Normalized dNSAF values for each protein were plotted as a function of ligand concentration in Origin Pro 2018 for each cluster. The curves were averaged in Origin and the average was displayed on the graph.
Topological data analysis
The input data for TDA were represented in a matrix, with each column corresponding to a CoA ligand and each row corresponding to a protein. Values were distributed spectral count values for each protein. A network of nodes with edges between them was then created using the TDA approach based on the Ayasdi platform (AYASDI Inc., Menlo Park CA) as described previously.38 Two types of parameters are needed to generate a topological analysis. The first is a measurement of similarity, called a metric, which measures the distance between two points in space (i.e. between rows in the data). The second are lenses, which are real valued functions on the data points. Here, Variance Norm Euclidean was used as a distance metric with 2 filter functions: Neighborhood lens 1 and Neighborhood lens 2. Resolution 30 and gain 3 were used to generate Fig. 5c.
Pathway analysis
Proteins that were changing in at least one of the CoA ligands with a Z-score less than −2 and an FDR less than or equal to 0.05 were considered for the analysis. Using these criteria, 671 proteins were identified and used for the pathway analysis. As expected, “HATs acetylate histone” was one of the top 30 enriched pathways (p-value of 4.55e-12) in the ConsensusPathDB (http://cpdb.molgen.mpg.de/) database.
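The selection rule above (Z-score below −2 and FDR at or below 0.05 in at least one ligand) can be sketched as follows in Python. The nested dictionary layout, the protein names, and all numerical values are hypothetical and serve only to illustrate the logic.

```python
Z_SCORE_CUTOFF = -2.0
FDR_CUTOFF = 0.05


def changing_proteins(stats):
    """Return proteins meeting both cutoffs for at least one ligand.

    stats: {protein: {ligand: (z_score, fdr)}}
    """
    selected = []
    for protein, per_ligand in stats.items():
        if any(z < Z_SCORE_CUTOFF and fdr <= FDR_CUTOFF
               for z, fdr in per_ligand.values()):
            selected.append(protein)
    return sorted(selected)


# Invented example values.
toy_stats = {
    "protein_A": {"acetyl-CoA": (-3.1, 0.01), "CoA": (-0.4, 0.60)},
    "protein_B": {"acetyl-CoA": (-2.5, 0.20), "CoA": (-1.9, 0.01)},
    "protein_C": {"acetyl-CoA": (0.3, 0.90), "CoA": (-2.2, 0.05)},
}
selected = changing_proteins(toy_stats)
print(selected)
```

Note that both conditions must hold for the same ligand in this sketch; protein_B is excluded because its significant Z-score and its significant FDR occur in different ligands.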
Bioinformatic analyses of CATNIP data and correlation with literature datasets
A list of annotated CoA-binders was defined by searching the UniProt database using query terms related to this function including “CoA binding,” “CoA,” “Coenzyme A,” “Acetyltransferase,” “HAT,” “NAT,” “NAA,” “GNAT.” A similar analysis was performed to annotate AT interactors, using query terms including “HAT complex,” “KAT complex,” “NAA complex,” “NAT complex,” and “acetyltransferase complex.” Results were then manually curated with irrelevant proteins and duplicates removed, resulting in the term list provided in Table S2. Correlation of CATNIP enrichment to HeLa cell gene expression and protein abundance (Fig. S1 d-i) was performed using literature RNA-Seq and deep proteomic datasets.12 Venn diagrams comparing overlap between proteins competed 2-fold by acetyl-CoA and all other ligands (Fig. 3b), or metabolic acyl-CoAs 3-6 (Fig. 3c) were generated by identifying a list of proteins showing a (-log2FC) value >1 for each ligand and then assessing overlap using an online Venn diagram tool accessible at http://bioinformatics.psb.ugent.be/webtools/Venn/. Protein subsets were interrogated for enrichment of molecular functions and pathways using the online informatics tools DAVID (david.ncifcrf.gov) and ConsensusPathDB (http://cpdb.molgen.mpg.de/CPDB/rlFrame). For analysis of acetylation stoichiometry, filtered protein subsets were cross-referenced with a list of peptide hits falling in the top 10% of all HeLa cell lysine acetylation stoichiometries measured in a recently published analysis.39 For analysis of lysine malonylation, filtered protein subsets were cross-referenced with a list of malonylated peptides derived from a recently published analysis. Figures of the E. coli NAT10 orthologue complexed with acetyl-CoA were generated using Chimera.
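The overlap analysis underlying the Venn diagrams can be reproduced with basic set operations. The Python sketch below builds, for each ligand, the set of proteins with (-log2FC) > 1 (i.e. competed more than 2-fold) and then computes shared and ligand-specific members. The data layout, protein names, and fold-change values are invented for illustration; the online tool cited above was used for the actual figures.

```python
def competed_set(log2_fold_changes, threshold=1.0):
    """Proteins whose -log2FC exceeds the threshold (>2-fold competition at 1.0)."""
    return {
        protein
        for protein, log2fc in log2_fold_changes.items()
        if -log2fc > threshold
    }


# Invented log2 fold changes (competitor vs. vehicle) for two ligands.
toy_log2fc = {
    "acetyl-CoA": {"protein_A": -2.3, "protein_B": -1.4, "protein_C": -0.2},
    "CoA": {"protein_A": -1.8, "protein_B": -0.5, "protein_C": -1.1},
}

competed = {ligand: competed_set(values) for ligand, values in toy_log2fc.items()}
shared = competed["acetyl-CoA"] & competed["CoA"]
acetyl_only = competed["acetyl-CoA"] - competed["CoA"]
coa_only = competed["CoA"] - competed["acetyl-CoA"]
print(sorted(shared), sorted(acetyl_only), sorted(coa_only))
```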
Isotopic tracing experiments to determine metabolic source of N4-acetylcytidine (ac4C)
HeLa cells were cultured at 37 °C under 5% CO2 atmosphere in a growth medium of DMEM supplemented with 10% FBS and 2 mM glutamine. HeLa cells were plated in 10 cm dishes (3 × 10^6 cells in 10 ml RPMI media/dish) and allowed to adhere for 24 h. After this, media was removed, cells were washed once with PBS (10 ml), and switched to either i) heavy glucose media (glucose-free DMEM containing 2 mM glutamine, 25 mM U-13C6-glucose, 0.2 mM acetate), ii) heavy acetate media (glucose-free DMEM containing 2 mM glutamine, 25 mM glucose, 0.2 mM U-13C2-acetate) or iii) regular glucose media (glucose-free DMEM containing 2 mM glutamine, 25 mM glucose, 0.2 mM acetate). Cells were incubated with the tracer for 16 h or 24 h at 37 °C and total RNA was harvested using TRIzol reagent (ThermoFisher Scientific) according to the manufacturer’s instructions. Digestion of total RNA (220 µg) was performed as previously described.56 Briefly, RNA was incubated with 1 U/10 µg RNA of nuclease P1 (Sigma-Aldrich) in 100 mM ammonium acetate [pH 5.5] for 16 hr at 37 °C. Five microliters of 1 M ammonium bicarbonate [pH 8.3] and 0.5 U/10 µg RNA of Bacterial Alkaline Phosphatase (ThermoFisher Scientific) were added for 2 hr at 37 °C. Following digestion, sample volumes were adjusted to 150 µL with RNase-free water and spin filtered to remove enzymatic constituents (Amicon Ultra 3K, #UFC500396). Filtrate and washes (200 µL × 3, RNase-free water) were collected and lyophilized. Lyophilized samples were reconstituted in 250 µL H2O containing internal standards (D3-ac4C, 500 nM; 15N3-C, 5 μM, Cambridge Isotopes). Individual samples (15 µL for ac4C analyses, 5 µL for major bases) were then analyzed via injection onto a C18 reverse phase column coupled to an Agilent 6410 QQQ triple-quadrupole LC mass spectrometer in positive electrospray ionization mode (Agilent Technologies).
Quantification was performed based on nucleoside-to-base ion transitions using standard curves of pure nucleosides and stable isotope labelled internal standards.
LN229 ac4C analysis and MS
LN229 wild-type (WT) and ACLY knockout (ACLY KO) cell lines (kind gift of K. Wellen laboratory, University of Pennsylvania) were cultured at 37 °C under 5% CO2 atmosphere in a growth medium of RPMI supplemented with 10% FBS and 2 mM glutamine as previously described.57 For assessment of ac4C levels, total RNA was isolated from LN229 cells using TRIzol reagent (ThermoFisher Scientific). Enrichment of polyadenylated RNA [poly(A) RNA] for UHPLC-MS was carried out using two rounds of selection with Oligo-(dT)25 Dynabeads (ThermoFisher Scientific) according to the manufacturer’s instructions. 300 ng of total or poly(A) RNA was used to evaluate the levels of ac4C and mcm5s2U by LC-MS/MS using a similar method as described.57 Briefly, prior to UHPLC-MS analysis, 300 ng of each oligonucleotide was treated with 0.5 pg/µl of internal standard (IS), isotopically labeled guanosine, [13C][15N]-G. The enzymatic digestion was carried out using Nucleoside Digestion Mix (New England BioLabs) according to the manufacturer’s instructions. Finally, the digested samples were lyophilized and reconstituted in 100 µl of RNase-free water, 0.01% formic acid prior to UHPLC-MS/MS analysis. The UHPLC-MS analysis was accomplished on a Waters XEVO TQ-S™ (Waters Corporation, USA) triple quadrupole mass spectrometer equipped with an electrospray ionization (ESI) source maintained at 150 °C and a capillary voltage of 1 kV. Nitrogen was used as the nebulizer gas, which was maintained at 7 bar pressure, a flow rate of 500 l/h and a temperature of 500 °C. UHPLC-MS/MS analysis was performed in ESI positive-ion mode using multiple-reaction monitoring (MRM) from ion transitions previously determined for ac4C and mcm5s2U.58 A Waters ACQUITY UPLC™ HSS T3 guard column, 2.1 × 5 mm, 1.8 µm, attached to a HSS T3 column, 2.1 × 50 mm, 1.7 µm, was used for the separation. Mobile phases included RNase-free water (18 MΩ·cm) containing 0.01% formic acid (Buffer A) and 50:50 acetonitrile in Buffer A (Buffer B).
The digested nucleotides were eluted at a flow rate of 0.5 ml/min with a gradient as follows: 0-2 min, 0-10% B; 2-3 min, 10-15% B; 3-4 min, 15-100% B; 4-4.5 min, 100% B. The total run time was 7 min. The column oven temperature was kept at 35 °C and the sample injection volume was 10 µl. Three injections were performed for each sample. Data acquisition and analysis were performed with MassLynx V4.1 and TargetLynx. Calibration curves were plotted using linear regression with a weight factor of 1/x.
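A 1/x-weighted linear calibration gives more influence to low-concentration standards than an unweighted fit. The Python sketch below shows the closed-form weighted least-squares solution and the back-calculation of an unknown from its response. It is a generic illustration and not the TargetLynx implementation; the standard concentrations and responses are invented.

```python
def weighted_linear_fit(x, y):
    """Fit y = intercept + slope * x by least squares with weights 1/x."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need at least two paired points")
    if any(value <= 0 for value in x):
        raise ValueError("1/x weighting requires positive concentrations")
    w = [1.0 / value for value in x]
    sw = sum(w)
    swx = sum(wi * xi for wi, xi in zip(w, x))
    swy = sum(wi * yi for wi, yi in zip(w, y))
    swxx = sum(wi * xi * xi for wi, xi in zip(w, x))
    swxy = sum(wi * xi * yi for wi, xi, yi in zip(w, x, y))
    denominator = sw * swxx - swx ** 2
    if denominator == 0:
        raise ValueError("standards must span more than one concentration")
    slope = (sw * swxy - swx * swy) / denominator
    intercept = (swy - slope * swx) / sw
    return slope, intercept


def back_calculate(response, slope, intercept):
    """Concentration corresponding to a measured response."""
    return (response - intercept) / slope


# Invented standards: concentration (arbitrary units) vs. analyte/IS response.
standard_conc = [1.0, 5.0, 10.0, 50.0, 100.0]
standard_resp = [2.0 * c + 1.0 for c in standard_conc]  # exactly linear toy data
slope, intercept = weighted_linear_fit(standard_conc, standard_resp)
unknown_conc = back_calculate(41.0, slope, intercept)
print(slope, intercept, unknown_conc)
```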
Data accessibility
The mass spectrometry proteomics data have been deposited to the ProteomeXchange Consortium via the PRIDE (Deutsch et al., 2017; Perez-Riverol et al., 2019) partner repository with the dataset identifier PXD013157 and 10.6019/PXD013157. Original data underlying this manuscript may also be accessed after publication from the Stowers Original Data Repository at http://www.stowers.org/research/publications/libpb-1355. Review access can be obtained using the following username and password:
Username: reviewer59307@ebi.ac.uk
Password: 3npaE9w9
Acknowledgements
The authors thank K. Wellen (University of Pennsylvania) and N. Snyder (Drexel University) for helpful discussions. This work was supported by the Intramural Research Program of the NIH, National Cancer Institute, Center for Cancer Research (ZIA BC011488–04), the Stowers Institute for Medical Research, and the National Institute of General Medical Sciences of the National Institutes of Health under Award Number R01GM112639 to MPW. In addition, this project has been funded in whole or in part with Federal funds from the National Cancer Institute, National Institutes of Health, under contract number HHSN261200800001E. The content is solely the responsibility of the authors and does not necessarily represent the official views of the National Institutes of Health.