Abstract
Meningitis is a potentially life-threatening infection characterized by the inflammation of the leptomeningeal membranes. Many different viral and bacterial pathogens can cause meningitis, with differences in mortality rates, risk of developing neurological sequelae and treatment options. Here we constructed a compendium of digital cerebrospinal fluid (CSF) proteome maps to define pathogen-specific host response patterns in meningitis. The results revealed a drastic and pathogen-type specific influx of tissue-, cell- and plasma proteins in the CSF, where in particular a large increase of neutrophil derived proteins in the CSF correlated with acute bacterial meningitis. Additionally, both acute bacterial and viral meningitis result in marked reduction of brain-enriched proteins. Generation of a multi-protein LASSO regression model resulted in an 18-protein panel of cell and tissue associated proteins capable of classifying acute bacterial meningitis and viral meningitis. The same protein panel also enabled classification of tick-borne encephalitis, a subgroup of viral meningitis, with high sensitivity and specificity. The work provides insights into pathogen specific host response patterns in CSF from different disease etiologies to support future classification of pathogen-type based on host response patterns in meningitis.
Introduction
Meningitis is a common condition with an estimated annual prevalence of 8.7 million cases globally (Kassebaum et al, 2017). In the majority of cases, meningitis is caused by viruses, such as enteroviruses, and is associated with low mortality rates (Chadwick, 2005). Certain subtypes of viral meningitis (VM), such as tick-borne encephalitis (TBE), are in contrast associated with higher mortality rates and a risk of developing neurological sequelae (Bogovic & Strle, 2015). Acute bacterial meningitis (ABM) is one of the leading causes of death due to infectious diseases worldwide and is associated with rapid disease progression, increased risk of long-term neurological sequelae in survivors and high mortality rates (van de Beek et al, 2004; van de Beek et al, 2006). The different bacterial and viral pathogens are associated with specific virulence mechanisms that impact the molecular phenotype of the host immune response. This information can be used to diagnose patients with meningitis, which routinely involves lumbar punctures to evaluate several parameters in the CSF such as the number of white blood cells as well as glucose and protein concentrations to differentiate between ABM and VM (Ross et al, 1988). Unfortunately, these parameters are relatively non-specific, yielding inconclusive diagnostic information (Garty et al, 1997; Lindquist et al, 1988; Nigrovic et al, 2002), and it currently remains unknown whether different pathogens and pathogen types evoke detectable differences in the host response proteome.
Extracellular body fluids such as blood plasma, saliva and cerebrospinal fluid (CSF) are deficient in the machinery required for de novo protein synthesis. The protein constituents of body fluids are generated via active protein secretion from surrounding tissues or from passive leakage derived from normal protein turnover from cells and tissues. During healthy conditions, the concentration of individual proteins in body fluids is maintained via a tightly controlled balance between protein secretion and clearance. This balance is altered in severe infectious diseases such as sepsis or meningitis due to the host responses triggered by the invading pathogen (Malmstrom et al, 2016). In meningitis and sepsis, the immune response normally increases cellular and acellular mediators in the CSF and plasma, which leads to a drastic proteome reorganization (Karlsson et al, 2018). In the most severe stages of disease, the host immune response becomes overwhelming, leading to organ damage and impaired prognosis (van de Beek et al, 2004; van de Beek et al, 2006).
Improved definitions of the pathogen or pathogen type specific host response patterns in CSF could provide insights into specific virulence mechanisms, subsequently leading to the development of new diagnostic and prognostic information. However, detection of pathogen specific host response patterns requires the analysis of large sample cohorts to compare host response patterns from a similar group of pathogens to all other pathogen types. As the protein composition of body fluids is produced elsewhere, mRNA transcript profiling is less suitable for the detection of host responses. Mass spectrometry (MS) based protein quantification has become the preferred method for multiplexed and quantitative analysis of proteomes (Aebersold & Mann, 2016). The prevailing MS strategy relies on data-dependent acquisition (DDA), where the mass spectrometer sequentially selects and fragments trypsin-generated peptide ions to generate informative daughter fragment ion spectra used for database searching. Although such proteomics experiments enable identification of thousands of peptide ions, the strategy is associated with lower quantitative reproducibility, as in complex samples the number of available peptide ions exceeds what the instrument can select within the cycle time of the data-dependent acquisition method. In contrast, the recent development of sequential window acquisition of all theoretical fragment ion spectra (SWATH)-MS generates fragment ion spectra of all MS-measurable peptides to produce a digital representation of the analyzed proteome (Gillet et al, 2012). In SWATH-MS, proteome maps are generated based on data-independent acquisition (DIA), followed by protein quantification using a priori established assay libraries (Teleman et al, 2017). Different assay libraries can be used to query the same DIA data iteratively, which is a considerable advantage as it permits a single data collection step for clinical samples (Guo et al, 2015) and focused re-analysis in future studies.
Importantly, DIA is associated with a high degree of reproducibility making it possible to merge data sets analyzed separately in time to perform cross-study comparisons. In this way, new opportunities emerge to construct compendiums of proteome maps from physical biobanks to identify protein patterns in body fluids associated with for example disease progression and treatment options.
In this study, we developed a compendium of SWATH-MS CSF proteome maps to provide novel insights into central nervous system (CNS) functioning and host response in a cohort of patients with meningitis. We demonstrate how an extendable compendium of proteome maps supports post-acquisition and iterative data analysis using cell and tissue derived assay libraries to define discriminatory protein panels associated with ABM or VM. The results revealed how meningitis generates pathogen specific changes in the CSF proteome. Furthermore, the compendium of CSF proteome maps supported the identification of a protein panel capable of differentiating between ABM, VM and TBE, a viral meningitis subgroup, with high sensitivity and specificity based on the host response patterns in the CSF.
Results
Construction of a compendium of SWATH-MS CSF proteome maps from meningitis patients
Detection of pathogen-specific host responses in the CSF proteome requires comparative analysis of patient samples from cases of meningitis caused by different pathogens. Here, CSF was collected by lumbar puncture from a cohort of 135 patients admitted to the hospital with the suspicion of meningitis. Following confirmed diagnosis, the patients were broadly subdivided into ABM (n=35), neuroborreliosis (BM, n=7), VM (n=21), suspected ABM (n=5), suspected VM (n=16) and inflammation without infection (n=2). In this cohort, ABM was caused by 11 different bacterial pathogens, where the most frequent pathogen was Streptococcus pneumoniae. VM was caused by 7 different viruses with the highest frequency of TBE and enteroviruses (Fig. 1). Patients with suspected meningitis but with normal white blood cell (WBC) count (<5 x 10^6 /L) and with no clinical signs of infection/inflammation were regarded as a control group (n=49). CSF from each sample was digested, and peptides were analyzed by DDA MS for the construction of a CSF proteome assay library and DIA-MS to produce a compendium of proteome maps (Fig. 1A). The CSF assay library was merged with previously established assay libraries from 28 healthy human organs or primary cells to enable the quantification of proteins enriched in relevant tissues such as brain, plasma and immune cells. The assay library relies on the protein abundance in the analyzed tissues and provides a statistically significant relationship between proteins and tissues, which was integrated into our results to infer the most likely tissue origin of proteins detected in the CSF. The compendium of SWATH-MS files was subsequently interrogated with the merged human assay library followed by interrogation of assay libraries from the most common pathogens causing meningitis to determine protein profiles correlating with pathogen-type (Fig. 1B).
Changes in the proteome pattern in CSF during meningitis
On average, the total protein intensities in ABM were 2, 2.7 and 4.3 times higher than in BM, VM and the control group, respectively (Supplementary Figure 1A). This reflects an elevated protein concentration in CSF, associated in particular with bacterial meningitis. The distribution of quantified proteins in relation to sample groups is presented as a heat map in Figure 2A. The sample clustering clearly subdivides the sample cohort into four distinct clusters representing predominantly the ABM, VM and control samples. The row-wise color-coding indicates inferred tissue origin based on the information from the assay library. The most distant sample cluster includes essentially all the ABM samples. The remaining sample clusters are more similar, but the general trend supports separation of most of the VM samples from the control samples. We observe that numerous neutrophil-, plasma- and brain-associated proteins constitute on average 15-20 % of the protein intensity, where in particular the neutrophil proteins increased during ABM (Supplementary Figure 1B). Plasma proteins are known to be major constituents of CSF under physiological conditions (Guldbrandsen et al, 2014) and the presence of brain-associated proteins in CSF is likely related to protein turnover in the brain. Statistical analysis between sample groups reveals 79 significantly induced or repressed proteins (Fig. 2B-D). The majority of significantly altered proteins were associated with neutrophils (32 % in ABM and 14 % in VM), the brain (44 % in ABM and 32 % in VM) and some plasma proteins including several acute-phase proteins (Supplementary Fig. 2A-B). In addition, several neutrophil-associated proteins were induced ≥ 64-fold (yellow dots) but were not statistically significant, indicating a large degree of variation associated with these proteins. For each group, the average intensities of all neutrophil proteins show a significant increase only in ABM compared to controls (Fig. 2F), whereas there is a significant decrease in brain-associated proteins in ABM, VM and TBE (Fig. 2E). Among the gene ontology (GO) terms enriched for the repressed proteins in ABM was “regulation of synapse maturation and assembly”, suggesting changes on the cellular level in neuronal circuits during ABM (Supplementary Fig. 2C). Of the induced proteins, notable GO terms were “regulation of apoptotic signaling pathway”, “defense response to bacterium” and “glial cell development”. These suggest increased apoptotic pathways and enriched molecular processes specialized in combating bacteria in the CSF of patients with ABM, followed by increased development of glial cells, which are involved in maintaining chemical homeostasis and act as the immune cells of the CNS (Ransohoff & Brown, 2012).
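The significance calls in this section rest on Benjamini-Hochberg corrected t-tests (see Statistical analysis). As a minimal sketch of the correction step only, the following Python function adjusts a list of p-values; the example p-values and the 0.05 cut-off are illustrative assumptions and are not taken from the study.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the original order."""
    m = len(p_values)
    # Indices of the p-values, sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted

# Hypothetical per-protein p-values (not data from this study).
p = [0.001, 0.008, 0.039, 0.041, 0.60]
adj = benjamini_hochberg(p)
# Assumed significance level of 0.05; the text does not state the cut-off.
significant = [a < 0.05 for a in adj]
```

In this toy example the third and fourth proteins have raw p-values below 0.05 but are no longer called significant after correction, which is the kind of effect that can leave strongly induced but variable proteins outside the significant set.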
Cross-comparison of CSF protein composition between ABM, BM and VM
To evaluate differences in the host response between the sample groups we plotted the differentially expressed proteins in scatter plots, with the corresponding log2 fold change for ABM/control on the x-axis and log2 fold change for BM/control on the y-axis (Fig. 3A). This cross-plot was repeated for ABM against VM (Fig. 3B) and VM against BM (Fig. 3C). Proteins that were statistically significant or associated with high fold changes (≥ 64) in both sample groups were colored in black, and those meeting these criteria in only one group were shown in that group's color (red, ABM; orange, BM; green, VM). This cross-comparison reveals that the responses evoked by ABM and VM are more similar to each other than to the response evoked by BM. In total, 49 statistically significant proteins were unique to ABM, 10 were unique to VM, and 16 were shared between the two disease groups (Figure 3D). Only three proteins, chitinase-3-like protein 2 (CHI3L2), immunoglobulin mu heavy chain disease protein (MUCB), and profilin-1 (PROF1), were found at higher levels in all three diseased groups compared to control (Fig. 3E), although the intensities for these proteins are substantially higher in the ABM samples. Previous studies have shown an association between these proteins and neurological disease (Mollgaard et al, 2016; Narayanan et al, 2016; Opsahl et al, 2016). The inferred tissue assignment and average intensities for the 49 ABM-specific proteins in all samples are presented in Figure 3F and for the 10 VM-specific proteins in Figure 3G. We observed that neutrophil proteins are distinctive for ABM, but not for VM. In addition, ABM also led to increased levels of acute-phase proteins such as serum amyloid A-1 (SAA1) and C-reactive protein (CRP). In contrast, VM resulted in increased concentrations of C4b-binding protein and C-X-C motif chemokine 10 (CXCL10), where the latter chemokine is elevated in plasma during viral infections.
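The cross-plot logic can be illustrated with a short Python sketch. It computes a log2 fold change against control and assigns a protein to a color category depending on whether it meets the criteria (significant, or induced at least 64-fold, i.e. log2 fold change of at least 6) in one or both groups. The function names and the intensities are hypothetical, and treating the 64-fold cut-off as applying to induction only is an assumption.

```python
import math

HIGH_FOLD = 64.0  # "high fold change" cut-off quoted in the text; log2(64) = 6

def log2_fold_change(disease_mean, control_mean):
    """log2 ratio of mean intensities (disease group over control)."""
    return math.log2(disease_mean / control_mean)

def flagged(lfc, significant):
    """A protein counts for a group if significant or induced >= 64-fold."""
    return significant or lfc >= math.log2(HIGH_FOLD)

def colour_class(flag_x, flag_y):
    """Category of one protein in a two-group cross-plot."""
    if flag_x and flag_y:
        return "both"
    if flag_x:
        return "x-axis group only"
    if flag_y:
        return "y-axis group only"
    return "neither"

# Hypothetical mean intensities for one protein (not values from the study).
lfc_abm = log2_fold_change(6400.0, 100.0)   # ABM versus control
lfc_bm = log2_fold_change(200.0, 100.0)     # BM versus control
category = colour_class(flagged(lfc_abm, False), flagged(lfc_bm, False))
```

Here the protein is induced 64-fold in ABM but only 2-fold in BM, so it would be drawn in the ABM color in an ABM-versus-BM cross-plot.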
Predictive proteomic patterns using LASSO regression modeling
To define host response protein patterns specific for pathogen-type, we used LASSO regression to select discriminatory protein intensities from the compendium of CSF proteome maps. In total, we included the four broader groups, ABM, BM, VM and the control group. In addition, we included TBE, as this group was associated with a distinct molecular host-response. We used 4-fold cross-validation to build a model for each of the 5 groups to distinguish the groups from each other. To assess the stability of our model, we repeated the process 100 times (runs), generating 2000 models in total (4 folds x 5 groups x 100 runs). All of the tested groups had a mean area under the receiver operating characteristic curve (AUROC) of at least 0.80 and two of them had an AUROC above 0.90 (Fig. 4A). The protein profiles for ABM and control were reproducible across the 100 runs and specific, with a mean AUROC of 0.96 for ABM and 0.95 for controls. The models for BM and VM had a lower mean AUROC of 0.85 and 0.80, respectively. TBE displayed the most stable and discriminatory AUROC of 0.87 of all specific causative-agent subgroups, indicating that TBE evokes a distinct host response. In total, 18 proteins had a non-zero weight in ≥ 90 % of the 100 runs in every fold and were thus consistently used in the LASSO models for discriminating different sample groups. For these 18 predictive proteins, we visualized the distribution of the respective protein contribution to the models as boxplots colored by the average weight (coefficient) over all 100 runs (Fig. 4C). Proteins with a non-zero coefficient influenced the model and are regarded as predictive proteins for the disease group as shown by the fill color of the box plots. The average intensity of the 18 proteins shows the differences in the abundance levels in their respective sample group (Fig. 4D).
Five of the 18 proteins were in general detected in higher amounts in ABM and BM and include proteins such as gelsolin (GEL), cathepsin D (CATD) and transthyretin (TTHY) that are involved in neutrophil degranulation according to the Reactome Pathway Database. Nine of the proteins were exclusively elevated in ABM and include proteins involved in inflammatory response (A2GL, HPT, A1T1), fibrin clot formation (HEP2, PROS), LPS binding (CD14) and regulation of the complement cascade (CFAH). Notably, these proteins are all found in lower concentrations in the control samples and were mostly predictive for ABM and/or the control samples. In contrast, BM, VM and TBE in general have less discriminatory weight coefficients. For TBE, monocyte differentiation antigen CD14 was elevated compared to the controls. Antithrombin-III, a serine protease inhibitor that regulates the blood coagulation cascade, was associated with a high weight coefficient and a relatively high abundance level in the VM group. In conclusion, these results demonstrate that LASSO regression can select a set of predictive proteins to classify different disease etiologies in meningitis, which was not possible with any of the single-protein biomarker candidates quantified in this study.
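The selection rule used in this section, retaining proteins that receive a non-zero weight in at least 90 % of the fitted models, can be sketched as follows. This is an illustrative Python outline, not the analysis code used in the study; the function name, the data layout (one dictionary of coefficients per model) and the toy coefficients are assumptions.

```python
from collections import Counter

def stable_proteins(models, min_fraction=0.9):
    """Proteins with a non-zero coefficient in >= min_fraction of the models.

    models: list of {protein: coefficient} dicts, one per fitted model.
    """
    counts = Counter()
    for weights in models:
        for protein, coef in weights.items():
            if coef != 0.0:
                counts[protein] += 1
    return sorted(p for p, n in counts.items()
                  if n / len(models) >= min_fraction)

# Ten toy models with made-up coefficients (not results from the study):
# CD14 is always used, CRP is dropped once, ALBU is used in half the models.
toy_models = []
for i in range(10):
    toy_models.append({
        "CD14": 0.8,
        "CRP": 0.0 if i == 0 else 0.3,
        "ALBU": 0.1 if i < 5 else 0.0,
    })

selected = stable_proteins(toy_models)
```

With these toy inputs, CD14 (10 of 10 models) and CRP (9 of 10) pass the 90 % threshold, while ALBU (5 of 10) does not.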
Longitudinal follow-up of protein levels in ABM and subarachnoidal hemorrhage
Meningitis is a disease that is associated with a risk of developing severe neurological sequelae in survivors, and some studies have shown that the levels of certain proteins, such as chitotriosidase, complement C1q tumor necrosis factor-related protein 9 and haptoglobin, remained high in the CSF from non-survivors (Goonetilleke et al, 2010). To investigate changes of the CSF proteome over time, we extended the compendium of proteome maps with an additional total of 36 CSF samples from ABM (6 patients) and from patients with subarachnoidal hemorrhage (SAH, 4 patients). From these patients, multiple CSF samples were collected up to 10-13 days after admittance to hospital (Fig. 5A). Four LASSO-predicted proteins for ABM remained elevated until the end of the 10-day period (Fig. 5B), suggesting that sampling over a longer time period is required to observe the protein levels returning to baseline. The brain proteins with significantly lower levels in ABM started to increase during the extended follow-up times, peaking around day 5 (Fig. 5C). In contrast, the proteins with significantly lower abundance in ABM compared to control cases show a slight increase over time (Fig. 5D), whereas the proteins with higher levels in ABM compared to control decreased in ABM over the 10-day period (Fig. 5E). The levels of neutrophil proteins in ABM started to decrease but showed a high degree of variability (Fig. 5F). In all cases, protein levels were low or absent in SAH. Interestingly, the LASSO-predicted proteins used for discriminating ABM tend to stay elevated even after the patients were released from hospital care. These results indicate an ongoing complex host response process that is unique to infectious neurological disease, such as ABM, and is absent in other types of neurological trauma such as SAH. Data on protein levels over a longer period of time could possibly help in understanding the neurological sequelae that follow some cases of meningitis.
Discussion
Here we present an extendable compendium of CSF proteome maps from a cohort of meningitis patients as a resource to define specific host response patterns in meningitis. The quantified proteins have median CVs for technical replicates below 20 % as previously shown (Guo et al, 2015). The compendium presented in this study comprises a total of 171 CSF samples with different sample- and time-dimensions generated from <50 μl of CSF. In SWATH-MS, the post-acquisition targeted data analysis strategy using previously established assay libraries confirms presence and relative abundance of proteins. Once acquired, the compendium of SWATH-MS maps can be iteratively re-analyzed in silico to test new hypotheses. In this study, we used a highly curated assay library to search for tissue or cell enriched proteins informative for host response patterns associated with ABM, BM, TBE, and VM.
In severe infectious diseases, several factors may influence the type and magnitude of the host response, such as type of infecting pathogen, host immune status and time of infection. The dysregulated host responses during sepsis and meningitis can be detrimental to the host (Iskander et al, 2013; Ward et al, 2008). At the same time, altered protein composition of CSF provides an opportunity to probe disease status of the complex interplay of factors driving the host response. Damage derived from the infectious process and/or the host response may generate detectable protein changes associated with for example organ damage (Malmstrom et al, 2015). In this study, we used LASSO regression to define protein patterns capable of discriminating between ABM, BM, VM, TBE and controls. Based on an assay library containing information of tissue and cell enriched proteins, the different disease groups had noticeable differences in protein patterns associated with neutrophils, blood plasma and proteins predominantly present in the brain. The increased levels of neutrophil proteins are a consequence of infiltrating activated neutrophils, known to occur predominantly during ABM (Hoffman & Weber, 2009; Tunkel & Scheld, 1993). Neutrophils may release decondensed extracellular DNA coated with antimicrobial proteins, called neutrophil extracellular traps (NETs) (Moorthy et al, 2016). A recent study showed high levels of NETs in the CSF of patients with pneumococcal meningitis and that disrupting NETs using DNase I significantly reduces bacterial load, demonstrating that NETs contribute to the pathogenesis of pneumococcal meningitis (Mohanty et al, 2019). It is noteworthy that both ABM and VM lead to reduced levels of brain specific proteins, although the overlap of the reduced proteins between ABM and VM is small.
The reason for this reduction is not clarified although it was previously reported that brain proteins decrease in CSF during other neurological disorders, such as Huntington’s disease (Fang et al, 2009).
The large number of disease-causing pathogens in meningitis leads to high heterogeneity in both patients and disease progression, and this variability can be seen in other severe infectious diseases such as sepsis. In order to account for this clinical diversity in medical research, traditional individual biomarker studies can be replaced with multi-protein panels to provide better coverage of the underlying disease. The LASSO regression model resulted in 18 predictive proteins. Proteins predictive of ABM include cathepsin D (CATD) and complement factor H (CFAH), both of which are known plasma proteins involved in clearance of specifically bacterial infections (Bewley et al, 2011; Haapasalo et al, 2012). Furthermore, one predictive protein of VM, antithrombin-III (ANT3), has been shown to have broad-spectrum anti-viral activity against various human cytomegaloviruses and herpes simplex viruses (Quenelle et al, 2014). These results together indicate that large-scale multi-protein panels can yield biologically and clinically valuable results that are difficult to achieve with traditional statistical analyses.
The high reproducibility of SWATH-MS is a consequence of the data-independent acquisition, which generates fragment-ion spectra from all MS-measurable peptides from a proteome. This feature can support extension of the compendium to generate a digital representation of physical biobanks. In total, the sample cohort in this study consists of patients infected with 19 different pathogens, where the three largest groups were Streptococcus pneumoniae, enteroviruses and TBE. Among these pathogen groups, the LASSO regression generated the highest AUROC for TBE although the sample size was relatively small. There are currently no biomarkers for detecting and diagnosing TBE (Bogovic & Strle, 2015). Future extension of the compendium will enable further investigations of pathogen-specific host responses for other pathogens in CSF. In addition, the opportunity to re-analyze the compendium in silico based on improved assay libraries will provide opportunities to find new protein-patterns correlating with other types of clinical information such as disease outcome or the development of neurologic sequelae. This goal can be achieved through either a data-driven process where samples are clustered based on similarities in the proteome changes or in a hypothesis driven fashion where the compendium is interrogated for protein profiles correlating with for example pathogen-type, infection time or disease severity. The increasing efforts to construct complete assay libraries of the human proteome (Kim et al, 2014; Matsumoto et al, 2017; Zolg et al, 2017) will provide improved definition of proteins enriched in particular cells and tissues and will enable quantification of additional proteins, post-translational modifications (Rosenberger et al, 2017) and proteolytically processed proteins from the already established compendiums of proteome maps.
Ultimately such a resource can be anticipated to enable improved correlation between host responses and detectable changes in CSF and potentially blood plasma to identify molecular markers that can be used for the development of new diagnostic, prognostic and treatment decisive information for severe infectious diseases.
Materials and Methods
Patients and CSF samples
CSF samples (n=171) from a total of 139 patients from two different cohorts were analyzed. Patients with a clinical suspicion of meningitis who underwent a lumbar puncture with CSF collection were included; they were enrolled in a prospective study at the Clinic for Infectious Diseases, Lund University Hospital, Lund, Sweden, between March 2006 and November 2009 (as previously described; Linder et al, 2011). The patients were categorized into the following groups: ABM (n=35), BM (n=7), VM (n=21, of which TBE n=5), suspected ABM (n=5), suspected VM (n=16) and inflammation with no infection (n=2). Control patients had suspected meningitis, but were declared healthy after displaying a normal CSF WBC count (<5 x 10^6 /L) (n=49). In addition, longitudinal samples were collected from six of the previously mentioned ABM patients (6 original samples and 14 additional longitudinal samples) and from 4 patients with subarachnoidal hemorrhage (SAH, 22 longitudinal samples).
Ethics statement
The medical ethics committees (Institutional Review Boards) of Lund University approved the study (decision numbers 790/2005 and 2016/672), and all samples were taken with the informed consent of the participants or next of kin.
CSF sample preparation
A constant volume of 50 μl of each CSF sample was used so that the measured protein intensities reflect the protein concentration present in each sample. Samples were heat-inactivated by incubation on a heat block for 5 min at 80 °C to kill any microorganisms present in the samples, and then transferred into lysing matrix tubes (Nordic Biolabs) containing 90 mg silica beads (0.1 mm). The cells were homogenized with a cell disruptor (BeadBeater, FastPrep 96, MP Biomedicals) twice for 180 seconds. The samples were incubated for 30 min at 37 °C in 10 M urea, 1 M ABC (ammonium bicarbonate) and 1 μg trypsin (Sequencing Grade Modified Trypsin, porcine, Promega) for denaturation and tryptic cleavage. Samples were further incubated in 10 M urea and 1 M ABC for 30 min, after which large unfolded proteins were spun down by centrifugation. The supernatants were reduced by incubation for 60 min at 37 °C in 500 mM TCEP (tris(2-carboxyethyl)phosphine, Sigma Aldrich), and alkylated by incubation with 500 mM IAA (2-iodoacetamide, Sigma Aldrich). Samples were diluted in 100 mM ABC and incubated overnight with 1 μg trypsin, after which trypsin was inactivated by addition of formic acid. C18 columns (Vydac UltraMicro Spin™ Silica C18 300Å) were used according to the manufacturer’s instructions to clean up and concentrate the peptide samples.
LC-MS/MS analysis
All peptide analyses were performed on a Q Exactive Plus mass spectrometer (Thermo Fisher Scientific) connected to an EASY-nLC 1000 ultra-high-performance liquid chromatography system (Thermo Fisher Scientific). For shotgun analysis, peptides were separated on an EASY-Spray column (Thermo Scientific; ID 75 μm x 25 cm, column temperature 45 °C). Column equilibration and sample load were performed using constant pressure at 600 bar. Solvent A was used as stationary phase (0.1 % formic acid). Solvent B (mobile phase; 0.1 % formic acid, 100% acetonitrile) was used to run a linear gradient from 5 % to 35 % over 60 min at a flow rate of 300 nl/min. One full MS scan (resolution 70,000 @ 200 m/z; mass range 400-1,600 m/z) was followed by MS/MS scans (resolution 17,500 @ 200 m/z) of the 15 most abundant ion signals (TOP15). The precursor ions were isolated with 2 m/z isolation width and fragmented using higher-energy collisional-induced dissociation at a normalized collision energy of 30. Charge state screening was enabled and unassigned or singly charged ions were rejected. The dynamic exclusion window was set to 10 s. Only MS precursors that exceeded a threshold of 1.7e4 were allowed to trigger MS/MS scans. The ion accumulation time was set to 100 ms (MS) and 60 ms (MS/MS) using an AGC target setting of 1e6 (MS and MS/MS).
For data-independent acquisition (DIA), peptides were separated using an EASY-Spray column (Thermo Fisher Scientific; ID 75 μm x 25 cm, column temperature 45 °C). Column equilibration and sample load were performed at 600 bar. Solvent A was used as stationary phase (0.1 % formic acid). Solvent B (mobile phase; 0.1 % formic acid, 100% acetonitrile) was used to run a linear gradient from 5 % to 35 % over 120 min at a flow rate of 300 nl/min. A full MS scan (resolution 70,000 @ 200 m/z; mass range from 400 to 1,200 m/z) was followed by 32 MS/MS full fragmentation scans (resolution 35,000 @ 200 m/z) using an isolation window of 26 m/z (including 0.5 m/z overlap between the previous and next window). The precursor ions within each isolation window were fragmented using higher-energy collisional-induced dissociation at normalized collision energy of 30. The automatic gain control was set to 1e6 for both MS and MS/MS with ion accumulation times of 100 ms (MS) and 120 ms (MS/MS). The obtained raw files were converted to mzML using MSConvert (Kessner et al, 2008).
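The DIA window arithmetic can be checked with a short sketch: 32 windows spanning 400 to 1,200 m/z give 25 m/z of unique coverage each, and extending every window by 0.5 m/z on both sides yields the stated 26 m/z width. The Python code below assumes evenly spaced windows with a symmetric overlap; the exact window boundaries used on the instrument are not given in the text, so the values are illustrative.

```python
def dia_windows(start=400.0, end=1200.0, n_windows=32, overlap=0.5):
    """Evenly spaced isolation windows as (lower, upper) m/z bounds.

    Each window is widened by `overlap` on both sides so that it overlaps
    with its previous and next neighbour.
    """
    step = (end - start) / n_windows  # 25 m/z of unique coverage per window
    windows = []
    for i in range(n_windows):
        lower = start + i * step - overlap
        upper = start + (i + 1) * step + overlap
        windows.append((lower, upper))
    return windows

windows = dia_windows()
# All windows share the same width: 25 + 0.5 + 0.5 = 26 m/z.
widths = {round(hi - lo, 6) for lo, hi in windows}
```

Under these assumptions the first window covers 399.5 to 425.5 m/z and the last covers 1174.5 to 1200.5 m/z.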
Shotgun mass spectrometry analysis
The shotgun MS data was searched with Trans-Proteomic Pipeline (TPP, v4.7 POLAR VORTEX rev 0, Build 201405161127) using X!Tandem against the UniProt human reference proteome (UP000005640, Oct-2015, reviewed and canonical proteins only), and a reverse approach was used for the generation of decoy proteins. Cysteine carbamidomethylation was considered as a fixed and methionine oxidation as a variable modification, and enzyme specificity was set for trypsin to allow two missed cleavage sites. Variable acetylation of the N-termini, S-carbamoylmethyl-cysteine cyclization of N-terminal cysteines as well as pyro-glutamic acid formation from glutamic acid and glutamine were also allowed by X!Tandem. The precursor mass tolerance threshold was set to 20 ppm and the fragment mass tolerance to 50 ppm. The raw files were gzipped and Numpressed (Teleman et al, 2014) and converted to mzML format using msconvert from ProteoWizard (v3.05930 suite, (Chambers et al, 2012)).
Assay library creation and DIA analysis
Assay libraries were created using the Fraggle-Franklin-Tramler workflow (Teleman et al, 2017). In brief, fragment spectra from TPP search results were interpreted by the software tool Fraggle and assembled into a retention time normalized consensus assay library with a Franklin derived multi-level FDR of <0.01. Assays were generated by the software tool Tramler and contain only the 3-6 most intense fragments that lie within the mass range of 350-2000 m/z and do not fall within the precursor isolation window (Deutsch et al, 2012). The assay library was stored in TraML format. For DIA analysis, DIANA v2.0.0 was used (Teleman et al, 2015) with a 20 ppm extraction window. The generated data was manually curated to remove various immunoglobulin variable chain proteins.
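The fragment filtering rule described for assay generation can be illustrated with a small Python sketch. It is not the Tramler implementation; the function name, the data layout and the decision to return no assay when fewer than three fragments remain are assumptions made for illustration, and the fragment values are invented.

```python
def select_fragments(fragments, isolation_window, min_n=3, max_n=6,
                     mz_range=(350.0, 2000.0)):
    """Pick the most intense eligible fragments for one assay.

    fragments: list of (mz, intensity) tuples.
    isolation_window: (lower, upper) m/z bounds of the precursor window.
    """
    lo, hi = isolation_window
    eligible = [
        (mz, intensity) for mz, intensity in fragments
        if mz_range[0] <= mz <= mz_range[1] and not (lo <= mz <= hi)
    ]
    # Most intense fragments first, keep at most max_n of them.
    eligible.sort(key=lambda f: f[1], reverse=True)
    top = eligible[:max_n]
    # Assumption: with fewer than min_n fragments no assay is produced.
    return top if len(top) >= min_n else []

# Hypothetical fragment ions (m/z, intensity) for one peptide precursor.
fragments = [
    (300.2, 900.0), (410.3, 800.0), (512.3, 700.0), (625.4, 650.0),
    (738.4, 600.0), (851.5, 500.0), (964.6, 400.0), (1077.6, 300.0),
    (1190.7, 200.0),
]
assay = select_fragments(fragments, isolation_window=(399.5, 425.5))
assay_mz = [mz for mz, _ in assay]
```

In this example the fragment at 300.2 m/z is dropped for lying below 350 m/z, the fragment at 410.3 m/z is dropped for falling inside the precursor isolation window, and the six most intense of the remaining seven fragments are kept.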
Human tissue atlas
The human tissue atlas was used to statistically assign all detected proteins (n=771) to human tissue based on abundance. A total of 12 tissue assignments were used, which included the following tissues and cell types: adipose, brain, liver, nerve, erythrocytes, lymphocytes, macrophages, neutrophils, platelets and plasma. Proteins associated with other tissues that were available via the atlas were depicted here as “others”. A protein was depicted as “not classified” if the abundance could not be statistically associated with only one tissue, or if it was missing from the atlas altogether. The tissue assignments of certain proteins used in further analyses in this study were compared and matched against other, publicly available and published protein tissue-assignment repositories (Supplementary Table 1).
Statistical analysis
For statistical analyses, a Benjamini-Hochberg corrected t-test was used. Gene ontology annotations were performed by using two unranked lists of proteins with GOrilla (Eden et al, 2009). For the LASSO modeling, the proteomic data was first log-transformed, scaled and centered (i.e. log10 of the abundances, subtraction of the per-protein mean and division by the per-protein standard deviation). This data was then used as the input to the LASSO implementation of the LiblineaR package of the R statistical programming language (R version 3.6.1; LiblineaR version 2.10-8). A 4-fold cross-validation approach was used for model building, where the samples were split randomly into 4 balanced groups (same proportion of pathogen type as the whole cohort). For each of the 4 groups, the model was trained on 3/4 of the data and the performance was evaluated on the remaining 1/4, and this modeling was performed 100 times. Proteins that received a non-zero weight coefficient (i.e. the value of that protein influences the model) in ≥ 90 % of all models were selected for further analysis.
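The preprocessing and fold construction described above can be sketched in plain Python. The study itself used R with the LiblineaR package, so the code below is only an illustration of the two steps before model fitting; the function names, the use of the sample standard deviation, the random seed and the group sizes in the example are assumptions. Zero or missing intensities would need separate handling before the log transform, which the text does not describe.

```python
import math
import random
import statistics
from collections import defaultdict

def standardize(intensities):
    """log10-transform, then centre and scale one protein across samples."""
    logged = [math.log10(x) for x in intensities]
    mean = statistics.mean(logged)
    sd = statistics.stdev(logged)  # sample standard deviation (assumption)
    return [(v - mean) / sd for v in logged]

def stratified_folds(labels, k=4, seed=0):
    """Split sample indices into k folds with balanced group proportions."""
    rng = random.Random(seed)
    by_label = defaultdict(list)
    for idx, label in enumerate(labels):
        by_label[label].append(idx)
    folds = [[] for _ in range(k)]
    for indices in by_label.values():
        rng.shuffle(indices)
        # Deal the shuffled samples of each group round-robin into the folds.
        for pos, idx in enumerate(indices):
            folds[pos % k].append(idx)
    return folds

# Illustrative labels using the four broad group sizes reported in the text.
labels = ["ABM"] * 35 + ["BM"] * 7 + ["VM"] * 21 + ["control"] * 49
folds = stratified_folds(labels)
abm_per_fold = [sum(labels[i] == "ABM" for i in fold) for fold in folds]
z = standardize([10.0, 100.0, 1000.0])
```

Each fold then serves once as the held-out quarter while a model is trained on the other three, and repeating the split with different seeds corresponds to the repeated runs described in the text.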
Supplementary Materials
Fig. S1. Overview of average protein content and intensities.
Fig. S2. Analysis of differences in the CSF of patients with meningitis.
Table S1. Cross-referencing the tissue assignments based on Malmström, et al. (unpublished work).
Table S2. The full data generated from 112 data-independent acquisition MS-runs.
Table S3. The full data generated from the data-independent acquisition MS-runs from the longitudinal study.
Funding
This work was supported by the Swedish Research Council (project 2015-02481), the Crafoord Foundation (grant 20100892), Stiftelsen Olle Engkvist Byggmästare, the Wallenberg Academy Fellow program KAW (2012.0178 and 2017.0271), a European Research Council Starting Grant (ERC-2012-StG-309831) and the Medical Faculty, Lund University.
Author contributions
A.B, A.L and J.M conceived and designed the study. A.L provided the clinical samples. A.B and T.M performed the laboratory experiments. A.B performed the MS-analysis. A.B, P.P and L.M performed bioinformatics analysis and analyzed the data. A.L provided clinical advice. A.B, T.M, P.P, L.M, A.L and J.M contributed to writing of the manuscript.
Competing interests
The authors declare no competing financial interests.