Abstract
To develop efficient therapies and identify novel early biomarkers for chronic kidney disease (CKD), an understanding of the molecular mechanisms orchestrating it is essential. Here, we set out to understand how differences in CKD origin are reflected in gene expression. To this end, we integrated publicly available human glomerular microarray gene expression data for nine kidney disease entities that account for a majority of CKD worldwide. We included data from five distinct studies and compared glomerular gene expression profiles to those of non-tumor parts of kidney cancer nephrectomy tissues. A major challenge was the integration of the data from different sources, platforms and conditions, which we mitigated with a bespoke stringent procedure. This allowed us to perform a global transcriptome-based delineation of different kidney disease entities, obtaining a landscape of their similarities and differences based on the genes that acquire a consistent differential expression between each kidney disease entity and nephrectomy tissue. Furthermore, we derived functional insights by inferring activity of signaling pathways and transcription factors from the collected gene expression data, and identified potential drug candidates based on expression signature matching. We validated representative findings by immunostaining in human kidney biopsies, indicating e.g. that the transcription factor FOXM1 is significantly and specifically expressed in parietal epithelial cells in RPGN, whereas it is not expressed in control kidney tissue. These results provide a foundation to comprehend the specific molecular mechanisms underlying different kidney disease entities, which can pave the way to identify biomarkers and potential therapeutic targets. To facilitate this, we provide our results as a free interactive web application: https://saezlab.shinyapps.io/ckd_landscape/.
Translational Statement
Chronic kidney disease is a combination of entities with different etiologies. We integrate and analyse glomerular transcriptomics data from different entities to dissect their distinct pathophysiology, which might help to identify novel entity-specific therapeutic targets.
1. Introduction
Chronic Kidney Disease (CKD) is a major public health burden affecting more than 10 % of the population globally 1. There is no specific therapy and the associated costs are enormous 2. The origin of CKD is heterogeneous and has slowly changed in recent years due to an aging population with an increased number of patients with hypertension and diabetes. Major contributors to worldwide CKD include Diabetic nephropathy (DN) and Hypertensive nephropathy (HN). Other contributors are immune diseases such as Lupus Nephritis (LN) and glomerulonephritides including IgA nephropathy (IgAN), Membranous glomerulonephritis (MGN), Minimal Change Disease (MCD) as well as Focal Segmental Glomerulosclerosis (FSGS) and Rapidly progressive glomerulonephritis (RPGN).
Regardless of the type of initial injury to the kidney the stereotypic response to chronic repetitive injury is scar formation with subsequent kidney functional decline. Scars form in the tubulointerstitium as tubulointerstitial fibrosis and in the glomerulus as glomerulosclerosis. Despite this stereotypic response the initiating stimuli are quite heterogeneous, ranging from an auto-immunological process in LN to poorly controlled blood glucose levels in DN. A better understanding of similarities and differences in the complex molecular process orchestrating disease initiation and progression will guide the development of novel targeted therapeutics.
A powerful tool to understand and model the molecular basis of diseases is the analysis of genome-wide gene expression data. This has been applied in the context of various kidney diseases contributing to CKD 3–7, and most studies are available in the online resource NephroSeq. However, to the best of our knowledge, no study so far has combined these data sets to build a comprehensive landscape of the molecular alterations underlying different kidney diseases that account for the majority of CKD cases. We collected data from five large studies with microarray gene expression data from kidney biopsies of patients of eight different glomerular disease entities leading to CKD (from hereon referred to as CKD entities), FSGS, MCD, IgAN, LN, MGN, DN, HN and RPGN. We normalized the data with a bespoke stringent procedure, which allowed us to study the similarities and differences among these entities in terms of deregulated genes, pathways, and transcription factors, as well as to identify drugs that revert their expression signatures and thereby might be useful to treat them.
2. Results
2.1. Assembly of a pan-CKD collection of patient gene expression profiles
We searched Nephroseq (www.nephroseq.org) and Gene Expression Omnibus (GEO) 8,9 and identified five studies - GSE20602 10; GSE32591 11; GSE37460 11; GSE47183 12,13; GSE50469 14 (see section 4.1.) - with human microarray gene expression data for nine different glomerular disease entities: FSGS, MCD, IgAN, LN, MGN, DN, HN and RPGN, as well as healthy tissue and non-tumor part of kidney cancer nephrectomy tissues as controls (Figure 1A and B). In addition, in one dataset, patients were labeled as an overlap of FSGS and MCD (FSGS-MCD) and we left it as such. These studies were generated on two different microarray platforms. To jointly analyze and compare the different CKD entities, we performed a stringent preprocessing and normalization procedure involving quality control, either cyclic loess normalization or YuGene transformation, and a batch effect mitigation procedure (see Methods and Supplementary material). In the end, we kept 6289 genes from 199 samples in total. From the two potential controls, healthy tissue and nephrectomies, we chose the latter for further analysis as the batch mitigation removed a large number of genes from the healthy tissue samples.
2.2. Technical heterogeneity across samples
We first examined the similarities among the samples to assess potential batch effects. Data did not primarily cluster by study source or platform, which can be attributed to our batch mitigation procedure (Figure 1C, Supplementary Figure 1), although some technical sources of variance potentially still remained (see section 4.4. and Supplementary Figure 1). Samples from RPGN and FSGS-MCD conditions seemed to be more affected by platform-specific batch effects than samples from other conditions, due to the unbalanced distribution of samples: RPGN and FSGS-MCD samples were exclusively represented in one study and in one of the two platforms (Affymetrix Human Genome U133 Plus 2.0 Array (GPL570)). Therefore, the batch effect mitigation procedure could not be conducted on them.
2.3. Biological heterogeneity of CKD entities
We set out to find molecular differences among glomerular CKD entities. First, we calculated the differential expression of individual genes between the different CKD entities and tumor nephrectomy (TN) samples using limma 15,16. From the 6289 genes included in the integrated dataset, 1791 showed significant differential expression (|logFC| > 1, p-value < 0.05) in at least one CKD entity. RPGN was the CKD entity with the largest number of significantly differentially expressed genes (885), while MCD was the one with the fewest (75). Twelve genes showed significant differential expression across all the CKD entities (AGMAT, ALB, BHMT2, CALB1, CYP4A11, FOS, HAO2, HMGCS2, MT1F, MT1G, PCK1, SLC6A8). Interestingly, all these genes were underexpressed across all the CKD entities compared to TN. In contrast, the QKI and LYZ genes were significantly overexpressed in HN, IgAN, and LN, while significantly underexpressed in FSGS-MCD and RPGN (and DN for QKI). 107 different genes were significantly differentially expressed relative to TN in at least 6 CKD entities (Figure 2A). Of note, several of the above-mentioned genes are considered to be expressed mainly in the tubule. This is one drawback of the microdissection technique, and future studies using scRNA-seq will dissect which genes are specifically expressed in glomerular cells during homeostasis and disease.
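The thresholding step described above can be illustrated with a minimal Python sketch. It assumes a per-entity table of log fold changes and p-values versus TN (as limma would report) is already available. The table layout, the helper name `significant_genes` and all numbers are hypothetical and are not taken from the study.

```python
# Hypothetical limma-style results: entity -> gene -> (logFC vs. TN, p-value).
# All values are invented for illustration only.
results = {
    "RPGN": {"ALB": (-2.1, 0.001), "FOS": (-1.4, 0.010), "LYZ": (-1.2, 0.030)},
    "LN": {"ALB": (-1.6, 0.004), "FOS": (-0.6, 0.200), "LYZ": (1.3, 0.020)},
}


def significant_genes(table, lfc_cutoff=1.0, p_cutoff=0.05):
    """Return the genes passing |logFC| > lfc_cutoff and p < p_cutoff."""
    return sorted(
        gene
        for gene, (lfc, p) in table.items()
        if abs(lfc) > lfc_cutoff and p < p_cutoff
    )


# Significant genes per entity, then the genes shared by all entities.
per_entity = {entity: significant_genes(t) for entity, t in results.items()}
shared = set.intersection(*(set(genes) for genes in per_entity.values()))

print(per_entity)
print(sorted(shared))
```

Note that a gene can be shared across entities while changing in opposite directions (here the invented LYZ values), which is why the direction of change is reported separately in the text.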
To better comprehend the divergence and similarities of the CKD samples, we asked how the distinct CKD entities localised with respect to each other using a common set of differentially expressed genes with regard to the tumor nephrectomies using diffusion maps (Figure 2B). The diffusion distances of each given CKD entity sample relative to tumor nephrectomy samples reflect a non-linear lower dimensional representation of the differences in gene expression profiles between those samples. The diffusion map orders the patients along a “pseudo-temporal” order, which we interpret here to indicate disease progression severity in glomeruli 17.
The most distant condition from nephrectomy samples was RPGN, which is arguably the most drastic kidney disease condition with the most rapid functional decline among the entities included. Interestingly, healthy donor samples were distinct from tumor nephrectomy samples despite the fact that the latter were resected distantly from the tumors. This might be explained by either minor contamination with cancer cells or paraneoplastic effects on the non-affected kidney tissue such as immune cell infiltration, or solely by the fact that the nephrectomy tissue was exposed to short ischemia whereas the biopsy tissue from healthy donors was not. DN and LN were in close proximity to RPGN, whereas HN localised near IgAN. Differences were harder to assess in the middle of the diffusion map, but were visible when plotting the dimension components pair-wise (Sup. Figure 2). For instance, MCD samples spanned from a point proximal to tumor nephrectomy to near FSGS, but some MCD samples were in close proximity to MGN or even hypertensive nephropathy. While it makes sense that MCD, as a relatively mild disease with normal light microscopy, is relatively close to the control groups of TN and healthy living donors (HLD), it remains unclear why other disease entities such as LN and DN spread widely in the diffusion map. Unfortunately, the data we used did not include information about disease severity, which might help to explain this heterogeneity, with early stage disease possibly closer to the control groups and late stage disease closer to RPGN. Dimension component 1 (DC1) seems to offer a focus on the dissimilarity between the two reference healthy conditions, tumor nephrectomy and healthy living donor, and the CKD entities. Dimension component 2 (DC2) provides more insight into the disparity of the reference conditions. Dimension component 3 (DC3) discerns the subtle geometrical manifestation of the distinct CKD entities with regard to each other.
In summary, using diffusion maps we find clear differences in the global expression profiles of the CKD entities.
2.4. Transcription factor activity in CKD entities
To further characterize the differences among the CKD entities, we performed various functional analyses. First, we assessed the activity of transcription factors (TFs; Figure 3), based on the levels of expression of their known putative targets (see Methods). Changes in putative target genes provide superior estimates of TF activity compared with the expression level of the transcription factor itself 18,19 (Figure 3). We found 10 TFs differentially regulated in at least one CKD entity (Figure 3). Furthermore, we correlated the activities of the identified TFs with the expression of the genes encoding these TFs. The rationale was that factors with negative correlations potentially act as repressors, whereas those with positive correlations act as activators. Those with no correlation indicate factors whose activity may be substantially modulated by post-translational modifications, or factors for which the regulation or expression measurements are unreliable. For instance, Interferon regulatory factor-1 (IRF1) is significantly enriched in LN and moderately correlated (Spearman’s rho, rs = 0.624) with the expression level of the gene encoding IRF1. This suggests an as yet undiscovered potential role of IRF1 as a transcriptional activator in LN. In addition, IRF1’s transcriptional activity was elevated in LN compared to the other disease entities. The activity of the upstream stimulatory factor 2 (USF2) - a basic helix-loop-helix (bHLH) TF 20 - was estimated to be significantly decreased in MCD compared to the rest of the conditions. Interestingly, USF2’s estimated activity across the CKD entities was inversely correlated (Spearman’s rho, rs = −0.867) with the expression level of the gene USF2, which encodes the TF USF2. Intriguingly, USF2 has been implicated as a potential transcriptional modulator of angiotensin II type 1 receptor (AT1R) - associated protein (ATRAP/Agtrap) in mice 20.
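The correlation between estimated TF activity and the expression of the TF-encoding gene can be sketched as follows. This is a plain-Python illustration of Spearman's rank correlation, not the code used in the study. The vectors `tf_activity` and `tf_expression` are hypothetical per-entity values.

```python
from math import sqrt


def ranks(values):
    """Return 1-based ranks, assigning the average rank to tied values."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    out = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            out[order[k]] = average_rank
        i = j + 1
    return out


def spearman(x, y):
    """Spearman's rho: the Pearson correlation of the rank vectors."""
    rx, ry = ranks(x), ranks(y)
    mean_x, mean_y = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(rx, ry))
    var_x = sum((a - mean_x) ** 2 for a in rx)
    var_y = sum((b - mean_y) ** 2 for b in ry)
    return cov / sqrt(var_x * var_y)


# Hypothetical values for one TF across five CKD entities.
tf_activity = [2.1, 0.3, -0.5, 1.2, -1.8]    # estimated activity score
tf_expression = [1.5, 0.2, 0.4, 0.9, -0.7]   # logFC of the TF-encoding gene

rho = spearman(tf_activity, tf_expression)
print(round(rho, 3))
```

Under the interpretation used in the text, a clearly positive rho would be read as activator-like behaviour and a clearly negative rho as repressor-like behaviour.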
We next sought to validate the expression of two identified TFs in human tissue by immunostaining. We stained for USF-2 in human kidney biopsies from healthy controls and patients with MCD. USF-2 was expressed in podocytes, the mainly affected glomerular cell-type in MCD (Figure 4A-B). However, when compared to controls, USF-2 expression in podocytes showed no significant difference detectable by immunofluorescence (Figure 4C-C’’). The reason for this might be that USF-2 activity as a TF is regulated not only by its abundance in the nucleus but also by its DNA binding capability in the interaction with other proteins. FOXM1 is a transcription factor of the forkhead box family and a known regulator of cell cycle progression in normal cells as well as a predictor of adverse outcomes across 39 human malignancies 21. Our analysis suggests a highly increased activity of FOXM1 in RPGN (Figure 3). We next validated this observation in human biopsy samples from RPGN patients and normal controls. FOXM1 showed a unique expression in CD44 positive glomerular parietal epithelial cells in RPGN lesions, whereas we did not find any expression of FOXM1 in healthy human glomeruli (Figure 4D-F). Consistent with our TF activity analysis, quantification of this finding in 5 RPGN biopsies versus 6 controls yielded a highly significant difference (Figure 4F), suggesting that FOXM1 may play a role in RPGN progression. These data suggest that our computational method might be useful to identify novel regulators in CKD.
2.4. Signaling Pathway Analysis
We complemented the functional characterization of transcription factor activities with an estimation of pathways activities with the tools PROGENy 22 and Piano 23.
2.4.1. Pathway activity of CKD entities using PROGENy
PROGENy infers pathway activity by looking at the changes in levels of the genes affected by perturbation of pathways. This provides a better proxy of pathway activity than assessing the genes in the actual pathway 22. We used PROGENy scores to estimate pathway activity in a disease entity from the gene expression data (Figure 5A). Essentially, the degree of pathway deregulation was associated with the degree of disease severity, and the pathways presented rather divergent activities across the CKD entities. For example, the VEGF pathway was estimated to be significantly deregulated in five CKD entities: RPGN, HN, DN, LN and IgAN, out of which VEGF is predicted to be deactivated in RPGN and DN, but more prominently activated in HN, LN and IgAN. 10 out of 11 pathways were predicted to be significantly deregulated in RPGN with respect to TN, which is aligned with the diffusion map (Figure 2B) outcome; the divergence of RPGN from TN (control) was considerably more prominent at both the global transcriptome landscape and the signaling pathway level. Intriguingly, the JAK-STAT pathway did not appear to be affected in RPGN, but was considerably activated in LN and markedly deactivated in DN in comparison to TN. Overall, the separate CKD entities were characterised by distinct combinations, magnitudes and directions of signaling pathway activities according to PROGENy.
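The footprint idea behind this analysis can be sketched in a few lines of Python. The sketch assumes a linear model in which a pathway score is a weighted sum of the expression changes of pathway-responsive genes. The gene names, weights and statistics are hypothetical placeholders rather than actual PROGENy model coefficients, and the sketch omits any significance assessment.

```python
# Hypothetical footprint weights: pathway -> {responsive gene: weight}.
# Real PROGENy weights are derived from perturbation experiments; these
# placeholder values are invented for illustration.
weights = {
    "JAK-STAT": {"GENE_A": 2.0, "GENE_B": 1.0, "GENE_C": -0.5},
    "VEGF": {"GENE_C": 1.5, "GENE_D": -1.0},
}

# Hypothetical gene-level statistics for one CKD entity versus TN (e.g. logFC).
gene_stat = {"GENE_A": 1.0, "GENE_B": 0.5, "GENE_C": -1.0, "GENE_D": 2.0}


def pathway_scores(stat, footprint_weights):
    """Weighted sum of gene statistics per pathway; missing genes count as 0."""
    return {
        pathway: sum(w * stat.get(gene, 0.0) for gene, w in gene_weights.items())
        for pathway, gene_weights in footprint_weights.items()
    }


scores = pathway_scores(gene_stat, weights)
print(scores)
```

A positive score would be read as predicted activation of the pathway relative to the control and a negative score as predicted deactivation.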
2.4.2. Pathway enrichment with Piano
While PROGENy can give accurate estimates of pathway activity, it is limited to 11 pathways for which robust signatures could be generated 22. To get a more global picture, we complemented that analysis with a gene-set-enrichment analysis using Piano 23. A total of 160 pathways out of 1329 were significantly enriched (up-/down-regulated, corrected p-value < 0.05) in at least one CKD entity. HN was the entity with the largest number of differentially enriched pathways (81, 25 down-regulated, 56 up-regulated), while FSGS-MCD did not show significant enrichment for any pathway. Cell-cycle and immune-system related pathways were significantly up-regulated in 7/9 CKD entities (FSGS, HN, IgAN, LN, MGN and RPGN in both cases, DN for immune system, and MCD for cell-cycle); in contrast, the VEGF pathway was differentially enriched in LN only. Interestingly, the TNFR2 pathway was differentially enriched in IgAN, HN, and LN, in line with the results from PROGENy where the VEGF pathway was significantly deregulated not only in IgAN, HN and LN, but also in RPGN and DN. 59 different pathways showed significant enrichment in at least 3 CKD entities (Figure 5B). Figure 5B also shows that HN (52), MGN (45), and IgAN (37) are the CKD entities with the most pathways differentially enriched in at least 3 entities, a result that agrees with Figure 2B showing these entities in the center of the diffusion map.
2.5. Prediction of potential novel drugs that might affect the identified disease signature in different kidney diseases
Finally, we applied a signature search engine, L1000CDS2 24. L1000CDS2 prioritizes small molecules that are expected to have a reverse signature compared to the disease signature. This is based on computing the distance between the disease signature and the signatures in the LINCS-L1000 data, a large collection of changes in gene expression driven by drugs. We performed this analysis separately for the nine CKD entities and identified 220 small molecules across the CKD entities (Supplementary Figure 5). In order to narrow down the list of 220 small molecules, we focused on 20 small molecules observed in the L1000CDS2 output of at least 3 subtypes (Figure 6A).
By curation of scientific publications, we found that four small molecules have experimental evidence to support their clinical relevance in CKD or renal disease animal model testing (Supplementary Figure 7). BRD-K04853698 (LDN-193189), which is known as a selective bone morphogenetic protein signaling inhibitor, has been shown to suppress endothelial damage in mice with CKD 25. Wortmannin, a cell-permeable PI3K inhibitor, decreased albuminuria and podocyte damage in early diabetic nephropathy in rats 26. The tyrosine kinase inhibitor nilotinib is used to treat chronic myelogenous leukemia in humans 27. Nilotinib treatment resulted in stabilized kidney function and prolonged survival after subtotal nephrectomy in rats when compared to vehicle 28. Finally, narciclasine was identified; it has been reported to reduce macrophage infiltration and inflammation in the mouse unilateral ureteral obstruction (UUO) model of kidney fibrosis 29.
To further explore the association of these drugs with CKD and its progression, we analysed the expression data for the targets of the literature-supported drug candidates. First, each drug candidate was mapped to the genes that encode the proteins targeted by the drug (Figure 6B). For each gene, its differential expression in each CKD entity against TN was evaluated. Out of the 11 mapped genes, MYLK3, a target of narciclasine, was significantly differentially expressed (under-expressed, logFC<-1, p<0.05) in two CKD entities (IgAN and LN) (Supplementary figure 6). Complementarily, the screened drugs were mapped to the pathways they affect based on their functional information. The enrichment of this subset of pathways was evaluated using the previous results from the gene set analysis (Piano). Here, only the PI3KCI pathway, which is affected by the candidate repositioned drug wortmannin (a PI3K inhibitor), appeared to be enriched, namely in HN (up-regulated, p<0.05). Taken together, these data suggest that kidney transcriptomics might be useful to predict potential novel drug candidates.
3. Discussion
We have aimed to shed light on the commonalities and differences among glomerular transcriptomes of major kidney diseases contributing to the CKD epidemic affecting >10% of the population worldwide. Multiple pathologies are covered under the broad umbrella of CKD and, while they share a physiological manifestation, i.e. loss of kidney function, the driving molecular process can be different. In this study we explored these processes by analyzing glomerular gene expression data from kidney biopsies obtained via microdissection. In the glomerular dataset, we observed expression of many genes that are considered to be tubule specific, e.g. ALB and CALB1. The reason for this might be that microdissection techniques are imperfect, resulting in contamination of glomerular preparations with tubular tissue. Current technologies including scRNA-seq will help to dissect expression in particular cell-types of the glomerulus.
Genes such as Quaking (QKI) or Lysozyme C (LYZ) were significantly overexpressed, underexpressed or not altered depending on the underlying kidney disease. It is known that QKI is associated with angiogenic growth factor release and plays a pathological role in the kidney 30, while LYZ was known to be related to the extent of vascular damage and heart failure but was recently found to be increased in plasma during CKD progression 31. These data support the notion that, despite a stereotypic response of the kidney to injury with glomerulosclerosis, interstitial fibrosis and nephron loss, there are various disease-specific differences that are important to understand in order to develop novel personalized therapeutics.
CKD is a complex disease with a high degree of polygenicity. Furthermore, it is a very heterogeneous condition that can be acquired through a variety of biological mechanisms, which is reflected by the results of the pathway analysis. There was little to no overlap in significantly enriched pathways between the different kidney disease entities. We found 59 different pathways that showed significant enrichment in at least 3 disease entities (Figure 5B), indicating that different disease entities share some general mechanisms but their underlying pathophysiology differs from one entity to another. Besides increasing the interpretability, the pathway analysis identified many more differences among disease entities than the gene-level analysis (Figure 2A). For example, pathway analysis identified pathways related to the metabolism of lipids and lipoproteins as significantly down-regulated in MCD, MGN, and HN, and pathways related to fatty acid metabolism as significantly down-regulated in MCD, IgAN, MGN, and HN, results similar to those reported by Kang et al 6.
PROGENy (Figure 5A) predicted JAK-STAT, a major cytokine signal transduction regulator 32, to be significantly activated in LN with respect to TN, and DoRothEA (Figure 3) predicted the TFs IRF1 and STAT1 to be significantly enriched in LN. A pathogenic role of JAK-STAT/STAT1/Interferon signaling in LN is supported by various studies 33,34,35.
We also used the signature-matching paradigm to explore potential drugs that could revert the disease phenotype, and found that four drugs hold promise in different CKD entities. Even though more experimental validation is required to clarify how the identified drugs affect CKD progression, our approach suggests that it is possible to find promising treatments for CKD via drug repositioning. In particular, one of the identified drugs, nilotinib, has already been approved for use in humans in leukemia, and there are supporting data on its potential value in CKD 28.
The analysis of the drug targets’ expression found that MYLK3, a gene encoding for one of the targets of narciclasine, was significantly underexpressed in IgAN and LN when compared with TN. Similarly, the PI3KCI pathway, the target of Wortmannin was enriched in HN (up-regulated, p<0.05). This analysis attempted to refine the outcome of the repositioning analysis, and at the same time helped to connect it to the disease mechanism both at the gene as well as the pathway level.
We view our analysis as a first step towards a characterization of the similarities and differences of the various pathologies that lead to CKD. As more data sets become available, either from microarrays or RNA-seq, these can be integrated in our pipeline. Furthermore, the burgeoning field of single-cell RNA (scRNA) sequencing has just started to produce data sets in kidney 36,37 and holds the potential to revolutionize our understanding of the functioning of the kidney and its pathologies 38,39. In particular, scRNA data can provide signatures of the many cell types of the kidney, which in turn can be used to deconvolute the composition of cell types 12 in the more abundant and cost-effective bulk expression datasets 39. Other data sets, such as (phospho)proteomics 40 and metabolomics 41, may complement gene expression towards a more complete picture of the CKD entities. Ideally, all these data sets would be collected in a standardized manner to facilitate integration, which was a major hurdle in our study. Such a comprehensive analysis across large cohorts, akin to what has happened for the different tumour types thanks to initiatives such as the International Cancer Genome Consortium, can lead to major improvements in our understanding of and treatment avenues for CKD 42.
4. Methods
4.1. Data collection
Raw data CEL files of each microarray dataset - GSE20602 10; GSE32591 11; GSE37460 11; GSE47183 12,13; GSE50469 14 - were downloaded and imported to R (R version 3.3.2). For more information see Supplementary Methods.
4.3. Normalization
Cyclic loess normalization was applied using the limma package 16,43,44. YuGene transformation was carried out using the YuGene R package 45.
4.4. Detection of genes with consistently small p-values across all studies
Based on the assumption that common mechanisms might contribute to all CKD entities, we applied the Maximum p-value (maxP) method 46 - which uses the maximum p-value as the test statistic - to the output of the differential expression analysis of the hypothetically separate studies. For more information see Supplementary Methods.
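The maxP idea can be illustrated with a short Python sketch. It relies on a textbook property, not on the Supplementary Methods: if the K per-study p-values of a gene are independent and uniform under the null hypothesis, the probability that their maximum is at most t equals t^K, so the combined p-value is the maximum raised to the power K. The gene names and p-values are hypothetical, and the independence assumption may not hold for the actual data.

```python
def maxp_combined(p_values):
    """Combined p-value of the maxP statistic under independence.

    Under the null, P(max of K uniform p-values <= t) = t ** K.
    """
    k = len(p_values)
    return max(p_values) ** k


# Hypothetical per-study p-values for two genes across three comparisons.
gene_pvals = {
    "GENE_X": [0.01, 0.02, 0.04],   # consistently small p-values
    "GENE_Y": [0.001, 0.5, 0.003],  # one large p-value dominates
}

combined = {gene: maxp_combined(p) for gene, p in gene_pvals.items()}
print(combined)
```

Because the statistic is driven by the largest p-value, a gene is only called significant when it is consistently supported in every comparison, which matches the stated aim of detecting genes with consistently small p-values.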
4.6. Diffusion map
The batch-mitigated data, restricted to the 1790 genes identified by maxP (section 4.5.; FDR < 0.01; Supplementary Material/Data and Code), were YuGene transformed 45, and the destiny R package 47 was used to produce the diffusion maps.
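To make the diffusion map concept concrete, the following plain-Python sketch computes the first non-trivial diffusion component for a toy data set. It assumes a Gaussian kernel with a fixed width `sigma` and uses power iteration. The destiny package makes its own kernel and normalisation choices, which may differ, so this is an illustration of the general technique and not a reimplementation. The points are invented.

```python
from math import exp, sqrt


def first_diffusion_component(points, sigma=1.0, iterations=200):
    """First non-trivial diffusion component of a toy data set."""
    n = len(points)
    # Gaussian kernel on squared Euclidean distances between samples.
    kernel = [
        [exp(-sum((a - b) ** 2 for a, b in zip(p, q)) / (2 * sigma ** 2))
         for q in points]
        for p in points
    ]
    degree = [sum(row) for row in kernel]
    # S = D^-1/2 K D^-1/2 is symmetric and has the same eigenvalues as the
    # Markov transition matrix P = D^-1 K.
    sym = [[kernel[i][j] / sqrt(degree[i] * degree[j]) for j in range(n)]
           for i in range(n)]
    # Trivial top eigenvector of S (eigenvalue 1), proportional to sqrt(degree).
    norm0 = sqrt(sum(degree))
    v0 = [sqrt(d) / norm0 for d in degree]
    vec = [float(i + 1) for i in range(n)]  # deterministic starting vector
    for _ in range(iterations):
        # Remove the trivial direction, apply S, then renormalise.
        dot = sum(a * b for a, b in zip(vec, v0))
        vec = [a - dot * b for a, b in zip(vec, v0)]
        vec = [sum(sym[i][j] * vec[j] for j in range(n)) for i in range(n)]
        norm = sqrt(sum(a * a for a in vec))
        vec = [a / norm for a in vec]
    # Convert the eigenvector of S into the right eigenvector of P.
    return [v / sqrt(d) for v, d in zip(vec, degree)]


# Toy "samples": two well-separated groups along a single expression axis.
samples = [(0.0,), (0.1,), (0.2,), (5.0,), (5.1,), (5.2,)]
dc1 = first_diffusion_component(samples)
print([round(v, 3) for v in dc1])
```

In this toy case the component takes one sign for the first group and the opposite sign for the second, which is the kind of separation the diffusion components are used to visualise.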
4.7. Functional Analysis
4.7.1. Transcription factor activity analysis
We estimated transcription factor activities in the glomerular CKD entities using DoRothEA 18, which is a pipeline that estimates transcription factor activity via the expression level of its target genes, utilizing a curated database of transcription factor - target gene interactions (TF Regulon). For more information see Supplementary Methods.
4.7.2. Inferring Signaling Pathway Activity using PROGENy
We used the cyclic loess normalised and batch effect mitigated expression values for PROGENy 22, a method which utilizes downstream gene expression changes due to pathway perturbation in order to infer the upstream signaling pathway activity. For more information see Supplementary Methods.
4.7.3. Pathway Analysis with Piano
Pathway analysis was performed using the piano package from R 23. For more information see Supplementary Methods.
4.8. Drug repositioning
For each CKD entity, the signature of cosine distances computed by characteristic direction was submitted to a signature search engine, L1000CDS2 24, configured in reverse mode.
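The principle of reverse signature matching can be sketched in Python as ranking drug signatures by their cosine similarity to the disease signature, with the most negative similarity indicating the strongest predicted reversal. This sketch does not call L1000CDS2 and is not its scoring code. The function name, gene names, drug names and values are all hypothetical.

```python
from math import sqrt


def cosine_similarity(sig_a, sig_b):
    """Cosine similarity between two signatures over their shared genes."""
    genes = sorted(set(sig_a) & set(sig_b))
    dot = sum(sig_a[g] * sig_b[g] for g in genes)
    norm_a = sqrt(sum(sig_a[g] ** 2 for g in genes))
    norm_b = sqrt(sum(sig_b[g] ** 2 for g in genes))
    return dot / (norm_a * norm_b)


# Hypothetical disease signature (e.g. one CKD entity versus TN).
disease = {"G1": 1.0, "G2": -1.0, "G3": 0.5}

# Hypothetical drug-induced expression signatures.
drugs = {
    "drug_reverser": {"G1": -1.0, "G2": 1.0, "G3": -0.5},
    "drug_mimic": {"G1": 1.0, "G2": -1.0, "G3": 0.5},
    "drug_unrelated": {"G1": 1.0, "G2": 1.0, "G3": 0.0},
}

# Most negative similarity first: strongest predicted reversal of the disease.
ranked = sorted(drugs, key=lambda name: cosine_similarity(disease, drugs[name]))
print(ranked)
```

Drugs at the top of such a ranking are candidates for follow-up only; as noted in the Discussion, experimental validation is still required.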
4.9. Immunofluorescent staining of human kidney biopsies and analysis
Validation involving human kidney biopsies was approved by the local ethics committee at Karolinska Institutet (Dnr 2017/1991-32). Stainings were performed on 2 μm paraffin-embedded sections as previously described 48. For more information see Supplementary Methods.
Disclosure
The authors declare that there is no conflict of interest regarding the publication of this article.
Data and Code
Acknowledgements
This work was supported by the JRC for Computational Biomedicine, which was partially funded by Bayer AG; by the European Union Horizon 2020 grant SyMBioSys MSCA-ITN-2015-ETN #675585, which provided the financial support for A.A.; and by Grants of the German Research Foundation (KR-4073/3-1, SCHN1188/5-1, SFB/TRR57, SFB/TRR219 TPC01 and C05), a Grant of the European Research Council (ERC-StG 677448), and a Grant of the State of North Rhine-Westphalia (Return to NRW) to RK.