Abstract
Epigenetic landscapes can shape physiologic and disease phenotypes. We used integrative, high resolution multi-omics methods to characterize the oncogenic drivers of esophageal squamous cell carcinoma (ESCC). We found 98% of CpGs are hypomethylated across the ESCC genome and two-thirds occur in long non-coding (lnc)RNA regions. DNA methylation and epigenetic heterogeneity both coincide with chromosomal topological alterations. Gene body methylation, polycomb repressive complex occupancy, and CTCF binding sites associate with cancer-specific gene regulation. Epigenetically-mediated activation of non-canonical WNT signaling and the lncRNA ESCCAL-1 were validated as potential ESCC driver alterations. Gene-specific cancer driver roles of epigenetic alterations and heterogeneity are identified.
Main Text
Epigenetic regulation is an important determinant of many biological phenotypes in both physiologic and pathophysiological contexts 1. However, epigenetic forces shaping the evolution of complex diseases such as cancer remain incompletely defined. Esophageal cancer is the sixth leading cause of cancer-related death and the eighth most common cancer worldwide 2. In China and East Asia, ESCC is the most prevalent pathohistological type of esophageal cancer 3. Comprehensive analysis by whole-genome and whole-exome sequencing uncovered the genetic landscape of ESCC 4–9 and multi-region whole-exome sequencing revealed intra-tumor genetic heterogeneity in ESCC 10. This intra-tumor genomic heterogeneity could serve as a prognostic predictor in esophageal cancer 11 and as a potential foundation for improved treatment. Notable and frequently mutated epigenetic modulator genes in ESCC include KMT2D, KMT2C, KDM6A, EP300 and CREBBP, and the interplay between epigenetic perturbations and other somatic genetic alterations may have a critical role during ESCC tumorigenesis 4.
The Cancer Genome Atlas research group (TCGA) identified ESCC-related biomarkers at a multi-omics level (genomic, epigenomic, transcriptomic, and proteomic) and pinpointed 82 altered DNA methylation events, along with altered transcriptional targets and genomic alterations 9. While genomic and transcriptomic-level studies of ESCC produced valuable biological discoveries and resources, the epigenetic landscape of ESCC, and of most other cancers, remains poorly studied at single-nucleotide resolution across the whole genome. This knowledge gap is due to the comparatively high cost, computational complexity, and technical challenges of capturing the epigenetic landscape genome-wide at single-nucleotide resolution. Consequently, an integrative and causal analysis across orthogonal multi-omics datasets remains incomplete.
We addressed this challenge by using an integrated multi-omics study that includes whole genome bisulfite sequencing (WGBS), whole genome sequencing (WGS), whole transcriptome sequencing (RNA-seq) and proteomic experiments on a cohort of ESCC samples and their adjacent non-tumor esophageal tissues along with orthogonal analysis and validation using the large TCGA-esophageal cancer (ESCA) dataset. Our goal was to understand the extent and complexity of epigenetic heterogeneity in DNA methylation alterations and consequent dysregulation of both protein coding and non-coding gene expression.
Results
Whole genome bisulfite sequencing reveals the epigenetic landscape and heterogeneity in ESCC
Different types of cancers exhibit unique epigenetic alterations, particularly in the DNA methylome 12–14. We initially collected ten pairs of primary ESCC samples and their adjacent non-tumor tissues (Supplementary Fig. 1), performed WGBS with a bisulfite conversion ratio of over 99%, and generated a mean sequencing depth of 15x per sample (Supplementary Table 1, Supplementary Fig. 2). Over 99% of CpG dinucleotides were covered and ∼95% of CpGs were reliably mapped by more than five reads. To ensure data quality, methylation levels from the bisulfite-converted sequencing reads were compared with the TCGA-ESCA Human Methylation 450K (HM450K) array (Supplementary Fig. 3a) and showed strong concordance in all normal and tumor samples (Pearson r = 0.9644, p-value < 0.01, Supplementary Fig. 3b); the coefficients for WGBS-ESCC versus TCGA-ESCC and versus TCGA-EAC (esophageal adenocarcinoma) were 0.7570 and 0.5554, respectively (Supplementary Fig. 3c, 3d). DNA methylation at non-CpG contexts was present at less than 0.5% in our samples.
More than 5 million differentially methylated cytosines (DMCs) were identified using a one-way ANOVA test (FDR < 5%) (Fig. 1a). Among them, 57.5% were located at known annotated regions and 42.5% were located at unannotated regions of the genome (Supplementary Fig. 4a). Methylation loss in cytosines in ESCC accounted for 97.3% of the DMCs and was mostly confined to intergenic regions of the genome. Only 2.7% of the DMCs were gains of methylation in ESCC (proportional test for hyper- and hypo-methylation, p-value < 2.2e-16, Fig. 1b) and 83.67% of them mapped to gene bodies, promoters, and enhancers with RefSeq annotation (Supplementary Fig. 4b). Of the hypomethylated DMCs in ESCC, 63.08% were mapped to lncRNA regions with ENCODE annotation (v27lift37), whereas 58.01% of hypermethylated DMCs in ESCC were dispersed in antisense RNA regions of the genome (Supplementary Fig. 5a and 5b). These DMCs clearly discriminate normal tissues from tumor tissues, as measured by unsupervised Principal Component Analysis (PCA) (Fig. 1c), similar to unsupervised transcriptome-mediated clustering of normal and tumor samples (PCA in Supplementary Fig. 6a and dendrogram in Supplementary Fig. 6b). In the larger sample set (n=202) of TCGA-ESCA, the differentially methylated CpG probes present in the lower-resolution Illumina HM450K array were also able to discriminate normal from tumor samples, and even subtypes of esophageal cancer, using t-distributed stochastic neighbor embedding (t-SNE), a nonlinear dimensionality reduction algorithm (Fig. 1d). The data suggest that alterations in DNA methylation can characterize the biological features of cellular states of physiologic and pathophysiologic phenotypes.
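The FDR threshold applied to the per-cytosine tests can be illustrated with a short sketch. The text does not state which FDR procedure was used, so the Benjamini-Hochberg adjustment below is an assumption, and the function name and the p-values are hypothetical illustrations rather than data from this study.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the original order.

    Assumption: BH is used here only as a common FDR procedure; the
    manuscript does not name the method it applied.
    """
    m = len(p_values)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


# Hypothetical per-cytosine p-values (not taken from the study).
p_values = [0.001, 0.008, 0.039, 0.041, 0.60]
q_values = benjamini_hochberg(p_values)
# A cytosine is called differentially methylated at FDR < 5%.
significant = [q < 0.05 for q in q_values]
```

In this toy input the third and fourth cytosines have raw p-values below 0.05 but are not retained after adjustment, which is the behavior an FDR cutoff is meant to provide when millions of cytosines are tested.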
DNA methylation heterogeneity has been observed in other cancer types 14,15 and stochastically increasing variation in DNA methylation appears to be a property of the cancer epigenome 16. The clinical significance of such inferences remains unclear. We found a higher variance of altered methylation in ESCC (275.76 ± 204.01) compared with normal samples (95.67 ± 112.38, two-sample t-test p-value ≈ 0) in our cohort as well as in the TCGA-ESCC cohort (p < 2.2e-16) (Supplementary Fig. 6c). As a further measurement of the level of epigenetic variance, we calculated Shannon’s entropy of methylation levels at each CpG locus. We observed increased entropy in ESCC compared with normal samples (two-sample t-test p-value ≈ 0) (Fig. 1e), which is consistent with the increase in stochastic ‘noise’ (heterogeneity) in tumors. Our simulation using the Euler-Maruyama method 17 also reflected increased DNA methylation heterogeneity in ESCC (Supplementary Fig. 6d). Using the independent TCGA-ESCC clinical cohort, we stratified patient samples by their variance of methylation level and found that the group with a higher variance (N=49) of methylation levels showed a worse overall survival time (hazard ratio = 2.9, 95% confidence interval 1.1∼7.5, p-value < 0.05) (Fig. 1f). This indicates the potential clinical relevance of the epigenetic heterogeneity that we uncovered in ESCC.
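To make the entropy measure concrete, the following minimal sketch computes a per-CpG Shannon entropy from a methylation level. It assumes the two-state (methylated versus unmethylated) form of the entropy, which is one common choice; the exact formulation used in the analysis is not given in the text, and the function names and methylation values are hypothetical.

```python
import math


def methylation_entropy(beta):
    """Binary Shannon entropy (in bits) of a methylation level in [0, 1].

    Assumption: each CpG is treated as a two-state variable with
    probability `beta` of being methylated.
    """
    if beta <= 0.0 or beta >= 1.0:
        # A fully methylated or fully unmethylated site carries no uncertainty.
        return 0.0
    return -(beta * math.log2(beta) + (1.0 - beta) * math.log2(1.0 - beta))


def mean_entropy(betas):
    """Average per-CpG entropy across a list of methylation levels."""
    return sum(methylation_entropy(b) for b in betas) / len(betas)


# Hypothetical methylation levels (not data from the study).
normal_like = [0.95, 0.90, 0.05, 0.98]  # mostly near 0 or 1
tumor_like = [0.60, 0.45, 0.30, 0.70]   # intermediate, more disordered

normal_entropy = mean_entropy(normal_like)
tumor_entropy = mean_entropy(tumor_like)
```

Under this definition, intermediate methylation levels give entropy close to 1 bit while levels near 0 or 1 give entropy close to 0, so a shift toward intermediate values registers as increased entropy.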
Differentially methylated regions (DMRs) associate with alterations of genome topology and a global abnormal functional annotation of the ESCC transcriptome
We further defined 299,703 DMRs (p-value ≤ 0.05, FDR ≤ 0.05) between tumor and normal tissues, resulting from a CpG density peak of 4% and a DMR peak size of 200-400 base pairs (bp) (Fig. 2a, Supplementary Fig. 7a and 7b). Only 1.8% of these DMRs are hypermethylated, while 98.2% of DMRs are hypomethylated (proportional test, p-value < 2.2e-16) in tumors relative to normal tissues. DMRs in regions of −2990bp ∼ +6990bp appear hyper-methylated while gene bodies, intergenic, and non-coding regions are in general hypomethylated in tumors (Fig. 2b, Supplementary Fig. 7c, 7d). The occupancy of each transcription factor (TF) binding consensus varies in the genome (161 TF binding sites from ENCODE), with POLR2A (5.23%) and CTCF (3.55%) ranking at the top (Supplementary Fig. 8a, Supplementary Table 2). We searched the CpG content of these TF binding sequences and identified the top 20 TFs affected by methylation alterations in consensus binding sites. Notably, the binding sites of the Polycomb Repressive Complex 2 (PRC2) subunits SUZ12 and EZH2 were substantially affected by hypermethylation of CpGs (Supplementary Fig. 8b-8d). These observations indicated the possibility of a paradoxical activation mechanism for PRC2 target genes through loss of PRC2 occupancy in gene promoters in tumor cells.
The DMRs were distributed mostly (>6%) in Chromosome (Chr)8, Chr19 and Chr20 after normalization to chromosome size (Supplementary Fig. 9a) and DMRs are enriched (>20%) in gene promoters at Chr19 (Supplementary Fig. 9b). We integrated the most significant DMRs with all CpGs, CpG island, chromatin state, and potential TF binding data using the ENCODE dataset 18. We observed that Chr8 harbors three large genomic regions with unique DMR patterns and that these regions contain the SOX17, RGS22, and ESCCAL-1 (CASC9) gene loci, respectively. For example, around the SOX17 gene (Chr8:55,360,000-55,400,000), CpG island regions were hyper-methylated but CpG shore regions were hypo-methylated (Fig. 2c). Two CpG islands with significant hyper-methylation were observed upstream of the SOX17 gene, but there was no association with low gene expression (p-value = 0.668, Fig. 2d). In the region Chr8:100,650,000-101,190,000, hypo-DMRs covered the entire gene body of RGS22 (regulator of G protein signaling) (Supplementary Fig. 10), which is a putative tumor suppressor 19. The region around the lncRNA ESCCAL-1 (Chr8:76,130,000-76,240,000), which was previously identified by us 20, contained significantly hypo-methylated DMRs in its promoters; we investigate the uncharacterized biological function of this lncRNA later in this study.
A link between hypo-methylated blocks, variable gene expression, and large heterochromatin domains such as Large Organized Chromatin lysine (“K”) modification (LOCK) or lamina-associated domains (LAD) was previously reported in cancer 21. Nevertheless, the relation between significant DMRs and Topologically Associating Domains (TADs), or self-interacting genomic regions 22, is largely uncharacterized in ESCC. Genome-wide chromosome conformation capture followed by massively parallel DNA sequencing (Hi-C) showed increased TAD abundance and reduced TAD size in the ESCC cancer genome relative to the normal genome (Supplementary Fig. 11a). The interactions between chromosomes were altered during ESCC tumorigenesis (Supplementary Fig. 11b). Closer interactions among Chr16 through Chr22 were observed in ESCC compared with normal esophageal cells (Fig. 2e and 2f). Using the Hi-C data, we inferred two compartments: open euchromatin of transcriptionally active states (compartment A) and closed chromatin of transcriptionally silent states (compartment B) 22. We observed a compartment shift (A→B or B→A) in 22.61% of regions during ESCC tumorigenesis (Supplementary Fig. 11c). The A→B shifted regions contain 0.5∼9% DMRs, with Chr10 and Chr19 showing the most DMR occupancy (Supplementary Fig. 12a-d). In contrast, the B→A shifted regions have 0.5∼20% DMRs, with a higher percentage of DMRs in Chr3 (Supplementary Fig. 13a-d). This suggests that a functional link exists in ESCC between DMR alterations and shifts in genomic architecture.
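The compartment-shift fraction can be illustrated as follows. This sketch assumes that each genomic bin carries a signed compartment score (for example, a first principal component of the Hi-C contact matrix) with positive values denoting compartment A and negative values compartment B; that sign convention, the function name, and the scores are hypothetical and are not taken from the analysis reported here.

```python
def compartment_shifts(normal_scores, tumor_scores):
    """Count A->B and B->A switches between matched genomic bins.

    Assumption: score > 0 means compartment A (open), score < 0 means
    compartment B (closed); bins with a zero score in either sample
    are skipped as unassigned.
    """
    a_to_b = b_to_a = assigned = 0
    for normal, tumor in zip(normal_scores, tumor_scores):
        if normal == 0 or tumor == 0:
            continue
        assigned += 1
        if normal > 0 and tumor < 0:
            a_to_b += 1
        elif normal < 0 and tumor > 0:
            b_to_a += 1
    fraction = (a_to_b + b_to_a) / assigned if assigned else 0.0
    return {"a_to_b": a_to_b, "b_to_a": b_to_a, "fraction_shifted": fraction}


# Hypothetical compartment scores for ten bins (not data from the study).
normal_scores = [0.8, 0.5, -0.3, -0.6, 0.2, -0.1, 0.4, -0.7, 0.9, -0.2]
tumor_scores = [0.7, -0.4, -0.2, 0.3, 0.1, -0.3, 0.5, -0.6, 0.8, -0.1]

shifts = compartment_shifts(normal_scores, tumor_scores)
```

Intersecting the shifted bins with DMR coordinates would then give the per-chromosome DMR occupancy of A→B and B→A regions described above.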
Significant methylation changes were identified by WGBS at 5085 gene promoter regions (−4500bp ∼ +500bp relative to a TSS). Gene set enrichment analysis (GSEA) of the target genes harboring promoter hypomethylation indicated an over-representation of WNT/β-catenin signaling, whereas gene promoters harboring hyper-methylation were enriched for KIT signaling genes (Supplementary Fig. 14a). In parallel, GSEA of differentially expressed genes (DEGs) from the RNAseq dataset indicated enrichment for genes regulating cell cycle pathways and metallopeptidase activity (Supplementary Fig. 14b). Hence, the DMRs associated with alterations of genome architecture appear to shift the gene regulatory networks during ESCC tumorigenesis.
Aberrant DNA methylation in promoter regions mediates transcriptional dysregulation in ESCC
DNA methylation at regulatory regions influences transcript expression levels 23. From our WGBS analysis, we identified 5085 promoter regions (−4500bp ∼ +500bp relative to the TSS) of coding and non-coding genes whose CpGs were significantly differentially methylated (FDR < 0.01) and focused on the 4768 significantly differentially expressed transcripts in ESCC relative to the adjacent normal tissues from the RNA-seq dataset. We then identified 694 genes that showed significant differential methylation in promoters and concomitant dysregulation of gene expression (Fig. 3a). The genes were systematically classified into four distinct clusters (denoted as C1, C2, C3 and C4) according to methylation (met) and gene expression (ge) pattern. C1 (HmetLge) showed hypermethylated promoters with decreased ge; C2 (LmetHge) showed hypomethylated promoters with increased ge; C3 (HmetHge) contained hypermethylated promoters with increased ge; C4 (LmetLge) denoted hypomethylated promoters with decreased ge. C1 and C2 followed the well-documented canonical model, showing anti-correlation between promoter methylation and gene expression 13; in contrast, genes in C3 and C4 showed a non-canonical pattern in that promoter methylation and gene expression were positively correlated (Fig. 3b, 3c, Supplementary Fig. 15, Supplementary Table 3). Among the 694 genes, only 1.5% harbor non-synonymous mutations in our selected WGS cases (Supplementary Fig. 16) and no copy number changes of these genes were inferred from RNA-seq (Supplementary Fig. 17). Therefore, the majority (98.5%) of these dysregulated genes in clusters C1∼C4 appear to arise via epigenetic dysregulation (epimutation) 24. This phenomenon is also seen in the independent TCGA-ESCC (n=96) sample cohort, by analysis of the available multi-omics dataset (Supplementary Fig. 18a-d).
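The four-cluster scheme reduces to the signs of two quantities per gene, as the following minimal sketch shows. It assumes that each gene has already passed the significance filters for both promoter methylation and expression, so neither difference is zero; the function name, gene labels, and values are hypothetical and are not taken from Supplementary Table 3.

```python
def classify_gene(methylation_delta, log2_fold_change):
    """Assign a gene to C1-C4 from tumor-versus-normal differences.

    methylation_delta: promoter methylation difference (tumor - normal).
    log2_fold_change: expression change (tumor versus normal).
    Assumption: both inputs are nonzero because the gene already passed
    the significance filters described in the text.
    """
    hypermethylated = methylation_delta > 0
    upregulated = log2_fold_change > 0
    if hypermethylated and not upregulated:
        return "C1"  # HmetLge, canonical
    if not hypermethylated and upregulated:
        return "C2"  # LmetHge, canonical
    if hypermethylated and upregulated:
        return "C3"  # HmetHge, non-canonical
    return "C4"      # LmetLge, non-canonical


# Hypothetical genes with (methylation delta, log2 fold change).
genes = {
    "gene_A": (0.30, -1.8),
    "gene_B": (-0.25, 2.4),
    "gene_C": (0.20, 1.6),
    "gene_D": (-0.35, -2.1),
}
clusters = {name: classify_gene(*values) for name, values in genes.items()}
```

Keeping the classification as a pure function of the two signs makes it straightforward to apply the same rule to an independent cohort.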
The underlying mechanisms of the divergent regulation of gene expression are complex and involve DNA methylation, chromatin remodeling, and DNA accessibility 25. We explored a potential explanation for the non-canonical patterns in C3 and C4 across epigenetic regulatory features. First, we cross-referenced 13,000 TCGA-ESCA chromatin accessible regions as determined by ATAC-seq 25 to the regulatory regions of the 694 genes. Accessibility in promoter regions was highly associated with gene expression in C2 and C3 (Supplementary Fig. 19a, b). Second, methylation in defined promoter regions and in gene bodies showed a differential phenotype between C1 and C3 (but not between C2 and C4): methylation levels in gene bodies were higher in C3 (−5.4587 ± 26.3450) than in C1 (−26.8551 ± 16.4716, p < 0.05) (Fig. 3d). Third, hyper-methylation at cohesin and CCCTC-binding factor (CTCF) binding sites could compromise binding of this methylation-sensitive insulator protein and result in gene activation 26. Thus, we searched for CTCF binding sites within promoter regions of the 694 genes and observed that the CTCF binding sites were enriched in C3 (Fig. 3e), which could partially explain the phenotype of high promoter methylation and high gene expression. Fourth, the compartment shift regions inferred from Hi-C data showed that 53.24% of the genes in C3 shifted from a closed state to an active state (Supplementary Fig. 20). The data also indicated that the promoters of genes in C4, despite being hypomethylated in the tumor, were inaccessible. This highlights the importance of both accessibility and absence of methylation as linked features of the gene expression pattern in the C4 cohort. Gene enrichment analysis of the 694 genes was performed using multiple databases (KEGG 27, WikiPathways 28, ENCODE 18, ChEA 29) and showed that PRC2 subunit (EZH2 and SUZ12)-mediated polycomb repressive gene sets were enriched in the non-canonical clusters C3 and C4 (Fig. 4a).
We searched for ENCODE-defined EZH2 and SUZ12 binding sites across gene promoters in C1-C4 and observed that EZH2 occupancy was enriched in C3 (1.5970 ± 1.2316) and C4 (0.6000 ± 0.7684) compared with C1 (0.9167 ± 0.8464) and C2 (0.2336 ± 0.5870), respectively (p-value < 0.001) (Fig. 4b). SUZ12 occupancy was higher only in C3 gene promoter regions (1.5522 ± 1.7946) (Fig. 4c). To understand the functional mechanism that is responsible for differential methylation at target gene promoters, we performed unsupervised hierarchical clustering of ENCODE-defined known TF binding sites in the DMRs of the 694 gene promoters. The analysis showed that EZH2 binding sites were enriched in genes in C3 compared to other clusters (Fig. 4d, Supplementary Fig. 21). In addition, we performed a correlation analysis between gene expression difference (expression fold change) and corresponding promoter methylation level difference (methylation Δ) of genes in C3. WNT2 was identified as the top non-canonical gene in C3 (Fig. 4e). The collective data suggest that the non-canonical gene expression pattern (C3) arises via de-repression of EZH2-mediated suppression at the promoter regions of genes in C3, which increases gene expression; we validate this experimentally later in this study.
DNA methylation gain at the promoter region activates WNT2/β-catenin pathway in ESCC
Epigenetic dysregulation of the components of the MAPK, AKT and WNT pathways can promote aberrant activation of these pro-growth pathways in ESCC 30. We extracted known components of these pathways from published literature 30,31,32 and compared their gene expression between tumor and normal samples. Among the component genes of the MAPK, AKT and WNT pathways, only WNT2 in the WNT pathway was significantly more highly expressed in the tumor samples than in normal samples (Supplementary Fig. 22a, b). These data indicate selective and specific upregulation of WNT2 in ESCC tumors through a putative non-canonical epigenetic regulatory mechanism. WNT2 belongs to the structurally related WNT family of genes, which encode secreted ligands for the WNT signaling pathway 33. The canonical WNT signaling pathway results in stabilization of the transcriptional co-regulator β-catenin and subsequent upregulation of downstream target genes 33.
To gain mechanistic insight into the epigenetic regulation of the WNT2 promoter, we queried our transcription-factor target gene hierarchical clustering analysis for genes in C3 and found that, compared to other transcription factors, EZH2 and SUZ12 binding sites are present at the WNT2 promoter region (Fig. 4d). EZH2 and SUZ12 are subunits of PRC2, which has histone methyltransferase activity that primarily tri-methylates histone H3 on lysine 27 (H3K27me3) 34. We also found that the EZH2 binding site at the WNT2 promoter overlaps with the hypermethylated CpG sites at the WNT2 promoter region in cancer cells (Fig. 5a).
The WNT2 promoter region (Chr7: 116,960,000-116,965,000) was hypermethylated but, paradoxically, WNT2 gene expression was increased in tumors (Supplementary Fig. 23 from our WGBS dataset, Supplementary Fig. 24 from the TCGA dataset). We reasoned that loss of EZH2 occupancy may cause non-canonical methylation-mediated activation of WNT2 gene expression in ESCC. To test this, we assessed EZH2 occupancy on the differentially methylated regions of the WNT2 promoter by performing chromatin immunoprecipitation sequencing (ChIP-seq) in normal immortalized esophageal epithelial cells (Het-1A) and the patient-derived ESCC cell line EC109. The ChIP-seq analysis showed EZH2 binding peaks at the WNT2 promoter region in normal cells compared to minimal binding peaks in the ESCC cells (Fig. 5a). Furthermore, we confirmed that the promoter region of WNT2 was hypermethylated in three esophageal cancer cell lines (EC9706, EC109, and EC1) while no methylation was detected in normal esophageal epithelial cells (Het-1A) (Fig. 5b). In addition, WNT2 mRNA and protein expression were also higher in independent ESCC samples (Fig. 5c, 5d).
To identify downstream effector genes of WNT/β-catenin signaling that might promote ESCC, we performed GSEA of differential expression from the proteomic and RNAseq data and found that extracellular matrix organization (Supplementary Fig. 25) and a gene set containing the extracellular metalloproteinases MMP3 and MMP9 (known β-catenin targets) 35 (Supplementary Fig. 26) were enriched in tumor samples. We validated that both MMP3 and MMP9 transcripts and proteins were highly expressed in ESCC relative to normal tissues (Fig. 5e). To test whether WNT2-mediated signaling was required for tumor cell growth, we suppressed WNT2 expression using two independent short interfering (si)RNAs in two different patient-derived ESCC cell lines (EC9706 and EC109). WNT2 knockdown significantly inhibited ESCC cell growth (p-value < 0.01) (Fig. 5f). These data place WNT2 as an essential gene for ESCC cancer cell growth. Since MMPs are known downstream targets of WNT/β-catenin signaling activation, we next tested the effect of WNT2 knockdown on MMP3 and MMP9 expression. We found baseline WNT2 expression in ESCC cell lines to be significantly higher than in normal cell lines (Supplementary Fig. 27a), and WNT2 knockdown decreased MMP3 and MMP9 expression (Fig. 5g). Since matrix metalloproteinases (MMPs) can promote tumor invasion and metastasis 36, we then tested whether WNT2 knockdown abrogates the migratory and invasive potential of ESCC tumor cells. In two patient-derived ESCC cell lines (EC9706 and EC109), silencing of WNT2 significantly reduced cellular invasion and migration (Fig. 5h, p-value < 0.01, Supplementary Fig. 27b, c, d, e, f). We also found that expression of β-catenin protein, encoded by the CTNNB1 gene, is significantly higher in ESCC tumors (Supplementary Fig. 27g). Knockdown of WNT2 markedly reduced the protein level of β-catenin (Supplementary Fig. 27h).
These data show that a WNT2/β-catenin/MMP3/9 signaling axis was required not only for tumor cell growth but also for tumor cell migration and invasion in ESCC. Taken together, our data demonstrate a novel non-canonical mechanism for increased WNT2 expression in the absence of EZH2-PRC2 occupancy of the WNT2 promoter with hypermethylated CpGs. The data delineate specific downstream targets of WNT2-mediated signaling and their functional consequences in ESCC (Fig. 5i).
Epigenetic activation of the lncRNA ESCCAL-1 is a novel ESCC cancer driver gene
Increasing evidence indicates dysregulation of lncRNAs during cancer progression and metastasis, but the mechanisms of dysregulation and of action of lncRNAs in cancer are relatively poorly understood 37. Our WGBS analysis revealed that hypomethylated DMCs are significantly enriched in genomic regions harboring annotated lncRNAs (Fig. 6a). In the canonical cluster C2, gene expression was strongly anti-correlated with gene promoter CpG methylation levels (Fig. 6b). Previously, we showed that the lncRNA ESCCAL-1 was overexpressed in ESCC 20, and overexpression of ESCCAL-1 has been reported in other cancer types 38–41. The mechanism underlying ESCCAL-1 upregulation in cancer is unknown. We found that ESCCAL-1 is one of the most notable candidates for DNA methylation loss-mediated increased gene expression in C2 (Fig. 6b). One DMR in the promoter of ESCCAL-1 showed decreased CpG methylation in cancer, leading to increased transcription of lncRNA ESCCAL-1 (Fig. 6c) 42. No mutation or copy number variation of ESCCAL-1 was reported or observed in the TCGA-ESCA genomic dataset or in our WGS data of three ESCC patients (Fig. 6c, top panel). Independent verification of the methylation status of the ESCCAL-1 promoter region showed 62.5% (20/32) hypomethylation in ESCC tumors versus 71.8% (23/32) hypermethylation in adjacent normal tissues (chi-square test p-value < 0.01) (Fig. 6e, f). In agreement, ESCCAL-1 expression was significantly higher in ESCC compared to adjacent normal tissues (p-value = 0.00113, FDR < 0.05) (Fig. 6d from RNAseq). We corroborated these observations by analysis of an independent cohort of 73 ESCC tissues relative to their normal counterparts (Fig. 6g). We also noted a hypermethylated ESCCAL-1 promoter region in normal esophageal cells (Het-1A), whereas methylation was not detected in three ESCC cell lines (EC1, EC109 and EC9706) (Fig. 7a). ESCCAL-1 was substantially overexpressed in ESCC cell lines compared with normal Het-1A cells (Fig. 7b).
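The chi-square comparison of promoter methylation status can be reproduced from the reported counts with a short sketch. It assumes a binary classification, so that the 12 remaining tumors and 9 remaining normal tissues fall into the opposite category, and it uses the Pearson statistic without continuity correction; neither detail is stated in the text, and the function name is hypothetical.

```python
import math


def chi_square_2x2(a, b, c, d):
    """Pearson chi-square test (no continuity correction) for a 2x2 table.

    Table layout:        hypomethylated   hypermethylated
        tumor                  a                 b
        normal                 c                 d
    With one degree of freedom the p-value equals erfc(sqrt(stat / 2)).
    """
    n = a + b + c + d
    numerator = n * (a * d - b * c) ** 2
    denominator = (a + b) * (c + d) * (a + c) * (b + d)
    statistic = numerator / denominator
    p_value = math.erfc(math.sqrt(statistic / 2.0))
    return statistic, p_value


# Counts from the text: 20/32 tumors hypomethylated, 23/32 normals
# hypermethylated. The complements (12 and 9) assume a binary call.
statistic, p_value = chi_square_2x2(a=20, b=12, c=9, d=23)
```

With these counts the statistic is approximately 7.63, which exceeds the 6.635 critical value for p = 0.01 at one degree of freedom and is therefore consistent with the reported p-value < 0.01.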
Furthermore, increased expression of ESCCAL-1 was a biomarker of worse overall survival time and progression-free survival time in ESCC patients (Fig. 7c, d). Knockdown of ESCCAL-1 reduced growth of patient-derived ESCC cells in vitro (Supplementary Fig. 28) and in vivo (Fig. 7e, f).
To identify a possible mechanism of ESCCAL-1 upregulation, we examined sequence motifs of known TFs in the ESCCAL-1 hypomethylated promoter region and found a predicted binding site for YY1 from the ENCODE project. YY1 is a TF belonging to the GLI-Kruppel class of zinc finger proteins and contributes to tumorigenesis 43. Using ChIP-PCR, we validated YY1 binding at the hypomethylated promoter region of ESCCAL-1 (Fig. 7g). siRNA knockdown of YY1 expression led to decreased expression of ESCCAL-1 (Fig. 7h), indicating YY1 transcriptionally regulates ESCCAL-1 in ESCC.
Since the downstream mechanism of ESCCAL-1’s contribution to ESCC pathogenesis is not clear, we performed a “guilt-by-association” co-expression analysis using RNAseq from ten pairs of normal and tumor samples. The ESCCAL-1-related gene expression modules are enriched in cell cycle pathways, RNA binding and the Myc pathway (Fig. 7i and Supplementary Fig. 29a, b). To explore the causative roles of ESCCAL-1 in ESCC progression, we conducted RNAseq in EC9706 cells with ESCCAL-1 knockdown by shRNA (or with shControl). We hypothesized that depletion of ESCCAL-1 could revert the cellular phenotype toward a normal cell state, and thus we also used RNAseq from a normal cell line, Het-1A. We identified 210 significantly differentially expressed genes (DEGs) between shControl-expressing EC9706 cells and shESCCAL-1-expressing EC9706 and Het-1A (normal) cells (gene list in Supplementary Table 4) using an iterative clustering approach 44. Gene enrichment analysis of these DEGs showed an enrichment of “RNA binding”, “ribosomal proteins”, and Myc target gene sets (Fig. 7j, Supplementary Fig. 30). These results indicate that ESCCAL-1 participates in the biological process of Myc-mediated regulation of genes, which extends current knowledge on the potential role of Myc signaling in ESCC 45. Thus, beyond WNT2-mediated WNT pathway activation, other aberrant signaling pathways activated by ESCCAL-1 upregulation also contribute to ESCC tumorigenesis. Therefore, epigenetic dysregulation promotes ESCC through divergent and multi-factorial mechanisms (diagram in Fig. 8).
Discussion
The development of ESCC is a complex dynamic biological process that involves multiple steps of genetic and epigenetic alterations. Numerous genetic studies of ESCC at whole genome and exome levels revealed recurrent genetic alterations and related altered pathways such as the cell cycle, p53, AKT/mTOR, and Hippo signaling pathways 4,6,8,46,47. It remains unclear in ESCC and most other cancer types whether and how the epigenetic landscape contributes to cancer pathogenesis. We performed WGBS, RNA-seq, and proteomic analyses on matched normal and tumor samples along with analysis of TCGA-ESCA datasets and Hi-C sequencing on esophageal normal and tumor cell lines. We observed global hypomethylation (98%) and local hypermethylation across the ESCC genome, consistent with previous studies in colon cancer and other types of cancers 12,48. The DMCs alone can discriminate cellular states between tumor and normal conditions, and histological subtypes of esophageal cancer. DNA methylation is a defining feature of cellular identity and is essential for cell development 49. Hansen et al. identified cancer-specific differentially methylated regions in colon cancer and showed that such stochastic methylation variations distinguish cancer from normal 13 and can be a potential epi-biomarker for early tumor diagnosis or a predictive epi-biomarker for therapeutic outcome. We found that the heterogeneity of DNA methylation alteration is greater in ESCC relative to normal esophageal tissues and was a biomarker of inferior clinical outcome. Our findings provide new insight into the potential clinical relevance of epigenetic dysregulation and heterogeneity as a molecular biomarker of clinical outcome in cancer.
We validated a prominently epigenetically altered coding gene and noncoding gene from the non-canonical cluster (C3) and canonical cluster (C2), respectively, that we defined. The WNT pathway appears to be epigenetically regulated via inactivation of negative regulators (SFRP1/2/4/5 and WIF1) in ESCC 30. Our data identified high WNT2 expression in ESCC, along with a highly methylated promoter region. Analysis of the HM450K methylation array and RNA-seq data from TCGA-ESCA likewise showed high promoter methylation and high gene expression of WNT2. The collective data suggest EZH2-mediated PRC2 repression of WNT2 expression in normal cells. By contrast, we show that hypermethylation-mediated de-repression of WNT2 activates the WNT pathway in ESCC. Our data provide new insight into the mechanism of epigenetic dysregulation via non-canonical gene expression regulation in cancer and into the underlying molecular events promoting WNT pathway activation in ESCC.
LncRNA dysregulation is an emerging but poorly understood feature of oncogenesis 37. We reported ESCCAL-1 overexpression in ESCC 20. Additional studies showed that this lncRNA is overexpressed in multiple types of cancers 38–40. Overexpression of this lncRNA promotes cancer cell growth 50, invasion 51 and metastasis 52. Here, we discovered that methylation loss at its promoter is a principal molecular mechanism of ESCCAL-1 dysregulation in ESCC. We show that ESCCAL-1 has an epigenetically mediated causal role in tumor growth and is a biomarker of worse survival of ESCC patients. Interestingly, overexpression of ESCCAL-1 is correlated with drug resistance in lung cancer 38. Whether ESCCAL-1 is similarly regulated via epigenetic mechanisms in other cancer types beyond ESCC remains to be investigated in future studies. Nevertheless, suppressing ESCCAL-1 expression, potentially using anti-sense RNA 53 or CRISPR-based strategies 54, may be a promising therapeutic approach in ESCC and other cancer types.
Our study provides a rationale and a roadmap for delineating the landscape and functional roles of epigenetic dysregulation in cancer at genome-wide high resolution. Further analysis will be required to fully understand the impact of epigenetic dysregulation and heterogeneity on various cancer-associated phenotypes and treatment responses 14. Multi-regional WGBS or single cell DNA bisulfite sequencing could facilitate addressing this opportunity in the future.
Author contributions
W.C and W.W conceived the project. W.W, H.L, A.Z, and S.M performed WGBS, RNA-seq and proteomic data analysis and constructed the figures and manuscript. J.C and N.S.A helped with WGBS and multi-omics analyses. G.S performed simulation. A.H.S conducted SNV analysis from RNA-seq. Q.H.X, Y.B.C, J.L, H.G.G, P.J.L, X.Y.S, L.S, P.L.H, and Y.N.L performed experiments. J.W.W and M.Y collected and processed specimens and performed some experiments. H.G.X, W.X.T, J.C, J.C.G, and Y.C.G performed Whole Genome Sequencing (WGS), Whole Genome Bisulfite Sequencing (WGBS), Whole Transcriptome Sequencing (RNA-seq), Isobaric Tag for Relative and Absolute Quantitation (iTRAQ) and Hi-C sequencing. F.G.B and E.C helped to discuss the results. T.G.B and M.S guided, discussed and edited the manuscript.
Competing interests
T.G.B is an advisor to Revolution Medicine, Novartis, AstraZeneca, Takeda, Springworks, Jazz, and Array Biopharma and receives research funding from Revolution Medicine and Novartis.
Acknowledgements
This study was supported by the National Natural Science Foundation of China (Grants 81171992, 31570899), the Natural Science Foundation of Henan (Grants 182102310328, 162300410279, 182300410374, 192102310096), and the Education Department of Henan Province (18B310022, 19A320037). This work used the Genome Sequencing Service Center by Stanford Center for Genomics and Personalized Medicine Sequencing Center, supported by the grant award NIH S10OD020141. T.G.B acknowledges funding support from NIH / NCI U01CA217882, NIH / NCI U54CA224081, NIH / NCI R01CA204302, NIH / NCI R01CA211052, NIH / NCI R01CA169338, and the Pew-Stewart Foundations.