Abstract
Most cancers acquire resistance to targeted therapeutics. Knowing the timing of molecular changes responsible for the development of acquired resistance can enable optimization of alterations to patients’ treatments. Clinically, acquired therapeutic resistance can only be studied at a single time point in resistant tumors. To determine the dynamics of these molecular changes, we obtained high throughput omics data weekly during the development of cetuximab resistance in a head and neck cancer model. An unsupervised algorithm, CoGAPS, quantified the evolving transcriptional and epigenetic changes. Further applying a PatternMarker statistic to the results from CoGAPS enabled novel heatmap-based visualization of the dynamics in these time-course omics data. We demonstrate that transcriptional changes resulted from immediate therapeutic response and resistance whereas epigenetic alterations only occurred with resistance. Integrated analysis demonstrated delayed onset of changes in DNA methylation relative to transcription, suggesting that resistance was stabilized epigenetically. Genes with epigenetic alterations associated with resistance that had concordant expression changes were hypothesized to stabilize resistance. These genes include FGFR1, which was associated with EGFR inhibitor resistance previously. Thus, integrated omics analysis distinguishes the timing of molecular drivers of resistance.
INTRODUCTION
Targeted cancer therapeutics inhibit key regulators of molecular pathways essential for tumor development and maintenance1. These therapies prolong survival but are not curative. Most patients will develop acquired resistance within the first few years of treatment2. Although a wide variety of molecular alterations that confer resistance to treatment have been described, the mechanisms and timing of their evolution are still poorly characterized3,4. Serial biopsies during the prolonged treatment period are invasive, expensive, and impractical for patients. Thus, known molecular alterations have been characterized only after resistance has already developed, which cannot resolve the two current hypotheses for the development of therapy resistance: the presence of small populations of resistant cells that survive the treatment and repopulate the tumor, or the development of de novo resistance3,5. Characterization of the dynamics of genomic alterations induced during acquired cetuximab resistance can identify targetable oncogenic drivers and determine the best time point to introduce alternative therapeutic strategies to avoid resistance establishment6.
Inhibitors against Epidermal Growth Factor Receptor (EGFR) represent a common class of targeted therapeutics. Cetuximab, a monoclonal antibody against EGFR, is FDA approved for the treatment of metastatic colorectal cancer and head and neck squamous cell carcinoma (HNSCC)7. As with other targeted therapies, responses are not durable and virtually all patients develop acquired resistance8. Recent advances in in vitro models of acquired cetuximab resistance9 provide a unique opportunity to study the time course of genetic events resulting in acquired resistance. Cell lines chronically exposed to the targeted agent develop resistance and can be sequentially collected during the course of treatment to evaluate the progressive molecular changes. Previous studies to assess the mechanisms of acquired cetuximab resistance have been limited to comparing the genomic profile of the parental sensitive cell line to stable clones with acquired resistance9–11. Therefore, these studies fail to capture the dynamics of acquired molecular alterations during the evolution of therapeutic resistance. Combined experimental and bioinformatics approaches that can be adapted to different tumor models and therapeutic agents are fundamental to overcoming issues related to sample availability and serial time point data analysis.
Even with advances in experimental sampling approaches, time-course high-throughput data alone are insufficient to determine molecular drivers of therapeutic resistance. A novel serial, multi-platform genomics analysis is essential to untangle specific and targetable signaling changes that drive cetuximab resistance in HNSCC. Current supervised bioinformatics algorithms that find time-course patterns in genomic data fit linear models to correlate molecular profiles with known temporal patterns12–15. However, these algorithms cannot quantify the rate of genomic alterations relative to that of the input phenotype. Other algorithms16–21 enhance such inference by using prior knowledge of gene relationships to find coherent, dynamic regulatory relationships that are linked to pathways. Many of these algorithms trace individual phenotypes or individual genomics platforms. Their ability to determine drivers of gene expression associated with acquired resistance from time-course data in multiple experimental conditions and multiple genomics data modalities is emerging22.
On the other hand, unsupervised analysis algorithms can simultaneously quantify the dynamics and infer the gene regulatory networks directly from the input time course data23. Nonetheless, visualization tools for inference of dynamics from unsupervised algorithms are limited.
In this study, we used an in vitro HNSCC cell line model to induce resistance and measure the molecular changes using multiple high throughput assays while the resistant phenotype developed. We selected DNA methylation based upon previous association of DNA methylation with acquired cetuximab resistance in vitro and in vivo24. We also measured gene expression to determine the functional impact of the inferred epigenetic alterations. The Bayesian non-negative matrix factorization algorithm CoGAPS25 inferred specific patterns of gene expression and DNA methylation that develop according to the gradual establishment of the acquired cetuximab resistance. Gene expression also had a pattern associated with immediate therapeutic response. We selected genes uniquely associated with these changes using a PatternMarker statistic25. Plotting expression or methylation of genes with the PatternMarker statistic enabled novel visualization of the dynamics of these alterations from high-throughput data. Analysis of the CoGAPS patterns demonstrated that the onset of methylation changes associated with resistance was temporally delayed relative to expression changes and involved different genes. This observation led to the hypothesis that epigenetic alterations stabilize the resistant phenotype. Specifically, the DNA methylation PatternMarker genes were selected as putative epigenetic drivers. Therefore, we next performed correlation analysis of PatternMarkers of the DNA methylation patterns with gene expression to identify the subset of genes with tight temporal concordance implying direct epigenetic regulation. These genes included FGFR1, which also had the strongest correlation between gene expression and DNA methylation in a fast-growing cetuximab-resistant clone generated from the same parental cell line. Previous studies associate FGFR1 gene expression with acquired cetuximab resistance in HNSCC patients26–28.
In this study, we also demonstrated that epigenetic changes of FGFR1 are observed in HNSCC tumors in TCGA. This work represents the first integrated time-course analyses to determine the drivers of acquired resistance, suggesting a direct link between epigenetic regulation of FGFR1 gene expression and the development of acquired resistance. Both the experimental and bioinformatics methods developed here are applicable to other molecular platforms, therapeutics, and cancer types.
MATERIAL AND METHODS
Cell lines and materials
SCC25 cells were purchased from American Type Culture Collection (ATCC). Cells were cultured in Dulbecco’s Modified Eagle’s medium and Ham’s F12 medium supplemented with 400ng/mL hydrocortisone and 10% fetal bovine serum and incubated at 37°C and 5% carbon dioxide. SCC25 was authenticated using short tandem repeat (STR) analysis kit PowerPlex16HS (Promega, Madison, WI) through the Johns Hopkins University Genetic Resources Core Facility. Cetuximab (Lilly, Indianapolis, IN) was purchased from the Johns Hopkins Pharmacy.
Experimental protocol to establish time-course during acquisition of cetuximab resistance in SCC25
The HNSCC cell line SCC25 (intrinsically sensitive to cetuximab) was treated with 100nM cetuximab every three days for 11 weeks (generations G1 to G11). On the eighth day, cells were harvested. Sixty thousand cells were replated for another week of treatment with cetuximab and the remaining cells were separately collected for: (1) RNA isolation (gene expression analysis); (2) DNA isolation (DNA methylation analysis); (3) proliferation assay and (4) storage for future use. All steps were repeated for a total of 11 weeks. In parallel with the cetuximab treated cells, we generated controls that received the corresponding volume of PBS (phosphate buffered saline). Cells were plated in several replicates each time at the same initial density. The replicates were then harvested and pooled to provide enough cells for genetic, epigenetic and proliferation assays. To achieve adequate final cell confluence and number of cells for the experimental analysis of each generation, cetuximab and PBS treated cells were plated in different flask sizes. Cells treated with cetuximab were plated in multiple T75 (75 cm²) flasks (60,000 cells/flask) that were combined on the eighth day. PBS treated cells were plated in a single T175 (175 cm²) flask (60,000 cells). This design was selected considering the growth inhibition of the earliest cetuximab generations and to control confluence of the PBS controls at the collection time (Supplemental Fig. 1).
Cell proliferation and colony formation assays
Cell proliferation events were measured using the Click-iT Plus EdU Flow Cytometry Assay Kit Alexa Fluor 488 Picolyl Azide (Life Technologies, Carlsbad, CA) according to manufacturer’s instructions. The cetuximab generations were considered resistant when the frequency of proliferating cells was higher than in the PBS control generations.
An anchorage-independent growth assay was used to further confirm the development of resistance. The parental SCC25 and the late G10 resistant cells were treated with different concentrations of cetuximab (10nM, 100nM and 1000nM). The number of colonies was compared to that of the same cells treated with PBS. The colony formation assay in Matrigel (BD Biosciences, Franklin Lakes, NJ) was performed as described previously29.
Stable SCC25 cetuximab resistant single clones (CTXR clones)
Resistance to cetuximab was induced in an independent passage of SCC25 cells. After resistance was confirmed, single cells were isolated and grown separately to generate the isogenic resistant single cell clones (CTXR). In total, 11 CTXR clones were maintained in culture without addition of cetuximab. With the exception of one clone (CTXR6), all CTXR clones presented substantial survival advantage compared to the parental SCC25, as reported by Cheng et al.30.
Proliferation assay was performed to confirm cetuximab resistance in the CTXR clones compared to the parental SCC25. A total of 1000 cells were seeded in 96-well plates in quadruplicate for each condition. PBS or cetuximab (10nM, 100nM or 1000nM) was added after 24 and 72 hours and cells were maintained in culture for 7 days. AlamarBlue reagent (Invitrogen, Carlsbad, CA) at a 10% final concentration was incubated for 2 hours and fluorescence was measured according to the manufacturer’s recommendations (545nm excitation, 590nm emission). Resistance in the CTXR clones was confirmed when the proliferation rates were higher than in the PBS treated SCC25 cells.
RNA-sequencing (RNA-seq) and data normalization
RNA isolation and sequencing were performed for the parental SCC25 cells (G0), each of the cetuximab and PBS generations (G1 to G11), and the CTXR clones at the Johns Hopkins Medical Institutions (JHMI) Deep Sequencing & Microarray Core Facility. RNA-seq was also performed for two additional technical replicates of the parental SCC25 cell line to distinguish technical variability in the cell line from acquired resistance mechanisms. Total RNA was isolated from a total of 1×10^6 cells using the AllPrep DNA/RNA Mini Kit (Qiagen, Hilden, Germany) following manufacturer’s instructions. The RNA concentration was determined with a Nanodrop spectrophotometer (Thermo Fisher Scientific, Waltham, MA) and quality was assessed using the 2100 Bioanalyzer (Agilent, Santa Clara, CA) system. An RNA Integrity Number (RIN) of 7.0 was considered as the minimum to be used in the subsequent steps for RNA-seq. Library preparation was performed using the TruSeq Stranded Total RNAseq Poly A1 Gold Kit (Illumina, San Diego, CA), according to manufacturer’s recommendations, followed by poly(A) enrichment of mRNA to remove ribosomal RNA (rRNA). Sequencing was performed using the HiSeq platform (Illumina) for 2×100bp sequencing. Reads were aligned to hg19 with MapSplice31 and gene expression counts were quantified with RSEM32. Gene counts were upper-quartile normalized and log transformed for analysis following the RSEM v2 pipeline used to normalize TCGA RNA-seq data33. All RNA-seq data from this study are available from GEO (GSE98812) as part of SuperSeries GSE98815.
DNA methylation hybridization array and normalization
Genome-wide DNA methylation analysis was performed on the same samples as RNA-seq using the Infinium HumanMethylation450 BeadChip platform (Illumina) at the JHMI Sidney Kimmel Cancer Center Microarray Core Facility. Briefly, DNA quality was assessed using the PicoGreen DNA Kit (Life Technologies) and 400ng of genomic DNA was bisulfite converted using the EZ DNA Methylation Kit (Zymo Research, Irvine, CA) following manufacturer’s recommendations. A total volume of 4μL of bisulfite-converted DNA was denatured, neutralized, amplified and fragmented according to the manufacturer’s instructions. Finally, 12μL of each sample were hybridized to the array chip followed by primer-extension and staining steps. Chips were image-processed in the Illumina iScan system. Data from the resulting iDat files were normalized with funnorm implemented in the R/Bioconductor package minfi (version 1.16.1)34. Methylation status of each CpG site was computed from the signal intensity in the methylated probe (M) and unmethylated probe (U) as a β value as follows:
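The displayed equation is not reproduced in this text. A commonly used definition of the β value from the two probe intensities, given here as an assumption rather than as the confirmed formula of this study, is:

```latex
\beta = \frac{M}{M + U + \alpha}
```

Here α ≥ 0 is an optional stabilizing offset introduced for this sketch: α = 0 gives the plain ratio M/(M + U), and Illumina's convention uses α = 100. The value used in this study is not stated in the surviving text.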
Annotations of the 450K probes to the human genome (hg19) were obtained from the R/Bioconductor package FDb.InfiniumMethylation.hg19 (version 2.2.0). Probes on sex chromosomes or annotated to SNPs were filtered from analysis. The CpG island probe located closest to the transcription start site was selected for each gene. Genes with CpG island probes less than 200bp from the transcription start site were retained to limit analysis to CpG island promoter probes for each gene. Probes were considered unmethylated for β < 0.1 and methylated for β > 0.3, based upon thresholds defined in TCGA analyses33. All DNA methylation data from this study are available from GEO (GSE98813) as part of SuperSeries GSE98815.
Hierarchical clustering and CoGAPS analysis
Unless otherwise specified, all genomics analyses were performed in R and code for these analyses is available from https://sourceforge.net/projects/scc25timecourse.
Genes from the time-course profiling of the generations of cetuximab treated cells were filtered with the following criteria. Genes from RNA-seq data were selected if they had a log fold change greater than 1 between any two time points of the same condition and less than 2 between the replicate control samples at time zero (5,940 genes). CpG island promoter probes for each gene were retained if the gene switched from unmethylated (β < 0.1) to methylated (β > 0.3) in any two samples of the time course (1,087 genes). We used the union of the sets of genes retained from these filtering criteria on either data platform for analysis, leaving a total of 6,445 genes in RNA-seq and 4,703 in DNA methylation.
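The filtering rules above can be sketched in a few lines. The study's analyses were run in R; the following is only an illustrative Python sketch, and the function names, argument layout, and the choice to ignore the direction of the methylation switch are assumptions made for this example.

```python
def log_range(values):
    """Largest pairwise difference in a list of log-scale values,
    i.e. the maximum log fold change between any two entries."""
    return max(values) - min(values)


def passes_expression_filter(series_by_condition, time_zero_replicates,
                             min_change=1.0, max_replicate_change=2.0):
    """series_by_condition: condition name -> log expression over time.
    time_zero_replicates: log expression of the replicate controls."""
    # Change must exceed the threshold within at least one condition.
    changes = any(log_range(v) > min_change
                  for v in series_by_condition.values())
    # Replicate controls at time zero must agree with each other.
    stable = log_range(time_zero_replicates) < max_replicate_change
    return changes and stable


def passes_methylation_filter(betas, unmethylated=0.1, methylated=0.3):
    """True if some sample is unmethylated and some sample is methylated.
    Assumption: the direction of the switch is not checked."""
    return min(betas) < unmethylated and max(betas) > methylated


def retained_genes(expression_pass, methylation_pass):
    """Union of the genes passing either platform's filter."""
    return sorted(set(expression_pass) | set(methylation_pass))
```

Taking the union rather than the intersection keeps genes that change on only one platform, which is what allows expression and methylation to be compared later for the same gene list.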
Hierarchical clustering analysis was performed with Pearson correlation dissimilarities between genes and samples on all retained genes. CoGAPS analysis was performed independently on the log transformed RNA-seq data and the DNA methylation β values using the R/Bioconductor package CoGAPS35 (version 2.9.2). CoGAPS decomposed the data according to the model
$$D_{i,j} \sim N\left(\sum_{k=1}^{p} A_{i,k} P_{k,j},\ \Sigma_{i,j}\right)$$
where N represents a univariate normal distribution, the matrices A and P are learned from the data for a specified number of dimensions p, Σ_{i,j} is an estimate of the standard deviation of each row and column of the data matrix D, and i represents each gene and j each sample. In this decomposition, each row of the pattern matrix P quantifies the relative association of each sample with a continuous vector of relative gene expression changes in the corresponding column of A. These relative gene weights are called meta-pathways. The standard deviation of the expression data was 10% of the signal with a minimum of 0.5. The standard deviation of DNA methylation data under the assumption that β values follow a beta distribution is
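The closing formula is not reproduced in this text. One derivation consistent with the stated assumption, treating β as beta-distributed with shape parameters equal to the methylated and unmethylated intensities M and U (this parameterization is an assumption, since it is not given in the surviving text), is:

```latex
\beta \sim \mathrm{Beta}(M, U), \qquad
\mathrm{E}[\beta] = \frac{M}{M+U}, \qquad
\mathrm{Var}[\beta] = \frac{MU}{(M+U)^{2}\,(M+U+1)}
                    = \frac{\mathrm{E}[\beta]\,\bigl(1-\mathrm{E}[\beta]\bigr)}{M+U+1}
```

Under that assumption the standard deviation would be approximately \sqrt{\beta(1-\beta)/(M+U+1)}, so probes with low total intensity receive larger uncertainty. For comparison, the expression uncertainty stated above corresponds to \max(0.1\,D_{i,j},\ 0.5).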
CoGAPS was run for a range of 2 to 10 dimensions p for expression and 2 to 5 for DNA methylation. Robustness analysis with ClutrFree36 determined that the optimal number of dimensions p for expression was 5. DNA methylation was run in 4 parallel sets using GWCoGAPS25. In DNA methylation, the maximum number of patterns that modeled resistance mechanisms over and above technical variation in replicate samples of SCC25 was three. Gene sets representative of the meta-pathway were derived for each pattern using the PatternMarkers statistics25. Gene set activity was estimated with the gene set statistic implemented in calcCoGAPSStat of the CoGAPS R/Bioconductor package35. Comparisons between DNA methylation and gene expression values for PatternMarkerGenes or from CoGAPS patterns and amplitudes were computed with Pearson correlation.
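The Pearson correlation used for these comparisons can be illustrated with a small self-contained function. This is a Python sketch for illustration only (the study used R), and the example values are made up.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sums of cross-products and squares of the centered values.
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return sxy / math.sqrt(sxx * syy)


# Hypothetical gene: expression falls as promoter methylation rises.
expression = [8.0, 7.5, 6.0, 4.0]
methylation = [0.05, 0.10, 0.30, 0.60]
r = pearson(expression, methylation)  # strongly negative
```

A strongly negative coefficient between a gene's expression and its promoter methylation across samples is the pattern expected when methylation represses transcription.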
Cetuximab resistance signatures and EGFR network
In a previous study, CoGAPS learned a meta-pathway from gene expression data corresponding to overexpression of the HRASVal12D in the HaCaT model of HPV- HNSCC premalignancy. That study associated the CoGAPS HaCaT-HRAS meta-pathway with gene expression changes in acquired cetuximab resistance in the HNSCC cell line UMSCC137. In the current study, we applied the PatternMarkers statistics25 to the previously published CoGAPS analysis of these data to derive a gene set from this meta-pathway called HACAT_HRAS_CETUXIMAB_RESISTANCE or HACAT_RESISTANCE. In addition, we searched MSigDB38 (version 5.2) for all gene sets associated with resistance to EGFR inhibition. In this search, we found the gene sets COLDREN_GEFITINIB_RESISTANCE_DN and COLDREN_GEFITINIB_RESISTANCE_UP representing resistance to the EGFR inhibitor gefitinib in non-small-cell lung cancer cell lines39. Gene sets of transcription factor targets were obtained from experimentally validated targets annotated in the TRANSFAC40 professional database (version 2014.1).
Sources and analysis of human tumor genomics data
Genomics analyses of TCGA were performed on level 3 RNA-seq and DNA methylation data from the 243 HPV-negative HNSCC samples from the freeze set for publication33. DNA methylation data were analyzed for the same CpG island promoter probes obtained in the cell line studies. Pearson correlation coefficients were computed in R to associate different molecular profiles.
Analysis was also performed on gene expression data measured with Illumina HumanHT-12 WG-DASL V4.0 R2 expression beadchip arrays on samples from patients treated with cetuximab from Bossi et al41, using expression normalization and progression-free survival groups as described in the study. Data was obtained from the GEO GSE65021 series matrix file. We performed t-tests in R on the probe that had the highest standard deviation of expression values for each gene.
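The probe selection step can be sketched as follows. This is an illustrative Python sketch rather than the R code used in the study; the probe and gene identifiers are hypothetical, and the Welch form of the t statistic is an assumption (it is the default of R's `t.test`, but the text does not say which variant was used).

```python
import math
from statistics import mean, stdev, variance


def most_variable_probe(probes_by_gene, expression):
    """For each gene, pick the probe with the highest standard deviation.

    probes_by_gene: gene -> list of probe ids
    expression:     probe id -> list of expression values across samples
    """
    return {gene: max(probes, key=lambda p: stdev(expression[p]))
            for gene, probes in probes_by_gene.items()}


def welch_t(group_a, group_b):
    """Welch's t statistic (unequal variances) for two groups of values."""
    standard_error = math.sqrt(variance(group_a) / len(group_a)
                               + variance(group_b) / len(group_b))
    return (mean(group_a) - mean(group_b)) / standard_error


# Hypothetical probes for one gene across four samples.
expression = {"probe_1": [1.0, 1.1, 0.9, 1.0],
              "probe_2": [1.0, 3.0, 0.5, 2.5]}
selected = most_variable_probe({"GENE_A": ["probe_1", "probe_2"]}, expression)
```

Choosing one probe per gene before testing avoids counting the same gene several times when multiple probes map to it.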
RESULTS
Prolonged exposure to cetuximab induces resistance
SCC25 is among the most sensitive HNSCC cell lines to cetuximab but can acquire resistance during a long-term exposure to cetuximab30. In this study, cetuximab resistance was induced by exposing the SCC25 cells to the targeted therapeutic agent for a period of eleven weeks (CTX-G1 to –G11) (Supplemental Fig. 1). The SCC25 cells treated with PBS were used as time-matched controls (PBS-G1 to –G11). Response to cetuximab was determined by comparing the proliferation rates between CTX and PBS generations. Proliferation of the PBS generations was stable throughout the eleven weeks (G1 to G11). Conversely, proliferation of the CTX generations progressively increased over each week (Fig. 1). Relative to the untreated controls, the growth of the treated cells was initially (CTX-G1) inhibited until CTX-G3. Starting at CTX-G4, the cells became resistant to the anti-proliferative effects of cetuximab and gained stable growth advantages compared to the untreated controls.
Comparison of proliferation rates between generations of CTX treated cells relative to generations of cells treated with PBS enabled us to conclude that cell growth advantages arise from chronic cetuximab treatment and were associated with resistance rather than prolonged cell culturing. We mirrored the changes in proliferation rates with clinical responses seen in HNSCC tumors treated with cetuximab (Fig. 1, top panel). Specifically, we inferred that the decreased growth rates in CTX-G1 to –G3 represented initial stages of treatment with a decrease in tumor size. Then, the switch from decreased to increased growth rates during CTX-G3 to -G4 represented stable disease without tumor outgrowth. Finally, the higher proliferation in cetuximab-treated cells starting at CTX-G4 represented rapid outgrowth after acquired resistance.
Because higher proliferation in treated than untreated cells started at CTX-G4, this was the timepoint at which acquired cetuximab resistance began, and all subsequent timepoints continued to acquire stable cetuximab resistance. The resistant CTX generation 10 (CTX-G10) also presented enhanced anchorage-independent growth when compared to the parental SCC25 (G0) at different concentrations of cetuximab (two-way ANOVA with multiple comparisons p-value < 0.01 for each concentration, Supplemental Fig. 2), representing the stabilization of cetuximab resistance in later generations.
Treatment vs. control gene expression changes dominated clustering and immediate therapeutic response was confounded with changes from acquired resistance
To characterize the gene expression changes occurring as cells acquire cetuximab resistance, we collected RNA-seq data for the parental SCC25 cell line (G0) and from each generation of CTX- and PBS-treated cells. The RNA-seq data hierarchical unsupervised clustering separated genes with expression changes in treated and untreated generations (Fig. 2A). Clustering analysis of samples (Supplemental Fig. 3) further distinguished clusters with gene expression changes in stages with cetuximab sensitivity (CTX-G1 to CTX-G3), early stages of resistance (CTX-G4 to CTX-G8), and late stages of resistance (CTX G9-G11). Expression changes at these distinct stages were shared between numerous genes. Confounding by changes resulting from immediate therapeutic response made identification of resistance-specific gene expression changes impossible with clustering.
Similar separation of stages of cetuximab response was observed in clustering analysis of gene signatures previously described in HNSCC and non-small cell lung cancer cell line models resistant to cetuximab or gefitinib (anti-EGFR small molecule), respectively37,39 (Supplemental Fig. 4). For these genes, changes during early stages of resistance clustered for CTX-G4 to CTX-G6 as distinct from later stages for CTX G7-11. These gene signatures also clustered samples with gene expression changes at early stages (CTX-G1 to G3) as distinct from samples from PBS treated generations. However, these analyses were insufficient to quantify the relative dynamics of genes associated with immediate response to therapy or subsequent acquired resistance.
CoGAPS analysis of gene expression distinguished patterns of acquired resistance from immediate therapeutic response
To define gene expression signatures for treatment effect and cetuximab resistance, we applied the CoGAPS25 Bayesian matrix factorization algorithm to the time-course gene expression data. Bayesian non-negative matrix factorization with algorithms such as CoGAPS has already proven highly effective in relating gene expression changes to patterns related to EGFR inhibition44, perturbation of nodes in the EGFR network45, and time course dynamics of targeted therapeutics. CoGAPS is an unsupervised algorithm that simultaneously infers the relative magnitude of genes in concordantly transcribed gene sets in each sample. These relative magnitudes across samples are called patterns and quantify the separation of distinct experimental conditions. The gene sets are inferred simultaneously, and are continuous to quantify the relative magnitude of gene weights in each set. A single gene may have non-zero magnitude in several distinct gene sets, representing the fact that a single gene can have distinct roles in different biological processes (such as immediate therapeutic response and acquired resistance). A recently developed PatternMarker statistic25 selects the genes that are unique to each of the inferred patterns, and therefore represent biomarkers unique to the corresponding biological process.
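The idea of selecting genes unique to one pattern can be sketched as follows. This is a simplified Python illustration, not the CoGAPS implementation: it assumes each gene's weights are scaled by their maximum and then compared, by Euclidean distance, with the ideal profile of a gene that loads on exactly one pattern. The function name and data layout are hypothetical.

```python
import math


def pattern_markers(amplitudes):
    """Assign each gene to the single pattern it most uniquely represents.

    amplitudes: gene -> list of non-negative weights, one per pattern.
    Returns pattern index -> genes ranked from most to least unique.
    """
    scored = {}
    for gene, row in amplitudes.items():
        top = max(row)
        if top <= 0:
            continue  # a gene with no weight in any pattern marks nothing
        scaled = [w / top for w in row]
        # Distance from the scaled row to each unit vector e_k, which is
        # the profile of a gene that belongs to pattern k alone.
        dists = [
            math.sqrt(sum((s - (1.0 if j == k else 0.0)) ** 2
                          for j, s in enumerate(scaled)))
            for k in range(len(row))
        ]
        best = min(range(len(row)), key=dists.__getitem__)
        scored.setdefault(best, []).append((dists[best], gene))
    return {k: [gene for _, gene in sorted(v)] for k, v in scored.items()}


# Hypothetical amplitude rows for five genes and three patterns.
markers = pattern_markers({
    "g1": [5.0, 0.0, 0.0],   # exclusive to pattern 0
    "g2": [1.0, 4.0, 1.0],   # mostly pattern 1
    "g3": [3.0, 2.0, 2.0],   # shared, weakly pattern 0
    "g4": [0.0, 0.5, 3.0],   # mostly pattern 2
    "g5": [0.0, 0.0, 0.0],   # no signal
})
```

Because every gene is assigned to at most one pattern, a heatmap restricted to these genes shows each pattern's dynamics without the overlap that multiply regulated genes would introduce.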
We identified five CoGAPS patterns in the time course gene expression dataset: three patterns that distinguished the experimental conditions (cetuximab vs. PBS) (Fig. 2B, Fig. 2C and Supplemental Fig. 5); one pattern that represented changes in gene expression from the parental cell lines and subsequent generations; and one pattern that was constant and corresponded to a signature of highly expressed genes (Supplemental Fig. 2). We applied the PatternMarker statistic to define genes that were uniquely associated with each of these patterns. We excluded the technical, flat pattern to focus on genes with expression changes. By design, the PatternMarker statistic selects genes that are not multiply regulated. Therefore, limiting the heatmap to these genes enabled visualization of the dynamics of gene expression changes in our time-course dataset (Fig. 2B). The relative magnitude of CoGAPS pattern weights for each sample quantified the dynamics of gene expression changes (Fig. 2C).
Similar to the separation seen with clustering (Supplemental Fig. 5), the first CoGAPS expression pattern distinguished cetuximab from PBS at every generation (expression pattern 1, Fig. 2B and Fig. 2C, top). These genes had an immediate transcriptional induction in response to cetuximab treatment. Gene set analysis to determine the function of CoGAPS patterns was performed with an enrichment analysis on all gene weights obtained from the CoGAPS analysis. By performing the analysis on gene weights and not only the PatternMarker genes in Fig. 2B, we accounted for multiple regulation of genes in pathways. Gene set enrichment analysis confirmed that published resistance signatures37,39 were significantly enriched in this pattern (Supplemental Fig. 6; one-sided p-values of 0.002 and 0.003 for resistance gene sets COLDREN_GEFITINIB_RESISTANCE_DN and HACAT_HRAS_CETUXIMAB_RESISTANCE, respectively). However, the transcriptional changes in this pattern were not associated with acquired resistance to cetuximab, and even decreased modestly as resistance developed. Further, enrichment by transcription factor AP-2alpha targets (TFAP2A; one-sided p-value of 0.05) confirmed previous work indicating that transcription by AP-2alpha is induced as an early feedback response to EGFR inhibition46. Based upon these findings, we concluded that pattern 1 was associated with immediate response to cetuximab although it includes genes that were also associated with cetuximab resistance in previous studies.
The second CoGAPS expression pattern quantified divergence of the cetuximab treated cells from controls at generation CTX-G4 (expression pattern 2, Fig. 2B and Fig. 2C, middle), which was the time point at which cetuximab treated cells presented a significant and stable growth advantage over PBS controls (Fig. 1). Therefore, expression pattern 2 captured gene expression signatures associated consistently with the development of cetuximab resistance. Gene set statistics of transcription factor targets of EGFR on CoGAPS gene weights were significantly down-regulated in this acquired resistance pattern (Supplemental Fig. 6). One striking exception was c-Myc, which trended with acquired resistance (p-value of 0.06), consistent with the role of this transcription factor in cellular growth. The resistance signature COLDREN_GEFITINIB_RESISTANCE_DN was significantly down-regulated in expression pattern 2 (p-value of 0.04).
CoGAPS expression pattern 3 represented a gradual repression of gene expression with cetuximab treatment (Fig. 2B and Fig. 2C, bottom). This expression pattern trended to significant enrichment in the COLDREN_GEFITINIB_RESISTANCE_DN resistance signature (Supplemental Fig. 6, one-sided p-value 0.12) and down-regulated in the HACAT_HRAS_CETUXIMAB_RESISTANCE resistance signature (Supplemental Fig. 6, one-sided p-value 0.09). This confirmed that expression pattern 3 was associated with repression of gene expression during acquired cetuximab resistance.
Significant enrichment of the acquired resistance signature in CoGAPS expression patterns 1-3 (Supplemental Fig. 6) suggested that genes defined from case-control experimental designs of acquired resistance provide a mixture of genes associated with early response to cetuximab and genes associated with acquired resistance. In addition, the published resistance signatures37,39 included genes that the CoGAPS and PatternMarker analysis associated with immediate response to treatment (Supplemental Fig. 4). Inclusion of immediate response genes from expression pattern 1 in the published resistance signatures arose from the design of the experiments in the original publications37,39. Specifically, the resistance signatures were derived from data collected at a single time point when the cell models had already developed resistance. At the same time point in our time-course data, gene expression changes included both immediate response genes and longer-term expression changes due to acquired resistance. Therefore, both sets of genes have significant expression changes in resistant cells when compared only to their parental cell line. These immediate response genes cannot be eliminated without including additional samples at intermediate time points. This observation is consistent with recent studies demonstrating that time-course proliferation data increase the accuracy of drug-response metrics by removing the confounding effects of variability in cell growth/division rates and treatment effects42,43. Thus, the gene expression signatures in CoGAPS patterns from the time course were able to separate transcriptional changes specific to immediate therapeutic response from those specific to acquired resistance.
Changes in DNA methylation inferred with CoGAPS were associated with resistance to cetuximab, but not the immediate response to treatment observed in gene expression
To determine the timing of the methylation changes associated with acquired resistance, we also measured DNA methylation in each cetuximab generation of SCC25 cells and PBS controls (Fig. 3A). Application of the CoGAPS matrix factorization algorithm to the methylation data revealed a total of 3 patterns (Fig. 3B and Fig. 3C): gradual increase of DNA methylation in controls (DNA methylation pattern 1, middle); rapid demethylation in CTX generations starting at CTX-G4 (DNA methylation pattern 2, bottom); and rapid increase in DNA methylation in CTX generations starting at CTX-G4 (DNA methylation pattern 3, top). In contrast to the gene expression data, there was no immediate shift in DNA methylation resulting from cetuximab treatment.
Comparing the CoGAPS patterns from gene expression and DNA methylation revealed strong anti-correlation between gene expression and DNA methylation in resistant patterns (Supplemental Fig. 7A). We observed that the gene expression changes associated with acquired resistance occurred gradually and were evident in early generations (Fig. 4, top). In DNA methylation patterns 2 and 3, DNA methylation was consistent between cetuximab-treated cells and PBS controls during early generations. Then, rapid accumulation of DNA methylation changes started after generations CTX-G4 and CTX-G5 in both patterns 2 and 3 (Fig. 4, bottom), concurrent with the onset of the observed growth advantage over the PBS control (Fig. 1). Changes in DNA methylation were delayed relative to those of gene expression in acquired cetuximab resistance (Fig. 4, dashed vertical lines). These dynamics suggest that DNA methylation stabilized the gene expression signatures crucial to the maintenance of acquired cetuximab resistance.
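One simple way to put a number on such a delay is to find, for each data type, the first time point at which the treated pattern departs from its control, and then take the difference. The sketch below assumes a fixed threshold rule, which is an illustrative choice and not the procedure used to place the dashed lines in Fig. 4. The pattern weights are invented toy values, and their indices do not correspond to actual generations.

```python
def onset_index(treated, control, threshold):
    """Return the first index where |treated - control| exceeds threshold,
    or None if the two series never diverge by that much."""
    for i, (t, c) in enumerate(zip(treated, control)):
        if abs(t - c) > threshold:
            return i
    return None

# Invented pattern weights over six consecutive time points.
expr_treated = [0.10, 0.20, 0.35, 0.50, 0.65, 0.80]  # gradual change
expr_control = [0.10, 0.10, 0.10, 0.10, 0.10, 0.10]
meth_treated = [0.10, 0.10, 0.10, 0.10, 0.60, 0.90]  # late, sharp change
meth_control = [0.10, 0.10, 0.10, 0.10, 0.10, 0.10]

expr_onset = onset_index(expr_treated, expr_control, threshold=0.2)
meth_onset = onset_index(meth_treated, meth_control, threshold=0.2)

# Positive delay means methylation diverges later than expression.
delay = meth_onset - expr_onset
```

The result depends on the threshold, so in practice one would check that the estimated delay is stable over a range of thresholds.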
Epigenetic regulation of FGFR1 expression was associated with acquired cetuximab resistance in the time course and in stable cetuximab resistant clones
The gene signatures from the CoGAPS resistance patterns for expression and methylation had low correlation (Supplemental Fig. 7B) and there was little overlap between their respective PatternMarker genes. The low overlap between genes and their timing differences indicated that alterations to transcription were independent of DNA methylation. Therefore, the CoGAPS gene signatures from each data modality were insufficient to define the functional DNA methylation regulation of acquired resistance. Nonetheless, we hypothesized that the DNA methylation changes contributed to stabilizing the resistant phenotype. Characterizing the role of such epigenetic regulation is critical to understanding the stable resistant phenotype. Moreover, identifying these epigenetic drivers can provide targets to overcome such stable resistance.
To ascertain potential drivers of the stable cetuximab resistant phenotype induced by DNA methylation, we defined genes that are PatternMarkers25 of the DNA methylation patterns associated with stable acquired cetuximab resistance (methylation patterns 2 and 3). We then applied correlation analysis to determine genes that were epigenetically silenced. Specifically, we performed correlation analysis between DNA methylation and gene expression for each of the DNA methylation PatternMarker genes (Fig. 5). FGFR1 was among these genes. This finding was consistent with previous studies that associate differential expression of FGFR1 with resistance to EGFR inhibitors, including cetuximab, in different tumor types in vitro and in vivo26–28. Given the tight temporal regulation of these genes and the previous work on FGFR1, we hypothesized that this set of genes represented epigenetic drivers of acquired resistance.
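The per-gene correlation analysis described above can be sketched as follows. For each gene, the Pearson correlation between its DNA methylation and its expression across samples is computed, and genes with strongly negative correlation are kept as candidates for epigenetic regulation. The gene names, values, and the fixed cutoff of -0.5 are assumptions made for this example. The analysis in the text selected genes by statistical significance, which this sketch does not compute.

```python
import math

def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def anticorrelated_genes(methylation, expression, cutoff=-0.5):
    """Return {gene: r} for genes present in both tables whose
    methylation-expression correlation is at or below the cutoff."""
    hits = {}
    for gene, meth in methylation.items():
        if gene not in expression:
            continue
        r = pearson(meth, expression[gene])
        if r <= cutoff:
            hits[gene] = r
    return hits

# Invented toy values for two hypothetical genes across five samples.
methylation = {
    "GENE_A": [0.9, 0.8, 0.5, 0.2, 0.1],  # methylation falls
    "GENE_B": [0.2, 0.3, 0.5, 0.6, 0.8],  # methylation rises
}
expression = {
    "GENE_A": [1.0, 2.0, 4.0, 7.0, 9.0],  # expression rises
    "GENE_B": [1.0, 2.0, 3.0, 4.0, 6.0],  # expression rises
}
hits = anticorrelated_genes(methylation, expression)
```

Here only the hypothetical GENE_A is retained, because its expression rises as its methylation falls, which is the canonical signature of a gene released from epigenetic silencing.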
To delineate whether our presumptive drivers resulted from clonal expansion of resistant cells or from the development of new epigenetic alterations to drive resistance, we measured DNA methylation and gene expression on a panel of eleven isogenic stable cetuximab resistant clones derived from SCC25 cells previously30. Briefly, SCC25 was continuously treated with cetuximab until resistance developed, and then single cell clones were isolated and profiled in the absence of cetuximab treatment. Despite being derived from parental SCC25 cells, the single cell clones and time course generations displayed widespread differences. Significantly greater heterogeneity was observed among the cetuximab resistant single-cell clones in both expression and methylation profiles (Supplemental Fig. 8 and 9, respectively) and cellular morphology (Supplemental Fig. 10). Fig. 5A and 5B demonstrate that higher heterogeneity among single cell clones was also observed in the epigenetically regulated PatternMarker genes from the CoGAPS analysis that are shown in Figure 4D. These results suggest that different mechanisms of resistance may arise in the same HNSCC cell line.
We hypothesized that epigenetically regulated genes shared between the time course patterns and the resistant single-cell clones may implicate common mechanisms acquired during evolution of the stable resistance phenotype. To test this hypothesis, we also performed correlation analysis for each of the epigenetically regulated genes in our resistant set (Fig. 5) in the resistant clones and parental cell lines. Nine of the epigenetically regulated PatternMarker genes also had significantly anti-correlated gene expression and DNA methylation in the stable cetuximab resistant clones (Supplemental Fig. 11). Of these, only FGFR1 was demethylated and re-expressed in a cetuximab resistant clone relative to the parental SCC25 cell line (Fig. 6). In this analysis, epigenetic regulation of gene expression for FGFR1 occurred in only one of the resistant clones (CTXR10). This clone was among the fastest growing under cetuximab treatment (Supplemental Fig. 12). This observation suggested that the pooled data from the time course captured clonal outgrowth of a cetuximab resistant clone with similar molecular features (FGFR1 demethylation) to CTXR10, and therefore that clonal outgrowth was the dominant mechanism of resistance in our resistance model.
Observed FGFR1 dynamics in vitro recapitulate relationships from in vivo tumor genomics and acquired cetuximab resistance
To validate our in vitro findings, we further investigated the pattern of expression and methylation of FGFR1 and EGFR in other publicly available datasets. Using gene expression and DNA methylation data from The Cancer Genome Atlas (TCGA) for 243 HPV-negative HNSCC pretreatment samples33, we verified that the up-regulation of EGFR and FGFR1 is not concomitant (Pearson correlation coefficient = −0.06, p value = 0.33, Fig. 7A). We found that FGFR1 gene expression and DNA methylation status were significantly negatively correlated in TCGA samples (Pearson correlation coefficient = −0.32, p value < 0.0001, Fig. 7B), suggesting that FGFR1 transcription was epigenetically regulated in a significant proportion of HPV-negative HNSCC tumors.
Bossi et al.41 collected gene expression data from cetuximab-treated HNSCC patients with recurrent metastasis with either short- (SPFS, median 3 months survival) or long-progression-free survival (LPFS, median 19 months survival). Using this dataset, we verified that EGFR expression in the SPFS group is significantly lower than in the LPFS group (Fig. 7C, log fold change −1.0, t-test p-value 0.0003). The opposite was observed for FGFR1, with overexpression in SPFS vs. LPFS (Fig. 7D, log fold change 0.9, t-test p-value 0.003). However, Bossi et al.41 lacked DNA methylation data to assess whether FGFR1 was epigenetically regulated in these samples. Nonetheless, this finding, in combination with the data from TCGA, supports our findings that the non-responder phenotype was accompanied by loss of EGFR expression and gain in FGFR1 expression as a result of FGFR1 promoter demethylation acquired during the development of cetuximab resistance.
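The group comparison above rests on two quantities: a log fold change and a t statistic. As a minimal sketch, assuming expression values are already on a log2 scale, the log fold change is the difference in group means. The text does not state which form of t-test was used, so the Welch (unequal variance) statistic below is an assumption, and the p-value is omitted because it requires the t distribution, which the Python standard library does not provide. The input values are invented.

```python
import math
from statistics import mean, variance

def log_fold_change(group_a, group_b):
    """Difference in mean log2 expression, group_a minus group_b."""
    return mean(group_a) - mean(group_b)

def welch_t(group_a, group_b):
    """Welch's t statistic for two independent samples
    (sample variances are not pooled)."""
    se = math.sqrt(variance(group_a) / len(group_a)
                   + variance(group_b) / len(group_b))
    return (mean(group_a) - mean(group_b)) / se

# Invented log2 expression values for one gene in two patient groups.
short_pfs = [2.0, 3.0, 4.0]
long_pfs = [5.0, 6.0, 7.0]

lfc = log_fold_change(short_pfs, long_pfs)
t_stat = welch_t(short_pfs, long_pfs)
```

A negative log fold change with a large negative t statistic indicates lower expression in the first group. With a statistics package available, the same comparison would also return a p-value.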
DISCUSSION
Numerous short time course genomics studies of therapeutic response have been performed42,47,48, but this is the first time that genetic and epigenetic changes were measured over a prolonged exposure (11 weeks) to a targeted therapeutic agent. Using our novel, robust time course experimental approach, we characterized the molecular alterations during the development of acquired cetuximab resistance in HNSCC in vitro. By collecting cells over experimentally equivalent cultures (cetuximab and PBS control generations), we could measure changes in proliferation and in multiple genomics data platforms as resistance developed. We applied this approach to the intrinsically cetuximab-sensitive cell line SCC25 to track the molecular progression in acquired cetuximab resistance. Thus, this was the first study to our knowledge to enable characterization of the dynamics at the early stages of therapeutic resistance, which cannot be measured in patients due to the complexity of early detection of resistance and of obtaining repeat biopsy samples.
Determining the dynamics of the molecular alterations responsible for resistance requires integrated, time-course bioinformatics analysis to quantify the dynamics of these alterations. Based upon previous performance of Bayesian, non-negative matrix factorization algorithms in inferring dynamic regulatory networks for targeted therapeutics48,49, we selected CoGAPS35 for analysis of gene expression data from our time course experiment. In this dataset, CoGAPS analysis of gene expression data from cetuximab resistant clones distinguished the patterns for immediate gene expression changes and patterns for long-term changes associated with acquired resistance. Gene expression signatures for resistance to EGFR inhibitors from previous studies were significantly enriched in both types of CoGAPS patterns. These previous resistance signatures were learned from case-control studies that compared gene expression for sensitive cells to that of resistant cells, without multiple time point measurements. Therefore, we concluded that time course data was instrumental in parsing signatures of immediate therapeutic response from signatures of acquired resistance.
Pooling cells to obtain paired measures of methylation and gene expression enabled us to evaluate whether changes in DNA methylation impact gene expression. CoGAPS analysis of DNA methylation data revealed only changes associated with acquired resistance, in contrast to the immediate expression changes observed with cetuximab treatment. Thus, while therapeutic response can drive massive changes in gene expression, only the subset of expression changes associated with the development of resistance have corresponding epigenetic signatures, suggesting that the epigenetic landscape was important for the establishment of acquired resistance. The CoGAPS patterns in gene expression that were associated with acquired cetuximab resistance gradually changed over the time course. On the other hand, the CoGAPS patterns for DNA methylation changes had a sharp transition at the generation at which resistance was acquired (CTX-G4). These patterns reflect a delayed, but more rapid, change in DNA methylation. Our data are consistent with previous observations that gene expression changes precede DNA methylation alterations in genes critical for cancer progression. P16INK4A and GSTP1 are tumor suppressor genes for which transcriptional silencing was found to occur prior to DNA hypermethylation and chromatin changes. The temporal delay observed between expression and methylation patterns in our time course provides transcriptome-wide evidence of this phenomenon. Specifically, epigenetic changes are necessary to stabilize the aberrant gene expression profile, which is followed by modification into a silenced methylation state that results in tumor progression50,51. Our integrated RNA-seq and DNA methylation analysis corroborated that gene expression changes occur earlier than epigenetic alterations and suggested that, in acquired cetuximab resistance, DNA methylation is essential to maintain the changes in gene expression.
Future investigation into the chromatin remodeling mechanisms will test whether chromatin alterations follow the changes in expression and occur in combination with altered methylation patterns to drive epigenetic regulation of resistance.
In a recent study, gene expression changes were associated with a transient resistant phenotype present in melanoma cell lines prior to vemurafenib administration52. Once the melanoma cells were exposed to the drug, additional changes in gene expression were detected and were later accompanied by changes in chromatin structure52. These findings, together with our time course observations, suggest that in the heterogeneous tumor environment the existence of some cells expressing specific marker genes can trigger cellular reprogramming as soon as the targeted therapy is initiated. Upon drug administration, the number of genes with aberrant expression increases, and is followed by other epigenetic and genetic changes that shift the transient resistant state into a stable phenotype. This finding on the development of acquired resistance could dramatically change the course of treatment with targeted therapeutic agents. The precise characterization of resistant gene signatures and their timing could be used to determine the correct point during the patients’ clinical evolution to introduce alternative therapeutic strategies. In this way, secondary interventions would start before the stable resistant phenotype has spread among the tumor cells, resulting in prolonged disease control and a substantial increase in overall survival.
The timing delays between alterations in DNA methylation and gene expression pose a further computational challenge for integrated, time course genomics analyses. The vast majority of integrated analysis algorithms assume one-to-one mapping of genes in different data platforms or seek common patterns or latent variables across them53. Such approaches would fail to capture the early changes from cetuximab treatment that impact only gene expression, the time delays between DNA methylation and gene expression patterns, and the different gene usage in each pattern. It is essential to develop new integrated algorithms that simultaneously distinguish patterns that are shared across data types from those that are unique to each platform. For time course data, these algorithms must also model regulatory relationships that may give rise to timing delays, such as epigenetic silencing of gene expression. However, as we observed with the unanticipated changes in DNA methylation following and not preceding gene expression, they must also consider delays resulting from larger phenotypic changes such as the stability of the therapeutic resistant phenotype.
In spite of the complexities of the data integration, the weight of each sample in patterns inferred by CoGAPS reflected the dynamics of the process in each data modality. These patterns were learned completely unsupervised from the data, and did not require any gene selection or comparison between time points relative to any reference control. The genes associated with CoGAPS patterns had weights that were non-zero in multiple patterns. The PatternMarker25 statistic enabled further selection of the genes that were uniquely associated with each pattern. Creating a heatmap of the genomics profiles for these genes enabled novel, heatmap-based visualization of the temporal dynamics in the omics data. In the case of DNA methylation, these pattern marker genes also included genes representing driver alterations in resistance. However, transcriptional regulation by epigenetic alterations or in pathways involves simultaneous co-regulation of multiple genes. This co-regulation was reflected in the reuse of genes in the CoGAPS gene weights associated with each pattern. Therefore, estimates of pathway dynamics from transcriptional data required accounting for all genes with gene set enrichment statistics instead of the PatternMarker statistic. Thus, we hypothesize that the PatternMarker statistic is robust for visualization, biomarker identification, and detection of functional alterations in time-course DNA methylation data, whereas it is robust only for visualization and biomarker selection in time-course transcriptional data.
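The idea of selecting genes that are uniquely associated with one pattern can be illustrated with a simplified sketch. The rule below is an assumption made for illustration and may differ from the published PatternMarker statistic in its scaling and thresholding. Each gene's weights are scaled by their maximum, the Euclidean distance to an ideal profile (weight 1 in one pattern, 0 in all others) is computed for every pattern, and the gene is assigned to the nearest pattern. Gene names and weights are invented.

```python
import math

def pattern_markers(amplitudes):
    """Assign each gene to the single pattern it most exclusively loads on.

    amplitudes: {gene: [weight in pattern 0, weight in pattern 1, ...]}
    Returns {pattern index: [genes ordered from most to least exclusive]}.
    """
    assigned = {}
    for gene, weights in amplitudes.items():
        top = max(weights)
        if top <= 0:
            continue  # gene carries no weight in any pattern
        scaled = [w / top for w in weights]
        # Distance from the scaled weights to each ideal unit profile.
        dists = [
            math.sqrt(sum((s - (1.0 if j == k else 0.0)) ** 2
                          for j, s in enumerate(scaled)))
            for k in range(len(weights))
        ]
        best = min(range(len(weights)), key=dists.__getitem__)
        assigned.setdefault(best, []).append((dists[best], gene))
    # Within each pattern, smaller distance means a more specific marker.
    return {k: [g for _, g in sorted(v)] for k, v in assigned.items()}

# Invented gene weights over three patterns.
amplitudes = {
    "GENE_A": [9.0, 1.0, 0.0],  # mostly pattern 0
    "GENE_B": [0.5, 8.0, 0.5],  # mostly pattern 1
    "GENE_C": [5.0, 4.0, 0.0],  # shared between patterns 0 and 1
    "GENE_D": [6.0, 0.0, 0.0],  # exclusive to pattern 0
}
markers = pattern_markers(amplitudes)
```

A gene such as the hypothetical GENE_C, which is reused across patterns, ranks last among the markers of its pattern. This shows why a uniqueness-based statistic suits visualization and biomarker selection, while co-regulated pathway genes call for gene set statistics over all weights.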
Among the genes we observed with the canonical relationship between expression and methylation, FGFR1 presented with loss of CpG methylation accompanied by increased gene expression. FGFR1 is a receptor tyrosine kinase that regulates downstream pathways, such as PI3K/AKT and RAS/MAPK, that are also regulated by EGFR54. Its overexpression has been previously associated with EGFR inhibitor resistance26–28. To our knowledge, this is the first study showing epigenetic regulation of FGFR1 in HNSCC and the association of that epigenetic regulation with acquired cetuximab resistance. In this case, FGFR1 induction through promoter demethylation in concordance with down-regulation of EGFR appears to be the dominant mechanism. The novel cell culture protocol and time course analysis we developed here enabled us to see the clonal outgrowth of this particular mechanism. These results are also relevant for further translational studies into the role of FGFR1 as a potential biomarker of acquired cetuximab resistance and a potential target to overcome that resistance. FGFR1 is a potential target for combined targeted therapy with EGFR, and inhibitors against this target are already the focus of clinical trials54.
We recognize that a limitation of the current study was the use of only one cell line model to induce resistance and collect the time course data for gene expression and epigenetics analysis. However, we had to take into consideration the potential batch and technical effects of broad cross-platform profiling since multiple data points in the analysis had to be accounted for when determining the number of cell models to be included. Nevertheless, the analysis of HNSCC patient samples from TCGA33 and another study41 validated our finding that FGFR1 is up-regulated and demethylated in HNSCC and associated with resistance to cetuximab.
The in vitro protocol for time course sampling developed in this study has the additional advantage of aggregating potentially heterogeneous mechanisms of resistance, increasing the signal of changes in any cetuximab resistant subclone. For example, we observed epigenetic regulation of FGFR1 in the pooled cells, but only a single stable clone generated from the same SCC25 cell line in a previous study (CTXR10) had upregulation of FGFR130. This finding suggests that tumor heterogeneity also plays a role in acquired resistance to targeted therapies and enables different pathways to be used to bypass the silenced target within the same tumor. The heterogeneity in methylation profiles reflected the complexity of the resistance mechanisms that can arise from combination therapies in heterogeneous tumors. Future work extending these protocols to in vivo models is essential to determine the role of the microenvironment in inducing therapeutic resistance. Developing in vivo models with acquired therapeutic resistance presents numerous technical challenges that must first be addressed before such time course sampling is possible9. Pinpointing precise molecular predictors of therapeutic resistance will facilitate the identification of unprecedented biomarkers and reveal the mechanisms by which to overcome acquired therapeutic resistance to most therapies used to treat cancer.
Author contributions
G.S., L.T.K., S.L., C.H.C. and E.J.F. planned, designed and wrote the manuscript with input from all authors. G.S., L.T.K., S.L., M.T., C.H.C. and E.J.F. contributed to the development of methodology. G.S., S.L. and E.J.F. performed analysis and interpretation of data (e.g., computational analysis). R.R., H.O., H.C., M.C., A.F., L.V.D., J.A. and D.A.G. participated in development of methodology and provided technical and material support. R.R., L.V.D., E.I. and D.A.G. participated in review and/or revision of the manuscript. All authors discussed the data and contributed to the manuscript preparation. C.H.C. and E.J.F. instigated and supervised the project.
Acknowledgements
We thank the JHMI Deep Sequencing & Microarray Core and the SKCCC Microarray Core Facility for performing and providing advice on RNA-Seq and DNA methylation hybridization arrays, respectively; and S. Boca, B. Kerr, S. Floor, C. Mak, T. Ou, D. Sidransky, L. M. Weiner, F. Zamuner, K. Zambo, and members of NewPISlack for critical comments and feedback during the preparation of the manuscript. This work was supported by NIH Grants R01CA177669, R21DE025398, P30 CA006973, R01 DE017982, and SPORE P50DE019032.
Footnotes
Genevieve Stein-O’Brien: gsteinobrien{at}jhmi.edu, Luciane T Kagohara: ltsukam1{at}jhmi.edu, Sijia Li: sli61{at}jhu.edu, Manjusha Thakar: mthakar3{at}jhmi.edu, Ruchira Ranaweera: Ruchira.Ranaweera{at}moffitt.org, Hiroyuki Ozawa: ozakky{at}cb.mbn.or.jp, Haixia Cheng: haixia.cheng{at}hci.utah.edu, Michael Considine: mconsid3{at}jhmi.edu, Alexander Favorov: favorov{at}sensi.org, Ludmila Danilova: ldanilo1{at}jhmi.edu, Joseph A Califano: jcalifano{at}ucsd.edu, Evgeny Izumchenko: izumchen{at}jhmi.edu, Daria A Gaykalova: dgaykal1{at}jhmi.edu, Christine H Chung: Christine.Chung{at}moffitt.org, Elana J Fertig: ejfertig{at}jhmi.edu