Abstract
With an overall lifetime risk of about 4.3% and 4.0%, in men and women respectively, colorectal cancer remains the third leading cause of cancer-related deaths in the United States. In persons aged 55 and below, its rate increased at 1% per year in the years 2008 to 2017 despite the steady decline associated with improved screening, early diagnosis and treatment in the general population. Besides standardized therapeutic regimen, many trials continue to evaluate the potential benefits of vorinostat, mostly in combination with other anti-neoplastic agents for its treatment. Vorinostat, an FDA approved anti-cancer drug known as suberoylanilide hydroxamic acid (SAHA), an histone deacylase (HDAC) inhibitor, through many mechanisms, causes cancer cell arrest and death. However, like many other anti-neoplastic agents, resistance and or failures have been observed. In the HCT116 colon cancer cell line xenograft model, exploiting potential lethal molecular interactions by additional gene knockouts restored vorinotat sensitivity. This phenomenon, known as synthetic lethality, offers a promise to selectively target cancer cells. Although without clearly delineated understanding of underlying molecular processes, it has been demonstrated as an effective cancer-killing mechanism. In this study, we aimed to elucidate mechanistic interactions in multiple perturbations of identified synthetically lethal experiments, particularly in the vorinostat-resistant HCT116 (colon cancer xenograft model) cell line. Given that previous studies showed that knocking down GLI1, a downstream transcription factor involved in the Sonic Hedgehog pathway – an embryonal gene regulatory process, resulted in restoration of vorinostat sensitivity in the HCT116 colorectal cancer cell line, we hypothesized that vorinostat resistance is a result of upregulation of embryonal cellular differentiation processes; we hypothesized that elucidated regulatory mechanism would include crosstalks that regulate this biological process. We employed a knowledege-guided fuzzy logic regulatory inference method to elucidate mechanistic relationships. We validated inferred regulatory models in independent datasets. In addition, we evaluated the biomedical significance of key regulatory network genes in an independent clinically annotated dataset. We found no significant evidence that vorinostat resistance is due to an upregulation of embryonal gene regulatory pathways. Our observation rather support a topological rewiring of canonical oncogenic pathways around the PIK3CA, AKT1, RAS/BRAF etc. regulatory pathways. Reasoning that significant regulatory network genes are likely implicated in the clinical course of colorectal cancer, we show that the identified key regulatory network genes’ expression profile are able to predict short- to medium-term survival in colorectal cancer patients – providing a rationale basis for prognostification and potentially effective combination of therapeutics that target these genes along with vorinostat in the treatment of colorectal cancer.
Introduction
The quest for effective therapies for colorectal cancer, particular in younger patients with advanced disease has never been more imperative. With an overall lifetime risk of approximately 4.3% and 4.0%, in men and women respectively[1, 2], colorectal cancer is the second leading cause of cancer-related deaths in the United States[3]. In persons aged 50 and below, its rate increased at 2% per year in the years 2012 to 2016 despite the steady decline associated with improved screening, early diagnosis and treatment in the general population[2, 3]. According to the center for disease control and prevention (CDC), in 2017, 141, 425 new cases of colorectal cancers were reported, and 52, 547 people died of it[4]. The CDC estimates that for every 100, 000 people, 37 new colorectal cancer cases are reported and 14 people died of this cancer[4].
Historically, risk factors have been classified as modifiable and non-modifiable factors[5]. Modifiable factors have included being overweight, a sedentary lifestyle, diet rich in red and processed meat, and sugars, smoking and alcohol consumption, while non-modifiable factors include increasing age, history of inflammatory bowel disease, polyps, family history of colorectal cancer, ethnicity, type II diabetes mellitus, and familial or inherited syndromes [5]. Although familial or hereditary factors account for only a third of colorectal cancer diagnoses, their molecular basis have enabled fundamental understanding of the etiopathogenesis of the disease. These include, lynch syndrome (hereditary non-polyposis colon cancer or HNPCC) which is primarily associated with defects in the MLH1, MSH2 or the MSH6 genes, and accounts for about 2% to 4% of all colorectal cancers, familial adenomatous polyposis coli (FAP) which accounts for 1% of colorectal cancers, Peutz-Jeghers syndrome (PJS), and MUTYH-associated polyposis (MAP). Associated with mutations in the APC gene, the FAP-related colorectal cancer consists of three sub-types with almost specific clinical features. These include: the attenuated FAP, associated with fewer polyps and development of colorectal cancer at a later age than it is typical; the Gardner syndrome, associated with tumors of the soft tissues, bones and skin; and the Turcot syndrome, associated with an higher risk of colorectal cancer and a predisposition to developing medulloblastoma – a brain cancer. Usually diagnosed at a younger age, PJS is associated with mutations in the STK11 (LKB1) gene while as its name implies, MAP is caused by mutations in the MUTYH gene[5]. These associated genetic defects are characteristically those of genes involved in tumor suppression and DNA repair mechanisms [6].
Besides standardized therapeutic regimen, many trials continue to evaluate the potential benefits of vorinostat, mostly in combination with other anti-neoplastic agents for its treatment[7–17]. Vorinostat, an FDA approved anti-cancer drug known as suberoylanilide hydroxamic acid (SAHA), a histone deacetylase (HDAC) inhibitor, through many mechanisms, causes cancer cell arrest and death[18]. First discovered on attempts to make more efficient hybrid polar compounds that induce the differentiation of transformed cells[19, 20] and initially approved by the FDA for the cutanous manifestation of T cell leukemia, vorinostat has since become a therapeutic candidate for many tumors[21–29]. This is due in part to the evolving understanding of the role of epigenetic and posttranslational modifications in the etiopathogenesis of transformed cells[30–33]. Altering many pathways and processes, vorinostat has been discovered to not only alter the modification state of histone proteins but many more essential proteins involved in the oncogenic and tumor suppression process. More specifically and among many other mode of action, vorinostat inhibits the removal of acetyl group from the ϵ-amino group of lysine residues of histone proteins by histone deacetylases (HDACs). Accumulation of acetyl group maintains chromatin in an expanded state, facilitating transcriptional activities of major regulatory genes[18, 30, 34–36]. However, like many other anti-neoplastic agents, toxicities, resistance and or failures have been observed[13, 37, 38].
In the HCT116 colon cancer cell line xenograft model, exploiting potential lethal molecular interactions by additional gene knockouts, Falkenberg and colleagues were able to restore vorinotat sensitivity[39, 40]. This phenomenon, known as synthetic lethality, offers a promise to selectively target cancer cells[41]. Although without clear delineated understanding of underlying molecular processes, many studies demonstrate synthetic lethality as an effective cancer-killing mechanism.
In this study, we aimed to elucidate regulatory interactions, in multiple perturbations of identified synthetically lethal experiments, particularly in the vorinostat-resistant HCT116 (colon cancer xenograft model) cell line. In addition to elucidating interactions, we aim to elucidate key interactions that potentially determine observed phenotypes. Given that previous studies[39, 40] showed that knocking down GLI1, a downstream transcription factor involved in the Sonic hedgehog (SHH) pathway [42–44] – an embryonal gene regulatory process, resulted in restoration of vorinostat sensitivity in the HCT116 colorectal cancer cell line, we hypothesized that vorinostat resistance is a result of uptick in embryonal gene regulatory programs. We also hypothesized that elucidated regulatory mechanism would include crosstalks that regulate this biological processes – embryonal gene regulatory programs. We employed a knowledege-guided fuzzy logic regulatory inference method to elucidate mechanistic relationships from multiple synthetic lethal pertubation experiments in the vorinostat-resistant colon cancer cell lines. We validated inferred regulatory models in independent experiment datasets. And, we evaluated the biomedical significance of key regulatory network genes in an independent clinically annotated dataset.
Materials and Methods
Datasets
Synthetic Lethal Experiments Transcriptome, RNA Sequencing Assay Data
Two RNASeq expression datasets (accessions GSE56788 and GSE57871) with available viability assay data were retrieved from the National Center for Biotechnology information (NCBI) Gene Expression Omnibus [45, 46] public repository.
GSE56788
Detailed under the BioProject accession PRJNA244587, this consists of 45 assays from 15 biosamples, each ran in 3 independent biological replicates. RNA-seq expression profiles were acquired by next-generation sequencing of vorinostat-resistant HCT116 cells (HCT116-VR) following knockdown of potential vorinostat-resistance candidate genes. Assays included those of mock transfection to serve as controls. The authors of the study sought to understand the mechanisms by which these knockdowns contributed to vorinostat response – reestablishment of a gain in sensitivity to Vorinostat. siRNA-mediated knockdown of each previously identified resistance candidate genes in the HCT116-VR cell line was employed[39]. Raw RNA sequence expression data were downloaded from the NCBI Sequence Read Archive [47, 48], with accession number SRP041162. Table 1 shows the transcriptome expression profile data accessions and associated siRNA treatment experiments.
GSE57871
Similar to GSE56788, the GSE57871 study is a 42 sample dataset derived from an expression profiling by high throughput sequencing. It consists of independent biological experiments of 14 samples performed in triplicates. RNA-seq high throughput expression profiling of vorinostat-resistant HCT116 cells was performed following gene knockdown of GLI1 or PSMD13 with or without vorinostat treatment. Study authors had chosen GLI1 and PSMD13 as potential vorinostat resistance genes because these had previously been identified through a genome-wide synthetic lethal RNA interference screen (the GSE56788 dataset study). An aim was to understand the transcriptional events underpinning the effect of GLI1 and PSMD13 knockdown (sensitisation to vorinostat-induced apoptosis). The authors first performed a knockdown on cells, and then treated these with vorinostat or the solvent control. Two timepoints for drug treatment were assessed: a time-point before induction of apoptosis (4hrs for siGLI1 and 8hrs for siPSMD13) and a timepoint when apoptosis could be detected (8hrs for siGLI1 and 12hrs for siPSMD13)[40]. Raw sequence expression data were downloaded from the NCBI Sequence Read Archive with accession number SRP042158. Table 2 shows the transcriptome expression profile sample data accessions and associated siRNA treatment and treatment timepoint experiments.
Colon Cancer-Associated Genes from OMIM
A curated list of colon cancer-associated genes (Table 3) were retrieved from the Online Mendelian Inheritance Man (OMIM) database [49, 50].
Biomedical Significance Experiment Data
To evaluate the clinical and biomedical significance of inferred regulatory features and themes, gene expression profile were retrieved from the cancer genome atlas (TCGA) colorectal cancer mRNA data, in the TCGAcrcmRNA R Bioconductor package[51, 52]. The package contains the TCGA consortium-provided level 3 data, generated by the HiSeq and GenomeAnalyzer platforms, from 450 primary colorectal cancer patient samples[53]. For a more comprehensive and up-to-date phenotype information, associated patients’ clinical data were retrieved from the genomic data commons[54–57].
Methods
RNA Sequence Analyses
Quality assessment
For data quality assessment (QA), the fastqcr, ngsReports and Rqc R/bioconductor tools [52, 58–60], modeled after the FASTQC [61] tool philosophy were used. These provide add-on capabilities and the R programming interface to the standalone Java program implementation of FASTQC. QA results were used to identify data with questionable measured quality metrics. In addition to data file statistics, reported quality metrics included; ‘adapter content’, ‘overrepresented sequences’, ‘per base N content’, ‘per base sequence content’, ‘per base sequence quality’, ‘per sequence GC content’, ‘per sequence quality score’, ‘sequence duplication levels’, and ‘sequence length distribution’.
Reads quantification
To quantify expression, we aligned reported reads from the sequencing experiment to the genome. Although non-alignment based quantification approaches such as those implemented in Salmon [62], Sailfish [63], and Kallisto [64] are becoming more popular, the performance of these on quantifying lowly expressed genes and small RNAs is still being debated [65]. Therefore sequence reads were aligned to the genome (NCBI GRCh38 build) using the TopHat2 [66, 67] tool which accounts for slice junctions in alignments. Tophat2 uses the bowtie2 [69], noted for its speed and proven memory efficiency for primary alignment. Rather than build new index files, pre-built bowtie2 index files were downloaded from Illumina’s iGenomes archive [70]. Accepted hits and annotation information in the BAM format [71] output files were assembled into an expression matrix of feature counts using the featureCount routine in the Rsubread package [72].
Preprocessing and normalization
Feature counts were normalized using the DESeq2 package [73] tool’s implemented regularized log transformation to account for disparate total read counts in the different files and to allow for comparison across the different samples. The regularized log transformation moderates the high variance typically observed at low read counts. We specified regularized log transformation intercept as the average expression profile across the normal (mock) samples.
Model Building and Independent Validation Datasets
Datasets were divided into training (regulatory-model-infering) and test (regulatory-model-validation) datasets (Figure 2). Regulatory models were inferred using the training datasets. Inferred models were tested in the independent validation datasets. Independent validation dataset included two parts. A part was used to test the regulatory models while the other part was used to test and evaluate a simulation of the consolidated network.
Feature Selection
Although similar, feature selection for regulatory network reconstruction and inference differs from classical feature selection. Classical feature selection [74–78] approaches aim to identify the optimal set of features with which a trained model can best predict or correctly identify a class of a not-previously-seen object, given the object’s attributes – the class prediction problem. With a class prediction problem is an associated feature redandancy [79] which needs to be mitigated when choosing an optimal set. With respect to selecting features for regulatory networks however, this may not necessarily be the case, since features that appear redundant may imply co-regulatory (direct or indirect regulatory) interactions in the network. In both situations anyways, on a one hand is the cost of learning a model while on another hand is the curse of dimensionality that plague the low sample to feature ratio characteristic of biological experiments. The very high dimension coupled with low sample size and the potential noise in measured experiments present a limitation for regulatory network inference methods [80] in particular. Feature selection seeks to find a middle spot where cost is minimized with minimal loss in learned model benefits. Although optimized algorithms may mitigate cost, poorly selected or less optimal set of features are set to undermine the efficiency of any learned model.
For a regulatory network model that would represent colon cancer, we reasoned that network features should very likely include known and previously identified products of genes associated with the disease process. Thus, we compiled a list of genes consisting of a curated set obtained from the OMIM database [49, 50] and those from literature evidences i.e. genes in described pathways of colon cancer tumorigenesis. And, if we assume that the regulary network is a function of changes in features’ expression across time, among different perturbations or across cellular states, it should also appeal to reason that features with significant variations or dispersion in expression across samples should be more informative i.e. more relevant for deriving a regulatory network than those without or with minimal variations. Mathematically, we may describe a cellular state s, as a linear combination of weighted features’ expressions, given by the equation below: where α, β, γ, ··· ω are the rates of change in respective feature’s expression i.e. rate constants; ϵ is the random error estimate; {x1, x2, x3 ··· xn} is the set of expression values of features under condideration; and n is the total number of features. We reasoned that if we assume a regulatory network describes changes in cellular state across time, we might as well describe it as a first derivative of cellular state, f(s)′. Therefore features without changes in expression across time, i.e. features whose rate constants tended to zero would drop off in the estimate d(f(s))/dt. This is analogous to being of less significance in determining the dynamic nature of the regulatory network, i.e. changes in cellular state.
To determine maximally varying features, from our RNA sequence analyses normalized expression values, we estimated a mean absolute deviation (MAD) from the mean, for each feature. Given by, where n in this case is the number of samples or perturbations and is the mean expression value of the specific feature across the samples. xi ∈ {x1, x2,… xn}.
To further assess variation in the expression of genes across samples, we also determined fold changes between the minimum and maximum expression values for for the respective genes and the strength of change between knockdown and control experiments. Because genes with highest MADs were observed to be predominantly those with low average expression and thus may be confounded by a Poisson noise distribution, we performed differential expression analyses between the respective groups of knockdown (siRNA) experiments and the controls to identify statistical significantly expressed genes (i.e. features with true changes)[81–83].
In summary, in additon to genes previously identified as related to colon cancer tumorigenesis and the specific genes targeted in the knockdown experiments, expression profile-informed genes were also considered for regulatory network inference based on their MAD, differential expresson and the log fold difference between the minimum and maximum expression values across siRNA knockdown experiments. The expresion profile-based selection criteria we specified were that for a gene to be considered:
Its mean absolute deviations (MAD) must be greater than the median of MADs.
Its expression value in 80% of samples must be greater than its minimum value across all samples by a minimum of two folds. The 80% of samples must include ≥ 80% of siRNA-targeted experiments. And, it must be
Statistical siginificant and differentially expressed in at least two siRNA-targeted sample groups versus the control group
Knowledge-guided feature selection
Purely data-driven methods have drawbacks such as limited biological interpretability. Likewise, canonical signaling pathways from literature evidences, provided in curated knowledge databases are not very specific and these hardly predict cell type-specific responses to experimental situations [84]. Therefore, we employed a hybrid approach that addresses these limitations and, can integrate prior knowledge and real data for network inference. We searched the derived features, and the colon cancer related gene features from OMIM database, against the STRING database[85–87]. Our search parameters included: a search against a full network type where edges indicate both functional and physical protein interactions; reported network edges indicate the presence of evidence of interactions between nodes; active interaction sources included mining of literature texts (TextMining), known experiments, knowledge bases, documented co-expression information, gene neighborhood, fusion and co-occurrence information. Quantitative interaction score for retrieved edges was specified as a minimum of 0.150. We retrieved features reported to be part of a potential network. For each feature found as part of a potential network, all reported interacting features were retrieved and mapped. We elaborated regulatory relationships between and among features using the fuzzy logic approach.
Fuzzy Logic Regulatory Model and Network Inference
To tease regulatory interactions among our initial selection of features, we employed the fuzzy-logic approach. The fuzzy logic approach mitigates known challenges of modeling biological systems, such as inconsistencies and inaccuracies associated with high-throughput characterizations. These challenges also include data noise and those of dealing with a semi-quantitative data [88]. Similar to Boolean networks, fuzzy logic methods are simple and are fit to model imprecise and or highly complex networks. And, opposed to differential equation based models, they are less computationally expensive and less sensitive to imprecise measurements [89–91]. Fuzzy logic compensates for the inadequate dynamic resolution of a Boolean (or discrete) network, while simultaneously addressing the computational complexity of a continuous network [92, 93].
A significant advantage of the fuzzy logic approach is that, in contrast to many other automated decision making algorithms or regulatory inference methods, such as neural networks or polynomial fits, algorithms in fuzzy logic are presented in similar day-to-day conversational language. Therefore, a fuzzy logic is more easily understood and can be extrapolated in predictable ways.
In general, the fuzzy logic modeling approach entails three major steps.
Fuzzification
Rule evaluation, and
Defuzzification
[94].
Fuzzification
Considering expression as a linguistic variable and applying defined membership functions on observed continuous numerical expression data, the fuzzification step derives qualitative values. It is a mapping of non-fuzzy inputs to fuzzy linguistic terms [94]. To make data fuzzification easier, a normalization technique may be applied to scale values to within a preferred range [92, 94, 95].
The fuzzification step derives qualitative values from the expression profile’s crisp values. By applying defined membership functions on crisp, numerical expression data, we derived qualitative values – described as a mapping of non-fuzzy inputs to fuzzy linguistic terms [96]. Given qualitative values of HIGH, MEDIUM, or LOW, the fuzzification step takes a feature’s expression value and assigns it degrees to which it belongs to the respective class of HIGH, MEDIUM or LOW expression values. [97–100]. After an initial data transformation of log2 expression ratios by the arctan function and dividing values by , to project the ratios onto [-1,1], the fuzzification step utilizes three membership functions consisting of the ‘low’, ‘medium’, and ‘high’ functions. Given the three fuzzification functions (y1 = low, y2 = medium, y3 = high), fuzzification of a gene expression value x results in the generation of a fuzzy set y = [y1, y2, y3] as follows:
Rule evaluation
The rule evaluation step considers combinations of features and utilizes an inference engine of rules, of the form IF-THEN, including fuzzy set operations such as AND, OR, or NOT, to evaluate input features’ expression (in fuzzy set definition) in relation to output features. This has been described as attempting to make an expert judgment of collective linguistic terms; attempts to find a solution to an evaluation of the concurrent state of existence of linguistic description of states.
We specified our rule configuration (the specification of if-then relationships between variables in fuzzy space) in the form of a vector r = [r1, r2, r3]. We specified the state of an output node z = [z1, z2, z3] to be determined by the fuzzy state of an input feature y = [y1, y2, y3] and the rule describing the relationship between the input and the output, r = [r1, r2, r3] as follows:
An inhibitory relationship, for example, specified as [3, 2, 1] implies, if input is low (r1), then output is high (3); if input is medium (r2), then output is medium (2), and if input is high (r3), then output in low (1). The classic fuzzy logic rule evaluation using the logical AND connective results in a combinatorial rule explosion i.e. an exponential increase in the number of rules to be evaluated and computational time, with additional inputs to be considered [101]. Therefore, to address this combinatorial rule explosion situation, we employed the logical OR (union) rule configuration, an algebraic sum in fuzzy logic [102, 103] as described in [104].
Defuzzification
The defuzzification step produces a quantifiable expression result or value given the input sets, the fuzzy rules, and membership functions. Defuzzification technically interpretes the membership degrees of the fuzzy sets into a specific decision or real value. The defuzzification step attempts to report a corresponding continuous numerical variable from a fuzzy state liguistic variable. Several approaches to defuzzify abound. We employed the simplified centroid method [103]. Given a predicted fuzzy values of an output node y = [y1, y2, y3], we defined defuzzified expression values as:
After defuzzification, we reverse transformed back to log2 expression values by multiplying derived values by and applying the tangent function.
Inferred regulatory model fit
For each regulatory model, which consists of an output feature, its suggested regulatory input feature(s) and associated fuzzy logic rules (relating each input feature to the output respectively), we estimated the fitness of such model’s prediction of the output x across M experiment samples or perturbations x = {x1, x2,…, xM} as: where is the set of defuzzified numerical log expression ratios predicted for the output feature and is the mean of the experimental values of x across the samples or perturbations observed. A perfect fit would result in a maximum E of 1.0.
Model probability (p-value) estimates
To estimate models probabilities, we fitted a probability density distribution for 100, 000 fit estimates of models derived by random permutations of rules and input features for each output features. We allowed up to four regulatory interactors. We computed a model fit’s p-value as the probability of observing an estimated fit from a random estimated fits distribution. A gamma distribution was fitted and, the ‘scale’ and ‘shape’ parameters were derived using The Maximum Likelihood Estimate (MLE) approach [105–108] implemented in the egamma function, in the EnvStat R package. With the ‘’scale’ and ‘shape’ parameters, random deviates and cummulative probabilities were derived using the (rgamma) and (pgamma) implementations respectively, in the stats package [109, 110].
Model validation
As described above, the fuzzy logic approach infers a regulatory model to consist of an output node, input nodes and respectively derived regulatory rule that relate each input node to the output node. We validated derived models for each feature output in the independent GLI1 siRNA knockdown experiments datasets generated by Falkenberg et al (2016). In this dataset, the authors focused on the genes GLI1 and PSMD13 as potential vorinostat-resistance candidate genes, identified from previous screens. Falkenberg and colleagues performed transcriptome analysis on vorinostat-resistant HCT116 cells (HCT116-VR) upon knockdown of these candidate genes in the presence and absence of vorinostat. According to the authors, treatment of vorinostat-resistant cells with the GLI1 small-molecule inhibitor, GANT61, phenocopied the effect of GLI1 knockdown. Therefore, for independent validation of our inferred regulatory models, we reason that for model estimated fit in the test data should as closely as possible be similar to (or better than the) estimated fit in the training dataset. The two timepoints for drug treatment assessed by Falkenberg and colleague represent a timepoint before induction of apoptosis (4hrs for siGLI1) and a timepoint when apoptosis could be detected (8hrs for siGLI1). Therefore for this validation, we used the sample expression data at 8hrs (see the table 4).
Network construction and validation
For each output node, the best-fitted model as determined by estimated fit difference between the associated models in the training and validation data was selected as a representative model. Representative models were consolidated into a single regulatory network (Figure 3). We reasoned that, models with minimal estimated fit difference are more likely stable than those with high differences.
To validate the derived regulatory network, we compared the monotonic and adaptive changes[111] observed by a dynamic simulation of the network over 5, 000 time-step iterations in the training data against that observed in the validation data. We reasoned that the distribution of observed changes between the training data network simulation and the independent validation data simulation would not be significantly different.
To simulate the network, we derived successive time-step expression values (In+1) for each node by a linear combination of the previous (In-1) and new values (In), to ensure the system converge smoothly towards equilibrium[99]. Given by Gormley et al, new values (In) were computed as:
Where the α option specifies the ‘mixing parameter’, guiding how quuickly the simulation reaches system equilibrium. New values for each node were based on the initial conditions and the fuzzy relations (regulatory rules) inferred from the training data. Zhang et al (2019) respectively described monotonic SM and adaptive changes SA as:
Where R are the estimated values over the entire iteration, R0 are observed values at the start of simulation and RT are values observed at the end of simulation. We utilized the Student t-test to determine if there is any difference in monotonic and adaptive network simulation changes between the training data and independent network validation data. To effectively simulate a knockdown and making the validation dataset-2 more comparable, we in-silico kept the level of knocked-down feature expression unchanged throughout the simulation steps. The table (Table 5) shows the dataset considered for independent validation of regulatory network (validation dataset-2).
Clinical Significance Evaluation
To evaluate biomedical significance of inferred regulatory network, we first estimated importance of all nodes contained therein. We defined node importance score (Ii) similar to Zhang et al’s [111]. The node importance score estimates integrate network topology, network edge interaction strengths and gene expression. To encapsulate these, Zhang and colleagues defined a hub score (H), a local network entropy (S) and an adaptation score (A) and integrated these into a comprehensive index for each node – a normalized rank sum of these values.
A Hub score assesses a node’s connectivity to other nodes. It is the principal eigenvector of the adjacency matrix of the inferred regulatory network. If
Zhang et al described the hub score of node i as hi.
Extending the works of Teschendorff and Severini [112], Zhang et al described local entropies as the degree of randomness in the local pattern of information flux around each node[111]. This is analogous to the centrality entropy described by Ortiz-Arroyo and Hussein [113]. It is a measure of the centrality of nodes depending on their contribution to the entropy of the derived regulatory network. We computed each nodes local entropy using Jalili et al’ scentiserve R package implementation of entropy[114]; derived from Shannon’s [115] definition of entropy which states that the entropy of a random variable X that can take n values is:
Jalili et al’s centrality entropy measure Hce of a graph G, is defined as: where where paths(vi) is the number of geodesic paths from node vi to all the other nodes in the graph and paths(v1, v2,…, vM) is the total number of geodesic paths M that exists across all the nodes in the graph.
In place of an adaptation score rank, we modified the node importance score to include instead the fit rank (rF), the mean edges confidences rank (rE) and the delta rank (rD). We defined the fit rank as the rank of the estimated fit associated with the respective node in the network. We defined the mean edges confidences rank as the rank of the average of edge confidences returned from the STRING database associated with the node and contained in the node’s regulatory model inferred by the fuzzy logic approach. To moderate the estimated fits, we defined the delta rank as the rank of the difference in model-associated estimated fits observed in the training and independent validation datasets.
We defined an importance score (Ii) for each node as the normalized rank sum of these values, similar to Zhang et al’s.
Similar to Zhang and colleagues’[111], we evaluated the potential for highly ranked regulatory node features or themes to predict short-(three or less years) and mid-term survival (greater than 3 years). We reasoned that these features are potentially able to drive tumor cells to either circumvent or succumb to epistatic events. We fitted a logistic regression model using the expression profile and clinical information we retrieved on the cancer genome atlas (TCGA) primary colorectal cancer samples – incorporating our derived node importance measures as penalty weights and specifying the 3-year survival statuses (dead or alive) as the outcome. Given yi = 0 or 1 as the binary response outcome associated with the i-th sample in n patients; pi = Pr(yi = 1); i = 1, ···, n; and xi = (xi1, xi2,…, xiL)T is the expression profiles of the genes in the i-th patient, we modeled the logistic regression model as: where β0 and βj are respectively the intercept and regression coefficients.
We randomly divided the data into training and test subdatasets at varying sample ratios of 50%, 60%, 70%, and 80%. We ran 100 repeated estimates at the different sample ratios. We calculated the areas under the ROC curves (AUCs) for the training and test dataset. We further evaluted the association of the top ranked features with survival using a Kaplan–Meier (K-M) survival analysis[116–118] and estimated significance between the K-M curves using the Cox proportional hazard model[119] and the two-sided log-rank test[120]. We classified patients into two groups (high-risk vs low-risk) based on the optimal cutoff using the ROC approach.
Results
RNA Sequence Analyses
From a total of 45 samples in the GSE56788 dataset, 34 samples succesfully passed through our analysis pipeline. 11 samples failed because of potentially corrupted raw data files. Samples that passed are shown in Table 6. These include two assays each of the interferring RNA treatment samples siCCNK, siEIF3L, siGLI1, siJAK2, siNFYA, siPOLR2D, siPSMD13 and siRGS18; one assay of the siBEGAIN treatment sample; and, three assays each for the interferring RNA treatment samples siCDK10, siDPPA5, siSAP130, siTGM5 and siTOX4. Two assays were experiment control samples. Spannig gene products involved in the cell cycle-, gene transcription- and signal transduction pathways-associated biological processes, Table 7 shows the siRNA targeted (knocked-down) genes in the vorinotat-resistant colon cancer cell line sythetic lethality experiment assays.
Reads quality assessment
All quality assessment measures as defined by Andrews and colleagues at the Babraham Institute[61], except the ‘Per base sequence content’ module which represents the relative amount of each base in the entire genome (Table 8), were satisfied. These included the: ‘Adapter Content’, ‘Overrepresented sequences’, ‘Per base N content’, ‘Per base sequence content’, ‘Per base sequence quality’, ‘Per sequence GC content’, ‘Per sequence quality scores’, ‘Sequence Duplication Levels, and the ‘Sequence Length Distribution’ module passed QC assessment. In most instances, the ‘Per base sequence content’ is not of biological concern as this arises from technical issues relating to using primers with random hexamers or the use of transposases which are biased toward specific cleavage sites during the library generation step. It is understood that, because of this some biases may occur, particularly at the start of the sequence reads[61]. The ‘Per base sequence content’ failure is triggered if the difference between A and T, or G and C is greater than 20% in any position. This is generally not a problem if the biases and failures can be visualized (see Figure 4) and attributed to around the first 12 base locations.
Figure 5 shows the Per Base Sequence Quality plot for the SRX516756.sra_data. This shows the distribution of quality scores for bases at the respective positions in a box plot with whiskers. The y-axis shows the quality scores. A better base call is indicated by a higher score. The background of the graph divides the y axis into very good quality (green), reasonable quality (orange), and poor quality (red) calls. It is not unusual for the quality of a base call to degrade toward the end of the read. Although scores appear generally okay, it can be seen that the scores reported at the first 5 —10 base positions are of lesser quality.
Table 9 shows estimates for quality assessment parameters in each RNA sequencing sample in the GSE56788 dataset. Assessed parameters included percent duplication, percent GC content, and average sequence read length. Average duplication rate is estimated to be 20.21%, while that of GC content stood at 48.68%.
Reads quantification
Maximum number of reads was found to be 26430192 reads in the SRX516756.sra_data (siCCNK) sample, while the minimum was 9756110, in the SRX516790.sra_data (siGLI1) sample - an approximately three fold difference across the sythetic lethal experiment assays (Fig. ??, Table 10). As opposed to 45 samples, results are presented for 34 samples. As previously mentioned, data for 11 samples failed on topHat2 alignment on execution, potentially due to corrupted samples’ raw data file.
Similar QC assessment profile is observed for the data in both the GSE56788 and the GSE57871 datasets.\
Feature Selection, for Regulatory Network Inference
Features’ mean absolute deviations (MADs)
As previously described, we reasoned that the most changing features, in terms of expression values across pertubations, are more informative in the context of regulatory networks than non-changing features (see section). To determine these features, we estimated the mean abosolute deviations of each of the 28, 395 genes across all available assay samples in our model-inferring (training) dataset. The observed median of MADs is 0.2033 (Mean=0.2905, SD=0.3482). With a maximum and minimum observed MAD of 2.1333 and 0.0 respectively, over 30% (11, 123) of features do not appear to change (Fig 6, 1st Qu=0.0000). These include those for knocked-down features, DPPA5, RGS18, and TGM5. The feature with maximum MAD was HRNR (hornerin).
Features’ differential expression
Another measure of change we employed was the differential expression for each feature – an estimate of features that are truly different in terms of expression values between conditions. We estimated the differential expression of features in each knock-down assay group against the control assays. At adjusted p-values ≤0.05, the maximum number of differentially expressed features (8, 055) were found in the POLR2D siRNA knockdown assays, while the least number (2, 504) were found in the RGS18 knockdown experiments (Table 11, Fig. 7 and 8). 1, 645 features were differentially expressed in 5 comparisons of siRNA knockdown assays versus control assays, while 6 features are differentially expressed in all comparisons(Fig. 9). The features found to be differentially expressed in all contrasts include the NRBP1, MTHFD2, ALDH1A3, TNS2, LIMA1 and the BAK1 gene products. Cummulatively 13, 090 features were found to be differentially expressed in at least one contrast comparison, while 4, 270 were found to be differentially expressed in at least half of the comparisons (Fig. 10, Table 12). In terms of features discovered to be differentially expressed, the assay groups appear to cluster into 3 major groups (Fig. 11).
Features’ expression ranges and log-fold changes
Still on evaluating features’ expression changes across knockdown experiments, we considered the log-fold change between the minimum and the maximum expression value for each feature across all knockdown assays. A large proportion of features show no log-fold change (Fig. 12). The maximum log fold change (10.436) is found in the SPP1 feature expression profile. Across all features, median log-fold change was 1.099 while mean log-fold change was 1.497. 8, 037 features have log-fold changes ≥ 2 while only 19 features have log-fold change ≥ 8 (Fig. 13 and Table 13).
Online Mendelian inheritance in man (OMIM) database features
For a more encompassing regulatory network inference feature set, and addressing limitations of purely data-driven approaches as previously mentioned, 52 features were retrieved from the online mendelian inheritance in man (OMIM) database (Table 3)[49, 50].
Search tool for the retrieval of interacting proteins (STRING) database search
According to criteria specified previously, and together with features determined from the OMIM database, 571 were considered to be temporally changing and potentially informative for regulatory network construction.
These were searched against the STRING database for any remote biological evidence of potential interactions – serving as a priori knowledge guide for our downstream fuzzy logic regulatory network inference. At a false discovery rate of 0.05 and minimum interation confidence of 0.150 (low confidence), retrieved interaction network consisted of 559 nodes and 8, 819 edges. Average node degree was 31.6 and the average local clustering coefficient was 0.312. With an expected number of edges of 7, 659, p-value of protein-protein interaction (PPI) enrichment was < 1e-16. With 238 mapped interactions, AKT was reported to have the most identitfied interactions. Features with only one SRING database-identified interaction were C1orf35, CCDC171, MROH8, OR51B2, PRB3, and RNF223.
Selected features for fuzzy logic based regulatory network
Of the 559 features found to belong to probable biological network in the STRING database, 535 were subjected to the fuzzy logic regulatory inference approach.
Regulatory Network Inference
An inferred fuzzy logic model consists of an output node, its regulatory input nodes, and respective regulary rule that describes the interaction and relationship of the input to the output node. A regulatory rule is one of 27 rules, each represented by a three-member array notation (or tuple). The indices of the array represent respectively a low, medium and high presupposed state of the input node. Actual value (1, 2, or 3) of each element in the array represents the respective expected state (low, medium, or high) of the output node. For example, the rule [3, 2, 1] states that: when the input expression value is ‘low’, the output node is ‘high’(3); when the input is ‘medium’, the output is ‘medium’(2); and when the input is ‘high’, the output is ‘low’(1). This represents a classic repression-like regulation (i.e. negative control). The converse is true for a rule [1, 2, 3] representation. It implies that when the input expression value is ‘low’, the output node is ‘low’(1); when the input is medium, the output is medium(2); and when the input is ‘high’, the output is ‘high’(1).
This represents a classic activation-like regulation (i.e. positive control).
Fuzzy logic-based regulatory models
Filtering at estimated fit of 0.70, 299 output nodes and fuzzy logic regulatory models were obtained. These consist of 402 gene features. Ranked by models’ minimum difference between estimated fit in training data and independent validation data, the top models include the output nodes: TGFBR2 (model fit = 0.7011; adjusted p-value = 5.720849e-04), RIMBP3B (model fit = 0.7972; adjusted p-value = 1.907588e-05), PPP2R1A (model fit = 0.7162; adjusted p-value = 9.497439e-05), UBQLN2 (model fit = 0.7770; adjusted p-value = 4.702329e-05), WNT3A (model fit = 0.7300; adjusted p-value = 1.872238e-04), TP53 (model fit = 0.7144; adjusted p-value = 2.021138e-03), and PIK3CA (model fit = 0.7040; adjusted p-value = 7.431795e-04) amongst many others (Tables 14, 15, and 16). Enumerated to regulate the TGFBR2 gene were three regulatory inputs. These include, a inhibitory interaction (fuzzy logic rule, [2, 1, 1]) by the TNS1 gene, a stimulatory interaction (fuzzy logic rule, [1, 1, 3]) by the AXIN2 gene, and another inhibitory interaction (fuzzy logic rule, [3, 2, 2]) by the CCND1 gene products respectively. For the TP53 gene, two stimulatory interactions by the TP53I3 and the WNT1 gene products were identified (fuzzy logic rules, [1, 2, 3], and [1, 1, 3] respectively) (Tables 14, 15, and 16).
Models consolidation – Fuzzy logic-based regulatory network
Combining the derived fuzzy logic models into a consolidated network as previously described, we obtained a network with 402 nodes, 849 edges, and a network mean clustering coefficient of 0.018. Consisting predominantly of out-degrees, the maximum degree of 20 is observed at the TP53 gene feature node, followed closely by the SRC (degrees = 19), LONRF2 (degree = 12), PIK3CA (degree = 11), AKT1 (degree = 11), NTN1 (degree = 11), MAPK3 (degree = 10), AURKA (degree = 10), CCND1 (degree = 10), UNC5A (degree = 10), CHEK2 (degree = 10), ICAM1 (degree = 10) and the UBC (degree = 10) gene feature nodes. Average number of neighbors is 3.826, network diameter is 22, characteristic path length is 6.816, and network density is 0.005. Fig. 14
Regulatory Network Validation
Validating regulatory network in independent dataset by topological changes observed on dynamic simulations of inferred regulatory network, no statistical difference is seen in the distribution of monotonic (t statistic = –1.5104, p-value = 0.1315) and the adaptive changes (t statistic = 1.2079, p-value = 0.2278) across all features over a 5, 000 time steps (Fig. 15 and 16).
Clinical Significance Evaluation
Node importance estimation
Based on the characterized fuzzy logic regulatory network, we computed the hubscore, entropy, mean edge confidence, the associated model fit and the delta change (in fit estimate between training and independent test data), for each gene in the regulatory network (see Methods section) to measure the node importance and to identify key genes driving changes to sensitivity or resistance to Vorinostat in colorectal cancer. Emerging tops with respect to the hubscore, entropy, mean edge confidence, model fit and the delta change scores are the genes TP53, MAGEE1, POLE, TGFBR2 and BUB1 respectively (Supplementary Table). We calculated the normalized score accordingly, to estimate the importance score for each gene (see Methods). Top ranked genes include UBC, PTEN, SMAD2, LMO7, GNAZ, POLR2D, TP53, AKT1, RIMBP3, and CCNK (Table 17).
Survival analysis
We reasoned that features driving resistance to vorinostat are very likely drivers of aggressivenes and therefore of poor patient clinical outcomes. We evaluated the potential clinical significance of these vorinostat-resistance associated features in three different ways using colon cancer transcriptomic and clinical data from the cancer genome atlas (TCGA) assayed samples. From the genome data bank retrieved data 334 patient samples had associated clinical information. Of these, 77 samples have had a survival event. The Median time to event is 334.0 days (Mean = 540.7 days), while maximum time to event is 2821.0 days. At different sampling ratios, the 77 samples were randomly divided into a training and a test subdataset and repeating the sample division at each ratio 100 times. Assigning the node importance score of the network feature expression values, Figure 17 shows the AUC estimates’ distribution in the training data and test data weighted logistic regressions. AUC estimates ranged up to 0.99 and 0.93 in the training and test dataset, suggesting a potential optimal subset. Based on data from the TCGA samples, the top 10 percent (by node importance) of features shows a significant association with the survival probability (log-rank p-value = < 0.0001) of colon cancer patients (Fig. 18) – demonstrating the significant clinical relevance of the top identified genes and their roles in cancer progression.
Tools and implementations
The fuzzy logic regulatory inference method of Gormley et al [95] and its optimization for quicker time to inference is implemented in the platform independent Java programing language. Its source codes and precompiled binaries are freely available at the repository locations:
All statistical analysis was done in the R programing and computational statistics enviromment [59]. Network diagrams and analysis were also done using the Cytoscape tool, version 3.8.2 running on a Mac OS X 10.15.7 - x86_64 operating system.
Discussions
Though the prevalence of colon cancer appear to be on the decline particularly in the above 65 year olds, the rising incidence in the younger population remain of significant concern. Besides surgical excision of tumor tissue and the use of classical chemotherapeutic regiments such as 5-FU, oxaliplatin, irinotecan, capecitabine, leucovorin, etc. with or without radiation, the search for more rational therapy continue to gain traction. The approval of bevacizumab, panitumumab, and cetuximab for colorectal cancer buttress the potential clinical benefit of rationally designed therapies – targeted therapies developed to take advantage of unique alterations specific to cancer cells to maximize the desired therapeutic effect in malignant cells and minimize the toxicity in normal cells [121]. Currently approved for metastatic, stage IV or recurrent disease, other targeted therapies for colorectal cancer include aflibercept, ramucirumab, panitumumab, regorafenib, and pembrolizumab. These are designed against the VEGF, EGFR, or tyrosine kinases. Targeting the VEGF pathway, bevacizumab and ramucirumab are developed as monoclonal antibodies while aflibercept, a recombinant fusion protein. Cetuximab and panitumumab, targets the EGFR pathway upstream of KRAS, and particularly effective in non-KRAS activating mutation cancers. More recently approved based on on data from 149 patients with MSI-H or DNA mismatch repair cancers enrolled across 5 uncontrolled, multicohort, multicenter, single-arm clinical trials, Pambrolizumab is an antibody that targets the programmed cell death 1 (PD-1) protein in patients with microsatellite unstable tumors – associated with germline defects in the MLH1, MSH2, MSH6, and PMS2 genes[122, 123]. The proposed relationship of pambrolizumab to the DNA mismatch repair genes or gene products (MLH1, MSH2, MSH6, and PMS2) is less direct, compared to that of bevacizumab, ramucirumab or aflibercept to VEGF. Pambrolizumab targets the PD-1 protein on T cells, preventing its association with the PD-L1 ligand on tumor cells, macrophages or other tumor-infiltrating lymphocytes and myeloid cells acting in concert to suppress T-cells activation[124]. T cell activation results from presentation of mutation-associated neoantigen (MANA) that results from protein products consequent to DNA mismatches in complex with the major histocompatibility complex (MHC) protein [125].
Analogous to pambrolizumab and other targeted therapies, vorinostat can act in both direct and indirect ways on the molecular pathways regulating oncogenic processes. In addition to inhibiting histone deacetylases, evidences also point to its effect on the posttranslational modification state of proteins involved in oncogenic and tumor supression processes. Vorinostat inhibits the removal of acetyl group from the ϵ-amino group of lysine residues of histone proteins by histone deacetylases (HDACs) maintaining chromatin in an expanded state and thus facilitating transcriptional activities of major regulatory gene products such as transcription factors, cell-signaling regulatory proteins, and proteins regulating cell death[35]. Particularly described is its effect on the cyclin-dependent kinase inhibitor 1A, CDKN1A (also known as p21, WAF1/CIP1) gene transcription[36]. Richon et al found that vorinostat selectively induces CDKN1A expresion[36]. By binding to cyclin dependent kinases, CDKN1A prevents the phosphorylation of cyclin-dependent kinase substrates and thus block cell cycle progression[126].
Non-histone-protein related effect of vorinostat include increased DNA binding of the transcriptional activator TP53 (Tumor protein 53, also known as p53) from increased acetylation, and a consequent increase in p53-regulated gene transcription rate. Also, BCL6 repression of transcription is inhibited by increased acetylation as a result of vorinostat[127]. As opposed to increased transcription, vorinostat (HDACi) represses the expression of genes cyclin D1, ErbB2, and thymidylate synthase [128]. Evidently, the effect of vorinostat tips the cellular equilibriun toward cell cycle arrest, anti-proliferation and apoptosis. It thus appeal to reason that molecular pathways of resistance would be quite the opposite. In attempts to circumvent resistance, Falkenberg et al had through a functional genomics screen identified genes that when knocked down by RNA interference (RNAi) sensitized cells to vorinostat-induced apoptosis. In other words, when these genes are knocked down, they co-operated with vorinostat to induce tumour cell apoptosis in otherwise resistant cells (synthetic lethality). These included – BEGAIN, CCNK, CDK10, DPPA5, EIF3L, GLI1, JAK2, NFYA, POLR2D, PSMD13, RGS18, SAP130, TGM5, and TOX4 (see Table 18)[39], all of which are pro-proliferative and potentially oncogenic. Of importance to our study however is to determine the molecular processes underneath the observed synthetic lethal phenotype and potential clinical significance. In a follow-up study, Falkenberg et al had validated the GLI1 gene as co-operative with vorinostat[40] to induce cell cycle arrest and apoptosis in otherwise vorinostat-resistant colon cancer cell lines. Given GLI1’s role in the sonic hedgehog pathway, we had hypothesized that resistance to vorinostat is a result of uptick in embryonal gene regulatory programs. We also hypothesized that elucidated regulatory mechanism would include crosstalks that regulate this biological processes – embryonal gene regulatory programs.
Glioma-Associated Oncogene Homolog 1 (GLI1) is a zinc finger protein and a transcription factor that acts downstream of the Hedgehog (Hh) signaling pathway. It mediates morphogenesis, cell proliferation and differentiation[129–132]. From reports of its potential relationship to GLI1 and vorinostat, Falkenberg and collagues reported the repression of the BCL2L1 gene on GLI1 knockdown. Bcl-2-Like Protein 1 (BCL2L1) is a cell death inhibitor, it inhibits the activation of caspases by binding to and blocking the voltage-dependent anion channel (VDAC), preventing the release of the caspase activator, CYC1, from the mitochondrial membrane[134, 135]. At a adjusted p-value of < 0.05, we found the differential expression of BCL2L1 to significantly change in up to 8 siRNA knockdown (synthetic lethality) experiments, including siGLI1 knockdown (Adjusted p-value = 1.7444e–21, Log2 fold change = –0.6694, St. error = 0.0676), compared to controlled experiments. However, subject to our selection criteria, BCL2L1 did not make the list of selected features for regulatory network inference. Although it may not be generalized in this study, there is evidence of the potential utility of BCL2L1 repression consequent to GLI1 knockdown as a path to restoring sensitivity to vorinostat in resistant colon cancer cell lines.
The Sonic Hedgehog (SHH) pathway upstream of GLI1 consists of PTCH1, SMO, and SUFU. Binding of the Hedgehog ligand to the cell surface receptor Patched (PTCH) releases its inhibitory effect on Smoothened (SMO), which in turn activates GLI1 [40, 136]. Suppressor Of Fused Homolog, SUFU down-regulates transactivation of target genes by GLI1 [137, 138]. It forms a part of the co-repressor complex that acts on DNA-bound GLI1 and may act by linking GLI1 to BTRC – targeting GLI1 for degradation by proteasome [130, 138–140]. Amongst TTRUSTv2 transcription factor-target database [141] retrieved 19 gene-targets of GLI1, only 2, AKT1[142] and SMAD4 [143] are contained in the derived fuzzy-logic regulatory network. Both of these are also known to be regulated by PIK3A and TGF-β respectively. Argawal et al. and, Nye et al. had suggested a form of cross-talk between the Sonic Hedgehog pathway and the cell proliferative pathways of AKT1 and TGF-β, mediated through GLI1 and GLI1-SMAD4 complex respectively. However, the non-significance of other members of the SHH pathway in our derived fuzzy logic regulatory network questions its role in vorinostat-resistance or re-sensitivity in the HCT116 colorectal cancer cell lines. The significance of other canonical upstream regulators of AKT1 and SMAD4, PIK3CA (node importance rank = 77; model fit =0.701, model adjusted p-value = 0.00074) and TGFBR2 (node importance rank = 34, model fit = 0.704, model adjusted p-value = 0.00057) respectively suggests alternate mechanisms of resistance or re-sensitivity (Figs 19 and 20). These interactions may in part explain the proliferative and anti-proliferative processes observed in vorinostat resistant and re-sensitized colon cancer cell lines independent of the SHH pathway. In some form multiple feed-forward (FF) and positive feed-back (PFB) control manner, GLI1 on the other hand appears to be under regulatory control with RET and ETV4, which themselves are tightly regulated by NEURL1B and AURKA. A consequently amplified GLI1 signal activates ABLIM2, whose signal is tempered by SRC. (Fig. 21). Using in-vivo cell culture and xenograft models, Ruan et al. recently showed that RET (rearranged during transfection) enhanced transcriptional activation by HH, independent of the SHH pathway. They showed that inhibition of GLI1 led to a reduction of RET-induced proliferation of SH-SY5Y cells and outgrowth of xenografts[144]. The role of GLI1 on RET expression in neuroblastoma is well documented – GLI1 induces the expression of RET[145, 146]. Zhu et al in a xenograft model, showed that EVS variant transcription factor 4 (ETV4) depletion inhibits the CXCR4/SHH/GLI1 signaling cascade in breast cancer[147]. Such may yet be the case in the vorinostat resistant colorectal cancer cell lines.
PIK3CA upregulation of anoctamin 1 (ANO1) through AKT1 may be a mechanism of resistance to circumvent vorinotat’s activity, particularly in response to GLI1 knockdown. Mazzone et al [148], using a luciferase reporter system to determine ANO1 promoter activity, chromatin immunoprecipitation, siRNA knockdown, PCR, immunolabeling, and recordings of Ca2+-activated Cl-currents in human embryonic kidney 293 (HEK293) cells showed that binding of GLI1 represses ANO1 expression. They also showed that knocking down of GLI1 expression and inhibition of its activity increased the expression of ANO1 transcripts and Ca2+-activated Cl-currents in HEK293 cells. Relating to the activity of PIK3CA, Mroz and colleagues [149] showed that induction of the transmembrane protein 16A (TMEM16A) also known as ANO1 expression is mediated by a sequential activation of phosphatidylinositol 3-kinase (PIK3) and protein kinase C-δ (PKCδ). Our fuzzy logic approach indicates an involvement of AKT (Fig. 19). We suppose that in response to vorinostat, the PIK3CA-AKT-ANO1 pathway provides alternate escape pathway from anti-cell profliferation signatures.
With evidences pointing towards activation of cell pro-survival and cell proliferation pathways independent of SHH, the role of these pro-survival and cell proliferation pathway members (PIK3CA, AKT1, MAPK1, MAPK3, WNT3A etc) in vorinostat resistance, restoring vorinostat sensitivity or improving patient clinical response in characteristic colorectal cancer, are worth evaluating. Interestingly observed are the almost equal or more significant represention of tumor suppressor genes (PTEN, TP53, APC, UBC, GSK3B, etc) top-ranked in terms of node importance in the derived regulatory network – very likely playing the role to restore vorinostat sensitivity in the vorinostat-resistant colorectal cancer cell lines. These relationships appear to be clinically significant, given top-ranked features’ expression being predictive of colorectal cancer patients survival (Fig. 18, log rank, p-value < 0.0001) in the sampled population. Here is presented a rationale for including anti-cell prosurvival and anti-cell proliferative genes’ targeted therapy, in combination with vorinostat therapy to improve patient survival in colorectal cancer.
Conclusions
Our knowledge-guided fuzzy logic approach is able to tease the regulatory mechanism involved in histone deacetylase inhibition resistance in colon cancer cell lines from biological dataset.
There is no significant evidence that vorinostat resistance is due to an upregulation of emmbryonal gene regulatory pathways. Our observation rather support a topological rewiring of canonical oncogenic (pro-cell survival, cell proliferative) pathways, including the PIK3CA, AKT, RAS, BRAF etc. pathways. Exploring the potential clinical or biomedical significance, inferred major regulatory molecules are able to delineate patients into high- and low-risk of mortality. The identified key regulatory network genes’ expression profile are able to predict short-to medium-term survival in colorectal cancer patients – providing a rationale for an effective combination of therapeutics that target these genes (particularly the pro-cell survival and cell proliferative gene products) along with vorinostat in the treatment of colorectal cancer.
Study Limitations
Apparent is the paucity of siRNA knockdown data to encompass all potentially possible epistatic interactions that may be associated with re-sensitizing vorinostat-resistant colorectal cancer. The fuzzy logic approach models direct and indirect interactions which inherently associates relationship in molecular abundance with regulatory activation or inhibition; it incorporates no epigenetic, trans- or cis-genomic signatures independent of abundance and the consequent effects in the regulatory models inferred.
Acknowledgements
A special thanks to Dr. Iosif Vaisman, Dr. Dmitri Klimov, and Dr. Saleet Jafri for their advises on the greater body of work presented here. Here is also acknowledging the high performance computing resource at the Frederick National Laboratory for Cancer Research (FNLCR) and the Extreme Science and Engineering Discovery Environment (XSEDE) supported by the National Science Foundation at the Texas Advanced Computing Center (TACC) at The University of Texas at Austin - http://www.tacc.utexas.edu. The Frederick National Laboratory for Cancer Research is wholly funded by the National Cancer Institute.
Footnotes
References
- 1.↵
- 2.↵
- 3.↵
- 4.↵
- 5.↵
- 6.↵
- 7.↵
- 8.
- 9.
- 10.
- 11.
- 12.
- 13.↵
- 14.
- 15.
- 16.
- 17.↵
- 18.↵
- 19.↵
- 20.↵
- 21.↵
- 22.
- 23.
- 24.
- 25.
- 26.
- 27.
- 28.
- 29.↵
- 30.↵
- 31.
- 32.
- 33.↵
- 34.↵
- 35.↵
- 36.↵
- 37.↵
- 38.↵
- 39.↵
- 40.↵
- 41.↵
- 42.↵
- 43.
- 44.↵
- 45.↵
- 46.↵
- 47.↵
- 48.↵
- 49.↵
- 50.↵
- 51.↵
- 52.↵
- 53.↵
- 54.↵
- 55.
- 56.
- 57.↵
- 58.↵
- 59.↵
- 60.↵
- 61.↵
- 62.↵
- 63.↵
- 64.↵
- 65.↵
- 66.↵
- 67.↵
- 68.
- 69.↵
- 70.↵
- 71.↵
- 72.↵
- 73.↵
- 74.↵
- 75.
- 76.
- 77.
- 78.↵
- 79.↵
- 80.↵
- 81.↵
- 82.
- 83.↵
- 84.↵
- 85.↵
- 86.
- 87.↵
- 88.↵
- 89.↵
- 90.
- 91.↵
- 92.↵
- 93.↵
- 94.↵
- 95.↵
- 96.↵
- 97.↵
- 98.
- 99.↵
- 100.↵
- 101.↵
- 102.↵
- 103.↵
- 104.↵
- 105.↵
- 106.
- 107.
- 108.↵
- 109.↵
- 110.↵
- 111.↵
- 112.↵
- 113.↵
- 114.↵
- 115.↵
- 116.↵
- 117.
- 118.↵
- 119.↵
- 120.↵
- 121.↵
- 122.↵
- 123.↵
- 124.↵
- 125.↵
- 126.↵
- 127.↵
- 128.↵
- 129.↵
- 130.↵
- 131.
- 132.↵
- 133.
- 134.↵
- 135.↵
- 136.↵
- 137.↵
- 138.↵
- 139.
- 140.↵
- 141.↵
- 142.↵
- 143.↵
- 144.↵
- 145.↵
- 146.↵
- 147.↵
- 148.↵
- 149.↵