Abstract
Drug resistance is an almost inevitable consequence of cancer therapy and ultimately proves fatal for the majority of patients. In many cases, resistance is caused by specific gene mutations that could potentially be targeted to re-sensitize the tumor. A method that saturates the genome with point mutations while avoiding chromosome or nucleotide sequence context bias would therefore open the door to identifying all possible drug resistance mutations in cancer models. Here we describe such a method for elucidating drug resistance mechanisms, using genome-wide chemical mutagenesis combined with next-generation sequencing. We show that chemically mutagenizing the genome of cancer cells dramatically increases the number of drug-resistant clones and allows the detection of both known and novel drug resistance mutations. We have developed an efficient computational process that allows the rapid identification of involved pathways and druggable targets. Such a priori knowledge would greatly empower serial monitoring strategies for drug resistance in the clinic, as well as the development of trials for drug-resistant patients.
Introduction
Despite an increasing array of new cancer therapies, drug resistance is an almost universal phenomenon, likely due to the presence of rare subclonal populations that act as a reservoir for resistance mutations. The emergence of drug resistance ultimately proves fatal for the majority of patients; the early detection of resistance and the identification of novel re-sensitization strategies are therefore subjects of intense activity.
Previously, the identification of drug resistance genes has relied either on re-biopsy of cancer patients following the development of resistance or on cancer cell lines made resistant by exposure to drug in vitro over many weeks. Both approaches can suffer from inherent biases. With respect to the former, biopsy of a single resistant site of disease may miss alternative resistance mechanisms in other metastatic sites (Van Allen et al. 2014). Equally, serial drug exposure of cancer cell lines will favor pre-existing drug-resistant clones that are specific to that cell line and may not represent the entire spectrum of resistance mechanisms for that treatment.
For these reasons, there is considerable interest in forward genetic screens that engineer mutational events into the cancer genome so that they can be tested, in an unbiased fashion, for their ability to cause drug resistance. Such screens, if sufficiently unbiased, could in theory capture the entire breadth of genetic resistance mechanisms for any drug. Recent studies have demonstrated the power of genome-wide gain- and loss-of-function screens using CRISPR/Cas9, lentiviral shRNA and large-scale open reading frame technologies to identify clinically relevant drug resistance mechanisms in cancer (Hu and Zhang 2016). However, all of these approaches fail to capture a third important mechanism of drug resistance, namely point mutations. Point mutations account for resistance in large numbers of patients receiving targeted therapies in melanoma, colon cancer, lung cancer and chronic myeloid leukemia (Supplementary Table 1) (Kobayashi et al. 2005; Katayama et al. 2012; Montagut et al. 2012; Ohashi et al. 2012; Bettegowda 2014; Long et al. 2014; Van Allen et al. 2014; Wagle et al. 2014; Arena et al. 2015; Russo et al. 2015; Siravegna et al. 2015; Thress et al. 2015).
N-Ethyl-N-nitrosourea (ENU) has been used as a potent mutagen in mouse models of development for over four decades (Acevedo-Arozena et al. 2008). Exposure results in the efficient generation of random point mutations throughout the genome (Tokunaga et al. 2014). We therefore tested whether, in cancer cell line models, ENU could be used to mutagenize the genome and enable the expansion of drug-resistant cells following the application of a targeted agent. As proof of concept, we investigated whether this approach could identify all clinically demonstrated resistance mutations in colorectal cancer patients treated with the EGFR monoclonal antibody Cetuximab (Van Cutsem et al. 2009). We developed a sequencing and informatics approach to detect novel resistance mutations from next-generation sequencing data and to detect, at the sample population level, statistical enrichment for mutually exclusive mutations in specific signalling pathways comprising more than 8,000 genes (Figure 1a). Our mutagenesis screen successfully identified all drug resistance mutations to Cetuximab previously observed in the clinic, as well as a novel mutation that we subsequently identified in a colorectal cancer patient. We suggest that this approach is a powerful and facile means of mapping the landscape of point mutations that confer resistance to targeted therapies. Such knowledge could be used to discover therapeutic strategies to re-sensitize resistant tumors, as well as to identify which genes should be prioritised for non-invasive monitoring during treatment using plasma DNA sequencing.
Results
ENU exposure confers stable resistance to Cetuximab in colon cancer cells
We screened 51 colorectal cancer cell lines against a concentration range of the EGFR monoclonal antibody Cetuximab and assessed viability after 6 days (Figure 1b). In keeping with clinical experience of the genetic factors that underpin response to this drug, cell lines wild-type for KRAS/NRAS/BRAF (green bars) exhibited heightened sensitivity to Cetuximab (Douillard et al. 2013). We therefore chose two of these lines, CCK-81 and NCI-H508, for the ENU resistance experiment. Both cell lines also demonstrated Cetuximab sensitivity in long-term clonogenic survival assays (Supplementary Figure 1a, b). In addition, CCK-81 has features of microsatellite instability (MSI), whereas NCI-H508 is microsatellite stable (MSS). MSI is detected in 16% of colorectal cancers and is associated with a different phenotype and clinical outcome compared with MSS cancers. The CCK-81 cell line was exposed to a dose range of ENU (0.01–1 mg/ml) for 24 hours, after which the mutagenized cells were treated with Cetuximab (10 μg/ml) for 8 consecutive weeks. The number of drug-resistant colonies was counted at the end of the experiment. We observed no drug-resistant colonies in the absence of ENU (Figure 1c). With increasing ENU concentration, we observed a linear increase in both the number of drug-resistant colonies (left y-axis, blue bars) and their mutational burden (right y-axis, green triangles). Subsequently, NCI-H508 cells were also treated with ENU (0.1 mg/ml) for 24 hours, followed by weekly Cetuximab treatment for 8 weeks. Drug-resistant colonies were picked, expanded in culture and submitted for Illumina whole exome sequencing (a total of 14 CCK-81 and 58 NCI-H508 colonies). Data were analysed for substitutions and insertions/deletions to estimate the number of ENU-associated mutations per Mb of exome and to detect novel (and putatively resistance-conferring) mutations (Supplementary Table 2).
We then performed clonogenic survival assays on a subset of resistant clones and confirmed robust and stable resistance to Cetuximab (Figure 1d).
The spectrum of ENU-induced mutations
The ability of any mutagenesis screen to capture a particular phenotype depends strongly on how evenly it saturates the genome with all 6 possible classes of base substitution (expressed as the pyrimidine of the mutated Watson–Crick base pair: C>A, C>G, C>T, T>A, T>C, T>G). On average, we detected 451 novel mutations per exome in each clone (mean 550 and 199 in CCK-81 and NCI-H508 clones, respectively), for a total of 33,857 (Supplementary Table 2). The mutations were almost exclusively base substitutions (96% of the total). A third of these were non-synonymous variants within coding exons, where resistance mutations are more likely to occur (Figure 2a). Only 4% were potential loss-of-function truncating mutations (frameshift indels or nonsense mutations). The remaining mutations were either silent or intronic. Notably, analysis of exome sequence data across all 72 clones (whether MSS or MSI) revealed that, of the 6 possible classes of base substitution, only C>G substitutions were under-represented (3% of all substitutions) (Figure 2a). The mutation spectrum of ENU-derived clones was similar regardless of whether the cells came from an MSI or MSS background (Supplementary Figure 2). There was no evidence of a significant bias towards mutations in coding genes on any particular chromosome, or indeed in any specific region within a chromosome (Figure 2b and Supplementary Figure 3). However, given that these 33,857 mutations are a mixture of those caused by ENU, background mutational processes and private subclonal variants, we used a mathematical approach to extract the ENU signature specifically and thereby determine the spectrum of ENU mutations more accurately.
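The pyrimidine convention above can be made concrete with a short sketch. Because a substitution can be described from either DNA strand, a change at a purine reference base (G or A) is reported as the equivalent change on the complementary strand, so G>A is counted as C>T. The function names below are hypothetical, and the sketch assumes substitutions are already available as (reference, alternate) base pairs; it is not taken from the authors' pipeline.

```python
from collections import Counter

COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}
SIX_CLASSES = ("C>A", "C>G", "C>T", "T>A", "T>C", "T>G")


def substitution_class(ref: str, alt: str) -> str:
    """Return the pyrimidine-referenced class of a single-base substitution."""
    ref, alt = ref.upper(), alt.upper()
    if ref not in COMPLEMENT or alt not in COMPLEMENT or ref == alt:
        raise ValueError(f"not a single-base substitution: {ref}>{alt}")
    if ref in "AG":
        # Purine reference: report the equivalent change on the complementary strand
        ref, alt = COMPLEMENT[ref], COMPLEMENT[alt]
    return f"{ref}>{alt}"


def class_fractions(substitutions):
    """Fraction of (ref, alt) substitutions falling into each of the six classes."""
    counts = Counter(substitution_class(ref, alt) for ref, alt in substitutions)
    total = sum(counts.values())
    if total == 0:
        return {c: 0.0 for c in SIX_CLASSES}
    return {c: counts[c] / total for c in SIX_CLASSES}
```

For example, `substitution_class("G", "A")` returns `"C>T"`, so a clone with one G>A and one C>T substitution contributes two counts to the C>T class.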
The non-negative matrix factorization (NMF) algorithm has previously been used to detect mutational signatures in human cancers, including those arising from defects in DNA mismatch repair and from defective proofreading by the polymerase POLE (http://cancer.sanger.ac.uk/cosmic/signatures) (Alexandrov et al. 2013b). It extracts signatures based on a 96-mutation classification that combines the 6 base substitution types described above with the immediate flanking sequence context of the mutated base (four possible 5’ and four possible 3’ bases, giving 6 × 4 × 4 = 96 mutation types). In our data, it revealed a distinct signature that was represented across almost all trinucleotide contexts in both CCK-81 and NCI-H508 clones and had not been detected in prior tumor studies, including in a panel of 51 colorectal cancer cell lines (data not shown) (Figure 2c, Supplementary Figure 4). This signature (‘Signature A’) likely reflects ENU exposure. Reassuringly, the pattern of base substitutions comprising this signature was almost identical to that seen across the entire set of substitutions detected in the ENU-derived clones, with again only C>G substitutions seen at lower frequency (Figure 2c). Thus, using this approach, it should be feasible to generate the majority of theoretically possible coding point mutations that could confer drug resistance across the entire genome.
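To illustrate the 96-type classification, the sketch below labels each substitution by its trinucleotide context in the common `N[REF>ALT]N` notation and counts a set of substitutions into a 96-element catalogue. It assumes each mutation is supplied as a three-base forward-strand reference context (5′ base, mutated base, 3′ base) plus the alternate base; these input conventions and all names are illustrative assumptions. When the mutated base is a purine, the whole context is reverse-complemented so that every mutation is expressed from the pyrimidine of the pair.

```python
from collections import Counter
from itertools import product

_COMPLEMENT = str.maketrans("ACGT", "TGCA")
SUBSTITUTIONS = ("C>A", "C>G", "C>T", "T>A", "T>C", "T>G")
# 6 substitution types x 4 possible 5' bases x 4 possible 3' bases = 96 channels
CHANNELS = [f"{five}[{sub}]{three}"
            for sub in SUBSTITUTIONS
            for five, three in product("ACGT", repeat=2)]


def channel(context: str, alt: str) -> str:
    """Label a substitution by its trinucleotide context, e.g. ('ACT', 'T') -> 'A[C>T]T'."""
    context, alt = context.upper(), alt.upper()
    if len(context) != 3 or set(context + alt) - set("ACGT") or alt == context[1]:
        raise ValueError(f"invalid substitution {context} -> {alt}")
    if context[1] in "AG":
        # Purine reference: reverse-complement so the mutated base is a pyrimidine
        context = context.translate(_COMPLEMENT)[::-1]
        alt = alt.translate(_COMPLEMENT)
    return f"{context[0]}[{context[1]}>{alt}]{context[2]}"


def catalogue(mutations) -> list:
    """Count (context, alt) pairs into the 96 channels, in CHANNELS order."""
    counts = Counter(channel(ctx, alt) for ctx, alt in mutations)
    return [counts[c] for c in CHANNELS]
```

A G>A change in the forward-strand context AGT and a C>T change in ACT both land in channel `A[C>T]T`, because they describe the same event read from opposite strands.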
As expected, we detected a signature of MSI (‘Signature B’) in the combined mutational catalogue of the CCK-81 clones, but surprisingly also in the NCI-H508 clones (Supplementary Figure 4) (Alexandrov et al. 2013a). On closer examination, this signature was the result of two hypermutator clones in the NCI-H508 mutational catalogue (red arrows; clones NCI-H508_26 and NCI-H508_40) (Supplementary Figure 5). These clones had mutation rates as high as any of the MSI CCK-81 clones and increased numbers of small insertions and deletions, consistent with a defect in the mismatch repair pathway (Supplementary Table 2). Clone NCI-H508_26 harbored a novel nonsense (stop-gained) mutation in the mismatch repair gene MLH1, and clone NCI-H508_40 a nonsense mutation in the DNA repair gene EXO1. Two other clones (black arrows) also had elevated mutation rates, which may result from acquired nonsense mutations in POLQ, a gene involved in DNA damage repair. These clones gave rise to the third signature of unknown origin detected in NCI-H508 clones (‘Signature C’) (Supplementary Figure 4).
ENU mutagenesis identifies clinically relevant resistance mutations and pathways
A challenge in identifying drug resistance mutations in ENU-derived clones is that each clone harbors many hundreds of ‘passenger’ mutations in addition to the one conferring resistance. We hypothesised that, with a sufficiently large population of individual resistant clones, it might become feasible to use statistical enrichment of non-synonymous coding mutations in specific pathways to help identify drug resistance mutations. We therefore developed a statistical framework (SLAP-Enrich) to determine whether genetic alterations observed in multiple samples are enriched within a specific pathway in a statistically significant manner, using a network of 8,056 unique genes (https://github.com/francescojm/SLAPenrich). Once significantly enriched pathways are identified, SLAP-Enrich applies a final filter based on the tendency of genes in a positively selected pathway to be mutated in a mutually exclusive manner. Applying this method to the set of ENU mutations across the 72 Cetuximab-resistant clones, we found several statistically enriched pathways (false discovery rate (FDR) < 5%) (Supplementary Table 3). The most significantly enriched pathway, ‘Signalling to P38 via RIT and RIN’, contains many of the key genes of the canonical MAP kinase pathway (Figure 3a). In total, we identified credible resistance mutations in 42 of the 72 resistant clones (58%) (Supplementary Table 4), including in all of the genes found clinically to confer resistance to EGFR therapy in colorectal cancer (Supplementary Table 1). EGFR, KRAS, NRAS, BRAF and MAP2K1 (MEK1) were each mutated in at least 3 clones, in a mutually exclusive manner. There was no clear difference in the frequency of specific mutations between NCI-H508 and CCK-81. Furthermore, 38/42 (90%) of these putative resistance mutations have previously been identified in colorectal cancer patients who developed resistance to Cetuximab.
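The mutual exclusivity filter can be illustrated with a simple score. The sketch below computes an "exclusive coverage" for one pathway: among samples with at least one mutated pathway gene, the fraction carrying exactly one. This particular definition and the function name are assumptions made for illustration; the text does not specify the exact criterion SLAP-Enrich applies.

```python
import numpy as np


def exclusive_coverage(mutated, pathway_rows) -> float:
    """Fraction of pathway-altered samples in which exactly one pathway gene is mutated.

    mutated: binary genes x samples matrix (1 = gene mutated in that sample).
    pathway_rows: row indices of the genes belonging to the pathway.
    (Hypothetical score used for illustration only.)
    """
    # Number of mutated pathway genes in each sample
    hits = np.asarray(mutated)[list(pathway_rows), :].sum(axis=0)
    altered = hits >= 1
    if not altered.any():
        return 0.0
    return float(np.mean(hits[altered] == 1))
```

A pathway whose mutations fall mostly one per clone, as observed here for EGFR, KRAS, NRAS, BRAF and MAP2K1, scores close to 1; how high the score must be for a pathway to pass would be a separate, user-chosen cut-off.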
The most frequently observed ENU resistance mutation was BRAF p.V600E (13/42 clones), followed by NRAS p.Q61K (8/42) and KRAS p.G12C (4/42) (Figure 3b,c and Supplementary Figure 6). These are all canonical driver mutations in tumorigenesis that are known to activate oncogenic signalling and to confer resistance to Cetuximab both experimentally and clinically (Diaz et al. 2012; Misale et al. 2012). We also detected EGFR mutations in 3 of the resistant clones. The EGFR p.I491K substitution has been shown to induce structural changes in the extracellular domain of EGFR that prevent Cetuximab binding and confer resistance (Montagut et al. 2012). For the remaining 30 clones, in which SLAP-Enrich did not detect statistical enrichment of mutations in specific pathways, there are two plausible explanations. First, the resistance mutations may not cluster into previously characterized pathways and would therefore not be detectable by SLAP-Enrich. Second, the observed resistance may result from mutations outside the coding exome (for example, in enhancer/promoter regions or untranslated regions (UTRs)), which are not amenable to detection by whole exome capture.
ENU mutagenesis identifies novel resistance mutations in MAP2K1
Recently, two studies of plasma DNA sequencing in colorectal cancer patients undergoing treatment with EGFR monoclonal antibodies jointly identified the first MAP2K1 (MEK1) resistance mutations at codon K57 (p.K57T and p.K57N) (Russo et al. 2015; Siravegna et al. 2015). In our study, we also identified MAP2K1 mutations at codon K57 (p.K57N, p.K57E), as well as at two sites not previously reported (p.F53L, p.C121S) (Supplementary Table 4). We therefore sequenced these MAP2K1 loci (together with additional mutation hotspots in 34 other genes) in a series of plasma DNA samples collected from 22 colorectal cancer patients who acquired resistance to EGFR therapies (either Cetuximab or Panitumumab) after an initial response. In addition to many of the known canonical resistance mutations (in KRAS, NRAS and BRAF), in one of these patients we detected the novel MAP2K1 p.F53L mutation that our screen had predicted to confer resistance (Table 1, Figure 4a). As previously reported, several of these patients carried more than one likely resistance mutation, consistent with different metastatic sites evolving different resistance mechanisms.
To functionally validate the resistance effects of these MAP2K1 mutations, we treated CCK-81 cells expressing the novel p.F53L and p.C121S mutations, as well as the previously identified p.K57N mutation, with Cetuximab (alongside empty vector and wild-type MAP2K1 controls). All of our candidate mutations induced resistance to Cetuximab, and the strength of the resistance effect was comparable to that conferred by overexpression of the MET receptor tyrosine kinase, a previously identified resistance mechanism (Figure 4b, left panel) (Bardelli et al. 2013). Long-term growth inhibition assays similarly showed robust and durable resistance to Cetuximab in the MAP2K1 mutant cells (Figure 4b, right panel). Immunoblot analysis showed that Cetuximab failed to completely suppress ERK1/2 phosphorylation (pERK1/2) in any of the MAP2K1 mutant cells (Figure 4c).
Rational targeting of pathways can re-sensitize drug resistant mutants to Cetuximab
Identification of the key signalling pathways that underpin drug resistance opens up the possibility of rationally targeting key components of these pathways and thus re-sensitizing resistant cells. Generating resistant cell lines, either through the ENU screen or through deliberate genetic modification of the parental cell line to carry specific mutations, allowed us to carry out such experiments in vitro. The pathway most frequently mutated in the drug-resistant CCK-81 and NCI-H508 ENU clones converges on MAPK family members, and targeting these nodes might therefore be expected to overcome resistance (Figure 3c). A Cetuximab-resistant CCK-81 BRAF V600E mutant clone (ENU-10) was re-sensitized when the monoclonal antibody was combined with the BRAF inhibitor Dabrafenib (Supplementary Figure 7a). Cetuximab-resistant clones harbouring mutations (in KRAS, NRAS and MAP2K1) predicted to activate MAPK signalling were re-sensitized when the MEK inhibitor Trametinib was combined with Cetuximab (Supplementary Figure 7a,b). Similarly, combining Cetuximab with Trametinib almost completely re-sensitized the resistant MAP2K1 mutant CCK-81 cells (Figure 5a, b). Indeed, such a combination has already been suggested as a putative therapeutic strategy for colon cancer patients (Misale et al. 2014).
Discussion
Drug resistance is an almost inevitable feature of anticancer treatment and is ultimately fatal in the majority of patients. Subclonal mutations present in a small subset of tumor cells prior to drug treatment can serve as a reservoir for the emergence of drug resistance. The population of resistant cells is likely to be at very low frequency because resistance comes with a fitness cost. Such cells are subject to Darwinian selection once any cancer cell population is treated, so resistant clones are only detected after they emerge following treatment. The advent of plasma DNA sequencing, which allows longitudinal samples to be assayed during cancer treatment, makes it ideal for detecting such cancer evolution (Russo et al. 2015). In theory, therefore, this approach would avoid the bias of single-organ biopsy and would be far more feasible to execute across large clinical cohorts.
Regardless of which technology is used to detect resistance mutations, some a priori knowledge of the likely drug resistance candidates greatly increases the sensitivity of such assays. Identifying the complete catalogue of resistance effectors for any drug requires in vitro studies that model resistance in the relevant tissue and genetic background. In the past, such in vitro studies used cell lines that had undergone serial passage in the presence of the candidate drug in order to force the emergence of resistant clones. Although such studies have successfully identified clinically relevant mechanisms of drug resistance in some instances, they are biased towards selecting pre-existing resistant subclones that are particular to that specific cell line.
Forward genetic screens enable an unbiased approach to defining the genetic events that cause drug resistance. At the forefront of such screens have been large-scale lentiviral shRNA or CRISPR/Cas9 libraries that repress gene expression (Berns et al. 2004; Konermann et al. 2015) or, conversely, open reading frame libraries that overexpress full-length gene transcripts (Yang et al. 2011). Such screens have recently been used to define the resistance effectors of ALK inhibition in lung cancer (Wilson et al. 2015) and of BRAF inhibition in melanoma (Shalem et al. 2014; Wilson et al. 2015). An advantage of such screens is that they identify and prioritize candidate genes for subsequent investigation in clinical samples (either tissue biopsies or plasma DNA) and make it possible to model the re-sensitization of resistant cells using combinatorial drug strategies (Misale et al. 2014). However, although such unbiased screens offer an extraordinary opportunity to map the entire landscape of genes that confer drug resistance when repressed or overexpressed, they fail to capture a third important class of drug resistance drivers, namely point mutations. In almost every analysis of drug-resistant disease across a range of tumor types, including lung cancer, melanoma, colon cancer and chronic myeloid leukemia, the landscape of resistance effectors includes significant numbers of genes that confer resistance exclusively through the acquisition of point mutations (Supplementary Table 1). Indeed, in colorectal cancer, it is increasingly evident that KRAS/NRAS/BRAF/EGFR point mutations collectively account for the majority (up to 90%) of both acquired and intrinsic resistance to EGFR therapies (Douillard et al. 2013; Bettegowda 2014).
Genome-wide ENU chemical mutagenesis followed by phenotype-driven screening has been pivotal to understanding how complex biological processes operate in classical model organisms, including yeast (Forsburg 2001), flies (St Johnston 2002), zebrafish (Patton and Zon 2001) and, perhaps most extensively, mice (Kile and Hilton 2005). The alkylating agent ENU can introduce a high rate of point mutations into the genome and has two distinct advantages over previously used mutagens. First, it is very efficient, inducing a point mutation every 1 to 2 Mb throughout the genome in mouse models (~100-fold higher than the spontaneous mutation rate and 3-fold higher than X-irradiation) (Concepcion et al. 2004). Second, unlike irradiation, which induces multi-locus deletions, ENU is a point mutagen and affects single loci. ENU acts by transferring its ethyl group to oxygen or nitrogen atoms in DNA, causing these ethylated bases to be misread during replication. If the resulting mismatch is not repaired, a base-pair substitution results (Justice 2000). To date, the use of ENU to define drug resistance mechanisms in cancer has focused on specific genes in non-cancer cell line models rather than on interrogating the entire coding genome (Tiedt et al. 2011; Zhang et al. 2011; Ercan et al. 2015).
Here we establish a model for using genome-wide chemical mutagenesis screens to capture the diversity of clinically relevant drug resistance mutations. As proof of concept, we employed this screen in the setting of EGFR therapy in colorectal cancer, a disease in which response to such therapy is invariably followed by the acquisition of resistance. Such resistance mechanisms are dominated by point mutations in the MAP kinase signalling pathway and have been extensively validated in patient cohorts (Supplementary Table 1) (Yonesaka et al. 2011; Diaz et al. 2012; Misale et al. 2012; Montagut et al. 2012; Bardelli et al. 2013; Bettegowda 2014). We were able to identify all clinically detected resistance mutations to Cetuximab treatment in colorectal cancer, as well as potential therapeutic avenues to re-sensitize resistant cells. Given the prevalence of point mutations as mediators of resistance in cancer (and their potential for therapeutic targeting), we propose that ENU mutagenesis should be incorporated alongside newer CRISPR gene editing technologies in the systematic interrogation of drug resistance.
Methods
Materials
All cell culture was performed in either RPMI or DMEM/F12 medium (according to the supplier’s recommendations) supplemented with 5% FBS and penicillin/streptomycin. Cells were maintained at 37°C and 5% CO2 during culture. The identity of all cell lines used in this paper was confirmed using a panel of 95 single nucleotide polymorphisms (SNPs) used previously for cell line authentication (Fluidigm, San Francisco, CA).
Immunoblotting
Differential phosphorylation of proteins in signalling pathways was analysed by western blot. Cells were plated 24 hours prior to drug treatment and incubated for the indicated times and concentrations. Adherent cells were then washed with PBS and collected after the indicated incubation time with drug using lysis buffer containing 5% β-mercaptoethanol, 150 mM NaCl, 50 mM Tris pH 7.5, 2 mM EDTA pH 8, 25 mM NaF, 1% NP-40, protease inhibitors (Roche) and phosphatase inhibitors (Roche). Protein concentrations were determined by bicinchoninic acid (BCA) assay, and lysates were normalised using lysis buffer. Protein lysates were resolved by SDS-PAGE on precast Invitrogen 4–12% Bis-Tris gels and transferred for 12 hours. Primary antibodies against p44/42 MAPK, phospho-p44/42 MAPK (Thr202/Tyr204) and Akt were sourced from Cell Signaling Technology, and phospho-Akt (pS473) was sourced from Invitrogen. Monoclonal anti-β-tubulin was sourced from Sigma (USA).
Drug sensitivity assays
Cells were seeded in 96-well plates for 6-day assays and in 6-well plates for clonogenic assays. Cells were incubated in drug-free media for 24 hours to allow adherence before the addition of drug at the indicated concentrations. Each cell line was seeded to achieve approximately 70% confluency at the end of the assay. Cetuximab was obtained from the Addenbrooke’s Hospital Pharmacy, Cambridge (UK). Trametinib (GSK1120212) and Dabrafenib were obtained from Selleckchem (USA).
ENU mutagenesis of cell lines
Cells were incubated in ENU at the indicated concentration for 24 hours, washed 3 times with PBS and incubated in media for a further 24 hours. Beginning 48 hours after ENU exposure, cells were selected with 10 μg/ml Cetuximab for 8 weeks. Clones were then picked using Scienceware small cloning cylinders (Wayne, NJ) and either transferred to 96-well plates or expanded into large flasks for drug sensitivity assays. DNA was extracted in 96-well plate format using the Agencourt DNAdvance Genomic DNA Isolation kit.
Whole Exome Sequencing
Exome sequencing was carried out using the Agilent SureSelectXT Human All Exon 50Mb bait set. DNA extracted from the 72 clones underwent library construction, flow cell preparation and cluster generation according to the Illumina library preparation protocol. We performed 75-base paired-end Illumina sequencing. Reads were aligned to the reference human genome (GRCh37) using the Burrows-Wheeler Aligner (BWA) (http://bio-bwa.sourceforge.net/). Unmapped reads were excluded from the analysis. The average coverage across CCK-81- and NCI-H508-derived clones was 65X and 62X, respectively. The matched parental cell lines were sequenced at greater depth (158X for CCK-81 and 144X for NCI-H508).
Variant Detection
Single nucleotide substitutions were called using the CaVEMan C (Cancer Variants through Expectation Maximisation) algorithm, and insertions/deletions were called using split-read mapping as implemented in the Pindel algorithm (https://github.com/cancerit). Variants were called against a single matched reference sample: a contemporary parental cell line control sequenced at high coverage.
Data Filtering To Remove Pre-existing Subclonal Variants
A number of clones shared mutations that were present in a small percentage of reads in the corresponding contemporary parental cell line sequence. These subclonal mutations could confound subsequent pathway analysis by causing enrichment in a pathway through mutations that were present before ENU treatment but were not called in the parental line owing to their low representation. To overcome this problem, after mutation calling with CaVEMan and Pindel, variants were filtered against the deeply sequenced contemporary parental control: SAMtools mpileup (http://samtools.sourceforge.net/) was used to remove any mutations present in 0.5% or more of reads in the high-coverage parental cell line control. The final set of mutations was used to generate an event matrix for all 72 clones (Supp. Table 5), which served as the input for the SLAP-Enrich analysis described below.
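The filtering rule can be sketched as follows, assuming alternate-allele read counts and total depth at each candidate site have already been extracted from the parental control (for example, from a pileup). The data structures and names are hypothetical; only the 0.5% threshold comes from the text. Keeping sites with no parental coverage is a design choice of this sketch, since such sites offer no evidence either way.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Variant:
    """A substitution called in an ENU-derived clone (hypothetical minimal record)."""
    chrom: str
    pos: int
    alt: str


def remove_preexisting(variants, parental_support, min_fraction=0.005):
    """Keep only variants supported by fewer than min_fraction of parental reads.

    parental_support maps (chrom, pos, alt) -> (alt_reads, depth) measured in the
    deeply sequenced parental control. Sites absent from the map, or with zero
    depth, carry no parental evidence and are kept.
    """
    kept = []
    for v in variants:
        alt_reads, depth = parental_support.get((v.chrom, v.pos, v.alt), (0, 0))
        if depth > 0 and alt_reads / depth >= min_fraction:
            continue  # present before ENU treatment: likely a pre-existing subclonal variant
        kept.append(v)
    return kept
```

At the parental depths reported above (around 150X), a single supporting read already exceeds 0.5%, so in practice the rule removes any variant seen even once in the parental control.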
Deciphering mutational signatures in exome sequences of clones exposed to ENU
The immediate 5′ and 3′ sequence context of base substitutions identified across Cetuximab-resistant clones was extracted using the ENSEMBL Core APIs for human genome build GRCh37 and used to generate mutational catalogues for downstream analysis. The mutational catalogue of CCK-81 Cetuximab-resistant clones contained 7,198 substitutions, while that of the NCI-H508 clones contained 23,862 substitutions. Mutational signatures were deciphered separately for each catalogue using a previously developed computational framework (Alexandrov et al. 2013b). Briefly, the algorithm first identifies a minimal set of mutational signatures that optimally explains the proportions of mutation types found across a given mutational catalogue (i.e. across all substitutions identified in CCK-81 or NCI-H508 clones; Supplementary Figure 4), and then estimates the contribution of each identified signature to the mutation spectrum of each sample included in the analysis (i.e. of each individual clone; see Supplementary Figure 5 for NCI-H508 clones).
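A minimal sketch of the factorisation at the core of such frameworks is shown below: a non-negative catalogue matrix (96 mutation types × samples) is approximated as the product of signatures W and exposures H using Lee–Seung multiplicative updates. This is a simplified stand-in, not the published framework, which adds steps such as bootstrap resampling and repeated factorisations to assess signature stability and choose the number of signatures; the function name and parameters are assumptions.

```python
import numpy as np


def extract_signatures(catalogue, k: int, n_iter: int = 500, seed: int = 0,
                       eps: float = 1e-10):
    """Factorise a non-negative catalogue (channels x samples) as W @ H.

    W (channels x k) holds the signatures and H (k x samples) their exposures.
    Lee-Seung multiplicative updates minimise the Frobenius norm of
    catalogue - W @ H while keeping both factors non-negative.
    """
    V = np.asarray(catalogue, dtype=float)
    if np.any(V < 0):
        raise ValueError("mutation counts must be non-negative")
    rng = np.random.default_rng(seed)
    n_channels, n_samples = V.shape
    W = rng.random((n_channels, k)) + eps
    H = rng.random((k, n_samples)) + eps
    for _ in range(n_iter):
        H *= (W.T @ V) / (W.T @ W @ H + eps)
        W *= (V @ H.T) / (W @ H @ H.T + eps)
    scale = W.sum(axis=0)                 # normalise each signature to sum to 1 ...
    return W / scale, H * scale[:, None]  # ... and move that scale into the exposures
```

Normalising each column of W lets it be read as a probability distribution over the 96 mutation types, while the product W @ H is unchanged; the number of signatures k must be chosen beforehand.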
Sample Level Analysis of Pathway Enrichments (SLAP-Enrich)
As a first step, SLAP-Enrich estimates the probability of observing at least one mutated gene from a given pathway in a given sample, based on the total length of the exon blocks of the genes in that pathway and the sample mutation burden. Once this probability has been estimated for each individual sample, SLAP-Enrich models the likelihood of observing a given number of samples with mutations in the pathway under consideration through a Poisson binomial distribution: the discrete distribution of a sum of independent Bernoulli trials whose probabilities of success are not all equal. SLAP-Enrich uses this distribution to assign a p-value to the deviation of the observed number of samples with mutations in a given pathway from its expectation.
Pathway gene-sets collection and post-processing
A collection of pathway gene sets was downloaded from the Pathway Commons data portal (v4-201311) (http://www.pathwaycommons.org/archives/PC2/v4-201311/) and used in SLAP-Enrich. It contained an initial catalogue of 2,893 gene sets (one for each pathway) assembled from multiple publicly available resources, such as Reactome, Panther, HumanCyc and PID, and covering 8,148 unique genes. From this catalogue, gene sets containing fewer than 4 genes were discarded.
To remove redundancies, gene sets that 1) corresponded to the same pathway across different resources or 2) had a large overlap (Jaccard index (JI) > 0.8, as defined below) were merged by intersecting them. The gene sets resulting from this compression were then added to the collection (with a joint pathway label), and those participating in at least one merge were discarded. The final collection resulting from this pre-processing consisted of 1,636 gene sets, covering a total of 8,056 unique genes. Given two gene sets $P_1$ and $P_2$, the corresponding JI is defined as:

$$JI(P_1, P_2) = \frac{|P_1 \cap P_2|}{|P_1 \cup P_2|}$$
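The redundancy filtering can be sketched as below. This is a simplification under stated assumptions: it applies only the overlap criterion (2) in a single pairwise pass, merges each qualifying pair by intersection under a joint label, and discards the original sets; matching the same pathway across resources (criterion 1) is not modelled. The function names and the label format are hypothetical.

```python
from itertools import combinations


def jaccard(p1: frozenset, p2: frozenset) -> float:
    """JI(P1, P2) = |P1 & P2| / |P1 | P2| (defined as 0 for two empty sets)."""
    union = p1 | p2
    return len(p1 & p2) / len(union) if union else 0.0


def compress(gene_sets: dict, min_size: int = 4, threshold: float = 0.8) -> dict:
    """Drop small gene sets, then merge highly overlapping pairs by intersection."""
    sets = {name: frozenset(genes) for name, genes in gene_sets.items()
            if len(genes) >= min_size}
    merged, used = {}, set()
    for (a, genes_a), (b, genes_b) in combinations(sorted(sets.items()), 2):
        if jaccard(genes_a, genes_b) > threshold:
            merged[f"{a} | {b}"] = genes_a & genes_b  # joint label, intersected genes
            used.update((a, b))
    # Sets that took part in any merge are discarded; merged sets are added
    result = {name: genes for name, genes in sets.items() if name not in used}
    result.update(merged)
    return result
```

Because merging is pairwise, three mutually overlapping sets would yield three merged pairs in this sketch; a production implementation might instead merge connected groups of overlapping sets.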
Statistical framework
Let $G = \{g_1, g_2, \dots, g_n\}$ be a list of all the genes whose mutational status across a set of samples $S = \{s_1, s_2, \dots, s_M\}$ has been determined by whole exome sequencing, and let $f: G \times S \to \{0, 1\}$ be a function defined as $f(g, s_i) = 1$ if $g$ is altered in sample $s_i$ and $f(g, s_i) = 0$ otherwise.
Given a pathway gene set P, the aim is to assess whether there is a tendency for that pathway to be recurrently genetically altered across the samples in S. In what follows we will refer to P and the pathway corresponding to P interchangeably.
We assume that the pathway $P$ is altered in sample $s_i$ if $\exists g \in P$ such that $g \in G$ and $f(g, s_i) = 1$, i.e. at least one gene in the pathway $P$ is altered in the $i$-th sample. First of all, we quantify how likely it is to observe at least one gene belonging to $P$ altered in sample $s_i$. To quantify this probability, let us introduce the variable $X_i = |\{g \in P : g \in G \text{ and } f(g, s_i) = 1\}|$, whose value is equal to the number of genes in $P$ that are altered in sample $s_i$.
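As an illustration, $X_i$ and the alteration status of $P$ in sample $s_i$ reduce to set operations. In this hypothetical sketch each sample is represented by the set of genes altered in it (those with $f(g, s_i) = 1$); the function names are ours and are not part of SLAP-Enrich.

```python
from __future__ import annotations


def altered_gene_count(pathway: set[str], screened: set[str], altered_in_sample: set[str]) -> int:
    """X_i: genes of P that belong to G (screened) and are altered in sample s_i."""
    return len(pathway & screened & altered_in_sample)


def pathway_altered(pathway: set[str], screened: set[str], altered_in_sample: set[str]) -> bool:
    """P is altered in s_i if at least one of its screened genes is altered (X_i >= 1)."""
    return altered_gene_count(pathway, screened, altered_in_sample) >= 1
```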
Under the assumption of both gene-wise and sample-wise statistical independence, the probability of $X_i$ assuming a value greater than or equal to 1 is given by:

$$p_i = \Pr(X_i \geq 1) = 1 - \Pr(X_i = 0) = 1 - e^{-\rho L(P)} \tag{1}$$

where $L(P)$ is the sum of the exonic block lengths of all the genes in pathway $P$, and $\rho$ is the background mutation rate, which can be computed directly from the analysed dataset or set to established estimates (typically $10^{-6}$ per nucleotide).
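A minimal Python sketch of this probability, assuming that the number of mutations falling in the exonic blocks of $P$ in a sample is Poisson distributed with mean $\rho L(P)$, so that $\Pr(X_i \geq 1) = 1 - e^{-\rho L(P)}$. The function name and the use of `math.expm1` for numerical accuracy are our own choices.

```python
import math


def prob_pathway_altered(exon_length: int, rho: float = 1e-6) -> float:
    """Pr(X_i >= 1) = 1 - exp(-rho * L), assuming Poisson-distributed mutation counts.

    exon_length: total exonic block length L(P) of the pathway genes (nucleotides).
    rho: background mutation rate per nucleotide (1e-6 as a typical default).
    """
    # -expm1(-x) equals 1 - exp(-x) but stays accurate when x = rho * L is tiny
    return -math.expm1(-rho * exon_length)
```

For instance, a pathway with 50,000 exonic nucleotides and $\rho = 10^{-6}$ gives a probability of about 0.049.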
These $p_i$ can be considered as the success probabilities of a set of Bernoulli trials, one per sample (with $i = 1, \dots, M$), and summing them across all the elements of $S$ gives the expected number of samples harbouring a mutation in at least one gene belonging to the pathway $P$:

$$E(P) = \sum_{i=1}^{M} p_i \tag{2}$$
On the other hand, if we consider a function $\varphi$ on the domain of the $X$ variables, defined as $\varphi(X) = 1$ if $X \geq 1$ and $0$ otherwise, then summing the values assumed by this function across all the samples in $S$ gives the observed number of samples harbouring a mutation in at least one gene belonging to $P$:

$$O(P) = \sum_{i=1}^{M} \varphi(X_i)$$
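Given the per-sample probabilities $p_i$ and the per-sample counts $X_i$ as plain Python lists (a hypothetical input format), the expected and observed numbers of altered samples reduce to simple sums:

```python
from __future__ import annotations


def expected_altered_samples(probs: list[float]) -> float:
    """Expected number of samples with P altered: the sum of the per-sample p_i."""
    return sum(probs)


def observed_altered_samples(x_counts: list[int]) -> int:
    """Observed number of samples with P altered: sum of phi(X_i), phi(X) = 1 if X >= 1."""
    return sum(1 for x in x_counts if x >= 1)
```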
A pathway alteration index quantifying the deviance of O(P) from its expectation can be then computed as:
To assess the significance of such a deviation, let us observe that the probability of the event $O(P) = y$, with $y \leq M$ (i.e. the probability of observing exactly $y$ samples in which the pathway $P$ is altered), follows a Poisson binomial distribution $B$ (a discrete probability distribution modelling the sum of a set of independent Bernoulli trials that are not identically distributed). In our case, the $i$-th Bernoulli trial accounts for the event “the pathway $P$ is altered in the sample $s_i$”, and its probability of success is given by the $p_i$ introduced above. The parameters of such a $B$ distribution are then the probabilities $\{p_i\}$, and its mean is given by Equation 2.
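The probability of the event $O(P) = y$ can then be written as:

$$\Pr(O(P) = y) = \sum_{A \in F_y} \prod_{i \in A} p_i \prod_{j \in A^c} (1 - p_j)$$

where $F_y$ is the set of all possible subsets of size $y$ that can be selected from $\{1, 2, \dots, M\}$ (for example, if $M = 3$, then $F_2 = \{\{1, 2\}, \{1, 3\}, \{2, 3\}\}$), and $A^c$ is the complement of $A$ (i.e. $\{1, 2, \dots, M\} \setminus A$).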
Hence a p-value can be computed against the null hypothesis that $O(P)$ is drawn from a Poisson binomial distribution parametrised by the vector of probabilities $\{p_i\}$. For an observation $O(P) = z$, with $z \leq M$, this p-value is:

$$\Pr(O(P) \geq z) = \sum_{y=z}^{M} \Pr(O(P) = y)$$
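Enumerating every subset in $F_y$ is exponential in $M$. A standard alternative, sketched below, builds the same probabilities with an $O(M^2)$ recursion that adds one Bernoulli trial at a time; it is shown only as an illustration and is not claimed to be how SLAP-Enrich evaluates the distribution internally. Summing the upper tail of the resulting distribution then gives the p-value.

```python
from __future__ import annotations


def poisson_binomial_pmf(probs: list[float]) -> list[float]:
    """Return [Pr(O = 0), ..., Pr(O = M)] for independent Bernoulli trials with success probs."""
    pmf = [1.0]  # the sum of zero trials is 0 with probability 1
    for p in probs:
        nxt = [0.0] * (len(pmf) + 1)
        for y, q in enumerate(pmf):
            nxt[y] += q * (1.0 - p)  # this trial fails: count stays at y
            nxt[y + 1] += q * p      # this trial succeeds: count moves to y + 1
        pmf = nxt
    return pmf


def enrichment_pvalue(observed: int, probs: list[float]) -> float:
    """One-sided p-value Pr(O >= observed) under the Poisson binomial null."""
    pmf = poisson_binomial_pmf(probs)
    # Clamp to guard against floating-point sums slightly above 1
    return min(1.0, sum(pmf[observed:]))
```

For example, with $M = 2$ samples each having $p_i = 0.5$, observing the pathway altered in at least one sample yields a p-value of 0.75.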
Mutual Exclusivity Filter
After correcting the p-values obtained by testing all the pathways in the collection with the Benjamini-Hochberg method (http://www.math.tau.ac.il/~ybenja/MyPapers/benjamini_hochberg1995.pdf), SLAP-Enrich further filters the pathways whose enrichment false discovery rate (FDR) is below a user-defined threshold (5% in this study) using mutual exclusivity criteria, as further evidence of positive selection. Specifically, for a given enriched pathway $P$, an exclusive coverage score $M(P)$ is computed as:

$$M(P) = 100 \cdot \frac{\dot{O}(P)}{O(P)}$$

where $O(P)$ is the number of samples in which at least one gene belonging to the pathway gene set $P$ is mutated, and $\dot{O}(P)$ is the number of samples in which exactly one gene belonging to $P$ is mutated. In this study, all the pathways $P$ with $M(P) \geq 50$ pass this final filter.
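The two filtering steps can be sketched as follows. The Benjamini-Hochberg adjustment is the standard step-up procedure; the exclusive coverage function assumes that $M(P)$ is expressed as a percentage, $100 \cdot \dot{O}(P)/O(P)$, which is our reading given the $\geq 50$ threshold. All function names are hypothetical.

```python
from __future__ import annotations


def benjamini_hochberg(pvals: list[float]) -> list[float]:
    """Benjamini-Hochberg adjusted p-values (FDR), returned in the input order."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity of p * m / rank
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvals[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def exclusive_coverage(observed: int, exactly_one: int) -> float:
    """Assumed form of M(P): percentage of altered samples with exactly one mutated gene."""
    return 100.0 * exactly_one / observed if observed else 0.0


def passes_filters(fdr: float, observed: int, exactly_one: int,
                   fdr_threshold: float = 0.05, min_coverage: float = 50.0) -> bool:
    """Keep a pathway if FDR < 5% and exclusive coverage >= 50 (thresholds used in this study)."""
    return fdr < fdr_threshold and exclusive_coverage(observed, exactly_one) >= min_coverage
```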
Site-directed mutagenesis of MAP2K1 expression vectors
To validate the four candidate drug resistance mutations from the ENU-based forward genetic screen, we sought to create mutated vectors for expression in cetuximab-sensitive colorectal cell lines. A wild-type MAP2K1 construct was ordered from Dharmacon (Lafayette, CO) and taken forward for in vitro site-directed mutagenesis using the GENEART® Site-Directed Mutagenesis System from Thermo Fisher Scientific (Waltham, MA). To achieve this, two complementary mutagenic oligonucleotide primers were designed (obtained from Sigma-Aldrich, St Louis, MO) and used to generate cDNA expression constructs carrying the desired mutations. Mutations were confirmed by Sanger sequencing before the constructs were delivered into cells by lentiviral infection.
Plasma DNA sequencing
DNA extraction was performed with the QIAamp DNA Mini Kit (Qiagen, Hilden, Germany). Library preparation was performed with the Oncomine™ Focus Assay (Thermo Fisher Scientific, Waltham, MA, USA) following the manufacturer’s instructions. After barcoding, libraries were equalized to 100 pM. The sequencing template was prepared using the Ion PGM Sequencing 200 Kit v2 and sequenced on an Ion Select 318 chip using the PGM Sequencing 200 Kit v2 with 500 flows. Hotspot mutations in 35 genes were targeted using the Oncomine Focus Assay (Thermo Fisher) (Supplementary Table 6). Variant Caller v4.0.r73742 was used for variant calling with the Ion Reporter Software. All filtered variants were also analyzed with the Integrative Genomics Viewer (IGV v2.3).
Data Access
All raw sequence data have been deposited in the European Genome-phenome Archive (EGAS00001001743, EGAS00001001744, EGAS00001001745). SLAP-Enrich is implemented as an R package and is publicly available on GitHub at https://github.com/francescojm/SLAPenrich, together with a pipeline and detailed instructions to reproduce the results presented in this manuscript.
Author Contributions
J.B. performed the majority of experiments, analysed the data and contributed to writing the manuscript. M.P., L.G.A. and C.M. analysed the data. S.B. and M.G. designed and ran the combination screen. C.M., A.D. and B.B. carried out and analysed the plasma sequencing. F.I. designed the computational methods, analysed the data and contributed to writing the manuscript. J.S.R. supervised the analysis. U.M. conceived the project, analysed the data and contributed to writing the manuscript.
Disclosure Declaration
Competing financial interests: none of the authors holds any competing interests.
Acknowledgements
UM was supported by a Cancer Research UK Clinician Scientist Fellowship. FI was supported by the European Bioinformatics Institute and Wellcome Trust Sanger Institute. JB, SP and JY are supported by an ERC Synergy Grant.
Footnotes
6 Co-senior author