Abstract
Mouse models have an essential role in cancer research, yet little is known about how closely various models resemble human cancer at a genomic level. However, the genomic alterations shared between each model and the corresponding human cancer are critical for translating findings in mice to the clinic. We have completed whole genome sequencing and transcriptome profiling of two widely used mouse models of breast cancer, MMTV-Neu and MMTV-PyMT. This genomic information was integrated with phenotypic data and CRISPR/Cas9 studies to understand the impact of key events on tumor biology. Despite the engineered initiating transgenic event in these mouse models, they contain copy number alterations, single nucleotide variants, and translocation events similar to those in human breast cancer. Through integrative in vitro and in vivo studies, we identified copy number alterations in genes encoding key extracellular matrix proteins, including Collagen Type 1 Alpha 1 (Col1a1) and Chondroadherin (Chad), that drive metastasis in these mouse models. Importantly, this amplification is also found in 25% of HER2+ human breast cancers and is associated with increased metastasis. In addition to copy number alterations, we observed a propensity of the tumors to modulate tyrosine kinase mediated signaling through mutation of phosphatases. Specifically, we found that 81% of MMTV-PyMT tumors have a mutation in the EGFR regulatory phosphatase, PTPRH. Mutation in PTPRH led to increased phospho-EGFR levels and decreased latency. Moreover, PTPRH mutations increased response to EGFR kinase inhibitors. Analogous PTPRH mutations are present in lung cancer patients, and together these data suggest that a previously unidentified population of human lung cancer patients may respond to EGFR targeted therapy. These findings underscore the importance of understanding the complete genomic landscape of a mouse model and illustrate its utility in understanding human cancers.
Introductory paragraph
Heterogeneity in human breast cancer is present in genomic events, gene expression, metastatic potential, and treatment response. To assess gene function in tumor biology, studies have used model systems, including genetically engineered mouse models (GEMMs). Recent work characterized the transcriptional landscape of breast cancer GEMMs, with particular attention paid to relationships with human breast cancer1–3. However, whether additional genomic events are required for tumor development and progression in these models remains unknown. Here we present whole genome sequencing data of two highly utilized mouse models of breast cancer, MMTV-Neu4 and MMTV-PyMT5. In PyMT tumors we identified a highly conserved mutation in the protein tyrosine phosphatase receptor type H (Ptprh), resulting in elevated EGFR activity and erlotinib sensitivity. In Neu tumors, a copy number alteration including Collagen Type 1 Alpha 1 (Col1a1) and Chondroadherin (Chad) altered metastatic potential, which was validated through genetic ablation. Together, these data demonstrate that genomic alterations beyond the initiating oncogene need to be considered when choosing a model system for breast cancer.
Main text
To characterize the genomic landscape of the MMTV-Neu and MMTV-PyMT tumors, we created a tumor database with complete phenotypic characterization including tumor latency, histology, and metastatic burden (Table S1). Representative tumors from this database were selected for whole genome sequencing and whole transcriptome profiling by microarray. The analysis pipeline then correlated phenotypic changes with molecular profiling, including transcriptomics and sequence alterations. The resulting genes were then filtered through human breast cancer datasets to ensure relevance to human breast cancer and confirmed with in vitro / in vivo experiments (Figure 1A). A high degree of transcriptomic diversity both between and within each model was observed in hierarchical clustering (Figure 1B). As expected, this heterogeneity correlated with tumor histological subtype rather than tumor model, consistent with recent studies1, 6. It was hypothesized that these differences in expression were driven by genomic changes.
The whole genome sequence data were analyzed using standard bioinformatic pipelines. To validate bioinformatic calls of SNVs and CNVs we used PCR and qPCR, observing a validation rate of 85% (Table S2). Whole genome sequencing revealed large differences in the genomic landscape of the MMTV-Neu (Figure 1C) and MMTV-PyMT (Figure 1D) tumors. The two tumor models had similar numbers of SNVs (Figure 1E, Table S3); however, both models were ~20-fold more stable than human breast tumors, with 0.049 mutations/megabase in the mouse models compared with an average of approximately 1 mutation/megabase in human breast cancer7. Copy number alterations (Figure 1F, Table S4) and translocations (Figure 1G, Table S5) were more frequent in the MMTV-Neu model relative to MMTV-PyMT.
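The ~20-fold figure follows directly from the two mutation rates quoted above. The short Python sketch below reproduces that arithmetic; the function names are illustrative and are not part of the analysis pipeline used in this study.

```python
def mutations_per_megabase(n_mutations: int, callable_bases: int) -> float:
    """Convert a raw mutation count into mutations per megabase."""
    return n_mutations / (callable_bases / 1_000_000)


def fold_difference(human_rate: float, mouse_rate: float) -> float:
    """How many times lower the mouse mutation rate is than the human rate."""
    return human_rate / mouse_rate


MOUSE_RATE = 0.049  # mutations/megabase, as reported in the text
HUMAN_RATE = 1.0    # approximate average for human breast cancer, as reported

if __name__ == "__main__":
    fold = fold_difference(HUMAN_RATE, MOUSE_RATE)
    print(f"Mouse tumors carry roughly {fold:.1f}-fold fewer mutations per megabase")
```

The same helper can be applied to any per-sample SNV count once the number of callable bases is known.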
To understand the specific role of copy number alterations within the two models we compared copy number variants present in the mouse models with those also present in human breast cancer. This analysis identified 11 candidate genes which were highly altered in breast cancer (Figure S1) and predicted to impact tumor biology based upon a literature screen. qPCR gene copy number analysis across an extended tumor panel (15 MMTV-PyMT, 10 MMTV-Neu) identified the rate at which each copy number variant occurred throughout the model (Figure 2A). This analysis showed that while each of the copy number variants predicted through bioinformatic means was valid (Table S2), the depth of the amplification was largely around 1.5 fold, indicating shallow amplification events (Figure 2B). Interestingly, we identified the largest diversity of copy number profiles in the 11D locus. This locus includes a total of 40 genes, 19 with transcriptomic differences. Depending on the presence or absence of the amplification at this locus, the tumors exhibited striking differences in structure and behavior. Tumors with an 11D amplification differed dramatically in collagen content, as shown by Masson’s trichrome stain (Figure 2C), and in the presence of metastatic lesions in the lungs (Figure 2D).
To identify the driving genes of the metastatic phenotype associated with 11D amplification, we examined human breast cancer for distant metastasis free survival outcomes and then created CRISPR-Cas9 generated knockouts of two potential metastasis related proteins within the region, Collagen type 1 alpha 1 (Col1a1) and Chondroadherin (Chad). Knockouts were generated in two mouse tumor cell lines, NDL2-5 (ref. 8) and PyMT 419 (ref. 9) (Figure S2). NDL2-5 is an 11D amplified, Neu driven line, while the 419 line is diploid for the 11D locus and is driven by PyMT expression. Knockouts of each gene in both cell lines revealed defects in the ability to migrate in a wound healing assay (Figure 2E and Figure 2F). Migration was partially rescued with addback of wildtype Col1a1 or Chad, demonstrating that migration defects were not due to off target effects (Figure S3). Defects in lung colonization following tail vein injection were also observed with the Col1a1 and Chad knockout cell lines (Figure 2G and Figure 2H).
Mouse chromosome 11D is conserved in humans and is analogous to chromosomal region 17q21.33. A similar amplification event at 17q21.33, including COL1A1 and CHAD, occurs in 8% of breast cancer patients. Array CGH from the TCGA data10 demonstrates that COL1A1/CHAD amplification is distinct from HER2 amplification (Figure 3A). Importantly, this amplification is subtype specific; 25% of HER2+ breast cancers have a co-amplification of the 17q21.33 region along with the HER2 amplicon, while only 6% of Luminal A, 7% of Luminal B, and 1.2% of Basal breast cancers have the amplification (Figure 3B). To investigate the transcriptional impact of the amplification event we used weighted gene correlation network analysis11. This identified a robust transcriptional signature that differentiated COL1A1/CHAD-amplified, HER2-positive tumors from HER2-positive tumors without the amplification event (Table S6). Unsupervised hierarchical clustering readily separated the two HER2-positive subtypes based on this signature (Figure 3C). These correlated genes were used as a predictive signature to relate patient outcome to predicted amplification status (Figure S4), revealing that metastasis was associated with the amplification event (Figure 3D).
To test whether COL1A1 and CHAD were driving the metastasis phenotype in human breast cancer, we used CRISPRi12 to knock down COL1A1 and CHAD in the HER2 amplified, COL1A1/CHAD amplified breast cancer line BT-474. These knockdowns showed a decreased ability to migrate in a wound healing assay (Figure 3E and 3F). Importantly, the knockdown lines were also unable to metastasize to the lung after being injected into the mammary fat pad (Figure 3G and 3H). Together these data underscore the importance of identifying copy number variation in mouse models of cancer.
In addition to copy number alterations, the whole genome sequence data identified numerous mutations (Figure 1C-E). When COSMIC mutational signatures13, 14 were applied to the models, it was observed that the tumor models had similar mutational processes (Figure S5). The MMTV-Neu and MMTV-PyMT tumors share the same trinucleotide context in their mutation spectra. The mutation spectrum shows all nucleotide substitutions present, with a slight bias towards C/T and T/C transitions. When compared to the human mutational signatures, the mutational processes present in both mouse models closely resemble COSMIC signature 5 (Fig S5C). This signature has been shown to be present in breast cancer patients with disease associated with late onset15, indicating a similar mutational process in both the human disease and the mouse models. Distribution of SNVs reflected patterns seen in the transcriptional data (Figure 1B), with some events shared between Neu and PyMT tumors while others were unique to each model. Considerable SNV diversity within a model was also prevalent. For instance, the MMTV-Neu model had no genes with shared mutations in all samples and only five genes containing a coding, non-synonymous mutation in more than one sample (Figure S6). Notably, we identified mutations within Mucin 4 (Muc4), which are potentially impactful due to Muc4’s emerging roles in HER2 positive cancer and metastasis16. Interestingly, we observed that PyMT induced tumors had more SNVs in the coding regions of the genome. Specifically, these mapped to 34 genes, 9 of which overlapped with Neu tumors. A number of genes with coding mutations specifically in PyMT tumors, including Matn2, Plekhm1, Muc6 and Ptprh, were observed. Matn217, Plekhm1 and Muc619 have all been demonstrated to have roles in tumor progression and metastasis and may contribute to the high metastatic capacity of the MMTV-PyMT model.
To test the frequency of these coding mutations in the models as a whole we selected a population of 10 MMTV-Neu tumors and 15 MMTV-PyMT tumors for targeted resequencing. From these tumors we extracted genomic DNA and performed PCR based amplification followed by Sanger sequencing of Matn2, Plekhm1 and Ptprh. While sequencing of Matn2 and Plekhm1 confirmed the whole genome sequencing variant calls in the sequenced tumors, additional mutations were not found. Strikingly, Ptprh was found to be mutated in 81% of MMTV-PyMT tumors. Furthermore, the Ptprh mutation was homozygous in 21% of PyMT tumors and heterozygous in 60% of PyMT tumors (Figure 4A and 4B). Surprisingly, an identical C to T mutation was observed in each tumor, resulting in a valine residue being converted to a methionine at amino acid 483 (V483M). To test for the conservation of mutations of Ptprh in mouse strains beyond FVB/NJ, we sequenced Ptprh of MMTV-PyMT models in C57/Bl6, C57/Bl10, CAST, and MOLF backgrounds as well as in a different inbred MMTV-PyMT FVB/NJ line. This analysis showed consistent mutation in the structural fibronectin domains (FN3) and the phosphatase domain of Ptprh (Figure 4C). Interestingly, we found that the two FVB models contained different mutational patterns, indicating that environmental and potentially epigenetic factors influence mutational hotspots.
Given that recent work identified the target of PTPRH as EGFR20, we hypothesized that EGFR would not be dephosphorylated when Ptprh was mutated. Testing this, we observed that the V483M mutation correlated with increased pEGFR levels (Figure 4D and Figure 4E). With the resulting increase in EGFR activity, we also observed a significant decrease in tumor latency (Figure 4F). With an increase in EGFR activity, it was possible that tumors with mutant Ptprh would be dependent upon EGFR signaling. To test this prediction, cell lines derived from Ptprh wildtype and mutant tumors were treated with EGFR targeted therapy. After 48 hours of treatment, cell lines derived from tumors containing Ptprh mutations were shown to be more sensitive to erlotinib (Figure 4G).
Given the role of EGFR in lung cancer, we next sought to determine if there was a non-EGFR mutant patient population within lung cancer that could benefit from EGFR inhibition. Examination of the pan-lung TCGA data revealed that 5% of patients had a mutation in PTPRH. Importantly, these mutations were shown to be mutually exclusive with EGFR mutations, indicating that these patients were likely not treated with EGFR tyrosine kinase inhibitors. To confirm the impact of PTPRH mutations on EGFR activity in human lung tumors we used gene set enrichment analysis to predict EGFR activity of each mutant PTPRH sample. This analysis revealed four key hotspots of mutations driving high EGFR activity, including three in the FN3 domains and one in the phosphatase domain of PTPRH (Figure 4I).
Together these data emphasize the heterogeneity within tumor models and the importance of understanding the genomic landscape within the tumors. Here we presented a proof of concept study identifying a number of events that influence key tumor phenotypes, including metastasis and tumor latency. Despite the limited number of samples, this study offers a unique opportunity to identify novel genomic alterations which impact tumor behavior and treatment response. These findings have potential implications for therapeutic intervention and for the metastatic progression of disease in patients, and they underscore the important role genetically engineered mouse models have in understanding tumor biology.
The presence of the Col1a1/Chad amplification event in the mouse model mirrors the 25% of human HER2+ breast cancers that also have amplification of a structurally conserved region. Given the potential role for these genes in metastasis, this work indicates that the MMTV-Neu system is an appropriate model for select facets of HER2 tumor biology, including the additional amplification event. However, the lack of amplification of genes surrounding erbB2 in the mouse model indicates that other models with erbB2 amplification21 may be more suitable for other studies.
In the PyMT model system, tumor onset is rapid, with mice developing tumors 45 days after birth. Despite this rapid onset, the data presented here indicate that over 80% of PyMT tumors acquire the identical mutation in Ptprh, suggesting that there is significant evolutionary pressure applied during the initial transformation. Given that the Ptprh mutant does not dephosphorylate EGFR, this results in unchecked activation of a key signaling pathway. In part, this event may impact metastatic progression of this model system. However, the highest impact from this discovery will likely be in the identification of additional cancer patients who may benefit from EGFR TKI therapy.
Taken together, this manuscript provides a resource for investigators to determine how well the subtype of cancer they examine is represented by these model systems. While we have explored two genomic events, others of interest are noted and their impact remains to be elucidated. The data from these two models also underscore the importance of a complete characterization of GEMMs for human cancer.
Methods
Animal Studies
All animal husbandry and use was conducted according to local, national and institutional guidelines. The MMTV-Neu4 and MMTV-PyMT5 mice were in the FVB background. MMTV-PyMT634 and MMTV-Neu mice were obtained from The Jackson Laboratory. Mice were monitored twice weekly for tumor initiation and growth. At a 2000 mm³ endpoint, mice were necropsied. For mice with multiple tumors the endpoint was established when the primary tumor reached 2000 mm³. Tumors and lungs were collected for genomic analysis and for hematoxylin and eosin staining to determine histological subtype and the presence of pulmonary metastases. The number of metastases was quantified by counting the micro-metastases in a single section through the lung. Masson’s trichrome staining was used to examine tumors for collagen deposition using standard methods.
Whole genome sequencing
Flash frozen tumor pieces were ground and DNA was extracted with the Qiagen Genomic-tip 20/G following the manufacturer’s protocol. DNA was sequenced to a depth of 40X with paired end 150 base pair reads on an Illumina HiSeq 2500 using the Illumina TruSeq Nano DNA library preparation.
Transcriptomic profiling
Transcriptome data for this study were previously published1, 22, 23. Data were downloaded from GSE42533 (MMTV-Neu) and GSE104397 (MMTV-PyMT) as .CEL files. Affymetrix Expression Console was used to normalize each individual dataset using RMA normalization. To remove batch effects between datasets, BFRM normalization24 was performed with standard parameters.
Clustering
Unsupervised hierarchical clustering was performed using Cluster 3.0 and heatmaps were created using the MATLAB imagesc function.
Variant calling
Generated .fastq files were assessed for quality control using FASTQC analysis (http://www.bioinformatics.babraham.ac.uk/projects/fastqc). Reads were trimmed for quality using Trimmomatic25. After trimming, data were reassessed for quality using FASTQC. Reads were then aligned to the mm10 mouse reference genome using BWA-mem26. After alignment, base recalibration was performed and PCR induced biases were removed using PICARD tools (http://broadinstitute.github.io/picard). For variant calling we utilized four software packages: GATK27, Mutect2 (ref. 28), Strelka29, and SomaticSniper30. Only variants called by 3 of the 4 packages were retained as legitimate variants. To control for differences between the FVB strain and the mm10 reference genome we used previously published normal FVB tissue (ERR046395)31. To call copy number and structural variants we used Delly32. For copy number we used default quality control settings and only analyzed those copy number events which had precise boundaries and were larger than 100 kb. For translocations we used default quality control settings and required precise breakpoints.
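The filtering logic described above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the code used in this study: variants are represented as (chromosome, position, reference allele, alternate allele) tuples, "3 of the 4 packages" is interpreted as at least three callers, and all function names and dictionary keys are hypothetical.

```python
from collections import Counter
from typing import Dict, Iterable, List, Set, Tuple

Variant = Tuple[str, int, str, str]  # (chrom, pos, ref, alt); assumed representation


def consensus_variants(calls_by_caller: Dict[str, Set[Variant]],
                       min_callers: int = 3) -> Set[Variant]:
    """Keep variants reported by at least `min_callers` callers (assumed: >= 3 of 4)."""
    counts = Counter(v for calls in calls_by_caller.values() for v in calls)
    return {v for v, n in counts.items() if n >= min_callers}


def remove_strain_variants(variants: Set[Variant],
                           strain_variants: Set[Variant]) -> Set[Variant]:
    """Drop variants also present in the normal FVB sample (strain vs. mm10 differences)."""
    return variants - strain_variants


def filter_cnvs(cnvs: Iterable[dict], min_size: int = 100_000) -> List[dict]:
    """Keep copy number events with precise boundaries that are larger than `min_size` bp.

    Each event is a dict with hypothetical keys 'start', 'end', and 'precise'.
    """
    return [c for c in cnvs if c["precise"] and (c["end"] - c["start"]) > min_size]


if __name__ == "__main__":
    # Toy input; coordinates are invented for illustration only.
    calls = {
        "GATK": {("chr7", 100, "C", "T"), ("chr1", 5, "A", "G")},
        "Mutect2": {("chr7", 100, "C", "T"), ("chr1", 5, "A", "G")},
        "Strelka": {("chr7", 100, "C", "T")},
        "SomaticSniper": {("chr7", 100, "C", "T")},
    }
    print(consensus_variants(calls))
```

Representing each caller's output as a set makes the vote count per variant equal to the number of callers that reported it, so the threshold can be changed in one place.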
Variant verification and extended tumor panel sequencing
For verification of SNVs we used PCR based amplification followed by Sanger sequencing. For validation of CNVs we used qPCR on the genomic DNA with the Quantabio PerfeCTa SYBR green kit according to the manufacturer’s specifications. Primers for PCR and sequencing are listed in Table S7.
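As an illustration of how a qPCR measurement translates into a copy number fold change, the sketch below uses the common comparative Ct (2^-ΔΔCt) calculation. The manuscript does not state which quantification method was used, so this choice, the assumption of roughly 100% primer efficiency, and all names and Ct values are assumptions made for the example.

```python
def relative_copy_number(ct_target_tumor: float, ct_ref_tumor: float,
                         ct_target_normal: float, ct_ref_normal: float) -> float:
    """Fold change in copy number of a target locus, tumor vs. normal.

    Assumes the comparative Ct method with ~100% primer efficiency:
    fold = 2 ** -((Ct_target - Ct_ref)_tumor - (Ct_target - Ct_ref)_normal)
    """
    delta_tumor = ct_target_tumor - ct_ref_tumor
    delta_normal = ct_target_normal - ct_ref_normal
    return 2 ** -(delta_tumor - delta_normal)


if __name__ == "__main__":
    # Hypothetical Ct values chosen to give a shallow gain of about 1.5 fold.
    fold = relative_copy_number(24.4, 25.0, 25.0, 25.0)
    print(f"Relative copy number: {fold:.2f} fold")
```

Under these assumptions, a target that crosses threshold 0.6 cycles earlier in tumor than in normal DNA corresponds to roughly a 1.5-fold gain, the depth reported for most events in this study.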
Circos visualization
Representative MMTV-Neu and MMTV-PyMT samples were chosen to be displayed as CIRCOS33 plots. CIRCOS plots were generated using CIRCOS v 0.69 and SNVs, CNVs, and translocations were mapped according to their location on the mm10 genome.
Mutation signatures
Due to the low mutational burden of MMTV-Neu and MMTV-PyMT tumors, mutations were combined into a single analysis for each model. These samples were processed with MutSpec-NMF34 for trinucleotide context and comparison to the known human mutation signatures.
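To make the pooling step concrete, the sketch below shows one way mutations from several tumors of a model could be combined into a single trinucleotide-context spectrum, using the conventional pyrimidine-centered notation (for example A[C>T]G). It is not the MutSpec-NMF implementation; the input format and function names are hypothetical.

```python
from collections import Counter
from typing import Dict, List, Tuple

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def context_class(trinucleotide: str, alt: str) -> str:
    """Label a substitution by its pyrimidine-centered trinucleotide context.

    `trinucleotide` is the reference base with one flanking base on each side;
    purine-centered events are reported on the opposite strand.
    """
    ref = trinucleotide[1]
    if ref in "AG":
        trinucleotide = reverse_complement(trinucleotide)
        alt = reverse_complement(alt)
        ref = trinucleotide[1]
    return f"{trinucleotide[0]}[{ref}>{alt}]{trinucleotide[2]}"


def pooled_spectrum(samples: Dict[str, List[Tuple[str, str]]]) -> Counter:
    """Pool (trinucleotide, alt) mutations from all samples of one model into one count table."""
    counts: Counter = Counter()
    for mutations in samples.values():
        for trinucleotide, alt in mutations:
            counts[context_class(trinucleotide, alt)] += 1
    return counts


if __name__ == "__main__":
    # Toy input with invented mutations from two hypothetical tumors.
    toy = {"tumor_1": [("ACG", "T")], "tumor_2": [("CGT", "A"), ("TTA", "C")]}
    print(pooled_spectrum(toy))
```

Pooling raises the count in each of the 96 context classes, which is what allows a comparison with the human signatures when individual tumors carry too few mutations.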
Cell lines
The PyMT 419 cell line was a gracious gift from Dr. Stuart Sell and Dr. Ian Guess9. The NDL2-5 cell line was obtained as a gift from Dr. Peter Siegel8. The BT-474 cell line was obtained from Dr. Kathy Gallo and validated using fingerprinting analysis performed at Michigan State University.
CRISPR generated knockouts of PyMT 419 and NDL2-5
CRISPR/Cas9 constructs were created to knock out Col1a1 and Chad in PyMT 419 and NDL2-5. Guides were designed and inserted into Px458, obtained from Addgene (Addgene #48138) as a gift from Feng Zhang, as previously described35. Cells were sorted by FACS into single cells and grown into clonal populations, then screened for the presence of INDELs using Sanger sequencing. Knockouts were further confirmed for the NDL2-5 lines using western blot. Guide sequences are listed in Table S7.
CRISPRi generated knockdowns in BT-474
Knockdowns of COL1A1 and CHAD were created in the BT-474 line using CRISPRi technology. gRNAs were cloned into a plasmid containing the gRNA under the control of the U6 promoter (Addgene plasmid #60955)36. Lentivirus was created for stable expression of this plasmid and for stable expression of the KRAB-Cas9 fusion protein (Addgene plasmid #60954)36. Cells were first infected with the KRAB-Cas9 expression virus and selected for uptake by puromycin treatment. The stable KRAB-Cas9 BT-474 line was then infected with the virus for stable expression of the gRNA for CHAD or COL1A1. These cells were then sorted using flow cytometry for RFP expression into a pooled population, and knockdown was validated by western blot. The plasmids used in this part of the project were obtained through Addgene as a gift from Jonathan Weissman.
Wound healing assay
Wound healing assays were performed similarly for all cell lines in the manuscript. Cells were grown to 100% confluence in a six well plate then a wound was created in the middle of the plate. Cells were allowed to close the wound for 24 hours in the presence of Mitomycin C growth inhibitor then the cells were imaged. Images were quantified for the amount of migration into the wound using ImageJ.
Tail vein injection
NDL2-5 Chad and Col1a1 knockout cell lines were injected into the tail vein of syngeneic FVB/NJ mice. Cells were suspended in PBS as a single cell suspension and injected in a single bolus of 500×10⁵ cells in 50 µL. Mice were monitored for 9 weeks and then euthanized. At this point, lungs were collected and stained with hematoxylin and eosin to identify the presence of pulmonary metastases.
Mammary fat pad injection
NDL2-5 WT and knockout cell lines were suspended in PBS and injected into mammary gland number four in syngeneic FVB/NJ mice as a single bolus of 1×10⁶ cells. The mice were monitored twice weekly until tumors reached an endpoint of 2000 mm³.
BT-474 wildtype and CHAD/COL1A1 knockdown lines were suspended in a 1:1 Matrigel:PBS mixture and injected into mammary gland number four in a single bolus of 1×10⁶ cells. Balb/C nude mice were used for these studies. Tumors were monitored until they reached a size of 1000 mm³. Tumors were then resected and mice were monitored for an additional four weeks. At necropsy, lungs were imaged for RFP using the IVIS imaging system and then processed for hematoxylin and eosin staining.
Human dataset usage
All human datasets used in this study are publicly available and are noted where used in the manuscript. For genomic alteration frequency the TCGA Breast cancer10 and the TCGA pan-Lung cancer37 datasets were used. For the expression based survival data the KMPlot.com dataset38 was used.
Western blotting
Western blots in this manuscript were completed according to the manufacturer’s specifications. Blocking was performed for 1 hour by incubation at room temperature with the LiCor blocking reagents. Western blots were imaged using the LiCor system. The following antibodies were used: COL1A1 (Origene TA309096), CHAD (Abcam ab104757), EGFR (CST D38B1), pEGFR (Invitrogen PA5-37553), HSP90 (CST 4874S), Beta-tubulin (CST 2128S), anti-rabbit secondary (Licor 926-32211), anti-mouse secondary (Licor 926-68070).
Erlotinib sensitivity assay
Cell lines derived from Ptprh mutant and wildtype tumors were seeded at a concentration of 250 cells/mL and subjected to erlotinib treatment for 48 hours at the concentrations stated in the manuscript. Erlotinib was purchased from Cayman Chemical. After treatment with erlotinib or DMSO control, cells were given fresh media and allowed to grow for 7 days. Cells were then fixed and stained with crystal violet for counting.
Data Availability
The datasets generated and/or analysed during the current study are available in the GEO and SRA repositories.
Author contributions
JR and EA collaborated on the study conception, design, and interpretation of results. MSU provided annotation for translocations and CNA analysis. YZ assisted with copy number validation. CL, EB, MJ, and WH provided assistance with in vitro experiments. CR, KS, and KH collected samples and performed WES. KH assisted in the writing of the manuscript and WES study design. JR performed all other experiments and drafted the manuscript. All authors have critically read, edited, and approved the final version of the manuscript.
Competing interests
The authors declare no competing interests.
Acknowledgements
We thank the members of the Andrechek laboratory for helpful discussions. We thank the Michigan State Investigative HistoPathology Laboratory for the assistance with staining. This work was supported in part by Michigan State University through computational resources provided by the Institute for Cyber-Enabled Research.
Footnotes
This work was supported by NIH R01CA160514 and Worldwide Cancer Research WCR - 14-1153 to E.R.A., as well as NIH 1F99CA212221–01 to J.P.R.