ABSTRACT
Endocrine therapy resistance is a hallmark of advanced estrogen receptor (ER) positive breast cancer. In this study, we performed DNA/RNA hybrid-capture sequencing on 12 locoregional recurrences after long-term estrogen-deprivation along with each tumor’s matched primary. Despite being up to 7 years removed from the primary lesion, most recurrences harbored similar intrinsic transcriptional and copy number profiles. Only two genes, AKAP9 and KMT2C, were found to have single nucleotide variant (SNV) enrichments in more than one recurrence. Enriched mutations in single cases included SNVs within transcriptional regulators such as ARID1A, TP53, FOXO1, BRD1, NCOA1 and NCOR2 with one local recurrence gaining three PIK3CA mutations. In contrast to DNA-level changes, we discovered recurrent outlier mRNA expression alterations were common—including outlier gains in TP63 (n=5 cases [42%]), NTRK3 [n=5 [42%]), NTRK2 (n=4 [33%]), PAX3 (n=4 [33%]), FGFR4 (n=3 [25%]) and TERT (n=3 [25%]). Recurrent losses involved ESR1 (n=5 [42%]), RELN (n=5 [42%]), SFRP4 (n=4 [33%]) and FOSB (n=4 [33%]). ESR1-depleted recurrences harbored shared transcriptional remodeling events including upregulation of PROM1 and other basal cancer markers. Taken together, this study defines acquired genomic changes in long-term, estrogen-deprived disease, highlights longitudinal RNA-profiling and identifies a common endocrine-resistant breast cancer subtype with basal-like transcriptional reprogramming.
INTRODUCTION
Hormone receptor positive breast cancer has served as a prototype for targeted therapy due to the well-established efficacy of estrogen deprivation. Largely because of these approaches, breast cancers are somewhat unique in that recurrences can occur years, sometimes decades following the primary diagnosis1–4. Given that the majority of patients receive long-term maintenance regimens of either a selective estrogen receptor modulator (SERM) or aromatase inhibitor (AI), recurrent breast cancers are often classified as estrogen-independent given their ability to thrive in an estrogen-deprived environment. Identifying the biological mediators that allow breast cancer cells to bypass their dependence on estrogen is a crucial step in understanding advanced breast cancer biology and defining novel therapeutic targets.
Defining these molecular processes in patient samples, however, has been challenging because of the logistics in obtaining well-characterized, longitudinally collected biospecimens. Nevertheless, shared features of more advanced breast cancers have emerged, such as relapsed tumors losing expression of ER and over 20% of metastatic ER-positive breast cancers acquiring mutations in ESR1 that confer ligand-independent signaling5–7. Other largely accepted mechanisms of estrogen-independence are bypass activations of mitogenic pathways such as MAPK and PI3K through initiating FGFR, EGFR and IGF signaling and exploitation of the Rb-CDK-E2F axis8–12. Less well validated, more recently discovered mechanisms include ESR1 fusions and amplifications13,14.
Recent studies analyzing multiple, longitudinally collected, pre and post-treatment samples have shown clonal evolution and selection in the context of targeted therapies15–18. Similar work analyzing hormone-receptor positive breast cancers have mainly been restricted to short-term pre/post neoadjuvant therapy analyses19–22. One of the most comprehensive genomic studies of this type was a multi-platform effort that characterized the clonal architecture of tumors after four months of AI therapy23. Although drastic clonal remodeling was observed at the DNA-level, few recurrent resistance mechanisms were appreciated. A more recent, large-scale study showed activating ERBB2 mutations, MAPK activation and NF1 loss as mechanisms possibly driving endocrine resistance—with some of these alterations being confirmed in subsequent studies24–27. The majority of this work has notably been performed on metastatic tissues—whether or not some of these changes occur locally as a result of estrogen-independence before distant spread is unknown.
Thus, to better define both DNA and transcriptional changes that occur in long-term estrogen-independent tumors, we undertook a targeted analysis of DNA/RNA alterations in ∼1,400 cancer genes in 12 paired primary and locoregional recurrences from patients with ER-positive breast cancers that were documented as being treated with estrogen-depleting therapy. The median time to recurrence was 3.7 years, with the longest time to recurrence being over 7 years.
RESULTS
Expression and copy number changes in local recurrences
Dual hybrid-capture DNA/RNA sequencing was performed for 1,400 cancer genes on 12 paired primary and local recurrences from patients with ER-positive breast cancers that underwent continuous endocrine therapy (Table 1). RNA-seq data (Supplementary Table 1) underwent unsupervised hierarchical clustering of normalized RNA expression values which showed most patient matched pairs clustered transcriptionally with their matched primary—regardless of the length of disease-free survival (Figure 1A). Unlike a previous transcriptome-wide analysis of primary breast cancers and matched bone metastases28, there was no significant correlation in pair transcriptional similarity and time to recurrence—although a trend towards negative correlation was observed (pearson R = −0.37, p-value = 0.236). Only a single recurrence showed marked transcriptional deviation from its matched primary (ERLR_03_R1); whereby it lost ER-positivity and gained HER2-positivity clinically. Copy number alterations (CNAs) between primary and recurrences were analyzed in the targeted capture regions for 10 cases (Supplementary Figure 1 and Supplementary Figure 2). Similar to expression, CNAs were largely consistent among the recurrences when compared to their matched primary (Figure 1B). Two exceptions were recurrences from cases ERLR_01 and ERLR_03, which showed distinct copy number profiles from the matched primary tumors with poor correlation between primary and recurrence CNA values versus all other cases (Supplementary Figure 3). Notably, unlike case ERLR_03, ERLR_01 interestingly retained a similar expression profile despite a distinct CNA profile. An analysis of shared variants validated both DNA and RNA extracts originated from the same patient (Supplementary Figure 4), excluding the possibility of sample mixup. ERBB2 copy number values correlated well with RNA expression (pearson R 0.92), as did other amplified genes including CDK12 and CCND1 (Figure 1C, Data Supplement S3).
SNV enrichments and differentially expressed genes
A total of 406 distinct, presumed-somatic nonsynonymous mutations were detected in either a primary or recurrence at an AF > 5% among the 10 DNA-sequenced cases (Data Supplement S4). To assess if there are shared DNA mutations acquired in recurrences, an analysis of enriched single nucleotide variants (SNVs) was performed which showed 56 statistically enriched SNVs in local recurrences versus matched primary tumors (Figure 2A, Data Supplement S5). SNVs in two genes were found to be enriched in more than one case (n = 2 [20%]), AKAP9 (R3320W, S319*) and KMT2C (T1969I, Y366N, R894Q). The recurrent mutations did not exhibit features suggesting functional selection, such as being within a conserved functional domain or within a COSMIC29 hotspot region, making it difficult to assess if these are pathogenic. Other case-specific, n-of-one enriched mutations included nonsense mutations in ARID1A (Q1424*, Case ERLR_20, Primary AF 0.5%, Recurrence AF 16.5%) and BRD1 (Q467*, Case ERLR_01, Primary AF 0.93, Recurrence AF 57.88%), an acquired TP53 mutation (S241C, Case ERLR_03, Primary AF 0.0%, Recurrence AF 53.4%) and an enriched NCOR2 mutation (A4942C Case ERLR_08, Primary AF 4.4%, Recurrence AF 19.4%). In case ERLR_01, an enrichment of a suite of three somatic mutations in PIK3CA was observed (E542K, Q546K, E726K) in the recurrence (Figure 2B). Notably, the number of enriched, non-silent SNVs ranged from 0 to 13 and was positively correlated with clinical time to recurrence (Figure 2C). No acquired ESR1 mutations were observed. These mutations were examined in the corresponding RNA-seq data to determine if they are expressed. Out of 633 total mutations—considering some of the 406 distinct mutations were present in both matched tumors—315 were detected in RNA with at least 2 supporting reads of the altered allele and an AF >=5%. Allele frequencies called from DNA-seq and RNA-seq data correlated well (Supplementary Figure 5, pearson R = 0.609, p-value=2.57e-33). Noteworthy, out of the 56 enriched SNVs in recurrence at DNA level, 31 distinct mutations can be detected with confidence at the RNA level (Data Supplement S6)—including AKAP9, KMT2C, ARID1A, BRD1 and TP53 as discussed above.
A differential expression analysis, comparing all primary tumors versus all local recurrences, yielded no genes passing an FDR corrected p-value of less than 0.05—which is perhaps expected given heterogeneity of clinical specimens (Data Supplement S7). Nonetheless, 71 genes with an average, voom normalized expression value of 2 or greater, a nominal p-value of less than 0.05 and a log2 fold-change greater than +/- 0.5 were identified (Table 2). Some of these genes, including the upregulation of EPOR, NDRG1, IDH2, CEBPA and PTPA and downregulation of ESR1, IGF1R, NFKB1 and RUNX2, are also differentially expressed in long-term estrogen deprived ER-positive cell lines (Supplementary Figure 6)30.
Outlier expression gains and losses
To further explore major expression changes that may be driving recurrence and therapy resistance, an outlier expression analysis was performed using gene-level fold-change values of each patient-matched case (Data Supplement S8). Unlike non-silent SNVs, recurrent transcriptional gains and losses were common (Figure 3A). These included gains and losses in shared pathway members, notably NTRKs and SFRPs respectively, targetable upregulation of growth factor pathway mediators such as FGFR4 and EGF and outlier gains in the CDK regulator CCNE1. 3 of 12 cases also shared outlier expression gains in TERT, with case ERLR_14 harboring a particularly extreme enrichment from near undetectable levels in the primary tumor (Figure 3B). Case ERLR_03’s recurrence, which was most dissimilar to its patient-matched pair transcriptionally, showed extreme loss and gain of ESR1 and ERBB2 respectively. CNA analysis confirmed recurrence-specific ERBB2 amplification and is consistent with previous studies of endocrine therapy-treated breast cancers selecting for HER2-signaling in more advanced tumors. The most recurrent outlier loss involved ESR1.
ESR1 depleted recurrences
Five cases showed outlier expression losses of ESR1 (Figure 4A). Despite estrogen receptor being the driver of ER-positive breast cancer and a major regulator of transcription; counterintuitively, 4 of 5 of the recurrences which lost ESR1 expression generally retained the expression profile of their patient-matched primary (Figure 1A). Importantly, many of these cases also harbored very similar CNA profiles (Figure 1B), implying the recurrences were derived from a continuous clonal lineage as opposed to being completely distinct breast cancers. Thus, to explore the transcriptional consequences of acquired ESR1 loss in ER-positive disease and identify potential bypass mechanisms driving ER- independence, a differential expression analysis was performed on the subset of pairs with outlier ESR1 expression losses. This analysis revealed several recurrently dysregulated genes in ESR1 depleted recurrences (Data Supplement S9). Two standout genes, KLK7 and PROM1, showed the highest degree of fold change with a log2 fold-change increase of 5.4 and 3.9 respectively—with some tumors exhibiting changes from near undetectable levels to high expression (Figure 4C). These two genes are more commonly expressed in basal cancers, with PROM1 being a cancer lineage stem cell marker (Supplementary Figure 7)31. Other genes with significant log2 fold-changes > 1 included drug targets such as FGFR4, KIT, IGF1R and BCL-2 (Table 2). NDRG1, a particularly compelling candidate since it also showed upregulation in LTED breast cancer models, was further interrogated using METABRIC data. Like PROM1 and KLK7, NDRG1 is most highly expressed in basal breast cancers; yet, when expressed in ER-positive primary tumors, NDGR1 confers significantly worse disease-specific survival outcomes (Supplementary Figure 8).
DISCUSSION
In this study, a targeted RNA/DNA analysis of approximately 1,400 cancer genes in ER-positive primary breast cancers and matched long-term, endocrine therapy treated local recurrences was performed. We found general conservation of transcriptional and copy number profiles among the majority of samples—suggesting that even after 7 years of dormancy and the selective pressures of therapies, locally recurrent breast cancers generally retain their intrinsic molecular features. An analysis of recurrence-enriched SNVs revealed limited recurrent mutation events, yet notable “n-of-one” mutation selection was observed—such as case ERLR_01 which showed three distinct, recurrence-enriched PIK3CA mutations. The most striking changes in long-term estrogen-deprived tumors; however, were highly recurrent (up to 42%), outlier expression changes. An analysis of tumors with the most recurrent outlier loss, ESR1, revealed concurrent upregulation of genes typically expressed in basal breast cancers, such as PROM1, KLK7 and NDGR1, suggesting selection of a more basal-like phenotype in endocrine-resistant disease. Our data showing similar CNA profiles argue against the outgrowth of a distinct ER-negative subclone but instead suggest possible epigenetic, transcriptionally-driven remodeling under antiestrogen pressures.
Nearly all recurrences are more similar transcriptionally to their matched primaries than to other, long-term estrogen deprived tumors—reinforcing the notion that advanced cancers generally retain their core transcriptional programming, even after nearly a decade of dormancy26-29. Furthermore, amplifications and deletions of recurrences are markedly similar to primaries, supporting recent evidence from breast cancer single-cell sequencing that structural variation is likely an early event and many CNAs, even in metachronous therapy-resistant tumors, may be shared by the majority of subclones32. An important exception to this conservation was ERLR_03_R1, a recurrence with a completely unique transcriptional and copy number profile than its matched primary. Evidence has emerged of so-called ‘collision tumors’, whereby two synchronous, distinct cancers can merge anatomically and only under the selective pressures of therapy or through deep sequencing, their individuality can be unmasked23,33. Indeed, this “recurrence” switched to ER-negative/HER2-positive from ER-positive/HER2-negative clinically, and thus could represent a different cancer than the primary— although the level of shared SNVs suggests some degree of clonal relatedness.
Limited shared, non-silent SNVs were discovered in these specimens, with AKAP9 and KMT2C being the only two genes that harbored recurrence-enriched mutations in greater than one case. These mutations are not in a conserved functional domain nor in a hotspot location, making it difficult to assess their pathogenic roles. AKAP9 and KMT2C also encode relatively large gene products (3911 and 4911 amino acids, respectively) which may increase the likelihood of obtaining a passenger mutation by chance. Nevertheless, KMT2C and other lysine methyltransferases have been implicated in breast cancer pathology, argued as potential drivers in large-scale sequencing studies of primary tumors and KMT2C mutations specifically may confer hormone therapy resistance in breast cancer models34–36. Case ERLR_20 harbored an enriched nonsense mutation in ARID1A. ARID1A alterations are associated with more unfavorable tumor features in breast cancer, and has recently been shown to determine luminal identify and therapy response in ER-positive tumors—consistent with the more basal-like transcriptional features we observe with ESR1-depleted recurrences37–40. A single recurrent cancer (ERLR_01_R1) showed enrichment of three somatic hotspot PIK3CA mutations (E542K, Q546K, E726K), suggesting strong MAPK signaling selection within that particular tumor and coincident with recent reports of multiple mutations occurring in individual cancer genes in advanced cancers41. SNVs within genes that act as corepressors and coactivators, some with direct influences on estrogen receptor mediated transcription, were found to be enriched in recurrences—such as NCOA1, NCOR2, FRYL and CREBBP—along with transcription factors including PAX5, FOXO1 and TP53. Notably, we did not observe any ESR1 mutations unlike other studies on locoregional recurrences42—likely due to our small sample size. Interestingly, this study reported lower frequency of ESR1 mutations in locoregional recurrences versus advanced metastases at an AF > 1% and recent data has emerged regarding a pro-metastatic phenotype of ESR1 variants43—suggesting locoregional recurrences may have a lower frequency of ESR1 variants versus distant disease. We also observed a positive correlation between the frequency of acquired, non-silent SNVs and disease-free survival—validating the concept that surviving cancer cells after initial therapy acquire potentially pathogenic mutations as they lay dormant and undetectable over time.
Given the heterogeneity of clinical specimens makes it difficult to rely on typically used differential expression workflows—since resistant mechanisms of individual tumors may be distinct—we undertook an analysis of patient-specific outlier expression gains and losses to identify more extreme transcriptional reprogramming events within individual cases that may be driving estrogen independence. Surprisingly, unlike SNVs, recurrent outlier transcriptional gains and losses were quite common. Particularly compelling outlier events included recurrent gains within shared pathway members, such as near mutually exclusive upregulations of NTRK3 [n = 5 [42%]) and NTRK2 (n = 4 [33%]). Notably, activation of NTRK’s mediates downstream signaling pathways typically associated with breast carcinomas, including PI3K and MAPK, and small molecule inhibitors of this family are showing promising results in recent solid tumor trials44. Other notable pathway member changes included loss of Wnt antagonists SFRP2 (n = 3 [25%]) and SFRP4 (n = 4 [33%]). SFRP2 is hypermethylated and silenced in a subset of breast cancers45 and experiments in model systems have shown cross-talk between ER and Wnt signaling that may mediate endocrine therapy resistance46,47. Other recurrent gains included FGFR4 (n = 4 [33%]), TERT (n = 3 [25%]) and CCNE1 (n = 3 [25%])—particularly relevant given the recent success of CDK inhibitors in hormone-positive disease and the burgeoning use of FGFR inhibitors against solid malignancies as we and others have reported48,49.
The most recurrent outlier expression loss was ESR1, which was diminished in 42% of long-term estrogen-deprived local recurrences. Interestingly, the loss of ESR1 for the majority of cases was not associated with a dramatic change in the tumors’ transcriptional profile. To further explore this counterintuitive result, given ESR1 is a master regulator of transcription and a driver of luminal breast cancers, we identified genes that were consistently altered in ESR1 depleted recurrences. The most substantial gains in ESR1 depleted tumors are genes generally expressed in basal breast cancers—such as NDRG1, DKK1, KIT, KLK7, PROM1 and COL9A3—and genes significantly lost in the ESR1 depleted subset are generally downregulated in basal cancers—EVLOVL2, BCL2, IGF1R, MYB, RABEP and ATP8A2 (MsigDB: SMID_BREAST_CANCER_BASAL_DN/UP gene lists)50. These results reveal a common, novel and distinct ESR1-depleted subtype of advanced breast cancers that acquire basal-like transcriptional reprogramming.
The greatest fold-change difference in ESR1 depleted recurrences was the upregulation of PROM1. PROM1 is a marker for tumor-initiating cancer stem cells and plays a key role in determining ER-positive luminal cell fate during differentiation from multipotent stem cells31, suggesting long-term endocrine deprived breast cancer cells may enrich themselves with stem-like progenitors to achieve estrogen-independence. Indeed, PROM1 has been shown to mediate endocrine therapy resistance in breast cancer models through IL6/Notch3 signaling 51,52. Here, we show that a large portion of long-term endocrine resistant breast cancers may be exploiting this transcriptional reprogramming. Finally, NDRG1, also significantly upregulated in ESR1 depleted recurrences and generally expressed in basal cancers, showed differential expression in three distinct LTED cell lines. NDRG1 is a suspected metastasis suppressor gene. Counterintuitively, we see upregulation of this gene in resistant disease and show increased expression confers worse survival outcomes in ER-positive primary tumors53. Further functional studies assessing the mechanistic and biological consequences of these transcriptional reprogramming events will be essential.
A pertinent point these results raise is the benefit of integrating longitudinal, targeted RNA-sequencing to inform resistance mechanisms and therapeutic targets in breast cancers. In this study, we find limited DNA-level enrichments yet highly recurrent, acquired transcriptional remodeling events from primary to advanced cancers, including a few of which that are immediately targetable such as NTRKs, FGFR4 and CCNE1. Overall, this work challenges our lack of emphasis on RNA-level changes, particularly those that can be elucidated from longitudinal biopsies, in clinical profiling of tumors and future work should be geared towards deciphering which of these bypass transcriptional programs may be druggable.
Collectively, these results begin to unravel the complex adaptations that breast cancer populations undergo when under the selection of long-term estrogen depleting therapies long-term. We identify acquired DNA-level mechanisms of resistance, such as mutations in ARID1A, other transcriptional regulators and multiple mutation selection within PIK3CA—but more importantly, uncover the most recurrent genomic adaptations taking place appear to be at the transcriptional level. These include targetable outlier gains and modifications in NTRKs as well as a distinct population of ESR1 depleted recurrences that enrich themselves with genes generally expressed in basal breast cancers—such as PROM1 and NDRG1. Preclinical, mechanistic investigations into these temporally altered genes are warranted given they may uncover novel and targetable mechanisms of endocrine therapy resistance in advanced breast cancers.
METHODS
Patient Samples, tissue processing and nucleic acid extraction
Institutional Review Board approval from both participating institutions (University of Pittsburgh IRB# PRO15050502, The Charité IRB Office) was obtained prior to initiating the study. Inclusion criteria for this study were (1) patients harbored patient-matched formalin-fixed paraffin-embedded (FFPE) tissue from primary breast cancers and local recurrences (Table 4), (2) biospecimens contained macrodissectable regions with sufficient tumor cellularity and (3) disease was treated continuously with a form of estrogen-depleting therapy prior to the recurrence. Biospecimens were reviewed by a trained molecular pathologist to confirm pathology, quantify tumor cellularity and to highlight regions of relatively high tumor cellularity for macrodissection. If a slide region harbored sufficient, microscopically verifiable adjacent normal cells, this region was also dissected and processed for downstream analyses. Between four to ten (depending on tumor size) 10-micron FFPE sections immediately adjacent to the H&E-analyzed section were pooled and underwent dual DNA/RNA extraction using Qiagen’s AllPrep kit. Nucleic acids were quantified fluorometrically with a Qubit 2.0 Fluorometer and quality assessed with an Agilent 4200 TapeStation Instrument prior to sequencing.
RNA and DNA-sequencing
RNA-seq library preparation was performed for all 12 cases using approximately 100 ng of RNA and Illumina’s TruSight RNA Pan-Cancer (1385 targets) protocol. DNA-seq library preparation was performed for 10 (6 with associated normal tissue) cases using no less than 30 ng of DNA and Illumina’s TruSeq Exome protocol with TruSight RNA Pan-Cancer probes for hybridization-based capture. Indexed, pooled libraries were then sequenced on Medium Output flow cells using an Illumina NextSeq 500 system (paired-end reads, 2 × 75 bp). A target of 5-10 million reads per sample was used to plan indexing and sequencing runs for RNA-sequencing and a target of 10-15 million reads was used for DNA-sequencing. RNA-sequencing FASTQ files were quantified with k-mer based lightweight-alignment (Salmon v0.7.2, quasi-mapping mode, 31-kmer index using GRCh38 Ensembl v82 transcript annotations, seqBias and gcBias corrections)54. tumorMatch (Chapter 3, Chapter 4) was used to validate sequencing pairs were patient-matched.
RNA-sequencing and quantification and DNA-sequencing alignment
RNA-seq read counts and mapping percentages were calculated (Data Supplement 4: S1) and transcript abundance estimates were collapsed to gene-level with tximport55. Log2 transformed TMM-normalized CPM (log2normCPM) values were implemented for subsequent analyses56,57. DNA-seq reads were aligned with bwa –mem (v.0.7.13) to an hg19 reference, sorted with samtools (v1.3), duplicates marked and removed with picardtools (v1.140) and local realignment performed with GATK (v3.4-46)58–60. Average coverage depth for the processed bam file was calculated using GATK’s DiagnoseTargets and the Illumina Pan-Can bed file (Appendix A.4: Figure 41, Data Supplement 4: S2). Metrics for average coverage values across all target intervals were plotted with ggplot2.
DNA-seq recurrence enriched variant determination
To determine enriched variants in recurrences versus patient-matched primary tumors, VarScan2 was implemented61. More specifically, primary and recurrent samtools mpileup files derived from processed bam files were input into VarScan2 using somatic mode, with somatic p-values representing the significance of a particular variant being acquired or enriched in the recurrence [SS = 1 or SS = 2]. Tumor purity estimates, as assessed by a molecular pathologist, were included in VarScan2 to correct contaminating normal cell influence on allele frequencies. The minimum coverage for a variant to be considered was 40X, with a minimum allele frequency (AF) of 0.05 in either the primary or recurrence and a minimum of 5 reads supporting the variant. Germline variants were determined for cases containing a matched normal (ERLR_01, ERLR_02, ERLR_07, ERLR_08, ERLR_12 and ERLR_15) using VarScan2’s germline mode with the same parameters. VCF output files were then imported into R using the VariantAnnotation package62. If a normal sample was available for the case, all germline variants (AF > 0.30) were excluded from subsequent analyses. Additionally, to limit technical artifacts especially considering specimens were formalin-fixed paraffin embedded63, a “blacklist” of variants was created including those called in at least 3 of the normal samples. Germline and blacklist-removed variants were then annotated with Annovar64. Lastly, to call recurrence-enriched, potentially pathogenic variants the following inclusion criteria were enacted: (1) VarScan2 somatic p-value < 0.05, (2) > 2-fold gain in allele frequency in the recurrence versus the primary, (3) minimum AF of 0.10 in the recurrence, (4) non-silent and (5) an ExAC AF < 0.01 considering some samples were without a paired normal (Data Supplement 4: S3)65. These non-silent, enriched, potentially pathogenic variants were then plotted using the OncoPrint function in ComplexHeatmaps66. A pearson R correlation was calculated between the frequency of enriched variants and disease-free-survival. PIK3CA mutations were visualized with IGV (2.3.60)67 and variant allele frequencies were derived from VarScan2.
RNA-seq variant determination
RNA-seq reads covering mutation sites called from DNA-seq of the corresponding sample were extracted from bam file and counted. Variants with at least 2 supporting reads containing the altered allele, and with AF greater than 0.05 in either primary or recurrence were considered.
Copy number alterations
To estimate copy number ratios, CNVkit was implemented on processed bam files using default settings and the -drop-low-coverage option68. A pool of bam files from adjacent normal tissue, sequenced in the same manner, was used as a reference. Probe and segment level copy number estimates were finalized with CNVkit’s call function, which utilizes circular binary segmentation69. To adjust for tumor purity and normal contamination, the –m clonal option was used with tumor purities from pathologic evaluations. Copy number ratios were then plotted with the heatmap function and copy number values were assessed and plotted with ggplot2. Gene-level copy number estimates represent the mean copy number call across all probe targets. CNVkit copy number ratios showed a near normal distribution and ERBB2 copy number values demonstrated a strong correlation (pearson R = 0.924, p-value < 0.001) with expression (Appendix A.4: Figure 42).
Differential gene expression, clustering and outlier gains and losses
Hierarchical clustering was performed using the heatmap.3 function (https://raw.githubusercontent.com/obigriffith/biostar-tutorials/master/Heatmaps/heatmap.3.R) in R on log2normCPM values of the top 10% most variable genes (defined by IQR) with 1 minus Pearson correlations as distance measurements and the “average” agglomeration method. Differential expression between primary and recurrent tumors was analyzed with limma. Raw counts were input into the voom function and quantile normalized prior to fitting the linear model and performing the empirical Bayes method for differential expression70,71. The linear model was fitted with a design that accounts for the paired nature of the cohort (model = ∼Patient+Tissue [primary or recurrence]). Outlier expression gains and losses were determined for each patient by discretely categorizing genes into one of 5 categories. If log2FC values (i.e. recurrence log2normCPM – primary log2normCPM) for a given gene were less than Q1 – (1.5 X IQR) or Q1 – (3 X IQR), using case-specific log2FC values for all genes as the distribution, that gene was deemed an “Outlier Loss” or “Extreme Loss” respectively. If log2FC values calculated were greater than Q3 + (1.5 X IQR) or Q3 + (3 X IQR), it was deemed an “Outlier Gain” or “Extreme Gain” respectively. All other genes with intermediate fold changes were classified as “Stable.” To determine subtype expression of KLK7, PROM1 and NDRG1, normalized microarray expression data along with PAM50 calls was obtained from the Molecular Taxonomy of Breast Cancer International Consortium (METABRIC) through Synapse (https://www.synapse.org/, Synapse ID: syn1688369), following IRB approval for data access from the University of Pittsburgh72. Overlap with genes in long-term estrogen deprived, ER-positive breast cancer lines (HCC1428, MCF7, T47D, ZR75.1) was performed by running a separate differential expression analysis (LTED vs. parental lines) on microarray data with limma71,73. Dysregulated gene overlap was designated if the nominal p-value and FDR-adjusted p-value were both < 0.05 in the local recurrence and LTED differential expression analysis, respectively. Binary dichotomization of METABRIC samples using NDRG1 expression (>50th percentile, <50th percentile) and log-rank testing were used to assess significant differences in disease-specific survival (DSS) and then Kaplan-Meier curves were plotted with survminer74,75.
AUTHOR CONTRIBUTIONS
Study concept and design (NP, KD, SO, AVL); acquisition, analysis, or interpretation of data (all authors); drafting of the manuscript (NP, KD, SO, AVL); critical revision of the manuscript for important intellectual content (all authors); administrative, technical, or material support (WH, JKK, TD, PCL, JUB, CD, AM, BIH)
Grant Support
Research funding for this project was provided in part by Susan G. Komen Scholar awards (AVL and SO), Breast Cancer Research Foundation (AVL and SO), the Fashion Footwear Association of New York, Magee-Women’s Research Institute and Foundation, Nicole Meloche Foundation, Penguins Foundation, Penguins Alumni Foundation, Mario Lemieux. NP was supported by a training grant from the NIH/NIGMS (2T32GM008424-21), an individual fellowship from the NIH/NCI (5F30CA203095) and University of Pittsburgh School of Medicine Dean’s MSTP Postdoctoral Scholar Award.
Conflicts of Interest Disclosure
Illumina provided free reagents for study but had no role in study design, data analysis or conclusions. CD reports personal fees from Novartis, Roche, MSD Oncology and Daiichi Sankyo, grants from Myriad Genetics and is a shareholder of Sividon Diagnostics / Myriad. In addition, Dr. Denkert has a patent EP18209672 pending, a patent EP20150702464 pending, and a patent Software (VMscope digital pathology) pending. JUB has received Honoraria from Amgen, AstraZeneca, Pfizer, Novartis, SonoScape, MSD Oncology, Roche and has served a consulting role for Pfizer, Amgen, Novartis, AstraZeneva and Roche.
SUPPLEMENTARY FIGURES AND LEGENDS ARE PROVIDED IN SEPARATE FILE
ACKNOWLEDGMENTS
This project used the University of Pittsburgh HSCRF Genomics Research Core and Pitt Biospecimen Core, and the UPMC HIllman Cancer Center Tissue and Research Pathology Services supported in part by award P30CA047904. The authors would like to thank the patients who contributed samples to this study and Lori Miller (University of Pittsburgh) for their efforts in collecting tissue.