Abstract
Neuroblastoma, like many childhood cancers, exhibits a relative paucity of somatic single nucleotide variants (SNVs). Here, we assess the contribution of structural variation (SV) in neuroblastoma using a combination of whole genome sequencing (WGS; n=135) and single nucleotide polymorphism (SNP) genotyping (n=914) of matched tumor-normal pairs. Our study design provided means for orthogonal validation of SVs as well as validation across genomic platforms. SV frequency, type, and localization varied significantly among high-risk tumors, with MYCN non-amplified tumors harboring an increased SV burden overall (P=1.12×10−5). Genes disrupted by SV breakpoints were enriched in neuronal lineages and autism spectrum disorder. The postsynaptic adapter protein-coding gene SHANK2, located on chromosome 11q13, was disrupted by SVs in 14% and 10% of MYCN non-amplified high-risk tumors based on WGS and SNP array cohorts, respectively. Forced expression of SHANK2 in neuroblastoma cell models resulted in significant growth inhibition (P=2.62×10−2 to 3.4×10−5) and accelerated neuronal differentiation following treatment with all-trans retinoic acid (P=3.08×10−13 to 2.38×10−30). These data further define the complex landscape of structural variation in neuroblastoma and suggest that events leading to deregulation of neurodevelopmental processes, such as inactivation of SHANK2, are key mediators of tumorigenesis.
Neuroblastoma is a cancer of the developing sympathetic nervous system that most commonly affects children under 5 years of age, with a median age at diagnosis of 17 months1. Approximately 50% of cases present with disseminated disease at the time of diagnosis, and despite intense multi-modal therapy, the survival rate for this high-risk subset remains less than 50%1. Recent whole genome and exome sequencing studies of neuroblastoma have revealed relatively few recurrent protein-coding somatic mutations including single nucleotide variations (SNVs) and small (<50b) insertion/deletions (indels)2-5.
Large-scale structural variations (SVs) such as deletions, insertions, inversions, tandem duplications and translocations can arise from mutational processes that alter chromosome structure and evade innate mechanisms of maintaining genomic stability. These diverse SVs are commonly acquired somatically and act as driver mutations6.
A plethora of approaches have been applied to detect SVs across large cancer datasets6-9. First, methods that identify copy number variations (CNVs) can be applied to intensity log R ratios from genotyping and comparative genomic hybridization (CGH) arrays as well as read-depth measures from next generation sequencing. Different segmentation algorithms have been applied to either platform in order to obtain copy number gain and loss calls, which range from a few hundred base pairs in size to whole chromosomal alterations. These methods are dosage sensitive, allowing numerical quantification of amplifications and homozygous deletions. Analysis of CNVs in neuroblastoma primary tumor and matched blood samples has led to identification of recurrent somatically acquired DNA copy number alterations (SCNA), such as MYCN amplification, gain of chromosome 17q, and deletion of chromosomes 1p and 11q; these events are associated with an undifferentiated phenotype, aggressive disease, and poor survival10-22. In addition, focal deletions cause deleterious loss of function in the chromatin remodeler gene ATRX23,24, implicated in the alternative lengthening of telomeres (ALT) phenotype, and other tumor suppressor genes such as PTPRD25, ARID1A and ARID1B26.
Second, NGS technologies have profoundly expanded our understanding of the impact of SVs in cancer7. DNA sequencing methods focus on discordantly aligned reads and read-pairs to the reference genome. As such, alignment based approaches do not rely on dosage quantification and cannot quantify numerical changes of deletions and tandem-duplications; however, they provide information about inversions, translocations and transposable elements, which are elusive for CNV callers. In addition, alignment based approaches offer single base pair resolution and genome-wide coverage in the case of WGS. Recent studies using alignment based detection of SVs from WGS profiles from primary neuroblastomas revealed structural rearrangements as key oncogenic drivers mediating focal enhancer amplification or enhancer hijacking, influencing telomere maintenance through activation of telomerase reverse transcriptase gene (TERT)24,27,28 or by deregulating the MYC oncogene29. Despite the demonstrated importance of somatic CNVs and other SVs in neuroblastoma, studies systematically integrating CNV analysis and alignment based approaches are lacking; hence the global landscape and mechanisms of pathogenicity of many of these events remain poorly understood.
Here, we studied the role of somatic SVs in the largest available dataset to date, including 997 distinct primary neuroblastoma tumors and integrating whole genome sequencing (WGS) from 135 tumor-normal pairs and 914 single nucleotide polymorphism (SNP) arrays obtained at diagnosis. We considered alternative approaches for SV detection from both datasets, which overlap in a subset of 52 cases. As such, our study provides orthogonal as well as cross-platform validation of SV breakpoints. Furthermore, we explored the functional impact of SVs by integrating overlapping transcriptomic profiles and gene fusions from 153 RNA-sequencing samples and expression data from 247 HumanExon arrays in a combined subset of 361 tumor samples. Finally, in vitro studies demonstrated the functional relevance of SHANK2, a newly identified tumor suppressor gene disrupted by SVs. Altogether, our dissection of multi-omic datasets, together with patient clinical profiles and biological experimentation, expands the genomic landscape of neuroblastoma.
Results
Patient characteristics and multi-omic datasets for the study of structural variations
To establish the landscape of SVs in neuroblastoma, we first sequenced the genomes of 135 primary diagnostic tumors and matched normal (blood leukocyte) DNA pairs through the Therapeutically Applicable Research to Generate Effective Treatments (TARGET) initiative (https://ocg.cancer.gov/programs/target). Samples were obtained through the Children’s Oncology Group (COG) and included 106 patients with high-risk tumors (29 MYCN-amplified and 77 non-MYCN-amplified), 14 with intermediate-risk tumors and 15 with low-risk tumors (Fig. 1a, Supplementary Tables 1 and 2). Whole genome sequencing (WGS) was performed by Complete Genomics30 to a median average depth of 76x (Supplementary Fig. 1a) and primary data was processed via the Complete Genomics pipeline version 2.0. This pipeline reports small somatic variants (SNVs, small indels, and substitutions)31, larger SVs, and read-depth coverage across the genome used to infer copy number segmentation profiles (Online methods).
To augment the WGS data, and to provide independent replication, we genotyped and analyzed 914 patient tumor samples using Illumina SNP platforms (Fig. 1a, Supplementary Tables 1 and 2). This cohort comprised 696 high-risk (239 MYCN-amplified and 457 non-MYCN-amplified), 70 intermediate-risk and 145 low-risk tumors (Fig. 1a, Supplementary Tables 1 and 2); 488 of these samples were previously released32 and reanalyzed here. Copy number segmentation was obtained using the SNPrank algorithm implemented by the NEXUS® software platform (Online Methods).
To further assess the functional impact of SVs, we integrated additional data types generated through the TARGET initiative: we obtained transcriptional profiles from RNA sequencing (N=153) and Affymetrix HumanExon arrays (HuEx, N=247). In addition, the RNA-seq dataset was studied with three available gene fusion pipelines (STAR-fusion33, fusionCATCHER34 and DeFUSE35).
Patient clinical covariates were organized by the Children’s Oncology Group (COG) (Fig. 1a, Supplementary Tables 1 and 2; https://ocg.cancer.gov/programs/target/data-matrix; phs000218.v4.p1). Throughout this study, we examined disease risk groups as defined by the COG and the International Neuroblastoma Risk Group (INRG)36. Specifically, the following subtypes were considered: LOWR: low-risk neuroblastomas; INTR: intermediate-risk neuroblastomas; MNA: high-risk neuroblastomas with amplification of the MYCN oncogene; and HR-NA: high-risk neuroblastomas without MYCN amplification.
Identification of novel regions of recurrent DNA copy number gain and loss
WGS-derived copy number profiles were compared with those obtained from the larger SNP array dataset. SCNAs were visualized with Integrative Genome Viewer (IGV) and confirmed well-established patterns of large SCNAs in neuroblastoma that differed between the tumor clinical subtypes (Fig. 1b, c)11,37. We further analyzed CNV segmentation profiles within neuroblastoma subtypes using GISTIC2.038. As expected, LOWR and INTR tumors harbored few focal or large SCNAs, although aneuploidy was observed (Fig. 1b, Supplementary Fig. 2a, b). Consistent with clinical records and previous reports, the MNA and HR-NA subsets shared highly frequent 17q gains and PTPRD deletions (9p23) and differed in 2p24 (MYCN locus) and in the prevalence of deletions of 1p, 3p, 4p and 11q (Fig. 1d, e, Supplementary Fig. 2c-e). We also observed less frequently reported variants in the HR-NA group, including deletions at 16q24.339 and segmental gains of the q-arm of chromosome 7, a region recently suggested to exhibit oncogenic potential in neuroblastoma40 (Fig. 1d, Supplementary Fig. 2e).
CNV profiles derived from WGS are deemed to have higher resolution and returned peaks not found in SNP arrays. These SCNAs involved focal gains at chromosome 5p15.33 (Q-value=1.42 × 10−3) harboring the telomerase reverse transcriptase (TERT) gene (Fig. 1d) and intragenic deletions of the ATRX chromatin remodeler gene at Xq21.1 (Q-value=3.76 × 10−3). Moreover, we observed a novel region of recurrent deletions at 10p15.3 (Q-value=6.16 × 10−2, Fig. 1e).
Orthogonal detection to SV identification: sequence junction, read-depth and copy number breakpoint analyses
To strengthen our findings, we considered three approaches to SV identification (Table 1). We integrated alignment-based SV calls and read-depth CNVs from WGS as well as intensity-based CNV calls from genotyping arrays (Table 1, Online Methods), and subsequently assessed the extent to which SV breakpoints overlapped between alternative methods and across WGS and SNP datasets. First, we obtained alignment-based SVs reported by the CGI somatic pipeline, which provides information about SV boundaries, size and the type of variant in every sample, including deletions (>500b), tandem-duplications (>40b), inversions (>30b), translocations and complex events (Supplementary Fig. 1c-e). We applied additional filters by removing likely artifacts including duplicate junctions across samples and common germline variants found in the database of genomic variants (Online Methods)41. This resulted in a total of 7,366 SV calls (Supplementary Table 3) distributed heterogeneously across neuroblastoma subtypes (Fig. 2a). These SVs were defined by sequence junctions delimited by two breakpoints in the genome, which will be referred to as sequence junction breakpoints (SJ-BP). We next mapped copy number dosage breakpoints derived from WGS read-depth segmentation profiles, hereafter referred to as read-depth breakpoints (RD-BP, Online Methods). A total of 2,836 RD-BPs were identified (µ=21), unevenly distributed across samples (Fig. 2b). Finally, analogous to the RD-BPs, we mapped copy number breakpoints from segmentation profiles derived from the larger SNP cohort, referred to as copy number breakpoints (CN-BP, Online Methods); a total of 6,241 CN-BPs were identified across 914 samples (µ=6.8). As expected from previous reports11,42, we observed an increased number of events in high-risk compared to intermediate and low-risk tumors when studied as SJ-BPs (Fig. 2a), RD-BPs (Fig. 2b) and CN-BPs (Fig. 2c).
We further studied the co-localization of breakpoints derived from alternative measurements. First, we compared SJ-BPs and RD-BPs in each of the 135 WGS samples; overall, 30.5% of SJ-BPs co-localized with RD-BPs (Fig. 2d) whereas 62% of RD-BPs matched SJ-BPs (Fig. 2e). The lower overlap in SJ-BPs is expected since not all SVs necessarily involve a change in copy number dosage (e.g. inversions and translocations). We next evaluated the co-localization of breakpoints across WGS and SNP platforms within the subset of 52 overlapping samples. 50.2% of CN-BPs from SNP arrays co-localized with SJ-BPs from the WGS dataset (Supplementary Fig. 3a) whereas only 8.2% of SJ-BPs co-localized with CN-BPs (Supplementary Fig. 3b). Furthermore, when comparing dosage-based breakpoints across platforms (RD-BP and CN-BP), 23.6% of RD-BPs were found to co-localize with CN-BPs (Supplementary Fig. 3c) whereas 66.6% of CN-BPs co-localized with RD-BPs (Supplementary Fig. 3d). Overall, SNP arrays displayed a reduced number of breakpoints compared with read-depth-based profiles; we attribute these differences to a narrower dynamic range and lower probe density of the platform.
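The asymmetry of these percentages follows from how co-localization is counted: each breakpoint set is used in turn as the query against the other. The following is a minimal sketch of such a calculation, not the pipeline used in this study. The function name, the (chromosome, position) representation and the 10 kb window are illustrative assumptions; the actual matching criteria are given in the Online Methods.

```python
from bisect import bisect_left


def colocalization_fraction(query, reference, window=10_000):
    """Fraction of query breakpoints with a reference breakpoint within `window` bp.

    Breakpoints are (chromosome, position) tuples. The window size is an
    illustrative assumption, not the value used in the study.
    """
    # Index reference positions per chromosome, sorted for binary search.
    by_chrom = {}
    for chrom, pos in reference:
        by_chrom.setdefault(chrom, []).append(pos)
    for positions in by_chrom.values():
        positions.sort()

    if not query:
        return 0.0

    hits = 0
    for chrom, pos in query:
        positions = by_chrom.get(chrom, [])
        # First reference position that is >= pos - window ...
        i = bisect_left(positions, pos - window)
        # ... counts as a hit only if it is also <= pos + window.
        if i < len(positions) and positions[i] <= pos + window:
            hits += 1
    return hits / len(query)


# Toy data (made up): four junction breakpoints, two dosage breakpoints.
sj_bp = [("chr2", 16_000_000), ("chr2", 16_500_000),
         ("chr11", 70_500_000), ("chr17", 40_000_000)]
rd_bp = [("chr2", 16_002_000), ("chr11", 70_495_000)]

print(colocalization_fraction(sj_bp, rd_bp))  # 0.5: half of the SJ-BPs have a dosage change
print(colocalization_fraction(rd_bp, sj_bp))  # 1.0: every RD-BP is backed by a junction
```

In the toy data, the unmatched junctions stand in for copy-neutral events such as inversions and balanced translocations, which lower the SJ-BP percentage without affecting the RD-BP percentage.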
Finally, we performed a randomized test by sample shuffling (Ni=1000) in order to evaluate whether each of the co-localization percentages listed above could arise by chance or due to recurrence of structural variants across samples. All randomized percentage distributions range between 0.7% and 2.3%; in all cases the null hypothesis was discarded (p-value < 0.001, Supplementary Fig. 3e-j). Taken together, alternative breakpoint detection methods returned consistent results even when derived from different platforms providing means for both orthogonal and cross-platform validation of SVs. However, certain types of SVs can only be detected using alignment-based methods.
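A sample-shuffling test of this kind can be sketched as follows. This is an illustration under stated assumptions rather than the code used here: the function names, the dictionary layout (sample identifier mapped to a list of (chromosome, position) breakpoints), the 10 kb window and the +1 correction in the empirical p-value are all choices made for the example. For simplicity the shuffle may occasionally pair a sample with itself.

```python
import random


def count_hits(query, reference, window):
    """Number of query breakpoints within `window` bp of any reference breakpoint."""
    return sum(
        any(c == rc and abs(p - rp) <= window for rc, rp in reference)
        for c, p in query
    )


def pooled_fraction(calls_a, calls_b, pairing, window):
    """Co-localization fraction pooled over samples; `pairing` maps A-sample -> B-sample."""
    total = sum(len(calls_a[a]) for a in pairing)
    hits = sum(count_hits(calls_a[a], calls_b[b], window) for a, b in pairing.items())
    return hits / total if total else 0.0


def shuffle_test(calls_a, calls_b, n_iter=1000, window=10_000, seed=0):
    """Compare the observed co-localization with a null built by shuffling sample labels."""
    rng = random.Random(seed)
    samples = sorted(calls_a)
    observed = pooled_fraction(calls_a, calls_b, dict(zip(samples, samples)), window)

    null = []
    for _ in range(n_iter):
        shuffled = samples[:]
        rng.shuffle(shuffled)  # breakpoints of sample i are now compared with another sample
        null.append(pooled_fraction(calls_a, calls_b, dict(zip(samples, shuffled)), window))

    # Empirical one-sided p-value; the +1 terms keep it strictly above zero.
    p_value = (1 + sum(f >= observed for f in null)) / (n_iter + 1)
    return observed, null, p_value
```

Because recurrent events such as 17q gain occur at similar positions in different tumors, the null distribution is not expected to be centered on zero; it captures the co-localization attributable to recurrence alone.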
Patterns of SV and SNV mutational burden differ across neuroblastoma subtypes
High-risk tumors (MNA and HR-NA) presented considerably higher SV mutational burden than low- and intermediate-risk cases (INTR and LOWR), across SJ-BP, RD-BP and CN-BP measures (Fig. 2a-c)11,37. Comparison of MNA vs. HR-NA tumors revealed these high-risk subsets differed in SV type and genomic location (Fig. 2f-i, Supplementary Fig. 4a-d). MNA tumors harbored more SVs on chromosome 2 (Wilcoxon P=1.6 × 10−14; Fig. 2f), largely confined to complex junctions at the MYCN amplicon at chromosome 2p24 (Supplementary Fig. 4a). However, nearly all chromosomes displayed a higher frequency of SVs in HR-NA than MNA (Fig. 2f). Specifically, HR-NA tumors harbored more tandem-duplications in all chromosomes except chromosome 2 (Fig. 2g). Inter-chromosomal events were also more frequent in HR-NA tumors and overlapped with regions of known SCNAs other than chr2, including chromosome 3p (P=1.8 × 10−3), chromosome 4p (P=9.1 × 10−6) and chromosome 11q (P=1.9 × 10−8), but not chromosome 1p and 17q (Fig. 2h). In contrast, complex events showed no overall differences between high-risk groups with the exception of the aforementioned chr2 (Fig. 2i). Finally, RD-BP and CN-BP frequencies followed a similar pattern across chromosomes as that of SJ-BPs; MNA tumors harbored an increased number of breakpoints in chromosome 2 (PRD-BP=2.4 × 10−9, Fig. 2j; PCN-BP=4.2 × 10−83, Fig. 2k) while HR-NA harbored increased frequencies in most other chromosomes and in particular, chromosome 11 (PRD-BP=2.0 × 10−8, Fig. 2j; PCN-BP=4.0 × 10−25, Fig. 2k).
We next studied overall differences in mutational burden and chromosomal instability across subtypes; we posit that the densities of breakpoints (SJ-BP, RD-BP and CN-BP) throughout the genome represent a bona fide measure of chromosomal instability. We also obtained measures of somatic SNV density. In order to avoid skewing of results due to the MYCN amplicon in MNA and regions exhibiting chromothripsis43, we implemented an SNV and SJ-BP tumor burden measure robust against outliers. To this end, the genome was divided into 41 sequence mapped chromosome arms and the density of SVs per Mb was measured; then, for each sample, the interquartile mean (IQM) was derived from the 41 arm measurements (Supplementary Fig. 4e,f). Similarly, we obtained IQM density measurements from RD-BP and CN-BP chromosomal burdens (Fig. 2n,o). As expected, LOWR and INTR tumors carried very low mutational burden (Fig. 2l-o)11,37. We observed increased CIN (SJ-BP, RD-BP and CN-BP) in HR-NA compared to MNA (Wilcoxon rank test: PSJ-BP=4.5 × 10−5, Fig. 2m; PRD-BP=1.3 × 10−2, Fig. 2n; PCN-BP=4.6 × 10−8, Fig. 2o), similar to previous reports44. In contrast, MNA and HR-NA did not differ in their average SNV burden (Wilcoxon rank test: P=0.29, Fig. 2h). These results confirm that HR-NA has increased chromosomal instability37,44 and support the observation that small SNVs and larger SVs arise from different mutational processes.
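To illustrate why the IQM is robust to a single heavily rearranged arm, the sketch below computes it over per-arm densities. It is an example only: the text does not state how the quartile boundaries were handled when the number of arms (41) is not divisible by four, so the fractional weighting of the two boundary values is an assumption, and the densities are invented.

```python
def interquartile_mean(values):
    """Mean of the middle 50% of `values`, weighting the two boundary values fractionally."""
    xs = sorted(values)
    n = len(xs)
    if n == 0:
        raise ValueError("interquartile_mean requires at least one value")
    quarter = n / 4
    dropped = int(quarter)                  # values discarded entirely from each tail
    edge_weight = 1 - (quarter - dropped)   # partial weight of the two boundary values
    middle = xs[dropped:n - dropped]
    if len(middle) == 1:
        return middle[0]
    total = sum(middle[1:-1]) + edge_weight * (middle[0] + middle[-1])
    return total / (n / 2)


# Invented densities (breakpoints per Mb) for 41 arms: 40 quiet arms and one
# arm carrying an amplicon-like cluster of breakpoints.
arm_density = [0.1] * 40 + [50.0]

plain_mean = sum(arm_density) / len(arm_density)
iqm = interquartile_mean(arm_density)
print(f"mean={plain_mean:.3f}  IQM={iqm:.3f}")
```

The plain mean is dominated by the single outlying arm, whereas the IQM stays at the density of the quiet arms, which is the behavior wanted when comparing genome-wide burden between MNA and HR-NA tumors.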
Chromothripsis associates with major neuroblastoma oncogenic mechanisms
Previous studies have reported chromothripsis to occur in up to 18% of high-risk neuroblastomas43 and identified associations between chromothripsis and key neuroblastoma oncogenes MYCN and TERT27,28. We therefore sought to leverage our large dataset to further explore the oncogenic associations of chromothripsis in neuroblastoma. We first identified alterations of major neuroblastoma oncogenes (MYCN, TERT and ALK) in our WGS and SNP cohorts. Rearrangements near the TERT locus were confirmed in 23 HR-NA samples and 2 MNA samples from the WGS dataset as well as 15 cases (14 HR-NA and 1 MNA) from the SNP dataset; one sample (PAPUTN) was present in both datasets (Fig. 3a). Eleven cases from the WGS set with available DNA were validated using Sanger sequencing (Supplementary Fig. 5). We confirmed that TERT expression was increased in those samples as well as in MNA tumors, in accordance with previous reports (Supplementary Fig. 6)27,28. In addition, CN-BPs were found near TERT in 15 HR-NA samples from the SNP array dataset (Fig. 3a), highlighting the capability of SNP arrays for detecting this type of event using the breakpoint analysis approach introduced in this study. MYCN amplification was determined diagnostically by FISH experiments in 29 samples from the WGS dataset (Supplementary Table 2). IGV visualization of segmentation data of a 7Mb region surrounding MYCN confirmed the clinical records (Supplementary Fig. 7a). We also explored events affecting the ALK gene, which can co-occur with amplification of MYCN. Two out of four rearrangements found near ALK also involved MYCN (Supplementary Fig. 7b); these events were validated via Sanger sequencing (Supplementary Fig. 7c).
Next, chromothripsis was characterized by clustered somatic rearrangements and alternating copy number states in defined chromosome regions45. We identified candidate chromothripsis events at chromosome arms with unusually high breakpoint densities (> 2σ of each sample’s breakpoint burden distribution) and a minimum of 6 breakpoints (both SJ-BPs and RD-BPs) in 27 regions (Online Methods, Supplementary Table 4) involving 20 distinct high-risk tumors (19%). Chromothripsis was observed in chromosome 2 in a total of 8 samples (Fig. 3b,d,e; Supplementary Fig. 8); these were enriched in samples harboring MYCN amplification (MNA) (7/8 samples, Binomial test P=7.4 × 10−4) (Fig. 3). Among them, two samples (PARETE and PATESI) involved co-amplification of ALK with MYCN (Supplementary Fig. 7b,c). In addition, 9 tumors harbored shattered chromosome 5p with strong enrichment in samples with rearrangements near TERT (8/9, Binomial test P = 7.3 × 10−5) (Fig. 3c,d; Supplementary Fig. 9). Two samples (PAPSRJ and PAPUTN, Fig. 3d) included inter-chromosomal events associating the MYCN and TERT gene loci and co-amplification of both oncogenes. Other chromosomes involved included chromosomes 1, 10, 11 and X in a female sample (Supplementary Fig. 10). Chromothripsis in most cases (15/20) was localized to a single chromosome involving either the whole chromosome (e.g. PATBMM, Fig. 3b) or local regions (e.g. PATESI, Fig. 3c). Multiple chromosomes were involved in 5/20 (25%) of cases with chromothripsis. One sample (PARIRD) harbored an event involving chromosomes 2, 17 and 22, while PANRVJ involved large regions of chromosomes 1 and 2 (Supplementary Fig. 8).
We next sought further confirmation of our results in the larger SNP array dataset. In the absence of sequence junction information, we focused on unusually high density (>2σ) of CN-BPs (Fig. 3f). We observed high breakpoint density on chromosome 2 enriched in MNA samples (46/46, P ∼ 0). We also observed enrichment of high breakpoint density on chromosome 5 in tumors harboring rearrangements or CN-BPs near TERT (7/11, P = 3.01 × 10−8). In addition, chromosome X high breakpoint density was enriched in female patients (6/7, P = 4.7 × 10−2), although no specific oncogenic associations were determined. Overall, SNP array analysis of high CN-BP density supports and replicates observations based on the WGS dataset.
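The density criterion for candidate regions can be sketched as below. This is one possible reading of the rule, not the procedure used in this study (see Online Methods). The assumptions are that the 2σ cutoff is the per-sample mean plus two population standard deviations of the per-arm densities, that the minimum of 6 breakpoints applies to SJ-BPs and RD-BPs separately, and that an arm must pass for both breakpoint types. Arm names, arm lengths and counts in the example are invented.

```python
from statistics import mean, pstdev


def high_density_arms(counts, arm_mb, min_breakpoints=6, n_sd=2.0):
    """Arms of one sample whose breakpoint density exceeds mean + n_sd * SD.

    counts: arm -> number of breakpoints; arm_mb: arm -> mappable length in Mb.
    """
    density = {arm: counts.get(arm, 0) / arm_mb[arm] for arm in arm_mb}
    cutoff = mean(density.values()) + n_sd * pstdev(density.values())
    return {
        arm for arm, d in density.items()
        if d > cutoff and counts.get(arm, 0) >= min_breakpoints
    }


def chromothripsis_candidates(sj_counts, rd_counts, arm_mb):
    """Arms flagged by both sequence-junction and read-depth breakpoints (assumed rule)."""
    return high_density_arms(sj_counts, arm_mb) & high_density_arms(rd_counts, arm_mb)


# Invented example: ten arms of 100 Mb each in a single tumor.
arms = ["1p", "1q", "2p", "2q", "3p", "3q", "5p", "5q", "11q", "17q"]
arm_mb = {arm: 100.0 for arm in arms}
sj_counts = {arm: 1 for arm in arms} | {"2p": 40}
rd_counts = {arm: 1 for arm in arms} | {"2p": 20, "5p": 7}

print(chromothripsis_candidates(sj_counts, rd_counts, arm_mb))  # {'2p'}
```

In this example 5p has more than six read-depth breakpoints but does not exceed the sample-specific density cutoff, so only 2p is reported.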
Identification of genes recurrently altered by SVs in high-risk neuroblastomas
In order to identify genes affected by recurrent somatic SVs in neuroblastoma, we generalized the approach described previously for the TERT, MYCN and ALK genes (Fig. 3a, Supplementary Fig. 7); SVs were assigned to different categories according to the inferred impact on the exonic sequence of known RefSeq genes (Fig. 4a,b). Sequence junctions (SJ-BPs) provide more detailed information, including the type of SV and the two genomic breakpoint locations involved; with this knowledge we classified SVs into: a) “Coding”: SVs that modify the exonic sequence of known genes, including whole gene copy number alterations (duplications and deletions, size up to 2Mb) and b) “Non-coding”: SVs that do not modify the exonic sequences but might have an impact on regulatory regions proximal to known genes (100Kb upstream and 25Kb downstream) as well as intronic regions (Fig. 4a). In contrast to SJ-BP junctions obtained from discordantly aligned mate read pairs, dosage-based breakpoints (RD-BP and CN-BP) cannot identify their counterpart location in the genome. Therefore, events such as translocations and inversions cannot be defined. Conversely, read-depth and array intensity based copy number inform about multiple dosage gains and losses. With this in mind, we assumed the impact as a) “Coding”: breakpoints within the transcription “start” and “end” positions of known genes and b) “Non-coding”: breakpoints located on proximal upstream and downstream regions (Fig. 4b). In addition, we localized copy number variants involving amplification (CNWGS > 8; CNSNP>4.5, Online Methods) and deep deletions (CNWGS < 0.5; CNSNP < 0.9, Online Methods) (Fig. 4b).
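The classification of dosage-based breakpoints (RD-BP and CN-BP) and the copy number thresholds quoted above can be written compactly as follows. This is a sketch for illustration rather than the code used in this study: the `Gene` record, the function names and the example coordinates are hypothetical, the same 100Kb and 25Kb flanks quoted for SJ-BPs are assumed to apply, and upstream is assumed to be defined relative to the gene strand.

```python
from dataclasses import dataclass

UPSTREAM_BP = 100_000    # "100Kb upstream" in the text
DOWNSTREAM_BP = 25_000   # "25Kb downstream" in the text

# (deep-deletion threshold, amplification threshold) per platform, as quoted in the text.
CN_THRESHOLDS = {"WGS": (0.5, 8.0), "SNP": (0.9, 4.5)}


@dataclass(frozen=True)
class Gene:
    name: str
    chrom: str
    start: int   # leftmost transcript coordinate
    end: int     # rightmost transcript coordinate
    strand: str  # "+" or "-"


def classify_breakpoint(chrom, pos, gene):
    """Return "coding", "non-coding" or None for one dosage breakpoint and one gene."""
    if chrom != gene.chrom:
        return None
    if gene.start <= pos <= gene.end:
        return "coding"
    # Upstream lies to the left of the gene on "+" and to the right on "-".
    if gene.strand == "+":
        left, right = UPSTREAM_BP, DOWNSTREAM_BP
    else:
        left, right = DOWNSTREAM_BP, UPSTREAM_BP
    if gene.start - left <= pos < gene.start or gene.end < pos <= gene.end + right:
        return "non-coding"
    return None


def classify_copy_number(copy_number, platform):
    """Return "amplification", "deep deletion" or None using the per-platform thresholds."""
    deep_deletion, amplification = CN_THRESHOLDS[platform]
    if copy_number > amplification:
        return "amplification"
    if copy_number < deep_deletion:
        return "deep deletion"
    return None


# Hypothetical gene on the minus strand; coordinates are made up.
gene = Gene("GENE_A", "chr1", 1_000_000, 1_200_000, "-")
print(classify_breakpoint("chr1", 1_100_000, gene))  # coding
print(classify_breakpoint("chr1", 1_280_000, gene))  # non-coding (within 100 kb upstream)
print(classify_copy_number(5.0, "SNP"))              # amplification
print(classify_copy_number(5.0, "WGS"))              # None
```

The last two calls show why the thresholds are platform specific: the same estimated copy number of 5 exceeds the array threshold but not the WGS threshold, consistent with the narrower dynamic range of SNP arrays noted earlier.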
Based on the aforementioned definitions, we ranked recurrently altered genes according to the number of samples harboring “coding” and “non-coding” SVs for each of the 3 alternative breakpoint analyses (SJ-BPs, RD-BPs and CN-BPs; Fig. 4c-h and Supplementary Table 5). Overall, recurrently altered genes by ‘coding’ events return highly concordant results across the three approaches; MYCN neighbor genes occupying top ranks followed by known neuroblastoma altered genes, PTPRD and ATRX and novel genes including SHANK2 and DLG2 located at chr11.q13 and chr11.q14 respectively, and others such as AUTS2 at chr7.q11 and CACNA2D3 at chr3.p14 (Fig. 4c,e,g). On the other hand, non-coding recurrent alterations consistently reflect as top ranking genes, MYCN and TERT and their respective neighbor genes at chr2.p24 and chr5.p15 (Fig. 4d,f,h).
In order to provide an integrated overview of the landscape of altered genes, we combined WGS based methods (SJ-BP and RD-BP) into a ranking of recurrently altered genes with co-localizing breakpoints, hence orthogonally validated. A total of 77 genes have at least 1 co-localizing SJ-BP and RD-BP breakpoint (Fig. 4i, Supplementary Table 6); in addition, we overlaid likely pathogenic SNV calls (Supplementary Table 7). Many altered genes cluster in specific regions associated with known oncogenes such as chr2.p24 near MYCN (11 genes) and chr5.p15 near TERT (7 genes). The ranking is led by MYCN with 37 samples harboring variants, for which orthogonal validation was obtained in 26 cases by co-localizing SJ-BP and RD-BP; those include 29 MNA and 8 HR-NA tumors. Interestingly, 11 HR-NA samples harbor alterations of MYCN (8 SVs and 3 SNVs), supporting the pathogenic role of MYCN in non-amplified tumors. TERT rearrangements were identified in 25 samples; orthogonal validation of breakpoints was observed in 12 cases (Fig. 3a). PTPRD was found altered in 20 samples, 11 of which were orthogonally validated (Supplementary Fig. 11a)25,49. We found ATRX (North=5; Ntot=12) intragenic deletions and one tandem-duplication in HR-NA tumors (Supplementary Fig. 11b)50. The SHANK2 gene was found disrupted in 11 HR-NA tumors; 3 samples involved gene fusions that did not appear to be in frame. DLG2, a newly described tumor suppressor in osteosarcoma46,47, was found disrupted in 10 samples based on SJ-BP and RD-BP analyses from a total of 14 samples, two of which involved gene fusion events. Both SHANK2 and DLG2 are located on chromosome 11q and play a role in the formation of postsynaptic density (PSD)48. Other novel candidate altered genes include AUTS2 with frequent intragenic deletions at chr7q (North=3; Ntot=18, Supplementary Fig. 12a) and the calcium channel CACNA2D3 (North=4; Ntot=11), which represents a frequent breakpoint associated with 3p deletion at chr3.p14.3 (Supplementary Fig. 12b).
A region proximal to the LINC00910 lncRNA in chromosome 17 was rearranged in 13 tumors (Supplementary Fig. 13a). Finally, the list includes additional genes with known roles in neuroblastoma and cancer: ALK, in which somatic SNVs were found in 18 samples, also harbors rearrangements (North=1; Ntot=5; Supplementary Fig. 7b)23,51 in a combined set of 23 samples (17% of all neuroblastomas). Also, CDKN2A and CDKN2B deletions were found in 3 tumors (Supplementary Fig. 13b).
Systematic validation of SVs in high-risk neuroblastoma patient samples
Throughout this study, we validated variant junctions extensively via Sanger sequencing wherever DNA was available in our tumor bank. The validations focused on key genes and included 12 proximal TERT SVs (Supplementary Fig. 5), 4 ATRX deletions (Supplementary Fig. 14), 4 proximal ALK variants (Supplementary Fig. 7c), 11 SHANK2 translocation events, 9 of which involved chromosome 17q (Supplementary Fig. 15) and 12 DLG2 variants (Supplementary Fig. 16). In total, we validated 45 SVs (Supplementary Table 8). The original CGI cancer pipeline classifies SVs into high and low confidence variants depending on the number of read pairs supporting the evidence (Nreads threshold = 10); our pipeline rescues many cases below that threshold; specifically, 6 out of 45 SVs (13.3%, 3 ATRX and 3 DLG2) fell below that threshold and returned positive Sanger validation.
SVs have a regional transcriptional effect
To gain further understanding of the functional relevance of SVs, we performed an expression quantitative trait loci (eQTL) analysis for each of the recurrent SV associated genes (Supplementary Fig. 13). The analysis, which was replicated in the two available transcriptional datasets (RNA-seq and HuEx array), reported consistent up-regulation of MYCN and TERT including their neighbor genes. We also observed up-regulation of the lncRNA LINC00910 (PRNA= 7.0 × 10−3) at chr17.q21, a region with frequent inter-chromosomal translocations (Supplementary Fig. 17a). Conversely, CDKN2A was down-regulated (PHuEx= 4.7 × 10−2; Pboth = 2.5 × 10−2) by focal deletions. Finally, PLXDC1 at chr17.q12 was also down-regulated (PHuEx= 4.7 × 10−2; Pboth = 2.5 × 10−2) in association with 17q gain breakpoints.
In addition to changes in overall gene expression by eQTL, translocations may lead to the expression of gene fusion transcripts; we explored RNA-seq samples with three available gene fusion methods (STAR-fusion33, fusionCATCHER34 and DeFUSE35, Supplementary Fig. 17b). We then refined the list of fusion transcripts to those confirmed by the presence of translocations in the WGS SV calls (Supplementary Table 9), which dramatically increased the overall agreement across the three gene-fusion detection methods (Supplementary Fig. 17c). The most frequent gene fusion event with both RNA and DNA evidence involved SHANK2; the three SHANK2 fusion events involved 17q genes: EFTUD2, MED1 and FBXL20 (Fig. 6a). DLG2 exhibited gene fusion events in two samples involving SEMA6C and MYCBP2 at chromosomes 12 and 13 respectively (Fig. 6b). However, none of the SHANK2 and DLG2 fusion transcripts appeared to be in-frame, suggesting the fusion transcripts may not be biologically relevant and that these are more likely loss of function events. Conversely, we found an in-frame fusion transcript and translocation involving FOXR1:DDX6 (Supplementary Table 9), for which oncogenic fusion events have previously been described in neuroblastoma52.
Neurodevelopmental genes are recurrently disrupted by structural variations in neuroblastoma
In order to identify pathways targeted by SVs, we considered recurrently altered genes from each of the coding (N>2) and non-coding (N>3) altered gene lists (#genes: SJ-BPcoding=109, SJ-BPnon-coding=36, RD-BPcoding=76, RD-BPnon-coding=27, CN-BPcoding=77 and CN-BPnon-coding=88, Fig. 4c-h, Supplementary Fig. 18, Supplementary Table 10). We tested each gene list for enrichment across Gene Ontology, pathway and disease gene classes using ToppGene53 (Supplementary Table 10). Genes with coding sequences altered showed consistent results across the three breakpoint mappings, revealing strong enrichment in genes involved in autism spectrum disorder susceptibility (PSJ-BP = 2.8 × 10−9; PRD-BP = 2.9 × 10−5; PCN-BP= 2.7 × 10−9) and other neurodevelopmental disorders (NDD) as well as protein localization to synapse (PSJ-BP = 1.2 × 10−5; PRD-BP = 1.1 × 10−7; PCN-BP= 2.4 × 10−6) and other neuronal related classes (Fig. 5a-c; Supplementary Table 10). The gene sets with ‘non coding’ alterations were more variable across the alternative breakpoint analyses, but were dominated by events involving MYCN and TERT in association with the disease class “stage, neuroblastoma” (PSJ-BP = 1.9 × 10−6; PRD-BP = 2.5 × 10−5; PCN-BP = 9.2 × 10−5, Supplementary Fig. 18 and Supplementary Table 10).
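For readers unfamiliar with gene-list enrichment, the sketch below shows a generic over-representation test based on the hypergeometric distribution. It is not the ToppGene implementation and does not reproduce the P-values reported above; it applies no multiple-testing correction, and apart from the list size of 109 genes (the SJ-BP coding list) all numbers in the example are invented.

```python
from math import comb


def hypergeom_upper_tail(k, n, K, N):
    """P(X >= k) when n genes are drawn from N, of which K carry the annotation."""
    if not (0 <= K <= N and 0 <= n <= N):
        raise ValueError("require 0 <= K <= N and 0 <= n <= N")
    upper = min(n, K)
    numerator = sum(
        comb(K, i) * comb(N - K, n - i)
        for i in range(max(k, 0), upper + 1)
    )
    return numerator / comb(N, n)


# Invented example: 12 of the 109 listed genes fall in an annotation class of
# 400 genes, against a background of 20,000 genes (about 2 expected by chance).
p_value = hypergeom_upper_tail(k=12, n=109, K=400, N=20_000)
print(f"P(X >= 12) = {p_value:.3g}")
```

A small upper-tail probability indicates that the overlap between the altered-gene list and the annotation class is larger than expected from random draws of the same size.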
Recurrently disrupted neurodevelopmental genes are down-regulated in high-risk neuroblastoma
To further characterize the clinical relevance of recurrently altered genes in neuroblastoma, we studied their differential expression between high-risk subtypes and low-risk (Stage 1 and 4s) groups (Fig. 5d,e, Supplementary Fig. 19). We first used gene set enrichment analysis (GSEA)54 to confirm the directionality of the regulation of gene classes enriched in recurrently altered genes; we observed down-regulation of both neuronal and synaptic genes (PHuEx= 1.09 × 10−9) and autism disorder susceptibility genes (PHuEx= 6.38 × 10−7) in high-risk tumors when compared to stage 1 low-risk tumors (Fig. 5d-f). We then focused on differential expression of genes with recurrent SVs in high-risk subtypes (Fig. 5g). As expected, known oncogenes including TERT and ALK are up-regulated in both MNA and HR-NA while MYCN is up-regulated only in MNA tumors. Known neuroblastoma tumor suppressor genes including CAMTA1 and RERE from the 1p chromosome region and PTPRD are down-regulated in both subtypes. Most genes with a role in autism disorder predisposition and those involved in neuron parts and synapse formation, are down-regulated in both high-risk subtypes; in particular expression was significantly reduced for SHANK2 (PMNA = 2.15 × 10−11;PHR-NA= 1.05 × 10−8) and DLG2 (PMNA = 2.1 × 10−8;PHR-NA= 4.86 × 10−8) in high-risk compared with stage 1 low-risk tumors (Fig. 6c) and compared to stage 4S low-risk tumors (PMNA = 1.41 × 10−3;PHR-NA= 1.82 × 10−5 and PMNA = 1.09 × 10−4; PHR-NA= 2.72 × 10−4 respectively).
Neurodevelopmental genes SHANK2 and DLG2 are frequently disrupted by chromosome 11 translocation events
High-risk neuroblastomas without MYCN amplification frequently exhibit deletion of chromosome 11q, and this event is associated with a poor outcome16,44. The most frequent breakpoints observed in this study were located at chromosome 11q13 and 11q14, disrupting the SHANK2 and DLG2 gene loci, respectively (Supplementary Fig. 19). SHANK2 translocation partners involved chromosome 17q in 10/11 WGS cases; in addition, we identified 49 samples from the SNP dataset (10.7%) with breakpoints in SHANK2 (Fig. 6a). In contrast, DLG2 translocation partners included multiple chromosomes; breakpoints were also identified in the DLG2 locus in 28 samples from the SNP dataset (Fig. 6b).
SHANK2 is a scaffold protein in the postsynaptic density (PSD) with two known coding isoforms (long: NM_012309; short: NM_133266). We therefore studied the expression pattern of SHANK2 at the exon level using both HumanExon arrays (Fig. 6c) and RNA-seq (Supplementary Fig. 20) data. Clustering analysis of SHANK2 exon expression revealed two distinct clusters corresponding to the two known coding isoforms. Expression of both isoforms was decreased in high-risk tumors compared to INTR and LOWR as observed from RNA-seq (Fig. 6d) and HuEx expression analysis (Supplementary Fig. 21a, b). Finally, in a large independent cohort55, reduced expression of the long isoform (NM_012309) was associated with increased tumor stage (P=1.62 × 10−22, Supplementary Fig. 21c) and poor survival (P=7.21 × 10−13, Supplementary Fig. 21d). Consistent with the SHANK2 expression pattern, we observed decreased activation of PSD genes based on GSEA in high-risk compared to low-risk neuroblastomas in multiple prognostic signatures (Supplementary Fig. 22). We decided to further study the long isoform of SHANK2 (NM_012309) given that nearly all SVs uniquely disrupt this splice variant, leaving the short isoform (NM_133266) intact.
SHANK2 expression inhibits cell growth and viability of neuroblastoma cells
To further elucidate the role of SHANK2 in neuroblastoma, three neuroblastoma cell lines with low or no endogenous SHANK2 expression (Supplementary Fig. 23), including SY5Y (MYCN non-amplified), Be(2)C (MYCN-amplified), and NGP (MYCN-amplified), were stably transduced to constitutively overexpress the SHANK2 long isoform or with an empty vector control. SHANK2 expression was confirmed by Western blot (Fig. 7a-c). When maintained in selection media and grown alongside empty vector controls, the SHANK2-expressing cells consistently exhibited decreased cell growth and viability as measured by RT-CES cell index (Fig. 7d-f) as well as CellTiter Glo assay (Fig. 7g-i). For SY5Y, when the control reached confluence, the cell index of the SHANK2-overexpressing line was reduced by 75% (P=3.4 × 10−5; Fig. 7d); the Be(2)C cell index was reduced by 62% (P=3.16 × 10−4; Fig. 7e), and NGP showed a 14% reduction (P=2.62 × 10−2; Fig. 7f). We also observed decreased cell viability in SHANK2-expressing cells at both 4- and 7-day endpoints using an ATP-dependent CellTiter Glo assay. Specifically, viability of SY5Y SHANK2-expressing cells was reduced to 65.51% (P=1.34 × 10−18) and 52.64% (P=4.72 × 10−26) of controls (Fig. 7g). Similar results were obtained for Be(2)C SHANK2-expressing cells (49.21% and 44.26%, P=5.76 × 10−28 and 5.74 × 10−15; Fig. 7h) and NGP (90.63% and 74.01%, P=5.11 × 10−3 and 6.01 × 10−13; Fig. 7i).
SHANK2 expression accelerates differentiation of neuroblastoma cells exposed to all-trans retinoic acid (ATRA)
We next investigated the role of SHANK2 in neuronal differentiation in Be(2)C and SY5Y cells exposed to ATRA. In the presence of ATRA, overexpression of SHANK2 accelerated differentiation as measured by presence and length of neurites compared to cell body (Fig. 7j-o; Supplementary Fig. 24a-d). While decreases in growth can be measured even without drug application, once ATRA is applied, cells overexpressing SHANK2 develop neurites more quickly, and those neurites extend further than in empty vector controls (Supplementary Fig. 24c,d). In Be(2)C cells, significant differences in neurite outgrowth normalized to cell-body area were seen at 72 hours post treatment with 1 µM ATRA (Fig. 7j-l), with SHANK2 cells exhibiting a 1.6-fold increase over controls (P=3.09 × 10−13), and the difference increased at 96 hours to 2.76-fold (P=2.37 × 10−30). Even with vehicle alone, SHANK2 cells had more neurite outgrowth per cell body compared to their empty vector counterparts at both 72 and 96 hours post treatment (P=1.02 × 10−5, P=1.25 × 10−13, respectively). In SY5Y, though differentiation takes longer and both SHANK2 cells and controls eventually reach 100% confluence with vehicle alone, SHANK2 overexpression still led to decreased confluence (P=1.69 × 10−6, Supplementary Fig. 24b). In analyzing total neurite outgrowth without normalization for cell body area, SY5Y ATRA-treated SHANK2 cells outpaced controls starting at hour 144 post treatment and continued to lead until the end of the experiment, with a total neurite measurement 1.55-fold increased over controls (P=1.62 × 10−35; Supplementary Fig. 24d). Once normalized, SHANK2 cells have higher measured outgrowth starting at 75 hours post treatment, hour 96, and maintain from there. At 195 hours post treatment, SHANK2 cells treated with 5 µM ATRA displayed neurites at a 1.71-fold increase over their empty vector controls (P=2.36 × 10−17; Fig. 7m-o).
Taken together, these data suggest SHANK2 is a newly identified haplo-insufficient tumor suppressor in high-risk neuroblastoma that is disrupted by recurrent somatic structural variation in the MYCN non-amplified subset of cases.
DISCUSSION
Sequencing studies of neuroblastoma tumors have revealed a relatively low SNV burden and limited mutational landscape, leaving aneuploidy and large segmental chromosomal alterations as the main candidate driver mutations in many tumors23. Structural variants (including insertions, deletions, duplications and translocations) may also function as potent cancer drivers, as demonstrated by the discovery of rearrangements near TERT driving aberrant TERT expression in many high-risk neuroblastomas27,28. In this study, we have substantially expanded the landscape of structural variation in neuroblastoma and revealed its functional impact in the disease through integrative genomic analysis of a large cohort of patient samples profiled by whole genome sequencing and SNP arrays together with additional transcriptional data. To the best of our knowledge, we present here the largest fully integrated genome-wide survey of structural variation in neuroblastoma, combining alignment-based (SJ-BP) and copy number-based (RD-BP and CN-BP) breakpoint analyses. Despite the rich landscape of structural variation described in this study, we did not identify any oncogenic gene fusion events, with the exception of the previously reported case of FOXR152, which appears to occur preferentially in intermediate-risk tumors within our cohort.
Overall, we showed that structural variation considerably increases the genetic complexity of high-risk neuroblastomas. This complexity is most evident in high-risk tumors without amplification of MYCN (HR-NA), which show increased chromosomal instability44, as confirmed by structural variation and breakpoint burden analyses. Moreover, this subset harbors more SVs in known cancer genes as well as novel genes. Interestingly, the SNV burden is very similar between the MNA and HR-NA groups. As shown in pan-cancer studies, the underlying mechanisms potentiating chromosomal instability (CIN) and somatic SNV burden may differ56. Nonetheless, loss of TP53 function by deleterious mutations, associated with increased CIN in pan-cancer studies, is largely absent in primary neuroblastomas, and the drivers of the observed increase in chromosomal instability in neuroblastoma remain unknown. Despite their lesser burden, MNA tumors also present widespread structural variation, although it is often associated with the MYCN locus. Indeed, translocations involving MYCN complex events may have a broader effect throughout the genome, as observed in cases where MYCN appears co-amplified with ALK and TERT.
Chromothripsis is a well-documented genetic alteration in neuroblastomas, reported in as many as 18% of high-stage tumors57. Similarly, in the current study, 19% of high-risk tumors from the TARGET cohort exhibited chromothripsis (N=20/105), involving a total of 27 chromosomal regions. These events largely overlap with amplification of MYCN (as well as some ALK cases) on chromosome 2p and TERT on chromosome 5p, suggesting an important role of chromothripsis followed by purifying selection as an underlying cause of those alterations. We also observed high breakpoint density in the X chromosome of females based on the SNP data, which could be explained by higher tolerance to chromothripsis of diploid regions. Altogether, these results confirm the prevalence and oncogenic role of chromothripsis in neuroblastomas; future studies need to address whether it represents an opportunity for therapeutic intervention.
Throughout this study, we report a common genetic repertoire of altered genes between neuroblastomas and neurodevelopmental disorders (NDD) such as autism. A link between cancer and autism has been previously established in PTEN-associated germline syndromes58. Furthermore, multiple autism susceptibility genes also have a known role in cancer59. Certain germline deletions with NDD associations, such as 10p1560 and 16p24.361, are reported here to occur somatically in neuroblastoma. We hypothesize that SHANK2 and DLG2, which encode proteins of the postsynaptic density (PSD) and are disrupted by structural variants, are novel candidate neuroblastoma tumor suppressors involved in neuronal differentiation; additional candidate altered genes with a role in neurotransmission and synapse function and involvement in autism include AUTS2, CNTNAP2, NRXN1 and CTNND262. These alterations are more prevalent in high-risk tumors without amplification of MYCN, which is itself a potent driver of dedifferentiation63. Transcriptomic analyses have shown that neural lineage pathways are commonly down-regulated in high-risk neuroblastomas compared to low-risk signatures64. Synaptogenesis is a key process in neuronal differentiation, and mutations in genes involved in the formation of synapses have frequently been implicated in NDD (also termed shankopathies)65. Furthermore, DLG2 has recently been described as a tumor suppressor in osteosarcoma46,47. We propose that the dysregulation of the synaptic genes SHANK2 and DLG2 is involved in maintaining the undifferentiated state of the neuroblastic cancer cell. In particular, here we show that SHANK2 expression reduces cell growth and increases neurite outgrowth in human-derived neuroblastoma cell lines in the presence of ATRA. The sensitizing effect of SHANK2 expression on ATRA treatment highlights the importance of understanding the mechanisms of differentiation disrupted in neuroblastoma.
Retinoids are currently utilized as maintenance therapy in high-risk neuroblastoma standard of care66,67; subsequent studies with larger cohorts should evaluate the contribution of alterations in neurodevelopmental genes to retinoic acid treatment response as a maintenance therapy.
Despite the background genetic heterogeneity of neuroblastoma subtypes, SVs systematically target telomere maintenance mechanisms and neurodevelopmental pathways influencing differentiation. While MYCN-amplified tumors are largely sustained by the oncogene’s strong effect, other high-risk tumors suffer recurrent hits in both key pathways. Altogether, we depict a new landscape of structural variation in neuroblastoma and provide mechanistic insight into the neuronal development abrogation hallmark of the high-risk form of this pediatric disease.
Supplementary Data
Supplementary data include 10 tables and 24 figures.
Author Contributions
S.J.D. designed the experiment. G.L., K.L.C. and S.J.D. drafted the manuscript. G.L. and S.J.D. performed analyses of SVs from WGS. G.L. performed RNA data analysis. G.L. and K.L.H. performed telomere analyses and allele-specific expression studies. G.L. and A.M. performed de novo transcript analyses. G.L. and K.S.R. performed fusion transcript analyses. K.L.C. and M.D. performed Sanger sequencing. K.L.C., M.D., L.M.F., and E.H. performed SHANK2 experiments. Z.V. assisted with sequence data analysis. J.S.W. and J.K. generated RNA sequencing data. S.A. and R.C.S. generated array-based expression data. H.S. and P.W.L. generated methylation array data. All authors commented on or contributed to the current manuscript.
ONLINE METHODS
Description of the dataset and data availability
The whole genome and RNA-sequencing data were downloaded directly from NCBI dbGaP (https://www.ncbi.nlm.nih.gov/gap) with study-id phs000218 and accession number: Neuroblastoma (NBL)-phs00046723. The HumanExon arrays and methylation arrays were obtained from the TARGET data matrix at NCI (https://target-data.nci.nih.gov/). In addition to the different TARGET datasets, we used the SEQC RNA-seq dataset (Gene Expression Omnibus: GSE62564) for survival analysis. The primary dataset in this study comprises 128 tumor/blood matched pairs whole-genome sequenced by Complete Genomics (CGI). CGI short-read sequencing uses commercial software for processing, alignment to the reference genome (hg19) and variant calling (Cancer Pipeline 2.0; http://www.completegenomics.com/documents/DataFileFormats_Cancer_Pipeline_2.0.pdf). Processed variant calls from CGI are available through the TARGET data pages at NCI (https://target-data.nci.nih.gov/). Additional genomic profiles from available neuroblastoma cell lines included SNP genotyping arrays (Gene Expression Omnibus: GSE89968) and RNA-seq from 39 neuroblastoma cell lines (Gene Expression Omnibus: GSE89413)68.
Copy number segmentation, visualization (IGV) and recurrence analysis
CGI “somaticCnvDetailsDiploidBeta” files provide information on estimated ploidy and tumor/blood coverage ratio for every 2 kb along the genome. We used in-house scripts to reformat coverage data for processing with the “copynumber” R Bioconductor package69. Data were smoothed by Winsorization (winsorize) and segmented with the piecewise constant segmentation (pcf) algorithm with attributes kmin=2 and gamma=1000. Segmented data were visualized with IGV and used as input for GISTIC2.038, with GISTIC attributes v 30 -refgene hg19 -genegistic 1 -smallmem 1 -broad 1 -twoside 1 -brlen 0.98 -conf 0.90 -armpeel 1 -savegene 1 -gcm extreme -js 2 -rx 0.
Filtering of CGI Structural Variants calls
The CGI Cancer Pipeline 2.0 provides a full report including quality control, variant calling and CNV analyses. The “somaticAllJunctionsBeta” files provide information for individual junctions detected in the tumor genome that were absent in the normal genome. The “highConfidenceJunctionsBeta” files contain a filtered subset of the junctions reported in the “somaticAllJunctionsBeta” files. This subset comprises junctions that likely resulted from a true physical connection between the left and right sections of the junctions. To obtain the junctions reported in this file, the following filter criteria were applied to the junctions in somaticAllJunctionsBeta: include the junction if DiscordantMatePairAlignments ≥ 10 (10 or more discordant mate pairs in cluster); AND include the junction if JunctionSequenceResolve = Y (local de novo assembly is successful); AND exclude an interchromosomal junction if present in any genomes in baseline samples (FrequencyInBaseline > 0); AND exclude the junction if it overlaps with known underrepresented repeats (KnownUnderrepresentedRepeat = Y): ALR/Alpha, GAATGn, HSATII, LSU_rRNA_Hsa, and RSU_rRNA_Hsa; AND exclude the junction if the length of either of the side sections is less than 70 base pairs. Additional filtering of high-confidence variants was initially applied to the whole TARGET repertoire of tumor datasets, including ALL, AML, NBL, OS, CCSK and RT, in order to remove duplicate junctions indicative of common variation. We added further filters to remove rare/common germline variants that passed CGI filters as well as artifacts and low-confidence variants. To this end, we used the Database of Genomic Variants (DGV v. 2016-05-15, GRCh37) to remove SVs whose reciprocal overlap with DGV-annotated common events was higher than 50%. We only filtered variants whose type matched in both the CGI SV set and the DGV database.
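The junction criteria and the DGV comparison described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the pipeline used in the study: the dictionary keys mirror the CGI field names quoted in the text, whereas `is_interchromosomal`, `left_length`, `right_length` and all function names are hypothetical. We also assume that "reciprocal overlap" means the overlap length divided by the length of each interval, taking the smaller of the two fractions.

```python
def passes_high_confidence(j):
    """Apply the high-confidence junction criteria listed in the text to one record."""
    if j["DiscordantMatePairAlignments"] < 10:
        return False  # fewer than 10 discordant mate pairs in the cluster
    if j["JunctionSequenceResolve"] != "Y":
        return False  # local de novo assembly was not successful
    if j["is_interchromosomal"] and j["FrequencyInBaseline"] > 0:
        return False  # interchromosomal junction seen in baseline genomes
    if j["KnownUnderrepresentedRepeat"] == "Y":
        return False  # overlaps a known underrepresented repeat
    if min(j["left_length"], j["right_length"]) < 70:
        return False  # one of the side sections is shorter than 70 bp
    return True


def reciprocal_overlap(a_start, a_end, b_start, b_end):
    """Smaller of the two overlap fractions (assumed definition)."""
    overlap = min(a_end, b_end) - max(a_start, b_start)
    if overlap <= 0:
        return 0.0
    return min(overlap / (a_end - a_start), overlap / (b_end - b_start))


def matches_common_variant(sv, dgv_events, threshold=0.5):
    """True if an SV has > threshold reciprocal overlap with a DGV event of the same type."""
    for ev in dgv_events:
        if ev["chrom"] != sv["chrom"] or ev["type"] != sv["type"]:
            continue  # only variants of matching type are compared
        if reciprocal_overlap(sv["start"], sv["end"], ev["start"], ev["end"]) > threshold:
            return True
    return False
```

An SV would then be retained only when `passes_high_confidence` is true and `matches_common_variant` is false.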
Annotation of Structural variants from WGS
We used RefSeq gene definitions for hg19 downloaded from UCSC (10/31/2018) to map structural variant calls to nearby genes. We used two approaches to map variants. First, numerical changes (tandem duplications and deletions of size <2Mb) containing whole genes were used to define copy number alterations. Second, we mapped breakpoints relative to gene exonic coordinates; SVs were considered ‘disrupting’ when either breakpoint localized between the transcription start and end of any isoform of a gene, and ‘proximal’ when localized within 100Kb upstream or 25Kb downstream of the most distal isoform of each gene; a graphical description is provided in Figure 4b. In addition, we considered as ‘intronic’ those SVs in which both breakpoints mapped to the same intron.
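The ‘disrupting’ and ‘proximal’ definitions can be illustrated with a short sketch. It assumes that each gene is given as a list of isoform (transcription start, transcription end) genomic coordinates on the same chromosome as the breakpoint, and that upstream and downstream are interpreted relative to the gene strand; the function name and the "none" label are hypothetical and do not come from the study.

```python
def classify_breakpoint(pos, isoforms, strand, upstream=100_000, downstream=25_000):
    """Classify one breakpoint position relative to one gene.

    isoforms: list of (tx_start, tx_end) with tx_start <= tx_end (genomic coordinates)
    strand:   '+' or '-'; decides which side of the gene is upstream
    """
    # Disrupting: the breakpoint falls inside at least one isoform
    if any(start <= pos <= end for start, end in isoforms):
        return "disrupting"

    # Gene span defined by the most distal isoform boundaries
    gene_start = min(start for start, _ in isoforms)
    gene_end = max(end for _, end in isoforms)

    # On the minus strand, upstream lies to the right of the gene
    if strand == "+":
        left_flank, right_flank = upstream, downstream
    else:
        left_flank, right_flank = downstream, upstream

    if gene_start - left_flank <= pos <= gene_end + right_flank:
        return "proximal"
    return "none"
```

For example, with two isoforms spanning 500,000 to 530,000 on the plus strand, a breakpoint at 510,000 is classified as disrupting and one at 400,000 as proximal.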
Processing Copy Number segmentation from WGS
We processed cnvDetailsDiploidBeta files from the Cancer Pipeline 2.0 (CGI©) containing average normalized read-depth coverage values at every 2Kb sliding window throughout the genome. Then, tumor/blood normalized ratios were subject to piecewise constant segmentation algorithm69 implemented in the ‘copynumber’ R package. The processed segmentation file is available through the TARGET data matrix (https://target-data.nci.nih.gov/).
Processing Copy Number from SNP arrays
We genotyped 914 matched patient tumor and normal samples using Illumina SNP arrays. Of these, 488 samples were previously reported32 and reanalyzed here. The complete dataset comprises three different Illumina genotyping chip architectures: HumanHap550, Human610-Quad and HumanOmniExpress. We processed the common set of 316,210 probes for further segmentation. Copy number segmentation was obtained using the SNPrank algorithm implemented in the NEXUS® software platform for tumor samples. As a result, we obtained a segmentation file with 914 unique samples, available at the TARGET data matrix.
Identification and Annotation of copy number amplifications, deep deletions and breakpoints from WGS and SNP datasets
Breakpoints were called from segmentation profiles. We observed frequent artifacts at subtelomeric and pericentromeric regions; those regions were filtered out from the analyses. A breakpoint is called when the absolute value of the copy number log-ratio difference between contiguous segments is higher than 0.152 for SNP arrays and 0.304 for WGS; these cutoffs correspond to 10% and 20% copy number changes, respectively (i.e., for diploid regions, ΔCN = 0.2 and ΔCN = 0.4). In the WGS dataset, we called amplifications for segments with CN >= 8 and deep deletions for segments with CN <= 0.5; for SNP arrays we used less stringent cutoffs for amplifications (CN >= 4.5) and deep deletions (CN <= 0.9). We used different thresholds because the dynamic range of SNP arrays is narrower than that of sequencing platforms and their resolution is also lower due to large regions with low probe density. We used RefSeq gene definitions for hg19 downloaded from UCSC (October 31st, 2018) to map copy number alterations and breakpoints to nearby genes. Genes were considered amplified or deep deleted when all isoforms were contained within the altered segment boundaries. Breakpoints were considered ‘disrupting’ when the breakpoint localized between the transcription start and end of any isoform of a gene, and ‘proximal’ when localized within 100Kb upstream or 25Kb downstream of the most distal isoform of each gene; a graphical description is provided in Figure 4b.
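As an illustration of the calling rules above, the sketch below scans contiguous segments of one sample and applies the platform-specific cutoffs. The segment tuple layout, the function names and the choice of reporting the breakpoint at the start of the downstream segment are assumptions made for this example; filtering of subtelomeric and pericentromeric regions is not shown.

```python
# Cutoffs quoted in the text
LOG_RATIO_CUTOFF = {"SNP": 0.152, "WGS": 0.304}
AMPLIFICATION_CN = {"SNP": 4.5, "WGS": 8.0}
DEEP_DELETION_CN = {"SNP": 0.9, "WGS": 0.5}


def call_breakpoints(segments, platform):
    """Call breakpoints between contiguous segments of one sample.

    segments: list of (chrom, start, end, log_ratio), sorted by chrom and start
    Returns a list of (chrom, position, delta) for each called breakpoint.
    """
    cutoff = LOG_RATIO_CUTOFF[platform]
    calls = []
    for prev, cur in zip(segments, segments[1:]):
        if prev[0] != cur[0]:
            continue  # never call a breakpoint across chromosomes
        delta = cur[3] - prev[3]
        if abs(delta) > cutoff:
            # Assumption: report the start of the downstream segment
            calls.append((cur[0], cur[1], delta))
    return calls


def classify_copy_number(cn, platform):
    """Label a segment as amplification, deep deletion, or neither (None)."""
    if cn >= AMPLIFICATION_CN[platform]:
        return "amplification"
    if cn <= DEEP_DELETION_CN[platform]:
        return "deep_deletion"
    return None
```

The same log-ratio step of 0.2 between two adjacent segments is therefore called as a breakpoint on the SNP platform but not on WGS.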
Tumor mutational burden analyses
We obtained measures representative of the burden of different mutation types under study (including SNVs, SVs and BPs). To this end, the density of mutations of every type is calculated as the average number of mutations in a given sample per sequence window (10Mb for SVs and BPs and 1Mb for SNVs). Instead of a single density value per sample we measure mutational densities for each chromosomal arm, excluding short arms with very low mappability (13p, 14p, 15p, 21p, 22p and Y chromosome). The remaining 41 chromosomal arms in each sample represent single sample distributions of mutational densities from which quantiles are obtained. We used the interquartile mean (IQM) since it offered a measure robust against outliers while conserving the variability across samples even in low-density breakpoint samples.
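The burden measure can be sketched as below. This is an illustrative simplification rather than the study's implementation: the interquartile mean is approximated by discarding the lowest and highest quarter of the per-arm densities (rounded down) and averaging the rest, and the function names are hypothetical.

```python
def mutation_density(n_events, arm_length_bp, window_bp):
    """Average number of events per window along one chromosome arm."""
    return n_events / (arm_length_bp / window_bp)


def interquartile_mean(values):
    """Mean of the central values after trimming a quarter from each tail.

    Simplified IQM: floor(n / 4) values are dropped from each end, so for
    41 chromosome arms the central 21 densities are averaged.
    """
    ordered = sorted(values)
    k = len(ordered) // 4
    core = ordered[k:len(ordered) - k]
    return sum(core) / len(core)
```

With this definition, a single arm carrying an extreme number of breakpoints (for example, one affected by chromothripsis) does not dominate the per-sample burden estimate.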
Analysis of recurrent structural variants
We studied SV recurrence in terms of overlap with and proximity to known genes defined by the RefSeq database (hg19 genome version, obtained from UCSC on 10/31/2018). Genomic overlaps among SVs and between SVs and coding genes were calculated using the R package GenomicRanges70. Overlapping variants were those whose junction and/or spanning segment overlapped the start and end coordinates of known genes. Proximal variants were defined as those affecting regions upstream and downstream of known genes; the high-confidence set considered variants within 50 Kb upstream of the transcription start site and 50 Kb downstream of the transcription end, while the extended set incorporated variants up to 200 Kb upstream of the transcription start site.
Filtering and annotation of somatic SNVs
The CGI Cancer Pipeline 2.0 provides somatic variant calls for SNVs and small indels. Given the gapped nature of CGI reads, which leads to a high noise-to-signal ratio, we applied additional SNV filtering. We first annotated CGI SNV calls using the Variant Effect Predictor (VEP) pipeline71. Our filter follows two steps: 1) collection of high-quality somatic non-synonymous coding variants (Phred-like Fisher’s exact test P<0.001) annotated as having a moderate or high functional impact; this set of variants was combined with the COSMIC catalogue of pathogenic variants (release v84); 2) hot-spot analysis of variants from our combined catalogue (step 1) to identify both clonal and low allele frequency pathogenic variants.
Gene fusion analysis
Gene fusions were identified from RNA-seq data using three available tools: STAR-fusion33, fusionCATCHER34 and DeFUSE35. We then collected fusion events that matched interchromosomal events from the CGI structural variation calls. The three methods returned large numbers of events (NSTAR-fusion=24,837, NfusionCATCHER=6,898, NDeFUSE=22,837) with an overlap of only 68 cases (0.1%), whereas the subset of fusion events with a matching translocation from WGS comprised 66 events (NSTAR-fusion=45, NfusionCATCHER=36, NDeFUSE=44), with the three methods overlapping in 26 (40%) events. These data show that combined DNA/RNA evidence returns high-precision gene fusion events (see Supplementary Fig. 13b,c).
Sanger Sequence Validation
From alignment and variant calls provided by Complete Genomics, structural variation breakpoints were mapped and junction sequence was assembled using public UCSC Genome browser. The assembled sequence was then submitted into Primer3 to engineer PCR primers to bridge the translocation. Resultant primers were then checked against BLAT as well as an internal algorithm for binding specificity. PCR reactions were then carried out on 25 ng of DNA using optimized conditions for each reaction. Products were checked via gel electrophoresis for specificity to expected size and uniqueness. If the product had multiple bands, the entire remaining sample would be run out then bands of interest excised and the DNA extracted using MinElute Gel Extraction Kit from Qiagen. Products with single bands were cleaned up and prepared for sequencing using the MinElute PCR Purification Kit (Qiagen). Samples were then sequenced at a core facility with 2 picomoles of the same primer used to create amplicon. Resultant sequences were then aligned to the expected sequence assembled from CGI results and mapped to chromosomes using BLAT at UCSC genome browser.
Cell Culture
Cells were grown in RPMI-1640 with HEPES, L-glutamine, and phenol red (cat # 22400-089), supplemented with 10% Fetal Bovine Serum, 1% antibiotic-antimycotic (cat # 15240-062), and 1% L-glutamine (cat # 25-005-CI) in 5% CO2 at 37°C in the dark. Transduced cells also had the appropriate concentration of puromycin in media for selection.
LentiVirus Infection
Lentiviral vector plasmid for the long isoform of SHANK2 (NM_012309) was obtained commercially from GeneCopoeia (EX-H5274-Lv105). Empty vector control plasmid pLv105 was originally from GeneCopoeia. Creation of the virus media was accomplished using Lipofectamine 3000 ™ applied to 293TN cells with packaging plasmid psPAX2, envelope plasmid pMD2.g, and the Lentiviral backbone plasmid containing the ORF for NM_012309 or empty vector. Infectious viral media was pooled over 2 days then filtered through 0.45µm nitrocellulose and combined with polybrene at 8 µg/mL media and applied to cells. Following infection, transduced cells were selected with puromycin in line-dependent concentrations.
Growth and Proliferation Assay using RT-CES
Cells were plated in 96-well RTCES microelectronic sensor arrays (ACEA Biosciences, San Diego, CA, USA). Density measurements were made every hour. Cell densities were normalized to 5 hours post-plating.
Cell Viability Assays
Cells were plated in clear-bottomed, 96-well plates in 200 µL media and allowed to grow under normal conditions for either 4 or 7 days. Before reading, 100 µL media was replaced with equal volume of CellTiter Glo ® reagent and read on a GloMax Multi-detection instrument (Promega). Arbitrary luminescence units were normalized to empty vector-transduced controls and results expressed as percentages of control levels from the same assay.
ATRA-Induced Differentiation
Cells were plated in normal media at optimized densities for each parental line in 96-well plates and allowed 24-48 hours to firmly attach to plates. Media was then switched for low-serum media containing either 1% or 3% FBS and allowed 24 hours to equilibrate, after which it was replaced with low-serum media supplemented with varying concentrations of ATRA (all-trans-retinoic-acid, Sigma, R2625) or vehicle (DMSO) alone, in a volume corresponding to the highest concentration of ATRA for each experiment. Plates were then left in normal growth conditions and protected from light. RA media was refreshed every 72 hours to prevent oxidation. Plates were placed in an IncuCyte ZOOM™ instrument for live cell imaging. Each well was imaged every four hours, and the “NeuroTrack” software module was used to quantify neurite outgrowth.
Protein Isolation and Western Blotting
Whole cell lysates were created by applying denaturing lysis buffer containing protease/phosphatase inhibitors (Cell Signaling Technology, 5872) to cells on ice and allowing lysis for 30 minutes. The total sample was sonicated for 5 seconds and spun at max speed in a microcentrifuge for 15 minutes at 4°C before collecting supernatant to clean tube. Quantification of protein was done using the Pierce BCA Protein Assay Kit (Thermo, 23227). Protein was loaded on 4-12% Tris-Glycine gels, transferred to PVDF membrane, and probed with antibodies in 5% milk in TBST. Antibody stripping used Restore ™ Stripping buffer (Thermo, 21059). Detection of HRP-conjugated secondary antibodies used SuperSignal™ West Femto Maximum Sensitivity Substrate (Thermo, 34096).
Acknowledgements
This work was supported in part by NIH grant R01-CA124709 (SJD) and the Roberts Collaborative Forefront Award (GL). This project was also funded in part by a supplement to the Children’s Oncology Group Chair’s grant CA098543 and with federal funds from the National Cancer Institute, National Institutes of Health, under Contract No. HHSN261200800001E to S.J.D and Complete Genomics.
Footnotes
+ Equal contribution.