Convergent architecture of the transcriptome in human cancer

Lihua Zou

doi:10.1101/028415

ABSTRACTS

Despite large-scale efforts to systematically map the cancer genome, little is known about how the interplay of genetic and epigenetic alternations shapes the architecture of the tumor’s transcriptome. With the goal of constructing a system-level view of the deregulated pathways in cancer cells, we systematically investigated the functional organization of the transcriptomes of 10 tumor types using data sets generated by The Cancer Genome Atlas project (TCGA). Our analysis indicates that the human cancer transcriptome is organized into well-conserved modules of co-expressed genes. In particular, our analysis identified a set of conserved gene modules with distinct cancer hallmark themes involving cell cycle regulation, angiogenesis, innate and adaptive immune response, differentiation, metabolism and regulation of protein phosphorylation. We applied a network inference approach to nominate candidate drivers of these conserved gene modules. The predicted drivers have consistent cancer-relevant functions related to the specific hallmarks and are enriched with cancer consensus genes and significantly mutated genes. We showed genetic alternations of TP53 and other cell cycle drivers have major downstream transcriptional impact on cell cycle regulation. Collectively, our analysis provided global views of convergent transcriptome architecture of human cancer. The result of our analysis can serve as a foundation to link diverse genomic alternations to common transcriptomic features in human cancer.

INTRODUCTION

The evolutionary history of a cancer cell has documented the accumulation of random ‘passenger’ mutations along with a few rate-limiting ‘driver’ events which confer fitness benefit to the cell. The cumulative effect of acquired driver events eventually set cancerous cells into uncontrolled state of proliferation, immortality, angiogenesis, cell death, invasion and metastasis. However, how the genetic and epigenetic effects propagate through intricate gene and biochemical circuits to cause deregulated cellular states within a cancer cell is still largely unknown. A promising approach is to decompose the oncogenic state of tumor cells directly in terms of the activity of sets of conserved biological features that has been described as the hallmarks of cancers¹.

The comprehensive multi-dimensional genomic data provided by TCGA datasets across many tumor types offers an unprecedented opportunity to deepen our understanding of cancer biology and the discovery of new therapeutic and diagnostic targets. Gene expression analyses using unsupervised clustering approaches have been widely used in cancer research. Although informative, these methods analyzed gene level information and are prone to intrinsic noise caused by various biological and experimental factors. During the last decade, studies^2-5 showed that gene modules consisting of co-regulated or co-expressed genes can be considered as higher-order building blocks of the global transcriptomic network and have been successfully applied to detect gene-disease association by improving the statistical power beyond the level of individual genes. Systematic data-driven approaches^6-10 have also been developed to identify co-expressed gene modules in order to elucidate patterns of transcriptome organization across tissues and species.

We adopt the data-driven strategy to investigate the co-expression networks across 10 TCGA human cancer data sets including glioblastoma multiforme (GBM), ovarian carcinoma (OV), colon and rectal cancer (COADREAD), lung squamous cell carcinoma (LUSC), breast invasive carcinoma (BRCA), head and neck squamous cell carcinoma (HNSC), kidney renal clear cell carcinoma (KIRC), bladder urothelial carcinoma (BLCA), uterine corpus endometrioid carcinoma (UCEC) (Supplementary Table 1). Our analysis pipeline has four components: 1) construction of the co-expression network for each cancer data set and decomposition of the network into Co-expressed Gene Modules (CGMs) (Figure 1a); 2) identification of ‘conserved’ (or overlapping) CGMs across multiple tumor types (Figure 1b); 3) extraction of the core genes of conserved CGMs and classification of CGMs by cancer hallmark themes (Figure 1c); and 4) nomination of putative drivers of CGMs based on analysis of the human protein interaction network (Figure 1d).

View this table:

Figure 1. Overview of Pan-Cancer hallmark module analysis.

a, For RNAseq data of each cancer data set, we construct the co-expression network based on the pair-wise Pearson correlation of gene expression profile. In total, 79 CGMs were identified by dynamic tree cut algorithm¹². b, We enumerate all possible pairs of CGMs from any two different cancer data sets to find 11 groups of conserved CGMs across tumor types. c, We extract core module genes for each of the conserved CGM groups. We also classify the conserved CGMs into 7 functional categories by their enriched cancer hallmarks. d, We map the core module genes onto a human protein interaction network and apply NetSig to nominate module drivers (Supplementary Methods).

Overall, we identified a set of conserved cancer CGMs across multiple tissue lineages. These conserved cancer CGMs are enriched with distinct gene ontology and cancer hallmark themes. Predicted module drivers have consistent cellular functions with their corresponding hallmarks and are enriched with cancer consensus genes and significantly mutated genes.

RESULTS

Co-expression network analysis reveals conservation of gene modules

We followed a previously described method⁹ and constructed gene co-expression networks on the basis of Pearson correlation between the RNA-seq expression profiles for the 10 TCGA cancer data sets (Supplementary Methods). For each cancer data set, co-expressed gene modules (CGMs) were identified in two steps: First, we performed unsupervised hierarchical clustering of the gene expression profiles using the topological overlap^8,11 as a pair-wise distance measure; and then applied the dynamic tree cut method¹² to decompose the dendrogram at different height and merge the clusters to form stable CGMs iteratively (Supplementary Methods). We defined 12 CGMs in GBM, 5 in OV, 10 in COADREAD, 8 in LUSC, 7 in BRCA, 9 in HNSC, 11 in BLCA, 8 in KIRC, and 9 in UCEC. In total, 79 cancer CGMs were identified by this analysis.

Next we tested whether these CGMs can be validated using data from other experimental platforms and in independent data sets. We first compared the TCGA GBM expression data (syn1446212) measured by RNA-seq and by Affymetrix microarrays. We applied the same procedures to both data sets and were able to detect the majority of the CGMs defined using Affymetrix microarray when using RNA-seq data, suggesting that CGMs are reproducible across technical platforms (Supplementary Figure 1). We performed a similar comparison using 2 independently published GBM expression data sets (GSE4271, GSE13041) and reassuringly, as above, the majority of CGMs detected in the TCGA data were also detected in the external data sets demonstrating the robustness of the CGMs (Supplementary Figure 1).

To determine whether CGMs from different cancer data sets were composed of overlapping genes, we enumerated all pair-wise CGMs (n=2,756) from any two different cancer data sets and identified 509 pairs of CGMs with significant mutual overlap (Bonferroni corrected hypergeometric test, p < 0.05, Supplementary Materials). The list of overlapping pairs of CGMs is provided in Supplementary Table 2. Surprisingly, 63 out of 79 (82%) CGMs showed significant overlap with CGMs detected in at least two different tumor types (Figures 2 and 3). Taken together, our analysis suggests that the human cancer transcriptome is organized into modules of co-expressed genes that are conserved across tumor types.

View this table:

Figure 2. Detection of conserved co-expressed gene modules.

a-i, Each dendrogram corresponds to the co-expression network of one of the 10 cancer data set (COAD and READ are combined into one cancer data set) produced by average linkage hierarchical clustering of genes on the basis of topological overlap¹¹. The vertical scale of the dendrogram is the measure of distance i.e. 1 − ||d||, where d is Pearson correlation of a pair of expression profile. CGMs in each cancer data set are assigned unique color to distinguish different modules and numbered uniquely (e.g. M1, M2.) within the set. The M2 module of OV (indicated by ^*) is at the borderline of modularity cutoff (Supplementary Materials) and is manually assigned to the pink group. We merged the M7 module of COADREAD with two close neighbors M3, M9 as indicated within the brackets. Conserved CGMs from different cancer data sets with significant gene overlap were assigned the same color as indicated by the horizontal bar beneath each dendrogram. For clarity, only CGMs conserved in at least 3 different tumor types are shown. The remaining non-conserved or poorly connected genes are colored white. The color key at the bottom shows the function categories of the conserved CGMs.

Figure 3. Hierarchical organization of conserved co-expressed gene modules.

a, The leafs of the hierarchical tree represent a CGM. Groups of highly conserved CGMs are shown as distinct subtrees of the hierarchical tree corresponding to the module-module network shown below. The dotted line represents modularity threshold. b, Each node in the module-module network represents a CGM. Each edge represents the gene overlap between two CGMs of different tumor types. The color circles indicate the conserved CGM groups. The node color matches the leaf color of the tree. c, Each colored histogram shows the Pearson correlation of all possible gene pairs from one of the conserved module group. The overlapping grey histogram to the left shows the Pearson correlation of all genes pairs from a random gene set which has the same size of the conserved module group.

Hierarchical organization of co-expressed gene modules

To better understand their global organization, we sought to define the relationships between these CGMs through the construction of a module-module network. The resulting network has 79 nodes, each represents one CGM, and 509 edges between significantly overlapping CGMs. Many module nodes appear to be closely connected on the resulting network (Figure 3b). To visualize the multi-level relationship among the CGMs, we transformed the module-module network into a hierarchical tree (Figure 3a) based on the random walk distance¹³. We partitioned the hierarchical tree into subtrees or module groups based on network modularity¹⁴ (modularity threshold=0.37, Supplementary Materials), a graph structure measure reflecting the enrichment of edges among nodes within a group (or cluster) relative to the nodes between the groups.

As shown in Figure 3b, our analysis revealed 11 groups of CGMs comprised of the 63 CGMs with high degree of mutual overlap (Figures 2 and 3). The 11 CGM groups have a median of 5 CGMs representing at least 3 different tumor types in each group. This implies that these CGM groups capture common or ‘conserved’ cellular processes across cancer types. The 11 CGM groups are shown as distinct subtrees on the transformed hierarchical tree of CGMs (Figure 3a). Notably, the red, turquoise, and orange CGMs fall within one branch delineated by the green square on Figure 3a which mostly involves immune response. The blue, purple, and magenta modules fall within another branch (enclosed by the blue square on Figure 3a) which primarily involves angiogenesis, epithelial-mesenchymal transformation (EMT) and muscle development. The hierarchical tree of identified gene modules suggests convergent functional organization of cellular programs of tumor independent of their tissue lineage.

To test whether genes belonging to each of the CGM groups are indeed co-expressed across their tumor types, we calculated the Pearson correlation for all gene pairs between their expression levels in the relevant tumor types. We confirmed the gene-gene correlation within the overlapping CGMs is significantly higher than expected from a random gene set of the same size (Kolmogorov-Smirnov test p < 2.6 × 10⁻¹⁶, Figure 3c).

Conserved expression modules are enriched with cancer hallmarks

To assess possible biological functions of the CGMs, we calculated gene ontology enrichment for individual CGM (Figure 4, Supplementary Materials). We also extract core module genes (Supplementary Materials) for each of the 11 CGM group and calculate its functional enrichment using DAVID tools (http://david.niaid.nih.gov/david/). We classified the CGMs into 7 functional categories based on their specific enrichment of gene ontology¹⁵ and cancer hallmarks (Supplementary Table 3): (i) cell cycle (FDR< 2.2 × 10⁻⁵¹, black modules of Figure 2-4); (ii) innate immune response (FDR< 7 × 10⁻¹⁴, red modules of Figure 2-4); (iii) adaptive immune response (FDR< 5.6 × 10⁻¹⁰⁹, turquoise and orange modules of Figure 2-4); (iv) differentiation (FDR< 9 × 10⁻¹⁰, brown modules of Figure 2-4); (v) angiogenesis/EMT (FDR< 2 × 10⁻⁷, blue, purple, and magenta modules of Figure 2-4); (vi) metabolism (FDR<0.1 green and pink modules of Figure 2-4); (vii) phosphorylation (FDR<0.3, yellow modules). We referred to these classified CGMs as “hallmark gene modules” and highlighted their core module genes below. We provide the full list of core module genes in Supplementary Table 4.

View this table:

Figure 4: Gene ontology of co-expressed gene modules.

The columns of the matrix are the conserved CGMs. Examples of core genes (indicated within the colored square) for each of the conserved CGMs are shown below the matrix. The color rectangulars along the diagonal indicate the conserved CGM groups (columns) and their most significantly associated gene ontology terms (rows). The color intensity of the matrix cell represents the level of association by one-sided hypergeometric test. Columns are ordered so the modules of the same group are together. The rows are grouped so the most significantly enriched gene ontology terms for a module group are together.

The cell cycle modules (class i, black modules of Figure 2-4) are enriched with genes involving chromosome instability (i.e. CENPA, BUB3, NUF2^16-18) and DNA replication (i.e. CDC45, MCM10¹⁹). Other notable genes in this module include transcription factors MYBL2, FOXM1, E2F2 and mitotic regulators such as CCNB2, CDK1, AURKB, CDC25A and CDC25C. The innate immune modules (class ii, red modules of Figure 2-4) include multiple INF-signaling and virus-response pathway genes including IFI27, IFI6, IRF9 and several MHC class I genes including HLA-B, HLA-H, HLA-L. Mutations and somatic copy number alternations (SCNAs)(doi:10.7303/syn300013) in several HLA gene family members affect about 10% of total TCGA cancer patients (Supplementary Figure 2). The adaptive immune modules (class iii, turquoise and orange modules of Figure 2-4) are present in all 10 TCGA cancer data sets for this analysis. The core adaptive immune genes include multiple inflammatory response genes CD163, CPVL, FPR3 and members of MHC class II genes including HLA-DOA, HLA-DPA1, HLA-DPB1.

The angiogenesis/EMT modules (class iv, blue modules of Figure 2-4) are present across all 10 TCGA data sets in this study. Core genes of the angiogenesis/EMT modules include FN1, MMP2, TWIST1, and WNT5A that are involved with cell adhesion, embryonic development, angiogenesis and TGF-beta mediated signaling¹. Somatic mutations and SCNAs in members of angiogenesis and TGF-beta pathway are shown to affect 37% of human cancers (Supplementary Figure 2). The differentiation modules (class v, brown modules of Figure 2-4) appear to be specific to squamous carcinoma including LUSC, HNSC and BLCA. The core genes of the differentiation module contains multiple tissue differentiation genes, for example, SOX2, TP63, KRT5, KRT16, HOXA10 and WNT signaling pathway genes WNT4 and WNT7B^20,21. SOX2 is reported previously as lineage-survival proto-oncogene with essential roles in the differentiation and development of normal squamous cell²². SOX2 is located on 3q26, a frequent amplified region in HNSC and LUSC²³. The mutual exclusivity between SOX2 and TP63 mutations (Supplementary Figure 2) suggests a convergent mechanism driving the differentiation transcriptional programs among a subset of squamous carcinoma.

Functions of imputed module drivers reflect hallmark themes

We next sought to nominate candidate drivers of the conserved CGMs. Previous studies suggest that disease genes in close proximity on the protein interaction network (PIN) will likely be functionally related^24,25. We compiled a joint human PIN dataset from various resources including HPRD²⁶, NCI-PID²⁷, Cancer Cell Map from (http://cancer.cellmap.org/cellmap/), and REACTOME²⁸ (Supplementary Material). For each of the CGM groups, we applied a network-based driver inference algorithm (NetSig, Supplementary Materials) to prioritize putative module drivers based on their proximity to the core module genes on the PIN. Figure 5 shows examples of protein subnetworks comprised of the direct protein interactions between the top candidate module drivers (red node, Figure 5) and the most connected core module genes (Supplementary Materials).

Figure 5: Examples of protein subnetworks connecting conserved module genes.

a-i, The red round shaped nodes in each protein subnetwork indicate top candidate drivers of one of the conserved CGM group. The red square shaped nodes represent the predicted drivers that are also SMGs according to MutSig analysis. The edges connecting two nodes represent the human protein-protein interactions. The predominate cellular functions of the predicted module drivers are indicated next to each protein subnetwork. The functional classes of the predicted module drivers are shown at the bottom.

Our analysis identified a total of 318 candidate module drivers (doi:10.7303/syn1960770). Many of the top candidate drivers identified by NetSig have consistent oncogenic roles relevant to the functional classes of their CGMs. For example, top candidate drivers (red nodes, Figure 5a) of the cell cycle modules (black modules, Figure 5) include TP53, BRCA1, CDKN1A, CDK2, CDC6, PCNA and BUB1A, all of which have well-known roles in the regulation of cell cycle and genomic stability¹. Notably, non-synonymous mutations and foal SCNAs of TP53 and other cell cycle drivers are present in 55% of the human cancers (Figure 6d). This highlights the role of DNA repair and genomic instability as the key enabling hallmarks of human cancer. Tumor patients carrying genetic alternations of TP53 and other cell cycle drivers tend to have increased number of SCNA events (especially deletions) (Figure 6b), which is in agreement with the observation made independently by Zack et al. Submitted. To assess the downstream transcriptional impact of the genetic alternations to the cell cycle drivers, we defined the pathway activity of a gene set in terms of the coefficient of the first principle component of the expression matrix of the pre-defined gene set²⁹. Then we calculated the pathway activity for the cell cycle modules and the knowledge-based KEGG cell cycle pathway³⁰. The activity profile of the cell cycle modules and the KEGG cell cycle pathway (Figure 6e) are concordant with each other and both highly up-regulated among tumor patients with genetic changes of cell cycle drivers (student t-test, p < 2.2 × 10⁻¹⁶). In addition, the cell cycle module genes are expressed at much higher level than adjacent non-tumor tissues across 6 TCGA tumor types with such comparison available (Figure 6f).

Figure 6: Transcriptional impact of genetic alternations of cell cycle drivers.

a, The color keys for different tumor types used in this study. b, The stacked barplot shows sample-by-sample counts of broad amplification (red) and deletion events (blue) across tumor types (horizontal axis). c, The heatmap shows the SCNAs in each cancer sample (horizontal axis) plotted by chromosomal locations (vertical axis). d, The heatmap shows the somatic mutations of key cell cycle drivers sample-by-sample across tumor types. e, The pathway activity profile ²⁹ of the core cell cycle modules and the KEGG cell cycle pathway. The color key of somatic mutations and SCNA, the scale of pathway activity and copy number value based on GISTIC2⁴³ are shown below the activity profile from left to right. f, The boxplots show the comparison between the expression level (vertical axis) of the core cell cycle genes in adjacent normal tissue versus tumor tissue in 6 different TCGA data sets.

The protein subnetworks of the angiogenesis/EMT modules (blue and purple modules, Figure 5) share several common candidate drivers including CTNNB1, SMAD4, SRC, and ITGB1, which have known regulatory and signaling roles in TCF-beta signaling, angiogenesis and EMT^1,31,32. Other notable candidate drivers in the subnetworks include RAC1, EP300, EGFR which have been shown to be frequently mutated in human cancer (doi:10.7303/syn1715784.2). The top predicted drivers of the closely related module group (magenta modules, Figure 5f and Figure 3) include several muscle development genes including DMD, DMN, SGCD and SRC among others.

The top predicted drivers of the innate immune modules (red modules, Figure 5g) include STAT2 and RNASEL among others. STAT2 is reported to mediate the innate immunity and host defense mechanism via the Type I Interferon receptor ³³. RNASEL contributes in many ways to the interferon anti-virus system³⁴. The top candidate drivers of the adaptive immune modules (turquoise and orange module, Figure 5) include LCK, CD4, CBL, PTPN11 (also known as SHP2) with important roles in diverse immune response through the antigen recognition by the T cell receptor (TCR)³⁵. The top predicted drivers of the phosphorylation modules (yellow modules, Figure 5c) are linked to members of several major signaling pathways including AR, ESR1, PIK3R1, HSP90AA1, JAK1, STAT3, SMAD4¹.

The top predicted drivers of the metabolism modules (green modules, Figure 5b) include multiple members of the cohesin complex SMC1A, SMC1B, SMC3, RAD21, STAG1 and STAG2. Cohesin complex are frequently mutated in acute myeloid leukemia (AML)³⁶ and defects in cohesin complex may underlie chromosomal instability of human cancers^37,38. Other notable candidates of the subnetwork include H2AFX, HIST3H3, RHOA, MAPK8 and MAPK14. H2AFX is frequently mutated in the Pan-Cancer data by MutSig algorithm³⁹ (doi:10.7303/syn1715784.2).

We next asked whether the module drivers are enriched with cancer genes targeted by genomic alternations. We compared the predicted module drivers with the mutation drivers identified by the MutSig algorithm in the Pan-Cancer data (doi:10.7303/syn1715784.2). There are 29 out of 318 predicted modules drivers (Supplementary Table 3) overlapping with mutation drivers (hypergeometric test p < 8 × 10⁻²⁶). There are 41 out of 318 module drivers (Supplementary Table 5) overlapping with the Cancer Consensus Genes (CCG)⁴⁰³⁸ (hypergeometric test p < 1 × 10⁻²⁰). Overall, the result suggests that the predicted module drivers likely capture aspects of the biology and shed new light on the functional context of the cancer drivers discovered via mutation analysis.

View this table:

DISCUSSION

To our knowledge, this is the first extensive investigation of human cancer transcriptome to date. Our study defined the global transcriptome architecture of human cancer. We identified 11 groups of CGMs conserved across multiple tumor types. These conserved CGMs display distinct gene ontologies and cancer hallmark themes. The highly conserved CGMs across tumor types suggest the existence of selective force shaping the cancer transcriptome. Such conserved CGMs can be viewed as ‘building blocks’ of the cancer transcriptiome and be used as surrogate for the common cellular functions of human cancer. We noticed these conserved CGMs are significantly correlated with a large collection of published gene signatures (data not shown) although they are derived directly from the data and not dependent on any prior knowledge. As an example, we showed that the activity profile of the cell cycle modules is highly concordant with the knowledge-based KEGG cell cycle pathway (Figure 6e).

As a starting point to understand the common genetic and regulatory mechanism of these conserved CGMs, we integrated cancer transcriptome and interactome to nominated drivers for the conserved CGMs. We showed the predicted module drivers recapitulate known tumor biology and have consistent cancer-relevant functions with the hallmark themes of the conserved CGMs. Our analysis support that the cell cycle regulation and chromatin maintenance are the key system hallmarks of human cancer. Genetic alternations of TP53 and other cell cycle drivers have major downstream transcriptional impact on the cell cycle regulation from our Pan-Cancer analysis. We anticipate future studies to elucidate mechanism leading to the deregulation of the hallmark gene modules would have direct implications for the design of new drug targets and contribute to novel pathway-focused therapeutic paradigm.

METHODS SUMMARY

The 10 TCGA Pan-Cancer data sets used for our study are available from Sage Bionetworks (doi:10.7303/syn300013). The RNAseq molecular profiling was based on IlluminaHiSeq_RNAseqV2. The RNAseq data preprocessing was carried out using the SeqWare Pipeline project, version 0.7.0. The normalized level 3 RNAseq expression was based on RSEM⁴¹. Significantly recurrent SCNAs were identified using GISTIC 2.0^42,43 on relative total copy-number data after correction for sample purity. Significantly mutated genes were identified using MutSig³⁹. Co-expression network analysis was based on WGCNA package⁴⁴.

Author Contributions

LZ conceived the study and performed the analysis and wrote the paper.

ACKNOWLEDGEMENTS

The authors would like to acknowledge TCGA and Sage Bionetworks for the collaborative atmosphere to generate and make the PanCan12 data freeze public available and the support of all institutions involved. We would like to thank Gad Getz, Lynda Chin, Chip Stewart for the commenting on the preliminary draft of the manuscript. We would like to thank our Broad Institute colleagues Gordon Saksena, Mike Lawrence, Andrew Cherniack and members of the Broad GDAC team for their contribution to the generation of the PanCan12 data freeze during construction of the study.

Reference

1.↵
Hanahan, D. & Weinberg, R. A. Hallmarks of cancer: the next generation. Cell 144, 646–674 (2011).
OpenUrl CrossRef PubMed Web of Science
2.↵
Mootha, V. K., Lindgren, C. M. & Eriksson, K. F. PGC-1α-responsive genes involved in oxidative phosphorylation are coordinately downregulated in human diabetes. Nature (2003).
3.
Segal, E., Friedman, N., Koller, D. & Regev, A. A module map showing conditional activity of expression modules in cancer. Nat Genet 36, 1090–1098 (2004).
OpenUrl CrossRef PubMed Web of Science
4.
Subramanian, A. et al. Gene set enrichment analysis: a knowledge-based approach for interpreting genome-wide expression profiles. Proc. Natl. Acad. Sci. U.S.A. 102, 15545–15550 (2005).
OpenUrl Abstract/FREE Full Text
5.↵
Segal, E., Friedman, N., Kaminski, N., Regev, A. & Koller, D. From signatures to models: understanding cancer using microarrays. Nat Genet 37, S38–S45 (2005).
OpenUrl CrossRef PubMed Web of Science
6.↵
Stuart, J. M., Segal, E., Koller, D. & Kim, S. K. A gene-coexpression network for global discovery of conserved genetic modules. Science 302, 249–255 (2003).
OpenUrl Abstract/FREE Full Text
7.
Snel, B., van Noort, V. & Huynen, M. A. Gene co-regulation is highly conserved in the evolution of eukaryotes and prokaryotes. Nucleic Acids Res. 32, 4725–4731 (2004).
OpenUrl CrossRef PubMed Web of Science
8.↵
Zhang, B. & Horvath, S. A General Framework for Weighted Gene Co-Expression Network Analysis. Statistical Applications in Genetics and Molecular Biology 4, (2005).
9.↵
Zhang, B. et al. Integrated Systems Approach Identifies Genetic Nodes and Networks in Late-Onset Alzheimer’s Disease. Cell 153, 707–720 (2013).
OpenUrl CrossRef PubMed Web of Science
10.↵
Oldham, M. C., Horvath, S. & Geschwind, D. H. Conservation and evolution of gene coexpression networks in human and chimpanzee brains. Proc. Natl. Acad. Sci. U.S.A. 103, 17973–17978 (2006).
OpenUrl Abstract/FREE Full Text
11.↵
Ravasz, E. Hierarchical Organization of Modularity in Metabolic Networks. Science 297, 1551–1555 (2002).
OpenUrl Abstract/FREE Full Text
12.↵
Langfelder, P., Zhang, B. & Horvath, S. Defining clusters from a hierarchical cluster tree: the Dynamic Tree Cut package for R. Bioinformatics (2008).
13.↵
Pons, P. & Latapy, M. in Lecture Notes in Computer Science 3733, 284–293 (Springer Berlin Heidelberg, 2005).
OpenUrl
14.↵
Newman, M. Fast algorithm for detecting community structure in networks. Physical review E (2004).
15.↵
Berardini, T. Z. & Khodiyar, V. K. The Gene Ontology in 2010: extensions and refinements. Nucleic acids Res. 38, D331–D335 (2010).
OpenUrl CrossRef PubMed Web of Science
16.↵
Amato, A., Schillaci, T., Lentini, L. & Di Leonardo, A. CENPA overexpression promotes genome instability in pRb-depleted human cells. Mol Cancer (2009).
17.
Johnson, V. L., Scott, M. I. F., Holt, S. V., Hussein, D. & Taylor, S. S. Bub1 is required for kinetochore localization of BubR1, Cenp-E, Cenp-F and Mad2, and chromosome congression. (2004).
18.↵
Nabetani, A., Koujin, T., Tsutsumi, C., Haraguchi, T. & Hiraoka, Y. A conserved protein, Nuf2, is implicated in connecting the centromere to the spindle during chromosome segregation: a link between the kinetochore function and the spindle checkpoint. Chromosoma 110, 322–334 (2001).
OpenUrl CrossRef PubMed Web of Science
19.↵
Sawyer, S. L., Cheng, I. H., Chai, W. & Tye, B. K. Mcm10 and Cdc45 Cooperate in Origin Activation in Saccharomyces cerevisiae. Journal of Molecular Biology 340, 195–202 (2004).
OpenUrl CrossRef PubMed Web of Science
20.↵
MacDonald, B. T., Tamai, K. & He, X. Wnt/β-catenin signaling: components, mechanisms, and diseases. Developmental cell (2009).
21.↵
Pongracz, J. E. & Stockley, R. A. Wnt signalling in lung development and diseases. Respir. Res. 7, 15 (2006).
OpenUrl CrossRef PubMed
22.↵
Bass, A. J. et al. SOX2 is an amplified lineage-survival oncogene in lung and esophageal squamous cell carcinomas. Nat Genet 41, 1238–1242 (2009).
OpenUrl CrossRef PubMed Web of Science
23.↵
Cancer Genome Atlas Research Network. Comprehensive genomic characterization of squamous cell lung cancers. Nature 489, 519–525 (2012).
OpenUrl CrossRef PubMed Web of Science
24.↵
Pujana, M. A. et al. Network modeling links breast cancer susceptibility and centrosome dysfunction. Nat Genet 39, 1338–1349 (2007).
OpenUrl CrossRef PubMed Web of Science
25.↵
Barabási, A. L. & Oltvai, Z. N. Network biology: understanding the cell’s functional organization. Nature Reviews Genetics (2004).
26.↵
Baolin, L. & Bo, H. HPRD: a high performance RDF database. Network and Parallel Computing (2007).
27.↵
Schaefer, C. F., Anthony, K., Krupa, S. & Buchoff, J. PID: the pathway interaction database. Nucleic acids Res. 37, D674–D679 (2009).
OpenUrl CrossRef PubMed Web of Science
28.↵
Croft, D. et al. Reactome: a database of reactions, pathways and biological processes. Nucleic Acids Res. 39, D691–D697 (2010).
OpenUrl PubMed Web of Science
29.↵
Tomfohr, J., Lu, J. & Kepler, T. B. Pathway level analysis of gene expression using singular value decomposition. BMC Bioinformatics 6, 225 (2005).
OpenUrl CrossRef PubMed
30.↵
Kanehisa, M., Goto, S., Sato, Y., Furumichi, M. & Tanabe, M. KEGG for integration and interpretation of large-scale molecular data sets. Nucleic Acids Res. 40, D109–D114 (2011).
OpenUrl PubMed Web of Science
31.↵
Eliceiri, B. P. et al. Selective requirement for Src kinases during VEGF-induced angiogenesis and vascular permeability. Mol. Cell 4, 915–924 (1999).
OpenUrl CrossRef PubMed Web of Science
32.↵
Park, H. et al. Homeobox D1 regulates angiogenic functions of endothelial cells via integrin β1 expression. Biochem. Biophys. Res. Commun. 408, 186–192 (2011).
OpenUrl PubMed
33.↵
Perry, S. T., Buck, M. D., Lada, S. M., Schindler, C. & Shresta, S. STAT2 mediates innate immunity to Dengue virus in the absence of STAT1 via the type I interferon receptor. PLoS Pathog. 7, e1001297 (2011).
OpenUrl CrossRef PubMed
34.↵
Chakrabarti, A., Jha, B. K. & Silverman, R. H. New insights into the role of RNase L in innate immunity. J. Interferon Cytokine Res. 31, 49–57 (2011).
OpenUrl CrossRef PubMed Web of Science
35.↵
Pardoll, D. M. The blockade of immune checkpoints in cancer immunotherapy. Nat Rev Cancer 12, 252–264 (2012).
OpenUrl CrossRef PubMed Web of Science
36.↵
Cancer Genome Atlas Research Network. Genomic and epigenomic landscapes of adult de novo acute myeloid leukemia. N. Engl. J. Med. 368, 2059–2074 (2013).
OpenUrl CrossRef PubMed Web of Science
37.↵
Solomon, D. A. et al. Mutational inactivation of STAG2 causes aneuploidy in human cancer. Science 333, 1039–1043 (2011).
OpenUrl Abstract/FREE Full Text
38.↵
Barber, T. D. et al. Chromatid cohesion defects may underlie chromosome instability in human colorectal cancers. Proc. Natl. Acad. Sci. U.S.A. 105, 3443–3448 (2008).
OpenUrl Abstract/FREE Full Text
39.↵
Lawrence, M. S. et al. Mutational heterogeneity in cancer and the search for new cancer-associated genes. Nature (2013). doi:10.1038/nature12213
OpenUrl CrossRef PubMed Web of Science
40.↵
Futreal, P. A. et al. A census of human cancer genes. Nat Rev Cancer 4, 177–183 (2004).
OpenUrl CrossRef PubMed Web of Science
41.↵
Li, B. & Dewey, C. RSEM: accurate transcript quantification from RNA-Seq data with or without a reference genome. BMC Bioinformatics (2011).
42.↵
Beroukhim, R. et al. The landscape of somatic copy-number alteration across human cancers. Nature 463, 899–905 (2010).
OpenUrl CrossRef PubMed Web of Science
43.↵
Mermel, C. H. et al. GISTIC2.0 facilitates sensitive and confident localization of the targets of focal somatic copy-number alteration in human cancers. Genome Biol. 12, R41 (2011).
OpenUrl CrossRef PubMed
44.↵
Langfelder, P. & Horvath, S. WGCNA: an R package for weighted correlation network analysis. BMC Bioinformatics 9, 559 (2008).
OpenUrl CrossRef PubMed

View the discussion thread.

Posted October 05, 2015.

Download PDF

Supplementary Material

Citation Tools

Subject Area

Cancer Biology

Subject Areas

All Articles

Animal Behavior and Cognition (5214)
Biochemistry (11745)
Bioengineering (8751)
Bioinformatics (29195)
Biophysics (14971)
Cancer Biology (12095)
Cell Biology (17411)
Clinical Trials (138)
Developmental Biology (9421)
Ecology (14178)
Epidemiology (2067)
Evolutionary Biology (18306)
Genetics (12245)
Genomics (16801)
Immunology (11867)
Microbiology (28083)
Molecular Biology (11592)
Neuroscience (60965)
Paleontology (451)
Pathology (1870)
Pharmacology and Toxicology (3238)
Physiology (4959)
Plant Biology (10427)
Scientific Communication and Education (1683)
Synthetic Biology (2885)
Systems Biology (7339)
Zoology (1651)

[1] 1.↵
Hanahan, D. & Weinberg, R. A. Hallmarks of cancer: the next generation. Cell 144, 646–674 (2011).
OpenUrl CrossRef PubMed Web of Science

[2] 2.↵
Mootha, V. K., Lindgren, C. M. & Eriksson, K. F. PGC-1α-responsive genes involved in oxidative phosphorylation are coordinately downregulated in human diabetes. Nature (2003).

[3] 3.
Segal, E., Friedman, N., Koller, D. & Regev, A. A module map showing conditional activity of expression modules in cancer. Nat Genet 36, 1090–1098 (2004).
OpenUrl CrossRef PubMed Web of Science

[4] 4.
Subramanian, A. et al. Gene set enrichment analysis: a knowledge-based approach for interpreting genome-wide expression profiles. Proc. Natl. Acad. Sci. U.S.A. 102, 15545–15550 (2005).
OpenUrl Abstract/FREE Full Text

[5] 5.↵
Segal, E., Friedman, N., Kaminski, N., Regev, A. & Koller, D. From signatures to models: understanding cancer using microarrays. Nat Genet 37, S38–S45 (2005).
OpenUrl CrossRef PubMed Web of Science

[6] 6.↵
Stuart, J. M., Segal, E., Koller, D. & Kim, S. K. A gene-coexpression network for global discovery of conserved genetic modules. Science 302, 249–255 (2003).
OpenUrl Abstract/FREE Full Text

[7] 7.
Snel, B., van Noort, V. & Huynen, M. A. Gene co-regulation is highly conserved in the evolution of eukaryotes and prokaryotes. Nucleic Acids Res. 32, 4725–4731 (2004).
OpenUrl CrossRef PubMed Web of Science

[8] 8.↵
Zhang, B. & Horvath, S. A General Framework for Weighted Gene Co-Expression Network Analysis. Statistical Applications in Genetics and Molecular Biology 4, (2005).

[9] 9.↵
Zhang, B. et al. Integrated Systems Approach Identifies Genetic Nodes and Networks in Late-Onset Alzheimer’s Disease. Cell 153, 707–720 (2013).
OpenUrl CrossRef PubMed Web of Science

[10] 10.↵
Oldham, M. C., Horvath, S. & Geschwind, D. H. Conservation and evolution of gene coexpression networks in human and chimpanzee brains. Proc. Natl. Acad. Sci. U.S.A. 103, 17973–17978 (2006).
OpenUrl Abstract/FREE Full Text

[11] 11.↵
Ravasz, E. Hierarchical Organization of Modularity in Metabolic Networks. Science 297, 1551–1555 (2002).
OpenUrl Abstract/FREE Full Text

[12] 12.↵
Langfelder, P., Zhang, B. & Horvath, S. Defining clusters from a hierarchical cluster tree: the Dynamic Tree Cut package for R. Bioinformatics (2008).

[13] 13.↵
Pons, P. & Latapy, M. in Lecture Notes in Computer Science 3733, 284–293 (Springer Berlin Heidelberg, 2005).
OpenUrl

[14] 14.↵
Newman, M. Fast algorithm for detecting community structure in networks. Physical review E (2004).

[15] 15.↵
Berardini, T. Z. & Khodiyar, V. K. The Gene Ontology in 2010: extensions and refinements. Nucleic acids Res. 38, D331–D335 (2010).
OpenUrl CrossRef PubMed Web of Science

[16] 16.↵
Amato, A., Schillaci, T., Lentini, L. & Di Leonardo, A. CENPA overexpression promotes genome instability in pRb-depleted human cells. Mol Cancer (2009).

[17] 17.
Johnson, V. L., Scott, M. I. F., Holt, S. V., Hussein, D. & Taylor, S. S. Bub1 is required for kinetochore localization of BubR1, Cenp-E, Cenp-F and Mad2, and chromosome congression. (2004).

[18] 18.↵
Nabetani, A., Koujin, T., Tsutsumi, C., Haraguchi, T. & Hiraoka, Y. A conserved protein, Nuf2, is implicated in connecting the centromere to the spindle during chromosome segregation: a link between the kinetochore function and the spindle checkpoint. Chromosoma 110, 322–334 (2001).
OpenUrl CrossRef PubMed Web of Science

[19] 19.↵
Sawyer, S. L., Cheng, I. H., Chai, W. & Tye, B. K. Mcm10 and Cdc45 Cooperate in Origin Activation in Saccharomyces cerevisiae. Journal of Molecular Biology 340, 195–202 (2004).
OpenUrl CrossRef PubMed Web of Science

[20] 20.↵
MacDonald, B. T., Tamai, K. & He, X. Wnt/β-catenin signaling: components, mechanisms, and diseases. Developmental cell (2009).

[21] 21.↵
Pongracz, J. E. & Stockley, R. A. Wnt signalling in lung development and diseases. Respir. Res. 7, 15 (2006).
OpenUrl CrossRef PubMed

[22] 22.↵
Bass, A. J. et al. SOX2 is an amplified lineage-survival oncogene in lung and esophageal squamous cell carcinomas. Nat Genet 41, 1238–1242 (2009).
OpenUrl CrossRef PubMed Web of Science

[23] 23.↵
Cancer Genome Atlas Research Network. Comprehensive genomic characterization of squamous cell lung cancers. Nature 489, 519–525 (2012).
OpenUrl CrossRef PubMed Web of Science

[24] 24.↵
Pujana, M. A. et al. Network modeling links breast cancer susceptibility and centrosome dysfunction. Nat Genet 39, 1338–1349 (2007).
OpenUrl CrossRef PubMed Web of Science

[25] 25.↵
Barabási, A. L. & Oltvai, Z. N. Network biology: understanding the cell’s functional organization. Nature Reviews Genetics (2004).

[26] 26.↵
Baolin, L. & Bo, H. HPRD: a high performance RDF database. Network and Parallel Computing (2007).

[27] 27.↵
Schaefer, C. F., Anthony, K., Krupa, S. & Buchoff, J. PID: the pathway interaction database. Nucleic acids Res. 37, D674–D679 (2009).
OpenUrl CrossRef PubMed Web of Science

[28] 28.↵
Croft, D. et al. Reactome: a database of reactions, pathways and biological processes. Nucleic Acids Res. 39, D691–D697 (2010).
OpenUrl PubMed Web of Science

[29] 29.↵
Tomfohr, J., Lu, J. & Kepler, T. B. Pathway level analysis of gene expression using singular value decomposition. BMC Bioinformatics 6, 225 (2005).
OpenUrl CrossRef PubMed

[30] 30.↵
Kanehisa, M., Goto, S., Sato, Y., Furumichi, M. & Tanabe, M. KEGG for integration and interpretation of large-scale molecular data sets. Nucleic Acids Res. 40, D109–D114 (2011).
OpenUrl PubMed Web of Science

[31] 31.↵
Eliceiri, B. P. et al. Selective requirement for Src kinases during VEGF-induced angiogenesis and vascular permeability. Mol. Cell 4, 915–924 (1999).
OpenUrl CrossRef PubMed Web of Science

[32] 32.↵
Park, H. et al. Homeobox D1 regulates angiogenic functions of endothelial cells via integrin β1 expression. Biochem. Biophys. Res. Commun. 408, 186–192 (2011).
OpenUrl PubMed

[33] 33.↵
Perry, S. T., Buck, M. D., Lada, S. M., Schindler, C. & Shresta, S. STAT2 mediates innate immunity to Dengue virus in the absence of STAT1 via the type I interferon receptor. PLoS Pathog. 7, e1001297 (2011).
OpenUrl CrossRef PubMed

[34] 34.↵
Chakrabarti, A., Jha, B. K. & Silverman, R. H. New insights into the role of RNase L in innate immunity. J. Interferon Cytokine Res. 31, 49–57 (2011).
OpenUrl CrossRef PubMed Web of Science

[35] 35.↵
Pardoll, D. M. The blockade of immune checkpoints in cancer immunotherapy. Nat Rev Cancer 12, 252–264 (2012).
OpenUrl CrossRef PubMed Web of Science

[36] 36.↵
Cancer Genome Atlas Research Network. Genomic and epigenomic landscapes of adult de novo acute myeloid leukemia. N. Engl. J. Med. 368, 2059–2074 (2013).
OpenUrl CrossRef PubMed Web of Science

[37] 37.↵
Solomon, D. A. et al. Mutational inactivation of STAG2 causes aneuploidy in human cancer. Science 333, 1039–1043 (2011).
OpenUrl Abstract/FREE Full Text

[38] 38.↵
Barber, T. D. et al. Chromatid cohesion defects may underlie chromosome instability in human colorectal cancers. Proc. Natl. Acad. Sci. U.S.A. 105, 3443–3448 (2008).
OpenUrl Abstract/FREE Full Text

[39] 39.↵
Lawrence, M. S. et al. Mutational heterogeneity in cancer and the search for new cancer-associated genes. Nature (2013). doi:10.1038/nature12213
OpenUrl CrossRef PubMed Web of Science

[40] 40.↵
Futreal, P. A. et al. A census of human cancer genes. Nat Rev Cancer 4, 177–183 (2004).
OpenUrl CrossRef PubMed Web of Science

[41] 41.↵
Li, B. & Dewey, C. RSEM: accurate transcript quantification from RNA-Seq data with or without a reference genome. BMC Bioinformatics (2011).

[42] 42.↵
Beroukhim, R. et al. The landscape of somatic copy-number alteration across human cancers. Nature 463, 899–905 (2010).
OpenUrl CrossRef PubMed Web of Science

[43] 43.↵
Mermel, C. H. et al. GISTIC2.0 facilitates sensitive and confident localization of the targets of focal somatic copy-number alteration in human cancers. Genome Biol. 12, R41 (2011).
OpenUrl CrossRef PubMed

[44] 44.↵
Langfelder, P. & Horvath, S. WGCNA: an R package for weighted correlation network analysis. BMC Bioinformatics 9, 559 (2008).
OpenUrl CrossRef PubMed

Convergent architecture of the transcriptome in human cancer

ABSTRACTS

INTRODUCTION

RESULTS

Co-expression network analysis reveals conservation of gene modules

Hierarchical organization of co-expressed gene modules

Conserved expression modules are enriched with cancer hallmarks

Functions of imputed module drivers reflect hallmark themes

DISCUSSION

METHODS SUMMARY

Author Contributions

ACKNOWLEDGEMENTS

Reference

Citation Manager Formats

Subject Area