TY - JOUR
T1 - Cancer classification in the genomic era: five contemporary problems
JF - bioRxiv
DO - 10.1101/023127
SP - 023127
AU - Qingxuan Song
AU - Sofia D. Merajver
AU - Jun Z. Li
Y1 - 2015/01/01
UR - http://biorxiv.org/content/early/2015/07/23/023127.abstract
N2 - Classification is an everyday instinct as well as a full-fledged scientific discipline. Throughout the history of medicine, disease classification is central to how we organize knowledge, obtain diagnosis, and assign treatment. Here we discuss the classification of cancer, the process of categorizing cancers based on their observed clinical and biological features. Traditionally, cancer nomenclature is primarily based on organ location, e.g., “lung cancer” designates a tumor originating in lung structures. Within each organ-specific major type, further subgroups can be defined based on patient age, cell type, histological grades, and sometimes molecular markers, e.g., hormonal receptor status in breast cancer, or microsatellite instability in colorectal cancer. In the past 15+ years, high-throughput technologies have generated rich new data for somatic variations in DNA, RNA, protein, or epigenomic features for many cancers. These data, representing increasingly large tumor collections, have provided not only new insights into the biological diversity of human cancers, but also exciting opportunities for discovery of new cancer subtypes. Meanwhile, the unprecedented volume and complexity of these data pose significant challenges for biostatisticians, cancer biologists, and clinicians alike. Here we review five related issues that represent long-standing problems in cancer taxonomy and interpretation. 1. How many cancer types are there? 2. How can we evaluate the robustness of a new classification system? 3. How are classification systems affected by intratumor heterogeneity and tumor evolution? 4. How should we interpret cancer subtypes? 5. Can multiple classification systems coexist?
While these problems are not new, we will focus on aspects that were magnified by the recent influx of complex multi-omics data. Ongoing exploration of these problems is essential for developing data-driven cancer classification and the successful application of these concepts in precision medicine.
ER -