Abstract
Cancer remains a leading cause of morbidity and mortality worldwide. Its evolutionary nature and the resulting complex interactions with the tumour micro-environment and the host immune system engender heterogeneity, which makes developing interventions difficult. Usually detected at the advanced stages of disease, metastatic cancer accounts for 90% of cancer-associated deaths. Therefore, early detection of cancer, combined with current therapies, would have a significant impact on survival and treatment of this insidious disease. Epigenetic changes such as DNA methylation are among the early events in carcinogenesis. Here, we report on a machine learning model that can classify 13 types of cancer as well as non-cancer tissue samples using only DNA methylome data, with an accuracy of 98.2%. We utilise the features identified by this model to develop a robust deep neural network that can generalise to independent data sets. We also demonstrate that the methylation-associated genomic loci detected by the classifier are associated with genes involved in cancer, providing insights into the epigenomic regulation of carcinogenesis.
Abbreviations
- TCGA: The Cancer Genome Atlas
- BLCA: Bladder urothelial carcinoma
- BRCA: Breast invasive carcinoma
- COAD: Colon adenocarcinoma
- ESCA: Esophageal carcinoma
- HNSC: Head and neck squamous cell carcinoma
- KIRC: Kidney renal clear cell carcinoma
- KIRP: Kidney renal papillary cell carcinoma
- LIHC: Liver hepatocellular carcinoma
- LUAD: Lung adenocarcinoma
- LUSC: Lung squamous cell carcinoma
- PRAD: Prostate adenocarcinoma
- THCA: Thyroid carcinoma
- UCEC: Uterine corpus endometrial carcinoma
- AUC: Area Under the Curve
- ROC: Receiver Operating Characteristic
- MCC: Matthews Correlation Coefficient
- UMAP: Uniform Manifold Approximation and Projection