Abstract
Breast cancer is one of the most common cancers, accounting for about 30% of cancers in women, with a mortality rate of 15%. The 5-year survival rate is the measure most commonly used to assess cancer progression and guide clinical practice. We used CatBoost to systematically construct a five-year mortality risk prediction model based on two independent data sets (BRCA_METABRIC, BRCA_TCGA). The model inputs are somatic genomic variants (copy number variations, SNP loci, and the cumulative number of mutations per gene) and phenotype data of cancer samples. The best-performing model combined all of these features and reached an AUC of 0.70 on an independent external data set. We also carried out a biological analysis of the model's features and identified several potential biomarkers (TP53, DNAH11, MAP3K1, PHF20L1, etc.). The risk stratification produced by the model can be used as a guide for the prognosis of breast cancer.
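For readers unfamiliar with the AUC metric reported above, the following is a minimal sketch of how it can be computed from predicted risk scores, assuming binary labels where 1 marks death within five years. The function name `roc_auc` and the toy labels and scores are hypothetical illustrations and are not taken from the study's data or code.

```python
def roc_auc(labels, scores):
    """Return the AUC as the fraction of (positive, negative) pairs in which
    the positive sample receives the higher score; ties count as half."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative sample")

    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0   # positive ranked above negative
            elif p == n:
                wins += 0.5   # tie counts as half a correct ordering
    return wins / (len(pos) * len(neg))


# Hypothetical toy data: 1 = died within five years, 0 = survived.
toy_labels = [1, 0, 1, 0, 0]
toy_scores = [0.9, 0.8, 0.4, 0.3, 0.1]  # assumed model risk scores

if __name__ == "__main__":
    # 5 of the 6 positive/negative pairs are ordered correctly here.
    print(round(roc_auc(toy_labels, toy_scores), 3))
```

Under this pairwise reading, an AUC of 0.70 means that a randomly chosen patient who died within five years receives a higher predicted risk than a randomly chosen survivor about 70% of the time.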
Competing Interest Statement
The authors have declared no competing interest.