Abstract
Finding key genes related to cancer is the essential first step toward understanding what has taken place in the tumor cell. At present, most methods for discovering key genes contrast normal samples with tumor samples and rely on statistical tests. However, these methods face problems such as the inadequacy of statistical tests on unbalanced samples and the drawback of using only a single data type, which cannot show the holistic situation in the tumor cell. To address these issues, I propose a novel method that uses semi-supervised and unsupervised algorithms to discover key genes linked with cancer. The genes in the final result list are not merely placed in two categories but are ordered in a distinct hierarchy, and they are all detected from diverse data such as methylation, gene expression RNA-Seq, exon expression RNA-Seq, and so on. Finally, to compare the results of this method with those of the traditional statistical method, I use the concept of information gain ratio to demonstrate mathematically the advantage of this deep learning method.
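As a point of reference for the information gain ratio mentioned above, the following is a minimal sketch of the standard definition (information gain divided by the split information of the feature). It is not the evaluation pipeline of this work: the function names, the discretization of a gene's measurement into "high"/"low", and the toy labels are all assumptions made for illustration.

```python
import math
from collections import Counter


def entropy(values):
    """Shannon entropy (in bits) of a sequence of discrete values."""
    n = len(values)
    counts = Counter(values)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())


def information_gain_ratio(feature, labels):
    """Information gain of `labels` given `feature`, normalized by the
    entropy of `feature` itself (the split information)."""
    n = len(labels)
    # Group the labels by feature value to compute the conditional entropy.
    groups = {}
    for f, y in zip(feature, labels):
        groups.setdefault(f, []).append(y)
    conditional = sum(len(g) / n * entropy(g) for g in groups.values())
    gain = entropy(labels) - conditional
    split_info = entropy(feature)
    # A constant feature carries no split information; define the ratio as 0.
    if split_info == 0:
        return 0.0
    return gain / split_info


# Hypothetical toy data: a discretized measurement for one gene per sample.
labels = ["tumor", "tumor", "normal", "normal"]
informative_gene = ["high", "high", "low", "low"]
uninformative_gene = ["high", "low", "high", "low"]

ratio_informative = information_gain_ratio(informative_gene, labels)
ratio_uninformative = information_gain_ratio(uninformative_gene, labels)
```

Under this definition, a gene whose discretized value separates tumor from normal samples perfectly receives a higher ratio than one whose value is independent of the label, which is the sense in which two gene lists could be compared.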