RT Journal Article
SR Electronic
T1 H&E-stained Whole Slide Deep Learning Predicts SPOP Mutation State in Prostate Cancer
JF bioRxiv
FD Cold Spring Harbor Laboratory
SP 064279
DO 10.1101/064279
A1 Andrew J. Schaumberg
A1 Mark A. Rubin
A1 Thomas J. Fuchs
YR 2016
UL http://biorxiv.org/content/early/2016/07/21/064279.abstract
AB The genetic basis of histological phenotype is well known in molecular pathology, such as CDH1 loss leading to a lobular rather than ductal phenotype in breast cancer. Unfortunately, these phenotypes are qualitative. Moreover, a single genetic alteration may evidence multiple and non-unique histologic features, such as TMPRSS2-ERG fusion giving rise to macronuclei, blue-tinged mucin, and cribriform pattern. A quantitative model to genetically interpret the histology is desirable to guide downstream immunohistochemistry, genomics, and precision medicine. We constructed a statistical model that predicts whether or not SPOP is mutated in prostate cancer, given only the digital whole slide after standard hematoxylin and eosin [H&E] staining. Using a cohort of 177 prostate cancer patients where 20 had mutant SPOP, we trained multiple ensembles of residual networks, accurately distinguishing SPOP mutant from SPOP wild type patients. To our knowledge, this is the first statistical model to predict a genetic mutation in cancer directly from the patient’s digitized H&E-stained whole microscope slide.
Significance Statement: The authors present the first automatic pipeline predicting gene mutation probability in cancer from digitized light microscopy slides having standard hematoxylin and eosin staining.
To predict whether or not the speckle-type POZ protein [SPOP] gene is mutated in prostate cancer, the pipeline (i) identifies diagnostically salient regions in the whole slide at low magnification, (ii) identifies the salient region having the dominant tumor, (iii) within this region finds the high magnification subregion most enriched for abnormal cells, and (iv) trains an ensemble of deep learning binary classifiers that together predict a 95% confidence interval of mutation probability. This work enables fully-automated histologic diagnoses based on probabilities of underlying molecular aberrations. Such probabilities may directly guide immunohistochemistry choices, genetic tests, and precision medicine.
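The abstract does not say how the ensemble's outputs become a 95% confidence interval. The sketch below makes two assumptions, neither of which is the authors' stated method. First, each binary classifier in the ensemble emits a probability that the slide is SPOP mutant. Second, the interval is a normal approximation (mean ± 1.96 standard errors) over the ensemble members. The function name `ensemble_mutation_ci` and the example probabilities are hypothetical.

```python
import math
from statistics import mean, stdev


def ensemble_mutation_ci(member_probs, z=1.96):
    """Hypothetical helper: combine per-classifier SPOP-mutation probabilities
    into a mean and an approximate 95% confidence interval.

    Assumes a normal approximation over ensemble members; this is an
    illustrative choice, not the published procedure.
    """
    if len(member_probs) < 2:
        raise ValueError("need at least two ensemble members")
    if any(not 0.0 <= p <= 1.0 for p in member_probs):
        raise ValueError("probabilities must lie in [0, 1]")
    m = mean(member_probs)
    # Standard error of the mean across ensemble members.
    se = stdev(member_probs) / math.sqrt(len(member_probs))
    # Clip the interval to the valid probability range [0, 1].
    lo = max(0.0, m - z * se)
    hi = min(1.0, m + z * se)
    return m, (lo, hi)


# Illustrative numbers only, not data from the study.
probs = [0.72, 0.65, 0.80, 0.70, 0.68]
m, (lo, hi) = ensemble_mutation_ci(probs)
print(f"P(SPOP mutant) = {m:.2f}, 95% CI [{lo:.2f}, {hi:.2f}]")
```

A percentile interval over many ensemble members, or a bootstrap, would be equally plausible choices. The normal approximation is used here only because it is the shortest to state.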