ABSTRACT
Deconvolution of mouse transcriptomic data is challenging because mouse models carry various genetic and physiological perturbations, which makes it questionable to assume fixed cell types and cell type marker genes across different datasets. We developed a Semi-Supervised Mouse data Deconvolution (SSMD) method to study the mouse tissue microenvironment (TME). SSMD features (i) a novel non-parametric method to discover dataset-specific cell type signature genes; (ii) a community detection approach for fixing cell types and their marker genes; and (iii) a constrained matrix decomposition method, robust to diverse experimental platforms, to solve for cell type relative proportions. In summary, SSMD addresses several key challenges in the deconvolution of mouse tissue data: (1) varied cell types and marker genes caused by highly divergent genotypic and phenotypic conditions of mouse experiments, (2) diverse experimental platforms of mouse transcriptomics data, (3) small sample sizes and limited training data sources, and (4) estimation of the proportions of 35 cell types in blood, inflammatory, central nervous or hematopoietic systems. In silico and experimental validation of SSMD demonstrated its high sensitivity and accuracy in identifying (sub) cell types and predicting cell proportions compared with state-of-the-art methods. A user-friendly R package and a web server of SSMD are released via https://github.com/xiaoyulu95/SSMD.
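To make the idea of solving cell type proportions by constrained matrix decomposition concrete, the following is a minimal sketch and not the SSMD implementation. It assumes a simplified setting in which a non-negative signature matrix S (genes x cell types) is already known, and it estimates non-negative proportions P from bulk expression X (genes x samples) using multiplicative updates for non-negative least squares. SSMD itself, per the abstract, discovers markers from the data. The function names `matmul`, `transpose` and `estimate_proportions`, and all numbers in the toy example, are hypothetical.

```python
def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def transpose(a):
    """Transpose a matrix stored as a list of rows."""
    return [list(col) for col in zip(*a)]


def estimate_proportions(X, S, n_iter=500, eps=1e-12):
    """Estimate non-negative cell type proportions P such that X ~ S P.

    X: genes x samples bulk expression (non-negative).
    S: genes x cell types signature matrix (non-negative, assumed known here).
    Returns P (cell types x samples) with each column scaled to sum to 1.
    """
    k, n = len(S[0]), len(X[0])
    P = [[1.0 / k] * n for _ in range(k)]  # uniform, strictly positive start
    St = transpose(S)
    StX = matmul(St, X)  # k x n
    StS = matmul(St, S)  # k x k
    for _ in range(n_iter):
        StSP = matmul(StS, P)
        # The multiplicative update keeps every entry of P non-negative.
        P = [[P[i][j] * StX[i][j] / (StSP[i][j] + eps) for j in range(n)]
             for i in range(k)]
    # Rescale each sample (column) so the proportions are relative.
    for j in range(n):
        total = sum(P[i][j] for i in range(k)) or 1.0
        for i in range(k):
            P[i][j] /= total
    return P


# Toy example: two cell types, each with two exclusive marker genes.
S_toy = [[10.0, 0.0], [8.0, 0.0], [0.0, 5.0], [0.0, 6.0]]
P_true = [[0.7, 0.2], [0.3, 0.8]]
X_toy = matmul(S_toy, P_true)  # simulated noise-free bulk expression
P_hat = estimate_proportions(X_toy, S_toy)
print(P_hat)
```

The non-negativity constraint and the final rescaling are what make the estimates interpretable as relative proportions. With real data, S would not be given in advance and noise would prevent exact recovery.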
Key points
We provide a novel tissue deconvolution method, SSMD, which is specifically designed for mouse data to handle the variations caused by different mouse strains, genetic and phenotypic backgrounds, and experimental platforms.
SSMD can detect dataset-specific and tissue microenvironment-specific cell markers for more than 30 cell types in mouse blood, inflammatory tissue, cancer, and the central nervous system.
SSMD achieves much improved performance in estimating the relative proportions of cell types compared with state-of-the-art methods.
The semi-supervised setting enables the application of SSMD on transcriptomics, DNA methylation and ATAC-seq data.
A user-friendly R package and an R Shiny-based web server for SSMD are also provided.
- Tissue Data Deconvolution
- Cancer microenvironment
- Semi-supervised Learning
- Mouse Omics Data
Competing Interest Statement
The authors have declared no competing interest.
Footnotes
Authors’ Biographical Note: Xiaoyu Lu is a PhD student in the Department of BioHealth Informatics, Indiana University-Purdue University Indianapolis.
Szu-Wei Tu is a master's student in the Department of BioHealth Informatics, Indiana University-Purdue University Indianapolis.
Wennan Chang is a PhD student in the Department of Electrical and Computer Engineering, Purdue University.
Changlin Wan is a PhD student in the Department of Electrical and Computer Engineering, Purdue University.
Jiashi Wang is a research associate at the Biomedical Data Research Data (BDRD) Lab at Indiana University School of Medicine.
Yong Zang is an Assistant Professor in the Department of Biostatistics and a member of the Center for Computational Biology and Bioinformatics, Indiana University School of Medicine.
Baskar Ramdas is an Assistant Research Professor in the Department of Pediatrics, Indiana University School of Medicine.
Reuben Kapur is the Frieda and Albrecht Kipp Professor in the Department of Pediatrics, Indiana University School of Medicine.
Xiongbin Lu is the Vera Bradley Foundation Professor of Breast Cancer Innovation and a Professor in the Department of Medical and Molecular Genetics, Indiana University School of Medicine.
Sha Cao is an Assistant Professor in the Department of Biostatistics and a member of the Center for Computational Biology and Bioinformatics, Indiana University School of Medicine.
Chi Zhang is an Assistant Professor in the Department of Medical and Molecular Genetics and a member of the Center for Computational Biology and Bioinformatics, Indiana University School of Medicine.