MultiPLIER: a transfer learning framework for transcriptomics reveals systemic features of rare disease

Jaclyn N. Taroni; Peter C. Grayson; Qiwen Hu; Sean Eddy; Matthias Kretzler; Peter A. Merkel; Casey S. Greene

doi:10.1101/395947

SUMMARY

Unsupervised machine learning methods provide a promising means to analyze and interpret large datasets. However, most gene expression datasets generated by individual researchers remain too small to fully benefit from these methods. In the case of rare diseases, there may be too few cases available, even when multiple studies are combined. We trained a Pathway Level Information ExtractoR (PLIER) model on a large public data compendium comprising multiple experiments, tissues, and biological conditions. We then transferred the model to small rare disease datasets in an approach we term MultiPLIER. Models constructed from large, diverse public data i) included features that aligned well with important biological factors; ii) were more comprehensive than those constructed from individual datasets or conditions; and iii) transferred to rare disease datasets, where they described biological processes related to disease severity more effectively than models trained specifically on those datasets.
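
The transfer step can be sketched as follows, under assumptions not stated in this summary: that PLIER approximates a gene-by-sample expression matrix $Y$ as $Y \approx ZB$, where $Z$ holds gene loadings and $B$ holds latent variable values per sample, and that a new dataset is projected into the learned space with a ridge-penalized least-squares fit. The symbols $Y_{\text{train}}$, $Y_{\text{new}}$, $B_{\text{new}}$, and $\lambda$ are illustrative notation, not taken from the text above.

```latex
% Assumed formulation: rows of Y_new must cover the same genes, in the same order, as rows of Z.
\begin{align}
Y_{\text{train}} &\approx Z \, B_{\text{train}}
  && \text{(learn $Z$ from the large compendium)} \\
B_{\text{new}} &= \left( Z^{\top} Z + \lambda I \right)^{-1} Z^{\top} Y_{\text{new}}
  && \text{(project the small dataset with $Z$ held fixed)}
\end{align}
```

Under this sketch, the small dataset contributes no fitted loadings of its own; it is only re-expressed in terms of latent variables learned from the compendium, which is why the number of rare disease samples need not be large.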

The copyright holder for this preprint is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under a CC-BY 4.0 International license.