TY  - JOUR
T1  - A novel approach to identifying marker genes and estimating the cellular composition of whole blood from gene expression profiles
JF  - bioRxiv
DO  - 10.1101/038794
SP  - 038794
AU  - Casey P. Shannon
AU  - Robert Balshaw
AU  - Virginia Chen
AU  - Zsuzsanna Hollander
AU  - Mustafa Toma
AU  - Bruce M. McManus
AU  - J. Mark FitzGerald
AU  - Don D. Sin
AU  - Raymond T. Ng
AU  - Scott J. Tebbutt
Y1  - 2016/01/01
UR  - http://biorxiv.org/content/early/2016/02/03/038794.abstract
N2  - Measuring genome-wide changes in transcript abundance in circulating peripheral whole blood cells is a useful way to study disease pathobiology and may help elucidate biomarkers and molecular mechanisms of disease. The sensitivity and interpretability of analyses carried out in this complex tissue, however, are significantly affected by its dynamic heterogeneity. It is therefore desirable to quantify this heterogeneity, either to account for it or to better model interactions that may be present between the abundance of certain transcripts, some cell types and the indication under study. Accurate enumeration of the many component cell types that make up peripheral whole blood can be costly, however, and may further complicate the sample collection process. Many approaches have been developed to infer the composition of a sample from high-dimensional transcriptomic and, more recently, epigenetic data. These approaches rely on the availability of isolated expression profiles for the cell types to be enumerated. These profiles are platform-specific, suitable datasets are rare, and generating them is expensive. No such dataset exists on the Affymetrix Gene ST platform. We present a freely-available, and open source, multi-response Gaussian model capable of accurately predicting the composition of peripheral whole blood samples from Affymetrix Gene ST expression profiles.
This model outperforms other current methods when applied to Gene ST data and could potentially be used to enrich the >10,000 Affymetrix Gene ST blood gene expression profiles currently available on GEO. Key Points: We introduce a model that accurately predicts the composition of blood from Affymetrix Gene ST gene expression profiles. This model outperforms existing methods when applied to Affymetrix Gene ST expression profiles from blood.
ER  - 