Clust: automatic extraction of optimal co-expressed gene clusters from gene expression data

Basel Abu-Jamous; Steven Kelly

doi:10.1101/221309

Abstract

Identification of co-expressed genes within a given experimental or biological context can provide evidence for genetic or physical interactions between genes. Thus, detection of co-expression has become a routine step in large-scale analyses of gene expression data. In this work, we show that application of the most commonly used methods to identify co-expressed gene clusters produces results that do not match the biological expectations of co-expressed gene clusters. Specifically, clusters generated using these methods are not discrete and can contain up to 50% unreliably assigned genes. Consequently, downstream analyses on these clusters, such as functional term enrichment analysis, suffer from high error rates. We present clust, an automated method that solves this problem by extracting clusters from gene expression datasets that match the biological expectations of co-expressed genes. Using 100 gene expression datasets from five model organisms, we demonstrate that the statistical properties of clusters generated by clust are better than those produced by other methods. We further show that this improvement results in a concomitant improvement in detection of enriched functional terms.

The copyright holder for this preprint is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under a CC-BY 4.0 International license.