Abstract
Various distance functions for evaluating the differences between gene expression profiles have been proposed in the past. Such a function outputs a low value when the profiles are strongly correlated, whether negatively or positively, and a high value otherwise. One popular distance function is the absolute correlation distance, da = 1 − |ρ|, where ρ is a similarity measure such as Pearson or Spearman correlation. However, the absolute correlation distance fails to fulfill the triangle inequality, a property that would guarantee better performance in vector quantization, allow fast data localization, and speed up data clustering. In this work, we propose dr as an alternative. We prove that dr satisfies the triangle inequality when ρ represents Pearson correlation, Spearman correlation, or cosine similarity. We empirically compared dr with da in gene clustering and sample clustering experiments using real biological data. The two distances performed similarly in both gene clustering and sample clustering, under both hierarchical clustering and PAM clustering. However, dr produced more robust clusterings. In bootstrap experiments, dr generated the more robust sample-pair partition significantly more often than da (p-value < 0.05). This advantage in robustness is also supported by the class “dissolved” events.
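The failure of da to satisfy the triangle inequality can be illustrated numerically. The sketch below builds three small hypothetical profiles whose pairwise Pearson correlations are cos 30°, cos 30°, and cos 60°, and tests the inequality on that single triple. The definition of dr is not given in the text above, so the code assumes a square-root form, dr = sqrt(1 − |ρ|), purely for illustration. The helper names (pearson, d_a, d_r, profile, triangle_holds) are likewise ours.

```python
import math

def pearson(u, v):
    """Pearson correlation of two equal-length sequences."""
    n = len(u)
    mean_u, mean_v = sum(u) / n, sum(v) / n
    cu = [a - mean_u for a in u]
    cv = [b - mean_v for b in v]
    num = sum(a * b for a, b in zip(cu, cv))
    den = math.sqrt(sum(a * a for a in cu) * sum(b * b for b in cv))
    return num / den

def d_a(u, v):
    """Absolute correlation distance, da = 1 - |rho|."""
    return 1.0 - abs(pearson(u, v))

def d_r(u, v):
    """ASSUMED form of dr: sqrt(1 - |rho|). Not stated in the abstract."""
    # max() guards against a tiny negative value caused by rounding.
    return math.sqrt(max(0.0, 1.0 - abs(pearson(u, v))))

def profile(theta):
    """Hypothetical 3-point profile: a zero-mean unit vector at angle theta.

    e1 and e2 are orthonormal and each sums to zero, so the Pearson
    correlation of profile(a) and profile(b) equals cos(a - b).
    """
    e1 = (1 / math.sqrt(2), -1 / math.sqrt(2), 0.0)
    e2 = (1 / math.sqrt(6), 1 / math.sqrt(6), -2 / math.sqrt(6))
    return [math.cos(theta) * a + math.sin(theta) * b for a, b in zip(e1, e2)]

# Three profiles spaced 30 degrees apart.
x, y, z = profile(0.0), profile(math.pi / 6), profile(math.pi / 3)

def triangle_holds(dist):
    """Check dist(x, z) <= dist(x, y) + dist(y, z) for this one triple."""
    return dist(x, z) <= dist(x, y) + dist(y, z) + 1e-12

print("da satisfies the triangle inequality on (x, y, z):", triangle_holds(d_a))
print("dr satisfies the triangle inequality on (x, y, z):", triangle_holds(d_r))
```

A single triple can only exhibit a violation. It cannot establish that a distance satisfies the inequality in general, which is what the proof in this work addresses.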