RT Journal Article
SR Electronic
T1 A simple analytical formula to compute the residual Mutual Information between pairs of data vectors
JF bioRxiv
FD Cold Spring Harbor Laboratory
SP 041988
DO 10.1101/041988
A1 Jens Kleinjung
A1 Anthony C.C. Coolen
YR 2016
UL http://biorxiv.org/content/early/2016/03/01/041988.abstract
AB Summary: The Mutual Information of pairs of data vectors, for example sequence alignment positions or gene expression profiles, is a quantitative measure of the interdependence between the data. However, data vectors based on a finite number of samples retain non-zero Mutual Information values even for completely random data, which is referred to as background or residual Mutual Information. Estimates of the residual Mutual Information have so far been obtained through heuristic or numerical approximations. Here we introduce a simple analytical formula for the computation of the residual Mutual Information that yields precise values and does not require the joint probabilities between the vector elements as input. Availability and Implementation: A C program arMI is available at http://mathbio.crick.ac.uk/wiki/Software#arMI. Given an input alignment in FASTA format, or alternatively an internally created random alignment of specified length and depth, the program computes three types of Mutual Information: (i) Shannon’s Mutual Information between all pairs of alignment columns; (ii) the numerical residual Mutual Information, obtained by applying the same formula to the randomised (shuffled) data; (iii) the analytical residual Mutual Information introduced here. The package depends on the GNU Scientific Library, which is used for vector and matrix operations, factorial expressions and random number generation. Reference alignments and result data are included in the program package in the folder ‘tests’.
The R environment was used for statistics and plotting. Contact: Jens.Kleinjung{at}crick.ac.uk. Supplementary Material: A detailed derivation of the analytical formula is given in the Supplementary Material.
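
The first two quantities listed under Availability and Implementation, Shannon's Mutual Information of a column pair and its numerical residual obtained by shuffling, can be illustrated with a minimal Python sketch. This is not the arMI code, and it does not implement the analytical formula, whose derivation is given only in the Supplementary Material. The function names `mutual_information` and `shuffled_residual_mi`, the use of natural logarithms, and the default of 200 shuffles are assumptions made for illustration.

```python
import math
import random
from collections import Counter


def mutual_information(x, y):
    """Shannon Mutual Information (in nats) of two equal-length symbol
    vectors, estimated from their empirical frequencies."""
    if len(x) != len(y):
        raise ValueError("vectors must have equal length")
    n = len(x)
    count_x = Counter(x)           # marginal counts of the first vector
    count_y = Counter(y)           # marginal counts of the second vector
    count_xy = Counter(zip(x, y))  # joint counts of paired symbols
    mi = 0.0
    for (a, b), c in count_xy.items():
        # p(a,b) * log(p(a,b) / (p(a) * p(b))), expressed with raw counts
        mi += (c / n) * math.log(c * n / (count_x[a] * count_y[b]))
    return mi


def shuffled_residual_mi(x, y, n_shuffles=200, seed=0):
    """Numerical residual Mutual Information: the mean Mutual Information
    after randomly permuting y, which removes any real dependence on x.
    (Illustrative estimator; the shuffle count is an arbitrary choice.)"""
    rng = random.Random(seed)
    y_perm = list(y)
    total = 0.0
    for _ in range(n_shuffles):
        rng.shuffle(y_perm)
        total += mutual_information(x, y_perm)
    return total / n_shuffles


if __name__ == "__main__":
    col_a = "ACGT" * 5
    col_b = "ACGT" * 5
    print("MI:", mutual_information(col_a, col_b))
    print("shuffled residual MI:", shuffled_residual_mi(col_a, col_b))
```

With only 20 samples per column, the shuffled estimate stays above zero even though the permuted data carry no real dependence, which is the finite-sample effect the abstract describes.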