TY - JOUR
T1 - Codon usage is a stochastic process across genetic codes of the kingdoms of life
JF - bioRxiv
DO - 10.1101/066381
SP - 066381
AU - Bohdan B. Khomtchouk
AU - Claes Wahlestedt
AU - Wolfgang Nonner
Y1 - 2016/01/01
UR - http://biorxiv.org/content/early/2016/07/27/066381.abstract
N2 - DNA encodes protein primary structure using 64 different codons to specify 20 different amino acids and a stop signal. To uncover rules of codon use, ranked codon frequencies have previously been analyzed in terms of empirical or statistical relations for a small number of genomes. These descriptions fail on most genomes reported in the Codon Usage Tabulated from GenBank (CUTG) database. Here we model codon usage as a random variable. This stochastic model provides accurate, one-parameter characterizations of 2210 nuclear and mitochondrial genomes represented with > 10^4 codons/genome in CUTG. We show that ranked codon frequencies are well characterized by a truncated normal (Gaussian) distribution. Most genomes use codons in a near-uniform manner. Lopsided usages are also widely distributed across genomes but less frequent. Our model provides a universal framework for investigating determinants of codon use.
ER -