Abstract
Codon usage in protein-encoding genes is one of the fundamental questions of molecular biology, and this study uses machine learning to investigate it. The representation, parameter learning, and decoding of a conditional random field (CRF) model are implemented and then used to analyze the codon usage of the genes of Escherichia coli and its phages. Most E. coli genes use codons that conform to the model weights learned from all E. coli genes. The phages use codons in much the same way as their host, E. coli. Finally, the study evaluates the codon usage of several example genes in the context of the model. These results help to clarify codon usage in E. coli.
Competing Interest Statement
The authors have declared no competing interest.
Copyright
The copyright holder for this preprint is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. All rights reserved. No reuse allowed without permission.