TY - JOUR T1 - Modular non-repeating codes for DNA storage JF - bioRxiv DO - 10.1101/057448 SP - 057448 AU - Ian Holmes Y1 - 2016/01/01 UR - http://biorxiv.org/content/early/2016/06/08/057448.abstract N2 - We describe a strategy for constructing codes for DNA-based information storage by serial composition of weighted finite-state transducers. The resulting state machines can integrate correction of substitution errors; synchronization by interleaving watermark and periodic marker signals; conversion from binary to ternary, quaternary or mixed-radix sequences via an efficient block code; encoding into a DNA sequence that avoids homopolymer, dinucleotide, or trinucleotide runs and other short local repeats; and detection/correction of errors (including local duplications, burst deletions, and substitutions) that are characteristic of DNA sequencing technologies. We present software implementing these codes, available at https://github.com/ihh/dnastore, with simulation results demonstrating that the generated DNA is free of short repeats and can be accurately decoded even in the presence of substitutions, short duplications and deletions. ER -