RT Journal Article
SR Electronic
T1 Modular non-repeating codes for DNA storage
JF bioRxiv
FD Cold Spring Harbor Laboratory
SP 057448
DO 10.1101/057448
A1 Ian Holmes
YR 2016
UL http://biorxiv.org/content/early/2016/06/08/057448.abstract
AB We describe a strategy for constructing codes for DNA-based information storage by serial composition of weighted finite-state transducers. The resulting state machines can integrate correction of substitution errors; synchronization by interleaving watermark and periodic marker signals; conversion from binary to ternary, quaternary or mixed-radix sequences via an efficient block code; encoding into a DNA sequence that avoids homopolymer, dinucleotide, or trinucleotide runs and other short local repeats; and detection/correction of errors (including local duplications, burst deletions, and substitutions) that are characteristic of DNA sequencing technologies. We present software implementing these codes, available at https://github.com/ihh/dnastore, with simulation results demonstrating that the generated DNA is free of short repeats and can be accurately decoded even in the presence of substitutions, short duplications and deletions.