Development and Application of a Novel Simple Sequence Repeat Mining Algorithm Based on Regular Expression

Zhenguo Jia; Ruimei Geng; Xiuming Wu; Shuai Chen; Ying Tong; Aiguo Yang; Chenggang Luo; Min Ren

doi:10.1101/2022.06.01.494292

ABSTRACT

Simple sequence repeats (SSRs) are molecular genetic markers widely used in genomics studies, and SSR markers are routinely mined as part of genetic workflows. Here, we developed a novel SSR mining algorithm based on regular expressions that can reduce the complexity of commonly used SSR mining software. The algorithm uses the following regular expression: ({i, j}?) (\1) {k}, where i and j denote the minimum and maximum motif lengths of the SSR sequence, respectively, and k is the minimum number of motif repeats. Based on this algorithm, we developed SSR sequence analysis software (named “regexSSRw”) that mines eligible SSR loci from FASTA-format sequences; regexSSRw can be accessed at https://github.com/renm79/rgxSSRw. The algorithm can serve a range of users, from programmers developing SSR mining software to researchers incorporating it into their SSR marker workflows.
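The idea behind the expression can be illustrated with a minimal Python sketch. It is not the regexSSRw implementation, and it rests on several assumptions that the abstract does not state: the quantified atom is taken to be the nucleotide class `[ACGT]`, k is read as the total number of motif copies (so the backreference must repeat at least k - 1 times), and the names `SSR`, `build_ssr_pattern`, `read_fasta`, and `find_ssrs` are hypothetical.

```python
import re
from typing import Iterator, List, NamedTuple, Tuple


class SSR(NamedTuple):
    seq_id: str
    start: int   # 0-based start of the repeat tract
    end: int     # exclusive end of the repeat tract
    motif: str
    copies: int  # total number of motif copies in the tract


def build_ssr_pattern(i: int, j: int, k: int) -> "re.Pattern":
    """Compile a pattern for motifs of length i..j repeated at least k times.

    Assumptions (not from the abstract): motif characters are A/C/G/T, and
    k counts all copies including the first, hence the k - 1 below.
    """
    if not (1 <= i <= j) or k < 2:
        raise ValueError("require 1 <= i <= j and k >= 2")
    # Group 1 captures the motif lazily (shortest candidate first);
    # \1{k-1,} then requires at least k - 1 further identical copies.
    return re.compile(r"([ACGT]{%d,%d}?)\1{%d,}" % (i, j, k - 1))


def read_fasta(text: str) -> Iterator[Tuple[str, str]]:
    """Yield (identifier, uppercase sequence) pairs from FASTA-formatted text."""
    seq_id, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if seq_id is not None:
                yield seq_id, "".join(chunks)
            parts = line[1:].split()
            seq_id, chunks = (parts[0] if parts else ""), []
        else:
            chunks.append(line.upper())
    if seq_id is not None:
        yield seq_id, "".join(chunks)


def find_ssrs(fasta_text: str, i: int = 1, j: int = 6, k: int = 5) -> List[SSR]:
    """Return every non-overlapping SSR tract found in each FASTA record."""
    pattern = build_ssr_pattern(i, j, k)
    hits = []
    for seq_id, seq in read_fasta(fasta_text):
        for m in pattern.finditer(seq):
            motif = m.group(1)
            copies = len(m.group(0)) // len(motif)
            hits.append(SSR(seq_id, m.start(), m.end(), motif, copies))
    return hits


if __name__ == "__main__":
    demo = ">demo\nGGTACACACACACTTT\n"
    for hit in find_ssrs(demo, i=1, j=6, k=5):
        print(hit)
```

Because the motif group is lazy, the engine tries the shortest candidate motif first, so a run such as AAAAAA is reported with motif A rather than AA. The default values i = 1, j = 6, and k = 5 are illustrative only and are not taken from the paper.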