PT  - JOURNAL ARTICLE
AU  - Morten Muhlig Nielsen
AU  - Paula Tataru
AU  - Tobias Madsen
AU  - Asger Hobolth
AU  - Jakob Skou Pedersen
TI  - Regmex, Motif analysis in ranked lists of sequences
AID  - 10.1101/035956
DP  - 2016 Jan 01
TA  - bioRxiv
PG  - 035956
4099  - http://biorxiv.org/content/early/2016/01/11/035956.short
4100  - http://biorxiv.org/content/early/2016/01/11/035956.full
AB  - Motif analysis has long been an important method for characterizing biological functionality, and the current growth of sequencing-based genomics experiments further extends its potential. These diverse experiments often generate lists of sequences ranked by some functional property. There is therefore a growing need for motif analysis methods that can exploit this coupled data structure and be tailored to specific biological questions. Here, we present a motif analysis tool, Regmex (REGular expression Motif EXplorer), which offers several methods to identify overrepresented motifs in a ranked list of sequences. Regmex uses regular expressions to define motifs or families of motifs, and embedded Markov models to calculate exact probabilities for motif observations in sequences. Motif enrichment is optionally evaluated using random walks, Brownian bridges, or modified rank-based statistics. These features make Regmex well suited for a range of biological sequence analysis problems related to motif discovery. We demonstrate different usage scenarios, including rank correlation of microRNA binding sites co-occurring with a U-rich motif. The method is available as an R package.