PT - JOURNAL ARTICLE
AU - David Heller
AU - Martin Vingron
AU - Ralf Krestel
AU - Uwe Ohler
AU - Annalisa Marsico
TI - ssHMM: Extracting intuitive sequence-structure motifs from high-throughput RNA-binding protein data
AID - 10.1101/076034
DP - 2016 Jan 01
TA - bioRxiv
PG - 076034
4099 - http://biorxiv.org/content/early/2016/10/11/076034.short
4100 - http://biorxiv.org/content/early/2016/10/11/076034.full
AB - RNA-binding proteins (RBPs) play important roles in RNA post-transcriptional regulation and recognize target RNAs via sequence-structure motifs. To what extent RNA structure influences protein binding in the presence or absence of a sequence motif is still poorly understood. RNA motif finders that both produce informative motifs and capture the relationship between primary sequence and different RNA secondary structures are lacking. We developed ssHMM, an RNA motif finder that combines a hidden Markov model (HMM) with Gibbs sampling to learn the joint sequence and structure binding preferences of RBPs from high-throughput data, such as CLIP-Seq sequences, and visualizes them as a graph. Evaluations on synthetic data showed that ssHMM reliably recovers fuzzy sequence motifs in 80 to 100% of cases. It produces motifs with higher information content than existing tools and is faster than other methods on large datasets. Examples of new sequence-structure motifs identified by ssHMM for uncharacterized RBPs are also discussed. ssHMM is freely available on GitHub at https://github.molgen.mpg.de/heller/ssHMM.