CRISPRCasIdentifier: Machine learning for accurate identification and classification of CRISPR-Cas systems

Victor A. Padilha; Omer S. Alkhnbashi; Shiraz A. Shah; André C. P. L. F. de Carvalho; Rolf Backofen

doi:10.1101/817619

ABSTRACT

CRISPR-Cas genes are extraordinarily diverse and evolve rapidly when compared to other prokaryotic genes. With the rapid increase in newly sequenced archaeal and bacterial genomes, manual identification of CRISPR-Cas systems is no longer viable. Thus, an automated approach is required for advancing our understanding of the evolution and diversity of these systems, and for finding new candidates for genome engineering in eukaryotic models. In this paper, we introduce a holistic strategy that combines regression and classification models for improving the quality of protein cascades, predicting their subtypes, detecting signature genes and extracting potential rules that reveal functional modules for CRISPR.

The copyright holder for this preprint is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under a CC-BY 4.0 International license.