TY  - JOUR
T1  - The CRISPR spacer space is dominated by sequences from the species-specific mobilome
JF  - bioRxiv
DO  - 10.1101/137356
SP  - 137356
AU  - Sergey A. Shmakov
AU  - Vassilii Sitnik
AU  - Kira S. Makarova
AU  - Yuri I. Wolf
AU  - Konstantin V. Severinov
AU  - Eugene V. Koonin
Y1  - 2017/01/01
UR  - http://biorxiv.org/content/early/2017/05/12/137356.abstract
N2  - CRISPR-Cas is the prokaryotic adaptive immunity system that stores memory of past encounters with foreign DNA in spacers that are inserted between direct repeats in CRISPR arrays [1,2]. Homologous sequences, termed protospacers, are detectable in viral, plasmid or microbial genomes for only a small fraction of the spacers [3,4]. The rest of the spacers remain the CRISPR “dark matter”. We performed a comprehensive analysis of the spacers from all CRISPR-cas loci identified in bacterial and archaeal genomes, and found that, depending on the CRISPR-Cas subtype and the prokaryotic phylum, protospacers were detectable for 1% to about 19% of the spacers (∼7% global average). Among the detected protospacers, the majority, typically 80 to 90%, originate from viral genomes, and among the rest, the most common source is genes that are integrated in microbial chromosomes but involved in plasmid conjugation or replication. Thus, almost all spacers with identifiable protospacers target mobile genetic elements (MGEs). The GC-content, as well as the dinucleotide and tetranucleotide compositions, of microbial genomes, their spacer complements, and the cognate viral genomes show a nearly perfect correlation and are almost identical. Given the near absence of self-targeting spacers, these findings are most compatible with the possibility that the spacers, including the dark matter, are derived almost completely from the species-specific microbial mobilomes.
ER  - 