RT Journal Article
SR Electronic
T1 Automating Assessment of the Undiscovered Biosynthetic Potential of Actinobacteria
JF bioRxiv
FD Cold Spring Harbor Laboratory
SP 036087
DO 10.1101/036087
A1 Bogdan Tokovenko
A1 Yuriy Rebets
A1 Andriy Luzhetskyy
YR 2016
UL http://biorxiv.org/content/early/2016/01/07/036087.abstract
AB Background. The biosynthetic potential of Actinobacteria has long been the subject of theoretical estimates. Such estimates matter because they indicate whether a taxon or group of taxa can be further exploited for new therapeutics. Because neither the set of available genomes nor the set of bacterial cultivation methods is static, it makes sense to simplify, as much as possible, the estimation of biosynthetic gene cluster similarity, diversity, and abundance, and to improve its reproducibility. Results. We have developed a command-line computational pipeline (available at https://bitbucket.org/qmentis/clusterscluster/) that assists in performing empirical (genome-based) assessment of microbial secondary metabolite gene cluster similarity and abundance, and applied it to a set of 208 complete and de-duplicated Actinobacteria genomes. After a brief overview of Actinobacteria biosynthetic potential as compared to other bacterial taxa, we use similarity thresholds derived from 4 pairs of known similar gene clusters to identify up to 40-48% of 3247 gene clusters in our set of genomes as unique. There is no saturation of the cumulative unique gene clusters curve within the examined dataset, and Heaps’ alpha is 0.129, suggesting an open pan-clustome. We identify and highlight pitfalls of, and possible improvements to, genome-based gene cluster similarity measurements.