TY  - JOUR
T1  - Automating Assessment of the Undiscovered Biosynthetic Potential of Actinobacteria
JF  - bioRxiv
DO  - 10.1101/036087
SP  - 036087
AU  - Bogdan Tokovenko
AU  - Yuriy Rebets
AU  - Andriy Luzhetskyy
Y1  - 2016/01/01
UR  - http://biorxiv.org/content/early/2016/01/07/036087.abstract
N2  - Background. The biosynthetic potential of Actinobacteria has long been the subject of theoretical estimates. Such an estimate is important as a test of the further exploitability of a taxon or group of taxa for new therapeutics. As neither the set of available genomes nor the set of bacterial cultivation methods is static, it makes sense to simplify estimation of biosynthetic gene cluster similarity, diversity, and abundance as much as possible and to improve its reproducibility. Results. We have developed a command-line computational pipeline (available at https://bitbucket.org/qmentis/clusterscluster/) that assists in performing empirical (genome-based) assessment of microbial secondary metabolite gene cluster similarity and abundance, and applied it to a set of 208 complete and de-duplicated Actinobacteria genomes. After a brief overview of Actinobacteria biosynthetic potential as compared to other bacterial taxa, we use similarity thresholds derived from 4 pairs of known similar gene clusters to identify up to 40-48% of 3247 gene clusters in our set of genomes as unique. There is no saturation of the cumulative unique gene clusters curve within the examined dataset, and Heaps' alpha is 0.129, suggesting an open pan-clustome. We identify and highlight pitfalls and possible improvements of genome-based gene cluster similarity measurements.
ER  - 