Abstract
Background With advances in sequencing technology and decreasing costs, the number of bacteriophage genomes that have been sequenced has increased markedly in the last decade.
Materials and Methods We developed an automated retrieval and analysis system for bacteriophage genomes, INPHARED (https://github.com/RyanCook94/inphared), that provides data in a consistent format.
Results As of January 2021, 14,244 complete phage genomes have been sequenced. The data set is dominated by phages that infect a small number of bacterial genera, with 75% of phages isolated on just 30 bacterial genera. There is further bias, with significantly more lytic than temperate phage genomes within the database, resulting in ~54% of temperate phage genomes originating from just three host genera. Within phage genomes, putative antibiotic resistance genes were found at higher frequencies in temperate phages than in lytic phages.
Conclusion We provide a mechanism to reproducibly extract complete phage genomes and highlight some of the biases within these data, which underpin our current understanding of phage genomes.
Competing Interest Statement
The authors have declared no competing interest.