Abstract
A diverse range of DNA sequences derived from circoviruses (family Circoviridae) have been identified in samples obtained from humans and domestic animals, often in association with pathological conditions. In the majority of cases, however, little or nothing is known about the natural biology of the viruses from which these sequences are derived. Endogenous circoviral elements (CVe) are DNA sequences derived from circoviruses that occur in animal genomes and provide a useful source of information about circovirus-host relationships. In this study we screened whole genome sequence data of 684 animal species to identify CVe. We identified numerous novel circovirus-related sequences in invertebrate genome assemblies, including the first examples of CVe derived from cycloviruses. We confirmed the presence of these CVe in the germline of the elongate twig ant (Pseudomyrmex gracilis) via PCR, thereby establishing the first concrete evidence of a host association for the Cyclovirus genus. We examined the evolutionary relationships between CVe and contemporary circoviruses, showing that when sequences derived from circovirus isolates and CVe are examined alone, the host species associations of circovirus clades appear relatively stable, at least at higher taxonomic levels (i.e. phylum, class, order). However, when sequences generated via metagenomic sequencing are included, host associations appear randomly distributed across the phylogeny, particularly in the clade corresponding to the Cyclovirus genus, suggesting that contamination may be an issue. Based on the robust grouping of CVe from ants and mites with cycloviruses in phylogenies, we propose that cycloviruses occur commonly in the environment as infections of arthropods, and may frequently contaminate vertebrate samples as a consequence. Our study shows how endogenous viral sequences can inform metagenomics-based virus discovery. In addition, it raises important questions about the role of cycloviruses as pathogens of humans and other vertebrate species.
Importance
Advances in DNA sequencing have dramatically increased the rate at which new viruses are being identified. However, the host species associations of most virus sequences identified in metagenomic samples are difficult to determine. Our analysis indicates that viruses proposed to infect vertebrates (in some cases being linked to human disease) may in fact be restricted to arthropod hosts. The detection of these sequences in vertebrate samples may reflect their widespread presence in the environment as viruses of parasitic arthropods.