Abstract
Endogenous retroviruses (ERVs) are genomic fossils of ancient viruses preserved in host genomes. Foamy viruses, including those that have formed endogenous copies, provide strong evidence for virus-host co-divergence across the vertebrate phylogeny. Endogenous foamy viruses (EFVs) have previously been discovered in mammals, amphibians and fish. Here we report a novel endogenous foamy virus, named SpuEFV, in the genome of the tuatara (Sphenodon punctatus), a reptile species endemic to New Zealand. Surprisingly, SpuEFV grouped robustly with the coelacanth EFV in virus phylogenies, rather than with the mammalian foamy viruses as expected under virus-host co-divergence, which is indicative of a major cross-species transmission event in the early evolution of the foamy viruses. In sum, the discovery of SpuEFV fills a major gap in the fossil record of foamy viruses and provides important insights into the early evolution of retroviruses.
Retroviruses (family Retroviridae) are of major medical significance because some are associated with severe infectious disease or are oncogenic (Hayward, et al. 2015; Aiewsakun and Katzourakis 2017; Xu, et al. 2018). Retroviruses are also notable for their ability to integrate into the host germline, generating endogenous retroviruses (ERVs) that are then inherited in a Mendelian manner (Stoye 2012; Johnson 2015). ERVs are widely distributed in vertebrates and provide important molecular “fossils” for the study of retrovirus evolution. ERVs related to all seven major retroviral genera have been described, although some of the more complex retroviruses, such as the lenti-, delta- and foamy viruses, rarely appear as endogenous copies.
As well as being agents of disease, foamy viruses are notable for their long-term co-divergence with their hosts (Switzer, et al. 2005). Endogenous foamy viruses (EFVs), first discovered in sloths (class Mammalia) (Katzourakis, et al. 2009), also exhibit co-divergence. The later discovery of a fish EFV in the coelacanth genome indicated that foamy viruses have an ancient evolutionary history (Han and Worobey 2012) and have therefore likely co-diverged with their vertebrate hosts for hundreds of millions of years (Aiewsakun and Katzourakis 2017). However, although EFVs or foamy-like elements have been reported in fish, amphibians and mammals, none have yet been reported in reptile genomes.
To search for potential foamy (-like) viral elements in reptiles, we collated 28 reptilian genomes (Supplementary Table S1) and performed in silico TBLASTN searches using the full-length Pol sequences of various foamy viruses, including EFVs, as screening probes (Supplementary Table S2). Only viral hits located on long genomic scaffolds (>20 kilobases) were considered bona fide ERVs. This genomic mining identified 175 ERV hits in three species: the tuatara (Sphenodon punctatus), Schlegel’s Japanese gecko (Gekko japonicus) and the Madagascar ground gecko (Paroedura picta). However, because only a single viral hit was found in each of the two geckos (accession numbers LNDG01066615.1 and BDOT01000314.1), and these could represent false positives, they were excluded. Hence, a total of 173 ERV hits in the tuatara genome were extracted and subjected to evolutionary analysis (Supplementary Table S3).
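The scaffold-length filter and the exclusion of species represented by a single hit can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the pipeline actually used: the `Hit` record, the `filter_hits` function and the rule that a species needs at least two hits (`min_hits_per_species=2`) are hypothetical choices made to mirror the reasoning above, and the scaffold names and lengths in the example are invented placeholders.

```python
from collections import Counter
from typing import List, NamedTuple, Tuple

MIN_SCAFFOLD_BP = 20_000  # hits must lie on scaffolds longer than 20 kb


class Hit(NamedTuple):
    """One TBLASTN hit (hypothetical record layout)."""
    species: str
    scaffold: str
    scaffold_len: int  # length of the scaffold carrying the hit, in bp


def filter_hits(hits: List[Hit],
                min_scaffold_bp: int = MIN_SCAFFOLD_BP,
                min_hits_per_species: int = 2) -> Tuple[List[Hit], Counter]:
    """Keep hits on long scaffolds, then drop species with too few hits.

    min_hits_per_species=2 is an assumption mirroring the exclusion of
    species represented by a single, possibly false-positive, hit.
    """
    long_hits = [h for h in hits if h.scaffold_len > min_scaffold_bp]
    counts = Counter(h.species for h in long_hits)
    kept = [h for h in long_hits if counts[h.species] >= min_hits_per_species]
    return kept, counts


# Invented example data
hits = [
    Hit("Sphenodon punctatus", "scaf_A", 150_000),
    Hit("Sphenodon punctatus", "scaf_B", 80_000),
    Hit("Sphenodon punctatus", "scaf_C", 12_000),  # scaffold too short
    Hit("Gekko japonicus", "scaf_D", 45_000),      # only hit for this species
]
kept, counts = filter_hits(hits)
print(len(kept), dict(counts))
```

Counting is done after the length filter, so a species whose only other hits sit on short scaffolds is still treated as a singleton.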
The long Pol (>700 amino acids) and Env (>350 amino acids) sequences of these ERVs were then selected for phylogenetic analysis. Maximum likelihood (ML) phylogenetic trees revealed that the ERVs discovered in the tuatara genome formed a distinct monophyletic group within the foamy virus clade, indicative of a single origin, with high bootstrap support in both phylogenies (Fig. 1; Fig. S1). We named this new ERV SpuEFV (Sphenodon punctatus endogenous foamy virus). To our surprise, SpuEFV was consistently and robustly most closely related to the fish EFV CoeEFV, derived from the coelacanth genome (Han and Worobey 2012), a pattern that conflicts with the known host phylogeny. Although this phylogenetic pattern is compatible with cross-class virus transmission from fish to reptiles, it may change as taxon sampling increases and more EFVs are added to the phylogeny. The failure to detect any SpuEFV-related elements in the other reptile genomes screened suggests that the virus was not vertically transmitted among reptiles, although this will clearly need to be reassessed with a larger sample size.
We retrieved two full-length SpuEFV genomes and annotated one in detail (Fig. S2). The annotated sequence exhibits a typical spumavirus genome structure, encoding three major open reading frames (ORFs; gag, pol and env) and one additional accessory gene, Acc1 (Fig. 2). Interestingly, Acc1 exhibits no sequence similarity to known genes. Notably, by searching the Conserved Domains Database (www.ncbi.nlm.nih.gov/Structure/cdd), we identified a typical conserved foamy virus envelope protein domain (pfam03408) (Han and Worobey 2012), further confirming that SpuEFV is of foamy virus origin.
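As a rough illustration of how major ORFs such as gag, pol and env can be located in a recovered proviral sequence, the sketch below scans the three forward reading frames for ATG-initiated ORFs that end at a stop codon. It assumes a forward-strand, intron-free sequence and a hypothetical minimum ORF length (`min_aa`); it is not the annotation procedure used for SpuEFV, and the function name `find_orfs` is illustrative only.

```python
from typing import List, Tuple

STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_orfs(seq: str, min_aa: int = 100) -> List[Tuple[int, int, int, int]]:
    """Find ATG-initiated ORFs in the three forward frames.

    Returns (frame, start, end, length_aa) tuples sorted longest first;
    `end` is exclusive and includes the stop codon, `length_aa` excludes it.
    """
    seq = seq.upper()
    orfs = []
    for frame in range(3):
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i  # first ATG opens an ORF; internal ATGs are ignored
            elif start is not None and codon in STOP_CODONS:
                length_aa = (i - start) // 3
                if length_aa >= min_aa:
                    orfs.append((frame, start, i + 3, length_aa))
                start = None  # look for the next ATG after this stop
    return sorted(orfs, key=lambda orf: orf[3], reverse=True)


# Toy sequence: ATG + five GCT codons + TAA stop
print(find_orfs("ATG" + "GCT" * 5 + "TAA", min_aa=5))
```

Real proviral annotation would also scan the reverse strand and tolerate frameshifts and premature stops introduced after endogenization, which this sketch deliberately ignores.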
To broadly estimate the integration time of SpuEFVs, we employed the LTR (long terminal repeat)-divergence method, which measures the divergence between the 5’ and 3’ LTRs of a provirus and, because the two LTRs are identical at the time of integration, converts this divergence into an age assuming a known rate of nucleotide substitution (Johnson and Coffin 1999). In total, five LTR pairs flanking SpuEFV elements were used for date estimation (Supplementary Table S4), yielding integration times for SpuEFV ranging from 1.3 to 35.47 million years ago (MYA). Although these dates are young relative to the age of reptiles, LTR dating may severely underestimate ERV ages (Kijima and Innan 2010; Aiewsakun and Katzourakis 2017), so all estimates of integration time should be treated with caution.
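The divergence between the two LTRs of a single element can be computed from their pairwise alignment as the proportion of differing sites (an uncorrected p-distance). In this sketch, alignment columns containing a gap in either LTR are skipped, and pairs in which either LTR is shorter than 300 bp are rejected, following the exclusion rule in Materials and Methods. The gap treatment, the absence of any multiple-hit correction and the function name `ltr_divergence` are assumptions for illustration.

```python
MIN_LTR_BP = 300  # LTRs shorter than this were excluded from dating


def ltr_divergence(ltr5_aln: str, ltr3_aln: str) -> float:
    """Uncorrected p-distance between the two aligned LTRs of one element.

    Alignment columns with a gap ('-') in either LTR are skipped.
    """
    if len(ltr5_aln) != len(ltr3_aln):
        raise ValueError("LTRs must be aligned to the same length")
    for ltr in (ltr5_aln, ltr3_aln):
        if len(ltr.replace("-", "")) < MIN_LTR_BP:
            raise ValueError("LTR shorter than 300 bp; excluded from dating")
    compared = diffs = 0
    for a, b in zip(ltr5_aln.upper(), ltr3_aln.upper()):
        if a == "-" or b == "-":
            continue
        compared += 1
        if a != b:
            diffs += 1
    if compared == 0:
        raise ValueError("no ungapped columns shared by the two LTRs")
    return diffs / compared


# Toy pair: 400 aligned sites, 4 differences
print(ltr_divergence("A" * 400, "A" * 396 + "CCCC"))
```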
Previous studies have provided strong evidence for the co-divergence of foamy viruses and their vertebrate hosts over extended time periods (Katzourakis, et al. 2009). That the reptilian SpuEFV described here is more closely related to fish EFVs than to those found in mammalian genomes (Fig. 3) indicates that cross-species transmission, occurring against a backbone of long-term virus-host co-divergence, may also have played a major role in shaping the early evolution of retroviruses.
Materials and Methods
Genomic mining
To identify foamy viruses in reptiles, we used the TBLASTN program (Altschul, et al. 1990) to screen 28 reptile genomes downloaded from GenBank (www.ncbi.nlm.nih.gov/genbank) (Supplementary Table S1). In each case, the amino acid sequences of the Pol and Env genes of representative EFVs (endogenous foamy viruses), foamy-like sequences and foamy viruses were used as queries. To retain only significant hits, we required more than 30% amino acid identity over at least 30% of the query sequence, with an e-value cut-off of 0.00001. We then extended the sequences flanking each hit to identify the 5’ and 3’ LTRs using LTR_FINDER (Xu and Wang 2007) and LTRharvest (Ellinghaus, et al. 2008).
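The hit filters used in the TBLASTN screen can be expressed as a single pass over BLAST tabular output. This sketch assumes the default 12-column tabular format (`-outfmt 6`), interprets the 30% threshold as the fraction of the query covered by the alignment (an assumption, since coverage could be defined differently), and takes query lengths from a user-supplied dictionary. The function name, query name and scaffold names are hypothetical.

```python
from typing import Dict, Iterable, List

MIN_IDENTITY = 30.0    # percent amino acid identity (must be exceeded)
MIN_QUERY_COV = 0.30   # assumed: fraction of the query covered by the alignment
MAX_EVALUE = 1e-5      # e-value cut-off


def filter_tblastn(lines: Iterable[str],
                   query_lengths: Dict[str, int]) -> List[dict]:
    """Filter TBLASTN hits in default 12-column tabular format (-outfmt 6).

    Columns: query id, subject id, % identity, alignment length, mismatches,
    gap opens, q. start, q. end, s. start, s. end, e-value, bit score.
    """
    kept = []
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank and comment lines
        fields = line.rstrip("\n").split("\t")
        qseqid, sseqid = fields[0], fields[1]
        pident = float(fields[2])
        qstart, qend = int(fields[6]), int(fields[7])
        evalue = float(fields[10])
        coverage = (qend - qstart + 1) / query_lengths[qseqid]
        if pident > MIN_IDENTITY and coverage >= MIN_QUERY_COV and evalue <= MAX_EVALUE:
            kept.append({
                "query": qseqid,
                "scaffold": sseqid,
                "pident": pident,
                "coverage": coverage,
                "sstart": int(fields[8]),
                "send": int(fields[9]),
                "evalue": evalue,
            })
    return kept


# Hypothetical query length and hits
qlens = {"FV_Pol": 1000}
rows = [
    "FV_Pol\tscaf1\t45.2\t350\t180\t5\t10\t359\t1000\t2049\t1e-40\t250",
    "FV_Pol\tscaf2\t25.0\t400\t300\t2\t1\t400\t50\t1249\t1e-20\t120",
    "FV_Pol\tscaf3\t40.0\t100\t60\t0\t1\t100\t10\t309\t1e-3\t50",
]
for hit in filter_tblastn(rows, qlens):
    print(hit["scaffold"], round(hit["coverage"], 2))
```

The subject coordinates are kept so that surviving hits can be extended into their flanking sequence for LTR detection.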
Phylogenetic analysis
To determine the evolutionary relationships among EFVs and other retroviruses, Pol and Env protein sequences were aligned using MAFFT 7.222 (Katoh and Standley 2013) and checked manually in MEGA7 (Kumar, et al. 2016). Phylogenetic relationships among these sequences were then inferred using the maximum likelihood (ML) method in PhyML 3.1 (Guindon, et al. 2010), with 100 bootstrap replicates to assess node support. The best-fit models of amino acid substitution, determined using ProtTest 3.4.2 (Abascal, et al. 2005), were RtREV+Γ+I for Pol and WAG+Γ for Env. All alignments used in the phylogenetic analyses are provided in Data set S1.
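In the standard nonparametric bootstrap, each replicate is built by resampling alignment columns with replacement; every pseudo-replicate is then analysed with the same ML method, and the support for a node is the fraction of replicate trees that recover it. The sketch below shows only the resampling step, with hypothetical function names and toy sequences; PhyML performs this step internally, and tree inference itself is outside the scope of the example.

```python
import random
from typing import Dict, List


def bootstrap_alignment(alignment: Dict[str, str],
                        rng: random.Random) -> Dict[str, str]:
    """One nonparametric bootstrap pseudo-replicate.

    Alignment columns are sampled with replacement until the replicate has
    as many columns as the original; every sequence uses the same columns.
    """
    lengths = {len(seq) for seq in alignment.values()}
    if len(lengths) != 1:
        raise ValueError("sequences must be aligned to the same length")
    n_sites = lengths.pop()
    columns = [rng.randrange(n_sites) for _ in range(n_sites)]
    return {name: "".join(seq[c] for c in columns)
            for name, seq in alignment.items()}


def bootstrap_replicates(alignment: Dict[str, str], n: int = 100,
                         seed: int = 1) -> List[Dict[str, str]]:
    """Generate n pseudo-replicates from a single seeded generator."""
    rng = random.Random(seed)
    return [bootstrap_alignment(alignment, rng) for _ in range(n)]


# Toy protein alignment (hypothetical taxa and sequences)
aln = {"taxon_A": "MKVLAT", "taxon_B": "MKILGT", "taxon_C": "MRVLAS"}
reps = bootstrap_replicates(aln)
print(len(reps), reps[0]["taxon_A"])
```

Resampling whole columns, rather than residues independently per sequence, preserves the homology statements encoded by the alignment.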
Molecular dating
The ERV integration time can be calculated using the simple relation T = (D/R)/2, in which T is the integration time (in years; divided by 10^6 to give million years, MY), D is the number of nucleotide differences per site between the two LTRs, and R is the genomic substitution rate (i.e. the number of nucleotide substitutions per site per year). The division by two accounts for the two LTRs accumulating substitutions independently after integration. We used the previously estimated neutral substitution rate for squamate reptiles (7.6 × 10^-10 nucleotide substitutions per site per year) (Perry, et al. 2018). LTRs shorter than 300 bp were excluded from this analysis.
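As a worked example of the relation T = (D/R)/2 with the squamate rate used here, the function below returns an integration time in million years. The divergence value in the usage line is a hypothetical input chosen for easy arithmetic, not one of the SpuEFV estimates, and the function name is illustrative.

```python
NEUTRAL_RATE = 7.6e-10  # substitutions per site per year (squamates; Perry et al. 2018)


def integration_time_myr(d: float, rate: float = NEUTRAL_RATE) -> float:
    """Integration time T = (D / R) / 2, returned in million years.

    d    -- nucleotide differences per site between the 5' and 3' LTRs
    rate -- substitutions per site per year
    The division by 2 reflects both LTRs diverging independently.
    """
    if d < 0 or rate <= 0:
        raise ValueError("d must be >= 0 and rate must be > 0")
    years = (d / rate) / 2
    return years / 1e6


# Hypothetical divergence, chosen for round numbers
print(round(integration_time_myr(0.0152), 2))
```

For instance, D = 0.0152 gives 0.0152 / (7.6 × 10^-10) = 2 × 10^7 years of total LTR divergence, which is halved to 10 MY.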
Acknowledgments
J.C. is supported by the National Natural Science Foundation of China (31671324) and the CAS Pioneer Hundred Talents Program. E.C.H. is supported by an ARC Australian Laureate Fellowship (FL170100022).