Abstract
Defining the unique protein features of SARS-CoV-2, the viral agent causing Coronavirus Disease 2019, may guide efforts to control this pathogen. We examined proteins encoded by the Sarbecoviruses closest to SARS-CoV-2 using profile Hidden Markov Model similarities to identify features unique to SARS-CoV-2. Consistent with previous reports, a small set of bat- and pangolin-derived Sarbecoviruses showed the greatest similarity to SARS-CoV-2. The analysis provided a measure of total proteome similarity and showed that a small subset of bat Sarbecoviruses are closely related to SARS-CoV-2 but unlikely to be its direct source. Spike analysis revealed that the current SARS-CoV-2 variants of concern have sampled only 36% of the possible spike changes that have occurred historically in Sarbecovirus evolution. New SARS-CoV-2 variants with changes in these regions are likely to be compatible with virus replication and are to be expected in the coming months unless global viral replication is severely reduced.
Competing Interest Statement
The authors have declared no competing interest.