Abstract
SARS-CoV-2 is one of three recognized coronaviruses (CoVs) that have caused epidemics or pandemics in the 21st century and that likely emerged from animal reservoirs. Differences in nucleotide and protein sequence composition among related β-coronaviruses are often used to better understand CoV evolution, host adaptation, and emergence as human pathogens. Here we report a comprehensive analysis of amino acid residue changes in lineage B β-coronaviruses (sarbecoviruses) that show covariance with each other. This analysis revealed patterns of covariance within conserved viral proteins that potentially define conserved interactions within and between core proteins encoded by SARS-CoV-2-related β-coronaviruses. Using an independent pair model followed by a tandem model approach, we identified not only individual pairs but also networks of amino acid residues that exhibited statistically high frequencies of covariance with each other. Using 149 different CoV genomes that vary in their relatedness, we identified networks of unique combinations of alleles that can be incrementally traced genome by genome within different phylogenetic lineages. Remarkably, the most abundantly represented covariant residues and their respective regions are implicated in the emergence of SARS-CoV-2 and are also enriched in dominant SARS-CoV-2 variants.
Significance Statement
There are currently enormous international efforts to better understand the emergence of SARS-CoV-2 as a human pathogen and its persistence, in various mutated forms, in the ongoing pandemic. Most studies have focused on identifying presumptive gain-of-fitness mutations in core viral proteins. The contributions of residues unique to SARS-CoV-2 or those conserved in other β-coronaviruses are not yet understood. Also, the absence of an identified ancestral virus of SARS-CoV-2 prevents a continuous comparison-driven analysis within a lineage of other potential bat-, civet-, and human-adapted viruses, such as SARS-CoV, that may have similarly contributed to the emergence of SARS-CoV-2 through viral evolution and recombination. Here we identified unique amino acid residues that vary between distinct β-coronaviruses but are conserved in covariant pairs or networks of covariant residues within or between viral proteins. The relationships revealed are likely preserved in the evolutionary record because these covariant residues impose selective pressure on each other through direct or indirect interactions. The covariant residues we identified could play key roles in the emergence of SARS-CoV-2 as a human pathogen and in its continued evolution in more recently emerged variants of clinical interest.
Competing Interest Statement
The authors have declared no competing interest.
Abbreviations
- AA: Amino acid
- ATDS: Average Taxonomic Distribution Score
- CoV: Coronavirus
- CTD: Carboxy terminal domain
- FCS: Furin Cleavage Site
- FP: Fusion Peptide
- GISAID: Global Initiative on Sharing All Influenza Data
- NCBI: National Center for Biotechnology Information
- nsp: Nonstructural protein
- NTD: Amino terminal domain
- PCA: Principal Component Analysis
- RBD: Receptor binding domain
- WHO: World Health Organization