ABSTRACT
Severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2, formerly 2019-nCoV) is a positive-sense, single-stranded RNA coronavirus. The virus is the causative agent of coronavirus disease 2019 (COVID-19) and spreads through human-to-human transmission. The present study reports sequence analysis, full-coordinate tertiary structure prediction, and in silico sequence-based and structure-based functional characterization of the complete SARS-CoV-2 proteome. The analysis is based on the NCBI reference sequence NC_045512 (29,903 nt single-stranded RNA), which is identical to GenBank entries MN908947 and MT415321. The proteome includes 12 major proteins, namely orf1ab polyprotein (comprising 15 proteins), surface glycoprotein, ORF3a protein, envelope protein, membrane glycoprotein, ORF6 protein, ORF7a protein, ORF7b protein, ORF8 protein, nucleocapsid phosphoprotein and ORF10 protein. Each protein of the orf1ab polyprotein group was studied separately. Of the 25 polypeptides analyzed, only 10 have experimental structures with known PDB IDs, and 15 do not yet have experimental structures. Of these 15 newly predicted structures, six were built by comparative modeling, and nine, which showed no significant similarity to currently available PDB structures, were modeled ab initio. ERRAT and PROCHECK validation indicated that the all-atom tertiary structure models are of high quality and may be useful for structure-based drug design. The study identified nine major drug targets (spike protein, envelope protein, membrane protein, nucleocapsid protein, 2’-O-ribose methyltransferase, endoRNAse, 3’-to-5’ exonuclease, RNA-dependent RNA polymerase and helicase). Another 16 nonstructural proteins (NSPs) may also be considered from a drug design perspective. The predicted protein structures have been deposited in ModelArchive.
Competing Interest Statement
The authors have declared no competing interest.