Abstract
The lack of a complete pig proteome has left a gap in our knowledge of the pig genome and has limited the feasibility of using pigs as a biomedical model. We developed tissue-based proteome maps using 34 major normal pig tissues. A total of 7,319 unknown protein isoforms were identified and systematically characterized, including 3,703 novel protein isoforms, 669 protein isoforms from 460 genes whose symbols begin with LOC, and 2,947 protein isoforms without clear NCBI annotation in the current pig reference genome. These newly identified protein isoforms were functionally annotated by profiling the pig transcriptome with high-throughput RNA sequencing (RNA-seq) of the same pig tissues, further improving the genome annotation of the corresponding protein-coding genes. By combining well-annotated genes that have parallel expression patterns and subcellular evidence, we predicted the tissue-related subcellular components and potential functions of these unknown proteins. Finally, we identified 3,656 orthologous genes for 49.95% of the unknown protein isoforms across multiple species, involving 65 KEGG pathways and 25 disease signaling pathways. These findings provide valuable insights and a rich resource for advancing studies of pig genomics and biology, as well as biomedical model applications to human medicine.
Footnotes
Pengju Zhao: zhaopengju2014{at}gmail.com, Xianrui Zheng: zxr07sk1{at}163.com, Ying Yu: yuying{at}cau.edu.cn, Zhuocheng Hou: zchou{at}cau.edu.cn, Chenguang Diao: firepanda007{at}163.com, Haifei Wang: wanghaiffei{at}126.com, Huimin Kang: nongdaxiaokang{at}126.com, Chao Ning: ningchao{at}cau.edu.cn, Junhui Li: cooljunhui{at}126.com, Wen Feng: wfeng{at}cau.edu.cn, Wen Wang: wwang{at}wangweb-lab.org, George E. Liu: george.liu{at}ars.usda.gov, Bugao Li: jinrenn{at}163.com, Jacqueline Smith: Jacqueline.smith{at}roslin.ed.ac.uk, Yangzom Chamba: qbyz628{at}126.com, Jian-Feng Liu: liujf{at}cau.edu.cn
List of Abbreviations
- RNA-seq: RNA sequencing
- PCGs: protein-coding genes
- LC-MS/MS: liquid chromatography tandem mass spectrometry
- EST: expressed sequence tag
- PSMs: peptide spectrum matches
- FDR: false discovery rate
- PBMC: peripheral blood mononuclear cells
- GO: Gene Ontology
- KEGG: Kyoto Encyclopedia of Genes and Genomes
- RIN: RNA Integrity Number