ABSTRACT
Lectin-glycan interactions facilitate inter- and intracellular communication in many processes including protein trafficking, host-pathogen recognition, and tumorigenesis promotion. Specific recognition of glycans by lectins is also the basis for a wide range of applications in areas including glycobiology research, cancer screening, and antiviral therapeutics. To provide a better understanding of the determinants of lectin-glycan interaction specificity and support such applications, this study comprehensively investigates specificity-conferring features of all available lectin-glycan complex structures. Systematic characterization, comparison, and predictive modeling of a set of 221 complementary physicochemical and geometric features representing these interactions highlighted specificity-conferring features with potential mechanistic insight. Univariable comparative analyses with weighted Wilcoxon-Mann-Whitney tests revealed strong statistical associations between binding site features and specificity that are conserved across unrelated lectin binding sites. Multivariable modeling with random forests demonstrated the utility of these features for predicting the identity of bound glycans based on generalized patterns learned from non-homologous lectins. These analyses revealed global determinants of lectin specificity, such as sialic acid glycan recognition in deep, concave binding sites enriched for positively charged residues, in contrast to high mannose glycan recognition in fairly shallow but well-defined pockets enriched for non-polar residues. Focused analysis of hemagglutinin interactions with human-like and avian-like glycans uncovered features representing both known and novel mutations related to shifts in influenza tropism from avian to human tissues. The presented systematic characterization of lectin binding sites provides a novel approach to studying lectin specificity and is a step towards confidently predicting new lectin-glycan interactions.
AUTHOR SUMMARY
Glycans are sugar molecules found attached to many proteins and coating the outsides of cells from most organisms. Specific recognition of glycans by proteins called lectins facilitates many biological processes, for example enabling influenza to gain access to cells, helping the immune system recognize pathogens, and sorting newly built proteins for transport to appropriate cellular regions. Understanding what makes a particular lectin consider a particular glycan “sweeter” than the vast set of other glycans can help us better understand these processes and how to monitor and control them. To that end, we systematically characterized the sites on lectin structures where glycans are bound, breaking down molecular structures into a comprehensive set of biochemical and geometric features summarizing the sites. This enabled us to discover statistical relationships between binding site features and the glycans recognized by the sites, and further to be able to predict, from a lectin structure, which glycans it recognizes. For the first time, we are able to demonstrate that there are general features of lectin binding sites correlated with and predictive of their specificities, even in unrelated lectins. Ultimately, these findings can help us discover and engineer new lectins for use in research, diagnostics, or even therapeutics.
Competing Interest Statement
The authors have declared no competing interest.
ABBREVIATIONS
- 3DZDs: 3-dimensional Zernike descriptors
- AAL2: Agrocybe aegerita lectin 2
- Ca2+: Calcium
- CFG: Consortium for Functional Glycomics
- Fuc: Fucose
- Gal: Galactose
- GalNAc: N-acetylgalactosamine
- Glc: Glucose
- GlcNAc: N-acetylglucosamine
- Lac: Lactose
- LacNAc: N-acetyllactosamine
- NeuAc: N-acetylneuraminic acid
- NMR: Nuclear magnetic resonance
- PDB: Protein Data Bank
- PLIP: Protein-Ligand Interaction Profiler
- RF: Random forest
- RFU: Relative Fluorescence Units
- SNFG: Symbol Nomenclature for Glycans
- SMILES: Simplified molecular-input line-entry system
- WMW test: Wilcoxon-Mann-Whitney test (also known as Wilcoxon rank-sum test or Mann-Whitney U test)