Abstract
Broadly neutralizing antibodies (bNAbs) targeting the HIV-1 envelope glycoprotein (Env) have promising utility in prevention and treatment of HIV-1 infection with several undergoing clinical trials. Due to high sequence diversity and mutation rate of HIV-1, viral isolates are often resistant to particular bNAbs. Resistant strains are commonly identified by time-consuming and expensive in vitro neutralization experiments. Here, we developed machine learning-based classifiers that accurately predict resistance of HIV-1 strains to 33 neutralizing antibodies. Notably, our classifiers achieved an overall prediction accuracy of 96% for 212 clinical isolates from patients enrolled in four different clinical trials. Moreover, use of the tree-based machine learning method gradient boosting machine enabled us to identify critical epitope features that distinguish between antibody resistance and sensitivity. The availability of an in silico antibody resistance predictor will facilitate informed decisions of antibody usage in clinical settings.
Introduction
During the past decade, broadly neutralizing antibodies (bNAbs) were isolated from sera of HIV-1 chronically infected donors, with several undergoing clinical trials for use of preventing and treating HIV-1 infection (1–3). As a result of the high sequence diversity and mutation rate of HIV-1, HIV-1 viral isolates can be resistant to a particular bNAb and administration of bNAbs may lead to viral escape (Figure 1a). Resistant viral strains are usually identified by subcloning or synthesizing amplified Envs, producing pseudoviruses, and performing in vitro neutralization assays (1), which is time-consuming and expensive.
While many genotypic assays and in silico algorithms have been developed to predict HIV-1 drug resistance (4) and co-receptor usage (5), in silico prediction of neutralization susceptibility to bNAbs has only been explored by a few studies (6–8), none with publicly available software.
In this work, we present bNAb-Resistance Predictor (bNAb-ReP), an HIV-1 isolate antibody resistance predictor based on the white-box non-linear predictive modeling technique, gradient boosting machine (GBM) (Supplementary Figure S1). GBM has been shown to be competitive with black-box non-linear modeling techniques such as deep learning, particularly when large amounts of training data are not available (9, 10). To build bNAb-ReP, we used sequence and neutralization data for 33 HIV-1 bNAbs obtained from the CATNAP database (11). Full Env sequences and IC50 neutralization titers for 205 to 711 HIV-1 isolates with varying clade distributions were available for each bNAb (Supplementary Figure S2).
Results and discussion
The performance of bNAb-ReP was evaluated in ten runs of ten-fold cross validation measured as the area under the receiver operating characteristic curve (AUC). All bNAb-ReP classifiers performed better than random prediction with average AUC values between 0.63 and 0.97 and an overall median AUC of 0.83 (Figure 1b). Notably, the AUC scores were significantly higher in 26 of the 33 cases using GBM when compared to conventional prediction methods like logistic regression and random forest (Supplementary Figure S3).
In contrast to black-box machine learning approaches, the major advantage of the tree-based method, GBM, is the ability to obtain variable importance scores for all input features, which enables interpretability of the predictive models (feature importance for all 33 bNAb classifiers can be found under https://github.com/RedaRawi/bNAb-ReP). For instance, the top three discriminative features of the bNAb VRC01 classifier involve HIV-1 Env residues 414A, 456, and 459 with a total feature importance of 24.84% (Figure 1c, Supplementary Table S1). Structural studies revealed two of the three amino acid positions were located at the VRC01 epitope and thus can be critical to VRC01 binding and neutralization (Figure 1c) (12). Additionally, the top three features of bNAb 8ANC195 classifier account for a total variable importance of 48.47% and include Env residues 234 and 276, which must be glycosylated in order for 8ANC195 to bind and neutralize Env (Figure 1d, Supplementary Table S2) (13). Not all of the top discriminative features, however, involved bNAb epitope residues, suggesting that distal regions outside the epitope also affect neutralization susceptibility of HIV-1 strains (Supplementary Table S4). In particular, the prediction accuracy for glycan-V3 directed bNAbs had the highest reduction when using only epitope regions rather than full Env sequences. These feature importance scores provide helpful information to facilitate bNAb optimization and guide immunogen design.
To validate bNAb-ReP beyond the data sets obtained from CATNAP, we predicted antibody resistance of HIV-1 Env sequences from clinical studies of HIV infection. First, we tested the bNAb VRC01 classifier on clinical HIV-1 isolates obtained from HIV positive patients enrolled in the VRC601 trial (1) studying the efficacy of VRC01 as therapeutic to control viral load. bNAb-ReP correctly predicted 100% of the resistant and 87% of the sensitive strains to VRC01 (Figure 2a, b).
Additionally, we evaluated the prediction performance of bNAb-ReP using sequence and neutralization data from a phase IIa clinical trial studying HIV positive patients treated with bNAb 3BNC117 (3). bNAb-ReP’s overall classification accuracy was 87%, correctly predicting 26 of 29 sensitive HIV-1 Env strains, although falsely predicting the only resistant sequence as sensitive (Supplementary Figure S5).
To further evaluate bNAb-ReP’s prediction accuracy, we performed in vitro neutralization assay experiments on clinical sequences obtained from Bar et al. (Supplementary Data S1 and Table S4) (2). bNAb-ReP predicted neutralization susceptibility to VRC01, 3BNC117, 10-1074, and PGT121 with accuracies of 82%, 96%, 100%, and 100%, respectively (Figure 2c).
In addition to predicting neutralization susceptibility from the aforementioned clade B sequences, we used bNAb-ReP to predict resistance for clade A and A/D recombinant sequences from a superinfection case study in a Ugandan couple (14). Interestingly, the bNAb-ReP classification accuracy was 100%, 93%, 100%, 100%, 100%, and 100%, for bNAbs VRC01, PGT121, PGT128, PGT145, VRC26.25, and VRC34.01, respectively (Figure 2e).
In this work, we developed bNAb-ReP, a GBM-based antibody resistance predictor, and demonstrated bNAb-ReP’s ability to predict neutralization susceptibility of HIV-1 isolates with high accuracy for several independent clinical test sets. The underlying machine learning technique, GBM, provided insight into how specific features can distinguish between predicting resistance and sensitivity in HIV-1 strains, informing antibody optimization and immunogen design experiments. The availability of bNAb-ReP will facilitate easy and fast prediction of HIV-1 isolates for their neutralization susceptibility to bNAbs, allowing realtime assessment in a clinical setting. The bNAb-ReP predictors for 33 HIV-1 bNAbs are available for download at https://github.com/RedaRawi/bNAb-ReP.
Author contributions:
R.R. and G.-Y.C. designed research; R.R., C.-H. S., R.M., S.K.F., J.Z., performed computational research; N.A.D., A.S., T.-W.C., R.L., and J.R.M. for viral sequence and neutralization data; R.R., P.D.K., and G.-Y.C. wrote the paper, with all authors providing comments or revisions.
Materials and methods
Training data
We used neutralization data of 33 different antibodies (10-1074, 2F5, 2G12, 35O22, 3BNC117, 4E10, 8ANC195, CH01, DH270.1, DH270.5, DH270.6, HJ16, NIH-4546, PG16, PG9, PGDM1400, PGT121, PGT128, PGT135, PGT145, PGT151, VRC-CH31, VRC-PG04, VRC01, VRC03, VRC07, VRC13, VRC26.08, VRC26.25, VRC29.03, VRC34.01, VRC38.01 and b12) assayed against 205 to 711 HIV-1 isolates published in the CATNAP database (1). These assays were performed using single-round-of-infection Env-pseudoviruses on cell lines (2, 3). Each HIV-1 isolate is represented with its full-length envelope glycoprotein amino acid sequence. Duplicated full-length HIV-1 envelope sequences were removed. The viral isolate was categorized as resistant to an antibody if its geometric mean IC50 is greater than 50μg/ml or designated with a “>” sign or was categorized as sensitive otherwise.
Test data
For the data from VRC601 clinical trial, we used sequences and neutralization data from Env-pseudoviruses generated from Envs isolated by single genome amplification (SGA) RT-PCR from plasma virus, as described in Lynch et al. (4). The Env-pseudoviruses were assayed on TZM-bl cells as in Sarzotti-Kelsoe et al. (3). The clade A and A/D sequences and neutralization data were generated the same way and were taken from Ssemwanga et al. (5). The VRC01-ATI sequences were derived from patients in an analytical treatment interruption trial, in which volunteers based at the NIH were administered VRC01 infusions before and during an interruption of antiretroviral therapy (6). In that publication, Env sequences were generated by SGA; however, the published neutralization assays were performed with infectious virus from outgrowth cultures, not in the Env-pseudovirus/TZM-bl format. Here, we report new data, for which we expressed the Env-pseudoviruses from the sequences reported in Bar et al. and used the TZM-bl format as above (6).
Gradient boosting machine (GBM)
To build the training models, we employed a non-linear interpretable tree-based ensemble technique referred as gradient boosting machine (GBM) for building antibody resistance predictors using h2o package (Version 3.16.0.2) in R software (https://www.R-project.org) (7, 8). GBM belongs to the family of predictive methods, which uses an iterative strategy such that the learning framework will consecutively fit new models so as to have a more accurate estimate of the response variable after each iteration. The primary notion behind this technique is to construct new tree-based learners to be as correlated as possible with the negative gradient of a given loss function, calculated using all the training data. We can use any arbitrary loss function (L(·,·)) here. However, if the loss function is the most commonly used squared-loss function, the learning procedure would result in consecutive residual error-fitting. Algorithm 1 summarizes the generic GBM approach.
Gradient Boosting Machine
The advantage of the boosting procedure is that it works on decreasing the bias of the model, without increasing the variance. Learning uncorrelated base learners helps to reduce the bias of the final ensemble model. In this work, we used the L2-TreeBoost approach proposed in (7) to build the core GBM model. Here the loss function is the classical squared-loss function (L2):
In our approach, the base learner is a J-terminal node classification tree. Each tree model has an additive form given as:
Here are J disjoint regions that together cover the space of all joint values of the predictor variable X. These regions represent the J terminal nodes of the corresponding classification tree. The indicator function 1(·) takes the value 1 if the argument passed to it is true, and 0 otherwise. Because the regions are disjoint, θ(X) is equivalent to the prediction rule: if X ∈ Pj, then θ(X) = γj. Now, the pseudo-residuals become:
The line search becomes:
Using classification trees as base learners, we use the idea of separate updates for each terminal region as proposed in (7) to get:
The L2-TreeBoost approach for two-class GBM is summarized in Algorithm 2.
Here the parameter η is a regularization parameter which is used to avoid overfitting the models and is acquired via cross validation. For each iteration a, the least-squares criterion (I(ϕ)) used to assess potential splits of a current terminal region P into two disjoint sub-regions (Pl, Pr) is given by: where Yl and Yr are the left and right child node responses respectively, and wl wr are proportional to the number of samples in regions Pl and Pr respectively as show in (Friedman, 2001). I(ϕ) is a measure of the importance of the variable (ϕ) which maximizes this criterion. During a given iteration, only one feature is allowed to cause a split into 2 terminal regions. Thus, in the case of a J-terminal node classification tree, we generate J – 1 such measures. However, the same feature can generate multiple optimal splits for the J-terminal node tree. In such a scenario, we sum the importance of such features to get the total importance of each feature ϕ after A iterations. This procedure results in the variable importance scores from the GBM approach.
Classifier features
Sequence information was represented using one-hot encoding to represent 20 standard amino acids and N-linked glycan. Each amino acid aai, i∈{1, …, 21} was translated into a 21-dimensional vector, where the ith vector position was set to 1, and all other 20 vector positions were set to 0. For instance, applying one-hot encoding to an amino acid sequence of length 100, would be translated into a binary vector of length 2100.
Training of bNAb-ReP
To train bNAb-ReP classifiers, we first performed a hyperparameter optimization to identify the optimal GBM parameters for the given data. We created a grid of T × J × r × η = 120, in particular number of trees T = 1000, maximum depth J∈{1,2,3,4,5,6}, sample rate and learn rate η∈{0.001,0.01,0.05,0.1,0.2}. We then performed tenfold cross validation for each of the combinations. Finally, we selected the best parameters that had the maximal ten-fold cross validation area-under-the-curve (AUC). Once the optimal hyper-parameters are known, the model is built on the full training set using these parameters and its prediction performance is evaluated on the independent test set.
Alternative Predictors Logistic Regression and Random Forest
Logistic Regression belongs to the class of generalized linear models and we trained binomial predictors using glm function available in h2o package in R. Random Forest (RF) belongs to the class of ensemble based supervised white-box learning techniques. The RF algorithm applies the general technique of bagging or bootstrapped aggregating to decision tree learners. We performed a grid search for optimizing the hyper-parameters including the number of trees in the random forest, maximum depth of the trees and column sampling rate using a 10-fold crossvalidation strategy. We used the distributed random forest function, for implementing random forest models, available in h2o package in R.
Derivation of probability threshold to categorize sensitivity and resistance
Though there is no clear relationship between the proportion of data to be used for training, testing and the model performance, Shabin et al. identified that the best results were obtained when 75% of the whole dataset was used for training and 25% for testing (9). Similar as implemented by Pfeiffer et al. and Hake et al., we used that probability cutoff as the optimal threshold to distinguish between resistant and sensitive viral sequences (10, 11). In particular, we chose for each bNAb classifier a cutoff that provided the best balance between average true positive and true negative rate.
Epitope and paratope buried surface area calculations
The buried surface area between antibody and antigen was calculated using NACCESS software (12, 13). The epitope and paratope residues for each antibody were defined as residues with non-zero buried surface area. In the case of 2G12, the epitope residues were defined as glycans N295, N332, N339, N386, and N392, based on Scanlan et al. (14). The final epitope residues for each category were defined as follows. V1V2 category epitope residues comprised all alignment positions between residue numbers 131-196 (HXB2 numbering). The epitope residues for all other categories were defined as the union of all bNAb epitope residues within each category determined as described above.
Statistical analyses
P-value and odds ratio values presented in Fig.1c and 1d were calculated using Fisher’s exact test (R function fisher.test). Statistical significance, presented in Supplementary Fig. S3 was determined using the following procedure. First, we tested for the list of AUC values for normal distribution using R library nortest, in particular function ad.test. If normal distribution and additionally variance homogeneity were given (R function var.test), we used t-test to determine significance (R function t. test). If neither normal distribution nor variance homogeneity were given, we applied Mann-Whitney test (R function wilcox.test).
Data and software availability
We provide all neutralization and sequence data, as well as the resistance predictors for 33 broadly neutralizing HIV-1 antibodies at https://github.com/RedaRawi/bNAb-ReP.
Supplementary Data S1. HIV-1 Env sequences of VRC01-ATI study
>V08DA_POVIp_2H3.A.9
MRVRGIRKNYQRLWKWG-TLLSMLLGLLMICNAKEQLWVTVYYGVPVWKEANTTLFCASDARAYSTEKHNVWATHACVPTDPDPQEVKLGNVTENFNMWKNNMVDQMHEDIISLWDQSLKPCVEITPLCVTLNCTDYKGNASSTN------NSGGEMENKEEMKNCSFNITTSLSDKV-QERALFYIYDITPINN-----TTNTSYKLTRCNSSVITQACPKVSFEPIPIHYCAPAGFAILKCRQKNFKGTGPCTDVSTVQCTHGIRPVVSTQLLLNGSLAEDEVVIKSENLTDNSKNIIVQLQEAINITCTRPNNNTRKSINI--GPGRAIYATGEIIGDIRQAHCNISEEKWNKTLGQIVEKLRKQFN-NKTIVFAQPSGGDPEIVMHSFNCGGEFFYCNTTRLFNSTWSTWNDS-STLSNGTGNNTITLPCRIKQVINMWQKVGKAMYAPPIAGQIRCSSNITGLILTRDGG----SNNMSNETFRPGGGNMKDNWRSELYKYKVVKIEPLGIAPTKAKRRVVQREKRAIG-IGAVFLGFLGAAGSTMGAASVTLTVQARQLLFGIVQQQNNLLRAIEAQQHLLQLTVWGIKQLQARVLAVERYLKDQQLLGIWGCSGKLICTTAVPWNTSWSNKSLEQIWDNMTWMQWEKEINNYTGLIYTLIEQSQNQQEKNEQELLELDTWASLWNWFDISKWLWYIKIFIMIVGGLIGLRIVFTVLSIVNRVRQGYSPLSFQTRLPAQRG-PDRPEGIEEEGGERDRDRSGPLVDGFFAIIWVDLRNLCLFIYHRLRDLLLIVTRIVELLGR-RGWEILKYWWNLLQYWRQELKNSAISLLNATAIAVAEGTDRVIEVVQRIGRGILHIPVRIRQGLERALL
>V08_DA_PRAp_2G2.2
MRVRGIRKNYQRLWKWG-TLLSMLLGLLMICNAKEQLWVTVYYGVPVWREANTTLFCASDARAYSTEKHNVWATHACVPTDPDPQEVKLGNVTENFNMWKNNMVDQMHEDIISLWDQSLKPCVEITPLCVTLNCTDYRGNDNSTN------NSNETIVNKEEIKNCSFNITTSLSDKR-QERALFYKYDITPINS-----TTNTSYRLTRCNSSVITQACPKVSFEPIPIHYCAPAGFAILKCRQKNFKGTGPCTDVSTVQCTHGIRPVVSTQLLLNGSLAEEEVVIKSESFTDNSKNIIVQLKEAINITCTRPNNNTRKSINI--GPGRAIYATGEIIGDIRQAHCNISEEKWNETLGQIVKKLQEQFG-NKSIIFAQPAGGDPETVMHSFNCGGEFFYCNTTRLFNSTWNDSS----TRSNGTGNNTITLPCRIKQVINMWQKVGKAMYAPPIAGQIRCSSNITGLILTRDGG----SNNMSNETFRPGGGNMKDNWRSELYKYKVVKIEPLGIAPTKAKRRVVQREKRAIG-IGAVFLGFLGAAGSTMGAASVTLTVQARQLLFGIVQQQNNLLRAIEAQQHLLQLTVWGIKQLQARVLAVERYLKDQQLLGIWGCSGKLICTTAVPWNISWSNKSLEQIWNNMTWMQWEKEINNYTGLIYTLIEQSQNQQEKNEQELLELDTWASLWNWFDISKWLWYIKIFIMIVGGLIGLRIVFTVLSIVNRVRQGYSPLSFQTRPPAQRG-PDRPEGIEEEGGERDRDRSGPLVDGFFAIIWVDLRNLCLFIYHRLRDLLLIVTRIVELLGR-RGWEILKYWWNLLQYWRQELKNSAISLLNATAIAVAEGTDRVIEVVQRIGRGILHIPVRIRQGLERALL
>V08DA_POVIp_1H8.6
MRVRGIRKNYQRLWKWG-TLLSMLLGLLMICNAKEQLWVTVYYGVPVWKEANTTLFCASDARAYSTEKHNVWATHACVPTDPDPQEVGLGNVTENFNMWKNNMVDQMHEDIISLWDQSLKPCVEITPLCVTLNCTDYRGNDNSTN------NSNETIVNKEGIKNCSFNITTSLSDKR-QERALFYKYDITPINN-----TTNTSYKLTRCNSSVITQACPKVSFEPIPIHYCAPAGFAILKCRQKNFNGTGPCTNVSTVQCTHGIRPVVSTQLLLNGSLAEEEVVIKSENFTDNSKNIIVQLKEAINITCTRPNNNTRKSINI--GPGRAIYATGEIIGDIRQAHCNISEEKWNETLGQIVEKLQKQFG-NKSIIFAQPSGGDPETIMHSFNCGGEFFYCNTTQLFNSTWNDSS----TRSNGTGNNTITLPCRIKQVINMWQKVGKAMYAPPIAGQIRCSSNITGLILTRDGG-----SNMSNETFRPGGGNMKDNWRSELYKYKVVKIEPLGIAPTKAKRRVVQREKRAIG-IGAVFLGFLGAAGSTMGAASVTLTVQARQLLFGIVQQQNNLLRAIEAQQHLLQLTVWGIKQLQARVLAVERYLKDQQLLGIWGCSGKLICTTAVPWNISWSDKSLEQIWNNMTWMQWEKEINNYTGLIYTLIEQSQNQQEKNEQELLELDTWASLWNWFDISKWLWYIKIFIMIVGGLIGLRIVFTVLSIVNRVRQGYSPLSFQTRPPAQRG-PDRPEGIEEEGGERDRDRSGPLVDGFFAIIWVDLRNLCLFIYHRLRDLLLIVTRIVELLGR-RGWEILKYWWNLLQYWRQELKNSAISLLNATAIAVAEGTDRVIEVVQRIGRGILHIPVRIRQGLERALL
>V08_DA_PRAp_2G9.7
MRVRGIRKNYQRLWKWG-TLLSMLLGLLMICNAKEQLWVTVYYGVPVWEEANTTLFCASDARAYSTEKHNVWATHACVPTDPDPQEVGLGNVTENFNMWKNNMVDQMHEDIISLWDQSLKPCVEITPLCVTLNCTDYRGNDNSTN------NSNETIVNKEEIKNCSFNITTSLSDKV-QERALFYKYDITPINS-----TTNTSYKLTRCNSSVITQACPKVSFEPIPIHYCAPAGFAILKCRQKNFNGTGPCTEVSTVQCTHGIRPVVSTQLLLNGSLAEGEVVIKSENFTDNSKNIIVQLKEAINITCTRPNNNTRKSINI--GPGRAIYATGEIIGDIRQAHCNISEEKWNETLGQIVEKLQEQFG-NKSIIFTQPSGGDPEIVMHSFNCGGEFFYCNTTRLFNSTWNDSS----TRSNGTGNNTITLPCRIKQVINMWQKVGKAMYAPPIAGQIRCSSNITGLILTRDGG----SNNMSNETFRPGGGNMKDNWRSELYKYKVVKIEPLGIAPTKAKRRVVQREKRAIG-IGAVFLGFLGAAGSTMGAASVTLTVQARQLLFGIVQQQNNLLRAIEAQQHLLQLTVWGIKQLQARVLAVERYLKDQQLLGIWGCSGKLICTTAVPWNASWSNKSLEQIWNNMTWMQWEKEINNYTGLIYTLIEQSQNQQEKNEQELLELDTWASLWNWFDISKWLWYIKIFIMIVGGLIGLRIVFTVLSIVNRVRQGYSPLSFQTRPPAQRG-PDRPEGIEEEGGERDRDRSGPLVDGFFAIIWVDLRNLCLFIYHRLRDLLLIVTRIVELLGR-RGWEILKYWWNLLQYWRQELKNSAISLLNATAIAVAEGTDRVIEVVQRIGRGILHIPVRIRQGLERALL
>V08DA_POVIp_1E4.1
MRVRGIRKNYQRLWKWG-TLLSMLLGLLMICNAKEQLWVTVYYGVPVWKEASTTLFCASDARAYSTEKHNVWATHACVPTDPDPQEVGLGNVTENFNMWKNNMVDQMHEDIISLWDQSLKPCVEITPLCVTLNCTDYRGNDNSTN------NSNETIANKEEIKNCSFNITTSLSDKV-QERALFYMYDITPINN-----TTNTSYKLTRCNSSVITQACPKVSFEPIPIHYCAPAGFAILKCRQKNFNGTGPCTNVSTVQCTHGIRPVVSTQLLLNGSLAEEEVVIKSENITDNSKNIIVQLKEAINITCTRPNNNTRKSINI--GPGRAIYATGEIIGDIRQAHCNISEEKWNETLRQIVEKLQKQFG-NKSIIFAQPAGGDPETVMHSFNCGGEFFYCNTTRLFNSTWSTWNDS-STLSNGTGNNTITLPCRIKQVINMWQKVGKAMYAPPIAGQIRCSSNITGLILTRDGG----SNNMSNETFRPGGGNMKDNWRSELYKYKVVKIEPLGIAPTKAKRRVVQREKRAIG-IGAVFLGFLGAAGSTMGAASVTLTVQARQLLFGIVQQQNNLLRAIEAQQHLLQLTVWGIKQLQARVLAVERYLKDQQLMGIWGCSGKLICTTAVPWNASWSNKSLQQIWDNMTWMQWEKEINNYTGLIYTLIEQSQNQQEKNEQELLELDTWASLWNWFDISKWLWYIKIFIMIVGGLIGLRIVFTVLSIVNRVRQGYSPLSFQTRPPAQRG-PDRPEGIEEEGGERDRDRSGPLVDGFFAIIWVDLRNLCLFIYHRLRDLLLIVTRIVELLGR-RGWEILKYWWNLLQYWRQELKNSAISLLNATAIAVAEGTDRVIEVVQRIGRGILHIPVRIRQGLERALL
>V08DA_POVIp_2C7.1
MRVRGIRKNYQRLWKWG-TLLSMLLGLLMICNAKEQLWVTVYYGVPVWKEANTTLFCASDARAYSTEKHNVWATHACVPTDPDPQEVGLGNVTENFNMWKNNMVDQMHEDIISLWDQSLKPCVEITPLCVTLNCTDYKGNDNSTN------NSNETIVNKEEIKNCSFNITTSLSDKV-QERALFYMYDITPINN-----TTNTSYRLTRCNSSVITQACPKVSFEPIPIHYCAPAGFAILKCRQKNFNGTGPCTNVSTVQCTHGIRPVVSTQLLLNGSLAEEEVVIKSENFTDNSKNIIVQLKEAINITCTRPNNNTRKSINI--GPGRAIYATGEIIGDIRQAHCNISEGKWNKTLGQIVEKLQKQFG-NKSIIFTQPSGGDPEIVMHSFNCGGEFFYCNTTRLFNSTWNDSS----TRSNGTGNNTITLPCRIKQVINMWQKVGKAMYAPPIAGQIRCSSNITGLILTRDGG----SNNMSNETFRPGGGNMKDNWRSELYKYKVVKIEPLGIAPTKAKRRVVQREKRAIG-IGAVFLGFLGAAGSTMGAASVTLTVQARQLLFGIVQQQNNLLRAIEAQQHLLQLTVWGIKQLQARVLAVERYLKDQQLLGIWGCSGKLICTTAVPWNTSWSDKSLEQIWNNMTWMQWEKEINNYTGLIYTLIEQSQNQQEKNEQELLELDTWASLWNWFDISKWLWYIKIFIMIVGGLIGLRIVFTVLSIVNRVRQGYSPLSFQTRPPAQRG-PDRPEGIEEEGGERDRDRSGPLVDGFFAIIWVDLRNLCLFIYHRLRDLLLIVTRIVELLGR-RGWEILKYWWNLLQYWRQELKNSAISLLNATAIAVAEGTDRVIEVVQRIGRGILHIPVRIRQGLERALL
>V08_DA_PRAp_2E12.5
MRVRGIRKNYQRLWKWG-TLLSMLLGLLMICNAKEQLWVTVYYGVPVWKEANTTLFCASDARAYSTEKHNVWATHACVPTDPDPQEVGLGNVTENFNMWKNNMVDQMHEDIISLWDQSLKPCVEITPLCVTLNCTDYRGNDNSTN------NSNETIVNKEEIKNCSFNITTSLSDKV-QERALFYMYDITPINN-----TTNTSYKLTRCNSSVITQACPKVSFEPIPIHYCAPAGFAILKCRQKNFNGTGPCTDVSTVQCTHGIRPVVSTQLLLNGSLAEEEVVIKSENFTDNSKNIIVQLKEAINITCTRPNNNTRKSINI--GPGRAIYATGEIIGDIRQAHCNISEEKWNKTLGQIVKKLQKQFG-NKSIIFAQPSGGDPEIVMHSFNCGGEFFYCNTTRLFNSTWSTWNDS-STGSNGTGNNTITLPCRIKQVINMWQKVGKAMYAPPIAGQIRCSSNITGLILTRDGG----SNNMSNETFRPGGGNMKDNWRSELYKYKVVKIEPLGIAPTKAKRRVVQREKRAIG-IGAVFLGFLGAAGSTMGAASVTLTVQARQLLFGIVQQQNNLLRAIEAQQHLLQLTVWGIKQLQARVLAVERYLKDQQLLGIWGCSGKLICTTAVPWNTSWSNKSLQQIWDNMTWMQWEKEINNYTGLIYTLIEQSQNQQEKNEQELLELDTWASLWNWFDISKWLWYIKIFIMIVGGLIGLRIVFTVLSIVNRVRQGYSPLSFQTRPPAQRG-PDRPEGIEEEGGERDRDRSGPLVDGFFAIIWVDLRNLCLFIYHRLRDLLLIVTRIVELLGR-RGWEILKYWWNLLQYWRQELKNSAISLLNATAIAVAEGTDRVIEVVQRIGRGILHIPVRIRQGLERALL
>V08DA_POVIp_2D12.A.3
MRVRGIRKNYQRWWKWG-TLLSMLLGLLMICNAKEQLWVTVYYGVPVWKEASTTLFCASDARAYSTEKHNVWATHACVPTDPDPQEVGLGNVTENFNMWKNNMVDQMHEDIISLWDQSLKPCVEITPLCVTLNCTDYRGND----------NSNETIVNKEEIKNCSFNITTSLSDKV-QERALFYMYDITPINN-----TTNTSYKLTRCNSSVITQACPKVSFEPIPIHYCAPAGFAILKCRQKNFNGTGPCTDVSTVQCTHGIRPVVSTQLLLNGSLAEGEVVIKSENFTDNSKNIIVQLKEAINITCTRPNNNTRKSINI--GPGRAIYATGEIIGDIRQAHCNISEEKWNKTLGQIVEKLQKQFG-NKSIIFAQPSGGDPETVMHSFNCGGEFFYCNTTRLFNSTWNDSS----TLSNGTGNNTITLPCRIKQVINMWQKVGKAMYAPPIAGQIRCSSNITGLILTRDGG----SNNMSNETFRPGGGNMKDNWRSELYKYKVVKIEPLGIAPTKAKRRVVQREKRAIG-IGAVFLGFLGAAGSTMGAASVTLTVQARQLLFGIVQQQNNLLRAIEAQQHLLQLTVWGIKQLQARVLAVERYLKDQQLLGIWGCSGKLICTTAVPWNTSWSNKSLGQIWDNMTWMQWEKEINNYTGLIYTLIEQSQNQQEKNEQELLELDTWASLWNWFDISKWLWYIKIFIMIVGGLIGLRIVFTVLSIVNRVRQGYSPLSFQTRLPAQRG-PDRPEGIEEEGGERDRDRSGPLVNGFFAIIWVDLRNLCLFIYHRLRDLLLIVTRIVELLGR-RGWEILKYWWNLLQYWRQELKNSAISLLNATAIAVAEGTDRVIEVVQRIGRGILHIPVRIRQGLERALL
>V08DA_PRAp_2A5.5
MRVRGIRKNYQRLWKWG-TLLSMLLGLLMICNAKEQLWVTVYYGVPVWKEASTTLFCASDARAYSTEKHNVWATHACVPTDPDPQEVGLGNVTENFNMWKNNMVDQMHEDIISLWDQSLKPCVEITPLCVTLNCTDYRGNDNSTN------NSNETIVNKEEIKNCSFNITTSLSDKV-QERALFYMYDITPINN-----TTNTSYKLTRCNSSVITQACPKVSFEPIPIHYCAPAGFAILKCRQKNFNGTGPCTDVSTVQCTHGIRPVVSTQLLLNGSLAEGEVVIKSENFTDNSKNIIVQLKEAINITCTRPNNNTRKSINI--GPGRAIYATGEIIGDIRQAHCNISEEKWNKTLEQIVEKLQKQFG-NKSIIFAQPSGGDPETVMHSFNCGGEFFYCNTTRLFNSTWNDSS----TLSNGTGNNTITLPCRIKQVINMWQKVGKAMYAPPIAGQIRCSSNITGLILTRDGG----SNNMSNETFRPGGGNMKDNWRSELYKYKVVKIEPLGIAPTKAKRRVVQREKRAIG-IGAVFLGFLGAAGSTMGAASVTLTVQARQLLFGIVQQQNNLLRAIEAQQHLLQLTVWGIKQLQARVLAVERYLKDQQLLGIWGCSGKLICTTAVPWNTSWSNKSLEQIWDNMTWMQWEKEINNYTGLIYTLIEQSQNQQEKNEQELLELDTWASLWNWFDISKWLWYIKIFIMIVGGLIGLRIVFTVLSIVNRVRQGYSPLSFQTRLPAQRG-PDRPEGIEEEGGERDRDRSGPLVDGFFAIIWVDLRNLCLFIYHRLRDLLLIVTRIVELLGR-RGWEILKYWWNLLQYWRQELKNSAISLLNATAIAVAEGTDRVIEVVQRIGRGILHIPVRIRQGLERALL
>V06_MO_POVIp_2D8.7
MKVKGIRRNYQLLWRW----GIMLLGILRICNATEELWVTVYYGVPVWKEANTTLFCASDAKTYDTEAHNVWATHACVPTDPNPQEIVLGNVTENFNMWKNDMVEQMHEDIISLWDQSLKPCVKLTPLCVKLNCTDASTNTTNSTSTTNSTSGGEERMEKEEIKNCSFKITTSTREKV-KGSAFFYKTDVVSIAK------DNTSYKLIHCNTSVITQACPKVSFEPIPIHYCAPAGFAILKCNNKTFNGTGPCKNVSTVLCTHGIKPVVSTQLLLNGSLAEGEVIIRSENFTNNAKTIIVHLNESVVINCTRPNNNTRRSINM--GPGRAFYAT-GIIGDIRQAYCNISQGKWNYTLKQIVKKLREQFGENKTIVFNQSSGGDPEIVTHSFNCGGEFFYCNTTKLFNSTWNTSA----WHGYQEKNDTITLPCKIRQIINMWQEVGKAMYAPPIAGVIRCSSNITGLLLTRDGG----NESEGNEIFRPGGGDMRDNWRSELYKYKVVKIEPLGIAPTKAKRRVVQREKRAVG-IGALFLGFLGAAGSTMGAASITLTGQARLLLSGIVQQQNNLLRAIEAQQHLLQLTVWGIKQLQARILAVERYLKDQQLLGIWGCSGKLICTTAVPWNTSWSNKSMDNIWNNMTWMEWEREIDNYTGIIYNLLEDSQYQQEKNEKELLELDKWASLWNWFSITNWLWYIKIFIMIVGGLVGLRIVFAAFSIINRVRQGYSPLSFQTRFPAPRG-PDRPEGIEEEGGERDRDRSGRLVSGFLPLIWDDLRNLCLFSYHRLRDLLLIVTRIVELLGR-RGWEALKYWWNLLQYWGQELKNSAVSLLNAATIAVAEGTDRIIEVLQGTFRGILHIPTRIRQGLERALL
>V06MO_PRAp_1G3.2_b
MKVKGIRRNYQLLWRW----GIMLLGILMICNATEKLWVTVYYGVPVWKEANTTLFCASDAKAYDTEVHNVWATHACVPTDPNPQELLLENVTENFNMWKNDMVEQMHEDIISLWDQSLKPCVKLTPLCVKLNCTDANTSTTNSTSTTNSTSGGEERMEKEEIKNCSFNITTSTQEKV-KGSAFFYKIDIVPIGN----DTTNTSYKLLHCNTSVITQACPKVSFEPIPIHYCAPAGFAILKCNNEKFNGTGPCKNVSTVLCTHGIKPVVSTQLLLNGSLAEGEVMIRSENFTNNAKTIIVHLNESVVINCTRPNNNTRKSINM--GPGRAFYAT-GIIGDIRQAHCNISRTAWNNTLEKIVKKLREQFGKNKTIVFNQSSGGDPEIVTHSFNCGGEFFYCNTTKLFNSTWNTSM----WHGSNEQEENVTIPCKIRQIINMWQEVGKAMYAPPIAGVIRCSSEITGLLLTRDGG--NESESEGKEIFRPGGGDMRDNWRSELYKYKVVKIEPLGIAPTKAKRRVVQREKRAVG-IGALFLGFLGAAGSTMGAASIVLTGQARLLLSGIVQQQNNLLRAIEAQQHLLQLTVWGIKQLQARVLAVERYLKDQQLLGIWGCSGKLICTTAVPWNASWSNKSQNEIWNNMTWMEWEREIDNYTGIIYNLLEDSQYQQEKNEKELLELDKWASLWNWFSITNWLWYIRIFIMIVGGLVGLRIVFAVFSVINKVRQGYSPLSFQTRFPAPRG-PDRPEGIEEEGGERDRDRSGRLVNGFLPLIWDDLRSLCLFSYRRLRDLLLIATRIVELLGR-RGWEALKYWWNLLQYWGQELKNSAVSLLNAAAIAVAEGTDRIIEVLQRIFRAILHIPTRIRQGLERALL
>V06_MO_POVIp_1F2.7
MKVKGIRRNYQLLWRW----GIMLLGILRICNATEELWVTVYYGVPVWKEANTTLFCASDAKAYDKEVHNVWATHACVPTDPTPQEIVLGNVTENFNMWKNDMVEQMHEDIISLWDQSLKPCVKLTPLCVKLNCTDASTSTTNSTSTTNSTSGGEERMEKEEIKNCSFNITTSTQEKV-KGSAFFYKIDIVPIGN----DTTNTSYRLIHCNTSVITQACPKVSFEPIPIHYCAPAGFAILKCNNETFNGTGPCTNVSTVICTHGIKPVVSTQLLLNGSLAEGEVMIRSENFTNNAKTIIVHLNESVVINCTRPNNNTRKSINM--GPGRAFYAT-GIIGDIRQAHCNISRTAWNNTLKKIVEKLREQFGKNKTIVFNQSSGGDPEIVTHSFNCGGEFFYCNTTKLFNSTWNTSI----EHGSKDQEGNITLPCKIRQIINMWQEVGKAMYAPPIAGVIRCSSDITGLLLTRDGG--NESESEGKEIFRPGGGDMRDNWRSELYKYKVVKIEPLGIAPTKAKRRVVQREKRAVG-IGALFLGFLGAAGSTMGAASIMLTGQTRQLLSGIVQQQNNLLRAIEAQQHLLQLTVWGIKQLQARVLAVERYLKDQQLLGIWGCSGKLICTTDVPWNTSWSNKTLDKIWNNMTWMEWEREIDNYTGIIYNLLEDSQYQQEKNEKELLELDKWTSLWTWFSITNWLWYIRIFIMIVGGLVGLRIVFAVISIINKVRQGYSPLSFQTRFPAPRGRPDRPEGIEEEGGERDRDRSGRLVNGFLPLIWDDLRSLCLFSYRRLRDLLLIVTRIVELLGR-RGWEALKYWWNLLQYWGQELKNSAVSLLNAAAIAVAEGTDRIIEVLQRIFRAILHIPTRIRQGLERALL
>V06_MO_PRAp_1E2.6
MKVKGIRRNYQLLWKW----GIMLLGILMICNATEKLWVTVYYGVPVWKEANTTLFCASDARAYDKEVHNVWATHACVPTDPNPQEVVLGNVTENFNMWKNDMVEQMHEDIISLWDQSLKPCVKLTPLCVKLNCTNASTNTTNSTSTTNSTSDDKERMEKEEIKNCSFNITTSTQEKV-KGSAFFYKIDIVPMGN----DTTNTSYRLIHCNTSVITQACPKVSFEPIPIHYCAPAGFAILKCNNETFNGTGPCKNVSTVLCTHGIKPVVSTQLLLNGSLAEGEVMIRSENFTNNAKTIIVHLNEAVVINCTRPNNNTRRSINM--GPGRAFYAT-GIIGDIRQAHCNISRTAWNDTLKKIVKKLREQFG-NKTIVFNQSSGGDPEIVTHSFNCGGEFFYCNTTKLFSSTWNTST----WHESNDQGENITLPCKIRQIINMWQEVGKAMYAPPIAGVIRCSSNITGLLLTRDGG----NGSEGKEIFRPGGGDMRDNWRSELYKYKVVKIEPLGIAPTKAKRRVVQREKRAVG-IGALFLGFLGAAGSTMGAASIMLTGQTRLLLSGIVQQQNNLLRAIEAQQHLLQLTVWGIKQLQARVLAVERYLKDQQLLGIWGCSGKLICTTDVPWDTNWSNKTLDKIWNNMTWMEWEREIDNYTGIIYNLLEESQYQQEKNEKELLELDKWASLWNWFSITNWLWYIRIFIMIVGGLVGLRIVFTVLSIINRVRQGYSPLSFQTRFPAPRGRPDRPEEIEEEGGERDRDRSGRLVNGFLPLIWDDLRSLCLFSYRRLRDLILIVTRIVELLGR-RGWEALKYWWNLLQYWGQELKNSAVSLLNAAAIAVAEGTDRIIEVLQRIFRAILHIPTRIRQGLERALL
>V04EB_PRAp_1H5.6
MKIKETRKNYQHFWKW----GTMLLGMLLICSAAEQLWVTVYYGVPVWKEATTTLFCASDAKAYDTEAHNVWATHACVPTDPNPQEVVLGNVTENFNAWKNNMVEQMHEDVISLWDQSLKPCVKLTPLCVTLNCTDWKNPTNNTNTT-DTNATTVNREELGEIKNCSFNITTSMRDKMQKAYAPFYKLDVEPIDN------NNTSYRLISCNTSVITQACPKMSFEPIPIHYCAPAGFAILKCNNKTFNGKGPCTNVSTVQCTHGIKPVVSTQLLLNGSLAEKEVVIRSENFTDNAKVITVHLNDSVGINCTRPNNNTRKSIQI--GPGRAFFETGGIIGNIRQAHCNISQKQWNTTLGQIVVKLREQYGSNKTIVFNQSSGGDPEIVMHSFNCGGEFFYCNTTPLFNSTWNGTNG--TWKGTNGTNGTITIQCRIKQFINMWQGVGKAMYAPPISGIIRCSSNITGLLLTRDGG----NETSGTEIFRPAGGNMKDNWRSELYKYKVVKIEPLGVAPTKAKRRVVQREKRAVGTIGAMFLGFLGAAGSTMGAASMTLTVQARLLLSGIVQQQNNLLRAIEAQQHLLQLTVWGIKQLQARVLAVERFLRDQQLLGIWGCSGKLICTTTVPWNASWSNKSLDMIWHNMTWMQWEREIDNYTQLIYTLIEESQNQQEKNEQELLELDKWASLWNWFSITQWLWYIKISIMIVAGLVGLRIVFAVLSIVNRVRQGYSPLSFQTRLPTPRG-PDRPEETEEEGGDRDRDRSVRLVNGFLALIWDDLRSLCLFSYHHLRDLLLIAARIVELLGR-RGWEALKYWWNLLQYWIQELKNSAVSLLNATAIAVAEGTDRIIEVVQRACR------------ANRA--
>V04EB_POVIp_2A8.4
MKIKETRKNYQHFWKW----GTMLLGMLLICSAAEQLWVTVYYGVPVWKEATTTLFCASDAKAYDTEAHNVWATHACVPTDPNPQEVVLGNVTENFNAWKNNMVEQMHEDVISLWDQSLKPCVKLTPLCVTLNCTDWKNPTNNTNTT-DTNTTIVNREELGEIKNCSFNITTSMRDKMQKAYAPFYKLDVEPIDN------NNTSYRLISCNTSVITQACPKMSFEPIPIHYCAPAGFAILKCNNKTFNGKGPCTNVSTVQCTHGIKPVVSTQLLLNGSLAEKEVVIRSENFTDNAKVITVHLNDSVGINCTRPNNNTRKSIQI--GPGRAFFETGGIIGNIRQAHCNISQKHWNTTLGQIVVKLREQYGSNKTIVFNQSSGGDPEIVMHSFNCGGEFFYCNTTPLFNSTWNGTNG--TWKGTNGTNGTITIQCRIKQFINMWQGVGKAMYAPPISGIIRCSSNITGLLLTRDGG----NETSGTEIFRPAGGNMKDNWRSELYKYKVVKIEPVGVAPTKAKRRVVQREKRAVGTIGAMFLGFLGAAGSTMGAASMTLTVQARLLLSGIVQQQNNLLRAIEAQQHLLQLTVWGIKQLQARVLAVERFLRDQQLLGIWGCSGKLICTTTVPWNASWSNKSLDMIWHNMTWMQWEREIDNYTQLIYTLIEESQNQQEKNEQELLELDKWASLWNWFSITQWLWYIKIFIMIVAGLVGLRIVFAVLSIVNRVRQGYSPLSFQTRLPTPRG-PDRPEETEEEGGDRDRDRSVRLVNGFLALIWDDLRSLCLFSYHHLRDLLLIAARIVELLGR-RGWEALKYWWNLLQYWIQELKNSAVSLLNATAIAVAEGTDRIIEVVQRACR------------ANRA--
>V04EB_POVIp_1D1.1
MKIKETRKNYQHFWKW----GTMLLGMLLICSAAEQLWVTVYYGVPVWKEATTTLFCASDAKAYDTEAHNVWATHACVPTDPNPQEVVLGNVTENFNAWKNNMVEQMHEDVISLWDQSLKPCVKLTPLCVTLNCTDWKNPTNNTNTT-DTNATTVNREELGEIKNCSFNITTSMRDKMQKAYAPFYKLDVEPIDN------NNTSYRLISCNTSVITQACPKMSFEPIPIHYCAPAGFAILKCNNKTFNGKGPCTNVSTVQCTHGIKPVVSTQLLLNGSLAEKEVVIRSENFTDNAKVITVHLNDSVGINCTRPNNNTRKSIQI--GPGRAFFETGGIIGNIRQAHCNISQKQWNTTLGQIVVKLREQYGSNKTIVFNQSSGGDPEIVMHSFNCGGEFFYCNTTPLFNSTWNGTNG--TWKGTNGTNGTITIQCRIKQFINMWQGVGKAMYAPPISGIIRCSSYITGLLLTRDGG----NETSGTEIFRPAGGNMKDNWRSELYKYKVVKIEPLGVAPTKAKRRVVQREKRAVGTIGAMSLGFLGAAGSTMGAASMTLTVQARLLLSGIVQQQNNLLRAIEAQQHLLQLTVWGIKQLQARVLAVERFLRDQQLLGIWGCSGKLICTTTVPWNASWSNKSLDMIWHNMTWMQWEREIDNYTQLIYTLIEESQNQQEKNEQELLELDKWASLWNWFSITQWLWYIKIFIMIVAGLVGLRIVFAVLSIVNRVRQGYSPLSFQTRLPTPRG-PDRPEETEEEGGDRDRDRSVRLVNGFLALIWDDLRSLCLFSYHHLRDLLLIAARIVELLGR-RGWEALKYWWNLLQYWIQELKNSAVSLLNATAIAVAEGTDRIIEVVQRACR------------ANRA--
>V04EB_POVIp_1F8.7
MKIKETRKNYQHFWKW----GTMLLGMLLICSAAEQLWVTVYYGVPVWKEATTTLFCASDAKAYDTEAHNVWATHACVPTDPNPQEVVLGNVTENFNAWKNNMVEQMHEDVISLWDQSLKPCVKLTPLCVTLNCTDWKNPTNNTNTT-DTNATTTNREELGEIKNCSFNITTSMRDKMQKAYAPFYKLDVEPIDN------NNTSYRLISCNTSVITQACPKMSFEPIPIHYCAPAGFAILKCNNKTFNGKGPCTNVSTVQCTHGIKPVVSTQLLLNGSLAEKEVVIRSENFTDNAKVITVHLNDSVGINCTRPNNNTRKSIQI--GPGRAFFETGGIIGNIRQAHCNISQKQWNTTLGQIVVKLREQYGSNKTIVFNQSSGGDPEIVMHSFNCGGEFFYCNTTPLFNSTWNGTNG--TWKGTNGTNGTITIQCRIKQFINMWQGVGKAMYAPPISGIIRCSSNITGLLLTRDGG----NETSGTEIFRPAGGNMKDNWRSELYKYKVVKIEPLGVAPTKAKRRVVQREKRAVGTIGAMFLGFLGAAGSTMGAASMTLTVQARLLLSGIVQQQNNLLRAIEAQQHLLQLTVWGIKQLQARVLAVERFLRDQQLLGIWGCSGKLICTTTVPWNASWSNKSLDMIWHNMTWMQWEREIDNYTQLIYTLIEESQNQQEKNEQELLELDKWASLWNWFSITQWLWYIKIFIMIVAGLVGLRIVFAVLSIVNRVRQGYSPLSFQTRLPTPRG-PDRPEETEEEGGDRDRDRSVRLVNGFLALIWDDLRSLCLFSYHHLRDLLLIAARIVELLGR-RGWEALKYWWNLLQYWIQELKNSAVSLLNATAIAVAEGTDRIIEVVQRACR------------ANRA--
>V04EB_POVIp_1G4.5
MKIKETRKNYQHFWKW----GTMLLGMLLICSAAEQLWVTVYYGVPVWKEATTTLFCASDAKAYDTEAHNVWATHACVPTDPNPQEVVLGNVTENFNAWKNNMVEQMHEDVISLWDQSLKPCVKLTPLCVTLNCTDWKNPTNNTNTT-DTNATTTNREELGEIKNCSFNITTSMRDKMQKAYAPFYKLDVEPIDN------NNTSYRLISCNTSVITQACPKMSFEPIPIHYCAPAGFAILKCNNKTFNGKGPCTNVSTVQCTHGIKPVVSTQLLLNGSLAEKEVVIRSENFTDNAKVITVHLNDSVGINCTRPNNNTRKSIQI--GPGRAFFETGGIIGNIRQAHCNISQKQWNTTLGQIVVKLREQYGSNKTIVFNQSSGGDPEIVMHSFNCGGEFFYCNTTPLFNSTWNGTNG--TWKGTNGTNGTITIQCRIKQFINMWQGVGKAMYAPPISGIIRCSSNITGLLLTRDGG----NETSGTEIFRPAGGNMKDNWRSELYKYKVVKIEPLGVAPTKAKRRVVQREKRAVGTIGAMFLGFLGAAGSTMGAASMTLTVQARLLLSGIVQQQNNLLRAIEAQQHLLQLTVWGIKQLQARVLAVERFLRDQQLLGIWGCSGKLICTTTVPWNASWSNKSLDMIWHNMTWMQWEREIDNYTQLIYTLIEESQNQQEKNEQELLELDKWASLWNWFSITQWLWYIKIFIMIVAGLVGLRIVFAVLSIVNRVRQGYSPLSFQTRLPTPRG-PDRPEETEEEGGDRDRDRSVRLVNGFLALIWDDLRSLCLFSYHHLRDLLLIAARIVELLGR-RGWEALKYWWNLLQYWIQELKNSAVSLLNATAIAVAEGTDRIIEVVQRACR------------ANRA--
>V04EB_PRAp_2D9.5
MKIKETRKNYQHFWKW----GTMLLGMLLICSAAEQLWVTVYYGVPVWKEATTTLFCASDAKAYDTEAHNVWATHACVPTDPNPQEVVLGNVTENFNAWKNNMVEQMHEDVISLWDQSLKPCVKLTPLCVTLNCTDWKNPTNNTNTT-DTNATTVNREELGEIKNCSFNITTSMRDKMQKAYAPFYKLDVEPIDN------NNTSYRLISCNTSVITQACPKMSFEPIPIHYCAPAGFAILKCNNKTFNGKGPCTNVSTVQCTHGIKPVVSTQLLLNGSLAEKEVVIRSENFTDNAKVITVHLNDSVGINCTRPNNNTRKSIQI--GPGRAFFETGGIIGNIRQAHCNISQKQWNTTLGQIVVKLREQYGSNKTIVFNQSSGGDPEIVMHSFNCGGEFFYCNTTPLFNSTWNGTNG--TWKGTNGTNGTITIQCRIKQFINMWQGVGKAMYAPPISGIIRCSSNITGLLLTRDGG----NETSGTEIFRPAGGNMKDNWRSELYKYKVVKIEPLGVAPTKAKRRVVQREKRAVGTIGAMFLGFLGAAGSTMGAASMTLTVQARLLLSGIVQQQNNLLRAIEAQQHLLQLTVWGIKQLQARVLAVERFLRDQQLLGIWGCSGKLICTTTVPWNASWSNKSLDMIWHNMTWMQWEREIDNYTQLIYTLIEESQNQQEKNEQELLELDKWASLWNWFSITQWLWYIKTFIMIVAGLVGLRIVFAVLSIVNRVRQGYSPLSFQTRLPTPRG-PDRPEETEEEGGDRDRDRSVRLVNGFLALIWDDLRSLCLFSYHHLRDLLLIAARIVELLGR-RGWEALKYWWNLLQYWIQELKNSAVSLLNATAIAVAEGTDRIIEVVQRACR------------ANRA--
>V04EB_PRAp_2A10.7
MKIKETRKNYQHFWKW----GTMLLGMLLICSAAEQLWVTVYYGVPVWKEATTTLFCASDAKAYDTEAHNVWATHACVPTDPNPQEVVLGNVTENFNAWKNNMVEQMHEDVISLWDQSLKPCVKLTPLCVTLNCTDWKNPTNNTNTT-DTNATTVNREELGEIKNCSFNITTSMRDKMQKAYAPFYKLDVEPIDN------NNTSYRLISCNTSVITQACPKMSFEPIPIHYCAPAGFAILKCNNKTFNGKGPCTNVSTVQCTHGIKPVVSTQLLLNGSLAEKEVVIRSENFTDNAKVITVHLNDSVGINCTRPNNNTRKSIQI--GPGRAFFETGGIIGNIRQAHCNISQKQWNTTLGQIVVKLREQYGSNKTIVFNQSSGGDPEIVMHSFNCGGEFFYCNTTPLFNSTWNGTNG--TWKGTNGTNGTITIQCRIKQFINMWQGVGKAMYAPPISGIIRCSSNITGLLLTRDGG----NETSGTEIFRPAGGNMKDNWRSELYKYKVVKIEPLGVAPTKAKRRVVQREKRAVGTIGAMFLGFLGAAGSTMGAASMTLTVQARLLLSGIVQQQNNLLRAIEAQQHLLQLTVWGIKQLQARVLAVERFLRDQQLLGIWGCSGKLICTTTVPWNASWSNKSLDMIWHNMTWMQWEREIDNYTQLIYTLIEESQNQQEKNEQELLELDKWASLWNWFSITQWLWYIKIFIMIVAGLVGLRIVFAVLSIVNRVRQGYSPLSFQTRLPTPRG-PDRPEETEEEGGDRDRDRSVRLVNGFLALIWDDLRSLCLFSYHHLRDLLLIAARIVELLGR-RGWEALKYWWNLLQYWIQELKNSAVSLLNATAIAVAEGTDRIIEVVQRACR------------ANRA--
>V04EB_PRAp_2C10.1
MKIKETRKNYQHFWKW----GTMLLGMLLICSAAEQLWVTVYYGVPVWKEATTTLFCASDAKAYDTEAHNVWATHACVPTDPNPQEVVLGNVTENFNAWKNNMVEQMHEDVISLWDQSLKPCVKLTPLCVTLNCTDWKNPTNNTNTT-DTNATTVNREELGEIKNCSFNITTSMRDKMQKAYAPFYKLDVEPIDN------NNTSYRLISCNTSVITQACPKMSFEPIPIHYCAPAGFAILKCNNKTFNGKGPCTNVSTVQCTHGIKPVVSTQLLLNGSLAEKEVVIRSENFTDNAKVITVHLNDSVGINCTRPNNNTRKSIQI--GPGRAFFETGGIIGNIRQAHCNISQKQWNTTLGQIVVKLREQYGSNKTIVFNQSSGGDPEIVMHSFNCGGEFFYCNTTPLFNSTWNGTNG--TWKGTNGTNGTITIQCRIKQFINMWQGVGKAMYAPPISGIIRCSSNITGLLLTRDGG----NETSGTEIFRPAGGNMKDNWRSELYKYKVVKIEPLGVAPTKAKRRVVQREKRAVGTIGAMFLGFLGAAGSTMGAASMTLTVQARLLLSGIVQQQNNLLRAIEAQQHLLQLTVWGIKQLQARVLAVERFLRDQQLLGIWGCSGKLICTTTVPWNASWSNKSLDMIWHNMTWMQWEREIDNYTQLIYTLIEESQNQQEKNEQELLELDKWASLWNWFSITQWLWYIKIFIMIVAGLVGLRIVFAALSIVNRVRQGYSPLSFQTRLPTPRG-PDRPEETEEEGGDRDRDRSVRLVNGFLALIWDDLRSLCLFSYHHLRDLLLIAARIVELLGR-RGWEALKYWWNLLQYWIQELKNSAVSLLNATAIAVAEGTDRIIEVVQRACR------------ANRA--
>V02_RP_PRAp_2A12.3
MRVREIRRNSRQFWKW----GIMLLGMLMICSAVEQLWVTVYYGVPVWKDANTTLFCASDAKAYDTEVHNVWATHACVPTDPNPQEVLLENVTENFNMWKNNMVDQMHEDIISLWDQSLKPCVKLTPLCVTLDCTDLRNATNTTSN---DTSNGGGMPEGGEMKNCSFNITTSVRDKMQKTYALFYKLDVEPING------DNTSYRLISCNTSVITQACPKVTFEPIPIHYCAPAGFAILKCNDKKFNGTGQCRNVSTVQCTHGIKPVVSTQLLLNGSLAEEEVVIRSVNFTNNAKTIIVQLKEAVQINCTRPNNNTRKGIHI--GPGRAFYATGSIIGNIRQAYCNLSSTKWHNTLQQIVKKLREQFG-NKTIIFNQSSGGDPEIVMHSFNCGGEFFYCNSTQLFNSTWTNST-----EGLNTTEDPITLPCRIKQIINLWQEVGKAMYAPPISGIISCSSNITGLLLTRDGG--KNGSANNTEIFRPEGGDMRDNWRSELYKYKVVKIEPLGIAPTKAKRRVVQREKRAVG-LGAMFLGFLGAAGSTMGAAAITLTGQARQLLSGIVQQQNNLLRAIEAQQHMLQLTVWGIKQLQARVLAVERYLKDQQLLGIWGCSGKLICPTAVPWNTSWSNRSLEKIWDNMTWMEWEREIDNYTGLIYSLIEESQNQQEKNEQELLELDKWASLWNWFNITQWLWYIKIFIMIVGGLVGLRIVFAVFSIVNRVRQGYSPLSFQTLLPAPRG-PDRPEGIEEEGGERDRDRSGRLVTGFLALIWDDLRSLCLFSYHRLRDLLLIVARIVELLGR-RGWEALKYWWNLLKYWSQELKSSAISLLNVTAIAVAEGTDRIVEVLQRICRAFRNIPRRIRQGLERILI
>V02_POVIp_2D4.2
MRVREIRKSSRQFWKW----GIMLLGMLMICSAVEQLWVTVYYGVPVWKDANTTLFCASDAKAYDTEVHNVWATHACVPTDPNPQEVILENVTENFNMWKNNMVDQMHEDIISLWDQSLKPCVKLTPLCVTLDCTDLKNATNTTSN---GTSNGGGMPEGGEIKNCSFNITTSVRDKMQKTYALFYKLDVEPING------DNTSYRLISCNTSVITQACPKVTFEPIPIHYCAPAGFAILKCNDKKFNGTGQCRNVSTVQCTHGIKPVVSTQLLLNGSLAEEEVVIRSVNLTDNAKTIIVQLKEAVQINCTRPNNNTRKGIHI--GPGRAFYATGDIIGNIRQAHCNLSSTKWNNTLKQIVKKLREQFG-NKTIIFNQSSGGDPEIVMHTFNCGGEFFYCNSTKLFNSTWNSTE-----ELNTTEGPPITLPCRIKQIINLWQEVGKAMYAPPISGIISCSSNITGLLLTRDGG----NGSETNETFRPGGGNMRDNWRSELYKYKVVKIEPLGIAPTKAKRRVVQREKRAVG-LGAMFLGFLGAAGSTMGAAAITLTGQARQLLSGIVQQQNNLLRAIEAQQHMLQLTVWGIKQLQARVLAVERYLKDQQLLGIWGCSGKLICPTAVPWNTSWSNRSLEKIWNNMTWMEWEREIDNYTGLIYSLIEESQNQQEKNEQELLELDKWASLWNWFNITQWLWYIKIFIMIVGGLVGLRIVFAVFSIVNRVRQGYSPLSFQTLLPAPRG-PDRPEGIEEEGGERDRDRSGRLVTGFLALIWDDLRSLCLFSYHRLRDLLLIVARIVELLGR-RGWEALKYWWNLLKYWSQELKSSAISLLNVTAIAVAEGTDRIVEVLQRTCRAFLNIPRRIRQGLERILI
>V02RP_POVIp_1A5.8_2
MRVREIRKSSRQFWKW----GIMLLGMLMICSAVEQLWVTVYYGVPVWKDANTTLFCASDAKAYDTEVHNVWATHACVPTDPNPQEVILENVTENFNMWKNNMVDQMHEDIISLWDQSLKPCVKLTPLCVTLDCTDLKNATNTTSN---GTSNGGGMPEGGEMKNCSFNITTSVRDKMQKTYALFYKLDVEPING------DNTSYRLISCNTSVITQACPKVTFEPIPIHYCAPAGFAILKCNDKKFNGTGQCRNVSTVQCTHGIKPVVSTQLLLNGSLAEEEVVIRSVNLTDNAKTIIVQLKEAVQINCTRPNNNTRKGIHI--GPGRAFYATGDIIGNIRQAHCNLSSTKWNNTLKQIVKKLREQFG-NKTIIFNQSSGGDPEIVMHTFNCGGEFFYCNSTKLFNSTWNSTE-----ELNTTEGPPITLPCRIKQIINLWQEVGKAMYAPPISGIISCSSNITGLLLTRDGG----NGSETNETFRPGGGNMRDNWRSELYKYKVVKIEPLGIAPTKAKRRVVQREKRAVG-LGAMFLGFLGAAGSTMGAAAITLTGQARQLLSGIVQQQNNLLRAIEAQQHMLQLTVWGIKQLQARVLAVERYLKDQQLLGIWGCSGKLICPTAVPWNTSWSNRSLEKIWNNMTWMEWEREIDNYTGLIYSLIEESQNQQEKNEQELLELDKWASLWNWFNITQWLWYIKIFIMIVGGLVGLRIVFAVFSIVNRVRQGYSPLSFQTLLPAPRG-PDRPEGIEEEGGERDRNRSGRLVTGFLALIWDDLRSLCLFSYHRLRDLLLIVARIVELLGR-RGWEALKYWWNLLKYWSQELKSSAISLLNVTAIAVAEGTDRIVEVLQRTCRAFLNIPRRIRQGLERILI
>V09JF_PRAp-2A11.4
MKAKEIRKNCQRLWRW----GILLLGMLMICSATEKLWVTVYYGVPVWKEANTTLFCASDAKAYDTEAHNVWATHACVPTDPNPQEVVLANVTENFNMWKNDMVEQMHQDIISLWDQSLKPCVKLTPLCVTLNCTNLNTNNTNNSSGETNNNANNSSGDYGEIKNCSFNITTPIRNKVQKEYALFNRLDIVPIDD------DNASYTLISCNTSVITQACPKISFDPIPIHYCAPAGFAILKCNNKTFNGTGPCTNVSTVQCTHGIRPVVSTQLLLNGSLAEKETVIRSENFTNNAKTIIVQLNEPIPINCTRPNNNTRKSIHI--APGRAFYATGTIIGNIRQAHCNISNTTWQNALKNVVAKLREQFG-NKTIVFNQSSGGDPEITMHTFNCGGEFFYCDTTRLFNSTWGFNDT---SDRANSTNSNITLPCRIKQIINRWQEVGRAMYAPPIRGQIRCSSNITGLLLTRDGGNETEENKTNTETFRPGGGNMKDNWRSELYKYKVVKIEPLGVAPTKAKRRVVQREKRAVG-LGAMFLGFLGAAGSTMGAASITLTVQARQLLSGIVQQQNNLLRAIEAQQHMLQLTVWGIKQLQARVLAVERYLKDQQLLGLWGCSGKLICTTAVPWNTSWSNKSQTEIWNNMTWMEWEREINNYTKVIYTLIEESQNQQEKNEQDLLELDTWAGLWNWFDITNWLWYIKIFIMIVGGLIGLRIVFTVLSIVNRVRQGYSPLSLQTLFPGPRG-PDRPEGIEEGGGERDRNGSRTLVHGFLALVWVDLRSLCLFSYHRLRDLLLIVTRTVELLGR-RGWEALKYWWNILQYWSQELRSSAVSLLNATAIAVAEGTDRIIEVVQRACRAILHIPRRIRQGAERLLL
>V09JF_POVIp-1D4.2
MKAKEIRKNCQRLWRW----GVLLLGMLMICSATEKLWVTVYYGVPVWKEANTTLFCASDAKAYDTEAHNVWATHACVPTDPNPQEVVLANVTENFNMWKNDMVEQMHQDIISLWDQSLKPCVKLTPLCVILNCTNLNTNNTNN-------SSGDYGEIKGEIKNCSFNITTPIRNKVQKEYALFNRLDIVPIDD------DNASYTLISCNTSVITQACPKISFEPIPIHYCAPAGFAILKCNNKTFNGTGPCTNVSTVQCTHGIRPVVSTQLLLNGSLAEKETVIRSENFTNNAKTIIVQLNESIPIDCIRPNNNTRKSIHI--APGRAFYATGTIIGDIRQAYCNISNTTWQNALKKIVTKLREQFG-NKTIVFNKSSGGDPEITMHTFNCGGEFFYCDTTGLFNSTWGFNDT---SDWAKSTDSNITLPCRIKQIINRWQEVGRAMYAPPIRGQIRCSSNITGLLLTRDGGNETEENKTTTETFRPGGGNMKDNWRSELYKYKVVKIEPLGVAPTQAKRRVVQREKRAVG-LGAMFLGFLGAAGSTMGAASITLTVQARQLLSGIVQQQNNLLRAIEAQQHMLQLTVWGIKQLQARVLAVERYLKDQQLLGLWGCSGKLICTTAVPWNTSWSNKSQKEIWDNMTWMEWEREINNYTKVIYTLIEESQNQQEKNEQELLALDTWASLWNWFDITKWLWYIKIFIMIVGGLIGLRIVFTVLSIVNRVRQGYSPLSLQTLFPGPRG-PDRPEGIEEGGGERDRNGSRTLVHGFLALVWVDLRSLCLFSYHRLRDLLLIVTRTVELLGR-RGWEALKYWWNILQYWSQELRNSAVSLLNATAIAVAEGTDRIIEVVQRACRAILHIPRRIRQGAERLLL
>V09F_PRAp-2F9.2
MKAKEIRKNCQRLWRW----GILLLGMLMICSATEKLWVTVYYGVPVWKEANTTLFCASDAKAYDTEAHNVWATHACVPTDPNPQEVVLANVTENFNMWKNDMVEQMHQDIISLWDQSLKPCVKLTPLCVTLNCTNLNTNNTNNSSR-ETSNANNSSGDYGEIKNCSFNITTPIRNKVQKEYALFNRLDIVPIDD------DNASYTLISCNTSVITQACPKISFDPIPIHYCAPAGFAILKCNNKTFNGTGPCTNVSTVQCTHGIRPVVSTQLLLNGSLAEKEIIIRSENFTNNAKTIIVQLNESIPINCIRPNNNTRKSIHI--APGRAFYATGTIIGNIRQAHCNISNTTWQNALKKIVTKLREQFG-NKIIVFNQSSGGDPEITMHTFNCGGEFFYCDTTRLFNSTWEFNDT---SDQANSTNINITLPCRIKQIINRWQEVGRAMYAPPIRGQIRCSSNITGLLLTRDGGNETEENKTNTETFRPGGGNMKDNWRSELYKYKVVKIEPLGVAPTQAKRRVVQREKRAVG-LGAMFLGFLGAAGSTMGAASITLTVQARQLLSGIVQQQNNLLRAIEAQQHMLQLTVWGIKQLQARVLAVERYLKDQQLLGLWGCSGKLICTTAVPWNTSWSNKSQTEIWNNMTWMEWEREINNYTKVIYTLIEESQNQQEKNEQELLALDTWASLWNWFDITKWLWYIKIFITIVGGLIGLRIVFTVLSIVNRVRQGYSPLSLQTLFPGPRG-PDRPEGIEEGGGERDRNGSRTLVHGFLALVWVDLRSLCLFSYHRLRDLLLIVTRTVELLGR-RGWEALKYWWNILQYWSQELKNSAISLLNATAIAVAEGTDRIIEVVQRACRAILHIPRRIRQGAERILL
>V03NP_POVIp-2D5.10.2
MRVRGIRKNYQHWWGWS-TMLTMLLGMLMICKAAEQLWVTIYYGVPVWKEATTTLFCASDAKAYDTEMHNVWATHACVPTDPNPQEVVLENVTENFNMWENNMVEQMHEDIISLWDQSLKPCVKLTPLCVTLNCTNANISSNTSN------IYNSSIYKPGDIKNCSFNITTHIRDKVKKEYALFYALDVIPIDPTVGNKTTTTDYRLISCNTSVVTQACPKVSFEPIPIHYCAPAGFAILKCNDKMFNGTGPCKNVSTIQCTHGIRPVVSTQLLLNGSLAEEDIVIRSKNFTDNAKTIIVQLNKTVTINCTRPSNNTRKSIHI--APGRAFYATGNIIGDIRQAHCNLSRKDWNSTLRLIANKLREQFG-NKTIVFNSSSGGDPEIVMHSFNCRGEFFYCNTTPLFNSTWYSNN----TQEGMEGNSIITLQCRIKQIVNMWQEVGKAMYAPPIKGQIRCSSNITGLLLTRDGG----NTTNRPEVFRPGGGNMKDNWRSELYKYKVVKLEPLGVAPTKAKRRVVQREKRAVG-IGALFLGFLGAAGSTMGAASMTLTVQARQLLSGIVQQQNNLLMAIEAQQHMLQLTVWGIKQLQARVLAVESYLKDQQLLGIWGCSGKLICTTTVPWNTSWSNKTYNEIWDNMTWMQWEKEIDNHTSLIYTLIEESQNQQEKNELDLLALDKWANLWSWFNITNWLWYIKIFIMIVGGLVGLRIVFTVLSIVNRVRQGYSPLSFQTHLPARRG-PDRPEEIEEEGGERDRGRSEPLVTGFLTLIWVDLRSLCLFSYHRLRDLLLIAARIVELLGR-RGWEILKYWWNLLQYWSQELKNSAVSLLNATAIAVAEGTDRVIEIVQGIGRAILHIPARIRQGLERVLL
Acknowledgements
We thank J. Stuckey and J. Lara for assistance with figures, and members of the Structural Biology Section and Structural Bioinformatics Core, Vaccine Research Center, for discussions or comments on the manuscript. Support for this work was provided by the Intramural Research Program of the Vaccine Research Center, National Institute of Allergy and Infectious Diseases, and by IAVI Neutralizing Antibody Consortium (NAC). This work utilized the computational resources of the NIH HPC Biowulf cluster (NIH HPC). This study used the Office of Cyber Infrastructure and Computational Biology (OCICB) High Performance Computing (HPC) cluster at the National Institute of Allergy and Infectious Diseases (NIAID), Bethesda, MD.