Overview of the SAMPL6 host-guest binding affinity prediction challenge

Andrea Rizzi; Steven Murkli; John N. McNeill; Wei Yao; Matthew Sullivan; Michael K. Gilson; Michael W. Chiu; Lyle Isaacs; Bruce C. Gibb; David L. Mobley; John D. Chodera

doi:10.1101/371724

Abstract

The ability to accurately predict the binding affinities of small organic molecules to biological macromolecules would greatly accelerate drug discovery by reducing the number of compounds that must be synthesized to realize desired potency and selectivity goals. Unfortunately, the process of assessing the accuracy of current quantitative physical and empirical modeling approaches to affinity prediction against binding data to biological macromolecules is frustrated by several challenges, such as slow conformational dynamics, multiple titratable groups, and the lack of high-quality blinded datasets. Over the last several SAMPL blind challenge exercises, host-guest systems have emerged as a practical and effective way to circumvent these challenges in assessing the predictive performance of current-generation quantitative modeling tools, while still providing systems capable of possessing tight binding affinities. Here, we present an overview of the SAMPL6 host-guest binding affinity prediction challenge, which featured three supramolecular hosts: octa-acid (OA), the closely related tetra-endo-methyl-octa-acid (TEMOA), and cucurbit[8]uril (CB8), along with 21 small organic guest molecules. A total of 119 entries were received from 10 participating groups employing a variety of methods that spanned electronic structure and movable type calculations in implicit solvent to alchemical and potential of mean force strategies using empirical force fields and explicit solvent models. While empirical models tended to obtain better performance, it was not possible to identify a single approach consistently providing superior predictions across all host-guest systems and statistical metrics, and the accuracy of the methodologies generally displayed a substantial dependence on the systems considered, arguing for the importance of considering a diverse set of hosts in blind evaluations. 
Several entries exploited previous experimental measurements of similar host-guest systems in an effort to improve their physics-based predictions via some manner of rudimentary machine learning; while this strategy succeeded in reducing systematic errors, it was not able to generate a corresponding improvement in correlation statistics. Comparison to previous rounds of the host-guest binding free energy challenge highlights an overall improvement in the correlation obtained by the affinity predictions for OA and TEMOA systems, but a surprising lack of improvement in root mean square error over the past several challenge rounds. The data suggests that further refinement of force field parameters and improved treatment of chemical effects (e.g., buffer salt conditions, protonation states) may be required to continue to enhance predictive accuracy.

Introduction

Quantitative physical and empirical modeling is playing a growing role in aiding or directing the design of small molecule biomolecular ligands for use as potential therapeutics or chemical probes [1–6]. Despite these successes, the effectiveness of these calculations in prioritizing molecules for synthesis depends strongly on the inaccuracy of predictions [7], and retrospective estimates suggest that current methodologies carry errors of around 1–2 kcal/mol on well-behaved protein-ligand systems [8, 9].

Assessment of how much of this inaccuracy can be attributed to fundamental limitations of the force field in accurately modeling energetics is complicated by the presence of numerous additional factors [10]. Proteins are highly dynamic entities, and many common drug targets, such as kinases [11] and GPCRs [12], possess slow dynamics with timescales of microseconds to milliseconds [13] that frustrate the ability to obtain true equilibrium affinities. While there has been some attempt to curate benchmark sets of protein-ligand affinity data in well-behaved model systems that are believed to be mostly free of slow-timescale motions that would convolve convergence issues with force field inaccuracies [10], other effects can complicate assessment of the accuracy of physical modeling benchmarks. Ionizable residues, for example, comprise approximately 29% of all protein residues [14], and large-scale computational surveys suggest that 60% of all protein-ligand complexes undergo a change in ionization state upon binding [15]. For physical or empirical modeling approaches that assume fixed protonation states throughout the complexation process, protonation state effects are hopelessly convolved with issues of force field inaccuracy.

Host-guest systems are a tractable model for assessing force field inaccuracies

Over the last decade, supramolecular host-guest complexes have emerged as a practical and useful model system for the quantitative assessment of modeling errors for the interaction of druglike small molecules with receptors. Supramolecular hosts such as cucurbiturils, cavitands, and cyclodextrins can bind small druglike molecules with affinities similar to protein-ligand complexes [16–18]. The lack of soft conformational degrees of freedom of these hosts eliminates the potential for slow microsecond-to-millisecond receptor relaxation timescales as a source of convergence issues [10], while the small size of these systems allows many methodologies to take advantage of faster simulation times to rapidly assess force field quality. The high solubilities of these systems permit high-quality biophysical characterization of their interactions via gold-standard methods such as isothermal titration calorimetry (ITC) and nuclear magnetic resonance (NMR) [19–21]. Additionally, the stability of supramolecular hosts at extreme pH allows for strict control of protonation states in a manner not possible with protein-ligand systems, allowing confounding protonation state effects to be eliminated from consideration if desired [22]. Collectively, these properties have made host-guest systems a productive route for revealing deficiencies in modern force fields through blind community challenge exercises we have organized as part of the Statistical Assessment of the Modeling of Proteins and Ligands (SAMPL) series of blind prediction challenges [23–26].

SAMPL host-guest challenges have driven advances in our understanding of sources of error

The SAMPL (Statistical Assessment of the Modeling of Proteins and Ligands) challenges are a recurring series of blind prediction challenges for the computational chemistry community [27, 28]. Through these challenges, SAMPL aims to evaluate and advance computational tools for rational drug design. By focusing the community on specific phenomena relevant to drug discovery (such as the contribution of force field inaccuracy to binding affinity prediction failures), isolating these phenomena from other confounding factors in well-designed test systems, evaluating tools prospectively, enforcing data sharing to learn from failures, and releasing the resulting high-quality datasets into the community as benchmark sets, SAMPL has driven progress in a number of areas over five previous rounds of challenge cycles [23–26, 29–37].

More specifically, SAMPL host-guest challenges have provided key tests for modeling of binding interactions [10], resulting in an increased focus on how co-solvents and ions modulate binding (resulting in errors of up to 5 kcal/mol when these effects are neglected) and the importance of adequately sampling water rearrangements [10, 25, 26, 38]. In turn, this detailed examination has resulted in clear improvements in subsequent SAMPL challenges [26], though host-guest binding remains difficult to model accurately [39], in part due to force field limitations (resulting in new efforts to remedy major force field deficiencies [40]).

SAMPL6 host-guest systems

Three hosts were selected for the SAMPL6 host-guest binding challenge from the Gibb Deep Cavity Cavitand (GDCC) [41–44] and the cucurbituril (CB) [45–47] families (Figure 1). The guest ligand sets were purposefully selected for the SAMPL6 challenge. The utility of these particular host systems for evaluating free energy calculations has been reviewed in detail elsewhere [43, 44].

Figure 1. Hosts and guests featured in the SAMPL6 host-guest blind challenge dataset.

Three-dimensional structures of the three hosts featured in the SAMPL6 challenge dataset (OA, TEMOA, and CB8) are shown in stick view from top and side perspective views. Carbon atoms are represented in gray, hydrogens in white, nitrogens in blue, and oxygens in red. Guest ligands for each complex are shown as two-dimensional chemical structures annotated by hyphenated host and guest names. Protonation states of the guest structures correspond to the predicted dominant microstate at the experimental pH at which binding affinities were collected, and match those provided in the mol2 and sdf input files shared with the participants when the challenge was announced. The same set of guests OA-G0 through OA-G7 was used for both OA and TEMOA hosts. The gray frame (lower right) contains the three CB8 guests that constitute the bonus challenge.

The two GDCCs, octa-acid (OA) [41] and tetra-endo-methyl-octa-acid (TEMOA) [48], are low-symmetry hosts with a basket-shaped binding site accessible through the larger entryway located at the top. These hosts also appeared in two previous SAMPL host-guest challenges, SAMPL4 [25] and SAMPL5 [26], under the names OAH and OAMe respectively, with different sets of guests. OA and TEMOA differ by four methyl groups that reduce the size of the binding site entryway (Figure 1). Both hosts expose eight carboxyl groups that increase their solubility. The molecular structures of the eight guests selected for the SAMPL6 challenge for characterization against both OA and TEMOA are shown in Figure 1 (denoted OA-G0 through OA-G7). These guests feature a single polar group situated at one end of the molecule that tends to be exposed to solvent when complexed, while the rest of the compound remains buried in the hydrophobic binding site.

A second set of guest ligands was developed for the host cucurbit[8]uril (CB8). This host previously appeared in the SAMPL3 host-guest binding challenge [49], but members of the same family or analogs such as cucurbit[7]uril (CB7) and CBClip [50] were featured in SAMPL4 and SAMPL5 challenges as well. CB8 is a symmetric (D_8h), ring-shaped host comprising eight identical glycoluril monomers linked by pairs of methylene bridges. Its top-bottom symmetry means that asymmetric guests have at least two symmetry-equivalent binding modes that can be kinetically separated by timescales not easily achievable by standard molecular dynamics (MD) or Monte Carlo simulations and may require special considerations, in particular in alchemical absolute binding free energy calculations [51]. The CB8 guest set (compounds CB8-G0 to CB8-G13 in Figure 1) includes both fragment-like and bulkier drug-like compounds.

Some of the general modeling challenges posed by both families of host-guest systems have been characterized in previous studies. While their relatively rigid structure minimizes convergence difficulties associated with slow receptor conformational dynamics, both families have been shown to bind guest ligands via a dewetting process (in which waters must be removed from the binding site to accommodate guests) in a manner that can frustrate convergence for strategies based on molecular simulation. In the absence of tight-binding guest ligands, the octa-acid host experiences fluctuations in the number of bound waters on timescales of several nanoseconds [52]; a similar phenomenon was observed in alchemical absolute binding free energy calculations of CB7 at intermediate alchemical states with partially decoupled Lennard-Jones interactions [53]. In addition, both experimental measurements and computational predictions revealed significant sensitivity of the binding affinity to the buffer salt composition and concentration [54–58], which in principle requires buffer conditions to be modeled carefully for comparison to experiments to be meaningful.

Experimental host-guest affinity measurements

A detailed description of the experimental methodology used to collect binding affinity data for OA, TEMOA, and CB8 host-guest systems is provided elsewhere [59?]. Briefly, all host-guest binding affinities were determined via direct or competitive isothermal titration calorimetry (ITC) at 298 K. OA and TEMOA measurements were performed in 10 mM sodium phosphate buffer at pH 11.7±0.1, whereas CB8 guest binding affinities were measured in a 25 mM sodium phosphate buffer at pH 7.4. Binding stoichiometries were determined by ¹H NMR spectral integration and/or by ITC. The ITC titration curves were fitted to a single-site model or a competition model for all guests, except for CB8-G12 (donepezil), for which a sequential binding model was used. The stoichiometry coefficient was either fitted simultaneously with the other parameters or fixed to the value verified by the NMR titrations, which is the case for the CB8 guest set, as well as for OA-G5, TEMOA-G5, and TEMOA-G7.

To determine experimental uncertainties, we added the relative error in the nonlinear fit-derived association constant (K_a) or binding enthalpy (ΔH) with the relative error in the titrant concentration in quadrature [60]. We assumed a relative error in the titrant concentration of 3%, an arbitrary choice made after personal communication with Professor Lyle Isaacs, who suggested a value below 5% based on his experience. The minimum relative nonlinear fit-derived uncertainty permitted was 1%, since the fit uncertainty was reported by the ITC software as smaller than this in some cases. It should be noted that the error propagation strategy adopted here assumes that the stoichiometry coefficient is fitted to the ITC data in order to absorb errors in cell volume and titrand concentration; this approach is exact only for the OA/TEMOA sets with the exclusion of OA-G5, TEMOA-G5, and TEMOA-G7, and underestimates the true error in the remaining cases. The error was then further propagated to the binding free energies and entropies that were calculated from K_a and ΔH. The final estimated experimental uncertainties are relatively small, never exceeding 0.1 kcal/mol.
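The error propagation described above can be illustrated with a minimal Python sketch. It assumes the standard relation ΔG = −RT ln(K_a × 1 M) and first-order propagation; the function names, the example K_a value, and the treatment of the ΔG and ΔH errors as independent when forming T ΔS are illustrative assumptions, not the procedure documented by the authors.

```python
import math

R_KCAL_PER_MOL_K = 1.98720425864083e-3  # molar gas constant in kcal/(mol K)


def total_relative_error(fit_rel_err, titrant_rel_err=0.03, min_fit_rel_err=0.01):
    """Combine the fit and titrant-concentration relative errors in quadrature.

    The 3% titrant error and the 1% floor on the fit error follow the text.
    """
    fit_rel_err = max(fit_rel_err, min_fit_rel_err)
    return math.hypot(fit_rel_err, titrant_rel_err)


def free_energy_from_ka(ka, ka_rel_err, temperature=298.0):
    """Return (dG, sigma_dG) in kcal/mol for K_a in 1/M (1 M standard state).

    First-order propagation: sigma_dG = RT * (sigma_Ka / Ka).
    """
    rt = R_KCAL_PER_MOL_K * temperature
    return -rt * math.log(ka), rt * ka_rel_err


def entropy_term(delta_g, sigma_delta_g, delta_h, sigma_delta_h):
    """Return (T*dS, sigma) from T*dS = dH - dG.

    Assumption: the two errors are treated as independent, which ignores the
    titrant-concentration error they share.
    """
    return delta_h - delta_g, math.hypot(sigma_delta_g, sigma_delta_h)


# Hypothetical example: K_a = 1e6 1/M with a reported fit error of 0.5%.
rel_err = total_relative_error(fit_rel_err=0.005)
delta_g, sigma_delta_g = free_energy_from_ka(1.0e6, rel_err)
print(f"dG = {delta_g:.2f} +/- {sigma_delta_g:.2f} kcal/mol")
```

Because the uncertainty in ΔG scales with RT (about 0.59 kcal/mol at 298 K) times the relative error in K_a, relative errors of a few percent translate into free energy uncertainties of a few hundredths of a kcal/mol.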

The resulting experimental measurements with their uncertainties are reported in Table 1 and Figure 2. The dynamic range of the binding free energy ΔG spans 4.25 kcal/mol for the merged OA and TEMOA guest set, and 7.05 kcal/mol for CB8. The relatively wide cavity of CB8 enables binding stoichiometries different from 1:1. This is the case for three of the CB8 guests, specifically CB8-G1 (tolterodine), CB8-G4 (gallamine triethiodate), and CB8-G12 (donepezil). Curiously, while CB8-G12 was found to bind in 2:1 complexes (two guests bound to the same host), the NMR experiments determined stoichiometries of 1:2 and 1:3 for CB8-G1 and CB8-G4 respectively (one guest bound to multiple hosts). For the last two guests, the ITC titration curves fit well to a single-set-of-sites binding model, which indicates that the binding events are equivalent. In Table 1 and Figure 2 we report the binding affinity of both the 1:1 and the 2:1 complex for CB8-G12, which are identified by CB8-G12a and CB8-G12b respectively, and the free energy of the 1:1 complex for CB8-G1 and CB8-G4.

Figure 2. Overview of experimental binding affinities for all host-guest complexes in the SAMPL6 challenge set.

Binding free energies (ΔG) measured via isothermal titration calorimetry (ITC) are shown (filled circles), along with experimental uncertainties denoting standard error of the mean (black error bars), for OA (yellow), TEMOA (green), and CB8 (blue) complexes.

Table 1. Summary of ITC and NMR measurements for the SAMPL6 host-guest dataset.

Guest identifiers (ID), association constants (K_a), binding free energies (ΔG), enthalpies (ΔH), entropies at room temperature (T ΔS) and stoichiometric ratios (n) as determined by ITC and NMR assays are reported for all compounds featured in the challenge. All quantities are reported as point estimates ± statistical error obtained by error propagation. For K_a and ΔH, the reported uncertainties incorporate both the uncertainty in the ITC enthalpogram least-squares fit and an assumed 3% uncertainty in titrant concentration. A minimum least-squares fit uncertainty of 1% was assumed for fit errors reported by instrumentation as < 1%. ΔG and T ΔS and their uncertainties were obtained from the first two quantities. Some of the compounds in the CB8 guest set can be bound by their hosts with stoichiometries different from 1:1. For CB8-G1 and CB8-G4, which can form 1:2 (two hosts bound to the same guest) and 1:3 complexes with CB8, respectively, we report the thermodynamic quantities of only one of the equivalent binding events, which is the value used to calculate the statistics for challenge entries. For CB8-G12, we report the measurements of both the 1:1 (CB8-G12a) and the 2:1 (CB8-G12b) bound complexes. The original data can be found at https://github.com/MobleyLab/SAMPL6/tree/master/host_guest/Analysis/ExperimentalMeasurements/experimental_measurements.csv. Any future updates or corrections to the data will be made available at the same URL, and anyone wishing to reuse the data should refer there.

Methods

Challenge design and logistics

Challenge timeline

On August 24th, 2017, we released in a publicly accessible GitHub repository (https://github.com/MobleyLab/SAMPL6) a brief description of the host-guest systems and the experimental methodology, together with the challenge directions, and input files in mol2 and sdf formats for the three hosts and their guests. The instructions shared online included information about buffer concentrations, temperature, and pH used for the experiments. The participants were asked to submit their predicted absolute binding free energies and, optionally, binding enthalpies, along with a detailed description of the methodology and the software employed through the Drug Design Data Resource (D3R) website (https://drugdesigndata.org/about/sampl6) by January 19th, 2018. We also encouraged the inclusion of uncertainties and/or standard error of the mean (SEM) of the predictions when available. The results of the experimental assays were released on January 26th in the same GitHub repository. The challenge culminated in a conference held on February 22–23, 2018 in La Jolla, CA where the participants shared lessons learned from participating in the challenge after performing retrospective analysis of their data.

Bonus challenge

Three molecules in the CB8 guest set, namely CB8-G11, CB8-G12, and CB8-G13, were proposed to participants as an optional bonus challenge since they were identified in advance to present some atypical difficulties for molecular modeling. In particular, the initial experimental data suggested that both CB8-G11 and CB8-G12 bind with 2:1 stoichiometry, while CB8-G13 was deemed to be an especially challenging case for modeling due to the presence of a coordinated platinum atom, which is not readily handled by classical force fields and usually requires larger basis sets for quantum mechanics (QM) calculations than those commonly employed with simple organic molecules. Further investigation after the start date of the challenge revealed an error in the calibration of a CB8 solution which affected the measurement of CB8-G11. After correcting the error, a 1:1 stoichiometry was recovered, and the experiment was repeated to validate the result. Unfortunately, the new data was obtained too late to send out a correction to all participants, so only six entries included predictions for this guest.

Preparation of standard input files

Standard input files for the three hosts were generated for the previous rounds of the SAMPL host-guest binding challenge and uploaded to the repository unchanged, while the guests’ atomic coordinates were generated from their SMILES string representation through the OMEGA library [61] in the OpenEye Toolkit (version 2017.Oct.b5) except for oxaliplatin (CB8-G13), which was generated with OpenBabel to handle the platinum atom. The compounds were then docked into their hosts with OpenEye’s FRED docking facility [62, 63]. Stereochemistry of the 3D structures recapitulated the stereochemistry of compounds assayed experimentally; experimental assays for chiral compounds were enantiopure except OA-G5, which was measured as a racemic mixture. For this molecule, we picked at random one of the two enantiomers under the assumption that the guest chirality (for this guest with a single chiral center) would not affect the binding free energy to an achiral host such as OA and TEMOA since the system otherwise contains no chiral centers. This information was included in the instructions when the challenge was released. Guest mol2 files also included AM1-BCC point charges generated with the AM1-BCC charge engine in the Quacpac tool from the OpenEye toolkit [64, 65]. Figure 1 shows the protonation state of the molecules as provided in the input files, which reflects the most likely protonation state as predicted by Epik [66, 67] from the Schrödinger Suite 2017-2 (Schrödinger) at experimental buffer pH (11.7 for OA and 7.4 for CB8). This resulted in all molecules possessing a net charge, with the exception of oxaliplatin and the CB8 host, which have no acidic or basic groups. Specifically, the eight carboxyl groups of OA and TEMOA were modeled as deprotonated and charged. The instructions stated clearly that the protonation and tautomeric states provided were not guaranteed to be optimal. 
In particular, participants in the bonus challenge were advised to treat CB8-G12 with care as, in its protonated state, the nitrogen proton could be placed so that the substituent was axial or equatorial. The latter solution was arbitrarily adopted by the tools used to generate the input files for CB8-G12.

Statistical analysis of challenge entries

Performance statistics

We computed root mean squared error (RMSE), mean signed error (ME), coefficient of determination (R²), and Kendall rank correlation coefficient (τ) comparing experimentally determined binding free energies with blinded participant free energy predictions.

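The mean signed error (ME), which quantifies the bias in predictions, was computed as

$$\mathrm{ME} = \frac{1}{N} \sum_{i=1}^{N} \left( \Delta G_i^{(\mathrm{calc})} - \Delta G_i^{(\mathrm{exp})} \right)$$

where $\Delta G_i^{(\mathrm{exp})}$ and $\Delta G_i^{(\mathrm{calc})}$ are the experimental measurement of the binding free energy and its computational prediction respectively for the i-th molecule, and N is the total number of molecules in the dataset. A positive ME reflects an overestimated binding free energy ΔG (or underestimated affinity, i.e., an overestimated dissociation constant $K_d = e^{\beta \Delta G} \times (1\,\mathrm{M})$, with $\beta = 1/(k_B T)$).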

Some of the methods appearing in SAMPL6 were also used in previous rounds of the same challenge to predict relative binding free energies of similar host-guest systems. In order to comment on the performance of these methods over sequential challenges, for which statistics on absolute free energies are not readily available, we computed a separate set of statistics defined as offset statistics, as opposed to the absolute statistics defined above, in the same way they were reported in previous challenge overview papers. These statistics, termed RMSE_o, R²_o, and τ_o, were computed identically to absolute statistics but by substituting $\Delta G_i^{(\mathrm{calc})}$ with $\Delta G_i^{(\mathrm{calc})} - \mathrm{ME}$ in the estimator expressions.
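As an illustration of the absolute and offset statistics, the following Python sketch computes ME, RMSE, R², and Kendall τ from paired lists of predicted and experimental free energies. It rests on several assumptions not confirmed by the text: ME is taken as the mean of (calculated − experimental), R² is taken as the squared Pearson correlation, τ is the tie-corrected τ-b variant, and offset statistics are obtained by subtracting ME from every prediction. The example numbers are hypothetical.

```python
import math


def mean_signed_error(calc, exp):
    """Mean of (calculated - experimental); positive means dG is overestimated."""
    return sum(c - e for c, e in zip(calc, exp)) / len(exp)


def rmse(calc, exp):
    """Root mean squared error between predictions and experiment."""
    return math.sqrt(sum((c - e) ** 2 for c, e in zip(calc, exp)) / len(exp))


def r_squared(calc, exp):
    """Squared Pearson correlation coefficient (assumed definition of R^2)."""
    n = len(exp)
    mean_c, mean_e = sum(calc) / n, sum(exp) / n
    sxy = sum((c - mean_c) * (e - mean_e) for c, e in zip(calc, exp))
    sxx = sum((c - mean_c) ** 2 for c in calc)
    syy = sum((e - mean_e) ** 2 for e in exp)
    return sxy ** 2 / (sxx * syy)


def kendall_tau(calc, exp):
    """Kendall tau-b, counting concordant and discordant pairs explicitly."""
    concordant = discordant = ties_calc = ties_exp = 0
    n = len(exp)
    for i in range(n):
        for j in range(i + 1, n):
            dc, de = calc[i] - calc[j], exp[i] - exp[j]
            if dc == 0 and de == 0:
                continue  # tied in both variables: excluded from both totals
            if dc == 0:
                ties_calc += 1
            elif de == 0:
                ties_exp += 1
            elif dc * de > 0:
                concordant += 1
            else:
                discordant += 1
    pairs = concordant + discordant
    return (concordant - discordant) / math.sqrt(
        (pairs + ties_calc) * (pairs + ties_exp)
    )


def offset_statistics(calc, exp):
    """Recompute RMSE, R^2, and tau after removing the mean signed error."""
    me = mean_signed_error(calc, exp)
    shifted = [c - me for c in calc]
    return rmse(shifted, exp), r_squared(shifted, exp), kendall_tau(shifted, exp)


# Hypothetical predictions that are uniformly 1 kcal/mol too positive.
calc = [-5.0, -6.0, -7.5, -9.0]
exp = [-6.0, -7.0, -8.5, -10.0]
print(mean_signed_error(calc, exp), rmse(calc, exp), offset_statistics(calc, exp))
```

Under these definitions only the RMSE changes when a constant offset is removed, since both the Pearson correlation and Kendall τ are invariant to shifting all predictions by the same amount.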

Given the similarities of the two octa-acid hosts and of their guest sets, and that the large majority of the submitted methodologies were applied to both sets, we decided to report here the statistics computed using all the 16 predictions performed for OA and TEMOA (i.e., 8 predictions for each host). This merged set will be referred to as the OA/TEMOA set in the rest of the work. The only method used to predict the binding free energies of the TEMOA set but not of the OA set was US-CGenFF (see Table 2 for a schematic description of the methodology). We also decided to calculate separate statistics for the CB8 set to highlight the general difference in performance between the predictions of the two host families. Statistics calculated on the two separate OA and TEMOA sets, as well as on the full dataset including CB8, OA, and TEMOA, are available on the GitHub repository (https://github.com/MobleyLab/SAMPL6/tree/master/host_guest/Analysis).

Table 2. Summary of methodologies used by the participants in the SAMPL6 host-guest challenge.

When a method uses multiple models (e.g., MM is used to generate the conformations to evaluate at the QM level in DFT(TPSS)-D3), only the energy and solvation models used for the final free energy prediction are listed. COSMO-RS: conductor-like screening model for real solvents [72]; DDM: double decoupling method [73]; FM: Force Matching [74]; FSDAM: Fast switching double annihilation method [75, 76]; KMTISM: KECSA-Movable Type Implicit Solvation Model [77]; MD: molecular dynamics; MovTyp: Movable Type method [78]; PBSA: Poisson-Boltzmann surface area [79]; REST: replica exchange with solute torsional tempering [80, 81]; RFEC: relative free energy calculation; QM/MM: mixed quantum mechanics and molecular mechanics; SOMD: double annihilation or decoupling method performed with Sire/OpenMM6.3 software [82, 83]; SQM: semi-empirical quantum mechanics; US: umbrella sampling [84]; VSGB2.1: VSGB2.0 solvation model refit to OPLS2.1/3/3e [85].

We generated bootstrap distributions of the statistics and computed 95-percentile bootstrap confidence intervals of the point estimates by generating 100 000 bootstrap samples through random sampling of the set of host-guest pairs with replacement. When the submission included SEMs for each prediction, we accounted for the statistical uncertainty in predictions by adding, for each bootstrap replicate, an additional Gaussian perturbation to the prediction with a standard deviation indicated by the SEM for that prediction.
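The bootstrap procedure described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `bootstrap_ci`, the simple percentile interval, the fixed random seed, and the reduced default number of replicates (the analysis in the text used 100 000) are choices made here for brevity and are not taken from the authors' analysis scripts.

```python
import math
import random


def rmse(calc, exp):
    """Root mean squared error, used here as the example statistic."""
    return math.sqrt(sum((c - e) ** 2 for c, e in zip(calc, exp)) / len(exp))


def bootstrap_ci(calc, exp, statistic, sems=None, n_bootstrap=10000, ci=0.95, seed=0):
    """Percentile bootstrap confidence interval for statistic(calc, exp).

    Host-guest pairs are resampled with replacement. When per-prediction SEMs
    are supplied, each resampled prediction also receives a Gaussian
    perturbation whose standard deviation equals its SEM.
    """
    rng = random.Random(seed)
    n = len(exp)
    samples = []
    for _ in range(n_bootstrap):
        indices = [rng.randrange(n) for _ in range(n)]
        resampled_exp = [exp[i] for i in indices]
        if sems is None:
            resampled_calc = [calc[i] for i in indices]
        else:
            resampled_calc = [calc[i] + rng.gauss(0.0, sems[i]) for i in indices]
        samples.append(statistic(resampled_calc, resampled_exp))
    samples.sort()
    lower = samples[int((1.0 - ci) / 2.0 * n_bootstrap)]
    upper = samples[min(n_bootstrap - 1, int((1.0 + ci) / 2.0 * n_bootstrap))]
    return lower, upper


# Hypothetical data: four predictions with SEMs of 0.2 kcal/mol each.
calc = [-5.0, -6.0, -7.5, -9.0]
exp = [-6.0, -7.0, -8.5, -10.0]
print(bootstrap_ci(calc, exp, rmse, sems=[0.2] * 4, n_bootstrap=2000))
```

Resampling pairs rather than individual values keeps each prediction matched to its own experimental measurement, which is what statistics such as RMSE or Kendall τ require.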

Null model

In order to compare the results obtained by the participants to a simple model that can be evaluated with minimal effort, we computed the binding free energy predicted by MM-GBSA rescoring [68] using Prime [69, 70] with the OPLS3 force field [71] in the Schrödinger Suite 2018-1 (Schrödinger). We used the same docked poses provided in the input files that were shared with all the participants as the initial coordinates for all the calculations. All docked poses were minimized before being rescored with the OPLS3 force field and the VSGB2.1 solvent model. The only exception to this was CB8-G4, which was manually re-docked into the host, as the initial structure contained steric clashes that could not be relaxed by minimization, causing the predicted binding free energy to spike to an unreasonable value of +2443 kcal/mol.

Results

We received 42 submissions for the OA guest set, 43 for TEMOA, and 34 for CB8, for a total of 119 submissions, from 10 different participants, 5 of whom uploaded predictions for the three compounds in the bonus challenge as well. Only two groups submitted enthalpy predictions, which makes it impractical to draw general conclusions about the state of the field regarding the reliability of enthalpy predictions. Moreover, the predictive performance was generally poor (see Supplementary Figure 9). The results of the enthalpy calculations are thus not discussed in detail here, but they are nevertheless available on the GitHub repository.

Overview of the methodologies

Including the null model, 41 different methodologies were applied to one or more of the three datasets. In particular, the submissions included a total of 25 different variations of the movable type method exploring the effect of the input structures, the force field, the presence of conformational changes upon binding, and the introduction of previous experimental information on the free energy estimates. In order to facilitate the comparison among methods, we focus in this analysis on a representative subset of 7 different variations of the methodology. Supplementary Figure 7 and Supplementary Figure 8 show statistic bootstrap distributions and correlation plots for all the movable type free energy calculations submitted. As many of the methodologies are reported in detail elsewhere, in this section, we give a brief overview of the different strategies employed for the challenge to model the host-guest systems and estimate the binding free energies, and we leave the detailed descriptions of the various methodologies to the articles referenced in Table 2.

Modeling

The majority of the participants either used the docked poses provided in the input files or ran a separate docking program to generate the initial complex conformation for the calculations. In a few cases, the starting configuration was found by manually placing the guest inside the host. Surprisingly, the most common solvent model used in classical simulations was still TIP3P [86], a water model parameterized by Jorgensen 35 years ago for use with a fixed-cutoff Monte Carlo code neglecting long-range dispersion interactions and omitting long-range electrostatics. The only other explicit water models used in this round of the challenge were the significantly more modern AMOEBA [87] and TIP4P-Ew [88] water models, the latter of which was used to sample conformations to evaluate at the QM level. Implicit solvent models were adopted only in MMPBSA and for the movable type and QM calculations. We observed more variability in the treatment of buffer salt concentrations despite the known importance of this element in affecting the binding predictions, which may reflect a lack of standard practices in the field. Some entries modeled the buffer ionic strength explicitly with Na+ and Cl− ions, while others included only the neutralizing counterions or used a uniform neutralizing charge. One of the participating groups submitted multiple variants of the SOMD method either utilizing only neutralizing counterions or including additional ions simulating the ionic strength at experimental conditions, which makes it possible to directly assess the effect of this modeling decision on the selected host-guest systems.

Most methods employing classical force fields used GAFF [89] or GAFF2 (still under active development) with AM1-BCC [64, 65] or RESP [90] charges, which were usually derived at the Hartree-Fock or MP2 level of theory. Other approaches made use of the AMOEBA polarizable model [87], CGenFF [74] or force matching [91] starting from CGenFF parameters. The movable type calculations utilized either the KECSA [92] scoring algorithm or the more recently developed GARF [93]. Several submissions employed QM potentials at the semi-empirical PM6-DH+ [94, 95] or DFT level of theory, either modeling the full host-guest system or using hybrid QM/MM approaches that treated only the guest quantum mechanically. DFT calculations employed B3LYP [96], B3PW91 [96], or TPSS [97] functionals and often the DFT-D3 dispersion correction [98].

Sampling and free energy prediction

All the challenge entries used MD to sample host-guest conformations; uses of docking were limited to preparation of initial bound geometries for subsequent simulations. This was also the case for QM and movable type calculations, where samples generated from MD were in some cases clustered prior to quantum chemical energy evaluations. In a few cases, enhanced sampling techniques were used; in particular, the entries identified by DDM-FM and DDM-FM-QMM used Hamiltonian Replica Exchange (HREX) [99] as part of their double decoupling method (DDM) calculation [73], while Replica Exchange with Solute torsional Tempering (REST) [80, 81] was employed in FSDAM to generate equilibrium starting configurations for the fast switching protocol. Many groups used the double decoupling or the double annihilation method with purely classical force fields or with hybrid QM/MM potentials and either Bennett acceptance ratio (BAR) [100, 101] or the multistate Bennett acceptance ratio (MBAR) [102] to estimate free energies from the aggregated simulation data. Other classes of methodologies applied to this dataset include umbrella sampling (US) [84], movable type [78], MMPBSA [103], and free energy predictions based on QM calculations.

The repeated appearance of hosts chosen from the octa-acid and cucurbituril families as test systems for the SAMPL binding challenge, which reflects the continuous contribution of experimental data from the Gibb and Isaacs laboratories, led some groups to take advantage of previously available experimental data to improve their computational predictions. Several entries (e.g., SOMD-D, US-GAFF-C, and MovTyp-GE3L) were submitted with a linear¹ correction of the form ΔG_corrected = a ΔG_calculated + b, where the slope and offset coefficients (i.e., a and b respectively) were trained on data generated for previous rounds of the challenge. In some of the movable type calculations (e.g., MovTyp-GE3O), the coefficient a was fixed to unity and the training data were used to determine a purely additive bias correction. Relatedly, RFEC-GAFF2 and RFEC-QMMM, which included predictions for the OA and TEMOA guest sets, calculated the relative binding free energies between the compounds and determined the offsets necessary to obtain absolute free energies using binding measurements of similar OA and TEMOA guests.
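The two kinds of experiment-based correction described above can be sketched as follows. This is only an illustration: it assumes an ordinary least-squares fit, which the text does not confirm was the procedure used by any participant, and the function names and training values are hypothetical rather than taken from any SAMPL round.

```python
from statistics import mean


def fit_linear_correction(calc, expt):
    """Least-squares fit of expt ~ a * calc + b (assumed fitting procedure)."""
    mean_calc, mean_expt = mean(calc), mean(expt)
    sxx = sum((c - mean_calc) ** 2 for c in calc)
    sxy = sum((c - mean_calc) * (e - mean_expt) for c, e in zip(calc, expt))
    a = sxy / sxx
    b = mean_expt - a * mean_calc
    return a, b


def fit_offset_only(calc, expt):
    """Purely additive bias correction: slope fixed to unity, only b is trained."""
    return 1.0, mean(expt) - mean(calc)


def apply_correction(calc, a, b):
    """Apply the trained correction to new calculated free energies."""
    return [a * c + b for c in calc]


# Hypothetical training data in kcal/mol (calculated vs. measured affinities).
prior_calc = [-10.0, -8.0, -6.0, -4.0]
prior_expt = [-7.0, -6.0, -5.0, -4.0]

a, b = fit_linear_correction(prior_calc, prior_expt)
corrected = apply_correction([-9.0, -5.0], a, b)

_, b_offset = fit_offset_only(prior_calc, prior_expt)
shifted = apply_correction([-9.0, -5.0], 1.0, b_offset)
```

With a slope below one, the fit compresses the range of predicted affinities as well as shifting it, whereas the offset-only variant preserves all differences between guests.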

Submission performance statistics

As mentioned above, we present here the statistics obtained by the challenge entries on the CB8 dataset and the merged OA and TEMOA dataset, with the exception of US-CGenFF, for which we received a submission for the TEMOA set only. Moreover, since only a minority of entries had predictions for the bonus challenge, we excluded CB8-G11, CB8-G12, and CB8-G13 when computing the statistics of all the methodologies in order to compare them on the same set of compounds. Table 3 reports these statistics with 95-percentile confidence intervals, and Figure 4 shows the bootstrap distributions of the statistics. Some of the methods were used to estimate the binding free energies of only one of the OA/TEMOA and CB8 sets, and, as a consequence, some of the table entries are missing. For the methodologies that made predictions for the bonus compounds, we report the statistics obtained including them separately in Table 4. While it is difficult to isolate methods and models that performed very well across datasets and statistics, a few patterns emerged from comparing the different entries.

Figure 3. Free energy correlation plots obtained by the methods on the three host-guest sets.

Scatter plots showing the experimental measurements of the host-guest binding free energies (horizontal axis) against the methods’ predictions on the OA (yellow), TEMOA (green), and CB8 (blue) guest sets with the respective regression lines of the same color. The solid black line is the regression line obtained by using all the data points. The gray shaded area represents the points within 1.5 kcal/mol from the diagonal (dashed black line). Only a representative subset of the movable type calculation results is shown. See Supplementary Figure 7 for the free energy correlation plots of all the movable type predictions.

Figure 4. Bootstrap distribution of the methods performance statistics.

Bootstrap distributions of root mean squared error (RMSE), mean signed error (ME), coefficient of determination (R²) and Kendall rank correlation coefficient (τ). For each methodology and statistic, two distributions are shown for the merged OA/TEMOA set (yellow, pointing upwards) and the CB8 set excluding the bonus challenge compounds (blue, downwards). The black horizontal box between the two distributions of each method shows the median (white circle) and interquartile range (box extremes) of the overall distribution of statistics (i.e., pooling together the OA/TEMOA and CB8 statistic distributions). The short vertical segment in each distribution is the statistic computed using all the data. The distributions of the methods that incorporate previous experimental data into the computational prediction are highlighted in gray. Methodologies are ordered using the statistics computed on the OA/TEMOA set, unless only data for the CB8 set was submitted (e.g., DDM-FM), in which case the CB8 set statistic was used to determine the order. Only a representative subset of the movable type calculations results are shown. See Supplementary Figure 8 for the bootstrap distributions including all the movable type submissions.

Table 3. Method performance statistics and bootstrap confidence intervals on OA/TEMOA and CB8 datasets.

Root mean square error (RMSE), mean signed error (ME), coefficient of determination (R²), and Kendall correlation coefficient (τ) obtained by each methodology on the merged OA/TEMOA and the CB8 datasets. The only exception is US-CGenFF, whose OA/TEMOA statistics were computed using only the TEMOA set since no submission was received for OA. Table entries are left blank for those methods that were applied to only one of the guest sets. The predictions performed for the bonus challenge guests were excluded when computing the statistics for the CB8 dataset. Each statistic is reported with bootstrap distribution mean (between parentheses) and 95-percentile bootstrap confidence interval (square brackets) obtained through 100 000 cycles of resampling with replacement. The standard errors of the mean of the predictions reported in the submissions are included in the confidence intervals. The original data for the combined OA/TEMOA and CB8 datasets can be found respectively at https://github.com/MobleyLab/SAMPL6/tree/master/host_guest/Analysis/OA-TEMOA/StatisticsTables/statistics.csv and https://github.com/MobleyLab/SAMPL6/tree/master/host_guest/Analysis/CB8-NOBONUS/StatisticsTables/statistics.csv. Any updates or corrections to the data will be made available at the same URLs, and anyone wishing to reuse the data should refer there.
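A minimal sketch of a percentile bootstrap of this kind is given below. It is not the analysis code from the repository linked above. In particular, drawing each resampled prediction from a Gaussian centered on the submitted value with the reported standard error as its width is an assumption about how prediction uncertainties could be folded into the interval, and the function names and example values are hypothetical.

```python
import math
import random


def rmse(pred, expt):
    """Root mean square error between predictions and measurements."""
    return math.sqrt(sum((p - e) ** 2 for p, e in zip(pred, expt)) / len(pred))


def bootstrap_ci(pred, pred_sem, expt, statistic, n_cycles=1000, level=0.95, seed=0):
    """Return (bootstrap mean, (lower, upper)) for the given statistic."""
    rng = random.Random(seed)
    n = len(pred)
    samples = []
    for _ in range(n_cycles):
        # Resample the molecules with replacement.
        idx = [rng.randrange(n) for _ in range(n)]
        # Assumption: perturb each prediction by Gaussian noise of width SEM.
        resampled_pred = [rng.gauss(pred[i], pred_sem[i]) for i in idx]
        resampled_expt = [expt[i] for i in idx]
        samples.append(statistic(resampled_pred, resampled_expt))
    samples.sort()
    lower = samples[int((1 - level) / 2 * (n_cycles - 1))]
    upper = samples[int((1 + level) / 2 * (n_cycles - 1))]
    return sum(samples) / n_cycles, (lower, upper)


# Hypothetical predictions, standard errors, and measurements in kcal/mol.
pred = [-7.1, -9.4, -5.2, -11.0, -6.3]
sem = [0.2, 0.3, 0.1, 0.4, 0.2]
expt = [-6.0, -7.5, -5.0, -8.2, -6.9]
boot_mean, (ci_low, ci_high) = bootstrap_ci(pred, sem, expt, rmse)
```

The default of 1000 cycles keeps the sketch fast; the table above used 100 000 cycles.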

Challenge entries generally performed better on OA/TEMOA than CB8

In general, the CB8 guest set proved to be more challenging than the OA/TEMOA set both in terms of error and correlation statistics. It is rarely the case that the same method scored better statistics on the former set, and only MovTyp-GT1N does so with statistical significance, while the opposite can be observed relatively often. Figure 5 shows the root mean squared error (RMSE) and mean signed error (ME) with 95-percentile bootstrap confidence interval computed for each molecule using the ten methods that scored best in RMSE statistics in the merged OA/TEMOA set or the CB8 set (excluding the bonus challenge), which formed a set of 14 different techniques employing GAFF and GAFF2 [89], CGenFF [74], force matching [91], AMOEBA [87], and QM/MM potentials using DFT(B3LYP) [96] or PM6-DH+ [94, 95]. These top ten methods performed poorly on eight out of the eleven CB8 compounds, and while confidence intervals for all the statistics are generally large, they also performed significantly worse on several CB8 guests than on the OA/TEMOA ligands whose affinities they predicted accurately. This loss of accuracy seems to be fairly consistent across models and methodologies, but the data are not sufficient to determine the exact cause of this behavior (e.g., force field parameters, the generally larger dimensions of the CB8 guests, protonation states). However, the results of the related SAMPL6 SAMPLing challenge do suggest that properly accounting for slow conformational dynamics for some of the CB8 guests may require longer simulation times than for the OA compounds, which may have contributed to the poorer performance relative to the OA set [104]. Moreover, explicitly modeling the buffer salt concentration in SOMD significantly reduced the difference in error on the two guest sets (compare SOMD-C with SOMD-C-nobuffer), albeit without a commensurate improvement in correlation statistics, so the issue of missing chemical effects may also play a role.

Figure 5. Free energy error statistics by molecule for the ten methods with the lowest RMSE.

Root mean square error (RMSE) and mean signed error (ME) computed using the ten methodologies with the lowest RMSE on the merged OA/TEMOA and CB8 datasets (excluding bonus challenge compounds) for all guests binding to OA (yellow), TEMOA (green), and CB8 (blue). Error bars represent 95-percentile bootstrap confidence intervals.

Linear corrections fit to prior experimental data can reduce error without improving correlation

Nine of the entries represented in Figure 4 incorporate fits to prior experimental data with the goal of either improving the computationally-predicted affinities or determining the offset necessary to convert relative free energy estimates into absolute binding affinities; of these, seven are among the top 10 methods scoring the lowest RMSE on the OA/TEMOA set. When considering multiple submissions of the same technique that differ only in whether a fit to prior experimental data was included, the entry with the lowest RMSE incorporates experimental data in every case. However, the results are less consistent when considering the CB8 guest set. The trend is the same for the SOMD, US-GAFF, and MovTyp submissions that used the KECSA potential, but it is reversed for the majority of the MovTyp submissions employing the GARF energy model (see also Supplementary Figure 8). It should be noted that many of the MovTyp corrections were trained on a dataset that pooled binding measurements of OA, TEMOA, and CB8 guests, so it is possible that the approach failed to generalize when the methodology was affected by a systematic error of opposite sign on the OA/TEMOA and CB8 sets (see Figure 3). The methods that scored best (in terms of lowest RMSE) are US-GAFF-C for OA/TEMOA, and SOMD-D-nobuffer for CB8; excluding methods utilizing fits to experimental data, US-CGenFF and MovTyp-GT1N have the lowest RMSE on the OA/TEMOA and CB8 sets, respectively.

On the other hand, integrating prior experimental data did not appreciably impact correlation statistics, and the same methods with or without experimental correction show very similar R² and τ bootstrap distributions. It should be noted that a constant offset or multiplicative factor modifying all data points cannot alter the R² statistic other than by correcting an inverse correlation, and it can change τ only if the transformation switches the ranking of at least two data points, which a single linear transformation with positive slope cannot do. However, since some of the entries trained different corrections for OA and TEMOA guests, the correlation statistics for the combined OA/TEMOA set were affected (see for example SOMD-C and SOMD-D, MovTyp-GE3N and MovTyp-GE3S in Supplementary Figure 8). It is true that the initial performance of these methods without the experiment-based correction on the separated OA and TEMOA sets was relatively similar, thus leaving a small margin of improvement for this type of correction to reduce the data variance around the regression line and increase R². However, comparing the statistics computed pooling together the OA/TEMOA and CB8 predictions, which displayed very different correlation statistics, did not show any significant improvement (data not shown). In fact, R² for the SOMD-C calculations decreased from 0.47 [0.09,0.78] to 0.18 [0.01,0.48] when incorporating the experimental correction in SOMD-D, despite the expected drop in RMSE, and a similar observation can be made for SOMD-D-nobuffer and the τ statistic.
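The invariance argument can be checked numerically with a small sketch. The data below are hypothetical and deliberately constructed so that one host carries a constant systematic error. Kendall's τ is computed in its simplest form without tie handling, and R² is taken as the squared Pearson correlation. Whether host-specific corrections raise or lower the pooled correlation in practice depends on the data, as the SOMD example above shows.

```python
import math
from statistics import mean


def rmse(pred, expt):
    return math.sqrt(sum((p - e) ** 2 for p, e in zip(pred, expt)) / len(pred))


def r_squared(pred, expt):
    """Squared Pearson correlation coefficient."""
    mp, me = mean(pred), mean(expt)
    sxy = sum((p - mp) * (e - me) for p, e in zip(pred, expt))
    sxx = sum((p - mp) ** 2 for p in pred)
    syy = sum((e - me) ** 2 for e in expt)
    return sxy ** 2 / (sxx * syy)


def kendall_tau(pred, expt):
    """Kendall tau-a: (concordant - discordant) pairs over all pairs, no ties."""
    n = len(pred)
    score = 0
    for i in range(n):
        for j in range(i + 1, n):
            product = (pred[i] - pred[j]) * (expt[i] - expt[j])
            score += (product > 0) - (product < 0)
    return score / (n * (n - 1) / 2)


# Hypothetical pooled set: the first three guests bind host A, the last three host B.
pooled_expt = [-5.0, -6.0, -7.0, -5.5, -6.5, -7.5]
pooled_pred = [-9.0, -10.0, -11.0, -5.5, -6.5, -7.5]  # host A is offset by -4

# One linear correction with positive slope applied to every data point.
single = [0.5 * p + 1.0 for p in pooled_pred]

# Separate additive corrections per host (+4 for host A, none for host B).
host_specific = [p + 4.0 for p in pooled_pred[:3]] + pooled_pred[3:]

for label, pred in [("raw", pooled_pred), ("single", single), ("per-host", host_specific)]:
    print(label, rmse(pred, pooled_expt), r_squared(pred, pooled_expt),
          kendall_tau(pred, pooled_expt))
```

The single correction changes the RMSE but leaves R² and τ untouched, whereas the per-host corrections reorder the pooled data points and therefore change both correlation statistics.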

GAFF/AM1-BCC and TIP3P consistently overestimated the host-guest binding affinities

Several entries used GAFF to parameterize the host-guest systems with AM1-BCC charges and TIP3P water molecules (i.e., SOMD, US-GAFF, DDM-GAFF), so it is possible to make relatively general observations about the performance of this model. Firstly, if we ignore the submissions that employ an experiment-based correction, every method in this group predicted tighter binding than supported by experiment for both the OA/TEMOA and the CB8 sets. This observation extends to MMPBSA-GAFF as well, which still used GAFF but with RESP charges and the implicit PBSA solvent model. Many of the other methodologies that entered the challenge display a similar systematic error (see also ME in Figure 5), although GAFF is the only force field that was independently adopted by multiple groups and used with various classes of techniques.

Secondly, while error statistics vary substantially among GAFF entries, the correlation statistics are quite similar. Most of these are among the best-performing methods for the OA/TEMOA set, with τ ranging from 0.7 to 0.8, despite showing poor correlations on the CB8 set. The main exception to this pattern is DDM-GAFF, which shows moderate correlations for both datasets. The reason for this is not entirely clear, as the methodology adopted for the DDM-GAFF entry is very similar to SOMD-C-nobuffer. Their main differences appear to lie in their treatment of long-range electrostatics, with SOMD using reaction field electrostatics [105] and DDM-GAFF using PME [106], and in the use of restraints, with SOMD employing a single flat-bottom restraint to keep the guest in the host’s cavity and DDM-GAFF restraining the relative orientation of the guest by means of harmonic restraining potentials applied to one distance, two angles, and three torsions.

Comparison to null model and general observations

The vast majority of the entries statistically outperformed the MMGBSA calculation we used as a null model. Surprisingly, while the null model correlation on the CB8 set was objectively poor (R² = 0.6 [0.2, 0.8], τ = 0.5 [0.2, 0.8]), the R² and τ statistics obtained by the MMGBSA null model on the OA/TEMOA set were comparable to those of more expensive methods and, in fact, surpassed many of the challenge entries (Table 3). Many of the best-performing methods obtained statistically indistinguishable correlation statistics, but US-GAFF and RFEC-QMMM obtained the highest R² and τ respectively on the OA/TEMOA set, while DDM-FM and DDM-FM-QMMM scored at the top for the CB8 guest set. When the prediction obtained from a classical force field was corrected with the free energy of moving to a QM/MM potential, the correlation slightly increased, although this difference was not statistically significant. This is the case for RFEC-GAFF2 and DDM-FM, both of which included only the guest in the QM region using PM6-DH+ and DFT(B3LYP) respectively. On the other hand, calculations based on pure QM potentials were generally outperformed by force field and QM/MM models despite the use of molecular dynamics to collect multiple samples.

Bonus challenge

The platinum atom in CB8-G13 required particular attention during parameterization, as this atom is not customarily handled by general small molecule force fields. Even in the case of DFT(B3PW91) and DFT(B3PW91)-D3, the configurations used for the QM calculations were generated by classical molecular dynamics requiring empirical parameters. In general, all the participants in the bonus challenge relied on DFT-level quantum mechanics calculations to address the problem. In MMPBSA-GAFF, DFT(B3PW91), and DFT(B3PW91)-D3, Mulliken charges were generated from DFT(B3LYP), which were subsequently used to determine AM1-BCC charges. A different approach was adopted in DDM-FM-QMMM, in which the platinum was substituted by palladium, and the conformations necessary for the force matching parameterization procedure were obtained by MNDO(d) dynamics.

All groups participating in the bonus challenge submitted 1:1 complex predictions also for CB8-G11 and CB8-G12, for which the initial experimental data suggested the possibility of 2:1 complexes (two guests simultaneously bound to one host). This later turned out to be correct only for CB8-G12, and several groups reported having computationally tested the hypothesis for CB8-G11 with the correct outcome. DDM-AMOEBA was used to estimate the affinity of both the 1:1 and 2:1 complexes, but in the end the first one was used in the submission as the two predicted binding free energies differed by only 0.1 kcal/mol. Accordingly, we used the experimental measurement determined for the first binding event to compute the statistics (CB8-G12a in Table 1).

Summary statistics incorporating the bonus challenge compounds are reported in Table 4. Although the RMSE improves in most cases, this effect varies greatly across the three molecules, and the improvement is mainly due to CB8-G11, whose predictions are regularly much closer to the experimental measurement than the estimates provided for the other two compounds.

Table 4. Performance statistics including the bonus challenge molecules.

Root mean square error (RMSE), mean signed error (ME), coefficient of determination (R²), and Kendall correlation coefficient (τ) obtained by all methods applied to the bonus challenge on the full CB8 set (left super column), including the three bonus molecules. Statistics computed excluding the bonus molecules are reported again here (right super column) for easy comparison. Bootstrap distribution mean and 95-percentile confidence intervals are reported between parentheses and square brackets respectively.

Comparison to previous rounds of the SAMPL host-guest binding challenge

Since previous rounds of the host-guest binding challenge featured identical or similar hosts to those tested in SAMPL6, it is possible to compare earlier results and observe the evolution of methodological performance.

RMSE improvements over SAMPL5 were largely driven by fits to prior experimental data

SAMPL5 featured a set of compounds binding to both OA and TEMOA, which will be referred to in the following as the OA/TEMOA-5 set to differentiate it from the combined OA/TEMOA set used in this round of the challenge. In the top row of Figure 6-A, we show the median and fitted distributions of the RMSE and R² statistics taken from the SAMPL5 overview paper [26] together with the results from SAMPL6. OA was used as a test system in SAMPL4 as well, but in that case only relative free energy predictions were submitted, so we cannot draw a direct comparison. Prediction accuracy displays a slight improvement of the median RMSE over the previous round, from 3.00 [2.70, 3.60] kcal/mol to 2.76 [1.85, 3.28] kcal/mol (95-percentile bootstrap confidence intervals of the medians not shown in Figure 6-A), but this change seems to be entirely driven by the methods employing experiment-based fit corrections, since removing them results in a median RMSE that is essentially identical to SAMPL5. The data raise the question of whether the field is hitting the accuracy limit of current general force fields. On the other hand, the median R² improved with respect to the last round from 0.0 [0.0,0.8] to 0.5 [0.4,0.8]. In this case, the slightly lower SAMPL6 median R² obtained by ignoring methods incorporating experimental data is likely due not to the correction itself but to the fact that the top-performing methods were generally submitted with and without correction, thus reducing the density at values closer to unity. Indeed, as already discussed, no positive effect on correlation was evident from the inclusion of a trained linear correction.

Figure 6. CB analogues and distribution of RMSE and R² achieved by methods in SAMPL3 and SAMPL5.

(A) Probability distribution fitting of root mean square error (RMSE, left column) and coefficient of determination (R², right column) achieved by all the methods entering the SAMPL6 (yellow), SAMPL5 (green), and SAMPL3 (purple) challenge. The markers on the x-axis indicate the medians of the distributions. Distributions are shown for all the methods entering the challenge (solid line, square marker), excluding the SAMPL6 entries that used previous experimental data (dotted line, triangle marker), or isolating alchemical and potential of mean force methodologies that did not use an experiment-based correction (dashed line, circle marker). The RMSE axis is truncated to 14 kcal/mol, and a few outlier submissions are not shown. The data show an essentially identical median RMSE and an increased median correlation on the combined OA/TEMOA guest sets (top row) with respect to the previous round of the challenge. The comparison of the results to different sets of guests binding a few cucurbit[n]uril and cucurbit[n]uril-like hosts appearing in SAMPL3 and SAMPL5 (bottom row) shows instead a deteriorated performance in the most recent round of the challenge, which is likely explained by the greater complexity of the SAMPL6 CB8 guest set. (B) Three-dimensional structures in stick view of the CBClip (top) and H1 (bottom) hosts featured in SAMPL5 and SAMPL3 respectively. Carbon atoms are represented in gray, nitrogens in blue, oxygens in red, and sulfur atoms in yellow. Hydrogen atoms are not shown.

For a better interpretation of these results, it should be pointed out that these statistics can be largely affected by the particular set of guests tested and the composition of the methods entering the challenge. However, one of the goals of the SAMPL challenge series is to push the community to use techniques that prove more reliable, and the composition of the methods entering the competition is influenced by the results of previous studies. Moreover, limiting the comparison to free energy-based methodologies (e.g., alchemical and potential of mean force calculations) does not change the conclusion, and, in fact, it widens the difference in median R².

Since SOMD calculations entered the SAMPL5 challenge as well [107], we can directly compare the statistics obtained by the method on the two guest sets to form an idea of the relative complexity of the two sets for free energy methods. To this end, we report in Table 5 the uncertainties of the absolute statistics in terms of the mean and standard deviations (between parentheses) of the bootstrap distributions instead of their 95-percentile confidence intervals to allow a direct comparison to those published in the SAMPL5 overview paper. The results of the SOMD methods applied to the OA/TEMOA-5 set were submitted with a restraint and long-range dispersion correction, similarly to SOMD-C-nobuffer here, and without it, similarly to SOMD-A-nobuffer here. The two methods were referred to as SOMD-3 and SOMD-1 respectively in the SAMPL5 overview. In both cases, the calculations used GAFF with AM1-BCC charges and TIP3P water molecules as well as a single flat-bottom restraint. The RMSE obtained by SOMD-C-nobuffer increased with respect to the statistic computed for SOMD-3 on OA/TEMOA-5 from 2.1 (2.1 ± 0.3) kcal/mol to 3.0 (3.0 ± 0.4) kcal/mol. Incorporating experimental data into the prediction improved the error, as SOMD-D-nobuffer obtained an RMSE of 1.6 (1.6 ± 0.3) kcal/mol. On the other hand, the Kendall correlation coefficient slightly increased on the SAMPL6 dataset from 0.4 (0.4 ± 0.2) to 0.7 (0.7 ± 0.4), while R² remained more or less stationary from the already high value of 0.9 (0.7 ± 0.2) obtained on OA/TEMOA-5. Very similar observations can be made for SOMD-A-nobuffer and SOMD-1. While the improved τ correlation does not rule out the possibility of system-dependent effects on R², it is unlikely for the difference between the median R² of SAMPL5 and SAMPL6 (amounting to 0.76) to be entirely explained by the different set of guests, and the improvement is likely due, at least in part, to the different methodologies entering the challenge.
In particular, SAMPL5 featured several free energy methods that scored near-zero R² on the OA/TEMOA-5 set, affecting considerably the SAMPL5 median statistic. One of these methods is BEDAM, which used the OPLS-2005 [108, 109] force field and the implicit solvent model AGBNP2 [110], neither of which entered the latest round of the challenge. However, the rest of these methods consist of double decoupling calculations carried out either with thermodynamic integration (TI) [111, 112] or HREX and BAR that employed CGenFF and TIP3P, which performed relatively well in SAMPL6 on OA/TEMOA. It should be noted that the TI and HREX/BAR methodologies in SAMPL5 made use of a Boresch-style restraint [113] harmonically restraining one distance, two angles, and three dihedrals. This is similar to the solution adopted in DDM-GAFF in SAMPL6, which also showed a relatively low R² compared to the other free energy submissions in the same round of the challenge, so it is natural to suspect that it may be particularly challenging to treat this class of host-guest systems with this type of restraint in alchemical calculations.

Table 5. Offset statistics of the methods appearing in previous rounds of the SAMPL host-guest binding challenge.

Offset root mean square error (RMSE₀), coefficient of determination (R²₀), and Kendall correlation coefficient (τ₀) computed by subtracting the mean signed error from the free energy prediction. Absolute statistics are identical to those presented before, but, consistently with the format adopted in the SAMPL5 host-guest binding challenge overview paper, we report mean ± standard deviation of the bootstrap distribution between parentheses.
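As an illustration of the offset error statistic described in this caption, the sketch below removes the mean signed error from a set of predictions before computing the RMSE. The sign convention (prediction minus experiment) and the function names are assumptions made for this example, and the input values are hypothetical.

```python
import math


def mean_signed_error(pred, expt):
    """Mean of (prediction - experiment); the sign convention is assumed here."""
    return sum(p - e for p, e in zip(pred, expt)) / len(pred)


def offset_rmse(pred, expt):
    """RMSE after subtracting the mean signed error from every prediction."""
    me = mean_signed_error(pred, expt)
    shifted = [p - me for p in pred]
    return math.sqrt(sum((s - e) ** 2 for s, e in zip(shifted, expt)) / len(pred))


# Hypothetical predictions and measurements in kcal/mol.
pred = [-8.0, -10.0]
expt = [-5.0, -5.0]
me = mean_signed_error(pred, expt)
rmse_offset = offset_rmse(pred, expt)
```

Because the systematic shift is removed first, the offset RMSE measures only the scatter of the predictions around the experimental values, which is what makes it comparable with rounds in which only relative free energies were submitted.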

An improvement can also be observed for the movable type method, which was applied to the OA/TEMOA-5 set as well [114] using the KECSA 1 and KECSA 2 potentials. These two submissions, identified as MovTyp-1 and MovTyp-2 respectively in the SAMPL5 overview paper, obtained similar statistics, so we will use MovTyp-2 for the comparison. The SAMPL6 entry MovTyp-KT1N, which also uses the KECSA energy model, obtained a comparable RMSE of 2.9 (2.9 ± 0.2) kcal/mol against the 3.1 (2.9 ± 1.1) kcal/mol achieved by MovTyp-2 on OA/TEMOA-5, but, even in this case, the error becomes statistically distinguishable once the experiment-based correction is included (i.e., in MovTyp-KT1L), which decreases the RMSE to 1.0 kcal/mol. The correlation statistics generally compare favorably with respect to SAMPL5, with R² moving from 0.0 (0.3 ± 0.3) to 0.5 (0.5 ± 0.2) and τ going from 0.1 (0.1 ± 0.3) to 0.3 (0.3 ± 0.2), although the uncertainties are too large to achieve statistical significance. Moreover, MovTyp-GE3N, which employs the more recently developed GARF energy model, obtained a better RMSE (1.8 (1.8 ± 0.4) kcal/mol) and comparable correlation statistics to MovTyp-KT1N.

Finally, it seems appropriate to compare the performance of DFT(TPSS)-D3 on OA/TEMOA to DFT/TPSS-c [115] in SAMPL5 and RRHO-551 [116] in SAMPL4 [25]. DFT(TPSS)-D3 and DFT/TPSS-c are very similar in that they both use the DFT-D3 approach to include the dispersion correction, but while DFT(TPSS)-D3 generated an ensemble of configurations with MD, DFT/TPSS-c estimated the binding free energy from a single minimized structure. On the other hand, RRHO-551 does use MD for conformational sampling, but it employs DFT-D, which was developed earlier than DFT-D3, to correct for dispersion interactions. As already mentioned, SAMPL4 featured a set of OA guests [25], but only relative free energy predictions were submitted, so absolute statistics are not available. Thus, in order to facilitate the comparison, we decided to report offset statistics for the subset of the SAMPL6 methods analyzed in this section in the same way they were computed in the previous two rounds of the challenge. The results are given in Table 5. The RMSE of the two models was relatively similar in SAMPL4 and SAMPL5: 5.8 ± 1.6 kcal/mol for RRHO-551 and 5.3 (5.2 ± 0.8) kcal/mol for DFT/TPSS-c, where the estimate for RRHO-551 does not include the mean of the statistic bootstrap distribution, which was not reported in the SAMPL4 overview paper. However, the SAMPL6 DFT(TPSS)-D3 calculations attained a lower error (2.6 (2.5 ± 0.4) kcal/mol) while maintaining a similar coefficient of determination of 0.5 (0.5 ± 0.2) against the 0.3 (0.4 ± 0.2) and 0.5 ± 0.4 of DFT/TPSS-c and RRHO-551 respectively.

The SAMPL6 CB8 system presents significant challenges to modern methodologies

A different perspective is offered by the history of the binding free energy predictions involving cucurbituril hosts. CB8 and the closely related CB7 appeared previously in SAMPL3 [49] together with an acyclic cucurbit[n]uril-type molecular container referred to as H1 [117]. Moreover, SAMPL5 featured another acyclic CB analogue called CBClip [50]. In Figure 6-A (bottom row), we show the distribution of RMSE and R² computed from the binding free energy predictions submitted for SAMPL3 and SAMPL5 against these four hosts. The 3D structures of H1 and CBClip are shown in Figure 6-B.

An interesting pattern emerging from the data is that simulation-based free energy methods entering the three SAMPL challenges considered here, which encompass a variety of energy models and both alchemical and PMF methodologies, always obtained equal or slightly greater median RMSE with respect to the global RMSE computed across all methods but also greater median R² than the global median coefficient of determination. In general, however, both statistics appear to have deteriorated from SAMPL3 to SAMPL5. Even though H1 and CBClip are sufficiently different for system-dependent effects to reasonably dominate the overall performance, the most marked difference appears from the comparison of the SAMPL6 predictions to those submitted for CB7 and CB8 in SAMPL3, which achieved a much greater R² and none of which involved simulation-based methods. The explanation for this inequality is likely to be found in the complexity of the guest sets rather than a methodological regression, as SAMPL3 featured only two relatively simple fragment-like binders while the latest round of the challenge included compounds of moderate size and/or complex stereochemistry (e.g., gallamine triethiodate, quinine). That the CB8 guests in SAMPL6 were particularly challenging is corroborated by the comparison between the performance of DDM-AMOEBA and the results obtained by BAR-560, which also uses the double decoupling method and the AMOEBA polarizable force field, on the CB7 guests in SAMPL4 [118]. In this case as well, only offset statistics are available for comparison, as SAMPL4 accepted exclusively relative free energy predictions. DDM-AMOEBA generally performed worse on the CB8 guest set featured in SAMPL6, with R² decreasing from 0.6 ± 0.1 to 0.1 (0.3 ± 0.2) and RMSE increasing from 2.2 ± 0.4 kcal/mol to 3.2 (3.0 ± 0.7) kcal/mol.
While the CB8 guest set featured in SAMPL6 highlights the limits of current free energy methodologies, it also uncovers new learning opportunities that can be exploited to push the boundaries of the domain of applicability of these technologies.

Discussion

As in previous years, the SAMPL host-guest binding challenge has provided an opportunity for the computational chemistry community to focus on a common set of systems to assess the state-of-the-art practices and performance of current binding free energy calculation methodologies. The value of the blind challenge does not lie exclusively in the comparison and benchmarking of different methods, but also in its ability to highlight general areas of weakness in the field as a whole on which the community can focus. The latter aspect, in particular, risks becoming of secondary importance in retrospective studies. Moreover, the consistent use of octa-acid and cucurbiturils since SAMPL3, which took place in 2011, gives us the opportunity to make general observations over a longer time span.

The variability in difficulty highlights the need to evaluate methodologies on the same systems

Several recurring themes have emerged from this and previous rounds of the challenge. Firstly, even for systems as relatively simple as supramolecular host-guest complexes, the performance of free energy methodologies and models can be heavily system-dependent. This is evident not only from the results of the same method applied to different guest sets, but also from the relative performance of the methods against different molecules. For example, most of the predictions employing GAFF obtained among the highest correlation statistics on the OA/TEMOA set while ranking among the lowest positions on the CB8 set. This stresses the importance of using the same set of systems when comparing multiple methodologies, which, without any coordination between groups, is a difficult task to carry out on a medium-large scale given the amount of expertise and resources necessary to perform this type of study.

Force field accuracy is a dominant limiting factor for modeling affinity

A second consideration, which also surfaced in previous SAMPL rounds, is the tendency of classical methods to overestimate the binding affinities. Since the results of the related SAMPLing challenge support the claim that convergence for this class of systems is achievable [104], and considering that the RMSE has not improved significantly across rounds of the challenge, this seems to suggest that an investment of resources into improving the empirical parameters of force fields and solvent models could have a dramatic impact. It should be noted that, while these systems do not test protein parameters, they rely on general force fields that are routinely used in drug and small molecule design.

Other missing chemical details may also be major limiting factors

However, the problem of missing details of the chemical environment, such as salts and alternative protomers, cannot be ruled out as a major determinant of predictive accuracy. Explicitly modeling the buffer salt concentrations in the SOMD-C predictions reduced the RMSE from 7.9 to 5.1 kcal/mol for two otherwise identical simulations, and, curiously, it had the opposite effect of increasing the error statistics on the OA/TEMOA set. Despite the sensitivity of the free energy prediction to the presence of ions, a lack of standard best practices emerges from the challenge entries. Many participants decided to add only neutralizing counterions or use a uniform neutralizing charge, and others did not include information about how the buffer was modeled in the submitted method sections. This possibly reflects the generally minor role that this aspect currently plays in modeling decisions in comparison to other elements (e.g., charges, force field parameters, water model).
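The sensitivity to buffer ions discussed above raises the practical question of how many explicit ions a given salt concentration corresponds to in a simulation box. The following is a minimal sketch, not taken from any participant's protocol, that converts a molar concentration into a number of ion pairs using the box volume alone. The function name `n_ion_pairs` and the example concentration and box size are hypothetical; the actual SAMPL6 buffer compositions are not restated here.

```python
AVOGADRO = 6.02214076e23  # particles per mole


def n_ion_pairs(concentration_molar: float, box_volume_nm3: float) -> int:
    """Number of salt ion pairs approximating a molar concentration in a box.

    Assumption: the concentration is defined with respect to the total box
    volume, which is only one of several conventions in use.
    """
    box_volume_litres = box_volume_nm3 * 1e-24  # 1 nm^3 = 1e-24 L
    return round(concentration_molar * box_volume_litres * AVOGADRO)


# Hypothetical example: a 4 nm cubic box at 25 mM salt.
pairs = n_ion_pairs(0.025, 4.0 ** 3)
print(f"ion pairs to add: {pairs}")
```

For boxes of this size, a millimolar buffer corresponds to only a handful of ions, so the choice between neutralizing counterions only and an explicit buffer changes the ionic environment of the system considerably. Some setups count ions relative to the number of water molecules instead of the box volume, which gives slightly different numbers.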

Even at extreme pH, protonation state effects may still contribute

Moreover, the possible influence on the binding free energy of multiple accessible protonation states of the guest compounds was left unexplored during the challenge, mirroring the widespread tendency in the free energy literature to neglect its effect. Participants largely used the most likely protonation states predicted by Epik that were provided in the input mol2 and sdf files. However, the pK_a free energy penalties estimated by Epik for the second most probable protonation state of the CB8 guests in water at experimental pH (Table 6), which is obtained in all cases by the deprotonation of the charged nitrogen atoms as given in Figure 1, suggest that for several guests, and in particular for CB8-G3 and CB8-G11, the deprotonated state is accessible at a cost of a few k_BT (where k_B is the Boltzmann constant and T is the temperature). A change in relative populations between the end states driven by the hydrophobic binding cavity may therefore have a non-negligible effect on the binding affinity. Furthermore, even if the probability of having the carboxyl group of the octa-acid guests protonated at pH 11.7 is usually neglected, a previous study performed for SAMPL5 showed that modeling changes in protonation state populations upon binding resulted in improved predictive performance for a set of OA and TEMOA guests that, similarly to the latest round of the challenge, included several carboxylic acids and was measured at a similar buffer pH [119].
Similarly to buffer salts, there are no established practices in the community to treat multiple protonation states in free energy calculations. Nevertheless, further development and testing of force fields and solvent models with the goal of improving agreement with experiment should consider these issues, as ignoring them during the fitting procedure could push the error caused by missing essential chemical details (e.g., ions, protonation and tautomeric states) into other force field parameters, with the risk of decreasing the transferability of the model.
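To make the argument about population shifts concrete, the sketch below combines per-state binding free energies into a single effective value, using the standard result that the apparent binding constant at fixed pH is the population-weighted sum of the per-state binding constants. It is an illustration only, not a method used by any challenge participant; the function name and all numerical values are hypothetical.

```python
import math

KB_KCAL = 0.0019872041  # Boltzmann constant in kcal/(mol K)


def effective_binding_free_energy(penalties, dg_bind, temperature=298.0):
    """Effective binding free energy over several guest protonation states.

    penalties: free energy of each unbound state in solution relative to the
               dominant state (kcal/mol); the dominant state has penalty 0.
    dg_bind:   binding free energy of each state taken alone (kcal/mol).
    """
    kT = KB_KCAL * temperature
    # Populations of the unbound states in solution at the given pH.
    weights = [math.exp(-g / kT) for g in penalties]
    total = sum(weights)
    populations = [w / total for w in weights]
    # Apparent binding constant: population-weighted sum of state constants.
    k_eff = sum(p * math.exp(-dg / kT) for p, dg in zip(populations, dg_bind))
    return -kT * math.log(k_eff)


# Hypothetical numbers: the minor state costs 2 kcal/mol in solution
# but binds 3 kcal/mol more strongly than the dominant state.
single = effective_binding_free_energy([0.0], [-10.0])
two_state = effective_binding_free_energy([0.0, 2.0], [-10.0, -13.0])
print(f"dominant state only: {single:.2f} kcal/mol")
print(f"two states:          {two_state:.2f} kcal/mol")
```

With these made-up inputs, the minor state shifts the effective affinity by roughly 1 kcal/mol, which illustrates why a penalty of a few k_BT does not by itself justify ignoring a state that the host cavity may stabilize.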

Table 6. pK_a free energy penalties predicted by Epik for the second most likely protonation state of the CB8 guests.

In all cases, the second most probable protonation state predicted by Epik can be obtained by removing the nitrogen proton of the dominant state. The estimated free energy penalties to access the deprotonated state are reported in kcal/mol and, in parentheses, in units of k_B T, where k_B is the Boltzmann constant and T is the temperature, taken to be 298 K. For all the other compounds, including the octa-acid guests, Epik was not able to find a second protonation state within a tolerance of 3 pH units.
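The two units used in the table are related by a simple conversion at 298 K, sketched below together with the textbook Henderson-Hasselbalch estimate of a deprotonation penalty from a pK_a and a pH. This is not a description of how Epik computes its penalties; the function names and the example pK_a and pH values are hypothetical.

```python
import math

KB_KCAL = 0.0019872041  # Boltzmann constant in kcal/(mol K)


def kcal_to_kT(dg_kcal, temperature=298.0):
    """Convert a free energy from kcal/mol to units of k_B T."""
    return dg_kcal / (KB_KCAL * temperature)


def deprotonation_penalty(pka, ph, temperature=298.0):
    """Free energy (kcal/mol) of the deprotonated state relative to the
    protonated one at a given pH: k_B T * ln(10) * (pKa - pH)."""
    return KB_KCAL * temperature * math.log(10.0) * (pka - ph)


# Hypothetical example: an amine with pKa 9.0 in a buffer at pH 7.0.
penalty = deprotonation_penalty(9.0, 7.0)
print(f"{penalty:.2f} kcal/mol = {kcal_to_kT(penalty):.2f} k_B T")
```

At 298 K, one k_B T is about 0.59 kcal/mol, so each pH unit of separation between pK_a and pH costs roughly 1.36 kcal/mol, or about 2.3 k_B T.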

Linear corrections fit to prior experimental measurements do not improve predictive utility

The experiment-based correction adopted by several groups introduces a new theme in the challenge, which pertains to strategies that can be used to inject previous knowledge into molecular simulations. Force field parameters are in principle capable of incorporating experimental data, but an update of the model driven by binding free energy measurements or other ensemble observables is challenging and may involve calculations as expensive as the production calculations, so it is not routinely viable, although previous studies indicated the validity and feasibility of such an approach [120, 121]. Other schemes, which emerged in particular from the field of crystallographic structural refinement, avoid modifying the force field parameters and instead add one or more biasing terms to the simulation to replicate experimental measurements that the underlying force field cannot reproduce [122, 123]. The simple linear corrections used independently by various participants in this round of the challenge had a positive impact on the error, but a very small effect in terms of correlation, which is often of central importance in the context of molecular design. However, the correction is simple to apply and is confined entirely to the post-processing step, so participants were able to submit multiple entries with and without it.
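The behavior described above, a lower error with little change in correlation, follows from the form of the correction itself. The sketch below fits an affine map on one synthetic data set and applies it to another; it is an illustration under stated assumptions, not a reconstruction of any participant's procedure. All data are randomly generated, and the helper names (`fit_affine`, `synthetic_set`, and so on) are hypothetical.

```python
import math
import random


def mean(values):
    return sum(values) / len(values)


def fit_affine(x, y):
    """Least-squares fit of y ~ slope * x + intercept."""
    mx, my = mean(x), mean(y)
    sxx = sum((a - mx) ** 2 for a in x)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    slope = sxy / sxx
    return slope, my - slope * mx


def rmse(pred, ref):
    return math.sqrt(mean([(p - r) ** 2 for p, r in zip(pred, ref)]))


def pearson_r(x, y):
    mx, my = mean(x), mean(y)
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    return sxy / math.sqrt(sxx * syy)


def synthetic_set(rng, n):
    """Synthetic (NOT experimental) data: predictions overestimate affinity."""
    exp = [rng.uniform(-12.0, -4.0) for _ in range(n)]
    pred = [1.5 * e - 2.0 + rng.gauss(0.0, 0.5) for e in exp]
    return pred, exp


rng = random.Random(0)
pred_prev, exp_prev = synthetic_set(rng, 20)  # stands in for earlier rounds
pred_new, exp_new = synthetic_set(rng, 10)    # stands in for the blind set

# Fit on the earlier data only, then correct the new predictions.
slope, intercept = fit_affine(pred_prev, exp_prev)
corrected = [slope * p + intercept for p in pred_new]

print(f"RMSE raw/corrected: {rmse(pred_new, exp_new):.2f} / "
      f"{rmse(corrected, exp_new):.2f}")
print(f"Pearson r raw/corrected: {pearson_r(pred_new, exp_new):.3f} / "
      f"{pearson_r(corrected, exp_new):.3f}")
```

A single affine map with positive slope leaves the Pearson correlation exactly unchanged, so any change in correlation can only arise when different corrections are applied to different subsets of a pooled data set, for example one per host.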

Outlook for future SAMPL host-guest challenges

The SAMPL roadmap [124] outlines a proposal for subsequent host-guest challenges for SAMPL7–10. While the future of these blind exercises is uncertain given the absence of a sustainable funding source, we briefly review the likely future design of these host-guest challenges below.

In one line of exploration ([124], section 2.2), SAMPL7 proposes to explore variants of Gibb deep cavity cavitands (related to OA/TEMOA) in which carboxylate substituent locations are modified, comparing multiple host variants against a set of guests to explore how well affinities and selectivities could be predicted. SAMPL8 would provide a second iteration of this experiment with novel guests and a trimethylammonium-substituted host variant to assess how algorithmic improvements from the first round could lead to improved performance. SAMPL9–10 would consider the effect of common biologically relevant salts, comparing the effects of NaCl and NaI on various host variants, while SAMPL11 would consider the effects of cosolvents that might compete for the binding site or modulate the strength of the hydrophobic effect.

In another line of exploration ([124], section 2.1), SAMPL7–11 are also proposed to feature cucurbituril variants, including methylated forms of CB8, glycoluril hexamer, and acyclic forms of CB[n]-type receptors. By comparing the constrained cyclic and less constrained acyclic forms of CB[n] hosts, the accuracy with which participants can model the energetics of receptor flexibility and receptor desolvation can be probed. SAMPL8–9 also plan to feature small molecule guests with pKa values between 3.8 and 7.4, which brings the possibility that host binding can induce substantial shifts in protonation state.

Finally, recent work by one of the authors has demonstrated how a library of monosubstituted β-cyclodextrin analogues can be generated via a simple chemical route [125]. This strategy could ultimately lead to the attachment of chemical groups that resemble biopolymer residues, such as amino or nucleic acids, allowing interactions between small druglike molecules and biopolymer-like functional groups to be probed without the multifold challenges that protein-ligand interactions present. While development of this system is still ongoing, it is likely to make an appearance in upcoming SAMPL host-guest challenges.

Code and data availability

Input files and setup scripts: https://github.com/MobleyLab/SAMPL6/tree/master/host_guest/
Analysis scripts: https://github.com/MobleyLab/SAMPL6/tree/master/host_guest/Analysis/Scripts/
Analysis results: https://github.com/MobleyLab/SAMPL6/tree/master/host_guest/Analysis/
Participants’ submissions: https://github.com/MobleyLab/SAMPL6/tree/master/host_guest/Analysis/Submissions

Author Contributions

Conceptualization, AR, JDC, DLM; Methodology, AR, JDC, DLM; Software, AR; Formal Analysis, AR, JDC; Investigation, AR, QY, SM, MS, JNM; Resources, JDC, BCG, LI, MWC, MKG, DLM; Data Curation, AR, MWC; Writing-Original Draft, AR, JDC; Writing - Review and Editing, AR, JDC, DLM, MKG, LI, BCG, SM; Visualization, AR, SM; Supervision, JDC, DLM; Project Administration, AR, JDC, DLM; Funding Acquisition, JDC, DLM, MKG, BCG, LI.

Disclosures

JDC is a member of the Scientific Advisory Board for Schrödinger, LLC. DLM is a member of the Scientific Advisory Board of OpenEye Scientific Software.

Acknowledgments

AR and JDC acknowledge support from the Sloan Kettering Institute. JDC acknowledges support from NIH grant P30 CA008748. AR acknowledges partial support from the Tri-Institutional Program in Computational Biology and Medicine. LI thanks the National Science Foundation for supporting participation in SAMPL6 (CHE-1404911). DLM appreciates financial support from the National Institutes of Health (1R01GM108889-01) and the National Science Foundation (CHE 1352608). AR and JDC are grateful to OpenEye Scientific for providing a free academic software license for use in this work.

Footnotes

¹ Technically, this is an affine correction, but we will refer to it as linear here.

List of abbreviations

AM1-BCC: Austin model 1 bond charge correction [64, 65]
AMOEBA: atomic multipole optimized energetics for biomolecular simulation [87]
B3LYP: Becke 3-parameter Lee-Yang-Parr exchange-correlation functional [96]
B3PW91: Becke 3-parameter Perdew-Wang 91 exchange-correlation functional [96]
CGenFF: CHARMM generalized force field [91]
COSMO-RS: conductor-like screening model for real solvents [72]
DDM: double decoupling method [73]
DFT-D3: density functional theory with the D3 dispersion corrections [98]
FM: Force Matching [74]
FSDAM: Fast switching double annihilation method [75, 76]
GAFF: generalized AMBER force field [89]
HREX: Hamiltonian replica exchange [99]
KECSA: knowledge-based and empirical combined scoring algorithm [92]
KMTISM: KECSA-Movable Type Implicit Solvation Model [77]
MD: molecular dynamics
MMPBSA: molecular mechanics Poisson Boltzmann/solvent accessible surface area [103]
MovTyp: Movable Type method [78]
OPLS3: optimized potential for liquid simulations [71]
PBSA: Poisson-Boltzmann surface area [79]
PM6-DH+: PM6 semiempirical method with dispersion and hydrogen bonding corrections [94, 95]
RESP: restrained electrostatic potential [90]
REST: replica exchange with solute torsional tempering [80, 81]
RFEC: relative free energy calculation
QM/MM: mixed quantum mechanics and molecular mechanics
SOMD: double annihilation or decoupling method performed with Sire/OpenMM6.3 software [82, 83]
SQM: semi-empirical quantum mechanics
TIP3P: transferable interaction potential three-point [86]
TPSS: Tao, Perdew, Staroverov, and Scuseria exchange functional [97]
US: umbrella sampling [84]
VSGB2.1: VSGB2.0 solvation model refit to OPLS2.1/3/3e [85]

References

[1].
Cournia Z, Allen B, Sherman W. Relative Binding Free Energy Calculations in Drug Discovery: Recent Advances and Practical Considerations. J Chem Inf Model. 2017 Dec; 57(12):2911–2937. doi: 10.1021/acs.jcim.7b00564.
[2].
Abel R, Wang L, Mobley DL, Friesner RA. A Critical Review of Validation, Blind Testing, and Real-World Use of Alchemical Protein-Ligand Binding Free Energy Calculations. Curr Top Med Chem. 2017 Aug; 17(23). doi: 10.2174/1568026617666170414142131.
[3].
Abel R, Bhat S. Free Energy Calculation Guided Virtual Screening of Synthetically Feasible Ligand R-Group and Scaffold Modifications: An Emerging Paradigm for Lead Optimization. In: Annual Reports in Medicinal Chemistry, vol. 50 Elsevier; 2017.p. 237–262. doi: 10.1016/bs.armc.2017.08.007.
[4].
Abel R, Wang L, Harder ED, Berne BJ, Friesner RA. Advancing Drug Discovery through Enhanced Free Energy Calculations. Acc Chem Res. 2017 Jul; 50(7):1625–1632. doi: 10.1021/acs.accounts.7b00083.
[5].
Kuhn B, Tichý M, Wang L, Robinson S, Martin RE, Kuglstatter A, Benz J, Giroud M, Schirmeister T, Abel R, Diederich F, Hert J. Prospective Evaluation of Free Energy Calculations for the Prioritization of Cathepsin L Inhibitors. J Med Chem. 2017 Mar; 60(6):2485–2497. doi: 10.1021/acs.jmedchem.6b01881.
[6].
Abel R, Mondal S, Masse C, Greenwood J, Harriman G, Ashwell MA, Bhat S, Wester R, Frye L, Kapeller R, Friesner RA. Accelerating Drug Discovery through Tight Integration of Expert Molecular Design and Predictive Scoring. Curr Opin Struct Biol. 2017 Apr; 43:38–44. doi: 10.1016/j.sbi.2016.10.007.
[7].
Shirts MR, Mobley DL, Brown SP. Free energy calculations in structure-based drug design. Drug design: structure- and ligand-based approaches. 2010; p. 61–86. doi: 10.1016/j.sbi.2016.10.007.
[8].
Wang L, Wu Y, Deng Y, Kim B, Pierce L, Krilov G, Lupyan D, Robinson S, Dahlgren MK, Greenwood J, Romero DL, Masse C, Knight JL, Steinbrecher T, Beuming T, Damm W, Harder E, Sherman W, Brewer M, Wester R, et al. Accurate and Reliable Prediction of Relative Ligand Binding Potency in Prospective Drug Discovery by Way of a Modern Free-Energy Calculation Protocol and Force Field. J Am Chem Soc. 2015 Feb; 137(7):2695–2703. doi: 10.1021/ja512751q.
[9].
Aldeghi M, Heifetz A, Bodkin MJ, Knapp S, Biggin PC. Predictions of Ligand Selectivity from Absolute Binding Free Energy Calculations. J Am Chem Soc. 2017 Jan; 139(2):946–957. doi: 10.1021/jacs.6b11467.
[10].
Mobley DL, Gilson MK. Predicting Binding Free Energies: Frontiers and Benchmarks. 2016 Dec; doi: 10.1101/074625.
[11].
Sultan MM, Denny RA, Unwalla R, Lovering F, Pande VS. Millisecond Dynamics of BTK Reveal Kinome-Wide Conformational Plasticity within the Apo Kinase Domain. Sci Rep. 2017 Dec; 7(1). doi: 10.1038/s41598-017-10697-0.
[12].
Kohlhoff KJ, Shukla D, Lawrenz M, Bowman GR, Konerding DE, Belov D, Altman RB, Pande VS. Cloud-Based Simulations on Google Exacycle Reveal Ligand Modulation of GPCR Activation Pathways. Nat Chem. 2014 Jan; 6(1):15–21. doi: 10.1038/nchem.1821.
[13].
Klepeis JL, Lindorff-Larsen K, Dror RO, Shaw DE. Long-Timescale Molecular Dynamics Simulations of Protein Structure and Function. Curr Opin Struct Biol. 2009 Apr; 19(2):120–127. doi: 10.1016/j.sbi.2009.03.004.
[14].
Jordan IK, Kondrashov FA, Adzhubei IA, Wolf YI, Koonin EV, Kondrashov AS, Sunyaev S. A Universal Trend of Amino Acid Gain and Loss in Protein Evolution. Nature. 2005 Feb; 433(7026):633–638. doi: 10.1038/nature03306.
[15].
Aguilar B, Anandakrishnan R, Ruscio JZ, Onufriev AV. Statistics and Physical Origins of pK and Ionization State Changes upon Protein-Ligand Binding. Biophys J. 2010 Mar; 98(5):872–880. doi: 10.1016/j.bpj.2009.11.016.
[16].
Rekharsky MV, Mori T, Yang C, Ko YH, Selvapalam N, Kim H, Sobransingh D, Kaifer AE, Liu S, Isaacs L, Chen W, Moghaddam S, Gilson MK, Kim K, Inoue Y. A Synthetic Host-Guest System Achieves Avidin-Biotin Affinity by Over-coming Enthalpy–entropy Compensation. PNAS. 2007 Dec; 104(52):20737–20742. doi: 10.1073/pnas.0706407105.
[17].
Moghaddam S, Inoue Y, Gilson MK. Host-Guest Complexes with Protein-Ligand-like Affinities: Computational Analysis and Design. J Am Chem Soc. 2009 Mar; 131(11):4012–4021. doi: 10.1021/ja808175m.
[18].
Moghaddam S, Yang C, Rekharsky M, Ko YH, Kim K, Inoue Y, Gilson MK. New Ultrahigh Affinity Host-Guest Complexes of Cucurbit[7]Uril with Bicyclo[2.2.2]Octane and Adamantane Guests: Thermodynamic Analysis and Evaluation of M2 Affinity Calculations. J Am Chem Soc. 2011 Mar; 133(10):3570–3581. doi: 10.1021/ja109904u.
[19].
Gibb CLD, Gibb BC. Binding of Cyclic Carboxylates to Octa-Acid Deep-Cavity Cavitand. J Comput Aided Mol Des. 2013 Nov; 28(4):319–325. doi: 10.1007/s10822-013-9690-2.
[20].
Cao L, Isaacs L. Absolute and Relative Binding Affinity of Cucurbit[7]Uril towards a Series of Cationic Guests. Supramol Chem. 2014 Mar; 26(3-4):251–258. doi: 10.1080/10610278.2013.852674.
[21].
Sullivan MR, Sokkalingam P, Nguyen T, Donahue JP, Gibb BC. Binding of Carboxylate and Trimethylammonium Salts to Octa-Acid and TEMOA Deep-Cavity Cavitands. J Comput Aided Mol Des. 2017; 31(1):1–8. doi: 10.1007/s10822-016-9925-0.
[22].
Sullivan MR, Sokkalingam P, Nguyen T, Donahue JP, Gibb BC. Binding of Carboxylate and Trimethylammonium Salts to Octa-Acid and TEMOA Deep-Cavity Cavitands. J Comput Aided Mol Des. 2017 Jan; 31(1):21–28. doi: 10.1007/s10822-016-9925-0.
[23].
Muddana HS, Varnado CD, Bielawski CW, Urbach AR, Isaacs L, Geballe MT, Gilson MK. Blind Prediction of Host–guest Binding Affinities: A New SAMPL3 Challenge. J Comput Aided Mol Des. 2012 Feb; 26(5):475–487. doi: 10.1007/s10822-012-9554-1.
[24].
Skillman AG. SAMPL3: Blinded Prediction of Host–guest Binding Affinities, Hydration Free Energies, and Trypsin Inhibitors. J Comput Aided Mol Des. 2012 May; 26(5):473–474. doi: 10.1007/s10822-012-9580-z.
[25].
Muddana HS, Fenley AT, Mobley DL, Gilson MK. The SAMPL4 Host–guest Blind Prediction Challenge: An Overview. J Comput Aided Mol Des. 2014 Mar; 28(4):305–317. doi: 10.1007/s10822-014-9735-1.
[26].
Yin J, Henriksen NM, Slochower DR, Shirts MR, Chiu MW, Mobley DL, Gilson MK. Overview of the SAMPL5 Host–guest Challenge: Are We Doing Better? J Comput Aided Mol Des. 2017; 31(1):1–19. doi: 10.1007/s10822-016-9974-4.
[27].
Mobley DL, Chodera JD, Isaacs L, Gibb BC. Advancing predictive modeling through focused development of model systems to drive new modeling innovations. UC Irvine: Department of Pharmaceutical Sciences, UCI. 2016; https://escholarship.org/uc/item/7cf8c6cr.
[28].
Drug Design Data Resource, SAMPL. https://drugdesigndata.org/about/sampl.
[29].
Nicholls A, Mobley DL, Guthrie JP, Chodera JD, Bayly CI, Cooper MD, Pande VS. Predicting Small-Molecule Solvation Free Energies: An Informal Blind Test for Computational Chemistry. J Med Chem. 2008 Feb; 51(4):769–779. doi: 10.1021/jm070549+.
[30].
Guthrie JP. A Blind Challenge for Computational Solvation Free Energies: Introduction and Overview. J Phys Chem B. 2009 Jan; 113(14):4501–4507. doi: 10.1021/jp806724u.
[31].
Skillman AG, Geballe MT, Nicholls A. SAMPL2 Challenge: Prediction of Solvation Energies and Tautomer Ratios. J Comput Aided Mol Des. 2010 Apr; 24(4):257–258. doi: 10.1007/s10822-010-9358-0.
[32].
Geballe MT, Skillman AG, Nicholls A, Guthrie JP, Taylor PJ. The SAMPL2 Blind Prediction Challenge: Introduction and Overview. J Comput Aided Mol Des. 2010 May; 24(4):259–279. doi: 10.1007/s10822-010-9350-8.
[33].
Geballe MT, Guthrie JP. The SAMPL3 Blind Prediction Challenge: Transfer Energy Overview. J Comput Aided Mol Des. 2012 Apr; 26(5):489–496. doi: 10.1007/s10822-012-9568-8.
[34].
Guthrie JP. SAMPL4, a Blind Challenge for Computational Solvation Free Energies: The Compounds Considered. J Comput Aided Mol Des. 2014 Apr; 28(3):151–168. doi: 10.1007/s10822-014-9738-y.
[35].
Mobley DL, Wymer KL, Lim NM, Guthrie JP. Blind Prediction of Solvation Free Energies from the SAMPL4 Challenge. J Comput Aided Mol Des. 2014 Mar; 28(3):135–150. doi: 10.1007/s10822-014-9718-2.
[36].
Mobley DL, Liu S, Lim NM, Wymer KL, Perryman AL, Forli S, Deng N, Su J, Branson K, Olson A J. Blind Prediction of HIV Integrase Binding from the SAMPL4 Challenge. J Comput Aided Mol Des. 2014 Mar; 28(4):327–345. doi: 10.1007/s10822-014-9723-5.
[37].
Bannan CC, Burley KH, Chiu M, Shirts MR, Gilson MK, Mobley DL. Blind Prediction of Cyclohexane–water Distribution Coefficients from the SAMPL5 Challenge. J Comput Aided Mol Des. 2016 Sep; 30(11):1–18. doi: 10.1007/s10822-016-9954-8.
[38].
Bhakat S, Söderhjelm P. Resolving the Problem of Trapped Water in Binding Cavities: Prediction of Host-Guest Binding Free Energies in the SAMPL5 Challenge by Funnel Metadynamics. J Comput Aided Mol Des. 2017; 31(1):119–132. doi: 10.1007/s10822-016-9948-6.
[39].
Henriksen NM, Fenley AT, Gilson MK. Computational Calorimetry: High-Precision Calculation of Host–Guest Binding Thermodynamics. J Chem Theory Comput. 2015 Sep; 11(9):4377–4394. doi: 10.1021/acs.jctc.5b00405.
[40].
Yin J, Fenley AT, Henriksen NM, Gilson MK. Toward Improved Force-Field Accuracy through Sensitivity Analysis of Host-Guest Binding Thermodynamics. J Phys Chem B. 2015 Aug; 119(32):10145–10155. doi: 10.1021/acs.jpcb.5b04262.
[41].
Gibb CL, Gibb BC. Well-defined, organic nanoenvironments in water: the hydrophobic effect drives a capsular assembly. Journal of the American Chemical Society. 2004; 126(37):11408–11409. doi: 10.1021/ja0475611.
[42].
Hillyer MB, Gibb CL, Sokkalingam P, Jordan JH, Ioup SE, Gibb BC. Synthesis of water-soluble deep-cavity cavitands. Organic letters. 2016; 18(16):4048–4051. doi: 10.1021/acs.orglett.6b01903.
[43].
Mobley DL, Gilson MK. Predicting binding free energies: Frontiers and benchmarks. Annual review of biophysics. 2017; 46:531–558. doi: 10.1146/annurev-biophys-070816-033654.
[44].
Mobley DL, Heinzelmann G, Henriksen NM, Gilson MK. Predicting binding free energies: Frontiers and benchmarks (a perpetual review). UC Irvine: Department of Pharmaceutical Sciences, UCI. 2017; https://escholarship.org/uc/item/9p37m6bq.
[45].
Freeman W, Mock W, Shih N. Cucurbituril. Journal of the American Chemical Society. 1981; 103(24):7367–7368. doi: 10.1021/ja00414a070.
[46].
Mock W, Shih N. Host-guest binding capacity of cucurbituril. The Journal of Organic Chemistry. 1983; 48(20):3618–3619. doi: 10.1021/jo00168a069.
[47].
Liu S, Ruspic C, Mukhopadhyay P, Chakrabarti S, Zavalij PY, Isaacs L. The cucurbit[n]uril family: prime components for self-sorting systems. Journal of the American Chemical Society. 2005; 127(45):15959–15967. doi: 10.1021/ja055013x.
[48].
Gan H, Benjamin CJ, Gibb BC. Nonmonotonic assembly of a deep-cavity cavitand. Journal of the American Chemical Society. 2011; 133(13):4770–4773. doi: 10.1021/ja200633d.
[49].
Muddana HS, Gilson MK. Prediction of SAMPL3 Host–guest Binding Affinities: Evaluating the Accuracy of Generalized Force-Fields. J Comput Aided Mol Des. 2012 Jan; 26(5):517–525. doi: 10.1007/s10822-012-9544-3.
[50].
Zhang B, Isaacs L. Acyclic cucurbit[n]uril-type molecular containers: influence of aromatic walls on their function as solubilizing excipients for insoluble drugs. Journal of medicinal chemistry. 2014; 57(22):9554–9563. doi: 10.1021/jm501276u.
[51].
Mobley DL, Chodera JD, Dill KA. On the use of orientational restraints and symmetry corrections in alchemical free energy calculations. The Journal of chemical physics. 2006; 125(8):084902. doi: 10.1063/1.2221683.
[52].
Ewell J, Gibb BC, Rick SW. Water inside a hydrophobic cavitand molecule. The Journal of Physical Chemistry B. 2008; 112(33):10272–10279. doi: 10.1021/jp804429n.
[53].
Rogers KE, Ortiz-Sánchez JM, Baron R, Fajer M, de Oliveira CAF, McCammon JA. On the role of dewetting transitions in host–guest binding free energy calculations. Journal of chemical theory and computation. 2012; 9(1):46–53. doi: 10.1021/ct300515n.
[54].
Gibb CL, Gibb BC. Anion binding to hydrophobic concavity is central to the salting-in effects of Hofmeister chaotropes. Journal of the American Chemical Society. 2011; 133(19):7344–7347. doi: 10.1021/ja202308n.
[55].
Hsiao YW, Söderhjelm P. Prediction of SAMPL4 host–guest binding affinities using funnel metadynamics. Journal of computer-aided molecular design. 2014; 28(4):443–454. doi: 10.1007/s10822-014-9724-4.
[56].
Muddana HS, Yin J, Sapra NV, Fenley AT, Gilson MK. Blind prediction of SAMPL4 cucurbit[7]uril binding affinities with the mining minima method. Journal of computer-aided molecular design. 2014; 28(4):463–474. doi: 10.1007/s10822-014-9726-2.
[57].
Moghaddam S, Yang C, Rekharsky M, Ko YH, Kim K, Inoue Y, Gilson MK. New ultrahigh affinity host-guest complexes of cucurbit[7]uril with bicyclo[2.2.2]octane and adamantane guests: Thermodynamic analysis and evaluation of M2 affinity calculations. Journal of the American Chemical Society. 2011; 133(10):3570–3581. doi: 10.1021/ja109904u.
[58].
Rekharsky MV, Ko YH, Selvapalam N, Kim K, Inoue Y. Complexation thermodynamics of cucurbit[6]uril with aliphatic alcohols, amines, and diamines. Supramolecular Chemistry. 2007; 19(1-2):39–46. doi: 10.1080/10610270600915292.
[59].
Murkli S, McNeill JN, Isaacs L. Cucurbit[8]uril Guest Complexes: Blinded Dataset for the SAMPL6 Challenge. Supramolecular Chemistry. submitted; XX.
[60].
Boyce SE, Tellinghuisen J, Chodera JD. Avoiding accuracy-limiting pitfalls in the study of protein-ligand interactions with isothermal titration calorimetry. bioRxiv. 2015; p. 023796. doi: 10.1101/023796.
[61].
Hawkins PC, Skillman AG, Warren GL, Ellingson BA, Stahl MT. Conformer generation with OMEGA: algorithm and validation using high quality structures from the Protein Databank and Cambridge Structural Database. Journal of chemical information and modeling. 2010; 50(4):572–584. doi: 10.1021/ci100031x.
[62].
McGann M. FRED pose prediction and virtual screening accuracy. Journal of chemical information and modeling. 2011; 51(3):578–596. doi: 10.1021/ci100436p.
[63].
McGann M. FRED and HYBRID docking performance on standardized datasets. Journal of computer-aided molecular design. 2012; 26(8):897–906. doi: 10.1007/s10822-012-9584-8.
[64].
Jakalian A, Bush BL, Jack DB, Bayly CI. Fast, efficient generation of high-quality atomic charges. AM1-BCC model: I. Method. Journal of computational chemistry. 2000; 21(2):132–146.
[65].
Jakalian A, Jack DB, Bayly CI. Fast, efficient generation of high-quality atomic charges. AM1-BCC model: II. Parameterization and validation. Journal of computational chemistry. 2002; 23(16):1623–1641.
[66].
Shelley JC, Cholleti A, Frye LL, Greenwood JR, Timlin MR, Uchimaya M. Epik: a software program for pKa prediction and protonation state generation for drug-like molecules. Journal of computer-aided molecular design. 2007; 21(12):681–691. doi: 10.1007/s10822-007-9133-z.
[67].
Greenwood JR, Calkins D, Sullivan AP, Shelley JC. Towards the comprehensive, rapid, and accurate prediction of the favorable tautomeric states of drug-like molecules in aqueous solution. Journal of computer-aided molecular design. 2010; 24(6-7):591–604. doi: 10.1007/s10822-010-9349-1.
[68].
Graves AP, Shivakumar DM, Boyce SE, Jacobson MP, Case DA, Shoichet BK. Rescoring docking hit lists for model cavity sites: predictions and experimental testing. Journal of molecular biology. 2008; 377(3):914–934. doi: 10.1016/j.jmb.2008.01.049.
[69].
Jacobson MP, Friesner RA, Xiang Z, Honig B. On the role of the crystal environment in determining protein side-chain conformations. Journal of molecular biology. 2002; 320(3):597–608. doi: 10.1016/S0022-2836(02)00470-9.
[70].
Jacobson MP, Pincus DL, Rapp CS, Day TJ, Honig B, Shaw DE, Friesner RA. A hierarchical approach to all-atom protein loop prediction. Proteins: Structure, Function, and Bioinformatics. 2004; 55(2):351–367. doi: 10.1002/prot.10613.
[71].
Harder E, Damm W, Maple J, Wu C, Reboul M, Xiang JY, Wang L, Lupyan D, Dahlgren MK, Knight JL, et al. OPLS3: a force field providing broad coverage of drug-like small molecules and proteins. Journal of chemical theory and computation. 2015; 12(1):281–296. doi: 10.1021/acs.jctc.5b00864.
[72].
Klamt A. Conductor-like screening model for real solvents: a new approach to the quantitative calculation of solvation phenomena. The Journal of Physical Chemistry. 1995; 99(7):2224–2235. doi: 10.1021/j100007a062.
[73].
Gilson MK, Given JA, Bush BL, McCammon JA. The statistical-thermodynamic basis for computation of binding affinities: a critical review. Biophysical journal. 1997; 72(3):1047–1069. doi: 10.1016/S0006-3495(97)78756-3.
[74].
Ercolessi F, Adams JB. Interatomic potentials from first-principles calculations: the force-matching method. EPL (Europhysics Letters). 1994; 26(8):583. doi: 10.1209/0295-5075/26/8/005.
[75].
Procacci P. I. Dissociation free energies of drug–receptor systems via non-equilibrium alchemical simulations: a theoretical framework. Physical Chemistry Chemical Physics. 2016; 18(22):14991–15004. doi: 10.1039/C5CP05519A.
[76].
Nerattini F, Chelli R, Procacci P. II. Dissociation free energies in drug–receptor systems via nonequilibrium alchemical simulations: application to the FK506-related immunophilin ligands. Physical Chemistry Chemical Physics. 2016; 18(22):15005–15018. doi: 10.1039/C5CP05521K.
[77].
Zheng Z, Wang T, Li P, Merz Jr KM. KECSA-movable type implicit solvation model (KMTISM). Journal of chemical theory and computation. 2015; 11(2):667–682. doi: 10.1021/ct5007828.
[78].
Zheng Z, Ucisik MN, Merz KM. The movable type method applied to protein–ligand binding. Journal of chemical theory and computation. 2013; 9(12):5526–5538. doi: 10.1021/ct4005992.
[79].
Sitkoff D, Sharp KA, Honig B. Accurate calculation of hydration free energies using macroscopic solvent models. The Journal of Physical Chemistry. 1994; 98(7):1978–1988. doi: 10.1021/j100058a043.
[80].
Liu P, Kim B, Friesner RA, Berne B. Replica exchange with solute tempering: A method for sampling biological systems in explicit water. Proceedings of the National Academy of Sciences of the United States of America. 2005; 102(39):13749–13754. doi: 10.1073/pnas.0506346102.
[81].
Marsili S, Signorini GF, Chelli R, Marchi M, Procacci P. ORAC: A molecular dynamics simulation program to explore free energy surfaces in biomolecular systems at the atomistic level. Journal of computational chemistry. 2010; 31(5):1106–1116. doi: 10.1002/jcc.21388.
[82].
Woods CJ, Mey AS, Calabro G, Julien M. Sire molecular simulation framework. https://siremol.org.
[83].
Eastman P, Swails J, Chodera JD, McGibbon RT, Zhao Y, Beauchamp KA, Wang LP, Simmonett AC, Harrigan MP, Stern CD, et al. OpenMM 7: rapid development of high performance algorithms for molecular dynamics. PLoS computational biology. 2017; 13(7):e1005659. doi: 10.1371/journal.pcbi.1005659.
[84].
Torrie GM, Valleau JP. Monte Carlo free energy estimates using non-Boltzmann sampling: Application to the sub-critical Lennard-Jones fluid. Chemical Physics Letters. 1974; 28(4):578–581. doi: 10.1016/0009-2614(74)80109-0.
[85].
Li J, Abel R, Zhu K, Cao Y, Zhao S, Friesner RA. The VSGB 2.0 model: a next generation energy model for high resolution protein structure modeling. Proteins: Structure, Function, and Bioinformatics. 2011; 79(10):2794–2812. doi: 10.1002/prot.23106.
[86].
Jorgensen WL, Chandrasekhar J, Madura JD, Impey RW, Klein ML. Comparison of simple potential functions for simulating liquid water. The Journal of chemical physics. 1983; 79(2):926–935. doi: 10.1063/1.445869.
[87].
Ponder JW, Wu C, Ren P, Pande VS, Chodera JD, Schnieders MJ, Haque I, Mobley DL, Lambrecht DS, DiStasio Jr RA, et al. Current status of the AMOEBA polarizable force field. The journal of physical chemistry B. 2010; 114(8):2549–2564. doi: 10.1021/jp910674d.
[88].
Horn HW, Swope WC, Pitera JW, Madura JD, Dick TJ, Hura GL, Head-Gordon T. Development of an improved four-site water model for biomolecular simulations: TIP4P-Ew. The Journal of chemical physics. 2004; 120(20):9665–9678. doi: 10.1063/1.1683075.
[89].
Wang J, Wolf RM, Caldwell JW, Kollman PA, Case DA. Development and testing of a general amber force field. Journal of computational chemistry. 2004; 25(9):1157–1174. doi: 10.1002/jcc.20035.
[90].
Bayly CI, Cieplak P, Cornell W, Kollman PA. A well-behaved electrostatic potential based method using charge restraints for deriving atomic charges: the RESP model. The Journal of Physical Chemistry. 1993; 97(40):10269–10280. doi: 10.1021/j100142a004.
[91].
Vanommeslaeghe K, Hatcher E, Acharya C, Kundu S, Zhong S, Shim J, Darian E, Guvench O, Lopes P, Vorobyov I, et al. CHARMM general force field: A force field for drug-like molecules compatible with the CHARMM all-atom additive biological force fields. Journal of computational chemistry. 2010; 31(4):671–690. doi: 10.1002/jcc.21367.
[92].
Zheng Z, Merz Jr KM. Development of the knowledge-based and empirical combined scoring algorithm (KECSA) to score protein–ligand interactions. Journal of chemical information and modeling. 2013; 53(5):1073–1083. doi: 10.1021/ci300619x.
[93].
Bansal N, Zheng Z, Song LF, Pei J, Merz Jr KM. The Role of the Active Site Flap in Streptavidin/Biotin Complex Formation. Journal of the American Chemical Society. 2018; 140(16):5434–5446. doi: 10.1021/jacs.8b00743.
[94].
Řezáč J, Fanfrlík J, Salahub D, Hobza P. Semiempirical quantum chemical PM6 method augmented by dispersion and H-bonding correction terms reliably describes various types of noncovalent complexes. Journal of Chemical Theory and Computation. 2009; 5(7):1749–1760. doi: 10.1021/ct9000922.
[95].
Korth M. Third-generation hydrogen-bonding corrections for semiempirical QM methods and force fields. Journal of Chemical Theory and Computation. 2010; 6(12):3808–3816. doi: 10.1021/ct100408b.
[96].
Becke AD. Density-functional thermochemistry. III. The role of exact exchange. The Journal of chemical physics. 1993; 98(7):5648–5652. doi: 10.1063/1.464913.
[97].
Tao J, Perdew JP, Staroverov VN, Scuseria GE. Climbing the density functional ladder: Nonempirical meta–generalized gradient approximation designed for molecules and solids. Physical Review Letters. 2003; 91(14):146401. doi: 10.1103/PhysRevLett.91.146401.
[98].
Grimme S, Antony J, Ehrlich S, Krieg H. A consistent and accurate ab initio parametrization of density functional dispersion correction (DFT-D) for the 94 elements H-Pu. The Journal of chemical physics. 2010; 132(15):154104. doi: 10.1063/1.3382344.
[99].
Sugita Y, Kitao A, Okamoto Y. Multidimensional replica-exchange method for free-energy calculations. The Journal of Chemical Physics. 2000; 113(15):6042–6051. doi: 10.1063/1.1308516.
[100].
Bennett CH. Efficient estimation of free energy differences from Monte Carlo data. Journal of Computational Physics. 1976; 22(2):245–268. doi: 10.1016/0021-9991(76)90078-4.
[101].
Shirts MR, Bair E, Hooker G, Pande VS. Equilibrium free energies from nonequilibrium measurements using maximum-likelihood methods. Physical review letters. 2003; 91(14):140601. doi: 10.1103/PhysRevLett.91.140601.
[102].
Shirts MR, Chodera JD. Statistically optimal analysis of samples from multiple equilibrium states. The Journal of chemical physics. 2008; 129(12):124105. doi: 10.1063/1.2978177.
[103].
Srinivasan J, Cheatham TE, Cieplak P, Kollman PA, Case DA. Continuum solvent studies of the stability of DNA, RNA, and phosphoramidate-DNA helices. Journal of the American Chemical Society. 1998; 120(37):9401–9409. doi: 10.1021/ja981844+.
[104].
Rizzi A, Jensen T, Bosisio S, Slochower DR, Dickson A, Henriksen NM, Gilson MK, Michel J, Mobley DL, Shirts MR, Chodera JD. The SAMPL6 SAMPLing challenge: Assessing the reliability and efficiency of free energy calculations. bioRxiv. In preparation.
[105].
Tironi IG, Sperb R, Smith PE, van Gunsteren WF. A generalized reaction field method for molecular dynamics simulations. The Journal of chemical physics. 1995; 102(13):5451–5459. doi: 10.1063/1.469273.
[106].
Essmann U, Perera L, Berkowitz ML, Darden T, Lee H, Pedersen LG. A smooth particle mesh Ewald method. The Journal of chemical physics. 1995; 103(19):8577–8593. doi: 10.1063/1.470117.
[107].
Bosisio S, Mey ASJS, Michel J. Blinded Predictions of Host-Guest Standard Free Energies of Binding in the SAMPL5 Challenge. J Comput Aided Mol Des. 2017; 31(1):61–70. doi: 10.1007/s10822-016-9933-0.
[108].
Kaminski GA, Friesner RA, Tirado-Rives J, Jorgensen WL. Evaluation and reparametrization of the OPLS-AA force field for proteins via comparison with accurate quantum chemical calculations on peptides. The Journal of Physical Chemistry B. 2001; 105(28):6474–6487. doi: 10.1021/jp003919d.
[109].
Banks JL, Beard HS, Cao Y, Cho AE, Damm W, Farid R, Felts AK, Halgren TA, Mainz DT, Maple JR, et al. Integrated modeling program, applied chemical theory (IMPACT). Journal of computational chemistry. 2005; 26(16):1752–1780. doi: 10.1002/jcc.20292.
[110].
Gallicchio E, Paris K, Levy RM. The AGBNP2 implicit solvation model. Journal of chemical theory and computation. 2009; 5(9):2544–2564. doi: 10.1021/ct900234u.
[111].
Kirkwood JG. Statistical mechanics of fluid mixtures. The Journal of Chemical Physics. 1935; 3(5):300–313. doi: 10.1063/1.1749657.
[112].
Straatsma T, McCammon J. Multiconfiguration thermodynamic integration. The Journal of chemical physics. 1991; 95(2):1175–1188. doi: 10.1063/1.461148.
[113].
Boresch S, Tettinger F, Leitgeb M, Karplus M. Absolute binding free energies: a quantitative approach for their calculation. The Journal of Physical Chemistry B. 2003; 107(35):9535–9551. doi: 10.1021/jp0217839.
[114].
Bansal N, Zheng Z, Cerutti DS, Merz KM. On the fly estimation of host–guest binding free energies using the movable type method: participation in the SAMPL5 blind challenge. Journal of computer-aided molecular design. 2017; 31(1):47–60. doi: 10.1007/s10822-016-9980-6.
[115].
Caldararu O, Olsson MA, Riplinger C, Neese F, Ryde U. Binding Free Energies in the SAMPL5 Octa-Acid Host–guest Challenge Calculated with DFT-D3 and CCSD(T). J Comput Aided Mol Des. 2017; 31(1):87–106. doi: 10.1007/s10822-016-9957-5.
[116].
Mikulskis P, Cioloboc D, Andrejić M, Khare S, Brorsson J, Genheden S, Mata RA, Söderhjelm P, Ryde U. Free-Energy Perturbation and Quantum Mechanical Study of SAMPL4 Octa-Acid Host–guest Binding Energies. J Comput Aided Mol Des. 2014 Apr; 28(4):375–400. doi: 10.1007/s10822-014-9739-x.
[117].
Ma D, Zavalij PY, Isaacs L. Acyclic cucurbit[n]uril congeners are high affinity hosts. The Journal of organic chemistry. 2010; 75(14):4786–4795. doi: 10.1021/jo100760g.
[118].
Bell DR, Qi R, Jing Z, Xiang JY, Mejias C, Schnieders MJ, Ponder JW, Ren P. Calculating binding free energies of host–guest systems using the AMOEBA polarizable force field. Physical Chemistry Chemical Physics. 2016; 18(44):30261–30269. doi: 10.1039/C6CP02509A.
[119].
Tofoleanu F, Lee J, Pickard IV FC, König G, Huang J, Baek M, Seok C, Brooks BR. Absolute binding free energies for octa-acids and guests in SAMPL5. Journal of computer-aided molecular design. 2017; 31(1):107–118. doi: 10.1007/s10822-016-9965-5.
[120].
Yin J, Fenley AT, Henriksen NM, Gilson MK. Toward improved force-field accuracy through sensitivity analysis of host-guest binding thermodynamics. The Journal of Physical Chemistry B. 2015; 119(32):10145–10155. doi: 10.1021/acs.jpcb.5b04262.
[121].
Yin J, Henriksen NM, Muddana HS, Gilson MK. Bind3P: Optimization of a Water Model Based on Host-Guest Binding Data. Journal of chemical theory and computation. 2018; doi: 10.1021/acs.jctc.8b00318.
[122].
Best RB, Vendruscolo M. Determination of protein structures consistent with NMR order parameters. Journal of the American Chemical Society. 2004; 126(26):8090–8091. doi: 10.1021/ja0396955.
[123].
White AD, Voth GA. Efficient and minimal method to bias molecular simulations with experimental data. Journal of chemical theory and computation. 2014; 10(8):3023–3030. doi: 10.1021/ct500320c.
[124].
Mobley DL, Chodera JD, Isaacs L, Gibb BC. Advancing predictive modeling through focused development of model systems to drive new modeling innovations. 2016; Retrieved from https://escholarship.org/uc/item/7cf8c6cr.
[125].
Kellett K, Duggan BM, Gilson MK. Facile Synthesis of a Diverse Library of Mono-3-substituted β-Cyclodextrin Analogues. ChemRxiv.

Posted July 19, 2018.

Subject Area

Biophysics


OpenUrl CrossRef PubMed Web of Science

[89] [89].↵
Wang J, Wolf RM, Caldwell JW, Kollman PA, Case DA. Development and testing of a general amber force field. Journal of computational chemistry. 2004; 25(9):1157–1174. doi: 10.1002/jcc.20035.
OpenUrl CrossRef PubMed Web of Science

[90] [90].↵
Bayly CI, Cieplak P, Cornell W, Kollman PA. A well-behaved electrostatic potential based method using charge restraints for deriving atomic charges: the RESP model. The Journal of Physical Chemistry. 1993; 97(40):10269–10280. doi: 10.1021/j100142a004.
OpenUrl CrossRef PubMed Web of Science

[91] [91].↵
Vanommeslaeghe K, Hatcher E, Acharya C, Kundu S, Zhong S, Shim J, Darian E, Guvench O, Lopes P, Vorobyov I, et al. CHARMM general force field: A force field for drug-like molecules compatible with the CHARMM all-atom additive biological force fields. Journal of computational chemistry. 2010; 31(4):671–690. doi: 10.1002/jcc.21367.
OpenUrl CrossRef PubMed Web of Science

[92] [92].↵
Zheng Z, Merz Jr KM. Development of the knowledge-based and empirical combined scoring algorithm (kecsa) to score protein–ligand interactions. Journal of chemical information and modeling. 2013; 53(5):1073–1083. doi: 10.1021/ci300619x.
OpenUrl CrossRef PubMed

[93] [93].↵
Bansal N, Zheng Z, Song LF, Pei J, Merz Jr KM. The Role of the Active Site Flap in Streptavidin/Biotin Complex Formation. Journal of the American Chemical Society. 2018; 140(16):5434–5446. doi: 10.1021/jacs.8b00743.
OpenUrl CrossRef

[94] [94].↵
ŘezáČ J, Fanfrlík J, Salahub D, Hobza P. Semiempirical quantum chemical PM6 method augmented by dispersion and H-bonding correction terms reliably describes various types of noncovalent complexes. Journal of Chemical Theory and Computation. 2009; 5(7):1749–1760. doi: 10.1021/ct9000922.
OpenUrl CrossRef

[95] [95].↵
Korth M. Third-generation hydrogen-bonding corrections for semiempirical QM methods and force fields. Journal of Chemical Theory and Computation. 2010; 6(12):3808–3816. doi: 10.1021/ct100408b.
OpenUrl CrossRef

[96] [96].↵
Becke AD. Density-functional thermochemistry. III. The role of exact exchange. The Journal of chemical physics. 1993; 98(7):5648–5652. doi: 10.1063/1.464913.
OpenUrl CrossRef Web of Science

[97] [97].↵
Tao J, Perdew JP, Staroverov VN, Scuseria GE. Climbing the density functional ladder: Nonempirical meta–generalized gradient approximation designed for molecules and solids. Physical Review Letters. 2003; 91(14):146401. doi: 10.1103/PhysRevLett.91.146401.
OpenUrl CrossRef PubMed

[98] [98].↵
Grimme S, Antony J, Ehrlich S, Krieg H. A consistent and accurate ab initio parametrization of density functional dispersion correction (DFT-D) for the 94 elements H-Pu. The Journal of chemical physics. 2010; 132(15):154104. doi: 10.1063/1.3382344.
OpenUrl CrossRef PubMed

[99] [99].↵
Sugita Y, Kitao A, Okamoto Y. Multidimensional replica-exchange method for free-energy calculations. The Journal of Chemical Physics. 2000; 113(15):6042–6051. doi: 10.1063/1.1308516.
OpenUrl CrossRef

[100] [100].↵
Bennett CH. Efficient estimation of free energy differences from Monte Carlo data. Journal of Computational Physics. 1976; 22(2):245–268. doi: 10.1016/0021-9991(76)90078-4.
OpenUrl CrossRef Web of Science

[101] [101].↵
Shirts MR, Bair E, Hooker G, Pande VS. Equilibrium free energies from nonequilibrium measurements using maximum-likelihood methods. Physical review letters. 2003; 91(14):140601. doi: 10.1103/PhysRevLett.91.140601.
OpenUrl CrossRef PubMed

[102] [102].↵
Shirts MR, Chodera JD. Statistically optimal analysis of samples from multiple equilibrium states. The Journal of chemical physics. 2008; 129(12):124105. doi: 10.1063/1.2978177.
OpenUrl CrossRef PubMed

[103] [103].↵
Srinivasan J, Cheatham TE, Cieplak P, Kollman PA, Case DA. Continuum solvent studies of the stability of DNA, RNA, and phosphoramidate-DNA helices. Journal of the American Chemical Society. 1998; 120(37):9401–9409. doi: 10.1021/ja981844+.
OpenUrl CrossRef

[104] [104].↵
Rizzi A, Jensen T, Bosisio S, Slochower DR, Dickson A, Henriksen NM, Gilson MK, Michel J, Mobley DL, Shirts MR, Chodera JD. The SAMPL6 SAMPLing challenge: Assessing the reliability and efficiency of free energy calculations. BioRxiv. in preparation;.

[105] [105].↵
Tironi IG, Sperb R, Smith PE, van Gunsteren WF. A generalized reaction field method for molecular dynamics simulations. The Journal of chemical physics. 1995; 102(13):5451–5459. doi: 10.1063/1.469273.
OpenUrl CrossRef Web of Science

[106] [106].↵
Essmann U, Perera L, Berkowitz ML, Darden T, Lee H, Pedersen LG. A smooth particle mesh Ewald method. The Journal of chemical physics. 1995; 103(19):8577–8593. doi: 10.1063/1.470117.
OpenUrl CrossRef Web of Science

[107] [107].↵
Bosisio S, Mey ASJS, Michel J. Blinded Predictions of Host-Guest Standard Free Energies of Binding in the SAMPL5 Challenge. J Comput Aided Mol Des. 2017; 31(1):61–70. doi: 10.1007/s10822-016-9933-0.
OpenUrl CrossRef

[108] [108].↵
Kaminski GA, Friesner RA, Tirado-Rives J, Jorgensen WL. Evaluation and reparametrization of the OPLS-AA force field for proteins via comparison with accurate quantum chemical calculations on peptides. The Journal of Physical Chemistry B. 2001; 105(28):6474–6487. doi: 10.1021/jp003919d.
OpenUrl CrossRef Web of Science

[109] [109].↵
Banks JL, Beard HS, Cao Y, Cho AE, Damm W, Farid R, Felts AK, Halgren TA, Mainz DT, Maple JR, et al. Integrated modeling program, applied chemical theory (IMPACT). Journal of computational chemistry. 2005; 26(16):1752–1780. doi: 10.1002/jcc.20292.
OpenUrl CrossRef PubMed Web of Science

[110] [110].↵
Gallicchio E, Paris K, Levy RM. The AGBNP2 implicit solvation model. Journal of chemical theory and computation. 2009; 5(9):2544–2564. doi: 10.1021/ct900234u.
OpenUrl CrossRef

[111] [111].↵
Kirkwood JG. Statistical mechanics of fluid mixtures. The Journal of Chemical Physics. 1935; 3(5):300–313. doi: 10.1063/1.1749657.
OpenUrl CrossRef

[112] [112].↵
Straatsma T, McCammon J. Multiconfiguration thermodynamic integration. The Journal of chemical physics. 1991; 95(2):1175–1188. doi: 10.1063/1.461148.
OpenUrl CrossRef Web of Science

[113] [113].↵
Boresch S, Tettinger F, Leitgeb M, Karplus M. Absolute binding free energies: a quantitative approach for their calculation. The Journal of Physical Chemistry B. 2003; 107(35):9535–9551. doi: 10.1021/jp0217839.
OpenUrl CrossRef

[114] [114].↵
Bansal N, Zheng Z, Cerutti DS, Merz KM. On the fly estimation of host–guest binding free energies using the movable type method: participation in the SAMPL5 blind challenge. Journal of computer-aided molecular design. 2017; 31(1):47–60. doi: 10.1007/s10822-016-9980-6.
OpenUrl CrossRef

[115] [115].↵
Caldararu O, Olsson MA, Riplinger C, Neese F, Ryde U. Binding Free Energies in the SAMPL5 Octa-Acid Host–guest Challenge Calculated with DFT-D3 and CCSD(T). J Comput Aided Mol Des. 2017; 31(1):87–106. doi: 10.1007/s10822- 016-9957-5.
OpenUrl CrossRef

[116] [116].↵
Mikulskis P, Cioloboc D, Andrejić M, Khare S, Brorsson J, Genheden S, Mata RA, Söderhjelm P, Ryde U. Free-Energy Perturbation and Quantum Mechanical Study of SAMPL4 Octa-Acid Host–guest Binding Energies. J Comput Aided Mol Des. 2014 Apr; 28(4):375–400. doi: 10.1007/s10822-014-9739-x.
OpenUrl CrossRef

[117] [117].↵
Ma D, Zavalij PY, Isaacs L. Acyclic cucurbit[n]uril congeners are high affinity hosts. The Journal of organic chemistry. 2010; 75(14):4786–4795. doi: 10.1021/jo100760g.
OpenUrl CrossRef PubMed

[118] [118].↵
Bell DR, Qi R, Jing Z, Xiang JY, Mejias C, Schnieders MJ, Ponder JW, Ren P. Calculating binding free energies of host–guest systems using the AMOEBA polarizable force field. Physical Chemistry Chemical Physics. 2016; 18(44):30261–30269. doi: 10.1039/C6CP02509A.
OpenUrl CrossRef

[119] [119].↵
Tofoleanu F, Lee J, Pickard IV FC, König G, Huang J, Baek M, Seok C, Brooks BR. Absolute binding free energies for octa-acids and guests in SAMPL5. Journal of computer-aided molecular design. 2017; 31(1):107–118. doi: 10.1007/s10822-016-9965-5.
OpenUrl CrossRef

[120] [120].↵
Yin J, Fenley AT, Henriksen NM, Gilson MK. Toward improved force-field accuracy through sensitivity analysis of host-guest binding thermodynamics. The Journal of Physical Chemistry B. 2015; 119(32):10145–10155. doi: 10.1021/acs.jpcb.5b04262.
OpenUrl CrossRef

[121] [121].↵
Yin J, Henriksen NM, Muddana HS, Gilson MK. Bind3P: Optimization of a Water Model Based on Host-Guest Binding Data. Journal of chemical theory and computation. 2018; doi: 10.1021/acs.jctc.8b00318.
OpenUrl CrossRef

[122] [122].↵
Best RB, Vendruscolo M. Determination of protein structures consistent with NMR order parameters. Journal of the American Chemical Society. 2004; 126(26):8090–8091. doi: 10.1021/ja0396955.
OpenUrl CrossRef PubMed Web of Science

[123] [123].↵
White AD, Voth GA. Efficient and minimal method to bias molecular simulations with experimental data. Journal of chemical theory and computation. 2014; 10(8):3023–3030. doi: 10.1021/ct500320c.
OpenUrl CrossRef

[124] [124].↵
Mobley DL, Chodera JD, Isaacs L, Gibb BC. Advancing predictive modeling through focused development of model systems to drive new modeling innovations. 2016; Retrieved from https://escholarship.org/uc/item/7cf8c6cr.

[125] [125].↵
Kellett K, Duggan BM, Gilson MK. Facile Synthesis of a Diverse Library of Mono-3-substituted β-Cyclodextrin Analogues. chemRxiv.;.
