Abstract
Virtual screening is receiving renewed attention in drug discovery, but progress is hampered by challenges on two fronts: handling the ever-increasing sizes of libraries of drug-like compounds, and separating true positives from false positives. Here we developed a machine learning-enabled pipeline for large-scale virtual screening that promises breakthroughs on both fronts. By clustering compounds according to molecular properties and performing limited docking against a drug target, the full library was trimmed 10-fold; the remaining compounds were then screened individually by docking; and finally a dense neural network was trained to classify the hits into true and false positives. As an illustration, we screened for inhibitors of RPN11, the deubiquitinase subunit of the proteasome and a drug target for breast cancer.
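The trimming and screening stages described above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the two properties (molecular weight and logP), the grid-based grouping that stands in for clustering, the reading of "limited docking" as docking a few representatives per cluster, and the function `mock_dock_score` (a placeholder for a real docking program) are all hypothetical. The final neural-network classification stage is omitted.

```python
import random
from collections import defaultdict
from statistics import mean


def mock_dock_score(compound):
    """Hypothetical stand-in for a docking run; lower scores are better."""
    mw, logp = compound["props"]
    return -(10.0 - abs(mw - 350.0) / 50.0 - abs(logp - 2.5))


def cluster_by_properties(library, mw_bin=50.0, logp_bin=1.0):
    """Group compounds into grid cells of property space (assumed clustering)."""
    clusters = defaultdict(list)
    for compound in library:
        mw, logp = compound["props"]
        key = (int(mw // mw_bin), int(logp // logp_bin))
        clusters[key].append(compound)
    return clusters


def trim_library(library, keep_fraction=0.1, reps_per_cluster=3, seed=0):
    """Keep the best-scoring clusters until about keep_fraction of the library is retained."""
    rng = random.Random(seed)
    clusters = cluster_by_properties(library)
    # Limited docking: score only a few representatives from each cluster.
    ranked = []
    for key, members in clusters.items():
        reps = rng.sample(members, min(reps_per_cluster, len(members)))
        ranked.append((mean(mock_dock_score(c) for c in reps), key))
    ranked.sort()  # best (most negative) mean score first
    target = max(1, int(len(library) * keep_fraction))
    kept = []
    for _, key in ranked:
        if len(kept) >= target:
            break
        kept.extend(clusters[key])
    return kept


def make_library(n=10000, seed=1):
    """Synthetic library with random (molecular weight, logP) pairs."""
    rng = random.Random(seed)
    return [
        {"id": i, "props": (rng.uniform(150, 600), rng.uniform(-2, 6))}
        for i in range(n)
    ]


library = make_library()
kept = trim_library(library)                     # roughly 10-fold reduction
hits = sorted(kept, key=mock_dock_score)[:100]   # individual docking of the remainder
```

Because whole clusters are retained, the trimmed set slightly overshoots the target fraction; in a real pipeline the hits would next be passed to a trained classifier to separate true from false positives.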
Competing Interest Statement
The authors have declared no competing interest.