A Machine Learning-Enabled Pipeline for Large-Scale Virtual Drug Screening

Aayush Gupta; Huan-Xiang Zhou

doi:10.1101/2021.06.20.449177

Abstract

Virtual screening is receiving renewed attention in drug discovery, but progress is hampered by challenges on two fronts: handling the ever-increasing sizes of libraries of drug-like compounds, and separating true positives from false positives. Here we developed a machine learning-enabled pipeline for large-scale virtual screening that promises breakthroughs on both fronts. First, by clustering compounds according to molecular properties and docking a limited number of them against a drug target, the full library was trimmed 10-fold; the remaining compounds were then screened individually by docking; finally, a dense neural network was trained to classify the hits into true and false positives. As an illustration, we screened for inhibitors against RPN11, the deubiquitinase subunit of the proteasome and a drug target for breast cancer.
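The abstract does not say which clustering algorithm, molecular descriptors, or cluster-selection rule the pipeline uses, so the following is only a minimal sketch of the library-trimming idea under stated assumptions, not the authors' implementation. It assumes plain k-means on numeric property vectors, a hypothetical `dock_score` callable standing in for a real docking program (lower score = more favorable, as in Vina-style scoring), and a hypothetical rule that keeps whole clusters, ranked by the mean score of a few docked representatives, until roughly 10% of the library is retained.

```python
import random
from statistics import mean


def kmeans(points, k, iters=20, seed=0):
    """Plain k-means on property vectors; returns one cluster label per point."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)
    labels = [0] * len(points)
    for _ in range(iters):
        # Assign each compound to its nearest centroid (squared Euclidean distance).
        labels = [
            min(range(k), key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centroids[c])))
            for p in points
        ]
        # Move each centroid to the mean of its members; keep it in place if the cluster is empty.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = [mean(col) for col in zip(*members)]
    return labels


def trim_library(props, dock_score, k=20, sample_per_cluster=5, keep_fraction=0.1, seed=0):
    """Hypothetical trimming rule: dock a few members per cluster, keep the best clusters."""
    labels = kmeans(props, k, seed=seed)
    clusters = {}
    for idx, lab in enumerate(labels):
        clusters.setdefault(lab, []).append(idx)
    # Limited docking: score only a small random sample from each cluster.
    rng = random.Random(seed)
    cluster_score = {
        lab: mean(dock_score(i) for i in rng.sample(members, min(sample_per_cluster, len(members))))
        for lab, members in clusters.items()
    }
    # Add whole clusters, best (lowest) mean score first, until the budget is reached;
    # the last cluster added may push the total slightly past the budget.
    budget = int(keep_fraction * len(props))
    kept = []
    for lab in sorted(clusters, key=cluster_score.get):
        if len(kept) >= budget:
            break
        kept.extend(clusters[lab])
    return kept


# Usage on a synthetic library: 1000 compounds described by two made-up properties.
rng = random.Random(1)
props = [[rng.uniform(0, 10), rng.uniform(0, 10)] for _ in range(1000)]


def fake_dock(i):
    # Hypothetical stand-in for a docking program: most favorable near property point (2, 2).
    x, y = props[i]
    return -10 + 0.5 * ((x - 2) ** 2 + (y - 2) ** 2) ** 0.5


kept = trim_library(props, fake_dock)
print(len(kept), "of", len(props), "compounds kept for individual docking")
```

Keeping whole clusters rather than individual compounds is what makes the trimming cheap: only `k * sample_per_cluster` compounds are docked before the library is cut down, and the survivors are then docked one by one.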

Competing Interest Statement

The authors have declared no competing interest.

The copyright holder for this preprint is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under a CC-BY-NC-ND 4.0 International license.