ABSTRACT
Drug discovery is an extensive and rigorous process that requires up to 2 billion dollars of investment and more than ten years of research and development to bring a molecule “from bench to bedside”. While virtual screening can significantly enhance the drug discovery workflow, it lags behind the current rate of expansion of chemical databases, which already contain billions of purchasable compounds. This surge of available small molecules presents great opportunities for drug discovery but also demands faster virtual screening methods and protocols. To address this challenge, we herein introduce Deep Docking (D2), a novel deep learning-based approach suited for docking billions of molecular structures. The D2 platform uses quantitative structure-activity relationship (QSAR) based deep models, trained on docking scores of subsets of a large chemical library (Big Base), to approximate the docking outcome for as yet unprocessed entries and to iteratively remove unfavorable structures. We applied D2 to virtually screen 1.36 billion molecules from the ZINC15 library against 12 prominent target proteins, and demonstrated up to 100-fold chemical data reduction and 6,000-fold enrichment for top hits, without notable loss of well-docked entities. The D2 protocol can readily be used in conjunction with any docking program and has been made publicly available.
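
The iterative scheme summarized above (dock a subset, train a model on the resulting scores, predict the remaining entries, discard the unfavorable ones, repeat) can be illustrated with a minimal toy sketch. This is not the D2 implementation. It makes several simplifying assumptions: each molecule is reduced to a single numeric descriptor, `toy_dock` is a hypothetical stand-in for a real docking program, and a least-squares line replaces the deep QSAR model. All function names and parameters (`deep_docking_sketch`, `keep_frac`, `sample_size`) are invented for illustration.

```python
import random


def toy_dock(x, rng):
    """Hypothetical stand-in for a docking program (lower score = better)."""
    return -10.0 * x + rng.gauss(0.0, 0.5)


def fit_line(xs, ys):
    """Least-squares fit of y = a*x + b; a placeholder for the deep QSAR model."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    a = sxy / sxx
    return a, my - a * mx


def deep_docking_sketch(library, n_iter=3, sample_size=200, keep_frac=0.5, seed=0):
    """Iteratively shrink `library` using a surrogate trained on docked samples."""
    rng = random.Random(seed)
    pool = list(library)        # molecules neither docked nor discarded yet
    train_x, train_y = [], []   # descriptors and docking scores collected so far
    for _ in range(n_iter):
        if len(pool) <= sample_size:
            break
        # 1. Dock a random sample drawn from the remaining pool.
        rng.shuffle(pool)
        sample, pool = pool[:sample_size], pool[sample_size:]
        round_y = [toy_dock(x, rng) for x in sample]
        train_x += sample
        train_y += round_y
        # 2. Retrain the surrogate on every score collected so far.
        a, b = fit_line(train_x, train_y)
        # 3. Set the cutoff so that about keep_frac of this round's sample passes.
        cutoff = sorted(round_y)[int(keep_frac * len(round_y))]
        # 4. Drop pool molecules whose predicted score is worse than the cutoff.
        pool = [x for x in pool if a * x + b <= cutoff]
    return pool, train_x, train_y


# Usage: a toy "library" of 10,000 random descriptors.
gen = random.Random(1)
library = [gen.random() for _ in range(10_000)]
pool, docked_x, docked_y = deep_docking_sketch(library)
print(f"docked {len(docked_x)} molecules, {len(pool)} of {len(library)} remain")
```

In this sketch only the sampled molecules are ever passed to the (toy) docking function, and the surviving pool is what would then be docked conventionally. Because each cutoff is taken from a sample of the already-filtered pool, the selection becomes stricter from one round to the next.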