Optimization of Molecules via Deep Reinforcement Learning
Abstract

We present a framework, which we call Molecule Deep Q-Networks (MolDQN), for molecule optimization by combining domain knowledge of chemistry and state-of-the-art reinforcement learning techniques (double Q-learning and randomized value functions). We directly define modifications on molecules, thereby ensuring 100% chemical validity. Further, we operate without pre-training on any dataset to avoid possible bias from the choice of that set. MolDQN achieves comparable or better performance against several other recently published algorithms for benchmark molecular optimization tasks. However, we also argue that many of these tasks are not representative of real optimization problems in drug discovery. Inspired by problems faced during medicinal chemistry lead optimization, we extend our model with multi-objective reinforcement learning, which maximizes drug-likeness while maintaining similarity to the original molecule. To illustrate how the model works, we also show the path it takes through chemical space while optimizing a molecule.
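The abstract names double Q-learning and a multi-objective reward but gives no formulas, so the following is only a minimal sketch under stated assumptions. The weighted-sum form of the reward, the `weight` parameter, the use of Tanimoto similarity over fingerprint bit sets, and all function and action names are illustrative assumptions, not details taken from this record. Drug-likeness is passed in as a plain number rather than computed, and the randomized value functions are not shown.

```python
from typing import Dict, Set


def tanimoto(a: Set[int], b: Set[int]) -> float:
    """Tanimoto similarity between two sets of fingerprint bit indices."""
    if not a and not b:
        return 1.0  # two empty fingerprints are treated as identical
    return len(a & b) / len(a | b)


def multi_objective_reward(similarity: float, drug_likeness: float,
                           weight: float) -> float:
    """Scalarize two objectives as a weighted sum (assumed form).

    `weight` in [0, 1] is a hypothetical trade-off parameter:
    1.0 rewards only similarity, 0.0 rewards only drug-likeness.
    """
    return weight * similarity + (1.0 - weight) * drug_likeness


def double_q_target(reward: float,
                    next_q_online: Dict[str, float],
                    next_q_target: Dict[str, float],
                    gamma: float,
                    terminal: bool) -> float:
    """Double Q-learning target for one transition.

    The online network selects the next action and the target network
    evaluates it, which reduces the overestimation bias of plain Q-learning.
    """
    if terminal or not next_q_online:
        return reward
    best_action = max(next_q_online, key=next_q_online.get)
    return reward + gamma * next_q_target[best_action]
```

In this sketch, an agent would score each candidate modification with `multi_objective_reward` and regress its Q-value toward `double_q_target`; the action keys stand in for whatever valid molecular modifications the environment enumerates.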

NSF-PAR ID:
10153643
Journal Name:
Scientific Reports
Volume:
9
Issue:
1
ISSN:
2045-2322
Publisher:
Nature Publishing Group
National Science Foundation
More Like this
1. With the recent explosion in the size of libraries available for screening, virtual screening is positioned to assume a more prominent role in early drug discovery’s search for active chemical matter. In typical virtual screens, however, only about 12% of the top-scoring compounds actually show activity when tested in biochemical assays. We argue that most scoring functions used for this task have been developed with insufficient thoughtfulness into the datasets on which they are trained and tested, leading to overly simplistic models and/or overtraining. These problems are compounded in the literature because studies reporting new scoring methods have not validated their models prospectively within the same study. Here, we report a strategy for building a training dataset (D-COID) that aims to generate highly compelling decoy complexes that are individually matched to available active complexes. Using this dataset, we train a general-purpose classifier for virtual screening (vScreenML) that is built on the XGBoost framework. In retrospective benchmarks, our classifier shows outstanding performance relative to other scoring functions. In a prospective context, nearly all candidate inhibitors from a screen against acetylcholinesterase show detectable activity; beyond this, 10 of 23 compounds have IC50 better than 50 μM. Without any medicinal chemistry optimization, …
2. Abstract

Our career-forward approach to general chemistry laboratory for engineers involves the use of design challenges (DCs), an innovation that employs authentic professional context and practice to transform traditional tasks into developmentally appropriate career experiences. These challenges are scaled-down engineering problems related to the US National Academy of Engineering’s Grand Challenges that engage students in collaborative problem solving via the modeling process. With task features aligned with professional engineering practice, DCs are hypothesized to support student motivation for the task as well as for the profession. As an evaluation of our curriculum design process, we use expectancy–value theory to test our hypotheses by investigating the association between students’ task value beliefs and self-confidence with their user experience, gender and URM status. Using stepwise multiple regression analysis, the results reveal that students find value in completing a DC (F(5,2430) = 534.96, p < .001) and are self-confident (F(8,2427) = 154.86, p < .001) when they feel like an engineer, are satisfied, perceive collaboration, are provided help from a teaching assistant, and the tasks are not too difficult. We highlight that although female and URM students felt less self-confidence in completing a DC, these feelings were moderated by their perceptions of feeling like an engineer and collaboration in the learning process (F(10,2425) = 127.06, p < .001). …

3. Study of the permeability of small organic molecules across lipid membranes plays a significant role in designing potential drugs in the field of drug discovery. Approaches to design promising drug molecules have gone through many stages, from experiment-based trial-and-error approaches, to the well-established avenue of the quantitative structure–activity relationship, and currently to the stage guided by machine learning (ML) and artificial intelligence techniques. In this work, we present a study of the permeability of small drug-like molecules across lipid membranes by two types of ML models, namely the least absolute shrinkage and selection operator (LASSO) and deep neural network (DNN) models. Molecular descriptors and fingerprints are used for featurization of organic molecules. Using molecular descriptors, the LASSO model uncovers that the electro-topological, electrostatic, polarizability, and hydrophobicity/hydrophilicity properties are the most important physical properties to determine the membrane permeability of small drug-like molecules. Additionally, with molecular fingerprints, the LASSO model suggests that certain chemical substructures can significantly affect the permeability of organic molecules, which closely connects to the identified main physical properties. Moreover, the DNN model using molecular fingerprints can help develop a more accurate mapping between molecular structures and their membrane permeability than LASSO models. Our results provide deep understanding …
4. Abstract

Motivation

The crux of molecular property prediction is to generate meaningful representations of the molecules. One promising route is to exploit the molecular graph structure through graph neural networks (GNNs). Both atoms and bonds significantly affect the chemical properties of a molecule, so an expressive model ought to exploit both node (atom) and edge (bond) information simultaneously. Inspired by this observation, we explore multi-view modeling with GNN (MVGNN) to form a novel parallel framework, which considers both atoms and bonds equally important when learning molecular representations. Specifically, one view is atom-central and the other view is bond-central; the two views are then circulated via specifically designed components to enable more accurate predictions. To further enhance the expressive power of MVGNN, we propose a cross-dependent message-passing scheme to enhance information communication between the different views. The overall framework is termed CD-MVGNN.

Results

We theoretically justify the expressiveness of the proposed model in terms of distinguishing non-isomorphic graphs. Extensive experiments demonstrate that CD-MVGNN achieves remarkably superior performance over the state-of-the-art models on various challenging benchmarks. Meanwhile, visualization results of the node importance are consistent with prior knowledge, which confirms the interpretability of CD-MVGNN.

Availability and implementation

The code and data underlying …

Supplementary information

Supplementary data are available at Bioinformatics online.

5. Abstract

The quantum simulation of quantum chemistry is a promising application of quantum computers. However, for $N$ molecular orbitals, the $\mathcal{O}(N^{4})$ gate complexity of performing Hamiltonian and unitary Coupled Cluster Trotter steps makes simulation based on such primitives challenging. We substantially reduce the gate complexity of such primitives through a two-step low-rank factorization of the Hamiltonian and cluster operator, accompanied by truncation of small terms. Using truncations that incur errors below chemical accuracy allows one to perform Trotter steps of the arbitrary basis electronic structure Hamiltonian with $\mathcal{O}(N^{3})$ gate complexity in small simulations, which reduces to $\mathcal{O}(N^{2})$ gate complexity in the asymptotic regime, and unitary Coupled Cluster Trotter steps with $\mathcal{O}(N^{3})$ gate complexity as a function of increasing basis size for a given molecule. In the case of the Hamiltonian Trotter step, these circuits have $\mathcal{O}(N^{2})$ depth on a linearly connected array, an improvement over the $\mathcal{O}(N^{3})$ scaling assuming no truncation. As a practical example, we show that a chemically accurate Hamiltonian Trotter step for a 50 qubit molecular simulation can be carried out in the molecular orbital basis with as few as 4000 layers of parallel nearest-neighbor two-qubit gates, consisting of fewer than $10^{5}$ non-Clifford rotations. We also apply our algorithm to iron–sulfur clusters relevant for elucidating the mode of action of metalloenzymes.