Title: Optimization of Molecules via Deep Reinforcement Learning
Abstract

We present a framework, which we call Molecule Deep Q-Networks (MolDQN), for molecule optimization that combines chemistry domain knowledge with state-of-the-art reinforcement learning techniques (double Q-learning and randomized value functions). We define modifications directly on molecules, thereby ensuring 100% chemical validity. Further, we operate without pre-training on any dataset, avoiding possible bias from the choice of that set. MolDQN achieves comparable or better performance than several other recently published algorithms on benchmark molecular optimization tasks. However, we also argue that many of these tasks are not representative of real optimization problems in drug discovery. Inspired by problems faced during medicinal chemistry lead optimization, we extend our model with multi-objective reinforcement learning, which maximizes drug-likeness while maintaining similarity to the original molecule. We further trace the path through chemical space taken to optimize a molecule, shedding light on how the model works.
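To make the action space concrete, below is a minimal sketch in the spirit of MolDQN's atom-addition actions, assuming RDKit is available; the function and parameter names are ours, not the paper's, and the full model also includes bond addition and removal actions.

```python
# Minimal sketch of valid-action enumeration in the spirit of MolDQN,
# assuming RDKit; names are illustrative, not from the paper.
from rdkit import Chem

def atom_addition_actions(smiles, allowed_atoms=("C", "N", "O")):
    """Enumerate molecules reachable by adding one atom via a single bond.

    Only modifications that survive RDKit sanitization are returned,
    so chemical validity holds by construction.
    """
    mol = Chem.MolFromSmiles(smiles)
    actions = set()
    for atom in mol.GetAtoms():
        if atom.GetNumImplicitHs() == 0:  # no free valence on this atom
            continue
        for symbol in allowed_atoms:
            rw = Chem.RWMol(mol)
            new_idx = rw.AddAtom(Chem.Atom(symbol))
            rw.AddBond(atom.GetIdx(), new_idx, Chem.BondType.SINGLE)
            try:
                Chem.SanitizeMol(rw)
                actions.add(Chem.MolToSmiles(rw))
            except Exception:
                pass  # reject chemically invalid modifications
    return sorted(actions)

print(atom_addition_actions("CCO"))  # states reachable from ethanol
```

Because every candidate must pass sanitization before it becomes a state, validity is guaranteed up front rather than by filtering generated SMILES after the fact.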

 
Award ID(s): 1734082
NSF-PAR ID: 10153643
Publisher / Repository: Nature Publishing Group
Journal Name: Scientific Reports
Volume: 9
Issue: 1
ISSN: 2045-2322
Sponsoring Org: National Science Foundation
More Like this
  1. Neurotransmitters are small molecules involved in neuronal signaling and can also serve as stress biomarkers [1]. Their abnormal levels have also been proposed as indicators of several neurological diseases, such as Alzheimer's disease, Parkinson's disease, and Huntington's disease. Hence, measuring their levels is highly important for early diagnosis, therapy, and disease prognosis. In this work, we investigate facile functionalization methods to tune and enhance the sensitivity of printed graphene sensors to neurotransmitters. Sensors based on direct laser scribing and screen-printed graphene ink are studied. These printing methods offer ease of prototyping and scalable fabrication at low cost.

    The effect of functionalizing laser-induced graphene (LIG) by electrodeposition and solution-based deposition of TMDs (molybdenum disulfide [2] and tungsten disulfide) and metal nanoparticles is studied. For the different processing methods, electrochemical characteristics (such as the electrochemically active surface area, ECSA, and the heterogeneous electron transfer rate, k0) are extracted and correlated to surface chemistry and defect density obtained using X-ray photoelectron spectroscopy (XPS) and Raman spectroscopy, respectively. These functionalization methods directly impact the sensitivity and limit of detection (LOD) of the graphene sensors for the studied neurotransmitters. For example, compared to bare LIG, electrodeposition of MoS2 on LIG improves the ECSA by 3 times and k0 by 1.5 times [3]. Electrodeposition of MoS2 also significantly reduces the LOD of serotonin and dopamine in saliva, enabling detection at physiologically relevant concentrations (in the pM-nM range). In addition, chemical treatment of the LIG sensors with acetic acid is carried out. Acetic acid treatment has previously been shown to improve C-C bonds, improving the conductivity of LIG sensors [4]. In our work, acetic acid treatment leads to a larger improvement in the LOD of norepinephrine than MoS2 electrodeposition does.

    In addition, we investigate the effect of plasma treatment to tune the sensor response by modifying the defect density and surface chemistry. For example, we find that oxygen plasma treatment of screen-printed graphene ink improves the LOD of norepinephrine by up to three orders of magnitude, which may be attributed to the increased defects and oxygen functional groups on the surface, as evident from XPS measurements. Defects are known to play a key role in enhancing the sensitivity of 2D materials to surface interactions and have been explored for tuning and enhancing sensor sensitivity [5]. Building on our previous work [3], we apply a custom machine learning-based data processing method to further improve the sensitivity and LOD, and to automatically benchmark different molecule-material pairs.
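For context, LODs of this kind are conventionally estimated from a linear calibration curve as LOD = 3.3·σ/slope (the ICH convention); the sketch below shows that standard computation on made-up numbers, and is not the authors' custom ML pipeline.

```python
# Standard limit-of-detection estimate from a linear calibration curve,
# LOD = 3.3 * sigma / slope (ICH convention); illustrative values only,
# not data from the paper.
import numpy as np

conc = np.array([0.1, 0.5, 1.0, 5.0, 10.0])       # concentration (nM)
signal = np.array([0.21, 0.98, 2.05, 9.8, 20.1])  # sensor response (a.u.)

slope, intercept = np.polyfit(conc, signal, 1)
residuals = signal - (slope * conc + intercept)
sigma = residuals.std(ddof=2)                     # regression std. error
lod = 3.3 * sigma / slope
print(f"slope = {slope:.3f}, LOD = {lod:.3f} nM")
```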

    Future work includes expanding the plasma chemistries and conditions, studying the effect of the precursor mixture in laser-induced solution-based functionalization, and understanding the interplay between the molecules and the material system. Work is also underway to improve the machine learning model by using nonlinear models such as neural networks to improve sensor sensitivity, selectivity, and robustness.

    References

    1. A. J. Steckl, P. Ray, ACS Sens. (2018), doi:10.1021/acssensors.8b00726.

    2. Y. Lei, D. Butler, M. C. Lucking, F. Zhang, T. Xia, K. Fujisawa, T. Granzier-Nakajima, R. Cruz-Silva, M. Endo, H. Terrones, M. Terrones, A. Ebrahimi, Sci. Adv. 6, 4250–4257 (2020).

    3. V. Kammarchedu, D. Butler, A. Ebrahimi, Anal. Chim. Acta 1232, 340447 (2022).

    4. H. Yoon, J. Nah, H. Kim, S. Ko, M. Sharifuzzaman, S. C. Barman, X. Xuan, J. Kim, J. Y. Park, Sens. Actuators B Chem. 311, 127866 (2020).

    5. T. Wu, A. Alharbi, R. Kiani, D. Shahrjerdi, Adv. Mater. 31, 1–12 (2019).

     
  2. In multi-agent reinforcement learning (MARL), it is challenging for a collection of agents to learn complex temporally extended tasks. The difficulties lie in computational complexity and in learning the high-level structure behind reward functions. We study the graph-based Markov decision process (MDP), where the dynamics of neighboring agents are coupled. To learn complex temporally extended tasks, we use a reward machine (RM) to encode each agent's task and expose the internal structure of the reward function. RMs can describe high-level knowledge and encode non-Markovian reward functions. To tackle the computational complexity, we propose a decentralized learning algorithm, decentralized graph-based reinforcement learning using reward machines (DGRM), which equips each agent with a localized policy, allowing agents to make decisions independently based on the information available to them. DGRM uses an actor-critic structure, and we introduce a tabular Q-function for discrete-state problems. We show that the dependency of the Q-function on other agents decreases exponentially as the distance between them increases. To further improve efficiency, we also propose the deep DGRM algorithm, which uses deep neural networks to approximate the Q-function and policy function for large-scale or continuous-state problems. The effectiveness of the proposed DGRM algorithm is evaluated in three case studies: two wireless communication studies with independent and dependent reward functions, respectively, and COVID-19 pandemic mitigation. Experimental results show that local information is sufficient for DGRM and that agents can accomplish complex tasks with the help of RMs. In the COVID-19 mitigation case, DGRM improves the global accumulated reward by 119% over the baseline.
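A reward machine is a finite-state transducer from labeled events to rewards; the toy sketch below (our own illustration, not the DGRM code) shows how an RM encodes a non-Markovian task such as "reward only after seeing event a and then event b."

```python
# A minimal reward machine (RM) sketch: a finite-state transducer that
# maps (RM state, labeled event) -> (next RM state, reward). The toy
# task and names are ours, for illustration only.
class RewardMachine:
    def __init__(self, transitions, initial_state):
        # transitions: {(state, event): (next_state, reward)}
        self.transitions = transitions
        self.state = initial_state

    def step(self, event):
        """Advance on an event label; unknown events self-loop with 0 reward."""
        next_state, reward = self.transitions.get(
            (self.state, event), (self.state, 0.0))
        self.state = next_state
        return reward

# Non-Markovian toy task: reward 1 only after seeing "a" and then "b".
rm = RewardMachine({("u0", "a"): ("u1", 0.0),
                    ("u1", "b"): ("u2", 1.0)}, initial_state="u0")
print([rm.step(e) for e in ["b", "a", "b"]])  # [0.0, 0.0, 1.0]
```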
  3. In precision medicine, the ultimate goal is to recommend the most effective treatment for an individual patient based on patient-specific molecular and clinical profiles, which may be high-dimensional. To advance cancer treatment, large-scale screenings of cancer cell lines against chemical compounds have been performed to better understand the relationship between genomic features and drug response; existing machine learning approaches use exclusively supervised learning, including penalized regression and recommender systems. However, it would be more efficient to apply reinforcement learning to learn sequentially as data accrue: selecting the most promising therapy for a patient given individual molecular and clinical features, then collecting and learning from the resulting data. In this article, we propose a novel personalized ranking system called Proximal Policy Optimization Ranking (PPORank), which ranks drugs by their predicted effects per cell line (or patient) in a deep reinforcement learning (DRL) framework. Modeled as a Markov decision process, the proposed method learns to recommend the most suitable drugs sequentially and continuously over time. As a proof of concept, we conduct experiments on two large-scale cancer cell line data sets in addition to simulated data. The results demonstrate that the proposed DRL-based PPORank outperforms state-of-the-art competitors based on supervised learning. Taken together, we conclude that novel methods in the DRL framework have great potential for precision medicine and should be studied further.
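PPORank builds on Proximal Policy Optimization, whose clipped surrogate objective (Schulman et al., 2017) is the core update; the NumPy sketch below shows that generic loss, not the authors' ranking-specific implementation.

```python
# The clipped surrogate objective at the heart of PPO, which PPORank
# builds on; a minimal NumPy sketch, not the authors' code.
import numpy as np

def ppo_clip_loss(ratio, advantage, eps=0.2):
    """ratio = pi_new(a|s) / pi_old(a|s); maximize the clipped surrogate."""
    unclipped = ratio * advantage
    clipped = np.clip(ratio, 1.0 - eps, 1.0 + eps) * advantage
    return -np.mean(np.minimum(unclipped, clipped))  # negate to minimize

# An advantageous action whose probability grew too fast gets clipped.
print(ppo_clip_loss(np.array([1.5]), np.array([2.0])))  # -2.4, not -3.0
```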

     
  4. Tasks across diverse application domains can be posed as large-scale optimization problems, including graphics, vision, machine learning, imaging, health, scheduling, planning, and energy system forecasting. Independently of the application domain, proximal algorithms have emerged as a formal optimization method that successfully solves a wide array of existing problems, often exploiting problem-specific structure in the optimization. Although model-based formal optimization provides a principled approach to problem modeling with convergence guarantees, at first glance this seems to be at odds with black-box deep learning methods. A recent line of work shows that, when combined with learning-based ingredients, model-based optimization methods are effective, interpretable, and generalize to a wide spectrum of applications with little or no extra training data. However, hand-crafting such hybrid approaches for different tasks requires domain expertise in both proximal optimization and deep learning, which is error-prone and time-consuming. Moreover, naively unrolling these iterative methods produces lengthy compute graphs which, when differentiated via autograd techniques, explode memory consumption, making batch-based training challenging. In this work, we introduce ∇-Prox, a domain-specific modeling language and compiler for large-scale optimization problems using differentiable proximal algorithms. ∇-Prox allows users to specify optimization objective functions of unknowns concisely at a high level, and intelligently compiles the problem into compute- and memory-efficient differentiable solvers. One of the core features of ∇-Prox is its full differentiability, which supports hybrid model- and learning-based solvers that integrate proximal optimization with neural network pipelines. Example applications include learning-based priors and sample-dependent inner-loop optimization schedulers, learned with deep equilibrium learning or deep reinforcement learning. With a few lines of code, we show that ∇-Prox can generate performant solvers for a range of image optimization problems, including end-to-end computational optics, image deraining, and compressive magnetic resonance imaging. We also demonstrate that ∇-Prox can be used in a completely orthogonal application domain, energy system planning, an essential task amid the energy crisis and the clean energy transition, where it outperforms the state-of-the-art CVXPY and commercial Gurobi solvers.
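The primitive such a compiler manipulates is the proximal operator. The classic example is soft-thresholding, the prox of the l1 norm, shown below driving proximal-gradient (ISTA) steps for a lasso problem; this is a generic NumPy sketch of the underlying math, not the ∇-Prox API.

```python
# Proximal-gradient (ISTA) for min_x 0.5*||Ax-b||^2 + lam*||x||_1,
# built from the l1 proximal operator; a generic sketch, not ∇-Prox.
import numpy as np

def prox_l1(v, t):
    """prox_{t*||.||_1}(v): elementwise soft-thresholding."""
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def ista_step(x, A, b, lam, step):
    """One proximal-gradient step: gradient on the smooth term, then prox."""
    grad = A.T @ (A @ x - b)
    return prox_l1(x - step * grad, step * lam)

rng = np.random.default_rng(0)
A, b = rng.standard_normal((20, 5)), rng.standard_normal(20)
x = np.zeros(5)
for _ in range(100):
    x = ista_step(x, A, b, lam=0.5, step=1e-2)
print(x)  # sparse estimate
```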
  5. With the recent explosion in the size of libraries available for screening, virtual screening is positioned to assume a more prominent role in early drug discovery's search for active chemical matter. In typical virtual screens, however, only about 12% of the top-scoring compounds actually show activity when tested in biochemical assays. We argue that most scoring functions used for this task have been developed with insufficient attention to the datasets on which they are trained and tested, leading to overly simplistic models and/or overtraining. These problems are compounded in the literature because studies reporting new scoring methods have not validated their models prospectively within the same study. Here, we report a strategy for building a training dataset (D-COID) that aims to generate highly compelling decoy complexes individually matched to available active complexes. Using this dataset, we train a general-purpose classifier for virtual screening (vScreenML) built on the XGBoost framework. In retrospective benchmarks, our classifier shows outstanding performance relative to other scoring functions. In a prospective context, nearly all candidate inhibitors from a screen against acetylcholinesterase show detectable activity; beyond this, 10 of 23 compounds have IC50 better than 50 μM. Without any medicinal chemistry optimization, the most potent hit has an IC50 of 280 nM, corresponding to a Ki of 173 nM. These results support using the D-COID strategy for training classifiers in other computational biology tasks, and vScreenML in virtual screening campaigns against other protein targets. Both D-COID and vScreenML are freely distributed to facilitate such efforts.
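Because vScreenML is built on XGBoost, the overall shape of training such a classifier is easy to show; the sketch below (assuming the xgboost Python package) uses random stand-in features rather than the actual D-COID descriptors, so it illustrates only the classifier setup and ranking by predicted activity probability.

```python
# Training a gradient-boosted classifier to separate active complexes
# from matched decoys, in the spirit of vScreenML; the feature matrix
# here is random stand-in data, not the D-COID features.
import numpy as np
from xgboost import XGBClassifier

rng = np.random.default_rng(0)
X = rng.standard_normal((1000, 32))   # stand-in complex features
y = rng.integers(0, 2, 1000)          # 1 = active, 0 = decoy

clf = XGBClassifier(n_estimators=200, max_depth=4, learning_rate=0.1,
                    eval_metric="logloss")
clf.fit(X, y)
scores = clf.predict_proba(X)[:, 1]   # rank compounds by this score
print(scores[:5])
```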