Title: Predictive chemistry: machine learning for reaction deployment, reaction development, and reaction discovery
The field of predictive chemistry relates to the development of models able to describe how molecules interact and react. It encompasses the long-standing task of computer-aided retrosynthesis, but is more far-reaching and ambitious in its goals. In this review, we summarize several areas where predictive chemistry models hold the potential to accelerate the deployment, development, and discovery of organic reactions and advance synthetic chemistry.
Award ID(s):
2144153
NSF-PAR ID:
10396889
Journal Name:
Chemical Science
Volume:
14
Issue:
2
ISSN:
2041-6520
Page Range / eLocation ID:
226 to 244
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. The lack of publicly available, large, and unbiased datasets is a key bottleneck for the application of machine learning (ML) methods in synthetic chemistry. Data from electronic laboratory notebooks (ELNs) could provide less biased, large datasets, but no such datasets have been made publicly available. The first real-world dataset from the ELNs of a large pharmaceutical company is disclosed and its relationship to high-throughput experimentation (HTE) datasets is described. For chemical yield predictions, a key task in chemical synthesis, an attributed graph neural network (AGNN) performs as well as or better than the best previous models on two HTE datasets for the Suzuki–Miyaura and Buchwald–Hartwig reactions. However, training the AGNN on an ELN dataset does not lead to a predictive model. The implications of using ELN data for training ML-based models are discussed in the context of yield predictions. 
  2. Accurate prediction of the sensitivity properties of high-energy materials (HEMs) and the study of their decomposition mechanisms are two major focuses within energetics research. Due to the hazards associated with the synthesis and handling of energetic materials, predictive models for HEM sensitivity are of great importance in enabling the safe and efficient development of future HEMs. Traditional predictive modeling of HEM decomposition via machine learning algorithms generally displays limited interpretability, while mechanistic studies of HEMs typically focus on small subsets of structurally analogous compounds lacking generalizability. This study aims to bridge the gap between predictive modeling and computational mechanistic analysis of HEMs, with the goal of providing chemically interpretable models for HEM sensitivity property prediction. Herein, we disclose the use of multivariate linear regression (MLR) modeling for the prediction of the decomposition temperature and impact sensitivity of HEMs. We report an explosophore-based approach to sensitivity property prediction featuring an ensemble of quantum mechanical parameters and computational workflows that enable rapid parameterization and modeling of energetic functional groups. We then employ these methods to accurately predict sensitivity properties of nitrogen-rich tetrazole and azide HEMs. These statistical MLR models are readily interpreted based on the principles of physical organic chemistry, producing structure-property relationships to guide the rational design of new HEMs. Furthermore, we extend our explosophore-based approach to predict the sensitivity properties of HEMs containing multiple, non-equivalent energetic functional groups through the identification of molecular triggers for the bulk decomposition of HEMs. 
    Finally, we showcase the viability of our methods towards ab initio virtual screening of HEMs through predictive modeling of external test sets of tetrazole HEMs using structures and parameters generated exclusively in silico.
  3. The properties of concretes are controlled by the rate of reaction of their precursors, the chemical composition of the binding phase(s), and their structure at different scales. However, the complex and multiscale structure of the cementitious hydrates and the dissimilar rates of numerous chemical reactions make it challenging to elucidate such linkages. In particular, reliable predictions of strength development in concretes remain unavailable. As an alternative route to physics- or chemistry-based models, machine learning (ML) offers a means to develop powerful predictive models for materials using existing data. Here, it is shown that ML models can be used to accurately predict concrete’s compressive strength at 28 days. This approach relies on the analysis of a large data set (>10,000 observations) of measured compressive strengths for industrially produced concretes, based on knowledge of their mixture proportions. It is demonstrated that these models can readily predict the 28-day compressive strength of any concrete based merely on the knowledge of the mixture proportions with an accuracy of approximately ±4.4 MPa (as captured by the root-mean-square error). By comparing the performance of select ML algorithms, the balance between accuracy, simplicity, and interpretability in ML approaches is discussed.
  4. Sodium-containing batteries have the potential to address many of the challenges faced in the ongoing development of enhanced energy storage devices. Sodium is inexpensive and earth abundant, and aprotic Na−O2 batteries, in particular, have gravimetric energy densities significantly exceeding those of Li-ion devices. However, poor functional cell lifespans present a significant obstacle to the development of Na−O2 cells, with parasitic side reactions involving the NaO2 discharge products, leading to a rapid decline in cell performance. These parasitic reactions are hypothesized to occur through two main pathways: (i) deleterious dissolution of NaO2 into the electrolyte during periods of cell idling and (ii) disproportionation of NaO2 in the near-surface region to form Na-rich species (Na1+xO2) on the cathode. To formulate practical strategies to suppress these processes, in turn, the development of fundamental, molecular-level mechanistic understanding is essential. In this contribution, such mechanistic insights are elucidated by coupling density functional theory calculations with experimental observations to study the surface chemistry of the NaO2 discharge product. First, a series of ab initio surface phase diagrams are constructed to determine the structure of the NaO2 surfaces under realistic operating conditions, whereby an inverse relationship between surface coordination and surface energy is determined. Next, a molecular surface dissolution analysis is performed for the identified surface terminations, demonstrating a further inverse relationship between surface energy and the thermodynamic barrier for dissolution. Finally, a study of the thermodynamics of thin-film formation of sodium oxides over the NaO2 discharge product is carried out and suggests that an electrochemical reduction reaction, rather than an inherent chemical disproportionation, forms the observed Na-rich species in the near-surface region under high discharge overpotentials. 
    From these insights, we suggest future studies that may yield practical design changes to improve stability and extend the lifespan of Na−O2 batteries.
  5. Abstract. Oxidation of biogenic volatile organic compounds (BVOC) by the nitrate radical (NO3) represents one of the important interactions between anthropogenic emissions related to combustion and natural emissions from the biosphere. This interaction has been recognized for more than 3 decades, during which time a large body of research has emerged from laboratory, field, and modeling studies. NO3-BVOC reactions influence air quality, climate and visibility through regional and global budgets for reactive nitrogen (particularly organic nitrates), ozone, and organic aerosol. Despite its long history of research and the significance of this topic in atmospheric chemistry, a number of important uncertainties remain. These include an incomplete understanding of the rates, mechanisms, and organic aerosol yields for NO3-BVOC reactions, lack of constraints on the role of heterogeneous oxidative processes associated with the NO3 radical, the difficulty of characterizing the spatial distributions of BVOC and NO3 within the poorly mixed nocturnal atmosphere, and the challenge of constructing appropriate boundary layer schemes and non-photochemical mechanisms for use in state-of-the-art chemical transport and chemistry–climate models.

    This review is the result of a workshop of the same title held at the Georgia Institute of Technology in June 2015. The first half of the review summarizes the current literature on NO3-BVOC chemistry, with a particular focus on recent advances in instrumentation and models, and in organic nitrate and secondary organic aerosol (SOA) formation chemistry. Building on this current understanding, the second half of the review outlines impacts of NO3-BVOC chemistry on air quality and climate, and suggests critical research needs to better constrain this interaction to improve the predictive capabilities of atmospheric models.

     