Mostly covering 2018 to 2022. This article describes a personal selection of recent misassigned structures of natural products and their revision with the aid of DU8ML, a machine learning-augmented DFT computational method for fast and accurate calculations of solution NMR chemical shifts and spin–spin coupling constants.
This content will become publicly available on March 31, 2026
Predicting Solid-state NMR Observables via Machine Learning
Machine learning is becoming increasingly important in the prediction of nuclear magnetic resonance (NMR) chemical shifts and other observable properties. This chapter provides an introduction to the construction of machine learning (ML) models for predicting NMR properties, including the discussion of feature engineering, common ML model types, Δ-ML and transfer learning, and the curation of training and testing data. Then it discusses a number of recent examples of ML models for predicting chemical shifts and spin–spin coupling constants in organic and inorganic species. These examples highlight how the decisions made in constructing the ML model impact its performance, discuss strategies for achieving more accurate ML models, and present some representative case studies showing how ML is transforming the way NMR crystallography is performed.
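As a rough illustration of the Δ-ML idea mentioned above, the sketch below fits a model to the difference between a low-level and a high-level calculated shift, then adds the learned correction back to a new low-level value. It is a minimal sketch, not the chapter's method: the one-feature least-squares line stands in for a real ML model, the function names (`fit_line`, `train_delta_model`, `predict`) are hypothetical, and all numbers are synthetic placeholders.

```python
# Delta-ML sketch: learn the correction (high-level minus low-level)
# rather than the chemical shift itself.
# ASSUMPTION: all values are synthetic; a straight-line fit stands in for an ML model.


def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def train_delta_model(low_level, high_level):
    """Fit the correction delta = high - low as a function of the low-level value."""
    deltas = [h - l for l, h in zip(low_level, high_level)]
    return fit_line(low_level, deltas)


def predict(low_level_value, model):
    """Return the low-level value plus the learned correction."""
    slope, intercept = model
    return low_level_value + slope * low_level_value + intercept


# Synthetic training data: cheap shifts (ppm) and made-up "accurate" shifts.
low = [20.0, 55.0, 90.0, 130.0, 170.0]
high = [0.95 * x + 2.0 for x in low]

model = train_delta_model(low, high)
corrected = predict(100.0, model)
print(f"corrected shift: {corrected:.2f} ppm")
```

The design point is that the correction is usually smoother and smaller than the target property, so a modest model trained on few high-level points can often suffice.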
- Award ID(s): 1955554
- PAR ID: 10629166
- Publisher / Repository: Royal Society of Chemistry
- Date Published:
- ISBN: 978-1-83767-066-6
- Page Range / eLocation ID: 224 to 255
- Format(s): Medium: X
- Sponsoring Org: National Science Foundation
More Like this
-
Nuclear magnetic resonance (NMR) is one of the primary techniques used to elucidate the chemical structure, bonding, stereochemistry, and conformation of organic compounds. The distinct chemical shifts in an NMR spectrum depend upon each atom's local chemical environment and are influenced by both through-bond and through-space interactions with other atoms and functional groups. The in silico prediction of NMR chemical shifts using quantum mechanical (QM) calculations is now commonplace in aiding organic structural assignment, since spectra can be computed for several candidate structures and then compared with experimental values to find the best possible match. However, calculating multiple structural- and stereo-isomers, each of which may typically exist as an ensemble of rapidly interconverting conformations, is computationally expensive. Additionally, the QM predictions themselves may lack sufficient accuracy to identify a correct structure. In this work, we address both of these shortcomings by developing a rapid machine learning (ML) protocol to predict 1H and 13C chemical shifts through an efficient graph neural network (GNN) using 3D structures as input. Transfer learning with experimental data is used to improve the final prediction accuracy of a model trained using QM calculations. When tested on the CHESHIRE dataset, the proposed model predicts observed 13C chemical shifts with comparable accuracy to the best-performing DFT functionals (1.5 ppm) in around 1/6000 of the CPU time. An automated prediction webserver and graphical interface are accessible online at http://nova.chem.colostate.edu/cascade/. We further demonstrate the model in three applications: first, we use the model to decide the correct organic structure from candidates through experimental spectra, including complex stereoisomers; second, we automatically detect and revise incorrect chemical shift assignments in a popular NMR database, the NMRShiftDB; and third, we use NMR chemical shifts as descriptors for determination of the sites of electrophilic aromatic substitution.
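The comparison of computed spectra against experiment described in this abstract can be sketched as a simple ranking by mean absolute error (MAE). This is a minimal sketch under stated assumptions, not the authors' protocol: it assumes each predicted shift is already matched atom-by-atom to an experimental shift, the names `mean_absolute_error` and `rank_candidates` are hypothetical, and every number is a synthetic placeholder.

```python
# Rank candidate structures by how closely their predicted shifts match experiment.
# ASSUMPTION: shifts are pre-assigned atom-by-atom; all values are synthetic.


def mean_absolute_error(predicted, experimental):
    """Mean absolute deviation (ppm) between matched shift lists."""
    if len(predicted) != len(experimental):
        raise ValueError("shift lists must have the same length")
    return sum(abs(p - e) for p, e in zip(predicted, experimental)) / len(predicted)


def rank_candidates(candidates, experimental):
    """Return (name, MAE) pairs sorted from best to worst match."""
    scores = {
        name: mean_absolute_error(shifts, experimental)
        for name, shifts in candidates.items()
    }
    return sorted(scores.items(), key=lambda item: item[1])


# Made-up experimental 13C shifts (ppm) and predictions for two candidates.
experimental = [14.1, 22.7, 31.9, 128.4, 170.2]
candidates = {
    "candidate_A": [14.5, 23.0, 32.5, 127.9, 171.0],
    "candidate_B": [16.8, 26.1, 29.0, 133.2, 165.4],
}

ranking = rank_candidates(candidates, experimental)
best_name, best_mae = ranking[0]
for name, mae in ranking:
    print(f"{name}: MAE = {mae:.2f} ppm")
```

In practice the predicted shifts for each candidate would typically be averaged over its conformer ensemble before scoring, and more elaborate statistical measures than MAE are often used.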
-
High-accuracy calculations of atomic and inter-atomic properties, such as nuclear magnetic resonance (NMR) chemical shifts and bond dissociation energies (BDEs), are valuable for pharmaceutical molecule structural analysis, drug exploration, and screening. It is important that these calculations include relativistic effects, which are computationally expensive to treat. Non-relativistic calculations are less expensive but their results are less accurate. In this study, we present a computational framework for predicting atomic and inter-atomic properties by using machine learning in a non-relativistic but accurate and computationally inexpensive framework. The accurate atomic and inter-atomic properties are obtained with a low-dimensional deep neural network (DNN) embedded in a fragment-based graph convolutional neural network (F-GCN). The F-GCN acts as an atomic fingerprint generator that converts the atomistic local environments into data for the DNN, which improves the learning ability, resulting in accurate results as compared to experiments. Using this framework, the 13C/1H NMR chemical shifts of Nevirapine and phenol O–H BDEs are predicted to be in good agreement with experimental measurement.
-
Maximum diversification of data is a central theme in building generalized and accurate machine learning (ML) models. In chemistry, ML has been used to develop models for predicting molecular properties, for example quantum mechanics (QM) calculated potential energy surfaces and atomic charge models. The ANI-1x and ANI-1ccx ML-based general-purpose potentials for organic molecules were developed through active learning, an automated data diversification process. Here, we describe the ANI-1x and ANI-1ccx data sets. To demonstrate data diversity, we visualize it with a dimensionality reduction scheme, and contrast against existing data sets. The ANI-1x data set contains multiple QM properties from 5 M density functional theory calculations, while the ANI-1ccx data set contains 500 k data points obtained with an accurate CCSD(T)/CBS extrapolation. Approximately 14 million CPU core-hours were expended to generate this data. Multiple QM calculated properties for the chemical elements C, H, N, and O are provided: energies, atomic forces, multipole moments, atomic charges, etc. We provide this data to the community to aid research and development of ML models for chemistry.
-
Recent advances in explainable artificial intelligence (XAI) methods show promise for understanding predictions made by machine learning (ML) models. XAI explains how the input features are relevant or important for the model predictions. We train linear regression (LR) and convolutional neural network (CNN) models to make 1-day predictions of sea ice velocity in the Arctic from inputs of present-day wind velocity and previous-day ice velocity and concentration. We apply XAI methods to the CNN and compare explanations to variance explained by LR. We confirm the feasibility of using a novel XAI method [i.e., global layerwise relevance propagation (LRP)] to understand ML model predictions of sea ice motion by comparing it to established techniques. We investigate a suite of linear, perturbation-based, and propagation-based XAI methods in both local and global forms. Outputs from different explainability methods are generally consistent in showing that wind speed is the input feature with the highest contribution to ML predictions of ice motion, and we discuss inconsistencies in the spatial variability of the explanations. Additionally, we show that the CNN relies on both linear and nonlinear relationships between the inputs and uses nonlocal information to make predictions. LRP shows that wind speed over land is highly relevant for predicting ice motion offshore. This provides a framework to show how knowledge of environmental variables (i.e., wind) on land could be useful for predicting other properties (i.e., sea ice velocity) elsewhere. Significance Statement: Explainable artificial intelligence (XAI) is useful for understanding predictions made by machine learning models. Our research establishes trustability in a novel implementation of an explainable AI method known as layerwise relevance propagation for Earth science applications. To do this, we provide a comparative evaluation of a suite of explainable AI methods applied to machine learning models that make 1-day predictions of Arctic sea ice velocity. We use explainable AI outputs to understand how the input features are used by the machine learning to predict ice motion. Additionally, we show that a convolutional neural network uses nonlinear and nonlocal information in making its predictions. We take advantage of the nonlocality to investigate the extent to which knowledge of wind on land is useful for predicting sea ice velocity elsewhere.
