Title: Predicting chemical shifts with graph neural networks
Inferring molecular structure from nuclear magnetic resonance (NMR) measurements requires an accurate forward model that can predict chemical shifts from 3D structure. Current forward models are limited to specific classes of molecules, such as proteins, and state-of-the-art models are not differentiable, so they cannot be used with gradient-based methods like biased molecular dynamics. Here we use graph neural networks (GNNs) for NMR chemical shift prediction. Our GNN models chemical shifts accurately, captures important phenomena such as hydrogen-bonding-induced downfield shifts between multiple proteins and secondary structure effects, and predicts shifts of organic molecules. Previous empirical models of protein NMR have relied on careful feature engineering with domain expertise. These GNNs are trained from data alone, with no feature engineering, yet are as accurate and can work on arbitrary molecular structures. The models are also efficient, able to compute one million chemical shifts in about 5 seconds. This work enables a new category of NMR models that have multiple interacting types of macromolecules.
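The idea of a forward model that maps 3D structure to per-atom shifts can be illustrated with a minimal message-passing sketch. This is not the architecture described in the paper: the function names, the distance cutoff, the feature size, and the random untrained weights are all assumptions made for illustration, so the numbers it returns are not meaningful chemical shifts. It only shows the data flow (neighbor graph from coordinates, message passing, per-atom readout) and why a model built on interatomic distances is unaffected by translating the molecule.

```python
import math
import random

def build_edges(coords, cutoff):
    """Return (i, j, distance) for every ordered atom pair closer than cutoff."""
    edges = []
    for i, a in enumerate(coords):
        for j, b in enumerate(coords):
            if i != j:
                d = math.dist(a, b)
                if d < cutoff:
                    edges.append((i, j, d))
    return edges

def message_pass(features, edges, weight, cutoff):
    """One round: each atom mixes in distance-weighted neighbor features."""
    dim = len(features[0])
    agg = [[0.0] * dim for _ in features]
    for i, j, d in edges:
        scale = 1.0 - d / cutoff  # closer neighbors contribute more
        for k in range(dim):
            agg[i][k] += scale * features[j][k]
    out = []
    for h, m in zip(features, agg):
        mixed = [sum(weight[r][c] * m[c] for c in range(dim)) for r in range(dim)]
        out.append([math.tanh(h[r] + mixed[r]) for r in range(dim)])
    return out

def predict_shifts(elements, coords, cutoff=3.0, dim=4, rounds=2, seed=0):
    """Return one scalar per atom from element types and 3D coordinates."""
    rng = random.Random(seed)
    # Hypothetical untrained parameters; a real model would learn these from data.
    embed = {el: [rng.uniform(-1, 1) for _ in range(dim)]
             for el in sorted(set(elements))}
    weights = [[[rng.uniform(-1, 1) for _ in range(dim)] for _ in range(dim)]
               for _ in range(rounds)]
    readout = [rng.uniform(-1, 1) for _ in range(dim)]

    features = [list(embed[el]) for el in elements]
    edges = build_edges(coords, cutoff)
    for w in weights:
        features = message_pass(features, edges, w, cutoff)
    # Per-atom linear readout: one scalar per atom.
    return [sum(r * x for r, x in zip(readout, h)) for h in features]

# Illustrative water-like geometry (coordinates in angstroms, made up for the demo).
elements = ["O", "H", "H"]
coords = [(0.0, 0.0, 0.0), (0.96, 0.0, 0.0), (-0.24, 0.93, 0.0)]
shifts = predict_shifts(elements, coords)
```

Because only interatomic distances enter the model, shifting every atom by the same vector leaves the output unchanged. Replacing the hand-written loops with a differentiable array library would be one way to obtain the gradients that biased molecular dynamics needs.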
Award ID(s):
1764415 1751471
PAR ID:
10282214
Author(s) / Creator(s):
; ;
Date Published:
Journal Name:
Chemical Science
ISSN:
2041-6520
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. The ability to theoretically predict accurate NMR chemical shifts in solids is increasingly important due to the role such shifts play in selecting among proposed model structures. Herein, two theoretical methods are evaluated for their ability to assign 15N shifts from guanosine dihydrate to one of the two independent molecules present in the lattice. The NMR data consist of 15N shift tensors from 10 resonances. Analyses using periodic boundary or fragment methods consider a benchmark dataset to estimate errors and predict uncertainties of 5.6 and 6.2 ppm, respectively. Despite this high accuracy, only one of the five sites was confidently assigned to a specific molecule of the asymmetric unit. This limitation is not due to negligible differences in experimental data, as most sites exhibit differences of >6.0 ppm between pairs of resonances representing a given position. Instead, the theoretical methods are insufficiently accurate to make assignments at most positions.
  2. Nuclear magnetic resonance (NMR) is one of the primary techniques used to elucidate the chemical structure, bonding, stereochemistry, and conformation of organic compounds. The distinct chemical shifts in an NMR spectrum depend upon each atom's local chemical environment and are influenced by both through-bond and through-space interactions with other atoms and functional groups. The in silico prediction of NMR chemical shifts using quantum mechanical (QM) calculations is now commonplace in aiding organic structural assignment since spectra can be computed for several candidate structures and then compared with experimental values to find the best possible match. However, the computational demands of calculating multiple structural- and stereo-isomers, each of which may typically exist as an ensemble of rapidly-interconverting conformations, are expensive. Additionally, the QM predictions themselves may lack sufficient accuracy to identify a correct structure. In this work, we address both of these shortcomings by developing a rapid machine learning (ML) protocol to predict 1H and 13C chemical shifts through an efficient graph neural network (GNN) using 3D structures as input. Transfer learning with experimental data is used to improve the final prediction accuracy of a model trained using QM calculations. When tested on the CHESHIRE dataset, the proposed model predicts observed 13C chemical shifts with comparable accuracy to the best-performing DFT functionals (1.5 ppm) in around 1/6000 of the CPU time. An automated prediction webserver and graphical interface are accessible online at http://nova.chem.colostate.edu/cascade/.
We further demonstrate the model in three applications: first, we use the model to decide the correct organic structure from candidates through experimental spectra, including complex stereoisomers; second, we automatically detect and revise incorrect chemical shift assignments in a popular NMR database, the NMRShiftDB; and third, we use NMR chemical shifts as descriptors for determination of the sites of electrophilic aromatic substitution. 
  3. Short hydrogen bonds (SHBs), which have donor and acceptor separations below 2.7 Å, occur extensively in small molecules and proteins. Due to their compact structures, SHBs exhibit prominent covalent character with elongated donor-H bonds and highly downfield (>14 ppm) 1H NMR chemical shifts. In this work, we carry out first principles simulations on a set of model molecules to assess how quantum effects determine the symmetry and chemical shift of their SHBs. From simulations that incorporate the quantum mechanical nature of both the electrons and nuclei, we reveal a universal relation between the chemical shift and the position of the proton in an SHB, and unravel the origin of the observed downfield spectral signatures. We further develop a metric that allows one to accurately and efficiently determine the proton position directly from its 1H chemical shift, which will facilitate the experimental examination of SHBs in both small molecules and biological macromolecules.
  4. Machine learning is becoming increasingly important in the prediction of nuclear magnetic resonance (NMR) chemical shifts and other observable properties. This chapter provides an introduction to the construction of machine learning (ML) models for predicting NMR properties, including the discussion of feature engineering, common ML model types, Δ-ML and transfer learning, and the curation of training and testing data. Then it discusses a number of recent examples of ML models for predicting chemical shifts and spin–spin coupling constants in organic and inorganic species. These examples highlight how the decisions made in constructing the ML model impact its performance, discuss strategies for achieving more accurate ML models, and present some representative case studies showing how ML is transforming the way NMR crystallography is performed. 
  5. Study of the permeability of small organic molecules across lipid membranes plays a significant role in designing potential drugs in the field of drug discovery. Approaches to design promising drug molecules have gone through many stages, from experiment-based trial-and-error approaches, to the well-established avenue of the quantitative structure–activity relationship, and currently to the stage guided by machine learning (ML) and artificial intelligence techniques. In this work, we present a study of the permeability of small drug-like molecules across lipid membranes by two types of ML models, namely the least absolute shrinkage and selection operator (LASSO) and deep neural network (DNN) models. Molecular descriptors and fingerprints are used for featurization of organic molecules. Using molecular descriptors, the LASSO model uncovers that the electro-topological, electrostatic, polarizability, and hydrophobicity/hydrophilicity properties are the most important physical properties to determine the membrane permeability of small drug-like molecules. Additionally, with molecular fingerprints, the LASSO model suggests that certain chemical substructures can significantly affect the permeability of organic molecules, which closely connects to the identified main physical properties. Moreover, the DNN model using molecular fingerprints can help develop a more accurate mapping between molecular structures and their membrane permeability than LASSO models. Our results provide a deep understanding of drug–membrane interactions and useful guidance for the inverse molecular design of drug-like molecules. Last but not least, while the current focus is on the permeability of drug-like molecules, the methodology of this work is general and can be applied to other complex physical chemistry problems to gain molecular insights.
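Item 2 above describes computing shifts for several candidate structures and comparing them with experimental values to find the best match. A minimal sketch of that comparison step follows, assuming the predicted and observed shifts are already assigned atom by atom in the same order and using mean absolute error as the score. The function names, candidate labels, and shift values are hypothetical and chosen only for illustration; the cited work may use a different scoring scheme.

```python
def mean_absolute_error(predicted, observed):
    """Average absolute deviation (ppm) between two equally ordered shift lists."""
    if len(predicted) != len(observed):
        raise ValueError("shift lists must have the same length")
    return sum(abs(p - o) for p, o in zip(predicted, observed)) / len(observed)

def rank_candidates(candidates, observed):
    """Return (error, name) pairs sorted from best to worst match."""
    scored = [(mean_absolute_error(shifts, observed), name)
              for name, shifts in candidates.items()]
    return sorted(scored)

# Made-up 13C shifts in ppm, for illustration only.
observed = [128.5, 77.2, 21.0]
candidates = {
    "isomer_A": [129.0, 76.0, 22.5],
    "isomer_B": [135.0, 70.1, 30.2],
}
ranking = rank_candidates(candidates, observed)
best_error, best_name = ranking[0]
```

With these illustrative numbers, `best_name` is `"isomer_A"`. In practice the predicted shifts for each candidate would also need averaging over its conformer ensemble before scoring.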