Title: Prediction of anisotropic NMR data without knowledge of alignment medium structure by surface decomposition
Prediction of anisotropic NMR data directly from solute-medium interaction is of significant theoretical and practical interest, particularly for structure elucidation, configurational analysis and conformational studies of complex organic molecules and natural products. Current prediction methods require an explicit structural model of the alignment medium, a requirement that is either impossible or impractical on the scale necessary for small organic molecules. Here we formulate a comprehensive mathematical framework for a parametrization protocol that deconvolutes an arbitrary surface of the medium into several simple local landscapes that are distributed over the medium's surface by specific orientational order parameters. The shapes and order parameters of these local landscapes are determined via fitting that maximizes the congruence between experimentally determined anisotropic NMR measurables and their predicted counterparts, thus avoiding the need for a priori knowledge of the global medium morphology. This method achieves substantial improvements in the accuracy of predicted anisotropic NMR values compared to current methods, as demonstrated herein with sixteen natural products. Furthermore, because this formalism extracts structural commonalities of the medium by combining anisotropic NMR data from different compounds, its robustness and accuracy are expected to improve as more experimental data become available for further re-optimization of fitting parameters.
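The abstract does not say how the congruence between experimental and predicted values is scored. As a minimal sketch, assuming a Cornilescu-style quality factor (a common choice for residual dipolar couplings, not necessarily the one used in the paper), the agreement could be quantified as follows. The function name `q_factor` and the sample values are hypothetical.

```python
import math


def q_factor(experimental, predicted):
    """Return rms(experimental - predicted) / rms(experimental).

    Lower values mean better agreement; 0.0 means a perfect match.
    This is an assumed scoring function, not taken from the paper.
    """
    if len(experimental) != len(predicted) or not experimental:
        raise ValueError("need two non-empty sequences of equal length")
    # Sum of squared residuals between measured and predicted values.
    residual = sum((e - p) ** 2 for e, p in zip(experimental, predicted))
    # Sum of squared measured values, used for normalization.
    norm = sum(e ** 2 for e in experimental)
    if norm == 0:
        raise ValueError("experimental values must not all be zero")
    return math.sqrt(residual / norm)


# Hypothetical couplings in Hz, for illustration only.
measured = [3.0, -4.0]
print(q_factor(measured, [3.0, -4.0]))
print(q_factor(measured, [0.0, 0.0]))
```

A fitting procedure of the kind described above would adjust its parameters to drive such a score down across all compounds in the data set.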
Award ID(s):
2116395
NSF-PAR ID:
10416701
Journal Name:
Physical Chemistry Chemical Physics
Volume:
24
Issue:
34
ISSN:
1463-9076
Page Range / eLocation ID:
20164 to 20182
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. Abstract

    Accurate estimation of solvation free energy (SFE) lays the foundation for accurate prediction of binding free energy. The Poisson‐Boltzmann (PB) and generalized Born (GB) continuum solvation methods combined with surface area (SA) terms (PBSA and GBSA) have been widely used in SFE calculations because they achieve a good balance between accuracy and efficiency. However, the accuracy of these methods can be affected by several factors, such as the charge model, the polar and nonpolar SFE calculation methods, and the atom radii used in the calculation. In this work, the performance of the ABCG2 (AM1‐BCC‐GAFF2) charge model, as well as two other charge models, RESP (Restrained Electrostatic Potential) and AM1‐BCC (Austin Model 1‐bond charge corrections), was evaluated on the SFE prediction of 544 small molecules in water by PBSA/GBSA. To improve the performance of the PBSA prediction based on the ABCG2 charge, we further explored the influence of atom radii on the prediction accuracy and derived a set of atom radius parameters for more accurate SFE prediction using PBSA based on ABCG2/GAFF2 by reproducing thermodynamic integration (TI) calculation results. The PB radius parameters of carbon, oxygen, sulfur, phosphorus, chlorine, bromine and iodine were adjusted. New atom types (on, oi, hn1, hn2, hn3) were introduced to further improve the fitting performance. Then, we tuned the parameters in the nonpolar SFE model using the experimental SFE data and the PB calculation results. By adopting the new radius parameters and the new nonpolar SFE model, the root mean square error (RMSE) of the SFE calculation for the 544 molecules decreased from 2.38 to 1.05 kcal/mol. Finally, the new radius parameters were applied to the prediction of protein‐ligand binding free energies using the MM‐PBSA method. For the eight systems tested, we observed higher correlation between the experimental data and the calculation results and smaller prediction errors for the absolute binding free energies, demonstrating that our new radius parameters can improve free energy calculations using the MM‐PBSA method.

     
  2. Proteins and nucleic acids participate in essentially every biochemical process in living organisms, and the elucidation of their structure and motions is essential for understanding how these molecular machines perform their function. Nuclear Magnetic Resonance (NMR) spectroscopy is a powerful, versatile technique that provides critical information on molecular structure and dynamics. Spin-relaxation data are used to determine the overall rotational diffusion and local motions of biological macromolecules, while residual dipolar couplings (RDCs) reveal the local and long-range structural architecture of these molecules and their complexes. This information allows researchers to refine structures of proteins and nucleic acids and provides restraints for molecular docking. Several software packages have been developed by NMR researchers to tackle the complicated experimental data analysis and structure modeling. However, many of them are offline packages or command-line applications that require users to set up the run-time environment and to possess certain programming skills, which inevitably limits the accessibility of this software to the broad scientific community. Here we present new science gateways designed for the NMR/structural biology community that address these limitations in NMR data analysis. Using the GenApp technology for scientific gateways (https://genapp.rocks), we successfully transformed ROTDIF and ALTENS, two offline packages for bio-NMR data analysis, into science gateways that provide advanced computational functionalities, cloud-based data management, and interactive 2D and 3D plotting and visualizations. Furthermore, these gateways are integrated with molecular structure visualization tools (Jmol) and with gateways/engines (SASSIE-web) capable of generating huge computer-simulated structural ensembles of proteins and nucleic acids. This enables researchers to seamlessly incorporate conformational ensembles into the analysis in order to adequately take into account the structural heterogeneity and dynamic nature of biological macromolecules. ROTDIF-web offers a versatile set of integrated modules/tools for determining and predicting molecular rotational diffusion tensors, for model-free characterization of bond dynamics in biomacromolecules, and for docking of molecular complexes driven by the information extracted from NMR relaxation data. ALTENS allows characterization of the molecular alignment under anisotropic conditions, which enables researchers to obtain accurate local and long-range bond-vector restraints for refining 3-D structures of macromolecules and their complexes. We will describe our experience bringing our programs into GenApp and illustrate the use of these gateways with specific examples of protein systems of high biological significance. We expect these gateways to be useful to structural biologists and biophysicists as well as the NMR community, and to stimulate other researchers to share their scientific software in a similar way.
  3. Abstract

    A simultaneously accurate and computationally efficient parametrization of the potential energy surface of molecules and materials is a long-standing goal in the natural sciences. While atom-centered message passing neural networks (MPNNs) have shown remarkable accuracy, their information propagation has limited the accessible length-scales. Local methods, conversely, scale to large simulations but have suffered from inferior accuracy. This work introduces Allegro, a strictly local equivariant deep neural network interatomic potential architecture that simultaneously exhibits excellent accuracy and scalability. Allegro represents a many-body potential using iterated tensor products of learned equivariant representations without atom-centered message passing. Allegro obtains improvements over state-of-the-art methods on QM9 and revMD17. A single tensor product layer outperforms existing deep MPNNs and transformers on QM9. Furthermore, Allegro displays remarkable generalization to out-of-distribution data. Molecular simulations using Allegro recover structural and kinetic properties of an amorphous electrolyte in excellent agreement with ab-initio simulations. Finally, we demonstrate parallelization with a simulation of 100 million atoms.

     
  4. Boreal lakes are the most abundant lakes on Earth. Changes in acid rain deposition, climate, and catchment land use have increased lateral fluxes of terrestrial dissolved organic matter (DOM), resulting in a widespread browning of boreal freshwaters. This browning affects the aquatic communities and ecosystem processes, and boosts emissions of the greenhouse gases (GHGs) CH4, CO2, and N2O. In this study, we predicted biotic saturation of GHGs in boreal lakes by using a set of chemical, hydrological, climate, and land use parameters. For this purpose, concentrations of GHGs and nutrients (organic C, -P, and -N) were determined in surface water samples from 73 lakes in south-eastern Norway covering wide ranges in DOM and nutrient concentrations, as well as catchment properties and land use. The spatial variation in saturation of each GHG was related to explanatory variables. Catchment characteristics (hydrological and climate parameters) such as lake size and summer precipitation, as well as NDVI, were key determinants when fitting GAM models for CH4 and CO2 saturation (explaining 71% and 54%, respectively), while summer precipitation and land use data were the best predictors for N2O saturation, explaining almost 50% of deviance. Our results suggest that lake size, precipitation, and terrestrial primary production in the watershed control the saturation of GHGs in boreal lakes. These predictions based on the 73-lake dataset were validated against an independent dataset from 46 lakes in the same region. Together, this provides an improved understanding of drivers and spatial variation in GHG saturation in boreal lakes across wide gradients of lake and catchment properties. The assessment highlights the need to incorporate multiple explanatory parameters in prediction models of GHGs for extrapolation across the boreal biome.
  5. Nuclear magnetic resonance (NMR) is one of the primary techniques used to elucidate the chemical structure, bonding, stereochemistry, and conformation of organic compounds. The distinct chemical shifts in an NMR spectrum depend upon each atom's local chemical environment and are influenced by both through-bond and through-space interactions with other atoms and functional groups. The in silico prediction of NMR chemical shifts using quantum mechanical (QM) calculations is now commonplace in aiding organic structural assignment, since spectra can be computed for several candidate structures and then compared with experimental values to find the best possible match. However, calculations for multiple structural isomers and stereoisomers, each of which may typically exist as an ensemble of rapidly interconverting conformations, are computationally expensive. Additionally, the QM predictions themselves may lack sufficient accuracy to identify a correct structure. In this work, we address both of these shortcomings by developing a rapid machine learning (ML) protocol to predict 1H and 13C chemical shifts through an efficient graph neural network (GNN) using 3D structures as input. Transfer learning with experimental data is used to improve the final prediction accuracy of a model trained using QM calculations. When tested on the CHESHIRE dataset, the proposed model predicts observed 13C chemical shifts with comparable accuracy to the best-performing DFT functionals (1.5 ppm) in around 1/6000 of the CPU time. An automated prediction webserver and graphical interface are accessible online at http://nova.chem.colostate.edu/cascade/. We further demonstrate the model in three applications: first, we use the model to identify the correct organic structure among candidates using experimental spectra, including complex stereoisomers; second, we automatically detect and revise incorrect chemical shift assignments in a popular NMR database, the NMRShiftDB; and third, we use NMR chemical shifts as descriptors for determination of the sites of electrophilic aromatic substitution.
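The candidate-matching step described in this last abstract (compute shifts for several candidate structures, then compare with experiment) can be sketched as a simple ranking. This is a minimal illustration, assuming the shifts are already assigned atom by atom in the same order and using mean absolute error as the score; the abstract does not state which scoring scheme the authors use, and the function names and numbers below are hypothetical.

```python
def mean_absolute_error(experimental, predicted):
    """Mean absolute deviation (ppm) between matched shift lists."""
    if len(experimental) != len(predicted) or not experimental:
        raise ValueError("need two non-empty sequences of equal length")
    return sum(abs(e - p) for e, p in zip(experimental, predicted)) / len(experimental)


def rank_candidates(experimental, candidates):
    """Return (name, error) pairs sorted from best to worst match.

    `candidates` maps a candidate label to its predicted shifts,
    listed in the same atom order as `experimental`.
    """
    scores = {
        name: mean_absolute_error(experimental, shifts)
        for name, shifts in candidates.items()
    }
    return sorted(scores.items(), key=lambda item: item[1])


# Hypothetical 13C shifts in ppm, for illustration only.
observed = [10.0, 20.0, 30.0]
predicted_sets = {
    "candidate_A": [11.0, 21.0, 31.0],
    "candidate_B": [10.0, 20.0, 36.0],
}
for name, error in rank_candidates(observed, predicted_sets):
    print(name, error)
```

In practice the predicted shifts would come from the QM or ML model, and a probabilistic score could replace the plain error used here.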