Title: Modeling molecular ensembles with gradient-domain machine learning force fields
Gradient-domain machine learning (GDML) force fields have shown excellent accuracy, data efficiency, and applicability for molecules with hundreds of atoms, but the employed global descriptor limits transferability to ensembles of molecules. Many-body expansions (MBEs) should provide a rigorous procedure for size-transferable GDML by training models on fundamental n-body interactions. We developed many-body GDML (mbGDML) force fields for water, acetonitrile, and methanol by training 1-, 2-, and 3-body models on only 1000 MP2/def2-TZVP calculations each. Our mbGDML force field includes intramolecular flexibility and intermolecular interactions, provided that the reference data adequately describe these effects. Energy and force predictions of clusters containing up to 20 molecules are within 0.38 kcal/mol per monomer and 0.06 kcal/(mol Å) per atom of reference supersystem calculations. This deviation partially arises from the restriction of the mbGDML model to 3-body interactions. GAP and SchNet in this MBE framework achieved similar accuracies but occasionally had abnormally high errors up to 17 kcal/mol. NequIP trained on total energies and forces of trimers experienced much larger energy errors (at least 15 kcal/mol) as the number of monomers increased, demonstrating the effectiveness of size transferability with MBEs. Given these approximations, our automated mbGDML training schemes also resulted in fair agreement with reference radial distribution functions (RDFs) of bulk solvents. These results highlight mbGDML as valuable for modeling explicitly solvated systems with quantum-mechanical accuracy.
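The many-body expansion described in the abstract can be illustrated with a minimal Python sketch. This is not the authors' mbGDML code: the function names (n_body_contribution, mbe_energy), the NBodyModel type, and the idea of passing each trained model as a plain callable are all assumptions made for illustration. The sketch shows two things: how an n-body contribution is obtained from total fragment energies by subtracting all lower-order terms, and how a truncated expansion sums the 1-, 2-, and 3-body predictions over every monomer combination in a cluster.

```python
from itertools import combinations
from typing import Callable, Dict, Sequence, Tuple

# Hypothetical type: a model maps a fragment (tuple of monomers) to its
# n-body energy contribution. In practice this would wrap a trained model.
NBodyModel = Callable[[Tuple], float]


def n_body_contribution(fragment: Tuple, total_energy: Callable[[Tuple], float]) -> float:
    """n-body term = E(fragment) minus the lower-order terms of all its sub-fragments."""
    contribution = total_energy(fragment)
    for order in range(1, len(fragment)):
        for sub in combinations(fragment, order):
            contribution -= n_body_contribution(sub, total_energy)
    return contribution


def mbe_energy(monomers: Sequence, models: Dict[int, NBodyModel]) -> float:
    """Truncated many-body expansion: sum each order's model over all fragments."""
    total = 0.0
    for order, model in sorted(models.items()):
        for fragment in combinations(monomers, order):
            total += model(fragment)
    return total
```

With models for orders 1, 2, and 3 only, any 4-body and higher contributions are neglected, which is the truncation the abstract names as one source of the remaining deviation from supersystem calculations. Forces would follow the same summation pattern, with each fragment's predicted forces added to the atoms it contains.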
Award ID(s):
1653392 1705592 1856460
PAR ID:
10528635
Author(s) / Creator(s):
; ; ; ;
Publisher / Repository:
Royal Society of Chemistry
Date Published:
Journal Name:
Digital Discovery
Volume:
2
Issue:
3
ISSN:
2635-098X
Page Range / eLocation ID:
871 to 880
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. Hydration free energies of small molecules are commonly used as benchmarks for solvation models. However, errors in predicting hydration free energies are partially due to the force fields used and not just the solvation model. To address this, we have used the 3D reference interaction site model (3D-RISM) of molecular solvation and existing benchmark explicit solvent calculations with a simple element count correction (ECC) to identify problems with the non-bond parameters in the general AMBER force field (GAFF). 3D-RISM was used to calculate hydration free energies of all 642 molecules in the FreeSolv database, and a partial molar volume correction (PMVC), ECC, and their combination (PMVECC) were applied to the results. The PMVECC produced a mean unsigned error of 1.01 ± 0.04 kcal/mol and root mean squared error of 1.44 ± 0.07 kcal/mol, better than the benchmark explicit solvent calculations from FreeSolv, and required less than 15 s of computing time per molecule on a single CPU core. Importantly, parameters for PMVECC showed systematic errors for molecules containing Cl, Br, I, and P. Applying ECC to the explicit solvent hydration free energies found the same systematic errors. The results strongly suggest that some small adjustments to the Lennard–Jones parameters for GAFF will lead to improved hydration free energy calculations for all solvent models.
  2. Abstract Interatomic potentials derived with machine learning algorithms, such as deep neural networks (DNNs), achieve the accuracy of high-fidelity quantum mechanical (QM) methods in areas traditionally dominated by empirical force fields and allow massive simulations to be performed. Most DNN potentials were parametrized for neutral molecules or closed-shell ions due to architectural limitations. In this work, we propose an improved machine learning framework for simulating open-shell anions and cations. We introduce the AIMNet-NSE (Neural Spin Equilibration) architecture, which can predict molecular energies for an arbitrary combination of molecular charge and spin multiplicity with errors of about 2–3 kcal/mol and spin charges with errors of ~0.01e for small and medium-sized organic molecules, compared to the reference QM simulations. The AIMNet-NSE model makes it possible to bypass QM calculations entirely and derive the ionization potential, electron affinity, and conceptual Density Functional Theory quantities like electronegativity, hardness, and condensed Fukui functions. We show that these descriptors, along with learned atomic representations, could be used to model chemical reactivity through an example of regioselectivity in electrophilic aromatic substitution reactions.
  3. Abstract We have performed a series of highly accurate calculations on CO2 and the 20 naturally occurring amino acids to investigate the attractive noncovalent interactions between them. Different nucleophilic groups present in the amino acid structures were considered (α‐NH2, COOH, side groups), and the stronger binding sites were identified. A database of accurate reference interaction energies was compiled as computed by explicitly‐correlated coupled‐cluster singles‐and‐doubles, together with perturbative triples extrapolated to the complete‐basis‐set limit. The CCSD(F12)(T)/CBS reference values were used for comparing a variety of popular density functionals with different basis sets. Our results show that most density functionals with the triple‐zeta basis set def2‐TZVPP align with the CCSD(F12)(T)/CBS reference values, but errors range from 0.1 kcal/mol up to 1.0 kcal/mol.
  4. Dinitrogen pentoxide (N2O5) is an important intermediate in the atmospheric chemistry of nitrogen oxides. Although there has been much research, the processes that govern the physical interactions between N2O5 and water are still not fully understood at a molecular level. Gaining a quantitative insight from computer simulations requires going beyond the accuracy of classical force fields while accessing length scales and time scales that are out of reach for high-level quantum-chemical approaches. To this end, we present the development of MB-nrg many-body potential energy functions for nonreactive simulations of N2O5 in water. This MB-nrg model is based on electronic structure calculations at the coupled cluster level of theory and is compatible with the successful MB-pol model for water. It provides a physically correct description of long-range many-body interactions in combination with an explicit representation of up to three-body short-range interactions in terms of multidimensional permutationally invariant polynomials. In order to further investigate the importance of the underlying interactions in the model, a TTM-nrg model was also devised. TTM-nrg is a simpler representation that contains only two-body short-range interactions represented through Born–Mayer functions. In this work, an active learning approach was employed to efficiently build representative training sets of monomer, dimer, and trimer structures, and benchmarks are presented to determine the accuracy of our new models in comparison to a range of density functional theory methods. By assessing the binding curves, distortion energies of N2O5, and interaction energies in clusters of N2O5 and water, we evaluate the importance of two-body and three-body short-range potentials. The results demonstrate that our MB-nrg model has high accuracy with respect to the coupled cluster reference, outperforms current density functional theory models, and thus enables highly accurate simulations of N2O5 in aqueous environments.
  5. Abstract Kohn-Sham density functional theory (DFT) is a standard tool in most branches of chemistry, but accuracies for many molecules are limited to 2-3 kcal⋅mol⁻¹ with presently-available functionals. Ab initio methods, such as coupled-cluster, routinely produce much higher accuracy, but computational costs limit their application to small molecules. In this paper, we leverage machine learning to calculate coupled-cluster energies from DFT densities, reaching quantum chemical accuracy (errors below 1 kcal⋅mol⁻¹) on test data. Moreover, density-based Δ-learning (learning only the correction to a standard DFT calculation, termed Δ-DFT) significantly reduces the amount of training data required, particularly when molecular symmetries are included. The robustness of Δ-DFT is highlighted by correcting “on the fly” DFT-based molecular dynamics (MD) simulations of resorcinol (C6H4(OH)2) to obtain MD trajectories with coupled-cluster accuracy. We conclude, therefore, that Δ-DFT facilitates running gas-phase MD simulations with quantum chemical accuracy, even for strained geometries and conformer changes where standard DFT fails.