Title: Modeling molecular ensembles with gradient-domain machine learning force fields
Gradient-domain machine learning (GDML) force fields have shown excellent accuracy, data efficiency, and applicability for molecules with hundreds of atoms, but the employed global descriptor limits transferability to ensembles of molecules. Many-body expansions (MBEs) should provide a rigorous procedure for size-transferable GDML by training models on fundamental n-body interactions. We developed many-body GDML (mbGDML) force fields for water, acetonitrile, and methanol by training 1-, 2-, and 3-body models on only 1000 MP2/def2-TZVP calculations each. Our mbGDML force field includes intramolecular flexibility and intermolecular interactions, provided that the reference data adequately describe these effects. Energy and force predictions of clusters containing up to 20 molecules are within 0.38 kcal/mol per monomer and 0.06 kcal/(mol Å) per atom of reference supersystem calculations. This deviation partially arises from the restriction of the mbGDML model to 3-body interactions. GAP and SchNet in this MBE framework achieved similar accuracies but occasionally had abnormally high errors of up to 17 kcal/mol. NequIP trained on total energies and forces of trimers experienced much larger energy errors (at least 15 kcal/mol) as the number of monomers increased, demonstrating the effectiveness of size transferability with MBEs. Given these approximations, our automated mbGDML training schemes also resulted in fair agreement with reference radial distribution functions (RDFs) of bulk solvents. These results highlight mbGDML as valuable for modeling explicitly solvated systems with quantum-mechanical accuracy.
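The truncated many-body expansion described in the abstract can be illustrated with a minimal Python sketch. This is not the mbGDML implementation: the function names (`n_body_contribution`, `mbe_energy`), the one-dimensional "monomers", and the toy energy function are all hypothetical stand-ins chosen so the example runs without a quantum-chemistry backend. In practice the per-order models would be trained ML predictors and each monomer would carry atomic coordinates.

```python
from itertools import combinations


def n_body_contribution(total_energy, cluster):
    """n-body contribution of a cluster: its total energy minus the
    contributions of every proper sub-cluster (inclusion-exclusion)."""
    contribution = total_energy(cluster)
    for k in range(1, len(cluster)):
        for sub in combinations(cluster, k):
            contribution -= n_body_contribution(total_energy, sub)
    return contribution


def mbe_energy(monomers, n_body_models, max_order=3):
    """Approximate the energy of a supersystem by summing predicted
    1-, 2-, ..., max_order-body contributions over all monomer subsets."""
    total = 0.0
    for order in range(1, max_order + 1):
        model = n_body_models[order]  # hypothetical per-order predictor
        for indices in combinations(range(len(monomers)), order):
            total += model(tuple(monomers[i] for i in indices))
    return total


def toy_total_energy(cluster):
    """Toy reference energy (assumption): each 'monomer' is one number,
    and the energy contains only 1-, 2-, and 3-body terms."""
    one = sum(0.5 * x * x for x in cluster)
    two = sum(-1.0 / abs(a - b) for a, b in combinations(cluster, 2))
    three = sum(0.01 * a * b * c for a, b, c in combinations(cluster, 3))
    return one + two + three


# Stand-ins for trained models: here they return exact n-body contributions.
models = {n: (lambda c: n_body_contribution(toy_total_energy, c)) for n in (1, 2, 3)}

monomers = (0.0, 1.0, 2.5, 4.0, 6.0)
reference = toy_total_energy(monomers)
mbe3 = mbe_energy(monomers, models, max_order=3)
mbe2 = mbe_energy(monomers, models, max_order=2)
print(f"reference={reference:.6f}  MBE(3)={mbe3:.6f}  MBE(2)={mbe2:.6f}")
```

Because the toy energy has no terms beyond 3-body, the expansion truncated at third order reproduces the reference, while truncating at second order omits the 3-body part. For real systems, neglected higher-order terms are one source of the residual deviation that the abstract attributes to the 3-body restriction.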
Award ID(s):
1653392 1705592 1856460
PAR ID:
10528635
Publisher / Repository:
Royal Society of Chemistry
Date Published:
Journal Name:
Digital Discovery
Volume:
2
Issue:
3
ISSN:
2635-098X
Page Range / eLocation ID:
871 to 880
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. Hydration free energies of small molecules are commonly used as benchmarks for solvation models. However, errors in predicting hydration free energies are partially due to the force fields used and not just the solvation model. To address this, we have used the 3D reference interaction site model (3D-RISM) of molecular solvation and existing benchmark explicit solvent calculations with a simple element count correction (ECC) to identify problems with the non-bond parameters in the general AMBER force field (GAFF). 3D-RISM was used to calculate hydration free energies of all 642 molecules in the FreeSolv database, and a partial molar volume correction (PMVC), ECC, and their combination (PMVECC) were applied to the results. The PMVECC produced a mean unsigned error of 1.01 ± 0.04 kcal/mol and root mean squared error of 1.44 ± 0.07 kcal/mol, better than the benchmark explicit solvent calculations from FreeSolv, and required less than 15 s of computing time per molecule on a single CPU core. Importantly, parameters for PMVECC showed systematic errors for molecules containing Cl, Br, I, and P. Applying ECC to the explicit solvent hydration free energies found the same systematic errors. The results strongly suggest that some small adjustments to the Lennard–Jones parameters for GAFF will lead to improved hydration free energy calculations for all solvent models. 
  2. Abstract Interatomic potentials derived with machine learning algorithms such as deep neural networks (DNNs) achieve the accuracy of high-fidelity quantum mechanical (QM) methods in areas traditionally dominated by empirical force fields and allow massive simulations to be performed. Most DNN potentials were parametrized for neutral molecules or closed-shell ions due to architectural limitations. In this work, we propose an improved machine learning framework for simulating open-shell anions and cations. We introduce the AIMNet-NSE (Neural Spin Equilibration) architecture, which can predict molecular energies for an arbitrary combination of molecular charge and spin multiplicity with errors of about 2–3 kcal/mol and spin charges with errors of ~0.01e for small and medium-sized organic molecules, compared to the reference QM simulations. The AIMNet-NSE model makes it possible to bypass QM calculations entirely and derive the ionization potential, electron affinity, and conceptual Density Functional Theory quantities like electronegativity, hardness, and condensed Fukui functions. We show that these descriptors, along with learned atomic representations, could be used to model chemical reactivity through an example of regioselectivity in electrophilic aromatic substitution reactions. 
  3. Abstract We have performed a series of highly accurate calculations between CO2 and the 20 naturally occurring amino acids for the investigation of the attractive noncovalent interactions. Different nucleophilic groups present in the amino acid structures were considered (α‐NH2, COOH, side groups), and the stronger binding sites were identified. A database of accurate reference interaction energies was compiled as computed by explicitly‐correlated coupled‐cluster singles‐and‐doubles, together with perturbative triples extrapolated to the complete‐basis‐set limit. The CCSD(F12)(T)/CBS reference values were used for comparing a variety of popular density functionals with different basis sets. Our results show that most density functionals with the triple‐zeta basis set def2‐TZVPP align with the CCSD(F12)(T)/CBS reference values, but errors range from 0.1 kcal/mol up to 1.0 kcal/mol. 
  4. Abstract Kohn-Sham density functional theory (DFT) is a standard tool in most branches of chemistry, but accuracies for many molecules are limited to 2-3 kcal·mol⁻¹ with presently-available functionals. Ab initio methods, such as coupled-cluster, routinely produce much higher accuracy, but computational costs limit their application to small molecules. In this paper, we leverage machine learning to calculate coupled-cluster energies from DFT densities, reaching quantum chemical accuracy (errors below 1 kcal·mol⁻¹) on test data. Moreover, density-based Δ-learning (learning only the correction to a standard DFT calculation, termed Δ-DFT) significantly reduces the amount of training data required, particularly when molecular symmetries are included. The robustness of Δ-DFT is highlighted by correcting “on the fly” DFT-based molecular dynamics (MD) simulations of resorcinol (C6H4(OH)2) to obtain MD trajectories with coupled-cluster accuracy. We conclude, therefore, that Δ-DFT facilitates running gas-phase MD simulations with quantum chemical accuracy, even for strained geometries and conformer changes where standard DFT fails. 
  5. Abstract Base stacking interactions between adjacent bases in DNA and RNA are important for many biological processes and in biotechnology applications. Previous work has estimated stacking energies between pairs of bases, but the contributions of individual bases have remained unknown. Here, we use a Centrifuge Force Microscope for high-throughput single molecule experiments to measure stacking energies between adjacent bases. We found stacking energies strongest between purines (G|A at −2.3 ± 0.2 kcal/mol) and weakest between pyrimidines (C|T at −0.5 ± 0.1 kcal/mol). Hybrid stacking with phosphorylated, methylated, and RNA nucleotides had no measurable effect, but a fluorophore modification reduced stacking energy. We experimentally show that base stacking can influence stability of a DNA nanostructure, modulate kinetics of enzymatic ligation, and assess accuracy of force fields in molecular dynamics simulations. Our results provide insights into fundamental DNA interactions that are critical in biology and can inform design in biotechnology applications. 