Title: Assessing conformer energies using electronic structure and machine learning methods
Abstract

We have performed a large‐scale evaluation of current computational methods, including conventional small‐molecule force fields; semiempirical, density functional, and ab initio electronic structure methods; and current machine learning (ML) techniques to evaluate relative single‐point energies. Using up to 10 local minima geometries across ~700 molecules, each optimized by B3LYP‐D3BJ with single‐point DLPNO‐CCSD(T) triple‐zeta energies, we consider over 6500 single points to compare the correlation between different methods for both relative energies and ordered rankings of minima. We find that the current ML methods have potential and recommend methods at each tier of the accuracy‐time tradeoff, particularly the recent GFN2 semiempirical method, the B97‐3c density functional approximation, and RI‐MP2 for accurate conformer energies. The ANI family of ML methods shows promise, particularly the ANI‐1ccx variant trained in part on coupled‐cluster energies. Multiple methods suggest continued improvements should be expected in both performance and accuracy.
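
The comparison described in the abstract, relative energies and ordered rankings of minima, can be illustrated with a minimal sketch. This is not the authors' code: the function names and the four energies per method are hypothetical placeholders, and the rank correlation is computed as Spearman's coefficient, which is one common choice and is assumed here.

```python
from math import sqrt


def relative_energies(energies):
    """Shift conformer energies so the lowest conformer is at zero."""
    lowest = min(energies)
    return [e - lowest for e in energies]


def average_ranks(values):
    """Rank values from 1 (lowest); tied values share their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation; undefined (division by zero) if x or y is constant."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical energies (kcal/mol) for four conformers of one molecule.
reference = [-100.0, -99.2, -98.5, -97.9]  # stand-in for the reference method
candidate = [-50.3, -50.0, -48.9, -49.1]   # stand-in for a cheaper method

ref_rel = relative_energies(reference)
cand_rel = relative_energies(candidate)

energy_corr = pearson(ref_rel, cand_rel)   # agreement of relative energies
rank_corr = spearman(ref_rel, cand_rel)    # agreement of conformer ordering
print(f"Pearson r = {energy_corr:.3f}, Spearman rho = {rank_corr:.3f}")
```

In this toy case the candidate method swaps the two highest conformers, so the rank correlation falls below 1 even though the relative energies remain strongly correlated. In practice such statistics would be computed per molecule and then summarized across the whole set.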

 
Award ID(s):
1800435
PAR ID:
10453347
Author(s) / Creator(s):
 ;  
Publisher / Repository:
Wiley Blackwell (John Wiley & Sons)
Date Published:
Journal Name:
International Journal of Quantum Chemistry
Volume:
121
Issue:
1
ISSN:
0020-7608
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. Modern semiempirical electronic structure methods have considerable promise in drug discovery as universal “force fields” that can reliably model biological and drug-like molecules, including alternative tautomers and protonation states. Herein, we compare the performance of several neglect of diatomic differential overlap-based semiempirical (MNDO/d, AM1, PM6, PM6-D3H4X, PM7, and ODM2), density-functional tight-binding based (DFTB3, DFTB/ChIMES, GFN1-xTB, and GFN2-xTB) models with pure machine learning potentials (ANI-1x and ANI-2x) and hybrid quantum mechanical/machine learning potentials (AIQM1 and QDπ) for a wide range of data computed at a consistent ωB97X/6-31G* level of theory (as in the ANI-1x database). This data includes conformational energies, intermolecular interactions, tautomers, and protonation states. Additional comparisons are made to a set of natural and synthetic nucleic acids from the artificially expanded genetic information system that has important implications for the design of new biotechnology and therapeutics. Finally, we examine the acid/base chemistry relevant for RNA cleavage reactions catalyzed by small nucleolytic ribozymes, DNAzymes, and ribonucleases. Overall, the hybrid quantum mechanical/machine learning potentials appear to be the most robust for these datasets, and the recently developed QDπ model performs exceptionally well, having especially high accuracy for tautomers and protonation states relevant to drug discovery.
  2. Abstract

    Maximum diversification of data is a central theme in building generalized and accurate machine learning (ML) models. In chemistry, ML has been used to develop models for predicting molecular properties, for example quantum mechanics (QM) calculated potential energy surfaces and atomic charge models. The ANI-1x and ANI-1ccx ML-based general-purpose potentials for organic molecules were developed through active learning; an automated data diversification process. Here, we describe the ANI-1x and ANI-1ccx data sets. To demonstrate data diversity, we visualize it with a dimensionality reduction scheme, and contrast against existing data sets. The ANI-1x data set contains multiple QM properties from 5 M density functional theory calculations, while the ANI-1ccx data set contains 500 k data points obtained with an accurate CCSD(T)/CBS extrapolation. Approximately 14 million CPU core-hours were expended to generate this data. Multiple QM calculated properties for the chemical elements C, H, N, and O are provided: energies, atomic forces, multipole moments, atomic charges, etc. We provide this data to the community to aid research and development of ML models for chemistry.

     
  3. Abstract

    We have carried out a large scale computational investigation to assess the utility of common small‐molecule force fields for computational screening of low energy conformers of typical organic molecules. Using statistical analyses on the energies and relative rankings of up to 250 diverse conformers of 700 different molecular structures, we find that energies from widely used classical force fields (MMFF94, UFF, and GAFF) show unconditionally poor energy and rank correlation with semiempirical (PM7) and Kohn–Sham density functional theory (DFT) energies calculated at PM7 and DFT optimized geometries. In contrast, semiempirical PM7 calculations show significantly better correlation with DFT calculations and generally better geometries. With these results, we make recommendations to more reliably carry out conformer screening.

     
  4. Abstract

    This study explores open-shell biradical and polyradical molecular compounds based on extended multireference (MR) methods (MR-configuration interaction with singles and doubles (CISD) and MR-averaged quadratic coupled cluster (AQCC) approach) using the numbers of unpaired densities N_U. These results were used to guide the analysis of the fractional occupation number weighted density (FOD) calculated within the finite temperature (FT) density functional theory (DFT) approach. As critical test examples, the dissociation of carbon–carbon (CC) single, double and triple bonds and a benchmark set of polycyclic aromatic hydrocarbons (PAHs) have been chosen. By examining single, double, and triple bond dissociations, we demonstrate the utility and accuracy but also limitations of the FOD analysis for describing these dissociation processes. In significant extension of previous work (Phys Chem Chem Phys 25: 27380–27393), the assessment of FOD applications for different classes of DFT functionals was performed examining the range-separated functionals ωB97XD, ωB97M-V, CAM-B3LYP, LC-ωPBE, and MN12-SX, the hybrid (M06-2X) functional and the double hybrid (B2P-LYP) functional. In all cases, strong correlations between N_FOD and N_U values are found. The major task was to develop a new linear regression formula for range-separated functionals allowing a convenient determination of the optimal electronic temperature T_el for the FT-DFT calculation. We also established an optimal temperature for the semiempirical extended tight-binding GFN2-xTB method. These findings significantly broaden the applicability of FOD analysis across various DFT functionals and semiempirical methods.

     
  5. Abstract

    The s‐homodesmotic method for computing conventional strain energies (CSE) has been extended for the first time to bicyclic systems and to individual rings within these systems. Unique isodesmic, homodesmotic, and hyperhomodesmotic reactions originate from the s‐homodesmotic method. These are used to investigate 12 bicyclic systems comprising cyclopropane and cyclobutane and how the CSE of each system compares to the sum of the individual rings within each. Equilibrium geometries, harmonic vibrational frequencies, and the corresponding electronic energies and zero point vibrational energy corrections are computed for all relevant molecules using second‐order perturbation theory and density functional theory (B3LYP) with the correlation consistent basis sets cc‐pVDZ and cc‐pVTZ. Single‐point CCSD(T) energies are computed at the MP2/cc‐pVTZ optimized geometries to ascertain the importance of higher order correlation effects. Results indicate that CSEs are additive when the two rings are separated by one or two bonds and somewhat additive in other cases.

     