Title: Seven confluence principles: a case study of standardized statistical analysis for 26 methods that assign net atomic charges in molecules
This article studies two kinds of information extracted from statistical correlations between methods for assigning net atomic charges (NACs) in molecules. First, relative charge transfer magnitudes are quantified by performing instant least squares fitting (ILSF) on the NACs reported by Cho et al. (ChemPhysChem, 2020, 21, 688–696) across 26 methods applied to ∼2000 molecules. The Hirshfeld and Voronoi deformation density (VDD) methods had the smallest charge transfer magnitudes, while the quantum theory of atoms in molecules (QTAIM) method had the largest charge transfer magnitude. Methods optimized to reproduce the molecular dipole moment (e.g., ACP, ADCH, CM5) have smaller charge transfer magnitudes than methods optimized to reproduce the molecular electrostatic potential (e.g., CHELPG, HLY, MK, RESP). Several methods had charge transfer magnitudes even larger than the electrostatic potential fitting group. Second, confluence between different charge assignment methods is quantified to identify which charge assignment method produces the best NAC values for predicting via linear correlations the results of 20 charge assignment methods having a complete basis set limit across the dataset of ∼2000 molecules. The DDEC6 NACs were the best such predictor of the entire dataset. Seven confluence principles are introduced explaining why confluent quantitative descriptors offer predictive advantages for modeling a broad range of physical properties and target applications. These confluence principles can be applied in various fields of scientific inquiry. A theory is derived showing confluence is better revealed by standardized statistical analysis (e.g., principal components analysis of the correlation matrix and standardized reversible linear regression) than by unstandardized statistical analysis. These confluence principles were used together with other key principles and the scientific method to make assigning atom-in-material properties non-arbitrary.
The N@C60 system provides an unambiguous and non-arbitrary falsifiable test of atomic population analysis methods. The HLY, ISA, MK, and RESP methods failed for this material.
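The distinction the abstract draws between standardized and unstandardized principal components analysis can be illustrated with a minimal sketch. It uses synthetic data only, not the Cho et al. dataset; the three "methods", their scale factors, and their noise levels are hypothetical values chosen for illustration, and the helper name `first_pc_correlations` is not from the article. Under these assumptions, the unstandardized analysis (covariance matrix) tends to favor the method with the largest charge magnitudes, while the standardized analysis (correlation matrix) treats all methods on an equal footing.

```python
import numpy as np

def first_pc_correlations(X, standardized):
    """Return |correlation| of each column of X with the first principal component.

    standardized=True  -> PCA of the correlation matrix (columns scaled to unit variance)
    standardized=False -> PCA of the covariance matrix (columns only mean-centered)
    """
    Xc = X - X.mean(axis=0)
    if standardized:
        Xc = Xc / Xc.std(axis=0, ddof=1)
    M = np.cov(Xc, rowvar=False)
    # eigh returns eigenvalues in ascending order, so the last eigenvector is PC1.
    _, eigvecs = np.linalg.eigh(M)
    pc1 = Xc @ eigvecs[:, -1]
    return np.array(
        [abs(np.corrcoef(pc1, Xc[:, j])[0, 1]) for j in range(X.shape[1])]
    )

# Synthetic "atomic charges": one shared latent signal seen by three hypothetical
# methods that differ in charge transfer magnitude (scale) and in noise.
rng = np.random.default_rng(0)
n_atoms = 5000
latent = rng.normal(size=n_atoms)
scales = np.array([0.5, 1.0, 2.0])   # hypothetical relative magnitudes
noise = np.array([0.05, 0.05, 0.8])  # hypothetical: the third method is the noisiest
X = latent[:, None] * scales + rng.normal(size=(n_atoms, 3)) * noise

corr_std = first_pc_correlations(X, standardized=True)
corr_unstd = first_pc_correlations(X, standardized=False)
print("standardized:  ", np.round(corr_std, 3))
print("unstandardized:", np.round(corr_unstd, 3))
```

In this toy setup the third column has the largest magnitude but the weakest agreement with the shared signal. It ranks last in the standardized analysis yet first in the unstandardized one, because the covariance matrix weights each column by its variance.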
Award ID(s): 1555376
NSF-PAR ID: 10205902
Author(s) / Creator(s):
Date Published:
Journal Name: RSC Advances
Volume: 10
Issue: 72
ISSN: 2046-2069
Page Range / eLocation ID: 44121 to 44148
Format(s): Medium: X
Sponsoring Org: National Science Foundation
More Like this
  1. Articles by Cho et al. (ChemPhysChem, 2020, 21, 688–696) and Manz (RSC Adv., 2020, 10, 44121–44148) performed unstandardized and standardized, respectively, principal component analysis (PCA) to study atomic charge assignment methods for molecular systems. Both articles used subsets of atomic charges computed by Cho et al.; however, the data subsets employed were not strictly identical. Herein, an element by element analysis of this dataset is first performed to compare the spread of charge values across individual chemical elements and charge assignment methods. This reveals an underlying problem with the reported Becke partial atomic charges in this dataset. Due to their unphysical values, these Becke charges were not included in the subsequent PCA. Standardized and unstandardized PCA are performed across two datasets: (i) 19 charge assignment methods having a complete basis set limit and (ii) all 25 charge assignment methods (excluding Becke) for which Cho et al. computed atomic charges. The dataset contained ∼2000 molecules having a total of 29 907 atoms in materials. The following five methods (listed here in alphabetical order) showed the greatest correlation to the first principal component in standardized and unstandardized PCA: DDEC6, Hirshfeld-I, ISA, MBIS, and MBSBickelhaupt (note: MBSBickelhaupt does not appear in the 19 methods dataset). For standardized PCA, the DDEC6 method ranked first followed closely by MBIS. For unstandardized PCA, Hirshfeld-I (19 methods) or MBSBickelhaupt (25 methods) ranked first followed by DDEC6 in second place (both 19 and 25 methods).
  2. Accurate estimation of solvation free energy (SFE) lays the foundation for accurate prediction of binding free energy. The Poisson‐Boltzmann (PB) and generalized Born (GB) continuum solvation methods combined with surface area (SA) terms (PBSA and GBSA) have been widely used in SFE calculations because they achieve a good balance between accuracy and efficiency. However, the accuracy of these methods can be affected by several factors, such as the charge model, the polar and nonpolar SFE calculation methods, and the atom radii used in the calculation. In this work, the performance of the ABCG2 (AM1‐BCC‐GAFF2) charge model and two other charge models, RESP (Restrained Electrostatic Potential) and AM1‐BCC (Austin Model 1‐bond charge corrections), on the SFE prediction of 544 small molecules in water by PBSA/GBSA was evaluated. To improve the performance of the PBSA prediction based on the ABCG2 charges, we further explored the influence of atom radii on the prediction accuracy and obtained a set of atom radius parameters for more accurate SFE prediction using PBSA with ABCG2/GAFF2 by reproducing thermodynamic integration (TI) calculation results. The PB radius parameters of carbon, oxygen, sulfur, phosphorus, chlorine, bromine, and iodine were adjusted. New atom types (on, oi, hn1, hn2, hn3) were introduced to further improve the fitting performance. Then, we tuned the parameters in the nonpolar SFE model using the experimental SFE data and the PB calculation results. By adopting the new radius parameters and the new nonpolar SFE model, the root mean square error (RMSE) of the SFE calculation for the 544 molecules decreased from 2.38 to 1.05 kcal/mol. Finally, the new radius parameters were applied in the prediction of protein‐ligand binding free energies using the MM‐PBSA method. For the eight systems tested, we observed higher correlation between the experimental data and the calculation results and smaller prediction errors for the absolute binding free energies, demonstrating that our new radius parameters can improve free energy calculations using the MM‐PBSA method.
  3. The restrained electrostatic potential (RESP) approach is a highly regarded and widely used method of assigning partial charges to molecules for simulations. RESP uses a quantum-mechanical method that yields fortuitous overpolarization and thereby accounts only approximately for self-polarization of molecules in the condensed phase. Here we present RESP2, a next generation of this approach, where the polarity of the charges is tuned by a parameter, δ, which scales the contributions from gas- and aqueous-phase calculations. When the complete non-bonded force field model, including Lennard-Jones parameters, is optimized to liquid properties, improved accuracy is achieved, even with a reduced set of five Lennard-Jones types. We argue that RESP2 with δ ≈ 0.6 (60% aqueous, 40% gas-phase charges) is an accurate and robust method of generating partial charges, and that a small set of Lennard-Jones types is a good starting point for a systematic re-optimization of this important non-bonded term.
  4. A next‐generation protocol (Poltype 2) has been developed which automatically generates AMOEBA polarizable force field parameters for small molecules. Both features and computational efficiency have been drastically improved. Notable advances include improved database transferability using SMILES, robust torsion fitting, non‐aromatic ring torsion parameterization, coupled torsion‐torsion parameterization, van der Waals parameter refinement using ab initio dimer data, and an intelligent fragmentation scheme that produces parameters with dramatically reduced ab initio computational cost. Additional improvements include better local frame assignment for atomic multipoles, automated formal charge assignment, zwitterion detection, smart memory resource defaults, parallelized fragment job submission, incorporation of the Psi4 quantum package, ab initio error handling, ionization state enumeration, hydration free energy prediction, and binding free energy prediction. For validation, we have applied Poltype 2 to ~1000 FDA‐approved drug molecules from DrugBank. The ab initio molecular dipole moments and electrostatic potential values were compared with their Poltype 2 derived AMOEBA counterparts. Parameters were further substantiated by calculating hydration free energies (HFE) for 40 small organic molecules and comparing them with experimental data, resulting in an RMSE of 0.59 kcal/mol. The torsion database has expanded to include 3543 fragments derived from FDA‐approved drugs. Poltype 2 provides a convenient utility for applications including binding free energy prediction for computational drug discovery. Further improvement will focus on automated parameter refinement using experimental liquid properties, expansion of the van der Waals parameter database, and automated parametrization of modified bio‐fragments such as amino and nucleic acids.
  5. Introduction: Products of plant secondary metabolism, such as phenolic compounds, flavonoids, alkaloids, and hormones, play an important role in plant growth, development, and stress resistance. The plant family Rubiaceae is extremely diverse and abundant in Central America and contains several economically important genera, e.g. Coffea and other medicinal plants. These are known for the production of bioactive polyphenols (e.g. caffeine and quinine), which have had major impacts on human society. The overall goal of this study was to develop a high-throughput workflow to identify and quantify plant polyphenols. Methods: First, a method was optimized to extract over 40 families of phytochemicals. Then, a high-throughput metabolomic platform was developed to identify and quantify 184 polyphenols in 15 min. Results: The current metabolomics study of secondary metabolites was conducted on leaves from one commercial coffee variety and two wild species that also belong to the Rubiaceae family. Global profiling was performed using liquid chromatography high-resolution time-of-flight mass spectrometry. Features whose abundance differed significantly between coffee species were discriminated using statistical analysis and annotated using spectral databases. The identified features were validated against commercially available standards using our newly developed liquid chromatography tandem mass spectrometry method. Discussion: Caffeine, trigonelline, and theobromine were highly abundant in coffee leaves, as expected. Interestingly, wild Rubiaceae leaves had a higher diversity of phytochemicals than commercial coffee: defense-related molecules, such as phenylpropanoids (e.g., cinnamic acid), the terpenoid gibberellic acid, and the monolignol sinapaldehyde, were more abundant in wild Rubiaceae leaves.