Title: Apples to apples comparison of standardized to unstandardized principal component analysis of methods that assign partial atomic charges in molecules
Articles by Cho et al. (ChemPhysChem, 2020, 21, 688–696) and Manz (RSC Adv., 2020, 10, 44121–44148) performed unstandardized and standardized principal component analysis (PCA), respectively, to study atomic charge assignment methods for molecular systems. Both articles used subsets of atomic charges computed by Cho et al.; however, the data subsets employed were not strictly identical. Herein, an element by element analysis of this dataset is first performed to compare the spread of charge values across individual chemical elements and charge assignment methods. This reveals an underlying problem with the reported Becke partial atomic charges in this dataset. Due to their unphysical values, these Becke charges were not included in the subsequent PCA. Standardized and unstandardized PCA are performed across two datasets: (i) 19 charge assignment methods having a complete basis set limit and (ii) all 25 charge assignment methods (excluding Becke) for which Cho et al. computed atomic charges. The dataset contained ∼2000 molecules having a total of 29 907 atoms in materials. The following five methods (listed here in alphabetical order) showed the greatest correlation to the first principal component in standardized and unstandardized PCA: DDEC6, Hirshfeld-I, ISA, MBIS, and MBSBickelhaupt (note: MBSBickelhaupt does not appear in the 19 methods dataset). For standardized PCA, the DDEC6 method ranked first followed closely by MBIS. For unstandardized PCA, Hirshfeld-I (19 methods) or MBSBickelhaupt (25 methods) ranked first followed by DDEC6 in second place (both 19 and 25 methods).
Award ID(s): 1555376
PAR ID: 10384167
Journal Name: RSC Advances
Volume: 12
Issue: 49
ISSN: 2046-2069
Page Range / eLocation ID: 31617 to 31628
Format(s): Medium: X
Sponsoring Org: National Science Foundation
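The difference between the two PCA variants compared in the abstract above can be illustrated with a minimal sketch. It assumes the usual definitions: unstandardized PCA diagonalizes the covariance matrix of the mean-centered data, while standardized PCA diagonalizes the correlation matrix (equivalently, the covariance of data scaled to unit variance per column). The data here are synthetic and are not the Cho et al. charges; the function names `make_demo_data` and `first_pc_correlations` are hypothetical and chosen only for illustration.

```python
import numpy as np


def make_demo_data(seed=0, n_atoms=2000):
    """Synthetic stand-in for a (atoms x methods) charge table.

    Columns 0 and 1 track a shared latent signal closely; column 2 tracks
    it only weakly. These are NOT real charge assignment results.
    """
    rng = np.random.default_rng(seed)
    latent = rng.normal(size=n_atoms)
    a = latent + 0.5 * rng.normal(size=n_atoms)
    b = latent + 0.5 * rng.normal(size=n_atoms)
    c = 0.5 * latent + rng.normal(size=n_atoms)
    return np.column_stack([a, b, c])


def first_pc_correlations(X, standardize):
    """Return (|correlation of each column with PC1|, PC1 variance fraction)."""
    Z = X - X.mean(axis=0)                 # mean-center every column
    if standardize:
        Z = Z / Z.std(axis=0, ddof=1)      # unit variance per column
    C = np.cov(Z, rowvar=False)            # correlation matrix if standardized
    eigvals, eigvecs = np.linalg.eigh(C)   # eigenvalues in ascending order
    v = eigvecs[:, -1]                     # eigenvector of the largest eigenvalue
    scores = Z @ v                         # PC1 score for every row
    corr = np.array(
        [np.corrcoef(Z[:, j], scores)[0, 1] for j in range(Z.shape[1])]
    )
    # The sign of an eigenvector is arbitrary, so report absolute values.
    return np.abs(corr), eigvals[-1] / eigvals.sum()


if __name__ == "__main__":
    X = make_demo_data()
    # Inflate the magnitude of the weakly related column by a factor of 10.
    X_rescaled = X * np.array([1.0, 1.0, 10.0])
    for label, data in [("original", X), ("rescaled", X_rescaled)]:
        for standardize in (False, True):
            corr, frac = first_pc_correlations(data, standardize)
            kind = "standardized" if standardize else "unstandardized"
            print(f"{label:9s} {kind:14s} corr={np.round(corr, 3)} frac={frac:.3f}")
```

In this sketch, rescaling one column leaves the standardized result unchanged, whereas the unstandardized first principal component shifts toward the column with the largest variance. This is one way to see why the ranking of methods can differ between the two analyses, though it makes no claim about the actual rankings reported for the real dataset.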
More Like this
  1. This article studies two kinds of information extracted from statistical correlations between methods for assigning net atomic charges (NACs) in molecules. First, relative charge transfer magnitudes are quantified by performing instant least squares fitting (ILSF) on the NACs reported by Cho et al. (ChemPhysChem, 2020, 21, 688–696) across 26 methods applied to ∼2000 molecules. The Hirshfeld and Voronoi deformation density (VDD) methods had the smallest charge transfer magnitudes, while the quantum theory of atoms in molecules (QTAIM) method had the largest charge transfer magnitude. Methods optimized to reproduce the molecular dipole moment (e.g., ACP, ADCH, CM5) have smaller charge transfer magnitudes than methods optimized to reproduce the molecular electrostatic potential (e.g., CHELPG, HLY, MK, RESP). Several methods had charge transfer magnitudes even larger than the electrostatic potential fitting group. Second, confluence between different charge assignment methods is quantified to identify which charge assignment method produces the best NAC values for predicting via linear correlations the results of 20 charge assignment methods having a complete basis set limit across the dataset of ∼2000 molecules. The DDEC6 NACs were the best such predictor of the entire dataset. Seven confluence principles are introduced explaining why confluent quantitative descriptors offer predictive advantages for modeling a broad range of physical properties and target applications. These confluence principles can be applied in various fields of scientific inquiry. A theory is derived showing confluence is better revealed by standardized statistical analysis (e.g., principal components analysis of the correlation matrix and standardized reversible linear regression) than by unstandardized statistical analysis. These confluence principles were used together with other key principles and the scientific method to make assigning atom-in-material properties non-arbitrary. The N@C60 system provides an unambiguous and non-arbitrary falsifiable test of atomic population analysis methods. The HLY, ISA, MK, and RESP methods failed for this material.
  2. The DDEC6 method is one of the most accurate and broadly applicable atomic population analysis methods. It works for a broad range of periodic and non-periodic materials with no magnetism, collinear magnetism, and non-collinear magnetism irrespective of the basis set type. First, we show DDEC6 charge partitioning to assign net atomic charges corresponds to solving a series of 14 Lagrangians in order. Then, we provide flow diagrams for overall DDEC6 analysis, spin partitioning, and bond order calculations. We wrote an OpenMP parallelized Fortran code to provide efficient computations. We show that by storing large arrays as shared variables in cache line friendly order, memory requirements are independent of the number of parallel computing cores and false sharing is minimized. We show that both total memory required and the computational time scale linearly with increasing numbers of atoms in the unit cell. Using the presently chosen uniform grids, computational times of ∼9 to 94 seconds per atom were required to perform DDEC6 analysis on a single computing core in an Intel Xeon E5 multi-processor unit. Parallelization efficiencies were usually >50% for computations performed on 2 to 16 cores of a cache coherent node. As examples we study a B-DNA decamer, nickel metal, supercells of hexagonal ice crystals, six X@C60 endohedral fullerene complexes, a water dimer, a Mn12-acetate single molecule magnet exhibiting collinear magnetism, a Fe4O12N4C40H52 single molecule magnet exhibiting non-collinear magnetism, and several spin states of an ozone molecule. Efficient parallel computation was achieved for systems containing as few as one and as many as >8000 atoms in a unit cell. We varied many calculation factors (e.g., grid spacing, code design, thread arrangement, etc.) and report their effects on calculation speed and precision. We make recommendations for excellent performance.
  3. Discovery of target‐binding molecules, such as aptamers and peptides, is usually performed with the use of high‐throughput experimental screening methods. These methods typically generate large datasets of sequences of target‐binding molecules, which can be enriched with high affinity binders. However, the identification of the highest affinity binders from these large datasets often requires additional low‐throughput experiments or other approaches. Bioinformatics‐based analyses could be helpful to better understand these large datasets and identify the parts of the sequence space enriched with high affinity binders. BinderSpace is an open‐source Python package that performs motif analysis, sequence space visualization, clustering analyses, and sequence extraction from clusters of interest. The motif analysis, resulting in text‐based and visual output of motifs, can also provide heat maps of previously measured user‐defined functional properties for all the motif‐containing molecules. Users can also run principal component analysis (PCA) and t‐distributed stochastic neighbor embedding (t‐SNE) analyses on whole datasets and on motif‐related subsets of the data. Functionally important sequences can also be highlighted in the resulting PCA and t‐SNE maps. If points (sequences) in two‐dimensional maps in PCA or t‐SNE space form clusters, users can perform clustering analyses on their data, and extract sequences from clusters of interest. We demonstrate the use of BinderSpace on a dataset of oligonucleotides binding to single‐wall carbon nanotubes in the presence and absence of a bioanalyte, and on a dataset of cyclic peptidomimetics binding to bovine carbonic anhydrase protein. BinderSpace is openly accessible to the public via the GitHub website: https://github.com/vukoviclab/BinderSpace.
  4. This paper explores variability in the fundamental frequency (f0) of utterances containing the remote past marker BIN in African American English, which has been described as having higher f0, intensity and duration relative to preceding material, and reduced f0 following, though with some interspeaker variability (Green et al. 2022). Here we re-analyze data from Green et al. (2022) to characterize the space of possible phonetic realizations of BIN utterances. We computed the 90th percentile f0 value in pre-/on-/post-BIN regions to create a 3-point "topline" f0 shape profile of the utterance (Cooper & Sorensen 1981) and performed time series clustering and principal components analysis (PCA). Two clusters were identified, one with higher f0 on BIN and lower f0 post-BIN, and one with lower f0 on BIN and higher f0 post-BIN. Results from PCA indicate speakers vary along two dimensions: one relating to pre-BIN f0 and one to post-BIN f0. Both dimensions were tied to f0 height on BIN, demonstrating the role that global aspects of the contour play in the variability. We show how the topline representation of f0 contour shape is robust to missing values and uncontrolled sentences and thus useful for naturalistic speech. 
  5. The closed-form solution of the 1.5 post-Newtonian (PN) accurate binary black hole (BBH) Hamiltonian system has proven elusive for a long time since the introduction of the system in 1966. Solutions of the PN BBH systems with arbitrary parameters (masses, spins, eccentricity) are required for modeling the gravitational waves emitted by them. Accurate models of gravitational waves are crucial for their detection by LIGO/Virgo and LISA. Only recently, two solution methods for solving the BBH dynamics were proposed in Ref. [G. Cho and H. M. Lee, Phys. Rev. D 100, 044046 (2019)] (without using action-angle variables), and Refs. [S. Tanay et al., Phys. Rev. D 103, 064066 (2021), S. Tanay et al., Phys. Rev. D 107, 103040 (2023)] (action-angle based). This paper combines the ideas laid out in the above articles, fills the gaps, and compiles the two solutions, which are fully 1.5PN accurate. We also present a public Mathematica package bbhpntoolkit which implements these two solutions and compares them with the result of numerical integration of the evolution equations. The level of agreement between these solutions provides a numerical verification for all the five action variables constructed in Refs. [S. Tanay et al., Phys. Rev. D 103, 064066 (2021), S. Tanay et al., Phys. Rev. D 107, 103040 (2023)]. This paper hence serves as a stepping stone for pushing the action-angle-based solution to 2PN order via canonical perturbation theory.