

Title: Machine learning prediction of accurate atomization energies of organic molecules from low-fidelity quantum chemical calculations
Recent studies illustrate how machine learning (ML) can be used to bypass a core challenge of molecular modeling: the trade-off between accuracy and computational cost. Here, we assess multiple ML approaches for predicting the atomization energy of organic molecules. Our resulting models learn the difference between low-fidelity (B3LYP) and high-accuracy (G4MP2) atomization energies, and predict the G4MP2 atomization energy with a mean absolute error of 0.005 eV for molecules with fewer than nine heavy atoms (training set of 117,232 entries, test set of 13,026) and 0.012 eV for a small set of 66 molecules with between 10 and 14 heavy atoms. Our two best models, which have different accuracy/speed trade-offs, enable the efficient prediction of G4MP2-level energies for large molecules and are available through a simple web interface.
Award ID(s): 1636950
NSF-PAR ID: 10134744
Journal Name: MRS Communications
Volume: 9
Issue: 3
ISSN: 2159-6859
Page Range / eLocation ID: 891 to 899
Sponsoring Org: National Science Foundation
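The abstract's central idea is Δ-learning: fit a model to the gap between a cheap B3LYP energy and an expensive G4MP2 energy, then add the predicted gap back to a new B3LYP result. A minimal sketch of that idea follows, under stated assumptions; it is not the authors' models. It uses kernel ridge regression with a Gaussian kernel as one plausible regressor, the descriptor matrix `X` is a hypothetical stand-in for whatever molecular representation is used, and the names `DeltaKRR`, `gaussian_kernel`, `sigma`, and `lam` are illustrative. The demo uses synthetic numbers, not B3LYP or G4MP2 data.

```python
import numpy as np


def gaussian_kernel(A, B, sigma):
    """Gaussian (RBF) kernel matrix between the rows of A (n x d) and B (m x d)."""
    sq_dist = (
        np.sum(A**2, axis=1)[:, None]
        + np.sum(B**2, axis=1)[None, :]
        - 2.0 * A @ B.T
    )
    # Clamp tiny negative values caused by floating-point round-off.
    return np.exp(-np.maximum(sq_dist, 0.0) / (2.0 * sigma**2))


class DeltaKRR:
    """Kernel ridge regression fitted to the correction E_high - E_low (hypothetical helper)."""

    def __init__(self, sigma=1.0, lam=1e-8):
        self.sigma = sigma  # kernel width (assumed default, would need tuning)
        self.lam = lam      # ridge regularization strength (assumed default)

    def fit(self, X, e_low, e_high):
        # The learning target is the gap between the two levels of theory,
        # not the high-accuracy energy itself.
        delta = np.asarray(e_high, dtype=float) - np.asarray(e_low, dtype=float)
        self.X_train = np.asarray(X, dtype=float)
        K = gaussian_kernel(self.X_train, self.X_train, self.sigma)
        # Solve (K + lam * I) alpha = delta for the regression weights.
        self.alpha = np.linalg.solve(K + self.lam * np.eye(len(delta)), delta)
        return self

    def predict(self, X, e_low):
        # Predicted high-accuracy energy = low-fidelity energy + learned correction.
        K = gaussian_kernel(np.asarray(X, dtype=float), self.X_train, self.sigma)
        return np.asarray(e_low, dtype=float) + K @ self.alpha


def mean_absolute_error(pred, ref):
    return float(np.mean(np.abs(np.asarray(pred) - np.asarray(ref))))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = rng.normal(size=(200, 5))             # hypothetical descriptors
    e_low = X @ rng.normal(size=5)            # synthetic "low-fidelity" energies
    e_high = e_low + 0.1 * np.sin(X[:, 0])    # synthetic smooth correction
    model = DeltaKRR(sigma=2.0, lam=1e-6).fit(X[:150], e_low[:150], e_high[:150])
    pred = model.predict(X[150:], e_low[150:])
    print("test MAE (corrected):", mean_absolute_error(pred, e_high[150:]))
    print("test MAE (uncorrected):", mean_absolute_error(e_low[150:], e_high[150:]))
```

The design choice worth noting is that the regressor only has to learn the correction, which is usually smaller and smoother than the total energy; the low-fidelity calculation still has to be run for every new molecule.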
More Like this
  1. Abstract

    A catalytic surface should be stable under reaction conditions to be effective. However, screening many surfaces for stability takes significant effort, because it requires intensive quantum chemical calculations. To estimate stability more efficiently, we provide a general and data-efficient machine learning (ML) approach to accurately and efficiently predict the surface energies of metal alloy surfaces. Our ML approach introduces an element-centered fingerprint (ECFP), which we use as a vector representation for fitting models that predict surface formation energies. The ECFP is significantly more accurate than several existing feature sets when applied to dilute alloy surfaces and is competitive with existing feature sets when applied to bulk alloy surfaces or gas-phase molecules. Models using the ECFP as input can be quite general: we created models with good accuracy over a broad set of bimetallic surfaces including most d-block metals, even with relatively small datasets. For example, using the ECFP, we developed a kernel ridge regression ML model that predicts the surface energies of alloys of diverse metal combinations with a mean absolute error of 0.017 eV atom⁻¹. Combining this model with an existing model for predicting adsorption energies, we estimated segregation trends of 596 single-atom alloys (SAAs) with and without CO adsorbed on these surfaces. As a simple test of the approach, we identify specific cases where CO does not induce segregation in these SAAs.

     
  2. Abstract

    The earlier integration of validated Lennard–Jones (LJ) potentials for 8 fcc metals into materials and biomolecular force fields has advanced multiple research fields, for example, metal–electrolyte interfaces, recognition of biomolecules, colloidal assembly of metal nanostructures, alloys, and catalysis. Here we introduce 12-6 and 9-6 LJ parameters for classical all-atom simulations of 10 further fcc metals (Ac, Ca (α), Ce (γ), Es (β), Fe (γ), Ir, Rh, Sr (α), Th (α), Yb (β)) and stainless steel. The parameters reproduce lattice constants, surface energies, water interfacial energies, and interactions with (bio)organic molecules within 0.1 to 5% of experiment, as well as qualitative mechanical properties under standard conditions. Deviations are reduced by up to a factor of one hundred compared with earlier Lennard–Jones parameters, embedded atom models, and density functional theory. We also explain a quantitative correlation between experimental atomization energies and surface energies that supports parameter development. The models are computationally very efficient and applicable to an exponentially large space of alloys. Compatibility with a wide range of force fields such as the Interface force field (IFF), AMBER, CHARMM, COMPASS, CVFF, DREIDING, OPLS-AA, and PCFF enables reliable simulations of nanostructures up to millions of atoms and microsecond time scales. User-friendly model building and input generation are available in the CHARMM-GUI Nanomaterial Modeler. As a limitation, deviations in mechanical properties vary and are comparable to those of DFT methods. We discuss the incorporation of reactivity and features of the electronic structure to expand the range of applications and further increase the accuracy.

     
  3. Ultra-high-energy (UHE) photons are an important tool for studying the high-energy Universe. A plausible source of photons with exa-eV (EeV) energies is provided by UHE cosmic rays (UHECRs) undergoing the Greisen–Zatsepin–Kuzmin process (Greisen 1966; Zatsepin & Kuzmin 1966) or the pair production process (Blumenthal 1970) on a cosmic background radiation. In this context, EeV photons can probe both the UHECR mass composition and the distribution of UHECR sources (Gelmini, Kalashev & Semikoz 2008; Hooper, Taylor & Sarkar 2011). At the same time, the possible flux of photons produced by UHE protons in the vicinity of their sources through pion photoproduction or inelastic nuclear collisions would be noticeable only for relatively nearby sources, because the attenuation length of UHE photons is shorter than that of UHE protons; see, for example, Bhattacharjee & Sigl (2000) for a review. There also exists a class of so-called top-down models of UHECR generation that efficiently produce UHE photons, for instance through the decay of heavy dark-matter particles (Berezinsky, Kachelriess & Vilenkin 1997; Kuzmin & Rubakov 1998) or radiation from cosmic strings (Berezinsky, Blasi & Vilenkin 1998). The search for UHE photons has been shown to be the most sensitive method of indirect detection of heavy dark matter (Kalashev & Kuznetsov 2016, 2017; Kuznetsov 2017; Kachelriess, Kalashev & Kuznetsov 2018; Alcantara, Anchordoqui & Soriano 2019). Another fundamental-physics scenario that could be tested with UHE photons (Fairbairn, Rashba & Troitsky 2011) is photon mixing with axion-like particles (Raffelt & Stodolsky 1988), which could be responsible for the correlation of UHECR events with BL Lac type objects observed by the High Resolution Fly’s Eye (HiRes) experiment (Gorbunov et al. 2004; Abbasi et al. 2006). 
In most of these scenarios, a clustering of photon arrival directions, rather than a diffuse distribution, is expected, so point-source searches can be a suitable test of photon–axion-like-particle mixing models. Finally, UHE photons could also be used as a probe of models of Lorentz-invariance violation (Coleman & Glashow 1999; Galaverni & Sigl 2008; Maccione, Liberati & Sigl 2010; Rubtsov, Satunin & Sibiryakov 2012, 2014). The Telescope Array (TA; Tokuno et al. 2012; Abu-Zayyad et al. 2013c) is the largest cosmic ray experiment in the Northern Hemisphere. It is located at 39.3° N, 112.9° W in Utah, USA. The observatory includes a surface detector array (SD) and 38 fluorescence telescopes grouped into three stations. The SD consists of 507 stations (SD stations) that contain plastic scintillators, each with an area of 3 m². The stations are placed on a square grid with 1.2 km spacing and cover an area of ∼700 km². The TA SD is capable of detecting extensive air showers (EASs) in the atmosphere caused by cosmic particles of EeV and higher energies. The TA SD has been operating since 2008 May. A hadron-induced EAS differs significantly from a photon-induced EAS: the depth of the shower maximum $X_{\max}$ is larger for a photon shower, and a photon shower contains fewer muons and has a more curved front (see Risse & Homola 2007 for a review). The TA SD stations are sensitive to both the muon and electromagnetic components of the shower and can therefore be triggered by both hadron-induced and photon-induced EAS events. In the present study, we use 9 yr of TA SD data for a blind search for point sources of UHE photons. We exploit the statistics of the SD data, which benefit from a high duty cycle. The full Monte Carlo (MC) simulation of proton-induced and photon-induced EAS events allows us to perform the photon search up to the highest accessible energies, $E \gtrsim 10^{20}$ eV. 
As the main tool for the present photon search, we use a multivariate analysis based on a number of SD parameters that make it possible to distinguish between photon and hadron primaries. While searches for diffuse UHE photons have been performed by several EAS experiments, including Haverah Park (Ave et al. 2000), AGASA (Shinozaki et al. 2002; Risse et al. 2005), Yakutsk (Rubtsov et al. 2006; Glushkov et al. 2007, 2010), Pierre Auger (Abraham et al. 2007, 2008a; Bleve 2016; Aab et al. 2017c) and TA (Abu-Zayyad et al. 2013b; Abbasi et al. 2019a), the search for point sources of UHE photons has been performed only by the Pierre Auger Observatory (Aab et al. 2014, 2017a). The latter searches were based on hybrid data and were limited to the energy range $10^{17.3} < E < 10^{18.5}$ eV. In the present paper, we use the TA SD data alone. We perform the searches in five energy ranges: $E > 10^{18}$, $E > 10^{18.5}$, $E > 10^{19}$, $E > 10^{19.5}$ and $E > 10^{20}$ eV. We find no significant evidence of photon point sources in any energy range, and we set point-source flux upper limits for each direction in the TA field of view (FOV). A search for unspecified neutral particles was also previously performed by the TA (Abbasi et al. 2015). The limit on the point-source flux of neutral particles obtained in that work is close to the present photon point-source flux limits. 
  4. We applied the localized orbital scaling correction (LOSC) in the Bethe–Salpeter equation (BSE) to predict accurate excitation energies for molecules. LOSC systematically eliminates the delocalization error in density functional approximations and can approximate quasiparticle (QP) energies with accuracy similar to or better than the GW Green’s function approach at much lower computational cost. The QP energies from LOSC, instead of the commonly used $G_0W_0$ and ev$GW$ QP energies, are used directly in the BSE. We show that the BSE/LOSC approach greatly outperforms the commonly used BSE/$G_0W_0$ approach for predicting excitations of different characters. For the Truhlar–Gagliardi test set, which contains valence, charge-transfer, and Rydberg excitations, BSE/LOSC with the Tamm–Dancoff approximation provides accuracy comparable to time-dependent density functional theory (TDDFT) and BSE/ev$GW$. For the Stein CT test set and Rydberg excitations of atoms, BSE/LOSC considerably outperforms both BSE/$G_0W_0$ and TDDFT, with a reduced starting-point dependence. BSE/LOSC is thus a promising and efficient approach for calculating excitation energies of molecular systems. 
  5. Abstract

    Designing a new heterostructure electrode poses many challenges associated with interface engineering. Demanding simulation resources and the lack of heterostructure databases continue to be a barrier to understanding the chemistry and mechanics of complex interfaces through simulation. Mixed-dimensional heterostructures composed of two-dimensional (2D) and three-dimensional (3D) materials are undisputed next-generation materials for engineered devices due to their tunable properties. The present work computationally investigates the interface between 2D graphene and 3D tin (Sn) using density functional theory (DFT). These computationally demanding simulation data are further used to develop machine learning (ML)-based potential energy surfaces (PES). The approach to developing PES for complex interface systems with limited data, and the transferability of such models, are discussed. To develop PES for graphene–tin interface systems, we use high-dimensional neural networks (HDNN) that rely on atom-centered symmetry functions to represent structural information. The HDNN are modified to train on the total energies of the interface system rather than on atomic energies. The performance of the modified HDNN, trained on 5789 graphene|Sn interface structures, is tested on new interfaces of the same material pair with varying levels of structural deviation from the training dataset. The root-mean-squared error (RMSE) for the test interfaces falls in the range of 0.01–0.45 eV/atom, depending on the structural deviation from the reference training dataset. By avoiding an incorrect decomposition of the total energy into atomic energies, the modified HDNN model is shown to obtain higher accuracy and transferability despite a limited dataset. The improved accuracy of this ML-based modeling approach promises a cost-effective means of designing interfaces in heterostructure energy storage systems with longer cycle life and greater stability. 
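Item 2 refers to 12-6 and 9-6 Lennard–Jones potentials. As a reference point, the sketch below evaluates both functional forms in the well-depth/equilibrium-distance parameterization commonly used for 12-6 (CHARMM/CVFF-type) and 9-6 (PCFF/COMPASS-type) force fields, in which each potential reaches its minimum value −ε at r = r₀. This is an illustrative sketch, not the published parameter set: the function names `lj_12_6` and `lj_9_6` are hypothetical, the numbers in the demo are placeholders, and combination rules for unlike atom pairs are omitted.

```python
import numpy as np


def lj_12_6(r, eps, r0):
    """12-6 Lennard-Jones energy eps*[(r0/r)^12 - 2(r0/r)^6]; minimum of -eps at r = r0."""
    x = r0 / np.asarray(r, dtype=float)
    return eps * (x**12 - 2.0 * x**6)


def lj_9_6(r, eps, r0):
    """9-6 Lennard-Jones energy eps*[2(r0/r)^9 - 3(r0/r)^6]; minimum of -eps at r = r0."""
    x = r0 / np.asarray(r, dtype=float)
    return eps * (2.0 * x**9 - 3.0 * x**6)


if __name__ == "__main__":
    eps, r0 = 5.0, 3.0  # placeholder values (e.g. kcal/mol and angstrom), not fitted parameters
    r = np.linspace(2.5, 8.0, 5)
    print("12-6:", lj_12_6(r, eps, r0))
    print("9-6: ", lj_9_6(r, eps, r0))
```

Because the repulsive term grows as r⁻⁹ rather than r⁻¹², the 9-6 form is softer at short range, so ε and r₀ fitted for one form cannot simply be reused in the other.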