This content will become publicly available on December 1, 2024
The theorems of density functional theory (DFT) establish bijective maps between the local external potential of a many-body system and its electron density, wavefunction and, therefore, one-particle reduced density matrix. Building on this foundation, we show that machine learning models based on the one-electron reduced density matrix can be used to generate surrogate electronic structure methods. We generate surrogates of local and hybrid DFT, Hartree–Fock and full configuration interaction theories for systems ranging from small molecules such as water to more complex compounds like benzene and propanol. The surrogate models use the one-electron reduced density matrix as the central quantity to be learned. From the predicted density matrices, we show that either standard quantum chemistry or a second machine-learning model can be used to compute molecular observables, energies, and atomic forces. The surrogate models can generate essentially anything that a standard electronic structure method can, ranging from band gaps and Kohn–Sham orbitals to energy-conserving ab initio molecular dynamics simulations and infrared spectra that account for anharmonicity and thermal effects, without the need to employ computationally expensive algorithms such as self-consistent field theory. The algorithms are packaged in an efficient, easy-to-use Python code, QMLearn, accessible on popular platforms.
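
The workflow described above (learn the one-electron reduced density matrix from the external potential, then evaluate observables from the predicted matrix) can be illustrated with a toy NumPy sketch. Everything here is an assumption for illustration and is not the QMLearn API: the basis is a hypothetical 4-function orthonormal basis, the "reference method" is a non-interacting Hamiltonian H = T + V standing in for DFT or HF, the learner is a plain Gaussian-kernel ridge regression, and one-electron observables are taken as Tr(γO). Note that nothing in this sketch forces the predicted matrix to have the exact electron count or idempotency.

```python
import numpy as np

N_BASIS = 4   # size of a hypothetical orthonormal basis
N_OCC = 2     # number of doubly occupied orbitals (closed shell)


def reference_density_matrix(T, V, n_occ=N_OCC):
    """Stand-in for an expensive reference method: the closed-shell
    1-RDM of the non-interacting toy Hamiltonian H = T + V."""
    _, orbitals = np.linalg.eigh(T + V)      # eigenvectors, ascending energies
    occ = orbitals[:, :n_occ]                 # lowest n_occ orbitals
    return 2.0 * occ @ occ.T                  # gamma = 2 * sum_i |phi_i><phi_i|


def gaussian_kernel(A, B, sigma):
    """Gaussian kernel between the rows of A and the rows of B."""
    sq_dist = (np.sum(A**2, axis=1)[:, None]
               + np.sum(B**2, axis=1)[None, :]
               - 2.0 * A @ B.T)
    return np.exp(-np.maximum(sq_dist, 0.0) / (2.0 * sigma**2))


class KRRDensityMatrixSurrogate:
    """Hypothetical surrogate: kernel ridge regression from flattened
    potential matrices V to flattened density matrices gamma."""

    def __init__(self, sigma=1.0, lam=1e-6):
        self.sigma = sigma
        self.lam = lam

    def fit(self, V_train, gamma_train):
        self.X = V_train.reshape(len(V_train), -1)
        Y = gamma_train.reshape(len(gamma_train), -1)
        K = gaussian_kernel(self.X, self.X, self.sigma)
        self.alpha = np.linalg.solve(K + self.lam * np.eye(len(K)), Y)
        return self

    def predict(self, V):
        k = gaussian_kernel(V.reshape(1, -1), self.X, self.sigma)
        n = V.shape[0]
        return (k @ self.alpha).reshape(n, n)


def expectation_value(gamma, operator):
    """One-electron observable <O> = Tr(gamma O) in an orthonormal basis."""
    return float(np.trace(gamma @ operator))


# Usage on synthetic toy data
rng = np.random.default_rng(0)
T = np.diag(np.arange(N_BASIS, dtype=float))   # toy "kinetic" matrix


def random_potential(scale=0.3):
    A = rng.normal(scale=scale, size=(N_BASIS, N_BASIS))
    return 0.5 * (A + A.T)                      # symmetric potential matrix


V_train = np.array([random_potential() for _ in range(50)])
gamma_train = np.array([reference_density_matrix(T, V) for V in V_train])
model = KRRDensityMatrixSurrogate(sigma=1.0).fit(V_train, gamma_train)

V_new = random_potential()
gamma_pred = model.predict(V_new)
print("electron count:", expectation_value(gamma_pred, np.eye(N_BASIS)))
print("one-electron energy:", expectation_value(gamma_pred, T + V_new))
```

The kernel model is only a stand-in; the point is that once a density matrix is available, observables follow from cheap linear algebra rather than a self-consistent field loop.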
Award ID(s): 2117429
NSF-PAR ID: 10471660
Publisher / Repository: Nature Publishing Group
Date Published:
Journal Name: Nature Communications
Volume: 14
Issue: 1
ISSN: 2041-1723
Page Range / eLocation ID: 19
Format(s): Medium: X
Sponsoring Org: National Science Foundation
More Like this

Supervised machine learning approaches have been increasingly used to accelerate electronic structure prediction as surrogates of first-principles computational methods, such as density functional theory (DFT). While numerous quantum chemistry datasets focus on chemical properties and atomic forces, accurate and efficient prediction of the Hamiltonian matrix is highly desirable, because the Hamiltonian is the fundamental physical quantity that determines the quantum states and chemical properties of physical systems. In this work, we generate a new Quantum Hamiltonian dataset, named QH9, to provide precise Hamiltonian matrices for 2,399 molecular dynamics trajectories and 130,831 stable molecular geometries, based on the QM9 dataset. By designing benchmark tasks with various molecules, we show that current machine learning models have the capacity to predict Hamiltonian matrices for arbitrary molecules. Both the QH9 dataset and the baseline models are provided to the community through an open-source benchmark, which can be highly valuable for developing machine learning methods and accelerating molecular and materials design for scientific and technological applications. Our benchmark is publicly available at https://github.com/divelab/AIRS/tree/main/OpenDFT/QHBench.
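
Why a predicted Hamiltonian matrix is so useful can be shown with a short sketch: in a non-orthogonal atomic-orbital basis with overlap matrix S, orbital energies and coefficients follow from the generalized eigenproblem HC = SCε, here solved by Löwdin (symmetric) orthogonalization. The 2×2 matrices below are a hypothetical minimal-basis dimer, not QH9 data, and the function names are illustrative, not part of the QHBench code. For this symmetric dimer the orbital energies are (α ± β)/(1 ± s) = −1.2 and −0.4.

```python
import numpy as np


def solve_generalized_eigenproblem(H, S):
    """Solve H C = S C eps via Loewdin (symmetric) orthogonalization."""
    s_vals, s_vecs = np.linalg.eigh(S)
    X = s_vecs @ np.diag(s_vals ** -0.5) @ s_vecs.T   # X = S^(-1/2), symmetric
    eps, C_prime = np.linalg.eigh(X @ H @ X)           # eigenproblem in orthogonal basis
    return eps, X @ C_prime                             # back-transform to AO coefficients


def homo_lumo_gap(eps, n_occ):
    """Gap between the lowest unoccupied and highest occupied orbital energies."""
    return float(eps[n_occ] - eps[n_occ - 1])


def hamiltonian_mae(H_pred, H_ref):
    """Mean absolute error over all Hamiltonian matrix elements."""
    return float(np.mean(np.abs(H_pred - H_ref)))


# Hypothetical minimal-basis dimer (arbitrary energy units)
S = np.array([[1.0, 0.5],
              [0.5, 1.0]])
H_ref = np.array([[-1.0, -0.8],
                  [-0.8, -1.0]])
# A stand-in "model prediction" with a small deterministic error
H_pred = H_ref + 0.01 * np.array([[1.0, 0.0],
                                  [0.0, -1.0]])

eps, C = solve_generalized_eigenproblem(H_ref, S)
print("orbital energies:", eps)
print("HOMO-LUMO gap:", homo_lumo_gap(eps, n_occ=1))
print("Hamiltonian MAE:", hamiltonian_mae(H_pred, H_ref))
```

Orthogonalizing with S^(-1/2) keeps the sketch to NumPy's symmetric eigensolver; a library routine for the generalized symmetric eigenproblem would serve equally well.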

Standard approximations for the exchange–correlation functional in Kohn–Sham density functional theory (KS-DFT) typically lead to unacceptably large errors when applied to strongly correlated electronic systems. Partition DFT (PDFT) is a formally exact reformulation of KS-DFT in which the ground-state density and energy of a system are obtained through self-consistent calculations on isolated fragments, with a partition energy representing inter-fragment interactions. Here, we show how typical errors of the local density approximation (LDA) in KS-DFT can be largely suppressed through a simple approximation for the partition energy in PDFT, the multi-fragment overlap approximation (MFOA). Our method is illustrated on simple one-dimensional models of strongly correlated linear hydrogen chains. When used in combination with the LDA for the fragments, the MFOA improves LDA dissociation curves of hydrogen chains and produces results comparable to those of spin-unrestricted LDA, but without breaking spin symmetry. The MFOA also induces a correction to the LDA electron density that partially captures the correct density dimerization in strongly correlated hydrogen chains. Moreover, with an additional correction to the partition energy that is specific to the one-dimensional LDA, the approximation produces dissociation energies in quantitative agreement with calculations based on the density matrix renormalization group method.
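
The fragment decomposition that PDFT rests on can be written compactly. The notation below is the usual one for PDFT and is an assumption here, since the abstract does not spell out the equations; the specific form of the MFOA is not given in the abstract, so it enters only through the partition energy E_p. The fragment densities n_α sum to the total density, the total energy splits into fragment energies plus the partition energy, and every fragment is solved in its own potential v_α plus a single global partition potential v_p:

```latex
\begin{aligned}
n(\mathbf{r}) &= \sum_{\alpha} n_{\alpha}(\mathbf{r}), \\
E[n] &= \sum_{\alpha} E_{\alpha}[n_{\alpha}] + E_{\mathrm{p}}[\{n_{\alpha}\}], \\
v_{\mathrm{p}}(\mathbf{r}) &= \frac{\delta E_{\mathrm{p}}[\{n_{\alpha}\}]}{\delta n_{\alpha}(\mathbf{r})}
  \quad \text{(the same for every fragment } \alpha\text{)}.
\end{aligned}
```

An approximation such as the MFOA replaces the exact E_p with a model expression, and the fragment calculations are then iterated to self-consistency in v_α + v_p.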

Accelerating the development of π-conjugated molecules for applications such as energy generation and storage, catalysis, sensing, pharmaceuticals, and (semi)conducting technologies requires rapid and accurate evaluation of their electronic, redox, and optical properties. While high-throughput computational screening has proven to be a tremendous aid in this regard, machine learning (ML) and other data-driven methods can further reduce computation time by orders of magnitude while greatly expanding the chemical space that is explored. However, the lack of benchmark datasets containing the electronic, redox, and optical properties that characterize the diverse, known chemical space of organic π-conjugated molecules limits ML model development. Here, we present a curated dataset containing 25k molecules with density functional theory (DFT) and time-dependent DFT (TD-DFT) evaluated properties that include frontier molecular orbitals, ionization energies, relaxation energies, and low-lying optical excitation energies. Using the dataset, we train a hierarchy of ML models, ranging from classical models such as ridge regression to sophisticated graph neural networks, with the molecular SMILES representation as input. We observe that graph neural networks augmented with contextual information allow for significantly better predictions across a wide array of properties. Our best-performing models also provide uncertainty quantification for their predictions. To democratize access to the data and trained models, an interactive web platform has been developed and deployed.
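
At the classical end of the model hierarchy above, a ridge-regression baseline from SMILES-derived features is only a few lines of NumPy. This is a sketch under loud assumptions: the character-count descriptor is a hypothetical toy (real models use fingerprints or molecular graphs), the targets are synthetic values from an arbitrary linear rule rather than DFT or TD-DFT data, and the bootstrap ensemble is a swapped-in, generic way to attach an uncertainty to each prediction, not the method used in the work described.

```python
import numpy as np

# Hypothetical toy descriptor: counts of selected SMILES characters.
TOKENS = ["c", "C", "N", "n", "O", "o", "S", "s", "=", "("]


def featurize(smiles):
    return np.array([smiles.count(t) for t in TOKENS], dtype=float)


def fit_ridge(X, y, lam=1.0):
    """Closed-form ridge regression with an unpenalized intercept."""
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xc = X - x_mean
    w = np.linalg.solve(Xc.T @ Xc + lam * np.eye(X.shape[1]), Xc.T @ (y - y_mean))
    b = y_mean - x_mean @ w
    return w, b


def bootstrap_ensemble(X, y, n_models=20, lam=1.0, seed=0):
    """Fit ridge models on bootstrap resamples of the training set."""
    rng = np.random.default_rng(seed)
    models = []
    for _ in range(n_models):
        idx = rng.integers(0, len(y), size=len(y))   # sample with replacement
        models.append(fit_ridge(X[idx], y[idx], lam))
    return models


def predict_with_uncertainty(models, X):
    """Ensemble mean as the prediction, ensemble spread as the uncertainty."""
    preds = np.array([X @ w + b for w, b in models])
    return preds.mean(axis=0), preds.std(axis=0)


smiles = ["c1ccccc1", "c1ccsc1", "c1ccncc1", "c1ccoc1",
          "c1ccc2ccccc2c1", "C=CC=C", "C=CC=CC=C", "O=C1C=CC(=O)C=C1"]
X = np.array([featurize(s) for s in smiles])

# Synthetic placeholder targets from an arbitrary linear rule plus noise;
# these are NOT computed molecular properties.
rng = np.random.default_rng(1)
y_toy = X @ rng.normal(size=X.shape[1]) + rng.normal(scale=0.1, size=len(smiles))

models = bootstrap_ensemble(X, y_toy, n_models=20, lam=1.0)
mean, std = predict_with_uncertainty(models, X)
for s, m, sd in zip(smiles, mean, std):
    print(f"{s:20s} {m:7.3f} +/- {sd:.3f}")
```

Centering features and targets keeps the intercept out of the penalty, which is the usual convention for ridge regression.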