In this work, we describe a simple approach to select the most important molecular orbitals (MOs) for computing the optical rotation tensor through linear response (LR) Kohn‐Sham density functional theory (KS‐DFT). Taking advantage of the iterative nature of the algorithms commonly used to solve the LR equations, we select the MOs whose contributions to the guess perturbed density are larger than a certain threshold and solve the LR equations with the selected MOs only. We propose two criteria for the selection and two definitions of the selection threshold. We then test the approach with two functionals (B3LYP and CAM‐B3LYP) and two basis sets (aug‐cc‐pVDZ and aug‐cc‐pVTZ) on a set of 51 organic molecules with specific rotation spanning five orders of magnitude, 10⁰–10⁴ deg (dm⁻¹ (g/mL)⁻¹). We show that this approach can indeed provide very accurate values of specific rotation, with an estimated speedup of 2–8× for the most conservative selection criterion and up to 20–30× for the intermediate criterion.
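As an illustration, the threshold-based screening could look like the following sketch. This is a minimal NumPy model, not the paper's actual method: the function name `select_mos` and the contribution measure (the relative row norm of the guess perturbed density in the MO basis) are assumptions for illustration; the paper defines two selection criteria and two thresholds of its own.

```python
import numpy as np

def select_mos(perturbed_density, threshold):
    """Keep the MOs whose relative contribution to the guess perturbed
    density exceeds `threshold`. The contribution measure used here
    (row 2-norm, normalized to the largest) is an illustrative choice,
    not the paper's actual criterion."""
    contrib = np.linalg.norm(perturbed_density, axis=1)
    contrib = contrib / contrib.max()        # relative contributions in (0, 1]
    return np.flatnonzero(contrib >= threshold)

# Toy guess perturbed density over 5 MOs with rapidly decaying weights
rng = np.random.default_rng(0)
D = rng.normal(size=(5, 5)) * np.array([[1.0], [0.5], [0.05], [0.01], [0.001]])

kept = select_mos(D, threshold=0.02)  # the LR equations would then be
print(kept)                           # solved in this reduced MO space only
```

Because the selection happens once, on the guess density, the cost of the screening itself is negligible next to the iterative LR solve it shrinks.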
- Award ID(s): 2213324
- PAR ID: 10414033
- Date Published:
- Journal Name: Journal of Chemical Theory and Computation
- ISSN: 1549-9618
- Format(s): Medium: X
- Sponsoring Org: National Science Foundation
More Like this
-
Current neural networks for predictions of molecular properties use quantum chemistry only as a source of training data. This paper explores models that use quantum chemistry as an integral part of the prediction process. This is done by implementing self-consistent-charge Density-Functional-Tight-Binding (DFTB) theory as a layer for use in deep learning models. The DFTB layer takes, as input, Hamiltonian matrix elements generated from earlier layers and produces, as output, electronic properties from self-consistent field solutions of the corresponding DFTB Hamiltonian. Backpropagation enables efficient training of the model to target electronic properties. Two types of input to the DFTB layer are explored: splines and feed-forward neural networks. Because overfitting can cause models trained on smaller molecules to perform poorly on larger molecules, regularizations are applied that penalize nonmonotonic behavior and deviation of the Hamiltonian matrix elements from those of the published DFTB model used to initialize the model. The approach is evaluated on 15,700 hydrocarbons by comparing the root-mean-square error in energy and dipole moment, on test molecules with eight heavy atoms, to the error from the initial DFTB model. When trained on molecules with up to seven heavy atoms, the spline model reduces the test error in energy by 60% and in dipole moments by 42%. The neural network model performs somewhat better, with error reductions of 67% and 59%, respectively. Training on molecules with up to four heavy atoms reduces performance, with both the spline and neural network models reducing the test error in energy by about 53% and in dipole by about 25%.
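The self-consistency inside such a layer can be sketched with a toy model Hamiltonian. This is a minimal NumPy loop, not the published DFTB parameterization: the damping factor, the Mulliken-like charge definition, and every number below are illustrative assumptions.

```python
import numpy as np

def scc_layer(H0, gamma, n_occ, max_iter=100, tol=1e-8):
    """Toy self-consistent-charge solve: maps Hamiltonian matrix
    elements H0 (which the real model receives from earlier layers)
    to orbital energies and converged charge fluctuations."""
    n = H0.shape[0]
    q = np.zeros(n)                       # charge fluctuations, one per site
    for _ in range(max_iter):
        H = H0 + np.diag(gamma @ q)       # charge-dependent on-site shifts
        eps, C = np.linalg.eigh(H)        # SCF-like diagonalization
        occ = C[:, :n_occ]                # doubly occupied orbitals
        q_new = 2.0 * (occ**2).sum(axis=1) - 1.0  # Mulliken-like charge
        if np.max(np.abs(q_new - q)) < tol:       # minus a 1 e- reference
            return eps, q_new
        q = 0.5 * q + 0.5 * q_new         # damped update for stability
    return eps, q

# Made-up 3-site Hamiltonian and charge-coupling matrix
H0 = np.array([[ 0.0, -1.0,  0.0],
               [-1.0,  0.2, -1.0],
               [ 0.0, -1.0,  0.0]])
gamma = np.array([[0.40, 0.10, 0.05],
                  [0.10, 0.40, 0.10],
                  [0.05, 0.10, 0.40]])
eps, q = scc_layer(H0, gamma, n_occ=1)
```

In the actual model every step here (build H, diagonalize, recompute charges) is differentiable, which is what lets backpropagation push gradients from the electronic properties back into the layers that produced the matrix elements.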
-
The ability to accurately and reliably obtain images of shallow subsurface anomalies within the Earth is important for hazard monitoring and a fundamental understanding of many geologic structures, such as volcanic edifices. In recent years, machine learning (ML) has gained increasing attention as a novel approach for addressing complex problems in the geosciences. Here we present an ML-based inversion method to integrate cosmic-ray muon and gravity data sets for shallow subsurface density imaging at a volcano. Starting with an ensemble of random density anomalies, we use physics-based forward calculations to find the corresponding set of expected gravity and muon attenuation observations. Given a large enough ensemble of synthetic density patterns and observations, the ML algorithm is trained to recognize the expected spatial relations within the synthetic input–output pairs, learning the inherent physical relationships between them. Once trained, the ML algorithm can then interpolate the best-fitting anomalous pattern given data that were not used in training, such as those obtained from field measurements. We test the validity of our ML algorithm using field data from the Showa-Shinzan lava dome (Mt Usu, Japan) and show that our model produces results consistent with those obtained using a more traditional Bayesian joint inversion. Our results are similar to the previously published inversion, and suggest that the Showa-Shinzan lava dome consists of a relatively high-density (2200–2400 kg m–3) cylindrical anomaly, about 300 m in diameter. Adding noise to synthetic training and testing data sets shows that, as expected, the ML algorithm is most robust in areas of high sensitivity, as determined by the forward kernels. Overall, we find that ML offers a viable alternative to a Bayesian joint inversion when used with gravity and muon data sets for subsurface density imaging.
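The train-on-synthetic-pairs idea can be sketched with a linear stand-in for both the forward kernels and the ML model. All shapes, noise levels, and the least-squares "network" below are assumptions for illustration, not the paper's setup.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical linear forward kernels mapping 20 density cells to
# 30 stacked gravity + muon-attenuation observations.
n_cells, n_obs, n_train = 20, 30, 5000
G = rng.normal(size=(n_obs, n_cells))

# 1. Ensemble of random synthetic density patterns and their
#    physics-based forward data (with a little observational noise).
rho_train = rng.normal(size=(n_train, n_cells))
d_train = rho_train @ G.T + 0.01 * rng.normal(size=(n_train, n_obs))

# 2. "Train" the simplest possible stand-in for the ML model:
#    linear least squares from observations back to density.
W, *_ = np.linalg.lstsq(d_train, rho_train, rcond=None)

# 3. Recover the best-fitting anomaly for data not seen in training
#    (here a synthetic stand-in for the field measurements).
rho_true = rng.normal(size=n_cells)
rho_hat = (rho_true @ G.T) @ W
```

The design choice the sketch highlights is that the inverse mapping is learned entirely from forward-modeled pairs, so the physics enters only through the kernel `G`; a real volcano model replaces the linear regression with a neural network and `G` with the gravity and muon forward operators.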
-
Coarse graining techniques play an essential role in accelerating molecular simulations of systems with large length and time scales. Theoretically grounded bottom-up models are appealing due to their thermodynamic consistency with the underlying all-atom models. In this direction, machine learning approaches hold great promise for fitting complex many-body data. However, training models may require collection of large amounts of expensive data. Moreover, quantifying trained model accuracy is challenging, especially in cases of non-trivial free energy configurations, where training data may be sparse. We demonstrate a path towards uncertainty-aware models of coarse-grained free energy surfaces. Specifically, we show that principled Bayesian model uncertainty allows for efficient data collection through an on-the-fly active learning framework and opens the possibility of adaptive transfer of models across different chemical systems. Uncertainties also characterize models’ accuracy of free energy predictions, even when training is performed only on forces. This work helps pave the way towards efficient autonomous training of reliable, uncertainty-aware, many-body machine-learned coarse-grained models.
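A minimal picture of model uncertainty driving active learning on a toy 1-D free energy surface: the double-well function, the noisy-refit ensemble standing in for a Bayesian posterior, and the variance-maximizing acquisition rule are all illustrative assumptions, not the paper's method.

```python
import numpy as np

rng = np.random.default_rng(2)

def free_energy(x):
    return x**4 - 2.0 * x**2          # toy double-well free energy surface

# Sparse, deliberately uneven training data
x_train = np.array([-1.5, -0.5, 1.2])
y_train = free_energy(x_train)

# Ensemble of noisy refits as a cheap stand-in for a Bayesian
# posterior over models (low-order polynomials as the "models").
x_grid = np.linspace(-1.5, 1.5, 61)
preds = []
for _ in range(200):
    y_noisy = y_train + 0.05 * rng.normal(size=y_train.size)
    coef = np.polyfit(x_train, y_noisy, deg=2)
    preds.append(np.polyval(coef, x_grid))
preds = np.array(preds)

mean = preds.mean(axis=0)             # model prediction
std = preds.std(axis=0)               # per-point model uncertainty

# On-the-fly active learning: spend the next expensive simulation
# where the ensemble disagrees the most.
x_next = x_grid[np.argmax(std)]
```

The acquisition lands in the data-poor region beyond the last training point, which is the qualitative behavior that makes uncertainty-guided data collection efficient.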
-
Recent studies illustrate how machine learning (ML) can be used to bypass a core challenge of molecular modeling: the trade-off between accuracy and computational cost. Here, we assess multiple ML approaches for predicting the atomization energy of organic molecules. Our resulting models learn the difference between low-fidelity (B3LYP) and high-accuracy (G4MP2) atomization energies and predict the G4MP2 atomization energy to within 0.005 eV (mean absolute error) for molecules with fewer than nine heavy atoms (training set of 117,232 entries, test set of 13,026) and to within 0.012 eV for a small set of 66 molecules with between 10 and 14 heavy atoms. Our two best models, which have different accuracy/speed trade-offs, enable the efficient prediction of G4MP2-level energies for large molecules and are available through a simple web interface.
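The underlying delta-learning recipe (learn the cheap-to-accurate correction, then add it back to the cheap result) can be sketched as follows. The descriptors, the ridge regression model, and the synthetic energies are assumptions for illustration, not the paper's actual models or data.

```python
import numpy as np

rng = np.random.default_rng(3)

# Synthetic stand-ins: descriptors X, cheap (B3LYP-like) energies, and
# accurate (G4MP2-like) energies differing by a small systematic gap.
n, d = 2000, 8
X = rng.normal(size=(n, d))
e_cheap = X @ rng.normal(size=d) + 0.5 * rng.normal(size=n)
gap = 0.1 * (X @ rng.normal(size=d)) + 0.01 * rng.normal(size=n)
e_accurate = e_cheap + gap

# Delta-learning: regress only the low-to-high-level *difference*
# on the descriptors (ridge regression as a stand-in for the model).
X_tr, X_te = X[:1500], X[1500:]
y_tr = (e_accurate - e_cheap)[:1500]
w = np.linalg.solve(X_tr.T @ X_tr + 1e-6 * np.eye(d), X_tr.T @ y_tr)

# Final prediction = cheap energy plus the learned correction
e_pred = e_cheap[1500:] + X_te @ w
mae = np.mean(np.abs(e_pred - e_accurate[1500:]))
```

The point of the design is that the correction is much smoother and smaller in magnitude than the total energy, so a modest model (and far less high-accuracy training data) suffices to reach near-G4MP2 quality.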