The many-body expansion (MBE) is promising for the efficient, parallel computation of lattice energies in organic crystals. Very high accuracy should be achievable by employing coupled-cluster singles, doubles, and perturbative triples at the complete basis set limit [CCSD(T)/CBS] for the dimers, trimers, and potentially tetramers resulting from the MBE, but such a brute-force approach seems impractical for crystals of all but the smallest molecules. Here, we investigate hybrid or multi-level approaches that employ CCSD(T)/CBS only for the closest dimers and trimers and utilize much faster methods like Møller–Plesset perturbation theory (MP2) for more distant dimers and trimers. For trimers, MP2 is supplemented with the Axilrod–Teller–Muto (ATM) model of three-body dispersion. MP2(+ATM) is shown to be a very effective replacement for CCSD(T)/CBS for all but the closest dimers and trimers. A limited investigation of tetramers using CCSD(T)/CBS suggests that the four-body contribution is entirely negligible. The large set of CCSD(T)/CBS dimer and trimer data should be valuable in benchmarking approximate methods for molecular crystals and allows us to see that a literature estimate of the core-valence contribution of the closest dimers to the lattice energy using just MP2 was overbinding by 0.5 kJ mol−1, and an estimate of the three-body contribution from the closest trimers using the T0 approximation in local CCSD(T) was underbinding by 0.7 kJ mol−1. Our CCSD(T)/CBS best estimate of the 0 K lattice energy is −54.01 kJ mol−1, compared to an estimated experimental value of −55.3 ± 2.2 kJ mol−1.
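As an illustration of the three-body dispersion correction mentioned above, the following is a minimal sketch of the textbook Axilrod–Teller–Muto triple-dipole term for three point-like centers. The function name `atm_energy`, the coefficient `c9`, and the use of bare point centers without damping are assumptions made for illustration; the abstract does not say how the ATM term is evaluated in that work.

```python
import math

def atm_energy(a, b, c, c9):
    """Axilrod-Teller-Muto triple-dipole energy for three point centers.

    a, b, c: Cartesian coordinates (any consistent length unit).
    c9: three-body dispersion coefficient (hypothetical input value).
    Returns c9 * (1 + 3 cos(A) cos(B) cos(C)) / (r_ab * r_bc * r_ca)**3.
    """
    r_ab = math.dist(a, b)
    r_bc = math.dist(b, c)
    r_ca = math.dist(c, a)
    # Interior angles of the triangle from the law of cosines.
    cos_a = (r_ab**2 + r_ca**2 - r_bc**2) / (2 * r_ab * r_ca)
    cos_b = (r_ab**2 + r_bc**2 - r_ca**2) / (2 * r_ab * r_bc)
    cos_c = (r_bc**2 + r_ca**2 - r_ab**2) / (2 * r_bc * r_ca)
    return c9 * (1 + 3 * cos_a * cos_b * cos_c) / (r_ab * r_bc * r_ca) ** 3

# Illustrative geometries with c9 = 1 (placeholder, not data from the paper).
linear = atm_energy((0, 0, 0), (1, 0, 0), (2, 0, 0), 1.0)  # -0.25, attractive
equilateral = atm_energy((0, 0, 0), (1, 0, 0), (0.5, math.sqrt(3) / 2, 0), 1.0)  # about 1.375, repulsive
```

The sign change between the collinear and equilateral arrangements shows why the term depends on the geometry of each trimer, not only on the distances between its members.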
Approaching coupled cluster accuracy with a general-purpose neural network potential through transfer learning
Abstract: Computational modeling of chemical and biological systems at atomic resolution is a crucial tool in the chemist’s toolset. The use of computer simulations requires a balance between cost and accuracy: quantum-mechanical methods provide high accuracy but are computationally expensive and scale poorly to large systems, while classical force fields are cheap and scalable but lack transferability to new systems. Machine learning can be used to achieve the best of both approaches. Here we train a general-purpose neural network potential (ANI-1ccx) that approaches CCSD(T)/CBS accuracy on benchmarks for reaction thermochemistry, isomerization, and drug-like molecular torsions. This is achieved by first training a network on DFT data and then using transfer learning techniques to retrain it on a dataset of gold-standard QM calculations (CCSD(T)/CBS) that optimally spans chemical space. The resulting potential is broadly applicable to materials science, biology, and chemistry, and is billions of times faster than CCSD(T)/CBS calculations.
- PAR ID: 10153429
- Publisher / Repository: Nature Publishing Group
- Date Published:
- Journal Name: Nature Communications
- Volume: 10
- Issue: 1
- ISSN: 2041-1723
- Format(s): Medium: X
- Sponsoring Org: National Science Foundation
More Like this
The focal-point approximation can be used to estimate the result of a high-accuracy, slow quantum chemistry computation by combining several lower-accuracy, faster computations. We examine the performance of focal-point methods that combine second-order Møller–Plesset perturbation theory (MP2) with coupled-cluster singles, doubles, and perturbative triples [CCSD(T)] for the calculation of harmonic frequencies and of fundamental frequencies using second-order vibrational perturbation theory (VPT2). In contrast to standard CCSD(T), the focal-point CCSD(T) method approaches the complete basis set (CBS) limit with only triple-ζ basis sets for the coupled-cluster portion of the computation. The predicted harmonic and fundamental frequencies were compared with the experimental values for a set of 20 molecules containing up to six atoms. The focal-point method combining CCSD(T)/aug-cc-pV(T + d)Z with CBS-extrapolated MP2 has a mean absolute error vs experiment of only 7.3 cm−1 for the fundamental frequencies, essentially the same as the mean absolute error for CCSD(T) extrapolated to the CBS limit using the aug-cc-pV(Q + d)Z and aug-cc-pV(5 + d)Z basis sets. However, for H2O, the focal-point procedure requires only 3% of the computation time of the extrapolated CCSD(T) calculation, and the cost savings will grow for larger molecules.
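The focal-point combination described above can be sketched for a single energy as follows. This is a minimal illustration, not the authors' workflow: the function names are hypothetical, and the two-point inverse-cubic extrapolation of the correlation energy is a commonly used scheme assumed here because the abstract does not state which extrapolation was applied.

```python
def cbs_two_point(e_x, x, e_y, y):
    """Two-point X**-3 extrapolation of a correlation energy (assumed scheme).

    e_x, e_y: correlation energies in basis sets of cardinal numbers x and y.
    Assumes E(X) = E_CBS + A / X**3 and solves for E_CBS.
    """
    return (x**3 * e_x - y**3 * e_y) / (x**3 - y**3)

def focal_point(e_mp2_cbs, e_ccsdt_small, e_mp2_small):
    """Focal-point estimate of CCSD(T)/CBS.

    Adds the CCSD(T) - MP2 difference, computed in one smaller basis set,
    to the MP2 energy at the CBS limit.
    """
    return e_mp2_cbs + (e_ccsdt_small - e_mp2_small)

# Placeholder numbers for illustration only (not data from the paper).
e_mp2_cbs = cbs_two_point(-0.99, 3, -0.99578125, 4)
e_estimate = focal_point(e_mp2_cbs, e_ccsdt_small=-0.95, e_mp2_small=-0.93)
```

The saving comes from the fact that only the MP2 calculations are run in the large basis sets, while the coupled-cluster calculation is run once in the smaller one.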
Hydrazoic acid (HN3) is used as a case study for investigating the accuracy and precision with which a molecular structure, specifically a semi-experimental equilibrium structure (r_e^SE), may be determined using current state-of-the-art methodology. The influence of the theoretical corrections for effects of vibration–rotation coupling and electron-mass distribution that are employed in the analysis is explored in detail. The small size of HN3 allowed us to deploy considerable computational resources to probe the basis-set dependence of these corrections using a series of coupled-cluster single, double, perturbative triple [CCSD(T)] calculations with cc-pCVXZ (X = D, T, Q, 5) basis sets. We extrapolated the resulting corrections to the complete basis set (CBS) limit to obtain CCSD(T)/CBS corrections, which were used in a subsequent r_e^SE structure determination. The r_e^SE parameters obtained using the CCSD(T)/cc-pCV5Z corrections are nearly identical to those obtained using the CCSD(T)/CBS corrections, with uncertainties in the bond distances and angles of less than 0.0006 Å and 0.08°, respectively. The previously obtained r_e^SE structure using CCSD(T)/ANO2 agrees with that using CCSD(T)/cc-pCV5Z to within 0.000 08 Å and 0.016° for bond distances and angles, respectively, and with only 25% larger uncertainties, validating the idea that r_e^SE structure determinations can be carried out with significantly smaller basis sets than those needed for similarly accurate, strictly ab initio determinations.
Although the purely computational r_e structural parameters [CCSD(T)/cc-pCV6Z] fall outside of the statistical uncertainties (2σ) of the corresponding r_e^SE structural parameters, the discrepancy is rectified by applying corrections to address the theoretical limitations of the CCSD(T)/cc-pCV6Z geometry with respect to basis set, electron correlation, relativity, and the Born–Oppenheimer approximation, thereby supporting the contention that the semi-experimental approach is both an accurate and vastly more efficient method for structure determinations than is brute-force computation.
Abstract: We have performed a series of highly accurate calculations on complexes of CO2 with the 20 naturally occurring amino acids to investigate their attractive noncovalent interactions. Different nucleophilic groups present in the amino acid structures were considered (α‐NH2, COOH, side groups), and the stronger binding sites were identified. A database of accurate reference interaction energies was compiled, computed by explicitly‐correlated coupled‐cluster singles‐and‐doubles together with perturbative triples, extrapolated to the complete‐basis‐set limit. The CCSD(F12)(T)/CBS reference values were used for comparing a variety of popular density functionals with different basis sets. Our results show that most density functionals with the triple‐zeta basis set def2‐TZVPP align with the CCSD(F12)(T)/CBS reference values, but errors range from 0.1 kcal/mol up to 1.0 kcal/mol.
Abstract: Maximum diversification of data is a central theme in building generalized and accurate machine learning (ML) models. In chemistry, ML has been used to develop models for predicting molecular properties, for example quantum mechanics (QM) calculated potential energy surfaces and atomic charge models. The ANI-1x and ANI-1ccx ML-based general-purpose potentials for organic molecules were developed through active learning, an automated data diversification process. Here, we describe the ANI-1x and ANI-1ccx data sets. To demonstrate data diversity, we visualize it with a dimensionality reduction scheme and contrast it against existing data sets. The ANI-1x data set contains multiple QM properties from 5 M density functional theory calculations, while the ANI-1ccx data set contains 500 k data points obtained with an accurate CCSD(T)/CBS extrapolation. Approximately 14 million CPU core-hours were expended to generate these data. Multiple QM calculated properties for the chemical elements C, H, N, and O are provided: energies, atomic forces, multipole moments, atomic charges, etc. We provide these data to the community to aid research and development of ML models for chemistry.
