Abstract Large density functional theory (DFT) databases are a treasure trove of energies, forces, and stresses that can be used to train machine-learned interatomic potentials for atomistic modeling. Herein, we employ structural relaxations from the AFLOW database to train moment tensor potentials (MTPs) for four carbide systems: CHfTa, CHfZr, CMoW, and CTaTi. The resulting MTPs are used to relax ~6300 random symmetric structures, and are subsequently improved via active learning to generate robust potentials (RP) that can relax a wide variety of structures, and accurate potentials (AP) designed for the relaxation of low-energy systems. This protocol is shown to yield convex hulls that are indistinguishable from those predicted by AFLOW for the CHfTa, CHfZr, and CTaTi systems and, in the case of the CMoW system, to predict thermodynamically stable structures that are not found within AFLOW, highlighting the potential of the employed protocol within crystal structure prediction. Relaxation of over three hundred (Mo1−xWx)C stoichiometry crystals, first with the RP and then with the AP, yields formation enthalpies that are in excellent agreement with those obtained via DFT.
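The convex-hull stability test mentioned in this abstract can be illustrated with a minimal sketch. It assumes a one-dimensional pseudo-binary section (formation enthalpy versus the W fraction x in (Mo1−xWx)C) rather than the full ternary hull used in the work, and every number, unit, and function name below is hypothetical, chosen only for illustration.

```python
def cross(o, a, b):
    """z-component of (a - o) x (b - o); positive means a left (counter-clockwise) turn."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])

def lower_hull(points):
    """Lower convex hull of (x, enthalpy) points, returned left to right."""
    best = {}
    for x, h in points:
        # Only the lowest-enthalpy structure at each composition can be on the hull.
        best[x] = min(h, best.get(x, h))
    hull = []
    for p in sorted(best.items()):
        # Drop the previous vertex while it lies on or above the new edge.
        while len(hull) >= 2 and cross(hull[-2], hull[-1], p) <= 0:
            hull.pop()
        hull.append(p)
    return hull

def hull_energy(x, hull):
    """Linearly interpolate the hull enthalpy at composition x."""
    for (x0, h0), (x1, h1) in zip(hull, hull[1:]):
        if x0 <= x <= x1:
            return h0 + (h1 - h0) * (x - x0) / (x1 - x0)
    raise ValueError("composition outside the hull range")

def energy_above_hull(x, h, hull):
    """Distance of a structure above the hull; zero means it is on the hull."""
    return h - hull_energy(x, hull)

# Hypothetical formation enthalpies (eV/atom) for illustration only.
structures = [(0.0, 0.0), (0.25, -0.02), (0.5, -0.05),
              (0.5, -0.01), (0.75, -0.02), (1.0, 0.0)]
hull = lower_hull(structures)
for x, h in structures:
    print(f"x={x:.2f}  H_f={h:+.3f}  above hull={energy_above_hull(x, h, hull):.3f}")
```

Structures with zero distance above the hull are the thermodynamically stable ones in this simplified picture; all others are predicted to decompose into the neighbouring hull vertices.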
This content will become publicly available on December 1, 2025
The seventh blind test of crystal structure prediction: structure ranking methods
A seventh blind test of crystal structure prediction has been organized by the Cambridge Crystallographic Data Centre. The results are presented in two parts, with this second part focusing on methods for ranking crystal structures in order of stability. The exercise involved standardized sets of structures seeded from a range of structure generation methods. Participants from 22 groups applied several periodic DFT-D methods, machine learned potentials, force fields derived from empirical data or quantum chemical calculations, and various combinations of the above. In addition, one non-energy-based scoring function was used. Results showed that periodic DFT-D methods overall agreed with experimental data within expected error margins, while one machine learned model, applying system-specific AIMnet potentials, agreed with experiment in many cases, demonstrating promise as an efficient alternative to DFT-based methods. For target XXXII, a consensus was reached across periodic DFT methods, with consistently high predicted energies of experimental forms relative to the global minimum (above 4 kJ mol⁻¹ at both low and ambient temperatures) suggesting a more stable polymorph is likely not yet observed. The calculation of free energies at ambient temperatures offered improvement of predictions only in some cases (for targets XXVII and XXXI). Several avenues for future research have been suggested, highlighting the need for greater efficiency considering the vast amounts of resources utilized in many cases.
- PAR ID:
- 10579946
- Publisher / Repository:
- International Union of Crystallography
- Journal Name:
- Acta Crystallographica Section B Structural Science, Crystal Engineering and Materials
- Volume:
- 80
- Issue:
- 6
- ISSN:
- 2052-5206
- Page Range / eLocation ID:
- 548 to 574
- Format(s):
- Medium: X
- Sponsoring Org:
- National Science Foundation
More Like this
-
Abstract The nucleobase derivative 5‐aminouracil (AUr, C4H5N3O2) is of interest for its biological activity, yet the solid‐state structure of this compound has remained elusive owing to its propensity to crystallize as aggregates of microcrystalline particles. Here we report the first single‐crystal structure of AUr determined from synchrotron X‐ray diffraction data. An early crystal structure prediction effort, which assumed that AUr was rigid in the isolated molecule optimized conformation, provided several poor matches to the simulated PXRD pattern. Revisiting these crystal structures by periodic electronic‐level modelling (PBE‐TS optimization) gave more realistic relative lattice energies, but a good match to the experimental powder pattern required using the experimental cell parameters. PXRD and Raman spectroscopy suggest that phase impurities may be present in the bulk crystallization product, though the identity of alternative polymorphs could not be confirmed on the basis of the data available.
-
The goal of molecular crystal structure prediction (CSP) is to find all the plausible polymorphs for a given molecule. This requires performing global optimization over a high-dimensional search space. Genetic algorithms (GAs) perform global optimization by starting from an initial population of structures and generating new candidate structures by breeding the fittest structures in the population. Typically, the fitness function is based on relative lattice energies, such that structures with lower energies have a higher probability of being selected for mating. GAs may be adapted to perform multi-modal optimization by using evolutionary niching methods that support the formation of several stable subpopulations and suppress the over-sampling of densely populated regions. Evolutionary niching is implemented in the GAtor molecular crystal structure prediction code by using techniques from machine learning to dynamically cluster the population into niches of structural similarity. A cluster-based fitness function is constructed such that structures in less populated clusters have a higher probability of being selected for breeding. Here, the effects of evolutionary niching are investigated for the crystal structure prediction of 1,3-dibromo-2-chloro-5-fluorobenzene. Using the cluster-based fitness function increases the success rate of generating the experimental structure and additional low-energy structures with similar packing motifs.
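A cluster-based fitness function of the kind described in this abstract can be sketched as follows. This is not GAtor's implementation: it assumes a simple fitness-sharing scheme in which an energy-based fitness is divided by the number of structures in the same cluster, with cluster labels supplied by some external clustering step. The function names and all energies are hypothetical.

```python
import math
import random
from collections import Counter

def energy_fitness(energies):
    """Scale lattice energies to [0, 1]; the lowest energy gets fitness 1."""
    e_min, e_max = min(energies), max(energies)
    if math.isclose(e_min, e_max):
        return [1.0] * len(energies)
    return [(e_max - e) / (e_max - e_min) for e in energies]

def cluster_fitness(energies, labels):
    """Assumed niching rule: divide each fitness by the size of its cluster."""
    sizes = Counter(labels)
    return [f / sizes[lab] for f, lab in zip(energy_fitness(energies), labels)]

def select_parents(fitness, k=2, rng=random):
    """Roulette-wheel selection: pick k indices with probability proportional to fitness."""
    return rng.choices(range(len(fitness)), weights=fitness, k=k)

# Hypothetical population: three similar low-energy structures in cluster 0,
# plus one structure each in the sparsely populated clusters 1 and 2.
energies = [-10.0, -9.8, -9.9, -9.0, -8.0]
labels = [0, 0, 0, 1, 2]

shared = cluster_fitness(energies, labels)
for i, (e, lab, f) in enumerate(zip(energies, labels, shared)):
    print(f"structure {i}: E={e:.1f}  cluster={lab}  fitness={f:.3f}")
print("selected parents:", select_parents(shared, k=2, rng=random.Random(0)))
```

Under this assumed rule, the lone structure in cluster 1 ends up with a higher fitness than the global minimum in the crowded cluster 0, which is how niching steers breeding toward under-sampled packing motifs.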
-
Quasi-harmonic approaches provide an economical route to modeling the temperature dependence of molecular crystal structures and properties. Several studies have demonstrated good performance of these models, at least for rigid molecules, when using fragment-based approaches with correlated wavefunction techniques. Many others have found success employing dispersion-corrected density functional theory (DFT). Here, a hierarchy of models in which the energies, geometries, and phonons are computed either with correlated methods or DFT is examined to identify which combinations produce useful predictions for properties such as the molar volume, enthalpy, and entropy as a function of temperature. The results demonstrate that refining DFT geometries and phonons with single-point energies based on dispersion-corrected second-order Møller-Plesset perturbation theory can provide clear improvements in the molar volumes and enthalpies compared to those obtained from DFT alone. Predicted entropies, which are governed by vibrational contributions, benefit less clearly from the hybrid schemes. Using these hybrid techniques, the room-temperature thermochemistry of acetaminophen (paracetamol) is predicted to address the discrepancy between two experimental sublimation enthalpy measurements.
-
Modern semiempirical electronic structure methods have considerable promise in drug discovery as universal “force fields” that can reliably model biological and drug-like molecules, including alternative tautomers and protonation states. Herein, we compare the performance of several neglect of diatomic differential overlap-based semiempirical (MNDO/d, AM1, PM6, PM6-D3H4X, PM7, and ODM2), density-functional tight-binding based (DFTB3, DFTB/ChIMES, GFN1-xTB, and GFN2-xTB) models with pure machine learning potentials (ANI-1x and ANI-2x) and hybrid quantum mechanical/machine learning potentials (AIQM1 and QDπ) for a wide range of data computed at a consistent ωB97X/6-31G* level of theory (as in the ANI-1x database). This data includes conformational energies, intermolecular interactions, tautomers, and protonation states. Additional comparisons are made to a set of natural and synthetic nucleic acids from the artificially expanded genetic information system that has important implications for the design of new biotechnology and therapeutics. Finally, we examine the acid/base chemistry relevant for RNA cleavage reactions catalyzed by small nucleolytic ribozymes, DNAzymes, and ribonucleases. Overall, the hybrid quantum mechanical/machine learning potentials appear to be the most robust for these datasets, and the recently developed QDπ model performs exceptionally well, having especially high accuracy for tautomers and protonation states relevant to drug discovery.