Identifying thermodynamically stable crystal structures remains a key challenge in materials chemistry. Computational crystal structure prediction (CSP) workflows typically rank candidate structures by lattice energy to assess relative stability. Approaches using self-consistent first-principles calculations become prohibitively expensive, especially when millions of energy evaluations are required for complex molecular systems with many atoms per unit cell. Here, we provide a detailed analysis of our methodology and results from the seventh blind test of crystal structure prediction organized by the Cambridge Crystallographic Data Centre (CCDC). We present an approach that significantly accelerates CSP by training target-specific machine learned interatomic potentials (MLIPs). AIMNet2 MLIPs are trained on density functional theory (DFT) calculations of molecular clusters, herein referred to as n-mers. We demonstrate that potentials trained on gas phase dispersion-corrected DFT reference data of n-mers successfully extend to crystalline environments, accurately characterizing the CSP landscape and correctly ranking structures by relative stability. Our methodology effectively captures the underlying physics of thermodynamic crystal stability using only molecular cluster data, avoiding the need for expensive periodic calculations. The performance of target-specific AIMNet2 interatomic potentials is illustrated across diverse chemical systems relevant to pharmaceutical, optoelectronic, and agrochemical applications, demonstrating their promise as efficient alternatives to full DFT calculations for routine CSP tasks.
This content will become publicly available on December 1, 2025
The seventh blind test of crystal structure prediction: structure generation methods
A seventh blind test of crystal structure prediction was organized by the Cambridge Crystallographic Data Centre featuring seven target systems of varying complexity: a silicon and iodine-containing molecule, a copper coordination complex, a near-rigid molecule, a cocrystal, a polymorphic small agrochemical, a highly flexible polymorphic drug candidate, and a polymorphic morpholine salt. In this first of two parts focusing on structure generation methods, many crystal structure prediction (CSP) methods performed well for the small but flexible agrochemical compound, successfully reproducing the experimentally observed crystal structures, while few groups were successful for the systems of higher complexity. A powder X-ray diffraction (PXRD) assisted exercise demonstrated the use of CSP in successfully determining a crystal structure from a low-quality PXRD pattern. The use of CSP in the prediction of likely cocrystal stoichiometry was also explored, demonstrating multiple possible approaches. Crystallographic disorder emerged as an important theme throughout the test as both a challenge for analysis and a major achievement where two groups blindly predicted the existence of disorder for the first time. Additionally, large-scale comparisons of the sets of predicted crystal structures also showed that some methods yield sets that largely contain the same crystal structures.
- PAR ID: 10579945
- Publisher / Repository: International Union of Crystallography
- Journal Name: Acta Crystallographica Section B Structural Science, Crystal Engineering and Materials
- Volume: 80
- Issue: 6
- ISSN: 2052-5206
- Page Range / eLocation ID: 517 to 547
- Format(s): Medium: X
- Sponsoring Org: National Science Foundation
More Like this
-
A seventh blind test of crystal structure prediction has been organized by the Cambridge Crystallographic Data Centre. The results are presented in two parts, with this second part focusing on methods for ranking crystal structures in order of stability. The exercise involved standardized sets of structures seeded from a range of structure generation methods. Participants from 22 groups applied several periodic DFT-D methods, machine learned potentials, force fields derived from empirical data or quantum chemical calculations, and various combinations of the above. In addition, one non-energy-based scoring function was used. Results showed that periodic DFT-D methods overall agreed with experimental data within expected error margins, while one machine learned model, applying system-specific AIMnet potentials, agreed with experiment in many cases, demonstrating promise as an efficient alternative to DFT-based methods. For target XXXII, a consensus was reached across periodic DFT methods, with consistently high predicted energies of experimental forms relative to the global minimum (above 4 kJ mol⁻¹ at both low and ambient temperatures) suggesting a more stable polymorph is likely not yet observed. The calculation of free energies at ambient temperatures offered improvement of predictions only in some cases (for targets XXVII and XXXI). Several avenues for future research have been suggested, highlighting the need for greater efficiency considering the vast amounts of resources utilized in many cases.
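The ranking comparison described in this abstract comes down to energies relative to the global minimum of a candidate set. The following is a minimal sketch of that bookkeeping, not the blind test's actual analysis code. The function name, the structure labels, and all energy values are hypothetical and chosen only for illustration; the 4 kJ/mol threshold echoes the figure quoted for target XXXII.

```python
def rank_by_relative_energy(energies_kj_mol):
    """Return (name, relative energy) pairs sorted from most to least stable.

    energies_kj_mol maps a structure label to its lattice energy per
    molecule in kJ/mol. Relative energies are taken against the lowest
    energy in the set (the predicted global minimum).
    """
    e_min = min(energies_kj_mol.values())
    relative = [(name, e - e_min) for name, e in energies_kj_mol.items()]
    return sorted(relative, key=lambda pair: pair[1])


# Hypothetical energies, invented for this example only.
energies = {
    "candidate_A": -120.0,
    "candidate_B": -115.5,
    "experimental_form": -114.0,
}

ranked = rank_by_relative_energy(energies)

# Flag structures lying more than 4 kJ/mol above the global minimum.
THRESHOLD_KJ_MOL = 4.0
high_energy = [name for name, rel in ranked if rel > THRESHOLD_KJ_MOL]

for name, rel in ranked:
    print(f"{name}: +{rel:.1f} kJ/mol")
```

With these made-up numbers the experimental form sits well above the lowest-energy candidate, which is the kind of pattern the abstract reads as a hint that a more stable polymorph may not yet have been observed.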
-
The goal of molecular crystal structure prediction (CSP) is to find all the plausible polymorphs for a given molecule. This requires performing global optimization over a high-dimensional search space. Genetic algorithms (GAs) perform global optimization by starting from an initial population of structures and generating new candidate structures by breeding the fittest structures in the population. Typically, the fitness function is based on relative lattice energies, such that structures with lower energies have a higher probability of being selected for mating. GAs may be adapted to perform multi-modal optimization by using evolutionary niching methods that support the formation of several stable subpopulations and suppress the over-sampling of densely populated regions. Evolutionary niching is implemented in the GAtor molecular crystal structure prediction code by using techniques from machine learning to dynamically cluster the population into niches of structural similarity. A cluster-based fitness function is constructed such that structures in less populated clusters have a higher probability of being selected for breeding. Here, the effects of evolutionary niching are investigated for the crystal structure prediction of 1,3-dibromo-2-chloro-5-fluorobenzene. Using the cluster-based fitness function increases the success rate of generating the experimental structure and additional low-energy structures with similar packing motifs.
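The idea of a cluster-based fitness function can be illustrated with a small sketch. This is not GAtor's implementation: it assumes cluster labels are already available from some clustering step, and it substitutes plain fitness sharing (energy fitness divided by cluster population) as one simple way to favor sparsely populated niches. All function names and numbers below are hypothetical.

```python
import random
from collections import Counter


def energy_fitness(energies):
    """Scale lattice energies to [0, 1]; the lowest energy gets fitness 1."""
    e_min, e_max = min(energies), max(energies)
    if e_max == e_min:
        return [1.0] * len(energies)
    return [(e_max - e) / (e_max - e_min) for e in energies]


def cluster_fitness(energies, labels):
    """Assumed fitness-sharing rule: divide each structure's energy-based
    fitness by the number of structures in its cluster, so members of
    crowded niches are penalized."""
    counts = Counter(labels)
    base = energy_fitness(energies)
    return [f / counts[label] for f, label in zip(base, labels)]


def select_parent(fitness, rng=random):
    """Roulette-wheel selection: pick an index with probability
    proportional to its fitness."""
    if sum(fitness) == 0:
        return rng.randrange(len(fitness))
    return rng.choices(range(len(fitness)), weights=fitness, k=1)[0]


# Hypothetical population: three structures share cluster 0, one is alone.
energies = [-100.0, -99.0, -98.0, -99.5]
labels = [0, 0, 0, 1]

plain = energy_fitness(energies)
shared = cluster_fitness(energies, labels)
parent = select_parent(shared, random.Random(0))
```

In this toy population the lowest-energy structure has the highest plain fitness, but after sharing, the lone member of cluster 1 becomes the most likely parent. That mirrors the stated goal of suppressing over-sampling of densely populated regions.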
-
The nucleobase derivative 5-aminouracil (AUr, C4H5N3O2) is of interest for its biological activity, yet the solid state structure of this compound has remained elusive owing to its propensity to crystallize as aggregates of microcrystalline particles. Here we report the first single-crystal structure of AUr determined from synchrotron X-ray diffraction data. An early crystal structure prediction effort, which assumed that AUr was rigid in the isolated molecule optimized conformation, provided several poor matches to the simulated PXRD pattern. Revisiting these crystal structures by periodic electronic level modelling (PBE-TS optimization) gave more realistic relative lattice energies, but a good match to the experimental powder pattern required using the experimental cell parameters. PXRD and Raman spectroscopy suggest that phase impurities may be present in the bulk crystallization product, though the identity of alternative polymorphs could not be confirmed on the basis of the data available.
-
The two-step nucleation (TSN) theory and crystal structure prediction (CSP) techniques are two disjointed yet popular methods to predict nucleation rate and crystal structure, respectively. The TSN theory is a well-established mechanism to describe the nucleation of a wide range of crystalline materials in different solvents. However, it has never been expanded to predict the crystal structure or polymorphism. Conversely, the existing CSP techniques only empirically account for the solvent effects. As a result, the TSN theory and CSP techniques continue to evolve as separate methods to predict two essential attributes of nucleation: rate and structure. Here we bridge this gap and show for the first time how a crystal structure is formed within the framework of TSN theory. A sequential desolvation mechanism is proposed in TSN, where the first step involves partial desolvation to form dense clusters followed by selective desolvation of functional groups directing the formation of crystal structure. We investigate the effect of the specific interaction on the degree of solvation around different functional groups of glutamic acid molecules using molecular simulations. The simulated energy landscape and activation barriers at increasing supersaturations suggest sequential and selective desolvation. We validate computationally and experimentally that the crystal structure formation and polymorph selection are due to a previously unrecognized consequence of supersaturation-driven asymmetric desolvation of molecules.
