The goal of molecular crystal structure prediction (CSP) is to find all the plausible polymorphs for a given molecule. This requires performing global optimization over a high-dimensional search space. Genetic algorithms (GAs) perform global optimization by starting from an initial population of structures and generating new candidate structures by breeding the fittest structures in the population. Typically, the fitness function is based on relative lattice energies, such that structures with lower energies have a higher probability of being selected for mating. GAs may be adapted to perform multi-modal optimization by using evolutionary niching methods that support the formation of several stable subpopulations and suppress the over-sampling of densely populated regions. Evolutionary niching is implemented in the GAtor molecular crystal structure prediction code by using techniques from machine learning to dynamically cluster the population into niches of structural similarity. A cluster-based fitness function is constructed such that structures in less populated clusters have a higher probability of being selected for breeding. Here, the effects of evolutionary niching are investigated for the crystal structure prediction of 1,3-dibromo-2-chloro-5-fluorobenzene. Using the cluster-based fitness function increases the success rate of generating the experimental structure and additional low-energy structures with similar packing motifs.
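The cluster-based fitness described above can be illustrated with a minimal sketch. The abstract does not give the functional form used in GAtor, so the formula here is an assumption: an energy-based fitness scaled to [0, 1] is divided by the number of structures in the same cluster, which lowers the selection probability of structures in crowded niches. The function names and the example energies and cluster labels are hypothetical.

```python
from collections import Counter


def energy_fitness(energies):
    """Scale energies to [0, 1] so the lowest energy gets fitness 1."""
    e_min, e_max = min(energies), max(energies)
    if e_max == e_min:
        return [1.0] * len(energies)
    return [(e_max - e) / (e_max - e_min) for e in energies]


def cluster_fitness(energies, labels):
    """Assumed niching rule: divide each fitness by its cluster's population."""
    counts = Counter(labels)
    base = energy_fitness(energies)
    return [f / counts[lab] for f, lab in zip(base, labels)]


def selection_probabilities(fitness):
    """Normalize fitness values into selection probabilities."""
    total = sum(fitness)
    if total == 0:
        return [1.0 / len(fitness)] * len(fitness)
    return [f / total for f in fitness]


# Hypothetical population: relative lattice energies and cluster labels.
energies = [-10.0, -9.8, -9.6, -10.0, -8.0]
labels = [0, 0, 0, 1, 1]  # cluster 0 holds three structures, cluster 1 holds two

probs = selection_probabilities(cluster_fitness(energies, labels))
```

Structures 0 and 3 have the same energy, but structure 3 sits in the less populated cluster and therefore receives the higher selection probability under this assumed rule.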
Contact map based crystal structure prediction using global optimization
Crystal structure prediction is now playing an increasingly important role in the discovery of new materials or crystal engineering. Global optimization methods such as genetic algorithms (GAs) and particle swarm optimization have been combined with first-principles free energy calculations to predict crystal structures given the composition or only a chemical system. While these approaches can exploit certain crystal patterns such as symmetry and periodicity in their search process, they usually do not exploit the large amount of implicit rules and constraints of atom configurations embodied in the large number of known crystal structures. They currently can only handle crystal structure prediction of relatively small systems. Inspired by the knowledge-rich protein structure prediction approach, herein we explore whether known geometric constraints such as the atomic contact map of a target crystal material can help predict its structure given its space group information. We propose a global optimization-based algorithm, CMCrystal, for crystal structure (atomic coordinates) reconstruction based on atomic contact maps. Based on extensive experiments using six global optimization algorithms, we show that it is viable to reconstruct the crystal structure given the atomic contact map for some crystal materials, but more geometric or physicochemical constraints are needed to achieve the successful reconstruction of other materials.
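To make the idea of contact-map-based reconstruction concrete, the following is a minimal sketch of a binary contact map and a mismatch objective that a global optimizer could minimize. It is not the CMCrystal implementation: the distance cutoff, the mismatch count used as the objective, and the neglect of periodic boundary conditions and space group symmetry are all simplifying assumptions, and the function names are hypothetical.

```python
import math
from itertools import combinations


def contact_map(coords, cutoff):
    """Return a symmetric 0/1 matrix marking atom pairs within the cutoff."""
    n = len(coords)
    cmap = [[0] * n for _ in range(n)]
    for i, j in combinations(range(n), 2):
        if math.dist(coords[i], coords[j]) <= cutoff:
            cmap[i][j] = cmap[j][i] = 1
    return cmap


def contact_mismatch(candidate_coords, target_map, cutoff):
    """Count atom pairs whose contact state differs from the target map."""
    cand = contact_map(candidate_coords, cutoff)
    n = len(target_map)
    return sum(
        cand[i][j] != target_map[i][j]
        for i in range(n)
        for j in range(i + 1, n)
    )


# Hypothetical three-atom example (Cartesian coordinates, arbitrary units).
target_coords = [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0), (3.0, 0.0, 0.0)]
target_map = contact_map(target_coords, cutoff=2.0)

candidate = [(0.0, 0.0, 0.0), (2.5, 0.0, 0.0), (3.0, 0.0, 0.0)]
score = contact_mismatch(candidate, target_map, cutoff=2.0)
```

A mismatch of zero only means the candidate reproduces the target contacts; as the abstract notes, for some materials additional geometric or physicochemical constraints are needed to pin down the structure.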
- PAR ID: 10288164
- Date Published:
- Journal Name: CrystEngComm
- Volume: 23
- Issue: 8
- ISSN: 1466-8033
- Page Range / eLocation ID: 1765 to 1776
- Format(s): Medium: X
- Sponsoring Org: National Science Foundation
More Like this
- SE3Lig: SE(3)-equivariant CNNs for the reconstruction of cofactors and ligands in protein structures
  Protein structure prediction algorithms such as AlphaFold2 and ESMFold have dramatically increased the availability of high-quality models of protein structures. Because these algorithms predict only the structure of the protein itself, there is a growing need for methods that can rapidly screen protein structures for ligands. Previous work on similar tasks has shown promise but is lacking scope in the classes of atoms predicted and can benefit from the recent architectural developments in convolutional neural networks (CNNs). In this work, we introduce SE3Lig, a model for semantic in-painting of small molecules in protein structures. Specifically, we report SE(3)-equivariant CNNs trained to predict the atomic densities of common classes of cofactors (hemes, flavins, etc.) and the water molecules and inorganic ions in their vicinity. While the models are trained on high-resolution crystal structures of enzymes, they perform well on structures predicted by AlphaFold2, which suggests that the algorithm correctly represents cofactor-binding cavities.
- Crystal structure prediction using neural network potential and age-fitness Pareto genetic algorithm
  While crystal structure prediction (CSP) remains a longstanding challenge, we introduce ParetoCSP, a novel algorithm for CSP, which combines a multi-objective genetic algorithm (GA) with a neural network inter-atomic potential model to find energetically optimal crystal structures given chemical compositions. We enhance the updated multi-objective GA (NSGA-III) by incorporating the genotypic age as an independent optimization criterion and employ the M3GNet universal inter-atomic potential to guide the GA search. Compared to GN-OA, a state-of-the-art neural potential-based CSP algorithm, ParetoCSP demonstrated significantly better predictive capabilities, outperforming it by a factor of 2.562 across 55 diverse benchmark structures, as evaluated by seven performance metrics. Trajectory analysis of the traversed structures of all algorithms shows that ParetoCSP generated more valid structures than other algorithms, which helped guide the GA to search more effectively for the optimal structures. Our implementation code is available at https://github.com/sadmanomee/ParetoCSP.
- Polymorphism in molecular crystals influences their properties and performance. Crystal structure prediction (CSP) can help explore the crystal structure landscape and discover potentially stable polymorphs computationally. We present a new version of the Genarris open-source code, which generates random molecular crystal structures in all space groups and applies physical constraints on intermolecular distances. The main new feature in Genarris 3.0 is the "Rigid Press" algorithm, which uses a regularized hard-sphere potential to compress the unit cell and achieve a maximally close-packed structure based on purely geometric considerations without performing any energy evaluations. In addition, Genarris 3.0 is interfaced with machine-learned interatomic potentials (MLIPs) to accelerate the exploration of the potential energy landscape. We present a new clustering and down-selection workflow that employs the MACE-OFF23(L) MLIPs to perform geometry optimization and energy ranking in the early stages. We use Genarris 3.0 to successfully predict the structure of six targets: aspirin, Target I and Target XXII from previous CSP blind tests, and the energetic materials HMX, CL-20, and DNI. We further analyze the performance of MACE-OFF23(L) compared to dispersion-inclusive density functional theory (DFT) for geometry relaxation and energy ranking. We find significant variability in the performance of MACE-OFF23(L) across chemically diverse targets with particularly poor performance for energetic materials, which is mitigated by our clustering and down-selection procedure. Genarris 3.0 can thus be used effectively to perform CSP and to generate molecular crystal datasets for training ML models.
- Prediction of a molecule's 3D conformer ensemble from the molecular graph holds a key role in areas of cheminformatics and drug discovery. Existing generative models have several drawbacks, including lack of modeling of important molecular geometry elements (e.g. torsion angles), separate optimization stages prone to error accumulation, and the need for structure fine-tuning based on approximate classical force-fields or computationally expensive methods such as metadynamics with approximate quantum mechanics calculations at each geometry. We propose GeoMol, an end-to-end, non-autoregressive and SE(3)-invariant machine learning approach to generate distributions of low-energy molecular 3D conformers. Leveraging the power of message passing neural networks (MPNNs) to capture local and global graph information, we predict local atomic 3D structures and torsion angles, avoiding unnecessary over-parameterization of the geometric degrees of freedom (e.g. one angle per non-terminal bond). Such local predictions suffice both for the training loss computation and for the full deterministic conformer assembly (at test time). We devise a non-adversarial optimal transport based loss function to promote diverse conformer generation. GeoMol predominantly outperforms popular open-source, commercial, or state-of-the-art machine learning (ML) models, while achieving significant speed-ups. We expect such differentiable 3D structure generators to significantly impact molecular modeling and related applications.
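The age-fitness Pareto idea mentioned in the ParetoCSP abstract in this list can be sketched as a non-dominated filter over two minimized objectives, energy and genotypic age. This is only an illustration of Pareto dominance under that assumption; it is not NSGA-III and not the ParetoCSP code, and the candidate values and function names are hypothetical.

```python
def dominates(a, b):
    """True if a is no worse than b in every objective and strictly better
    in at least one (all objectives are minimized)."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))


def pareto_front(points):
    """Return the indices of points that no other point dominates."""
    return [
        i
        for i, p in enumerate(points)
        if not any(dominates(q, p) for j, q in enumerate(points) if j != i)
    ]


# Hypothetical candidates as (energy, age) pairs; lower is better for both.
candidates = [(-5.0, 10), (-4.0, 2), (-3.0, 5), (-5.0, 12)]
front = pareto_front(candidates)
```

Treating age as its own objective keeps young, not yet optimized structures on the front alongside older low-energy ones, which is one way to preserve diversity during the search.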

