skip to main content

Attention:

The NSF Public Access Repository (NSF-PAR) system and access will be unavailable from 11:00 PM ET on Thursday, October 10 until 2:00 AM ET on Friday, October 11 due to maintenance. We apologize for the inconvenience.


This content will become publicly available on March 2, 2025

Title: Crystal structure prediction using neural network potential and age-fitness Pareto genetic algorithm

While crystal structure prediction (CSP) remains a longstanding challenge, we introduce ParetoCSP, a novel algorithm for CSP, which combines a multi-objective genetic algorithm (GA) with a neural network inter-atomic potential model to find energetically optimal crystal structures given chemical compositions. We enhance the updated multi-objective GA (NSGA-III) by incorporating the genotypic age as an independent optimization criterion and employ the M3GNet universal inter-atomic potential to guide the GA search. Compared to GN-OA, a state-of-the-art neural potential-based CSP algorithm, ParetoCSP demonstrated significantly better predictive capabilities, outperforming by a factor of $$ 2.562 $$ across $$ 55 $$ diverse benchmark structures, as evaluated by seven performance metrics. Trajectory analysis of the traversed structures of all algorithms shows that ParetoCSP generated more valid structures than other algorithms, which helped guide the GA to search more effectively for the optimal structures. Our implementation code is available at https://github.com/sadmanomee/ParetoCSP .

 
more » « less
Award ID(s):
2110033 2311202 2320292
NSF-PAR ID:
10527994
Author(s) / Creator(s):
; ; ;
Publisher / Repository:
OAEpublish
Date Published:
Journal Name:
Journal of materials Informatics
Volume:
4
Issue:
2
ISSN:
2770-372X
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. In multi-objective search, edges are annotated with cost vectors consisting of multiple cost components. A path dominates another path with the same start and goal vertices iff the component-wise sum of the cost vectors of the edges of the former path is 'less than' the component-wise sum of the cost vectors of the edges of the latter path. The Pareto-optimal solution set is the set of all undominated paths from a given start vertex to a given goal vertex. Its size can be exponential in the size of the graph being searched, which makes multi-objective search time-consuming. In this paper, we therefore study how to find an approximate Pareto-optimal solution set for a user-provided vector of approximation factors. The size of such a solution set can be significantly smaller than the size of the Pareto-optimal solution set, which enables the design of approximate multi-objective search algorithms that are efficient and produce small solution sets. We present such an algorithm in this paper, called A*pex. A*pex builds on PPA*, a state-of-the-art approximate bi-objective search algorithm (where there are only two cost components) but (1) makes PPA* more efficient for bi-objective search and (2) generalizes it to multi-objective search for any number of cost components. We first analyze the correctness of A*pex and then experimentally demonstrate its efficiency advantage over existing approximate algorithms for bi- and tri-objective search. 
    more » « less
  2. null (Ed.)
    Crystal structure prediction is now playing an increasingly important role in the discovery of new materials or crystal engineering. Global optimization methods such as genetic algorithms (GAs) and particle swarm optimization have been combined with first-principles free energy calculations to predict crystal structures given the composition or only a chemical system. While these approaches can exploit certain crystal patterns such as symmetry and periodicity in their search process, they usually do not exploit the large amount of implicit rules and constraints of atom configurations embodied in the large number of known crystal structures. They currently can only handle crystal structure prediction of relatively small systems. Inspired by the knowledge-rich protein structure prediction approach, herein we explore whether known geometric constraints such as the atomic contact map of a target crystal material can help predict its structure given its space group information. We propose a global optimization-based algorithm, CMCrystal, for crystal structure (atomic coordinates) reconstruction based on atomic contact maps. Based on extensive experiments using six global optimization algorithms, we show that it is viable to reconstruct the crystal structure given the atomic contact map for some crystal materials, but more geometric or physicochemical constraints are needed to achieve the successful reconstruction of other materials. 
    more » « less
  3. Predictions of the structures of stoichiometric, fractional, or nonstoichiometric hydrates of organic molecular crystals are immensely challenging due to the extensive search space of different water contents, host molecular placements throughout the crystal, and internal molecular conformations. However, the dry frameworks of these hydrates, especially for nonstoichiometric or isostructural dehydrates, can often be predicted from a standard anhydrous crystal structure prediction (CSP) protocol. Inspired by developments in the field of drug binding, we introduce an efficient data-driven and topologically aware approach for predicting organic molecular crystal hydrate structures through a mapping of water positions within the crystal structure. The method does not require a priori specification of water content and can, therefore, predict stoichiometric, fractional, and nonstoichiometric hydrate structures. This approach, which we term a mapping approach for crystal hydrates (MACH), establishes a set of rules for systematic determination of favorable positions for water insertion within predicted or experimental crystal structures based on considerations of the chemical features of local environments and void regions. The proposed approach is tested on hydrates of three pharmaceutically relevant compounds that exhibit diverse crystal packing motifs and void environments characteristic of hydrate structures. Overall, we show that our mapping approach introduces an advance in the efficient performance of hydrate CSP through generation of stable hydrate stoichiometries at low cost and should be considered an integral component for CSP workflows. 
    more » « less
  4. Many interesting search problems can be formulated as bi-objective search problems, that is, search problems where two kinds of costs have to be minimized, for example, travel distance and time for transportation problems. Bi-objective search algorithms have to maintain the set of undominated paths from the start state to each state to compute the set of paths from the start state to the goal state that are not dominated by some other path from the start state to the goal state (called the Pareto-optimal solution set). Each time they find a new path to a state s, they perform a dominance check to determine whether this path dominates any of the previously found paths to s or whether any of the previously found paths to s dominates this path. Existing algorithms do not perform these checks efficiently. On the other hand, our Bi-Objective A* (BOA*) algorithm requires only constant time per check. In our experimental evaluation, we show that BOA* can run an order of magnitude (or more) faster than state-of-the-art bi-objective search algorithms, such as NAMOA*, NAMOA*dr, Bi-Objective Dijkstra, and Bidirectional Bi-Objective Dijkstra. 
    more » « less
  5. The goal of molecular crystal structure prediction (CSP) is to find all the plausible polymorphs for a given molecule. This requires performing global optimization over a high-dimensional search space. Genetic algorithms (GAs) perform global optimization by starting from an initial population of structures and generating new candidate structures by breeding the fittest structures in the population. Typically, the fitness function is based on relative lattice energies, such that structures with lower energies have a higher probability of being selected for mating. GAs may be adapted to perform multi-modal optimization by using evolutionary niching methods that support the formation of several stable subpopulations and suppress the over-sampling of densely populated regions. Evolutionary niching is implemented in the GAtor molecular crystal structure prediction code by using techniques from machine learning to dynamically cluster the population into niches of structural similarity. A cluster-based fitness function is constructed such that structures in less populated clusters have a higher probability of being selected for breeding. Here, the effects of evolutionary niching are investigated for the crystal structure prediction of 1,3-dibromo-2-chloro-5-fluorobenzene. Using the cluster-based fitness function increases the success rate of generating the experimental structure and additional low-energy structures with similar packing motifs. 
    more » « less