In real-world materials research, machine learning (ML) models are usually expected to predict and discover novel, exceptional materials that deviate from known materials. An objective evaluation of how well ML models predict the properties of out-of-distribution (OOD) materials, which differ from those in the training set, is therefore a pressing need. Traditional evaluation of materials property prediction models through random splitting of the dataset frequently yields artificially high performance estimates because typical materials datasets are inherently redundant. Here we present a comprehensive benchmark study of structure-based graph neural networks (GNNs) for extrapolative OOD materials property prediction. We formulate five different categories of OOD ML problems for three benchmark datasets from the MatBench study. Our extensive experiments show that, on average, current state-of-the-art GNN algorithms significantly underperform on the OOD property prediction tasks compared to their baselines in the MatBench study, demonstrating a crucial generalization gap in realistic material prediction tasks. We further examine the latent physical spaces of these GNN models and, using the perovskites dataset as a case study, identify why CGCNN, ALIGNN, and DeeperGATGNN show significantly more robust OOD performance than the current best models in the MatBench study (coGN and coNGN), and we provide insights for improving their performance.
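The difference between a random split and an OOD split can be illustrated with a small sketch. It assumes a leave-one-group-out scheme keyed on crystal system, which is only one plausible way to build an OOD test set and is not claimed to be one of the five categories formulated in the study. The records, the function names `random_split` and `leave_one_group_out`, and all values are hypothetical.

```python
import random

# Hypothetical toy records: (sample_id, crystal_system, target_value).
# Values are illustrative only and are not taken from MatBench.
SAMPLES = [
    ("A1", "cubic", -1.2), ("A2", "cubic", -1.1), ("A3", "cubic", -1.3),
    ("B1", "tetragonal", -0.4), ("B2", "tetragonal", -0.5),
    ("C1", "hexagonal", 0.3), ("C2", "hexagonal", 0.2), ("C3", "hexagonal", 0.4),
]


def random_split(samples, test_fraction=0.25, seed=0):
    """Shuffle and split; test samples usually have close relatives in train."""
    shuffled = list(samples)
    random.Random(seed).shuffle(shuffled)
    n_test = max(1, round(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]


def leave_one_group_out(samples, held_out_group):
    """Hold out one whole group so the test set is unseen during training."""
    train = [s for s in samples if s[1] != held_out_group]
    test = [s for s in samples if s[1] == held_out_group]
    return train, test


rand_train, rand_test = random_split(SAMPLES)
ood_train, ood_test = leave_one_group_out(SAMPLES, "hexagonal")
```

With the random split, a test sample typically shares its group with several training samples, which is the redundancy that inflates scores. With the grouped split, no hexagonal sample is ever seen in training, so the model has to extrapolate.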
-
Abstract Prediction of crystal structures with desirable material properties is a grand challenge in materials research. We deployed a graph-theory-assisted structure searcher combined with universal machine learning potentials to accelerate the process.
Free, publicly-accessible full text available April 2, 2025 -
Crystal structure prediction using neural network potential and age-fitness Pareto genetic algorithm
While crystal structure prediction (CSP) remains a longstanding challenge, we introduce ParetoCSP, a novel algorithm for CSP that combines a multi-objective genetic algorithm (GA) with a neural network inter-atomic potential model to find energetically optimal crystal structures for given chemical compositions. We enhance the updated multi-objective GA (NSGA-III) by incorporating the genotypic age as an independent optimization criterion and employ the M3GNet universal inter-atomic potential to guide the GA search. Compared to GN-OA, a state-of-the-art neural potential-based CSP algorithm, ParetoCSP demonstrated significantly better predictive capabilities, outperforming it by a factor of 2.562 across 55 diverse benchmark structures, as evaluated by seven performance metrics. Trajectory analysis of the structures traversed by all algorithms shows that ParetoCSP generated more valid structures than the other algorithms, which helped guide the GA to search more effectively for the optimal structures. Our implementation code is available at https://github.com/sadmanomee/ParetoCSP .
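The idea of treating genotypic age as an independent objective alongside energy can be sketched with a plain Pareto-dominance filter. This is a minimal illustration under stated assumptions, not the ParetoCSP implementation and not NSGA-III itself: both objectives are minimized, and the candidate names, energies, and ages are hypothetical.

```python
def dominates(a, b):
    """True if objective tuple a is no worse than b everywhere and better somewhere.

    All objectives are assumed to be minimized.
    """
    no_worse = all(x <= y for x, y in zip(a, b))
    strictly_better = any(x < y for x, y in zip(a, b))
    return no_worse and strictly_better


def pareto_front(candidates):
    """Return the candidates that no other candidate dominates."""
    front = []
    for i, cand in enumerate(candidates):
        dominated = any(
            dominates(other["objectives"], cand["objectives"])
            for j, other in enumerate(candidates)
            if j != i
        )
        if not dominated:
            front.append(cand)
    return front


# Hypothetical population: objectives = (predicted energy, genotypic age).
population = [
    {"name": "A", "objectives": (-3.2, 10)},
    {"name": "B", "objectives": (-3.0, 2)},
    {"name": "C", "objectives": (-2.5, 5)},
    {"name": "D", "objectives": (-3.2, 12)},
]

front_names = [c["name"] for c in pareto_front(population)]
```

Here the young candidate B survives even though A has a lower energy, because neither dominates the other. Keeping such young individuals alive is the kind of diversity pressure an age objective is meant to provide.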
Free, publicly-accessible full text available March 2, 2025 -
Free, publicly-accessible full text available February 1, 2025
-
Abstract Existing machine learning potentials for predicting phonon properties of crystals are typically limited on a material-to-material basis, primarily due to the exponential scaling of model complexity with the number of atomic species. We address this bottleneck with the developed Elemental Spatial Density Neural Network Force Field, namely Elemental-SDNNFF. The effectiveness and precision of our Elemental-SDNNFF approach are demonstrated on 11,866 full, half, and quaternary Heusler structures spanning 55 elements in the periodic table by prediction of complete phonon properties. Self-improvement schemes including active learning and data augmentation techniques provide an abundant 9.4 million atomic data for training. Deep insight into predicted ultralow lattice thermal conductivity (<1 W m⁻¹ K⁻¹) of 774 Heusler structures is gained by p–d orbital hybridization analysis. Additionally, a class of two-band charge-2 Weyl points, referred to as “double Weyl points”, are found in 68% and 87% of 1662 half and 1550 quaternary Heuslers, respectively.
Free, publicly-accessible full text available December 1, 2024
-
Data-driven generative deep learning models have recently emerged as one of the most promising approaches for new materials discovery. While generator models can generate millions of candidates, it is critical to train fast and accurate machine learning models to filter out stable, synthesizable materials with the desired properties. However, efforts to build supervised regression or classification screening models have been severely hindered by the lack of unstable or unsynthesizable samples, which usually are not collected and deposited in materials databases such as ICSD and Materials Project (MP). At the same time, there is a significant amount of unlabeled data available in these databases. Here we propose a semi-supervised deep neural network (TSDNN) model for high-performance formation energy and synthesizability prediction, which is achieved via its unique teacher-student dual network architecture and its effective exploitation of the large amount of unlabeled data. For formation energy based stability screening, our semi-supervised classifier achieves an absolute 10.3% accuracy improvement compared to the baseline CGCNN regression model. For synthesizability prediction, our model significantly increases the baseline PU learning's true positive rate from 87.9% to 92.9% using 1/49 of the model parameters. To further prove the effectiveness of our models, we combined our TSDNN-energy and TSDNN-synthesizability models with our CubicGAN generator to discover novel stable cubic structures. Of the 1000 candidate samples recommended by our models, 512 have negative formation energies as validated by our DFT formation energy calculations. Our experimental results show that our semi-supervised deep neural networks can significantly improve the screening accuracy in large-scale generative materials design. Our source code can be accessed at https://github.com/usccolumbia/tsdnn.
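How unlabeled data can feed a teacher-student setup can be sketched with a deliberately tiny pseudo-labeling example. It assumes a one-dimensional feature and a midpoint-threshold classifier in place of the neural networks, and it omits any feedback from student to teacher, so it shows only the general idea and not the TSDNN architecture. All names and numbers are hypothetical.

```python
def fit_threshold(points):
    """Fit a 1-D classifier: the midpoint between the two class means.

    points is a list of (x, label) pairs with labels 0 or 1; class 1 is
    assumed to lie at larger x than class 0.
    """
    pos = [x for x, y in points if y == 1]
    neg = [x for x, y in points if y == 0]
    return (sum(pos) / len(pos) + sum(neg) / len(neg)) / 2.0


def predict(threshold, x):
    """Label a sample 1 if it lies above the threshold, else 0."""
    return 1 if x > threshold else 0


# Hypothetical data: a few labeled samples and a larger unlabeled pool.
labeled = [(-1.0, 0), (-0.8, 0), (0.9, 1), (1.2, 1)]
unlabeled = [-1.5, -0.3, 0.2, 0.4, 1.8]

# 1. The teacher is trained on the labeled samples only.
teacher = fit_threshold(labeled)

# 2. The teacher assigns pseudo-labels to the unlabeled pool.
pseudo = [(x, predict(teacher, x)) for x in unlabeled]

# 3. The student is trained on labeled plus pseudo-labeled samples.
student = fit_threshold(labeled + pseudo)
```

The student's decision boundary shifts relative to the teacher's because it also reflects how the unlabeled pool is distributed, which is the benefit semi-supervised training aims for.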
-
Abstract Driven by big data science, materials informatics has recently attracted enormous research interest along with many recognized achievements. To acquire knowledge of materials from previous experience, both feature descriptors and databases are essential for training machine learning (ML) models with high accuracy. In this regard, the electronic charge density ρ(r), which in principle determines the properties of materials at their ground state, can be considered one of the most appropriate descriptors. However, a systematic electronic charge density ρ(r) database of inorganic materials is still in its infancy due to the difficulty of collecting raw data in experiments and the high computational cost of first-principles calculations in theory. Herein, a real-space electronic charge density ρ(r) database of 17,418 cubic inorganic materials is constructed by performing high-throughput density functional theory calculations. The displayed ρ(r) patterns show good agreement with those reported in previous studies, which validates our computations. Further statistical analysis reveals that the database possesses abundant and diverse data, which could accelerate ρ(r)-related machine learning studies. Moreover, the electronic charge density database will also assist chemical bonding identification and promote new crystal discovery in experiments.
-
Abstract Discovering new materials is a challenging task in materials science that is crucial to the progress of human society. Conventional approaches based on experiments and simulations are labor-intensive or costly, with success heavily depending on experts’ heuristic knowledge. Here, we propose a deep learning based Physics Guided Crystal Generative Model (PGCGM) for efficient crystal material design with high structural diversity and symmetry. Our model increases the generation validity by more than 700% compared to FTCP, one of the latest structure generators, and by more than 45% compared to our previous CubicGAN model. Density Functional Theory (DFT) calculations are used to validate the generated structures: 1869 materials out of 2000 are successfully optimized and deposited into the Carolina Materials Database (www.carolinamatdb.org), of which 39.6% have negative formation energy and 5.3% have energy-above-hull less than 0.25 eV/atom, indicating their thermodynamic stability and potential synthesizability.