In real-world materials research, machine learning (ML) models are usually expected to predict and discover novel, exceptional materials that deviate from known materials. It is thus a pressing question how to objectively evaluate ML model performance in property prediction for out-of-distribution (OOD) materials that differ from the training set. Traditional performance evaluation of materials property prediction models through random splitting of the dataset frequently yields artificially high performance estimates because of the inherent redundancy of typical materials datasets. Here we present a comprehensive benchmark study of structure-based graph neural networks (GNNs) for extrapolative OOD materials property prediction. We formulate five different categories of OOD ML problems for three benchmark datasets from the MatBench study. Our extensive experiments show that current state-of-the-art GNN algorithms significantly underperform on the OOD property prediction tasks on average compared with their baselines in the MatBench study, demonstrating a crucial generalization gap in realistic materials prediction tasks. We further examine the latent physical spaces of these GNN models and, using the perovskites dataset as a case study, identify the sources of the significantly more robust OOD performance of CGCNN, ALIGNN, and DeeperGATGNN relative to the current best models in the MatBench study (coGN and coNGN), and we provide insights to improve their performance.
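The contrast between random splitting and an OOD-style split can be sketched in a few lines. This is a minimal illustration, not one of the five OOD categories from the study: the grouping rule, the `group_of` callback, and the toy records are all assumptions made for the example.

```python
import random
from collections import defaultdict


def random_split(samples, test_fraction=0.2, seed=0):
    """Shuffle and cut: near-duplicates can land on both sides of the split."""
    rng = random.Random(seed)
    shuffled = list(samples)
    rng.shuffle(shuffled)
    n_test = max(1, int(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]


def leave_one_group_out(samples, group_of):
    """Yield (held_out_group, train, test); the test group never appears in train."""
    groups = defaultdict(list)
    for sample in samples:
        groups[group_of(sample)].append(sample)
    for held_out, test in groups.items():
        train = [s for g, members in groups.items() if g != held_out for s in members]
        yield held_out, train, test


# Hypothetical toy records: (formula, group label). The labels are illustrative
# only and do not reproduce the clustering used in the benchmark.
toy = [("SrTiO3", "Ti"), ("BaTiO3", "Ti"), ("CaTiO3", "Ti"),
       ("LaAlO3", "Al"), ("YAlO3", "Al"), ("KNbO3", "Nb")]

folds = list(leave_one_group_out(toy, group_of=lambda s: s[1]))
```

With a random split, chemically similar entries can sit in both train and test, which is the redundancy effect described above; the group-wise split forces the model to predict for a group it has never seen.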
-
Discovering new materials is a challenging task in materials science and is crucial to the progress of human society. Conventional approaches based on experiments and simulations are labor-intensive or costly, with success depending heavily on experts’ heuristic knowledge. Here, we propose a deep learning based Physics Guided Crystal Generative Model (PGCGM) for efficient crystal material design with high structural diversity and symmetry. Our model increases the generation validity by more than 700% compared to FTCP, one of the latest structure generators, and by more than 45% compared to our previous CubicGAN model. Density functional theory (DFT) calculations are used to validate the generated structures: 1869 of 2000 materials were successfully optimized and deposited into the Carolina Materials Database (www.carolinamatdb.org), of which 39.6% have negative formation energy and 5.3% have energy above hull less than 0.25 eV/atom, indicating their thermodynamic stability and potential synthesizability.
-
Although machine learning (ML) methods have been widely used recently, the predicted material properties usually cannot exceed the range of the original training data. We deployed a boundless, objective-free exploration approach that combines traditional ML and density functional theory (DFT) to search for extreme material properties. This combination not only improves the efficiency of screening large-scale materials with minimal DFT inquiry, but also yields properties beyond the original training range. We use Stein novelty to recommend outliers and then verify them using DFT. Validated data are then added to the training dataset for the next iteration. We test the training-recommendation-validation loop in mechanical property space. By screening 85,707 crystal structures, we identify 21 ultrahigh-hardness structures and 11 negative Poisson’s ratio structures. The algorithm is very promising for future materials discovery that can push material properties to the limit with minimal DFT calculations on only ~1% of the structures in the screening pool.
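The training-recommendation-validation loop can be illustrated with a toy sketch. Everything here is a stand-in chosen for brevity: a one-dimensional least-squares line plays the ML model, a nearest-value distance plays the novelty score (it is not Stein novelty), and the `validate` callback plays the DFT calculation.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept (stand-in ML model)."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx if sxx else 0.0
    return slope, mean_y - slope * mean_x


def novelty(value, known_values):
    """Stand-in novelty score: distance to the nearest already-validated value."""
    return min(abs(value - k) for k in known_values)


def explore(pool, labelled, validate, rounds=1, batch=2):
    """pool: id -> feature; labelled: id -> validated property. Returns grown labels."""
    labelled = dict(labelled)  # copy so the caller's seed data is not mutated
    for _ in range(rounds):
        candidates = [i for i in pool if i not in labelled]
        if not candidates:
            break
        # Train: refit the stand-in model on everything validated so far.
        slope, intercept = fit_line([pool[i] for i in labelled], list(labelled.values()))
        known = list(labelled.values())
        # Recommend: rank candidates by how far their prediction sits from known values.
        ranked = sorted(candidates,
                        key=lambda i: novelty(slope * pool[i] + intercept, known),
                        reverse=True)
        # Validate: query the expensive oracle only for the top few candidates.
        for i in ranked[:batch]:
            labelled[i] = validate(pool[i])
    return labelled


# Hypothetical toy problem: feature x, "true" property x**2 as the oracle.
pool = {i: float(i) for i in range(10)}
seed_labels = {4: 16.0, 5: 25.0}
result = explore(pool, seed_labels, validate=lambda x: x * x)
```

In this toy run the two validated additions lie outside the range of the seed values, which mirrors the idea of reaching beyond the original training range while calling the oracle for only a small part of the pool.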
-
Thermoelectric materials harvest waste heat and convert it into reusable electricity. Thermoelectrics are also widely used in the inverse mode, for example in refrigerators and for cooling electronics. However, most popular and known thermoelectric materials to date were proposed and found by intuition, mostly through experiments. Unfortunately, it is extremely time- and resource-consuming to synthesize materials and measure their thermoelectric properties through trial-and-error experiments. Here, we develop a convolutional neural network (CNN) classification model that utilizes the fused orbital field matrix and composition descriptors to screen a large pool of materials and discover new thermoelectric candidates with power factor higher than 10 μW/(cm·K²). The model used our own data generated by high-throughput density functional theory calculations coupled with an ab initio scattering and transport package to obtain electronic transport properties without assuming a constant relaxation time of electrons, which ensures more reliable electronic transport property calculations than previous studies. The classification model was also compared to some traditional machine learning algorithms such as gradient boosting and random forest. We deployed the classification model on 3465 cubic, dynamically stable structures with non-zero bandgap screened from the Open Quantum Materials Database. We identified many high-performance thermoelectric materials with ZT > 1 or close to 1 across a wide temperature range from 300 to 700 K and for both n- and p-type doping with different doping concentrations. Moreover, our feature importance and maximal information coefficient analyses demonstrate two previously unreported material descriptors, namely, mean melting temperature and low average deviation of electronegativity, that are strongly correlated with power factor and thus provide a new route for quickly screening potential thermoelectrics with a high success rate.
Our deep CNN model with fused orbital field matrix and composition descriptors is very promising for screening high power factor thermoelectrics from large-scale hypothetical structures.
Free, publicly-accessible full text available June 1, 2025 -
Electronic devices get smaller with every generation. In micro-/nano-electronic devices such as high electron mobility transistors, heat dissipation has become a crucial design consideration because the ultrahigh heat flux degrades device performance and lifetime. Therefore, thermal transport performance must be enhanced to keep pace with device size reduction. β-Ga2O3 has recently gained significant scientific interest for future power devices because of its inherent material properties such as an extremely wide bandgap, outstanding Baliga's figure of merit, and a large critical electric field. This work aims to use a machine learning approach to search for promising substrates or heat sinks for cooling β-Ga2O3, in terms of high interfacial thermal conductance (ITC), from large-scale potential structures taken from existing material databases. With the ITC dataset of 1633 various substrates for β-Ga2O3 calculated by full density functional theory, we trained our recently developed convolutional neural network (CNN) model that utilizes the fused orbital field matrix (OFM) and composition descriptors. Our model proved to be superior in performance to traditional machine learning algorithms such as random forest and gradient boosting. We then deployed the CNN model to predict the ITC of 32 716 structures in contact with β-Ga2O3. The CNN model predicted the top 20 cubic and noncubic substrates with ITC on the same level as density functional theory (DFT) results on β-Ga2O3/YN and β-Ga2O3/MgO interfaces, which have the highest ITCs of 1224 and 1211 MW/(m²·K), respectively, among the DFT-ITC datasets. The effects of phonon density of states, group velocity, and scattering on high-heat-flux transport, and consequently on increased ITC, are also investigated. Moderate to high phonon density of states overlap, high group velocity, and low phonon scattering are required to achieve high ITC.
We also found three Magpie descriptors with strong Pearson correlation with ITC, namely, mean atomic number, mean atomic weight, and mean ground state volume per atom. Calculations of such descriptors are computationally efficient, and therefore, these descriptors provide a new route for quickly screening potential substrates from large-scale material pools for high-performance interfacial thermal management of high-electron mobility transistor devices.
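Ranking descriptors by Pearson correlation with a target property is cheap to compute, as the sketch below suggests. The descriptor names and all numbers are synthetic placeholders, not Magpie features or ITC values from the study.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)


def rank_descriptors(table, target):
    """Sort (name, r) pairs by |r|, strongest correlation first."""
    scores = {name: pearson_r(values, target) for name, values in table.items()}
    return sorted(scores.items(), key=lambda item: abs(item[1]), reverse=True)


# Synthetic placeholder data (hypothetical descriptor names and values).
target = [1.0, 2.0, 3.0, 4.0]
table = {
    "descriptor_a": [2.0, 4.0, 6.0, 8.0],   # rises with the target
    "descriptor_b": [4.0, 3.0, 2.0, 1.0],   # falls as the target rises
    "descriptor_c": [1.0, 3.0, 2.0, 4.0],   # weaker, noisy relation
}
ranked = rank_descriptors(table, target)
```

Sorting by the absolute value keeps strongly negative correlations near the top, since a descriptor that falls as the target rises is as useful for screening as one that rises with it.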
Free, publicly-accessible full text available May 28, 2025 -
The discovery of advanced thermal materials with exceptional phonon properties drives technological advancements, impacting innovations from electronics to superconductors. Understanding the intricate relationship between composition, structure, and phonon thermal transport properties is crucial for speeding up such discovery. Exploring innovative materials involves navigating vast design spaces and considering chemical and structural factors on multiple scales and modalities. Artificial intelligence (AI) is transforming science and engineering and is poised to transform discovery and innovation. This era offers a unique opportunity to establish a new paradigm for the discovery of advanced materials by leveraging databases, simulations, and accumulated knowledge, venturing into experimental frontiers, and incorporating cutting-edge AI technologies. In this perspective, first, the general approach of density functional theory (DFT) coupled with the phonon Boltzmann transport equation (BTE) for predicting comprehensive phonon properties will be reviewed. Then, to circumvent the extremely computationally demanding DFT + BTE approach, some early studies and progress in deploying AI/machine learning (ML) models for phonon thermal transport in the context of structure–phonon property relationship prediction will be presented, and their limitations will also be discussed. Finally, a summary of current challenges and an outlook on future trends will be given. Further development of AI/ML algorithms for phonon thermal transport could range from phonon database construction to universal machine learning potential training, to inverse design of materials with target phonon properties, and to the extension of ML models beyond traditional phonons.
Free, publicly-accessible full text available May 7, 2025 -
Prediction of crystal structures with desirable material properties is a grand challenge in materials research. We deployed a graph-theory-assisted structure searcher and combined it with universal machine learning potentials to accelerate the process.
Free, publicly-accessible full text available April 2, 2025 -
Crystal structure prediction using neural network potential and age-fitness Pareto genetic algorithm
While crystal structure prediction (CSP) remains a longstanding challenge, we introduce ParetoCSP, a novel algorithm for CSP, which combines a multi-objective genetic algorithm (GA) with a neural network inter-atomic potential model to find energetically optimal crystal structures given chemical compositions. We enhance the updated multi-objective GA (NSGA-III) by incorporating the genotypic age as an independent optimization criterion and employ the M3GNet universal inter-atomic potential to guide the GA search. Compared to GN-OA, a state-of-the-art neural potential-based CSP algorithm, ParetoCSP demonstrated significantly better predictive capabilities, outperforming it by a factor of 2.562 across 55 diverse benchmark structures, as evaluated by seven performance metrics. Trajectory analysis of the traversed structures of all algorithms shows that ParetoCSP generated more valid structures than other algorithms, which helped guide the GA to search more effectively for the optimal structures. Our implementation code is available at https://github.com/sadmanomee/ParetoCSP .
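Treating age as an extra objective can be pictured with a plain Pareto-dominance check over (energy, age) pairs. This sketch is not NSGA-III and is not taken from the ParetoCSP code; it assumes, following the usual age-fitness Pareto formulation, that both energy and genotypic age are minimized, and the `Individual` type and its values are hypothetical.

```python
from typing import List, NamedTuple


class Individual(NamedTuple):
    name: str
    energy: float  # predicted energy, lower is better
    age: int       # generations since the genotype entered the population


def dominates(a: Individual, b: Individual) -> bool:
    """a dominates b if it is no worse in both objectives and better in at least one."""
    no_worse = a.energy <= b.energy and a.age <= b.age
    strictly_better = a.energy < b.energy or a.age < b.age
    return no_worse and strictly_better


def pareto_front(population: List[Individual]) -> List[Individual]:
    """Return the individuals that no other individual dominates."""
    return [p for p in population
            if not any(dominates(q, p) for q in population)]


# Hypothetical population with made-up energies and ages.
population = [
    Individual("A", energy=-3.2, age=10),  # lowest energy, but old
    Individual("B", energy=-3.0, age=2),
    Individual("C", energy=-2.5, age=5),   # worse than B in both objectives
    Individual("D", energy=-1.0, age=0),   # newly created, high energy
]
front = pareto_front(population)
```

Because a newly created structure has the lowest age, it stays on the front even when its energy is poor, which gives young candidates time to improve instead of being eliminated immediately by older, well-optimized ones.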
Free, publicly-accessible full text available March 2, 2025 -
Free, publicly-accessible full text available February 1, 2025
-
Existing machine learning potentials for predicting phonon properties of crystals are typically limited to a material-by-material basis, primarily due to the exponential scaling of model complexity with the number of atomic species. We address this bottleneck with the developed Elemental Spatial Density Neural Network Force Field, namely Elemental-SDNNFF. The effectiveness and precision of our Elemental-SDNNFF approach are demonstrated on 11,866 full, half, and quaternary Heusler structures spanning 55 elements in the periodic table by prediction of complete phonon properties. Self-improvement schemes including active learning and data augmentation techniques provide an abundant 9.4 million atomic data points for training. Deep insight into the predicted ultralow lattice thermal conductivity (<1 W m⁻¹ K⁻¹) of 774 Heusler structures is gained by p–d orbital hybridization analysis. Additionally, a class of two-band charge-2 Weyl points, referred to as “double Weyl points”, is found in 68% and 87% of 1662 half and 1550 quaternary Heuslers, respectively.
Free, publicly-accessible full text available December 1, 2024