Predicting properties from a material’s composition or structure is of great interest for materials design. Deep learning has recently garnered considerable interest in materials predictive tasks with low model errors when dealing with large materials data. However, deep learning models suffer in the small data regime that is common in materials science. Here we develop the AtomSets framework, which utilizes universal compositional and structural descriptors extracted from pre-trained graph network deep learning models with standard multi-layer perceptrons to achieve consistently high model accuracy for both small compositional data (<400) and large structural data (>130,000). The AtomSets models show lower errors than the graph network models at small data limits and other non-deep-learning models at large data limits. They also transfer better in a simulated materials discovery process where the targeted materials have property values out of the training data limits. The models require minimal domain knowledge inputs and are free from feature engineering. The presented AtomSets model framework can potentially accelerate machine learning-assisted materials design and discovery with less data restriction.
This content will become publicly available on January 2, 2025
Modern data mining methods have demonstrated effectiveness in comprehending and predicting materials properties. An essential component in the process of materials discovery is to know which material(s) will possess desirable properties. For many materials properties, performing experiments and density functional theory computations is costly and time-consuming. Hence, it is challenging to build accurate predictive models for such properties using conventional data mining methods due to the small amount of available data. Here we present a framework for materials property prediction tasks using structure information that leverages a graph neural network-based architecture along with deep-transfer-learning techniques to drastically improve the model’s predictive ability on diverse materials (3D/2D, inorganic/organic, computational/experimental) data. We evaluated the proposed framework in cross-property and cross-materials class scenarios using 115 datasets and found that transfer learning models outperform the models trained from scratch in 104 cases, i.e., ≈90%, with additional benefits in performance for extrapolation problems. We believe the proposed framework can be widely useful in accelerating materials discovery.
- Award ID(s):
- 2053929
- PAR ID:
- 10534256
- Publisher / Repository:
- Springer Nature
- Date Published:
- Journal Name:
- npj Computational Materials
- Volume:
- 10
- Issue:
- 1
- ISSN:
- 2057-3960
- Format(s):
- Medium: X
- Sponsoring Org:
- National Science Foundation
More Like this
-
Abstract Materials discovery from the infinite earth repository is a major bottleneck for revolutionary technological progress. This labor‐intensive and time‐consuming process hinders the discovery of new materials. Although machine learning techniques show an excellent capability for speeding up materials discovery, obtaining effective material feature representations is still challenging, and making precise predictions of material properties remains difficult. This work focuses on developing an automatic material design and discovery framework enabled by data‐driven artificial intelligence (AI) models. Multiple types of material descriptors are first developed to better represent and encode each material's uniqueness, resulting in improved performance in predicting different molecular properties. Prediction of materials' thermoelectric (TE) properties is then used as a baseline to demonstrate the investigation logic. The proposed framework achieves more than 90% accuracy for predicting materials' TE properties. Furthermore, the developed AI models identify 6 promising p‐type TE materials and 8 promising n‐type TE materials. The prediction results are evaluated by density functional theory calculations and agree with the TE properties reported in experiments. The proposed framework is expected to accelerate the design and discovery of new functional materials.
-
Abstract Modern data mining techniques using machine learning (ML) and deep learning (DL) algorithms have been shown to excel in the regression-based task of materials property prediction using various materials representations. In an attempt to improve the predictive performance of the deep neural network model, researchers have tried to add more layers as well as develop new architectural components to create sophisticated and deep neural network models that can aid in the training process and improve the predictive ability of the final model. However, these modifications usually require substantial computational resources, further increasing the already long model training time, which is often infeasible and limits their use for most researchers. In this paper, we study and propose a deep neural network framework for regression-based problems comprising fully connected layers that can work with any numerical vector-based materials representations as model input. We present a novel deep regression neural network, iBRNet, with branched skip connections and multiple schedulers, which can reduce the number of parameters used to construct the model, improve the accuracy, and decrease the training time of the predictive model. We perform the model training using composition-based numerical vectors representing the elemental fractions of the respective materials and compare their performance against other traditional ML and several known DL architectures. Using multiple datasets with varying data sizes for training and testing, we show that the proposed iBRNet models outperform the state-of-the-art ML and DL models for all data sizes. We also show that the branched structure and usage of multiple schedulers lead to fewer parameters and faster model training time with better convergence than other neural networks.
Scientific contribution: The combination of multiple callback functions in deep neural networks minimizes training time and maximizes accuracy in a controlled computational environment with parametric constraints for the task of materials property prediction.
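The "composition-based numerical vectors representing the elemental fractions" mentioned in the abstract above can be illustrated with a minimal sketch. The element vocabulary `ELEMENTS` and the helper `fraction_vector` are hypothetical names introduced here for illustration; they are not taken from iBRNet, and a real model would presumably use a much longer element list.

```python
# Illustrative sketch only: a hypothetical helper, not code from the paper.
# A truncated, made-up element vocabulary; a real one would cover many more elements.
ELEMENTS = ["H", "Li", "O", "Na", "Cl", "Fe"]


def fraction_vector(composition, elements=ELEMENTS):
    """Map {element: amount} to normalized fractions ordered like `elements`."""
    total = sum(composition.values())
    if total <= 0:
        raise ValueError("composition must contain positive amounts")
    unknown = set(composition) - set(elements)
    if unknown:
        raise ValueError(f"elements not in vocabulary: {sorted(unknown)}")
    # Elements absent from the composition get a fraction of 0.0.
    return [composition.get(el, 0.0) / total for el in elements]


# Fe2O3: two Fe atoms and three O atoms per formula unit.
vec = fraction_vector({"Fe": 2, "O": 3})
print(vec)
```

A vector of this kind has a fixed length regardless of how many elements a material contains, which is what lets it feed a network built from fully connected layers.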
-
Abstract While machine learning has emerged in recent years as a useful tool for the rapid prediction of materials properties, generating sufficient data to reliably train models without overfitting is often impractical. Towards overcoming this limitation, we present a general framework for leveraging complementary information across different models and datasets for accurate prediction of data-scarce materials properties. Our approach, based on a machine learning paradigm called mixture of experts, outperforms pairwise transfer learning on 14 of 19 materials property regression tasks, performing comparably on four of the remaining five. The approach is interpretable, model-agnostic, and scalable to combining an arbitrary number of pre-trained models and datasets to any downstream property prediction task. We anticipate the performance of our framework will further improve as better model architectures, new pre-training tasks, and larger materials datasets are developed by the community.
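As a rough illustration of the mixture-of-experts idea named in the abstract above, the sketch below combines the outputs of several experts using softmax gate weights. This is the generic textbook form, not the authors' implementation; `softmax`, `mixture_prediction`, and the numbers used are assumptions made for the example.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of raw gate scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    z = sum(exps)
    return [e / z for e in exps]


def mixture_prediction(expert_outputs, gate_scores):
    """Weight each expert's scalar prediction by its softmax gate weight."""
    if len(expert_outputs) != len(gate_scores):
        raise ValueError("need one gate score per expert")
    weights = softmax(gate_scores)
    return sum(w * y for w, y in zip(weights, expert_outputs))


# Two hypothetical experts with equal gate scores: the result is their mean.
combined = mixture_prediction([1.0, 3.0], [0.0, 0.0])
print(combined)  # 2.0
```

In a trained model the gate scores would typically be learned and could depend on the input, so different experts dominate for different materials.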
-
Abstract High entropy alloys (HEAs) are an important material class in the development of next-generation structural materials, but the astronomically large composition space cannot be efficiently explored by experiments or first-principles calculations. Machine learning (ML) methods might address this challenge, but ML of HEAs has been hindered by the scarcity of HEA property data. In this work, the EMTO-CPA method was used to generate a large HEA dataset (spanning a composition space of 14 elements) containing 7086 cubic HEA structures with structural properties, 1911 of which have the complete elastic tensor calculated. The elastic property dataset was used to train an ML model with the Deep Sets architecture. The Deep Sets model has better predictive performance and generalizability compared to other ML models. Association rule mining was applied to the model predictions to describe the compositional dependence of HEA elastic properties and to demonstrate the potential for data-driven alloy design.
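The Deep Sets architecture mentioned in the abstract above has the general form of a per-element transform, followed by a sum over the set, followed by a readout, which makes the output independent of the order in which elements are listed. The sketch below shows only that structure; `phi`, `rho`, the toy coefficients, and the (feature, fraction) inputs are illustrative assumptions and not the model trained in the paper.

```python
def phi(feature, fraction):
    """Toy per-element transform (stands in for a learned network)."""
    return [fraction * feature, fraction * feature ** 2]


def rho(pooled):
    """Toy readout on the pooled representation (stands in for a learned network)."""
    return 0.5 * pooled[0] + 0.1 * pooled[1]


def deep_sets(items):
    """Apply phi to each (feature, fraction) pair, sum-pool, then apply rho."""
    pooled = [0.0, 0.0]
    for feature, fraction in items:
        out = phi(feature, fraction)
        pooled = [p + o for p, o in zip(pooled, out)]
    return rho(pooled)


# Hypothetical alloy described by (scalar element feature, atomic fraction) pairs.
alloy = [(26.0, 0.4), (28.0, 0.35), (24.0, 0.25)]
shuffled = [alloy[2], alloy[0], alloy[1]]

# Sum pooling makes the prediction agree for any ordering (up to rounding).
print(abs(deep_sets(alloy) - deep_sets(shuffled)) < 1e-9)
```

Order independence suits alloy compositions, where a material is a set of elements with fractions and has no natural element ordering.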