Thanks to rapid advances in artificial intelligence, AI for science (AI4Science) has emerged as one of the most promising new research directions for modern science and engineering. In this review, we focus on recent efforts to develop knowledge-driven Bayesian learning and experimental design methods for accelerating the discovery of novel functional materials as well as enhancing the understanding of composition-process-structure-property relationships. We specifically discuss the challenges and opportunities in integrating prior scientific knowledge and physics principles with AI and machine learning (ML) models for accelerating materials and knowledge discovery. The current state-of-the-art methods in knowledge-based prior construction, model fusion, uncertainty quantification, optimal experimental design, and symbolic regression are detailed in the review, along with several case studies and results in materials discovery.
Machine learning materials properties with accurate predictions, uncertainty estimates, domain guidance, and persistent online accessibility
Abstract: One compelling vision of the future of materials discovery and design involves the use of machine learning (ML) models to predict materials properties and then rapidly find materials tailored for specific applications. However, realizing this vision requires both providing detailed uncertainty quantification (model prediction errors and domain of applicability) and making models readily usable. At present, it is common practice in the community to assess ML model performance only in terms of prediction accuracy (e.g. mean absolute error), while neglecting detailed uncertainty quantification and robust model accessibility and usability. Here, we demonstrate a practical method for realizing both uncertainty and accessibility features with a large set of models. We develop random forest ML models for 33 materials properties spanning an array of data sources (computational and experimental) and property types (electrical, mechanical, thermodynamic, etc.). All models have calibrated ensemble error bars to quantify prediction uncertainty and domain of applicability guidance enabled by kernel-density-estimate-based feature distance measures. All data and models are publicly hosted on the Garden-AI infrastructure, which provides an easy-to-use, persistent interface for model dissemination that permits models to be invoked with only a few lines of Python code. We demonstrate the power of this approach by using our models to conduct a fully ML-based materials discovery exercise to search for new stable, highly active perovskite oxide catalyst materials.
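To make the idea of calibrated ensemble error bars concrete, the following is a minimal sketch and not the authors' implementation. It assumes a toy bootstrap ensemble of one-parameter linear fits standing in for the trees of a random forest, and it assumes a single multiplicative scale factor as the calibration model (the maximum-likelihood choice when errors are treated as Gaussian). All function names and the synthetic data are hypothetical.

```python
import math
import random
import statistics


def fit_slope(xs, ys):
    """Least-squares slope for a line through the origin (toy base model)."""
    return sum(x * y for x, y in zip(xs, ys)) / sum(x * x for x in xs)


def bootstrap_ensemble(xs, ys, n_models, rng):
    """Fit one toy model per bootstrap resample of the training data."""
    n = len(xs)
    slopes = []
    for _ in range(n_models):
        idx = [rng.randrange(n) for _ in range(n)]
        slopes.append(fit_slope([xs[i] for i in idx], [ys[i] for i in idx]))
    return slopes


def predict(slopes, x):
    """Ensemble mean and raw (uncalibrated) spread of the member predictions."""
    preds = [s * x for s in slopes]
    return statistics.fmean(preds), statistics.pstdev(preds)


def fit_scale_factor(residuals, raw_stds):
    """Scale a that maximizes the Gaussian likelihood of residuals with std a * raw_std.

    Setting the derivative of the negative log-likelihood to zero gives
    a^2 = mean((residual / raw_std)^2).
    """
    ratios = [(r / s) ** 2 for r, s in zip(residuals, raw_stds)]
    return math.sqrt(statistics.fmean(ratios))


rng = random.Random(0)


def make_data(n):
    """Synthetic data (assumption): y = 2x plus unit-variance Gaussian noise."""
    xs = [rng.uniform(1.0, 5.0) for _ in range(n)]
    ys = [2.0 * x + rng.gauss(0.0, 1.0) for x in xs]
    return xs, ys


train_x, train_y = make_data(200)
val_x, val_y = make_data(100)

slopes = bootstrap_ensemble(train_x, train_y, 25, rng)
val_preds = [predict(slopes, x) for x in val_x]

# Calibrate on held-out data: compare actual residuals with the raw ensemble spread.
residuals = [y - mean for y, (mean, _) in zip(val_y, val_preds)]
raw_stds = [std for _, std in val_preds]
scale = fit_scale_factor(residuals, raw_stds)
calibrated_stds = [scale * s for s in raw_stds]
```

In this toy setting the raw ensemble spread reflects only the uncertainty in the fitted slope, so it understates the observed error and the fitted scale factor comes out larger than one. Richer calibration models (for example, a scale plus an offset) follow the same pattern of fitting on held-out residuals.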
- PAR ID: 10558169
- Publisher / Repository: IOP Publishing
- Date Published:
- Journal Name: Machine Learning: Science and Technology
- Volume: 5
- Issue: 4
- ISSN: 2632-2153
- Format(s): Medium: X Size: Article No. 045051
- Size(s): Article No. 045051
- Sponsoring Org: National Science Foundation
More Like this
- First-principles techniques for electronic transport property prediction have seen rapid progress in recent years. However, it remains a challenge to predict properties of heterostructures incorporating fabrication-dependent variability. Machine-learning (ML) approaches are increasingly being used to accelerate design and discovery of new materials with targeted properties, and to extend the applicability of first-principles techniques to larger systems. However, few studies have exploited ML techniques to characterize relationships between local atomic structures and global electronic transport coefficients. In this work, we propose an electronic-transport-informatics (ETI) framework that trains on ab initio models of small systems and predicts the thermopower of fabricated silicon/germanium heterostructures, matching measured data. We demonstrate the application of ML approaches to extract the important physics that determines electronic transport in semiconductor heterostructures, and to bridge the gap between ab initio accessible models and fabricated systems. We anticipate that the ETI framework will have broad applicability to diverse materials classes.
- Machine learning (ML) has become a part of the fabric of high-throughput screening and computational discovery of materials. Despite its increasingly central role, challenges remain in fully realizing the promise of ML. This is especially true for the practical acceleration of the engineering of robust materials and the development of design strategies that surpass trial and error or high-throughput screening alone. Depending on the quantity being predicted and the experimental data available, ML can either outperform physics-based models, be used to accelerate such models, or be integrated with them to improve their performance. We cover recent advances in algorithms and in their application that are starting to make inroads toward (a) the discovery of new materials through large-scale enumerative screening, (b) the design of materials through identification of rules and principles that govern materials properties, and (c) the engineering of practical materials by satisfying multiple objectives. We conclude with opportunities for further advancement to realize ML as a widespread tool for practical computational materials design.
- The rapidly growing interest in machine learning (ML) for materials discovery has resulted in a large body of published work. However, only a small fraction of these publications includes confirmation of ML predictions, either via experiment or via physics-based simulations. In this review, we first identify the core components common to materials informatics discovery pipelines, such as training data, choice of ML algorithm, and measurement of model performance. Then we discuss some prominent examples of validated ML-driven materials discovery across a wide variety of materials classes, with special attention to methodological considerations and advances. Across these case studies, we identify several common themes, such as the use of domain knowledge to inform ML models.
- Mura, Cameron (Ed.) Machine learning (ML) is increasingly being used to guide biological discovery in biomedicine, such as prioritizing promising small molecules in drug discovery. In those applications, ML models are used to predict the properties of biological systems, and researchers use these predictions to prioritize candidates as new biological hypotheses for downstream experimental validation. However, when applied to unseen situations, these models can be overconfident and produce a large number of false positives. One solution to this issue is to quantify the model’s prediction uncertainty and provide a set of hypotheses with a controlled false discovery rate (FDR) pre-specified by researchers. We propose CPEC, an ML framework for FDR-controlled biological discovery. We demonstrate its effectiveness using enzyme function annotation as a case study, simulating the discovery process of identifying the functions of less-characterized enzymes. CPEC integrates a deep learning model with a statistical tool known as conformal prediction, providing accurate and FDR-controlled function predictions for a given protein enzyme. Conformal prediction provides rigorous statistical guarantees to the predictive model and ensures that the expected FDR will not exceed a user-specified level with high probability. Evaluation experiments show that CPEC achieves reliable FDR control, better or comparable prediction performance at a lower FDR than existing methods, and accurate predictions for enzymes under-represented in the training data. We expect CPEC to be a useful tool for biological discovery applications where a high yield rate in validation experiments is desired but the experimental budget is limited.
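As a rough illustration of the conformal prediction idea mentioned in the last abstract, here is a minimal sketch of generic split conformal prediction for classification. It is not the CPEC procedure: it targets marginal coverage of the true label rather than FDR control, and the labels, probabilities, and function names are all hypothetical.

```python
import math


def conformal_threshold(cal_probs, cal_labels, alpha):
    """Score threshold from a held-out calibration set.

    The nonconformity score is 1 - (predicted probability of the true label).
    The threshold is the ceil((n + 1) * (1 - alpha))-th smallest score.
    """
    scores = sorted(1.0 - probs[label] for probs, label in zip(cal_probs, cal_labels))
    n = len(scores)
    k = math.ceil((n + 1) * (1.0 - alpha))
    if k > n:
        # Too few calibration points for this alpha: keep every label.
        return float("inf")
    return scores[k - 1]


def prediction_set(probs, threshold):
    """All labels whose nonconformity score does not exceed the threshold."""
    return {label for label, p in probs.items() if 1.0 - p <= threshold}


# Hypothetical calibration data: the model's probability for the true label "A".
true_label_probs = [0.9, 0.8, 0.85, 0.7, 0.6, 0.95, 0.5, 0.4, 0.75]
cal_probs = [{"A": p, "B": 1.0 - p} for p in true_label_probs]
cal_labels = ["A"] * len(true_label_probs)

# alpha = 0.2 asks for prediction sets that contain the true label
# at least 80% of the time on exchangeable data.
threshold = conformal_threshold(cal_probs, cal_labels, alpha=0.2)

confident = prediction_set({"A": 0.7, "B": 0.3}, threshold)
ambiguous = prediction_set({"A": 0.5, "B": 0.5}, threshold)
```

A confident prediction yields a single-label set, while an ambiguous one yields a larger set, which is how the method expresses uncertainty without changing the underlying model.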
