Abstract One compelling vision of the future of materials discovery and design involves the use of machine learning (ML) models to predict materials properties and then rapidly find materials tailored for specific applications. However, realizing this vision requires both providing detailed uncertainty quantification (model prediction errors and domain of applicability) and making models readily usable. At present, it is common practice in the community to assess ML model performance only in terms of prediction accuracy (e.g. mean absolute error), while neglecting detailed uncertainty quantification and robust model accessibility and usability. Here, we demonstrate a practical method for realizing both uncertainty and accessibility features with a large set of models. We develop random forest ML models for 33 materials properties spanning an array of data sources (computational and experimental) and property types (electrical, mechanical, thermodynamic, etc.). All models have calibrated ensemble error bars to quantify prediction uncertainty and domain of applicability guidance enabled by kernel-density-estimate-based feature distance measures. All data and models are publicly hosted on the Garden-AI infrastructure, which provides an easy-to-use, persistent interface for model dissemination that permits models to be invoked with only a few lines of Python code. We demonstrate the power of this approach by using our models to conduct a fully ML-based materials discovery exercise to search for new stable, highly active perovskite oxide catalyst materials.
Computationally efficient Bayesian unit-level random neural network modelling of survey data under informative sampling for small area estimation
Abstract The topic of neural networks has seen a surge of interest in recent years. However, one of the main challenges with these approaches is quantification of uncertainty. The use of random weight models offers a potential solution. In addition to uncertainty quantification, these models are extremely computationally efficient as they do not require optimisation through stochastic gradient descent. We show how this approach can be used to account for informative sampling of survey data through the use of a pseudo-likelihood. We illustrate the effectiveness of this methodology through simulation and a data application involving American National Election Studies data.
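As a rough illustration of the random-weight idea in the abstract above, the sketch below fixes randomly drawn hidden-layer weights and estimates only the output layer, with survey weights entering through a weighted Gaussian pseudo-likelihood. It is a simplified point-estimate stand-in (a weighted ridge solve, which corresponds to the posterior mode under an assumed Gaussian prior on the output weights), not the paper's Bayesian model. All function names, the tanh activation, and the ridge penalty are assumptions made for illustration.

```python
import numpy as np


def make_hidden_layer(n_inputs, n_hidden=50, seed=0):
    """Draw hidden-layer weights once; they are never trained (hypothetical helper)."""
    rng = np.random.default_rng(seed)
    W = rng.normal(size=(n_inputs, n_hidden))
    b = rng.normal(size=n_hidden)
    return W, b


def hidden_features(X, W, b):
    """Map inputs through the fixed random layer (tanh is an assumed activation)."""
    return np.tanh(X @ W + b)


def fit_output_weights(H, y, survey_weights, ridge=1.0):
    """Maximise a survey-weighted Gaussian pseudo-log-likelihood with a ridge penalty.

    Solves (H' diag(w) H + ridge * I) beta = H' diag(w) y in closed form,
    so no gradient descent is needed.
    """
    Hw = H * survey_weights[:, None]           # each row scaled by its survey weight
    A = H.T @ Hw + ridge * np.eye(H.shape[1])  # penalised weighted Gram matrix
    return np.linalg.solve(A, Hw.T @ y)


def predict(X, W, b, beta):
    """Predict with the fixed hidden layer and the fitted output weights."""
    return hidden_features(X, W, b) @ beta
```

Because only the linear output layer is estimated, refitting under different survey weights costs a single linear solve, which is the source of the computational efficiency the abstract describes.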
- PAR ID:
- 10402604
- Publisher / Repository:
- Oxford University Press
- Date Published:
- Journal Name:
- Journal of the Royal Statistical Society Series A: Statistics in Society
- Volume:
- 186
- Issue:
- 4
- ISSN:
- 0964-1998
- Format(s):
- Medium: X Size: p. 722-737
- Size(s):
- p. 722-737
- Sponsoring Org:
- National Science Foundation
More Like this
-
Abstract Robust quantification of predictive uncertainty is a critical addition needed for machine learning applied to weather and climate problems to improve the understanding of what is driving prediction sensitivity. Ensembles of machine learning models provide predictive uncertainty estimates in a conceptually simple way but require multiple models for training and prediction, increasing computational cost and latency. Parametric deep learning can estimate uncertainty with one model by predicting the parameters of a probability distribution but does not account for epistemic uncertainty. Evidential deep learning, a technique that extends parametric deep learning to higher-order distributions, can account for both aleatoric and epistemic uncertainties with one model. This study compares the uncertainty derived from evidential neural networks to that obtained from ensembles. Through applications of the classification of winter precipitation type and regression of surface-layer fluxes, we show evidential deep learning models attaining predictive accuracy rivaling standard methods while robustly quantifying both sources of uncertainty. We evaluate the uncertainty in terms of how well the predictions are calibrated and how well the uncertainty correlates with prediction error. Analyses of uncertainty in the context of the inputs reveal sensitivities to underlying meteorological processes, facilitating interpretation of the models. The conceptual simplicity, interpretability, and computational efficiency of evidential neural networks make them highly extensible, offering a promising approach for reliable and practical uncertainty quantification in Earth system science modeling. 
To encourage broader adoption of evidential deep learning, we have developed a new Python package, Machine Integration and Learning for Earth Systems (MILES) group Generalized Uncertainty for Earth System Science (GUESS) (MILES-GUESS) (https://github.com/ai2es/miles-guess), that enables users to train and evaluate both evidential and ensemble deep learning. Significance Statement: This study demonstrates a new technique, evidential deep learning, for robust and computationally efficient uncertainty quantification in modeling the Earth system. The method integrates probabilistic principles into deep neural networks, enabling the estimation of both aleatoric uncertainty from noisy data and epistemic uncertainty from model limitations using a single model. Our analyses reveal how decomposing these uncertainties provides valuable insights into reliability, accuracy, and model shortcomings. We show that the approach can rival standard methods in classification and regression tasks within atmospheric science while offering practical advantages such as computational efficiency. With further advances, evidential networks have the potential to enhance risk assessment and decision-making across meteorology by improving uncertainty quantification, a longstanding challenge. This work establishes a strong foundation and motivation for the broader adoption of evidential learning, where properly quantifying uncertainties is critical yet lacking.
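To make the aleatoric/epistemic split in the abstract above concrete, here is a minimal sketch of the decomposition commonly used in evidential regression, where the network outputs the four parameters of a Normal-Inverse-Gamma distribution. Whether MILES-GUESS uses exactly this parameterisation is an assumption; the function name and argument names are hypothetical and do not come from that package.

```python
def nig_uncertainty(gamma, nu, alpha, beta):
    """Split predictive uncertainty for Normal-Inverse-Gamma evidential outputs.

    gamma: predicted mean; nu, alpha, beta: evidence parameters (assumed names).
    Returns (prediction, aleatoric variance, epistemic variance).
    """
    if nu <= 0 or alpha <= 1 or beta <= 0:
        raise ValueError("require nu > 0, alpha > 1, beta > 0")
    aleatoric = beta / (alpha - 1.0)         # expected data noise, E[sigma^2]
    epistemic = beta / (nu * (alpha - 1.0))  # variance of the mean, Var[mu]
    return gamma, aleatoric, epistemic
```

In this form a single forward pass yields both terms, in contrast to an ensemble, which needs one pass per member.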
-
Abstract Obtaining accurate estimates of machine learning model uncertainties on newly predicted data is essential for understanding the accuracy of the model and whether its predictions can be trusted. A common approach to such uncertainty quantification is to estimate the variance from an ensemble of models, which are often generated by the generally applicable bootstrap method. In this work, we demonstrate that the direct bootstrap ensemble standard deviation is not an accurate estimate of uncertainty but that it can be simply calibrated to dramatically improve its accuracy. We demonstrate the effectiveness of this calibration method for both synthetic data and numerous physical datasets from the field of Materials Science and Engineering. The approach is motivated by applications in physical and biological science but is quite general and should be applicable for uncertainty quantification in a wide range of machine learning regression models.
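The calibration idea in the abstract above can be sketched as follows. The abstract does not state the functional form of the calibration, so this example assumes the simplest choice, a single multiplicative factor fitted on held-out residuals, and uses a bootstrap ensemble of linear least-squares fits as a stand-in model. All function names are hypothetical.

```python
import numpy as np


def bootstrap_linear_ensemble(X, y, n_models=50, seed=0):
    """Fit one least-squares model per bootstrap resample; returns stacked coefficients."""
    rng = np.random.default_rng(seed)
    Xb = np.column_stack([np.ones(len(y)), X])            # prepend an intercept column
    coefs = []
    for _ in range(n_models):
        idx = rng.integers(0, len(y), size=len(y))        # sample rows with replacement
        beta, *_ = np.linalg.lstsq(Xb[idx], y[idx], rcond=None)
        coefs.append(beta)
    return np.array(coefs)


def ensemble_predict(coefs, X):
    """Return the ensemble mean and the raw (uncalibrated) ensemble standard deviation."""
    Xb = np.column_stack([np.ones(len(X)), X])
    preds = Xb @ coefs.T                                  # shape (n_samples, n_models)
    return preds.mean(axis=1), preds.std(axis=1, ddof=1)


def fit_scale_factor(residuals, sigma):
    """Scale a such that residual / (a * sigma) has unit mean square on calibration data.

    This is the Gaussian maximum-likelihood estimate when only a scale is fitted.
    """
    return float(np.sqrt(np.mean((residuals / sigma) ** 2)))
```

A typical use is to fit the ensemble on training data, compute residuals and raw standard deviations on a separate calibration split, and then report `a * sigma` as the error bar for new predictions.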
-
When predicting physical phenomena through simulation, quantification of the total uncertainty due to multiple sources is as crucial as making sure the underlying numerical model is accurate. Possible sources include irreducible aleatoric uncertainty due to noise in the data, epistemic uncertainty induced by insufficient data or inadequate parameterization and model-form uncertainty related to the use of misspecified model equations. In addition, recently proposed approaches provide flexible ways to combine information from data with full or partial satisfaction of equations that typically encode physical principles. Physics-based regularization interacts in non-trivial ways with aleatoric, epistemic and model-form uncertainty and their combination, and a better understanding of this interaction is needed to improve the predictive performance of physics-informed digital twins that operate under real conditions. To better understand this interaction, with a specific focus on biological and physiological models, this study investigates the decomposition of total uncertainty in the estimation of states and parameters of a differential system simulated with MC X-TFC, a new physics-informed approach for uncertainty quantification based on random projections and Monte Carlo sampling. After an introductory comparison between approaches for physics-informed estimation, MC X-TFC is applied to a six-compartment stiff ODE system, the CVSim-6 model, developed in the context of human physiology. The system is first analysed by progressively removing data while estimating an increasing number of parameters, and subsequently by investigating total uncertainty under model-form misspecification of nonlinear resistance in the pulmonary compartment. In particular, we focus on the interaction between the formulation of the discrepancy term and quantification of model-form uncertainty, and show how additional physics can help in the estimation process.
The method demonstrates robustness and efficiency in estimating unknown states and parameters, even with limited, sparse and noisy data. It also offers great flexibility in integrating data with physics for improved estimation, even in cases of model misspecification. This article is part of the theme issue ‘Uncertainty quantification for healthcare and biological systems (Part 1)’.
-
Abstract There is often considerable uncertainty in parameters in ecological models. This uncertainty can be incorporated into models by treating parameters as random variables with distributions, rather than fixed quantities. Recent advances in uncertainty quantification methods, such as polynomial chaos approaches, allow for the analysis of models with random parameters. We introduce these methods with a motivating case study of sea ice algal blooms in heterogeneous environments. We compare Monte Carlo methods with polynomial chaos techniques to help understand the dynamics of an algal bloom model with random parameters. Modelling key parameters in the algal bloom model as random variables changes the timing, intensity and overall productivity of the modelled bloom. The computational efficiency of polynomial chaos methods provides a promising avenue for the broader inclusion of parametric uncertainty in ecological models, leading to improved model predictions and synthesis between models and data.
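The Monte Carlo versus polynomial chaos comparison in the abstract above can be illustrated on a toy problem. The sketch below is not the algal bloom model: it assumes a one-parameter exponential growth response with a normally distributed growth rate, and builds a Hermite polynomial chaos expansion by Gauss-Hermite quadrature. The model, the function names, and the default order and sample sizes are all assumptions for illustration.

```python
import math

import numpy as np
from numpy.polynomial.hermite_e import hermegauss, hermeval


def growth(r, t=1.0):
    """Toy response: population multiplier exp(r * t) for growth rate r (assumed model)."""
    return np.exp(r * t)


def pce_coefficients(f, mu, sigma, order=6, n_quad=20):
    """Project f(mu + sigma * xi), xi ~ N(0, 1), onto probabilists' Hermite polynomials."""
    nodes, weights = hermegauss(n_quad)
    weights = weights / math.sqrt(2.0 * math.pi)      # normalise to a probability measure
    values = f(mu + sigma * nodes)
    coeffs = []
    for k in range(order + 1):
        basis = hermeval(nodes, [0.0] * k + [1.0])    # He_k evaluated at the nodes
        coeffs.append(np.sum(weights * values * basis) / math.factorial(k))
    return np.array(coeffs)


def pce_mean_var(coeffs):
    """Mean is the zeroth coefficient; variance uses E[He_k^2] = k!."""
    mean = float(coeffs[0])
    var = float(sum(c ** 2 * math.factorial(k) for k, c in enumerate(coeffs) if k > 0))
    return mean, var


def mc_mean_var(f, mu, sigma, n=200_000, seed=0):
    """Plain Monte Carlo estimate of the same two moments."""
    rng = np.random.default_rng(seed)
    samples = f(rng.normal(mu, sigma, size=n))
    return float(samples.mean()), float(samples.var())
```

In this toy setting the expansion needs only 20 model evaluations at the quadrature nodes, while the Monte Carlo estimate uses 200,000, which gives a sense of the efficiency argument made in the abstract; for models with many random parameters the cost of the quadrature grows quickly and the trade-off changes.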
