The earth system is exceedingly complex and often chaotic in nature, making prediction incredibly challenging: we cannot expect to make perfect predictions all of the time. Instead, we look for specific states of the system that lead to more predictable behavior than others, often termed “forecasts of opportunity.” When these opportunities are not present, scientists need prediction systems that are capable of saying “I don't know.” We introduce a novel loss function, termed the “NotWrong loss,” that allows neural networks to identify forecasts of opportunity for classification problems. The NotWrong loss introduces an abstention class that allows the network to identify the more confident samples and abstain (say “I don't know”) on the less confident samples. The abstention loss is designed to abstain on a user‐defined fraction of the samples via a standard adaptive controller. Unlike many machine learning methods used to reject samples post‐training, the NotWrong loss is applied during training to preferentially learn from the more confident samples. We show that the NotWrong loss outperforms other existing loss functions for multiple climate use cases. The implementation of the proposed loss function is straightforward in most network architectures designed for classification as it only requires the addition of an abstention class to the output layer and modification of the loss function.
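The idea of adding an abstention class to the output layer and changing only the loss can be sketched in a few lines. This is a minimal illustration, not the published NotWrong formula: the loss expression, the names `abstention_style_loss`, `alpha`, and `update_alpha`, and the proportional update rule are all assumptions, chosen only to show how an extra output and a penalty weight could interact.

```python
import numpy as np

def softmax(logits):
    """Row-wise softmax, shifted by the row maximum for numerical stability."""
    shifted = logits - logits.max(axis=1, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=1, keepdims=True)

def abstention_style_loss(logits, labels, alpha, eps=1e-12):
    """Illustrative abstention loss (an assumption, not the published formula).

    logits: (n_samples, n_classes + 1); the last column is the abstention class.
    labels: (n_samples,) integer labels in [0, n_classes).
    alpha:  hypothetical weight that penalises abstaining.
    """
    probs = softmax(logits)
    p_true = probs[np.arange(len(labels)), labels]
    p_abstain = probs[:, -1]
    # Low cost when probability sits on the true class or on abstention.
    not_wrong = -np.log(np.clip(p_true + p_abstain, eps, 1.0))
    # Cost of abstaining, so the network cannot abstain on every sample.
    penalty = -alpha * np.log(np.clip(1.0 - p_abstain, eps, 1.0))
    return float(np.mean(not_wrong + penalty))

def update_alpha(alpha, abstained_fraction, target_fraction, gain=0.5):
    """Proportional controller (an assumption): raise alpha when the network
    abstains more often than the user-defined target, lower it otherwise."""
    return max(alpha + gain * (abstained_fraction - target_fraction), 0.0)
```

In a training loop, `update_alpha` would be called between epochs with the fraction of samples whose most probable output was the abstention class, which nudges the network toward the requested abstention fraction.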
The earth system is exceedingly complex and often chaotic in nature, making prediction incredibly challenging: we cannot expect to make perfect predictions all of the time. Instead, we look for specific states of the system that lead to more predictable behavior than others, often termed “forecasts of opportunity.” When these opportunities are not present, scientists need prediction systems that are capable of saying “I don't know.” We introduce a novel loss function, termed “abstention loss,” that allows neural networks to identify forecasts of opportunity for regression problems. The abstention loss works by incorporating uncertainty in the network's prediction to identify the more confident samples and abstain (say “I don't know”) on the less confident samples. The abstention loss is designed to determine the optimal abstention fraction, or abstain on a user‐defined fraction using a standard adaptive controller. Unlike many methods for attaching uncertainty to neural network predictions post‐training, the abstention loss is applied during training to preferentially learn from the more confident samples. The abstention loss is built upon nonlinear heteroscedastic regression, a standard computer science method. While nonlinear heteroscedastic regression is a simple yet powerful tool for incorporating uncertainty in regression problems, we demonstrate that the abstention loss outperforms it for the synthetic climate use cases explored here. The implementation of the proposed abstention loss is straightforward in most network architectures designed for regression, as it only requires modification of the output layer and loss function.
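The heteroscedastic regression that the abstention loss builds on can be illustrated with a Gaussian negative log-likelihood, where the network predicts both a mean and a spread for each sample. The sketch below covers only that baseline plus a simple ranking of samples by predicted spread. It does not reproduce the abstention loss itself, which acts during training. The function names and the use of a quantile cut-off are assumptions made for illustration.

```python
import numpy as np

def gaussian_nll(y_true, mu, log_sigma):
    """Per-sample negative log-likelihood of y_true under N(mu, sigma^2).

    The network is assumed to output log_sigma so that sigma stays positive.
    """
    sigma = np.exp(log_sigma)
    return 0.5 * np.log(2.0 * np.pi) + log_sigma + 0.5 * ((y_true - mu) / sigma) ** 2

def abstain_mask(log_sigma, abstain_fraction):
    """Boolean mask that is True for the least confident samples.

    Samples whose predicted spread lies above the (1 - abstain_fraction)
    quantile are flagged as "I don't know" (a post-hoc rule, assumed here).
    """
    threshold = np.quantile(log_sigma, 1.0 - abstain_fraction)
    return log_sigma > threshold
```

A large predicted spread lowers the cost of a poor mean prediction, which is what lets the network express low confidence; ranking samples by that spread then separates the more confident predictions from the rest.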
- DOI PREFIX: 10.1029
- Journal Name: Journal of Advances in Modeling Earth Systems
- Medium: X
- Sponsoring Org: National Science Foundation
More Like this
Robots are active agents that operate in dynamic scenarios with noisy sensors. Predictions based on these noisy sensor measurements often lead to errors and can be unreliable. To mitigate this, roboticists have used fusion methods that combine multiple observations. Lately, neural networks have dominated the accuracy charts for perception-driven predictions in robotic decision-making, but they often lack uncertainty metrics associated with the predictions. Here, we present a mathematical formulation to obtain the heteroscedastic aleatoric uncertainty of any arbitrary distribution without prior knowledge about the data. The approach has no prior assumptions about the prediction labels and is agnostic to network architecture. Furthermore, our class of networks, Ajna, adds minimal computation and requires only a small change to the loss function while training neural networks to obtain uncertainty of predictions, enabling real-time operation even on resource-constrained robots. In addition, we study the informational cues present in the uncertainties of predicted values and their utility in the unification of common robotics problems. In particular, we present an approach to dodge dynamic obstacles, navigate through a cluttered scene, fly through unknown gaps, and segment an object pile, without computing depth but rather using the uncertainties of optical flow obtained from a monocular camera with onboard sensing and computation. We successfully evaluate and demonstrate the proposed Ajna network on the four aforementioned common robotics and computer vision tasks and show comparable results to methods directly using depth. Our work demonstrates a generalized deep uncertainty method and its use in robotics applications.
Neural networks (NN) have become an important tool for prediction tasks (both regression and classification) in environmental science. Since many environmental-science problems involve life-or-death decisions and policy making, it is crucial to provide not only predictions but also an estimate of the uncertainty in the predictions. Until recently, very few tools were available to provide uncertainty quantification (UQ) for NN predictions. However, in recent years the computer-science field has developed numerous UQ approaches, and several research groups are exploring how to apply these approaches in environmental science. We provide an accessible introduction to six of these UQ approaches, then focus on tools for the next step, namely, to answer the question: Once we obtain an uncertainty estimate (using any approach), how do we know whether it is good or bad? To answer this question, we highlight four evaluation graphics and eight evaluation scores that are well suited for evaluating and comparing uncertainty estimates (NN based or otherwise) for environmental-science applications. We demonstrate the UQ approaches and UQ-evaluation methods for two real-world problems: 1) estimating vertical profiles of atmospheric dewpoint (a regression task) and 2) predicting convection over Taiwan based on Himawari-8 satellite imagery (a classification task). We also provide Jupyter notebooks with Python code for implementing the UQ approaches and UQ-evaluation methods discussed herein. This article provides the environmental-science community with the knowledge and tools to start incorporating the large number of emerging UQ methods into their research.
Significance Statement
Neural networks are used for many environmental-science applications, some involving life-or-death decision-making. In recent years new methods have been developed to provide much-needed uncertainty estimates for NN predictions. We seek to accelerate the adoption of these methods in the environmental-science community with an accessible introduction to 1) methods for computing uncertainty estimates in NN predictions and 2) methods for evaluating such estimates.
Deep neural networks (DNNs) have started to find their role in the modern healthcare system. DNNs are being developed for diagnosis, prognosis, treatment planning, and outcome prediction for various diseases. With the increasing number of applications of DNNs in modern healthcare, their trustworthiness and reliability are becoming increasingly important. An essential aspect of trustworthiness is detecting the performance degradation and failure of deployed DNNs in medical settings. The softmax output values produced by DNNs are not a calibrated measure of model confidence. Softmax probability numbers are generally higher than the actual model confidence. The model confidence-accuracy gap further increases for wrong predictions and noisy inputs. We employ recently proposed Bayesian deep neural networks (BDNNs) to learn uncertainty in the model parameters. These models simultaneously output the predictions and a measure of confidence in the predictions. By testing these models under various noisy conditions, we show that the (learned) predictive confidence is well calibrated. We use these reliable confidence values for monitoring performance degradation and failure detection in DNNs. We propose two different failure detection methods. In the first method, we define a fixed threshold value based on the behavior of the predictive confidence with changing signal-to-noise ratio (SNR) of the test dataset. The second method learns the threshold value with a neural network. The proposed failure detection mechanisms abstain from making decisions when the confidence of the BDNN is below the defined threshold and hold the decision for manual review. As a result, the accuracy of the models improves on the unseen test samples. We tested our proposed approach on three medical imaging datasets: PathMNIST, DermaMNIST, and OrganAMNIST, under different levels and types of noise. An increase in the noise of the test images increases the number of abstained samples. BDNNs are inherently robust and show more than 10% accuracy improvement with the proposed failure detection methods. The increased number of abstained samples or an abrupt increase in the predictive variance indicates model performance degradation or possible failure. Our work has the potential to improve the trustworthiness of DNNs and enhance user confidence in the model predictions.
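The fixed-threshold variant of abstention described above can be sketched as follows. This is only an illustration of the general mechanism: the function names, the confidence values, and the threshold are hypothetical, and how a BDNN produces its confidence is outside the scope of the sketch.

```python
import numpy as np

def keep_mask(confidence, threshold):
    """True where the model's confidence is high enough to issue a decision.

    Samples below the threshold are abstained on and held for manual review.
    """
    return confidence >= threshold

def accuracy_on_kept(predictions, labels, keep):
    """Accuracy over the samples that were not abstained on."""
    if not keep.any():
        return float("nan")  # nothing was kept, so accuracy is undefined
    return float(np.mean(predictions[keep] == labels[keep]))

# Hypothetical values, for illustration only.
predictions = np.array([0, 1, 1, 0])
labels = np.array([0, 1, 0, 1])
confidence = np.array([0.9, 0.8, 0.4, 0.3])
keep = keep_mask(confidence, threshold=0.5)
```

When low confidence coincides with errors, as in these made-up values, accuracy on the kept samples exceeds accuracy on the full set, at the cost of deferring some decisions.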
We investigate the predictability of the sign of daily southeastern U.S. (SEUS) precipitation anomalies associated with simultaneous predictors of large-scale climate variability using machine learning models. Models using index-based climate predictors and gridded fields of large-scale circulation as predictors are utilized. Logistic regression (LR) and fully connected neural networks using indices of climate phenomena as predictors produce neither accurate nor reliable predictions, indicating that the indices themselves are not good predictors. Using gridded fields as predictors, an LR and convolutional neural network (CNN) are more accurate than the index-based models. However, only the CNN can produce reliable predictions that can be used to identify forecasts of opportunity. Using explainable machine learning we identify which variables and grid points of the input fields are most relevant for confident and correct predictions in the CNN. Our results show that the local circulation is most important as represented by maximum relevance of 850-hPa geopotential heights and zonal winds to making skillful, high-probability predictions. Corresponding composite anomalies identify connections with El Niño–Southern Oscillation during winter and the Atlantic multidecadal oscillation and North Atlantic subtropical high during summer.