Title: Creating and Evaluating Uncertainty Estimates with Neural Networks for Environmental-Science Applications
Abstract Neural networks (NN) have become an important tool for prediction tasks – both regression and classification – in environmental science. Since many environmental-science problems involve life-or-death decisions and policy-making, it is crucial to provide not only predictions but also an estimate of the uncertainty in the predictions. Until recently, very few tools were available to provide uncertainty quantification (UQ) for NN predictions. However, in recent years the computer-science field has developed numerous UQ approaches, and several research groups are exploring how to apply these approaches in environmental science. We provide an accessible introduction to six of these UQ approaches, then focus on tools for the next step, namely to answer the question: Once we obtain an uncertainty estimate (using any approach), how do we know whether it is good or bad? To answer this question, we highlight four evaluation graphics and eight evaluation scores that are well suited for evaluating and comparing uncertainty estimates (NN-based or otherwise) for environmental-science applications. We demonstrate the UQ approaches and UQ-evaluation methods for two real-world problems: (1) estimating vertical profiles of atmospheric dewpoint (a regression task) and (2) predicting convection over Taiwan based on Himawari-8 satellite imagery (a classification task). We also provide Jupyter notebooks with Python code for implementing the UQ approaches and UQ-evaluation methods discussed herein. This article provides the environmental-science community with the knowledge and tools to start incorporating the large number of emerging UQ methods into their research.
Award ID(s):
1934668
PAR ID:
10401892
Author(s) / Creator(s):
; ; ; ;
Date Published:
Journal Name:
Artificial Intelligence for the Earth Systems
ISSN:
2769-7525
Page Range / eLocation ID:
1 to 58
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. Abstract Neural networks (NNs) are increasingly used for data‐driven subgrid‐scale parameterizations in weather and climate models. While NNs are powerful tools for learning complex non‐linear relationships from data, there are several challenges in using them for parameterizations. Three of these challenges are (a) data imbalance related to learning rare, often large‐amplitude, samples; (b) uncertainty quantification (UQ) of the predictions to provide an accuracy indicator; and (c) generalization to other climates, for example, those with different radiative forcings. Here, we examine the performance of methods for addressing these challenges using NN‐based emulators of the Whole Atmosphere Community Climate Model (WACCM) physics‐based gravity wave (GW) parameterizations as a test case. WACCM has complex, state‐of‐the‐art parameterizations for orography‐, convection‐, and front‐driven GWs. Convection‐ and orography‐driven GWs have significant data imbalance due to the absence of convection or orography in most grid points. We address data imbalance using resampling and/or weighted loss functions, enabling the successful emulation of parameterizations for all three sources. We demonstrate that three UQ methods (Bayesian NNs, variational auto‐encoders, and dropouts) provide ensemble spreads that correspond to accuracy during testing, offering criteria for identifying when an NN gives inaccurate predictions. Finally, we show that the accuracy of these NNs decreases for a warmer climate (4 × CO2). However, their performance is significantly improved by applying transfer learning, for example, re‐training only one layer using ∼1% new data from the warmer climate. The findings of this study offer insights for developing reliable and generalizable data‐driven parameterizations for various processes, including (but not limited to) GWs. 
  2. In recent years, neural networks (NNs) have been embraced by several scientific and engineering disciplines for diverse modeling and inferencing applications. The importance of quantifying the confidence in NN predictions has escalated due to the increasing adoption of these decision models. Nevertheless, conventional NNs do not furnish uncertainty estimates associated with their predictions and are therefore ill-calibrated. Uncertainty quantification techniques offer probability distributions or confidence intervals (CIs) to represent the uncertainty associated with NN predictions, instead of solely presenting the point predictions/estimates. Once the uncertainty in NNs is quantified, it is crucial to leverage this information to modify training objectives and improve the accuracy and reliability of the corresponding decision models. This work presents a novel framework to utilize the knowledge of input and output uncertainties in NNs to guide the querying process in the context of Active Learning. We also derive the lower and upper bounds for label complexity. The efficacy of the proposed framework is established by conducting experiments across classification and regression tasks.
  3. Deep Learning (DL) methods have been transforming computer vision with innovative adaptations to other domains including climate change. For DL to pervade Science and Engineering (S&E) applications where risk management is a core component, well-characterized uncertainty estimates must accompany predictions. However, S&E observations and model simulations often follow heavily skewed distributions and are not well modeled with DL approaches, since they usually optimize a Gaussian, or Euclidean, likelihood loss. Recent developments in Bayesian Deep Learning (BDL), which attempts to capture uncertainties from noisy observations (aleatoric) and from unknown model parameters (epistemic), provide us a foundation. Here we present a discrete-continuous BDL model with Gaussian and lognormal likelihoods for uncertainty quantification (UQ). We demonstrate the approach by developing UQ estimates on “DeepSD”, a super-resolution based DL model for Statistical Downscaling (SD) in climate applied to precipitation, which follows an extremely skewed distribution. We find that the discrete-continuous models outperform a basic Gaussian distribution in terms of predictive accuracy and uncertainty calibration. Furthermore, we find that the lognormal distribution, which can handle skewed distributions, produces quality uncertainty estimates at the extremes. Such results may be important across S&E, as well as other domains such as finance and economics, where extremes are often of significant interest. Furthermore, to our knowledge, this is the first UQ model in SD where both aleatoric and epistemic uncertainties are characterized.
  4. Abstract The rapid intensification (RI) of tropical cyclones (TC), defined here as an intensity increase of ≥ 30 kt in 24 hours, is a difficult but important forecasting problem. Operational RI forecasts have considerably improved since the late 2000s, largely thanks to better statistical models, including machine learning (ML). Most ML applications use scalars from the Statistical Hurricane Intensity Prediction Scheme (SHIPS) development dataset as predictors, describing the TC history, near-TC environment, and satellite presentation of the TC. More recent ML applications use convolutional neural networks (CNN), which can ingest full satellite images (or time series of images) and freely “decide” which spatiotemporal features are important for RI. However, two questions remain unanswered: (1) Does image convolution significantly improve RI skill? (2) What strategies do CNNs use for RI prediction – and can we gain new insights from these strategies? We use an ablation experiment to answer the first question and explainable artificial intelligence (XAI) to answer the second. Convolution leads to only a small performance gain, likely because, as revealed by XAI, the CNN’s main strategy uses image features already well described in scalar predictors used by pre-existing RI models. This work makes three additional contributions to the literature: (1) NNs with SHIPS data outperform pre-existing models in some aspects; (2) NNs provide well calibrated uncertainty quantification (UQ), while pre-existing models have no UQ; (3) the NN without SHIPS data performs surprisingly well and is fairly independent of pre-existing models, suggesting its potential value in an operational ensemble. 
  5. Cyber-physical systems are starting to adopt neural network (NN) models for a variety of smart sensing applications. While several efforts seek better NN architectures for system performance improvement, few attempts have been made to study the deployment of these systems in the field. Proper deployment of these systems is critical to achieving ideal performance, but the current practice is largely empirical, via trial and error, and lacks a measure of quality. Sensing quality should reflect the impact on the performance of NN models that drive machine perception tasks. However, traditional approaches either evaluate statistical differences that exist objectively, or model the quality subjectively via human perception. In this work, we propose an efficient sensing quality measure requiring limited data samples, using a smart voice sensing system as an example. We adopt recent techniques in uncertainty evaluation for NNs to estimate audio sensing quality. Intuitively, a deployment at a better sensing location should lead to less uncertainty in NN predictions. We design SQEE, Sensing Quality Evaluation at the Edge for NN models, which constructs a model ensemble through Monte-Carlo dropout and estimates posterior total uncertainty via average conditional entropy. We collected data from three indoor environments, with a total of 148 transmitting-receiving (t-r) locations experimented and more than 7,000 examples tested. SQEE achieves the best top-1 ranking accuracy (whether the measure finds the best spot for deployment) in comparison with other uncertainty strategies. We implemented SQEE on a ReSpeaker to study SQEE's real-world efficacy. Experimental results show that SQEE can effectively evaluate the data collected from each t-r location pair within 30 seconds and achieve an average top-3 ranking accuracy of over 94%. We further discuss generalization of our framework to other sensing schemes.
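The last abstract above mentions building an ensemble through Monte-Carlo dropout and scoring uncertainty with an average conditional entropy. The sketch below illustrates that general idea only and is not the SQEE implementation: the toy linear classifier, the use of dropout on the inputs, and every name in it (`mc_dropout_passes`, `average_conditional_entropy`, `predictive_entropy`, the weights and feature vectors) are assumptions made for illustration. It uses only the Python standard library.

```python
import math
import random


def softmax(logits):
    """Convert raw scores to class probabilities (numerically stable)."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def entropy(probs):
    """Shannon entropy in nats; zero-probability classes contribute nothing."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def mc_dropout_passes(x, weights, n_passes=50, drop_rate=0.5, seed=0):
    """Run stochastic forward passes of a toy linear classifier.

    Dropout stays active at inference time, so each pass acts as one
    member of an implicit ensemble. `weights` holds one row per class.
    """
    if not 0.0 <= drop_rate < 1.0:
        raise ValueError("drop_rate must be in [0, 1)")
    rng = random.Random(seed)
    keep = 1.0 - drop_rate
    passes = []
    for _ in range(n_passes):
        # Inverted dropout: zero some inputs, rescale the survivors.
        masked = [xi / keep if rng.random() < keep else 0.0 for xi in x]
        logits = [sum(w * xi for w, xi in zip(row, masked)) for row in weights]
        passes.append(softmax(logits))
    return passes


def average_conditional_entropy(passes):
    """Mean of the per-pass entropies."""
    return sum(entropy(p) for p in passes) / len(passes)


def predictive_entropy(passes):
    """Entropy of the pass-averaged distribution, shown for comparison."""
    n = len(passes)
    mean = [sum(p[k] for p in passes) / n for k in range(len(passes[0]))]
    return entropy(mean)


if __name__ == "__main__":
    # Hypothetical two-class weights and two hypothetical feature vectors.
    weights = [[0.2, -0.1, 0.4], [-0.3, 0.5, 0.1]]
    candidates = {"spot_a": [1.0, 2.0, 0.5], "spot_b": [0.1, 0.1, 0.1]}
    for name, x in candidates.items():
        passes = mc_dropout_passes(x, weights)
        print(name, average_conditional_entropy(passes), predictive_entropy(passes))
```

Under the abstract's intuition that a better sensing location yields less uncertainty, candidate locations could be ranked by ascending score. Because entropy is concave, the entropy of the averaged distribution is never smaller than the average of the per-pass entropies, which is why the two functions are kept separate here.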