Title: Dual Accuracy-Quality-Driven Neural Network for Prediction Interval Generation
Accurate uncertainty quantification is necessary to enhance the reliability of deep learning (DL) models in real-world applications. In the case of regression tasks, prediction intervals (PIs) should be provided along with the deterministic predictions of DL models. Such PIs are useful or “high-quality (HQ)” as long as they are sufficiently narrow and capture most of the probability density. In this article, we present a method to learn PIs for regression-based neural networks (NNs) automatically in addition to the conventional target predictions. In particular, we train two companion NNs: one that uses one output, the target estimate, and another that uses two outputs, the upper and lower bounds of the corresponding PI. Our main contribution is the design of a novel loss function for the PI-generation network that takes into account the output of the target-estimation network and has two optimization objectives: minimizing the mean PI width and ensuring the PI integrity using constraints that maximize the PI probability coverage implicitly. Furthermore, we introduce a self-adaptive coefficient that balances both objectives within the loss function, which alleviates the task of fine-tuning. Experiments using a synthetic dataset, eight benchmark datasets, and a real-world crop yield prediction dataset showed that our method was able to maintain a nominal probability coverage and produce significantly narrower PIs without detriment to its target estimation accuracy when compared to those PIs generated by three state-of-the-art neural-network-based methods. In other words, our method was shown to produce higher quality PIs.
Award ID(s):
1664858 2242802
PAR ID:
10493692
Publisher / Repository:
IEEE
Journal Name:
IEEE Transactions on Neural Networks and Learning Systems
ISSN:
2162-237X
Page Range / eLocation ID:
1 to 11
Sponsoring Org:
National Science Foundation
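
The abstract above describes a high-quality PI as one that is narrow while still capturing most of the probability density. A minimal sketch of how those two properties are commonly measured is given below. The names `picp` (prediction interval coverage probability) and `mpiw` (mean prediction interval width) are conventional labels from the PI literature and are assumptions here, since the abstract does not name its metrics. This is not the loss function proposed in the paper, only an illustration of the two quantities it trades off.

```python
def picp(y, lower, upper):
    """Fraction of targets that fall inside their interval [lower, upper]."""
    covered = [lo <= t <= hi for t, lo, hi in zip(y, lower, upper)]
    return sum(covered) / len(covered)


def mpiw(lower, upper):
    """Average interval width (upper bound minus lower bound)."""
    widths = [hi - lo for lo, hi in zip(lower, upper)]
    return sum(widths) / len(widths)


# Hypothetical toy data: four targets with their predicted bounds.
y = [1.0, 2.0, 3.0, 4.0]
lower = [0.5, 1.5, 3.2, 3.0]
upper = [1.5, 2.5, 4.0, 5.0]

coverage = picp(y, lower, upper)  # the third target lies below its lower bound
width = mpiw(lower, upper)
print(coverage, width)
```

A method that keeps `picp` at or above the nominal level while lowering `mpiw` produces higher quality PIs in the sense used by the abstract.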
More Like this
  1. Given its demonstrated ability in analyzing and revealing patterns underlying data, Deep Learning (DL) has been increasingly investigated to complement physics-based models in various aspects of smart manufacturing, such as machine condition monitoring and fault diagnosis, complex manufacturing process modeling, and quality inspection. However, successful implementation of DL techniques relies greatly on the amount, variety, and veracity of data for robust network training. Also, the distributions of data used for network training and application should be identical to avoid the internal covariate shift problem that reduces network performance and applicability. As a promising solution to address these challenges, Transfer Learning (TL) enables DL networks trained on a source domain and task to be applied to a separate target domain and task. This paper presents a domain adversarial TL approach, based upon the concepts of generative adversarial networks. In this method, the optimizer seeks to minimize the loss (i.e., regression or classification error) across the labeled training examples from the source domain while maximizing the loss of the domain classifier across the source and target data sets (i.e., maximizing the similarity of source and target features). The developed domain adversarial TL method has been implemented on a 1-D CNN backbone network and evaluated for prediction of tool wear propagation, using NASA's milling dataset. Performance has been compared to other TL techniques, and the results indicate that domain adversarial TL can successfully allow DL models trained on certain scenarios to be applied to new target tasks.
  2. Predicting the minimum operating voltage Vmin of chips is one of the important techniques for improving the manufacturing testing flow, as well as ensuring the long-term reliability and safety of in-field systems. Current Vmin prediction methods often provide only point estimates, necessitating additional techniques for constructing prediction confidence intervals to cover uncertainties caused by different sources of variations. While some existing techniques offer region predictions, they rely on certain distributional assumptions and/or provide no coverage guarantees. In response to these limitations, we propose a novel distribution-free Vmin interval estimation methodology possessing a theoretical guarantee of coverage. Our approach leverages conformalized quantile regression and on-chip monitors to generate reliable prediction intervals. We demonstrate the effectiveness of the proposed method on an industrial 5nm automotive chip dataset. Moreover, we show that the use of on-chip monitors can reduce the interval length significantly for Vmin prediction.
  3. A simple method for adding uncertainty to neural network regression tasks in earth science via estimation of a general probability distribution is described. Specifically, we highlight the sinh-arcsinh-normal distributions as particularly well suited for neural network uncertainty estimation. The methodology supports estimation of heteroscedastic, asymmetric uncertainties by a simple modification of the network output and loss function. Method performance is demonstrated by predicting tropical cyclone intensity forecast uncertainty and by comparison with two other common methods for neural network uncertainty quantification (i.e., Bayesian neural networks and Monte Carlo dropout). The simple approach described here is intuitive and applicable when no prior exists and one just wishes to parameterize the output and its uncertainty according to some previously defined family of distributions. The authors believe it will become a powerful, go-to method moving forward.
  4. The earth system is exceedingly complex and often chaotic in nature, making prediction incredibly challenging: we cannot expect to make perfect predictions all of the time. Instead, we look for specific states of the system that lead to more predictable behavior than others, often termed “forecasts of opportunity.” When these opportunities are not present, scientists need prediction systems that are capable of saying “I don't know.” We introduce a novel loss function, termed “abstention loss,” that allows neural networks to identify forecasts of opportunity for regression problems. The abstention loss works by incorporating uncertainty in the network's prediction to identify the more confident samples and abstain (say “I don't know”) on the less confident samples. The abstention loss is designed to determine the optimal abstention fraction, or abstain on a user‐defined fraction using a standard adaptive controller. Unlike many methods for attaching uncertainty to neural network predictions post‐training, the abstention loss is applied during training to preferentially learn from the more confident samples. The abstention loss is built upon nonlinear heteroscedastic regression, a standard computer science method. While nonlinear heteroscedastic regression is a simple yet powerful tool for incorporating uncertainty in regression problems, we demonstrate that the abstention loss outperforms it for the synthetic climate use cases explored here. The implementation of the proposed abstention loss is straightforward in most network architectures designed for regression, as it only requires modification of the output layer and loss function.
  5. The arrival time prediction of coronal mass ejections (CMEs) is an area of active research. Many methods with varying levels of complexity have been developed to predict CME arrival. However, the mean absolute error (MAE) of predictions remains above 12 hr, even with the increasing complexity of methods. In this work we develop a new method for CME arrival time prediction that uses magnetohydrodynamic simulations involving data-constrained flux-rope-based CMEs, which are introduced in a data-driven solar wind background. We found that for six CMEs studied in this work the MAE in arrival time was ∼8 hr. We further improved our arrival time predictions by using ensemble modeling and comparing the ensemble solutions with STEREO-A and STEREO-B heliospheric imager (HI) data. This was done by using our simulations to create synthetic J-maps. A machine-learning (ML) method called lasso regression was used for this comparison. Using this approach, we could reduce the MAE to ∼4 hr. Another ML method based on neural networks (NNs) made it possible to reduce the MAE to ∼5 hr for the cases when HI data from both STEREO-A and STEREO-B were available. NNs are capable of providing similar MAE when only the STEREO-A data are used. Our methods also resulted in very encouraging values of standard deviation (precision) of arrival time. The methods discussed in this paper demonstrate significant improvements in the CME arrival time predictions. Our work highlights the importance of using ML techniques in combination with data-constrained magnetohydrodynamic modeling to improve space weather predictions.
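
Item 2 in the list above relies on conformalized quantile regression to obtain distribution-free coverage. The sketch below shows the generic split-conformal calibration step only, under stated assumptions: lower and upper quantile estimates are taken to come from some already trained quantile regressor, and the function names `cqr_calibrate` and `cqr_interval` and all data are hypothetical. It is not the implementation used in that work.

```python
import math


def cqr_calibrate(y_cal, lo_cal, hi_cal, alpha):
    """Return the correction q for a target miscoverage level alpha."""
    # Conformity score: how far each calibration target lies outside its
    # estimated interval (negative when the target is inside).
    scores = sorted(max(lo - y, y - hi)
                    for y, lo, hi in zip(y_cal, lo_cal, hi_cal))
    n = len(scores)
    # Rank of the (1 - alpha)(1 + 1/n) empirical quantile of the scores.
    k = math.ceil((1 - alpha) * (n + 1))
    if k > n:
        return math.inf  # too few calibration points for this alpha
    return scores[k - 1]


def cqr_interval(lo, hi, q):
    """Widen (or shrink, if q < 0) an estimated interval by q on each side."""
    return lo - q, hi + q


# Hypothetical calibration set: every estimated interval is [-1, 1].
y_cal = [0.0, 0.5, -0.5, 1.5, -2.0, 0.25, 1.25]
lo_cal = [-1.0] * len(y_cal)
hi_cal = [1.0] * len(y_cal)

q = cqr_calibrate(y_cal, lo_cal, hi_cal, alpha=0.25)
print(q, cqr_interval(-1.0, 1.0, q))
```

The correction `q` is computed on held-out calibration data, so the adjusted intervals carry a marginal coverage guarantee without assuming a particular error distribution, provided the calibration and test points are exchangeable.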