- Award ID(s):
- 1735505
- NSF-PAR ID:
- 10111431
- Date Published:
- Journal Name:
- KDD 2018
- Page Range / eLocation ID:
- 2377 to 2386
- Format(s):
- Medium: X
- Sponsoring Org:
- National Science Foundation
More Like this
-
With the availability of data and computational technologies in the modern world, machine learning (ML) has emerged as a preferred methodology for data analysis and prediction. While ML holds great promise, the results from such models are not fully reliable due to the challenges introduced by uncertainty. An ML model generates an optimal solution based on its training data. However, if the uncertainty in the data and the model parameters is not considered, such optimal solutions have a high risk of failure in real-world deployment. This paper surveys the different approaches used in ML to quantify uncertainty. The paper also demonstrates the implications of quantifying uncertainty when using ML through two case studies focused on space physics. The first case study consists of the classification of auroral images into predefined labels. In the second case study, the horizontal component of the perturbed magnetic field measured at the Earth's surface was predicted for the study of Geomagnetically Induced Currents (GICs) by training the model on time series data. In both cases, a Bayesian Neural Network (BNN) was trained to generate predictions, along with epistemic and aleatoric uncertainties. Finally, the pros and cons of Gaussian Process Regression (GPR) models and Bayesian Deep Learning (DL) are weighed. The paper also recommends models that need further exploration, focusing on space weather prediction.
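A common way to obtain the epistemic and aleatoric parts of a BNN's predictive uncertainty is to run several stochastic forward passes (one weight draw each) and apply the law of total variance. The minimal sketch below assumes each pass returns a predicted mean and a predicted variance for one input; the aleatoric term is the average predicted variance and the epistemic term is the variance of the predicted means. The function name and the toy "passes" are hypothetical, not the paper's code.

```python
import random
from statistics import fmean, pvariance


def decompose_uncertainty(samples):
    """Split the predictive variance for one input into aleatoric and epistemic parts.

    `samples` holds one (predicted_mean, predicted_variance) pair per stochastic
    forward pass of the network (e.g. one posterior weight draw per pass).
    """
    means = [m for m, _ in samples]
    variances = [v for _, v in samples]
    aleatoric = fmean(variances)  # expected data noise, averaged over passes
    epistemic = pvariance(means)  # disagreement of the means across passes
    return fmean(means), aleatoric, epistemic, aleatoric + epistemic


if __name__ == "__main__":
    rng = random.Random(1)
    # Toy stand-in for 100 forward passes of a trained BNN at a single input.
    passes = [(2.0 + rng.gauss(0.0, 0.3), 0.25) for _ in range(100)]
    print(decompose_uncertainty(passes))
```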
-
Hybrid rocket motors with paraffin-based fuels are of interest due to higher regression rates compared to other polymers. During paraffin combustion, a liquid layer forms on the fuel surface that, together with shearing forces from the oxidizer flow, results in the formation of instabilities at the fuel-oxidizer interface. These instabilities lead to the formation and entrainment of heterogeneously sized liquid droplets into the main flow, and the combusting droplets result in higher motor output. The atomization process begins with droplet formation and ends with droplet pinch-off. The goal of this paper is to conduct an uncertainty quantification (UQ) analysis of the pinch-off process characterized by a pinch-off volume ($V_{po}$) and time ($t_{po}$). We study these quantities of interest (QoIs) in the context of a slab burner setup. We have developed a computationally expensive mathematical model that describes droplet formation under external forcing and trained an inexpensive Gaussian Process surrogate of the model to facilitate UQ. We use the pinch-off surrogate to forward propagate uncertainty of the model inputs to the QoIs and conduct two studies: one with gravity present and one without gravity effects. After forward-propagating the uncertainty of the inputs using the surrogate, we concluded that both QoIs have right-skewed distributions, corresponding to larger probability densities towards smaller pinch-off volumes and times. Specifically, for the pinch-off times, the resulting distributions reflect the effect of gravity acting against droplet formation, leading to longer pinch-off times than in the case without gravity.
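Forward propagation through a cheap surrogate amounts to Monte Carlo sampling: draw model inputs from their distributions, evaluate the surrogate on each draw, and summarize the resulting QoI sample, for instance by its skewness (positive for right-skewed distributions like those reported above). In this sketch, `surrogate` and `sample_inputs` are hypothetical placeholders for the trained Gaussian Process mean and the input distributions; the toy usage pushes a normal input through an exponential map, which gives a right-skewed (lognormal) output.

```python
import math
import random
from statistics import fmean, pstdev


def sample_skewness(xs):
    """Population sample skewness: mean of the cubed standardized values."""
    m = fmean(xs)
    s = pstdev(xs, m)
    return fmean([((x - m) / s) ** 3 for x in xs])


def propagate(surrogate, sample_inputs, n, seed=0):
    """Draw n input tuples, evaluate the surrogate on each, return the QoI samples."""
    rng = random.Random(seed)
    return [surrogate(*sample_inputs(rng)) for _ in range(n)]


if __name__ == "__main__":
    # Toy stand-ins: one normally distributed input and an exponential "surrogate".
    qoi = propagate(lambda x: math.exp(x), lambda rng: (rng.gauss(0.0, 0.5),), n=20_000)
    print(f"mean = {fmean(qoi):.3f}, skewness = {sample_skewness(qoi):.2f}")
```

A real study would replace the lambda with the surrogate's predictive mean (or sample from its predictive distribution) and would run the gravity and no-gravity cases as two separate propagations.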
-
Yamashita, Y.; Kano, M. (Ed.)
Bayesian hybrid models (BHMs) fuse physics-based insights with machine learning constructs to correct for systematic bias. In this paper, we demonstrate a scalable computational strategy to embed BHMs in an equation-oriented modelling environment. Thus, this paper generalizes stochastic programming, which traditionally focuses on aleatoric uncertainty (as characterized by a probability distribution for uncertain model parameters), to also consider epistemic uncertainty, i.e., model-form uncertainty or systematic bias as modelled by the Gaussian process (GP) in the BHM. As an illustrative example, we consider ballistic firing using a BHM that includes a simplified glass-box (i.e., equation-oriented) model that neglects air resistance and a GP model to account for systematic bias (i.e., epistemic or model-form uncertainty) induced by the model simplification. The gravity parameter and the GP hyperparameters are inferred from data in a Bayesian framework, yielding a posterior distribution. A novel single-stage stochastic program formulation using the posterior samples and Gaussian quadrature rules is proposed to compute the optimal decisions (e.g., firing angle and velocity) that minimize the expected value of an objective (e.g., distance from a stationary target). PySMO is used to generate expressions for the GP prediction mean and uncertainty in Pyomo, enabling efficient optimization with gradient-based solvers such as Ipopt. A scaling study characterizes the solver time and number of iterations for up to 2,000 samples from the posterior.
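The structure of such a single-stage stochastic objective can be illustrated in a few lines. This is a minimal sketch under stated assumptions, not the Pyomo/PySMO/Ipopt formulation from the paper: it assumes posterior samples of the gravity parameter are already available, replaces the GP bias prediction with hypothetical mean and standard-deviation functions of the firing angle, integrates over that Gaussian bias with Gauss-Hermite quadrature, and swaps the gradient-based solver for a plain grid search over the angle at fixed velocity. All function names are illustrative.

```python
import math

import numpy as np
from numpy.polynomial.hermite_e import hermegauss


def vacuum_range(v, theta, g):
    """Glass-box model: projectile range on flat ground, air resistance neglected."""
    return v**2 * math.sin(2.0 * theta) / g


def expected_sq_miss(theta, v, target, g_samples, bias_mean, bias_std, n_quad=8):
    """Expected squared miss distance for one firing angle.

    Averages over posterior samples of g and, for each sample, integrates over a
    Gaussian bias N(bias_mean(theta), bias_std(theta)**2) with Gauss-Hermite
    quadrature for the probabilists' weight exp(-x**2 / 2).
    """
    nodes, weights = hermegauss(n_quad)
    weights = weights / math.sqrt(2.0 * math.pi)  # weights now sum to 1 (N(0, 1) expectation)
    total = 0.0
    for g in g_samples:
        r = vacuum_range(v, theta, g)
        b = bias_mean(theta) + bias_std(theta) * nodes  # bias values at the quadrature nodes
        total += float(np.sum(weights * (r + b - target) ** 2))
    return total / len(g_samples)


def best_angle(v, target, g_samples, bias_mean, bias_std, n_grid=500):
    """Grid search over the firing angle (a stand-in for a gradient-based NLP solver)."""
    thetas = np.linspace(0.01, math.pi / 2 - 0.01, n_grid)
    costs = [expected_sq_miss(t, v, target, g_samples, bias_mean, bias_std) for t in thetas]
    i = int(np.argmin(costs))
    return float(thetas[i]), costs[i]


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    g_post = rng.normal(9.81, 0.05, size=50)  # hypothetical posterior draws of g
    theta, cost = best_angle(
        v=20.0,
        target=30.0,
        g_samples=g_post,
        bias_mean=lambda th: -0.5 * math.sin(th),  # hypothetical GP mean of the bias
        bias_std=lambda th: 0.3,  # hypothetical GP standard deviation of the bias
    )
    print(f"angle = {math.degrees(theta):.1f} deg, expected sq. miss = {cost:.3f}")
```

The sample average handles the epistemic uncertainty in the posterior parameters, while the quadrature handles the Gaussian GP prediction at each decision; an equation-oriented tool would instead expose both as algebraic expressions so a solver such as Ipopt can differentiate through them.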
-
Abstract The Consistent Artificial Intelligence (AI)-based Soil Moisture (CASM) dataset is a global, consistent, long-term remote sensing soil moisture (SM) dataset created using machine learning. It is based on the NASA Soil Moisture Active Passive (SMAP) satellite mission SM data and is aimed at extrapolating SMAP-like quality SM back in time using previous satellite microwave platforms. CASM represents SM in the top soil layer, and it is defined on a global 25 km EASE-2 grid for 2002–2020 with a 3-day temporal resolution. The seasonal cycle is removed for the neural network training to ensure its skill is targeted at predicting SM extremes. Comparison of CASM with 367 global in-situ SM monitoring sites shows a SMAP-like median correlation of 0.66. Additionally, the SM product uncertainty was assessed, and both aleatoric and epistemic uncertainties were estimated and included in the dataset. The CASM dataset can be used to study a wide range of hydrological, carbon cycle, and energy processes, since only a consistent long-term dataset allows assessing changes in water availability and water stress.
Abstract Ideally, probabilistic hazard assessments combine available knowledge about the physical mechanisms of the hazard, data on past hazards, and any precursor information. Systematically assessing the probability of rare yet catastrophic hazards adds a layer of difficulty due to limited observational data. Via computer models, one can exercise potentially dangerous scenarios that may not have happened in the past but are probabilistically consistent with the aleatoric nature of previous volcanic behavior in the record. Traditional Monte Carlo-based methods to calculate such hazard probabilities suffer from two issues: they are computationally expensive, and they are static. In light of new information, newly available data, signs of unrest, or new probabilistic analysis describing uncertainty about scenarios, the Monte Carlo calculation would need to be redone under the same computational constraints. Here we present an alternative approach utilizing statistical emulators that provide an efficient way to overcome the computational bottleneck of typical Monte Carlo approaches. Moreover, this approach is independent of any particular aleatoric scenario model and can be applied rapidly to any scenario model, making it dynamic. We present and apply this emulator-based approach to create multiple probabilistic hazard maps for inundation by pyroclastic density currents in the Long Valley Volcanic Region. Further, we illustrate how this approach enables an exploration of the impact of epistemic uncertainties on these probabilistic hazard forecasts. In particular, we focus on the uncertainty of vent-opening models and how that uncertainty, both aleatoric and epistemic, impacts the resulting probabilistic hazard maps of pyroclastic density current inundation.
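Once an emulator stands in for the expensive flow simulator, the hazard probability under a given scenario model becomes a cheap Monte Carlo average, so switching to a new vent-opening model only requires drawing new scenarios. The minimal sketch below assumes a hypothetical emulator that maps a scenario (flow volume, vent position along one axis) to flow depth at a single site and estimates the probability that this depth exceeds a threshold. The emulator, the scenario parameterization, and the two vent-opening models in the usage block are illustrative stand-ins, not the Long Valley setup.

```python
import math
import random
from typing import Callable, Tuple


def hazard_probability(
    emulator: Callable[[float, float], float],
    sample_scenario: Callable[[random.Random], Tuple[float, float]],
    threshold: float,
    n: int = 100_000,
    seed: int = 0,
) -> float:
    """Monte Carlo estimate of P(emulated flow depth > threshold) under a scenario model."""
    rng = random.Random(seed)
    hits = sum(emulator(*sample_scenario(rng)) > threshold for _ in range(n))
    return hits / n


if __name__ == "__main__":
    # Hypothetical emulator: depth at a site at x = 0 decays with distance to the vent.
    site_emulator = lambda volume, vent_x: volume * math.exp(-abs(vent_x))
    # Two hypothetical vent-opening models sharing one volume distribution.
    spread_vents = lambda rng: (rng.lognormvariate(0.0, 1.0), rng.uniform(-2.0, 2.0))
    clustered_vents = lambda rng: (rng.lognormvariate(0.0, 1.0), rng.gauss(0.0, 0.5))
    for name, model in [("spread vents", spread_vents), ("clustered vents", clustered_vents)]:
        print(name, hazard_probability(site_emulator, model, threshold=1.0))
```

Because the scenario sampler is a parameter, comparing competing vent-opening models (a form of epistemic uncertainty) reuses the same emulator without rerunning any flow simulations.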