Surrogate models are used to map input data to output data when the actual relationship between the two is unknown or computationally expensive to evaluate. We have constructed a tool to recommend the appropriate surrogate modeling technique for a given dataset using attributes calculated from the input and output values. The tool identifies the appropriate surrogate modeling techniques with an accuracy of 98% and a precision of 91%.
Novel Tool for Selecting Surrogate Modeling Techniques for Surface Approximation
In several applications, including surface approximation and surrogate-based optimization, surrogate models are used to map input data to output data when the actual relationship between the two is unknown or computationally expensive to evaluate. Many techniques have been developed for surrogate modeling; however, a systematic method for selecting suitable techniques for an application remains an open challenge. This work compares the performance of eight surrogate modeling techniques for approximating a surface over a set of simulated data. Using the comparison results, we constructed a Random Forest-based tool that recommends the appropriate surrogate modeling technique for a given dataset using attributes calculated only from the available input and output values. The tool identifies the appropriate surrogate modeling techniques for surface approximation with an accuracy of 87% and a precision of 86%. Using the tool to select the surrogate model form saves computational time by avoiding expensive trial-and-error selection.
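To make the reported accuracy and precision concrete, the following minimal sketch shows one way such scores could be computed for a recommendation tool. It is an illustration only: the technique labels, the example data, the helper names `accuracy` and `macro_precision`, and the choice of macro-averaged precision are all assumptions, not details taken from the work itself.

```python
from collections import Counter


def accuracy(true_labels, predicted_labels):
    """Fraction of datasets where the recommendation matches the best technique."""
    matches = sum(t == p for t, p in zip(true_labels, predicted_labels))
    return matches / len(true_labels)


def macro_precision(true_labels, predicted_labels):
    """Mean per-technique precision (assumed averaging scheme).

    For each technique that was recommended at least once, precision is
    (correct recommendations of it) / (all recommendations of it).
    """
    recommended = Counter(predicted_labels)
    correct = Counter(p for t, p in zip(true_labels, predicted_labels) if t == p)
    per_technique = [correct[label] / count for label, count in recommended.items()]
    return sum(per_technique) / len(per_technique)


# Hypothetical data: the best technique per dataset vs. the tool's recommendation.
best_technique = ["MARS", "ANN", "RF", "MARS", "ANN", "RF", "MARS", "ANN"]
tool_recommendation = ["MARS", "ANN", "RF", "ANN", "ANN", "RF", "MARS", "RF"]

acc = accuracy(best_technique, tool_recommendation)
prec = macro_precision(best_technique, tool_recommendation)
print(f"accuracy={acc:.2f}, macro precision={prec:.2f}")
```

With these made-up labels, six of eight recommendations are correct, so accuracy is 0.75, while precision is averaged over the three recommended techniques.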
- Award ID(s): 1743445
- PAR ID: 10297247
- Editor(s): Turkay, M. Aydin
- Date Published:
- Journal Name: Computer aided chemical engineering
- Volume: 50
- ISSN: 2543-1331
- Page Range / eLocation ID: 451-456
- Format(s): Medium: X
- Sponsoring Org: National Science Foundation
More Like this
Surrogate models are used to map input data to output data when the actual relationship between the two is unknown or computationally expensive to evaluate for sensitivity analysis, uncertainty propagation, and surrogate-based optimization. This work evaluates the performance of eight surrogate modeling techniques for design space approximation and surrogate-based optimization applications over a set of generated datasets with known characteristics. With this work, we aim to provide general rules for selecting an appropriate surrogate model form based solely on the characteristics of the data being modeled. The computational experiments revealed that, in general, multivariate adaptive regression spline (MARS) models and single-hidden-layer feedforward neural networks (ANN) yielded the most accurate predictions over the design space, while Random Forest (RF) models most reliably identified the locations of the optima when used for surrogate-based optimization.
-
In this work, generalized polynomial chaos (gPC) expansion for land surface model parameter estimation is evaluated. We perform inverse modeling and compute the posterior distribution of the critical hydrological parameters that are subject to great uncertainty in the Community Land Model (CLM) for a given value of the output LH. The unknown parameters include those that have been identified as the most influential factors on the simulations of surface and subsurface runoff, latent and sensible heat fluxes, and soil moisture in CLM4.0. We set up the inversion problem in the Bayesian framework in two steps: (i) building a surrogate model expressing the input–output mapping, and (ii) performing inverse modeling and computing the posterior distributions of the input parameters using observation data for a given value of the output LH. The development of the surrogate model is carried out with a Bayesian procedure based on the variable selection methods that use gPC expansions. Our approach accounts for bases selection uncertainty and quantifies the importance of the gPC terms, and, hence, all of the input parameters, via the associated posterior probabilities.
-
Stochastic emulation techniques represent a specialized surrogate modeling branch that is appropriate for applications for which the relationship between input and output is stochastic in nature. Their objective is to address the stochastic uncertainty sources by directly predicting the output distribution for a given input. An example of such application, and the focus of this contribution, is the estimation of structural response (engineering demand parameter) distribution in seismic risk assessment. In this case, the stochastic uncertainty originates from the aleatoric variability in the seismic hazard description. Note that this is a different uncertainty-source than the potential parametric uncertainty associated with structural characteristics or explanatory variables for the seismic hazard (for example, intensity measures), that are treated as the parametric input in surrogate modeling context. The key challenge in stochastic emulation pertains to addressing heteroscedasticity in the output variability. Relevant approaches to-date for addressing this challenge have focused on scalar outputs. In contrast, this paper focuses on the multi-output stochastic emulation problem and presents a methodology for predicting the output correlation matrix, while fully addressing heteroscedastic characteristics. This is achieved by introducing a Gaussian Process (GP) regression model for approximating the components of the correlation matrix, and coupling this approximation with a correction step to guarantee positive definite properties for the resultant predictions. For obtaining the observation data to inform the GP calibration, different approaches are examined, relying-or-not on the existence of replicated samples for the response output. Such samples require that, for a portion of the training points, simulations are repeated for the same inputs and different descriptions of the stochastic uncertainty. 
This information can be readily used to obtain observation for the response statistics (correlation or covariance in this instance) to inform the GP development. An alternative approach is to use as observations noisy covariance samples based on the sample deviations from a primitive mean approximation. These different observation variants lead to different GP variants that are compared within a comprehensive case study. A computational framework for integrating the correlation matrix approximation within the stochastic emulation for the marginal distribution approximation of each output component is also discussed, to provide the joint response distribution approximation.
-
Having the ability to analyze, simulate, and optimize complex systems is becoming more important in all engineering disciplines. Decision‐making using complex systems usually leads to nonlinear optimization problems, which rely on computationally expensive simulations. Therefore, it is often challenging to detect the actual structure of the optimization problem and formulate these problems with closed‐form analytical expressions. Surrogate‐based optimization of complex systems is a promising approach that is based on the concept of adaptively fitting and optimizing approximations of the input–output data. Standard surrogate‐based optimization assumes the degrees of freedom are known a priori; however, in real applications the sparsity and the actual structure of the black‐box formulation may not be known. In this work, we propose to select the correct variables contributing to each objective function and constraints of the black‐box problem, by formulating the identification of the true sparsity of the formulation as a nonlinear feature selection problem. We compare three variable selection criteria based on Support Vector Regression and develop efficient algorithms to detect the sparsity of black‐box formulations when only a limited amount of deterministic or noisy data is available.

