

Title: Unbiased Metamodeling via Likelihood Ratios
Metamodeling has been a topic of longstanding interest in stochastic simulation because metamodels are useful for optimization, sensitivity analysis, and real- or near-real-time decision making. Experiment design is the foundation of classical metamodeling: an effective experiment design uncovers the spatial relationships between the design/decision variables and the simulation response. Therefore, more design points, which provide better coverage of the space, are almost always better. However, metamodeling based on likelihood ratios (LRs) turns the design question on its head: each design point provides an unbiased prediction of the response at any other location in the space, but possibly with variance so inflated as to be counterproductive. Thus, the question becomes less where to place design points and more which of them to employ for prediction. In this paper we take the first comprehensive look at LR metamodeling, categorizing both the various types of LR metamodels and the contexts in which they might be employed.
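The claim that every design point yields an unbiased prediction at any other location rests on the likelihood-ratio identity E_θ[Y] = E_θ0[Y · f_θ(X) / f_θ0(X)], where X is the random input that drives the simulation. The following is a minimal sketch of that identity only, not the paper's method: it assumes a toy model in which the only random input is X ~ N(θ, 1) and the response is Y = X², so the true mean is θ² + 1. The function names are hypothetical.

```python
import numpy as np

def normal_pdf(x, mean, sd=1.0):
    # Density of N(mean, sd^2) evaluated elementwise at x
    return np.exp(-0.5 * ((x - mean) / sd) ** 2) / (sd * np.sqrt(2.0 * np.pi))

def lr_prediction(outputs, inputs, theta_design, theta_target):
    """Predict E_target[Y] from replications run at theta_design (hypothetical helper).

    Each output is reweighted by the likelihood ratio f_target(x) / f_design(x)
    of the input x that produced it. Returns the estimate and its estimated
    variance (sample variance of the weighted outputs divided by n).
    """
    weights = normal_pdf(inputs, theta_target) / normal_pdf(inputs, theta_design)
    weighted = outputs * weights
    return weighted.mean(), weighted.var(ddof=1) / len(weighted)

# Toy simulation run only at the design point theta0 = 0
rng = np.random.default_rng(0)
theta_design = 0.0
x = rng.normal(theta_design, 1.0, size=100_000)  # random simulation inputs
y = x ** 2                                       # response; E_theta[Y] = theta**2 + 1

for theta_target in (0.0, 0.5, 2.0):
    est, var = lr_prediction(y, x, theta_design, theta_target)
    print(f"theta={theta_target}: estimate={est:.3f}, "
          f"true={theta_target ** 2 + 1:.3f}, var={var:.2e}")
```

With the design point at θ0 = 0, the prediction at θ = 0.5 stays close to the true value 1.25, while the estimated variance of the prediction at θ = 2 is far larger than at θ0. This is the variance inflation that can make a design point counterproductive for predictions far from where it was run.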
Award ID(s): 1634982
NSF-PAR ID: 10122979
Journal Name: Proceedings of the 2018 Winter Simulation Conference
Page Range / eLocation ID: 1778-1789
Sponsoring Org: National Science Foundation
More Like this
  1. Background

    Metamodels can address some of the limitations of complex simulation models by formulating a mathematical relationship between input parameters and simulation model outcomes. Our objective was to develop and compare the performance of a machine learning (ML)–based metamodel against a conventional metamodeling approach in replicating the findings of a complex simulation model.

    Methods

    We constructed 3 ML-based metamodels (random forest, support vector regression, and artificial neural networks) and a linear regression-based metamodel from a previously validated microsimulation model of the natural history of hepatitis C virus (HCV) with 40 input parameters. Outcomes of interest included societal costs and quality-adjusted life-years (QALYs), the incremental cost-effectiveness ratio (ICER) of HCV treatment versus no treatment, the cost-effectiveness acceptability curve (CEAC), and the expected value of perfect information (EVPI). We evaluated metamodel performance using root mean squared error (RMSE) and Pearson's R² on the normalized data.
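The evaluation step can be sketched as follows. This is a minimal sketch that assumes min-max normalization against the range of the simulation outputs; the abstract does not say which normalization was used, and the helper names are hypothetical. "Pearson's R²" is computed here as the squared Pearson correlation between simulation and metamodel outputs.

```python
import numpy as np

def min_max_normalize(values, reference):
    # Scale values using the range of the reference (simulation) outputs (assumed scheme)
    lo, hi = reference.min(), reference.max()
    return (values - lo) / (hi - lo)

def rmse(observed, predicted):
    # Root mean squared error between two equal-length arrays
    return float(np.sqrt(np.mean((observed - predicted) ** 2)))

def pearson_r2(observed, predicted):
    # Square of Pearson's correlation coefficient
    r = np.corrcoef(observed, predicted)[0, 1]
    return float(r ** 2)

def evaluate_metamodel(sim_outputs, meta_outputs):
    """Compare metamodel predictions with simulation outputs on normalized data."""
    sim = np.asarray(sim_outputs, dtype=float)
    meta = np.asarray(meta_outputs, dtype=float)
    sim_n = min_max_normalize(sim, sim)
    meta_n = min_max_normalize(meta, sim)
    return {"rmse": rmse(sim_n, meta_n), "r2": pearson_r2(sim_n, meta_n)}

print(evaluate_metamodel([1.0, 2.0, 3.0, 4.0], [1.1, 1.9, 3.2, 3.8]))
```

Because squared correlation is unchanged by affine rescaling, the normalization affects only the RMSE. A metamodel whose predictions are off by a constant scale and shift can still score R² = 1, which is one reason to report both metrics.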

    Results

    The R² values for the linear regression metamodel for QALYs without treatment, QALYs with treatment, societal cost without treatment, societal cost with treatment, and ICER were 0.92, 0.98, 0.85, 0.92, and 0.60, respectively. The corresponding R² values for our ML-based metamodels were 0.96, 0.97, 0.90, 0.95, and 0.49 for support vector regression; 0.99, 0.83, 0.99, 0.99, and 0.82 for artificial neural network; and 0.99, 0.99, 0.99, 0.99, and 0.98 for random forest. Similar trends were observed for RMSE. The CEAC and EVPI curves produced by the random forest metamodel matched the results of the simulation output more closely than the linear regression metamodel.
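The CEAC and EVPI curves can both be computed directly from probabilistic sensitivity analysis (PSA) samples, whether those samples come from the simulation model or from a metamodel standing in for it. The sketch below assumes net monetary benefit NB = λ · QALYs − cost at willingness-to-pay λ. The function name and the sample arrays are hypothetical and do not reproduce the study's data.

```python
import numpy as np

def ceac_and_evpi(qalys, costs, wtp_values):
    """Compute CEAC and per-person EVPI from PSA samples (hypothetical helper).

    qalys, costs: arrays of shape (n_samples, n_strategies).
    Returns (ceac, evpi): ceac[j, k] is the probability that strategy k is
    optimal at wtp_values[j]; evpi[j] is the EVPI at wtp_values[j].
    """
    ceac, evpi = [], []
    for wtp in wtp_values:
        nb = wtp * qalys - costs  # net monetary benefit per sample and strategy
        n_samples, n_strategies = nb.shape
        # CEAC: share of samples in which each strategy has the highest NB
        best_per_sample = nb.argmax(axis=1)
        ceac.append(np.bincount(best_per_sample, minlength=n_strategies) / n_samples)
        # EVPI: E[max NB] with perfect information minus max E[NB] without it
        evpi.append(nb.max(axis=1).mean() - nb.mean(axis=0).max())
    return np.array(ceac), np.array(evpi)

# Hypothetical PSA output: column 0 = no treatment, column 1 = treatment
rng = np.random.default_rng(1)
n = 1000
qalys = np.column_stack([rng.normal(10.0, 1.0, n), rng.normal(11.0, 1.0, n)])
costs = np.column_stack([rng.normal(50_000, 5_000, n), rng.normal(80_000, 5_000, n)])

ceac, evpi = ceac_and_evpi(qalys, costs, [0, 30_000, 100_000])
print(ceac)
print(evpi)
```

Replacing the simulation with a metamodel only changes where `qalys` and `costs` come from. This is why a metamodel with high R² can still distort these curves if its errors change which strategy has the highest net benefit in many samples.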

    Conclusions

    ML-based metamodels generally outperformed traditional linear regression metamodels at replicating results from complex simulation models, with random forest metamodels performing best.

    Highlights

    Decision-analytic models are frequently used by policy makers and other stakeholders to assess the impact of new medical technologies and interventions. However, complex models can make it difficult to conduct probabilistic sensitivity analysis and value-of-information analysis, and may not be suitable for developing online decision-support tools. Metamodels, which accurately formulate a mathematical relationship between input parameters and model outcomes, can replicate complex simulation models and address these limitations. A machine learning–based random forest metamodel can outperform linear regression in replicating the findings of a complex simulation model. Such a metamodel can be used for conducting cost-effectiveness and value-of-information analyses or for developing online decision-support tools.

  2. Feng, B. ; Pedrielli, G ; Peng, Y. ; Shashaani, S. ; Song, E. ; Corlu, C. ; Lee, L. ; Chew, E. ; Roeder, T. ; Lendermann, P. (Ed.)
    Ranking & selection (R&S) procedures are simulation-optimization algorithms for making one-time decisions among a finite set of alternative system designs or feasible solutions with a statistical assurance of a good selection. R&S with covariates (R&S+C) extends the paradigm to allow the optimal selection to depend on contextual information that is obtained just prior to the need for a decision. The dominant approach for solving such problems is to employ offline simulation to create metamodels that predict the performance of each system or feasible solution as a function of the covariate. This paper introduces a fundamentally different approach that solves individual R&S problems offline for various values of the covariate, and then treats the real-time decision as a classification problem: given the covariate information, which system is a good solution? Our approach exploits the availability of efficient R&S procedures, requires milder assumptions than the metamodeling paradigm to provide strong guarantees, and can be more efficient.
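The offline/online split described above can be sketched as follows. This is a simplified illustration under stated assumptions, not the paper's procedure: the three systems and their mean functions are hypothetical, picking the largest sample mean stands in for a genuine R&S procedure with a statistical guarantee, and 1-nearest-neighbor is just one possible classifier.

```python
import numpy as np

rng = np.random.default_rng(2)

def simulate(system, x, n):
    # Hypothetical simulation: system i has mean performance mu_i(x), unit-variance noise
    means = {0: 1.0 + x, 1: 2.0 - x, 2: 1.8}
    return rng.normal(means[system], 1.0, size=n)

def select_best(x, n_systems=3, n_reps=2000):
    # Stand-in for a real R&S procedure: pick the system with the largest sample mean
    sample_means = [simulate(i, x, n_reps).mean() for i in range(n_systems)]
    return int(np.argmax(sample_means))

# Offline phase: solve one R&S problem per covariate value on a grid
grid = np.linspace(0.0, 1.0, 21)
labels = np.array([select_best(x) for x in grid])

def classify(x_new):
    # Online phase: 1-nearest-neighbor classification of the observed covariate
    return int(labels[np.argmin(np.abs(grid - x_new))])

print([classify(x) for x in (0.0, 0.5, 1.0)])
```

Because all simulation happens offline, the online decision reduces to a cheap lookup on the observed covariate, which is what makes this style of approach attractive for real-time use.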
  4. This research concerns the uncertainty analysis and quantification of vibration systems, using the frequency response function (FRF) representation with statistical metamodeling. Unlike previous statistical metamodels, which are built for individual frequency points, this research takes advantage of the inherent correlation of FRF values at different frequency points and adopts the multiple response Gaussian process (MRGP) approach. To enable the analysis, the vector fitting method is used to represent an FRF accurately with a reduced set of parameters. Owing to the efficiency and accuracy of the statistical metamodel with a small set of parameters, Bayesian inference can then be incorporated to realize model updating and uncertainty identification as new measurements or evidence are acquired. The MRGP metamodel developed under this new framework can be used effectively for two-way uncertainty propagation analysis, i.e., FRF prediction and uncertainty identification. Case studies are conducted for illustration and verification.
  5. Machine learning (ML) methods, such as artificial neural networks (ANN), k-nearest neighbors (kNN), random forests (RF), support vector machines (SVM), and boosted decision trees (DTs), may offer stronger predictive performance than more traditional, parametric methods, such as linear regression, multiple linear regression, and logistic regression (LR), for specific mapping and modeling tasks. However, this increased performance is often accompanied by increased model complexity and decreased interpretability, resulting in critiques of their "black box" nature and highlighting the need for algorithms that offer both strong predictive performance and interpretability. This is especially true when the global model and the predictions for specific data points must be explainable for the model to be of use. Explainable boosting machines (EBMs), an augmentation and refinement of generalized additive models (GAMs), have been proposed as an empirical modeling method that offers both interpretable results and strong predictive performance. The trained model can be graphically summarized as a set of functions relating each predictor variable to the dependent variable, along with heat maps representing interactions between selected pairs of predictor variables. In this study, we assess EBMs for predicting the likelihood or probability of slope failure occurrence based on digital terrain characteristics in four separate Major Land Resource Areas (MLRAs) in the state of West Virginia, USA, and compare the results to those obtained with LR, kNN, RF, and SVM. EBM provided predictive accuracies comparable to RF and SVM and better than LR and kNN.
The generated functions and visualizations for each predictor variable and for the included interactions between pairs of predictor variables, the estimates of variable importance based on average mean absolute scores, and the per-variable scores provided for new predictions all add interpretability. However, additional work is needed to quantify how these outputs may be affected by variable correlation, the inclusion of interaction terms, and large feature spaces. Further exploration of EBMs is merited for geohazard mapping and modeling in particular and for spatial predictive mapping and modeling in general, especially when the value or use of the resulting predictions would be greatly enhanced by improved global interpretability and the availability of prediction explanations at each cell or aggregating unit within the mapped or modeled extent.