skip to main content
US FlagAn official website of the United States government
dot gov icon
Official websites use .gov
A .gov website belongs to an official government organization in the United States.
https lock icon
Secure .gov websites use HTTPS
A lock ( lock ) or https:// means you've safely connected to the .gov website. Share sensitive information only on official, secure websites.


Title: Rare-event Simulation for Neural Network and Random Forest Predictors
We study rare-event simulation for a class of problems where the target hitting sets of interest are defined via modern machine learning tools such as neural networks and random forests. This problem is motivated from fast emerging studies on the safety evaluation of intelligent systems, robustness quantification of learning models, and other potential applications to large-scale simulation in which machine learning tools can be used to approximate complex rare-event set boundaries. We investigate an importance sampling scheme that integrates the dominating point machinery in large deviations and sequential mixed integer programming to locate the underlying dominating points. Our approach works for a range of neural network architectures including fully connected layers, rectified linear units, normalization, pooling and convolutional layers, and random forests built from standard decision trees. We provide efficiency guarantees and numerical demonstration of our approach using a classification model in the UCI Machine Learning Repository.  more » « less
Award ID(s):
1849304 1849280
PAR ID:
10416590
Author(s) / Creator(s):
; ; ;
Date Published:
Journal Name:
ACM Transactions on Modeling and Computer Simulation
Volume:
32
Issue:
3
ISSN:
1049-3301
Page Range / eLocation ID:
1 to 33
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. Abstract As modeling tools and approaches become more advanced, ecological models are becoming more complex. Traditional sensitivity analyses can struggle to identify the nonlinearities and interactions emergent from such complexity, especially across broad swaths of parameter space. This limits understanding of the ecological mechanisms underlying model behavior. Machine learning approaches are a potential answer to this issue, given their predictive ability when applied to complex large datasets. While perceptions that machine learning is a “black box” linger, we seek to illuminate its interpretive potential in ecological modeling. To do so, we detail our process of applying random forests to complex model dynamics to produce both high predictive accuracy and elucidate the ecological mechanisms driving our predictions. Specifically, we employ an empirically rooted ontogenetically stage-structured consumer-resource simulation model. Using simulation parameters as feature inputs and simulation output as dependent variables in our random forests, we extended feature analyses into a simple graphical analysis from which we reduced model behavior to three core ecological mechanisms. These ecological mechanisms reveal the complex interactions between internal plant demography and trophic allocation driving community dynamics while preserving the predictive accuracy achieved by our random forests. 
    more » « less
  2. Abstract Deep-learning models have become pervasive tools in science and engineering. However, their energy requirements now increasingly limit their scalability 1 . Deep-learning accelerators 2–9 aim to perform deep learning energy-efficiently, usually targeting the inference phase and often by exploiting physical substrates beyond conventional electronics. Approaches so far 10–22 have been unable to apply the backpropagation algorithm to train unconventional novel hardware in situ. The advantages of backpropagation have made it the de facto training method for large-scale neural networks, so this deficiency constitutes a major impediment. Here we introduce a hybrid in situ–in silico algorithm, called physics-aware training, that applies backpropagation to train controllable physical systems. Just as deep learning realizes computations with deep neural networks made from layers of mathematical functions, our approach allows us to train deep physical neural networks made from layers of controllable physical systems, even when the physical layers lack any mathematical isomorphism to conventional artificial neural network layers. To demonstrate the universality of our approach, we train diverse physical neural networks based on optics, mechanics and electronics to experimentally perform audio and image classification tasks. Physics-aware training combines the scalability of backpropagation with the automatic mitigation of imperfections and noise achievable with in situ algorithms. Physical neural networks have the potential to perform machine learning faster and more energy-efficiently than conventional electronic processors and, more broadly, can endow physical systems with automatically designed physical functionalities, for example, for robotics 23–26 , materials 27–29 and smart sensors 30–32 . 
    more » « less
  3. Abstract SummaryAdvances in survival analysis have facilitated unprecedented flexibility in data modeling, yet there remains a lack of tools for illustrating the influence of continuous covariates on predicted survival outcomes. We propose the utilization of a colored contour plot to depict the predicted survival probabilities over time. Our approach is capable of supporting conventional models, including the Cox and Fine–Gray models. However, its capability shines when coupled with cutting-edge machine learning models such as random survival forests and deep neural networks. Availability and implementationWe provide a Shiny app at https://biostatistics.mdanderson.org/shinyapps/survivalContour/ and an R package available at https://github.com/YushuShi/survivalContour as implementations of this tool. 
    more » « less
  4. BackgroundMetamodels can address some of the limitations of complex simulation models by formulating a mathematical relationship between input parameters and simulation model outcomes. Our objective was to develop and compare the performance of a machine learning (ML)–based metamodel against a conventional metamodeling approach in replicating the findings of a complex simulation model. MethodsWe constructed 3 ML-based metamodels using random forest, support vector regression, and artificial neural networks and a linear regression-based metamodel from a previously validated microsimulation model of the natural history hepatitis C virus (HCV) consisting of 40 input parameters. Outcomes of interest included societal costs and quality-adjusted life-years (QALYs), the incremental cost-effectiveness (ICER) of HCV treatment versus no treatment, cost-effectiveness analysis curve (CEAC), and expected value of perfect information (EVPI). We evaluated metamodel performance using root mean squared error (RMSE) and Pearson’s R2on the normalized data. ResultsThe R2values for the linear regression metamodel for QALYs without treatment, QALYs with treatment, societal cost without treatment, societal cost with treatment, and ICER were 0.92, 0.98, 0.85, 0.92, and 0.60, respectively. The corresponding R2values for our ML-based metamodels were 0.96, 0.97, 0.90, 0.95, and 0.49 for support vector regression; 0.99, 0.83, 0.99, 0.99, and 0.82 for artificial neural network; and 0.99, 0.99, 0.99, 0.99, and 0.98 for random forest. Similar trends were observed for RMSE. The CEAC and EVPI curves produced by the random forest metamodel matched the results of the simulation output more closely than the linear regression metamodel. ConclusionsML-based metamodels generally outperformed traditional linear regression metamodels at replicating results from complex simulation models, with random forest metamodels performing best. HighlightsDecision-analytic models are frequently used by policy makers and other stakeholders to assess the impact of new medical technologies and interventions. However, complex models can impose limitations on conducting probabilistic sensitivity analysis and value-of-information analysis, and may not be suitable for developing online decision-support tools. Metamodels, which accurately formulate a mathematical relationship between input parameters and model outcomes, can replicate complex simulation models and address the above limitation. The machine learning–based random forest model can outperform linear regression in replicating the findings of a complex simulation model. Such a metamodel can be used for conducting cost-effectiveness and value-of-information analyses or developing online decision support tools. 
    more » « less
  5. Abstract In this paper we consider the problem of simultaneously estimating rare-event probabilities for a class of Gaussian random fields. A conventional rare-event simulation method is usually tailored to a specific rare event and consequently would lose estimation efficiency for different events of interest, which often results in additional computational cost in such simultaneous estimation problems. To overcome this issue, we propose a uniformly efficient estimator for a general family of Hölder continuous Gaussian random fields. We establish the asymptotic and uniform efficiency of the proposed method and also conduct simulation studies to illustrate its effectiveness. 
    more » « less