skip to main content

Title: Metric Learning For Simulation Analytics
The sample path generated by a stochastic simulation often exhibits significant variability within each replication, revealing periods of good and poor performance alike. As such, traditional summaries of aggregate performance measures overlook the more fine-grained insights into the operational system behavior. In this paper, we take a simulation analytics view of output analysis, turning to machine learning methods to uncover key insights from the dynamic sample path. We present a k nearest neighbors model on system state information to facilitate real-time predictions of a stochastic performance measure. This model is built on the premise of a system-specific measure of similarity between observations of the state, which we inform via metric learning. An evaluation of our approach is provided on a stochastic activity network and a wafer fabrication facility, both of which give us confidence in the ability of metric learning to provide interpretation and improved predictive performance.
; ; ;
Bae, K-H; Feng, B; Kim, S; Lazarova-Molnar, S; Zheng, Z; Roeder, T; Thiesing, R
Award ID(s):
Publication Date:
Journal Name:
Proceedings of the Winter Simulation Conference
Page Range or eLocation-ID:
Sponsoring Org:
National Science Foundation
More Like this
  1. Inference-based optimization via simulation, which substitutes Gaussian process (GP) learning for the structural properties exploited in mathematical programming, is a powerful paradigm that has been shown to be remarkably effective in problems of modest feasible-region size and decision-variable dimension. The limitation to “modest” problems is a result of the computational overhead and numerical challenges encountered in computing the GP conditional (posterior) distribution on each iteration. In this paper, we substantially expand the size of discrete-decision-variable optimization-via-simulation problems that can be attacked in this way by exploiting a particular GP—discrete Gaussian Markov random fields—and carefully tailored computational methods. The result ismore »the rapid Gaussian Markov Improvement Algorithm (rGMIA), an algorithm that delivers both a global convergence guarantee and finite-sample optimality-gap inference for significantly larger problems. Between infrequent evaluations of the global conditional distribution, rGMIA applies the full power of GP learning to rapidly search smaller sets of promising feasible solutions that need not be spatially close. We carefully document the computational savings via complexity analysis and an extensive empirical study. Summary of Contribution: The broad topic of the paper is optimization via simulation, which means optimizing some performance measure of a system that may only be estimated by executing a stochastic, discrete-event simulation. Stochastic simulation is a core topic and method of operations research. The focus of this paper is on significantly speeding-up the computations underlying an existing method that is based on Gaussian process learning, where the underlying Gaussian process is a discrete Gaussian Markov Random Field. This speed-up is accomplished by employing smart computational linear algebra, state-of-the-art algorithms, and a careful divide-and-conquer evaluation strategy. Problems of significantly greater size than any other existing algorithm with similar guarantees can solve are solved as illustrations.« less
  2. Wasserstein gradient flows provide a powerful means of understanding and solving many diffusion equations. Specifically, Fokker-Planck equations, which model the diffusion of probability measures, can be understood as gradient descent over entropy functionals in Wasserstein space. This equivalence, introduced by Jordan, Kinderlehrer and Otto, inspired the so-called JKO scheme to approximate these diffusion processes via an implicit discretization of the gradient flow in Wasserstein space. Solving the optimization problem associated with each JKO step, however, presents serious computational challenges. We introduce a scalable method to approximate Wasserstein gradient flows, targeted to machine learning applications. Our approach relies on input-convex neuralmore »networks (ICNNs) to discretize the JKO steps, which can be optimized by stochastic gradient descent. Contrarily to previous work, our method does not require domain discretization or particle simulation. As a result, we can sample from the measure at each time step of the diffusion and compute its probability density. We demonstrate the performance of our algorithm by computing diffusions following the Fokker-Planck equation and apply it to unnormalized density sampling as well as nonlinear filtering.« less
  3. Esposito, Lauren (Ed.)
    Abstract This article investigates a form of rank deficiency in phenotypic covariance matrices derived from geometric morphometric data, and its impact on measures of phenotypic integration. We first define a type of rank deficiency based on information theory then demonstrate that this deficiency impairs the performance of phenotypic integration metrics in a model system. Lastly, we propose methods to treat for this information rank deficiency. Our first goal is to establish how the rank of a typical geometric morphometric covariance matrix relates to the information entropy of its eigenvalue spectrum. This requires clear definitions of matrix rank, of which wemore »define three: the full matrix rank (equal to the number of input variables), the mathematical rank (the number of nonzero eigenvalues), and the information rank or “effective rank” (equal to the number of nonredundant eigenvalues). We demonstrate that effective rank deficiency arises from a combination of methodological factors—Generalized Procrustes analysis, use of the correlation matrix, and insufficient sample size—as well as phenotypic covariance. Secondly, we use dire wolf jaws to document how differences in effective rank deficiency bias two metrics used to measure phenotypic integration. The eigenvalue variance characterizes the integration change incorrectly, and the standardized generalized variance lacks the sensitivity needed to detect subtle changes in integration. Both metrics are impacted by the inclusion of many small, but nonzero, eigenvalues arising from a lack of information in the covariance matrix, a problem that usually becomes more pronounced as the number of landmarks increases. We propose a new metric for phenotypic integration that combines the standardized generalized variance with information entropy. This metric is equivalent to the standardized generalized variance but calculated only from those eigenvalues that carry nonredundant information. It is the standardized generalized variance scaled to the effective rank of the eigenvalue spectrum. We demonstrate that this metric successfully detects the shift of integration in our dire wolf sample. Our third goal is to generalize the new metric to compare data sets with different sample sizes and numbers of variables. We develop a standardization for matrix information based on data permutation then demonstrate that Smilodon jaws are more integrated than dire wolf jaws. Finally, we describe how our information entropy-based measure allows phenotypic integration to be compared in dense semilandmark data sets without bias, allowing characterization of the information content of any given shape, a quantity we term “latent dispersion”. [Canis dirus; Dire wolf; effective dispersion; effective rank; geometric morphometrics; information entropy; latent dispersion; modularity and integration; phenotypic integration; relative dispersion.]« less
  4. Communication between human and mobile agents is getting increasingly important as such agents are widely deployed in our daily lives. Vision-and-Dialogue Navigation is one of the tasks that evaluate the agent’s ability to interact with humans for assistance and navigate based on natural language responses. In this paper, we explore the Navigation from Dialogue History (NDH) task, which is based on the Cooperative Vision-and-Dialogue Navigation (CVDN) dataset, and present a state-of-the-art model which is built upon Vision-Language transformers. However, despite achieving competitive performance, we find that the agent in the NDH task is not evaluated appropriately by the primary metricmore »– Goal Progress. By analyzing the performance mismatch between Goal Progress and other metrics (e.g., normalized Dynamic Time Warping) from our state-of-the-art model, we show that NDH’s sub-path based task setup (i.e., navigating partial trajectory based on its correspondent subset of the full dialogue) does not provide the agent with enough supervision signal towards the goal region. Therefore, we propose a new task setup called NDH-Full which takes the full dialogue and the whole navigation path as one instance. We present a strong baseline model and show initial results on this new task. We further describe several approaches that we try, in order to improve the model performance (based on curriculum learning, pre-training, and data-augmentation), suggesting potential useful training methods on this new NDH-Full task.« less
  5. Collaborative bandit learning, i.e., bandit algorithms that utilize collaborative filtering techniques to improve sample efficiency in online interactive recommendation, has attracted much research attention as it enjoys the best of both worlds. However, all existing collaborative bandit learning solutions impose a stationary assumption about the environment, i.e., both user preferences and the dependency among users are assumed static over time. Unfortunately, this assumption hardly holds in practice due to users' ever-changing interests and dependency relations, which inevitably costs a recommender system sub-optimal performance in practice. In this work, we develop a collaborative dynamic bandit solution to handle a changing environmentmore »for recommendation. We explicitly model the underlying changes in both user preferences and their dependency relation as a stochastic process. Individual user's preference is modeled by a mixture of globally shared contextual bandit models with a Dirichlet process prior. Collaboration among users is thus achieved via Bayesian inference over the global bandit models. To balance exploitation and exploration during the interactions, Thompson sampling is used for both model selection and arm selection. Our solution is proved to maintain a standard $\tilde O(\sqrt{T})$ Bayesian regret in this challenging environment. Extensive empirical evaluations on both synthetic and real-world datasets further confirmed the necessity of modeling a changing environment and our algorithm's practical advantages against several state-of-the-art online learning solutions.« less