Title: Getting to "rate optimal" in ranking & selection
In their seminal 2004 paper, Glynn and Juneja formally and precisely established the replication-allocation scheme that optimizes the rate of decay of the probability of incorrect selection when choosing the best of k simulated systems. In the case of independent, normally distributed outputs this allocation has a simple form that depends in an intuitively appealing way on the true means and variances. Of course the means and (typically) the variances are unknown, but the rate-optimal allocation provides a target for implementable, dynamic, data-driven policies to achieve. In this paper we compare the empirical behavior of four related replication-allocation policies: mCEI from Chen and Ryzhov and our new gCEI policy, both of which converge to the Glynn and Juneja allocation; AOMAP from Peng and Fu, which converges to the OCBA optimal allocation; and TTTS from Russo, which targets the rate of convergence of the posterior probability of incorrect selection. We find that these policies have distinctly different behavior in some settings.
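For readers who want to experiment with the target allocation: under independent normal outputs, Glynn and Juneja characterize the rate-optimal sampling fractions by two conditions, namely that the pairwise large-deviations rates of the suboptimal systems are all equal, and that the best system's fraction satisfies a total-balance condition against the rest. The Python sketch below solves these conditions numerically given the true means and variances; it is an illustrative implementation with our own naming, not code from the paper.

    import numpy as np
    from scipy.optimize import fsolve

    def gj_allocation(mu, sigma2):
        # Solve the Glynn-Juneja optimality conditions for the rate-optimal
        # sampling fractions alpha, assuming independent normal outputs.
        mu = np.asarray(mu, dtype=float)
        sigma2 = np.asarray(sigma2, dtype=float)
        b = int(np.argmax(mu))                  # index of the true best system
        others = [i for i in range(len(mu)) if i != b]

        def rate(i, alpha):
            # Large-deviations rate for falsely selecting system i over b.
            return (mu[b] - mu[i]) ** 2 / (
                2.0 * (sigma2[b] / alpha[b] + sigma2[i] / alpha[i]))

        def conditions(x):
            alpha = np.maximum(x, 1e-10)        # keep iterates positive
            eqs = [alpha.sum() - 1.0]           # fractions sum to one
            r0 = rate(others[0], alpha)
            for i in others[1:]:
                eqs.append(rate(i, alpha) - r0) # pairwise rate balance
            # total-balance condition on the best system's fraction
            eqs.append(alpha[b] ** 2 / sigma2[b]
                       - sum(alpha[i] ** 2 / sigma2[i] for i in others))
            return eqs

        k = len(mu)
        return fsolve(conditions, np.full(k, 1.0 / k))

    # Example: the close competitor (mean 0.8) receives far more replications
    # than the distant one (mean 0.5).
    print(gj_allocation([1.0, 0.8, 0.5], [1.0, 1.0, 1.0]))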
Award ID(s):
1854562
NSF-PAR ID:
10335103
Author(s) / Creator(s):
; ;
Date Published:
Journal Name:
Proceedings of the 2021 Winter Simulation Conference
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1.
    This paper studies a remote sensing system in which multiple wireless sensors generate possibly noisy information updates about various surveillance fields and deliver these updates to a control center over a wireless network. The control center needs a sufficient number of recently generated information updates to maintain an accurate estimate of the current system status, which is critical for making appropriate control decisions. The goal of this work is to design an optimal policy for scheduling the transmissions of information updates. Through a Brownian approximation, we demonstrate that the control center's ability to make accurate real-time estimates depends on the means and temporal variances of the delivery processes. We then formulate a constrained optimization problem to find the optimal means and variances. We also develop a simple online scheduling policy that employs the optimal means and variances to achieve optimal system-wide performance. Simulation results show that our scheduling policy enjoys fast convergence and better performance than other state-of-the-art policies.
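    The constrained optimization mentioned above is not specified in the abstract; purely as a stylized illustration, one can imagine choosing per-sensor delivery rates to minimize a variance-over-mean staleness proxy under a shared bandwidth budget. Everything in the sketch below (the objective, the single capacity constraint, and all names and numbers) is our assumption, not the paper's model.

        import numpy as np
        from scipy.optimize import minimize

        v = np.array([1.0, 2.0, 0.5])  # assumed temporal variances (illustrative)
        w = np.array([1.0, 1.0, 2.0])  # assumed importance weights (illustrative)
        C = 3.0                        # assumed total transmission budget

        # Choose mean delivery rates m to minimize a staleness proxy
        # sum_i w_i * v_i / m_i subject to sum_i m_i <= C.
        res = minimize(
            lambda m: np.sum(w * v / m),
            x0=np.full(len(v), C / len(v)),
            bounds=[(1e-6, None)] * len(v),
            constraints=[{"type": "ineq", "fun": lambda m: C - m.sum()}],
        )
        print(res.x)  # rates grow like sqrt(w_i * v_i): noisier fields update more often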
  2. This paper proposes two fully sequential procedures for selecting the best system with a guaranteed probability of correct selection (PCS). The main features of the proposed procedures include the following: (1) adopting a Bonferroni-free model that overcomes the conservativeness of the Bonferroni correction and delivers the exact probabilistic guarantee without overshooting; (2) conducting always valid and fully sequential hypothesis tests that enable continuous monitoring of each candidate system and control the type I error rate (or equivalently, PCS) at a prescribed level; and (3) assuming an indifference-zone-flexible formulation, which means that the indifference-zone parameter is not indispensable but could be helpful if provided. We establish statistical validity and asymptotic efficiency for the proposed procedures under normality settings with and without knowledge of the true variances. Numerical studies conducted under various configurations corroborate the theoretical findings and demonstrate the superiority of the proposed procedures. Funding: W. Wang and H. Wan were supported in part by CollinStar Capital Pty Ltd. X. Chen was supported in part by the National Science Foundation [Grants IIS-1849300 and CAREER CMMI-1846663]. Supplemental Material: The online appendix is available at https://doi.org/10.1287/opre.2023.2447.
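    To make the "always valid, fully sequential" idea concrete, the sketch below runs a simple elimination race using a union-bound confidence sequence for normal observations with known variance. It is a generic illustration of anytime-valid elimination, not the paper's Bonferroni-free construction (which is designed precisely to avoid this kind of conservative union bound).

        import numpy as np

        def anytime_radius(n, sigma2, delta):
            # Valid simultaneously over all n for N(mu, sigma2) samples, since
            # sum_n delta / (n * (n + 1)) = delta.
            return np.sqrt(2.0 * sigma2 * np.log(2.0 * n * (n + 1) / delta) / n)

        def elimination_race(sample, k, sigma2, delta=0.05, max_n=100000):
            # sample(i) draws one observation from system i; a system is dropped
            # once its upper bound falls below some survivor's lower bound.
            alive = set(range(k))
            sums = np.zeros(k)
            counts = np.zeros(k, dtype=int)
            per_sys_delta = delta / k   # split the error budget across systems
            while len(alive) > 1 and counts.max() < max_n:
                for i in list(alive):
                    sums[i] += sample(i)
                    counts[i] += 1
                means = sums / np.maximum(counts, 1)
                rad = {i: anytime_radius(counts[i], sigma2, per_sys_delta)
                       for i in alive}
                best_lower = max(means[i] - rad[i] for i in alive)
                alive = {i for i in alive if means[i] + rad[i] >= best_lower}
            return max(alive, key=lambda i: sums[i] / counts[i])

        rng = np.random.default_rng(1)
        mu = [1.0, 0.8, 0.5]
        print(elimination_race(lambda i: rng.normal(mu[i], 1.0), k=3, sigma2=1.0))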
  3. We investigate statistical uncertainty quantification for reinforcement learning (RL) and its implications for exploration policy. Despite the ever-growing literature on RL applications, fundamental questions about inference and error quantification, such as large-sample behaviors, remain quite open. In this paper, we fill this gap in the literature by studying the central limit theorem behaviors of estimated Q-values and value functions under various RL settings. In particular, we explicitly identify closed-form expressions for the asymptotic variances, which allow us to efficiently construct asymptotically valid confidence regions for key RL quantities. Furthermore, we utilize these asymptotic expressions to design an effective exploration strategy, which we call Q-value-based Optimal Computing Budget Allocation (Q-OCBA). The policy relies on maximizing the relative discrepancies among the Q-value estimates. Numerical experiments show the superior performance of our exploration strategy relative to other benchmark policies. Funding: This work was supported by the National Science Foundation (1720433).
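    The abstract does not spell out the Q-OCBA formulas, but the flavor of an OCBA-style allocation driven by value estimates and variances can be shown with the classical OCBA ratios applied to the Q estimates of the actions at a single state. The simplification to one state and all names below are ours; the paper's policy generalizes well beyond this.

        import numpy as np

        def ocba_fractions(q_hat, var_hat):
            # Classical OCBA sampling ratios, here driven by per-action Q-value
            # estimates; b is the currently best-looking action.
            q_hat = np.asarray(q_hat, dtype=float)
            var_hat = np.asarray(var_hat, dtype=float)
            b = int(np.argmax(q_hat))
            gaps = q_hat[b] - q_hat
            w = np.zeros_like(q_hat)
            mask = np.arange(len(q_hat)) != b
            w[mask] = var_hat[mask] / gaps[mask] ** 2   # N_i ~ (sigma_i / gap_i)^2
            w[b] = np.sqrt(var_hat[b] * np.sum(w[mask] ** 2 / var_hat[mask]))
            return w / w.sum()

        # Close competitors with high noise draw most of the exploration budget.
        print(ocba_fractions([1.0, 0.9, 0.2], [1.0, 1.0, 1.0]))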
  4. This paper focuses on optimizing resource allocation amongst a set of tenants, network slices, supporting dynamic customer loads over a set of distributed resources, e.g., base stations. The aim is to reap the benefits of statistical multiplexing resulting from flexible sharing of 'pooled' resources, while enabling tenants to differentiate and protect their performance from one another's load fluctuations. To that end we consider a setting where resources are grouped into Virtual Resource Pools (VRPs) wherein resource allocation is jointly and dynamically managed. Specifically, for each VRP we adopt a Share-Constrained Proportionally Fair (SCPF) allocation scheme where each tenant is allocated a fixed share (budget). This budget is distributed equally amongst its active customers, which in turn are granted fractions of their associated VRP resources in proportion to customer shares. For a VRP with a single resource, this translates to the well-known Generalized Processor Sharing (GPS) policy. For VRPs with multiple resources, SCPF provides a flexible means to achieve load-elastic allocations across tenants sharing the pool. Given tenants' per-resource shares and expected loads, this paper formulates the problem of determining optimal VRP partitions which maximize the overall expected shared weighted utility while ensuring protection guarantees. For a high load/capacity setting we exhibit this network utility function explicitly, quantifying the benefits and penalties of any VRP partition in terms of network slices' ability to achieve performance differentiation, load balancing, and statistical multiplexing. Although the problem is shown to be NP-hard, a simple greedy heuristic is shown to be effective. Analysis and simulations confirm that the selection of optimal VRP partitions provides a practical avenue towards improving network utility in network slicing scenarios with dynamic loads.
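    The SCPF weight computation described above is simple enough to state in a few lines: each tenant's fixed share is split equally over its currently active customers, and each resource in the pool is then divided among its customers in proportion to those per-customer weights. The sketch below, with our own naming, illustrates it for a single VRP.

        from collections import defaultdict

        def scpf_allocation(shares, customers):
            # shares: tenant -> fixed share (budget) within this VRP.
            # customers: list of (tenant, resource) pairs, one per active customer.
            per_tenant = defaultdict(int)
            for tenant, _ in customers:
                per_tenant[tenant] += 1     # count active customers per tenant

            # Each customer's weight is its tenant's share, split equally.
            weighted = [(t, r, shares[t] / per_tenant[t]) for t, r in customers]

            # Each resource is divided in proportion to its customers' weights.
            per_resource = defaultdict(float)
            for _, r, w in weighted:
                per_resource[r] += w
            return [(t, r, w / per_resource[r]) for t, r, w in weighted]

        # Tenant A's two customers on resource 1 each get 0.25 of it, while
        # tenant B's single customer gets 0.5, matching the GPS intuition.
        print(scpf_allocation({"A": 1.0, "B": 1.0},
                              [("A", 1), ("A", 1), ("B", 1)]))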
  5. Many decision problems are set in changing environments. For example, determining the optimal investment in cyber maintenance depends on whether there is evidence of an unusual vulnerability, such as "Heartbleed," that is causing an especially high rate of incidents. This gives rise to the need for timely information to update decision models so that optimal policies can be generated for each decision period. Social media provide a streaming source of relevant information, but that information needs to be efficiently transformed into numbers to enable the needed updates. This article explores the use of social media as an observation source for timely decision making. To efficiently generate the observations for Bayesian updates, we propose a novel computational method to fit an existing clustering model. The proposed method is called k-means latent Dirichlet allocation (KLDA). We illustrate the method using a cybersecurity problem. Many organizations ignore "medium" vulnerabilities identified during periodic scans. Decision makers must choose whether staff should be required to address these vulnerabilities during periods of elevated risk. Also, we study four text corpora with 100 replications and show that KLDA is associated with significantly reduced computational times and more consistent model accuracy.
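    The abstract names KLDA without giving its construction. One plausible reading, sketched below purely for illustration, is to use fast k-means clustering of bag-of-words vectors as a cheap surrogate for LDA's topic assignments, with normalized cluster centroids read as topic-word distributions. The pipeline and every name here are our assumptions, not the paper's algorithm.

        import numpy as np
        from sklearn.cluster import KMeans
        from sklearn.feature_extraction.text import CountVectorizer

        docs = ["patch the server vulnerability now",
                "heartbleed vulnerability incident report",
                "quarterly budget meeting agenda",
                "budget review and staffing agenda"]

        # Hypothetical KLDA-style surrogate: hard-cluster word-count vectors
        # with k-means, then normalize centroids into topic-word distributions.
        X = CountVectorizer().fit_transform(docs)
        km = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X.toarray())
        centroids = np.maximum(km.cluster_centers_, 0.0)
        topic_word = centroids / centroids.sum(axis=1, keepdims=True)
        print(km.labels_)            # per-document hard "topic" assignments
        print(topic_word.round(2))   # surrogate topic-word probabilities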