Title: Bonferroni-Free and Indifference-Zone-Flexible Sequential Elimination Procedures for Ranking and Selection
This paper proposes two fully sequential procedures for selecting the best system with a guaranteed probability of correct selection (PCS). The main features of the proposed procedures include the following: (1) adopting a Bonferroni-free model that overcomes the conservativeness of the Bonferroni correction and delivers the exact probabilistic guarantee without overshooting; (2) conducting always-valid and fully sequential hypothesis tests that enable continuous monitoring of each candidate system and control the type I error rate (or, equivalently, PCS) at a prescribed level; and (3) assuming an indifference-zone-flexible formulation, which means that the indifference-zone parameter is not indispensable but can be helpful if provided. We establish statistical validity and asymptotic efficiency for the proposed procedures under normality settings with and without knowledge of the true variances. Numerical studies conducted under various configurations corroborate the theoretical findings and demonstrate the superiority of the proposed procedures. Funding: W. Wang and H. Wan were supported in part by CollinStar Capital Pty Ltd. X. Chen was supported in part by the National Science Foundation [Grants IIS-1849300 and CAREER CMMI-1846663]. Supplemental Material: The online appendix is available at https://doi.org/10.1287/opre.2023.2447.
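The sequential-elimination logic summarized in the abstract can be illustrated with a generic fully sequential elimination loop. This is a hedged sketch, not the paper's Bonferroni-free procedure: it uses a fixed critical constant `z` in place of the paper's always-valid sequential boundaries, and the names (`sequential_elimination`, `sample_fns`) are illustrative, not from the paper.

```python
import math
import random

def sequential_elimination(sample_fns, z=3.0, n0=20, max_rounds=10_000):
    """Generic fully sequential elimination: keep sampling the surviving
    systems and drop any system whose sample mean falls below another's by
    more than a confidence radius. Illustrative only -- the paper replaces
    the crude fixed constant z with always-valid sequential bounds."""
    k = len(sample_fns)
    sums, sqs, n = [0.0] * k, [0.0] * k, [0] * k
    active = set(range(k))

    def draw(i):
        x = sample_fns[i]()
        sums[i] += x
        sqs[i] += x * x
        n[i] += 1

    for i in range(k):          # initial stage: n0 observations per system
        for _ in range(n0):
            draw(i)

    for _ in range(max_rounds):
        if len(active) == 1:
            break
        for i in active:        # one more observation from each survivor
            draw(i)
        mean = {i: sums[i] / n[i] for i in active}
        var = {i: (sqs[i] - n[i] * mean[i] ** 2) / (n[i] - 1) for i in active}
        drop = set()
        for i in active:
            for j in active:
                if j == i:
                    continue
                radius = z * math.sqrt(var[i] / n[i] + var[j] / n[j])
                if mean[j] - mean[i] > radius:   # j dominates i: eliminate i
                    drop.add(i)
                    break
        active -= drop
    return max(active, key=lambda i: sums[i] / n[i])
```

With a clear best system the loop terminates quickly; the statistical content of the paper lies in choosing the elimination boundary so the PCS guarantee holds exactly rather than conservatively.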
Award ID(s): 1846663, 1849300
PAR ID: 10443659
Journal Name: Operations Research
ISSN: 0030-364X
Format(s): Medium: X
Sponsoring Org: National Science Foundation
More Like this
  1. We consider a simulation-based ranking and selection (R&S) problem with input uncertainty, in which unknown input distributions can be estimated using input data arriving in batches of varying sizes over time. Each time a batch arrives, additional simulations can be run using updated input distribution estimates. The goal is to confidently identify the best design after collecting as few batches as possible. We first introduce a moving average estimator for aggregating simulation outputs generated under heterogeneous input distributions. Then, based on a sequential elimination framework, we devise two major R&S procedures by establishing exact and asymptotic confidence bands for the estimator. We also extend our procedures to the indifference zone setting, which helps save simulation effort in practice. Numerical results show the effectiveness and necessity of our procedures in controlling error from input uncertainty. Moreover, the efficiency can be further boosted through optimizing the "drop rate" parameter, which is the proportion of past simulation outputs to discard, of the moving average estimator.
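The "drop rate" idea can be sketched directly. A minimal reading, assumed here: keep only the most recent (1 − drop rate) fraction of the pooled simulation outputs, so stale outputs generated under poorer input-distribution estimates are discarded. The function name and exact windowing rule are our own illustration, not the paper's estimator.

```python
import math

def moving_average(outputs, drop_rate):
    """Average only the most recent (1 - drop_rate) fraction of outputs.
    Older outputs, generated under less accurate input-distribution
    estimates, are dropped. Illustrative windowing rule only."""
    if not 0.0 <= drop_rate < 1.0:
        raise ValueError("drop_rate must lie in [0, 1)")
    keep = max(1, math.ceil(len(outputs) * (1.0 - drop_rate)))
    window = outputs[-keep:]
    return sum(window) / len(window)
```

Tuning the drop rate trades bias (retaining outputs from misspecified inputs) against variance (retaining fewer outputs), which is why optimizing it can boost efficiency.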
  2. We consider the problem of finding a system with the best primary performance measure among a finite number of simulated systems in the presence of subjective stochastic constraints on secondary performance measures. When no feasible system exists, the decision maker may be willing to relax some constraint thresholds. We take multiple threshold values for each constraint as a user's input and propose indifference-zone procedures that perform the phases of feasibility check and selection-of-the-best sequentially or simultaneously. Given that there is no change in the underlying simulated systems, our procedures recycle simulation observations to conduct feasibility checks across all potential thresholds. We prove that the proposed procedures yield the best system in the most desirable feasible region possible with at least a pre-specified probability. Our experimental results show that our procedures perform well with respect to the number of observations required to make a decision, as compared with straightforward procedures that repeatedly solve the problem for each set of constraint thresholds, and that our simultaneously running procedure provides the best overall performance.
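The observation-recycling idea in this abstract can be sketched in a few lines: once observations of a secondary performance measure are collected, the same sample can be screened against every candidate threshold at once. This is a hedged sketch; the actual procedures use sequential statistics with carefully allocated error probabilities rather than a plain mean comparison.

```python
def feasible_for_thresholds(samples, thresholds, margin=0.0):
    """Check the constraint E[Y] <= q for every threshold q, reusing one
    set of observations. `margin` loosely stands in for the statistical
    slack a real feasibility-check procedure would compute."""
    mean = sum(samples) / len(samples)
    return {q: mean <= q + margin for q in sorted(thresholds)}
```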
  3. A sequential design problem for rank aggregation is commonly encountered in psychology, politics, marketing, sports, etc. In this problem, a decision maker is responsible for ranking K items by sequentially collecting noisy pairwise comparisons from judges. The decision maker needs to choose a pair of items for comparison in each step, decide when to stop data collection, and make a final decision after stopping based on a sequential flow of information. Because of the complex ranking structure, existing sequential analysis methods are not suitable. In this paper, we formulate the problem under a Bayesian decision framework and propose sequential procedures that are asymptotically optimal. These procedures achieve asymptotic optimality by seeking a balance between exploration (i.e., finding the most indistinguishable pair of items) and exploitation (i.e., comparing the most indistinguishable pair based on the current information). New analytical tools are developed for proving the asymptotic results, combining advanced change of measure techniques for handling the level crossing of likelihood ratios and classic large deviation results for martingales, which are of separate theoretical interest in solving complex sequential design problems. A mirror-descent algorithm is developed for the computation of the proposed sequential procedures. 
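The exploration step described here, finding the most indistinguishable pair, can be illustrated under a Bradley–Terry comparison model (our choice of model for the sketch; the paper's framework is more general): pick the pair whose comparison outcome is closest to a coin flip.

```python
def most_indistinguishable_pair(strengths):
    """Under a Bradley-Terry model, item i beats item j with probability
    s_i / (s_i + s_j). The most indistinguishable pair is the one whose
    win probability is nearest 1/2 -- a natural exploration target."""
    best_pair, best_gap = None, float("inf")
    for i in range(len(strengths)):
        for j in range(i + 1, len(strengths)):
            p = strengths[i] / (strengths[i] + strengths[j])
            gap = abs(p - 0.5)
            if gap < best_gap:
                best_pair, best_gap = (i, j), gap
    return best_pair
```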
  4. Building and expanding on principles of statistics, machine learning, and scientific inquiry, we propose the predictability, computability, and stability (PCS) framework for veridical data science. Our framework, composed of both a workflow and documentation, aims to provide responsible, reliable, reproducible, and transparent results across the data science life cycle. The PCS workflow uses predictability as a reality check and considers the importance of computation in data collection/storage and algorithm design. It augments predictability and computability with an overarching stability principle. Stability expands on statistical uncertainty considerations to assess how human judgment calls impact data results through data and model/algorithm perturbations. As part of the PCS workflow, we develop PCS inference procedures, namely PCS perturbation intervals and PCS hypothesis testing, to investigate the stability of data results relative to problem formulation, data cleaning, modeling decisions, and interpretations. We illustrate PCS inference through neuroscience and genomics projects of our own and others. Moreover, we demonstrate its favorable performance over existing methods in terms of receiver operating characteristic (ROC) curves in high-dimensional, sparse linear model simulations, including a wide range of misspecified models. Finally, we propose PCS documentation based on R Markdown or Jupyter Notebook, with publicly available, reproducible codes and narratives to back up human choices made throughout an analysis. The PCS workflow and documentation are demonstrated in a genomics case study available on Zenodo. 
  5. We study highway-based shipping of preowned automobiles by auto carriers, an important although overlooked problem in the automobile shipping literature. The special structure associated with auto carriers implies many different ways of loading a set of automobiles to an auto carrier with different loading costs. Thus, in addition to vehicle routing decisions, loading decisions are essential in automobile shipping optimization. The objective of our problem is to maximize the total revenue minus the total routing and loading cost subject to time windows and loading constraints among others. Most existing automobile shipping studies treat loading and routing separately; some studies partially address the loading aspect in routing optimization but only check the loading feasibility without evaluating the quality of loading decisions. We thus contribute to the literature by fully integrating loading decisions into routing decision making. An integrated machine learning (ML) and optimization approach is proposed to solve the problem. The overall approach follows a column generation–based solution framework, in which an insertion heuristic is proposed to find new routes based on existing routes, and ML is employed to predict the loading feasibility and estimate the minimum loading cost of a given route without solving the complex loading optimization problem. The integration of the ML approach and the insertion heuristic enables us to find high-quality new routes quickly in each column generation iteration. Two variants of this integrated approach are evaluated against a benchmark sequential approach in which routing and loading are tackled separately and another benchmark approach in which routing and loading are optimized jointly without using ML.
Computational experiments demonstrate that the proposed integrated ML and optimization approach generates significantly better solutions than the sequential benchmark approach with only slightly more computation time and similar solutions to the joint optimization benchmark approach but with significantly less computation time. The proposed solution approach can be adopted by automobile shipping companies. It can also be adapted for other joint optimization problems, such as those in aircraft load planning. Funding: Y. Sun is partially supported by the National Science Foundation [Grants 2332161, 2100745, and 2055347]. Supplemental Material: The online appendix is available at https://doi.org/10.1287/trsc.2024.0712 . 
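The interplay between the insertion heuristic and the learned feasibility predictor can be sketched generically. `is_feasible` below is a stand-in for the ML loading-feasibility predictor, the routing cost is a plain distance sum, and none of the names come from the paper.

```python
def cheapest_feasible_insertion(route, stop, dist, is_feasible):
    """Try inserting `stop` at every interior position of `route`, keep
    only candidates the (learned) feasibility predictor accepts, and
    return the cheapest one -- or None if every candidate is rejected."""
    best_route, best_cost = None, float("inf")
    for pos in range(1, len(route)):
        cand = route[:pos] + [stop] + route[pos:]
        if not is_feasible(cand):       # ML predictor stands in here
            continue
        cost = sum(dist[a][b] for a, b in zip(cand, cand[1:]))
        if cost < best_cost:
            best_route, best_cost = cand, cost
    return best_route
```

Because the predictor screens candidates without solving the loading optimization, each column generation iteration can evaluate many candidate routes cheaply.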