skip to main content


Title: Cost effective active search
We study a special paradigm of active learning, called cost effective active search, where the goal is to find a given number of positive points from a large unlabeled pool with minimum labeling cost. Most existing methods solve this problem heuristically, and few theoretical results have been established. We adopt a principled Bayesian approach for the first time. We first derive the Bayesian optimal policy and establish a strong hardness result: the optimal policy is hard to approximate, with the best-possible approximation ratio lower bounded by Ω(n^0.16). We then propose an efficient and nonmyopic policy using the negative Poisson binomial distribution. We propose simple and fast approximations for computing its expectation, which serves as an essential role in our proposed policy. We conduct comprehensive experiments on various domains such as drug and materials discovery, and demonstrate that our proposed search procedure is superior to the widely used greedy baseline.  more » « less
Award ID(s):
1845434
NSF-PAR ID:
10140928
Author(s) / Creator(s):
; ;
Date Published:
Journal Name:
Advances in Neural Information Processing Systems
Volume:
32
Page Range / eLocation ID:
4880 - 4889
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. We study the scheduling of multiple tasks under varying processing costs and derive a priority rule for optimal scheduling policies. Each task has a due date, and a non‐completion penalty cost is incurred if the task is not completely processed before its due date. We assume that the task arrival process is stochastic and the processing rate is capacitated. Our work is motivated by both traditional and emerging application domains, such as construction industry and freelance consulting industry. We establish the optimality ofShorter Slack time and Longer remaining Processing time(SSLP)principlethat determines the priority among active tasks. Based on the derived structural properties, we also propose an effective cost‐balancing heuristic policy and demonstrate the efficacy of the proposed policy through extensive numerical experiments. We believe our results provide operators/managers valuable insights on how to devise effective service scheduling policies under varying costs.

     
    more » « less
  2. Problem definition: We study scheduling multi-class impatient customers in parallel server queueing systems. At the time of arrival, customers are identified as being one of many classes, and the class represents the service and patience time distributions as well as cost characteristics. From the system’s perspective, customers of the same class at time of arrival get differentiated on their residual patience time as they wait in queue. We leverage this property and propose two novel and easy-to-implement multi-class scheduling policies. Academic/practical relevance: Scheduling multi-class impatient customers is an important and challenging topic, especially when customers’ patience times are nonexponential. In these contexts, even for customers of the same class, processing them under the first-come, first-served (FCFS) policy is suboptimal. This is because, at time of arrival, the system only knows the overall patience distribution from which a customer’s patience value is drawn, and as time elapses, the estimate of the customer’s residual patience time can be further updated. For nonexponential patience distributions, such an update indeed reveals additional information, and using this information to implement within-class prioritization can lead to additional benefits relative to the FCFS policy. Methodology: We use fluid approximations to analyze the multi-class scheduling problem with ideas borrowed from convex optimization. These approximations are known to perform well for large systems, and we use simulations to validate our proposed policies for small systems. Results: We propose a multi-class time-in-queue policy that prioritizes both across customer classes and within each class using a simple rule and further show that most of the gains of such a policy can be achieved by deviating from within-class FCFS for at most one customer class. In addition, for systems with exponential patience times, our policy reduces to a simple priority-based policy, which we prove is asymptotically optimal for Markovian systems with an optimality gap that does not grow with system scale. Managerial implications: Our work provides managers ways of improving quality of service to manage parallel server queueing systems. We propose easy-to-implement policies that perform well relative to reasonable benchmarks. Our work also adds to the academic literature on multi-class queueing systems by demonstrating the joint benefits of cross- and within-class prioritization.

    Funding: A. Bassamboo received financial support from the National Science Foundation [Grant CMMI 2006350]. C. (A.) Wu received financial support from the Hong Kong General Research Fund [Early Career Scheme, Project 26206419].

    Supplemental Material: The online appendix is available at https://doi.org/10.1287/msom.2023.1190 .

     
    more » « less
  3. Active search is a setting in adaptive experimental design where we aim to uncover members of rare, valuable class(es) subject to a budget constraint. An important consideration in this problem is diversity among the discovered targets – in many applications, diverse discoveries offer more insight and may be preferable in downstream tasks. However, most existing active search policies either assume that all targets belong to a common positive class or encourage diversity via simple heuristics. We present a novel formulation of active search with multiple target classes, characterized by a utility function chosen from a flexible family whose members encourage diversity among discoveries via a diminishing returns mechanism. We then study this problem under the Bayesian lens and prove a hardness result for approximating the optimal policy for arbitrary positive, increasing, and concave utility functions. Finally, we design an efficient, nonmyopic approximation to the optimal policy for this class of utilities and demonstrate its superior empirical performance in a variety of experimental settings, including drug discovery. 
    more » « less
  4. Millimeter-wave (mm-wave) systems rely on narrow- beams to cope with the severe signal attenuation in the mm- wave frequency band. However, susceptibility to beam mis- alignment due to mobility or blockage requires the use of beam- alignment schemes, with huge cost in terms of overhead and use of system resources. In this paper, a beam-alignment scheme is proposed based on Bayesian multi-armed bandits, with the goal to maximize the alignment probability and the data-communication throughput. A Bayesian approach is proposed, by considering the state as a posterior distribution over angles of arrival (AoA) and of departure (AoD), given the history of feedback signaling and of beam pairs scanned by the base-station (BS) and the user- end (UE). A simplified sufficient statistic for optimal control is identified, in the form of preference of BS-UE beam pairs. By bounding a value function, the second-best preference policy is formulated, which strikes an optimal balance between exploration and exploitation by selecting the beam pair with the current second-best preference. Through Monte-Carlo simulation with analog beamforming, the superior performance of the second- best preference policy is demonstrated in comparison to existing schemes based on first-best preference, linear Thompson sampling, and upper confidence bounds, with up to 7%, 10% and 30% improvements in alignment probability, respectively. 
    more » « less
  5. Abstract

    Managing risk in infrastructure systems implies dealing with interdependent physical networks and their relationships with the natural and societal contexts. Computational tools are often used to support operational decisions aimed at improving resilience, whereas economics‐related tools tend to be used to address broader societal and policy issues in infrastructure management. We propose an optimization‐based framework for infrastructure resilience analysis that incorporates organizational and socioeconomic aspects into operational problems, allowing to understand relationships between decisions at the policy level (e.g., regulation) and the technical level (e.g., optimal infrastructure restoration). We focus on three issues that arise when integrating such levels. First, optimal restoration strategies driven by financial and operational factors evolve differently compared to those driven by socioeconomic and humanitarian factors. Second, regulatory aspects have a significant impact on recovery dynamics (e.g., effective recovery is most challenging in societies with weak institutions and regulation, where individual interests may compromise societal well‐being). And third, the decision space (i.e., available actions) in postdisaster phases is strongly determined by predisaster decisions (e.g., resource allocation). The proposed optimization framework addresses these issues by using: (1) parametric analyses to test the influence of operational and socioeconomic factors on optimization outcomes, (2) regulatory constraints to model and assess the cost and benefit (for a variety of actors) of enforcing specific policy‐related conditions for the recovery process, and (3) sensitivity analyses to capture the effect of predisaster decisions on recovery. We illustrate our methodology with an example regarding the recovery of interdependent water, power, and gas networks in Shelby County, TN (USA), with exposure to natural hazards.

     
    more » « less