skip to main content


Title: BINOCULARS for efficient, nonmyopic sequential experimental design
Finite-horizon sequential experimental design (SED) arises naturally in many contexts, including hyperparameter tuning in machine learning among more traditional settings. Computing the optimal policy for such problems requires solving Bellman equations, which are generally intractable. Most existing work resorts to severely myopic approximations by limiting the decision horizon to only a single time-step, which can underweight exploration in favor of exploitation. We present BINOCULARS: Batch-Informed NOnmyopic Choices, Using Long-horizons for Adaptive, Rapid SED, a general framework for deriving efficient, nonmyopic approximations to the optimal experimental policy. Our key idea is simple and surprisingly effective: we first compute a one-step optimal batch of experiments, then select a single point from this batch to evaluate. We realize BINOCULARS for Bayesian optimization and Bayesian quadrature -- two notable example problems with radically different objectives -- and demonstrate that BINOCULARS significantly outperforms significantly outperforms myopic alternatives in real-world scenarios.  more » « less
Award ID(s):
1940224 1845434
NSF-PAR ID:
10182850
Author(s) / Creator(s):
; ; ;
Date Published:
Journal Name:
Proceedings of the 37th International Conference on Machine Learning
Page Range / eLocation ID:
3134 - 3143
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. null (Ed.)
    Bayesian optimization is a sequential decision making framework for optimizing expensive-to-evaluate black-box functions. Computing a full lookahead policy amounts to solving a highly intractable stochastic dynamic program. Myopic approaches, such as expected improvement, are often adopted in practice, but they ignore the long-term impact of the immediate decision. Existing nonmyopic approaches are mostly heuristic and/or computationally expensive. In this paper, we provide the first efficient implementation of general multi-step lookahead Bayesian optimization, formulated as a sequence of nested optimization problems within a multi-step scenario tree. Instead of solving these problems in a nested way, we equivalently optimize all decision variables in the full tree jointly, in a "one-shot" fashion. Combining this with an efficient method for implementing multi-step Gaussian process "fantasization," we demonstrate that multi-step expected improvement is computationally tractable and exhibits performance superior to existing methods on a wide range of benchmarks. 
    more » « less
  2. We study a special paradigm of active learning, called cost effective active search, where the goal is to find a given number of positive points from a large unlabeled pool with minimum labeling cost. Most existing methods solve this problem heuristically, and few theoretical results have been established. We adopt a principled Bayesian approach for the first time. We first derive the Bayesian optimal policy and establish a strong hardness result: the optimal policy is hard to approximate, with the best-possible approximation ratio lower bounded by Ω(n^0.16). We then propose an efficient and nonmyopic policy using the negative Poisson binomial distribution. We propose simple and fast approximations for computing its expectation, which serves as an essential role in our proposed policy. We conduct comprehensive experiments on various domains such as drug and materials discovery, and demonstrate that our proposed search procedure is superior to the widely used greedy baseline. 
    more » « less
  3. Abstract

    An omnichannel retailer with a network of physical stores and online fulfillment centers facing two demands (online and in‐store) has to make important, interlinked decisions—how much inventory to keep at each location and where to fulfill each online order from, as online demand can be fulfilled from any location with available inventory. We consider inventory decisions at the start of the selling horizon for a seasonal product, with online fulfillment decisions made multiple times over the horizon. To address the intractability in considering inventory and fulfillment decisions together, we relax the problem using a hindsight‐optimal bound, for which the inventory decision can be made independent of the optimal fulfillment decisions, while still incorporating virtual pooling of online demands across locations. We develop a computationally fast and scalable inventory heuristic for the multilocation problem based on the two‐store analysis. The inventory heuristic directly informs dynamic fulfillment decisions that guide online demand fulfillment from stores. Using a numerical study based on a fictitious network embedded in the United States, we show that our heuristic significantly outperforms traditional strategies. The value of centralized inventory planning is highest when there is a moderate mix of online and in‐store demands leading to synergies between pooling within and across locations, and this value increases with the size of the network. The inventory‐aware fulfillment heuristic considerably outperforms myopic policies seen in practice, and is found to be near‐optimal under a wide range of problem parameters.

     
    more » « less
  4. Active search is a setting in adaptive experimental design where we aim to uncover members of rare, valuable class(es) subject to a budget constraint. An important consideration in this problem is diversity among the discovered targets – in many applications, diverse discoveries offer more insight and may be preferable in downstream tasks. However, most existing active search policies either assume that all targets belong to a common positive class or encourage diversity via simple heuristics. We present a novel formulation of active search with multiple target classes, characterized by a utility function chosen from a flexible family whose members encourage diversity among discoveries via a diminishing returns mechanism. We then study this problem under the Bayesian lens and prove a hardness result for approximating the optimal policy for arbitrary positive, increasing, and concave utility functions. Finally, we design an efficient, nonmyopic approximation to the optimal policy for this class of utilities and demonstrate its superior empirical performance in a variety of experimental settings, including drug discovery. 
    more » « less
  5. We consider dynamic assortment problems with reusable products, in which each arriving customer chooses a product within an offered assortment, uses the product for a random duration of time, and returns the product back to the firm to be used by other customers. The goal is to find a policy for deciding on the assortment to offer to each customer so that the total expected revenue over a finite selling horizon is maximized. The dynamic-programming formulation of this problem requires a high-dimensional state variable that keeps track of the on-hand product inventories, as well as the products that are currently in use. We present a tractable approach to compute a policy that is guaranteed to obtain at least 50% of the optimal total expected revenue. This policy is based on constructing linear approximations to the optimal value functions. When the usage duration is infinite or follows a negative binomial distribution, we also show how to efficiently perform rollout on a simple static policy. Performing rollout corresponds to using separable and nonlinear value function approximations. The resulting policy is also guaranteed to obtain at least 50% of the optimal total expected revenue. The special case of our model with infinite usage durations captures the case where the customers purchase the products outright without returning them at all. Under infinite usage durations, we give a variant of our rollout approach whose total expected revenue differs from the optimal by a factor that approaches 1 with rate cubic-root of Cmin, where Cmin is the smallest inventory of a product. We provide computational experiments based on simulated data for dynamic assortment management, as well as real parking transaction data for the city of Seattle. Our computational experiments demonstrate that the practical performance of our policies is substantially better than their performance guarantees and that performing rollout yields noticeable improvements. 
    more » « less