

Title: Efficient Nonmyopic Bayesian Optimization via One-Shot Multi-Step Trees
Bayesian optimization is a sequential decision making framework for optimizing expensive-to-evaluate black-box functions. Computing a full lookahead policy amounts to solving a highly intractable stochastic dynamic program. Myopic approaches, such as expected improvement, are often adopted in practice, but they ignore the long-term impact of the immediate decision. Existing nonmyopic approaches are mostly heuristic and/or computationally expensive. In this paper, we provide the first efficient implementation of general multi-step lookahead Bayesian optimization, formulated as a sequence of nested optimization problems within a multi-step scenario tree. Instead of solving these problems in a nested way, we equivalently optimize all decision variables in the full tree jointly, in a "one-shot" fashion. Combining this with an efficient method for implementing multi-step Gaussian process "fantasization," we demonstrate that multi-step expected improvement is computationally tractable and exhibits performance superior to existing methods on a wide range of benchmarks.
Award ID(s): 1845434, 1940224
NSF-PAR ID: 10216888
Author(s) / Creator(s):
Date Published:
Journal Name: Advances in Neural Information Processing Systems
Volume: 33
ISSN: 1049-5258
Page Range / eLocation ID: 18039-18049
Format(s): Medium: X
Sponsoring Org: National Science Foundation
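As a rough illustration of the "one-shot" idea summarized in the abstract above, the sketch below jointly optimizes a two-step lookahead tree on a toy 1D problem: the step-0 candidate and one step-1 candidate per fantasy branch are treated as a single set of decision variables and ascended together by gradient descent. Everything here is an assumption made for illustration (the toy objective, the fixed-hyperparameter GP, the number of fantasy branches, and helper names such as `gp_posterior` and `ei`); it is a conceptual sketch, not the paper's implementation, which handles general multi-step trees and batched fantasization far more efficiently.

```python
# Minimal sketch of "one-shot" two-step lookahead EI on a toy 1D problem.
# NOT the paper's implementation: the objective, the fixed GP hyperparameters,
# the number of fantasy branches, and helper names are illustrative assumptions.
import torch
from torch.distributions import Normal

torch.manual_seed(0)
std_normal = Normal(0.0, 1.0)

def rbf(a, b, lengthscale=0.2):
    # Squared-exponential kernel with an assumed, fixed lengthscale.
    return torch.exp(-0.5 * (a.unsqueeze(-1) - b.unsqueeze(-2)) ** 2 / lengthscale ** 2)

def gp_posterior(x, X, y, jitter=1e-4):
    # Zero-mean GP posterior mean/std at points x, conditioned on data (X, y).
    L = torch.linalg.cholesky(rbf(X, X) + jitter * torch.eye(X.shape[-1]))
    k = rbf(x, X)
    mean = (k @ torch.cholesky_solve(y.unsqueeze(-1), L)).squeeze(-1)
    v = torch.cholesky_solve(k.transpose(-1, -2), L)
    var = rbf(x, x).diagonal(dim1=-2, dim2=-1) - (k @ v).diagonal(dim1=-2, dim2=-1)
    return mean, var.clamp_min(1e-12).sqrt()

def ei(mean, std, best):
    # Analytic expected improvement (maximization).
    z = (mean - best) / std
    return std * (z * std_normal.cdf(z) + std_normal.log_prob(z).exp())

# Toy objective and initial observations (assumed for illustration).
f = lambda x: torch.sin(6.0 * x)
X = torch.tensor([0.1, 0.5, 0.9])
y = f(X)
best0 = y.max()

K_FANT = 8                                            # fantasy branches at step 1
x0 = torch.full((1,), 0.3, requires_grad=True)        # step-0 candidate
x1 = torch.full((K_FANT,), 0.7, requires_grad=True)   # one step-1 candidate per branch
eps = torch.randn(K_FANT)                             # fixed base samples (reparameterization)

# "One-shot" optimization: ascend the two-step value over ALL tree variables
# jointly, instead of solving a nested inner problem for each fantasy branch.
opt = torch.optim.Adam([x0, x1], lr=0.05)
for _ in range(200):
    opt.zero_grad()
    m0, s0 = gp_posterior(x0, X, y)
    value = ei(m0, s0, best0).sum()                   # immediate EI at x0
    y0 = m0 + s0 * eps                                # K fantasized outcomes at x0
    for j in range(K_FANT):
        Xf = torch.cat([X, x0])                       # condition the GP on fantasy j
        yf = torch.cat([y, y0[j:j + 1]])
        m1, s1 = gp_posterior(x1[j:j + 1], Xf, yf)
        value = value + ei(m1, s1, torch.max(best0, y0[j])).sum() / K_FANT
    (-value).backward()
    opt.step()
    with torch.no_grad():                             # keep candidates in the unit interval
        x0.clamp_(0.0, 1.0)
        x1.clamp_(0.0, 1.0)

print("two-step lookahead query:", x0.detach().item())
```

Only x0 is actually proposed for evaluation; the step-1 variables exist to make the lookahead value differentiable end to end, and the same construction nests for deeper trees.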
More Like this
  1. Recently, there has been renewed interest in data-driven control, that is, the design of controllers directly from observed data. In the case of linear time-invariant (LTI) systems, several approaches have been proposed that lead to tractable optimization problems. On the other hand, the case of nonlinear dynamics is considerably less developed, with existing approaches limited to at most rational dynamics and requiring the solution to a computationally expensive Sum of Squares (SoS) optimization. Since SoS problems typically scale combinatorially with the size of the problem, these approaches are limited to relatively low order systems. In this paper, we propose an alternative, based on the use of state-dependent representations. This idea allows for synthesizing data-driven controllers by solving at each time step an on-line optimization problem whose complexity is comparable to the LTI case. Further, the proposed approach is not limited to rational dynamics. The main result of the paper shows that the feasibility of this on-line optimization problem guarantees that the proposed controller renders the origin a globally asymptotically stable equilibrium point of the closed-loop system. These results are illustrated with some simple examples. The paper concludes by briefly discussing the prospects for adding performance criteria. 
  2. Finite-horizon sequential experimental design (SED) arises naturally in many contexts, including hyperparameter tuning in machine learning alongside more traditional settings. Computing the optimal policy for such problems requires solving Bellman equations, which are generally intractable. Most existing work resorts to severely myopic approximations by limiting the decision horizon to only a single time-step, which can underweight exploration in favor of exploitation. We present BINOCULARS: Batch-Informed NOnmyopic Choices, Using Long-horizons for Adaptive, Rapid SED, a general framework for deriving efficient, nonmyopic approximations to the optimal experimental policy. Our key idea is simple and surprisingly effective: we first compute a one-step optimal batch of experiments, then select a single point from this batch to evaluate. We realize BINOCULARS for Bayesian optimization and Bayesian quadrature -- two notable example problems with radically different objectives -- and demonstrate that BINOCULARS significantly outperforms myopic alternatives in real-world scenarios. 
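To make the batch-then-pick idea above concrete, here is a minimal, hedged sketch: it greedily builds a one-step q-EI batch from Monte Carlo samples of an assumed joint posterior over a discrete candidate set, then evaluates only one batch member. The posterior, the greedy q-EI construction, and the final selection rule (largest marginal EI) are all illustrative assumptions, not the paper's exact procedure.

```python
# Toy sketch of the batch-then-pick idea: build a greedy q-EI batch from
# Monte Carlo posterior samples over a discrete candidate grid, then
# evaluate only the single batch member with the largest marginal EI.
import numpy as np

rng = np.random.default_rng(0)
n_cand, n_samples, q = 50, 2000, 4
best = 1.0                                    # current incumbent value (assumed)

# Assumed joint posterior over candidate values: mean vector + covariance.
mean = rng.normal(0.8, 0.3, size=n_cand)
A = rng.normal(size=(n_cand, n_cand)) / np.sqrt(n_cand)
cov = A @ A.T + 0.05 * np.eye(n_cand)
draws = rng.multivariate_normal(mean, cov, size=n_samples)   # (n_samples, n_cand)

def q_ei(members):
    # Monte Carlo q-EI: expected improvement of the best point in the batch.
    return np.maximum(draws[:, members].max(axis=1) - best, 0.0).mean()

# Greedily grow the one-step batch (a common q-EI heuristic, not the paper's solver).
batch = []
for _ in range(q):
    idxs = [j for j in range(n_cand) if j not in batch]
    gains = [q_ei(batch + [j]) for j in idxs]
    batch.append(idxs[int(np.argmax(gains))])

# BINOCULARS-style step: evaluate just one member of the one-step optimal batch.
marginal_ei = np.maximum(draws[:, batch] - best, 0.0).mean(axis=0)
next_point = batch[int(np.argmax(marginal_ei))]
print("batch:", batch, "-> evaluate candidate", next_point)
```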
  3. The problem of allocating limited resources to maintain components of a multicomponent system, known as selective maintenance, is naturally formulated as a high-dimensional Markov decision process (MDP). Unfortunately, these problems are difficult to solve exactly for realistically sized systems. With this motivation, we contribute an approximate dynamic programming (ADP) algorithm for solving the selective maintenance problem for a series–parallel system with binary-state components. To the best of our knowledge, this paper describes the first application of ADP to maintain multicomponent systems. Our ADP is compared, using a numerical example from the literature, against exact solutions to the corresponding MDP. We then summarize the results of a more comprehensive set of experiments that demonstrate the ADP’s favorable performance on larger instances in comparison to both the exact (but computationally intensive) MDP approach and the heuristic (but computationally faster) one-step-lookahead approach. Finally, we demonstrate that the ADP is capable of solving an extension of the basic selective maintenance problem in which maintenance resources are permitted to be shared across stages. 
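The contrast between an exact Bellman solution and the one-step-lookahead baseline mentioned above can be illustrated on a toy finite-horizon MDP. This is not the paper's selective-maintenance model or its ADP algorithm; the sizes, rewards, and transition kernel below are invented purely for illustration.

```python
# Toy illustration: exact finite-horizon value iteration vs. a one-step-lookahead
# policy on a small random MDP (not the paper's maintenance model or ADP).
import numpy as np

rng = np.random.default_rng(1)
S, A, T = 6, 3, 5                              # states, actions, horizon (assumed sizes)
P = rng.dirichlet(np.ones(S), size=(S, A))     # P[s, a] = transition distribution over s'
R = rng.uniform(0.0, 1.0, size=(S, A))         # immediate rewards

# Exact dynamic program: Bellman backups over the full horizon.
V = np.zeros((T + 1, S))
for t in range(T - 1, -1, -1):
    Q = R + P @ V[t + 1]                       # Q[s, a] = R[s, a] + E[V_{t+1}(s')]
    V[t] = Q.max(axis=1)

def one_step_lookahead(s, V_heuristic):
    # Greedy action using a cheap terminal-value approximation instead of V[t+1].
    return int(np.argmax(R[s] + P[s] @ V_heuristic))

# With the trivial heuristic V_heuristic = 0, the lookahead policy is purely myopic.
print("exact optimal action in state 0 at t=0:", int(np.argmax(R[0] + P[0] @ V[1])))
print("myopic one-step-lookahead action      :", one_step_lookahead(0, np.zeros(S)))
```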
  4. Many machine learning problems can be abstracted as game-theoretic formulations and boil down to optimizing nested objectives, such as generative adversarial networks (GANs) and multi-agent reinforcement learning. Solving these games requires finding their stable fixed points or Nash equilibria. However, existing algorithms for solving games suffer from empirical instability, hence demanding heavy ad-hoc tuning in practice. To tackle these challenges, we resort to the emerging scheme of Learning to Optimize (L2O), which discovers problem-specific efficient optimization algorithms through data-driven training. Our customized L2O framework for differentiable game theory problems, dubbed "Learning to Play Games" (L2PG), seeks a stable fixed-point solution by predicting the fast update direction from the past trajectory, with a novel gradient stability-aware, sign-based loss function. We further incorporate curriculum learning and self-learning to strengthen the empirical training stability and generalization of L2PG. On test problems including quadratic games and GANs, L2PG substantially accelerates convergence and demonstrates remarkably more stable trajectories. Codes are available at https://github.com/VITA-Group/L2PG. 
  5. Many transit agencies operating paratransit and microtransit services have to respond to trip requests that arrive in real time, which entails solving hard combinatorial and sequential decision-making problems under uncertainty. To avoid decisions that lead to significant inefficiency in the long term, vehicles should be allocated to requests by optimizing a non-myopic utility function or by batching requests together and optimizing a myopic utility function. While the former approach is typically offline, the latter can be performed online. We point out two major issues with such approaches when applied to paratransit services in practice. First, it is difficult to batch paratransit requests together as they are temporally sparse. Second, the environment in which transit agencies operate changes dynamically (e.g., traffic conditions can change over time), causing the estimates that are learned offline to become stale. To address these challenges, we propose a fully online approach to solve the dynamic vehicle routing problem (DVRP) with time windows and stochastic trip requests that is robust to changing environmental dynamics by construction. We focus on scenarios where requests are relatively sparse; our problem is motivated by applications to paratransit services. We formulate DVRP as a Markov decision process and use Monte Carlo tree search to evaluate actions for any given state. Accounting for stochastic requests while optimizing a non-myopic utility function is computationally challenging; indeed, the action space for such a problem is intractably large in practice. To tackle the large action space, we leverage the structure of the problem to design heuristics that can sample promising actions for the tree search. Our experiments using real-world data from our partner agency show that the proposed approach outperforms existing state-of-the-art approaches both in terms of performance and robustness. 
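As a toy, hedged sketch of the decision loop described above: heuristically sample a few candidate actions and score each by Monte Carlo rollouts, then act greedily. This uses flat Monte Carlo evaluation as a stand-in for the full tree search, only to show the structure of heuristic action sampling plus simulation-based evaluation; the environment, action sampler, and rollout policy below are invented for illustration and are unrelated to the actual DVRP model or the paper's heuristics.

```python
# Toy sketch: evaluate only heuristically sampled actions via Monte Carlo
# rollouts and pick the best (flat Monte Carlo stand-in for tree search).
import random

random.seed(0)

def step(state, action):
    # Stochastic toy dynamics: integer position, noisy moves, reward = closeness to 5.
    nxt = state + action + random.choice([-1, 0, 1])
    return nxt, -abs(nxt - 5)

def sample_promising_actions(state, k=3):
    # Heuristic action sampler; in the real problem this would propose only a few
    # plausible vehicle-to-request assignments out of an intractably large set.
    return random.sample([-2, -1, 0, 1, 2], k)

def rollout_value(state, horizon=10):
    # Estimate downstream value by simulating a random default policy.
    total = 0.0
    for _ in range(horizon):
        state, r = step(state, random.choice([-1, 0, 1]))
        total += r
    return total

def choose_action(state, n_rollouts=200):
    best_a, best_v = None, float("-inf")
    for a in sample_promising_actions(state):
        returns = []
        for _ in range(n_rollouts):
            nxt, r = step(state, a)
            returns.append(r + rollout_value(nxt))
        v = sum(returns) / n_rollouts
        if v > best_v:
            best_a, best_v = a, v
    return best_a

print("chosen action from state 0:", choose_action(0))
```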