We consider the problem where N agents collaboratively interact with an instance of a stochastic K arm bandit problem for K N. The agents aim to simultaneously minimize the cumulative regret over all the agents for a total of T time steps, the number of communication rounds, and the number of bits in each communication round. We present Limited Communication Collaboration - Upper Confidence Bound (LCC-UCB), a doubling-epoch based algorithm where each agent communicates only after the end of the epoch and shares the index of the best arm it knows. With our algorithm, LCC-UCB, each agent enjoys a regret of O√(K/N + N)T, communicates for O(log T) steps and broadcasts O(log K) bits in each communication step. We extend the work to sparse graphs with maximum degree KG and diameter D to propose LCC-UCB-GRAPH which enjoys a regret bound of O√(D(K/N + KG)DT). Finally, we empirically show that the LCC-UCB and the LCC-UCB-GRAPH algorithms perform well and outperform strategies that communicate through a central node.
more »
« less
Global Optimization with Parametric Function Approximation
We consider the problem of global optimization with noisy zeroth order oracles — a well-motivated problem useful for various applications ranging from hyper-parameter tuning for deep learning to new material design. Existing work relies on Gaussian processes or other non-parametric family, which suffers from the curse of dimensionality. In this paper, we propose a new algorithm GO-UCB that leverages a parametric family of functions (e.g., neural networks) instead. Under a realizable assumption and a few other mild geometric conditions, we show that GO-UCB achieves a cumulative regret of $\tilde{O}(\sqrt{T})$ where $T$ is the time horizon. At the core of GO-UCB is a carefully designed uncertainty set over parameters based on gradients that allows optimistic exploration. Synthetic and real-world experiments illustrate GO-UCB works better than popular Bayesian optimization approaches, even if the model is misspecified.
more »
« less
- Award ID(s):
- 2134214
- NSF-PAR ID:
- 10467176
- Editor(s):
- Krause, Andreas and
- Publisher / Repository:
- Proceedings of the 40th International Conference on Machine Learning
- Date Published:
- Journal Name:
- Proceedings of Machine Learning Research
- Volume:
- 202
- ISSN:
- 2640-3498
- Format(s):
- Medium: X
- Sponsoring Org:
- National Science Foundation
More Like this
-
-
In black-box optimization problems, we aim to maximize an unknown objective function, where the function is only accessible through feedbacks of an evaluation or simulation oracle. In real-life, the feedbacks of such oracles are often noisy and available after some unknown delay that may depend on the computation time of the oracle. Additionally, if the exact evaluations are expensive but coarse approximations are available at a lower cost, the feedbacks can have multi-fidelity. In order to address this problem, we propose a generic extension of hierarchical optimistic tree search (HOO), called ProCrastinated Tree Search (PCTS), that flexibly accommodates a delay and noise-tolerant bandit algorithm. We provide a generic proof technique to quantify regret of PCTS under delayed, noisy, and multi-fidelity feedbacks. Specifically, we derive regret bounds of PCTS enabled with delayed-UCB1 (DUCB1) and delayed-UCB-V (DUCBV) algorithms. Given a horizon T, PCTS retains the regret bound of non-delayed HOO for expected delay of O(log T), and worsens by T^((1-α)/(d+2)) for expected delays of O(T^(1-α)) for α ∈ (0,1]. We experimentally validate on multiple synthetic functions and hyperparameter tuning problems that PCTS outperforms the state-of-the-art black-box optimization methods for feedbacks with different noise levels, delays, and fidelity.more » « less
-
In this paper, we propose and study opportunistic bandits - a new variant of bandits where the regret of pulling a suboptimal arm varies under different environmental conditions, such as network load or produce price. When the load/price is low, so is the cost/regret of pulling a suboptimal arm (e.g., trying a suboptimal network configuration). Therefore, intuitively, we could explore more when the load/price is low and exploit more when the load/price is high. Inspired by this intuition, we propose an Adaptive Upper-Confidence-Bound (AdaUCB) algorithm to adaptively balance the exploration-exploitation tradeoff for opportunistic bandits. We prove that AdaUCB achieves O(log T) regret with a smaller coefficient than the traditional UCB algorithm. Furthermore, AdaUCB achieves O(1) regret with respect to T if the exploration cost is zero when the load level is below a certain threshold. Last, based on both synthetic data and real-world traces, experimental results show that AdaUCB significantly outperforms other bandit algorithms, such as UCB and TS (Thompson Sampling), under large load/price fluctuation.more » « less
-
We consider a prototypical path planning problem on a graph with uncertain cost of mobility on its edges. At a given node, the planning agent can access the true cost for edges to its neighbors and uses a noisy simulator to estimate the cost-to-go from the neighboring nodes. The objective of the planning agent is to select a neighboring node such that, with high probability, the cost-to-go is minimized for the worst possible realization of uncertain parameters in the simulator. By modeling the cost-to-go as a Gaussian process (GP) for every realization of the uncertain parameters, we apply a scenario approach in which we draw fixed independent samples of the uncertain parameter. We present a scenario-based iterative algorithm using the upper confidence bound (UCB) of the fixed independent scenarios to compute the choice of the neighbor to go to. We characterize the performance of the proposed algorithm in terms of a novel notion of regret defined with respect to an additional draw of the uncertain parameter, termed as scenario regret under re-draw. In particular, we characterize a high probability upper bound on the regret under re-draw for any finite number of iterations of the algorithm, and show that this upper bound tends to zero asymptotically with the number of iterations. We supplement our analysis with numerical results.more » « less
-
Yin, George (Ed.)We consider a discrete time stochastic Markovian control problem under model uncertainty. Such uncertainty not only comes from the fact that the true probability law of the underlying stochastic process is unknown, but the parametric family of probability distributions which the true law belongs to is also unknown. We propose a nonparametric adaptive robust control methodology to deal with such problem where the relevant system random noise is, for simplicity, assumed to be i.i.d. and onedimensional. Our approach hinges on the following building concepts: first, using the adaptive robust paradigm to incorporate online learning and uncertainty reduction into the robust control problem; second, learning the unknown probability law through the empirical distribution, and representing uncertainty reduction in terms of a sequence of Wasserstein balls around the empirical distribution; third, using Lagrangian duality to convert the optimization over Wasserstein balls to a scalar optimization problem, and adopting a machine learning technique to achieve efficient computation of the optimal control. We illustrate our methodology by considering a utility maximization problem. Numerical comparisons show that the nonparametric adaptive robust control approach is preferable to the traditional robust frameworksmore » « less