One of the primary challenges in large-scale distributed learning stems from stringent communication constraints. While several recent works address this challenge for static optimization problems, sequential decision-making under uncertainty has remained much less explored in this regard. Motivated by this gap, we introduce a new linear stochastic bandit formulation over a bit-constrained channel. Specifically, in our setup, an agent interacting with an environment transmits encoded estimates of an unknown model parameter to a server over a communication channel of finite capacity. The goal of the server is to take actions based on these estimates to minimize cumulative regret. To this end, we develop a novel and general algorithmic framework that hinges on two main components: (i) an adaptive encoding mechanism that exploits statistical concentration bounds, and (ii) a decision-making principle based on confidence sets that account for encoding errors. As our main result, we prove that when the unknown model is d-dimensional, a channel capacity of O(d) bits suffices to achieve order-optimal regret. We also establish that for the simpler unstructured multi-armed bandit problem, 1 bit channel capacity is sufficient for achieving optimal regret bounds.
more »
« less
Linear Stochastic Bandits over a Bit-Constrained Channel
One of the primary challenges in large-scale distributed learning stems from stringent communication constraints. While several recent works address this challenge for static optimization problems, sequential decision-making under uncertainty has remained much less explored in this regard. Motivated by this gap, we introduce a new linear stochastic bandit formulation over a bit-constrained channel. Specifically, in our setup, an agent interacting with an environment transmits encoded estimates of an unknown model parameter to a server over a communication channel of finite capacity. The goal of the server is to take actions based on these estimates to minimize cumulative regret. To this end, we develop a novel and general algorithmic framework that hinges on two main components: (i) an adaptive encoding mechanism that exploits statistical concentration bounds, and (ii) a decision-making principle based on confidence sets that account for encoding errors. As our main result, we prove that when the unknown model is d-dimensional, a channel capacity of O(d) bits suffices to achieve order-optimal regret. We also establish that for the simpler unstructured multi-armed bandit problem, 1 bit channel capacity is sufficient for achieving optimal regret bounds. Keywords: Linear Bandits, Distributed Learning, Communication Constraints
more »
« less
- Award ID(s):
- 1910056
- PAR ID:
- 10490378
- Publisher / Repository:
- Proceedings of Machine Learning Research
- Date Published:
- Journal Name:
- Learning for Dynamics and Control Conference
- Format(s):
- Medium: X
- Location:
- Philadelphia
- Sponsoring Org:
- National Science Foundation
More Like this
-
-
One of the primary challenges in large-scale distributed learning stems from stringent communication constraints. While several recent works address this challenge for static optimization problems, sequential decision-making under uncertainty has remained much less explored in this regard. Motivated by this gap, we introduce a new linear stochastic bandit formulation over a bit-constrained channel. Specifically, in our setup, an agent interacting with an environment transmits encoded estimates of an unknown model parameter to a server over a communication channel of finite capacity. The goal of the server is to take actions based on these estimates to minimize cumulative regret. To this end, we develop a novel and general algorithmic framework that hinges on two main components:(i) an adaptive encoding mechanism that exploits statistical concentration bounds, and (ii) a decision-making principle based on confidence sets that account for encoding errors. As our main result, we prove that when the unknown model is -dimensional, a channel capacity of bits suffices to achieve order-optimal regret. We also establish that for the simpler unstructured multi-armed bandit problem, bit channel capacity is sufficient for achieving optimal regret bounds.more » « less
-
Adapting to a priori unknown noise level is a very important but challenging problem in sequential decision-making as efficient exploration typically requires knowledge of the noise level, which is often loosely specified. We report significant progress in addressing this issue in linear bandits in two respects. First, we propose a novel confidence set that is ’semi-adaptive’ to the unknown sub-Gaussian parameter $$\sigma_*^2$$ in the sense that the (normalized) confidence width scales with $$\sqrt{d\sigma_*^2 + \sigma_0^2}$$ where $$d$$ is the dimension and $$\sigma_0^2$$ is the specified sub-Gaussian parameter (known) that can be much larger than $$\sigma_*^2$$. This is a significant improvement over $$\sqrt{d\sigma_0^2}$$ of the standard confidence set of Abbasi-Yadkori et al. (2011), especially when $$d$$ is large. We show that this leads to an improved regret bound in linear bandits. Second, for bounded rewards, we propose a novel variance-adaptive confidence set that has a much improved numerical performance upon prior art. We then apply this confidence set to develop, as we claim, the first practical variance-adaptive linear bandit algorithm via an optimistic approach, which is enabled by our novel regret analysis technique. Both of our confidence sets rely critically on ‘regret equality’ from online learning. Our empirical evaluation in Bayesian optimization tasks shows that our algorithms demonstrate better or comparable performance compared to existing methods.more » « less
-
Most existing works on federated bandits take it for granted that all clients are altruistic about sharing their data with the server for the collective good whenever needed. Despite their compelling theoretical guarantee on performance and communication efficiency, this assumption is overly idealistic and oftentimes violated in practice, especially when the algorithm is operated over self-interested clients, who are reluctant to share data without explicit benefits. Negligence of such self-interested behaviors can significantly affect the learning efficiency and even the practical operability of federated bandit learning. In light of this, we aim to spark new insights into this under-explored research area by formally introducing an incentivized communication problem for federated bandits, where the server shall motivate clients to share data by providing incentives. Without loss of generality, we instantiate this bandit problem with the contextual linear setting and propose the first incentivized communication protocol, namely, Inc-FedUCB, that achieves near-optimal regret with provable communication and incentive cost guarantees. Extensive empirical experiments on both synthetic and real-world datasets further validate the effectiveness of the proposed method across various environments.more » « less
-
Camps-Valls, Gustau; Ruiz, Francisco J.; Valera, Isabel (Ed.)Linear contextual bandit is a popular online learning problem. It has been mostly studied in centralized learning settings. With the surging demand of large-scale decentralized model learning, e.g., federated learning, how to retain regret minimization while reducing communication cost becomes an open challenge. In this paper, we study linear contextual bandit in a federated learning setting. We propose a general framework with asynchronous model update and communication for a collection of homogeneous clients and heterogeneous clients, respectively. Rigorous theoretical analysis is provided about the regret and communication cost under this distributed learning framework; and extensive empirical evaluations demonstrate the effectiveness of our solution.more » « less
An official website of the United States government
