Linear Stochastic Bandits over a Bit-Constrained Channel
One of the primary challenges in large-scale distributed learning stems from stringent communication constraints. While several recent works address this challenge for static optimization problems, sequential decision-making under uncertainty has remained much less explored in this regard. Motivated by this gap, we introduce a new linear stochastic bandit formulation over a bit-constrained channel. Specifically, in our setup, an agent interacting with an environment transmits encoded estimates of an unknown model parameter to a server over a communication channel of finite capacity. The goal of the server is to take actions based on these estimates to minimize cumulative regret. To this end, we develop a novel and general algorithmic framework that hinges on two main components: (i) an adaptive encoding mechanism that exploits statistical concentration bounds, and (ii) a decision-making principle based on confidence sets that account for encoding errors. As our main result, we prove that when the unknown model is d-dimensional, a channel capacity of O(d) bits suffices to achieve order-optimal regret. We also establish that for the simpler unstructured multi-armed bandit problem, a channel capacity of 1 bit is sufficient for achieving optimal regret bounds.
Keywords: Linear Bandits, Distributed Learning, Communication Constraints
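The adaptive encoding idea can be illustrated with a minimal sketch: if the agent and server share a confidence interval of known radius around each coordinate of the current estimate (obtained, e.g., from a concentration bound), the agent can quantize each coordinate with a constant number of bits, so transmitting a d-dimensional estimate costs O(d) bits per round. The uniform-grid quantizer and function names below are illustrative assumptions, not the paper's actual scheme.

```python
def encode(theta_hat, radius, center, bits_per_coord):
    """Quantize each coordinate of theta_hat relative to the shared
    confidence interval [center_i - radius, center_i + radius].
    Returns one integer codeword of bits_per_coord bits per coordinate."""
    levels = 2 ** bits_per_coord
    codes = []
    for t, c in zip(theta_hat, center):
        # clip into the confidence interval, then map to a grid-cell index
        x = min(max(t, c - radius), c + radius)
        frac = (x - (c - radius)) / (2 * radius)
        codes.append(min(int(frac * levels), levels - 1))
    return codes

def decode(codes, radius, center, bits_per_coord):
    """Reconstruct each coordinate as the midpoint of its grid cell;
    the per-coordinate error is at most radius / 2**bits_per_coord."""
    levels = 2 ** bits_per_coord
    step = 2 * radius / levels
    return [(c - radius) + (k + 0.5) * step for k, c in zip(codes, center)]
```

As the confidence radius shrinks over time, the same bit budget encodes the estimate with proportionally finer resolution, which is one way a fixed O(d)-bit channel can keep the encoding error of the same order as the statistical error.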
 Award ID(s):
 1910056
 NSF-PAR ID:
 10490378
 Publisher / Repository:
 Proceedings of Machine Learning Research
 Date Published:
 Journal Name:
 Learning for Dynamics and Control Conference
 Format(s):
 Medium: X
 Location:
 Philadelphia
 Sponsoring Org:
 National Science Foundation
More Like this



We consider a stochastic inventory control problem under censored demand, lost sales, and positive lead times. This is a fundamental problem in inventory management, with a significant literature establishing the near-optimality of a simple class of policies called "base-stock policies," as well as the convexity of the long-run average cost under those policies. We consider the relatively less studied problem of designing a learning algorithm for this setting when the underlying demand distribution is unknown. The goal is to bound the regret of the algorithm relative to the best base-stock policy. Our main contribution is a learning algorithm with a regret bound of [Formula: see text] for the inventory control problem. Here, L is the fixed and known lead time, and D is an unknown parameter of the demand distribution, described roughly as the expected number of time steps needed to generate enough demand to deplete one unit of inventory. Notably, our regret bounds depend linearly on L, which significantly improves on the previously best-known regret bounds for this problem, where the dependence on L was exponential. Our techniques utilize the convexity of the long-run average cost and a newly derived bound on the "bias" of base-stock policies to establish an almost black-box connection between the problem of learning in Markov decision processes (MDPs) with these properties and the stochastic convex bandit problem. The techniques presented here may be of independent interest for other settings that involve large structured MDPs but with convex asymptotic average cost functions.
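For intuition, the policy class being learned is simple to simulate: a base-stock policy orders just enough each period to raise the inventory position (on-hand stock plus orders in the pipeline) up to a level S, while demand is censored because only sales are observed. The sketch below is a hypothetical illustration of that dynamic, not the paper's learning algorithm.

```python
import random

def simulate_base_stock(S, lead_time, demand_sampler, horizon, seed=0):
    """Simulate a lost-sales inventory system under a base-stock policy
    with a positive lead time. Returns (final on-hand stock, total lost sales)."""
    rng = random.Random(seed)
    on_hand = S
    pipeline = [0] * lead_time  # orders in transit, arriving after lead_time periods
    lost = 0
    for _ in range(horizon):
        # receive the oldest outstanding order
        if lead_time > 0:
            on_hand += pipeline.pop(0)
        # order up to S, counting both on-hand and pipeline inventory
        position = on_hand + sum(pipeline)
        order = max(S - position, 0)
        if lead_time > 0:
            pipeline.append(order)
        else:
            on_hand += order
        # demand is censored: sales are observed, unmet demand is lost
        d = demand_sampler(rng)
        sales = min(on_hand, d)
        lost += d - sales
        on_hand -= sales
    return on_hand, lost
```

Running such a simulation for different levels S traces out the convex long-run average cost that the abstract's black-box reduction to stochastic convex bandits exploits.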

We consider the communication complexity of a number of distributed optimization problems. We start with the problem of solving a linear system. Suppose there is a coordinator together with s servers P_1, …, P_s, the i-th of which holds a subset A^{(i)} x = b^{(i)} of n_i constraints of a linear system in d variables, and the coordinator would like to output an x ∈ ℝ^d for which A^{(i)} x = b^{(i)} for i = 1, …, s. We assume each coefficient of each constraint is specified using L bits. We first resolve the randomized and deterministic communication complexity in the point-to-point model of communication, showing it is Θ(d^2 L + sd) and Θ(sd^2 L), respectively. We obtain similar results for the blackboard communication model. As a result of independent interest, we show that the probability a random matrix with integer entries in {−2^L, …, 2^L} is invertible is 1 − 2^{−Θ(dL)}, whereas previously only 1 − 2^{−Θ(d)} was known. When there is no solution to the linear system, a natural alternative is to find the solution minimizing the ℓ_p loss, which is the ℓ_p regression problem. While this problem has been studied, we give improved upper or lower bounds for every value of p ≥ 1. One takeaway message is that sampling and sketching techniques, which are commonly used in earlier work on distributed optimization, are neither optimal in the dependence on d nor in the dependence on the approximation ε, thus motivating new techniques from optimization to solve these problems. Toward this end, we consider the communication complexity of optimization tasks which generalize linear systems, such as linear, semidefinite, and convex programming. For linear programming, we first resolve the communication complexity when d is constant, showing it is Θ(sL) in the point-to-point model. For general d and in the point-to-point model, we show an Õ(sd^3 L) upper bound and an Ω(d^2 L + sd) lower bound.
In fact, we show that if one perturbs the coefficients randomly by numbers as small as 2^{−Θ(L)}, then the upper bound is Õ(sd^2 L) + poly(dL), and so this bound holds for almost all linear programs. Our study motivates understanding the bit complexity of linear programming, which is related to the running time in the unit-cost RAM model with words of O(log(nd)) bits, and we give the fastest known algorithms for linear programming in this model. (https://epubs.siam.org/doi/10.1137/1.9781611975994.106)
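The statement that random integer matrices are invertible with overwhelming probability is easy to sanity-check empirically. The Monte Carlo sketch below uses an exact rational-arithmetic rank test; the function names and parameter choices are illustrative, and a small experiment can only show that singular draws are rare, not verify the 1 − 2^{−Θ(dL)} rate itself.

```python
import random
from fractions import Fraction

def is_invertible(M):
    """Exact invertibility check via Gaussian elimination over the rationals."""
    n = len(M)
    A = [[Fraction(x) for x in row] for row in M]
    for col in range(n):
        # find a pivot at or below the diagonal
        pivot = next((r for r in range(col, n) if A[r][col] != 0), None)
        if pivot is None:
            return False  # no pivot in this column: rank-deficient
        A[col], A[pivot] = A[pivot], A[col]
        for r in range(col + 1, n):
            f = A[r][col] / A[col][col]
            for c in range(col, n):
                A[r][c] -= f * A[col][c]
    return True

def singular_fraction(d, L, trials, seed=0):
    """Empirical fraction of d-by-d matrices with i.i.d. uniform entries
    in {-2**L, ..., 2**L} that are singular."""
    rng = random.Random(seed)
    lo, hi = -2 ** L, 2 ** L
    bad = 0
    for _ in range(trials):
        M = [[rng.randint(lo, hi) for _ in range(d)] for _ in range(d)]
        if not is_invertible(M):
            bad += 1
    return bad / trials
```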

We consider the problem of controlling a Linear Quadratic Regulator (LQR) system over a finite horizon T with fixed and known cost matrices Q, R, but unknown and non-stationary dynamics A_t, B_t. The sequence of dynamics matrices can be arbitrary, but with a total variation V_T assumed to be o(T) and unknown to the controller. Under the assumption that a sequence of stabilizing, but potentially suboptimal, controllers is available for all t, we present an algorithm that achieves the optimal dynamic regret of O(V_T^{2/5} T^{3/5}). With piecewise-constant dynamics, our algorithm achieves the optimal regret of O(√(ST)), where S is the number of switches. The crux of our algorithm is an adaptive non-stationarity detection strategy, which builds on an approach recently developed for contextual multi-armed bandit problems. We also argue that non-adaptive forgetting (e.g., restarting, or using sliding-window learning with a static window size) may not be regret-optimal for the LQR problem, even when the window size is optimally tuned with knowledge of V_T. The main technical challenge in the analysis of our algorithm is to prove that the ordinary least squares (OLS) estimator has a small bias when the parameter to be estimated is non-stationary. Our analysis also highlights that the key motif driving the regret is that the LQR problem is, in spirit, a bandit problem with linear feedback and locally quadratic cost. This motif is more universal than the LQR problem itself, and therefore we believe our results should find wider application.