Title: Perturbation-based Regret Analysis of Predictive Control in Linear Time Varying Systems
We study predictive control in a setting where the dynamics are time-varying and linear, and the costs are time-varying and well-conditioned. At each time step, the controller receives the exact predictions of costs, dynamics, and disturbances for the next k time steps. We show that when the prediction window k is sufficiently large, predictive control is input-to-state stable and achieves a dynamic regret of O(λ^k T), where λ < 1 is a positive constant. This is the first dynamic regret bound for the predictive control of linear time-varying systems. We also show that a variation of predictive control obtains the first competitive bound for the control of linear time-varying systems: 1 + O(λ^k). Our results are derived using a novel proof framework based on a perturbation bound that characterizes how a small change to the system parameters impacts the optimal trajectory.
Award ID(s):
2105648
PAR ID:
10324697
Author(s) / Creator(s):
Date Published:
Journal Name:
Advances in Neural Information Processing Systems 34 (NeurIPS 2021)
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
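The k-step predictive controller analyzed in the abstract can be sketched as a receding-horizon loop: at each step, use the exact k-step predictions of dynamics and disturbances to solve a finite-horizon problem, then apply only the first input. The quadratic state/input costs, identity input matrices, and all numerical parameters below are simplifying assumptions for illustration, not details from the paper.

```python
import numpy as np

def mpc_step(x, A_seq, B_seq, w_seq):
    """One receding-horizon step on a linear time-varying system.

    Given exact k-step predictions A_t..A_{t+k-1}, B_t.., w_t.., minimize
    sum_j ||x_{t+j+1}||^2 + ||u_{t+j}||^2 by eliminating the states and
    solving a ridge least-squares problem in the stacked inputs; only the
    first input is applied (receding horizon).
    """
    k = len(A_seq)
    n = x.shape[0]
    m = B_seq[0].shape[1]
    # Build x_{t+j+1} = phi @ x + gamma @ U + dv for each j in the window.
    phi = np.eye(n)
    gamma = np.zeros((n, k * m))
    dv = np.zeros(n)
    Phi, Gamma, D = [], [], []
    for j in range(k):
        phi = A_seq[j] @ phi
        gamma = A_seq[j] @ gamma
        gamma[:, j * m:(j + 1) * m] = B_seq[j]
        dv = A_seq[j] @ dv + w_seq[j]
        Phi.append(phi.copy()); Gamma.append(gamma.copy()); D.append(dv.copy())
    Phi, Gamma, D = np.vstack(Phi), np.vstack(Gamma), np.concatenate(D)
    # Minimize ||Gamma U + (Phi x + D)||^2 + ||U||^2 in closed form.
    H = Gamma.T @ Gamma + np.eye(k * m)
    U = np.linalg.solve(H, -Gamma.T @ (Phi @ x + D))
    return U[:m]

# Closed-loop simulation on a randomly generated LTV system
# (illustrative parameters only).
rng = np.random.default_rng(0)
n, k, T = 3, 8, 40
A_all = [np.eye(n) + 0.1 * rng.normal(size=(n, n)) for _ in range(T + k)]
B_all = [np.eye(n) for _ in range(T + k)]
w_all = [0.1 * rng.normal(size=n) for _ in range(T + k)]
x = rng.normal(size=n)
for t in range(T):
    u = mpc_step(x, A_all[t:t + k], B_all[t:t + k], w_all[t:t + k])
    x = A_all[t] @ x + B_all[t] @ u + w_all[t]
print(float(np.linalg.norm(x)))  # stays small: the loop is stable
```

The input-to-state stability claimed in the abstract shows up here as the closed-loop state settling near the disturbance magnitude rather than growing with the horizon.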
More Like this
  1. We consider a general non-stochastic online pricing bandit setting in a procurement scenario: a buyer with a budget wants to procure items from a fixed set of sellers to maximize its reward by dynamically offering purchase prices to the sellers, where the sellers' costs and values at each time period can change arbitrarily and each seller decides whether to accept the offered price and sell its item. This setting models online pricing scenarios of procuring resources or services in multi-agent systems. We first consider the offline setting, where the sellers' costs and values are known in advance, and investigate the best fixed-price policy in hindsight, showing that it has a tight approximation guarantee with respect to the offline optimal solution. In the general online setting, we propose an online pricing policy, Granularity-based Pricing (GAP), which exploits underlying side-information from the feedback graph when the budget is given as input. We show that GAP achieves an upper bound of $$O\big(\frac{n v_{\max}}{c_{\min}}\sqrt{B/c_{\min}}\ln B\big)$$ on the $$\alpha$$-regret, where $$n$$, $$v_{\max}$$, $$c_{\min}$$, and $$B$$ are the number of sellers, the maximum value, the minimum cost, and the budget, respectively. We then extend it to the unknown-budget case by developing a variant of GAP, namely Doubling-GAP, and show its $$\alpha$$-regret is at most $$O\big(\frac{n v_{\max}}{c_{\min}}\sqrt{B/c_{\min}}\ln^2 B\big)$$. We also provide an $$\alpha$$-regret lower bound of $$\Omega\big(v_{\max}\sqrt{Bn/c_{\min}}\big)$$ for any online policy, which is tight up to sub-linear terms. We conduct simulation experiments to show that the proposed policy outperforms the baseline algorithms.
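The posted-price interaction and the best-fixed-price-in-hindsight benchmark described above can be sketched as follows. The acceptance rule (a seller accepts iff the offered price covers its cost) and all numerical parameters are illustrative assumptions, not details from the paper.

```python
import random

def run_fixed_price(price, costs, values, budget):
    """Offer the same posted price to each seller in sequence; a seller
    accepts iff the price covers its private cost, and the buyer pays the
    posted price until the budget runs out. An illustrative model of the
    procurement interaction, not the GAP algorithm itself."""
    reward, spent = 0.0, 0.0
    for c, v in zip(costs, values):
        if spent + price > budget:   # cannot afford another purchase
            break
        if price >= c:               # seller accepts the offer
            spent += price
            reward += v
    return reward, spent

# Best fixed price in hindsight over a coarse price grid (hypothetical data).
random.seed(1)
T = 1000
costs = [random.uniform(0.2, 1.0) for _ in range(T)]
values = [random.uniform(0.0, 2.0) for _ in range(T)]
best = max(run_fixed_price(p / 10, costs, values, budget=50.0)[0]
           for p in range(1, 11))
```

A higher posted price wins more sellers per round but exhausts the budget $$B$$ faster, which is the trade-off the fixed-price benchmark balances.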
  2. Feldman, Vitaly; Ligett, Katrina; Sabato, Sivan (Ed.)
    Many real-world problems like Social Influence Maximization face the dilemma of choosing the best $$K$$ out of $$N$$ options at a given time instant. This setup can be modeled as a combinatorial bandit which chooses $$K$$ out of $$N$$ arms at each time, with an aim to achieve an efficient trade-off between exploration and exploitation. This is the first work for combinatorial bandits where the feedback received can be a non-linear function of the chosen $$K$$ arms. The direct use of multi-armed bandit requires choosing among $$N$$-choose-$$K$$ options, making the state space large. In this paper, we present a novel algorithm that is computationally efficient and whose storage is linear in $$N$$. The proposed algorithm is a divide-and-conquer based strategy, which we call CMAB-SM. Further, the proposed algorithm achieves a \textit{regret bound} of $$\tilde O(K^{\frac{1}{2}}N^{\frac{1}{3}}T^{\frac{2}{3}})$$ for a time horizon $$T$$, which is \textit{sub-linear} in all parameters $$T$$, $$N$$, and $$K$$.
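The choose-$$K$$-of-$$N$$ setup above can be illustrated with the simplest semi-bandit baseline: play the $$K$$ arms with the largest UCB indices, keeping only per-arm statistics so storage is linear in $$N$$. This is a generic linear-feedback sketch for intuition, not the paper's CMAB-SM algorithm (which handles non-linear feedback), and the bandit instance is made up.

```python
import math, random

def topk_ucb(means, T, K, seed=0):
    """Generic semi-bandit baseline: each round, play the K arms with the
    largest UCB indices and observe each chosen arm's Bernoulli reward.
    Storage is O(N): one count and one running sum per arm."""
    rng = random.Random(seed)
    N = len(means)
    counts = [0] * N
    sums = [0.0] * N
    total = 0.0
    for t in range(1, T + 1):
        # Unpulled arms get an infinite index so each arm is tried once.
        ucb = [sums[i] / counts[i] + math.sqrt(2 * math.log(t) / counts[i])
               if counts[i] else float("inf") for i in range(N)]
        chosen = sorted(range(N), key=lambda i: ucb[i], reverse=True)[:K]
        for i in chosen:
            r = 1.0 if rng.random() < means[i] else 0.0
            counts[i] += 1
            sums[i] += r
            total += r
    return total

reward = topk_ucb([0.9, 0.8, 0.3, 0.2, 0.1], T=2000, K=2)
```

With additive feedback the per-arm indices suffice; the point of the abstract is that non-linear feedback breaks this decomposition, which is what CMAB-SM's divide-and-conquer strategy addresses.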
  3. We introduce the E$^4$ algorithm for the batched linear bandit problem, incorporating an Explore-Estimate-Eliminate-Exploit framework. With a proper choice of exploration rate, we prove E$^4$ achieves the finite-time minimax optimal regret with only $$O(\log\log T)$$ batches, and the asymptotically optimal regret with only $$3$$ batches as $$T\rightarrow\infty$$, where $$T$$ is the time horizon. We further prove a lower bound on the batch complexity of linear contextual bandits showing that any asymptotically optimal algorithm must require at least $$3$$ batches in expectation as $$T\rightarrow\infty$$, which indicates E$^4$ achieves the asymptotic optimality in regret and batch complexity simultaneously. To the best of our knowledge, E$^4$ is the first algorithm for linear bandits that simultaneously achieves the minimax and asymptotic optimality in regret with the corresponding optimal batch complexities. In addition, we show that with another choice of exploration rate E$^4$ achieves an instance-dependent regret bound requiring at most $$O(\log T)$$ batches, and maintains the minimax optimality and asymptotic optimality. We conduct thorough experiments to evaluate our algorithm on randomly generated instances and the challenging \textit{End of Optimism} instances \citep{lattimore2017end}, which were shown to be hard to learn for optimism-based algorithms. Empirical results show that E$^4$ consistently outperforms baseline algorithms with respect to regret minimization, batch complexity, and computational efficiency.
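The explore-estimate-eliminate pattern behind E$^4$ can be illustrated, in a much-simplified multi-armed (not linear) setting, by batched successive elimination: pull every surviving arm equally within a batch, then drop arms whose upper confidence bound falls below the leader's lower confidence bound. The batch sizes, confidence radii, and bandit instance below are illustrative assumptions, not the E$^4$ algorithm itself.

```python
import math, random

def batched_elimination(means, batch_sizes, seed=0):
    """Batched successive elimination on Bernoulli arms. Decisions are made
    only at batch boundaries, so the number of policy updates equals the
    number of batches -- the quantity E^4 minimizes in the linear setting."""
    rng = random.Random(seed)
    active = list(range(len(means)))
    counts = [0] * len(means)
    sums = [0.0] * len(means)
    for B in batch_sizes:
        # Explore: pull each surviving arm B times.
        for i in active:
            for _ in range(B):
                sums[i] += 1.0 if rng.random() < means[i] else 0.0
                counts[i] += 1
        # Estimate, then eliminate arms that are confidently suboptimal.
        mu = {i: sums[i] / counts[i] for i in active}
        rad = {i: math.sqrt(math.log(sum(counts)) / counts[i]) for i in active}
        best_lcb = max(mu[i] - rad[i] for i in active)
        active = [i for i in active if mu[i] + rad[i] >= best_lcb]
    return active

# Geometrically growing batches shrink the confidence radii quickly.
survivors = batched_elimination([0.9, 0.5, 0.4, 0.1], batch_sizes=[50, 200, 800])
```

After the final batch the surviving arm(s) would be exploited for the rest of the horizon; E$^4$ adds an estimation phase and a careful exploration rate to reach minimax and asymptotic optimality with as few as $$3$$ batches.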
  4. Ruiz, Francisco and (Ed.)
    We consider the problem of universal dynamic regret minimization under exp-concave and smooth losses. We show that appropriately designed Strongly Adaptive algorithms achieve a dynamic regret of $$\tilde O(d^2 n^{1/5} [\mathcal{TV}_1(w_{1:n})]^{2/5} \vee d^2)$$, where $$n$$ is the time horizon and $$\mathcal{TV}_1(w_{1:n})$$ a path variational based on second order differences of the comparator sequence. Such a path variational naturally encodes comparator sequences that are piece-wise linear – a powerful family that tracks a variety of non-stationarity patterns in practice (Kim et al., 2009). The aforementioned dynamic regret is shown to be optimal modulo dimension dependencies and poly-logarithmic factors of $$n$$. To the best of our knowledge, this path variational has not been studied in the non-stochastic online learning literature before. Our proof techniques rely on analysing the KKT conditions of the offline oracle and require several non-trivial generalizations of the ideas in Baby and Wang (2021), where the latter work only implies an $$\tilde{O}(n^{1/3})$$ regret for the current problem.