Title: Data-Driven Control of Markov Jump Systems: Sample Complexity and Regret Bounds
Learning how to effectively control unknown dynamical systems from data is crucial for intelligent autonomous systems. This task becomes significantly more challenging when the underlying dynamics change over time. Motivated by this challenge, this paper considers the problem of controlling an unknown Markov jump linear system (MJS) to optimize a quadratic objective in a data-driven way. Taking a model-based perspective, we consider identification-based adaptive control for MJS. We first provide a system identification algorithm for MJS that learns the dynamics in each mode, as well as the Markov transition matrix governing the mode switches, from a single trajectory of the system states, inputs, and modes. Through mixing-time arguments, the estimation error of this algorithm is shown to decay as O(1/√T), with T the trajectory length. We then propose an adaptive control scheme that performs system identification together with certainty equivalent control to adapt the controllers in an episodic fashion. Combining our sample complexity results with recent perturbation results for certainty equivalent control, we prove that when the episode lengths are appropriately chosen, the proposed adaptive control scheme achieves O(√T) regret. Our proof strategy introduces innovations to handle Markovian jumps and a weaker notion of stability common in MJSs. Our analysis provides insights into system-theoretic quantities that affect learning accuracy and control performance. Numerical simulations are presented to further reinforce these insights.
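The identification step described above can be illustrated with a minimal NumPy sketch. It assumes the model x_{t+1} = A_{ω_t} x_t + B_{ω_t} u_t + w_t with an observed mode sequence ω_t, estimates each mode's (A_i, B_i) by ordinary least squares on the samples collected while in that mode, and estimates the transition matrix from normalized transition counts. The function names `simulate_mjs` and `estimate_mjs`, the i.i.d. Gaussian exploration inputs, the absence of stabilizing feedback or subsampling, and the example matrices are all illustrative assumptions; this is a simplified sketch, not the paper's algorithm verbatim.

```python
import numpy as np


def simulate_mjs(A, B, P_trans, x0, T, rng, sigma_u=1.0, sigma_w=0.1):
    """Simulate x_{t+1} = A[m_t] x_t + B[m_t] u_t + w_t for T steps.

    Assumption: inputs are i.i.d. Gaussian exploration noise (open loop),
    so each mode's A[i] should be stable for the trajectory to stay bounded.
    Returns states (T+1, n), inputs (T, p), and modes (T,).
    """
    num_modes = len(A)
    n, p = B[0].shape
    xs = np.zeros((T + 1, n))
    us = sigma_u * rng.standard_normal((T, p))
    modes = np.zeros(T, dtype=int)
    xs[0] = x0
    m = 0  # assumed initial mode
    for t in range(T):
        modes[t] = m
        w = sigma_w * rng.standard_normal(n)
        xs[t + 1] = A[m] @ xs[t] + B[m] @ us[t] + w
        m = rng.choice(num_modes, p=P_trans[m])  # Markov mode switch
    return xs, us, modes


def estimate_mjs(xs, us, modes, num_modes):
    """Per-mode least squares for (A_i, B_i) plus an empirical transition matrix."""
    n = xs.shape[1]
    A_hat, B_hat = [], []
    for i in range(num_modes):
        idx = np.flatnonzero(modes == i)   # time steps spent in mode i
        Z = np.hstack([xs[idx], us[idx]])  # regressors [x_t, u_t]
        Y = xs[idx + 1]                    # targets x_{t+1}
        # Solve Z @ Theta ~= Y; Theta stacks [A_i^T; B_i^T].
        Theta, *_ = np.linalg.lstsq(Z, Y, rcond=None)
        A_hat.append(Theta[:n].T)
        B_hat.append(Theta[n:].T)
    # Count observed transitions i -> j, then normalize each row.
    counts = np.zeros((num_modes, num_modes))
    np.add.at(counts, (modes[:-1], modes[1:]), 1)
    row_sums = counts.sum(axis=1, keepdims=True)
    P_hat = counts / np.maximum(row_sums, 1)  # guard against unvisited modes
    return A_hat, B_hat, P_hat


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    A = [np.array([[0.5, 0.1], [0.0, 0.4]]), np.array([[0.2, -0.3], [0.1, 0.6]])]
    B = [np.array([[1.0], [0.5]]), np.array([[0.0], [1.0]])]
    P_trans = np.array([[0.9, 0.1], [0.2, 0.8]])
    xs, us, modes = simulate_mjs(A, B, P_trans, np.zeros(2), 20_000, rng)
    A_hat, B_hat, P_hat = estimate_mjs(xs, us, modes, num_modes=2)
    for i in range(2):
        print(f"mode {i}: max |A_hat - A| = {np.abs(A_hat[i] - A[i]).max():.3f}")
    print("P_hat =\n", np.round(P_hat, 3))
```

Lengthening the trajectory should shrink the per-mode errors, in the spirit of the O(1/√T) rate stated above; the sketch does not attempt to reproduce the paper's constants or its mixing-time analysis.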
Award ID(s): 1931982, 1845076
NSF-PAR ID: 10387222
Author(s) / Creator(s): ; ; ; ; ;
Date Published:
Journal Name: 2022 American Control Conference
Page Range / eLocation ID: 4901 to 4908
Format(s): Medium: X
Sponsoring Org: National Science Foundation
More Like this
  1. Real-world control applications often involve complex dynamics subject to abrupt changes or variations. Markov jump linear systems (MJS) provide a rich framework for modeling such dynamics. Despite an extensive history, theoretical understanding of parameter sensitivities of MJS control is somewhat lacking. Motivated by this, we investigate robustness aspects of certainty equivalent model-based optimal control for MJS with a quadratic cost function. Given that the uncertainties in the system matrices and in the Markov transition matrix are bounded by ε and η, respectively, robustness results are established for (i) the solution to coupled Riccati equations and (ii) the optimal cost, by providing explicit perturbation bounds that decay as O(ε+η) and O((ε+η)²), respectively.
  2. We propose a learning-based robust predictive control algorithm that compensates for significant uncertainty in the dynamics for a class of discrete-time systems that are nominally linear with an additive nonlinear component. Such systems commonly model the nonlinear effects of an unknown environment on a nominal system. We optimize over a class of nonlinear feedback policies inspired by certainty equivalent "estimate-and-cancel" control laws pioneered in classical adaptive control to achieve significant performance improvements in the presence of uncertainties of large magnitude, a setting in which existing learning-based predictive control algorithms often struggle to guarantee safety. In contrast to previous work in robust adaptive MPC, our approach allows us to take advantage of structure (i.e., the numerical predictions) in the a priori unknown dynamics learned online through function approximation. Our approach also extends typical nonlinear adaptive control methods to systems with state and input constraints even when we cannot directly cancel the additive uncertain function from the dynamics. Moreover, we apply contemporary statistical estimation techniques to certify the system’s safety through persistent constraint satisfaction with high probability. Finally, we show in simulation that our method can accommodate more significant unknown dynamics terms than existing methods. 
  3. We consider the problem of controlling a Linear Quadratic Regulator (LQR) system over a finite horizon T with fixed and known cost matrices Q, R, but unknown and non-stationary dynamics A_t, B_t. The sequence of dynamics matrices can be arbitrary, but with a total variation V_T assumed to be o(T) and unknown to the controller. Under the assumption that a sequence of stabilizing, but potentially sub-optimal, controllers is available for all t, we present an algorithm that achieves the optimal dynamic regret of O(V_T^{2/5} T^{3/5}). With piecewise constant dynamics, our algorithm achieves the optimal regret of O(√(ST)), where S is the number of switches. The crux of our algorithm is an adaptive non-stationarity detection strategy, which builds on an approach recently developed for contextual multi-armed bandit problems. We also argue that non-adaptive forgetting (e.g., restarting or sliding-window learning with a static window size) may not be regret optimal for the LQR problem, even when the window size is optimally tuned with knowledge of V_T. The main technical challenge in the analysis of our algorithm is to prove that the ordinary least squares (OLS) estimator has a small bias when the parameter to be estimated is non-stationary. Our analysis also highlights that the key motif driving the regret is that the LQR problem is in spirit a bandit problem with linear feedback and locally quadratic cost. This motif is more universal than the LQR problem itself, and therefore we believe our results should find wider application.
  4. While Markov jump systems (MJSs) are more appropriate than LTI systems for modeling abruptly changing dynamics, MJSs (and other switched systems) may suffer from the model complexity brought by a potentially large number of switching modes. Much of the existing work on reducing switched systems focuses on the state space, where techniques such as discretization and dimension reduction are performed, whereas reducing mode complexity has received little attention. In this work, inspired by clustering techniques from unsupervised learning, we propose a reduction method for MJS such that a mode-reduced MJS can be constructed with guaranteed approximation performance. Furthermore, we show how this reduced MJS can be used in designing controllers for the original MJS to reduce the computation cost while maintaining guaranteed suboptimality. Keywords: Markov Jump Systems, System Reduction, Clustering