This paper studies data-driven optimal control design for the traffic signals of oversaturated urban road networks. The signal control system based on the store-and-forward model is generally uncontrollable, so a controllable decomposition is needed. Instead of identifying unknown parameters such as saturation rates and turning ratios, a finite number of measured trajectories is used to parametrize the system and to directly construct a transformation matrix for the Kalman controllable decomposition via the fundamental lemma of J. C. Willems. On top of that, an infinite-horizon linear quadratic regulator (LQR) problem is formulated that accounts for the green-time constraints of the traffic signals. The problem is solved through a two-phase data-driven learning process, in which one phase solves an infinite-horizon unconstrained LQR problem and the other solves a finite-horizon constrained LQR problem. Simulation results show that the theoretical analysis is effective and that the proposed data-driven controller yields the desired performance in reducing traffic congestion.
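The data-based construction described above can be illustrated with a short sketch: from one persistently exciting input/state trajectory, the controllability matrix, and hence a transformation for the Kalman controllable decomposition, can be built without ever forming the system matrices as an explicit model. This is a minimal illustrative sketch, not the paper's algorithm; the system used to generate synthetic data, the data length, and the tolerances are assumptions.

```python
import numpy as np

# Minimal sketch: build the Kalman controllable decomposition directly from
# measured input/state data. A_true and B_true are hypothetical and used only
# to generate synthetic data; the construction below never forms A or B as
# explicit model matrices.

rng = np.random.default_rng(0)
n, m, T = 4, 2, 40                          # states, inputs, data length (assumed)
A_true = np.zeros((n, n))
A_true[:2, :2] = 0.4 * rng.standard_normal((2, 2))   # controllable block
A_true[:2, 2:] = 0.4 * rng.standard_normal((2, 2))
A_true[2:, 2:] = 0.4 * rng.standard_normal((2, 2))   # autonomous (uncontrollable) block
B_true = np.zeros((n, m))
B_true[:2, :] = rng.standard_normal((2, m))          # inputs only reach the first block

# One measured trajectory with a persistently exciting input
U0 = rng.standard_normal((m, T))
X = np.zeros((n, T + 1))
X[:, 0] = rng.standard_normal(n)
for t in range(T):
    X[:, t + 1] = A_true @ X[:, t] + B_true @ U0[:, t]
X0, X1 = X[:, :T], X[:, 1:]

D = np.vstack([X0, U0])                     # stacked data matrix [X0; U0]
print("rank of [X0; U0]:", np.linalg.matrix_rank(D), "(should equal n + m =", n + m, ")")

def data_product(Ms, Mu):
    """Return A @ Ms + B @ Mu using only the measured data (no explicit A, B)."""
    G = np.linalg.lstsq(D, np.vstack([Ms, Mu]), rcond=None)[0]
    return X1 @ G

# Controllability matrix [B, AB, ..., A^{n-1}B], built block by block from data
blocks = [data_product(np.zeros((n, m)), np.eye(m))]            # = B
for _ in range(n - 1):
    blocks.append(data_product(blocks[-1], np.zeros((m, m))))   # = A @ previous block
C = np.hstack(blocks)

# Orthonormal basis of the reachable subspace gives the decomposition transform
Uc, s, _ = np.linalg.svd(C)
r = int(np.sum(s > 1e-8 * s[0]))            # dimension of the controllable part
T_kalman = Uc                               # columns: [controllable basis | complement]
print("controllable subspace dimension:", r)
```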
A Data-driven Approach for Constrained Infinite-Horizon Linear Quadratic Regulation
This paper presents a data-driven algorithm for the problem of infinite-horizon linear quadratic regulation (LQR) for a class of discrete-time linear time-invariant systems subject to state and control constraints. The problem is divided into a constrained finite-horizon LQR subproblem and an unconstrained infinite-horizon LQR subproblem, each of which can be solved directly from collected input/state data. Under certain conditions, the combination of the solutions of the subproblems converges to the optimal solution of the original problem. The effectiveness of the proposed approach is validated by a numerical example.
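As a rough illustration of the split described above, the sketch below stitches a constrained finite-horizon LQR head to an unconstrained infinite-horizon tail whose cost-to-go comes from the discrete algebraic Riccati equation. It is a model-based sketch for illustration only: the matrices A, B, Q, R, the horizon N, and the box constraints are assumptions, whereas the paper obtains the corresponding quantities directly from collected input/state data.

```python
import numpy as np
import cvxpy as cp
from scipy.linalg import solve_discrete_are

# Constrained finite-horizon head + unconstrained infinite-horizon tail.
# All numerical choices below are illustrative assumptions.
A = np.array([[1.0, 0.1],
              [0.0, 1.0]])
B = np.array([[0.0],
              [0.1]])
Q = np.eye(2)
R = np.array([[0.1]])
N = 20                                    # switching horizon (assumed)
x0 = np.array([2.0, 0.0])
u_max, x_max = 1.0, 3.0                   # assumed input and state bounds

# Tail: unconstrained infinite-horizon LQR value function x'Px
P = solve_discrete_are(A, B, Q, R)
L = np.linalg.cholesky(P)                 # so that x'Px = ||L.T @ x||^2

# Head: constrained finite-horizon LQR with the tail cost as terminal cost
x = cp.Variable((2, N + 1))
u = cp.Variable((1, N))
cost = cp.sum_squares(L.T @ x[:, N])      # terminal cost x_N' P x_N
constraints = [x[:, 0] == x0]
for k in range(N):
    cost += cp.quad_form(x[:, k], Q) + cp.quad_form(u[:, k], R)
    constraints += [x[:, k + 1] == A @ x[:, k] + B @ u[:, k],
                    cp.abs(u[:, k]) <= u_max,
                    cp.abs(x[:, k + 1]) <= x_max]
prob = cp.Problem(cp.Minimize(cost), constraints)
prob.solve()
print("combined head + tail cost:", prob.value)
```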
- Award ID(s):
- 1903781
- PAR ID:
- 10268424
- Date Published:
- Journal Name:
- Proc. 59th IEEE Conference on Decision and Control
- Page Range / eLocation ID:
- 6010 to 6015
- Format(s):
- Medium: X
- Sponsoring Org:
- National Science Foundation
More Like this
-
Common reinforcement learning methods seek optimal controllers for unknown dynamical systems by searching the "policy" space directly. A recent line of research, starting with [1], aims to provide theoretical guarantees for such direct policy-update methods by exploring their performance in classical control settings, such as the infinite-horizon linear quadratic regulator (LQR) problem. A key property these analyses rely on is that the LQR cost function satisfies the "gradient dominance" property with respect to the policy parameters. Gradient dominance helps guarantee that the optimal controller can be found by running gradient-based algorithms on the LQR cost. The gradient dominance property has so far been verified on a case-by-case basis for several control problems, including continuous/discrete-time LQR, LQR with decentralized controllers, and H2/H∞ robust control. In this paper, we make a connection between this line of work and classical convex parameterizations based on linear matrix inequalities (LMIs). Using this, we propose a unified framework for showing that gradient dominance indeed holds for a broad class of control problems, such as continuous- and discrete-time LQR, minimizing the L2 gain, and problems using system-level parameterization. Our unified framework provides insight into the landscape of the cost function as a function of the policy, and enables extending convergence results for policy gradient descent to a much larger class of problems.
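For concreteness, the following sketch runs exact policy gradient descent on a small infinite-horizon LQR instance, the setting in which gradient dominance guarantees convergence to the global optimum. The system, the initial stabilizing gain, and the step size are illustrative assumptions, not taken from the paper above.

```python
import numpy as np

# Policy gradient descent on the LQR cost C(K) for u_t = -K x_t, using the
# exact gradient  grad C(K) = 2 [ (R + B' P_K B) K - B' P_K A ] Sigma_K,
# where P_K and Sigma_K solve discrete Lyapunov equations.
A = np.array([[1.0, 0.2], [0.0, 0.9]])
B = np.array([[0.0], [1.0]])
Q, R = np.eye(2), np.eye(1)
Sigma0 = np.eye(2)                                  # covariance of the random initial state

def lyap(F, W, iters=500):
    """Solve X = W + F' X F by fixed-point iteration (F assumed stable)."""
    X = W.copy()
    for _ in range(iters):
        X = W + F.T @ X @ F
    return X

def cost_and_grad(K):
    F = A - B @ K
    P = lyap(F, Q + K.T @ R @ K)                    # value matrix P_K
    S = lyap(F.T, Sigma0)                           # state correlation Sigma_K
    grad = 2 * ((R + B.T @ P @ B) @ K - B.T @ P @ A) @ S
    return np.trace(P @ Sigma0), grad

K = np.array([[0.5, 0.5]])                          # assumed stabilizing initial gain
for it in range(200):
    c, g = cost_and_grad(K)
    K = K - 1e-2 * g                                # fixed step size (assumed)
print("final cost:", cost_and_grad(K)[0], "gain K:", K)
```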
-
We consider the problem of controlling a Linear Quadratic Regulator (LQR) system over a finite horizon T with fixed and known cost matrices Q, R, but unknown and non-stationary dynamics A_t, B_t. The sequence of dynamics matrices can be arbitrary, but with a total variation V_T assumed to be o(T) and unknown to the controller. Under the assumption that a sequence of stabilizing, but potentially sub-optimal, controllers is available for all t, we present an algorithm that achieves the optimal dynamic regret of O(V_T^{2/5} T^{3/5}). With piecewise-constant dynamics, our algorithm achieves the optimal regret of O(sqrt(ST)), where S is the number of switches. The crux of our algorithm is an adaptive non-stationarity detection strategy, which builds on an approach recently developed for contextual multi-armed bandit problems. We also argue that non-adaptive forgetting (e.g., restarting or using sliding-window learning with a static window size) may not be regret optimal for the LQR problem, even when the window size is optimally tuned with knowledge of V_T. The main technical challenge in the analysis of our algorithm is to prove that the ordinary least squares (OLS) estimator has a small bias when the parameter to be estimated is non-stationary. Our analysis also highlights that the key motif driving the regret is that the LQR problem is in spirit a bandit problem with linear feedback and locally quadratic cost. This motif is more universal than the LQR problem itself, and therefore we believe our results should find wider application.
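As a small illustration of the estimation step such algorithms build on, the sketch below forms the ordinary least squares estimate of slowly drifting dynamics (A_t, B_t) from a sliding window of recent transitions. The drift model, window length, and noise level are illustrative assumptions; the contribution of the paper above lies in when and how to adapt to non-stationarity, not in the OLS step itself.

```python
import numpy as np

# OLS estimate of [A B] from a sliding window of transitions x_{t+1} ~ [A B][x_t; u_t].
rng = np.random.default_rng(1)
n, m, T, W = 2, 1, 300, 60                  # dims, trajectory length, window (assumed)

def dynamics(t):
    """Hypothetical slowly drifting system used only to generate data (B held constant)."""
    a = 0.8 + 0.1 * np.sin(2 * np.pi * t / T)
    return np.array([[a, 0.1], [0.0, 0.7]]), np.array([[0.0], [1.0]])

x = np.zeros(n)
rows_z, rows_y = [], []                     # regressors [x_t; u_t] and targets x_{t+1}
for t in range(T):
    A_t, B_t = dynamics(t)
    u = rng.standard_normal(m)              # exciting input
    x_next = A_t @ x + B_t @ u + 0.01 * rng.standard_normal(n)
    rows_z.append(np.concatenate([x, u]))
    rows_y.append(x_next)
    x = x_next

Z = np.array(rows_z[-W:])                   # (W, n+m): last W regressors
Y = np.array(rows_y[-W:])                   # (W, n):   last W targets
Theta = np.linalg.lstsq(Z, Y, rcond=None)[0].T   # OLS estimate of [A B] at the window's end
print("estimated [A B]:\n", np.round(Theta, 3))
print("true A at t = T-1:\n", dynamics(T - 1)[0])
```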
-
This paper investigates the problem of persistent monitoring, where a finite set of mobile agents persistently visits a finite set of targets in a multi-dimensional environment. The agents must estimate the targets' internal states, and the goal is to minimize the mean squared estimation error over time. The internal states of the targets evolve with linear stochastic dynamics, so the optimal estimator is a Kalman-Bucy filter. We constrain the trajectories of the agents to be periodic and represented by a truncated Fourier series. Taking advantage of the periodic nature of this solution, we define the infinite-horizon version of the problem and exploit the property that the mean squared estimation error converges to a limit cycle. We present a technique to compute online the gradient of the steady-state mean squared estimation error of the targets' states with respect to the parameters defining the trajectories, and use a gradient descent scheme to obtain locally optimal movement schedules. This scheme allows us to address the infinite-horizon problem with only a small number of parameters to be optimized.
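The sketch below illustrates only the trajectory parametrization from the abstract above: a periodic 2-D agent path written as a truncated Fourier series and optimized by gradient descent over its coefficients. The cost used here is a simple surrogate (time-averaged squared distance to the nearest of a few fixed targets) standing in for the steady-state mean squared estimation error, and the gradient is taken by finite differences rather than the paper's online analytic computation; all numerical choices are assumptions.

```python
import numpy as np

# Periodic trajectory as a truncated Fourier series, optimized by gradient descent.
period, H, S = 10.0, 3, 200                 # period, harmonics, time samples (assumed)
targets = np.array([[1.0, 0.0], [-1.0, 0.5], [0.0, -1.0]])   # assumed target positions
ts = np.linspace(0.0, period, S, endpoint=False)

def trajectory(theta):
    """theta has shape (2, 2H+1): constant plus cos/sin coefficients per coordinate."""
    basis = [np.ones_like(ts)]
    for k in range(1, H + 1):
        basis += [np.cos(2 * np.pi * k * ts / period), np.sin(2 * np.pi * k * ts / period)]
    return theta @ np.array(basis)          # (2, S) positions over one period

def cost(theta):
    """Surrogate cost: squared distance to the nearest target, averaged over one period."""
    pos = trajectory(theta)
    d2 = ((pos.T[:, None, :] - targets[None, :, :]) ** 2).sum(axis=2)
    return d2.min(axis=1).mean()

theta = 0.1 * np.random.default_rng(2).standard_normal((2, 2 * H + 1))
eps, step = 1e-5, 0.2
for it in range(300):                        # finite-difference gradient descent
    g = np.zeros_like(theta)
    for idx in np.ndindex(theta.shape):
        e = np.zeros_like(theta); e[idx] = eps
        g[idx] = (cost(theta + e) - cost(theta - e)) / (2 * eps)
    theta -= step * g
print("surrogate cost after optimization:", cost(theta))
```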
-
Recent empirical evidence suggests that the Weston-Watkins support vector machine is among the best-performing multiclass extensions of the binary SVM. Current state-of-the-art solvers repeatedly solve a particular subproblem approximately using an iterative strategy. In this work, we propose an algorithm that solves the subproblem exactly using a novel reparametrization of the Weston-Watkins dual problem. For linear WW-SVMs, our solver shows a significant speed-up over the state-of-the-art solver when the number of classes is large. Our exact subproblem solver also allows us to prove linear convergence of the overall solver.