-
This paper addresses the problem of model-free optimal output regulation of discrete-time systems, which aims to achieve asymptotic tracking and disturbance rejection without exact knowledge of the system parameters. Insights from reinforcement learning and adaptive dynamic programming are used to solve the problem. An interesting finding is that model-free discrete-time output regulation differs from its continuous-time counterpart in the persistent excitation condition required by the policy iteration: it is shown that this condition must be carefully established to guarantee the uniqueness and convergence of the policy iteration.
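As a concrete (and much simplified) illustration of the kind of data-driven policy iteration the abstract refers to, the following Python sketch applies least-squares Q-function evaluation and policy improvement to a discrete-time LQR problem, with an explicit rank test standing in for the persistent excitation condition. The system matrices, weights, initial gain, and noise level are assumptions for this sketch, not the paper's formulation.

```python
import numpy as np

def svec_basis(n):
    # Index pairs (i, j), i <= j, for the upper triangle of a symmetric n x n matrix.
    return [(i, j) for i in range(n) for j in range(i, n)]

def quad_features(z, pairs):
    # Feature vector such that features @ theta == z.T @ H @ z, where theta stacks
    # the upper-triangular entries of the symmetric matrix H.
    return np.array([z[i] * z[j] * (1.0 if i == j else 2.0) for (i, j) in pairs])

def theta_to_mat(theta, n, pairs):
    H = np.zeros((n, n))
    for t, (i, j) in zip(theta, pairs):
        H[i, j] = H[j, i] = t
    return H

# Illustrative second-order system and weights (assumptions, not from the paper).
A = np.array([[1.0, 0.1], [0.0, 1.0]])
B = np.array([[0.0], [0.1]])
Q, R = np.eye(2), np.eye(1)
n, m = 2, 1

K = np.array([[1.0, 1.5]])            # initial stabilizing gain (assumed known)
pairs = svec_basis(n + m)
rng = np.random.default_rng(0)

for it in range(10):
    # Collect one batch of exploratory data under the current behavior policy.
    Phi, y = [], []
    x = rng.standard_normal(n)
    for k in range(200):
        u = -K @ x + 0.5 * rng.standard_normal(m)   # exploration noise for excitation
        x_next = A @ x + B @ u
        z = np.concatenate([x, u])
        z_next = np.concatenate([x_next, -K @ x_next])
        Phi.append(quad_features(z, pairs) - quad_features(z_next, pairs))
        y.append(x @ Q @ x + u @ R @ u)
        x = x_next
    Phi, y = np.asarray(Phi), np.asarray(y)

    # Persistent excitation check: the regressor must have full column rank,
    # otherwise the least-squares policy evaluation is not unique.
    if np.linalg.matrix_rank(Phi) < Phi.shape[1]:
        raise RuntimeError("data not persistently exciting")

    theta, *_ = np.linalg.lstsq(Phi, y, rcond=None)
    H = theta_to_mat(theta, n + m, pairs)
    Hux, Huu = H[n:, :n], H[n:, n:]
    K = np.linalg.solve(Huu, Hux)                    # policy improvement, u = -K x

print("learned gain:", K)
```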
-
This paper presents a unified approach to learning-based optimal control of connected human-driven and autonomous vehicles in mixed-traffic environments, covering both freeway and ring-road settings. The stabilizability of a string of connected vehicles comprising multiple autonomous vehicles (AVs) and heterogeneous human-driven vehicles (HDVs) is studied via a model reduction technique and the Popov-Belevitch-Hautus (PBH) test. For this setup, a linear quadratic regulator (LQR) problem is formulated and a solution based on adaptive dynamic programming (ADP) techniques is proposed that requires no a priori knowledge of the model parameters. To start the learning process, an initial stabilizing control law is obtained using the small-gain theorem for the ring-road case, and it is shown that this stabilizing control law achieves general Lp string stability under appropriate conditions. In addition, to mitigate the impact of external disturbances, a linear quadratic zero-sum game is introduced and solved by an iterative learning-based algorithm. Finally, simulation results verify the theoretical analysis and show that the proposed methods achieve the desired performance for control of a mixed-vehicle network.
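As a small illustration of the PBH test invoked above, the Python sketch below checks stabilizability of a continuous-time pair (A, B) by verifying that rank([lambda*I - A, B]) = n for every eigenvalue lambda of A with nonnegative real part. The example matrices are placeholders, not the paper's mixed-traffic model.

```python
import numpy as np

def is_stabilizable(A, B, tol=1e-9):
    """PBH test for a continuous-time pair (A, B): every eigenvalue of A with
    nonnegative real part must satisfy rank([lambda*I - A, B]) == n."""
    n = A.shape[0]
    for lam in np.linalg.eigvals(A):
        if lam.real >= -tol:                           # unstable or marginal mode
            M = np.hstack([lam * np.eye(n) - A, B])
            if np.linalg.matrix_rank(M, tol=1e-8) < n:
                return False
    return True

# Placeholder example: a double integrator (not the paper's vehicle string model).
A = np.array([[0.0, 1.0], [0.0, 0.0]])
B = np.array([[0.0], [1.0]])
print(is_stabilizable(A, B))   # True: the marginal mode at lambda = 0 is controllable
```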
-
This paper studies the adaptive optimal control problem for a class of linear time-delay systems described by delay differential equations (DDEs). The key strategy is to take advantage of recent developments in reinforcement learning (RL) and adaptive dynamic programming (ADP) and to develop novel methods for learning adaptive optimal controllers from finite samples of input and state data. A data-driven policy iteration (PI) algorithm is proposed to solve the infinite-dimensional algebraic Riccati equation (ARE) iteratively in the absence of exact model knowledge. Interestingly, the proposed recursive PI algorithm is new in the context of continuous-time time-delay systems even when the model is assumed known. The efficacy of the proposed learning-based control methods is validated through practical applications arising from metal cutting and autonomous driving.
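For readers unfamiliar with the policy-iteration machinery, the sketch below shows Kleinman's classical model-based PI for a delay-free continuous-time LQR problem, i.e. the finite-dimensional special case that such PI schemes for DDEs generalize; the matrices and weights are purely illustrative.

```python
import numpy as np
from scipy.linalg import solve_continuous_lyapunov, solve_continuous_are

# Illustrative delay-free system and weights (assumptions for this sketch).
A = np.array([[0.0, 1.0], [-1.0, -0.5]])
B = np.array([[0.0], [1.0]])
Q, R = np.eye(2), np.eye(1)

K = np.array([[1.0, 1.0]])              # initial stabilizing gain
for _ in range(20):
    Ak = A - B @ K
    # Policy evaluation: solve the Lyapunov equation Ak' P + P Ak + Q + K' R K = 0.
    P = solve_continuous_lyapunov(Ak.T, -(Q + K.T @ R @ K))
    # Policy improvement.
    K = np.linalg.solve(R, B.T @ P)

# The iterates converge to the solution of the algebraic Riccati equation.
print(np.allclose(P, solve_continuous_are(A, B, Q, R), atol=1e-6))
```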
-
This article studies the adaptive optimal stationary control of continuous-time linear stochastic systems with both additive and multiplicative noise, using reinforcement learning techniques. Based on policy iteration, a novel off-policy reinforcement learning algorithm, named optimistic least-squares-based policy iteration, is proposed. Starting from an initial admissible control policy, the algorithm iteratively finds near-optimal policies for the adaptive optimal stationary control problem directly from input/state data, without explicitly identifying any system matrices. Under mild conditions, the solutions given by the proposed optimistic least-squares-based policy iteration are proved to converge to a small neighborhood of the optimal solution with probability one. Application of the proposed algorithm to a triple inverted pendulum example validates its feasibility and effectiveness.
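In generic form, problems of this class can be written as an Itô stochastic differential equation with both multiplicative (state- and control-dependent) and additive noise, together with a long-run average cost; the notation below is only a representative template and may differ from the paper's exact formulation.

```latex
dx(t) = \big(Ax(t) + Bu(t)\big)\,dt
        + \sum_{i=1}^{q}\big(C_i x(t) + D_i u(t) + E_i\big)\,dw_i(t),
\qquad
J = \limsup_{T\to\infty}\frac{1}{T}\,
    \mathbb{E}\int_{0}^{T}\big(x^{\top}Qx + u^{\top}Ru\big)\,dt .
```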
-
This article proposes a deep learning (DL)-based control algorithm, DL velocity-based model predictive control (VMPC), for reducing traffic congestion with slowly time-varying traffic signal controls. The algorithm consists of system identification using DL and traffic signal control using VMPC. For the DL training process, a modeling-error entropy loss is established as the training criterion, inspired by the theory of stochastic distribution control (SDC) originated by the fourth author. Simulation results show that the proposed algorithm can reduce traffic congestion with a slowly varying traffic signal control input. An ablation study demonstrates that the algorithm compares favorably to other model-based controllers in terms of prediction error, signal variation speed, and control effectiveness.
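The abstract does not spell out the exact form of the modeling-error entropy loss, so the following Python sketch should be read only as one plausible realization: it estimates the differential entropy of the one-step prediction residuals with a Gaussian kernel density estimate and uses that estimate as a training loss. The function name, bandwidth, and KDE choice are all assumptions.

```python
import numpy as np

def entropy_loss(residuals, bandwidth=0.1):
    """Monte-Carlo estimate of the differential entropy of scalar modeling
    errors, H(e) = -E[log p(e)], with p estimated by a Gaussian KDE.
    Minimizing this loss drives the error distribution toward a narrow peak."""
    e = np.asarray(residuals, dtype=float).ravel()
    # Pairwise Gaussian kernels: p(e_i) ~ (1/n) * sum_j N(e_i; e_j, bandwidth^2)
    diff = e[:, None] - e[None, :]
    kernels = np.exp(-0.5 * (diff / bandwidth) ** 2) / (bandwidth * np.sqrt(2 * np.pi))
    p = kernels.mean(axis=1)
    return -np.mean(np.log(p + 1e-12))

# Illustrative use: residuals between measured and DL-predicted traffic states.
rng = np.random.default_rng(1)
wide = entropy_loss(rng.normal(0.0, 1.0, 500))    # broad error distribution
narrow = entropy_loss(rng.normal(0.0, 0.1, 500))  # concentrated error distribution
print(wide > narrow)                              # True: lower entropy is better
```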
-
This paper studies data-driven optimal control design for the traffic signals of oversaturated urban road networks. The signal control system based on the store-and-forward model is generally uncontrollable, so a controllable decomposition is needed. Instead of identifying unknown parameters such as saturation rates and turning ratios, a finite number of measured trajectories is used to parametrize the system and to directly construct a transformation matrix for the Kalman controllable decomposition via the fundamental lemma of J. C. Willems. Building on this, an infinite-horizon linear quadratic regulator (LQR) problem is formulated that accounts for the green-time constraints of the traffic signals. The problem is solved through a two-phase data-driven learning process, in which one phase solves an infinite-horizon unconstrained LQR problem and the other solves a finite-horizon constrained LQR problem. Simulation results show that the theoretical analysis is effective and that the proposed data-driven controller yields the desired performance for reducing traffic congestion.
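The fundamental lemma says, roughly, that if a single measured input trajectory of a controllable LTI system is persistently exciting of sufficiently high order, then every length-L input/state trajectory of the system lies in the column span of the block-Hankel matrices built from that one trajectory. The Python sketch below checks this numerically on a made-up controllable system rather than the store-and-forward traffic model, so the matrices and lengths are assumptions.

```python
import numpy as np

def hankel(w, L):
    """Depth-L block-Hankel matrix built from a signal w of shape (T, d)."""
    T, d = w.shape
    return np.column_stack([w[t:t + L].ravel() for t in range(T - L + 1)])

# Toy controllable system (an assumption for this sketch, not the traffic model).
A = np.array([[0.9, 0.2], [0.0, 0.8]])
B = np.array([[0.0], [1.0]])
rng = np.random.default_rng(0)

def rollout(u, x0):
    x = [x0]
    for uk in u[:-1]:
        x.append(A @ x[-1] + B @ uk)
    return np.array(x)

# One persistently exciting data trajectory.
T, L = 60, 5
u_d = rng.standard_normal((T, 1))
x_d = rollout(u_d, rng.standard_normal(2))
H = np.vstack([hankel(u_d, L), hankel(x_d, L)])

# A fresh length-L trajectory of the same system...
u_new = rng.standard_normal((L, 1))
x_new = rollout(u_new, rng.standard_normal(2))
w_new = np.concatenate([u_new.ravel(), x_new.ravel()])

# ...lies (numerically) in the column span of the Hankel data, per the fundamental lemma.
g, *_ = np.linalg.lstsq(H, w_new, rcond=None)
print(np.allclose(H @ g, w_new, atol=1e-8))   # expected: True
```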
-
Noise is ubiquitous in sensorimotor interactions and contaminates the information provided to the central nervous system (CNS) for motor learning. An interesting question is how the CNS manages motor learning with such imprecise information. Integrating ideas from reinforcement learning and adaptive optimal control, this paper develops a novel computational mechanism to explain the robustness of human motor learning to imprecise information caused by the control-dependent noise that exists inherently in sensorimotor systems. Starting from an initial admissible control policy, in each learning trial the mechanism collects noisy sensory data (corrupted by control-dependent noise) to form an imprecise evaluation of the current policy's performance and then constructs an updated policy based on this evaluation. As the number of learning trials increases, the generated policies provably converge to a (potentially small) neighborhood of the optimal policy under mild conditions, despite the imprecise information used in the learning process. The mechanism synthesizes the policies directly from the sensory data, without identifying an internal forward model. Preliminary computational results on two classic arm-reaching tasks are in line with experimental observations reported in the literature. The model-free control principle proposed in this paper sheds more light on the inherent robustness of human sensorimotor systems to imprecise information, especially control-dependent noise, in the CNS.
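To make the notion of control-dependent noise concrete, the Python sketch below simulates a discrete-time linear system in which the noise intensity scales with the control signal, and forms the kind of noisy, trial-based cost evaluation of a fixed policy that the mechanism above relies on. The dynamics, the noise scaling, the cost weights, and the policy are placeholders, not the paper's arm model.

```python
import numpy as np

# Placeholder arm-like dynamics and quadratic cost (not the paper's model).
A = np.array([[1.0, 0.05], [0.0, 1.0]])
B = np.array([[0.0], [0.05]])
Q, R = np.eye(2), 0.1 * np.eye(1)
sigma = 0.3                      # control-dependent noise intensity (assumed)
K = np.array([[4.0, 3.0]])       # fixed admissible policy under evaluation

def noisy_trial(K, x0, steps, rng):
    """One learning trial: rollout under u = -Kx with control-dependent noise,
    returning the (noisy) accumulated cost observed by the learner."""
    x, cost = x0.copy(), 0.0
    for _ in range(steps):
        u = -K @ x
        cost += x @ Q @ x + u @ R @ u
        # Noise whose standard deviation grows with |u| (signal-dependent noise).
        w = sigma * np.abs(u) * rng.standard_normal(u.shape)
        x = A @ x + B @ (u + w)
    return cost

rng = np.random.default_rng(2)
costs = [noisy_trial(K, np.array([1.0, 0.0]), 100, rng) for _ in range(20)]
# Each per-trial evaluation is imprecise, but together they still rank policies usefully.
print(np.mean(costs), np.std(costs))
```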
-
This paper presents a data-driven algorithm for the infinite-horizon linear quadratic regulation (LQR) problem for a class of discrete-time linear time-invariant systems subject to state and control constraints. The problem is divided into a constrained finite-horizon LQR subproblem and an unconstrained infinite-horizon LQR subproblem, each of which can be solved separately and directly from collected input/state data. Under certain conditions, the combination of the solutions of the two subproblems converges to the optimal solution of the original problem. The effectiveness of the proposed approach is validated by a numerical example.
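The division into subproblems can be pictured as follows: the unconstrained infinite-horizon LQR yields a terminal cost matrix P, and the constrained finite-horizon problem then uses x_N' P x_N as its terminal penalty. The Python sketch below does this with known model matrices and cvxpy purely for illustration; the paper's algorithm performs both phases directly from input/state data, and the system, weights, horizon, and bounds here are assumptions.

```python
import numpy as np
import cvxpy as cp
from scipy.linalg import solve_discrete_are

# Illustrative system, weights, horizon, and constraints (assumptions).
A = np.array([[1.0, 0.1], [0.0, 1.0]])
B = np.array([[0.005], [0.1]])
Q, R = np.eye(2), np.eye(1)
N, u_max, x_max = 20, 1.0, 5.0
x0 = np.array([4.0, 0.0])

# Phase 1: unconstrained infinite-horizon LQR -> terminal cost matrix P.
P = solve_discrete_are(A, B, Q, R)
P = 0.5 * (P + P.T)                      # symmetrize against round-off

# Phase 2: finite-horizon constrained LQR with x_N' P x_N as terminal penalty.
x = cp.Variable((2, N + 1))
u = cp.Variable((1, N))
cost = cp.quad_form(x[:, N], P)
constraints = [x[:, 0] == x0]
for k in range(N):
    cost += cp.quad_form(x[:, k], Q) + cp.quad_form(u[:, k], R)
    constraints += [x[:, k + 1] == A @ x[:, k] + B @ u[:, k],
                    cp.abs(u[:, k]) <= u_max,
                    cp.abs(x[:, k]) <= x_max]
cp.Problem(cp.Minimize(cost), constraints).solve()
print("first control move:", u.value[:, 0])
```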