- Award ID(s): 2200692
- PAR ID: 10493719
- Publisher / Repository: IEEE
- Date Published:
- Journal Name: 2023 62nd IEEE Conference on Decision and Control (CDC)
- ISSN: 2576-2370
- ISBN: 979-8-3503-0124-3
- Page Range / eLocation ID: 1334 to 1341
- Format(s): Medium: X
- Location: Singapore, Singapore
- Sponsoring Org: National Science Foundation
More Like this
-
Dealing with high variance is a significant challenge in model-free reinforcement learning (RL). Existing methods are unreliable, exhibiting high variance in performance from run to run using different initializations/seeds. Focusing on problems arising in continuous control, we propose a functional regularization approach to augmenting model-free RL. In particular, we regularize the behavior of the deep policy to be similar to a control prior, i.e., we regularize in function space. We show that functional regularization yields a bias-variance trade-off, and propose an adaptive tuning strategy to optimize this trade-off. When the prior policy has control-theoretic stability guarantees, we further show that this regularization approximately preserves those stability guarantees throughout learning. We validate our approach empirically on a wide range of settings, and demonstrate significantly reduced variance, guaranteed dynamic stability, and more efficient learning than deep RL alone.
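  A minimal sketch of the functional-regularization idea described above, assuming a simple action-space blend between a learned actor and a stabilizing control prior (e.g., an LQR controller); the class and the adaptation rule are illustrative assumptions, not the paper's exact algorithm:

  ```python
  import numpy as np

  class RegularizedPolicy:
      """Hypothetical sketch: blend a deep RL actor with a control prior."""

      def __init__(self, learned_policy, control_prior, lam=0.5):
          self.learned_policy = learned_policy  # e.g., a deep RL actor
          self.control_prior = control_prior    # e.g., u = -K x from LQR
          self.lam = lam                        # regularization weight

      def act(self, state):
          u_rl = self.learned_policy(state)
          u_prior = self.control_prior(state)
          # Regularize in function space: pull the executed action toward
          # the prior; larger lam biases toward the prior (lower variance).
          return (u_rl + self.lam * u_prior) / (1.0 + self.lam)

      def adapt(self, td_error_var, target_var):
          # One plausible adaptive rule (an assumption, not the paper's):
          # lean on the prior when value estimates are noisy, relax otherwise.
          self.lam *= 1.05 if td_error_var > target_var else 0.95
          self.lam = float(np.clip(self.lam, 0.0, 10.0))
  ```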
-
With the rapid advance of information technology, network systems have become increasingly complex and hence the underlying system dynamics are often unknown or difficult to characterize. Finding a good network control policy is of significant importance to achieve desirable network performance (e.g., high throughput or low delay). In this work, we consider using model-based reinforcement learning (RL) to learn the optimal control policy for queueing networks so that the average job delay (or equivalently the average queue backlog) is minimized. Traditional approaches in RL, however, cannot handle the unbounded state spaces of the network control problem. To overcome this difficulty, we propose a new algorithm, called RL for Queueing Networks (RL-QN), which applies model-based RL methods over a finite subset of the state space while applying a known stabilizing policy for the rest of the states. We establish that the average queue backlog under RL-QN with an appropriately constructed subset can be arbitrarily close to the optimal result. We evaluate RL-QN in dynamic server allocation, routing, and switching problems. Simulation results show that RL-QN minimizes the average queue backlog effectively.
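  As a rough illustration of the RL-QN structure, the sketch below switches between a learned policy inside a bounded region of queue lengths and a known stabilizing fallback; MaxWeight is used here as an example fallback, and the threshold and names are assumptions, not the paper's construction:

  ```python
  import numpy as np

  QUEUE_BOUND = 50  # finite subset: all queue lengths below this value

  def maxweight_action(queues, service_rates):
      # Known stabilizing policy: serve the queue with the largest
      # backlog-weighted service rate.
      return int(np.argmax(queues * service_rates))

  def rl_qn_action(queues, service_rates, learned_policy):
      if np.all(queues < QUEUE_BOUND):
          # Inside the finite subset: use the model-based RL policy.
          return learned_policy(queues)
      # Outside the subset: fall back to the stabilizing policy so the
      # average backlog stays bounded while the model is still learned.
      return maxweight_action(queues, service_rates)
  ```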
-
Constrained Reinforcement Learning for Fair and Environmentally Efficient Traffic Signal Controllers
Traffic signal controllers (TSCs) play a crucial role in managing traffic flow in urban areas. Recently, reinforcement learning (RL) models have received great attention for TSC with promising results. However, these RL-TSC models still need to be improved for real-world deployment due to limited exploration of different performance metrics such as fair traffic scheduling or air quality impact. In this work, we introduce a constrained multi-objective RL model that minimizes multiple constrained objectives while achieving a higher expected reward. Furthermore, our proposed RL strategy integrates the peak and average constraint models into the RL problem formulation with maximum-entropy off-policy models. We applied this strategy to a single TSC and to a network of TSCs. As part of this constrained RL-TSC formulation, we discuss fairness and air quality parameters as constraints for the closed-loop control system optimization model at TSCs, called FAirLight. Our experimental analysis shows that the proposed FAirLight achieves good traffic flow performance in terms of average waiting time while being fair and environmentally friendly. Our method outperforms the baseline models and allows a more comprehensive view of RL-TSC regarding its applicability to the real world.
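  A hedged sketch of how such average constraints can be attached to a maximum-entropy off-policy actor loss via a Lagrange multiplier; this is a generic constrained-RL pattern, not FAirLight's exact formulation, and all names and the budget `d` are illustrative:

  ```python
  import torch

  def constrained_actor_loss(q_reward, q_cost, log_prob, lmbda, alpha):
      # SAC-style actor loss: entropy-regularized reward critic, penalized
      # by a constraint critic weighted with the non-negative multiplier.
      return (alpha * log_prob - q_reward + lmbda * q_cost).mean()

  def update_multiplier(lmbda, avg_cost, d, lr=1e-3):
      # Dual ascent on the average constraint E[cost] <= d: raise lambda
      # when the constraint is violated, relax it otherwise, keep it >= 0.
      return max(0.0, lmbda + lr * (avg_cost - d))
  ```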
-
In this paper, we study reinforcement learning (RL) algorithms to solve real-world decision problems with the objective of maximizing the long-term reward as well as satisfying cumulative constraints. We propose a novel first-order policy optimization method, Interior-point Policy Optimization (IPO), which augments the objective with logarithmic barrier functions, inspired by the interior-point method. Our proposed method is easy to implement with performance guarantees and can handle general types of cumulative multi-constraint settings. We conduct extensive evaluations to compare our approach with state-of-the-art baselines. Our algorithm outperforms the baseline algorithms in terms of reward maximization and constraint satisfaction.
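  The log-barrier idea named in the abstract can be sketched as follows, assuming a PPO-style clipped surrogate for the reward objective and a single cumulative-cost constraint J_C ≤ d; the barrier coefficient `t` and all names are illustrative assumptions:

  ```python
  import torch

  def ipo_objective(clipped_surrogate, cost_return, d, t=20.0):
      # Barrier term: log(d - J_C) / t is finite only while the constraint
      # holds (J_C < d) and drops toward -inf as J_C approaches d, pushing
      # iterates away from the constraint boundary.
      slack = (d - cost_return).clamp(min=1e-8)
      barrier = torch.log(slack) / t
      return clipped_surrogate + barrier  # maximize both terms
  ```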
-
Compared with capital improvement projects, real-time control of stormwater systems may be a more effective and efficient approach to address the increasing risk of flooding in urban areas. One way to automate the design process of control policies is through reinforcement learning (RL). Recently, RL methods have been applied to small stormwater systems and have demonstrated better performance over passive systems and simple rule-based strategies. However, it remains unclear how effective RL methods are for larger and more complex systems. Current RL-based control policies also suffer from poor convergence and stability, which may be due to large updates made by the underlying RL algorithm. In this study, we use the Proximal Policy Optimization (PPO) algorithm and develop control policies for a medium-sized stormwater system that can significantly mitigate flooding during large storm events. Our approach demonstrates good convergence behavior and stability, and achieves robust out-of-sample performance.
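  For reference, the standard PPO clipped surrogate that bounds the update size (the textbook form, not the authors' implementation):

  ```python
  import torch

  def ppo_clip_loss(log_prob_new, log_prob_old, advantage, eps=0.2):
      # Probability ratio between the new and old policies.
      ratio = torch.exp(log_prob_new - log_prob_old)
      unclipped = ratio * advantage
      # The clip keeps the ratio near 1, limiting how far one update can
      # move the policy, which targets the stability issues noted above.
      clipped = torch.clamp(ratio, 1.0 - eps, 1.0 + eps) * advantage
      # Pessimistic bound: take the smaller objective, negate for a loss.
      return -torch.min(unclipped, clipped).mean()
  ```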