Title: Improving Deep Reinforcement Learning-Based Perimeter Metering Control Methods With Domain Control Knowledge
Perimeter metering control has long been an active research topic since well-defined relationships between network productivity and usage, that is, network macroscopic fundamental diagrams (MFDs), were shown to be capable of describing regional traffic dynamics. Numerous methods have been proposed to solve perimeter metering control problems, but these generally require knowledge of the MFDs or detailed equations that govern traffic dynamics. Recently, a study applied model-free deep reinforcement learning (Deep-RL) methods to two-region perimeter control and found performance comparable to that of the model predictive control scheme, particularly under uncertainty. However, the methods proposed therein perform very poorly in the early stages of learning, which limits their applicability to real-life scenarios. Furthermore, the methods may not scale to more complicated networks with larger state and action spaces. To address these issues, this paper proposes integrating domain control knowledge (DCK) of congestion dynamics into the agent designs to improve learning and control performance. A novel agent that builds on the Bang-Bang control policy is also developed. Two types of DCK are then presented to provide knowledge-guided exploration strategies that let the agents explore around the most rewarding part of the action space. The results from extensive numerical experiments on two- and three-region urban networks show that integrating DCK can (a) effectively improve learning and control performance for Deep-RL agents, (b) enhance the agents’ resilience against various types of environment uncertainty, and (c) mitigate the scalability issue for the agents.
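The abstract does not give the traffic model or the agent designs, so the following is only an illustrative sketch under stated assumptions, not the authors' implementation or their two types of DCK. It assumes a common two-region MFD accumulation model (n_ij is the number of vehicles in region i bound for region j; u12 and u21 are the shares of completed transfer trips allowed across the perimeter), a parabolic MFD with made-up parameters, a hypothetical Bang-Bang rule that meters transfers into any region past its critical accumulation, and an epsilon-greedy variant in which exploratory steps follow that rule. The names `Region`, `step`, `dck_bang_bang`, and `knowledge_guided_action` are hypothetical.

```python
import random
from dataclasses import dataclass


# Hypothetical MFD: parabolic trip-completion function G(n) peaking at n_jam / 2.
# All parameter values are illustrative assumptions, not values from the paper.
@dataclass
class Region:
    n_jam: float  # jam accumulation (veh)
    g_max: float  # peak trip-completion rate (veh/s)

    @property
    def n_crit(self) -> float:
        return self.n_jam / 2.0

    def trip_completion(self, n: float) -> float:
        n = min(max(n, 0.0), self.n_jam)
        return 4.0 * self.g_max * n * (self.n_jam - n) / self.n_jam ** 2


U_MIN, U_MAX = 0.1, 0.9  # assumed bounds on the perimeter control fractions
ACTIONS = [(a, b) for a in (U_MIN, U_MAX) for b in (U_MIN, U_MAX)]  # Bang-Bang action set


def step(state, action, demand, regions, dt=60.0):
    """One explicit Euler step of a two-region MFD accumulation model.

    state  = (n11, n12, n21, n22), n_ij = vehicles in region i bound for region j
    action = (u12, u21), share of completed transfer trips let across the perimeter
    demand = (q11, q12, q21, q22) in veh/s
    """
    n11, n12, n21, n22 = state
    u12, u21 = action
    q11, q12, q21, q22 = demand
    n1, n2 = n11 + n12, n21 + n22
    g1, g2 = regions[0].trip_completion(n1), regions[1].trip_completion(n2)
    # Split each region's trip completions by destination share.
    m11, m12 = (g1 * n11 / n1, g1 * n12 / n1) if n1 > 0 else (0.0, 0.0)
    m21, m22 = (g2 * n21 / n2, g2 * n22 / n2) if n2 > 0 else (0.0, 0.0)
    nxt = (
        n11 + dt * (q11 + u21 * m21 - m11),
        n12 + dt * (q12 - u12 * m12),
        n21 + dt * (q21 - u21 * m21),
        n22 + dt * (q22 + u12 * m12 - m22),
    )
    return tuple(max(x, 0.0) for x in nxt)  # accumulations cannot go negative


def dck_bang_bang(state, regions):
    """Hypothetical DCK rule: meter transfers into a region on the congested
    branch of its MFD (accumulation above critical), otherwise open the perimeter."""
    n1, n2 = state[0] + state[1], state[2] + state[3]
    u12 = U_MIN if n2 > regions[1].n_crit else U_MAX  # 1 -> 2 transfers load region 2
    u21 = U_MIN if n1 > regions[0].n_crit else U_MAX  # 2 -> 1 transfers load region 1
    return u12, u21


def knowledge_guided_action(greedy, dck, epsilon, p_dck, rng):
    """Epsilon-greedy variant: exploratory steps take the DCK action with
    probability p_dck instead of sampling the action set uniformly."""
    if rng.random() >= epsilon:
        return greedy  # exploit the agent's current best action
    if rng.random() < p_dck:
        return dck  # knowledge-guided exploration
    return rng.choice(ACTIONS)  # fallback: uniform exploration


if __name__ == "__main__":
    regions = (Region(n_jam=10000.0, g_max=6.0), Region(n_jam=10000.0, g_max=6.0))
    demand = (1.5, 2.0, 1.0, 1.0)  # illustrative demand, veh/s
    state = (2000.0, 2000.0, 3000.0, 2500.0)
    rng = random.Random(0)
    for _ in range(60):  # one hour at 60 s steps
        greedy = rng.choice(ACTIONS)  # placeholder for an untrained agent's argmax-Q action
        action = knowledge_guided_action(greedy, dck_bang_bang(state, regions), 0.3, 0.8, rng)
        state = step(state, action, demand, regions)
    print("final accumulations:", round(state[0] + state[1]), round(state[2] + state[3]))
```

The design choice being illustrated is that exploration is biased toward actions a congestion-aware rule would pick, rather than spread uniformly over the action set; in a real Deep-RL agent, `greedy` would come from the learned value or policy network.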
Award ID(s):
1749200
NSF-PAR ID:
10403490
Author(s) / Creator(s):
;
Date Published:
Journal Name:
Transportation Research Record: Journal of the Transportation Research Board
ISSN:
0361-1981
Page Range / eLocation ID:
036119812311524
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. Regional perimeter control based on the existence of macroscopic fundamental diagrams has been widely studied as an effective tool to regulate traffic and prevent oversaturation in dense urban areas. Significant research efforts have been performed concerning the modeling aspects of perimeter control. More recently, data-driven techniques for perimeter control have shown remarkable promise; however, few studies have examined the transferability of these techniques. While it is surely of the highest priority to devise effective perimeter control methods, the ability of such methods to transfer the learned knowledge and quickly adapt control policies to a new setting is critical, particularly in real-life situations where training a method from scratch is intractable. This work seeks to bridge this research gap by comprehensively examining the effectiveness and transferability of a reinforcement-learning-based perimeter control method for a two-region urban network in a microsimulation setting. The results suggest: 1) the presented data-driven method demonstrates promising control effectiveness in comparison with no perimeter control and an extended greedy controller and 2) the method can readily transfer its learned knowledge and adapt its control policy with newly collected data to simulation settings with different traffic demands, driving behaviors, or both.
  2. Given the aging infrastructure and the anticipated growth in the number of highway work zones in the U.S.A., it is important to investigate work zone merge control, which is critical for improving work zone safety and capacity. This paper proposes and evaluates a novel highway work zone merge control strategy based on cooperative driving behavior enabled by artificial intelligence. The proposed method assumes that all vehicles are fully automated, connected, and cooperative. It inserts two metering zones in the open lane to make space for merging vehicles in the closed lane. In addition, each vehicle in the closed lane uses an off-policy soft actor-critic reinforcement learning (RL) algorithm to learn how to optimally adjust its longitudinal position to find a safe gap in the open lane, considering its surrounding traffic conditions. The learning results are captured in convolutional neural networks and used to control individual vehicles in the testing phase. By adding the metering zones and taking the locations, speeds, and accelerations of surrounding vehicles into account, cooperation among vehicles is implicitly considered. This RL-based model is trained and evaluated using a microscopic traffic simulator. The results show that this cooperative RL-based merge control significantly outperforms popular strategies such as late merge and early merge in terms of both mobility and safety measures. It also performs better than a strategy assuming all vehicles are equipped with cooperative adaptive cruise control.
  3. Reinforcement learning (RL) methods can be used to develop a controller for heating, ventilation, and air conditioning (HVAC) systems that both saves energy and ensures high thermal comfort for occupants. However, existing works typically require on-policy data to train an RL agent and do not consider occupants’ personalized thermal preferences, which limits their use in real-world scenarios. This paper designs a high-performance model-based offline RL algorithm for personalized HVAC systems. The proposed algorithm can quickly adapt to different occupants’ thermal preferences with only a small amount of thermal feedback, efficiently guaranteeing high personalized thermal comfort. First, we use a meta-supervised learning algorithm to train an occupant’s thermal preference model. Then, we train an ensemble neural network to predict the thermal states of the considered zone. In addition, the obtained ensemble networks can indicate the regions of the state and action spaces covered by the offline dataset. With the personalized thermal preference model updated via meta-testing, model-based RL is used to derive the optimal HVAC controller. Since the proposed algorithm requires only offline datasets and a small amount of online thermal feedback for training, it contributes to a more practical deployment of RL algorithms in HVAC systems. We use the ASHRAE database II to verify the effectiveness and advantage of the meta-learning algorithm for modeling different occupants’ thermal preferences. Numerical simulations in the EnergyPlus environment demonstrate that the proposed algorithm can guarantee personalized thermal preferences with only a slight increase in power consumption (1.91%) compared with the model-based RL algorithm with on-policy data aggregation.
  4. Recent studies have leveraged the existence of network macroscopic fundamental diagrams (MFDs) to develop regional control strategies for urban traffic networks. Existing MFD-based control strategies focus on vehicle movement within and across regions of an urban network and do not consider how freeway traffic can be controlled to improve overall traffic operations in mixed freeway and urban networks. The purpose of this study is to develop a coordinated traffic management scheme that simultaneously implements perimeter flow control on an urban network and variable speed limits (VSL) on a freeway to reduce total travel time in such a mixed network. By slowing down vehicles traveling along the freeway, VSL can effectively meter traffic exiting the freeway into the urban network. This can be particularly useful since freeways often have large storage capacities, and vehicles accumulating on freeways might be less disruptive to overall system operations than vehicles accumulating on urban streets. VSL can also be used to change where freeway vehicles enter the urban network to benefit the entire system. The combined control strategy is implemented in a model predictive control framework with several realistic constraints, such as gradual reductions in the freeway speed limit. Numerical tests suggest that the combined implementation of VSL and perimeter metering control can improve traffic operations compared with perimeter metering alone.
  5. Background: Few studies have systematically investigated robust controllers for lower limb rehabilitation exoskeletons (LLREs) that can safely and effectively assist users with a variety of neuromuscular disorders to walk with full autonomy. One of the key challenges in developing such a robust controller is handling different degrees of uncertain human-exoskeleton interaction forces from patients. Consequently, conventional walking controllers are either specific to a patient’s condition or involve tuning many control parameters, and they can behave unreliably or even fail to maintain balance. Methods: We present a novel deep reinforcement learning-based robust controller for an LLRE, built on decoupled offline human-exoskeleton simulation training with three independent networks, which aims to provide reliable walking assistance against varied and uncertain human-exoskeleton interaction forces. The exoskeleton controller is driven by a neural network control policy that acts on a stream of the LLRE’s proprioceptive signals, including joint kinematic states, and predicts real-time position control targets for the actuated joints. To handle uncertain human interaction forces, the control policy is intentionally trained with an integrated human musculoskeletal model and realistic human-exoskeleton interaction forces. Two other neural networks are connected to the control policy network to predict the interaction forces and muscle coordination. To further increase the robustness of the control policy to different human conditions, we employ domain randomization during training that includes not only randomization of exoskeleton dynamics properties but, more importantly, randomization of human muscle strength to simulate the variability of patients’ disabilities.
    Through this decoupled deep reinforcement learning framework, the trained LLRE controller is able to provide reliable walking assistance to patients with different degrees of neuromuscular disorders without any control parameter tuning. Results and conclusion: A universal RL-based walking controller is trained and virtually tested on an LLRE system to verify its effectiveness and robustness in assisting users with different disabilities, such as passive muscles (quadriplegic), muscle weakness, or hemiplegic conditions, without any control parameter tuning. Analysis of the RMSE for joint tracking, CoP-based stability, and gait symmetry shows the effectiveness of the controller. An ablation study also demonstrates the strong robustness of the control policy under large exoskeleton dynamic property ranges and various human-exoskeleton interaction forces. The decoupled network structure allows us to isolate the LLRE control policy network for testing and sim-to-real transfer, since it uses only proprioceptive information from the LLRE (joint sensory state) as input. Furthermore, the controller is shown to handle different patient conditions without the need for patient-specific control parameter tuning.