Cordon control with spatially-varying metering rates: a Reinforcement Learning approach

Ni, W. and

doi:https://doi.org/10.1016/j.trc.2018.12.007

The work explores how Reinforcement Learning can be used to re-time traffic signals around cordoned neighborhoods. An RL-based controller is developed by representing traffic states as graph-structured data and customizing corresponding neural network architectures to handle those data. The customizations enable the controller to: (i) model neighborhood-wide traffic based on directed-graph representations; (ii) use the representations to identify patterns in real-time traffic measurements; and (iii) capture those patterns to a spatial representation needed for selecting optimal cordon-metering rates. Input to the selection process also includes a total inflow to be admitted through a cordon. The rate is optimized in a separate process that is not part of the present work. Our RL-controller distributes that separately-optimized rate across the signalized street links that feed traffic through the cordon. The resulting metering rates vary from one feeder link to the next. The selection process can reoccur at short time intervals in response to changing traffic patterns. Once trained on a few cordons, the RL-controller can be deployed on cordons elsewhere in a city without additional training. This portability feature is confirmed via simulations of traffic on an idealized street network. The tests also indicate that the controller can reduce the network’s vehicle hours traveled well beyond what can be achieved via spatially-uniform cordon metering. The extra reductions in VHT are found to grow larger when traffic exhibits greater in-homogeneities over the network.

More Like this