
Reinforcement Learning from Optimization Proxy for Ride-Hailing Vehicle Relocation

https://doi.org/10.1613/jair.1.13794

Yuan, Enpeng; Chen, Wenbo; Van Hentenryck, Pascal (September 2022, Journal of Artificial Intelligence Research)

Idle vehicle relocation is crucial for addressing the demand-supply imbalance that frequently arises in ride-hailing systems. The current mainstream methodologies, optimization and reinforcement learning (RL), both suffer from computational drawbacks. Optimization models need to be solved in real time and often trade off model fidelity (and hence solution quality) for computational efficiency. Reinforcement learning is expensive to train and often struggles to achieve coordination among a large fleet. This paper designs a hybrid approach that leverages the strengths of the two while overcoming their drawbacks. Specifically, it trains an optimization proxy, i.e., a machine-learning model that approximates an optimization model, and then refines the proxy with reinforcement learning. This Reinforcement Learning from Optimization Proxy (RLOP) approach is computationally efficient to train and deploy, and achieves better results than RL or optimization alone. Numerical experiments on the New York City dataset show that the RLOP approach significantly reduces both relocation costs and computation time compared to the optimization model, while pure reinforcement learning fails to converge due to computational complexity.
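The two-stage idea in the abstract (fit a proxy to an optimization model, then refine it with reinforcement learning) can be illustrated with a deliberately tiny sketch. Everything below is an assumption made for illustration and is not the paper's implementation: the state is a single scalar imbalance, the "optimizer" is a hypothetical linear rule with gain `k_model`, the "true" system prefers a different gain `k_true`, the proxy is a one-parameter linear model fit by least squares, and the refinement step is a plain REINFORCE update with a Gaussian exploration policy.

```python
import random

def optimizer_action(s, k_model=0.8):
    """Hypothetical stand-in for the optimization model (assumed linear rule)."""
    return k_model * s

def true_reward(s, a, k_true=0.5):
    """Hypothetical reward: the real system prefers the action k_true * s."""
    return -(a - k_true * s) ** 2

def train_proxy(states):
    """Stage 1: fit a proxy a = w * s to the optimizer's decisions (least squares)."""
    labels = [optimizer_action(s) for s in states]
    num = sum(s * a for s, a in zip(states, labels))
    den = sum(s * s for s in states)
    return num / den

def refine_with_rl(w, rng, steps=300, batch=200, sigma=0.2, lr=0.05):
    """Stage 2: refine the proxy weight with REINFORCE under a Gaussian policy."""
    for _ in range(steps):
        samples = []
        for _ in range(batch):
            s = rng.uniform(-1.0, 1.0)      # sampled state
            eps = rng.gauss(0.0, sigma)     # exploration noise around the proxy action
            r = true_reward(s, w * s + eps)
            samples.append((s, eps, r))
        baseline = sum(r for _, _, r in samples) / batch  # variance-reduction baseline
        # Score-function gradient: d/dw log N(a; w*s, sigma^2) = eps * s / sigma^2
        grad = sum((r - baseline) * eps * s for s, eps, r in samples) / (batch * sigma ** 2)
        w += lr * grad                      # gradient ascent on expected reward
    return w

def run_demo(seed=0):
    rng = random.Random(seed)
    states = [rng.uniform(-1.0, 1.0) for _ in range(500)]
    w_proxy = train_proxy(states)           # imitates the optimizer (gain near k_model)
    w_rl = refine_with_rl(w_proxy, rng)     # drifts toward the reward-optimal gain k_true
    return w_proxy, w_rl

if __name__ == "__main__":
    print(run_demo())
```

In this toy setting the proxy recovers the optimizer's gain, and the refinement step starts from that warm start rather than from scratch, which is the intuition behind training RL from an optimization proxy.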


When demand increases beyond the system capacity, riders in ride-hailing/ride-sharing systems often experience long waiting times, resulting in poor customer satisfaction. This paper proposes a spatio-temporal pricing framework (AP-RTRS) to alleviate this challenge and shows how it naturally complements state-of-the-art dispatching and routing algorithms. Specifically, the pricing optimization model regulates demand to ensure that every rider opting to use the system is served within a reasonable time: it does so either by reducing demand to meet the capacity constraints or by prompting potential riders to postpone service to a later time. The pricing model is a model-predictive control algorithm that works at a coarser temporal and spatial granularity than the real-time dispatching and routing, and naturally integrates vehicle relocations. Simulation experiments indicate that the pricing optimization model achieves short waiting times without sacrificing revenues and geographical fairness.
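The capacity logic described above (serve what fits in a period, postpone the rest) can be sketched as follows. This is only an illustration under stated assumptions: the abstract does not give the pricing model, so the sketch does not model prices at all; the function name `postpone_excess`, the single-zone setting, and the fixed per-period `capacity` are hypothetical.

```python
def postpone_excess(demand, capacity):
    """Serve at most `capacity` riders per period and carry the excess forward.

    demand   -- list of new ride requests arriving in each period (assumed known)
    capacity -- maximum riders that can be served per period (assumed constant)
    Returns (served_per_period, backlog_left_after_last_period).
    """
    served = []
    backlog = 0
    for d in demand:
        waiting = d + backlog          # new requests plus riders postponed earlier
        s = min(waiting, capacity)     # cannot serve beyond capacity
        served.append(s)
        backlog = waiting - s          # excess is postponed to the next period
    return served, backlog

if __name__ == "__main__":
    print(postpone_excess([5, 12, 3, 4], 8))  # prints ([5, 8, 7, 4], 0)
```

In the framework the abstract describes, prices would be the mechanism that induces riders to shift or drop requests; here the shift is simply imposed to show the resulting served profile.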
