<?xml version="1.0" encoding="UTF-8"?><rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:dcq="http://purl.org/dc/terms/"><records count="1" morepages="false" start="1" end="1"><record rownumber="1"><dc:product_type>Conference Paper</dc:product_type><dc:title>Graph Meta-Reinforcement Learning for Transferable Autonomous Mobility-on-Demand</dc:title><dc:creator>Gammelli, Daniele; Yang, Kaidi; Harrison, James; Rodrigues, Filipe; Pereira, Francisco; Pavone, Marco</dc:creator><dc:corporate_author/><dc:editor/><dc:description>Autonomous Mobility-on-Demand (AMoD) systems represent an attractive alternative to existing transportation paradigms, currently challenged by urbanization and increasing travel needs. By centrally controlling a fleet of self-driving vehicles, these systems provide mobility service to customers and are currently starting to be deployed in a number of cities around the world. Current learning-based approaches for controlling AMoD systems are limited to the single-city scenario, whereby the service operator is allowed to take an unlimited number of operational decisions within the same transportation system. However, real-world system operators can hardly afford to fully re-train AMoD controllers for every city they operate in, as this could result in a high number of poor-quality decisions during training, making the single-city strategy a potentially impractical solution. To address these limitations, we propose to formalize the multi-city AMoD problem through the lens of meta-reinforcement learning (meta-RL) and devise an actor-critic algorithm based on recurrent graph neural networks. In our approach, AMoD controllers are explicitly trained such that a small amount of experience within a new city will produce good system performance. 
Empirically, we show how control policies learned through meta-RL are able to achieve near-optimal performance on unseen cities by learning rapidly adaptable policies, thus making them more robust not only to novel environments, but also to distribution shifts common in real-world operations, such as special events, unexpected congestion, and dynamic pricing schemes.</dc:description><dc:publisher/><dc:date>2022-08-14</dc:date><dc:nsf_par_id>10414575</dc:nsf_par_id><dc:journal_name>Proceedings of the 28th ACM SIGKDD Conference on Knowledge Discovery and Data Mining</dc:journal_name><dc:journal_volume/><dc:journal_issue/><dc:page_range_or_elocation>2913 to 2923</dc:page_range_or_elocation><dc:issn/><dc:isbn/><dc:doi>https://doi.org/10.1145/3534678.3539180</dc:doi><dcq:identifierAwardId>1837135</dcq:identifierAwardId><dc:subject/><dc:version_number/><dc:location/><dc:rights/><dc:institution/><dc:sponsoring_org>National Science Foundation</dc:sponsoring_org></record></records></rdf:RDF>