- NSF-PAR ID: 10225176
- Date Published:
- Journal Name: 2020 IEEE International Conference on Data Mining (ICDM)
- Page Range / eLocation ID: 801 to 810
- Format(s): Medium: X
- Sponsoring Org: National Science Foundation
More Like this
xGAIL: Explainable Generative Adversarial Imitation Learning for Explainable Human Decision Analysis
To make daily decisions, human agents devise their own "strategies" governing their mobility dynamics (e.g., taxi drivers have preferred working regions and times, and urban commuters have preferred routes and transit modes). Recent research such as generative adversarial imitation learning (GAIL) has succeeded in learning human decision-making strategies from behavior data using deep neural networks (DNNs), which can accurately mimic how humans behave in various scenarios, e.g., playing video games. However, such DNN-based models are inherently "black boxes": it is hard to explain what knowledge they have learned from humans and how they make decisions, a problem not previously addressed in the imitation learning literature. This paper addresses this research gap by proposing xGAIL, the first explainable generative adversarial imitation learning framework. The proposed xGAIL framework consists of two novel components, Spatial Activation Maximization (SpatialAM) and Spatial Randomized Input Sampling Explanation (SpatialRISE), which extract both global and local knowledge from a well-trained GAIL model to explain how a human agent makes decisions. In particular, we use taxi drivers' passenger-seeking strategies as a case study to validate the effectiveness of the proposed xGAIL framework. Our analysis of large-scale real-world taxi trajectory data shows promising results in two respects: (i) global explainable knowledge of which nearby traffic conditions impel a taxi driver to choose a particular direction to find the next passenger, and (ii) local explainable knowledge of which key (sometimes hidden) factors a taxi driver considers when making a particular decision.
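The SpatialRISE component takes its name from Randomized Input Sampling for Explanation (RISE), a perturbation-based saliency method: randomly mask parts of the input, score each masked input with the model, and credit each input cell with the average score of the masks that kept it. The sketch below shows generic RISE on a 2-D grid, not the paper's SpatialRISE. It assumes per-cell Bernoulli masks at grid resolution (the original RISE upsamples low-resolution masks), and `rise_saliency`, `weights`, and `policy_score` are hypothetical names; the linear `policy_score` merely stands in for a trained GAIL policy.

```python
import numpy as np


def rise_saliency(model, x, n_masks=2000, p=0.5, seed=0):
    """Generic RISE-style saliency map over a 2-D grid of input features.

    model: callable mapping an (H, W) array to a scalar score, e.g. the
           probability a policy assigns to one action (hypothetical here).
    x:     (H, W) input grid, e.g. traffic features around a driver's cell.
    Each random mask keeps every cell independently with probability p.
    """
    rng = np.random.default_rng(seed)
    masks = (rng.random((n_masks, *x.shape)) < p).astype(float)
    scores = np.array([model(x * m) for m in masks])
    # Score-weighted sum of masks, normalized by how often a cell is kept on average.
    return np.tensordot(scores, masks, axes=1) / (n_masks * p)


# Hypothetical stand-in for a trained policy: a linear scorer that depends
# mostly on the cell at row 1, column 2 and a little on the cell at (0, 0).
weights = np.zeros((3, 3))
weights[1, 2] = 5.0
weights[0, 0] = 1.0


def policy_score(grid):
    return float((weights * grid).sum())


sal = rise_saliency(policy_score, np.ones((3, 3)))
print(tuple(int(i) for i in np.unravel_index(sal.argmax(), sal.shape)))  # → (1, 2)
```

For this linear stand-in, the expected saliency of a cell equals (1 - p) times its own contribution plus p times the unmasked score, so the cell the scorer relies on most receives the highest saliency.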
Public transit systems, such as buses and subway lines, offer affordable shared rides and reduce road network traffic, and thus play a significant role in mitigating urban traffic congestion. However, it is non-trivial to estimate the future ridership of a new transit plan, such as a new bus route or subway line, before actual deployment, since the travel preferences of passengers along the planned routes may vary. In this paper, we make the first attempt to model passengers' preferences over transit choices using a Markov Decision Process (MDP). Moreover, we develop a novel inverse preference learning algorithm to infer passengers' preferences and predict future changes in human behavior (e.g., ridership) under a new urban transit plan before its deployment. We validate the proposed framework using a unique real-world dataset from Shenzhen, China, during whose time span three subway lines opened. Using data collected both before and after these deployments, our evaluation shows that the proposed framework predicts ridership with only 19.8% relative error, which is 23%-51% lower than other baseline approaches.
Smart passenger-seeking strategies employed by taxi drivers contribute not only to drivers' incomes but also to the quality of service passengers receive. Therefore, understanding taxi drivers' behaviors and learning good passenger-seeking strategies are crucial to improving both drivers' well-being and the quality of public transportation service. However, we observe that drivers' preferences for which area to search for the next passenger are diverse and dynamic across locations and drivers. These location-dependent preferences are hard to learn from partial data (i.e., an individual driver's trajectory may not cover all locations). In this paper, we make the first attempt to develop a conditional generative adversarial imitation learning (cGAIL) model, a unifying collective inverse reinforcement learning framework that learns drivers' decision-making preferences and policies by transferring knowledge across taxi driver agents and across locations. Our evaluation on three months of taxi GPS trajectory data from Shenzhen, China, demonstrates that the preferences and policies learned by cGAIL are on average 34.7% more accurate than those learned by other state-of-the-art baseline approaches.
Suppliers registered on a manufacturing-as-a-service (MaaS) marketplace must decide in near real time whether to accept or reject orders received on the platform. Myopic decision making, such as a first-come, first-served policy, can yield suboptimal revenue in this dynamic and stochastic environment. In this paper, this sequential decision-making problem is formulated as a Markov Decision Process and solved using deep reinforcement learning (DRL). Empirical simulations show that DRL performs considerably better than four baselines. This early work demonstrates a learning approach to near-real-time decision making for suppliers participating in a MaaS marketplace.
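Why first-come, first-served can lose revenue is visible in a toy finite-horizon version of the accept/reject MDP. The sketch below assumes one order arrives per period, each accepted order consumes one unit of capacity, and order revenues follow the hypothetical distribution `ORDER_VALUES`; none of these numbers come from the paper. Because the toy state space (period, remaining capacity) is tiny, it solves the MDP exactly by backward induction rather than with DRL, and `policy_value`, `optimal_value`, and `first_come_first_served` are illustrative names.

```python
from functools import lru_cache

# Hypothetical toy instance, not taken from the paper:
# revenue of an arriving order -> probability that an order has that revenue.
ORDER_VALUES = {1.0: 0.7, 5.0: 0.3}


def policy_value(horizon, capacity, accept):
    """Expected revenue of a fixed policy; one order arrives per period.

    accept(t, c, v) returns True if an order worth v is accepted at period t
    with c units of free capacity. Each accepted order uses one unit.
    """
    @lru_cache(maxsize=None)
    def value(t, c):
        if t == horizon or c == 0:
            return 0.0
        total = 0.0
        for v, prob in ORDER_VALUES.items():
            if accept(t, c, v):
                total += prob * (v + value(t + 1, c - 1))  # take order, use capacity
            else:
                total += prob * value(t + 1, c)            # reject, keep capacity
        return total
    return value(0, capacity)


def optimal_value(horizon, capacity):
    """Optimal expected revenue via backward induction (Bellman recursion)."""
    @lru_cache(maxsize=None)
    def value(t, c):
        if t == horizon or c == 0:
            return 0.0
        return sum(
            prob * max(v + value(t + 1, c - 1), value(t + 1, c))
            for v, prob in ORDER_VALUES.items()
        )
    return value(0, capacity)


def first_come_first_served(t, c, v):
    return True  # myopic: accept every order while capacity remains


fcfs_value = policy_value(10, 3, first_come_first_served)
opt_value = optimal_value(10, 3)
print(f"FCFS: {fcfs_value:.2f}")  # → FCFS: 6.60
print(f"Optimal: {opt_value:.2f}")
```

The `max` in the Bellman recursion lets the optimal policy reject a low-revenue order whenever keeping that unit of capacity for a later order is worth more in expectation; FCFS never makes that trade-off.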
The expanding role of reinforcement learning (RL) in safety-critical system design has promoted ω-automata as a way to express learning requirements, which are often non-Markovian, with greater ease of expression and interpretation than scalar reward signals. However, real-world sequential decision-making situations often involve multiple, potentially conflicting, objectives. Two dominant approaches to expressing relative preferences over multiple objectives are (1) weighted preference, where the decision maker provides scalar weights for the various objectives, and (2) lexicographic preference, where the decision maker provides an order over the objectives such that any amount of satisfaction of a higher-ordered objective is preferable to any amount of a lower-ordered one. In this article, we study and develop RL algorithms to compute optimal strategies in Markov decision processes against multiple ω-regular objectives under weighted and lexicographic preferences. We provide a translation from multiple ω-regular objectives to a scalar reward signal that is both faithful (maximising reward means maximising the probability of achieving the objectives under the corresponding preference) and effective (RL quickly converges to optimal strategies). We have implemented the translations in a formal reinforcement learning tool, Mungojerrie, and we present an experimental evaluation of our technique on benchmark learning problems.
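The difference between the two preference types can be illustrated on fixed numbers. The sketch below assumes each candidate strategy has already been summarized by a hypothetical vector of probabilities of satisfying each objective, ordered from highest to lowest priority; it does not compute strategies in an MDP or implement the paper's reward translation. `STRATEGIES`, `best_weighted`, and `best_lexicographic` are illustrative names.

```python
# Hypothetical strategies and their probabilities of satisfying each
# objective, listed from the highest- to the lowest-priority objective.
STRATEGIES = {
    "A": (0.90, 0.10),
    "B": (0.80, 0.95),
}


def best_weighted(strategies, weights):
    """Strategy maximizing the weighted sum of satisfaction probabilities."""
    return max(strategies,
               key=lambda s: sum(w * p for w, p in zip(weights, strategies[s])))


def best_lexicographic(strategies):
    """Strategy best on objective 1, ties broken by objective 2, and so on.

    Python compares tuples lexicographically, which matches this order.
    """
    return max(strategies, key=lambda s: strategies[s])


print(best_weighted(STRATEGIES, (0.5, 0.5)))  # → B
print(best_lexicographic(STRATEGIES))         # → A
```

With equal weights, B's large gain on the second objective outweighs its small loss on the first; under the lexicographic order, no gain on the second objective can compensate for A's higher probability on the first.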