Title: Agent-based model construction using inverse reinforcement learning
Agent-based modeling (ABM) assumes that behavioral rules affecting an agent's states and actions are known. However, discovering these rules is often challenging and requires deep insight into an agent's behaviors. Inverse reinforcement learning (IRL) can complement ABM by providing a systematic way to find behavioral rules from data. IRL frames learning behavioral rules as a problem of recovering motivations from observed behavior and generating rules consistent with these motivations. In this paper, we propose a method to construct an agent-based model directly from data using IRL. We explain each step of the proposed method and describe challenges that may occur during implementation. Our experimental results show that the proposed method can extract rules and construct an agent-based model with rich but concise behavioral rules for agents while still maintaining aggregate-level properties.
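The abstract does not spell out the algorithmic details, so the following is only a minimal sketch of the general recover-then-generate idea it describes: a linear reward over known state-action features is fit with a projection-style feature-matching loop (in the spirit of Abbeel and Ng's apprenticeship learning, not necessarily the paper's own procedure), and the learned reward is then turned into a per-state behavioral rule for an ABM agent. All names, dimensions, and data below are illustrative assumptions.

```python
# Minimal sketch (not the paper's exact method): recover a linear reward
# from observed trajectories via feature matching, then turn it into a
# greedy behavioral rule for an agent-based model. All values are synthetic.
import numpy as np

rng = np.random.default_rng(0)
N_STATES, N_ACTIONS, N_FEATURES = 5, 3, 4

# Feature map phi(s, a): assumed known, random here purely for illustration.
phi = rng.normal(size=(N_STATES, N_ACTIONS, N_FEATURES))

# Observed "expert" trajectories: lists of (state, action) pairs.
expert_trajs = [[(s, int(rng.integers(N_ACTIONS))) for s in range(N_STATES)]
                for _ in range(20)]

def feature_expectation(trajs):
    """Average feature vector visited by a set of trajectories."""
    feats = [phi[s, a] for traj in trajs for (s, a) in traj]
    return np.mean(feats, axis=0)

mu_expert = feature_expectation(expert_trajs)

# Projection-style IRL: adjust reward weights w so the learner's feature
# expectations move toward the expert's.
w = np.zeros(N_FEATURES)
for _ in range(50):
    reward = phi @ w                      # R(s, a) = w . phi(s, a)
    policy = reward.argmax(axis=1)        # greedy behavioral rule per state
    learner_trajs = [[(s, int(policy[s])) for s in range(N_STATES)]]
    mu_learner = feature_expectation(learner_trajs)
    w += 0.1 * (mu_expert - mu_learner)   # move toward the expert

# The recovered rule an ABM agent would follow:
behavior_rule = {s: int((phi[s] @ w).argmax()) for s in range(N_STATES)}
print("learned per-state action rule:", behavior_rule)
```

In an actual application the expert trajectories would come from observed agent data and the feature map from domain knowledge; the synthetic setup here only illustrates the recover-motivations-then-generate-rules loop.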
Award ID(s):
1650512
PAR ID:
10053915
Author(s) / Creator(s):
; ; ; ; ;
Date Published:
Journal Name:
2017 Winter Simulation Conference (WSC)
Page Range / eLocation ID:
1264-1275
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. In large agent-based models, it is difficult to correlate system-level dynamics with individual-level attributes. In this paper, we use inverse reinforcement learning to estimate compact representations of behaviors in large-scale pandemic simulations in the form of reward functions. We illustrate the capacity and performance of these representations in identifying agent-level attributes that correlate with the emerging dynamics of large-scale multi-agent systems. Our experiments use BESSIE, an ABM for COVID-like epidemic processes, where agents make sequential decisions (e.g., use PPE/refrain from activities) based on observations (e.g., the number of mask-wearing people) collected when visiting locations to conduct their activities. The IRL-based reformulations of simulation outputs perform significantly better in classification of agent-level attributes than direct classification of decision trajectories and are thus more capable of determining agent-level attributes with a definitive role in the collective behavior of the system. We anticipate that this IRL-based approach is broadly applicable to general ABMs.
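The BESSIE pipeline and the exact IRL procedure are not reproduced here; the sketch below only illustrates the representational idea under strong simplifying assumptions: per-agent empirical feature expectations stand in for IRL-recovered reward weights, the data is synthetic, and the classifier is ordinary logistic regression.

```python
# Illustrative sketch (not the BESSIE pipeline): represent each agent by a
# compact reward-like summary instead of its raw decision trajectory, then
# classify a hidden agent-level attribute from that representation.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(1)
n_agents, traj_len, n_features = 200, 50, 6

# Hypothetical ground truth: two agent types whose behavior is driven by
# different reward weights plus per-step noise.
labels = rng.integers(2, size=n_agents)
true_w = np.where(labels[:, None] == 1, 1.0, -1.0) * np.ones(n_features)

# Raw trajectories: noisy per-step feature vectors shaped by the agent type.
trajectories = true_w[:, None, :] + rng.normal(
    scale=3.0, size=(n_agents, traj_len, n_features))

# Stand-in for the IRL compression: per-agent feature expectations playing
# the role of recovered reward parameters.
reward_repr = trajectories.mean(axis=1)          # (n_agents, n_features)
raw_repr = trajectories.reshape(n_agents, -1)    # flattened trajectories

for name, X in [("reward-based", reward_repr), ("raw trajectory", raw_repr)]:
    X_tr, X_te, y_tr, y_te = train_test_split(X, labels, random_state=0)
    acc = LogisticRegression(max_iter=1000).fit(X_tr, y_tr).score(X_te, y_te)
    print(f"{name} representation accuracy: {acc:.2f}")
```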
  2.
    The nexus of food, energy, and water systems (FEWS) has become a salient research topic, as well as a pressing societal and policy challenge. Computational modeling is a key tool in addressing these challenges, and FEWS modeling as a subfield is now established. However, social dimensions of FEWS nexus issues, such as individual or social learning, technology adoption decisions, and adaptive behaviors, remain relatively underdeveloped in FEWS modeling and research. Agent-based models (ABMs) have received increasing usage recently in efforts to better represent and integrate human behavior into FEWS research. A systematic review identified 29 articles in which at least two food, energy, or water sectors were explicitly considered with an ABM and/or ABM-coupled modeling approach. Agent decision-making and behavior ranged from reactive to active, were motivated by objectives ranging from primarily economic to multi-criteria in nature, and were implemented with entities ranging from individual-based to highly aggregated. However, a significant proportion of models did not contain agent interactions, or did not base agent decision-making on existing behavioral theories. Model design choices imposed by data limitations, structural requirements for coupling with other simulation models, or spatial and/or temporal scales of application resulted in agent representations lacking explicit decision-making processes or social interactions. In contrast, several methodological innovations were also noted, which were catalyzed by the challenges associated with developing multi-scale, cross-sector models. Several avenues for future research with ABMs in FEWS research are suggested based on these findings. The reviewed ABM applications represent progress, yet many opportunities for more behaviorally rich agent-based modeling in the FEWS context remain.
  3. Agent navigation is a crucial task in today's service and automated factories. Many efforts set specific rules for agents in a given scenario to regulate their behaviors. However, not all situations can be anticipated in advance, which can lead to poor performance in real-world applications. In this paper, we propose CrowdGAIL, a method that learns from expert behaviors as an instructing policy and can train highly human-like agents for navigation problems without manually designed reward functions or predefined regulations. First, the proposed model structure is based on generative adversarial imitation learning (GAIL), which imitates how humans take actions and move toward the target; through comparison, we show the advantage of proximal policy optimization (PPO) over trust region policy optimization, so our model is based on GAIL-PPO. Second, we design a Sequential DemoBuffer compatible with the internal long short-term memory structure to provide spatiotemporal guidance for the agent's next step. Third, the paper demonstrates the model's potential in a multi-agent scenario with integrated social behavior, accounting for human collision avoidance as well as social comfort distance. Finally, experiments on a dataset generated from CrowdNav verify how closely our model's trajectories resemble human behavior and how it can guide multiple agents while avoiding collisions. Under the same evaluation metrics, CrowdGAIL outperforms the classic Social-GAN.
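CrowdGAIL's full architecture (GAIL-PPO with an LSTM and the Sequential DemoBuffer) is not reproduced here; the sketch below is only a minimal GAIL-flavored illustration of the core idea that the abstract relies on: a discriminator separates expert from policy state-action pairs, and its output supplies the imitation reward, so no reward function has to be written by hand. Data, dimensions, and the logistic discriminator are all simplifying assumptions.

```python
# Minimal GAIL-style sketch (not CrowdGAIL itself): a logistic discriminator
# learns to separate expert from policy state-action pairs, and the policy is
# rewarded with -log(1 - D). The PPO update and LSTM/DemoBuffer are omitted.
import numpy as np

rng = np.random.default_rng(2)
dim = 4                                             # state-action feature dim

expert_sa = rng.normal(loc=1.0, size=(256, dim))    # stand-in expert pairs
policy_sa = rng.normal(loc=0.0, size=(256, dim))    # stand-in policy pairs

w, b = np.zeros(dim), 0.0                           # discriminator parameters

def D(x):
    """P(expert | state-action features x) under a logistic discriminator."""
    return 1.0 / (1.0 + np.exp(-(x @ w + b)))

lr = 0.1
for _ in range(200):
    # Gradient ascent on E[log D(expert)] + E[log(1 - D(policy))].
    grad_w = (expert_sa.T @ (1.0 - D(expert_sa)) / len(expert_sa)
              + policy_sa.T @ (-D(policy_sa)) / len(policy_sa))
    grad_b = (1.0 - D(expert_sa)).mean() - D(policy_sa).mean()
    w += lr * grad_w
    b += lr * grad_b

# Imitation reward that would be handed to the policy learner (e.g., PPO).
imitation_reward = -np.log(1.0 - D(policy_sa) + 1e-8)
print("mean imitation reward for current policy batch:", imitation_reward.mean())
```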
  4. Inverse reinforcement learning (IRL) deals with estimating an agent’s utility function from its actions. In this paper, we consider how an agent can hide its strategy and mitigate an adversarial IRL attack; we call this inverse IRL (I-IRL). How should the decision maker choose its response to ensure a poor reconstruction of its strategy by an adversary performing IRL to estimate the agent’s strategy? This paper comprises four results: First, we present an adversarial IRL algorithm that estimates the agent’s strategy while controlling the agent’s utility function. Then, we propose an I-IRL result that mitigates the IRL algorithm used by the adversary. Our I-IRL results are based on revealed preference theory in microeconomics. The key idea is for the agent to deliberately choose sub-optimal responses so that its true strategy is sufficiently masked. Third, we give a sample complexity result for our main I-IRL result when the agent has noisy estimates of the adversary-specified utility function. Finally, we illustrate our I-IRL scheme in a radar problem where a meta-cognitive radar is trying to mitigate an adversarial target. 
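The paper's revealed-preference construction and sample-complexity analysis are not reproduced here; the toy sketch below only illustrates the masking intuition stated in the abstract: the agent deliberately accepts a small utility loss and chooses a response that leaves many candidate utility functions consistent with its observed behavior. The candidate-hypothesis setup and all quantities are illustrative assumptions.

```python
# Toy sketch of the masking idea (not the paper's revealed-preference machinery):
# pick a near-optimal but deliberately sub-optimal response that keeps many of
# the adversary's candidate utility functions consistent with the observed choice.
import numpy as np

rng = np.random.default_rng(3)
n_actions, n_candidates = 6, 50
true_u = rng.normal(size=n_actions)                       # agent's true utilities
candidates = rng.normal(size=(n_candidates, n_actions))   # adversary's hypotheses

epsilon = 0.5                                             # tolerated utility loss
acceptable = np.flatnonzero(true_u >= true_u.max() - epsilon)

def consistent_hypotheses(action):
    """Number of candidate utilities for which `action` looks optimal."""
    return int(np.sum(candidates.argmax(axis=1) == action))

# Naive agent reveals its optimum; masking agent picks, among acceptable
# actions, the one that keeps the most adversary hypotheses alive.
naive_choice = int(true_u.argmax())
masked_choice = int(max(acceptable, key=consistent_hypotheses))

print("naive choice ", naive_choice, "-> consistent hypotheses:",
      consistent_hypotheses(naive_choice))
print("masked choice", masked_choice, "-> consistent hypotheses:",
      consistent_hypotheses(masked_choice))
```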
  5. Most existing policy learning solutions require the learning agents to receive high-quality supervision signals, e.g., rewards in reinforcement learning (RL) or high-quality expert demonstrations in behavioral cloning (BC). Such high-quality supervision is either infeasible or prohibitively expensive to obtain in practice. We aim for a unified framework that leverages the available cheap weak supervision to perform policy learning efficiently. To handle this problem, we treat the "weak supervision" as imperfect information coming from a peer agent, and evaluate the learning agent's policy based on a "correlated agreement" with the peer agent's policy (instead of simple agreements). Our approach explicitly punishes a policy for overfitting to the weak supervision. In addition to theoretical guarantees, extensive evaluations on tasks including RL with noisy rewards, BC with weak demonstrations, and standard policy co-training (RL + BC) show that our method leads to substantial performance improvements, especially when the complexity or the noise of the learning environments is high.
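The abstract does not give the scoring rule itself, so the sketch below is only a rough illustration of the correlated-agreement idea under assumed conditions: a policy is scored by its agreement with the noisy peer labels on matched states, minus its agreement on randomly shuffled (decoupled) labels. A lazy policy that simply echoes the peer's most frequent action can look good under simple agreement but gains nothing under the correlated score. Data, noise model, and policies are all synthetic.

```python
# Rough sketch of the "correlated agreement" idea (illustrative, not the
# paper's exact estimator): matched agreement with the weak/peer labels minus
# agreement on shuffled labels, which discounts blind agreement.
import numpy as np

rng = np.random.default_rng(4)
n, n_actions = 10_000, 3
true_actions = rng.integers(n_actions, size=n)

# Weak supervision: correct only 30% of the time, otherwise defaults to action 0.
peer = np.where(rng.random(n) < 0.3, true_actions, 0)

def simple_agreement(policy_actions):
    return np.mean(policy_actions == peer)

def correlated_agreement(policy_actions):
    matched = np.mean(policy_actions == peer)
    decoupled = np.mean(policy_actions == rng.permutation(peer))
    return matched - decoupled

good_policy = true_actions.copy()        # a policy that actually solves the task
lazy_policy = np.zeros(n, dtype=int)     # always plays the peer's favorite action

for name, pi in [("good", good_policy), ("lazy", lazy_policy)]:
    print(f"{name}: simple={simple_agreement(pi):.2f} "
          f"correlated={correlated_agreement(pi):.2f}")
```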