Title: Necessary and Sufficient Conditions for Inverse Reinforcement Learning of Bayesian Stopping Time Problems
This paper presents an inverse reinforcement learning (IRL) framework for Bayesian stopping time problems. By observing the actions of a Bayesian decision maker, we provide a necessary and sufficient condition for identifying whether these actions are consistent with optimizing a cost function. In a Bayesian (partially observed) setting, the inverse learner can at best identify optimality with respect to the observed strategies. Our IRL algorithm identifies optimality and then constructs set-valued estimates of the cost function. To achieve this IRL objective, we use novel ideas from Bayesian revealed preferences stemming from microeconomics. We illustrate the proposed IRL scheme using two important examples of stopping time problems, namely sequential hypothesis testing and Bayesian search. As a real-world example, we use a YouTube dataset comprising metadata from 190,000 videos to show that the proposed IRL method predicts user engagement in online multimedia platforms with high accuracy. Finally, for finite datasets, we propose an IRL detection algorithm and give finite-sample bounds on its error probabilities.
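To give a flavor of the revealed-preference machinery the abstract refers to, the following is a minimal sketch of a feasibility test of this kind: given the posterior beliefs implied by an observed stopping strategy, a linear program checks whether some cost function makes each chosen action optimal under its own posterior. The specific inequalities, the normalization, and all names below are illustrative assumptions, not the paper's exact necessary and sufficient condition.

```python
# Illustrative sketch only: a NIAS-style feasibility test via linear
# programming. Given posteriors p(x | action a) implied by an observed
# strategy, ask whether some cost c(x, a) makes every chosen action
# optimal (with margin eps) under its own posterior. The normalization
# sum(c) = 1 rules out the trivial all-zero cost.
import numpy as np
from scipy.optimize import linprog

def cost_feasible(posteriors, tol=1e-9):
    """posteriors: array of shape (A, X); row a is p(x | action a chosen)."""
    A, X = posteriors.shape
    n = A * X                                   # variables c(x, a), flattened by action
    G, h = [], []
    for a in range(A):
        for b in range(A):
            if a == b:
                continue
            row = np.zeros(n + 1)
            row[a * X:(a + 1) * X] = posteriors[a]    # + E_{p(.|a)}[c(., a)]
            row[b * X:(b + 1) * X] -= posteriors[a]   # - E_{p(.|a)}[c(., b)]
            row[-1] = 1.0                             # + margin eps
            G.append(row)
            h.append(0.0)
    obj = np.zeros(n + 1)
    obj[-1] = -1.0                              # maximize the margin eps
    A_eq = np.zeros((1, n + 1))
    A_eq[0, :n] = 1.0                           # normalization: costs sum to 1
    res = linprog(obj, A_ub=np.array(G), b_ub=np.array(h),
                  A_eq=A_eq, b_eq=[1.0],
                  bounds=[(0, 1)] * n + [(0, None)])
    return bool(res.success and res.x[-1] > tol)

# Toy usage: two stopping actions, two hypotheses.
print(cost_feasible(np.array([[0.8, 0.2], [0.3, 0.7]])))   # expected: True
```

The paper's actual conditions also compare costs across environments and yield set-valued cost estimates; the sketch above only conveys the linear-feasibility flavor of such tests.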
Award ID(s):
2112457
PAR ID:
10425757
Author(s) / Creator(s):
Date Published:
Journal Name:
Journal of Machine Learning Research
Volume:
24
ISSN:
1533-7928
Page Range / eLocation ID:
1-64
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. This paper provides a finite-sample analysis of a passive stochastic gradient Langevin dynamics (PSGLD) algorithm designed to achieve adaptive inverse reinforcement learning (IRL). Adaptive IRL aims to estimate the cost function of a forward learner performing a stochastic gradient algorithm (e.g., policy gradient reinforcement learning) by observing its estimates in real time. The PSGLD algorithm is passive because it incorporates noisy gradients provided by an external stochastic gradient algorithm (the forward learner), over which it has no control. The PSGLD algorithm acts as a randomized sampler that achieves adaptive IRL by reconstructing the forward learner's cost function nonparametrically from the stationary measure of a Langevin diffusion. This paper analyzes the non-asymptotic (finite-sample) performance: we provide explicit bounds on the 2-Wasserstein distance between the PSGLD algorithm's sample measure and the stationary measure encoding the cost function, and we provide guarantees for a kernel density estimation scheme that reconstructs the cost function from empirical samples. Our analysis uses tools from the study of Markov diffusion operators. The derived bounds have both practical and theoretical significance: they provide finite-time guarantees for an adaptive IRL mechanism and substantially generalize the analytical framework of a line of research in passive stochastic gradient algorithms.
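As a rough, assumption-laden sketch of what a passive Langevin-style sampler can look like (the step size, inverse temperature, Gaussian kernel, and all function names below are illustrative, not the paper's exact update):

```python
# Illustrative sketch only. The inverse learner never chooses the query
# points theta_k: it passively receives the forward learner's iterates and
# noisy gradient estimates g_k ~ grad J(theta_k), weights each gradient by a
# kernel centered at its own iterate lam, and injects Gaussian noise so that
# its samples concentrate where the cost J is low.
import numpy as np

def passive_langevin(thetas, grads, step=1e-3, beta=10.0, bandwidth=0.5, seed=0):
    """thetas, grads: arrays of shape (T, d) observed from the forward learner."""
    rng = np.random.default_rng(seed)
    lam = np.zeros(thetas.shape[1])
    samples = []
    for theta_k, g_k in zip(thetas, grads):
        # Kernel weight: only gradients evaluated near lam are informative.
        w = np.exp(-np.sum((theta_k - lam) ** 2) / (2.0 * bandwidth ** 2))
        lam = lam - step * w * g_k + np.sqrt(2.0 * step / beta) * rng.standard_normal(lam.shape)
        samples.append(lam.copy())
    # A kernel density estimate over `samples` then gives a Gibbs-type
    # reconstruction of the cost up to constants: J_hat ~ -(1/beta) * log(density).
    return np.array(samples)
```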
  2. Inverse reinforcement learning (IRL) deals with estimating an agent's utility function from its actions. In this paper, we consider how an agent can hide its strategy and mitigate an adversarial IRL attack; we call this inverse IRL (I-IRL). How should the decision maker choose its responses to ensure a poor reconstruction of its strategy by an adversary performing IRL? This paper comprises four results: First, we present an adversarial IRL algorithm that estimates the agent's strategy while controlling the agent's utility function. Second, we propose an I-IRL result that mitigates the IRL algorithm used by the adversary. Our I-IRL results are based on revealed preference theory in microeconomics. The key idea is for the agent to deliberately choose sub-optimal responses so that its true strategy is sufficiently masked. Third, we give a sample complexity result for our main I-IRL result when the agent has noisy estimates of the adversary-specified utility function. Finally, we illustrate our I-IRL scheme in a radar problem where a meta-cognitive radar is trying to mitigate an adversarial target.
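A standard piece of revealed-preference machinery that such a scheme can build on is Afriat's rationalizability test; the sketch below (names and normalization are illustrative assumptions, not the paper's formulation) shows the linear program an adversary might run, and a masking agent would deliberately choose responses, within a performance budget, for which this test fails or fits poorly.

```python
# Illustrative sketch only: Afriat's rationalizability test. Given probe
# vectors p_k and the agent's responses x_k, the data is consistent with
# maximization of some concave monotone utility iff the Afriat inequalities
#   u_j <= u_k + lam_k * p_k . (x_j - x_k),   lam_k > 0,
# admit a solution. A masking agent deliberately picks responses (within a
# utility-loss budget) that make this linear program infeasible.
import numpy as np
from scipy.optimize import linprog

def afriat_rationalizable(prices, responses):
    """prices, responses: arrays of shape (K, n). Returns True iff rationalizable."""
    K = len(prices)
    G, h = [], []                       # variables: [u_1..u_K, lam_1..lam_K]
    for k in range(K):
        for j in range(K):
            if j == k:
                continue
            row = np.zeros(2 * K)
            row[j] = 1.0                # + u_j
            row[k] = -1.0               # - u_k
            row[K + k] = -float(prices[k] @ (responses[j] - responses[k]))
            G.append(row)
            h.append(0.0)
    bounds = [(None, None)] * K + [(1.0, None)] * K   # lam_k >= 1 (scale-free)
    res = linprog(np.zeros(2 * K), A_ub=np.array(G), b_ub=np.array(h), bounds=bounds)
    return bool(res.success)
```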
  3. Most existing methodologies for control of Gene Regulatory Networks (GRNs) assume that the immediate cost function at each state and time point is fully known. In this paper, we introduce an optimal control strategy for GRNs with an unknown or partially known immediate cost function. To this end, we adopt a partially observed Boolean dynamical system (POBDS) model for the GRN and propose an inverse reinforcement learning (IRL) methodology for quantifying the imperfect behavior of experts, obtained via prior biological knowledge or experimental data. The constructed cost function is then used to find the optimal infinite-horizon control strategy for the POBDS. The application of the proposed method using a single sequence of experimental data is investigated through numerical experiments on a melanoma gene regulatory network.
  4. Agent-based modeling (ABM) assumes that behavioral rules affecting an agent's states and actions are known. However, discovering these rules is often challenging and requires deep insight about an agent's behaviors. Inverse reinforcement learning (IRL) can complement ABM by providing a systematic way to find behavioral rules from data. IRL frames learning behavioral rules as a problem of recovering motivations from observed behavior and generating rules consistent with these motivations. In this paper, we propose a method to construct an agent-based model directly from data using IRL. We explain each step of the proposed method and describe challenges that may occur during implementation. Our experimental results show that the proposed method can extract rules and construct an agent-based model with rich but concise behavioral rules for agents while still maintaining aggregate-level properties. 
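One deliberately simple way to realize the "recover motivations, then generate rules" step is the classic linear-programming IRL formulation of Ng and Russell on a small tabular MDP; this is a stand-in used purely for illustration (it is not the method or experiments of the paper, and every name below is assumed):

```python
# Illustrative sketch only: recover a state reward consistent with an observed
# deterministic policy (Ng & Russell-style LP IRL), then regenerate the
# behavioral rule as the greedy policy under the recovered reward.
import numpy as np
from scipy.optimize import linprog

def lp_irl(P, policy, gamma=0.9, r_max=1.0, l1=1.0):
    """P: transitions, shape (A, S, S); policy: observed action index per state, shape (S,)."""
    A, S, _ = P.shape
    P_pi = np.array([P[policy[s], s] for s in range(S)])         # (S, S)
    inv = np.linalg.inv(np.eye(S) - gamma * P_pi)                # (I - gamma P_pi)^-1
    # Variables: [R (S), t (S), u (S)]; maximize sum(t) - l1 * sum(u).
    G, h = [], []
    for s in range(S):
        for a in range(A):
            if a == policy[s]:
                continue
            margin = gamma * (P[policy[s], s] - P[a, s]) @ inv   # Q(s, a*) - Q(s, a) as a row over R
            row = np.zeros(3 * S); row[:S] = -margin             # margin . R >= 0
            G.append(row); h.append(0.0)
            row = np.zeros(3 * S); row[:S] = -margin; row[S + s] = 1.0   # t_s <= margin . R
            G.append(row); h.append(0.0)
    for s in range(S):                                           # |R_s| <= u_s
        row = np.zeros(3 * S); row[s] = 1.0; row[2 * S + s] = -1.0; G.append(row); h.append(0.0)
        row = np.zeros(3 * S); row[s] = -1.0; row[2 * S + s] = -1.0; G.append(row); h.append(0.0)
    obj = np.concatenate([np.zeros(S), -np.ones(S), l1 * np.ones(S)])
    bounds = [(-r_max, r_max)] * S + [(None, None)] * S + [(0, None)] * S
    res = linprog(obj, A_ub=np.array(G), b_ub=np.array(h), bounds=bounds)
    return res.x[:S]                                             # recovered state reward

def greedy_rule(P, R, gamma=0.9, iters=200):
    """Behavioral rule induced by the recovered reward: greedy policy from value iteration."""
    V = np.zeros(P.shape[1])
    for _ in range(iters):
        V = np.max(R + gamma * P @ V, axis=0)                    # (A, S) -> (S,)
    return np.argmax(R + gamma * P @ V, axis=0)
```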
  5. Inverse reinforcement learning (IRL) has emerged as a powerful paradigm for extracting expert skills from observed behavior, with applications ranging from autonomous systems to human-robot interaction. However, the identifiability issue within IRL poses a significant challenge, as multiple reward functions can explain the same observed behavior. This paper provides a linear-algebraic characterization of several identifiability notions for an entropy-regularized finite-horizon Markov decision process (MDP). Moreover, our approach allows for the seamless integration of prior knowledge, in the form of featurized reward functions, to enhance the identifiability of IRL problems. The results are demonstrated with experiments in a grid-world environment.
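To convey what a linear-algebraic identifiability characterization can look like, the sketch below uses the well-known discounted, infinite-horizon fact that entropy-regularized soft-optimal policies determine the reward only up to potential shaping; the paper's finite-horizon conditions differ, and all names here are assumptions.

```python
# Illustrative sketch only. In a discounted entropy-regularized MDP, two
# reward vectors induce the same soft-optimal policy iff they differ by a
# potential-shaping term phi(s) - gamma * E[phi(s') | s, a]. Those shaping
# directions span the column space of M below, so behavioral equivalence of
# r1 and r2 can be tested by projecting r1 - r2 onto range(M).
import numpy as np

def shaping_matrix(P, gamma):
    """P: transition tensor of shape (S, A, S); returns M of shape (S*A, S)."""
    S, A, _ = P.shape
    M = np.zeros((S * A, S))
    for s in range(S):
        for a in range(A):
            M[s * A + a, s] += 1.0               # + phi(s)
            M[s * A + a, :] -= gamma * P[s, a]   # - gamma * E[phi(s') | s, a]
    return M

def same_behaviour_class(r1, r2, P, gamma, tol=1e-8):
    """True if r1 - r2 lies in the span of shaping directions (the unidentifiable gap)."""
    M = shaping_matrix(P, gamma)
    diff = (r1 - r2).reshape(-1)
    phi, *_ = np.linalg.lstsq(M, diff, rcond=None)
    return bool(np.linalg.norm(M @ phi - diff) < tol)

# Toy check: shaping a random reward leaves its behavior class unchanged.
rng = np.random.default_rng(0)
S, A, gamma = 4, 2, 0.9
P = rng.random((S, A, S)); P /= P.sum(axis=2, keepdims=True)
r = rng.random((S, A)); phi = rng.random(S)
r_shaped = r + phi[:, None] - gamma * (P @ phi)       # shaped reward, shape (S, A)
print(same_behaviour_class(r, r_shaped, P, gamma))    # expected: True
```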