Title: Necessary and Sufficient Conditions for Inverse Reinforcement Learning of Bayesian Stopping Time Problems
This paper presents an inverse reinforcement learning (IRL) framework for Bayesian stopping time problems. By observing the actions of a Bayesian decision maker, we provide a necessary and sufficient condition to identify whether these actions are consistent with optimizing a cost function. In a Bayesian (partially observed) setting, the inverse learner can at best identify optimality with respect to the observed strategies. Our IRL algorithm identifies optimality and then constructs set-valued estimates of the cost function. To achieve this IRL objective, we use novel ideas from Bayesian revealed preferences stemming from microeconomics. We illustrate the proposed IRL scheme using two important examples of stopping time problems, namely, sequential hypothesis testing and Bayesian search. As a real-world example, we use a YouTube dataset comprising metadata from 190,000 videos to illustrate how the proposed IRL method predicts user engagement on online multimedia platforms with high accuracy. Finally, for finite datasets, we propose an IRL detection algorithm and give finite sample bounds on its error probabilities.
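In the Bayesian revealed preference literature, tests of this kind reduce to checking feasibility of linear inequalities in the unknown costs; any feasible point is one element of a set-valued cost estimate. The snippet below is only a rough sketch of that flavor of test (a "no improving action switches"-style inequality checked via linear programming), not the algorithm from the paper; the function name, normalization, and example posteriors are illustrative assumptions.

```python
# Hypothetical sketch of an NIAS-style feasibility test from Bayesian revealed
# preferences; not the paper's algorithm. Given the empirical posterior that
# each observed stopping action induces over the unknown state, search (via a
# linear program) for a cost matrix C[x, a] under which every chosen action is
# Bayes-optimal. Any feasible C is one element of the set-valued estimate.
import numpy as np
from scipy.optimize import linprog

def nias_feasible_cost(posteriors):
    """posteriors[a][x] = empirical posterior of state x given chosen action a.
    Returns a feasible cost matrix of shape (n_states, n_actions), or None."""
    n_actions = len(posteriors)
    n_states = len(posteriors[0])
    n_vars = n_states * n_actions                  # decision variables C[x, a]

    A_ub, b_ub = [], []
    for a in range(n_actions):                     # chosen action
        for b in range(n_actions):                 # alternative action
            if a == b:
                continue
            # NIAS inequality: E_{x ~ p(.|a)} [ C(x, a) - C(x, b) ] <= 0
            row = np.zeros(n_vars)
            for x in range(n_states):
                row[x * n_actions + a] += posteriors[a][x]
                row[x * n_actions + b] -= posteriors[a][x]
            A_ub.append(row)
            b_ub.append(0.0)

    # Normalization (costs in [0, 1], total cost >= 1) to exclude the trivial
    # all-zero cost that always satisfies the inequalities.
    A_ub.append(-np.ones(n_vars))
    b_ub.append(-1.0)

    res = linprog(c=np.zeros(n_vars), A_ub=np.array(A_ub), b_ub=np.array(b_ub),
                  bounds=[(0.0, 1.0)] * n_vars, method="highs")
    return res.x.reshape(n_states, n_actions) if res.success else None

# Two hidden states, two stopping actions; each action's posterior leans toward
# a different state, so a consistent cost matrix should exist.
print(nias_feasible_cost([np.array([0.9, 0.1]), np.array([0.2, 0.8])]))
```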
Award ID(s): 2112457
PAR ID: 10425757
Author(s) / Creator(s):
Date Published:
Journal Name: Journal of Machine Learning Research
Volume: 24
ISSN: 1533-7928
Page Range / eLocation ID: 1-64
Format(s): Medium: X
Sponsoring Org: National Science Foundation
More Like this
  1. Inverse reinforcement learning (IRL) deals with estimating an agent's utility function from its actions. In this paper, we consider how an agent can hide its strategy and mitigate an adversarial IRL attack; we call this inverse IRL (I-IRL). How should the decision maker choose its responses so that an adversary performing IRL reconstructs its strategy poorly? This paper comprises four results: First, we present an adversarial IRL algorithm that estimates the agent's strategy while controlling the agent's utility function. Second, we propose an I-IRL result that mitigates the IRL algorithm used by the adversary. Our I-IRL results are based on revealed preference theory in microeconomics. The key idea is for the agent to deliberately choose sub-optimal responses so that its true strategy is sufficiently masked. Third, we give a sample complexity result for our main I-IRL result when the agent has noisy estimates of the adversary-specified utility function. Finally, we illustrate our I-IRL scheme in a radar problem where a meta-cognitive radar is trying to mitigate an adversarial target.
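A toy sketch of the masking idea, not the paper's I-IRL construction: assuming a one-shot setting with a single adversary-specified utility vector, the agent picks, among candidate mixed strategies whose expected utility stays within a loss budget of the optimum, the one farthest in KL divergence from its truly optimal strategy. All names and numbers are illustrative.

```python
# Toy illustration (hypothetical, not the paper's I-IRL scheme) of deliberate
# sub-optimality for masking: among candidate mixed strategies whose expected
# utility stays within a loss budget of the optimum, play the one farthest (in
# KL divergence) from the truly optimal strategy.
import numpy as np
from scipy.stats import entropy              # entropy(p, q) = KL(p || q)

rng = np.random.default_rng(1)
n_actions, n_candidates, budget = 4, 200, 0.1

utility = rng.random(n_actions)                       # adversary-specified utilities
optimal = np.eye(n_actions)[np.argmax(utility)]       # truly optimal (pure) strategy
best_value = utility.max()

# Candidate responses: random mixed strategies, plus the optimum as a fallback.
candidates = np.vstack([optimal,
                        rng.dirichlet(np.ones(n_actions), size=n_candidates)])
feasible = [p for p in candidates if best_value - p @ utility <= budget]

# Masking: maximize divergence from the optimal strategy within the budget.
masked = max(feasible, key=lambda p: entropy(optimal, p))
print("optimal strategy:", np.round(optimal, 3))
print("masked strategy :", np.round(masked, 3))
```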
  2. Most existing methodologies for control of Gene Regulatory Networks (GRNs) assume that the immediate cost function at each state and time point is fully known. In this paper, we introduce an optimal control strategy for GRNs with an unknown or partially known immediate cost function. To this end, we adopt a partially-observed Boolean dynamical system (POBDS) model for the GRN and propose an Inverse Reinforcement Learning (IRL) methodology for quantifying the imperfect behavior of experts, obtained via prior biological knowledge or experimental data. The constructed cost function is then used to find the optimal infinite-horizon control strategy for the POBDS. The application of the proposed method using a single sequence of experimental data is investigated through numerical experiments on a melanoma gene regulatory network.
  3. Agent-based modeling (ABM) assumes that the behavioral rules affecting an agent's states and actions are known. However, discovering these rules is often challenging and requires deep insight into an agent's behavior. Inverse reinforcement learning (IRL) can complement ABM by providing a systematic way to find behavioral rules from data. IRL frames learning behavioral rules as a problem of recovering motivations from observed behavior and generating rules consistent with these motivations. In this paper, we propose a method to construct an agent-based model directly from data using IRL. We explain each step of the proposed method and describe challenges that may occur during implementation. Our experimental results show that the proposed method can extract rules and construct an agent-based model with rich yet concise behavioral rules for agents while still maintaining aggregate-level properties.
  4. Inverse reinforcement learning (IRL) has emerged as a powerful paradigm for extracting expert skills from observed behavior, with applications ranging from autonomous systems to human-robot interaction. However, the identifiability issue within IRL poses a significant challenge, as multiple reward functions can explain the same observed behavior. This paper provides a linear algebraic characterization of several identifiability notions for an entropy-regularized finite-horizon Markov decision process (MDP). Moreover, our approach allows for the seamless integration of prior knowledge, in the form of featurized reward functions, to enhance the identifiability of IRL problems. The results are demonstrated with experiments in a grid world environment.
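The non-identifiability being characterized can be seen concretely in the soft Bellman equations: adding a potential term to the reward changes the reward but not the soft-optimal policy. The sketch below is a simplified stationary (infinite-horizon) illustration of this fact, not the finite-horizon linear algebraic characterization from the paper; all sizes and names are illustrative assumptions.

```python
# Hypothetical numpy sketch of the identifiability issue: in an entropy-
# regularized MDP, shaping the reward by phi(s) - gamma * E[phi(s')] yields a
# different reward with exactly the same soft-optimal policy, so the policy
# alone cannot identify the reward.
import numpy as np

rng = np.random.default_rng(0)
S, A, gamma = 4, 3, 0.9
P = rng.dirichlet(np.ones(S), size=(S, A))       # P[s, a, s'] transition kernel
r = rng.normal(size=(S, A))                      # "true" reward

def soft_policy(reward, iters=500):
    """Soft (entropy-regularized) value iteration; returns pi(a | s)."""
    V = np.zeros(S)
    for _ in range(iters):
        Q = reward + gamma * P @ V               # soft Bellman backup
        V = np.log(np.exp(Q).sum(axis=1))        # log-sum-exp soft value
    return np.exp(Q - V[:, None])                # softmax policy

phi = rng.normal(size=S)                         # arbitrary state potential
r_shaped = r + phi[:, None] - gamma * P @ phi    # potential-shaped reward

# Different reward functions, identical observable behavior.
assert np.allclose(soft_policy(r), soft_policy(r_shaped), atol=1e-8)
print("distinct rewards induce the same soft-optimal policy")
```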
  5. Control of gene regulatory networks (GRNs) to shift gene expression from undesirable states to desirable ones has received much attention in recent years. Most existing methods assume that the cost of intervention at each state and time point, referred to as the immediate cost function, is fully known. In this paper, we employ the Partially-Observed Boolean Dynamical System (POBDS) signal model for a time sequence of noisy expression measurements from a Boolean GRN and develop a Bayesian Inverse Reinforcement Learning (BIRL) approach for the realistic case in which the only available knowledge regarding the immediate cost function comes from the sequence of measurements and interventions recorded in an experimental setting by an expert. The Boolean Kalman Smoother (BKS) algorithm optimally maps the available gene-expression data into a sequence of Boolean states, and the BIRL method is then efficiently combined with Q-learning to quantify the immediate cost function. The performance of the proposed methodology is investigated by applying a state-feedback controller to two GRN models, a melanoma WNT5A Boolean network and a p53-MDM2 negative-feedback-loop Boolean network, in which the cost of the undesirable states, and thus the identity of the undesirable genes, is learned from the expert data.