Title: Provably Efficient Third-Person Imitation from Offline Observation
Domain adaptation in imitation learning represents an essential step towards improving generalizability. However, even in the restricted setting of third-person imitation where transfer is between isomorphic Markov Decision Processes, there are no strong guarantees on the performance of transferred policies. We present problem-dependent, statistical learning guarantees for third-person imitation from observation in an offline setting, and a lower bound on performance in the online setting.
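As a purely illustrative sketch of the offline imitation-from-observation setting (not the paper's method or its guarantees), the snippet below recovers a tabular policy from expert state sequences alone, assuming known dynamics and a known state isomorphism; the names `phi`, `T_hat`, and the synthetic data are all hypothetical.

```python
import numpy as np

# Minimal sketch of imitation from observation in a tabular MDP, under
# assumptions not taken from the paper: the learner sees only expert state
# sequences (no actions), knows its own dynamics P[a, s, s'], and the expert's
# MDP relates to the learner's by a known state permutation `phi` (the
# "isomorphism"). Everything here is illustrative, not the authors' code.

rng = np.random.default_rng(0)
n_states, n_actions = 5, 3

# Known learner dynamics: P[a, s, :] is a distribution over next states.
P = rng.dirichlet(np.ones(n_states), size=(n_actions, n_states))

# Hypothetical state isomorphism from the expert's MDP to the learner's.
phi = rng.permutation(n_states)

# Offline expert observations: (state, next_state) pairs in the expert's own
# state space, generated randomly here just for the demo.
expert_pairs = [(s, rng.integers(n_states))
                for s in rng.integers(n_states, size=200)]

# Empirical expert transition frequencies, mapped through the isomorphism.
T_hat = np.zeros((n_states, n_states))
for s, s_next in expert_pairs:
    T_hat[phi[s], phi[s_next]] += 1.0
T_hat /= np.maximum(T_hat.sum(axis=1, keepdims=True), 1.0)

# Recover a policy by transition matching: in each state, pick the action
# whose known dynamics best explain the observed expert transitions.
policy = np.zeros((n_states, n_actions))
for s in range(n_states):
    errs = [np.abs(P[a, s] - T_hat[s]).sum() for a in range(n_actions)]
    policy[s, int(np.argmin(errs))] = 1.0

print(policy)
```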
Award ID(s):
1816753
PAR ID:
10185412
Author(s) / Creator(s):
;
Date Published:
Journal Name:
Uncertainty in Artificial Intelligence
ISSN:
1525-3384
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1.
    Domain adaptation in imitation learning represents an essential step towards improving generalizability. However, even in the restricted setting of third-person imitation where transfer is between isomorphic Markov Decision Processes, there are no strong guarantees on the performance of transferred policies. We present problem-dependent, statistical learning guarantees for third-person imitation from observation in an offline setting, and a lower bound on performance in an online setting.
    Egocentric and exocentric perspectives of human action differ significantly, yet overcoming this extreme viewpoint gap is critical in augmented reality and robotics. We propose ViewpointRosetta, an approach that unlocks large-scale unpaired ego and exo video data to learn clip-level viewpoint-invariant video representations. Our framework introduces (1) a diffusion-based Rosetta Stone Translator (RST), which, leveraging a moderate amount of synchronized multi-view videos, serves as a translator in feature space to decipher the alignment between unpaired ego and exo data, and (2) a dual encoder that aligns unpaired data representations through contrastive learning with RST-based synthetic feature augmentation and soft alignment. To evaluate the learned features in a standardized setting, we construct a new cross-view benchmark using Ego-Exo4D, covering cross-view retrieval, action recognition, and skill assessment tasks. Our framework demonstrates superior cross-view understanding compared to previous view-invariant learning and ego video representation learning approaches, and opens the door to bringing vast amounts of traditional third-person video to bear on the more nascent first-person setting.
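    As a loose illustration of the contrastive alignment idea (not the authors' implementation), the sketch below pairs each ego clip with a synthetic exo positive produced by a stand-in translator and applies an InfoNCE loss; the encoders, `rst_translate`, and the temperature are all assumed placeholders.

    ```python
    import torch
    import torch.nn.functional as F

    # Illustrative sketch of dual-encoder alignment with synthetic feature
    # augmentation. `rst_translate` is a stand-in for the diffusion-based RST:
    # here just a linear map used to generate cross-view pseudo-positives.
    dim, batch = 128, 32
    ego_encoder = torch.nn.Linear(dim, dim)    # placeholder encoders
    exo_encoder = torch.nn.Linear(dim, dim)
    rst_translate = torch.nn.Linear(dim, dim)  # stand-in for the RST

    ego_feats = torch.randn(batch, dim)        # unpaired ego clip features
    z_ego = F.normalize(ego_encoder(ego_feats), dim=-1)

    # Synthetic augmentation: translate ego features into the exo view so each
    # ego clip gains a pseudo-paired exo positive, then encode and normalize.
    z_exo_synth = F.normalize(exo_encoder(rst_translate(ego_feats)), dim=-1)

    # InfoNCE: the i-th ego clip should match its own synthetic exo feature
    # and repel the rest of the batch (tau is an assumed hyperparameter).
    tau = 0.07
    logits = z_ego @ z_exo_synth.t() / tau
    labels = torch.arange(batch)
    loss = F.cross_entropy(logits, labels)
    loss.backward()
    ```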
  3. Storkel, Holly L; Mills, Monique M (Ed.)
    Purpose: The purpose of this assessment-focused clinical focus article is to increase familiarity with African American English (AAE)–speaking children’s pattern of language use in third-person singular contexts and to discuss implications for speech-language assessments of developing AAE-speaking children. Method: The clinical focus draws on descriptive case study data from four typically developing child speakers of AAE who are between the ages of 3 and 5 years. The children’s data from three different sources—sentence imitation, story retell, and play-based language samples—were subjected to linguistic analyses. Results: The three sources of linguistic data offered different insights into the children’s production of –s and other linguistic patterns in third-person singular contexts. Conclusions: This study underscores the importance of exploring developing child AAE from a descriptive approach to reveal different types of information about patterns of morphological marking in different linguistic contexts, which is crucial in assessing developing AAE. Implications for language assessment are discussed.
    We consider the imitation learning problem of learning a policy in a Markov Decision Process (MDP) setting where the reward function is not given, but demonstrations from experts are available. Although the goal of imitation learning is to learn a policy that produces behaviors nearly as good as the experts’ for a desired task, assumptions of consistent optimality for demonstrated behaviors are often violated in practice. Finding a policy that is distributionally robust against noisy demonstrations, based on an adversarial construction, potentially solves this problem by avoiding optimistic generalizations of the demonstrated data. This paper studies Distributionally Robust Imitation Learning (DRoIL) and establishes a close connection between DRoIL and Maximum Entropy Inverse Reinforcement Learning. We show that DRoIL can be seen as a framework that maximizes a generalized concept of entropy. We develop a novel approach to transform the objective function into a convex optimization problem over a polynomial number of variables for a class of loss functions that are additive over state and action spaces. Our approach lets us optimize both stationary and non-stationary policies and, unlike prevalent previous methods, does not require repeatedly solving an inner reinforcement learning problem. We experimentally show the significant benefits of DRoIL’s new optimization method on synthetic data and a highway driving environment.
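    To ground the maximum-entropy connection, here is a hedged sketch (not DRoIL itself) of a convex program over occupancy measures: maximize entropy subject to Bellman flow and expert feature-matching constraints. The dynamics `P`, features `F_feat`, and the "expert" are synthetic stand-ins.

    ```python
    import cvxpy as cp
    import numpy as np

    # Maximum-entropy occupancy matching in a small MDP: an illustration of
    # the entropy-maximization view mentioned above, not DRoIL's objective.
    rng = np.random.default_rng(1)
    nS, nA, nF, gamma = 4, 2, 3, 0.9

    P = rng.dirichlet(np.ones(nS), size=(nS, nA))   # P[s, a, s'] transitions
    F_feat = rng.standard_normal((nS, nA, nF))      # per-(s, a) features
    rho0 = np.full(nS, 1.0 / nS)                    # initial distribution

    # Build a feasible "expert": occupancy of a random policy pi, solving
    # d = (1 - gamma) * rho0 + gamma * P_pi^T d, then mu(s, a) = d(s) pi(s, a).
    pi = rng.dirichlet(np.ones(nA), size=nS)
    P_pi = np.einsum('sa,sap->sp', pi, P)
    d = np.linalg.solve(np.eye(nS) - gamma * P_pi.T, (1 - gamma) * rho0)
    mu_expert = d[:, None] * pi
    f_expert = np.tensordot(mu_expert, F_feat, axes=([0, 1], [0, 1]))

    mu = cp.Variable((nS, nA), nonneg=True)         # occupancy measure

    # Bellman flow: total mass leaving s' equals discounted mass flowing in.
    constraints = [
        cp.sum(mu[sp, :]) ==
        (1 - gamma) * rho0[sp] + gamma * cp.sum(cp.multiply(mu, P[:, :, sp]))
        for sp in range(nS)
    ]
    # Match the expert's feature expectations, one equality per feature.
    for k in range(nF):
        constraints.append(
            cp.sum(cp.multiply(mu, F_feat[:, :, k])) == f_expert[k])

    problem = cp.Problem(cp.Maximize(cp.sum(cp.entr(mu))), constraints)
    problem.solve()
    print("status:", problem.status)
    ```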
    When applying imitation learning techniques to fit a policy from expert demonstrations, one can take advantage of prior stability/robustness assumptions on the expert's policy and incorporate such control-theoretic prior knowledge explicitly into the learning process. In this paper, we formulate the imitation learning of linear policies as a constrained optimization problem, and present efficient methods which can be used to enforce stability and robustness constraints during the learning process. Specifically, we show that one can guarantee the closed-loop stability and robustness by posing linear matrix inequality (LMI) constraints on the fitted policy. Then both the projected gradient descent method and the alternating direction method of multipliers (ADMM) method can be applied to solve the resultant constrained policy fitting problem. Finally, we provide numerical results to demonstrate the effectiveness of our methods in producing linear policies with various stability and robustness guarantees.
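    For flavor, a minimal sketch of constrained policy fitting under assumptions not taken from the paper: we fit a linear gain to synthetic expert data while enforcing sigma_max(A + BK) <= 1 - eps, a conservative spectral-norm bound that is itself expressible as an LMI via a Schur complement and certifies closed-loop Schur stability. The matrices `A`, `B` and the data `(X, U)` are stand-ins.

    ```python
    import cvxpy as cp
    import numpy as np

    # Fit a linear state-feedback gain K to expert (state, action) data while
    # enforcing a convex stability certificate: sigma_max(A + B K) < 1 implies
    # the spectral radius of A + B K is below 1, so x_{t+1} = (A + B K) x_t is
    # Schur stable. This is a conservative surrogate, not the paper's exact
    # formulation, and all data below are synthetic.
    rng = np.random.default_rng(2)
    n, m, T = 3, 2, 50
    A = 0.9 * rng.standard_normal((n, n)) / np.sqrt(n)
    B = rng.standard_normal((n, m))

    K_true = 0.3 * rng.standard_normal((m, n))       # pretend expert gain
    X = rng.standard_normal((T, n))                  # observed states
    U = X @ K_true.T + 0.05 * rng.standard_normal((T, m))  # noisy actions

    K = cp.Variable((m, n))
    eps = 1e-3
    fit = cp.norm(X @ K.T - U, 'fro')                # policy fitting loss
    stability = [cp.sigma_max(A + B @ K) <= 1 - eps] # convex stability bound

    problem = cp.Problem(cp.Minimize(fit), stability)
    problem.solve()

    rho = max(abs(np.linalg.eigvals(A + B @ K.value)))
    print("closed-loop spectral radius:", rho)       # certified < 1
    ```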