skip to main content


Title: A structured prediction approach for robot imitation learning

We propose a structured prediction approach for robot imitation learning from demonstrations. Among various tools for robot imitation learning, supervised learning has been observed to have a prominent role. Structured prediction is a form of supervised learning that enables learning models to operate on output spaces with complex structures. Through the lens of structured prediction, we show how robots can learn to imitate trajectories belonging to not only Euclidean spaces but also Riemannian manifolds. Exploiting ideas from information theory, we propose a class of loss functions based on the f-divergence to measure the information loss between the demonstrated and reproduced probabilistic trajectories. Different types of f-divergence will result in different policies, which we call imitation modes. Furthermore, our approach enables the incorporation of spatial and temporal trajectory modulation, which is necessary for robots to be adaptive to the change in working conditions. We benchmark our algorithm against state-of-the-art methods in terms of trajectory reproduction and adaptation. The quantitative evaluation shows that our approach outperforms other algorithms regarding both accuracy and efficiency. We also report real-world experimental results on learning manifold trajectories in a polishing task with a KUKA LWR robot arm, illustrating the effectiveness of our algorithmic framework.

 
more » « less
NSF-PAR ID:
10472437
Author(s) / Creator(s):
 ;  ;  ;  ;  ;  
Publisher / Repository:
SAGE Publications
Date Published:
Journal Name:
The International Journal of Robotics Research
Volume:
43
Issue:
2
ISSN:
0278-3649
Format(s):
Medium: X Size: p. 113-133
Size(s):
p. 113-133
Sponsoring Org:
National Science Foundation
More Like this
  1. Designing reward functions is a difficult task in AI and robotics. The complex task of directly specifying all the desirable behaviors a robot needs to optimize often proves challenging for humans. A popular solution is to learn reward functions using expert demonstrations. This approach, however, is fraught with many challenges. Some methods require heavily structured models, for example, reward functions that are linear in some predefined set of features, while others adopt less structured reward functions that may necessitate tremendous amounts of data. Moreover, it is difficult for humans to provide demonstrations on robots with high degrees of freedom, or even quantifying reward values for given trajectories. To address these challenges, we present a preference-based learning approach, where human feedback is in the form of comparisons between trajectories. We do not assume highly constrained structures on the reward function. Instead, we employ a Gaussian process to model the reward function and propose a mathematical formulation to actively fit the model using only human preferences. Our approach enables us to tackle both inflexibility and data-inefficiency problems within a preference-based learning framework. We further analyze our algorithm in comparison to several baselines on reward optimization, where the goal is to find the optimal robot trajectory in a data-efficient way instead of learning the reward function for every possible trajectory. Our results in three different simulation experiments and a user study show our approach can efficiently learn expressive reward functions for robotic tasks, and outperform the baselines in both reward learning and reward optimization.

     
    more » « less
  2. Human-robot collaboration (HRC) has become an integral element of many industries, including manufacturing. A fundamental requirement for safe HRC is to understand and predict human intentions and trajectories, especially when humans and robots operate in close proximity. However, predicting both human intention and trajectory components simultaneously remains a research gap. In this paper, we have developed a multi-task learning (MTL) framework designed for HRC, which processes motion data from both human and robot trajectories. The first task predicts human trajectories, focusing on reconstructing the motion sequences. The second task employs supervised learning, specifically a Support Vector Machine (SVM), to predict human intention based on the latent representation. In addition, an unsupervised learning method, Hidden Markov Model (HMM), is utilized for human intention prediction that offers a different approach to decoding the latent features. The proposed framework uses MTL to understand human behavior in complex manufacturing environments. The novelty of the work includes the use of a latent representation to capture temporal dynamics in human motion sequences and a comparative analysis of various encoder architectures. We validate our framework through a case study focused on a HRC disassembly desktop task. The findings confirm the system's capability to accurately predict both human intentions and trajectories. 
    more » « less
  3. null (Ed.)
    Imitation learning (IL) aims to learn a policy from expert demonstrations that minimizes the discrepancy between the learner and expert behaviors. Various imitation learning algorithms have been proposed with different pre-determined divergences to quantify the discrepancy. This naturally gives rise to the following question: Given a set of expert demonstrations, which divergence can recover the expert policy more accurately with higher data efficiency? In this work, we propose f-GAIL – a new generative adversarial imitation learning model – that automatically learns a discrepancy measure from the f-divergence family as well as a policy capable of producing expert-like behaviors. Compared with IL baselines with various predefined divergence measures, f-GAIL learns better policies with higher data efficiency in six physics-based control tasks. 
    more » « less
  4. Abstract

    Training language-conditioned policies is typically time-consuming and resource-intensive. Additionally, the resulting controllers are tailored to the specific robot they were trained on, making it difficult to transfer them to other robots with different dynamics. To address these challenges, we propose a new approach called Hierarchical Modularity, which enables more efficient training and subsequent transfer of such policies across different types of robots. The approach incorporates Supervised Attention which bridges the gap between modular and end-to-end learning by enabling the re-use of functional building blocks. In this contribution, we build upon our previous work, showcasing the extended utilities and improved performance by expanding the hierarchy to include new tasks and introducing an automated pipeline for synthesizing a large quantity of novel objects. We demonstrate the effectiveness of this approach through extensive simulated and real-world robot manipulation experiments.

     
    more » « less
  5. Legged locomotion is a highly promising but under–researched subfield within the field of soft robotics. The compliant limbs of soft-limbed robots offer numerous benefits, including the ability to regulate impacts, tolerate falls, and navigate through tight spaces. These robots have the potential to be used for various applications, such as search and rescue, inspection, surveillance, and more. The state-of-the-art still faces many challenges, including limited degrees of freedom, a lack of diversity in gait trajectories, insufficient limb dexterity, and limited payload capabilities. To address these challenges, we develop a modular soft-limbed robot that can mimic the locomotion of pinnipeds. By using a modular design approach, we aim to create a robot that has improved degrees of freedom, gait trajectory diversity, limb dexterity, and payload capabilities. We derive a complete floating-base kinematic model of the proposed robot and use it to generate and experimentally validate a variety of locomotion gaits. Results show that the proposed robot is capable of replicating these gaits effectively. We compare the locomotion trajectories under different gait parameters against our modeling results to demonstrate the validity of our proposed gait models. 
    more » « less