skip to main content
US FlagAn official website of the United States government
dot gov icon
Official websites use .gov
A .gov website belongs to an official government organization in the United States.
https lock icon
Secure .gov websites use HTTPS
A lock ( lock ) or https:// means you've safely connected to the .gov website. Share sensitive information only on official, secure websites.


Title: Towards Robust Skill Generalization: Unifying Learning from Demonstration and Motion Planning
In this paper, we present Combined Learning from demonstration And Motion Planning (CLAMP) as an efficient approach to skill learning and generalizable skill reproduction. CLAMP combines the strengths of Learning from Demonstration (LfD) and motion planning into a unifying framework. We carry out probabilistic inference to find trajectories which are optimal with respect to a given skill and also feasible in different scenarios. We use factor graph optimization to speed up inference. To encode optimality, we provide a new probabilistic skill model based on a stochastic dynamical system. This skill model requires minimal parameter tuning to learn, is suitable to encode skill constraints, and allows efficient inference. Preliminary experimental results showing skill generalization over initial robot state and unforeseen obstacles are presented.  more » « less
Award ID(s):
1637562
PAR ID:
10076786
Author(s) / Creator(s):
Date Published:
Journal Name:
Conference on Robot Learning
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. Learning a robot motor skill from scratch is impractically slow; so much so that in practice, learning must typically be bootstrapped using human demonstration. However, relying on human demonstration necessarily degrades the autonomy of robots that must learn a wide variety of skills over their operational lifetimes. We propose using kinematic motion planning as a completely autonomous, sample efficient way to bootstrap motor skill learning for object manipulation. We demonstrate the use of motion planners to bootstrap motor skills in two complex object manipulation scenarios with different policy representations: opening a drawer with a dynamic movement primitive representation, and closing a microwave door with a deep neural network policy. We also show how our method can bootstrap a motor skill for the challenging dynamic task of learning to hit a ball off a tee, where a kinematic plan based on treating the scene as static is insufficient to solve the task, but sufficient to bootstrap a more dynamic policy. In all three cases, our method is competitive with human-demonstrated initialization, and significantly outperforms starting with a random policy. This approach enables robots to to efficiently and autonomously learn motor policies for dynamic tasks without human demonstration. 
    more » « less
  2. This thesis summary presents research focused on incorporating high-level abstract behavioral requirements, called 'conceptual constraints', into the modeling processes of robot Learning from Demonstration (LfD) techniques. This idea is realized via an LfD algorithm called Concept Constrained Learning from Demonstration. This algorithm encodes motion planning constraints as temporally associated logical formulae of Boolean operators that enforce high-level constraints over portions of the robot's motion plan during learned skill execution. This results in more easily trained, more robust, and safer learned skills. Current work focuses on automating constraint discovery, introducing conceptual constraints into human-aware motion planning algorithms, and expanding upon trajectory alignment techniques for LfD. Future work will focus on how concept constrained algorithms and models are best incorporated into effective interfaces for end-users. 
    more » « less
  3. This paper focuses on inverse reinforcement learning (IRL) to enable safe and efficient autonomous navigation in unknown partially observable environments. The objective is to infer a cost function that explains expert-demonstrated navigation behavior while relying only on the observations and state-control trajectory used by the expert. We develop a cost function representation composed of two parts: a probabilistic occupancy encoder, with recurrent dependence on the observation sequence, and a cost encoder, defined over the occupancy features. The representation parameters are optimized by differentiating the error between demonstrated controls and a control policy computed from the cost encoder. Such differentiation is typically computed by dynamic programming through the value function over the whole state space. We observe that this is inefficient in large partially observable environments because most states are unexplored. Instead, we rely on a closed-form subgradient of the cost-to-go obtained only over a subset of promising states via an efficient motion-planning algorithm such as A* or RRT. Our experiments show that our model exceeds the accuracy of baseline IRL algorithms in robot navigation tasks, while substantially improving the efficiency of training and test-time inference. 
    more » « less
  4. null (Ed.)
    The objective of this work is to augment the basic abilities of a robot by learning to use sensorimotor primitives to solve complex long-horizon manipulation problems. This requires flexible generative planning that can combine primitive abilities in novel combinations and, thus, generalize across a wide variety of problems. In order to plan with primitive actions, we must have models of the actions: under what circumstances will executing this primitive successfully achieve some particular effect in the world? We use, and develop novel improvements to, state-of-the-art methods for active learning and sampling. We use Gaussian process methods for learning the constraints on skill effectiveness from small numbers of expensive-to-collect training examples. In addition, we develop efficient adaptive sampling methods for generating a comprehensive and diverse sequence of continuous candidate control parameter values (such as pouring waypoints for a cup) during planning. These values become end-effector goals for traditional motion planners that then solve for a full robot motion that performs the skill. By using learning and planning methods in conjunction, we take advantage of the strengths of each and plan for a wide variety of complex dynamic manipulation tasks. We demonstrate our approach in an integrated system, combining traditional robotics primitives with our newly learned models using an efficient robot task and motion planner. We evaluate our approach both in simulation and in the real world through measuring the quality of the selected primitive actions. Finally, we apply our integrated system to a variety of long-horizon simulated and real-world manipulation problems. 
    more » « less
  5. Despite many successful examples in which probabilistic inference can account for perception, we have little understanding of how the brain represents and uses structured priors that capture the complexity of natural input statistics. Here we construct a recurrent circuit model that can implicitly represent priors over latent variables, and combine them with sensory and contextual sources of information to encode task-specific posteriors. Inspired by the recent success of diffusion models as means of learning and using priors over images, our model uses dendritic nonlinearities optimized for denoising, and stochastic somatic integration with the degree of noise modulated by an oscillating global signal. Combining these elements into a recurrent network yields a stochastic dynamical system that samples from the prior at a rate prescribed by the period of the global oscillator. Additional inputs reflecting sensory or top-down contextual information alter these dynamics to generate samples from the corresponding posterior, with different input gating patterns selecting different inference tasks. We demonstrate that this architecture can sample from low dimensional nonlinear manifolds and multimodal posteriors. Overall, the model provides a new framework for circuit-level representation of probabilistic information, in a format that facilitates flexible inference. 
    more » « less