Title: Learning From Sparse Demonstrations
This paper develops the method of Continuous Pontryagin Differentiable Programming (Continuous PDP), which enables a robot to learn an objective function from a few sparsely demonstrated keyframes. The keyframes, labeled with time stamps, are the desired task-space outputs, which a robot is expected to follow sequentially. The time stamps of the keyframes can differ from the time of the robot’s actual execution. The method jointly finds an objective function and a time-warping function such that the robot’s resulting trajectory sequentially follows the keyframes with minimal discrepancy loss. Continuous PDP minimizes the discrepancy loss using projected gradient descent, by efficiently computing the gradient of the robot trajectory with respect to the unknown parameters. The method is first evaluated on a simulated robot arm and then applied to a 6-DoF quadrotor to learn an objective function for motion planning in unmodeled environments. The results show the efficiency of the method, its ability to handle time misalignment between keyframes and robot execution, and the generalization of the learned objective to unseen motion conditions.
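To make the idea of jointly fitting an objective parameter and a time-warping parameter concrete, here is a minimal sketch of projected gradient descent on a keyframe discrepancy loss. Everything in it is an assumption for illustration and is not taken from the paper: the toy closed-form output `y = p * (1 - exp(-beta * t))` stands in for the robot's optimal trajectory, `p` plays the role of the objective parameter, `beta` is a linear time-warping factor, and the function names are hypothetical. The gradient is written out analytically for this toy model; it does not implement Continuous PDP, which obtains the trajectory gradient by differentiating the optimal control system.

```python
import math


def trajectory_output(p, beta, t):
    """Toy output at keyframe time t after linear time warping tau = beta * t."""
    return p * (1.0 - math.exp(-beta * t))


def discrepancy_loss(p, beta, keyframes):
    """Sum of squared gaps between the trajectory and the keyframes."""
    return sum((trajectory_output(p, beta, t) - y) ** 2 for t, y in keyframes)


def loss_gradient(p, beta, keyframes):
    """Analytic gradient of the loss with respect to (p, beta) for the toy model."""
    grad_p = 0.0
    grad_beta = 0.0
    for t, y in keyframes:
        decay = math.exp(-beta * t)
        residual = p * (1.0 - decay) - y
        grad_p += 2.0 * residual * (1.0 - decay)       # d y / d p
        grad_beta += 2.0 * residual * p * t * decay    # d y / d beta
    return grad_p, grad_beta


def project_beta(beta, lower=0.1, upper=10.0):
    """Projection step: keep the time-warping factor in an assumed feasible range."""
    return min(max(beta, lower), upper)


def fit_keyframes(keyframes, p=1.0, beta=1.0, step=0.05, iterations=20000):
    """Projected gradient descent on the discrepancy loss."""
    for _ in range(iterations):
        grad_p, grad_beta = loss_gradient(p, beta, keyframes)
        p -= step * grad_p
        beta = project_beta(beta - step * grad_beta)
    return p, beta


def make_keyframes(p_true=2.0, beta_true=1.5, times=(0.2, 0.5, 1.0, 2.0)):
    """Synthetic sparse keyframes generated from known (hypothetical) parameters."""
    return [(t, trajectory_output(p_true, beta_true, t)) for t in times]


if __name__ == "__main__":
    frames = make_keyframes()
    p_hat, beta_hat = fit_keyframes(frames)
    print(p_hat, beta_hat, discrepancy_loss(p_hat, beta_hat, frames))
```

The keyframe time stamps here are deliberately generated with a warping factor different from the initial guess, so the fit has to recover the time alignment as well as the objective parameter.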
Award ID(s):
1837515
PAR ID:
10471787
Publisher / Repository:
IEEE
Journal Name:
IEEE Transactions on Robotics
Volume:
39
Issue:
1
ISSN:
1552-3098
Page Range / eLocation ID:
645 to 664
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. This paper proposes a dynamic-system-based learning-from-demonstration approach to teach a robot activities of daily living. The approach takes inspiration from human movement literature to formulate trajectory learning as an optimal control problem. We assume a weighted combination of basis objective functions is the true objective function for a demonstrated motion. We derive basis objective functions analogous to those in human movement literature to optimize the robot’s motion. This method aims to naturally adapt the learned motion in different situations. To validate our approach, we learn motions from two categories: 1) commonly prescribed therapeutic exercises and 2) tea making. We show the reproduction accuracy of our method and compare torque requirements to the dynamic motion primitive for each motion, with and without an added load.
  2. Robots and humans closely working together within dynamic environments must be able to continuously look ahead and identify potential collisions within their ever-changing environment. To enable the robot to act upon such situational awareness, its controller requires an iterative collision detection capability that will allow for computationally efficient Proactive Adaptive Collaboration Intelligence (PACI) to ensure safe interactions. In this paper, an algorithm is developed to evaluate a robot’s trajectory, evaluate the dynamic environment that the robot operates in, and predict collisions between the robot and dynamic obstacles in its environment. This algorithm takes as input the joint motion data of predefined robot execution plans and constructs a sweep of the robot’s instantaneous poses throughout time. The sweep models the trajectory as a point cloud containing all locations occupied by the robot and the time at which they will be occupied. To reduce the computational burden, Coons patches are leveraged to approximate the robot’s instantaneous poses. In parallel, the algorithm creates a similar sweep to model any human(s) and other obstacles being tracked in the operating environment. Overlaying the temporal mappings of the sweeps reveals anticipated collisions that will occur if the robot and human do not proactively modify their motion. The algorithm is designed to feed into a segmentation and switching logic framework and provide real-time proactive-n-reactive behavior for different levels of human-robot interactions, while maintaining safety and production efficiency. To evaluate the predictive collision detection approach, multiple test cases are presented to quantify the computational speed and accuracy in predicting collisions.
  3. This paper proposes a novel approach that enables a robot to learn an objective function incrementally from human directional corrections. Existing methods learn from human magnitude corrections; since a human needs to carefully choose the magnitude of each correction, those methods can easily lead to over-corrections and learning inefficiency. The proposed method only requires human directional corrections, that is, corrections that only indicate the direction of an input change without indicating its magnitude. We only assume that each correction, regardless of its magnitude, points in a direction that improves the robot’s current motion relative to an unknown objective function. The allowable corrections satisfying this assumption account for half of the input space, as opposed to the magnitude corrections, which have to lie in a shrinking level set. For each directional correction, the proposed method updates the estimate of the objective function based on a cutting plane method, which has a geometric interpretation. We have established theoretical results to show the convergence of the learning process. The proposed method has been tested in numerical examples, a user study on two human-robot games, and a real-world quadrotor experiment. The results confirm the convergence of the proposed method and further show that the method is significantly more effective (higher success rate), efficient/effortless (fewer human corrections needed), and potentially more accessible (fewer early wasted trials) than the state-of-the-art robot learning frameworks.
  4. Robot arms should be able to learn new tasks. One framework here is reinforcement learning, where the robot is given a reward function that encodes the task, and the robot autonomously learns actions to maximize its reward. Existing approaches to reinforcement learning often frame this problem as a Markov decision process, and learn a policy (or a hierarchy of policies) to complete the task. These policies reason over hundreds of fine-grained actions that the robot arm needs to take: e.g., moving slightly to the right or rotating the end-effector a few degrees. But the manipulation tasks that we want robots to perform can often be broken down into a small number of high-level motions: e.g., reaching an object or turning a handle. In this paper we therefore propose a waypoint-based approach for model-free reinforcement learning. Instead of learning a low-level policy, the robot now learns a trajectory of waypoints, and then interpolates between those waypoints using existing controllers. Our key novelty is framing this waypoint-based setting as a sequence of multi-armed bandits: each bandit problem corresponds to one waypoint along the robot’s motion. We theoretically show that an ideal solution to this reformulation has lower regret bounds than standard frameworks. We also introduce an approximate posterior sampling solution that builds the robot’s motion one waypoint at a time. Results across benchmark simulations and two real-world experiments suggest that this proposed approach learns new tasks more quickly than state-of-the-art baselines. See our website here: https://collab.me.vt.edu/rl-waypoints/ 
  5. In this paper, we examine the problem of push recovery for bipedal robot locomotion and present a reactive decision-making and robust planning framework for locomotion resilient to external perturbations. Rejecting perturbations is an essential capability of bipedal robots and has been widely studied in the locomotion literature. However, adversarial disturbances and aggressive turning can lead to negative lateral step width (i.e., crossed-leg scenarios) with unstable motions and self-collision risks. These motion planning problems are computationally difficult and have not been explored under a hierarchically integrated task and motion planning method. We explore a planning and decision-making framework that closely ties linear-temporal-logic-based reactive synthesis with trajectory optimization incorporating the robot’s full-body dynamics, kinematics, and leg collision avoidance constraints. Between the high-level discrete symbolic decision-making and the low-level continuous motion planning, behavior trees serve as a reactive interface to handle perturbations occurring at any time of the locomotion process. Our experimental results show the efficacy of our method in generating resilient recovery behaviors in response to diverse perturbations from any direction with bounded magnitudes. 
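The cutting-plane update described in item 3 above can be illustrated with a small sketch. It is a simplified stand-in under stated assumptions, not the paper's algorithm: the unknown objective parameter `theta_true` is assumed to lie in an axis-aligned box, the current estimate is the box center, and the simulated human only gives axis-aligned directional corrections that point toward `theta_true`. Each correction defines a half-space through the current estimate, and the estimate set is cut down to the half that agrees with the correction. All function names are hypothetical.

```python
def directional_correction(estimate, theta_true, axis):
    """Simulated human: a unit direction along one axis that points toward theta_true.

    Only the direction is returned; no magnitude information is used.
    """
    direction = [0.0] * len(estimate)
    direction[axis] = 1.0 if theta_true[axis] >= estimate[axis] else -1.0
    return direction


def cut_box(lower, upper, center, direction):
    """Keep the part of the box where direction . (theta - center) >= 0.

    Because the assumed corrections are axis-aligned, the remaining set is again a box.
    """
    new_lower, new_upper = list(lower), list(upper)
    for i, d in enumerate(direction):
        if d > 0:
            new_lower[i] = center[i]
        elif d < 0:
            new_upper[i] = center[i]
    return new_lower, new_upper


def learn_from_corrections(theta_true, lower, upper, rounds):
    """Repeatedly cut the hypothesis box through its center using directional corrections."""
    lower, upper = list(lower), list(upper)
    for k in range(rounds):
        center = [(lo + hi) / 2.0 for lo, hi in zip(lower, upper)]
        direction = directional_correction(center, theta_true, k % len(center))
        lower, upper = cut_box(lower, upper, center, direction)
    estimate = [(lo + hi) / 2.0 for lo, hi in zip(lower, upper)]
    return estimate, lower, upper


if __name__ == "__main__":
    est, lo_bound, hi_bound = learn_from_corrections(
        theta_true=[0.3, -0.7], lower=[-1.0, -1.0], upper=[1.0, 1.0], rounds=20
    )
    print(est, lo_bound, hi_bound)
```

Each cut passes through the current estimate, so every correction removes half of the remaining box along one axis while the true parameter always stays inside it.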