Title: Learning From Human Directional Corrections
This paper proposes a novel approach that enables a robot to learn an objective function incrementally from human directional corrections. Existing methods learn from human magnitude corrections; because a human needs to carefully choose the magnitude of each correction, those methods can easily lead to over-corrections and learning inefficiency. The proposed method only requires human directional corrections: corrections that indicate only the direction of an input change, not its magnitude. We assume only that each correction, regardless of its magnitude, points in a direction that improves the robot's current motion relative to an unknown objective function. The corrections satisfying this assumption account for half of the input space, whereas magnitude corrections must lie in a shrinking level set. For each directional correction, the proposed method updates the estimate of the objective function based on a cutting plane method, which has a geometric interpretation. We establish theoretical results showing the convergence of the learning process. The proposed method has been tested in numerical examples, a user study on two human-robot games, and a real-world quadrotor experiment. The results confirm the convergence of the proposed method and further show that it is significantly more effective (higher success rate), more efficient and effortless (fewer human corrections needed), and potentially more accessible (fewer early wasted trials) than state-of-the-art robot learning frameworks.
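The update described above can be pictured as a standard cutting-plane iteration in the space of objective-function weights. The sketch below assumes a cost that is linear in known features, so that each directional correction induces a half-space cut on the weight vector, and it re-estimates the weights as the Chebyshev center of the remaining search region. The feature-gradient h_k, the initial box, and the center rule are illustrative assumptions, not the paper's exact formulation.

```python
# Minimal sketch of a cutting-plane style update of objective weights theta from
# directional corrections. Assumes a cost linear in features, so each correction
# yields a half-space cut h^T theta <= 0; the details are illustrative.
import numpy as np
from scipy.optimize import linprog

def chebyshev_center(H, b):
    """Center of the largest ball inside the polytope {theta : H theta <= b}."""
    n = H.shape[1]
    norms = np.linalg.norm(H, axis=1, keepdims=True)
    # Variables: [theta (n), r (1)]; maximize r  <=>  minimize -r.
    A_ub = np.hstack([H, norms])
    c = np.zeros(n + 1); c[-1] = -1.0
    res = linprog(c, A_ub=A_ub, b_ub=b, bounds=[(None, None)] * n + [(0, None)])
    return res.x[:n]

def update_with_correction(H, b, h):
    """Add the cut h^T theta <= 0 implied by one directional correction."""
    H = np.vstack([H, h])
    b = np.append(b, 0.0)
    return H, b, chebyshev_center(H, b)

# Start from a box |theta_i| <= 1 as the initial search region for the weights.
n = 3
H = np.vstack([np.eye(n), -np.eye(n)])
b = np.ones(2 * n)

# Example: one correction whose feature-gradient direction is h_k (hypothetical values).
h_k = np.array([0.5, -1.0, 0.2])
H, b, theta_hat = update_with_correction(H, b, h_k)
```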
Award ID(s): 1837515
PAR ID: 10471788
Author(s) / Creator(s):
Publisher / Repository: IEEE
Date Published:
Journal Name: IEEE Transactions on Robotics
Volume: 39
Issue: 1
ISSN: 1552-3098
Page Range / eLocation ID: 625 to 644
Format(s): Medium: X
Sponsoring Org: National Science Foundation
More Like this
  1. Robots can learn to imitate humans by inferring what the human is optimizing for. One common framework for this is Bayesian reward learning, where the robot treats the human's demonstrations and corrections as observations of their underlying reward function. Unfortunately, this inference is doubly intractable: the robot must reason over all the trajectories the person could have provided and all the rewards the person could have in mind. Prior work uses existing robotic tools to approximate this normalizer. In this letter, we group previous approaches into three fundamental classes and analyze the theoretical pros and cons of each. We then leverage recent research from the statistics community to introduce Double MH reward learning, a Monte Carlo method for asymptotically learning the human's reward in continuous spaces. We extend Double MH to conditionally independent settings (where each human correction is viewed as completely separate) and conditionally dependent environments (where the human's current correction may build on previous inputs). Across simulations and user studies, our proposed approach infers the human's reward parameters more accurately than the alternate approximations when learning from either demonstrations or corrections.
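As a rough illustration of the auxiliary-variable trick behind exchange/double Metropolis-Hastings methods, the sketch below shows one update for a Boltzmann-rational human model, where the auxiliary trajectory makes the intractable normalizers cancel in the acceptance ratio. The functions reward and sample_trajectory_given, the flat prior, and the Gaussian random-walk proposal are assumptions, not the letter's exact algorithm.

```python
# Hedged sketch of one exchange / double-MH step for reward learning with a
# Boltzmann-rational human model P(xi | theta) ∝ exp(R(xi, theta)) / Z(theta).
# The auxiliary trajectory makes the intractable Z's cancel in the log-ratio.
import numpy as np

def double_mh_step(theta, xi_human, reward, sample_trajectory_given, rng, step=0.1):
    # Symmetric Gaussian random-walk proposal; a flat prior is assumed, so the
    # prior terms drop out of the ratio.
    theta_prop = theta + step * rng.standard_normal(theta.shape)
    xi_aux = sample_trajectory_given(theta_prop)   # auxiliary draw ~ P(. | theta_prop)
    log_ratio = (reward(xi_human, theta_prop) - reward(xi_human, theta)
                 + reward(xi_aux, theta) - reward(xi_aux, theta_prop))
    if np.log(rng.uniform()) < log_ratio:
        return theta_prop
    return theta
```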
  2. Humans can leverage physical interaction to teach robot arms. This physical interaction takes multiple forms depending on the task, the user, and what the robot has learned so far. State-of-the-art approaches focus on learning from a single modality or combine only some interaction types, often by assuming that the robot has prior information about the features of the task and the reward structure. By contrast, in this article, we introduce an algorithmic formalism that unites learning from demonstrations, corrections, and preferences. Our approach makes no assumptions about the tasks the human wants to teach the robot; instead, we learn a reward model from scratch by comparing the human's input to nearby alternatives, i.e., trajectories close to the human's feedback. We first derive a loss function that trains an ensemble of reward models to match the human's demonstrations, corrections, and preferences. The type and order of feedback are up to the human teacher: we enable the robot to collect this feedback passively or actively. We then apply constrained optimization to convert our learned reward into a desired robot trajectory. Through simulations and a user study, we demonstrate that our proposed approach learns manipulation tasks from physical human interaction more accurately than existing baselines, particularly when the robot is faced with new or unexpected objectives. Videos of our user study are available at https://youtu.be/FSUJsTYvEKU
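One way to read "comparing the human's input to nearby alternatives" is as a preference-style loss in which the human's trajectory should score higher than perturbed copies of itself. The sketch below is a hedged illustration of that idea; reward_model, the Gaussian perturbations, and the Bradley-Terry form of the loss are assumptions, not the authors' exact formulation (which trains an ensemble of reward models over demonstrations, corrections, and preferences).

```python
# Hedged sketch: score the human's trajectory above nearby alternatives with a
# preference (Bradley-Terry style) loss. Names and the loss form are illustrative.
import numpy as np

def nearby_alternatives(xi_human, num=10, scale=0.05, rng=None):
    """Sample perturbed copies of the human's trajectory as comparison points."""
    rng = np.random.default_rng(0) if rng is None else rng
    return [xi_human + scale * rng.standard_normal(xi_human.shape) for _ in range(num)]

def preference_loss(w, reward_model, xi_human, xi_alternatives):
    """Mean negative log-probability that the human's trajectory beats each alternative."""
    r_h = reward_model(xi_human, w)
    losses = []
    for xi_alt in xi_alternatives:
        r_a = reward_model(xi_alt, w)
        # P(human preferred) = exp(r_h) / (exp(r_h) + exp(r_a))
        losses.append(np.log(1.0 + np.exp(r_a - r_h)))
    return float(np.mean(losses))
```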
  3. This paper proposes a dynamic-system-based learning-from-demonstration approach to teach a robot activities of daily living. The approach takes inspiration from the human movement literature to formulate trajectory learning as an optimal control problem. We assume that a weighted combination of basis objective functions is the true objective function for a demonstrated motion, and we derive basis objective functions analogous to those in the human movement literature to optimize the robot's motion. This method aims to adapt the learned motion naturally to different situations. To validate our approach, we learn motions from two categories: 1) commonly prescribed therapeutic exercises and 2) tea making. We show the reproduction accuracy of our method and compare torque requirements to the dynamic motion primitive for each motion, with and without an added load.
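A hedged sketch of the "weighted combination of basis objective functions" idea: the total trajectory cost is a weighted sum of integrated kinematic terms, and the weights are the quantities to be learned from demonstrations. The particular basis terms (squared velocity, acceleration, and jerk) and the finite-difference discretization are illustrative assumptions, not the paper's exact bases.

```python
# Hedged sketch: trajectory cost as a weighted sum of basis objectives, in the
# spirit of the human-movement literature the abstract cites. The basis terms
# and discretization are assumptions for illustration.
import numpy as np

def basis_features(traj, dt):
    """Integrated squared velocity, acceleration, and jerk of a sampled trajectory (T x d)."""
    vel = np.diff(traj, axis=0) / dt
    acc = np.diff(vel, axis=0) / dt
    jerk = np.diff(acc, axis=0) / dt
    return np.array([np.sum(vel**2), np.sum(acc**2), np.sum(jerk**2)]) * dt

def weighted_cost(traj, weights, dt=0.01):
    """Total cost = w^T phi(traj); `weights` are the unknowns inferred from demonstrations."""
    return float(weights @ basis_features(traj, dt))
```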
  4. This paper develops the method of Continuous Pontryagin Differentiable Programming (Continuous PDP), which enables a robot to learn an objective function from a few sparsely demonstrated keyframes. The keyframes, each labeled with a time stamp, are the desired task-space outputs that the robot is expected to follow sequentially. The time stamps of the keyframes can differ from the timing of the robot's actual execution. The method jointly finds an objective function and a time-warping function such that the robot's resulting trajectory sequentially follows the keyframes with minimal discrepancy loss. Continuous PDP minimizes the discrepancy loss using projected gradient descent, efficiently computing the gradient of the robot trajectory with respect to the unknown parameters. The method is first evaluated on a simulated robot arm and then applied to a 6-DoF quadrotor to learn an objective function for motion planning in unmodeled environments. The results show the efficiency of the method, its ability to handle time misalignment between the keyframes and the robot's execution, and the generalization of the learned objective to unseen motion conditions.
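The outer loop described here can be sketched as projected gradient descent on a keyframe-discrepancy loss over the objective parameters and the time-warping parameters. In the sketch below, trajectory_and_gradients stands in for the trajectory sensitivities that Continuous PDP computes efficiently, and project stands in for the projection onto the feasible parameter set; both are placeholders, not the paper's implementation.

```python
# Hedged sketch of a projected-gradient outer loop on a keyframe-discrepancy loss.
# `trajectory_and_gradients` and `project` are hypothetical placeholders.
import numpy as np

def learn_from_keyframes(keyframes, times, theta, beta,
                         trajectory_and_gradients, project, lr=1e-2, iters=200):
    for _ in range(iters):
        # y: task-space outputs at each keyframe's (warped) time stamp, shape (K, d),
        # plus sensitivities dy/dtheta (K, d, p) and dy/dbeta (K, d, q).
        y, dy_dtheta, dy_dbeta = trajectory_and_gradients(theta, beta, times)
        err = y - keyframes                                   # discrepancy at the keyframes
        grad_theta = 2.0 * np.einsum('ki,kij->j', err, dy_dtheta)
        grad_beta = 2.0 * np.einsum('ki,kij->j', err, dy_dbeta)
        # Gradient step followed by projection onto the feasible parameter set.
        theta, beta = project(theta - lr * grad_theta, beta - lr * grad_beta)
    return theta, beta
```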
  5. We examine how a human–robot interaction (HRI) system may be designed when input–output data from previous experiments are available. Our objective is to learn an optimal impedance in the assistance design for a cooperative manipulation task with a new operator. Because of the variability between individuals, the design parameters that best suit one operator of the robot may not be the best for another. However, by incorporating historical data using a linear autoregressive (AR-1) Gaussian process, the search for a new operator's optimal parameters can be accelerated. We lay out a framework for optimizing human–robot cooperative manipulation that requires only input–output data. We characterize the learning performance using a notion called regret, establish how the AR-1 model improves the bound on the regret, and numerically illustrate this improvement in the context of a human–robot cooperative manipulation task. Furthermore, we show how the input–output nature of our approach provides robustness against modeling error through an additional numerical study.
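A hedged sketch of the AR-1 transfer idea: model the new operator's response as a scaled version of a Gaussian process fit to the previous operator's data plus a Gaussian process on the residuals. The two-stage fit, the least-squares estimate of the scale rho, and the RBF kernels are simplifying assumptions, not the paper's exact co-kriging model.

```python
# Hedged sketch of an AR-1 (linear autoregressive) transfer model:
#   f_new(x) ≈ rho * f_prev(x) + delta(x),
# fit in two stages with standard GPs as a stand-in for full co-kriging.
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF

def fit_ar1_transfer(X_prev, y_prev, X_new, y_new):
    gp_prev = GaussianProcessRegressor(kernel=RBF()).fit(X_prev, y_prev)
    m_prev = gp_prev.predict(X_new)                     # previous operator's predictions at new inputs
    rho = float(m_prev @ y_new / (m_prev @ m_prev))     # scale linking old and new operator
    gp_delta = GaussianProcessRegressor(kernel=RBF()).fit(X_new, y_new - rho * m_prev)

    def predict(X):
        """Predicted response of the new operator at query inputs X."""
        return rho * gp_prev.predict(X) + gp_delta.predict(X)
    return predict
```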