Title: Towards Robot Learning from Spoken Language
The paper proposes a robot learning framework that enables a robot to automatically generate a sequence of actions from unstructured spoken language. The framework was able to distinguish between instructions and unrelated conversation. Data were collected from 25 participants, who were asked to instruct the robot to perform a collaborative cooking task while being interrupted and distracted. The system identified the sequence of instructed actions for a cooking task with an accuracy of 92.85 ± 3.87%.
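The abstract does not give the model itself, but the core filtering idea — separating task instructions from unrelated conversation before building an action sequence — can be sketched minimally. All names here (`COOKING_VERBS`, `classify_utterance`, `extract_action_sequence`) are hypothetical illustrations, not the authors' implementation, and a keyword lookup stands in for whatever learned classifier the paper uses:

```python
# Hypothetical sketch: filter instructions out of unstructured speech.
# A verb-keyword check stands in for the paper's learned classifier.
COOKING_VERBS = {"chop", "stir", "pour", "add", "mix", "heat", "cut"}

def classify_utterance(utterance: str) -> str:
    """Label an utterance as an instruction or unrelated conversation."""
    tokens = utterance.lower().split()
    if any(tok.strip(".,!?") in COOKING_VERBS for tok in tokens):
        return "instruction"
    return "conversation"

def extract_action_sequence(utterances: list[str]) -> list[str]:
    """Keep only the utterances that look like task instructions, in order."""
    return [u for u in utterances if classify_utterance(u) == "instruction"]
```

For example, `extract_action_sequence(["Chop the onions", "How was your weekend?", "Now stir the soup."])` keeps only the two cooking instructions, in their spoken order.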
Award ID(s):
2226165
NSF-PAR ID:
10433568
Author(s) / Creator(s):
; ;
Date Published:
Journal Name:
Companion of the 2023 ACM/IEEE International Conference on Human-Robot Interaction (HRI ’23 Companion)
Page Range / eLocation ID:
112 to 116
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. Robots have the potential to assist people in daily tasks, such as cooking a meal. Communicating with robots verbally and in an unstructured way is important, as spoken language is the main form of communication for humans. This paper proposes a novel framework that automatically generates robot actions from unstructured speech. The proposed framework was evaluated by collecting data from 15 participants preparing their meals while seated on a chair in a randomly disrupted environment. The system can identify and respond to a task sequence while the user may be engaged in unrelated conversations, even if the user's speech is unstructured and grammatically incorrect. The accuracy of the proposed system is 98.6%, a very promising result.
  2. In an efficient and flexible human-robot collaborative work environment, a robot team member must be able to recognize both explicit requests and implied actions from human users. Identifying "what to do" in such cases requires an agent to have the ability to construct associations between objects, their actions, and the effect of actions on the environment. In this regard, semantic memory is introduced to understand explicit cues and their relationships with the objects and skills needed to make "tea" and "sandwich". We have extended our previous hierarchical robot control architecture to add the capability to execute the most appropriate task based on both feedback from the user and the environmental context. To validate this system, two types of skills were implemented in the hierarchical task tree: 1) tea-making skills and 2) sandwich-making skills. During the conversation between the robot and the human, the robot was able to determine the hidden context using an ontology and began to act accordingly. For instance, if the person says "I am thirsty" or "It is cold outside", the robot will perform the tea-making skill. In contrast, if the person says "I am hungry" or "I need something to eat", the robot will perform the sandwich-making skill. A humanoid robot, Baxter, was used for this experiment. We tested three scenarios with objects at different positions on the table for each skill. We observed that in all cases, the robot used only objects that were relevant to the skill.
  3. We propose a learning‐from‐demonstration approach for grounding actions from expert data and an algorithm for using these actions to perform a task in new environments. Our approach is based on an application of sampling‐based motion planning to search through the tree of discrete, high‐level actions constructed from a symbolic representation of a task. Recursive sampling‐based planning is used to explore the space of possible continuous‐space instantiations of these actions. We demonstrate the utility of our approach with a magnetic structure assembly task, showing that the robot can intelligently select a sequence of actions in different parts of the workspace and in the presence of obstacles. This approach can better adapt to new environments by selecting the correct high‐level actions for the particular environment while taking human preferences into account. 
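The recursive structure described in that abstract — search the tree of discrete high-level actions, sampling continuous instantiations of each and backtracking on failure — can be outlined as follows. This is a sketch only: `sample`, `feasible`, and the action list are hypothetical stand-ins for the paper's motion-planning primitives, and state handling is reduced to a tuple:

```python
# Hypothetical sketch: recursive sampling-based grounding of a discrete
# action sequence. `sample` proposes continuous parameters for an action;
# `feasible` stands in for collision/motion-planning checks.
def plan(actions, state, sample, feasible, tries=50):
    """Return a list of (action, params) grounding every action, or None."""
    if not actions:
        return []                                   # all actions grounded
    action, rest = actions[0], actions[1:]
    for _ in range(tries):
        params = sample(action, state)              # continuous instantiation
        if not feasible(action, params, state):
            continue                                # resample this action
        tail = plan(rest, (state, action, params), sample, feasible, tries)
        if tail is not None:
            return [(action, params)] + tail
    return None                                     # backtrack to caller
```

The recursion mirrors the paper's idea at a high level: a later action that cannot be instantiated forces the planner back up the tree to try different parameters for earlier actions.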
  4. This paper tackles the task of goal-conditioned dynamic manipulation of deformable objects. This task is highly challenging due to its complex dynamics (introduced by object deformation and high-speed action) and strict task requirements (defined by a precise goal specification). To address these challenges, we present Iterative Residual Policy (IRP), a general learning framework applicable to repeatable tasks with complex dynamics. IRP learns an implicit policy via delta dynamics—instead of modeling the entire dynamical system and inferring actions from that model, IRP learns delta dynamics that predict the effects of delta action on the previously observed trajectory. When combined with adaptive action sampling, the system can quickly optimize its actions online to reach a specified goal. We demonstrate the effectiveness of IRP on two tasks: whipping a rope to hit a target point and swinging a cloth to reach a target pose. Despite being trained only in simulation on a fixed robot setup, IRP is able to efficiently generalize to noisy real-world dynamics, new objects with unseen physical properties, and even different robot hardware embodiments, demonstrating its excellent generalization capability relative to alternative approaches.

     
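The online optimization loop that IRP's delta dynamics enable can be sketched as adaptive sampling over candidate delta actions. Everything here is a hypothetical simplification: `delta_model` is an arbitrary callable standing in for the learned delta dynamics, the error scalar replaces a full trajectory comparison, and for brevity each iteration treats the model's prediction as the next observation:

```python
import numpy as np

# Hypothetical sketch of IRP-style adaptive action sampling: propose
# delta actions, score them with a learned delta-dynamics model, and
# apply the delta predicted to reduce goal error the most.
def optimize_action(action, observed_err, delta_model, goal_err=0.01,
                    n_samples=32, scale=1.0, iters=10):
    """Iteratively refine `action`; return (action, final predicted error)."""
    rng = np.random.default_rng(0)
    err = observed_err
    for _ in range(iters):
        if err <= goal_err:
            break
        deltas = rng.normal(0.0, scale, size=(n_samples, action.shape[0]))
        # Predicted error after each candidate delta action.
        predicted = np.array([delta_model(action, d) for d in deltas])
        best = int(np.argmin(predicted))
        action = action + deltas[best]
        err = float(predicted[best])    # sketch: prediction as new observation
        scale *= 0.8                    # narrow the search as error drops
    return action, err
```

In the real system the "observation" at each step would come from executing the action and comparing the resulting trajectory to the goal, not from the model's own prediction.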
  5. As service robots become more capable of autonomous behaviors, it becomes increasingly important to consider how people will be able to communicate with a robot about what task it should perform and how to do the task. There has been a rise in attention to end-user development (EUD), where researchers create interfaces that enable non-roboticist end users to script tasks for autonomous robots to perform. Currently, state-of-the-art interfaces are largely constrained, often through simplified domains or restrictive end-user interaction. Motivated by our past qualitative design work exploring how to integrate a care robot in an assisted living community, we discuss challenges of EUD in this complex domain. One set of challenges stems from different user-facing representations, e.g., certain tasks may lend themselves better to a rule-based trigger-action representation, whereas other tasks may be easier to specify via a sequence of actions. The other stems from considering the needs of multiple stakeholders, e.g., caregivers and residents of the facility may all create tasks for the robot, but the robot may not be able to share information about all tasks with all residents due to privacy concerns. We present scenarios that illustrate these challenges and also discuss possible solutions.
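The two user-facing representations that abstract contrasts — a trigger-action rule versus an ordered action sequence — can be shown as minimal data structures. The field names and example tasks below are illustrative assumptions, not taken from any cited interface:

```python
from dataclasses import dataclass, field

# Hypothetical sketch of the two task representations contrasted above.
@dataclass
class TriggerActionRule:
    trigger: str                 # e.g., a sensed event or spoken cue
    action: str                  # the single response the robot performs

@dataclass
class SequenceTask:
    name: str
    steps: list[str] = field(default_factory=list)  # ordered actions

# A reactive task fits the rule form; a multi-step errand fits the sequence.
remind = TriggerActionRule(trigger="medication time", action="remind resident")
delivery = SequenceTask(name="deliver water",
                        steps=["fetch cup", "fill cup", "bring to resident"])
```

The design tension the paper raises is that an EUD interface built around only one of these forms makes the other kind of task awkward for end users to express.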