Title: Skill Generalization with Verbs
Abstract: It is imperative that robots can understand natural language commands issued by humans. Such commands typically contain verbs that signify what action should be performed on a given object and that are applicable to many objects. We propose a method for generalizing manipulation skills to novel objects using verbs. Our method learns a probabilistic classifier that determines whether a given object trajectory can be described by a specific verb. We show that this classifier accurately generalizes to novel object categories with an average accuracy of 76.69% across 13 object categories and 14 verbs. We then perform policy search over the object kinematics to find an object trajectory that maximizes classifier prediction for a given verb. Our method allows a robot to generate a trajectory for a novel object based on a verb, which can then be used as input to a motion planner. We show that our model can generate trajectories that are usable for executing five verb commands applied to novel instances of two different object categories on a real robot.
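A minimal sketch of the two-stage recipe described above, assuming a hypothetical `verb_classifier(trajectory, verb)` that returns the probability that an object trajectory depicts the verb; the cross-entropy-style search is an illustrative stand-in for the paper's policy search over object kinematics, not the authors' implementation.

```python
import numpy as np

def search_trajectory(verb_classifier, verb, horizon=20, pose_dim=6,
                      iters=50, pop=64, elite=8, seed=0):
    """Find an object-pose trajectory that the verb classifier scores
    highly (cross-entropy-style sketch; all parameters are our own).

    verb_classifier(traj, verb) -> probability in [0, 1], where traj
    is a (horizon, pose_dim) array of object poses over time.
    """
    rng = np.random.default_rng(seed)
    mu = np.zeros((horizon, pose_dim))    # mean candidate trajectory
    sigma = np.ones((horizon, pose_dim))  # per-step sampling spread

    for _ in range(iters):
        cands = mu + sigma * rng.standard_normal((pop, horizon, pose_dim))
        scores = np.array([verb_classifier(c, verb) for c in cands])
        elites = cands[np.argsort(scores)[-elite:]]   # top-scoring set
        mu, sigma = elites.mean(axis=0), elites.std(axis=0) + 1e-3

    return mu  # verb-conditioned trajectory to hand to a motion planner
```

The returned pose sequence plays the role of the generated object trajectory that, per the abstract, is then used as input to a motion planner.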
Award ID(s): 1955361
NSF-PAR ID: 10467322
Publisher / Repository: Proceedings of the 2023 IEEE/RSJ International Conference on Intelligent Robots and Systems
Format(s): Medium: X
Sponsoring Org: National Science Foundation
More Like this
  1.
    Abstract: Noun incorporation is commonly thought to avoid the weak compositionality of compounds because it involves conjunction of an argument noun with the incorporating verb. However, it is weakly compositional in two ways. First, the noun’s entity argument needs to be bound or saturated, but previous accounts fail to adequately ensure that it is. Second, non-arguments are often incorporated in many languages, and their thematic role is available for contextual selection. We show that these two weaknesses are actually linked. We focus on the Kiowa language, which generally bars objects from incorporation but allows non-arguments. We show that a mediating relation is required to semantically link the noun to the verb. Absent such a relation, the noun’s entity argument is not saturated, and the entire expression is uninterpretable. The mediating relation for non-objects also assigns the noun a thematic role in place of a postposition. Speakers can choose this role freely, subject to independent constraints from the pragmatics, syntax, and semantics. Objects in Kiowa are in fact allowed to incorporate in certain environments, but we show that these environments all independently involve a mediating relation. The mediating relation for objects quantifies over the noun and links the noun+verb construction to the rest of the clause. The head that introduces this relation re-categorizes the verb in the syntactic derivation. Essentially, we demonstrate two distinct mechanisms for noun incorporation. Having derived the distribution of incorporation in Kiowa, we apply the same relations to derive constraints on English complex verbs and synthetic compounds, which exhibit most of the same constraints as Kiowa noun incorporation. We also look at languages with routine object incorporation, and show how the transitivity of the verb depends on whether the v° head introducing the external argument assigns case to the re-categorized verb.
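    A schematic rendering of the compositional point in ordinary event-semantic notation (our gloss, not the paper's formalism): without a mediating relation the noun's entity argument x is never bound, while a relation R both saturates x and supplies its thematic role.

```latex
% Illustrative gloss (ours), not the paper's formalism.
\[
\text{No mediating relation ($x$ unsaturated, uninterpretable):}\quad
[\![\,\text{N+V}\,]\!] \;=\; \lambda e.\ \mathrm{verb}(e) \wedge \mathrm{noun}(x)
\]
\[
\text{With a mediating relation $R$ (role chosen in context):}\quad
[\![\,\text{N+}R\text{+V}\,]\!] \;=\; \lambda e.\ \exists x\,[\mathrm{noun}(x) \wedge R(e,x)] \wedge \mathrm{verb}(e)
\]
```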
  2.
    As autonomous robots interact and navigate around real-world environments such as homes, it is useful to reliably identify and manipulate articulated objects, such as doors and cabinets. Many prior works in object articulation identification require manipulation of the object, either by the robot or a human. While recent works have addressed predicting articulation types from visual observations alone, they often assume prior knowledge of category-level kinematic motion models or sequence of observations where the articulated parts are moving according to their kinematic constraints. In this work, we propose FormNet, a neural network that identifies the articulation mechanisms between pairs of object parts from a single frame of an RGB-D image and segmentation masks. The network is trained on 100k synthetic images of 149 articulated objects from 6 categories. Synthetic images are rendered via a photorealistic simulator with domain randomization. Our proposed model predicts motion residual flows of object parts, and these flows are used to determine the articulation type and parameters. The network achieves an articulation type classification accuracy of 82.5% on novel object instances in trained categories. Experiments also show how this method enables generalization to novel categories and can be applied to real-world images without fine-tuning. 
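    A minimal geometric sketch of the last step described above: reducing a part's predicted residual flow field to an articulation type. The thresholding scheme and function names are our own illustration (with the rotation axis assumed to pass through the origin), not FormNet's actual output head.

```python
import numpy as np

def skew(p):
    """Skew-symmetric matrix so that skew(p) @ v == np.cross(p, v)."""
    return np.array([[0.0, -p[2], p[1]],
                     [p[2], 0.0, -p[0]],
                     [-p[1], p[0], 0.0]])

def classify_articulation(points, flows, tol=1e-3):
    """Reduce a part's residual flows to an articulation type/parameter.

    points: (N, 3) points on the part; flows: (N, 3) predicted residuals.
    """
    if np.linalg.norm(flows, axis=1).max() < tol:
        return "fixed", None                       # part does not move
    mean_flow = flows.mean(axis=0)
    # A prismatic joint translates every point by the same vector.
    if np.allclose(flows, mean_flow, atol=tol):
        return "prismatic", mean_flow / np.linalg.norm(mean_flow)
    # Otherwise fit an infinitesimal rotation f = w x p, rewritten as
    # f = -[p]_x w, and solve for the axis w by least squares.
    A = np.vstack([-skew(p) for p in points])
    w, *_ = np.linalg.lstsq(A, flows.reshape(-1), rcond=None)
    return "revolute", w / np.linalg.norm(w)
```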
  3. Abstract

    Verb learning is difficult for children (Gentner), partially because children have a bias to associate a novel verb not only with the action it represents, but also with the object on which it is learned (Kersten & Smith). Here we investigate how well 4‐ and 5‐year‐old children (N = 48) generalize novel verbs for actions on objects after doing or seeing the action (e.g., twisting a knob on an object) or after doing or seeing a gesture for the action (e.g., twisting in the air near an object). We find not only that children generalize more effectively through gesture experience, but also that this ability to generalize persists after a 24‐hour delay.
  4.
    This work presents the ideation and preliminary results of using contextual information, together with information about the objects present in the scene, to query the social navigation rules applicable to the sensed context. Prior work in Socially-Aware Navigation (SAN) shows its importance in human-robot interaction, as it improves the quality of the interaction and the safety and comfort of the interacting partner. In this work, we are interested in the automatic detection of social rules in SAN, and we present the three major components of our method: a Convolutional Neural Network-based context classifier that can autonomously perceive contextual information from camera input; a YOLO-based object detector that localizes objects within a scene; and a knowledge base that relates social rules to these concepts, so that the rules can be queried using both the context and the objects detected in the scene. Our preliminary results suggest that our approach can observe an ongoing interaction from an image input and use that information to query the social navigation rules required in that particular context.
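    A minimal, dictionary-backed stand-in for the pipeline just described; `classify_context` and `detect_objects` are assumed wrappers around the CNN context classifier and the YOLO detector, and the rules shown are placeholders of ours, not the paper's knowledge base.

```python
# Placeholder rule base: (context, object) -> applicable social rules.
SOCIAL_RULES = {
    ("hallway", "person"): ["keep to one side", "maintain passing distance"],
    ("meeting", "person"): ["lower speed", "do not interrupt the group"],
    ("office", "desk"): ["do not block the workspace"],
}

def applicable_rules(image, classify_context, detect_objects):
    """Query social rules for the perceived context and detected objects."""
    context = classify_context(image)   # e.g. "hallway"
    objects = detect_objects(image)     # e.g. ["person"]
    rules = []
    for obj in objects:
        rules.extend(SOCIAL_RULES.get((context, obj), []))
    return context, sorted(set(rules))
```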
  5. Abstract— A core capability of robots is to reason about multiple objects under uncertainty. Partially Observable Markov Decision Processes (POMDPs) provide a means of reasoning under uncertainty for sequential decision making, but are computationally intractable in large domains. In this paper, we propose Object-Oriented POMDPs (OO-POMDPs), which represent the state and observation spaces in terms of classes and objects. The structure afforded by OO-POMDPs supports a factorization of the agent’s belief into independent object distributions, which enables the size of the belief to scale linearly rather than exponentially in the number of objects. We formulate a novel Multi-Object Search (MOS) task as an OO-POMDP for mobile-robotics domains in which the agent must find the locations of multiple objects. Our solution exploits the structure of OO-POMDPs by using human language to selectively update the belief at task onset. Using this structure, we develop a new algorithm for efficiently solving OO-POMDPs: Object-Oriented Partially Observable Monte-Carlo Planning (OO-POMCP). We show that OO-POMCP with grounded language commands is sufficient for solving challenging MOS tasks both in simulation and on a physical mobile robot.
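    A toy sketch of the belief factorization the abstract describes, with names and structure of our own choosing: one independent distribution per object keeps the belief at n_objects × |locations| entries rather than the |locations|^n_objects of a joint belief.

```python
def make_factored_belief(object_ids, locations):
    """One independent location distribution per object (uniform init)."""
    p0 = 1.0 / len(locations)
    return {oid: {loc: p0 for loc in locations} for oid in object_ids}

def update_object_belief(belief, oid, likelihood):
    """Bayes-update a single object's factor; the other factors stay
    untouched, which is what keeps storage and updates linear."""
    factor = belief[oid]
    for loc in factor:
        factor[loc] *= likelihood(loc)
    total = sum(factor.values())
    for loc in factor:
        factor[loc] /= total

# E.g., a language hint such as "the mug is in the kitchen" can be
# folded in at task onset as a likelihood favoring kitchen locations.
belief = make_factored_belief(["mug", "book"], ["kitchen", "office", "hall"])
update_object_belief(belief, "mug",
                     lambda loc: 0.8 if loc == "kitchen" else 0.1)
```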