We present the Semantic Robot Programming (SRP) paradigm as a convergence of robot programming by demonstration and semantic mapping. In SRP, a user can directly program a robot manipulator by demonstrating a snapshot of their intended goal scene in the workspace. The robot then parses this goal as a scene graph composed of object poses and inter-object relations, assuming known object geometries. Task and motion planning is then used to realize the user’s goal from an arbitrary initial scene configuration, so the robot can adapt to different initial scene configurations and still reach the demonstrated goal. For scene perception, we propose the Discriminatively-Informed Generative Estimation of Scenes and Transforms (DIGEST) method to infer the initial and goal states of the world from RGBD images. The efficacy of SRP with DIGEST perception is demonstrated for the task of tray-setting with a Michigan Progress Fetch robot. Scene perception and task execution are evaluated with a public household occlusion dataset and our cluttered scene dataset.
Scene-level Programming by Demonstration
Scene-level Programming by Demonstration (PbD) faces an important challenge: perceptual uncertainty. To address this problem, we present a scene-level PbD paradigm that programs robots to perform goal-directed manipulation in unstructured environments with grounded perception. Scene estimation is enabled by our discriminatively-informed generative scene estimation method (DIGEST). Given scene observations, DIGEST uses candidates from discriminative object detectors to generate and evaluate hypothesized scenes of object poses. Scene graphs are generated from the estimated object poses and are in turn used by the PbD system for high-level task planning. We demonstrate that DIGEST performs better than existing methods and is robust to false positive detections. Building a PbD system on DIGEST, we show experiments in which a Fetch robot is programmed, through demonstration of goal scenes, to set up a tray for delivery with various objects.
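The step from estimated object poses to a scene graph can be illustrated with a minimal sketch. This is not the DIGEST implementation: it assumes each object is an axis-aligned box with known extents, and it derives only a single hypothetical relation, "on", by checking that one box rests on the top face of another. The names `Box`, `overlaps_xy`, `scene_graph`, and the tolerance `tol` are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Box:
    """Hypothetical object model: an axis-aligned box with a known size."""
    name: str
    x: float  # centre position (metres)
    y: float
    z: float
    w: float  # extent along x
    d: float  # extent along y
    h: float  # extent along z


def overlaps_xy(a: Box, b: Box) -> bool:
    """True if the footprints of the two boxes overlap in the x-y plane."""
    return (abs(a.x - b.x) < (a.w + b.w) / 2
            and abs(a.y - b.y) < (a.d + b.d) / 2)


def scene_graph(objects, tol=0.005):
    """Return sorted (top, 'on', base) edges inferred from box poses."""
    edges = []
    for top in objects:
        for base in objects:
            if top is base:
                continue
            # Vertical gap between the bottom of `top` and the top of `base`.
            gap = (top.z - top.h / 2) - (base.z + base.h / 2)
            if abs(gap) <= tol and overlaps_xy(top, base):
                edges.append((top.name, "on", base.name))
    return sorted(edges)


# Illustrative poses only; these are not data from the paper.
tray = Box("tray", 0.0, 0.0, 0.01, 0.40, 0.30, 0.02)
cup = Box("cup", 0.05, 0.0, 0.07, 0.08, 0.08, 0.10)
spoon = Box("spoon", 0.50, 0.0, 0.005, 0.15, 0.03, 0.01)
graph = scene_graph([tray, cup, spoon])
```

A task planner could then compare the edges of an initial scene graph with those of the demonstrated goal graph to decide which objects need to be moved.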
- Award ID(s): 1638047
- NSF-PAR ID: 10032700
- Date Published:
- Journal Name: arXiv.org
- ISSN: 2331-8422
- Format(s): Medium: X
- Sponsoring Org: National Science Foundation
More Like this
-
Perception in cluttered scenes remains a critical challenge for robots performing autonomous sequential manipulation tasks. In this paper, we propose a probabilistic approach for robust sequential scene estimation and manipulation: Sequential Scene Understanding and Manipulation (SUM). SUM accounts for the uncertainty of discriminative object detection and recognition in the generative estimation of the most likely object poses, which are maintained over time to achieve a robust estimate of the scene under heavy occlusion and in unstructured environments. Our method uses candidates from a discriminative object detector and recognizer to guide the generative process of sampling scene hypotheses, and each scene hypothesis is evaluated against the observations. SUM also maintains beliefs over scene hypotheses across the robot's physical actions, which improves estimation and guards against noisy detections. We conduct extensive experiments to show that our approach is able to perform robust estimation and manipulation.
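Maintaining a belief over scene hypotheses as new observations arrive can be sketched as a generic discrete Bayes update. This is an illustrative assumption, not SUM's actual update rule: the hypothesis labels, the likelihood values, and the function name `update_belief` are all hypothetical.

```python
def update_belief(belief, likelihood):
    """Reweight each hypothesis by how well it explains the new observation.

    belief:     dict mapping hypothesis id -> prior probability
    likelihood: dict mapping hypothesis id -> p(observation | hypothesis)
    """
    posterior = {h: p * likelihood.get(h, 0.0) for h, p in belief.items()}
    total = sum(posterior.values())
    if total == 0.0:
        # The observation rules out every hypothesis; keep the prior unchanged.
        return dict(belief)
    return {h: p / total for h, p in posterior.items()}


# Two hypothetical scene hypotheses with equal prior belief.
prior = {"cup_left_of_box": 0.5, "cup_right_of_box": 0.5}
# Assumed likelihoods of the latest observation under each hypothesis.
obs = {"cup_left_of_box": 0.2, "cup_right_of_box": 0.6}
posterior = update_belief(prior, obs)
```

Repeating this update after each robot action lets a single noisy detection shift the belief without overriding the evidence accumulated from earlier observations.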
-
Performing robust goal-directed manipulation tasks remains a crucial challenge for autonomous robots. In an ideal case, shared autonomous control of manipulators would allow human users to specify their intent as a goal state and have the robot reason over the actions and motions to achieve this goal. However, realizing this goal remains elusive due to the problem of perceiving the robot’s environment. We describe and address the problem of axiomatic scene estimation (AxScEs) for robot manipulation in cluttered scenes, which is the estimation of a tree-structured scene graph describing the configuration of objects observed from robot sensing. We propose generative approaches to inferring the robot’s environment as a scene graph: the axiomatic particle filter, and axiomatic scene estimation by a Markov chain Monte Carlo based sampler. The results of AxScEs estimation are axioms amenable to goal-directed manipulation through symbolic inference for task planning and collision-free motion planning and execution. We demonstrate the results for goal-directed manipulation of multi-object scenes by a PR2 robot.
-
This paper introduces key machine learning operations that allow the realization of robust, joint 6D pose estimation of multiple instances of objects, either densely packed or in unstructured piles, from RGB-D data. The first objective is to learn semantic and instance-boundary detectors without manual labeling. An adversarial training framework in conjunction with physics-based simulation is used to achieve detectors that behave similarly in synthetic and real data. Given the stochastic output of such detectors, candidates for object poses are sampled. The second objective is to automatically learn, via a gradient boosted tree, a single score for each pose candidate that represents its quality in terms of explaining the entire scene. The proposed method uses features derived from surface and boundary alignment between the observed scene and the object model placed at hypothesized poses. Scene-level, multi-instance pose estimation is then achieved by an integer linear programming process that selects hypotheses that maximize the sum of the learned individual scores while respecting constraints, such as avoiding collisions. To evaluate this method, a dataset of densely packed objects with challenging setups for state-of-the-art approaches is collected. Experiments on this dataset and a public one show that the method significantly outperforms alternatives in terms of 6D pose accuracy while being trained only with synthetic datasets.
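The selection step described above, choosing pose hypotheses that maximize the summed scores subject to collision constraints, can be illustrated on a toy instance. The sketch below replaces the integer linear program with exhaustive search over subsets, which is only practical for a handful of hypotheses. The hypothesis names, scores, and conflict pairs are invented for illustration, and positive scores are assumed.

```python
from itertools import combinations


def select_hypotheses(scores, conflicts):
    """Pick the conflict-free subset of hypotheses with the highest total score.

    scores:    dict mapping hypothesis id -> learned score (assumed positive)
    conflicts: set of frozensets, each a pair of hypotheses that cannot
               both be selected (for example, because they collide)
    """
    ids = sorted(scores)
    best, best_total = (), 0.0
    for size in range(1, len(ids) + 1):
        for subset in combinations(ids, size):
            # Skip any subset containing a pair marked as conflicting.
            if any(frozenset(pair) in conflicts
                   for pair in combinations(subset, 2)):
                continue
            total = sum(scores[h] for h in subset)
            if total > best_total:
                best, best_total = subset, total
    return set(best), best_total


# Hypothetical pose candidates and scores; not data from the paper.
scores = {"mug_a": 0.9, "mug_b": 0.6, "box_a": 0.7, "box_b": 0.8}
conflicts = {
    frozenset({"mug_a", "mug_b"}),  # two poses for the same mug instance
    frozenset({"box_a", "box_b"}),  # two poses for the same box instance
    frozenset({"mug_a", "box_b"}),  # these two poses would collide
}
selected, total = select_hypotheses(scores, conflicts)
```

Here the highest-scoring box pose, `box_b`, is rejected because it collides with `mug_a`, and the pair `mug_a` and `box_a` gives the best overall total. An integer linear programming solver reaches the same kind of scene-level trade-off without enumerating every subset.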
-
A key challenge for generalizing programming-by-demonstration (PBD) scripts is the data description problem: when a user demonstrates an action, the system needs to determine features that describe the action and its target object in a way that reflects the user's intention. However, prior approaches for creating data descriptions in PBD systems have problems with usability, applicability, feasibility, transparency and/or user control. Our APPINITE system introduces a multimodal interface with which users can specify data descriptions verbally using natural language instructions. APPINITE guides users to describe their intentions for the demonstrated actions through mixed-initiative conversations, and constructs data descriptions for these actions from the natural language instructions. Our evaluation showed that APPINITE is easy to use and effective in creating scripts for tasks that would otherwise be difficult to create with prior PBD systems because of ambiguous data descriptions in demonstrations on GUIs.