NSF PAR Search | NSF Public Access Repository

GraphEQA: Using 3D Semantic Scene Graphs for Real-time Embodied Question Answering

Saxena, Saumya; Buchanan, Blake; Paxton, Chris; Chen, Bingqing; Vaskevicius, Narunas; Palmieri, Luigi; Francis, Jonathan; Kroemer, Oliver (December 2024, arXiv)

Free, publicly-accessible full text available December 18, 2025
GOAT: GO to Any Thing

https://doi.org/10.15607/RSS.2024.XX.073

Chang, Matthew; Gervet, Theophile; Khanna, Mukul; Yenamandra, Sriram; Shah, Dhruv; Min, So; Shah, Kavit; Paxton, Chris; Gupta, Saurabh; Batra, Dhruv; et al (July 2024, Robotics: Science and Systems Foundation)

In deployment scenarios such as homes and warehouses, mobile robots are expected to autonomously navigate for extended periods, seamlessly executing tasks articulated in terms that are intuitively understandable by human operators. We present GO To Any Thing (GOAT), a universal navigation system capable of tackling these requirements with three key features: a) Multimodal: it can tackle goals specified via category labels, target images, and language descriptions, b) Lifelong: it benefits from its past experience in the same environment, and c) Platform Agnostic: it can be quickly deployed on robots with different embodiments. GOAT is made possible through a modular system design and a continually augmented instance-aware semantic memory that keeps track of the appearance of objects from different viewpoints in addition to category-level semantics. This enables GOAT to distinguish between different instances of the same category to enable navigation to targets specified by images and language descriptions. In experimental comparisons spanning over 90 hours in 9 different homes consisting of 675 goals selected across 200+ different object instances, we find GOAT achieves an overall success rate of 83%, surpassing previous methods and ablations by 32% (absolute improvement). GOAT improves with experience in the environment, from a 60% success rate at the first goal to a 90% success after exploration. In addition, we demonstrate that GOAT can readily be applied to downstream tasks such as pick and place and social navigation.
Full Text Available
SORNet: Spatial object-centric representations for sequential manipulation

Yuan, Wentao; Paxton, Chris; Desingh, Karthik; Fox, Dieter (January 2022, Conference on Robot Learning)

Full Text Available
A Persistent Spatial Semantic Representation for High-level Natural Language Instruction Execution

Blukis, Valts; Paxton, Chris; Fox, Dieter; Garg, Animesh; Artzi, Yoav (January 2021, In Proceedings of the Conference on Robot Learning (CoRL))

Full Text Available
“Good Robot!”: Efficient Reinforcement Learning for Multi-Step Visual Tasks with Sim to Real Transfer

https://doi.org/10.1109/LRA.2020.3015448

Hundt, Andrew; Killeen, Benjamin; Greene, Nicholas; Wu, Hongtao; Kwon, Heeyeon; Paxton, Chris; Hager, Gregory D. (October 2020, IEEE Robotics and Automation Letters)
Full Text Available
The CoSTAR Block Stacking Dataset: Learning with Workspace Constraints

Hundt, Andrew; Jain, Varun; Lin, Chia-Hung; Paxton, Chris; Hager, Gregory D. (November 2019, IROS 2019)

A robot can now grasp an object more effectively than ever before, but once it has the object what happens next? We show that a mild relaxation of the task and workspace constraints implicit in existing object grasping datasets can cause neural network based grasping algorithms to fail on even a simple block stacking task when executed under more realistic circumstances. To address this, we introduce the JHU CoSTAR Block Stacking Dataset (BSD), where a robot interacts with 5.1 cm colored blocks to complete an order-fulfillment style block stacking task. It contains dynamic scenes and real time-series data in a less constrained environment than comparable datasets. There are nearly 12,000 stacking attempts and over 2 million frames of real data. We discuss the ways in which this dataset provides a valuable resource for a broad range of other topics of investigation. We find that hand-designed neural networks that work on prior datasets do not generalize to this task. Thus, to establish a baseline for this dataset, we demonstrate an automated search of neural network based models using a novel multiple-input HyperTree MetaModel, and find a final model which makes reasonable 3D pose predictions for grasping and stacking on our dataset. The CoSTAR BSD, code, and instructions are available at sites.google.com/site/costardataset
Full Text Available
Evaluating Methods for End-User Creation of Robot Task Plans

Paxton, Chris; Jonathan, Felix; Hundt, Andrew; Bilge, Mutlu; Hager, Gregory D. (October 2018, 2018 IEEE/RSJ International Conference on Intelligent Robots and Systems)

How can we enable users to create effective, perception-driven task plans for collaborative robots? We conducted a 35-person user study with the Behavior Tree-based CoSTAR system to determine which strategies for end user creation of generalizable robot task plans are most usable and effective. CoSTAR allows domain experts to author complex, perceptually grounded task plans for collaborative robots. As a part of CoSTAR's wide range of capabilities, it allows users to specify SmartMoves: abstract goals such as "pick up component A from the right side of the table." Users were asked to perform pick-and-place assembly tasks with either SmartMoves or one of three simpler baseline versions of CoSTAR. Overall, participants found CoSTAR to be highly usable, with an average System Usability Scale score of 73.4 out of 100. SmartMove also helped users perform tasks faster and more effectively; all SmartMove users completed the first two tasks, while not all users completed the tasks using the other strategies. SmartMove users showed better performance for incorporating perception across all three tasks.
Full Text Available
CoSTAR: Instructing collaborative robots with behavior trees and vision

https://doi.org/10.1109/ICRA.2017.7989070

Paxton, Chris; Hundt, Andrew; Jonathan, Felix; Guerin, Kelleher; Hager, Gregory D. (May 2017, ICRA 2017)

For collaborative robots to become useful, end users who are not robotics experts must be able to instruct them to perform a variety of tasks. With this goal in mind, we developed a system for end‐user creation of robust task plans with a broad range of capabilities. CoSTAR: the Collaborative System for Task Automation and Recognition is our winning entry in the 2016 KUKA Innovation Award competition at the Hannover Messe trade show, which this year focused on Flexible Manufacturing. CoSTAR is unique in how it creates natural abstractions that use perception to represent the world in a way users can both understand and utilize to author capable and robust task plans. Our Behavior Tree‐based task editor integrates high‐level information from known object segmentation and pose estimation with spatial reasoning and robot actions to create robust task plans. We describe the cross‐platform design and implementation of this system on multiple industrial robots and evaluate its suitability for a wide variety of use cases.
Full Text Available
Do what I want, not what I did: Imitation of skills by planning sequences of actions

https://doi.org/10.1109/IROS.2016.7759556

Paxton, Chris; Jonathan, Felix; Kobilarov, Marin; Hager, Gregory D. (October 2016, IROS)

We propose a learning‐from‐demonstration approach for grounding actions from expert data and an algorithm for using these actions to perform a task in new environments. Our approach is based on an application of sampling‐based motion planning to search through the tree of discrete, high‐level actions constructed from a symbolic representation of a task. Recursive sampling‐based planning is used to explore the space of possible continuous‐space instantiations of these actions. We demonstrate the utility of our approach with a magnetic structure assembly task, showing that the robot can intelligently select a sequence of actions in different parts of the workspace and in the presence of obstacles. This approach can better adapt to new environments by selecting the correct high‐level actions for the particular environment while taking human preferences into account.
Full Text Available
