NSF PAR Search | NSF Public Access Repository


One-shot manipulation strategy learning by making contact analogies.

Liu, Yuyao; Mao, Jiayuan; Tenenbaum, Joshua; Lozano-Perez, Tomas; Kaelbling, Leslie (June 2025, Proceedings IEEE International Conference on Robotics and Automation)

We present a novel approach, MAGIC (manipulation analogies for generalizable intelligent contacts), for one-shot learning of manipulation strategies with fast and extensive generalization to novel objects. By leveraging a reference action trajectory, MAGIC effectively identifies similar contact points and sequences of actions on novel objects to replicate a demonstrated strategy, such as using different hooks to retrieve distant objects of different shapes and sizes. Our method is based on a two-stage contact-point matching process that combines global shape matching using pretrained neural features with local curvature analysis to ensure precise and physically plausible contact points. We experiment with three tasks including scooping, hanging, and hooking objects. MAGIC demonstrates superior performance over existing methods, achieving significant improvements in runtime speed and generalization to different object categories. Website: https://magic-2024.github.io/.
Free, publicly-accessible full text available June 2, 2026
Guiding long-horizon task and motion planning with vision language models.

Yang, Zhutian; Garrett, Caelan; Kumar, Nishanth; Fox, Dieter; Lozano-Perez, Tomas; Kaelbling, Leslie (June 2025, Proceedings IEEE International Conference on Robotics and Automation)

Vision-Language Models (VLMs) can generate plausible high-level plans when prompted with a goal, the context, an image of the scene, and any planning constraints. However, there is no guarantee that the predicted actions are geometrically and kinematically feasible for a particular robot embodiment. As a result, many prerequisite steps such as opening drawers to access objects are often omitted in their plans. Robot task and motion planners can generate motion trajectories that respect the geometric feasibility of actions and insert physically necessary actions, but do not scale to everyday problems that require common-sense knowledge and involve large state spaces comprised of many variables. We propose VLM-TAMP, a hierarchical planning algorithm that leverages a VLM to generate both semantically-meaningful and horizon-reducing intermediate subgoals that guide a task and motion planner. When a subgoal or action cannot be refined, the VLM is queried again for replanning. We evaluate VLM-TAMP on kitchen tasks where a robot must accomplish cooking goals that require performing 30-50 actions in sequence and interacting with up to 21 objects. VLM-TAMP substantially outperforms baselines that rigidly and independently execute VLM-generated action sequences, both in terms of success rates (50 to 100% versus 0%) and average task completion percentage (72 to 100% versus 15 to 45%).
Free, publicly-accessible full text available June 2, 2026
Keypoint abstraction using large models for object-relative imitation learning.

Fang, Xiaolin; Huang, Bo-Ruei; Mao, Jiayuan; Shone, Jasmine; Tenenbaum, Joshua; Lozano-Perez, Tomas; Kaelbling, Leslie (June 2025, Proceedings IEEE International Conference on Robotics and Automation)

Generalization to novel object configurations and instances across diverse tasks and environments is a critical challenge in robotics. Keypoint-based representations have been proven effective as a succinct representation for capturing essential object features, and for establishing a reference frame in action prediction, enabling data-efficient learning of robot skills. However, their manual design nature and reliance on additional human labels limit their scalability. In this paper, we propose KALM, a framework that leverages large pre-trained vision-language models (LMs) to automatically generate task-relevant and cross-instance consistent keypoints. KALM distills robust and consistent keypoints across views and objects by generating proposals using LMs and verifying them against a small set of robot demonstration data. Based on the generated keypoints, we can train keypoint-conditioned policy models that predict actions in keypoint-centric frames, enabling robots to generalize effectively across varying object poses, camera views, and object instances with similar functional shapes. Our method demonstrates strong performance in the real world, adapting to different tasks and environments from only a handful of demonstrations while requiring no additional labels.
Free, publicly-accessible full text available June 2, 2026
Embodied Uncertainty-Aware Object Segmentation

Fang, Xiaolin; Lozano-Perez, Tomas; Kaelbling, Leslie (October 2024, IEEE/RSJ International Conference on Intelligent Robots and Systems)

We introduce uncertainty-aware object instance segmentation (UNCOS) and demonstrate its usefulness for embodied interactive segmentation. To deal with uncertainty in robot perception, we propose a method for generating a hypothesis distribution of object segmentation. We obtain a set of region-factored segmentation hypotheses together with confidence estimates by making multiple queries of large pre-trained models. This process can produce segmentation results that achieve state-of-the-art performance on unseen object segmentation problems. The output can also serve as input to a belief-driven process for selecting robot actions to perturb the scene to reduce ambiguity. We demonstrate the effectiveness of this method in real-robot experiments.
Full Text Available
Trust the PRoC3S: Solving long-horizon robotics problems with LLMs and constraint satisfaction.

Curtis, Aidan; Kumar, Nishanth; Cao, Jing; Lozano-Perez, Tomas; Kaelbling, Leslie (November 2024, Conference on Robot Learning)

Recent developments in pretrained large language models (LLMs) applied to robotics have demonstrated their capacity for sequencing a set of discrete skills to achieve open-ended goals in simple robotic tasks. In this paper, we examine the topic of LLM planning for a set of continuously parameterized skills whose execution must avoid violations of a set of kinematic, geometric, and physical constraints. We prompt the LLM to output code for a function with open parameters, which, together with environmental constraints, can be viewed as a Continuous Constraint Satisfaction Problem (CCSP). This CCSP can be solved through sampling or optimization to find a skill sequence and continuous parameter settings that achieve the goal while avoiding constraint violations. Additionally, we consider cases where the LLM proposes unsatisfiable CCSPs, such as those that are kinematically infeasible, dynamically unstable, or lead to collisions, and re-prompt the LLM to form a new CCSP accordingly. Experiments across simulated and real-world domains demonstrate that our proposed strategy, PRoC3S, is capable of solving a wide range of complex manipulation tasks with realistic constraints much more efficiently and effectively than existing baselines.
Full Text Available
Hybrid declarative-imperative representations for hybrid discrete-continuous decision-making

Mao, Jiayuan; Tenenbaum, Joshua; Lozano-Perez, Tomas; Kaelbling, Leslie (October 2024, Workshop on Algorithmic Foundations of Robotics)

We present a robot-behavior description language cdl that can express both direct imperative strategies and planning-based strategies, and combine them seamlessly within the same program. Accompanying this language is a general-purpose planner Crow, which interprets the behavior description and searches as necessary to find a sound plan. We demonstrate (1) via example programs, that cdl can be used to specify, very intuitively, different known strategies for navigation among movable obstacles (NAMO) problems, (2) via empirical results, that Crow can take advantage of the priors expressed in cdl to very quickly solve problem instances with known simplifying structure but still generalize to hard instances, and (3) via theory, that width, a powerful characterization of the worst-case complexity of planning problems, corresponds to a natural property of cdl descriptions and that Crow operates in time on the same order as the width-based worst-case complexity.
Full Text Available
DiMSam: Diffusion Models as Samplers for Task and Motion Planning under Partial Observability

Fang, Xiaolin; Garrett, Caelan; Eppner, Clemens; Lozano-Perez, Tomas; Kaelbling, Leslie; Fox, Dieter (October 2024, IEEE/RSJ International Conference on Intelligent Robots and Systems)

Generative models, such as diffusion models, excel at capturing high-dimensional distributions with diverse input modalities, e.g., robot trajectories, but are less effective at multi-step constraint reasoning. Task and Motion Planning (TAMP) approaches are suited for planning multi-step autonomous robot manipulation. However, it can be difficult to apply them to domains where the environment and its dynamics are not fully known. We propose to overcome these limitations by composing diffusion models using a TAMP system. We use the learned components for constraints and samplers that are difficult to engineer in the planning model, and use a TAMP solver to search for the task plan with constraint-satisfying action parameter values. To tractably make predictions for unseen objects in the environment, we define the learned samplers and TAMP operators on learned latent embeddings of changing object states. We evaluate our approach in a simulated articulated object manipulation domain and show how the combination of classical TAMP, generative modeling, and latent embedding enables multi-step constraint-based reasoning. We also apply the learned sampler in the real world.
Full Text Available
Partially Observable Task and Motion Planning with Uncertainty and Risk Awareness

Curtis, Aidan; Matheos, George; Gothoskar, Nishad; Mansinghka, Vikash; Tenenbaum, Joshua; Lozano-Perez, Tomas; Kaelbling, Leslie (July 2024, Robotics: Science and Systems Proceedings 2023)

Integrated task and motion planning (TAMP) has proven to be a valuable approach to generalizable long-horizon robotic manipulation and navigation problems. However, the typical TAMP problem formulation assumes full observability and deterministic action effects. These assumptions limit the ability of the planner to gather information and make decisions that are risk-aware. We propose a strategy for TAMP with Uncertainty and Risk Awareness (TAMPURA) that is capable of efficiently solving long-horizon planning problems with initial-state and action outcome uncertainty, including problems that require information gathering and avoiding undesirable and irreversible outcomes. Our planner reasons under uncertainty at both the abstract task level and continuous controller level. Given a set of closed-loop goal-conditioned controllers operating in the primitive action space and a description of their preconditions and potential capabilities, we learn a high-level abstraction that can be solved efficiently and then refined to continuous actions for execution. We demonstrate our approach on several robotics problems where uncertainty is a crucial factor and show that reasoning under uncertainty in these problems outperforms previously proposed determinized planning, direct search, and reinforcement learning strategies. Lastly, we demonstrate our planner on two real-world robotics problems using recent advancements in probabilistic perception.
Full Text Available
Practice Makes Perfect: Planning to Learn Skill Parameter Policies

Kumar, Nishanth; Silver, Tom; McClinton, Willie; Zhao, Linfeng; Proulx, Stephen; Lozano-Perez, Tomas; Kaelbling, Leslie; Barry, Jennifer (July 2024, Robotics: Science and Systems Proceedings 2024)

One promising approach towards effective robot decision making in complex, long-horizon tasks is to sequence together parameterized skills. We consider a setting where a robot is initially equipped with (1) a library of parameterized skills, (2) an AI planner for sequencing together the skills given a goal, and (3) a very general prior distribution for selecting skill parameters. Once deployed, the robot should rapidly and autonomously learn to improve its performance by specializing its skill parameter selection policy to the particular objects, goals, and constraints in its environment. In this work, we focus on the active learning problem of choosing which skills to practice to maximize expected future task success. We propose that the robot should estimate the competence of each skill, extrapolate the competence (asking: “how much would the competence improve through practice?”), and situate the skill in the task distribution through competence-aware planning. This approach is implemented within a fully autonomous system where the robot repeatedly plans, practices, and learns without any environment resets. Through experiments in simulation, we find that our approach learns effective parameter policies more sample-efficiently than several baselines. Experiments in the real world demonstrate our approach’s ability to handle noise from perception and control and improve the robot’s ability to solve two long-horizon mobile-manipulation tasks after a few hours of autonomous practice. Project website: http://ees.csail.mit.edu
Full Text Available
What Planning Problems Can A Relational Neural Network Solve?

Mao, Jiayuan; Lozano-Perez, Tomas; Tenenbaum, Joshua; Kaelbling, Leslie (December 2023, Advances in neural information processing systems (NeurIPS) 2023)

Goal-conditioned policies are generally understood to be “feed-forward” circuits, in the form of neural networks that map from the current state and the goal specification to the next action to take. However, under what circumstances such a policy can be learned and how efficient the policy will be are not well understood. In this paper, we present a circuit complexity analysis for relational neural networks (such as graph neural networks and transformers) representing policies for planning problems, by drawing connections with serialized goal regression search (S-GRS). We show that there are three general classes of planning problems, in terms of the growth of circuit width and depth as a function of the number of objects and planning horizon, providing constructive proofs. We also illustrate the utility of this analysis for designing neural networks for policy learning.
Full Text Available
