Guiding long-horizon task and motion planning with vision language models.

Yang, Zhutian; Garrett, Caelan; Kumar, Nishanth; Fox, Dieter; Lozano-Perez, Tomas; Kaelbling, Leslie


This content will become publicly available on June 2, 2026


Vision-Language Models (VLMs) can generate plausible high-level plans when prompted with a goal, the context, an image of the scene, and any planning constraints. However, there is no guarantee that the predicted actions are geometrically and kinematically feasible for a particular robot embodiment. As a result, their plans often omit prerequisite steps such as opening drawers to access objects. Robot task and motion planners can generate motion trajectories that respect the geometric feasibility of actions and can insert physically necessary actions, but they do not scale to everyday problems that require common-sense knowledge and involve large state spaces composed of many variables. We propose VLM-TAMP, a hierarchical planning algorithm that leverages a VLM to generate both semantically meaningful and horizon-reducing intermediate subgoals that guide a task and motion planner. When a subgoal or action cannot be refined, the VLM is queried again for replanning. We evaluate VLM-TAMP on kitchen tasks in which a robot must accomplish cooking goals that require performing 30 to 50 actions in sequence and interacting with up to 21 objects. VLM-TAMP substantially outperforms baselines that rigidly and independently execute VLM-generated action sequences, in terms of both success rate (50 to 100% versus 0%) and average task completion percentage (72 to 100% versus 15 to 45%).
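The replanning loop described in the abstract (a VLM proposes intermediate subgoals, a task and motion planner refines each one, and the VLM is queried again when a subgoal cannot be refined) can be sketched as below. This is a minimal illustration under stated assumptions, not the authors' implementation: the VLM and the planner are replaced by hypothetical toy stubs (`toy_vlm`, `toy_refine`), states are modeled as sets of true facts, and the function name `vlm_tamp_sketch` and the query cap `max_queries` are assumed for illustration only.

```python
from dataclasses import dataclass, field
from typing import Callable, FrozenSet, List, Optional, Sequence, Tuple

# Hypothetical types: a state is a set of true facts; a refinement is
# (primitive actions, resulting state), or None if the planner fails.
State = FrozenSet[str]
Refinement = Optional[Tuple[List[str], State]]


@dataclass
class PlanResult:
    success: bool
    actions: List[str] = field(default_factory=list)
    vlm_queries: int = 0


def vlm_tamp_sketch(
    goal: str,
    state: State,
    propose_subgoals: Callable[[str, State, Sequence[str]], List[str]],
    refine: Callable[[str, State], Refinement],
    max_queries: int = 5,
) -> PlanResult:
    """Alternate VLM subgoal proposals with planner refinement, replanning on failure."""
    actions: List[str] = []
    failed: List[str] = []  # subgoals the planner could not refine, fed back to the VLM
    for query in range(1, max_queries + 1):
        for subgoal in propose_subgoals(goal, state, failed):
            result = refine(subgoal, state)
            if result is None:
                failed.append(subgoal)
                break  # stop executing; query the VLM again from the current state
            plan, state = result
            actions.extend(plan)
        if goal in state:
            return PlanResult(True, actions, query)
    return PlanResult(False, actions, max_queries)


# Hypothetical toy stand-ins: a potato sits in a closed drawer.
def toy_vlm(goal: str, state: State, failed: Sequence[str]) -> List[str]:
    # The first guess skips grasping; after failure feedback, the grasp is added.
    if not failed:
        return ["potato_in_pot"]
    return ["potato_in_hand", "potato_in_pot"]


def toy_refine(subgoal: str, state: State) -> Refinement:
    if subgoal == "potato_in_hand":
        # The planner inserts the physically necessary "open" step itself.
        plan = [] if "drawer_open" in state else ["open(drawer)"]
        plan.append("pick(potato)")
        return plan, (state - {"drawer_closed"}) | {"drawer_open", "potato_in_hand"}
    if subgoal == "potato_in_pot" and "potato_in_hand" in state:
        return ["place(potato, pot)"], (state - {"potato_in_hand"}) | {"potato_in_pot"}
    return None  # not refinable from this state


result = vlm_tamp_sketch("potato_in_pot", frozenset({"drawer_closed"}), toy_vlm, toy_refine)
print(result)
# PlanResult(success=True, actions=['open(drawer)', 'pick(potato)', 'place(potato, pot)'], vlm_queries=2)
```

In this toy run, the first VLM proposal contains a subgoal the planner cannot refine, which triggers a second query. The planner then fills in the drawer-opening step that the VLM's plan omitted, mirroring the division of labor the abstract describes.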

Award ID(s): 2214177

PAR ID: 10629493

Author(s) / Creator(s): Yang, Zhutian; Garrett, Caelan; Kumar, Nishanth; Fox, Dieter; Lozano-Perez, Tomas; Kaelbling, Leslie

Publisher / Repository: IEEE International Conference on Robotics and Automation

Date Published: 2025-06-02

Journal Name: Proceedings IEEE International Conference on Robotics and Automation

ISSN: 1050-4729


Location: Atlanta, Georgia

Sponsoring Org: National Science Foundation

Conference Paper:
The DOI is not currently available.
