Vision-Language Models (VLMs) can generate plausible high-level plans when prompted with a goal, the context, an image of the scene, and any planning constraints. However, there is no guarantee that the predicted actions are geometrically and kinematically feasible for a particular robot embodiment. As a result, many prerequisite steps, such as opening drawers to access objects, are often omitted from their plans. Robot task and motion planners can generate motion trajectories that respect the geometric feasibility of actions and insert physically necessary actions, but they do not scale to everyday problems that require common-sense knowledge and involve large state spaces with many variables. We propose VLM-TAMP, a hierarchical planning algorithm that leverages a VLM to generate both semantically meaningful and horizon-reducing intermediate subgoals that guide a task and motion planner. When a subgoal or action cannot be refined, the VLM is queried again for replanning. We evaluate VLM-TAMP on kitchen tasks where a robot must accomplish cooking goals that require performing 30-50 actions in sequence and interacting with up to 21 objects. VLM-TAMP substantially outperforms baselines that rigidly and independently execute VLM-generated action sequences, both in terms of success rates (50 to 100% versus 0%) and average task completion percentage (72 to 100% versus 15 to 45%).
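The interplay between the VLM and the TAMP solver can be read as a subgoal-refine-replan loop. Below is a minimal Python sketch of that loop; the helper names (query_vlm_for_subgoals, refine_with_tamp, State) are hypothetical stand-ins for a VLM call and a TAMP solver, not the paper's actual interfaces.

```python
# Minimal sketch of a VLM-guided TAMP loop: ask a VLM for subgoals, try to
# refine each with a TAMP solver, and re-query the VLM when refinement fails.
# All helpers below are illustrative stubs, not the authors' implementation.
from dataclasses import dataclass, field


@dataclass
class State:
    facts: set = field(default_factory=set)


def query_vlm_for_subgoals(goal, state, feedback=None):
    """Stub VLM call: return a short list of intermediate subgoals."""
    return [f"{goal}-step-{i}" for i in range(3)]


def refine_with_tamp(state, subgoal):
    """Stub TAMP solver: return (success, motion_plan, next_state)."""
    next_state = State(facts=state.facts | {subgoal})
    return True, [f"trajectory-for-{subgoal}"], next_state


def vlm_tamp(goal, state, max_replans=3):
    executed, feedback = [], None
    for _ in range(max_replans):
        subgoals = query_vlm_for_subgoals(goal, state, feedback)
        for sg in subgoals:
            ok, plan, state = refine_with_tamp(state, sg)
            if not ok:
                # Subgoal is geometrically/kinematically infeasible: replan.
                feedback = f"could not refine {sg}"
                break
            executed.extend(plan)
        else:
            return executed  # every subgoal was refined and executed
    raise RuntimeError("replanning budget exhausted")


print(vlm_tamp("cook-soup", State()))
```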
Learning adaptive planning representations with natural language guidance
Effective planning in the real world requires not only world knowledge, but the ability to leverage that knowledge to build the right representation of the task at hand. Decades of hierarchical planning techniques have used domain-specific temporal action abstractions to support efficient and accurate planning, almost always relying on human priors and domain knowledge to decompose hard tasks into smaller subproblems appropriate for a goal or set of goals. This paper describes Ada (Action Domain Acquisition), a framework for automatically constructing task-specific planning representations using task-general background knowledge from language models (LMs). Starting with a general-purpose hierarchical planner and a low-level goal-conditioned policy, Ada interactively learns a library of planner-compatible high-level action abstractions and low-level controllers adapted to a particular domain of planning tasks. On two language-guided interactive planning benchmarks (Mini Minecraft and ALFRED Household Tasks), Ada strongly outperforms other approaches that use LMs for sequential decision-making, offering more accurate plans and better generalization to complex tasks.
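Ada's core loop of growing an operator library can be pictured with a small sketch. The helpers below (propose_operators_with_lm, plan_with_library, verify_with_policy) are hypothetical placeholders for the LM proposal step, the hierarchical planner, and the low-level goal-conditioned policy, and the operator representation is a toy assumption.

```python
# Sketch of library growth: plan with the current operators, query the LM for
# new abstractions when planning fails, and keep operators that verify at the
# low level. Stubs only; not the paper's interfaces.
def propose_operators_with_lm(task_description, library):
    """Stub LM call: propose candidate high-level operators for this task."""
    return [{"name": f"op_for_{task_description}",
             "preconds": [], "effects": [task_description]}]


def plan_with_library(task_description, library):
    """Stub hierarchical planner: succeed if some operator achieves the task."""
    return [op["name"] for op in library if task_description in op["effects"]]


def verify_with_policy(plan):
    """Stub low-level execution check of the grounded plan."""
    return bool(plan)


def ada_learn(tasks):
    library = []
    for task in tasks:
        plan = plan_with_library(task, library)
        if not plan:
            # Library is insufficient: ask the LM for new abstractions.
            library.extend(propose_operators_with_lm(task, library))
            plan = plan_with_library(task, library)
        if verify_with_policy(plan):
            print(f"solved {task} with plan {plan}")
    return library


ada_learn(["craft-pickaxe", "heat-potato"])
```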
- Award ID(s): 2212310
- PAR ID: 10535719
- Publisher / Repository: International Conference on Learning Representations
- Format(s): Medium: X
- Sponsoring Org: National Science Foundation
More Like this
This paper takes the first step towards a reactive, hierarchical multi-robot task allocation and planning framework given a global Linear Temporal Logic specification. The capabilities of both quadrupedal and wheeled robots are leveraged via a heterogeneous team to accomplish a variety of navigation and delivery tasks. However, when deployed in the real world, all robots can be susceptible to different types of disturbances, including but not limited to locomotion failures, human interventions, and obstructions from the environment. To address these disturbances, we propose task-level local and global reallocation strategies to efficiently generate updated action-state sequences online while guaranteeing the completion of the original task. These task reallocation approaches avoid reconstructing the entire plan or resynthesizing a new task. To integrate the task planner with low-level inputs, a Behavior Tree execution layer monitors different types of disturbances and employs the reallocation methods to generate corresponding recovery strategies. To evaluate this planning framework, dynamic simulations are conducted in a realistic hospital environment with a heterogeneous robot team consisting of quadrupeds and wheeled robots for delivery tasks.
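As a rough illustration of the local-versus-global reallocation idea, the sketch below first tries to hand a failed task to another healthy robot and only rebuilds the whole assignment when that fails. The Robot structure, the task names, and the trivial assignment rule are illustrative assumptions; the LTL synthesis and Behavior Tree layer are abstracted away.

```python
# Toy local-vs-global task reallocation triggered by a disturbance.
from dataclasses import dataclass


@dataclass
class Robot:
    name: str
    healthy: bool = True


def local_reallocate(failed_task, robots):
    """Try to hand only the failed task to another healthy robot."""
    for r in robots:
        if r.healthy:
            return {failed_task: r.name}
    return None


def global_reallocate(all_tasks, robots):
    """Fallback: redistribute every remaining task across healthy robots."""
    healthy = [r for r in robots if r.healthy]
    return {t: healthy[i % len(healthy)].name for i, t in enumerate(all_tasks)}


def on_disturbance(failed_task, remaining_tasks, robots):
    # Prefer a cheap local swap; only rebuild the whole assignment if needed.
    assignment = local_reallocate(failed_task, robots)
    if assignment is None:
        assignment = global_reallocate(remaining_tasks, robots)
    return assignment


robots = [Robot("quadruped-1", healthy=False), Robot("wheeled-1")]
print(on_disturbance("deliver-meds", ["deliver-meds", "open-door"], robots))
```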
In this work, we theoretically investigate why large language model (LLM)-empowered agents can solve decision-making problems in the physical world. We consider a hierarchical reinforcement learning (RL) model where the LLM Planner handles high-level task planning and the Actor performs low-level execution. Within this model, the LLM Planner operates in a partially observable Markov decision process (POMDP), iteratively generating language-based subgoals through prompting. Assuming appropriate pretraining data, we prove that the pretrained LLM Planner effectively conducts Bayesian aggregated imitation learning (BAIL) via in-context learning. We also demonstrate the need for exploration beyond the subgoals produced by BAIL, showing that naively executing these subgoals results in linear regret. To address this, we propose an ε-greedy exploration strategy for BAIL, which we prove achieves sublinear regret when pretraining error is low. Finally, we extend our theoretical framework to cases where the LLM Planner acts as a world model to infer the environment's transition model and to multi-agent settings, facilitating coordination among multiple Actors.
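The ε-greedy idea can be made concrete: with high probability follow the subgoal the pretrained LLM Planner would propose, and occasionally sample an alternative. In the sketch below, bail_subgoal and candidate_subgoals are hypothetical stubs standing in for the pretrained planner's proposal and its exploration pool.

```python
# Epsilon-greedy exploration over LLM-proposed subgoals (toy stubs).
import random


def bail_subgoal(history):
    """Stub for the pretrained LLM Planner's (BAIL) subgoal proposal."""
    return f"subgoal-after-{len(history)}-steps"


def candidate_subgoals(history):
    """Stub pool of alternative subgoals available for exploration."""
    return [f"alt-{i}" for i in range(3)]


def choose_subgoal(history, epsilon=0.1, rng=random):
    # With probability 1 - epsilon imitate the LLM Planner; otherwise explore.
    if rng.random() < epsilon:
        return rng.choice(candidate_subgoals(history))
    return bail_subgoal(history)


history = []
for _ in range(5):
    history.append(choose_subgoal(history))
print(history)
```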
General-purpose agents require fine-grained controls and rich sensory inputs to perform a wide range of tasks. However, this complexity often leads to intractable decision-making. Traditionally, agents are provided with task-specific action and observation spaces to mitigate this challenge, but this reduces autonomy. Instead, agents must be capable of building state-action spaces at the correct abstraction level from their sensorimotor experiences. We leverage the structure of a given set of temporally-extended actions to learn abstract Markov decision processes (MDPs) that operate at a higher level of temporal and state granularity. We characterize state abstractions necessary to ensure that planning with these skills, by simulating trajectories in the abstract MDP, results in policies with bounded value loss in the original MDP. We evaluate our approach in goal-based navigation environments that require continuous abstract states to plan successfully and show that abstract model learning improves the sample efficiency of planning and learning.
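One way to picture the construction is to estimate an abstract transition model by rolling out the given temporally-extended actions and recording which abstract states they connect. The sketch below does this for a toy one-dimensional domain; abstract_state and run_option are hypothetical placeholders for the learned state abstraction and the skills.

```python
# Estimate an abstract MDP's transition model from option rollouts (toy domain).
import random
from collections import defaultdict


def abstract_state(low_level_state):
    """Stub state abstraction: bucket a continuous position into a grid cell."""
    return int(low_level_state // 1.0)


def run_option(low_level_state, option):
    """Stub option execution: move roughly one unit with noise."""
    return low_level_state + option + random.gauss(0, 0.1)


def estimate_abstract_model(options, n_rollouts=200):
    counts = defaultdict(lambda: defaultdict(int))
    for _ in range(n_rollouts):
        s = random.uniform(0, 5)
        o = random.choice(options)
        s_next = run_option(s, o)
        counts[(abstract_state(s), o)][abstract_state(s_next)] += 1
    # Normalize visit counts into transition probabilities over abstract states.
    return {k: {s2: c / sum(v.values()) for s2, c in v.items()}
            for k, v in counts.items()}


print(list(estimate_abstract_model([+1, -1]).items())[:3])
```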
Integrated task and motion planning (TAMP) has proven to be a valuable approach to generalizable long-horizon robotic manipulation and navigation problems. However, the typical TAMP problem formulation assumes full observability and deterministic action effects. These assumptions limit the ability of the planner to gather information and make decisions that are risk-aware. We propose a strategy for TAMP with Uncertainty and Risk Awareness (TAMPURA) that is capable of efficiently solving long-horizon planning problems with initial-state and action outcome uncertainty, including problems that require information gathering and avoiding undesirable and irreversible outcomes. Our planner reasons under uncertainty at both the abstract task level and continuous controller level. Given a set of closed-loop goal-conditioned controllers operating in the primitive action space and a description of their preconditions and potential capabilities, we learn a high-level abstraction that can be solved efficiently and then refined to continuous actions for execution. We demonstrate our approach on several robotics problems where uncertainty is a crucial factor and show that reasoning under uncertainty in these problems outperforms previously proposed determinized planning, direct search, and reinforcement learning strategies. Lastly, we demonstrate our planner on two real-world robotics problems using recent advancements in probabilistic perception.
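A toy example of the kind of risk-aware choice such a planner makes at the abstract level: prefer controllers whose expected utility accounts for irreversible failure, not just determinized success. The outcome probabilities and the risk penalty below are made-up illustrative values, not models or results from the paper.

```python
# Risk-aware controller selection under outcome uncertainty (illustrative only).
controllers = {
    "grasp-visible":   {"success": 0.9, "irreversible_failure": 0.0},
    "grasp-occluded":  {"success": 0.4, "irreversible_failure": 0.3},
    "look-then-grasp": {"success": 0.8, "irreversible_failure": 0.0},
}


def value(outcomes, risk_penalty=10.0):
    # Expected utility: reward success, heavily penalize irreversible failures.
    return outcomes["success"] - risk_penalty * outcomes["irreversible_failure"]


best = max(controllers, key=lambda c: value(controllers[c]))
print(best, value(controllers[best]))
```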