skip to main content
US FlagAn official website of the United States government
dot gov icon
Official websites use .gov
A .gov website belongs to an official government organization in the United States.
https lock icon
Secure .gov websites use HTTPS
A lock ( lock ) or https:// means you've safely connected to the .gov website. Share sensitive information only on official, secure websites.


Title: Generating Executable Action Plans with Environmentally-Aware Language Models
Large Language Models (LLMs) trained using massive text datasets have recently shown promise in generating action plans for robotic agents from high level text queries. However, these models typically do not consider the robot’s environment, resulting in generated plans that may not actually be executable, due to ambiguities in the planned actions or environmental constraints. In this paper, we propose an approach to generate environmentally-aware action plans that agents are better able to execute. Our approach involves integrating environmental objects and object relations as additional inputs into LLM action plan generation to provide the system with an awareness of its surroundings, resulting in plans where each generated action is mapped to objects present in the scene. We also design a novel scoring function that, along with generating the action steps and associating them with objects, helps the system disambiguate among object instances and take into account their states. We evaluated our approach using the VirtualHome simulator and the ActivityPrograms knowledge base and found that action plans generated from our system had a 310% improvement in executability and a 147% improvement in correctness over prior work. The complete code and a demo of our method is publicly available at https://github.com/hri-ironlab/scene_aware_language_planner.  more » « less
Award ID(s):
2222952
PAR ID:
10615158
Author(s) / Creator(s):
;
Publisher / Repository:
IEEE
Date Published:
ISBN:
978-1-6654-9190-7
Page Range / eLocation ID:
3568 to 3575
Format(s):
Medium: X
Location:
Detroit, MI, USA
Sponsoring Org:
National Science Foundation
More Like this
  1. We propose a method for autonomously learning an object-centric representation of a continuous and high-dimensional environment that is suitable for planning. Such representations can immediately be transferred between tasks that share the same types of objects, resulting in agents that require fewer samples to learn a model of a new task. We first demonstrate our approach on a 2D crafting domain consisting of numerous objects where the agent learns a compact, lifted representation that generalises across objects. We then apply it to a series of Minecraft tasks to learn object-centric representations and object types---directly from pixel data---that can be leveraged to solve new tasks quickly. The resulting learned representations enable the use of a task-level planner, resulting in an agent capable of transferring learned representations to form complex, long-term plans. 
    more » « less
  2. Typically to a roboticist, a plan is the outcome of other work, a synthesized object that realizes ends defined by some problem; plans qua plans are seldom treated as first-class objects of study. Plans designate functionality: a plan can be viewed as defining a robot’s behavior throughout its execution. This informs and reveals many other aspects of the robot’s design, including: necessary sensors and action choices, history, state, task structure, and how to define progress. Interrogating sets of plans helps in comprehending the ways in which differing executions influence the interrelationships between these various aspects. Revisiting Erdmann’s theory of action-based sensors, a classical approach for characterizing fundamental information requirements, we show how plans (in their role of designating behavior) influence sensing requirements. Using an algorithm for enumerating plans, we examine how some plans for which no action-based sensor exists can be transformed into sets of sensors through the identification and handling of features that preclude the existence of action-based sensors. We are not aware of those obstructing features having been previously identified. Action-based sensors may be treated as standalone reactive plans; we relate them to the set of all possible plans through a lattice structure. This lattice reveals a boundary between plans with action-based sensors and those without. Some plans, specifically those that are not reactive plans and require some notion of internal state, can never have associated action-based sensors. Even so, action-based sensors can serve as a framework to explore and interpret how such plans make use of state. 
    more » « less
  3. Procedural modeling has produced amazing results, yet fundamental issues such as controllability and limited user guidance persist. We introduce a novel procedural system called PICO (Procedural Iterative Constrained Optimizer) using PICO-Graph, a procedural model designed with optimization in mind. PICO enables the exploration of generative designs by combining user and environmental constraints into a single framework and using optimization without the need to write procedural rules. The PICO-Graph is a data-flow procedural model consisting of a set of geometry-generating operation nodes. The forward generation is initiated by sending geometric objects from initial nodes. These objects travel through the graph, triggering generation of more objects along the way. We combine the PICO-Graph with evolutionary optimization that allows for exploration of the generated models and the generation of variants. The user defines the geometry-generating operations and the set of constraints; e.g, whether an existing object should be supported by the generated model, whether symmetries exist, etc. PICO then generates geometric models that fulfill the constraints through optimization, allowing interactive user control of constraints. We show PICO on a variety of examples, including generation of procedural chairs, generation of support structures for 3D printing, or generation of procedural terrains matching a given input. 
    more » « less
  4. Recognizing and generating object-state compositions has been a challenging task, especially when generalizing to unseen compositions. In this paper, we study the task of cutting objects in different styles and the resulting object state changes. We propose a new benchmark suite Chop & Learn, to accommodate the needs of learning objects and different cut styles using multiple viewpoints. We also propose a new task of Compositional Image Generation, which can transfer learned cut styles to different objects, by generating novel object-state images. Moreover, we also use the videos for Compositional Action Recognition, and show valuable uses of this dataset for multiple video tasks. Project website: https://chopnlearn.github.io. 
    more » « less
  5. Virtual reality is progressively more widely used to support embodied AI agents, such as robots, which frequently engage in ‘sim-to-real’ based learning approaches. At the same time, tools such as large vision-and-language models offer new capabilities that tie into a wide variety of tasks and capabilities. In order to understand how such agents can learn from simulated environments, we explore a language model’s ability to recover the type of object represented by a photorealistic 3D model as a function of the 3D perspective from which the model is viewed. We used photogrammetry to create 3D models of commonplace objects and rendered 2D images of these models from an fixed set of 420 virtual camera perspectives. A well-studied image and language model (CLIP) was used to generate text (i.e., prompts) corresponding to these images. Using multiple instances of various object classes, we studied which camera perspectives were most likely to return accurate text categorizations for each class of object. 
    more » « less