PROC2PDDL: Open-Domain Planning Representations from Texts

Zhang, Tianyi; Zhang, Li; Hou, Zhaoyi; Wang, Ziyu; Gu, Yuling; Clark, Peter; Callison-Burch, Chris; Tandon, Niket

doi:10.18653/v1/2024.nlrse-1.2

Planning in a text-based environment remains a significant challenge for AI systems. Recent approaches have used language models to predict planning domain definitions (e.g., PDDL), but they have only been evaluated in closed-domain simulated environments. To address this gap, we present Proc2PDDL, the first dataset containing open-domain procedural texts paired with expert-annotated PDDL representations. Using this dataset, we evaluate the task of predicting domain actions (parameters, preconditions, and effects). We experiment with various large language models (LLMs) and prompting mechanisms, including a novel instruction inspired by the zone of proximal development (ZPD), which reframes the task as a sequence of incremental basic skills. Our results show that Proc2PDDL is highly challenging for end-to-end LLMs: GPT-3.5's success rate is close to 0% and GPT-4o's is 38%. With ZPD instructions, GPT-4o's success rate increases to 45%, outperforming regular chain-of-thought prompting (34%). Our analysis systematically examines both syntactic and semantic errors, providing insights into the strengths and weaknesses of language models in generating domain-specific programs.
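To make the prediction target concrete, the sketch below models a single PDDL domain action as its three predicted parts (parameters, preconditions, and effects) and renders it in standard PDDL `:action` syntax. This is a minimal illustration, not the authors' code or data format: the class `PDDLAction` and the example action `fill-container` with its predicates `empty`, `near`, and `full` are hypothetical and do not come from the Proc2PDDL dataset.

```python
from dataclasses import dataclass


@dataclass
class PDDLAction:
    """Hypothetical container for one predicted PDDL domain action."""
    name: str
    parameters: list[tuple[str, str]]  # (variable, type) pairs, e.g. ("?c", "container")
    preconditions: list[str]           # atomic PDDL literals, conjoined with `and`
    effects: list[str]                 # add/delete literals, conjoined with `and`

    def to_pddl(self) -> str:
        # Typed parameter list, e.g. "?c - container ?s - water_source"
        params = " ".join(f"{var} - {typ}" for var, typ in self.parameters)
        pre = " ".join(self.preconditions)
        eff = " ".join(self.effects)
        return (
            f"(:action {self.name}\n"
            f"  :parameters ({params})\n"
            f"  :precondition (and {pre})\n"
            f"  :effect (and {eff}))"
        )


# Illustrative action a model might predict from a procedural text step
# such as "fill the container at the water source" (hypothetical example).
action = PDDLAction(
    name="fill-container",
    parameters=[("?c", "container"), ("?s", "water_source")],
    preconditions=["(empty ?c)", "(near ?s)"],
    effects=["(full ?c)", "(not (empty ?c))"],
)
print(action.to_pddl())
```

Keeping the three parts separate mirrors how the task is framed in the abstract: each part can be predicted and checked on its own before being assembled into a domain file.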

Award ID(s): 1928474

PAR ID: 10563508

Author(s) / Creator(s): Zhang, Tianyi; Zhang, Li; Hou, Zhaoyi; Wang, Ziyu; Gu, Yuling; Clark, Peter; Callison-Burch, Chris; Tandon, Niket

Publisher / Repository: Association for Computational Linguistics

Date Published: 2024-01-01

Page Range / eLocation ID: 13 to 24

Subject(s) / Keyword(s): LLMs planning PDDL

Location: Bangkok, Thailand

Sponsoring Org: National Science Foundation

Free Publicly Accessible Full Text: Accepted Manuscript 1.0
Conference Paper: https://doi.org/10.18653/v1/2024.nlrse-1.2