

Title: Robot Behavior-Tree-Based Task Generation with Large Language Models
Nowadays, the behavior tree is gaining popularity as a representation for robot tasks due to its modularity and reusability. Designing behavior-tree tasks manually is time-consuming for robot end-users, so there is a need to investigate automatic behavior-tree-based task generation. Prior behavior-tree-based task generation approaches focus on fixed primitive tasks and lack generalizability to new task domains. To cope with this issue, we propose a novel behavior-tree-based task generation approach that utilizes state-of-the-art large language models. We propose a Phase-Step prompt design that enables hierarchically structured robot task generation and further integrate it with behavior-tree-embedding-based search to set up the appropriate prompt. In this way, we enable automatic and cross-domain behavior-tree task generation. Our behavior-tree-based task generation approach does not require a set of pre-defined primitive tasks. End-users only need to describe an abstract desired task, and our proposed approach can swiftly generate the corresponding behavior tree. A full-process case study is provided to demonstrate our proposed approach. An ablation study is conducted to evaluate the effectiveness of our Phase-Step prompts. Assessments of the Phase-Step prompts and the limitations of large language models are presented and discussed.
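As a rough illustration of the abstract above, the sketch below shows one way a Phase-Step prompting pipeline might be wired up: the language model is first asked for the high-level phases of the desired task, each phase is then expanded into steps, and the result is assembled into a behavior tree. The `query_llm` helper, the prompt wording, and the BehaviorTree.CPP-style XML output are illustrative assumptions, not the paper's actual design.

```python
# Minimal, illustrative sketch of a Phase-Step style prompting pipeline.
# `query_llm` is a hypothetical helper standing in for any chat-completion API;
# the prompt wording and the BehaviorTree.CPP-like XML output are assumptions.
import xml.etree.ElementTree as ET

def query_llm(prompt: str) -> str:
    """Placeholder: send `prompt` to an LLM and return its text response."""
    raise NotImplementedError("plug in your preferred LLM client here")

def generate_behavior_tree(task_description: str) -> str:
    # Phase prompt: ask for the high-level phases of the desired task.
    phases = query_llm(
        f"Decompose the robot task '{task_description}' into ordered phases, "
        "one per line."
    ).splitlines()

    root = ET.Element("root", main_tree_to_execute="MainTree")
    tree = ET.SubElement(root, "BehaviorTree", ID="MainTree")
    main_seq = ET.SubElement(tree, "Sequence", name=task_description)

    for phase in filter(None, (p.strip() for p in phases)):
        # Step prompt: expand each phase into executable leaf actions.
        steps = query_llm(
            f"For the phase '{phase}' of task '{task_description}', "
            "list the primitive robot actions in order, one per line."
        ).splitlines()
        phase_seq = ET.SubElement(main_seq, "Sequence", name=phase)
        for step in filter(None, (s.strip() for s in steps)):
            ET.SubElement(phase_seq, "Action", ID=step)

    return ET.tostring(root, encoding="unicode")
```

The behavior-tree-embedding-based search described in the abstract would sit in front of such a pipeline, supplying a retrieved example tree as additional prompt context.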
Award ID(s):
1813935
NSF-PAR ID:
10475187
Author(s) / Creator(s):
;
Publisher / Repository:
RWTH Aachen University
Date Published:
Journal Name:
CEUR workshop proceedings
Volume:
3433
ISSN:
1613-0073
Format(s):
Medium: X
Location:
San Francisco, CA, USA
Sponsoring Org:
National Science Foundation
More Like this
  1. Nowadays, the behavior tree is gaining popularity as a representation for robot tasks due to its modularity and reusability. Designing behavior-tree tasks manually is time-consuming for robot end-users, which suggests a need for automatic behavior-tree task generation. Prior behavior-tree generation approaches focus on fixed primitive tasks and lack generalizability to new task domains. To cope with this issue, we propose a novel behavior-tree task generation approach with state-of-the-art large language models. We present a Phase-Step prompt design that enables hierarchically structured robot task generation. We further integrate it with behavior-tree-embedding-based search to set up the appropriate prompt. In this way, we enable automatic and cross-domain behavior-tree task generation. Our task generation approach does not require a set of pre-defined primitive tasks. End-users only need to describe an abstract desired task, and our approach can swiftly generate the corresponding behavior tree. Case studies are provided to demonstrate our approach. (A minimal sketch of the embedding-based example retrieval mentioned here appears after this list.)
  2. Abstract

    Importance

    The study highlights the potential of large language models, specifically GPT-3.5 and GPT-4, in processing complex clinical data and extracting meaningful information with minimal training data. By developing and refining prompt-based strategies, we can significantly enhance the models’ performance, making them viable tools for clinical NER tasks and possibly reducing the reliance on extensive annotated datasets.

    Objectives

    This study quantifies the capabilities of GPT-3.5 and GPT-4 for clinical named entity recognition (NER) tasks and proposes task-specific prompts to improve their performance.

    Materials and Methods

    We evaluated these models on 2 clinical NER tasks: (1) to extract medical problems, treatments, and tests from clinical notes in the MTSamples corpus, following the 2010 i2b2 concept extraction shared task, and (2) to identify nervous system disorder-related adverse events from safety reports in the Vaccine Adverse Event Reporting System (VAERS). To improve the GPT models' performance, we developed a clinical task-specific prompt framework that includes (1) baseline prompts with a task description and format specification, (2) annotation guideline-based prompts, (3) error analysis-based instructions, and (4) annotated samples for few-shot learning. We assessed each prompt's effectiveness and compared the models to BioClinicalBERT. (A minimal sketch of how such a layered prompt might be assembled appears after this list.)

    Results

    Using baseline prompts, GPT-3.5 and GPT-4 achieved relaxed F1 scores of 0.634 and 0.804 for MTSamples and 0.301 and 0.593 for VAERS. Additional prompt components consistently improved model performance. When all 4 components were used, GPT-3.5 and GPT-4 achieved relaxed F1 scores of 0.794 and 0.861 for MTSamples and 0.676 and 0.736 for VAERS, demonstrating the effectiveness of our prompt framework. Although these results trail BioClinicalBERT (F1 of 0.901 for MTSamples and 0.802 for VAERS), they are very promising considering that only a few training samples are needed.

    Discussion

    The study’s findings suggest a promising direction for leveraging LLMs in clinical NER tasks. However, while the performance of GPT models improved with task-specific prompts, there is still a need for further development and refinement. LLMs like GPT-4 show potential to achieve performance close to that of state-of-the-art models like BioClinicalBERT, but they still require careful prompt engineering and task-specific knowledge. The study also underscores the importance of evaluation schemas that accurately reflect the capabilities and performance of LLMs in clinical settings.

    Conclusion

    While direct application of GPT models to clinical NER tasks falls short of optimal performance, our task-specific prompt framework, incorporating medical knowledge and training samples, significantly enhances GPT models' feasibility for potential clinical applications.

     
  3. Prompting has shown impressive success in enabling large pre-trained language models (LMs) to perform diverse NLP tasks, especially with only a small amount of downstream data. Automatically finding the optimal prompt for each task, however, is challenging. Most existing work resorts to tuning *soft* prompts (e.g., embeddings), which lack interpretability, reusability across LMs, and applicability when gradients are not accessible. *Discrete* prompts, on the other hand, are difficult to optimize, and are often created by “enumeration (e.g., paraphrasing)-then-selection” heuristics that do not explore the prompt space systematically. This paper proposes RLPrompt, an efficient discrete prompt optimization approach with reinforcement learning (RL). RLPrompt formulates a parameter-efficient policy network that generates the optimized discrete prompt after training with reward. To harness the complex and stochastic reward signals from the large LM environment, we incorporate effective reward stabilization that substantially enhances training efficiency. RLPrompt is flexibly applicable to different types of LMs, such as masked (e.g., BERT) and left-to-right models (e.g., GPTs), for both classification and generation tasks. Experiments on few-shot classification and unsupervised text style transfer show superior performance over a wide range of existing fine-tuning or prompting methods. Interestingly, the resulting optimized prompts are often ungrammatical gibberish text; and surprisingly, those gibberish prompts are transferable between different LMs and retain significant performance, indicating that LM prompting may not follow human language patterns. (A toy sketch of this kind of reward-driven discrete prompt search appears after this list.)
  4. We propose a locomotion framework for bipedal robots consisting of a new motion planning method, dubbed trajectory optimization for walking robots plus (TOWR+), and a new whole-body control method, dubbed implicit hierarchical whole-body controller (IHWBC). For versatility, we consider the use of a composite rigid body (CRB) model to optimize the robot’s walking behavior. The proposed CRB model considers the floating-base dynamics while accounting for the effects of the heavy distal mass of humanoids using a pre-trained centroidal inertia network. TOWR+ leverages the phase-based parameterization of its precursor, TOWR, and optimizes for base and end-effector motions, feet contact wrenches, and contact timings and locations without the need to solve a complementarity problem or an integer program. IHWBC enforces unilateral contact constraints (i.e., non-slip and non-penetration constraints) and a task hierarchy through the cost function, relaxing contact constraints and providing an implicit hierarchy between tasks. This controller provides additional flexibility and smooth task and contact transitions as applied to our 10-degree-of-freedom, line-feet biped robot DRACO. In addition, we introduce a new open-source and lightweight software architecture, dubbed planning and control (PnC), that implements and combines TOWR+ and IHWBC. PnC provides modularity, versatility, and scalability so that the provided modules can be interchanged with other motion planners and whole-body controllers and tested in an end-to-end manner. In the experimental section, we first analyze the performance of TOWR+ using various bipeds. We then demonstrate balancing behaviors on the DRACO hardware using the proposed IHWBC method. Finally, we integrate TOWR+ and IHWBC and demonstrate step-and-stop behaviors on the DRACO hardware. (A sketch of the weighted-cost idea behind such an implicit task hierarchy appears after this list.)
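For the behavior-tree task generation abstract above (item 1), the following is a minimal sketch of how behavior-tree-embedding-based search might retrieve a stored example to seed the prompt. The `embed_text` helper and the library of (description, behavior-tree XML) pairs are hypothetical stand-ins; the paper's actual embedding and search procedure may differ.

```python
# Illustrative sketch (not the paper's implementation) of retrieving a stored
# behavior tree by embedding similarity, so it can seed the prompt as an
# in-context example. `embed_text` is a hypothetical embedding helper; the
# library of (description, behavior-tree XML) pairs is assumed to exist.
import numpy as np

def embed_text(text: str) -> np.ndarray:
    """Placeholder: return a fixed-size embedding vector for `text`."""
    raise NotImplementedError("plug in your preferred text-embedding model here")

def retrieve_example(query: str, library: list[tuple[str, str]]) -> tuple[str, str]:
    """Return the (description, tree_xml) pair most similar to `query`."""
    q = embed_text(query)
    q = q / np.linalg.norm(q)

    def score(entry: tuple[str, str]) -> float:
        v = embed_text(entry[0])
        return float(np.dot(q, v / np.linalg.norm(v)))  # cosine similarity

    return max(library, key=score)
```

The retrieved pair would then be placed into the prompt as an example before the new task description.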
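For the clinical NER abstract above (item 2), here is a minimal sketch of assembling the kind of layered prompt it describes: a baseline task/format prompt, optional annotation-guideline text, optional error-analysis instructions, and few-shot annotated examples. The wording and the bracketed output format are assumptions, not the study's actual prompts.

```python
# Illustrative sketch of a layered clinical-NER prompt with the four components
# named in the abstract. The phrasing and the inline [entity](type) output
# format are assumptions, not the study's actual prompt text.
def build_ner_prompt(note: str,
                     guideline: str = "",
                     error_notes: str = "",
                     examples: list[tuple[str, str]] | None = None) -> str:
    parts = [
        # (1) baseline prompt: task description + output format specification
        "Extract all medical problems, treatments, and tests from the clinical "
        "note below. Mark each entity inline as [entity](type).",
    ]
    if guideline:           # (2) annotation guideline-based instructions
        parts.append("Annotation guidelines:\n" + guideline)
    if error_notes:         # (3) error analysis-based instructions
        parts.append("Common mistakes to avoid:\n" + error_notes)
    for text, labeled in (examples or []):   # (4) few-shot annotated samples
        parts.append(f"Example input:\n{text}\nExample output:\n{labeled}")
    parts.append("Clinical note:\n" + note)
    return "\n\n".join(parts)
```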
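For the RLPrompt abstract above (item 3), a toy sketch of reward-driven discrete prompt search: a tiny categorical policy over a small candidate vocabulary is sampled to form a prompt, scored by a task reward, and updated with a REINFORCE-style rule using a running-mean baseline as a simple form of reward stabilization. The vocabulary, sizes, and `reward_fn` are illustrative assumptions and far simpler than the paper's policy network.

```python
# Toy RLPrompt-style loop: sample prompt tokens from per-position categorical
# distributions, score the prompt with a task reward, and apply a REINFORCE
# update with a running-mean baseline. Everything here is a simplified sketch.
import numpy as np

VOCAB = ["classify", "sentiment", "review", "overall", "absolutely", "terrible"]
PROMPT_LEN = 4

def softmax(x: np.ndarray) -> np.ndarray:
    z = np.exp(x - x.max(axis=-1, keepdims=True))
    return z / z.sum(axis=-1, keepdims=True)

def optimize_prompt(reward_fn, steps: int = 200, lr: float = 0.5, seed: int = 0):
    rng = np.random.default_rng(seed)
    logits = np.zeros((PROMPT_LEN, len(VOCAB)))  # one categorical per position
    baseline = 0.0
    for t in range(steps):
        probs = softmax(logits)
        idx = [rng.choice(len(VOCAB), p=probs[i]) for i in range(PROMPT_LEN)]
        prompt = " ".join(VOCAB[i] for i in idx)
        r = reward_fn(prompt)                 # e.g., few-shot accuracy of a frozen LM
        baseline += (r - baseline) / (t + 1)  # running-mean baseline stabilizes reward
        for pos, tok in enumerate(idx):       # grad of log-prob: onehot(tok) - probs
            grad = -probs[pos]
            grad[tok] += 1.0
            logits[pos] += lr * (r - baseline) * grad
    best = [int(np.argmax(row)) for row in logits]
    return " ".join(VOCAB[i] for i in best)
```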
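For the bipedal locomotion abstract above (item 4), a minimal numerical sketch of the "implicit hierarchy through the cost function" idea: instead of strict null-space priorities, each task contributes a weighted least-squares term, so higher-priority tasks simply receive larger weights. The dimensions, weights, and plain least-squares solve are illustrative assumptions; the actual IHWBC additionally handles contact wrenches and unilateral contact constraints.

```python
# Soft task hierarchy via a weighted cost: minimize sum_i w_i * ||J_i qdd - xdd_i||^2.
# Higher-priority tasks get larger weights instead of strict projections.
# This is a simplified sketch, not the IHWBC formulation itself.
import numpy as np

def weighted_task_qp(tasks: list[tuple[float, np.ndarray, np.ndarray]],
                     n_dof: int, reg: float = 1e-6) -> np.ndarray:
    """tasks: list of (weight, task Jacobian J_i, desired task acceleration xdd_i)."""
    rows_A = [np.sqrt(w) * J for (w, J, _) in tasks]
    rows_b = [np.sqrt(w) * xdd for (w, _, xdd) in tasks]
    A = np.vstack(rows_A + [np.sqrt(reg) * np.eye(n_dof)])  # small regularizer
    b = np.concatenate(rows_b + [np.zeros(n_dof)])
    qdd, *_ = np.linalg.lstsq(A, b, rcond=None)
    return qdd  # commanded joint accelerations

# Example: a 10-DoF model with a high-weight CoM task and a low-weight posture task.
rng = np.random.default_rng(0)
com_task = (100.0, rng.standard_normal((3, 10)), np.array([0.0, 0.0, -0.1]))
posture_task = (1.0, np.eye(10), np.zeros(10))
qdd = weighted_task_qp([com_task, posture_task], n_dof=10)
```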