Title: Scarecrows in Oz: The Use of Large Language Models in HRI
The proliferation of Large Language Models (LLMs) presents both a critical design challenge and a remarkable opportunity for the field of Human-Robot Interaction (HRI). While the direct deployment of LLMs on interactive robots may be unsuitable for reasons of ethics, safety, and control, LLMs might nevertheless provide a promising baseline technique for many elements of HRI. Specifically, in this position paper, we argue for the use of LLMs as Scarecrows: ‘brainless,’ straw-man black-box modules integrated into robot architectures for the purpose of quickly enabling full-pipeline solutions, much like the use of “Wizard of Oz” (WoZ) and other human-in-the-loop approaches. We explicitly acknowledge that these Scarecrows, rather than providing a satisfying or scientifically complete solution, incorporate a form of the wisdom of the crowd, and, in at least some cases, will ultimately need to be replaced or supplemented by a robust and theoretically motivated solution. We provide examples of how Scarecrows could be used in language-capable robot architectures as useful placeholders, and suggest initial reporting guidelines for authors, mirroring existing guidelines for the use and reporting of WoZ techniques.
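The Scarecrow pattern described above can be sketched in code. In this hypothetical example (the interface name `ReferringExpressionGenerator`, the class `ScarecrowREG`, and the `fake_llm` stand-in are all illustrative assumptions, not from the paper), an LLM-backed black box fills an architectural slot so the full pipeline runs end to end before a principled module exists to replace it:

```python
from abc import ABC, abstractmethod

class ReferringExpressionGenerator(ABC):
    """Hypothetical architectural interface: any module that turns a
    target object and a scene description into a natural-language
    reference for the robot to speak."""
    @abstractmethod
    def generate(self, target: str, scene: list[str]) -> str:
        ...

class ScarecrowREG(ReferringExpressionGenerator):
    """'Brainless' Scarecrow placeholder: delegates the whole problem
    to an LLM so the surrounding pipeline can be exercised, with the
    expectation that a theoretically motivated module replaces it."""
    def __init__(self, llm):
        # `llm` is any callable mapping a prompt string to a text reply.
        self.llm = llm

    def generate(self, target: str, scene: list[str]) -> str:
        prompt = (f"Objects in view: {', '.join(scene)}. "
                  f"Produce a short, unambiguous reference to: {target}.")
        return self.llm(prompt)

# Offline stand-in for a real LLM call, so this sketch is runnable.
def fake_llm(prompt: str) -> str:
    return "the red mug on the left"

reg = ScarecrowREG(fake_llm)
print(reg.generate("mug_1", ["red mug", "blue mug", "plate"]))
```

Because the Scarecrow sits behind an ordinary interface, swapping in a robust replacement later requires no change to the rest of the architecture, which is precisely what makes it a useful placeholder rather than a final solution.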
Award ID(s):
2024878 2145642
PAR ID:
10466820
Author(s) / Creator(s):
; ; ;
Publisher / Repository:
Association for Computing Machinery
Date Published:
Journal Name:
ACM Transactions on Human-Robot Interaction
ISSN:
2573-9522
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. To support positive, ethical human-robot interactions, robots need to be able to respond to unexpected situations in which societal norms are violated, including rejecting unethical commands. Implementing robust communication for robots is inherently difficult due to the variability of context in real-world settings and the risks of unintended influence during robots’ communication. HRI researchers have begun exploring the potential use of LLMs as a solution for language-based communication, which will require an in-depth understanding and evaluation of LLM applications in different contexts. In this work, we explore how an existing LLM responds to and reasons about a set of norm-violating requests in HRI contexts. We ask human participants to assess the performance of a hypothetical GPT-4-based robot on moral reasoning and explanatory language selection as it compares to human intuitions. Our findings suggest that while GPT-4 performs well at identifying norm-violating requests and suggesting non-compliant responses, its failure to match human linguistic preferences and context sensitivity prevents it from being a comprehensive solution for moral communication between humans and robots. Based on our results, we provide a four-point recommendation for the community on incorporating LLMs into HRI systems.
  2. Natural human-robot interaction (HRI) attracts considerable interest in enabling robots to understand users’ emotional states. This paper demonstrates a method for introducing an affection model into a robotic system’s conversational agent to provide natural and empathetic HRI. We use a large-scale pre-trained language model and fine-tune it on a dialogue dataset with empathetic characteristics. Building on the progress of existing studies, we extend the current method and enable the agent to perform advanced sentiment analysis using the affection model. This dialogue agent allows the robot to provide natural responses along with emotion classification and estimates of arousal and valence levels. We evaluate our model using different metrics, comparing it with recent studies and demonstrating its emotion detection capacity.
  3. In this paper, we argue that, as HCI becomes more multimodal with the integration of gesture, gaze, posture, and other nonverbal behavior, it is important to understand the role played by affordances and their associated actions in human-object interactions (HOI), so as to facilitate reasoning in HCI and HRI environments. We outline the requirements and challenges involved in developing a multimodal semantics for human-computer and human-robot interactions. Unlike unimodal interactive agents (e.g., text-based chatbots or voice-based personal digital assistants), multimodal HCI and HRI inherently require a notion of embodiment, or an understanding of the agent’s placement within the environment and that of its interlocutor. We present a dynamic semantics of the language, VoxML, to model human-computer, human-robot, and human-human interactions by creating multimodal simulations of both the communicative content and the agents’ common ground, and show the utility of VoxML information that is reified within the environment on computational understanding of objects for HOI. 
  4. Large language models (LLMs) hold significant promise for improving human-robot interaction, offering advanced conversational skills and versatility in managing diverse, open-ended user requests across various tasks and domains. Despite this potential to transform human-robot interaction, very little is known about the distinctive design requirements for utilizing LLMs in robots, which may differ from text and voice interaction and vary by task and context. To better understand these requirements, we conducted a user study (n = 32) comparing an LLM-powered social robot against text- and voice-based agents, analyzing task-based requirements in conversational tasks, including choose, generate, execute, and negotiate. Our findings show that LLM-powered robots elevate expectations for sophisticated non-verbal cues and excel in connection-building and deliberation, but fall short in logical communication and may induce anxiety. We provide design implications both for robots integrating LLMs and for fine-tuning LLMs for use with robots.
  5. Augmented Reality (AR) technologies present an exciting new medium for human-robot interactions, enabling new opportunities for both implicit and explicit human-robot communication. For example, these technologies enable physically limited robots to execute non-verbal interaction patterns such as deictic gestures despite lacking the physical morphology necessary to do so. However, a wealth of HRI research has demonstrated real benefits to physical embodiment (compared to, e.g., virtual robots on screens), suggesting AR augmentation of virtual robot parts could face challenges. In this work, we present empirical evidence comparing the use of virtual (AR) and physical arms to perform deictic gestures that identify virtual or physical referents. Our subjective and objective results demonstrate the success of mixed-reality deictic gestures in overcoming these potential limitations, and their successful use regardless of differences in physicality between gesture and referent. These results help to motivate the further deployment of mixed-reality robotic systems and provide nuanced insight into the role of mixed-reality technologies in HRI contexts.