
Title: Investigating Confidence-Based Category Transition of Spatial Gestures
Situated human-human communication typically involves a combination of both natural language and gesture, especially deictic gestures intended to draw the listener's attention to target referents. To engage in natural communication, robots must thus be similarly enabled not only to generate natural language, but to generate the appropriate gestures to accompany that language. In this work, we examine the gestures humans use to accompany spatial language, specifically the way that these gestures continuously degrade in specificity and then discretely transition into non-deictic gestural forms along with decreasing confidence in referent location. We then outline a research plan in which we propose to use data collected through our study of this transition to design more human-like gestures for language-capable robots.
Journal Name:
2nd Workshop on Natural Language Generation for Human-Robot Interaction
Sponsoring Org:
National Science Foundation
More Like this
  1.
    To enable robots to select between different types of nonverbal behavior when accompanying spatial language, we must first understand the factors that guide human selection between such behaviors. In this work, we argue that to enable appropriate spatial gesture selection, HRI researchers must answer four questions: (1) What are the factors that determine the form of gesture used to accompany spatial language? (2) What parameters of these factors cause speakers to switch between these categories? (3) How do the parameterizations of these factors inform the performance of gestures within these categories? and (4) How does human generation of gestures differ from human expectations of how robots should generate such gestures? In this work, we consider the first three questions and make two key contributions: (1) a human-human interaction experiment investigating how human gestures transition between deictic and non-deictic under changes in contextual factors, and (2) a model of gesture category transition informed by the results of this experiment. 
  2. In previous work, researchers have repeatedly demonstrated that robots' use of deictic gestures enables effective and natural human-robot interaction. However, new technologies such as augmented reality head-mounted displays enable environments in which mixed reality becomes possible, and in such environments, physical gestures become but one category among many different types of mixed reality deictic gestures. In this paper, we present the first experimental exploration of the effectiveness of mixed reality deictic gestures beyond physical gestures. Specifically, we investigate human perception of videos simulating the display of allocentric gestures, in which robots circle their targets in users' fields of view. Our results suggest that this is an effective communication strategy, both in terms of objective accuracy and subjective perception, especially when paired with complex natural language references.
  3. Gestures that accompany speech are an essential part of natural and efficient embodied human communication. The automatic generation of such co-speech gestures is a long-standing problem in computer animation and is considered an enabling technology for creating believable characters in film, games, and virtual social spaces, as well as for interaction with social robots. The problem is made challenging by the idiosyncratic and non-periodic nature of human co-speech gesture motion, and by the great diversity of communicative functions that gestures encompass. The field of gesture generation has seen surging interest in the last few years, owing to the emergence of more and larger datasets of human gesture motion, combined with strides in deep-learning-based generative models that benefit from the growing availability of data. This review article summarizes co-speech gesture generation research, with a particular focus on deep generative models. First, we articulate the theory describing human gesticulation and how it complements speech. Next, we briefly discuss rule-based and classical statistical gesture synthesis, before delving into deep learning approaches. We employ the choice of input modalities as an organizing principle, examining systems that generate gestures from audio, text, and non-linguistic input. Concurrent with the exposition of deep learning approaches, we chronicle the evolution of the related training datasets in terms of size, diversity, motion quality, and collection method (e.g., optical motion capture or pose estimation from video). Finally, we identify key research challenges in gesture generation, including data availability and quality; producing human-like motion; grounding the gesture in the co-occurring speech, in interaction with other speakers, and in the environment; performing gesture evaluation; and integration of gesture synthesis into applications. We highlight recent approaches to tackling these key challenges, as well as the limitations of these approaches, and point toward areas of future development.

  4. Augmented Reality (AR) technologies present an exciting new medium for human-robot interactions, enabling new opportunities for both implicit and explicit human-robot communication. For example, these technologies enable physically limited robots to execute non-verbal interaction patterns such as deictic gestures despite lacking the physical morphology necessary to do so. However, a wealth of HRI research has demonstrated real benefits to physical embodiment (compared to, e.g., virtual robots on screens), suggesting AR augmentation of virtual robot parts could face challenges. In this work, we present empirical evidence comparing the use of virtual (AR) and physical arms to perform deictic gestures that identify virtual or physical referents. Our subjective and objective results demonstrate the success of mixed reality deictic gestures in overcoming these potential limitations, and their successful use regardless of differences in physicality between gesture and referent. These results help to motivate the further deployment of mixed reality robotic systems and provide nuanced insight into the role of mixed-reality technologies in HRI contexts.
  5. Recently, researchers have initiated a new wave of convergent research in which Mixed Reality visualizations enable new modalities of human-robot communication, including Mixed Reality Deictic Gestures (MRDGs) – the use of visualizations like virtual arms or arrows to serve the same purpose as traditional physical deictic gestures. But while researchers have demonstrated a variety of benefits to these gestures, it is unclear whether the success of these gestures depends on a user’s level and type of cognitive load. We explore this question through an experiment grounded in rich theories of cognitive resources, attention, and multi-tasking, with significant inspiration drawn from Multiple Resource Theory. Our results suggest that MRDGs provide task-oriented benefits regardless of cognitive load, but only when paired with complex language. These results suggest that designers can pair rich referring expressions with MRDGs without fear of cognitively overloading their users. 