

Title: GESTURAL ALIGNMENT AND ACCOMMODATION IN SPEAKER-LISTENER HEAD GESTURES
The timing of both manual co-speech gestures and head gestures is sensitive to the prosodic structure of speech. However, head gestures are used not only by speakers, but also by listeners as a backchanneling device. Little research exists on the timing of gestures in backchanneling. To address this gap, we compare the timing of listener and speaker head gestures in an interview context. Results reveal the dual role that head gestures play in speech and conversational interaction: while they are coordinated in key ways to one’s own speech, they are also coordinated to the gestures (and hence, the speech) of a conversation partner when one is actively listening to them. We also show that head gesture timing is sensitive to social dynamics between interlocutors. This study provides a novel contribution to the literature on head gesture timing and has implications for studies of discourse and accommodation.
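As a rough illustration of the kind of timing comparison the abstract describes, here is a minimal sketch that measures, for each head-gesture onset, its signed lag to the nearest prosodic landmark in the speaker's audio. All times and variable names below are hypothetical placeholders; in a real study these would come from hand or automatic annotations (e.g., ELAN or Praat tiers), and the paper's actual measures may differ.

```python
# Signed lag (s) from each gesture onset to its nearest prosodic landmark.
# Negative values mean the gesture onset precedes the landmark.
import numpy as np

def lags_to_nearest(gesture_onsets, prosodic_landmarks):
    onsets = np.asarray(gesture_onsets)
    landmarks = np.sort(np.asarray(prosodic_landmarks))
    idx = np.searchsorted(landmarks, onsets)
    idx = np.clip(idx, 1, len(landmarks) - 1)
    left, right = landmarks[idx - 1], landmarks[idx]
    nearest = np.where(onsets - left < right - onsets, left, right)
    return onsets - nearest

# Hypothetical onset times (seconds) for one interview excerpt.
speaker_nods = [1.20, 3.85, 7.10]    # speaker's own head gestures
listener_nods = [1.45, 4.02, 7.33]   # listener backchannel nods
accents = [1.30, 3.90, 7.15, 9.60]   # pitch accents in the speaker's speech

print("speaker lags :", lags_to_nearest(speaker_nods, accents))
print("listener lags:", lags_to_nearest(listener_nods, accents))
```

Comparing the two lag distributions (e.g., with a rank-sum test) is one way to quantify whether listener gestures track the speaker's prosody as tightly as the speaker's own gestures do.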
Award ID(s):
2306149
PAR ID:
10451416
Author(s) / Creator(s):
Editor(s):
Skarnitzl, R.
Date Published:
Journal Name:
Proceedings of the International Congress of Phonetic Sciences
ISSN:
0301-3162
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1.
    This research establishes a better understanding of the syntax choices in speech interactions and of how speech, gesture, and multimodal gesture-and-speech interactions are produced by users in unconstrained object manipulation environments using augmented reality. The work presents a multimodal elicitation study conducted with 24 participants. The canonical referents for translation, rotation, and scale were used along with some abstract referents (create, destroy, and select). In this study, time windows for gesture and speech multimodal interactions are developed using the start and stop times of gestures and speech, as well as the stroke times for gestures. While gestures commonly precede speech by 81 ms, we find that the stroke of the gesture commonly falls within 10 ms of the start of speech, indicating that the information content of a gesture and its co-occurring speech are well aligned with each other. Lastly, the trends across the most common proposals for each modality are examined, showing that disagreement between proposals is often caused by variation in hand posture or syntax. This allows us to present aliasing recommendations to increase the percentage of users' natural interactions captured by future multimodal interactive systems.
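The time-window computation this abstract describes reduces to simple offsets between annotated events. Below is a hedged sketch of that arithmetic; the field names and sample values are illustrative stand-ins, not the study's data.

```python
# Given start/stop times for a gesture and its co-occurring speech, plus the
# gesture's stroke time, compute how far the gesture onset leads speech and
# how close the stroke falls to speech onset.
from dataclasses import dataclass

@dataclass
class MultimodalEvent:
    gesture_start: float   # seconds
    gesture_stroke: float
    gesture_stop: float
    speech_start: float
    speech_stop: float

def timing_offsets(ev: MultimodalEvent) -> tuple[float, float]:
    """Return (onset lag, stroke lag) in ms relative to speech onset.

    Negative onset lag means the gesture began before speech."""
    onset_lag = (ev.gesture_start - ev.speech_start) * 1000.0
    stroke_lag = (ev.gesture_stroke - ev.speech_start) * 1000.0
    return onset_lag, stroke_lag

# One hypothetical "rotate" command: gesture leads speech by ~80 ms and the
# stroke lands within ~10 ms of speech onset, matching the reported trends.
ev = MultimodalEvent(gesture_start=2.42, gesture_stroke=2.51,
                     gesture_stop=3.10, speech_start=2.50, speech_stop=3.30)
print(timing_offsets(ev))   # -> approximately (-80.0, 10.0)
```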
  2. Abstract

    Gestures that accompany speech are an essential part of natural and efficient embodied human communication. The automatic generation of such co‐speech gestures is a long‐standing problem in computer animation and is considered an enabling technology for creating believable characters in film, games, and virtual social spaces, as well as for interaction with social robots. The problem is made challenging by the idiosyncratic and non‐periodic nature of human co‐speech gesture motion, and by the great diversity of communicative functions that gestures encompass. The field of gesture generation has seen surging interest in the last few years, owing to the emergence of more and larger datasets of human gesture motion, combined with strides in deep‐learning‐based generative models that benefit from the growing availability of data. This review article summarizes co‐speech gesture generation research, with a particular focus on deep generative models. First, we articulate the theory describing human gesticulation and how it complements speech. Next, we briefly discuss rule‐based and classical statistical gesture synthesis, before delving into deep learning approaches. We employ the choice of input modalities as an organizing principle, examining systems that generate gestures from audio, text and non‐linguistic input. Concurrent with the exposition of deep learning approaches, we chronicle the evolution of the related training data sets in terms of size, diversity, motion quality, and collection method (e.g., optical motion capture or pose estimation from video). Finally, we identify key research challenges in gesture generation, including data availability and quality; producing human‐like motion; grounding the gesture in the co‐occurring speech in interaction with other speakers, and in the environment; performing gesture evaluation; and integration of gesture synthesis into applications. We highlight recent approaches to tackling the various key challenges, as well as the limitations of these approaches, and point toward areas of future development.
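    To make the audio-driven setup the review surveys concrete, here is a minimal PyTorch sketch: a recurrent network mapping per-frame audio features to per-frame skeletal poses. This is not any specific published system, and it is a deterministic regression baseline rather than a full generative model; the feature and pose dimensions are placeholders.

```python
import torch
import torch.nn as nn

class AudioToGesture(nn.Module):
    """Toy audio-to-pose regressor in the spirit of audio-driven systems."""
    def __init__(self, audio_dim=26, pose_dim=45, hidden=256):
        super().__init__()
        self.encoder = nn.GRU(audio_dim, hidden, batch_first=True)
        self.decoder = nn.Linear(hidden, pose_dim)   # per-frame pose regression

    def forward(self, audio_feats):                  # (batch, frames, audio_dim)
        h, _ = self.encoder(audio_feats)
        return self.decoder(h)                       # (batch, frames, pose_dim)

model = AudioToGesture()
audio = torch.randn(8, 120, 26)   # e.g., 8 clips x 120 frames of MFCC-like features
poses = model(audio)              # predicted joint rotations/positions per frame
loss = nn.functional.mse_loss(poses, torch.randn_like(poses))  # placeholder target
loss.backward()
print(poses.shape)                # torch.Size([8, 120, 45])
```

    The deep generative models the review focuses on (e.g., normalizing flows or diffusion models over pose sequences) replace this plain regression head with a probabilistic decoder, precisely because co-speech gesture is non-periodic and one audio input admits many plausible motions.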

  3. Abstract

    Teaching a new concept through gestures—hand movements that accompany speech—facilitates learning above‐and‐beyond instruction through speech alone (e.g., Singer & Goldin‐Meadow). However, the mechanisms underlying this phenomenon are still under investigation. Here, we use eye tracking to explore one often proposed mechanism—gesture's ability to direct visual attention. Behaviorally, we replicate previous findings: Children perform significantly better on a posttest after learning through Speech+Gesture instruction than through Speech Alone instruction. Using eye tracking measures, we show that children who watch a math lesson with gesture do allocate their visual attention differently from children who watch a math lesson without gesture—they look more to the problem being explained, less to the instructor, and are more likely to synchronize their visual attention with information presented in the instructor's speech (i.e., follow along with speech) than children who watch the no‐gesture lesson. The striking finding is that, even though these looking patterns positively predict learning outcomes, the patterns do not mediate the effects of training condition (Speech Alone vs. Speech+Gesture) on posttest success. We find instead a complex relation between gesture and visual attention in which gesture moderates the impact of visual looking patterns on learning—following along with speech predicts learning for children in the Speech+Gesture condition, but not for children in the Speech Alone condition. Gesture's beneficial effects on learning thus come not merely from its ability to guide visual attention, but also from its ability to synchronize with speech and affect what learners glean from that speech.

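In regression terms, the moderation this abstract reports is an interaction term: the effect of following along with speech on posttest performance depends on training condition. A minimal sketch with simulated data follows; the variable names are hypothetical and the simulated effect merely mimics the reported pattern.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(0)
n = 120
df = pd.DataFrame({
    "condition": rng.integers(0, 2, n),      # 0 = Speech Alone, 1 = Speech+Gesture
    "follow_speech": rng.uniform(0, 1, n),   # proportion of looks synced with speech
})
# Simulate the reported pattern: following along with speech predicts
# learning only in the gesture condition.
df["posttest"] = (0.2 + 0.5 * df.condition * df.follow_speech
                  + rng.normal(0, 0.1, n)).clip(0, 1)

# "condition * follow_speech" expands to both main effects plus their
# interaction; the condition:follow_speech coefficient carries the moderation.
model = smf.ols("posttest ~ condition * follow_speech", data=df).fit()
print(model.params)
```

Mediation, by contrast, would require the looking patterns to statistically carry the condition effect on posttest scores, which is the model the authors report does not hold.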
  4. Abstract

    When asked to explain their solutions to a problem, children often gesture and, at times, these gestures convey information that is different from the information conveyed in speech. Children who produce these gesture‐speech “mismatches” on a particular task have been found to profit from instruction on that task. We have recently found that some children produce gesture‐speech mismatches when identifying numbers at the cusp of their knowledge, for example, a child incorrectly labels a set of two objects with the word “three” and simultaneously holds up two fingers. These mismatches differ from previously studied mismatches (where the information conveyed in gesture has the potential to be integrated with the information conveyed in speech) in that the gestured response contradicts the spoken response. Here, we ask whether these contradictory number mismatches predict which learners will profit from number‐word instruction. We used the Give‐a‐Number task to measure number knowledge in 47 children (mean age = 4.1 years, SD = 0.58), and used the What's on this Card task to assess whether children produced gesture‐speech mismatches above their knower level. Children who were early in their number learning trajectories (“one‐knowers” and “two‐knowers”) were then randomly assigned, within knower level, to one of two training conditions: a Counting condition in which children practiced counting objects; or an Enriched Number Talk condition containing counting, labeling set sizes, spatial alignment of neighboring sets, and comparison of these sets. Controlling for counting ability, we found that children were more likely to learn the meaning of new number words in the Enriched Number Talk condition than in the Counting condition, but only if they had produced gesture‐speech mismatches at pretest. The findings suggest that numerical gesture‐speech mismatches are a reliable signal that a child is ready to profit from rich number instruction and provide evidence, for the first time, that cardinal number gestures have a role to play in number learning.

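The coding step at the heart of this design is simple to state algorithmically: a trial is a contradictory mismatch when the spoken number word and the number of fingers held up disagree. A minimal sketch follows; the trial fields are hypothetical stand-ins for hand-coded video annotations, not the study's coding scheme.

```python
# Code a single What's-on-this-Card trial as match vs. gesture-speech mismatch.
NUMBER_WORDS = {"one": 1, "two": 2, "three": 3, "four": 4, "five": 5, "six": 6}

def code_trial(spoken_word: str, fingers_raised: int) -> str:
    spoken = NUMBER_WORDS[spoken_word.lower()]
    if spoken == fingers_raised:
        return "match"
    return "gesture-speech mismatch"

# The example from the abstract: the child says "three" while holding up two fingers.
print(code_trial("three", 2))   # -> gesture-speech mismatch
```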
  5. Skarnitzl, R. (Ed.)
    While motion capture is rapidly becoming the gold standard for research on the intricacies of co-speech gesture and its relationship to speech, traditional marker-based motion capture technology is not always feasible, meaning researchers must code video data manually. We compare two methods for coding co-speech gestures of the hands and arms in video data of spontaneous speech: manual coding and semi-automated coding using OpenPose, a markerless motion capture software. We provide a comparison of the temporal alignment of gesture apexes based on video recordings of interviews with speakers of Medumba (Grassfields Bantu). Our results show a close correlation between the computationally calculated apexes and our hand-annotated apexes, suggesting that both methods are equally valid for coding video data. The use of markerless motion capture technology for gesture coding will enable more rapid coding of manual gestures, while still allowing 
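One common heuristic for the semi-automated apex coding this abstract compares is to extract a wrist trajectory from OpenPose's per-frame keypoints, smooth it, and take the apex as the slowest frame inside the annotated gesture span. The sketch below implements that heuristic on a synthetic trajectory; the paper's exact procedure may differ, and the trajectory is a hypothetical placeholder.

```python
import numpy as np
from scipy.ndimage import gaussian_filter1d

def apex_frame(wrist_xy: np.ndarray, start: int, stop: int, fps: float = 30.0) -> int:
    """wrist_xy: (frames, 2) pixel coordinates; start/stop: annotated gesture span."""
    smoothed = gaussian_filter1d(wrist_xy, sigma=2, axis=0)     # reduce tracking jitter
    speed = np.linalg.norm(np.diff(smoothed, axis=0), axis=1) * fps   # px/s
    segment = speed[start:stop]
    return start + int(np.argmin(segment))   # slowest frame ~ stroke apex

# Hypothetical trajectory: the hand rises, pauses (apex), then retracts.
t = np.linspace(0, 1, 60)
wrist = np.stack([200 + 50 * np.sin(np.pi * t),
                  400 - 120 * np.sin(np.pi * t)], axis=1)
print(apex_frame(wrist, start=5, stop=55))   # near the pause at mid-gesture
```

Apexes detected this way can then be compared frame-by-frame against hand-annotated apexes, which is the correlation the study reports.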