Title: PROSODIC PREDICTORS OF TEMPORAL STRUCTURE IN CONVERSATIONAL TURN-TAKING
Fluid conversation depends on conversation partners' ability to predict one another's speech in order to forecast turn ends and prepare upcoming turns. One model of this temporal prediction is the coupled oscillator model of turn-taking. The model captures the relative scarcity of interruption in turn-taking: it predicts that partners' turns should be counter-phased to one another, with minimal pause time between turns. In naturalistic conversation, however, turns are often delayed rather than occurring in perfect succession. We hypothesize that these delays are not of arbitrary duration but are structured in their timing, just as immediate turn transitions are. We demonstrate that the relative timing of prosodic events at turn ends is key to modelling pause duration between turns, providing evidence that inter-turn pauses stand in a temporal trading relation with the final syllable and prosodic word of the immediately preceding turn.
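The coupled oscillator account invoked in the abstract can be illustrated with two phase oscillators coupled in anti-phase, so that their phase difference settles at π (counter-phased turns). The rates, coupling strength, and update rule below are illustrative assumptions for a toy simulation, not the authors' actual model:

```python
import math

def simulate_antiphase(omega=2 * math.pi, k=2.0, dt=0.001, steps=20000):
    """Toy coupled-oscillator sketch of turn-taking: two syllable-rate
    oscillators, one per speaker, coupled so their phases settle into
    counter-phase (a stable phase difference of pi)."""
    theta_a, theta_b = 0.0, 0.5  # arbitrary initial phases (radians)
    for _ in range(steps):
        # Each oscillator runs at its natural rate omega and is pulled
        # toward anti-phase with its partner by the coupling term.
        d_a = omega + k * math.sin(theta_b - theta_a - math.pi)
        d_b = omega + k * math.sin(theta_a - theta_b - math.pi)
        theta_a += d_a * dt
        theta_b += d_b * dt
    # Return the settled phase difference, folded into [0, 2*pi).
    return (theta_b - theta_a) % (2 * math.pi)

diff = simulate_antiphase()
print(round(diff, 2))  # → 3.14, i.e. the turns are counter-phased
```

Under this idealization a phase difference of exactly π leaves no gap and no overlap; the paper's point is that real inter-turn pauses deviate from this in systematically timed ways.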
Award ID(s):
2306149
PAR ID:
10451410
Author(s) / Creator(s):
Editor(s):
Skarnitzl, R.; Volín, J.
Date Published:
Journal Name:
Proceedings of the International Congress of Phonetic Sciences
ISSN:
0301-3162
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like This
  1. This paper presents a computational study that analyzes and predicts turns (i.e., turn-taking and turn-keeping) in multiparty conversations. Specifically, we use a high-fidelity hybrid data acquisition system to capture a large-scale set of multi-modal natural conversational behaviors of interlocutors in three-party conversations, including gaze, head movements, body movements, and speech. Based on the inter-pausal units (IPUs) extracted from this in-house dataset, we propose a transformer-based model that predicts turns from the interlocutor states (speaking/back-channeling/silence) and gaze targets. The model robustly achieves more than 80% accuracy, and its generalizability was extensively validated through cross-group experiments. We also introduce a novel metric, the "relative engagement level" (REL) of IPUs, and show that it differs significantly between turn-keeping and turn-taking IPUs, and across conversational groups. Finally, we found that the patterns of interlocutor states are a more effective cue than gaze behaviors for predicting turns in multiparty conversations.
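The inter-pausal units this abstract builds on are commonly obtained by merging voiced stretches separated by sub-threshold silences. A minimal sketch of that segmentation step, where the 200 ms pause threshold is a common convention and not taken from the paper:

```python
def merge_into_ipus(voiced_intervals, pause_threshold=0.2):
    """Group (start, end) voiced intervals into inter-pausal units (IPUs):
    stretches of one speaker's speech separated by silences shorter than
    `pause_threshold` seconds are merged into a single unit."""
    if not voiced_intervals:
        return []
    ipus = [list(voiced_intervals[0])]
    for start, end in voiced_intervals[1:]:
        if start - ipus[-1][1] < pause_threshold:
            ipus[-1][1] = end  # pause too short: extend the current IPU
        else:
            ipus.append([start, end])  # real pause: start a new IPU
    return [tuple(ipu) for ipu in ipus]

# Three voiced stretches; the first gap (0.1 s) is below threshold.
print(merge_into_ipus([(0.0, 1.0), (1.1, 2.0), (2.5, 3.0)]))
# → [(0.0, 2.0), (2.5, 3.0)]
```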
  2. Conversational search is an emerging topic in the information retrieval community. One of the major challenges in multi-turn conversational search is modeling the conversation history to understand the current question. Existing methods either prepend history turns to the current question or use complicated attention mechanisms to model the history. We propose a conceptually simple yet highly effective approach referred to as history answer embedding. It enables seamless integration of conversation history into a conversational question answering (ConvQA) model built on BERT (Bidirectional Encoder Representations from Transformers). We first explain our view that ConvQA is a simplified but concrete setting of conversational search, then provide a general framework for solving ConvQA and demonstrate the effectiveness of our approach within it. Finally, we analyze the impact of different numbers of history turns under different settings. We show that history-prepending methods degrade dramatically when given a long conversation history, while our method remains robust and shows clear advantages in that situation, providing new insight into conversation history modeling for ConvQA.
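The core idea of a history answer embedding — marking tokens that belonged to earlier answers rather than prepending whole history turns — can be sketched as adding one learned vector to the embeddings of marked tokens. The function name, shapes, and vectors below are illustrative assumptions, not the paper's implementation:

```python
import numpy as np

def add_history_answer_embedding(token_embs, answer_mask, hae):
    """Sketch of the history-answer-embedding idea: add a learned vector
    `hae` to the embedding of every token that appeared in a previous
    turn's answer span (answer_mask = 1), leaving other tokens unchanged.
    Names and shapes here are hypothetical."""
    token_embs = np.asarray(token_embs, dtype=float)
    mask = np.asarray(answer_mask, dtype=float)[:, None]  # column of 0/1 flags
    return token_embs + mask * np.asarray(hae, dtype=float)

embs = np.zeros((4, 3))  # four tokens, 3-dim toy embeddings
out = add_history_answer_embedding(embs, [0, 1, 1, 0], [1.0, 1.0, 1.0])
print(out[1])  # marked tokens are shifted by the history-answer vector
```

The input sequence keeps its original length, which is why this approach avoids the degradation that prepending long histories causes.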
  3. Skarnitzl, R. (Ed.)
    The timing of both manual co-speech gestures and head gestures is sensitive to the prosodic structure of speech. However, head gestures are used not only by speakers but also by listeners, as a backchanneling device, and little research exists on the timing of gestures in backchanneling. To address this gap, we compare the timing of listener and speaker head gestures in an interview context. The results reveal the dual role that head gestures play in speech and conversational interaction: while they are coordinated in key ways with one's own speech, they are also coordinated with the gestures (and hence the speech) of a conversation partner when one is actively listening. We also show that head gesture timing is sensitive to the social dynamics between interlocutors. This study provides a novel contribution to the literature on head gesture timing and has implications for studies of discourse and accommodation.
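A basic way to quantify the kind of timing coordination this abstract describes is to pair each head-gesture apex with its nearest prosodic anchor in the speech and average the signed lags. This is a generic timing-analysis sketch with invented timestamps, not the paper's actual pipeline:

```python
def mean_asynchrony(gesture_times, anchor_times):
    """For each gesture apex, find the nearest prosodic anchor (e.g. a
    stressed-syllable onset) and average the signed lags (gesture minus
    anchor). A positive mean means gestures tend to follow their anchors."""
    lags = []
    for g in gesture_times:
        nearest = min(anchor_times, key=lambda a: abs(g - a))
        lags.append(g - nearest)
    return sum(lags) / len(lags)

# Invented apex and anchor times, in seconds:
print(round(mean_asynchrony([0.12, 1.05, 2.2], [0.0, 1.0, 2.0]), 2))  # → 0.12
```

The same statistic computed on a partner's anchors, rather than one's own, would index the listener-side coordination the study reports.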
  4. Participants in a conversation must carefully monitor the turn-management (speaking and listening) willingness of their conversational partners and adjust their turn-changing behaviors accordingly to keep the conversation smooth. Many studies have focused on developing actual turn-changing (i.e., next-speaker or end-of-turn) models that predict whether turn-keeping or turn-changing will occur, using participants' verbal and non-verbal behaviors as input features. To the best of our knowledge, these studies model only the relationship between participant behavior and turn-changing; no model takes into account participants' willingness to acquire a turn (turn-management willingness). In this paper, we address the challenge of building models that predict the willingness of both speakers and listeners. First, we find that dissonance exists between willingness and actual turn-changing. Second, we propose predictive models based on trimodal inputs: acoustic, linguistic, and visual cues distilled from conversations. Additionally, we study whether modeling willingness improves turn-changing prediction. To do so, we introduce a dyadic conversation corpus annotated with speaker/listener turn-management willingness scores. Our results show that using all three modalities of both the speaker and the listener is critically important for predicting turn-management willingness. Furthermore, explicitly adding willingness as a prediction task improves turn-changing prediction, and willingness prediction itself becomes more accurate when the two tasks are predicted jointly using multi-task learning techniques.
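The multi-task setup described above — trimodal features feeding a shared representation with separate heads for turn-changing and willingness — can be sketched as a single forward pass. All layer sizes, weights, and the architecture itself are illustrative assumptions, not the paper's model:

```python
import numpy as np

rng = np.random.default_rng(0)

def trimodal_multitask_forward(acoustic, linguistic, visual, params):
    """Toy multi-task predictor: a shared layer over concatenated
    acoustic/linguistic/visual features feeds two heads, one for
    turn-changing probability and one for a willingness score."""
    x = np.concatenate([acoustic, linguistic, visual])
    shared = np.tanh(params["w_shared"] @ x)                    # shared representation
    p_turn = 1 / (1 + np.exp(-params["w_turn"] @ shared))       # turn-changing prob.
    willingness = float(params["w_will"] @ shared)              # willingness score
    return p_turn, willingness

# Hypothetical weights and 4-dim per-modality features:
params = {
    "w_shared": rng.standard_normal((8, 12)),
    "w_turn": rng.standard_normal(8),
    "w_will": rng.standard_normal(8),
}
p, w = trimodal_multitask_forward(rng.standard_normal(4),
                                  rng.standard_normal(4),
                                  rng.standard_normal(4), params)
print(0.0 < p < 1.0)  # → True
```

In training, the two heads' losses would be combined (e.g., a weighted sum), which is what lets the willingness task regularize and improve turn-changing prediction.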
  5. Robot-mediated interventions have been investigated for treating social skill deficits in children with Autism Spectrum Disorder (ASD). Does the use of a Nao robot as a mediator increase vocal interaction among children with ASD? The present study examined vocalization and turn-taking rates in six children with ASD (mean age = 11.4 years, SD = 0.86 years) interacting with and without a Nao robot for 10 sessions, order counterbalanced. Each session lasted nine minutes. In the Robot condition, the robot provided vocal prompts; in the No Robot condition, children interacted freely. Child vocalization and turn-taking rates, defined as the number of utterances/turns per second, were measured. Three children produced higher vocalization and turn-taking rates when the robot was present, and two when it was absent. One participant produced higher vocalization rates when the robot was not present but more conversational turns when it was present. The findings suggest that using a Nao robot as a social mediator increases vocalization and turn-taking rates among children with ASD, though with large individual variability. The effect of the robot as a mediator on the lexical diversity of child speech will also be investigated.
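The rate measures defined above are simple per-second counts. A minimal sketch, where the nine-minute (540 s) session length comes from the abstract and the timestamp lists are invented for illustration:

```python
def rates(utterance_times, turn_change_times, session_seconds=540):
    """Vocalization and turn-taking rates as defined in the study:
    number of utterances / turns per second over a nine-minute session."""
    return (len(utterance_times) / session_seconds,
            len(turn_change_times) / session_seconds)

# Hypothetical counts: 108 utterances and 27 turn exchanges in one session.
voc_rate, turn_rate = rates(utterance_times=list(range(108)),
                            turn_change_times=list(range(27)))
print(voc_rate, turn_rate)  # → 0.2 0.05
```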