Title: Multimodal Turn Analysis and Prediction for Multi-party Conversations
This paper presents a computational study to analyze and predict turns (i.e., turn-taking and turn-keeping) in multiparty conversations. Specifically, we use a high-fidelity hybrid data acquisition system to capture a large-scale set of multimodal natural conversational behaviors of interlocutors in three-party conversations, including gazes, head movements, body movements, speech, etc. Using the inter-pausal units (IPUs) extracted from the in-house acquired dataset, we propose a transformer-based computational model that predicts turns from the interlocutor states (speaking/back-channeling/silence) and the gaze targets. Our model robustly achieves more than 80% accuracy, and its generalizability was extensively validated through cross-group experiments. We also introduce a novel computational metric called “relative engagement level” (REL) of IPUs, and show that it differs with statistical significance between turn-keeping IPUs and turn-taking IPUs, and between different conversational groups. Our experimental results also show that the patterns of the interlocutor states are a more effective cue than gaze behaviors for predicting turns in multiparty conversations.
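The abstract does not say how IPUs are extracted or how they are labeled, so the following is only a minimal sketch of one common convention: a speaker's detected speech intervals are merged whenever the silence between them is shorter than a pause threshold, and an IPU is labeled turn-keeping if the same speaker produces the next IPU. The 0.2 s threshold, the function names `merge_into_ipus` and `label_turn`, and the labeling rule are all assumptions made for illustration, not details taken from the paper.

```python
from typing import List, Tuple

# (start_sec, end_sec) of one stretch of detected speech for a single speaker
Interval = Tuple[float, float]


def merge_into_ipus(speech: List[Interval], max_pause: float = 0.2) -> List[Interval]:
    """Merge one speaker's speech intervals into inter-pausal units.

    Consecutive intervals separated by a pause shorter than `max_pause`
    seconds are joined into one IPU. The default threshold is an assumed
    value, not the one used in the paper.
    """
    ipus: List[Interval] = []
    for start, end in sorted(speech):
        if ipus and start - ipus[-1][1] < max_pause:
            # Short pause: extend the previous IPU instead of starting a new one.
            prev_start, prev_end = ipus[-1]
            ipus[-1] = (prev_start, max(prev_end, end))
        else:
            ipus.append((start, end))
    return ipus


def label_turn(ipu_speaker: str, next_speaker: str) -> str:
    """Hypothetical labeling rule: same next speaker means the turn is kept."""
    return "turn-keeping" if ipu_speaker == next_speaker else "turn-taking"


if __name__ == "__main__":
    speech_a = [(0.0, 1.0), (1.1, 2.0), (2.5, 3.0)]
    print(merge_into_ipus(speech_a))
    print(label_turn("A", "B"))
```

In a sketch like this, each labeled IPU would then be paired with per-interlocutor state and gaze-target sequences to form one training example for a sequence model.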
Award ID(s):
2005430
NSF-PAR ID:
10532343
Publisher / Repository:
ACM
ISBN:
9798400700552
Page Range / eLocation ID:
436 to 444
Subject(s) / Keyword(s):
Multi-party conversations; conversational gesture understanding; Multimodal interaction; Machine learning; Human-human interaction; Empirical studies
Location:
Paris, France
Sponsoring Org:
National Science Foundation
More Like this
  1. Understanding why certain individuals work well (or poorly) together as a team is a key research focus in the psychological and behavioral sciences and a fundamental problem for team-based organizations. Nevertheless, we have a limited ability to predict the social and work-related dynamics that will emerge from a given combination of team members. In this work, we model vocal turn-taking behavior within conversations as a parametric stochastic process on a network composed of the team members. More precisely, we model the dynamics of exchanging the “speaker token” among team members as a random walk on a graph that is driven by both individual-level features and the conversation history. We fit our model to conversational turn-taking data extracted from audio recordings of multinational student teams during undergraduate engineering design internships. Using this real-world data, we validate the explanatory power of our model and unveil statistically significant differences in speaking behaviors between team members of different nationalities.
  2. Participants in a conversation must carefully monitor the turn-management (speaking and listening) willingness of other conversational partners and adjust their turn-changing behaviors accordingly to have smooth conversation. Many studies have focused on developing actual turn-changing (i.e., next speaker or end-of-turn) models that can predict whether turn-keeping or turn-changing will occur. Participants' verbal and non-verbal behaviors have been used as input features for predictive models. To the best of our knowledge, these studies only model the relationship between participant behavior and turn-changing. Thus, there is no model that takes into account participants' willingness to acquire a turn (turn-management willingness). In this paper, we address the challenge of building such models to predict the willingness of both speakers and listeners. Firstly, we find that dissonance exists between willingness and actual turn-changing. Secondly, we propose predictive models that are based on trimodal inputs, including acoustic, linguistic, and visual cues distilled from conversations. Additionally, we study the impact of modeling willingness to help improve the task of turn-changing prediction. To do so, we introduce a dyadic conversation corpus with annotated scores of speaker/listener turn-management willingness. Our results show that using all three modalities (i.e., acoustic, linguistic, and visual cues) of the speaker and listener is critically important for predicting turn-management willingness. Furthermore, explicitly adding willingness as a prediction task improves the performance of turn-changing prediction. Moreover, turn-management willingness prediction becomes more accurate when this joint prediction of turn-management willingness and turn-changing is performed by using multi-task learning techniques. 
  3. Effective storytelling relies on engagement and interaction. This work develops an automated software platform for telling stories to children and investigates the impact of two design choices on children’s engagement and willingness to interact with the system: story distribution and the use of complex gesture. A storyteller condition compares stories told in a third-person narrator voice with those distributed between a narrator and first-person story characters. Basic gestures are used in all our storytellings, but, in a second factor, some are augmented with gestures that indicate conversational turn changes, refer to other characters, and prompt children to ask questions. An analysis of eye gaze indicates that children attend more to the story when a distributed storytelling model is used. Gesture prompts appear to encourage children to ask questions, something that children did, but at a relatively low rate. Interestingly, the children most frequently asked “why” questions. Gaze switching happened more quickly when the story characters began to speak than for narrator turns. These results have implications for future agent-based storytelling system research.
  4. This article introduces the concept of “topic territoriality,” a mechanism that governs participation in conversational spaces. When a discussion becomes prone to territorialization, individuals are more likely to claim topics (participating in discussions about topics they own as “stakeholders”) and defer (reducing participation in topics owned by others). They are also more likely to patrol topic boundaries (monitoring who is participating and confronting topic “intruders”). We document the operation of topic territoriality by analyzing 112,278 conversational turns on Weibo before and after a policy that reveals users’ broad geographic locations. We find that revealing these locations increased territorial behaviors, leading to more homogenous participation in conversations. Although the display of locations has improved the overall civility in language, the confrontations between stakeholders and intruders became more toxic. Our research emphasizes the impact of topic territoriality in online conversations and sheds light on the unintended consequences of social media policies.