<?xml version="1.0" encoding="UTF-8"?><rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:dcq="http://purl.org/dc/terms/"><records count="1" morepages="false" start="1" end="1"><record rownumber="1"><dc:product_type>Conference Paper</dc:product_type><dc:title>Multimodal Turn Analysis and Prediction for Multi-party Conversations</dc:title><dc:creator>Lee, Meng-Chen; Trinh, Mai; Deng, Zhigang</dc:creator><dc:corporate_author/><dc:editor/><dc:description>This paper presents a computational study to analyze and predict turns (i.e., turn-taking and turn-keeping) in multi-party conversations. Specifically, we use a high-fidelity hybrid data acquisition system to capture a large-scale set of multimodal natural conversational behaviors of interlocutors in three-party conversations, including gaze, head movements, body movements, and speech. Based on the inter-pausal units (IPUs) extracted from this in-house dataset, we propose a transformer-based computational model to predict turns based on the interlocutor states (speaking/back-channeling/silence) and the gaze targets. Our model
can robustly achieve more than 80% accuracy, and the generalizability of our model was extensively validated through cross-group experiments. We also introduce a novel computational metric called “relative engagement level” (REL) of IPUs, and further validate that it differs significantly between turn-keeping IPUs and turn-taking
IPUs, and between different conversational groups. Our experimental results also show that the patterns of the interlocutor states serve as a more effective cue than gaze behaviors for predicting turns in multi-party conversations.</dc:description><dc:publisher>ACM</dc:publisher><dc:date>2023-10-09</dc:date><dc:nsf_par_id>10532343</dc:nsf_par_id><dc:journal_name/><dc:journal_volume/><dc:journal_issue/><dc:page_range_or_elocation>436 to 444</dc:page_range_or_elocation><dc:issn/><dc:isbn>9798400700552</dc:isbn><dc:doi>https://doi.org/10.1145/3577190.3614139</dc:doi><dcq:identifierAwardId>2005430</dcq:identifierAwardId><dc:subject>Multi-party conversations</dc:subject><dc:subject>conversational gesture understanding</dc:subject><dc:subject>Multimodal interaction</dc:subject><dc:subject>Machine learning</dc:subject><dc:subject>Human-human interaction</dc:subject><dc:version_number/><dc:location>Paris, France</dc:location><dc:rights/><dc:institution/><dc:sponsoring_org>National Science Foundation</dc:sponsoring_org></record></records></rdf:RDF>