<?xml version="1.0" encoding="UTF-8"?><rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:dcq="http://purl.org/dc/terms/"><records count="1" morepages="false" start="1" end="1"><record rownumber="1"><dc:product_type>Conference Paper</dc:product_type><dc:title>Multimodal Turn Analysis and Prediction for Multi-party Conversations</dc:title><dc:creator>Lee, Meng-Chen; Trinh, Mai; Deng, Zhigang</dc:creator><dc:corporate_author/><dc:editor/><dc:description>This paper presents a computational study to analyze and predict turns (i.e., turn-taking and turn-keeping) in multi-party conversations. Specifically, we use a high-fidelity hybrid data acquisition system to capture a large-scale set of multimodal natural conversational behaviors of interlocutors in three-party conversations, including gaze, head movements, body movements, and speech. Based on the inter-pausal units (IPUs) extracted from this in-house dataset, we propose a transformer-based computational model to predict turns based on the interlocutor states (speaking/back-channeling/silence) and the gaze targets. Our model
can robustly achieve more than 80% accuracy, and the generalizability of our model was extensively validated through cross-group experiments. We also introduce a novel computational metric called “relative engagement level” (REL) of IPUs, and further validate that it differs significantly between turn-keeping IPUs and turn-taking
IPUs, and between different conversational groups. Our experimental results also show that the patterns of the interlocutor states serve as a more effective cue than gaze behaviors for predicting turns in multi-party conversations.</dc:description><dc:publisher>ACM</dc:publisher><dc:date>2023-10-09</dc:date><dc:nsf_par_id>10532343</dc:nsf_par_id><dc:journal_name/><dc:journal_volume/><dc:journal_issue/><dc:page_range_or_elocation>436 to 444</dc:page_range_or_elocation><dc:issn/><dc:isbn>9798400700552</dc:isbn><dc:doi>https://doi.org/10.1145/3577190.3614139</dc:doi><dcq:identifierAwardId>2005430</dcq:identifierAwardId><dc:subject>Multi-party conversations</dc:subject><dc:subject>conversational gesture understanding</dc:subject><dc:subject>Multimodal interaction</dc:subject><dc:subject>Machine learning</dc:subject><dc:subject>Human-human interaction</dc:subject><dc:version_number/><dc:location>Paris, France</dc:location><dc:rights/><dc:institution/><dc:sponsoring_org>National Science Foundation</dc:sponsoring_org></record></records></rdf:RDF>