Title: S2M-Net: Speech Driven Three-party Conversational Motion Synthesis Networks
In this paper, we propose a novel conditional generative adversarial network (cGAN) architecture, called S2M-Net, to holistically synthesize realistic three-party conversational animations from acoustic speech input together with speaker marking (i.e., the speaking time of each interlocutor). Specifically, based on a pre-collected three-party conversational motion dataset, we design and train the S2M-Net for three-party conversational animation synthesis. In this architecture, the generator contains an LSTM encoder that encodes a sequence of acoustic speech features into a latent vector, which is fed into a transform unit that maps it into a gesture kinematics space. The output of this transform unit is then fed into an LSTM decoder to generate the corresponding three-party conversational gesture kinematics. Meanwhile, a discriminator checks whether an input sequence of three-party conversational gesture kinematics is real or fake. To evaluate our method, besides quantitative and qualitative evaluations, we also conducted paired-comparison user studies to compare it with the state of the art.
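The encoder-transform-decoder data flow described above can be pictured with a short PyTorch-style sketch. This is not the authors' implementation: the layer widths, the pose dimensionality (three interlocutors times a per-person pose vector), and the way speaker marking is injected (here simply concatenated to the acoustic features) are all illustrative assumptions.

```python
import torch
import torch.nn as nn

class S2MGenerator(nn.Module):
    """LSTM encoder -> transform unit -> LSTM decoder, as described in the abstract.
    All dimensions are placeholders, not the published configuration."""
    def __init__(self, speech_dim=64, n_speakers=3, latent_dim=256, pose_dim=3 * 69):
        super().__init__()
        # Speaker marking is assumed to be concatenated to the acoustic features.
        self.encoder = nn.LSTM(speech_dim + n_speakers, latent_dim, batch_first=True)
        # Transform unit: latent vector -> gesture kinematics space.
        self.transform = nn.Sequential(
            nn.Linear(latent_dim, latent_dim), nn.ReLU(),
            nn.Linear(latent_dim, latent_dim))
        self.decoder = nn.LSTM(latent_dim, latent_dim, batch_first=True)
        self.out = nn.Linear(latent_dim, pose_dim)  # joint kinematics of all three interlocutors

    def forward(self, speech, speaker_mark):
        # speech: (batch, frames, speech_dim); speaker_mark: (batch, frames, n_speakers)
        h, _ = self.encoder(torch.cat([speech, speaker_mark], dim=-1))
        g, _ = self.decoder(self.transform(h))
        return self.out(g)                          # (batch, frames, pose_dim)

class S2MDiscriminator(nn.Module):
    """Scores a three-party gesture sequence as real or synthesized."""
    def __init__(self, pose_dim=3 * 69, hidden=256):
        super().__init__()
        self.rnn = nn.LSTM(pose_dim, hidden, batch_first=True)
        self.cls = nn.Linear(hidden, 1)

    def forward(self, poses):                       # poses: (batch, frames, pose_dim)
        _, (h_n, _) = self.rnn(poses)
        return self.cls(h_n[-1])                    # one real/fake logit per sequence
```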
Award ID(s):
2005430
PAR ID:
10463603
Date Published:
Journal Name:
Proceedings of the ACM SIGGRAPH Conference on Motion, Interaction, and Games 2022
Page Range / eLocation ID:
2:1 to 2:10
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. Abstract: In this article, we present a live speech-driven, avatar-mediated, three-party telepresence system through which three distant users, embodied as avatars in a shared 3D virtual world, can hold a natural three-party conversation without any tracking devices. Based on live speech input from the three users, the system generates the corresponding conversational motions of all the avatars in real time, including head motion, eye motion, lip movement, torso motion, and hand gestures. All motions are generated automatically at each user's side from the live speech input, and a cloud server transmits and synchronizes motion and speech among the users. We conducted a formal user study to evaluate the usability and effectiveness of the system by comparing it with a well-known online virtual world, Second Life, and a widely used online teleconferencing system, Skype. The results indicate that our system provides a measurably better telepresence experience than these two widely used alternatives.
  2. Theunissen, Frédéric E (Ed.)
    Human speech recognition transforms a continuous acoustic signal into categorical linguistic units, by aggregating information that is distributed in time. It has been suggested that this kind of information processing may be understood through the computations of a Recurrent Neural Network (RNN) that receives input frame by frame, linearly in time, but builds an incremental representation of this input through a continually evolving internal state. While RNNs can simulate several key behavioral observations about human speech and language processing, it is unknown whether RNNs also develop computational dynamics that resemble human neural speech processing. Here we show that the internal dynamics of long short-term memory (LSTM) RNNs, trained to recognize speech from auditory spectrograms, predict human neural population responses to the same stimuli, beyond predictions from auditory features. Variations in the RNN architecture motivated by cognitive principles further improved this predictive power. Specifically, modifications that allow more human-like phonetic competition also led to more human-like temporal dynamics. Overall, our results suggest that RNNs provide plausible computational models of the cortical processes supporting human speech recognition.
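As a concrete illustration of the modelling setup described in this abstract, here is a minimal sketch assuming mel-spectrogram input and a ridge regression as the mapping from hidden states to neural responses; the layer sizes, target inventory, and choice of regression are assumptions, not details taken from the paper.

```python
import torch
import torch.nn as nn
from sklearn.linear_model import Ridge

class SpeechLSTM(nn.Module):
    """LSTM that maps auditory spectrogram frames to (assumed) phoneme posteriors."""
    def __init__(self, n_mels=80, hidden=512, n_targets=40):
        super().__init__()
        self.lstm = nn.LSTM(n_mels, hidden, batch_first=True)
        self.head = nn.Linear(hidden, n_targets)

    def forward(self, spec):                 # spec: (batch, frames, n_mels)
        states, _ = self.lstm(spec)          # internal dynamics of interest
        return self.head(states), states

model = SpeechLSTM()
spec = torch.randn(1, 300, 80)               # one utterance, 300 spectrogram frames
_, states = model(spec)

# Encoding-model step: predict neural population responses from the LSTM's
# hidden states (a ridge regression here; random data stands in for recordings).
neural = torch.randn(300, 128)                # e.g., 128 recorded channels
encoder = Ridge(alpha=1.0).fit(states[0].detach().numpy(), neural.numpy())
```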
  3. Abstract: Gestures that accompany speech are an essential part of natural and efficient embodied human communication. The automatic generation of such co-speech gestures is a long-standing problem in computer animation and is considered an enabling technology for creating believable characters in film, games, and virtual social spaces, as well as for interaction with social robots. The problem is made challenging by the idiosyncratic and non-periodic nature of human co-speech gesture motion, and by the great diversity of communicative functions that gestures encompass. The field of gesture generation has seen surging interest in the last few years, owing to the emergence of more and larger datasets of human gesture motion, combined with strides in deep-learning-based generative models that benefit from the growing availability of data. This review article summarizes co-speech gesture generation research, with a particular focus on deep generative models. First, we articulate the theory describing human gesticulation and how it complements speech. Next, we briefly discuss rule-based and classical statistical gesture synthesis, before delving into deep learning approaches. We employ the choice of input modalities as an organizing principle, examining systems that generate gestures from audio, text and non-linguistic input. Concurrent with the exposition of deep learning approaches, we chronicle the evolution of the related training datasets in terms of size, diversity, motion quality, and collection method (e.g., optical motion capture or pose estimation from video). Finally, we identify key research challenges in gesture generation, including data availability and quality; producing human-like motion; grounding the gesture in the co-occurring speech in interaction with other speakers, and in the environment; performing gesture evaluation; and integration of gesture synthesis into applications. We highlight recent approaches to tackling the various key challenges, as well as the limitations of these approaches, and point toward areas of future development.
  4. ISCA (Ed.)
    In this paper, we present MixRep, a simple and effective data-augmentation strategy based on mixup for low-resource ASR. MixRep interpolates the feature dimensions of hidden representations in the neural network and can be applied both to the acoustic feature input and to the output of each layer, which generalizes the previous MixSpeech method. Further, we propose to combine the mixup with a regularization along the time axis of the input, which is shown to be complementary. We apply MixRep to a Conformer encoder of an E2E LAS architecture trained with a joint CTC loss. We experiment on the WSJ dataset and subsets of the SWB dataset, covering read speech and conversational telephone speech. Experimental results show that MixRep consistently outperforms other regularization methods for low-resource ASR. Compared to a strong SpecAugment baseline, MixRep achieves a +6.5% and a +6.7% relative WER reduction on the eval92 set and the Callhome part of the eval'2000 set, respectively.
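The core interpolation step behind this kind of representation-level mixup can be sketched in a few lines; the Beta(α, α) mixing coefficient, the per-layer application policy, and the way the two CTC losses are combined are assumptions here rather than the paper's exact recipe.

```python
import torch

def mix_representations(h, alpha=0.5):
    """Mix hidden representations of paired utterances within a batch.
    h: (batch, frames, dim) representations at the input or at some layer's output."""
    lam = torch.distributions.Beta(alpha, alpha).sample().item()
    perm = torch.randperm(h.size(0))          # pair each utterance with another in the batch
    mixed = lam * h + (1.0 - lam) * h[perm]   # interpolate along the feature dimension
    return mixed, perm, lam

# During training, the loss would be mixed with the same coefficient, e.g.:
# loss = lam * ctc_loss(logits, y) + (1 - lam) * ctc_loss(logits, y[perm])
```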
  5. Abstract: Photospheric magnetic field parameters are frequently used to analyze and predict solar events. Observation of these parameters over time, i.e., representing solar events by multivariate time-series (MVTS) data, can determine relationships between magnetic field states in active regions and extreme solar events, e.g., solar flares. We can improve our understanding of these events by selecting the most relevant parameters that give the highest predictive performance. In this study, we propose a two-step incremental feature selection method for MVTS data using a deep-learning model based on long short-term memory (LSTM) networks. First, each MVTS feature (magnetic field parameter) is evaluated individually by a univariate sequence classifier utilizing an LSTM network. Then, the top-performing features are combined to produce input for an LSTM-based multivariate sequence classifier. Finally, we tested the discrimination ability of the selected features by training downstream classifiers, e.g., Minimally Random Convolutional Kernel Transform and support vector machine. We performed our experiments using a benchmark data set for flare prediction known as Space Weather Analytics for Solar Flares. We compared our proposed method with three other baseline feature selection methods and demonstrated that our method selects more discriminatory features compared to other methods. Due to the imbalanced nature of the data, primarily caused by the rarity of minority flare classes (e.g., the X and M classes), we used the true skill statistic as the evaluation metric. Lastly, we reported the set of photospheric magnetic field parameters that give the highest discrimination performance in predicting flare classes.
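To make the evaluation metric and the first (univariate) selection step concrete, here is an illustrative sketch; the `train_lstm` trainer is a hypothetical callback standing in for the univariate LSTM classifier, and the number of retained features is an arbitrary choice, not the paper's.

```python
import numpy as np

def true_skill_statistic(y_true, y_pred):
    """TSS = TP/(TP+FN) - FP/(FP+TN); robust to the class imbalance of flare data."""
    tp = np.sum((y_pred == 1) & (y_true == 1))
    fn = np.sum((y_pred == 0) & (y_true == 1))
    fp = np.sum((y_pred == 1) & (y_true == 0))
    tn = np.sum((y_pred == 0) & (y_true == 0))
    return tp / (tp + fn) - fp / (fp + tn)

def rank_features(X, y, train_lstm, k=10):
    """Step 1: score each magnetic-field parameter with a univariate sequence classifier.
    X: (samples, timesteps, n_features); train_lstm(X_sub, y) returns 0/1 predictions."""
    scores = []
    for f in range(X.shape[2]):
        y_hat = train_lstm(X[:, :, [f]], y)       # univariate LSTM on one parameter
        scores.append(true_skill_statistic(y, y_hat))
    top = np.argsort(scores)[::-1][:k]            # step 2 would train a multivariate
    return top                                    # LSTM classifier on X[:, :, top]
```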