NSF PAR Search | NSF Public Access Repository

Note: When clicking on a Digital Object Identifier (DOI) number, you will be taken to an external site maintained by the publisher. Some full text articles may not yet be available without a charge during the embargo (administrative interval).
What is a DOI Number?

Some links on this page may take you to non-federal websites. Their policies may differ from this site.

HARP 3.0: Generalizing I/O and API Support for Machine Learning in Digital Audio Workstations

Cwitkowitz, Frank; Benetatos, Christodoulos; Deng, Qixin; Yu, Huiran; Pruyne, Nathan; O’Reilly, Patrick; Garcia, Hugo Flores; Duan, Zhiyao; Pardo, Bryan (December 2025, NeurIPS 2025 Workshop on AI for Music)

Free, publicly-accessible full text available December 1, 2026
S2M-Net: Speech Driven Three-party Conversational Motion Synthesis Networks

https://doi.org/10.1145/3561975.3562954

Jin, Aobo; Deng, Qixin; Deng, Zhigang (November 2022, Proceeding of ACM SIGGRAPH Conference on Motion, Interaction, and Games 2022)

In this paper we propose a novel conditional generative adversarial network (cGAN) architecture, called S2M-Net, to holistically synthesize realistic three-party conversational animations based on acoustic speech input together with speaker marking (i.e., the speak- ing time of each interlocutor). Specifically, based on a pre-collected three-party conversational motion dataset, we design and train the S2M-Net for three-party conversational animation synthesis. In the architecture, a generator contains a LSTM encoder to encode a sequence of acoustic speech features to a latent vector that is further fed into a transform unit to transform the latent vector into a gesture kinematics space. Then, the output of this transform unit is fed into a LSTM decoder to generate corresponding three-party conversational gesture kinematics. Meanwhile, a discriminator is implemented to check whether an input sequence of three-party conversational gesture kinematics is real or fake. To evaluate our method, besides quantitative and qualitative evaluations, we also conducted paired comparison user studies to compare it with the state of the art.
more » « less
Full Text Available
End-to-End 3D Face Reconstruction with Expressions and Specular Albedos from Single In-the-wild Images

https://doi.org/10.1145/3503161.3547800

Deng, Qixin; Le, Binh H.; Jin, Aobo; Deng, Zhigang (October 2022, Proceeding of ACM International Conference on Multimedia 2022)

Recovering 3D face models from in-the-wild face images has numerous potential applications. However, properly modeling complex lighting effects in reality, including specular lighting, shadows, and occlusions, from a single in-the-wild face image is still considered as a widely open research challenge. In this paper, we propose a convolutional neural network based framework to regress the face model from a single image in the wild. The outputted face model includes dense 3D shape, head pose, expression, diffuse albedo, specular albedo, and the corresponding lighting conditions. Our approach uses novel hybrid loss functions to disentangle face shape identities, expressions, poses, albedos, and lighting. Besides a carefully designed ablation study, we also conduct direct comparison experiments to show that our method can outperform state-of-art methods both quantitatively and qualitatively.
more » « less
Full Text Available
Plausible 3D Face Wrinkle Generation Using Variational Autoencoders

https://doi.org/10.1109/TVCG.2021.3051251

Deng, Qixin; Ma, Luming; Jin, Aobo; Bi, Huikun; Le, Binh Huy; Deng, Zhigang (September 2022, IEEE Transactions on Visualization and Computer Graphics)

Full Text Available
A Live Speech-Driven Avatar-Mediated Three-Party Telepresence System: Design and Evaluation

https://doi.org/10.1162/PRES_a_00358

Jin, Aobo; Deng, Qixin; Deng, Zhigang (January 2020, PRESENCE: Virtual and Augmented Reality)

Abstract In this article, we present a live speech-driven, avatar-mediated, three-party telepresence system, through which three distant users, embodied as avatars in a shared 3D virtual world, can perform natural three-party telepresence that does not require tracking devices. Based on live speech input from three users, this system can real-time generate the corresponding conversational motions of all the avatars, including head motion, eye motion, lip movement, torso motion, and hand gesture. All motions are generated automatically at each user side based on live speech input, and a cloud server is utilized to transmit and synchronize motion and speech among different users. We conduct a formal user study to evaluate the usability and effectiveness of the system by comparing it with a well-known online virtual world, Second Life, and a widely-used online teleconferencing system, Skype. The user study results indicate our system can provide a measurably better telepresence user experience than the two widely-used methods.
more » « less
Full Text Available

Search for: All records