3D facial animation synthesis from audio has received considerable attention in recent years. However, most existing works focus on mapping audio to visual content and provide limited insight into the relationship between emotion in audio and expressive facial animation. This work generates audio-matching facial animations conditioned on a specified emotion label. For this task, we argue that separating the content from the audio is indispensable: the proposed model must learn to generate facial content from the audio content and expressions from the specified emotion. We achieve this with an adaptive instance normalization module that isolates the content in the audio and combines it with the emotion embedding derived from the specified label. The joint content-emotion embedding is then used to generate 3D facial vertices and texture maps. We compare our method with state-of-the-art baselines, including facial segmentation-based and voice conversion-based disentanglement approaches. We also conduct a user study to evaluate the performance of emotion conditioning. The results indicate that our proposed method outperforms the baselines in animation quality and expression categorization accuracy.
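The abstract does not describe the module's internals, so the following is only a minimal sketch of the generic adaptive instance normalization operation, not the authors' implementation. It assumes audio features shaped (channels, frames) and an emotion embedding that is linearly projected to a per-channel scale and shift; the names `adaptive_instance_norm`, `w_scale`, and `w_shift` are hypothetical.

```python
import numpy as np


def adaptive_instance_norm(content, emotion, w_scale, w_shift, eps=1e-5):
    """Generic AdaIN sketch (assumed formulation, not the paper's module).

    content: (channels, frames) audio feature map.
    emotion: (emb_dim,) emotion embedding for the specified label.
    w_scale, w_shift: (channels, emb_dim) hypothetical projection matrices.
    """
    # Remove per-channel statistics over time, leaving normalized content.
    mean = content.mean(axis=1, keepdims=True)
    std = content.std(axis=1, keepdims=True)
    normalized = (content - mean) / (std + eps)

    # Re-inject statistics predicted from the emotion embedding.
    scale = (w_scale @ emotion)[:, None]
    shift = (w_shift @ emotion)[:, None]
    return scale * normalized + shift


rng = np.random.default_rng(0)
content = rng.normal(loc=3.0, scale=2.0, size=(4, 50))  # toy audio features
emotion = np.array([1.0, 0.0, 0.0])                     # toy one-hot emotion label
w_scale = rng.normal(size=(4, 3))
w_shift = rng.normal(size=(4, 3))

out = adaptive_instance_norm(content, emotion, w_scale, w_shift)
```

In this sketch, the per-channel mean and spread of the output are determined by the emotion embedding alone, while the frame-to-frame variation still comes from the audio features. This is the sense in which the operation can separate content from emotion.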
The structure mapping task is a simple method for testing people’s mental representations of spatial relationships, and it has recently proved particularly useful in the study of volumetric spatial cognition, such as spatial memory for locations in multilevel buildings. However, no standardised method exists for analysing such data, and structure mapping tasks are typically analysed by human raters using criteria defined by the researchers. In this article, we introduce a computational method to assess the spatial relationships of objects in the vertical and horizontal domains, as expressed through the structure mapping task. We reanalyse participants’ digitised structure maps from an earlier study (N = 41) using the proposed computational methodology. Our results show that the new method successfully distinguishes between different types of structure map representations and is sensitive to learning order effects. This method can be useful for advancing the study of volumetric spatial cognition.