Modeling cross-lingual speech emotion recognition (SER) has become more prevalent because of its diverse applications. Existing studies have mostly focused on technical approaches that adapt the feature, domain, or label across languages, without considering in detail the similarities between the languages. This study focuses on domain adaptation in cross-lingual scenarios using phonetic constraints. This work is framed in a twofold manner. First, we analyze emotion-specific phonetic commonality across languages by identifying common vowels that are useful for SER modeling. Second, we leverage these common vowels as an anchoring mechanism to facilitate cross-lingual SER. We consider American English and Taiwanese Mandarin as a case study to demonstrate the potential of our approach. This work uses two in-the-wild natural emotional speech corpora: MSP-Podcast (American English) and BIIC-Podcast (Taiwanese Mandarin). The proposed unsupervised cross-lingual SER model using these phonetic anchors outperforms the baselines, reaching an unweighted average recall (UAR) of 58.64%.
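For reference, the UAR reported above is the unweighted (macro) average of per-class recalls, so every emotion class contributes equally regardless of class imbalance. A minimal sketch of the metric, assuming scikit-learn is available; the labels below are illustrative placeholders, not data from the corpora:

```python
import numpy as np
from sklearn.metrics import recall_score

def unweighted_average_recall(y_true, y_pred):
    """UAR = mean of per-class recalls (macro-averaged recall in sklearn terms)."""
    return recall_score(y_true, y_pred, average="macro")

# Illustrative example with four emotion classes (not data from the paper).
y_true = np.array([0, 0, 1, 1, 2, 2, 3, 3])
y_pred = np.array([0, 1, 1, 1, 2, 0, 3, 2])
print(f"UAR: {unweighted_average_recall(y_true, y_pred):.4f}")
```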
Personalized Adaptation with Pre-trained Speech Encoders for Continuous Emotion Recognition
There are individual differences in expressive behaviors driven by cultural norms and personality. This between-person variation can result in reduced emotion recognition performance. Therefore, personalization is an important step in improving the generalization and robustness of speech emotion recognition. In this paper, to achieve unsupervised personalized emotion recognition, we first pre-train an encoder with learnable speaker embeddings in a self-supervised manner to learn robust speech representations conditioned on speakers. Second, we propose an unsupervised method to compensate for the label distribution shifts by finding similar speakers and leveraging their label distributions from the training set. Extensive experimental results on the MSP-Podcast corpus indicate that our method consistently outperforms strong personalization baselines and achieves state-of-the-art performance for valence estimation.
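The label-distribution compensation step described above can be pictured, under simplifying assumptions, as shifting a test speaker's raw predictions toward the label statistics of the most similar training speakers. The sketch below is illustrative only; the variable names, cosine similarity, and top-k choice are assumptions, not the paper's exact procedure:

```python
import numpy as np

def compensate_predictions(test_speaker_emb, train_speaker_embs,
                           train_speaker_label_means, raw_predictions,
                           global_label_mean, k=5):
    """Illustrative label-shift compensation: shift raw valence predictions
    toward the mean label of the k most similar training speakers.
    All inputs are hypothetical placeholders."""
    # Cosine similarity between the test speaker and every training speaker.
    sims = train_speaker_embs @ test_speaker_emb
    sims /= (np.linalg.norm(train_speaker_embs, axis=1)
             * np.linalg.norm(test_speaker_emb) + 1e-8)
    top_k = np.argsort(sims)[-k:]
    # Estimate the test speaker's label mean from its nearest training speakers.
    neighbour_mean = train_speaker_label_means[top_k].mean()
    # Shift predictions by the estimated deviation from the global label mean.
    return raw_predictions + (neighbour_mean - global_label_mean)
```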
- Award ID(s): 2211550
- PAR ID: 10474297
- Publisher / Repository: ISCA
- Date Published:
- Journal Name: Proc. INTERSPEECH 2023
- Page Range / eLocation ID: 636-640
- Format(s): Medium: X
- Location: Dublin, Ireland
- Sponsoring Org: National Science Foundation
More Like this
The uncertainty in modeling emotions makes speech emotion recognition (SER) systems less reliable. An intuitive way to increase trust in SER is to reject predictions with low confidence. This approach assumes that an SER system is well calibrated, where highly confident predictions are often right and low-confidence predictions are often wrong. Hence, it is desirable to calibrate the confidence of SER classifiers. We evaluate the reliability of SER systems by exploring the relationship between confidence and accuracy, using the expected calibration error (ECE) metric. We develop a multi-label variant of the post-hoc temperature scaling (TS) method to calibrate SER systems, while preserving their accuracy. The best method combines an emotion co-occurrence weight penalty function, a class-balanced objective function, and the proposed multi-label TS calibration method. The experiments show the effectiveness of our developed multi-label calibration method in terms of accuracy and ECE.
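For context, standard single-label post-hoc temperature scaling and the ECE metric referenced above can be sketched as below; the paper's multi-label TS variant and its penalty and class-balanced objective functions are more involved and are not reproduced here. Function names and hyperparameters are assumptions:

```python
import numpy as np
import torch
import torch.nn.functional as F

def expected_calibration_error(probs, labels, n_bins=10):
    """ECE: bin predictions by confidence and average |accuracy - confidence|,
    weighted by the fraction of samples in each bin."""
    conf = probs.max(axis=1)
    pred = probs.argmax(axis=1)
    bins = np.linspace(0.0, 1.0, n_bins + 1)
    ece = 0.0
    for lo, hi in zip(bins[:-1], bins[1:]):
        mask = (conf > lo) & (conf <= hi)
        if mask.any():
            acc = (pred[mask] == labels[mask]).mean()
            ece += mask.mean() * abs(acc - conf[mask].mean())
    return ece

def fit_temperature(val_logits, val_labels, steps=200, lr=0.01):
    """Post-hoc temperature scaling: learn a single temperature T on
    held-out logits by minimizing the negative log-likelihood."""
    logits = torch.as_tensor(val_logits, dtype=torch.float32)
    labels = torch.as_tensor(val_labels, dtype=torch.long)
    log_t = torch.zeros(1, requires_grad=True)  # optimize log T so T stays positive
    opt = torch.optim.LBFGS([log_t], lr=lr, max_iter=steps)

    def closure():
        opt.zero_grad()
        loss = F.cross_entropy(logits / log_t.exp(), labels)
        loss.backward()
        return loss

    opt.step(closure)
    return log_t.exp().item()
```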
Individual variability of expressive behaviors is a major challenge for emotion recognition systems. Personalized emotion recognition strives to adapt machine learning models to individual behaviors, thereby enhancing emotion recognition performance and overcoming the limitations of generalized emotion recognition systems. However, existing datasets for audiovisual emotion recognition either have a very low number of data points per speaker or include a limited number of speakers. The scarcity of data significantly limits the development and assessment of personalized models, hindering their ability to effectively learn and adapt to individual expressive styles. This paper introduces EmoCeleb: a large-scale, weakly labeled emotion dataset generated via cross-modal labeling. EmoCeleb comprises over 150 hours of audiovisual content from approximately 1,500 speakers, with a median of 50 utterances per speaker. This dataset provides a rich resource for developing and benchmarking personalized emotion recognition methods, including those requiring substantial data per individual, such as set learning approaches. We also propose SetPeER: a novel personalized emotion recognition architecture employing set learning. SetPeER effectively captures individual expressive styles by learning representative speaker features from limited data, achieving strong performance with as few as eight utterances per speaker. By leveraging set learning, SetPeER overcomes the limitations of previous approaches that struggle to learn effectively from limited data per individual. Through extensive experiments on EmoCeleb and established benchmarks, i.e., MSP-Podcast and MSP-Improv, we demonstrate the effectiveness of our dataset and the superior performance of SetPeER compared to existing methods for emotion recognition. Our work paves the way for more robust and accurate personalized emotion recognition systems.
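A generic flavor of the set-learning idea mentioned above is to pool a small support set of a speaker's utterance embeddings into a single style vector and condition the classifier on it. The sketch below is not the SetPeER architecture; the attention pooling, dimensions, and class count are all assumptions for illustration:

```python
import torch
import torch.nn as nn

class SetStylePooling(nn.Module):
    """Illustrative set-based speaker-style encoder (not the SetPeER model):
    attention-pool a set of utterance embeddings into one style vector and
    condition the emotion classifier on it."""
    def __init__(self, emb_dim=256, n_classes=4):
        super().__init__()
        self.attn = nn.Linear(emb_dim, 1)             # attention weights over the set
        self.classifier = nn.Linear(emb_dim * 2, n_classes)

    def forward(self, target_emb, support_set):
        # support_set: (set_size, emb_dim) embeddings from the same speaker.
        weights = torch.softmax(self.attn(support_set), dim=0)
        style = (weights * support_set).sum(dim=0)    # pooled speaker-style vector
        return self.classifier(torch.cat([target_emb, style], dim=-1))

# Usage: as few as eight support utterances per speaker, as noted in the abstract.
model = SetStylePooling()
logits = model(torch.randn(256), torch.randn(8, 256))
```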
In realistic speech enhancement settings for end-user devices, we often encounter only a few speakers and noise types that tend to reoccur in the specific acoustic environment. We propose a novel personalized speech enhancement method to adapt a compact denoising model to the test-time specificity. Our goal in this test-time adaptation is to utilize no clean speech target of the test speaker, thus fulfilling the requirement for zero-shot learning. To complement the lack of clean speech, we employ the knowledge distillation framework: we distill the more advanced denoising results from an overly large teacher model, and use them as the pseudo target to train the small student model. This zero-shot learning procedure circumvents the process of collecting users' clean speech, a process with which users are reluctant to comply due to privacy concerns and the technical difficulty of recording clean voice. Experiments on various test-time conditions show that the proposed personalization method can significantly improve the compact models' performance during the test time. Furthermore, since the personalized models outperform larger non-personalized baseline models, we claim that personalization achieves model compression with no loss of denoising performance. As expected, the student models underperform the state-of-the-art teacher models.
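The distillation idea above can be pictured as a test-time training loop in which the teacher's enhanced output stands in for the missing clean target. The model and optimizer interfaces below are hypothetical placeholders, not the paper's implementation:

```python
import torch
import torch.nn.functional as F

def personalize_student(student, teacher, noisy_batches, optimizer):
    """Illustrative test-time distillation step: the large teacher's enhanced
    output serves as a pseudo clean target, so no ground-truth clean speech
    from the user is needed. All interfaces are hypothetical."""
    teacher.eval()
    student.train()
    for noisy in noisy_batches:                      # the user's noisy recordings only
        with torch.no_grad():
            pseudo_clean = teacher(noisy)            # teacher's denoised estimate
        enhanced = student(noisy)                    # compact student's estimate
        loss = F.l1_loss(enhanced, pseudo_clean)     # match the pseudo target
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
```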
Speech emotion recognition (SER) plays an important role in multiple fields such as healthcare, human-computer interaction (HCI), and security and defense. Emotional labels are often annotated at the sentence-level (i.e., one label per sentence), resulting in a sequence-to-one recognition problem. Traditionally, studies have relied on statistical descriptions, which are computed over time from low-level descriptors (LLDs), creating a fixed-dimension sentence-level feature representation regardless of the duration of the sentence. However, sentence-level features lack temporal information, which limits the performance of SER systems. Recently, new deep learning architectures have been proposed to model temporal data. An important question is how to extract emotion-relevant features with temporal information. This study proposes a novel data processing approach that extracts a fixed number of small chunks over sentences of different durations by changing the overlap between these chunks. The approach is flexible, providing an ideal framework to combine gated network or attention mechanisms with long short-term memory (LSTM) networks. Our experimental results based on the MSP-Podcast dataset demonstrate that the proposed method not only significantly improves recognition accuracy over alternative temporal-based models relying on LSTM, but also leads to computational efficiency.
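The chunking scheme described above can be illustrated, under assumed parameter values, by fixing the number and length of chunks and letting the stride (and hence the overlap) absorb the variation in sentence duration. A minimal sketch; chunk counts, lengths, and padding behavior are assumptions, not the paper's settings:

```python
import numpy as np

def extract_fixed_chunks(features, n_chunks=11, chunk_len=100):
    """Illustrative chunking: slice a variable-length feature sequence
    (frames x dims) into a fixed number of equal-length chunks by adjusting
    the overlap between consecutive chunks. Parameter values are placeholders."""
    n_frames, n_dims = features.shape
    if n_frames < chunk_len:                          # pad short sentences
        pad = np.zeros((chunk_len - n_frames, n_dims))
        features = np.vstack([features, pad])
        n_frames = chunk_len
    # Stride shrinks (more overlap) for short sentences and grows for long ones.
    stride = (n_frames - chunk_len) / max(n_chunks - 1, 1)
    starts = [int(round(i * stride)) for i in range(n_chunks)]
    return np.stack([features[s:s + chunk_len] for s in starts])

# Example: a 6.4 s sentence at 100 frames/s with 40-dim features.
chunks = extract_fixed_chunks(np.random.randn(640, 40))
print(chunks.shape)  # (11, 100, 40)
```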