

Title: Quantifying Emotional Similarity in Speech
This study proposes a novel formulation for measuring emotional similarity between speech recordings. This formulation explores the ordinal nature of emotions by comparing emotional similarities instead of predicting an emotional attribute or recognizing an emotional category. The proposed task determines which of two alternative samples has emotional content most similar to that of a given anchor. This task raises interesting questions. Which emotional descriptor provides the most suitable space to assess emotional similarities? Can deep neural networks (DNNs) learn representations to robustly quantify emotional similarities? We address these questions by exploring alternative emotional spaces created with attribute-based descriptors and categorical emotions. We create the representation using a DNN trained with the triplet loss function, which relies on triplets formed by an anchor, a positive example, and a negative example. We select a positive sample whose emotional content is similar to the anchor and a negative sample whose emotional content is dissimilar to the anchor. The task of our DNN is to identify the positive sample. The experimental evaluations demonstrate that we can learn a meaningful embedding to assess emotional similarities, achieving higher performance than human evaluators asked to complete the same task.
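To make the triplet setup concrete, the snippet below is a minimal sketch of the kind of training step the abstract describes. It is not the authors' implementation: the feature dimension (88), the network size, the margin, and the use of PyTorch's built-in triplet margin loss are all assumptions made for illustration.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Illustrative sketch only; dimensions, architecture, and margin are assumed,
# not taken from the paper.
class EmotionEmbedding(nn.Module):
    """Maps acoustic features to an embedding used to compare emotional content."""
    def __init__(self, feat_dim=88, emb_dim=32):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(feat_dim, 128), nn.ReLU(),
            nn.Linear(128, emb_dim),
        )

    def forward(self, x):
        # L2-normalize so distances are comparable across samples
        return F.normalize(self.net(x), dim=-1)

model = EmotionEmbedding()
criterion = nn.TripletMarginLoss(margin=0.2)

# Toy batch: anchor, positive (similar emotion), and negative (dissimilar emotion)
anchor, positive, negative = (torch.randn(16, 88) for _ in range(3))
loss = criterion(model(anchor), model(positive), model(negative))
loss.backward()
```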
Award ID(s): 2016719, 1453781
NSF-PAR ID: 10532835
Publisher / Repository: IEEE
Journal Name: IEEE Transactions on Affective Computing
Volume: 14
Issue: 2
ISSN: 2371-9850
Page Range / eLocation ID: 1376-1390
Subject(s) / Keyword(s): Speech emotion recognition, ordinal affective computing, representation learning of emotion similarity, triplet loss function, speech emotion retrieval
Format(s): Medium: X
Sponsoring Org: National Science Foundation
More Like this
  1. The ability to identify speech with similar emotional content is valuable to many applications, including speech retrieval, surveillance, and emotional speech synthesis. While current formulations in speech emotion recognition based on classification or regression are not appropriate for this task, solutions based on preference learning offer appealing alternatives. This paper aims to find speech samples that are emotionally similar to an anchor speech sample provided as a query. This novel formulation opens interesting research questions. How well can a machine complete this task? How does the accuracy of automatic algorithms compare to the performance of a human performing this task? This study addresses these questions by training a deep learning model with a triplet loss function, mapping the acoustic features into an embedding that is discriminative for this task. The network receives an anchor speech sample and two competing speech samples, and the task is to determine which of the candidate speech samples conveys emotional content closest to that of the anchor. By comparing the results from our model with human perceptual evaluations, this study demonstrates that the proposed approach performs very close to human level in retrieving samples with similar emotional content.
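Once such an embedding is trained, the query step reduces to a distance comparison. The sketch below is a hypothetical illustration of that inference step; the stand-in linear "embedding" and the 88-dimensional features are placeholders, not the paper's model.

```python
import torch
import torch.nn as nn

def pick_more_similar(embed, anchor_feats, cand_a_feats, cand_b_feats):
    """Return 'A' if candidate A is emotionally closer to the anchor in the
    learned embedding space, otherwise 'B'."""
    with torch.no_grad():
        za = embed(anchor_feats)
        zca = embed(cand_a_feats)
        zcb = embed(cand_b_feats)
        # Euclidean distance in the embedding space decides the comparison
        return "A" if torch.dist(za, zca) < torch.dist(za, zcb) else "B"

# Toy usage with a stand-in embedding network (a trained model would go here).
embed = nn.Linear(88, 32)
print(pick_more_similar(embed, torch.randn(1, 88), torch.randn(1, 88), torch.randn(1, 88)))
```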
  2. Ecological momentary assessment (EMA) methodology was used to examine the emotional context of nonsuicidal self‐injury (NSSI). Forty‐seven adolescents and young adults used a novel smartphone app to monitor their emotional experiences, NSSI thoughts, and NSSI behaviors for 2 weeks. Momentary changes in both negative and positive emotions predicted greater intensity of NSSI thoughts at the subsequent assessment, while only increases in negative emotion predicted NSSI behaviors. Immediately following NSSI behaviors participants reported reduced high‐arousal negative emotions and increased low‐arousal positive emotions, suggesting that NSSI may be an efficient and effective method of regulating emotion. Findings highlight the importance of addressing emotion regulation in NSSI interventions.
  3. An important task in human-computer interaction is to rank speech samples according to their expressive content. A preference learning framework is appropriate for obtaining an emotional rank for a set of speech samples. However, obtaining reliable labels for training a preference learning framework is a challenging task. Most existing databases provide sentence-level absolute attribute scores annotated by multiple raters, which have to be transformed to obtain preference labels. Previous studies have shown that evaluators anchor their absolute assessments on previously annotated samples. Hence, this study proposes a novel formulation for obtaining preference learning labels by only considering annotation trends assigned by a rater to consecutive samples within an evaluation session. The experiments show that the use of the proposed anchor-based ordinal labels leads to significantly better performance than models trained using existing alternative labels. 
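One plausible reading of this anchor-based rule can be sketched as follows: within a single evaluation session, each pair of consecutive samples yields a preference label whenever the rater's absolute score moved up or down, while ties yield no label. The helper name and the exact rule are illustrative assumptions, not the paper's procedure.

```python
# Illustrative simplification; the paper's exact labeling rule may differ.
def anchor_based_preferences(session):
    """session: list of (sample_id, score) in the order the rater annotated them."""
    pairs = []  # (preferred_id, non_preferred_id)
    for (prev_id, prev_score), (cur_id, cur_score) in zip(session, session[1:]):
        if cur_score > prev_score:
            pairs.append((cur_id, prev_id))   # score went up: later sample preferred
        elif cur_score < prev_score:
            pairs.append((prev_id, cur_id))   # score went down: earlier sample preferred
    return pairs

# Example: one rater's arousal scores for consecutive samples in a session
print(anchor_based_preferences([("s1", 4.0), ("s2", 5.5), ("s3", 5.5), ("s4", 3.0)]))
# -> [('s2', 's1'), ('s3', 's4')]
```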
  4. Longstanding theories of emotion socialization postulate that caregiver emotional and behavioral reactions to a child's emotions together shape the child's emotion displays over time. Despite the notable importance of positive valence system function, the majority of research on caregiver emotion socialization focuses on negative valence system emotions. In the current project, we leveraged a relatively large cross‐sectional study of caregivers (N = 234; 93.59% White) of preschool aged children to investigate whether, and to what degree, caregiver (1) emotional experiences or (2) external behaviors, in the context of preschoolers' positive emotion displays in caregiver–child interactions, are associated with children's general positive affect tendencies. Results indicated that, in the context of everyday caregiver–child interactions, caregiver‐reported positively valenced emotions but not approach behaviors were positively associated with child general positive affect tendencies. However, when examining specific caregiver behaviors in response to everyday child positive emotion displays, caregiver report of narrating the child's emotion and joining in the emotion with their child was positively associated with child general positive affect tendencies. Together, these results suggest that in everyday caregiver–child interactions, caregivers' emotional experiences and attunement with the child play a role in shaping preschoolers' overall tendencies toward positive affect.
  5. Previous studies on speech emotion recognition (SER) with categorical emotions have often formulated the task as a single-label classification problem, where the emotions are considered orthogonal to each other. However, previous studies have indicated that emotions can co-occur, especially for more ambiguous emotional sentences (e.g., a mixture of happiness and surprise). Some studies have regarded SER problems as a multi-label task, predicting multiple emotional classes. However, this formulation does not leverage the relation between emotions during training, since emotions are assumed to be independent. This study explores the idea that emotional classes are not necessarily independent and its implications on training SER models. In particular, we calculate the frequency of co-occurring emotions from perceptual evaluations in the train set to generate a matrix with class-dependent penalties, penalizing mistakes between distant emotional classes more heavily. We integrate the penalization matrix into three existing label-learning approaches (hard-label, multi-label, and distribution-label learning) using the proposed modified loss. We train SER models using the penalty loss and commonly used cost functions for SER tasks. The evaluation of our proposed penalization matrix on the MSP-Podcast corpus shows important relative improvements in macro F1-score for hard-label learning (17.12%), multi-label learning (12.79%), and distribution-label learning (25.8%).
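A rough sketch of the penalization idea is given below. How the co-occurrence counts are normalized and how the penalty matrix enters the loss are assumptions made for illustration; the paper's exact formulation may differ.

```python
import numpy as np

# Illustrative sketch; normalization and loss form are assumed, not from the paper.
def penalty_matrix(cooccurrence_counts, eps=1e-6):
    """cooccurrence_counts[i, j]: times classes i and j were both annotated."""
    freq = cooccurrence_counts / (cooccurrence_counts.sum(axis=1, keepdims=True) + eps)
    penalty = 1.0 - freq            # rarely co-occurring classes get penalized more
    np.fill_diagonal(penalty, 0.0)  # no penalty for the correct class itself
    return penalty

def penalized_cross_entropy(probs, target_idx, penalty):
    """Standard cross-entropy plus extra cost on probability mass assigned to
    classes that rarely co-occur with the target class."""
    ce = -np.log(probs[target_idx] + 1e-12)
    extra = np.dot(penalty[target_idx], probs)
    return ce + extra

counts = np.array([[50, 20, 2],
                   [20, 40, 5],
                   [2, 5, 30]], dtype=float)   # toy co-occurrence counts
P = penalty_matrix(counts)
probs = np.array([0.6, 0.3, 0.1])              # model output for one utterance
print(penalized_cross_entropy(probs, target_idx=0, penalty=P))
```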