Title: Retrieving Speech Samples with Similar Emotional Content Using a Triplet Loss Function
The ability to identify speech with similar emotional content is valuable to many applications, including speech retrieval, surveillance, and emotional speech synthesis. While current formulations in speech emotion recognition based on classification or regression are not appropriate for this task, solutions based on preference learning offer appealing alternatives. This paper aims to find speech samples that are emotionally similar to an anchor speech sample provided as a query. This novel formulation opens interesting research questions. How well can a machine complete this task? How does the accuracy of automatic algorithms compare to the performance of a human performing this task? This study addresses these questions by training a deep learning model with a triplet loss function, mapping the acoustic features into an embedding that is discriminative for this task. The network receives an anchor speech sample and two competing speech samples, and the task is to determine which of the candidate speech samples conveys emotional content closest to the emotion conveyed by the anchor. By comparing the results from our model with human perceptual evaluations, this study demonstrates that the proposed approach performs very close to human level in retrieving samples with similar emotional content.
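As a concrete illustration of the training setup described above, the following is a minimal PyTorch sketch of a triplet-loss embedding network. The feature dimension, embedding size, margin, and layer shapes are illustrative assumptions, not the paper's exact configuration.

```python
# Minimal sketch of triplet-loss training for emotional similarity.
# All dimensions and hyperparameters here are illustrative assumptions.
import torch
import torch.nn as nn

class EmotionEmbedder(nn.Module):
    """Maps acoustic features to an embedding where distance reflects
    emotional similarity."""
    def __init__(self, feat_dim=88, emb_dim=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(feat_dim, 256), nn.ReLU(),
            nn.Linear(256, emb_dim),
        )

    def forward(self, x):
        # L2-normalize so Euclidean distance behaves like cosine distance
        return nn.functional.normalize(self.net(x), dim=-1)

model = EmotionEmbedder()
criterion = nn.TripletMarginLoss(margin=0.2)
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)

# One training step on a toy batch: an anchor, a positive sample with
# similar emotion, and a negative sample with dissimilar emotion.
anchor, positive, negative = (torch.randn(32, 88) for _ in range(3))
optimizer.zero_grad()
loss = criterion(model(anchor), model(positive), model(negative))
loss.backward()
optimizer.step()
```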
Award ID(s):
1453781
PAR ID:
10099017
Date Published:
Journal Name:
IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP 2019)
Page Range / eLocation ID:
7400 to 7404
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like This
  1.
    This study proposes the novel formulation of measuring emotional similarity between speech recordings. This formulation explores the ordinal nature of emotions by comparing emotional similarities instead of predicting an emotional attribute or recognizing an emotional category. The proposed task determines which of two alternative samples has the most similar emotional content to the emotion of a given anchor. This task raises some interesting questions. Which emotional descriptor provides the most suitable space to assess emotional similarities? Can deep neural networks (DNNs) learn representations to robustly quantify emotional similarities? We address these questions by exploring alternative emotional spaces created with attribute-based descriptors and categorical emotions. We create the representation using a DNN trained with the triplet loss function, which relies on triplets formed with an anchor, a positive example, and a negative example. We select a positive sample that has similar emotional content to the anchor, and a negative sample that has dissimilar emotional content to the anchor. The task of our DNN is to identify the positive sample. The experimental evaluations demonstrate that we can learn a meaningful embedding to assess emotional similarities, achieving higher performance than human evaluators asked to complete the same task.
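    At query time, the learned embedding reduces this two-alternative comparison to a distance check. A hypothetical sketch, assuming an embedding model like the one in the sketch above:

```python
# Hypothetical inference step: given an anchor and two candidates,
# pick the candidate whose embedding is closer to the anchor's.
import torch

def closer_candidate(model, anchor_feats, cand_a_feats, cand_b_feats):
    with torch.no_grad():
        e_anchor = model(anchor_feats)
        d_a = torch.dist(e_anchor, model(cand_a_feats))
        d_b = torch.dist(e_anchor, model(cand_b_feats))
    return "A" if d_a < d_b else "B"
```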
  2. An important task in human-computer interaction is to rank speech samples according to their expressive content. A preference learning framework is appropriate for obtaining an emotional rank for a set of speech samples. However, obtaining reliable labels for training a preference learning framework is a challenging task. Most existing databases provide sentence-level absolute attribute scores annotated by multiple raters, which have to be transformed to obtain preference labels. Previous studies have shown that evaluators anchor their absolute assessments on previously annotated samples. Hence, this study proposes a novel formulation for obtaining preference learning labels by only considering annotation trends assigned by a rater to consecutive samples within an evaluation session. The experiments show that the use of the proposed anchor-based ordinal labels leads to significantly better performance than models trained using existing alternative labels. 
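    As a rough illustration of the idea, the sketch below derives preference pairs from annotation trends within a single rater's session. The session structure and the minimum score gap are assumptions, not the paper's exact procedure.

```python
# Sketch: derive preference labels from annotation trends assigned by
# one rater to consecutive samples within an evaluation session.
# The (sample, score) session format and min_gap are assumptions.
def preference_pairs(session, min_gap=0.5):
    """session: time-ordered list of (sample_id, attribute_score) pairs
    from a single rater. Returns (preferred, dispreferred) pairs for
    consecutive samples whose scores clearly differ."""
    pairs = []
    for (s_prev, v_prev), (s_next, v_next) in zip(session, session[1:]):
        if v_next - v_prev > min_gap:      # rising trend
            pairs.append((s_next, s_prev))
        elif v_prev - v_next > min_gap:    # falling trend
            pairs.append((s_prev, s_next))
    return pairs
```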
  3.
    Psychotherapy, particularly for youth, is a pressing challenge in the health care system. Traditional methods are resource-intensive, and there is a need for objective benchmarks to guide therapeutic interventions. Automated emotion detection from speech, using artificial intelligence, presents an emerging approach to address these challenges. Speech can carry vital information about emotional states, which can be used to improve mental health care services, especially when the person is suffering. Objective: This study aims to develop and evaluate automated methods for detecting the intensity of emotions (anger, fear, sadness, and happiness) in audio recordings of patients' speech. We also demonstrate the viability of deploying the models. Our model was validated in a previous publication by Alemu et al with limited voice samples; this follow-up study used significantly more voice samples to validate the previous model. Methods: We used audio recordings of patients, specifically children with high adverse childhood experience (ACE) scores. The average ACE score was 5 or higher, placing these children at the highest risk for chronic disease and social or emotional problems; in the general population, only 1 in 6 people have a score of 4 or above. Each patient's structured voice sample was collected by reading a fixed script. In total, 4 highly trained therapists classified audio segments by scoring the intensity level of each of the 4 emotions. We experimented with various preprocessing methods, including denoising, voice-activity detection, and diarization. Additionally, we explored various model architectures, including convolutional neural networks (CNNs) and transformers. We trained emotion-specific transformer-based models and a generalized CNN-based model to predict emotion intensities. Results: The emotion-specific transformer-based model achieved a test-set precision and recall of 86% and 79%, respectively, for binary emotional intensity classification (high or low). In contrast, the CNN-based model, generalized to predict the intensity of 4 different emotions, achieved test-set precision and recall of 83% for each. Conclusions: Automated emotion detection from patients' speech using artificial intelligence models is feasible and achieves a high level of accuracy. The transformer-based model exhibited better performance in emotion-specific detection, while the CNN-based model showed promise in generalized emotion detection. These models can serve as valuable decision-support tools for pediatricians and mental health providers to triage youth to appropriate levels of mental health care services.
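    A minimal PyTorch sketch of a transformer-based binary intensity classifier of the kind described above; the input features, layer sizes, and mean-pooling are illustrative assumptions rather than the study's exact architecture.

```python
# Sketch of an emotion-specific transformer classifier for binary
# intensity (high/low). Sizes and pooling are illustrative assumptions.
import torch
import torch.nn as nn

class IntensityClassifier(nn.Module):
    def __init__(self, feat_dim=80, d_model=128, n_layers=2):
        super().__init__()
        self.proj = nn.Linear(feat_dim, d_model)
        layer = nn.TransformerEncoderLayer(d_model, nhead=4,
                                           batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, n_layers)
        self.head = nn.Linear(d_model, 2)   # logits: low / high intensity

    def forward(self, frames):              # frames: (batch, time, feat_dim)
        h = self.encoder(self.proj(frames))
        return self.head(h.mean(dim=1))     # mean-pool over time
```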
  4. Speech emotion recognition (SER) is a challenging task due to the limited availability of real-world labeled datasets. Since it is easier to find unlabeled data, the use of self-supervised learning (SSL) has become an attractive alternative. This study proposes new pre-text tasks for SSL to improve SER. While our target application is SER, the proposed pre-text tasks include audio-visual formulations, leveraging the relationship between acoustic and facial features. Our proposed approach introduces three new unimodal and multimodal pre-text tasks that are carefully designed to learn better representations for predicting emotional cues from speech. Task 1 predicts energy variations (high or low) from a speech sequence. Task 2 uses speech features to predict facial activation (high or low) based on facial landmark movements. Task 3 performs a multi-class emotion recognition task on emotional labels obtained from combinations of action units (AUs) detected across a video sequence. We pre-train a network with 60.92 hours of unlabeled data, fine-tuning the model for the downstream SER task. The results on the CREMA-D dataset show that the model pre-trained on the proposed domain-specific pre-text tasks significantly improves the precision (up to 5.1%), recall (up to 4.5%), and F1-scores (up to 4.9%) of our SER system. 
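    To make Task 1 concrete, the sketch below generates self-supervised high/low energy labels directly from the waveform, so no human annotation is required. The frame length, hop size, and median threshold are assumptions, not the paper's exact recipe.

```python
# Sketch of pre-text Task 1: label speech frames as high or low energy
# using only the signal itself (self-supervised, no human labels).
import numpy as np

def energy_labels(waveform, frame_len=400, hop=160):
    """waveform: 1-D NumPy array of audio samples. Returns one label per
    frame: 1 (high energy) or 0 (low energy), relative to the
    utterance's median RMS energy."""
    frames = [waveform[i:i + frame_len]
              for i in range(0, len(waveform) - frame_len, hop)]
    rms = np.array([np.sqrt(np.mean(f ** 2)) for f in frames])
    return (rms > np.median(rms)).astype(int)
```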
  5. A challenging task in affective computing is to build reliable speech emotion recognition (SER) systems that can accurately predict emotional attributes from spontaneous speech. To increase the trust in these SER systems, it is important to predict not only their accuracy, but also their confidence. An intriguing approach to predict uncertainty is Monte Carlo (MC) dropout, which obtains predictions from multiple feed-forward passes through a deep neural network (DNN) by using dropout regularization in both training and inference. This study evaluates this approach with regression models to predict emotional attribute scores for valence, arousal and dominance. The analysis illustrates that predicting uncertainty in this problem is possible, where the performance is higher for samples in the test set with lower uncertainty. The study evaluates uncertainty estimation as a function of the emotional attributes, showing that samples with extreme values have lower uncertainty. Finally, we demonstrate the benefits of uncertainty estimation with a reject option, where a classifier can decline to give a prediction when its confidence is low. By rejecting only 25% of the test set with the highest uncertainty, we achieve relative performance gains of 7.34% for arousal, 13.73% for valence and 8.79% for dominance.
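    A minimal sketch of MC dropout with a reject option, assuming a regression DNN that outputs one attribute score per sample; the number of passes is an assumption, and the rejection fraction follows the 25% figure quoted above.

```python
# Sketch: MC dropout keeps dropout active at inference; the spread
# across passes estimates uncertainty, and the most uncertain fraction
# of inputs is declined (reject option).
import torch

def mc_dropout_predict(model, x, n_passes=30):
    model.train()                           # keep dropout active at inference
    with torch.no_grad():
        preds = torch.stack([model(x) for _ in range(n_passes)])
    return preds.mean(0), preds.std(0)      # prediction, uncertainty

def predict_with_reject(model, batch, reject_frac=0.25):
    mean, std = mc_dropout_predict(model, batch)
    cutoff = torch.quantile(std.squeeze(-1), 1 - reject_frac)
    keep = std.squeeze(-1) <= cutoff        # decline the most uncertain 25%
    return mean[keep], keep
```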