NSF PAR Search | NSF Public Access Repository

Improving Cross-Corpus Speech Emotion Recognition with Adversarial Discriminative Domain Generalization (ADDoG)

https://doi.org/10.1109/taffc.2019.2916092

Gideon, John; McInnis, Melvin; Mower Provost, Emily (May 2019, IEEE Transactions on Affective Computing)

Automatic speech emotion recognition provides computers with critical context to enable user understanding. While methods trained and tested within the same dataset have been shown to be successful, they often fail when applied to unseen datasets. To address this, recent work has focused on adversarial methods to find more generalized representations of emotional speech. However, many of these methods have trouble converging, and they only involve datasets collected in laboratory conditions. In this paper, we introduce Adversarial Discriminative Domain Generalization (ADDoG), which follows an easier-to-train “meet in the middle” approach. The model iteratively moves the representations learned for each dataset closer to one another, improving cross-dataset generalization. We also introduce Multiclass ADDoG, or MADDoG, which extends the proposed method to more than two datasets simultaneously. Our results show consistent convergence for the introduced methods, with significantly improved results when no labels from the target dataset are used. We also show how, in most cases, ADDoG and MADDoG can improve upon baseline state-of-the-art methods when target dataset labels are added and in-the-wild data are considered. Although our experiments focus on cross-corpus speech emotion recognition, these methods could be used to remove unwanted factors of variation in other settings.
Emotion Recognition from Natural Phone Conversations in Individuals with and without Recent Suicidal Ideation

https://doi.org/10.21437/interspeech.2019-1830

Gideon, John; Schatten, Heather T.; McInnis, Melvin G.; Provost, Emily Mower (September 2019, Interspeech)

Suicide is a serious public health concern in the U.S., taking the lives of over 47,000 people in 2017. Early detection of suicidal ideation is key to prevention. One promising approach to symptom monitoring is suicidal speech prediction, as speech can be passively collected and may indicate changes in risk. However, directly identifying suicidal speech is difficult, as characteristics of speech can vary rapidly compared with suicidal thoughts. Suicidal ideation is also associated with emotion dysregulation. Therefore, in this work, we focus on the detection of emotion from speech and its relation to suicide. We introduce the Ecological Measurement of Affect, Speech, and Suicide (EMASS) dataset, which contains phone call recordings of individuals recently discharged from the hospital following admission for suicidal ideation or behavior, along with controls. Participants self-report their emotion periodically throughout the study. However, the dataset is relatively small and has uncertain labels. Because of this, we find that most features traditionally used for emotion classification fail. We demonstrate how outside emotion datasets can be used to generate more relevant features, making this analysis possible. Finally, we use emotion predictions to differentiate healthy controls from those with suicidal ideation, providing evidence for suicidal speech detection using emotion.
The PRIORI Emotion Dataset: Linking Mood to Emotion Detected In-the-Wild

Khorram, Soheil; Jaiswal, Mimansa; Gideon, John; McInnis, Melvin; Mower Provost, Emily (October 2018, Interspeech)

Bipolar Disorder is a chronic psychiatric illness characterized by pathological mood swings associated with severe disruptions in emotion regulation. Clinical monitoring of mood is key to the care of these dynamic and incapacitating mood states. Frequent and detailed monitoring improves clinical sensitivity to mood state changes, but it typically requires costly and limited resources. Speech characteristics change during both depressed and manic states, suggesting that automatic methods applied to the speech signal can be used to monitor mood state changes. However, speech is modulated by many factors, which makes mood state prediction challenging. We hypothesize that emotion can be used as an intermediary step to improve mood state prediction. This paper presents critical steps in developing this pipeline, including (1) a new in-the-wild emotion dataset, the PRIORI Emotion Dataset, collected from everyday smartphone conversational speech recordings, (2) activation and valence emotion recognition baselines on this dataset (Pearson correlation coefficient, PCC, of 0.71 and 0.41, respectively), and (3) a significant correlation between predicted emotion and mood state for individuals with bipolar disorder. This provides evidence and a working baseline for the use of emotion as a meta-feature for mood state monitoring.
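The activation and valence baselines above are reported as Pearson correlation coefficients (PCC) between predicted and annotated ratings. PCC is the covariance of the two sequences divided by the product of their standard deviations. The sketch below shows that computation, assuming two equal-length lists of per-utterance scores; the function name `pearson_cc` and the sample values are hypothetical and are not taken from the paper's code or data.

```python
import math
from typing import Sequence


def pearson_cc(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient between two equal-length sequences.

    Hypothetical helper for illustration; not the PRIORI authors' code.
    """
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sum of co-deviations (unnormalized covariance).
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    # Sums of squared deviations (unnormalized variances).
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("PCC is undefined when either sequence is constant")
    # The 1/n factors cancel, so they are omitted from numerator and denominator.
    return cov / math.sqrt(var_x * var_y)


if __name__ == "__main__":
    # Hypothetical predicted vs. annotated activation ratings for five utterances.
    predicted = [3.1, 4.0, 5.2, 2.8, 6.0]
    annotated = [3.0, 4.5, 5.0, 2.5, 6.5]
    print(f"PCC = {pearson_cc(predicted, annotated):.3f}")
```

PCC is unchanged by shifting either sequence or scaling it by a positive constant, so it measures whether predictions track the annotations' trend rather than their absolute scale.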