NSF PAR Search | NSF Public Access Repository

Emotion Recognition in the Real World: Passively Collecting and Estimating Emotions from Natural Speech Data of Individuals with Bipolar Disorder

https://doi.org/10.1109/TAFFC.2024.3407683

Mower Provost, Emily; Sperry, Sarah H; Tavernor, James; Anderau, Steve; Yocum, Anastasia; McInnis, Melvin G (January 2024, IEEE Transactions on Affective Computing)

Emotions provide critical information regarding a person's health and well-being. Therefore, the ability to track emotion and patterns in emotion over time could provide new opportunities in measuring health longitudinally. This is of particular importance for individuals with bipolar disorder (BD), where emotion dysregulation is a hallmark symptom of increasing mood severity. However, measuring emotions typically requires self-assessment, a willful action outside of one's daily routine. In this paper, we describe a novel approach for collecting real-world natural speech data from daily life and measuring emotions from these data. The approach combines a novel data collection pipeline and validated robust emotion recognition models. We describe a deployment of this pipeline that included parallel clinical and self-report measures of mood and self-reported measures of emotion. Finally, we present approaches to estimate clinical and self-reported mood measures using a combination of passive and self-reported emotion measures. The results demonstrate that both passive and self-reported measures of emotion contribute to our ability to accurately estimate mood symptom severity for individuals with BD.
Full Text Available
Capturing Mismatch between Textual and Acoustic Emotion Expressions for Mood Identification in Bipolar Disorder

https://doi.org/10.21437/Interspeech.2023-1990

Niu, Minxue; Romana, Amrit; Jaiswal, Mimansa; McInnis, Melvin; Mower Provost, Emily (August 2023, Interspeech)

Full Text Available
Gene-set Enrichment with Mathematical Biology (GEMB)

https://doi.org/10.1093/gigascience/giaa091

Cochran, Amy L; Nieser, Kenneth J; Forger, Daniel B; Zöllner, Sebastian; McInnis, Melvin G (October 2020, GigaScience)
Background: Gene-set analyses measure the association between a disease of interest and a “set” of genes related to a biological pathway. These analyses often incorporate gene network properties to account for differential contributions of each gene. We extend this concept further, defining gene contributions based on biophysical properties, by leveraging mathematical models of biology to predict the effects of genetic perturbations on a particular downstream function. Results: We present a method that combines gene weights from model predictions and gene ranks from genome-wide association studies into a weighted gene-set test. We demonstrate in simulation how such a method can improve statistical power. To this effect, we identify a gene set, weighted by model-predicted contributions to intracellular calcium ion concentration, that is significantly related to bipolar disorder in a small dataset (P = 0.04; n = 544). We reproduce this finding using publicly available summary data from the Psychiatric Genomics Consortium (P = 1.7 × 10⁻⁴; n = 41,653). By contrast, an approach using a general calcium signaling pathway did not detect a significant association with bipolar disorder (P = 0.08). The weighted gene-set approach based on intracellular calcium ion concentration did not detect a significant relationship with schizophrenia (P = 0.09; n = 65,967) or major depression disorder (P = 0.30; n = 500,199). Conclusions: Together, these findings show how incorporating math biology into gene-set analyses might help to identify biological functions that underlie certain polygenic disorders.
Full Text Available
Into the Wild: Transitioning from Recognizing Mood in Clinical Interactions to Personal Conversations for Individuals with Bipolar Disorder

https://doi.org/10.21437/interspeech.2019-2698

Matton, Katie; McInnis, Melvin G.; Provost, Emily Mower (September 2019, Interspeech)

Bipolar Disorder, a mood disorder with recurrent mania and depression, requires ongoing monitoring and specialty management. Current monitoring strategies are clinically-based, engaging highly specialized medical professionals who are becoming increasingly scarce. Automatic speech-based monitoring via smartphones has the potential to augment clinical monitoring by providing inexpensive and unobtrusive measurements of a patient’s daily life. The success of such an approach is contingent on the ability to successfully utilize “in-the-wild” data. However, most existing work on automatic mood detection uses datasets collected in clinical or laboratory settings. This study presents experiments in automatically detecting depression severity in individuals with Bipolar Disorder using data derived from clinical interviews and from personal conversations. We find that mood assessment is more accurate using data collected from clinical interactions, in part because of their highly structured nature. We demonstrate that although the features that are most effective in clinical interactions do not extend well to personal conversational data, we can identify alternative features relevant in personal conversational speech to detect mood symptom severity. Our results highlight the challenges unique to working with “in-the-wild” data, providing insight into the degree to which the predictive ability of speech features is preserved outside of a clinical interview.
Full Text Available
Jointly Aligning and Predicting Continuous Emotion Annotations

https://doi.org/10.1109/taffc.2019.2917047

Khorram, Soheil; McInnis, Melvin; Mower Provost, Emily (May 2019, IEEE Transactions on Affective Computing)

Time-continuous dimensional descriptions of emotions (e.g., arousal, valence) allow researchers to characterize short-time changes and to capture long-term trends in emotion expression. However, continuous emotion labels are generally not synchronized with the input speech signal due to delays caused by reaction-time, which is inherent in human evaluations. To deal with this challenge, we introduce a new convolutional neural network (multi-delay sinc network) that is able to simultaneously align and predict labels in an end-to-end manner. The proposed network is a stack of convolutional layers followed by an aligner network that aligns the speech signal and emotion labels. This network is implemented using a new convolutional layer that we introduce, the delayed sinc layer. It is a time-shifted low-pass (sinc) filter that uses a gradient-based algorithm to learn a single delay. Multiple delayed sinc layers can be used to compensate for a non-stationary delay that is a function of the acoustic space. We test the efficacy of this system on two common emotion datasets, RECOLA and SEWA, and show that this approach obtains state-of-the-art speech-only results by learning time-varying delays while predicting dimensional descriptors of emotions.
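The abstract above describes the delayed sinc layer as a time-shifted low-pass (sinc) filter. A minimal sketch of that idea follows, under stated assumptions: the delay is fixed rather than learned by gradient descent, the kernel is truncated without a window, and the names `delayed_sinc_kernel`, `apply_kernel`, `cutoff`, and `half_width` are hypothetical and do not come from the paper.

```python
import math
from typing import List, Sequence


def sinc(x: float) -> float:
    """Normalized sinc: sin(pi x) / (pi x), with sinc(0) = 1."""
    return 1.0 if x == 0.0 else math.sin(math.pi * x) / (math.pi * x)


def delayed_sinc_kernel(delay: float, cutoff: float, half_width: int) -> List[float]:
    """Truncated low-pass sinc kernel shifted by `delay` samples.

    `cutoff` is in cycles per sample (0.5 is the Nyquist frequency).
    Taps are indexed n = -half_width .. half_width.
    """
    return [
        2.0 * cutoff * sinc(2.0 * cutoff * (n - delay))
        for n in range(-half_width, half_width + 1)
    ]


def apply_kernel(signal: Sequence[float], kernel: Sequence[float]) -> List[float]:
    """Convolve: y[t] = sum_n h[n] * x[t - n], treating out-of-range samples as 0."""
    half = len(kernel) // 2
    out = []
    for t in range(len(signal)):
        acc = 0.0
        for k, h in enumerate(kernel):
            idx = t - (k - half)  # k - half is the tap index n
            if 0 <= idx < len(signal):
                acc += h * signal[idx]
        out.append(acc)
    return out


# Assumed example: a 2-sample delay at full bandwidth shifts the signal right by 2.
labels = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
shifted = apply_kernel(labels, delayed_sinc_kernel(delay=2.0, cutoff=0.5, half_width=4))
```

Because the kernel is a smooth function of `delay`, fractional delays are also valid inputs, which is what makes a gradient-based update of the delay plausible; that training step is not shown here.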
Full Text Available
Improving Cross-Corpus Speech Emotion Recognition with Adversarial Discriminative Domain Generalization (ADDoG)

https://doi.org/10.1109/taffc.2019.2916092

Gideon, John; McInnis, Melvin; Mower Provost, Emily (May 2019, IEEE Transactions on Affective Computing)

Automatic speech emotion recognition provides computers with critical context to enable user understanding. While methods trained and tested within the same dataset have been shown successful, they often fail when applied to unseen datasets. To address this, recent work has focused on adversarial methods to find more generalized representations of emotional speech. However, many of these methods have issues converging, and only involve datasets collected in laboratory conditions. In this paper, we introduce Adversarial Discriminative Domain Generalization (ADDoG), which follows an easier-to-train “meet in the middle” approach. The model iteratively moves representations learned for each dataset closer to one another, improving cross-dataset generalization. We also introduce Multiclass ADDoG, or MADDoG, which is able to extend the proposed method to more than two datasets, simultaneously. Our results show consistent convergence for the introduced methods, with significantly improved results when not using labels from the target dataset. We also show how, in most cases, ADDoG and MADDoG can be used to improve upon baseline state-of-the-art methods when target dataset labels are added and in-the-wild data are considered. Even though our experiments focus on cross-corpus speech emotion, these methods could be used to remove unwanted factors of variation in other settings.
Full Text Available
Emotion Recognition from Natural Phone Conversations in Individuals with and without Recent Suicidal Ideation

https://doi.org/10.21437/interspeech.2019-1830

Gideon, John; Schatten, Heather T.; McInnis, Melvin G.; Provost, Emily Mower (September 2019, Interspeech)

Suicide is a serious public health concern in the U.S., taking the lives of over 47,000 people in 2017. Early detection of suicidal ideation is key to prevention. One promising approach to symptom monitoring is suicidal speech prediction, as speech can be passively collected and may indicate changes in risk. However, directly identifying suicidal speech is difficult, as characteristics of speech can vary rapidly compared with suicidal thoughts. Suicidal ideation is also associated with emotion dysregulation. Therefore, in this work, we focus on the detection of emotion from speech and its relation to suicide. We introduce the Ecological Measurement of Affect, Speech, and Suicide (EMASS) dataset, which contains phone call recordings of individuals recently discharged from the hospital following admission for suicidal ideation or behavior, along with controls. Participants self-report their emotion periodically throughout the study. However, the dataset is relatively small and has uncertain labels. Because of this, we find that most features traditionally used for emotion classification fail. We demonstrate how outside emotion datasets can be used to generate more relevant features, making this analysis possible. Finally, we use emotion predictions to differentiate healthy controls from those with suicidal ideation, providing evidence for suicidal speech detection using emotion.
Full Text Available
Trainable Time Warping: Aligning Time-series in the Continuous-time Domain

https://doi.org/10.1109/icassp.2019.8682322

Khorram, Soheil; McInnis, Melvin G; Mower Provost, Emily (May 2019, International Conference on Acoustics, Speech, and Signal Processing (ICASSP))

Dynamic time warping (DTW) calculates the similarity or alignment between two signals, subject to temporal warping. However, its computational complexity grows exponentially with the number of time-series. Although there have been algorithms developed that are linear in the number of time-series, they are generally quadratic in time-series length. The exception is generalized time warping (GTW), which has linear computational cost. Yet, it can only identify simple time warping functions. There is a need for a new fast, high-quality multisequence alignment algorithm. We introduce trainable time warping (TTW), whose complexity is linear in both the number and the length of time-series. TTW performs alignment in the continuous-time domain using a sinc convolutional kernel and a gradient-based optimization technique. We compare TTW and GTW on 85 UCR datasets in time-series averaging and classification. TTW outperforms GTW on 67.1% of the datasets for the averaging tasks, and 61.2% of the datasets for the classification tasks.
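For context on the baseline this abstract starts from, the following is a minimal sketch of classic pairwise DTW by dynamic programming. It is not TTW or GTW. It assumes 1-D sequences and an absolute-difference local cost, and the function name `dtw_distance` is hypothetical. The nested loops make the quadratic cost in sequence length visible.

```python
from typing import List, Sequence


def dtw_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Classic DTW alignment cost between two 1-D sequences.

    cost[i][j] holds the minimal cumulative cost of aligning a[:i] with b[:j].
    Runs in O(len(a) * len(b)) time and memory.
    """
    n, m = len(a), len(b)
    inf = float("inf")
    cost: List[List[float]] = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = abs(a[i - 1] - b[j - 1])  # assumed local distance
            cost[i][j] = local + min(
                cost[i - 1][j],      # a advances, b holds
                cost[i][j - 1],      # b advances, a holds
                cost[i - 1][j - 1],  # both advance
            )
    return cost[n][m]
```

For example, `dtw_distance([1, 2, 3], [1, 2, 2, 3])` is zero because the repeated 2 can be absorbed by warping, whereas a sample-by-sample comparison could not even be computed for sequences of different lengths.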
Full Text Available
Identifying Mood Episodes Using Dialogue Features from Clinical Interviews

https://doi.org/10.21437/interspeech.2019-1878

Aldeneh, Zakaria; Jaiswal, Mimansa; Picheny, Michael; McInnis, Melvin G.; Provost, Emily Mower (September 2019, Interspeech)

Bipolar disorder, a severe chronic mental illness characterized by pathological mood swings from depression to mania, requires ongoing symptom severity tracking to both guide and measure treatments that are critical for maintaining long-term health. Mental health professionals assess symptom severity through semi-structured clinical interviews. During these interviews, they observe their patients’ spoken behaviors, including both what the patients say and how they say it. In this work, we move beyond acoustic and lexical information, investigating how higher-level interactive patterns also change during mood episodes. We then perform a secondary analysis, asking if these interactive patterns, measured through dialogue features, can be used in conjunction with acoustic features to automatically recognize mood episodes. Our results show that it is beneficial to consider dialogue features when analyzing and building automated systems for predicting and monitoring mood.
Full Text Available
The PRIORI Emotion Dataset: Linking Mood to Emotion Detected In-the-Wild

Khorram, Soheil; Jaiswal, Mimansa; Gideon, John; McInnis, Melvin; Mower Provost, Emily (October 2018, Interspeech)

Bipolar Disorder is a chronic psychiatric illness characterized by pathological mood swings associated with severe disruptions in emotion regulation. Clinical monitoring of mood is key to the care of these dynamic and incapacitating mood states. Frequent and detailed monitoring improves clinical sensitivity to detect mood state changes, but typically requires costly and limited resources. Speech characteristics change during both depressed and manic states, suggesting automatic methods applied to the speech signal can be effectively used to monitor mood state changes. However, speech is modulated by many factors, which renders mood state prediction challenging. We hypothesize that emotion can be used as an intermediary step to improve mood state prediction. This paper presents critical steps in developing this pipeline, including (1) a new in-the-wild emotion dataset, the PRIORI Emotion Dataset, collected from everyday smartphone conversational speech recordings, (2) activation/valence emotion recognition baselines on this dataset (PCC of 0.71 and 0.41, respectively), and (3) significant correlation between predicted emotion and mood state for individuals with bipolar disorder. This provides evidence and a working baseline for the use of emotion as a meta-feature for mood state monitoring.
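The baselines above are reported as PCC, which is read here as the Pearson correlation coefficient between predicted and annotated emotion values. A minimal sketch of that metric follows. The function name `pearson_cc` is hypothetical, and the abstract does not say how the authors aggregated scores across recordings.

```python
import math
from typing import Sequence


def pearson_cc(pred: Sequence[float], target: Sequence[float]) -> float:
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(pred)
    if n != len(target) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_p = sum(pred) / n
    mean_t = sum(target) / n
    # Centered cross-product and sums of squares
    cov = sum((p - mean_p) * (t - mean_t) for p, t in zip(pred, target))
    var_p = sum((p - mean_p) ** 2 for p in pred)
    var_t = sum((t - mean_t) ** 2 for t in target)
    if var_p == 0.0 or var_t == 0.0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_p * var_t)
```

PCC is invariant to the scale and offset of the predictions, so it rewards tracking the shape of the annotation rather than matching its absolute values.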
Full Text Available