- Award ID(s):
- 1453781
- NSF-PAR ID:
- 10099019
- Date Published:
- Journal Name:
- Interspeech 2018
- Page Range / eLocation ID:
- 941 to 945
- Format(s):
- Medium: X
- Sponsoring Org:
- National Science Foundation
More Like this
-
A challenging task in affective computing is to build reliable speech emotion recognition (SER) systems that can accurately predict emotional attributes from spontaneous speech. To increase trust in these SER systems, it is important to predict not only their accuracy, but also their confidence. An intriguing approach to predict uncertainty is Monte Carlo (MC) dropout, which obtains predictions from multiple feed-forward passes through a deep neural network (DNN) by using dropout regularization in both training and inference. This study evaluates this approach with regression models to predict emotional attribute scores for valence, arousal and dominance. The analysis illustrates that predicting uncertainty in this problem is possible, where the performance is higher for samples in the test set with lower uncertainty. The study evaluates uncertainty estimation as a function of the emotional attributes, showing that samples with extreme values have lower uncertainty. Finally, we demonstrate the benefits of uncertainty estimation with a reject option, where a classifier can decline to give a prediction when its confidence is low. By rejecting only 25% of the test set with the highest uncertainty, we achieve relative performance gains of 7.34% for arousal, 13.73% for valence and 8.79% for dominance.
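The MC dropout procedure and reject option described in this abstract can be sketched roughly as follows. This is a minimal illustration only, not the authors' implementation: the two-layer network, the random placeholder weights, the dropout rate, the number of passes, and the function names `mc_dropout_predict` and `reject_most_uncertain` are all assumptions made for the example. The standard deviation across passes is used here as the uncertainty measure, which is one common choice.

```python
import numpy as np

def mc_dropout_predict(x, w1, w2, rng, n_passes=50, p_drop=0.5):
    """Run n_passes stochastic forward passes with dropout kept active."""
    preds = np.empty((n_passes, x.shape[0]))
    for t in range(n_passes):
        h = np.maximum(x @ w1, 0.0)             # hidden layer with ReLU
        mask = rng.random(h.shape) >= p_drop    # fresh dropout mask on every pass
        h = h * mask / (1.0 - p_drop)           # inverted-dropout scaling
        preds[t] = (h @ w2).ravel()
    # Mean over passes is the prediction; spread over passes is the uncertainty.
    return preds.mean(axis=0), preds.std(axis=0)

def reject_most_uncertain(uncertainty, reject_fraction=0.25):
    """Boolean mask that keeps the samples with the lowest uncertainty."""
    n = len(uncertainty)
    n_keep = n - int(round(reject_fraction * n))
    keep = np.zeros(n, dtype=bool)
    keep[np.argsort(uncertainty)[:n_keep]] = True
    return keep

rng = np.random.default_rng(0)
x = rng.normal(size=(8, 4))      # 8 hypothetical feature vectors
w1 = rng.normal(size=(4, 16))    # placeholder weights, not a trained model
w2 = rng.normal(size=(16, 1))

mean, std = mc_dropout_predict(x, w1, w2, rng)
keep = reject_most_uncertain(std, reject_fraction=0.25)
```

With a trained regressor in place of the placeholder weights, performance would be evaluated only on the samples where `keep` is true, which mirrors the idea of declining to predict for the 25% most uncertain test samples.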
-
Recognizing emotions using a few attribute dimensions such as arousal, valence and dominance provides the flexibility to effectively represent a complex range of emotional behaviors. Conventional methods to learn these emotional descriptors primarily focus on separate models to recognize each of these attributes. Recent work has shown that learning these attributes together regularizes the models, leading to better feature representations. This study explores new forms of regularization by adding unsupervised auxiliary tasks to reconstruct hidden layer representations. This auxiliary task requires the denoising of hidden representations at every layer of an autoencoder. The framework relies on ladder networks that utilize skip connections between encoder and decoder layers to learn powerful representations of emotional dimensions. The results show that ladder networks improve the performance of the system compared to baselines that learn each attribute individually and to conventional denoising autoencoders. Furthermore, the unsupervised auxiliary tasks have promising potential to be used in a semi-supervised setting, where few labeled sentences are available.
-
This paper presents a method for extracting novel spectral features based on a sinusoidal model. The method focuses on characterizing the spectral shapes of audio signals using spectral peaks in frequency sub-bands. The extracted features are evaluated for predicting the levels of emotional dimensions, namely arousal and valence. Principal component regression, partial least squares regression, and deep convolutional neural network (CNN) models are used as prediction models for the levels of the emotional dimensions. The experimental results indicate that the proposed features include additional spectral information that common baseline features may not include. Since the quality of audio signals, especially timbre, plays a major role in affecting the perception of emotional valence in music, the inclusion of the presented features will contribute to decreasing the prediction error rate.
-
Multivariate pattern analysis (MVPA) of functional magnetic resonance imaging (fMRI) data has critically advanced the neuroanatomical understanding of affect processing in the human brain. Central to these advancements is the brain state, a temporally-succinct fMRI-derived pattern of neural activation, which serves as a processing unit. Establishing the brain state’s central role in affect processing, however, requires that it predicts multiple independent measures of affect. We employed MVPA-based regression to predict the valence and arousal properties of visual stimuli sampled from the International Affective Picture System (IAPS) along with the corollary skin conductance response (SCR) for demographically diverse healthy human participants (n = 19). We found that brain states significantly predicted the normative valence and arousal scores of the stimuli as well as the attendant individual SCRs. In contrast, SCRs significantly predicted arousal only. The prediction effect size of the brain state was more than three times greater than that of SCR. Moreover, neuroanatomical analysis of the regression parameters found remarkable agreement with regions long-established by fMRI univariate analyses in the emotion processing literature. Finally, geometric analysis of these parameters also found that the neuroanatomical encodings of valence and arousal are orthogonal as originally posited by the circumplex model of dimensional emotion.