NSF PAR Search | NSF Public Access Repository

Note: When clicking on a Digital Object Identifier (DOI) number, you will be taken to an external site maintained by the publisher. Some full text articles may not yet be available without a charge during the embargo (administrative interval).
What is a DOI Number?

Some links on this page may take you to non-federal websites. Their policies may differ from this site.

Audio-Visual Emotion Recognition With Preference Learning Based on Intended and Multi-Modal Perceived Labels

https://doi.org/10.1109/TAFFC.2023.3234777

Lei, Yuanyuan; Cao, Houwei (October 2023, IEEE Transactions on Affective Computing)

Full Text Available
Multimodal Emotion Recognition with Surgical and Fabric Masks

https://doi.org/10.1109/ICASSP43922.2022.9746414

Yang, Ziqing; Nayan, Katherine; Fan, Zehao; Cao, Houwei (May 2022, ICASSP 2022 - 2022 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP))

In this study, we investigate how different types of masks affect automatic emotion classification in different channels of audio, visual, and multimodal. We train emotion classification models for each modality with the original data without mask and the re-generated data with mask respectively, and investigate how muffled speech and occluded facial expressions change the prediction of emotions. Moreover, we conduct the contribution analysis to study how muffled speech and occluded face interplay with each other and further investigate the individual contribution of audio, visual, and audio-visual modalities to the prediction of emotion with and without mask. Finally, we investigate the cross-corpus emotion recognition across clear speech and re-generated speech with different types of masks, and discuss the robustness of speech emotion recognition.
more » « less
Full Text Available
Analysis of Eye Fixations During Emotion Recognition in Talking Faces

https://doi.org/10.1109/ACII52823.2021.9597440

Cao, Houwei; Elliott, Forest (September 2021, 2021 9th International Conference on Affective Computing and Intelligent Interaction (ACII))

Full Text Available
Exploration of Acoustic and Lexical Cues for the INTERSPEECH 2020 Computational Paralinguistic Challenge

https://doi.org/10.21437/Interspeech.2020-2999

Yang, Ziqing; An, Zifan; Fan, Zehao; Jing, Chengye; Cao, Houwei (October 2020, INTERSPEECH 2020)
null (Ed.)
In this paper, we investigate various acoustic features and lexical features for the INTERSPEECH 2020 Computational Paralinguistic Challenge. For the acoustic analysis, we show that the proposed FV-MFCC feature is very promising, which has very strong prediction power on its own, and can also provide complementary information when fused with other acoustic features. For the lexical representation, we find that the corpus-dependent TF.IDF feature is by far the best representation. We also explore several model fusion techniques to combine different modalities together, and propose novel SVM models to aggregate the chunk-level predictions to the narrative-level predictions based on the chunk-level decision functionals. Finally we discuss the potential for improving prediction by combining the lexical and acoustic modalities together, and we find that fusion of lexical and acoustic modalities do not lead to consistent improvements over elderly Arousal, but substantially improve over the Valence. Our methods significantly outperform the official baselines on the test set in the participated Mask and Elderly Sub-challenges. We obtain an UAR of 75.1%, 54.3%, and 59.0% on the Mask, Elderly Arousal and Valence prediction tasks respectively.
more » « less
Full Text Available

Search for: All records