Human state recognition is a critical topic with pervasive and important applications in human–machine systems. Multimodal fusion, which entails integrating metrics from various data sources, has proven to be a potent method for boosting recognition performance. Although recent multimodal-based models have shown promising results, they often fall short of fully leveraging the sophisticated fusion strategies needed to model adequate cross-modal dependencies in the fusion representation; instead, they rely on costly and inconsistent feature crafting and alignment. To address this limitation, we propose an end-to-end multimodal transformer framework for multimodal human state recognition called Husformer. Specifically, we propose using cross-modal transformers, which allow one modality to reinforce itself by directly attending to latent relevance revealed in other modalities, to fuse different modalities while maintaining sufficient awareness of cross-modal interactions. Subsequently, we utilize a self-attention transformer to further prioritize contextual information in the fusion representation. Extensive experiments on two human emotion corpora (DEAP and WESAD) and two cognitive load datasets [the multimodal dataset for objective cognitive workload assessment on simultaneous tasks (MOCAS) and CogLoad] demonstrate that, for human state recognition, Husformer outperforms both state-of-the-art multimodal baselines and the use of a single modality by a large margin, especially when dealing with raw multimodal features. We also conducted an ablation study to show the benefits of each component in Husformer. Experimental details and source code are available at https://github.com/SMARTlab-Purdue/Husformer.
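The fusion scheme described above can be illustrated with a short, hedged sketch: a cross-modal attention block in which one modality's sequence queries another modality's keys and values, followed by a self-attention transformer over the concatenated fusion representation. The module names, dimensions, and pooling below are illustrative assumptions, not the released Husformer implementation (see the linked repository for the authors' code).

```python
# Minimal sketch of cross-modal attention followed by self-attention fusion;
# layer sizes and names are illustrative, not the authors' code.
import torch
import torch.nn as nn

class CrossModalBlock(nn.Module):
    """One modality (target) attends to another (source): queries come from the
    target, keys/values from the source, so the target is reinforced by
    latent cross-modal cues."""
    def __init__(self, d_model=64, n_heads=4):
        super().__init__()
        self.attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.norm1 = nn.LayerNorm(d_model)
        self.norm2 = nn.LayerNorm(d_model)
        self.ff = nn.Sequential(nn.Linear(d_model, 4 * d_model), nn.ReLU(),
                                nn.Linear(4 * d_model, d_model))

    def forward(self, target, source):
        attended, _ = self.attn(target, source, source)   # cross-modal attention
        x = self.norm1(target + attended)
        return self.norm2(x + self.ff(x))

class FusionSketch(nn.Module):
    def __init__(self, d_model=64, n_heads=4, n_classes=3):
        super().__init__()
        self.a_from_b = CrossModalBlock(d_model, n_heads)
        self.b_from_a = CrossModalBlock(d_model, n_heads)
        # Self-attention transformer over the concatenated fusion representation.
        layer = nn.TransformerEncoderLayer(d_model, n_heads, batch_first=True)
        self.self_attn = nn.TransformerEncoder(layer, num_layers=2)
        self.head = nn.Linear(d_model, n_classes)

    def forward(self, mod_a, mod_b):                      # (batch, seq, d_model) each
        fused = torch.cat([self.a_from_b(mod_a, mod_b),
                           self.b_from_a(mod_b, mod_a)], dim=1)
        return self.head(self.self_attn(fused).mean(dim=1))  # pooled class logits

logits = FusionSketch()(torch.randn(2, 50, 64), torch.randn(2, 80, 64))
```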
This content will become publicly available on April 1, 2026.

A Unified Biosensor–Vision Multi-Modal Transformer network for emotion recognition
The development of transformer-based models has resulted in significant advances on various vision and NLP research challenges. However, this progress has not been effectively transferred to biosensor/physiological-signal-based emotion recognition, largely because transformers require large amounts of training data and most biosensor datasets are not large enough to train such models. To address this issue, we propose a novel Unified Biosensor–Vision Multimodal Transformer (UBVMT) architecture, which enables self-supervised pretraining by extracting remote photoplethysmography (rPPG) signals from videos in the large CMU-MOSEI dataset. UBVMT classifies emotions in the arousal-valence space by combining a 2D representation of ECG/PPG signals with facial information. In contrast to modality-specific architectures, the unified architecture of UBVMT consists of homogeneous transformer blocks that take as input the image-based representation of the biosensor signals and the corresponding face information for emotion representation. This minimal modality-specific design reduces the number of parameters in UBVMT by half compared to conventional multimodal transformer networks, enabling its application in our web-based system, where loading large models poses significant memory challenges. UBVMT is pretrained in a self-supervised manner by employing masked autoencoding to reconstruct masked patches of video frames and 2D scalogram images of ECG/PPG signals, and contrastive modeling to align face and ECG/PPG data. Extensive experiments on publicly available datasets show that our UBVMT-based model produces results comparable to state-of-the-art techniques.
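As a rough illustration of the unified, modality-agnostic design described above, the sketch below runs patch tokens from face frames and from 2D scalogram images of ECG/PPG signals through the same homogeneous transformer encoder and aligns the two with an InfoNCE-style contrastive loss. All layer sizes, token counts, and function names are assumptions for illustration, not the UBVMT implementation; the masked-autoencoding objective is only noted in a comment.

```python
# Illustrative sketch of a unified, modality-agnostic encoder plus contrastive
# alignment; all sizes and names are assumptions, not the UBVMT implementation.
import torch
import torch.nn as nn
import torch.nn.functional as F

class UnifiedEncoder(nn.Module):
    """The same homogeneous transformer blocks process patch tokens from face
    frames and from 2D scalogram images of ECG/PPG (no modality-specific towers).
    Masked-autoencoding pretraining would mask a subset of these patch tokens
    and reconstruct them; that objective is omitted here for brevity."""
    def __init__(self, patch_dim=768, d_model=256, depth=4, n_heads=8):
        super().__init__()
        self.proj = nn.Linear(patch_dim, d_model)
        layer = nn.TransformerEncoderLayer(d_model, n_heads, batch_first=True)
        self.blocks = nn.TransformerEncoder(layer, num_layers=depth)

    def forward(self, patches):                       # (batch, n_patches, patch_dim)
        return self.blocks(self.proj(patches))

def contrastive_alignment(face_emb, bio_emb, temperature=0.07):
    """InfoNCE-style loss pulling matched face / ECG-PPG clips together."""
    face = F.normalize(face_emb, dim=-1)
    bio = F.normalize(bio_emb, dim=-1)
    logits = face @ bio.t() / temperature
    labels = torch.arange(face.size(0), device=face.device)
    return 0.5 * (F.cross_entropy(logits, labels) + F.cross_entropy(logits.t(), labels))

encoder = UnifiedEncoder()
face_tokens = torch.randn(4, 196, 768)        # flattened patches of a face frame
scalogram_tokens = torch.randn(4, 196, 768)   # patches of an ECG/PPG scalogram image
loss = contrastive_alignment(encoder(face_tokens).mean(1), encoder(scalogram_tokens).mean(1))
```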
- Award ID(s): 2114808
- PAR ID: 10628188
- Publisher / Repository: Elsevier Ltd.
- Date Published:
- Journal Name: Biomedical Signal Processing and Control
- Volume: 102
- Issue: C
- ISSN: 1746-8094
- Page Range / eLocation ID: 107232
- Subject(s) / Keyword(s): emotion recognition photoplethysmography signal, transformers, representation learning
- Format(s): Medium: X
- Sponsoring Org: National Science Foundation
More Like this
Baba, Justin S; Coté, Gerard L (Ed.) In this research, we examine the potential of measuring physiological variables, including heart rate (HR) and respiration rate (RR), on the upper arm using a wireless multimodal sensing system consisting of an accelerometer, a gyroscope, three-wavelength photoplethysmography (PPG), single-sided electrocardiography (SS-ECG), and bioimpedance (BioZ). HR data were collected while the subject was at rest and while typing, and RR data were collected while the subject was at rest. Data from the three PPG wavelengths and from BioZ were compared against SS-ECG as the reference standard, and the accelerometer and gyroscope signals were used to exclude data with excessive motion noise. The results showed that, when the subject remained sedentary, the mean absolute error (MAE) of the HR calculation for all three PPG wavelengths was less than two bpm, while that of BioZ was 3.5 bpm relative to the SS-ECG HR. During typing, the MAE increased for both modalities: it remained below three bpm for all three PPG wavelengths but rose to 7.5 bpm for BioZ. For RR, both modalities were within one breath per minute of the SS-ECG modality at the single breathing rate tested. Overall, all modalities on this upper-arm wearable worked well while the subject was sedentary, but SS-ECG and PPG showed less variability in the HR signal during micro-motions such as typing.
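As a toy illustration of the evaluation described above, the following hedged sketch estimates heart rate from a PPG-like waveform via peak detection and scores it against an SS-ECG-derived reference using mean absolute error; the synthetic signal, sampling rate, and reference value are assumptions, not data from the study.

```python
# Toy sketch: estimate heart rate from a PPG-like waveform by peak detection and
# score it against an SS-ECG-derived reference with mean absolute error.
# The signal, sampling rate, and reference value are synthetic assumptions.
import numpy as np
from scipy.signal import find_peaks

def heart_rate_bpm(signal, fs, min_beat_interval_s=0.4):
    """Estimate beats per minute from peak-to-peak intervals."""
    peaks, _ = find_peaks(signal, distance=int(min_beat_interval_s * fs))
    if len(peaks) < 2:
        return float("nan")
    rr_intervals = np.diff(peaks) / fs            # seconds between detected beats
    return 60.0 / rr_intervals.mean()

fs = 100                                          # Hz, assumed sampling rate
t = np.arange(0, 30, 1 / fs)
ppg = np.sin(2 * np.pi * 1.2 * t) + 0.05 * np.random.randn(t.size)  # ~72 bpm surrogate

hr_ppg = heart_rate_bpm(ppg, fs)
hr_ss_ecg = 72.0                                  # stand-in for the SS-ECG ground truth
mae = abs(hr_ppg - hr_ss_ecg)                     # per-window error; averaged over windows in practice
print(f"PPG-derived HR: {hr_ppg:.1f} bpm, MAE vs SS-ECG: {mae:.2f} bpm")
```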
This article presents a computational solution that enables continuous cardiac monitoring through cross-modality inference of the electrocardiogram (ECG). While some smartwatches now allow users to obtain a 30-second ECG test by tapping a built-in biosensor, these short-term ECG tests often miss intermittent and asymptomatic abnormalities of cardiac function. It is also infeasible to expect persistently active user participation in long-term continuous cardiac monitoring to capture these and other types of cardiac abnormalities. To alleviate the need for continuous user attention and active participation, we design a lightweight neural network that infers ECG from the photoplethysmogram (PPG) signal sensed at the skin surface by a wearable optical sensor. We also develop a diagnosis-oriented training strategy that enables the neural network to capture the pathological features of ECG, aiming to increase the utility of the reconstructed ECG signals for screening cardiovascular diseases (CVDs). We further leverage model interpretation to obtain insights from the data-driven model, for example, to reveal associations between CVDs and ECG/PPG and to demonstrate how the neural network copes with motion artifacts in ambulatory applications. Experimental results on three datasets demonstrate the feasibility of inferring ECG from PPG, achieving high-fidelity ECG reconstruction with only about 40,000 parameters.
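A minimal sketch of the kind of lightweight PPG-to-ECG mapping described above is given below, assuming a small 1D convolutional encoder-decoder in PyTorch. The architecture, layer sizes, and parameter count are illustrative assumptions (this toy version has roughly 10k parameters, not the article's ~40,000) and do not reproduce the authors' network or their diagnosis-oriented training strategy.

```python
# Minimal, hedged sketch of a lightweight 1D network mapping a PPG window to an
# ECG window; architecture and sizes are assumptions, not the authors' model.
import torch
import torch.nn as nn

class PPG2ECG(nn.Module):
    def __init__(self, channels=(1, 16, 32, 16, 1), kernel=9):
        super().__init__()
        layers = []
        for i, (c_in, c_out) in enumerate(zip(channels[:-1], channels[1:])):
            layers.append(nn.Conv1d(c_in, c_out, kernel, padding=kernel // 2))
            if i < len(channels) - 2:                 # no activation on the output layer
                layers += [nn.BatchNorm1d(c_out), nn.ReLU()]
        self.net = nn.Sequential(*layers)

    def forward(self, ppg):                           # (batch, 1, n_samples)
        return self.net(ppg)                          # reconstructed ECG, same length

model = PPG2ECG()
print(sum(p.numel() for p in model.parameters()))     # parameter budget check
ecg_hat = model(torch.randn(8, 1, 1000))              # 10 s window at 100 Hz (assumed)
```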
Recent Transformer-based contextual word representations, including BERT and XLNet, have shown state-of-the-art performance in multiple disciplines within NLP. Fine-tuning these trained contextual models on task-specific datasets has been the key to achieving superior performance downstream. While fine-tuning these pre-trained models is straightforward for lexical applications (applications with only the language modality), it is not trivial for multimodal language, a growing area in NLP focused on modeling face-to-face communication. This is because the pre-trained models do not have the necessary components to accept the two extra modalities of vision and acoustics. In this paper, we propose an attachment to BERT and XLNet called the Multimodal Adaptation Gate (MAG). MAG allows BERT and XLNet to accept multimodal nonverbal data during fine-tuning. It does so by generating a shift to the internal representation of BERT and XLNet, a shift conditioned on the visual and acoustic modalities. In our experiments, we study the commonly used CMU-MOSI and CMU-MOSEI datasets for multimodal sentiment analysis. Fine-tuning MAG-BERT and MAG-XLNet significantly boosts sentiment analysis performance over previous baselines as well as over language-only fine-tuning of BERT and XLNet. On the CMU-MOSI dataset, MAG-XLNet achieves human-level multimodal sentiment analysis performance for the first time in the NLP community.
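A hedged sketch of the adaptation-gate idea described above is shown below: nonverbal (visual and acoustic) features produce a gated shift that is added to the language model's token representations, scaled relative to the token norm. The dimensions, gating, and scaling details are illustrative assumptions and may differ from the published MAG; the sketch only conveys the mechanism.

```python
# Hedged sketch of a multimodal adaptation gate: visual and acoustic features
# produce a gated shift added to the language model's token representations.
# Dimensions, gating, and scaling are illustrative and may differ from MAG.
import torch
import torch.nn as nn

class MultimodalAdaptationGate(nn.Module):
    def __init__(self, d_text=768, d_visual=47, d_acoustic=74, beta=1.0):
        super().__init__()
        self.gate_v = nn.Linear(d_text + d_visual, d_text)
        self.gate_a = nn.Linear(d_text + d_acoustic, d_text)
        self.shift_v = nn.Linear(d_visual, d_text)
        self.shift_a = nn.Linear(d_acoustic, d_text)
        self.norm = nn.LayerNorm(d_text)
        self.beta = beta

    def forward(self, text_h, visual, acoustic):
        # Gates decide how strongly each nonverbal modality displaces a token.
        g_v = torch.relu(self.gate_v(torch.cat([text_h, visual], dim=-1)))
        g_a = torch.relu(self.gate_a(torch.cat([text_h, acoustic], dim=-1)))
        shift = g_v * self.shift_v(visual) + g_a * self.shift_a(acoustic)
        # Scale the shift relative to the text embedding norm before adding it.
        alpha = torch.clamp(text_h.norm(dim=-1, keepdim=True) /
                            (shift.norm(dim=-1, keepdim=True) + 1e-6), max=self.beta)
        return self.norm(text_h + alpha * shift)

mag = MultimodalAdaptationGate()
adapted = mag(torch.randn(2, 20, 768),   # hidden states of the language model (assumed dims)
              torch.randn(2, 20, 47),    # visual features per token
              torch.randn(2, 20, 74))    # acoustic features per token
```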
The inverse problem of inferring the clinical gold-standard electrocardiogram (ECG) from photoplethysmogram (PPG) signals that can be measured by affordable wearable Internet of Healthcare Things (IoHT) devices is a research direction receiving growing attention. It combines the easy measurability of PPG with the rich clinical knowledge of ECG for long-term continuous cardiac monitoring. The prior art for reconstruction using a universal basis, such as the discrete cosine transform (DCT), has limited fidelity for uncommon ECG shapes due to its lack of representative power. To better utilize the data and improve data representation, we design two dictionary learning frameworks, cross-domain joint dictionary learning (XDJDL) and label-consistent XDJDL (LC-XDJDL), to further improve ECG inference quality and enrich PPG-based diagnosis knowledge. Building on the K-SVD technique, the proposed joint dictionary learning frameworks extend its expressive power by simultaneously optimizing a pair of signal dictionaries for PPG and ECG together with the transforms that relate their sparse codes and disease information. The proposed models are evaluated on a variety of PPG and ECG morphologies from two benchmark datasets covering various age groups and disease types. The results show that the proposed frameworks achieve better inference performance than previous methods, with average Pearson correlation coefficients of 0.88 for XDJDL and 0.92 for LC-XDJDL, suggesting an encouraging potential for ECG screening using PPG based on the proactively learned PPG-ECG relationship. By enabling dynamic monitoring and analysis of an individual's health status, the proposed frameworks contribute to the emerging digital-twin paradigm for personalized healthcare.
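The sparse-coding pipeline described above can be illustrated with a toy, hedged sketch: a dictionary is learned for PPG segments, test PPG is sparse-coded, and a least-squares transform maps the PPG codes to ECG reconstructions. This is not XDJDL itself, which jointly optimizes paired PPG/ECG dictionaries, the code transform, and label consistency via K-SVD; the data here is synthetic and all names and sizes are assumptions.

```python
# Toy illustration (not XDJDL itself) of sparse-coding-based ECG inference from
# PPG: learn a PPG dictionary, sparse-code test PPG, and map codes to ECG with
# a least-squares transform. Data here is synthetic; all sizes are assumptions.
import numpy as np
from sklearn.decomposition import DictionaryLearning

rng = np.random.default_rng(0)
n_train, seg_len, n_atoms = 200, 128, 32
ppg_train = rng.standard_normal((n_train, seg_len))   # stand-in paired PPG segments
ecg_train = rng.standard_normal((n_train, seg_len))   # stand-in paired ECG segments

# 1) Learn a PPG dictionary (XDJDL instead jointly optimizes PPG/ECG dictionaries
#    and the transforms relating their sparse codes and labels).
dl = DictionaryLearning(n_components=n_atoms, transform_algorithm="omp",
                        transform_n_nonzero_coefs=5, max_iter=10, random_state=0)
codes_train = dl.fit_transform(ppg_train)              # sparse codes, (n_train, n_atoms)

# 2) Least-squares map from PPG sparse codes to ECG segments.
W, *_ = np.linalg.lstsq(codes_train, ecg_train, rcond=None)

# 3) Infer ECG for a new PPG segment from its sparse code.
ppg_test = rng.standard_normal((1, seg_len))
ecg_hat = dl.transform(ppg_test) @ W                   # (1, seg_len) reconstruction
```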