

Title: Multimodal Transformer for Unaligned Multimodal Language Sequences
Award ID(s):
1750439 1722822
NSF-PAR ID:
10126511
Author(s) / Creator(s):
Tsai, Yao-Hung Hubert; Bai, Shaojie; Liang, Paul Pu; Kolter, J. Zico; Morency, Louis-Philippe; Salakhutdinov, Ruslan
Date Published:
July 2019
Journal Name:
Proceedings of the Annual Meeting of the Association for Computational Linguistics
Page Range / eLocation ID:
6558–6569
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. We present a new multimodal, context-based dataset for continuous authentication. The dataset covers 27 subjects, aged 8 to 72, with data collected across multiple sessions while the subjects watched videos meant to elicit an emotional response. The collected data includes accelerometer readings, heart rate, electrodermal activity, skin temperature, and face videos. We also propose a baseline approach to enable fair comparisons on the dataset: a pretrained backbone network trained with a supervised contrastive loss for the face videos, and time-series features extracted from the physiological signals for classification (a minimal sketch of the face branch follows below). On the proposed dataset, this approach achieves an average accuracy, precision, and recall of 76.59%, 88.90%, and 53.25%, respectively, on the physiological signals, and 90.39%, 98.77%, and 75.71%, respectively, on the face videos.
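
The face branch described in the abstract (a pretrained backbone trained with a supervised contrastive loss) can be sketched in a few lines of PyTorch. This is a minimal illustration, not the authors' implementation: the ResNet-18 backbone, 128-dimensional embedding, temperature of 0.07, and the class names `FaceEncoder` and `SupConLoss` are all assumptions, and the loss follows the standard supervised contrastive formulation of Khosla et al. (2020).

```python
import torch
import torch.nn as nn
import torch.nn.functional as F
from torchvision import models

class SupConLoss(nn.Module):
    """Supervised contrastive loss (Khosla et al., 2020) over a labeled batch."""
    def __init__(self, temperature: float = 0.07):
        super().__init__()
        self.temperature = temperature

    def forward(self, features: torch.Tensor, labels: torch.Tensor) -> torch.Tensor:
        # features: (N, D) L2-normalized embeddings; labels: (N,) subject IDs.
        n = features.size(0)
        sim = features @ features.T / self.temperature            # (N, N) similarities
        sim = sim - sim.max(dim=1, keepdim=True).values.detach()  # numerical stability
        not_self = ~torch.eye(n, dtype=torch.bool, device=features.device)
        positives = (labels.unsqueeze(0) == labels.unsqueeze(1)) & not_self
        # Log-softmax over all non-self pairs for each anchor.
        log_prob = sim - torch.log((torch.exp(sim) * not_self).sum(dim=1, keepdim=True))
        # Average over each anchor's positive pairs (0 if an anchor has none).
        per_anchor = -(positives * log_prob).sum(dim=1) / positives.sum(dim=1).clamp(min=1)
        return per_anchor.mean()

class FaceEncoder(nn.Module):
    """Hypothetical face branch: pretrained ResNet-18 backbone + projection head."""
    def __init__(self, embed_dim: int = 128):
        super().__init__()
        backbone = models.resnet18(weights=models.ResNet18_Weights.DEFAULT)
        backbone.fc = nn.Identity()                 # keep the 512-d pooled features
        self.backbone = backbone
        self.head = nn.Linear(512, embed_dim)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return F.normalize(self.head(self.backbone(x)), dim=1)

# Toy usage: one batch of face crops with per-subject identity labels.
encoder, criterion = FaceEncoder(), SupConLoss()
frames = torch.randn(16, 3, 224, 224)               # stand-in for face crops
subjects = torch.randint(0, 4, (16,))               # stand-in subject labels
loss = criterion(encoder(frames), subjects)
loss.backward()
```

Pulling embeddings of the same subject together and pushing different subjects apart is what makes the learned face representation usable for verification at authentication time; the physiological branch, by contrast, relies on hand-crafted time-series features fed to a conventional classifier.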