FACS3D-Net: 3D Convolution based Spatiotemporal Representation for Action Unit Detection

Yang, Le; Ertugrul, Itir Onal; Cohn, Jeffrey F.; Hammal, Zakia; Jiang, Dongmei; Sahli, Hichem

doi:10.1109/ACII.2019.8925514

Most approaches to automatic facial action unit (AU) detection consider only spatial information and ignore AU dynamics. For humans, dynamics improve AU perception. Is the same true for algorithms? To make use of AU dynamics, recent work in automated AU detection has proposed a sequential spatiotemporal approach: model spatial information using a 2D CNN, and then model temporal information using a Long Short-Term Memory (LSTM) network. Inspired by the experience of human FACS coders, we hypothesized that combining spatial and temporal information simultaneously would yield more powerful AU detection. To achieve this, we propose FACS3D-Net, which simultaneously integrates 3D and 2D CNNs. We evaluated it on the Expanded BP4D+ database of 200 participants. FACS3D-Net outperformed both 2D CNN and 2D CNN-LSTM approaches. Visualizations of the learnt representations suggest that FACS3D-Net is consistent with the spatiotemporal dynamics attended to by human FACS coders. To the best of our knowledge, this is the first work to apply 3D CNNs to the problem of AU detection.

Award ID(s): 1721667

PAR ID: 10168237

Author(s) / Creator(s): Yang, Le; Ertugrul, Itir Onal; Cohn, Jeffrey F.; Hammal, Zakia; Jiang, Dongmei; Sahli, Hichem

Date Published: 2019-09-01

Journal Name: 2019 8th International Conference on Affective Computing and Intelligent Interaction (ACII)

Page Range / eLocation ID: 538 to 544

Format(s): Medium: X

Sponsoring Org: National Science Foundation

Free Publicly Accessible Full Text
Accepted Manuscript 1.0
Conference Paper:
https://doi.org/10.1109/ACII.2019.8925514
