Title: Multimodal Engagement Analysis from Facial Videos in the Classroom
Student engagement is a key component of learning and teaching, and has given rise to a plethora of automated methods to measure it. Whereas most of the literature explores student engagement analysis in computer-based learning, often in the lab, we focus on classroom instruction in authentic learning environments. We collected audiovisual recordings of secondary school classes over a one-and-a-half-month period, acquired continuous engagement labels per student (N=15) in repeated sessions, and explored computer vision methods to classify engagement from facial videos. We learned deep embeddings for attentional and affective features by training Attention-Net for head pose estimation and Affect-Net for facial expression recognition on previously collected large-scale datasets. We used these representations to train engagement classifiers on our data, in single- and multi-channel settings, taking temporal dependencies into account. The best-performing engagement classifiers achieved student-independent AUCs of .620 and .720 for grades 8 and 12, respectively, with attention-based features outperforming affective features. Score-level fusion either improved the engagement classifiers or was on par with the best-performing single modality. We also investigated the effect of personalization and found that only 60 seconds of person-specific data, selected by margin uncertainty of the base classifier, yielded an average AUC improvement of .084.
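To make the personalization step concrete, here is a minimal Python sketch, assuming a scikit-learn-style base classifier and one feature vector per one-second window; the function names and selection budget are illustrative, not the authors' implementation.

```python
# Hypothetical sketch of the personalization step described above: pick the
# person-specific windows on which the base engagement classifier is least
# certain (smallest margin between its top-two class probabilities) and add
# them to the training set before retraining. Names and the 1 s window
# granularity are assumptions, not the paper's code.
import numpy as np
from sklearn.base import clone


def select_by_margin_uncertainty(base_clf, person_features, budget_s=60, window_s=1.0):
    """Return indices of the `budget_s` seconds of data with the smallest margin."""
    proba = base_clf.predict_proba(person_features)   # (n_windows, n_classes)
    top2 = np.sort(proba, axis=1)[:, -2:]             # two largest probabilities
    margin = top2[:, 1] - top2[:, 0]                  # small margin = uncertain
    n_select = int(budget_s / window_s)
    return np.argsort(margin)[:n_select]


def personalize(base_clf, X_train, y_train, person_X, person_y):
    """Retrain the classifier with the selected person-specific windows added."""
    idx = select_by_margin_uncertainty(base_clf, person_X)
    X_new = np.vstack([X_train, person_X[idx]])
    y_new = np.concatenate([y_train, person_y[idx]])
    return clone(base_clf).fit(X_new, y_new)
```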
Award ID(s):
1920510 2019805
PAR ID:
10349953
Author(s) / Creator(s):
Date Published:
Journal Name:
IEEE Transactions on Affective Computing
ISSN:
2371-9850
Page Range / eLocation ID:
1 to 1
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. Agents must monitor their partners' affective states continuously in order to understand and engage in social interactions. However, methods for evaluating affect recognition do not account for changes in classification performance that may occur during occlusions or transitions between affective states. This paper addresses temporal patterns in affect classification performance in the context of an infant-robot interaction, where infants’ affective states contribute to their ability to participate in a therapeutic leg movement activity. To support robustness to facial occlusions in video recordings, we trained infant affect recognition classifiers using both facial and body features. Next, we conducted an in-depth analysis of our best-performing models to evaluate how performance changed over time as the models encountered missing data and changing infant affect. During time windows when features were extracted with high confidence, a unimodal model trained on facial features achieved the same optimal performance as multimodal models trained on both facial and body features. However, multimodal models outperformed unimodal models when evaluated on the entire dataset. Additionally, model performance was weakest when predicting an affective state transition and improved after multiple predictions of the same affective state. These findings emphasize the benefits of incorporating body features in continuous affect recognition for infants. Our work highlights the importance of evaluating variability in model performance both over time and in the presence of missing data when applying affect recognition to social interactions. 
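As a rough illustration of how body features can back up facial features under occlusion, the sketch below falls back to a body-feature model whenever facial features were extracted with low confidence; the array names and threshold are assumptions, not the authors' pipeline.

```python
# Minimal late-fusion sketch (not the authors' code): average the face and
# body models' probabilities when the face was extracted confidently,
# otherwise rely on the body model alone.
import numpy as np


def fuse_predictions(face_proba, body_proba, face_conf, conf_thresh=0.8):
    """face_proba/body_proba: (n_frames, n_classes); face_conf: (n_frames,)."""
    fused = np.empty_like(body_proba)
    face_ok = face_conf >= conf_thresh
    fused[face_ok] = 0.5 * (face_proba[face_ok] + body_proba[face_ok])
    fused[~face_ok] = body_proba[~face_ok]   # occluded face: trust the body model
    return fused.argmax(axis=1)              # per-frame affect predictions
```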
  2. A major challenge for online learning is the inability of systems to support student emotion and to maintain student engagement. In response to this challenge, computer vision has become an embedded feature in some instructional applications. In this paper, we propose a video dataset of college students solving math problems on the educational platform MathSpring.org, with a front-facing camera capturing students' gestures. The video dataset is annotated to indicate whether students' attention in specific frames is engaged or wandering. In addition, we train baselines for a computer vision module that determines the extent of student engagement during remote learning. Baselines include state-of-the-art deep learning image classifiers and traditional conditional and logistic regression for head pose estimation. We then incorporate a gaze baseline into the MathSpring learning platform and are evaluating its performance against the currently implemented approach.
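A minimal sketch of what a head-pose-based baseline of this kind could look like, using a logistic-regression classifier over per-frame head-pose angles; the placeholder data and feature layout are assumptions rather than the released dataset's format.

```python
# Illustrative engaged-vs-wandering baseline over head-pose angles
# (yaw, pitch, roll). The arrays below are synthetic stand-ins.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# X: (n_frames, 3) head-pose angles in degrees; y: 1 = engaged, 0 = wandering
rng = np.random.default_rng(0)
X = rng.normal(scale=20.0, size=(1000, 3))
y = rng.integers(0, 2, size=1000)

baseline = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))
print("CV accuracy:", cross_val_score(baseline, X, y, cv=5).mean())
```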
  3. Paaßen, Benjamin; Demmans Epp, Carrie (Eds.)
    Multimodal Learning Analytics (MMLA) has emerged as a powerful approach within the computer-supported collaborative learning community, offering nuanced insights into learning processes through diverse data sources. Despite its potential, the prevalent reliance on traditional instruments such as tripod-mounted digital cameras for video capture often results in suboptimal data quality for the facial expressions captured, which is crucial for understanding collaborative dynamics. This study introduces an innovative approach to overcome this limitation by employing 360-degree camera technology to capture students' facial features while collaborating in small working groups. A comparative analysis of 1.5 hours of video data from both traditional tripod-mounted digital cameras and 360-degree cameras evaluated the efficacy of these methods in capturing Facial Action Units (AUs) and facial keypoints. The use of OpenFace revealed that the 360-degree camera captured high-quality facial features in 33.17% of frames, significantly outperforming the traditional method's 8.34%, thereby enhancing reliability in facial feature detection. The findings suggest a pathway for future research to integrate 360-degree camera technology in MMLA. Future research directions involve refining this technology further to improve the detection of affective states in collaborative learning environments, thereby offering a richer understanding of the learning process.
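The frame-level comparison could be reproduced along these lines from OpenFace's per-frame output; the confidence threshold and file names below are assumptions, not the paper's settings.

```python
# Sketch: share of frames in which OpenFace extracted facial features with
# high confidence, comparing the two camera setups. OpenFace writes per-frame
# CSVs with `success` and `confidence` columns.
import pandas as pd


def high_confidence_share(openface_csv, conf_thresh=0.8):
    df = pd.read_csv(openface_csv)
    df.columns = df.columns.str.strip()   # OpenFace pads some column names
    ok = (df["success"] == 1) & (df["confidence"] >= conf_thresh)
    return 100.0 * ok.mean()


# Hypothetical file names for the two recordings being compared.
for label, path in [("360-degree", "room_360.csv"), ("tripod", "tripod.csv")]:
    print(f"{label}: {high_confidence_share(path):.2f}% high-confidence frames")
```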
  4. Paaßen, Benjamin; Demmans Epp, Carrie (Eds.)
    Collaborative game-based learning offers opportunities for students to participate in small group learning experiences that foster knowledge sharing, problem solving, and engagement. Student satisfaction with their collaborative experiences plays a pivotal role in shaping positive learning outcomes and is a critical factor in group success during learning. Gauging students' satisfaction within collaborative learning contexts can offer insights into student engagement and participation levels while affording practitioners the ability to provide targeted interventions or scaffolding. In this paper, we propose a framework for inferring student collaboration satisfaction with multimodal learning analytics from collaborative interactions. Utilizing multimodal data collected from 50 middle school students engaged in collaborative game-based learning, we predict student collaboration satisfaction. We first evaluate the performance of baseline models on individual modalities for insight into which modalities are most informative. We then devise a multimodal deep learning model that leverages a cross-attention mechanism to attend to salient information across modalities to enhance collaboration satisfaction prediction. Finally, we conduct ablation and feature importance analyses to understand which combination of modalities and features is most effective. Findings indicate that various combinations of data sources are highly beneficial for student collaboration satisfaction prediction.
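A hypothetical PyTorch sketch of the cross-attention idea, where one modality's embedding sequence attends to another's before a small head predicts satisfaction; the dimensions, modality names, and pooling are assumptions, not the authors' architecture.

```python
# Cross-modal attention sketch: one modality queries another, and the pooled
# result feeds a small regression head for a satisfaction score.
import torch
import torch.nn as nn


class CrossAttentionFusion(nn.Module):
    def __init__(self, dim=128, heads=4):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.head = nn.Sequential(nn.LayerNorm(dim), nn.Linear(dim, 1))

    def forward(self, speech_seq, gameplay_seq):
        # speech_seq, gameplay_seq: (batch, time, dim) per-modality embeddings
        attended, _ = self.attn(query=speech_seq, key=gameplay_seq, value=gameplay_seq)
        pooled = attended.mean(dim=1)          # temporal average pooling
        return self.head(pooled).squeeze(-1)   # predicted satisfaction score


model = CrossAttentionFusion()
score = model(torch.randn(8, 20, 128), torch.randn(8, 30, 128))  # toy batch
```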
  5. Most of the research in the field of affective computing has focused on detecting and classifying human emotions through electroencephalogram (EEG) or facial expressions. Designing multimedia content to evoke certain emotions has largely been guided by manual ratings provided by users. Here we present insights from the correlation of affective features across three modalities: affective multimedia content, EEG, and facial expressions. Interestingly, low-level audio-visual features, such as the contrast and homogeneity of the video and the tone of the audio in the movie clips, are most correlated with changes in facial expressions and EEG. We also detect the regions of the human face and the brain (in addition to the EEG frequency bands) that are most representative of affective responses. Computational modeling across the three modalities showed a high correlation between features from these regions and user-reported affective labels. Finally, correlating different layers of convolutional neural networks, with EEG and face images as input, provides insights into human affect. Together, these findings will assist in (1) designing more effective multimedia content to engage or influence viewers, (2) understanding the brain/body biomarkers of affect, and (3) developing new brain-computer interfaces as well as facial-expression-based algorithms to read viewers' emotional responses.
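The cross-modal correlation analysis can be illustrated with a short sketch that correlates a per-clip low-level video feature with facial-expression and EEG responses; the specific features and the synthetic arrays are placeholders, not the study's data.

```python
# Sketch: Pearson correlation between a low-level video feature (per-clip
# contrast) and two per-clip response measures (facial-expression intensity,
# EEG alpha band power). Arrays are synthetic stand-ins.
import numpy as np
from scipy.stats import pearsonr

n_clips = 40
rng = np.random.default_rng(1)
clip_contrast = rng.random(n_clips)     # mean frame contrast per clip
face_intensity = rng.random(n_clips)    # mean expression intensity per clip
eeg_alpha_power = rng.random(n_clips)   # mean alpha band power per clip

for name, response in [("facial expression", face_intensity), ("EEG alpha", eeg_alpha_power)]:
    r, p = pearsonr(clip_contrast, response)
    print(f"contrast vs. {name}: r={r:.2f}, p={p:.3f}")
```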