Title: Long-Term Human Participation Assessment in Collaborative Learning Environments Using Dynamic Scene Analysis
The paper develops datasets and methods to assess student participation in real-life collaborative learning environments. In collaborative learning environments, students are organized into small groups where they are free to interact within their group. Students can therefore move around freely, causing strong pose variation, move out of and re-enter the camera scene, or face away from the camera. We formulate the problem of assessing student participation as two subproblems: (i) student group detection against strong background interference from other groups, and (ii) dynamic participant tracking within the group. A large independent testing dataset of 12,518,250 student label instances, drawn from 21 hours and 22 minutes of real-life video, is used to evaluate the performance of the proposed student group detection method. The proposed method, based on multiple image representations, is shown to perform as well as or better than YOLO on all video instances. Over the entire dataset, the proposed method achieved an F1 score of 0.85 compared to 0.80 for YOLO. Following student group detection, the paper presents the development of a dynamic participant tracking system for assessing student group participation through long video sessions. The proposed dynamic participant tracking system performs exceptionally well, missing a student in just one of 35 testing videos. In comparison, a state-of-the-art method fails to track students in 14 of the 35 testing videos. The proposed method achieves 82.3% accuracy on an independent set of long, real-life collaborative videos.
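For reference, the F1 scores quoted above follow the standard definition computed from true positives, false positives, and false negatives. The sketch below is a minimal illustration of how detections could be scored against ground-truth student labels by IoU matching; the threshold and function names are illustrative assumptions, not code from the paper.

```python
# Minimal sketch of detection scoring by IoU matching; the threshold and
# function names are illustrative, not taken from the paper.
def iou(box_a, box_b):
    """Intersection over union of two (x1, y1, x2, y2) boxes."""
    x1, y1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    x2, y2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0, x2 - x1) * max(0, y2 - y1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    return inter / (area_a + area_b - inter + 1e-9)

def f1_score(predicted_boxes, label_boxes, iou_threshold=0.5):
    """Greedily match predictions to labels, then compute F1 from TP/FP/FN."""
    unmatched = list(label_boxes)
    true_positives = 0
    for pred in predicted_boxes:
        best = max(unmatched, key=lambda gt: iou(pred, gt), default=None)
        if best is not None and iou(pred, best) >= iou_threshold:
            true_positives += 1
            unmatched.remove(best)
    false_positives = len(predicted_boxes) - true_positives
    false_negatives = len(unmatched)
    precision = true_positives / (true_positives + false_positives + 1e-9)
    recall = true_positives / (true_positives + false_negatives + 1e-9)
    return 2 * precision * recall / (precision + recall + 1e-9)
```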
Award ID(s): 1949230
PAR ID: 10520573
Author(s) / Creator(s): ; ; ;
Publisher / Repository: IEEE Access
Date Published:
Journal Name: IEEE Access
Volume: 12
ISSN: 2169-3536
Page Range / eLocation ID: 53141 to 53157
Subject(s) / Keyword(s): Human participation assessment, dynamic participant tracking, occlusion detection
Format(s): Medium: X
Sponsoring Org: National Science Foundation
More Like this
  1. We introduce a new method to detect student group interactions in collaborative learning videos. We consider the following video activities: (i) human to human, (ii) human to others, and (iii) lack of any interaction. The system uses multidimensional AM-FM methods to detect student faces and hair, and then uses the results to detect possible interactions. We use dynamic graphs to represent group interactions within each video. We tested our methods on 15 videos and achieved 84% accuracy for students facing the camera and 76% for students facing both toward and away from the camera.
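As a rough illustration of the dynamic-graph representation mentioned in this abstract, each frame can be summarized as a graph whose nodes are detected participants and whose edges carry an interaction label. The data layout, labels, and the classify_pair hook below are assumptions for illustration, not the authors' implementation.

```python
from collections import defaultdict

# Illustrative interaction labels drawn from the abstract's three activity types.
HUMAN_TO_HUMAN, HUMAN_TO_OTHERS, NO_INTERACTION = "h2h", "h2o", "none"

def build_interaction_graph(detections, classify_pair):
    """Build one frame's interaction graph.

    detections    : dict mapping participant id -> (face_box, hair_box)
    classify_pair : callable returning an interaction label for two participants
                    (hypothetical hook where AM-FM-based evidence would be used).
    """
    graph = defaultdict(dict)
    ids = sorted(detections)
    for i, a in enumerate(ids):
        for b in ids[i + 1:]:
            label = classify_pair(detections[a], detections[b])
            if label != NO_INTERACTION:
                graph[a][b] = label
                graph[b][a] = label
    return dict(graph)

# A video-level dynamic graph is then just the per-frame graphs indexed by time:
# dynamic_graph = {frame_index: build_interaction_graph(dets, classify_pair), ...}
```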
  2. Face recognition in collaborative learning videos presents many challenges. In collaborative learning videos, students sit around a table at different positions relative to the recording camera, come and go, move around, and become partially or fully occluded. Furthermore, the videos tend to be very long, requiring the development of fast and accurate methods. We develop a dynamic system for recognizing participants in collaborative learning videos. We address occlusion and recognition failures by using past information from the face detection history. We address the need to detect faces from different poses, and the need for speed, by associating each participant with a collection of prototype faces computed through sampling or K-means clustering. Our results show that the proposed system is both fast and accurate. We also compare our system against a baseline system that uses InsightFace [2] and the original training video segments. We achieved an average accuracy of 86.2% compared to 70.8% for the baseline system. On average, our recognition rate was 28.1 times faster than the baseline system.
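A minimal sketch of the prototype-face idea, assuming face embeddings are already available as fixed-length vectors: each participant is summarized by a few K-means cluster centers, and a query face is assigned to the participant with the nearest prototype. The embedding source, cluster count, and function names are assumptions, not the paper's code.

```python
import numpy as np
from sklearn.cluster import KMeans

def build_prototypes(embeddings, k=5):
    """Summarize one participant's face embeddings by up to k cluster centers."""
    k = min(k, len(embeddings))
    return KMeans(n_clusters=k, n_init=10).fit(np.asarray(embeddings)).cluster_centers_

def recognize(query_embedding, prototype_gallery):
    """Return the participant whose closest prototype is nearest to the query."""
    best_id, best_dist = None, np.inf
    for participant_id, prototypes in prototype_gallery.items():
        dist = np.linalg.norm(prototypes - query_embedding, axis=1).min()
        if dist < best_dist:
            best_id, best_dist = participant_id, dist
    return best_id
```

Keeping only a handful of pose-diverse exemplars per participant is what makes matching cheap enough for very long videos, which is the motivation the abstract gives for the prototype collection.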
  3. Research on video activity recognition has primarily focused on differentiating among many diverse activities defined using short video clips. In this paper, we introduce the problem of reliable video activity recognition over long videos (45 minutes to 2 hours) to quantify student participation in collaborative learning environments. Video activity recognition in collaborative learning environments poses several unique challenges. We introduce participation maps that identify how and when each student performs each activity in order to quantify student participation. We present a family of low-parameter 3D ConvNet architectures to detect these activities. We then apply spatial clustering to identify each participant and generate student participation maps from the resulting detections. We demonstrate the effectiveness of our approach by training on about 1,000 3-second samples of typing and writing and testing on ten video sessions totaling about 10 hours. In terms of activity detection, our methods achieve 80% accuracy for writing and typing, matching the recognition performance of TSN, SlowFast, SlowOnly, and I3D trained on the same dataset while using 1200x to 1500x fewer parameters. Beyond traditional video activity recognition methods, our video activity participation maps identify how each student participates within each group.
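To make the low-parameter 3D ConvNet idea concrete, here is a tiny PyTorch sketch of a 3D convolutional classifier for short clips. The layer sizes and names are illustrative assumptions and do not reproduce the paper's architecture or parameter counts.

```python
import torch
import torch.nn as nn

class TinyActivityNet3D(nn.Module):
    """Small 3D ConvNet for detecting one activity (e.g., writing) in short clips."""
    def __init__(self, num_classes=2):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv3d(3, 8, kernel_size=3, padding=1), nn.ReLU(),
            nn.MaxPool3d(2),
            nn.Conv3d(8, 16, kernel_size=3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool3d(1),   # collapse time and space to a single vector
        )
        self.classifier = nn.Linear(16, num_classes)

    def forward(self, clip):           # clip: (batch, 3, frames, height, width)
        x = self.features(clip).flatten(1)
        return self.classifier(x)

# Example: a 3-second clip sampled at ~10 fps and cropped to 64x64 pixels.
logits = TinyActivityNet3D()(torch.randn(1, 3, 30, 64, 64))
```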
  4. In this paper, we propose real-time face mask detection and recognition for CCTV surveillance camera videos. The proposed work consists of the following steps: video acquisition and keyframe selection, data augmentation, facial part segmentation, pixel-based feature extraction, Bag of Visual Words (BoVW) generation, face mask detection, and face recognition. In the first step, a set of keyframes is selected using the histogram of gradients (HoG) algorithm. Second, data augmentation involves three steps: color normalization, illumination correction (CLAHE), and pose normalization (angular affine transformation). In the third step, facial parts are segmented using a clustering approach, Expectation Maximization with a Gaussian Mixture Model (EM-GMM), in which facial regions are segmented into eyes, nose, mouth, chin, and forehead. Then, pixel-based feature extraction is performed using the YOLO Nano approach, which achieves higher performance with a lighter model than YOLO Tiny V2 and YOLO Tiny V3, and the extracted features are assembled into a codebook using the Hassanat similarity with K-nearest neighbor (H-M with KNN) algorithm. For mask detection, the L2 distance function is used. The final step is face recognition, which is implemented with a kernel-based Extreme Learning Machine with Slime Mould Optimization (SMO). Experiments were conducted using Python IDLE 3.8, comparing the proposed YOLO Nano model against previous works, including GMM with deep learning (GMM+DL), a Convolutional Neural Network (CNN) with VGGF, YOLO Tiny V2, and YOLO Tiny V3, in terms of various performance metrics.
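As one concrete example of the pipeline, the CLAHE illumination-correction step could look roughly like the OpenCV sketch below; the clip limit and tile size are assumed defaults, not values reported in the paper.

```python
import cv2

def correct_illumination(bgr_frame, clip_limit=2.0, tile_grid=(8, 8)):
    """Apply CLAHE to the luminance channel of a BGR frame (illustrative parameters)."""
    lab = cv2.cvtColor(bgr_frame, cv2.COLOR_BGR2LAB)
    l, a, b = cv2.split(lab)
    clahe = cv2.createCLAHE(clipLimit=clip_limit, tileGridSize=tile_grid)
    corrected = cv2.merge((clahe.apply(l), a, b))
    return cv2.cvtColor(corrected, cv2.COLOR_LAB2BGR)
```

Working in the LAB color space keeps the contrast equalization on the luminance channel only, so face colors are preserved for the later segmentation and feature-extraction steps.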
  5. We introduce the problem of detecting a group of students in classroom videos. The problem requires detecting students from different angles and separating the group from other groups in long videos (one to one and a half hours). We use multiple image representations to solve the problem: FM components to separate each group from background groups, AM-FM components to detect the back of the head, and YOLO for face detection. We use classroom videos from four different groups to validate our approach. Our use of multiple representations is shown to be significantly more accurate than the use of YOLO alone.
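A simplified sketch of how detections from multiple representations might be fused: face and back-of-head boxes are kept only if they fall mostly inside the FM-based group region. The fusion rule, threshold, and names are illustrative assumptions; the paper's actual combination may differ.

```python
import numpy as np

def merge_detections(face_boxes, head_boxes, group_mask, inside_fraction=0.5):
    """Keep face and back-of-head boxes that fall mostly inside the group's region.

    face_boxes, head_boxes : lists of (x1, y1, x2, y2) pixel boxes
    group_mask             : 2D boolean array marking the FM-based group region
    """
    def mostly_inside(box):
        x1, y1, x2, y2 = (int(v) for v in box)
        patch = group_mask[y1:y2, x1:x2]
        return patch.size > 0 and patch.mean() >= inside_fraction

    return [box for box in face_boxes + head_boxes if mostly_inside(box)]
```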