NSF PAR Search | NSF Public Access Repository

Note: When clicking on a Digital Object Identifier (DOI) number, you will be taken to an external site maintained by the publisher. Some full text articles may not yet be available without a charge during the embargo (administrative interval).
What is a DOI Number?

Some links on this page may take you to non-federal websites. Their policies may differ from this site.

Automated classification of activities in classroom videos.

https://doi.org/10.1016/j.caeai.2024.100207

Foster, Jonathan K; Korban, Matthew; Youngs, Peter; Watson, Ginger S; Acton, Scott T (June 2024, Computers and Education: Artificial Intelligence)
Hwang, Gwo-Jen; Xie, Haoran; Wah, Benjamin; Gasevic, Dragan (Ed.)
Classroom videos are a common source of data for educational researchers studying classroom interactions as well as a resource for teacher education and professional development. Over the last several decades emerging technologies have been applied to classroom videos to record, transcribe, and analyze classroom interactions. With the rise of machine learning, we report on the development and validation of neural networks to classify instructional activities using video signals, without analyzing speech or audio features, from a large corpus of nearly 250 h of classroom videos from elementary mathematics and English language arts instruction. Results indicated that the neural networks performed fairly-well in detecting instructional activities, at diverse levels of complexity, as compared to human raters. For instance, one neural network achieved over 80% accuracy in detecting four common activity types: whole class activity, small group activity, individual activity, and transition. An issue that was not addressed in this study was whether the fine-grained and agnostic instructional activities detected by the neural networks could scale up to supply information about features of instructional quality. Future applications of these neural networks may enable more efficient cataloguing and analysis of classroom videos at scale and the generation of fine-grained data about the classroom environment to inform potential implications for teaching and learning.
more » « less
Full Text Available
Instructional Activity Recognition Using A Transformer Network with Multi-Semantic Attention

https://doi.org/10.1109/SSIAI59505.2024.10508634

Korban, Matthew; Acton, Scott T; Youngs, Peter; Foster, Jonathan (March 2024, IEEE)
Korban, Matthew; Acton, Scott T; Youngs, Peter; Foster, Jonathan (Ed.)
Instructional activity recognition is an analytical tool for the observation of classroom education. One of the primary challenges in this domain is dealing with the intri- cate and heterogeneous interactions between teachers, students, and instructional objects. To address these complex dynamics, we present an innovative activity recognition pipeline designed explicitly for instructional videos, leveraging a multi-semantic attention mechanism. Our novel pipeline uses a transformer network that incorporates several types of instructional seman- tic attention, including teacher-to-students, students-to-students, teacher-to-object, and students-to-object relationships. This com- prehensive approach allows us to classify various interactive activity labels effectively. The effectiveness of our proposed algo- rithm is demonstrated through its evaluation on our annotated instructional activity dataset.
more » « less
Full Text Available
A Semantic and Motion-Aware Spatiotemporal Transformer Network for Action Detection

https://doi.org/10.1109/TPAMI.2024.3377192

Korban, Matthew; Youngs, Peter; Acton, Scott T (January 2024, IEEE Transactions on Pattern Analysis and Machine Intelligence)
Lee, Kyoung Mu (Ed.)
This paper presents a novel spatiotemporal transformer network that introduces several original components to detect actions in untrimmed videos. First, the multi-feature selective semantic attention model calculates the correlations between spatial and motion features to model the spatiotemporal interactions between different action semantics properly. Second, the motion-aware network encodes the locations of action semantics in video frames utilizing the motion-aware 2D positional encoding algorithm. Such a motion-aware mechanism memorizes the dynamic spatiotemporal variations in action frames that current methods cannot exploit. Third, the sequence-based temporal attention model captures the heterogeneous temporal dependencies in action frames. In contrast to standard temporal attention used in natural language processing, primarily aimed at finding similarities between linguistic words, the proposed sequence-based temporal attention is designed to determine both the differences and similarities between video frames that jointly define the meaning of actions. The proposed approach outperforms the state-of-the-art solutions on four spatiotemporal action datasets: AVA 2.2, AVA 2.1, UCF101-24, and EPIC-Kitchens.
more » « less
Full Text Available
A multi-modal transformer network for action detection.

https://doi.org/10.1016/j.patcog.2023.109713

Korban, M.; Youngs, P.; Acton, S (October 2023, Pattern recognition)
Hancock, E. (Ed.)
This paper proposes a multi-modal transformer network for detecting actions in untrimmed videos. To enrich the action features, our transformer network utilizes a novel multi-modal attention mechanism that captures the correlations between different combinations of spa- tial and motion modalities. Exploring such correlations for actions effectively has not been explored before. We also suggest an algorithm to correct the motion distortion caused by camera movements. Such motion distortion severely reduces the expressive power of motion features represented by optical flow vectors. We also introduce a new instructional activity dataset that includes classroom videos from K-12 schools. We conduct comprehensive ex- periments to evaluate the performance of different approaches on our dataset. Our proposed algorithm outperforms the state-of-the-art methods on two public benchmarks, THUMOS14 and ActivityNet, and our instructional activity dataset.
more » « less
Full Text Available
Instructional Activity Detection Using Deep Neural Networks

https://doi.org/10.1109/DSP58604.2023.10167935

Korban, Matthew; Youngs, Peter; Acton, Scott T (June 2023, IEEE)
Korban, Matthew; Youngs, Peter; Acton, Scott T (Ed.)
Analyzing instructional videos via computer vision and machine learning holds promise for several tasks, such as assessing teacher performance and classroom climate, evaluating student engagement, and identifying racial bias in instruction. The traditional way of evaluating instructional videos depends on manual observation with human raters, which is time-consuming and requires a trained labor force. Therefore, this paper tests several deep network architectures in the automation of instruc- tional video analysis, where the networks are tailored to recognize classroom activity. Our experimental setup includes a set of 250 hours of primary and middle school videos that are annotated by expert human raters. We present several strategies to handle varying length of instructional activities, a major challenge in the detection of instructional activity. Based on the proposed strategies, we enhance and compare different deep networks for detecting instructional activity.
more » « less
Full Text Available
TAA-GCN: A temporally aware adaptive graph convolutional network for age estimation.

https://doi.org/10.1016/j.patcog.2022.109066

Korban, M.; Youngs, P.; Acton, S. T. (February 2023, Pattern recognition)
Hancock, E. (Ed.)
This paper proposes a novel age estimation algorithm, the Temporally-Aware Adaptive Graph Convolutional Network (TAA-GCN). Using a new representation based on graphs, the TAA-GCN utilizes skeletal, posture, clothing, and facial information to enrich the feature set associated with various ages. Such a novel graph representation has several advantages: First, reduced sensitivity to facial expression and other appearance variances; Second, ro- bustness to partial occlusion and non-frontal-planar viewpoint, which is commonplace in real-world applications such as video surveillance. The TAA-GCN employs two novel com- ponents, (1) the Temporal Memory Module (TMM) to compute temporal dependencies in age; (2) Adaptive Graph Convolutional Layer (AGCL) to refine the graphs and accommo- date the variance in appearance. The TAA-GCN outperforms the state-of-the-art methods on four public benchmarks, UTKFace, MORPHII, CACD, and FG-NET. Moreover, the TAA-GCN showed reliability in di↵erent camera viewpoints and reduced quality images.
more » « less
Full Text Available

Search for: All records