Title: Toward Automated Classroom Observation: Predicting Positive and Negative Climate
We devised and evaluated a multi-modal machine learning-based system to analyze videos of school classrooms for "positive climate" and "negative climate", which are two dimensions of the Classroom Assessment Scoring System (CLASS). School classrooms are highly cluttered audiovisual scenes containing many overlapping faces and voices. Due to the difficulty of labeling them (reliable coding requires weeks of training) and their sensitive nature (students and teachers may be in stressful or potentially embarrassing situations), CLASS-labeled classroom video datasets are scarce, and their labels are sparse (just a few labels per 15-minute video clip). Thus, the overarching challenge was how to harness modern deep perceptual architectures despite the paucity of labeled data. By training low-level CNN-based facial attribute detectors (facial expression & adult/child) and a direct audio-to-climate regressor, and by integrating low-level information over time using a Bi-LSTM, we constructed automated detectors of positive and negative classroom climate with accuracies (10-fold cross-validation Pearson correlation on 241 CLASS-labeled videos) of 0.40 and 0.51, respectively. These numbers are superior to what we obtained using shallower architectures. This work represents the first automated system designed to detect specific dimensions of the CLASS.
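The abstract describes a two-stage pipeline: frame-level CNN detectors and an audio regressor feed a Bi-LSTM that produces one climate score per clip. The sketch below illustrates that temporal-integration stage in PyTorch; the feature size, hidden size, mean-pooling step, and MSE loss are assumptions for illustration, not details taken from the paper.

```python
# Illustrative sketch (not the authors' code): per-frame features from
# pretrained CNN detectors (facial expression, adult/child) and audio features
# are integrated over time by a Bi-LSTM, which regresses one climate score
# per clip. Dimensions and pooling are assumed for this example.
import torch
import torch.nn as nn

class ClimateRegressor(nn.Module):
    def __init__(self, feat_dim=64, hidden=128):
        super().__init__()
        # Bidirectional LSTM aggregates low-level per-frame evidence over time.
        self.lstm = nn.LSTM(feat_dim, hidden, batch_first=True, bidirectional=True)
        self.head = nn.Linear(2 * hidden, 1)  # one CLASS-style score per clip

    def forward(self, x):              # x: (batch, frames, feat_dim)
        out, _ = self.lstm(x)          # (batch, frames, 2 * hidden)
        return self.head(out.mean(dim=1)).squeeze(-1)  # pool frames, regress

def pearson(pred, target):
    """Pearson correlation, the evaluation metric the paper reports."""
    p, t = pred - pred.mean(), target - target.mean()
    return (p * t).sum() / (p.norm() * t.norm())

# Toy usage: 8 clips of 300 frames with 64-dim per-frame features.
model = ClimateRegressor()
clips, labels = torch.randn(8, 300, 64), torch.randn(8)
loss = nn.functional.mse_loss(model(clips), labels)  # assumed training loss
loss.backward()
print(float(pearson(model(clips).detach(), labels)))
```

Mean-pooling over frames is just one plausible way to reconcile long frame sequences with the sparse clip-level labels the abstract mentions.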
Award ID(s):
1551594
PAR ID:
10128891
Author(s) / Creator(s):
Date Published:
Journal Name:
Automatic Face and Gesture Recognition
Page Range / eLocation ID:
1 to 8
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. Online classes are typically conducted using video conferencing software such as Zoom, Microsoft Teams, and Google Meet. Research has identified drawbacks of online learning, such as "Zoom fatigue", characterized by distractions and lack of engagement. This study presents the CUNY Affective and Responsive Virtual Environment (CARVE) Hub, a novel virtual reality hub that uses a facial emotion classification model to generate emojis for affective and informal responsive interaction in a 3D virtual classroom setting. A web-based machine learning model is employed for facial emotion classification, enabling students to communicate four basic emotions live through automated web camera capture in a virtual classroom without broadcasting their camera feeds. The experiment is conducted in undergraduate classes on both Zoom and CARVE, and the results of a survey indicate that students have a positive perception of interactions in the proposed virtual classroom compared with Zoom. Correlations between automated emojis and interactions are also observed. This study discusses potential explanations for the improved interactions, including a decrease in pressure on students when their faces are not shown. In addition, video panels in traditional remote classrooms may be useful for communication but not for interaction. Students favor features in virtual reality, such as spatial audio and the ability to move around, with collaboration being identified as the most helpful feature.
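A minimal sketch of the emotion-to-emoji mechanism the CARVE abstract describes, assuming a classifier that outputs probabilities over the four basic emotions; the label set, threshold, and function name are hypothetical, not taken from the paper.

```python
# Hypothetical sketch: a facial emotion classifier yields probabilities over
# four basic emotions from a webcam frame, and the top prediction is rendered
# as an emoji avatar in the virtual classroom. Labels/threshold are assumed.
EMOJI = {"happy": "😀", "sad": "😢", "surprised": "😮", "angry": "😠"}

def emotion_to_emoji(probs: dict[str, float], threshold: float = 0.5) -> str | None:
    """Return an emoji for the most likely emotion, or None if uncertain."""
    label, p = max(probs.items(), key=lambda kv: kv[1])
    return EMOJI[label] if p >= threshold else None

print(emotion_to_emoji({"happy": 0.7, "sad": 0.1, "surprised": 0.1, "angry": 0.1}))
```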
  2. Khosravi, H (Ed.)
    Despite a tremendous increase in the use of video for conducting research in classrooms as well as preparing and evaluating teachers, there remain notable challenges to using classroom videos at scale, including time and financial costs. Recent advances in artificial intelligence could make the process of analyzing, scoring, and cataloguing videos more efficient. These advances include natural language processing, automated speech recognition, and deep neural networks. To train artificial intelligence to accurately classify activities in classroom videos, humans must first annotate a set of videos in a consistent way. This paper describes our investigation of the degree of inter-annotator reliability regarding the identification and duration of activities among annotators with and without experience analyzing classroom videos. The validity of human annotations is crucial for temporal analysis in classroom video research. The study reported here represents an important step towards applying methods developed in other fields to validate temporal analytics within learning analytics research for classifying time- and event-based activities in classroom videos.
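One concrete way to quantify the agreement on activity identification and duration that this abstract studies is to discretize each annotator's segments into a per-second timeline and compute Cohen's kappa over the aligned labels. The per-second granularity, label names, and choice of kappa here are assumptions for illustration, not the paper's stated method.

```python
# Hedged illustration: expand each annotator's (start, end, label) activity
# segments to per-second labels, then measure chance-corrected agreement
# with Cohen's kappa. Label set and granularity are invented for the sketch.
from collections import Counter

def to_timeline(segments, total_secs, fill="none"):
    """segments: list of (start_sec, end_sec, label) -> per-second labels."""
    timeline = [fill] * total_secs
    for start, end, label in segments:
        for t in range(start, min(end, total_secs)):
            timeline[t] = label
    return timeline

def cohens_kappa(a, b):
    n = len(a)
    observed = sum(x == y for x, y in zip(a, b)) / n
    ca, cb = Counter(a), Counter(b)
    expected = sum(ca[k] * cb[k] for k in ca) / (n * n)  # chance agreement
    return (observed - expected) / (1 - expected)

rater1 = to_timeline([(0, 60, "lecture"), (60, 90, "group_work")], 120)
rater2 = to_timeline([(0, 55, "lecture"), (55, 95, "group_work")], 120)
print(round(cohens_kappa(rater1, rater2), 3))
```

Frame- or second-level comparison is attractive here because a single statistic then reflects disagreements in both what activity occurred and how long it lasted.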
  3. Korban, Matthew; Youngs, Peter; Acton, Scott T (Ed.)
    Analyzing instructional videos via computer vision and machine learning holds promise for several tasks, such as assessing teacher performance and classroom climate, evaluating student engagement, and identifying racial bias in instruction. The traditional way of evaluating instructional videos depends on manual observation with human raters, which is time-consuming and requires a trained labor force. Therefore, this paper tests several deep network architectures in the automation of instructional video analysis, where the networks are tailored to recognize classroom activity. Our experimental setup includes a set of 250 hours of primary and middle school videos that are annotated by expert human raters. We present several strategies to handle the varying lengths of instructional activities, a major challenge in the detection of instructional activity. Based on the proposed strategies, we enhance and compare different deep networks for detecting instructional activity.
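The abstract names varying activity lengths as a major challenge. One standard strategy for this in deep sequence models (a generic sketch, not necessarily the authors' method) is to pad variable-length feature sequences into a batch and mask the padding via packed sequences:

```python
# Generic sketch of handling variable-length activity clips: pad per-clip
# frame features to a common length, then let the RNN skip padded steps
# using packed sequences. Sizes and the 5-class head are assumed.
import torch
import torch.nn as nn
from torch.nn.utils.rnn import pad_sequence, pack_padded_sequence

clips = [torch.randn(n, 32) for n in (45, 120, 80)]   # frames x features
lengths = torch.tensor([c.shape[0] for c in clips])

padded = pad_sequence(clips, batch_first=True)        # (3, 120, 32)
packed = pack_padded_sequence(padded, lengths, batch_first=True,
                              enforce_sorted=False)

rnn = nn.GRU(32, 64, batch_first=True)
_, h_n = rnn(packed)                                  # h_n: (1, 3, 64)
activity_logits = nn.Linear(64, 5)(h_n.squeeze(0))    # 5 assumed classes
print(activity_logits.shape)                          # torch.Size([3, 5])
```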
  4. Transcripts of teaching episodes can be effective tools to understand discourse patterns in classroom instruction. According to most educational experts, sustained classroom discourse is a critical component of equitable, engaging, and rich learning environments for students. This paper describes the TalkMoves dataset, composed of 567 human-annotated K-12 mathematics lesson transcripts (including entire lessons or portions of lessons) derived from video recordings. The set of transcripts primarily includes in-person lessons with whole-class discussions and/or small group work, as well as some online lessons. All of the transcripts are human-transcribed, segmented by speaker (teacher or student), and annotated at the sentence level for ten discursive moves based on accountable talk theory. In addition, the transcripts include utterance-level information in the form of dialogue act labels based on the Switchboard Dialog Act Corpus. The dataset can be used by educators, policymakers, and researchers to understand the nature of teacher and student discourse in K-12 math classrooms. Portions of this dataset have been used to develop the TalkMoves application, which provides teachers with automated, immediate, and actionable feedback about their mathematics instruction.
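To make the dataset's two annotation layers concrete, here is a hypothetical record layout for a TalkMoves-style transcript. The field names, talk-move labels, and example sentences are invented for illustration; the Switchboard-style tags (e.g., "qw" for a wh-question) follow the tag set the abstract cites.

```python
# Hypothetical record layout: speaker-segmented sentences, each with a
# sentence-level talk-move label and an utterance-level Switchboard-style
# dialogue act tag. All field names and labels here are illustrative.
from collections import Counter
from dataclasses import dataclass

@dataclass
class Sentence:
    speaker: str        # "teacher" or "student"
    text: str
    talk_move: str      # one of ten accountable-talk moves (assumed names)
    dialogue_act: str   # Switchboard-style tag, e.g. "qw" (wh-question)

lesson = [
    Sentence("teacher", "Why does the denominator stay the same?", "pressing_for_reasoning", "qw"),
    Sentence("student", "Because we are adding like fractions.", "making_a_claim", "sd"),
    Sentence("teacher", "Can someone restate what she said?", "revoicing", "qy"),
]

# A simple descriptive statistic of the kind such a dataset supports:
moves_by_speaker = Counter((s.speaker, s.talk_move) for s in lesson)
print(moves_by_speaker)
```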
  5. Growing interest in “flipped” classrooms has made video lessons an increasingly prominent component of post-secondary mathematics curricula. This format, where students watch videos outside of class, can be leveraged to create a more active learning environment during class. Thus, for very challenging but essential classes in STEM, like calculus, the use of video lessons can have a positive impact on student success. However, relatively little is known about how students watch and learn from calculus instructional videos. This research generates knowledge about how students engage with, make sense of, and learn from calculus instructional videos. 