The paper develops datasets and methods to assess student participation in real-life collaborative learning environments. In these environments, students are organized into small groups and are free to interact within their group. As a result, students can move around freely (introducing strong pose variation), leave and re-enter the camera scene, or face away from the camera. We formulate the problem of assessing student participation as two subproblems: (i) student group detection against strong background interference from other groups, and (ii) dynamic participant tracking within the group. A massive independent testing dataset of 12,518,250 student label instances, with a total duration of 21 hours and 22 minutes of real-life video, is used to evaluate the performance of our proposed method for student group detection. The proposed method, which uses multiple image representations, is shown to perform as well as or better than YOLO on all video instances. Over the entire dataset, the proposed method achieved an F1 score of 0.85 compared to 0.80 for YOLO. Following student group detection, the paper presents the development of a dynamic participant tracking system for assessing student group participation through long video sessions. The proposed dynamic participant tracking system performs exceptionally well, missing a student in just one of 35 testing videos, whereas a state-of-the-art method fails to track students in 14 of the 35. The proposed method achieves 82.3% accuracy on an independent set of long, real-life collaborative videos.
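The group-detection comparison above is reported as F1 scores. As a quick reference, F1 is the harmonic mean of precision and recall over detections; a minimal sketch follows, where the detection counts are hypothetical and chosen only to illustrate an F1 of 0.85 (they are not from the paper):

```python
def f1_score(tp, fp, fn):
    """F1 = harmonic mean of precision and recall over detection counts."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)

# Hypothetical counts for illustration only (true/false positives, false negatives).
print(round(f1_score(tp=850, fp=150, fn=150), 2))  # 0.85
```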
Long-term Human Video Activity Quantification of Student Participation
Research on video activity recognition has primarily focused on differentiating among many diverse activities defined over short video clips. In this paper, we introduce the problem of reliable video activity recognition over long videos (45 minutes to 2 hours) to quantify student participation in collaborative learning environments. Video activity recognition in collaborative learning environments poses several unique challenges. We introduce participation maps that identify how and when each student performs each activity in order to quantify student participation. We present a family of low-parameter 3D ConvNet architectures to detect these activities. We then apply spatial clustering to identify each participant and generate student participation maps from the resulting detections. We demonstrate the effectiveness of our approach by training on about 1,000 3-second samples of typing and writing and testing on ten video sessions of about 10 hours. In terms of activity detection, our methods achieve 80% accuracy for writing and typing, matching the recognition performance of TSN, SlowFast, SlowOnly, and I3D trained on the same dataset while using 1200x to 1500x fewer parameters. Beyond traditional video activity recognition methods, our video activity participation maps identify how each student participates within each group.
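The participation-map idea above can be sketched minimally: per-window activity detections are assigned to participants by spatial proximity and accumulated into a per-student timeline. The seat coordinates, student names, and detection tuples below are hypothetical placeholders, not the paper's data or its clustering method:

```python
from collections import defaultdict

# Hypothetical seat centroids for a 3-student group (pixel coordinates); in
# practice, participant locations would come from spatial clustering of detections.
SEATS = {"student_A": (100, 200), "student_B": (400, 210), "student_C": (250, 450)}

def nearest_seat(x, y):
    """Assign a detection at (x, y) to the closest seat centroid."""
    return min(SEATS, key=lambda s: (SEATS[s][0] - x) ** 2 + (SEATS[s][1] - y) ** 2)

def participation_map(detections):
    """detections: list of (t_seconds, activity, x, y) tuples from a detector.
    Returns {student: [(t, activity), ...]} - how and when each student acts."""
    pmap = defaultdict(list)
    for t, activity, x, y in detections:
        pmap[nearest_seat(x, y)].append((t, activity))
    return dict(pmap)

demo = [(0, "typing", 110, 195), (3, "writing", 395, 220), (6, "typing", 105, 205)]
print(participation_map(demo))
```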
- PAR ID:
- 10310078
- Date Published:
- Journal Name:
- 2021 Asilomar Conference on Signals, Systems, and Computers.
- Format(s):
- Medium: X
- Sponsoring Org:
- National Science Foundation
More Like this
-
We introduce a new method to detect student group interactions in collaborative learning videos. We consider the following video activities: (i) human-to-human interaction, (ii) human-to-others interaction, and (iii) lack of any interaction. The system uses multidimensional AM-FM methods to detect student faces and hair, and then uses the results to detect possible interactions. We use dynamic graphs to represent group interactions within each video. We tested our methods on 15 videos and achieved 84% accuracy for students facing the camera and 76% for students facing both towards and away from the camera.
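The dynamic-graph representation mentioned above can be sketched as one labeled edge set per time window: nodes are participants (or objects), and an edge records a detected interaction. All names and interaction labels below are illustrative assumptions, not the paper's data:

```python
# Minimal sketch: a dynamic graph as {time_window: set of labeled edges}.
def build_dynamic_graph(events):
    """events: list of (window, node_i, node_j, kind) where kind is an
    interaction type such as 'human-human' or 'human-others'."""
    graph = {}
    for window, i, j, kind in events:
        # frozenset makes the edge undirected: (i, j) == (j, i).
        graph.setdefault(window, set()).add((frozenset((i, j)), kind))
    return graph

# Hypothetical detections: two interactions in window 0, one in window 1.
events = [(0, "s1", "s2", "human-human"),
          (0, "s3", "laptop", "human-others"),
          (1, "s1", "s2", "human-human")]
g = build_dynamic_graph(events)
print(len(g[0]))  # 2
```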
-
Turkan, Yelda; Louis, Joseph; Leite, Fernanda; Ergan, Semiha (Ed.) Human activity recognition (HAR) using machine learning has shown tremendous promise in detecting construction workers' activities. HAR has many applications in human-robot interaction research, enabling robots to understand the activities of their human counterparts. However, many existing HAR approaches lack robustness, generalizability, and adaptability. This paper proposes a transfer learning methodology for activity recognition of construction workers that requires orders of magnitude less data and compute time for comparable or better classification accuracy. The developed algorithm transfers features from a model pre-trained by the original authors and fine-tunes them for the downstream task of activity recognition in construction. The model was pre-trained on Kinetics-400, a large-scale video-based human activity recognition dataset with 400 distinct classes. The model was then fine-tuned and tested using videos of manual material handling (MMH) activities found on YouTube. Results indicate that the fine-tuned model can recognize distinct MMH tasks in a robust and adaptive manner, which is crucial for the widespread deployment of collaborative robots in construction.
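The fine-tuning step described above can be sketched in miniature: freeze a pre-trained feature extractor and train only a new classification head on its outputs. The toy `frozen_backbone` below is a stand-in for Kinetics-400 features, and the two activity classes and clip values are hypothetical; this is a sketch of the transfer-learning pattern, not the paper's model:

```python
import math

# Stand-in for a frozen pre-trained backbone: maps a clip to a feature vector.
# (Real systems would use e.g. a 3D ConvNet; this toy uses clip statistics.)
def frozen_backbone(clip):
    return [sum(clip) / len(clip), max(clip) - min(clip)]

def train_head(clips, labels, lr=0.5, epochs=200):
    """Train only a new logistic-regression head; the backbone is never updated."""
    w, b = [0.0, 0.0], 0.0
    feats = [frozen_backbone(c) for c in clips]  # computed once, never refined
    for _ in range(epochs):
        for x, y in zip(feats, labels):
            p = 1 / (1 + math.exp(-(w[0] * x[0] + w[1] * x[1] + b)))
            g = p - y  # gradient of the logistic loss w.r.t. the score
            w = [w[0] - lr * g * x[0], w[1] - lr * g * x[1]]
            b -= lr * g
    return w, b

# Two hypothetical activity classes with distinguishable clip statistics.
clips = [[0.1, 0.2, 0.1], [0.2, 0.1, 0.2], [0.8, 0.9, 1.0], [0.9, 1.0, 0.8]]
labels = [0, 0, 1, 1]
w, b = train_head(clips, labels)

def predict(clip):
    x = frozen_backbone(clip)
    return (w[0] * x[0] + w[1] * x[1] + b) > 0

print(predict([0.15, 0.1, 0.2]), predict([0.9, 0.85, 1.0]))
```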
-
Accurate modeling of student knowledge is essential for large-scale online learning systems that are increasingly used for student training. Knowledge tracing aims to model a student's knowledge state given the student's sequence of learning activities. Modern knowledge tracing (KT) is usually formulated as a supervised sequence learning problem: predict students' future practice performance from their past observed practice scores by summarizing the student knowledge state as a set of evolving hidden variables. Because of this formulation, many current KT solutions are ill-suited to modeling student learning from non-assessed learning activities with no explicit feedback or score observation (e.g., watching video lectures that are not graded). Additionally, these models cannot explicitly represent the dynamics of knowledge transfer among different learning activities, particularly between assessed (e.g., quizzes) and non-assessed (e.g., video lectures) learning activities. In this paper, we propose Transition-Aware Multi-activity Knowledge Tracing (TAMKOT), which models knowledge transfer between learning materials, in addition to student knowledge, as students transition between and within assessed and non-assessed learning materials. TAMKOT is formulated as a deep recurrent multi-activity learning model that explicitly learns knowledge transfer by activating and learning a set of knowledge transfer matrices, one for each transition type between student activities. Accordingly, our model represents each material type in a different yet transferable latent space while maintaining student knowledge in a shared space. We evaluate our model on three real-world, publicly available datasets and demonstrate TAMKOT's capability in predicting student performance and modeling knowledge transfer.
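The transition-aware update can be sketched as a recurrent step that selects a transfer matrix by the transition type between consecutive activities (assessed vs. non-assessed). The dimensions and matrix values below are tiny illustrative placeholders, not TAMKOT's learned parameters or exact equations:

```python
import math

DIM = 2  # toy hidden-state size; real models use much larger states

# One transfer matrix per transition type, as in the paper's formulation.
TRANSFER = {
    ("assessed", "assessed"):         [[1.0, 0.0], [0.0, 1.0]],
    ("assessed", "non-assessed"):     [[0.9, 0.1], [0.1, 0.9]],
    ("non-assessed", "assessed"):     [[0.8, 0.2], [0.2, 0.8]],
    ("non-assessed", "non-assessed"): [[1.0, 0.0], [0.0, 1.0]],
}

def matvec(M, v):
    return [sum(M[i][j] * v[j] for j in range(DIM)) for i in range(DIM)]

def trace(activity_types, x_embed):
    """activity_types: 'assessed' or 'non-assessed' per step, in order.
    x_embed: per-step input embeddings. Returns the final knowledge state."""
    h, prev = [0.0] * DIM, activity_types[0]
    for kind, x in zip(activity_types, x_embed):
        W = TRANSFER[(prev, kind)]          # activated by the transition type
        h = [math.tanh(a + b) for a, b in zip(matvec(W, h), x)]
        prev = kind
    return h

h = trace(["assessed", "non-assessed", "assessed"],
          [[0.5, 0.1], [0.2, 0.4], [0.3, 0.3]])
print([round(v, 3) for v in h])
```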
-
Face recognition in collaborative learning videos presents many challenges. In these videos, students sit around a table at different positions relative to the recording camera; they come and go, move around, and become partially or fully occluded. Furthermore, the videos tend to be very long, requiring fast and accurate methods. We develop a dynamic system for recognizing participants in collaborative learning sessions. We address occlusion and recognition failures by using past information from the face detection history. We address the need to detect faces across different poses, and the need for speed, by associating each participant with a collection of prototype faces computed through sampling or K-means clustering. Our results show that the proposed system is both fast and accurate. We also compare our system against a baseline system that uses InsightFace [2] and the original training video segments. We achieved an average accuracy of 86.2% compared to 70.8% for the baseline system, and on average our recognition rate was 28.1 times faster.
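The prototype-face construction can be sketched with plain K-means over a participant's face embeddings, keeping the cluster centroids as pose-covering prototypes. The 2-D toy embeddings below stand in for real high-dimensional face features; the clustering here is generic Lloyd's K-means, not the paper's exact pipeline:

```python
import random

random.seed(1)

def kmeans(points, k, iters=20):
    """Plain Lloyd's algorithm; returns k centroids of the given points."""
    centroids = random.sample(points, k)
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for p in points:
            i = min(range(k),
                    key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centroids[c])))
            clusters[i].append(p)
        centroids = [
            [sum(dim) / len(c) for dim in zip(*c)] if c else centroids[i]
            for i, c in enumerate(clusters)
        ]
    return centroids

# Toy face embeddings drawn near two hypothetical pose clusters
# (e.g., frontal vs. profile); real embeddings would be e.g. 512-D.
faces = [(random.gauss(0, 0.1), random.gauss(0, 0.1)) for _ in range(20)] + \
        [(random.gauss(3, 0.1), random.gauss(3, 0.1)) for _ in range(20)]
prototypes = kmeans(faces, k=2)
print(prototypes)
```

At recognition time, a new face would be matched against each participant's prototype set, so varied poses are covered without searching every stored face.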