
Title: Structure-Based Discriminative Matrix Factorization for Detecting Inefficient Learning Behaviors
Modern online learning platforms offer a wealth of learning content while leaving the choice of what to study and practice to the learner. Recent work has demonstrated that, in this context, many students use inefficient learning strategies that lead to lower performance. The ability to detect inefficient learning behavior by monitoring learning data opens the way to timely interventions that could lead to better learning and performance. In this work, we propose SB-DNMF, a structure-based discriminative non-negative matrix factorization model aimed at distinguishing between the common and distinct learning behavior patterns of low- and high-learning-gain students. Our model can discover latent groups of students' behavioral micro-patterns while accounting for structural similarities between these micro-patterns based on a weighted edit-distance measure. Our experiments demonstrate that SB-DNMF can find meaningful latent factors associated with students' learning gain and can cluster the behavioral patterns into common (trait) and performance-related groups.
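The weighted edit distance underlying the structural-similarity measure can be illustrated with a short sketch. This is not the paper's implementation: the action vocabulary and substitution costs below are hypothetical, chosen only to show how a per-pair substitution cost makes some micro-patterns "closer" than others.

```python
# Illustrative sketch (assumed action names and costs, not the paper's):
# weighted edit distance between two behavioral micro-patterns, where the
# substitution cost depends on how dissimilar the two actions are.

def weighted_edit_distance(a, b, sub_cost, indel_cost=1.0):
    """Dynamic-programming edit distance with per-pair substitution costs."""
    m, n = len(a), len(b)
    d = [[0.0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        d[i][0] = i * indel_cost
    for j in range(1, n + 1):
        d[0][j] = j * indel_cost
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            s = 0.0 if a[i - 1] == b[j - 1] else sub_cost(a[i - 1], b[j - 1])
            d[i][j] = min(d[i - 1][j] + indel_cost,   # deletion
                          d[i][j - 1] + indel_cost,   # insertion
                          d[i - 1][j - 1] + s)        # match/substitution
    return d[m][n]

# Hypothetical similarity: problem and example actions are closer to each
# other than either is to a video action.
costs = {frozenset({"problem", "example"}): 0.5,
         frozenset({"problem", "video"}): 1.0,
         frozenset({"example", "video"}): 1.0}

def sub(x, y):
    return costs[frozenset({x, y})]

p1 = ["problem", "problem", "example"]
p2 = ["problem", "example", "example"]
print(weighted_edit_distance(p1, p2, sub))  # → 0.5
```

Under these costs, the two micro-patterns differ by a single cheap substitution (problem → example), so they are treated as structurally similar rather than maximally distant.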
Journal Name:
2020 IEEE/WIC/ACM International Joint Conference on Web Intelligence and Intelligent Agent Technology (WI-IAT)
Page Range or eLocation-ID:
283 to 290
Sponsoring Org:
National Science Foundation
More Like this
  1. Recent studies have shown that students follow stable behavioral patterns while learning in online educational systems. These behavioral patterns can further be used to group the students into different clusters. However, as these clusters include both high- and low-performance students, the relation between the behavioral patterns and student performance is yet to be clarified. In this work, we study the relationship between students’ learning behaviors and their performance in a self-organized online learning system that allows them to freely practice with various problems and worked examples. We represent each student’s behavior as a vector of high-support sequential micro-patterns. Then, we discover both the prevalent behavioral patterns in each group and the shared patterns across groups using discriminative non-negative matrix factorization. Our experiments show that we can successfully detect such common and specific patterns in students’ behavior, which can further be interpreted as student learning-behavior trait patterns and performance patterns.
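The factorization step described above can be sketched with plain (non-discriminative) NMF using Lee–Seung multiplicative updates; the discriminative variant in the work above adds group-separation terms not shown here. The student-by-micro-pattern count matrix below is synthetic.

```python
# Hedged sketch: basic NMF (V ≈ W H) by multiplicative updates on a
# synthetic student-by-micro-pattern count matrix. The discriminative
# model described above adds terms that separate low/high-gain groups.
import numpy as np

def nmf(V, k, iters=500, eps=1e-9, seed=0):
    rng = np.random.default_rng(seed)
    n, m = V.shape
    W = rng.random((n, k)) + eps
    H = rng.random((k, m)) + eps
    for _ in range(iters):
        H *= (W.T @ V) / (W.T @ W @ H + eps)  # Lee-Seung updates
        W *= (V @ H.T) / (W @ H @ H.T + eps)
    return W, H

# Synthetic data: 4 students x 5 micro-pattern counts, two latent factors.
V = np.array([[5, 4, 0, 0, 1],
              [4, 5, 1, 0, 0],
              [0, 1, 5, 4, 0],
              [0, 0, 4, 5, 1]], dtype=float)
W, H = nmf(V, k=2)
print(np.round(W @ H, 1))    # approximately reconstructs V
labels = W.argmax(axis=1)    # cluster students by their dominant factor
print(labels)                # students 0,1 vs. 2,3 fall in separate groups
```

Each row of H is a latent behavioral factor (a mixture of micro-patterns), and each student's row in W says how strongly they express each factor, which is what makes the factors interpretable as behavior groups.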
  2. Obeid, Iyad Selesnick (Ed.)
    The Temple University Hospital EEG Corpus (TUEG) [1] is the largest publicly available EEG corpus of its type and currently has over 5,000 subscribers (we currently average 35 new subscribers a week). Several valuable subsets of this corpus have been developed including the Temple University Hospital EEG Seizure Corpus (TUSZ) [2] and the Temple University Hospital EEG Artifact Corpus (TUAR) [3]. TUSZ contains manually annotated seizure events and has been widely used to develop seizure detection and prediction technology [4]. TUAR contains manually annotated artifacts and has been used to improve machine learning performance on seizure detection tasks [5]. In this poster, we will discuss recent improvements made to both corpora that are creating opportunities to improve machine learning performance. Two major concerns that were raised when v1.5.2 of TUSZ was released for the Neureka 2020 Epilepsy Challenge were: (1) the subjects contained in the training, development (validation) and blind evaluation sets were not mutually exclusive, and (2) high frequency seizures were not accurately annotated in all files. Regarding (1), there were 50 subjects in dev, 50 subjects in eval, and 592 subjects in train. There was one subject common to dev and eval, five subjects common to dev and train, and 13 subjects common between eval and train. Though this does not substantially influence performance for the current generation of technology, it could be a problem down the line as technology improves. Therefore, we have rebuilt the partitions of the data so that this overlap was removed. This required augmenting the evaluation and development data sets with new subjects that had not been previously annotated so that the size of these subsets remained approximately the same. Since these annotations were done by a new group of annotators, special care was taken to make sure the new annotators followed the same practices as the previous generations of annotators.
Part of our quality control process was to have the new annotators review all previous annotations. This rigorous training coupled with a strict quality control process where annotators review a significant amount of each other’s work ensured that there is high interrater agreement between the two groups (kappa statistic greater than 0.8) [6]. In the process of reviewing this data, we also decided to split long files into a series of smaller segments to facilitate processing of the data. Some subscribers found it difficult to process long files using Python code, which tends to be very memory intensive. We also found it inefficient to manipulate these long files in our annotation tool. In this release, the maximum duration of any single file is limited to 60 mins. This increased the number of edf files in the dev set from 1012 to 1832. Regarding (2), as part of discussions of several issues raised by a few subscribers, we discovered some files only had low frequency epileptiform events annotated (defined as events that ranged in frequency from 2.5 Hz to 3 Hz), while others had events annotated that contained significant frequency content above 3 Hz. Though there were not many files that had this type of activity, it was enough of a concern to necessitate reviewing the entire corpus. An example of an epileptiform seizure event with frequency content higher than 3 Hz is shown in Figure 1. Annotating these additional events slightly increased the number of seizure events. In v1.5.2, there were 673 seizures, while in v1.5.3 there are 1239 events. One of the fertile areas for technology improvements is artifact reduction. Artifacts and slowing constitute the two major error modalities in seizure detection [3]. This was a major reason we developed TUAR. It can be used to evaluate artifact detection and suppression technology as well as multimodal background models that explicitly model artifacts. 
An issue with TUAR was the practicality of the annotation tags used when there are multiple simultaneous events. An example of such an event is shown in Figure 2. In this section of the file, there is an overlap of eye movement, electrode artifact, and muscle artifact events. We previously annotated such events using a convention that included annotating background along with any artifact that is present. The artifacts present would either be annotated with a single tag (e.g., MUSC) or a coupled artifact tag (e.g., MUSC+ELEC). When multiple channels have background, the tags become crowded and difficult to identify. This is one reason we now support a hierarchical annotation format using XML – annotations can be arbitrarily complex and support overlaps in time. Our annotators also reviewed specific eye movement artifacts (e.g., eye flutter, eyeblinks). Eye movements are often mistaken as seizures due to their similar morphology [7][8]. We have improved our understanding of ocular events and it has allowed us to annotate artifacts in the corpus more carefully. In this poster, we will present statistics on the newest releases of these corpora and discuss the impact these improvements have had on machine learning research. We will compare TUSZ v1.5.3 and TUAR v2.0.0 with previous versions of these corpora. We will release v1.5.3 of TUSZ and v2.0.0 of TUAR in Fall 2021 prior to the symposium. ACKNOWLEDGMENTS Research reported in this publication was most recently supported by the National Science Foundation’s Industrial Innovation and Partnerships (IIP) Research Experience for Undergraduates award number 1827565. Any opinions, findings, and conclusions or recommendations expressed in this material are those of the author(s) and do not necessarily reflect the official views of any of these organizations. REFERENCES [1] I. Obeid and J. Picone, “The Temple University Hospital EEG Data Corpus,” in Augmentation of Brain Function: Facts, Fiction and Controversy. 
Volume I: Brain-Machine Interfaces, 1st ed., vol. 10, M. A. Lebedev, Ed. Lausanne, Switzerland: Frontiers Media S.A., 2016, pp. 394–398. [2] V. Shah et al., “The Temple University Hospital Seizure Detection Corpus,” Frontiers in Neuroinformatics, vol. 12, pp. 1–6, 2018. [3] A. Hamid et al., “The Temple University Artifact Corpus: An Annotated Corpus of EEG Artifacts,” in Proceedings of the IEEE Signal Processing in Medicine and Biology Symposium (SPMB), 2020, pp. 1–3. [4] Y. Roy, R. Iskander, and J. Picone, “The Neureka 2020 Epilepsy Challenge,” NeuroTechX, 2020. [Online]. Available: [Accessed: 01-Dec-2021]. [5] S. Rahman, A. Hamid, D. Ochal, I. Obeid, and J. Picone, “Improving the Quality of the TUSZ Corpus,” in Proceedings of the IEEE Signal Processing in Medicine and Biology Symposium (SPMB), 2020, pp. 1–5. [6] V. Shah, E. von Weltin, T. Ahsan, I. Obeid, and J. Picone, “On the Use of Non-Experts for Generation of High-Quality Annotations of Seizure Events,” Available: https://www.isip.picone [Accessed: 01-Dec-2021]. [7] D. Ochal, S. Rahman, S. Ferrell, T. Elseify, I. Obeid, and J. Picone, “The Temple University Hospital EEG Corpus: Annotation Guidelines,” Philadelphia, Pennsylvania, USA, 2020. [8] D. Strayhorn, “The Atlas of Adult Electroencephalography,” EEG Atlas Online, 2014. [Online].
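The 60-minute file-splitting policy described in the corpus rebuild above amounts to simple boundary arithmetic, which can be sketched as follows. The durations are synthetic; this is not the corpus tooling itself.

```python
# Illustrative only: splitting a long recording into segments of at most
# 60 minutes (3600 s), as described for the corpus rebuild above.
def segment_bounds(duration_s, max_s=3600):
    """Return (start, end) second offsets covering [0, duration_s] in
    chunks no longer than max_s."""
    bounds = []
    start = 0
    while start < duration_s:
        end = min(start + max_s, duration_s)
        bounds.append((start, end))
        start = end
    return bounds

# A hypothetical 2.5-hour recording splits into three files:
print(segment_bounds(9000))  # → [(0, 3600), (3600, 7200), (7200, 9000)]
```

Splitting at fixed boundaries like this trades slightly more files (e.g., the dev set growing from 1012 to 1832 edf files) for bounded per-file memory when processing in Python.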
  3. Educational data mining research has demonstrated that the large volume of learning data collected by modern e-learning systems can be used to recognize student behavior patterns and group students into cohorts with similar behavior. However, few attempts have been made to connect and compare behavioral patterns with known dimensions of individual differences. To what extent is learner behavior defined by known individual differences? Which of them could be a better predictor of learner engagement and performance? Could we use behavior patterns to build a data-driven model of individual differences that is more useful for predicting critical outcomes of the learning process than traditional models? Our paper attempts to answer these questions using a large volume of learner data collected in an online practice system. We apply a sequential pattern mining approach to build individual models of learner practice behavior and reveal latent student subgroups that exhibit considerably different practice behavior. Using these models, we explored the connections between learner behavior and both the incoming and outgoing parameters of the learning process. Among incoming parameters, we examined traditionally collected individual differences such as self-esteem, gender, and knowledge monitoring skills. We also attempted to bridge the gap between cluster-based behavior pattern models and traditional scale-based models of individual differences by quantifying learner behavior on a latent data-driven scale. Our research shows that this data-driven model of individual differences performs significantly better than traditional models of individual differences in predicting important parameters of the learning process, such as performance and engagement.
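The sequential pattern mining step mentioned above can be sketched in its simplest form: counting contiguous action n-grams across per-student logs and keeping those above a support threshold. The action names and logs below are synthetic, and real pattern miners (e.g., for gapped subsequences) are more elaborate.

```python
# Illustrative sketch (synthetic data): mining frequent contiguous
# micro-patterns (n-grams of actions) from per-student practice logs.
from collections import Counter

def frequent_patterns(sequences, n=2, min_support=2):
    counts = Counter()
    for seq in sequences:
        seen = set()  # count each pattern once per student (its support)
        for i in range(len(seq) - n + 1):
            seen.add(tuple(seq[i:i + n]))
        counts.update(seen)
    return {p: c for p, c in counts.items() if c >= min_support}

logs = [["attempt", "attempt", "example"],
        ["attempt", "example", "attempt"],
        ["example", "attempt", "attempt"]]
print(frequent_patterns(logs))  # three bigrams, each with support 2
```

Each student can then be represented by a vector over the surviving patterns, which is the input to the clustering and scaling analyses described above.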
  4. Students acquire knowledge as they interact with a variety of learning materials, such as video lectures, problems, and discussions. Modeling student knowledge at each point during their learning period and understanding the contribution of each learning material to student knowledge are essential for detecting students’ knowledge gaps and recommending learning materials to them. Current student knowledge modeling techniques mostly rely on one type of learning material, mainly problems, to model student knowledge growth. These approaches ignore the fact that students also learn from other types of material. In this paper, we propose a student knowledge model that can capture knowledge growth as a result of learning from a diverse set of learning resource types while unveiling the association between the learning materials of different types. Our multi-view knowledge model (MVKM) incorporates a flexible knowledge increase objective on top of a multi-view tensor factorization to capture occasional forgetting while representing student knowledge and learning material concepts in a lower-dimensional latent space. We evaluate our model in different experiments to show that it can accurately predict students’ future performance, differentiate between knowledge gain in different student groups and concepts, and unveil hidden similarities across learning materials of different types.
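The model form behind a multi-view tensor factorization like the one above can be sketched as a CP-style low-rank product of latent factors. The factor values here are random placeholders; MVKM's multi-view coupling, forgetting, and knowledge-increase objective are not shown.

```python
# Hedged sketch of the model *form* only: a CP-style low-rank tensor
# model where a student x material x attempt performance tensor is the
# product of latent factors. All values below are synthetic.
import numpy as np

rng = np.random.default_rng(1)
S, Q, T, K = 3, 4, 5, 2       # students, materials, attempts, latent dims
U = rng.random((S, K))         # student knowledge factors
M = rng.random((Q, K))         # learning-material concept factors
A = rng.random((T, K))         # temporal (attempt) factors

# Predicted performance tensor: Y[s, q, t] = sum_k U[s,k] * M[q,k] * A[t,k]
Y = np.einsum('sk,qk,tk->sqt', U, M, A)
print(Y.shape)  # → (3, 4, 5)

# Predicting one student's performance on one material at one attempt:
print(Y[0, 1, 2])
```

Fitting such a model means learning U, M, and A from observed entries of Y, then reading unobserved entries as predictions of future performance.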
  5. This Work-In-Progress falls within the research category of study and focuses on the experiences and perceptions of first- and second-year engineering students when using an online engineering game that was designed to enhance understanding of statics concepts. Technology and online games are increasingly being used in engineering education to help students gain competencies in technical domains in the engineering field. Less is known about the way that these online games are designed and incorporated into the classroom environment and how these factors can ignite inequitable perspectives and experiences among engineering students. Also, little if any work has been done that combines the TAM model and the intersectionality of race and gender in engineering education, though several studies have been modified to account for gender or race. This study expands upon the Technology Acceptance Model (TAM) by exploring perspectives of intersectional groups (defined as women of color who are engineering students). A Mixed Method Sequential Exploratory Research Design approach was used that extends the TAM model. Students were asked to play the engineering educational game, complete an open-ended questionnaire, and then participate in a focus group. Early findings suggest that while many students were open to learning to use the game and recommended inclusion of online engineering educational games as learning tools in classrooms, only a few indicated that they would use this tool to prepare for exams or technical job interviews. Some of the main themes identified in this study included unintended perpetuation of inequality through bias in favor of students who enjoyed competition-based learning and assessment of knowledge, and bias toward students having prior experience in playing online games.
Competition-based assessment related to presumed learning of course content enhanced student anxiety and feelings of intimidation and led some students to seek to “game the game” rather than learn the material, in efforts to achieve grade goals. Other students associated use of the game and the classroom's weighted grading with intense stress that led them to prematurely stop using the engineering tool. Initial findings indicate that both game design and how technology is incorporated into the grading and testing of learning outcomes influence student perceptions of the technology's usefulness and ultimately the acceptance of the online game as a "learning tool." Results also point to the need to explore how the crediting and assessment of students' performance and learning gains in these types of games could yield inequitable experiences in these types of courses.