-
Benjamin, Paaßen; Carrie, Demmans Epp (Ed.) The effectiveness of feedback in enhancing learning outcomes is well documented within Educational Data Mining (EDM). Prior research has explored various methodologies for making feedback to students more effective. Recent developments in Large Language Models (LLMs) have extended their utility in enhancing automated feedback systems. This study explores the potential of LLMs to facilitate automated feedback in math education in the form of numeric assessment scores. We examine the effectiveness of LLMs in evaluating and scoring student responses by comparing three models: Llama, SBERT-Canberra, and GPT4. The evaluation requires each model to provide a quantitative score for student responses to open-ended math problems. We employ Mistral, a version of Llama catered to math, and fine-tune this model for evaluating student responses by leveraging a dataset of student responses and teacher-provided scores for middle-school math problems. A similar approach was taken for training the SBERT-Canberra model, while the GPT4 model used a zero-shot learning approach. We evaluate and compare the models' performance in scoring accuracy. This study aims to further the ongoing development of automated assessment and feedback systems and to outline potential future directions for leveraging generative LLMs in building automated feedback systems.
-
Gaming the system is a persistent problem in Computer-Based Learning Platforms. While substantial progress has been made in identifying and understanding such behaviors, effective interventions remain scarce. This study uses a method of causal moderation known as Fully Latent Principal Stratification to explore the impact of two types of interventions – gamification and manipulation of assistance access – on the learning outcomes of students who tend to game the system. The results indicate that gamification does not consistently mitigate these negative behaviors. One gamified condition had a consistently positive effect on learning regardless of students’ propensity to game the system, whereas the other had a negative effect on gamers. However, delaying access to hints and feedback may have a positive effect on the learning outcomes of those gaming the system. This paper also illustrates the potential for integrating detection and causal methodologies within educational data mining to evaluate effective responses to detected behaviors.
-
Many online learning platforms and MOOCs incorporate some amount of video-based content, but there are few randomized controlled experiments that evaluate the effectiveness of the different methods of video integration. Given the large amount of publicly available educational videos, an investigation into this content’s impact on students could help lead to more effective and accessible video integration within learning platforms. In this work, a new feature was added to an existing online learning platform that allowed students to request skill-related videos while completing their online middle-school mathematics assignments. A total of 18,535 students participated in two large-scale randomized controlled experiments related to providing students with publicly available educational videos. The first experiment investigated the effect of providing students with the opportunity to request these videos, and the second experiment investigated the effect of using a multi-armed bandit algorithm to recommend relevant videos. Additionally, this work investigated which features of the videos were significantly predictive of students’ performance and which features could be used to personalize students’ learning. Ultimately, students were mostly uninterested in the skill-related videos, preferring instead to use the platform’s existing problem-specific support, and there were no statistically significant findings in either experiment. Additionally, while no video features were significantly predictive of students’ performance, two video features had significant qualitative interactions with students’ prior knowledge, which showed that different content creators were more effective for different groups of students. These findings can be used to inform the design of future video-based features within online learning platforms and the creation of different educational videos specifically targeting higher or lower knowledge students.
-
Despite increased efforts to assess the adoption rates of open science and robustness of reproducibility in sub-disciplines of education technology, there is a lack of understanding of why some research is not reproducible. Prior work has taken the first step toward assessing reproducibility of research, but has assumed certain constraints which hinder its discovery. Thus, the purpose of this study was to replicate previous work on papers within the proceedings of the International Conference on Educational Data Mining and develop metrics to accurately report on which papers are reproducible and why. Specifically, we examined 208 papers, attempted to reproduce them, documented reasons for reproducibility failures, and asked authors to provide additional information needed to reproduce their study. Our results showed that out of 12 papers that were potentially reproducible, only one successfully reproduced all analyses, and another two reproduced most of the analyses. The most common cause of reproducibility failure was not mentioning the libraries needed, followed by non-seeded randomness. All openly accessible work can be found in an Open Science Framework project.
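The two failure causes named in this abstract (unlisted libraries and non-seeded randomness) can be illustrated with a minimal Python sketch. It is not taken from the paper; the function names, the default seed, and the package list passed in are hypothetical choices made for illustration only.

```python
import random
import sys
from importlib import metadata


def seeded_sample(values, k, seed=0):
    """Draw k items with a private generator so reruns give the same draw."""
    rng = random.Random(seed)  # the seed value is arbitrary (assumption)
    return rng.sample(values, k)


def environment_report(packages):
    """Map each package name to its installed version, or None if absent."""
    report = {"python": sys.version.split()[0]}
    for name in packages:
        try:
            report[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            report[name] = None  # missing packages are recorded, not hidden
    return report
```

Saving the output of `environment_report` next to the analysis results, and passing an explicit seed to every random step, addresses both causes at low cost.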
-
There have been numerous efforts documenting the effects of open science in existing papers; however, these efforts typically only consider the author’s analyses and supplemental materials from the papers. While understanding the current rate of open science adoption is important, it is also vital that we explore the factors that may encourage such adoption. One such factor may be publishing organizations setting open science requirements for submitted articles, encouraging researchers to adopt more rigorous reporting and research practices. For example, within the education technology discipline, the ACM Conference on Learning @ Scale (L@S) has been promoting open science practices since 2018 through a Call For Papers statement. The purpose of this study was to replicate previous papers within the proceedings of L@S and compare the degree of open science adoption and robust reproducibility practices to other conferences in education technology without a statement on open science. Specifically, we examined 93 papers and documented the open science practices used. We then attempted to reproduce the results with intervention from authors to bolster the chance of success. Finally, we compared the overall adoption rates to those from other conferences in education technology. Our cursory review suggests that researchers at L@S were more knowledgeable in open science practices, such as preregistration or preprints, compared to the researchers who published in the International Conference on Artificial Intelligence in Education and the International Conference on Educational Data Mining, as they were less likely to say they were unfamiliar with the practices. However, the overall adoption of open science practices was significantly lower, with only 1% of papers providing open data, 5% providing open materials, and no papers with a preregistration. Based on speculation, the low adoption rates may be due to 20% of the papers not using a dataset, at-scale datasets and materials that were unable to be released to avoid security issues or sensitive data leaks, or data that were being used in ongoing research and were not considered complete enough for release by the authors. All openly accessible work can be found in an Open Science Framework project.
-
The process of synthesizing solutions for mathematical problems is cognitively complex. Students formulate and implement strategies to solve mathematical problems, develop solutions, and make connections between their learned concepts as they apply their reasoning skills to solve such problems. Gaps in student knowledge or shallowly-learned concepts may cause students to guess at answers or otherwise apply the wrong approach, resulting in errors in their solutions. Despite the complexity of the synthesis process in mathematics learning, teachers’ knowledge and ability to anticipate areas of potential difficulty are essential and correlated with student learning outcomes. Preemptively identifying the common misconceptions in students that result in subsequent incorrect attempts can be arduous and unreliable, even for experienced teachers. This paper aims to help teachers identify the subsequent incorrect attempts that commonly occur when students are working on math problems such that they can address the underlying gaps in knowledge and common misconceptions through feedback. We report on a longitudinal analysis of historical data, from a computer-based learning platform, exploring the incorrect answers in the prior school years (’15-’20) that establish the commonality of wrong answers on two Open Educational Resources (OER) curricula, Illustrative Math (IM) and EngageNY (ENY), for grades 6, 7, and 8. We observe that incorrect answers are pervasive across 5 academic years despite changes in underlying student and teacher population. Building on our findings regarding the Common Wrong Answers (CWAs), we report on goals and task analysis that we leveraged in designing and developing a crowdsourcing platform for teachers to write Common Wrong Answer Feedback (CWAF) aimed at remediating the underlying cause of the CWAs. Finally, we report on an in vivo study by analyzing the effectiveness of CWAFs using two approaches: first, we use next-problem correctness as a dependent measure after receiving CWAF in an intent-to-treat analysis; second, we use next-attempt correctness as a dependent measure after receiving CWAF in a treated analysis. With the rise in popularity and usage of computer-based learning platforms, this paper explores the potential benefits of scalability in identifying CWAs and the subsequent usage of crowd-sourced CWAFs in enhancing the student learning experience through remediation.
-
This work proposes Dynamic Linear Epsilon-Greedy, a novel contextual multi-armed bandit algorithm that can adaptively assign personalized content to users while enabling unbiased statistical analysis. Traditional A/B testing and reinforcement learning approaches have trade-offs between empirical investigation and maximal impact on users. Our algorithm seeks to balance these objectives, allowing platforms to personalize content effectively while still gathering valuable data. Dynamic Linear Epsilon-Greedy was evaluated via simulation and an empirical study in the ASSISTments online learning platform. In simulation, Dynamic Linear Epsilon-Greedy performed comparably to existing algorithms and, in ASSISTments, slightly increased students’ learning compared to A/B testing. Data collected from its recommendations allowed for the identification of qualitative interactions, which showed high and low knowledge students benefited from different content. Dynamic Linear Epsilon-Greedy holds promise as a method to balance personalization with unbiased statistical analysis. All the data collected during the simulation and empirical study are publicly available at https://osf.io/zuwf7/.
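For readers unfamiliar with the family of methods this abstract builds on, the following is a minimal Python sketch of a generic epsilon-greedy contextual bandit with one linear reward model per arm. It is not the authors' Dynamic Linear Epsilon-Greedy algorithm, whose details the abstract does not give. The class name, the fixed `epsilon`, the gradient-step update, and the logged selection probability are all assumptions made for illustration.

```python
import random


class LinearEpsilonGreedy:
    """Generic epsilon-greedy bandit with one linear reward model per arm.

    Illustrative only: not the algorithm described in the abstract.
    """

    def __init__(self, n_arms, n_features, epsilon=0.1, lr=0.05, seed=0):
        self.epsilon = epsilon  # probability of choosing an arm at random
        self.lr = lr            # gradient step size (assumed value)
        self.rng = random.Random(seed)
        self.weights = [[0.0] * n_features for _ in range(n_arms)]

    def predict(self, arm, x):
        """Predicted reward of `arm` for context vector `x`."""
        return sum(w * xi for w, xi in zip(self.weights[arm], x))

    def select(self, x):
        """Return (arm, propensity), where propensity is P(arm | x)."""
        n = len(self.weights)
        scores = [self.predict(a, x) for a in range(n)]
        greedy = max(range(n), key=scores.__getitem__)
        if self.rng.random() < self.epsilon:
            arm = self.rng.randrange(n)  # explore uniformly over all arms
        else:
            arm = greedy                 # exploit the current best estimate
        propensity = self.epsilon / n + (1.0 - self.epsilon) * (arm == greedy)
        return arm, propensity

    def update(self, arm, x, reward):
        """One squared-error gradient step for the chosen arm's model."""
        error = reward - self.predict(arm, x)
        self.weights[arm] = [
            w + self.lr * error * xi for w, xi in zip(self.weights[arm], x)
        ]
```

Because every arm keeps a known, nonzero selection probability whenever `epsilon` is positive, logging the propensity with each assignment is one common way to keep later statistical comparisons between arms feasible.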