Search for: All records

Award ID contains: 2118725

« Prev Next »

Total Resources

15

Resource Type
Conference Paper

14

Conference Proceeding

0

Dataset

0

Journal Article

1

Workshop Report

0

Availability
Full Text / Resource Available

6

Citation Only

9

Save Results
Excel (limit 2000)
CSV (limit 5000)
XML (limit 5000)

Have feedback or suggestions for a way to improve these results?
!

Note: When clicking on a Digital Object Identifier (DOI) number, you will be taken to an external site maintained by the publisher. Some full text articles may not yet be available without a charge during the embargo (administrative interval).
What is a DOI Number?

Some links on this page may take you to non-federal websites. Their policies may differ from this site.

Knowledge Tracing Over Time: A Longitudinal Analysis

Lee, Morgan ; Crotreau, Ethan ; Gurung, Ashish ; Botelho, Anthony ; Heffernan ; Neil ( July 2023 , EDM 2023)

The use of Bayesian Knowledge Tracing (BKT) models in predicting student learning and mastery, especially in mathematics, is a well-established and proven approach in learning analytics. In this work, we report on our analysis examining the generalizability of BKT models across academic years attributed to ”detector rot.” We compare the generalizability of Knowledge Training (KT) models by comparing model performance in predicting student knowledge within the academic year and across academic years. Models were trained on data from two popular open-source curricula available through Open Educational Resources. We observed that the models generally were highly performant in predicting student learning within an academic year, whereas certain academic years were more generalizable than other academic years. We posit that the Knowledge Tracing models are relatively stable in terms of performance across academic years yet can still be susceptible to systemic changes and underlying learner behavior. As indicated by the evidence in this paper, we posit that learning platforms leveraging KT models need to be mindful of systemic changes or drastic changes in certain user demographics.
more » « less
Free, publicly-accessible full text available July 1, 2024
How to Open Science: Debugging Reproducibility within the Educational Data Mining Conference

Haim, Aaron ; Gyurcsan, Robert ; Baxter, Chris ; Shaw, Stacy ; Heffernan, Neil ( July 2023 , EDM 2023)

Despite increased efforts to assess the adoption rates of open science and robustness of reproducibility in sub-disciplines of education technology, there is a lack of understanding of why some research is not reproducible. Prior work has taken the first step toward assessing reproducibility of research, but has assumed certain constraints which hinder its discovery. Thus, the purpose of this study was to replicate previous work on papers within the proceedings of the International Conference on Educational Data Mining to accurately report on which papers are reproducible and why. Specifically, we examined 208 papers, attempted to reproduce them, documented reasons for reproducibility failures, and asked authors to provide additional information needed to reproduce their study. Our results showed that out of 12 papers that were potentially reproducible, only one successfully reproduced all analyses, and another two reproduced most of the analyses. The most common failure for reproducibility was failure to mention libraries needed, followed by non-seeded randomness. All openly accessible work can be found in an Open Science Foundation project1.
more » « less
Free, publicly-accessible full text available July 1, 2024
Effective Evaluation of Online Learning Interventions with Surrogate Measures

Prihar, Ethan ; Vanacore, Kirk ; Sales, Adam ; Heffernan, Neil ( July 2023 , EDM 2023)

There is a growing need to empirically evaluate the quality of online instructional interventions at scale. In response, some online learning platforms have begun to implement rapid A/B testing of instructional interventions. In these scenarios, students participate in series of randomized experiments that evaluate problem-level interventions in quick succession, which makes it difficult to discern the effect of any particular intervention on their learning. Therefore, distal measures of learning such as posttests may not provide a clear understanding of which interventions are effective, which can lead to slow adoption of new instructional methods. To help discern the effectiveness of instructional interventions, this work uses data from 26,060 clickstream sequences of students across 31 different online educational experiments exploring 51 different research questions and the students’ posttest scores to create and analyze different proximal surrogate measures of learning that can be used at the problem level. Through feature engineering and deep learning approaches, next-problem correctness was determined to be the best surrogate measure. As more data from online educational experiments are collected, model based surrogate measures can be improved, but for now, next-problem correctness is an empirically effective proximal surrogate measure of learning for analyzing rapid problemlevel experiments. The data and code used in this work can be found at https://osf.io/uj48v/.
more » « less
Free, publicly-accessible full text available July 1, 2024
Comparing Different Approaches to Generating Mathematics Explanations Using Large Language Models

Prihar, Ethan ; Lee, Morgan ; Hopman, Mia ; Kalai, Adam T. ; Vempala, Sofia ; Wang, Allison ; Wickline, Gabriel ; Murray, Aly ; Heffernan, Neil ( July 2023 , AIED 2023)

Large language models have recently been able to perform well in a wide variety of circumstances. In this work, we explore the possibility of large language models, specifically GPT-3, to write explanations for middle-school mathematics problems, with the goal of eventually using this process to rapidly generate explanations for the mathematics problems of new curricula as they emerge, shortening the time to integrate new curricula into online learning platforms. To generate explanations, two approaches were taken. The first approach attempted to summarize the salient advice in tutoring chat logs between students and live tutors. The second approach attempted to generate explanations using few-shot learning from explanations written by teachers for similar mathematics problems. After explanations were generated, a survey was used to compare their quality to that of explanations written by teachers. We test our methodology using the GPT-3 language model. Ultimately, the synthetic explanations were unable to outperform teacher written explanations. In the future more powerful large language models may be employed, and GPT-3 may still be effective as a tool to augment teachers’ process for writing explanations, rather than as a tool to replace them. The explanations, survey results, analysis code, and a dataset of tutoring chat logs are all available at https://osf.io/wh5n9/.
more » « less
Free, publicly-accessible full text available July 1, 2024
How Common are Common Wrong Answers? Crowdsourcing Remediation at Scale.

Gurung, Ashish ; Lee, Morgan ; Baral, Sami ; Sales, Adam ; Vanacore, Kirk ; McReynolds, Andrew ; Kreisberg, Hilary ; Heffernan, Cristina ; Haim, Aaron ; Heffernan, Neil ( July 2023 , Learning@Scale 2023)

Solving mathematical problems is cognitively complex, involving strategy formulation, solution development, and the application of learned concepts. However, gaps in students’ knowledge or weakly grasped concepts can lead to errors. Teachers play a crucial role in predicting and addressing these difficulties, which directly influence learning outcomes. However, preemptively identifying misconcep- tions leading to errors can be challenging. This study leverages historical data to assist teachers in recognizing common errors and addressing gaps in knowledge through feedback. We present a longitudinal analysis of incorrect answers from the 2015-2020 aca- demic years on two curricula, Illustrative Math and EngageNY, for grades 6, 7, and 8. We find consistent errors across 5 years despite varying student and teacher populations. Based on these Common Wrong Answers (CWAs), we designed a crowdsourcing platform for teachers to provide Common Wrong Answer Feedback (CWAF). This paper reports on an in vivo randomized study testing the ef- fectiveness of CWAFs in two scenarios: next-problem-correctness within-skill and next-problem-correctness within-assignment, re- gardless of the skill. We find that receiving CWAF leads to a signifi- cant increase in correctness for consecutive problems within-skill. However, the effect was not significant for all consecutive problems within-assignment, irrespective of the associated skill. This paper investigates the potential of scalable approaches in identifying Com- mon Wrong Answers (CWAs) and how the use of crowdsourced CWAFs can enhance student learning through remediation.
more » « less
Free, publicly-accessible full text available July 1, 2024
Assessing the ality of Large Language Models in Generating Mathematics Explanations

Wang, Allison ; Prihar, Ethan ; Heffernan, Neil ( July 2023 , Learning @ Scale (L@S ’23), July 20–22, 2023, Copenhagen, Denmark.)

The development and measurable improvements in performance of large language models on natural language tasks [12] opens up the opportunity to utilize large language models in an educational setting to replicate human tutoring, which is often costly and inaccessible. We are particularly interested in large language models from the GPT series, created by OpenAI [7]. In a prior study we found that the quality of explanations generated with GPT-3.5 was poor, where two dierent approaches to generating explanations resulted in a 43% and 10% success rate. In this replication study, we were interested in whether the measurable improvements in GPT-4 performance [6] led to a higher rate of success for generating valid explanations compared to GPT-3.5. A replication of the original study was conducted by using GPT-4 to generate explanations for the same problems given to GPT-3.5. Using GPT-4, explanation correctness dramatically improved to a success rate of 94%.We were further interested in evaluating if GPT-4 explanations were positively perceived compared to human-written explanations. A preregistered, single-blinded study was implemented where 10 evaluators were asked to rate the quality of randomized GPT-4 and teacher-created explanations. Even with 4% of problems containing some amount of incorrect content, GPT-4 explanations were preferred over human explanations. The implications of our signi- cant results at Learning @ Scale are that digital platforms can start A/B testing the eects of GPT-4 generated explanations on student learning, implementing explanations at scale, and also prompt programming to test dierent education theories, e.g., social emotional learning factors [5].
more » « less
Free, publicly-accessible full text available July 1, 2024
Investigating the Impact of Skill-Related Videos on Online Learning

Prihar, Ethan ; Haim, Aaron ; Shen, Tracy ; Sales, Adam ; Lee, Dongwen ; Xintao, Wu ; Heffernan, Neil ( July 2023 , Learning at Scale 2023)

Many online learning platforms and MOOCs incorporate some amount of video-based content into their platform, but there are few randomized controlled experiments that evaluate the effectiveness of the different methods of video integration. Given the large amount of publicly available educational videos, an investigation into this content’s impact on students could help lead to more effective and accessible video integration within learning platforms. In this work, a new feature was added into an existing online learning platform that allowed students to request skill-related videos while completing their online middle-school mathematics assignments. A total of 18,535 students participated in two large-scale randomized controlled experiments related to providing students with publicly available educational videos. The first experiment investigated the effect of providing students with the opportunity to request these videos, and the second experiment investigated the effect of using a multi-armed bandit algorithm to recommend relevant videos. Additionally, this work investigated which features of the videos were significantly predictive of students’ performance and which features could be used to personalize students’ learning. Ultimately, students were mostly disinterested in the skill-related videos, preferring instead to use the platforms existing problem specific support, and there was no statistically significant findings in either experiment. Additionally, while no video features were significantly predictive of students’ performance, two video features had significant qualitative interactions with students’ prior knowledge, which showed that different content creators were more effective for different groups of students. These findings can be used to inform the design of future video-based features within online learning platforms and the creation of different educational videos specifically targeting higher or lower knowledge students. The data and code used in this work can be found at https://osf.io/cxkzf/.
more » « less
Free, publicly-accessible full text available July 1, 2024
A Bandit You Can Trust

Prihar, Ethan ; Sales, Adam ; Heffernan, Neil ( July 2023 , UMAP 2023)

This work proposes Dynamic Linear Epsilon-Greedy, a novel contextual multi-armed bandit algorithm that can adaptively assign personalized content to users while enabling unbiased statistical analysis. Traditional A/B testing and reinforcement learning approaches have trade-offs between empirical investigation and maximal impact on users. Our algorithm seeks to balance these objectives, allowing platforms to personalize content effectively while still gathering valuable data. Dynamic Linear Epsilon-Greedy was evaluated via simulation and an empirical study in the ASSISTments online learning platform. In simulation, Dynamic Linear Epsilon-Greedy performed comparably to existing algorithms and in ASSISTments, slightly increased students’ learning compared to A/B testing. Data collected from its recommendations allowed for the identification of qualitative interactions, which showed high and low knowledge students benefited from different content. Dynamic Linear Epsilon-Greedy holds promise as a method to balance personalization with unbiased statistical analysis. All the data collected during the simulation and empirical study are publicly available at https://osf.io/zuwf7/.
more » « less
Free, publicly-accessible full text available July 1, 2024
Using Auxiliary Data to Boost Precision in the Analysis of A/B Tests on an Online Educational Platform: New Data and New Results*

Sales, Adam ; Prihar, Ethan ; Gagnon-BArtsch, Johann A. ; Heffernan, Neil. ( June 2023 , Journal of educational data mining)

Randomized A/B tests within online learning platforms represent an exciting direction in learning sciences. With minimal assumptions, they allow causal effect estimation without confounding bias and exact statistical inference even in small samples. However, often experimental samples and/or treatment effects are small, A/B tests are underpowered, and effect estimates are overly imprecise. Recent methodological advances have shown that power and statistical precision can be substantially boosted by coupling design-based causal estimation to machine-learning models of rich log data from historical users who were not in the experiment. Estimates using these techniques remain unbiased and inference remains exact without any additional assumptions. This paper reviews those methods and applies them to a new dataset including over 250 randomized A/B comparisons conducted within ASSISTments, an online learning platform. We compare results across experiments using four novel deep-learning models of auxiliary data and show that incorporating auxiliary data into causal estimates is roughly equivalent to increasing the sample size by 20% on average, or as much as 50-80% in some cases, relative to t-tests, and by about 10% on average, or as much as 30-50%, compared to cutting-edge machine learning unbiased estimates that use only data from the experiments. We show that the gains can be even larger for estimating subgroup effects, hold even when the remnant is unrepresentative of the A/B test sample, and extend to post-stratification population effects estimators.
more » « less
Free, publicly-accessible full text available June 1, 2024
Impact of Non-Cognitive Interventions on Student Learning Behaviors and Outcomes: An analysis of seven large-scale experimental inventions.

Vanacore, K.P. ; Gurung, A. ; McReynolds, A.A. ; Liu, A. ; Shaw, S.T. ; & Heffernan, N.T. ( March 2023 , LAK ’23: Learning Analytics & Knowledge.)

As evidence grows supporting the importance of non-cognitive factors in learning, computer-assisted learning platforms increasingly incorporate non-academic interventions to influence student learning and learning related-behaviors. Non-cognitive interventions often attempt to influence students’ mindset, motivation, or metacognitive reflection to impact learning behaviors and outcomes. In the current paper, we analyze data from five experiments, involving seven treatment conditions embedded in mastery-based learning activities hosted on a computer-assisted learning platform focused on middle school mathematics. Each treatment condition embodied a specific non-cognitive theoretical perspective. Over seven school years, 20,472 students participated in the experiments. We estimated the effects of each treatment condition on students’ response time, hint usage, likelihood of mastering knowledge components, learning efficiency, and post-tests performance. Our analyses reveal a mix of both positive and negative treatment effects on student learning behaviors and performance. Few interventions impacted learning as assessed by the post-tests. These findings highlight the difficulty in positively influencing student learning behaviors and outcomes using non-cognitive interventions.
more » « less
Full Text Available

« Prev Next »