-
This paper systematically investigates the generation of code explanations by Large Language Models (LLMs) for code examples commonly encountered in introductory programming courses. Our findings reveal significant variations in the nature of code explanations produced by LLMs, influenced by factors such as the wording of the prompt, the specific code examples under consideration, the programming language involved, the temperature parameter, and the version of the LLM. However, a consistent pattern emerges for Java and Python: explanations exhibit a Flesch-Kincaid readability level of approximately grade 7-8 and a consistent lexical density, i.e., the proportion of meaningful words relative to the total explanation size. Additionally, the generated explanations consistently achieve high scores for correctness but lower scores on three other metrics: completeness, conciseness, and specificity.
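As a rough illustration of the two measures named above, the sketch below computes a Flesch-Kincaid grade level and a lexical density for a sample explanation. The naive syllable counter and the tiny stopword list are simplifying assumptions for the demo; the paper itself presumably relied on established readability tooling.

```python
import re

def count_syllables(word: str) -> int:
    """Naive syllable estimate: runs of vowels, at least one per word."""
    return max(1, len(re.findall(r"[aeiouy]+", word.lower())))

def flesch_kincaid_grade(text: str) -> float:
    """Standard Flesch-Kincaid grade-level formula."""
    sentences = max(1, len(re.findall(r"[.!?]+", text)))
    words = re.findall(r"[A-Za-z']+", text)
    syllables = sum(count_syllables(w) for w in words)
    return 0.39 * (len(words) / sentences) + 11.8 * (syllables / len(words)) - 15.59

def lexical_density(text: str, stopwords: set) -> float:
    """Proportion of content (non-stopword) tokens in the text."""
    words = [w.lower() for w in re.findall(r"[A-Za-z']+", text)]
    return sum(w not in stopwords for w in words) / len(words)

# Toy stopword list; a real analysis would use a full function-word inventory.
STOPWORDS = {"the", "a", "an", "is", "are", "of", "to", "in", "and",
             "that", "which", "at", "it", "each", "over"}

explanation = ("The loop iterates over the list and adds each element "
               "to the running total, which is printed at the end.")
print(f"FK grade:        {flesch_kincaid_grade(explanation):.1f}")
print(f"Lexical density: {lexical_density(explanation, STOPWORDS):.2f}")
```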
-
Understanding a student's problem-solving strategy can have a significant impact on effective math learning using Intelligent Tutoring Systems (ITSs) and Adaptive Instructional Systems (AISs). For instance, the ITS/AIS can better personalize itself to correct specific misconceptions indicated by incorrect strategies; specific problems can be designed to improve strategies; and frustration can be minimized by adapting to a student's natural way of thinking rather than trying to fit a standard strategy to all. While it may be possible for human experts to identify strategies manually in classroom settings with sufficient student interaction, this does not scale to big data. Therefore, we leverage advances in Machine Learning and AI methods to perform scalable strategy prediction that is also fair to students at all skill levels. Specifically, we develop an embedding called MVec, where we learn a representation based on the mastery of students. We then cluster these embeddings with a non-parametric clustering method, where each cluster contains instances that have approximately symmetrical strategies. The strategy prediction model is trained on instances sampled from these clusters, ensuring that we train the model over diverse strategies. Using real-world, large-scale student interaction datasets from MATHia, we show that our approach can scale up to achieve high accuracy by training on a small sample of a large dataset and also has predictive equality, i.e., it can predict strategies equally well for learners at diverse skill levels.
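A minimal sketch of the cluster-then-sample idea, with synthetic stand-ins for the MVec embeddings and DBSCAN standing in for the unnamed non-parametric clustering step (the paper's actual embedding and clustering method may differ):

```python
import numpy as np
from sklearn.cluster import DBSCAN
from sklearn.datasets import make_blobs

rng = np.random.default_rng(0)

# Stand-in for MVec embeddings: well-separated synthetic blobs so the
# demo clusters cleanly; one vector per student-problem instance.
embeddings, _ = make_blobs(n_samples=1000, centers=5, n_features=2, random_state=0)

# Non-parametric clustering: DBSCAN needs no preset number of clusters.
labels = DBSCAN(eps=0.8, min_samples=5).fit_predict(embeddings)

# Sample the same number of instances from every cluster so the strategy
# predictor is trained over diverse strategies, not just the common ones.
per_cluster = 50
train_idx = []
for c in sorted(set(labels) - {-1}):           # -1 marks DBSCAN noise points
    members = np.flatnonzero(labels == c)
    train_idx.extend(rng.choice(members, size=min(per_cluster, len(members)),
                                replace=False))

n_clusters = len(set(labels) - {-1})
print(f"{n_clusters} clusters, {len(train_idx)} training instances sampled")
```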
-
This paper presents a tool for creating student models in logistic regression. Creating student models has typically been done by expert selection of the appropriate terms, beginning with models as simple as IRT or AFM but more recently with highly complex models like BestLR. While alternative methods exist to select the appropriate predictors for regression-based models (e.g., stepwise selection or LASSO), we are unaware of their application to student modeling. Such automatic methods of model creation offer the possibility of better student models with either reduced complexity or better fit, in addition to relieving experts of the burden of searching for better models by hand, with its attendant risk of human error. Our tool builds on top of the preexisting R package LKT. We explain our search methods with two datasets, demonstrating the advantages of using the tool with stepwise regression and regularization (LASSO) methods to aid in feature selection. For the stepwise method using BIC, the models are simpler (due to the BIC penalty for parameters) than alternatives like BestLR, with little lack of fit. For the LASSO method, the models can be made simpler because the fitting procedure involves a regularization parameter that penalizes large absolute coefficient values. However, LASSO also offers the possibility of highly complex models with exceptional fit.
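The LASSO idea can be illustrated outside of R/LKT with a small Python sketch: an L1 penalty on a logistic regression drives uninformative coefficients to exactly zero, so feature selection falls out of fitting. The synthetic design matrix below is an invented stand-in, not LKT's actual feature set.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)

# Toy stand-in for a student-model design matrix: one row per practice
# attempt, columns are candidate predictors (e.g., counts, recency features).
n, p = 2000, 20
X = rng.normal(size=(n, p))
true_w = np.zeros(p)
true_w[:4] = [1.5, -1.0, 0.8, 0.6]            # only four predictors matter
y = (rng.random(n) < 1 / (1 + np.exp(-(X @ true_w)))).astype(int)

# The L1 (LASSO-style) penalty zeroes out weak coefficients; smaller C
# means stronger regularization and therefore simpler models.
model = LogisticRegression(penalty="l1", solver="liblinear", C=0.05)
model.fit(X, y)

selected = np.flatnonzero(model.coef_[0] != 0)
print(f"kept {len(selected)} of {p} predictors: {selected}")
```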
-
Multiple-choice questions are traditionally expensive to produce. Recent advances in large language models (LLMs) have led to fine-tuned LLMs that generate questions competitive with human-authored questions. However, the relative capabilities of ChatGPT-family models have not yet been established for this task. We present a carefully controlled human evaluation of three conditions: a fine-tuned, augmented version of Macaw; instruction-tuned Bing Chat with zero-shot prompting; and human-authored questions from a college science textbook. Our results indicate that on six of seven measures tested, the performance of both LLMs was not significantly different from human performance. Analysis of LLM errors further suggests that Macaw and Bing Chat have different failure modes for this task: Macaw tends to repeat answer options, whereas Bing Chat tends not to include the specified answer among the answer options. For Macaw, removing error items from the analysis results in performance on par with humans on all metrics; for Bing Chat, removing error items improves performance but does not reach human-level performance.
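Both failure modes lend themselves to automatic flagging. A minimal sketch, assuming a hypothetical dict layout for a generated item (not the paper's actual data format):

```python
def flag_mcq_errors(item: dict) -> list:
    """Flag the two failure modes observed in the study: repeated answer
    options (Macaw) and a gold answer missing from the options (Bing Chat)."""
    errors = []
    options = [opt.strip().lower() for opt in item["options"]]
    if len(set(options)) < len(options):
        errors.append("repeated answer option")
    if item["answer"].strip().lower() not in options:
        errors.append("specified answer missing from options")
    return errors

# Invented example item, illustrating the Macaw-style duplication error.
item = {
    "stem": "Which organelle carries out photosynthesis?",
    "options": ["chloroplast", "mitochondrion", "ribosome", "mitochondrion"],
    "answer": "chloroplast",
}
print(flag_mcq_errors(item))    # ['repeated answer option']
```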
-
Wang, N.; Rebolledo-Mendez, G.; Dimitrova, V.; Matsuda, N.; Santos, O.C. (Eds.) Self-explanations can increase students' comprehension in complex domains; however, they work most effectively with a human tutor who can provide corrections and scaffolding. In this paper, we present our attempt to scale up the use of self-explanations in learning programming by delegating the assessment and scaffolding of explanations to an intelligent tutor. To assess our approach, we performed a randomized controlled trial experiment that measured the impact of automatic assessment and scaffolding of self-explanations on code comprehension and learning. The study results indicate that low-prior-knowledge students in the experimental condition learn more than high-prior-knowledge students in the same condition, but no such difference is observed in a similar grouping of students by prior knowledge in the control condition.
-
The ability to automatically assess learners' activities is the key to user modeling and personalization in adaptive educational systems. The work presented in this paper opens an opportunity to expand the scope of automated assessment from traditional programming problems to code comprehension tasks, where students are asked to explain the critical steps of a program. The ability to automatically assess these self-explanations offers a unique opportunity to understand the current state of student knowledge, recognize possible misconceptions, and provide feedback. Annotated datasets are needed to train Artificial Intelligence/Machine Learning approaches for the automated assessment of student explanations. To address this need, we present a novel corpus called SelfCode, which consists of 1,770 sentence pairs of student and expert self-explanations of Java code examples, along with semantic similarity judgments provided by experts. We also present a baseline automated assessment model that relies on textual features. The corpus is available at the GitHub repository (https://github.com/jeevanchaps/SelfCode).
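As one example of the kind of textual feature such a baseline might use, the sketch below scores student-expert explanation pairs with TF-IDF cosine similarity. The toy pairs are invented, and the paper's baseline may rely on a richer feature set.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

# Invented (student explanation, expert explanation) pairs in the spirit of
# SelfCode; real entries come with expert-provided similarity judgments.
pairs = [
    ("the loop adds each number to the sum",
     "the for loop accumulates the array values into the sum variable"),
    ("this line prints a greeting",
     "the loop iterates over the indices of the array"),
]

# Fit one vocabulary over all sentences, then score each pair.
vectorizer = TfidfVectorizer().fit([s for pair in pairs for s in pair])
for student, expert in pairs:
    vecs = vectorizer.transform([student, expert])
    sim = cosine_similarity(vecs[0], vecs[1])[0, 0]
    print(f"{sim:.2f}  {student!r} vs. {expert!r}")
```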
-
Frasson, C.; Mylonas, P.; Troussas, C. (Eds.) Domain modeling is an important task in designing, developing, and deploying intelligent tutoring systems and other adaptive instructional systems. We focus here on the more specific task of automatically extracting a domain model from textbooks. In particular, this paper explores using multiple textbook indexes to extract a domain model for computer programming. Our approach is based on the observation that different experts, i.e., authors of intro-to-programming textbooks in our case, break down a domain in slightly different ways, and identifying the commonalities and differences can be very revealing. To this end, we present automated approaches to extracting domain models from multiple textbooks and compare the resulting common domain model with a domain model created by experts. Specifically, we use approximate string-matching approaches to increase the coverage of the resulting domain model and majority voting across different textbooks to discover common domain terms related to computer programming. Our results indicate that approximate string matching yields more accurate domain models for computer programming, with increased precision and recall. By automating our approach, we can significantly reduce the time and effort required to construct high-quality domain models, making it easier to develop and deploy tutoring systems. Furthermore, we obtain a common domain model that can serve as a benchmark or skeleton that can be used broadly and adapted to specific needs by others.
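A minimal sketch of the two core steps, approximate string matching and majority voting, using the standard library's SequenceMatcher and invented index terms; the paper's matching method and thresholds may differ.

```python
from collections import Counter
from difflib import SequenceMatcher

# Index terms from three invented intro-to-programming textbook indexes.
indexes = [
    ["for loop", "while loop", "arrays", "recursion"],
    ["for loops", "while loops", "array", "functions"],
    ["loop, for", "recursion", "arrays", "functions"],
]

def matches(a: str, b: str, threshold: float = 0.8) -> bool:
    """Approximate match so that 'arrays' and 'array' count as the same term."""
    return SequenceMatcher(None, a, b).ratio() >= threshold

# Count how many textbooks mention each candidate term, approximately.
candidates = sorted({term for index in indexes for term in index})
votes = Counter({term: sum(any(matches(term, t) for t in index)
                           for index in indexes)
                 for term in candidates})

# Majority voting: keep terms that at least two of the three indexes agree on.
# A fuller pipeline would also merge matched variants into one canonical term.
common = [term for term, v in votes.items() if v >= 2]
print(common)
```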
-
Understanding how students with varying capabilities think about problem solving can greatly help in improving personalized education, which can lead to significantly better learning outcomes. Here, we present the details of a system we call NeTra that we developed for discovering strategies that students follow in the context of math learning. Specifically, we developed this system from large-scale data from MATHia that contains millions of student-tutor interactions. The goal of this system is to provide a visual interface for educators to understand the likely strategy a student will follow on problems that the student has yet to attempt. This predictive interface can help educators/tutors develop interventions that are personalized for students. Underlying the system is a powerful AI model based on Neuro-Symbolic learning that has shown promising results in predicting both strategies and the mastery over concepts used in the strategy.
-
Multi-angle question answering models have recently been proposed that promise to perform related tasks like question generation. However, performance on related tasks has not been thoroughly studied. We investigate a leading model called Macaw on the task of multiple-choice question generation and evaluate its performance on three angles that systematically reduce the complexity of the task. Our results indicate that despite the promise of generalization, Macaw performs poorly on untrained angles. Even on a trained angle, Macaw fails to generate four distinct multiple-choice options on 17% of inputs. We propose augmenting multiple-choice options by paraphrasing angle input and show that this increases overall success to 97.5%. A human evaluation comparing the augmented multiple-choice questions with textbook questions on the same topic reveals that Macaw questions broadly score highly, though below human questions.
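The augmentation can be sketched as a retry-with-paraphrase loop. Everything below is a toy stand-in: hypothetical functions replace the real Macaw model and paraphraser and only mimic the observed behavior.

```python
def generate_options(answer: str, context: str) -> list:
    """Toy stand-in for Macaw's multiple-choice angle; the real model takes
    an answer plus context and generates distractor options."""
    if "(rephrased)" in context:               # pretend the paraphrase helped
        return [answer, "mitochondrion", "ribosome", "nucleus"]
    return [answer, "mitochondrion", "mitochondrion", "mitochondrion"]

def paraphrase(context: str) -> str:
    """Toy paraphraser standing in for automatic rewording of angle input."""
    return context + " (rephrased)"

def options_with_retry(answer: str, context: str, max_tries: int = 5):
    """Augmentation loop: if the model repeats options (the 17% failure
    mode), paraphrase the angle input and ask again until four distinct
    options come back."""
    for _ in range(max_tries):
        options = generate_options(answer, context)
        if len(set(options)) == 4:
            return options
        context = paraphrase(context)
    return None

print(options_with_retry("chloroplast", "Photosynthesis occurs in plant cells."))
```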
-
We report work in progress that aims to better understand prediction performance differences between Deep Knowledge Tracing (DKT) and Bayesian Knowledge Tracing (BKT), as well as “gaming the system” behavior, by considering variation in features and design across individual pieces of instructional content. Our “non-monolithic” analysis considers hundreds of “workspaces” in Carnegie Learning's MATHia intelligent tutoring system and the extent to which two relatively simple features extracted from MATHia logs, potentially related to gaming-the-system behavior, are correlated with differences in DKT and BKT prediction performance. We then take a closer look at a set of six MATHia workspaces: three represent content in which DKT outperforms BKT, and three represent content in which BKT outperforms DKT or there is little difference in performance between the approaches. We present some preliminary findings on the extent to which students game the system in these workspaces across two school years, as well as other facets of variability across these pieces of instructional content. We conclude with a road map for scaling these analyses over much larger sets of MATHia workspaces and learner data.
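One way to operationalize the feature-vs-performance-gap analysis is a non-parametric correlation across workspaces. The sketch below uses synthetic per-workspace numbers in place of real MATHia logs and model outputs; the feature shown is one plausible gaming-the-system signal, not necessarily one of the paper's two.

```python
import numpy as np
from scipy.stats import spearmanr

rng = np.random.default_rng(0)

# One row per workspace: a hypothetical AUC gap between DKT and BKT
# next-attempt predictions, plus one simple log-derived feature (here,
# the rate of very fast attempts, a possible gaming-the-system signal).
n_workspaces = 200
fast_attempt_rate = rng.beta(2, 8, size=n_workspaces)
auc_dkt_minus_bkt = 0.1 * fast_attempt_rate + rng.normal(0, 0.02, n_workspaces)

# Non-parametric correlation between the feature and the performance gap.
rho, p_value = spearmanr(fast_attempt_rate, auc_dkt_minus_bkt)
print(f"Spearman rho = {rho:.2f}, p = {p_value:.3g}")
```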