
Award ID contains: 1931523

  1. It is particularly important to identify and address issues of fairness and equity in educational contexts, as academic performance can strongly affect the opportunities made available to students. While educators ideally approach student assessment with these issues in mind, a number of factors likely influence how a teacher scores student work. Particularly when assessment requires subjective judgment, as with open-ended answers and essays, contextual information such as the student's past performance, general perceptions of the student, and even external factors such as fatigue may all influence how a teacher approaches assessment. It is not always clear, however, how these factors may introduce bias, nor whether such bias poses measurable risks to fairness and equity. In this paper, we examine these factors in the context of assessing middle school mathematics learners' answers to open-response questions. We observe how several factors, such as context and fatigue, correlate with teacher-assigned grades and discuss how learning systems may support fair assessment. Keywords: halo effect, grading biases, fairness, subjective assessment
    Free, publicly accessible full text available November 1, 2023
  2. Mitrovic, A ; Bosch, N (Ed.)
    Automatic short answer grading is an important research direction in exploring how artificial intelligence (AI)-based tools can improve education. Current state-of-the-art approaches use neural language models to create vectorized representations of students' responses, followed by classifiers that predict the score. However, these approaches have several key limitations: i) they use pre-trained language models that are not well adapted to educational subject domains and/or student-generated text, and ii) they almost always train one model per question, which ignores the linkage across questions and results in a significant model storage problem given the size of advanced language models. In this paper, we study the problem of automatic short answer grading for students' responses to math questions and propose a novel framework for this task. First, we use MathBERT, a variant of the popular language model BERT adapted to mathematical content, as our base model and fine-tune it on the downstream task of student response grading. Second, we use an in-context learning approach that provides scoring examples as input to the language model, supplying additional context and promoting generalization to previously unseen questions. We evaluate our framework on a real-world dataset of student responses to open-ended math questions and show that it (often significantly) outperforms existing approaches, especially for new questions not seen during training.
    Free, publicly accessible full text available July 1, 2023
  3. The development and application of deep learning methodologies have grown within educational contexts in recent years. Perhaps attributable in part to the large amount of data made available through the adoption of computer-based learning systems in classrooms and larger-scale MOOC platforms, many educational researchers are leveraging a wide range of emerging deep learning approaches to study learning and student behavior in various capacities. Variations of recurrent neural networks, for example, have been used not only to predict learning outcomes but also to study sequential and temporal trends in student data; it is commonly believed that they are able to learn high-dimensional representations of learning and behavioral constructs over time, such as the evolution of a student's knowledge state while working through assigned content. Recent works, however, have started to dispute this belief, instead finding that it may be the model's complexity that leads to improved performance in many prediction tasks and that these methods may not inherently learn these temporal representations through model training. In this work, we explore these claims further in the context of detectors of student affect, as well as expanding on existing work that explored benchmarks in knowledge tracing. Specifically, we observe how well trained models perform compared to deep learning networks where training is applied only to the output layer. While the best results of prior works utilizing trained recurrent models are found to be superior, our untrained versions perform comparably well, outperforming even previous non-deep learning approaches. Keywords: Deep Learning · LSTM · Echo State Network · Affect · Knowledge Tracing.
    Free, publicly accessible full text available July 1, 2023
  4. Personalized learning stems from the idea that students benefit from instructional material tailored to their needs. Many online learning platforms purport to implement some form of personalized learning, often through on-demand tutoring or self-paced instruction, but to our knowledge none has a way to automatically explore specific opportunities to personalize students' education, nor a transparent way to identify the effects of personalization on specific groups of students. In this work we present the Automatic Personalized Learning Service (APLS). The APLS uses multi-armed bandit algorithms to recommend the most effective support to each student who requests assistance while completing their online work, and is currently used by ASSISTments, an online learning platform. The first empirical study of the APLS found that Beta-Bernoulli Thompson Sampling, a popular and effective multi-armed bandit algorithm, was only slightly more capable of selecting helpful support than randomly selecting from the relevant support options. Therefore, we also present Decision Tree Thompson Sampling (DTTS), a novel contextual multi-armed bandit algorithm that integrates the transparency and interpretability of decision trees into Thompson sampling. In simulation, DTTS overcame the challenges of recommending support within an online learning platform and was able to increase students' learning by as much as 10% more than the current algorithm used by the APLS. We demonstrate that DTTS is able to identify qualitative interactions that not only help determine the most effective support for students, but also generalize well to new students, problems, and support content. The APLS using DTTS is now being deployed at scale within ASSISTments and is a promising tool for all educational learning platforms.
    Free, publicly accessible full text available June 1, 2023
  5. As more educators integrate their curricula with online learning, it is easier to crowdsource content from them. Crowdsourced tutoring has been proven to reliably increase students' next-problem correctness. In this work, we confirmed the findings of a previous study in this area with stronger confidence margins than before, and revealed that only a portion of crowdsourced content creators had a reliable benefit to students. Furthermore, this work provides a method to rank content creators relative to each other, which was used to determine which content creators were most effective overall and which were most effective for specific groups of students. When exploring data from TeacherASSIST, a feature within the ASSISTments learning platform that crowdsources tutoring from teachers, we found that while this program provides an overall benefit to students, some teachers created more effective content than others. Despite this finding, we did not find evidence that the effectiveness of content reliably varied by student knowledge level, suggesting that the content is unlikely to be suitable for personalizing instruction based on student knowledge alone. These findings are promising for the future of crowdsourced tutoring, as they help provide a foundation for assessing the quality of crowdsourced content and investigating content for opportunities to personalize students' education.
  6. Roll, I. ; McNamara, D. ; Sosnovsky, S. ; Luckin, R. ; Dimitrova, V. (Ed.)
    Scaffolding and feedback on problem-solving activities during online learning have consistently been shown to improve performance among younger learners. However, less is known about the impact of feedback strategies on adult learners. This paper investigates how two computer-based support strategies, hints and required scaffolding questions, contribute to performance and behavior in an edX MOOC with integrated assignments from ASSISTments, a web-based platform that implements diverse student supports. Results from a sample of 188 adult learners indicated that those given scaffolds benefited less from ASSISTments support and were more likely to request the correct answer from the system.
  7. Similar content has tremendous utility in classroom and online learning environments. For example, similar content can be used to combat cheating, track students' learning over time, and model students' latent knowledge. These use cases all rely on different notions of similarity, which makes it difficult to determine how similar two pieces of content are. Crowdsourcing is an effective way to identify similar content in a variety of situations, by giving workers guidelines on how to identify similar content for a particular use case. However, crowdsourced opinions are rarely homogeneous and therefore must be aggregated into what is most likely the truth. This work presents the Dynamically Weighted Majority Vote method, a novel algorithm that combines aggregating workers' crowdsourced opinions with estimating the reliability of each worker. This method was compared to the traditional majority vote method in both a simulation study and an empirical study, in which opinions on the similarity of seventh-grade mathematics problems were crowdsourced from middle school math teachers and college students. In both studies, the Dynamically Weighted Majority Vote method outperformed the traditional majority vote method, suggesting that it should be used instead of majority vote in future crowdsourcing endeavors.
  8. Educational content labeled with proper knowledge components (KCs) is particularly useful to teachers and content organizers. However, manually labeling educational content is labor-intensive and error-prone. To address this challenge, prior research proposed machine-learning-based solutions to auto-label educational content, with limited success. In this work, we significantly improve on prior research by (1) expanding the input types to include KC descriptions, instructional video titles, and problem descriptions (i.e., three types of prediction task), (2) doubling the granularity of the prediction from 198 to 385 KC labels (i.e., a more practical setting but a much harder multinomial classification problem), (3) improving prediction accuracy by 0.5–2.3% using Task-adaptive Pre-trained BERT, outperforming six baselines, and (4) proposing a simple evaluation measure by which we can recover 56–73% of mispredicted KC labels. All code and datasets used in the experiments are available at:
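For item 1, one simple way to check whether grader fatigue might show up in grading data is to correlate the position at which a response was graded with the grade it received. The following sketch computes a Pearson correlation in plain Python. The grades, the 0 to 4 scale, and the `pearson_r` helper are made-up illustrations, not data or code from the paper, and grading position is only a rough proxy for fatigue (it can be confounded with, for example, the order in which students submitted).

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)


# Hypothetical grades (0-4) that one teacher assigned, listed in grading order.
grades_in_order = [4, 4, 3, 4, 3, 3, 2, 3, 2, 2]
positions = list(range(1, len(grades_in_order) + 1))

r = pearson_r(positions, grades_in_order)
print(f"r(grading position, grade) = {r:.3f}")
```

A clearly negative r would be consistent with later-graded responses receiving lower scores, but it would not by itself show that fatigue is the cause. On Python 3.10 and later, `statistics.correlation` computes the same Pearson coefficient.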
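Item 2 describes providing scoring examples as extra input to the language model so that it can generalize to unseen questions. A minimal sketch of how such an input might be assembled follows, assuming a BERT-style encoder that separates segments with `[SEP]`. The field labels, their ordering, the example cap, and the `ScoredExample` and `build_grading_input` names are hypothetical, not the authors' exact format.

```python
from dataclasses import dataclass


@dataclass
class ScoredExample:
    """A previously graded response used as an in-context example (hypothetical)."""
    response: str
    score: int


def build_grading_input(question, response, examples, max_examples=3, sep=" [SEP] "):
    """Join the question, the response to grade, and a few scored examples into
    one input string for a BERT-style classifier. The template is an assumption."""
    parts = [f"Question: {question}", f"Response: {response}"]
    # Keep only a few examples: BERT-style encoders have a fixed maximum input length.
    for ex in examples[:max_examples]:
        parts.append(f"Score {ex.score} example: {ex.response}")
    return sep.join(parts)


examples = [
    ScoredExample("5, because 2 + 3 = 5", 4),
    ScoredExample("6", 0),
]
print(build_grading_input("What is 2 + 3? Explain.", "It is 5", examples))
```

Because the question text and the scored examples travel with every input, a single fine-tuned model can score responses to many questions, which is the property the abstract contrasts with training one model per question. A real pipeline would tokenize this string and truncate it to the encoder's maximum length (512 tokens for standard BERT).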
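Item 3 compares trained recurrent models with networks in which only the output layer is trained, and its keywords name echo state networks. The following pure-Python sketch illustrates that idea: a random recurrent reservoir that is never trained, with a logistic-regression readout fit by gradient descent on the final reservoir state. The reservoir size, weight scaling, learning rate, toy data, and the `EchoStateClassifier` name are illustrative assumptions, not the paper's configuration; echo state network readouts are also commonly fit with ridge regression instead.

```python
import math
import random


def _sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


class EchoStateClassifier:
    """Binary sequence classifier in the echo-state style: input and recurrent
    weights are random and stay fixed; only the linear readout is trained."""

    def __init__(self, n_inputs, n_reservoir=50, row_norm=0.9, seed=0):
        rng = random.Random(seed)
        self.w_in = [[rng.uniform(-0.5, 0.5) for _ in range(n_inputs)]
                     for _ in range(n_reservoir)]
        w = [[rng.uniform(-1.0, 1.0) for _ in range(n_reservoir)]
             for _ in range(n_reservoir)]
        # Rescale so the largest absolute row sum equals row_norm (< 1). Because
        # tanh is 1-Lipschitz, the state update is then a contraction, a simple
        # (conservative) sufficient condition for the echo state property.
        max_row = max(sum(abs(v) for v in row) for row in w)
        self.w_res = [[v * row_norm / max_row for v in row] for row in w]
        self.w_out = [0.0] * n_reservoir
        self.b_out = 0.0

    def _final_state(self, sequence):
        """Run the fixed reservoir over a sequence of input vectors."""
        x = [0.0] * len(self.w_res)
        for u in sequence:
            # The comprehension reads the previous state x before x is rebound.
            x = [math.tanh(sum(a * b for a, b in zip(w_in_i, u))
                           + sum(a * b for a, b in zip(w_res_i, x)))
                 for w_in_i, w_res_i in zip(self.w_in, self.w_res)]
        return x

    def _logit(self, state):
        return sum(w * s for w, s in zip(self.w_out, state)) + self.b_out

    def fit(self, sequences, labels, epochs=200, lr=0.1):
        """Fit only the readout: logistic regression on final reservoir states."""
        states = [self._final_state(s) for s in sequences]
        for _ in range(epochs):
            for state, y in zip(states, labels):
                grad = _sigmoid(self._logit(state)) - y  # d(log-loss)/d(logit)
                self.w_out = [w - lr * grad * s for w, s in zip(self.w_out, state)]
                self.b_out -= lr * grad
        return self

    def predict_proba(self, sequence):
        return _sigmoid(self._logit(self._final_state(sequence)))


# Hypothetical toy task: label 1 if a constant 1-D input sequence is positive.
seqs = [[[a]] * 5 for a in (0.5, 1.0, 1.5, -0.5, -1.0, -1.5)]
labels = [1, 1, 1, 0, 0, 0]
model = EchoStateClassifier(n_inputs=1).fit(seqs, labels)
print(round(model.predict_proba([[1.0]] * 5), 3))
```

Because `fit` touches only `w_out` and `b_out`, any temporal structure the model uses must already be present in the random reservoir states, which is the kind of comparison the abstract describes.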
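Item 4 uses Beta-Bernoulli Thompson Sampling as the baseline bandit in the APLS. The following is a minimal sketch of that baseline only (not of DTTS, whose decision-tree details the abstract does not give): each support option is an arm with a Beta(1, 1) prior over its probability of a binary success. The success signal, the `simulate` helper, and the success rates are hypothetical illustrations.

```python
import random


class BetaBernoulliThompson:
    """Thompson sampling for K arms with binary rewards and Beta(1, 1) priors."""

    def __init__(self, n_arms, seed=None):
        self.alpha = [1.0] * n_arms  # 1 + observed successes per arm
        self.beta = [1.0] * n_arms   # 1 + observed failures per arm
        self.rng = random.Random(seed)

    def select(self):
        # Sample a plausible success rate for each arm from its Beta posterior
        # and choose the arm whose sample is largest.
        samples = [self.rng.betavariate(a, b) for a, b in zip(self.alpha, self.beta)]
        return max(range(len(samples)), key=samples.__getitem__)

    def update(self, arm, reward):
        # Conjugate update: a success increments alpha, a failure increments beta.
        if reward:
            self.alpha[arm] += 1
        else:
            self.beta[arm] += 1


def simulate(true_rates, n_students, seed=0):
    """Hypothetical simulation: each student receives one support option, and the
    binary reward (e.g., success on the next problem) is drawn from true_rates."""
    bandit = BetaBernoulliThompson(len(true_rates), seed=seed)
    env = random.Random(seed + 1)
    pulls = [0] * len(true_rates)
    for _ in range(n_students):
        arm = bandit.select()
        reward = 1 if env.random() < true_rates[arm] else 0
        bandit.update(arm, reward)
        pulls[arm] += 1
    return pulls


print(simulate([0.55, 0.60, 0.70], n_students=2000))
```

When the options' true success rates are close together, their posteriors overlap for many rounds, so early selections can look close to random; a contextual bandit instead conditions its choice on features of the student or problem.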
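Item 7 does not spell out how the Dynamically Weighted Majority Vote updates its weights, so the following is only a generic sketch of the same idea under our own assumptions: alternate between a weighted vote per item and re-estimating each worker's weight as their agreement rate with the current consensus. With `n_iters=1` it reduces to a traditional (unweighted) majority vote. The function name, data layout, and example labels are hypothetical.

```python
from collections import defaultdict


def weighted_majority_vote(labels_by_item, n_iters=10):
    """labels_by_item maps item -> {worker: label}. Returns (consensus, weights).

    Generic iterative weighted vote (an assumption, not the paper's exact method):
    alternate between (1) a weighted vote per item and (2) setting each worker's
    weight to their agreement rate with the current consensus.
    """
    workers = sorted({w for votes in labels_by_item.values() for w in votes})
    weights = {w: 1.0 for w in workers}  # equal weights: first pass is a plain majority vote
    consensus = {}
    for _ in range(n_iters):
        # (1) Aggregate: pick the label with the largest total worker weight.
        for item, votes in labels_by_item.items():
            totals = defaultdict(float)
            for worker, label in votes.items():
                totals[label] += weights[worker]
            # Iterating labels in sorted order makes tie-breaking deterministic.
            consensus[item] = max(sorted(totals), key=totals.__getitem__)
        # (2) Re-estimate reliability as agreement with the consensus.
        for worker in workers:
            judged = [item for item, votes in labels_by_item.items() if worker in votes]
            agree = sum(labels_by_item[item][worker] == consensus[item] for item in judged)
            weights[worker] = agree / len(judged)
    return consensus, weights


votes = {
    "pair1": {"teacher1": "similar", "teacher2": "similar", "student1": "different"},
    "pair2": {"teacher1": "different", "teacher2": "different", "student1": "different"},
}
consensus, weights = weighted_majority_vote(votes)
print(consensus, weights)
```

In this simple scheme a worker's own vote counts toward the consensus they are scored against, which slightly inflates their agreement; leave-one-out agreement or log-odds weights are common refinements.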