

Search for: All records

Award ID contains: 1822830

Note: When clicking on a Digital Object Identifier (DOI) number, you will be taken to an external site maintained by the publisher. Some full text articles may not yet be available without a charge during the embargo (administrative interval).

Some links on this page may take you to non-federal websites. Their policies may differ from this site.

  1. As computer-based learning platforms have become ubiquitous, there is a growing need to better support teachers. Particularly in mathematics, teachers often rely on open-ended questions to assess students’ understanding. While prior work on the automated assessment of open-ended responses has demonstrated its potential, many of those methods require large amounts of student data to make reliable estimates. We explore whether a problem-specific automated scoring model can benefit from auxiliary data collected from similar problems to address this “cold start” problem. We examine factors such as sample size and the degree of similarity of the utilized problem data. We find that using data from similar problems not only improves predictive performance by increasing sample size, but also leads to greater overall model performance than using data solely from the original problem when sample size is held constant. 
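    A minimal sketch of the auxiliary-data idea above, assuming toy responses and a generic TF-IDF plus logistic-regression grader rather than the paper's actual models: the same grader is trained once on the target problem's few labels and once on those labels augmented with data borrowed from a similar problem.

```python
# Sketch: mitigating the "cold start" problem by augmenting a
# problem-specific grader with labeled responses from a similar problem.
# All data below are hypothetical placeholders.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

target_X = ["slope is rise over run so it is 2", "the slope equals 2",
            "i guessed 7", "no idea"]
target_y = [1, 1, 0, 0]
similar_X = ["rise over run gives a slope of 3", "slope equals 3 here",
             "maybe 10?", "dont know", "its 3 because rise over run", "blank"]
similar_y = [1, 1, 0, 0, 1, 0]
holdout_X = ["slope is 2 by rise over run", "i think 9"]
holdout_y = [1, 0]

grader = make_pipeline(TfidfVectorizer(), LogisticRegression(max_iter=1000))

# Cold start: only the target problem's few labels are available.
grader.fit(target_X, target_y)
print("target-only accuracy:", grader.score(holdout_X, holdout_y))

# Augmented: borrow labeled responses from a similar problem.
grader.fit(target_X + similar_X, target_y + similar_y)
print("augmented accuracy:", grader.score(holdout_X, holdout_y))
```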
  2. With the greater application of machine learning models in educational contexts, it is important to understand where such methods perform well and how they may be improved. Identifying the factors that contribute to prediction error makes it possible to develop targeted methods that enhance model accuracy and mitigate risks of algorithmic bias and unfairness. Prior works have led to the development and application of automated assessment methods that leverage machine learning and natural language processing. The performance of these methods has often been reported as positive, but other prior works have identified aspects on which they may be improved. Particularly in the context of mathematics, the presence of non-linguistic characters and expressions has been identified as a contributor to observed model error. In this paper, we build upon this prior work by examining a previously developed automated assessment model for open-response questions in mathematics. We develop a new approach, which we call the “Math Term Frequency” (MTF) model, to address the error caused by the presence of non-linguistic terms, and ensemble it with the previously developed assessment model. We observe that the inclusion of this approach notably improves model performance. Finally, we observe how well this ensembled method extrapolates to student responses in the context of Algorithms, a domain similarly characterized by a large number of non-linguistic terms and expressions. This work represents an example of how error analyses can be leveraged to address model limitations. 
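    A rough sketch of how a “Math Term Frequency”-style component could be ensembled with a plain text model, as described above; the token pattern, toy data, and unweighted probability averaging are illustrative assumptions, not the paper's implementation.

```python
# Sketch: count non-linguistic math tokens as features (an "MTF"-style
# component) and average its probabilities with a plain text model's.
import re
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

MATH_TOKEN = re.compile(r"[0-9]+|[=+\-*/^<>%]|\b[xyz]\b")  # assumed pattern

def math_term_features(responses):
    """Count of math-like tokens per response (a single feature here)."""
    return np.array([[len(MATH_TOKEN.findall(r))] for r in responses])

X = ["x = 2 + 3 so x = 5", "five", "2*3=6", "the answer grows linearly"]
y = [1, 0, 1, 0]  # hypothetical correctness labels

text_model = make_pipeline(TfidfVectorizer(), LogisticRegression(max_iter=1000))
mtf_model = LogisticRegression(max_iter=1000)
text_model.fit(X, y)
mtf_model.fit(math_term_features(X), y)

def ensemble_proba(responses):
    """Unweighted average of the two models' class probabilities."""
    return (text_model.predict_proba(responses)
            + mtf_model.predict_proba(math_term_features(responses))) / 2

print(ensemble_proba(["y = 4 - 1"]))
```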
  3. Mitrovic, A.; Bosch, N. (Eds.)
    Automatic short answer grading is an important research direction in the exploration of how artificial intelligence (AI)-based tools can improve education. Current state-of-the-art approaches use neural language models to create vectorized representations of student responses, followed by classifiers to predict the score. However, these approaches have several key limitations, including (i) they use pre-trained language models that are not well-adapted to educational subject domains and/or student-generated text, and (ii) they almost always train one model per question, ignoring the linkage across questions and resulting in a significant model storage problem due to the size of advanced language models. In this paper, we study the problem of automatic short answer grading for students’ responses to math questions and propose a novel framework for this task. First, we use MathBERT, a variant of the popular language model BERT adapted to mathematical content, as our base model and fine-tune it on the downstream task of student response grading. Second, we use an in-context learning approach that provides scoring examples as input to the language model, supplying additional context information and promoting generalization to previously unseen questions. We evaluate our framework on a real-world dataset of student responses to open-ended math questions and show that it (often significantly) outperforms existing approaches, especially on new questions that are not seen during training. 
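    A sketch of the in-context input construction described above, where scored exemplar responses are packed into the model input alongside the new response. The separator format is an assumption, the checkpoint name mirrors the MathBERT release cited in item 5 below, and the fine-tuning loop itself is omitted.

```python
# Sketch: in-context scoring input for a fine-tuned classifier head.
# The classification head below is freshly initialized, so outputs are
# meaningless until the model is actually fine-tuned on graded responses.
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

checkpoint = "tbs17/MathBERT"  # assumed checkpoint; substitute your own
tokenizer = AutoTokenizer.from_pretrained(checkpoint)
model = AutoModelForSequenceClassification.from_pretrained(checkpoint, num_labels=5)

def build_input(question, scored_examples, new_response):
    """Pack (response, score) exemplars plus the unscored response into one sequence."""
    parts = [f"question: {question}"]
    parts += [f"example: {r} score: {s}" for r, s in scored_examples]
    parts.append(f"response: {new_response}")
    return " [SEP] ".join(parts)

text = build_input(
    "Explain why 1/2 + 1/3 = 5/6.",
    [("common denominator 6 gives 3/6 + 2/6", 4), ("because it is", 0)],
    "convert to sixths: 3/6 + 2/6 = 5/6",
)
inputs = tokenizer(text, return_tensors="pt", truncation=True)
with torch.no_grad():
    logits = model(**inputs).logits
print(logits.argmax(dim=-1).item())  # predicted score bucket (0-4)
```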
  4. Iyer, S. et al. (Eds.)
    It is particularly important to identify and address issues of fairness and equity in educational contexts, as academic performance can have large impacts on the types of opportunities that are made available to students. While it is always the hope that educators approach student assessment with these issues in mind, a number of factors likely impact how a teacher approaches the scoring of student work. Particularly in cases where the assessment of student work requires subjective judgment, as with open-ended answers and essays, contextual information such as how the student has performed in the past, general perceptions of the student, and even external factors such as fatigue may all influence how a teacher approaches assessment. While such factors exist, it is not always clear how they may introduce bias, nor whether such bias poses measurable risks to fairness and equity. In this paper, we examine these factors in the context of the assessment of student answers to open-response questions from middle school mathematics learners. We observe how several factors such as context and fatigue correlate with teacher-assigned grades and discuss how learning systems may support fair assessment. 
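    One minimal illustration of the kind of correlation check described above, using a hypothetical fatigue proxy (a response's position within a grading session) against the assigned grade; the column names and values are made up.

```python
# Sketch: does a fatigue proxy correlate with teacher-assigned grades?
import pandas as pd
from scipy.stats import spearmanr

grading_log = pd.DataFrame({
    "position_in_session": [1, 2, 3, 4, 5, 6, 7, 8],  # later = more fatigued
    "assigned_grade":      [4, 4, 3, 4, 3, 2, 3, 2],  # hypothetical grades
})
rho, p = spearmanr(grading_log["position_in_session"],
                   grading_log["assigned_grade"])
print(f"Spearman rho = {rho:.2f}, p = {p:.3f}")
```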
  5. Since the introduction of the original BERT (i.e., BASE BERT), researchers have developed various customized BERT models with improved performance for specific domains and tasks by exploiting the benefits of transfer learning. Because mathematical texts often use domain-specific vocabulary along with equations and math symbols, we posit that a new BERT model for mathematics would be useful for many mathematical downstream tasks. In this resource paper, we introduce our multi-institutional effort (i.e., two learning platforms and three academic institutions in the US) toward this need: MathBERT, a model created by pre-training the BASE BERT model on a large mathematical corpus ranging from pre-kindergarten (pre-k) through high school to college-graduate-level mathematical content. In addition, we select three general NLP tasks that are often used in mathematics education: prediction of knowledge components, auto-grading of open-ended Q&A, and knowledge tracing, to demonstrate the superiority of MathBERT over BASE BERT. Our experiments show that MathBERT outperforms prior best methods by 1.2–22% and BASE BERT by 2–8% on these tasks. In addition, we build a mathematics-specific vocabulary, ‘mathVocab’, to train with MathBERT, and discover that MathBERT pre-trained with ‘mathVocab’ outperforms MathBERT trained with the BASE BERT vocabulary (i.e., ‘origVocab’). MathBERT is currently being adopted at the participating learning platforms: Stride, Inc., a commercial educational resource provider, and ASSISTments.org, a free online educational platform. We release MathBERT for public usage at: https://github.com/tbs17/MathBERT. 
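    A minimal usage sketch for the released model, assuming the weights are mirrored on the Hugging Face hub under the same handle as the GitHub release above (worth verifying against the repository's instructions):

```python
# Sketch: load MathBERT as a masked-language model and inspect its
# guesses for a masked math token.
from transformers import pipeline

fill = pipeline("fill-mask", model="tbs17/MathBERT")  # assumed hub ID
for guess in fill("The derivative of x^2 is 2 [MASK]."):
    print(guess["token_str"], round(guess["score"], 3))
```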
  6. As more educators integrate their curricula with online learning, it is easier to crowdsource content from them. Crowdsourced tutoring has been proven to reliably increase students’ next-problem correctness. In this work, we confirmed the findings of a previous study in this area, with stronger confidence margins than previously, and revealed that only a portion of crowdsourced content creators had a reliable benefit to students. Furthermore, this work provides a method to rank content creators relative to each other, which was used to determine which content creators were most effective overall, and which content creators were most effective for specific groups of students. When exploring data from TeacherASSIST, a feature within the ASSISTments learning platform that crowdsources tutoring from teachers, we found that while overall this program provides a benefit to students, some teachers created more effective content than others. Despite this finding, we did not find evidence that the effectiveness of content reliably varied by student knowledge level, suggesting that the content is unlikely to be suitable for personalizing instruction based on student knowledge alone. These findings are promising for the future of crowdsourced tutoring, as they help provide a foundation for assessing the quality of crowdsourced content and investigating content for opportunities to personalize students’ education. 
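    A toy sketch of the ranking idea above: order content creators by students' next-problem correctness after receiving their tutoring. The plain difference in means shown here is only illustrative; the study's actual statistical treatment is more rigorous.

```python
# Sketch: rank crowdsourced content creators by the rate at which students
# answer the next problem correctly after seeing their tutoring.
import pandas as pd

logs = pd.DataFrame({  # hypothetical tutoring logs
    "creator":      ["A", "A", "A", "B", "B", "B", "C", "C"],
    "next_correct": [1, 1, 0, 0, 0, 1, 1, 1],
})
ranking = (logs.groupby("creator")["next_correct"]
               .agg(rate="mean", n="count")
               .sort_values("rate", ascending=False))
print(ranking)
```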
  7. Open-ended questions in mathematics are commonly used by teachers to monitor and assess students’ deeper conceptual understanding of content. Student answers to these types of questions often exhibit a combination of language, drawn diagrams and tables, and mathematical formulas and expressions that supply teachers with insight into the processes and strategies adopted by students in formulating their responses. While these student responses help to inform teachers of their students’ progress and understanding, the amount of variation in these responses can make it difficult and time-consuming for teachers to manually read, assess, and provide feedback on student work. For this reason, there has been a growing body of research on developing AI-powered tools to support teachers in this task. This work builds upon that prior research by introducing a model designed to help automate the assessment of student responses to open-ended questions in mathematics through sentence-level semantic representations. We find that this model outperforms previously published benchmarks across three different metrics. With this model, we conduct an error analysis to identify characteristics of student responses that could guide further improvement of the method. 
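    A sketch of grading from sentence-level semantic representations, substituting an off-the-shelf sentence encoder for the paper's model; the data and encoder choice are assumptions.

```python
# Sketch: embed each response with a sentence encoder, then fit a small
# classifier on the embeddings to predict the grade.
from sentence_transformers import SentenceTransformer
from sklearn.linear_model import LogisticRegression

encoder = SentenceTransformer("all-MiniLM-L6-v2")  # stand-in encoder

responses = ["add the exponents when multiplying powers",
             "multiply the exponents", "you add them", "no clue"]
grades = [1, 0, 1, 0]  # hypothetical labels

X = encoder.encode(responses)  # one dense vector per response
clf = LogisticRegression(max_iter=1000).fit(X, grades)
print(clf.predict(encoder.encode(["exponents are added"])))
```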
  8. Educational content labeled with proper knowledge components (KCs) is particularly useful to teachers and content organizers. However, manually labeling educational content is labor-intensive and error-prone. To address this challenge, prior research proposed machine learning based solutions to auto-label educational content, with limited success. In this work, we significantly improve on prior research by (1) expanding the input types to include KC descriptions, instructional video titles, and problem descriptions (i.e., three types of prediction task), (2) roughly doubling the granularity of the prediction from 198 to 385 KC labels (i.e., a more practical setting but a much harder multinomial classification problem), (3) improving the prediction accuracies by 0.5–2.3% using Task-adaptive Pre-trained BERT, outperforming six baselines, and (4) proposing a simple evaluation measure by which we can recover 56–73% of mispredicted KC labels. All code and data sets used in the experiments are available at: https://github.com/tbs17/TAPT-BERT 
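    A sketch in the spirit of the recovery measure mentioned above: when the top-1 KC prediction is wrong, check whether the true label still appears among the model's k most probable labels. The paper's exact measure may differ, and the logits here are random placeholders for a trained 385-way classifier.

```python
# Sketch: top-k recovery of mispredicted KC labels.
import numpy as np

rng = np.random.default_rng(0)
n_items, n_kcs, k = 200, 385, 5
logits = rng.normal(size=(n_items, n_kcs))       # placeholder model outputs
y_true = rng.integers(0, n_kcs, size=n_items)    # placeholder gold labels

top1 = logits.argmax(axis=1)
topk = np.argsort(logits, axis=1)[:, -k:]        # k highest-scoring labels

wrong = top1 != y_true
recovered = np.array([y_true[i] in topk[i] for i in np.where(wrong)[0]])
print(f"top-1 errors: {wrong.sum()}, recovered in top-{k}: {recovered.mean():.1%}")
```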
  9. The use of computer-based systems in classrooms has provided teachers with new opportunities for delivering content to students, supplementing instruction, and assessing student knowledge and comprehension. Among the largest benefits of these systems is their ability to provide students with feedback on their work and to report student performance and progress to their teacher. While computer-based systems can automatically assess student answers to a range of question types, a limitation faced by many systems concerns open-ended problems: many systems are either unable to support them, relying on the teacher to grade them manually, or avoid such question types entirely. Due to recent advancements in natural language processing methods, the automation of essay grading has made notable strides. However, much of this research has pertained to domains outside of mathematics, where open-ended problems can be used by teachers to assess students’ understanding of mathematical concepts beyond what is possible with other types of problems. This research explores the viability and challenges of developing automated graders of open-ended student responses in mathematics. We further explore how the scale of available data impacts model performance. Focusing on content delivered through the ASSISTments online learning platform, we present a set of analyses pertaining to the development and evaluation of models to predict teacher-assigned grades for student open responses. 
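    A sketch of how the data-scale question above can be probed with a learning curve, using synthetic features as stand-ins for featurized open responses:

```python
# Sketch: grader accuracy as a function of training-set size.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import learning_curve

X, y = make_classification(n_samples=600, n_features=50, random_state=0)
sizes, _, test_scores = learning_curve(
    LogisticRegression(max_iter=1000), X, y,
    train_sizes=np.linspace(0.1, 1.0, 5), cv=5, shuffle=True, random_state=0)

for n, scores in zip(sizes, test_scores):
    print(f"n={n:3d}  mean accuracy={scores.mean():.3f}")
```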
  10. We present and evaluate a machine learning based system that automatically grades audio recordings of students speaking a foreign language. The use of automated systems to aid the assessment of student performance holds great promise in augmenting a teacher’s ability to provide meaningful feedback and instruction to students. Teachers spend a significant amount of time grading student work, and such tools can recover much of that time, which could instead be used to give personalized attention to each student. Significant prior research has focused on the grading of closed-form problems, open-ended essays, and textual content; however, little research has focused on audio content, which is far more prevalent in language-study education. In this paper, we explore the development of automated assessment tools for audio responses in a college-level Chinese language-learning course. We analyze several challenges faced while working with data of this type, as well as the generation and extraction of features for building machine learning models to aid in the assessment of student language learning. 
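    A sketch of the feature-extraction step for audio grading described above: each recording is summarized by mean MFCCs plus duration and fed to a simple regressor. The file paths, features, and model choice are illustrative assumptions, not the paper's pipeline.

```python
# Sketch: summarize audio responses as fixed-length feature vectors and
# fit a grade regressor. Paths and scores are placeholders.
import numpy as np
import librosa
from sklearn.ensemble import RandomForestRegressor

def audio_features(path):
    """Mean MFCCs plus duration (seconds) for one recording."""
    y, sr = librosa.load(path, sr=16000)
    mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=13)
    return np.concatenate([mfcc.mean(axis=1), [len(y) / sr]])

paths = ["resp_01.wav", "resp_02.wav", "resp_03.wav"]  # placeholder files
scores = [4.0, 2.5, 3.0]                               # placeholder grades

X = np.stack([audio_features(p) for p in paths])
model = RandomForestRegressor(random_state=0).fit(X, scores)
print(model.predict(X[:1]))
```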