Automatic short answer grading is an important research direction
in the exploration of how to use artificial intelligence
(AI)-based tools to improve education. Current state-of-the-art
approaches use neural language models to create vectorized
representations of students’ responses, followed by classifiers
to predict the score. However, these approaches have
several key limitations, including i) they use pre-trained language
models that are not well-adapted to educational subject
domains and/or student-generated text and ii) they almost
always train one model per question, ignoring the linkage
across questions and resulting in a significant model storage
problem due to the size of advanced language models. In this
paper, we study the problem of automatic short answer grading
for students’ responses to math questions and propose
a novel framework for this task. First, we use MathBERT,
a variant of the popular language model BERT adapted to
mathematical content, as our base model and fine-tune it
on the downstream task of student response grading. Second,
we use an in-context learning approach that provides
scoring examples as input to the language model to provide
additional context information and promote generalization
to previously unseen questions. We evaluate our framework
on a real-world dataset of student responses to open-ended
math questions and show that our framework (often significantly)
outperforms existing approaches, especially for new
questions that are not seen during training.
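
The in-context idea described above can be illustrated with a minimal sketch. The abstract does not give the exact input template, so everything here is an assumption for illustration: the `ScoredExample` type, the `build_grading_input` helper, the `question:` / `response:` / `example:` / `score:` labels, and the ` [SEP] ` separator are all hypothetical. The sketch only shows how a question, the response to grade, and a few already-scored responses might be packed into one text input, so that a single model can be shared across questions.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass
class ScoredExample:
    """A previously graded response for the same question (hypothetical type)."""
    response: str
    score: int


def build_grading_input(
    question: str,
    target_response: str,
    examples: Sequence[ScoredExample],
    sep: str = " [SEP] ",   # assumed separator; the real template may differ
    max_examples: int = 4,  # assumed cap to keep the input short
) -> str:
    """Concatenate the question, the response to grade, and scored examples."""
    parts = [f"question: {question}", f"response: {target_response}"]
    for ex in examples[:max_examples]:
        # Each in-context example carries its response text and its known score.
        parts.append(f"example: {ex.response} score: {ex.score}")
    return sep.join(parts)


if __name__ == "__main__":
    demo = build_grading_input(
        "Solve 2x = 8.",
        "x equals 4",
        [ScoredExample("x = 4", 4), ScoredExample("x = 5", 1)],
    )
    print(demo)
```

In a full pipeline, a string like this would be tokenized and passed to the fine-tuned classifier; because the scoring examples travel with the input, the same model could in principle be applied to a question it never saw during training.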
MathBERT: A Pre-trained Language Model for General NLP Tasks in Mathematics Education
Since the introduction of the original BERT (i.e., BASE BERT), researchers have developed various customized BERT models with
improved performance for specific domains and tasks by exploiting
the benefits of transfer learning. Due to the nature of mathematical texts, which often use domain-specific vocabulary along with
equations and math symbols, we posit that the development of
a new BERT model for mathematics would be useful for many
mathematical downstream tasks. In this resource paper, we introduce our multi-institutional effort (i.e., two learning platforms and
three academic institutions in the US) toward this need: MathBERT, a model created by pre-training the BASE BERT model
on a large mathematical corpus ranging from pre-kindergarten
(pre-k), to high-school, to college graduate level mathematical content. In addition, we select three general NLP tasks that are often
used in mathematics education: prediction of knowledge component, auto-grading open-ended Q&A, and knowledge tracing, to
demonstrate the superiority of MathBERT over BASE BERT. Our
experiments show that MathBERT outperforms prior best methods
by 1.2-22% and BASE BERT by 2-8% on these tasks. In addition,
we build a mathematics-specific vocabulary ‘mathVocab’ to train
with MathBERT. We discover that MathBERT pre-trained with
‘mathVocab’ outperforms MathBERT trained with the BASE BERT
vocabulary (i.e., ‘origVocab’). MathBERT is currently being adopted
at the participating learning platforms: Stride, Inc., a commercial educational resource provider, and ASSISTments.org, a free online
educational platform. We release MathBERT for public usage at:
https://github.com/tbs17/MathBERT.
- Award ID(s): 1822830
- NSF-PAR ID: 10386545
- Date Published:
- Journal Name: NeurIPS 2021 Math AI for Education Workshop
- Format(s): Medium: X
- Sponsoring Org: National Science Foundation
More Like this
-
Mitrovic, A ; Bosch, N (Ed.) Automatic short answer grading is an important research direction in the exploration of how to use artificial intelligence (AI)-based tools to improve education. Current state-of-the-art approaches use neural language models to create vectorized representations of students’ responses, followed by classifiers to predict the score. However, these approaches have several key limitations, including i) they use pre-trained language models that are not well-adapted to educational subject domains and/or student-generated text and ii) they almost always train one model per question, ignoring the linkage across questions and resulting in a significant model storage problem due to the size of advanced language models. In this paper, we study the problem of automatic short answer grading for students’ responses to math questions and propose a novel framework for this task. First, we use MathBERT, a variant of the popular language model BERT adapted to mathematical content, as our base model and fine-tune it on the downstream task of student response grading. Second, we use an in-context learning approach that provides scoring examples as input to the language model to provide additional context information and promote generalization to previously unseen questions. We evaluate our framework on a real-world dataset of student responses to open-ended math questions and show that our framework (often significantly) outperforms existing approaches, especially for new questions that are not seen during training.