Enhancing Auto-scoring of Student Open Responses in the Presence of Mathematical Terms and Expressions

Abstract. With the greater application of machine learning models in educational contexts, it is important to understand where such methods perform well and how they may be improved. To that end, it is important to identify the factors that contribute to prediction error in order to develop targeted methods that enhance model accuracy and mitigate risks of algorithmic bias and unfairness. Prior works have led to the development and application of automated assessment methods that leverage machine learning and natural language processing. The performance of these methods has often been reported as positive, but other prior works have identified aspects on which they may be improved. Particularly in the context of mathematics, the presence of non-linguistic characters and expressions has been identified as a contributor to observed model error. In this paper, we build upon this prior work by examining a previously developed automated assessment model for open-response questions in mathematics. We develop a new approach, which we call the “Math Term Frequency” (MTF) model, to address the error caused by the presence of non-linguistic terms, and ensemble it with the previously developed assessment model. We observe that the inclusion of this approach notably improves model performance. Finally, we examine how well this ensembled method extrapolates to student responses in the context of Algorithms, a domain similarly characterized by a large number of non-linguistic terms and expressions. This work represents a practical example of how error analyses can be leveraged to address model limitations.
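The record gives only a high-level description of the MTF model, but the core idea, counting the relative frequency of mathematical tokens in a response and blending that model's score with a text-based scorer's, can be sketched roughly. Below is a minimal Python illustration under stated assumptions: the token pattern, the math-word list, and the weighted-blend ensembling rule are all placeholders, not the authors' implementation.

```python
# A rough sketch of a "Math Term Frequency" (MTF) style feature extractor
# and a simple score ensemble. All names, the token pattern, and the
# blending rule are illustrative assumptions, not the paper's method.
import re
from collections import Counter

# Assumed pattern for digits, operators, and words; the paper's actual
# term inventory is not given in this record.
MATH_TOKEN_PATTERN = re.compile(r"\d+|[=+\-*/^<>()]|[a-z]+")
MATH_WORDS = {"x", "y", "slope", "intercept", "equation"}  # assumed examples

def mtf_features(response, vocabulary):
    """Return the relative frequency of each vocabulary term in a response."""
    raw = MATH_TOKEN_PATTERN.findall(response.lower())
    # Keep numbers and operators; keep alphabetic tokens only if they are
    # in the assumed math-word list.
    tokens = [t for t in raw if not t.isalpha() or t in MATH_WORDS]
    counts = Counter(tokens)
    total = max(len(tokens), 1)
    return [counts[term] / total for term in vocabulary]

def ensemble_score(text_score, mtf_score, weight=0.5):
    """Blend a text-based model's predicted score with the MTF score."""
    return weight * text_score + (1.0 - weight) * mtf_score

# Example: featurize a short response against an assumed vocabulary.
vocab = ["=", "+", "slope"]
print(mtf_features("the slope = rise / run", vocab))  # [0.33..., 0.0, 0.33...]
```

A weighted average is only one way to ensemble the two models; a stacked classifier trained on both models' outputs would serve the same role, and the abstract does not specify which approach the authors used.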
- Award ID(s): 1903304
- PAR ID: 10331805
- Journal Name: Proceedings of the 23rd International Conference on Artificial Intelligence in Education
- Page Range / eLocation ID: in press
- Format(s): Medium: X
- Sponsoring Org: National Science Foundation