Search for: All records where Creators/Authors contains: "Croteau, E"


  1. Human-conducted rating tasks are resource-intensive, demanding substantial time and money. As Large Language Models (LLMs) such as GPT demonstrate strong performance across many domains, their potential for automating such evaluation tasks becomes evident. In this research, we used four prominent LLMs (GPT-4, GPT-3.5, Vicuna, and PaLM 2) to evaluate teacher-authored mathematical explanations against a detailed rubric covering accuracy, explanation clarity, correctness of mathematical notation, and efficacy of problem-solving strategies. During this investigation, we unexpectedly found that HTML formatting influenced the evaluations: GPT-4 consistently favored explanations formatted with HTML, whereas the other models showed mixed preferences. When measuring Inter-Rater Reliability (IRR) among the models, only Vicuna and PaLM 2 showed high IRR under the conventional Cohen's Kappa metric for HTML-formatted explanations; under a more relaxed version of the metric, all model pairings showed strong agreement. These findings underscore the potential of LLMs to provide feedback on student-generated content and point to new directions, such as reinforcement learning, that could harness the models' consistent feedback. (A sketch of the kappa computation appears after this list.)
  2. The use of Bayesian Knowledge Tracing (BKT) models to predict student learning and mastery, especially in mathematics, is a well-established approach in learning analytics. In this work, we report on our analysis of the generalizability of BKT models across academic years, a concern attributed to "detector rot." We assess generalizability by comparing each model's performance at predicting student knowledge within its own academic year against its performance across academic years. Models were trained on data from two popular open-source curricula available through Open Educational Resources. We observed that the models were generally highly performant at predicting student learning within an academic year, while models trained on some academic years generalized better than others. We posit that Knowledge Tracing (KT) models are relatively stable in performance across academic years yet remain susceptible to systemic changes and shifts in underlying learner behavior. Based on the evidence in this paper, learning platforms leveraging KT models should be mindful of systemic changes or drastic shifts in certain user demographics. (A sketch of the BKT update appears after this list.)
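As context for the IRR analysis in item 1, the following is a minimal sketch of how agreement between two LLM raters could be computed with Cohen's kappa. The abstract does not specify its "relaxed" metric; a linearly weighted kappa, which gives partial credit to near-misses on an ordinal rubric scale, is shown here as one common relaxation. The rating arrays are illustrative placeholders, not data from the study.

    # Minimal sketch: inter-rater reliability between two LLM raters
    # scoring explanations on a hypothetical 1-5 rubric scale.
    from sklearn.metrics import cohen_kappa_score

    vicuna_scores = [5, 4, 4, 3, 5, 2, 4, 3, 5, 4]  # hypothetical ratings
    palm2_scores  = [5, 4, 3, 3, 5, 2, 4, 4, 5, 4]  # hypothetical ratings

    # Conventional (unweighted) Cohen's kappa: only exact matches
    # count as agreement.
    kappa = cohen_kappa_score(vicuna_scores, palm2_scores)

    # One common relaxation: linearly weighted kappa, which penalizes
    # a 4-vs-5 disagreement less than a 1-vs-5 disagreement. (The
    # paper's relaxed metric is unspecified; this is one standard choice.)
    kappa_linear = cohen_kappa_score(vicuna_scores, palm2_scores,
                                     weights="linear")

    print(f"unweighted kappa: {kappa:.3f}")
    print(f"linearly weighted kappa: {kappa_linear:.3f}")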
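For item 2, here is a minimal sketch of the standard BKT update, assuming the usual four-parameter formulation (prior, learn, guess, slip). The parameter values and response sequence are illustrative, not fitted to the curricula analyzed in the paper.

    # Minimal sketch of a standard Bayesian Knowledge Tracing step.
    def bkt_update(p_know: float, correct: bool,
                   p_guess: float, p_slip: float, p_learn: float) -> float:
        """Bayes update on the observed response, then apply the
        learning transition."""
        if correct:
            # P(known | correct response)
            posterior = (p_know * (1 - p_slip)) / (
                p_know * (1 - p_slip) + (1 - p_know) * p_guess)
        else:
            # P(known | incorrect response)
            posterior = (p_know * p_slip) / (
                p_know * p_slip + (1 - p_know) * (1 - p_guess))
        # Learning transition: an unknown skill may become known.
        return posterior + (1 - posterior) * p_learn

    # Illustrative parameters and response sequence (hypothetical).
    p_know = 0.30  # P(L0): prior probability the skill is known
    for obs in [True, False, True, True]:
        p_know = bkt_update(p_know, obs,
                            p_guess=0.20, p_slip=0.10, p_learn=0.15)
        # P(correct next) = P(known)(1 - slip) + P(unknown) * guess
        p_correct = p_know * (1 - 0.10) + (1 - p_know) * 0.20
        print(f"P(know) = {p_know:.3f}, P(correct next) = {p_correct:.3f}")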