Title: Are Large Language Models Smart Enough for SQL Tutoring and Assessment?
Abstract: The rise of Large Language Models (LLMs) as powerful knowledge-processing tools has sparked a wave of innovation in tutoring and assessment systems. Despite their well-documented limitations, LLMs offer unique capabilities that have been effectively harnessed for automated feedback generation and grading in intelligent learning environments. In this paper, we introduce Project 360, an experimental intelligent tutoring system designed for teaching SQL. Project 360 leverages the concept of query equivalence to assess the accuracy of student queries, using ChatGPT's advanced natural language analysis to measure their semantic distance from a reference query. By integrating LLM-driven evaluation, Project 360 significantly outperforms traditional SQL tutoring and grading systems, offering more precise assessments and context-aware feedback. This study explores the feasibility and limitations of using ChatGPT as the analytical backbone of Project 360, evaluating its reliability for autonomous tutoring and assessment in database education. Our findings provide valuable insights into the evolving role of LLMs in education, highlighting their potential to revolutionize SQL learning while identifying areas for further refinement and improvement.
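The abstract does not publish Project 360's prompts or scoring rubric, so the sketch below only illustrates the general idea it describes: asking an LLM to judge the semantic distance between a student query and a reference query. The prompt wording, the gpt-4o model choice, the 0-to-1 distance scale, and the example schema are all assumptions for illustration, not the authors' implementation.

```python
# Illustrative sketch of LLM-judged query equivalence, in the spirit of the
# abstract above. All prompt text, the model name, and the scoring scale are
# assumptions, not the published Project 360 design.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

def semantic_distance(student_sql: str, reference_sql: str) -> float:
    """Ask the model how far the student query is from the reference.

    Returns a distance in [0, 1]: 0.0 means the queries return the same
    rows on every database instance; 1.0 means they are unrelated.
    """
    prompt = (
        "You are grading SQL. Two queries are equivalent if they return the "
        "same rows on every possible database instance.\n\n"
        f"Reference query:\n{reference_sql}\n\n"
        f"Student query:\n{student_sql}\n\n"
        "Reply with a single number between 0.0 (equivalent) and 1.0 "
        "(completely different), and nothing else."
    )
    response = client.chat.completions.create(
        model="gpt-4o",  # assumed model; any chat-capable model would do
        messages=[{"role": "user", "content": prompt}],
        temperature=0,   # keep grading as deterministic as the API allows
    )
    return float(response.choices[0].message.content.strip())

# A join and an equivalent subquery formulation should score near 0.0.
ref = "SELECT e.name FROM emp e JOIN dept d ON e.dept_id = d.id WHERE d.name = 'Sales'"
ans = "SELECT name FROM emp WHERE dept_id IN (SELECT id FROM dept WHERE name = 'Sales')"
print(semantic_distance(ans, ref))
```

In practice a grader would likely map the distance onto a rubric (e.g., full credit below some threshold) and fall back to human review whenever the model's reply cannot be parsed as a number.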
Award ID(s): 2410668
PAR ID: 10631964
Publisher / Repository: IEEE
Location: Changhua, Taiwan
Sponsoring Org: National Science Foundation
More Like this
  1. Mills, Caitlin; Alexandron, Giora; Taibi, Davide; Lo Bosco, Giosuè; Paquette, Luc (Ed.)
    Short answer assessment is a vital component of science education, allowing evaluation of students' complex three-dimensional understanding. Large language models (LLMs), which possess human-like ability on linguistic tasks, are increasingly popular in assisting human graders to reduce their workload. However, LLMs' limited domain knowledge restricts their understanding of task-specific requirements and hinders their ability to achieve satisfactory performance. Retrieval-augmented generation (RAG) emerges as a promising solution by enabling LLMs to access relevant domain-specific knowledge during assessment. In this work, we propose an adaptive RAG framework for automated grading that dynamically retrieves and incorporates domain-specific knowledge based on the question and student-answer context. Our approach combines semantic search and curated educational sources to retrieve valuable reference materials (a minimal sketch of this retrieve-then-grade loop appears after this list). Experimental results on a science education dataset demonstrate that our system improves grading accuracy compared to baseline LLM approaches. The findings suggest that RAG-enhanced grading systems can serve as reliable grading support while delivering meaningful efficiency gains.
  2. Despite strong evidence that dialog-based intelligent tutoring systems (ITS) can increase learning gains, few courses include these tutors. In this research, we posit that existing dialog-based tutoring systems are not widely used because they are too complex and unfamiliar for a typical teacher to adapt or augment. OpenTutor is an open-source research project intended to scale up dialog-based tutoring by enabling ordinary teachers to rapidly author and improve dialog-based ITS, where authoring is presented through familiar tasks such as assessment-item creation and grading. Formative usability results from a set of five non-CS educators are presented, which indicate that the OpenTutor system was relatively easy to use but that teachers would closely weigh the cost-benefit of time spent against student outcomes. Specifically, while OpenTutor grading was faster than expected, teachers reported that they would only spend additional time (compared to a multiple-choice item) if the content required deeper learning. To decrease the time needed to train answer classifiers, OpenTutor is investigating ways to reduce cold-start problems for tutoring dialogs.
  3. We present and evaluate a machine learning based system that automatically grades audio recordings of students speaking a foreign language. The use of automated systems to aid the assessment of student performance holds great promise in augmenting the teacher's ability to provide meaningful feedback and instruction. Teachers spend a significant amount of time grading student work; tools like these can return much of that time to them for giving personalized attention to each student. Significant prior research has focused on the grading of closed-form problems, open-ended essays, and textual content, but little has addressed audio content, which is far more prevalent in language-study education. In this paper, we explore the development of automated assessment tools for audio responses in a college-level Chinese language-learning course. We analyze several challenges faced while working with data of this type, as well as the generation and extraction of features for building machine learning models to aid in the assessment of student language learning (a typical feature-extraction step is sketched after this list).
  4. This research explores a novel human-in-the-loop approach that goes beyond traditional prompt engineering to harness Large Language Models (LLMs) with chain-of-thought prompting for grading middle school students' short-answer formative assessments in science and generating useful feedback. While recent efforts have successfully applied LLMs and generative AI to automatically grade assignments in secondary classrooms, the focus has primarily been on providing scores for mathematical and programming problems, with little work targeting the generation of actionable insight from student responses. This paper addresses these limitations by exploring a human-in-the-loop approach that makes the process more intuitive and more effective (the grade-then-review step is sketched after this list). By incorporating the expertise of educators, this approach seeks to bridge the gap between automated assessment and meaningful educational support in the context of science education for middle school students. We conducted a preliminary user study, which suggests that (1) co-created models improve the performance of formative feedback generation, and (2) educator insight can be integrated at multiple steps in the process to inform what goes into the model and what comes out. Our findings suggest that in-context learning and human-in-the-loop approaches may provide a scalable approach to automated grading, where the performance of the automated LLM-based grader continually improves over time while also producing actionable feedback that supports students' open-ended science learning.
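As noted in item 1, a minimal retrieve-then-grade loop in the spirit of that adaptive RAG framework could look like the following. The embedding model, toy corpus, prompt template, and 0-3 scale are illustrative assumptions; the paper's actual retrieval sources and prompts are not given in the abstract.

```python
# Minimal sketch of retrieval-augmented grading (item 1). Everything named
# here is an assumption for illustration, not the paper's implementation.
import numpy as np
from sentence_transformers import SentenceTransformer

encoder = SentenceTransformer("all-MiniLM-L6-v2")  # assumed embedding model

# A stand-in for the "curated educational sources" the abstract mentions.
corpus = [
    "Photosynthesis converts light energy into chemical energy stored in glucose.",
    "Cellular respiration releases energy by breaking down glucose with oxygen.",
    "Evaporation and condensation drive the water cycle.",
]
corpus_vecs = encoder.encode(corpus, normalize_embeddings=True)

def retrieve(question: str, answer: str, k: int = 2) -> list[str]:
    """Embed question + student answer and return the k most similar passages."""
    query_vec = encoder.encode([question + " " + answer], normalize_embeddings=True)
    scores = corpus_vecs @ query_vec[0]      # cosine similarity (unit vectors)
    top = np.argsort(scores)[::-1][:k]
    return [corpus[i] for i in top]

def build_grading_prompt(question: str, answer: str) -> str:
    """Assemble the retrieved passages into a grading prompt for an LLM."""
    context = "\n".join(retrieve(question, answer))
    return (
        f"Reference material:\n{context}\n\n"
        f"Question: {question}\nStudent answer: {answer}\n"
        "Grade the answer 0-3 against the reference material and justify briefly."
    )

print(build_grading_prompt(
    "Why do plants need sunlight?",
    "Sunlight gives plants the energy they use to make sugar.",
))
```

The "adaptive" part of the framework would decide per question whether and what to retrieve; the fixed top-k search above is the simplest baseline form of that step.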
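Item 3's feature-extraction step can be illustrated with a common baseline: MFCC statistics fed to a simple classifier. The abstract does not specify the authors' features or model, so the choices below (16 kHz mono audio, 13 MFCCs, logistic regression, placeholder file names and labels) are purely illustrative.

```python
# Sketch of audio feature extraction for grading spoken responses (item 3).
# MFCC statistics + logistic regression are a common baseline, used here
# only as an illustration of the pipeline the abstract describes.
import librosa
import numpy as np
from sklearn.linear_model import LogisticRegression

def audio_features(path: str) -> np.ndarray:
    """Load a recording and summarize it as a fixed-length MFCC statistic vector."""
    y, sr = librosa.load(path, sr=16000)               # mono, 16 kHz
    mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=13)  # shape (13, frames)
    # Mean and std over time yield a fixed-length vector per clip.
    return np.concatenate([mfcc.mean(axis=1), mfcc.std(axis=1)])

# Hypothetical training data: file paths and teacher-assigned pass/fail labels.
paths = ["student_01.wav", "student_02.wav"]           # placeholder file names
labels = [1, 0]
X = np.stack([audio_features(p) for p in paths])
clf = LogisticRegression(max_iter=1000).fit(X, labels)
print(clf.predict(X))
```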
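Finally, item 4's grade-then-review step might look like the sketch below. The rubric text, prompt wording, and accept/override workflow are assumptions; the paper's co-created models are not detailed in the abstract.

```python
# Sketch of human-in-the-loop, chain-of-thought grading (item 4). The rubric,
# prompt, and review workflow are illustrative assumptions only.

def cot_grading_prompt(question: str, rubric: str, answer: str) -> str:
    """Build a chain-of-thought grading prompt from an educator-authored rubric."""
    return (
        f"Question: {question}\n"
        f"Educator rubric:\n{rubric}\n"
        f"Student answer: {answer}\n\n"
        "Think step by step: restate what the rubric requires, check the "
        "answer against each requirement, then give a score and one sentence "
        "of actionable feedback the student can use."
    )

def review(llm_score: int, llm_feedback: str) -> tuple[int, str]:
    """Human-in-the-loop step: the educator accepts or overrides the LLM grade."""
    decision = input(f"LLM says {llm_score}: '{llm_feedback}'. Accept? [y/n] ")
    if decision.strip().lower() == "y":
        return llm_score, llm_feedback
    new_score = int(input("Corrected score: "))
    new_feedback = input("Corrected feedback: ")
    return new_score, new_feedback  # overrides can be logged to refine the prompt
```

The override log is where educator insight feeds back into the system: accumulated corrections can be folded into the rubric or into in-context examples, which is one plausible reading of the abstract's claim that the grader "continually improves over time."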