This content will become publicly available on July 14, 2026

Title: Are Large Language Models Smart Enough for SQL Tutoring and Assessment?
The rise of Large Language Models (LLMs) as powerful knowledge-processing tools has sparked a wave of innovation in tutoring and assessment systems. Despite their well-documented limitations, LLMs offer unique capabilities that have been effectively harnessed for automated feedback generation and grading in intelligent learning environments. In this paper, we introduce Project 360, an experimental intelligent tutoring system designed for teaching SQL. Project 360 leverages the concept of query equivalence to assess the accuracy of student queries, using ChatGPT’s advanced natural language analysis to measure their semantic distance from a reference query. By integrating LLM-driven evaluation, Project 360 significantly outperforms traditional SQL tutoring and grading systems, offering more precise assessments and context-aware feedback. This study explores the feasibility and limitations of using ChatGPT as the analytical backbone of Project 360, evaluating its reliability for autonomous tutoring and assessment in database education. Our findings provide valuable insights into the evolving role of LLMs in education, highlighting their potential to revolutionize SQL learning while identifying areas for further refinement and improvement.
Award ID(s):
2410668
PAR ID:
10631964
Author(s) / Creator(s):
; ;
Publisher / Repository:
IEEE
Date Published:
Format(s):
Medium: X
Location:
Changhua, Taiwan
Sponsoring Org:
National Science Foundation
More Like this
  1. Mills, Caitlin; Alexandron, Giora; Taibi, Davide; Lo Bosco, Giosuè; Paquette, Luc (Ed.)
    Short answer assessment is a vital component of science education, allowing evaluation of students' complex three-dimensional understanding. Large language models (LLMs) that possess human-like ability in linguistic tasks are increasingly popular in assisting human graders to reduce their workload. However, LLMs' limitations in domain knowledge restrict their understanding in task-specific requirements and hinder their ability to achieve satisfactory performance. Retrieval-augmented generation (RAG) emerges as a promising solution by enabling LLMs to access relevant domain-specific knowledge during assessment. In this work, we propose an adaptive RAG framework for automated grading that dynamically retrieves and incorporates domain-specific knowledge based on the question and student answer context. Our approach combines semantic search and curated educational sources to retrieve valuable reference materials. Experimental results in a science education dataset demonstrate that our system achieves an improvement in grading accuracy compared to baseline LLM approaches. The findings suggest that RAG-enhanced grading systems can serve as reliable support with efficient performance gains. 
  2. Despite strong evidence that dialog-based intelligent tutoring systems (ITS) can increase learning gains, few courses include these tutors. In this research, we posit that existing dialog-based tutoring systems are not widely used because they are too complex and unfamiliar for a typical teacher to adapt or augment. OpenTutor is an open-source research project intended to scale up dialog-based tutoring by enabling ordinary teachers to rapidly author and improve dialog-based ITS, where authoring is presented through familiar tasks such as assessment item creation and grading. Formative usability results from a set of five non-CS educators are presented, which indicate that the OpenTutor system was relatively easy to use but that teachers would closely weigh the time cost against student outcomes. Specifically, while OpenTutor grading was faster than expected, teachers reported that they would spend additional time (compared to a multiple-choice item) only if the content required deeper learning. To decrease the time needed to train answer classifiers, OpenTutor is investigating ways to reduce cold-start problems for tutoring dialogs. 
  3. We present and evaluate a machine-learning-based system that automatically grades audio recordings of students speaking a foreign language. The use of automated systems to aid the assessment of student performance holds great promise in augmenting the teacher’s ability to provide meaningful feedback and instruction to students. Teachers spend a significant amount of time grading student work, and these tools can reduce that time substantially. This additional time could be used to give personalized attention to each student. Significant prior research has focused on the grading of closed-form problems, open-ended essays, and textual content. However, little research has focused on audio content, which is much more prevalent in language-study education. In this paper, we explore the development of automated assessment tools for audio responses in a college-level Chinese language-learning course. We analyze several challenges faced while working with data of this type as well as the generation and extraction of features for the purpose of building machine learning models to aid in the assessment of student language learning. 
  4. We present and evaluate a machine-learning-based system that automatically grades audio recordings of students speaking a foreign language. The use of automated systems to aid the assessment of student performance holds great promise in augmenting the teacher’s ability to provide meaningful feedback and instruction to students. Teachers spend a significant amount of time grading student work, and these tools can reduce that time substantially. This additional time could be used to give personalized attention to each student. Significant prior research has focused on the grading of closed-form problems, open-ended essays, and textual content. However, little research has focused on audio content, which is much more prevalent in language-study education. In this paper, we explore the development of automated assessment tools for audio responses in a college-level Chinese language-learning course. We analyze several challenges faced while working with data of this type as well as the generation and extraction of features for the purpose of building machine learning models to aid in the assessment of student language learning. 
  5. As large language models (LLMs) continue to evolve, their capacity to act as surrogates for humans is also improving. As increasing numbers of intelligent tutoring systems (ITSs) embrace the integration of LLMs for digital tutoring, questions are arising as to how effective they are and whether their hallucinatory behaviors diminish their perceived advantages. One critical question that is seldom asked is whether the availability, plurality, and relative weaknesses in the reasoning process of LLMs are contributing to the much-discussed digital divide and to problems of equity and fairness in online learning. In this paper, we present an experiment with database design theory assignments and demonstrate that while their capacity to reason logically is improving, LLMs are still prone to serious errors. We demonstrate that in online learning and in the absence of a human instructor, LLMs could introduce inequity into increasingly popular digital learning in the form of “wrongful” tutoring, which we call ignorant bias, that could be devastatingly harmful for learners. We also show that significant challenges remain for STEM subjects, especially for subjects for which sound and free online tutoring systems exist. Based on the set of use cases, we formulate a possible direction for an effective ITS for online database learning classes of the future. 