This content will become publicly available on July 17, 2026

Title: Systematically Identifying, Defining and Organizing Knowledge Components for Data Science Problem Solving through Human-LLM Collaboration
As demand grows for job-ready data science professionals, there is increasing recognition that traditional training often falls short in cultivating the higher-order reasoning and real-world problem-solving skills essential to the field. A foundational step toward addressing this gap is the identification and organization of knowledge components (KCs) that underlie data science problem solving (DSPS). KCs represent conditional knowledge (knowing about appropriate actions given particular contexts or conditions) and correspond to the critical decisions data scientists must make throughout the problem-solving process. While existing taxonomies in data science education support curriculum development, they often lack the granularity and focus needed to support the assessment and development of DSPS skills. In this paper, we present a novel framework that combines the strengths of large language models (LLMs) and human expertise to identify, define, and organize KCs specific to DSPS. We treat LLMs as "knowledge engineering assistants" capable of generating candidate KCs by drawing on their extensive training data, which includes a vast amount of domain knowledge and diverse sets of real-world DSPS cases. Our process involves prompting multiple LLMs to generate decision points, synthesizing and refining KC definitions across models, and using sentence-embedding models to infer the underlying structure of the resulting taxonomy. Human experts then review and iteratively refine the taxonomy to ensure validity. This human-AI collaborative workflow offers a scalable and efficient proof-of-concept for LLM-assisted knowledge engineering. The resulting KC taxonomy lays the groundwork for developing fine-grained assessment tools and adaptive learning systems that support deliberate practice in DSPS. Furthermore, the framework illustrates the potential of LLMs not just as content generators but as partners in structuring domain knowledge to inform instructional design.
Future work will involve extending the framework by generating a directed graph of KCs based on their input-output dependencies and validating the taxonomy through expert consensus and learner studies. This approach contributes to both the practical advancement of DSPS coaching in data science education and the broader methodological toolkit for AI-supported knowledge engineering.
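The abstract mentions using sentence-embedding models to infer the structure of the taxonomy but does not say how. As a rough illustration only, the sketch below groups KC definitions whose embedding vectors are close in cosine similarity. Everything in it is an assumption made for the example: the KC wording, the toy 3-dimensional vectors standing in for real sentence embeddings, the `group_by_similarity` helper, and the 0.9 threshold. It is not the authors' procedure.

```python
import math

# Hypothetical KC definitions mapped to toy 3-d vectors. These stand in for
# sentence embeddings; a real run would obtain them from an embedding model.
kc_vectors = {
    "Choose an imputation strategy for missing values": [0.9, 0.1, 0.0],
    "Decide whether to drop incomplete records": [0.8, 0.2, 0.1],
    "Select an evaluation metric for the model": [0.1, 0.9, 0.2],
    "Choose a validation scheme": [0.2, 0.8, 0.3],
}


def cosine(u, v):
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm


def group_by_similarity(vectors, threshold=0.9):
    """Greedy single-pass grouping (an assumed, simplified technique).

    Each KC joins the first existing group whose first member is at least
    `threshold` similar to it; otherwise it starts a new group.
    """
    groups = []
    for name, vec in vectors.items():
        for group in groups:
            if cosine(vec, vectors[group[0]]) >= threshold:
                group.append(name)
                break
        else:
            groups.append([name])
    return groups


groups = group_by_similarity(kc_vectors)
for i, group in enumerate(groups, start=1):
    print(f"Candidate category {i}: {group}")
```

With these toy vectors the two data-cleaning decisions land in one candidate category and the two evaluation decisions in another. In a workflow like the one described, such groupings would only be a starting point for expert review.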
Award ID(s):
2429590
PAR ID:
10616438
Author(s) / Creator(s):
; ; ;
Publisher / Repository:
ACM
Date Published:
ISBN:
9798400712913
Page Range / eLocation ID:
341 to 345
Subject(s) / Keyword(s):
Domain Analysis, Knowledge Engineering, Large Language Models, Knowledge Components, Data Science Education, Data Science Problem Solving, Problem Solving
Format(s):
Medium: X
Location:
Palermo Italy
Sponsoring Org:
National Science Foundation
More Like this
  1. Knowledge components (KCs) have many applications. In computing education, knowing the demonstration of specific KCs has been challenging. This paper introduces an entirely data-driven approach for (i) discovering KCs and (ii) demonstrating KCs, using students’ actual code submissions. Our system is based on two expected properties of KCs: (i) generate learning curves following the power law of practice, and (ii) are predictive of response correctness. We train a neural architecture (named KC-Finder) that classifies the correctness of student code submissions and captures problem-KC relationships. Our evaluation on data from 351 students in an introductory Java course shows that the learned KCs can generate reasonable learning curves and predict code submission correctness. At the same time, some KCs can be interpreted to identify programming skills. We compare the learning curves described by our model to four baselines, showing that (i) identifying KCs with naive methods is a difficult task and (ii) our learning curves exhibit a substantially better curve fit. Our work represents a first step in solving the data-driven KC discovery problem in computing education. 
  2. Feng, Mingyu; Käser, Tanja; Talukdar, Partha (Ed.)
    Knowledge components (KCs) have many applications. In computing education, knowing the demonstration of specific KCs has been challenging. This paper introduces an entirely data-driven approach for (i) discovering KCs and (ii) demonstrating KCs, using students' actual code submissions. Our system is based on two expected properties of KCs: (i) generate learning curves following the power law of practice, and (ii) are predictive of response correctness. We train a neural architecture (named KC-Finder) that classifies the correctness of student code submissions and captures problem-KC relationships. Our evaluation on data from 351 students in an introductory Java course shows that the learned KCs can generate reasonable learning curves and predict code submission correctness. At the same time, some KCs can be interpreted to identify programming skills. We compare the learning curves described by our model to four baselines, showing that (i) identifying KCs with naive methods is a difficult task and (ii) our learning curves exhibit a substantially better curve fit. Our work represents a first step in solving the data-driven KC discovery problem in computing education. 
  3. Mills, Caitlin; Alexandron, Giora; Taibi, Davide; Lo_Bosco, Giosuè; Paquette, Luc (Ed.)
    Short answer assessment is a vital component of science education, allowing evaluation of students' complex three-dimensional understanding. Large language models (LLMs) that possess human-like ability in linguistic tasks are increasingly popular in assisting human graders to reduce their workload. However, LLMs' limitations in domain knowledge restrict their understanding in task-specific requirements and hinder their ability to achieve satisfactory performance. Retrieval-augmented generation (RAG) emerges as a promising solution by enabling LLMs to access relevant domain-specific knowledge during assessment. In this work, we propose an adaptive RAG framework for automated grading that dynamically retrieves and incorporates domain-specific knowledge based on the question and student answer context. Our approach combines semantic search and curated educational sources to retrieve valuable reference materials. Experimental results in a science education dataset demonstrate that our system achieves an improvement in grading accuracy compared to baseline LLM approaches. The findings suggest that RAG-enhanced grading systems can serve as reliable support with efficient performance gains. 
  4. Iskander, Magdy F (Ed.)
    Innovative technology helps students foster creative thinking and problem‐solving abilities by augmenting human sensing and enriching input and output information. New technology can incorporate haptic sensing features, a sensing modality for user operations. Learning with haptic sensing features promises new ways to master cognitive and motor skills and higher‐order cognitive reasoning tasks (e.g., decision‐making and problem‐solving). This study conceptualizes haptic technology within the human‐technology interaction (HTI) framework. It aims to investigate the components of haptic systems to define their impact on learning and facilitate understanding of haptic technology, including application development to ease entry barriers for educators. The research builds a haptic HTI framework based on a systematic literature review on haptic applications in engineering learning over the last two decades. The review utilizes the SALSA methodology to analyze relevant studies comprehensively. The framework outcome is a haptic HTI taxonomy to build visual representations of the explicit connection between the taxonomy components and practical educational applications (by means of heatmaps). The approach led to a robust conceptualization of HTI into a taxonomy, a structured framework encompassing categories for interaction modalities, immersive technologies, and learning methodologies in engineering education. The model assists in understanding how haptic feedback can be utilized in learning with technology experiences. Applying haptic technology in engineering education includes mastering fundamental science concepts and creating customized haptic prototypes for engineering processes. A growing trend focuses on wearable haptics, such as gloves and vests, which involve kinesthetic movement, fine motor skills, and spatial awareness, all fostering spatial and temporal cognitive abilities (the ability to effectively manage and comprehend significant amounts of spatial (how design components or resources are related to one another in the 3D space) and temporal (the logic in a process, such as the order, sequences, and hierarchies of the resources) information). The haptic human‐technology interaction (H‐HTI) framework guides future research in developing cognitive reasoning through H‐HTI, unlocking new frontiers in engineering education.
  5. Education is poised for a transformative shift with the advent of neurosymbolic artificial intelligence (NAI), which will redefine how we support deeply adaptive and personalized learning experiences. The integration of Knowledge Graphs (KGs) with Large Language Models (LLMs), a significant and popular form of NAI, presents a promising avenue for advancing personalized instruction via neurosymbolic educational agents. By leveraging structured knowledge, these agents can provide individualized learning experiences that align with specific learner preferences and desired learning paths, while also mitigating biases inherent in traditional AI systems. NAI-powered education systems will be capable of interpreting complex human concepts and contexts while employing advanced problem-solving strategies, all grounded in established pedagogical frameworks. In this paper, we propose a system that leverages the unique affordances of KGs, LLMs, and pedagogical agents – embodied characters designed to enhance learning – as critical components of a hybrid NAI architecture. We discuss the rationale for our system design and the preliminary findings of our work. We conclude that education in the era of NAI will make learning more accessible, equitable, and aligned with real-world skills. This is an era that will explore a new depth of understanding in educational tools. 