Title: KC-Finder: Automated Knowledge Component Discovery for Programming Problems
Knowledge components (KCs) have many applications. In computing education, however, identifying which KCs a given programming problem demonstrates has been challenging. This paper introduces an entirely data-driven approach for (i) discovering KCs and (ii) demonstrating KCs, using students’ actual code submissions. Our system is based on two expected properties of KCs: (i) they generate learning curves that follow the power law of practice, and (ii) they are predictive of response correctness. We train a neural architecture (named KC-Finder) that classifies the correctness of student code submissions and captures problem-KC relationships. Our evaluation on data from 351 students in an introductory Java course shows that the learned KCs can generate reasonable learning curves and predict code submission correctness. At the same time, some KCs can be interpreted to identify programming skills. We compare the learning curves produced by our model to four baselines, showing that (i) identifying KCs with naive methods is a difficult task and (ii) our learning curves exhibit a substantially better curve fit. Our work represents a first step toward solving the data-driven KC discovery problem in computing education.
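The power law of practice invoked in the abstract can be checked directly: a KC's error rate is modeled as a power function of the number of practice opportunities, E(t) = a * t^(-b), which becomes a straight line in log-log space. A minimal sketch, with synthetic data standing in for real submission logs:

```python
import math

def fit_power_law(opportunities, error_rates):
    """Fit E(t) = a * t**(-b) by least squares in log-log space.
    Returns (a, b); a steeper b means faster learning on that KC."""
    xs = [math.log(t) for t in opportunities]
    ys = [math.log(e) for e in error_rates]
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    slope = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
             / sum((x - mx) ** 2 for x in xs))
    intercept = my - slope * mx
    return math.exp(intercept), -slope  # a, b

# Synthetic learning curve: error rate decays with practice opportunities.
opps = [1, 2, 3, 4, 5, 6, 7, 8]
errs = [0.80 * t ** -0.5 for t in opps]
a, b = fit_power_law(opps, errs)
print(round(a, 2), round(b, 2))  # → 0.8 0.5
```

In an evaluation like the paper's, the goodness of this fit (per discovered KC) is what separates well-formed KCs from noise.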
Award ID(s):
2013502
PAR ID:
10525860
Author(s) / Creator(s):
Publisher / Repository:
Springer
Date Published:
Format(s):
Medium: X
Location:
In Proceedings of the 16th International Conference on Educational Data Mining (EDM).
Sponsoring Org:
National Science Foundation
More Like This
  1. Knowledge tracing is a method to model students’ knowledge and enable personalized education in many STEM disciplines such as mathematics and physics, but it has so far remained a challenging task in computing disciplines. One key obstacle to successful knowledge tracing in computing education lies in the accurate extraction of knowledge components (KCs), since multiple intertwined KCs are practiced at the same time in programming problems. In this paper, we address the limitations of current methods and explore a hybrid approach for KC extraction, which combines automated code parsing with an expert-built ontology. We use an introductory (CS1) Java benchmark dataset to compare its KC extraction performance with that of traditional extraction methods, using a state-of-the-art evaluation approach based on learning curves. Our preliminary results show considerable improvement over traditional methods of student modeling. The results indicate the opportunity to improve automated KC extraction in CS education by incorporating expert knowledge into the process.
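The code-parsing half of a hybrid extraction step like the one above can be sketched as an AST walk. The paper parses Java; the toy below uses Python's `ast` module as a stand-in, with a hypothetical mini-ontology mapping syntax constructs to KC labels:

```python
import ast

# Hypothetical mini-ontology from syntax constructs to KC labels;
# the actual approach uses an expert-built ontology over Java constructs.
ONTOLOGY = {
    ast.For: "loops",
    ast.While: "loops",
    ast.If: "conditionals",
    ast.FunctionDef: "function-definition",
}

def extract_kcs(source: str) -> set:
    """Return the set of KC labels exercised by a code submission."""
    tree = ast.parse(source)
    return {ONTOLOGY[type(node)]
            for node in ast.walk(tree)
            if type(node) in ONTOLOGY}

submission = """
def count_evens(nums):
    total = 0
    for n in nums:
        if n % 2 == 0:
            total += 1
    return total
"""
print(sorted(extract_kcs(submission)))
# → ['conditionals', 'function-definition', 'loops']
```

The resulting problem-KC sets are what a learning-curve evaluation would then score against student performance data.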
  2. This work evaluates deep knowledge tracing (DKT) models’ ability to track individual knowledge components (KCs) in programming tasks. It proposes two enhancements, adding an explicit KC layer and incorporating code features, and shows that the KC layer yields modest improvements in KC-level interpretability, especially when tracking incorrect submissions.
  3. As demand grows for job-ready data science professionals, there is increasing recognition that traditional training often falls short in cultivating the higher-order reasoning and real-world problem-solving skills essential to the field. A foundational step toward addressing this gap is the identification and organization of knowledge components (KCs) that underlie data science problem solving (DSPS). KCs represent conditional knowledge—knowing about appropriate actions given particular contexts or conditions—and correspond to the critical decisions data scientists must make throughout the problem-solving process. While existing taxonomies in data science education support curriculum development, they often lack the granularity and focus needed to support the assessment and development of DSPS skills. In this paper, we present a novel framework that combines the strengths of large language models (LLMs) and human expertise to identify, define, and organize KCs specific to DSPS. We treat LLMs as "knowledge engineering assistants" capable of generating candidate KCs by drawing on their extensive training data, which includes a vast amount of domain knowledge and diverse sets of real-world DSPS cases. Our process involves prompting multiple LLMs to generate decision points, synthesizing and refining KC definitions across models, and using sentence-embedding models to infer the underlying structure of the resulting taxonomy. Human experts then review and iteratively refine the taxonomy to ensure validity. This human-AI collaborative workflow offers a scalable and efficient proof-of-concept for LLM-assisted knowledge engineering. The resulting KC taxonomy lays the groundwork for developing fine-grained assessment tools and adaptive learning systems that support deliberate practice in DSPS. Furthermore, the framework illustrates the potential of LLMs not just as content generators but as partners in structuring domain knowledge to inform instructional design.
Future work will involve extending the framework by generating a directed graph of KCs based on their input-output dependencies and validating the taxonomy through expert consensus and learner studies. This approach contributes to both the practical advancement of DSPS coaching in data science education and the broader methodological toolkit for AI-supported knowledge engineering. 
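The embedding-based structuring step above can be illustrated with toy vectors standing in for sentence-embedding output (the actual pipeline embeds LLM-generated KC definitions with a sentence-embedding model). A minimal greedy grouping sketch, where a KC joins an existing cluster if it is similar enough to that cluster's first member:

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv)

def group_kcs(kc_vectors, threshold=0.9):
    """Greedy single-pass grouping of KC definitions by embedding similarity."""
    clusters = []  # list of (representative_vector, [kc_names])
    for name, vec in kc_vectors.items():
        for rep, members in clusters:
            if cosine(rep, vec) >= threshold:
                members.append(name)
                break
        else:
            clusters.append((vec, [name]))
    return [members for _, members in clusters]

# Toy 3-d "embeddings" for hypothetical DSPS decision points.
kcs = {
    "choose train/test split":  (0.90, 0.10, 0.00),
    "select validation scheme": (0.85, 0.15, 0.05),
    "handle missing values":    (0.10, 0.90, 0.10),
}
print(group_kcs(kcs))
# → [['choose train/test split', 'select validation scheme'], ['handle missing values']]
```

The recovered groups are then what human experts review and refine into a validated taxonomy.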
  4. Novice programmers can greatly improve their understanding of challenging programming concepts by studying worked examples that demonstrate the implementation of these concepts. Despite the extensive repositories of effective worked examples created by CS education experts, a key challenge remains: identifying the most relevant worked example for a given programming problem and the specific difficulties a student faces solving the problem. Previous studies have explored similar example recommendation approaches. Our research introduces a novel method by utilizing deep learning code representation models to generate code vectors, capturing both syntactic and semantic similarities among programming examples. Driven by the need to provide relevant and personalized examples to programming students, our approach emphasizes similarity assessment and clustering techniques to identify similar code problems, examples, and challenges. This method aims to deliver more accurate and contextually relevant recommendations based on individual learning needs. Providing tailored support to students in real-time facilitates better problem-solving strategies and enhances students' learning experiences, contributing to the advancement of programming education. 
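The recommendation idea above reduces to nearest-neighbor search over code vectors. A sketch with hypothetical 2-d vectors standing in for embeddings from a deep code-representation model:

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv)

def recommend(problem_vec, example_vecs, k=2):
    """Rank worked examples by similarity to the student's problem; return top k."""
    ranked = sorted(example_vecs,
                    key=lambda name: cosine(problem_vec, example_vecs[name]),
                    reverse=True)
    return ranked[:k]

# Hypothetical code vectors; a real system would embed each snippet with a
# deep code-representation model capturing syntax and semantics.
examples = {
    "nested-loop example":  (1.0, 0.0),
    "recursion example":    (0.0, 1.0),
    "loop-with-if example": (0.8, 0.2),
}
problem = (0.9, 0.1)
print(recommend(problem, examples))
# → ['nested-loop example', 'loop-with-if example']
```

Clustering the same vectors (rather than ranking per query) would give the problem/example groupings the abstract mentions.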
  5.
    We analyze the submissions of 286 students as they solved Structured Query Language (SQL) homework assignments for an upper-level databases course. Databases and the ability to query them are becoming increasingly essential for not only computer scientists but also business professionals, scientists, and anyone who needs to make data-driven decisions. Despite the increasing importance of SQL and databases, little research has documented student difficulties in learning SQL. We replicate and extend prior studies of students' difficulties with learning SQL. Students worked on and submitted their homework through an online learning management system with support for autograding of code. Students received immediate feedback on the correctness of their solutions and had approximately a week to finish writing eight to ten queries. We categorized student submissions by the type of error, or lack thereof, that students made, and whether the student was eventually able to construct a correct query. Like prior work, we find that the majority of student mistakes are syntax errors. In contrast with the conclusions of prior work, we find that some students are never able to resolve these syntax errors to create valid queries. Additionally, we find that students struggle the most when they need to write SQL queries related to GROUP BY and correlated subqueries. We suggest implications for instruction and future research. 
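The two constructs students struggled with most can be shown on a toy table. The sketch below runs a GROUP BY aggregate and a correlated subquery against an in-memory SQLite database (table and data are hypothetical, not from the study):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE orders (id INTEGER, customer TEXT, amount REAL);
INSERT INTO orders VALUES
  (1, 'ada', 30.0), (2, 'ada', 70.0),
  (3, 'bob', 20.0), (4, 'bob', 10.0);
""")

# GROUP BY: one aggregated row per customer.
totals = conn.execute(
    "SELECT customer, SUM(amount) FROM orders "
    "GROUP BY customer ORDER BY customer").fetchall()
print(totals)  # → [('ada', 100.0), ('bob', 30.0)]

# Correlated subquery: the inner query re-runs per outer row,
# comparing each order against that customer's maximum.
largest = conn.execute("""
    SELECT o.id, o.customer, o.amount
    FROM orders o
    WHERE o.amount = (SELECT MAX(i.amount) FROM orders i
                      WHERE i.customer = o.customer)
    ORDER BY o.customer""").fetchall()
print(largest)  # → [(2, 'ada', 70.0), (3, 'bob', 20.0)]
```

The per-row re-evaluation in the correlated case is exactly the mental model the study suggests students lack.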