Title: KC-Finder: Automated Knowledge Component Discovery for Programming Problems
Knowledge components (KCs) have many applications. In computing education, identifying when students demonstrate specific KCs has been challenging. This paper introduces an entirely data-driven approach for (i) discovering KCs and (ii) identifying where they are demonstrated, using students’ actual code submissions. Our system is based on two expected properties of KCs: they (i) generate learning curves that follow the power law of practice and (ii) are predictive of response correctness. We train a neural architecture (named KC-Finder) that classifies the correctness of student code submissions and captures problem-KC relationships. Our evaluation on data from 351 students in an introductory Java course shows that the learned KCs can generate reasonable learning curves and predict code submission correctness. At the same time, some KCs can be interpreted to identify programming skills. We compare the learning curves described by our model to four baselines, showing that (i) identifying KCs with naive methods is a difficult task and (ii) our learning curves exhibit a substantially better curve fit. Our work represents a first step in solving the data-driven KC discovery problem in computing education.
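The first expected property above, learning curves that follow the power law of practice, can be illustrated with a small sketch. This is not KC-Finder itself or the paper's evaluation code. It only assumes the common form error(t) = a * t^(-b), where t is the practice opportunity, and fits it by ordinary least squares in log-log space. The function name `fit_power_law` and the example error rates are hypothetical.

```python
import math


def fit_power_law(error_rates):
    """Fit error(t) = a * t**(-b) to per-opportunity error rates.

    error_rates[i] is the observed error rate at practice opportunity i + 1.
    Returns (a, b, r2), where r2 is the R^2 of the fit in log-log space.
    Assumption: a plain least-squares fit, chosen here for illustration only.
    """
    n = len(error_rates)
    if n < 2:
        raise ValueError("need at least two practice opportunities")

    # A power law is a straight line in log-log space:
    # log(error) = log(a) - b * log(t)
    xs = [math.log(t) for t in range(1, n + 1)]
    # Floor the rates at a small value so log() is defined for zero error.
    ys = [math.log(max(e, 1e-6)) for e in error_rates]

    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))

    slope = sxy / sxx
    intercept = mean_y - slope * mean_x

    # Goodness of fit: how much log-error variance the line explains.
    ss_res = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    r2 = 1.0 - ss_res / ss_tot if ss_tot > 0 else 0.0

    return math.exp(intercept), -slope, r2


# Hypothetical error rates for one candidate KC over eight opportunities.
observed = [0.62, 0.45, 0.36, 0.31, 0.27, 0.25, 0.23, 0.21]
a, b, r2 = fit_power_law(observed)
print(f"a={a:.3f}, b={b:.3f}, R^2={r2:.3f}")
```

Under this reading, a candidate KC whose error rates decline with a positive b and a high R^2 behaves like a learned skill, while a flat or noisy curve suggests the grouping of problems does not correspond to a single KC.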
Award ID(s):
2013502
PAR ID:
10525860
Publisher / Repository:
Springer
Date Published:
Format(s):
Medium: X
Location:
In Proceedings of the 16th International Conference on Educational Data Mining (EDM).
Sponsoring Org:
National Science Foundation
More Like this
  1.
    We analyze the submissions of 286 students as they solved Structured Query Language (SQL) homework assignments for an upper-level databases course. Databases and the ability to query them are becoming increasingly essential for not only computer scientists but also business professionals, scientists, and anyone who needs to make data-driven decisions. Despite the increasing importance of SQL and databases, little research has documented student difficulties in learning SQL. We replicate and extend prior studies of students' difficulties with learning SQL. Students worked on and submitted their homework through an online learning management system with support for autograding of code. Students received immediate feedback on the correctness of their solutions and had approximately a week to finish writing eight to ten queries. We categorized student submissions by the type of error, or lack thereof, that students made, and whether the student was eventually able to construct a correct query. Like prior work, we find that the majority of student mistakes are syntax errors. In contrast with the conclusions of prior work, we find that some students are never able to resolve these syntax errors to create valid queries. Additionally, we find that students struggle the most when they need to write SQL queries related to GROUP BY and correlated subqueries. We suggest implications for instruction and future research. 
  2. Research spanning nearly a century has found that math plays an important role in the learning of chemistry. Here, we use a large dataset of student interactions with online courseware to investigate the details of this link between math and chemistry. The activities in the courseware are labeled against a list of knowledge components (KCs) covered by the content, and student interactions are tracked over a full semester of general chemistry at a range of institutions. Logistic regression is used to model student performance as a function of the number of opportunities a student has taken to engage with a particular KC. This regression analysis generates estimates of both the initial knowledge and the learning rate for each student and each KC. Consistent with results from other domains, the initial knowledge varies substantially across students, but the learning rate is nearly the same for all students. The role of math is investigated by labeling each KC with the level of math involved. The overwhelming result from regressions based on these labels is that only the initial knowledge varies strongly across students and across the level of math involved in a particular topic. The student learning rate is nearly independent of both the level of math involved in a KC and the prior mathematical preparation of an individual student. The observation that the primary challenge for students lies in initial knowledge, rather than learning rate, may have implications for course and curriculum design. 
  3.
    Educational content labeled with proper knowledge components (KCs) is particularly useful to teachers or content organizers. However, manually labeling educational content is labor intensive and error-prone. To address this challenge, prior research proposed machine learning based solutions to auto-label educational content with limited success. In this work, we significantly improve prior research by (1) expanding the input types to include KC descriptions, instructional video titles, and problem descriptions (i.e., three types of prediction task), (2) doubling the granularity of the prediction from 198 to 385 KC labels (i.e., more practical setting but much harder multinomial classification problem), (3) improving the prediction accuracies by 0.5–2.3% using Task-adaptive Pre-trained BERT, outperforming six baselines, and (4) proposing a simple evaluation measure by which we can recover 56–73% of mispredicted KC labels. All code and data sets in the experiments are available at: https://github.com/tbs17/TAPT-BERT
  4.
    We describe a data mining pipeline to convert data from educational systems into knowledge component (KC) models. In contrast to other approaches, our approach employs and compares multiple model search methodologies (e.g., sparse factor analysis, covariance clustering) within a single pipeline. In this preliminary work, we describe our approach's results on two datasets when using two model search methodologies for inferring item or KC relations (i.e., implied transfer). The first method uses item covariances, which are clustered to determine related KCs, and the second method uses sparse factor analysis to derive the relationship matrix for clustering. We evaluate these methods on data from experimentally controlled practice of statistics items as well as data from the Andes physics system. We explain our plans to upgrade our pipeline to include additional methods of finding item relationships and creating domain models. We discuss advantages of improving the domain model that go beyond model fit, including the fact that models with clustered item KCs result in performance predictions transferring between KCs, enabling the learning system to be more adaptive and better able to track student knowledge.