This content will become publicly available on November 2, 2025

Title: Using LLM-based Filtering to Develop Reliable Coding Schemes for Rare Debugging Strategies
Identifying and annotating student use of debugging strategies when solving computer programming problems can be a meaningful tool for studying and better understanding the development of debugging skills, which may lead to the design of effective pedagogical interventions. However, this process can be challenging when dealing with large datasets, especially when the strategies of interest are rare but important. This difficulty lies not only in the scale of the dataset but also in operationalizing these rare phenomena within the data. Operationalization requires annotators to first define how these rare phenomena manifest in the data and then obtain a sufficient number of positive examples to validate that this definition is reliable by accurately measuring Inter-Rater Reliability (IRR). This paper presents a method that leverages Large Language Models (LLMs) to efficiently exclude computer programming episodes that are unlikely to exhibit a specific debugging strategy. By using LLMs to filter out irrelevant programming episodes, this method focuses human annotation efforts on the most pertinent parts of the dataset, enabling experts to operationalize the coding scheme and reach IRR more efficiently.
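The abstract outlines a filter-then-annotate pipeline. Below is a minimal sketch of what such LLM-based pre-filtering could look like; the prompt wording, model name, and OpenAI-style client are assumptions for illustration, not the authors' implementation.

```python
# Hypothetical sketch of LLM-based pre-filtering of programming episodes.
# The model name, prompt wording, and OpenAI-style client are assumptions;
# the paper's actual prompts, model, and thresholds are not reproduced here.
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

PROMPT = (
    "You will see a student's sequence of code edits and test runs.\n"
    "Answer YES only if the episode could plausibly contain the debugging "
    "strategy of interest; otherwise answer NO.\n\n"
    "Episode:\n{episode}"
)

def might_contain_strategy(episode_text: str) -> bool:
    """Ask the LLM whether an episode is worth sending to human annotators."""
    response = client.chat.completions.create(
        model="gpt-4o-mini",  # placeholder model name
        messages=[{"role": "user", "content": PROMPT.format(episode=episode_text)}],
        temperature=0,
    )
    return response.choices[0].message.content.strip().upper().startswith("YES")

def filter_episodes(episodes: list[str]) -> list[str]:
    """Keep only episodes the LLM flags as potentially relevant, so human
    coders can operationalize the strategy and measure IRR on a smaller set."""
    return [ep for ep in episodes if might_contain_strategy(ep)]
```

Since the point is to concentrate annotation effort rather than to annotate automatically, a filter like this would typically be tuned for high recall: discarded episodes are never seen by human coders, so false negatives are more costly than false positives.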
Award ID(s):
1942962
PAR ID:
10576099
Author(s) / Creator(s):
; ; ;
Editor(s):
Kim, Yoon_Jeon; Swiecki, Zachari
Publisher / Repository:
Springer Nature
Date Published:
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. Human-conducted rating tasks are resource-intensive and demand significant time and financial commitments. As Large Language Models (LLMs) like GPT emerge and exhibit prowess across various domains, their potential in automating such evaluation tasks becomes evident. In this research, we leveraged four prominent LLMs: GPT-4, GPT-3.5, Vicuna, and PaLM 2, to scrutinize their aptitude in evaluating teacher-authored mathematical explanations. We utilized a detailed rubric that encompassed accuracy, explanation clarity, the correctness of mathematical notation, and the efficacy of problem-solving strategies. During our investigation, we unexpectedly discerned the influence of HTML formatting on these evaluations. Notably, GPT-4 consistently favored explanations formatted with HTML, whereas the other models displayed mixed inclinations. When gauging Inter-Rater Reliability (IRR) among these models, only Vicuna and PaLM 2 demonstrated high IRR using the conventional Cohen’s Kappa metric for explanations formatted with HTML. Intriguingly, when a more relaxed version of the metric was applied, all model pairings showcased robust agreement. These revelations not only underscore the potential of LLMs in providing feedback on student-generated content but also illuminate new avenues, such as reinforcement learning, which can harness the consistent feedback from these models. 
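As one concrete illustration of the agreement measures discussed above, the snippet below computes a conventional Cohen's Kappa and a linearly weighted ("relaxed") Kappa between two model raters on a toy 1-to-5 rubric; the scores are invented, and the weighted Kappa is only a stand-in for whatever relaxed metric the study actually used.

```python
# Toy inter-rater reliability between two LLM raters on a 1-5 rubric.
# The scores are invented; the linearly weighted Kappa is only an
# illustrative stand-in for the paper's "relaxed" agreement metric.
from sklearn.metrics import cohen_kappa_score

gpt4_scores  = [5, 4, 3, 5, 2, 4, 4, 1]
palm2_scores = [5, 3, 3, 4, 2, 4, 5, 1]

exact_kappa   = cohen_kappa_score(gpt4_scores, palm2_scores)                    # conventional Cohen's Kappa
relaxed_kappa = cohen_kappa_score(gpt4_scores, palm2_scores, weights="linear")  # partial credit for near-misses

print(f"exact kappa:   {exact_kappa:.2f}")
print(f"relaxed kappa: {relaxed_kappa:.2f}")
```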
  2. Code Large Language Models (Code LLMs) are being increasingly employed in real-life applications, so evaluating them is critical. While the conventional accuracy evaluates the performance of Code LLMs on a set of individual tasks, their self-consistency across different tasks is overlooked. Intuitively, a trustworthy model should be self-consistent when generating natural language specifications for its own code and generating code for its own specifications. Failure to preserve self-consistency reveals a lack of understanding of the shared semantics underlying natural language and programming language, and therefore undermines the trustworthiness of a model. In this paper, we first formally define the self-consistency of Code LLMs and then design a framework, IdentityChain, which effectively and efficiently evaluates the self-consistency and conventional accuracy of a model at the same time. We study eleven Code LLMs and show that they fail to preserve self-consistency, which is indeed a distinct aspect from conventional accuracy. Furthermore, we show that IdentityChain can be used as a model debugging tool to expose weaknesses of Code LLMs by demonstrating three major weaknesses that we identify in current models using IdentityChain. Our code is available at https://github.com/marcusm117/IdentityChain. 
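A rough skeleton of the code-to-spec-to-code self-consistency loop described above is sketched below; the `spec_to_code`, `code_to_spec`, and `passes_tests` callables are hypothetical stand-ins for model calls and test execution, and the real framework is in the linked IdentityChain repository.

```python
# Rough skeleton of a code -> natural-language spec -> code self-consistency check.
# `spec_to_code`, `code_to_spec`, and `passes_tests` are hypothetical stand-ins
# supplied by the caller; see the linked IdentityChain repository for the real framework.
from typing import Callable

def is_self_consistent(
    seed_spec: str,
    spec_to_code: Callable[[str], str],   # model: specification -> code
    code_to_spec: Callable[[str], str],   # model: code -> natural-language specification
    passes_tests: Callable[[str], bool],  # semantic check, e.g. the task's unit tests
    rounds: int = 3,
) -> bool:
    """Chain spec -> code -> spec -> code and require every regenerated
    program to pass the same tests; any failure breaks self-consistency."""
    spec = seed_spec
    for _ in range(rounds):
        code = spec_to_code(spec)
        if not passes_tests(code):
            return False
        spec = code_to_spec(code)
    return True
```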
  3. Debugging is a challenging task for novice programmers in computer science courses and calls for specific investigation and support. Although the debugging process has been explored with qualitative methods and log data analyses, the detailed code changes that describe the evolution of debugging behaviors as students gain more experience remain relatively unexplored. In this study, we elicited “constituents” of the debugging process based on experts’ interpretation of students’ debugging behaviors in an introductory computer science (CS1) course. Epistemic Network Analysis (ENA) was used to study episodes where students fixed syntax/checkstyle errors or test errors. We compared epistemic networks between students with different prior programming experience and investigated how the networks evolved as students gained more experience throughout the semester. The ENA revealed that novices and experienced students put different emphasis on fixing checkstyle or syntax errors and highlighted interesting constituent co-occurrences that we investigated through further descriptive and statistical analyses. 
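For readers unfamiliar with ENA, the snippet below sketches the co-occurrence counting that underlies an epistemic network: each episode contributes edge weight between every pair of constituents coded in it. The constituent labels are illustrative, and a real ENA (e.g., with the rENA package) adds per-unit normalization and dimensional reduction.

```python
# Toy co-occurrence counting behind an epistemic network: each episode is the set
# of debugging constituents coded in it, and an edge's weight is how often two
# constituents were coded in the same episode. Labels here are illustrative.
from collections import Counter
from itertools import combinations

episodes = [
    {"read_error_message", "edit_same_line", "rerun_tests"},
    {"add_print", "rerun_tests"},
    {"read_error_message", "add_print", "rerun_tests"},
]

cooccurrence = Counter()
for codes in episodes:
    for pair in combinations(sorted(codes), 2):
        cooccurrence[pair] += 1

for (a, b), weight in cooccurrence.most_common():
    print(f"{a} -- {b}: {weight}")
```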
  4. Faisal, Aldo A (Ed.)
    Animals display characteristic behavioural patterns when performing a task, such as the spiraling of a soaring bird or the surge-and-cast of a male moth searching for a female. Identifying such recurring sequences occurring rarely in noisy behavioural data is key to understanding the behavioural response to a distributed stimulus in unrestrained animals. Existing models seek to describe the dynamics of behaviour or segment individual locomotor episodes rather than to identify the rare and transient sequences of locomotor episodes that make up the behavioural response. To fill this gap, we develop a lexical, hierarchical model of behaviour. We designed an unsupervised algorithm called “BASS” to efficiently identify and segment recurring behavioural action sequences transiently occurring in long behavioural recordings. When applied to navigating larval zebrafish, BASS extracts a dictionary of remarkably long, non-Markovian sequences consisting of repeats and mixtures of slow forward and turn bouts. Applied to a novel chemotaxis assay, BASS uncovers chemotactic strategies deployed by zebrafish to avoid aversive cues consisting of sequences of fast large-angle turns and burst swims. In a simulated dataset of soaring gliders climbing thermals, BASS finds the spiraling patterns characteristic of soaring behaviour. In both cases, BASS succeeds in identifying rare action sequences in the behaviour deployed by freely moving animals. BASS can be easily incorporated into the pipelines of existing behavioural analyses across diverse species, and even more broadly used as a generic algorithm for pattern recognition in low-dimensional sequential data. 
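As a loose illustration of the kind of pattern mining BASS automates, the toy snippet below counts recurring short subsequences of discrete locomotor bouts in a symbol sequence; BASS itself learns a probabilistic, hierarchical dictionary and copes with noise and rare sequences, which this naive count does not.

```python
# Naive stand-in for the pattern mining BASS automates: count how often each short
# subsequence of discrete locomotor bouts recurs in a recording and report the most
# frequent ones. The bout labels are invented for illustration.
from collections import Counter

bouts = list("FFTFFTFFTLRFFTFFT")  # toy sequence: F = slow forward, T = turn, L/R = large-angle turns

def frequent_subsequences(symbols, min_len=2, max_len=4, top=5):
    counts = Counter()
    for n in range(min_len, max_len + 1):
        for i in range(len(symbols) - n + 1):
            counts["".join(symbols[i:i + n])] += 1
    return counts.most_common(top)

for pattern, count in frequent_subsequences(bouts):
    print(f"{pattern}: {count}")
```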