Search for: All records

Award ID contains: 1942962


  1. Abstract
     Background: Learning analytics (LA) research often aggregates learning process data to extract measurements indicating constructs of interest. However, the warrant that such aggregation will produce reliable measurements has not been explicitly examined. Reliability evidence for aggregate measurements has rarely been reported, leaving an implicit assumption that such measurements are free of errors.
     Objectives: This study addresses these gaps by investigating the psychometric pros and cons of aggregate measurements.
     Methods: This study proposes a framework for aggregating process data, which includes the conditions under which aggregation is appropriate and a guideline for selecting the proper reliability evidence and computing procedure. We support and demonstrate the framework by analysing undergraduates' academic procrastination and programming proficiency in an introductory computer science course.
     Results and Conclusion: Aggregation over a period is acceptable and may improve measurement reliability only if the construct of interest is stable during that period. Otherwise, aggregation may mask meaningful changes in behaviours and should be avoided. When selecting the type of reliability evidence, a critical question is whether process data can be regarded as repeated measurements. Another question is whether the lengths of processes are unequal and individual events are unreliable. If the answer to the second question is no, segmenting each process into a fixed number of bins assists in computing the reliability coefficient.
     Major Takeaways: The proposed framework can serve as a general guideline for aggregating process data in LA research. Researchers should check and report the reliability evidence for aggregate measurements before the ensuing interpretation.
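     The framework above ends with a concrete computational step: segment each process into a fixed number of bins, then compute a reliability coefficient over the bin-level scores. Below is a minimal sketch of that step, assuming a hypothetical event log with student_id, timestamp, and on_time columns and using Cronbach's alpha as the internal-consistency estimate; the column names, binning rule, and bin score are illustrative choices, not the paper's implementation.

```python
import pandas as pd

def bin_scores(events: pd.DataFrame, n_bins: int = 5) -> pd.DataFrame:
    """Split each student's event stream into n_bins time-ordered bins and
    score each bin (here: the mean of a hypothetical 'on_time' indicator)."""
    events = events.sort_values("timestamp").copy()
    # Within each student, number events 0..k-1 and spread them over n_bins bins.
    order = events.groupby("student_id").cumcount()
    size = events.groupby("student_id")["timestamp"].transform("size")
    events["bin"] = (order * n_bins // size).astype(int)
    return events.pivot_table(index="student_id", columns="bin",
                              values="on_time", aggfunc="mean")

def cronbach_alpha(scores: pd.DataFrame) -> float:
    """Internal consistency across bins, treating bins as repeated measurements."""
    scores = scores.dropna()
    k = scores.shape[1]
    item_var = scores.var(axis=0, ddof=1).sum()
    total_var = scores.sum(axis=1).var(ddof=1)
    return k / (k - 1) * (1 - item_var / total_var)

# Hypothetical usage: the log needs student_id, timestamp, and on_time columns.
# events = pd.read_csv("process_log.csv")
# print(cronbach_alpha(bin_scores(events, n_bins=5)))
```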
  2. Kim, Yoon Jeon; Swiecki, Zachari (Ed.)
    Identifying and annotating student use of debugging strategies when solving computer programming problems can be a meaningful tool for studying and better understanding the development of debugging skills, which may lead to the design of effective pedagogical interventions. However, this process can be challenging when dealing with large datasets, especially when the strategies of interest are rare but important. This difficulty lies not only in the scale of the dataset but also in operationalizing these rare phenomena within the data. Operationalization requires annotators to first define how these rare phenomena manifest in the data and then obtain a sufficient number of positive examples to validate that this definition is reliable by accurately measuring Inter-Rater Reliability (IRR). This paper presents a method that leverages Large Language Models (LLMs) to efficiently exclude computer programming episodes that are unlikely to exhibit a specific debugging strategy. By using LLMs to filter out irrelevant programming episodes, this method focuses human annotation efforts on the most pertinent parts of the dataset, enabling experts to operationalize the coding scheme and reach acceptable IRR more efficiently.
    Free, publicly-accessible full text available November 2, 2025
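     The entry above uses an LLM only as a coarse pre-filter so that human annotators spend their time on episodes that plausibly contain the target debugging strategy. A minimal sketch of that filtering loop follows; the prompt wording, the llm_judge callable, and the episode format are placeholders for whatever model client and coding scheme a study actually adopts.

```python
from typing import Callable, Iterable

PROMPT_TEMPLATE = (
    "You will see a sequence of code edits from one debugging episode.\n"
    "Answer YES if the student might be isolating the fault by temporarily "
    "removing or commenting out code, otherwise answer NO.\n\nEpisode:\n{episode}"
)

def filter_episodes(
    episodes: Iterable[str],
    llm_judge: Callable[[str], str],
) -> list[str]:
    """Keep only episodes the LLM flags as possibly relevant.
    llm_judge takes a prompt string and returns the model's text reply."""
    kept = []
    for episode in episodes:
        reply = llm_judge(PROMPT_TEMPLATE.format(episode=episode))
        # Err on the side of keeping episodes: only discard confident negatives,
        # since the goal is to concentrate (not replace) human annotation.
        if not reply.strip().upper().startswith("NO"):
            kept.append(episode)
    return kept

# Hypothetical usage: plug any chat-model call in as llm_judge, then hand the
# surviving episodes to human coders to operationalize the strategy and check IRR.
```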
  3. Debugging is a challenging task for novice programmers in computer science courses and calls for specific investigation and support. Although the debugging process has been explored with qualitative methods and log data analyses, the detailed code changes that describe the evolution of debugging behaviors as students gain more experience remain relatively unexplored. In this study, we elicited “constituents” of the debugging process based on experts’ interpretation of students’ debugging behaviors in an introductory computer science (CS1) course. Epistemic Network Analysis (ENA) was used to study episodes where students fixed syntax/checkstyle errors or test errors. We compared epistemic networks between students with different prior programming experience and investigated how the networks evolved as students gained more experience throughout the semester. The ENA revealed that novices and experienced students put different emphasis on fixing checkstyle or syntax errors and highlighted interesting constituent co-occurrences that we investigated through further descriptive and statistical analyses. 
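     Epistemic Network Analysis, used in the entry above, builds networks from co-occurrences of coded "constituents" within episodes. The sketch below is not the ENA (or rENA) implementation; it only illustrates the co-occurrence accumulation that the network edges start from, with invented constituent labels.

```python
from itertools import combinations
from collections import Counter

# Illustrative constituent codes; the study's actual coding scheme differs.
CONSTITUENTS = {"read_error", "edit_suspect_line", "rerun_tests",
                "add_print", "revert_change"}

def cooccurrence_counts(episodes: list[set[str]]) -> Counter:
    """Count how often each pair of constituents appears in the same episode.
    ENA goes further (normalization, dimensional reduction), but its network
    edges start from exactly this kind of co-occurrence accumulation."""
    counts: Counter = Counter()
    for codes in episodes:
        for a, b in combinations(sorted(codes & CONSTITUENTS), 2):
            counts[(a, b)] += 1
    return counts

# Hypothetical usage: one set of observed constituents per debugging episode.
episodes = [
    {"read_error", "edit_suspect_line", "rerun_tests"},
    {"read_error", "add_print", "rerun_tests"},
    {"edit_suspect_line", "revert_change"},
]
print(cooccurrence_counts(episodes).most_common(3))
```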
  4. We explore how different elements of student persistence on computer programming problems may be related to learning outcomes and inform us about which elements may distinguish between productive and unproductive persistence. We collected data from an introductory computer science course at a large Midwestern university in the U.S. hosted on an open-source, problem-driven learning system. We defined a set of features quantifying various aspects of persistence during problem solving and used a predictive modeling approach to predict student scores on subsequent and related quiz questions. We focused on careful feature engineering and model interpretation to shed light on the intricacies of both productive and unproductive persistence. Feature importance was analyzed using SHapley Additive exPlanations (SHAP) values. We found that the most impactful features were persisting until solving the problem, rapid guessing, and taking a break, while those with the strongest correlation between their values and their impact on prediction were the number of submissions, total time, and (again) taking a break. This suggests that the former are important features for accurate prediction, while the latter are indicative of the differences between productive persistence and wheel spinning in a computer science context.
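     The entry above turns submission logs into persistence features and then ranks their contribution to a prediction with SHAP values. Here is a minimal sketch under assumed log fields (per-submission student_id, problem_id, timestamp, correct); the feature definitions, thresholds, and model choice are illustrative rather than the study's exact pipeline.

```python
import pandas as pd
from sklearn.ensemble import GradientBoostingClassifier
import shap  # SHapley Additive exPlanations

def persistence_features(log: pd.DataFrame) -> pd.DataFrame:
    """Aggregate submission-level logs into per student-problem features.
    Assumed columns: student_id, problem_id, timestamp (seconds), correct (0/1)."""
    gaps = (log.sort_values("timestamp")
               .groupby(["student_id", "problem_id"])["timestamp"].diff())
    log = log.assign(gap=gaps)
    agg = log.groupby(["student_id", "problem_id"]).agg(
        n_submissions=("correct", "size"),
        solved=("correct", "max"),                             # persisted until solving
        total_time=("timestamp", lambda t: t.max() - t.min()),
        took_break=("gap", lambda g: int((g > 600).any())),    # >10 min pause
        rapid_guessing=("gap", lambda g: int((g < 5).any())),  # <5 s resubmission
    )
    return agg.reset_index()

# Hypothetical usage: predict a related quiz outcome from these features,
# then rank feature contributions with SHAP values.
# X, y = features[FEATURE_COLS], quiz_outcomes
# model = GradientBoostingClassifier().fit(X, y)
# shap_values = shap.TreeExplainer(model).shap_values(X)
# shap.summary_plot(shap_values, X)
```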