Title: How to Teach Programming in the AI Era? Using LLMs as a Teachable Agent for Debugging
Large Language Models (LLMs) now excel at generative tasks and can produce content at remarkable speed. However, they remain imperfect and still make a variety of mistakes. In a Computer Science education context, as these models are widely adopted as "AI pair programmers," it becomes increasingly important to train students to evaluate and debug LLM-generated code. In this work, we introduce HypoCompass, a novel system that facilitates deliberate practice in debugging: human novices play the role of Teaching Assistants and help LLM-powered teachable agents debug code. We enable effective task delegation between students and LLMs in this learning-by-teaching environment: students focus on hypothesizing the causes of code errors, while adjacent skills such as code completion are offloaded to LLM agents. Our evaluations demonstrate that HypoCompass generates high-quality training materials (e.g., bugs and fixes) four times more efficiently than human counterparts, and significantly improves students' debugging performance by 12% from pre- to post-test.
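To make the division of labor concrete, below is a minimal sketch of one learning-by-teaching round in the spirit the abstract describes: the LLM agent writes (possibly buggy) code and applies fixes, while the human trainee contributes only the diagnostic hypothesis. The prompts, helper names, and the `call_llm` placeholder are illustrative assumptions, not HypoCompass's actual implementation.

    # Minimal sketch of the learning-by-teaching loop described in the abstract.
    # `call_llm` is a stand-in for any chat-completion API; the prompt wording and
    # the division of labor are illustrative, not HypoCompass's actual design.

    def call_llm(prompt: str) -> str:
        """Placeholder: send `prompt` to an LLM and return its text response."""
        raise NotImplementedError("wire this to an LLM provider of your choice")

    def agent_writes_buggy_code(problem: str) -> str:
        # The LLM-powered "student" agent attempts the problem and may
        # introduce realistic bugs for the human to diagnose.
        return call_llm(
            "You are a novice programmer. Attempt this problem in Python and "
            f"keep any mistakes you make:\n{problem}"
        )

    def agent_applies_fix(code: str, hypothesis: str) -> str:
        # Code editing is offloaded to the agent; the human supplies only the
        # hypothesis about what is wrong (the skill being deliberately practiced).
        return call_llm(
            f"A teaching assistant suspects the following issue:\n{hypothesis}\n"
            f"Revise this code accordingly:\n{code}"
        )

    def practice_round(problem: str) -> str:
        buggy = agent_writes_buggy_code(problem)
        print("Agent's attempt:\n", buggy)
        hypothesis = input("Your hypothesis about the bug: ")  # the human TA's move
        return agent_applies_fix(buggy, hypothesis)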
Award ID(s):
2213791
PAR ID:
10601211
Author(s) / Creator(s):
Publisher / Repository:
Springer Nature Switzerland
Date Published:
ISBN:
9783031643019
Page Range / eLocation ID:
265 to 279
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. Code Large Language Models (Code LLMs) have excelled at tasks like code completion but often miss deeper semantics such as execution effects and dynamic states. This paper aims to bridge the gap between Code LLMs' reliance on static text data and the need for semantic understanding for complex tasks like debugging and program repair. We introduce a novel strategy, monologue reasoning, to train Code LLMs to reason about comprehensive semantics, encompassing high-level functional descriptions, local execution effects of individual statements, and overall input/output behavior, thereby linking static code text with dynamic execution states. We begin by collecting PyX, a clean Python corpus of fully executable code samples with functional descriptions and test cases. We propose training Code LLMs not only to write code but also to understand code semantics by reasoning about key properties, constraints, and execution behaviors in natural language, mimicking human verbal debugging, i.e., rubber-duck debugging. This approach led to the development of SemCoder, a Code LLM with only 6.7B parameters, which performs competitively with GPT-3.5-turbo on code generation and execution reasoning tasks. SemCoder achieves 79.3% on HumanEval (GPT-3.5-turbo: 76.8%), 63.6% on CRUXEval-I (GPT-3.5-turbo: 50.3%), and 63.9% on CRUXEval-O (GPT-3.5-turbo: 59.0%). We also study the effectiveness of SemCoder's monologue-style execution reasoning compared to concrete scratchpad reasoning, showing that our approach integrates semantics from multiple dimensions more smoothly. Finally, we demonstrate the potential of applying learned semantics to improve Code LLMs' debugging and self-refining capabilities. Our data, code, and models are available at: https://github.com/ARiSE-Lab/SemCoder.
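For intuition, a monologue-reasoning training example would pair code with its functional description, per-statement execution effects, and overall input/output behavior. The sketch below shows one plausible shape of such a sample; the field names and schema are assumptions for illustration, not the actual PyX/SemCoder format.

    # Illustrative shape of a monologue-reasoning training example.
    # Field names are assumed for illustration, not the PyX/SemCoder schema.
    sample = {
        "code": (
            "def count_evens(nums):\n"
            "    total = 0\n"
            "    for n in nums:\n"
            "        if n % 2 == 0:\n"
            "            total += 1\n"
            "    return total\n"
        ),
        "functional_description": "Return how many numbers in `nums` are even.",
        "execution_monologue": [
            "`total = 0` initializes the running count.",
            "The loop visits each element `n` of `nums` in order.",
            "`n % 2 == 0` holds exactly for even values, so `total` increments only then.",
            "After the loop, `total` equals the number of even elements and is returned.",
        ],
        "io_behavior": {"input": "[1, 2, 3, 4]", "output": "2"},
        "test_case": "assert count_evens([1, 2, 3, 4]) == 2",
    }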
  2. Much attention has focused on designing tools and activities that support learners in designing fully finished and functional applications such as games, robots, or e-textiles to be shared with others. But helping students learn to debug their applications often takes a surprisingly more instructionist stance: giving them checklists, teaching them strategies, or providing them with test programs. The idea of designing bugs for learning, or debugging by design, again makes learners agents of their own learning and, more importantly, of making and solving mistakes. In this paper, we report on our first implementation of "debugging by design" activities in a classroom of 25 high school students over a period of eight hours as part of a longer e-textiles unit. Students were asked to craft buggy circuits and code for their peers to solve. We introduce the design of the debugging-by-design unit and, drawing on observations and interviews with students and the teacher, address the following research questions: (1) What did students gain from designing and solving bugs for others? (2) How did this experience shape students' completion of the e-textiles unit? In the discussion, we address how debugging by design contributes to students' learning of debugging skills.
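As a small illustration of the kind of bug a learner might deliberately plant for a peer, consider the snippet below. It is written in Python purely for illustration; the study's students worked with e-textile circuits and their accompanying microcontroller code, and this example is not drawn from the paper.

    # A deliberately planted bug of the sort a student might design for a peer:
    # the loop bound skips the last LED, so one light never turns on.
    leds = ["red", "green", "blue", "yellow"]

    def light_all(leds):
        lit = []
        for i in range(len(leds) - 1):   # planted bug: off-by-one misses the final LED
            lit.append(leds[i])
        return lit

    print(light_all(leds))  # ['red', 'green', 'blue'] -- 'yellow' is missing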
  3. Large language models (LLMs) hold significant promise for improving human-robot interaction, offering advanced conversational skills and versatility in managing diverse, open-ended user requests across tasks and domains. Despite this potential to transform human-robot interaction, very little is known about the distinctive design requirements for using LLMs in robots, which may differ from text and voice interaction and vary by task and context. To better understand these requirements, we conducted a user study (n = 32) comparing an LLM-powered social robot against text- and voice-based agents, analyzing task-based requirements in conversational tasks, including choose, generate, execute, and negotiate. Our findings show that LLM-powered robots elevate expectations for sophisticated non-verbal cues and excel in connection-building and deliberation, but fall short in logical communication and may induce anxiety. We provide design implications both for robots integrating LLMs and for fine-tuning LLMs for use with robots.