Code Large Language Models (Code LLMs) have excelled at tasks like code completion but often miss deeper semantics such as execution effects and dynamic states. This paper aims to bridge the gap between Code LLMs' reliance on static text data and the need for semantic understanding in complex tasks like debugging and program repair. We introduce a novel strategy, monologue reasoning, to train Code LLMs to reason about comprehensive semantics, encompassing high-level functional descriptions, local execution effects of individual statements, and overall input/output behavior, thereby linking static code text with dynamic execution states. We begin by collecting PyX, a clean Python corpus of fully executable code samples with functional descriptions and test cases. We then propose training Code LLMs not only to write code but also to understand code semantics by reasoning about key properties, constraints, and execution behaviors in natural language, mimicking human verbal debugging, i.e., rubber-duck debugging. This approach leads to SemCoder, a Code LLM with only 6.7B parameters that performs competitively with GPT-3.5-turbo on code generation and execution reasoning tasks: SemCoder achieves 79.3% on HumanEval (GPT-3.5-turbo: 76.8%), 63.6% on CRUXEval-I (GPT-3.5-turbo: 50.3%), and 63.9% on CRUXEval-O (GPT-3.5-turbo: 59.0%). We also compare SemCoder's monologue-style execution reasoning with concrete scratchpad reasoning, showing that our approach integrates semantics from multiple dimensions more smoothly. Finally, we demonstrate the potential of applying the learned semantics to improve Code LLMs' debugging and self-refinement capabilities. Our data, code, and models are available at: https://github.com/ARiSE-Lab/SemCoder.
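To make the idea concrete, the sketch below shows roughly what monologue-style execution reasoning looks like for a small Python function; the function and the narrated trace are hypothetical illustrations, not SemCoder's actual prompt or training data, which are defined in the paper and repository.

```python
# Illustrative sketch only: monologue-style execution reasoning narrates how
# program state evolves, linking static code text with dynamic behavior.
# The function and trace below are hypothetical, not SemCoder training data.

def dedup_keep_order(xs):
    """Remove duplicates while preserving first-occurrence order."""
    seen, out = set(), []
    for x in xs:
        if x not in seen:
            seen.add(x)
            out.append(x)
    return out

# Forward monologue for dedup_keep_order([3, 1, 3, 2, 1]) (predicting output):
#   start: seen = set(), out = []
#   x = 3 -> unseen, seen = {3},       out = [3]
#   x = 1 -> unseen, seen = {1, 3},    out = [3, 1]
#   x = 3 -> already seen, skip
#   x = 2 -> unseen, seen = {1, 2, 3}, out = [3, 1, 2]
#   x = 1 -> already seen, skip
#   return [3, 1, 2]

assert dedup_keep_order([3, 1, 3, 2, 1]) == [3, 1, 2]
```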
How to Teach Programming in the AI Era? Using LLMs as a Teachable Agent for Debugging
Large Language Models (LLMs) now excel at generative skills and can produce content at remarkable speed. However, they remain imperfect and still make a variety of mistakes. In a Computer Science education context, as these models are widely adopted as “AI pair programmers,” it becomes increasingly important to train students to evaluate and debug LLM-generated code. In this work, we introduce HypoCompass, a novel system that facilitates deliberate practice in debugging: human novices play the role of Teaching Assistants and help LLM-powered teachable agents debug code. We enable effective task delegation between students and LLMs in this learning-by-teaching environment: students focus on hypothesizing the causes of code errors, while adjacent skills such as code completion are offloaded to LLM agents. Our evaluations demonstrate that HypoCompass generates high-quality training materials (e.g., bugs and fixes) four times more efficiently than human counterparts, and significantly improves student debugging performance by 12% from pre- to post-test.
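As a rough illustration of the kind of exercise described above, and not of HypoCompass's actual materials, the sketch below shows hypothetical LLM-generated buggy code, a student's hypothesis about the cause of the error, and the fix the agent would then apply.

```python
# Illustrative sketch only: a hypothetical HypoCompass-style exercise in which
# the student hypothesizes the cause of a bug in LLM-generated code and the
# LLM-powered agent applies the proposed fix. Not taken from the real system.

def average(nums):
    """Return the mean of a non-empty list of numbers (buggy version)."""
    total = 0
    for n in nums:
        total += n
    return total / (len(nums) - 1)  # bug: divides by one fewer than the count

# Student's hypothesis: "the sum is correct, but the divisor is off by one;
# it should be len(nums), not len(nums) - 1."

def average_fixed(nums):
    """Mean with the divisor corrected as the student proposed."""
    total = 0
    for n in nums:
        total += n
    return total / len(nums)

assert average_fixed([2, 4, 6]) == 4
```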
- Award ID(s): 2213791
- PAR ID: 10601211
- Publisher / Repository: Springer Nature Switzerland
- Date Published:
- ISBN: 9783031643019
- Page Range / eLocation ID: 265 to 279
- Format(s): Medium: X
- Sponsoring Org: National Science Foundation
More Like this
Much attention has focused on designing tools and activities that support learners in designing fully finished and functional applications such as games, robots, or e-textiles to be shared with others. But helping students learn to debug their applications often takes on a surprisingly more instructionist stance, giving them checklists, teaching them strategies, or providing them with test programs. The idea of designing bugs for learning, or debugging by design, makes learners once again agents of their own learning and, more importantly, of making and solving mistakes. In this paper, we report on our first implementation of “debugging by design” activities in a classroom of 25 high school students over a period of eight hours as part of a longer e-textiles unit. Here students were asked to craft buggy circuits and code for their peers to solve. We introduce the design of the debugging-by-design unit and, drawing on observations and interviews with students and the teacher, address the following research questions: (1) What did students gain from designing and solving bugs for others? (2) How did this experience shape students’ completion of the e-textiles unit? In the discussion, we address how debugging by design contributes to students’ learning of debugging skills.
Large Language Models (LLMs), such as ChatGPT and Bard, have revolutionized natural language understanding and generation. They possess deep language comprehension, human-like text generation capabilities, contextual awareness, and robust problem-solving skills, making them invaluable in domains such as search engines, customer support, and translation. Meanwhile, LLMs have also gained traction in the security community, revealing security vulnerabilities and showcasing their potential in security-related tasks. This paper explores the intersection of LLMs with security and privacy. Specifically, we investigate how LLMs positively impact security and privacy, the potential risks and threats associated with their use, and the inherent vulnerabilities within LLMs. Through a comprehensive literature review, we categorize the surveyed papers into “The Good” (beneficial LLM applications), “The Bad” (offensive applications), and “The Ugly” (vulnerabilities of LLMs and their defenses). We report several interesting findings. For example, LLMs have proven effective at enhancing code security (code vulnerability detection) and data privacy (data confidentiality protection), outperforming traditional methods; however, they can also be harnessed for various attacks, particularly user-level attacks, due to their human-like reasoning abilities. We also identify areas that require further research. For example, work on model and parameter extraction attacks is limited and often theoretical, hindered by LLM parameter scale and confidentiality, and safe instruction tuning, a recent development, requires more exploration. We hope that our work sheds light on LLMs’ potential to both bolster and jeopardize cybersecurity.