This paper systematically investigates the generation of code explanations by Large Language Models (LLMs) for code examples commonly encountered in introductory programming courses. Our findings reveal significant variations in the nature of code explanations produced by LLMs, influenced by factors such as the wording of the prompt, the specific code examples under consideration, the programming language involved, the temperature parameter, and the version of the LLM. However, a consistent pattern emerges for Java and Python: explanations exhibit a Flesch-Kincaid readability level of approximately grade 7-8 and a consistent lexical density, i.e., the proportion of meaningful words relative to the total explanation size. Additionally, the generated explanations consistently achieve high scores for correctness, but lower scores on three other metrics: completeness, conciseness, and specificity.
The Behavior of Large Language Models When Prompted to Generate Code Explanations.
This paper systematically explores how Large Language Models (LLMs) generate explanations of code examples of the type used in intro-to-programming courses. As we show, the nature of code explanations generated by LLMs varies considerably based on the wording of the prompt, the target code examples being explained, the programming language, the temperature parameter, and the version of the LLM. Nevertheless, they are consistent in two major respects for Java and Python: the readability level, which hovers around grade 7-8, and lexical density, i.e., the proportion of meaningful words relative to the total explanation size. Furthermore, the explanations score very high in correctness but lower on three other metrics: completeness, conciseness, and contextualization.
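The two consistency measures named in the abstract can be illustrated with a rough sketch. The grade formula below is the standard Flesch-Kincaid grade-level formula; the syllable counter, the tokenizer, and the small `FUNCTION_WORDS` list are simplifying assumptions made for illustration and are not the tooling used in the paper.

```python
import re

# Hypothetical, deliberately small function-word list; a real analysis
# would more likely use a part-of-speech tagger to find content words.
FUNCTION_WORDS = {
    "a", "an", "the", "and", "or", "but", "if", "of", "to", "in", "on",
    "at", "by", "for", "with", "as", "is", "are", "was", "were", "be",
    "it", "this", "that", "these", "those", "then", "than", "not",
}


def tokenize(text: str) -> list[str]:
    """Split text into alphabetic word tokens (assumed tokenizer)."""
    return re.findall(r"[A-Za-z']+", text)


def count_sentences(text: str) -> int:
    """Count sentences by splitting on terminal punctuation."""
    parts = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    return max(len(parts), 1)


def count_syllables(word: str) -> int:
    """Heuristic: count vowel groups and drop a trailing silent 'e'."""
    word = word.lower()
    count = len(re.findall(r"[aeiouy]+", word))
    if word.endswith("e") and not word.endswith("le") and count > 1:
        count -= 1
    return max(count, 1)


def flesch_kincaid_grade(text: str) -> float:
    """0.39 * (words/sentences) + 11.8 * (syllables/words) - 15.59."""
    words = tokenize(text)
    if not words:
        raise ValueError("text contains no words")
    syllables = sum(count_syllables(w) for w in words)
    return (0.39 * len(words) / count_sentences(text)
            + 11.8 * syllables / len(words)
            - 15.59)


def lexical_density(text: str) -> float:
    """Share of tokens that are not in the function-word list."""
    words = [w.lower() for w in tokenize(text)]
    if not words:
        raise ValueError("text contains no words")
    content = [w for w in words if w not in FUNCTION_WORDS]
    return len(content) / len(words)


if __name__ == "__main__":
    explanation = (
        "The loop visits each item in the list. "
        "It adds the item to the running total."
    )
    print(round(flesch_kincaid_grade(explanation), 2))
    print(round(lexical_density(explanation), 2))
```

Because the syllable heuristic miscounts some words, scores from this sketch are only indicative; comparing explanations across prompts or languages would call for one fixed, well-tested implementation applied uniformly.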
- Award ID(s): 1822816
- PAR ID: 10482231
- Publisher / Repository: Proceedings of The workshop on Generative AI for Education (GAIED): Advances, Opportunities, and Challenges, The Thirty-seventh Conference on Neural Information Processing Systems (NeurIPS 2023)
- Date Published:
- Journal Name: Proceedings of The workshop on Generative AI for Education (GAIED): Advances, Opportunities, and Challenges, The Thirty-seventh Conference on Neural Information Processing Systems (NeurIPS 2023)
- Format(s): Medium: X
- Location: New Orleans, LA
- Sponsoring Org: National Science Foundation
More Like this
- Worked examples, which present explained code for solving typical programming problems, are among the most popular types of learning content in programming classes. Most approaches and tools for presenting these examples to students are based on line-by-line explanations of the example code. However, instructors rarely have time to provide explanations for the many examples typically used in a programming class. In this paper, we assess the feasibility of using LLMs to generate code explanations for passive and active example exploration systems. To achieve this goal, we compare the code explanations generated by ChatGPT with the explanations generated by both experts and students.
- Worked examples (solutions to typical programming problems, presented as source code in a particular language and used to explain topics from a programming class) are among the most popular types of learning content in programming classes. Most approaches and tools for presenting these examples to students are based on line-by-line explanations of the example code. However, instructors rarely have time to provide line-by-line explanations for the large number of examples typically used in a programming class. In this paper, we explore and assess a human-AI collaboration approach to authoring worked examples for Java programming. We introduce an authoring system for creating Java worked examples that generates a starting version of code explanations and presents it to the instructor to edit if necessary. We also present a study that assesses the quality of explanations created with this approach.
- Worked code examples are among the most popular types of learning content in programming classes. Most approaches and tools for presenting these examples to students are based on line-by-line explanations of the example code. However, instructors rarely have time to provide line-by-line explanations for the large number of examples typically used in a programming class. This paper explores the opportunity to facilitate the development of worked examples for Java programming through a human-AI collaborative authoring approach. The idea of collaborative authoring is to generate a starting version of code explanations using an LLM and present it to the instructor to edit if necessary. The critical step towards implementing this idea is to ensure that the LLM can produce code explanations that look meaningful and acceptable to instructors and students. To achieve this goal, we performed an extensive prompt engineering study and evaluated the explanations produced by the selected prompt in a user study with students and authors.

