Introduction: The emergence and widespread adoption of generative AI (GenAI) chatbots such as ChatGPT and programming assistants such as GitHub Copilot have radically redefined the landscape of programming education. This calls for replicating studies and reexamining findings from pre-GenAI CS contexts to understand the impact on students. Objectives: Achievement goals are well studied in computing education and can be predictive of student interest and exam performance. The objective of this study is to compare findings from prior achievement goal studies in CS1 courses with new CS1 courses that emphasize human-GenAI collaborative coding. Methods: In a CS1 course that integrates GenAI, we use linear regression to explore the relationship of achievement goals and prior experience to student interest, exam performance, and perceptions of GenAI. Results: As in prior findings from traditional CS1 classes, mastery goals are correlated with interest in computing. Contradicting prior CS1 findings, normative goals are correlated with exam scores. Normative and mastery goals correlate with students' perceptions of learning with GenAI, and mastery goals weakly correlate with reading and testing code output from GenAI.
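The regression analysis described is a standard one; the sketch below shows what such a model might look like in Python with statsmodels. The file name cs1_survey.csv and the column names (mastery, normative, prior_exp, exam_score) are hypothetical placeholders, not the study's actual instruments or data.

```python
# A minimal sketch of the kind of model described above: ordinary least
# squares regressing an outcome on achievement-goal measures and prior
# experience. File and column names are hypothetical placeholders.
import pandas as pd
import statsmodels.formula.api as smf

# One row per student; "mastery" and "normative" would come from a survey
# instrument, "prior_exp" from a background questionnaire.
df = pd.read_csv("cs1_survey.csv")

# Fit exam score on the predictors; repeating with "interest" or a
# GenAI-perception scale as the outcome would mirror the other analyses.
model = smf.ols("exam_score ~ mastery + normative + prior_exp", data=df).fit()
print(model.summary())  # coefficient signs and p-values summarize each association
```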
-
Generative AI (GenAI) is advancing rapidly, and the literature in computing education is expanding almost as quickly. Initial responses to GenAI tools ranged from panic to utopian optimism, and many were quick to point out both the opportunities and the challenges. Researchers reported that these new tools are capable of solving most introductory programming tasks and are causing disruptions throughout the curriculum. The tools can write and explain code, enhance error messages, create resources for instructors, and even provide feedback and help for students like a traditional teaching assistant. In 2024, new research started to emerge on the effects of GenAI usage in the computing classroom. These new data involve the use of GenAI to support classroom instruction at scale and to teach students how to code with GenAI. In support of the former, a new class of tools is emerging that can provide personalized feedback to students on their programming assignments or teach programming and prompting skills at the same time. With the literature expanding so rapidly, this report aims to summarize and explain what is happening on the ground in computing classrooms. We provide a systematic literature review; a survey of educators and industry professionals; and interviews with educators using GenAI in their courses, educators studying GenAI, and researchers who create GenAI tools to support computing education. The triangulation of these methods and data sources expands our understanding of GenAI usage and perceptions at this critical moment for our community.
-
The ability to "Explain in Plain English" (EiPE) the purpose of code is a critical skill for students in introductory programming courses to develop. EiPE questions serve as a mechanism for students both to develop and to demonstrate code comprehension skills. However, evaluating this skill has been challenging because manual grading is time consuming and not easily automated. The process of constructing a prompt for code generation with a large language model, such as OpenAI's GPT-4, bears a striking resemblance to constructing an EiPE response. In this paper, we explore the potential of grading EiPE questions by running test cases on code that GPT-4 generates from students' EiPE responses. We applied this proposed grading method to a corpus of EiPE responses collected from past exams, then measured agreement between the results of this grading method and human graders. Overall, we find moderate agreement between the human raters and the results of the unit tests run on the generated code; the gap appears to be attributable to GPT-4's code generation being more lenient than human graders on low-level descriptions of code.
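A minimal sketch of the grading idea described above: treat the student's EiPE response as a prompt, ask a model to generate code from it, and run instructor-written test cases against the result. The prompt wording, the function name f, and the test cases are illustrative assumptions, not the paper's exact setup.

```python
# Sketch of EiPE auto-grading via code generation plus unit tests.
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

def generate_code(eipe_response: str) -> str:
    """Ask the model to write a Python function from the student's description."""
    reply = client.chat.completions.create(
        model="gpt-4",
        messages=[{
            "role": "user",
            "content": "Write a Python function named f that does the "
                       f"following, and return only code:\n{eipe_response}",
        }],
    )
    code = reply.choices[0].message.content
    # Models sometimes wrap replies in markdown fences; drop any fence lines.
    return "\n".join(l for l in code.splitlines() if not l.strip().startswith("`"))

def grade(eipe_response: str, test_cases: list[tuple]) -> bool:
    """Pass iff the generated function satisfies every (args, expected) pair."""
    namespace: dict = {}
    try:
        exec(generate_code(eipe_response), namespace)  # sandbox this in real use
        f = namespace["f"]
        return all(f(*args) == expected for args, expected in test_cases)
    except Exception:
        return False

# A precise description should pass; a vague low-level one may still pass,
# which is the leniency the paper reports.
print(grade("returns the sum of the integers from 1 to n", [((5,), 15), ((1,), 1)]))
```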
-
In this full research paper, we examine various grading policies for second-chance testing. Second-chance testing refers to giving students the opportunity to take a second version of a test for some form of grade replacement. As a pedagogical strategy, second-chance testing bears some similarities to mastery learning but is less expensive to implement. Previous work has shown that second-chance testing is associated with improved performance, but there is still a lack of clarity regarding the optimal grading policies for this testing strategy. We interviewed seven instructors who use second-chance testing in their courses to collect data on why they chose specific policies. We then conducted structured interviews with students (N = 11) to capture more nuance about students' decision-making processes under the different grading policies. Afterwards, we conducted a quasi-experimental study to compare two second-chance grading policies and determine how they influenced students across multiple dimensions. We varied the grading policies used in two similar sophomore-level engineering courses, collected assessment data, and administered a survey that queried students (N = 513) about their behavior and reactions to both policies. Surprisingly, we found that students' preferences between the two policies were almost perfectly split. We conclude that there are likely many policies that perform well by being simple and by encouraging serious attempts on both tests.
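To make "grade replacement" concrete, the sketch below implements two generic second-chance policies. These are common shapes for such rules, shown only for illustration; the abstract does not specify the two policies the study compared, and these are not claimed to be them.

```python
# Illustrative only: two generic grade-replacement rules for second-chance
# testing, not necessarily the two policies compared in the study.

def best_score(first: float, second: float | None) -> float:
    """Keep the higher attempt; retaking can never lower the grade."""
    return first if second is None else max(first, second)

def capped_retake(first: float, second: float | None, cap: float = 90.0) -> float:
    """The retake replaces the first score but is capped, which preserves an
    incentive to take the first test seriously."""
    return first if second is None else max(first, min(second, cap))

print(best_score(62, 88))     # 88
print(capped_retake(62, 95))  # 90 (capped)
```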
-
We conducted an across-semester quasi-experimental study that compared students' outcomes under frequent and infrequent testing regimens in an introductory computer science course. Students in the frequent-testing semester (4 quizzes and 4 exams) outperformed those in the infrequent-testing semester (1 midterm and 1 final exam) by 9.1 to 13.5 percentage points on code-writing questions. We complement these performance results with additional data from surveys, interviews, and an analysis of students' behavior in the course's interactive textbook. In the surveys, students report a preference for the smaller number of exams but rated the exams in the frequent-testing semester as both less difficult and less stressful, in spite of the exams containing identical content. In the interviews, students predominantly indicated (1) that the frequent-testing regimen encourages better study habits (e.g., more attention to work, less cramming) and leads to better learning, (2) that frequent testing reduces test anxiety, and (3) that the frequent-testing regimen was more fair, though these opinions were not universally held. The students' impression that the frequent-testing regimen would lead to better study habits is borne out in our analysis of their activities in the course's interactive textbook: in the frequent-testing semester, students spent more time on textbook readings and appeared to answer textbook questions more earnestly (i.e., less "gaming the system" by using hints and brute force).