Software testing is an essential skill for computer science students. Prior work reports that students desire support in determining what code to test and which scenarios should be tested. In response, we present a lightweight testing checklist that contains both tutorial information and testing strategies to guide students in what and how to test. To assess the impact of the testing checklist, we conducted an experimental, controlled A/B study with 32 undergraduate and graduate students. The study task was writing a test suite for an existing program. Students were given either the testing checklist (the experimental group) or a tutorial on a standard coverage tool with which they were already familiar (the control group). By analyzing student-written tests together with survey responses, we found that students with the checklist performed as well as or better than the coverage tool group, suggesting a potential positive impact of the checklist (or at minimum, a non-negative impact). This is particularly noteworthy given that the control condition, a coverage tool, represents the state of the practice. These findings suggest that testing tool support need not be sophisticated to be effective.
An Experience Report on Introducing Explicit Strategies into Testing Checklists for Advanced Beginners
Software testing is a critical skill for computing students, but learning and practicing testing can be challenging, particularly for beginners. A recent study suggests that a lightweight testing checklist containing testing strategies and tutorial information could assist students in writing quality tests. However, students expressed a desire for more support in knowing how to test the code or scenario. Moreover, the potential costs and benefits of the testing checklist had not yet been examined in a classroom setting. To that end, we improved the checklist by integrating explicit testing strategies into it (the ETS Checklist), which provide step-by-step guidance on how to transfer semantic information from instructions to possible testing scenarios. In this paper, we report our experiences in designing explicit strategies for unit testing, as well as adopting the ETS Checklist as optional tool support in a CS1.5 course. Through quantitative and qualitative analysis of students' survey responses and lab assignment submissions, we discuss students' engagement with the ETS Checklists. Our results suggest that students who used the checklist intervention wrote test code of significantly higher quality, in terms of code coverage, than those who did not, especially for assignments earlier in the course. We also observed that students were often unaware of their need for help in writing high-quality tests.
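The checklist's central move, translating semantic cues in an assignment's instructions into concrete test scenarios, can be illustrated with a minimal sketch. The instruction text, the `median` function, and the test cases below are hypothetical illustrations, not taken from the course materials:

```python
# Hypothetical assignment instruction:
#   "median(values) returns the median of a non-empty list of numbers;
#    raise ValueError if the list is empty."
#
# Explicit-strategy reading: each semantic cue in the instruction
# ("median", the even/odd-length boundary, "raise ValueError") maps
# to a distinct test scenario.
import pytest

from solution import median  # hypothetical student module


def test_odd_length_list():
    # Cue: "returns the median" -> typical case, odd length
    assert median([3, 1, 2]) == 2


def test_even_length_list():
    # Cue: "median" -> boundary between the two middle values
    assert median([1, 2, 3, 4]) == 2.5


def test_empty_list_raises():
    # Cue: "raise ValueError if the list is empty" -> error scenario
    with pytest.raises(ValueError):
        median([])
```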
- PAR ID: 10432810
- Date Published:
- Journal Name: Proceedings of the 2023 Conference on Innovation and Technology in Computer Science Education
- Volume: 1
- Page Range / eLocation ID: 194 to 200
- Format(s): Medium: X
- Sponsoring Org: National Science Foundation
More Like this
- One way to teach programming problem solving is to teach explicit, step-by-step strategies. While prior work has shown these to be effective in controlled settings, there has been little work investigating their efficacy in classrooms. We conducted a 5-week case study with 17 students aged 15-18, investigating students' sentiments toward two strategies for debugging and code reuse, students' use of scaffolding to execute these strategies, and associations between students' strategy use and their success at independently writing programs in class. We found that while students reported the strategies to be valuable, many had trouble regulating their choice of strategies, defaulting to ineffective trial and error even when they knew systematic strategies would be more effective. Students who embraced the debugging strategy completed more features in a game development project, but this association was mediated by other factors, such as reliance on help, strategy self-efficacy, and mastery of the programming language used in the class. These results suggest that the teaching of strategies may require more explicit instruction on strategy selection and self-regulation.
- Flaky tests are a source of frustration and uncertainty for developers. In an educational environment, flaky tests can create doubts related to software behavior and student grades, especially when the grades depend on tests passing. NC State University's junior-level software engineering course models industrial practice through team-based development and testing of new features on a large electronic health record (EHR) system, iTrust2. Students are expected to maintain and supplement an extensive suite of UI tests using Selenium WebDriver. Team builds are run on the course's continuous integration (CI) infrastructure. Students report, and we confirm, that tests that pass on one build will inexplicably fail on the next, impacting productivity and confidence in code quality and the CI system. The goal of this work is to find and fix the sources of flaky tests in iTrust2. We analyze configurations of Selenium using different underlying web browsers and timeout strategies (waits) for both test stability and runtime performance. We also consider underlying hardware and operating systems. Our results show that HtmlUnit with Thread waits provides the lowest number of test failures and best runtime on poor-performing hardware. When given more resources (e.g., more memory and a faster CPU), Google Chrome with Angular waits is less flaky and faster than HtmlUnit, especially if the browser instance is not restarted between tests. The outcomes of this research are a more stable and substantially faster teaching application and a recommendation on how to configure Selenium for applications similar to iTrust2 that run in a CI environment.
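  The two wait styles the study contrasts can be sketched with Selenium's Python bindings (the study itself used Java-based drivers such as HtmlUnit; the URL and element ID below are hypothetical):

  ```python
  import time

  from selenium import webdriver
  from selenium.webdriver.common.by import By
  from selenium.webdriver.support import expected_conditions as EC
  from selenium.webdriver.support.ui import WebDriverWait

  driver = webdriver.Chrome()
  driver.get("http://localhost:8080/iTrust2")  # hypothetical deployment URL

  # Style 1: fixed "Thread wait" -- sleep a constant interval and hope
  # the page is ready. Cheap to write, but slow when the interval is
  # too long and flaky when it is too short.
  time.sleep(2)
  button = driver.find_element(By.ID, "submit")  # hypothetical element ID

  # Style 2: condition-based explicit wait -- poll until the element is
  # actually clickable, up to a timeout. Typically faster and more
  # stable on variable hardware.
  button = WebDriverWait(driver, timeout=10).until(
      EC.element_to_be_clickable((By.ID, "submit"))
  )
  button.click()

  driver.quit()
  ```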
- In this paper, we explore using Parsons problems to scaffold novice programmers who are struggling while solving write-code problems. Parsons problems, in which students put mixed-up code blocks in order, can be created quickly and already serve thousands of students, while other types of programming support methods are expensive to develop or do not scale. We conducted two studies in which novices were given equivalent Parsons problems as optional scaffolding while solving write-code problems. We investigated when, why, and how students used the Parsons problems, as well as their perceptions of the benefits and challenges. A think-aloud observational study with 11 undergraduate students showed that students utilized the Parsons problem before writing a solution to get ideas about where to start; while writing a solution when they were stuck; and after writing a solution to debug errors and look for better strategies. Semi-structured interviews with the same 11 undergraduate students provided evidence that using Parsons problems to scaffold write-code problems helped students to reduce the difficulty, reduce the problem completion time, learn problem-solving strategies, and refine their programming knowledge. However, some students found them less useful if the Parsons solution did not match their approach or if they did not understand the solution. We then conducted a between-subjects classroom study with 81 undergraduate students to investigate the effects on learning. We found that students who received Parsons problems as scaffolding during write-code problems spent significantly less time solving those problems. However, there was no significant learning gain in either condition from pretest to posttest. We also discuss the design implications of our findings.
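  As a hedged illustration of the format (the problem below is invented, not drawn from the study's materials), a Parsons problem presents a solution's lines out of order for the student to rearrange:

  ```python
  # Write-code problem: "Return the largest value in a non-empty list."
  # The equivalent Parsons problem gives the solution's lines shuffled;
  # the student drags them into order (and, in 2D variants, indents them).
  shuffled_blocks = [
      "        if value > largest:",
      "def largest_value(values):",
      "    return largest",
      "    for value in values[1:]:",
      "            largest = value",
      "    largest = values[0]",
  ]

  # The ordering the student must reconstruct:
  def largest_value(values):
      largest = values[0]
      for value in values[1:]:
          if value > largest:
              largest = value
      return largest

  assert largest_value([3, 7, 2]) == 7
  ```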
- Although measuring held-out accuracy has been the primary approach to evaluate generalization, it often overestimates the performance of NLP models, while alternative approaches for evaluating models either focus on individual tasks or on specific behaviors. Inspired by principles of behavioral testing in software engineering, we introduce CheckList, a task-agnostic methodology for testing NLP models. CheckList includes a matrix of general linguistic capabilities and test types that facilitate comprehensive test ideation, as well as a software tool to generate a large and diverse number of test cases quickly. We illustrate the utility of CheckList with tests for three tasks, identifying critical failures in both commercial and state-of-the-art models. In a user study, a team responsible for a commercial sentiment analysis model found new and actionable bugs in an extensively tested model. In another user study, NLP practitioners with CheckList created twice as many tests and found almost three times as many bugs as users without it.
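  A minimal sketch of a CheckList-style Minimum Functionality Test (MFT) for sentiment analysis, assuming the open-source `checklist` package's `Editor` and `MFT` interfaces; the template and the stubbed `predict_proba` model are illustrative, not from the paper:

  ```python
  import numpy as np
  from checklist.editor import Editor
  from checklist.pred_wrapper import PredictorWrapper
  from checklist.test_types import MFT

  # Generate labeled test cases from a template; 'adj' fills the slot.
  editor = Editor()
  ret = editor.template(
      "The service was {adj}.",
      adj=["awful", "terrible", "dreadful"],
      labels=0,  # every generated case expects the "negative" label
  )

  # A Minimum Functionality Test: the model should get all of these right.
  test = MFT(ret.data, labels=ret.labels,
             name="Negative adjectives", capability="Vocabulary")

  # predict_proba returns class probabilities for a list of strings
  # (stubbed here; a real study would plug in the model under test).
  def predict_proba(texts):
      return np.tile([0.9, 0.1], (len(texts), 1))

  test.run(PredictorWrapper.wrap_softmax(predict_proba))
  test.summary()  # reports the failure rate and example failures
  ```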