NSF PAR Search | NSF Public Access Repository

Is Assertion Roulette still a test smell? An experiment from the perspective of testing education

https://doi.org/10.1109/VL/HCC53370.2022.9833107

Bai, Gina R.; Presler-Marshall, Kai; Fisk, Susan R.; Stolee, Kathryn T. (September 2022, 2022 IEEE Symposium on Visual Languages and Human-Centric Computing)

Full Text Available

Understanding Similar Code through Comparative Comprehension

https://doi.org/10.1109/VL/HCC53370.2022.9833117

Middleton, Justin; Stolee, Kathryn T. (September 2022, 2022 IEEE Symposium on Visual Languages and Human-Centric Computing)

Full Text Available

Check It Off: Exploring the Impact of a Checklist Intervention on the Quality of Student-authored Unit Tests

https://doi.org/10.1145/3502718.3524799

Bai, Gina R.; Presler-Marshall, Kai; Price, Thomas W.; Stolee, Kathryn T. (July 2022, Proceedings of the 27th ACM Conference on Innovation and Technology in Computer Science Education)

Software testing is an essential skill for computer science students. Prior work reports that students desire support in determining what code to test and which scenarios should be tested. In response to this, we present a lightweight testing checklist that contains both tutorial information and testing strategies to guide students in what and how to test. To assess the impact of the testing checklist, we conducted an experimental, controlled A/B study with 32 undergraduate and graduate students. The study task was writing a test suite for an existing program. Students were given either the testing checklist (the experimental group) or a tutorial on a standard coverage tool with which they were already familiar (the control group). By analyzing the combination of student-written tests and survey responses, we found students with the checklist performed as well as or better than the coverage tool group, suggesting a potential positive impact of the checklist (or at minimum, a non-negative impact). This is particularly noteworthy given the control condition of the coverage tool is the state of the practice. These findings suggest that the testing tool support does not need to be sophisticated to be effective.
Full Text Available

Demystifying regular expression bugs: A comprehensive study on regular expression bug causes, fixes, and testing

https://doi.org/10.1007/s10664-021-10033-1

Wang, Peipei; Brown, Chris; Jennings, Jamie A.; Stolee, Kathryn T. (January 2022, Empirical Software Engineering)

Full Text Available

An Empirical Study on Regular Expression Bugs

https://doi.org/10.1145/3379597.3387464

Wang, Peipei; Brown, Chris; Jennings, Jamie A.; Stolee, Kathryn T. (June 2020, International Conference on Mining Software Repositories)
Full Text Available

Exploring Regular Expression Evolution

https://doi.org/10.1109/SANER.2019.8667972

Wang, Peipei; Bai, Gina R.; Stolee, Kathryn T. (February 2019, IEEE 26th International Conference on Software Analysis, Evolution and Reengineering)

Although there are tools to help developers understand the matching behaviors between a regular expression and a string, regular-expression related faults are still common. Learning developers’ behavior through the change history of regular expressions can identify common edit patterns, which can inform the creation of mutation and repair operators to assist with testing and fixing regular expressions. In this work, we explore how regular expressions evolve over time, focusing on the characteristics of regular expression edits, the syntactic and semantic difference of the edits, and the feature changes of edits. Our exploration uses two datasets. First, we look at GitHub projects that have a regular expression in their current version and look back through the commit logs to collect the regular expressions’ edit history. Second, we collect regular expressions composed by study participants during problem-solving tasks. Our results show that 1) 95% of the regular expressions from GitHub are not edited, 2) most edited regular expressions have a syntactic distance of 4-6 characters from their predecessors, 3) over 50% of the edits in GitHub tend to expand the scope of regular expression, and 4) the number of features used indicates the regular expression language usage increases over time. This work has implications for supporting regular expression repair and mutation to ensure test suite quality.
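
To make "syntactic distance" concrete, the sketch below computes a character-level Levenshtein edit distance between two versions of a regular expression. Treating the study's distance as Levenshtein distance is an assumption here, since the abstract does not name the metric; the function name `edit_distance` and the sample patterns are hypothetical and not taken from the study's data.

```python
def edit_distance(old: str, new: str) -> int:
    """Character-level Levenshtein distance (insertions, deletions, substitutions)."""
    # At the start of iteration i, previous[j] is the distance
    # between old[:i-1] and new[:j].
    previous = list(range(len(new) + 1))
    for i, old_char in enumerate(old, start=1):
        current = [i]
        for j, new_char in enumerate(new, start=1):
            substitution = previous[j - 1] + (old_char != new_char)
            insertion = current[j - 1] + 1
            deletion = previous[j] + 1
            current.append(min(substitution, insertion, deletion))
        previous = current
    return previous[-1]


# Hypothetical edit that expands a pattern's scope:
# digits only -> digits or ASCII letters.
before = r"[0-9]+"
after = r"[0-9a-zA-Z]+"
distance = edit_distance(before, after)
print(distance)
```

In this made-up example the edit inserts six characters (`a-zA-Z`) and the new pattern accepts every string the old one did, which is one way an edit can expand a regular expression's scope.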
Full Text Available

Wait Wait. No, Tell Me. Analyzing Selenium Configuration Effects on Test Flakiness

https://doi.org/10.1109/AST.2019.000-1

Presler-Marshall, Kai; Horton, Eric; Heckman, Sarah; Stolee, Kathryn T. (January 2019, Proceedings of the 14th International Workshop on Automation of Software Test)

Flaky tests are a source of frustration and uncertainty for developers. In an educational environment, flaky tests can create doubts related to software behavior and student grades, especially when the grades depend on tests passing. NC State University's junior-level software engineering course models industrial practice through team-based development and testing of new features on a large electronic health record (EHR) system, iTrust2. Students are expected to maintain and supplement an extensive suite of UI tests using Selenium WebDriver. Team builds are run on the course's continuous integration (CI) infrastructure. Students report, and we confirm, that tests that pass on one build will inexplicably fail on the next, impacting productivity and confidence in code quality and the CI system. The goal of this work is to find and fix the sources of flaky tests in iTrust2. We analyze configurations of Selenium using different underlying web browsers and timeout strategies (waits) for both test stability and runtime performance. We also consider underlying hardware and operating systems. Our results show that HtmlUnit with Thread waits provides the lowest number of test failures and best runtime on poor-performing hardware. When given more resources (e.g., more memory and a faster CPU), Google Chrome with Angular waits is less flaky and faster than HtmlUnit, especially if the browser instance is not restarted between tests. The outcomes of this research are a more stable and substantially faster teaching application and a recommendation on how to configure Selenium for applications similar to iTrust2 that run in a CI environment.
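
The difference between timeout strategies can be illustrated without Selenium. The sketch below assumes that a "Thread wait" means pausing for a fixed interval, and contrasts it with a polling wait that returns as soon as a condition holds. `fixed_wait`, `poll_until`, and `SimulatedPage` are hypothetical helpers, not Selenium WebDriver APIs, and the paper's Angular waits are not modeled here.

```python
import time
from typing import Callable


def fixed_wait(seconds: float) -> None:
    """Pause for a fixed interval, whether or not the page is ready."""
    time.sleep(seconds)


def poll_until(condition: Callable[[], bool], timeout: float,
               interval: float = 0.01) -> bool:
    """Re-check `condition` until it is true or `timeout` seconds elapse."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        if condition():
            return True
        time.sleep(interval)
    return condition()  # one final check at the deadline


class SimulatedPage:
    """Stand-in for a page whose element appears after `delay` seconds."""

    def __init__(self, delay: float) -> None:
        self._ready_at = time.monotonic() + delay

    def element_present(self) -> bool:
        return time.monotonic() >= self._ready_at


page = SimulatedPage(delay=0.05)
found = poll_until(page.element_present, timeout=1.0)
print(found)
```

A fixed wait costs the same time on every run, while a polling wait costs only as long as the condition takes to become true, up to the timeout.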
Full Text Available

Exploring tools and strategies used during regular expression composition tasks

https://doi.org/10.1109/ICPC.2019.00039

Bai, Gina R.; Clee, Brian; Shrestha, Nischal; Chapman, Carl; Wright, Cimone; Stolee, Kathryn T. (January 2019, Proceedings of the 27th International Conference on Program Comprehension)

Regular expressions are frequently found in programming projects. Studies have found that developers can accurately determine whether a string matches a regular expression. However, we still do not know the challenges associated with composing regular expressions. We conduct an exploratory case study to reveal the tools and strategies developers use during regular expression composition. In this study, 29 students are tasked with composing regular expressions that pass unit tests illustrating the intended behavior. The tasks are in Java and the Eclipse IDE was set up with JUnit tests. Participants had one hour to work and could use any Eclipse tools, web search, or web-based tools they desired. Screen-capture software recorded all interactions with browsers and the IDE. We analyzed the videos quantitatively by transcribing logs and extracting personas. Our results show that participants were 30% successful (28 of 94 attempts) at achieving a 100% pass rate on the unit tests. When participants used tools frequently, as in the case of the novice tester and the knowledgeable tester personas, or when they guess at a solution prior to searching, they are more likely to pass all the unit tests. We also found that compile errors often arise when participants searched for a result and copy/pasted the regular expression from another language into their Java files. These results point to future research into making regular expression composition easier for programmers, such as integrating visualization into the IDE to reduce context switching or providing language migration support when reusing regular expressions written in another language to reduce compile errors.
Full Text Available

Replication can improve prior results: a GitHub study of pull request acceptance

https://doi.org/10.1109/ICPC.2019.00037

Chen, Di; Stolee, Kathryn T.; Menzies, Tim (January 2019, Proceedings of the 27th International Conference on Program Comprehension)

Crowdsourcing and data mining can be used to effectively reduce the effort associated with the partial replication and enhancement of qualitative studies. For example, in a primary study, other researchers explored factors influencing the fate of GitHub pull requests using an extensive qualitative analysis of 20 pull requests. Guided by their findings, we mapped some of their qualitative insights onto quantitative questions. To determine how well their findings generalize, we collected much more data (170 additional pull requests from 142 GitHub projects). Using crowdsourcing, that data was augmented with subjective qualitative human opinions about how pull requests extended the original issue. The crowd’s answers were then combined with quantitative features and, using data mining, used to build a predictor for whether code would be merged. That predictor was far more accurate than the one built from the primary study’s qualitative factors (F1=90 vs 68%), illustrating the value of a mixed-methods approach and replication to improve prior results. To test the generality of this approach, the next step in future work is to conduct other studies that extend qualitative studies with crowdsourcing and data mining.
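
For reference, F1 is the harmonic mean of precision and recall. The sketch below computes it from confusion-matrix counts; the function name `f1_score` is hypothetical and the counts are made-up illustrations, not data from the study.

```python
def f1_score(true_positives: int, false_positives: int,
             false_negatives: int) -> float:
    """Harmonic mean of precision and recall; 0.0 when undefined."""
    predicted_positive = true_positives + false_positives
    actual_positive = true_positives + false_negatives
    if predicted_positive == 0 or actual_positive == 0:
        return 0.0
    precision = true_positives / predicted_positive
    recall = true_positives / actual_positive
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Made-up counts for a merged / not-merged pull request classifier.
score = f1_score(true_positives=90, false_positives=10, false_negatives=10)
print(f"{score:.2f}")
```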
Full Text Available

Bridging the Gap: From Research to Practical Advice

https://doi.org/10.1109/MS.2018.3571235

Le Goues, Claire; Jaspan, Ciera; Ozkaya, Ipek; Shaw, Mary; Stolee, Kathryn T. (September 2018, IEEE Software)

Full Text Available
