NSF PAR Search | NSF Public Access Repository

A Tool for Generating Exceptional Behavior Tests With Large Language Models

https://doi.org/10.1145/3696630.3728608

Zhong, Linghan; Yuan, Samuel; Zhang, Jiyang; Liu, Yu; Nie, Pengyu; Li, Junyi Jessy; Gligoric, Milos (June 2025, ACM)

Free, publicly-accessible full text available June 23, 2026
exLong: Generating Exceptional Behavior Tests with Large Language Models

Zhang, Jiyang; Liu, Yu; Nie, Pengyu; Li, Junyi Jessy; Gligoric, Milos (April 2025, International Conference on Software Engineering)

Free, publicly-accessible full text available April 28, 2026
Multilingual Code Co-evolution using Large Language Models

https://doi.org/10.1145/3611643.3616350

Zhang, Jiyang; Nie, Pengyu; Li, Junyi Jessy; Gligoric, Milos (November 2023, ACM)

Full Text Available
More Precise Regression Test Selection via Reasoning about Semantics-Modifying Changes

https://doi.org/10.1145/3597926.3598086

Liu, Yu; Zhang, Jiyang; Nie, Pengyu; Gligoric, Milos; Legunsen, Owolabi (July 2023, ACM)

Regression test selection (RTS) speeds up regression testing by only re-running tests that might be affected by code changes. Ideal RTS safely selects all affected tests and precisely selects only affected tests. But, aiming for this ideal is often slower than re-running all tests. So, recent RTS techniques use program analysis to trade precision for speed, i.e., lower regression testing time, or even use machine learning to trade safety for speed. We seek to make recent analysis-based RTS techniques more precise, to further speed up regression testing. Independent studies suggest that these techniques reached a “performance wall” in the speed-ups that they provide. We manually inspect code changes to discover those that do not require re-running tests that are only affected by such changes. We categorize 29 kinds of changes that we found from five projects into 13 findings, 11 of which are semantics-modifying. We enhance two RTS techniques—Ekstazi and STARTS—to reason about our findings. Using 1,150 versions of 23 projects, we evaluate the impact on safety and precision of leveraging such changes. We also evaluate if our findings from a few projects can speed up regression testing in other projects. The results show that our enhancements are effective and they can generalize. On average, they result in selecting 41.7% and 31.8% fewer tests, and take 33.7% and 28.7% less time than Ekstazi and STARTS, respectively, with no loss in safety.
Full Text Available
CoditT5: Pretraining for Source Code and Natural Language Editing

https://doi.org/10.1145/3551349.3556955

Zhang, Jiyang; Panthaplackel, Sheena; Nie, Pengyu; Li, Junyi Jessy; Gligoric, Milos (October 2022, CoditT5: Pretraining for Source Code and Natural Language Editing)

Full Text Available
Python-by-contract dataset

https://doi.org/10.1145/3540250.3558917

Zhang, Jiyang; Ristin, Marko; Schanely, Phillip; van de Venn, Hans Wernher; Gligoric, Milos (November 2022, Python-by-contract dataset)

Full Text Available
Comparing and combining analysis-based and learning-based regression test selection

https://doi.org/10.1145/3524481.3527230

Zhang, Jiyang; Liu, Yu; Gligoric, Milos; Legunsen, Owolabi; Shi, August (May 2022, IEEE/ACM International Conference on Automation of Software Test)

Regression testing (rerunning tests on each code version to detect newly-broken functionality) is important and widely practiced. But, regression testing is costly due to the large number of tests and the high frequency of code changes. Regression test selection (RTS) optimizes regression testing by only rerunning a subset of tests that can be affected by changes. Researchers showed that RTS based on program analysis can save substantial testing time for (medium-sized) open-source projects. Practitioners also showed that RTS based on machine learning (ML) works well on very large code repositories, e.g., in Facebook's monorepository. We combine analysis-based RTS and ML-based RTS by using the latter to choose a subset of tests selected by the former. We first train several novel ML models to learn the impact of code changes on test outcomes using a training dataset that we obtain via mutation analysis. Then, we evaluate the benefits of combining ML models with analysis-based RTS on 10 projects, compared with using each technique alone. Combining ML-based RTS with two analysis-based RTS techniques (Ekstazi and STARTS) selects 25.34% and 21.44% fewer tests, respectively.
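The combination described in this abstract, where an ML model chooses a subset of the tests already selected by an analysis-based technique, can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the names `analysis_select`, `combined_select`, and `toy_score`, the dependency map, the scores, and the 0.5 threshold are hypothetical and are not taken from the paper, Ekstazi, or STARTS.

```python
from typing import Callable, Dict, Set

# All names and data below are hypothetical, for illustration only.
Scorer = Callable[[str, Set[str]], float]


def analysis_select(deps: Dict[str, Set[str]], changed: Set[str]) -> Set[str]:
    """Analysis-based step: keep every test whose dependencies overlap the change."""
    return {test for test, files in deps.items() if files & changed}


def combined_select(
    deps: Dict[str, Set[str]],
    changed: Set[str],
    score: Scorer,
    threshold: float = 0.5,
) -> Set[str]:
    """ML-based step: keep only analysis-selected tests scored at or above threshold."""
    candidates = analysis_select(deps, changed)
    return {test for test in candidates if score(test, changed) >= threshold}


# Toy dependency map standing in for what an analysis-based tool would compute.
deps = {
    "TestParser": {"Parser.java", "Lexer.java"},
    "TestLexer": {"Lexer.java"},
    "TestCli": {"Cli.java"},
}
changed = {"Lexer.java"}


def toy_score(test: str, changed_files: Set[str]) -> float:
    # Assumed scores standing in for a trained model's prediction that the
    # test's outcome is affected by the change.
    return {"TestParser": 0.2, "TestLexer": 0.9}.get(test, 0.0)


print(sorted(analysis_select(deps, changed)))             # ['TestLexer', 'TestParser']
print(sorted(combined_select(deps, changed, toy_score)))  # ['TestLexer']
```

Because the ML step only filters the analysis-selected set, the combined selection in this sketch is always a subset of what the analysis step alone would run.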
Full Text Available
Impact of Evaluation Methodologies on Code Summarization

https://doi.org/10.18653/v1/2022.acl-long.339

Nie, Pengyu; Zhang, Jiyang; Li, Junyi Jessy; Mooney, Ray; Gligoric, Milos (January 2022, Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics)

There has been a growing interest in developing machine learning (ML) models for code summarization tasks, e.g., comment generation and method naming. Despite substantial increase in the effectiveness of ML models, the evaluation methodologies, i.e., the way people split datasets into training, validation, and test sets, were not well studied. Specifically, no prior work on code summarization considered the timestamps of code and comments during evaluation. This may lead to evaluations that are inconsistent with the intended use cases. In this paper, we introduce the time-segmented evaluation methodology, which is novel to the code summarization research community, and compare it with the mixed-project and cross-project methodologies that have been commonly used. Each methodology can be mapped to some use cases, and the time-segmented methodology should be adopted in the evaluation of ML models for code summarization. To assess the impact of methodologies, we collect a dataset of (code, comment) pairs with timestamps to train and evaluate several recent ML models for code summarization. Our experiments show that different methodologies lead to conflicting evaluation results. We invite the community to expand the set of methodologies used in evaluations.
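As a rough illustration of the time-segmented idea in this abstract, the sketch below splits timestamped (code, comment) pairs so that training data strictly precedes validation data, which in turn precedes test data. The `Sample` type, the `time_segmented_split` function, and the cutoff dates are hypothetical choices for this example; the paper's exact procedure may differ.

```python
from dataclasses import dataclass
from datetime import date
from typing import List, Tuple


@dataclass(frozen=True)
class Sample:
    """A hypothetical (code, comment) pair with the time it was written."""
    code: str
    comment: str
    timestamp: date


def time_segmented_split(
    samples: List[Sample], val_start: date, test_start: date
) -> Tuple[List[Sample], List[Sample], List[Sample]]:
    """Split by time: train < val_start <= validation < test_start <= test."""
    if val_start >= test_start:
        raise ValueError("val_start must be earlier than test_start")
    train = [s for s in samples if s.timestamp < val_start]
    val = [s for s in samples if val_start <= s.timestamp < test_start]
    test = [s for s in samples if s.timestamp >= test_start]
    return train, val, test


# Toy data with assumed dates.
samples = [
    Sample("def a(): ...", "does a", date(2018, 3, 1)),
    Sample("def b(): ...", "does b", date(2019, 5, 1)),
    Sample("def c(): ...", "does c", date(2020, 6, 1)),
    Sample("def d(): ...", "does d", date(2021, 2, 1)),
]
train, val, test = time_segmented_split(
    samples, val_start=date(2020, 1, 1), test_start=date(2021, 1, 1)
)
print(len(train), len(val), len(test))  # 2 1 1
```

In contrast to a random split over all samples, no sample in the training set here is newer than any sample in the validation or test sets.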
Full Text Available
Impact of Evaluation Methodologies on Code Summarization

Nie, Pengyu; Zhang, Jiyang; Mooney, Raymond; Li, Junyi; Gligoric, Milos (January 2022, Association for Computational Linguistics)

Full Text Available
