NSF PAR Search | NSF Public Access Repository


Neurosymbolic Repair of Test Flakiness

https://doi.org/10.1145/3650212.3680369

Chen, Yang; Jabbarvand, Reyhaneh (September 2024, ACM)

Full Text Available
Revisiting Test-Case Prioritization on Long-Running Test Suites

https://doi.org/10.1145/3650212.3680307

Cheng, Runxiang; Wang, Shuai; Jabbarvand, Reyhaneh; Marinov, Darko (September 2024, ACM)

Full Text Available
WhiteFox: White-Box Compiler Fuzzing Empowered by Large Language Models

https://doi.org/10.1145/3689736

Yang, Chenyuan; Deng, Yinlin; Lu, Runyu; Yao, Jiayi; Liu, Jiawei; Jabbarvand, Reyhaneh; Zhang, Lingming (October 2024, Proceedings of the ACM on Programming Languages)

Compiler correctness is crucial, as miscompilation can falsify program behaviors, leading to serious consequences over the software supply chain. In the literature, fuzzing has been extensively studied to uncover compiler defects. However, compiler fuzzing remains challenging: Existing arts focus on black- and grey-box fuzzing, which generates test programs without sufficient understanding of internal compiler behaviors. As such, they often fail to construct test programs to exercise intricate optimizations. Meanwhile, traditional white-box techniques, such as symbolic execution, are computationally inapplicable to the giant codebase of compiler systems. Recent advances demonstrate that Large Language Models (LLMs) excel in code generation/understanding tasks and even have achieved state-of-the-art performance in black-box fuzzing. Nonetheless, guiding LLMs with compiler source-code information remains a missing piece of research in compiler testing. To this end, we propose WhiteFox, the first white-box compiler fuzzer using LLMs with source-code information to test compiler optimization, with a spotlight on detecting deep logic bugs in the emerging deep learning (DL) compilers. WhiteFox adopts a multi-agent framework: (i) an LLM-based analysis agent examines the low-level optimization source code and produces requirements on the high-level test programs that can trigger the optimization; (ii) an LLM-based generation agent produces test programs based on the summarized requirements. Additionally, optimization-triggering tests are also used as feedback to further enhance the test generation prompt on the fly. Our evaluation on the three most popular DL compilers (i.e., PyTorch Inductor, TensorFlow-XLA, and TensorFlow Lite) shows that WhiteFox can generate high-quality test programs to exercise deep optimizations requiring intricate conditions, practicing up to 8 times more optimizations than state-of-the-art fuzzers. 
To date, WhiteFox has found in total 101 bugs for the compilers under test, with 92 confirmed as previously unknown and 70 already fixed. Notably, WhiteFox has been recently acknowledged by the PyTorch team, and is in the process of being incorporated into its development workflow. Finally, beyond DL compilers, WhiteFox can also be adapted for compilers in different domains, such as LLVM, where WhiteFox has already found multiple bugs.
Full Text Available
Can ChatGPT Repair Non-Order-Dependent Flaky Tests?

https://doi.org/10.1145/3643656.3643900

Chen, Yang; Jabbarvand, Reyhaneh (April 2024, ACM)

Full Text Available
DeltaDroid: Dynamic Delivery Testing in Android

https://doi.org/10.1145/3563213

Ghorbani, Negar; Jabbarvand, Reyhaneh; Salehnamadi, Navid; Garcia, Joshua; Malek, Sam (October 2023, ACM Transactions on Software Engineering and Methodology)

Android is a highly fragmented platform with a diverse set of devices and users. To support the deployment of apps in such a heterogeneous setting, Android has introduced dynamic delivery, a new model of software deployment in which optional, device- or user-specific functionalities of an app, called Dynamic Feature Modules (DFMs), can be installed, as needed, after the app’s initial installation. This model of app deployment, however, has exacerbated the challenges of properly testing Android apps. In this article, we first describe the results of an extensive study in which we formalized a defect model representing the various conditions under which DFM installations may fail. We then present DeltaDroid, a tool aimed at assisting the developers with validating dynamic delivery behavior in their apps by augmenting their existing test suite. Our experimental evaluation using real-world apps corroborates DeltaDroid’s ability to detect many crashes and unexpected behaviors that the existing automated testing tools cannot reveal.
Full Text Available
Lost in Translation: A Study of Bugs Introduced by Large Language Models while Translating Code

https://doi.org/10.1145/3597503.3639226

Pan, Rangeet; Ibrahimzada, Ali Reza; Krishna, Rahul; Sankar, Divya; Wassi, Lambert Pouguem; Merler, Michele; Sobolev, Boris; Pavuluri, Raju; Sinha, Saurabh; Jabbarvand, Reyhaneh (April 2024, Proceedings of the International Conference on Software Engineering)

Transforming Test Suites into Croissants

https://doi.org/10.1145/3597926.3598119

Chen, Yang; Yildiz, Alperen; Marinov, Darko; Jabbarvand, Reyhaneh (July 2023, ACM International Symposium on Software Testing and Analysis)

Full Text Available
