NSF PAR Search | NSF Public Access Repository


Automatically Detecting Numerical Instability in Machine Learning Applications via Soft Assertions

https://doi.org/10.1145/3729394

Sharmin, Shaila; Zahid, Anwar Hossain; Bhattacharjee, Subhankar; Igwilo, Chiamaka; Kim, Miryung; Le, Wei (June 2025, Proceedings of the ACM on Software Engineering)

Machine learning (ML) applications have become an integral part of our lives. ML applications extensively use floating-point computation and involve very large/small numbers; thus, maintaining the numerical stability of such complex computations remains an important challenge. Numerical bugs can lead to system crashes, incorrect output, and wasted computing resources. In this paper, we introduce a novel idea, namely soft assertions (SA), to encode safety/error conditions for the places where numerical instability can occur. A soft assertion is an ML model automatically trained using the dataset obtained during unit testing of unstable functions. Given the values at the unstable function in an ML application, a soft assertion reports how to change these values in order to trigger the instability. We then use the output of soft assertions as signals to effectively mutate inputs to trigger numerical instability in ML applications. In the evaluation, we used the GRIST benchmark, a total of 79 programs, as well as 15 real-world ML applications from GitHub. We compared our tool with 5 state-of-the-art (SOTA) fuzzers. We found all the GRIST bugs and outperformed the baselines. We found 13 numerical bugs in real-world code, one of which had already been confirmed by the GitHub developers. While the baselines mostly found the bugs that report NaN and INF, our tool found numerical bugs with incorrect output. We showed one case where the Tumor Detection Model, trained on Brain MRI images, should have predicted "tumor", but instead, it incorrectly predicted "no tumor" due to the numerical bugs. Our replication package is located at https://figshare.com/s/6528d21ccd28bea94c32.
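As a minimal illustration of the kind of silent numerical instability the abstract describes (incorrect output with no NaN or INF reported), the sketch below compares a naive softplus with an algebraically equivalent rearrangement. It is not the paper's soft-assertion technique; the function names `naive_softplus` and `stable_softplus` are hypothetical and chosen only for this example.

```python
import math


def naive_softplus(x: float) -> float:
    """Textbook formula log(1 + e^x); loses all precision for very negative x."""
    return math.log(1.0 + math.exp(x))


def stable_softplus(x: float) -> float:
    """Rearranged form max(x, 0) + log1p(e^-|x|), which avoids both
    overflow in exp() and absorption of a tiny term into 1.0."""
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


# For x = -50, e^x is about 1.9e-22, far below double-precision epsilon,
# so 1.0 + e^x rounds to exactly 1.0 and the naive result collapses to 0.0.
silent_wrong = naive_softplus(-50.0)
accurate = stable_softplus(-50.0)

# For a large positive input the stable form simply returns about x.
large = stable_softplus(1000.0)
```

Here the naive result for `-50.0` is exactly `0.0` while the rearranged form keeps a small positive value, and no exception or NaN signals the loss. In CPython, `naive_softplus(1000.0)` would instead raise `OverflowError`, which is the easier-to-detect kind of failure.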
Free, publicly-accessible full text available June 19, 2026
Closing the Gap: A User Study on the Real-world Usefulness of AI-powered Vulnerability Detection & Repair in the IDE

https://doi.org/10.1109/ICSE55347.2025.00126

Steenhoek, Benjamin; Sivaraman, Kalpathy; Gonzalez, Renata Saldivar; Mohylevskyy, Yevhen; Moghaddam, Roshanak Zilouchian; Le, Wei (April 2025, IEEE)

Free, publicly-accessible full text available April 26, 2026
From Pseudo-Code to Source Code: A Self-Supervised Search Approach

Kulkarni, Adithya; Chakraborty, Mohna; Sium, Yonas Afewerki; Valluri, Sai Charishma; Le, Wei; Li, Qi (April 2025, ICLR 2025 Third Workshop on Deep Learning for Code)

Free, publicly-accessible full text available April 24, 2026
Dataflow Analysis-Inspired Deep Learning for Efficient Vulnerability Detection

https://doi.org/10.1145/3597503.3623345

Steenhoek, Benjamin; Gao, Hongyang; Le, Wei (February 2024, ACM)

Full Text Available
Towards Causal Deep Learning for Vulnerability Detection

https://doi.org/10.1145/3597503.3639170

Rahman, Md Mahbubur; Ceka, Ira; Mao, Chengzhi; Chakraborty, Saikat; Ray, Baishakhi; Le, Wei (April 2024, ACM)

Full Text Available
TRACED: Execution-aware Pre-training for Source Code

https://doi.org/10.1145/3597503.3608140

Ding, Yangruibo; Steenhoek, Benjamin; Pei, Kexin; Kaiser, Gail; Le, Wei; Ray, Baishakhi (February 2024, ACM)

Most existing pre-trained language models for source code focus on learning the static code text, typically augmented with static code structures (abstract syntax tree, dependency graphs, etc.). However, program semantics will not be fully exposed before the real execution. Without an understanding of the program execution, statically pre-trained models fail to comprehensively capture the dynamic code properties, such as the branch coverage and the runtime variable values, and they are consequently less effective at code understanding tasks, such as retrieving semantic clones and detecting software vulnerabilities. To close the gap between the static nature of language models and the dynamic characteristics of programs, we introduce TRACED, an execution-aware pre-training strategy for source code. Specifically, we pre-train code language models with a combination of source code, executable inputs, and corresponding execution traces. Our goal is to teach code models the complicated execution logic during the pre-training, enabling the model to statically estimate the dynamic code properties without repeatedly executing code during task-specific fine-tuning. To illustrate the effectiveness of our proposed approach, we fine-tune and evaluate TRACED on three downstream tasks: static execution estimation, clone retrieval, and vulnerability detection. The empirical results show that TRACED relatively improves the statically pre-trained code models by 12.4% for complete execution path prediction and by 25.2% for runtime variable value predictions. TRACED also significantly outperforms statically pre-trained models in clone retrieval and vulnerability detection across four public benchmarks.
Full Text Available
Towards Understanding and Enhancing Robustness of Deep Learning Models against Malicious Unlearning Attacks

Qian, Wei; Zhao, Chenxu; Le, Wei; Ma, Meiyi; Huai, Mengdi (August 2023, Proceedings of the 29th ACM SIGKDD Conference on Knowledge Discovery and Data Mining)

Given the availability of abundant data, deep learning models have advanced and become ubiquitous in the past decade. In practice, for many different reasons (e.g., privacy, usability, and fidelity), individuals also want trained deep models to forget some specific data. Motivated by this, machine unlearning (also known as selective data forgetting) has been intensively studied; it aims at removing the influence that any particular training sample had on the trained model during the unlearning process. However, people usually employ machine unlearning methods as trusted basic tools and rarely doubt their reliability. In fact, the increasingly critical role of machine unlearning makes deep learning models susceptible to the risk of being maliciously attacked. To better understand the performance of deep learning models in malicious environments, we believe that it is critical to study the robustness of deep learning models to malicious unlearning attacks, which happen during the unlearning process. To bridge this gap, in this paper, we first demonstrate that malicious unlearning attacks pose immense threats to the security of deep learning systems. Specifically, we present a broad class of malicious unlearning attacks wherein maliciously crafted unlearning requests trigger deep learning models to misbehave on target samples in a highly controllable and predictable manner. In addition, to improve the robustness of deep learning models, we also present a general defense mechanism, which aims to identify and unlearn effective malicious unlearning requests based on their gradient influence on the unlearned models. Further, theoretical analyses are conducted to analyze the proposed methods. Extensive experiments on real-world datasets validate the vulnerabilities of deep learning models to malicious unlearning attacks and the effectiveness of the introduced defense mechanism.
Full Text Available
DeepLocalize: Fault Localization for Deep Neural Networks

https://doi.org/10.1109/ICSE43902.2021.00034

Wardat, Mohammad; Le, Wei; Rajan, Hridesh (May 2021, 2021 IEEE/ACM 43rd International Conference on Software Engineering (ICSE))
Deep Neural Networks (DNNs) are becoming an integral part of most software systems. Previous work has shown that DNNs have bugs. Unfortunately, existing debugging techniques don't support localizing DNN bugs because of the lack of understanding of model behaviors. The entire DNN model appears as a black box. To address these problems, we propose an approach and a tool that automatically determines whether the model is buggy or not, and identifies the root causes for DNN errors. Our key insight is that historic trends in values propagated between layers can be analyzed to identify and localize faults. To that end, we first enable dynamic analysis of deep learning applications, either by converting them into an imperative representation or, alternatively, by using a callback mechanism. Both mechanisms allow us to insert probes that enable dynamic analysis over the traces produced by the DNN while it is being trained on the training data. We then conduct dynamic analysis over the traces to identify the faulty layer or hyperparameter that causes the error. We propose an algorithm for identifying root causes by capturing any numerical error and monitoring the model during training and finding the relevance of every layer/parameter on the DNN outcome. We have collected a benchmark containing 40 buggy models and patches that contain real errors in deep learning applications from Stack Overflow and GitHub. Our benchmark can be used to evaluate automated debugging tools and repair techniques. We have evaluated our approach using this DNN bug-and-patch benchmark, and the results showed that our approach is much more effective than the existing debugging approach used in the state-of-the-practice Keras library. For 34/40 cases, our approach was able to detect faults whereas the best debugging approach provided by Keras detected 32/40 faults. Our approach was able to localize 21/40 bugs whereas Keras did not localize any faults.
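The idea of inserting probes between layers to catch a numerical error and point at the layer that produced it can be sketched roughly as follows. This is a toy illustration in plain Python, not the DeepLocalize algorithm or its Keras callback; the `Layer` type, the helper `first_non_finite_layer`, and the three toy layers are all hypothetical names made up for this example.

```python
import math
from typing import Callable, List, Optional, Sequence

# A "layer" here is just a function from a list of floats to a list of floats.
Layer = Callable[[List[float]], List[float]]


def first_non_finite_layer(layers: Sequence[Layer],
                           inputs: List[float]) -> Optional[int]:
    """Run the layers in order and probe each output.

    Returns the index of the first layer whose output contains NaN or
    infinity, or None if every intermediate value stays finite.
    """
    values = inputs
    for index, layer in enumerate(layers):
        values = layer(values)
        if any(not math.isfinite(v) for v in values):
            return index
    return None


def scale(values: List[float]) -> List[float]:
    # Blows activations up to around 1e200, still finite in double precision.
    return [v * 1e200 for v in values]


def square(values: List[float]) -> List[float]:
    # Float multiplication overflows to inf without raising an exception.
    return [v * v for v in values]


def center(values: List[float]) -> List[float]:
    # Subtracts the maximum; harmless on finite inputs.
    top = max(values)
    return [v - top for v in values]


faulty = first_non_finite_layer([scale, square, center], [1.0, 2.0])
healthy = first_non_finite_layer([scale, center], [1.0, 2.0])
```

In this toy run the probe reports index `1` (the `square` layer) for the first pipeline and `None` for the second; a real tool would additionally track trends across training iterations instead of a single forward pass.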
Full Text Available
Validating static warnings via testing code fragments

https://doi.org/10.1145/3460319.3464832

Kallingal Joshy, Ashwin; Chen, Xueyuan; Steenhoek, Benjamin; Le, Wei (July 2021, Proceedings of the 30th ACM SIGSOFT International Symposium on Software Testing and Analysis)
Full Text Available
