NSF PAR Search | NSF Public Access Repository

NeuralSAT: A High-Performance Verification Tool for Deep Neural Networks

https://doi.org/10.1007/978-3-031-98679-6_19

Duong, Hai; Nguyen, ThanhVu; Dwyer, Matthew B (July 2025, Springer Nature Switzerland)

Deep Neural Networks (DNNs) are increasingly deployed in critical applications, where ensuring their safety and robustness is paramount. We present NeuralSAT$_{\text{CAV25}}$, a high-performance DNN verification tool that uses the DPLL(T) framework and supports a wide range of network architectures and activation functions. Since its debut in VNN-COMP’23, in which it received the New Participant Award and ranked 4th overall, NeuralSAT$_{\text{CAV25}}$ has advanced significantly, achieving second place in VNN-COMP’24. This paper presents and evaluates the latest development of NeuralSAT$_{\text{CAV25}}$, focusing on the versatility, ease of use, and competitive performance of the tool. NeuralSAT$_{\text{CAV25}}$ is available at: https://github.com/dynaroars/neuralsat.
Free, publicly-accessible full text available July 22, 2026
Doc2OracLL: Investigating the Impact of Documentation on LLM-Based Test Oracle Generation

https://doi.org/10.1145/3729354

Hossain, Soneya Binta; Taylor, Raygan; Dwyer, Matthew (June 2025, Proceedings of the ACM on Software Engineering)

Code documentation is a critical artifact of software development, bridging human understanding and machine-readable code. Beyond aiding developers in code comprehension and maintenance, documentation also plays a critical role in automating various software engineering tasks, such as test oracle generation (TOG). In Java, Javadoc comments offer structured, natural language documentation embedded directly within the source code, typically describing functionality, usage, parameters, return values, and exceptional behavior. While prior research has explored the use of Javadoc comments in TOG alongside other information, such as the method under test (MUT), their potential as a stand-alone input source, the most relevant Javadoc components, and guidelines for writing effective Javadoc comments for automating TOG remain less explored. In this study, we investigate the impact of Javadoc comments on TOG through a comprehensive analysis. We begin by fine-tuning 10 large language models using three different prompt pairs to assess the role of Javadoc comments alongside other contextual information. Next, we systematically analyze the impact of different Javadoc comment components on TOG. To evaluate the generalizability of Javadoc comments from various sources, we also generate them using the GPT-3.5 model. We perform a thorough bug detection study using the Defects4J dataset to understand their role in real-world bug detection. Our results show that incorporating Javadoc comments improves the accuracy of test oracles in most cases, aligning closely with ground truth. We find that Javadoc comments alone can match or even outperform approaches that utilize the MUT implementation. Additionally, we identify that the description and the return tag are the most valuable components for TOG. Finally, our approach, when using only Javadoc comments, detects between 19% and 94% more real-world bugs in Defects4J than prior methods, establishing a new state-of-the-art. To further guide developers in writing effective documentation, we conduct a detailed qualitative study on when Javadoc comments are helpful or harmful for TOG.
Free, publicly-accessible full text available June 19, 2026
TOGLL: Correct and Strong Test Oracle Generation with LLMs

https://doi.org/10.1109/ICSE55347.2025.00098

Hossain, Soneya Binta; Dwyer, Matthew B (April 2025, IEEE)

Free, publicly-accessible full text available April 26, 2026
Implications of data topology for deep generative models

https://doi.org/10.3389/fcomp.2024.1260604

Jin, Yinzhu; McDaniel, Rory; Tatro, N Joseph; Catanzaro, Michael J; Smith, Abraham D; Bendich, Paul; Dwyer, Matthew B; Fletcher, P Thomas (August 2024, Frontiers in Computer Science)

Many deep generative models, such as variational autoencoders (VAEs) and generative adversarial networks (GANs), learn an immersion mapping from a standard normal distribution in a low-dimensional latent space into a higher-dimensional data space. As such, these mappings are only capable of producing simple data topologies, i.e., those equivalent to an immersion of Euclidean space. In this work, we demonstrate the limitations of such latent space generative models when trained on data distributions with non-trivial topologies. We do this by training these models on synthetic image datasets with known topologies (spheres, tori, etc.). We then show how this results in failures of both data generation and data interpolation. Next, we compare this behavior to two classes of deep generative models that in principle allow for more complex data topologies. First, we look at chart autoencoders (CAEs), which construct a smooth data manifold from multiple latent space chart mappings. Second, we explore score-based models, e.g., denoising diffusion probabilistic models, which estimate gradients of the data distribution without resorting to an explicit mapping to a latent space. Our results show that these models do demonstrate improved ability over latent space models in modeling data distributions with complex topologies; however, challenges remain.
Full Text Available
Harnessing Neuron Stability to Improve DNN Verification

https://doi.org/10.1145/3643765

Duong, Hai; Xu, Dong; Nguyen, Thanhvu; Dwyer, Matthew B (July 2024, Proceedings of the ACM on Software Engineering)

Deep Neural Networks (DNNs) have emerged as an effective approach to tackling real-world problems. However, like human-written software, DNNs are susceptible to bugs and attacks. This has generated significant interest in developing effective and scalable DNN verification techniques and tools. Recent developments in DNN verification have highlighted the potential of constraint-solving approaches that combine abstraction techniques with SAT solving. Abstraction approaches are effective at precisely encoding neuron behavior when it is linear, but they lead to overapproximation and combinatorial scaling when behavior is non-linear. SAT approaches in DNN verification have incorporated standard DPLL techniques, but have overlooked important optimizations found in modern SAT solvers that help them scale on industrial benchmarks. In this paper, we present VeriStable, a novel extension of the recently proposed DPLL-based constraint DNN verification approach. VeriStable leverages the insight that while neuron behavior may be non-linear across the entire DNN input space, at intermediate states computed during verification many neurons may be constrained to have linear behavior – these neurons are stable. Efficiently detecting stable neurons reduces combinatorial complexity without compromising the precision of abstractions. Moreover, the structure of clauses arising in DNN verification problems shares important characteristics with industrial SAT benchmarks. We adapt and incorporate multi-threading and restart optimizations targeting those characteristics to further optimize DPLL-based DNN verification. We evaluate the effectiveness of VeriStable across a range of challenging benchmarks including fully-connected feedforward networks (FNNs), convolutional neural networks (CNNs) and residual networks (ResNets) applied to the standard MNIST and CIFAR datasets. Preliminary results show that VeriStable is competitive and outperforms state-of-the-art DNN verification tools, including α-β-CROWN and MN-BaB, the first- and second-place performers at VNN-COMP, respectively.
Full Text Available
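
The notion of a stable neuron in the VeriStable abstract above can be illustrated with a small sketch. This is not VeriStable's detection procedure, which the abstract does not spell out; it swaps in plain interval arithmetic as the bounding method, and the function names `preactivation_bounds` and `classify_relu_neurons` are hypothetical. Given a box of inputs, a ReLU neuron whose pre-activation is provably non-negative or provably non-positive behaves linearly on that box, so a verifier would not need to case-split on it.

```python
def preactivation_bounds(weights, biases, lower, upper):
    """Interval bounds on z = W x + b for every x in the box [lower, upper].

    Assumption: plain interval arithmetic, chosen for brevity. Real verifiers
    typically use tighter abstractions.
    """
    bounds = []
    for row, b in zip(weights, biases):
        lo = hi = b
        for w, l, u in zip(row, lower, upper):
            if w >= 0:
                lo += w * l
                hi += w * u
            else:
                # A negative weight swaps which end of the interval is extreme.
                lo += w * u
                hi += w * l
        bounds.append((lo, hi))
    return bounds


def classify_relu_neurons(bounds):
    """Label each ReLU neuron as stably active, stably inactive, or unstable."""
    labels = []
    for lo, hi in bounds:
        if lo >= 0:
            labels.append("active")    # ReLU acts as the identity on the box
        elif hi <= 0:
            labels.append("inactive")  # ReLU outputs zero on the box
        else:
            labels.append("unstable")  # sign is undetermined, needs a split
    return labels


if __name__ == "__main__":
    W = [[1, 1], [-1, -1], [1, -1]]
    b = [1, -1, 0]
    print(classify_relu_neurons(preactivation_bounds(W, b, [0, 0], [1, 1])))
```

Only the neurons labelled `unstable` contribute to the combinatorial search, which is why detecting more stable neurons shrinks the problem.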
S3C: Spatial Semantic Scene Coverage for Autonomous Vehicles

https://doi.org/10.1145/3597503.3639178

Woodlief, Trey; Toledo, Felipe; Elbaum, Sebastian; Dwyer, Matthew B (April 2024, ACM)
CIT4DNN: Generating Diverse and Rare Inputs for Neural Networks Using Latent Space Combinatorial Testing

https://doi.org/10.1145/3597503.3639106

Dola, Swaroopa; McDaniel, Rory; Dwyer, Matthew B.; Soffa, Mary Lou (April 2024, ACM)

Deep neural networks (DNNs) are being used in a wide range of applications including safety-critical systems. Several DNN test generation approaches have been proposed to generate fault-revealing test inputs. However, the existing test generation approaches do not systematically cover the input data distribution to test DNNs with diverse inputs, and none of the approaches investigate the relationship between rare inputs and faults. We propose CIT4DNN, an automated black-box approach to generate DNN test sets that are feature-diverse and that comprise rare inputs. CIT4DNN constructs diverse test sets by applying combinatorial interaction testing to the latent space of generative models and formulates constraints over the geometry of the latent space to generate rare and fault-revealing test inputs. Evaluation on a range of datasets and models shows that CIT4DNN-generated tests are more feature-diverse than the state-of-the-art, and can target rare fault-revealing test inputs more effectively than existing methods.
Full Text Available
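
The CIT4DNN abstract above mentions constraints over the geometry of the latent space but does not state their form. One possible reading, offered purely as an assumption, is a constraint on the distance of a latent vector from the origin: under a standard normal prior, vectors with unusually large norm are low-density and so correspond to rare inputs. The hypothetical helper `sample_on_shell` below draws a random direction and rescales it to a chosen radius; decoding the result with a generative model is left out.

```python
import math
import random


def sample_on_shell(dim, radius, rng):
    """Return a latent vector with a uniformly random direction and the given norm.

    Assumption: a spherical shell is used as the geometric constraint. The
    abstract does not confirm this choice.
    """
    z = [rng.gauss(0.0, 1.0) for _ in range(dim)]
    norm = math.sqrt(sum(v * v for v in z))
    return [radius * v / norm for v in z]


if __name__ == "__main__":
    rng = random.Random(0)
    z = sample_on_shell(dim=8, radius=5.0, rng=rng)
    print(math.sqrt(sum(v * v for v in z)))
```

Choosing a radius well beyond the typical norm of the prior (roughly the square root of the dimension) targets the sparse outer region of the latent space.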
Training for Verification: Increasing Neuron Stability to Scale DNN Verification

https://doi.org/10.1007/978-3-031-57256-2_2

Xu, Dong; Mozumder, Nusrat J; Duong, Hai; Dwyer, Matthew B (April 2024, 30th International Conference Tools and Algorithms for the Construction and Analysis of Systems)
Finkbeiner, Bernd; Kovacs, Laura (Ed.)
With the growing use of deep neural networks (DNN) in mission and safety-critical applications, there is an increasing interest in DNN verification. Unfortunately, increasingly complex network structures, non-linear behavior, and high-dimensional input spaces combine to make DNN verification computationally challenging. Despite tremendous advances, DNN verifiers are still challenged to scale to large verification problems. In this work, we explore how the number of stable neurons under the precondition of a specification gives rise to verification complexity. We examine prior work on the problem, adapt it, and develop several novel approaches to increase stability. We demonstrate that neuron stability can be increased substantially without compromising model accuracy and this yields a multi-fold improvement in DNN verifier performance.
Full Text Available
Input Distribution Coverage: Measuring Feature Interaction Adequacy in Neural Network Testing

https://doi.org/10.1145/3576040

Dola, Swaroopa; Dwyer, Matthew B.; Soffa, Mary Lou (July 2023, ACM Transactions on Software Engineering and Methodology)

Testing deep neural networks (DNNs) has garnered great interest in recent years due to their use in many applications. Black-box test adequacy measures are useful for guiding the testing process in covering the input domain. However, the absence of input specifications makes it challenging to apply black-box test adequacy measures in DNN testing. The Input Distribution Coverage (IDC) framework addresses this challenge by using a variational autoencoder to learn a low-dimensional latent representation of the input distribution, and then using that latent space as a coverage domain for testing. IDC applies combinatorial interaction testing on a partitioning of the latent space to measure test adequacy. Empirical evaluation demonstrates that IDC is cost-effective, capable of detecting feature diversity in test inputs, and more sensitive than prior work to test inputs generated using different DNN test generation methods. The findings demonstrate that IDC overcomes several limitations of white-box DNN coverage approaches by discounting coverage from unrealistic inputs and enabling the calculation of test adequacy metrics that capture the feature diversity present in the input space of DNNs.
Full Text Available
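
The coverage idea in the IDC abstract above, combinatorial interaction testing over a partitioned latent space, can be sketched as a pairwise (2-way) coverage computation. This is an illustration under stated assumptions rather than IDC's implementation: the equal-width bins, the range [-3, 3], and the names `bin_index` and `pairwise_coverage` are all hypothetical, and the latent vectors are taken as given instead of being produced by a variational autoencoder.

```python
from itertools import combinations


def bin_index(value, low, high, bins):
    """Map a latent coordinate to an equal-width bin, clamping out-of-range values."""
    if value <= low:
        return 0
    if value >= high:
        return bins - 1
    return min(bins - 1, int((value - low) / (high - low) * bins))


def pairwise_coverage(latents, low=-3.0, high=3.0, bins=3):
    """Fraction of (dimension pair, bin pair) combinations hit by the test set."""
    dims = len(latents[0])
    pairs = list(combinations(range(dims), 2))
    covered = set()
    for z in latents:
        idx = [bin_index(v, low, high, bins) for v in z]
        for i, j in pairs:
            covered.add((i, j, idx[i], idx[j]))
    return len(covered) / (len(pairs) * bins * bins)


if __name__ == "__main__":
    tests = [(-2.0, -2.0), (0.0, 0.0), (2.0, 2.0)]
    print(pairwise_coverage(tests))
```

A test set that varies each latent feature only in lockstep with the others scores low here, which matches the abstract's point that the measure is meant to reflect feature diversity rather than raw test count.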
