NSF PAR Search | NSF Public Access Repository

Fairquant: Certifying and Quantifying Fairness of Deep Neural Networks

https://doi.org/10.1109/ICSE55347.2025.00016

Kim, Brian Hyeongseok; Wang, Jingbo; Wang, Chao (April 2025, IEEE)

Free, publicly-accessible full text available April 26, 2026
WaSCR: A WebAssembly Instruction-Timing Side Channel Repairer

https://doi.org/10.1145/3696410.3714693

Huang, Liyan; He, Junzhou; Wang, Chao; Wang, Weihang (April 2025, ACM)

Free, publicly-accessible full text available April 22, 2026
An Incremental Algorithm for Algebraic Program Analysis

https://doi.org/10.1145/3704901

Zhou, Chenyu; Fang, Yuzhou; Wang, Jingbo; Wang, Chao (January 2025, Proceedings of the ACM on Programming Languages)

We propose a method for conducting algebraic program analysis (APA) incrementally in response to changes of the program under analysis. APA is a program analysis paradigm that consists of two distinct steps: computing a path expression that succinctly summarizes the set of program paths of interest, and interpreting the path expression using a properly-defined semantic algebra to obtain program properties of interest. In this context, the goal of an incremental algorithm is to reduce the analysis time by leveraging the intermediate results computed before the program changes. We have made two main contributions. First, we propose a data structure for efficiently representing path expression as a tree together with a tree-based interpreting method. Second, we propose techniques for efficiently updating the program properties in response to changes of the path expression. We have implemented our method and evaluated it on thirteen Java applications from the DaCapo benchmark suite. The experimental results show that both our method for incrementally computing path expression and our method for incrementally interpreting path expression are effective in speeding up the analysis. Compared to the baseline APA and two state-of-the-art APA methods, the speedup of our method ranges from 160X to 4761X depending on the types of program analyses performed.
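The two APA steps and the idea of reusing intermediate results can be illustrated with a minimal sketch. It is not the paper's data structure or algorithm: the `Node` class, the operator names, the edge weights, and the choice of a shortest-path-length algebra (assuming non-negative weights) are all hypothetical. The path expression is stored as a tree with a cached value per node, so a change to one edge only requires re-interpreting the nodes on the path from that leaf to the root.

```python
class Node:
    """One node of a path-expression tree with a cached interpretation."""

    def __init__(self, op, weight=0, kids=()):
        self.op = op              # "edge", "concat", "choice", or "star"
        self.weight = weight      # only meaningful for "edge" leaves
        self.kids = list(kids)
        self.parent = None
        self.value = None         # cached result of interpreting this subtree
        for kid in self.kids:
            kid.parent = self


def interpret(node):
    """Recompute one node's value from its children's cached values."""
    if node.op == "edge":
        node.value = node.weight
    elif node.op == "concat":     # sequencing: costs add up
        node.value = sum(k.value for k in node.kids)
    elif node.op == "choice":     # branching: keep the cheaper alternative
        node.value = min(k.value for k in node.kids)
    elif node.op == "star":       # loop: zero iterations is cheapest (weights >= 0)
        node.value = 0
    else:
        raise ValueError(f"unknown operator: {node.op}")
    return node.value


def evaluate(node):
    """Full bottom-up interpretation of the whole tree."""
    for kid in node.kids:
        evaluate(kid)
    return interpret(node)


def update_edge(leaf, new_weight):
    """After a change to one edge, re-interpret only the leaf and its ancestors."""
    leaf.weight = new_weight
    node, touched = leaf, 0
    while node is not None:
        interpret(node)
        touched += 1
        node = node.parent
    return touched


# Path expression a . (b + c) . d* with hypothetical edge weights.
a, b, c, d = Node("edge", 1), Node("edge", 5), Node("edge", 2), Node("edge", 3)
root = Node("concat", kids=[a, Node("choice", kids=[b, c]), Node("star", kids=[d])])

full = evaluate(root)             # 1 + min(5, 2) + 0
touched = update_edge(b, 1)       # revisits b, the choice node, and the root
```

In this toy setting the update visits three nodes instead of all seven; swapping in a different `interpret` function would give a different analysis over the same tree.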
Full Text Available
Quantifying Cache Side-Channel Leakage by Refining Set-Based Abstractions

https://doi.org/10.4230/LIPIcs.ECOOP.2025.22

Mitchell, Jacqueline L; Wang, Chao (January 2025, Schloss Dagstuhl – Leibniz-Zentrum für Informatik)
Aldrich, Jonathan; Silva, Alexandra (Ed.)
We propose an improved abstract-interpretation-based method for quantifying cache side-channel leakage that addresses two key sources of precision loss in existing set-based cache abstractions: (1) imprecision in the abstract transfer function used to update the abstract cache state when interpreting a memory access and (2) imprecision due to the incompleteness of the set-based domain. At the center of our method are two improvements: (1) a new transfer function for updating the abstract cache state, which carefully leverages information in the abstract state to prevent the spurious aging of memory blocks, and (2) a refinement of the set-based domain based on the finite powerset construction. We show that both the new abstract transformer and the domain refinement enjoy certain enhanced precision properties. We have implemented the method and compared it against the state-of-the-art technique on a suite of benchmark programs implementing both sorting algorithms and cryptographic algorithms. The experimental results show that our method is effective in improving the precision of cache side-channel leakage quantification.
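To make "quantifying leakage" concrete, the sketch below enumerates secrets on a tiny concrete cache rather than using abstract interpretation, so it is a brute-force stand-in and not the paper's method. The victim `program`, the two-line fully associative LRU cache, and the four-value secret space are assumptions for illustration. Leakage is bounded by the base-2 logarithm of the number of distinct hit/miss traces an observer can see.

```python
import math
from collections import OrderedDict


def trace(accesses, ways=2):
    """Hit/miss trace of a fully associative LRU cache with `ways` lines."""
    cache, out = OrderedDict(), []
    for block in accesses:
        if block in cache:
            cache.move_to_end(block)       # refresh: block becomes youngest
            out.append("H")
        else:
            if len(cache) == ways:
                cache.popitem(last=False)  # evict the least recently used block
            cache[block] = True
            out.append("M")
    return "".join(out)


def program(secret):
    """Hypothetical victim: one memory access depends on the secret."""
    return [0, 1, secret, 0]


# Group the secrets by the trace they produce; each group is one observation.
observations = {}
for secret in range(4):
    observations.setdefault(trace(program(secret)), []).append(secret)

# Upper bound on leaked information, in bits.
leak_bits = math.log2(len(observations))
```

Here secrets 0 and 1 hit in the cache while secrets 2 and 3 evict block 0, so the observer can distinguish two classes of secrets. A static analysis over-approximates this count without enumerating secrets, which is where precision of the abstraction matters.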
Full Text Available
Discovering Likely Program Invariants for Persistent Memory

https://doi.org/10.1145/3691620.3695544

Huang, Zunchen; Ravi, Srivatsan; Wang, Chao (October 2024, ACM)

Full Text Available
Constraint Based Program Repair for Persistent Memory Bugs

https://doi.org/10.1145/3597503.3639204

Huang, Zunchen; Wang, Chao (April 2024, ACM)

Full Text Available
Certifying the Fairness of KNN in the Presence of Dataset Bias

https://doi.org/10.1007/978-3-031-37703-7_16

Li, Yannan; Wang, Jingbo; Wang, Chao (July 2023, Springer)

We propose a method for certifying the fairness of the classification result of a widely used supervised learning algorithm, the k-nearest neighbors (KNN), under the assumption that the training data may have historical bias caused by systematic mislabeling of samples from a protected minority group. To the best of our knowledge, this is the first certification method for KNN based on three variants of the fairness definition: individual fairness, ϵ-fairness, and label-flipping fairness. We first define the fairness certification problem for KNN and then propose sound approximations of the complex arithmetic computations used in the state-of-the-art KNN algorithm. This is meant to lift the computation results from the concrete domain to an abstract domain, to reduce the computational cost. We show the effectiveness of this abstract interpretation based technique through experimental evaluation on six datasets widely used in the fairness research literature. We also show that the method is accurate enough to obtain fairness certifications for a large number of test inputs, despite the presence of historical bias in the datasets.
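As a rough illustration of one of the three definitions, the sketch below checks individual fairness for a single test input by concrete enumeration: the KNN prediction should not change when only the protected attribute changes. This is a plain check on fixed training data, not the paper's certification under dataset bias; the function names, the binary protected attribute stored at index 0, and both toy datasets are assumptions.

```python
from collections import Counter


def knn_predict(train, x, k=3):
    """Majority label among the k training samples nearest to x."""
    def dist(sample):
        return sum((a - b) ** 2 for a, b in zip(sample[0], x))
    nearest = sorted(train, key=dist)[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]


def individually_fair(train, x, protected_idx=0, values=(0, 1), k=3):
    """True if the prediction is the same for every protected-attribute value."""
    predictions = set()
    for v in values:
        variant = list(x)
        variant[protected_idx] = v      # change only the protected attribute
        predictions.add(knn_predict(train, variant, k))
    return len(predictions) == 1


# Each sample is ((protected, score), label). Labels here follow the score.
by_score = [((0, 1.0), 0), ((0, 1.2), 0), ((1, 1.1), 0),
            ((1, 3.0), 1), ((1, 3.2), 1), ((0, 3.1), 1)]

# Labels here follow the protected group, whatever the score.
by_group = [((0, 1.5), 0), ((0, 2.5), 0), ((0, 2.2), 0),
            ((1, 1.5), 1), ((1, 2.5), 1), ((1, 2.2), 1)]

fair_case = individually_fair(by_score, (0, 1.1))
unfair_case = individually_fair(by_group, (0, 2.0))
```

With `by_score` the three nearest neighbors carry the same label for either protected value, so the check passes; with `by_group` the neighbors switch group, and the prediction flips.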
Full Text Available
Constraint Based Compiler Optimization for Energy Harvesting Applications

Li, Yannan; Wang, Chao (July 2023, Schloss Dagstuhl – Leibniz-Zentrum für Informatik)

We propose a method for optimizing the energy efficiency of software code running on small computing devices in the Internet of Things (IoT) that are powered exclusively by electricity harvested from ambient energy in the environment. Due to the weak and unstable nature of the energy source, it is challenging for developers to manually optimize the software code to deal with mismatch between the intermittent power supply and the computation demand. Our method overcomes the challenge by using a combination of three techniques. First, we use static program analysis to automatically identify opportunities for precomputation, i.e., computation that may be performed ahead of time as opposed to just in time. Second, we optimize the precomputation policy, i.e., a way to split and reorder steps of a computation task in the original software to match the intermittent power supply while satisfying a variety of system requirements; this is accomplished by formulating energy optimization as a constraint satisfiability problem and then solving the problem using an off-the-shelf SMT solver. Third, we use a state-of-the-art compiler platform (LLVM) to automate the program transformation to ensure that the optimized software code is correct by construction. We have evaluated our method on a large number of benchmark programs, which are C programs implementing secure communication protocols that are popular for energy-harvesting IoT devices. Our experimental results show that the method is efficient in optimizing all benchmark programs. Furthermore, the optimized programs significantly outperform the original programs in terms of energy efficiency and latency, and the overall improvement ranges from 2.3X to 36.7X.
Full Text Available
Systematic Testing of the Data-Poisoning Robustness of KNN

https://doi.org/10.1145/3597926.3598129

Li, Yannan; Wang, Jingbo; Wang, Chao (July 2023, ACM)

Data poisoning aims to compromise a machine learning based software component by contaminating its training set to change its prediction results for test inputs. Existing methods for deciding data-poisoning robustness have either poor accuracy or long running time and, more importantly, they can only certify some of the truly-robust cases, but remain inconclusive when certification fails. In other words, they cannot falsify the truly-non-robust cases. To overcome this limitation, we propose a systematic testing based method, which can falsify as well as certify data-poisoning robustness for a widely used supervised-learning technique named k-nearest neighbors (KNN). Our method is faster and more accurate than the baseline enumeration method, due to a novel over-approximate analysis in the abstract domain, to quickly narrow down the search space, and systematic testing in the concrete domain, to find the actual violations. We have evaluated our method on a set of supervised-learning datasets. Our results show that the method significantly outperforms state-of-the-art techniques, and can decide data-poisoning robustness of KNN prediction results for most of the test inputs.
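The kind of exhaustive enumeration that such a method is compared against can be sketched as follows. The threat model is an assumption made for illustration: up to `n` training samples may be poisoned, so the check removes every subset of at most `n` samples and tests whether the prediction for `x` changes. The function names and the one-dimensional toy data are hypothetical, and the loop is exponential in `n`, which is why narrowing the search space matters.

```python
from collections import Counter
from itertools import combinations


def knn_predict(train, x, k=3):
    """Majority label among the k samples nearest to x (1-D features)."""
    nearest = sorted(train, key=lambda s: abs(s[0] - x))[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]


def decide_robustness(train, x, n, k=3):
    """Return (True, None) if robust, else (False, indices of a violating removal)."""
    baseline = knn_predict(train, x, k)
    for r in range(1, n + 1):
        for removed in combinations(range(len(train)), r):
            kept = [s for i, s in enumerate(train) if i not in removed]
            if knn_predict(kept, x, k) != baseline:
                return False, removed   # concrete counterexample: falsified
    return True, None                   # every removal checked: certified


fragile = [(1.0, "a"), (1.1, "a"), (1.2, "b"), (1.3, "b"), (5.0, "b")]
stable = [(1.0, "a"), (1.1, "a"), (1.2, "a"), (1.3, "a"), (5.0, "b")]

fragile_result = decide_robustness(fragile, 1.05, n=1)
stable_result = decide_robustness(stable, 1.05, n=1)
```

Unlike a certification-only analysis, this check always ends with a definite answer, either a certificate or a witness set of removed samples.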
Full Text Available
Synthesizing MILP Constraints for Efficient and Robust Optimization

https://doi.org/10.1145/3591298

Wang, Jingbo; Gupta, Aarti; Wang, Chao (June 2023, Proceedings of the ACM on Programming Languages)

While mixed integer linear programming (MILP) solvers are routinely used to solve a wide range of important science and engineering problems, it remains a challenging task for end users to write correct and efficient MILP constraints, especially for problems specified using the inherently non-linear Boolean logic operations. To overcome this challenge, we propose a syntax guided synthesis (SyGuS) method capable of generating high-quality MILP constraints from the specifications expressed using arbitrary combinations of Boolean logic operations. At the center of our method is an extensible domain specification language (DSL) whose expressiveness may be improved by adding new integer variables as decision variables, together with an iterative procedure for synthesizing linear constraints from non-linear Boolean logic operations using these integer variables. To make the synthesis method efficient, we also propose an over-approximation technique for soundly proving the correctness of the synthesized linear constraints, and an under-approximation technique for safely pruning away the incorrect constraints. We have implemented and evaluated the method on a wide range of benchmark specifications from statistics, machine learning, and data science applications. The experimental results show that the method is efficient in handling these benchmarks, and the quality of the synthesized MILP constraints is close to, or higher than, that of manually-written constraints in terms of both compactness and solving time.
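For a sense of what linear constraints for Boolean operations look like, the sketch below uses the textbook linearizations of AND and OR over 0/1 variables and verifies them by exhaustive enumeration. These encodings are standard and are not output of the paper's synthesizer; the helper names are hypothetical. The last check shows how dropping a single inequality yields an incorrect constraint set.

```python
from itertools import product


def and_constraints(x, y, z):
    """Linear encoding of z = x AND y over binary variables."""
    return z <= x and z <= y and z >= x + y - 1


def or_constraints(x, y, z):
    """Linear encoding of z = x OR y over binary variables."""
    return z >= x and z >= y and z <= x + y


def too_weak_and(x, y, z):
    """Incomplete encoding: lacks z >= x + y - 1, so z = 0 is allowed when x = y = 1."""
    return z <= x and z <= y


def encodes(constraints, op):
    """True if the constraints hold exactly on the assignments where z == op(x, y)."""
    return all(
        constraints(x, y, z) == (z == op(x, y))
        for x, y, z in product((0, 1), repeat=3)
    )


and_ok = encodes(and_constraints, lambda x, y: x & y)
or_ok = encodes(or_constraints, lambda x, y: x | y)
weak_ok = encodes(too_weak_and, lambda x, y: x & y)
```

Enumeration works here because there are only eight assignments; for larger specifications, a proof-based correctness check is needed instead.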
Full Text Available
