NSF PAR Search | NSF Public Access Repository

Note: When clicking on a Digital Object Identifier (DOI) number, you will be taken to an external site maintained by the publisher. Some full text articles may not yet be available without a charge during the embargo (administrative interval).
What is a DOI Number?

Some links on this page may take you to non-federal websites. Their policies may differ from this site.

TraceFL: Interpretability-Driven Debugging in Federated Learning via Neuron Provenance

https://doi.org/10.1109/ICSE55347.2025.00128

Gill, Waris; Anwar, Ali; Gulzar, Muhammad Ali (April 2025, Proceedings of the IEEE/ACM 47th International Conference on Software Engineering)

Not AvailaIn Federated Learning, clients train models on local data and send updates to a central server, which aggregates them into a global model using a fusion algorithm. This collaborative yet privacy-preserving training comes at a cost. FL developers face significant challenges in attributing global model predictions to specific clients. Localizing responsible clients is a crucial step towards (a) excluding clients primarily responsible for incorrect predictions and (b) encouraging clients who contributed high-quality models to continue participating in the future. Existing ML debugging approaches are inherently inapplicable as they are designed for single-model, centralized training. We introduce TraceFL, a fine-grained neuron provenance capturing mechanism that identifies clients responsible for a global model's prediction by tracking the flow of information from individual clients to the global model. Since inference on different inputs activates a different set of neurons of the global model, TraceFL dynamically quantifies the significance of the global model's neurons in a given prediction, identifying the most crucial neurons in the global model. It then maps them to the corresponding neurons in every participating client to determine each client's contribution, ultimately localizing the responsible client. We evaluate TraceFL on six datasets, including two real-world medical imaging datasets and four neural networks, including advanced models such as GPT. TraceFL achieves 99% accuracy in localizing the responsible client in FL tasks spanning both image and text classification tasks. At a time when state-of-the-art ML debugging approaches are mostly domain-specific (e.g., image classification only), TraceFL is the first technique to enable highly accurate automated reasoning across a wide range of FL applications.ble
more » « less
Free, publicly-accessible full text available April 26, 2026
A Metric for Measuring the Impact of Rare Paths on Program Coverage

https://doi.org/10.1109/SANER64311.2025.00042

Amour, Leo St; Tilevich, Eli; Gulzar, Muhammad Ali (March 2025, 2025 IEEE International Conference on Software Analysis, Evolution and Reengineering (SANER))

Fuzzing has become a popular technique for discovering bugs and vulnerabilities. To increase the probability of finding bugs, developers should apply fuzzers that maximize program coverage. Program coverage typically measures the percentage of program lines or branches a fuzzer executes. However, these metrics fail to communicate the value of hitting a particular line, branch, or path. Many bugs manifest only within non-trivial control flows. To improve software quality, fuzzing non-trivial program paths should be more important than fuzzing trivial ones. This paper introduces rare-path coverage (RP-Coverage), a novel program coverage metric that conveys the value of discovering an unlikely control flow path. We have developed a new technique for estimating the probability of taking an execution path, which relies on probabilistic logic programming to declaratively express the logic for constructing and analyzing a probabilistic control flow graph. Our evaluation indicates RP-Coverage's promise as a metric for measuring fuzzing efficacy. Specifically, we observe that defects along rare paths-intuitively-substantially impact the effectiveness of fuzzers. However, we argue that existing fuzzing metrics fall short when conveying this significance. We also observe that the value of uncovering an unlikely path is better reflected by increases in RP-Coverage than existing metrics. Specifically, the average coverage increases are up to 49.5%, 11.1 %, and 15.4 % for RP-Coverage, line coverage, and branch coverage, respectively. This finding indicates that RP-Coverage is more elastic, or sensitive, to path probabilities and thus capable of more effectively quantifying a fuzzer's ability to discover unlikely program paths. As such, RP-Coverage demonstrates promise as a program coverage metric that enhances fuzzer fitness measures when supplementing standard criteria.
more » « less
Free, publicly-accessible full text available March 4, 2026
DeSQL: Interactive Debugging of SQL in Data-Intensive Scalable Computing

https://doi.org/10.1145/3643761

Haroon, Sabaat; Brown, Chris; Gulzar, Muhammad Ali (July 2024, Proceedings of the ACM on Software Engineering)

SQL is the most commonly used front-end language for data-intensive scalable computing (DISC) applications due to its broad presence in new and legacy workflows and shallow learning curve. However, DISC-backed SQL introduces several layers of abstraction that significantly reduce the visibility and transparency of workflows, making it challenging for developers to find and fix errors in a query. When a query returns incorrect outputs, it takes a non-trivial effort to comprehend every stage of the query execution and find the root cause among the input data and complex SQL query. We aim to bring the benefits of step-through interactive debugging to DISC-powered SQL with DeSQL. Due to the declarative nature of SQL, there are no ordered atomic statements to place a breakpoint to monitor the flow of data. DeSQL’s automated query decomposition breaks a SQL query into its constituent sub queries, offering natural locations for setting breakpoints and monitoring intermediate data. However, due to advanced query optimization and translation in DISC systems, a user query rarely matches the physical execution, making it challenging to associate subqueries with their intermediate data. DeSQL performs fine-grained taint analysis to dynamically map the subqueries to their intermediate data, while also recognizing subqueries removed by the optimizers. For such subqueries, DeSQL efficiently regenerates the intermediate data from a nearby subquery’s data. On the popular TPC-DC benchmark, DeSQL provides a complete debugging view in 13% less time than the original job time while incurring an average overhead of 10% in addition to retaining Apache Spark’s scalability. In a user study comprising 15 participants engaged in two debugging tasks, we find that participants utilizing DeSQL identify the root cause behind a wrong query output in 74% less time than the de-facto, manual debugging.
more » « less
Full Text Available
FedDefender: Backdoor Attack Defense in Federated Learning

https://doi.org/10.1145/3617574.3617858

Gill, Waris; Anwar, Ali; Gulzar, Muhammad Ali (December 2023, ACM)

Federated Learning (FL) is a privacy-preserving distributed machine learning technique that enables individual clients (e.g., user participants, edge devices, or organizations) to train a model on their local data in a secure environment and then share the trained model with an aggregator to build a global model collaboratively. In this work, we propose FedDefender, a defense mechanism against targeted poisoning attacks in FL by leveraging differential testing. FedDefender first applies differential testing on clients’ models using a synthetic input. Instead of comparing the output (predicted label), which is unavailable for synthetic input, FedDefender fingerprints the neuron activations of clients’ models to identify a potentially malicious client containing a backdoor. We evaluate FedDefender using MNIST and FashionMNIST datasets with 20 and 30 clients, and our results demonstrate that FedDefender effectively mitigates such attacks, reducing the attack success rate (ASR) to 10% without deteriorating the global model performance.
more » « less
Full Text Available
Co-dependence Aware Fuzzing for Dataflow-Based Big Data Analytics

https://doi.org/10.1145/3611643.3616298

Humayun, Ahmad; Kim, Miryung; Gulzar, Muhammad Ali (November 2023, Proceedings of the 31st ACM Joint European Software Engineering Conference and Symposium on the Foundations of Software Engineering)

Full Text Available
NaturalFuzz: Natural Input Generation for Big Data Analytics

https://doi.org/10.1109/ASE56229.2023.00034

Humayun, Ahmad; Wu, Yaoxuan; Kim, Miryung; Gulzar, Muhammad Ali (September 2023, IEEE)

Full Text Available
Editors' Introduction

https://doi.org/10.56553/popets-2023-0066

Mazurek, Michelle; Sherr, Micah (July 2023, Proceedings on Privacy Enhancing Technologies)

Full Text Available
FedDebug: Systematic Debugging for Federated Learning Applications

https://doi.org/10.1109/ICSE48619.2023.00053

Gill, Waris; Anwar, Ali; Gulzar, Muhammad Ali (May 2023, IEEE/ACM 45th International Conference on Software Engineering)

In Federated Learning (FL), clients independently train local models and share them with a central aggregator to build a global model. Impermissibility to access clients' data and collaborative training make FL appealing for applications with data-privacy concerns, such as medical imaging. However, these FL characteristics pose unprecedented challenges for debugging. When a global model's performance deteriorates, identifying the responsible rounds and clients is a major pain point. Developers resort to trial-and-error debugging with subsets of clients, hoping to increase the global model's accuracy or let future FL rounds retune the model, which are time-consuming and costly. We design a systematic fault localization framework, Fedde-bug,that advances the FL debugging on two novel fronts. First, Feddebug enables interactive debugging of realtime collaborative training in FL by leveraging record and replay techniques to construct a simulation that mirrors live FL. Feddebug'sbreakpoint can help inspect an FL state (round, client, and global model) and move between rounds and clients' models seam-lessly, enabling a fine-grained step-by-step inspection. Second, Feddebug automatically identifies the client(s) responsible for lowering the global model's performance without any testing data and labels-both are essential for existing debugging techniques. Feddebug's strengths come from adapting differential testing in conjunction with neuron activations to determine the client(s) deviating from normal behavior. Feddebug achieves 100% accuracy in finding a single faulty client and 90.3% accuracy in finding multiple faulty clients. Feddebug's interactive de-bugging incurs 1.2% overhead during training, while it localizes a faulty client in only 2.1% of a round's training time. With FedDebug,we bring effective debugging practices to federated learning, improving the quality and productivity of FL application developers.
more » « less
Full Text Available
Detecting Build Conflicts in Software Merge for Java Programs via Static Analysis

https://doi.org/10.1145/3551349.3556950

Towqir, Sheikh Shadab; Shen, Bowen; Gulzar, Muhammad Ali; Meng, Na (October 2022, The 37th IEEE/ACM International Conference on Automated Software Engineering)

In software merge, the edits from different branches can textually overlap (i.e., textual conflicts) or cause build and test errors (i.e., build and test conflicts), jeopardizing programmer productivity and software quality. Existing tools primarily focus on textual conflicts; few tools detect higher-order conflicts (i.e., build and test conflicts). However, existing detectors of build conflicts are limited. Due to their heavy usage of automatic build, current detectors (e.g., Crystal) only report build errors instead of identifying the root causes; developers have to manually locate conflicting edits. These detectors only help when the branches-to-merge have no textual conflict. We present a new static analysis-based approach Bucond (“build conflict detector”). Given three code versions in a merging scenario: base b, left l, and right r, Bucond models each version as a graph, and compares graphs to extract entity-related edits (e.g., class renaming) in l and r. We believe that build conflicts occur when certain edits are co-applied to related entities between branches. Bucond realizes this insight via pattern matching to identify any cross-branch edit combination that can trigger build conflicts (e.g., one branch adds a reference to field F while the other branch removes F). We systematically explored and devised 57 patterns, covering 97% of the build conflicts in our experiments. Our evaluation shows Bucond to complement build-based detectors, as it (1) detects conflicts with 100% precision and 88%–100% recall, (2) locates conflicting edits, and (3) works well when those detectors do not.
more » « less
Full Text Available
TrackerSift: untangling mixed tracking and functional web resources

https://doi.org/10.1145/3487552.3487855

Amjad, Abdul Haddi; Saleem, Danial; Gulzar, Muhammad Ali; Shafiq, Zubair; Zaffar, Fareed (November 2021, In Proceedings of the 21st ACM Internet Measurement Conference (IMC '21))

Full Text Available

Search for: All records