skip to main content
US FlagAn official website of the United States government
dot gov icon
Official websites use .gov
A .gov website belongs to an official government organization in the United States.
https lock icon
Secure .gov websites use HTTPS
A lock ( lock ) or https:// means you've safely connected to the .gov website. Share sensitive information only on official, secure websites.


Search for: All records

Award ID contains: 2106420

Note: When clicking on a Digital Object Identifier (DOI) number, you will be taken to an external site maintained by the publisher. Some full text articles may not yet be available without a charge during the embargo (administrative interval).
What is a DOI Number?

Some links on this page may take you to non-federal websites. Their policies may differ from this site.

  1. SQL is the most commonly used front-end language for data-intensive scalable computing (DISC) applications due to its broad presence in new and legacy workflows and shallow learning curve. However, DISC-backed SQL introduces several layers of abstraction that significantly reduce the visibility and transparency of workflows, making it challenging for developers to find and fix errors in a query. When a query returns incorrect outputs, it takes a non-trivial effort to comprehend every stage of the query execution and find the root cause among the input data and complex SQL query. We aim to bring the benefits of step-through interactive debugging to DISC-powered SQL with DeSQL. Due to the declarative nature of SQL, there are no ordered atomic statements to place a breakpoint to monitor the flow of data. DeSQL’s automated query decomposition breaks a SQL query into its constituent sub queries, offering natural locations for setting breakpoints and monitoring intermediate data. However, due to advanced query optimization and translation in DISC systems, a user query rarely matches the physical execution, making it challenging to associate subqueries with their intermediate data. DeSQL performs fine-grained taint analysis to dynamically map the subqueries to their intermediate data, while also recognizing subqueries removed by the optimizers. For such subqueries, DeSQL efficiently regenerates the intermediate data from a nearby subquery’s data. On the popular TPC-DC benchmark, DeSQL provides a complete debugging view in 13% less time than the original job time while incurring an average overhead of 10% in addition to retaining Apache Spark’s scalability. In a user study comprising 15 participants engaged in two debugging tasks, we find that participants utilizing DeSQL identify the root cause behind a wrong query output in 74% less time than the de-facto, manual debugging. 
    more » « less
  2. Federated Learning (FL) is a privacy-preserving distributed machine learning technique that enables individual clients (e.g., user participants, edge devices, or organizations) to train a model on their local data in a secure environment and then share the trained model with an aggregator to build a global model collaboratively. In this work, we propose FedDefender, a defense mechanism against targeted poisoning attacks in FL by leveraging differential testing. FedDefender first applies differential testing on clients’ models using a synthetic input. Instead of comparing the output (predicted label), which is unavailable for synthetic input, FedDefender fingerprints the neuron activations of clients’ models to identify a potentially malicious client containing a backdoor. We evaluate FedDefender using MNIST and FashionMNIST datasets with 20 and 30 clients, and our results demonstrate that FedDefender effectively mitigates such attacks, reducing the attack success rate (ASR) to 10% without deteriorating the global model performance. 
    more » « less
  3. In Federated Learning (FL), clients independently train local models and share them with a central aggregator to build a global model. Impermissibility to access clients' data and collaborative training make FL appealing for applications with data-privacy concerns, such as medical imaging. However, these FL characteristics pose unprecedented challenges for debugging. When a global model's performance deteriorates, identifying the responsible rounds and clients is a major pain point. Developers resort to trial-and-error debugging with subsets of clients, hoping to increase the global model's accuracy or let future FL rounds retune the model, which are time-consuming and costly. We design a systematic fault localization framework, Fedde-bug,that advances the FL debugging on two novel fronts. First, Feddebug enables interactive debugging of realtime collaborative training in FL by leveraging record and replay techniques to construct a simulation that mirrors live FL. Feddebug'sbreakpoint can help inspect an FL state (round, client, and global model) and move between rounds and clients' models seam-lessly, enabling a fine-grained step-by-step inspection. Second, Feddebug automatically identifies the client(s) responsible for lowering the global model's performance without any testing data and labels-both are essential for existing debugging techniques. Feddebug's strengths come from adapting differential testing in conjunction with neuron activations to determine the client(s) deviating from normal behavior. Feddebug achieves 100% accuracy in finding a single faulty client and 90.3% accuracy in finding multiple faulty clients. Feddebug's interactive de-bugging incurs 1.2% overhead during training, while it localizes a faulty client in only 2.1% of a round's training time. With FedDebug,we bring effective debugging practices to federated learning, improving the quality and productivity of FL application developers. 
    more » « less
  4. In software merge, the edits from different branches can textually overlap (i.e., textual conflicts) or cause build and test errors (i.e., build and test conflicts), jeopardizing programmer productivity and software quality. Existing tools primarily focus on textual conflicts; few tools detect higher-order conflicts (i.e., build and test conflicts). However, existing detectors of build conflicts are limited. Due to their heavy usage of automatic build, current detectors (e.g., Crystal) only report build errors instead of identifying the root causes; developers have to manually locate conflicting edits. These detectors only help when the branches-to-merge have no textual conflict. We present a new static analysis-based approach Bucond (“build conflict detector”). Given three code versions in a merging scenario: base b, left l, and right r, Bucond models each version as a graph, and compares graphs to extract entity-related edits (e.g., class renaming) in l and r. We believe that build conflicts occur when certain edits are co-applied to related entities between branches. Bucond realizes this insight via pattern matching to identify any cross-branch edit combination that can trigger build conflicts (e.g., one branch adds a reference to field F while the other branch removes F). We systematically explored and devised 57 patterns, covering 97% of the build conflicts in our experiments. Our evaluation shows Bucond to complement build-based detectors, as it (1) detects conflicts with 100% precision and 88%–100% recall, (2) locates conflicting edits, and (3) works well when those detectors do not. 
    more » « less