NSF PAR Search | NSF Public Access Repository

Note: When clicking on a Digital Object Identifier (DOI) number, you will be taken to an external site maintained by the publisher. Some full text articles may not yet be available without a charge during the embargo (administrative interval).
What is a DOI Number?

Some links on this page may take you to non-federal websites. Their policies may differ from this site.

The Federation Strikes Back: A Survey of Federated Learning Privacy Attacks, Defenses, Applications, and Policy Landscape

https://doi.org/10.1145/3724113

Zhao, Joshua; Bagchi, Saurabh; Avestimehr, Salman; Chan, Kevin; Chaterji, Somali; Dimitriadis, Dimitris; Li, Jiacheng; Li, Ninghui; Nourian, Arash; Roth, Holger (September 2025, ACM Computing Surveys)

Deep learning has shown incredible potential across a wide array of tasks, and accompanied by this growth has been an insatiable appetite for data. However, a large amount of data needed for enabling deep learning is stored on personal devices, and recent concerns on privacy have further highlighted challenges for accessing such data. As a result, federated learning (FL) has emerged as an important privacy-preserving technology that enables collaborative training of machine learning models without the need to send the raw, potentially sensitive, data to a central server. However, the fundamental premise that sending model updates to a server is privacy-preserving only holds if the updates cannot be “reverse engineered” to infer information about the private training data. It has been shown under a wide variety of settings that this privacy premise doesnothold. In this article we provide a comprehensive literature review of the different privacy attacks and defense methods in FL. We identify the current limitations of these attacks and highlight the settings in which the privacy of an FL client can be broken. We further dissect some of the successful industry applications of FL and draw lessons for future successful adoption. We survey the emerging landscape of privacy regulation for FL and conclude with future directions for taking FL toward the cherished goal of generating accurate models while preserving the privacy of the data from its participants.
more » « less
Free, publicly-accessible full text available September 30, 2026
On the Robustness of LDP Protocols for Numerical Attributes under Data Poisoning Attacks

https://doi.org/10.14722/ndss.2025.241521

Li, Xiaoguang; Li, Zitao; Li, Ninghui; Sun, Wenhai (January 2025, Internet Society)

Free, publicly-accessible full text available January 1, 2026
CENSOR: Defense Against Gradient Inversion via Orthogonal Subspace Bayesian Sampling

https://doi.org/10.14722/ndss.2025.230915

Zhang, Kaiyuan; Cheng, Siyuan; Shen, Guangyu; Ribeiro, Bruno; An, Shengwei; Chen, Pin-Yu; Zhang, Xiangyu; Li, Ninghui (February 2025, Internet Society)

Federated learning collaboratively trains a neural network on a global server, where each local client receives the current global model weights and sends back parameter updates (gradients) based on its local private data. The process of sending these model updates may leak client’s private data information. Existing gradient inversion attacks can exploit this vulnerability to recover private training instances from a client’s gradient vectors. Recently, researchers have proposed advanced gradient inversion techniques that existing defenses struggle to handle effectively. In this work, we present a novel defense tailored for large neural network models. Our defense capitalizes on the high dimensionality of the model parameters to perturb gradients within a subspace orthogonal to the original gradient. By leveraging cold posteriors over orthogonal subspaces, our defense implements a refined gradient update mechanism. This enables the selection of an optimal gradient that not only safeguards against gradient inversion attacks but also maintains model utility. We conduct comprehensive experiments across three different datasets and evaluate our defense against various state-of-the-art attacks and defenses. Code is available at https://censor-gradient.github.io.
more » « less
Free, publicly-accessible full text available February 24, 2026
ARIoTEDef: Adversarially Robust IoT Early Defense System Based on Self-Evolution against Multi-step Attacks

https://doi.org/10.1145/3660646

Huang, Mengdie; Lee, Hyunwoo; Kundu, Ashish; Chen, Xiaofeng; Mudgerikar, Anand; Li, Ninghui; Bertino, Elisa (August 2024, ACM Transactions on Internet of Things)

Internet of Things (IoT) cyber threats, exemplified by jackware and crypto mining, underscore the vulnerability of IoT devices. Due to the multi-step nature of many attacks, early detection is vital for a swift response and preventing malware propagation. However, accurately detecting early-stage attacks is challenging, as attackers employ stealthy, zero-day, or adversarial machine learning to evade detection. To enhance security, we propose ARIoTEDef, an Adversarially Robust IoT Early Defense system, which identifies early-stage infections and evolves autonomously. It models multi-stage attacks based on a cyber kill chain and maintains stage-specific detectors. When anomalies in the later action stage emerge, the system retroactively analyzes event logs using an attention-based sequence-to-sequence model to identify early infections. Then, the infection detector is updated with information about the identified infections. We have evaluated ARIoTEDef against multi-stage attacks, such as the Mirai botnet. Results show that the infection detector’s average F1 score increases from 0.31 to 0.87 after one evolution round. We have also conducted an extensive analysis of ARIoTEDef against adversarial evasion attacks. Our results show that ARIoTEDef is robust and benefits from multiple rounds of evolution.
more » « less
Full Text Available
Exploring Use of Explanative Illustrations to Communicate Differential Privacy Models

https://doi.org/10.1177/21695067231195006

Xiong, Aiping; Wu, Chuhao; Wang, Tianhao; Proctor, Robert W.; Blocki, Jeremiah; Li, Ninghui; Jha, Somesh (September 2023, Proceedings of the Human Factors and Ergonomics Society Annual Meeting)

Proper communication is key to the adoption and implementation of differential privacy (DP). In this work, we designed explanative illustrations of three DP models (Central DP, Local DP, Shuffler DP) to help laypeople conceptualize how random noise is added to protect individuals’ privacy and preserve group utility. Following a pilot survey and an interview, we conducted an online experiment ( N = 300) exploring participants’ comprehension, privacy and utility perception, and data-sharing decisions across the three DP models. We obtained empirical evidence showing participants’ acceptance of the Shuffler DP model for data privacy protection. We discuss the implications of our findings.
more » « less
Full Text Available
Differentially Private Vertical Federated Clustering

https://doi.org/10.14778/3583140.3583146

Li, Zitao; Wang, Tianhao; Li, Ninghui (February 2023, Proceedings of the VLDB Endowment)

In many applications, multiple parties have private data regarding the same set of users but on disjoint sets of attributes, and a server wants to leverage the data to train a model. To enable model learning while protecting the privacy of the data subjects, we need vertical federated learning (VFL) techniques, where the data parties share only information for training the model, instead of the private data. However, it is challenging to ensure that the shared information maintains privacy while learning accurate models. To the best of our knowledge, the algorithm proposed in this paper is the first practical solution for differentially private vertical federatedk-means clustering, where the server can obtain a set of global centers with a provable differential privacy guarantee. Our algorithm assumes an untrusted central server that aggregates differentially private local centers and membership encodings from local data parties. It builds a weighted grid as the synopsis of the global dataset based on the received information. Final centers are generated by running anyk-means algorithm on the weighted grid. Our approach for grid weight estimation uses a novel, light-weight, and differentially private set intersection cardinality estimation algorithm based on the Flajolet-Martin sketch. To improve the estimation accuracy in the setting with more than two data parties, we further propose a refined version of the weights estimation algorithm and a parameter tuning strategy to reduce the finalk-means loss to be close to that in the central private setting. We provide theoretical utility analysis and experimental evaluation results for the cluster centers computed by our algorithm and show that our approach performs better both theoretically and empirically than the two baselines based on existing techniques
more » « less
Full Text Available
PolyScope: Multi-Policy Access Control Analysis to Triage Android Scoped Storage

https://doi.org/10.1109/TDSC.2023.3310402

Lee, Yu-Tsung; Chen, Haining; Enck, William; Vijayakumar, Hayawardh; Li, Ninghui; Qian, Zhiyun; Petracca, Giuseppe; Jaeger, Trent (January 2023, IEEE Transactions on Dependable and Secure Computing)

Full Text Available
MGD: A Utility Metric for Private Data Publication

https://doi.org/10.1145/3491371.3491385

Li, Zitao; Dang, Trung; Wang, Tianhao; Li, Ninghui (December 2021, NSysS 2021: Proceedings of the 8th International Conference on Networking, Systems and Security)

techniques to protect user data privacy. A common way for utilizing private data under DP is to take an input dataset and synthesize a new dataset that preserves features of the input dataset while satisfying DP. A trade-off always exists between the strength of privacy protection and the utility of the final output: stronger privacy protection requires larger randomness, so the outputs usually have a larger variance and can be far from optimal. In this paper, we summarize our proposed metric for the NIST “A Better Meter Stick for Differential Privacy” competition [26], MarGinal Difference (MGD), for measuring the utility of a synthesized dataset. Our metric is based on earth mover distance. We introduce new features in our metric so that it is not affected by some small random noise that is unavoidable in the DP context but focuses more on the significant difference. We show that our metric can reflect the range query error better compared with other existing metrics. We introduce an efficient computation method based on the min-cost flow to alleviate the high computation cost of the earth mover’s distance.
more » « less
Full Text Available
Federated matrix factorization with privacy guarantee

https://doi.org/10.14778/3503585.3503598

Li, Zitao; Ding, Bolin; Zhang, Ce; Li, Ninghui; Zhou, Jingren (December 2021, Proceedings of the VLDB Endowment)

Matrix factorization (MF) approximates unobserved ratings in a rating matrix, whose rows correspond to users and columns correspond to items to be rated, and has been serving as a fundamental building block in recommendation systems. This paper comprehensively studies the problem of matrix factorization in different federated learning (FL) settings, where a set of parties want to cooperate in training but refuse to share data directly. We first propose a generic algorithmic framework for various settings of federated matrix factorization (FMF) and provide a theoretical convergence guarantee. We then systematically characterize privacy-leakage risks in data collection, training, and publishing stages for three different settings and introduce privacy notions to provide end-to-end privacy protections. The first one is vertical federated learning (VFL), where multiple parties have the ratings from the same set of users but on disjoint sets of items. The second one is horizontal federated learning (HFL), where parties have ratings from different sets of users but on the same set of items. The third setting is local federated learning (LFL), where the ratings of the users are only stored on their local devices. We introduce adapted versions of FMF with the privacy notions guaranteed in the three settings. In particular, a new private learning technique called embedding clipping is introduced and used in all the three settings to ensure differential privacy. For the LFL setting, we combine differential privacy with secure aggregation to protect the communication between user devices and the server with a strength similar to the local differential privacy model, but much better accuracy. We perform experiments to demonstrate the effectiveness of our approaches.
more » « less
Full Text Available
Communicating Differential Privacy Models by Illustrations: A Survey and In-Depth Interview Study

https://doi.org/10.1177/1071181321651137

Wu, Chuhao; Wang, Tianhao; Proctor, Robert W.; Li, Ninghui; Blocki, Jeremiah; Xiong, Aiping (September 2021, Proceedings of the Human Factors and Ergonomics Society Annual Meeting)

Full Text Available

« Prev Next »

Search for: All records