Federated learning (FL) is a distributed learning paradigm that allows multiple decentralized clients to collaboratively learn a common model without sharing local data. Although local data is not exposed directly, privacy concerns nonetheless exist as clients' sensitive information can be inferred from intermediate computations. Moreover, such information leakage accumulates substantially over time as the same data is repeatedly used during the iterative learning process. As a result, it can be particularly difficult to balance the privacy-accuracy trade-off when designing privacy-preserving FL algorithms. This paper introduces Upcycled-FL, a simple yet effective strategy that applies first-order approximation at every even round of model update. Under this strategy, half of the FL updates incur no information leakage and require much less computational and transmission costs. We first conduct the theoretical analysis on the convergence (rate) of Upcycled-FL and then apply two perturbation mechanisms to preserve privacy. Extensive experiments on both synthetic and real-world data show that the Upcycled-FL strategy can be adapted to many existing FL frameworks and consistently improve the privacy-accuracy trade-off
more »
« less
Gradient Inversion with Generative Image Prior
Federated Learning (FL) is a distributed learning framework, in which the local data never leaves clients’ devices to preserve privacy, and the server trains models on the data via accessing only the gradients of those local data. Without further privacy mechanisms such as differential privacy, this leaves the system vulnerable against an attacker who inverts those gradients to reveal clients’ sensitive data. However, a gradient is often insufficient to reconstruct the user data without any prior knowledge. By exploiting a generative model pretrained on the data distribution, we demonstrate that data privacy can be easily breached. Further, when such prior knowledge is unavailable, we investigate the possibility of learning the prior from a sequence of gradients seen in the process of FL training. We experimentally show that the prior in a form of generative model is learnable from iterative interactions in FL. Our findings demonstrate that additional mechanisms are necessary to prevent privacy leakage in FL.
more »
« less
- Award ID(s):
- 2003129
- PAR ID:
- 10311448
- Editor(s):
- Ranzato, M.; Beygelzimer, A.; Liang, P.S.; Vaughan, J.W.; Dauphin, Y.
- Date Published:
- Journal Name:
- Advances in neural information processing systems
- Volume:
- 34
- ISSN:
- 1049-5258
- Format(s):
- Medium: X
- Sponsoring Org:
- National Science Foundation
More Like this
-
-
Federated Learning (FL) has emerged as an effective paradigm for distributed learning systems owing to its strong potential in exploiting underlying data characteristics while preserving data privacy. In cases of practical data heterogeneity among FL clients in many Internet-of-Things (IoT) applications over wireless networks, however, existing FL frameworks still face challenges in capturing the overall feature properties of local client data that often exhibit disparate distributions. One approach is to apply generative adversarial networks (GANs) in FL to address data heterogeneity by integrating GANs to regenerate anonymous training data without exposing original client data to possible eavesdropping. Despite some successes, existing GAN-based FL frameworks still incur high communication costs and elicit other privacy concerns, limiting their practical applications. To this end, this work proposes a novel FL framework that only applies partial GAN model sharing. This new PS-FedGAN framework effectively addresses heterogeneous data distributions across clients and strengthens privacy preservation at reduced communication costs, especially over wireless networks. Our analysis demonstrates the convergence and privacy benefits of the proposed PS-FEdGAN framework. Through experimental results based on several well-known benchmark datasets, our proposed PS-FedGAN demonstrates strong potential to tackle FL under heterogeneous (non-IID) client data distributions, while improving data privacy and lowering communication overhead.more » « less
-
Deep learning models are prone to forgetting information learned in the past when trained on new data. This problem becomes even more pronounced in the context of federated learning (FL), where data is decentralized and subject to independent changes for each user. Continual Learning (CL) studies this so-called \textit{catastrophic forgetting} phenomenon primarily in centralized settings, where the learner has direct access to the complete training dataset. However, applying CL techniques to FL is not straightforward due to privacy concerns and resource limitations. This paper presents a framework for federated class incremental learning that utilizes a generative model to synthesize samples from past distributions instead of storing part of past data. Then, clients can leverage the generative model to mitigate catastrophic forgetting locally. The generative model is trained on the server using data-free methods at the end of each task without requesting data from clients. Therefore, it reduces the risk of data leakage as opposed to training it on the client's private data. We demonstrate significant improvements for the CIFAR-100 dataset compared to existing baselines.more » « less
-
Motivated by the ever-increasing concerns on personal data privacy and the rapidly growing data volume at local clients, federated learning (FL) has emerged as a new machine learning setting. An FL system is comprised of a central parameter server and multiple local clients. It keeps data at local clients and learns a centralized model by sharing the model parameters learned locally. No local data needs to be shared, and privacy can be well protected. Nevertheless, since it is the model instead of the raw data that is shared, the system can be exposed to the poisoning model attacks launched by malicious clients. Furthermore, it is challenging to identify malicious clients since no local client data is available on the server. Besides, membership inference attacks can still be performed by using the uploaded model to estimate the client's local data, leading to privacy disclosure. In this work, we first propose a model update based federated averaging algorithm to defend against Byzantine attacks such as additive noise attacks and sign-flipping attacks. The individual client model initialization method is presented to provide further privacy protections from the membership inference attacks by hiding the individual local machine learning model. When combining these two schemes, privacy and security can be both effectively enhanced. The proposed schemes are proved to converge experimentally under non-lID data distribution when there are no attacks. Under Byzantine attacks, the proposed schemes perform much better than the classical model based FedAvg algorithm.more » « less
-
Federated Learning (FL) is a privacy-preserving distributed machine learning technique that enables individual clients (e.g., user participants, edge devices, or organizations) to train a model on their local data in a secure environment and then share the trained model with an aggregator to build a global model collaboratively. In this work, we propose FedDefender, a defense mechanism against targeted poisoning attacks in FL by leveraging differential testing. FedDefender first applies differential testing on clients’ models using a synthetic input. Instead of comparing the output (predicted label), which is unavailable for synthetic input, FedDefender fingerprints the neuron activations of clients’ models to identify a potentially malicious client containing a backdoor. We evaluate FedDefender using MNIST and FashionMNIST datasets with 20 and 30 clients, and our results demonstrate that FedDefender effectively mitigates such attacks, reducing the attack success rate (ASR) to 10% without deteriorating the global model performance.more » « less