This paper studies a distributed optimization problem in the federated learning (FL) framework under differential privacy constraints, whereby a set of clients having local samples are connected to an untrusted server, who wants to learn a global model while preserving the privacy of clients’ local datasets. We propose a new client sampling called self-sampling that reflects the random availability of clients in the learning process in FL. We analyze the differential privacy of the SGD with client self-sampling by composing amplification by sub-sampling along with amplification by shuffling. Furthermore, we analyze the convergence of the proposed SGD algorithm showing that we can get a reasonable learning performance while preserving the privacy of clients’ data even with client self-sampling.
more »
« less
FedNIC: enhancing privacy-preserving federated learning via homomorphic encryption offload on SmartNIC
Federated learning (FL) has emerged as a promising paradigm for secure distributed machine learning model training across multiple clients or devices, enabling model training without having to share data across the clients. However, recent studies revealed that FL could be vulnerable to data leakage and reconstruction attacks even if the data itself are never shared with another client. Thus, to resolve such vulnerability and improve the privacy of all clients, a class of techniques, called privacy-preserving FL, incorporates encryption techniques, such as homomorphic encryption (HE), to encrypt and fully protect model information from being exposed to other parties. A downside to this approach is that encryption schemes like HE are very compute-intensive, often causing inefficient and excessive use of client CPU resources that can be used for other uses. To alleviate this issue, this study introduces a novel approach by leveraging smart network interface cards (SmartNICs) to offload compute-intensive HE operations of privacy-preserving FL. By employing SmartNICs as hardware accelerators, we enable efficient computation of HE while saving CPU cycles and other server resources for more critical tasks. In addition, by offloading encryption from the host to another device, the details of encryption remain secure even if the host is compromised, ultimately improving the security of the entire FL system. Given such benefits, this paper presents an FL system named FedNIC that implements the above approach, with an in-depth description of the architecture, implementation, and performance evaluations. Our experimental results demonstrate a more secure FL system with no loss in model accuracy and up to 25% in reduced host CPU cycle, but with a roughly 46% increase in total training time, showing the feasibility and tradeoffs of utilizing SmartNICs as an encryption offload device in federated learning scenarios. Finally, we illustrate promising future study and potential optimizations for a more secure and privacy-preserving federated learning system.
more »
« less
- Award ID(s):
- 2245352
- PAR ID:
- 10639531
- Publisher / Repository:
- Frontiers
- Date Published:
- Journal Name:
- Frontiers in Computer Science
- Volume:
- 6
- ISSN:
- 2624-9898
- Subject(s) / Keyword(s):
- privacy-preserving machine learning, federated learning, homomorphic encryption, SmartNIC, network offload
- Format(s):
- Medium: X
- Sponsoring Org:
- National Science Foundation
More Like this
-
-
Federated Learning (FL) revolutionizes collaborative machine learning among Internet of Things (IoT) devices by enabling them to train models collectively while preserving data privacy. FL algorithms fall into two primary categories: synchronous and asynchronous. While synchronous FL efficiently handles straggler devices, its convergence speed and model accuracy can be compromised. In contrast, asynchronous FL allows all devices to participate but incurs high communication overhead and potential model staleness. To overcome these limitations, the paper introduces a semi-synchronous FL framework that uses client tiering based on computing and communication latencies. Clients in different tiers upload their local models at distinct frequencies, striking a balance between straggler mitigation and communication costs. Building on this, the paper proposes the Dynamic client clustering, bandwidth allocation, and local training for semi-synchronous Federated learning (DecantFed) algorithm to dynamically optimize client clustering, bandwidth allocation, and local training workloads in order to maximize data sample processing rates in FL. DecantFed dynamically optimizes client clustering, bandwidth allocation, and local training workloads for maximizing data processing rates in FL. It also adapts client learning rates according to their tiers, thus addressing the model staleness issue. Extensive simulations using benchmark datasets like MNIST and CIFAR-10, under both IID and non-IID scenarios, demonstrate DecantFed’s superior performance. It outperforms FedAvg and FedProx in convergence speed and delivers at least a 28% improvement in model accuracy, compared to FedProx.more » « less
-
Federated Learning (FL) has emerged as an effective paradigm for distributed learning systems owing to its strong potential in exploiting underlying data characteristics while preserving data privacy. In cases of practical data heterogeneity among FL clients in many Internet-of-Things (IoT) applications over wireless networks, however, existing FL frameworks still face challenges in capturing the overall feature properties of local client data that often exhibit disparate distributions. One approach is to apply generative adversarial networks (GANs) in FL to address data heterogeneity by integrating GANs to regenerate anonymous training data without exposing original client data to possible eavesdropping. Despite some successes, existing GAN-based FL frameworks still incur high communication costs and elicit other privacy concerns, limiting their practical applications. To this end, this work proposes a novel FL framework that only applies partial GAN model sharing. This new PS-FedGAN framework effectively addresses heterogeneous data distributions across clients and strengthens privacy preservation at reduced communication costs, especially over wireless networks. Our analysis demonstrates the convergence and privacy benefits of the proposed PS-FEdGAN framework. Through experimental results based on several well-known benchmark datasets, our proposed PS-FedGAN demonstrates strong potential to tackle FL under heterogeneous (non-IID) client data distributions, while improving data privacy and lowering communication overhead.more » « less
-
Federated Learning (FL) is a privacy-preserving distributed machine learning technique that enables individual clients (e.g., user participants, edge devices, or organizations) to train a model on their local data in a secure environment and then share the trained model with an aggregator to build a global model collaboratively. In this work, we propose FedDefender, a defense mechanism against targeted poisoning attacks in FL by leveraging differential testing. FedDefender first applies differential testing on clients’ models using a synthetic input. Instead of comparing the output (predicted label), which is unavailable for synthetic input, FedDefender fingerprints the neuron activations of clients’ models to identify a potentially malicious client containing a backdoor. We evaluate FedDefender using MNIST and FashionMNIST datasets with 20 and 30 clients, and our results demonstrate that FedDefender effectively mitigates such attacks, reducing the attack success rate (ASR) to 10% without deteriorating the global model performance.more » « less
-
Federated learning enables users to collaboratively train a machine learning model over their private datasets. Secure aggregation protocols are employed to mitigate information leakage about the local datasets from user-submitted model updates. This setup, however, still leaks the user participation in training, which can also be sensitive. Protecting user anonymity is even more challenging in dynamic environments where users may (re)join or leave the training process at any point of time. This paper introduces AnoFel, the first framework to support private and anonymous dynamic participation in federated learning (FL). AnoFel leverages several cryptographic primitives, the concept of anonymity sets, differential privacy, and a public bulletin board to support anonymous user registration, as well as unlinkable and confidential model update submission. Our system allows dynamic participation, where users can join or leave at any time without needing any recovery protocol or interaction. To assess security, we formalize a notion for privacy and anonymity in FL, and formally prove that AnoFel satisfies this notion. To the best of our knowledge, our system is the first solution with provable anonymity guarantees. To assess efficiency, we provide a concrete implementation of AnoFel, and conduct experiments showing its ability to support learning applications scaling to a large number of clients. For a TinyImageNet classification task with 512 clients, the client setup to join is less than 3 sec, and the client runtime for each training iteration takes a total of 8 sec, where the added overhead of AnoFel is 46% of the total runtime. We also compare our system with prior work and demonstrate its practicality. AnoFel client runtime is up to 5x faster than Truex et al., despite the added anonymity guarantee and dynamic user joining in AnoFel. Compared to Bonawitz et al., AnoFel is only 2x slower for added support for privacy in output, dynamic user joining, and anonymity.more » « less
An official website of the United States government

