skip to main content


Title: Projected federated averaging with heterogeneous differential privacy
Federated Learning (FL) is a promising framework for multiple clients to learn a joint model without directly sharing the data. In addition to high utility of the joint model, rigorous privacy protection of the data and communication efficiency are important design goals. Many existing efforts achieve rigorous privacy by ensuring differential privacy for intermediate model parameters, however, they assume a uniform privacy parameter for all the clients. In practice, different clients may have different privacy requirements due to varying policies or preferences. In this paper, we focus on explicitly modeling and leveraging the heterogeneous privacy requirements of different clients and study how to optimize utility for the joint model while minimizing communication cost. As differentially private perturbations affect the model utility, a natural idea is to make better use of information submitted by the clients with higher privacy budgets (referred to as "public" clients, and the opposite as "private" clients). The challenge is how to use such information without biasing the joint model. We propose P rojected F ederated A veraging (PFA), which extracts the top singular subspace of the model updates submitted by "public" clients and utilizes them to project the model updates of "private" clients before aggregating them. We then propose communication-efficient PFA+, which allows "private" clients to upload projected model updates instead of original ones. Our experiments verify the utility boost of both algorithms compared to the baseline methods, whereby PFA+ achieves over 99% uplink communication reduction for "private" clients.  more » « less
Award ID(s):
2125530 1952192 2124104 1838200
NSF-PAR ID:
10332814
Author(s) / Creator(s):
; ; ; ;
Date Published:
Journal Name:
Proceedings of the VLDB Endowment
Volume:
15
Issue:
4
ISSN:
2150-8097
Page Range / eLocation ID:
828 to 840
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. Artificial Intelligence (AI) is moving towards the edge. Training an AI model for edge computing on a centralized server increases latency, and the privacy of edge users is jeopardized due to private data transfer through a less secure communication channels. Additionally, existing high-power computing systems are battling with memory and data transfer bottlenecks between the processor and memory. Federated Learning (FL) is a collaborative AI learning paradigm for distributed local devices that operates without transferring local data. Local participant devices share the updated network parameters with the central server instead of sending the original data. The central server updates the global AI model and deploys the model to the local clients. As the local data resides only on the edge, these devices need to be protected from cyberattacks. The Federated Intrusion Detection System (FIDS) could be a viable system to protect edge devices as opposed to a centralized protection system. However, on-device training of the model in resource constrained devices may suffer from excessive power drain, in addition to memory and area overhead. In this work we present a memristor based system for AI training on edge devices. Memristor devices are ideal candidates for processing in memory, as their dynamic resistance properties allow them to perform multiply-add operations in parallel in the analog domain with extreme efficiency. Alternatively, existing CMOS-based PIM systems are typically developed for edge inference based on pretrained weights, and are not equipped for on-chip training. We show the effectiveness of the system, where successful learning and recognition is achieved completely within edge devices. The classification accuracy of the memristor system shows negligible loss when compared a software implementation. To the best of our knowledge, this first demonstration of a memristor based federated learning system. We demonstrate the effectiveness of this system as an intrusion detection platform for edge devices, although given the flexibility of the learning algorithm, it could be used to enhance many types of on board leaning and classification applications. 
    more » « less
  2. We present Secrecy, a system for privacy-preserving collaborative analytics as a service. Secrecy allows multiple data holders to contribute their data towards a joint analysis in the cloud, while keeping the data siloed even from the cloud providers. At the same time, it enables cloud providers to offer their services to clients who would have otherwise refused to perform a computation altogether or insisted that it be done on private infrastructure. Secrecy ensures no information leakage and provides provable security guarantees by employing cryptographically secure Multi-Party Computation (MPC). In Secrecy we take a novel approach to optimizing MPC execution by co-designing multiple layers of the system stack and exposing the MPC costs to the query engine. To achieve practical performance, Secrecy applies physical optimizations that amortize the inherent MPC overheads along with logical optimizations that dramatically reduce the computation, communication, and space requirements during query execution. Our multi-cloud experiments demonstrate that Secrecy improves query performance by over 1000x compared to existing approaches and computes complex analytics on millions of data records with modest use of resources. 
    more » « less
  3. Increasingly large trip demands have strained urban transportation capacity, which consequently leads to traffic congestion and rapid growth of greenhouse gas emissions. In this work, we focus on achieving sustainable transportation by incentivizing passengers to switch from private cars to public transport. We address the following challenges. First, the passengers incur inconvenience costs when changing their transit behaviors due to delay and discomfort, and thus need to be reimbursed. Second, the inconvenience cost, however, is unknown to the government when choosing the incentives. Furthermore, changing transit behaviors raises privacy concerns from passengers. An adversary could infer personal information (e.g., daily routine, region of interest, and wealth) by observing the decisions made by the government, which are known to the public. We adopt the concept of differential privacy and propose privacy-preserving incentive designs under two settings, denoted as two-way communication and one-way communication. Under two-way communication, passengers submit bids and then the government determines the incentives, whereas in one-way communication, the government simply sets a price without acquiring information from the passengers. We formulate the problem under two-way communication as a mixed integer linear program and propose a polynomial-time approximation algorithm. We show the proposed approach achieves truthfulness, individual rationality, social optimality, and differential privacy. Under one-way communication, we focus on how the government should design the incentives without revealing passengers’ inconvenience costs while still preserving differential privacy. We formulate the problem as a convex program and propose a differentially private and near-optimal solution algorithm. A numerical case study using the Caltrans Performance Measurement System (PeMS) data source is presented as evaluation. The results show that the proposed approaches achieve a win-win situation in which both the government and passengers obtain non-negative utilities. 
    more » « less
  4. Federated learning (FL) is an increasingly popular approach for machine learning (ML) in cases where the training dataset is highly distributed. Clients perform local training on their datasets and the updates are then aggregated into the global model. Existing protocols for aggregation are either inefficient, or don’t consider the case of malicious actors in the system. This is a major barrier in making FL an ideal solution for privacy-sensitive ML applications. We present ELSA, a secure aggregation protocol for FL, which breaks this barrier - it is efficient and addresses the existence of malicious actors at the core of its design. Similar to prior work on Prio and Prio+, ELSA provides a novel secure aggregation protocol built out of distributed trust across two servers that keeps individual client updates private as long as one server is honest, defends against malicious clients, and is efficient end-to-end. Compared to prior works, the distinguishing theme in ELSA is that instead of the servers generating cryptographic correlations interactively, the clients act as untrusted dealers of these correlations without compromising the protocol’s security. This leads to a much faster protocol while also achieving stronger security at that efficiency compared to prior work. We introduce new techniques that retain privacy even when a server is malicious at a small added cost of 7-25% in runtime with negligible increase in communication over the case of semi-honest server. Our work improves end-to-end runtime over prior work with similar security guarantees by big margins - single-aggregator RoFL by up to 305x (for the models we consider), and distributed trust Prio by up to 8x. 
    more » « less
  5. Federated learning (FL) is an increasingly popular approach for machine learning (ML) when the training dataset is highly distributed. Clients perform local training on their datasets and the updates are then aggregated into the global model. Existing protocols for aggregation are either inefficient or don’t consider the case of malicious actors in the system. This is a major barrier to making FL an ideal solution for privacy-sensitive ML applications. In this talk, I will present ELSA, a secure aggregation protocol for FL that breaks this barrier - it is efficient and addresses the existence of malicious actors (clients + servers) at the core of its design. Similar to prior work Prio and Prio+, ELSA provides a novel secure aggregation protocol built out of distributed trust across two servers that keeps individual client updates private as long as one server is honest, defends against malicious clients, and is efficient end-to-end. Compared to prior works, the distinguishing theme in ELSA is that instead of the servers generating cryptographic correlations interactively, the clients act as untrusted dealers of these correlations without compromising the protocol’s security. This leads to a much faster protocol while also achieving stronger security at that efficiency compared to prior work. We introduce new techniques that retain privacy even when a server is malicious at a small added cost of 7-25% in runtime with a negligible increase in communication over the case of a semi-honest server. ELSA improves end-to-end runtime over prior work with similar security guarantees by big margins - single-aggregator RoFL by up to 305x (for the models we consider), and distributed-trust Prio by up to 8x (with up to 16x faster server-side protocol). Additionally, ELSA can be run in a bandwidth-saver mode for clients who are geographically bandwidth-constrained - an important property that is missing from prior works. 
    more » « less