skip to main content
US FlagAn official website of the United States government
dot gov icon
Official websites use .gov
A .gov website belongs to an official government organization in the United States.
https lock icon
Secure .gov websites use HTTPS
A lock ( lock ) or https:// means you've safely connected to the .gov website. Share sensitive information only on official, secure websites.


Title: On the Information Theoretic Secure Aggregation with Uncoded Groupwise Keys
Secure aggregation, which is a core component of federated learning, aggregates locally trained models from distributed users at a central server. The “secure” nature of such aggregation consists of the fact that no information about the local users’ data must be leaked to the server except the aggregated local models. In order to guarantee security, some keys may be shared among the users (this is referred to as the key sharing phase). After the key sharing phase, each user masks its trained model which is then sent to the server (this is referred to as the model aggregation phase). This paper follows the information theoretic secure aggregation problem originally formulated by Zhao and Sun, with the objective to characterize the minimum communication cost from the K users in the model aggregation phase. Due to user dropouts, which are common in real systems, the server may not receive all messages from the users. A secure aggregation scheme should tolerate the dropouts of at most K – U users, where U is a system parameter. The optimal communication cost is characterized by Zhao and Sun, but with the assumption that the keys stored by the users could be any random variables with arbitrary dependency. On the motivation that uncoded groupwise keys are more convenient to be shared and could be used in large range of applications besides federated learning, in this paper we add one constraint into the above problem, namely, that the key variables are mutually independent and each key is shared by a group of S users, where S is another system parameter. To the best of our knowledge, all existing secure aggregation schemes (with information theoretic security or computational security) assign coded keys to the users. We show that if S > K–U, a new secure aggregation scheme with uncoded groupwise keys can achieve the same optimal communication cost as the best scheme with coded keys; if S ≤ K – U, uncoded groupwise key sharing is strictly sub-optimal. Finally, we also implement our proposed secure aggregation scheme into Amazon EC2, which are then compared with the existing secure aggregation schemes with offline key sharing.  more » « less
Award ID(s):
2312227 2145835 2045656 2312228 2516634
PAR ID:
10527829
Author(s) / Creator(s):
; ; ; ;
Publisher / Repository:
IEEE
Date Published:
Journal Name:
IEEE Transactions on Information Theory
ISSN:
0018-9448
Page Range / eLocation ID:
1 to 1
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. Secure aggregation, which is a core component of federated learning, aggregates locally trained models from distributed users at a central server, without revealing any other information about the local users' data. This paper follows a recent information theoretic secure aggregation problem with user dropouts, where the objective is to characterize the minimum communication cost from the K users to the server during the model aggregation. All existing secure aggregation protocols let the users share and store coded keys to guarantee security. On the motivation that uncoded groupwise keys are more convenient to be shared and could be used in large range of practical applications, this paper is the first to consider uncoded groupwise keys, where the keys are mutually independent and each key is shared by a group of S users. We show that if S is beyond a threshold, a new secure aggregation protocol with uncoded groupwise keys, referred to as GroupSecAgg, can achieve the same optimal communication cost as the best protocol with coded keys. The experiments on Amazon EC2 show the considerable improvements on the key sharing and model aggregation times compared to the state-of-the art. 
    more » « less
  2. This paper considers the secure aggregation problem for federated learning under an information theoretic cryptographic formulation, where distributed training nodes (referred to as users) train models based on their own local data and a curious-but-honest server aggregates the trained models without retrieving other information about users’ local data. Secure aggregation generally contains two phases, namely key sharing phase and model aggregation phase. Due to the common effect of user dropouts in federated learning, the model aggregation phase should contain two rounds, where in the first round the users transmit masked models and, in the second round, according to the identity of surviving users after the first round, these surviving users transmit some further messages to help the server decrypt the sum of users’ trained models. The objective of the considered information theoretic formulation is to characterize the capacity region of the communication rates from the users to the server in the two rounds of the model aggregation phase, assuming that key sharing has already been performed offline in prior. In this context, Zhao and Sun completely characterized the capacity region under the assumption that the keys can be arbitrary random variables. More recently, an additional constraint, known as “uncoded groupwise keys,” has been introduced. This constraint entails the presence of multiple independent keys within the system, with each key being shared by precisely S users, where S is a defined system parameter. The capacity region for the information theoretic secure aggregation problem with uncoded groupwise keys was established in our recent work subject to the condition S > K - U, where K is the number of total users and U is the designed minimum number of surviving users (which is another system parameter). In this paper we fully characterize the capacity region for this problem by matching a new converse bound and an achievable scheme. Experimental results over the Tencent Cloud show the improvement on the model aggregation time compared to the original secure aggregation scheme. 
    more » « less
  3. This paper studies information theoretic secure aggregation in federated learning, which involves K distributed nodes and a central server. For security, the server can only recover aggregated updates of locally trained models, without any other information about the local users’ data being leaked. The secure aggregation process typically consists of two phases: the key sharing phase and the model aggregation phase. In previous research, a constraint on keys known as “uncoded groupwise keys” was introduced, and we adopt this constraint during the key sharing phase, where each set of S -users shares an independent key. During the model aggregation phase, each user transmits its encrypted model results to the server. To tolerate user dropouts in secure aggregation (i.e., some users may not respond), where up to K−U users may drop out and the identity of the surviving users is unpredictable in advance, at least two rounds of transmission are required in the model aggregation phase. In the first round, users send the masked models. Then, in the second round, based on the identity of the surviving users after the first round, these surviving users send additional messages that assist the server in decrypting the sum of the users’ trained models. Our goal is to minimize the number of transmissions in the two rounds. Additionally, we consider the potential impact of user collusion, where up to T users may collude with the server. This requires the transmissions to meet stricter security constraints, ensuring that the server cannot learn anything beyond the aggregated model updates, even if it colludes with any set of up to T users. For this more challenging problem, we propose schemes that ensure secure aggregation and achieve the capacity region when S∈{2}∪[K−U+1:K−T] . Experimental results conducted on Tencent Cloud also show that the proposed secure aggregation schemes improve the model aggregation time compared to the benchmark scheme. 
    more » « less
  4. Secure aggregation is motivated by federated learning (FL) where a cloud server aims to compute an averaged model (i.e., weights of deep neural networks) from the locally-trained models of numerous clients, while adhering to data security requirements. Hierarchical secure aggregation (HSA) studies secure aggregation of user inputs (an abstraction of the local models) in a three-layer network with clustered users connected to the server through an intermediate layer of relays. In HSA, in addition to the conventional server security, relay security is also imposed so that the relays remain oblivious to the inputs. However, existing studies on HSA have assumed that each user is associated with only one relay, which prevents coding opportunities across inter-cluster users to achieve efficient communication and key generation. In this paper, we consider HSA with a commonly used cyclic association pattern where each user is connected to B relays in a cyclic manner. We aim to determine the best communication and security key rates in such a multi-association network. We show that when B≤K−1 (K is the total number of users), to securely compute one symbol of the desired sum of inputs, each user needs to send at least R∗X=1 symbol to the associated relays, each relay needs to send at least R∗Y=1/B symbols to the server, each user needs to hold at least R∗Z=1/B secret key symbols, and all users need to collectively hold at least R∗ZΣ=max{1,K/B−1} independent key symbols. This reveals a fundamental trade-off between the association number B and the communication and key rates. When B=K, we present a scheme that achieves the optimal communication and source key rates, along with a nearoptimal individual key rate. 
    more » « less
  5. Federated learning (FL) is well-suited to 5G networks, where many mobile devices generate sensitive edge data. Secure aggregation protocols enhance privacy in FL by ensuring that individual user updates reveal no information about the underlying client data. However, the dynamic and large-scale nature of 5G-marked by high mobility and frequent dropouts-poses significant challenges to the effective adoption of these protocols. Existing protocols often require multi-round communication or rely on fixed infrastructure, limiting their practicality. We propose a lightweight, single-round secure aggregation protocol designed for 5G environments. By leveraging base stations for assisted computation and incorporating precomputation, key-homomorphic pseudorandom functions, and t-out-of-k secret sharing, our protocol ensures efficiency, robustness, and privacy. Experiments show strong security guarantees and significant gains in communication and computation efficiency, making the approach well-suited for real-world 5G FL deployments. 
    more » « less