This paper considers the secure aggregation problem for federated learning under an information theoretic cryptographic formulation, where distributed training nodes (referred to as users) train models based on their own local data and a curious-but-honest server aggregates the trained models without learning any other information about the users’ local data. Secure aggregation generally consists of two phases: a key sharing phase and a model aggregation phase. Because user dropouts are common in federated learning, the model aggregation phase should contain two rounds: in the first round the users transmit masked models, and in the second round, depending on which users survived the first round, the surviving users transmit further messages to help the server decrypt the sum of the users’ trained models. The objective of the considered information theoretic formulation is to characterize the capacity region of the communication rates from the users to the server in the two rounds of the model aggregation phase, assuming that key sharing has already been performed offline beforehand. In this context, Zhao and Sun completely characterized the capacity region under the assumption that the keys can be arbitrary random variables. More recently, an additional constraint, known as “uncoded groupwise keys,” has been introduced. Under this constraint, the system contains multiple independent keys, each shared by exactly S users, where S is a system parameter. The capacity region for the information theoretic secure aggregation problem with uncoded groupwise keys was established in our recent work under the condition S > K − U, where K is the total number of users and U is the designed minimum number of surviving users (another system parameter). In this paper we fully characterize the capacity region for this problem by matching a new converse bound with an achievable scheme.
Experimental results on Tencent Cloud show an improvement in model aggregation time over the original secure aggregation scheme.
On Secure Capacity of Multiple Unicast Traffic over Separable Networks
This paper studies the problem of information theoretic secure communication when a source has private messages to transmit to m destinations, in the presence of a passive adversary who eavesdrops on an unknown set of k edges. The information theoretic secure capacity is derived over separable networks with unit edge capacities, for the cases when k = 1 and m is arbitrary, or m = 3 and k is arbitrary. This is achieved by first showing that there exists a secure polynomial-time code construction that matches an outer bound over two-layer networks, and then giving a deterministic mapping between two-layer and arbitrary separable networks.
- Award ID(s):
- 1740047
- PAR ID:
- 10185991
- Date Published:
- Journal Name:
- IEEE Information Theory Workshop (ITW)
- Page Range / eLocation ID:
- 1 to 5
- Format(s):
- Medium: X
- Sponsoring Org:
- National Science Foundation
More Like this
-
Secure aggregation, which is a core component of federated learning, aggregates locally trained models from distributed users at a central server. The “secure” nature of such aggregation lies in the fact that no information about the users’ local data may be leaked to the server except the aggregated local models. In order to guarantee security, some keys may be shared among the users (this is referred to as the key sharing phase). After the key sharing phase, each user masks its trained model and sends it to the server (this is referred to as the model aggregation phase). This paper follows the information theoretic secure aggregation problem originally formulated by Zhao and Sun, with the objective of characterizing the minimum communication cost from the K users in the model aggregation phase. Due to user dropouts, which are common in real systems, the server may not receive all messages from the users. A secure aggregation scheme should tolerate the dropout of at most K − U users, where U is a system parameter. The optimal communication cost was characterized by Zhao and Sun, but under the assumption that the keys stored by the users could be any random variables with arbitrary dependency. Motivated by the fact that uncoded groupwise keys are more convenient to share and could be used in a wide range of applications besides federated learning, in this paper we add one constraint to the above problem, namely, that the key variables are mutually independent and each key is shared by a group of S users, where S is another system parameter. To the best of our knowledge, all existing secure aggregation schemes (with information theoretic security or computational security) assign coded keys to the users. We show that if S > K − U, a new secure aggregation scheme with uncoded groupwise keys can achieve the same optimal communication cost as the best scheme with coded keys; if S ≤ K − U, uncoded groupwise key sharing is strictly sub-optimal.
Finally, we also implement our proposed secure aggregation scheme on Amazon EC2 and compare it with the existing secure aggregation schemes with offline key sharing.
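The idea of masking models with independent keys, each shared by a group of S users, can be illustrated with a toy sketch. This is not the scheme proposed in the paper: it assumes that no user drops out, works over an arbitrarily chosen prime modulus `Q`, and uses a hypothetical coefficient rule (S − 1 for the lowest-indexed group member, −1 for the others) chosen only so that every key cancels in the sum. All function names are illustrative.

```python
import itertools
import random

Q = 2**31 - 1  # prime modulus of the toy field (an assumption, not from the paper)

def make_group_keys(num_users, group_size, length, rng):
    """Draw one independent uniform key vector per group of `group_size` users."""
    return {
        group: [rng.randrange(Q) for _ in range(length)]
        for group in itertools.combinations(range(num_users), group_size)
    }

def mask_model(user, model, keys):
    """Mask `model` with every group key that `user` holds.

    Within each group the lowest-indexed member uses coefficient (S - 1)
    and every other member uses -1, so the coefficients sum to zero and
    the key vanishes once all members' messages are added.
    """
    masked = list(model)
    for group, key in keys.items():
        if user not in group:
            continue
        coeff = len(group) - 1 if user == group[0] else -1
        masked = [(m + coeff * z) % Q for m, z in zip(masked, key)]
    return masked

def aggregate(masked_models):
    """Server side: coordinate-wise sum of the masked vectors modulo Q."""
    return [sum(column) % Q for column in zip(*masked_models)]

# Usage: K = 4 users, groupwise keys shared by S = 3 users, models of length 5.
rng = random.Random(0)
K, S, LENGTH = 4, 3, 5
models = [[rng.randrange(100) for _ in range(LENGTH)] for _ in range(K)]
keys = make_group_keys(K, S, LENGTH, rng)
masked = [mask_model(u, models[u], keys) for u in range(K)]
assert aggregate(masked) == [sum(column) % Q for column in zip(*models)]
```

Tolerating up to K − U dropouts, which is the central difficulty addressed in the abstract above, needs a second round of messages and is deliberately left out of this sketch.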
-
Linear computation broadcast (LCBC) refers to a setting with d-dimensional data stored at a central server, where K users, each with some prior linear side-information, wish to compute various linear combinations of the data. For each computation instance, the data is represented as a d-dimensional vector with elements in a finite field $\mathbb{F}_{p^n}$, where $p^n$ is a power of a prime. The computation is to be performed many times, and the goal is to determine the minimum amount of information per computation instance that must be broadcast to satisfy all the users. The reciprocal of the optimal broadcast cost per computation instance is the capacity of LCBC. The capacity is known for up to K = 3 users. Since LCBC includes index coding as a special case, large-K settings of LCBC are at least as hard as the index coding problem. As such, the general LCBC problem is beyond our reach and we do not pursue it. Instead of the general setting (all cases), this work focuses on the generic setting (almost all cases) and shows that the generic capacity of the symmetric LCBC (where every user has m′ dimensions of side-information and m dimensions of demand) for a large number of users (K ≥ d suffices) is $C_g = 1/\Delta_g$, where $\Delta_g = \min\{\max\{0, d - m'\},\ dm/(m+m')\}$ is the broadcast cost that is both achievable and unbeatable asymptotically almost surely for large n, among all LCBC instances with the given parameters p, K, d, m, m′. Relative to baseline schemes of random coding or separate transmissions, $C_g$ shows an extremal gain by a factor of K as a function of the number of users, and by a factor of ≈ d/4 as a function of data dimensions, when optimized over the remaining parameters. For an arbitrary number of users, the generic capacity of the symmetric LCBC is characterized within a factor of 2.
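The closed-form expression quoted in this abstract is easy to evaluate. The sketch below simply computes the generic broadcast cost min{max{0, d − m′}, dm/(m + m′)} and its reciprocal with exact rational arithmetic. The function names are illustrative, and returning `None` when the cost is zero (so the reciprocal is undefined) is a convention chosen here, not something stated in the abstract.

```python
from fractions import Fraction

def generic_broadcast_cost(d, m, m_prime):
    """Evaluate min{ max{0, d - m'}, d*m / (m + m') } exactly."""
    return min(Fraction(max(0, d - m_prime)), Fraction(d * m, m + m_prime))

def generic_capacity(d, m, m_prime):
    """Reciprocal of the generic broadcast cost; None if the cost is zero."""
    cost = generic_broadcast_cost(d, m, m_prime)
    return None if cost == 0 else 1 / cost

# Usage: d = 4 data dimensions, m = 1 demand dimension, m' = 1 side-information dimension.
# max{0, 4 - 1} = 3 and 4*1/(1+1) = 2, so the cost is 2 and the capacity is 1/2.
print(generic_broadcast_cost(4, 1, 1), generic_capacity(4, 1, 1))
```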
-
Fuzzy extractors convert noisy signals from the physical world into reliable cryptographic keys. Fuzzy min-entropy measures the limit of the length of key that a fuzzy extractor can derive from a distribution (Fuller et al. in IEEE Trans Inf Theory 66(8):5282–5298, 2020). In general, fuzzy min-entropy that is superlogarithmic in the security parameter is required for a noisy distribution to be suitable for key derivation. There is a wide gap between what is possible with respect to computational and information-theoretic adversaries. Under the assumption of general-purpose obfuscation, keys can be securely derived from all distributions with superlogarithmic entropy. Against information-theoretic adversaries, however, it is impossible to build a single fuzzy extractor that works for all distributions (Fuller et al. 2020). A weaker information-theoretic goal is building a fuzzy extractor for each probability distribution. This is the approach taken by Woodage et al. (in: Advances in Cryptology – CRYPTO, Springer, pp 682–710, 2017). Prior approaches use the full description of the probability mass function and are inefficient. We show this is inherent: for a quarter of distributions with fuzzy min-entropy and $2^k$ points there is no secure fuzzy extractor that uses fewer than $2^{\Theta(k)}$ bits of information about the distribution. We show an analogous result with stronger parameters for information-theoretic secure sketches. Secure sketches are frequently used to construct fuzzy extractors.
-
Matthieu Bloch (Ed.)
Motivated by an open problem and a conjecture, this work studies the problem of single server private information retrieval with private coded side information (PIR-PCSI) that was recently introduced by Heidarzadeh et al. The goal of PIR-PCSI is to allow a user to efficiently retrieve a desired message $W_\theta$, which is one of K independent messages that are stored at a server, while utilizing private side information consisting of a linear combination of a uniformly chosen size-M subset (S ⊂ [K]) of messages. The settings PIR-PCSI-I and PIR-PCSI-II correspond to the constraints that θ is generated uniformly from [K]\S, and S, respectively. In each case, (θ, S) must be kept private from the server. The capacity is defined as the supremum, over message and field sizes, of achievable rates (number of bits of desired message retrieved per bit of download) and is characterized by Heidarzadeh et al. for PIR-PCSI-I in general, and for PIR-PCSI-II for M > (K + 1)/2 as $(K - M + 1)^{-1}$. For 2 ≤ M ≤ (K + 1)/2 the capacity of PIR-PCSI-II remains open, and it is conjectured that even in this case the capacity is $(K - M + 1)^{-1}$. We show the capacity of PIR-PCSI-II is equal to 2/K for 2 ≤ M ≤ (K + 1)/2, which is strictly larger than the conjectured value, and does not depend on M within this parameter regime. Remarkably, half the side-information is found to be redundant. We also characterize the infimum capacity (infimum over fields instead of supremum), and the capacity with private coefficients. The results are generalized to PIR-PCSI-I (θ ∈ [K] \ S) and PIR-PCSI (θ ∈ [K]) settings.
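The claim that 2/K strictly exceeds the conjectured value 1/(K − M + 1) throughout the regime 2 ≤ M ≤ (K + 1)/2 follows from 2(K − M + 1) > K, which holds exactly when 2M < K + 2. A small numerical check is sketched below; the function names and the range of K are illustrative choices, not taken from the paper.

```python
from fractions import Fraction

def conjectured_capacity(K, M):
    """Value conjectured for PIR-PCSI-II in the open regime: 1 / (K - M + 1)."""
    return Fraction(1, K - M + 1)

def shown_capacity(K):
    """Value stated in the abstract for 2 <= M <= (K + 1) / 2: 2 / K."""
    return Fraction(2, K)

# Check the strict inequality for every integer M in the regime, for small K.
for K in range(3, 50):
    for M in range(2, (K + 1) // 2 + 1):
        assert shown_capacity(K) > conjectured_capacity(K, M)
```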

