skip to main content
US FlagAn official website of the United States government
dot gov icon
Official websites use .gov
A .gov website belongs to an official government organization in the United States.
https lock icon
Secure .gov websites use HTTPS
A lock ( lock ) or https:// means you've safely connected to the .gov website. Share sensitive information only on official, secure websites.


Title: Federated Learning under Restricted user Availability
Federated Learning (FL) is a decentralized machine learning framework that enables collaborative model training while respecting data privacy. In various applications, non-uniform availability or participation of users is unavoidable due to an adverse or stochastic environment, the latter often being uncontrollable during learning. Here, we posit a generic user selection mechanism implementing a possibly randomized, stationary selection policy, suggestively termed as a Random Access Model (RAM). We propose a new formulation of the FL problem which effectively captures and mitigates limited participation of data originating from infrequent, or restricted users, at the presence of a RAM. By employing the Conditional Value-at- Risk (CVaR) over the (unknown) RAM distribution, we extend the expected loss FL objective to a risk-aware objective, enabling the design of an efficient training algorithm that is completely oblivious to the RAM, and with essentially identical complexity as FedAvg. Our experiments on synthetic and benchmark datasets show that the proposed approach achieves significantly improved performance as compared with standard FL, under a variety of setups.  more » « less
Award ID(s):
2242215
PAR ID:
10527540
Author(s) / Creator(s):
; ;
Publisher / Repository:
IEEE
Date Published:
ISBN:
979-8-3503-4485-1
Page Range / eLocation ID:
7055 to 7059
Format(s):
Medium: X
Location:
Seoul, Korea, Republic of
Sponsoring Org:
National Science Foundation
More Like this
  1. Federated learning (FL) emerges as a popular distributed learning schema that learns a model from a set of participating users without sharing raw data. One major challenge of FL comes with heterogeneous users, who may have distributionally different (or non-iid) data and varying computation resources. As federated users would use the model for prediction, they often demand the trained model to be robust against malicious attackers at test time. Whereas adversarial training (AT) provides a sound solution for centralized learning, extending its usage for federated users has imposed significant challenges, as many users may have very limited training data and tight computational budgets, to afford the data-hungry and costly AT. In this paper, we study a novel FL strategy: propagating adversarial robustness from rich-resource users that can afford AT, to those with poor resources that cannot afford it, during federated learning. We show that existing FL techniques cannot be effectively integrated with the strategy to propagate robustness among non-iid users and propose an efficient propagation approach by the proper use of batch-normalization. We demonstrate the rationality and effectiveness of our method through extensive experiments. Especially, the proposed method is shown to grant federated models remarkable robustness even when only a small portion of users afford AT during learning. Source code can be accessed at https://github.com/illidanlab/FedRBN. 
    more » « less
  2. Federated learning enables users to collaboratively train a machine learning model over their private datasets. Secure aggregation protocols are employed to mitigate information leakage about the local datasets from user-submitted model updates. This setup, however, still leaks the user participation in training, which can also be sensitive. Protecting user anonymity is even more challenging in dynamic environments where users may (re)join or leave the training process at any point of time. This paper introduces AnoFel, the first framework to support private and anonymous dynamic participation in federated learning (FL). AnoFel leverages several cryptographic primitives, the concept of anonymity sets, differential privacy, and a public bulletin board to support anonymous user registration, as well as unlinkable and confidential model update submission. Our system allows dynamic participation, where users can join or leave at any time without needing any recovery protocol or interaction. To assess security, we formalize a notion for privacy and anonymity in FL, and formally prove that AnoFel satisfies this notion. To the best of our knowledge, our system is the first solution with provable anonymity guarantees. To assess efficiency, we provide a concrete implementation of AnoFel, and conduct experiments showing its ability to support learning applications scaling to a large number of clients. For a TinyImageNet classification task with 512 clients, the client setup to join is less than 3 sec, and the client runtime for each training iteration takes a total of 8 sec, where the added overhead of AnoFel is 46% of the total runtime. We also compare our system with prior work and demonstrate its practicality. AnoFel client runtime is up to 5x faster than Truex et al., despite the added anonymity guarantee and dynamic user joining in AnoFel. Compared to Bonawitz et al., AnoFel is only 2x slower for added support for privacy in output, dynamic user joining, and anonymity. 
    more » « less
  3. Federated learning (FL) is a distributed machine learning technique to address the data privacy issue. Participant selection is critical to determine the latency of the training process in a heterogeneous FL architecture, where users with different hardware setups and wireless channel conditions communicate with their base station to participate in the FL training process. Many solutions have been designed to consider computational and uploading latency of different users to select suitable participants such that the straggler problem can be avoided. However, none of these solutions consider the waiting time of a participant, which refers to the latency of a participant waiting for the wireless channel to be available, and the waiting time could significantly affect the latency of the training process, especially when a huge number of participants are involved in the training process and share the wireless channel in the time-division duplexing manner to upload their local FL models. In this paper, we consider not only the computational and uploading latency but also the waiting time (which is estimated based on an M/G/1 queueing model) of a participant to select suitable participants. We formulate an optimization problem to maximize the number of selected participants, who can upload their local models before the deadline in a global iteration. The Latency awarE pARticipant selectioN (LEARN) algorithm is proposed to solve the problem and the performance of LEARN is validated via simulations. 
    more » « less
  4. We consider the problem of predicting cellular network performance (signal maps) from measurements collected by several mobile devices. We formulate the problem within the online federated learning framework: (i) federated learning (FL) enables users to collaboratively train a model, while keeping their training data on their devices; (ii) measurements are collected as users move around over time and are used for local training in an online fashion. We consider an honest-but-curious server, who observes the updates from target users participating in FL and infers their location using a deep leakage from gradients (DLG) type of attack, originally developed to reconstruct training data of DNN image classifiers. We make the key observation that a DLG attack, applied to our setting, infers the average location of a batch of local data, and can thus be used to reconstruct the target users' trajectory at a coarse granularity. We build on this observation to protect location privacy, in our setting, by revisiting and designing mechanisms within the federated learning framework including: tuning the FL parameters for averaging, curating local batches so as to mislead the DLG attacker, and aggregating across multiple users with different trajectories. We evaluate the performance of our algorithms through both analysis and simulation based on real-world mobile datasets, and we show that they achieve a good privacy-utility tradeoff. 
    more » « less
  5. Federated learning (FL) has emerged as a new paradigm of machine learning (ML) with the goal of collaborative learning on the vast pool of private data available across distributed edge devices. The focus of most existing works in FL systems has been on addressing the challenges of computation and communication heterogeneity inherent in training with edge devices. However, the crucial impact of I/O and the role of limited on-device storage has not been explored fully in FL context. Without policies to exploit the on-device storage for placement of client data samples, and schedule clients based on I/O benefits, FL training can lead to inefficiencies, such as increased training time and impacted accuracy convergence. In this paper, we propose FedCaSe, a framework for efficiently caching client samples in-situ on limited on-device storage and scheduling client participation. FedCaSe boosts the I/O performance by exploiting a unique characteristic--- the experience, i.e., relative impact on overall performance, of data samples and clients. FedCaSe utilizes this information in adaptive caching policies for sample placement inside the limited memory of edge clients. The framework also exploits the experience information to orchestrate the future selection of clients. Our experiments with representative workloads and policies show that compared to the state of the art, FedCaSe improves the training time by 2.06× for accuracy convergence at the scale of thousands of clients. 
    more » « less