Title: Adaptive Participant Selection in Heterogeneous Federated Learning
Federated learning (FL) is a distributed machine learning technique that addresses data privacy by keeping training data on users' devices. Participant selection is critical in determining the latency of the training process in a heterogeneous FL architecture, where users with different hardware setups and wireless channel conditions communicate with their base station to participate in FL training. Many solutions select suitable participants based on the computational and uploading latency of different users so that the straggler problem can be avoided. However, none of these solutions consider a participant's waiting time, i.e., the latency the participant incurs while waiting for the wireless channel to become available. The waiting time can significantly affect the latency of the training process, especially when a large number of participants share the wireless channel in a time-division duplexing manner to upload their local FL models. In this paper, we consider not only the computational and uploading latency but also the waiting time (estimated based on an M/G/1 queueing model) of a participant when selecting participants. We formulate an optimization problem to maximize the number of selected participants that can upload their local models before the deadline of a global iteration. The Latency awarE pARticipant selectioN (LEARN) algorithm is proposed to solve the problem, and its performance is validated via simulations.
Award ID(s):
1757207
PAR ID:
10315781
Author(s) / Creator(s):
Date Published:
Journal Name:
2021 IEEE Global Communications Conference (GLOBECOM)
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
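The latency model in the abstract above is straightforward to illustrate. The sketch below is a minimal stand-in, not the authors' LEARN algorithm: it estimates the mean waiting time with the Pollaczek-Khinchine formula for an M/G/1 queue and greedily admits users whose combined compute, waiting, and upload latency fits the per-iteration deadline. All field names and parameter values are hypothetical.

```python
import random

def mg1_mean_wait(arrival_rate, service_mean, service_second_moment):
    """Pollaczek-Khinchine mean waiting time for an M/G/1 queue."""
    rho = arrival_rate * service_mean              # channel utilization
    if rho >= 1.0:
        return float("inf")                        # unstable queue: unbounded wait
    return arrival_rate * service_second_moment / (2.0 * (1.0 - rho))

def select_participants(candidates, deadline, arrival_rate):
    """Greedy illustration (not the paper's LEARN algorithm): admit users,
    fastest first, as long as every admitted user's compute + wait + upload
    latency still meets the per-iteration deadline."""
    selected = []
    for user in sorted(candidates, key=lambda u: u["compute"] + u["upload"]):
        trial = selected + [user]
        uploads = [u["upload"] for u in trial]
        s1 = sum(uploads) / len(uploads)                   # E[S]
        s2 = sum(t * t for t in uploads) / len(uploads)    # E[S^2]
        wait = mg1_mean_wait(arrival_rate, s1, s2)
        if all(u["compute"] + wait + u["upload"] <= deadline for u in trial):
            selected = trial
    return selected

random.seed(0)
users = [{"id": i, "compute": random.uniform(1, 5), "upload": random.uniform(0.5, 2)}
         for i in range(20)]
print([u["id"] for u in select_participants(users, deadline=8.0, arrival_rate=0.3)])
```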
More Like this
  1. Federated learning (FL) is a collaborative machine-learning (ML) framework particularly suited for ML models requiring numerous training samples, such as Convolutional Neural Networks (CNNs), Recurrent Neural Networks (RNNs), and Random Forests, in applications such as next-word prediction and eHealth. FL involves various clients participating in the training process by uploading their local models to an FL server in each global iteration; the server aggregates these models to update a global model. The traditional FL process may encounter a bottleneck known as the straggler problem, where slower clients delay the overall training time. This paper introduces the Latency-awarE Semi-synchronous client Selection and mOdel aggregation for federated learNing (LESSON) method. LESSON allows clients to participate at different frequencies: faster clients contribute more often, thereby mitigating the straggler problem and expediting convergence. Moreover, LESSON provides a tunable trade-off between model accuracy and convergence rate by setting varying deadlines. Simulation results show that LESSON outperforms two baseline methods, FedAvg and FedCS, in convergence speed while maintaining higher model accuracy than FedCS.
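The frequency-based participation idea is easy to sketch. Below is a minimal illustration, not LESSON's actual selection and aggregation rule: a client whose round latency spans k deadlines uploads every k rounds, and received updates are averaged by sample count. The client fields and the latency-to-period mapping are assumptions.

```python
import numpy as np

def semi_sync_round(clients, global_model, deadline, round_idx):
    """Sketch of semi-synchronous aggregation: each client participates on
    a period derived from its latency, so fast clients contribute in every
    round while slow ones skip rounds instead of stalling them."""
    updates, weights = [], []
    for c in clients:
        period = max(1, int(np.ceil(c["latency"] / deadline)))
        if round_idx % period == 0:
            updates.append(c["local_model"])
            weights.append(c["num_samples"])
    if not updates:
        return global_model                       # nobody due this round
    w = np.asarray(weights, dtype=float)
    w /= w.sum()                                  # sample-count weighted average
    return sum(wi * u for wi, u in zip(w, updates))

clients = [{"latency": 1.8, "num_samples": 100, "local_model": np.ones(3)},
           {"latency": 5.1, "num_samples": 300, "local_model": 2 * np.ones(3)}]
for r in range(1, 4):
    print(r, semi_sync_round(clients, np.zeros(3), deadline=2.0, round_idx=r))
```

The slow client (period 3) only folds in on round 3; in between, the round closes on time with the fast client's update alone.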
  2. Federated learning (FL) is a highly pursued machine learning technique that can train a model centrally while keeping data distributed. Distributed computation makes FL attractive for bandwidth-limited applications, especially in wireless communications. A large number of distributed edge devices may connect to a central parameter server (PS) and iteratively download/upload data from/to the PS. Due to limited bandwidth, only a subset of connected devices can be scheduled in each round. State-of-the-art machine learning models such as deep neural networks usually have millions of parameters, resulting in high computation complexity as well as a high communication burden for collecting/distributing data during training. To improve communication efficiency and make the training model converge faster, we propose a new scheduling policy and power allocation scheme in a non-orthogonal multiple access (NOMA) setting, where multiple users can transmit on the same channel simultaneously, to maximize the weighted sum data rate under practical constraints during the entire learning process. The user scheduling problem is transformed into a maximum-weight independent set problem that can be solved using graph theory. Simulation results show that the proposed scheduling and power allocation scheme achieves higher FL testing accuracy in NOMA-based wireless networks than existing schemes within the same learning time.
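The maximum-weight independent set (MWIS) reduction can be illustrated concretely. The sketch below is not the paper's solution method, just a simple greedy pass over a conflict graph under assumed rate weights and conflict edges.

```python
def greedy_mwis(weights, conflicts):
    """Greedy heuristic for maximum-weight independent set: scan users by
    descending weight and schedule each one that conflicts with nothing
    already scheduled."""
    scheduled = set()
    for u in sorted(weights, key=weights.get, reverse=True):
        if all((u, v) not in conflicts and (v, u) not in conflicts
               for v in scheduled):
            scheduled.add(u)
    return scheduled

# nodes weighted by achievable data rate; edges mark NOMA decoding conflicts
rates = {"u1": 3.2, "u2": 2.7, "u3": 4.1, "u4": 1.9}
conflicts = {("u1", "u3"), ("u2", "u4")}
print(greedy_mwis(rates, conflicts))   # schedules u3 and u2 together
```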
  3. Federated learning (FL) has recently emerged as a new distributed machine learning paradigm. Although FL can protect the data privacy of participants by keeping their training data on local devices, recent works have raised new privacy concerns, especially when the workers or the parameter server of FL are untrustworthy or malicious. One effective way to address this problem is hierarchical federated learning (HFL), in which a few middle-layer aggregators (called group leaders) aggregate local model updates from workers and send group model updates to the parameter server. In this paper, we consider the participant selection problem of HFL in an edge cloud with multiple FL models, where each model needs to select one parameter server, a few group leaders, and a certain number of workers from edge servers to jointly perform HFL. We first formulate this problem as a non-linear integer program that minimizes the total learning cost of all models subject to the edge resource constraints. We then design a three-stage algorithm that decouples the original problem into three sub-problems and solves them iteratively. Simulations with real-world datasets and FL models confirm that our proposed algorithm efficiently reduces the average total learning cost in the edge cloud compared with existing methods.
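The shape of such a decoupled heuristic can be sketched, though the paper's exact three stages and cost model are not given here. Below is a toy skeleton under assumed server attributes: for each model, it re-optimizes the parameter server, then the group leader, then the workers, and repeats.

```python
import random

def three_stage_select(models, servers, workers_per_model=3, iters=5):
    """Skeleton of an iterative decoupled heuristic (not the paper's exact
    algorithm): fix two roles, re-optimize the third, and repeat until the
    assignment settles. All cost terms are toy placeholders."""
    assign = {m: {"ps": servers[0], "leader": servers[0], "workers": []}
              for m in models}

    def link_cost(m, s):
        # toy cost: server load plus "distance" to the model's current PS
        return s["load"] + abs(s["pos"] - assign[m]["ps"]["pos"])

    for _ in range(iters):
        for m in models:
            assign[m]["ps"] = min(servers, key=lambda s: s["load"])            # stage 1
            assign[m]["leader"] = min(servers, key=lambda s: link_cost(m, s))  # stage 2
            assign[m]["workers"] = sorted(
                servers, key=lambda s: link_cost(m, s))[:workers_per_model]    # stage 3
    return assign

random.seed(1)
servers = [{"id": i, "load": random.uniform(0, 1), "pos": i} for i in range(6)]
print(three_stage_select(["cnn", "rnn"], servers)["cnn"]["ps"]["id"])
```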
  4. As a popular distributed learning paradigm, federated learning (FL) over mobile devices fosters numerous applications, but its practical deployment is hindered by participating devices' computing and communication heterogeneity. Pioneering research efforts proposed to extract subnetworks from the global model and assign each device as large a subnetwork as possible for local training based on its full computing capacity. Although such fixed-size subnetwork assignment enables FL training over heterogeneous mobile devices, it is unaware of (i) the dynamic changes of devices' communication and computing conditions and (ii) FL training progress and its dynamic requirements on local training contributions, both of which may cause very long FL training delays. Motivated by these dynamics, in this paper we develop a wireless and heterogeneity aware latency efficient FL (WHALE-FL) approach to accelerate FL training through adaptive subnetwork scheduling. Instead of sticking to a fixed-size subnetwork, WHALE-FL introduces a novel subnetwork selection utility function to capture device and FL training dynamics, and guides each mobile device to adaptively select its subnetwork size for local training based on (a) its computing and communication capacity, (b) its dynamic computing and/or communication conditions, and (c) the FL training status and its corresponding requirements for local training contributions. Our evaluation shows that, compared with peer designs, WHALE-FL effectively accelerates FL training without sacrificing learning accuracy.
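The utility-driven selection can be sketched with a toy utility, not WHALE-FL's actual function: the benefit of training a larger subnetwork fraction is traded against its compute and upload cost, and the device re-picks its fraction each round. The functional form and constants below are assumptions.

```python
def subnetwork_utility(frac, compute_cap, bandwidth, progress):
    """Toy utility (not the paper's function): a bigger subnetwork fraction
    contributes more, and matters more late in training (high `progress`),
    but costs more local compute and upload time."""
    gain = (0.5 + progress) * frac
    local_time = frac / compute_cap + frac / bandwidth
    return gain - local_time

def pick_subnetwork(compute_cap, bandwidth, progress,
                    sizes=(0.25, 0.5, 0.75, 1.0)):
    """Each round the device re-picks the fraction with the best current
    utility, so the choice tracks its conditions and the training stage."""
    return max(sizes,
               key=lambda f: subnetwork_utility(f, compute_cap, bandwidth, progress))

print(pick_subnetwork(compute_cap=0.4, bandwidth=0.5, progress=0.1))  # slow device: 0.25
print(pick_subnetwork(compute_cap=2.0, bandwidth=2.0, progress=0.9))  # fast device: 1.0
```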
  5. Federated learning (FL) enables multiple parties to collaboratively train machine learning models while preserving data privacy. However, securing communication within FL frameworks remains a significant challenge due to potential vulnerabilities to data breaches and integrity attacks. This paper proposes a novel approach that uses Dilithium, a robust post-quantum digital signature scheme, to enhance data security in FL. By integrating Dilithium into FL protocols, this study demonstrates robust communication security, preventing data tampering and unauthorized access and thereby promoting safer and more efficient collaborative model training across distributed networks. Furthermore, our approach incorporates an optimized client selection algorithm and a parallelized GPU-based training process that reduces latency and ensures seamless synchronization among participants. Experimental results show that our system achieves a total processing time of 6.891 seconds, significantly outperforming the 10.24 seconds of normal FL and the 12.32 seconds of FL-Dilithium systems on the same computing platforms. Additionally, the proposed model achieves an accuracy of 94%, surpassing the 93% of normal FL.
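The signing step such a protocol relies on can be sketched with the open-source liboqs bindings. This is a minimal sketch, assuming liboqs-python is installed and that the mechanism string "Dilithium3" is available in your liboqs build (newer releases use names like "ML-DSA-65"); the FL-specific details, such as what bytes get signed and how public keys are distributed, are placeholders.

```python
# Assumes the liboqs-python bindings (pip install liboqs-python); the exact
# mechanism name varies by liboqs version ("Dilithium3" vs. "ML-DSA-65").
import oqs

ALG = "Dilithium3"

client = oqs.Signature(ALG)
public_key = client.generate_keypair()            # client registers pk with the server

update = b"serialized local model weights"        # placeholder, e.g. a pickled state_dict
signature = client.sign(update)                   # sign the update before uploading

with oqs.Signature(ALG) as verifier:              # server side
    assert verifier.verify(update, signature, public_key)
    assert not verifier.verify(update + b"x", signature, public_key)  # tampering fails
```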