
Title: FedAT: A High-Performance and Communication-Efficient Federated Learning System with Asynchronous Tiers
Federated learning (FL) involves training a model over massive distributed devices, while keeping the training data localized and private. This form of collaborative learning exposes new tradeoffs among model convergence speed, model accuracy, balance across clients, and communication cost, with new challenges including: (1) the straggler problem, where clients lag due to data or (computing and network) resource heterogeneity, and (2) the communication bottleneck, where a large number of clients communicate their local updates to a central server and bottleneck the server. Many existing FL methods focus on optimizing along only a single dimension of the tradeoff space. Existing solutions use asynchronous model updating or tiering-based, synchronous mechanisms to tackle the straggler problem. However, asynchronous methods can easily create a communication bottleneck, while tiering may introduce biases that favor faster tiers with shorter response latencies. To address these issues, we present FedAT, a novel Federated learning system with Asynchronous Tiers under Non-i.i.d. training data. FedAT synergistically combines synchronous, intra-tier training and asynchronous, cross-tier training. By bridging the synchronous and asynchronous training through tiering, FedAT minimizes the straggler effect with improved convergence speed and test accuracy. FedAT uses a straggler-aware, weighted aggregation heuristic to steer and balance the training across clients for further accuracy improvement. FedAT compresses uplink and downlink communications using an efficient, polyline-encoding-based compression algorithm, which minimizes the communication cost. Results show that FedAT improves the prediction performance by up to 21.09% and reduces the communication cost by up to 8.5×, compared to state-of-the-art FL methods.
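To make the compression step more concrete, the following is a minimal sketch of polyline-style encoding applied to a flat list of floating-point values, such as a flattened weight vector. It follows the general scheme of the Encoded Polyline Algorithm Format (quantize, delta-encode, zigzag the sign, emit 5-bit chunks as printable ASCII). The abstract does not say how FedAT applies the encoding to model updates, so the function names `encode_values` and `decode_values`, the one-dimensional delta scheme, and the default `precision` are assumptions made for illustration only.

```python
from typing import List


def _encode_int(delta: int) -> str:
    """Encode one signed integer as 5-bit chunks mapped to printable ASCII."""
    # Zigzag step: shift left by one bit and invert if negative, so the
    # sign ends up in the lowest bit and the value is non-negative.
    value = ~(delta << 1) if delta < 0 else (delta << 1)
    chars = []
    while value >= 0x20:
        # Set the continuation bit (0x20) on every chunk except the last.
        chars.append(chr((0x20 | (value & 0x1F)) + 63))
        value >>= 5
    chars.append(chr(value + 63))
    return "".join(chars)


def encode_values(values: List[float], precision: int = 4) -> str:
    """Quantize to `precision` decimals, delta-encode, and pack as text."""
    factor = 10 ** precision
    previous = 0
    pieces = []
    for v in values:
        quantized = int(round(v * factor))
        pieces.append(_encode_int(quantized - previous))
        previous = quantized
    return "".join(pieces)


def decode_values(text: str, precision: int = 4) -> List[float]:
    """Invert encode_values, up to the quantization error."""
    factor = 10 ** precision
    values = []
    previous = 0
    index = 0
    while index < len(text):
        shift = 0
        result = 0
        while True:
            chunk = ord(text[index]) - 63
            index += 1
            result |= (chunk & 0x1F) << shift
            shift += 5
            if chunk < 0x20:  # no continuation bit: last chunk of this value
                break
        # Undo the zigzag step to recover the signed delta.
        delta = ~(result >> 1) if result & 1 else (result >> 1)
        previous += delta
        values.append(previous / factor)
    return values


if __name__ == "__main__":
    weights = [0.1234, -0.5, 0.0, 0.5001]
    packed = encode_values(weights, precision=4)
    print(packed, decode_values(packed, precision=4))
```

The encoding is lossy: values are rounded to `precision` decimal places before packing, so a smaller `precision` yields shorter strings at the cost of a larger quantization error. Small deltas between consecutive values take fewer characters, which is where the size reduction comes from in this sketch.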
Award ID(s):
1919075 1919113 2134689
Journal Name:
The International Conference for High Performance Computing, Networking, Storage, and Analysis
Sponsoring Org:
National Science Foundation
More Like this
  1. Conventional machine learning (ML) and deep learning (DL) methods use large amounts of data to construct prediction models in a central fusion center for recognizing human activities. However, such model training incurs high communication costs and leads to privacy infringement. To address the issues of high communication overhead and privacy leakage, we employed a widely used distributed ML technique called Federated Learning (FL), which generates a global model for predicting human activities by combining participating agents’ local knowledge. The state-of-the-art FL model fails to maintain acceptable accuracy when there is a large number of unreliable agents that can inject false models, or resource-constrained agents that fail to perform an assigned computational task within a given time window. We developed an FL model for predicting human activities that monitors each agent’s contribution towards model convergence and excludes unreliable and resource-constrained agents from training. We assign a score to each client when it joins the network, and the score is updated based on the agent’s activities during training. We consider three mobile robots as FL clients that are heterogeneous in terms of their resources, such as processing capability, memory, bandwidth, battery life, and data volume. We use heterogeneous mobile robots to understand the effects of a real-world FL setting in the presence of resource-constrained agents. We consider an agent unreliable if it repeatedly responds slowly or injects incorrect models during training. By disregarding the unreliable and weak agents, we carry out the local training of the FL process on selected agents. If a weak agent is nevertheless selected and starts showing straggler issues, we leverage an asynchronous FL mechanism that aggregates the local models whenever it receives a model update from the agents. Asynchronous FL eliminates the long wait for model updates from the weak agents. To this end, we simulate how the behavior of the agents can be tracked through a reward-punishment scheme and present the influence of unreliable and resource-constrained agents on the FL process. We found that FL performs slightly worse than centralized models if there are no unreliable or resource-constrained agents. However, as the number of malicious and straggler clients increases, our proposed model performs more effectively than state-of-the-art FL and ML approaches by identifying and avoiding those agents while recognizing human activities.
  2. Communication is a key bottleneck in federated learning, where a large number of edge devices collaboratively learn a model under the orchestration of a central server without sharing their own training data. While local SGD has been proposed to reduce the number of FL rounds and has become the algorithm of choice for FL, its total communication cost is still prohibitive when each device needs to communicate with the remote server many times over bandwidth-limited networks. In light of both device-to-device (D2D) and device-to-server (D2S) cooperation opportunities in modern communication networks, this paper proposes a new federated optimization algorithm dubbed hybrid local SGD (HL-SGD) in FL settings where devices are grouped into a set of disjoint clusters with high D2D communication bandwidth. HL-SGD subsumes previously proposed algorithms such as local SGD and gossip SGD and enables us to strike the best balance between model accuracy and runtime. We analyze the convergence of HL-SGD in the presence of heterogeneous data for general nonconvex settings. We also perform extensive experiments and show that the use of hybrid model aggregation via D2D and D2S communications in HL-SGD can substantially reduce the training time of federated learning.
  3. Federated learning (FL) has attracted increasing attention as a promising technique to drive a vast number of edge devices with artificial intelligence. However, it is very challenging to guarantee the efficiency of an FL system in practice due to the heterogeneous computation resources on different devices. To improve the efficiency of FL systems in the real world, asynchronous FL (AFL) and semi-asynchronous FL (SAFL) methods have been proposed so that the server does not need to wait for stragglers. However, existing AFL and SAFL systems suffer from poor accuracy and low efficiency in realistic settings where the data is non-IID across devices and the on-device resources are extremely heterogeneous. In this work, we propose FedSEA, a semi-asynchronous FL framework for extremely heterogeneous devices. We show theoretically that the unbalanced aggregation frequency is a root cause of the accuracy drop in SAFL. Based on this analysis, we design a training configuration scheduler to balance the aggregation frequency of devices so that the accuracy can be improved. To improve the efficiency of the system in realistic settings where the devices have dynamic on-device resource availability, we design a scheduler that can efficiently predict the arrival time of local updates from devices and adjust the synchronization time point according to the devices' predicted arrival time. We also consider extremely heterogeneous settings in which there are extremely lagging devices that take hundreds of times as long to train as the other devices. In the real world, there might even be some extreme stragglers that are not capable of training the global model. To enable these devices to join in training without impairing the efficiency of the system, FedSEA enables these extreme stragglers to conduct local training on much smaller models. Our experiments show that, compared with status quo approaches, FedSEA improves the inference accuracy by 44.34% and reduces the systematic time cost and local training time cost by 87.02× and 792.9×. FedSEA also reduces the energy consumption of the devices with extremely limited resources by 752.9×.
  4. Artificial Intelligence (AI) is moving towards the edge. Training an AI model for edge computing on a centralized server increases latency, and the privacy of edge users is jeopardized because private data is transferred through less secure communication channels. Additionally, existing high-power computing systems are struggling with memory and data transfer bottlenecks between the processor and memory. Federated Learning (FL) is a collaborative AI learning paradigm for distributed local devices that operates without transferring local data. Local participant devices share the updated network parameters with the central server instead of sending the original data. The central server updates the global AI model and deploys the model to the local clients. As the local data resides only on the edge, these devices need to be protected from cyberattacks. The Federated Intrusion Detection System (FIDS) could be a viable system to protect edge devices as opposed to a centralized protection system. However, on-device training of the model on resource-constrained devices may suffer from excessive power drain, in addition to memory and area overhead. In this work we present a memristor-based system for AI training on edge devices. Memristor devices are ideal candidates for processing in memory (PIM), as their dynamic resistance properties allow them to perform multiply-add operations in parallel in the analog domain with extreme efficiency. In contrast, existing CMOS-based PIM systems are typically developed for edge inference based on pretrained weights and are not equipped for on-chip training. We show the effectiveness of the system, where successful learning and recognition are achieved completely within edge devices. The classification accuracy of the memristor system shows negligible loss compared to a software implementation. To the best of our knowledge, this is the first demonstration of a memristor-based federated learning system. We demonstrate the effectiveness of this system as an intrusion detection platform for edge devices, although given the flexibility of the learning algorithm, it could be used to enhance many types of on-board learning and classification applications.
  5. Due to the often limited communication bandwidth of edge devices, most existing federated learning (FL) methods randomly select only a subset of devices to participate in training at each communication round. Compared with engaging all the available clients, such a random-selection mechanism could lead to significant performance degradation on non-IID (not independent and identically distributed) data. In this paper, we present our key observation that the essential cause of such performance degradation is the class imbalance of the grouped data from randomly selected clients. Based on this observation, we design an efficient heterogeneity-aware client sampling mechanism, namely Federated Class-balanced Sampling (Fed-CBS), which can effectively reduce the class imbalance of the grouped dataset from the intentionally selected clients. We first propose a measure of class imbalance that can be derived in a privacy-preserving way. Based on this measure, we design a computation-efficient client sampling strategy such that the actively selected clients will generate a more class-balanced grouped dataset with theoretical guarantees. Experimental results show that Fed-CBS outperforms the status quo approaches in terms of test accuracy and the rate of convergence while achieving comparable or even better performance than the ideal setting where all the available clients participate in the FL training.