This content will become publicly available on April 11, 2026

Title: Decentralized Federated Learning with Model Caching on Mobile Agents
Federated Learning (FL) trains a shared model using data and computation power on distributed agents coordinated by a central server. Decentralized FL (DFL) utilizes local model exchange and aggregation between agents to reduce the communication and computation overheads on the central server. However, when agents are mobile, communication opportunities between agents can be sporadic, largely hindering the convergence and accuracy of DFL. In this paper, we propose Cached Decentralized Federated Learning (Cached-DFL) to investigate delay-tolerant model spreading and aggregation enabled by model caching on mobile agents. Each agent stores not only its own model, but also the models of agents encountered in the recent past. When two agents meet, they exchange their own models as well as the cached models. Local model aggregation utilizes all models stored in the cache. We theoretically analyze the convergence of Cached-DFL, explicitly taking into account the model staleness introduced by caching. We design and compare different model caching algorithms for different DFL and mobility scenarios. We conduct detailed case studies in a vehicular network to systematically investigate the interplay between agent mobility, cache staleness, and model convergence. In our experiments, Cached-DFL converges quickly and significantly outperforms DFL without caching.
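As a rough illustration of the mechanism the abstract describes, the sketch below implements model caching, encounter-based exchange, and cache-wide aggregation on mobile agents. All names (`Agent`, `exchange`, `aggregate`), the uniform averaging rule, and the staleness-based eviction policy are assumptions for illustration only; the paper's actual caching algorithms and aggregation weights may differ.

```python
import copy
import time


class Agent:
    """Mobile agent that caches models of recently encountered peers."""

    def __init__(self, agent_id, model, cache_size=5, max_staleness=60.0):
        self.id = agent_id
        self.model = model            # parameter name -> float value
        self.cache = {}               # peer id -> (model, timestamp)
        self.cache_size = cache_size
        self.max_staleness = max_staleness

    def exchange(self, peer):
        """On an encounter, swap own and cached models in both directions,
        keeping only the freshest copy of each agent's model."""
        now = time.time()
        mine = dict(self.cache)
        mine[self.id] = (copy.deepcopy(self.model), now)
        theirs = dict(peer.cache)
        theirs[peer.id] = (copy.deepcopy(peer.model), now)
        for receiver, incoming in ((self, theirs), (peer, mine)):
            for pid, (model, ts) in incoming.items():
                if pid == receiver.id:
                    continue  # never cache a stale copy of oneself
                if pid not in receiver.cache or ts > receiver.cache[pid][1]:
                    receiver.cache[pid] = (model, ts)
            receiver._evict(now)

    def _evict(self, now):
        """Drop entries older than max_staleness, then keep the freshest."""
        fresh = {p: (m, t) for p, (m, t) in self.cache.items()
                 if now - t <= self.max_staleness}
        ranked = sorted(fresh.items(), key=lambda kv: kv[1][1], reverse=True)
        self.cache = dict(ranked[:self.cache_size])

    def aggregate(self):
        """Average the agent's own model with every cached model."""
        models = [self.model] + [m for m, _ in self.cache.values()]
        self.model = {k: sum(m[k] for m in models) / len(models)
                      for k in self.model}
```

For example, after `a.exchange(b)` both agents hold each other's current model (plus anything cached from earlier encounters), and `a.aggregate()` folds the whole cache into `a`'s local model.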
Award ID(s): 2337914, 2315614
PAR ID: 10613601
Author(s) / Creator(s): ; ; ; ;
Publisher / Repository: AAAI
Date Published:
Journal Name: Proceedings of the AAAI Conference on Artificial Intelligence
Volume: 39
Issue: 20
ISSN: 2159-5399
Page Range / eLocation ID: 21296 to 21303
Format(s): Medium: X
Sponsoring Org: National Science Foundation
More Like this
  1. Federated Learning (FL) aims to train a shared model using data and computation power on distributed agents coordinated by a central server. Decentralized FL (DFL) utilizes local model exchange and aggregation between agents to reduce the communication and computation overheads on the central server. However, when agents are mobile, the communication opportunity between agents can be sporadic, largely hindering the convergence and accuracy of DFL. In this paper, we study delay-tolerant model spreading and aggregation enabled by model caching on mobile agents. Each agent stores not only its own model, but also models of agents encountered in the recent past. When two agents meet, they exchange their own models as well as the cached models. Local model aggregation works on all models in the cache. We theoretically analyze the convergence of DFL with cached models, explicitly taking into account the model staleness introduced by caching. We design and compare different model caching algorithms for different DFL and mobility scenarios. We conduct detailed case studies in a vehicular network to systematically investigate the interplay between agent mobility, cache staleness, and model convergence. In our experiments, cached DFL converges quickly, and significantly outperforms DFL without caching. 
  2. Standard ML relies on training with a centrally collected dataset, while collaborative learning techniques such as Federated Learning (FL) allow data to remain decentralized at client locations. In FL, a central server coordinates the training process, reducing computation and communication expenses for clients. However, this centralization can lead to server congestion and a heightened risk of malicious activity or data privacy breaches. In contrast, Peer-to-Peer Learning (P2PL) is a fully decentralized system in which nodes manage both local training and aggregation tasks. While P2PL promotes privacy by eliminating the need to trust a single node, it also incurs increased computation and communication costs, along with potential difficulties in achieving consensus among nodes. To address the limitations of both FL and P2PL, we propose a hybrid approach called Hubs-and-Spokes Learning (HSL). In HSL, hubs function similarly to FL servers, maintaining consensus while exerting less control over spokes. This paper argues that HSL's design allows for greater availability and privacy than FL, while reducing computation and communication costs compared to P2PL. Additionally, HSL maintains consensus and integrity in the learning process.
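To make the hub/spoke division of labor in item 2 concrete, here is a minimal single-round sketch. The function names and the use of plain parameter averaging at both levels are assumptions; HSL's actual consensus mechanism among hubs may be more elaborate.

```python
def average(models):
    """Element-wise average of a list of parameter dicts."""
    return {k: sum(m[k] for m in models) / len(models) for k in models[0]}


def hsl_round(hubs):
    """One Hubs-and-Spokes round: each hub aggregates its own spokes,
    hubs average among themselves to maintain consensus, and the
    result is pushed back to every spoke."""
    # Step 1: intra-hub aggregation over each hub's attached spokes.
    hub_models = [average([spoke['model'] for spoke in hub['spokes']])
                  for hub in hubs]
    # Step 2: inter-hub consensus (hubs exchange and average their models).
    consensus = average(hub_models)
    # Step 3: hubs broadcast the consensus model back to their spokes.
    for hub in hubs:
        for spoke in hub['spokes']:
            spoke['model'] = dict(consensus)
```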
  3. Communication is a key bottleneck in federated learning, where a large number of edge devices collaboratively learn a model under the orchestration of a central server without sharing their own training data. While local SGD has been proposed to reduce the number of FL rounds and has become the algorithm of choice for FL, its total communication cost is still prohibitive when each device must communicate with the remote server repeatedly over bandwidth-limited networks. In light of both device-to-device (D2D) and device-to-server (D2S) cooperation opportunities in modern communication networks, this paper proposes a new federated optimization algorithm dubbed hybrid local SGD (HL-SGD) for FL settings where devices are grouped into a set of disjoint clusters with high D2D communication bandwidth. HL-SGD subsumes previously proposed algorithms such as local SGD and gossip SGD, and enables us to strike the best balance between model accuracy and runtime. We analyze the convergence of HL-SGD in the presence of heterogeneous data for general nonconvex settings. We also perform extensive experiments and show that the use of hybrid model aggregation via D2D and D2S communications in HL-SGD can significantly speed up the training time of federated learning.
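The hybrid D2D/D2S schedule described in item 3 can be summarized with the following sketch: frequent, cheap intra-cluster averaging plus an infrequent global sync. The schedule parameter, the gradient placeholder, and all helper names are illustrative assumptions, not the HL-SGD authors' code.

```python
def average(models):
    """Element-wise average of a list of parameter dicts."""
    return {k: sum(m[k] for m in models) / len(models) for k in models[0]}


def local_sgd_step(model, grad, lr=0.01):
    """Placeholder SGD update; a real device would compute `grad`
    from its local data."""
    return {k: v - lr * grad[k] for k, v in model.items()}


def hl_sgd_round(clusters, t, d2s_every=5):
    """One round of a hybrid local-SGD schedule: every device takes a
    local step, each high-bandwidth cluster averages over D2D links
    every round, and the server performs a D2S average across clusters
    only every `d2s_every` rounds."""
    for cluster in clusters:
        for dev in cluster:
            dev['model'] = local_sgd_step(dev['model'], dev['grad'])
        cluster_avg = average([dev['model'] for dev in cluster])  # cheap D2D sync
        for dev in cluster:
            dev['model'] = dict(cluster_avg)
    if t % d2s_every == 0:  # infrequent, costly D2S sync with the server
        global_avg = average([cluster[0]['model'] for cluster in clusters])
        for cluster in clusters:
            for dev in cluster:
                dev['model'] = dict(global_avg)
```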
  4. As a promising approach to dealing with distributed data, Federated Learning (FL) has achieved major advances in recent years. FL enables collaborative model training by exploiting raw data dispersed across multiple edge devices. However, the data is generally non-independent and identically distributed (statistical heterogeneity), and the edge devices differ significantly in both computation and communication capacity (system heterogeneity). Statistical heterogeneity leads to severe accuracy degradation, while system heterogeneity significantly prolongs the training process. To address these heterogeneity issues, we propose FedASMU, an Asynchronous Staleness-aware Model Update FL framework with two novel methods. First, we propose an asynchronous FL system model with a dynamic model aggregation method between updated local models and the global model on the server, for superior accuracy and high efficiency. Second, we propose an adaptive local model adjustment method that aggregates the fresh global model with local models on devices to further improve accuracy. Extensive experimentation with 6 models and 5 public datasets demonstrates that FedASMU significantly outperforms baseline approaches in terms of accuracy (0.60% to 23.90% higher) and efficiency (3.54% to 97.98% faster).
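A staleness-aware server update of the kind item 4 describes can be sketched as below. The polynomial decay and the fixed mixing rule are stand-ins; FedASMU's actual aggregation weight is dynamic, so treat every name and constant here as an assumption.

```python
def staleness_weight(staleness, alpha=0.5):
    """Mixing weight that shrinks as an update grows more stale;
    polynomial decay is one common choice in asynchronous FL."""
    return (1.0 + staleness) ** (-alpha)


def async_server_update(global_model, global_version, local_model, local_version):
    """Blend an arriving local model into the global model, down-weighting
    it by how many global versions elapsed since the device pulled."""
    w = staleness_weight(global_version - local_version)
    return {k: (1.0 - w) * global_model[k] + w * local_model[k]
            for k in global_model}
```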
  5. Conventional machine learning (ML) and deep learning (DL) methods use large amounts of data to construct prediction models for recognizing human activities at a central fusion center. However, such model training incurs high communication costs and risks privacy infringement. To address the issues of high communication overhead and privacy leakage, we employ a widely popular distributed ML technique called Federated Learning (FL) that generates a global model for predicting human activities by combining participating agents' local knowledge. State-of-the-art FL models fail to maintain acceptable accuracy when there is a large number of unreliable agents, which can inject false models, or resource-constrained agents, which fail to perform an assigned computational task within a given time window. We developed an FL model for predicting human activities that monitors each agent's contribution toward model convergence and excludes unreliable and resource-constrained agents from training. We assign a score to each client when it joins the network, and the score is updated based on the agent's activities during training. We consider three mobile robots as FL clients that are heterogeneous in terms of resources such as processing capability, memory, bandwidth, battery life, and data volume. We consider heterogeneous mobile robots in order to understand the effects of a real-world FL setting in the presence of resource-constrained agents. We consider an agent unreliable if it repeatedly gives slow responses or injects incorrect models during training. By disregarding the unreliable and weak agents, we carry out local training of the FL process on the selected agents. If a weak agent is nevertheless selected and begins to exhibit straggler issues, we leverage an asynchronous FL mechanism that aggregates local models whenever the server receives a model update from an agent. Asynchronous FL eliminates the issue of waiting a long time to receive model updates from weak agents. Finally, we simulate how the behavior of agents can be tracked through a reward-punishment scheme and present the influence of unreliable and resource-constrained agents on the FL process. We found that FL performs slightly worse than centralized models if there are no unreliable or resource-constrained agents. However, as the number of malicious and straggler clients increases, our proposed model performs more effectively by identifying and avoiding those agents while recognizing human activities, as compared to state-of-the-art FL and ML approaches.
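The reward-punishment bookkeeping sketched in item 5 might look like the following; the class name, initial score, increments, and selection threshold are all illustrative assumptions rather than the authors' published values.

```python
class AgentScores:
    """Track per-agent trust scores for FL client selection."""

    def __init__(self, initial=1.0):
        self.initial = initial
        self.scores = {}

    def register(self, agent_id):
        """Assign the starting score when an agent joins the network."""
        self.scores[agent_id] = self.initial

    def update(self, agent_id, on_time, model_ok, reward=0.1, penalty=0.3):
        """Reward timely, valid model updates; punish stragglers and
        agents that submit incorrect models."""
        if on_time and model_ok:
            self.scores[agent_id] += reward
        else:
            self.scores[agent_id] -= penalty

    def select(self, threshold=0.5):
        """Only agents at or above the trust threshold train next round."""
        return [a for a, s in self.scores.items() if s >= threshold]
```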