Hierarchical Federated Learning (HFL) has shown great promise over the past few years, with significant improvements in communication efficiency and overall performance. However, current research for HFL predominantly centers on supervised learning. This focus becomes problematic when dealing with semi-supervised learning, particularly under non-IID scenarios. In order to address this gap, our paper critically assesses the performance of straightforward adaptations of current state-of-the-art semi-supervised FL (SSFL) techniques within the HFL framework. We also introduce a novel clustering mechanism for hierarchical embeddings to alleviate the challenges introduced by semi-supervised paradigms in a hierarchical setting. Our approach not only provides superior accuracy, but also converges up to 5.11× faster, while being robust to non-IID data distributions for multiple datasets with negligible communication overhead
more »
« less
Communication-efficient design of machine learning system for energy demand forecasting of electrical vehicles
Machine learning has shown incredibly potential in many fields of application ranging from ChatGPT and Bard to Tesla’s autonomous vehicles. These ML models require vast amounts of data and communications overhead in order to be effective. In this paper we propose a communication-efficient time series forecasting model combining the most recent advancements in MetaFormer architecture implemented across a federated series of learning nodes. The time series prediction performance and communication overhead cost of the distributed model is compared against a similar centralized model and shown to have parity in performance while consuming much lower data rates during training.
more »
« less
- Award ID(s):
- 1923803
- PAR ID:
- 10450270
- Date Published:
- Journal Name:
- 2023 Asia Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA ASC)
- Format(s):
- Medium: X
- Sponsoring Org:
- National Science Foundation
More Like this
-
-
Multivariate time-series data are frequently observed in critical care settings and are typically characterized by sparsity (missing information) and irregular time intervals. Existing approaches for learning representations in this domain handle these challenges by either aggregation or imputation of values, which in-turn suppresses the fine-grained information and adds undesirable noise/overhead into the machine learning model. To tackle this problem, we propose aSelf-supervisedTransformer forTime-Series (STraTS) model, which overcomes these pitfalls by treating time-series as a set of observation triplets instead of using the standard dense matrix representation. It employs a novel Continuous Value Embedding technique to encode continuous time and variable values without the need for discretization. It is composed of a Transformer component with multi-head attention layers, which enable it to learn contextual triplet embeddings while avoiding the problems of recurrence and vanishing gradients that occur in recurrent architectures. In addition, to tackle the problem of limited availability of labeled data (which is typically observed in many healthcare applications), STraTS utilizes self-supervision by leveraging unlabeled data to learn better representations by using time-series forecasting as an auxiliary proxy task. Experiments on real-world multivariate clinical time-series benchmark datasets demonstrate that STraTS has better prediction performance than state-of-the-art methods for mortality prediction, especially when labeled data is limited. Finally, we also present an interpretable version of STraTS, which can identify important measurements in the time-series data. Our data preprocessing and model implementation codes are available athttps://github.com/sindhura97/STraTS.more » « less
-
null (Ed.)Communication overhead has been identified as the primary factor in overall performance degradation for sparse and irregular problems such as SpMV. Many works have shown significant communication reductions, but only for matrices with specific characteristics and by dramatically reworking the computations. This study develops and evaluates a communication avoiding distributed heterogeneous implementation for strong scaling of SpMV on the Sierra supercomputer architecture. To address the far bigger matrices characteristic of real problems, we utilize a hypergraph partitioning package HYPE to determine workload distribution and reduce inter-node communication. Additionally we investigated the performance impact of performing hypergraph partitioning on scale free graphs which had undergone a vertex delegation pre-processing step. We achieved up to 97% reduction in average message size per process at scale when using the HYPE partitioner. Despite this we show how optimizing SpMV on existing GPU architectures does provide increased computational performance, yet does not address the dominant communication overhead factor at scale despite attempts to avoid communication where possible.more » « less
-
Data shuffling is one of the fundamental building blocks for distributed learning algorithms, that increases the statistical gain for each step of the learning process. In each iteration, different shuffled data points are assigned by a central node to a distributed set of workers to perform local computations, which leads to communication bottlenecks. The focus of this paper is on formalizing and understanding the fundamental information-theoretic tradeoff between storage (per worker) and the worst-case communication overhead for the data shuffling problem. We completely characterize the information theoretic tradeoff for K = 2, and K = 3 workers, for any value of storage capacity, and show that increasing the storage across workers can reduce the communication overhead by leveraging coding. We propose a novel and systematic data delivery and storage update strategy for each data shuffle iteration, which preserves the structural properties of the storage across the workers, and aids in minimizing the communication overhead in subsequent data shuffling iterations.more » « less
-
Li, R; Chowdhury, K (Ed.)Federated Learning (FL) enables model training across decentralized clients while preserving data privacy. However, bandwidth constraints limit the volume of information exchanged, making communication efficiency a critical challenge. In addition, non- IID data distributions require fairness-aware mechanisms to prevent performance degradation for certain clients. Existing sparsification techniques often apply fixed compression ratios uniformly, ignoring variations in client importance and bandwidth. We propose FedBand, a dynamic bandwidth allocation framework that prioritizes clients based on their contribution to the global model. Unlike conventional approaches, FedBand does not enforce uniform client participation in every communication round. Instead, it allocates more bandwidth to clients whose local updates deviate significantly from the global model, enabling them to transmit a greater number of parameters. Clients with less impactful updates contribute proportionally less or may defer transmission, reducing unnecessary overhead while maintaining generalizability. By optimizing the trade-off between communication efficiency and learning performance, FedBand substantially reduces transmission costs while preserving model accuracy. Experiments on non-IID CIFAR-10 and UTMobileNet2021 datasets, demonstrate that FedBand achieves up to 99.81% bandwidth savings per round while maintaining accuracies close to that of an unsparsified model (80% on CIFAR- 10, 95% on UTMobileNet), despite transmitting less than 1% of the model parameters in each round. Moreover, FedBand accelerates convergence by 37.4%, further improving learning efficiency under bandwidth constraints. Mininet emulations further show a 42.6% reduction in communication costs and a 65.57% acceleration in convergence compared to baseline methods, validating its real-world efficiency. These results demonstrate that adaptive bandwidth allocation can significantly enhance the scalability and communication efficiency of federated learning, making it more viable for real- world, bandwidth-constrained networking environments.more » « less
An official website of the United States government

