Title: Benchmarking Clustered Federated Learning Algorithms for Next-Point Prediction
Abstract: The collection of spatio-temporal mobility data, especially individual trajectory data from location-based services and smart devices, raises significant privacy concerns. At the same time, such data is extremely valuable to policy makers for tasks such as next-point prediction. Federated Learning (FL) aims to address these concerns by training models locally on edge devices, thus preserving data privacy. However, the heterogeneous nature of individual trajectory data can make it challenging for a single global model to converge effectively in FL. One approach to overcoming this challenge is Clustered Federated Learning (CFL). In this paper, we investigate to what extent CFL algorithms can improve the accuracy of next-point prediction models. We study four state-of-the-art CFL algorithms on two benchmark datasets, GeoLife and MDC, and compare their performance in terms of accuracy and APR against state-of-the-art personalized FL models. We show that CFL is a viable option for the next-point prediction task and that it can particularly improve model performance for user groups with high and low entropy. We open-source a framework that helps the research community benchmark future personalized FL models against CFL algorithms.
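The core CFL idea the abstract refers to is to group clients whose local data distributions are similar and train one model per group instead of a single global model. As a minimal illustrative sketch (not any of the four specific algorithms benchmarked in the paper), a server can cluster the clients' flattened update vectors with plain k-means and then run FedAvg-style averaging within each cluster:

```python
import numpy as np

def cluster_and_aggregate(client_updates, n_clusters=2, n_iters=10):
    """Group client update vectors with k-means, then average each
    cluster's updates into its own per-cluster model delta."""
    X = np.stack(client_updates)
    # farthest-point initialization: start from client 0, then repeatedly
    # add the update farthest from the already-chosen centers
    centers = [X[0]]
    while len(centers) < n_clusters:
        d = np.min([np.linalg.norm(X - c, axis=1) for c in centers], axis=0)
        centers.append(X[d.argmax()])
    centers = np.stack(centers)
    for _ in range(n_iters):
        # assign each client to its nearest center, then recompute centers
        dists = np.linalg.norm(X[:, None] - centers[None], axis=2)
        labels = dists.argmin(axis=1)
        for k in range(n_clusters):
            if (labels == k).any():
                centers[k] = X[labels == k].mean(axis=0)
    # FedAvg within each cluster: one aggregated update per cluster
    cluster_models = {k: X[labels == k].mean(axis=0)
                      for k in range(n_clusters) if (labels == k).any()}
    return labels, cluster_models
```

Clients with heterogeneous trajectory patterns then receive the model of their own cluster rather than a compromise global model.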
Award ID(s):
1853953
PAR ID:
10663542
Author(s) / Creator(s):
 ;  ;  ;  
Publisher / Repository:
IEEE
Date Published:
Page Range / eLocation ID:
18 to 26
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. Over the years, Internet of Things (IoT) devices have become more powerful. This sets forth a unique opportunity to exploit local computing resources to distribute model learning and circumvent the need to share raw data. The underlying distributed and privacy-preserving data analytics approach is often termed federated learning (FL). A key challenge in FL is the heterogeneity across local datasets. In this article, we propose a new personalized FL model, PFL-DA, by adopting the philosophy of domain adaptation. PFL-DA tackles two sources of data heterogeneity at the same time: a covariate and concept shift across local devices. We show, both theoretically and empirically, that PFL-DA overcomes intrinsic shortcomings in state-of-the-art FL approaches and is able to borrow strength across devices while allowing them to retain their own personalized model. As a case study, we apply PFL-DA to distributed desktop 3D printing, where we obtain more accurate predictions of printing speed, which can help improve the efficiency of the printers.
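The abstract's notion of "borrowing strength across devices while retaining a personalized model" can be illustrated with a common shared-plus-personal formulation (a generic sketch, not PFL-DA's actual domain-adaptation method): each client fits its own parameters while a proximal penalty keeps them close to the shared global model.

```python
import numpy as np

def personalized_round(global_w, local_data, lr=0.1, steps=50, lam=0.5):
    """One client round: fit a personal linear model w while a proximal
    term lam * ||w - global_w||^2 ties it to the shared global model.
    Illustrative only; PFL-DA's formulation differs."""
    X, y = local_data
    w = global_w.copy()
    for _ in range(steps):
        # gradient of mean squared error plus the proximal penalty
        grad = 2 * X.T @ (X @ w - y) / len(y) + 2 * lam * (w - global_w)
        w -= lr * grad
    return w
```

Two clients with opposite local relationships end up with distinct (personalized) weights, yet both remain anchored to the shared starting point, which is what lets heterogeneous devices coexist in one federation.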
  2. Federated Learning (FL) revolutionizes collaborative machine learning among Internet of Things (IoT) devices by enabling them to train models collectively while preserving data privacy. FL algorithms fall into two primary categories: synchronous and asynchronous. While synchronous FL efficiently handles straggler devices, its convergence speed and model accuracy can be compromised. In contrast, asynchronous FL allows all devices to participate but incurs high communication overhead and potential model staleness. To overcome these limitations, the paper introduces a semi-synchronous FL framework that uses client tiering based on computing and communication latencies. Clients in different tiers upload their local models at distinct frequencies, striking a balance between straggler mitigation and communication costs. Building on this, the paper proposes the Dynamic client clustering, bandwidth allocation, and local training for semi-synchronous Federated learning (DecantFed) algorithm, which dynamically optimizes client clustering, bandwidth allocation, and local training workloads in order to maximize data sample processing rates in FL. It also adapts client learning rates according to their tiers, thus addressing the model staleness issue. Extensive simulations using benchmark datasets like MNIST and CIFAR-10, under both IID and non-IID scenarios, demonstrate DecantFed's superior performance. It outperforms FedAvg and FedProx in convergence speed and delivers at least a 28% improvement in model accuracy compared to FedProx.
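The tiering mechanism described above can be sketched in a few lines: sort clients by measured latency, split them into equal-size tiers, and have slower tiers upload less often. The every-2^tier-rounds schedule below is purely illustrative, not DecantFed's actual policy.

```python
def assign_tiers(latencies_s, n_tiers=3):
    """Sort clients by round-trip latency and split them into
    equal-size tiers (tier 0 = fastest)."""
    order = sorted(range(len(latencies_s)), key=lambda i: latencies_s[i])
    size = -(-len(order) // n_tiers)  # ceiling division
    return {client: rank // size for rank, client in enumerate(order)}

def uploads_this_round(tiers, round_idx):
    """Tier t uploads whenever round_idx is a multiple of 2**t, so
    slower tiers communicate exponentially less often."""
    return [c for c, t in tiers.items() if round_idx % (2 ** t) == 0]
```

This captures the trade-off the paper targets: fast clients keep the global model fresh, while stragglers still contribute without stalling every round.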
  3. Machine Learning (ML) algorithms have shown quite promising applications in smart meter data analytics, enabling intelligent energy management systems for the Advanced Metering Infrastructure (AMI). One of the major challenges in developing ML applications for the AMI is to preserve user privacy while allowing active end-user participation. This paper addresses this challenge and proposes Differential Privacy-enabled AMI with Federated Learning (DP-AMI-FL), a framework for ML-based applications in the AMI. This framework provides two layers of privacy protection: first, it keeps the raw data of consumers hosting ML applications at edge devices (smart meters) with Federated Learning (FL), and second, it obfuscates the ML models using Differential Privacy (DP) to avoid privacy leakage threats posed by various inference attacks on the models. The framework is evaluated by analyzing its performance on a use case aimed at improving Short-Term Load Forecasting (STLF) for residential consumers with smart meters and home energy management systems. Extensive experiments demonstrate that the framework, when used with Long Short-Term Memory (LSTM) recurrent neural network models, achieves high forecasting accuracy while preserving users' data privacy.
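The second privacy layer described above, obfuscating model updates with DP, is commonly implemented by clipping each update's L2 norm and adding calibrated Gaussian noise (the DP-SGD-style Gaussian mechanism; the abstract does not specify DP-AMI-FL's exact mechanism, so this is a generic sketch):

```python
import numpy as np

def dp_sanitize_update(update, clip_norm=1.0, noise_mult=1.1, rng=None):
    """Clip a client's model update to bound its L2 sensitivity, then
    add Gaussian noise scaled to the clip norm before uploading."""
    rng = rng or np.random.default_rng(0)
    norm = np.linalg.norm(update)
    # scale the update down so its norm is at most clip_norm
    clipped = update * min(1.0, clip_norm / max(norm, 1e-12))
    noise = rng.normal(0.0, noise_mult * clip_norm, size=update.shape)
    return clipped + noise
```

Clipping bounds any single consumer's influence on the aggregate, and the noise multiplier (together with the number of rounds) determines the privacy budget spent.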
  4. Continual Federated Learning (CFL) is a distributed machine learning technique that enables multiple clients to collaboratively train a shared model without sharing their data, while also adapting to new classes without forgetting previously learned ones. This dynamic, adaptive learning process parallels the concept of foundation models in FL, where large, pre-trained models are fine-tuned in a decentralized, federated setting. While foundation models in FL leverage pre-trained knowledge as a starting point, CFL continuously updates the shared model as new tasks and data distributions emerge, requiring ongoing adaptation. Currently, there are few evaluation models and metrics for measuring fairness in CFL, and ensuring fairness over time can be challenging as the system evolves. To address this challenge, this article explores temporal fairness in CFL, examining how the fairness of the model can be influenced by the selection and participation of clients over time. Based on individual fairness, we introduce a novel fairness metric that captures temporal aspects of client behavior and evaluates different client selection strategies for their impact on promoting fairness.
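One simple way to quantify how evenly client selection treats participants over time, used here only as an illustrative proxy for the paper's individual-fairness-based metric, is Jain's fairness index over cumulative per-client participation counts:

```python
def jain_fairness(participation_counts):
    """Jain's index over per-client participation counts: 1.0 when all
    clients participated equally, approaching 1/n when one client
    dominates. (Illustrative proxy, not the paper's actual metric.)"""
    xs = list(participation_counts)
    n = len(xs)
    s = sum(xs)
    if s == 0:
        return 1.0  # no rounds yet: vacuously fair
    return s * s / (n * sum(x * x for x in xs))
```

Tracking such an index round by round exposes the temporal dimension the abstract highlights: a selection strategy can look fair in aggregate while being very uneven over any given window.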
  5. Vertical Federated Learning (FL) is a new paradigm that enables users with non-overlapping attributes of the same data samples to jointly train a model without directly sharing the raw data. Nevertheless, recent works show that this is still not sufficient to prevent privacy leakage from the training process or the trained model. This paper focuses on studying privacy-preserving tree boosting algorithms under vertical FL. Existing solutions based on cryptography involve heavy computation and communication overhead and are vulnerable to inference attacks. Although the solution based on Local Differential Privacy (LDP) addresses the above problems, it leads to low accuracy of the trained model. This paper explores how to improve the accuracy of the widely deployed tree boosting algorithms while satisfying differential privacy under vertical FL. Specifically, we introduce a framework called OpBoost. Three order-preserving desensitization algorithms satisfying a variant of LDP called distance-based LDP (dLDP) are designed to desensitize the training data. In particular, we optimize the dLDP definition and study efficient sampling distributions to further improve the accuracy and efficiency of the proposed algorithms. The proposed algorithms provide a trade-off between the privacy of pairs with large distance and the utility of desensitized values. Comprehensive evaluations show that OpBoost achieves better prediction accuracy of trained models than existing LDP approaches under reasonable settings. Our code is open source.
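The key idea behind order-preserving desensitization can be sketched with a simple distance-decaying bucket mechanism: map each value to a bucket, then sample a noisy bucket with probability falling off exponentially in the bucket distance, so nearby values stay nearby in expectation. This is a generic dLDP-flavored sketch; OpBoost's three actual algorithms and its optimized sampling distributions are more involved.

```python
import math, random

def desensitize(value, lo, hi, n_buckets=16, eps=1.0, rng=None):
    """Map value to a bucket, then sample a noisy bucket with weight
    exp(-eps * |distance in buckets|) (a discrete-Laplace-style
    mechanism). Larger eps keeps the output closer to the true bucket."""
    rng = rng or random.Random(0)
    width = (hi - lo) / n_buckets
    b = min(int((value - lo) / width), n_buckets - 1)
    weights = [math.exp(-eps * abs(b - j)) for j in range(n_buckets)]
    r = rng.random() * sum(weights)
    acc = 0.0
    for j, w in enumerate(weights):
        acc += w
        if r <= acc:
            return lo + (j + 0.5) * width  # report bucket midpoint
    return lo + (n_buckets - 0.5) * width
```

Because approximate order survives the perturbation, split-finding in tree boosting still works on the desensitized values, which is exactly the utility/privacy trade-off the abstract describes.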