Title: Federated Graph Machine Learning: A Survey of Concepts, Techniques, and Applications
Graph machine learning has recently gained great attention in both academia and industry. Most graph machine learning models, such as Graph Neural Networks (GNNs), are trained over massive graph data. However, in many real-world scenarios, such as hospitalization prediction in healthcare systems, the graph data is usually stored at multiple data owners and cannot be directly accessed by any other party due to privacy concerns and regulatory restrictions. Federated Graph Machine Learning (FGML) is a promising solution to this challenge: it trains graph machine learning models in a federated manner. In this survey, we conduct a comprehensive review of the literature in FGML. Specifically, we first provide a new taxonomy that divides the existing problems in FGML into two settings, namely FL with structured data and structured FL. Then, we review the mainstream techniques in each setting and elaborate on how they address the challenges in FGML. In addition, we summarize the real-world applications of FGML across domains and introduce open graph datasets and platforms adopted in FGML. Finally, we present several limitations of the existing studies along with promising research directions in this field.
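To make the first setting concrete, below is a minimal, hypothetical sketch of FL with structured data: each client trains a copy of a shared GNN on its private (sub)graph, and a server averages the resulting parameters, FedAvg-style. The function names and the `local_train_fn` interface are illustrative assumptions, not the API of any system surveyed here.

```python
import copy
import torch

def fedavg_gnn(global_model, client_graphs, local_train_fn, num_rounds=10):
    # Hypothetical FedAvg loop for the "FL with structured data" setting.
    # Each client holds a private (sub)graph that never leaves the device;
    # only model parameters are exchanged with the server.
    for _ in range(num_rounds):
        client_states = []
        for graph in client_graphs:
            local_model = copy.deepcopy(global_model)                  # client copy
            client_states.append(local_train_fn(local_model, graph))  # -> state_dict
        # Server step: average each parameter tensor across clients.
        avg_state = {
            k: torch.stack([s[k].float() for s in client_states])
                    .mean(dim=0).to(client_states[0][k].dtype)
            for k in client_states[0]
        }
        global_model.load_state_dict(avg_state)
    return global_model
```

In the second setting, structured FL, the server-side step itself would additionally exploit relationships among clients (e.g., cross-client edges), which plain parameter averaging ignores.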
Award ID(s):
2223769 2228534 2154962 2144209 2006844
PAR ID:
10414114
Author(s) / Creator(s):
; ; ; ;
Date Published:
Journal Name:
ACM SIGKDD Explorations Newsletter
Volume:
24
Issue:
2
ISSN:
1931-0145
Page Range / eLocation ID:
32 to 47
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. Federated learning involves training statistical models on edge devices such as mobile phones while keeping the training data local. Federated Learning (FL) can serve as an ideal candidate for training spatial-temporal models that rely on heterogeneous and potentially massive numbers of participants while preserving the privacy of highly sensitive location data. However, transitioning existing spatial-temporal models to federated learning poses unique challenges. In this survey article, we review the existing literature that proposes FL-based models for human mobility prediction, traffic prediction, community detection, location-based recommendation systems, and other spatial-temporal tasks. We describe the metrics and datasets these works use and benchmark the approaches against their centralized counterparts. Finally, we discuss the challenges of applying spatial-temporal models in a decentralized setting and, by highlighting gaps in the literature, provide a road map and opportunities for the research community.
  2. This survey explores the transformative impact of foundation models (FMs) in artificial intelligence, focusing on their integration with federated learning (FL) in biomedical research. Foundation models such as ChatGPT, LLaMa, and CLIP, which are trained on vast datasets through methods including unsupervised pretraining, self-supervised learning, instruction fine-tuning, and reinforcement learning from human feedback, represent significant advancements in machine learning. These models, with their ability to generate coherent text and realistic images, are crucial for biomedical applications that require processing diverse data forms such as clinical reports, diagnostic images, and multimodal patient interactions. The incorporation of FL with these sophisticated models presents a promising strategy to harness their analytical power while safeguarding the privacy of sensitive medical data. This approach not only enhances the capabilities of FMs in medical diagnostics and personalized treatment but also addresses critical concerns about data privacy and security in healthcare. This survey reviews the current applications of FMs in federated settings, underscores the challenges, and identifies future research directions, including scaling FMs, managing data diversity, and enhancing communication efficiency within FL frameworks. The objective is to encourage further research into the combined potential of FMs and FL, laying the groundwork for healthcare innovations.
  3. Federated Learning (FL) enables edge devices or clients to collaboratively train machine learning (ML) models without sharing their private data. Much of the existing work in FL focuses on efficiently learning a model for a single task. In this paper, we study simultaneous training of multiple FL models using a common set of clients. The few existing simultaneous training methods employ synchronous aggregation of client updates, which can cause significant delays because large models and/or slow clients can bottleneck the aggregation. On the other hand, a naive asynchronous aggregation is adversely affected by stale client updates. We propose FedAST, a buffered asynchronous federated simultaneous training algorithm that overcomes bottlenecks from slow models and adaptively allocates client resources across heterogeneous tasks. We provide theoretical convergence guarantees of FedAST for smooth non-convex objective functions. Extensive experiments over multiple real-world datasets demonstrate that our proposed method outperforms existing simultaneous FL approaches, achieving up to 46.0% reduction in time to train multiple tasks to completion. 
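As a rough illustration of the buffered asynchronous idea, the sketch below applies client updates from a buffer and down-weights stale ones. The staleness discount and buffer policy are generic asynchronous-FL ingredients under our own assumptions, not the exact FedAST rules.

```python
def buffered_async_server(global_params, update_stream, buffer_size=8, base_lr=1.0):
    # update_stream yields (delta, client_version) pairs as clients finish at
    # their own pace; each delta maps parameter names to NumPy arrays.
    version, buffer = 0, []
    for delta, client_version in update_stream:
        staleness = version - client_version               # how outdated the update is
        buffer.append((delta, base_lr / (1 + staleness)))  # down-weight stale work
        if len(buffer) >= buffer_size:                     # apply once the buffer fills,
            total_w = sum(w for _, w in buffer)            # so slow clients never block
            for k in global_params:
                global_params[k] = global_params[k] + sum(
                    w * d[k] for d, w in buffer) / total_w
            buffer.clear()
            version += 1
    return global_params
```

Buffering retains the throughput of asynchronous aggregation while the staleness weight limits the damage from outdated updates, which is the tension the paper's convergence analysis addresses.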
  4. Federated learning (FL) is a promising technique for decentralized privacy-preserving Machine Learning (ML) with a diverse pool of participating devices of varying capabilities. However, existing approaches to handling such heterogeneous environments do not consider "fairness" in model aggregation, resulting in significant performance variation among devices. Meanwhile, prior works on FL fairness remain hardware-oblivious and cannot be applied directly without severe performance penalties. To address this issue, we propose a novel hardware-sensitive FL method called FairHetero that promotes fairness among heterogeneous federated clients. Our approach offers tunable fairness within a group of devices sharing the same ML architecture as well as across groups with heterogeneous models. Our evaluation on the MNIST, FEMNIST, CIFAR-10, and SHAKESPEARE datasets demonstrates that FairHetero can reduce the variance of test loss among participating clients compared to existing state-of-the-art techniques, resulting in increased overall performance.
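For intuition, one generic way to make aggregation tunably fair is to upweight clients with higher loss, in the spirit of q-FFL-style reweighting. The sketch below is our own illustration under that assumption; the actual FairHetero update additionally accounts for hardware groups and heterogeneous model architectures.

```python
import numpy as np

def fair_weighted_aggregate(client_states, client_losses, q=1.0):
    # client_states: list of dicts mapping parameter names to NumPy arrays.
    # q = 0 recovers plain averaging; larger q upweights high-loss clients,
    # shrinking the variance of test loss across participants.
    losses = np.asarray(client_losses, dtype=float)
    weights = losses ** q
    weights = weights / weights.sum()
    return {
        k: sum(w * s[k] for w, s in zip(weights, client_states))
        for k in client_states[0]
    }
```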
  5. Federated learning (FL) has recently emerged as a new distributed machine learning paradigm. Although FL can protect the data privacy of participants by keeping their training data on local devices, recent works have raised new privacy concerns, especially when the workers or the parameter server of FL are untrustworthy or malicious. One effective way to solve this problem is hierarchical federated learning (HFL), in which a few middle-layer aggregators (called group leaders) aggregate local model updates from workers and send group model updates to the parameter server. In this paper, we consider the participant selection problem of HFL in an edge cloud with multiple FL models, where each model needs to select one parameter server, a few group leaders, and a certain number of workers from edge servers to jointly perform HFL. We first formulate this problem as a non-linear integer program that minimizes the total learning cost of all models subject to edge resource constraints. We then design a three-stage algorithm by decoupling the original problem into three sub-problems and solving them iteratively. Simulations with real-world datasets and FL models confirm that our proposed algorithm efficiently reduces the average total learning cost in the edge cloud compared with existing methods.
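The two-level aggregation that HFL layers on top of plain FL is easy to sketch; the participant selection itself (the non-linear integer program above) is the paper's contribution and is not reproduced here. The names and data layout below are illustrative assumptions.

```python
def hierarchical_round(worker_updates_by_group):
    # worker_updates_by_group: {group_leader_id: [state_dict, ...]}, where each
    # state_dict maps parameter names to NumPy arrays of identical shapes.
    def average(states):
        return {k: sum(s[k] for s in states) / len(states) for k in states[0]}

    # Group leaders aggregate their own workers' updates first...
    group_models = [average(workers) for workers in worker_updates_by_group.values()]
    # ...then the parameter server aggregates only the group-level models.
    return average(group_models)
```

Because workers expose their updates only to a group leader, the parameter server never sees any individual worker's model, which is what mitigates the untrusted-server concern.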