skip to main content

This content will become publicly available on November 29, 2023

Title: Federated Graph Machine Learning: A Survey of Concepts, Techniques, and Applications
Graph machine learning has gained great attention in both academia and industry recently. Most of the graph machine learning models, such as Graph Neural Networks (GNNs), are trained over massive graph data. However, in many realworld scenarios, such as hospitalization prediction in healthcare systems, the graph data is usually stored at multiple data owners and cannot be directly accessed by any other parties due to privacy concerns and regulation restrictions. Federated Graph Machine Learning (FGML) is a promising solution to tackle this challenge by training graph machine learning models in a federated manner. In this survey, we conduct a comprehensive review of the literature in FGML. Specifically, we first provide a new taxonomy to divide the existing problems in FGML into two settings, namely, FL with structured data and structured FL. Then, we review the mainstream techniques in each setting and elaborate on how they address the challenges under FGML. In addition, we summarize the real-world applications of FGML from different domains and introduce open graph datasets and platforms adopted in FGML. Finally, we present several limitations in the existing studies with promising research directions in this field.  more » « less
Award ID(s):
2223769 2228534 2154962 2144209 2006844
Author(s) / Creator(s):
; ; ; ;
Date Published:
Journal Name:
ACM SIGKDD Explorations Newsletter
Page Range / eLocation ID:
32 to 47
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. null (Ed.)
    Federated learning (FL) is a highly pursued machine learning technique that can train a model centrally while keeping data distributed. Distributed computation makes FL attractive for bandwidth limited applications especially in wireless communications. There can be a large number of distributed edge devices connected to a central parameter server (PS) and iteratively download/upload data from/to the PS. Due to limited bandwidth, only a subset of connected devices can be scheduled in each round. There are usually millions of parameters in the state-of-art machine learning models such as deep learning, resulting in a high computation complexity as well as a high communication burden on collecting/distributing data for training. To improve communication efficiency and make the training model converge faster, we propose a new scheduling policy and power allocation scheme using non-orthogonal multiple access (NOMA) settings to maximize the weighted sum data rate under practical constraints during the entire learning process. NOMA allows multiple users to transmit on the same channel simultaneously. The user scheduling problem is transformed into a maximum-weight independent set problem that can be solved using graph theory. Simulation results show that the proposed scheduling and power allocation scheme can help achieve a higher FL testing accuracy in NOMA based wireless networks than other existing schemes within the same learning time. 
    more » « less
  2. Federated Learning (FL) enables multiple clients to collaboratively learn a machine learning model without exchanging their own local data. In this way, the server can exploit the computational power of all clients and train the model on a larger set of data samples among all clients. Although such a mechanism is proven to be effective in various fields, existing works generally assume that each client preserves sufficient data for training. In practice, however, certain clients may only contain a limited number of samples (i.e., few-shot samples). For example, the available photo data taken by a specific user with a new mobile device is relatively rare. In this scenario, existing FL efforts typically encounter a significant performance drop on these clients. Therefore, it is urgent to develop a few-shot model that can generalize to clients with limited data under the FL scenario. In this paper, we refer to this novel problem as federated few-shot learning. Nevertheless, the problem remains challenging due to two major reasons: the global data variance among clients (i.e., the difference in data distributions among clients) and the local data insufficiency in each client (i.e., the lack of adequate local data for training). To overcome these two challenges, we propose a novel federated few-shot learning framework with two separately updated models and dedicated training strategies to reduce the adverse impact of global data variance and local data insufficiency. Extensive experiments on four prevalent datasets that cover news articles and images validate the effectiveness of our framework compared with the state-of-the-art baselines. 
    more » « less
  3. Machine Learning (ML) algorithms have shown quite promising applications in smart meter data analytics enabling intelligent energy management systems for the Advanced Metering Infrastructure (AMI). One of the major challenges in developing ML applications for the AMI is to preserve user privacy while allowing active end-users participation. This paper addresses this challenge and proposes Differential Privacy-enabled AMI with Federated Learning (DP-AMI-FL), framework for ML-based applications in the AMI. This framework provides two layers of privacy protection: first, it keeps the raw data of consumers hosting ML applications at edge devices (smart meters) with Federated Learning (FL), and second, it obfuscates the ML models using Differential Privacy (DP) to avoid privacy leakage threats on the models posed by various inference attacks. The framework is evaluated by analyzing its performance on a use case aimed to improve Short-Term Load Forecasting (STLF) for residential consumers having smart meters and home energy management systems. Extensive experiments demonstrate that the framework when used with Long Short-Term Memory (LSTM) recurrent neural network models, achieves high forecasting accuracy while preserving users data privacy. 
    more » « less
  4. Federated learning (FL) has attracted increasing attention as a promising technique to drive a vast number of edge devices with artificial intelligence. However, it is very challenging to guarantee the efficiency of a FL system in practice due to the heterogeneous computation resources on different devices. To improve the efficiency of FL systems in the real world, asynchronous FL (AFL) and semi-asynchronous FL (SAFL) methods are proposed such that the server does not need to wait for stragglers. However, existing AFL and SAFL systems suffer from poor accuracy and low efficiency in realistic settings where the data is non-IID distributed across devices and the on-device resources are extremely heterogeneous. In this work, we propose FedSEA - a semi-asynchronous FL framework for extremely heterogeneous devices. We theoretically disclose that the unbalanced aggregation frequency is a root cause of accuracy drop in SAFL. Based on this analysis, we design a training configuration scheduler to balance the aggregation frequency of devices such that the accuracy can be improved. To improve the efficiency of the system in realistic settings where the devices have dynamic on-device resource availability, we design a scheduler that can efficiently predict the arriving time of local updates from devices and adjust the synchronization time point according to the devices' predicted arriving time. We also consider the extremely heterogeneous settings where there exist extremely lagging devices that take hundreds of times as long as the training time of the other devices. In the real world, there might be even some extreme stragglers which are not capable of training the global model. To enable these devices to join in training without impairing the systematic efficiency, Fed-SEA enables these extreme stragglers to conduct local training on much smaller models. Our experiments show that compared with status quo approaches, FedSEA improves the inference accuracy by 44.34% and reduces the systematic time cost and local training time cost by 87.02× and 792.9×. FedSEA also reduces the energy consumption of the devices with extremely limited resources by 752.9×. 
    more » « less
  5. Background The proliferation of mobile health (mHealth) applications is partly driven by the advancements in sensing and communication technologies, as well as the integration of artificial intelligence techniques. Data collected from mHealth applications, for example, on sensor devices carried by patients, can be mined and analyzed using artificial intelligence–based solutions to facilitate remote and (near) real-time decision-making in health care settings. However, such data often sit in data silos, and patients are often concerned about the privacy implications of sharing their raw data. Federated learning (FL) is a potential solution, as it allows multiple data owners to collaboratively train a machine learning model without requiring access to each other’s raw data. Objective The goal of this scoping review is to gain an understanding of FL and its potential in dealing with sensitive and heterogeneous data in mHealth applications. Through this review, various stakeholders, such as health care providers, practitioners, and policy makers, can gain insight into the limitations and challenges associated with using FL in mHealth and make informed decisions when considering implementing FL-based solutions. Methods We conducted a scoping review following the guidelines of PRISMA-ScR (Preferred Reporting Items for Systematic Reviews and Meta-Analyses Extension for Scoping Reviews). We searched 7 commonly used databases. The included studies were analyzed and summarized to identify the possible real-world applications and associated challenges of using FL in mHealth settings. Results A total of 1095 articles were retrieved during the database search, and 26 articles that met the inclusion criteria were included in the review. The analysis of these articles revealed 2 main application areas for FL in mHealth, that is, remote monitoring and diagnostic and treatment support. More specifically, FL was found to be commonly used for monitoring self-care ability, health status, and disease progression, as well as in diagnosis and treatment support of diseases. The review also identified several challenges (eg, expensive communication, statistical heterogeneity, and system heterogeneity) and potential solutions (eg, compression schemes, model personalization, and active sampling). Conclusions This scoping review has highlighted the potential of FL as a privacy-preserving approach in mHealth applications and identified the technical limitations associated with its use. The challenges and opportunities outlined in this review can inform the research agenda for future studies in this field, to overcome these limitations and further advance the use of FL in mHealth. 
    more » « less