skip to main content
US FlagAn official website of the United States government
dot gov icon
Official websites use .gov
A .gov website belongs to an official government organization in the United States.
https lock icon
Secure .gov websites use HTTPS
A lock ( lock ) or https:// means you've safely connected to the .gov website. Share sensitive information only on official, secure websites.


Title: Federated learning with matched averaging
Federated learning allows edge devices to collaboratively learn a shared model while keeping the training data on device, decoupling the ability to do model training from the need to store the data in the cloud. We propose the Federated matched averaging (FedMA) algorithm designed for federated learning of modern neural network architectures e.g. convolutional neural networks (CNNs) and LSTMs. FedMA constructs the shared global model in a layer-wise manner by matching and averaging hidden elements (i.e. channels for convolution layers; hidden states for LSTM; neurons for fully connected layers) with similar feature extraction signatures. Our experiments indicate that FedMA not only outperforms popular state-of-the-art federated learning algorithms on deep CNN and LSTM architectures trained on real world datasets, but also reduces the overall communication burden.  more » « less
Award ID(s):
1844951
PAR ID:
10149505
Author(s) / Creator(s):
; ; ; ;
Date Published:
Journal Name:
International Conference on Learning Representations (ICLR)
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. Federated learning allows edge devices to collaboratively learn a shared model while keeping the training data on device, decoupling the ability to do model training from the need to store the data in the cloud. We propose the Federated matched averaging (FedMA) algorithm designed for federated learning of modern neural network architectures e.g. convolutional neural networks (CNNs) and LSTMs. FedMA constructs the shared global model in a layer-wise manner by matching and averaging hidden elements (i.e. channels for convolution layers; hidden states for LSTM; neurons for fully connected layers) with similar feature extraction signatures. Our experiments indicate that FedMA not only outperforms popular state-of-the-art federated learning algorithms on deep CNN and LSTM architectures trained on real world datasets, but also reduces the overall communication burden. 
    more » « less
  2. Abstract Modern machine learning (ML) and deep learning (DL) techniques using high-dimensional data representations have helped accelerate the materials discovery process by efficiently detecting hidden patterns in existing datasets and linking input representations to output properties for a better understanding of the scientific phenomenon. While a deep neural network comprised of fully connected layers has been widely used for materials property prediction, simply creating a deeper model with a large number of layers often faces with vanishing gradient problem, causing a degradation in the performance, thereby limiting usage. In this paper, we study and propose architectural principles to address the question of improving the performance of model training and inference under fixed parametric constraints. Here, we present a general deep-learning framework based on branched residual learning (BRNet) with fully connected layers that can work with any numerical vector-based representation as input to build accurate models to predict materials properties. We perform model training for materials properties using numerical vectors representing different composition-based attributes of the respective materials and compare the performance of the proposed models against traditional ML and existing DL architectures. We find that the proposed models are significantly more accurate than the ML/DL models for all data sizes by using different composition-based attributes as input. Further, branched learning requires fewer parameters and results in faster model training due to better convergence during the training phase than existing neural networks, thereby efficiently building accurate models for predicting materials properties. 
    more » « less
  3. Abstract Deep learning has impacted defect prediction in additive manufacturing (AM), which is important to ensure process stability and part quality. However, its success depends on extensive training, requiring large, homogeneous datasets—remaining a challenge for the AM industry, particularly for small- and medium-sized enterprises (SMEs). The unique and varied characteristics of AM parts, along with the limited resources of SMEs, hamper data collection, posing difficulties in the independent training of deep learning models. Addressing these concerns requires enabling knowledge sharing from the similarities in the physics of the AM process and defect formation mechanisms while carefully handling privacy concerns. Federated learning (FL) offers a solution to allow collaborative model training across multiple entities without sharing local data. This article introduces an FL framework to predict section-wise heat emission during laser powder bed fusion (LPBF), a vital process signature. It incorporates a customized long short-term memory (LSTM) model for each client, capturing the dynamic AM process's time-series properties without sharing sensitive information. Three advanced FL algorithms are integrated—federated averaging (FedAvg), FedProx, and FedAvgM—to aggregate model weights rather than raw datasets. Experiments demonstrate that the FL framework ensures convergence and maintains prediction performance comparable to individually trained models. This work demonstrates the potential of FL-enabled AM modeling and prediction where SMEs can improve their product quality without compromising data privacy. 
    more » « less
  4. Neural Architecture Search (NAS) for Federated Learning (FL) is an emerging field. It automates the design and training of Deep Neural Networks (DNNs) when data cannot be centralized due to privacy, communication costs, or regulatory restrictions. Recent federated NAS methods not only reduce manual effort but also help achieve higher accuracy than traditional FL methods like FedAvg. Despite the success, existing federated NAS methods still fall short in satisfying diverse deployment targets common in on-device inference including hardware, latency budgets, or variable battery levels. Most federated NAS methods search for only a limited range of neuro-architectural patterns, repeat them in a DNN, thereby restricting achievable performance. Moreover, these methods incur prohibitive training costs to satisfy deployment targets. They perform the training and search of DNN architectures repeatedly for each case. SuperFedNAS addresses these challenges by decoupling the training and search in federated NAS. SuperFedNAS co-trains a large number of diverse DNN architectures contained inside one supernet in the FL setting. Post-training, clients perform NAS locally to find specialized DNNs by extracting different parts of the trained supernet with no additional training. SuperFedNAS takes O(1) (instead of O(N)) cost to find specialized DNN architectures in FL for any N deployment targets. As part of SuperFedNAS, we introduce MaxNet—a novel FL training algorithm that performs multi-objective federated optimization of DNN architectures (≈5∗108) under different client data distributions. SuperFedNAS achieves upto 37.7\% higher accuracy or upto 8.13x reduction in MACs than existing federated NAS methods. 
    more » « less
  5. null (Ed.)
    Federated learning enables thousands of participants to construct a deep learning model without sharing their private training data with each other. For example, multiple smartphones can jointly train a next-word predictor for keyboards without revealing what individual users type. We demonstrate that any participant in federated learning can introduce hidden backdoor functionality into the joint global model, e.g., to ensure that an image classifier assigns an attacker-chosen label to images with certain features, or that a word predictor completes certain sentences with an attacker-chosen word. We design and evaluate a new model-poisoning methodology based on model replacement. An attacker selected in a single round of federated learning can cause the global model to immediately reach 100% accuracy on the backdoor task. We evaluate the attack under different assumptions for the standard federated-learning tasks and show that it greatly outperforms data poisoning. Our generic constrain-and-scale technique also evades anomaly detection-based defenses by incorporating the evasion into the attacker's loss function during training. 
    more » « less