skip to main content

Title: ADMM-Based Hessian-Free Federated Meta-Learning for Regularized Edge Learning
In order to meet the requirements for safety and latency in many IoT applications, intelligent decisions must be made right here right now at the network edge, calling for edge intelligence. To facilitate fast edge learning, this work advocates a platform-aided federated meta-learning architecture, where a set of edge nodes joint force to learn a meta-model (i.e., model initialization for adaptation in a new learning task) by exploiting the similarity among edge nodes as well as the cloud knowledge transfer. The federated meta-learning problem is cast as a regularized stochastic optimization problem, using Bregman Divergence between the edge model and the cloud pre-trained model as the regularization. We then devise an alternating direction method of multiplier (ADMM) based Hessian-free federated meta-learning algorithm, called ADMM-FedMeta, with inexact Hessian estimation. Further, we analyze the convergence properties and the rapid adaptation performance of ADMM-FedMeta for the general non-convex case. The theoretical results show that under mild conditions, ADMM-FedMeta converges to an $\epsilon$-approximate first-order stationary point after at most $\mathcal{O}(1/\epsilon^2)$ communication rounds. Extensive experimental studies on benchmark datasets demonstrate the effectiveness and efficiency of ADMM-FedMeta, and showcase that ADMM-FedMeta outperforms the existing baselines.  more » « less
Award ID(s):
Author(s) / Creator(s):
; ; ; ;
Date Published:
Journal Name:
MobiHoc '21: Proceedings of the Twenty-second International Symposium on Theory, Algorithmic Foundations, and Protocol Design for Mobile Networks and Mobile Computing
Page Range / eLocation ID:
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. As edge computing complements the cloud to enable computational services right at the network edge, federated learning (FL) can also benefit from close-by edge computing infrastructure. However, most prior works on federated edge learning (FEL) mainly focus on one shared global model during the federated training in edge systems. In a real edge computing scenario, there may co-exist multiple various FL models that are owned by different entities and used by different applications. Simultaneously training these models competes both computing and networking resources in the shared edge system. Therefore, in this work, we consider a multi-model federated edge learning where multiple FEL models are being trained in the edge network and edge servers can act as either parameter servers or workers of these FEL models. We formulate a joint participant selection and learning scheduling problem, which is a non-linear mixed-integer program, aiming to minimize the total cost of all FEL models while satisfying the desired convergence rate of trained FEL models and the constrained edge resources. We then design several algorithms by decoupling the original problem into two or three sub-problems which can be solved respectively and iteratively. Extensive simulations with real-world training datasets and FEL models show that our proposed algorithms can efficiently reduce the average total cost of all FEL models in a multi-model FEL setting compared with existing algorithms. 
    more » « less
  2. The increased ubiquitousness of small smart devices, such as cell- phones, tablets, smart watches and laptops, has led to unique user data, which can be locally processed. The sensors (e.g., microphones and webcam) and improved hardware of the new devices have al- lowed running deep learning models that 20 years ago would have been exclusive to high-end expensive machines. In spite of this progress, state-of-the-art algorithms for facial expression recognition (FER) rely on architectures that cannot be implemented on these devices due to computational and memory constraints. Alternatives involving cloud-based solutions impose privacy barriers that prevent their adoption or user acceptance in wide range of applications. This paper proposes a lightweight model that can run in real-time for image facial expression recognition (IFER) and video facial expression recognition (VFER). The approach relies on a personalization mechanism locally implemented for each subject by fine-tuning a central VFER model with unlabeled videos from a target subject. We train the IFER model to generate pseudo labels and we select the videos with the highest confident predictions to be used for adaptation. The adaptation is performed by implementing a federated learning strategy where the weights of the local model are averaged and used by the central VFER model. We demonstrate that this approach can improve not only the performance on the edge device providing personalized models to the users, but also the central VFER model. We implement a federated learning strategy where the weights of the local models are averaged and used by the central VFER. Within corpus and cross-corpus evaluations on two emotional databases demonstrate that edge models adapted with our personalization strategy achieve up to 13.1% gains in F1-scores. Furthermore, the federated learning implementation improves the mean micro F1-score of the central VFER model by up to 3.4%. The proposed lightweight solution is ideal for interactive user interfaces that preserve the data of the users. 
    more » « less
  3. Federated learning (FL) has been emerging as a new distributed machine learning paradigm recently. Although FL can protect the data privacy of participants by keeping their training data on local devices, there are recent works raising new privacy concerns especially when workers or the parameter server of FL are untrustworthy or malicious. One effective way to solve the problem is using hierarchical federated learning (HFL) where a few middle-layer aggregators (or called group leaders) are used to aggregate local model updates from workers and send group model updates to the parameter server. In this paper, we consider the participant selection problem of HFL in an edge cloud with multiple FL models, where each model needs to select one parameter server, a few group leaders and a certain amount of workers from edge servers to jointly perform HFL. We first formulate this problem as a non-linear integer programming, aiming to minimize the total learning cost of all models while satisfying the constrained edge resources. We then design a three-stage algorithm by decoupling the original problem into three sub-problems and solving them iteratively. Simulations with real-world datasets and FL models confirm that our proposed algorithm can efficiently reduce the average total learning cost in edge cloud compared with existing methods. 
    more » « less
  4. The growing demand of industrial, automotive and service robots presents a challenge to the centralized Cloud Robotics model in terms of privacy, security, latency, bandwidth, and reliability. In this paper, we present a ‘Fog Robotics’ approach to deep robot learning that distributes compute, storage and networking resources between the Cloud and the Edge in a federated manner. Deep models are trained on non-private (public) synthetic images in the Cloud; the models are adapted to the private real images of the environment at the Edge within a trusted network and subsequently, deployed as a service for low-latency and secure inference/prediction for other robots in the network. We apply this approach to surface decluttering, where a mobile robot picks and sorts objects from a cluttered floor by learning a deep object recognition and a grasp planning model. Experiments suggest that Fog Robotics can improve performance by sim-to-real domain adaptation in comparison to exclusively using Cloud or Edge resources, while reducing the inference cycle time by 4 to successfully declutter 86% of objects over 213 attempts. 
    more » « less
  5. There has recently been an increasing interest in computationally-efficient learning methods for resource-constrained applications, e.g., pruning, quantization and channel gating. In this work, we advocate a holistic approach to jointly train the backbone network and the channel gating which can speed up subnet selection for a new task at the resource-limited node. In particular, we develop a federated meta-learning algorithm to jointly train good meta-initializations for both the backbone networks and gating modules, by leveraging the model similarity across learning tasks on different nodes. In this way, the learnt meta-gating module effectively captures the important filters of a good meta-backbone network, and a task-specific conditional channel gated network can be quickly adapted from the meta-initializations using data samples of the new task. The convergence of the proposed federated meta-learning algorithm is established under mild conditions. Experimental results corroborate the effectiveness of our method in comparison to related work. 
    more » « less