Title: Improved Communication Efficiency in Federated Natural Policy Gradient via ADMM-based Gradient Updates
Federated reinforcement learning (FedRL) enables agents to collaboratively train a global policy without sharing their individual data. However, high communication overhead remains a critical bottleneck, particularly for natural policy gradient (NPG) methods, which are second-order. To address this issue, we propose the FedNPG-ADMM framework, which leverages the alternating direction method of multipliers (ADMM) to approximate global NPG directions efficiently. We theoretically demonstrate that using ADMM-based gradient updates reduces communication complexity from $O(d^2)$ to $O(d)$ at each iteration, where $d$ is the number of model parameters. Furthermore, we show that achieving $\epsilon$-error stationary convergence requires $O\left(\frac{1}{(1-\gamma)^2 \epsilon}\right)$ iterations for discount factor $\gamma$, demonstrating that FedNPG-ADMM maintains the same convergence rate as standard FedNPG. Evaluating the proposed algorithms in MuJoCo environments, we demonstrate that FedNPG-ADMM maintains the reward performance of standard FedNPG and that its convergence rate improves when the number of federated agents increases.
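To make the communication saving concrete, below is a minimal sketch (not the paper's implementation; the function name, the penalty parameter `rho`, and the fixed iteration count are illustrative assumptions) of how consensus ADMM can recover the global NPG direction, which solves $(\frac{1}{N}\sum_i F_i)\, w = \frac{1}{N}\sum_i g_i$ for local Fisher matrices $F_i$ and policy gradients $g_i$, while exchanging only $d$-dimensional vectors per round:

```python
import numpy as np

def admm_npg_direction(fishers, grads, rho=1.0, iters=50):
    """Consensus-ADMM sketch for the global NPG direction.

    Solves (mean_i F_i) w = mean_i g_i without communicating the
    d x d Fisher matrices: each round, only d-dimensional vectors
    (w_i + u_i and z) cross the network, i.e. O(d) per agent.
    """
    n, d = len(fishers), grads[0].shape[0]
    z = np.zeros(d)                       # global (consensus) direction
    ws = [np.zeros(d) for _ in range(n)]  # local primal variables
    us = [np.zeros(d) for _ in range(n)]  # scaled dual variables
    for _ in range(iters):
        # Local step: each agent solves a rho-regularized quadratic.
        for i in range(n):
            ws[i] = np.linalg.solve(fishers[i] + rho * np.eye(d),
                                    grads[i] + rho * (z - us[i]))
        # Server step: average d-vectors only (O(d) communication).
        z = np.mean([ws[i] + us[i] for i in range(n)], axis=0)
        # Local dual update.
        for i in range(n):
            us[i] += ws[i] - z
    return z
```

Each round, an agent uploads the $d$-vector $w_i + u_i$ and downloads $z$, so per-agent communication is $O(d)$; the $O(d^2)$ Fisher matrices never leave the agents. In practice one would likely replace the explicit solve with Fisher-vector products (e.g., conjugate gradient) rather than forming $F_i$ at all.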
Award ID(s):
2231350
PAR ID:
10543043
Author(s) / Creator(s):
Editor(s):
Oh, A; Naumann, T; Globerson, A; Saenko, K; Hardt, M; Levine, S
Publisher / Repository:
Neural Information Processing Systems
Date Published:
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. In order to meet the requirements for safety and latency in many IoT applications, intelligent decisions must be made in real time at the network edge, calling for edge intelligence. To facilitate fast edge learning, this work advocates a platform-aided federated meta-learning architecture, where a set of edge nodes join forces to learn a meta-model (i.e., a model initialization for adaptation in a new learning task) by exploiting the similarity among edge nodes as well as cloud knowledge transfer. The federated meta-learning problem is cast as a regularized stochastic optimization problem, using the Bregman divergence between the edge model and the cloud pre-trained model as the regularizer. We then devise an alternating direction method of multipliers (ADMM) based Hessian-free federated meta-learning algorithm, called ADMM-FedMeta, with inexact Hessian estimation. Further, we analyze the convergence properties and the rapid-adaptation performance of ADMM-FedMeta in the general non-convex case. The theoretical results show that, under mild conditions, ADMM-FedMeta converges to an $\epsilon$-approximate first-order stationary point after at most $\mathcal{O}(1/\epsilon^2)$ communication rounds. Extensive experimental studies on benchmark datasets demonstrate the effectiveness and efficiency of ADMM-FedMeta and showcase that it outperforms the existing baselines.
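As a hedged illustration of the algorithmic pattern described above (not ADMM-FedMeta itself: the helper names, the use of a squared Euclidean distance as the Bregman divergence, and the plain gradient steps are all simplifying assumptions, and the paper's inexact Hessian estimation is omitted), one communication round might look like:

```python
import numpy as np

def admm_fedmeta_round(local_grads, thetas, z, us, theta_cloud,
                       rho=1.0, lam=0.1, lr=0.01, local_steps=5):
    """Hedged sketch of one ADMM-style federated meta-learning round.

    Each edge node i takes a few Hessian-free gradient steps on its
    local meta-objective plus a squared-distance pull toward the cloud
    pre-trained model (a special case of a Bregman divergence),
    penalized toward the consensus variable z; the platform then
    averages, and the scaled duals are updated.
    local_grads[i](theta) is assumed to return node i's stochastic
    meta-gradient at theta.
    """
    n = len(thetas)
    for i in range(n):
        for _ in range(local_steps):
            g = (local_grads[i](thetas[i])
                 + lam * (thetas[i] - theta_cloud)  # pull toward cloud model
                 + rho * (thetas[i] - z + us[i]))   # augmented-Lagrangian penalty
            thetas[i] -= lr * g
    z = np.mean([thetas[i] + us[i] for i in range(n)], axis=0)  # platform aggregation
    for i in range(n):
        us[i] += thetas[i] - z
    return thetas, z, us
```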
  2. We study the convergence of several natural policy gradient (NPG) methods in infinite-horizon discounted Markov decision processes with regular policy parametrizations. For a variety of NPGs and reward functions we show that the trajectories in state-action space are solutions of gradient flows with respect to Hessian geometries, based on which we obtain global convergence guarantees and convergence rates. In particular, we show linear convergence for unregularized and regularized NPG flows with the metrics proposed by Kakade and Morimura and co-authors by observing that these arise from the Hessian geometries of conditional entropy and entropy respectively. Further, we obtain sublinear convergence rates for Hessian geometries arising from other convex functions like log-barriers. Finally, we interpret the discrete-time NPG methods with regularized rewards as inexact Newton methods if the NPG is defined with respect to the Hessian geometry of the regularizer. This yields local quadratic convergence rates of these methods for step size equal to the inverse penalization strength.
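A compact way to state the central object above (notation ours and hedged; the paper works on the state-action polytope with general regular parametrizations) is the Hessian-geometry gradient flow of the expected return $R$:

```latex
% Sketch only: for a strictly convex potential \Phi on the state-action
% polytope, the Hessian metric is G(\eta) = \nabla^2 \Phi(\eta), and the
% NPG trajectory \eta(t) follows the ascent flow of the return R:
\[
  \dot{\eta}(t) \;=\; \bigl(\nabla^2 \Phi(\eta)\bigr)^{-1} \nabla R(\eta).
\]
% Taking \Phi to be conditional entropy or entropy recovers the metrics of
% Kakade and of Morimura and co-authors, respectively; other convex \Phi,
% such as log-barriers, yield the sublinear regimes mentioned above.
```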
  3. Since reinforcement learning algorithms are notoriously data-intensive, the task of sampling observations from the environment is usually split across multiple agents. However, transferring these observations from the agents to a central location can be prohibitively expensive in terms of the communication cost, and it can also compromise the privacy of each agent’s local behavior policy. In this paper, we consider a federated reinforcement learning framework where multiple agents collaboratively learn a global model, without sharing their individual data and policies. Each agent maintains a local copy of the model and updates it using locally sampled data. Although having N agents enables the sampling of N times more data, it is not clear if it leads to proportional convergence speedup. We propose federated versions of on-policy TD, off-policy TD and Q-learning, and analyze their convergence. For all these algorithms, to the best of our knowledge, we are the first to consider Markovian noise and multiple local updates, and prove a linear convergence speedup with respect to the number of agents. To obtain these results, we show that federated TD and Q-learning are special cases of a general framework for federated stochastic approximation with Markovian noise, and we leverage this framework to provide a unified convergence analysis that applies to all the algorithms. 
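For intuition on the local-updates-plus-averaging structure analyzed above, here is a minimal federated TD(0) sketch with linear function approximation (the environment interface `envs[i].step()`, the feature map `phi`, and all hyperparameters are illustrative assumptions, not the paper's setup):

```python
import numpy as np

def federated_td0(envs, phi, n_agents, rounds=100, local_steps=10,
                  alpha=0.1, gamma=0.99):
    """Hedged sketch of federated on-policy TD(0): each agent runs
    `local_steps` TD updates on its own Markovian samples, then the
    server averages the weight vectors. envs[i].step() is assumed to
    yield (s, r, s_next) transitions under the agent's behavior
    policy; phi(s) is the shared feature map.
    """
    d = phi(envs[0].reset()).shape[0]
    w_global = np.zeros(d)
    for _ in range(rounds):
        local_ws = []
        for i in range(n_agents):
            w = w_global.copy()
            for _ in range(local_steps):
                s, r, s_next = envs[i].step()
                delta = r + gamma * phi(s_next) @ w - phi(s) @ w  # TD error
                w += alpha * delta * phi(s)
            local_ws.append(w)
        w_global = np.mean(local_ws, axis=0)  # server-side averaging
    return w_global
```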
  4. We explore a Federated Reinforcement Learning (FRL) problem where agents collaboratively learn a common policy without sharing their trajectory data. To date, existing FRL work has primarily focused on agents operating in the same or "similar" environments. In contrast, our problem setup allows for arbitrarily large levels of environment heterogeneity. To obtain the optimal policy, which maximizes the average performance across all potentially completely different environments, we propose two algorithms: FedSVRPG-M and FedHAPG-M. In contrast to existing results, we demonstrate that FedSVRPG-M and FedHAPG-M, both of which leverage momentum mechanisms, converge exactly to a stationary point of the average performance function, regardless of the magnitude of environment heterogeneity. Furthermore, by incorporating the benefits of variance-reduction techniques or Hessian approximation, both algorithms achieve state-of-the-art convergence results, characterized by a sample complexity of $O(\epsilon^{-3/2}/N)$. Notably, our algorithms enjoy linear convergence speedups with respect to the number of agents, highlighting the benefit of collaboration among agents in finding a common policy.
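To illustrate the momentum mechanism these guarantees rest on, here is a hedged sketch of one round of a momentum-corrected federated policy-gradient update in the spirit of FedSVRPG-M (the helper `sample_grad` and the STORM-style estimator are assumptions; the paper's exact updates, including importance weighting across heterogeneous environments, may differ):

```python
import numpy as np

def fed_momentum_pg_round(sample_grad, theta_prev, theta, u_prev,
                          n_agents, beta=0.9, lr=0.01):
    """Hedged sketch of a momentum-corrected federated PG round.

    Each agent forms a variance-reduced momentum estimate using
    gradients at the current and previous global policies, the server
    averages across agents, and the global policy takes an ascent
    step. sample_grad(i, theta) is an assumed helper returning agent
    i's stochastic policy gradient at theta.
    """
    us = []
    for i in range(n_agents):
        g_now = sample_grad(i, theta)
        g_prev = sample_grad(i, theta_prev)
        # STORM-style momentum: u = g_now + (1 - beta) * (u_prev - g_prev)
        us.append(g_now + (1.0 - beta) * (u_prev - g_prev))
    u = np.mean(us, axis=0)       # server aggregation over agents
    theta_next = theta + lr * u   # ascent on the average return
    return theta_next, u
```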