We study multiagent reinforcement learning (MARL) in a stochastic network of agents. The objective is to find localized policies that maximize the (discounted) global reward. In general, scalability is a challenge in this setting because the size of the global state/action space can be exponential in the number of agents. Scalable algorithms are only known in cases where dependencies are static, fixed, and local, e.g., between neighbors in a fixed, time-invariant underlying graph. In this work, we propose a Scalable Actor Critic framework that applies in settings where the dependencies can be nonlocal and stochastic, and provide a finite-time error bound that shows how the convergence rate depends on the speed of information spread in the network. Additionally, as a byproduct of our analysis, we obtain novel finite-time convergence results for a general stochastic approximation scheme and for temporal difference learning with state aggregation, which apply beyond the setting of MARL in networked systems.
Scalable Reinforcement Learning for Multiagent Networked Systems
We study reinforcement learning (RL) in a setting with a network of agents whose states and actions interact in a local manner, where the objective is to find localized policies such that the (discounted) global reward is maximized. A fundamental challenge in this setting is that the state-action space size scales exponentially in the number of agents, rendering the problem intractable for large networks. In this paper, we propose a scalable actor critic (SAC) framework that exploits the network structure and finds a localized policy that is an [Formula: see text]-approximation of a stationary point of the objective for some [Formula: see text], with complexity that scales with the local state-action space size of the largest [Formula: see text]-hop neighborhood of the network. We illustrate our model and approach using examples from wireless communication, epidemics, and traffic.
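The complexity of such a localized approach is governed by the size of an agent's local neighborhood rather than the full network. As a minimal sketch of the quantity involved, the following computes an agent's k-hop neighborhood by breadth-first search (the graph, agent labels, and the `k_hop_neighborhood` helper are illustrative, not taken from the paper):

```python
from collections import deque

def k_hop_neighborhood(adj, agent, k):
    """Return the set of agents within k hops of `agent` in an
    undirected network given as an adjacency dict {node: [neighbors]}."""
    seen = {agent}
    frontier = deque([(agent, 0)])
    while frontier:
        node, dist = frontier.popleft()
        if dist == k:
            continue  # don't expand past k hops
        for nb in adj[node]:
            if nb not in seen:
                seen.add(nb)
                frontier.append((nb, dist + 1))
    return seen

# Line graph 0-1-2-3-4: the 1-hop neighborhood of agent 2 is {1, 2, 3}.
line = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2, 4], 4: [3]}
print(sorted(k_hop_neighborhood(line, 2, 1)))  # [1, 2, 3]
```

In a sparse network this neighborhood, and hence the local state-action space the method works with, stays small even as the total number of agents grows.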
 NSF-PAR ID:
 10324690
 Date Published:
 Journal Name:
 Operations Research
 ISSN:
 0030-364X
 Format(s):
 Medium: X
 Sponsoring Org:
 National Science Foundation
More Like this


We propose subsampling as a unified algorithmic technique for submodular maximization in centralized and online settings. The idea is simple: independently sample elements from the ground set and use simple combinatorial techniques (such as greedy or local search) on these sampled elements. We show that this approach leads to optimal/state-of-the-art results despite being much simpler than existing methods. In the usual offline setting, we present SampleGreedy, which obtains a [Formula: see text]-approximation for maximizing a submodular function subject to a p-extendible system using [Formula: see text] evaluation and feasibility queries, where k is the size of the largest feasible set. The approximation ratio improves to p + 1 and p for monotone submodular and linear objectives, respectively. In the streaming setting, we present SampleStreaming, which obtains a [Formula: see text]-approximation for maximizing a submodular function subject to a p-matchoid using O(k) memory and [Formula: see text] evaluation and feasibility queries per element, where m is the number of matroids defining the p-matchoid. The approximation ratio improves to 4p for monotone submodular objectives. We empirically demonstrate the effectiveness of our algorithms on video summarization, location summarization, and movie recommendation tasks.
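The sample-then-greedy idea above can be sketched in a few lines. This is an illustrative toy for a simple cardinality constraint with a coverage objective, not the paper's exact SampleGreedy algorithm (which handles general p-extendible systems); the `sample_greedy` helper, the sampling probability, and the toy data are all assumptions made for illustration:

```python
import random

def sample_greedy(ground, f, k, p_sample=0.5, seed=0):
    """Subsample the ground set, then run plain greedy up to size k
    on the sampled elements, using objective f on sets of elements."""
    rng = random.Random(seed)
    sampled = [e for e in ground if rng.random() < p_sample]
    S = set()
    for _ in range(k):
        best, gain = None, 0.0
        for e in sampled:
            if e in S:
                continue
            g = f(S | {e}) - f(S)  # marginal gain of adding e
            if g > gain:
                best, gain = e, g
        if best is None:
            break  # no element gives positive marginal gain
        S.add(best)
    return S

# Toy coverage objective: f(S) = number of items covered by the chosen sets.
sets = {0: {1, 2}, 1: {2, 3}, 2: {4}, 3: {1, 2, 3, 4}}
f = lambda S: len(set().union(*(sets[e] for e in S)))
```

With `p_sample=1.0` this degenerates to ordinary greedy; the point of the technique is that running the same combinatorial routine on a random subsample preserves the approximation guarantee while cutting query cost.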

In this paper, we consider federated Q-learning, which aims to learn an optimal Q-function by periodically aggregating local Q-estimates trained on local data alone. Focusing on infinite-horizon tabular Markov decision processes, we provide sample complexity guarantees for both the synchronous and asynchronous variants of federated Q-learning. In both cases, our bounds exhibit a linear speedup with respect to the number of agents and sharper dependencies on other salient problem parameters. Moreover, existing approaches to federated Q-learning adopt an equally weighted averaging of local Q-estimates, which can be highly suboptimal in the asynchronous setting since the local trajectories can be highly heterogeneous due to different local behavior policies. Existing sample complexity scales inversely with the minimum entry of the stationary state-action occupancy distributions over all agents, requiring that every agent covers the entire state-action space. Instead, we propose a novel importance averaging algorithm, giving larger weights to more frequently visited state-action pairs. The improved sample complexity scales inversely with the minimum entry of the average stationary state-action occupancy distribution of all agents, thus requiring only that the agents collectively cover the entire state-action space, unveiling the blessing of heterogeneity.
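The importance-averaging idea above can be sketched as a visit-count-weighted aggregation of local Q-tables. This is a hedged illustration of the weighting principle, not the paper's exact algorithm; the `importance_average` helper, the array shapes, and the fallback to uniform weights are assumptions made for this sketch:

```python
import numpy as np

def importance_average(q_tables, visit_counts):
    """Aggregate local Q-estimates, weighting each agent's entry for
    (s, a) by how often that agent visited (s, a).

    q_tables, visit_counts: arrays of shape (num_agents, S, A)."""
    q = np.asarray(q_tables, dtype=float)
    n = np.asarray(visit_counts, dtype=float)
    total = n.sum(axis=0)  # total visits to each (s, a) across agents
    # Weight proportional to visit share; fall back to uniform weights
    # for pairs no agent has visited.
    w = np.divide(n, total, out=np.full_like(n, 1.0 / len(n)),
                  where=total > 0)
    return (w * q).sum(axis=0)
```

The contrast with equally weighted averaging is that an agent whose behavior policy rarely reaches some state-action pair contributes little to that entry, so coverage only needs to hold collectively across agents.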

This work concerns the asymptotic behavior of solutions to a (strictly) subcritical fluid model for a data communication network, where file sizes are generally distributed and the network operates under a fair bandwidth-sharing policy. Here we consider fair bandwidth-sharing policies that are a slight generalization of the [Formula: see text]-fair policies introduced by Mo and Walrand [Mo J, Walrand J (2000) Fair end-to-end window-based congestion control. IEEE/ACM Trans. Networks 8(5):556–567.]. Since the year 2000, it has been a standing problem to prove stability of the data communications network model of Massoulié and Roberts [Massoulié L, Roberts J (2000) Bandwidth sharing and admission control for elastic traffic. Telecommunication Systems 15(1):185–201.], with general file sizes and operating under fair bandwidth-sharing policies, when the offered load is less than capacity (subcritical conditions). A crucial step in an approach to this problem is to prove stability of subcritical fluid model solutions. In 2012, Paganini et al. [Paganini F, Tang A, Ferragut A, Andrew LLH (2012) Network stability under alpha fair bandwidth allocation with general file size distribution. IEEE Trans. Automatic Control 57(3):579–591.] introduced a Lyapunov function for this purpose and gave an argument, assuming that fluid model solutions are sufficiently smooth in time and space that they are strong solutions of a partial differential equation and assuming that no fluid level on any route touches zero before all route levels reach zero. The aim of the current paper is to prove stability of the subcritical fluid model without these strong assumptions. 
Starting with a slight generalization of the Lyapunov function proposed by Paganini et al., assuming that each component of the initial state of a measure-valued fluid model solution, as well as the file size distributions, have no atoms and have finite first moments, we prove absolute continuity in time of the composition of the Lyapunov function with any subcritical fluid model solution and describe the associated density. We use this to prove that the Lyapunov function composed with such a subcritical fluid model solution converges to zero as time goes to infinity. This implies that each component of the measure-valued fluid model solution converges vaguely on [Formula: see text] to the zero measure as time goes to infinity. Under the further assumption that the file size distributions have finite pth moments for some p > 1 and that each component of the initial state of the fluid model solution has finite pth moment, it is proved that the fluid model solution reaches the measure with all components equal to the zero measure in finite time and that the time to reach this zero state has a uniform bound for all fluid model solutions having a uniform bound on the initial total mass and the pth moment of each component of the initial state. In contrast to the analysis of Paganini et al., we do not need their strong smoothness assumptions on fluid model solutions, and we rigorously treat the realistic but singular situation where the fluid level on some routes becomes zero while other route levels remain positive.

This work aims to prove a Hardy-type inequality and a trace theorem for a class of function spaces on smooth domains with a nonlocal character. Functions in these spaces are allowed to be as rough as an [Formula: see text]-function inside the domain of definition but as smooth as a [Formula: see text]-function near the boundary. This feature is captured by a norm that is characterized by a nonlocal interaction kernel defined heterogeneously with a special localization feature on the boundary. Thus, the trace theorem we obtain here can be viewed as an improvement and refinement of the classical trace theorem for fractional Sobolev spaces [Formula: see text]. Similarly, the Hardy-type inequalities we establish for functions that vanish on the boundary show that functions in this generalized space have the same decay rate to the boundary as functions in the smaller space [Formula: see text]. The results we prove extend existing results shown in the Hilbert space setting with p = 2. A Poincaré-type inequality we establish for the function space under consideration, together with the new trace theorem, allows formulating and proving well-posedness of a nonlinear nonlocal variational problem with a conventional local boundary condition.