This paper studies distributed Q-learning for the Linear Quadratic Regulator (LQR) problem in a multi-agent network. Existing results often assume that agents can observe the global system state, which may be infeasible in large-scale systems due to privacy concerns or communication constraints. In this work, we consider a setting with unknown system models and no centralized coordinator. We devise a state tracking (ST) based Q-learning algorithm to design optimal controllers for the agents. Specifically, we assume that agents maintain local estimates of the global state based on their local information and communication with neighbors. At each step, every agent updates its local estimate of the global state and, based on that estimate, solves for an approximate Q-factor locally through policy iteration. Assuming that a decaying excitation noise is injected during policy evaluation, we prove that the local estimates converge to the true global state, and we establish the convergence of the proposed distributed ST-based Q-learning algorithm. Experimental studies corroborate our theoretical results by showing that the proposed method achieves performance comparable to the centralized case.
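To make the policy-iteration step concrete, the following is a minimal single-agent sketch, not the distributed ST-based algorithm of the paper. It assumes a scalar system x' = a x + b u with stage cost q x^2 + r u^2 and, purely for illustration, a known model (the paper treats the model as unknown and estimates the Q-factor from data). The function name `q_policy_iteration` and all parameter values are hypothetical.

```python
def q_policy_iteration(a, b, q, r, k0, iters=20):
    """Policy iteration on the Q-factor of a scalar LQR problem.

    Assumed setup (illustrative only): dynamics x' = a*x + b*u,
    stage cost q*x^2 + r*u^2, policy u = -k*x, model known.
    """
    k = k0
    p = None
    for _ in range(iters):
        rho = a - b * k  # closed-loop pole under the current gain
        if abs(rho) >= 1.0:
            raise ValueError("gain k is not stabilizing")
        # Policy evaluation: value function V(x) = p * x^2.
        p = (q + r * k * k) / (1.0 - rho * rho)
        # Q-factor entries: Q(x, u) = h_xx x^2 + 2 h_ux x u + h_uu u^2.
        h_ux = a * b * p
        h_uu = r + b * b * p
        # Policy improvement: minimize Q(x, u) over u.
        k = h_ux / h_uu
    return k, p


# Hypothetical parameters; k0 = 1.0 is stabilizing since |1.2 - 1.0| < 1.
gain, value = q_policy_iteration(a=1.2, b=1.0, q=1.0, r=1.0, k0=1.0)
```

At convergence the returned value should satisfy the scalar discrete-time Riccati equation, which gives a simple sanity check on the iteration.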
A Primal Decomposition Approach to Globally Coupled Aggregative Optimization over Networks
We consider a class of multi-agent optimization problems in which each agent has a local objective function that depends on its own decision variables and on the aggregate of the others' decision variables, and in which the agents cooperate to minimize the sum of the local objectives. After associating each agent with an auxiliary variable and the related local estimates, we apply primal decomposition to the globally coupled problem and reformulate it so that it can be solved in a distributed manner. Based on the Douglas-Rachford method, we propose an algorithm that ensures exact convergence to a solution of the original problem. The proposed method scales well because each agent only needs to keep local estimates whose number grows linearly with the number of its neighbors. We illustrate the proposed algorithm with numerical simulations on a commodity distribution problem over a transport network.
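For readers unfamiliar with the Douglas-Rachford method mentioned above, here is a minimal sketch of the generic splitting iteration on a toy scalar problem. It is not the distributed algorithm of the paper. The assumed problem is to minimize 0.5 (x - a)^2 subject to lo <= x <= hi; the function name `douglas_rachford` and the step size `t` are illustrative choices.

```python
def douglas_rachford(a, lo, hi, t=1.0, iters=200):
    """Douglas-Rachford splitting for min 0.5*(x - a)^2 s.t. lo <= x <= hi.

    f(x) = 0.5*(x - a)^2 has the closed-form proximal operator below;
    g is the indicator of [lo, hi], whose proximal operator is clipping.
    """
    z = 0.0  # auxiliary (governing) sequence
    x = z
    for _ in range(iters):
        x = (z + t * a) / (1.0 + t)           # x = prox_{t f}(z)
        y = min(max(2.0 * x - z, lo), hi)     # y = prox_{t g}(2x - z)
        z = z + y - x                         # update the auxiliary variable
    return x


# The unconstrained minimizer is a = 5, so the constrained solution is hi = 2.
solution = douglas_rachford(a=5.0, lo=0.0, hi=2.0)
```

The two proximal steps touch f and g separately, which is the property that makes this kind of splitting attractive when the terms are held by different agents.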
- Award ID(s): 2014816
- PAR ID: 10316600
- Date Published:
- Journal Name: 60th IEEE Conference on Decision and Control (CDC)
- Format(s): Medium: X
- Sponsoring Org: National Science Foundation
More Like this
- This paper studies a distributed reinforcement learning problem in which a network of multiple agents aims to cooperatively maximize the globally averaged return through communication with only local neighbors. An asynchronous multi-agent actor-critic algorithm is proposed for possibly unidirectional communication relationships depicted by a directed graph. Each agent independently updates its variables at “event times” determined by its own clock. It is not assumed that the agents’ clocks are synchronized or that the event times are evenly spaced. It is shown that the algorithm can solve the problem for any strongly connected graph in the presence of communication and computation delays.
- We consider an in-network optimal resource allocation problem in which a group of agents interacting over a connected graph want to meet a demand while minimizing their collective cost. The contribution of this paper is a distributed continuous-time algorithm for this problem, inspired by a recently developed first-order transformed primal-dual method. The solution applies to cluster-based settings where each agent may have a set of subagents and its local cost is the sum of the costs of these subagents. The proposed algorithm guarantees exponential convergence for strongly convex costs and asymptotic convergence for convex costs. Exponential convergence for strongly convex local cost functions is achieved even when the local gradients are only locally Lipschitz. For convex local cost functions, our algorithm guarantees asymptotic convergence to a point in the minimizer set. Through numerical examples, we show that our proposed algorithm converges faster than existing distributed resource allocation algorithms.
- The inverse problem of recovery of a potential on a quantum tree graph from the Weyl matrix given at a number of points is considered. A method for its numerical solution is proposed. The overall approach is based on the leaf peeling method combined with Neumann series of Bessel functions (NSBF) representations for solutions of Sturm–Liouville equations. In each step, the solution of the arising inverse problems reduces to dealing with the NSBF coefficients. The leaf peeling method allows one to localize the general inverse problem to local problems on sheaves, while the approach based on the NSBF representations leads to splitting the local problems into two-spectrum inverse problems on separate edges and reduces them to systems of linear algebraic equations for the NSBF coefficients. Moreover, the potential on each edge is recovered from the very first NSBF coefficient. The proposed method leads to an efficient numerical algorithm that is illustrated by numerical tests.
- In this paper, we study the problem of privacy preservation of the continuous-time Laplacian static average consensus algorithm using additive perturbation signals. We consider this problem over a strongly connected and weight-balanced digraph. In the static average consensus algorithm, each agent starts from a local reference value and constantly communicates with its neighboring agents to update its local state, so that the network computes the average of the reference values. Since every agent transmits its local reference value to its in-neighbors, the reference values of the agents are trivially disclosed. In this paper, we investigate the possibility of preserving the privacy of the reference values of the agents by adding admissible perturbation signals to the local dynamics and the transmitted out signals of the agents. Admissible additive perturbation signals are those that do not shift the final convergence point of the algorithm away from the average of the reference values of the agents. Our results show that if an adversarial agent has access to the output of another agent and all the input signals transmitted to that agent, the adversary can discover the private reference value of that agent, regardless of the perturbation signals. Otherwise, the privacy of the agent can be preserved. We demonstrate our results through a numerical example.
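As a rough illustration of the unperturbed Laplacian average consensus dynamics described in the last item, the sketch below applies a forward-Euler discretization on a directed ring, which is strongly connected and weight-balanced. The graph, the step size, and the function name `average_consensus` are assumptions made for this example. No privacy-preserving perturbation is included.

```python
def average_consensus(reference, step=0.1, iters=2000):
    """Forward-Euler simulation of x_dot = -L x on a directed ring.

    Assumed graph: agent i listens only to agent (i + 1) mod n with unit
    weight, so every node has in-degree and out-degree 1 (weight-balanced).
    """
    n = len(reference)
    x = list(reference)  # each state starts at the local reference value
    for _ in range(iters):
        # Each agent moves toward the state of the neighbor it listens to.
        x = [x[i] - step * (x[i] - x[(i + 1) % n]) for i in range(n)]
    return x


refs = [1.0, 4.0, -2.0, 9.0]  # hypothetical reference values, average 3.0
states = average_consensus(refs)
```

Because the graph is weight-balanced, the sum of the states is preserved at every step, which is why the common limit is the average of the initial reference values. Note that in this plain form each agent's first transmitted value is its reference value, which is the disclosure the abstract refers to.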

