Title: Joint Detection and Communication over Type-Sensitive Networks
Due to the difficulty of decentralized inference with conditionally dependent observations, and motivated by large-scale heterogeneous networks, we formulate a framework for decentralized detection with coupled observations. Each agent has a state, and the empirical distribution of all agents' states, i.e., the type of the network, dictates the individual agents' behavior. In particular, agents' observations depend on both the underlying hypothesis and the empirical distribution of the agents' states. Hence, our framework captures a high degree of coupling, in that an individual agent's behavior depends on both the underlying hypothesis and the behavior of all other agents in the network. Using this framework, the method of types, and a series of equicontinuity arguments, we derive the error exponent for the case in which all agents are identical and show that this error exponent depends on only a single empirical distribution. The analysis is extended to the multi-class case, and numerical results with state-dependent agent signaling and state-dependent channels highlight the utility of the proposed framework for the analysis of highly coupled environments.
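For intuition, a minimal numerical sketch of the method-of-types viewpoint is given below. It is not the paper's derivation: the per-state observation matrices P0 and P1, the type-weighted mixture construction, and the use of Chernoff information are illustrative assumptions, but they show how an error exponent can be driven by a single empirical distribution of the agents' states.

```python
# Illustrative sketch (not the paper's exact construction): for a binary
# hypothesis test where each agent's observation law depends on the hypothesis
# and on the network "type" (the empirical distribution of agent states), a
# Chernoff-information-type exponent can be evaluated at the induced laws.
import numpy as np

def network_type(states, num_states):
    """Empirical distribution (type) of the agents' states."""
    counts = np.bincount(states, minlength=num_states)
    return counts / counts.sum()

def chernoff_information(p, q, grid=200):
    """Chernoff information between two finite observation distributions."""
    # C(p, q) = max_s -log sum_x p(x)^s q(x)^(1-s)
    lambdas = np.linspace(1e-3, 1 - 1e-3, grid)
    values = [-np.log(np.sum(p**s * q**(1 - s))) for s in lambdas]
    return max(values)

# Hypothetical example: 3 agent states; the observation law is a type-weighted
# mixture of per-state signal distributions under each hypothesis.
states = np.array([0, 1, 1, 2, 0, 1])            # one state per agent
t = network_type(states, num_states=3)           # the single empirical distribution
P0 = np.array([[0.7, 0.2, 0.1],                  # rows: agent state, cols: observation
               [0.5, 0.3, 0.2],
               [0.3, 0.4, 0.3]])
P1 = np.array([[0.2, 0.3, 0.5],
               [0.3, 0.3, 0.4],
               [0.1, 0.3, 0.6]])
p_obs_H0 = t @ P0                                # observation law induced by the type under H0
p_obs_H1 = t @ P1
print("error exponent (per observation):", chernoff_information(p_obs_H0, p_obs_H1))
```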
Award ID(s): 1817200
PAR ID: 10479709
Author(s) / Creator(s): ;
Publisher / Repository: MDPI
Date Published:
Journal Name: Entropy
Volume: 25
Issue: 9
ISSN: 1099-4300
Page Range / eLocation ID: 1313
Format(s): Medium: X
Sponsoring Org: National Science Foundation
More Like this
  1. One of the challenges for multiagent reinforcement learning (MARL) is designing efficient learning algorithms for a large system in which each agent has only limited or partial information about the entire system. While exciting progress has been made in analyzing decentralized MARL with a network of agents, as in social networks and team video games, little is known theoretically about decentralized MARL with a network of states, which models self-driving vehicles, ride-sharing, and data and traffic routing. This paper proposes a framework of localized training and decentralized execution to study MARL with a network of states. Localized training means that agents only need to collect local information in their neighboring states during the training phase; decentralized execution means that agents can afterward execute the learned decentralized policies, which depend only on agents' current states. The theoretical analysis consists of three key components: first, a reformulation of the MARL system as a networked Markov decision process with teams of agents, enabling the associated team Q-function to be updated in a localized fashion; second, the Bellman equation for the value function and the appropriate Q-function on the probability measure space; and third, the exponential decay property of the team Q-function, which facilitates its approximation with good sample efficiency and controllable error. The theoretical analysis paves the way for a new algorithm, LTDE-Neural-AC, in which an actor-critic approach with overparameterized neural networks is proposed. The convergence and sample complexity are established and shown to be scalable with respect to the numbers of both agents and states. To the best of our knowledge, this is the first neural network-based MARL algorithm with network structure and a provable convergence guarantee.
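The exponential decay property can be made concrete with a small sketch. The code below is an assumption-laden illustration, not the paper's LTDE-Neural-AC algorithm: it shows a kappa-hop truncated team Q-value on a hypothetical line graph of states, where an agent's estimate uses only the counts of agents within its neighborhood.

```python
# Minimal sketch (illustrative assumptions, not the paper's algorithm): a
# kappa-hop truncated Q-function on a line graph of states.  The exponential
# decay property suggests that the team Q-value seen from state i can be
# approximated using only information within a kappa-hop neighborhood of i.
import numpy as np

def neighborhood(i, kappa, num_states):
    """Indices of states within kappa hops of state i on a line graph."""
    return list(range(max(0, i - kappa), min(num_states, i + kappa + 1)))

def truncated_q(q_table, state_profile, i, kappa, num_states):
    """Approximate the team Q-value at state i from local agent counts only.

    q_table is a hypothetical learned object mapping (state, local counts) to a
    scalar, e.g. the output of a localized training phase.
    """
    local = neighborhood(i, kappa, num_states)
    local_counts = tuple(int(np.sum(state_profile == s)) for s in local)
    return q_table.get((i, local_counts), 0.0)

# Toy usage: 10 states, 6 agents, kappa = 1.
num_states, kappa = 10, 1
state_profile = np.array([2, 2, 3, 7, 7, 8])     # current states of the agents
q_table = {(2, (0, 2, 1)): 1.5}                  # hypothetical localized Q entries
print(truncated_q(q_table, state_profile, i=2, kappa=kappa, num_states=num_states))
```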
  2.
    This paper considers a class of linear-quadratic-Gaussian (LQG) mean-field games (MFGs) with a partial observation structure for individual agents. Unlike much of the existing literature, our formulation has several special features. First, the individual state is driven by common noise arising from an external factor, so the state-average becomes a random process instead of a deterministic quantity. Second, the sensor function of each individual observation depends on the state-average, so the agents are coupled in a triple manner: not only through their states and cost functionals, but also through their observation mechanism. The decentralized strategies for individual agents are derived via Kalman filtering and the separation principle. A consistency condition is obtained, which is equivalent to the well-posedness of a forward-backward stochastic differential equation (FBSDE) driven by the common noise. Finally, the related ϵ-Nash equilibrium property is verified.
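The coupled observation mechanism can be illustrated with a deliberately simplified filter step. The sketch below is not the paper's derivation: it treats the state-average x_bar as known to the filter, whereas in the paper x_bar is a random process driven by common noise and is pinned down by the consistency condition; the scalar parameters are hypothetical.

```python
# Minimal sketch (simplifying assumptions): one scalar Kalman filter step in
# which the agent's observation depends on the state-average x_bar, showing
# how the observation mechanism couples an agent to the population.
def kalman_step(x_hat, P, y, x_bar, a=0.9, c=1.0, d=0.5, q=0.1, r=0.2):
    # Predict: x_{k+1} = a x_k + w,  w ~ N(0, q)
    x_pred = a * x_hat
    P_pred = a * P * a + q
    # Observe: y_k = c x_k + d x_bar + v,  v ~ N(0, r)
    innovation = y - (c * x_pred + d * x_bar)
    S = c * P_pred * c + r
    K = P_pred * c / S
    # Update the state estimate and its error variance.
    x_hat_new = x_pred + K * innovation
    P_new = (1 - K * c) * P_pred
    return x_hat_new, P_new

x_hat, P = 0.0, 1.0
x_hat, P = kalman_step(x_hat, P, y=0.8, x_bar=0.3)
print(x_hat, P)
```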
  3. The purpose of this paper is to describe the feedback particle filter algorithm for problems where there are a large number (M) of non-interacting agents (targets) with a large number (M) of non-agent-specific observations (measurements) that originate from these agents. In its basic form, the problem is characterized by data association uncertainty, whereby the association between the observations and agents must be deduced in addition to the agent state. In this paper, the large-M limit is interpreted as a problem of collective inference. This viewpoint is used to derive the equation for the empirical distribution of the hidden agent states. A feedback particle filter (FPF) algorithm for this problem is presented and illustrated via numerical simulations. Results are presented for the Euclidean and the finite state-space cases, both in continuous-time settings. The classical FPF algorithm is shown to be the special case (with M = 1) of these more general results. The simulations help show that the algorithm well approximates the empirical distribution of the hidden states for large M.
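The collective-inference idea behind the data-association step can be sketched with a much simpler stand-in. The paper's method is a feedback particle filter (a control-based, resampling-free construction); the importance-weighted filter below is only an illustrative substitute showing the key step of averaging the observation likelihood over the M unlabeled measurements instead of committing to a hard association. All numbers and distributions are hypothetical.

```python
# Illustrative stand-in (not the feedback particle filter itself): weight
# particles by the association-averaged likelihood of M unlabeled observations.
import numpy as np

rng = np.random.default_rng(0)

def collective_update(particles, weights, observations, obs_std=0.5):
    """Reweight particles, marginalizing the unknown observation association."""
    # likelihood[i, j] = p(observation j | particle i), Gaussian measurement model
    diff = particles[:, None] - observations[None, :]
    likelihood = np.exp(-0.5 * (diff / obs_std) ** 2)
    # Average over the M observations (uniform association prior).
    avg_likelihood = likelihood.mean(axis=1)
    weights = weights * avg_likelihood
    return weights / weights.sum()

particles = rng.normal(0.0, 2.0, size=500)       # hypotheses for a hidden agent state
weights = np.full(500, 1.0 / 500)
observations = np.array([0.9, 1.1, -0.2])        # M = 3 unlabeled measurements
weights = collective_update(particles, weights, observations)
print("posterior mean estimate:", np.sum(weights * particles))
```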
  4. We study how long‐lived, rational agents learn in a social network. In every period, after observing the past actions of his neighbors, each agent receives a private signal, and chooses an action whose payoff depends only on the state. Since equilibrium actions depend on higher‐order beliefs, it is difficult to characterize behavior. Nevertheless, we show that regardless of the size and shape of the network, the utility function, and the patience of the agents, the speed of learning in any equilibrium is bounded from above by a constant that only depends on the private signal distribution. 
  5. In multi-agent reinforcement learning (MARL), it is challenging for a collection of agents to learn complex temporally extended tasks. The difficulties lie in the computational complexity and in learning the high-level ideas behind reward functions. We study the graph-based Markov decision process (MDP), in which the dynamics of neighboring agents are coupled. To learn complex temporally extended tasks, we use a reward machine (RM) to encode each agent's task and expose the internal structure of the reward function. An RM can describe high-level knowledge and encode non-Markovian reward functions. To tackle the computational complexity, we propose a decentralized learning algorithm, decentralized graph-based reinforcement learning using reward machines (DGRM), which equips each agent with a localized policy, allowing agents to make decisions independently based on the information locally available to them. DGRM uses an actor-critic structure, and we introduce a tabular Q-function for discrete-state problems. We show that the dependency of the Q-function on other agents decreases exponentially as the distance between them increases. To further improve efficiency, we also propose the deep DGRM algorithm, which uses deep neural networks to approximate the Q-function and the policy function in order to solve large-scale or continuous-state problems. The effectiveness of the proposed DGRM algorithm is evaluated in three case studies: two wireless communication case studies with independent and dependent reward functions, respectively, and COVID-19 pandemic mitigation. Experimental results show that local information is sufficient for DGRM and that agents can accomplish complex tasks with the help of RMs. In the COVID-19 pandemic mitigation case, DGRM improves the global accumulated reward by 119% compared to the baseline.
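To make the reward-machine idea concrete, here is a minimal sketch with a hypothetical task (not one of the paper's case studies): the RM is a finite-state machine over high-level propositions, and a task such as "reach A, then reach B" is non-Markovian in the raw state but becomes Markovian once the RM state is appended to the agent's state.

```python
# Minimal reward machine sketch (hypothetical task): a finite-state machine
# that advances on high-level propositions and emits rewards on transitions.
class RewardMachine:
    def __init__(self):
        # transitions[(rm_state, proposition)] = (next_rm_state, reward)
        self.transitions = {
            ("u0", "at_A"): ("u1", 0.0),   # first subgoal reached, no reward yet
            ("u1", "at_B"): ("u2", 1.0),   # task completed, reward 1
        }
        self.state = "u0"

    def step(self, proposition):
        """Advance the RM on the observed proposition; return the reward."""
        next_state, reward = self.transitions.get(
            (self.state, proposition), (self.state, 0.0)
        )
        self.state = next_state
        return reward

rm = RewardMachine()
for prop in ["none", "at_A", "none", "at_B"]:
    print(prop, "->", rm.step(prop), "rm state:", rm.state)
```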