We consider a decentralized wireless network with several source-destination pairs sharing a limited number of orthogonal frequency bands. Sources learn to adapt their transmissions (specifically, their band selection strategy) over time, in a decentralized manner, without sharing information with each other. Sources can only observe the outcome of their own transmissions (i.e., success or collision) and have no prior knowledge of the network size or of the transmission strategies of other sources. The goal of each source is to maximize its own throughput while striving for network-wide fairness. We propose a novel, fully decentralized Reinforcement Learning (RL)-based solution that achieves fairness without coordination. The proposed Fair Share RL (FSRL) solution combines: (i) state augmentation with a semi-adaptive time reference; (ii) an architecture that leverages risk control and time difference likelihood; and (iii) a fairness-driven reward structure. We evaluate FSRL in more than 50 network settings with different numbers of agents and different amounts of available spectrum, in the presence of jammers, and in an ad-hoc setting. Simulation results suggest that, compared with a common baseline RL algorithm from the literature, FSRL can be up to 89.0% fairer (as measured by Jain's fairness index) in stringent settings with several sources and a single frequency band, and 48.1% fairer on average.
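For reference, the fairness metric reported above, Jain's index over per-source throughputs x_1..x_n, is (sum x_i)^2 / (n * sum x_i^2). A minimal NumPy sketch of its computation (illustrative only, not code from the paper):

```python
import numpy as np

# Jain's fairness index: (sum x_i)^2 / (n * sum x_i^2).
# Equals 1 when all sources get equal throughput, 1/n in the worst case.
def jain_index(throughputs):
    x = np.asarray(throughputs, dtype=float)
    if not np.any(x):
        return 0.0
    return x.sum() ** 2 / (len(x) * (x ** 2).sum())

print(jain_index([0.25, 0.25, 0.25, 0.25]))  # 1.0: perfectly fair
print(jain_index([1.0, 0.0, 0.0, 0.0]))      # 0.25: one source starves the rest
```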
Multi-agent coordination via a shared wireless spectrum
This paper considers a planar multi-agent coordination problem. Unlike related works, we explicitly consider a globally shared wireless communication channel on which individual agents must choose both a frequency and a power level at which to transmit their messages. This problem is motivated by the pressing need for algorithms that can operate efficiently and reliably on overcrowded wireless networks or otherwise poor-performing RF environments. We develop a self-triggered coordination algorithm that guarantees convergence to the desired set of states with probability 1. The algorithm builds on ideas from event/self-triggered coordination and allows agents to autonomously decide for themselves when to broadcast information, at which frequency and power, and how to move based on information received from other agents in the network. Simulations illustrate our results.
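As a rough illustration of the self-triggered idea, the following sketch has an agent rebroadcast its state only when that state has drifted far enough from the last value it announced. The class name, threshold, and step size are hypothetical assumptions, not the paper's algorithm:

```python
import numpy as np

# Sketch of a self-triggered broadcast rule (hypothetical names/parameters):
# an agent stays silent until its announced state becomes stale.
class SelfTriggeredAgent:
    def __init__(self, position, threshold=0.5):
        self.position = np.asarray(position, dtype=float)
        self.last_broadcast = self.position.copy()
        self.threshold = threshold

    def step(self, neighbor_positions):
        # Move toward the centroid of the last-received neighbor positions.
        if neighbor_positions:
            centroid = np.mean(neighbor_positions, axis=0)
            self.position += 0.1 * (centroid - self.position)
        # Self-trigger: broadcast only if the announced state is stale.
        if np.linalg.norm(self.position - self.last_broadcast) > self.threshold:
            self.last_broadcast = self.position.copy()
            return self.last_broadcast  # in the paper, a frequency and power are also chosen here
        return None                     # stay silent, saving channel use
```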
- Award ID(s): 1737989
- PAR ID: 10060064
- Date Published:
- Journal Name: IEEE Conference on Decision and Control
- Page Range / eLocation ID: 6714 to 6719
- Format(s): Medium: X
- Sponsoring Org: National Science Foundation
More Like this
-
We study a multi-agent partially observable environment in which autonomous agents aim to coordinate their actions while also learning the parameters of the unknown environment through repeated interactions. In particular, we focus on the role of communication in a multi-agent reinforcement learning problem. We consider a learning algorithm in which agents make decisions based on their own observations of the environment, as well as the observations of other agents, collected through communication between agents. We first identify two potential benefits of this type of information sharing when agents' observation quality is heterogeneous: (1) it can facilitate coordination among agents, and (2) it can enhance the learning of all participants, including the better-informed agents. We show, however, that these benefits of communication depend in general on its timing, so that delayed information sharing may be preferred in certain scenarios.
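A minimal sketch of delayed observation sharing between two agents with heterogeneous observation quality; the class, the fixed delay, and the averaging estimator are all assumptions for illustration, not the paper's algorithm:

```python
from collections import deque
import numpy as np

# Each agent estimates an unknown parameter from its own noisy observations
# plus observations shared by the other agent after a fixed delay.
class SharingAgent:
    def __init__(self, noise_std, delay=3):
        self.noise_std = noise_std   # heterogeneous observation quality
        self.delay = delay           # shared messages arrive `delay` steps late
        self.inbox = deque()         # (arrival_time, observation) pairs
        self.samples = []

    def observe(self, true_param, t, rng):
        obs = true_param + rng.normal(0.0, self.noise_std)
        self.samples.append(obs)
        return (t + self.delay, obs)  # message to share with others

    def deliver(self, message):
        self.inbox.append(message)

    def receive(self, t):
        # Incorporate shared observations once their delay has elapsed.
        while self.inbox and self.inbox[0][0] <= t:
            self.samples.append(self.inbox.popleft()[1])

    def estimate(self):
        return float(np.mean(self.samples)) if self.samples else 0.0

# Two agents with different noise levels share observations with each other.
rng = np.random.default_rng(0)
a, b = SharingAgent(noise_std=0.1), SharingAgent(noise_std=1.0)
for t in range(100):
    b.deliver(a.observe(5.0, t, rng))  # the better-informed agent helps b
    a.deliver(b.observe(5.0, t, rng))
    a.receive(t)
    b.receive(t)
print(a.estimate(), b.estimate())      # both estimates should approach 5.0
```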
-
We propose and evaluate a learning-based framework to address multi-agent resource allocation in coupled wireless systems. In particular, we consider multiple agents (e.g., base stations, access points, etc.) that choose among a set of resource allocation options to achieve their own performance objectives/requirements, where the performance observed at each agent is further coupled with the actions chosen by the other agents, e.g., through interference, channel leakage, etc. The challenge is to find the best collective action. To that end, we propose a Multi-Armed Bandit (MAB) framework wherein the best actions (aka arms) are adaptively learned through online reward feedback. Our focus is on systems that are "weakly coupled," wherein the best arm of each agent is invariant to the others' arm selections the majority of the time; this majority structure enables the development of lightweight, efficient algorithms, and it is commonly found in many wireless settings such as channel selection and power control. We develop a bandit algorithm based on the Track-and-Stop strategy, which achieves logarithmic regret with respect to a genie. Finally, through simulation, we exhibit the potential use of our model and algorithm in several wireless application scenarios.
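As a lighter illustration of learning a best channel from online reward feedback, here is a standard UCB1 sketch with channels as arms; this is a common baseline, explicitly not the paper's Track-and-Stop algorithm, and the toy reward function is an assumption:

```python
import math
import random

def ucb1_channel_selection(reward_fn, n_channels, horizon):
    """Standard UCB1: pick the arm maximizing mean reward plus an exploration bonus."""
    counts = [0] * n_channels
    means = [0.0] * n_channels
    for t in range(1, horizon + 1):
        if t <= n_channels:
            arm = t - 1  # initialization: try every channel once
        else:
            arm = max(range(n_channels),
                      key=lambda a: means[a] + math.sqrt(2 * math.log(t) / counts[a]))
        r = reward_fn(arm)  # observed reward, e.g., throughput on that channel
        counts[arm] += 1
        means[arm] += (r - means[arm]) / counts[arm]  # incremental mean update
    return max(range(n_channels), key=lambda a: means[a])

# Toy usage: channel 2 yields the highest expected reward.
best = ucb1_channel_selection(lambda a: random.random() * (0.3 + 0.2 * (a == 2)), 4, 2000)
print(best)  # usually 2
```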
-
Doty, David; Spirakis, Paul (Eds.): We develop a framework for self-induced phase changes in programmable matter in which a collection of agents with limited computational and communication capabilities can collectively perform appropriate global tasks in response to local stimuli that dynamically appear and disappear. Agents reside on graph vertices, where each stimulus is only recognized locally, and agents communicate via token passing along edges to alert other agents to transition to an Aware state when stimuli are present and an Unaware state when the stimuli disappear. We present an Adaptive Stimuli Algorithm that is robust to competing waves of messages as multiple stimuli change, possibly adversarially. Moreover, in addition to handling arbitrary stimulus dynamics, the algorithm can handle agents reconfiguring the connections (edges) of the graph over time in a controlled way. As an application, we show how this Adaptive Stimuli Algorithm on reconfigurable graphs can be used to solve the foraging problem, where food sources may be discovered, removed, or shifted at arbitrary times. We would like the agents to consistently self-organize, using only local interactions, such that if the food remains in a position long enough, the agents transition to a gather phase in which many collectively form a single large component with small perimeter around the food. Alternatively, if no food source has existed recently, the agents should undergo a self-induced phase change and switch to a search phase in which they distribute themselves randomly throughout the lattice region to search for food. Unlike previous approaches to foraging, this process is indefinitely repeatable, withstanding competing waves of messages that may interfere with each other. Like a physical phase change, microscopic changes such as the deletion or addition of a single food source trigger these macroscopic, system-wide transitions as agents share information about the environment and respond locally to get the desired collective response.
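A simplified illustration of the token-passing state transitions (a sketch only; the paper's Adaptive Stimuli Algorithm additionally handles competing, interfering waves and graph reconfiguration, which this ignores):

```python
from collections import deque

# BFS token wave: every agent reachable from `sources` adopts `new_state`.
def propagate(adjacency, sources, new_state, states):
    queue, seen = deque(sources), set(sources)
    while queue:
        v = queue.popleft()
        states[v] = new_state
        for u in adjacency[v]:
            if u not in seen:
                seen.add(u)
                queue.append(u)

adjacency = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}  # a path of four agents
states = {v: "Unaware" for v in adjacency}
propagate(adjacency, [2], "Aware", states)    # stimulus (e.g., food) appears at vertex 2
propagate(adjacency, [2], "Unaware", states)  # stimulus disappears
```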