Title: A Deep Q-Learning Dynamic Spectrum Sharing Experiment
We report results of an experiment in applying deep Q-learning for dynamic spectrum sharing (DSS) in the Alleys of Austin scenario from the DARPA Spectrum Collaboration Challenge. This scenario mimics mobile operations in an urban environment by up to five squads (teams) of soldiers. Each team operates its own wireless network. We consider teamwise-distributed DSS, where there is no central agent to coordinate spectrum usage across teams, but spectrum usage within each team is coordinated by a single member of that team. The spatial distributions of the soldiers create opportunities for spatial reuse by certain subsets of the teams, and our experiment is set up to evaluate whether the deep Q-learning algorithm can discover and take advantage of these opportunities. The results show that deep Q-learning is able to take advantage of spatial reuse and that doing so results in better performance than a fair-share, disjoint spectrum allocation among the teams.
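The abstract does not specify the network architecture, state, action, or reward used in the experiment. As a rough illustration of the learning rule underneath deep Q-learning, the following minimal sketch substitutes a tabular Q-table for the deep network and assumes a hypothetical toy setting: one team's coordinator picks one of three sub-bands while another team occupies some sub-band and is either nearby (co-channel use interferes) or far away (spatial reuse is harmless). The state encoding, reward, dynamics, and all parameter values are assumptions, not the experiment's design.

```python
import random

N_CHANNELS = 3  # hypothetical number of sub-bands a team can choose from


def encode_state(neighbor_channel, neighbor_far):
    # Hypothetical state: the other team's sub-band and whether that team is far away.
    return neighbor_channel + N_CHANNELS * int(neighbor_far)


def reward(state, action):
    # Hypothetical reward: sharing a sub-band only hurts when the other team is nearby,
    # so a distant team's sub-band can be spatially reused at no cost.
    neighbor_channel, neighbor_far = state % N_CHANNELS, state // N_CHANNELS
    return 1.0 if neighbor_far or action != neighbor_channel else 0.0


def q_update(Q, s, a, r, s_next, alpha, gamma):
    # One-step Q-learning: move Q[s][a] toward the target r + gamma * max_a' Q[s_next][a'].
    target = r + gamma * max(Q[s_next])
    Q[s][a] += alpha * (target - Q[s][a])


def greedy(Q, s):
    return max(range(N_CHANNELS), key=lambda a: Q[s][a])


def train(steps=20000, alpha=0.1, gamma=0.5, epsilon=0.2, seed=0):
    rng = random.Random(seed)
    Q = [[0.0] * N_CHANNELS for _ in range(2 * N_CHANNELS)]
    s = encode_state(rng.randrange(N_CHANNELS), rng.random() < 0.5)
    for _ in range(steps):
        # Epsilon-greedy exploration over sub-bands.
        a = rng.randrange(N_CHANNELS) if rng.random() < epsilon else greedy(Q, s)
        r = reward(s, a)
        # Toy dynamics: the other team's sub-band and distance change at random.
        s_next = encode_state(rng.randrange(N_CHANNELS), rng.random() < 0.5)
        q_update(Q, s, a, r, s_next, alpha, gamma)
        s = s_next
    return Q


if __name__ == "__main__":
    Q = train()
    for ch in range(N_CHANNELS):
        print(f"other team nearby on {ch} -> choose {greedy(Q, encode_state(ch, False))}")
```

Deep Q-learning replaces the table `Q` with a neural network trained toward the same kind of target. In this toy, the learned greedy policy avoids a nearby team's sub-band while treating a distant team's sub-band as usable, a miniature version of the spatial-reuse opportunity the abstract describes.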
Award ID(s):
1642973 1738065
PAR ID:
10312524
Author(s) / Creator(s):
;
Date Published:
Journal Name:
ICC 2021 - IEEE International Conference on Communications Proceedings
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. Database-driven Dynamic Spectrum Sharing (DSS) is the de facto technical paradigm adopted by the Federal Communications Commission for increasing spectrum efficiency; it allows licensed spectrum to be opportunistically used by secondary users. In database-driven DSS, a geo-location database administrator (DBA) maintains spectrum availability information over its service region in the form of a Radio Environment Map (REM), where the received signal strength from the primary user at every location is either directly measured via spectrum sensing or estimated via statistical spatial interpolation. Crowdsourcing-based spectrum sensing is a promising approach for periodically collecting spectrum measurements over a large geographic area but is unfortunately vulnerable to false spectrum measurements. Despite a large body of prior work on secure cooperative spectrum sensing, how to construct an accurate REM in the presence of false measurements remains an open challenge. In this paper, we introduce ST-REM, a novel spatiotemporal approach for securely constructing an REM in the presence of false spectrum measurements. Inspired by the self-labeling techniques developed for semi-supervised learning, ST-REM iteratively constructs an REM from a small number of spectrum measurements from trusted anchor sensors and many more measurements from mobile users. During each iteration, the DBA evaluates the trustworthiness of each measurement by jointly considering its spatial fitness with other trusted measurements and the mobile user's long-term behavior. By gradually incorporating the most trustworthy spectrum measurements, the DBA is able to construct an REM with high accuracy. Extensive simulation studies using a real spectrum measurement dataset confirm the efficacy and efficiency of ST-REM.
  2. The aggregation of individual personality tests to predict team performance is widely accepted in management theory but has significant limitations: the isolated nature of individual personality surveys fails to capture much of the team dynamics that drive real-world team performance. Artificial Swarm Intelligence (ASI), a technology that enables networked teams to think together in real time and answer questions as a unified system, promises a solution to these limitations by enabling teams to take personality tests together, whereby the team uses ASI to converge upon answers that best represent the group’s disposition. In the present study, the group personality of 94 small teams was assessed by having teams take a standard Big Five Inventory (BFI) test both as individuals and as a real-time system enabled by an ASI technology known as Swarm AI. The predictive accuracy of each personality assessment method was assessed by correlating the BFI personality traits with a range of real-world performance metrics. The results showed that assessments of personality generated using Swarm AI were far more predictive of team performance than the traditional survey-based method, showing a significant improvement in correlation with at least 25% of performance metrics, and in no case showing a significant decrease in predictive performance. This suggests that Swarm AI technology may be used as a highly effective team personality assessment tool that more accurately predicts future team performance than traditional survey approaches.
  3. Recent algorithms have achieved superhuman performance at a number of two-player zero-sum games such as poker and go. However, many real-world situations are multi-player games. Zero-sum two-team games, such as bridge and football, involve two teams where each member of the team shares the same reward with every other member of that team, and each team has the negative of the reward of the other team. A popular solution concept in this setting, called TMECor, assumes that teams can jointly correlate their strategies before play, but are not able to communicate during play. This setting is harder than two-player zero-sum games because each player on a team has different information and must use their public actions to signal to other members of the team. Prior works either have game-theoretic guarantees but only work in very small games, or are able to scale to large games but do not have game-theoretic guarantees. In this paper we introduce two algorithms: Team-PSRO, an extension of PSRO from two-player games to team games, and Team-PSRO Mix-and-Match, which improves upon Team-PSRO by better using population policies. In Team-PSRO, in every iteration both teams learn a joint best response to the opponent’s meta-strategy via reinforcement learning. As the reinforcement learning joint best response approaches the optimal best response, Team-PSRO is guaranteed to converge to a TMECor. In experiments on Kuhn poker and Liar’s Dice, we show that a tabular version of Team-PSRO converges to TMECor, and a version of Team-PSRO using deep cooperative reinforcement learning beats self-play reinforcement learning in the large game of Google Research Football.
  4. Teams can often be viewed as a dynamic system in which the team configuration evolves over time (e.g., new members join the team, existing members leave the team, and the skills of the members improve over time). Consequently, the performance of the team may change due to such team dynamics. A natural question is how to plan the (re-)staffing actions (e.g., recruiting a new team member) at each time step so as to maximize the expected cumulative performance of the team. In this paper, we address the problem of real-time team optimization by intelligently selecting the best candidates to increase the similarity between the current team and high-performance teams according to the team configuration at each time step. The key idea is to formulate it as a Markov decision process (MDP) and leverage recent advances in reinforcement learning to optimize the team dynamically. The proposed method has two main advantages: (1) dynamics, being able to model the dynamics of the team to optimize the initial team towards the direction of a high-performance team via performance feedback; and (2) efficacy, being able to handle the large state/action space via deep-reinforcement-learning-based value estimation. We demonstrate the effectiveness of the proposed method through extensive empirical evaluations.
  5. The computation of Vietoris-Rips persistence barcodes is both execution-intensive and memory-intensive. In this paper, we study the computational structure of Vietoris-Rips persistence barcodes, and identify several unique mathematical properties and algorithmic opportunities with connections to the GPU. Mathematically and empirically, we examine the properties of apparent pairs, which are independently identifiable persistence pairs comprising up to 99% of persistence pairs. We give theoretical upper and lower bounds of the apparent pair rate and model the average case. We also design massively parallel algorithms to take advantage of the very large number of simplices that can be processed independently of each other. Having identified these opportunities, we develop GPU-accelerated software for computing Vietoris-Rips persistence barcodes, called Ripser++. The software achieves up to 30x speedup over the total execution time of the original Ripser and also reduces CPU-memory usage by up to 2.0x. We believe our GPU-acceleration-based efforts open a new chapter for the advancement of topological data analysis in the post-Moore's Law era.
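For the ST-REM abstract (item 1 above), the self-labeling loop can be illustrated in a deliberately simplified form. The abstract does not name the interpolation method or the trust score, so this minimal sketch assumes inverse-distance weighting (IDW) as the spatial interpolator and the absolute residual against the trusted-only interpolation as the trust score. It omits the mobile user's long-term behavior, which ST-REM also weighs. Coordinates, dBm values, and the threshold are hypothetical.

```python
import math


def idw_predict(x, y, trusted, power=2.0):
    # Inverse-distance-weighted estimate at (x, y) from trusted (x, y, value) measurements.
    num = den = 0.0
    for xi, yi, v in trusted:
        d = math.hypot(x - xi, y - yi)
        if d == 0.0:
            return v  # location already measured by a trusted sensor
        w = 1.0 / d ** power
        num += w * v
        den += w
    return num / den


def self_label(anchors, candidates, threshold):
    # Repeatedly admit the candidate that best agrees with the current trusted map.
    trusted = list(anchors)
    pending = list(candidates)
    while pending:
        residuals = [abs(v - idw_predict(x, y, trusted)) for x, y, v in pending]
        best = min(range(len(pending)), key=residuals.__getitem__)
        if residuals[best] > threshold:
            break  # every remaining measurement disagrees with the trusted map
        trusted.append(pending.pop(best))
    return trusted, pending


if __name__ == "__main__":
    anchors = [(0.0, 0.0, -50.0), (10.0, 0.0, -50.0)]  # hypothetical anchor readings (dBm)
    candidates = [(5.0, 0.0, -51.0), (5.0, 1.0, -20.0)]  # one plausible, one falsified
    trusted, rejected = self_label(anchors, candidates, threshold=5.0)
    print(len(trusted), rejected)
```

Admitting one measurement per round means each later residual is computed against a map that already includes the earlier, more trustworthy measurements, which is the "gradually incorporating" idea in the abstract.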
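For the Swarm AI abstract (item 2 above), the predictive accuracy of each assessment method comes down to correlating trait scores with performance metrics across teams. The abstract does not name the correlation statistic; this minimal sketch assumes Pearson's r, computed from scratch.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation between two equal-length sequences (assumed statistic)."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    if sx == 0 or sy == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / (sx * sy)
```

Computed once per (trait, metric) pair for the survey-based scores and once for the Swarm AI scores, the two correlations can then be compared. The abstract reports significance of the difference, which requires a statistical test that this sketch omits.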
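For the Team-PSRO abstract (item 3 above), the underlying PSRO loop alternates between solving a restricted meta-game over the current population and adding a best response to the resulting meta-strategy. This minimal sketch shows plain PSRO, not Team-PSRO, under several simplifying assumptions: a symmetric zero-sum matrix game (rock-paper-scissors) with one shared population, exact best responses in place of reinforcement learning, and self-play fictitious play as an approximate meta-solver.

```python
# Rock-paper-scissors payoffs for the row player: A[i][j] is i's payoff against j.
RPS = [
    [0, -1, 1],   # rock
    [1, 0, -1],   # paper
    [-1, 1, 0],   # scissors
]


def fictitious_play(M, iters=2000):
    # Approximate a symmetric equilibrium of the restricted game M by self-play fictitious play.
    k = len(M)
    counts = [0] * k
    counts[0] = 1
    for _ in range(iters):
        total = sum(counts)
        mix = [c / total for c in counts]
        values = [sum(M[i][j] * mix[j] for j in range(k)) for i in range(k)]
        counts[max(range(k), key=lambda i: values[i])] += 1
    total = sum(counts)
    return [c / total for c in counts]


def psro(A, init=0, max_iters=10):
    population = [init]  # indices of pure strategies ("policies") found so far
    sigma = [1.0]
    for _ in range(max_iters):
        # Restricted meta-game among the current population members.
        M = [[A[i][j] for j in population] for i in population]
        sigma = fictitious_play(M)
        # Best response over the full strategy set to the opponent's meta-strategy.
        values = [sum(A[a][p] * w for p, w in zip(population, sigma)) for a in range(len(A))]
        br = max(range(len(A)), key=lambda a: values[a])
        if br in population:
            break  # best response is already in the population: nothing new to add
        population.append(br)
    return population, sigma
```

Starting from rock alone, the loop adds paper, then scissors, and stops once the best response is already in the population. Team-PSRO, per the abstract, replaces each best response with a joint best response that both teams learn via reinforcement learning.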
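For the team-optimization abstract (item 4 above), the per-step decision is to pick the candidate whose addition moves the team closest to high-performance teams. This minimal sketch shows only that one-step greedy choice under assumed representations: each member is a skill vector, a team is the sum of its members' vectors, and similarity is cosine similarity to a hypothetical centroid of high-performance teams. The paper instead formulates staffing as an MDP and uses deep reinforcement learning value estimates, which can trade immediate similarity against future steps; this greedy rule does not.

```python
import math


def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0


def team_vector(members):
    # Assumed team representation: element-wise sum of member skill vectors.
    return [sum(col) for col in zip(*members)]


def pick_candidate(team, candidates, target):
    # Greedy one-step choice: the candidate whose addition maximizes similarity to target.
    def score(c):
        return cosine(team_vector(team + [c]), target)
    return max(range(len(candidates)), key=lambda i: score(candidates[i]))
```

For example, a team with skill vector `[1.0, 0.0]` and a target profile `[1.0, 1.0]` would pick a candidate `[0.0, 1.0]` over another `[1.0, 0.0]`, since the first fills the missing skill dimension.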
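For the Ripser++ abstract (item 5 above), an apparent pair (σ, τ) is one in which σ is the youngest (latest) facet of τ and τ is the oldest (earliest) cofacet of σ in the simplex order, so each pair is identifiable independently of the others. This minimal sketch finds apparent (edge, triangle) pairs serially from a distance matrix. It orders simplices by diameter and breaks ties lexicographically by vertex tuple; that tie-break is an assumption for simplicity, and Ripser and Ripser++ use their own combinatorial ordering, so results can differ on inputs with tied diameters.

```python
from itertools import combinations


def simplex_key(simplex, D):
    # Assumed filtration order: diameter first, then the sorted vertex tuple.
    diam = max((D[i][j] for i, j in combinations(simplex, 2)), default=0)
    return (diam, simplex)


def apparent_edge_triangle_pairs(D):
    n = len(D)
    pairs = []
    for edge in combinations(range(n), 2):
        cofacets = [tuple(sorted(edge + (k,))) for k in range(n) if k not in edge]
        if not cofacets:
            continue
        oldest = min(cofacets, key=lambda t: simplex_key(t, D))  # earliest cofacet of edge
        youngest = max(combinations(oldest, 2), key=lambda e: simplex_key(e, D))  # latest facet
        if youngest == edge:
            pairs.append((edge, oldest))
    return pairs


if __name__ == "__main__":
    D = [
        [0, 1, 2, 3],
        [1, 0, 4, 5],
        [2, 4, 0, 6],
        [3, 5, 6, 0],
    ]
    print(apparent_edge_triangle_pairs(D))
```

On this toy dissimilarity matrix the sketch returns `[((1, 2), (0, 1, 2)), ((1, 3), (0, 1, 3)), ((2, 3), (0, 2, 3))]`; in each pair the edge and triangle have the same diameter (4, 5, and 6), i.e., a zero-length persistence interval. Because the check for one edge never reads the result for another, the loop body could run in parallel across edges.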