Title: Reinforcement Learning for Mixed Cooperative/Competitive Dynamic Spectrum Access
This paper considers a dynamic spectrum sharing problem, arising from the DARPA Spectrum Collaboration Challenge, with a mixed collaborative/competitive objective and only partial information about peers' performance. Because the problem is highly complex and its state space is enormous, it is broken down into three subproblems: channel selection, flow admission control, and transmission schedule assignment. The channel selection problem is the focus of this paper. A reinforcement learning algorithm based on a reduced state is developed to select channels, and a neural network is used as a function approximator to fill in missing values in the resulting input-action matrix. The performance is compared with that obtained by a hand-tuned expert system.
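As a rough illustration of what learning channel selection over a reduced state can look like, the sketch below uses plain tabular Q-learning with epsilon-greedy exploration. It is not the paper's algorithm: the two-state environment, the `success_prob` table, the hyperparameters, and every function name are hypothetical, and the neural-network approximator that fills in unvisited entries is omitted.

```python
import random

def select_channel(q_row, epsilon, rng):
    """Epsilon-greedy choice over the Q-values of one reduced state."""
    if rng.random() < epsilon:
        return rng.randrange(len(q_row))  # explore a random channel
    return max(range(len(q_row)), key=lambda a: q_row[a])  # exploit

def q_update(q, state, action, reward, next_state, alpha, gamma):
    """Standard one-step Q-learning update, applied in place."""
    best_next = max(q[next_state])
    q[state][action] += alpha * (reward + gamma * best_next - q[state][action])

def train(steps=20000, epsilon=0.1, alpha=0.05, gamma=0.5, seed=0):
    rng = random.Random(seed)
    # Hypothetical toy environment: success_prob[state][channel] is the
    # chance that a transmission succeeds (reward 1) on that channel.
    success_prob = [[0.1, 0.9, 0.2],
                    [0.1, 0.2, 0.9]]
    num_states, num_channels = len(success_prob), len(success_prob[0])
    q = [[0.0] * num_channels for _ in range(num_states)]
    state = 0
    for _ in range(steps):
        action = select_channel(q[state], epsilon, rng)
        reward = 1.0 if rng.random() < success_prob[state][action] else 0.0
        next_state = rng.randrange(num_states)  # assumed: state changes at random
        q_update(q, state, action, reward, next_state, alpha, gamma)
        state = next_state
    return q

if __name__ == "__main__":
    for s, row in enumerate(train()):
        print(s, [round(v, 2) for v in row])
```

In this toy setting the table has only six entries, so every entry is visited often. With a realistic state space many entries would never be visited, which is the gap the abstract says a function approximator is used to fill.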
Award ID(s):
1642973
NSF-PAR ID:
10133367
Author(s) / Creator(s):
; ; ; ; ;
Date Published:
Journal Name:
Proceedings of the 2019 IEEE International Symposium on Dynamic Spectrum Access Networks (DySPAN)
Page Range / eLocation ID:
1 to 6
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. The success of dynamic spectrum sharing in wireless networks depends on reliable automated enforcement of spectrum access policies. In this paper, a crowdsourced approach is used to select volunteers to detect spectrum misuse. Volunteer selection is based on multiple criteria, including reputation, likelihood of being in a region, and ability to effectively detect channel misuse. We formulate the volunteer selection problem as a stable matching problem in which volunteers' monitoring preferences are matched to channels' attributes. Given a set of volunteers, the objective is to ensure maximum coverage of the spectrum enforcement area and accurate detection of spectrum access violations on all channels in the area. The two matching algorithms, Volunteer Matching (VM) and Reverse Volunteer Matching (RVM), are based on variants of the Gale-Shapley algorithm for stable matching. We also propose two hybrid algorithms, HYBRID-VM and HYBRID-RVM, that augment the matching algorithms with a Secretary-based algorithm to overcome the shortcomings of the individual vanilla algorithms. Simulation results show that volunteer selection using HYBRID-VM gives better coverage of the region (19.2% better than the threshold-based Secretary algorithm), better detection accuracy, and better volunteer happiness than the other algorithms tested.
  2. To address the scarcity of spectrum, the FCC mandated dynamic sharing of spectrum among the different tiers of users. The success of spectrum sharing, however, relies on the automated enforcement of spectrum policies. We focus on ex post spectrum enforcement during/after the occurrence of a potentially harmful event, but before/after an actual harm has occurred. The major challenges we address are ensuring maximum channel coverage in a given region of enforcement, accurate and reliable detection of violations, and the choice of an efficient algorithm for selecting the entities that detect violations. We adopt a crowdsourced methodology to monitor spectrum usage. We ensure maximum coverage of the given area by dividing it into equal-sized regions and solve the enforcement problem with a divide-and-conquer mechanism over the entire region. We use a variant of the Multiple Choice Secretary algorithm to select volunteers. Finally, we simulate the enforcement framework and analyze the results.
  3. The Industrial Internet of Things (IIoT) has been shown to be of great value to the deployment of smart industrial environments. With the immense growth in the number of IoT devices, dynamic spectrum sharing has been introduced as a promising solution to the spectrum shortage in IIoT. Meanwhile, cyber-physical safety remains a great concern for the reliable operation of IIoT systems. In this paper, we consider dynamic spectrum access in IIoT under a Received Signal Strength (RSS) based adversarial localization attack. We employ a practical and effective power perturbation approach to mitigate the localization threat to the IoT devices and cast the privacy-preserving spectrum sharing problem as a stochastic channel selection game. To address the randomness induced by the power perturbation approach, we develop a two-timescale distributed learning algorithm that converges almost surely to the set of correlated equilibria of the game. The numerical results show the convergence of the algorithm and corroborate that the two-timescale learning design effectively alleviates the network throughput degradation caused by the power perturbation procedure.
  4. We consider the optimal link rate selection problem in time-varying wireless channels with unknown channel statistics. The aim of optimal link rate selection is to transmit at the optimal rate in each time slot in order to maximize the expected throughput of the wireless channel/link or, equivalently, to minimize the expected regret. Lack of information about the channel state or channel statistics necessitates the use of online/sequential learning algorithms to determine the optimal rate. We present an algorithm called CoTS (Constrained Thompson Sampling), which improves upon the current state of the art, is fast, and is general in the sense that it can handle several different constraints in the problem with the same algorithm. We also prove an asymptotic lower bound on the expected regret and a high-probability, large-horizon upper bound on the regret, which show that the regret grows logarithmically with time in an order sense. We also provide numerical results which establish that CoTS significantly outperforms the current state-of-the-art algorithms.
  5. In a wireless network with dynamic spectrum sharing, tracking temporal spectrum holes across a wide spectrum band is a challenging task. We consider a scenario in which the spectrum is divided into a large number of bands or channels, each of which has the potential to provide dynamic spectrum access opportunities. The occupancy times of each band by primary users are generally non-exponentially distributed. We develop an approach to determine and parameterize a small selected subset of the bands with good spectrum access opportunities, using limited computational resources under noisy measurements. We model the noisy measurements of the received signal in each band as a bivariate Markov modulated Gaussian process, which can be viewed as a continuous-time bivariate Markov chain observed through Gaussian noise. The underlying bivariate Markov process allows for the characterization of non-exponentially distributed state sojourn times. The proposed scheme combines an online expectation-maximization algorithm for parameter estimation with a computing budget allocation algorithm. Observation time is allocated across the bands to determine a subset of the G frequency bands with the largest mean idle times for dynamic spectrum access and, at the same time, to obtain accurate parameter estimates for this subset of bands. Our simulation results show that when channel holding times are non-exponential, the proposed scheme achieves a substantial improvement in the probability of correct selection of the best subset of bands compared to an approach based on a (univariate) Markov modulated Gaussian process model.
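Item 1 above builds its matching algorithms on variants of the Gale-Shapley algorithm. For readers unfamiliar with it, here is a minimal sketch of the textbook version only, not of VM, RVM, or the hybrid variants. It assumes equal numbers of proposers and receivers with complete preference lists; the volunteer and channel names and their preferences are made up for illustration.

```python
def gale_shapley(proposer_prefs, receiver_prefs):
    """Textbook Gale-Shapley. Returns a stable matching {receiver: proposer}.

    Both arguments map a name to its full preference list, most preferred
    first. Assumes equal-sized sides and complete lists.
    """
    # rank[r][p] is the position of proposer p in receiver r's list.
    rank = {r: {p: i for i, p in enumerate(prefs)}
            for r, prefs in receiver_prefs.items()}
    next_choice = {p: 0 for p in proposer_prefs}  # next index to propose to
    engaged = {}                                  # receiver -> proposer
    free = list(proposer_prefs)
    while free:
        p = free.pop()
        r = proposer_prefs[p][next_choice[p]]
        next_choice[p] += 1
        current = engaged.get(r)
        if current is None:
            engaged[r] = p                 # receiver was unmatched
        elif rank[r][p] < rank[r][current]:
            engaged[r] = p                 # receiver trades up
            free.append(current)
        else:
            free.append(p)                 # proposal rejected
    return engaged

# Hypothetical data: volunteers propose to channels.
volunteer_prefs = {
    "v1": ["c1", "c2", "c3"],
    "v2": ["c1", "c3", "c2"],
    "v3": ["c2", "c1", "c3"],
}
channel_prefs = {
    "c1": ["v2", "v1", "v3"],
    "c2": ["v1", "v3", "v2"],
    "c3": ["v3", "v2", "v1"],
}

if __name__ == "__main__":
    print(gale_shapley(volunteer_prefs, channel_prefs))
```

Swapping the two arguments makes the channels propose instead, which yields the matching most favorable to the channels rather than to the volunteers.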