skip to main content
US FlagAn official website of the United States government
dot gov icon
Official websites use .gov
A .gov website belongs to an official government organization in the United States.
https lock icon
Secure .gov websites use HTTPS
A lock ( lock ) or https:// means you've safely connected to the .gov website. Share sensitive information only on official, secure websites.


Title: Reinforcement Learning for Multi-Hop Scheduling and Routing of Real-Time Flows
We consider the problem of serving real-time flows over a multi-hop wireless network. Each flow is composed of packets that have strict deadlines, and the goal is to maximize the weighted timely throughput of the system. Consistent with recent developments using mm-wave communications, we assume that the links are directional, but are lossy, and have unknown probabilities of successful packet transmission. An average link utilization budget (similar to a power constraint) constrains the system. We pose the problem in the form of a Constrained Markov Decision Process (CMDP) with an unknown transition kernel. We use a duality approach to decompose the problem into an inner unconstrained MDP with link usage costs, and an outer linkcost update step. For the inner MDP, we develop modelbased reinforcement learning algorithms that sample links by sending packets to learn the link statistics. While the first algorithm type samples links at will at the beginning and constructs the model, the second type is an online approach that can only use packets from flows to sample links that they traverse. The approach to the outer problem follows gradient descent. We characterize the sample complexity (number of packets transmitted) to obtain near-optimal policies, to show that a basic online approach has a poorer sample complexity bound, it can be modified to obtain an online algorithm that has excellent empirical performance.  more » « less
Award ID(s):
1719384
PAR ID:
10296546
Author(s) / Creator(s):
; ;
Date Published:
Journal Name:
Proceedings of the International Symposium on Modeling and Optimization in Mobile Ad Hoc and Wireless Networks
ISSN:
2690-3334
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. null (Ed.)
    We propose a new simple and natural algorithm for learning the optimal Q-value function of a discounted-cost Markov decision process (MDP) when the transition kernels are unknown. Unlike the classical learning algorithms for MDPs, such as Q-learning and actor-critic algorithms, this algorithm does not depend on a stochastic approximation-based method. We show that our algorithm, which we call the empirical Q-value iteration algorithm, converges to the optimal Q-value function. We also give a rate of convergence or a nonasymptotic sample complexity bound and show that an asynchronous (or online) version of the algorithm will also work. Preliminary experimental results suggest a faster rate of convergence to a ballpark estimate for our algorithm compared with stochastic approximation-based algorithms. 
    more » « less
  2. This paper considers a wireless network where multiple flows are delivering status updates about their respective information sources. An end user aims to make accurate real-time estimations about the status of each information source using its received packets. As the accuracy of estimation is most impacted by events in the recent past, we propose to measure the credibility of an information flow by the number of timely deliveries in a window of the recent past, and say that a flow suffers from a loss-of-credibility (LoC) if this number is insufficient for the end user to make an accurate estimation. We then study the problem of minimizing the system-wide LoC in wireless networks where each flow has different requirement and link quality. We show that the problem of minimizing the system-wide LoC requires the control of temporal variance of timely deliveries for each flow. This feature makes our problem significantly different from other optimization problems that only involves the average of control variables. Surprisingly, we show that there exists a simple online scheduling algorithm that is near-optimal. Simulation results show that our proposed algorithm is significantly better than other state-of-the-art policies. 
    more » « less
  3. null (Ed.)
    Link prediction is an important task in online social networking as it can be used to infer new or previously unknown relationships of a network. However, due to the homophily principle, current algorithms are susceptible to promoting links that may lead to increase segregation of the network—an effect known as filter bubble. In this study, we examine the filter bubble problem from the perspective of algorithm fairness and introduce a dyadic-level fairness criterion based on network modularity measure. We show how the criterion can be utilized as a postprocessing step to generate more heterogeneous links in order to overcome the filter bubble problem. In addition, we also present a novel framework that combines adversarial network representation learning with supervised link prediction to alleviate the filter bubble problem. Experimental results conducted on several real-world datasets showed the effectiveness of the proposed methods compared to other baseline approaches, which include conventional link prediction and fairness-aware methods for i.i.d data. 
    more » « less
  4. Oh, A; Naumann, T; Globerson, A; Saenko, K; Hardt, M; Levine, S (Ed.)
    We study the problem of agnostic PAC reinforcement learning (RL): given a policy class Pi, how many rounds of interaction with an unknown MDP (with a potentially large state and action space) are required to learn an epsilon-suboptimal policy with respect to Pi? Towards that end, we introduce a new complexity measure, called the spanning capacity, that depends solely on the set Pi and is independent of the MDP dynamics. With a generative model, we show that the spanning capacity characterizes PAC learnability for every policy class Pi. However, for online RL, the situation is more subtle. We show there exists a policy class Pi with a bounded spanning capacity that requires a superpolynomial number of samples to learn. This reveals a surprising separation for agnostic learnability between generative access and online access models (as well as between deterministic/stochastic MDPs under online access). On the positive side, we identify an additional sunflower structure which in conjunction with bounded spanning capacity enables statistically efficient online RL via a new algorithm called POPLER, which takes inspiration from classical importance sampling methods as well as recent developments for reachable-state identification and policy evaluation in reward-free exploration. 
    more » « less
  5. This paper studies the problem of robustifying an interdependent network by rewiring a small number of links in realtime during a cascading attack. Interdependent networks have been widely used to model interconnected complex systems such as a critical infrastructure network including both the power grid and the Internet. Realtime robustification of interdependent networks, therefore, has significant practical importance. This paper formulates the problem using the Markov decision process (MDP) framework. We first show the problem is NP-hard and then develop an effective and efficient greedy algorithm, named R EAL W IRE , to robustify the network in realtime. R EAL W IRE scores each link (and each node) based on the expected number of links failures resulted from the failure of the link (or the node), and rewires the links greedily according to the scores. Extensive experimental results show that R EAL W IRE outperforms other algorithms on multiple trobustness metrics. 
    more » « less