Title: NetworkGym: Reinforcement Learning Environments for Multi-Access Traffic Management in Network Simulation
Mobile devices such as smartphones, laptops, and tablets can often connect to multiple access networks (e.g., Wi-Fi, LTE, and 5G) simultaneously. Recent advancements facilitate seamless integration of these connections below the transport layer, enhancing the experience for apps that lack inherent multi-path support. This optimization hinges on dynamically determining the traffic distribution across networks for each device, a process referred to as multi-access traffic splitting. This paper introduces NetworkGym, a high-fidelity network environment simulator that supports generating multiple network traffic flows and multi-access traffic splitting, and that enables training and evaluating different RL-based solutions to the multi-access traffic splitting problem. Our initial explorations demonstrate that the majority of existing state-of-the-art offline RL algorithms (e.g., CQL) fail to outperform certain hand-crafted heuristic policies on average. This illustrates the urgent need to evaluate offline RL algorithms against a broader range of benchmarks, rather than relying solely on popular ones such as D4RL. We also propose an extension to the TD3+BC algorithm, named Pessimistic TD3 (PTD3), and demonstrate that it outperforms many state-of-the-art offline RL algorithms. PTD3's behavioral constraint mechanism, which relies on value-function pessimism, is theoretically motivated and relatively simple to implement.
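The abstract only summarizes PTD3's pessimism mechanism. As a rough illustration of what a value-function-pessimism constraint can look like in a TD3-style actor update, consider the sketch below; the ensemble lower-confidence-bound construction and the beta coefficient are assumptions for illustration, not the paper's exact formulation.

```python
# Illustrative sketch (not PTD3's actual code): a TD3-style actor loss
# where the TD3+BC behavior-cloning term is replaced by pessimism,
# here an ensemble lower confidence bound (LCB) on the Q-value.
import torch

def pessimistic_actor_loss(critics, actor, states, beta=1.0):
    """Maximize a pessimistic (LCB) value estimate over a critic ensemble."""
    actions = actor(states)
    # Q-estimates from each critic in the ensemble: shape (n_critics, batch)
    qs = torch.stack([q(states, actions) for q in critics])
    # Pessimistic value: ensemble mean minus beta * ensemble std.
    lcb = qs.mean(dim=0) - beta * qs.std(dim=0)
    return -lcb.mean()  # negated so gradient descent ascends the LCB
```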
Jiayi Song, Roch Guérin
(Proceedings of the 33rd International Teletraffic Congress (ITC 33))
Much of today's traffic flows between datacenters over private networks. The operators of those networks have access to detailed traffic profiles with performance goals that need to be met as efficiently as possible, e.g., realizing latency guarantees with minimal network bandwidth. Of particular interest is the extent to which traffic (re)shaping can be of benefit. The paper focuses on the most basic network configuration, namely, a single link network, with extensions to more general, multi-node networks discussed in a companion paper. The main results are in the form of optimal solutions for different types of schedulers of varying complexity. They demonstrate how judicious traffic shaping can help lower complexity schedulers perform nearly as well as more complex ones.
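The paper's optimal shaper constructions are not reproduced in this abstract; for readers unfamiliar with traffic shaping, the sketch below shows the canonical token-bucket (rate, burst) primitive that such reshaping builds on. This is generic background, not the paper's solution.

```python
# Generic token-bucket shaper: a packet is released once enough tokens
# (bytes of credit, refilled at `rate`) have accumulated, bounding a
# flow to `rate` long-term while allowing bursts up to `burst`.
class TokenBucket:
    def __init__(self, rate, burst):
        self.rate = float(rate)    # token refill rate, bytes/second
        self.burst = float(burst)  # bucket depth, bytes
        self.tokens = float(burst)
        self.last = 0.0

    def release_time(self, now, pkt_bytes):
        """Earliest time a packet of pkt_bytes may be released."""
        # Refill credit accrued since the last event, capped at the depth.
        self.tokens = min(self.burst, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= pkt_bytes:
            self.tokens -= pkt_bytes
            return now
        # Otherwise wait until refill exactly covers the deficit.
        wait = (pkt_bytes - self.tokens) / self.rate
        self.tokens = 0.0
        self.last = now + wait
        return now + wait
```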
Hu, Xinyue; Ghosh, Arnob; Liu, Xin; Zhang, Zhi-Li; Shroff, Ness
(2023 IEEE International Workshop Technical Committee on Communications Quality and Reliability (CQR))
The adaptive bitrate selection (ABR) mechanism, which decides the bitrate for each video chunk, is an important part of video streaming. There has been significant interest in developing Reinforcement-Learning (RL) based ABR algorithms because of their ability to learn efficient bitrate actions based on past data and their demonstrated improvements over wired, 3G, and 4G networks. However, the Quality of Experience (QoE), especially video stall time, of state-of-the-art ABR algorithms, including the RL-based approaches, falls short of expectations over commercial mmWave 5G networks, due to widely and wildly fluctuating throughput. These algorithms find optimal policies for a multi-objective unconstrained problem where the policies inherently depend on the predefined weight parameters of the multiple objectives (e.g., bitrate maximization, stall-time minimization). Our empirical evaluation suggests that such a policy cannot adequately adapt to the high variations of 5G throughput, resulting in long stall times. To address these issues, we formulate the ABR selection problem as a constrained Markov Decision Process where the objective is to maximize the QoE subject to a stall-time constraint. The strength of this formulation is that it helps mitigate the stall time while maintaining high bitrates. We propose COREL, a primal-dual actor-critic RL algorithm, which incorporates an additional critic network to estimate stall time compared to existing RL-based approaches and can tune the optimal dual variable or weight to guide the policy towards minimizing stall time. Our experiment results across various commercial mmWave 5G traces reveal that COREL reduces the average stall time by a factor of 4 and the 95th percentile by a factor of 2.
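COREL's implementation is not shown in the abstract; the sketch below captures the generic primal-dual pattern it describes, in which a dual variable trades off QoE against estimated stall time and is updated toward satisfying the constraint. All names and the learning rate are illustrative assumptions, not COREL's code.

```python
# Generic primal-dual pattern for a constrained MDP (illustrative, not
# COREL-specific): the policy maximizes reward minus a lambda-weighted
# cost, while lambda rises whenever the stall constraint is violated.
def lagrangian_reward(qoe_reward, stall_cost, lmbda):
    """Reward signal the actor-critic optimizes at a given lambda."""
    return qoe_reward - lmbda * stall_cost

def dual_update(lmbda, estimated_stall, stall_budget, lr=1e-3):
    """Projected (non-negative) gradient ascent on the dual variable."""
    return max(0.0, lmbda + lr * (estimated_stall - stall_budget))
```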
Kumarasamy, Vijayalakshmi K; Saroj, Abhilasha Jairam; Liang, Yu; Wu, Dalei; Hunter, Michael P; Guin, Angshuman; Sartipi, Mina
(Symmetry)
Machine learning (ML) methods, particularly Reinforcement Learning (RL), have gained widespread attention for optimizing traffic signal control in intelligent transportation systems. However, existing ML approaches often exhibit limitations in scalability and adaptability, particularly within large traffic networks. This paper introduces an innovative solution by integrating decentralized graph-based multi-agent reinforcement learning (DGMARL) with a Digital Twin to enhance traffic signal optimization, targeting the reduction of traffic congestion and network-wide fuel consumption associated with vehicle stops and stop delays. In this approach, DGMARL agents are employed to learn traffic state patterns and make informed decisions regarding traffic signal control. The integration with a Digital Twin module further facilitates this process by simulating and replicating the real-time asymmetric traffic behaviors of a complex traffic network. The evaluation of this proposed methodology utilized PTV-Vissim, a traffic simulation package that also serves as the simulation engine for the Digital Twin. The study focused on the Martin Luther King (MLK) Smart Corridor in Chattanooga, Tennessee, USA, considering symmetric and asymmetric road layouts and traffic conditions. Comparative analysis against an actuated signal control baseline revealed significant improvements. Experiment results demonstrate a 55.38% reduction in Eco_PI, a purpose-built performance measure capturing the cumulative impact of stops and penalized stop delays on fuel consumption, over a 24 h scenario. In a PM-peak-hour scenario, the average reduction in Eco_PI reached 38.94%, indicating the substantial improvement achieved in optimizing traffic flow and reducing fuel consumption during high-demand periods. These findings underscore the effectiveness of the integrated DGMARL and Digital Twin approach in optimizing traffic signals, contributing to a more sustainable and efficient traffic management system.
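The abstract defines Eco_PI only qualitatively, as a cumulative measure of stops and penalized stop delays tied to fuel consumption. Purely to make that description concrete, a hypothetical index of that shape might be computed as below; the weights and functional form are assumptions, not the paper's definition.

```python
# Hypothetical stop/delay index of the shape the abstract describes
# (NOT the paper's actual Eco_PI definition): a weighted sum of stop
# counts and penalized stop delays, aggregated over vehicles.
def stop_delay_index(stops, stop_delays_sec, w_stop=1.0, w_delay=0.1):
    """Illustrative cumulative penalty over per-vehicle stop statistics."""
    return sum(w_stop * s + w_delay * d for s, d in zip(stops, stop_delays_sec))
```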
Kidambi, Rahul; Rajeswaran, Aravind; Netrapalli, Praneeth; Joachims, Thorsten
(Advances in Neural Information Processing Systems)
In offline reinforcement learning (RL), the goal is to learn a highly rewarding policy based solely on a dataset of historical interactions with the environment. This serves as an extreme test of an agent's ability to effectively use historical data, which is known to be critical for efficient RL. Prior work in offline RL has been confined almost exclusively to model-free RL approaches. In this work, we present MOReL, an algorithmic framework for model-based offline RL. This framework consists of two steps: (a) learning a pessimistic MDP using the offline dataset; (b) learning a near-optimal policy in this pessimistic MDP. The design of the pessimistic MDP is such that for any policy, the performance in the real environment is approximately lower-bounded by the performance in the pessimistic MDP. This enables the pessimistic MDP to serve as a good surrogate for purposes of policy evaluation and learning. Theoretically, we show that MOReL is minimax optimal (up to log factors) for offline RL. Empirically, MOReL matches or exceeds state-of-the-art results on widely used offline RL benchmarks. Overall, the modular design of MOReL enables translating advances in its components (e.g., in model learning, planning, etc.) to improvements in offline RL.
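As a rough sketch of the pessimistic-MDP idea the abstract describes, the snippet below redirects transitions on which a dynamics-model ensemble disagrees strongly to an absorbing low-reward HALT state; the disagreement measure and threshold are illustrative assumptions, not MOReL's exact construction.

```python
# Sketch of a pessimistic-MDP step built from a dynamics ensemble
# (illustrative; MOReL's actual unknown-state detector may differ).
import numpy as np

def pessimistic_step(models, state, action, threshold, halt_reward):
    """Step the learned MDP, halting wherever the ensemble is uncertain."""
    preds = np.stack([m(state, action) for m in models])  # (n_models, state_dim)
    disagreement = np.max(np.linalg.norm(preds - preds.mean(axis=0), axis=1))
    if disagreement > threshold:
        # Unknown region: absorb into HALT with a pessimistic reward.
        return state, halt_reward, True
    return preds.mean(axis=0), 0.0, False  # reward comes from the learned r(s, a)
```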
Qiu, Haoran; Mao, Weichao; Patke, Archit; Wang, Chen; Franke, Hubertus; Kalbarczyk, Zbigniew T.; Başar, Tamer; Iyer, Ravishankar K.
(Proceedings of the 13th ACM Symposium on Cloud Computing (SoCC 2022))
Serverless Function-as-a-Service (FaaS) offers improved programmability for customers, yet it is not server-“less” and comes at the cost of more complex infrastructure management (e.g., resource provisioning and scheduling) for cloud providers. To maintain function service-level objectives (SLOs) and improve resource utilization efficiency, recent research has been focused on applying online learning algorithms such as reinforcement learning (RL) to manage resources. Compared to rule-based solutions with heuristics, RL-based approaches eliminate humans in the loop and avoid the painstaking generation of heuristics. Despite the initial success of applying RL, we first show in this paper that the state-of-the-art single-agent RL algorithm (S-RL) suffers up to 4.8x higher p99 function latency degradation on multi-tenant serverless FaaS platforms compared to isolated environments and is unable to converge during training. We then design and implement a scalable and incremental multi-agent RL framework based on Proximal Policy Optimization (SIMPPO). Our experiments on widely used serverless benchmarks demonstrate that in multi-tenant environments, SIMPPO enables each RL agent to efficiently converge during training and provides online function latency performance comparable to that of S-RL trained in isolation (which we refer to as the baseline for assessing RL performance) with minor degradation (<9.2%). In addition, SIMPPO reduces the p99 function latency by 4.5x compared to S-RL in multi-tenant cases.
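SIMPPO itself is not shown here; since it builds on Proximal Policy Optimization, the textbook PPO clipped surrogate that each agent would optimize is sketched below for reference. This is standard PPO, not SIMPPO-specific code.

```python
# Standard PPO clipped surrogate objective (Schulman et al., 2017),
# the per-agent building block a MARL framework like SIMPPO layers on.
import torch

def ppo_clip_loss(logp_new, logp_old, advantages, clip=0.2):
    """Negative clipped surrogate; minimize with SGD to improve the policy."""
    ratio = torch.exp(logp_new - logp_old)  # importance weight pi_new / pi_old
    unclipped = ratio * advantages
    clipped = torch.clamp(ratio, 1.0 - clip, 1.0 + clip) * advantages
    return -torch.min(unclipped, clipped).mean()
```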
Haider, Momin, Yin, Ming, Zhang, Menglei, Gupta, Arpit, Zhu, Jing, and Wang, Yu-Xiang.
"NetworkGym: Reinforcement Learning Environments for Multi-Access Traffic Management in Network Simulation". Proceedings of Machine Learning Research (). Country unknown/Code not available: NeurIPS 2024 Dataset and Benchmark. https://par.nsf.gov/biblio/10558127.
@article{osti_10558127,
title = {NetworkGym: Reinforcement Learning Environments for Multi-Access Traffic Management in Network Simulation},
url = {https://par.nsf.gov/biblio/10558127},
abstractNote = {Mobile devices such as smartphones, laptops, and tablets can often connect to multiple access networks (e.g., Wi-Fi, LTE, and 5G) simultaneously. Recent advancements facilitate seamless integration of these connections below the transport layer, enhancing the experience for apps that lack inherent multi-path support. This optimization hinges on dynamically determining the traffic distribution across networks for each device, a process referred to as \textit{multi-access traffic splitting}. This paper introduces \textit{NetworkGym}, a high-fidelity network environment simulator that supports generating multiple network traffic flows and multi-access traffic splitting, and that enables training and evaluating different RL-based solutions to the multi-access traffic splitting problem. Our initial explorations demonstrate that the majority of existing state-of-the-art offline RL algorithms (e.g., CQL) fail to outperform certain hand-crafted heuristic policies on average. This illustrates the urgent need to evaluate offline RL algorithms against a broader range of benchmarks, rather than relying solely on popular ones such as D4RL. We also propose an extension to the TD3+BC algorithm, named Pessimistic TD3 (PTD3), and demonstrate that it outperforms many state-of-the-art offline RL algorithms. PTD3's behavioral constraint mechanism, which relies on value-function pessimism, is theoretically motivated and relatively simple to implement.},
journal = {Proceedings of Machine Learning Research},
publisher = {NeurIPS 2024 Dataset and Benchmark},
author = {Haider, Momin and Yin, Ming and Zhang, Menglei and Gupta, Arpit and Zhu, Jing and Wang, Yu-Xiang},
}