This content will become publicly available on August 27, 2026

Title: Reinforcement Learning Environment for 5G-Enabled IoT Device Resource Management
This research project aims to develop a resource management framework for the efficient allocation of 5G network resources to IoT (Internet of Things) devices. As 5G technology is increasingly integrated with IoT applications, the diverse demands and use cases of IoT devices necessitate dynamic resource management. The focus of this study is to develop an IoT device environment that uses reinforcement learning (RL) for resource adjustment. The environment observes IoT device parameters including the current BER (bit error rate), allocated bandwidth, and current signal power level. Actions the RL agent can take on the environment include adjustments to the bandwidth and the signal power level of an IoT device. One implementation of the environment is currently tested with the PPO (Proximal Policy Optimization) and DDPG (Deep Deterministic Policy Gradient) RL algorithms using a continuous action space. Initial results show that PPO models train at a faster rate, while DDPG models explore a wider range of states, leading to better model predictions. Another version is tested with PPO and DQN (Deep Q-Networks) using a discrete action space. DQN demonstrates slightly better results than PPO, possibly because its value-based approach is better suited to discrete action spaces.
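As a concrete illustration of the environment described above, here is a minimal Gymnasium-style sketch. The observation bounds, toy BER model, action scaling, and reward shaping are illustrative assumptions, not the authors' implementation.

```python
# Minimal sketch of a 5G IoT resource-management RL environment (Gymnasium API).
# Bounds, the BER model, and the reward below are illustrative assumptions.
import numpy as np
import gymnasium as gym
from gymnasium import spaces

class IoTResourceEnv(gym.Env):
    """Observes (BER, allocated bandwidth, signal power); actions adjust the last two."""

    def __init__(self):
        # Observation: [BER, allocated bandwidth (MHz), signal power (dBm)].
        self.observation_space = spaces.Box(
            low=np.array([0.0, 1.0, -30.0], dtype=np.float32),
            high=np.array([1.0, 100.0, 30.0], dtype=np.float32),
            dtype=np.float32,
        )
        # Continuous actions: deltas applied to bandwidth and signal power.
        self.action_space = spaces.Box(low=-1.0, high=1.0, shape=(2,), dtype=np.float32)
        self.state = None

    def reset(self, *, seed=None, options=None):
        super().reset(seed=seed)
        self.state = np.array([0.1, 20.0, 0.0], dtype=np.float32)
        return self.state, {}

    def step(self, action):
        ber, bw, power = self.state
        bw = np.clip(bw + 5.0 * action[0], 1.0, 100.0)          # adjust bandwidth
        power = np.clip(power + 2.0 * action[1], -30.0, 30.0)   # adjust signal power
        # Toy BER model: more power lowers BER; more bandwidth raises it slightly.
        ber = float(np.clip(0.5 * np.exp(-0.1 * power) * (1.0 + 0.005 * bw), 0.0, 1.0))
        self.state = np.array([ber, bw, power], dtype=np.float32)
        # Reward low BER while penalizing bandwidth and (positive) power usage.
        reward = -ber - 0.01 * bw - 0.01 * max(float(power), 0.0)
        return self.state, reward, False, False, {}
```

A continuous Box action space like this suits PPO and DDPG (e.g., via stable-baselines3); swapping in a Discrete space of fixed bandwidth/power increments would give the DQN variant.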
Award ID(s):
2318636
PAR ID:
10655807
Author(s) / Creator(s):
 ;  ;  
Publisher / Repository:
IEEE
Date Published:
Page Range / eLocation ID:
1 to 6
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. In urban environments, tall buildings or structures can pose limits on the direct channel link between a base station (BS) and an Internet-of-Things device (IoTD) for wireless communication. Unmanned aerial vehicles (UAVs) with a mounted reconfigurable intelligent surface (RIS), denoted as UAV-RIS, have been introduced in recent works to enhance the system throughput capacity by acting as a relay node between the BS and the IoTDs in wireless access networks. Uncoordinated UAVs or RIS phase shift elements can make unnecessary adjustments that significantly impact the signal transmission to IoTDs in the area. The concept of age of information (AoI) has been proposed in wireless network research to characterize the freshness of received update messages. To minimize the average sum of AoI (ASoA) in the network, two model-free deep reinforcement learning (DRL) approaches, the off-policy Deep Q-Network (DQN) and the on-policy Proximal Policy Optimization (PPO), are developed to solve the problem by jointly optimizing the RIS phase shift, the location of the UAV-RIS, and the IoTD transmission scheduling for large-scale IoT wireless networks. Analyses of loss functions and extensive simulations are performed to compare the stability and convergence performance of the two algorithms. The results reveal the superiority of the on-policy approach, PPO, over the off-policy approach, DQN, in terms of stability and convergence speed, and under diverse environment settings.
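To make the ASoA objective concrete: a device's age of information grows by one each time slot and resets when a fresh update from that device is delivered. A minimal sketch of this standard AoI bookkeeping (the reset-to-zero convention and array layout are illustrative choices, not the paper's exact model):

```python
import numpy as np

def average_sum_of_aoi(delivery_log: np.ndarray) -> float:
    """Average sum of AoI (ASoA) over T slots for N devices.

    delivery_log[t, i] is True when device i's update is delivered in slot t.
    """
    T, N = delivery_log.shape
    age = np.zeros(N)                # current AoI per device
    total = 0.0
    for t in range(T):
        age += 1                     # age grows by one slot everywhere
        age[delivery_log[t]] = 0     # a delivered update resets that device's age
        total += age.sum()           # sum of AoI across devices this slot
    return total / T

# Example: 3 devices over 4 slots; only device 0 delivers, in slot 1.
log = np.zeros((4, 3), dtype=bool)
log[1, 0] = True
print(average_sum_of_aoi(log))       # -> 6.0
```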
  2. The Third Generation Partnership Project (3GPP) introduced the fifth generation new radio (5G NR) specifications, which offer much higher flexibility than legacy cellular communications standards to better handle the heterogeneous service and performance requirements of emerging use cases. This flexibility, however, makes resource management more complex. This paper therefore designs a data-driven resource allocation method based on the deep Q-network (DQN). The objective of the proposed model is to maximize the 5G NR cell throughput while providing a fair resource allocation across all users. Numerical results using a 3GPP-compliant 5G NR simulator demonstrate that the DQN scheduler better balances cell throughput and user fairness than existing schedulers.
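One common way to encode such a throughput-versus-fairness objective in a DQN scheduler's reward is Jain's fairness index; the blend below is an illustrative assumption, not necessarily the paper's exact formulation.

```python
import numpy as np

def scheduler_reward(user_throughputs: np.ndarray, alpha: float = 0.5) -> float:
    """Blend cell throughput with Jain's fairness index.

    alpha = 1 rewards only total throughput; alpha = 0 rewards only fairness.
    """
    total = float(user_throughputs.sum())
    # Jain's index: (sum x)^2 / (n * sum x^2); 1.0 when all users get equal rates.
    jain = total**2 / (len(user_throughputs) * float(np.square(user_throughputs).sum()) + 1e-12)
    # In practice total throughput would be normalized (e.g., by cell capacity).
    return alpha * total + (1.0 - alpha) * jain
```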
  3. Serverless Function-as-a-Service (FaaS) is an emerging cloud computing paradigm that frees application developers from infrastructure management tasks such as resource provisioning and scaling. To reduce the tail latency of functions and improve resource utilization, recent research has focused on applying online learning algorithms such as reinforcement learning (RL) to manage resources. Compared to existing heuristics-based resource management approaches, RL-based approaches eliminate humans in the loop and avoid the painstaking generation of heuristics. In this paper, we show that the state-of-the-art single-agent RL algorithm (S-RL) suffers up to 4.6x higher function tail latency degradation on multi-tenant serverless FaaS platforms and is unable to converge during training. We then propose and implement a customized multi-agent RL algorithm based on Proximal Policy Optimization, i.e., multi-agent PPO (MA-PPO). We show that in multi-tenant environments, MA-PPO enables each agent to be trained until convergence and provides online performance comparable to S-RL in single-tenant cases with less than 10% degradation. Besides, MA-PPO provides a 4.4x improvement over S-RL (in terms of function tail latency) in multi-tenant cases.
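The structural idea behind MA-PPO, independent PPO learners per tenant that each observe only tenant-local state, can be sketched as below. The environment factory here is a hypothetical stand-in (a standard Gymnasium task), and the paper's MA-PPO may share additional components across agents.

```python
# Sketch: one PPO learner per tenant, trained on a tenant-local environment.
import gymnasium as gym
from stable_baselines3 import PPO

def make_tenant_env(tenant_id: str) -> gym.Env:
    # Hypothetical stand-in for a tenant-local FaaS resource-management env.
    return gym.make("Pendulum-v1")

agents = {t: PPO("MlpPolicy", make_tenant_env(t)) for t in ("A", "B", "C")}
for tenant, agent in agents.items():
    agent.learn(total_timesteps=10_000)  # each agent is trained independently
```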
  4. Edge Cloud (EC) is poised to support massive machine-type communication (mMTC) for 5G and IoT by providing compute and network resources at the edge. Yet the EC, being regionally domestic and smaller in scale, faces challenges in bandwidth and computational throughput. Resource management techniques are considered necessary to achieve efficient resource allocation objectives. Software Defined Network (SDN)-enabled EC architecture is emerging as a potential solution that enables dynamic bandwidth allocation and task scheduling for latency-sensitive and diverse mobile applications in the EC environment. This study proposes a novel Heuristic Reinforcement Learning (HRL) based flow-level dynamic bandwidth allocation framework and validates it through an end-to-end implementation using the OpenFlow meter feature. OpenFlow meters provide granular control and allow demand-based flow management to meet the diverse QoS requirements of IoT traffic. The proposed framework is then evaluated by emulating an EC scenario based on the real NSF COSMOS testbed topology at The City College of New York. A specific heuristic reinforcement learning approach with a linear-annealing technique and a pruning principle is proposed and compared with the baseline approach. Our proposed strategy performs consistently in environments based on both Mininet and hardware OpenFlow switches. The performance evaluation considers key metrics associated with real-time applications: throughput, end-to-end delay, packet loss rate, and overall system cost for bandwidth allocation. Furthermore, our proposed linear annealing method achieves a faster convergence rate and better reward in terms of system cost, and the proposed pruning principle remarkably reduces control traffic in the network.
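Linear annealing here refers to decaying the exploration rate linearly over training; a minimal sketch of such a schedule (the start/end values and horizon are illustrative):

```python
def linear_anneal(step: int, total_steps: int,
                  eps_start: float = 1.0, eps_end: float = 0.05) -> float:
    """Linearly decay an exploration rate from eps_start to eps_end."""
    frac = min(step / total_steps, 1.0)
    return eps_start + frac * (eps_end - eps_start)

# Halfway through a 10,000-step schedule, epsilon sits midway between the ends.
print(linear_anneal(5_000, 10_000))  # -> 0.525
```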
  5. With the recent deployment of 5G networks, the ever-increasing IoT has received a tremendous boost in its expansion and has already penetrated well into the government, commercial, and private sectors. Among the countless IoT devices and myriad applications, many are resource-constrained and have a limited energy budget. These IoT devices demand low-energy techniques for their computing and communication tasks in order to stay active for longer periods. The two main baseband processes that dissipate the bulk of CPU power in an IoT device are synchronization and Finite Impulse Response (FIR) filtering. In this circumstance, hardware-based baseband processing can take these tasks off the CPU and may significantly reduce energy consumption. While conventional Binary Radix Computing (BC)-based hardware modules can improve power dissipation, Stochastic Computing (SC)-based hardware can cut down both power and silicon area much further in comparison. With this motivation, we propose novel SC-based hardware designs for synchronization and FIR filtering in resource-constrained IoT devices. Comparative analysis shows that our proposed SC-based designs reduce significantly more power and silicon area than BC-based designs as well as other proposed SC designs.
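For intuition on why SC hardware is so compact: in unipolar stochastic computing, a value in [0, 1] is encoded as a random bitstream whose fraction of 1s equals the value, so multiplication, the core operation of an FIR tap, reduces to a single AND gate on two independent streams. A minimal software illustration (stream length and operand values are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(0)

def sc_encode(x: float, n_bits: int = 4096) -> np.ndarray:
    """Unipolar SC encoding: each bit is 1 with probability x."""
    return rng.random(n_bits) < x

def sc_decode(stream: np.ndarray) -> float:
    """Estimate the encoded value as the fraction of 1s in the stream."""
    return float(stream.mean())

# Multiplication via bitwise AND: P(a AND b) = a * b for independent streams.
a, b = sc_encode(0.6), sc_encode(0.5)
print(sc_decode(a & b))  # approximately 0.6 * 0.5 = 0.30
```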