
Title: DyCroNO: Dynamic Cross-layer Network Orchestration and Real-time Deep Learning-based Network Load Prediction
In this paper, we present Dynamic Cross-layer Network Orchestration (DyCroNO), a dynamic service provisioning and load balancing mechanism for IP-over-optical networks. DyCroNO comprises the following components: i) an end-to-end (E2E) service provisioning and virtual path allocation algorithm, ii) a lightweight dynamic bandwidth adjustment strategy that leverages extended-duration statistics to ensure optimal network utilization and guarantee quality of service (QoS), and iii) a load distribution mechanism that optimizes the network load distribution at runtime. As another contribution, we design a real-time deep learning technique to predict the network load distribution. We implemented a Long Short-Term Memory (LSTM)-based method with a sliding-window technique to dynamically (at runtime) predict network load distributions at various lead times. Simulations were performed over three topologies (NSFNet, Cost266, and Eurolarge) using real-world traffic traces to model the traffic patterns. Results show that our approach significantly lowers the mean link load and total resource usage while improving resource utilization compared to existing approaches. Additionally, our deep learning-based method showed promising results in load distribution prediction, with low root mean squared error (RMSE) and ∼90% accuracy. URL: https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=10144888&isnumber=10144840
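To make the prediction component concrete, the following is a minimal sketch of a sliding-window LSTM load predictor in PyTorch. The window size, lead time, architecture, and synthetic trace are illustrative assumptions only and do not reproduce the paper's exact model or features.

```python
import numpy as np
import torch
import torch.nn as nn

WINDOW = 12    # hypothetical: number of past load samples fed to the model
HORIZON = 3    # hypothetical: lead time (steps ahead) being predicted

def make_windows(series, window=WINDOW, horizon=HORIZON):
    """Slice a 1-D load series into (window of past samples -> load at +horizon) pairs."""
    X, y = [], []
    for t in range(len(series) - window - horizon + 1):
        X.append(series[t:t + window])
        y.append(series[t + window + horizon - 1])
    X = torch.tensor(np.array(X), dtype=torch.float32).unsqueeze(-1)   # (N, window, 1)
    y = torch.tensor(np.array(y), dtype=torch.float32).unsqueeze(-1)   # (N, 1)
    return X, y

class LoadLSTM(nn.Module):
    def __init__(self, hidden=32):
        super().__init__()
        self.lstm = nn.LSTM(input_size=1, hidden_size=hidden, batch_first=True)
        self.head = nn.Linear(hidden, 1)

    def forward(self, x):                  # x: (batch, window, 1)
        out, _ = self.lstm(x)
        return self.head(out[:, -1, :])    # prediction at the chosen lead time

# Toy usage on a synthetic trace standing in for a real traffic trace.
series = 0.5 + 0.4 * np.sin(np.linspace(0, 20, 500))
X, y = make_windows(series)
model = LoadLSTM()
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_fn = nn.MSELoss()
for _ in range(50):                        # short training loop, illustration only
    opt.zero_grad()
    loss = loss_fn(model(X), y)
    loss.backward()
    opt.step()
print("train RMSE:", float(torch.sqrt(loss_fn(model(X), y))))
```

In practice the window would slide over live measurements so the model is retrained or fine-tuned at runtime; the choice of window and horizon controls which lead times are predicted.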
Award ID(s): 1817105
NSF-PAR ID: 10464597
Author(s) / Creator(s):
Date Published:
Journal Name: 2023 International Conference on Optical Network Design and Modeling (ONDM)
Format(s): Medium: X
Sponsoring Org: National Science Foundation
More Like this
  1. In today's complex and dynamic infrastructure of multi-tiered computing and cellular networking, 5G services and applications explicitly reserve compute and network resources to ensure application-specific service quality metrics, and infrastructure providers charge the 5G services for the reserved resources. A static, one-time reservation of resources at service deployment typically results in extended periods of under-utilization of the reserved resources during the lifetime of the service operation. This is due to many reasons, such as changes in the content from IoT sensors (for example, a change in the number of people in the field of view of a camera) or a change in the environmental conditions around the IoT sensors (for example, time of day, rain, or fog can affect data acquisition by sensors). Under-utilization of a specific resource like compute can also be due to temporarily inadequate availability of another resource like network bandwidth in a dynamic 5G infrastructure. We propose a novel Reinforcement Learning-based online method to dynamically adjust an application's compute and network resource reservations to minimize under-utilization of requested resources, while ensuring acceptable service quality metrics. We observe that a complex application-specific coupling exists between the compute and network usage of an application. Our proposed method learns this coupling during the operation of the service, and dynamically modulates the compute and network resource requests to minimize under-utilization of reserved resources. Through experimental evaluation using a real-world video analytics application, we show that our technique captures the complex compute-network coupling relationship in an online manner, i.e., while the application is running, and dynamically adapts and saves up to 65% compute and 93% network resources on average (over multiple runs), without significantly impacting application accuracy. (An illustrative sketch of such an online adjustment loop appears after this list.)
  2. In this paper we propose a novel approach to deliver better delay-jitter performance in dynamic networks. Dynamic networks experience rapid and unpredictable fluctuations; hence, a certain amount of uncertainty about the delay performance of various network elements is unavoidable. This uncertainty makes it difficult for network operators to guarantee a certain quality of service (in terms of delay and jitter) to users. The uncertainty about the state of the network is often overlooked to simplify problem formulation, but we capture it by modeling the delay on various links as general and potentially correlated random processes. Within this framework, a user requests a certain delay-jitter performance guarantee from the network. After verifying the feasibility of the request, the network responds to the user by specifying a set of routes as well as the proportion of traffic that should be sent along each one to achieve the desired QoS. We propose to use mean-variance analysis as the basis for traffic distribution and route selection, and show that this technique can significantly reduce the end-to-end jitter because it accounts for the correlated nature of delay across different paths. The resulting traffic distribution is often non-uniform, and the fractional flow on each path is the solution to a simple convex optimization problem (a minimal sketch of this mean-variance split appears after this list). We conclude the paper by commenting on the potential application of this method to general transportation networks.
  3. Several network operators run their networks at high average utilization. At high utilization, it is more likely that Resource Crunch will occur because there is not enough capacity to serve all offered traffic. One solution is to increase the capacity of the underlying optical network by using higher modulation formats (which provide higher throughput) through transponders capable of dynamically adjusting modulation. This is possible since operators traditionally use large Optical Signal-to-Noise Ratio (OSNR) margins (i.e., the difference between the minimum OSNR for a certain modulation and the observed OSNR). Using modulation formats with higher spectral efficiency (i.e., increasing modulation) decreases OSNR margins. When OSNR margins are small, OSNR fluctuations may trigger the transponder to fall back to more robust, lower modulations. If these changes are frequent, Quality of Service may suffer. To reduce the number of modulation changes, we propose a Machine Learning model to forecast OSNR. When Resource Crunch starts, we choose which modulation to use on each lightpath according to the forecast; when it is over, we revert to large margins, in a demand-responsive manner (an illustrative sketch of forecast-driven modulation selection appears after this list). Our results show that, during Resource Crunch, our method carries a larger load when compared to a scenario where conservative OSNR margins are used, while incurring significantly fewer modulation changes than a system that always uses the tightest OSNR margin possible.
  4. Mobile Edge Computing may become a prevalent platform to support applications where mobile devices have limited compute, storage, or energy, and/or data privacy concerns. In this paper, we study the efficient provisioning and management of compute resources in the Edge-to-Cloud continuum for different types of real-time applications, whose timeliness requirements depend on application-level update rates and communication/compute delays. We begin by introducing a highly stylized network model allowing us to study the salient features of this problem, including its sensitivity to compute vs. communication costs, application requirements, and traffic load variability. We then propose an online decentralized service placement algorithm, based on estimating network delays and adapting application update rates, which achieves high service availability (a toy placement rule in this spirit is sketched after this list). Our results exhibit how placement can be optimized and how a load-balancing strategy c
  5. Mobile devices such as drones and autonomous vehicles increasingly rely on object detection (OD) through deep neural networks (DNNs) to perform critical tasks such as navigation, target tracking, and surveillance, to name a few. Due to their high complexity, the execution of these DNNs requires excessive time and energy. Low-complexity object tracking (OT) is thus used along with OD, where the latter is periodically applied to generate "fresh" references for tracking. However, the frames processed with OD incur large delays, which does not comply with the requirements of real-time applications. Offloading OD to edge servers can mitigate this issue, but existing work focuses on optimizing the offloading process in systems where the wireless channel has very large capacity. Herein, we consider systems with constrained and erratic channel capacity, and establish parallel OT (at the mobile device) and OD (at the edge server) processes that are resilient to large OD latency (a heavily simplified scheduling loop in this spirit is sketched after this list). We propose Katch-Up, a novel tracking mechanism that improves the system's resilience to excessive OD delay. We show that this technique greatly improves the quality of the reference available to tracking, and boosts performance by up to 33%. However, while Katch-Up significantly improves performance, it also increases the computing load of the mobile device. Hence, we design SmartDet, a low-complexity controller based on deep reinforcement learning (DRL) that learns to achieve the right trade-off between resource utilization and OD performance. SmartDet takes as input highly heterogeneous context information related to the current video content and the current network conditions to optimize the frequency and type of OD offloading, as well as Katch-Up utilization. We extensively evaluate SmartDet on a real-world testbed composed of a Jetson Nano as the mobile device and a GTX 980 Ti as the edge server, connected through a Wi-Fi link, to collect several network-related traces as well as energy measurements. We consider a state-of-the-art video dataset (ILSVRC 2015 - VID) and state-of-the-art OD models (EfficientDet 0, 2 and 4). Experimental results show that SmartDet achieves an optimal balance between tracking performance (mean Average Recall, mAR) and resource usage. With respect to a baseline with full Katch-Up usage and maximum channel usage, we still increase mAR by 4% while using 50% less of the channel and 30% less of the power resources associated with Katch-Up. With respect to a fixed strategy using minimal resources, we increase mAR by 20% while using Katch-Up on 1/3 of the frames.
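For the Reinforcement Learning-based reservation adjustment described in item 1 above, here is a minimal, illustrative sketch of an online agent that nudges compute and network reservations based on observed utilization. The discretized state, the action set, and the reward shape are assumptions for illustration, not the paper's formulation.

```python
import random
from collections import defaultdict

# Actions: (change in compute reservation, change in network reservation), in abstract "steps".
ACTIONS = [(-1, 0), (1, 0), (0, -1), (0, 1), (0, 0)]

class ReservationAgent:
    """Epsilon-greedy agent over a coarse utilization state (a simplification of full RL)."""

    def __init__(self, epsilon=0.1, alpha=0.2):
        self.q = defaultdict(float)          # Q-value per (state, action)
        self.epsilon, self.alpha = epsilon, alpha

    def _state(self, cpu_util, net_util):
        # Discretize utilization of the reserved compute/network into 10% buckets.
        return (int(cpu_util * 10), int(net_util * 10))

    def act(self, cpu_util, net_util):
        s = self._state(cpu_util, net_util)
        if random.random() < self.epsilon:
            return random.choice(ACTIONS)
        return max(ACTIONS, key=lambda a: self.q[(s, a)])

    def learn(self, cpu_util, net_util, action, reward_value):
        s = self._state(cpu_util, net_util)
        self.q[(s, action)] += self.alpha * (reward_value - self.q[(s, action)])

def reward(cpu_util, net_util, qos_ok):
    # Encourage high utilization of what was reserved; heavily penalize QoS violations.
    return (cpu_util + net_util) - (0.0 if qos_ok else 5.0)

# Toy usage: observe utilization, pick an adjustment, then learn from the outcome.
agent = ReservationAgent()
action = agent.act(0.4, 0.7)
agent.learn(0.4, 0.7, action, reward(0.4, 0.7, qos_ok=True))
```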
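For the mean-variance traffic splitting in item 2 above, a minimal sketch of the convex program follows, assuming hypothetical per-path mean delays and a delay covariance matrix; it uses cvxpy, which is not necessarily the solver the authors used.

```python
import numpy as np
import cvxpy as cp

mu = np.array([10.0, 12.0, 11.0])            # assumed mean delay per candidate path (ms)
Sigma = np.array([[4.0, 1.5, 0.5],           # assumed covariance of path delays (captures correlation)
                  [1.5, 3.0, 0.2],
                  [0.5, 0.2, 5.0]])
lam = 0.5                                     # trade-off weight between mean delay and jitter

x = cp.Variable(3)                            # fraction of traffic routed on each path
objective = cp.Minimize(cp.quad_form(x, Sigma) + lam * (mu @ x))
constraints = [cp.sum(x) == 1, x >= 0]
cp.Problem(objective, constraints).solve()
print("traffic split:", np.round(x.value, 3))
```

Because the objective penalizes the variance of the mixed delay, correlated paths end up sharing less traffic than uncorrelated ones, which is why the resulting split is typically non-uniform.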
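For the forecast-driven modulation selection in item 3 above, here is a minimal sketch that picks the most spectrally efficient modulation whose required OSNR (plus a margin) is covered by the forecast. The threshold values and the simple table lookup are illustrative placeholders, not the paper's model.

```python
# (name, minimum required OSNR in dB, bits per symbol) -- illustrative values only
MODULATIONS = [
    ("64QAM", 22.0, 6),
    ("16QAM", 17.0, 4),
    ("QPSK",  11.0, 2),
]

def choose_modulation(forecast_osnr_db, margin_db=1.0):
    """Return the most efficient modulation the forecast OSNR can sustain with the given margin."""
    for name, min_osnr_db, _bits in MODULATIONS:      # ordered from most to least efficient
        if forecast_osnr_db >= min_osnr_db + margin_db:
            return name
    return MODULATIONS[-1][0]                          # fall back to the most robust format

print(choose_modulation(18.3))   # -> "16QAM" with these illustrative thresholds
```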
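For the delay-aware service placement described in item 4 above, the following toy greedy rule places an application on the cheapest tier whose estimated end-to-end staleness (update interval plus network and compute delay) meets its timeliness requirement. The tiers, the numbers, and the greedy rule are assumptions, not the paper's algorithm.

```python
# (tier name, compute delay in ms, relative cost) -- assumed values for illustration
TIERS = [
    ("edge",  5.0, 3.0),
    ("metro", 15.0, 2.0),
    ("cloud", 40.0, 1.0),
]

def place(update_interval_ms, est_net_delay_ms, timeliness_ms):
    """Pick the cheapest tier whose estimated staleness meets the timeliness requirement."""
    feasible = [(cost, name) for name, compute_ms, cost in TIERS
                if update_interval_ms + est_net_delay_ms[name] + compute_ms <= timeliness_ms]
    return min(feasible)[1] if feasible else None     # None: no tier can meet the requirement

net_delay = {"edge": 2.0, "metro": 8.0, "cloud": 25.0}  # estimated per-tier network delays (ms)
print(place(update_interval_ms=30.0, est_net_delay_ms=net_delay, timeliness_ms=60.0))  # -> "metro"
```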
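For the parallel detection/tracking pipeline in item 5 above, here is a heavily simplified scheduling loop that tracks locally on every frame and requests edge-offloaded detection only when no request is in flight and the current reference has grown stale. The callables, staleness budget, and policy are hypothetical; they do not reproduce Katch-Up or SmartDet.

```python
def run_frames(frames, detect_remote, track_local, staleness_budget=5):
    """detect_remote(frame, i) -> (arrival_frame_index, boxes); track_local(frame, ref) -> boxes."""
    reference, ref_age, pending = None, 0, None        # pending holds an in-flight detection
    outputs = []
    for i, frame in enumerate(frames):
        if pending is not None and i >= pending[0]:     # detection result has arrived
            reference, ref_age, pending = pending[1], 0, None
        if pending is None and (reference is None or ref_age >= staleness_budget):
            pending = detect_remote(frame, i)           # fire a new offloaded detection
        boxes = track_local(frame, reference) if reference is not None else []
        outputs.append(boxes)
        ref_age += 1
    return outputs
```

The key point the sketch illustrates is that tracking never blocks on the detector: detection results are folded in whenever they arrive, which is what makes the pipeline tolerant of large and variable offloading delays.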