On the Predictability of Fine-grained Cellular Network Throughput using Machine Learning Models

Networking research has witnessed a renaissance driven by the seemingly unlimited predictive power of machine learning (ML) models. One such promising direction is throughput prediction: accurately predicting the network bandwidth or achievable throughput of a client in real time using ML models can enable a wide variety of network applications to proactively adapt their behavior to changing network dynamics and potentially achieve significantly improved QoE. Motivated by the key role of newer generations of cellular networks in supporting emerging latency-critical applications such as AR/MR, in this work we focus on accurate throughput prediction in cellular networks at fine time-scales, e.g., on the order of 100 ms. Through a 4-day, 1000+ km driving trip, we collect a dataset of fine-grained throughput measurements under driving conditions across all three major US operators. Using the collected dataset, we conduct the first feasibility study of predicting fine-grained application throughput in real-world cellular networks with mixed LTE/5G technologies. Our analysis shows that popular ML models previously claimed to predict well in various wireless network scenarios (e.g., WiFi or single-technology networks such as LTE-only) do not predict well under app-centric metrics such as ARE95 and PARE10. Further, we uncover the root cause of the ML models' poor prediction accuracy: inherently conflicting sample sequences in the fine-grained cellular network throughput data.
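As a concrete illustration of the app-centric metrics named above, the sketch below computes ARE95 and PARE10 over paired predicted and measured throughput samples, assuming ARE95 denotes the 95th percentile of the absolute relative error and PARE10 the percentage of samples whose absolute relative error exceeds 10%; these definitions are assumptions for illustration rather than the paper's exact formulations.

```python
import numpy as np

def absolute_relative_error(predicted, actual):
    """Per-sample absolute relative error |pred - actual| / actual."""
    predicted = np.asarray(predicted, dtype=float)
    actual = np.asarray(actual, dtype=float)
    return np.abs(predicted - actual) / np.maximum(actual, 1e-9)  # guard against zero throughput

def are95(predicted, actual):
    """95th percentile of the absolute relative error (assumed definition)."""
    return np.percentile(absolute_relative_error(predicted, actual), 95)

def pare10(predicted, actual):
    """Percentage of samples whose absolute relative error exceeds 10% (assumed definition)."""
    err = absolute_relative_error(predicted, actual)
    return 100.0 * np.mean(err > 0.10)

# Example: throughput samples (Mbps) at 100 ms granularity, placeholder values
actual = [52.1, 48.7, 31.0, 75.4, 12.3]
predicted = [50.0, 55.2, 28.5, 60.1, 20.0]
print(f"ARE95 = {are95(predicted, actual):.2f}, PARE10 = {pare10(predicted, actual):.1f}%")
```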
- PAR ID: 10530145
- Publisher / Repository: Proc. of IEEE MASS 2024
- Date Published:
- Subject(s) / Keyword(s): Machine Learning, Throughput prediction
- Format(s): Medium: X
- Sponsoring Org: National Science Foundation
More Like this
- The increasing ubiquity of network traffic and the deployment of new online applications have increased the complexity of traffic analysis. Traditionally, network administrators rely on recognizing well-known static ports to classify the traffic flowing through their networks. However, modern network traffic uses dynamic ports and is transported over secure application-layer protocols (e.g., HTTPS, SSL, and SSH), which makes it challenging for network administrators to identify online applications using traditional port-based approaches. One way to classify modern network traffic is to use machine learning (ML) to distinguish between different traffic attributes such as packet count and size, packet inter-arrival time, packet send-receive ratio, etc. This paper presents the design and implementation of NetScrapper, a flow-based network traffic classifier for online applications. NetScrapper uses three ML models, namely K-Nearest Neighbors (KNN), Random Forest (RF), and Artificial Neural Network (ANN), to classify the 53 most popular online applications, including Amazon, Youtube, Google, Twitter, and many others. We collected a network traffic dataset containing 3,577,296 packet flows with 87 different features for training, validating, and testing the ML models. A user-friendly web-based interface is developed to enable users to either upload a snapshot of their network traffic to NetScrapper or sniff the network traffic directly from the network interface card in real time. Additionally, we created a middleware pipeline for interfacing the three models with the Flask GUI. Finally, we evaluated NetScrapper using various performance metrics such as classification accuracy and prediction time. Most notably, we found that our ANN model achieves an overall classification accuracy of 99.86% in recognizing the online applications in our dataset. A minimal illustrative sketch of this style of flow-based classification is included after this list.
- Fine-grained visual reasoning tasks in multi-agent environments, such as event prediction, agent type identification, or missing data imputation, are important for multiple applications (e.g., autonomous surveillance over sensor networks and subtasks for reinforcement learning (RL)). StarCraft II game replays encode intelligent (and adversarial) multi-agent behavior and could provide a testbed for these tasks; however, extracting simple and standardized representations for prototyping these tasks is laborious and hinders reproducibility. In contrast, MNIST and CIFAR10, despite their extreme simplicity, have enabled rapid prototyping and reproducibility of ML methods. Following the simplicity of these datasets, we construct a benchmark fine-grained multi-agent categorization dataset based on StarCraft II replays that exhibit complex multi-agent behaviors, while still being as easy to use as MNIST and CIFAR10. Specifically, we carefully summarize a window of 255 consecutive game states to create 3.6 million summary images from 60,000 replays, including all relevant metadata such as game outcome and player races. We develop three formats of decreasing complexity: hyperspectral images that include one channel for every unit type (similar to multispectral geospatial images), RGB images that mimic CIFAR10, and grayscale images that mimic MNIST. We show how this dataset can be used for prototyping fine-grained multi-agent categorization methods. All datasets, code for extraction, and code for dataset loading can be found at https://starcraftdata.davidinouye.com/. A small sketch of one possible window-summarization step is included after this list.
- With the proliferation of Dynamic Spectrum Access (DSA), Internet of Things (IoT), and Mobile Edge Computing (MEC) technologies, various methods have been proposed to deduce key network and user information in cellular systems, such as available cell bandwidths, as well as user locations and mobility. Such information, largely held by the cellular networks, is not only of vital significance to other systems co-located spectrum-wise and/or geographically; applications within cellular systems can also benefit remarkably from inferring it, as exemplified by the efforts made by video streaming to predict cell bandwidth. Hence, we are motivated to develop a new tool that uses off-the-shelf products to uncover as much information as possible that used to be closed to outsiders or user devices. Given the widespread deployment of LTE and its continuous evolution to 5G, we design and implement U-CIMAN, a client-side system to accurately UnCover as much Information in Mobile Access Networks as allowed by LTE encryption. Among the many potential applications of U-CIMAN, we highlight one use case: accurately measuring the spectrum tenancy of a commercial LTE cell. Besides measuring spectrum tenancy in units of resource blocks, U-CIMAN discovers user mobility and traffic types associated with spectrum usage through decoded control messages and user data bytes. We conduct a detailed, accurate 4-month spectrum measurement on a commercial LTE cell; our observations include, among others, the predictive power of the Modulation and Coding Scheme on spectrum tenancy and channel off-times bounded under 10 seconds.
- Connected and automated vehicles (CAVs) are expected to share the road with human-driven vehicles (HDVs) for the foreseeable future. It is therefore more pragmatic to consider a mixed traffic environment, as the well-planned operation of CAVs may be interrupted by HDVs. In circumstances where human behaviors have significant impacts, CAVs need to understand HDV behaviors to make safe actions. In this study, we develop a driver digital twin (DDT) for the online prediction of personalized lane-change behavior, allowing CAVs to predict surrounding vehicles' behaviors with the help of digital twin technology. DDT is deployed on a vehicle-edge-cloud architecture, where the cloud server models the driver behavior for each HDV based on historical naturalistic driving data, while the edge server processes the real-time data from each driver with his/her digital twin on the cloud to predict the personalized lane-change maneuver. The proposed system is first evaluated on a human-in-the-loop co-simulation platform, and then in a field implementation with three passenger vehicles driving along an on/off-ramp segment, connected to the edge server and cloud through the 4G/LTE cellular network. The lane-change intention can be recognized on average 6 s before the vehicle crosses the lane separation line, and the Mean Euclidean Distance between the predicted trajectory and the GPS ground truth is 1.03 m within a 4-s prediction window. Compared to the general model, using a personalized model improves prediction accuracy by 27.8%. A demonstration video of the proposed system can be watched at https://youtu.be/5cbsabgIOdM. A short sketch of the Mean Euclidean Distance computation follows this list.
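As referenced in the NetScrapper item above, a minimal sketch of flow-based application classification with a Random Forest is shown below. The synthetic feature matrix, class count, and hyperparameters are placeholders standing in for the 87-feature, 53-application dataset described there, not the actual NetScrapper pipeline.

```python
# Minimal sketch of flow-based application classification with a Random Forest.
# A synthetic feature matrix stands in for per-flow features (packet counts,
# sizes, inter-arrival times, ...); sizes and settings are placeholders.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

# Placeholder data: 87 features, 10 application classes (stand-in for 53)
X, y = make_classification(n_samples=5000, n_features=87, n_informative=30,
                           n_classes=10, random_state=0)

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2,
                                                    stratify=y, random_state=0)

clf = RandomForestClassifier(n_estimators=200, random_state=0)
clf.fit(X_train, y_train)
print(f"classification accuracy: {accuracy_score(y_test, clf.predict(X_test)):.4f}")
```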
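As referenced in the StarCraft II dataset item above, the sketch below illustrates one plausible way a window of per-unit-type occupancy grids could be summarized into a hyperspectral image (one channel per unit type) and a grayscale image. The window representation and dimensions are assumptions for illustration, not the released extraction code.

```python
# Hypothetical sketch: collapse a window of per-unit-type occupancy grids
# (window_len x num_unit_types x H x W) into a "hyperspectral" summary image
# with one channel per unit type, plus a grayscale version that sums channels.
import numpy as np

rng = np.random.default_rng(0)
window = rng.integers(0, 2, size=(255, 10, 64, 64))    # placeholder game-state window

hyperspectral = window.sum(axis=0).astype(np.float32)  # (unit types, H, W): per-type presence counts
hyperspectral /= hyperspectral.max() + 1e-9            # normalize to [0, 1]

grayscale = hyperspectral.sum(axis=0)                  # (H, W): MNIST-like single channel
grayscale /= grayscale.max() + 1e-9

print(hyperspectral.shape, grayscale.shape)
```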
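As referenced in the driver digital twin item above, the sketch below computes the Mean Euclidean Distance between a predicted trajectory and GPS ground truth over a 4 s window, assuming both are sampled at the same timestamps in a local metric (x, y) frame; the sampling rate and trajectory values are placeholders.

```python
# Minimal sketch of Mean Euclidean Distance between a predicted trajectory and
# GPS ground truth, assuming both are (N, 2) arrays of x/y positions in meters
# sampled at the same timestamps over the prediction window.
import numpy as np

def mean_euclidean_distance(predicted_xy, ground_truth_xy):
    predicted_xy = np.asarray(predicted_xy, dtype=float)
    ground_truth_xy = np.asarray(ground_truth_xy, dtype=float)
    return np.linalg.norm(predicted_xy - ground_truth_xy, axis=1).mean()

# Example: 4 s window sampled at 10 Hz (40 points), placeholder trajectories
t = np.linspace(0, 4, 40)
ground_truth = np.stack([t * 15.0, np.zeros_like(t)], axis=1)  # ~15 m/s along x
predicted = ground_truth + np.stack([0.3 * np.ones_like(t), 0.8 * np.sin(t)], axis=1)
print(f"MED = {mean_euclidean_distance(predicted, ground_truth):.2f} m")
```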