Title: Multitask Learning for Network Traffic Classification
Traffic classification has various applications in today's Internet, from resource allocation, billing, and QoS provisioning at ISPs to firewalls and malware detection on clients. Classical machine learning algorithms and deep learning models have been widely used to solve the traffic classification task. However, training such models requires a large amount of labeled data, and labeling is often the most difficult and time-consuming step in building a classifier. To address this challenge, we reformulate traffic classification as a multi-task learning problem in which the bandwidth requirement and duration of a flow are predicted along with the traffic class. The motivation for this approach is twofold. First, the bandwidth requirement and duration are useful in many applications, including routing, resource allocation, and QoS provisioning. Second, these two values can be obtained easily from each flow without human labeling and without capturing flows in a controlled and isolated environment. We show that with a large number of easily obtainable samples for the bandwidth and duration prediction tasks, and only a few samples for the traffic classification task, one can achieve high accuracy. Therefore, our proposed multi-task learning framework obviates the need for a large labeled traffic dataset. We conduct two experiments with the public ISCX and QUIC datasets and show the efficacy of our approach.
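The idea in the abstract can be illustrated with a minimal sketch; it is not the authors' implementation. It assumes a flow is given as a list of hypothetical `(timestamp_seconds, size_bytes)` packet records, derives the duration and mean bandwidth targets directly from that list, and combines three per-task losses so that the traffic-class term is computed only for flows that carry a class label. The abstract does not say whether bandwidth and duration are treated as regression or classification targets; squared error is used here purely for illustration, and the field names and task weights are assumptions.

```python
import math

def flow_targets(packets):
    """Derive (duration_s, bandwidth_bytes_per_s) from (timestamp, size) pairs.

    No human labeling is involved: both values come from the flow itself.
    """
    times = [t for t, _ in packets]
    duration = max(times) - min(times)
    total_bytes = sum(size for _, size in packets)
    bandwidth = total_bytes / duration if duration > 0 else 0.0
    return duration, bandwidth

def cross_entropy(logits, label):
    """Softmax cross-entropy for one sample, using the log-sum-exp trick."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    return log_z - logits[label]

def multitask_loss(batch, w_class=1.0, w_bw=1.0, w_dur=1.0):
    """Weighted sum of three task losses over a batch of flows.

    Each item is a dict with hypothetical keys: class_logits, class_label
    (None when the flow has no traffic-class label), bw_pred, bw_true,
    dur_pred, dur_true.
    """
    # Traffic-class loss: averaged over the labeled flows only.
    labeled = [s for s in batch if s["class_label"] is not None]
    class_loss = 0.0
    if labeled:
        class_loss = sum(
            cross_entropy(s["class_logits"], s["class_label"]) for s in labeled
        ) / len(labeled)
    # Auxiliary losses: every flow contributes, since targets are always known.
    bw_loss = sum((s["bw_pred"] - s["bw_true"]) ** 2 for s in batch) / len(batch)
    dur_loss = sum((s["dur_pred"] - s["dur_true"]) ** 2 for s in batch) / len(batch)
    return w_class * class_loss + w_bw * bw_loss + w_dur * dur_loss
```

In this sketch, flows without a class label still contribute to the bandwidth and duration terms, which is how a large unlabeled capture could supplement a small labeled set.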
Award ID(s):
1838207
PAR ID:
10210971
Author(s) / Creator(s):
;
Date Published:
Journal Name:
29th International Conference on Computer Communications and Networks (ICCCN)
Page Range / eLocation ID:
1 to 9
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. Edge Cloud (EC) is poised to brace massive machine type communication (mMTC) for 5G and IoT by providing compute and network resources at the edge. Yet the EC, being regionally domestic with a smaller scale, faces the challenges of bandwidth and computational throughput. Resource management techniques are considered necessary to achieve efficient resource allocation objectives. Software Defined Network (SDN) enabled EC architecture is emerging as a potential solution that enables dynamic bandwidth allocation and task scheduling for latency-sensitive and diverse mobile applications in the EC environment. This study proposes a novel Heuristic Reinforcement Learning (HRL) based flow-level dynamic bandwidth allocation framework and validates it through an end-to-end implementation using the OpenFlow meter feature. The OpenFlow meter provides granular control and allows demand-based flow management to meet the diverse QoS requirements germane to IoT traffic. The proposed framework is then evaluated by emulating an EC scenario based on the real NSF COSMOS testbed topology at The City College of New York. A specific heuristic reinforcement learning technique with linear annealing and a pruning principle are proposed and compared with the baseline approach. Our proposed strategy performs consistently in both Mininet-based and hardware OpenFlow switch-based environments. The performance evaluation considers key metrics associated with real-time applications: throughput, end-to-end delay, packet loss rate, and overall system cost for bandwidth allocation. Furthermore, our proposed linear annealing method achieves a faster convergence rate and better reward in terms of system cost, and the proposed pruning principle remarkably reduces control traffic in the network.
  2. Network quality-of-service (QoS) does not always translate to user quality-of-experience (QoE). Consequently, knowledge of user QoE is desirable in several scenarios that have traditionally operated on QoS information. Examples include traffic management by ISPs and resource allocation by the operating system. But today these systems lack ways to measure user QoE. To help address this problem, we propose offline generation of per-app models mapping app-independent QoS metrics to app-specific QoE metrics. This enables any entity that can observe an app's network traffic, including ISPs and access points, to infer the app's QoE. We describe how to generate such models for many diverse apps with significantly different QoE metrics. We generate models for common user interactions of 60 popular apps. We then demonstrate the utility of these models by implementing a QoE-aware traffic management framework and evaluating it on a WiFi access point. Our approach successfully improves QoE metrics that reflect user-perceived performance. First, we demonstrate that prioritizing traffic for latency-sensitive apps can improve responsiveness and video frame rate by 46% and 115%, respectively. Second, we show that a novel QoE-aware bandwidth allocation scheme for bandwidth-intensive apps can improve average video bitrate for multiple users by up to 23%.
  3. Traffic classification has been studied for two decades and applied to a wide range of applications, from QoS provisioning and billing in ISPs to security-related applications in firewalls and intrusion detection systems. Port-based, data packet inspection, and classical machine learning methods have been used extensively in the past, but their accuracy has declined due to dramatic changes in Internet traffic, particularly the increase in encrypted traffic. With the proliferation of deep learning methods, researchers have recently investigated these methods for the traffic classification task and reported high accuracy. In this article, we introduce a general framework for deep-learning-based traffic classification. We present commonly used deep learning methods and their application in traffic classification tasks. Then, we discuss open problems, challenges, and opportunities for traffic classification.
  4. Aidong Zhang; Huzefa Rangwala (Ed.)
    In many scenarios, 1) data streams are generated in real time; 2) labeled data are expensive and only limited labels are available in the beginning; 3) real-world data are not always i.i.d. and drift gradually over time; 4) the storage of historical streams is limited. This learning setting limits the applicability and availability of many Machine Learning (ML) algorithms. We generalize the learning task under such a setting as a semi-supervised drifted stream learning with short lookback problem (SDSL). SDSL imposes two under-addressed challenges on existing methods in semi-supervised learning and continuous learning: 1) robust pseudo-labeling under gradual shifts and 2) anti-forgetting adaptation with short lookback. To tackle these challenges, we propose a principled and generic generation-replay framework to solve SDSL. To achieve robust pseudo-labeling, we develop a novel pseudo-label classification model that leverages supervised knowledge of previously labeled data, unsupervised knowledge of new data, and structure knowledge of invariant label semantics. To achieve adaptive anti-forgetting model replay, we propose to view the anti-forgetting adaptation task as a flat region search problem. We propose a novel minimax game-based replay objective function to solve the flat region search problem and develop an effective optimization solver. Experimental results demonstrate the effectiveness of the proposed method.