Title: IoTGemini: Modeling IoT Network Behaviors for Synthetic Traffic Generation
Synthetic traffic generation can produce sufficient data for training models for various IoT traffic analysis tasks at low cost and with few ethical concerns. However, as the latest smart devices gain functionality, existing approaches can neither customize traffic generation for individual device functions nor generate traffic that preserves the sequential relationships among packets found in real traffic. To address these limitations, this paper proposes IoTGemini, a novel framework for high-quality IoT traffic generation, which consists of a Device Modeling Module and a Traffic Generation Module. In the Device Modeling Module, we propose a method to obtain profiles of device functions and network behaviors, enabling IoTGemini to customize traffic generation as if a real IoT device were in use. In the Traffic Generation Module, we design a Packet Sequence Generative Adversarial Network (PS-GAN), which can generate synthetic traffic with high fidelity in both per-packet fields and sequential relationships. We set up a real-world IoT testbed to evaluate IoTGemini. The experimental results show that IoTGemini achieves effective device modeling, high-fidelity synthetic traffic generation, and strong usability across different traffic datasets and downstream traffic analysis tasks.
Award ID(s):
1932418
PAR ID:
10525943
Publisher / Repository:
IEEE
Journal Name:
IEEE Transactions on Mobile Computing
ISSN:
1536-1233
Page Range / eLocation ID:
1 to 17
Sponsoring Org:
National Science Foundation
More Like this
  1. Over the years, honeypots have emerged as an important security tool for understanding attacker intent and deceiving attackers into spending time and resources. Recently, honeypots have been deployed for Internet of Things (IoT) devices to lure attackers and learn their behavior. However, most existing IoT honeypots, even high-interaction ones, are easily detected by an attacker who can observe honeypot traffic, because no real network traffic originates from the honeypot. This implies that, to build better honeypots and enhance cyber deception capabilities, IoT honeypots need to generate realistic network traffic flows. To achieve this goal, we propose a novel deep-learning-based approach for generating traffic flows that mimic the real network traffic produced by user and IoT device interactions. A key technical challenge that our approach overcomes is the scarcity of device-specific IoT traffic data needed to train a generator effectively. We address this challenge by leveraging a core generative adversarial learning algorithm for sequences along with domain-specific knowledge common to IoT devices. Through an extensive experimental evaluation with 18 IoT devices, we demonstrate that the proposed synthetic IoT traffic generation tool significantly outperforms state-of-the-art sequence and packet generators in remaining indistinguishable from real traffic, even to an adaptive attacker.
  2. The recent spate of cyber attacks towards Internet of Things (IoT) devices in smart homes calls for effective techniques to understand, characterize, and unveil IoT device activities. In this paper, we present a new system, named IoTAthena, to unveil IoT device activities from raw network traffic consisting of timestamped IP packets. IoTAthena characterizes each IoT device activity using an activity signature consisting of an ordered sequence of IP packets with inter-packet time intervals. IoTAthena has two novel polynomial time algorithms, sigMatch and actExtract. For any given signature, sigMatch can capture all matches of the signature in the raw network traffic. Using sigMatch as a subfunction, actExtract can accurately unveil the sequence of various IoT device activities from the raw network traffic. Using the network traffic of heterogeneous IoT devices collected at the router of a real-world smart home testbed and a public IoT dataset, we demonstrate that IoTAthena is able to characterize and generate activity signatures of IoT device activities and accurately unveil the sequence of IoT device activities from raw network traffic. 
  3. Recently, much attention has been devoted to the development of generative network traces and their potential use in supplementing real-world data for a variety of data-driven networking tasks. Yet the utility of existing synthetic traffic approaches is limited by their low fidelity: low feature granularity, insufficient adherence to task constraints, and subpar class coverage. As effective network tasks increasingly rely on raw packet captures, we advocate for a paradigm shift from coarse-grained to fine-grained, constraint-compliant traffic generation. We explore this path using controllable diffusion-based methods. Our preliminary results suggest that this approach is effective in generating realistic, fine-grained network traces that mirror the complexity and variety of real network traffic required for accurate service recognition. We further outline the challenges and opportunities of this approach and discuss a research agenda towards text-to-traffic synthesis.
  4. Full-system simulation of computer systems is critical for capturing the complex interplay between various hardware and software components in future systems. Modeling the network subsystem is indispensable for the fidelity of full-system simulations due to the increasing importance of scale-out systems. Over the last decade, the network software stack has undergone major changes, with userspace networking stacks and data-plane networks rapidly replacing the conventional kernel network stack. Nevertheless, the current state-of-the-art architectural simulator, gem5, still employs kernel networking, which precludes realistic network application scenarios. In this work, we first demonstrate the limitations of gem5's current network stack in achieving high network bandwidth. Then, we enable a userspace networking stack on gem5. We extend gem5's NIC hardware model and device driver to support userspace device drivers running the DPDK framework. Additionally, we implement a network load generator hardware model in gem5 to generate various traffic patterns and perform per-packet timestamp and latency measurements without introducing packet loss. We develop a suite of six network-intensive benchmarks for stress testing the host network stack. These applications, based on DPDK, can run on both gem5 and real systems. Our experimental results show that enabling userspace networking improves gem5's network bandwidth by 6.3× compared with the current Linux kernel software stack. We characterize the performance of DPDK benchmarks running on both a real system and gem5, and evaluate the sensitivity of the applications to various system and microarchitecture parameters. This work marks the first step in refactoring the networking subsystem in gem5.
  5. Despite the significant benefits of the widespread adoption of smart home Internet of Things (IoT) devices, these devices are known to be vulnerable to active and passive attacks. Existing literature has demonstrated that the activities of these devices can be inferred by analyzing their network traffic. In this study, we introduce a packet-based signature generation and detection system that can identify specific events associated with IoT devices by extracting simple features from raw encrypted network traffic. Unlike existing techniques that depend on specific time windows, our approach automatically determines the optimal number of packets needed to generate unique signatures, making it more resilient to network jitter. We evaluate the effectiveness, uniqueness, and correctness of our signatures by training and testing our system on four public datasets and an emulated dataset with varying network delays, verifying known signatures and discovering new ones. Our system achieved an average recall of 98-99% and an average precision of 98-100%, demonstrating the effectiveness and feasibility of using packet-level signatures to detect IoT device activities.