skip to main content

This content will become publicly available on October 1, 2024

Title: Data-Explainable Website Fingerprinting with Network Simulation

Website fingerprinting (WF) attacks allow an adversary to associate a website with the encrypted traffic patterns produced when accessing it, thus threatening to destroy the client-server unlinkability promised by anonymous communication networks. Explainable WF is an open problem in which we need to improve our understanding of (1) the machine learning models used to conduct WF attacks; and (2) the WF datasets used as inputs to those models. This paper focuses on explainable datasets; that is, we develop an alternative to the standard practice of gathering low-quality WF datasets using synthetic browsers in large networks without controlling for natural network variability. In particular, we demonstrate how network simulation can be used to produce explainable WF datasets by leveraging the simulator's high degree of control over network operation. Through a detailed investigation of the effect of network variability on WF performance, we find that: (1) training and testing WF attacks in networks with distinct levels of congestion increases the false-positive rate by as much as 200%; (2) augmenting the WF attacks by training them across several networks with varying degrees of congestion decreases the false-positive rate by as much as 83%; and (3) WF classifiers trained on completely simulated data can achieve greater than 80% accuracy when applied to the real world.

more » « less
Award ID(s):
Author(s) / Creator(s):
Publisher / Repository:
Proceedings on Privacy Enhancing Technologies
Date Published:
Journal Name:
Proceedings on Privacy Enhancing Technologies
Page Range / eLocation ID:
559 to 577
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. Website Fingerprinting (WF) attacks pose a serious threat to users' online privacy, including for users of the Tor anonymity system. By exploiting recent advances in deep learning, WF attacks like Deep Fingerprinting (DF) have reached up to 98% accuracy. The DF attack, however, requires large amounts of training data that needs to be updated regularly, making it less practical for the weaker attacker model typically assumed in WF. Moreover, research on WF attacks has been criticized for not demonstrating attack effectiveness under more realistic and more challenging scenarios. Most research on WF attacks assumes that the testing and training data have similar distributions and are collected from the same type of network at about the same time. In this paper, we examine how an attacker could leverage N-shot learning---a machine learning technique requiring just a few training samples to identify a given class---to reduce the effort of gathering and training with a large WF dataset as well as mitigate the adverse effects of dealing with different network conditions. In particular, we propose a new WF attack called Triplet Fingerprinting (TF) that uses triplet networks for N-shot learning. We evaluate this attack in challenging settings such as where the training and testing data are collected multiple years apart on different networks, and we find that the TF attack remains effective in such settings with 85% accuracy or better. We also show that the TF attack is also effective in the open world and outperforms traditional transfer learning. On top of that, the attack requires only five examples to recognize a website, making it dangerous in a wide variety of scenarios where gathering and training on a complete dataset would be impractical. 
    more » « less
  2. Tor provides low-latency anonymous and uncensored network access against a local or network adversary. Due to the design choice to minimize traffic overhead (and increase the pool of potential users) Tor allows some information about the client's connections to leak. Attacks using (features extracted from) this information to infer the website a user visits are called Website Fingerprinting (WF) attacks. We develop a methodology and tools to measure the amount of leaked information about a website. We apply this tool to a comprehensive set of features extracted from a large set of websites and WF defense mechanisms, allowing us to make more fine-grained observations about WF attacks and defenses. 
    more » « less
  3. null (Ed.)
    Abstract We introduce Generative Adversarial Networks for Data-Limited Fingerprinting (GANDaLF), a new deep-learning-based technique to perform Website Fingerprinting (WF) on Tor traffic. In contrast to most earlier work on deep-learning for WF, GANDaLF is intended to work with few training samples, and achieves this goal through the use of a Generative Adversarial Network to generate a large set of “fake” data that helps to train a deep neural network in distinguishing between classes of actual training data. We evaluate GANDaLF in low-data scenarios including as few as 10 training instances per site, and in multiple settings, including fingerprinting of website index pages and fingerprinting of non-index pages within a site. GANDaLF achieves closed-world accuracy of 87% with just 20 instances per site (and 100 sites) in standard WF settings. In particular, GANDaLF can outperform Var-CNN and Triplet Fingerprinting (TF) across all settings in subpage fingerprinting. For example, GANDaLF outperforms TF by a 29% margin and Var-CNN by 38% for training sets using 20 instances per site. 
    more » « less
  4. Abstract Website Fingerprinting (WF) attacks are used by local passive attackers to determine the destination of encrypted internet traffic by comparing the sequences of packets sent to and received by the user to a previously recorded data set. As a result, WF attacks are of particular concern to privacy-enhancing technologies such as Tor. In response, a variety of WF defenses have been developed, though they tend to incur high bandwidth and latency overhead or require additional infrastructure, thus making them difficult to implement in practice. Some lighter-weight defenses have been presented as well; still, they attain only moderate effectiveness against recently published WF attacks. In this paper, we aim to present a realistic and novel defense, RegulaTor, which takes advantage of common patterns in web browsing traffic to reduce both defense overhead and the accuracy of current WF attacks. In the closed-world setting, RegulaTor reduces the accuracy of the state-of-the-art attack, Tik-Tok, against comparable defenses from 66% to 25.4%. To achieve this performance, it requires 6.6% latency overhead and a bandwidth overhead 39.3% less than the leading moderate-overhead defense. In the open-world setting, RegulaTor limits a precision-tuned Tik-Tok attack to an F 1 -score of. 135, compared to .625 for the best comparable defense. 
    more » « less
  5. Unmanned Aerial Networks (UAVs) are prone to several cyber-attacks, including Global Positioning Spoofing attacks. For this purpose, numerous studies have been conducted to detect, classify, and mitigate these attacks, using Artificial Intelligence techniques; however, most of these studies provided techniques with low detection, high misdetection, and high bias rates. To fill this gap, in this paper, we propose three supervised deep learning techniques, namely Deep Neural Network, U Neural Network, and Long Short Term Memory. These models are evaluated in terms of Accuracy, Detection Rate, Misdetection Rate, False Alarm Rate, Training Time per Sample, Prediction Time, and Memory Size. The simulation results indicated that the U Neural Network outperforms other models with an accuracy of 98.80%, a probability of detection of 98.85%, a misdetection of 1.15%, a false alarm of 1.8%, a training time per sample of 0.22 seconds, a prediction time of 0.2 seconds, and a memory size of 199.87 MiB. In addition, these results depicted that the Long Short-Term Memory model provides the lowest performance among other models for detecting these attacks on UAVs. 
    more » « less