skip to main content

Attention:

The NSF Public Access Repository (NSF-PAR) system and access will be unavailable from 11:00 PM ET on Thursday, October 10 until 2:00 AM ET on Friday, October 11 due to maintenance. We apologize for the inconvenience.


Title: Privacy-Preserving Dynamic Learning of Tor Network Traffic
Experimentation tools facilitate exploration of Tor performance and security research problems and allow researchers to safely and privately conduct Tor experiments without risking harm to real Tor users. However, researchers using these tools configure them to generate network traffic based on simplifying assumptions and outdated measurements and without understanding the efficacy of their configuration choices. In this work, we design a novel technique for dynamically learning Tor network traffic models using hidden Markov modeling and privacy-preserving measurement techniques. We conduct a safe but detailed measurement study of Tor using 17 relays (~2% of Tor bandwidth) over the course of 6 months, measuring general statistics and models that can be used to generate a sequence of streams and packets. We show how our measurement results and traffic models can be used to generate traffic flows in private Tor networks and how our models are more realistic than standard and alternative network traffic generation~methods.  more » « less
Award ID(s):
1815757
NSF-PAR ID:
10107879
Author(s) / Creator(s):
; ;
Date Published:
Journal Name:
Proceedings of the 2018 ACM SIGSAC Conference on Computer and Communications Security
Page Range / eLocation ID:
1944 to 1961
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. null (Ed.)
    Tor is a popular low-latency anonymous communication system that focuses on usability and performance: a faster network will attract more users, which in turn will improve the anonymity of everyone using the system. The standard practice for previous research attempting to enhance Tor performance is to draw conclusions from the observed results of a single simulation for standard Tor and for each research variant. But because the simulations are run in sampled Tor networks, it is possible that sampling error alone could cause the observed effects. Therefore, we call into question the practical meaning of any conclusions that are drawn without considering the statistical significance of the reported results. In this paper, we build foundations upon which we improve the Tor experimental method. First, we present a new Tor network modeling methodology that produces more representative Tor networks as well as new and improved experimentation tools that run Tor simulations faster and at a larger scale than was previously possible. We showcase these contributions by running simulations with 6,489 relays and 792k simultaneously active users, the largest known Tor network simulations and the first at a network scale of 100%. Second, we present new statistical methodologies through which we: (i) show that running multiple simulations in independently sampled networks is necessary in order to produce informative results; and (ii) show how to use the results from multiple simulations to conduct sound statistical inference. We present a case study using 420 simulations to demonstrate how to apply our methodologies to a concrete set of Tor experiments and how to analyze the results. 
    more » « less
  2. null (Ed.)
    The Tor network estimates its relays’ bandwidths using relay self-measurements of client traffic speeds. These estimates largely determine how existing traffic load is balanced across relays, and they are used to evaluate the network’s capacity to handle future traffic load increases. Thus, their accuracy is important to optimize Tor’s performance and strategize for growth. However, their accuracy has never been measured. We investigate the accuracy of Tor’s capacity estimation with an analysis of public network data and an active experiment run over the entire live network. Our results suggest that the bandwidth estimates underestimate the total network capacity by at least 50% and that the errors are larger for high-bandwidth and low-uptime relays. Our work suggests that improving Tor’s bandwidth measurement system could improve the network’s performance and better inform plans to handle future growth. 
    more » « less
  3. null (Ed.)
    Network traffic modeling plays an important role in the generation of realistic network traffic in test environments. Especially in cases where researchers carry out experiments with real production-like traffic, as seen in specific home, enterprise, campus, LAN, or WAN networks. We present our ongoing work on a new framework that enables the methodical creation of application-agnostic traffic models from given network traces of a known network topology. The framework then uses these models to generate realistic traffic on a given network topology. We share a preliminary evaluation of the framework based on repeatable experiments where we model a typical web application traffic and then regenerate the traffic based on the model in a test network on our VTS (Virtual Topology Services) testbed. 
    more » « less
  4. Abstract

    This study presents a data-driven framework for modeling complex systems, with a specific emphasis on traffic modeling. Traditional methods in traffic modeling often rely on assumptions regarding vehicle interactions. Our approach comprises two steps: first, utilizing information- theoretic (IT) tools to identify interaction directions and candidate variables thus eliminating assumptions, and second, employing the sparse identification of nonlinear systems (SINDy) tool to establish functional relationships. We validate the framework’s efficacy using synthetic data from two distinct traffic models, while considering measurement noise. Results show that IT tools can reliably detect directions of interaction as well as instances of no interaction. SINDy proves instrumental in creating precise functional relationships and determining coefficients in tested models. The innovation of our framework lies in its ability to use data-driven approach to model traffic dynamics without relying on assumptions, thus offering applications in various complex systems beyond traffic.

     
    more » « less
  5. Website fingerprinting attacks, which use statistical analysis on network traffic to compromise user privacy, have been shown to be effective even if the traffic is sent over anonymity-preserving networks such as Tor. The classical attack model used to evaluate website fingerprinting attacks assumes an on-path adversary, who can observe all traffic traveling between the user’s computer and the secure network. In this work we investigate these attacks under a different attack model, in which the adversary is capable of sending a small amount of malicious JavaScript code to the target user’s computer. The malicious code mounts a cache side-channel attack, which exploits the effects of contention on the CPU’s cache, to identify other websites being browsed. The effectiveness of this attack scenario has never been systematically analyzed, especially in the open-world model which assumes that the user is visiting a mix of both sensitive and non-sensitive sites. We show that cache website fingerprinting attacks in JavaScript are highly feasible. Specifically, we use machine learning techniques to classify traces of cache activity. Unlike prior works, which try to identify cache conflicts, our work measures the overall occupancy of the last-level cache. We show that our approach achieves high classification accuracy in both the open-world and the closed-world models. We further show that our attack is more resistant than network-based fingerprinting to the effects of response caching, and that our techniques are resilient both to network-based defenses and to side-channel countermeasures introduced to modern browsers as a response to the Spectre attack. To protect against cache-based website fingerprinting, new defense mechanisms must be introduced to privacy-sensitive browsers and websites. We investigate one such mechanism, and show that generating artificial cache activity reduces the effectiveness of the attack and completely eliminates it when used in the Tor Browser 
    more » « less