-
Website fingerprinting (WF) attacks allow an adversary to associate a website with the encrypted traffic patterns produced when accessing it, thus threatening to destroy the client-server unlinkability promised by anonymous communication networks. Explainable WF is an open problem in which we need to improve our understanding of (1) the machine learning models used to conduct WF attacks; and (2) the WF datasets used as inputs to those models. This paper focuses on explainable datasets; that is, we develop an alternative to the standard practice of gathering low-quality WF datasets using synthetic browsers in large networks without controlling for natural network variability. In particular, we demonstrate how network simulation can be used to produce explainable WF datasets by leveraging the simulator's high degree of control over network operation. Through a detailed investigation of the effect of network variability on WF performance, we find that: (1) training and testing WF attacks in networks with distinct levels of congestion increases the false-positive rate by as much as 200%; (2) augmenting the WF attacks by training them across several networks with varying degrees of congestion decreases the false-positive rate by as much as 83%; and (3) WF classifiers trained on completely simulated data can achieve greater than 80% accuracy when applied to the real world.
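As a rough illustration of the augmentation idea described above (not the paper's actual pipeline), the following Python sketch pools traces from several simulated networks with different congestion levels before training a classifier and measuring the false-positive rate on traces from an unseen network. The loader, network names, and feature dimensions are placeholders invented for this example.

```python
# Hedged sketch: train a WF classifier on traces pooled from several
# simulated networks, then evaluate on a network with a congestion level
# not seen during training. All data below is synthetic placeholder data.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import confusion_matrix

rng = np.random.default_rng(0)

def load_traces(network_id, n_traces=200, n_features=100):
    """Placeholder loader: real code would parse simulated packet traces.
    Here it returns random features and binary (monitored/unmonitored) labels."""
    X = rng.normal(size=(n_traces, n_features))
    y = rng.integers(0, 2, size=n_traces)
    return X, y

# Pool training data from networks with varying congestion (augmentation).
train_networks = ["low_congestion", "medium_congestion", "high_congestion"]
train_data = [load_traces(n) for n in train_networks]
X_train = np.vstack([X for X, _ in train_data])
y_train = np.concatenate([y for _, y in train_data])

# Test on a congestion level that was not present in the training set.
X_test, y_test = load_traces("unseen_congestion")

clf = RandomForestClassifier(n_estimators=100, random_state=0)
clf.fit(X_train, y_train)
pred = clf.predict(X_test)

# False-positive rate for the binary monitored-vs-unmonitored setting.
tn, fp, fn, tp = confusion_matrix(y_test, pred, labels=[0, 1]).ravel()
print("false-positive rate:", fp / (fp + tn))
```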
-
Network experimentation tools are vitally important to the process of developing, evaluating, and testing distributed systems. The state-of-the-art simulation tools are either prohibitively inefficient at large scales or are limited by nontrivial architectural challenges, inhibiting their widespread adoption. In this paper, we present the design and implementation of Phantom, a novel tool for conducting distributed system experiments. In Phantom, a discrete-event network simulator directly executes unmodified applications as Linux processes and innovatively synthesizes efficient process control, system call interposition, and data transfer methods to co-opt the processes into the simulation environment. Our evaluation demonstrates that Phantom is up to 2.2× faster than Shadow, up to 3.4× faster than NS-3, and up to 43× faster than gRaIL in large P2P benchmarks while offering performance comparable to Shadow in large Tor network simulations.
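The toy Python loop below is only a conceptual sketch of the discrete-event core that simulators like Phantom build on: events are dequeued and processed in simulated-time order, so results do not depend on wall-clock speed. Phantom's actual process control, system-call interposition, and data-transfer machinery are not shown, and the host names and latencies are made up.

```python
# Minimal discrete-event loop: a priority queue of (time, callback) events
# processed in simulated-time order. Conceptual illustration only.
import heapq

class Simulator:
    def __init__(self):
        self.now = 0.0
        self._queue = []   # entries are (event_time, sequence, callback)
        self._seq = 0      # tie-breaker so callbacks are never compared

    def schedule(self, delay, callback):
        heapq.heappush(self._queue, (self.now + delay, self._seq, callback))
        self._seq += 1

    def run(self):
        while self._queue:
            self.now, _, callback = heapq.heappop(self._queue)
            callback()

sim = Simulator()

def send_packet(src, dst, latency):
    # Model a network hop as a fixed-latency delivery event.
    sim.schedule(latency, lambda: print(f"t={sim.now:.3f}s: {dst} received packet from {src}"))

sim.schedule(0.0, lambda: send_packet("hostA", "hostB", 0.050))
sim.schedule(0.0, lambda: send_packet("hostB", "hostA", 0.050))
sim.run()
```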
-
Tor is a popular low-latency anonymous communication system that focuses on usability and performance: a faster network will attract more users, which in turn will improve the anonymity of everyone using the system. The standard practice for previous research attempting to enhance Tor performance is to draw conclusions from the observed results of a single simulation for standard Tor and for each research variant. But because the simulations are run in sampled Tor networks, it is possible that sampling error alone could cause the observed effects. Therefore, we call into question the practical meaning of any conclusions that are drawn without considering the statistical significance of the reported results. In this paper, we build foundations upon which we improve the Tor experimental method. First, we present a new Tor network modeling methodology that produces more representative Tor networks as well as new and improved experimentation tools that run Tor simulations faster and at a larger scale than was previously possible. We showcase these contributions by running simulations with 6,489 relays and 792k simultaneously active users, the largest known Tor network simulations and the first at a network scale of 100%. Second, we present new statistical methodologies through which we: (i) show that running multiple simulations in independently sampled networks is necessary in order to produce informative results; and (ii) show how to use the results from multiple simulations to conduct sound statistical inference. We present a case study using 420 simulations to demonstrate how to apply our methodologies to a concrete set of Tor experiments and how to analyze the results.
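A minimal Python sketch of the kind of inference advocated above: each condition is run in several independently sampled networks and summarized with a confidence interval, rather than compared on the basis of a single observation. The metric values below are illustrative placeholders, not results from the paper.

```python
# Hedged sketch: summarize a performance metric measured in multiple
# independently sampled simulated networks with a mean and Student-t
# confidence interval, for both a baseline and a research variant.
import numpy as np
from scipy import stats

def mean_ci(samples, confidence=0.95):
    """Mean and t-distribution confidence interval over independent runs."""
    samples = np.asarray(samples, dtype=float)
    mean = samples.mean()
    half = stats.sem(samples) * stats.t.ppf((1 + confidence) / 2, df=len(samples) - 1)
    return mean, mean - half, mean + half

# One median-download-time measurement (seconds) per sampled network;
# the numbers here are placeholders for illustration only.
vanilla = [4.1, 3.8, 4.4, 4.0, 4.2]
variant = [3.6, 3.9, 3.5, 3.7, 3.8]

for name, runs in [("vanilla", vanilla), ("variant", variant)]:
    m, lo, hi = mean_ci(runs)
    print(f"{name}: mean={m:.2f}s, 95% CI=({lo:.2f}, {hi:.2f})")
```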
-
The Tor network estimates its relays' bandwidths using relay self-measurements of client traffic speeds. These estimates largely determine how existing traffic load is balanced across relays, and they are used to evaluate the network's capacity to handle future traffic load increases. Thus, their accuracy is important to optimize Tor's performance and strategize for growth. However, their accuracy has never been measured. We investigate the accuracy of Tor's capacity estimation with an analysis of public network data and an active experiment run over the entire live network. Our results suggest that the bandwidth estimates underestimate the total network capacity by at least 50% and that the errors are larger for high-bandwidth and low-uptime relays. Our work suggests that improving Tor's bandwidth measurement system could improve the network's performance and better inform plans to handle future growth.
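As a hedged illustration of the comparison at the heart of this measurement (not the paper's methodology), the sketch below sums per-relay estimated and measured bandwidths to compute the fraction of total capacity the estimates miss. The field names and numbers are assumptions invented for this example.

```python
# Placeholder sketch: quantify how much relay bandwidth estimates
# undercount independently measured capacity, network-wide.
def capacity_underestimate(relays):
    """relays: iterable of dicts with 'estimated_bw' and 'measured_bw' in bytes/s."""
    estimated = sum(r["estimated_bw"] for r in relays)
    measured = sum(r["measured_bw"] for r in relays)
    return (measured - estimated) / measured  # fraction of capacity missed

relays = [
    {"nickname": "relayA", "estimated_bw": 5_000_000, "measured_bw": 12_000_000},
    {"nickname": "relayB", "estimated_bw": 2_000_000, "measured_bw": 3_500_000},
]
print(f"estimates undercount total capacity by {capacity_underestimate(relays):.0%}")
```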