

Title: Once is Never Enough: Foundations for Sound Statistical Inference in Tor Network Experimentation
Tor is a popular low-latency anonymous communication system that focuses on usability and performance: a faster network will attract more users, which in turn will improve the anonymity of everyone using the system. The standard practice for previous research attempting to enhance Tor performance is to draw conclusions from the observed results of a single simulation for standard Tor and for each research variant. But because the simulations are run in sampled Tor networks, it is possible that sampling error alone could cause the observed effects. Therefore, we call into question the practical meaning of any conclusions that are drawn without considering the statistical significance of the reported results. In this paper, we build foundations upon which we improve the Tor experimental method. First, we present a new Tor network modeling methodology that produces more representative Tor networks as well as new and improved experimentation tools that run Tor simulations faster and at a larger scale than was previously possible. We showcase these contributions by running simulations with 6,489 relays and 792k simultaneously active users, the largest known Tor network simulations and the first at a network scale of 100%. Second, we present new statistical methodologies through which we: (i) show that running multiple simulations in independently sampled networks is necessary in order to produce informative results; and (ii) show how to use the results from multiple simulations to conduct sound statistical inference. We present a case study using 420 simulations to demonstrate how to apply our methodologies to a concrete set of Tor experiments and how to analyze the results.
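To make the point about sampling error concrete, here is a minimal, generic sketch (a textbook Student-t interval, not the statistical methodology developed in the paper) that summarizes one metric across n simulations run in independently sampled networks. The metric (a hypothetical per-simulation median download time), the sample values, the function name, and the hardcoded critical-value table are all illustrative assumptions. Note that with a single simulation no interval can be formed at all.

```python
import math
import statistics

# Two-sided 95% Student-t critical values t_{0.975, df} for small df.
# Hardcoded to stay stdlib-only; scipy.stats.t.ppf(0.975, df) covers any df.
T_975 = {1: 12.706, 2: 4.303, 3: 3.182, 4: 2.776, 5: 2.571,
         6: 2.447, 7: 2.365, 8: 2.306, 9: 2.262, 10: 2.228}


def mean_confidence_interval(samples):
    """Return (mean, lower, upper) of a 95% t-interval over per-simulation values.

    Each sample is assumed (hypothetically) to be one summary value taken
    from one simulation run in an independently sampled network.
    """
    n = len(samples)
    if n < 2:
        # One run gives no estimate of the variance due to network sampling.
        raise ValueError("need at least 2 independent simulations")
    df = n - 1
    if df not in T_975:
        raise ValueError("critical-value table only covers 2 to 11 samples")
    mean = statistics.mean(samples)
    sem = statistics.stdev(samples) / math.sqrt(n)  # standard error of the mean
    half_width = T_975[df] * sem
    return mean, mean - half_width, mean + half_width


# Made-up per-simulation median download times in seconds, not real results.
baseline = [2.31, 2.45, 2.18, 2.52, 2.40]
variant = [2.22, 2.38, 2.11, 2.49, 2.30]
for name, runs in [("baseline", baseline), ("variant", variant)]:
    m, lo, hi = mean_confidence_interval(runs)
    print(f"{name}: mean={m:.3f}  95% CI=[{lo:.3f}, {hi:.3f}]")
```

With these invented numbers the two intervals overlap substantially, so a difference seen in any single pair of runs could plausibly be sampling error; adding independently sampled runs narrows the intervals. Overlap is only a rough heuristic here, not a formal hypothesis test.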
Award ID(s):
1925497
NSF-PAR ID:
10287720
Author(s) / Creator(s):
Date Published:
Journal Name:
USENIX Security Symposium
Sponsoring Org:
National Science Foundation
More Like this
  1.
    Many promising networking research ideas in programmable networks never see the light of day. Yet, deploying research prototypes in production networks can help validate research ideas, improve them with faster feedback, uncover new research questions, and also ease the subsequent transition to practice. In this paper, we show how researchers can run and validate their research ideas in their own backyards (on their production campus networks), and we have seen that such a demonstrator can expedite the deployment of a research idea in practice to solve real network operation problems. We present P4Campus, a proof-of-concept that encompasses tools, an infrastructure design, strategies, and best practices (both technical and non-technical) that can help researchers run experiments on their programmable-network ideas in their own networks. We use network tapping devices, packet brokers, and commodity programmable switches to enable running experiments to evaluate research ideas on a production campus network. We present several compelling data-plane applications as use cases that run on our campus and solve production network problems. By sharing our experiences and open-sourcing our P4 apps [28], we hope to encourage similar efforts on other campuses.
  3. Experimentation tools facilitate exploration of Tor performance and security research problems and allow researchers to safely and privately conduct Tor experiments without risking harm to real Tor users. However, researchers using these tools configure them to generate network traffic based on simplifying assumptions and outdated measurements and without understanding the efficacy of their configuration choices. In this work, we design a novel technique for dynamically learning Tor network traffic models using hidden Markov modeling and privacy-preserving measurement techniques. We conduct a safe but detailed measurement study of Tor using 17 relays (~2% of Tor bandwidth) over the course of 6 months, measuring general statistics and models that can be used to generate a sequence of streams and packets. We show how our measurement results and traffic models can be used to generate traffic flows in private Tor networks and how our models are more realistic than standard and alternative network traffic generation methods.
  4. We study the problem of load-balancing in path selection in anonymous networks such as Tor. We first find that the current Tor path selection strategy can create significant imbalances. We then develop a (locally) optimal algorithm for selecting paths and show, using flow-level simulation, that it results in much better balancing of load across the network. Our initial algorithm uses the complete state of the network, which is impractical in a distributed setting and can compromise users' privacy. We therefore develop a revised algorithm that relies on a periodic, differentially private summary of the network state to approximate the optimal assignment. Our simulations show that the revised algorithm significantly outperforms the current strategy while maintaining provable privacy guarantees.
  5. Motivated by a wide range of applications, research on agent-based models of contagion propagation over networks has attracted a lot of attention in the literature. Many of the available software systems for simulating such agent-based models require users to download software, build the executable, and set up execution environments. Further, running the resulting executable may require access to high performance computing clusters. Our work describes an open access software system (NetSimS) that works under the “Modeling and Simulation as a Service” (MSaaS) paradigm. It enables users to run simulations by selecting models and parameter values, initial conditions, and networks through a web interface. The system supports a variety of models and networks with millions of nodes and edges. In addition to the simulator, the system includes components that enable users to choose initial conditions for simulations in a variety of ways, to analyze the data generated through simulations, and to produce plots from the data. We describe the components of NetSimS and carry out a performance evaluation of the system. We also discuss two case studies carried out on large networks using the system. NetSimS is a major component within net.science, a cyberinfrastructure for network science. 
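The traffic-modeling entry above (item 3) generates streams and packets from hidden Markov models. As a rough illustration of the generative side only, the sketch below walks a toy two-state HMM and emits (direction, inter-packet delay) observations. The state names, probabilities, emission alphabet, and function names are invented for illustration; they are not the measured Tor models from that work.

```python
import random

# Toy HMM; every parameter here is hypothetical, not a measured Tor value.
START = {"active": 0.6, "idle": 0.4}
TRANSITION = {
    "active": {"active": 0.8, "idle": 0.2},
    "idle": {"active": 0.3, "idle": 0.7},
}
# Each hidden state emits an observable (direction, inter-packet delay in ms).
EMISSION = {
    "active": {("to_client", 1): 0.7, ("to_server", 1): 0.3},
    "idle": {("to_client", 500): 0.5, ("to_server", 500): 0.5},
}


def draw(dist, rng):
    """Sample one outcome from a {outcome: probability} dict."""
    outcomes = list(dist)
    weights = [dist[o] for o in outcomes]
    return rng.choices(outcomes, weights=weights, k=1)[0]


def generate_packets(n, seed=None):
    """Generate n (direction, delay_ms) observations from the toy HMM."""
    rng = random.Random(seed)
    state = draw(START, rng)
    packets = []
    for _ in range(n):
        packets.append(draw(EMISSION[state], rng))  # emit from the current hidden state
        state = draw(TRANSITION[state], rng)  # then transition to the next hidden state
    return packets


print(generate_packets(10, seed=1))
```

In a real pipeline the transition and emission probabilities would be learned from measurements (for example with an expectation-maximization procedure such as Baum-Welch) rather than set by hand; the seeded generator simply makes a given synthetic trace reproducible.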
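The load-balancing entry above (item 4) selects paths using a periodic, differentially private summary of network state. One simple way to picture that idea is to publish each relay's load with Laplace noise and then pick relays with probability proportional to their estimated spare capacity. The Laplace mechanism itself is standard, but the weighting rule, relay names, capacities, and the epsilon and sensitivity values below are assumptions for illustration; this is not the (locally) optimal algorithm from that paper.

```python
import random


def laplace_noise(scale, rng):
    """Laplace(0, scale) sample, built as the difference of two Exp(1/scale) draws."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def private_load_summary(loads, epsilon, sensitivity, rng):
    """Publish each relay's load plus Laplace noise of scale sensitivity/epsilon.

    `sensitivity` is the assumed L1 sensitivity of the whole load vector,
    i.e. how much one user's traffic can change the sum of all relay loads.
    """
    scale = sensitivity / epsilon
    return {relay: load + laplace_noise(scale, rng) for relay, load in loads.items()}


def pick_relay(capacity, noisy_load, rng):
    """Choose a relay with probability proportional to estimated spare capacity."""
    relays = list(capacity)
    # Clamp to a tiny positive floor so weights stay valid and every relay remains selectable.
    spare = [max(capacity[r] - noisy_load[r], 1e-6) for r in relays]
    return rng.choices(relays, weights=spare, k=1)[0]


rng = random.Random(42)
capacity = {"relayA": 100.0, "relayB": 50.0, "relayC": 80.0}  # hypothetical Mbit/s
loads = {"relayA": 90.0, "relayB": 10.0, "relayC": 40.0}  # hypothetical Mbit/s
summary = private_load_summary(loads, epsilon=1.0, sensitivity=1.0, rng=rng)
print(pick_relay(capacity, summary, rng))
```

Because the summary is periodic, each re-publication would consume additional privacy budget; clients reuse the latest noisy summary for every selection until the next one is released, which is what keeps per-selection queries from leaking extra information.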
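The NetSimS entry above (item 5) concerns agent-based models of contagion over networks. For readers new to such models, a minimal example is a discrete-time SIR process on a graph: each infected node infects each susceptible neighbor with probability `beta` per step and recovers with probability `gamma`. The graph, parameters, and function name below are hypothetical; this is not NetSimS code or its interface.

```python
import random


def simulate_sir(adjacency, initial_infected, beta, gamma, steps, seed=None):
    """Discrete-time SIR on a graph; returns the number infected after each step."""
    rng = random.Random(seed)
    state = {node: "S" for node in adjacency}
    for node in initial_infected:
        state[node] = "I"
    history = []
    for _ in range(steps):
        infected = [n for n, s in state.items() if s == "I"]
        newly_infected = set()
        for node in infected:
            for neighbor in adjacency[node]:
                if state[neighbor] == "S" and rng.random() < beta:
                    newly_infected.add(neighbor)
        # Synchronous update: recoveries and infections both use the start-of-step snapshot.
        for node in infected:
            if rng.random() < gamma:
                state[node] = "R"
        for node in newly_infected:
            state[node] = "I"
        history.append(sum(1 for s in state.values() if s == "I"))
    return history


# Hypothetical 10-node ring graph seeded with one infected node.
ring = {i: [(i - 1) % 10, (i + 1) % 10] for i in range(10)}
print(simulate_sir(ring, initial_infected=[0], beta=0.5, gamma=0.2, steps=15, seed=3))
```

The synchronous update keeps results independent of node iteration order; systems built for graphs with millions of nodes and edges typically parallelize this same per-step loop.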