Title: Honey Onions: A framework for characterizing and identifying misbehaving Tor HSDirs
In the last decade, Tor has proved to be a very successful and widely popular system for protecting users' anonymity. However, Tor remains a practical system with a variety of limitations, some of which have indeed been exploited in the recent past. In particular, Tor's security relies on the fact that a substantial number of its nodes do not misbehave. In this work, we introduce the concept of honey onions, a framework for detecting misbehaving Tor relays with HSDir capability. This allows us to obtain lower bounds on misbehavior among relays. We propose algorithms to both estimate the number of snooping HSDirs and identify the most likely snoopers. Our experimental results indicate that during the period of the study (72 days) at least 110 such nodes were snooping information about hidden services they host. We reveal that more than half of them were hosted on cloud infrastructure and delayed the use of the learned information to prevent easy traceback.
Award ID(s):
1643249
NSF-PAR ID:
10046941
Journal Name:
IEEE Conference on Communications and Network Security (CNS)
Page Range / eLocation ID:
127 to 135
Sponsoring Org:
National Science Foundation
More Like this
  1. Recent work has shown that Tor is vulnerable to attacks that manipulate inter-domain routing to compromise user privacy. Proposed solutions such as Counter-RAPTOR [29] attempt to ameliorate this issue by favoring Tor entry relays that have high resilience to these attacks. However, because these defenses bias Tor path selection on the identity of the client, they invariably leak probabilistic information about client identities. In this work, we make the following contributions. First, we identify a novel means to quantify privacy leakage in guard selection algorithms using the metric of Max-Divergence. Max-Divergence ensures that probabilistic privacy loss is within strict bounds while also providing composability over time. Second, we utilize Max-Divergence and multiple notions of entropy to understand privacy loss in the worst case for Counter-RAPTOR. Our worst-case analysis provides a fresh perspective to the field, as prior work such as Counter-RAPTOR only analyzed average-case privacy loss. Third, we propose modifications to Counter-RAPTOR that incorporate worst-case Max-Divergence in its design. Specifically, we utilize the exponential mechanism (a mechanism for differential privacy) to guarantee a worst-case bound on Max-Divergence/privacy loss. For the quality function used in the exponential mechanism, we show that a Monte-Carlo sampling-based method for stochastic optimization can be used to improve multi-dimensional trade-offs between security, privacy, and performance. Finally, we demonstrate that compared to Counter-RAPTOR, our approach achieves an 83% decrease in Max-Divergence after one guard selection and a 245% increase in worst-case Shannon entropy after 5 guard selections. Notably, experimental evaluations using the Shadow emulator show that our approach provides these privacy benefits with minimal impact on system performance.
  2. Tor is a popular low-latency anonymous communication system that focuses on usability and performance: a faster network will attract more users, which in turn will improve the anonymity of everyone using the system. The standard practice for previous research attempting to enhance Tor performance is to draw conclusions from the observed results of a single simulation for standard Tor and for each research variant. But because the simulations are run in sampled Tor networks, it is possible that sampling error alone could cause the observed effects. Therefore, we call into question the practical meaning of any conclusions that are drawn without considering the statistical significance of the reported results. In this paper, we build foundations upon which we improve the Tor experimental method. First, we present a new Tor network modeling methodology that produces more representative Tor networks as well as new and improved experimentation tools that run Tor simulations faster and at a larger scale than was previously possible. We showcase these contributions by running simulations with 6,489 relays and 792k simultaneously active users, the largest known Tor network simulations and the first at a network scale of 100%. Second, we present new statistical methodologies through which we: (i) show that running multiple simulations in independently sampled networks is necessary in order to produce informative results; and (ii) show how to use the results from multiple simulations to conduct sound statistical inference. We present a case study using 420 simulations to demonstrate how to apply our methodologies to a concrete set of Tor experiments and how to analyze the results.
  3. The Tor network estimates its relays’ bandwidths using relay self-measurements of client traffic speeds. These estimates largely determine how existing traffic load is balanced across relays, and they are used to evaluate the network’s capacity to handle future traffic load increases. Thus, their accuracy is important to optimize Tor’s performance and strategize for growth. However, their accuracy has never been measured. We investigate the accuracy of Tor’s capacity estimation with an analysis of public network data and an active experiment run over the entire live network. Our results suggest that the bandwidth estimates underestimate the total network capacity by at least 50% and that the errors are larger for high-bandwidth and low-uptime relays. Our work suggests that improving Tor’s bandwidth measurement system could improve the network’s performance and better inform plans to handle future growth.
  4. We propose two schemes for asynchronous multi‐relay two‐way relay (MR‐TWR) systems in which neither the users nor the relays know the channel state information. In an MR‐TWR system, two users exchange their messages with the help of N_R relays. Most of the existing works on MR‐TWR systems based on differential modulation assume perfect symbol‐level synchronization between all communicating nodes. However, this assumption is not valid in many practical systems, which makes the design of differentially modulated schemes more challenging. Therefore, we design differential modulation schemes that can tolerate timing misalignment under frequency‐selective fading. We investigate the performance of the proposed schemes in terms of either probability of bit error or pairwise error probability. Through numerical examples, we show that the proposed schemes outperform existing competing solutions in the literature, especially for high signal‐to‐noise ratio values. Copyright © 2016 John Wiley & Sons, Ltd.
  5. In this paper, we consider the problem of constructing paths using decode and forward (DF) relays for millimeter wave (mmWave) backhaul communications in urban environments. Due to the large number of obstacles in urban environments, line-of-sight (LoS) wireless links, which are necessary for backhaul communication, often do not exist between small-cell base stations. To address this, some earlier works proposed creating multi-hop paths that use mmWave relay nodes with LoS communication between every pair of consecutive nodes to form logical links between base stations. We present algorithms, based on a novel widest-path formulation of the problem, for selecting decode and forward relay node locations in such paths. Our main algorithm is the first polynomial-time algorithm that constructs a relay path with a throughput that is proven to be the maximum possible. We also present variations of this algorithm for constrained problems in which: 1) each possible relay location can host only one relay node, and 2) minimizing the number of hops in the relay path is also an objective. For all of the proposed algorithms, the achievable throughput and numbers of relays are evaluated through simulation based on a 3-D model of a section of downtown Atlanta. The results show that, over a large number of random cases, our algorithm can always find paths with very high throughput using a small number of relays. We also compare and contrast the results with our earlier work that studied the use of amplify-and-forward (AF) relays for the same scenario. 