Title: An Empirical Analysis of the Commercial VPN Ecosystem
Global Internet users increasingly rely on virtual private network (VPN) services to preserve their privacy, circumvent censorship, and access geo-filtered content. Due to their own lack of technical sophistication and the opaque nature of VPN clients, however, the vast majority of users have limited means to verify a given VPN service’s claims along any of these dimensions. We design an active measurement system to test various infrastructural and privacy aspects of VPN services and evaluate 62 commercial providers. Our results suggest that while commercial VPN services seem, on the whole, less likely to intercept or tamper with user traffic than other, previously studied forms of traffic proxying, many VPNs do leak user traffic—perhaps inadvertently—through a variety of means. We also find that a non-trivial fraction of VPN providers transparently proxy traffic, and many misrepresent the physical location of their vantage points: 5–30% of the vantage points, associated with 10% of the providers we study, appear to be hosted on servers located in countries other than those advertised to users.
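One infrastructural claim the measurement system checks is whether a vantage point is physically located in its advertised country. As a rough illustration of how such a claim can be falsified (this is a minimal sketch, not the paper's measurement system; the coordinates, RTT value, and fiber propagation factor are illustrative assumptions), the Python snippet below applies a speed-of-light feasibility check: if the round-trip time measured from a known landmark is smaller than the time a signal needs to reach the advertised location and return through fiber, the advertised location cannot be correct.

    # Minimal sketch: speed-of-light feasibility check for an advertised VPN
    # vantage-point location. All constants below are illustrative.
    from math import radians, sin, cos, asin, sqrt

    C_FIBER_KM_PER_MS = 200.0  # roughly 2/3 of the speed of light in vacuum

    def haversine_km(lat1, lon1, lat2, lon2):
        """Great-circle distance between two points, in kilometers."""
        lat1, lon1, lat2, lon2 = map(radians, (lat1, lon1, lat2, lon2))
        a = sin((lat2 - lat1) / 2) ** 2 + cos(lat1) * cos(lat2) * sin((lon2 - lon1) / 2) ** 2
        return 2 * 6371.0 * asin(sqrt(a))

    def location_is_feasible(measured_rtt_ms, landmark, advertised_location):
        """False if the measured RTT is lower than the minimum round-trip
        propagation time between the landmark and the advertised location."""
        dist_km = haversine_km(*landmark, *advertised_location)
        min_rtt_ms = 2 * dist_km / C_FIBER_KM_PER_MS
        return measured_rtt_ms >= min_rtt_ms

    # Example: a 12 ms RTT measured from a Frankfurt landmark is physically
    # impossible for a server that is really in Singapore.
    frankfurt, singapore = (50.11, 8.68), (1.35, 103.82)
    print(location_is_feasible(12.0, frankfurt, singapore))  # -> False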
Award ID(s):
1705050
NSF-PAR ID:
10100952
Author(s) / Creator(s):
Date Published:
2018
Journal Name:
Proceedings of the Internet Measurement Conference (IMC'18)
Page Range / eLocation ID:
443 to 456
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. Privacy technologies support the provision of online services while protecting user privacy. Cryptography lies at the heart of many such technologies, creating remarkable possibilities in terms of functionality while offering robust guarantees of data confidentiality. The cryptography literature and discourse often represent that these technologies eliminate the need to trust service providers, i.e., they enable users to protect their privacy even against untrusted service providers. Despite their apparent promise, privacy technologies have seen limited adoption in practice, and the most successful ones have been implemented by the very service providers these technologies purportedly protect users from. The adoption of privacy technologies by supposedly adversarial service providers highlights a mismatch between traditional models of trust in cryptography and the trust relationships that underlie deployed technologies in practice. Yet this mismatch, while well known to the cryptography and privacy communities, remains relatively poorly documented and examined in the academic literature—let alone broader media. This paper aims to fill that gap. Firstly, we review how the deployment of cryptographic technologies relies on a chain of trust relationships embedded in the modern computing ecosystem, from the development of software to the provision of online services, that is not fully captured by traditional models of trust in cryptography. Secondly, we turn to two case studies—web search and encrypted messaging—to illustrate how, rather than removing trust in service providers, cryptographic privacy technologies shift trust to a broader community of security and privacy experts and others, which in turn enables service providers to implicitly build and reinforce their trust relationship with users. Finally, concluding that the trust models inherent in the traditional cryptographic paradigm elide certain key trust relationships underlying deployed cryptographic systems, we highlight the need for organizational, policy, and legal safeguards to address that mismatch, and suggest some directions for future work.
  2. We report the first wide-scale measurement study of server-side geographic restriction, or geoblocking, a phenomenon in which server operators intentionally deny access to users from particular countries or regions. Many sites practice geoblocking due to legal requirements or other business reasons, but excessive blocking can needlessly deny valuable content and services to entire national populations. To help researchers and policymakers understand this phenomenon, we develop a semi-automated system to detect instances where whole websites were rendered inaccessible due to geoblocking. By focusing on detecting geoblocking capabilities offered by large CDNs and cloud providers, we can reliably distinguish the practice from dynamic anti-abuse mechanisms and network-based censorship. We apply our techniques to test for geoblocking across the Alexa Top 10K sites from thousands of vantage points in 177 countries. We then expand our measurement to a sample of CDN customers in the Alexa Top 1M. We find that geoblocking occurs across a broad set of countries and sites. We observe geoblocking in nearly all countries we study, with Iran, Syria, Sudan, Cuba, and Russia experiencing the highest rates. These countries experience particularly high rates of geoblocking for finance and banking sites, likely as a result of US economic sanctions. We also verify our measurements with data provided by Cloudflare, and find our observations to be accurate. 
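    As a rough illustration of the kind of signal such a detector can key on (a minimal sketch, not the study's classifier; the status code and signature strings are placeholders standing in for provider-specific block-page markers), the snippet below fetches a page and checks whether the response resembles a CDN-served geoblocking page.

        # Minimal sketch: flag responses that look like CDN geoblocking pages.
        # The signatures below are illustrative placeholders.
        import urllib.error
        import urllib.request

        GEOBLOCK_SIGNATURES = [
            "error 1009",                     # style of a CDN country-block code
            "not available in your country",  # generic block-page wording
            "access denied",                  # often paired with a 403 status
        ]

        def looks_geoblocked(url, timeout=10):
            """Return True if the response resembles a CDN geoblocking page."""
            try:
                with urllib.request.urlopen(url, timeout=timeout) as resp:
                    status, body = resp.status, resp.read(65536)
            except urllib.error.HTTPError as err:
                status, body = err.code, err.read(65536)
            except (urllib.error.URLError, OSError):
                return False  # unreachable: network failure, not a block page
            body = body.decode("utf-8", "replace").lower()
            return status == 403 and any(sig in body for sig in GEOBLOCK_SIGNATURES)

        print(looks_geoblocked("https://example.com/"))  # -> False for an open site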
  3. Edge data centers are an appealing place for telecommunication providers to offer in-network processing such as VPN services, security monitoring, and 5G. Placing these network services closer to users can reduce latency and core network bandwidth, but the deployment of network functions at the edge poses several important challenges. Edge data centers have limited resource capacity, yet network functions are resource-intensive with strict performance requirements. Replicating services at the edge is needed to meet demand, but balancing the load across multiple servers can be challenging due to diverse service costs, server and flow heterogeneity, and dynamic workload conditions. In this paper, we design and implement a model-based load balancer, EdgeBalance, for edge network data planes. EdgeBalance predicts the CPU demand of incoming traffic and adaptively distributes flows to servers to keep them evenly balanced. We overcome several challenges specific to network processing at the edge to improve throughput and latency over static load balancing and monitoring-based approaches.
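    As a minimal illustration of the model-based idea (a sketch only, not the EdgeBalance implementation; the per-service cost table and flow parameters are hypothetical), the snippet below predicts each flow's CPU demand from its service type and expected packet rate and dispatches it to the server with the lowest predicted load.

        # Minimal sketch: model-based flow dispatching. Costs are hypothetical.
        from dataclasses import dataclass

        # Illustrative per-packet CPU cost (arbitrary units) per network function.
        SERVICE_COST = {"vpn": 4.0, "ids": 6.5, "nat": 1.0}

        @dataclass
        class Server:
            name: str
            predicted_load: float = 0.0

        @dataclass
        class Flow:
            service: str
            est_pkts_per_sec: float

        def predicted_cpu_demand(flow):
            """Predict a flow's CPU demand from its service type and rate."""
            return SERVICE_COST[flow.service] * flow.est_pkts_per_sec

        def assign(flow, servers):
            """Send the flow to the server with the lowest predicted load."""
            target = min(servers, key=lambda s: s.predicted_load)
            target.predicted_load += predicted_cpu_demand(flow)
            return target

        servers = [Server("edge-0"), Server("edge-1")]
        for f in (Flow("vpn", 200), Flow("ids", 50), Flow("nat", 900)):
            print(f.service, "->", assign(f, servers).name)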
  4. Residential proxy has emerged as a service gaining popularity recently, in which proxy providers relay their customers’ network traffic through millions of proxy peers under their control. We find that many of these proxy peers are mobile devices, whose role in the proxy network can have significant security implications, since mobile devices tend to be privacy- and resource-sensitive. However, little effort has been made so far to understand the extent of their involvement, not to mention how these devices are recruited by the proxy network and what security and privacy risks they may pose. In this paper, we report the first measurement study on the mobile proxy ecosystem. Our study was made possible by a novel measurement infrastructure, which enabled us to identify proxy providers, to discover proxy SDKs (software development kits), to detect Android proxy apps built upon the proxy SDKs, to harvest proxy IP addresses, and to understand proxy traffic. The information collected through this infrastructure has brought us new understandings of this ecosystem and important security discoveries. More specifically, 4 proxy providers were found to offer app developers mobile proxy SDKs as a competitive app monetization channel, with $50K per month per 1M MAU (monthly active users). 1,701 Android APKs (belonging to 963 Android apps) turn out to have integrated those proxy SDKs, with most of them available on Google Play with at least 300M installations in total. Furthermore, 48.43% of these APKs are flagged by at least 5 anti-virus engines as malicious, which could explain why 86.60% of the 963 Android apps had been removed from Google Play by Oct 2019. In addition, while these apps display user consent dialogs on traffic relay, our user study indicates that the user consent texts are quite confusing. We even discover a proxy SDK that stealthily relays traffic without showing any notifications. We also captured 625K cellular proxy IPs, along with a set of suspicious activities observed in proxy traffic, such as ad fraud. We have reported our findings to affected parties, offered suggestions, and proposed methodologies to detect proxy apps and proxy traffic.
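    As a minimal illustration of one step in such a detection pipeline (a sketch only, not the study's methodology; the SDK package prefixes are hypothetical placeholders), the snippet below flags an APK when its DEX bytecode references a known proxy-SDK package.

        # Minimal sketch: flag APKs whose DEX code references a proxy-SDK package.
        # The prefixes below are hypothetical, not the SDKs identified in the study.
        import zipfile

        PROXY_SDK_PREFIXES = [b"com/example/proxysdk/", b"io/example/relaykit/"]

        def apk_contains_proxy_sdk(apk_path):
            """Scan classes*.dex entries inside the APK for a proxy-SDK prefix."""
            with zipfile.ZipFile(apk_path) as apk:
                for name in apk.namelist():
                    if name.startswith("classes") and name.endswith(".dex"):
                        dex_bytes = apk.read(name)
                        if any(prefix in dex_bytes for prefix in PROXY_SDK_PREFIXES):
                            return True
            return False

        # Usage: apk_contains_proxy_sdk("some_app.apk") -> True or False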
  5. Development of a comprehensive legal privacy framework in the United States should be based on identification of the common deficiencies of privacy policies. We attempt to delineate deficiencies by critically analyzing the privacy policies of mobile apps, application suites, social networks, Internet Service Providers, and Internet-of-Things devices. Whereas many studies have examined readability of privacy policies, few have specifically identified the information that should be provided in privacy policies but is not. Privacy legislation invariably starts with a definition of personally identifiable information. We find that privacy policies’ definitions of personally identifiable information are far too restrictive, excluding information that does not itself identify a person but which can be used to reasonably identify a person, and excluding information paired with a device identifier which can be reasonably linked to a person. Legislation should define personally identifiable information to include such information, and should differentiate between information paired with a name versus information paired with a device identifier. Privacy legislation often excludes anonymous and de-identified information from notice and choice requirements. We find that privacy policies’ descriptions of anonymous and de-identified information are far too broad, including information paired with advertising identifiers. Computer science has repeatedly demonstrated that such information is reasonably linkable. Legislation should define these categories of information to align with technological abilities. Legislation should also not exempt de-identified information from notice requirements, to increase transparency. Privacy legislation relies heavily on notice requirements. We find that, because privacy policies’ disclosures of the uses of personal information are disconnected from their disclosures about the types of personal information collected, we are often unable to determine which types of information are used for which purposes. Often, we cannot determine whether location or web browsing history is used solely for functional purposes or also for advertising. Legislation should require the disclosure of the purposes for each type of personal information collected. We also find that, because privacy policies’ disclosures of sharing of personal information are disconnected from their disclosures about the types of personal information collected, we are often unable to determine which types of information are shared. Legislation should require the disclosure of the types of personal information shared. Finally, privacy legislation relies heavily on user choice. We find that free services often require the collection and sharing of personal information. As a result, users often have no choices. We find that whereas some paid services afford users a wide variety of choices, paid services in less competitive sectors often afford users few choices over use and sharing of personal information for purposes unrelated to the service. As a result, users are often unable to dictate which types of information they wish to allow to be shared, and which types they wish to allow to be used for advertising. Legislation should differentiate between take-it-or-leave-it, opt-out, and opt-in approaches based on the type of use and on whether the information is shared. Congress should consider whether user choices should be affected by the presence of market power.