Internet censorship is pervasive, with significant effort dedicated to understanding what is censored, and where. Prior censorship measurements, however, have identified significant inconsistencies in their results; experiments show unexplained non-deterministic behaviors thought to be caused by censor load, end-host geographic diversity, or incomplete censorship—inconsistencies which impede reliable, repeatable, and correct understanding of global censorship. In this work we investigate the extent to which Equal-cost Multi-path (ECMP) routing is the cause of these inconsistencies, developing methods to measure and compensate for them. We find that ECMP routing significantly changes observed censorship across protocols, censor mechanisms, and in 18 countries. We identify that previously observed non-determinism or regional variations are attributable to measurements between fixed end-hosts taking different routes based on Flow-ID; i.e., the choice of intra-subnet source IP or ephemeral source port leads to differences in observed censorship. To achieve this we develop new route-stable censorship measurement methods that allow consistent measurement of DNS, HTTP, and HTTPS censorship. We find ECMP routing yields censorship changes across 42% of IPs and 51% of ASes, but that the impact is not uniform. We develop an application-level traceroute tool to construct network paths using specific censored packets, leading us to identify numerous causes of the behavior, ranging from likely failed infrastructure to routes to the same end-host taking geographically diverse paths which experience differences in censorship en route. Finally, we compare our results to prior global measurements, demonstrating that prior studies were possibly impacted by this phenomenon and that specific results are explainable by ECMP routing. Our work points to methods for improving future studies, reducing inconsistencies and increasing repeatability.
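The Flow-ID dependence the abstract describes can be illustrated with a toy model of ECMP next-hop selection: routers typically hash the flow 5-tuple and use the result to pick among equal-cost next hops, so changing only the ephemeral source port can move a measurement onto a different path. This is a minimal sketch, not the paper's tooling; SHA-256 stands in for the vendor-specific hash a real router would use, and all addresses and ports are hypothetical documentation values.

```python
import hashlib

def ecmp_next_hop(src_ip, dst_ip, src_port, dst_port, proto, n_paths):
    """Select one of n_paths equal-cost next hops from a hash of the
    flow 5-tuple. SHA-256 is illustrative; real routers use their own
    hash functions, but the mechanism is the same."""
    key = f"{src_ip}|{dst_ip}|{src_port}|{dst_port}|{proto}".encode()
    return int.from_bytes(hashlib.sha256(key).digest()[:4], "big") % n_paths

# Identical endpoints, 32 different ephemeral source ports: the flows
# spread across next hops, so two "identical" measurements can traverse
# different routers -- and meet different censorship -- en route.
hops = {ecmp_next_hop("198.51.100.10", "203.0.113.7", p, 443, 6, 4)
        for p in range(40000, 40032)}
```

Because the selection is deterministic in the 5-tuple, pinning the source IP and source port (as a route-stable measurement method must) pins the path, while letting the OS choose an ephemeral port re-rolls it on every connection.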
Leveraging NLP and Social Network Analytic Techniques to Detect Censored Keywords: System Design and Experiments
Internet regulation in the form of online censorship and Internet shutdowns has been increasing in recent years. This paper presents a natural language processing (NLP) application for performing cross-country probing that conceals the exact location of the originating request. A detailed discussion of the application aims to stimulate further investigation into new methods for measuring and quantifying Internet censorship practices around the world. In addition, results from two experiments involving search-engine queries for banned keywords demonstrate that censorship practices vary across different search engines. These results suggest opportunities for developing circumvention technologies that enable open and free access to information.
- Award ID(s): 1704113
- PAR ID: 10463464
- Journal Name: 52nd Hawaii International Conference on System Sciences (HICSS)
- Sponsoring Org: National Science Foundation
More Like this
-
Internet censorship imposes restrictions on what information can be publicized or viewed on the Internet. According to Freedom House’s annual Freedom on the Net report, more than half the world’s Internet users now live in a place where the Internet is censored or restricted. China has built the world’s most extensive and sophisticated online censorship system. In this paper, we describe a new corpus of censored and uncensored social media tweets from a Chinese microblogging website, Sina Weibo, collected by tracking posts that mention ‘sensitive’ topics or are authored by ‘sensitive’ users. We use this corpus to build a neural network classifier to predict censorship. Our model achieves 88.50% accuracy using only linguistic features. We discuss these features in detail and hypothesize that they could potentially be used for censorship circumvention.
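The abstract's classifier is a neural network over linguistic features; the core idea — bag-of-words features feeding a learned linear decision rule — can be sketched with a dependency-free perceptron. This is an illustrative toy, not the paper's model: the vocabulary, samples, and labels below are all hypothetical.

```python
def featurize(text, vocab):
    """Bag-of-words counts over a fixed vocabulary."""
    words = text.split()
    return [words.count(w) for w in vocab]

def train_perceptron(samples, labels, vocab, epochs=10):
    """Online perceptron: nudge weights toward each misclassified example."""
    w, b = [0.0] * len(vocab), 0.0
    for _ in range(epochs):
        for text, y in zip(samples, labels):
            x = featurize(text, vocab)
            pred = 1 if sum(wi * xi for wi, xi in zip(w, x)) + b > 0 else 0
            if pred != y:
                delta = y - pred  # +1 or -1
                w = [wi + delta * xi for wi, xi in zip(w, x)]
                b += delta
    return w, b

def predict(text, vocab, w, b):
    x = featurize(text, vocab)
    return 1 if sum(wi * xi for wi, xi in zip(w, x)) + b > 0 else 0

# Hypothetical toy data: label 1 = censored post, 0 = uncensored post.
vocab = ["protest", "weather"]
samples = ["protest march", "sunny weather"]
labels = [1, 0]
w, b = train_perceptron(samples, labels, vocab)
```

A real system would replace the toy vocabulary with richer linguistic features and the linear rule with a neural network, but the train-on-labeled-corpus, predict-on-new-posts pipeline is the same.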
-
If it remains debatable whether the Internet has surpassed print media in making information accessible to the public, it must nevertheless be conceded that the Internet makes the manipulation and censorship of information easier than it ever was on the printed page. In coming years and in an increasing number of countries, everyday producers and consumers of online information will likely have to cultivate an awareness of censorship. It behooves the online community to learn how to detect and evade interference by governments, regimes, corporations, con artists, and vandals. The contribution of this research is to describe a method and platform for studying Internet censorship detection and evasion. This paper presents the concepts, initial theories, and future work.
-
Calandrino, Joseph A.; Troncoso, Carmela (Ed.)The arms race between Internet freedom advocates and censors has catalyzed the emergence of sophisticated blocking techniques and directed significant research emphasis toward the development of automated censorship measurement and evasion tools based on packet manipulation. However, we observe that the probing process of censorship middleboxes using state-of-the-art evasion tools can be easily fingerprinted by censors, necessitating detection-resilient probing techniques. We validate our hypothesis by developing a real-time detection approach that utilizes Machine Learning (ML) to detect flow-level packet manipulation and an algorithm for IP-level detection based on Threshold Random Walk (TRW). We then take the first steps toward detection-resilient censorship evasion by presenting DeResistor, a system that facilitates detection-resilient probing for packet-manipulation-based censorship evasion. DeResistor aims to defuse detection logic employed by censors by performing detection-guided pausing of censorship evasion attempts and interleaving them with normal user-driven network activity. We evaluate our techniques by leveraging Geneva, a state-of-the-art evasion strategy generator, and validate them against 11 simulated censors supplied by Geneva, while also testing them against real-world censors (i.e., China’s Great Firewall (GFW), India and Kazakhstan). From an adversarial perspective, our proposed real-time detection method can quickly detect clients that attempt to probe censorship middleboxes with manipulated packets after inspecting only two probing flows. From a defense perspective, DeResistor is effective at shielding Geneva training from detection while enabling it to narrow the search space to produce less detectable traffic.
Importantly, censorship evasion strategies generated using DeResistor can attain a high success rate from different vantage points against the GFW (up to 98%) and 100% in India and Kazakhstan. Finally, we discuss detection countermeasures and the extensibility of our approach to other censor-probing-based tools.
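The IP-level TRW detection mentioned in the abstract is a sequential hypothesis test: each observed flow nudges a log-likelihood ratio up (suspicious) or down (benign) until it crosses a decision threshold. This is a generic TRW sketch under assumed parameters, not DeResistor's implementation; the success-rate values and thresholds are illustrative.

```python
import math

def trw_detect(observations, theta0=0.8, theta1=0.2,
               upper=math.log(99), lower=math.log(1 / 99)):
    """Threshold Random Walk over a client's flows. Each observation is
    True for a benign-looking flow, False for a suspected manipulated
    probe. theta0/theta1 are the assumed benign/prober rates of
    benign-looking flows; the thresholds encode target error rates.
    All four parameters are illustrative choices, not the paper's."""
    llr = 0.0
    for ok in observations:
        if ok:
            llr += math.log(theta1 / theta0)        # walk toward benign
        else:
            llr += math.log((1 - theta1) / (1 - theta0))  # toward prober
        if llr >= upper:
            return "prober"
        if llr <= lower:
            return "benign"
    return "undecided"
```

The walk's appeal for a censor is that it needs only a handful of flows to flag a probing client, which is exactly why detection-guided pausing and interleaving with normal traffic, as DeResistor does, keeps the walk from ever reaching the upper threshold.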
-
The objective of this research paper is to provide a methodology for measuring the financial impacts of Internet outages. The financial impacts are measured against a nation’s Gross Domestic Product (GDP) for several states in India to project the aftermath of Internet outage episodes. In addition, historical trends are analyzed to help derive predictive logic for Internet outages in order to forecast Internet shutdown incidents based on antecedent events. Results demonstrate that the proposed method for determining economic loss highlights several factors and may at times be influenced by the frequency of events relative to the overall size of GDP. In addition, historical trend analysis of Internet outages suggests that a predictive model to forecast future outages can help reveal underlying policies toward Internet censorship.
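The simplest way to relate an outage to GDP is a back-of-the-envelope pro-rating of daily GDP by the share of economic activity attributable to the Internet. The formula and parameter values below are a hypothetical illustration of that style of estimate, not the paper's actual methodology.

```python
def outage_loss(annual_gdp, digital_share, outage_days):
    """Rough loss estimate: daily GDP times the fraction attributable to
    the digital economy, times outage duration. Both the formula and the
    parameters are illustrative assumptions."""
    return (annual_gdp / 365) * digital_share * outage_days

# Hypothetical region: $365B annual GDP, 10% digital share, 2-day outage.
loss = outage_loss(365e9, 0.10, 2)
```

As the abstract notes, such an estimate is sensitive to how often outages occur relative to the size of the economy: many short outages in a small-GDP state can matter more, proportionally, than one long outage in a large one.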