skip to main content


Title: Neural Network Prediction of Censorable Language
Internet censorship imposes restrictions on what information can be publicized or viewed on the Internet. According to Freedom House’s annual Freedom on the Net report, more than half the world’s Internet users now live in a place where the Internet is censored or restricted. China has built the world’s most extensive and sophisticated online censorship system. In this paper, we describe a new corpus of censored and uncensored social media tweets from a Chinese microblogging website, Sina Weibo, collected by tracking posts that mention ‘sensitive’ topics or authored by ‘sensitive’ users. We use this corpus to build a neural network classifier to predict censorship. Our model performs with a 88.50% accuracy using only linguistic features. We discuss these features in detail and hypothesize that they could potentially be used for censorship circumvention.  more » « less
Award ID(s):
1704113
NSF-PAR ID:
10463459
Author(s) / Creator(s):
; ;
Date Published:
Journal Name:
Proceedings of the 3rd Workshop on NLP and Computational Social Science (NLP+CSS) held in conjunction with NAACL 2019
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. This study provides preliminary insights into the linguistic features that contribute to Internet censorship in mainland China. We collected a corpus of 344 censored and uncensored microblog posts thatwere published onSinaWeibo and built a Naive Bayes classifier based on the linguistic, topic-independent, features. The classifier achieves a 79.34%accuracy in predicting whether a blog post would be censored on Sina Weibo. 
    more » « less
  2. null (Ed.)
    Abstract Refraction networking is a next-generation censorship circumvention approach that locates proxy functionality in the network itself, at participating ISPs or other network operators. Following years of research and development and a brief pilot, we established the world’s first production deployment of a Refraction Networking system. Our deployment uses a highperformance implementation of the TapDance protocol and is enabled as a transport in the popular circumvention app Psiphon. It uses TapDance stations at four physical uplink locations of a mid-sized ISP, Merit Network, with an aggregate bandwidth of 140 Gbps. By the end of 2019, our system was enabled as a transport option in 559,000 installations of Psiphon, and it served upwards of 33,000 unique users per month. This paper reports on our experience building the deployment and operating it for the first year. We describe how we overcame engineering challenges, present detailed performance metrics, and analyze how our system has responded to dynamic censor behavior. Finally, we review lessons learned from operating this unique artifact and discuss prospects for further scaling Refraction Networking to meet the needs of censored users. 
    more » « less
  3. This paper investigates the relationship between demographics and the frequency of censored posts (weibos) on Sina Weibo. Our results indicate that demographics such as location, gender and paid for features do not provide a good degree of predictive power but help explain how censorship is applied on social media. Using a dataset of 226 million weibos collected in 2012, we apply a binomial regression model to evaluate the predictive quality of user demographics to identify candidates that may be targeted for censorship. Our results suggest male users who are verified (pay for mobile and security features) are more likely to be censored than females or users who are not verified. In addition, users from provinces such as Hong Kong, Macao, and Beijing are more heavily censored compared to any other province in China over the same period. 
    more » « less
  4. For the past 20 years, China has increasingly restricted the access of minors to online games using addiction prevention systems (APSes). At the same time, and through different means, i.e., the Great Firewall of China (GFW), it also restricts general population access to the international Internet. This paper studies how these restrictions impact young online gamers, and their evasion efforts. We present results from surveys (n = 2,415) and semi-structured interviews (n = 35) revealing viable commonly deployed APS evasion techniques and APS vulnerabilities. We conclude that the APS does not work as designed, even against very young online game players, and can act as a censorship evasion training ground for tomorrow’s adults, by familiarization with and normalization of general evasion techniques, and desensitization to their dangers. Findings from these studies may further inform developers of censorship-resistant systems about the perceptions and evasion strategies of their prospective users, and help design tools that leverage services and platforms popular among the censored audience. 
    more » « less
  5. For the past 20 years, China has increasingly restricted the access of minors to online games using addiction prevention systems (APSes). At the same time, and through different means, i.e., the Great Firewall of China (GFW), it also restricts general population access to the international Internet. This paper studies how these restrictions impact young online gamers, and their evasion efforts. We present results from surveys (n = 2,415) and semi-structured interviews (n = 35) revealing viable commonly deployed APS evasion techniques and APS vulnerabilities. We conclude that the APS does not work as designed, even against very young online game players, and can act as a censorship evasion training ground for tomorrow’s adults, by familiarization with and normalization of general evasion techniques, and desensitization to their dangers. Findings from these studies may further inform developers of censorship-resistant systems about the perceptions and evasion strategies of their prospective users, and help design tools that leverage services and platforms popular among the censored audience. 
    more » « less