In response to the growing sophistication of censorship methods deployed by governments worldwide, the existence of open-source censorship measurement platforms has increased. Analyzing censorship data is challenging due to the data’s large size, diversity, and variability, requiring a comprehensive understanding of the data collection process and applying established data analysis techniques for thorough information extraction. In this work, we develop a framework that is applicable across all major censorship datasets to continually identify changes in censorship data trends and reveal potentially unreported censorship. Our framework consists of control charts and the Mann-Kendall trend detection test, originating from statistical process control theory, and we implement it on Censored Planet, GFWatch, the Open Observatory of Network Interference (OONI), and Tor data from Russia, Myanmar, China, Iran, Türkiye, and Pakistan from January 2021 through March 2023. Our study confirms results from prior studies and also identifies new events that we validate through media reports. Our correlation analysis reveals minimal similarities between censorship datasets. However, because our framework is applicable across all major censorship datasets, it significantly reduces the manual effort required to employ multiple datasets, which we further demonstrate by applying it to four additional Internet outage-related datasets. Our work thus provides a tool for continuously monitoring censorship activity and acts as a basis for developing more systematic, holistic, and in-depth analysis techniques for censorship data.
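As a rough illustration of the Mann-Kendall trend test named in the abstract above, the sketch below implements the textbook form of the test (S statistic, tie-corrected variance, normal approximation) in plain Python. It is not the authors' implementation; the function name `mann_kendall` and the sample series are hypothetical, chosen only for this example.

```python
import math
from collections import Counter


def mann_kendall(series):
    """Textbook Mann-Kendall trend test (illustrative sketch).

    Returns (s, z, p): the S statistic, the continuity-corrected
    z score, and the two-sided p-value under the normal approximation.
    """
    n = len(series)
    if n < 3:
        raise ValueError("need at least 3 observations")

    # S counts later-minus-earlier pairs: +1 if increasing, -1 if decreasing.
    s = 0
    for i in range(n - 1):
        for j in range(i + 1, n):
            diff = series[j] - series[i]
            s += (diff > 0) - (diff < 0)

    # Variance of S, corrected for groups of tied values.
    tie_term = sum(t * (t - 1) * (2 * t + 5) for t in Counter(series).values())
    var_s = (n * (n - 1) * (2 * n + 5) - tie_term) / 18

    # Continuity correction; z is 0 when S is 0 (including constant series).
    if s > 0:
        z = (s - 1) / math.sqrt(var_s)
    elif s < 0:
        z = (s + 1) / math.sqrt(var_s)
    else:
        z = 0.0

    # Two-sided p-value from the standard normal distribution.
    p = math.erfc(abs(z) / math.sqrt(2))
    return s, z, p


# Made-up weekly anomaly rates, used only to show the call pattern.
example_rates = [0.02, 0.03, 0.03, 0.05, 0.04, 0.07, 0.08, 0.10]
s, z, p = mann_kendall(example_rates)
print(f"S={s}, z={z:.2f}, p={p:.4f}")
```

A positive S with a small p-value suggests an increasing trend, a negative S a decreasing one. The normal approximation is usually considered adequate only for series of roughly ten or more points, so very short series would call for exact tables instead.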
Detecting Media Self-Censorship without Explicit Training Data
The motives and means of explicit state censorship have been well studied, both quantitatively and qualitatively. Self-censorship by media outlets, however, has not received nearly as much attention, mostly because it is difficult to detect systematically. We develop a novel approach to identify news media self-censorship by using social media as a sensor. We develop a hypothesis testing framework to identify and evaluate censored clusters of keywords and a near-linear-time algorithm (called GraphDPD) to identify the highest-scoring clusters as indicators of censorship. We evaluate the accuracy of our framework, versus other state-of-the-art algorithms, using both semi-synthetic and real-world data from Mexico and Venezuela during 2014. These tests demonstrate the capacity of our framework to identify self-censorship and provide an indicator of broader media freedom. The results of this study lay the foundation for the detection and study of, and policy response to, self-censorship.
- PAR ID: 10223465
- Editor(s): Carlotta Demeniconi; Nitesh V. Chawla
- Date Published:
- Journal Name: Proceedings of the 2020 SIAM International Conference on Data Mining
- Page Range / eLocation ID: 550-558
- Format(s): Medium: X
- Sponsoring Org: National Science Foundation
More Like this
- Government censorship (internet shutdowns, blockages, firewalls) imposes significant barriers to the transnational flow of information despite the connective power of digital technologies. In this paper, we examine whether and how information flows across borders despite government censorship. We develop a semi-automated system that combines deep learning and human annotation to find co-occurring content across different social media platforms and languages. We use this system to detect co-occurring content between Twitter and Sina Weibo as Covid-19 spread globally, and we conduct in-depth investigations of co-occurring content to identify those that constitute an inflow of information from the global information ecosystem into China. We find that approximately one-fourth of content with relevance for China that gains widespread public attention on Twitter makes its way to Weibo. Unsurprisingly, Chinese state-controlled media and commercialized domestic media play a dominant role in facilitating these inflows of information. However, we find that Weibo users without traditional media or government affiliations are also an important mechanism for transmitting information into China. These results imply that while censorship combined with media control provides substantial leeway for the government to set the agenda, social media provides opportunities for non-institutional actors to influence the information environment. Methodologically, the system we develop offers a new approach for the quantitative analysis of cross-platform and cross-lingual communication.
- This paper investigates the relationship between demographics and the frequency of censored posts (weibos) on Sina Weibo. Our results indicate that demographics such as location, gender, and paid-for features do not provide a good degree of predictive power but help explain how censorship is applied on social media. Using a dataset of 226 million weibos collected in 2012, we apply a binomial regression model to evaluate the predictive quality of user demographics to identify candidates that may be targeted for censorship. Our results suggest that male users who are verified (pay for mobile and security features) are more likely to be censored than females or users who are not verified. In addition, users from provinces such as Hong Kong, Macao, and Beijing are more heavily censored than users from any other province in China over the same period.
- Protest event analysis is an important method for the study of collective action and social movements and typically draws on traditional media reports as the data source. We introduce collective action from social media (CASM), a system that uses convolutional neural networks on image data and recurrent neural networks with long short-term memory on text data in a two-stage classifier to identify social media posts about offline collective action. We implement CASM on Chinese social media data and identify more than 100,000 collective action events from 2010 to 2017 (CASM-China). We evaluate the performance of CASM through cross-validation, out-of-sample validation, and comparisons with other protest data sets. We assess the effect of online censorship and find it does not substantially limit our identification of events. Compared to other protest data sets, CASM-China identifies relatively more rural, land-related protests and relatively few collective action events related to ethnic and religious conflict.
- Global Positioning Systems (GPSs) can collect tracking data to remotely monitor livestock well-being and pasture use. Supervised machine learning requires behavioral observations of monitored animals to identify changes in behavior, which is labor-intensive. Our goal was to identify animal behaviors automatically without using human observations. We designed a novel framework using unsupervised learning techniques. The framework contains two steps. The first step segments cattle tracking data using state-of-the-art time series segmentation algorithms, and the second step groups segments into clusters and then labels the clusters. To evaluate the applicability of our proposed framework, we utilized GPS tracking data collected from five cows in a 1096 ha rangeland pasture. Cow movement pathways were grouped into six behavior clusters based on velocity (m/min) and distance from water. Again using velocity, these six clusters were classified into walking, grazing, and resting behaviors. The mean velocity for predicted walking, grazing, and resting behavior was 44, 13, and 2 m/min, respectively, which is similar to other research. Predicted diurnal behavior patterns showed two primary grazing bouts during early morning and evening, as in other studies. Our study demonstrates that the proposed two-step framework can use unlabeled GPS tracking data to predict cattle behavior without human observations.

