Title: Detecting Media Self-Censorship without Explicit Training Data
The motives and means of explicit state censorship have been well studied, both quantitatively and qualitatively. Self-censorship by media outlets, however, has received far less attention, mostly because it is difficult to detect systematically. We develop a novel approach to identify news media self-censorship by using social media as a sensor. We develop a hypothesis testing framework to identify and evaluate censored clusters of keywords and a near-linear-time algorithm (called GraphDPD) to identify the highest-scoring clusters as indicators of censorship. We evaluate the accuracy of our framework against other state-of-the-art algorithms, using both semi-synthetic and real-world data from Mexico and Venezuela during 2014. These tests demonstrate the capacity of our framework to identify self-censorship and provide an indicator of broader media freedom. The results of this study lay the foundation for the detection and study of, and policy response to, self-censorship.
Award ID(s):
1954376 1750911
NSF-PAR ID:
10223465
Editor(s):
Carlotta Demeniconi; Nitesh V. Chawla
Journal Name:
Proceedings of the 2020 SIAM International Conference on Data Mining
Page Range / eLocation ID:
550-558
Sponsoring Org:
National Science Foundation
More Like this
  1. Government censorship (internet shutdowns, blockages, firewalls) imposes significant barriers to the transnational flow of information despite the connective power of digital technologies. In this paper, we examine whether and how information flows across borders despite government censorship. We develop a semi-automated system that combines deep learning and human annotation to find co-occurring content across different social media platforms and languages. We use this system to detect co-occurring content between Twitter and Sina Weibo as Covid-19 spread globally, and we conduct in-depth investigations of co-occurring content to identify items that constitute an inflow of information from the global information ecosystem into China. We find that approximately one-fourth of content with relevance for China that gains widespread public attention on Twitter makes its way to Weibo. Unsurprisingly, Chinese state-controlled media and commercialized domestic media play a dominant role in facilitating these inflows of information. However, we find that Weibo users without traditional media or government affiliations are also an important mechanism for transmitting information into China. These results imply that while censorship combined with media control provides substantial leeway for the government to set the agenda, social media provides opportunities for non-institutional actors to influence the information environment. Methodologically, the system we develop offers a new approach for the quantitative analysis of cross-platform and cross-lingual communication.
  2. Protest event analysis is an important method for the study of collective action and social movements and typically draws on traditional media reports as the data source. We introduce collective action from social media (CASM)—a system that uses convolutional neural networks on image data and recurrent neural networks with long short-term memory on text data in a two-stage classifier to identify social media posts about offline collective action. We implement CASM on Chinese social media data and identify more than 100,000 collective action events from 2010 to 2017 (CASM-China). We evaluate the performance of CASM through cross-validation, out-of-sample validation, and comparisons with other protest data sets. We assess the effect of online censorship and find it does not substantially limit our identification of events. Compared to other protest data sets, CASM-China identifies relatively more rural, land-related protests and relatively few collective action events related to ethnic and religious conflict. 
  3. This paper investigates the relationship between demographics and the frequency of censored posts (weibos) on Sina Weibo. Our results indicate that demographics such as location, gender, and paid-for features do not provide a good degree of predictive power but do help explain how censorship is applied on social media. Using a dataset of 226 million weibos collected in 2012, we apply a binomial regression model to evaluate the predictive quality of user demographics to identify candidates that may be targeted for censorship. Our results suggest that male users who are verified (pay for mobile and security features) are more likely to be censored than females or users who are not verified. In addition, users from provinces such as Hong Kong, Macao, and Beijing are more heavily censored than users from any other province in China over the same period.
  4. Abstract

    Background

    Repetitive action, resistance to environmental change and fine motor disruptions are hallmarks of autism spectrum disorder (ASD) and other neurodevelopmental disorders, and vary considerably from individual to individual. In animal models, conventional behavioral phenotyping captures such fine-scale variations incompletely. Here we observed male and female C57BL/6J mice to methodically catalog adaptive movement over multiple days and examined two rodent models of developmental disorders against this dynamic baseline. We then investigated the behavioral consequences of a cerebellum-specific deletion in Tsc1 protein and a whole-brain knockout in Cntnap2 protein in mice. Both of these mutations are found in clinical conditions and have been associated with ASD.

    Methods

    We used advances in computer vision and deep learning, namely a generalized form of high-dimensional statistical analysis, to develop a framework for characterizing mouse movement on multiple timescales using a single popular behavioral assay, the open-field test. The pipeline takes virtual markers from pose estimation to find behavior clusters and generate wavelet signatures of behavior classes. We measured spatial and temporal habituation to a new environment across minutes and days, different types of self-grooming, locomotion and gait.

    Results

    Both Cntnap2 knockouts and L7-Tsc1 mutants showed forelimb lag during gait. L7-Tsc1 mutants and Cntnap2 knockouts showed complex defects in multi-day adaptation, lacking the tendency of wild-type mice to spend progressively more time in corners of the arena. In L7-Tsc1 mutant mice, failure to adapt took the form of maintained ambling, turning and locomotion, and an overall decrease in grooming. However, adaptation in these traits was similar between wild-type mice and Cntnap2 knockouts. L7-Tsc1 mutant and Cntnap2 knockout mouse models showed different patterns of behavioral state occupancy.

    Limitations

    Genetic risk factors for autism are numerous, and we tested only two. Our pipeline was only done under conditions of free behavior. Testing under task or social conditions would reveal more information about behavioral dynamics and variability.

    Conclusions

    Our automated pipeline for deep phenotyping successfully captures model-specific deviations in adaptation and movement as well as differences in the detailed structure of behavioral dynamics. The reported deficits indicate that deep phenotyping constitutes a robust set of ASD symptoms that may be considered for implementation in clinical settings as quantitative diagnosis criteria.

  5. Analyzing gender is critical to studying mental health (MH) support in cardiovascular disease (CVD). Existing studies on using social media for extracting MH symptoms consider symptom detection and tend to ignore user context, disease, or gender. The current study aims to design and evaluate a system to capture how MH symptoms associated with CVD are expressed differently by gender on social media. We observe that the reliable detection of MH symptoms expressed by persons with heart disease in user posts is challenging because of the co-existence of (dis)similar MH symptoms in one post and because of variation in the description of symptoms based on gender. We collect a corpus of 150k items (both posts and comments) annotated using the subreddit labels and transfer learning approaches. We propose GeM, a novel task-adaptive multi-task learning approach to identify the MH symptoms in CVD patients based on gender. Specifically, we adapt a knowledge-assisted RoBERTa-based bi-encoder model to capture CVD-related MH symptoms. Moreover, it enhances the reliability of differentiating gendered language in MH symptoms when compared to state-of-the-art language models. Our model predicts four labels of MH issues and two gender labels with high (statistically significant) performance, outperforming RoBERTa and improving recall by 2.14% on the symptom identification task and by 2.55% on the gender identification task.