
Title: Assessing Reliability of Chinese Geotagged Social Media Data for Spatiotemporal Representation of Human Mobility
Understanding the space-time dynamics of human activities is essential for studying human security issues such as climate change impacts, pandemic spread, and urban sustainability. Geotagged social media posts provide an open, space-time continuous data source with user locations, which makes them convenient for studying human movement. However, the reliability of Chinese geotagged social media data for representing human mobility remains unclear. This study compares human movement data derived from posts on Sina Weibo, one of the largest social media platforms in China, with data from Baidu Qianxi, a high-resolution human movement dataset from ‘Baidu Map’, a popular location-based service in China with 1.3 billion users. Correlation analysis was conducted across multiple dimensions: time periods (weekly and monthly), geographic scales (cities and provinces), and flow directions (inflow and outflow). A case study on COVID-19 transmission was further explored with these data. The results show that Sina Weibo data reveal patterns similar to those in Baidu Qianxi, and that the correlation is higher at the provincial level than at the city level and higher at the monthly scale than at the weekly scale. The study also revealed spatial variations in the degree of similarity between the two sources. These findings reveal the value, properties, and spatiotemporal heterogeneity of human mobility data extracted from Weibo posts, providing a reference for the proper use of social media posts as data sources for human mobility studies.
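The multi-scale correlation comparison described in the abstract can be illustrated with a minimal sketch. It assumes Pearson correlation, 7-day "weeks", and 28-day "months"; the abstract does not name the coefficient or the exact binning used. The series `weibo_daily` and `baidu_daily` and the helpers `pearson` and `aggregate` are hypothetical, and the numbers are synthetic, not data from the study.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length sequences with at least 2 values")
    mx = sum(x) / len(x)
    my = sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / sqrt(vx * vy)


def aggregate(daily, window):
    """Sum daily counts into consecutive windows, dropping an incomplete tail."""
    full = len(daily) - len(daily) % window
    return [sum(daily[i:i + window]) for i in range(0, full, window)]


# Synthetic daily inflow counts for one hypothetical city over 168 days.
# These values are invented for illustration only.
weibo_daily = [20 + (d % 7) * 3 + (d // 28) * 5 for d in range(168)]
# A second synthetic source: scaled counts plus deterministic noise.
baidu_daily = [1000 * w + ((d * 37) % 11) * 300 for d, w in enumerate(weibo_daily)]

# Compare agreement between the two sources at two temporal scales.
weekly_r = pearson(aggregate(weibo_daily, 7), aggregate(baidu_daily, 7))
monthly_r = pearson(aggregate(weibo_daily, 28), aggregate(baidu_daily, 28))
print(f"weekly r = {weekly_r:.3f}, monthly r = {monthly_r:.3f}")
```

The same two helpers could be applied separately to inflow and outflow series, or to city-level and province-level totals, to reproduce the other dimensions of the comparison.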
Award ID(s):
1841403 1841520
Journal Name:
ISPRS International Journal of Geo-Information
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. Background: Human movement is one of the forces that drive the spatial spread of infectious diseases. To date, reducing and tracking human movement during the COVID-19 pandemic has proven effective in limiting the spread of the virus. Existing methods for monitoring and modeling the spatial spread of infectious diseases rely on various data sources as proxies of human movement, such as airline travel data, mobile phone data, and banknote tracking. However, intrinsic limitations of these data sources prevent us from systematic monitoring and analyses of human movement on different spatial scales (from local to global). Objective: Big data from social media such as geotagged tweets have been widely used in human mobility studies, yet more research is needed to validate the capabilities and limitations of using such data for studying human movement at different geographic scales (e.g., from local to global) in the context of global infectious disease transmission. This study aims to develop a novel data-driven public health approach using big data from Twitter coupled with other human mobility data sources and artificial intelligence to monitor and analyze human movement at different spatial scales (from global to regional to local). Methods: We will first develop a database with optimized spatiotemporal indexing to store and manage the multisource data sets collected in this project. This database will be connected to our in-house Hadoop computing cluster for efficient big data computing and analytics. We will then develop innovative data models, predictive models, and computing algorithms to effectively extract and analyze human movement patterns using geotagged big data from Twitter and other human mobility data sources, with the goal of enhancing situational awareness and risk prediction in public health emergency response and disease surveillance systems. Results: This project was funded as of May 2020. We have started the data collection, processing, and analysis for the project. Conclusions: Research findings can help government officials, public health managers, emergency responders, and researchers answer critical questions during the pandemic regarding the current and future infectious risk of a state, county, or community and the effectiveness of social/physical distancing practices in curtailing the spread of the virus. International Registered Report Identifier (IRRID): DERR1-10.2196/24432
  2. When COVID-19 first emerged in China, there was speculation that the outbreak would trigger public anger and weaken the Chinese regime. By analyzing millions of social media posts from Sina Weibo made between December 2019 and February 2020, we describe the contours of public, online discussions pertaining to COVID-19 in China. We find that discussions of COVID-19 became widespread on January 20, 2020, consisting primarily of personal reflections, opinions, updates, and appeals. We find that the largest bursts of discussion, which contain simultaneous spikes of criticism and support targeting the Chinese government, coincide with the January 23 lockdown of Wuhan and the February 7 death of Dr. Li Wenliang. Criticisms are directed at the government for perceived lack of action, incompetence, and wrongdoing—in particular, censoring information relevant to public welfare. Support is directed at the government for aggressive action and positive outcomes. As the crisis unfolds, the same events are interpreted differently by different people, with those who criticize focusing on the government’s shortcomings and those who praise focusing on the government’s actions. 
  3. Government censorship (internet shutdowns, blockages, firewalls) imposes significant barriers to the transnational flow of information despite the connective power of digital technologies. In this paper, we examine whether and how information flows across borders despite government censorship. We develop a semi-automated system that combines deep learning and human annotation to find co-occurring content across different social media platforms and languages. We use this system to detect co-occurring content between Twitter and Sina Weibo as Covid-19 spread globally, and we conduct in-depth investigations of co-occurring content to identify those items that constitute an inflow of information from the global information ecosystem into China. We find that approximately one-fourth of content with relevance for China that gains widespread public attention on Twitter makes its way to Weibo. Unsurprisingly, Chinese state-controlled media and commercialized domestic media play a dominant role in facilitating these inflows of information. However, we find that Weibo users without traditional media or government affiliations are also an important mechanism for transmitting information into China. These results imply that while censorship combined with media control provides substantial leeway for the government to set the agenda, social media provides opportunities for non-institutional actors to influence the information environment. Methodologically, the system we develop offers a new approach for the quantitative analysis of cross-platform and cross-lingual communication.
  4. Internet censorship imposes restrictions on what information can be publicized or viewed on the Internet. According to Freedom House’s annual Freedom on the Net report, more than half the world’s Internet users now live in a place where the Internet is censored or restricted. China has built the world’s most extensive and sophisticated online censorship system. In this paper, we describe a new corpus of censored and uncensored social media posts from a Chinese microblogging website, Sina Weibo, collected by tracking posts that mention ‘sensitive’ topics or are authored by ‘sensitive’ users. We use this corpus to build a neural network classifier to predict censorship. Our model achieves 88.50% accuracy using only linguistic features. We discuss these features in detail and hypothesize that they could potentially be used for censorship circumvention.
  5. This paper investigates the relationship between demographics and the frequency of censored posts (weibos) on Sina Weibo. Our results indicate that demographics such as location, gender, and paid-for features offer little predictive power but help explain how censorship is applied on social media. Using a dataset of 226 million weibos collected in 2012, we apply a binomial regression model to evaluate how well user demographics predict which users may be targeted for censorship. Our results suggest that male users who are verified (those who pay for mobile and security features) are more likely to be censored than female users or users who are not verified. In addition, users from provinces such as Hong Kong, Macao, and Beijing are more heavily censored than users from any other province in China over the same period.