skip to main content
US FlagAn official website of the United States government
dot gov icon
Official websites use .gov
A .gov website belongs to an official government organization in the United States.
https lock icon
Secure .gov websites use HTTPS
A lock ( lock ) or https:// means you've safely connected to the .gov website. Share sensitive information only on official, secure websites.


Title: Analysis of Moving Events Using Tweets
The Digital Library Research Laboratory (DLRL) has collected over 3.5 billion tweets on different events for the Coordinated, Behaviorally-Aware Recovery for Transportation and Power Disruptions (CBAR-tpd), the Integrated Digital Event Archiving and Library (IDEAL), and the Global Event Trend Archive Research (GETAR) projects. The tweet collection topics include heart attack, solar eclipse, terrorism, etc. There are several collections on naturally occurring events such as hurricanes, floods, and solar eclipses. Such naturally occurring events are distributed across space and time. It would be beneficial to researchers if we can perform a spatial-temporal analysis to test some hypotheses, and to find any trends that tweets would reveal for such events. I apply an existing algorithm to detect locations from tweets by modifying it to work better with the type of datasets I work with. I use the time captured in tweets and also identify the tense of the sentences in tweets to perform the temporal analysis. I build a rule-based model for obtaining the tense of a tweet. The results from these two algorithms are merged to analyze naturally occurring moving events such as solar eclipses and hurricanes. Using the spatial-temporal information from tweets, I study if tweets can be a relevant source of information in understanding the movement of the event. I create visualizations to compare the actual path of the event with the information extracted by my algorithms. After examining the results from the analysis, I noted that Twitter can be a reliable source to identify places affected by moving events almost immediately. The locations obtained are at a more detailed level than in news-wires. We can also identify the time that an event affected a particular region by date.  more » « less
Award ID(s):
1638207 1619028 1319578
PAR ID:
10210446
Author(s) / Creator(s):
Date Published:
Journal Name:
Virginia tech
ISSN:
0274-9904
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. null (Ed.)
    The Tweet Collection Management (TWT) Team aims to ingest 5 billion tweets, clean this data, analyze the metadata present, extract key information, classify tweets into categories, and finally, index these tweets into Elasticsearch to browse and query. The main deliverable of this project is a running software application for searching tweets and for viewing Twitter collections from Digital Library Research Laboratory (DLRL) event archive projects. As a starting point, we focused on two development goals: (1) hashtag-based and (2) username-based search for tweets. For IR1, we completed extraction of two fields within our sample collection: hashtags and username. Sample code for TwiRole, a user-classification program, was investigated for use in our project. We were able to sample from multiple collections of tweets, spanning topics like COVID-19 and hurricanes. Initial work encompassed using a sample collection, provided via Google Drive. An NFS-based persistent storage was later involved to allow access to larger collections. In total, we have developed 9 services to extract key information like username, hashtags, geo-location, and keywords from tweets. We have also developed services to allow for parsing and cleaning of raw API data, and backup of data in an Apache Parquet filestore. All services are Dockerized and added to the GitLab Container Registry. The services are deployed in the CS cloud cluster to integrate services into the full search engine workflow. A service is created to convert WARC files to JSON for reading archive files into the application. Unit testing of services is complete and end-to-end tests have been conducted to improve system robustness and avoid failure during deployment. The TWT team has indexed 3,200 tweets into the Elasticsearch index. Future work could involve parallelization of the extraction of metadata, an alternative feature-flag approach, advanced geo-location inference, and adoption of the DMI-TCAT format. Key deliverables include a data body that allows for search, sort, filter, and visualization of raw tweet collections and metadata analysis; a running software application for searching tweets and for viewing Twitter collections from Digital Library Research Laboratory (DLRL) event archive projects; and a user guide to assist those using the system. 
    more » « less
  2. The purpose of the Twitter Disaster Behavior project is to identify patterns in online behavior during natural disasters by analyzing Twitter data. The main goal is to better understand the needs of a community during and after a disaster, to aid in recovery. The datasets analyzed were collections of tweets about Hurricane Maria, and recent earthquake events, in Puerto Rico. All tweets pertaining to Hurricane Maria are from the timeframe of September 15 through October 14, 2017. Similarly, tweets pertaining to the Puerto Rico earthquake from January 7 through February 6, 2020 were collected. These tweets were then analyzed for their content, number of retweets, and the geotag associated with the author of the tweet. We counted the occurrence of key words in topics relating to preparation, response, impact, and recovery. This data was then graphed using Python and Matplotlib. Additionally, using a Twitter crawler, we extracted a large dataset of tweets by users that used geotags. These geotags are used to examine location changes among the users before, during, and after each natural disaster. Finally, after performing these analyses, we developed easy to understand visuals and compiled these figures into a poster. Using these figures and graphs, we compared the two datasets in order to identify any significant differences in behavior and response. The main differences we noticed stemmed from two key reasons: hurricanes can be predicted whereas earthquakes cannot, and hurricanes are usually an isolated event whereas earthquakes are followed by aftershocks. Thus, the Hurricane Maria dataset experienced the highest amount of tweet activity at the beginning of the event and the Puerto Rico earthquake dataset experienced peaks in tweet activity throughout the entire period, usually corresponding to aftershock occurrences. We studied these differences, as well as other important trends we identified. 
    more » « less
  3. With the rapid development of the Internet of Things (IoT) and Big Data infrastructure, crowdsourcing techniques have emerged to facilitate data processing and problem solving particularly for flood emergences purposes. A Flood Analytics Information System (FAIS) has been developed as a Python Web application to gather Big Data from multiple servers and analyze flooding impacts during historical and real-time events. The application is smartly designed to integrate crowd intelligence, machine learning (ML), and natural language processing of tweets to provide flood warning with the aim to improve situational awareness for flood risk management. FAIS, a national scale prototype, combines flood peak rates and river level information with geotagged tweets to identify a dynamic set of at-risk locations to flooding. The prototype was successfully tested in real-time during Hurricane Dorian flooding as well as for historical event (Hurricanes Florence) across the Carolinas, USA where the storm made extensive disruption to infrastructure and communities. 
    more » « less
  4. With the rapid development of the Internet of Things (IoT) and Big Data infrastructure, crowdsourcing techniques have emerged to facilitate data processing and problem solving particularly for flood emergences purposes. A Flood Analytics Information System (FAIS) has been developed as a Python Web application to gather Big Data from multiple servers and analyze flooding impacts during historical and real-time events. The application is smartly designed to integrate crowd intelligence, machine learning (ML), and natural language processing of tweets to provide flood warning with the aim to improve situational awareness for flood risk management. FAIS, a national scale prototype, combines flood peak rates and river level information with geotagged tweets to identify a dynamic set of at-risk locations to flooding. The prototype was successfully tested in real-time during Hurricane Dorian flooding as well as for historical event (Hurricanes Florence) across the Carolinas, USA where the storm made extensive disruption to infrastructure and communities. 
    more » « less
  5. Social media data have been used to improve geographic situation awareness in the past decade. Although they have free and openly availability advantages, only a small proportion is related to situation awareness, and reliability or trustworthiness is a challenge. A credibility framework is proposed for Twitter data in the context of disaster situation awareness. The framework is derived from crowdsourcing, which states that errors propagated in volunteered information decrease as the number of contributors increases. In the proposed framework, credibility is hierarchically assessed on two tweet levels. The framework was tested using Hurricane Harvey Twitter data, in which situation awareness related tweets were extracted using a set of predefined keywords including power, shelter, damage, casualty, and flood. For each tweet, text messages and associated URLs were integrated to enhance the information completeness. Events were identified by aggregating tweets based on their topics and spatiotemporal characteristics. Credibility for events was calculated and analyzed against the spatial, temporal, and social impacting scales. This framework has the potential to calculate the evolving credibility in real time, providing users insight on the most important and trustworthy events. 
    more » « less