Social media platforms are playing increasingly critical roles in disaster response and rescue operations. During emergencies, users can post rescue requests along with their addresses on social media, while volunteers can search for those messages and send help. However, efficiently leveraging social media in rescue operations remains challenging because of the lack of tools to identify rescue request messages on social media automatically and rapidly. Analyzing social media data, such as Twitter data, relies heavily on Natural Language Processing (NLP) algorithms to extract information from texts. The introduction of bidirectional transformers models, such as the Bidirectional Encoder Representations from Transformers (BERT) model, has significantly outperformed previous NLP models in numerous text analysis tasks, providing new opportunities to precisely understand and classify social media data for diverse applications. This study developed and compared ten VictimFinder models for identifying rescue request tweets, three based on milestone NLP algorithms and seven BERT-based. A total of 3191 manually labeled disaster-related tweets posted during 2017 Hurricane Harvey were used as the training and testing datasets. We evaluated the performance of each model by classification accuracy, computation cost, and model stability. Experiment results show that all BERT-based models have significantly increased the accuracy of categorizing rescue-related tweets. The best model for identifying rescue request tweets is a customized BERT-based model with a Convolutional Neural Network (CNN) classifier. Its F1-score is 0.919, which outperforms the baseline model by 10.6%. The developed models can promote social media use for rescue operations in future disaster events.
more »
« less
Real-Time Twitter Data Mining Approach to Infer User Perception Toward Active Mobility
This study evaluates the level of service of shared transportation facilities through mining geotagged data from social media and analyzing the perceptions of road users. An algorithm is developed adopting a text classification approach with contextual understanding to filter out relevant information related to users’ perceptions toward active mobility. Using a heuristic-based keyword matching approach produces about 75% tweets that are out of context, so that approach is deemed unsuitable for information extraction from Twitter. This study implements six different text classification models and compares the performance of these models for tweet classification. The model is applied to real-world data to filter out relevant information, and content analysis is performed to check the distribution of keywords within the filtered data. The text classification model “term frequency-inverse document frequency” vectorizer-based logistic regression model performed best at classifying the tweets. To select the best model, the performances of the models are compared based on precision, recall, F1 score (geometric mean of precision and recall), and accuracy metrics. The findings from the analysis show that the proposed method can help produce more relevant information on walking and biking facilities as well as safety concerns. By analyzing the sentiments of the filtered data, the existing condition of biking and walking facilities in the DC area can be inferred. This method can be a critical part of the decision support system to understand the qualitative level of service of existing transportation facilities.
more »
« less
- Award ID(s):
- 1917019
- PAR ID:
- 10291507
- Date Published:
- Journal Name:
- Transportation Research Record: Journal of the Transportation Research Board
- ISSN:
- 0361-1981
- Page Range / eLocation ID:
- 036119812110049
- Format(s):
- Medium: X
- Sponsoring Org:
- National Science Foundation
More Like this
-
-
The increased social media usage in modern history instigates data collection from various users with different backgrounds. Mass media has been a rich source of information and might be utilized for countless purposes, from business and personal to political determination. Because more people tend to express their opinions through social media platforms, researchers are excited to collect data and use it as a free survey tool on what the public ponders about a particular issue. Because of the detrimental effect of news on social networks, many irresponsible users generate and promote fake news to influence public belief on a specific issue. The U.S. presidential election has been a significant and popular event, so both parties invest and extend their efforts to pursue and win the general election. Undoubtedly, spreading and promoting fake news through social media is one of the ways negligent individuals or groups sway societies toward their goals. This project examined the impact of removing fake tweets to predict the electoral outcomes during the 2020 general election. Eliminating mock tweets has improved the correctness of model prediction from 74.51 percent to 86.27 percent with the electoral outcomes of the election. Finally, we compared classification model performances with the highest model accuracy of 99.74634 percent, precision of 99.99881 percent, recall of 99.49430 percent, and an F1 score of 99.74592 percent. The study concludes that removing fake tweets improves the correctness of the model with the electoral outcomes of the U.S. election.more » « less
-
Website privacy policies sometimes provide users the option to opt-out of certain collections and uses of their personal data. Unfortunately, many privacy policies bury these instructions deep in their text, and few web users have the time or skill necessary to discover them. We describe a method for the automated detection of opt-out choices in privacy policy text and their presentation to users through a web browser extension. We describe the creation of two corpora of opt-out choices, which enable the training of classifiers to identify opt-outs in privacy policies. Our overall approach for extracting and classifying opt-out choices combines heuristics to identify commonly found opt-out hyperlinks with supervised machine learning to automatically identify less conspicuous instances. Our approach achieves a precision of 0.93 and a recall of 0.9. We introduce Opt-Out Easy, a web browser extension designed to present available opt-out choices to users as they browse the web. We evaluate the usability of our browser extension with a user study. We also present results of a large-scale analysis of opt-outs found in the text of thousands of the most popular websites.more » « less
-
As the problem of drug abuse intensifies in the U.S., many studies that primarily utilize social media data, such as postings on Twitter, to study drug abuse-related activities use machine learning as a powerful tool for text classification and filtering. However, given the wide range of topics of Twitter users, tweets related to drug abuse are rare in most of the datasets. This imbalanced data remains a major issue in building effective tweet classifiers, and is especially obvious for studies that include abuse-related slang terms. In this study, we approach this problem by designing an ensemble deep learning model that leverages both word-level and character-level features to classify abuse-related tweets. Experiments are reported on a Twitter dataset, where we can configure the percentages of the two classes (abuse vs. non-abuse) to simulate the data imbalance with different amplitudes. Results show that our ensemble deep learning models exhibit better performance than ensembles of traditional machine learning models, especially on heavily imbalanced datasets.more » « less
-
The COVID-19 pandemic has had a profound impact on the global community, and vaccination has been recognized as a crucial intervention. To gain insight into public perceptions of COVID-19 vaccines, survey studies and the analysis of social media platforms have been conducted. However, existing methods lack consideration of individual vaccination intentions or status and the relationship between public perceptions and actual vaccine uptake. To address these limitations, this study proposes a text classification approach to identify tweets indicating a user’s intent or status on vaccination. A comparative analysis between the proportions of tweets from different categories and real-world vaccination data reveals notable alignment, suggesting that tweets may serve as a precursor to actual vaccination status. Further, regression analysis and time series forecasting were performed to explore the potential of tweet data, demonstrating the significance of incorporating tweet data in predicting future vaccination status. Finally, clustering was applied to the tweet sets with positive and negative labels to gain insights into underlying focuses of each stance.more » « less