Abstract Social media has been increasingly utilized to spread breaking news and risk communications during disasters of all magnitudes. Unfortunately, due to the unmoderated nature of social media platforms such as Twitter, rumors and misinformation are able to propagate widely. Given this, a surfeit of research has studied false rumor diffusion on Twitter, especially during natural disasters. Within this domain, studies have also focused on the misinformation control efforts from government organizations and other major agencies. A prodigious gap in research exists in studying the monitoring of misinformation on social media platforms in times of disasters and other crisis events. Such studies would offer organizations and agencies new tools and ideologies to monitor misinformation on platforms such as Twitter, and make informed decisions on whether or not to use their resources in order to debunk. In this work, we fill the research gap by developing a machine learning framework to predict the veracity of tweets that are spread during crisis events. The tweets are tracked based on the veracity of their content as either true, false, or neutral. We conduct four separate studies, and the results suggest that our framework is capable of tracking multiple cases of misinformation simultaneously, with F1 scores exceeding 87%. In the case of tracking a single case of misinformation, our framework reaches an F1 score of 83%. We collect and drive the algorithms with 15,952 misinformation‐related tweets from the Boston Marathon bombing (2013), Manchester Arena bombing (2017), Hurricane Harvey (2017), Hurricane Irma (2017), and the Hawaii ballistic missile false alert (2018). This article provides novel insights on how to efficiently monitor misinformation that is spread during disasters.
                            Evaluating collective action theory-based model to simulate mobs
                        
                    
    
            Abstract A mob is an event that is organized via social media, email, SMS, or other forms of digital communication technologies in which a group of people (who might have an agenda) get together online or offline to collectively conduct an act and then disperse (quickly or over a long period). In recent years, these events are increasingly happening worldwide due to the anonymity of the internet, affordability of social media, boredom, etc. Studying such a phenomenon is difficult due to a lack of data, theoretical underpinning, and resources. In this research, we use the Agent-Based Modeling (ABM) technique to model the mobbers and the Monte Carlo method to assign random values to the factors extracted from the theory of Collective Action and conduct many simulations. We also leverage our previous research on Deviant Cyber Flash Mobs to implement various scenarios the mobber could face when they decide to act in a mob or not. This resulted in a model that can simulate mobs, estimate the mob success rate, and the needed powerful actors (e.g., mob organizers) for a mob to succeed. We finally evaluate our model using real-world mob data collected from the Meetup social media platform. This research is one step toward fully understanding mob formation and the motivations of its participants and organizers. 
- Award ID(s): 1920920
- PAR ID: 10520268
- Publisher / Repository: Springer Science + Business Media
- Date Published:
- Journal Name: Social Network Analysis and Mining
- Volume: 14
- Issue: 1
- ISSN: 1869-5469
- Format(s): Medium: X
- Sponsoring Org: National Science Foundation
More Like this
- 
            The immense volume of user-generated content on social media provides a rich data source for big data research. Co-mentioned entities in social media content offer valuable information that can support a broad range of studies, from product market competition to dynamic social network mining and modeling. This paper introduces a new approach that combines named entity recognition (NER) and network modeling to extract and analyze co-mention relationships among entities in the same domain from unstructured social media data. This approach contributes to the design-for-market-systems literature because little research has investigated product competition via co-mention networks using large-scale unstructured social media data. In particular, the proposed approach provides designers with a new way to gain insight into market trends and aggregated customer preferences when customer choice data is insufficient. Moreover, our approach can easily support the evolution analysis of co-mention relationships beyond cross-sectional analysis of co-mention networks in a single year due to the abundance of social media data in multiple years. To demonstrate the approach to supporting multi-year product competition analysis, we perform a case study on mining co-mention networks of car models with Twitter data. The result shows that our approach can successfully extract the co-mention relationships of car models in multiple years from 2016 to 2019 from massive Twitter content, and enables us to conduct evolutionary co-mention network analysis with temporal network modeling and descriptive network analysis. The analysis confirmed that the co-mention network is capable of identifying frequently discussed entities and topics, such as car model pairs that are often involved in competition and emerging vehicle technologies such as electric vehicles (EV). Furthermore, conducting evolutionary co-mention network analysis provides designers with an efficient way to monitor shifts in customer preferences for car features and to track trends in public discussions such as environmental issues associated with EVs over time. Our approach can be generally applied to other studies on co-mention relationships between entities, such as emerging technologies, cellphones, and political figures.
- 
            Large language models’ (LLMs) abilities are drawn from their pretraining data, and model development begins with data curation. However, decisions around what data is retained or removed during this initial stage are underscrutinized. In our work, we ground web text, which is a popular pretraining data source, to its social and geographic contexts. We create a new dataset of 10.3 million self-descriptions of website creators, and extract information about who they are and where they are from: their topical interests, social roles, and geographic affiliations. Then, we conduct the first study investigating how ten “quality” and English language identification (langID) filters affect webpages that vary along these social dimensions. Our experiments illuminate a range of implicit preferences in data curation: we show that some quality classifiers act like topical domain filters, and langID can overlook English content from some regions of the world. Overall, we hope that our work will encourage a new line of research on pretraining data curation practices and their social implications.
- 
            Abstract Research shows that certain external factors can affect the mental health of many people in a community. Moreover, the importance of mental health has significantly increased in recent years due to the COVID-19 pandemic. Many people communicate and express their emotions through social media platforms, which provide researchers with opportunities to examine insights into their opinions and mental state. While social sensing studies using social media data have flourished in the last decade, many studies using social media data to detect and predict mental health status have focused on the individual level. In this study, we aim to generate a social sensing index for mental health to monitor emotional well-being, which is closely related to mental health, and to identify daily trends in negative emotions at the city level. We conduct sentiment analysis on Twitter data and compute the entropy of the degree of sentiment change to develop the index. We observe that sentiment trends fluctuate significantly in response to unusual events. It is found that the social sensing index for mental health reflects both city-wide and local events that trigger negative emotions, as well as areas where negative emotions persist. The study contributes to the growing body of research that uses social media data to examine mental health at the city level. We focus on mental health at the city level rather than the individual level, which provides a broader perspective on the mental health of a population. The social sensing index for mental health allows public health professionals to monitor and identify persistent negative sentiments and potential areas where mental health issues may emerge.
- 
            Social media is being increasingly utilized to spread breaking news and updates during disasters of all magnitudes. Unfortunately, due to the unmoderated nature of social media platforms such as Twitter, rumors and misinformation are able to propagate widely. Given this, a surfeit of research has studied rumor diffusion on social media, especially during natural disasters. In many studies, researchers manually code social media data to further analyze the patterns and diffusion dynamics of users and misinformation. This method requires many human hours, and is prone to significant incorrect classifications if the work is not checked over by another individual. In our studies, we fill the research gap by applying seven different machine learning algorithms to automatically classify misinformed Twitter data that is spread during disaster events. Due to the unbalanced nature of the data, three different balancing algorithms are also applied and compared. We collect and drive the classifiers with data from the Manchester Arena bombing (2017), Hurricane Harvey (2017), the Hawaiian incoming missile alert (2018), and the East Coast US tsunami alert (2018). Over 20,000 tweets are classified based on the veracity of their content as either true, false, or neutral, with overall accuracies exceeding 89%.
 An official website of the United States government
