In this empirical study, we develop a framework for binary and multi-class classification of Twitter data. We first introduce a manually built gold-standard dataset of 4,000 tweets related to environmental health hazards in Barbados for the period 2014-2018. Binary classification is then used to categorize each tweet as relevant or irrelevant. Next, multi-class classification further sorts the relevant tweets into four types of community engagement: reporting information, expressing negative engagement, expressing positive engagement, and asking for information. Results indicate that a combination of TF-IDF, psychometric, linguistic, sentiment, and Twitter-specific features with a Random Forest algorithm performed best for binary classification (87% F1 score). For multi-class classification, TF-IDF features with a Decision Tree algorithm performed best (74% F1 score).
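The F1 scores reported above combine precision and recall per class. As a minimal sketch of how such scores can be computed, the Python below derives per-class and macro-averaged F1 from true and predicted labels. The function names and the example labels are hypothetical, and the abstract does not say which averaging the authors used for the multi-class score.

```python
from collections import Counter


def f1_per_class(y_true, y_pred):
    """Return {label: F1} computed from parallel lists of true and predicted labels."""
    tp, fp, fn = Counter(), Counter(), Counter()
    for truth, pred in zip(y_true, y_pred):
        if truth == pred:
            tp[truth] += 1   # correct prediction for this class
        else:
            fp[pred] += 1    # predicted class was wrong
            fn[truth] += 1   # true class was missed
    scores = {}
    for label in set(y_true) | set(y_pred):
        # F1 = 2TP / (2TP + FP + FN); defined as 0.0 when the denominator is zero
        denom = 2 * tp[label] + fp[label] + fn[label]
        scores[label] = 2 * tp[label] / denom if denom else 0.0
    return scores


def macro_f1(y_true, y_pred):
    """Unweighted mean of the per-class F1 scores (an assumed averaging choice)."""
    scores = f1_per_class(y_true, y_pred)
    return sum(scores.values()) / len(scores)


# Made-up labels for illustration only; not data from the study.
y_true = ["relevant", "relevant", "irrelevant", "irrelevant"]
y_pred = ["relevant", "irrelevant", "irrelevant", "irrelevant"]
print(f1_per_class(y_true, y_pred))
print(macro_f1(y_true, y_pred))
```

The same functions apply unchanged to the four engagement classes, since the label set is taken from the data rather than fixed in advance.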
                    
                            
                            You Don’t Say....Linguistic Features in Sarcasm Detection
                        
                    
    
            We explore linguistic features that contribute to sarcasm detection. The linguistic features that we investigate are a combination of text and word complexity, stylistic and psychological features. We experiment with sarcastic tweets with and without context. The results of our experiments indicate that contextual information is crucial for sarcasm prediction. One important observation is that sarcastic tweets are typically incongruent with their context in terms of sentiment or emotional load. 
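The observation about sentiment incongruence can be illustrated with a small feature sketch. The Python below scores a tweet and its context against a toy sentiment lexicon and reports the polarity gap between them. The lexicon, the function names, and the example texts are all hypothetical and are not the authors' feature set; a real system would rely on an established sentiment resource.

```python
# Hypothetical toy lexicon, for illustration only.
TOY_LEXICON = {
    "great": 1.0, "love": 1.0, "wonderful": 1.0,
    "awful": -1.0, "hate": -1.0, "cancelled": -1.0,
}


def polarity(text, lexicon=TOY_LEXICON):
    """Mean lexicon score of the words in text; 0.0 if no word is in the lexicon."""
    tokens = [tok.strip(".,!?").lower() for tok in text.split()]
    scores = [lexicon[tok] for tok in tokens if tok in lexicon]
    return sum(scores) / len(scores) if scores else 0.0


def incongruence(tweet, context, lexicon=TOY_LEXICON):
    """Compare tweet and context polarity: absolute gap plus an opposite-sign flag."""
    p_tweet = polarity(tweet, lexicon)
    p_context = polarity(context, lexicon)
    return {
        "gap": abs(p_tweet - p_context),
        "opposite_sign": p_tweet * p_context < 0,
    }


# Made-up example pair, not drawn from the paper's data.
print(incongruence("Great, I love waiting.", "My flight got cancelled again."))
```

A large gap with opposite signs is the kind of signal a classifier could use alongside other features; on its own it would also flag ordinary disagreement, so it is only one input among several.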
        
    
- Award ID(s): 1704113
- PAR ID: 10463476
- Date Published:
- Journal Name: CLIC-IT 2021: Seventh Italian Conference on Computational Linguistics Bologna
- Format(s): Medium: X
- Sponsoring Org: National Science Foundation
More Like this
- Social media platforms are frequently used to share information and opinions around vaccinations. The more often a message is reshared, the wider the reach of the message and potential influence it may have on shaping people’s opinions to get vaccinated or not. We used a negative binomial regression to investigate whether a message’s linguistic characteristics (degree of concreteness, emotional arousal, and sentiment) and user characteristics (political ideology and number of followers) may influence users’ decisions to reshare tweets related to the COVID-19 vaccine. We analyzed US English-language tweets related to the COVID-19 vaccine between May 2020 and October 2021 (N = 236,054). Tweets with positive and high-arousal words were more often retweeted than negative, low-arousal tweets. Tweets with abstract words were more often retweeted than tweets with concrete words. In addition, while Liberal users were more likely to have tweets with a positive sentiment reshared, Conservative users were more likely to have tweets with a negative sentiment reshared. Our results can inform public health messaging on how to best phrase vaccine information to impact engagement and information resharing, and potentially persuade a wider set of people to get vaccinated.
- Internet censorship imposes restrictions on what information can be publicized or viewed on the Internet. According to Freedom House’s annual Freedom on the Net report, more than half the world’s Internet users now live in a place where the Internet is censored or restricted. China has built the world’s most extensive and sophisticated online censorship system. In this paper, we describe a new corpus of censored and uncensored social media posts from a Chinese microblogging website, Sina Weibo, collected by tracking posts that mention ‘sensitive’ topics or are authored by ‘sensitive’ users. We use this corpus to build a neural network classifier to predict censorship. Our model achieves an accuracy of 88.50% using only linguistic features. We discuss these features in detail and hypothesize that they could potentially be used for censorship circumvention.
- This paper examines the factors that govern persuasion for a priori UNDECIDED versus DECIDED audience members in the context of on-line debates. We separately study two types of influences: linguistic factors (features of the language of the debate itself) and audience factors (features of an audience member encoding demographic information, prior beliefs, and debate platform behavior). In a study of users of a popular debate platform, we find first that different combinations of linguistic features are critical for predicting persuasion outcomes for UNDECIDED versus DECIDED members of the audience. We additionally find that audience factors have more influence on predicting the side (PRO/CON) that persuaded UNDECIDED users than for DECIDED users that flip their stance to the opposing side. Our results emphasize the importance of considering the undecided and decided audiences separately when studying linguistic factors of persuasion.
- In this commentary I provide a review of the microaggression construct within a linguistic-pragmatic framework. From this perspective, microaggressions can be viewed as nonconventional indirect speech acts, that is, utterances that, because of their aggressive meaning, require some type of inferential processing on the part of the hearer. This inferential process requires a consideration of the remark in the context within which it occurs, including the prior discourse, as well as the roles and statuses of the interactants. Because microaggressions are indirect, the speaker always has the option, especially if they are higher in power, of denying any aggressive meaning. Focusing on their linguistic/pragmatic features allows for the development of a more principled framework for specifying what constitutes a microaggression, as well as helping to identify the relevant features of the context and the processes involved in the recognition of microaggressions.
 An official website of the United States government