Title: Detecting Domain Polarity-Changes of Words in a Sentiment Lexicon
Sentiment lexicons are instrumental for sentiment analysis: one can use the sentiment words provided in a lexicon together with a lexicon-based classifier to classify text. One major issue with this approach is that many sentiment words in the lexicon are domain-dependent; that is, they may be positive in some domains but negative in others. We refer to this problem as domain polarity-changes of words from a sentiment lexicon. Detecting such words and correcting their sentiment for an application domain is very important. In this paper, we propose a graph-based technique to tackle this problem. Experimental results show its effectiveness on multiple datasets from different domains.
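The lexicon-based classification the abstract refers to is easy to illustrate. The toy lexicon, the negation handling, and the example sentence below are assumptions for illustration only; the paper's graph-based polarity-change detection itself is not reproduced here.

```python
# Minimal sketch of a lexicon-based sentiment classifier.
# The lexicon and negation rule are illustrative assumptions, not the
# paper's actual resources or its graph-based detection method.
LEXICON = {"good": 1, "great": 1, "reliable": 1,
           "bad": -1, "noisy": -1, "unpredictable": -1}
NEGATORS = {"not", "no", "never"}

def lexicon_score(tokens):
    """Sum word polarities, flipping the polarity of the token
    that immediately follows a negator."""
    score, flip = 0, 1
    for tok in tokens:
        if tok in NEGATORS:
            flip = -1
            continue
        score += flip * LEXICON.get(tok, 0)
        flip = 1
    return score

print(lexicon_score("this camera is not bad".lower().split()))  # 1 -> positive
```

A domain-dependent word such as "unpredictable" (negative for appliances, often positive for movie plots) is exactly what breaks this fixed-lexicon scoring, which is the problem the paper targets.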
Award ID(s):
1910424
PAR ID:
10302863
Author(s) / Creator(s):
Date Published:
Journal Name:
Findings of the Association for Computational Linguistics: ACL-IJCNLP 2021
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. The use of generative AI, particularly large language models (LLMs) such as ChatGPT, to assess public opinion and sentiment has become increasingly prevalent. This upsurge in usage, however, raises significant questions about the transparency and interpretability of the predictions these models make. This paper therefore explores the imperative of ensuring transparency when ChatGPT is applied to public sentiment analysis. To tackle these challenges, we propose using a lexicon-based model as a surrogate to approximate both global and local predictions. Through case studies, we demonstrate how transparency mechanisms, bolstered by the lexicon-based model, can be seamlessly integrated into ChatGPT's deployment for sentiment analysis. Drawing on the results of our study, we discuss the implications for future research on the use of LLMs in governmental functions, policymaking, and public engagement. (A minimal surrogate sketch appears after this list.)
  2. Aspect category detection (ACD) is one of the challenging sub-tasks in aspect-based sentiment analysis. The goal of the task is to detect implicit or explicit aspect categories in the sentences of user-generated reviews. Since annotating aspects is time-consuming, the amount of labeled data available for supervised learning is limited. In this paper, we study contextual representations of reviews using the BERT model to better extract useful features from text segments in the reviews, and we train a supervised classifier with a small amount of labeled data for the ACD task. Experimental results on Amazon reviews from six product domains show that our method is effective in some domains. (A feature-extraction sketch appears after this list.)
  3. Sentiment analysis is a popular text classification task in natural language processing. It involves developing algorithms or machine learning models to determine the sentiment or opinion expressed in a piece of text. The results can be used by business owners and product developers to understand how consumers perceive their products. Aside from customer feedback and product/service analysis, the task is useful for social media monitoring (Martin et al., 2021). One popular application of sentiment analysis is classifying and detecting positive and negative sentiment in movie reviews. Movie reviews enable producers to monitor how their movies perform (Abhishek et al., 2020) and help viewers decide whether a movie is good enough to be worth watching (Lakshmi Devi et al., 2020). However, the task has been under-explored for African languages compared to their Western counterparts, the "high-resource" languages that have received enormous attention thanks to the large amounts of textual data available for them. African languages fall into the category of low-resource languages, which are disadvantaged by the limited availability of data and are consequently poorly represented (Nasim & Ghani, 2020). Recently, sentiment analysis has received attention for African languages in the Twitter domain, for Nigerian (Muhammad et al., 2022) and Amharic (Yimam et al., 2020) languages, but no corpus is available in the movie domain. We tackle the unavailability of Yorùbá data for movie sentiment analysis by creating the first Yorùbá sentiment corpus for Nollywood movie reviews. We also develop sentiment classification models using state-of-the-art pre-trained language models such as mBERT (Devlin et al., 2019) and AfriBERTa (Ogueji et al., 2021). (A fine-tuning sketch appears after this list.)
  4. Creating a domain model, even for classical, domain-independent planning, is a notoriously hard knowledge-engineering task. A natural approach to this problem is to learn a domain model from observations. However, model-learning approaches frequently provide no safety guarantees: the learned model may assume actions are applicable when they are not, and may incorrectly capture actions' effects. This can result in plans that fail when executed. In some domains such failures are not acceptable, due to the cost of failure or the inability to replan online after failure. In such settings, all learning must be done offline, based on observations collected, e.g., by other agents or a human, and the task is to generate a plan that is guaranteed to succeed. This is called the model-free planning problem. Prior work proposed an algorithm for solving the model-free planning problem in classical planning, but it was limited to learning grounded domains and thus could not scale. We generalize this prior work and propose the first safe model-free planning algorithm for lifted domains. We prove the correctness of our approach and provide a statistical analysis showing that the number of trajectories needed to solve future problems with high probability is linear in the potential size of the domain model. We also present experiments on twelve IPC domains showing that our approach learns the real action model in all cases with at most two trajectories. (A sketch of the underlying safe-learning idea appears after this list.)
  5. A natural language processing (NLP) application requires sophisticated lexical resources to support its processing goals. Different solutions, such as dictionary lookup and MetaMap, have been proposed in the healthcare-informatics literature to identify disease terms of more than one word (multi-gram disease named entities). Although much work has been done on identifying protein and gene named entities in the biomedical field, little research has addressed the recognition and resolution of terminology in clinical-trial subject-eligibility analysis. In this study, we develop a specialized lexicon for improving NLP and text-mining analysis in the breast cancer domain, and we evaluate it by comparing it with the Systematized Nomenclature of Medicine Clinical Terms (SNOMED CT). We use a hybrid methodology that combines the knowledge of domain experts, terms from multiple online dictionaries, and text mined from sample clinical trials. Our methodology introduces 4243 unique lexicon items, which increase bigram entity matches by 38.6% and trigram entity matches by 41%. The lexicon, which adds a significant number of new terms, is very useful for automatically matching patients to clinical trials based on eligibility. Beyond clinical-trial matching, the specialized lexicon developed in this study could serve as a foundation for future healthcare text-mining applications. (An n-gram matching sketch appears after this list.)
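For item 1, the abstract does not spell out how the surrogate is built; the following is a minimal, assumed sketch of the idea: score each word by the average LLM rating of the texts containing it (a global surrogate lexicon), then explain one prediction by per-word contributions (a local explanation). The texts and llm_scores below are placeholders, not real ChatGPT outputs.

```python
# Hedged sketch: approximate an LLM's sentiment scores with a lexicon-based
# surrogate. The llm_scores values stand in for real ChatGPT outputs (assumed).
from collections import defaultdict

texts = ["great service and friendly staff",
         "terrible delays and rude replies",
         "friendly staff but terrible app"]
llm_scores = [1.0, -1.0, -0.2]  # placeholder LLM sentiment ratings

# Global surrogate: score each word by the mean LLM score of texts containing it.
totals, counts = defaultdict(float), defaultdict(int)
for text, score in zip(texts, llm_scores):
    for word in set(text.split()):
        totals[word] += score
        counts[word] += 1
surrogate_lexicon = {w: totals[w] / counts[w] for w in totals}

def explain(text):
    """Local explanation: per-word contributions to the surrogate prediction."""
    contribs = {w: surrogate_lexicon.get(w, 0.0) for w in text.split()}
    return contribs, sum(contribs.values())

print(explain("friendly staff and terrible delays"))
```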
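For item 2, one common way to realize "frozen BERT features plus a small supervised classifier" looks like the sketch below, using the Hugging Face transformers library and scikit-learn. The sentences, the assumed aspect labels, and the choice of the [CLS] vector as the feature are illustrative, not the paper's exact setup.

```python
# Minimal sketch: frozen BERT features + a small supervised classifier for
# aspect category detection with little labeled data.
import torch
from transformers import AutoTokenizer, AutoModel
from sklearn.linear_model import LogisticRegression

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
bert = AutoModel.from_pretrained("bert-base-uncased")

def cls_embedding(sentences):
    """Encode sentences and return the [CLS] vector as a fixed-size feature."""
    batch = tokenizer(sentences, padding=True, truncation=True,
                      return_tensors="pt")
    with torch.no_grad():
        out = bert(**batch)
    return out.last_hidden_state[:, 0].numpy()

train_sents = ["battery dies within an hour", "arrived two weeks late"]
train_labels = ["battery", "shipping"]          # assumed aspect categories

clf = LogisticRegression(max_iter=1000)
clf.fit(cls_embedding(train_sents), train_labels)
print(clf.predict(cls_embedding(["the charge barely lasts a day"])))
```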
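For item 3, a minimal fine-tuning loop for a multilingual encoder on review/label pairs might look like this. The two-sentence "corpus" is a placeholder, and the paper's actual training configuration and data are not reproduced.

```python
# Hedged sketch of fine-tuning a multilingual encoder for Yorùbá movie-review
# sentiment, in the spirit of the mBERT/AfriBERTa models the abstract names.
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

model_name = "bert-base-multilingual-cased"    # mBERT checkpoint
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name,
                                                           num_labels=2)

reviews = ["fiimu yii dara pupo", "fiimu naa ko dara rara"]  # placeholder text
labels = torch.tensor([1, 0])                  # 1 = positive, 0 = negative

batch = tokenizer(reviews, padding=True, return_tensors="pt")
optimizer = torch.optim.AdamW(model.parameters(), lr=2e-5)

model.train()
for _ in range(3):                             # a few illustrative steps
    out = model(**batch, labels=labels)        # loss computed internally
    out.loss.backward()
    optimizer.step()
    optimizer.zero_grad()
print(float(out.loss))
```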
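For item 4, the conservative ("safe") learning idea can be sketched for the grounded case: take an action's preconditions to be the intersection of all states in which it was observed to apply, and its effects to be the observed state changes. The lifted generalization that is the paper's contribution is not reproduced here, and the trajectory data is invented.

```python
# Hedged sketch of conservative ("safe") action-model learning from observed
# trajectories, grounded STRIPS case only.
def learn_safe_model(trajectories):
    """trajectories: list of (pre_state, action, post_state) triples,
    with states given as frozensets of ground facts."""
    pre, add, dele = {}, {}, {}
    for s, a, s2 in trajectories:
        # Safe preconditions: intersection of all states where `a` was applied.
        pre[a] = s if a not in pre else pre[a] & s
        # Effects: facts that appeared / disappeared across the transition.
        add.setdefault(a, set()).update(s2 - s)
        dele.setdefault(a, set()).update(s - s2)
    return pre, add, dele

obs = [
    (frozenset({"at-a", "fuel"}), "move-a-b",
     frozenset({"at-b", "fuel"})),
    (frozenset({"at-a", "fuel", "day"}), "move-a-b",
     frozenset({"at-b", "fuel", "day"})),
]
pre, add, dele = learn_safe_model(obs)
# Preconditions over-approximate (safe): {'at-a', 'fuel'}; effects: +at-b, -at-a
print(pre["move-a-b"], add["move-a-b"], dele["move-a-b"])
```

Keeping every observed fact as a precondition can only make the learned model more restrictive than the real one, which is what guarantees that plans built from it will not fail at execution time.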
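For item 5, matching multi-word lexicon entries against clinical-trial text reduces to n-gram lookup, sketched below with a three-entry placeholder lexicon rather than the study's 4243 curated items.

```python
# Minimal sketch of the bigram/trigram lexicon matching the abstract evaluates.
SPECIALIZED_LEXICON = {
    ("breast", "cancer"),
    ("hormone", "receptor", "positive"),
    ("invasive", "ductal", "carcinoma"),
}

def match_entities(text, max_n=3):
    """Return lexicon n-grams (n <= max_n) found in the tokenized text."""
    tokens = text.lower().split()
    hits = []
    for n in range(2, max_n + 1):
        for i in range(len(tokens) - n + 1):
            gram = tuple(tokens[i:i + n])
            if gram in SPECIALIZED_LEXICON:
                hits.append(" ".join(gram))
    return hits

criterion = "Patients with hormone receptor positive invasive ductal carcinoma"
print(match_entities(criterion))
# ['hormone receptor positive', 'invasive ductal carcinoma']
```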