Most e-commerce websites (e.g., Amazon and TripAdvisor) show their users an initial set of useful product reviews. These reviews allow users to form a general idea about the product's characteristics. The usefulness of a review is mainly based on a score that the website users provide. Studies have shown that this score is not a good indicator of a review's actual helpfulness. Nonetheless, most past works still use it to classify a review as helpful or not. With the growing number of reviews, finding the helpful ones is a challenging task. In this work, we propose NovRev, a new unsupervised approach that recommends a personalized subset of unread useful reviews to users looking to increase their knowledge about a product. NovRev treats an initial set of reviews as context and recommends reviews that add information about the product. We have extensively tested NovRev against five baseline methods, using eight real-life datasets from different product domains. The results show that NovRev can recommend novel, relevant, and diverse reviews while covering more information about the product.
A Knowledge-Driven Approach to Classifying Object and Attribute Coreferences in Opinion Mining
Classifying and resolving coreferences of objects (e.g., product names) and attributes (e.g., product aspects) in opinionated reviews is crucial for improving opinion mining performance. However, the task is challenging because one often needs domain-specific knowledge (e.g., iPad is a tablet and has the aspect resolution) to identify coreferences in opinionated reviews. Moreover, compiling a handcrafted and curated knowledge base for each domain is time-consuming and arduous. This paper proposes an approach to automatically mine and leverage domain-specific knowledge for classifying object and attribute coreferences. The approach extracts domain-specific knowledge from unlabeled review data and trains a knowledge-aware neural coreference classification model to leverage (useful) domain knowledge together with general commonsense knowledge for the task. Experimental evaluation on real-world datasets involving five domains (product types) shows the effectiveness of the approach.
- Award ID(s): 1910424
- PAR ID: 10302860
- Date Published:
- Journal Name: Findings of the Association for Computational Linguistics: EMNLP 2020
- Format(s): Medium: X
- Sponsoring Org: National Science Foundation
More Like this
-
Classifying whether collected information related to emerging topics and domains is fake or incorrect is not easy because labeled data in those domains is scarce. Given labeled data from source domains (e.g., gossip and health) and limited labeled data from a newly emerging target domain (e.g., COVID-19 and the Ukraine war), simply applying knowledge learned from the source domains to the target domain may not work well because the data distributions differ. To solve this problem, we propose energy-based domain adaptation with active learning for early misinformation detection. Using three real-world news datasets, we evaluate our proposed model against two baselines in both domain adaptation and the whole pipeline. Our model outperforms the baselines, improving by at least 5% in the domain adaptation task and 10% in the whole pipeline, which shows the effectiveness of the proposed approach.
-
Sentiment analysis is a popular text classification task in natural language processing. It involves developing algorithms or machine learning models to determine the sentiment or opinion expressed in a piece of text. The results of this task can be used by business owners and product developers to understand their consumers' perceptions of their products. Aside from customer feedback and product/service analysis, this task can be useful for social media monitoring (Martin et al., 2021). One popular application of sentiment analysis is classifying and detecting positive and negative sentiment in movie reviews. Movie reviews enable movie producers to monitor the performance of their movies (Abhishek et al., 2020) and help viewers decide whether a movie is good enough to be worth their time (Lakshmi Devi et al., 2020). However, the task has been under-explored for African languages compared to their Western counterparts, the "high-resource languages", which have received enormous attention due to the large amount of available textual data. African languages fall into the category of low-resource languages, which are disadvantaged because the limited availability of data leaves them poorly represented (Nasim & Ghani, 2020). Recently, sentiment analysis has received attention for African languages in the Twitter domain for Nigerian (Muhammad et al., 2022) and Amharic (Yimam et al., 2020) languages. However, there is no available corpus in the movie domain. We tackle the unavailability of Yorùbá data for movie sentiment analysis by creating the first Yorùbá sentiment corpus for Nollywood movie reviews. We also develop sentiment classification models using state-of-the-art pre-trained language models such as mBERT (Devlin et al., 2019) and AfriBERTa (Ogueji et al., 2021).
-
User modeling is critical for understanding user intents, but it is also challenging because user intents are diverse and not directly observable. Most existing works exploit specific types of behavior signals for user modeling, e.g., opinionated data or network structure, but neglect the dependency among different types of user-generated data. We focus on self-consistency across multiple modalities of user-generated data to model user intents. A probabilistic generative model is developed to integrate two companion learning tasks for users: opinionated content modeling and social network structure modeling. Individual users are modeled as a mixture over the instances of paired learning tasks to capture their behavioral heterogeneity, and the tasks are clustered by sharing a global prior distribution to capture the homogeneity among users. Extensive experimental evaluations on large collections of Amazon and Yelp reviews with social network structures confirm the effectiveness of the proposed solution. The learned user models are interpretable and predictive: they enable more accurate sentiment classification and item/friend recommendations than the corresponding baselines that model only a single type of user behavior.

