Title: Aspect Extraction from Online Consumer Reviews with WordNet-Guided Continuous-Space Language
Online consumer reviews contain rich yet implicit information regarding consumers’ preferences for specific aspects of products/services. Extracting aspects from online consumer reviews has been recognized as a valuable step in performing fine-grained analytical tasks (e.g., aspect-based sentiment analysis). Extant approaches to aspect extraction are dominated by discrete models. Despite explosive research interest in continuous-space language models in recent years, these models have yet to be explored for the task of extracting product/service aspects from online consumer reviews. In addition, previous continuous-space models for information extraction have largely overlooked the role of semantic information embedded in texts. In this study, we propose an approach to aspect extraction that leverages semantic information from WordNet in conjunction with building continuous-space language models from review texts. Experimental results with online restaurant reviews demonstrate that the WordNet-guided continuous-space language models outperform both discrete models and continuous-space language models that do not incorporate the semantic information. The research findings have important implications for understanding consumer preferences and improving business performance.
Award ID(s):
1912898
NSF-PAR ID:
10095441
Journal Name:
The 28th Annual Workshop on Information Technologies and Systems
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. Sentiment, judgments and expressed positions are crucial concepts across international relations and the social sciences more generally. Yet, contemporary quantitative research has conventionally avoided the most direct and nuanced source of this information: political and social texts. In contrast, qualitative research has long relied on the patterns in texts to understand detailed trends in public opinion, social issues, the terms of international alliances, and the positions of politicians. Yet, qualitative human reading does not scale to the accelerating mass of digital information currently available. Researchers are in need of automated tools that can extract meaningful opinions and judgments from texts. Thus, there is an emerging opportunity to marry the model-based, inferential focus of quantitative methodology, as exemplified by ideal point models, with high-resolution, qualitative interpretations of language and positions. We suggest that using alternatives to simple bag of words (BOW) representations and re-focusing on aspect-sentiment representations of text will aid researchers in systematically extracting people’s judgments and what is being judged at scale. The experimental results below show that our approach, which automates the extraction of aspect and sentiment MWE pairs, outperforms BOW in classification tasks, while providing more interpretable parameters. By connecting expressed sentiment and the aspects being judged, PULSAR (Parsing Unstructured Language into Sentiment-Aspect Representations) also has deep implications for understanding the underlying dimensionality of issue positions and ideal points estimated with text. Our approach to parsing text into aspect-sentiment expressions recovers both expressive phrases (akin to categorical votes) and the aspects that are being judged (akin to bills). Thus, PULSAR, or future systems like it, opens up new avenues for the systematic analysis of high-dimensional opinions and judgments at scale within existing ideal point models.
  2. Aspect-based sentiment analysis of review texts is of great value for understanding user feedback in a fine-grained manner. It has in general two sub-tasks: (i) extracting aspects from each review, and (ii) classifying aspect-based reviews by sentiment polarity. In this paper, we propose a weakly-supervised approach for aspect-based sentiment analysis, which uses only a few keywords describing each aspect/sentiment without using any labeled examples. Existing methods are either designed only for one of the sub-tasks, neglecting the benefit of coupling both, or are based on topic models that may contain overlapping concepts. We propose to first learn <sentiment, aspect> joint topic embeddings in the word embedding space by imposing regularizations to encourage topic distinctiveness, and then use neural models to generalize the word-level discriminative information by pre-training the classifiers with embedding-based predictions and self-training them on unlabeled data. Our comprehensive performance analysis shows that our method generates quality joint topics and outperforms the baselines significantly (7.4% and 5.1% F1-score gain on average for aspect and sentiment classification respectively) on benchmark datasets.
  3. Creativity research requires assessing the quality of ideas and products. In practice, conducting creativity research often involves asking several human raters to judge participants’ responses to creativity tasks, such as judging the novelty of ideas from the alternate uses task (AUT). Although such subjective scoring methods have proved useful, they have two inherent limitations: labor cost (raters typically code thousands of responses) and subjectivity (raters vary in their perceptions and preferences). Both raise classic psychometric threats to reliability and validity. We sought to address the limitations of subjective scoring by capitalizing on recent developments in automated scoring of verbal creativity via semantic distance, a computational method that uses natural language processing to quantify the semantic relatedness of texts. In five studies, we compare the top performing semantic models (e.g., GloVe, continuous bag of words) previously shown to have the highest correspondence to human relatedness judgements. We assessed these semantic models in relation to human creativity ratings from a canonical verbal creativity task (AUT; Studies 1–3) and novelty/creativity ratings from two word association tasks (Studies 4–5). We find that a latent semantic distance factor, composed of the common variance from five semantic models, reliably and strongly predicts human creativity and novelty ratings across a range of creativity tasks. We also replicate an established experimental effect in the creativity literature (i.e., the serial order effect) and show that semantic distance correlates with other creativity measures, demonstrating convergent validity. We provide an open platform to efficiently compute semantic distance, including tutorials and documentation (https://osf.io/gz4fc/).
  4. Extracting and analyzing informative user opinion from large-scale online reviews is a key success factor in product design processes. However, user reviews are naturally unstructured, noisy, and verbose. Recent advances in abstractive text summarization provide an unprecedented opportunity to systematically generate summaries of user opinions to facilitate need finding for designers. Yet, two main gaps in the state-of-the-art opinion summarization methods limit their applicability to the product design domain. The first is the lack of capabilities to guide the generative process with respect to various product aspects and user sentiments (e.g., polarity, subjectivity), and the second is the lack of annotated training datasets for supervised learning. This paper tackles these gaps by (1) devising an efficient and scalable methodology for abstractive opinion summarization from online reviews guided by aspect terms and sentiment polarities, and (2) automatically generating a reusable synthetic training dataset that captures various degrees of granularity and polarity. The methodology contributes a multi-instance pooling model with aspect and sentiment information integrated (MAS), a synthetic dataset assembled using the results of the MAS model, and a fine-tuned pretrained sequence-to-sequence model “T5” for summary generation. Numerical experiments are conducted on a large dataset scraped from a major e-commerce retail store for sneakers to demonstrate the performance, feasibility, and potential of the developed methodology. Several directions are provided for future exploration in the area of automated opinion summarization for user-centered product design.
  5. Neural natural language generation (NNLG) from structured meaning representations has become increasingly popular in recent years. While we have seen progress with generating syntactically correct utterances that preserve semantics, various shortcomings of NNLG systems are clear: new tasks require new training data which is not available or straightforward to acquire, and model outputs are simple and may be dull and repetitive. This paper addresses these two critical challenges in NNLG by: (1) scalably (and at no cost) creating training datasets of parallel meaning representations and reference texts with rich style markup by using data from freely available and naturally descriptive user reviews, and (2) systematically exploring how the style markup enables joint control of semantic and stylistic aspects of neural model output. We present YelpNLG, a corpus of 300,000 rich, parallel meaning representations and highly stylistically varied reference texts spanning different restaurant attributes, and describe a novel methodology that can be scalably reused to generate NLG datasets for other domains. The experiments show that the models control important aspects, including lexical choice of adjectives, output length, and sentiment, allowing the models to successfully hit multiple style targets without sacrificing semantics.