skip to main content


Title: Aspect-Sentiment-Guided Opinion Summarization for User Need Elicitation From Online Reviews
Extracting and analyzing informative user opinion from large-scale online reviews is a key success factor in product design processes. However, user reviews are naturally unstructured, noisy, and verbose. Recent advances in abstractive text summrization provide an unprecedented opportunity to systematically generate summaries of user opinions to facilitate need finding for designers. Yet, two main gaps in the state-of-the-art opinion summarization methods limit their applicability to the product design domain. First is the lack of capabilities to guide the generative process with respect to various product aspects and user sentiments (e.g., polarity, subjectivity), and the second gap is the lack of annotated training datasets for supervised learning. This paper tackles these gaps by (1) devising an efficient and scalable methodology for abstractive opinion summarization from online reviews guided by aspects terms and sentiment polarities, and (2) automatically generating a reusable synthetic training dataset that captures various degrees of granularity and polarity. The methodology contributes a multi-instance pooling model with aspect and sentiment information integrated (MAS), a synthetic data assembled using the results of the MAS model, and a fine-tuned pretrained sequence-to-sequence model “T5” for summary generation. Numerical experiments are conducted on a large dataset scraped from a major e-commerce retail store for sneakers to demonstrate the performance, feasibility, and potentials of the developed methodology. Several directions are provided for future exploration in the area of automated opinion summarization for user-centered product design.

 
more » « less
Award ID(s):
2050052
NSF-PAR ID:
10387290
Author(s) / Creator(s):
; ; ;
Date Published:
Journal Name:
Aspect-Sentiment-Guided Opinion Summarization for User Need Elicitation From Online Reviews
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. Abstract Eliciting informative user opinions from online reviews is a key success factor for innovative product design and development. The unstructured, noisy, and verbose nature of user reviews, however, often complicate large-scale need finding in a format useful for designers without losing important information. Recent advances in abstractive text summarization has created the opportunity to systematically generate opinion summaries from online reviews to inform the early stages of product design and development. However, two knowledge gaps hinder the applicability of opinion summarization methods in practice. First, there is a lack of formal mechanisms to guide the generative process with respect to different categories of product attributes and user sentiments. Second, the annotated training datasets needed for supervised training of abstractive summarization models are often difficult and costly to create. This article addresses these gaps by (1) devising an efficient computational framework for abstractive opinion summarization guided by specific product attributes and sentiment polarities, and (2) automatically generating a synthetic training dataset that captures various degrees of granularity and polarity. A hierarchical multi-instance attribute-sentiment inference mode is developed for assembling a high-quality synthetic dataset, which is utilized to fine-tune a pretrained language model for abstractive summary generation. Numerical experiments conducted on a large dataset scraped from three major e-Commerce retail store for apparel and footwear products indicate the performance, feasibility, and potentials of the developed framework. Several directions are provided for future exploration in the area of automated opinion summarization for user-centered design. 
    more » « less
  2. Proc. 2023 ACM Int. Conf. on Web Search and Data Mining (Ed.)
    Target-oriented opinion summarization is to profile a target by extracting user opinions from multiple related documents. Instead of simply mining opinion ratings on a target (e.g., a restaurant) or on multiple aspects (e.g., food, service) of a target, it is desirable to go deeper, to mine opinion on fine-grained sub-aspects (e.g., fish). However, it is expensive to obtain high-quality annotations at such fine-grained scale. This leads to our proposal of a new framework, FineSum, which advances the frontier of opinion analysis in three aspects: (1) minimal supervision, where no document-summary pairs are provided, only aspect names and a few aspect/sentiment keywords are available; (2) fine-grained opinion analysis, where sentiment analysis drills down to a specific subject or characteristic within each general aspect; and (3) phrase-based summarization, where short phrases are taken as basic units for summarization, and semantically coherent phrases are gathered to improve the consistency and comprehensiveness of summary. Given a large corpus with no annotation, FineSum first automatically identifies potential spans of opinion phrases, and further reduces the noise in identification results using aspect and sentiment classifiers. It then constructs multiple fine-grained opinion clusters under each aspect and sentiment. Each cluster expresses uniform opinions towards certain sub-aspects (e.g., “fish” in “food” aspect) or characteristics (e.g., “Mexican” in “food” aspect). To accomplish this, we train a spherical word embedding space to explicitly represent different aspects and sentiments. We then distill the knowledge from embedding to a contextualized phrase classifier, and perform clustering using the contextualized opinion-aware phrase embedding. Both automatic evaluations on the benchmark and quantitative human evaluation validate the effectiveness of our approach. 
    more » « less
  3. null (Ed.)
    Neural natural language generation (NNLG) from structured meaning representations has become increasingly popular in recent years. While we have seen progress with generating syntactically correct utterances that preserve semantics, various shortcomings of NNLG systems are clear: new tasks require new training data which is not available or straightforward to acquire, and model outputs are simple and may be dull and repetitive. This paper addresses these two critical challenges in NNLG by: (1) scalably (and at no cost) creating training datasets of parallel meaning representations and reference texts with rich style markup by using data from freely available and naturally descriptive user reviews, and (2) systematically exploring how the style markup enables joint control of semantic and stylistic aspects of neural model output. We present YelpNLG, a corpus of 300,000 rich, parallel meaning representations and highly stylistically varied reference texts spanning different restaurant attributes, and describe a novel methodology that can be scalably reused to generate NLG datasets for other domains. The experiments show that the models control important aspects, including lexical choice of adjectives, output length, and sentiment, allowing the models to successfully hit multiple style targets without sacrificing semantics. 
    more » « less
  4. Generative Adversarial Networks (GANs) have shown remarkable success in various generative design tasks, from topology optimization to material design, and shape parametrization. However, most generative design approaches based on GANs lack evaluation mechanisms to ensure the generation of diverse samples. In addition, no GAN-based generative design model incorporates user sentiments in the loss function to generate samples with high desirability from the aggregate perspectives of users. Motivated by these knowledge gaps, this paper builds and validates a novel GAN-based generative design model with an offline design evaluation function to generate samples that are not only realistic, but also diverse and desirable. A multimodal Data-driven Design Evaluation (DDE) model is developed to guide the generative process by automatically predicting user sentiments for the generated samples based on large-scale user reviews of previous designs. This paper incorporates DDE into the StyleGAN structure, a state-of-the-art GAN model, to enable data-driven generative processes that are innovative and user-centered. The results of experiments conducted on a large dataset of footwear products demonstrate the effectiveness of the proposed DDE-GAN in generating high-quality, diverse, and desirable concepts. 
    more » « less
  5. null (Ed.)
    Aspect-based sentiment analysis of review texts is of great value for understanding user feedback in a fine-grained manner. It has in general two sub-tasks: (i) extracting aspects from each review, and (ii) classifying aspect-based reviews by sentiment polarity. In this pa-per, we propose a weakly-supervised approach for aspect-based sentiment analysis, which uses only a few keywords describing each aspect/sentiment without using any labeled examples. Existing methods are either designed only for one of the sub-tasks, neglecting the benefit of coupling both, or are based on topic models that may contain overlapping concepts. We propose to first learn sentiment, aspectjoint topic embeddings in the word embedding space by imposing regularizations to encourage topic distinctiveness, and then use neural models to generalize the word-level discriminative information by pre-training the classifiers with embedding-based predictions and self-training them on unlabeled data. Our comprehensive performance analysis shows that our method generates quality joint topics and outperforms the baselines significantly (7.4%and 5.1% F1-score gain on average for aspect and sentiment classification respectively) on benchmark datasets. 
    more » « less