Extracting and analyzing informative user opinions from large-scale online reviews is a key success factor in product design processes. However, user reviews are naturally unstructured, noisy, and verbose. Recent advances in abstractive text summarization provide an unprecedented opportunity to systematically generate summaries of user opinions to facilitate need finding for designers. Yet, two main gaps in state-of-the-art opinion summarization methods limit their applicability to the product design domain. First is the lack of capabilities to guide the generative process with respect to various product aspects and user sentiments (e.g., polarity, subjectivity); the second is the lack of annotated training datasets for supervised learning. This paper tackles these gaps by (1) devising an efficient and scalable methodology for abstractive opinion summarization from online reviews, guided by aspect terms and sentiment polarities, and (2) automatically generating a reusable synthetic training dataset that captures various degrees of granularity and polarity. The methodology contributes a multi-instance pooling model with aspect and sentiment information integrated (MAS), a synthetic dataset assembled using the results of the MAS model, and a fine-tuned pretrained sequence-to-sequence model, T5, for summary generation. Numerical experiments are conducted on a large dataset scraped from a major e-commerce retail store …
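As an illustrative sketch (not the paper's implementation), the synthetic-data step described above can be imagined as grouping tagged review sentences by (aspect, polarity) and building control-coded source strings of the kind a seq2seq summarizer such as T5 could be fine-tuned on. The function name, the control-prefix format, and the toy data below are assumptions for illustration.

```python
# Sketch: assemble (aspect, polarity)-conditioned training pairs for a
# guided abstractive summarizer, assuming sentences were already tagged
# by an aspect/sentiment model such as MAS. Format is illustrative.
from collections import defaultdict

def assemble_training_pairs(tagged_sentences):
    """Group sentences by (aspect, polarity) and emit control-coded inputs."""
    buckets = defaultdict(list)
    for sentence, aspect, polarity in tagged_sentences:
        buckets[(aspect, polarity)].append(sentence)
    pairs = []
    for (aspect, polarity), sentences in sorted(buckets.items()):
        # Control prefix tells the generator which aspect/polarity to cover.
        source = f"summarize aspect: {aspect} sentiment: {polarity} | " + " ".join(sentences)
        pairs.append((source, sentences))
    return pairs

tagged = [
    ("The battery dies within hours.", "battery", "negative"),
    ("Battery life is disappointing.", "battery", "negative"),
    ("The screen is gorgeous.", "display", "positive"),
]
pairs = assemble_training_pairs(tagged)
```

Each source string could then be paired with a reference summary drawn from the same bucket when fine-tuning the generator.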
Beyond a Bag of Words: Using PULSAR to Extract Judgments on Specific Human Rights at Scale
Abstract: Sentiment, judgments, and expressed positions are crucial concepts across international relations and the social sciences more generally. Yet contemporary quantitative research has conventionally avoided the most direct and nuanced source of this information: political and social texts. In contrast, qualitative research has long relied on the patterns in texts to understand detailed trends in public opinion, social issues, the terms of international alliances, and the positions of politicians. Qualitative human reading, however, does not scale to the accelerating mass of digital information available today. Researchers need automated tools that can extract meaningful opinions and judgments from texts. Thus, there is an emerging opportunity to marry the model-based, inferential focus of quantitative methodology, as exemplified by ideal point models, with high-resolution, qualitative interpretations of language and positions. We suggest that moving beyond simple bag-of-words (BOW) representations and re-focusing on aspect-sentiment representations of text will aid researchers in systematically extracting people's judgments, and what is being judged, at scale. The experimental results below show that our approach, which automates the extraction of aspect and sentiment multi-word expression (MWE) pairs, outperforms BOW in classification tasks while providing more interpretable parameters. By connecting expressed sentiment and the aspects …
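A toy illustration of the aspect-sentiment idea, in contrast to a bag of words: pair each sentiment-bearing word with the term it judges. The tiny lexicon and the adjacency rule below are stand-ins for the parse- and MWE-based extraction the abstract describes, not the actual PULSAR pipeline.

```python
# Toy aspect-sentiment pair extraction: a hand-made polarity lexicon and a
# naive "sentiment word modifies the following token" rule, standing in for
# real dependency-parse and multi-word-expression extraction.
SENTIMENT = {"fair": 1, "free": 1, "brutal": -1, "repressive": -1}

def extract_pairs(tokens):
    """Return (aspect, sentiment_word, polarity) triples."""
    pairs = []
    for i, tok in enumerate(tokens):
        if tok in SENTIMENT and i + 1 < len(tokens):
            pairs.append((tokens[i + 1], tok, SENTIMENT[tok]))
    return pairs

sentence = "the regime held fair elections but ran brutal prisons"
print(extract_pairs(sentence.split()))
# → [('elections', 'fair', 1), ('prisons', 'brutal', -1)]
```

Unlike a BOW vector, the output says not just that negative words appeared, but what was judged negatively.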
- Journal Name: Peace Economics, Peace Science and Public Policy
- Sponsoring Org: National Science Foundation
More Like this
Understanding who blames or supports whom in news text is a critical research question in computational social science. Traditional methods and datasets for sentiment analysis are, however, not suitable for the domain of political text, as they do not consider the direction of sentiment expressed between entities. In this paper, we propose a novel NLP task of identifying the directed sentiment relationship between political entities in a given news document, which we call directed sentiment extraction. From a million-scale news corpus, we construct a dataset of news sentences in which the sentiment relations of political entities are manually annotated. We present a simple but effective approach for utilizing a pretrained transformer, which infers the target class by predicting multiple question-answering tasks and combining the outcomes. We demonstrate the utility of the proposed method for social science research questions by analyzing positive and negative opinions between political entities in two major events: the 2016 U.S. presidential election and COVID-19. The newly proposed problem, data, and method will facilitate future studies on interdisciplinary NLP methods and applications.
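The multi-question framing can be sketched as follows: pose several yes/no questions about a sentence ("Does A criticize B?", "Does B praise A?", ...) and combine the per-question confidences into one directed label. The question templates, names, and argmax combination rule below are illustrative assumptions, not the paper's exact formulation.

```python
# Sketch: combine the scores of several QA-style probes into a single
# directed sentiment relation (source, target, polarity).
def combine_answers(scores):
    """scores maps (source, target, polarity) to a QA model's confidence;
    return the highest-scoring directed relation."""
    return max(scores, key=scores.get)

scores = {
    ("Smith", "Jones", "negative"): 0.81,  # "Does Smith criticize Jones?"
    ("Smith", "Jones", "positive"): 0.07,  # "Does Smith praise Jones?"
    ("Jones", "Smith", "negative"): 0.09,
    ("Jones", "Smith", "positive"): 0.03,
}
```

In practice the confidences would come from a pretrained transformer answering each templated question against the news sentence.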
Aspect classification, identifying aspects of text segments, facilitates numerous applications, such as sentiment analysis and review summarization. To alleviate the extensive human effort required by existing aspect classification methods, in this paper we focus on a weakly supervised setting: the model input contains only domain-specific raw texts and a few seed words per pre-defined aspect. We identify a unique challenge here: how to classify texts that belong to none of the pre-defined aspects. Such "misc" aspect text segments are very common in review corpora. It is difficult, even for domain experts, to nominate seed words for the "misc" aspect, which makes existing seed-driven text classification methods inapplicable. Therefore, we propose to jointly model pre-defined aspects and the "misc" aspect through a novel framework, ARYA. It enables mutual enhancement between pre-defined aspects and the "misc" aspect via iterative classifier training and seed set updating. Specifically, it trains a classifier for the pre-defined aspects and then leverages it to induce supervision for the "misc" aspect. The prediction results for the "misc" aspect are later utilized to further filter the seed word selections for the pre-defined aspects. Experiments in three domains demonstrate the superior performance of our proposed framework, as well …
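One round of the iterative loop described above can be sketched as: score segments against per-aspect seed words, route zero-hit segments to "misc", then grow each seed set from tokens that co-occur frequently with confidently labeled segments. The seed words, toy data, and count threshold are illustrative choices, not ARYA's actual classifier or filtering rules.

```python
# Minimal sketch of seed-driven classification with a "misc" fallback,
# plus one round of seed set expansion from confident predictions.
from collections import Counter

def classify(tokens, seeds):
    """Label a token list with the aspect whose seeds it matches most,
    falling back to "misc" when no seed word appears."""
    scores = {a: sum(t in ws for t in tokens) for a, ws in seeds.items()}
    aspect, hits = max(scores.items(), key=lambda kv: kv[1])
    return aspect if hits > 0 else "misc"

def expand_seeds(segments, seeds, min_count=2):
    """Add tokens that co-occur >= min_count times with a labeled aspect."""
    cooc = {a: Counter() for a in seeds}
    for tokens in segments:
        label = classify(tokens, seeds)
        if label != "misc":
            cooc[label].update(t for t in tokens if t not in seeds[label])
    return {a: ws | {t for t, n in cooc[a].items() if n >= min_count}
            for a, ws in seeds.items()}

seeds = {"food": {"pizza"}, "service": {"waiter"}}
segments = [["pizza", "tasty"], ["tasty", "pizza", "crust"], ["waiter", "rude"]]
seeds = expand_seeds(segments, seeds)
```

Repeating classify/expand rounds is the "mutual enhancement" idea: newly confident segments refine the seeds, and refined seeds reclassify borderline segments.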
Online consumer reviews contain rich yet implicit information regarding consumers' preferences for specific aspects of products and services. Extracting aspects from online consumer reviews has been recognized as a valuable step in performing fine-grained analytical tasks (e.g., aspect-based sentiment analysis). Extant approaches to aspect extraction are dominated by discrete models. Despite explosive research interest in continuous-space language models in recent years, these models have yet to be explored for the task of extracting product/service aspects from online consumer reviews. In addition, previous continuous-space models for information extraction have largely overlooked the role of semantic information embedded in texts. In this study, we propose an aspect extraction approach that leverages semantic information from WordNet in conjunction with building continuous-space language models from review texts. The experimental results with online restaurant reviews demonstrate that the WordNet-guided continuous-space language models outperform both discrete models and continuous-space language models that do not incorporate semantic information. The research findings have important implications for understanding consumer preferences and improving business performance.
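A hypothetical sketch of one way lexical semantics could steer a continuous-space model: tag tokens with a coarse semantic class before embedding training, so that, e.g., "pasta|food" and "risotto|food" share a training signal. The lookup table below is a toy stand-in for WordNet hypernyms; a real pipeline would query WordNet itself, and this preprocessing step is our illustration, not necessarily the paper's mechanism.

```python
# Sketch: annotate review tokens with a semantic class (toy stand-in for
# WordNet hypernyms) before feeding them to an embedding trainer.
HYPERNYM = {"pasta": "food", "risotto": "food", "waiter": "person", "menu": "artifact"}

def tag_tokens(tokens):
    """Append the semantic class to each token known to the lexicon."""
    return [f"{t}|{HYPERNYM[t]}" if t in HYPERNYM else t for t in tokens]

review = "the pasta was cold but the waiter was friendly".split()
tagged = tag_tokens(review)
```

The tagged corpus would then be used to train the continuous-space language model, letting semantically related aspect terms cluster even when they rarely co-occur.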
Obeid, I. (Ed.) The Neural Engineering Data Consortium (NEDC) is developing the Temple University Digital Pathology Corpus (TUDP), an open-source database of high-resolution images from scanned pathology samples, as part of its National Science Foundation-funded Major Research Instrumentation grant titled "MRI: High Performance Digital Pathology Using Big Data and Machine Learning". The long-term goal of this project is to release one million images. We have currently scanned over 100,000 images and are in the process of annotating breast tissue data for our first official corpus release, v1.0.0. This release contains 3,505 annotated images of breast tissue, including 74 patients with cancerous diagnoses (out of a total of 296 patients). In this poster, we will present an analysis of this corpus and discuss the challenges we have faced in efficiently producing high-quality annotations of breast tissue. It is well known that state-of-the-art algorithms in machine learning require vast amounts of data. Fields such as speech recognition, image recognition, and text processing are able to deliver impressive performance with complex deep learning models because they have developed large corpora to support training of extremely high-dimensional models (e.g., billions of parameters). Other fields that do not …