skip to main content
US FlagAn official website of the United States government
dot gov icon
Official websites use .gov
A .gov website belongs to an official government organization in the United States.
https lock icon
Secure .gov websites use HTTPS
A lock ( lock ) or https:// means you've safely connected to the .gov website. Share sensitive information only on official, secure websites.


Title: Structured Neural Topic Models for Reviews
We present Variational Aspect-based Latent Topic Allocation (VALTA), a family of autoencoding topic models that learn aspect-based representations of reviews. VALTA defines a user-item encoder that maps bag-of-words vectors for combined reviews associated with each paired user; item onto structured embeddings, which in turn define per-aspect topic weights. We model individual reviews in a structured manner by inferring an aspect assignment for each sentence in a given review, where the per-aspect topic weights obtained by the user-item encoder serve to define a mixture over topics, conditioned on the aspect. The result is an autoencoding neural topic model for reviews, which can be trained in a fully unsupervised manner to learn topics that are structured into aspects. Experimental evaluation on large number of datasets demonstrates that aspects are interpretable, yield higher coherence scores than non-structured autoencoding topic model variants,; can be utilized to perform aspect-based comparison; genre discovery.  more » « less
Award ID(s):
1835309
PAR ID:
10107753
Author(s) / Creator(s):
; ; ;
Date Published:
Journal Name:
Proceedings of Machine Learning Research
Volume:
89
ISSN:
2640-3498
Page Range / eLocation ID:
3429--3439
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. null (Ed.)
    Aspect-based sentiment analysis of review texts is of great value for understanding user feedback in a fine-grained manner. It has in general two sub-tasks: (i) extracting aspects from each review, and (ii) classifying aspect-based reviews by sentiment polarity. In this pa-per, we propose a weakly-supervised approach for aspect-based sentiment analysis, which uses only a few keywords describing each aspect/sentiment without using any labeled examples. Existing methods are either designed only for one of the sub-tasks, neglecting the benefit of coupling both, or are based on topic models that may contain overlapping concepts. We propose to first learn sentiment, aspectjoint topic embeddings in the word embedding space by imposing regularizations to encourage topic distinctiveness, and then use neural models to generalize the word-level discriminative information by pre-training the classifiers with embedding-based predictions and self-training them on unlabeled data. Our comprehensive performance analysis shows that our method generates quality joint topics and outperforms the baselines significantly (7.4%and 5.1% F1-score gain on average for aspect and sentiment classification respectively) on benchmark datasets. 
    more » « less
  2. An overall rating cannot reveal the details of user’s preferences toward each feature of a product. One widespread practice of e-commerce websites is to provide ratings on predefined aspects of the product and user-generated reviews. Most recent multi-criteria works employ aspect preferences of users or user reviews to understand the opinions and behavior of users. However, these works fail to learn how users correlate these information sources when users express their opinion about an item. In this work, we present Multi-task & Multi-Criteria Review-based Rating (MMCRR), a framework to predict the overall ratings of items by learning how users represent their preferences when using multi-criteria ratings and text reviews. We conduct extensive experiments with three real-life datasets and six baseline models. The results show that MMCRR can reduce prediction errors while learning features better from the data. 
    more » « less
  3. Social recommendation has achieved great success in many domains including e-commerce and location-based social networks. Existing methods usually explore the user-item interactions or user-user connections to predict users’ preference behaviors. However, they usually learn both user and item representations in Euclidean space, which has large limitations for exploring the latent hierarchical property in the data. In this article, we study a novel problem of hyperbolic social recommendation, where we aim to learn the compact but strong representations for both users and items. Meanwhile, this work also addresses two critical domain-issues, which are under-explored. First, users often make trade-offs with multiple underlying aspect factors to make decisions during their interactions with items. Second, users generally build connections with others in terms of different aspects, which produces different influences with aspects in social network. To this end, we propose a novel graph neural network (GNN) framework with multiple aspect learning, namely, HyperSoRec. Specifically, we first embed all users, items, and aspects into hyperbolic space with superior representations to ensure their hierarchical properties. Then, we adapt a GNN with novel multi-aspect message-passing-receiving mechanism to capture different influences among users. Next, to characterize the multi-aspect interactions of users on items, we propose an adaptive hyperbolic metric learning method by introducing learnable interactive relations among different aspects. Finally, we utilize the hyperbolic translational distance to measure the plausibility in each user-item pair for recommendation. Experimental results on two public datasets clearly demonstrate that our HyperSoRec not only achieves significant improvement for recommendation performance but also shows better representation ability in hyperbolic space with strong robustness and reliability. 
    more » « less
  4. In the era of big data, online doctor review platforms, which enable patients to give feedback to their doctors, have become one of the most important components in healthcare systems. On one hand, they help patients to choose their doctors based on the experience of others. On the other hand, they help doctors to improve the quality of their service. Moreover, they provide important sources for us to discover common concerns of patients and existing problems in clinics, which potentially improve current healthcare systems. In this paper, we systematically investigate the dataset from one of such review platform, namely, ratemds.com, where each review for a doctor comes with an overall rating and ratings of four different aspects. A comprehensive statistical analysis is conducted first for reviews, ratings, and doctors. Then, we explore the content of reviews by extracting latent topics related to different aspects with unsupervised topic modeling techniques. As the core component of this paper, we propose a multi-task learning framework for the document-level multi-aspect sentiment classification. This task helps us to not only recover missing aspect-level ratings and detect inconsistent rating scores but also identify aspect-keywords for a given review based on ratings. The proposed model takes both features of doctors and aspect-keywords into consideration. Extensive experiments have been conducted on two subsets of ratemds dataset to demonstrate the effectiveness of the proposed model. 
    more » « less
  5. Proceedings of the Sixteenth (Ed.)
    Instead of mining coherent topics from a given text corpus in a completely unsupervised manner, seed-guided topic discovery methods leverage user-provided seed words to extract distinctive and coherent topics so that the mined topics can better cater to the user’s interest. To model the semantic correlation between words and seeds for discovering topic-indicative terms, existing seedguided approaches utilize different types of context signals, such as document-level word co-occurrences, sliding window-based local contexts, and generic linguistic knowledge brought by pre-trained language models. In this work, we analyze and show empirically that each type of context information has its value and limitation in modeling word semantics under seed guidance, but combining three types of contexts (i.e., word embeddings learned from local contexts, pre-trained language model representations obtained from general-domain training, and topic-indicative sentences retrieved based on seed information) allows them to complement each other for discovering quality topics. We propose an iterative framework, SeedTopicMine, which jointly learns from the three types of contexts and gradually fuses their context signals via an ensemble ranking process. Under various sets of seeds and on multiple datasets, SeedTopicMine consistently yields more coherent and accurate topics than existing seed-guided topic discovery approaches. 
    more » « less