Title: REVEAL 2020: Bandit and Reinforcement Learning from User Interactions
The REVEAL workshop focuses on framing the recommendation problem as one of making personalized interventions, e.g., deciding to recommend a particular item to a particular user. Moreover, these interventions sometimes depend on each other: a stream of interactions occurs between the user and the system, and each decision to recommend something affects future steps and long-term rewards. This framing creates a number of challenges that we will discuss at the workshop. How can recommender systems be evaluated offline in such a context? How can we learn recommendation policies that are aware of these delayed consequences and outcomes?
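As an illustration of the offline-evaluation question, one widely used starting point is inverse propensity scoring (IPS), which reweights logged rewards by how much more (or less) likely the policy under evaluation is to make the logged recommendation. The abstract does not prescribe this estimator; the sketch below is a minimal single-step (bandit) version, and the log format, function names, and toy data are all hypothetical. It does not address the sequential, delayed-reward setting the workshop emphasizes.

```python
from typing import Callable, Sequence, Tuple

# Assumed log format (hypothetical): (context, recommended item, observed
# reward, probability with which the logging policy chose that item).
LoggedStep = Tuple[str, str, float, float]


def ips_estimate(
    logs: Sequence[LoggedStep],
    target_prob: Callable[[str, str], float],
) -> float:
    """Estimate the average reward of a target policy from logged data."""
    if not logs:
        raise ValueError("need at least one logged interaction")
    total = 0.0
    for context, item, reward, logging_prob in logs:
        if logging_prob <= 0.0:
            raise ValueError("logging propensity must be positive")
        # Importance weight: target policy probability over logging probability.
        weight = target_prob(context, item) / logging_prob
        total += weight * reward
    return total / len(logs)


# Toy log: the logging policy chose uniformly between two items.
toy_logs = [
    ("user_a", "item_1", 1.0, 0.5),
    ("user_a", "item_2", 0.0, 0.5),
    ("user_b", "item_1", 0.0, 0.5),
    ("user_b", "item_2", 1.0, 0.5),
]


def always_item_1(context: str, item: str) -> float:
    """Hypothetical target policy that always recommends item_1."""
    return 1.0 if item == "item_1" else 0.0


estimate = ips_estimate(toy_logs, always_item_1)
print(estimate)
```

On this toy log the estimate matches the average reward that item_1 actually earned across the two users. In practice the variance of IPS grows quickly when the logging and target policies disagree, which is one reason the offline-evaluation question remains open.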
Award ID(s):
1901168
NSF-PAR ID:
10309946
Journal Name:
ACM Conference on Recommender Systems
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. Background: Shared decision making requires evidence to be conveyed to the patient in a way they can easily understand and compare. Patient decision aids facilitate this process. This article reviews the current evidence for how to present numerical probabilities within patient decision aids. Methods: Following the 2013 review method, we assembled a group of 9 international experts on risk communication across Australia, Germany, the Netherlands, the United Kingdom, and the United States. We expanded the topics covered in the first review to reflect emerging areas of research. Groups of 2 to 3 authors reviewed the relevant literature based on their expertise and wrote each section before review by the full authorship team. Results: Of 10 topics identified, we present 5 fundamental issues in this article. Although some topics resulted in clear guidance (presenting the chance an event will occur, addressing numerical skills), other topics (context/evaluative labels, conveying uncertainty, risk over time) continue to have evolving knowledge bases. We recommend presenting numbers over a set time period with a clear denominator, using consistent formats between outcomes and interventions to enable unbiased comparisons, and interpreting the numbers for the reader to meet the needs of varying numeracy. Discussion: Understanding how different numerical formats can bias risk perception will help decision aid developers communicate risks in a balanced, comprehensible manner and avoid accidental “nudging” toward a particular option. Decisions between probability formats need to consider the available evidence and user skills. The review may be useful for other areas of science communication in which unbiased presentation of probabilities is important.
  2. The main objective of Personalized Tour Recommendation (PTR) is to generate a sequence of points of interest (POIs) for a particular tourist according to user-specific constraints such as duration, start and end points, the number of attractions to visit, and so on. Previous PTR solutions are based either on heuristics that solve the orienteering problem to maximize a global reward within a specified budget, or on approaches that attempt to learn user visiting preferences and transition patterns with stochastic processes or recurrent neural networks. However, existing learning methodologies rely on historical trips to train the model and use the next visited POI as the supervision signal, which may not fully capture the coherence of preferences and thus recommend similar trips to different users, primarily due to data sparsity and the long-tailed distribution of POI popularity. This work presents a novel tour recommendation model that distills knowledge and supervision signals from the trips in a self-supervised manner. We propose Contrastive Trajectory Learning for Tour Recommendation (CTLTR), which utilizes the intrinsic POI dependencies and traveling intent to discover extra knowledge and augments the sparse data by pre-training on auxiliary self-supervised objectives. CTLTR provides a principled way to characterize the inherent data correlations while tackling the implicit feedback and weak supervision problems by learning robust representations applicable to tour planning. We introduce a hierarchical recurrent encoder-decoder to identify tourists’ intentions and use a contrastive loss to discover subsequence semantics and their sequential patterns by maximizing mutual information. Additionally, we observe that a data augmentation step preceding contrastive learning can solve the overfitting issue resulting from data sparsity. We conduct extensive experiments on a range of real-world datasets and demonstrate that our model can significantly improve recommendation performance over state-of-the-art baselines in terms of both recommendation accuracy and visiting order.
  3. Conversational recommender systems (CRS) dynamically obtain users' preferences via multi-turn questions and answers. Existing CRS solutions are dominated by deep reinforcement learning algorithms. However, deep reinforcement learning methods are often criticized for lacking interpretability and requiring a large amount of training data to perform well. In this paper, we explore a simpler alternative and propose a decision-tree-based solution to CRS. The underlying challenge in CRS is that the same item can be described differently by different users. We show that decision trees are sufficient to characterize the interactions between users and items and to solve the key challenges in multi-turn CRS: namely, which questions to ask, how to rank the candidate items, when to recommend, and how to handle users' negative feedback on the recommendations. Firstly, the training of decision trees enables us to find questions that effectively narrow down the search space. Secondly, by learning embeddings for each item and tree node, the candidate items can be ranked based on their similarity to the conversation context encoded by the tree nodes. Thirdly, the diversity of items associated with each tree node allows us to develop an early stopping strategy to decide when to make recommendations. Fourthly, when the user rejects a recommendation, we adaptively choose the next decision tree to improve subsequent questions and recommendations. Extensive experiments on three publicly available benchmark CRS datasets show that our approach provides significant improvement over state-of-the-art CRS methods.
  4. Context has been recognized as an important factor to consider in personalized recommender systems. Particularly in location-based services (LBSs), a fundamental task is to recommend to a mobile user, at the right time, where they might be interested in visiting next. Additionally, location-based social networks (LBSNs) allow users to share location-embedded information with friends who often co-occur in the same or nearby points of interest (POIs) or share similar POI visiting histories, in line with the social homophily theory and Tobler’s first law of geography. Both the time information and LBSN friendship relations should therefore be utilized for POI recommendation. Tensor completion has recently gained some attention in time-aware recommender systems. The method decomposes a user-item-time tensor into low-rank embedding matrices of users, items, and times using its observed entries, so that the underlying low-rank subspace structure can be tracked to fill the missing entries for time-aware recommendation. However, these tensor completion methods ignore the social-spatial context information available in LBSNs, which is important for POI recommendation since people tend to share their preferences with their friends, and near things are more related than distant things. In this paper, we utilize the side information of social networks and POI locations to enhance the tensor completion model paradigm for more effective time-aware POI recommendation. Specifically, we propose a regularization loss head based on a novel social Hausdorff distance function to optimize the reconstructed tensor. We also quantify the popularity of different POIs with location entropy to prevent very popular POIs from being over-represented and thereby suppressing the appearance of other, more diverse POIs. To address the sensitivity of negative sampling, we train the model on the whole data by treating all unlabeled entries in the observed tensor as negative and rewriting the loss function to reduce the computational cost. Through extensive experiments on real datasets, we demonstrate the superiority of our model over state-of-the-art tensor completion methods.
  5. Most commercial music services rely on collaborative filtering to recommend artists and songs. While this method is effective for popular artists with large fanbases, it can present difficulties for recommending novel, lesser-known artists due to a relative lack of user preference data. In this paper, we therefore seek to understand how content-based approaches can be used to more effectively recommend songs from these lesser-known artists. Specifically, we conduct a user study to answer three questions. Firstly, do most users agree on which songs are most acoustically similar? Secondly, is acoustic similarity a good proxy for how an individual might construct a playlist or recommend music to a friend? Thirdly, if so, can we find acoustic features that are related to human judgments of acoustic similarity? To answer these questions, our study asked 117 test subjects to compare two unknown candidate songs relative to a third known reference song. Our findings show that 1) judgments about acoustic similarity are fairly consistent, 2) acoustic similarity is highly correlated with playlist selection and recommendation, but not necessarily personal preference, and 3) a subset of acoustic features from the Spotify Web API is particularly predictive of human similarity judgments.
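Item 4 above mentions quantifying POI popularity with location entropy. The abstract does not give the exact formulation used, so the following is only a minimal sketch of the commonly used definition: the Shannon entropy of the distribution of visits to a POI across users. The check-in format, function name, and toy data are assumptions for illustration.

```python
import math
from collections import Counter
from typing import Dict, Iterable, Tuple


def location_entropy(checkins: Iterable[Tuple[str, str]]) -> Dict[str, float]:
    """Map each POI to the entropy of its visits across users.

    `checkins` is an iterable of (user, poi) pairs (assumed format).
    A POI visited evenly by many users scores high; a POI visited by
    a single user scores zero.
    """
    visits_per_poi: Dict[str, Counter] = {}
    for user, poi in checkins:
        visits_per_poi.setdefault(poi, Counter())[user] += 1

    entropies: Dict[str, float] = {}
    for poi, counts in visits_per_poi.items():
        total = sum(counts.values())
        # Shannon entropy (natural log) of the per-user visit shares.
        entropies[poi] = -sum(
            (c / total) * math.log(c / total) for c in counts.values()
        )
    return entropies


# Toy data: a cafe visited once by four users, a home visited by one user.
toy_checkins = [
    ("u1", "cafe"), ("u2", "cafe"), ("u3", "cafe"), ("u4", "cafe"),
    ("u1", "home"), ("u1", "home"), ("u1", "home"),
]
ent = location_entropy(toy_checkins)
print(ent)
```

A model could then down-weight high-entropy POIs so that broadly popular places do not crowd out more diverse ones; how the paper in item 4 incorporates the score into its loss is not stated in the abstract.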