
This content will become publicly available on April 1, 2023

Title: TRACE: Travel Reinforcement Recommendation Based on Location-Aware Context Extraction
As the popularity of online travel platforms increases, users tend to make ad hoc decisions about places to visit rather than preparing detailed tour plans in advance. Given the timeliness and uncertainty of users’ demands, how to integrate real-time context into dynamic and personalized recommendations has become a key issue in travel recommender systems. In this paper, by integrating users’ historical preferences and real-time context, a location-aware recommender system called TRACE (Travel Reinforcement Recommendations Based on Location-Aware Context Extraction) is proposed. It captures users’ features with a location-aware context learning model and makes dynamic recommendations based on reinforcement learning. Specifically, this research: (1) designs a travel reinforcement recommender system based on an Actor-Critic framework, which can dynamically track shifts in user preferences and optimize the recommender system’s performance; (2) proposes a location-aware context learning model, which extracts user context from real-time location and then calculates the impact of nearby attractions on users’ preferences; and (3) conducts both offline and online experiments. The proposed model achieves the best performance in both experiments, which demonstrates that tracking users’ preference shifts based on real-time location is valuable for improving recommendation results.
Journal Name: ACM Transactions on Knowledge Discovery from Data
Sponsoring Org: National Science Foundation
More Like this
  1. Collaborative bandit learning, i.e., bandit algorithms that utilize collaborative filtering techniques to improve sample efficiency in online interactive recommendation, has attracted much research attention as it enjoys the best of both worlds. However, all existing collaborative bandit learning solutions impose a stationary assumption about the environment, i.e., both user preferences and the dependency among users are assumed static over time. Unfortunately, this assumption hardly holds in practice due to users' ever-changing interests and dependency relations, which inevitably costs a recommender system sub-optimal performance in practice. In this work, we develop a collaborative dynamic bandit solution to handle a changing environment for recommendation. We explicitly model the underlying changes in both user preferences and their dependency relation as a stochastic process. Individual user's preference is modeled by a mixture of globally shared contextual bandit models with a Dirichlet process prior. Collaboration among users is thus achieved via Bayesian inference over the global bandit models. To balance exploitation and exploration during the interactions, Thompson sampling is used for both model selection and arm selection. Our solution is proved to maintain a standard $\tilde O(\sqrt{T})$ Bayesian regret in this challenging environment. Extensive empirical evaluations on both synthetic and real-world datasets further confirmed the necessity of modeling a changing environment and our algorithm's practical advantages against several state-of-the-art online learning solutions.
  2. Context has been recognized as an important factor to consider in personalized recommender systems. Particularly in location-based services (LBSs), a fundamental task is to recommend to a mobile user where he/she could be interested to visit next at the right time. Additionally, location-based social networks (LBSNs) allow users to share location-embedded information with friends who often co-occur in the same or nearby points-of-interest (POIs) or share similar POI visiting histories, due to the social homophily theory and Tobler’s first law of geography. So, both the time information and LBSN friendship relations should be utilized for POI recommendation. Tensor completion has recently gained some attention in time-aware recommender systems. The problem decomposes a user-item-time tensor into low-rank embedding matrices of users, items and times using its observed entries, so that the underlying low-rank subspace structure can be tracked to fill the missing entries for time-aware recommendation. However, these tensor completion methods ignore the social-spatial context information available in LBSNs, which is important for POI recommendation since people tend to share their preferences with their friends, and near things are more related than distant things. In this paper, we utilize the side information of social networks and POI locations to enhance the tensor completion model paradigm for more effective time-aware POI recommendation. Specifically, we propose a regularization loss head based on a novel social Hausdorff distance function to optimize the reconstructed tensor. We also quantify the popularity of different POIs with location entropy to prevent very popular POIs from being over-represented hence suppressing the appearance of other more diverse POIs. To address the sensitivity of negative sampling, we train the model on the whole data by treating all unlabeled entries in the observed tensor as negative, and rewriting the loss function in a smart way to reduce the computational cost. Through extensive experiments on real datasets, we demonstrate the superiority of our model over state-of-the-art tensor completion methods.
  3. Multi-armed bandit algorithms have become a reference solution for handling the explore/exploit dilemma in recommender systems, and many other important real-world problems, such as display advertisement. However, such algorithms usually assume a stationary reward distribution, which hardly holds in practice as users' preferences are dynamic. This inevitably costs a recommender system consistent suboptimal performance. In this paper, we consider the situation where the underlying distribution of reward remains unchanged over (possibly short) epochs and shifts at unknown time instants. In response, we propose a contextual bandit algorithm that detects possible changes of environment based on its reward estimation confidence and updates its arm selection strategy accordingly. Rigorous upper regret bound analysis of the proposed algorithm demonstrates its learning effectiveness in such a non-trivial environment. Extensive empirical evaluations on both synthetic and real-world datasets for recommendation confirm its practical utility in a changing environment.
  4. Conversational recommender systems (CRS) dynamically obtain the users' preferences via multi-turn questions and answers. The existing CRS solutions are widely dominated by deep reinforcement learning algorithms. However, deep reinforcement learning methods are often criticized for lacking interpretability and requiring a large amount of training data to perform. In this paper, we explore a simpler alternative and propose a decision tree based solution to CRS. The underlying challenge in CRS is that the same item can be described differently by different users. We show that decision trees are sufficient to characterize the interactions between users and items, and solve the key challenges in multi-turn CRS: namely which questions to ask, how to rank the candidate items, when to recommend, and how to handle user's negative feedback on the recommendations. Firstly, the training of decision trees enables us to find questions which effectively narrow down the search space. Secondly, by learning embeddings for each item and tree nodes, the candidate items can be ranked based on their similarity to the conversation context encoded by the tree nodes. Thirdly, the diversity of items associated with each tree node allows us to develop an early stopping strategy to decide when to make recommendations. Fourthly, when the user rejects a recommendation, we adaptively choose the next decision tree to improve subsequent questions and recommendations. Extensive experiments on three publicly available benchmark CRS datasets show that our approach provides significant improvement to the state-of-the-art CRS methods.
  5. In this work, we propose to improve long-term user engagement in a recommender system from the perspective of sequential decision optimization, where users' click and return behaviors are directly modeled for online optimization. A bandit-based solution is formulated to balance three competing factors during online learning, including exploitation for immediate click, exploitation for expected future clicks, and exploration of unknowns for model estimation. We rigorously prove that with a high probability our proposed solution achieves a sublinear upper regret bound in maximizing cumulative clicks from a population of users in a given period of time, while a linear regret is inevitable if a user's temporal return behavior is not considered when making the recommendations. Extensive experimentation on both simulations and a large-scale real-world dataset collected from Yahoo frontpage news recommendation log verified the effectiveness and significant improvement of our proposed algorithm compared with several state-of-the-art online learning baselines for recommendation.