Recommender systems predict users’ preferences over a large number of items by pooling similar information from other users and/or items in the presence of sparse observations. One major challenge is how to utilize user-item specific covariates and networks describing user-item interactions in a high-dimensional situation, for accurate personalized prediction. In this article, we propose a smooth neighborhood recommender in the framework of the latent factor models. A similarity kernel is utilized to borrow neighborhood information from continuous covariates over a user-item specific network, such as a user’s social network, where the grouping information defined by discrete covariates is also integrated through the network. Consequently, user-item specific information is built into the recommender to battle the ‘cold-start” issue in the absence of observations in collaborative and content- based filtering. Moreover, we utilize a “divide-and-conquer” version of the alternating least squares algorithm to achieve scalable computation, and establish asymptotic results for the proposed method, demonstrating that it achieves superior prediction accuracy. Finally, we illustrate that the proposed method improves substantially over its competitors in simulated examples and real benchmark data–Last.fm music data.
more »
« less
HoORaYs: High-order Optimization of Rating Distance for Recommender Systems
Latent factor models have become a prevalent method in recommender systems, to predict users' preference on items based on the historical user feedback. Most of the existing methods, explicitly or implicitly, are built upon the first-order rating distance principle, which aims to minimize the difference between the estimated and real ratings. In this paper, we generalize such first-order rating distance principle and propose a new latent factor model (HoORaYs) for recommender systems. The core idea of the proposed method is to explore high-order rating distance, which aims to minimize not only (i) the difference between the estimated and real ratings of the same (user, item) pair (i.e., the first-order rating distance), but also (ii) the difference between the estimated and real rating difference of the same user across different items (i.e., the second-order rating distance). We formulate it as a regularized optimization problem, and propose an effective and scalable algorithm to solve it. Our analysis from the geometry and Bayesian perspectives indicate that by exploring the high-order rating distance, it helps to reduce the variance of the estimator, which in turns leads to better generalization performance (e.g., smaller prediction error). We evaluate the proposed method on four real-world data sets, two with explicit user feedback and the other two with implicit user feedback. Experimental results show that the proposed method consistently outperforms the state-of-the-art methods in terms of the prediction accuracy.
more »
« less
- PAR ID:
- 10062448
- Date Published:
- Journal Name:
- KDD
- Page Range / eLocation ID:
- 525 to 534
- Format(s):
- Medium: X
- Sponsoring Org:
- National Science Foundation
More Like this
-
-
Recommender systems predict users’ preferences over a large number of items by pooling similar information from other users and/or items in the presence of sparse observations. One major challenge is how to utilize user-item specific covariates and networks describing user-item interactions in a high-dimensional situation, for accurate personalized prediction. In this article, we propose a smooth neighborhood recommender in the framework of the latent factor models. A similarity kernel is utilized to borrow neighborhood information from continuous covariates over a user-item specific network, such as a user’s social network, where the grouping information defined by discrete covariates is also integrated through the network. Consequently, user-item specific information is built into the recommender to battle the ‘cold-start” issue in the absence of observations in collaborative and contentbased filtering. Moreover, we utilize a “divide-and-conquer” version of the alternating least squares algorithm to achieve scalable computation, and establish asymptotic results for the proposed method, demonstrating that it achieves superior prediction accuracy. Finally, we illustrate that the proposed method improves substantially over its competitors in simulated examples and real benchmark data–Last.fm music data.more » « less
-
null (Ed.)Collaborative filtering has been widely used in recommender systems. Existing work has primarily focused on improving the prediction accuracy mainly via either building refined models or incorporating additional side information, yet has largely ignored the inherent distribution of the input rating data. In this paper, we propose a data debugging framework to identify overly personalized ratings whose existence degrades the performance of a given collaborative filtering model. The key idea of the proposed approach is to search for a small set of ratings whose editing (e.g., modification or deletion) would near-optimally improve the recommendation accuracy of a validation set. Experimental results demonstrate that the proposed approach can significantly improve the recommendation accuracy. Furthermore, we observe that the identified ratings significantly deviate from the average ratings of the corresponding items, and the proposed approach tends to modify them towards the average. This result sheds light on the design of future recommender systems in terms of balancing between the overall accuracy and personalization.more » « less
-
Cross-domain collaborative filtering recommenders exploit data from other domains (e.g., movie ratings) to predict users’ interests in a different target domain (e.g., suggest music). Most current cross-domain recommenders focus on modeling user ratings but pay limited attention to user reviews. Additionally, due to the complexity of these recommender systems, they cannot provide any information to users to support user decisions. To address these challenges, we propose Deep Hybrid Cross Domain (DHCD) model, a cross-domain neural framework, that can simultaneously predict user ratings, and provide useful information to strengthen the suggestions and support user decision across multiple domains. Specifically, DHCD enhances the predicted ratings by jointly modeling two crucial facets of users’ product assessment: ratings and reviews. To support decisions, it models and provides natural review-like sentences across domains according to user interests and item features. This model is robust in integrating user rating and review information from more than two domains. Our extensive experiments show that DHCD can significantly outperform advanced baselines in rating predictions and review generation tasks. For rating prediction tasks, it outperforms cross-domain and single-domain collaborative filtering as well as hybrid recommender systems. Furthermore, our review generation experiments suggest an improved perplexity score and transfer of review information in DHCD.more » « less
-
In this paper, we consider the Collaborative Ranking (CR) problem for recommendation systems. Given a set of pairwise preferences between items for each user, collaborative ranking can be used to rank un-rated items for each user, and this ranking can be naturally used for recommendation. It is observed that collaborative ranking algorithms usually achieve better performance since they directly minimize the ranking loss; however, they are rarely used in practice due to the poor scalability. All the existing CR algorithms have time complexity at least O(|Ω|r) per iteration, where r is the target rank and |Ω| is number of pairs which grows quadratically with number of ratings per user. For example, the Netflix data contains totally 20 billion rating pairs, and at this scale all the current algorithms have to work with significant subsampling, resulting in poor prediction on testing data. In this paper, we propose a new collaborative ranking algorithm called Primal-CR that reduces the time complexity toO(|Ω|+d1d2r), where d1 is number of users and d2 is the averaged number of items rated by a user. Note that d1, d2 is strictly smaller and open much smaller than |Ω|. Furthermore, by exploiting the fact that most data is in the form of numerical ratings instead of pairwise comparisons, we propose Primal-CR++ with O(d1d2(r + log d2)) time complexity. Both algorithms have better theoretical time complexity than existing approaches and also outperform existing approaches in terms of NDCG and pairwise error on real data sets. To the best of our knowledge, this is the first collaborative ranking algorithm capable of working on the full Netflix dataset using all the 20 billion rating pairs, and this leads to a model with much better recommendation compared with previous models trained on subsamples. Finally, compared with classical matrix factorization algorithm which also requires O(d1 d2r) time, our algorithm has almost the same efficiency while making much better recommendations since we consider the ranking loss.more » « less
An official website of the United States government

