Although tailoring items and information to users’ needs has clear benefits, recommender systems have been found to introduce biases that favor popular items, certain categories of items, and dominant user groups. In this study, we aim to characterize the systematic errors of a recommendation system and how they manifest in various accountability issues, such as stereotypes, biases, and miscalibration. We propose a unified framework that decomposes the sources of prediction error into a set of key measures that quantify the various types of system-induced effects, at both the individual and collective levels. Based on our measuring framework, we examine the most widely adopted algorithms in the context of movie recommendation. Our research reveals three important findings: (1) Differences between algorithms: recommendations generated by simpler algorithms tend to be more stereotypical but less biased than those generated by more complex algorithms. (2) Disparate impact on groups and individuals: system-induced biases and stereotypes have a disproportionate effect on atypical users and minority groups (e.g., women and older users). (3) Mitigation opportunity: using structural equation modeling, we identify the interactions between user characteristics (typicality and diversity), system-induced effects, and miscalibration. We further investigate the possibility of mitigating system-induced effects by oversampling underrepresented groups and individuals, which was found to be effective in reducing stereotypes and improving recommendation quality. Our research is the first systematic examination of not only system-induced effects and miscalibration but also the stereotyping issue in recommender systems.
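The abstract above mentions oversampling underrepresented groups but does not describe the procedure. As a rough illustration only, the following Python sketch shows one simple way to balance training interactions across user groups by resampling with replacement. The function name `oversample_groups`, the `(user, item, rating)` tuple layout, and the `user_group` mapping are all hypothetical and not taken from the paper.

```python
import random
from collections import defaultdict


def oversample_groups(interactions, user_group, seed=0):
    """Balance interaction counts across user groups by oversampling.

    interactions: list of (user, item, rating) tuples (assumed layout).
    user_group:   dict mapping each user to a group label (assumed input).
    Returns a new list in which every group contributes as many rows as
    the largest group; extra rows are drawn with replacement.
    """
    rng = random.Random(seed)

    # Bucket the interactions by the group of the user who produced them.
    by_group = defaultdict(list)
    for row in interactions:
        by_group[user_group[row[0]]].append(row)

    # The largest group sets the target size for all groups.
    target = max(len(rows) for rows in by_group.values())

    balanced = []
    for rows in by_group.values():
        balanced.extend(rows)
        # Draw the shortfall with replacement (k may be 0 for the largest group).
        balanced.extend(rng.choices(rows, k=target - len(rows)))
    return balanced
```

Resampling whole interactions is only one option; reweighting the loss per group would be an alternative that avoids duplicating rows.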
Calibration in Collaborative Filtering Recommender Systems: a User-Centered Analysis
Recommender systems learn from past user preferences in order to predict future user interests and provide users with personalized suggestions. Previous research has demonstrated that biases in user profiles in the aggregate can influence the recommendations to users who do not share the majority preference. One consequence of this bias propagation effect is miscalibration, a mismatch between the types or categories of items that a user prefers and the items provided in recommendations. In this paper, we conduct a systematic analysis aimed at identifying key characteristics in user profiles that might lead to miscalibrated recommendations. We consider several categories of profile characteristics, including similarity to the average user, propensity towards popularity, profile diversity, and preference intensity. We develop predictive models of miscalibration and use these models to identify the most important features correlated with miscalibration, given different algorithms and dataset characteristics. Our analysis is intended to help system designers predict miscalibration effects and to develop recommendation algorithms with improved calibration properties.
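The abstract defines miscalibration as a mismatch between the categories a user prefers and the categories that appear in the recommendations, without stating a formula. One common way to quantify such a mismatch is a Kullback-Leibler divergence between the two category distributions; the Python sketch below takes that approach as an assumption. The helper names, the `item_categories` mapping, and the smoothing weight `alpha` are illustrative choices, not details from the paper.

```python
import math
from collections import Counter


def category_distribution(items, item_categories):
    """Return the normalized category distribution of a non-empty item list.

    Each item contributes a total weight of 1, split evenly across its
    categories (item_categories maps item -> list of category labels).
    """
    counts = Counter()
    for item in items:
        cats = item_categories[item]
        for cat in cats:
            counts[cat] += 1.0 / len(cats)
    total = sum(counts.values())
    return {cat: weight / total for cat, weight in counts.items()}


def miscalibration(profile_dist, rec_dist, alpha=0.01):
    """KL divergence from the profile distribution to the recommendations.

    The recommendation distribution is mixed with a small share (alpha)
    of the profile distribution so the logarithm stays finite when a
    preferred category is missing from the recommendations.
    """
    kl = 0.0
    for cat, p in profile_dist.items():
        if p == 0.0:
            continue
        q = (1 - alpha) * rec_dist.get(cat, 0.0) + alpha * p
        kl += p * math.log(p / q)
    return kl
```

Under this sketch, a user whose profile is split evenly between two genres but who receives recommendations from only one of them gets a clearly positive score, while recommendations that match the profile distribution score approximately zero.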
- Award ID(s): 1911025
- PAR ID: 10253140
- Date Published:
- Journal Name: HT '20: Proceedings of the 31st ACM Conference on Hypertext and Social Media
- Page Range / eLocation ID: 197 to 206
- Format(s): Medium: X
- Sponsoring Org: National Science Foundation
More Like this
-
Recently there has been a growing interest in fairness-aware recommender systems, including fairness in providing consistent performance across different users or groups of users. A recommender system could be considered unfair if the recommendations do not fairly represent the tastes of a certain group of users while other groups receive recommendations that are consistent with their preferences. In this paper, we use a metric called miscalibration to measure how responsive a recommendation algorithm is to users’ true preferences, and we consider how various algorithms may result in different degrees of miscalibration for different users. In particular, we conjecture that popularity bias, a well-known phenomenon in recommendation, is one important factor leading to miscalibration. Our experimental results using two real-world datasets show that there is a connection between how different user groups are affected by algorithmic popularity bias and their level of interest in popular items. Moreover, we show that the more a group is affected by the algorithmic popularity bias, the more their recommendations are miscalibrated.
-
Latent factor models have achieved great success in personalized recommendations, but they are also notoriously difficult to explain. In this work, we integrate regression trees to guide the learning of latent factor models for recommendation, and use the learnt tree structure to explain the resulting latent factors. Specifically, we build regression trees on users and items respectively with user-generated reviews, and associate a latent profile with each node of the trees to represent users and items. As the regression trees grow, the latent factors are gradually refined under the regularization imposed by the tree structure. As a result, we are able to track the creation of latent profiles by following the path of each factor through the regression trees, which serves as an explanation for the resulting recommendations. Extensive experiments on two large collections of Amazon and Yelp reviews demonstrate the advantage of our model over several competitive baseline algorithms. In addition, our extensive user study confirms the practical value of the explainable recommendations generated by our model.
-
Explaining automatically generated recommendations allows users to make more informed and accurate decisions about which results to utilize, and therefore improves their satisfaction. In this work, we develop a multi-task learning solution for explainable recommendation. Two companion learning tasks of user preference modeling for recommendation and opinionated content modeling for explanation are integrated via a joint tensor factorization. As a result, the algorithm predicts not only a user's preference over a list of items, i.e., recommendation, but also how the user would appreciate a particular item at the feature level, i.e., opinionated textual explanation. Extensive experiments on two large collections of Amazon and Yelp reviews confirmed the effectiveness of our solution in both recommendation and explanation tasks, compared with several existing recommendation algorithms. And our extensive user study clearly demonstrates the practical value of the explainable recommendations generated by our algorithm.
-
To promote engagement, recommendation algorithms on platforms like YouTube increasingly personalize users’ feeds, limiting users’ exposure to diverse content and depriving them of opportunities to reflect on their interests compared to others’. In this work, we investigate how exchanging recommendations with strangers can help users discover new content and reflect. We tested this idea by developing OtherTube, a browser extension for YouTube that displays strangers’ personalized YouTube recommendations. OtherTube allows users to (i) create an anonymized profile for social comparison, (ii) share their recommended videos with others, and (iii) browse strangers’ YouTube recommendations. We conducted a 10-day-long user study (n = 41) followed by a post-study interview (n = 11). Our results reveal that users discovered and developed new interests from seeing OtherTube recommendations. We identified user and content characteristics that affect interaction and engagement with exchanged recommendations; for example, younger users interacted more with OtherTube, while the perceived irrelevance of some content discouraged users from watching certain videos. Users reflected on their interests as well as others’, recognizing similarities and differences. Our work shows promise for designs leveraging the exchange of personalized recommendations with strangers.