Title: Uncertainty Quantification for Fairness in Two-Stage Recommender Systems
Many large-scale recommender systems consist of two stages. The first stage efficiently screens the complete pool of items for a small subset of promising candidates, from which the second-stage model curates the final recommendations. In this paper, we investigate how to ensure group fairness to the items in this two-stage architecture. In particular, we find that existing first-stage recommenders might select an irrecoverably unfair set of candidates such that there is no hope for the second-stage recommender to deliver fair recommendations. To this end, motivated by recent advances in uncertainty quantification, we propose two threshold-policy selection rules that can provide distribution-free and finite-sample guarantees on fairness in first-stage recommenders. More concretely, given any relevance model of queries and items and a point-wise lower confidence bound on the expected number of relevant items for each threshold-policy, the two rules find near-optimal sets of candidates that contain enough relevant items in expectation from each group of items. To instantiate the rules, we demonstrate how to derive such confidence bounds from potentially partial and biased user feedback data, which are abundant in many large-scale recommender systems. In addition, we provide both finite-sample and asymptotic analyses of how close the two threshold selection rules are to the optimal thresholds. Beyond this theoretical analysis, we show empirically that these two rules can consistently select enough relevant items from each group while minimizing the size of the candidate sets for a wide range of settings.
Award ID(s):
2008139
NSF-PAR ID:
10466320
Journal Name:
ACM Conference on Web Search and Data Mining (WSDM)
Page Range / eLocation ID:
940 to 948
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
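
The threshold selection described in the abstract above can be illustrated with a minimal sketch. It is not the paper's two rules: it assumes fully observed relevance labels on a calibration set (the paper handles partial and biased feedback), swaps in a plain Hoeffding bound with a union bound as the lower confidence bound, and uses hypothetical names throughout (`hoeffding_lcb`, `select_threshold`, `max_count`). It picks the largest score threshold, and hence the smallest candidate set, whose lower confidence bound on the expected number of relevant items reaches a target for every item group.

```python
import math
from typing import List, Optional, Sequence, Tuple

# One calibration query: (relevance-model score, item group, is_relevant) per item.
# Assumption: relevance labels are fully observed, unlike the feedback data in the paper.
Query = List[Tuple[float, str, bool]]


def hoeffding_lcb(values: Sequence[float], upper: float, delta: float) -> float:
    """Hoeffding lower confidence bound on the mean of i.i.d. values in [0, upper]."""
    n = len(values)
    mean = sum(values) / n
    return mean - upper * math.sqrt(math.log(1.0 / delta) / (2.0 * n))


def select_threshold(
    queries: Sequence[Query],
    thresholds: Sequence[float],
    groups: Sequence[str],
    target: float,
    max_count: float,
    delta: float,
) -> Optional[float]:
    """Return the largest threshold whose bound meets `target` for every group.

    A threshold policy keeps every item with score >= threshold, so a larger
    threshold means a smaller candidate set. `max_count` is an assumed upper
    limit on the number of relevant items one group can have in one query.
    """
    # Union bound so that all (threshold, group) bounds hold simultaneously.
    per_test_delta = delta / (len(thresholds) * len(groups))
    for t in sorted(thresholds, reverse=True):
        feasible = True
        for g in groups:
            # Number of relevant items from group g that each query's candidate set keeps.
            counts = [
                sum(1 for score, grp, rel in q if grp == g and rel and score >= t)
                for q in queries
            ]
            if hoeffding_lcb(counts, max_count, per_test_delta) < target:
                feasible = False
                break
        if feasible:
            return t
    return None  # No threshold on the grid can be certified at this confidence level.
```

Calling `select_threshold(queries, thresholds, groups, target=1.0, max_count=3.0, delta=0.1)` returns a threshold only when every group's bound clears the target; returning `None` rather than falling back to the loosest threshold keeps the sketch from certifying a candidate set the calibration data cannot support.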
More Like this
  1. Recently there has been a growing interest in fairness-aware recommender systems, including fairness in providing consistent performance across different users or groups of users. A recommender system could be considered unfair if the recommendations do not fairly represent the tastes of a certain group of users while other groups receive recommendations that are consistent with their preferences. In this paper, we use a metric called miscalibration for measuring how responsive a recommendation algorithm is to users’ true preferences, and we consider how various algorithms may result in different degrees of miscalibration for different users. In particular, we conjecture that popularity bias, which is a well-known phenomenon in recommendation, is one important factor leading to miscalibration in recommendation. Our experimental results using two real-world datasets show that there is a connection between how different user groups are affected by algorithmic popularity bias and their level of interest in popular items. Moreover, we show that the more a group is affected by the algorithmic popularity bias, the more their recommendations are miscalibrated.
  2. The strategy for selecting candidate sets — the set of items that the recommendation system is expected to rank for each user — is an important decision in carrying out an offline top-N recommender system evaluation. The set of candidates is composed of the union of the user’s test items and an arbitrary number of non-relevant items that we refer to as decoys. Previous studies have aimed to understand the effect of different candidate set sizes and selection strategies on evaluation. In this paper, we extend this knowledge by studying the specific interaction of candidate set selection strategies with popularity bias, and use simulation to assess whether sampled candidate sets result in metric estimates that are less biased with respect to the true metric values under complete data that is typically unavailable in ordinary experiments. 
  3. Effects of High Impact Educational Practices on Engineering and Computer Science Student Participation, Persistence, and Success at Land Grant Universities: Award# RIEF-1927218 – Year 2 Abstract Funded by the National Science Foundation (NSF), this project aims to investigate and identify associations (if any) that exist between student participation in High Impact Educational Practices (HIP) and their educational outcomes in undergraduate engineering and computer science (E/CS) programs. To understand the effects of HIP participation among E/CS students from groups historically underrepresented and underserved in E/CS, this study takes place within the rural, public university context at two western land grant institutions (one of which is a Hispanic-serving institution). Conceptualizing diversity broadly, this study considers gender, race and ethnicity, and first-generation, transfer, and nontraditional student status to be facets of identity that contribute to the diversity of academic programs and the technical workforce. This sequential, explanatory, mixed-methods study is guided by the following research questions: 1. To what extent do E/CS students participate in HIP? 2. What relationships (if any) exist between E/CS student participation in HIP and their educational outcomes (i.e., persistence in major, academic performance, and graduation)? 3. How do contextual factors (e.g., institutional, programmatic, personal, social, financial, etc.) affect E/CS student awareness of, interest in, and participation in HIP? During Project Year 1, a survey-driven quantitative study was conducted. A survey informed by results of the National Survey of Student Engagement (NSSE) from each institution was developed and deployed. Survey respondents (N = 531) were students enrolled in undergraduate E/CS programs at either institution.
Frequency distribution analyses were conducted to assess the respondents’ level of participation in extracurricular HIPs (i.e., global learning and study abroad, internships, learning communities, service and community-based learning, and undergraduate research) that have been shown in the literature to positively impact undergraduate student success. Further statistical analysis was conducted to understand the effects of HIP participation, coursework enjoyability, and confidence in completing a degree on the academic success of underrepresented and nontraditional E/CS students. Exploratory factor analysis was used to derive an "academic success" variable from five items that sought to measure how students persevere to attain academic goals. Results showed that a linear relationship in the target population exists and that the resultant multiple regression model is a good fit for the data. During Project Year 2, survey results were used to develop focus group interview protocols and guide the purposive selection of focus group participants. Focus group interviews were conducted with a total of 27 undergraduates (12 males and 15 females; 16 engineering students and 11 computer science students) across both institutions via video conferencing (i.e., Zoom) during the spring and fall 2021 semesters. Currently, verified focus group transcripts are being systematically analyzed and coded by a team of four trained coders to identify themes and answer the research questions. This paper will provide an overview of the preliminary themes identified so far. Future project activities during Project Year 3 will focus on refining themes identified during the focus group transcript analysis. Survey and focus group data will then be combined to develop deeper understandings of why and how E/CS students participate in HIP at their university, taking into account the institutional and programmatic contexts at each institution.
Ultimately, the project will develop and disseminate recommendations for improving diverse E/CS student awareness of, interest in, and participation in HIP at similar land grant institutions nationally.
  4. A large number of two-sided markets are now mediated by search and recommender systems, ranging from online retail and streaming entertainment to employment and romantic-partner matching. I will discuss in this talk how the design decisions that go into these search and recommender systems carry substantial power in shaping markets and allocating opportunity to the participants. This raises not only legal and fairness questions, but also questions about how these systems shape incentives and the long-term effectiveness of the market. At the core of these questions lies the problem of where to rank each item, and how this affects both sides of the market. While it is well understood how to maximize the utility to the users, this talk focuses on how rankings affect the items that are being ranked. From the items' perspective, the ranking system is an arbiter of exposure and thus economic opportunity. I will discuss how machine learning algorithms that follow the conventional Probability Ranking Principle [1] can lead to undesirable and unfair exposure allocation for both exogenous and endogenous reasons. Exogenous reasons often manifest themselves as biases in the training data, which then get reflected in the learned ranking policy. But even when trained with unbiased data, reasons endogenous to the system can lead to unfair or undesirable allocation of opportunity. To overcome these challenges, I will present new machine learning algorithms [2,3,4] that directly address both endogenous and exogenous factors, allowing the designer to tailor the ranking policy to be appropriate for the specific two-sided market.
  5. While the algorithms used by music streaming services to provide recommendations have often been studied in offline, isolated settings, little research has been conducted studying the nature of their recommendations within the full context of the system itself. This work seeks to compare the level of diversity of the real-world recommendations provided by five of the most popular music streaming services, given the same lists of low-, medium- and high-diversity input items. We contextualized our results by examining the reviews for each of the five services on the Google Play Store, focusing on users’ perception of their recommender systems and the diversity of their output. We found that YouTube Music offered the most diverse recommendations, but the perception of the recommenders was similar across the five services. Consumers had multiple perspectives on the recommendations provided by their music service—ranging from not wanting any recommendations to applauding the algorithm for helping them find new music. 