
Title: Candidate Set Sampling for Evaluating Top-N Recommendation
The strategy for selecting candidate sets (the set of items that the recommendation system is expected to rank for each user) is an important decision in carrying out an offline top-N recommender system evaluation. Each candidate set is the union of the user's test items and an arbitrary number of non-relevant items that we refer to as decoys. Previous studies have aimed to understand the effect of different candidate set sizes and selection strategies on evaluation. In this paper, we extend this knowledge by studying the specific interaction of candidate set selection strategies with popularity bias, and we use simulation to assess whether sampled candidate sets yield metric estimates that are less biased with respect to the true metric values under complete data, which is typically unavailable in ordinary experiments.
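As a concrete illustration of the setup, here is a minimal sketch (not the paper's code) of top-N evaluation over sampled candidate sets: each user's candidates are the union of their held-out test items and a fixed number of sampled decoys, and a hit rate is computed over the resulting ranking. All names (score_fn, n_decoys, and the data structures) are illustrative assumptions.

```python
import random

def build_candidate_set(test_items, train_items, all_items, n_decoys, rng):
    """Union of the user's test items and n_decoys sampled decoy items
    (items the user has not interacted with in train or test)."""
    pool = [i for i in all_items if i not in test_items and i not in train_items]
    return list(test_items) + rng.sample(pool, min(n_decoys, len(pool)))

def hit_rate_at_n(score_fn, users, test_sets, train_sets, all_items,
                  n=10, n_decoys=100, seed=0):
    """Fraction of users with at least one test item ranked in the top n
    of their sampled candidate set; score_fn(user, item) is the recommender."""
    rng = random.Random(seed)
    hits = 0
    for u in users:
        candidates = build_candidate_set(test_sets[u], train_sets[u],
                                         all_items, n_decoys, rng)
        ranked = sorted(candidates, key=lambda i: score_fn(u, i), reverse=True)
        hits += any(i in test_sets[u] for i in ranked[:n])
    return hits / len(users)
```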
Award ID(s): 1751278
NSF-PAR ID: 10487293
Author(s) / Creator(s):
Publisher / Repository: IEEE
Date Published:
Page Range / eLocation ID: 88 to 94
Subject(s) / Keyword(s): recommender systems; evaluation
Format(s): Medium: X
Location: Venice, Italy
Sponsoring Org: National Science Foundation
More Like This
  1. Many large-scale recommender systems consist of two stages. The first stage efficiently screens the complete pool of items for a small subset of promising candidates, from which the second-stage model curates the final recommendations. In this paper, we investigate how to ensure group fairness to the items in this two-stage architecture. In particular, we find that existing first-stage recommenders might select an irrecoverably unfair set of candidates such that there is no hope for the second-stage recommender to deliver fair recommendations. To this end, motivated by recent advances in uncertainty quantification, we propose two threshold-policy selection rules that can provide distribution-free and finite-sample guarantees on fairness in first-stage recommenders. More concretely, given any relevance model of queries and items and a point-wise lower confidence bound on the expected number of relevant items for each threshold-policy, the two rules find near-optimal sets of candidates that contain enough relevant items in expectation from each group of items. To instantiate the rules, we demonstrate how to derive such confidence bounds from potentially partial and biased user feedback data, which are abundant in many large-scale recommender systems. In addition, we provide both finite-sample and asymptotic analyses of how close the two threshold selection rules are to the optimal thresholds. Beyond this theoretical analysis, we show empirically that these two rules can consistently select enough relevant items from each group while minimizing the size of the candidate sets for a wide range of settings. 
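To make the flavor of such a rule concrete, the sketch below (an illustration under stated assumptions, not the paper's algorithm) returns the smallest candidate-set size whose lower confidence bound on the expected number of relevant items meets a per-group quota; the bound lcb and the quotas are hypothetical stand-ins for quantities the paper derives from logged feedback.

```python
def smallest_fair_threshold(groups, quotas, lcb, k_max):
    """Return the smallest k in 1..k_max such that, for every group g,
    lcb(g, k) >= quotas[g]; None if no such k exists.

    lcb(g, k): lower confidence bound on the expected number of relevant
    items from group g among the top-k candidates (assumed given).
    """
    for k in range(1, k_max + 1):
        if all(lcb(g, k) >= quotas[g] for g in groups):
            return k
    return None

# Toy usage: a monotone, made-up bound where group "b" accrues relevant
# items half as fast as group "a", with a crude constant slack term.
rates = {"a": 0.30, "b": 0.15}
lcb = lambda g, k: rates[g] * k - 1.0
print(smallest_fair_threshold(["a", "b"], {"a": 2, "b": 1}, lcb, k_max=50))  # 14
```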
  2. Explanations in a recommender system help users make informed decisions among a set of recommended items. Extensive research attention has been devoted to generating natural language explanations that depict how the recommendations are generated and why the users should pay attention to them. However, due to the limitations of existing solutions, e.g., template-based or generation-based, it is hard to make the explanations easily perceivable, reliable, and personalized at the same time. In this work, we develop a graph attentive neural network model that seamlessly integrates users, items, attributes, and sentences for extraction-based explanation. The attributes of items are selected as the intermediary to facilitate message passing for user-item-specific evaluation of sentence relevance. To balance individual sentence relevance, overall attribute coverage, and content redundancy, we solve an integer linear programming problem to make the final selection of sentences. Extensive empirical evaluations against a set of state-of-the-art baseline methods on two benchmark review datasets demonstrate the generation quality of the proposed solution.
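As a toy stand-in for that final selection step (the inputs, weights, and exhaustive search are all assumptions; the paper solves an integer linear program), the sketch below maximizes summed sentence relevance plus attribute coverage minus a redundancy penalty, which coincides with the ILP optimum on an instance this small.

```python
from itertools import combinations

def select_sentences(relevance, attrs, sim, k, cov_w=1.0, red_w=1.0):
    """relevance[i]: user-item-specific relevance of sentence i;
    attrs[i]: set of item attributes sentence i covers;
    sim[i][j]: content similarity (redundancy) between sentences i and j."""
    n = len(relevance)
    best, best_val = (), float("-inf")
    for size in range(1, k + 1):
        for subset in combinations(range(n), size):
            rel = sum(relevance[i] for i in subset)
            cov = len(set().union(*(attrs[i] for i in subset)))
            red = sum(sim[i][j] for i in subset for j in subset if i < j)
            val = rel + cov_w * cov - red_w * red
            if val > best_val:
                best, best_val = subset, val
    return best

# Example: three candidate sentences, pick at most two. The redundant pair
# (0, 1) loses to the diverse pair (1, 2) despite higher raw relevance.
rel = [0.9, 0.8, 0.5]
attrs = [{"battery"}, {"battery", "screen"}, {"price"}]
sim = [[0, 0.7, 0.1], [0.7, 0, 0.2], [0.1, 0.2, 0]]
print(select_sentences(rel, attrs, sim, k=2))  # (1, 2)
```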
  3. Subset selection, which is the task of finding a small subset of representative items from a large ground set, finds numerous applications in different areas. Sequential data, including time-series and ordered data, contain important structural relationships among items, imposed by underlying dynamic models of data, that should play a vital role in the selection of representatives. However, nearly all existing subset selection techniques ignore underlying dynamics of data and treat items independently, leading to incompatible sets of representatives. In this paper, we develop a new framework for sequential subset selection that finds a set of representatives compatible with the dynamic models of data. To do so, we equip items with transition dynamic models and pose the problem as an integer binary optimization over assignments of sequential items to representatives, one that leads to high encoding, diversity, and transition potentials. Our formulation generalizes the well-known facility location objective to deal with sequential data, incorporating transition dynamics among facilities. As the proposed formulation is non-convex, we derive a max-sum message passing algorithm to solve the problem efficiently. Experiments on synthetic and real data, including instructional video summarization, show that our sequential subset selection framework not only achieves better encoding and diversity than the state of the art, but also successfully incorporates dynamics of data, leading to compatible representatives.
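The shape of the objective can be sketched with a small scoring function; the version below is an assumption-laden illustration, not the paper's formulation, combining the three ingredients the abstract names: encoding cost, diversity of the chosen representatives, and transition potentials between consecutive representatives.

```python
def sequential_assignment_score(assign, dist, trans, div_w=1.0, trans_w=1.0):
    """assign[t]: representative chosen for sequential item t;
    dist[t][r]: encoding cost of representing item t by representative r;
    trans[r][s]: log-potential of moving from representative r to s."""
    encoding = -sum(dist[t][assign[t]] for t in range(len(assign)))
    diversity = len(set(assign))  # reward distinct representatives
    dynamics = sum(trans[assign[t - 1]][assign[t]] for t in range(1, len(assign)))
    return encoding + div_w * diversity + trans_w * dynamics

# Tiny example: 3 items, 2 candidate representatives.
dist = [[0.1, 0.9], [0.8, 0.2], [0.7, 0.3]]
trans = [[0.0, -0.5], [-0.5, 0.0]]
# encoding -0.6 + diversity 2.0 + dynamics -0.5 = 0.9
print(sequential_assignment_score([0, 1, 1], dist, trans))
```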
  4. Obeid, Iyad; Selesnick, Ivan; Picone, Joseph (Eds.)
    There has been a lack of standardization in the evaluation of sequential decoding systems in the bioengineering community. Assessment of the accuracy of a candidate system's segmentations and measurement of its false alarm rate are two performance metrics critical to the operational acceptance of a technology. However, measuring such quantities in a consistent manner requires resolving many scoring-software implementation details, and results can be highly sensitive to those details. In this paper, we revisit and evaluate a set of metrics introduced in our open source scoring software for sequential decoding of multichannel signals. This software was used to rank sixteen automatic seizure detection systems recently developed for the 2020 Neureka® Epilepsy Challenge. The systems produced by the participants provided a broad range of design variations that allowed us to assess the consistency of the proposed metrics. We present a comprehensive assessment of four of these new metrics and validate our findings against our previous studies. We also validate a proposed new metric, time-aligned event scoring, which focuses on the segmentation behavior of an algorithm, and we demonstrate how these metrics can provide insight into the performance of a system.
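To illustrate why such implementation details matter, here is a hedged sketch of overlap-based event scoring in the spirit of time-aligned event scoring; it is not the open-source scoring software itself, and the interval representation and crediting scheme are assumptions. Reference events are credited by the fraction of their duration covered by hypothesis events, and hypothesis events that match nothing count as false alarms.

```python
def overlap(a, b):
    """Length of the intersection of two (start, stop) intervals in seconds."""
    return max(0.0, min(a[1], b[1]) - max(a[0], b[0]))

def event_scores(ref_events, hyp_events):
    """Per-reference-event fraction of duration covered by hypothesis
    events, plus a count of hypothesis events matching nothing."""
    coverage = []
    for r in ref_events:
        covered = sum(overlap(r, h) for h in hyp_events)
        coverage.append(min(1.0, covered / (r[1] - r[0])))
    false_alarms = sum(1 for h in hyp_events
                       if all(overlap(r, h) == 0.0 for r in ref_events))
    return coverage, false_alarms

# Example: one seizure half-detected, one spurious detection.
ref = [(10.0, 40.0)]
hyp = [(20.0, 35.0), (100.0, 110.0)]
print(event_scores(ref, hyp))  # ([0.5], 1)
```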