Note: When clicking on a Digital Object Identifier (DOI) number, you will be taken to an external site maintained by the publisher.
Some full text articles may not yet be available without a charge during the embargo (administrative interval).
What is a DOI Number?
Some links on this page may take you to non-federal websites. Their policies may differ from this site.
-
Information access systems, such as search and recommender systems, often use ranked lists to present results believed to be relevant to the user’s information need. Evaluating these lists for their fairness along with other traditional metrics provide a more complete understanding of an information access system’s behavior beyond accuracy or utility constructs. To measure the (un)fairness of rankings, particularly with respect to protected group(s) of producers or providers, several metrics have been proposed in the last several years. However, an empirical and comparative analyses of these metrics showing the applicability to specific scenario or real data, conceptual similarities, and differencesmore »Free, publicly-accessible full text available July 11, 2023
-
Free, publicly-accessible full text available June 1, 2023
-
Music is an important part of childhood development, with online music listening platforms being a significant channel by which children consume music. Children’s offline music listening behavior has been heavily researched, yet relatively few studies explore how their behavior manifests online. In this paper, we use data from LastFM 1 Billion and the Spotify API to explore online music listening behavior of children, ages 6–17, using education levels as lenses for our analysis. Understanding the music listening behavior of children can be used to inform the future design of recommender systems.Free, publicly-accessible full text available September 20, 2022
-
There is increasing attention to evaluating the fairness of search system ranking decisions. These metrics often consider the membership of items to particular groups, often identified using protected attributes such as gender or ethnicity. To date, these metrics typically assume the availability and completeness of protected attribute labels of items. However, the protected attributes of individuals are rarely present, limiting the application of fair ranking metrics in large scale systems. In order to address this problem, we propose a sampling strategy and estimation technique for four fair ranking metrics. We formulate a robust and unbiased estimator which can operate evenmore »
-
Collaborative filtering algorithms find useful patterns in rating and consumption data and exploit these patterns to guide users to good items. Many of these patterns reflect important real-world phenomena driving interactions between the various users and items; other patterns may be irrelevant or reflect undesired discrimination, such as discrimination in publishing or purchasing against authors who are women or ethnic minorities. In this work, we examine the response of collaborative filtering recommender algorithms to the distribution of their input data with respect to one dimension of social concern, namely content creator gender. Using publicly available book ratings data, we measuremore »
-
We introduce the concept of \emph{expected exposure} as the average attention ranked items receive from users over repeated samples of the same query. Furthermore, we advocate for the adoption of the principle of equal expected exposure: given a fixed information need, no item should receive more or less expected exposure than any other item of the same relevance grade. We argue that this principle is desirable for many retrieval objectives and scenarios, including topical diversity and fair ranking. Leveraging user models from existing retrieval metrics, we propose a general evaluation methodology based on expected exposure and draw connections to relatedmore »
-
LensKit is an open-source toolkit for building, researching, and learning about recommender systems. First released in 2010 as a Java framework, it has supported diverse published research, small-scale production deployments, and education in both MOOC and traditional classroom settings. In this paper, I present the next generation of the LensKit project, re-envisioning the original tool's objectives as flexible Python package for supporting recommender systems research and development. LensKit for Python (LKPY) enables researchers and students to build robust, flexible, and reproducible experiments that make use of the large and growing PyData and Scientific Python ecosystem, including scikit-learn, and TensorFlow. Tomore »
-
Offline evaluation protocols for recommender systems are intended to estimate users' satisfaction with recommendations using static data from prior user interactions. These evaluations allow researchers and production developers to carry out first-pass estimates of the likely performance of a new system and weed out bad ideas before presenting them to users. However, offline evaluations cannot accurately assess novel, relevant recommendations, because the most novel recommendations items that were previously unknown to the user; such items are missing from the historical data, so they cannot be judged as relevant. A breakthrough that reliably produces novel, relevant recommendations would score poorly withmore »
-
We present StoryTime, a book recommender for children. Our web-based recommender is co-designed with children and uses images to elicit their preferences. By building on existing solutions related to both visual interfaces and book recommendation strategies for children, StoryTime can generate suggestions without historical data or adult guidance. We discuss the benefits of StoryTime as a starting point for further research exploring the cold start problem, incorporating historical data, and needs related to children as a complex audience to enhance the recommendation process.
-
Traditional offline evaluations of recommender systems apply metrics from machine learning and information retrieval in settings where their underlying assumptions no longer hold. This results in significant error and bias in measures of top-N recommendation performance, such as precision, recall, and nDCG. Several of the specific causes of these errors, including popularity bias and misclassified decoy items, are well-explored in the existing literature. In this paper we survey a range of work on identifying and addressing these problems, and report on our work in progress to simulate the recommender data generation and evaluation processes to quantify the extent of evaluationmore »