
Search: All records; Creators/Authors contains: "Ekstrand, Michael D"


  1. LensKit is an open-source toolkit for building, researching, and learning about recommender systems. First released in 2010 as a Java framework, it has supported diverse published research, small-scale production deployments, and education in both MOOC and traditional classroom settings. In this paper, I present the next generation of the LensKit project, re-envisioning the original tool's objectives as a flexible Python package for supporting recommender systems research and development. LensKit for Python (LKPY) enables researchers and students to build robust, flexible, and reproducible experiments that make use of the large and growing PyData and Scientific Python ecosystem, including scikit-learn and TensorFlow. To that end, it provides classical collaborative filtering implementations, recommender system evaluation metrics, data preparation routines, and tools for efficiently batch-running recommendation algorithms, all usable in any combination with each other or with other Python software. This paper describes the design goals, use cases, and capabilities of LKPY, contextualized in a reflection on the successes and failures of the original LensKit for Java software. (An illustrative usage sketch for this entry appears after the list.)
    Free, publicly-accessible full text available October 19, 2021
  2. We introduce the concept of expected exposure as the average attention ranked items receive from users over repeated samples of the same query. Furthermore, we advocate for the adoption of the principle of equal expected exposure: given a fixed information need, no item should receive more or less expected exposure than any other item of the same relevance grade. We argue that this principle is desirable for many retrieval objectives and scenarios, including topical diversity and fair ranking. Leveraging user models from existing retrieval metrics, we propose a general evaluation methodology based on expected exposure and draw connections to related metrics in information retrieval evaluation. Importantly, this methodology relaxes classic information retrieval assumptions, allowing a system, in response to a query, to produce a distribution over rankings instead of a single fixed ranking. We study the behavior of the expected exposure metric and stochastic rankers across a variety of information access conditions, including ad hoc retrieval and recommendation. We believe that measuring and optimizing expected exposure metrics using randomization opens a new area for retrieval algorithm development and progress. (An illustrative computation for this entry appears after the list.)
    Free, publicly-accessible full text available October 19, 2021
  3. Offline evaluation protocols for recommender systems are intended to estimate users' satisfaction with recommendations using static data from prior user interactions. These evaluations allow researchers and production developers to carry out first-pass estimates of the likely performance of a new system and weed out bad ideas before presenting them to users. However, offline evaluations cannot accurately assess novel, relevant recommendations, because the most novel recommendations are items that were previously unknown to the user; such items are missing from the historical data, so they cannot be judged as relevant. A breakthrough that reliably produces novel, relevant recommendations would score poorly with current offline evaluation techniques. While the existence of this problem is noted in the literature, its extent is not well-understood. We present a simulation study to estimate the error that such missing data causes in commonly-used evaluation metrics in order to assess its prevalence and impact. We find that missing data in the rating or observation process causes the evaluation protocol to systematically mis-estimate metric values, and in some cases erroneously determine that a popularity-based recommender outperforms even a perfect personalized recommender. Substantial breakthroughs in recommendation quality, therefore, will be difficult to assess with existing offline techniques. (A toy simulation for this entry appears after the list.)
    Free, publicly-accessible full text available March 14, 2021
  4. We present StoryTime, a book recommender for children. Our web-based recommender is co-designed with children and uses images to elicit their preferences. By building on existing solutions related to both visual interfaces and book recommendation strategies for children, StoryTime can generate suggestions without historical data or adult guidance. We discuss the benefits of StoryTime as a starting point for further research exploring the cold start problem, incorporating historical data, and needs related to children as a complex audience to enhance the recommendation process.
  5. Traditional offline evaluations of recommender systems apply metrics from machine learning and information retrieval in settings where their underlying assumptions no longer hold. This results in significant error and bias in measures of top-N recommendation performance, such as precision, recall, and nDCG. Several of the specific causes of these errors, including popularity bias and misclassified decoy items, are well-explored in the existing literature. In this paper we survey a range of work on identifying and addressing these problems, and report on our work in progress to simulate the recommender data generation and evaluation processes to quantify the extent of evaluation metric errors and assess their sensitivity to various assumptions. (A small numeric illustration of the decoy effect appears after the list.)
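
For entry 1, a minimal sketch of an LKPY-style experiment: cross-validating an item-item collaborative filter on a ratings frame and scoring the resulting top-N lists with nDCG. Module paths and call signatures vary across LensKit releases, so the names below (item_knn.ItemItem, Recommender.adapt, batch.recommend, topn.RecListAnalysis) should be read as illustrative of the 0.x-era API rather than exact, and the ratings file is a placeholder.

    # Sketch only: LKPY's API has changed across releases, so treat module
    # paths and signatures as approximate.
    import pandas as pd
    from lenskit import batch, topn
    from lenskit import crossfold as xf
    from lenskit.algorithms import Recommender, item_knn

    # Ratings as a DataFrame with 'user', 'item', and 'rating' columns
    # ('ratings.csv' is a placeholder path).
    ratings = pd.read_csv('ratings.csv')

    all_recs, all_test = [], []
    # 5-fold cross-validation by user, holding out 20% of each test user's ratings.
    for train, test in xf.partition_users(ratings, 5, xf.SampleFrac(0.2)):
        algo = Recommender.adapt(item_knn.ItemItem(20))    # top-N wrapper around item k-NN
        algo.fit(train)
        users = test['user'].unique()
        all_recs.append(batch.recommend(algo, users, 10))  # 10 recommendations per user
        all_test.append(test)

    recs = pd.concat(all_recs, ignore_index=True)
    truth = pd.concat(all_test, ignore_index=True)

    # Score the recommendation lists against the held-out ratings.
    rla = topn.RecListAnalysis()
    rla.add_metric(topn.ndcg)
    print(rla.compute(recs, truth)['ndcg'].mean())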
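
For entry 2, a small illustrative computation of expected exposure for a stochastic ranker. The browsing model (a geometric, RBP-style position decay with an assumed patience parameter) and the equal-exposure target construction are simplifications chosen for this sketch, not the paper's exact formulation; all names and values are made up for illustration.

    import numpy as np

    GAMMA = 0.8  # assumed probability that a user continues past each rank

    def position_exposure(n_items):
        # Exposure a user gives to each rank under a geometric browsing model.
        return GAMMA ** np.arange(n_items)

    def expected_exposure(rankings, n_items):
        # Average per-item exposure over repeated sampled rankings for one query.
        exposure = np.zeros(n_items)
        weights = position_exposure(n_items)
        for ranking in rankings:  # each ranking is a permutation of item indices
            exposure[ranking] += weights
        return exposure / len(rankings)

    def equal_exposure_target(relevance):
        # Target where items of the same relevance grade share exposure equally:
        # order items by relevance, then spread each grade's positional exposure
        # evenly over that grade's items.
        weights = position_exposure(len(relevance))
        order = np.argsort(-relevance)
        target = np.zeros(len(relevance), dtype=float)
        for grade in np.unique(relevance):
            positions = np.where(relevance[order] == grade)[0]
            target[relevance == grade] = weights[positions].mean()
        return target

    rng = np.random.default_rng(0)
    relevance = np.array([2, 2, 1, 0, 0])

    # A stochastic ranker that shuffles items within each relevance grade.
    samples = []
    for _ in range(1000):
        samples.append(np.concatenate([
            rng.permutation(np.where(relevance == g)[0])
            for g in sorted(np.unique(relevance), reverse=True)]))

    print("expected exposure:    ", np.round(expected_exposure(samples, len(relevance)), 3))
    print("equal-exposure target:", np.round(equal_exposure_target(relevance), 3))

Because this ranker randomizes within grades, its expected exposure approximately matches the equal-exposure target; a single deterministic ranking of the same items would give equally relevant items unequal exposure.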
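
For entry 3, a toy simulation of how missing observations distort offline top-N metrics. The data-generating process (random binary relevance plus a popularity-biased observation step) is an assumption made for this sketch and is far simpler than the paper's simulation; the point is only that recall measured on the observed test data can differ systematically from recall under full knowledge, and by different amounts for different recommenders.

    import numpy as np

    rng = np.random.default_rng(42)
    n_users, n_items, k = 500, 200, 10

    # True (unobservable) relevance: each user likes a random ~5% of items.
    true_rel = rng.random((n_users, n_items)) < 0.05

    # Observation process: a liked item reaches the test set with probability
    # proportional to item popularity, so unpopular relevant items go missing.
    popularity = rng.power(0.3, n_items)  # skewed popularity scores in (0, 1]
    observed = true_rel & (rng.random((n_users, n_items)) < popularity)

    def recall_at_k(scores, judgments, k):
        # Mean per-user recall of the top-k items against a 0/1 judgment matrix,
        # computed only over users who have at least one judged-relevant item.
        topk = np.argsort(-scores, axis=1)[:, :k]
        hits = np.take_along_axis(judgments, topk, axis=1).sum(axis=1)
        totals = judgments.sum(axis=1)
        have = totals > 0
        return (hits[have] / totals[have]).mean()

    # A 'perfect' personalized recommender scores by true relevance (tiny noise
    # breaks ties); a popularity recommender gives every user the same ranking.
    perfect = true_rel + 1e-3 * rng.random((n_users, n_items))
    popular = np.tile(popularity, (n_users, 1))

    for name, scores in [("perfect", perfect), ("popularity", popular)]:
        print(f"{name:10s} recall@{k}: "
              f"measured on observed data = {recall_at_k(scores, observed, k):.3f}, "
              f"under full knowledge = {recall_at_k(scores, true_rel, k):.3f}")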
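
For entry 5, a tiny numeric illustration of a misclassified decoy: a recommended item the user would actually like, but for which the offline test data holds no rating, gets counted as a miss and deflates precision@k. All item names and judgments here are hypothetical.

    # Hypothetical data: the test set knows about A and B; the user would also like E.
    rated_relevant = {"A", "B"}        # relevant items visible to the evaluator
    truly_relevant = {"A", "B", "E"}   # ground truth the evaluator cannot see

    recs = ["A", "E", "C", "B", "D"]   # a top-5 recommendation list
    k = 5

    measured = sum(item in rated_relevant for item in recs[:k]) / k
    actual = sum(item in truly_relevant for item in recs[:k]) / k
    print(f"measured precision@{k}: {measured:.1f}")  # 0.4 -- E counted as a miss
    print(f"actual precision@{k}:   {actual:.1f}")    # 0.6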