Benchmark data repositories for better benchmarking

Longjohn, Rachel; Kelly, Markelle; Singh, Sameer; Smyth, Padhraic

This content will become publicly available on June 5, 2026

In machine learning research, it is common to evaluate algorithms via their performance on standard benchmark datasets. While a growing body of work establishes guidelines for, and levies criticisms at, data and benchmarking practices in machine learning, comparatively less attention has been paid to the data repositories where these datasets are stored, documented, and shared. In this paper, we analyze the landscape of these benchmark data repositories and the role they can play in improving benchmarking. This role includes addressing issues with both datasets themselves (e.g., representational harms, construct validity) and the manner in which evaluation is carried out using such datasets (e.g., overemphasis on a few datasets and metrics, lack of reproducibility). To this end, we identify and discuss a set of considerations surrounding the design and use of benchmark data repositories, with a focus on improving benchmarking practices in machine learning.

Award ID(s): 2046873; 1925741

PAR ID: 10635518

Author(s) / Creator(s): Longjohn, Rachel; Kelly, Markelle; Singh, Sameer; Smyth, Padhraic

Publisher / Repository: Neural Information Processing Systems (NeurIPS)

Date Published: 2025-06-05

Page Range / eLocation ID: 86435 - 86457

Format(s): Medium: X

Sponsoring Org: National Science Foundation

Conference Paper: The DOI is not currently available.
