Inference at Scale: Significance Testing for Large Search and Recommendation Experiments

Ihemelandu, Ngozi; Ekstrand, Michael D.

This content will become publicly available on July 23, 2024

A number of information retrieval studies have assessed which statistical techniques are appropriate for comparing systems. However, these studies focus on TREC-style experiments, which typically have fewer than 100 topics. There is no similar line of work for large search and recommendation experiments; such studies typically have thousands of topics or users and much sparser relevance judgements, so it is not clear whether recommendations for analyzing traditional TREC experiments apply to these settings. In this paper, we empirically study the behavior of significance tests with large search and recommendation evaluation data. Our results show that the Wilcoxon and sign tests have significantly higher Type-1 error rates for large sample sizes than the bootstrap, randomization, and t-tests, which were more consistent with the expected error rate. While the statistical tests displayed differences in their power for smaller sample sizes, they showed no difference in their power for large sample sizes. We recommend that the sign and Wilcoxon tests not be used to analyze large-scale evaluation results. Our results demonstrate that with Top-\(N\) recommendation and large search evaluation data, most tests would have a 100% chance of finding statistically significant results. Therefore, the effect size should be used to determine practical or scientific significance.
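The closing recommendation, to report effect size alongside a significance test, can be illustrated with a minimal sketch. It is not the authors' code or experimental setup: the function names, the synthetic per-user scores, the sample size, and the permutation count are all assumptions made for illustration. It runs a paired randomization (sign-flip) test on per-user score differences and reports a standardized effect size (mean difference divided by the standard deviation of the differences).

```python
import random
import statistics


def paired_randomization_test(scores_a, scores_b, n_permutations=1000, seed=0):
    """Two-sided paired randomization (sign-flip) test on the mean difference.

    Hypothetical helper for illustration; not taken from the paper.
    """
    rng = random.Random(seed)
    diffs = [a - b for a, b in zip(scores_a, scores_b)]
    observed = abs(statistics.fmean(diffs))
    extreme = 0
    for _ in range(n_permutations):
        # Under the null hypothesis, each per-user difference is equally
        # likely to carry either sign, so flip signs at random.
        flipped = sum(d if rng.random() < 0.5 else -d for d in diffs) / len(diffs)
        if abs(flipped) >= observed:
            extreme += 1
    # Add-one correction keeps the estimated p-value strictly positive.
    return (extreme + 1) / (n_permutations + 1)


def paired_effect_size(scores_a, scores_b):
    """Mean paired difference divided by the standard deviation of the differences."""
    diffs = [a - b for a, b in zip(scores_a, scores_b)]
    return statistics.fmean(diffs) / statistics.stdev(diffs)


# Synthetic per-user metric values (assumed data, not from the paper):
# system B gains a small amount on average relative to system A.
data_rng = random.Random(42)
n_users = 3000
system_a = [data_rng.random() for _ in range(n_users)]
system_b = [a + data_rng.gauss(0.01, 0.1) for a in system_a]

p_value = paired_randomization_test(system_b, system_a)
effect = paired_effect_size(system_b, system_a)
print(f"p = {p_value:.4f}, standardized effect = {effect:.3f}")
```

With thousands of users, even a small average difference tends to yield a small p-value, so the standardized effect is the quantity that indicates whether the difference is large enough to matter in practice.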

Award ID(s): 1751278

NSF-PAR ID: 10423691

Author(s) / Creator(s): Ihemelandu, Ngozi; Ekstrand, Michael D.

Date Published: 2023-07-23

Journal Name: Proceedings of the 46th International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR '23)

Format(s): Medium: X

Sponsoring Org: National Science Foundation

Free Publicly Accessible Full Text
Conference Paper:
The DOI is not currently available.
