ON SPEEDING UP LANGUAGE MODEL EVALUATION

Zhou, Jin Peng; Belardi, Christian K; Wu, Ruihan; Zhang, Travis; Gomes, Carla P; Sun, Wen; Weinberger, Kilian Q


This content will become publicly available on June 11, 2026


Developing prompt-based methods with Large Language Models (LLMs) requires making numerous decisions, which give rise to a combinatorial search problem over hyper-parameters. Exhaustively evaluating every setting can be time-consuming and costly. In this paper, we propose an adaptive approach to explore this space. We exploit the fact that often only a few samples are needed to identify clearly superior or inferior settings, and that many evaluation tests are highly correlated. We lean on multi-armed bandits to sequentially identify the next (method, validation sample)-pair to evaluate and utilize low-rank matrix factorization to fill in missing evaluations. We carefully assess the efficacy of our approach on several competitive benchmark problems and show that it can identify the top-performing method using only 5-15% of the typical resources, resulting in 85-95% LLM cost savings. Our code is available at https://github.com/kilian-group/banditeval.
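The idea in the abstract can be illustrated with a small sketch. This is not the authors' implementation (see the linked repository for that); it is a minimal illustration that assumes a UCB-style bandit rule for choosing which method to evaluate next and an iterative truncated-SVD imputation as the low-rank fill-in step. The names `complete_low_rank`, `adaptive_eval`, and the `evaluate(method, sample)` callback are hypothetical, introduced only for this example.

```python
import numpy as np


def complete_low_rank(scores, observed, rank=1, n_iters=50):
    """Fill unobserved entries with an iterative truncated-SVD imputation.

    Observed entries are always kept exactly as they were measured.
    """
    # Start from the mean of the observed scores as a neutral guess.
    filled = np.where(observed, scores, scores[observed].mean())
    for _ in range(n_iters):
        u, s, vt = np.linalg.svd(filled, full_matrices=False)
        approx = (u[:, :rank] * s[:rank]) @ vt[:rank]
        filled = np.where(observed, scores, approx)
    return filled


def adaptive_eval(evaluate, n_methods, n_samples, budget, rank=1, seed=0):
    """Pick (method, sample) pairs adaptively under an evaluation budget.

    `evaluate(m, j)` is a hypothetical callback returning the score of
    method m on validation sample j (for example, one LLM call).
    Assumes budget >= n_methods. Returns (best_method, n_evaluations).
    """
    rng = np.random.default_rng(seed)
    budget = min(budget, n_methods * n_samples)
    scores = np.zeros((n_methods, n_samples))
    observed = np.zeros((n_methods, n_samples), dtype=bool)

    # Warm start: evaluate every method once on a random sample.
    for m in range(n_methods):
        j = int(rng.integers(n_samples))
        scores[m, j] = evaluate(m, j)
        observed[m, j] = True

    for t in range(n_methods, budget):
        counts = observed.sum(axis=1)
        # Estimated mean score per method, using imputed values
        # for the pairs that have not been evaluated yet.
        means = complete_low_rank(scores, observed, rank).mean(axis=1)
        # UCB-style exploration bonus (an assumption of this sketch).
        bonus = np.sqrt(2.0 * np.log(t + 1) / counts)
        bonus[counts == n_samples] = -np.inf  # nothing left to evaluate
        m = int(np.argmax(means + bonus))
        j = int(rng.choice(np.flatnonzero(~observed[m])))
        scores[m, j] = evaluate(m, j)
        observed[m, j] = True

    estimate = complete_low_rank(scores, observed, rank).mean(axis=1)
    return int(np.argmax(estimate)), int(observed.sum())
```

With `budget` set to a small fraction of `n_methods * n_samples`, only that fraction of (method, sample) pairs is actually evaluated, which is where the cost savings described in the abstract would come from. How reliably such a simple rule finds the best method depends on the data and is not something this sketch guarantees.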

Award ID(s): 1934714

PAR ID: 10615991

Author(s) / Creator(s): Zhou, Jin Peng; Belardi, Christian K; Wu, Ruihan; Zhang, Travis; Gomes, Carla P; Sun, Wen; Weinberger, Kilian Q

Publisher / Repository: International Conference on Learning Representations

Date Published: 2025-06-11

ISBN: 9798331320850

Format(s): Medium: X

Sponsoring Org: National Science Foundation

Conference Proceeding:
The DOI is not currently available.
