Adaptively profiling models with task elicitation

Brown, Davis; Balehannina, Prithvi; Jin, Helen; Havaldar, Shreya; Hassani, Hamed; Wong, Eric

doi:10.18653/v1/2025.emnlp-main.1270


Language model evaluations often fail to characterize consequential failure modes, forcing experts to inspect outputs and build new benchmarks. We introduce task elicitation, a method that automatically builds new evaluations to profile model behavior. Task elicitation finds hundreds of natural-language tasks (an order of magnitude more than prior work) where frontier models exhibit systematic failures, in domains ranging from forecasting to online harassment. For example, we find that Sonnet 3.5 over-associates quantum computing and AGI and that o3-mini is prone to hallucination when fabrications are repeated in-context.

Award ID(s): 2442421

PAR ID: 10675370

Author(s) / Creator(s): Brown, Davis; Balehannina, Prithvi; Jin, Helen; Havaldar, Shreya; Hassani, Hamed; Wong, Eric

Publisher / Repository: Association for Computational Linguistics

Date Published: 2025-11-01

Page Range / eLocation ID: 24996 to 25031

Format(s): Medium: X

Sponsoring Org: National Science Foundation

Free Publicly Accessible Full Text
Accepted Manuscript
Conference Paper: https://doi.org/10.18653/v1/2025.emnlp-main.1270
