Correlated Errors in Large Language Models

Kim, Elliot_Myunghoon; Garg, Avi; Peng, Kenny; Garg, Nikhil

This content will become publicly available on June 18, 2026

Diversity in training data, architecture, and providers is assumed to mitigate homogeneity in LLMs. However, we lack empirical evidence on whether different LLMs differ meaningfully. We conduct a large-scale empirical evaluation of over 350 LLMs, using two popular leaderboards and a resume-screening task. We find substantial correlation in model errors: on one leaderboard dataset, models agree 60% of the time when both models err. We identify factors driving model correlation, including shared architectures and providers. Crucially, however, larger and more accurate models have highly correlated errors, even with distinct architectures and providers. Finally, we show the effects of correlation in two downstream tasks: LLM-as-judge evaluation and hiring, the latter reflecting theoretical predictions regarding algorithmic monoculture.
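
As a rough illustration of the statistic quoted above ("models agree 60% of the time when both models err"), the sketch below computes, for one pair of models on a multiple-choice dataset, the fraction of questions on which the two models give the same answer among those that both get wrong. This is one plausible reading of the metric, not the paper's confirmed definition; the function name `agreement_when_both_wrong` and the toy answers are hypothetical.

```python
from typing import Hashable, Optional, Sequence


def agreement_when_both_wrong(
    preds_a: Sequence[Hashable],
    preds_b: Sequence[Hashable],
    answers: Sequence[Hashable],
) -> Optional[float]:
    """Fraction of questions where two models give the same answer,
    restricted to questions that both models answer incorrectly.

    Hypothetical helper: an assumed reading of the abstract's statistic.
    Returns None when there is no question that both models get wrong.
    """
    if not (len(preds_a) == len(preds_b) == len(answers)):
        raise ValueError("all inputs must have the same length")

    # Keep only the questions on which both models err.
    both_wrong = [
        (a, b)
        for a, b, y in zip(preds_a, preds_b, answers)
        if a != y and b != y
    ]
    if not both_wrong:
        return None

    # Count how often the two wrong answers coincide.
    same = sum(1 for a, b in both_wrong if a == b)
    return same / len(both_wrong)


# Toy data (made up for illustration, not taken from the paper).
answers = ["A", "B", "C", "D", "A"]
model_1 = ["B", "B", "A", "A", "C"]
model_2 = ["B", "C", "A", "B", "A"]
rate = agreement_when_both_wrong(model_1, model_2, answers)
```

In this toy example both models err on three questions and pick the same wrong option on two of them. For comparison, two models that chose independently and uniformly among the wrong options of a four-choice question would agree about one third of the time, which is the kind of baseline that makes an observed agreement rate interpretable.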

Award ID(s): 2339427

PAR ID: 10616986

Author(s) / Creator(s): Kim, Elliot_Myunghoon; Garg, Avi; Peng, Kenny; Garg, Nikhil

Publisher / Repository: International Conference on Machine Learning

Date Published: 2025-06-18

Format(s): Medium: X

Sponsoring Org: National Science Foundation

Free Publicly Accessible Full Text
Conference Paper: The DOI is not currently available.