Testing the limits of large language models in debating humans

Flamino, James; Modi, Mohammed Shahid; Szymanski, Boleslaw K; Cross, Brendan; Mikolajczyk, Colton

doi:10.1038/s41598-025-98378-1

Citation Details

Large Language Models (LLMs) have shown remarkable promise in communicating with humans. Their potential use as artificial partners with humans in sociological experiments involving conversation is an exciting prospect. But how viable is it? Here, we rigorously test the limits of agents that debate using LLMs in a preregistered study that runs multiple debate-based opinion consensus games. Each game starts with six humans, six agents, or three humans and three agents. We found that agents can blend in and concentrate on a debate’s topic better than humans, improving the productivity of all players. Yet, humans perceive agents as less convincing and confident than other humans, and several behavioral metrics of humans and agents we collected deviate measurably from each other. We observed that agents are already decent debaters, but their behavior generates a pattern distinctly different from the human-generated data.

Award ID(s): 2214216

PAR ID: 10667867

Author(s) / Creator(s): Flamino, James; Modi, Mohammed Shahid; Szymanski, Boleslaw K; Cross, Brendan; Mikolajczyk, Colton

Publisher / Repository: Nature

Date Published: 2025-12-01

Journal Name: Scientific Reports

Volume: 15

Issue: 1

ISSN: 2045-2322

Format(s): Medium: X

Sponsoring Org: National Science Foundation

Free Publicly Accessible Full Text
Accepted Manuscript
Journal Article:
https://doi.org/10.1038/s41598-025-98378-1
