Who Validates the Validators? Aligning LLM-Assisted Evaluation of LLM Outputs with Human Preferences

Shankar, Shreya (ORCID: 0000-0002-0919-9672); Zamfirescu-Pereira, JD (ORCID: 0000-0002-5310-6728); Hartmann, Bjoern (ORCID: 0000-0002-0693-0829); Parameswaran, Aditya (ORCID: 0000-0002-4538-4752); Arawjo, Ian (ORCID: 0000-0001-8910-0822)

doi:10.1145/3654777.3676450

Abstract: Automatic differentiation (AD) enables powerful metasurface inverse design but requires extensive theoretical and programming expertise. We present a Model Context Protocol (MCP) assisted framework that allows researchers to conduct inverse design with differentiable solvers through large language models (LLMs). Because LLMs inherently lack knowledge of specialized solvers, the framework gives them dynamic access to verified code templates and comprehensive documentation through dedicated servers. The LLM autonomously accesses these resources to generate complete inverse design code without prescribed coordination rules. Evaluation on the Huygens meta-atom design task with the differentiable TorchRDIT solver shows that, while both natural language and structured prompting strategies achieve high success rates, structured prompting significantly outperforms natural language prompting in design quality, workflow efficiency, computational cost, and error reduction. The minimalist server design, which uses only five APIs, demonstrates how MCP makes sophisticated computational tools accessible to researchers without programming expertise and offers a generalizable integration solution for other scientific tasks.
