Abstract
Automatic differentiation (AD) enables powerful metasurface inverse design but requires extensive theoretical and programming expertise. We present a Model Context Protocol (MCP) assisted framework that allows researchers to conduct inverse design with differentiable solvers through large language models (LLMs). Since LLMs inherently lack knowledge of specialized solvers, our proposed solution provides dynamic access to verified code templates and comprehensive documentation through dedicated servers. The LLM autonomously accesses these resources to generate complete inverse design code without prescribed coordination rules. Evaluation on the Huygens meta-atom design task with the differentiable TorchRDIT solver shows that while both natural language and structured prompting strategies achieve high success rates, structured prompting significantly outperforms natural language prompting in design quality, workflow efficiency, computational cost, and error reduction. The minimalist server design, using only 5 APIs, demonstrates how MCP makes sophisticated computational tools accessible to researchers without programming expertise, offering a generalizable integration solution for other scientific tasks.
Who Validates the Validators? Aligning LLM-Assisted Evaluation of LLM Outputs with Human Preferences
- Award ID(s):
- 2243822
- PAR ID:
- 10646607
- Publisher / Repository:
- ACM
- Date Published:
- Page Range / eLocation ID:
- 1 to 14
- Format(s):
- Medium: X
- Sponsoring Org:
- National Science Foundation
More Like this
-
Introduction: The emergence and widespread adoption of generative AI (GenAI) chatbots such as ChatGPT, and programming assistants such as GitHub Copilot, have radically redefined the landscape of programming education. This calls for replication of studies and reexamination of findings from pre-GenAI CS contexts to understand the impact on students. Objectives: Achievement goals are well studied in computing education and can be predictive of student interest and exam performance. The objective of this study is to compare findings from prior achievement goal studies in CS1 courses with new CS1 courses that emphasize the use of human-GenAI collaborative coding. Methods: In a CS1 course that integrates GenAI, we use linear regression to explore the relationship between achievement goals and prior experience on student interest, exam performance, and perceptions of GenAI. Results: As with prior findings in traditional CS1 classes, mastery goals are correlated with interest in computing. Contradicting prior CS1 findings, normative goals are correlated with exam scores. Normative and mastery goals correlate with students’ perceptions of learning with GenAI. Mastery goals weakly correlate with reading and testing code output from GenAI.
-
With the advent of multi-modal large language models (MLLMs), datasets used for visual question answering (VQA) and referring expression comprehension have seen a resurgence. However, the most popular datasets used to evaluate MLLMs are some of the earliest ones created, and they have many known problems, including extreme bias, spurious correlations, and an inability to permit fine-grained analysis. In this paper, we pioneer evaluating recent MLLMs (LLaVA 1.5, LLaVA-NeXT, BLIP2, InstructBLIP, GPT-4V, and GPT-4o) on datasets designed to address weaknesses in earlier ones. We assess three VQA datasets: 1) TDIUC, which permits fine-grained analysis on 12 question types; 2) TallyQA, which has simple and complex counting questions; and 3) DVQA, which requires optical character recognition for chart understanding. We also study VQDv1, a dataset that requires identifying all image regions that satisfy a given query. Our experiments reveal the weaknesses of many MLLMs that have not previously been reported. Our code is integrated into the widely used LAVIS framework for MLLM evaluation, enabling the rapid assessment of future MLLMs.