skip to main content
US FlagAn official website of the United States government
dot gov icon
Official websites use .gov
A .gov website belongs to an official government organization in the United States.
https lock icon
Secure .gov websites use HTTPS
A lock ( lock ) or https:// means you've safely connected to the .gov website. Share sensitive information only on official, secure websites.


Title: A Proactive Agent Collaborative Framework for Zero‐Shot Multimodal Medical Reasoning
The adoption of large language models (LLMs) in healthcare has garnered significant research interest, yet their performance remains limited due to a lack of domain‐specific knowledge, medical reasoning skills, and their unimodal nature, which restricts them to text‐only inputs. To address these limitations, we propose MultiMedRes, a multimodal medical collaborative reasoning framework that simulates human physicians’ communication by incorporating a learner agent to proactively acquire information from domain‐specific expert models. MultiMedRes addresses medical multimodal reasoning problems through three steps i) Inquire: The learner agent decomposes complex medical reasoning problems into multiple domain‐specific sub‐problems; ii) Interact: The agent engages in iterative “ask‐answer” interactions with expert models to obtain domain‐specific knowledge; and iii) Integrate: The agent integrates all the acquired domain‐specific knowledge to address the medical reasoning problems (e.g., identifying the difference of disease levels and abnormality sizes between medical images). We validate the effectiveness of our method on the task of difference visual question answering for X‐ray images. The experiments show that our zero‐shot prediction achieves state‐of‐the‐art performance, surpassing fully supervised methods, which demonstrates that MultiMedRes could offer trustworthy and interpretable assistance to physicians in monitoring the treatment progression of patients, paving the way for effective human–AI interaction and collaboration.  more » « less
Award ID(s):
2145625
PAR ID:
10641510
Author(s) / Creator(s):
 ;  ;  ;  ;  
Publisher / Repository:
Wiley Blackwell (John Wiley & Sons)
Date Published:
Journal Name:
Advanced Intelligent Systems
Volume:
7
Issue:
8
ISSN:
2640-4567
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. This paper examines the performance of Multimodal LLMs (MLLMs) in skilled production work, with a focus on welding. Using a novel data set of real-world and online weld images, annotated by a domain expert, we evaluate the performance of two state-of-the-art MLLMs in assessing weld acceptability across three contexts: RV & Marine, Aeronautical, and Farming. While both models perform better on online images, likely due to prior exposure or memorization, they also perform relatively well on unseen, real-world weld images. Additionally, we introduce WeldPrompt, a prompting strategy that combines Chain-of-Thought generation with in-context learning to mitigate hallucinations and improve reasoning. WeldPrompt improves model recall in certain contexts but exhibits inconsistent performance across others. These results underscore the limitations and potentials of MLLMs in high-stakes technical domains and highlight the importance of fine-tuning, domain-specific data, and more sophisticated prompting strategies to improve model reliability. The study opens avenues for further research into multimodal learning in industry applications. 
    more » « less
  2. Agarwal, Alekh; Belgrave, Danielle; Cho, Kyunghyun; Oh, Alice (Ed.)
    We propose a new approach to automated theorem proving where an AlphaZero-style agent is self-training to refine a generic high-level expert strategy expressed as a nondeterministic program. An analogous teacher agent is self-training to generate tasks of suitable relevance and difficulty for the learner. This allows leveraging minimal amounts of domain knowledge to tackle problems for which training data is unavailable or hard to synthesize. As a specific illustration, we consider loop invariant synthesis for imperative programs and use neural networks to refine both the teacher and solver strategies. 
    more » « less
  3. Abstract Industrial environments demand accurate detection of anomalies to maintain product quality and ensure operational safety. Traditional industrial anomaly detection (IAD) methods often lack the flexibility and adaptability needed in dynamic production settings, where new defect types and operational changes continually emerge. Recent advancements in multimodal large language models (MLLMs) have shown promise by combining visual and textual processing capabilities, yet they are often limited by their lack of domain-specific expertise, particularly regarding industry-standard defect tolerances. To overcome limitations, we introduce Echo, a novel multi-expert framework designed to enhance MLLM performance for IAD. Echo integrates four specialized modules: the Reference Extractor retrieves similar normal images to establish contextual baselines; the Knowledge Guide provides critical, industry-specific insights; the Reasoning Expert enables structured, stepwise analysis for complex queries; and the Decision Maker synthesizes information from the preceding modules to deliver precise, context-aware responses. Evaluations on the MMAD benchmark reveal that Echo significantly improves adaptability, precision, and robustness compared to conventional approaches. Our results demonstrate that guided MLLMs, when augmented with expert modules, can effectively bridge the gap between general visual understanding and the specialized requirements of industrial anomaly detection, paving the way for more reliable and interpretable inspection systems. 
    more » « less
  4. Cox, Michael T. (Ed.)
    Goal reasoning agents can solve novel problems by detecting an anomaly between expectations and observations; generating explanations about plausible causes for the anomaly; and formulating goals to remove the cause. Yet not all anomalies represent problems. We claim that the task of discerning the difference between benign anomalies and those that represent an actual problem by an agent will increase its performance. Furthermore, we present a new definition of the term “problem” in a goal reasoning context. This paper discusses the role of explanations and goal formulation in response to developing problems and implements the response. The paper illustrates goal formulation in a mine clearance domain and a labor relations domain. We also show the empirical difference between a standard planning agent, an agent that detects anomalies and an agent that recognizes problems. 
    more » « less
  5. Expert-layman text style transfer technologies have the potential to improve communication between members of scientific communities and the general public. High-quality information produced by experts is often filled with difficult jargon laypeople struggle to understand. This is a particularly notable issue in the medical domain, where layman are often confused by medical text online. At present, two bottlenecks interfere with the goal of building high-quality medical expert-layman style transfer systems: a dearth of pretrained medical-domain language models spanning both expert and layman terminologies and a lack of parallel corpora for training the transfer task itself. To mitigate the first issue, we propose a novel language model (LM) pretraining task, Knowledge Base Assimilation, to synthesize pretraining data from the edges of a graph of expert- and layman-style medical terminology terms into an LM during self-supervised learning. To mitigate the second issue, we build a large-scale parallel corpus in the medical expert-layman domain using a margin-based criterion. Our experiments show that transformer-based models pretrained on knowledge base assimilation and other well-established pretraining tasks fine-tuning on our new parallel corpus leads to considerable improvement against expert-layman transfer benchmarks, gaining an average relative improvement of our human evaluation, the Overall Success Rate (OSR), by 106%. 
    more » « less