NSF PAR Search | NSF Public Access Repository


Protecting Privacy in Multimodal Large Language Models with MLLMU-Bench

Liu, Zheyuan; Dou, Guangyao; Jia, Mengzhao; Tan, Zhaoxuan; Zeng, Qingkai; Yuan, Yongle; Jiang, Meng (April 2025, Association for Computational Linguistics)

Generative models such as Large Language Models (LLMs) and Multimodal Large Language Models (MLLMs) trained on massive web corpora can memorize and disclose individuals’ confidential and private data, raising legal and ethical concerns. While many previous works have addressed this issue in LLMs via machine unlearning, it remains largely unexplored for MLLMs. To tackle this challenge, we introduce the Multimodal Large Language Model Unlearning Benchmark (MLLMU-Bench), a novel benchmark aimed at advancing the understanding of multimodal machine unlearning. MLLMU-Bench consists of 500 fictitious profiles and 153 profiles of public celebrities, each featuring over 14 customized question-answer pairs, evaluated from both multimodal (image+text) and unimodal (text) perspectives. The benchmark is divided into four sets to assess unlearning algorithms in terms of efficacy, generalizability, and model utility. Finally, we provide baseline results using existing generative model unlearning algorithms. Surprisingly, our experiments show that unimodal unlearning algorithms excel in generation tasks, while multimodal unlearning approaches perform better in classification with multimodal inputs.
MultiChartQA: Benchmarking Vision-Language Models on Multi-Chart Problems

Zhu, Zifeng; Jia, Mengzhao; Zhang, Zhihan; Li, Lang; Jiang, Meng (April 2025, Association for Computational Linguistics)

Multimodal Large Language Models (MLLMs) have demonstrated impressive abilities across various tasks, including visual question answering and chart comprehension, yet existing benchmarks for chart-related tasks fall short in capturing the complexity of real-world multi-chart scenarios. Current benchmarks primarily focus on single-chart tasks, neglecting the multi-hop reasoning required to extract and integrate information from multiple charts, which is essential in practical applications. To fill this gap, we introduce MultiChartQA, a benchmark that evaluates MLLMs’ capabilities in four key areas: direct question answering, parallel question answering, comparative reasoning, and sequential reasoning. Our evaluation of a wide range of MLLMs reveals significant performance gaps compared to humans. These results highlight the challenges in multi-chart comprehension and the potential of MultiChartQA to drive advancements in this field. Our code and data are available at https://github.com/Zivenzhu/Multi-chart-QA.
IHEval: Evaluating Language Models on Following the Instruction Hierarchy

Zhang, Zhihan; Li, Shiyang; Zhang, Zixuan; Liu, Xin; Jiang, Haoming; Tang, Xianfeng; Gao, Yifan; Li, Zheng; Wang, Haodong; Tan, Zhaoxuan; et al. (April 2025, Association for Computational Linguistics)
Chiruzzo, Luis; Ritter, Alan; Wang, Lu (Eds.)
The instruction hierarchy, which establishes a priority order from system messages to user messages, conversation history, and tool outputs, is essential for ensuring consistent and safe behavior in language models (LMs). Despite its importance, this topic has received limited attention, and comprehensive benchmarks for evaluating models’ ability to follow the instruction hierarchy are lacking. We bridge this gap by introducing IHEval, a novel benchmark comprising 3,538 examples across nine tasks, covering cases where instructions of different priorities either align or conflict. Our evaluation of popular LMs highlights their struggle to recognize instruction priorities. All evaluated models experience a sharp performance decline when facing conflicting instructions, compared to their original instruction-following performance. Moreover, the most competitive open-source model achieves only 48% accuracy in resolving such conflicts. Our results underscore the need for targeted optimization in the future development of LMs.
QG-SMS: Enhancing Test Item Analysis via Student Modeling and Simulation

https://doi.org/10.18653/v1/2025.acl-long.1268

Nguyen, Bang; Du, Tingting; Yu, Mengxia; Angrave, Lawrence; Jiang, Meng (January 2025, Association for Computational Linguistics)

MultiChartQA: Benchmarking Vision-Language Models on Multi-Chart Problems

https://doi.org/10.18653/v1/2025.naacl-long.566

Zhu, Zifeng; Jia, Mengzhao; Zhang, Zhihan; Li, Lang; Jiang, Meng (January 2025, Association for Computational Linguistics)

Aligning Large Language Models with Implicit Preferences from User-Generated Content

https://doi.org/10.18653/v1/2025.acl-long.384

Tan, Zhaoxuan; Li, Zheng; Liu, Tianyi; Wang, Haodong; Yun, Hyokun; Zeng, Ming; Chen, Pei; Zhang, Zhihan; Gao, Yifan; Wang, Ruijie; et al. (January 2025, Association for Computational Linguistics)

Optimizing Decomposition for Optimal Claim Verification

https://doi.org/10.18653/v1/2025.acl-long.254

Lu, Yining; Ziems, Noah; Dang, Hy; Jiang, Meng (January 2025, Association for Computational Linguistics)

Modality-Aware Neuron Pruning for Unlearning in Multimodal Large Language Models

https://doi.org/10.18653/v1/2025.acl-long.295

Liu, Zheyuan; Dou, Guangyao; Yuan, Xiangchi; Zhang, Chunhui; Tan, Zhaoxuan; Jiang, Meng (January 2025, Association for Computational Linguistics)

CodeTaxo: Enhancing Taxonomy Expansion with Limited Examples via Code Language Prompts

https://doi.org/10.18653/v1/2025.findings-acl.214

Zeng, Qingkai; Bai, Yuyang; Tan, Zhaoxuan; Wu, Zhenyu; Feng, Shangbin; Jiang, Meng (January 2025, Association for Computational Linguistics)

Improving Large Language Models Function Calling and Interpretability via Guided-Structured Templates

https://doi.org/10.18653/v1/2025.emnlp-main.1242

Dang, Hy; Liu, Tianyi; Wu, Zhuofeng; Yang, Jingfeng; Jiang, Haoming; Yang, Tao; Chen, Pei; Wang, Zhengyang; Wang, Helen; Li, Huasheng; et al. (January 2025, Association for Computational Linguistics)

