NSF PAR Search | NSF Public Access Repository

Note: When clicking on a Digital Object Identifier (DOI) number, you will be taken to an external site maintained by the publisher. Some full text articles may not yet be available without a charge during the embargo (administrative interval).

MLLM-CompBench: A Comparative Reasoning Benchmark for Multimodal LLMs

Kil, Jihyung; Mai, Zheda; Lee, Justin; Chowdhury, Arpita; Wang, Zihe; Cheng, Kerrie; Wang, Lemeng; Liu, Ye; Chao, Wei-Lun (December 2024, Advances in Neural Information Processing Systems 37 (NeurIPS 2024))

The ability to compare objects, scenes, or situations is crucial for effective decision-making and problem-solving in everyday life. For instance, comparing the freshness of apples enables better choices during grocery shopping, while comparing sofa designs helps optimize the aesthetics of our living space. Despite its significance, the comparative capability is largely unexplored in artificial general intelligence (AGI). In this paper, we introduce MLLM-COMPBENCH, a benchmark designed to evaluate the comparative reasoning capability of multimodal large language models (MLLMs). MLLM-COMPBENCH mines and pairs images through visually oriented questions covering eight dimensions of relative comparison: visual attribute, existence, state, emotion, temporality, spatiality, quantity, and quality. We curate a collection of around 40K image pairs using metadata from diverse vision datasets and CLIP similarity scores. These image pairs span a broad array of visual domains, including animals, fashion, sports, and both outdoor and indoor scenes. The questions are carefully crafted to discern relative characteristics between two images and are labeled by human annotators for accuracy and relevance. We use MLLM-COMPBENCH to evaluate recent MLLMs, including GPT-4V(ision), Gemini-Pro, and LLaVA-1.6. Our results reveal notable shortcomings in their comparative abilities. We believe MLLM-COMPBENCH not only sheds light on these limitations but also establishes a solid foundation for future enhancements in the comparative capability of MLLMs.
Free, publicly-accessible full text available December 15, 2025
COMPBENCH: A Comparative Reasoning Benchmark for Multimodal LLMs

Kil, Jihyung; Mai, Zheda; Lee, Justin; Wang, Zihe; Cheng, Kerrie; Wang, Lemeng; Liu, Ye; Chowdhury, Arpita; Chao, Wei-Lun (December 2024, NeurIPS)

Free, publicly-accessible full text available December 12, 2025
RAPTURE: a Remotely Accessible Platform of Testbeds for UAS Research and Education

https://doi.org/10.2514/6.2024-3569

Lee, Justin S; Palmer, Nicholas D; Xie, Junfei; Wan, Yan; Lu, Kejie; Fu, Shengli (July 2024, American Institute of Aeronautics and Astronautics)

Full Text Available
The Importance of Prompt Tuning for Automated Neuron Explanations

Lee, Justin; Oikarinen, Tuomas; Chatha, Arjun; Chang, Keng-Chi; Chen, Yilan; Weng, Tsui-Wei (December 2023, NeurIPS 2023 Attrib workshop)

Recent advances have greatly increased the capabilities of large language models (LLMs), but our understanding of the models and their safety has not progressed as quickly. In this paper we aim to understand LLMs more deeply by studying their individual neurons. We build upon previous work showing large language models such as GPT-4 can be useful in explaining what each neuron in a language model does. Specifically, we analyze the effect of the prompt used to generate explanations and show that reformatting the explanation prompt in a more natural way can significantly improve neuron explanation quality and greatly reduce computational cost. We demonstrate the effects of our new prompts in three different ways, incorporating both automated and human evaluations.
Using Deep Learning to Detect Islamophobia on Reddit

https://doi.org/10.32473/flairs.36.133324

Aldreabi, Esraa; Lee, Justin M.; Blackburn, Jeremy (May 2023, The International FLAIRS Conference Proceedings)

Islamophobia, a negative predilection towards the Muslim community, is present on social media platforms. In addition to causing harm to victims, it also hurts the reputation of social media platforms that claim to provide a safe online environment for all users. The volume of social media content is too large to review manually; thus, it is important to find automated solutions to combat hate speech on social media platforms. Machine learning approaches have been used in the literature as a way to automate hate speech detection. In this paper, we use deep learning techniques to detect Islamophobia on Reddit and topic modeling to analyze the content and reveal topics from comments identified as Islamophobic. Some topics we identified include the Islamic dress code, religious practices, marriage, and politics. Among the deep learning models used for detection, the highest performance was achieved with BERTbase+CNN, with an F1-Score of 0.92.
Full Text Available
Comparison of bacterial suppression by phage cocktails, dual‐receptor generalists, and coevolutionarily trained phages

https://doi.org/10.1111/eva.13518

Borin, Joshua M.; Lee, Justin J.; Gerbino, Krista R.; Meyer, Justin R. (January 2023, Evolutionary Applications)

Full Text Available
The elephant in the room: attention to salient scene features increases with comedic expertise

https://doi.org/10.1007/s10339-022-01079-0

Amir, Ori; Utterback, Konrad J.; Lee, Justin; Lee, Kevin S.; Kwon, Suehyun; Carroll, Dave M.; Papoutsaki, Alexandra (May 2022, Cognitive Processing)

Full Text Available
The positive environmental impact of virtual isotretinoin management

https://doi.org/10.1111/pde.14600

Lee, Justin; Yousaf, Ahmed; Jenkins, Samantha; Zaki, Mohammed Tamim; Napier, Cecelia; Abdul‐Aziz, Omar I.; Zinn, Zachary (May 2021, Pediatric Dermatology)

Full Text Available
Lignin-Derived Non-Heme Iron and Manganese Complexes: Catalysts for the On-Demand Production of Chlorine Dioxide in Water under Mild Conditions

https://doi.org/10.1021/acs.inorgchem.0c02742

Champ, Tayyebeh B.; Jang, Jun H.; Lee, Justin L.; Wu, Guang; Reynolds, Michael A.; Abu-Omar, Mahdi M. (March 2021, Inorganic Chemistry)
Full Text Available
Bluetongue Research at a Crossroads: Modern Genomics Tools Can Pave the Way to New Insights

https://doi.org/10.1146/annurev-animal-051721-023724

Kopanke, Jennifer; Carpenter, Molly; Lee, Justin; Reed, Kirsten; Rodgers, Case; Burton, Mollie; Lovett, Kierra; Westrich, Joseph A.; McNulty, Erin; McDermott, Emily; et al (February 2022, Annual Review of Animal Biosciences)

Bluetongue virus (BTV) is an arthropod-borne, segmented double-stranded RNA virus that can cause severe disease in both wild and domestic ruminants. BTV evolves via several key mechanisms, including the accumulation of mutations over time and the reassortment of genome segments. Additionally, BTV must maintain fitness in two disparate hosts, the insect vector and the ruminant. The specific features of viral adaptation in each host that permit host-switching are poorly characterized. Limited field studies and experimental work have alluded to the presence of these phenomena at work, but our understanding of the factors that drive or constrain BTV's genetic diversification remains incomplete. Current research leveraging novel approaches and whole genome sequencing applications promises to improve our understanding of BTV's evolution, ultimately contributing to the development of better predictive models and management strategies to reduce future impacts of bluetongue epizootics.
Full Text Available
