-
Standard training for Multi-modal Large Language Models (MLLMs) involves concatenating non-textual information, such as vision or audio, with a text prompt. This approach may not encourage deep integration of modalities, limiting the model's ability to leverage the core language model's reasoning capabilities. This work examines the impact of interleaved instruction tuning in an audio MLLM, where audio tokens are interleaved within the prompt. Using the Listen, Think, and Understand (LTU) model as a testbed, we conduct experiments on the Synonym and Hypernym Audio Reasoning Dataset (SHARD), our newly created benchmark for audio-based semantic reasoning focusing on synonym and hypernym recognition. Our findings show that even zero-shot interleaved prompting improves performance on these reasoning tasks, and that a small amount of fine-tuning with interleaved training prompts improves results further, though at the expense of the MLLM's audio labeling ability.
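To make the distinction concrete, the sketch below contrasts the two prompt-construction strategies. It is illustrative only: the placeholder token, the text-embedding function, and the tensor shapes are assumptions for this sketch, not names taken from the LTU codebase.

```python
# Illustrative only: concatenated vs. interleaved prompt construction for an
# audio MLLM. AUDIO_PLACEHOLDER, embed_text, and the shapes are assumptions.
import torch

AUDIO_PLACEHOLDER = "<audio>"  # hypothetical marker for where audio tokens belong


def build_concatenated(audio_emb: torch.Tensor, text_emb: torch.Tensor) -> torch.Tensor:
    """Standard approach: the whole audio block first, then the full text prompt."""
    return torch.cat([audio_emb, text_emb], dim=0)


def build_interleaved(audio_emb: torch.Tensor, prompt: str, embed_text) -> torch.Tensor:
    """Interleaved approach: audio tokens are spliced in where the prompt refers to the sound."""
    before, after = prompt.split(AUDIO_PLACEHOLDER)
    return torch.cat([embed_text(before), audio_emb, embed_text(after)], dim=0)


if __name__ == "__main__":
    hidden = 16
    embed_text = lambda s: torch.zeros(max(len(s.split()), 1), hidden)  # stand-in text embedder
    audio_emb = torch.zeros(8, hidden)                                  # stand-in audio token embeddings

    # A SHARD-style reasoning question that references the clip mid-sentence.
    prompt = f"Is the sound {AUDIO_PLACEHOLDER} a kind of 'vehicle'?"

    print(build_concatenated(audio_emb, embed_text(prompt)).shape)  # audio block, then text
    print(build_interleaved(audio_emb, prompt, embed_text).shape)   # text, audio, text
```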
-
We are sharing the Ecoacoustic Dataset from Arctic North Slope Alaska (EDANSA-2019), a dataset of audio samples collected across an area of 9,000 square miles throughout the 2019 summer season on the North Slope of Alaska and neighboring regions. There are over 27 hours of data labeled according to 28 tags, with enough instances of 9 important environmental classes to train baseline convolutional recognizers. Please see the following GitHub page for the accompanying publication, updates about the dataset, and baseline code: https://github.com/speechLabBcCuny/EDANSA-2019
-
The Arctic is warming at three times the rate of the global average, affecting the habitat and life cycles of migratory species that reproduce there, such as birds and caribou. Ecoacoustic monitoring can help efficiently track changes in animal phenology and behavior over large areas so that the impacts of climate change on these species can be better understood and potentially mitigated. We introduce here the Ecoacoustic Dataset from Arctic North Slope Alaska (EDANSA-2019), a dataset collected by a network of 100 autonomous recording units covering an area of 9,000 square miles over the course of the 2019 summer season on the North Slope of Alaska and neighboring regions. We labeled over 27 hours of this dataset according to 28 tags, with enough instances of 9 important environmental classes to train baseline convolutional recognizers. We are releasing this dataset and the corresponding baseline to the community to accelerate the recognition of these sounds and facilitate automated analyses of large-scale ecoacoustic databases.
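For readers unfamiliar with this kind of baseline, the sketch below shows one way a small convolutional recognizer for multi-label tagging of log-mel spectrograms could look. Only the class count (9) follows the abstract above; the architecture, input shape, and loss choice are assumptions for illustration, not the released EDANSA-2019 baseline, which is available in the GitHub repository.

```python
# Minimal sketch, not the released EDANSA-2019 baseline: a small convolutional
# recognizer for multi-label tagging of log-mel spectrograms. Layer sizes and
# the input shape are illustrative assumptions.
import torch
import torch.nn as nn


class ConvRecognizer(nn.Module):
    def __init__(self, n_classes: int = 9):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 16, kernel_size=3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(16, 32, kernel_size=3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.AdaptiveAvgPool2d(1),  # pool over both frequency and time
        )
        self.head = nn.Linear(32, n_classes)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, 1, mel_bins, time_frames) log-mel spectrogram
        return self.head(self.features(x).flatten(1))  # raw logits, one per tag


if __name__ == "__main__":
    model = ConvRecognizer()
    spec = torch.randn(4, 1, 64, 256)                     # dummy batch of spectrograms
    targets = torch.zeros(4, 9)                           # multi-hot tag targets
    loss = nn.BCEWithLogitsLoss()(model(spec), targets)   # multi-label objective
    print(loss.item())
```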