NSF PAR Search | NSF Public Access Repository

LayoutVLM: Differentiable Optimization of 3D Layout via Vision-Language Models

Sun, Fan-Yun; Liu, Weiyu; Gu, Siyu; Lim, Dylan; Bhat, Goutam; Tombari, Federico; Li, Manling; Haber, Nick; Wu, Jiajun (June 2025, IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR))

Free, publicly-accessible full text available June 5, 2026
IKEA Manuals at Work: 4D Grounding of Assembly Instructions on Internet Videos

Liu, Yunong; Eyzaguirre, Cristobal; Li, Manling; Khanna, Shubh; Niebles, Juan Carlos; Ravi, Vineeth; Mishra, Saumitra; Liu, Weiyu; Wu, Jiajun (December 2024, Advances in Neural Information Processing Systems (NeurIPS) Datasets and Benchmarks)

Full Text Available
The Law of Knowledge Overshadowing: Towards Understanding, Predicting and Preventing LLM Hallucination

https://doi.org/10.18653/v1/2025.findings-acl.1199

Zhang, Yuji; Li, Sha; Qian, Cheng; Liu, Jiateng; Yu, Pengfei; Han, Chi; Fung, Yi R; McKeown, Kathleen; Zhai, ChengXiang; Li, Manling; et al (January 2025, Association for Computational Linguistics)

Full Text Available
Embodied Agent Interface: Benchmarking LLMs for Embodied Decision Making

Li, Manling; Zhao, Shiyu; Wang, Qineng; Wang, Kangrui; Zhou, Yu; Srivastava, Sanjana; Gokmen, Cem; Lee, Tony; Li, Li Erran; Zhang, Ruohan; et al (December 2024, Advances in Neural Information Processing Systems Track on Datasets and Benchmarks (NeurIPS Datasets and Benchmarks))

Full Text Available
Open-Domain Hierarchical Event Schema Induction by Incremental Prompting and Verification

https://doi.org/10.18653/v1/2023.acl-long.312

Li, Sha; Zhao, Ruining; Li, Manling; Ji, Heng; Callison-Burch, Chris; Han, Jiawei (January 2023, Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics)

Event schemas are a form of world knowledge about the typical progression of events. Recent methods for event schema induction use information extraction systems to construct a large number of event graph instances from documents, and then learn to generalize the schema from such instances. In contrast, we propose to treat event schemas as a form of commonsense knowledge that can be derived from large language models (LLMs). This new paradigm greatly simplifies the schema induction process and allows us to handle both hierarchical relations and temporal relations between events in a straightforward way. Since event schemas have complex graph structures, we design an incremental prompting and verification method INCPROMPT to break down the construction of a complex event graph into three stages: event skeleton construction, event expansion, and event-event relation verification. Compared to directly using LLMs to generate a linearized graph, INCPROMPT can generate large and complex schemas with 7.2% F1 improvement in temporal relations and 31.0% F1 improvement in hierarchical relations. In addition, compared to the previous state-of-the-art closed-domain schema induction model, human assessors were able to cover ∼10% more events when translating the schemas into coherent stories and rated our schemas 1.3 points higher (on a 5-point scale) in terms of readability.
Full Text Available
New Frontiers of Information Extraction

https://doi.org/10.18653/v1/2022.naacl-tutorials.3

Chen, Muhao; Huang, Lifu; Li, Manling; Zhou, Ben; Ji, Heng; Roth, Dan (January 2022, Proceedings of the 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies: Tutorials)

This tutorial targets researchers and practitioners who are interested in AI and ML technologies for structural information extraction (IE) from unstructured textual sources. In particular, this tutorial will provide the audience with a systematic introduction to recent advances in IE by answering several important research questions. These questions include: (i) how to develop a robust IE system from noisy, insufficient training data while ensuring the reliability of its predictions? (ii) how to foster the generalizability of IE by enhancing the system’s cross-lingual, cross-domain, cross-task and cross-modal transferability? (iii) how to precisely support extracting structural information with extremely fine-grained, diverse and boundless labels? (iv) how to further improve IE by leveraging indirect supervision from other NLP tasks, such as NLI, QA or summarization, and pre-trained language models? (v) how to acquire knowledge to guide the inference of IE systems? We will discuss several lines of frontier research that tackle those challenges, and will conclude the tutorial by outlining directions for further investigation.
Full Text Available
Keep Meeting Summaries on Topic: Abstractive Multi-Modal Meeting Summarization

Li, Manling; Zhang, Lingyu; Radke, Richard J.; Ji, Heng (July 2019, 57th Conference of the Association for Computational Linguistics)

Transcripts of natural, multi-person meetings differ significantly from documents like news articles, which can make Natural Language Generation models generate unfocused summaries. We develop an abstractive meeting summarizer from both the video and audio of meeting recordings. Specifically, we propose a multi-modal hierarchical attention mechanism across three levels: topic segment, utterance and word. To narrow the focus to topically relevant segments, we jointly model topic segmentation and summarization. In addition to traditional textual features, we introduce new multi-modal features derived from visual focus of attention, based on the assumption that an utterance is more important if its speaker receives more attention. Experiments show that our model significantly outperforms the state of the art on both BLEU and ROUGE measures.
Full Text Available
The unobtrusive group interaction (UGI) corpus

https://doi.org/10.1145/3304109.3325816

Bhattacharya, Indrani; Foley, Michael; Ku, Christine; Zhang, Ni; Zhang, Tongtao; Mine, Cameron; Li, Manling; Ji, Heng; Riedl, Christoph; Welles, Brooke Foucault; et al (June 2019, Proceedings of the 10th ACM Multimedia Systems Conference)

Studying group dynamics requires fine-grained spatial and temporal understanding of human behavior. Social psychologists studying human interaction patterns in face-to-face group meetings often find themselves struggling with huge volumes of data that require many hours of tedious manual coding. There are only a few publicly available multi-modal datasets of face-to-face group meetings that enable the development of automated methods to study verbal and non-verbal human behavior. In this paper, we present a new, publicly available multi-modal dataset for group dynamics study that differs from previous datasets in its use of ceiling-mounted, unobtrusive depth sensors. These can be used for fine-grained analysis of head and body pose and gestures, without any concerns about participants' privacy or inhibited behavior. The dataset is complemented by synchronized and time-stamped meeting transcripts that allow analysis of spoken content. The dataset comprises 22 group meetings in which participants perform a standard collaborative group task designed to measure leadership and productivity. Participants' post-task questionnaires, including demographic information, are also provided as part of the dataset. We show the utility of the dataset in analyzing perceived leadership, contribution, and performance, by presenting results of multi-modal analysis using our sensor-fusion algorithms designed to automatically understand audio-visual interactions.
Full Text Available