Note: When you click a Digital Object Identifier (DOI) link, you will be taken to an external site maintained by the publisher. Some full-text articles may not yet be available without charge during the embargo (administrative interval).
Some links on this page may take you to non-federal websites. Their policies may differ from those of this site.
- 
            ABSTRACT Joint inquiry requires agents to exchange public content about some target domain, which in turn requires them to track which content a linguistic form contributes to a conversation. Often, however, the inquiry delivers a necessary truth. For example, if we are inquiring whether a particular bird, Tweety, is a woodpecker, and discover that it is, then our inquiry concludes in a necessity, and the form "Tweety is a woodpecker" expresses this necessary truth. Still, whether Tweety is a woodpecker seems a perfectly legitimate object of study, and the answers we accrue can be informative. Yet the dominant model of inquiry (Stalnaker, 1978, 1984) treats this situation as linguistically deviant and diagnoses our ignorance and subsequent discovery as metalinguistic: we were ignorant, and ultimately discovered something, about the meaning of our terms. Rather than a linguistic deviation, we argue this situation is the norm, and one that calls for an alternative model of inquiry. This paper develops such a model. It shows that to capture how agents can learn something informative about the world, and not merely about language, even when inquiry concerns necessary facts, it is key to track not only how moves in discourse contribute public content to the conversational record, but also, crucially, how those moves are connected by coherence relations to one another and to the real-world situations they are about. This allows us to capture how utterances contribute determinate, public content while representing the information states of interlocutors who may have only partial access to the evidence and content of the conversation, without making their ignorance metalinguistic. It lets us give precise explanations of why some discourses can be transparently convincing in the conclusions they underwrite. The model thus precisifies the role of public context and shared content in anchoring an inquiry. It allows for imperfect tracking of linguistic contributions that are binding for how inquiry unfolds, and it allows an inquiry into the status of necessary truths to be informative while involving empirical, rather than metalinguistic, ignorance.
Free, publicly-accessible full text available September 23, 2026
- 
            Abstract 3D facial animation synthesis from audio has been a focus in recent years. However, most existing works are designed to map audio to visual content, providing limited knowledge about the relationship between emotion in audio and expressive facial animation. This work generates audio-matching facial animations with a specified emotion label. In such a task, we argue that separating the content from the audio is indispensable: the proposed model must learn to generate facial content from audio content while generating expressions from the specified emotion. We achieve this with an adaptive instance normalization module that isolates the content in the audio and combines it with the emotion embedding from the specified label. The joint content-emotion embedding is then used to generate 3D facial vertices and texture maps. We compare our method with state-of-the-art baselines, including facial segmentation-based and voice conversion-based disentanglement approaches. We also conduct a user study to evaluate the performance of emotion conditioning. The results indicate that our proposed method outperforms the baselines in animation quality and expression categorization accuracy.
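The adaptive instance normalization (AdaIN) mechanism mentioned in the abstract above can be illustrated with a minimal sketch: the content features are stripped of their own per-channel statistics and re-scaled with statistics derived from the emotion embedding. This is a generic AdaIN sketch, not the paper's actual implementation; the feature shapes and the `emotion_embed` input are hypothetical.

```python
import numpy as np

def adaptive_instance_norm(content, style, eps=1e-5):
    """Generic AdaIN: normalize content features per row, then re-scale
    and shift them with the style (here: emotion) statistics."""
    # Remove the content features' own mean and scale.
    mu_c = content.mean(axis=-1, keepdims=True)
    sigma_c = content.std(axis=-1, keepdims=True)
    normalized = (content - mu_c) / (sigma_c + eps)
    # Impose the style embedding's mean and scale.
    mu_s = style.mean(axis=-1, keepdims=True)
    sigma_s = style.std(axis=-1, keepdims=True)
    return sigma_s * normalized + mu_s

# Hypothetical per-frame audio content features and emotion embedding.
rng = np.random.default_rng(0)
audio_content = rng.standard_normal((64, 128))
emotion_embed = rng.standard_normal((64, 128))
joint = adaptive_instance_norm(audio_content, emotion_embed)
```

After the transform, each row of `joint` carries the emotion embedding's statistics while preserving the audio content's normalized structure, which is what makes the joint embedding usable for emotion-conditioned generation.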
- 
            We study resolution of ambiguity in prepositional phrase attachment by Large Language Models in the zero-shot setting. We evaluate a strong "plausibility" baseline derived from token probabilities of descriptions encoding alternative attachments, and explore possible improvements using additional token probabilities that reflect aspects of information structure. Error analysis suggests directions for more sophisticated tools, common-sense reasoning, world knowledge, and additional context to better resolve ambiguity.
Free, publicly-accessible full text available September 8, 2026
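The "plausibility" baseline described above can be sketched as scoring paraphrases that encode each attachment and picking the higher-probability one. The sketch below stands in an LLM's token log-probabilities with a toy lookup table; the scorer, the word scores, and the paraphrases are all hypothetical illustrations, not the paper's actual prompts or model.

```python
import math

def sentence_logprob(tokens, token_logprob):
    """Sum per-token log-probabilities under a scorer (a stand-in
    for an LLM's conditional token probabilities)."""
    return sum(token_logprob(t) for t in tokens)

def choose_attachment(verb_paraphrase, noun_paraphrase, token_logprob):
    """Return the attachment whose paraphrase the scorer finds more plausible."""
    v = sentence_logprob(verb_paraphrase, token_logprob)
    n = sentence_logprob(noun_paraphrase, token_logprob)
    return "verb" if v > n else "noun"

# Toy log-probability table (made up for illustration only).
toy = {"the": -1.0, "man": -2.5, "used": -3.0, "holding": -3.5,
       "a": -1.0, "telescope": -4.0}.get
scorer = lambda t: toy(t, -6.0)  # unseen words get a low score

# "saw the man with the telescope": verb vs. noun attachment paraphrases.
result = choose_attachment(
    "the man used a telescope".split(),
    "the man holding a telescope".split(),
    scorer)
```

In a real system, `token_logprob` would come from an LLM API's per-token log-probabilities over the full paraphrase.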
- 
            Lai, Kenneth; Wein, Shira (Eds.) Task-oriented dialogue (TOD) requires capabilities such as lookahead planning, reasoning, and belief state tracking, which continue to present challenges for end-to-end methods based on large language models (LLMs). As a possible method of addressing these concerns, we are exploring the integration of structured semantic representations with planning inferences. As a first step in this project, we describe an algorithm for generating Minimal Recursion Semantics (MRS) from dependency parses, obtained from a machine learning (ML) syntactic parser, and validate its performance on a challenging cooking domain. Specifically, we compare predicate-argument relations recovered by our approach with predicate-argument relations annotated using Abstract Meaning Representation (AMR). Our system is consistent with the gold standard in 94.1% of relations.
Free, publicly-accessible full text available August 4, 2026
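The evaluation described above compares predicate-argument relations derived from dependency parses against AMR annotations. A toy sketch of the general idea, mapping dependency triples to predicate-argument tuples, is shown below; the relation-to-role mapping and the example parse are hypothetical simplifications, not the paper's MRS algorithm.

```python
# Assumed mapping from dependency relations to semantic roles
# (illustrative only; a real MRS construction is far richer).
ROLE_MAP = {"nsubj": "ARG0", "obj": "ARG1", "iobj": "ARG2"}

def pred_args(dep_triples):
    """dep_triples: (head_lemma, relation, dependent_lemma) tuples.
    Returns the predicate-argument relations the mapping can recover."""
    rels = []
    for head, rel, dep in dep_triples:
        if rel in ROLE_MAP:
            rels.append((head, ROLE_MAP[rel], dep))
    return rels

# Hypothetical cooking-domain parse: "the chef stirs the sauce in a pan".
parse = [("stir", "nsubj", "chef"),
         ("stir", "obj", "sauce"),
         ("stir", "obl", "pan")]
recovered = pred_args(parse)
```

Relations outside the mapping (here the oblique `obl` modifier) are simply dropped, which is one source of the gap between recovered relations and a gold standard.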
- 
            Human communication often combines imagery and text into integrated presentations, especially online. In this paper, we show how image–text coherence relations can be used to model the pragmatics of image–text presentations in AI systems. In contrast to alternative frameworks that characterize image–text presentations in terms of the priority, relevance, or overlap of information across modalities, coherence theory postulates that each unit of a discourse stands in specific pragmatic relations to other parts of the discourse, with each relation involving its own information goals and inferential connections. Text accompanying an image may, for example, characterize what's visible in the image, explain how the image was obtained, offer the author's appraisal of or reaction to the depicted situation, and so forth. The advantage of coherence theory is that it provides a simple, robust, and effective abstraction of communicative goals for practical applications. To argue this, we review case studies describing coherence in image–text data sets, predicting coherence from few-shot annotations, and coherence models of image–text tasks such as caption generation and caption evaluation.
- 
            Effective teamwork depends on teammates' ability to maintain common ground: mutual knowledge about the relevant state of the world and the relevant status of teammates' actions and plans. This ability integrates diverse skills of reasoning and communication: agents can track common ground by recognizing and registering public updates to ongoing activity, but when this evidence is incomplete, agents may need to describe what they are doing or ask what others are doing. In this paper, we introduce an architecture for integrating these diverse skills to maintain common ground in human–AI teamwork. Our approach offers unique advantages of simplicity, modularity, and extensibility by leveraging generic tools for plan recognition, planning, natural language understanding and generation, and dialogue management. Worked examples illustrate how linguistic and practical reasoning complement each other in the realization of key interactive skills.
 An official website of the United States government
Full Text Available