Note: When you click a Digital Object Identifier (DOI) link, you will be taken to an external site maintained by the publisher. Some full-text articles may not yet be available without charge during the embargo (administrative interval).
Some links on this page may take you to non-federal websites. Their policies may differ from those of this site.
- 
            ABSTRACT Joint inquiry requires agents to exchange public content about some target domain, which in turn requires them to track which content a linguistic form contributes to a conversation. Often, however, the inquiry delivers a necessary truth. For example, if we are inquiring whether a particular bird, Tweety, is a woodpecker, and discover that it is, then our inquiry concludes in a necessity, and the form "Tweety is a woodpecker" expresses this necessary truth. Still, whether Tweety is a woodpecker seems a perfectly legitimate object of study, and the answers we accrue can be informative. Yet the dominant model of inquiry (Stalnaker, 1978, 1984) treats this situation as linguistically deviant and diagnoses our ignorance and subsequent discovery as metalinguistic: we were ignorant, and ultimately discovered something, about the meaning of our terms. Rather than a linguistic deviation, we argue this situation is the norm, and one that calls for an alternative model of inquiry. This paper develops such a model. It shows that to capture how agents can learn something informative about the world, and not merely about language, even when inquiry concerns necessary facts, it is key to track not only how moves in discourse contribute public content to the conversational record, but also, crucially, how those moves are connected by coherence relations to one another and to the real-world situations they are about. This allows us to capture how utterances contribute determinate, public content while representing the information states of interlocutors who may have only partial access to the evidence and content of the conversation, without making their ignorance metalinguistic. It lets us give precise explanations of why some discourses can be transparently convincing in the conclusions they underwrite. The model thus precisifies the role of public context and shared content in anchoring an inquiry. It allows for imperfect tracking of linguistic contributions that are binding for how inquiry unfolds, and it allows an inquiry into the status of necessary truths to be informative while involving empirical, rather than metalinguistic, ignorance.
Free, publicly-accessible full text available September 23, 2026
- 
            Abstract 3D facial animation synthesis from audio has been a focus in recent years. However, most existing works are designed to map audio to visual content, providing limited knowledge about the relationship between emotion in audio and expressive facial animation. This work generates audio-matching facial animations with a specified emotion label. In such a task, we argue that separating the content from the audio is indispensable: the proposed model must learn to generate facial content from audio content while generating expressions from the specified emotion. We achieve this with an adaptive instance normalization module that isolates the content in the audio and combines it with the emotion embedding from the specified label. The joint content-emotion embedding is then used to generate 3D facial vertices and texture maps. We compare our method with state-of-the-art baselines, including facial segmentation-based and voice conversion-based disentanglement approaches. We also conduct a user study to evaluate the performance of emotion conditioning. The results indicate that our proposed method outperforms the baselines in animation quality and expression categorization accuracy.
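The adaptive instance normalization (AdaIN) mechanism mentioned in the abstract above can be illustrated with a minimal sketch: the content features are stripped of their own per-channel statistics and re-scaled with statistics derived from the emotion embedding. This is a generic AdaIN sketch, not the paper's actual implementation; the feature shapes and the `emotion_embed` input are hypothetical.

```python
import numpy as np

def adaptive_instance_norm(content, style, eps=1e-5):
    """Generic AdaIN: normalize content features per row, then re-scale
    and shift them with the style (here: emotion) statistics."""
    # Remove the content features' own mean and scale.
    mu_c = content.mean(axis=-1, keepdims=True)
    sigma_c = content.std(axis=-1, keepdims=True)
    normalized = (content - mu_c) / (sigma_c + eps)
    # Impose the style embedding's mean and scale.
    mu_s = style.mean(axis=-1, keepdims=True)
    sigma_s = style.std(axis=-1, keepdims=True)
    return sigma_s * normalized + mu_s

# Hypothetical per-frame audio content features and emotion embedding.
rng = np.random.default_rng(0)
audio_content = rng.standard_normal((64, 128))
emotion_embed = rng.standard_normal((64, 128))
joint = adaptive_instance_norm(audio_content, emotion_embed)
```

After the transform, each row of `joint` carries the emotion embedding's statistics while preserving the audio content's normalized structure, which is what makes the joint embedding usable for emotion-conditioned generation.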
- 
            We study resolution of ambiguity in prepositional phrase attachment by Large Language Models in the zero-shot setting. We evaluate a strong "plausibility" baseline derived from token probabilities of descriptions encoding alternative attachments, and explore possible improvements using additional token probabilities that reflect aspects of information structure. Error analysis suggests directions for more sophisticated tools, common-sense reasoning, world knowledge, and additional context to better resolve ambiguity.
Free, publicly-accessible full text available September 8, 2026
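The "plausibility" baseline described above can be sketched as scoring paraphrases that encode each attachment and picking the higher-probability one. The sketch below stands in an LLM's token log-probabilities with a toy lookup table; the scorer, the word scores, and the paraphrases are all hypothetical illustrations, not the paper's actual prompts or model.

```python
import math

def sentence_logprob(tokens, token_logprob):
    """Sum per-token log-probabilities under a scorer (a stand-in
    for an LLM's conditional token probabilities)."""
    return sum(token_logprob(t) for t in tokens)

def choose_attachment(verb_paraphrase, noun_paraphrase, token_logprob):
    """Return the attachment whose paraphrase the scorer finds more plausible."""
    v = sentence_logprob(verb_paraphrase, token_logprob)
    n = sentence_logprob(noun_paraphrase, token_logprob)
    return "verb" if v > n else "noun"

# Toy log-probability table (made up for illustration only).
toy = {"the": -1.0, "man": -2.5, "used": -3.0, "holding": -3.5,
       "a": -1.0, "telescope": -4.0}.get
scorer = lambda t: toy(t, -6.0)  # unseen words get a low score

# "saw the man with the telescope": verb vs. noun attachment paraphrases.
result = choose_attachment(
    "the man used a telescope".split(),
    "the man holding a telescope".split(),
    scorer)
```

In a real system, `token_logprob` would come from an LLM API's per-token log-probabilities over the full paraphrase.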
- 
            Lai, Kenneth; Wein, Shira (Eds.) Task-oriented dialogue (TOD) requires capabilities such as lookahead planning, reasoning, and belief state tracking, which continue to present challenges for end-to-end methods based on large language models (LLMs). As a possible method of addressing these concerns, we are exploring the integration of structured semantic representations with planning inferences. As a first step in this project, we describe an algorithm for generating Minimal Recursion Semantics (MRS) from dependency parses, obtained from a machine learning (ML) syntactic parser, and validate its performance on a challenging cooking domain. Specifically, we compare predicate-argument relations recovered by our approach with predicate-argument relations annotated using Abstract Meaning Representation (AMR). Our system is consistent with the gold standard in 94.1% of relations.
Free, publicly-accessible full text available August 4, 2026
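The evaluation described above compares predicate-argument relations derived from dependency parses against AMR annotations. A toy sketch of the general idea, mapping dependency triples to predicate-argument tuples, is shown below; the relation-to-role mapping and the example parse are hypothetical simplifications, not the paper's MRS algorithm.

```python
# Assumed mapping from dependency relations to semantic roles
# (illustrative only; a real MRS construction is far richer).
ROLE_MAP = {"nsubj": "ARG0", "obj": "ARG1", "iobj": "ARG2"}

def pred_args(dep_triples):
    """dep_triples: (head_lemma, relation, dependent_lemma) tuples.
    Returns the predicate-argument relations the mapping can recover."""
    rels = []
    for head, rel, dep in dep_triples:
        if rel in ROLE_MAP:
            rels.append((head, ROLE_MAP[rel], dep))
    return rels

# Hypothetical cooking-domain parse: "the chef stirs the sauce in a pan".
parse = [("stir", "nsubj", "chef"),
         ("stir", "obj", "sauce"),
         ("stir", "obl", "pan")]
recovered = pred_args(parse)
```

Relations outside the mapping (here the oblique `obl` modifier) are simply dropped, which is one source of the gap between recovered relations and a gold standard.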
- 
            Human communication often combines imagery and text into integrated presentations, especially online. In this paper, we show how image–text coherence relations can be used to model the pragmatics of image–text presentations in AI systems. In contrast to alternative frameworks that characterize image–text presentations in terms of the priority, relevance, or overlap of information across modalities, coherence theory postulates that each unit of a discourse stands in specific pragmatic relations to other parts of the discourse, with each relation involving its own information goals and inferential connections. Text accompanying an image may, for example, characterize what's visible in the image, explain how the image was obtained, offer the author's appraisal of or reaction to the depicted situation, and so forth. The advantage of coherence theory is that it provides a simple, robust, and effective abstraction of communicative goals for practical applications. To argue this, we review case studies describing coherence in image–text data sets, predicting coherence from few-shot annotations, and coherence models of image–text tasks such as caption generation and caption evaluation.
- 
            Effective teamwork depends on teammates' ability to maintain common ground: mutual knowledge about the relevant state of the world and the relevant status of teammates' actions and plans. This ability integrates diverse skills of reasoning and communication: agents can track common ground by recognizing and registering public updates to ongoing activity, but when this evidence is incomplete, agents may need to describe what they are doing or ask what others are doing. In this paper, we introduce an architecture for integrating these diverse skills to maintain common ground in human–AI teamwork. Our approach offers unique advantages of simplicity, modularity, and extensibility by leveraging generic tools for plan recognition, planning, natural language understanding and generation, and dialogue management. Worked examples illustrate how linguistic and practical reasoning complement each other in the realization of key interactive skills.
 An official website of the United States government
Full Text Available