

Title: InFillmore: Frame-Guided Language Generation with Bidirectional Context
We propose a structured extension to bidirectional-context conditional language generation, or “infilling,” inspired by Frame Semantic theory (Fillmore, 1976). Guidance is provided through two approaches: (1) model fine-tuning, conditioning directly on observed symbolic frames, and (2) a novel extension to disjunctive lexically constrained decoding that leverages frame semantic lexical units. Automatic and human evaluations confirm that frame-guided generation allows for explicit manipulation of intended infill semantics, with minimal loss in distinguishability from human-generated text. Our methods flexibly apply to a variety of use scenarios, and we provide an interactive web demo.
Award ID(s):
2020969
NSF-PAR ID:
10302086
Author(s) / Creator(s):
; ; ; ;
Date Published:
Journal Name:
Proceedings of the 10th Conference on Lexical and Computational Semantics
Page Range / eLocation ID:
129-142
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
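The disjunctive lexically constrained decoding described in the abstract above can be illustrated with a short sketch: each frame is associated with a set of lexical units, and an infill satisfies the constraint if it contains at least one of them. The Python snippet below is a minimal illustration, not the paper's implementation; the tiny frame inventory and the reranking helper are assumptions made for this example.

    # Minimal sketch (not the authors' implementation) of a disjunctive lexical
    # constraint: an infill satisfies the constraint for a frame if it contains
    # at least one of that frame's lexical units. The inventory below is a tiny
    # hypothetical stand-in for FrameNet lexical units.
    FRAME_LEXICAL_UNITS = {
        "Motion": {"move", "moved", "travel", "traveled", "go", "went"},
        "Commerce_buy": {"buy", "bought", "purchase", "purchased"},
    }

    def satisfies_disjunctive_constraint(infill, frame):
        """True if any lexical unit evoking `frame` occurs in the infill."""
        tokens = set(infill.lower().split())
        return bool(tokens & FRAME_LEXICAL_UNITS[frame])

    def rerank_candidates(candidates, frame):
        """Prefer beam candidates that satisfy the frame constraint; keep order otherwise."""
        return sorted(candidates, key=lambda c: not satisfies_disjunctive_constraint(c, frame))

    beams = ["she stayed at home all day", "she traveled to the coast by train"]
    print(rerank_candidates(beams, "Motion")[0])  # -> "she traveled to the coast by train"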
More Like this
  1. Language-guided smart systems can help to design next-generation human-machine interactive applications. Dense text description is one such research area, in which systems learn the semantic knowledge and visual features of each video frame and map them to descriptions of the video's most relevant subjects and events. In this paper, we consider untrimmed sports videos as our case study. Generating dense descriptions in the sports domain to supplement journalistic work, without relying on commentators and experts, requires more investigation. Motivated by this, we propose an end-to-end automated text generator, SpecTextor, that learns semantic features from untrimmed videos of sports games and generates the associated descriptive text. The proposed approach treats the video as a sequence of frames and generates words sequentially. After splitting videos into frames, we use a pre-trained VGG-16 model to extract and encode features for each frame. With these encoded frames, we posit a Long Short-Term Memory (LSTM) based attention-decoder pipeline that leverages a soft-attention mechanism to map the semantic features to relevant textual descriptions and generate an explanation of the game. Because developing a comprehensive description of the game warrants training on dense, time-stamped captions, we leverage two available public datasets: ActivityNet Captions and Microsoft Video Description. In addition, we use two decoding algorithms, beam search and greedy search, and compute two evaluation metrics, BLEU and METEOR.
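    To make the frame-encoding and attention-decoding pipeline described above concrete, here is a minimal PyTorch sketch, not the SpecTextor code: a pre-trained VGG-16 encodes sampled frames into fc7 features, and an LSTM cell with soft attention over those per-frame features predicts the next word. All dimensions and class names are illustrative assumptions.

        import torch
        import torch.nn as nn
        from torchvision.models import vgg16, VGG16_Weights

        class FrameEncoder(nn.Module):
            """Encode video frames with a pre-trained VGG-16 (fc7, 4096-d per frame)."""
            def __init__(self):
                super().__init__()
                backbone = vgg16(weights=VGG16_Weights.DEFAULT)
                backbone.classifier = backbone.classifier[:-1]  # drop the final classification layer
                self.backbone = backbone.eval()

            @torch.no_grad()
            def forward(self, frames):            # frames: (num_frames, 3, 224, 224)
                return self.backbone(frames)      # -> (num_frames, 4096)

        class AttentionDecoder(nn.Module):
            """LSTM decoder with soft attention over per-frame features."""
            def __init__(self, vocab_size, feat_dim=4096, hidden=512, emb=256):
                super().__init__()
                self.embed = nn.Embedding(vocab_size, emb)
                self.attn = nn.Linear(feat_dim + hidden, 1)       # one attention score per frame
                self.lstm = nn.LSTMCell(emb + feat_dim, hidden)
                self.out = nn.Linear(hidden, vocab_size)

            def forward(self, feats, prev_word, state):
                h, c = state                                        # each (1, hidden)
                scores = self.attn(torch.cat([feats, h.expand(feats.size(0), -1)], dim=1))
                alpha = torch.softmax(scores, dim=0)                # soft-attention weights over frames
                context = (alpha * feats).sum(dim=0, keepdim=True)  # weighted video context (1, feat_dim)
                h, c = self.lstm(torch.cat([self.embed(prev_word), context], dim=1), (h, c))
                return self.out(h), (h, c)                          # logits over the vocabulary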
  2. Dutch, Rebecca Ellis. (Ed.)
    ABSTRACT Opium poppy mosaic virus (OPMV) is a recently discovered umbravirus in the family Tombusviridae. OPMV has a plus-sense genomic RNA (gRNA) of 4,241 nucleotides (nt) from which replication protein p35 and p35 extension product p98, the RNA-dependent RNA polymerase (RdRp), are expressed. Movement proteins p27 (long distance) and p28 (cell to cell) are expressed from a 1,440-nt subgenomic RNA (sgRNA2). A highly conserved structure was identified just upstream from the sgRNA2 transcription start site in all umbraviruses, which includes a carmovirus consensus sequence, denoting generation by an RdRp-mediated mechanism. OPMV also has a second sgRNA of 1,554 nt (sgRNA1) that starts just downstream of a canonical exoribonuclease-resistant sequence (xrRNAD). sgRNA1 codes for a 30-kDa protein in vitro that is in frame with p28 and cannot be synthesized in other umbraviruses. Eliminating sgRNA1 or truncating the p30 open reading frame (ORF) without affecting p28 substantially reduced accumulation of OPMV gRNA, suggesting a functional role for the protein. The 652-nt 3′ untranslated region of OPMV contains two 3′ cap-independent translation enhancers (3′ CITEs), a T-shaped structure (TSS) near its 3′ end, and a Barley yellow dwarf virus-like translation element (BTE) in the central region. Only the BTE is functional in luciferase reporter constructs containing gRNA or sgRNA2 5′ sequences in vivo, which differs from how umbravirus 3′ CITEs were used in a previous study. Similarly to most 3′ CITEs, the OPMV BTE links to the 5′ end via a long-distance RNA-RNA interaction. Analysis of 14 BTEs revealed additional conserved sequences and structural features beyond the previously identified 17-nt conserved sequence. IMPORTANCE Opium poppy mosaic virus (OPMV) is an umbravirus in the family Tombusviridae. We determined that OPMV accumulates two similarly sized subgenomic RNAs (sgRNAs), with the smaller known to code for proteins expressed from overlapping open reading frames. The slightly larger sgRNA1 has a 5′ end just upstream from a previously predicted xrRNAD site, identifying this sgRNA as an unusually long product produced by exoribonuclease trimming. Although four umbraviruses have similar predicted xrRNAD sites, only sgRNA1 of OPMV can code for a protein that is an extension product of umbravirus ORF4. Inability to generate the sgRNA or translate this protein was associated with reduced gRNA accumulation in vivo. We also characterized the OPMV BTE structure, a 3′ cap-independent translation enhancer (3′ CITE). Comparisons of 13 BTEs with the OPMV BTE revealed additional stretches of sequence similarity beyond the 17-nt signature sequence, as well as conserved structural features not previously recognized in these 3′ CITEs.
  3. Chenyang Lu (Ed.)

    The design and analysis of multi-agent human cyber-physical systems in safety-critical or industry-critical domains calls for an adequate semantic foundation capable of exhaustively and rigorously describing all emergent effects in the joint dynamic behavior of the agents that are relevant to their safety and well-behavior. We present such a semantic foundation. The framework goes beyond previous approaches by extending the agent-local dynamic state, which has traditionally comprised state components under the agent's direct control and beliefs about other agents (as previously suggested for understanding cooperative as well as rational behavior), with agent-local evidence and beliefs about the overall cooperative, competitive, or coopetitive game structure. We argue that this extension is necessary for rigorously analyzing systems of human cyber-physical systems, because humans are known to employ cognitive replacement models of system dynamics that are both non-stationary and potentially incongruent. These replacement models induce visible and potentially harmful effects on the agents' joint emergent behavior and on their interaction with cyber-physical system components.

     
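    As a rough illustration of the extended agent-local state described above, the following sketch is purely illustrative and not the paper's formalism; all field and type names are assumptions. It keeps, alongside directly controlled state and beliefs about other agents, evidence and belief about the overall game structure.

        from dataclasses import dataclass, field
        from enum import Enum
        from typing import Dict

        class GameStructure(Enum):
            COOPERATIVE = "cooperative"
            COMPETITIVE = "competitive"
            COOPETITIVE = "coopetitive"

        @dataclass
        class AgentLocalState:
            controlled: Dict[str, float]                       # state components under the agent's direct control
            beliefs_about_others: Dict[str, Dict[str, float]]  # per-agent belief over the others' states
            game_evidence: Dict[GameStructure, float]          # accumulated evidence for each game-structure hypothesis
            game_belief: Dict[GameStructure, float] = field(default_factory=dict)

            def update_game_belief(self) -> None:
                """Normalize accumulated evidence into a belief over game structures."""
                total = sum(self.game_evidence.values()) or 1.0
                self.game_belief = {g: e / total for g, e in self.game_evidence.items()}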
  4. The goal of this article is to enable robots to perform robust task execution following human instructions in partially observable environments. A robot’s ability to interpret and execute commands is fundamentally tied to its semantic world knowledge. Commonly, robots use exteroceptive sensors, such as cameras or LiDAR, to detect entities in the workspace and infer their visual properties and spatial relationships. However, semantic world properties are often visually imperceptible. We posit the use of non-exteroceptive modalities, including physical proprioception, factual descriptions, and domain knowledge, as mechanisms for inferring semantic properties of objects. We introduce a probabilistic model that fuses linguistic knowledge with visual and haptic observations into a cumulative belief over latent world attributes, in order to infer the meaning of instructions and execute the instructed tasks in a manner robust to erroneous, noisy, or contradictory evidence. In addition, we provide a method that allows the robot to communicate knowledge dissonance back to the human as a means of correcting errors in the operator’s world model. Finally, we propose an efficient framework that anticipates possible linguistic interactions and infers the associated groundings for the current world state, thereby bootstrapping both language understanding and generation. We present experiments on manipulators for tasks that require inference over partially observed semantic properties, evaluating our framework’s ability to exploit expressed information and knowledge bases to facilitate convergence, and to generate statements that correct declared facts observed to be inconsistent with the robot’s estimate of object properties.
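    A minimal sketch of the belief-fusion idea described above, not the paper's model: a belief over a latent, visually imperceptible property is updated with likelihoods from different modalities, so that contradictory evidence yields a tempered posterior rather than a hard failure. The hypotheses and likelihood values below are invented for illustration.

        HYPOTHESES = ("full", "empty")   # a latent, visually imperceptible property of a container

        def update_belief(belief, likelihood):
            """One Bayesian update: posterior proportional to prior times likelihood, then normalized."""
            posterior = {h: belief[h] * likelihood[h] for h in HYPOTHESES}
            z = sum(posterior.values()) or 1.0
            return {h: p / z for h, p in posterior.items()}

        belief = {"full": 0.5, "empty": 0.5}        # uninformative prior
        haptic = {"full": 0.8, "empty": 0.2}        # the lifted container felt heavy
        language = {"full": 0.3, "empty": 0.7}      # the operator stated it had been emptied
        for observation in (haptic, language):
            belief = update_belief(belief, observation)
        print(belief)                               # contradictory evidence gives an intermediate estimate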
  5. Object proposal generation serves as a standard pre-processing step in Vision-Language (VL) tasks (image captioning, visual question answering, etc.). The performance of object proposals generated for VL tasks is currently evaluated across all available annotations, a protocol that we show is misaligned: higher scores do not necessarily correspond to improved performance on downstream VL tasks. Our work studies this phenomenon and explores the effectiveness of semantic grounding in mitigating its effects. To this end, we propose evaluating object proposals against only a subset of the available annotations, selected by thresholding an annotation importance score. The importance of object annotations to VL tasks is quantified by extracting relevant semantic information from the text describing the image. We show that our method is consistent and demonstrates greatly improved alignment with annotations selected by image captioning metrics and by human annotation when compared against existing techniques. Lastly, as a use case, we compare current detectors used in the Scene Graph Generation (SGG) benchmark, which serves as an example of when traditional object proposal evaluation techniques are misaligned.
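    The thresholded-evaluation idea can be sketched as follows (a minimal illustration, not the paper's code; the labels, importance scores, and recall metric are stand-ins): annotations whose importance score clears a threshold are retained, and proposals are scored against only that subset.

        def iou(box_a, box_b):
            """Intersection-over-union of two (x1, y1, x2, y2) boxes."""
            x1, y1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
            x2, y2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
            inter = max(0, x2 - x1) * max(0, y2 - y1)
            area = lambda b: (b[2] - b[0]) * (b[3] - b[1])
            union = area(box_a) + area(box_b) - inter
            return inter / union if union else 0.0

        def filter_annotations(annotations, importance, threshold=0.5):
            """Keep only annotations whose importance score (derived from image text) meets the threshold."""
            return [a for a in annotations if importance.get(a["label"], 0.0) >= threshold]

        def recall_over_subset(proposals, annotations, iou_thresh=0.5):
            """Fraction of retained annotations matched by at least one proposal."""
            if not annotations:
                return 0.0
            hits = sum(any(iou(p, a["box"]) >= iou_thresh for p in proposals) for a in annotations)
            return hits / len(annotations)

        annotations = [{"label": "person", "box": (10, 10, 50, 90)}, {"label": "wall", "box": (0, 0, 5, 5)}]
        importance = {"person": 0.9, "wall": 0.1}   # e.g. "person" is mentioned in the caption, "wall" is not
        proposals = [(12, 8, 52, 88)]
        print(recall_over_subset(proposals, filter_annotations(annotations, importance)))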