skip to main content
US FlagAn official website of the United States government
dot gov icon
Official websites use .gov
A .gov website belongs to an official government organization in the United States.
https lock icon
Secure .gov websites use HTTPS
A lock ( lock ) or https:// means you've safely connected to the .gov website. Share sensitive information only on official, secure websites.


Title: What is Learned in Visually Grounded Neural Syntax Acquisition
Visual features are a promising signal for learning bootstrap textual models. However, blackbox learning models make it difficult to isolate the specific contribution of visual components. In this analysis, we consider the case study of the Visually Grounded Neural Syntax Learner (Shi et al., 2019), a recent approach for learning syntax from a visual training signal. By constructing simplified versions of the model, we isolate the core factors that yield the model’s strong performance. Contrary to what the model might be capable of learning, we find significantly less expressive versions produce similar predictions and perform just as well, or even better. We also find that a simple lexical signal of noun concreteness plays the main role in the model’s predictions as opposed to more complex syntactic reasoning.  more » « less
Award ID(s):
1656998
PAR ID:
10197947
Author(s) / Creator(s):
; ; ;
Date Published:
Journal Name:
Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics
Page Range / eLocation ID:
2615 to 2635
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. Giove, Federico (Ed.)
    Approaches studying the dynamics of resting-state functional magnetic resonance imaging (rs-fMRI) activity often focus on time-resolved functional connectivity (tr-FC). While many tr-FC approaches have been proposed, most are linear approaches, e.g. computing the linear correlation at a timestep or within a window. In this work, we propose to use a generative non-linear deep learning model, a disentangled variational autoencoder (DSVAE), that factorizes out window-specific (context) information from timestep-specific (local) information. This has the advantage of allowing our model to capture differences at multiple temporal scales. We find that by separating out temporal scales our model’s window-specific embeddings, or as we refer to them, context embeddings, more accurately separate windows from schizophrenia patients and control subjects than baseline models and the standard tr-FC approach in a low-dimensional space. Moreover, we find that for individuals with schizophrenia, our model’s context embedding space is significantly correlated with both age and symptom severity. Interestingly, patients appear to spend more time in three clusters, one closer to controls which shows increased visual-sensorimotor, cerebellar-subcortical, and reduced cerebellar-visual functional network connectivity (FNC), an intermediate station showing increased subcortical-sensorimotor FNC, and one that shows decreased visual-sensorimotor, decreased subcortical-sensorimotor, and increased visual-subcortical domains. We verify that our model captures features that are complementary to - but not the same as - standard tr-FC features. Our model can thus help broaden the neuroimaging toolset in analyzing fMRI dynamics and shows potential as an approach for finding psychiatric links that are more sensitive to individual and group characteristics. 
    more » « less
  2. Simulink is a leading modelling language and data-flow environment for Model-Driven Engineering, prevalent in both industrial and educational contexts. Accordingly, there are many standalone publicly-available tools for analyzing and using Simulink models for various purposes. However, Simulink's model format has evolved to a new proprietary format, rendering many of these tools useless. To combat this, we devise an approach, SLX2MDL, that applies transformation rules based on Simulink syntax to transform the new SLX format models to models conforming to the legacy MDL syntax. The resulting approach enables backwards compatibility with existing tools, including previous versions of Simulink itself. Our 4-phase process includes analysis and extraction, merging and transformation of the common elements, transformation of the specialized Stateflow elements, and output production. We position this problem within the literature by comparing and contrasting similar, but insufficient, related approaches. We evaluate and validate SLX2MDL by applying it to 543 standard and publicly available models from an established and curated corpus. Our evaluation demonstrates 100% validity and correctness on these models based on functional equivalence. Further, we evaluate our approach's performance and find it consistent and scalable as model size and complexity increases. 
    more » « less
  3. We introduce the active audio-visual source separation problem, where an agent must move intelligently in order to better isolate the sounds coming from an object of interest in its environment. The agent hears multiple audio sources simultaneously (e.g., a person speaking down the hall in a noisy household) and it must use its eyes and ears to automatically separate out the sounds originating from a target object within a limited time budget. Towards this goal, we introduce a reinforcement learning approach that trains movement policies controlling the agent’s camera and microphone placement over time, guided by the improvement in predicted audio separation quality. We demonstrate our approach in scenarios motivated by both augmented reality (system is already co-located with the target object) and mobile robotics (agent begins arbitrarily far from the target object). Using state-of-the-art realistic audio-visual simulations in 3D environments, we demonstrate our model’s ability to find minimal movement sequences with maximal payoff for audio source separation. 
    more » « less
  4. Complex machine learning models for NLP are often brittle, making different predictions for input instances that are extremely similar semantically. To automatically detect this behavior for individual instances, we present semantically equivalent adversaries (SEAs) – semantic-preserving perturbations that induce changes in the model’s predictions. We generalize these adversaries into semantically equivalent adversarial rules (SEARs) – simple, universal replacement rules that induce adversaries on many instances. We demonstrate the usefulness and flexibility of SEAs and SEARs by detecting bugs in black-box state-of-the-art models for three domains: machine comprehension, visual question-answering, and sentiment analysis. Via user studies, we demonstrate that we generate high-quality local adversaries for more instances than humans, and that SEARs induce four times as many mistakes as the bugs discovered by human experts. SEARs are also actionable: retraining models using data augmentation significantly reduces bugs, while maintaining accuracy. 
    more » « less
  5. Model explanations that shed light on the model’s predictions are becoming a desired additional output of NLP models, alongside their predictions. Challenges in creating these explanations include making them trustworthy and faithful to the model’s predictions. In this work, we propose a novel framework for guiding model explanations by supervising them explicitly. To this end, our method, LEXPLAIN, uses task-related lexicons to directly supervise model explanations. This approach consistently improves the plausibility of model’s explanations without sacrificing performance on the task, as we demonstrate on sentiment analysis and toxicity detection. Our analyses show that our method also demotes spurious correlations (i.e., with respect to African American English dialect) on toxicity detection, improving fairness. 
    more » « less