Title: Mutual exclusivity as a challenge for deep neural networks
Strong inductive biases allow children to learn in fast and adaptable ways. Children use the mutual exclusivity (ME) bias to help disambiguate how words map to referents, assuming that if an object has one label then it does not need another. In this paper, we investigate whether or not vanilla neural architectures have an ME bias, demonstrating that they lack this learning assumption. Moreover, we show that their inductive biases are poorly matched to lifelong learning formulations of classification and translation. We demonstrate that there is a compelling case for designing task-general neural networks that learn through mutual exclusivity, which remains an open challenge.
Journal Name:
Advances in Neural Information Processing Systems 33 (NeurIPS 2020)
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. Infant language learners are faced with the difficult inductive problem of determining how new words map to novel or known objects in their environment. Bayesian inference models have been successful at using the sparse information available in natural child‐directed speech to build candidate lexicons and infer speakers’ referential intentions. We begin by asking how a Bayesian model optimized for monolingual input (the Intentional Model; Frank et al., 2009) generalizes to new monolingual or bilingual corpora and find that, especially in the case of the bilingual input, the model shows a significant decrease in performance. In the next experiment, we propose the ME Model, a modified Bayesian model, which approximates infants’ mutual exclusivity bias to support the differential demands of monolingual and bilingual learning situations. The extended model is assessed using the same corpora of real child‐directed speech, showing that its performance is more robust against varying input and less dependent than the Intentional Model on optimization of its parsimony parameter. We argue that both monolingual and bilingual demands on word learning are important considerations for a computational model, as they can yield significantly different results than when only one such context is considered.
  2. TypeScript is a widely used optionally-typed language where developers can adopt “pay as you go” typing: they can add types as desired, and benefit from static typing. The “type annotation tax” or manual effort required to annotate new or existing TypeScript can be reduced by a variety of automatic methods. Probabilistic machine-learning (ML) approaches work quite well. ML approaches use different inductive biases, ranging from simple token sequences to complex graphical neural network (GNN) models capturing syntax and semantic relations. More sophisticated inductive biases are hand-engineered to exploit the formal nature of software. Rather than deploying fancy inductive biases for code, can we just use “big data” to learn natural patterns relevant to typing? We find evidence suggesting that this is the case. We present TypeBert, demonstrating that even with simple token-sequence inductive bias used in BERT-style models and enough data, type-annotation performance of the most sophisticated models can be surpassed. 
  3. To adapt to their environments, animals must generate behaviors that are closely aligned to a rapidly changing sensory world. However, behavioral states such as foraging or courtship typically persist over long time scales to ensure proper execution. It remains unclear how neural circuits generate persistent behavioral states while maintaining the flexibility to select among alternative states when the sensory context changes. Here, we elucidate the functional architecture of a neural circuit controlling the choice between roaming and dwelling states, which underlie exploration and exploitation during foraging in C. elegans. By imaging ensemble-level neural activity in freely moving animals, we identify stereotyped changes in circuit activity corresponding to each behavioral state. Combining circuit-wide imaging with genetic analysis, we find that mutual inhibition between two antagonistic neuromodulatory systems underlies the persistence and mutual exclusivity of the neural activity patterns observed in each state. Through machine learning analysis and circuit perturbations, we identify a sensory processing neuron that can transmit information about food odors to both the roaming and dwelling circuits and bias the animal towards different states in different sensory contexts, giving rise to context-appropriate state transitions. Our findings reveal a potentially general circuit architecture that enables flexible, sensory-driven control of persistent behavioral states.
  4. Dataset bias and spurious correlations can significantly impair generalization in deep neural networks. Many prior efforts have addressed this problem using either alternative loss functions or sampling strategies that focus on rare patterns. We propose a new direction: modifying the network architecture to impose inductive biases that make the network robust to dataset bias. Specifically, we propose OccamNets, which are biased to favor simpler solutions by design. OccamNets have two inductive biases. First, they are biased to use as little network depth as needed for an individual example. Second, they are biased toward using fewer image locations for prediction. While OccamNets are biased toward simpler hypotheses, they can learn more complex hypotheses if necessary. In experiments, OccamNets outperform or rival state-of-the-art methods run on architectures that do not incorporate these inductive biases. Furthermore, we demonstrate that when state-of-the-art debiasing methods are combined with OccamNets, results improve further.
  5. When children learn their native language, they tend to treat objects as if they only have one label—a principle known as mutual exclusivity. However, bilingual children are faced with a different cognitive challenge—they need to learn to associate two labels with one object. In the present study, we compared bilingual and monolingual 24-month-olds' performance on a challenging and semi-naturalistic forced-choice referent selection task and retention test. Overall, both language groups performed similarly on referent selection but differed on retention. Specifically, while monolingual infants showed some retention, bilingual infants performed at chance and significantly worse than their monolingual peers. 