skip to main content
US FlagAn official website of the United States government
dot gov icon
Official websites use .gov
A .gov website belongs to an official government organization in the United States.
https lock icon
Secure .gov websites use HTTPS
A lock ( lock ) or https:// means you've safely connected to the .gov website. Share sensitive information only on official, secure websites.


Title: Can Mamba Learn How to Learn? A Comparative Study on In-Context Learning Tasks
Award ID(s):
2046816
PAR ID:
10582331
Author(s) / Creator(s):
; ; ; ; ; ; ;
Publisher / Repository:
Proceedings of the 41st International Conference on Machine Learning
Date Published:
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. A hallmark of human intelligence is the ability to understand and influence other minds. Humans engage in inferential social learning (ISL) by using commonsense psychology to learn from others and help others learn. Recent advances in artificial intelligence (AI) are raising new questions about the feasibility of human–machine interactions that support such powerful modes of social learning. Here, we envision what it means to develop socially intelligent machines that can learn, teach, and communicate in ways that are characteristic of ISL. Rather than machines that simply predict human behaviours or recapitulate superficial aspects of human sociality (e.g. smiling, imitating), we should aim to build machines that can learn from human inputs and generate outputs for humans by proactively considering human values, intentions and beliefs. While such machines can inspire next-generation AI systems that learn more effectively from humans (as learners) and even help humans acquire new knowledge (as teachers), achieving these goals will also require scientific studies of its counterpart: how humans reason about machine minds and behaviours. We close by discussing the need for closer collaborations between the AI/ML and cognitive science communities to advance a science of both natural and artificial intelligence. This article is part of a discussion meeting issue ‘Cognitive artificial intelligence’. 
    more » « less
  2. Humans can learn complex functional relationships between variables from small amounts of data. In doing so, they draw on prior expectations about the form of these relationships. In three experiments, we show that people learn to adjust these expectations through experience, learning about the likely forms of the functions they will encounter. Previous work has used Gaussian processes—a statistical framework that extends Bayesian nonparametric approaches to regression—to model human function learning. We build on this work, modeling the process of learning to learn functions as a form of hierarchical Bayesian inference about the Gaussian process hyperparameters. 
    more » « less
  3. State-space models (SSMs), such as Mamba (Gu & Dao, 2023), have been proposed as alternatives to Transformer networks in language modeling, incorporating gating, convolutions, and input-dependent token selection to mitigate the quadratic cost of multi-head attention. Although SSMs exhibit competitive performance, their in-context learning (ICL) capabilities, a remarkable emergent property of modern language models that enables task execution without parameter optimization, remain less explored compared to Transformers. In this study, we evaluate the ICL performance of SSMs, focusing on Mamba, against Transformer models across various tasks. Our results show that SSMs perform comparably to Transformers in standard regression ICL tasks, while outperforming them in tasks like sparse parity learning. However, SSMs fall short in tasks involving non-standard retrieval functionality. To address these limitations, we introduce a hybrid model, MambaFormer, that combines Mamba with attention blocks, surpassing individual models in tasks where they struggle independently. Our findings suggest that hybrid architectures offer promising avenues for enhancing ICL in language models. 
    more » « less
  4. We explore the behavior of a standard convolutional neural net in a continual-learning setting that introduces visual classification tasks sequentially and requires the net to master new tasks while preserving mastery of previously learned tasks. This setting corresponds to that which human learners face as they acquire domain expertise serially, for example, as an individual studies a textbook. Through simulations involving sequences of ten related visual tasks, we find reason for optimism that nets will scale well as they advance from having a single skill to becoming multi-skill domain experts. We observe two key phenomena. First, forward facilitation—the accelerated learning of task n+1 having learned n previous tasks—grows with n. Second, backward interference— the forgetting of the n previous tasks when learning task n + 1—diminishes with n. Amplifying forward facilitation is the goal of research on metalearning, and attenuating backward interference is the goal of research on catastrophic forgetting. We find that both of these goals are attained simply through broader exposure to a domain. 
    more » « less