skip to main content


Title: Universality and individuality in neural dynamics across large populations of recurrent networks
Many recent studies have employed task-based modeling with recurrent neural networks (RNNs) to infer the computational function of different brain regions. These models are often assessed by quantitatively comparing the low-dimensional neural dynamics of the model and the brain, for example using canonical correlation analysis (CCA). However, the nature of the detailed neurobiological inferences one can draw from such efforts remains elusive. For example, to what extent does training neural networks to solve simple tasks, prevalent in neuroscientific studies, uniquely determine the low-dimensional dynamics independent of neural architectures? Or alternatively, are the learned dynamics highly sensitive to different neural architectures? Knowing the answer to these questions has strong implications on whether and how to use task-based RNN modeling to understand brain dynamics. To address these foundational questions, we study populations of thousands of networks of commonly used RNN architectures trained to solve neuroscientifically motivated tasks and characterize their low-dimensional dynamics via CCA and nonlinear dynamical systems analysis. We find the geometry of the dynamics can be highly sensitive to different network architectures, and further find striking dissociations between geometric similarity as measured by CCA and network function, yielding a cautionary tale. Moreover, we find that while the geometry of neural dynamics can vary greatly across architectures, the underlying computational scaffold: the topological structure of fixed points, transitions between them, limit cycles, and linearized dynamics, often appears {\it universal} across all architectures. Overall, this analysis of universality and individuality across large populations of RNNs provides a much needed foundation for interpreting quantitative measures of dynamical similarity between RNN and brain dynamics.  more » « less
Award ID(s):
1845166
NSF-PAR ID:
10291268
Author(s) / Creator(s):
; ; ; ;
Date Published:
Journal Name:
Advances in neural information processing systems
Volume:
32
ISSN:
1049-5258
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. null (Ed.)
    Recurrent neural networks (RNNs) are a widely used tool for modeling sequential data, yet they are often treated as inscrutable black boxes. Given a trained recurrent network, we would like to reverse engineer it--to obtain a quantitative, interpretable description of how it solves a particular task. Even for simple tasks, a detailed understanding of how recurrent networks work, or a prescription for how to develop such an understanding, remains elusive. In this work, we use tools from dynamical systems analysis to reverse engineer recurrent networks trained to perform sentiment classification, a foundational natural language processing task. Given a trained network, we find fixed points of the recurrent dynamics and linearize the nonlinear system around these fixed points. Despite their theoretical capacity to implement complex, high-dimensional computations, we find that trained networks converge to highly interpretable, low-dimensional representations. In particular, the topological structure of the fixed points and corresponding linearized dynamics reveal an approximate line attractor within the RNN, which we can use to quantitatively understand how the RNN solves the sentiment analysis task. Finally, we find this mechanism present across RNN architectures (including LSTMs, GRUs, and vanilla RNNs) trained on multiple datasets, suggesting that our findings are not unique to a particular architecture or dataset. Overall, these results demonstrate that surprisingly universal and human interpretable computations can arise across a range of recurrent networks. 
    more » « less
  2. Abstract

    Recurrent neural networks (RNNs) are often used to model circuits in the brain and can solve a variety of difficult computational problems requiring memory, error correction, or selection (Hopfield, 1982; Maass et al., 2002; Maass, 2011). However, fully connected RNNs contrast structurally with their biological counterparts, which are extremely sparse (about 0.1%). Motivated by the neocortex, where neural connectivity is constrained by physical distance along cortical sheets and other synaptic wiring costs, we introduce locality masked RNNs (LM-RNNs) that use task-agnostic predetermined graphs with sparsity as low as 4%. We study LM-RNNs in a multitask learning setting relevant to cognitive systems neuroscience with a commonly used set of tasks, 20-Cog-tasks (Yang et al., 2019). We show through reductio ad absurdum that 20-Cog-tasks can be solved by a small pool of separated autapses that we can mechanistically analyze and understand. Thus, these tasks fall short of the goal of inducing complex recurrent dynamics and modular structure in RNNs. We next contribute a new cognitive multitask battery, Mod-Cog, consisting of up to 132 tasks that expands by about seven-fold the number of tasks and task complexity of 20-Cog-tasks. Importantly, while autapses can solve the simple 20-Cog-tasks, the expanded task set requires richer neural architectures and continuous attractor dynamics. On these tasks, we show that LM-RNNs with an optimal sparsity result in faster training and better data efficiency than fully connected networks.

     
    more » « less
  3. Gutkin, Boris S. (Ed.)
    Converging evidence suggests the brain encodes time in dynamic patterns of neural activity, including neural sequences, ramping activity, and complex dynamics. Most temporal tasks, however, require more than just encoding time, and can have distinct computational requirements including the need to exhibit temporal scaling, generalize to novel contexts, or robustness to noise. It is not known how neural circuits can encode time and satisfy distinct computational requirements, nor is it known whether similar patterns of neural activity at the population level can exhibit dramatically different computational or generalization properties. To begin to answer these questions, we trained RNNs on two timing tasks based on behavioral studies. The tasks had different input structures but required producing identically timed output patterns. Using a novel framework we quantified whether RNNs encoded two intervals using either of three different timing strategies: scaling, absolute, or stimulus-specific dynamics. We found that similar neural dynamic patterns at the level of single intervals, could exhibit fundamentally different properties, including, generalization, the connectivity structure of the trained networks, and the contribution of excitatory and inhibitory neurons. Critically, depending on the task structure RNNs were better suited for generalization or robustness to noise. Further analysis revealed different connection patterns underlying the different regimes. Our results predict that apparently similar neural dynamic patterns at the population level (e.g., neural sequences) can exhibit fundamentally different computational properties in regards to their ability to generalize to novel stimuli and their robustness to noise—and that these differences are associated with differences in network connectivity and distinct contributions of excitatory and inhibitory neurons. We also predict that the task structure used in different experimental studies accounts for some of the experimentally observed variability in how networks encode time. 
    more » « less
  4. Recurrent neural networks (RNNs), temporal convolutions, and neural differential equations (NDEs) are popular families of deep learning models for time-series data, each with unique strengths and tradeoffs in modeling power and computational efficiency. We introduce a simple sequence model inspired by control systems that generalizes these approaches while addressing their shortcomings. The Linear State-Space Layer (LSSL) maps a sequence u↦y by simply simulating a linear continuous-time state-space representation ˙x=Ax+Bu,y=Cx+Du. Theoretically, we show that LSSL models are closely related to the three aforementioned families of models and inherit their strengths. For example, they generalize convolutions to continuous-time, explain common RNN heuristics, and share features of NDEs such as time-scale adaptation. We then incorporate and generalize recent theory on continuous-time memorization to introduce a trainable subset of structured matrices A that endow LSSLs with long-range memory. Empirically, stacking LSSL layers into a simple deep neural network obtains state-of-the-art results across time series benchmarks for long dependencies in sequential image classification, real-world healthcare regression tasks, and speech. On a difficult speech classification task with length-16000 sequences, LSSL outperforms prior approaches by 24 accuracy points, and even outperforms baselines that use hand-crafted features on 100x shorter sequences. 
    more » « less
  5. Biological agents do not have infinite resources to learn new things. For this reason, a central aspect of human learning is the ability to recycle previously acquired knowledge in a way that allows for faster, less resource-intensive acquisition of new skills. In spite of that, how neural networks in the brain leverage existing knowledge to learn new computations is not well understood. In this work, we study this question in artificial recurrent neural networks (RNNs) trained on a corpus of commonly used neuroscience tasks. Combining brain-inspired inductive biases we call functional and structural, we propose a system that learns new tasks by building on top of pre-trained latent dynamics organised into separate recurrent modules. These modules, acting as prior knowledge acquired previously through evolution or development, are pre-trained on the statistics of the full corpus of tasks so as to be independent and maximally informative. The resulting model, we call a Modular Latent Primitives (MoLaP) network, allows for learning multiple tasks while keeping parameter counts, and updates, low. We also show that the skills acquired with our approach are more robust to a broad range of perturbations compared to those acquired with other multi-task learning strategies, and that generalisation to new tasks is facilitated. This work offers a new perspective on achieving efficient multi-task learning in the brain, illustrating the benefits of leveraging pre-trained latent dynamical primitives. 
    more » « less