skip to main content


Title: Recurrent neural network models for working memory of continuous variables: activity manifolds, connectivity patterns, and dynamic codes
Many daily activities and psychophysical experiments involve keeping multiple items in working memory. When items take continuous values (e.g., orientation, contrast, length, loudness) they must be stored in a continuous structure of appropriate dimensions. We investigate how this structure is represented in neural circuits by training recurrent networks to report two previously shown stimulus orientations. We find the activity manifold for the two orientations resembles a Clifford torus. Although a Clifford and standard torus (the surface of a donut) are topologically equivalent, they have important functional differences. A Clifford torus treats the two orientations equally and keeps them in orthogonal subspaces, as demanded by the task, whereas a standard torus does not. We find and characterize the connectivity patterns that support the Clifford torus. Moreover, in addition to attractors that store information via persistent activity, our networks also use a dynamic code where units change their tuning to prevent new sensory input from overwriting the previously stored one. We argue that such dynamic codes are generally required whenever multiple inputs enter a memory system via shared connections. Finally, we apply our framework to a human psychophysics experiment in which subjects reported two remembered orientations. By varying the training conditions of the RNNs, we test and support the hypothesis that human behavior is a product of both neural noise and reliance on the more stable and behaviorally relevant memory of the ordinal relationship between the two orientations. This suggests that suitable inductive biases in RNNs are important for uncovering how the human brain implements working memory. Together, these results offer an understanding of the neural computations underlying a class of visual decoding tasks, bridging the scales from human behavior to synaptic connectivity.  more » « less
Award ID(s):
1754211
NSF-PAR ID:
10323618
Author(s) / Creator(s):
; ; ;
Date Published:
Journal Name:
ArXivorg
ISSN:
2331-8422
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. Abstract

    Recurrent neural networks (RNNs) are often used to model circuits in the brain and can solve a variety of difficult computational problems requiring memory, error correction, or selection (Hopfield, 1982; Maass et al., 2002; Maass, 2011). However, fully connected RNNs contrast structurally with their biological counterparts, which are extremely sparse (about 0.1%). Motivated by the neocortex, where neural connectivity is constrained by physical distance along cortical sheets and other synaptic wiring costs, we introduce locality masked RNNs (LM-RNNs) that use task-agnostic predetermined graphs with sparsity as low as 4%. We study LM-RNNs in a multitask learning setting relevant to cognitive systems neuroscience with a commonly used set of tasks, 20-Cog-tasks (Yang et al., 2019). We show through reductio ad absurdum that 20-Cog-tasks can be solved by a small pool of separated autapses that we can mechanistically analyze and understand. Thus, these tasks fall short of the goal of inducing complex recurrent dynamics and modular structure in RNNs. We next contribute a new cognitive multitask battery, Mod-Cog, consisting of up to 132 tasks that expands by about seven-fold the number of tasks and task complexity of 20-Cog-tasks. Importantly, while autapses can solve the simple 20-Cog-tasks, the expanded task set requires richer neural architectures and continuous attractor dynamics. On these tasks, we show that LM-RNNs with an optimal sparsity result in faster training and better data efficiency than fully connected networks.

     
    more » « less
  2. The development of data-informed predictive models for dynamical systems is of widespread interest in many disciplines. We present a unifying framework for blending mechanistic and machine-learning approaches to identify dynamical systems from noisily and partially observed data. We compare pure data-driven learning with hybrid models which incorporate imperfect domain knowledge, referring to the discrepancy between an assumed truth model and the imperfect mechanistic model as model error. Our formulation is agnostic to the chosen machine learning model, is presented in both continuous- and discrete-time settings, and is compatible both with model errors that exhibit substantial memory and errors that are memoryless. First, we study memoryless linear (w.r.t. parametric-dependence) model error from a learning theory perspective, defining excess risk and generalization error. For ergodic continuous-time systems, we prove that both excess risk and generalization error are bounded above by terms that diminish with the square-root of T T , the time-interval over which training data is specified. Secondly, we study scenarios that benefit from modeling with memory, proving universal approximation theorems for two classes of continuous-time recurrent neural networks (RNNs): both can learn memory-dependent model error, assuming that it is governed by a finite-dimensional hidden variable and that, together, the observed and hidden variables form a continuous-time Markovian system. In addition, we connect one class of RNNs to reservoir computing, thereby relating learning of memory-dependent error to recent work on supervised learning between Banach spaces using random features. Numerical results are presented (Lorenz ’63, Lorenz ’96 Multiscale systems) to compare purely data-driven and hybrid approaches, finding hybrid methods less datahungry and more parametrically efficient. We also find that, while a continuous-time framing allows for robustness to irregular sampling and desirable domain- interpretability, a discrete-time framing can provide similar or better predictive performance, especially when data are undersampled and the vector field defining the true dynamics cannot be identified. Finally, we demonstrate numerically how data assimilation can be leveraged to learn hidden dynamics from noisy, partially-observed data, and illustrate challenges in representing memory by this approach, and in the training of such models. 
    more » « less
  3. Gutkin, Boris S. (Ed.)
    Converging evidence suggests the brain encodes time in dynamic patterns of neural activity, including neural sequences, ramping activity, and complex dynamics. Most temporal tasks, however, require more than just encoding time, and can have distinct computational requirements including the need to exhibit temporal scaling, generalize to novel contexts, or robustness to noise. It is not known how neural circuits can encode time and satisfy distinct computational requirements, nor is it known whether similar patterns of neural activity at the population level can exhibit dramatically different computational or generalization properties. To begin to answer these questions, we trained RNNs on two timing tasks based on behavioral studies. The tasks had different input structures but required producing identically timed output patterns. Using a novel framework we quantified whether RNNs encoded two intervals using either of three different timing strategies: scaling, absolute, or stimulus-specific dynamics. We found that similar neural dynamic patterns at the level of single intervals, could exhibit fundamentally different properties, including, generalization, the connectivity structure of the trained networks, and the contribution of excitatory and inhibitory neurons. Critically, depending on the task structure RNNs were better suited for generalization or robustness to noise. Further analysis revealed different connection patterns underlying the different regimes. Our results predict that apparently similar neural dynamic patterns at the population level (e.g., neural sequences) can exhibit fundamentally different computational properties in regards to their ability to generalize to novel stimuli and their robustness to noise—and that these differences are associated with differences in network connectivity and distinct contributions of excitatory and inhibitory neurons. We also predict that the task structure used in different experimental studies accounts for some of the experimentally observed variability in how networks encode time. 
    more » « less
  4. Dynamic adaptation is an error-driven process of adjusting planned motor actions to changes in task dynamics (Shadmehr, 2017). Adapted motor plans are consolidated into memories that contribute to better performance on re-exposure. Consolidation begins within 15 min following training (Criscimagna-Hemminger and Shadmehr, 2008), and can be measured via changes in resting state functional connectivity (rsFC). For dynamic adaptation, rsFC has not been quantified on this timescale, nor has its relationship to adaptative behavior been established. We used a functional magnetic resonance imaging (fMRI)-compatible robot, the MR-SoftWrist (Erwin et al., 2017), to quantify rsFC specific to dynamic adaptation of wrist movements and subsequent memory formation in a mixed-sex cohort of human participants. We acquired fMRI during a motor execution and a dynamic adaptation task to localize brain networks of interest, and quantified rsFC within these networks in three 10-min windows occurring immediately before and after each task. The next day, we assessed behavioral retention. We used a mixed model of rsFC measured in each time window to identify changes in rsFC with task performance, and linear regression to identify the relationship between rsFC and behavior. Following the dynamic adaptation task, rsFC increased within the cortico-cerebellar network and decreased interhemispherically within the cortical sensorimotor network. Increases within the cortico-cerebellar network were specific to dynamic adaptation, as they were associated with behavioral measures of adaptation and retention, indicating that this network has a functional role in consolidation. Instead, decreases in rsFC within the cortical sensorimotor network were associated with motor control processes independent from adaptation and retention.

    SIGNIFICANCE STATEMENTMotor memory consolidation processes have been studied via functional magnetic resonance imaging (fMRI) by analyzing changes in resting state functional connectivity (rsFC) occurring more than 30 min after adaptation. However, it is unknown whether consolidation processes are detectable immediately (<15 min) following dynamic adaptation. We used an fMRI-compatible wrist robot to localize brain regions involved in dynamic adaptation in the cortico-thalamic-cerebellar (CTC) and cortical sensorimotor networks and quantified changes in rsFC within each network immediately after adaptation. Different patterns of change in rsFC were observed compared with studies conducted at longer latencies. Increases in rsFC in the cortico-cerebellar network were specific to adaptation and retention, while interhemispheric decreases in the cortical sensorimotor network were associated with alternate motor control processes but not with memory formation.

     
    more » « less
  5. Abstract

    Despite thelack of invariance problem(the many‐to‐many mapping between acoustics and percepts), human listeners experience phonetic constancy and typically perceive what a speaker intends. Most models of human speech recognition (HSR) have side‐stepped this problem, working with abstract, idealized inputs and deferring the challenge of working with real speech. In contrast, carefully engineered deep learning networks allow robust, real‐world automatic speech recognition (ASR). However, the complexities of deep learning architectures and training regimens make it difficult to use them to provide direct insights into mechanisms that may support HSR. In this brief article, we report preliminary results from a two‐layer network that borrows one element from ASR,long short‐term memorynodes, which provide dynamic memory for a range of temporal spans. This allows the model to learn to map real speech from multiple talkers to semantic targets with high accuracy, with human‐like timecourse of lexical access and phonological competition. Internal representations emerge that resemble phonetically organized responses in human superior temporal gyrus, suggesting that the model develops a distributed phonological code despite no explicit training on phonetic or phonemic targets. The ability to work with real speech is a major advance for cognitive models of HSR.

     
    more » « less