This content will become publicly available on April 8, 2023

Title: Reinforcement Learning for Central Pattern Generation in Dynamical Recurrent Neural Networks
Lifetime learning, or the change (or acquisition) of behaviors during a lifetime based on experience, is a hallmark of living organisms. Multiple mechanisms may be involved, but biological neural circuits have repeatedly been shown to play a vital role in the learning process. These neural circuits are recurrent, dynamic, and non-linear, and models of neural circuits employed in neuroscience and neuroethology accordingly tend to involve continuous-time, non-linear, and recurrently interconnected components. Currently, the main approach for finding configurations of dynamical recurrent neural networks that demonstrate behaviors of interest is stochastic search, such as evolutionary algorithms. In an evolutionary algorithm, these dynamical recurrent neural networks are evolved to perform the behavior over multiple generations, through selection, inheritance, and mutation, across a population of solutions. Although these systems can be evolved to exhibit lifetime learning behavior, there are no explicit rules built into these dynamical recurrent neural networks that facilitate learning during their lifetime (e.g., reward signals). In this work, we examine a biologically plausible lifetime learning mechanism for dynamical recurrent neural networks. We focus on a recently proposed reinforcement learning mechanism inspired by neuromodulatory reward signals and ongoing fluctuations in synaptic strengths. Specifically, we extend one of the best-studied and most commonly used dynamical recurrent neural network models to incorporate the reinforcement learning mechanism. First, we demonstrate that this extended dynamical system (model and learning mechanism) can autonomously learn to perform a central pattern generation task. Second, we compare the robustness and efficiency of the reinforcement learning rules against two baseline models: a random walk and a hill-climbing walk through parameter space. Third, we systematically study the effect of the different meta-parameters of the learning mechanism on behavioral learning performance. Finally, we report preliminary results exploring the generality and scalability of this learning mechanism for dynamical neural networks, as well as directions for future work.
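To make the described mechanism concrete, the following is a minimal, hypothetical sketch — not the authors' exact model, reward function, or parameters — of a small continuous-time recurrent neural network (CTRNN) whose weights undergo ongoing random fluctuations that a scalar reward signal consolidates:

```python
import numpy as np

# Hypothetical sketch: a 2-neuron CTRNN with reward-gated weight fluctuations.
# All parameter values and the reward proxy below are illustrative
# assumptions, not the paper's setup.
rng = np.random.default_rng(0)
N, dt, tau = 2, 0.01, 1.0

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def run(weights, steps=2000):
    """Euler-integrate the CTRNN; return the output trace of neuron 0."""
    y = np.zeros(N)
    trace = np.empty(steps)
    for t in range(steps):
        y += (dt / tau) * (-y + weights @ sigmoid(y))
        trace[t] = sigmoid(y[0])
    return trace

def reward(trace):
    """Crude proxy for 'pattern generation': output variance over the run."""
    return trace.var()

# Reward-modulated fluctuations: perturb the weights each episode and let the
# reward signal decide whether the fluctuation is consolidated.
W = rng.normal(0.0, 1.0, (N, N))
best = reward(run(W))
for episode in range(200):
    fluctuation = rng.normal(0.0, 0.3, (N, N))  # ongoing synaptic fluctuation
    r = reward(run(W + fluctuation))
    if r > best:                                # reward gates consolidation
        W, best = W + fluctuation, r
```

Gating consolidation on strict improvement makes this toy version resemble the hill-climbing baseline mentioned in the abstract; the neuromodulated rule the paper studies operates on continuous synaptic fluctuations rather than discrete accept/reject episodes.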
Journal Name: Frontiers in Computational Neuroscience
Sponsoring Org: National Science Foundation
More Like this
  1. Living organisms learn on multiple time scales: evolutionary as well as individual-lifetime learning. These two learning modes are complementary: the innate phenotypes developed through evolution significantly influence lifetime learning. However, it is still unclear how these two learning modes interact, and whether there is a benefit to optimizing part of the system on an evolutionary time scale with a population-based approach while the rest is trained during the lifetime with an individualistic learning algorithm. In this work, we study the benefits of such a hybrid approach using an actor-critic framework, where the critic part of an agent is optimized over evolutionary time based on its ability to train the actor part of an agent during its lifetime. Typically, critics are optimized on the same time scale as the actor, using the Bellman equation to represent long-term expected reward. We show that evolution can find a variety of different solutions that can still enable an actor to learn to perform a behavior during its lifetime. We also show that although the solutions found by evolution represent different functions, they all provide similar training signals during the lifetime. This suggests that learning on multiple time scales can effectively simplify the overall optimization process in the actor-critic framework by finding one of many solutions that can still train an actor just as well. Furthermore, analysis of the evolved critics can yield additional possibilities for reinforcement learning beyond the Bellman equation.
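The evolved-critic idea in the abstract above can be loosely illustrated with a toy two-armed bandit of my own construction (not the paper's task, architecture, or learning rule): a population-of-one critic is mutated over "generations" and selected on how well the training signal it emits teaches a fresh actor within a single lifetime.

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy illustration (all specifics are assumptions): a 2-armed bandit where
# arm 1 pays more often. The actor is a softmax preference vector trained
# during its "lifetime" by a scalar signal emitted by the critic; the critic
# is a 2-parameter linear function of the reward, not a Bellman-style value.
TRUE_PAYOFF = np.array([0.2, 0.8])

def lifetime(critic, steps=300):
    """Train a fresh actor with this critic; return mean reward earned."""
    prefs = np.zeros(2)
    total = 0.0
    for _ in range(steps):
        p = np.exp(prefs) / np.exp(prefs).sum()
        a = rng.choice(2, p=p)
        r = float(rng.random() < TRUE_PAYOFF[a])
        total += r
        signal = critic[0] * r + critic[1]       # critic's training signal
        prefs[a] += 0.1 * signal * (1 - p[a])    # policy-gradient-style update
        prefs[1 - a] -= 0.1 * signal * p[1 - a]
    return total / steps

# Evolutionary loop over critics: mutate, keep the child if it trains an
# actor at least as well during its lifetime.
critic = rng.normal(0.0, 1.0, 2)
fitness = lifetime(critic)
for gen in range(40):
    child = critic + rng.normal(0.0, 0.2, 2)
    f = lifetime(child)
    if f >= fitness:
        critic, fitness = child, f
```

Many different critic parameterizations can yield similar lifetime performance here, which echoes the abstract's observation that evolution finds multiple distinct critics providing similar training signals.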
  2. Abstract

    The field of basal cognition seeks to understand how adaptive, context-specific behavior occurs in non-neural biological systems. Embryogenesis and regeneration require plasticity in many tissue types to achieve structural and functional goals in diverse circumstances. Thus, advances in both evolutionary cell biology and regenerative medicine require an understanding of how non-neural tissues could process information. Neurons evolved from ancient cell types that used bioelectric signaling to perform computation. However, it has not been shown whether or how non-neural bioelectric cell networks can support computation. We generalize connectionist methods to non-neural tissue architectures, showing that a minimal non-neural Bio-Electric Network (BEN) model that utilizes the general principles of bioelectricity (electrodiffusion and gating) can compute. We characterize BEN behaviors ranging from elementary logic gates to pattern detectors, using both fixed and transient inputs to recapitulate various biological scenarios. We characterize the mechanisms of such networks using dynamical-systems and information-theory tools, demonstrating that logic can manifest in bidirectional, continuous, and relatively slow bioelectrical systems, complementing conventional neural-centric architectures. Our results reveal a variety of non-neural decision-making processes as manifestations of general cellular biophysical mechanisms and suggest novel bioengineering approaches to construct functional tissues for regenerative medicine and synthetic biology, as well as new machine learning architectures.
  3. A major goal in neuroscience is to understand the relationship between an animal’s behavior and how this is encoded in the brain. Therefore, a typical experiment involves training an animal to perform a task and recording the activity of its neurons – brain cells – while the animal carries out the task. To complement these experimental results, researchers “train” artificial neural networks – simplified mathematical models of the brain that consist of simple neuron-like units – to simulate the same tasks on a computer. Unlike real brains, artificial neural networks provide complete access to the “neural circuits” responsible for a behavior, offering a way to study and manipulate the behavior in the circuit. One open issue about this approach has been the way in which the artificial networks are trained. In a process known as reinforcement learning, animals learn from rewards (such as juice) that they receive when they choose actions that lead to the successful completion of a task. By contrast, the artificial networks are explicitly told the correct action. In addition to differing from how animals learn, this limits the types of behavior that can be studied using artificial neural networks. Recent advances in the field of machine learning that combine reinforcement learning with artificial neural networks have now allowed Song et al. to train artificial networks to perform tasks in a way that mimics the way that animals learn. The networks consisted of two parts: a “decision network” that uses sensory information to select actions that lead to the greatest reward, and a “value network” that predicts how rewarding an action will be. Song et al. found that the resulting artificial “brain activity” closely resembled the activity found in the brains of animals, confirming that this method of training artificial neural networks may be a useful tool for neuroscientists who study the relationship between brains and behavior.
The training method explored by Song et al. represents only one step forward in developing artificial neural networks that resemble the real brain. In particular, neural networks modify connections between units in a vastly different way to the methods used by biological brains to alter the connections between neurons. Future work will be needed to bridge this gap.
  4. This paper considers the problem of tracking and predicting dynamical processes with model switching. The classical approach to this problem has been to use an interacting multiple model (IMM) which uses multiple Kalman filters and an auxiliary system to estimate the posterior probability of each model given the observations. More recently, data-driven approaches such as recurrent neural networks (RNNs) have been used for tracking and prediction in a variety of settings. An advantage of data-driven approaches like the RNN is that they can be trained to provide good performance even when the underlying dynamic models are unknown. This paper studies the use of temporal convolutional networks (TCNs) in this setting since TCNs are also data-driven but have certain structural advantages over RNNs. Numerical simulations demonstrate that a TCN matches or exceeds the performance of an IMM and other classical tracking methods in two specific settings with model switching: (i) a Gilbert-Elliott burst noise communication channel that switches between two different modes, each modeled as a linear system, and (ii) a maneuvering target tracking scenario where the target switches between a linear constant velocity mode and a nonlinear coordinated turn mode. In particular, the results show that the TCN tends to identify a mode switch as fast or faster than an IMM and that, in some cases, the TCN can perform almost as well as an omniscient Kalman filter with perfect knowledge of the current mode of the dynamical system.
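The multiple-model idea underlying the IMM can be shown with a stripped-down sketch (scalar state, no mode-mixing step, illustrative noise values of my own choosing — a much simpler cousin of a full IMM): each candidate mode runs its own Kalman filter, and posterior mode probabilities are updated from the likelihood each filter assigns to the latest observation.

```python
import numpy as np

# Simplified multiple-model filter for a noisy scalar process with two
# candidate observation-noise modes: a "good" channel (low noise) and a
# "burst" mode (high noise). All numbers are assumptions for demonstration.
rng = np.random.default_rng(2)

R = np.array([0.1, 4.0])      # observation-noise variance per mode
Q = 0.01                      # shared process-noise variance (random walk)
x_hat = np.zeros(2)           # per-mode state estimates
P = np.ones(2)                # per-mode estimate variances
prob = np.array([0.5, 0.5])   # posterior mode probabilities

def step(z):
    """One predict/update over both mode-matched Kalman filters."""
    global x_hat, P, prob
    P_pred = P + Q                          # predict (random-walk state)
    S = P_pred + R                          # innovation variance per mode
    K = P_pred / S                          # Kalman gains
    innov = z - x_hat
    x_hat = x_hat + K * innov               # per-mode state update
    P = (1.0 - K) * P_pred
    # Gaussian likelihood of the observation under each mode's innovation.
    like = np.exp(-0.5 * innov**2 / S) / np.sqrt(2.0 * np.pi * S)
    prob = prob * like
    prob /= prob.sum()                      # normalized posterior over modes

# Feed observations drawn from the high-noise "burst" mode; the burst
# mode's posterior probability rises as its filter explains the data better.
for _ in range(200):
    step(rng.normal(0.0, 2.0))
```

A full IMM additionally mixes the per-mode estimates before each predict step using a Markov mode-transition matrix; this sketch only shows the likelihood-driven posterior over modes that the IMM maintains.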
  5. Abstract

    Rapid and flexible learning during behavioral choices is critical to our daily endeavors and constitutes a hallmark of dynamic reasoning. An important paradigm to examine flexible behavior involves learning new arbitrary associations mapping visual inputs to motor outputs. We conjectured that visuomotor rules are instantiated by translating visual signals into actions through dynamic interactions between visual, frontal and motor cortex. We evaluated the neural representation of such visuomotor rules by performing intracranial field potential recordings in epilepsy subjects during a rule-learning delayed match-to-behavior task. Learning new visuomotor mappings led to the emergence of specific responses associating visual signals with motor outputs in 3 anatomical clusters in frontal, anteroventral temporal and posterior parietal cortex. After learning, mapping selective signals during the delay period showed interactions with visual and motor signals. These observations provide initial steps towards elucidating the dynamic circuits underlying flexible behavior and how communication between subregions of frontal, temporal, and parietal cortex leads to rapid learning of task-relevant choices.