

Title: Biologically inspired reinforcement learning for mobile robot collision avoidance
Collision avoidance is a key technology enabling applications such as autonomous vehicles and robots. Various reinforcement learning techniques, such as the popular Q-learning algorithms, have emerged as promising solutions for collision avoidance in robotics. While spiking neural networks (SNNs), the third generation of neural network models, have gained increased interest due to their closer resemblance to biological neural circuits in the brain, the application of SNNs to mobile robot navigation has not been well studied. In the context of reinforcement learning, this paper investigates the potential of biologically motivated spiking neural networks for goal-directed collision avoidance in reasonably complex environments. Unlike the existing additive reward-modulated spike-timing-dependent plasticity learning rule (A-RM-STDP), we explore, for the first time, a new multiplicative RM-STDP scheme (M-RM-STDP) for the targeted application. Furthermore, we propose a more biologically plausible feed-forward spiking neural network architecture with fine-grained global rewards. Finally, by combining the above two techniques, we demonstrate a further improved solution to collision avoidance. Our proposed approaches not only completely outperform Q-learning in cases where Q-learning can hardly reach the target without collision, but also significantly outperform a baseline SNN with A-RM-STDP in terms of both success rate and the quality of navigation trajectories.
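The additive versus multiplicative distinction in the abstract can be made concrete with a small sketch. The code below is an illustrative formulation only, not the paper's exact rule: the function name, learning rate, and soft-bound form are assumptions. An additive (A-RM-STDP) update applies the reward-gated eligibility trace directly, while a multiplicative (M-RM-STDP) update additionally scales it by the weight's distance to its bounds.

```python
import numpy as np

def rm_stdp_update(w, eligibility, reward, lr=0.01, w_max=1.0, mode="additive"):
    """One reward-modulated STDP weight update (illustrative sketch).

    w           : current synaptic weights, assumed in [0, w_max]
    eligibility : STDP eligibility trace from pre/post spike-timing correlations
    reward      : scalar global reward signal
    """
    if mode == "additive":
        # A-RM-STDP: the update is independent of the current weight value
        dw = lr * reward * eligibility
    else:
        # M-RM-STDP (hypothetical soft-bound form): the update is scaled by
        # the distance to the weight bounds, pushing weights away from saturation
        dw = lr * reward * eligibility * np.where(eligibility > 0, w_max - w, w)
    return np.clip(w + dw, 0.0, w_max)
```

One consequence of the multiplicative form is that weights near 0 or `w_max` receive smaller updates, which tends to keep the weight distribution away from saturation during learning.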
Award ID(s):
1639995
NSF-PAR ID:
10026441
Author(s) / Creator(s):
;
Date Published:
Journal Name:
Proceedings of ... International Joint Conference on Neural Networks
ISSN:
2161-4393
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. Asynchronous event-driven computation and communication using spikes allow spiking neural networks (SNNs) to be massively parallel, extremely energy efficient, and highly robust on specialized neuromorphic hardware. However, the lack of a unified robust learning algorithm limits SNNs to shallow networks with low accuracies. Artificial neural networks (ANNs), in contrast, have the backpropagation algorithm, which can use gradient descent to train networks that are locally robust universal function approximators. But the backpropagation algorithm is neither biologically plausible nor friendly to neuromorphic implementation because it requires: 1) separate backward and forward passes, 2) differentiable neurons, 3) high-precision propagated errors, 4) a coherent copy of the feedforward weight matrices in the backward pass, and 5) non-local weight updates. Thus, we propose an approximation of the backpropagation algorithm implemented entirely with spiking neurons and extend it to a local weight update rule that resembles the biologically plausible learning rule of spike-timing-dependent plasticity (STDP). This enables error propagation through spiking neurons, yielding a more biologically plausible and neuromorphic-implementation-friendly backpropagation algorithm for SNNs. We test the proposed algorithm on various traditional and non-traditional benchmarks with competitive results.
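For reference, the classic pair-based STDP rule that the abstract's local update resembles can be sketched as follows. This is the textbook form, not the paper's spiking approximation of backpropagation; the parameter values are illustrative assumptions.

```python
import numpy as np

def stdp_dw(delta_t, a_plus=0.01, a_minus=0.012, tau=20.0):
    """Pair-based STDP: weight change as a function of the spike-time
    difference delta_t = t_post - t_pre (in ms).

    Pre-before-post (delta_t >= 0) potentiates; post-before-pre depresses,
    each with an exponential dependence on the timing gap.
    """
    return np.where(delta_t >= 0,
                    a_plus * np.exp(-delta_t / tau),    # causal pairing: LTP
                    -a_minus * np.exp(delta_t / tau))   # anti-causal pairing: LTD
```

The locality of this rule (only pre/post spike times of one synapse are needed) is what makes it attractive for in-hardware learning, in contrast to backpropagation's non-local weight updates.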
  2. Spiking neural networks (SNNs) have drawn broad research interest in recent years due to their high energy efficiency and biological plausibility, and they have proven competitive in many machine learning tasks. Like all artificial neural network (ANN) machine learning models, SNNs rely on the assumption that the training and testing data are drawn from the same distribution. As the environment changes gradually, the input distribution shifts over time, and the performance of SNNs turns out to be brittle. To this end, we propose a unified framework that adapts to non-stationary streaming data by exploiting unlabeled intermediate domains, and that fits the in-hardware SNN learning algorithm Error-modulated STDP. Specifically, we propose a self-training framework that generates pseudo labels to retrain the model for intermediate and target domains. In addition, we develop an online normalization method with an auxiliary neuron to normalize the output of the hidden layers. By combining normalization with self-training, our approach gains average classification improvements of over 10% on MNIST, N-MNIST, and two other datasets.
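The pseudo-labeling step of such a self-training framework can be sketched generically. Everything here is hypothetical scaffolding (the function name, the `model_predict`/`retrain` callables, and the confidence threshold are assumptions, not the paper's implementation): confident predictions on an unlabeled intermediate-domain batch become pseudo labels used for retraining.

```python
import numpy as np

def self_train_step(model_predict, retrain, unlabeled_batch, threshold=0.9):
    """One self-training step on an unlabeled (intermediate-domain) batch.

    model_predict : callable returning class probabilities, shape (N, C)
    retrain       : callable taking (inputs, pseudo_labels) to update the model
    """
    probs = model_predict(unlabeled_batch)   # (N, num_classes)
    conf = probs.max(axis=1)                 # confidence of the top prediction
    pseudo = probs.argmax(axis=1)            # predicted class = pseudo label
    keep = conf >= threshold                 # keep only confident samples
    retrain(unlabeled_batch[keep], pseudo[keep])
    return int(keep.sum())                   # number of pseudo-labeled samples
```

Thresholding on confidence is the usual guard against reinforcing the model's own mistakes; iterating this step over successive batches lets the model track a gradually shifting input distribution.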
  4. The Artificial Intelligence (AI) disruption continues unabated, albeit at extreme compute requirements. Neuromorphic circuits and systems offer a remedy for this extravagance. To this effect, event-based learning such as spike-timing-dependent plasticity (STDP) in spiking neural networks (SNNs) is an active area of research. Hebbian learning in SNNs fundamentally involves synaptic weight updates based on temporal correlations between pre- and post-synaptic neural activities. While there are broadly two approaches to realizing STDP, All-to-All versus Nearest Neighbor (NN), strong arguments favor the NN approach on biological plausibility grounds. In this paper, we present a novel current-mode implementation of a postsynaptic event-based NN STDP synapse. We leverage transistor subthreshold dynamics to generate exponential STDP traces using repurposed log-domain low-pass filter circuits. Synaptic weight operations involving addition and multiplication are achieved by Kirchhoff's current law and the translinear principle, respectively. Simulation results in the NCSU TSMC 180 nm technology are presented. Finally, the ideas presented here hold implications for engineering efficient hardware to meet growing AI training and inference demands.
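The exponential STDP trace that the abstract's analog low-pass filter realizes has a simple discrete-time software analogue, sketched below. This is only a behavioral model (the time step and time constant are assumed values), not the current-mode circuit itself: the trace decays exponentially between events and jumps on each spike.

```python
def update_trace(trace, spike, dt=1.0, tau=20.0):
    """Discrete-time exponential STDP trace (behavioral model of a
    log-domain low-pass filter driven by spike events).

    trace : current trace value
    spike : True if a spike arrived in this time step
    dt    : simulation time step (ms)
    tau   : trace decay time constant (ms)
    """
    trace *= (1.0 - dt / tau)   # first-order exponential decay
    if spike:
        trace += 1.0            # each spike resets/adds a unit contribution
    return trace
```

In a Nearest-Neighbor scheme, a weight update at a postsynaptic spike reads such a presynaptic trace, so only the most recent spike timing (as captured by the decayed trace) drives the update.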