skip to main content


Title: Neural sampling machine with stochastic synapse allows brain-like learning and inference
Abstract

Many real-world mission-critical applications require continual online learning from noisy data and real-time decision making with a defined confidence level. Brain-inspired probabilistic models of neural network can explicitly handle the uncertainty in data and allow adaptive learning on the fly. However, their implementation in a compact, low-power hardware remains a challenge. In this work, we introduce a novel hardware fabric that can implement a new class of stochastic neural network called Neural Sampling Machine (NSM) by exploiting the stochasticity in the synaptic connections for approximate Bayesian inference. We experimentally demonstrate an in silico hybrid stochastic synapse by pairing a ferroelectric field-effect transistor (FeFET)-based analog weight cell with a two-terminal stochastic selector element. We show that the stochastic switching characteristic of the selector between the insulator and the metallic states resembles the multiplicative synaptic noise of the NSM. We perform network-level simulations to highlight the salient features offered by the stochastic NSM such as performing autonomous weight normalization for continual online learning and Bayesian inferencing. We show that the stochastic NSM can not only perform highly accurate image classification with 98.25% accuracy on standard MNIST dataset, but also estimate the uncertainty in prediction (measured in terms of the entropy of prediction) when the digits of the MNIST dataset are rotated. Building such a probabilistic hardware platform that can support neuroscience inspired models can enhance the learning and inference capability of the current artificial intelligence (AI).

 
more » « less
Award ID(s):
1652159
NSF-PAR ID:
10366836
Author(s) / Creator(s):
; ; ; ; ;
Publisher / Repository:
Nature Publishing Group
Date Published:
Journal Name:
Nature Communications
Volume:
13
Issue:
1
ISSN:
2041-1723
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. The remarkable success of the Transformer model in Natural Language Processing (NLP) is increasingly capturing the attention of vision researchers in contemporary times. The Vision Transformer (ViT) model effectively models long-range dependencies while utilizing a self-attention mechanism by converting image information into meaningful representations. Moreover, the parallelism property of ViT ensures better scalability and model generalization compared to Recurrent Neural Networks (RNN). However, developing robust ViT models for high-risk vision applications, such as self-driving cars, is critical. Deterministic ViT models are susceptible to noise and adversarial attacks and incapable of yielding a level of confidence in output predictions. Quantifying the confidence (or uncertainty) level in the decision is highly important in such real-world applications. In this work, we introduce a probabilistic framework for ViT to quantify the level of uncertainty in the model's decision. We approximate the posterior distribution of network parameters using variational inference. While progressing through non-linear layers, the first-order Taylor approximation was deployed. The developed framework propagates the mean and covariance of the posterior distribution through layers of the probabilistic ViT model and quantifies uncertainty at the output predictions. Quantifying uncertainty aids in providing warning signals to real-world applications in case of noisy situations. Experimental results from extensive simulation conducted on numerous benchmark datasets (e.g., MNIST and Fashion-MNIST) for image classification tasks exhibit 1) higher accuracy of proposed probabilistic ViT under noise or adversarial attacks compared to the deterministic ViT. 2) Self-evaluation through uncertainty becomes notably pronounced as noise levels escalate. Simulations were conducted at the Texas Advanced Computing Center (TACC) on the Lonestar6 supercomputer node. With the help of this vital resource, we completed all the experiments within a reasonable period. 
    more » « less
  2. Abstract

    In neuromorphic computing, artificial synapses provide a multi‐weight (MW) conductance state that is set based on inputs from neurons, analogous to the brain. Herein, artificial synapses based on magnetic materials that use a magnetic tunnel junction (MTJ) and a magnetic domain wall (DW) are explored. By fabricating lithographic notches in a DW track underneath a single MTJ, 3–5 stable resistance states that can be repeatably controlled electrically using spin‐orbit torque are achieved. The effect of geometry on the synapse behavior is explored, showing that a trapezoidal device has asymmetric weight updates with high controllability, while a rectangular device has higher stochasticity, but with stable resistance levels. The device data is input into neuromorphic computing simulators to show the usefulness of application‐specific synaptic functions. Implementing an artificial neural network (NN) applied to streamed Fashion‐MNIST data, the trapezoidal magnetic synapse can be used as a metaplastic function for efficient online learning. Implementing a convolutional NN for CIFAR‐100 image recognition, the rectangular magnetic synapse achieves near‐ideal inference accuracy, due to the stability of its resistance levels. This work shows MW magnetic synapses are a feasible technology for neuromorphic computing and provides design guidelines for emerging artificial synapse technologies.

     
    more » « less
  3. Spiking neural networks (SNNs) are positioned to enable spatio-temporal information processing and ultra-low power event-driven neuromorphic hardware. However, SNNs are yet to reach the same performances of conventional deep artificial neural networks (ANNs), a long-standing challenge due to complex dynamics and non-differentiable spike events encountered in training. The existing SNN error backpropagation (BP) methods are limited in terms of scalability, lack of proper handling of spiking discontinuities, and/or mismatch between the rate coded loss function and computed gradient. We present a hybrid macro/micro level backpropagation (HM2-BP) algorithm for training multi-layer SNNs. The temporal effects are precisely captured by the proposed spike-train level post-synaptic potential (S-PSP) at the microscopic level. The rate-coded errors are defined at the macroscopic level, computed and back-propagated across both macroscopic and microscopic levels. Different from existing BP methods, HM2-BP directly computes the gradient of the rate-coded loss function w.r.t tunable parameters. We evaluate the proposed HM2-BP algorithm by training deep fully connected and convolutional SNNs based on the static MNIST [14] and dynamic neuromorphic N-MNIST [26]. HM2-BP achieves an accuracy level of 99:49% and 98:88% for MNIST and N-MNIST, respectively, outperforming the best reported performances obtained from the existing SNN BP algorithms. Furthermore, the HM2-BP produces the highest accuracies based on SNNs for the EMNIST [3] dataset, and leads to high recognition accuracy for the 16-speaker spoken English letters of TI46 Corpus [16], a challenging spatio-temporal speech recognition benchmark for which no prior success based on SNNs was reported. It also achieves competitive performances surpassing those of conventional deep learning models when dealing with asynchronous spiking streams. 
    more » « less
  4. Modern deep learning systems rely on (a) a handtuned neural network topology, (b) massive amounts of labelled training data, and (c) extensive training over large-scale compute resources to build a system that can perform efficient image classification or speech recognition. Unfortunately, we are still far away from implementing adaptive general purpose intelligent systems which would need to learn autonomously in unknown environments and may not have access to some or any of these three components. Reinforcement learning and evolutionary algorithm (EA) based methods circumvent this problem by continuously interacting with the environment and updating the models based on obtained rewards. However, deploying these algorithms on ubiquitous autonomous agents at the edge (robots/drones) demands extremely high energy-efficiency due to (i) tight power and energy budgets, (ii) continuous / lifelong interaction with the environment, (iii) intermittent or no connectivity to the cloud to offloadheavy-weight processing. To address this need, we present GENESYS, a HW-SW prototype of a EA-based learning system, that comprises of a closed loop learning engine called EvE and an inference engine called ADAM. EvE can evolve the topology and weights of neural networks completely in hardware for the task at hand, without requiring hand-optimization or backpropogation training. ADAM continuously interacts with the environment and is optimized for efficiently running the irregular neural networks generated by EvE. GENESYS identifies and leverages multiple unique avenues of parallelism unique to EAs that we term “gene”- level parallelism, and “population”-level parallelism. We ran GENESYS with a suite of environments from OpenAI gym and observed 2-5 orders of magnitude higher energy-efficiency over state-of-the-art embedded and desktop CPU and GPU systems. 
    more » « less
  5. Abstract

    Adaptive ‘life-long’ learning at the edge and during online task performance is an aspirational goal of artificial intelligence research. Neuromorphic hardware implementing spiking neural networks (SNNs) are particularly attractive in this regard, as their real-time, event-based, local computing paradigm makes them suitable for edge implementations and fast learning. However, the long and iterative learning that characterizes state-of-the-art SNN training is incompatible with the physical nature and real-time operation of neuromorphic hardware. Bi-level learning, such as meta-learning is increasingly used in deep learning to overcome these limitations. In this work, we demonstrate gradient-based meta-learning in SNNs using the surrogate gradient method that approximates the spiking threshold function for gradient estimations. Because surrogate gradients can be made twice differentiable, well-established, and effective second-order gradient meta-learning methods such as model agnostic meta learning (MAML) can be used. We show that SNNs meta-trained using MAML perform comparably to conventional artificial neural networks meta-trained with MAML on event-based meta-datasets. Furthermore, we demonstrate the specific advantages that accrue from meta-learning: fast learning without the requirement of high precision weights or gradients, training-to-learn with quantization and mitigating the effects of approximate synaptic plasticity rules. Our results emphasize how meta-learning techniques can become instrumental for deploying neuromorphic learning technologies on real-world problems.

     
    more » « less