Title: High Throughput Neuromorphic Brain Interface with CuOx Resistive Crossbars for Real-time Spike Sorting
Real-time spike sorting with large data throughput is essential for studying neural dynamics and brain-machine interfaces. Neural recordings from high-density multi-electrode arrays that consist of hundreds of electrodes impose stringent demands on spike sorting hardware regarding data transmission bandwidth and computational complexity. This leads to an urgent need for specialized hardware with high throughput, low power, and low latency. Here, we present a real-time spike sorting processor that utilizes high-density BEOL-integrable CuOx resistive crossbars to perform in-memory spike segregation. We experimentally demonstrate, for the first time, efficient hardware implementation of spike sorting from in vivo extracellular recordings with high accuracy. Our neuromorphic interface promises substantial performance gains (∼1000× less area, ∼200× less power, 4.8 μs latency for sorting 100 channels) for in vivo real-time spike sorting.
Award ID(s):
1752241
NSF-PAR ID:
10395622
Author(s) / Creator(s):
; ; ; ; ; ;
Date Published:
Journal Name:
IEEE International Electron Devices Meeting
Page Range / eLocation ID:
16.5.1 to 16.5.4
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. Neuromorphic computing systems execute machine learning tasks designed with spiking neural networks. These systems are embracing non-volatile memory to implement high-density and low-energy synaptic storage. Elevated voltages and currents needed to operate non-volatile memories cause aging of CMOS-based transistors in each neuron and synapse circuit in the hardware, drifting the transistor’s parameters from their nominal values. If these circuits are used continuously for too long, the parameter drifts cannot be reversed, resulting in permanent degradation of circuit performance over time, eventually leading to hardware faults. Aggressive device scaling increases power density and temperature, which further accelerates the aging, challenging the reliable operation of neuromorphic systems. Existing reliability-oriented techniques periodically de-stress all neuron and synapse circuits in the hardware at fixed intervals, assuming worst-case operating conditions, without actually tracking their aging at run-time. To de-stress these circuits, normal operation must be interrupted, which introduces latency in spike generation and propagation, impacting the inter-spike interval and hence, performance (e.g., accuracy). We observe that in contrast to long-term aging, which permanently damages the hardware, short-term aging in scaled CMOS transistors is mostly due to bias temperature instability. The latter is heavily workload-dependent and, more importantly, partially reversible. We propose a new architectural technique to mitigate the aging-related reliability problems in neuromorphic systems by designing an intelligent run-time manager (NCRTM), which dynamically de-stresses neuron and synapse circuits in response to the short-term aging in their CMOS transistors during the execution of machine learning workloads, with the objective of meeting a reliability target. 
NCRTM de-stresses these circuits only when it is absolutely necessary to do so, otherwise reducing the performance impact by scheduling de-stress operations off the critical path. We evaluate NCRTM with state-of-the-art machine learning workloads on neuromorphic hardware. Our results demonstrate that NCRTM significantly improves the reliability of neuromorphic hardware, with marginal impact on performance.
  2. High-quality 3D image recognition is an important component of many vision and robotics systems. However, the accurate processing of these images requires the use of compute-expensive 3D Convolutional Neural Networks (CNNs). To address this challenge, we propose the use of Spiking Neural Networks (SNNs) that are generated from iso-architecture CNNs and trained with quantization-aware gradient descent to optimize their weights, membrane leak, and firing thresholds. During both training and inference, the analog pixel values of a 3D image are directly applied to the input layer of the SNN without the need to convert to a spike-train. This significantly reduces the training and inference latency and results in a high degree of activation sparsity, which yields significant improvements in computational efficiency. However, this introduces energy-hungry digital multiplications in the first layer of our models, which we propose to mitigate using a processing-in-memory (PIM) architecture. To evaluate our proposal, we propose a 3D and a 3D/2D hybrid SNN-compatible convolutional architecture and choose hyperspectral imaging (HSI) as an application for 3D image recognition. We achieve overall test accuracy of 98.68, 99.50, and 97.95% with 5 time steps (inference latency) and 6-bit weight quantization on the Indian Pines, Pavia University, and Salinas Scene datasets, respectively. In particular, our models implemented using standard digital hardware achieved accuracies similar to state-of-the-art (SOTA) with ~560.6× and ~44.8× less average energy than an iso-architecture full-precision and 6-bit quantized CNN, respectively. Adopting the PIM architecture in the first layer further improves the average energy, delay, and energy-delay-product (EDP) by 30, 7, and 38%, respectively.
  3. Neural signal recording and optical stimulation using implantable devices have become a ubiquitous method to treat brain disorders, yet shortcomings remain, such as the size, weight, and functionality of the implants. This work presents a commercial off-the-shelf (COTS) component-based miniaturized wireless optogenetic headstage with simultaneous optical stimulation and electrophysiological recording for freely moving rats. The system includes a battery-based neural stimulator consisting of a low-dropout (LDO) regulator, an oscillator, and a μLED. The electrophysiological signal recording system includes an intracortical neural probe implemented on a shape memory polymer (SMP) substrate, an array of neural amplifiers with an integrated analog-to-digital converter (ADC), a transceiver IC, and a ceramic antenna. A digital sub-1-GHz transceiver integrated with a low-power microcontroller (MCU) is used to transmit the acquired neural data to a remote receiver unit, followed by offline spike detection and sorting in LabVIEW. The front-end recording amplifiers provide a gain of 45.7 dB with an input-referred noise of 2.4 μVrms. The integrated multiplexer (MUX) with the ADC allows sampling of the amplified voltage at a configurable sampling rate of 160–480 kSamples/s. The total power consumption of the stimulation and the recording system is 23 mW. The dimensions of the headstage device are 13.5 × 21.3 mm, and it weighs 4 g without the battery. The system is experimentally validated in an in vivo setting by placing the headstage on the head of a male rat and recording the neural signals from the ventral tegmental area (VTA) of the brain. This integrative neural signal recording and spike sorting approach would be useful for the development of a closed-loop neuromodulation system.
  4. In this paper, we evaluate the use of a trained Long Short-Term Memory (LSTM) network as a surrogate for an Euler–Bernoulli beam model, and then we describe and characterize an FPGA-based deployment of the model for use in real-time structural health monitoring applications. The focus of our efforts is the DROPBEAR (Dynamic Reproduction of Projectiles in Ballistic Environments for Advanced Research) dataset, which was generated as a benchmark for the study of real-time structural modeling applications. The purpose of DROPBEAR is to evaluate models that take vibration data as input and give the initial conditions of the cantilever beam on which the measurements were taken as output. DROPBEAR is meant to serve as an exemplar for emerging high-rate “active structures” that can be actively controlled with feedback latencies of less than one microsecond. Although the Euler–Bernoulli beam model is a well-known solution to this modeling problem, its computational cost is prohibitive for the time scales of interest. It has been previously shown that a properly structured LSTM network can achieve comparable accuracy with less workload, but achieving sub-microsecond model latency remains a challenge. Our approach is to deploy the LSTM optimized specifically for latency on FPGA. We designed the model using both high-level synthesis (HLS) and hardware description language (HDL). The lowest latency of 1.42 µs and the highest throughput of 7.87 Gops/s were achieved on the Alveo U55C platform for the HDL design.
  5. Separating neural signals from noise can improve brain-computer interface performance and stability. However, most algorithms for separating neural action potentials from noise are not suitable for use in real time and have shown mixed effects on decoding performance. With the goal of removing noise that impedes online decoding, we sought to automate the intuition of human spike-sorters to operate in real time with an easily tunable parameter governing the stringency with which spike waveforms are classified. We trained an artificial neural network with one hidden layer on neural waveforms that were hand-labeled as either spikes or noise. The network output was a likelihood metric for each waveform it classified, and we tuned the network’s stringency by varying the minimum likelihood value for a waveform to be considered a spike. Using the network’s labels to exclude noise waveforms, we decoded remembered target location during a memory-guided saccade task from electrode arrays implanted in prefrontal cortex of rhesus macaque monkeys. The network classified waveforms in real time, and its classifications were qualitatively similar to those of a human spike-sorter. Compared with decoding with threshold crossings, in most sessions we improved decoding performance by removing waveforms with low spike likelihood values. Furthermore, decoding with our network’s classifications became more beneficial as time since array implantation increased. Our classifier serves as a feasible preprocessing step, with little risk of harm, that could be applied to both off-line neural data analyses and online decoding. NEW & NOTEWORTHY Although there are many spike-sorting methods that isolate well-defined single units, these methods typically involve human intervention and have inconsistent effects on decoding. We used human-classified neural waveforms as training data to create an artificial neural network that could be tuned to separate spikes from noise that impaired decoding.
We found that this network operated in real time and was suitable for both off-line data processing and online decoding. 
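The tunable stringency described in item 5 (a minimum likelihood value that a waveform must reach to count as a spike) can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `keep_spikes`, the assumption that likelihoods lie in [0, 1], the inclusive comparison, and the example values are all hypothetical choices made for illustration.

```python
from typing import List, Sequence


def keep_spikes(likelihoods: Sequence[float], min_likelihood: float) -> List[int]:
    """Return indices of waveforms whose likelihood meets the threshold.

    Assumption: each likelihood is a value in [0, 1] produced by some
    classifier; the abstract only says "likelihood metric".
    """
    if not 0.0 <= min_likelihood <= 1.0:
        raise ValueError("min_likelihood must be in [0, 1]")
    # Inclusive comparison is an arbitrary choice for this sketch.
    return [i for i, p in enumerate(likelihoods) if p >= min_likelihood]


# Hypothetical classifier outputs for five threshold-crossing waveforms.
likelihoods = [0.95, 0.10, 0.62, 0.40, 0.88]

lenient = keep_spikes(likelihoods, 0.3)  # low stringency: drops only clear noise
strict = keep_spikes(likelihoods, 0.8)   # high stringency: keeps confident spikes
```

Raising `min_likelihood` trades sensitivity for specificity: fewer noise waveforms pass, but so do fewer true spikes, which is why a single adjustable parameter is convenient for tuning against decoding performance.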