Title: HALO: A Hardware–Software Co-Designed Processor for Brain–Computer Interfaces
Brain-computer interfaces (BCIs) enable direct communication with the brain, providing valuable information about brain function and enabling novel treatment of brain disorders. Our group has been building HALO, a flexible and ultra-low-power processing architecture for BCIs. HALO can process up to 46 Mbps of neural data, a significant increase over the interfacing bandwidth achievable by prior BCIs. Unlike most prior BCIs, HALO can also be programmed to support several applications. Key to HALO's effectiveness is a hardware accelerator cluster in which each accelerator operates within its own clock domain. A configurable interconnect links the accelerators into data flow pipelines that realize neural signal processing algorithms. We have taped out our design in a 12 nm CMOS process. The resulting chip runs at 0.88 V with per-accelerator frequencies of 3 to 180 MHz and consumes at most 5.0 mW for each signal processing pipeline. Evaluations using electrophysiological data collected from a non-human primate confirm HALO's flexibility and superior performance per watt.
Award ID(s):
1815718
PAR ID:
10470173
Author(s) / Creator(s):
; ; ; ; ; ; ; ; ;
Publisher / Repository:
IEEE Micro
Date Published:
Journal Name:
IEEE Micro
Volume:
43
Issue:
3
ISSN:
0272-1732
Page Range / eLocation ID:
64 to 72
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. A scalp-recording electroencephalography (EEG)-based brain-computer interface (BCI) system can greatly improve the quality of life for people who suffer from motor disabilities. Deep neural networks consisting of multiple convolutional, LSTM and fully-connected layers are created to decode EEG signals to maximize the human intention recognition accuracy. However, prior FPGA, ASIC, ReRAM and photonic accelerators cannot maintain sufficient battery lifetime when processing real-time intention recognition. In this paper, we propose an ultra-low-power photonic accelerator, MindReading, for human intention recognition using only low bit-width addition and shift operations. Compared to prior neural network accelerators, while maintaining real-time processing throughput, MindReading reduces the power consumption by 62.7% and improves the throughput per Watt by 168%.
  2. Conventional in vivo neural signal processing involves extracting spiking activity within the recorded signals from an ensemble of neurons and transmitting only spike counts over an adequate interval. However, for brain-computer interface (BCI) applications utilizing continuous local field potentials (LFPs) for cognitive decoding, the volume of neural data to be transmitted to a computer imposes relatively high data rate requirements. This is particularly true for BCIs employing high-density intracortical recordings with hundreds or thousands of electrodes. This article introduces the first autoencoder-based compression digital circuit for the efficient transmission of LFP neural signals. Various algorithmic and architectural-level optimizations are implemented to significantly reduce the computational complexity and memory requirements of the designed in vivo compression circuit. This circuit employs an autoencoder-based neural network, providing a robust signal reconstruction. The application-specific integrated circuit (ASIC) of the in vivo compression logic occupies the smallest silicon area and consumes the lowest power among the reported state-of-the-art compression ASICs. Additionally, it offers a higher compression rate and a superior signal-to-noise and distortion ratio. 
  3. As the machine learning and systems communities strive to achieve higher energy-efficiency through custom deep neural network (DNN) accelerators, varied precision or quantization levels, and model compression techniques, there is a need for design space exploration frameworks that incorporate quantization-aware processing elements into the accelerator design space while having accurate and fast power, performance, and area models. In this work, we present QUIDAM, a highly parameterized quantization-aware DNN accelerator and model co-exploration framework. Our framework can facilitate future research on design space exploration of DNN accelerators for various design choices such as bit precision, processing element type, scratchpad sizes of processing elements, global buffer size, number of total processing elements, and DNN configurations. Our results show that different bit precisions and processing element types lead to significant differences in terms of performance per area and energy. Specifically, our framework identifies a wide range of design points where performance per area and energy varies more than 5× and 35×, respectively. With the proposed framework, we show that lightweight processing elements achieve on par accuracy results and up to 5.7× more performance per area and energy improvement when compared to the best INT16 based implementation. Finally, due to the efficiency of the pre-characterized power, performance, and area models, QUIDAM can speed up the design exploration process by 3-4 orders of magnitude as it removes the need for expensive synthesis and characterization of each design.
  4. Although Convolutional Neural Networks (CNNs) have demonstrated state-of-the-art inference accuracy in various intelligent applications, each CNN inference involves millions of expensive floating point multiply-accumulate (MAC) operations. To energy-efficiently process CNN inferences, prior work proposes an electro-optical accelerator to process power-of-2 quantized CNNs by electro-optical ripple-carry adders and optical binary shifters. The electro-optical accelerator also uses SRAM registers to store intermediate data. However, electro-optical ripple-carry adders and SRAMs seriously limit the operating frequency and inference throughput of the electro-optical accelerator, due to the long critical path of the adder and the long access latency of SRAMs. In this paper, we propose a photonic nonvolatile memory (NVM)-based accelerator, LightBulb, to process binarized CNNs by high frequency photonic XNOR gates and popcount units. LightBulb also adopts photonic racetrack memory to serve as input/output registers to achieve high operating frequency. Compared to prior electro-optical accelerators, on average, LightBulb improves the CNN inference throughput by 17× to 173× and the inference throughput per Watt by 17.5× to 660×.
  5. Digital predistortion is the process of using digital signal processing to correct nonlinearities caused by the analog RF front-end of a wireless transmitter. These nonlinearities contribute to adjacent channel leakage, degrade the error vector magnitude of transmitted signals, and often force the transmitter to reduce its transmission power into a more linear but less power-efficient region of the device. Most predistortion techniques are based on polynomial models with an indirect learning architecture which have been shown to be overly sensitive to noise. In this work, we use neural network based predistortion with a novel neural network training method that avoids the indirect learning architecture and that shows significant improvements in both the adjacent channel leakage ratio and error vector magnitude. Moreover, we show that, by using a neural network based predistorter, we are able to achieve a 42% reduction in latency and 9.6% increase in throughput on an FPGA accelerator with 15% fewer multiplications per sample when compared to a similarly performing memory-polynomial implementation.