Although Convolutional Neural Networks (CNNs) have demonstrated the state-of-the-art inference accuracy in various intelligent applications, each CNN inference involves millions of expensive floating point multiply-accumulate (MAC) operations. To energy-efficiently process CNN inferences, prior work proposes an electro-optical accelerator to process power-of-2 quantized CNNs by electro-optical ripple-carry adders and optical binary shifters. The electro-optical accelerator also uses SRAM registers to store intermediate data. However, electro-optical ripple-carry adders and SRAMs seriously limit the operating frequency and inference throughput of the electro-optical accelerator, due to the long critical path of the adder and the long access latency of SRAMs. In this paper, we propose a photonic nonvolatile memory (NVM)-based accelerator, Light-Bulb, to process binarized CNNs by high frequency photonic XNOR gates and popcount units. LightBulb also adopts photonic racetrack memory to serve as input/output registers to achieve high operating frequency. Compared to prior electro-optical accelerators, on average, LightBulb improves the CNN inference throughput by 17× ~ 173× and the inference throughput per Watt by 17.5 × ~ 660×.
more »
« less
MindReading: An Ultra-Low-Power Photonic Accelerator for EEG-based Human Intention Recognition
A scalp-recording electroencephalography (EEG)-based brain-computer interface (BCI) system can greatly improve the quality of life for people who suffer from motor disabilities. Deep neural networks consisting of multiple convolutional, LSTM and fully-connected layers are created to decode EEG signals to maximize the human intention recognition accuracy. However, prior FPGA, ASIC, ReRAM and photonic accelerators cannot maintain sufficient battery lifetime when processing realtime intention recognition. In this paper, we propose an ultra-low-power photonic accelerator, MindReading, for human intention recognition by only low bit-width addition and shift operations. Compared to prior neural network accelerators, to maintain the real-time processing throughput, MindReading reduces the power consumption by 62.7% and improves the throughput per Watt by 168%.
more »
« less
- PAR ID:
- 10167929
- Date Published:
- Journal Name:
- Asia and South Pacific Design Automation Conference
- Page Range / eLocation ID:
- 464 to 469
- Format(s):
- Medium: X
- Sponsoring Org:
- National Science Foundation
More Like this
-
-
Brain-computer interfaces (BCIs) enable direct communication with the brain, providing valuable information about brain function and enabling novel treatment of brain disorders. Our group has been building {\abssys}, a flexible and ultra-low-power processing architecture for BCIs. HALO can process up to 46Mbps of neural data, a significant increase over the interfacing bandwidth achievable by prior BCIs. HALO can also be programmed to support several applications, unlike most prior BCIs. Key to HALO's effectiveness is a hardware accelerator cluster, where each accelerator operates within its own clock domain. A configurable interconnect connects the accelerators to create data flow pipelines that realize neural signal processing algorithms. We have taped out our design in a 12nm CMOS process. The resulting chip runs at 0.88V, per-accelerator frequencies of 3--180MHz, and consumes at most 5.0mW for each signal processing pipeline. Evaluations using electrophysiological data collected from a non-human primate confirm HALO's flexibility and superior performance per watt.more » « less
-
Convolutional Neural Networks (CNNs) are widely used due to their effectiveness in various AI applications such as object recognition, speech processing, etc., where the multiply-and-accumulate (MAC) operation contributes to ∼95% of the computation time. From the hardware implementation perspective, the performance of current CMOS-based MAC accelerators is limited mainly due to their von-Neumann architecture and corresponding limited memory bandwidth. In this way, silicon photonics has been recently explored as a promising solution for accelerator design to improve the speed and power-efficiency of the designs as opposed to electronic memristive crossbars. In this work, we briefly study recent silicon photonics accelerators and take initial steps to develop an open-source and adaptive crossbar architecture simulator for that. Keeping the original functionality of the MNSIM tool [1], we add a new photonic mode that utilizes the pre-existing algorithm to work with a photonic Phase Change Memory (pPCM) based crossbar structure. With inputs from the CNN's topology, the accelerator configuration, and experimentally-benchmarked data, the presented simulator can report the optimal crossbar size, the number of crossbars needed, and the estimation of total area, power, and latency.more » « less
-
The unprecedented success of artificial intelligence (AI) enriches machine learning (ML)-based applications. The availability of big data and compute-intensive algorithms empowers versatility and high accuracy in ML approaches. However, the data processing and innumerable computations burden conventional hardware systems with high power consumption and low performance. Breaking away from the traditional hardware design, non-conventional accelerators exploiting emerging technology have gained significant attention with a leap forward since the emerging devices enable processing-in-memory (PIM) designs of dramatic improvement in efficiency. This paper presents a summary of state-of-the-art PIM accelerators over a decade. The PIM accelerators have been implemented for diverse models and advanced algorithm techniques across diverse neural networks in language processing and image recognition to expedite inference and training. We will provide the implemented designs, methodologies, and results, following the development in the past years. The promising direction of the PIM accelerators, vertically stacking for More than Moore, is also discussed.more » « less
-
The convergence of edge computing and artificial intelligence requires that inference is performed on-device to provide rapid response with low latency and high accuracy without transferring large amounts of data to the cloud. However, power and size limitations make it challenging for electrical accelerators to support both inference and training for large neural network models. To this end, we propose Trident, a low-power photonic accelerator that combines the benefits of phase change material (PCM) and photonics to implement both inference and training in one unified architecture. Emerging silicon photonics has the potential to exploit the parallelism of neural network models, reduce power consumption and provide high bandwidth density via wavelength division multiplexing, making photonics an ideal candidate for on-device training and inference. As PCM is reconfigurable and non-volatile, we utilize it for two distinct purposes: (i) to maintain resonant wavelength without expensive electrical or thermal heaters, and (ii) to implement non-linear activation function, which eliminates the need to move data between memory and compute units. This multi-purpose use of PCM is shown to lead to significant reduction in energy consumption and execution time. Compared to photonic accelerators DEAP-CNN, CrossLight, and PIXEL, Trident improves energy efficiency by up to 43% and latency by up to 150% on average. Compared to electronic edge AI accelerators Google Coral which utilizes the Google Edge TPU and Bearkey TB96-AI, Trident improves energy efficiency by 11% and 93% respectively. While NVIDIA AGX Xavier is more energy efficient, the reduced data movement and GST activation of Trident reduce latency by 107% on average compared to the NVIDIA accelerator. When compared to the Google Coral and the Bearkey TB96-AI, Trident reduces latency by 1413% and 595% on average.more » « less