
Title: LightBulb: A Photonic-Nonvolatile-Memory-based Accelerator for Binarized Convolutional Neural Networks
Although Convolutional Neural Networks (CNNs) have demonstrated state-of-the-art inference accuracy in various intelligent applications, each CNN inference involves millions of expensive floating-point multiply-accumulate (MAC) operations. To process CNN inferences energy-efficiently, prior work proposes an electro-optical accelerator that handles power-of-2-quantized CNNs with electro-optical ripple-carry adders and optical binary shifters, and uses SRAM registers to store intermediate data. However, the long critical path of the ripple-carry adders and the long access latency of the SRAMs seriously limit the operating frequency and inference throughput of the electro-optical accelerator. In this paper, we propose a photonic nonvolatile memory (NVM)-based accelerator, LightBulb, that processes binarized CNNs with high-frequency photonic XNOR gates and popcount units. LightBulb also adopts photonic racetrack memory as input/output registers to achieve a high operating frequency. Compared to prior electro-optical accelerators, on average, LightBulb improves CNN inference throughput by 17×~173× and inference throughput per Watt by 17.5×~660×.
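The key property that LightBulb's photonic XNOR gates and popcount units exploit can be illustrated with a small software analogue (an illustrative sketch only, not the paper's optical implementation): in a binarized CNN, weights and activations are constrained to {-1, +1}, encoded as single bits {0, 1}, so each MAC reduces to an XNOR followed by a popcount.

```python
# Software analogue of a binarized-CNN MAC (illustrative; the accelerator
# realizes the XNOR and popcount in the photonic domain, not like this).
# For n-element {-1, +1} vectors packed into n-bit integers (bit i = element i,
# with +1 -> 1 and -1 -> 0), the dot product is:
#   dot = 2 * popcount(XNOR(w, x)) - n

def binarized_dot(w_bits: int, x_bits: int, n: int) -> int:
    """Dot product of two n-element {-1,+1} vectors packed as bit masks."""
    mask = (1 << n) - 1
    matches = (~(w_bits ^ x_bits)) & mask    # XNOR: 1 where the signs agree
    return 2 * bin(matches).count("1") - n   # +1 per match, -1 per mismatch

# Example: w = [+1, -1, +1, +1] -> 0b1101, x = [+1, +1, -1, +1] -> 0b1011
# signs agree at elements 0 and 3, so dot = 2*2 - 4 = 0
print(binarized_dot(0b1101, 0b1011, 4))  # -> 0
```

Because the multiply disappears entirely, throughput is bounded by how fast the XNOR/popcount pipeline can be clocked, which is why replacing the electro-optical ripple-carry adders matters.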
Journal Name: Design, Automation & Test in Europe Conference & Exhibition
Page Range or eLocation-ID: 1438 to 1443
Sponsoring Org: National Science Foundation
More Like This
  1. Domain-specific neural network accelerators have garnered attention because of their improved energy efficiency and inference performance compared to CPUs and GPUs. Such accelerators are thus well suited for resource-constrained embedded systems. However, mapping sophisticated neural network models on these accelerators still entails significant energy and memory consumption, along with high inference time overhead. Binarized neural networks (BNNs), which utilize single-bit weights, represent an efficient way to implement and deploy neural network models on accelerators. In this paper, we present a novel optical-domain BNN accelerator, named ROBIN, which intelligently integrates heterogeneous microring resonator optical devices with complementary capabilities to efficiently implement the key functionalities in BNNs. We perform detailed fabrication-process variation analyses at the optical device level, explore efficient corrective tuning for these devices, and integrate circuit-level optimization to counter thermal variations. As a result, our proposed ROBIN architecture possesses the desirable traits of being robust, energy-efficient, low latency, and high throughput when executing BNN models. Our analysis shows that ROBIN can outperform the best-known optical BNN accelerators and many electronic accelerators. Specifically, our energy-efficient ROBIN design exhibits energy-per-bit values that are ∼4× lower than electronic BNN accelerators and ∼933× lower than a recently proposed photonic BNN accelerator, while a performance-efficient ROBIN design shows ∼3× and ∼25× better performance than electronic and photonic BNN accelerators, respectively.
  2. In this paper, we pave a novel way towards the concept of a bit-wise In-Memory Convolution Engine (IMCE) that implements the dominant convolution computation of deep Convolutional Neural Networks (CNNs) within memory. IMCE employs a parallel computational memory sub-array as its fundamental unit, based on our proposed Spin Orbit Torque Magnetic Random Access Memory (SOT-MRAM) design. We then propose an accelerator system architecture based on IMCE to efficiently process low-bit-width CNNs. This architecture can be leveraged to greatly reduce the energy consumed by convolutional layers and to accelerate CNN inference. Device-to-architecture co-simulation results show that the proposed system architecture processes low-bit-width AlexNet on the ImageNet dataset at 785.25 μJ/img, consuming ~3× less energy than a recent RRAM-based counterpart, with a ~4× smaller chip area.
  3. BACKGROUND Electromagnetic (EM) waves underpin modern society in profound ways. They are used to carry information, enabling broadcast radio and television, mobile telecommunications, and ubiquitous access to data networks through Wi-Fi, and they form the backbone of our modern broadband internet through optical fibers. In fundamental physics, EM waves serve as an invaluable tool to probe objects from cosmic to atomic scales. For example, the Laser Interferometer Gravitational-Wave Observatory and atomic clocks, which are some of the most precise human-made instruments in the world, rely on EM waves to reach unprecedented accuracies. This has motivated decades of research to develop coherent EM sources over broad spectral ranges with impressive results: Frequencies in the range of tens of gigahertz (radio and microwave regimes) can readily be generated by electronic oscillators. Resonant tunneling diodes enable the generation of millimeter (mm) and terahertz (THz) waves, which span from tens of gigahertz to a few terahertz. At even higher frequencies, up to the petahertz level, which are usually defined as optical frequencies, coherent waves can be generated by solid-state and gas lasers. However, these approaches often suffer from narrow spectral bandwidths, because they usually rely on well-defined energy states of specific materials, which results in a rather limited spectral coverage. To overcome this limitation, nonlinear frequency-mixing strategies have been developed. These approaches shift the complexity from the EM source to nonresonant-based material effects. Particularly in the optical regime, a wealth of materials exist that support effects that are suitable for frequency mixing. Over the past two decades, the idea of manipulating these materials to form guiding structures (waveguides) has provided improvements in efficiency, miniaturization, and production scale and cost and has been widely implemented for diverse applications.
ADVANCES Lithium niobate, a crystal that was first grown in 1949, is a particularly attractive photonic material for frequency mixing because of its favorable material properties. Bulk lithium niobate crystals and weakly confining waveguides have been used for decades for accessing different parts of the EM spectrum, from gigahertz to petahertz frequencies. Now, this material is experiencing renewed interest owing to the commercial availability of thin-film lithium niobate (TFLN). This integrated photonic material platform enables tight mode confinement, which results in frequency-mixing efficiency improvements by orders of magnitude while at the same time offering additional degrees of freedom for engineering the optical properties by using approaches such as dispersion engineering. Importantly, the large refractive index contrast of TFLN enables, for the first time, the realization of lithium niobate–based photonic integrated circuits on a wafer scale. OUTLOOK The broad spectral coverage, ultralow power requirements, and flexibility of lithium niobate photonics in EM wave generation provide a large toolset to explore new device functionalities. Furthermore, the adoption of lithium niobate–integrated photonics in foundries is a promising approach to miniaturize essential bench-top optical systems using wafer-scale production. Heterogeneous integration of active materials with lithium niobate has the potential to create integrated photonic circuits with rich functionalities. Applications such as high-speed communications, scalable quantum computing, artificial intelligence and neuromorphic computing, and compact optical clocks for satellites and precision sensing are expected to particularly benefit from these advances and provide a wealth of opportunities for commercial exploration.
Also, bulk crystals and weakly confining waveguides in lithium niobate are expected to keep playing a crucial role in the near future because of their advantages in high-power and loss-sensitive quantum optics applications. As such, lithium niobate photonics holds great promise for unlocking the EM spectrum and reshaping information technologies for our society in the future. Lithium niobate spectral coverage: The EM spectral range and processes for generating EM frequencies when using lithium niobate (LN) for frequency mixing. AO, acousto-optic; AOM, acousto-optic modulation; χ (2), second-order nonlinearity; χ (3), third-order nonlinearity; EO, electro-optic; EOM, electro-optic modulation; HHG, high-harmonic generation; IR, infrared; OFC, optical frequency comb; OPO, optical parametric oscillator; OR, optical rectification; SCG, supercontinuum generation; SHG, second-harmonic generation; UV, ultraviolet.
  4. Ultra-fast & low-power superconductor single-flux-quantum (SFQ)-based CNN systolic accelerators are built to enhance CNN inference throughput. However, shift-register (SHIFT)-based scratchpad memory (SPM) arrays prevent an SFQ CNN accelerator from exceeding 40% of its peak throughput, due to the lack of random access capability. This paper first documents our study of a variety of cryogenic memory technologies, including Vortex Transition Memory (VTM), Josephson-CMOS SRAM, MRAM, and Superconducting Nanowire Memory, during which we found that none of the aforementioned technologies enables an SFQ CNN accelerator to achieve high throughput, small area, and low power simultaneously. Second, we present a heterogeneous SPM architecture, SMART, composed of SHIFT arrays and a random access array, to improve the inference throughput of an SFQ CNN systolic accelerator. Third, we propose a fast, low-power, and dense pipelined random access CMOS-SFQ array by building SFQ passive-transmission-line-based H-Trees that connect CMOS sub-banks. Finally, we create an ILP-based compiler to deploy CNN models on SMART. Experimental results show that, with the same chip area overhead, compared to the latest SHIFT-based SFQ CNN accelerator, SMART improves the inference throughput by 3.9× (2.2×), and reduces the inference energy by 86% (71%) when inferring a single image (a batch of images).
  5. Large Convolutional Neural Networks (CNNs) are often pruned and compressed to reduce the amount of parameters and the memory requirement. However, the resulting irregularity in the sparse data makes it difficult for FPGA accelerators that contain systolic arrays of Multiply-and-Accumulate (MAC) units, such as Intel's FPGA-based Deep Learning Accelerator (DLA), to achieve their maximum potential. Moreover, FPGAs with low-bandwidth off-chip memory cannot satisfy the memory bandwidth requirement for sparse matrix computation. In this paper, we present 1) a sparse matrix packing technique that condenses sparse inputs and filters before feeding them into the systolic array of MAC units in the Intel DLA, and 2) a customization of the Intel DLA that allows the FPGA to efficiently utilize a high bandwidth memory (HBM2) integrated in the same package. For end-to-end inference with randomly pruned ResNet-50/MobileNet CNN models, our experiments demonstrate 2.7x/3x performance improvement compared to an FPGA with DDR4, 2.2x/2.1x speedup against a server-class Intel SkyLake CPU, and comparable performance with 1.7x/2x power efficiency gain as compared to an NVIDIA V100 GPU.
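The bit-wise decomposition behind item 2 (IMCE) can be sketched in software. This is an illustrative analogue only, not the SOT-MRAM circuit: a dot product of m-bit weights and n-bit activations decomposes into bitwise AND plus popcount over every pair of bit-planes, with each partial popcount shifted by its combined bit significance.

```python
# Illustrative sketch of bit-wise low-bit-width convolution (software analogue
# of IMCE's in-memory AND/popcount approach; names and layout are hypothetical).
# sum_k w_k * a_k = sum_i sum_j 2^(i+j) * popcount(w_plane_i AND a_plane_j)

def bitwise_dot(weights, acts, w_bits, a_bits):
    """Dot product of unsigned low-bit-width vectors via AND + popcount."""
    total = 0
    for i in range(w_bits):
        # Pack bit i of every weight into one mask (one "bit-plane").
        w_plane = sum(((w >> i) & 1) << k for k, w in enumerate(weights))
        for j in range(a_bits):
            a_plane = sum(((a >> j) & 1) << k for k, a in enumerate(acts))
            # AND selects positions where both bits are 1; shift by i+j.
            total += bin(w_plane & a_plane).count("1") << (i + j)
    return total

# Example with 2-bit operands: [3,1,2] . [2,3,1] = 6 + 3 + 2 = 11
print(bitwise_dot([3, 1, 2], [2, 3, 1], 2, 2))  # -> 11
```

Each AND-plus-popcount step maps naturally onto a parallel computational memory sub-array, which is what lets the convolution stay inside memory.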
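The idea behind the sparse matrix packing in item 5 can also be sketched minimally (a hypothetical format for illustration, not Intel DLA's actual packing scheme): zeros are dropped and only (index, value) pairs of nonzeros are kept, so the MAC units process only useful work.

```python
# Minimal sketch of condensing a sparse filter row before feeding MAC units.
# The (index, value) pair format here is hypothetical, chosen only to show
# why packing removes wasted multiply-by-zero cycles.

def pack_sparse(row):
    """Condense a dense row into (index, value) pairs of its nonzeros."""
    return [(i, v) for i, v in enumerate(row) if v != 0]

def sparse_dot(packed, dense_vec):
    """Dot product using only the packed nonzero entries."""
    return sum(v * dense_vec[i] for i, v in packed)

# Example: a 75%-pruned row does 2 MACs instead of 4.
packed = pack_sparse([0, 3, 0, 2])        # -> [(1, 3), (3, 2)]
print(sparse_dot(packed, [1, 2, 3, 4]))   # -> 3*2 + 2*4 = 14
```

The index stream is what creates the irregular memory access pattern the abstract mentions, which is why high-bandwidth memory (HBM2) becomes important once the zeros are squeezed out.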