skip to main content

Title: ROBIN: A Robust Optical Binary Neural Network Accelerator
Domain specific neural network accelerators have garnered attention because of their improved energy efficiency and inference performance compared to CPUs and GPUs. Such accelerators are thus well suited for resource-constrained embedded systems. However, mapping sophisticated neural network models on these accelerators still entails significant energy and memory consumption, along with high inference time overhead. Binarized neural networks (BNNs), which utilize single-bit weights, represent an efficient way to implement and deploy neural network models on accelerators. In this paper, we present a novel optical-domain BNN accelerator, named ROBIN , which intelligently integrates heterogeneous microring resonator optical devices with complementary capabilities to efficiently implement the key functionalities in BNNs. We perform detailed fabrication-process variation analyses at the optical device level, explore efficient corrective tuning for these devices, and integrate circuit-level optimization to counter thermal variations. As a result, our proposed ROBIN architecture possesses the desirable traits of being robust, energy-efficient, low latency, and high throughput, when executing BNN models. Our analysis shows that ROBIN can outperform the best-known optical BNN accelerators and many electronic accelerators. Specifically, our energy-efficient ROBIN design exhibits energy-per-bit values that are ∼4 × lower than electronic BNN accelerators and ∼933 × lower than a recently proposed photonic BNN accelerator, while a performance-efficient ROBIN design shows ∼3 × and ∼25 × better performance than electronic and photonic BNN accelerators, respectively.  more » « less
Award ID(s):
2006788 1813370
Author(s) / Creator(s):
; ; ;
Date Published:
Journal Name:
ACM Transactions on Embedded Computing Systems
Page Range / eLocation ID:
1 to 24
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. Binary neural networks (BNNs) substitute complex arithmetic operations with simple bit-wise operations. The binarized weights and activations in BNNs can drastically reduce memory requirement and energy consumption, making it attractive for edge ML applications with limited resources. However, the severe memory capacity and energy constraints of low-power edge devices call for further reduction of BNN models beyond binarization. Weight pruning is a proven solution for reducing the size of many neural network (NN) models, but the binary nature of BNN weights make it difficult to identify insignificant weights to remove. In this paper, we present a pruning method based on latent weight with layer-level pruning sensitivity analysis which reduces the over-parameterization of BNNs, allowing for accuracy gains while drastically reducing the model size. Our method advocates for a heuristics that distinguishes weights by their latent weights, a real-valued vector used to compute the pseudogradient during backpropagation. It is tested using three different convolutional NNs on the MNIST, CIFAR-10, and Imagenette datasets with results indicating a 33%--46% reduction in operation count, with no accuracy loss, improving upon previous works in accuracy, model size, and total operation count. 
    more » « less
  2. In pursuit of higher inference accuracy, deep neural network (DNN) models have significantly increased in complexity and size. To overcome the consequent computational challenges, scalable chiplet-based accelerators have been proposed. However, data communication using metallic-based interconnects in these chiplet-based DNN accelerators is becoming a primary obstacle to performance, energy efficiency, and scalability. The photonic interconnects can provide adequate data communication support due to some superior properties like low latency, high bandwidth and energy efficiency, and ease of broadcast communication. In this paper, we propose SPACX: a Silicon Photonics-based Chiplet ACcelerator for DNN inference applications. Specifically, SPACX includes a photonic network design that enables seamless single-chiplet and cross-chiplet broadcast communications, and a tailored dataflow that promotes data broadcast and maximizes parallelism. Furthermore, we explore the broadcast granularities of the photonic network and implications on system performance and energy efficiency. A flexible bandwidth allocation scheme is also proposed to dynamically adjust communication bandwidths for different types of data. Simulation results using several DNN models show that SPACX can achieve 78% and 75% reduction in execution time and energy, respectively, as compared to other state-of-the-art chiplet-based DNN accelerators. 
    more » « less
  3. Due to the high computational complexity and memory storage requirement, it is hard to directly deploy a full-precision Convolutional neural network (CNN) on embedded devices. The hardware-friendly designs are needed for resource-limited and energy-constrained embedded devices. Emerging solutions are adopted for the neural network compression, e.g., binary/ternary weight network, pruned network and quantized network. Among them, binary neural network (BNN) is believed to be the most hardware-friendly framework due to its small network size and low computational complexity. No existing work has further shrunk the size of BNN. In this work, we explore the redundancy in BNN and build a compact BNN (CBNN) based on the bit-level sensitivity analysis and bit-level data pruning. The input data is converted to a high dimensional bit-sliced format. In the post-training stage, we analyze the impact of different bit slices to the accuracy. By pruning the redundant input bit slices and shrinking the network size, we are able to build a more compact BNN. Our result shows that we can further scale down the network size of the BNN up to 3.9x with no more than a 1% accuracy drop. The actual runtime can be reduced up to 2x and 9.9x compared with the baseline BNN and its full-precision counterpart, respectively. 
    more » « less
  4. As the machine learning and systems communities strive to achieve higher energy-efficiency through custom deep neural network (DNN) accelerators, varied precision or quantization levels, and model compression techniques, there is a need for design space exploration frameworks that incorporate quantization-aware processing elements into the accelerator design space while having accurate and fast power, performance, and area models. In this work, we present QUIDAM , a highly parameterized quantization-aware DNN accelerator and model co-exploration framework. Our framework can facilitate future research on design space exploration of DNN accelerators for various design choices such as bit precision, processing element type, scratchpad sizes of processing elements, global buffer size, number of total processing elements, and DNN configurations. Our results show that different bit precisions and processing element types lead to significant differences in terms of performance per area and energy. Specifically, our framework identifies a wide range of design points where performance per area and energy varies more than 5 × and 35 ×, respectively. With the proposed framework, we show that lightweight processing elements achieve on par accuracy results and up to 5.7 × more performance per area and energy improvement when compared to the best INT16 based implementation. Finally, due to the efficiency of the pre-characterized power, performance, and area models, QUIDAM can speed up the design exploration process by 3-4 orders of magnitude as it removes the need for expensive synthesis and characterization of each design. 
    more » « less
  5. Battery-free and intermittently powered devices offer long lifetimes and enable deployment in new applications and environments. Unfortunately, developing sophisticated inference-capable applications is still challenging due to the lack of platform support for more advanced (32-bit) microprocessors and specialized accelerators---which can execute data-intensive machine learning tasks, but add complexity across the stack when dealing with intermittent power. We present Protean to bridge the platform gap for inference-capable battery-free sensors. Designed for runtime scalability, meeting the dynamic range of energy harvesters with matching heterogeneous processing elements like neural network accelerators. We develop a modular "plug-and-play" hardware platform, SuperSensor, with a reconfigurable energy storage circuit that powers a 32-bit ARM-based microcontroller with a convolutional neural network accelerator. An adaptive task-based runtime system, Chameleon, provides intermittency-proof execution of machine learning tasks across heterogeneous processing elements. The runtime automatically scales and dispatches these tasks based on incoming energy, current state, and programmer annotations. A code generator, Metamorph, automates conversion of ML models to intermittent safe execution across heterogeneous compute elements. We evaluate Protean with audio and image workloads and demonstrate up to 666x improvement in inference energy efficiency by enabling usage of modern computational elements within intermittent computing. Further, Protean provides up to 166% higher throughput compared to non-adaptive baselines. 
    more » « less