skip to main content
US FlagAn official website of the United States government
dot gov icon
Official websites use .gov
A .gov website belongs to an official government organization in the United States.
https lock icon
Secure .gov websites use HTTPS
A lock ( lock ) or https:// means you've safely connected to the .gov website. Share sensitive information only on official, secure websites.


Title: Ocelli: Efficient Processing-in-Pixel Array Enabling Edge Inference of Ternary Neural Networks
Convolutional Neural Networks (CNNs), due to their recent successes, have gained lots of attention in various vision-based applications. They have proven to produce incredible results, especially on big data, that require high processing demands. However, CNN processing demands have limited their usage in embedded edge devices with constrained energy budgets and hardware. This paper proposes an efficient new architecture, namely Ocelli includes a ternary compute pixel (TCP) consisting of a CMOS-based pixel and a compute add-on. The proposed Ocelli architecture offers several features; (I) Because of the compute add-on, TCPs can produce ternary values (i.e., −1, 0, +1) regarding the light intensity as pixels’ inputs; (II) Ocelli realizes analog convolutions enabling low-precision ternary weight neural networks. Since the first layer’s convolution operations are the performance bottleneck of accelerators, Ocelli mitigates the overhead of analog buffers and analog-to-digital converters. Moreover, our design supports a zero-skipping scheme to further power reduction; (III) Ocelli exploits non-volatile magnetic RAMs to store CNN’s weights, which remarkably reduces the static power consumption; and finally, (IV) Ocelli has two modes, including sensing and processing. Once the object is detected, the architecture switches to the typical sensing mode to capture the image. Compared to the conventional pixels, it achieves an average 10% efficiency on its lane detection power consumption compared with existing edge detection algorithms. Moreover, considering different CNN workloads, our design shows more than 23% power efficiency over conventional designs, while it can achieve better accuracy.  more » « less
Award ID(s):
2216772 2228028 2216773
PAR ID:
10426802
Author(s) / Creator(s):
; ;
Date Published:
Journal Name:
Journal of Low Power Electronics and Applications
Volume:
12
Issue:
4
ISSN:
2079-9268
Page Range / eLocation ID:
57
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. In the Artificial Intelligence of Things (AIoT) era, always-on intelligent and self-powered visual perception systems have gained considerable attention and are widely used. Thus, this paper proposes TizBin, a low-power processing in-sensor scheme with event and object detection capabilities to eliminate power costs of data conversion and transmission and enable data-intensive neural network tasks. Once the moving object is detected, TizBin architecture switches to the high-power object detection mode to capture the image. TizBin offers several unique features, such as analog convolutions enabling low-precision ternary weight neural networks (TWNN) to mitigate the overhead of analog buffer and analog-to-digital converters. Moreover, TizBin exploits non-volatile magnetic RAMs to store NN’s weights, remarkably reducing static power consumption. Our circuit-to-application co-simulation results for TWNNs demonstrate minor accuracy degradation on various image datasets, while TizBin achieves a frame rate of 1000 and efficiency of ∼1.83 TOp/s/W. 
    more » « less
  2. Recently, Intelligent IoT (IIoT), including various sensors, has gained significant attention due to its capability of sensing, deciding, and acting by leveraging artificial neural networks (ANN). Nevertheless, to achieve acceptable accuracy and high performance in visual systems, a power-delay-efficient architecture is required. In this paper, we propose an ultra-low-power processing in-sensor architecture, namely SenTer, realizing low-precision ternary multi-layer perceptron networks, which can operate in detection and classification modes. Moreover, SenTer supports two activation functions based on user needs and the desired accuracy-energy trade-off. SenTer is capable of performing all the required computations for the MLP's first layer in the analog domain and then submitting its results to a co-processor. Therefore, SenTer significantly reduces the overhead of analog buffers, data conversion, and transmission power consumption by using only one ADC. Additionally, our simulation results demonstrate acceptable accuracy on various datasets compared to the full precision models. 
    more » « less
  3. Energy-efficient image acquisition on the edge is crucial for enabling remote sensing applications where the sensor node has weak compute capabilities and must transmit data to a remote server/cloud for processing. To reduce the edge energy consumption, this paper proposes a sensor-algorithm co-designed system called SNAPPIX, which compresses raw pixels in the analog domain inside the sensor. We use coded exposure (CE) as the in-sensor compression strategy as it offers the flexibility to sample, i.e., selectively expose pixels, both spatially and temporally. SNAPPIX has three contributions. First, we propose a task-agnostic strategy to learn the sampling/exposure pattern based on the classic theory of efficient coding. Second, we co- design the downstream vision model with the exposure pattern to address the pixel-level non-uniformity unique to CE-compressed images. Finally, we propose lightweight augmentations to the image sensor hardware to support our in-sensor CE compres- sion. Evaluating on action recognition and video reconstruction, SNAPPIX outperforms state-of-the-art video-based methods at the same speed while reducing the energy by up to 15.4×. We have open-sourced the code at: https://github.com/horizon- research/SnapPix. 
    more » « less
  4. This paper presents a novel transposed MRAM architecture (WinEdge) specifically optimized for Winograd convolution acceleration in edge computing devices. Leveraging Magnetic Tunnel Junctions (MTJs) with Spin Hall Effect (SHE)-assisted Spin-Transfer Torque (STT) writing, the proposed design enables a single SHE current to simultaneously write data to four MTJs, substantially reducing power consumption. Additionally, the integration of stacked MTJs significantly improves storage density. The proposed WinEdge efficiently supports both standard and transposed data access modes regardless of bit-width, achieving up to 36% lower power, 47% reduced energy consumption, and 28% faster processing speed compared to existing designs. Simulations conducted in 45 nm CMOS technology validate its superiority over conventional SRAM-based solutions for convolutional neural network (CNN) acceleration in resource-constrained edge environments. 
    more » « less
  5. Abstract Optical metasurfaces performing analog image processing – such as spatial differentiation and edge detection – hold the potential to reduce processing times and power consumption, while avoiding bulky 4 F lens systems. However, current designs have been suffering from trade-offs between spatial resolution, throughput, polarization asymmetry, operational bandwidth, and isotropy. Here, we show that dispersion engineering provides an elegant way to design metasurfaces where all these critical metrics are simultaneously optimized. We experimentally demonstrate silicon metasurfaces performing isotropic and dual-polarization edge detection, with numerical apertures above 0.35 and spectral bandwidths of 35 nm around 1500 nm. Moreover, we introduce quantitative metrics to assess the efficiency of these devices. Thanks to the low loss nature and dual-polarization response, our metasurfaces feature large throughput efficiencies, approaching the theoretical maximum for a given NA. Our results pave the way for low-loss, high-efficiency and broadband optical computing and image processing with free-space metasurfaces. 
    more » « less