Evolution has honed predatory skills in the natural world, where localizing and intercepting fast-moving prey is required. The current generation of robotic systems mimics these biological systems using deep learning. High-speed processing of camera frames using convolutional neural networks (CNNs) (the frame pipeline) on such constrained aerial edge robots becomes resource-limited. Even with more compute resources, throughput is eventually limited by the frame rate of the camera, as frame-only traditional systems fail to capture the detailed temporal dynamics of the environment. Bio-inspired event cameras and spiking neural networks (SNNs) provide an asynchronous sensor-processor pair (the event pipeline) that captures the continuous temporal details of the scene at high speed but lags in accuracy. In this work, we propose a target localization system that combines event-camera and SNN-based high-speed target estimation with frame-camera and CNN-driven reliable object detection, fusing the complementary spatio-temporal strengths of the event and frame pipelines. One of our main contributions is the design of an SNN filter that borrows from the neural mechanism for ego-motion cancellation in houseflies. It fuses the vestibular sensors with vision to cancel the activity corresponding to the predator's self-motion. We also integrate the neuro-inspired multi-pipeline processing with the task-optimized multi-neuronal pathway structure found in primates and insects. The system is validated to outperform CNN-only processing using prey-predator drone simulations in realistic 3D virtual environments. The system is then demonstrated in a real-world multi-drone set-up with emulated event data. Subsequently, we use recorded sensory data from a multi-camera and inertial measurement unit (IMU) assembly to show the desired operation while tolerating realistic noise in the vision and IMU sensors.
We analyze the design space to identify optimal parameters for the spiking neurons and CNN models and to check their effect on the performance metrics of the fused system. Finally, we map the throughput-controlling SNN and fusion network onto an edge-compatible Zynq-7000 FPGA to show a potential 264 outputs per second even under constrained resource availability. This work may open new research directions by coupling multiple sensing and processing modalities inspired by discoveries in neuroscience to break fundamental trade-offs in frame-based computer vision.
Live Demonstration: Hybrid RRAM and SRAM SoC for Fused Frame and Event Target Tracking
Event and frame cameras capture complementary temporal and spatial details of a scene, providing an accuracy vs. latency trade-off. Fusing these modalities, with frames processed by convolutional neural networks (CNNs) and events by spiking neural networks (SNNs), has been shown for target tracking. We present our heterogeneous RRAM compute-in-memory (CIM) and SRAM compute-near-memory (CNM) SoC for simultaneous processing of CNNs and SNNs. We will show the advantage of using fused vision over frame-only vision and demonstrate Python-programmable data streaming. Visitors will be able to see the processing-dependent dynamic power gating of the non-volatile RRAM and the in-memory error correction capability.
- Award ID(s):
- 2153440
- PAR ID:
- 10497849
- Publisher / Repository:
- IEEE
- Date Published:
- ISBN:
- 978-1-6654-5109-3
- Page Range / eLocation ID:
- 1 to 1
- Format(s):
- Medium: X
- Location:
- Monterey, CA, USA
- Sponsoring Org:
- National Science Foundation
More Like this
IEEE (Ed.) Resistive random access memory (RRAM)-based spiking neural networks (SNNs) are becoming increasingly attractive for pervasive energy-efficient classification tasks. However, such networks suffer from degraded performance (as determined by classification accuracy) due to the effects of process variations on fabricated RRAM devices, resulting in a loss of manufacturing yield. To address such yield loss, a two-step approach is developed. First, an alternative test framework is used to predict the performance of fabricated RRAM-based SNNs using the SNN response to a small subset of images from the test image dataset, called the SNN response signature (to minimize test cost). This diagnoses those SNNs that need to be performance-tuned for yield recovery. Next, SNN tuning is performed by modulating the spiking thresholds of the SNN neurons on a layer-by-layer basis using a trained regressor that maps the SNN response signature to the optimal spiking threshold values during tuning. The optimal spiking threshold values are determined by an off-line optimization algorithm. Experiments show that the proposed framework can reduce the number of out-of-spec SNN devices by up to 54% and improve yield by as much as 8.6%.
High-quality 3D image recognition is an important component of many vision and robotics systems. However, the accurate processing of these images requires the use of compute-expensive 3D Convolutional Neural Networks (CNNs). To address this challenge, we propose the use of Spiking Neural Networks (SNNs) that are generated from iso-architecture CNNs and trained with quantization-aware gradient descent to optimize their weights, membrane leak, and firing thresholds. During both training and inference, the analog pixel values of a 3D image are directly applied to the input layer of the SNN without the need to convert to a spike-train. This significantly reduces the training and inference latency and results in a high degree of activation sparsity, which yields significant improvements in computational efficiency. However, this introduces energy-hungry digital multiplications in the first layer of our models, which we propose to mitigate using a processing-in-memory (PIM) architecture. To evaluate our proposal, we propose a 3D and a 3D/2D hybrid SNN-compatible convolutional architecture and choose hyperspectral imaging (HSI) as an application for 3D image recognition. We achieve overall test accuracy of 98.68, 99.50, and 97.95% with 5 time steps (inference latency) and 6-bit weight quantization on the Indian Pines, Pavia University, and Salinas Scene datasets, respectively. In particular, our models implemented using standard digital hardware achieved accuracies similar to state-of-the-art (SOTA) with ~560.6× and ~44.8× less average energy than an iso-architecture full-precision and 6-bit quantized CNN, respectively. Adopting the PIM architecture in the first layer further improves the average energy, delay, and energy-delay-product (EDP) by 30, 7, and 38%, respectively.
Combinatorial optimization problems prevail in engineering and industry. Some are NP-hard and thus become difficult to solve on edge devices due to limited power and computing resources. The Quadratic Unconstrained Binary Optimization (QUBO) problem is a valuable emerging model that can formulate numerous combinatorial problems, such as Max-Cut, traveling salesman problems, and graph coloring. The QUBO model is also compatible with two emerging computation models, quantum computing and neuromorphic computing, which can potentially boost the speed and energy efficiency in solving combinatorial problems. In this work, we design a neuromorphic QUBO solver composed of a swarm of spiking neural networks (SNNs) that conduct a population-based meta-heuristic search for solutions. The proposed model can achieve about 20× to 40× speedup on large QUBO problems in terms of time steps compared to a traditional neural network solver. As a codesign, we evaluate the neuromorphic swarm solver on a 40nm 25mW Resistive RAM (RRAM) Compute-in-Memory (CIM) SoC with a 2.25MB RRAM-based accelerator and an embedded Cortex M3 core. The collaborative SNN swarm can fully exploit the strengths of the CIM accelerator in matrix and vector multiplications. Compared to previous works, such an algorithm-hardware synergized solver exhibits advantageous speed and energy efficiency for edge devices.
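To make the QUBO formulation mentioned in this abstract concrete, here is a minimal Python sketch of how Max-Cut can be written as a QUBO and solved by exhaustive search on a tiny graph. It is only an illustration of the standard textbook encoding, not the SNN swarm solver described in the abstract; the function names (`maxcut_qubo`, `qubo_energy`, `brute_force_min`) and the 4-node example graph are assumptions made for this example.

```python
from itertools import product

def maxcut_qubo(n, edges):
    """Build a symmetric QUBO matrix Q so that x^T Q x = -(cut size) for binary x."""
    Q = [[0] * n for _ in range(n)]
    for i, j in edges:
        # An edge (i, j) is cut when x_i != x_j: x_i + x_j - 2*x_i*x_j.
        # Negating it (to minimize) gives -x_i - x_j + 2*x_i*x_j.
        Q[i][i] -= 1
        Q[j][j] -= 1
        Q[i][j] += 1  # the two symmetric entries together contribute 2*x_i*x_j
        Q[j][i] += 1
    return Q

def qubo_energy(Q, x):
    """Evaluate the quadratic form x^T Q x for a binary assignment x."""
    n = len(x)
    return sum(Q[i][j] * x[i] * x[j] for i in range(n) for j in range(n))

def brute_force_min(Q):
    """Exhaustively search all 2^n assignments (only feasible for very small n)."""
    n = len(Q)
    return min(product((0, 1), repeat=n), key=lambda x: qubo_energy(Q, x))

# Hypothetical example graph: a 4-node cycle 0-1-2-3-0.
edges = [(0, 1), (1, 2), (2, 3), (3, 0)]
Q = maxcut_qubo(4, edges)
best = brute_force_min(Q)
cut_size = -qubo_energy(Q, best)
print(best, cut_size)
```

The exhaustive search grows as 2^n, which is why heuristic solvers such as the population-based search described above are of interest for large problems.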
This work presents the first resistive random access memory (RRAM)-based compute-in-memory (CIM) macro design tailored for genome processing. We analyze and demonstrate two key types of genome processing applications using our developed CIM chip prototype: the state-of-the-art (SOTA) Burrows–Wheeler transform (BWT)-based DNA short-read alignment and alignment-free mRNA quantification. Our CIM macro is designed and optimized to support the major functions essential to these algorithms, e.g., parallel XNOR operations, count, addition, and parallel bit-wise AND operations. The proposed CIM macro prototype is fabricated with monolithic integration of HfO2 RRAM and 65-nm CMOS, achieving 2.07 TOPS/W (tera-operations per second per watt) and 2.12 G suffixes/J (suffixes per joule) at 1.0 V, which is the most energy-efficient solution to date for genome processing.