Brain-inspired HyperDimensional (HD) computing emulates cognitive tasks by computing with long binary vectors–aka hypervectors–as opposed to computing with numbers. However, we observed that in order to provide acceptable classification accuracy on practical applications, HD algorithms need to be trained and tested on non-binary hypervectors. In this paper, we propose SearcHD, a fully binarized HD computing algorithm with a fully binary training. SearcHD maps every data points to a high-dimensional space with binary elements. Instead of training an HD model with non-binary elements, SearcHD implements a full binary training method which generates multiple binary hypervectors for each class. We also use the analog characteristic of non-volatile memories (NVMs) to perform all encoding, training, and inference computations in memory. We evaluate the efficiency and accuracy of SearcHD on a wide range of classification applications. Our evaluation shows that SearcHD can provide on average 31.1× higher energy efficiency and 12.8× faster training as compared to the state-of-the-art HD computing algorithms.
more »
« less
QuantHD: A Quantization Framework for Hyperdimensional Computing
Brain-inspired Hyperdimensional (HD) computing models cognition by exploiting properties of high dimensional statistics– high-dimensional vectors, instead of working with numeric values used in contemporary processors. A fundamental weakness of existing HD computing algorithms is that they require to use floating point models in order to provide acceptable accuracy on realistic classification problems. However, working with floating point values significantly increases the HD computation cost. To address this issue, we proposed QuantHD, a novel framework for quantization of HD computing model during training. QuantHD enables HD computing to work with a low-cost quantized model (binary or ternary model) while providing a similar accuracy as the floating point model. We accordingly propose an FPGA implementation which accelerates HD computing in both training and inference phases. We evaluate QuantHD accuracy and efficiency on various real-world applications, and observe that QuantHD can achieve on average 17.2% accuracy improvement as compared to the existing binarized HD computing algorithms which provide a similar computation cost. In terms of efficiency, QuantHD FPGA implementation can achieve on average 42.3× and 4.7× (34.1× and 4.1×) energy efficiency improvement and speedup during inference (training) as compared to the state-of-the-art HD computing algorithms.
more »
« less
- Award ID(s):
- 1911095
- PAR ID:
- 10166199
- Date Published:
- Journal Name:
- IEEE transactions on computeraided design of integrated circuits and systems
- ISSN:
- 1937-4151
- Format(s):
- Medium: X
- Sponsoring Org:
- National Science Foundation
More Like this
-
-
Processing large amounts of data, especially in learning algorithms, poses a challenge for current embedded computing systems. Hyperdimensional (HD) computing (HDC) is a brain-inspired computing paradigm that works with high-dimensional vectors called hypervectors . HDC replaces several complex learning computations with bitwise and simpler arithmetic operations at the expense of an increased amount of data due to mapping the data into high-dimensional space. These hypervectors, more often than not, cannot be stored in memory, resulting in long data transfers from storage. In this article, we propose Store-n-Learn, an in-storage computing solution that performs HDC classification and clustering by implementing encoding, training, retraining, and inference across the flash hierarchy. To hide the latency of training and enable efficient computation, we introduce the concept of batching in HDC. We also present on-chip acceleration for HDC encoding in flash planes. This enables us to exploit the high parallelism provided by the flash hierarchy and encode multiple data points in parallel in both batched and non-batched fashion. Store-n-Learn also implements a single top-level FPGA accelerator with novel implementations for HDC classification training, retraining, inference, and clustering on the encoded data. Our evaluation over 10 popular datasets shows that Store-n-Learn is on average 222× (543×) faster than CPU and 10.6× (7.3×) faster than the state-of-the-art in-storage computing solution, INSIDER for HDC classification (clustering).more » « less
-
There have been many recent attempts to extend the successes of convolutional neural networks (CNNs) from 2-dimensional (2D) image classification to 3-dimensional (3D) video recognition by exploring 3D CNNs. Considering the emerging growth of mobile or Internet of Things (IoT) market, it is essential to investigate the deployment of 3D CNNs on edge devices. Previous works have implemented standard 3D CNNs (C3D) on hardware platforms, however, they have not exploited model compression for acceleration of inference. This work proposes a hardware-aware pruning approach that can fully adapt to the loop tiling technique of FPGA design and is applied onto a novel 3D network called R(2+1)D. Leveraging the powerful ADMM, the proposed pruning method achieves simultaneous high accuracy and significant acceleration of computation on FPGA. With layer-wise pruning rates up to 10× and negligible accuracy loss, the pruned model is implemented on a Xilinx ZCU102 FPGA board, where the pruned model achieves 2.6× speedup compared with the unpruned version, and 2.3× speedup and 2.3× power efficiency improvement compared with state-of-the-art FPGA implementation of C3D.more » « less
-
Computational imaging systems with embedded processing have potential advantages in power consumption, computing speed, and cost. However, common processors in embedded vision systems have limited computing capacity and low level of parallelism. The widely used iterative algorithms for image reconstruction rely on floating-point processors to ensure calculation precision, which require more computing resources than fixed-point processors. Here we present a regularized Landweber fixed-point iterative solver for image reconstruction, implemented on a field programmable gated array (FPGA). Compared with floating-point embedded uniprocessors, iterative solvers implemented on the fixed-point FPGA gain 1 to 2 orders of magnitude acceleration, while achieving the same reconstruction accuracy in comparable number of effective iterations. Specifically, we have demonstrated the proposed fixed-point iterative solver in fiber borescope image reconstruction, successfully correcting the artifacts introduced by the lenses and fiber bundle.more » « less
-
In-memory computing (IMC) provides energy- efficient solutions to deep neural networks (DNN). Most IMC de- signs for DNNs employ fixed-point precisions. However, floating- point precision is still required for DNN training and complex inference models to maintain high accuracy. There have not been float-point precision based IMC works in the literature where the float-point computation is immersed into the weight memory storage. In this work, we propose a novel floating-point precision IMC macro with a configurable architecture that supports both normal 8-bit floating point (FP8) and 8-bit block floating point (BF8) with a shared exponent. The proposed FP-IMC macro implemented in 28nm CMOS demonstrates 12.1 TOPS/W for FP8 precision and 66.6 TOPS/W for BF8 precision, improving energy-efficiency beyond the state-of-the-art FP IMC macros.more » « less
An official website of the United States government

