In this work, we leverage the uni-polar switching behavior of Spin-Orbit Torque Magnetic Random Access Memory (SOT-MRAM) to develop an efficient digital Computing-in-Memory (CiM) platform named XOR-CiM. XOR-CiM converts typical MRAM sub-arrays into massively parallel computational cores with ultra-high bandwidth, greatly reducing the energy consumption of convolutional layers and accelerating inference for X(N)OR-intensive Binary Neural Networks (BNNs). With inference accuracy similar to digital CiMs, XOR-CiM achieves ∼4.5× higher energy efficiency and 1.8× speed-up compared to recent MRAM-based CiM platforms.
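The kernel that such BNN accelerators parallelize is the XNOR-popcount dot product, which replaces multiply-accumulate arithmetic with bitwise logic. Below is a minimal NumPy sketch of that primitive under the usual convention that bit 1 encodes +1 and bit 0 encodes -1; the function name and encoding are illustrative, not taken from XOR-CiM itself.

```python
import numpy as np

def xnor_popcount_dot(a_bits, w_bits):
    """Binary dot product via XNOR + popcount.

    a_bits, w_bits: 1-D arrays of {0, 1}, where bit 1 encodes +1 and
    bit 0 encodes -1. Returns the dot product of the corresponding
    {-1, +1} vectors without any multiplications.
    """
    n = a_bits.size
    matches = np.count_nonzero(~(a_bits ^ w_bits) & 1)  # XNOR, then popcount
    return 2 * matches - n  # map the match count back to a signed sum

# Example: binarized activations and weights
a = np.array([1, 0, 1, 1, 0, 1], dtype=np.uint8)
w = np.array([1, 1, 0, 1, 0, 0], dtype=np.uint8)
assert xnor_popcount_dot(a, w) == np.dot(2 * a.astype(int) - 1,
                                         2 * w.astype(int) - 1)
```

In an in-memory design, the XNOR is evaluated inside the memory sub-array across an entire row at once, which is where the bandwidth and energy advantages come from.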
A Near-Sensor Processing Accelerator for Approximate Local Binary Pattern Networks
In this work, a high-speed and energy-efficient comparator-based Near-Sensor Local Binary Pattern accelerator architecture (NS-LBP) is proposed to execute a novel local binary pattern deep neural network. First, inspired by recent LBP networks, we design an approximate, hardware-oriented, multiply-accumulate (MAC)-free network named Ap-LBP for efficient feature extraction, further reducing computational complexity. Then, we develop NS-LBP as a processing-in-SRAM unit with a parallel in-memory LBP algorithm to process images near the sensor in a cache, remarkably reducing the power consumed transmitting data to an off-chip processor. Our circuit-to-application co-simulation results on the MNIST and SVHN datasets demonstrate minor accuracy degradation compared to baseline CNN and LBP-network models, while NS-LBP runs at 1.25 GHz with an energy efficiency of 37.4 TOPS/W. NS-LBP reduces energy consumption by 2.2× and execution time by a factor of 4× compared to the best recent LBP-based networks.
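For context, the comparator-only primitive that LBP networks build on is the classic 8-neighbor local binary pattern: each pixel is compared with its neighbors and the comparison bits are packed into an 8-bit code, with no multiplications involved. The NumPy sketch below shows that standard LBP code; it illustrates the MAC-free idea rather than Ap-LBP's exact approximation or the NS-LBP circuit.

```python
import numpy as np

def lbp_codes(img):
    """Classic 8-neighbor local binary pattern for a grayscale image.

    Each interior pixel is compared against its 8 neighbors, and the
    comparison bits are packed into an 8-bit code. Comparisons only,
    no multiply-accumulate operations.
    """
    h, w = img.shape
    center = img[1:h-1, 1:w-1]
    # Neighbor offsets in clockwise order starting at the top-left.
    offsets = [(-1, -1), (-1, 0), (-1, 1), (0, 1),
               (1, 1), (1, 0), (1, -1), (0, -1)]
    codes = np.zeros_like(center, dtype=np.uint8)
    for bit, (dy, dx) in enumerate(offsets):
        neighbor = img[1 + dy:h - 1 + dy, 1 + dx:w - 1 + dx]
        codes |= (neighbor >= center).astype(np.uint8) << bit
    return codes
```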
- PAR ID: 10427746
- Date Published:
- Journal Name: IEEE Transactions on Emerging Topics in Computing
- ISSN: 2376-4562
- Page Range / eLocation ID: 1 to 11
- Format(s): Medium: X
- Sponsoring Org: National Science Foundation
More Like this
Efficient machine learning implementations optimized for inference in hardware have wide-ranging benefits, depending on the application, from lower inference latency to higher data throughput and reduced energy consumption. Two popular techniques for reducing computation in neural networks are pruning, removing insignificant synapses, and quantization, reducing the precision of the calculations. In this work, we explore the interplay between pruning and quantization during the training of neural networks for ultra-low-latency applications targeting high energy physics use cases. Techniques developed for this study have potential applications across many other domains. We study various configurations of pruning during quantization-aware training, which we term quantization-aware pruning, and the effect of techniques like regularization, batch normalization, and different pruning schemes on performance, computational complexity, and information content metrics. We find that quantization-aware pruning yields more computationally efficient models than either pruning or quantization alone for our task. Further, quantization-aware pruning typically performs similarly to or better than other neural architecture search techniques like Bayesian optimization in terms of computational efficiency. Surprisingly, while networks with different training configurations can have similar performance for the benchmark application, the information content in the network can vary significantly, affecting its generalizability.
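A minimal sketch of one quantization-aware-pruning training step, combining magnitude pruning with fake quantization through a straight-through estimator. This is one plausible PyTorch reading of the idea; the sparsity level, bit width, and helper names are illustrative, and the paper's actual schedules, regularization, and pruning schemes differ.

```python
import torch

def fake_quant(w, bits=4):
    """Uniform fake quantization with a straight-through estimator."""
    scale = w.abs().max() / (2 ** (bits - 1) - 1) + 1e-12
    w_q = torch.clamp(torch.round(w / scale),
                      -(2 ** (bits - 1)), 2 ** (bits - 1) - 1) * scale
    return w + (w_q - w).detach()  # forward: quantized; backward: identity

def prune_mask(w, sparsity=0.5):
    """Magnitude mask keeping the largest (1 - sparsity) fraction of weights."""
    k = int(w.numel() * sparsity)
    if k == 0:
        return torch.ones_like(w)
    threshold = w.abs().flatten().kthvalue(k).values
    return (w.abs() > threshold).float()

# One quantization-aware-pruning step for a toy linear layer (illustrative).
w = torch.randn(64, 32, requires_grad=True)
x, y = torch.randn(8, 32), torch.randn(8, 64)
mask = prune_mask(w.detach(), sparsity=0.5)  # prune by magnitude...
w_eff = fake_quant(w * mask, bits=4)         # ...then train through quantization
loss = torch.nn.functional.mse_loss(x @ w_eff.t(), y)
loss.backward()                              # gradients flow to surviving weights
```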
Energy consumption of memory accesses dominates the compute energy in energy-constrained robots, which require a compact 3-D map of the environment to achieve autonomy. Recent mapping frameworks only focused on reducing the map size while incurring significant memory usage during map construction due to the multi-pass processing of each depth image. In this work, we present a memory-efficient continuous occupancy map, named GMMap, that accurately models the 3-D environment using a Gaussian mixture model (GMM). Memory-efficient GMMap construction is enabled by single-pass compression of depth images into local GMMs, which are directly fused into a globally consistent map. By extending Gaussian Mixture Regression (GMR) to model unexplored regions, occupancy probability is computed directly from the Gaussians. Using a low-power ARM Cortex-A57 CPU, GMMap can be constructed in real time at up to 60 images/s. Compared with prior works, GMMap maintains high accuracy while reducing the map size by at least 56%, memory overhead by at least 88%, dynamic random-access memory (DRAM) access by at least 78%, and energy consumption by at least 69%. Thus, GMMap enables real-time 3-D mapping on energy-constrained robots.
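As a rough illustration of computing occupancy directly from Gaussians, the sketch below evaluates a mixture density at a 3-D query point and normalizes it against an assumed free-space density. This is a simplification for intuition only; GMMap's actual GMR extension for unexplored regions is more involved, and all constants here are made up.

```python
import numpy as np
from scipy.stats import multivariate_normal

def occupancy_probability(query, means, covs, weights, free_density):
    """Occupancy estimate at a 3-D query point from a Gaussian mixture map.

    means/covs/weights describe Gaussians fitted to occupied surface
    points; free_density is an assumed density for free space. Occupancy
    comes straight from the Gaussians, with no voxel grid in between.
    """
    occ = sum(w * multivariate_normal.pdf(query, mean=m, cov=c)
              for m, c, w in zip(means, covs, weights))
    return occ / (occ + free_density)

# Toy map: two Gaussians approximating a wall-like surface.
means = [np.array([1.0, 0.0, 0.5]), np.array([1.0, 1.0, 0.5])]
covs = [np.eye(3) * 0.01] * 2
weights = [0.5, 0.5]
p = occupancy_probability(np.array([1.0, 0.1, 0.5]),
                          means, covs, weights, free_density=0.05)
print(p)  # close to 1 near the surface, close to 0 far from it
```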
Dynamic trip optimization in electric rail networks is a relatively unexplored topic. In this paper, we propose a transactive controller comprising an optimization framework and a control algorithm that together enable minimum-cost operation of an electric rail network. The optimization framework minimizes operational costs for a given electricity price by varying the trains' acceleration profiles, and therefore their power consumption and energy costs; constraints imposed by the train dynamics, schedules, and power consumption are included. A control algorithm then optimizes the electricity price through an iterative procedure that combines the desired demand profiles obtained from the optimization framework with variations in Distributed Energy Resources (DERs) while ensuring power balance. Together, they form an overall framework that yields the desired transactions between the railway and power grid infrastructures. This approach is validated using simulation studies of the southbound Amtrak service along the Northeast Corridor (NEC) between Boston, MA and New Haven, CT in the United States, reducing energy costs by 10% compared to standard trip optimization based on minimum work.
With the recent advances in both machine learning and embedded systems research, the demand to deploy computational models for real-time execution on edge devices has increased substantially. Without models deployed on edge devices, the frequent transmission of sensor data to the cloud drains batteries rapidly due to the energy cost of wireless data transmission. This rapid power dissipation considerably reduces the battery lifetime of the system, jeopardizing the real-world utility of smart devices. It is well established that for difficult machine learning tasks, models with higher performance often require more compute power and are thus not power-efficient choices for deployment on edge devices. However, the trade-offs between performance and power consumption are not well studied. While numerous methods (e.g., model compression) have been developed to obtain an optimal model, these methods focus on improving the efficiency of a single model. In an entirely new direction, we introduce an effective method to find a combination of multiple models that is optimal in terms of both power efficiency and performance by solving an optimization problem in which both are taken into account. Experimental results demonstrate that on the ImageNet dataset, we can achieve a 20% energy reduction with only a 0.3% accuracy drop compared to Squeeze-and-Excitation Networks. Compared to a pruned convolutional neural network for human activity recognition, our proposed policy achieves 1.3% higher accuracy while consuming 1.7% less energy.
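One simple way to read "a combination of multiple models" is as a dispatch policy over models with known accuracy/energy profiles, chosen by a small linear program: minimize expected energy subject to an accuracy floor. The sketch below uses made-up numbers and is only an illustration of the idea, not the paper's actual formulation.

```python
import numpy as np
from scipy.optimize import linprog

# Per-model (accuracy, energy-per-inference) pairs -- illustrative numbers.
acc = np.array([0.70, 0.76, 0.81])      # fraction of inputs classified correctly
energy = np.array([1.0, 2.5, 6.0])      # joules per inference

# Dispatch probabilities p minimizing expected energy subject to an
# expected-accuracy floor:  min energy @ p  s.t.  acc @ p >= 0.78, sum(p) = 1.
res = linprog(c=energy,
              A_ub=[-acc], b_ub=[-0.78],      # acc @ p >= 0.78
              A_eq=[np.ones(3)], b_eq=[1.0],  # probabilities sum to 1
              bounds=[(0, 1)] * 3)
print(res.x, energy @ res.x, acc @ res.x)
```

A mixed policy can beat any single model here: routing only some inputs to the largest network meets the accuracy target at a fraction of its energy cost.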