skip to main content

Title: An online quality management framework for approximate communication in network-on-chips
Approximate communication is being seriously considered as an effective technique for reducing power consumption and improving the communication efficiency of network-on-chips (NoCs). A major problem faced by these techniques is quality control: how do we ensure that the network will transmit data with sufficient accuracy for applications to produce acceptable results? Previous methods that addressed this issue require each application to calculate the approximation level for every piece of approximable data, which takes hundreds of cycles. So the approximation information is often not available when a request packet is transmitted. Therefore, the reply packet with the approximable data is transmitted with unnecessarily absolute accuracy, reducing the effectiveness of approximate communication. In this paper, we propose a hardware-based quality management framework for approximate communication to minimize the time needed for the approximation level calculation. The proposed framework employs a configuration algorithm to continuously adjust the quality of every piece of data based on the difference between the output quality and the application's quality requirement. When the proposed framework is implemented in a network, every request packet can be transmitted with the updated approximation level. This framework results in fewer flits in each data packet and reduces traffic in NoCs while meeting the quality requirements of applications. Our cycle-accurate simulation using the AxBench benchmark suite shows that the proposed online quality management framework can reduce network latency by up to 52% and dynamic power consumption by 59% compared to previous approximate communication techniques while ensuring 95% output quality. This hardware-software codesign incurs 1% area overhead over previous techniques.  more » « less
Award ID(s):
1812495 1901165 1953980
Author(s) / Creator(s):
Date Published:
Journal Name:
ACM International Conference on Supercomputing
Page Range / eLocation ID:
217 to 226
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. Network-on-Chips (NoCs) have emerged as the standard on-chip communication fabrics for multi/many core systems and system on chips. However, as the number of cores on chip increases, so does power consumption. Recent studies have shown that NoC power consumption can reach up to 40% of the overall chip power. Considerable research efforts have been deployed to significantly reduce NoC power consumption. In this paper, we build on approximate computing techniques and propose an approximate communication methodology called DEC-NoC for reducing NoC power consumption. The proposed DEC-NoC leverages applications' error tolerance and dynamically reduces the amount of error checking and correction in packet transmission, which results in a significant reduction in the number of retransmitted packets. The reduction in packet retransmission results in reduced power consumption. Our cycle accurate simulation using PARSEC benchmark suites shows that DEC-NoC achieves up to 56% latency reduction and up to 58% dynamic power reduction compared to NoC architectures with conventional error control techniques. 
    more » « less
  2. As technology scales, Network-on-Chips (NoCs), currently being used for on-chip communication in manycore architectures, face several problems including high network latency, excessive power consumption, and low reliability. Simultaneously addressing these problems is proving to be difficult due to the explosion of the design space and the complexity of handling many trade-offs. In this paper, we propose IntelliNoC, an intelligent NoC design framework which introduces architectural innovations and uses reinforcement learning to manage the design complexity and simultaneously optimize performance, energy-efficiency, and reliability in a holistic manner. IntelliNoC integrates three NoC architectural techniques: (1) multifunction adaptive channels (MFACs) to improve energy-efficiency; (2) adaptive error detection/correction and re-transmission control to enhance reliability; and (3) a stress-relaxing bypass feature which dynamically powers off NoC components to prevent overheating and fatigue. To handle the complex dynamic interactions induced by these techniques, we train a dynamic control policy using Q-learning, with the goal of providing improved fault-tolerance and performance while reducing power consumption and area overhead. Simulation using PARSEC benchmarks shows that our proposed IntelliNoC design improves energy-efficiency by 67% and mean-time-to-failure (MTTF) by 77%, and decreases end-to-end packet latency by 32% and area requirements by 25% over baseline NoC architecture. 
    more » « less
  3. null (Ed.)
    Machine learning applied to architecture design presents a promising opportunity with broad applications. Recent deep reinforcement learning (DRL) techniques, in particular, enable efficient exploration in vast design spaces where conventional design strategies may be inadequate. This paper proposes a novel deep reinforcement framework, taking routerless networks-on-chip (NoC) as an evaluation case study. The new framework successfully resolves problems with prior design approaches, which are either unreliable due to random searches or inflexible due to severe design space restrictions. The framework learns (near-)optimal loop placement for routerless NoCs with various design constraints. A deep neural network is developed using parallel threads that efficiently explore the immense routerless NoC design space with a Monte Carlo search tree. Experimental results show that, compared with conventional mesh, the proposed deep reinforcement learning (DRL) routerless design achieves a 3.25x increase in throughput, 1.6x reduction in packet latency, and 5x reduction in power. Compared with the state-of-the-art routerless NoC, DRL achieves a 1.47x increase in throughput, 1.18x reduction in packet latency, 1.14x reduction in average hop count, and 6.3% lower power consumption. 
    more » « less
  4. With the expansion of sensor nodes to newer avenues of technologies, such as the Internet of things (IoT), internet of bodies (IoB), augmented reality (AR), and mixed reality, the demand to support high-speed operations, such as audio and video, with a minimal increase in power consumption is gaining much traction. In this work, we focus on these nodes operating in audio-based AR (AAR) and explore the opportunity of supporting audio at a low power budget. For sensor nodes, communicating one bit of data usually consumes significantly higher power than the power associated with sensing and processing/computing one data bit. Compressing the number of communication bits at the expense of a few computation cycles considerably reduces the overall power consumption of the nodes. Audio codecs such as AAC and LDAC that currently perform compression and decompression of audio streams burn significant power and create a floor to the minimum power possible in these applications. Compressive sensing (CS), a powerful mathematical tool for compression, is often used in physiological signal sensing, such as EEG and ECG, and it can offer a promising low-power alternative to audio codecs. We introduce a new paradigm of using the CS-based approach to realize audio compression that can function as a new independent technique or augment the existing codecs for a higher level of compression. This work, CS-Audio, fabricated in TSMC 65-nm CMOS technology, presents the first CS-based compression, equipped with an ON-chip DWT sparsifier for non-sparse audio signals. The CS design, realized in a pipelined architecture, achieves high data rates and enables a wake-up implementation to bypass computation for insignificant input samples, reducing the power consumption of the hardware. The measurement results demonstrate a 3X-15X reduction in transmitted audio data without a perceivable degradation of audio quality, as indicated by the perceptual evaluation of audio quality mean opinion score (PEAQ MOS) >1.5. The hardware consumes 238 μW power at 0.65 V and 15 Mbps, which is (~20X-40X) lower than audio codecs. 
    more » « less
  5. In this article, we present a low-energy inference method for convolutional neural networks in image classification applications. The lower energy consumption is achieved by using a highly pruned (lower-energy) network if the resulting network can provide a correct output. More specifically, the proposed inference method makes use of two pruned neural networks (NNs), namely mildly and aggressively pruned networks, which are both designed offline. In the system, a third NN makes use of the input data for the online selection of the appropriate pruned network. The third network, for its feature extraction, employs the same convolutional layers as those of the aggressively pruned NN, thereby reducing the overhead of the online management. There is some accuracy loss induced by the proposed method where, for a given level of accuracy, the energy gain of the proposed method is considerably larger than the case of employing any one pruning level. The proposed method is independent of both the pruning method and the network architecture. The efficacy of the proposed inference method is assessed on Eyeriss hardware accelerator platform for some of the state-of-the-art NN architectures. Our studies show that this method may provide, on average, 70% energy reduction compared to the original NN at the cost of about 3% accuracy loss on the CIFAR-10 dataset. 
    more » « less