skip to main content
US FlagAn official website of the United States government
dot gov icon
Official websites use .gov
A .gov website belongs to an official government organization in the United States.
https lock icon
Secure .gov websites use HTTPS
A lock ( lock ) or https:// means you've safely connected to the .gov website. Share sensitive information only on official, secure websites.


Title: DEEP: Developing Extremely Efficient Runtime On-Chip Power Meters
Accurate and efficient on-chip power modeling is crucial to runtime power, energy, and voltage management. Such power monitoring can be achieved by designing and integrating on-chip power meters (OPMs) into the target design. In this work, we propose a new method named DEEP to automatically develop extremely efficient OPM solutions for a given design. DEEP selects OPM inputs from all individual bits in RTL signals. Such bit-level selection provides an unprecedentedly large number of input candidates and supports lower hardware cost, compared with signal-level selection in prior works. In addition, DEEP proposes a powerful two-step OPM input selection method, and it supports reporting both total power and the power of major design components. Experiments on a commercial microprocessor demonstrate that DEEP’s OPM solution achieves correlation 𝑅 > 0.97 in per-cycle power prediction with an unprecedented low area overhead on hardware, i.e., < 0.1% of the microprocessor layout. This reduces the OPM hardware cost by 4 − 6× compared with the state-of-the-art solution.  more » « less
Award ID(s):
2106828
PAR ID:
10435297
Author(s) / Creator(s):
; ; ; ; ; ;
Date Published:
Journal Name:
2022 IEEE/ACM International Conference On Computer Aided Design (ICCAD)
Page Range / eLocation ID:
1 to 9
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. With the rising adoption of deep neural networks (DNNs) for commercial and high-stakes applications that process sensitive user data and make critical decisions, security concerns are paramount. An adversary can undermine the confidentiality of user input or a DNN model, mislead a DNN to make wrong predictions, or even render a machine learning application unavailable to valid requests. While security vulnerabilities that enable such exploits can exist across multiple levels of the technology stack that supports machine learning applications, the hardware-level vulnerabilities can be particularly problematic. In this article, we provide a comprehensive review of the hardware-level vulnerabilities affecting domain-specific DNN inference accelerators and recent progress in secure hardware design to address these. As domain-specific DNN accelerators have a number of differences compared to general-purpose processors and cryptographic accelerators where the hardware-level vulnerabilities have been thoroughly investigated, there are unique challenges and opportunities for secure machine learning hardware. We first categorize the hardware-level vulnerabilities into three scenarios based on an adversary’s capability: 1) an adversary can only attack the off-chip components, such as the off-chip DRAM and the data bus; 2) an adversary can directly attack the on-chip structures in a DNN accelerator; and 3) an adversary can insert hardware trojans during the manufacturing and design process. For each category, we survey recent studies on attacks that pose practical security challenges to DNN accelerators. Then, we present recent advances in the defense solutions for DNN accelerators, addressing those security challenges with circuit-, architecture-, and algorithm-level techniques. 
    more » « less
  2. The continuous growth of CNN complexity not only intensifies the need for hardware acceleration but also presents a huge challenge. That is, the solution space for CNN hardware design and dataflow mapping becomes enormously large besides the fact that it is discrete and lacks a well behaved structure. Most previous works either are stochastic metaheuristics, such as genetic algorithm, which are typically very slow for solving large problems, or rely on expensive sampling, e.g., Gumbel Softmax-based differentiable optimization and Bayesian optimization. We propose an analytical model for evaluating power and performance of CNN hardware design and dataflow solutions. Based on this model, we introduce a co-optimization method consisting of nonlinear programming and parallel local search. A key innovation in this model is its matrix form, which enables the use of deep learning toolkit for highly efficient computations of power/performance values and gradients in the optimization. In handling power-performance tradeoff, our method can lead to better solutions than minimizing a weighted sum of power and latency. The average relative error of our model compared with Timeloop is as small as 1%. Compared to state-of-the-art methods, our approach achieves solutions with up to 1.7 × shorter inference latency, 37.5% less power consumption, and 3 × less area on ResNet 18. Moreover, it provides a 6.2 × speedup of optimization 
    more » « less
  3. Post-silicon validation is a vital step in System-on-Chip (SoC) design cycle. A major challenge in post-silicon validation is the limited observability of internal signal states using trace buffers. Hardware assertions are promising to improve observability during post-silicon debug. Unfortunately, we cannot synthesize thousands (or millions) of pre-silicon assertions as hardware checkers (coverage monitors) due to hardware overhead constraints. Therefore, we need to select the most profitable assertions based on design constraints. However, the design constraints can also change dynamically during the device lifetime due to changes in use-case scenarios as well as input variations. Therefore, assertion selection needs to dynamically adapt based on changing circumstances. In this article, we propose an assertion-based post-silicon validation framework to address the above challenges. Specifically, this article makes two important contributions. We propose a clustering-based assertion selection technique that can select the most profitable pre-silicon assertions to maximize the fault coverage. We also present a cost-aware dynamic refinement technique that can select beneficial hardware checkers during runtime based on changing design constraints. Experimental evaluation demonstrates that our proposed pre-silicon assertion selection can outperform state-of-the-art assertion ranking methods (Goldmine and HARM). The results also highlight that our proposed post-silicon dynamic refinement can accurately predict area (less than 5% error) and power consumption (less than 3% error) of hardware checkers at runtime. This accurate prediction enables the identification of the most profitable hardware checkers based on design constraints. 
    more » « less
  4. Abstract We present a novel photonic chip design for high bandwidth four-degree optical switches that support high-dimensional switching mechanisms with low insertion loss and low crosstalk in a low power consumption level and a short switching time. Such four-degree photonic chips can be used to build an integrated full-grid Photonic-on-Chip Network (PCN). With four distinct input/output directions, the proposed photonic chips are superior compared to the current bidirectional photonic switches, where a conventionally sizable PCN can only be constructed as a linear chain of bidirectional chips. Our four-directional photonic chips are more flexible and scalable for the design of modern optical switches, enabling the construction of multi-dimensional photonic chip networks that are widely applied for intra-chip communication networks and photonic data centers. More noticeably, our photonic networks can be self-controlling with our proposed Multi-Sample Discovery model, a deep reinforcement learning model based on Proximal Policy Optimization. On a PCN, we can optimize many criteria such as transmission loss, power consumption, and routing time, while preserving performance and scaling up the network with dynamic changes. Experiments on simulated data demonstrate the effectiveness and scalability of the proposed architectural design and optimization algorithm. Perceivable insights make the constructed architecture become the self-controlling photonic-on-chip networks. 
    more » « less
  5. Today's CMOS technologies allow larger circuit designs to fit on a single chip. However, this advantage comes at a high price of increased process-voltage-temperature (PVT) variations. FPGAs and their designs are no exceptions to such variations. In fact, the same bit file loaded into two different FPGAs of the same model can produce a significant difference in power and thermal characteristics due to variations that exist within the chip. Since it is increasingly difficult to control physical variations through manufacturing tasks, there is a need for practical ways to sense chip variations to provide a way for circuit designers to compensate or avoid its negative effects. One of the most critical aspects of such variation is power. Therefore, we developed and demonstrated a high accuracy on-chip on-line Energy-per-Component (EPC) measurement technology on Xilinx FPGAs since 2011. However, we found that the hardware overhead associated with such method limited the use of the technology. Therefore, our follow-up work in Energy-per-Operation (EPO) on Spartan FPGA with OpenRISC SoC produced an equally accurate power monitoring technology with drastically lower hardware overhead. While this method made our technology more practical for SoC designs on FPGAs, it did not produce component level power dissipation data that previous EPC method provided. Therefore, we extend this prior work with a new algorithm to extract EPC values from EPO result. Despite the lower hardware overhead, this change ended up improving the accuracy of the power result by unraveling the instruction-level abstraction into component-level energy consumption. 
    more » « less