Search for: All records

Award ID contains: 2113928


  1. This work proposes a new dynamic thermal and reliability management framework that uses task mapping and migration to improve the thermal performance and reliability of commercial multi-core processors, taking workload-dependent thermal hot spot stress into account. The new method is motivated by the observation that different workloads activate different spatial power and thermal hot spots within each core of a processor. Existing run-time thermal management, which relies on on-chip thermal sensors at fixed locations, can lead to suboptimal management decisions because the temperatures reported by those sensors may not be those of the true hot spots. The new method, called Hot-Trim, uses a machine learning-based approach to characterize the power density hot spots across each core; a new task mapping/migration scheme is then developed based on the hot spot stresses. Compared to existing works, the new approach is the first to optimize VLSI reliability by exploring workload-dependent power hot spots. The advantages of the proposed method over the Linux baseline task mapping and the temperature-based mapping method are demonstrated and validated on real commercial chips. Experiments on a real Intel Core i7 quad-core processor executing PARSEC-3.0 and SPLASH-2 benchmarks show that, compared to the existing Linux scheduler, core and hot spot temperatures can be lowered by 1.15 to 1.31 °C. In addition, Hot-Trim improves the chip's EM-, NBTI-, and HCI-related reliability by 30.2%, 7.0%, and 31.1%, respectively, compared to the Linux baseline without any performance degradation. Furthermore, compared to the conventional temperature-based mapping technique, it improves EM- and HCI-related reliability by 29.6% and 19.6%, respectively, while reducing the temperature by a further half a degree.
  2. Stochastic computing (SC) can lead to area-efficient implementations of logic designs. Existing SC multiplication, however, suffers from a long-standing problem: large multiplication error with small inputs, due to the intrinsic nature of bit-stream-based computing. In this article, we propose a new scaled counting-based SC multiplication approach, called Scaled-CBSC, that mitigates this issue by introducing scaling bits to ensure that the density of '1' bits in the stochastic number is sufficiently large. The idea is to convert "small" inputs to "large" inputs and thus improve the accuracy of SC multiplication. Unlike an existing bit-stream-based approach, however, the new method uses the binary format and does not require stochastic addition, as the SC multiplication always starts with binary numbers. Furthermore, Scaled-CBSC only requires all the numbers to be larger than 0.5 instead of an arbitrarily defined threshold, which leads to integer-only values for the scaling term. The experimental results show that 8-bit Scaled-CBSC multiplication with 3 scaling bits can achieve up to 46.6% and 30.4% improvements in mean error and standard deviation, respectively; reduce the peak relative error from 100% to 1.8%; and improve delay, area, area-delay product, and energy consumption by 12.6%, 51.5%, 57.6%, and 58.4%, respectively, over the state-of-the-art work.
  3. 2.5D chiplet-based technology promises an efficient integration technique for advanced designs with more functionality and higher performance. Temperature, and the related thermal optimization and heat removal, are of critical importance for temperature-aware physical synthesis of chiplets. This paper presents a novel graph convolutional network (GCN) architecture to estimate the thermal map of 2.5D chiplet-based systems, with the thermal resistance networks built by the compact thermal model (CTM). First, we take the total power of all chiplets as an input feature, which is a global feature. This additional global information overcomes the limitation that a GCN can only extract local information via neighborhood aggregation. Second, inspired by convolutional neural networks (CNN), we add skip connections to the GCN to pass the global feature directly across the hidden layers with a concatenation operation. Third, to account for edge embedding features, we propose an edge-based attention mechanism based on graph attention networks (GAT). Last, with the multiple aggregators and scalers of principal neighborhood aggregation (PNA) networks, we further improve the modeling capacity of the novel GCN. The experimental results show that the proposed GCN model achieves an average RMSE of 0.31 K and delivers a 2.6× speedup over the fast steady-state solver of the open-source HotSpot based on SuperLU. More importantly, the GCN model demonstrates useful generalization, or transfer, capability. Our results show that, through inductive learning, the trained GCN can be applied directly, without retraining, to predict thermal maps of six unseen datasets with acceptable mean RMSEs below 0.67 K.
  4. In this article, we address the problem of accurate full-chip power and thermal map estimation for commercial off-the-shelf multi-core processors. Such estimation remains challenging for processors operating with heat sink cooling because direct measurement is difficult. We first propose an accurate full-chip steady-state power density map estimation method for commercial multi-core microprocessors. The new method consists of a few steps. First, a 2D spatial Laplace operation is performed on the thermal maps (images) measured without a heat sink to obtain the so-called "raw power maps". Then, a novel scheme is developed to generate the true power density maps from the raw power maps. The new approach is based on thermal measurements of the processor with back-side cooling using an advanced infrared (IR) thermal imaging system. An FEM thermal model constructed in COMSOL Multiphysics is used to validate the estimated power density maps and thermal conductivity. Next, this work creates a high-fidelity FEM thermal model with a heat sink and reconstructs the full-chip thermal maps while the heat sink is on. With the power maps kept similar under the back cooling and heat sink cooling settings, the reconstructed thermal maps are verified by matching the on-chip thermal sensor readings against the corresponding elements of the thermal maps. Experiments on an Intel i7-8650U 4-core processor with back cooling show 96% similarity (2D correlation) between the measured thermal maps and the thermal maps reconstructed from the estimated power maps, with an average absolute error of 1.3 °C. Under heat sink cooling, the average absolute error is 2.2 °C over a 56 °C temperature range, with about 3.9% error between the computed and the real thermal maps at the sensor locations. Furthermore, the proposed power map estimation method achieves higher resolution and at least a 100× speedup over a recently proposed state-of-the-art Blind Power Identification method.
  5. In this paper, we propose a novel transient full-chip thermal map estimation method for multi-core commercial CPUs based on a data-driven generative adversarial learning method. We treat the thermal modeling problem as an image-generation problem using generative neural networks. Instead of using traditional functional unit powers as input, the new models are based directly on the measurable real-time high-level chip utilizations and thermal sensor information of commercial chips, without assuming any additional physical sensors. The resulting thermal map estimation method, called ThermGAN, can provide tool-accurate full-chip transient thermal maps from the given performance monitor traces of commercial off-the-shelf multi-core processors. In our work, both the generator and the discriminator are composed of simple convolutional layers, with the Wasserstein distance as the loss function. ThermGAN can provide transient, real-time thermal maps without using any historical data for training and inference, in contrast with a recent RNN-based thermal map estimation method in which historical data is needed. Experimental results show that the trained model is very accurate in thermal estimation, with an average RMSE of 0.47 °C, i.e., 0.63% full-scale error. Our data further show that the model takes less than 7.5 ms per inference, which is two orders of magnitude faster than traditional finite element based thermal analysis. Furthermore, the new method is about 4× more accurate than a recently proposed LSTM-based thermal map estimation method and has a faster inference speed. It also achieves about 2× better accuracy at much lower computational cost than a state-of-the-art pre-silicon based estimation method.
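The small-input problem described in record 2 can be illustrated with a toy simulation. This is a minimal sketch, not the Scaled-CBSC circuit: it assumes conventional unipolar stochastic multiplication (a bitwise AND of two independent random bit-streams) and a simple power-of-two pre-scaling that doubles each input until it reaches 0.5. The function names, the stream length of 256, and the cap of 3 doublings are illustrative assumptions.

```python
import random

def to_stream(p, length, rng):
    """Encode a value p in [0, 1] as a random bit-stream with P(bit = 1) = p."""
    return [1 if rng.random() < p else 0 for _ in range(length)]

def sc_multiply(a, b, length, rng):
    """Conventional unipolar SC multiply: AND two independent streams, count the ones."""
    sa = to_stream(a, length, rng)
    sb = to_stream(b, length, rng)
    return sum(x & y for x, y in zip(sa, sb)) / length

def scale_up(x, max_shift=3):
    """Double x until it is at least 0.5 (at most max_shift times); return (x, shifts)."""
    shift = 0
    while x < 0.5 and shift < max_shift:
        x *= 2.0
        shift += 1
    return x, shift

def scaled_sc_multiply(a, b, length, rng):
    """Illustrative scaling: multiply the scaled inputs, then undo the scaling exactly."""
    a_scaled, ka = scale_up(a)
    b_scaled, kb = scale_up(b)
    return sc_multiply(a_scaled, b_scaled, length, rng) / 2 ** (ka + kb)

def mean_relative_error(multiply, a, b, length=256, trials=200, seed=1):
    """Average |estimate - exact| / exact over repeated random trials."""
    rng = random.Random(seed)
    exact = a * b
    total = sum(abs(multiply(a, b, length, rng) - exact) for _ in range(trials))
    return total / trials / exact

if __name__ == "__main__":
    a = b = 0.07  # two "small" inputs
    print("plain :", mean_relative_error(sc_multiply, a, b))
    print("scaled:", mean_relative_error(scaled_sc_multiply, a, b))
```

With small inputs the plain product stream contains only a handful of ones, so each bit carries a large share of the result; after scaling, the stream is much denser and the relative error of the estimate drops.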
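The role of the global feature in record 3 can be sketched with a single, weight-free aggregation step. This is only an illustration under stated assumptions, not the proposed GCN: it uses plain mean aggregation over each node and its neighbors, and the function name and the adjacency-list representation are hypothetical. It shows that a neighborhood mean only sees nearby chiplets, whereas concatenating the total power gives every node the same global information.

```python
def aggregate_with_global(node_power, adjacency):
    """Return [local_mean, total_power] for each node.

    node_power: list of per-chiplet power values (one per graph node).
    adjacency:  adjacency[i] is the list of neighbor indices of node i.
    """
    total_power = sum(node_power)  # global feature shared by all nodes
    features = []
    for i, neighbors in enumerate(adjacency):
        group = [node_power[i]] + [node_power[j] for j in neighbors]
        local_mean = sum(group) / len(group)  # local neighborhood aggregation
        features.append([local_mean, total_power])  # concatenate global feature
    return features

if __name__ == "__main__":
    # Three chiplets in a chain: 0 - 1 - 2
    print(aggregate_with_global([1.0, 2.0, 3.0], [[1], [0, 2], [1]]))
```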
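The first step in record 4, a 2D spatial Laplace operation on a measured thermal map, can be sketched as follows. This is a minimal sketch that assumes the steady-state heat conduction relation q = -k ∇²T and a standard 5-point finite-difference stencil; the conductivity `k`, the pixel pitch `h`, and the function name are illustrative assumptions, and the abstract's second step (turning raw maps into true power density maps) is not reproduced here.

```python
def raw_power_map(temp, k=1.0, h=1.0):
    """Return -k * (discrete Laplacian of temp) on interior cells.

    temp: 2D list of temperatures (rows of equal length).
    Boundary cells are left at 0.0 because the 5-point stencil
    needs all four neighbors.
    """
    rows, cols = len(temp), len(temp[0])
    out = [[0.0] * cols for _ in range(rows)]
    for i in range(1, rows - 1):
        for j in range(1, cols - 1):
            laplacian = (
                temp[i - 1][j] + temp[i + 1][j]
                + temp[i][j - 1] + temp[i][j + 1]
                - 4.0 * temp[i][j]
            ) / (h * h)
            out[i][j] = -k * laplacian
    return out

if __name__ == "__main__":
    # Flat 40-degree map with a single 45-degree hot pixel in the center.
    thermal = [[40.0] * 5 for _ in range(5)]
    thermal[2][2] = 45.0
    for row in raw_power_map(thermal):
        print(row)
```

In this sketch the hot pixel yields a positive peak while its four neighbors come out negative, so the raw map is not yet a physical power density; that gap is what a post-processing step would have to close.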
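Record 5 states that the generator and discriminator are trained with the Wasserstein distance as the loss function. The sketch below assumes the standard Wasserstein GAN formulation, in which the discriminator (critic) outputs unbounded scores; the abstract does not say how the Lipschitz constraint is enforced (weight clipping or gradient penalty), so that part is omitted, and the function names are hypothetical.

```python
def mean(values):
    """Arithmetic mean of a non-empty list of numbers."""
    return sum(values) / len(values)

def critic_loss(real_scores, fake_scores):
    """Critic minimizes E[D(fake)] - E[D(real)], i.e. it widens the score gap."""
    return mean(fake_scores) - mean(real_scores)

def generator_loss(fake_scores):
    """Generator minimizes -E[D(fake)], i.e. it raises the critic's score on fakes."""
    return -mean(fake_scores)

if __name__ == "__main__":
    real = [2.0, 4.0]  # critic scores on measured thermal maps (made-up numbers)
    fake = [1.0, 1.0]  # critic scores on generated thermal maps (made-up numbers)
    print(critic_loss(real, fake), generator_loss(fake))
```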