Real-Time Full-Chip Thermal Tracking: A Post-Silicon, Machine Learning Perspective
In this work, we present a novel approach to real-time tracking of full-chip heatmaps for commercial off-the-shelf microprocessors based on machine-learning. The proposed post-silicon approach, named RealMaps, only uses the existing embedded temperature sensors and workload-independent utilization information, which are available in real-time. Moreover, RealMaps does not require any knowledge of the proprietary design details or manufacturing process-specific information of the chip. Consequently, the methods presented in this work can be implemented by either the original chip manufacturer or a third party alike, and is aimed at supplementing, rather than substituting, the temperature data sensed from the existing embedded sensors. The new approach starts with offline acquisition of accurate spatial and temporal heatmaps using an infrared thermal imaging setup while nominal working conditions are maintained on the chip. To build the dynamic thermal model, a temporal-aware long-short-term-memory (LSTM) neutral network is trained with system-level features such as chip frequency, instruction counts, and other high-level performance metrics as inputs. Instead of a pixel-wise heatmap estimation, we perform 2D spatial discrete cosine transformation (DCT) on the heatmaps so that they can be expressed with just a few dominant DCT coefficients. This allows for the model to be built to estimate just the more »
Authors:
; ; ;
Award ID(s):
Publication Date:
NSF-PAR ID:
10324767
Journal Name:
IEEE Transactions on Computers
Page Range or eLocation-ID:
1 to 1
ISSN:
0018-9340
1. In tis work, we propose a novel approach to real-time estimation of full-chip transient heatmaps for commercial processors based on machine learning. The model derived in this work supplements the temperature data sensed from the existing on-chip sensors, allowing for the development of more robust runtime power and thermal control schemes that can take advantage of the additional thermal information that is otherwise not available. The new approach involves offline acquisition of accurate spatial and temporal heatmaps using an infrared thermal imaging setup while nominal working conditions are maintained on the chip. To build the dynamic thermal model, we apply Long-Short-Term-Memory (LSTM) neutral networks with system-level variables such as chip frequency, instruction counts, and other performance metrics as inputs. To reduce the dimensionality of the model, 2D spatial discrete cosine transformation (DCT) is first performed on the heatmaps so that they can be expressed with just their dominant DCT frequencies. Our study shows that only $6\times 6$ DCT coefficients are required to maintain sufficient accuracy across a variety of workloads. Experimental results show that the proposed approach can estimate the full-chip heatmaps with less than 1.4C root-mean-square-error and take only 19ms for each inference which suits well for real-time use.