
Search for: All records

Award ID contains: 2007135


  1. This work proposes a new dynamic thermal and reliability management framework that uses task mapping and migration to improve the thermal performance and reliability of commercial multi-core processors under workload-dependent thermal hot spot stress. The method is motivated by the observation that different workloads activate different spatial power and thermal hot spots within each core. Existing run-time thermal management relies on on-chip thermal sensors at fixed locations, which can lead to suboptimal decisions because the temperatures those sensors report may not be those of the true hot spots. The new method, called Hot-Trim, uses a machine learning-based approach to characterize the power density hot spots across each core, and then applies a new task mapping/migration scheme based on the resulting hot spot stresses. Compared to existing works, the new approach is the first to optimize VLSI reliability by exploiting workload-dependent power hot spots. Its advantages over the Linux baseline task mapping and a temperature-based mapping method are demonstrated and validated on real commercial chips. Experiments on an Intel Core i7 quad-core processor running the PARSEC-3.0 and SPLASH-2 benchmarks show that, compared to the existing Linux scheduler, core and hot spot temperatures are lowered by 1.15 to 1.31 °C. In addition, Hot-Trim improves the chip's EM-, NBTI-, and HCI-related reliability by 30.2%, 7.0%, and 31.1%, respectively, compared to the Linux baseline without any performance degradation. Compared to the conventional temperature-based mapping technique, it improves EM- and HCI-related reliability by 29.6% and 19.6%, respectively, while reducing the temperature by a further half a degree.
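The contrast drawn in the abstract above, between mapping by fixed-sensor temperature and mapping by per-core hot spot stress, can be illustrated with a toy greedy assignment. This is only a sketch under stated assumptions: `greedy_map`, the task loads, the sensor readings, and the stress scores are all hypothetical, the numbers are made up for illustration, and the code is not the Hot-Trim algorithm itself.

```python
def greedy_map(task_load, core_score):
    """Assign the most demanding task to the core with the lowest score,
    the second most demanding task to the next-lowest core, and so on."""
    tasks = sorted(task_load, key=task_load.get, reverse=True)
    cores = sorted(core_score, key=core_score.get)
    return dict(zip(tasks, cores))

# Made-up illustrative numbers, not measurements from the paper.
task_load = {"t0": 9.0, "t1": 4.0, "t2": 2.0, "t3": 1.0}       # relative demand
sensor_temp = {"c0": 61.0, "c1": 63.0, "c2": 60.0, "c3": 62.0}  # fixed-sensor readings
hotspot_stress = {"c0": 0.9, "c1": 0.4, "c2": 0.7, "c3": 0.5}   # estimated hot spot stress

by_sensor = greedy_map(task_load, sensor_temp)
by_hotspot = greedy_map(task_load, hotspot_stress)
print("sensor-based :", by_sensor)
print("hotspot-based:", by_hotspot)
```

With these toy inputs the two criteria rank the cores differently, so the heaviest task lands on a different core in each mapping, which is the situation the abstract describes when sensors miss the true hot spots.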
  2. In this paper, we propose a new spatial-temperature-aware method for transient electromigration (EM)-induced stress analysis. The method makes two contributions. First, we propose, for the first time, a thermomigration (TM)-aware void saturation volume estimation method for fast immortality checks in the post-voiding phase, and we derive an analytic formula to estimate the void saturation volume in the presence of spatial temperature gradients due to Joule heating. Second, we develop a fast numerical solution for EM-induced stress analysis of multi-segment interconnect trees that accounts for the TM effect. The method first transforms the coupled EM-TM partial differential equations into linear time-invariant ordinary differential equations (ODEs). An extended Krylov subspace-based reduction technique is then employed to reduce the size of the original system matrices so that they can be simulated efficiently in the time domain. The proposed method can simulate both the void nucleation and void growth phases under time-varying input currents and position-dependent temperatures. Numerical results show that, compared to the recently proposed semi-analytic EM-TM method, the proposed method achieves about a 28x speedup on average for interconnects with up to 1000 branches in both the void nucleation and growth phases, with negligible error.
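The two-step flow described above (PDE to linear time-invariant ODEs, then Krylov subspace reduction) can be sketched on a single wire. The sketch below rests on several assumptions that are not from the paper: it discretizes only Korhonen's EM equation with a constant diffusivity `kappa` (no temperature gradient or TM term), it uses a simple shift-and-invert rational Krylov basis rather than the extended Krylov subspace method named in the abstract, and all function names and parameter values are hypothetical.

```python
import numpy as np

def korhonen_lti(n=40, length=1.0, kappa=1.0):
    """Finite-volume form of d(sigma)/dt = d/dx[kappa*(d(sigma)/dx + G)] on one
    wire with blocked (zero-flux) ends, written as dx/dt = A x + b G."""
    h = length / n
    A = np.zeros((n, n))
    for i in range(n - 1):
        # the flux across the face between cells i and i+1 couples both cells
        A[i, i] -= 1.0
        A[i, i + 1] += 1.0
        A[i + 1, i + 1] -= 1.0
        A[i + 1, i] += 1.0
    A *= kappa / h**2
    b = np.zeros(n)
    b[0], b[-1] = kappa / h, -kappa / h   # EM driving force enters at the blocked ends
    return A, b

def rational_krylov_basis(A, b, dt, m):
    """Orthonormal basis of span{M b, ..., M^m b} with M = (I - dt*A)^-1."""
    n = A.shape[0]
    lhs = np.eye(n) - dt * A
    V = np.zeros((n, m))
    v = np.linalg.solve(lhs, b)
    for j in range(m):
        for i in range(j):                # modified Gram-Schmidt
            v = v - (V[:, i] @ v) * V[:, i]
        V[:, j] = v / np.linalg.norm(v)
        v = np.linalg.solve(lhs, V[:, j])
    return V

def backward_euler(A, b, u, dt):
    """Integrate dx/dt = A x + b u(t) from x(0) = 0; one row per time step."""
    lhs = np.eye(A.shape[0]) - dt * A
    x = np.zeros(A.shape[0])
    rows = []
    for uk in u:
        x = np.linalg.solve(lhs, x + dt * b * uk)
        rows.append(x)
    return np.array(rows)

A, b = korhonen_lti()
dt, m, steps = 1e-3, 8, 40
V = rational_krylov_basis(A, b, dt, m)
Ar, br = V.T @ A @ V, V.T @ b                 # 8x8 reduced model of the 40x40 system
u = 1.0 + 0.5 * np.sin(0.3 * np.arange(steps))  # time-varying input (arbitrary)
full = backward_euler(A, b, u, dt)
red = backward_euler(Ar, br, u, dt) @ V.T     # lift reduced states back to full size
err_first = np.max(np.abs(full[:m] - red[:m])) / np.max(np.abs(full[:m]))
err_all = np.max(np.abs(full - red)) / np.max(np.abs(full))
print(f"relative error, first {m} steps: {err_first:.2e}; all {steps} steps: {err_all:.2e}")
```

Because the basis is built from the same shifted matrix that backward Euler inverts, the reduced model reproduces the first `m` steps up to round-off. Later steps are only approximated, and their accuracy depends on `m`, `dt`, and the input.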
  3. Electromigration (EM) has become a major concern for VLSI circuits as technology advances into the nanometer regime. EM assessment of VLSI circuits with the Korhonen equations remains challenging because of increasing integration density. VLSI multisegment interconnect trees can be naturally viewed as graphs. Based on this observation, we propose a new graph convolution network (GCN) model, called EMGraph, which considers both node and edge embedding features, to estimate the transient EM stress of interconnect trees. Compared with a recently proposed generative adversarial network (GAN) based stress image-generation method, the EMGraph model learns more transferable knowledge and, through inductive learning, can predict stress distributions on new graphs without retraining. Trained on a large dataset, the model shows less than 1.5% average error compared to the ground truth results and is orders of magnitude faster than both COMSOL and the state-of-the-art method. It also achieves a smaller model size, 4x accuracy, and a 14x speedup over the GAN-based method.
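The idea of treating an interconnect tree as a graph with node and edge features, and of reusing trained weights on unseen trees, can be sketched with a generic message-passing layer. Everything here is an assumption for illustration: the layer form, the feature sizes, the random (untrained) weights, and the names `gcn_layer` and `predict_stress` are hypothetical and do not reproduce the EMGraph architecture, so the outputs carry no physical meaning.

```python
import numpy as np

def gcn_layer(h, e, edges, W_self, W_nbr, W_edge):
    """One message-passing layer: each node averages its neighbors' node
    features and its incident edge features, then applies a ReLU."""
    n = h.shape[0]
    agg_n = np.zeros((n, h.shape[1]))
    agg_e = np.zeros((n, e.shape[1]))
    deg = np.zeros(n)
    for k, (i, j) in enumerate(edges):
        # a wire segment is treated as undirected: messages flow both ways
        agg_n[i] += h[j]
        agg_n[j] += h[i]
        agg_e[i] += e[k]
        agg_e[j] += e[k]
        deg[i] += 1
        deg[j] += 1
    deg = np.maximum(deg, 1)[:, None]
    z = h @ W_self + (agg_n / deg) @ W_nbr + (agg_e / deg) @ W_edge
    return np.maximum(z, 0.0)

def predict_stress(h, e, edges, params):
    """Two layers plus a linear readout: one scalar per node."""
    z = gcn_layer(h, e, edges, *params["layer1"])
    z = gcn_layer(z, e, edges, *params["layer2"])
    return z @ params["readout"]

rng = np.random.default_rng(0)
F, E, H = 2, 2, 8   # node feature, edge feature, and hidden sizes (arbitrary)
params = {           # untrained weights, stand-ins for a trained model
    "layer1": (rng.normal(size=(F, H)), rng.normal(size=(F, H)), rng.normal(size=(E, H))),
    "layer2": (rng.normal(size=(H, H)), rng.normal(size=(H, H)), rng.normal(size=(E, H))),
    "readout": rng.normal(size=H),
}

# Two trees of different sizes share the same weights.
edges_a = [(0, 1), (1, 2), (1, 3)]
h_a, e_a = rng.normal(size=(4, F)), rng.normal(size=(3, E))
edges_b = [(0, 1), (1, 2), (2, 3), (2, 4), (4, 5)]
h_b, e_b = rng.normal(size=(6, F)), rng.normal(size=(5, E))

out_a = predict_stress(h_a, e_a, edges_a, params)
out_b = predict_stress(h_b, e_b, edges_b, params)
print(out_a.shape, out_b.shape)
```

The weight shapes depend only on the feature sizes, not on the number of nodes, which is why the same parameters apply to both trees. That property is what makes inductive prediction on new graphs possible in this style of model.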
  4. In this paper, we propose a new dynamic reliability technique using an accuracy-reconfigurable stochastic computing (ARSC) framework for deep learning computing. Unlike conventional stochastic computing, which fixes the trade-off between accuracy and power/energy at design time, the new ARSC design can adjust the bit-width of the data at run time. Hence, ARSC can mitigate long-term aging effects by slowing the system clock frequency while maintaining inference throughput by reducing the data bit-width, at a small cost in accuracy. We show how to implement the recently proposed counter-based SC multiplication and bit-width reduction on a layer-wise quantization scheme for CNNs with dynamic fixed-point data. We validate an ARSC-based five-layer convolutional neural network design for the MNIST dataset based on Vivado HLS with constraints from the Xilinx Zynq-7000 family xc7z045 platform. Experimental results show that the new ARSC DNN can sufficiently compensate for NBTI-induced aging effects over 10 years with marginal classification accuracy loss while maintaining or even exceeding the pre-aging computing throughput. At the same time, the proposed ARSC computing framework also reduces active power consumption through frequency scaling, which can further improve system reliability because of the reduced temperature.
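The trade-off described above, a slower clock offset by a narrower bit-width, can be made concrete with a toy counter-style stochastic multiplier. This is a sketch under assumptions, not the ARSC hardware: the evenly spread deterministic bitstream, the unsigned operands, the function names, and the 20% frequency derating are all hypothetical choices made for illustration.

```python
def sc_multiply(x, w, bits):
    """Counter-style SC multiply of unsigned fixed-point x, w in [0, 2**bits).
    A down counter loaded with w gates an evenly spread bitstream that encodes
    x, and the ones seen before the counter reaches zero are accumulated."""
    full = 1 << bits
    acc = 0
    for t in range(w):                                   # counter runs for w cycles
        bit = ((t + 1) * x) // full - (t * x) // full    # stream bit t, either 0 or 1
        acc += bit
    return acc, w                                        # (product estimate, cycles used)

def reduce_bits(value, bits, new_bits):
    """Drop least-significant bits to move to a narrower fixed-point format."""
    return value >> (bits - new_bits)

def worst_case_throughput(bits, freq_hz):
    """Multiplies per second if every multiply takes the full 2**bits cycles."""
    return freq_hz / (1 << bits)

x8, w8 = 200, 150                                        # 8-bit operands: 200/256 and 150/256
p8, cycles8 = sc_multiply(x8, w8, 8)
x7, w7 = reduce_bits(x8, 8, 7), reduce_bits(w8, 8, 7)
p7, cycles7 = sc_multiply(x7, w7, 7)

exact = (x8 / 256) * (w8 / 256)
print(f"exact {exact:.4f}  8-bit {p8 / 256:.4f} in {cycles8} cycles  "
      f"7-bit {p7 / 128:.4f} in {cycles7} cycles")

f_nominal = 100e6                                        # hypothetical clock frequency
fresh = worst_case_throughput(8, f_nominal)
aged = worst_case_throughput(7, 0.8 * f_nominal)         # assumed 20% slower clock
print(f"throughput fresh/8-bit: {fresh:.0f}/s, aged/7-bit: {aged:.0f}/s")
```

Dropping one bit halves the cycle count per multiply, so in this toy model a clock that is 20% slower still delivers higher worst-case throughput at 7 bits than the nominal clock does at 8 bits, at the cost of a slightly coarser product.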
  5.