skip to main content
US FlagAn official website of the United States government
dot gov icon
Official websites use .gov
A .gov website belongs to an official government organization in the United States.
https lock icon
Secure .gov websites use HTTPS
A lock ( lock ) or https:// means you've safely connected to the .gov website. Share sensitive information only on official, secure websites.


Title: On the Correlation between Resource Minimization and Interconnect Complexities in High-Level Synthesis
As the technology node of VLSI designs advances to sub10 nm, two interconnect-centric metrics of a circuit, the interconnect complexity (either number of interconnects or wirelength/WL) and congestion, become critically important across all design stages alongside conventional resource or function-unit (FU)-centric metrics like area/number-of-FUs and leakage power. High Level synthesis (HLS), one of the earliest and most impactful design stages, rarely monitors interconnect metrics, which makes their recovery at later stages very difficult. HLS algorithms and tools typically perform FU-centric minimization via operation scheduling, module selection (S&MS) and binding. As a consequence, it mostly overlooks interconnect-based metrics. In this paper, we explore whether this can adversely affect interconnect metrics, and in general explore the correlation between FU-centric optimization in S&MS, and the resulting interconnect metrics co-optimized (along with FU metrics) in the later binding stage(s). For this purpose we develop a probabilistic analysis for post-scheduling binding to estimate interconnect metrics, and verify its accuracy by comparison to empirical results across different scheduling techniques that generate different degrees of FU optimization. Based on both empirical and analytical results we predict how interconnects metrics will pan out with different degrees of FU optimization. Finally, based on our analysis, we also provide suggestions to improve interconnect metrics for whatever FU optimization degree an available S&MS technique can achieve.  more » « less
Award ID(s):
2035610
PAR ID:
10403147
Author(s) / Creator(s):
; ;
Date Published:
Journal Name:
2021 22nd International Symposium on Quality Electronic Design (ISQED)
Page Range / eLocation ID:
355 to 360
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. The architecture of a coarse-grained reconfigurable array (CGRA) interconnect has a significant effect on not only the flexibility of the resulting accelerator, but also its power, performance, and area. Design decisions that have complex trade-offs need to be explored to maintain efficiency and performance across a variety of evolving applications. This paper presents Canal, a Python-embedded domain-specific language (eDSL) and compiler for specifying and generating reconfigurable interconnects for CGRAs. Canal uses a graph-based intermediate representation (IR) that allows for easy hardware generation and tight integration with place and route tools. We evaluate Canal by constructing both a fully static interconnect and a hybrid interconnect with ready-valid signaling, and by conducting design space exploration of the interconnect architecture by modifying the switch box topology, the number of routing tracks, and the interconnect tile connections. Through the use of a graph-based IR for CGRA interconnects, the eDSL, and the interconnect generation system, Canal enables fast design space exploration and creation of CGRA interconnects. 
    more » « less
  2. This work presents an analytical approach for analyzing electromigration (EM) in modern technologies that use copper dual damascene (Cu DD) interconnects. In these technologies, due to design rule and methodology constraints, wires are typically laid out unidirectionally in each metal layer; since EM in Cu DD interconnects do not cross layer boundaries, the problem reduces to one of analyzing EM in multisegment interconnect lines. In contrast with traditional empirical methodologies, our approach is based on physics-based modeling, directly solving the differential equations that model EM-induced stress. This article places a focus on interconnect lines, for reasons described above, and introduces the new concept of boundary reflections of stress flux that ascribes a physical (wave-like) analogy to the transient stress behavior in a finite multisegment line. This framework is used to derive analytical expressions of transient EM stress for lines with any number of segments, which can also be tailored to include the appropriate number of terms for any desired level of accuracy. The approach is applied to both the nucleation phase and the postvoiding phase on large power grid benchmarks. These experiments demonstrate excellent accuracy as compared to accurate numerical solution, as well as linear complexity with the number of segments for evaluating stress at a specified point and time. 
    more » « less
  3. High-level synthesis (HLS) is an automated design process that transforms high-level code into optimized hardware designs, enabling rapid development of efficient hardware accelerators for various applications such as image processing, machine learning, and signal processing. To achieve optimal performance, HLS tools rely on pragmas, which are directives inserted into the source code to guide the synthesis process, and these pragmas can have various settings and values that significantly impact the resulting hardware design. State-of the-art ML-based HLS methods, such as harp, first train a deep learning model, typically based on graph neural networks (GNNs) applied to graph-based representations of the source code and its pragmas. They then perform design space exploration (DSE) to explore the pragma design space, rank candidate designs using the trained model, and return the top designs as the final designs. However, traditional DSE methods face challenges due to the highly nonlinear relationship between pragma settings and performance metrics, along with complex interactions between pragmas that affect performance in non-obvious ways. To address these challenges, we propose compareXplore, a novel approach that learns to compare hardware designs for effective HLS optimization. compareXplore introduces a hybrid loss function that combines pairwise preference learning with pointwise performance prediction, enabling the model to capture both relative preferences and absolute performance values. Moreover, we introduce a novel Node Difference Attention module that focuses on the most informative differences between designs, enhancing the model’s ability to identify critical pragmas impacting performance. compareXplore adopts a two-stage DSE approach, where a pointwise prediction model is used for the initial design pruning, followed by a pairwise comparison stage for precise performance verification. Experimental results demonstrate that compareXplore achieves significant improvements in ranking metrics and generates high quality HLS results for the selected designs, outperforming the existing state-of-the-art method. 
    more » « less
  4. Gate-exhaustive and cell-aware tests are generated based on input patterns of cells in a design. While the tests provide thorough testing of the cells, the interconnects between them are tested only as input and output lines of cells. This paper defines cell-based faults that allow the interconnects to be tested more thoroughly within a uniform framework that only targets input patterns of cells. In contrast to a real cell that is part of the design, a dummy cell is used for defining interconnect-aware faults. Using a gate-level description of the circuit, a dummy cell contains an interconnect, an output gate of the real cell that drives it, and an input gate of the real cell that it drives. Experimental results for benchmark circuits show that many of the interconnect-aware faults are not detected accidentally by gate-exhaustive tests, and that the quality of the test set is improved by targeting interconnect-aware faults. Here, quality is measured by the numbers of detections of single stuck-at faults in a gate-level representation of the circuit. 
    more » « less
  5. High-level synthesis (HLS) has enabled the rapid development of custom hardware circuits for many software applications. However, developing high-performance hardware circuits using HLS is still a non-trivial task requiring expertise in hardware design. Further, the hardware design space, especially for multi-kernel applications, grows exponentially. Therefore, several HLS automation and abstraction frameworks have been proposed recently, but many issues remain unresolved. These issues include: 1) relying mainly on hardware directives (pragmas) to apply hardware optimizations without exploring loop scheduling opportunities. 2) targeting single-kernel applications only. 3) lacking automatic and/or global design space exploration. 4) missing critical hardware optimizations, such as graph-level pipelining for multi-kernel applications. To address these challenges, we propose a novel methodology and framework on top of the popular multi-level intermediate representation (MLIR) infrastructure called Stream-HLS. Our framework takes a C/C++ or PyTorch software code and automatically generates an optimized dataflow architecture along with host code for field-programmable gate arrays (FPGAs). To achieve this, we developed an accurate analytical performance model for global scheduling and optimization of dataflow architectures. Stream-HLS is evaluated using various standard HLS benchmarks and real-world benchmarks from transformer models, convolution neural networks, and multilayer perceptrons. Stream-HLS designs outperform the designs of prior state-of-the-art automation frameworks and manually-optimized designs of abstraction frameworks by up to 79.43× and 10.62× geometric means respectively. Finally, the Stream-HLS framework is modularized, extensible, and open-sourced at https://github.com/UCLA-VAST/Stream-HLS( https://doi.org/10.5281/zenodo.14585909 ). 
    more » « less