skip to main content


This content will become publicly available on September 30, 2024

Title: AutoScaleDSE: A Scalable Design Space Exploration Engine for High-Level Synthesis

High-Level Synthesis (HLS) has enabled users to rapidly develop designs targeted for FPGAs from the behavioral description of the design. However, to synthesize an optimal design capable of taking better advantage of the target FPGA, a considerable amount of effort is needed to transform the initial behavioral description into a form that can capture the desired level of parallelism. Thus, a design space exploration (DSE) engine capable of optimizing large complex designs is needed to achieve this goal. We present a new DSE engine capable of considering code transformation, compiler directives (pragmas), and the compatibility of these optimizations. To accomplish this, we initially express the structure of the input code as a graph to guide the exploration process. To appropriately transform the code, we take advantage of ScaleHLS based on the multi-level compiler infrastructure (MLIR). Finally, we identify problems that limit the scalability of existing DSEs, which we name the “design space merging problem.” We address this issue by employing a Random Forest classifier that can successfully decrease the number of invalid design points without invoking the HLS compiler as a validation tool. We evaluated our DSE engine against the ScaleHLS DSE, outperforming it by a maximum of 59×. We additionally demonstrate the scalability of our design by applying our DSE to large-scale HLS designs, achieving a maximum speedup of 12× for the benchmarks in the MachSuite and Rodinia set.

 
more » « less
Award ID(s):
2117997
NSF-PAR ID:
10477702
Author(s) / Creator(s):
; ; ;
Publisher / Repository:
https://dl.acm.org/doi/10.1145/3572959
Date Published:
Journal Name:
ACM Transactions on Reconfigurable Technology and Systems
Volume:
16
Issue:
3
ISSN:
1936-7406
Page Range / eLocation ID:
1 to 30
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. null (Ed.)
    High-level synthesis (HLS) raises the level of design abstraction, expedites the process of hardware design, and enriches the set of final designs by automatically translating a behavioral specification into a hardware implementation. To obtain different implementations, HLS users can apply a variety of knobs, such as loop unrolling or function inlining, to particular code regions of the specification. The applied knob configuration significantly affects the synthesized design's performance and cost, e.g., application latency and area utilization. Hence, HLS users face the design-space exploration (DSE) problem, i.e. determine which knob configurations result in Pareto-optimal implementations in this multi-objective space. Whereas it can be costly in time and resources to run HLS flows with an enormous number of knob configurations, machine learning approaches can be employed to predict the performance and cost. Still, they require a sufficient number of sample HLS runs. To enhance the training performance and reduce the sample complexity, we propose a transfer learning approach that reuses the knowledge obtained from previously explored design spaces in exploring a new target design space. We develop a novel neural network model for mixed-sharing multi-domain transfer learning. Experimental results demonstrate that the proposed model outperforms both single-domain and hard-sharing models in predicting the performance and cost at early stages of HLS-driven DSE. 
    more » « less
  2. This paper presents an enhanced version of a scalable HLS (High-Level Synthesis) framework named ScaleHLS, which can compile HLS C/C++ programs and PyTorch models to highly-efficient and synthesizable C++ designs. The original version of ScaleHLS achieved significant speedup on both C/C++ kernels and PyTorch models [14]. In this paper, we first highlight the key features of ScaleHLS on tackling the challenges present in the representation, optimization, and exploration of large-scale HLS designs. To further improve the scalability of ScaleHLS, we then propose an enhanced HLS transform and analysis library supported in both C++ and Python, and a new design space exploration algorithm to handle HLS designs with hierarchical structures more effectively. Comparing to the original ScaleHLS, our enhanced version improves the speedup by up to 60.9× on FPGAs. ScaleHLS is fully open-sourced at https://github.com/hanchenye/scalehls. 
    more » « less
  3. Due to the globalization of Integrated Circuit supply chain, hardware Trojans and the attacks that can trigger them have become an important security issue. One type of hardware Trojans leverages the “don’t care transitions” in Finite-state Machines (FSMs) of hardware designs. In this article, we present a symbolic approach to detecting don’t care transitions and the hidden Trojans. Our detection approach works at both register-transfer level (RTL) and gate level, does not require a golden design, and works in three stages. In the first stage, it explores the reachable states. In the second stage, it performs an approximate analysis to find the don’t care transitions and any discrepancies in the register values or output lines due to don’t care transitions. The second stage can be used for both predicting don’t care triggered Trojans and for guiding don’t care aware reachability analysis. In the third stage, it performs a state-space exploration from reachable states that have incoming don’t care transitions to explore the Trojan payload and to find behavioral discrepancies with respect to what has been observed in the first stage. We also present a pruning technique based on the reachability of FSM states. We present a methodology that leverages both RTL and gate-level for soundness and efficiency. Specifically, we show that don’t care transitions and Trojans that leverage them must be detected at the gate-level, i.e., after synthesis has been performed, for soundness. However, under specific conditions, Trojan payload exploration can be performed more efficiently at RTL. Additionally, the modular design of our approach also provides a fast Trojan prediction method even at the gate level when the reachable states of the FSM is known a priori . Evaluation of our approach on a set of benchmarks from OpenCores and TrustHub and using gate-level representation generated by two synthesis tools, YOSYS and Synopsis Design Compiler (SDC), shows that our approach is both efficient (up to 10× speedup w.r.t. no pruning) and precise (0% false positives both at RTL and gate-level netlist) in detecting don’t care transitions and the Trojans that leverage them. Additionally, the total analysis time can achieve up to 1.62× (using YOSYS) and 1.92× (using SDC) speedup when synthesis preserves the FSM structure, the foundry is trusted, and the Trojan detection is performed at RTL. 
    more » « less
  4. Traditionally, FPGA programming has been done via a hardware description language (HDL). An HDL provides fine-grained control over reconfigurable hardware but with limited productivity due to a steep learning curve and tedious design cycle. Thus, high-level synthesis (HLS) approaches have been a significant boon to productivity, and in recent years, OpenCL has emerged as a vendor-agnostic HLS language that offers the added benefit of interoperation with other OpenCL platforms (e.g., CPU, GPU, DSP) and existing OpenCL software. However, OpenCL's productivity can also suffer from tedious boilerplate code and the need to manually coordinate the host (i.e., CPU) and device (i.e., FPGA or other device). So, we present MetaCL, a compiler-assisted interface that takes OpenCL kernel functions as input and automatically generates OpenCL host-side code as output. MetaCL produces more efficient and readable host-side code, ensures portability, and introduces minimal additional runtime overhead compared to unassisted OpenCL development. 
    more » « less
  5. Designs generated by high-level synthesis (HLS) tools typically achieve a lower frequency compared to manual RTL designs. In this work, we study the timing issues in a diverse set of realistic and complex FPGA HLS designs. (1) We observe that in almost all cases the frequency degradation is caused by the broadcast structures generated by the HLS compiler. (2)We classify three major types of broadcasts in HLS-generated designs, including high-fanout data signals, pipeline flow control signals and synchronization signals for concurrent modules. (3) We reveal a number of limitations of the current HLS tools that result in those broadcast-related timing issues. (4) We propose a set of effective yet easy-to-implement approaches, including broadcast-aware scheduling, synchronization pruning, and skid-buffer-based flow control. Our experimental results show that our methods can improve the maximum frequency of a set of nine representative HLS benchmarks by 53% on average. In some cases, the frequency gain is more than 100 MHz. 
    more » « less