skip to main content

Title: High-Level Synthesis of Irregular Applications: A Case Study on Influence Maximization
FPGAs are promising platforms for accelerating irregular applications due to their ability to implement highly specialized hardware designs for each kernel. However, the design and implementation of FPGA-accelerated kernels can take several months using hardware design languages. High Level Synthesis (HLS) tools provide fast, high quality results for regular applications, but lack the support to effectively accelerate more irregular, complex workloads. This work analyzes the challenges and benefits of using a commercial state-of-the-art HLS tool and its available optimizations to accelerate graph sampling. We evaluate the resulting designs and their effectiveness when deployed in a state-of-the-art heterogeneous framework that implements the Influence Maximization with Martingales (IMM) algorithm, a complex graph analytics algorithm. We discuss future opportunities for improvement in hardware, HLS tools, and hardware/software co-design methodology to better support complex irregular applications such as IMM.  more » « less
Award ID(s):
Author(s) / Creator(s):
; ; ;
Date Published:
Journal Name:
CF '23: Proceedings of the 20th ACM International Conference on Computing Frontiers
Page Range / eLocation ID:
12 to 22
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. Field-programmable gate arrays (FPGAs) provide an opportunity to co-design applications with hardware accelerators, yet they remain difficult to program. High-level synthesis (HLS) tools promise to raise the level of abstraction by compiling C or C++ to accelerator designs. Repurposing legacy software languages, however, requires complex heuristics to map imperative code onto hardware structures. We find that the black-box heuristics in HLS can be unpredictable: changing parameters in the program that should improve performance can counterintuitively yield slower and larger designs. This paper proposes a type system that restricts HLS to programs that can predictably compile to hardware accelerators. The key idea is to model consumable hardware resources with a time-sensitive affine type system that prevents simultaneous uses of the same hardware structure. We implement the type system in Dahlia, a language that compiles to HLS C++, and show that it can reduce the size of HLS parameter spaces while accepting Pareto-optimal designs. 
    more » « less
  2. null (Ed.)
    We present Calyx, a new intermediate language (IL) for compiling high-level programs into hardware designs. Calyx combines a hardware-like structural language with a software-like control flow representation with loops and conditionals. This split representation enables a new class of hardware-focused optimizations that require both structural and control flow information which are crucial for high-level programming models for hardware design. The Calyx compiler lowers control flow constructs using finite-state machines and generates synthesizable hardware descriptions. We have implemented Calyx in an optimizing compiler that translates high-level programs to hardware. We demonstrate Calyx using two DSL-to-RTL compilers, a systolic array generator and one for a recent imperative accelerator language, and compare them to equivalent designs generated using high-level synthesis (HLS). The systolic arrays are 4.6× faster and 1.11× larger on average than HLS implementations, and the HLS-like imperative language compiler is within a few factors of a highly optimized commercial HLS toolchain. We also describe three optimizations implemented in the Calyx compiler. 
    more » « less
  3. Breath-first search (BFS) is a fundamental building block in many graph-based applications. It is challenging to optimize due to its irregular memory-access pattern. Prior work, based on hardware description languages (HDLs) and high-level synthesis (HLS), address the memory-access bottleneck by using techniques such as edge-centric traversal, data alignment, and compute-unit (CU) replication. While these optimizations work well for dense graph datasets, optimizing BFS on sparse graphs remains a significant challenge due to the kernel launch overhead and poor workload distribution across processing elements. As a complement to the prior work, we present and evaluate optimizations in OpenCL for BFS on sparse graphs. Specifically, we explore application-specific and architecture-aware optimizations aimed at mitigating the irregular global-memory access bottleneck in sparse graphs. In our kernel design, we consider factors such as choice of data structure between queue and array, number of memory banks, and kernel launch configuration. We evaluate the impact of proposed optimizations on a diverse set of sparse graphs. In comparison with the state-of-the-art OpenCL implementation for FPGA, we achieve 5.7x-22.3x speedup on Stratix 10 SX 2800 FPGA for the graphs that are most sensitive to our optimization scheme. 
    more » « less
  4. The development of FPGA-based applications using HLS is fraught with performance pitfalls and large design space exploration times. These issues are exacerbated when the application is complicated and its performance is dependent on the input data set, as is often the case with graph neural network approaches to machine learning. Here, we introduce HLPerf, an open-source, simulation-based performance evaluation framework for dataflow architectures that both supports early exploration of the design space and shortens the performance evaluation cycle. We apply the methodology to GNNHLS, an HLS-based graph neural network benchmark containing 6 commonly used graph neural network models and 4 datasets with distinct topologies and scales. The results show that HLPerf achieves over 10 000 × average simulation acceleration relative to RTL simulation and over 400 × acceleration relative to state-of-the-art cycle-accurate tools at the cost of 7% mean error rate relative to actual FPGA implementation performance. This acceleration positions HLPerf as a viable component in the design cycle.

    more » « less
  5. With the ever-growing popularity of Graph Neural Networks (GNNs), efficient GNN inference is gaining tremendous attention. Field-Programmable Gate Arrays (FPGAs) are a promising execution platform due to their fine-grained parallelism, low power consumption, reconfigurability, and concurrent execution. Even better, High-Level Synthesis (HLS) tools help bridge the gap between the non-trivial FPGA development efforts and rapid emergence of new GNN models. To enable investigation into how effectively modern HLS tools can accelerate GNN inference, we present GNNHLS, a benchmark suite containing a software stack for data generation and baseline deployment and FPGA implementations of 6 well-tuned GNN HLS kernels. 
    more » « less