The development of FPGA-based applications using HLS is fraught with performance pitfalls and long design space exploration times. These issues are exacerbated when the application is complicated and its performance depends on the input data set, as is often the case with graph neural network approaches to machine learning. Here, we introduce HLPerf, an open-source, simulation-based performance evaluation framework for dataflow architectures that both supports early exploration of the design space and shortens the performance evaluation cycle. We apply the methodology to GNNHLS, an HLS-based graph neural network benchmark containing 6 commonly used graph neural network models and 4 datasets with distinct topologies and scales. The results show that HLPerf achieves over 10,000× average simulation acceleration relative to RTL simulation and over 400× acceleration relative to state-of-the-art cycle-accurate tools, at the cost of a 7% mean error rate relative to actual FPGA implementation performance. This acceleration positions HLPerf as a viable component in the design cycle.
-
Free, publicly-accessible full text available April 2, 2025
-
Faber, Clayton J.; Harris, Steven D.; Xiao, Zhili; Chamberlain, Roger D.; Cabrera, Anthony M. (Proc. of IEEE High-Performance Extreme Computing Conference)
-
Faber, Clayton J.; Plano, Tom; Kodali, Samatha; Xiao, Zhili; Dwaraki, Abhishek; Buhler, Jeremy D.; Chamberlain, Roger D.; Cabrera, Anthony M. (Proc. of IEEE/ACM Workshop on Redefining Scalability for Diversely Heterogeneous Architectures (RSDHA))
-
Faber, Clayton J.; Cabrera, Anthony M.; Booker, Orondé; Maayan, Gabe; Chamberlain, Roger D. (Proc. of 7th International Workshop on OpenCL)
In the era of big data, many new algorithms are developed to find the most efficient way to perform computations on massive amounts of data. However, the preprocessing step for many of these applications is often overlooked. The Data Integration Benchmark Suite (DIBS) was designed to understand the characteristics of dataset transformations in a hardware-agnostic way. While on the surface these applications have a high degree of data parallelism, there are caveats in their specification that can potentially affect this characteristic. Even so, OpenCL can be an effective deployment environment for these applications. In this work we take a subset of the data transformations from each category presented in DIBS and implement them in OpenCL to evaluate their performance on heterogeneous systems. To target heterogeneous systems, we take a common application and attempt to deploy it to three platforms targetable by OpenCL (CPU, GPU, and FPGA). The applications are evaluated by their average transformation data rate. We illustrate the advantages of each compute device in the data integration space, along with the different communication schemes that the OpenCL platform allows for host/device communication.