Scalable Window Generation for the Intel Broadwell+Arria 10 and High-Bandwidth FPGA Systems

Stitt, Greg; Gupta, Abhay; Emas, Madison N.; Wilson, David; Baylis, Austin

doi:10.1145/3174243.3174262

Emerging FPGA systems are providing higher external memory bandwidth to compete with GPU performance. However, because FPGAs often achieve parallelism through deep pipelines, traditional FPGA design strategies do not necessarily scale well to the large numbers of replicated pipelines needed to take advantage of higher bandwidth. We show that sliding-window applications, an important subset of digital signal processing, demonstrate this scalability problem. We introduce a window generator architecture that enables replication to over 330 GB/s, which is an 8.7x improvement over previous work. We evaluate the window generator on the Intel Broadwell+Arria 10 system for 2D convolution and show that for traditional convolution (one filter per image), our approach outperforms a 12-core Xeon Broadwell E5 by 81x and a high-end Nvidia P6000 GPU by an order of magnitude for most input sizes, while improving energy by 15.7x. For convolutional neural nets (CNNs), we show that although the GPU and Xeon typically outperform existing FPGA systems, the projected performance of the window generator running on FPGAs with sufficient bandwidth can exceed that of high-end GPUs for many common CNN parameters.
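For readers unfamiliar with the term, the following is a minimal software sketch of what a sliding-window computation looks like for the "one filter per image" case of 2D convolution. It is only an illustration of the access pattern, not the window generator architecture described in the paper; the function names `sliding_windows` and `convolve2d` are hypothetical, and the sketch assumes stride 1, no padding, and an unflipped kernel (cross-correlation).

```python
from typing import Iterator, List

Matrix = List[List[float]]


def sliding_windows(image: Matrix, k: int) -> Iterator[Matrix]:
    """Yield every k x k window of `image` in row-major order (stride 1)."""
    rows, cols = len(image), len(image[0])
    for r in range(rows - k + 1):
        for c in range(cols - k + 1):
            # Each window overlaps its neighbor in all but one column,
            # which is the data reuse a hardware window generator exploits.
            yield [row[c:c + k] for row in image[r:r + k]]


def convolve2d(image: Matrix, kernel: Matrix) -> Matrix:
    """Apply one k x k filter to one image (assumed: no padding, no kernel flip)."""
    k = len(kernel)
    if k > len(image) or k > len(image[0]):
        raise ValueError("kernel must not be larger than the image")
    out_cols = len(image[0]) - k + 1
    # One multiply-accumulate reduction per window.
    flat = [
        sum(w[i][j] * kernel[i][j] for i in range(k) for j in range(k))
        for w in sliding_windows(image, k)
    ]
    # Reshape the flat list of results back into rows.
    return [flat[i:i + out_cols] for i in range(0, len(flat), out_cols)]


if __name__ == "__main__":
    image = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
    kernel = [[1, 0], [0, 1]]
    print(convolve2d(image, kernel))
```

In this sketch every output element needs its own window, so the rate at which windows can be produced bounds throughput; that is the general reason window generation matters when many pipelines are replicated.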

Award ID(s): 1738420

PAR ID: 10073108

Author(s) / Creator(s): Stitt, Greg; Gupta, Abhay; Emas, Madison N.; Wilson, David; Baylis, Austin

Date Published: 2018-02-27

Journal Name: 2018 ACM/SIGDA International Symposium on Field-Programmable Gate Arrays, FPGA ’18

Page Range / eLocation ID: 173 to 182

Format(s): Medium: X

Sponsoring Org: National Science Foundation

Free Publicly Accessible Full Text
Accepted Manuscript 1.0
Conference Paper: https://doi.org/10.1145/3174243.3174262
