
Scalable Window Generation for the Intel Broadwell+Arria 10 and High-Bandwidth FPGA Systems

https://doi.org/10.1145/3174243.3174262

Stitt, Greg; Gupta, Abhay; Emas, Madison N.; Wilson, David; Baylis, Austin (February 2018, 2018 ACM/SIGDA International Symposium on Field-Programmable Gate Arrays, FPGA ’18)

Emerging FPGA systems are providing higher external memory bandwidth to compete with GPU performance. However, because FPGAs often achieve parallelism through deep pipelines, traditional FPGA design strategies do not necessarily scale well to the large numbers of replicated pipelines needed to take advantage of higher bandwidth. We show that sliding-window applications, an important subset of digital signal processing, demonstrate this scalability problem. We introduce a window generator architecture that enables replication to over 330 GB/s, which is an 8.7x improvement over previous work. We evaluate the window generator on the Intel Broadwell+Arria 10 system for 2D convolution and show that for traditional convolution (one filter per image), our approach outperforms a 12-core Xeon Broadwell E5 by 81x and a high-end Nvidia P6000 GPU by an order of magnitude for most input sizes, while improving energy by 15.7x. For convolutional neural nets (CNNs), we show that although the GPU and Xeon typically outperform existing FPGA systems, the projected performance of the window generator running on FPGAs with sufficient bandwidth can exceed that of high-end GPUs for many common CNN parameters.
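
For readers unfamiliar with the term, the sliding-window pattern behind 2D convolution can be sketched in software as below. This is only an illustrative model of what a window generator produces, not the hardware architecture described in the paper; the names `generate_windows`, `apply_filter`, and `convolve2d` are hypothetical and chosen for this example.

```python
from typing import Iterator, List

Image = List[List[float]]


def generate_windows(image: Image, k: int) -> Iterator[Image]:
    """Yield every k x k window of `image` in row-major order."""
    rows, cols = len(image), len(image[0])
    for r in range(rows - k + 1):
        for c in range(cols - k + 1):
            # Each window overlaps its neighbour by k - 1 columns.
            yield [row[c:c + k] for row in image[r:r + k]]


def apply_filter(window: Image, kernel: Image) -> float:
    """Multiply-accumulate one window against one kernel.

    The kernel is not flipped, so strictly this is cross-correlation,
    the form commonly used in image processing and CNNs.
    """
    return sum(
        w * f
        for win_row, ker_row in zip(window, kernel)
        for w, f in zip(win_row, ker_row)
    )


def convolve2d(image: Image, kernel: Image) -> Image:
    """Apply one square kernel to every window (no padding, stride 1)."""
    k = len(kernel)
    out_cols = len(image[0]) - k + 1
    flat = [apply_filter(w, kernel) for w in generate_windows(image, k)]
    # Regroup the flat list of results into output rows.
    return [flat[i:i + out_cols] for i in range(0, len(flat), out_cols)]
```

In this model each window is independent of the others, so the filter step can be replicated freely; the difficulty the abstract points to is producing enough windows per cycle to keep many replicated pipelines busy.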

