NSF PAR Search | NSF Public Access Repository

FlexCNN: An End-to-End Framework for Composing CNN Accelerators on FPGA

https://doi.org/10.1145/3570928

Basalama, Suhail; Sohrabizadeh, Atefeh; Wang, Jie; Guo, Licheng; Cong, Jason (December 2022, ACM Transactions on Reconfigurable Technology and Systems)

With reduced data reuse and parallelism, recent convolutional neural networks (CNNs) create new challenges for FPGA acceleration. Systolic arrays (SAs) are efficient, scalable architectures for convolutional layers, but without proper optimizations, their efficiency drops dramatically for three reasons: 1) the differing dimensions within same-type layers, 2) the different convolution layer types, especially transposed and dilated convolutions, and 3) the CNN’s complex dataflow graph. Furthermore, significant overheads arise when integrating FPGAs into machine learning frameworks. Therefore, we present a flexible, composable architecture called FlexCNN, which delivers high computation efficiency by employing dynamic tiling, layer fusion, and data layout optimizations. Additionally, we implement a novel versatile SA to process normal, transposed, and dilated convolutions efficiently. FlexCNN also uses a fully pipelined software-hardware integration that alleviates the software overheads. Moreover, with an automated compilation flow, FlexCNN takes a CNN in the ONNX representation, performs a design space exploration, and generates an FPGA accelerator. The framework is tested using three complex CNNs: OpenPose, U-Net, and E-Net. The architecture optimizations achieve a 2.3× performance improvement. Compared to a standard SA, the versatile SA achieves close-to-ideal speedups, with up to 15.98× and 13.42× for transposed and dilated convolutions, respectively, with a 6% average area overhead. The pipelined integration leads to a 5× speedup for OpenPose.
Full Text Available
AutoDSE: Enabling Software Programmers to Design Efficient FPGA Accelerators

https://doi.org/10.1145/3494534

Sohrabizadeh, Atefeh; Yu, Cody Hao; Gao, Min; Cong, Jason (July 2022, ACM Transactions on Design Automation of Electronic Systems)

Adopting FPGAs as accelerators in datacenters is becoming mainstream for customized computing, but the fact that FPGAs are hard to program creates a steep learning curve for software programmers. Even with the help of high-level synthesis (HLS), accelerator designers still have to manually perform code reconstruction and cumbersome parameter tuning to achieve optimal performance. While many learning models have been leveraged by existing work to automate the design of efficient accelerators, the unpredictability of modern HLS tools becomes a major obstacle for them to maintain high accuracy. To address this problem, we propose an automated DSE framework, AutoDSE, that leverages a bottleneck-guided coordinate optimizer to systematically find a better design point. AutoDSE detects the bottleneck of the design in each step and focuses on high-impact parameters to overcome it. The experimental results show that AutoDSE is able to identify the design point that achieves, on the geometric mean, a 19.9× speedup over one CPU core for the MachSuite and Rodinia benchmarks. Compared to the manually optimized HLS vision kernels in the Xilinx Vitis libraries, AutoDSE can reduce their optimization pragmas by 26.38× while achieving similar performance. With less than one optimization pragma per design on average, we are making progress towards democratizing customizable computing by enabling software programmers to design efficient FPGA accelerators.
Full Text Available
End-to-End Optimization of Deep Learning Applications

https://doi.org/10.1145/3373087.3375321

Sohrabizadeh, Atefeh; Wang, Jie; Cong, Jason (February 2020, Proceedings of the 2020 ACM/SIGDA International Symposium on Field-Programmable Gate Arrays (FPGA ’20), February 23–25, 2020, Seaside, C)

Full Text Available
PolySA: Polyhedral-Based Systolic Array Auto-Compilation

Cong, Jason; Wang, Jie (November 2018, ICCAD -- IEEE/ACM International Conference on Computer-Aided Design)

Full Text Available
Supporting Augmented Reality: Looking Beyond Performance

Soh, Lemuel; Burke, Jeff; Zhang, Lixia (August 2018, ACM SIGCOMM 2018 Workshop on VR/AR Network)

Full Text Available
Security, privacy, and access control in information-centric networking: A survey

Tourani, Reza; Mick, Travis; Misra, Satyajayant; Panwar, Gaurav (August 2018, IEEE Communications surveys and tutorials)

Full Text Available
Mobile Data Repositories at the Edge

Psaras, Ioannis; Ascigil, Onur; Rene, Sergi; Pavlou, George; Afanasyev, Alex; Zhang, Lixia (July 2018, USENIX Workshop on Hot Topics in Edge Computing (HotEdge 18))

Full Text Available
TACTIC: Tag-based Access ConTrol Framework for the Information-Centric Wireless Edge Networks

Tourani, Reza; Misra, Satyajayant; Stubbs, Ray (July 2018, IEEE ICDCS)

Full Text Available
Towards Edge Computing Over Named Data Networking

Mtibaa, Abderrahmen; Tourani, Reza; Misra, Satyajayant; Burke, Jeff; Zhang, Lixia (July 2018, IEEE International Conference on Edge Computing)

Full Text Available
Achieving Resilient Data Availability in Wireless Sensor Networks

Xu, Xin; Zhang, Haitao; Li, Tianxiang; Zhang, Lixia (May 2018, IEEE International Conference on Communications)

Full Text Available
