skip to main content
US FlagAn official website of the United States government
dot gov icon
Official websites use .gov
A .gov website belongs to an official government organization in the United States.
https lock icon
Secure .gov websites use HTTPS
A lock ( lock ) or https:// means you've safely connected to the .gov website. Share sensitive information only on official, secure websites.


Title: Accelerate Scientific Deep Learning Models on Heterogeneous Computing Platform with FPGA
AI and deep learning are experiencing explosive growth in almost every domain involving analysis of big data. Deep learning using Deep Neural Networks (DNNs) has shown great promise for such scientific data analysis applications. However, traditional CPU-based sequential computing without special instructions can no longer meet the requirements of mission-critical applications, which are compute-intensive and require low latency and high throughput. Heterogeneous computing (HGC), with CPUs integrated with GPUs, FPGAs, and other science-targeted accelerators, offers unique capabilities to accelerate DNNs. Collaborating researchers at SHREC1at the University of Florida, CERN Openlab, NERSC2at Lawrence Berkeley National Lab, Dell EMC, and Intel are studying the application of heterogeneous computing (HGC) to scientific problems using DNN models. This paper focuses on the use of FPGAs to accelerate the inferencing stage of the HGC workflow. We present case studies and results in inferencing state-of-the-art DNN models for scientific data analysis, using Intel distribution of OpenVINO, running on an Intel Programmable Acceleration Card (PAC) equipped with an Arria 10 GX FPGA. Using the Intel Deep Learning Acceleration (DLA) development suite to optimize existing FPGA primitives and develop new ones, we were able accelerate the scientific DNN models under study with a speedup from 2.46x to 9.59x for a single Arria 10 FPGA against a single core (single thread) of a server-class Skylake CPU.  more » « less
Award ID(s):
1738420
PAR ID:
10289018
Author(s) / Creator(s):
Date Published:
Journal Name:
EPJ web of conferences
Volume:
245
ISSN:
2100-014X
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. AI and deep learning are experiencing explosive growth in almost every domain involving analysis of big data. Deep learning using Deep Neural Networks (DNNs) has shown great promise for such scientific data analysis applications. However, traditional CPU-based sequential computing can no longer meet the requirements of mission-critical applications, which are compute-intensive and require low latency and high throughput. Heterogeneous computing (HGC), with CPUs integrated with accelerators such as GPUs and FPGAs, offers unique capabilities to accelerate DNNs. Collaborating researchers at SHREC\inst{1} at the University of Florida, NERSC\inst{2} at Lawrence Berkeley National Lab, CERN Openlab, Dell EMC, and Intel are studying the application of heterogeneous computing (HGC) to scientific problems using DNN models. This paper focuses on the use of FPGAs to accelerate the inferencing stage of the HGC workflow. We present case studies and results in inferencing state-of-the-art DNN models for scientific data analysis, using Intel distribution of OpenVINO, running on an Intel Programmable Acceleration Card (PAC) equipped with an Arria 10 GX FPGA. Using the Intel Deep Learning Acceleration (DLA) development suite to optimize existing FPGA primitives and develop new ones, we were able accelerate the scientific DNN models under study with a speedup from 3x to 6x for a single Arria 10 FPGA against a single core (single thread) of a server-class Skylake CPU. 
    more » « less
  2. In recent times, AI and deep learning have witnessed explosive growth in almost every subject involving data. Complex data analyses problems that took prolonged periods, or required laborious manual effort, are now being tackled through AI and deep-learning techniques with unprecedented accuracy. Machine learning (ML) using Convolutional Neural Networks (CNNs) has shown great promise for such applications. However, traditional CPU-based sequential computing no longer can meet the requirements of mission-critical applications which are compute-intensive and require low latency. Heterogeneous computing (HGC), with CPUs integrated with accelerators such as GPUs and FPGAs, offers unique capabilities to accelerate CNNs. In this presentation, we will focus on using FPGA-based reconfigurable computing to accelerate various aspects of CNN. We will begin with the current state of the art in using FPGAs for CNN acceleration, followed by the related R&D activities (outlined below) in the SHREC* Center at the University of Florida, based on which we will discuss the opportunities in heterogeneous computing for machine learning. 
    more » « less
  3. FPGAs offer a heterogenous compute solution to the continuous de- sire for increased performance by enabling the creation of application- specific hardware that accelerates computation. While the barrier to entry has historically been steep, advances in High Level Synthe- sis (HLS) are making FPGAs more accessible. Specifically, the Intel FPGA OpenCL SDK allows software designers to abstract away low level details of architecting hardware on an FPGA and allows them to author computational kernels in a higher level language. Furthermore, Intel has developed a system that incorporates both a multicore Xeon CPU and Arria 10 FPGA into the same chip package as part of the Heterogeneous Accelerator Research Program (HARP) that can be targeted by their SDK. In this work, we target the second iteration of the HARP platform (HARPv2) using HLS through porting of OpenCL kernels originally written for FPGAs connected via a PCIe bus. We evaluate the HARPv2 system’s performance against previously reported results, explore the portability of kernels through a hardware design space search, and empirically show the benefits of using the shared virtual memory (SVM) abstraction over explicit reads and writes. 
    more » « less
  4. The wide deployment of Deep Neural Networks (DNN) in high-performance cloud computing platforms brought to light multi-tenant cloud field-programmable gate arrays (FPGA) as a popular choice of accelerator to boost performance due to its hardware reprogramming flexibility. Such a multi-tenant FPGA setup for DNN acceleration potentially exposes DNN interference tasks under severe threat from malicious users. This work, to the best of our knowledge, is the first to explore DNN model vulnerabilities in multi-tenant FPGAs. We propose a novel adversarial attack framework: Deep-Dup, in which the adversarial tenant can inject adversarial faults to the DNN model in the victim tenant of FPGA. Specifically, she can aggressively overload the shared power distribution system of FPGA with malicious power-plundering circuits, achieving adversarial weight duplication (AWD) hardware attack that duplicates certain DNN weight packages during data transmission between off-chip memory and on-chip buffer, to hijack the DNN function of the victim tenant. Further, to identify the most vulnerable DNN weight packages for a given malicious objective, we propose a generic vulnerable weight package searching algorithm, called Progressive Differential Evolution Search (P-DES), which is, for the first time, adaptive to both deep learning white-box and black-box attack models. The proposed Deep-Dup is experimentally validated in a developed multi-tenant FPGA prototype, for two popular deep learning applications, i.e., Object Detection and Image Classification. Successful attacks are demonstrated in six popular DNN architectures (e.g., YOLOv2, ResNet-50, MobileNet, etc.) on three datasets (COCO, CIFAR-10, and ImageNet). 
    more » « less
  5. Field-programmable gate arrays (FPGAs) have largely been used in communication and high-performance computing, and given the recent advances in big data, machine learning and emerging trends in cloud computing (e.g., serverless [1]), FPGAs are increasingly being introduced into these domains (e.g., Microsoft’s datacenters [2] and Amazon Web Services [3]). To address these domains’ processing needs, recent research has focused on using FPGAs to accelerate workloads, ranging from analytics and machine learning to databases and network function virtualization. In this paper, we present a high-performance FPGA-as-a-microservice (FaaM) architecture for the cloud. We discuss some of the technical challenges and propose several solutions for efficiently integrating FPGAs into virtualized environments. Our case study deploying a multi-threaded, multi-user compression as a microservice using FaaM indicates that microservices-based FPGA acceleration can sustain high-performance as compared to a straightforward CPU implementation with minimal to no communication overhead despite the hardware abstraction. 
    more » « less