-
Data transformation tasks are a critical and costly part of many data processing and analytics applications. A simple computing model that can efficiently represent data transformations and be mapped to different platforms can give programmers the flexibility of using different data representations and allow them to exploit different platforms, including general-purpose processors and accelerators. We propose extended Deterministic Finite State Transducers (DFST+s), a computing model that enables the compact expression of data transformations (significantly terser than the DFST model, a traditional computational abstraction for data transformation), aiding their correct and efficient implementation. We define the TFORM language to facilitate expressing DFST+s, and the TFORM virtual machine (VM) to enable an even more compact expression, leading to a high-performance and portable implementation. We propose two TFORM VM execution models and evaluate them using a variety of data transformations (from the Apache Parquet file format and sparse matrices). Our results show effective portability across a CPU and a hardware accelerator, with geometric-mean performance improvements of 1.7× and 11.7×, respectively, over a custom CPU implementation of the same transformations.
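To make the baseline DFST model concrete, here is a minimal sketch of a plain deterministic finite state transducer, not a DFST+ and not written in TFORM (the abstract does not show TFORM syntax). The transformation it performs, converting comma separators to tabs while leaving commas inside double-quoted fields untouched, is a hypothetical example chosen for illustration; the names `TRANSITIONS`, `symbol_class`, `ECHO`, and `run_dfst` are likewise assumptions. Each (state, input class) pair has exactly one transition, which is what makes the transducer deterministic, and each transition emits an output string as it consumes one input symbol.

```python
from typing import Dict, Optional, Tuple

# Sentinel output meaning "emit the consumed input character unchanged".
ECHO: Optional[str] = None

# Hypothetical transition table: (state, input class) -> (next state, output).
# States: "OUT" = outside a quoted field, "IN" = inside a quoted field.
TRANSITIONS: Dict[Tuple[str, str], Tuple[str, Optional[str]]] = {
    ("OUT", "QUOTE"): ("IN", '"'),
    ("OUT", "COMMA"): ("OUT", "\t"),   # a separator outside quotes becomes a tab
    ("OUT", "OTHER"): ("OUT", ECHO),
    ("IN", "QUOTE"): ("OUT", '"'),
    ("IN", "COMMA"): ("IN", ","),      # a comma inside quotes is kept as-is
    ("IN", "OTHER"): ("IN", ECHO),
}


def symbol_class(ch: str) -> str:
    """Collapse the input alphabet into the classes the transducer distinguishes."""
    if ch == '"':
        return "QUOTE"
    if ch == ",":
        return "COMMA"
    return "OTHER"


def run_dfst(text: str, start: str = "OUT") -> str:
    """Consume the input one symbol at a time, emitting output on every transition."""
    state = start
    out = []
    for ch in text:
        state, emitted = TRANSITIONS[(state, symbol_class(ch))]
        out.append(ch if emitted is ECHO else emitted)
    return "".join(out)


print(repr(run_dfst('a,"b,c",d')))  # 'a\t"b,c"\td'
```

Even this small transformation needs a full table of six transitions; the abstract's claim is that DFST+s express such transformations more tersely, though it does not say how, so no DFST+ extension is sketched here.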
-
While FPGAs have traditionally been considered hard to program, there have recently been efforts to allow FPGAs to be programmed with high-level programming models and libraries intended for multi-core CPUs and GPUs. For example, both Intel and Xilinx now provide toolchains to deploy OpenCL code onto FPGAs. However, because the nature of the parallelism offered by GPU and FPGA devices is fundamentally different, OpenCL code optimized for GPUs can prove very inefficient on FPGAs, in terms of both performance and hardware resource utilization. This paper explores this problem in the context of finite automata traversal. In particular, we consider an OpenCL NFA traversal kernel optimized for GPUs but exhibiting FPGA-friendly characteristics, namely limited memory requirements, lack of synchronization, and SIMD execution. We explore a set of structural code changes, as well as custom and best-practice optimizations, to retarget this code to FPGAs. We showcase the effect of these optimizations on an Intel Stratix V FPGA board using various NFA topologies from different application domains. Our evaluation shows that, while the resource requirements of the original code exceed the capacity of the FPGA in use, our optimizations lead to significant resource savings and allow the transformed code to fit on the FPGA for all considered NFA topologies. In addition, our optimizations yield speedups of up to 4× over an already optimized code variant designed to fit the NFA traversal kernel on the FPGA. Some of the proposed optimizations can be generalized to other applications and incorporated into OpenCL-to-FPGA compilers.
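For readers unfamiliar with NFA traversal, the sketch below shows one common formulation, active-set simulation, as sequential Python. It is an illustration under assumptions, not the paper's OpenCL kernel: the function name `traverse`, the dictionary-based transition table, and the tiny example NFA (matching "ab" or "ac" anywhere in the stream) are all hypothetical. Within one input step, the transitions of each active state are computed independently of the other active states, which is the kind of property a data-parallel kernel can exploit without synchronization inside a step.

```python
from typing import Dict, FrozenSet, List, Set, Tuple

# Hypothetical NFA representation: (state, symbol) -> set of successor states.
NFA = Dict[Tuple[int, str], FrozenSet[int]]


def traverse(nfa: NFA, start: int, accepting: FrozenSet[int],
             stream: str) -> List[Tuple[int, int]]:
    """Return (position, accepting_state) for every match ending at each position."""
    active: Set[int] = set()
    matches: List[Tuple[int, int]] = []
    for pos, sym in enumerate(stream):
        active.add(start)  # unanchored search: a match may begin at any position
        nxt: Set[int] = set()
        for s in active:  # each active state is expanded independently
            nxt |= nfa.get((s, sym), frozenset())
        active = nxt
        for s in sorted(active & accepting):
            matches.append((pos, s))
    return matches


# Example NFA: state 0 = start, 1 = saw 'a', 2 = accepted "ab", 3 = accepted "ac".
EXAMPLE: NFA = {
    (0, "a"): frozenset({1}),
    (1, "b"): frozenset({2}),
    (1, "c"): frozenset({3}),
}

print(traverse(EXAMPLE, 0, frozenset({2, 3}), "xabac"))  # [(2, 2), (4, 3)]
```

The memory footprint of this formulation is bounded by the number of NFA states, in line with the "limited memory requirements" the abstract attributes to the kernel it studies.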
-
In this paper, we study the acceleration of applications that must search for all occurrences of thousands of string patterns in an input data stream, using the Automata Processor (AP). For this purpose, we consider two applications from two fields, namely network security and bioinformatics. The first application, called Fast-SNAP (for Fast-SNort using AP), scans network data for 4312 intrusion signatures derived from the popular open-source Snort database. Using the resources of a single AP board, Fast-SNAP can scan for all of these signatures at 10.3 Gbps. The second application, called PROTOMATA (for PROTein autOMATA), looks for all occurrences of 1308 protein motifs from the PROSITE database in protein sequences. PROTOMATA is up to half a million times faster than its single-CPU counterpart. The techniques developed to program these applications may be useful in designing and developing similar applications for this new hardware accelerator.
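The task described above, reporting every occurrence of many string patterns in a single pass over a stream, can be sketched in software as follows. This assumes a homogeneous-automaton style in which each state carries the one symbol it matches and each pattern compiles to a chain of such states; it is an illustrative model only, not the encoding used by Fast-SNAP or PROTOMATA, and the function `scan` and the example pattern IDs are hypothetical. Real Snort signatures and PROSITE motifs also use richer syntax (character classes, repetition) that this sketch omits.

```python
from typing import Dict, List, Set, Tuple


def scan(patterns: Dict[str, str], stream: str) -> List[Tuple[int, str]]:
    """Report (end_position, pattern_id) for every occurrence of every pattern."""
    matches: List[Tuple[int, str]] = []
    # A state is (pattern_id, index): "the index-th symbol of this pattern".
    enabled: Set[Tuple[str, int]] = set()  # states enabled for the next symbol
    starts = {(pid, 0) for pid, pat in patterns.items() if pat}
    for pos, sym in enumerate(stream):
        # The first state of every chain is always enabled (unanchored search).
        candidates = enabled | starts
        enabled = set()
        for pid, i in candidates:
            pat = patterns[pid]
            if pat[i] != sym:
                continue  # this state does not match the current symbol
            if i == len(pat) - 1:
                matches.append((pos, pid))  # last state of the chain: report
            else:
                enabled.add((pid, i + 1))  # enable the successor state
    matches.sort()
    return matches


print(scan({"p1": "ab", "p2": "bab", "p3": "b"}, "abab"))
# [(1, 'p1'), (1, 'p3'), (3, 'p1'), (3, 'p2'), (3, 'p3')]
```

In software, the per-symbol cost grows with the number of enabled states, which becomes large with thousands of patterns; a hardware automata engine can instead evaluate all states in parallel on each input symbol.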