Data Transformation Acceleration using Deterministic Finite-State Transducers

Nourian, Marziyeh; Nguyen, Tri; Chien, Andrew A.; Becchi, Michela

doi:10.1109/BigData55660.2022.10020756

Data transformation tasks are a critical and costly part of many data processing and analytics applications. A simple computing model that can efficiently represent data transformations and be mapped to different platforms can give programmers the flexibility of using different data representations and allow them to exploit different platforms, including general-purpose processors and accelerators. We propose extended Deterministic Finite-State Transducers (DFST+s), a computing model that enables the compact expression of data transformations (significantly terser than the DFST model, a traditional computational abstraction for data transformation), aiding their correct and efficient implementation. We define the TFORM language to facilitate expressing DFST+s, and the TFORM virtual machine to enable an even more compact expression, leading to a high-performance and portable implementation. We propose two TFORM VM execution models and evaluate them using a variety of data transformations (from the Apache Parquet file format and sparse matrices). Our results show effective portability across a CPU and a hardware accelerator, with geometric-mean speedups of 1.7× and 11.7×, respectively, over a custom CPU implementation of the same transformations.
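
For readers unfamiliar with the baseline abstraction, the following is a minimal sketch of a classic deterministic finite-state transducer, the traditional model the abstract compares against. It is not the DFST+ model, the TFORM language, or the TFORM VM, whose definitions are not given in this record. The class name `DFST`, the `ANY` wildcard, the `COPY` marker, and the space-to-comma example are all hypothetical choices made for illustration.

```python
# Sketch of a classic DFST: each (state, input symbol) pair maps to exactly
# one (next state, output string) pair, so the run is deterministic.
# ANY and COPY are hypothetical conveniences, not part of the paper.

ANY = None        # wildcard: matches any symbol without an explicit transition
COPY = object()   # output marker: emit the input symbol unchanged


class DFST:
    def __init__(self, start, table):
        self.start = start
        self.table = table  # {(state, symbol or ANY): (next_state, output)}

    def run(self, data):
        state = self.start
        out = []
        for sym in data:
            # Prefer an exact transition; fall back to the wildcard.
            key = (state, sym) if (state, sym) in self.table else (state, ANY)
            state, emit = self.table[key]
            out.append(sym if emit is COPY else emit)
        return "".join(out)


# Example transformation: turn each run of spaces into a single comma.
space_to_comma = DFST(
    start="field",
    table={
        ("field", " "): ("gap", ","),    # first space of a run emits a comma
        ("field", ANY): ("field", COPY),
        ("gap", " "): ("gap", ""),       # further spaces emit nothing
        ("gap", ANY): ("field", COPY),
    },
)

print(space_to_comma.run("a  b c"))
```

Even this small transformation needs one state per distinct context, which hints at why a plain DFST description can grow verbose for richer formats.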

Award ID(s): 1812727; 1907863

NSF-PAR ID: 10430773

Author(s) / Creator(s): Nourian, Marziyeh; Nguyen, Tri; Chien, Andrew A.; Becchi, Michela

Date Published: 2022-12-17

Journal Name: 2022 IEEE International Conference on Big Data (Big Data)

Page Range / eLocation ID: 141 to 150

Format(s): Medium: X

Sponsoring Org: National Science Foundation

Free Publicly Accessible Full Text
Accepted Manuscript 1.0
Conference Paper:
https://doi.org/10.1109/BigData55660.2022.10020756
