PolyFrame: a retargetable query-based approach to scaling dataframes

Sinthong, Phanwadee; Carey, Michael J.

doi:10.14778/3476249.3476281

In the last few years, the field of data science has grown rapidly as businesses have adopted statistical and machine learning techniques to empower their decision-making and applications. Scaling data analyses to large volumes of data requires distributed frameworks, which can pose serious technical challenges for data analysts and reduce their productivity. AFrame, a data analytics library implemented as a layer on top of Apache AsterixDB, addresses these issues by providing data scientists with a familiar interface, the Pandas DataFrame, and by transparently scaling out the evaluation of analytical operations through a Big Data management system. While AFrame can leverage data management facilities (e.g., indexes and query optimization) and lets users interact with large volumes of data, its initial version generated only SQL++ queries and operated only against AsterixDB. In this work, we describe a new design that retargets AFrame's incremental query formation to other query-based database systems, making it more flexible for deployment against other data management systems with composable query languages.

Award ID(s): 1954962

PAR ID: 10300517

Author(s) / Creator(s): Sinthong, Phanwadee; Carey, Michael J.

Date Published: 2021-07-01

Journal Name: Proceedings of the VLDB Endowment

Volume: 14

Issue: 11

ISSN: 2150-8097

Page Range / eLocation ID: 2296 to 2304

Sponsoring Org: National Science Foundation

Free Publicly Accessible Full Text
Accepted Manuscript 1.0
Journal Article: https://doi.org/10.14778/3476249.3476281
