Flexible rule-based decomposition and metadata independence in Modin: a parallel dataframe system

Petersohn, Devin; Tang, Dixin; Durrani, Rehan; Melik-Adamyan, Areg; Gonzalez, Joseph E.; Joseph, Anthony D.; Parameswaran, Aditya G.

doi:10.14778/3494124.3494152

Dataframes have become universally popular as a means to represent data in various stages of structure and to manipulate it using a rich set of operators, making them an essential tool in the data scientist's toolbox. However, dataframe systems such as pandas scale poorly and are non-interactive on moderate to large datasets. We discuss our experiences developing Modin, our first cut at a parallel dataframe system, which already has users across several industries and over 1M downloads. Modin translates pandas functions into a core set of operators that are individually parallelized via columnar, row-wise, or cell-wise decomposition rules that we formalize in this paper. We also introduce metadata independence, which allows metadata such as order and type to be decoupled from the physical representation and maintained lazily. Using rule-based decomposition and metadata independence, along with careful engineering, Modin is able to support pandas operations across both rows and columns on very large dataframes, unlike Koalas and Dask DataFrames, which either break down or are unable to support such operations, while also being much faster than pandas.
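
The three decomposition rules named in the abstract can be illustrated with a toy sketch. This is not Modin's implementation or API: the table is a plain nested list, a thread pool stands in for a real execution engine, and the helper names (`split`, `row_wise`, `columnar`, `cell_wise`) are hypothetical. It only shows how the choice of partitioning axis follows from what each operator needs to see at once: a whole row, a whole column, or a single cell.

```python
from concurrent.futures import ThreadPoolExecutor


def split(seq, n):
    """Split seq into at most n contiguous chunks of near-equal size."""
    size = max(1, -(-len(seq) // n))  # ceiling division, at least 1
    return [seq[i:i + size] for i in range(0, len(seq), size)]


def row_wise(table, row_fn, n_parts=2):
    """Row-wise rule: row_fn needs a whole row, so partition along rows."""
    parts = split(table, n_parts)
    with ThreadPoolExecutor() as pool:
        results = pool.map(lambda part: [row_fn(r) for r in part], parts)
        return [value for part in results for value in part]


def columnar(table, col_fn, n_parts=2):
    """Columnar rule: col_fn needs a whole column, so partition along columns."""
    columns = [list(c) for c in zip(*table)]
    parts = split(columns, n_parts)
    with ThreadPoolExecutor() as pool:
        results = pool.map(lambda part: [col_fn(c) for c in part], parts)
        return [value for part in results for value in part]


def cell_wise(table, cell_fn, n_row_parts=2, n_col_parts=2):
    """Cell-wise rule: cell_fn needs one cell, so partition along both axes."""
    bands = []
    for row_block in split(table, n_row_parts):
        # For each row, cut it into column segments, then regroup the
        # segments so every entry of the band is one rectangular block.
        segments = [split(row, n_col_parts) for row in row_block]
        bands.append([list(block) for block in zip(*segments)])

    def apply(block):
        return [[cell_fn(v) for v in segment] for segment in block]

    with ThreadPoolExecutor() as pool:
        done = [list(pool.map(apply, band)) for band in bands]

    # Stitch the blocks of each band back into full rows.
    out = []
    for band in done:
        for pieces in zip(*band):
            out.append([v for piece in pieces for v in piece])
    return out


table = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
row_sums = row_wise(table, sum)                # [6, 15, 24]
col_sums = columnar(table, sum)                # [12, 15, 18]
scaled = cell_wise(table, lambda v: v * 10)    # every cell multiplied by 10
```

In each case the partitions are processed independently and the results are concatenated in partition order, which is what lets an operator be parallelized without changing its result.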

Award ID(s): 1940757

PAR ID: 10324483

Author(s) / Creator(s): Petersohn, Devin; Tang, Dixin; Durrani, Rehan; Melik-Adamyan, Areg; Gonzalez, Joseph E.; Joseph, Anthony D.; Parameswaran, Aditya G.

Date Published: 2021-11-01

Journal Name: Proceedings of the VLDB Endowment

Volume: 15

Issue: 3

ISSN: 2150-8097

Page Range / eLocation ID: 739-751

Sponsoring Org: National Science Foundation

Free Publicly Accessible Full Text
Accepted Manuscript 1.0
Journal Article: https://doi.org/10.14778/3494124.3494152
