To Ship or Not to (Function) Ship

Liu, Feilong; Kamat, Niranjan; Blanas, Spyros; Nandi, Arnab

Citation Details

Sampling is often used to reduce query latency for interactive big data analytics. The established parallel data processing paradigm relies on function shipping, where a coordinator dispatches queries to worker nodes and then collects the results. The commoditization of high-performance networking makes data shipping possible, where the coordinator directly reads data in the workers’ memory using RDMA while workers process other queries. In this work, we explore when to use function shipping or data shipping for interactive query processing with sampling. Whether function shipping or data shipping should be preferred depends on the amount of data transferred, the current CPU utilization, and the sampling method. The results show that data shipping is up to 6.5× faster when performing clustered sampling with heavily utilized workers.

Award ID(s): 1816577

PAR ID: 10104373

Author(s) / Creator(s): Liu, Feilong; Kamat, Niranjan; Blanas, Spyros; Nandi, Arnab

Date Published: 2018-09-25

Journal Name: 2018 IEEE High Performance Extreme Computing Conference (HPEC '18)

Format(s): Medium: X

Sponsoring Org: National Science Foundation

Free Publicly Accessible Full Text
Accepted Manuscript
Conference Paper:
The DOI is not currently available.
