Title: Topology-aware Parallel Data Processing: Models, Algorithms and Systems at Scale
The analysis of massive datasets requires a large number of processors. Prior research has largely assumed that tracking the actual data distribution and the underlying network structure of a cluster, which we collectively refer to as the topology, comes with a high cost and has little practical benefit. As a result, theoretical models, algorithms and systems often assume a uniform topology; however, this assumption rarely holds in practice. This necessitates an end-to-end investigation of how one can model, design and deploy topology-aware algorithms for fundamental data processing tasks at large scale. To achieve this goal, we first develop a theoretical parallel model that can jointly capture the cost of computation and communication. Using this model, we explore algorithms with theoretical guarantees for three basic tasks: aggregation, join, and sorting. Finally, we consider the practical aspects of implementing topology-aware algorithms at scale, and show that they have the potential to be orders of magnitude faster than their topology-oblivious counterparts.
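To make the potential savings concrete, below is a minimal, hypothetical sketch of topology-aware aggregation over a two-level rack/core topology. The worker names, the single coordinator, and the two-level model are illustrative assumptions, not the paper's actual cost model or algorithms; the sketch only counts how many messages must cross the oversubscribed core under an oblivious plan versus a rack-aware plan.

# Hypothetical sketch: the two-level rack/core topology and all names here
# are illustrative assumptions, not the model from the paper.
from collections import defaultdict

def aggregate_oblivious(values, rack_of, coordinator):
    # Every worker ships its value straight to the coordinator; each message
    # from a worker outside the coordinator's rack crosses the slow core.
    cross_rack = sum(1 for w in values if rack_of[w] != rack_of[coordinator])
    return sum(values.values()), cross_rack

def aggregate_topology_aware(values, rack_of, coordinator):
    # Phase 1: combine within each rack over fast local links.
    per_rack = defaultdict(int)
    for w, v in values.items():
        per_rack[rack_of[w]] += v
    # Phase 2: only one partial aggregate per rack crosses the core.
    cross_rack = sum(1 for r in per_rack if r != rack_of[coordinator])
    return sum(per_rack.values()), cross_rack

if __name__ == "__main__":
    rack_of = {f"w{i}": i // 8 for i in range(64)}   # 8 racks x 8 workers
    values = {w: 1 for w in rack_of}
    print(aggregate_oblivious(values, rack_of, "w0"))       # (64, 56)
    print(aggregate_topology_aware(values, rack_of, "w0"))  # (64, 7)

With 8 racks of 8 workers, the rack-aware plan sends 7 partial sums across the core instead of 56 raw values; the paper studies how such topology-aware plans extend to deeper topologies and to joins and sorting.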
Award ID(s):
1816577
NSF-PAR ID:
10178062
Author(s) / Creator(s):
; ;
Date Published:
Journal Name:
10th Annual Conference on Innovative Data Systems Research (CIDR ’20)
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. Chambers, Erin W.; Gudmundsson, Joachim (Eds.)
    Datasets with non-trivial large-scale topology can be hard to embed in low-dimensional Euclidean space with existing dimensionality reduction algorithms. We propose to model topologically complex datasets using vector bundles, in such a way that the base space accounts for the large-scale topology, while the fibers account for the local geometry. This allows one to reduce the dimensionality of the fibers, while preserving the large-scale topology. We formalize this point of view and, as an application, we describe a dimensionality reduction algorithm based on topological inference for vector bundles. The algorithm takes as input a dataset together with an initial representation in Euclidean space, assumed to recover part of its large-scale topology, and outputs a new representation that integrates local representations obtained through local linear dimensionality reduction. We demonstrate this algorithm on examples coming from dynamical systems and chemistry. In these examples, our algorithm is able to learn topologically faithful embeddings of the data in lower target dimension than various well-known metric-based dimensionality reduction algorithms.
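As a rough illustration of the fiber-reduction idea in the record above, the sketch below runs a local linear reduction (plain PCA via SVD) in neighborhoods of an initial base embedding. It deliberately omits the paper's key contribution, consistently aligning the local representations across charts, and every name in it is a placeholder.

# Hedged sketch only: local PCA per neighborhood, with no cross-chart
# alignment; not the authors' algorithm.
import numpy as np

def reduce_fibers(X, base_coords, n_neighbors=15, fiber_dim=1):
    # X: (n, D) raw data; base_coords: (n, d) initial embedding assumed to
    # capture the large-scale topology. Each neighborhood in the base space
    # plays the role of a local trivialization of the bundle.
    n = len(X)
    fibers = np.zeros((n, fiber_dim))
    for i in range(n):
        dists = np.linalg.norm(base_coords - base_coords[i], axis=1)
        idx = np.argsort(dists)[:n_neighbors]
        center = X[idx].mean(axis=0)
        # Local linear reduction: principal directions of the local patch.
        _, _, vt = np.linalg.svd(X[idx] - center, full_matrices=False)
        fibers[i] = (X[i] - center) @ vt[:fiber_dim].T
    return np.hstack([base_coords, fibers])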
  2. Inspired by the recent achievements of machine learning in diverse domains, data-driven metamaterials design has emerged as a compelling paradigm that can unlock the potential of multiscale architectures. The model-centric research trend, however, lacks principled frameworks dedicated to data acquisition, whose quality propagates into the downstream tasks. Built by naive space-filling design in shape descriptor space, metamaterial datasets suffer from property distributions that are either highly imbalanced or at odds with design tasks of interest. To this end, we present t-METASET: an active-learning-based data acquisition framework aiming to guide both balanced and task-aware data generation. Uniquely, we seek a solution to a commonplace yet frequently overlooked scenario at early stages of data-driven design: when a massive shape-only library has been prepared with no properties evaluated. The key idea is to harness a data-driven shape descriptor learned from generative models, fit a sparse regressor as a start-up agent, and leverage metrics related to diversity to drive data acquisition to areas that help designers fulfill design goals. We validate the proposed framework in three deployment cases, which encompass general use, task-specific use, and tailorable use. Two large-scale mechanical metamaterial datasets (∼O(10^4)) are used to demonstrate the efficacy. Applicable to general design representations, t-METASET can boost future advancements in data-driven design.
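A toy sketch of the diversity-driven acquisition loop described above, assuming a shape-only library with precomputed descriptors and an expensive property oracle. The sparse regressor and the task-aware weighting of the actual framework are omitted, and every name is a stand-in.

# Illustrative loop skeleton only; "novelty" is a simple nearest-neighbor
# diversity proxy, not t-METASET's acquisition criterion.
import numpy as np

def novelty(candidates, acquired):
    # Distance to the nearest already-evaluated descriptor: larger means the
    # candidate sits in a less-covered region of descriptor space.
    if not acquired:
        return np.ones(len(candidates))
    A = np.asarray(acquired)
    return np.linalg.norm(candidates[:, None, :] - A[None, :, :],
                          axis=-1).min(axis=1)

def acquire(descriptors, evaluate, budget=100):
    # descriptors: (N, d) learned shape descriptors of an unevaluated library;
    # evaluate(i): costly property evaluation (e.g., a simulation) of shape i.
    remaining = list(range(len(descriptors)))
    xs, ys = [], []
    for _ in range(budget):
        scores = novelty(descriptors[remaining], xs)
        pick = remaining.pop(int(np.argmax(scores)))
        xs.append(descriptors[pick])
        ys.append(evaluate(pick))
    return np.asarray(xs), np.asarray(ys)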
  3. In this paper, we consider how to provide fast estimates of flow-level tail latency performance for very large-scale data center networks. Network tail latency is often a crucial metric for cloud application performance that can be affected by a wide variety of factors, including network load, inter-rack traffic skew, traffic burstiness, flow size distributions, oversubscription, and topology asymmetry. Network simulators such as ns-3 and OMNeT++ can provide accurate answers, but are very hard to parallelize, taking hours or days to answer what-if questions for a single configuration at even moderate scale. Recent work with MimicNet has shown how to use machine learning to improve simulation performance, but at the cost of a long training step per configuration, and with assumptions about workload and topology uniformity that typically do not hold in practice. We address this gap by developing a set of techniques to provide fast performance estimates for large-scale networks with general traffic matrices and topologies. A key step is to decompose the problem into a large number of parallel, independent single-link simulations; we carefully combine these link-level simulations to produce accurate estimates of end-to-end flow-level performance distributions for the entire network. Like MimicNet, we exploit symmetry where possible to gain additional speedups, but without relying on machine learning, so there is no training delay. On a large-scale network where ns-3 takes 11 to 27 hours to simulate five seconds of network behavior, our techniques run in one to two minutes with accuracy within 9% for tail flow completion times.
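The core decomposition in the record above can be caricatured in a few lines: simulate each link on its own to get a per-link delay distribution, then combine samples along a flow's path and read off a tail quantile. The M/M/1-style link model and the independence across links below are simplifying assumptions for illustration, not the paper's actual combination step.

# Caricature only: per-link delays are assumed independent, which a real
# estimator must correct for.
import numpy as np

def link_delay_samples(load, n=100_000, seed=0):
    # Stand-in single-link simulator: exponential delays whose mean grows as
    # the link approaches saturation (an M/M/1-style caricature).
    rng = np.random.default_rng(seed)
    return rng.exponential(1.0 / (1.0 - min(load, 0.99)), size=n)

def path_tail_latency(link_loads, q=0.99):
    # Combine the independent per-link simulations along the path, then read
    # off the end-to-end tail quantile.
    total = sum(link_delay_samples(load, seed=i)
                for i, load in enumerate(link_loads))
    return np.quantile(total, q)

print(path_tail_latency([0.3, 0.8, 0.3]))  # p99 over a 3-hop path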
  4. Graph Neural Networks (GNNs) have seen significant success in tasks such as node classification, largely contingent upon the availability of sufficient labeled nodes. Yet, the excessive cost of labeling large-scale graphs has led to a focus on active learning on graphs, which aims for effective data selection to maximize downstream model performance. Notably, most existing methods assume reliable graph topology, while real-world scenarios often present noisy graphs. Given this, designing a successful active learning framework for noisy graphs is highly needed but challenging, as selecting data for labeling and obtaining a clean graph are two naturally interdependent tasks: selecting high-quality data requires a clean graph structure, while cleaning a noisy graph structure requires sufficient labeled data. Considering this complexity, we propose an active learning framework, GALClean, which is specifically designed to take an iterative approach that conducts data selection and graph purification simultaneously, each informed by the best information learned in the prior iteration. Importantly, we summarize GALClean as an instance of the Expectation-Maximization algorithm, which provides a theoretical understanding of its design and mechanisms. This theory naturally leads to an enhanced version, GALClean+. Extensive experiments have demonstrated the effectiveness and robustness of our proposed method across various types and levels of noisy graphs.
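To make the alternation concrete, here is a self-contained toy loop in the spirit of the description above: label propagation stands in for the GNN, entropy picks the label queries, and a simple confidence rule prunes suspect edges. None of these components are the paper's actual design; the sketch only shows the select-then-purify iteration.

# Toy skeleton only: propagation, selection, and purification rules are all
# illustrative stand-ins, not GALClean's components.
import numpy as np

def propagate(adj, labels, n_classes, iters=10):
    # Cheap stand-in for training a GNN: iterative label propagation over
    # the (possibly noisy) adjacency structure.
    p = np.full((len(adj), n_classes), 1.0 / n_classes)
    for node, c in labels.items():
        p[node] = np.eye(n_classes)[c]
    for _ in range(iters):
        q = p.copy()
        for u, nbrs in enumerate(adj):
            if u not in labels and nbrs:
                q[u] = p[list(nbrs)].mean(axis=0)
        p = q
    return p

def select_and_purify(adj, oracle, seeds, n_classes, budget, n_iters=4):
    # adj: list of neighbor sets; oracle(u): returns the true label of u.
    labels = dict(seeds)
    for _ in range(n_iters):
        p = propagate(adj, labels, n_classes)
        # Selection step: query the oracle on the most uncertain nodes.
        entropy = -(p * np.log(p + 1e-9)).sum(axis=1)
        pool = sorted((u for u in range(len(adj)) if u not in labels),
                      key=lambda u: -entropy[u])
        for u in pool[:budget // n_iters]:
            labels[u] = oracle(u)
        # Purification step: drop edges whose endpoints confidently disagree.
        pred, conf = p.argmax(axis=1), p.max(axis=1)
        adj = [{v for v in nbrs
                if not (pred[u] != pred[v]
                        and conf[u] > 0.9 and conf[v] > 0.9)}
               for u, nbrs in enumerate(adj)]
    return propagate(adj, labels, n_classes), adj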